Researchers are using cutting-edge AI models to “decipher” ancient scrolls that were heated during the eruption of Mount Vesuvius in 79 AD, which covered much of the Bay of Naples in ash, including the now-famous cities of Pompeii and Herculaneum. Work on deciphering the scrolls began centuries before the artificial intelligence revolution, but a myriad of new technologies is making the task easier and faster than ever before.
The term “AI” is as unwieldy and loosely used as the technology itself. What does it actually mean for AI to decipher what has eluded humanity for centuries? To find out, we spoke to experts working on algorithms and models to decipher and catalog the classics.
Loss and rediscovery of the scrolls
Nearly 2,000 years ago, the Bay of Naples was rocked by the devastating eruption of Mount Vesuvius, burying Pompeii and Herculaneum in ash and causing them to disappear from maps for more than 1,500 years.
Fast forward to 1750, when workers digging a well discovered a marble floor beneath the soil. Further excavation revealed a buried villa containing nearly 2,000 charred scrolls and scorched papyrus fragments. At first, the scrolls were mistaken for fishing nets or charred logs, and many were discarded or torched. Eventually, one of the scrolls was dropped and broke open, revealing what the blackened cylinders really were. According to the Getty Museum, the scrolls of the villa, now known as the Villa dei Papiri, are the only surviving library from the Classical world.
Like the frescoes and body casts from Pompeii and Herculaneum, the scrolls are extremely fragile and nearly impossible to read. Repeated, laborious attempts to open them have torn many apart, and the information they had miraculously preserved has been lost to time.
Still, some scholars believe that the scrolls read so far include works by the Greek philosopher Philodemus of Gadara, and that the villa belonged to his patron, Lucius Calpurnius Piso Caesoninus, father-in-law of Julius Caesar.
Today, over 300 scrolls remain unopened, having mercifully been spared those earlier, crude attempts to uncover their contents.
One of the unsealed Herculaneum papyri. Photo: Unknown / Wikimedia Commons
The Vesuvius challenge: Modern technology means no need to crush papyrus
The Vesuvius Challenge was launched in March 2023, a project challenging the public to use AI to identify letters, and ultimately words, hidden in the Herculaneum scrolls. The first word discovered and translated from one of the unopened papyrus scrolls (“purple”) was announced in October 2023. The finder of the word won $40,000 for their efforts, as part of $1 million paid out last year to those involved in the search for the lost library.
Machine learning and computer vision are the two types of artificial intelligence used in the challenge's virtual unwrapping method. Machine learning uses data and algorithms to let AI systems mimic human learning, improving their accuracy over time. Computer vision is exactly what it sounds like: a field of study that helps computers identify objects and people in images, and ultimately lets machines interpret what they see.
From top to bottom: a reference photograph, a texture image, a network-generated predicted image, and the network-generated photo-realistic rendering of the letter on a scroll. Image: Parker et al., PLOS One 2019
“New computer vision techniques aimed at virtually opening the unopened Herculaneum papyri bring new hope to Herculaneum papyri, making it possible to decipher scrolls that were last read almost 2,000 years ago, before the eruption of Mount Vesuvius,” Federica Nicolardi, a papyrologist at the University of Naples Federico II and a member of the Vesuvius Challenge papyrology team, said in an email to Gizmodo.
A team that included members of the Vesuvius Challenge trialed the technique with the Ein Gedi scroll in 2015. The work involved taking a three-dimensional volumetric scan of the scroll to reveal its 3D structure. Computer software then interpreted the bright pixels in the scan that represented each layer wrapped around the scroll and the ink remaining on its surface. Finally, the scroll was effectively “unwrapped,” and a digital version of the document was laid out in an easy-to-read form.
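To make the “unwrapping” step a little more concrete, here is a toy sketch in Python. It is not the team's software. It assumes a single synthetic cross-section of a scroll, with the rolled sheet modeled as a spiral whose path is already known; in real scans, finding that path is the hard part. The function names, grid size, and brightness values are all made up for illustration.

```python
import math

def make_slice(size=81, turns=3, layer_gap=5, ink_every=7):
    """Build a synthetic cross-section: a rolled sheet drawn as a spiral.

    Papyrus cells get brightness 0.5 and inked cells 1.0 (illustrative values).
    Returns the 2D grid and the ordered list of cells along the sheet.
    """
    grid = [[0.0] * size for _ in range(size)]
    centre = size // 2
    path, seen = [], set()
    steps = 2000
    for i in range(steps):
        theta = 2 * math.pi * turns * i / steps
        radius = 4 + layer_gap * theta / (2 * math.pi)  # each wrap sits further out
        cell = (centre + round(radius * math.sin(theta)),
                centre + round(radius * math.cos(theta)))
        if cell not in seen:  # keep each grid cell once, in order of first visit
            seen.add(cell)
            path.append(cell)
    for k, (row, col) in enumerate(path):
        grid[row][col] = 1.0 if k % ink_every == 0 else 0.5
    return grid, path

def unwrap(grid, path):
    """Read brightness along the sheet, turning the spiral into a flat strip."""
    return [grid[row][col] for row, col in path]

def find_ink(strip, threshold=0.75):
    """Return positions along the flattened strip that look inked."""
    return [k for k, value in enumerate(strip) if value > threshold]

grid, path = make_slice()
strip = unwrap(grid, path)
print(len(strip), "cells along the sheet;", len(find_ink(strip)), "look inked")
```

The point of the sketch is only the change of coordinates: what is a tangled spiral in the scan becomes a straight strip that can be read left to right.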
The Vesuvius Challenge's goal for 2024 is for teams to decipher 90% of the scrolls they've scanned, with cash prizes awarded for deciphering the first letter on a particular scroll and a bigger cash prize awarded for automatically segmenting one of the scrolls. If the teams succeed, it would mark the first time the scrolls have been read since they were buried under ash.
Why do researchers need AI to read the scrolls?
“A big problem with working with ancient texts is that these texts are often in a fragmented state of preservation,” Thea Sommerschield, a classics scholar at the University of Nottingham who is not a member of the Vesuvius Challenge, said in a phone interview with Gizmodo. “Machine learning is very good at identifying patterns, patterns in text, and using that to perform specific tasks.”
In classics, AI speeds up and scales processes that were previously performed laboriously by humans, and in the case of the Herculaneum papyri, those tasks take several forms.
“Participants figured out how to identify parts of a closed scroll that were likely inked, and then incrementally built a set of labels that could tease out the ink using convolutional neural networks and ultimately a Transformer-style network,” Brent Seales, a computer scientist at the University of Kentucky and principal investigator at the Educe Lab, told Gizmodo in a phone interview.
Simply put, convolutional neural networks are a family of deep learning models that slide small learned filters across an image to pick out local patterns. They are particularly well suited to classification and other computer vision tasks, which is why they are useful for picking up the faint ink traces left on the carbonized papyrus.
“You can think of this technique as being a bit like pointillism,” Seales says. “We're looking at a very small area of the surface and determining whether that small area is ink or not.”
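The pointillism analogy can be made concrete with a small Python sketch. This is a stand-in, not the challenge's model: a real convolutional network learns its filter weights from labeled examples, whereas the filter below is a hand-set 3x3 average, and the surface values and the 0.5 threshold are invented for illustration.

```python
def slide_filter(image, kernel):
    """Slide a small filter over the image and record its response at each spot.

    This is the sliding-window operation used in convolutional layers
    (no padding, stride 1).
    """
    kh, kw = len(kernel), len(kernel[0])
    responses = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        responses.append(row)
    return responses

def ink_mask(image, kernel, threshold):
    """Make one ink / not-ink decision (1 or 0) per small patch."""
    return [[1 if value > threshold else 0 for value in row]
            for row in slide_filter(image, kernel)]

# Hand-set averaging filter: a placeholder for weights a network would learn.
AVERAGE_3X3 = [[1 / 9] * 3 for _ in range(3)]

# Toy patch of scroll surface: 0.9 marks ink-like signal, 0.1 bare papyrus.
SURFACE = [
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]

for row in ink_mask(SURFACE, AVERAGE_3X3, threshold=0.5):
    print(row)
```

Each output cell is a verdict on one tiny neighborhood, and the picture of a letter only emerges when many such dots are viewed together.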
Transformers are a newer neural network architecture that lets models process huge strings of text and better handle multiple data streams. So-called “multimodal” systems built on them can generate images from text input, or combine computer vision and natural language processing to read images of handwritten characters. (For those who don't know, the “T” in “ChatGPT” stands for Transformer.)
“Transformer is currently at the cutting edge of computer science because of its unparalleled ability to capture context,” Sommerschield said, adding that this ability will not only “help us recover ancient, fragmentary texts,” but also help date texts and predict where they were written.
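For a rough sense of what “capturing context” means mechanically, the sketch below implements scaled dot-product self-attention, the core operation inside a Transformer, in plain Python. It is heavily simplified: real models apply learned projections to produce separate queries, keys, and values and stack many such layers, while here each token's vector plays all three roles. The three two-number “tokens” are invented for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Every token scores every other token, and its new representation is the
    weighted average of all tokens, so each output carries context.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, vectors))
                        for d in range(dim)])
    return outputs, weights

# Hypothetical token vectors: the first and third are identical.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

In this toy case the first token puts more weight on the third token, which matches it, than on the second, which does not. That weighting by relevance is what lets a model use surrounding text when filling a gap.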
Computer vision isn't the only AI field used in classics
The Vesuvius Challenge is just one approach researchers are taking to bring AI to the study of ancient texts.
In 2019, Sommerschield and project co-lead Yannis Assael, a Google DeepMind researcher, developed Pythia, a then-state-of-the-art neural network designed to restore ancient Greek texts. Pythia did so by recovering missing characters from damaged texts. Pythia's character error rate was 30.1%, compared with 57.3% for human epigraphers.
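A character error rate is a simple idea: count how many single-character edits separate a proposed restoration from the true text, then divide by the length of the true text. The Python sketch below shows one common way to compute it, using edit distance. The article does not say exactly how the Pythia figures were calculated, so treat this as an illustration of the metric rather than the team's evaluation code; the example strings are made up.

```python
def edit_distance(reference, prediction):
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    previous = list(range(len(prediction) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, pred_char in enumerate(prediction, start=1):
            cost = 0 if ref_char == pred_char else 1
            current.append(min(previous[j] + 1,          # drop a reference char
                               current[j - 1] + 1,       # add a predicted char
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]

def character_error_rate(reference, prediction):
    """Edits needed to fix the prediction, as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, prediction) / len(reference)

# One wrong character out of four gives an error rate of 25%.
print(character_error_rate("abcd", "abxd"))  # 0.25
```

On this scale, lower is better: a rate of 30.1% means roughly three characters in ten would need correcting, against nearly six in ten at 57.3%.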
Since then, Sommerschield and Assael's team has published Ithaca, a more powerful Transformer-based model that uses neural networks to recover ancient texts and identify their origins. As the team states in their paper, Ithaca is “designed to assist and extend historians' workflow.” While the model alone had 62% accuracy in recovering damaged texts, historians who used Ithaca saw their accuracy jump from 25% to 72%. Ithaca and similar models “can unlock the potential for collaboration between artificial intelligence and historians,” the team writes.
In a paper published in the journal Computational Linguistics in 2024, their team presented a comprehensive survey of the use of machine learning to study ancient texts. They found that research is gaining momentum in areas ranging from digitization, restoration, and attribution work to linguistic analysis, textual criticism, and translation.
But the researchers also identified hurdles to overcome. Their data highlighted that different languages, histories, and geographies are represented at different rates in existing studies that use machine learning on ancient texts. As you might expect, ancient Greek and Latin texts were far better represented than texts in other languages and scripts, such as cuneiform, ancient Korean, and Indus script. Ensuring that all cultures are represented when researchers deploy machine learning on ancient texts is obviously a job for human researchers, not the models themselves.
Keeping humans in the loop
In all the fuss surrounding the Vesuvius Challenge, it's easy to forget a key fact: the AI is not reading the scrolls itself. This isn't to diminish the team's work; rather, it emphasizes it: the researchers aren't relying on the AI where it doesn't make sense, or where doing so might lead to inaccurate conclusions about the scrolls' contents.
“The AI framework doesn't determine the complete character format,” Seales says. It simply highlights where it recognizes ink on the scroll, “reducing the chance of hallucinations.” In other words, it keeps the team's model from mistaking an eta for a theta and muddling the papyrus's meaning.
“It's the human being who sees how the individual ink decisions line up and whether it makes sense as a piece of text,” he added.
A fragment of a papyrus from Herculaneum in the National Library of Naples. Photo: Antonio Masiello/Getty Images
“When you start applying these techniques to ancient languages, you become acutely aware of their shortcomings and their potential,” Sommerschield says. “The answer is that you need to keep humans involved.”
There's still a lot of work to be done
Earlier this month, Sommerschield and Assael organized the Machine Learning for Ancient Languages (ML4AL) workshop to foster collaboration and support research momentum in the field.
“It requires professionals, students, practitioners, museum people and the public to get involved, benefit, use, problem-solve, disrupt and really try to get the most out of it,” Sommerschield added.
The next step in the Vesuvius Challenge is to create a workflow to scan the scrolls in large-scale chunks so they can be read efficiently. There are around 300 surviving scrolls to work with, and the documents (with conservators as handlers) will need to be transported to a particle accelerator in the UK for scanning. In total, it would cost $30 million to scan all the scrolls today.
So, your burning question: what can we actually learn from these documents discovered in the shadow of Mount Vesuvius? Nicolardi told Gizmodo, “We hope to find more philosophical books that shed light on Greek philosophy, especially those of Epicurus and his disciples, whose books are completely lost outside the Villa dei Papiri library.”
And that's not all: About 1,100 scrolls were discovered at the Villa dei Papiri in 1752 and 1754, according to the Getty Museum. But the villa site has never been fully excavated, and the project's website says it's “almost certain” that many more scrolls lie buried there. Excavations will be expensive, but by the time that moment comes, the team will have a mountain of scrolls to sift through.
But the scrolls are only one piece of the puzzle. The challenge now is to use AI to improve our understanding of the ancient world, which also means revisiting documents we know well. It's exciting to imagine reading something that hasn't been read in two millennia, but AI will have an impact on the entire field of classics. Sometimes being able to evaluate something in a new way is just as useful as seeing it for the first time.