Could AI Language Models Like ChatGPT Unlock Mysterious Ancient Texts?

Ancient writings like the Indus script and Voynich Manuscript have baffled scholars for decades. Some researchers think AI systems could help reveal their secrets.

By Kenna Hughes-Castleberry
Apr 10, 2023 3:00 PM | Apr 11, 2023 4:47 PM
These terracotta seals show text and images written by the ancient Indus Valley civilization. (Credit: DARSHAN KUMAR/Shutterstock)


Approximately 4,000 years ago, an ancient civilization living in the Indus Valley (today’s India and Pakistan) comprised 10 percent of the world’s population. Though few records of this people remain, archaeologists have found that they were advanced enough to have their own writing system, which has yet to be deciphered.

Known as the Indus script, the mysterious text has puzzled scholars, linguists and even cryptographers for decades. Only a few hundred symbols have been classified, as scientists haven't discovered a “Rosetta stone,” or key, for decoding this unknown language. But recent advancements in artificial intelligence — including large language models like ChatGPT — could change that, providing further insights into ancient civilizations.

Uncovering the Indus Valley Script

Although the Indus Valley Civilization was formally discovered in the 1920s, it was not until 1999 that the first pieces of its script were unearthed. Seals, pottery and even bones were inscribed with strange symbols accompanied by animal figures. These complicated inscriptions made the discovery all the more tantalizing, placing the secrets of this complex society just out of reach.

“[The script] will help us learn a lot about this ancient civilization, their lifestyle [and their] knowledge about the world,” says Satish Palaniappan, an applied machine learning scientist at Microsoft. “All of that is locked up information we currently do not have access to.”

Unlocking the Indus Valley Script

Palaniappan is one of many researchers using AI algorithms to try to decode the script. Along with a colleague, he developed an algorithm that identifies similar characters in the text and looks for patterns in how often particular characters appear, according to a recently published paper in the journal PLOS. Scholars can then use these character frequencies to create a key for decipherment.
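To make the idea of character frequencies concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, which the article does not describe in detail. It only counts how often each sign appears across a set of inscriptions. The sign names and the sample inscriptions are hypothetical placeholders, not real Indus data.

```python
from collections import Counter

def sign_frequencies(inscriptions):
    """Return each sign's share of all sign occurrences, most common first."""
    counts = Counter(sign for inscription in inscriptions for sign in inscription)
    total = sum(counts.values())
    return {sign: n / total for sign, n in counts.most_common()}

# Hypothetical sign labels, invented purely for illustration.
inscriptions = [
    ["fish", "jar", "arrow"],
    ["fish", "jar"],
    ["fish", "comb"],
]

freqs = sign_frequencies(inscriptions)
for sign, share in freqs.items():
    print(f"{sign}: {share:.2f}")
```

A table like this is only a starting point: frequent signs and recurring sign pairs give researchers candidates to compare against known languages.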

Other ancient scripts, such as Egyptian hieroglyphs, were deciphered with a multilingual key: the Rosetta stone. In that case, the stone paired an already-understood language (ancient Greek) with an undeciphered script (Egyptian hieroglyphics), allowing scholars to decode the unknown writing.

Because the Indus script lacks a multilingual key, researchers like Palaniappan must think creatively to find connections between it and other languages.

“With recent advantages in Natural Language Processing, especially with large language models like ChatGPT-3 and ChatGPT-4, we can try to fine-tune or provide more context into languages that we believe were derived from the Indus Scripts, like the Brahmi Scripts,” he says. “And see if these generative models can get creative and figure out what each symbol means and how they fit into a language structure.”

Other Efforts to Unlock the Indus Script

Similarly, Peter Revesz, a professor of computing at the University of Nebraska-Lincoln, is trying to connect the Indus script with other languages. Like Palaniappan, Revesz, along with student Shruti Daggumati, grouped characters within the Indus Valley Script and compared them to similar-looking characters in both the Brahmi Scripts and the Phoenician alphabet, which had roots in Minoan culture.

“You feel like an archaeologist mixed with a computer scientist,” says Daggumati in a YouTube video about the project. “You get to be your own Indiana Jones.”

In a 2018 paper, Revesz and Daggumati found that the signs of the Indus script resembled some characters of the Phoenician alphabet with 90 percent certainty, according to the AI algorithm they used.

“We can think of that as a Bronze Age version of the Silk Road,” Revesz said, highlighting the connection between the two cultures. “It is possible that the use of scales, weights, and writing spread through these trade routes. Hence, the Indus Valley and Linear A script could be related. I’m developing AI algorithms to help investigate that possibility, which would be a key for deciphering the Indus Valley Script.”

Deciphering the Voynich Manuscript

Unlike the Indus script, a mysterious late-medieval text known as the Voynich Manuscript offers a wealth of characters for archaeologists and linguists to analyze. Written around 600 years ago, the 240-page text is made up of 25 to 30 unknown letters and characters. Alongside the writing are 126 colorful illustrations of alien-looking plants, 124 of which have been botanically identified based on the plant’s flower, leaf, or root structure.

A similar process has yet to be accomplished for the manuscript’s language, which has stumped cryptographers and linguists since its discovery in 1912.

“Deciphering the Voynich Manuscript might give some historical insight into medieval life,” says Kevin Knight, a former professor of computer science at the University of Southern California. “But that’s not what drives people to try to decipher it. They do it for the intellectual challenge. It would be great to be the first person in 500 years to read and understand such a mysterious document.”

Could AI Decode These Ancient Texts?

Knight and other scholars believe that the manuscript was written as a cipher, perhaps even as an anagram, which makes decoding it even more of a puzzle. For Knight, this is where an AI algorithm may prove helpful.

“If I show you a long cipher, you may notice that ‘P’ is always followed by ‘D,’” Knight says. “You might guess that ‘P’ and ‘D’ stand for ‘Q’ and ‘U’ respectively because that’s how QU works in English. Once you know ‘D’ stands for ‘U,’ you might look for patterns related to ‘U.’ The computer can do this reasoning faster and better than a person.”
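Knight’s example can be sketched in a few lines of Python. This is only an illustration of the pattern he describes, not his actual software: it scans a ciphertext for symbols that are followed by the same symbol every time, as “Q” is followed by “U” in English. The function name, the `min_count` threshold, and the sample ciphertext are all assumptions made up for this sketch.

```python
from collections import Counter, defaultdict

def fixed_followers(ciphertext, min_count=3):
    """Find symbols that are always followed by the same symbol.

    Only pairs inside a word are counted, so a symbol at the end of a
    word contributes no follower. Symbols with fewer than `min_count`
    observed followers are skipped as too rare to trust.
    """
    followers = defaultdict(Counter)
    for word in ciphertext.split():
        for current, nxt in zip(word, word[1:]):
            followers[current][nxt] += 1

    result = {}
    for symbol, counts in followers.items():
        total = sum(counts.values())
        if len(counts) == 1 and total >= min_count:
            result[symbol] = next(iter(counts))
    return result

# Made-up ciphertext in which "P" is always followed by "D".
ciphertext = "PDXAB PDXRT PDRRM PDLTR"
candidates = fixed_followers(ciphertext)
print(candidates)
```

Each pair found this way is a candidate for “QU” (or a similar fixed pair in another language). A full solver would test such guesses against the rest of the text and backtrack when they fail, which is the trial-and-error a computer can do far faster than a person.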

Yet the medieval language encoded in the Voynich Manuscript could be an older version of English, French, or Latin, making the decipherment trickier. Knight continues to use AI algorithms to try to decode the Voynich Manuscript, but he is still determining whether it can be solved with the current versions of AI models, like ChatGPT.

“Generally speaking, GPT is good at carrying out straightforward tasks that don’t require trial-and-error with a pencil and eraser,” Knight says. “For example: adding numbers, translating a sentence, counting words, writing a paragraph on topic X, etc. It’s less good at solving complex puzzles. But, of course, future versions of GPT may very well learn how to do things like this.”

The Voynich Manuscript and the Indus Valley Script are among the most complicated language puzzles out there. Many scholars around the world will no doubt eagerly await AI advancements that may help reveal the mysteries behind these ancient texts.

Copyright © 2024 Kalmbach Media Co.