Those of us for whom Star Trek serves as a benchmark for technological progress can only bemoan the fact that hopes for faster-than-light travel to other galaxies seem to be receding at warp speed, given that we no longer even have faster-than-sound travel to France. But I would prefer to focus on the bright side: We’re rapidly closing in on the Universal Translator, which means that when I do finally arrive in France, I’ll be able to communicate as easily as if I were on Earth.
The Universal Translator, of course, was a handheld device that instantly converted Captain Kirk’s futuristically clipped English into the language of whichever vaguely humanoid alien was offering to buy him a blue drink. It is impossible to overemphasize the potential usefulness of such a device on a visit to France, whose vaguely humanoid populace turns Klingon when confronted by a nonspeaker of their primitive but pretty language. Imagine the delight of the garçon when I mumble into my translator, “Can you bring me a good California chardonnay to drown the stench of this cheese?” and out comes flawless French. And back from the device will come a translation of the waiter’s enthusiastic response.
In fact, I already have something surprisingly close to a Universal Translator in my pocket, courtesy of a growing number of automated spoken-language-translating services that run on smartphones. I’m not counting on getting my favorite blue drink in any bar in the world just yet: “These systems still make mistakes that a 4-year-old wouldn’t make,” says Ashish Venugopal, a researcher at Google who works on the company’s Google Translate service. But unlike most 4-year-olds, Google has about a googol million dollars to throw at the problem, and more computing power, too.
Some of that power is spent prowling the web 24/7 to find examples of text—on websites, in email, or anywhere else—that can be paired with translations of that text into another language. The pairs of documents are digested by Google’s computers in chunks of three or so words, with each chunk analyzed and matched to its best translation. Having built in this way a constantly growing database of millions of translation chunks, Google Translate is armed to take on any sentence, find the set of phrases that most closely matches it, and spit back the translation into any of 64 languages. You can go to www.translate.google.com to try the results. Go ahead, do it now. I’ll wait right here.
Not bad, right? This take on machine translation is called a statistical approach because it involves finding the most likely phrase match across a giant sample. Over the past decade it has become the dominant model in the field, largely replacing longstanding efforts of human linguists to painstakingly draw up lists of rules of translation to guide computers. A major benefit of the statistical approach is simplicity, notes Robert Palmquist, CEO of SpeechGear, a machine translation company in Northfield, Minnesota. “A statistical system takes about a third as much time to develop as a rule-based system,” he says, “and it adapts much more easily to constantly changing vocabulary.” And as computing power becomes cheaper, statistical systems will be able to digest larger chunks of words, improving accuracy.
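For the curious, the chunk-matching idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not Google's actual system: the phrase table, its counts, and the `translate` function are all hypothetical, and the greedy longest-chunk-first matching is a simplification chosen for brevity. It only illustrates "pick the most frequently seen translation for each chunk of up to three words."

```python
from collections import Counter

# Hypothetical toy phrase table: how often each English chunk was seen
# paired with each French chunk in (imaginary) aligned documents.
PHRASE_COUNTS = {
    ("good", "wine"): Counter({"bon vin": 9, "vin bon": 1}),
    ("the",): Counter({"le": 6, "la": 4}),
    ("cheese",): Counter({"fromage": 10}),
    ("good",): Counter({"bon": 7, "bien": 3}),
    ("wine",): Counter({"vin": 10}),
}
MAX_CHUNK = 3  # the article's "three or so words"


def translate(sentence):
    """Greedily match the longest known chunk, then emit its most common translation."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest chunk first, shrinking until one is in the table.
        for n in range(min(MAX_CHUNK, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_COUNTS:
                best, _count = PHRASE_COUNTS[chunk].most_common(1)[0]
                output.append(best)
                i += n
                break
        else:
            # Unknown word: pass it through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate("the good wine"))
```

With this made-up table, "good wine" is matched as a single two-word chunk rather than word by word, which is why bigger chunks (and bigger tables) tend to help. A real system weighs many competing segmentations instead of grabbing the first match.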
Now throw in a system that recognizes speech—also statistically driven, except it deals with chunks of phonemes, or spoken sounds, rather than written words—and add the sort of text-to-speech function that has been annoying us for years in our various talking devices, and you’ve got a complete system for translating a spoken language on the fly. Google offers Google Translate in Conversation Mode on smartphones for free, and less-free services for phones, tablets, or other devices are available from SpeechGear, IBM, SayHi Translate, and Jibbigo.
I forced several native speakers of other languages to engage me face-to-face in Google-mediated conversations. I would say something clever into my phone, then thrust the phone in my interlocutor’s face so he could immediately hear the translation. The phone would then Anglicize my reluctant partner’s non-English reply. The verdict? It works—sort of. Once the thrill of having a phone translator wears off (which takes about 12 seconds), shortcomings start to present themselves. “It’s good, but it’s not recognizing all the words or putting them all in the right order,” Shanghai high-school exchange student Tony Liu said after listening to Google’s efforts at translating my English to his Mandarin, and vice-versa.
To improve accuracy, some translation efforts, including SpeechGear’s, are merging statistical approaches with rule-based systems. “Statistical systems have a tough time with ambiguity, as in ‘He was safe at home,’ ” Palmquist says. “A rule-based system can more easily be told to check if the context is a baseball game.” Rule-based systems also take a lot less computing power and data storage. Google’s service works only if you have a good data connection, because it’s zipping every word back and forth between your phone and the Google servers on Mount Olympus, which do all the heavy lifting.
“You need network access to gigabytes of data for statistical systems,” says Mireille Boutin, a professor of engineering at Purdue University. “But what if you’re traveling and don’t have network access?”
A group led by Boutin is addressing that rather sharp limitation with a translator that runs entirely on a cell phone, no connection needed. To keep the system compact, it is designed specifically to translate conversation relating to ordering in a restaurant. That is harder than it may sound, given that dishes often don’t translate well. For instance, “the ingredients for paella vary depending on the country, the region, and the season,” Boutin notes. “It can take 10 words to translate a dish that has a one-word name in another language.”
But with data connections getting better all the time, the statistical approach is gaining favor, and a number of schemes are being rolled out to improve statistical techniques. One company is setting up a system to tap into the translational skills of the suckers who do stuff online for little or no money, or as leading-edge businesses are careful to call it, crowdsourcing.
Alex Buran, CEO of Translation Services USA in New York, got the brainstorm a year ago to invite people visiting his corporate website to edit or approve small chunks of text processed by the company’s machine-translation engine. If two or more other visitors approve one of those chunks, it enters the engine’s database as the new preferred translation. The engine gets better, and the winning translators get an award of 10 cents or more. (Average income for serial translators is about $11.50 an hour.) “We believe we’ll make the machine results so accurate that it will displace human translators in the end,” Buran says, adding that some 7,500 people are hard at it today.
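The approval rule described above is simple enough to sketch in Python. Everything here is an assumption made for illustration: the `ChunkReview` class, its method names, and the way votes are stored are hypothetical, and nothing is known from the article about how the company's engine actually works. The only part taken from the text is the threshold: an edit becomes the preferred translation once two or more other visitors approve it.

```python
APPROVALS_NEEDED = 2  # from the article: "two or more other visitors"


class ChunkReview:
    """Hypothetical record for one small chunk of machine-translated text."""

    def __init__(self, source, machine_translation):
        self.source = source
        self.preferred = machine_translation  # starts as the engine's own output
        self.proposals = {}  # proposed text -> (author, set of approvers)

    def propose(self, author, text):
        # Register a visitor's suggested edit; the first proposer keeps the credit.
        self.proposals.setdefault(text, (author, set()))

    def approve(self, visitor, text):
        author, approvers = self.proposals[text]
        # Only *other* visitors count; authors cannot approve their own edit.
        if visitor != author:
            approvers.add(visitor)
        if len(approvers) >= APPROVALS_NEEDED:
            self.preferred = text
        return self.preferred


review = ChunkReview("good wine", "vin bon")
review.propose("ann", "bon vin")
review.approve("bob", "bon vin")
print(review.approve("cy", "bon vin"))
```

Using a set for approvers means a single enthusiastic visitor clicking twice still counts once, which seems like the least a 10-cent reward scheme would want.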
Still, the gap between machine translators and their human counterparts may never be completely closed. “These programs can’t read body language or tone of voice, or deal with new colloquialisms or unusual variations in dialect,” says Craig Schlenoff, a mechanical engineer at the National Institute of Standards and Technology who has evaluated translation software used by U.S. soldiers in Afghanistan for “knock and talks”—visits to local residents for gathering intelligence and improving relations. And Bowen Zhou, who heads the speech-to-speech translation research group at IBM, points out that there are situations demanding subtleties of translation unlikely to be mastered by a machine anytime soon. “Speeches by heads of state require a specific play of words to carry the intended emotion and message,” he says. “And legal documents contain careful constructions with specifically tailored meanings.”
Personally, I’m relieved to hear it. Because that means my imperfect translating cell phone really is just like the Star Trek version. As Kirk himself said of the Universal Translator: “Not 100 percent efficient, of course. But nothing ever is.”