Human voiceover work is ubiquitous across modern media, from video games to television and movies. But increasingly, the voices you hear on-screen aren’t entirely human-made; they’re the result of artificial intelligence.
Respeecher, a voice cloning company founded in 2018 and based in Ukraine, is currently working with Lucasfilm to provide voice services for Star Wars projects. Respeecher’s speech-to-speech technology is responsible for synthesizing the voice of a younger Luke Skywalker in both The Mandalorian and The Book of Boba Fett, as well as restoring James Earl Jones’ iconic Darth Vader voice to its original quality in the Obi-Wan Kenobi series.
The company also digitally recreated the voice of the late NFL coach Vince Lombardi for a 2021 Super Bowl commercial and helped make possible an Aloe Blacc tribute to Avicii, in which Blacc sings in multiple languages — some of which he doesn’t actually speak.
How It Works
Dmytro Bielievtsov, co-founder and chief technology officer at Respeecher, says the process begins with recordings of an actual human’s voice, known as the target voice. These recordings, usually an hour or two in total, are fed into the company’s AI software and analyzed until the voice can be cloned.
Next, testing ensures that the cloned voice can’t be distinguished from the original. The clone is then applied to the performance of a human “source speaker” (an actor reading lines for whatever project is being produced). The result is synthetic speech that carries the emotion, intonation and nuance of a real human voice, beyond what robotic-sounding text-to-speech programs can offer.
“In other words,” Bielievtsov says, “you talk into a microphone and the tech can make you sound exactly like a young Luke Skywalker.” In the case of The Mandalorian, the company captured actor Mark Hamill’s younger target voice by analyzing old interviews, voice recording dubs and automated dialogue replacements, the latter of which are post-production tracks used to improve an actor’s dialogue.
Respeecher also has a voice marketplace on its website; this allows clients to pick out what voices they’d like to use for their projects, whether they’re making a television commercial, an audiobook or some other form of content.
The company is currently working on real-time voice conversion, which transforms a person’s voice as they speak. Bielievtsov says the present system sacrifices some quality in favor of speed and is so far being used only in limited capacities, but its potential applications are inspiring. In healthcare, he explains, the technology could help people whose voices have been impaired by procedures like laryngectomies, allowing them to once again “speak” with their natural voice.
Yet YouTube videos of what Respeecher’s technology can do to a person’s voice may provoke an uncanny valley response in some people. The reveal that filmmaker Morgan Neville digitally recreated the late Anthony Bourdain’s voice in the documentary Roadrunner, for example — voicing several lines Bourdain wrote but never actually spoke — generated significant controversy.
Similarly, the Emmy Award-winning 2020 short film In Event of Moon Disaster, produced by MIT’s Center for Advanced Virtuality to explore deepfake technologies, drew on Respeecher’s audio work. The film featured a deepfaked Richard Nixon delivering the speech that had been prepared in case the Apollo 11 astronauts never made it back to Earth. Nixon, of course, never actually said these words. But in this alternate history, his deepfaked speech rewrites reality.
It’s not hard to imagine what the technology might look like in the wrong hands. Yet Bielievtsov says Respeecher takes the ethics and safety concerns of its technology very seriously.
“We achieve ethical use of synthetic voices by requiring permissions to clone voices and limit the ability to copy anyone’s voice at Voice Marketplace,” he says, adding that the company is developing two technical defenses for its technology: a synthetic speech detector and audio watermarking.
Way of the Future?
Bielievtsov sees the future of AI voice replication as having widespread applications across many fields. Some of those applications are already yielding great results.
For example, English actor Michael York (whom many know as Basil Exposition in the Austin Powers franchise) suffers from the rare disease amyloidosis. In recent years, speech has been difficult for him due to tongue swelling, one of the disorder’s symptoms.
When tasked with recording new narration for an animated medical film he’d narrated several years prior, York found his voice was not what it once was. Fortunately, AI technology from Respeecher helped match York’s target voice using data from the prior recording session, successfully allowing the film to be updated.
Bielievtsov believes voice cloning for cinematography, gaming, streaming and content creation is likely to increase in the coming years. Even call centers can now use it.
“Our team wants to democratize the technology, so that smaller film and TV studios and video game developers can use it to stretch their budgets further,” he says. “We want small creators to compete with huge studios with their ideas, implementation and creativity, but not with budgets.”