Stars from Hollywood's Golden Age are being resurrected through AI voice cloning deals with celebrity estates, suggesting that new business models are addressing some of the “Wild West” concerns about unauthorized AI impersonation.
ElevenLabs, an audio tech startup backed by venture capitalists including Andreessen Horowitz and Sequoia, has inked several deals with the estates of legendary actors, including Burt Reynolds, Judy Garland, James Dean and Sir Laurence Olivier, for its IconicVoices tool, which lets users listen to AI-generated voices read to them through an audiobook app.
Founded in 2023, ElevenLabs creates voices for books, news articles, video game characters, film pre-production, social media and advertising. The company has already worked with publishers such as The New York Times and The Washington Post, and was selected to participate in Disney's accelerator program earlier this year.
“It takes about 30 minutes of high-quality audio to create a professional-quality voice clone,” says Sam Sklar, a member of ElevenLabs' growth team. The voices are generated from a catalog of celebrities. Once created, they can be used to read text such as articles, PDFs, ePubs and newsletters. However, neither the audio nor the content can be exported, and everything is heard within the text-to-speech app.
For example, a user can listen to James Dean reading an article within the app, but they won't have access to audio for content that isn't already in the app.
Such deals could help set the boundaries for a future in which AI-generated audio content becomes a more controlled and curated domain without causing controversy. Google Play and Apple Books already use some AI-generated voices, but replicating the pace, intonation and emotion of a human voice is a high bar to clear.
The AI industry has been plagued by concerns over the use of celebrity voices. In May, OpenAI reversed course after actress Scarlett Johansson accused the company of stealing her voice after she had rejected a licensing offer.
“We are fully aware of the risks associated with synthetic media and take the safe use of our tools very seriously,” Sklar said. Safeguards include proactive content moderation, enforceable accountability through bans, and special provisions to limit the impact of AI voices on the 2024 election.
There remains significant unease among the current generation of actors surrounding the use of AI in generating voice content. Video game voice actors have expressed concerns, and last year's film and TV strikes were largely due to fears over the use of AI. The use of iconic voices sold by estates is a niche market that may be able to avoid these pitfalls, and represents a new revenue stream from AI rather than one that has been lost to it.
The issue of celebrity voice impersonation predates AI. Examples include Frito-Lay's use of a Tom Waits impersonator in its ads in 1988 and another Waits case in 2007, even though Waits himself had long rejected ad deals. AI offers an easier path to creating voice impersonators; a recent lawsuit against AI startup Lovo, which alleges that it used voice actors' work improperly and without payment to generate AI voices, is a reminder that the world of AI voice generation is likely to remain somewhat complicated and litigious. (Lovo denies the allegations in the lawsuit and points to the revenue-sharing model it offers actors for its cloned voices.)
Steve Cohen, a partner at the law firm Pollock & Cohen who is representing voice actors in a separate lawsuit alleging their voices were copied without permission, said it was difficult to assess the protections in place without seeing the specific language in the IconicVoices contracts.
ElevenLabs points out how its IconicVoices tool obtains permission and manages the use of audio.
“Giving yourself permission to express your opinion is one of the fundamentals,” Cohen said. “I think the keys are permission, compensation and control.”
Clearer new laws might also act as a deterrent to those seeking to misappropriate voices — “for the exceptions, not the hardened bad guys,” Cohen said — but he added, quoting Bette Davis' line from All About Eve: “'Fasten your seat belt. You're going to be in for a wild ride.'”
The question of how realistic the cloned voices will sound is also still evolving. Many experts say the AI doesn't “know” what it's saying, limiting the quality of its performance. Sklar said ElevenLabs' latest level of voice quality is indistinguishable from real human speech. “ElevenLabs' text-to-speech tool can understand the context of words,” he said.
An AI model is only as good as the data used to train it, and actor audio datasets are part of that process.
“Neural models work by mimicking and memorizing the nuances and patterns present in the training data,” says Nauman Dawalatabad, a postdoctoral researcher at the MIT Computer Science and Artificial Intelligence Laboratory who has done extensive research on AI speech generation. “The quality and diversity of the training data has a significant impact on the model's performance.”
Movie star singing voices could help AI mimic and learn by providing “high-quality audio datasets for training and fine-tuning large-scale models,” which Dawalatabad said is essential to the process. But he expressed concern about whether “sounding human” is an appropriate test for the AI voice field, as it could intensify the divide between human voices and synthetic voices.
Voice actors are divided on the technology, with some refusing to consider it in their contracts, while others say they can't ignore the opportunity to replicate their voices to produce some audiobooks faster and cheaper. “AI technology can help with workflow. AI is not a new tool for voice actors, producers and publishers; many of them are using AI to improve quality control in post-production,” Michelle Cobb, executive director of the Audio Publishers Association, told CNBC last year.
According to Dawalatabad, recent generative models have made significant advances over their predecessors, making it increasingly difficult to distinguish fake voices from real ones by ear alone. AI voice licensing can ease the burden on voice actors, but it cannot replace them, he added, because “AI steps into the process by focusing on correcting and enhancing ineffable aspects such as intonation, warmth, and emphasis that remain a challenge.”