OpenAI has developed a groundbreaking tool called Voice Engine, capable of creating highly convincing voice clones from just 15 seconds of audio. However, citing concerns that the technology could be misused to spread misinformation, particularly in a year of major elections around the world, the AI lab has opted not to release the tool widely. Instead, OpenAI is taking a cautious approach, evaluating the technology's impact through limited, responsible deployment.
“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” OpenAI stated in a blog post. The company said it intends to use these discussions and small-scale tests to guide its decision on whether, and how, to make the technology more broadly available.
OpenAI revealed that Voice Engine, first developed in 2022 and integrated into ChatGPT’s text-to-speech feature, has been used selectively by a small group of partners. These collaborations highlight the tool’s beneficial applications across various sectors. The education technology firm Age of Learning, for instance, uses it to produce scripted voiceovers, while the “AI visual storytelling” app HeyGen lets users generate fluent translations of recordings that preserve the original speaker’s accent and vocal nuances, such as producing English speech with a French accent from a French speaker’s audio sample.
A particularly poignant application involved researchers from the Norman Prince Neurosciences Institute using Voice Engine to “restore the voice” of a young woman who lost it due to a vascular brain tumor, working from a low-quality 15-second audio clip of her presentation.
Despite these promising uses, OpenAI has decided to “preview but not widely release this technology at this time.” The organization aims to “bolster societal resilience against the challenges brought by ever more convincing generative models.” As part of its immediate recommendations, OpenAI suggests moving away from voice-based authentication for securing access to bank accounts and other sensitive services.
Furthermore, OpenAI advocates for the development of “policies to protect the use of individuals’ voices in AI” and emphasizes the importance of “educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content.”
To support accountability, Voice Engine’s outputs are watermarked, so any generated audio can be traced back to its source. OpenAI also requires its partners to obtain “explicit and informed consent from the original speaker” and prohibits individual users from creating personalized voice clones.
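OpenAI has not described how its watermark actually works. Purely for illustration, the sketch below shows one common approach to audio provenance, a spread-spectrum watermark: a low-amplitude pseudo-random pattern derived from a secret key is mixed into the waveform and later detected by correlating against that same pattern. The key, watermark strength, and detection threshold here are invented for the example and should not be taken as OpenAI's method.

```python
# Illustrative spread-spectrum audio watermark: embed a keyed, low-amplitude
# pseudo-random pattern, then detect it by correlating against the same pattern.
# The key, strength, and threshold are hypothetical values chosen for this sketch.
import numpy as np

SECRET_KEY = 1234           # hypothetical key held by the provenance checker
WATERMARK_STRENGTH = 0.005  # kept small so the pattern stays effectively inaudible

def embed_watermark(audio: np.ndarray, key: int = SECRET_KEY) -> np.ndarray:
    """Mix a keyed pseudo-random pattern into the waveform."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape[0])
    return audio + WATERMARK_STRENGTH * pattern

def detect_watermark(audio: np.ndarray, key: int = SECRET_KEY,
                     threshold: float = 4.0) -> bool:
    """Correlate the waveform with the keyed pattern; a large score means the mark is present."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape[0])
    # For unmarked audio this score behaves like standard-normal noise; for marked
    # audio it grows with clip length, so a fixed threshold separates the two cases.
    score = float(np.dot(audio, pattern) / (np.linalg.norm(audio) + 1e-12))
    return score > threshold

# Usage: a 5-second synthetic tone stands in for a generated voice clip.
sample_rate = 16_000
t = np.arange(sample_rate * 5) / sample_rate
clip = 0.1 * np.sin(2 * np.pi * 220 * t)

print(detect_watermark(embed_watermark(clip)))  # True  -> traceable to the generator
print(detect_watermark(clip))                   # False -> no provenance mark found
```

Real provenance schemes are far more robust than this toy, surviving compression, re-recording, and editing, but the basic idea of a keyed, statistically detectable signature is the same.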
While OpenAI’s Voice Engine is notable for its efficiency and minimal audio sample requirements, other companies such as ElevenLabs offer similar capabilities, allowing voice clones to be created from just a few minutes of audio. To address potential abuses, ElevenLabs has implemented a “no-go voices” safeguard that blocks the generation of voice clones of political figures actively involved in significant electoral campaigns in the US and the UK.