New OpenAI voice generator needs just a 15-second clip to clone your voice

Published March 31st, 2024 - 08:49 GMT
Illustrative render of a Humanoid AI robot working at a radio station studio with OpenAI Logo (Shutterstock/ Edited by ALBAWABA)

ALBAWABA – OpenAI, the AI startup behind ChatGPT, has announced in a blog post a new digital voice generator dubbed “Voice Engine,” which it says can generate speech with an authentic tone from just a 15-second audio snippet.

A broader release of the application, which OpenAI says it created in 2022 and has been testing behind the scenes in a variety of products, is being held back because of the potential for abuse, particularly in an election year, the blog post reads.

Currently, Voice Engine is available in preview to a very limited number of OpenAI partners, including the storytelling platform HeyGen, the education company Age of Learning, the AI communication app Livox, and Dimagi and Lifespan from the health care industry.

“These small-scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI states, adding that it hopes to draw more people into the conversation about the potential widespread use of synthetic voices.

“Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale,” the AI pioneer notes.

In an interview with TechCrunch, Jeff Harris, who is working on Voice Engine, explained that the model was trained on a combination of licensed and openly accessible materials. The statement comes amid lawsuits accusing OpenAI of violating intellectual property laws by using unlicensed content to train its AI models.

OpenAI requires its partners to obtain the original speaker's "explicit and informed consent," to refrain from creating channels that allow individual users to produce their own voices, and to notify audiences that the voices they are hearing are AI-generated. Outputs from the model are also watermarked so that their origin can be traced when the need arises.
 
