OpenAI introduces AI technology that reproduces human voices
OpenAI, like several other companies, has created a new type of AI technology capable of generating synthetic voices rapidly and easily.
Initially, OpenAI released a tool enabling users to generate digital images by describing them. Subsequently, it developed comparable technology for creating full-motion videos resembling those from Hollywood movies.
Now, OpenAI has revealed a technology capable of replicating a person’s voice. The prominent AI startup announced on Friday that a select group of businesses is testing a new OpenAI system called Voice Engine. This system can recreate a person’s voice from a 15-second recording. If you upload a recording of yourself along with a paragraph of text, the system can read the text aloud in a synthetic voice that mimics your own.
The text doesn’t need to be in your native language. For instance, if you’re an English speaker, the system can recreate your voice in Spanish, French, Chinese, or numerous other languages.
OpenAI is withholding the technology from broader distribution as it continues to evaluate its potential risks. Similar to image and video generators, a voice generator could facilitate the dissemination of misinformation on social media. It could also enable criminals to impersonate individuals online or during phone conversations.
The company expressed particular concern that such technology could be utilized to bypass voice authenticators that secure access to online banking accounts and other personal applications.
In an interview, OpenAI product manager Jeff Harris emphasized the sensitivity of this matter and the importance of handling it correctly.
The company is exploring methods to watermark synthetic voices or implement controls that restrict the use of the technology with the voices of politicians or other prominent figures.
In February, OpenAI adopted a similar approach with its video generator, Sora, by showcasing the technology but refraining from public release.
OpenAI is one of several companies that have developed a new type of AI technology capable of rapidly and easily generating synthetic voices.
Businesses can utilize these technologies to produce audiobooks, provide voices for online chatbots, or even create an automated radio station DJ. Since last year, OpenAI has employed its technology to enable a version of ChatGPT to speak. Additionally, it has long provided businesses with a variety of voices for similar applications, all constructed from recordings provided by voice actors.
However, the company has not yet released a public tool that would enable individuals and businesses to replicate voices from a brief recording, as Voice Engine does. According to Harris, the ability to replicate any voice in this manner is what renders the technology hazardous. He noted that the technology could be particularly risky in an election year.
In January, residents of New Hampshire received robocall messages that discouraged them from voting in the state primary. The voice in the calls was likely artificially generated to resemble President Joe Biden. The Federal Communications Commission later banned such calls.
Harris stated that OpenAI has no immediate plans to profit from the technology. He mentioned that the tool could be especially beneficial to individuals who have lost their voices due to illness or injury.
He showcased how the technology had been utilized to recreate the voice of a woman who had lost hers due to brain cancer. According to Harris, she was now able to speak after providing a brief recording of a presentation she had delivered during high school.