AI-Generated Voice Firm Clamps Down After 4chan Makes Celebrity Voices for Abuse

bnew

Veteran
Joined
Nov 1, 2015
Messages
44,961
Reputation
7,413
Daps
135,808

Imran Khan deploys AI clone to campaign from behind bars in Pakistan


PTI party uses ‘voice clone’ of imprisoned opposition leader to give impassioned speech in ‘virtual rally’

Agence France-Presse in Islamabad

Mon 18 Dec 2023 06.07 EST




Artificial intelligence allowed Pakistan’s former prime minister Imran Khan to campaign from behind bars on Monday, with a voice clone of the opposition leader giving an impassioned speech on his behalf.

Khan has been locked up since August and is being tried for leaking classified documents, allegations he says have been trumped up to stop him contesting general elections due in February.

His Pakistan Tehreek-e-Insaf (PTI) party used artificial intelligence to make a four-minute message from the 71-year-old, headlining a “virtual rally” hosted on social media overnight from Sunday into Monday, despite internet disruptions that the monitor NetBlocks said were consistent with previous attempts to censor Khan.

PTI said Khan sent a shorthand script through lawyers that was fleshed out into his rhetorical style. The text was then dubbed into audio using a tool from the AI firm ElevenLabs, which boasts the ability to create a “voice clone” from existing speech samples.

“My fellow Pakistanis, I would first like to praise the social media team for this historic attempt,” the voice mimicking Khan said. “Maybe you all are wondering how I am doing in jail,” the stilted voice adds. “Today, my determination for real freedom is very strong.”

The audio was broadcast at the end of a five-hour live-stream of speeches by PTI supporters on Facebook, X and YouTube, and was overlaid with historic footage of Khan and still images.

It was bookended with genuine video clips from the onetime cricket star’s former speeches, according to PTI, but a caption appeared at intervals flagging it as the “AI voice of Imran Khan based on his notes”.

“This was a no-brainer for us, when Imran Khan is no longer there to actually meet at a political rally,” said the US-based PTI social media chief Jibran Ilyas. “It was to get over the suppression.”

The PTI was the first political party in Pakistan to widely harness the potential of social media, using apps to target younger audiences who carried them to power five years ago.

“We wanted to get in election mode,” Ilyas told AFP. “No PTI political rally is complete without Imran Khan.”

State censors banned Khan from the airwaves earlier this year after his brief arrest in May sparked riots.

NetBlocks said social media was restricted for seven hours starting late on Sunday in an incident “consistent with previous instances of internet censorship” targeting Khan.

Nonetheless, the virtual rally was viewed by more than 4.5 million people across Facebook, X and YouTube.

“It wasn’t very convincing,” said the 38-year-old business manager Syed Muhammad Ashar in the eastern city of Lahore. “The grammar was strange too. But I will give them marks for trying.”

“Frankly, nothing can replace a real rally and a real speech.”

Hussain Javed Afroze, a media worker, praised the digitally delivered oration. “No other party uses technology like PTI does,” the 42-year-old said. “These are new tools, so I think it’s a positive thing to use them.”

Analysts have long warned that bad actors may use artificial intelligence to impersonate leaders and sow disinformation, but far less has been said about how the technology may be used to skirt state suppression.

Khan, a hugely popular figure, was ousted last year after falling out with Pakistan’s military leaders, who analysts agree influenced his rise to power in 2018.

In the aftermath, he led an unprecedented campaign of defiance, accusing top brass of conspiring with the US to eject him and saying senior officers plotted an assassination attempt that left him wounded.

After supporters rioted following his arrest in May, the PTI has been targeted in a huge crackdown by the military establishment, which has directly ruled Pakistan for more than half of its history.

Pakistan’s election commission confirmed on Friday that elections will be held on 8 February.

While behind bars, Khan was replaced as leader of the PTI, but he remains the party’s figurehead.
 

bnew


Voice cloning startup ElevenLabs lands $80M, achieves unicorn status

Kyle Wiggers @kyle_l_wiggers / 3:01 AM EST•January 22, 2024


Image Credits: Bryce Durbin/TechCrunch

There’s a lot of money in voice cloning.

Case in point: ElevenLabs, a startup developing AI-powered tools to create and edit synthetic voices, today announced that it closed an $80 million Series B round co-led by prominent investors including Andreessen Horowitz, former GitHub CEO Nat Friedman and entrepreneur Daniel Gross.

The round, which also had participation from Sequoia Capital, Smash Capital, SV Angel, BroadLight Capital and Credo Ventures, brings ElevenLabs’ total raised to $101 million and values the company at over $1 billion (up from ~$100 million last June). CEO Mati Staniszewski says the new cash will be put toward product development, expanding ElevenLabs’ infrastructure and team, AI research and “enhancing safety measures to ensure responsible and ethical development of AI technology.”

“We raised the new money to cement ElevenLabs’ position as the global leader in voice AI research and product deployment,” Staniszewski told TechCrunch in an email interview.

Co-founded in 2022 by Piotr Dabkowski, an ex-Google machine learning engineer, and Staniszewski, a former Palantir deployment strategist, ElevenLabs launched in beta around a year ago. Staniszewski says that he and Dabkowski, who grew up in Poland, were inspired to create voice cloning tools by poorly dubbed American films. AI could do better, they thought.

Today, ElevenLabs is perhaps best known for its browser-based speech generation app that can create lifelike voices with adjustable toggles for intonation, emotion, cadence and other key vocal characteristics. For free, users can enter text and get a recording of that text read aloud by one of several default voices. Paying customers can upload voice samples to craft new styles using ElevenLabs’ voice cloning.

Increasingly, ElevenLabs is investing in versions of its speech-generating tech aimed at creating audiobooks and dubbing films and TV shows, as well as generating character voices for games and marketing activations.

Last year, the company released a “speech to speech” tool that attempts to preserve a speaker’s voice, prosody and intonation while automatically removing background noise, and — in the case of movies and TV shows — translates and synchronizes speech with the source material. On the roadmap for the coming weeks is a new dubbing studio workflow with tools to generate and edit transcripts and translations and a subscription-based mobile app that narrates webpages and text using ElevenLabs voices.

ElevenLabs’ innovations have won the startup customers in Paradox Interactive, the game developer whose recent projects include Cities: Skylines 2 and Stellaris, and The Washington Post, among other publishing, media and entertainment companies. Staniszewski claims that ElevenLabs users have generated the equivalent of more than 100 years of audio and that the platform is being used by employees at 41% of Fortune 500 companies.

But the publicity hasn’t been totally positive.

The infamous message board 4chan, known for its conspiratorial content, used ElevenLabs’ tools to share hateful messages mimicking celebrities like actress Emma Watson. The Verge’s James Vincent was able to use ElevenLabs to clone voices maliciously in a matter of seconds, generating samples containing everything from threats of violence to racist and transphobic remarks. And over at Vice, reporter Joseph Cox documented generating a clone convincing enough to fool a bank’s authentication system.

In response, ElevenLabs has attempted to root out users who repeatedly violate its terms of service, which prohibit abuse, and has rolled out a tool to detect speech created by its platform. This year, ElevenLabs plans to improve the detection tool so it can flag audio from other voice-generating AI models, and to partner with unnamed “distribution players” to make the tool available on third-party platforms, Staniszewski says.

ElevenLabs

ElevenLabs offers an array of different voices, some synthetic, some cloned from voice actors.

ElevenLabs has also faced criticism from voice actors who claim that the company uses samples of their voices without their consent, samples that could be leveraged to promote content they don’t endorse or to spread misinformation and disinformation. In a recent Vice article, victims recount how ElevenLabs was used in harassment campaigns against them; in one case, a cloned voice was used to share an actor’s private information, namely their home address.

Then there’s the elephant in the room: the existential threat platforms like ElevenLabs pose to the voice acting industry.

Motherboard writes about how voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them — sometimes without commensurate compensation. The fear is that voice work — particularly cheap, entry-level work — will eventually be replaced by AI-generated vocals, and that actors will have no recourse.

Some platforms are trying to strike a balance. Earlier this month, Replica Studios, an ElevenLabs competitor, signed a deal with SAG-AFTRA to create and license digital replicas of the media artists union’s members’ voices. In a press release, the organizations said that the arrangement established “fair” and “ethical” terms and conditions to ensure performer consent and to govern how uses of digital voice doubles in new works are negotiated.

Even this didn’t please some voice actors, however — including SAG-AFTRA’s own members.

ElevenLabs’ solution is a marketplace for voices. Currently in alpha and set to become more widely available in the next several weeks, the marketplace allows users to create a voice, verify and share it. When others use a voice, the original creators receive compensation, Staniszewski says.

“Users always retain control over their voice’s availability and compensation terms,” he added. “The marketplace is designed as a step towards harmonizing AI advancements with established industry practices, while also bringing a diverse set of voices to ElevenLabs’ platform.”

Voice actors may take issue with the fact that ElevenLabs isn’t paying in cash, though — at least not at present. The current setup has creators receiving credit toward ElevenLabs’ premium services (which some find ironic, I’d wager).

Perhaps that’ll change in the future as ElevenLabs, which is now among the best-funded synthetic voice startups, attempts to beat back upstart competition like Papercup, Deepdub, Acapela, Respeecher and Voice.ai, as well as Big Tech incumbents such as Amazon, Microsoft and Google. In any case, ElevenLabs, which plans to grow its headcount from 40 people to 100 by the end of the year, intends to stick around and make waves in the fast-growing synthetic voice market.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
44,961
Reputation
7,413
Daps
135,808




1/1
We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.








1/6
OpenAI Custom Voice Engine ~ found a form to apply for the voice engine api, also found fragments of the code that'll be used to showcase the demo in an upcoming blog post.

my understanding is that voice engine is capable of much more realistic and natural sounding voices

2/6
here's the partial code for the demo:
https://openai.com/_nuxt/Demo.de08f90c.js

3/6
very possible, yes

4/6
with the current info available, yeah

See Image 4 - mentions voice actors so maybe we get something akin to the best of ElevenLabs

5/6
the issue is that not all the forms have an entry point in the same loc

for example - openai[.]com/form/trademark-dispute is available under the /form entry point, whereas the report form can only be accessed in the chat interface of the GPT you'd want to report - which is fine…

6/6
seems so, the form mentions "voice actors" so hopefully it'll be really good for what it is

sam included "better voice mode" on his acknowledged requests for this year so maybe this is it

the name isn't too flashy but the trademark info is the catalyst behind the hype I assume







1/6
BREAKING NEWS:
OpenAI just released Voice Engine,
Provide text as input and a 15-second audio sample to copy the voice of the original speaker.

It sounds incredibly similar

Follow the

2/6
The use cases are endless.

For example, you can use this for multiple language translations.

As you can hear, the voice closely matches the reference audio no matter what language it's in.

3/6
Here is the official announcement from @OpenAI:

4/6
We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. Navigating the Challenges and Opportunities of Synthetic Voices

5/6
Another mind-blowing fact?

OpenAI actually built Voice Engine way back in late 2022. As always, they're many steps ahead of any company out there.

6/6
Thanks for reading! If you enjoyed this thread:

1. Please Like & RT.
2. Follow me @godofprompt for more AI tips & tricks.
3. Get my FREE Prompt Engineering Guide:



OpenAI reveals Voice Engine, but won’t yet publicly release the risky AI voice-cloning technology

FILE - The OpenAI logo is seen on a mobile phone in front of a computer screen which displays output from ChatGPT, March 21, 2023, in Boston. A wave of AI deepfakes tied to elections in Europe and Asia has coursed through social media for months, serving as a warning for more than 50 countries heading to the polls this year. (AP Photo/Michael Dwyer, File)


Updated 5:39 PM EDT, March 29, 2024


SAN FRANCISCO (AP) — ChatGPT-maker OpenAI is getting into the voice assistant business and showing off new technology that can clone a person’s voice, but says it won’t yet release it publicly due to safety concerns.

The artificial intelligence company unveiled its new Voice Engine technology Friday, just over a week after filing a trademark application for the name. The company claims that it can recreate a person’s voice from just 15 seconds of audio of that person talking.

OpenAI says it plans to preview it with early testers “but not widely release this technology at this time” because of the dangers of misuse.

“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the San Francisco company said in a statement.

In New Hampshire, authorities are investigating robocalls sent to thousands of voters just before the presidential primary that featured an AI-generated voice mimicking President Joe Biden.

A number of startup companies already sell voice-cloning technology, some of it accessible to the public and some only to select business customers such as entertainment studios.

OpenAI says early Voice Engine testers have agreed to not impersonate a person without their consent and to disclose that the voices are AI-generated. The company, best known for its chatbot and the image-generator DALL-E, took a similar approach in announcing but not widely releasing its video-generator Sora.

However, a trademark application filed on March 19 suggests that OpenAI aims to get into the business of speech recognition and digital voice assistants. Improving such technology could eventually help OpenAI compete with other voice products such as Amazon’s Alexa.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
44,961
Reputation
7,413
Daps
135,808


1/2
In January, we announced Dubbing Studio, an advanced workflow that gives you hands-on control over transcript, translation, and timing when dubbing your content. Creators and businesses use Dubbing Studio to localize podcasts, commercials, short films, and more.

This week, we added four new features to streamline your workflow:

(1) Trim Tool: Trim a generated clip to remove sections that don't sound right.
(2) Foreground track: Use the Foreground track to import laughter, singing, and any dialogue from the original audio that you don’t want dubbed.
(3) Clip Looping: Loop the player on a portion of the track you’re working on.
(4) Clip History: Compare & choose from any of the last 10 generations of a given dialogue clip in the Clip History.

We also optimized dubbing rendering so it's now 10x faster to export.

2/2
If you want to use the voice we used to voice over this update, look for “Brian” in the Text to Speech drop-down.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
44,961
Reputation
7,413
Daps
135,808

ElevenLabs previews music-generating AI model

Ken Yeung @thekenyeung

May 9, 2024 12:38 PM

AI-generated image of audiowaves surrounded by musical instruments.




Voice AI startup ElevenLabs is offering an early look at a new model that turns a prompt into song lyrics. To raise awareness, it’s borrowing the playbook Sam Altman used when OpenAI introduced Sora, its video-generating AI: soliciting ideas on social media and turning them into lyrics.

Founded by former Google and Palantir employees, ElevenLabs specializes in using machine learning (ML) for voice cloning and synthesis in different languages. It offers many tools, including one capable of dubbing full-length movies. Unsurprisingly, the company has set its sights on the music industry.



Imagine the possibilities of using this model: generate a fun lullaby to help your kids fall asleep, produce a clever jingle for a marketing campaign, develop a snappy music intro for your podcast and more. Could someone use ElevenLabs’ AI to create the next hit song? Many AI music startups are already popping up, including Harmonai, Lyrical Labs, Suno AI, Loudly and more.

It’s also feasible that users could sell these AI-generated songs on the ElevenLabs marketplace, which it launched in January. The company’s Voice Library currently allows users to sell their AI-cloned voice for money while maintaining control over its availability and how they’re compensated.



However, AI music generation isn’t welcomed by all. As with all generative AI applications, the question is what ElevenLabs trained this model on and whether that included copyrighted material. If so, did it obtain permission from the rights holders, or does it believe training without permission is protected by fair use? Some oppose the development of such technology because artists may find themselves out of a job. The concern is that AI will easily be able to replicate a particular artist’s style, at which point you no longer need the artist to put out new music. They don’t want to do that Christmas album? No problem. Just use AI for that. And let’s not forget the possibility of this being used to produce deepfakes.

VentureBeat has contacted ElevenLabs for additional comment on its music model and will update this post if we hear back. We don’t know the maximum song length it can produce, but based on the example the company’s Head of Design Ammaar Reshi posted on X, it’s likely the AI will generate lyrics for a three-minute piece.
 