
Microsoft has created an AI that it thinks is "too dangerous" for public release

Jul 18, 2024, 10:54 IST
In a world where technological advancements are often heralded with great fanfare and widespread availability, Microsoft has taken an unusually cautious step. The tech giant has developed an artificial intelligence (AI) speech generator so convincing and advanced that it has decided to withhold it from public release.

VALL-E 2 is an AI marvel capable of mimicking human speech with uncanny accuracy, using just a few seconds of audio. Representing a significant leap in text-to-speech (TTS) technology, Microsoft’s researchers boast that it achieves "human parity" in generating speech — meaning its output is virtually indistinguishable from a human’s voice.

What makes the AI so believable?

This extraordinary capability rests on two key features. The first, “Repetition Aware Sampling”, keeps VALL-E 2 from producing monotonous speech by watching for repeated "tokens", the small discrete units of encoded audio that the model predicts one after another. This prevents the AI from getting stuck in a loop of sounds, so its speech flows more naturally.
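The article describes Repetition Aware Sampling only at a high level. The Python sketch below shows one plausible way such a sampler could work, assuming the model outputs a probability distribution over audio tokens at each step: a token is first drawn with nucleus (top-p) sampling, and if that token already fills too large a share of the recent output, the sampler falls back to plain random sampling from the full distribution to break the loop. The function names, look-back window, threshold and top-p value are all hypothetical placeholders, not Microsoft's code or settings.

```python
import random
from typing import List, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample from the smallest set of most likely tokens whose mass reaches top_p."""
    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for token in order:
        kept.append(token)
        mass += probs[token]
        if mass >= top_p:
            break
    # rng.choices does not require the kept weights to sum to 1.
    return rng.choices(kept, weights=[probs[t] for t in kept], k=1)[0]


def random_sample(probs: Sequence[float], rng: random.Random) -> int:
    """Sample a token id from the full, untruncated distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    rng: random.Random,
    top_p: float = 0.8,      # hypothetical value
    window: int = 10,        # hypothetical look-back length
    threshold: float = 0.1,  # hypothetical repetition threshold
) -> int:
    """Illustrative repetition-aware sampler (an assumed design, not Microsoft's code)."""
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history)[-window:]
    # Share of the look-back window already occupied by this token.
    repetition_ratio = recent.count(token) / window
    if repetition_ratio > threshold:
        # Output looks like it is looping: resample from the full distribution.
        token = random_sample(probs, rng)
    return token
```

For example, with `probs = [0.9, 0.05, 0.05]` and an empty history, the nucleus step keeps only token 0, so the sampler returns 0; with a history of ten 0s, the repetition check fires and the next token is drawn from the full distribution, giving tokens 1 and 2 a chance to break the loop.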
Second, “Grouped Code Modeling” improves efficiency by shortening the sequence: it bundles codes into groups so the model handles fewer individual tokens in a single input sequence. This not only speeds up speech generation but also eases the difficulty of modelling long strings of sounds. According to the researchers, VALL-E 2 is the first voice AI to reach human parity in speech robustness, naturalness, and speaker similarity.
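The sequence-length saving behind Grouped Code Modeling can be illustrated with a small Python sketch. It assumes the speech is already encoded as a flat list of integer codes and that a fixed group size G bundles G consecutive codes into one step, so a sequence of T codes becomes ceil(T / G) steps. The helper names, padding value and group size are hypothetical, and the sketch does not show how a real model would embed or predict each group.

```python
from typing import List, Sequence, Tuple


def group_codes(codes: Sequence[int], group_size: int, pad: int = 0) -> List[Tuple[int, ...]]:
    """Bundle consecutive codes into fixed-size groups (hypothetical helper)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Pad the tail so the total length is a multiple of group_size.
    padded = list(codes) + [pad] * (-len(codes) % group_size)
    return [tuple(padded[i:i + group_size]) for i in range(0, len(padded), group_size)]


def ungroup_codes(groups: Sequence[Tuple[int, ...]], original_length: int) -> List[int]:
    """Flatten groups back into a code sequence and drop any padding."""
    flat = [code for group in groups for code in group]
    return flat[:original_length]
```

For instance, `group_codes(list(range(10)), group_size=2)` yields 5 groups instead of 10 individual codes, halving the number of steps the model would process, and `ungroup_codes` recovers the original list.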

Fears of misuse

While the potential applications of VALL-E 2 are vast — ranging from educational tools and entertainment to accessibility features and interactive voice response systems — Microsoft has opted to keep this technological marvel under wraps. The decision is driven by concerns over the potential misuse of such advanced voice cloning technology. The risks include the ability to spoof voice identification systems and impersonate individuals convincingly.
"VALL-E 2 is purely a research project. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public," the researchers stated. This cautious approach aligns with similar restrictions placed by other AI companies, such as OpenAI, on their voice technology.

Despite the decision to withhold VALL-E 2 from public release, Microsoft’s researchers remain optimistic about the future of AI speech technology. They envision practical applications where synthesised speech preserves the speaker's identity and can be used safely and ethically. Any future deployment of such technology, they emphasise, must include protocols ensuring that the speaker approves the use of their voice, as well as a robust model for detecting synthesised speech.


The findings of this research have been published in a pre-print paper.