
Microsoft's voice-recognition tech is now better than even teams of humans at transcribing conversations

Aug 22, 2017, 00:08 IST

Microsoft Distinguished Engineer Xuedong Huang (Photo: Microsoft)

In October 2016, in a big milestone for artificial intelligence, Microsoft unveiled a system that can transcribe the contents of a phone call as well as or better than human professionals.


But while Microsoft's system had fewer transcription errors than the average human transcriptionist, it still couldn't best a team of trained humans. So, the world of academia fired back with a new challenge: Lower the error rate to below what human teams can do.

Now Microsoft has done just that. In a blog entry on Sunday, Xuedong Huang, Microsoft Research's chief speech scientist, reported that the company had broken even that barrier.

It's a major milestone, Huang wrote, and it gives the company a sound foundation to move from mere transcription to understanding the meaning of what's being said. Speech recognition is a fundamental building block for more robust artificial intelligence.

"Moving from recognizing to understanding speech is the next major frontier for speech technology," Huang wrote.


Microsoft's voice recognition system has been improving rapidly. Transcription accuracy is judged by error rate: the share of words a system gets wrong in a given recording of speech. That error rate is determined using Switchboard, a standard test for voice transcription accuracy widely used in the industry, including by IBM and Google.
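For readers who want to see how such an error rate is typically computed, here is a minimal sketch in Python. It assumes the common word-error-rate definition (substituted, dropped, and extra words, divided by the number of words in the reference transcript). The function name `word_error_rate` and the sample sentences are illustrative only, and this is not necessarily the exact scoring procedure Microsoft or the Switchboard benchmark uses.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: whitespace-separated tokens, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one wrong word and one dropped word out of six.
rate = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{rate:.1%}")
```

On this scale, an error rate of 5.1% corresponds to roughly one mistake in every 20 words.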

As recently as September 2016, Microsoft's error rate, according to Switchboard, was 6.3%, which means that out of every 100 words the system was getting more than 6 wrong. By comparison, a single human transcriptionist has an average error rate of 5.9%, and a team of trained humans clocks in with an error rate of around 5.1%.

Microsoft matched the former error rate in October 2016 and has just beaten the latter.

That's far sooner than the company expected. Indeed, back in 2015, Huang himself told Business Insider that building a system capable of surpassing a human at transcription was "four to five years away." Less than two years later, we're well past that point.

Still, challenges remain. Microsoft's transcription system is patterned after the audio coming from a nice, stable landline telephone, Geoffrey Zweig, formerly a principal researcher at the company, told Business Insider last October. The next frontier for voice recognition is to accurately transcribe speech even when it's coming over a lousy cell connection or an echoing McDonald's drive-thru speaker.


Speech science "still has many challenges to address, such as achieving human levels of recognition in noisy environments with distant microphones, in recognizing accented speech, or speaking styles and languages for which only limited training data is available," Huang wrote in his blog post on Sunday.
