Microsoft's voice-recognition tech is now better than even teams of humans at transcribing conversations
But while Microsoft's system had fewer transcription errors than the average human transcriptionist, it still couldn't best a team of trained humans. So, the world of academia fired back with a new challenge: Lower the error rate to below what human teams can do.
Now Microsoft has done just that. In a blog entry on Sunday, Xuedong Huang, Microsoft Research's chief speech scientist, reported that the company had broken even that barrier.
It's a major milestone, Huang wrote, and it gives the company a sound foundation to go from mere transcription to understanding the meaning of what's being said. Speech recognition is a fundamental building block for more robust artificial intelligence.
"Moving from recognizing to understanding speech is the next major frontier for speech technology," Huang wrote.
Microsoft's voice recognition system has been improving rapidly. Transcription accuracy is judged by error rate, i.e., the proportion of words a system gets wrong in a given recording of speech. That error rate is measured on Switchboard, a standard test for voice transcription accuracy widely used in the industry, including by IBM and Google.
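For readers who want to see how such an error rate can be computed, here is a minimal sketch in Python. It assumes the common "word error rate" definition (substitutions, deletions, and insertions divided by the number of words in the reference transcript); the article does not spell out the exact scoring rules used on Switchboard, and the function name `word_error_rate` is purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: errors are counted as substitutions, deletions, and
    insertions, which is the usual convention but is not confirmed by
    the article. Words are compared case-insensitively.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "the cat sat on the mat"
    guess = "the cat sat on a mat"
    print(f"{word_error_rate(truth, guess):.1%}")
```

Under this definition, a system that gets 6 words wrong in a 100-word reference transcript scores an error rate of 6%.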
As recently as September 2016, Microsoft's error rate on the Switchboard test was 6.3%, which means that out of every 100 words, the system was getting more than 6 wrong. By comparison, a single human transcriptionist has an average error rate of 5.9%, and a team of trained humans clocks in with an error rate of around 5.1%.
Microsoft matched the former error rate in October and has now beaten the latter.
That's far sooner than the company expected. Indeed, back in 2015, Huang himself told Business Insider that building a system capable of surpassing a human at transcription was "four to five years away." Less than two years later, we're well past that point.
Still, challenges remain. Microsoft's transcription system is patterned after the audio coming from a nice, stable landline telephone, Geoffrey Zweig, formerly a principal researcher at the company, told Business Insider last October. The next frontier for voice recognition is to accurately transcribe speech even when it's coming over a lousy cell connection or an echoing McDonald's drive-thru speaker.
Speech science "still has many challenges to address, such as achieving human levels of recognition in noisy environments with distant microphones, in recognizing accented speech, or speaking styles and languages for which only limited training data is available," Huang wrote in his blog post on Sunday.