For the first time in the history of artificial intelligence (AI), Microsoft has created a system that achieves human-level accuracy in conversational speech recognition. It is a significant milestone for speech recognition technology, which until now has struggled with the nuances of human speech.
Even Harry Shum, Executive Vice President for Technology & Research at Microsoft, who now leads the company's 5,000-strong artificial intelligence "Super-team," was openly delighted that the milestone had arrived so soon. In a press release this week, he said:
“Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible.”
The test that Microsoft aced is based on the Switchboard corpus, a collection of recorded English telephone conversations that the National Institute of Standards and Technology (NIST) adopted for its speech recognition evaluations. Switchboard was originally created in the 1990s, and research groups ever since, including those at Microsoft, IBM and Google, have used it as a benchmark for their speech recognition systems.
About a month ago, Microsoft reported a word error rate of 6.3% on the Switchboard test. After further refinements to its speech recognition model, the team did something unexpected: it had professional transcribers transcribe the same conversations and compared their errors with those of its system. The result? Microsoft came through with flying colors. Its system reached a word error rate of 5.9%, about the same as the human professionals, making Microsoft the first company to create a speech recognition system that makes "the same or fewer errors" than trained professionals.
The Next Level for Speech Recognition
To say that this is a big achievement is an understatement. At a time when Amazon, Google and Apple are all racing to build the best digital assistant, Microsoft has pulled ahead, at least on this benchmark. It seems very likely that these speech recognition improvements will make their way into Cortana, who already lends her artificial intelligence capabilities to Windows devices.
But speech recognition goes beyond virtual assistants. The next level is to take the technology and tweak it even more, so it can understand conversations with background noise. The next level is to be able to identify the speakers and tag them in a conversation. The next level is for the artificial intelligence component of this software to keep learning, growing and enhancing its abilities.
There are new languages that need to be covered, there are infinite situations where speech recognition can be fine-tuned for greater accuracy and there is a multitude of applications where such technology can make our lives better.
For now, Microsoft has taken the lead in speech recognition, and Amazon, Google, Apple and everyone else will be aiming to match it. Microsoft has set a high bar. What remains to be seen is how Microsoft integrates this new capability into its products and turns it into revenue.
Thanks for reading our work! Please bookmark 1redDrop.com to keep tabs on the hottest, most happening tech and business news from around the world.