Speechmatics, a startup developing speech recognition technology, announced June 27 that it raised $62 million in Series B funding.
The startup claims to have developed the “most accurate and inclusive speech-to-text engine available,” saying it transcribes speech across a wide range of dialects and accents more accurately than other major software. According to the company’s leadership, the funding will let it continue improving the diversity and accuracy of its technology.
“The patient capital will enable us to double down on our vision to close the gap between humanity and machines, which is incredibly exciting,” said Speechmatics’ CEO Katy Wigdahl. “We cannot wait to accelerate our growth and unlock the understanding of more and more voices.”
Speechmatics, based in Cambridge, UK, says its technology has been shown to recognize the dialects of racial minorities better than speech recognition software from Google and Amazon. According to the company, its software recognizes African American voices with an accuracy rate of 82.8%, compared with 68.6% for both Google’s and Amazon’s offerings.
“The Speechmatics team are undoubtedly a different pedigree of technologists,” said Jonathan Klahr, a managing director at Susquehanna Growth Equity. “We started tracking Speechmatics when our portfolio companies told us that again and again Speechmatics win on accuracy against all the other options including those coming from ‘Big Tech’ players.”
With tens of millions of dollars being invested in Speechmatics and other inclusivity-oriented language technology, such as Sanas’ accent-translation software, the industry appears increasingly interested in building tools for the many accents and variants within individual languages, rather than just the languages themselves.
Thanks to breakthroughs in artificial intelligence, building speech recognition software that works for diverse speakers is far less labor-intensive than it was in the field’s early days. According to Speechmatics’ June 27 announcement, speech recognition software historically required manually annotated training data, which often limited that data to a set of standardized and “commercially valuable” speakers. Today, manual annotation is no longer necessary, allowing developers to train speech-to-text engines on a much wider set of voices, unconstrained by age, gender, or dialect.
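Speechmatics has not published its implementation, but the shift the announcement describes, from hand-transcribed recordings to learning from large amounts of unlabeled audio, is exemplified by openly available self-supervised models such as Meta’s wav2vec 2.0. The sketch below is illustrative only; the model checkpoint, audio file name, and library choice are assumptions for the example, not Speechmatics’ own stack.

```python
# Illustrative sketch: Meta's openly available wav2vec 2.0 model, which was
# pretrained on unlabeled audio (self-supervised) and then fine-tuned for
# transcription. This is NOT Speechmatics' engine.
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# "sample.wav" is a placeholder for a 16 kHz mono audio clip.
speech, sample_rate = sf.read("sample.wav")

inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy (argmax) decoding over the CTC output gives the transcript.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```

In a pipeline like this, most of what the model learns about accents and speaking styles comes from the unlabeled pretraining audio rather than from hand-labeled transcripts, which is the change the announcement credits for widening the range of voices a speech-to-text engine can be trained on.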