WhatsApp is developing an in-app speech recognition and automatic transcription system for the voice message feature of its iOS app, according to WhatsApp Beta Info, the publication that broke the news earlier this month.
In response to WhatsApp Beta Info’s report, WhatsApp confirmed that the function is in the very early stages of development. An in-app transcription feature would be a major upgrade, as WhatsApp users currently must rely on third-party apps to transcribe their voice messages.
As WhatsApp Beta Info reports, users will have to opt in to the automatic transcription function. Voice messages will be processed and transcribed using speech recognition systems developed by Apple, rather than by WhatsApp’s parent company, Facebook. According to WhatsApp Beta Info, data will be used to refine and improve Apple’s speech recognition software. WhatsApp stated that end-to-end encryption will protect user data and that voice messages will not be linked to a user’s identity.
Once a voice message has been transcribed, the transcription will be saved to the user’s device, so users can view it as many times as they like without having to “re-transcribe” the message. The feature will also allow users to skip to a specific part of a voice message simply by tapping on a given word.
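For readers curious how such a feature might work, here is a minimal sketch of the two behaviors described above: transcribing each message only once and jumping to the moment a tapped word is spoken. It is purely illustrative and not based on WhatsApp’s or Apple’s code; the names `TimedWord`, `TranscriptCache`, `seek_position` and `fake_transcriber` are hypothetical, and the stand-in transcriber returns made-up data in place of a real speech recognizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TimedWord:
    """One transcribed word and the time (in seconds) at which it is spoken."""
    text: str
    start_seconds: float


# Hypothetical signature: takes a message ID, returns its timed transcript.
Transcriber = Callable[[str], List[TimedWord]]


class TranscriptCache:
    """Stores transcripts per message so each message is transcribed only once."""

    def __init__(self, transcriber: Transcriber) -> None:
        self._transcriber = transcriber
        self._store: Dict[str, List[TimedWord]] = {}
        self.transcribe_calls = 0  # counts how often the recognizer actually runs

    def get(self, message_id: str) -> List[TimedWord]:
        # Run the (expensive) transcriber only on a cache miss.
        if message_id not in self._store:
            self.transcribe_calls += 1
            self._store[message_id] = self._transcriber(message_id)
        return self._store[message_id]


def seek_position(transcript: List[TimedWord], word_index: int) -> float:
    """Return the playback position for the word the user tapped."""
    return transcript[word_index].start_seconds


def fake_transcriber(message_id: str) -> List[TimedWord]:
    # Stand-in for a real speech recognizer; the data is invented.
    return [
        TimedWord("see", 0.0),
        TimedWord("you", 0.4),
        TimedWord("tomorrow", 0.9),
    ]


cache = TranscriptCache(fake_transcriber)
first_view = cache.get("message-1")
second_view = cache.get("message-1")  # served from the cache, no re-transcription
tapped_position = seek_position(first_view, 2)  # user taps "tomorrow"
```

In this sketch the second lookup never reaches the transcriber, and tapping a word simply reads back the start time stored alongside it.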
For now, WhatsApp appears to be developing the function only for iOS users. WhatsApp Beta Info noted that there are no indications that transcription will be made available to Android users.
Although voice messages will not be processed using software developed by Facebook, WhatsApp’s parent company has recently been pursuing speech recognition work of its own.
Earlier this summer, Facebook announced the development of wav2vec-Unsupervised, an artificial intelligence system that builds speech recognition models without using any pre-transcribed speech. This capability is particularly notable because it allows speech recognition and transcription models to be created for languages that have not been heavily researched by phoneticians or computational linguists, making the technology more accessible to speakers of those languages.