Starting with languages spoken in the Democratic Republic of the Congo, TWB and KoBo Inc. will capture the voices of speakers from marginalized groups to develop language technology and data management tools for these communities.
Translators without Borders (TWB) and KoBo Inc. plan to develop automatic speech recognition (ASR) technology to aid humanitarians in the collection of data from speakers of marginalized languages in low-literacy contexts. Funded by the Cisco Foundation, the initiative will contribute to TWB’s ongoing mission to develop language technology for people with low literacy and KoBo’s mission to provide accessible and effective tools for humanitarian data collection and management.
By integrating ASR and speech-to-text capabilities into a data collection and management tool, the initiative will enable humanitarians to engage people and conduct assessments on matters such as the coronavirus, access to food and water, and which languages people speak and understand.
With the ongoing COVID-19 pandemic restricting mobility while increasing the need for broader language services, humanitarians have struggled to engage with groups living in vulnerable situations. TWB hopes the tool will ease some of the difficulties they have faced over the past few months.
“We must listen to the voices of people that have historically been marginalized due to the languages they speak,” says Grace Tang, Gamayun program manager at TWB. “This collaboration with Cisco and KoBo Inc. is urgently needed and will help ensure voice recognition technology is a key part of communicating with speakers of marginalized languages and with those who have lower literacy levels, especially during COVID-19.”
Beginning with the Democratic Republic of the Congo, TWB will first model languages such as French and Congolese Swahili for the ASR technology. The group plans to collaborate with local researchers to gather a wide range of voices recording a collection of basic words. KoBo Inc. will then integrate the speech recognition into KoBoToolbox, a free and open-source suite of tools for field data collection.
“This technology, used responsibly, will ensure that humanitarians process what people are telling them on the ground more effectively,” says KoBo co-founder Patrick Vinck. “Feedback from communities is too often ‘lost in translation’ and does not lead to operational changes in humanitarian action.”
The project builds on a successful pilot, also funded by the Cisco Foundation, which developed machine translation and open-source language datasets in six additional languages. “The Cisco Foundation is excited to support this scalable, technology-driven initiative that makes sure even the most vulnerable people are heard during the COVID-19 crisis,” says Erin Connor, the Critical Human Needs portfolio manager at the Cisco Foundation. “This new collaboration between TWB and KoBo Toolbox unites two technologies that, together, will help humanitarians better understand the needs of people who speak marginalized languages.”
The project also adds to TWB’s broader efforts to support speakers of marginalized languages, which include translating millions of words of COVID-19 information and creating a multilingual COVID-19 glossary. TWB has also joined TICO-19, a coalition of academic institutions and industry partners such as Amazon, Appen, and Translated that aims to make crisis-related content available through machine translation (MT) models.