Lanfrica, an online catalog of African language resources designed for linguists and natural language processing (NLP) specialists, has launched after months of development.
Lanfrica’s developers announced the platform’s official launch Feb. 15, noting that the platform hosts 2,199 languages indigenous to the African continent (including several that are no longer actively spoken by native speakers). The platform mainly includes resources developed for NLP, machine translation, and speech recognition, but it also includes learning and entertainment resources.
“Lanfrica offers huge potential for better discoverability and representation of African languages on the web,” the platform’s developers wrote in a recent blog post. “Lanfrica is able to give useful statistics on the progress of African languages. … Such insight can lead to better allocation of funds, efforts, etc. toward bringing the more under-researched languages forward in NLP — thereby fostering the equal progress of African languages.”
The platform serves as a centralized location where researchers, professionals, and general-interest users can search for resources on different African languages. Lanfrica says its goal is to help expand the presence of African languages in the tech world, as African languages are strongly underrepresented in NLP and related fields.
Lanfrica, then, allows individuals to more easily locate resources that may be difficult to track down otherwise. To compile resources that don’t name the language used, Lanfrica’s developers have also used artificial intelligence to accurately identify various African languages.
“At Lanfrica, we have created algorithms that can tell, with much effectiveness, the African language(s) involved in a resource, enabling us to even curate works that do not explicitly specify the African languages they worked on (which are very many),” the developers wrote.
Users are also encouraged to submit papers, datasets, and other resources of their own to help the platform grow, since several of the languages hosted on the platform currently have only one or two resources, or none at all.
Regardless of how widely spoken they are, many African languages are deemed low-resource languages: linguists and NLP specialists have conducted and published less research on them than on languages like English or Mandarin Chinese. As a result, languages like Swahili or Amharic (both of which are spoken by millions of people) tend to lag behind higher-resource languages when it comes to technological developments such as machine translation (MT) or speech-to-text software.