In grade school, teachers show their students pictures of lone polar bears, stranded in the middle of icy waters. They show videos of pandas and tigers and rhinoceroses, all umbrellaed together with one ominous label: endangered. From a young age, we’re taught about the animals that are on their way out, casualties of overpopulation, deforestation, and climate change. But extinction doesn’t loom over insects and mammals alone: only 3% of the world’s 7,000 languages have a stable speaker base. The other 97% face uncertain, endangered futures.
This is the problem the Pangloss collection attempts to address. Pangloss is an open archive started in 1995 by the French National Centre for Scientific Research (CNRS). It boasts over 780 hours of recordings in more than 170 languages. Half of the resources available in the archive are transcribed and annotated, allowing listeners who aren’t familiar with a specific language to engage with it.
In January of this past year, the Pangloss collection updated its website, making it easier to navigate and to access its dictionaries. Clicking on the “corpora” tab takes you to a world map scattered with pins, each representing a unique language or dialect. Pressing on a language (or a pin) opens a new page listing the recordings available to play, along with their speakers and any additional translated material. Malang, a Vietic language spoken by the Dusun people of Borneo, and Kakabe, a language spoken by roughly 50,000 people in Guinea, are only two examples of the diverse languages hosted in the collection. That Pangloss is primarily an archive of audio recordings is also significant: it allows languages that do not have a written tradition, as many endangered languages do not, to be included in preservation efforts.
Language archives like Pangloss will prove essential to language revitalization campaigns as AI grows more sophisticated and takes over transcription duties. Currently, it takes hundreds of hours of recordings to train an AI to recognize a language and its sounds. But as more linguists and scientists collaborate to create sophisticated tools, it will take only several hours of recordings to train an AI, meaning that languages will not have to rack up a TV show’s worth of hours to be recognized by these systems.
Though it’s disheartening to hear how many languages are dying out, and how quickly, there are many international efforts geared toward preserving and recording them so that they are not lost to history. Language is an integral part of culture, and so these archives and the AIs working to record languages preserve not only their respective syntax and phonology, but also their cultural traditions and customs.