Post Editing: Transcribing the obscure

By Katie Botkin September 26, 2012

In grad school, I had this idea that I would go on to get a PhD in field linguistics. We were doing projects to figure out the phonetic structure of a language unknown to us, based on our own transcriptions from a native speaker, and I picked the most obscure one I could find. There was a girl I knew from Gabon who spoke a language she called Bateke, which was also the name of her minority ethnic group there. The language had no alphabet, she said. Nothing was written down in her language, at least as far as she knew. So I recorded her pronouncing a list of 100 words and sentences, transcribed those using the International Phonetic Alphabet, and attempted to construct a phonetic chart of her language, the starting point for any field linguist’s attempt at creating a working orthography. It was enough to whet my appetite, and I returned to her for later projects in grammar and so on. So when I read about minority languages, I think of that project, and try to calculate how much time it would take to come up with an accurate written document in that language, translated or otherwise. And I start to get a little tired.

Once a language has an orthography, things get a little easier. As Tim Brookes notes in this issue, artistically speaking, you can copy alphabets even if you know nothing about the language. If you need cultural cues or translation acumen, of course, more goes into it, as Jacques Barreau, Christopher S. Carter, Gary Muddyman, Andrea Edmundson and Sarah Teigan all point out.

If you’re trying to create language technology, there’s a whole other set of challenges with minority languages. Arvi Hurskainen’s article on Swahili machine translation (MT) made me wonder, however, whether, despite the scarcity of data, the Bantu language spoken by the Bateke could eventually follow in the footsteps of established rule-based Bantu MT engines, such as might be used for Swahili.

Our Core Focus this issue is localization, and we’ve chosen some articles that segue nicely from the minority language and emerging markets focus. Richard Sikes details the differences between internationalization, globalization and localization, and then Benjamin B. Sargent offers some details on the return on investment of localization, making a case for a growing number of languages. Next comes David Filip’s article (the first of two) on how to actually do this. Manish Kanwal and Akulaa Agarwal then give some specifics on how to localize formats such as video. There’s also a review of a comic book; columns on inclusion and community translation by Kate Edwards and Terena Bell, respectively; and a tech-slanted Takeaway on language on the web.

All in all, there’s plenty in this issue that may help the obscure become less so.