Henry Ford said history was bunk. James Joyce suggested history is a nightmare from which we are trying to awake. Others claim that ignorance of history will condemn us to repeat it. The history of language technology is so far neither bunkum nor nightmare. Indeed, apart from one or two illustrious exceptions, it hardly exists yet. There must be a wealth of material just on the web, but so far no one seems to have structured it into useful timelines or offered a course on the topic.

I’ve been tinkering with the idea of collecting some basic materials on language tech history for some time. The field is huge, yet divides easily enough into pre- and post-digital technology. Digital language technology began with the advent of the computer around 1945, so its history can in theory be charted from obvious and usually accessible sources. One Canadian project called Infolingua (now cashless?) actually began work on collecting references to what it calls perilinguistics (“the body of knowledge and technology that deals with phenomena and problems having a linguistic component”, mainly covering automatic language processing, but also sociology and psycholinguistics). Its database of 73,846 bibliographical entries on language-tech topics, dated 1994 (!), claims 152 references from 1950-54, 848 from 54-59, 2,343 for 60-64 and 4,005 up to 1970, etc. So that’s a start, even though it is not organized with a historical perspective in mind.

The true pioneer in this area is John Hutchins, who has almost single-handedly been charting the development of machine translation from its beginnings and making most of his work available for free. He has recently launched a ‘repository’ for papers on MT and the first results can be found here. In fact, he is looking for a friendly website to house this collection permanently, so I’d encourage you to contact him via his site if you can help.

However, in addition to published sources, we need as full a record as possible of the people who developed this digital technology, and what they actually did and thought about the field over time. In other words, we need an oral history of language technology, not just a timeline of facts and technical publications. We want a record of the joys and sufferings of life in the lab.

John Hutchins and I have talked about launching an oral history of language technology, along the lines of the approach used by the American Association for Artificial Intelligence for various AI topics (but not NLP, it seems) or by the ACM on the development of neural nets.

Obviously, to run interviews and capture some of the last 50 years before memories melt away, you would need to break the language tech field down into bite-sized topical chunks: speech (recognition, synthesis, etc.), dictionaries, translation, summarization, and so on. You would also have to ask appropriate pioneers or witnesses in a swathe of countries (US, Canada, W and E Europe, Russia, Japan, China, Korea, etc.) to reminisce under the right conditions, so that the spoken results could eventually be databased for later cross-referencing. This adds up to a vast amount of oral history.

Such an undertaking would need extensive organization and funding, the backing of the right legal framework, and long-term sustainability. This seems to point to an academic or professional institution, with branches in different countries and database and web resources commensurate with the scale of the task. Any offers?

However, there’s a strong case for starting on a modest basis (e.g. with the more elderly of our colleagues) to test the waters and scope the task. Perhaps a cluster of PhD students all working in a related area of language technology in one country could be persuaded to orient their research to this historical challenge. Possibly media or communications history departments (rather than the NLP people) might be tempted to have a go.

The history of linguistics is now a full-blooded discipline, as is the history of computing and to a lesser degree the history of translation. Isn’t it about time language technology got the same sort of historical treatment?

Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual méssage, the history of language machines, the future of translation, and the life of the digital mindset.

