Text analytics

Bernard Huber has recently published Textalyser, a web service with a French-English interface that analyses any input text and returns statistics about its word tokens. This sort of information is useful for pricing translations, sizing website content and anticipating text-to-speech difficulties posed by a text’s ‘longest’ words.
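To make “statistical data about word tokens” concrete, here is a minimal sketch of the kind of figures such a tool reports: total words, distinct words, the most frequent words and the longest words. It is not Textalyser’s method. The tokenisation rule (a token is a run of letters, optionally joined by an internal hyphen or straight apostrophe) and the function name `word_stats` are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed tokenisation rule: runs of Unicode letters, optionally joined by an
# internal hyphen or straight apostrophe. Real analysers may tokenise differently.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")

def word_stats(text, top_n=5):
    """Return simple word-token statistics for a text (hypothetical helper)."""
    tokens = TOKEN_RE.findall(text.lower())
    counts = Counter(tokens)
    # Longest distinct words first; ties broken alphabetically.
    longest = sorted(counts, key=lambda w: (-len(w), w))[:top_n]
    return {
        "total_words": len(tokens),      # basis for per-word translation pricing
        "unique_words": len(counts),     # rough measure of lexical variety
        "most_frequent": counts.most_common(top_n),
        "longest_words": longest,        # candidates for text-to-speech checks
    }

if __name__ == "__main__":
    print(word_stats("The cat saw the other cat. Concatenation!", top_n=2))
```

A total word count like this is the figure typically used to quote a translation, while the longest-words list is a cheap way to flag items a speech synthesiser may mispronounce.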

What text workers need is a fast-reaction toolbox that includes at least word analytics, a concordancer to show words in context, and term extraction, ideally accessible by clicking on any word in a text. We should be able to experience electronic words as portals to knowledge about their lexical domain and about their particular instantiation in a document. But we cannot capture this ‘knowledge’ on a personal hard disc: what might look like a useful add-on to a word-processing application actually needs to be web-based to benefit from richer, broader knowledge streams about words and language. The data that Textalyser generates about texts, for example, could itself be aggregated to provide a further level of useful statistics about web-wide textual practice. But to make this useful, you would probably need some sort of classificatory metadata about the semantic, rather than purely formal, content of the texts themselves.
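The concordancer mentioned above is usually a keyword-in-context (KWIC) display: every occurrence of a word is shown with a fixed window of text on either side, aligned so the keyword forms a column. The following is a minimal sketch under stated assumptions: matching is case-insensitive on whole words, whitespace is collapsed first, and `concordance` and its `width` parameter are hypothetical names, not part of any tool named in the article.

```python
import re

def concordance(text, keyword, width=30):
    """Return keyword-in-context (KWIC) lines for each whole-word match.

    `width` (hypothetical parameter) is the number of characters of context
    kept on each side of the keyword.
    """
    # Collapse newlines and runs of spaces so each KWIC line stays on one line.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    sample = "Words in context. A concordancer shows words in their context."
    for line in concordance(sample, "words", width=10):
        print(line)
```

A click-on-a-word interface of the kind the article imagines would simply call something like this with the clicked word; a web-based version could run the same query against a much larger corpus than the current document.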

Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual messaging, the history of language machines, the future of translation, and the life of the digital mindset.



MultiLingual Media LLC