Bernard Huber has recently published Textalyser, a web server with a French-English interface that analyses any input text and returns statistical data about its word tokens. This sort of information is useful for pricing translations, sizing website content and anticipating difficulties in text-to-speech conversion for a text’s ‘longest’ words.
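To make the idea concrete, here is a minimal sketch, not Textalyser’s actual implementation, of the kind of token-level statistics such a tool might report: token and type counts, a type/token ratio, and the text’s longest words. The function name and the tokenising regular expression are illustrative assumptions.

```python
import re
from collections import Counter

def token_stats(text, n_longest=5):
    """Return simple word-token statistics for a piece of text (illustrative sketch)."""
    # Crude tokeniser: sequences of Latin letters (including French accents),
    # apostrophes and hyphens count as word tokens.
    tokens = re.findall(r"[A-Za-z\u00C0-\u017F'-]+", text.lower())
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),                        # total word tokens
        "types": len(counts),                         # distinct word forms
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "longest_words": sorted(counts, key=len, reverse=True)[:n_longest],
    }

if __name__ == "__main__":
    sample = "Textalyser analyses any input text into statistics about word tokens."
    print(token_stats(sample))
```

Even this toy version hints at the pricing and readability uses mentioned above: total tokens drive per-word pricing, while the longest-word list flags likely trouble spots for speech synthesis.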
What text workers need is a fast-reaction tool box including at least word analytics, a concordancer to see words in context, and term extraction capabilities, ideally accessible by clicking on any word in a text. We should be able to experience electronic words as portals to knowledge about their lexical dominion and their given instantiation in a document. But we cannot capture this ‘knowledge’ on a personal hard disc: what might look like a useful add-on to a word processing application actually needs to be web-based to benefit from richer, broader knowledge streams about words and language. Maybe the data that Textalyser generates about texts, for example, can itself be aggregated to provide a further level of useful statistics about web-wide textual practice. But you would probably need some sort of classificatory metadata about the semantic, rather than purely formal, content of the texts themselves to make this useful.
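As a rough illustration of the ‘words in context’ part of that tool box, the sketch below implements a basic keyword-in-context (KWIC) view over a single text. Real concordancers index whole corpora; the function and parameters here are hypothetical and kept deliberately small.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for every occurrence of keyword (illustrative sketch)."""
    lines = []
    for match in re.finditer(r"\b%s\b" % re.escape(keyword), text, re.IGNORECASE):
        start, end = match.start(), match.end()
        left = text[max(0, start - width):start]   # context to the left of the hit
        right = text[end:end + width]              # context to the right of the hit
        lines.append(f"{left:>{width}}[{match.group()}]{right:<{width}}")
    return lines

if __name__ == "__main__":
    doc = ("What text workers need is a fast-reaction tool box. "
           "Text workers should see words in context in any text.")
    print("\n".join(kwic(doc, "text")))
```

Hooking something like this up to a click on any word in a document is the kind of lightweight, word-level interaction the paragraph above has in mind, though a genuinely useful version would draw its contexts from the web rather than from one file.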