The short version

The UK text processing company Corpora Software has struck again (see here for a previous posting on their products), this time with Summarize!, a personalized document summary tool. People have been researching automatic document summarization ever since the dawn of computing, but hard facts about its actual usage and effectiveness are rare. A quick précis (nudge, nudge) of any reports on successful experiments would be welcome.

Especially so because document construction and textual presentation style have already adapted to the information overload, and not just through digests and short versions or purely graphical methods. Summarizing techniques are systematically used to make it easier to guess the core content of books, articles, and other documents. Non-fiction books now usually carry explanatory subtitles, articles use pull quotes or side-bars and boxes to summarize content, everyone uses lists (the easiest summarizer there is) and bullet points, and abstracting practices from scholarly publications have gone mainstream in executive summaries and the like. Likewise, copy-writers are taught to use an inverted triangle method for press releases – vital need-to-know new content at the top, tapering to often-ignored given information at the bottom. All of which makes documents easier to read and speeds up knowledge capture. We’ve come a long way from the world of 9th century monkish manuscripts, where readers had to contend with rare word division, and in which scribes started top left and ended bottom right with only occasional letter coloring to signal discursive shifts.

Two cultural pendants to the short-version mindset. One is Tom Stoppard’s 30-minute versions of Shakespeare’s plays (remember, a full-length Hamlet can last 4 hours) for the Shakespeare Schools Festival, reported recently here (subscription may be necessary). A sort of kids’ equivalent to the executive summary of long reports. The other is Eric Schulman’s famous The History of the Universe in 200 Words or Less.


Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual méssage, the history of language machines, the future of translation, and the life of the digital mindset.

