True to type

I read a report in Le Monde about how the French cryptologist David Naccache managed to recover an intentionally censored (blacked-out) word in a printed CIA document released by the White House on April 10. The phrase was “operative told an XXXXXXX service…”.

First they used OCR (check out Simson Garfinkel’s recent encomium of the virtues of character recognition technology) to identify the font, since the font determines how many characters fit in a given length, here the 16 mm of the blacked-out gap. Luckily the font, Arial, was proportional rather than monospace, which meant that the letter ‘i’ took up less space than ‘n’, so the width of the gap says something about which letters are in it and not just how many. They then used a dictionary to list the words that would print 16 mm wide (only 1,530!). Since the target word came after the string ‘an’, it presumably had to begin with a vowel, which limited the possibilities to 346 nouns and adjectives. Of these, only 7 made sense in context (Ukrainian, uninvited, unofficial, incursive, Egyptian, indebted and Ugandan). Given extra-textual circumstances, ‘Egyptian’ was chosen as the most likely candidate.
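For the curious, the filtering steps described above can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the method Naccache's team actually used: the width table is invented for illustration (it is not real Arial metrics), the units are arbitrary rather than millimetres, and the names `char_width`, `text_width`, `candidates` and `SAMPLE_WORDS` are hypothetical.

```python
# Toy sketch: filter a word list by printed width in a proportional font,
# then by the grammatical clue that the word follows "an".
# The widths below are made-up illustrative units, NOT real Arial metrics.

NARROW = set("ijl")   # assumed narrow letters
WIDE = set("mw")      # assumed wide letters


def char_width(ch: str) -> int:
    """Return a hypothetical advance width for one character."""
    if ch.isupper():
        return 7
    if ch in NARROW:
        return 2
    if ch in WIDE:
        return 8
    return 5


def text_width(word: str) -> int:
    """Total printed width of a word: the sum of its character widths."""
    return sum(char_width(ch) for ch in word)


def candidates(words, target_width, tolerance, after_an=True):
    """Keep words whose width is within `tolerance` of `target_width`.

    If `after_an` is true, also require an initial vowel letter,
    mirroring the clue that the hidden word followed 'an'.
    """
    kept = []
    for word in words:
        if abs(text_width(word) - target_width) > tolerance:
            continue  # would not fill the blacked-out gap
        if after_an and word[0].lower() not in "aeiou":
            continue  # 'an' would not precede this word
        kept.append(word)
    return kept


# A tiny stand-in for the dictionary used in the real exercise.
SAMPLE_WORDS = ["Egyptian", "Ugandan", "minimum", "illicit", "Russian", "unofficial"]
```

Calling `candidates(SAMPLE_WORDS, 39, 2)` keeps only the words that both fit the gap and can follow ‘an’. Note that ‘illicit’ and ‘minimum’ have seven letters each but very different widths in this table, which is exactly why a proportional font gives more away than a monospace one. The last step, deciding which survivor makes sense in context, is left to a human reader here just as it was in the report.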

None of this deciphering was automated, of course, and the actual decoding was hardly earth-shattering. But there’s obviously still semantic mileage in the formal properties of fonts.


Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual méssage, the history of language machines, the future of translation, and the life of the digital mindset.

