FBI yet to decipher secrets of translation

News items are all over the map today on the U.S. Justice Department report that more than 120,000 hours of “potentially valuable terrorism-related recordings” have not been translated by FBI linguists because of backlog, management and “computer storage” problems. Apparently many of the report’s criticisms of the translation effort remain classified.

“What good is taping thousands of hours of conversations of intelligence targets in foreign languages if we cannot translate promptly, securely, accurately and efficiently?” said Sen. Patrick Leahy of Vermont, the ranking Democrat on the judiciary committee.

“The Justice Department’s translation mess has become a chronic problem that has obvious implications for our national security,” Leahy said.

Let’s try to help the FBI with some creative accounting on those figures, and show that there are plausible translation solutions, even though the Bureau has failed to find them. If we agree on a traditional speaking speed of 250 wpm, and the FBI has a backlog of 123,000 hours of the stuff, this makes around 1.8 billion words in a mix of Farsi, Arabic and other languages that need transcribing and then translating (possibly selectively) into English.
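The word count above can be checked with a minimal sketch. Both inputs are the working figures from this post (250 wpm and 123,000 hours), not measured values, and the function name is only illustrative.

```python
# Back-of-envelope size of the backlog in words.
# Both constants are the post's working assumptions, not measurements.
WORDS_PER_MINUTE = 250    # assumed speaking speed
BACKLOG_HOURS = 123_000   # hours of untranslated audio


def backlog_words(hours, wpm):
    """Total spoken words in `hours` of audio at `wpm` words per minute."""
    return hours * 60 * wpm


total_words = backlog_words(BACKLOG_HOURS, WORDS_PER_MINUTE)
print(f"{total_words / 1e9:.3f} billion words")  # prints "1.845 billion words"
```

Halving the assumed speaking speed halves the total, so everything downstream is only as good as that 250 wpm guess.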

Let’s say you wanted to do the job properly: first transcribe the whole of the spoken record into electronic text, and then edit and ready the master file for information retrieval and/or translation applications. You manage to put together a transcription team of 50 trained Arabic-language stenographers (using chording keyboards for speed). They would each have to take down some 37 million words, and working 8 hrs a day they would be at it for over a year. Let’s say you pay around US$ 3.5 million for the transcription and subsequent editing.
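A rough sketch of the stenographer workload follows. It assumes, purely for illustration, that transcription runs in real time (one hour of work per hour of audio), which the figures above do not spell out; the function name is hypothetical.

```python
# Per-stenographer workload, using the post's figures.
BACKLOG_HOURS = 123_000
WORDS_PER_MINUTE = 250
TEAM_SIZE = 50
HOURS_PER_DAY = 8


def per_person_workload(hours, wpm, team_size, hours_per_day):
    """Return (words per person, working days per person).

    Assumption: transcription happens in real time, so one hour of
    audio costs one hour of stenographer time.
    """
    words_each = hours * 60 * wpm / team_size
    days_each = hours / team_size / hours_per_day
    return words_each, days_each


words_each, days_each = per_person_workload(
    BACKLOG_HOURS, WORDS_PER_MINUTE, TEAM_SIZE, HOURS_PER_DAY
)
print(f"{words_each / 1e6:.1f} million words each")  # prints "36.9 million words each"
print(f"{days_each:.1f} working days each")          # prints "307.5 working days each"
```

At roughly 250 working days in a year (another assumption), 307.5 working days is indeed more than a year on the job.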

If you can’t wait for the stenographers to get their act together, you can always try transcribing speech signals into electronic text by using a speech recognition system with language model capabilities for Farsi, various dialects of Arabic, and any other languages involved. Currently, streaming this audio signal through the kind of cutting-edge 2004 products demonstrated and discussed at the recent SpeechTEK in New York, you might achieve a 20% error rate in the transcription (and this is being pretty generous, given the spotty quality of the input signal), which would make it too incomplete to be useful as an intelligence source. Cleaning up and editing the output would be time-consuming and costly.
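To make the 20% figure concrete, here is a minimal sketch that reads "error rate" as word error rate: substitutions, deletions and insertions divided by the number of words in a reference transcript. That reading is an assumption, and the sample sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by
    the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one wrong word out of five.
print(word_error_rate("the meeting is on friday",
                      "the meeting is on thursday"))  # prints 0.2
```

In the invented example the single wrong word is the day of the meeting, which is why a one-in-five error rate can hurt far more than the number suggests.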

But if you did go down the speech recognition path and came up with a reasonable automated transcript, technology currently on the market would let you search the audio for at least names, places and dates to see what is in the recordings. See HP’s SpeechBot or StreamSage, among others. This would enable relevant parts of the transcript to be sent on for translation into English and hence to appear on the FBI’s knowledge radar.

Now suppose you had an automated translation solution that would deliver a gisted version of the original Arabic, Farsi and other audio tapes: plenty of errors but lots of actionable content, as they say, between the errors. With a translation throughput of say 100,000 words an hour, a single system could theoretically translate the whole transcribed shebang into English in about 750 days of non-stop processing, and obviously in a fraction of that time if you just need to gist a few selected passages. If the right language pairs in available systems were operational, you could negotiate an MT price of say 5 cents a word, and a translation budget of US$ 1.53 million would then cover roughly 30 million selected words (the full 1.8 billion at that rate would run to about US$ 90 million). Let’s say the whole backlog cleanup costs around US$ 12 million. It’s a winner. According to the news reports, the FBI language services have a 2004 budget of US$ 70 million.
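The throughput and pricing arithmetic can be sketched as below. It takes the 1.845 billion words implied by 123,000 hours at 250 wpm; the function names are illustrative. The totals are very sensitive to the assumed throughput and to how much of the transcript is actually sent for translation, so the inputs are worth varying.

```python
# Machine translation time and cost, using the post's working figures.
TOTAL_WORDS = 123_000 * 60 * 250   # backlog at 250 wpm
MT_WORDS_PER_HOUR = 100_000        # assumed throughput of one MT system
PRICE_CENTS_PER_WORD = 5           # assumed negotiated price


def mt_days(words, words_per_hour):
    """Days of non-stop processing on a single system."""
    return words / words_per_hour / 24


def mt_cost_dollars(words, cents_per_word):
    """Cost in dollars of translating `words` words."""
    return words * cents_per_word / 100


def words_for_budget(budget_dollars, cents_per_word):
    """How many words a given budget buys at a per-word price."""
    return budget_dollars * 100 / cents_per_word


print(f"{mt_days(TOTAL_WORDS, MT_WORDS_PER_HOUR):.0f} days for the full backlog")
print(f"US$ {mt_cost_dollars(TOTAL_WORDS, PRICE_CENTS_PER_WORD):,.0f} for the full backlog")
print(f"{words_for_budget(1_530_000, PRICE_CENTS_PER_WORD):,.0f} words for US$ 1.53 million")
```

Running several MT systems in parallel would divide the elapsed time accordingly, while the per-word cost stays the same.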

This is not the first time large-scale phone tapping has had to be recorded and translated for intelligence work. David Stafford in Spies Beneath Berlin recounts how in Cold War Berlin in the early 1950s, the UK intelligence agency MI6, working with the CIA, built a tunnel called Stopwatch/Gold that ran for 1,924 meters under the Soviet Sector of Berlin in order to tap into sensitive Soviet phones. The predominantly Russian-language calls captured in this way generated 25 tons of 2.5 hr tapes. Let’s say a tape contained around 37,500 spoken words, and the physical object weighed around 12 ounces; the phone taps added up to about 80,000 tapes, and a total audiostream of about 3 billion spoken words. At one stage, MI6 had to translate 20,000 of these tapes containing 368,000 communications or 75 million words using a staff of 300 transcriber/translators. We can only hope the war on terror ends, like the Cold War, with a whimper rather than a bang.

Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual méssage, the history of language machines, the future of translation, and the life of the digital mindset.

