OKI Electric has finally launched its distributed automatic translation system as a free service for Internet users. The heart of this system is the nearly 20-year-old PENSEE MT (Japanese<==>English) system, rewritten in Java at the end of the 1990s. The big news is in fact about one crucial design feature of yakushite: it helps user ‘communities’ to enrich the system’s dictionaries. The idea of grassroots word wallahs uploading reams of likely lexemes into dictionary warehouses for free must have crossed many a resource-poor language tech developer’s mind since the Web unfurled. It would be interesting to know how many have gained anything really useful from actually doing it.
One classic example is the Italian language services vendor LOGOS’ online Dictionary, which has been fed by individuals to the tune of seven and a half million (validated but not guaranteed) terms in an unequal mix of languages. It would be a good thing for someone to take a close look at LOGOS’ dictionary and reveal some of its internal ecology; there may even be marketable tools around to find out a) what the distributions of languages and term fields are, and b) whether resources like this can be ‘X-rayed’ quickly for potential users. Of course LOGOS owns the database containing the contents, so this won’t happen soon. But appropriate analytics would also allow LOGOS to add value to its resource.
In the case of yakushite, OKI has wagered on enabling subject matter ‘communities’ to build dictionary resources to improve output quality. Apparently it has developed some user-friendly term extraction tools that speed up the candidate term compilation process. The idea sounds great, but will people actually want to spend much time extracting, checking and uploading a subset of terms about their small terminological corner of the multilingual forest just to translate a travel website? One obvious place to try this would be in schools and universities, where subject matter is a constant focus, and where there is plenty of cheap labor to extract and prepare dictionary resources in exchange for the odd credit. But it is difficult to imagine many professionals from business sectors sharing their highly specialized, patchy yet vital glossaries with OKI.