Just before Christmas, Microsoft Research discreetly filed a patent for a data-driven translation system:
An adaptive machine translation service for improving the performance of a user’s automatic machine translation system is disclosed. A user submits a source document to an automatic translation system. The source document and at least a portion of an automatically generated translation are then transmitted to a reliable modification source (i.e., a human translator) for review and correction. Training material is generated automatically based on modifications made by the reliable source. The training material is sent back to the user together with the corrected translation. The user’s automatic translation system is adapted based on the training material, thereby enabling the translation system to become customized through the normal workflow of acquiring corrected translations from a reliable source.
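To make the workflow in that abstract concrete, here is a minimal Python sketch. It rests on loudly stated assumptions: the abstract does not say how the user's system is "adapted", so the class `AdaptiveTranslator`, the helper `make_training_material`, the toy word-for-word lexicon, and the exact-match segment memory used as the adaptation step are all hypothetical stand-ins, not Microsoft's actual method. It only shows the loop: translate, have a reliable source correct the output, turn the corrections into training material, and fold that material back into the system.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveTranslator:
    """Hypothetical stand-in for the user's automatic MT system.

    The "base model" is a naive word-for-word lexicon; adaptation is an
    exact-match segment memory built from human corrections (an assumption,
    not the patented mechanism).
    """
    lexicon: dict[str, str]
    memory: dict[str, str] = field(default_factory=dict)

    def translate(self, segment: str) -> str:
        # Prefer a previously corrected translation of the identical segment.
        if segment in self.memory:
            return self.memory[segment]
        # Otherwise fall back to word-by-word lookup; unknown words pass through.
        return " ".join(self.lexicon.get(w, w) for w in segment.split())

    def adapt(self, training_pairs: list[tuple[str, str]]) -> None:
        # Fold the reviewer-derived training material into the segment memory.
        for source, corrected in training_pairs:
            self.memory[source] = corrected


def make_training_material(
    sources: list[str], machine_output: list[str], corrected: list[str]
) -> list[tuple[str, str]]:
    """Keep only the segments the reliable source actually modified."""
    return [
        (src, fixed)
        for src, mt, fixed in zip(sources, machine_output, corrected)
        if mt != fixed
    ]


if __name__ == "__main__":
    mt = AdaptiveTranslator(lexicon={"la": "the", "maison": "house", "blanche": "white"})
    sources = ["la maison blanche", "la maison"]
    draft = [mt.translate(s) for s in sources]
    print(draft)  # ['the house white', 'the house']
    # A human reviewer fixes the word order of the first segment only.
    reviewed = ["the white house", "the house"]
    training = make_training_material(sources, draft, reviewed)
    print(training)  # [('la maison blanche', 'the white house')]
    mt.adapt(training)
    print(mt.translate("la maison blanche"))  # the white house
```

Only segments the reviewer changed become training material, which mirrors the abstract's point that the training data falls out of the normal correction workflow rather than requiring a separate annotation step.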
But don’t give up on that project you were hoping would beat the rest of them to market. Just this week, IBM announced that it would free up 500 patents for use by the Open Source Software movement. In the list, I found these covering aspects of automatic translation:
US5644775 Method and system for facilitating language translation using string-formatting libraries
US5251130 Method and apparatus for facilitating contextual language translation within an interactive software application
US5640575 Method and apparatus of translation based on patterns
US5267156 Method for constructing a knowledge base, knowledge base system, machine translation method and system therefor
US6236958 Method and system for extracting pairs of multilingual terminology from an aligned multilingual text
Others include:
US5640487 Building scalable n-gram language models using maximum likelihood maximum entropy n-gram models
US5636291 Continuous parameter hidden Markov model approach to automatic handwriting recognition
US5220621 Character recognition system using the generalized Hough transformation and method
US6249605 Key character extraction and lexicon reduction for cursive text recognition
US6182115 Method and system for interactive sharing of text in a networked environment
US5678052 Methods and system for converting a text-based grammar to a compressed syntax diagram
US6311177 Accessing databases when viewing text on the web
US6216102 Natural language determination using partial words