Project to boost cross-lingual document interoperability

Thanks to Gary Price for this report on “Quality of Service and Legitimacy in eGovernment” (QUALEG), a multi-institution European Commission/NSF initiative to improve the delivery of government services.

QUALEG brings together US (Eduard Hovy) and Israeli (Avigdor Gal) researchers to work on coordinating government-type documents among Poland, France and Germany. Besides working across four languages (including English), the project has to handle different legacy software. Its first step will be to create a semantically sensitive ontology.

“In the context of Digital Government, ontologies play an increasingly important role, as database metadata schemas, terminology standardization structures and the foundation for interfaces between applications,” says Eduard Hovy. “Yet the complexity and cost of building ontologies remains a daunting challenge.”

The researchers will use a “clustering” approach. In clustering, the machine is taught to infer relationships from which words occur together: glass, paperweight, perfume bottle versus glass, windshield, rearview mirror. One starts with “topic signatures”: for each category, a set of words, each weighted by its relevance to that topic. Using this system, accuracy can go as high as 75%, depending on how clear and distinct the topics are, says Hovy.
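To make the idea of topic signatures concrete, here is a minimal sketch in Python. It is not the QUALEG implementation: the topic names, the words and their weights are all hypothetical values chosen to mirror the glass example above, and the scoring rule (summing the weights of the signature words found in a text) is just one simple way such signatures might be applied.

```python
from collections import Counter

# Hypothetical topic signatures: word -> relevance weight.
# These topics and weights are illustrative assumptions, not project data.
TOPIC_SIGNATURES = {
    "household": {"glass": 0.4, "paperweight": 0.9, "perfume": 0.8, "bottle": 0.6},
    "automotive": {"glass": 0.4, "windshield": 0.9, "rearview": 0.8, "mirror": 0.6},
}


def score_topics(text, signatures=TOPIC_SIGNATURES):
    """Return each topic's score: the weighted count of its signature words in text."""
    counts = Counter(text.lower().replace(",", " ").split())
    return {
        topic: sum(weight * counts[word] for word, weight in words.items())
        for topic, words in signatures.items()
    }


def best_topic(text, signatures=TOPIC_SIGNATURES):
    """Return the highest-scoring topic, or None if no signature word appears."""
    scores = score_topics(text, signatures)
    topic = max(scores, key=scores.get)
    return topic if scores[topic] > 0 else None


print(best_topic("glass, paperweight, perfume bottle"))
print(best_topic("glass, windshield, rearview mirror"))
```

Note that “glass” carries the same weight in both signatures, so on its own it cannot separate the two topics; it is the surrounding words that tip the score one way or the other.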

The process still requires some human intervention, at least in its initial stages. “Clustering has been used since the 1960s,” says Hovy, “but it’s never been very accurate by itself. If you give it additional help, it can be.”

At this phase QUALEG is a pilot program, which its backers hope can be extended throughout Europe. But Hovy’s and Gal’s work is equally ambitious: it could lay the groundwork for an “ontology service bureau,” where those charged with database creation could have that painstaking initial step performed for them. Even if not fully automated, such a system could eliminate much of the individual time and effort that goes into ontology creation.

Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual méssage, the history of language machines, the future of translation, and the life of the digital mindset.


