Machine translation of domain-specific expressions within ontologies and documents

Arcan, Mihael
Nowadays, most semantically structured data has labels stored only in English. Although the increasing number of ontologies offers an excellent opportunity to link this knowledge together, non-English-speaking users may encounter difficulties when using resources represented in English only. Applications in information retrieval or cross-lingual business intelligence that rely on monolingual resources are therefore limited to the language in which the ontology labels are stored. For this reason, ontologies need to be translated into different languages. Another important reason to translate ontologies is that they may already exist in different languages, but unless their labels are aligned across languages, the ontologies cannot be aligned, compared or extended. This dissertation examines the translation of domain-specific expressions represented in semantically structured resources or documents. The main challenge in translating ontologies is to disambiguate an ontology label with respect to the domain, which is defined by the ontology itself. Since manual translation of ontologies is very time-consuming and expensive, this work presents a domain-aware machine translation system that translates the labels automatically. As ontologies may change over time, a machine translation system that can be adapted to an ontology can be very beneficial. Since domain-specific expressions also occur in semantically unstructured resources, i.e. documents, this work also provides insights into translating the terms within their context. Unlike in ontology translation, the terms in documents have to be identified and treated as one lexical entity to ensure satisfactory translations.
Attribution-NonCommercial-NoDerivs 3.0 Ireland