Enhancing statistical machine translation with bilingual terminology in a CAT environment
Arcan, Mihael ; Turchi, Marco ; Tonelli, Sara ; Buitelaar, Paul
Identifiers
http://hdl.handle.net/10379/14924
https://doi.org/10.13025/20976
Repository DOI
Publication Date
2014-10-22
Type
Conference Paper
Citation
Arcan, Mihael, Turchi, Marco, Tonelli, Sara, & Buitelaar, Paul. (2014). Enhancing statistical machine translation with bilingual terminology in a CAT environment. Paper presented at the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014), Vancouver, Canada, 22-26 October.
Abstract
In this paper, we address the problem of extracting bilingual terminology and integrating it into a Statistical Machine Translation (SMT) system in a Computer-Aided Translation (CAT) tool scenario. We develop a framework that takes a small amount of parallel in-domain data as input, gathers domain-specific bilingual terms, and injects them into an SMT system to enhance translation productivity. To this end, we investigate several strategies for extracting and aligning bilingual terminology and for embedding it into the SMT system. We compare two embedding methods that can easily be used at run-time without altering the normal activity of an SMT system: XML markup and the cache-based model. We tested our framework on two different domains, showing improvements of up to 15% BLEU score points.
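As a rough illustration of the XML markup method named in the abstract, the sketch below wraps known source-side terms in inline markup that carries a suggested translation, so that a decoder supporting such input can use it at run-time. This is not the authors' implementation: the function name `annotate_terms`, the tag name `term`, the `translation` attribute, and the example term pairs are all assumptions made for illustration, loosely following the inline-markup convention that some phrase-based decoders (for example Moses) accept.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def annotate_terms(sentence, term_table, tag="term"):
    """Wrap known source terms in XML markup carrying a suggested translation.

    term_table maps source-language terms to target-language terms.
    The tag and attribute names are hypothetical, not taken from the paper.
    """
    if not term_table:
        return escape(sentence)

    # Longest terms first, so a multi-word term wins over its sub-terms.
    terms = sorted(term_table, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {src.lower(): tgt for src, tgt in term_table.items()}

    pieces, last = [], 0
    for match in pattern.finditer(sentence):
        # Copy the text before the term, escaping XML special characters.
        pieces.append(escape(sentence[last:match.start()]))
        translation = lookup[match.group(1).lower()]
        pieces.append(
            f"<{tag} translation={quoteattr(translation)}>"
            f"{escape(match.group(1))}</{tag}>"
        )
        last = match.end()
    pieces.append(escape(sentence[last:]))
    return "".join(pieces)


# Hypothetical English-German term pairs, for illustration only.
terms = {
    "hard disk": "Festplatte",
    "hard disk drive": "Festplattenlaufwerk",
}
print(annotate_terms("Replace the hard disk drive now", terms))
```

Matching the longest term first is one simple way to resolve overlaps between nested terms; the paper's own extraction and alignment strategies may handle this differently.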
Publisher
Association for Machine Translation in the Americas
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland