Publication

TED-MWE: a bilingual parallel corpus with MWE annotation: Towards a methodology for annotating MWEs in parallel multilingual corpora

Monti, Johanna
Sangati, Federico
Arcan, Mihael
Publication Date
2015-12-03
Type
Conference Paper
Citation
Monti, Johanna, Sangati, Federico, & Arcan, Mihael. (2015). TED-MWE: a bilingual parallel corpus with MWE annotation: Towards a methodology for annotating MWEs in parallel multilingual corpora. Paper presented at the Second Italian Conference on Computational Linguistics (CLiC-it 2015), Trento, Italy, 3-4 December.
Abstract
The translation of multiword expressions (MWEs) by Machine Translation (MT) remains a major challenge: although MT has improved considerably in recent years, MWE mistranslations still occur very frequently. There is a need to develop large data sets, mainly parallel corpora, annotated with MWEs, since they are useful both for training statistical machine translation (SMT) systems and for evaluating MWE translation quality. This paper describes a methodology for annotating a parallel spoken corpus with MWEs. The dataset used for this experiment is an English-Italian corpus extracted from the TED spoken corpus and complemented by SMT output.
Publisher
Accademia University Press
Publisher DOI
10.4000/books.aaccademia.1514
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland