The Cardamom workbench for historical and under-resourced languages
Publication Date
2023-09
Type
conference paper
Citation
Adrian Doyle, Theodorus Fransen, Bernardo Stearns, John P. McCrae, Oksana Dereza, and Priya Rani. 2023. The Cardamom Workbench for Historical and Under-Resourced Languages. In Proceedings of the 4th Conference on Language, Data and Knowledge, pages 109–120, Vienna, Austria. NOVA CLUNL, Portugal.
Abstract
This paper describes the creation of a workbench tool designed to make technologies developed over the lifespan of the Cardamom project easily accessible to researchers who could most benefit from them, but who may not have the technical expertise to apply cutting-edge technologies to their own datasets. The workbench provides an intuitive graphical user interface (GUI) and workflow that shield users from the underlying technical tasks, while giving them access to a suite of powerful NLP tools developed by the Cardamom team. These include tokenisers, POS-taggers, various annotation tools, and ML models. The performance of workbench tools can be improved as users add text and annotations. It is envisioned that this workbench will provide a simple route to digital publication for academics in the humanities, and more specifically for linguists working with under-resourced or historical languages, who have collected text data but are unable to make it available online because of financial or technical constraints. This has the added benefit of increasing the availability of high-quality annotated text data to NLP researchers, thereby providing value to both communities of researchers.
Publisher
Association for Computational Linguistics
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International