CoFiF: A corpus of financial reports in French language
Ahmadi, Sina ; Daudert, Tobias
Publication Date
2019-08-12
Type
Workshop paper
Citation
Ahmadi, Sina, & Daudert, Tobias. (2019). CoFiF: A corpus of financial reports in French language. Paper presented at the First Workshop on Financial Technology and Natural Language Processing (FinNLP), Macao, China, 12 August. https://doi.org/10.13025/zjf2-fn10
Abstract
In an era when machine learning and artificial intelligence have huge momentum, the demand for data to train and test models is steadily growing. We introduce CoFiF, the first corpus comprising company reports in the French language. It contains over 188 million tokens in 2655 reports, covering reference documents and annual, semi-annual, and quarterly reports. Our main focus is on the 60 largest French companies listed in France's main stock indices, CAC40 and CAC Next 20. The corpus spans over 20 years, ranging from 1995 to 2018. To evaluate this novel collection of organizational writing, we use CoFiF to generate two character-level language models, a forward and a backward one, which we use to demonstrate the corpus's potential for business, economics, and management research in the French language.
Publisher
NUI Galway
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland