NUIG at the FinSBD Task: Sentence boundary detection for noisy financial PDFs in English and French
Daudert, Tobias ; Ahmadi, Sina
Publication Date
2019-08-12
Type
Workshop paper
Citation
Daudert, Tobias, & Ahmadi, Sina. (2019). NUIG at the FinSBD Task: Sentence boundary detection for noisy financial PDFs in English and French. Paper presented at the First Workshop on Financial Technology and Natural Language Processing (FinNLP@IJCAI2019), Macao, China, 12 August, https://doi.org/10.13025/yzq2-dr94
Abstract
Portable Document Format (PDF) has become the industry-standard document format because it is independent of software, hardware, and operating system. Publicly listed companies publish a variety of reports every year and likewise take advantage of PDF. This has led to a rise in PDFs containing valuable financial information and to demand for approaches able to extract this data accurately. Analyzing and mining information requires a challenging extraction phase, particularly with respect to document structure. In this paper, we describe a sentence boundary detection approach capable of extracting complete sentences from unstructured lists of tokens. Our approach is based on the application of a language model and a sequence classifier for both English and French. The results show good performance, achieving F1 scores of 0.855 and 0.91, and placing our team 3rd and 5th for French and English, respectively.
Publisher
NUI Galway
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland