Publication

Bragrinews: um corpus temporal-causal (português-brasileiro) para a agricultura

Drury, Brett
Fernandes, Robson
Lopes, Alneu De Andrade
Publication Date
2017-06-28
Type
Article
Citation
Drury, Brett; Fernandes, Robson; Lopes, Alneu De Andrade (2017). Bragrinews: um corpus temporal-causal (português-brasileiro) para a agricultura. Linguamática 9 (1), 41-54
Abstract
Interest in applying machine learning and artificial intelligence to agricultural problems has recently increased sharply in both academia and industry. Text mining and related natural language processing techniques, however, have rarely been used to tackle agricultural problems, and at the time of writing there was only a single such project in the Portuguese language. The scarcity of text mining work on Portuguese agricultural texts may be due to a lack of freely available corpora. To address the lack of a Portuguese-language, agriculture-centric corpus, we are releasing a Brazilian Portuguese agricultural language resource, which this paper describes. The corpus is partially non-contiguous and spans the period from 1996 to 2016. It consists of news stories scraped from Brazilian news sites and annotated with the following information types: causal information, sentiment, and named entities, which include temporal expressions. The corpus also comes with additional resources, such as a treebank; lists of frequent unigrams, bigrams and trigrams; and words or phrases that journalists have identified as either "important" or domain-specific. It is hoped that the release of this corpus will stimulate the adoption of text mining in agriculture in the Lusophone research community.
Publisher
University of Minho
Publisher DOI
10.21814/lm.9.1.245
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland