Publication

Learning content patterns from linked data

Muñoz, Emir
Citation
Muñoz, Emir. (2014). Learning content patterns from linked data. Paper presented at the Proceedings of the Second International Conference on Linked Data for Information Extraction - Volume 1267, Riva del Garda, Italy.
Abstract
Linked Data (LD) datasets (e.g., DBpedia, Freebase) are used in many knowledge extraction tasks because of the wide variety of domains they cover. Unfortunately, many of these datasets do not describe their properties and classes, which limits users' ability to understand, reuse or enrich them. This work attempts to fill part of this gap by presenting an unsupervised approach to discover syntactic patterns in the properties used in LD datasets. The approach produces a content patterns database, generated from the textual data (content) of properties, that describes the syntactic structures each property has. Our analysis enables (i) human understanding of the syntactic patterns of properties in an LD dataset, and (ii) a structural description of properties that facilitates their reuse or extension. Results over the DBpedia dataset also show that our approach enables (iii) the detection of data inconsistencies, and (iv) the validation and suggestion of new values for a property. We also outline how the resulting database can be exploited in several information extraction use cases.
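The abstract does not specify how patterns are represented, so the following is only a minimal illustrative sketch of the general idea, not the paper's method. It assumes a simple character-class encoding (runs of letters become `A`, runs of digits become `N`, whitespace becomes a single space, punctuation is kept literally) and counts the resulting patterns per property. The function names, the `min_share` threshold and the `birthDate` sample values are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import groupby


def char_class(ch):
    """Coarse character class; an assumed encoding, not taken from the paper."""
    if ch.isdigit():
        return "N"
    if ch.isalpha():
        return "A"
    if ch.isspace():
        return " "
    return ch  # punctuation is kept as-is


def content_pattern(value):
    """Collapse each run of same-class characters into one symbol."""
    return "".join(cls for cls, _run in groupby(value, key=char_class))


def learn_patterns(pairs):
    """Build a property -> Counter(pattern) table from (property, literal) pairs."""
    db = defaultdict(Counter)
    for prop, value in pairs:
        db[prop][content_pattern(value)] += 1
    return db


def is_unusual(db, prop, value, min_share=0.1):
    """Flag a value whose pattern is rare (or unseen) for the given property."""
    counts = db.get(prop)
    if not counts:
        return True
    total = sum(counts.values())
    return counts[content_pattern(value)] / total < min_share


# Hypothetical sample data, for illustration only.
SAMPLE = [
    ("birthDate", "1987-05-12"),
    ("birthDate", "1990-11-03"),
    ("birthDate", "2001-01-30"),
    ("birthDate", "May 1987"),
]

patterns_db = learn_patterns(SAMPLE)
```

With a table like this, `is_unusual(patterns_db, "birthDate", "unknown")` flags a value whose pattern was never observed for the property, which loosely corresponds to uses (iii) and (iv) in the abstract: spotting inconsistent values and checking candidate new ones.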
Publisher
CEUR-WS.org
Publisher DOI
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland