SemanTex: semantic text exploration using document links implied by conceptual networks extracted from the texts
Aldarra, Suad ; Muñoz, Emir ; Vandenbussche, Pierre-Yves ; Nováček, Vít
Publication Date
2014
Type
Conference Paper
Citation
Suad Aldarra, Emir Muñoz, Pierre-Yves Vandenbussche, and Vít Nováček. 2014. SemanTex: semantic text exploration using document links implied by conceptual networks extracted from the text. In Proceedings of the 2014 International Conference on Posters & Demonstrations Track - Volume 1272 (ISWC-PD'14), Matthew Horridge, Marco Rospocher, and Jacco Van Ossenbruggen (Eds.), Vol. 1272. CEUR-WS.org, Aachen, Germany, 345-348.
Abstract
Despite advances in digital document processing, exploring implicit relationships within large amounts of textual resources can still be daunting. This is partly due to the ‘black-box’ nature of most current methods for computing links (i.e., similarities) between documents (cf. [1] and [2]). These methods are mostly based on numeric computational models such as vector spaces or probabilistic classifiers. Such models may perform well according to standard IR evaluation methodologies, but can be sub-optimal in applications aimed at end users, because the results and their provenance are difficult to interpret [3, 1]. Our Semantic Text Exploration prototype (abbreviated as SemanTex) aims at finding implicit links within a corpus of textual resources (such as articles or web pages) and exposing them to users in an intuitive front-end. We discover the links by: (1) finding concepts that are important in the corpus; (2) computing relationships between the concepts; (3) using the relationships to find links between the texts. The links are annotated with the concepts from which the particular connection was computed. Apart from presenting the links to human users for manual exploration in the SemanTex interfaces, we are working on representing the semantically annotated links between textual documents in RDF and exposing the resulting datasets for particular domains (such as PubMed or New York Times articles) as a part of the Linked Open Data cloud.
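The three steps named in the abstract can be illustrated with a small Python sketch. The abstract does not say how SemanTex scores concepts or relationships, so this is not the authors' method: document frequency and within-document co-occurrence are stand-in assumptions, and the function names and toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_concepts(docs, k):
    """Step 1 (assumption): the k most document-frequent tokens act as concepts."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    # Sort by frequency, then alphabetically, so the result is deterministic.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def concept_relations(docs, concepts):
    """Step 2 (assumption): two concepts are related if they co-occur in a document."""
    related = Counter()
    for tokens in docs.values():
        present = sorted(concepts & set(tokens))
        for a, b in combinations(present, 2):
            related[(a, b)] += 1
    return related


def document_links(docs, concepts, relations):
    """Step 3: link documents through shared or related concepts.

    Each link is annotated with the concepts it was computed from,
    so a reader can see why two documents are connected.
    """
    doc_concepts = {d: concepts & set(t) for d, t in docs.items()}
    links = {}
    for d1, d2 in combinations(sorted(docs), 2):
        annotation = doc_concepts[d1] & doc_concepts[d2]
        for a in doc_concepts[d1]:
            for b in doc_concepts[d2]:
                if a != b and tuple(sorted((a, b))) in relations:
                    annotation = annotation | {a, b}
        if annotation:
            links[(d1, d2)] = annotation
    return links


# Hypothetical toy corpus: document id -> list of tokens.
docs = {
    "d1": ["aspirin", "pain"],
    "d2": ["aspirin", "ibuprofen"],
    "d3": ["ibuprofen", "fever"],
}
concepts = top_concepts(docs, k=10)
relations = concept_relations(docs, concepts)
links = document_links(docs, concepts, relations)
```

In this toy corpus, d1 and d3 share no term, yet they are linked because "aspirin" and "ibuprofen" co-occur in d2. The annotation on that link names both concepts, which is the kind of traceable provenance the abstract contrasts with black-box similarity scores.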
Publisher
ACM
CEUR-WS.org
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland