Semantic network analysis for unsupervised topic linking and labelling

Hulpus, Ioana
Abstract
We increasingly rely on machines to organise, analyse and summarise the vast amount of digital text that is being produced at an unprecedented rate. At the same time, structured knowledge that is understandable by both humans and machines is becoming more widely available. Integrating unstructured text with structured knowledge is crucial for availing of the knowledge contained in text. The research questions tackled in this thesis are essential for understanding how applications can effectively link text elements to external background knowledge, and how this background knowledge can help humans interpret vast text collections. Towards this goal, this thesis deals primarily with two core problems: word-sense disambiguation and topic labelling.
Word-sense disambiguation is a fundamental problem for most systems that need to integrate text and background knowledge. In this thesis, we investigate two scenarios for word-sense disambiguation. The first scenario, disambiguation with multiple sense inventories simultaneously, has not been addressed before. We tackle it by proposing a versatile disambiguation approach that requires only a short textual definition of each word sense. The second scenario addresses word-sense disambiguation with a pre-given semantic graph, DBpedia. We propose a new disambiguation algorithm that relies solely on graph proximity to solve this problem. The novelty is that no previous work has taken a semantic graph approach to disambiguation with DBpedia.
The second core problem this thesis tackles is topic labelling. Topic labelling is necessary for displaying text mining results in a human-interpretable way. Broadly, its goal is to find a phrase that captures the essence (gist) of a group of related words (topic). Our approach exploits the structure of the DBpedia semantic graph to solve this problem.
The unifying high-level hypothesis behind our research is that the structural properties of concepts reveal their semantic properties. All our findings show a substantial correspondence between distributional semantics and the semantics captured in the structure of semantic networks. This opens new opportunities for integrating knowledge extracted from text through text mining with background knowledge, as well as for leveraging the benefits of this integration. Throughout this thesis we evaluate our proposed methods through user studies, compare their performance with related work and discuss our findings.
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland