Diffusion-based models for semantic relatedness

Torres-Tramón, Pablo
The assessment of semantic relatedness for a given pair of entities in a knowledge graph has become a critical step in a wide variety of artificial intelligence tasks, in fields including but not restricted to machine learning, natural language processing, and information retrieval. Semantic relatedness is a generalisation of semantic similarity: entities are assessed by virtue of their relationships in the knowledge graph rather than by their inherent similarity. Semantic relatedness measures have been widely addressed in the research literature, which has produced a wide range of relatedness functions. These functions require enumerating the paths between entities, following a simple yet powerful intuition: the more paths connect two entities, the more related they are. However, extracting paths from knowledge graphs is computationally expensive. Since the number of paths increases exponentially with the number of edges, a denser graph reduces the tractability of the assessment. This issue becomes critical in online services, where computational bottlenecks can become a source of failures and delays. In this thesis, we introduce an approach to semantic relatedness based on diffusion processes over knowledge graphs. We argue that diffusion processes can replace paths as the source of semantics without affecting the performance of these measures in real-world applications. We formalise this form of relatedness and compare the resulting measures against their path-based counterparts. We also study methods for computing diffusion in large knowledge graphs. Our findings show that diffusion-based models behave similarly to path-based ones in terms of the ranking of entity pairs and have better computational performance. We evaluate the computational cost of our models and give recommendations for building real-world applications from them.
To this end, we tested and evaluated our model in two relevant applications: entity retrieval in knowledge graphs and entity linking over streams of text. In the former, we introduced a two-stage retrieval model that combines a standard information retrieval model with a re-ranking function based on our form of relatedness. In the latter, diffusion-based relatedness models allowed our system to overcome the computational bottleneck of linking entities in microblog posts and to produce annotations in the text.
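The contrast between path-based and diffusion-based relatedness described in the abstract can be illustrated with a minimal sketch. It is not the formalisation used in the thesis: the abstract does not name a specific diffusion process, so a random walk with restart (in the style of personalised PageRank) is assumed here as one common choice. The function names, the `restart` and `iterations` parameters, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def count_simple_paths(graph, source, target, max_len):
    """Path-based baseline: count simple paths with at most max_len edges.

    The depth-first enumeration is what becomes expensive on dense graphs.
    """
    def dfs(node, visited, remaining):
        if node == target:
            return 1
        if remaining == 0:
            return 0
        total = 0
        for neighbour in graph[node]:
            if neighbour not in visited:
                total += dfs(neighbour, visited | {neighbour}, remaining - 1)
        return total

    return dfs(source, {source}, max_len)


def diffusion_scores(graph, source, restart=0.15, iterations=50):
    """Assumed diffusion process: random walk with restart from `source`.

    Each iteration touches every edge once, so the cost is linear in the
    number of edges per iteration instead of growing with the number of paths.
    """
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        updated[source] += restart  # restart mass returns to the source
        for node, mass in scores.items():
            share = (1.0 - restart) * mass / len(graph[node])
            for neighbour in graph[node]:
                updated[neighbour] += share
        scores = updated
    return scores


# Hypothetical toy knowledge graph: a 4-cycle a-b-d-c plus a pendant node e.
toy_graph = build_graph(
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
)
paths_to_d = count_simple_paths(toy_graph, "a", "d", max_len=3)
relatedness_from_a = diffusion_scores(toy_graph, "a")
```

In this sketch a single diffusion run from one entity yields a score for every other entity at once, which could then be used to rank candidate entities, whereas the path-based count has to be repeated for each pair.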
NUI Galway
Attribution-NonCommercial-NoDerivs 3.0 Ireland