Publication

Enhancing knowledge graph completion models and selected biological applications

Mohamed, Sameh K.
Abstract
Knowledge completion is the task of extending knowledge graphs to enhance the quality of systems relying on them. In recent years, various knowledge completion techniques were developed to model knowledge graphs using different kinds of features such as graph features and embeddings. These models showed complementary capabilities, where graph feature models excelled in terms of interpretability and knowledge graph embedding models excelled in terms of accuracy and scalability. Despite the advances achieved by these models in extending knowledge graphs, they still have limited predictive accuracy. The evaluation of the capabilities of these models was also limited to standard benchmarks with no real use case scenarios. In this thesis, we study both graph feature models and knowledge graph embedding models and their use in extending knowledge graphs, and we propose new models for both approaches. We also present an evaluation of the capabilities of knowledge graph embedding models in multiple real-life biological use cases. First, we examine the current limitation of the poor feature representations in graph feature models and we propose a new graph feature model, the DSP model, which offers richer feature representations. We show by experimental evaluation that our newly proposed model outperforms the current state-of-the-art models on a standard NELL-based benchmark with no added computational cost. Second, we study knowledge graph embedding models, where we investigate their training pipeline and examine its different paths and their effects on the models' accuracy and scalability. We then propose a new tensor factorisation-based knowledge graph embedding model, the TriVec model, which models embeddings using multiple vectors. We show that this representation allows our model to dynamically encode embedding interactions of different types of symmetric and asymmetric relationships, which results in accuracy improvements.
We show by experimental evaluation on different standard benchmarks that our model outperforms other state-of-the-art methods in terms of accuracy. We also study the potential uses of knowledge graph embedding models in biological use cases, where we demonstrate their different capabilities in predicting links in biological networks, measuring similarity between biological concepts and clustering biological entities. We then present three use case scenarios of the use of knowledge graph embedding models in predicting drug protein targets, polypharmacy side-effects and tissue-specific protein functions, where we show that knowledge graph embedding models, represented by our newly proposed model, the TriVec model, outperform state-of-the-art techniques in these use cases.
Publisher
NUI Galway
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland