A Hybrid Framework for Querying Linked Data Dynamically

Umbrich, Jürgen
Abstract
As of today, the Web has evolved to become the largest collection of information made available by mankind. Researchers and developers are continuously working on transforming this loosely connected data collection into a giant knowledge base. As part of this trend, the Semantic Web community has started a movement to transform the Web of unstructured text into the so-called 'Web of Data': a framework to create, share and reuse data by humans and machines alike across application, enterprise, and community boundaries. From this movement, Linked Data has emerged as a set of best practices to publish, connect and discover structured data on the Web using standard formats. Currently, there are over thirty billion public facts which can be accessed, reused and combined by individuals as well as organisations and companies. As the Web of Data continues to expand and diversify, it becomes more and more dynamic, with data being constantly generated, removed and updated, e.g., from sensor/stream sources. New querying techniques are required to efficiently keep up with this trend. While traditional approaches facilitate fast query times by replicating Web data in optimised offline index structures, they cannot deal efficiently with dynamic data and cannot guarantee up-to-date results. A new generation of distributed Linked Data query engines addresses this problem and delivers up-to-date results by retrieving query-relevant data immediately before or during query execution. However, fetching data at runtime from potentially hundreds or thousands of relevant Web sources is slow compared to optimised index lookups. This thesis studies and improves distributed query approaches for Linked Data and develops a hybrid query framework that offers fresh and fast query results by combining centralised and distributed query techniques with a novel query planning approach based on knowledge about the dynamicity of data.
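The freshness problem can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each source changes according to a Poisson process with a known rate λ per day. This is a common modelling assumption for Web change, not a model taken from this thesis, and the function names and change rates are hypothetical. Under that assumption, a replicated copy that is t days old still matches its source with probability exp(-λt), so the expected number of stale sources in an index snapshot grows with both the change rates and the snapshot age.

```python
import math

def p_unchanged(rate_per_day: float, age_days: float) -> float:
    """Probability that a source has not changed since it was copied
    age_days ago, under the assumed Poisson change model exp(-rate * age)."""
    return math.exp(-rate_per_day * age_days)

def expected_stale(rates_per_day: list[float], age_days: float) -> float:
    """Expected number of sources whose replicated copy is out of date."""
    return sum(1.0 - p_unchanged(r, age_days) for r in rates_per_day)

# Hypothetical change rates: a static vocabulary, a weekly-updated dataset,
# and a sensor feed that changes several times per day.
rates = [0.0, 1 / 7, 5.0]
for age in (0.0, 1.0, 7.0):
    print(f"snapshot age {age:>4} days: {expected_stale(rates, age):.2f} stale sources")
```

With these hypothetical rates, the static and weekly sources remain mostly current after one day, while the sensor feed is almost certainly stale; that gap is what motivates retrieving highly dynamic data live instead of relying on an offline copy.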
We start by identifying the different levels of dynamicity within Linked Data and highlight the challenges for centralised query approaches to deliver up-to-date results when operating over such dynamic data. We then present a study of link-traversal-based query execution approaches for Linked Data and show how query performance can be improved by providing reasoning extensions. We also develop an approximate index structure that summarises the graph-structured content of Web sources, and provide an algorithm that exploits this source summary index. Finally, we propose and evaluate a novel hybrid query engine framework that combines the execution strength of materialised query approaches with the live results from distributed query approaches. The query planning phase uses a cost model that combines standard selectivity and novel dynamicity estimates to enable fast and fresh results.
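A minimal sketch of how selectivity and dynamicity estimates might jointly drive hybrid query planning follows. It is not the cost model developed in the thesis: the `TriplePattern` and `plan` names, the threshold `max_dynamicity`, the per-match penalty `live_cost_factor`, and all estimates are hypothetical. Patterns whose matches are expected to change often are routed to live retrieval, the rest are answered from the materialised index, and the plan evaluates the patterns with the lowest estimated cost first.

```python
from dataclasses import dataclass

@dataclass
class TriplePattern:
    text: str           # the pattern, e.g. "?s ex:reading ?v"
    cardinality: int    # estimated number of matches (selectivity estimate)
    dynamicity: float   # estimated fraction of matches changed since the index snapshot, 0..1

def plan(patterns: list[TriplePattern],
         max_dynamicity: float = 0.1,
         live_cost_factor: float = 10.0) -> list[tuple[str, str]]:
    """Assign each pattern to 'index' or 'live' and order the plan by estimated cost.

    Hypothetical rule: patterns whose dynamicity exceeds max_dynamicity are
    fetched live; each live match is assumed to cost live_cost_factor times
    more than an index match.
    """
    steps = []
    for p in patterns:
        source = "live" if p.dynamicity > max_dynamicity else "index"
        cost = p.cardinality * (live_cost_factor if source == "live" else 1.0)
        steps.append((cost, source, p.text))
    steps.sort()  # lowest estimated cost first
    return [(source, text) for _, source, text in steps]

# Hypothetical query over sensor data with made-up estimates.
query = [
    TriplePattern("?s rdf:type ex:Sensor", 500, 0.01),
    TriplePattern("?s ex:reading ?v", 2000, 0.9),
    TriplePattern("?s ex:locatedIn ex:Room1", 40, 0.0),
]
for source, text in plan(query):
    print(source, text)
```

A fixed threshold keeps the sketch short; a fuller cost model would weigh the expected staleness of index answers against the latency of live lookups rather than applying a single cut-off.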
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland