A Native and Adaptive Approach for Linked Stream Data Processing

Le Phuoc, Danh
Sensors, mobile devices and social platforms generate an immense amount of stream data in various formats and schemata. For these areas, the idea of Linked Stream Data is to extend the RDF data model to cope with the heterogeneity of data sources and to enable data integration, not only among these sources but also with other existing sources. This would enable a vast range of new, near real-time applications. Such applications drive the demand for processing engines that support continuous queries over Linked Stream Data and Linked Data. These engines must not only support the necessary functionality but also meet the low-latency response requirement typical of stream processing applications. Since unmodified data stream management systems (DSMSs) and triple stores do not provide the full functionality required for Linked Stream Data processing, a rewriting approach could be used to delegate the processing to those systems. However, this approach suffers from the overhead of data transformation and does not give full control over the query execution process. The overhead might be prohibitively expensive given the low-latency response requirement, and the lack of full control over the execution process restricts optimisations to partial, local ones within each underlying sub-system. Moreover, the graph-based model of RDF data poses many challenges for designing physical storage and optimising processing when it is mapped to a relation-based data model. Nevertheless, most DSMS techniques and algorithms assume that stream data is represented in a relational form. Therefore, algorithms and techniques for DSMSs and triple stores need to be carefully re-engineered to build an efficient and scalable processing engine for Linked Stream Data and Linked Data. In this work, we present an adaptive and native execution framework for Linked Stream Data and Linked Data, called CQELS (Continuous Query Evaluation over Linked Streams).
The framework introduces one of the first continuous query languages over Linked Stream Data and Linked Data that is compatible with SPARQL 1.1. The flexibility of our execution framework enables performance gains of several orders of magnitude over other related systems. To deal with large RDF datasets and RDF streams with high update throughput, we propose an efficient hybrid physical data organisation using novel data structures that support algorithms for efficient incremental evaluation of continuous query operators over Linked Stream Data. The framework also provides several adaptive optimisation algorithms. To demonstrate the performance advantages of the framework and of the CQELS processing engine, the thesis provides extensive experimental evaluations. The evaluations cover a comprehensive set of parameters that dictate the performance of continuous queries over Linked Stream Data and Linked Data.
Attribution-NonCommercial-NoDerivs 3.0 Ireland