Publication

Loose coupling in heterogeneous event-based systems via approximate semantic matching and dynamic enrichment

Hasan, Souleiman
Abstract
There has been a significant change in the data landscape with the emergence of the Internet of Things (IoT). Tens of billions of devices are expected to connect to the Internet in the coming years within smart buildings, smart grids, smart cities, and cyber-physical systems. A basic requirement to realize the IoT is an infrastructure of sensing and communication solutions. Middleware systems, such as event processing, are also required to abstract application developers from the underlying technologies. Large-scale event processing environments are open, distributed, and heterogeneous in semantics and contexts. Interoperability is a key requirement and is currently addressed by top-down granular agreements represented by ontologies and taxonomies for semantics. Such approaches are non-scalable, and achieving such agreements may be unfeasible under the characteristics of current and future event environments such as the IoT. This thesis analyses this problem using a decoupling versus coupling trade-off framework. Event producers and consumers do not know each other and are decoupled in space, time, and synchronization to enable scalable deployments. They have boundaries that they have to cross in order to communicate with other systems. Such boundaries are syntactic, semantic, and pragmatic. Events are boundary objects that convey meanings signified by symbols. They must effectively cross the three levels of boundaries to establish interoperability and communication between event agents. The current event processing paradigm is focused on crossing lower syntactic boundaries. Thus, human agents are needed in the loop to cross semantic and pragmatic boundaries through explicit agreements on event types, properties, values, and contexts, introducing coupling into these systems. Coupling limits the paradigm and contradicts the fundamental basis of decoupling for scalability. 
A trade-off can be concluded between decoupling for scalability and coupling for interoperability. Space, time, and synchronization decoupling dimensions of event systems contribute to event transfer. I define two new types of problematic coupling dimensions: semantic coupling and pragmatic coupling. They correspond to granular and labour-intensive agreements on event semantics and contexts by humans involved in developing and using the event system. Such agreements may not be feasible in large-scale environments such as the IoT. Current approaches to semantic and context interoperability in event processing are coupled on one or more of these two dimensions, limiting scalability. This thesis addresses two research questions on how semantic and pragmatic coupling can be loosened effectively and efficiently. I propose an approach based on four elements: subsymbolic semantics, free tagging, dynamic native enrichment, and approximation. A statistical vector-space model of semantics is built from a textual corpus that reflects the mutual understanding of event producers and consumers. Subscriptions are consumers' expressions to match events of interest. Free tags, called themes, are added to events and subscriptions to clarify their meanings. Subscriptions are enhanced with indications of context to dynamically enrich events. Terms in events and subscriptions are decoded into their subsymbolic vector representations that are then matched using an approximate probabilistic matcher, resulting in scored relevance of events to subscriptions. The hypotheses underlying the proposed approach are empirically validated within synthetic and real-world scenarios from the smart cities and energy management domains. A loose semantic coupling can be achieved with coarse-grained agreements on statistical semantics, with 100 approximate subscriptions compensating for 74,000 exact subscriptions otherwise needed. 
The approximate matcher achieves a throughput on the order of 1,000 events/sec and an effectiveness of over 95% F1-measure. Using thematic tagging, only a small number of tags is needed: around 2-7 for events and 2-15 for subscriptions. It delivers a throughput on the order of 800 events/sec in the worst case and 85% F1-measure, as opposed to a worst case of 62% for non-thematic processing. Loose pragmatic coupling is achieved with 4 high-level clauses in the subscriptions to guide the dynamic enricher. They specify the source, the retrieval method, the context search strategy, and the fusion method of events with context. Enrichment is instantiated with spreading activation in Linked Data graphs. It is tested with 24,000 events, with live DBpedia, a structured version of Wikipedia, as a contextual source. It is 7 times more efficient and effective than other instantiations of the enricher. The research discussed in this thesis has been deployed in working systems for energy and water management, where it has had an impact on real-world applications. The model has also been developed into the concept of thingsonomies, an architecture for the Internet of Things that can tackle variety and allows IoT systems to evolve into large-scale, heterogeneous, and loosely coupled environments.
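To make the idea of approximate semantic matching in the abstract more concrete, the following is a minimal illustrative sketch and not the thesis's matcher. It assumes a tiny hypothetical corpus, builds term-document count vectors from it, and scores an event against a subscription as the product, over subscription predicates, of the best attribute and value relatedness found in the event. The names `CORPUS`, `build_vectors`, `relatedness`, and `match_score` are invented for illustration; themes and dynamic enrichment are left out, and cosine similarity stands in for the probabilistic matcher.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the shared textual corpus.
CORPUS = [
    "energy consumption of the laptop computer in the office room",
    "power usage of the computer and laptop device in the building",
    "energy usage and power consumption of the device",
    "the room in the office building",
    "street parking space in the city",
]


def build_vectors(corpus):
    """Map each term to a vector of per-document occurrence counts."""
    vectors = defaultdict(Counter)
    for doc_id, text in enumerate(corpus):
        for term in text.split():
            vectors[term][doc_id] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relatedness(vectors, a, b):
    """Relatedness in [0, 1]; identical terms always score 1.0."""
    if a == b:
        return 1.0
    return cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))


def match_score(vectors, event, subscription):
    """Product over subscription predicates of the best-matching event pair."""
    score = 1.0
    for s_attr, s_val in subscription.items():
        score *= max(
            relatedness(vectors, s_attr, e_attr) * relatedness(vectors, s_val, e_val)
            for e_attr, e_val in event.items()
        )
    return score


VECTORS = build_vectors(CORPUS)
event = {"device": "laptop", "measure": "energy"}
exact = match_score(VECTORS, event, {"device": "laptop", "measure": "energy"})
approximate = match_score(VECTORS, event, {"device": "computer", "measure": "power"})
unrelated = match_score(VECTORS, event, {"device": "parking", "measure": "power"})
```

In this sketch a subscription written with related but different terms still receives a non-zero relevance score, while one with an unrelated term scores zero. A consumer could then apply a relevance threshold of its own choosing to the scored events.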
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland