Neurosymbolic visual reasoning with scene graph enrichment
Khan, Muhammad Jaleed
Identifiers
http://hdl.handle.net/10379/18065
Repository DOI
https://doi.org/10.13025/17247
Publication Date
2024-02-26
Type
Thesis
Abstract
Visual reasoning is a critical component of artificial intelligence that aims to understand, interpret, and reason about complex visual content. It is inherently interdisciplinary, drawing on visual feature extraction and image generation from computer vision, linguistic feature extraction and language generation from natural language processing, and graph-based representation and semantic enrichment from knowledge representation and reasoning. Data-centric visual reasoning techniques often struggle to interpret visual content intuitively because their scene representations have limited expressiveness and generalisability. We propose a knowledge-enhanced neurosymbolic visual reasoning framework based on scene graph enrichment. The framework uses deep learning for object detection and relationship prediction to generate scene graph representations of visual content, which are then refined and semantically enriched with common sense knowledge extracted from a heterogeneous knowledge graph. The enriched scene graphs are used in downstream visual reasoning tasks, including image captioning, visual question answering and image generation. A comprehensive experimental analysis on standard datasets and evaluation benchmarks demonstrates considerable improvement over existing state-of-the-art methods in relationship recall rate, image captioning quality, question answering accuracy and image generation realism. These results validate the effectiveness of leveraging heterogeneous common sense knowledge for enhanced scene understanding and visual reasoning.
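To make the refine-then-enrich idea more concrete, here is a minimal toy sketch. It assumes that a scene graph is a set of (subject, predicate, object) triples carrying confidence scores from a relationship predictor, and it stands in a small in-memory set of triples for the heterogeneous knowledge graph. The names `SceneGraph`, `enrich`, `boost` and `threshold` are hypothetical. This rule-based version is not the method described in the thesis. It only shows the shape of the data flow: predicted relations are refined against the knowledge graph, then common sense edges are attached to the detected objects.

```python
from dataclasses import dataclass, field

# Hypothetical illustration only: these data structures are assumptions,
# not the representation used in the thesis.
Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class SceneGraph:
    objects: set[str] = field(default_factory=set)
    # Visual relations with confidences from an assumed relationship predictor.
    relations: dict[Triple, float] = field(default_factory=dict)
    # Common sense edges attached from the knowledge graph.
    knowledge: set[Triple] = field(default_factory=set)


def enrich(sg: SceneGraph, kg: set[Triple],
           boost: float = 0.2, threshold: float = 0.5) -> SceneGraph:
    """Refine predicted relations and attach common sense edges (toy rules)."""
    refined: dict[Triple, float] = {}
    for triple, score in sg.relations.items():
        # Refinement: raise the confidence of relations the knowledge graph supports.
        if triple in kg:
            score = min(1.0, score + boost)
        # Keep only relations that clear the threshold after refinement.
        if score >= threshold:
            refined[triple] = score
    # Enrichment: attach KG facts whose subject is a detected object,
    # skipping facts already present as visual relations.
    knowledge = {t for t in kg if t[0] in sg.objects} - set(refined)
    return SceneGraph(objects=set(sg.objects), relations=refined, knowledge=knowledge)


if __name__ == "__main__":
    kg = {("person", "rides", "horse"), ("horse", "is_a", "animal"),
          ("person", "capable_of", "walking")}
    sg = SceneGraph(
        objects={"person", "horse"},
        relations={("person", "rides", "horse"): 0.4,
                   ("horse", "wears", "person"): 0.3},
    )
    out = enrich(sg, kg)
    print(out.relations)
    print(sorted(out.knowledge))
```

In this toy run, the knowledge graph supports "person rides horse", so that relation is boosted above the threshold and kept. The unsupported "horse wears person" is dropped. The knowledge graph's facts about `person` and `horse` are attached as separate common sense edges. Keeping visual relations and knowledge edges in separate fields makes it easy for a downstream captioning, question answering or generation model to weight them differently.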
Publisher
NUI Galway