Publication

KnowZRel: Common sense knowledge-based zero-shot relationship retrieval for generalised scene graph generation

Khan, Muhammad Jaleed
Breslin, John G.
Curry, Edward
Citation
Khan, M. J., Breslin, J. G., & Curry, E. (2025). KnowZRel: Common Sense Knowledge-based Zero-Shot Relationship Retrieval for Generalised Scene Graph Generation. IEEE Transactions on Artificial Intelligence, 1-11. https://doi.org/10.1109/TAI.2025.3544177
Abstract
A scene graph is a key image representation in visual reasoning. The generalisability of Scene Graph Generation (SGG) methods is crucial for reliable reasoning and real-world applicability. However, imbalanced training datasets limit this, underrepresenting meaningful visual relationships. Current SGG methods using external knowledge sources face limitations due to these imbalances or restricted relationship coverage, impacting their reasoning and generalisation capabilities. We propose a novel neurosymbolic approach that integrates data-driven object detection with heterogeneous knowledge graph-based object refinement and zero-shot relationship retrieval, highlighting the loosely coupled synergy between neural and symbolic components. This combination addresses the limitations of imbalanced training datasets in scene graph generation and enables effective prediction of unseen visual relationships. Objects are detected using a region-based deep neural network and refined based on their positional and structural similarity, followed by retrieval of pairwise visual relationships using a heterogeneous knowledge graph. The redundant and irrelevant visual relationships are discarded based on the similarity of relationship labels and node embeddings. Finally, the visual relationships are interlinked to generate the scene graph. The employed heterogeneous knowledge graph combines diverse knowledge sources, offering rich common sense knowledge about objects and their interactions in the world. Our method, evaluated using the benchmark Visual Genome dataset and zero-shot recall (zR@K) metric, shows a 59.96% improvement over existing state-of-the-art methods, highlighting its effectiveness in generalised SGG. The object refinement step effectively improved the object detection performance by 57.1%. Additional evaluation using the GQA dataset confirms the cross-dataset generalisability of our method. We also compared various knowledge sources and embedding models to determine an optimal combination for zero-shot SGG. The source code is available at https://github.com/jaleedkhan/zsrr-sgg.
Publisher
Institute of Electrical and Electronics Engineers
Publisher DOI
Rights
Attribution 4.0 International