Publication

Semantic knowledge graphs to understand tumor evolution and predict disease survival in cancer

Alokkumar, Jha
Abstract
Genomics technologies have generated large amounts of easily accessible biological -omics data, providing an unprecedented opportunity to study the mechanisms of cancer. However, clinical research and the life sciences critically require a unified, integrated data model to facilitate the prognostic and diagnostic validation of the biomarkers obtained. Based on our research, knowledge graphs emerged as a promising solution for genomics and other -omics datasets. The primary reason for selecting a knowledge graph-based approach is that much of the data comes from single cohorts such as TCGA and ICGC that are carefully constructed to mitigate bias, whereas emerging datasets that support understanding of the complete mechanism are unstructured and held in silos. The larger datasets such as TCGA and ICGC are patient cohorts and have issues ranging from patient self-selection, to confounding by indication, to limited knowledge of outcome data, and can therefore introduce inadvertent bias if used alone for biomarker discovery. However, the inclusion of molecular data such as CNV (copy number variation), DNA methylation, gene expression and mutation data (COSMIC, DoCM, MethylDB) adds mechanistic features to the observational data from these cohorts. We applied semantic web and linked data approaches for knowledge graph embedding and federated networks to handle the rapidly changing information that characterizes disease, specifically cancer. This rapid change plays a critical role in disease progression and disease mechanism. Usually, these mechanisms are explained through biomarkers retrieved by comparative analysis of cancer stages against controls using quantitative gene expression data. Our knowledge graph, however, made it possible to include not only the quantitative data but also the supporting molecular mechanisms needed to understand the change in pattern and its associated factors.
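To illustrate the integration idea described in this paragraph, the sketch below stores cohort observations and molecular annotations as subject-predicate-object triples and answers a simple query that spans both. It is a minimal sketch, not the implementation used in the thesis: every identifier (`patient:P1`, `gene:G1`, the predicate names) and both helper functions are hypothetical, and a production system would more plausibly rely on RDF stores and SPARQL.

```python
# Hypothetical toy triples; all identifiers and predicates are illustrative only.
TRIPLES = {
    ("patient:P1", "hasStage", "stage:II"),
    ("patient:P1", "hasMutationIn", "gene:G1"),
    ("patient:P2", "hasStage", "stage:III"),
    ("patient:P2", "hasMutationIn", "gene:G1"),
    ("patient:P2", "hasMutationIn", "gene:G2"),
    ("gene:G1", "annotatedIn", "source:mutation_catalogue"),
    ("gene:G2", "hasCopyNumberState", "cnv:gain"),
}


def match(triples, s=None, p=None, o=None):
    """Return the sorted triples that agree with every non-None field."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def genes_for_stage(triples, stage):
    """Join cohort data (stage) with molecular data (mutated genes)."""
    patients = {s for s, _, _ in match(triples, p="hasStage", o=stage)}
    return sorted(
        {o for s, _, o in match(triples, p="hasMutationIn") if s in patients}
    )


if __name__ == "__main__":
    # Genes mutated in any patient recorded at the hypothetical stage III.
    print(genes_for_stage(TRIPLES, "stage:III"))
```

Keeping observational and molecular facts in one triple set is what lets a single query cross the two kinds of data; the same join would otherwise require reconciling separate tables.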
Cancer genomic events are layered processes, and prediction models have to accommodate multi-omics data so that each molecular subtype feeds into incremental knowledge. This layered knowledge helps to improve the prediction of clinical outcomes, to elucidate the interplay between different levels, and to support disease modeling through layered data assembly. In our approach, we extended knowledge graphs beyond conventional knowledge enrichment and introduced a pattern mining approach to track disease indicators, for example in cancer, in a continuous way. We introduced a topological motif perturbation approach across disease stages to uncover, through continuous knowledge enrichment, the instances responsible for the change in pattern and thus for disease progression. Further, we applied a GCNN-based (graph convolutional neural network) approach to identify the features required not only to track the disease mechanism but also to predict survival and relapse in patients. We customized the neural network so that, during learning, we could adjust the weight of each dataset or new concept added to the knowledge graph. The customized GCNN not only helped to predict relapse accurately but also helped to dichotomize the most relevant level of each dataset in cancer genomics. The approach was tested and validated across various cancer types and contributed not only a novel way to integrate, understand and predict cancer but also novel biomarkers, e.g. biomarkers in gynecological cancers such as breast, ovarian, cervical and uterine cancer. The biomarkers retrieved through this approach contributed novel information about genes such as MYH7 involved in these cancers. We applied a motif-based pattern mining approach and established the relevant biomarkers to explain cancer progression mechanisms.
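The idea of tracking motif changes across disease stages can be sketched with the simplest topological motif, the triangle. The example below counts, for each node, how many triangles it participates in at two stages and reports the nodes whose count changed. This is an illustrative assumption about what a motif perturbation could look like, not the method defined in the thesis: the function names, the choice of triangles as the motif, and the toy edge lists are all hypothetical.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_counts(edges):
    """Count, for each node, the triangles (3-node motifs) it belongs to."""
    adj = adjacency(edges)
    counts = {node: 0 for node in adj}
    for node, nbrs in adj.items():
        # Two neighbours of `node` that are themselves linked close a triangle.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                counts[node] += 1
    return counts


def motif_perturbation(edges_early, edges_late):
    """Per-node change in triangle participation between two stages."""
    early = triangle_counts(edges_early)
    late = triangle_counts(edges_late)
    delta = {n: late.get(n, 0) - early.get(n, 0) for n in set(early) | set(late)}
    # Keep only nodes whose motif participation actually changed.
    return {n: d for n, d in delta.items() if d != 0}


# Hypothetical stage-specific interaction edges (node names are placeholders).
STAGE_EARLY = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
STAGE_LATE = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("C", "E")]

if __name__ == "__main__":
    print(motif_perturbation(STAGE_EARLY, STAGE_LATE))
```

In this toy case the triangle A-B-C dissolves and the triangle C-D-E forms, so A and B lose a motif while D and E gain one; C takes part in one triangle at both stages and is therefore not reported.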
Lastly, we developed a prediction model for breast and pancreatic cancer, together with clinical indicators. We also contributed COSMIC, TCGA and other RDF datasets to the linked open data (LOD) cloud, with new enriched links from our knowledge graph: "Oncology LOD".
Publisher
NUI Galway
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland