Linked functional annotation for differentially expressed gene (DEG) demonstrated using Illumina Body Map 2.0

Jha, Alokkumar
Khan, Yasar
Iqbal, Aftab
Zappa, Achille
Mehdi, Muntazir
Sahay, Ratnesh
Rebholz-Schuhmann, Dietrich
Jha, Alokkumar, Khan, Yasar, Mehdi, Muntazir, Iqbal, Aftab, Zappa, Achille, Sahay, Ratnesh, & Rebholz-Schuhmann, Dietrich. (2015). Linked Functional Annotation For Differentially Expressed Gene (DEG) Demonstrated using Illumina Body Map 2.0. Paper presented at the International Conference of Semantic Web Applications and Tools for the Life Sciences (SWAT4LS), Cambridge, United Kingdom, 09 December.
Semantic Web technologies are central to the integration of disparate data resources. They can be used to exploit data from next-generation sequencing (NGS) for therapeutic decisions regarding cancer. In this manuscript, we describe how different data resources, which report on the expression of specific genes and their variants in a tissue, can be brought together to indicate a risk of tissue-specific cancer from NGS data. This approach can be used to judge patient genomic data against public reference data resources. The TCGA and COSMIC repositories are processed so that information on gene expression, copy number variants (CNV), and somatic mutations can be connected and queried. We annotated sets of differential expression data from the Illumina Body Map 2.0 (HBM) covering 16 different tissue types and identified genes with an RPKM (Reads Per Kilobase of transcript per Million mapped reads) value greater than 0.5, used as a measure indicating an associated risk of cancer. The differentially expressed genes from HBM can thus be associated with a tissue type and with gene expression in COSMIC and TCGA, leading to potential biomarkers for that tissue-specific cancer. In the case of ovarian cancer, we retrieved the genomic positions (loci) and the associated genes of potential biomarker candidates, and we suggest that this approach and platform can serve future studies well. Altogether, the presented linked annotation platform is the first approach to represent the COSMIC data in RDF format and to link it with the TCGA datasets. The proposed approach enriches mutations by filling in missing links from the COSMIC and TCGA datasets, which in turn helps to map mutations to associated phenotypes.
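The RPKM cutoff mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard RPKM definition (read count scaled by transcript length in kilobases and by millions of mapped reads) together with the 0.5 threshold stated above. The function names, the input layout, and the example gene identifiers and counts are hypothetical and chosen only for illustration; this is not the authors' pipeline.

```python
from typing import Dict, List

# Cutoff stated in the abstract; everything else here is illustrative.
RPKM_THRESHOLD = 0.5


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (length_bp / 1e3) / (total_reads / 1e6) == reads * 1e9 / (length_bp * total_reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def genes_above_threshold(
    counts: Dict[str, int],
    lengths_bp: Dict[str, int],
    total_mapped_reads: int,
    threshold: float = RPKM_THRESHOLD,
) -> List[str]:
    """Return the genes whose RPKM in one tissue sample exceeds the threshold."""
    selected = []
    for gene, n_reads in counts.items():
        if rpkm(n_reads, lengths_bp[gene], total_mapped_reads) > threshold:
            selected.append(gene)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical counts for a single tissue sample (not real HBM data).
    counts = {"GENE_A": 500, "GENE_B": 5}
    lengths_bp = {"GENE_A": 2000, "GENE_B": 5000}
    print(genes_above_threshold(counts, lengths_bp, total_mapped_reads=10_000_000))
```

With these made-up inputs, GENE_A has an RPKM of 25 and GENE_B an RPKM of 0.1, so only GENE_A passes the cutoff. In the setting described above, genes selected this way per tissue would then be the candidates linked to COSMIC and TCGA records.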
Attribution-NonCommercial-NoDerivs 3.0 Ireland