Publication

Novel Insights into Chromatin Structure and Gene Regulation through Integrative Analysis of High Throughput Genomics Data

Nguyen, Thong
Abstract
Genetics and epigenetics research has evolved dramatically over the last decade, owing to rapid developments in high-throughput genomics techniques. Analysis of the resulting large quantities of data requires advanced computational strategies. In this thesis, we use computational and statistical methods to tackle biological questions relating to two major research topics. First, we carried out integrative analysis of next-generation sequencing data to answer questions regarding chromatin biology and epigenetics. Second, we performed genome-wide analysis of the effects of genetic variants on gene expression, focussing on two key processes: mRNA decay and mRNA translation. The first question concerned the genome-wide distribution of the histone variant H2AX, a key factor in the DNA damage response pathway. We assessed the genomic landscape of H2AX in human U2OS cells using H2AX ChIP-seq data. Strikingly, we found that H2AX was enriched in heterochromatic regions. Heterochromatin has previously been shown to be refractory to damage signalling through H2AX phosphorylation and, consequently, we hypothesized that the greater abundance of H2AX in heterochromatin helps to ensure sufficient H2AX phosphorylation to signal DNA damage events. We next turned to characterizing the chromatin organization of the genomic regions that are distal (distal junction, DJ) and proximal (proximal junction, PJ) to human nucleolar organizer regions (NORs). Because they are absent from the reference genome assembly, these regions represent a major gap in our understanding of the epigenetic configuration of the human genome. An integrative analysis of ChIP-seq, RNA-seq, FAIRE-seq and DNase-seq data, generated by the ENCODE consortium, revealed that the DJ resembles euchromatic regions and, surprisingly, harbours transcripts that are transcribed by RNA polymerase II. Laboratory experiments showed that the DJ is localized to the periphery of the nucleolus, where it anchors the ribosomal DNA arrays.
This study sheds new light on the role of NORs in nucleolar formation and function, and enables further investigation of the link between nucleoli and human pathologies. The focus then shifts to studying the effects of genetic variation on gene expression. First, we set out to identify trans-acting genetic variants that influence RNA stability. We demonstrate that perturbation of RNA stabilization is detectable from mRNA expression data. Using the mRNA expression data generated from 726 HapMap3 samples, we calculated the relative expression of long-lived RNAs versus short-lived RNAs for each sample (referred to as the RNA stability score, or RS-score). Treating the RS-score as a quantitative trait, we performed a genome-wide association analysis and identified a SNP, rs6137010, that is strongly associated with the RS-score in two Asian populations: Han Chinese from Beijing (CHB) and Japanese from Tokyo (JPT). This SNP is a cis-eQTL for SNRPB (a core component of the spliceosome) in CHB and JPT. Thus, we propose that the association between this SNP and inter-individual variation in RS-score is likely mediated by changes in SNRPB expression levels. The final question concerned the effects of genetic variants on mRNA translation. We developed a computational pipeline to identify genetic variants that influence allele-specific mRNA translation rate (AST). Analysis of allele-specific events is severely biased by the fact that short sequencing reads map preferentially to the reference allele. Thus, our pipeline first constructs a haplotype-resolved genome for a given cell type by making use of high-throughput sequencing data that are publicly available for that cell type. Both RNA-seq and Ribo-seq data are then mapped to the resulting haplotype-resolved genome in order to identify genes that show evidence of AST. Applying this pipeline to datasets from HeLa cells, we found 171 protein-coding genes that are associated with AST.
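As a rough illustration of how evidence of AST at a single heterozygous SNP might be assessed, the sketch below compares allelic read counts between RNA-seq and Ribo-seq with a two-sided Fisher's exact test. The abstract does not state which statistical test the pipeline uses, so the choice of test, the function names `fisher_exact_two_sided` and `ast_pvalue`, and the read counts are all assumptions made for illustration. The counts are assumed to come from reads already mapped to a haplotype-resolved genome.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def ast_pvalue(rna_ref, rna_alt, ribo_ref, ribo_alt):
    """Hypothetical helper: test whether the allelic ratio differs between
    RNA-seq and Ribo-seq reads at one heterozygous SNP."""
    return fisher_exact_two_sided(rna_ref, rna_alt, ribo_ref, ribo_alt)


# Illustrative counts only (not data from the thesis).
balanced = ast_pvalue(50, 50, 50, 50)   # same allelic ratio in both assays
skewed = ast_pvalue(50, 50, 90, 10)     # Ribo-seq strongly favours one allele
```

A small p-value here would only flag a difference in allelic ratio between the two assays; calling a gene as showing AST would additionally require multiple-testing correction and aggregation across SNPs, which this sketch omits.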
Inspection of heterozygous SNPs located in the AST genes revealed two interesting mutations, located within the 5'UTRs of two genes, ATP5H and SLCO4A1, that appear to inhibit translation initiation of these genes. To sum up, this thesis presents novel computational strategies for integrative analysis of large volumes of high-throughput genomics data. By addressing biological questions in the areas of chromatin biology and gene regulation, this thesis yields key insights into the DNA damage response, the role of NORs in nucleolar formation and function, and the effects of genetic variants on mRNA stability and mRNA translation.
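The RS-score mentioned in the abstract is described only as the relative expression of long-lived versus short-lived RNAs in each sample. One plausible reading, sketched below, is the difference between the mean log2 expression of the two gene sets. The exact formula, the function name `rs_score`, the gene identifiers and the expression values are assumptions for illustration, not the definition used in the thesis.

```python
from statistics import mean


def rs_score(expression, long_lived, short_lived):
    """Hypothetical RS-score for one sample.

    expression  : dict mapping gene ID -> log2 expression value
    long_lived  : iterable of gene IDs with long mRNA half-lives
    short_lived : iterable of gene IDs with short mRNA half-lives

    Returns mean(log2 long-lived) - mean(log2 short-lived), i.e. the
    log2 ratio of the geometric mean expression of the two sets.
    """
    long_vals = [expression[g] for g in long_lived if g in expression]
    short_vals = [expression[g] for g in short_lived if g in expression]
    if not long_vals or not short_vals:
        raise ValueError("each gene set needs at least one measured gene")
    return mean(long_vals) - mean(short_vals)


# Illustrative values only (not data from the thesis).
sample = {"geneA": 8.0, "geneB": 10.0, "geneC": 5.0, "geneD": 7.0}
score = rs_score(sample, ["geneA", "geneB"], ["geneC", "geneD"])
```

Computed per sample, a score of this kind could then be treated as a quantitative trait in an association analysis, as the abstract describes.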
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland