Analysis of gene regulation using high throughput genomics
Paul, Geeleher
Publication Date
2012-10-23
Type
Thesis
Abstract
The recent development of high-throughput genomics techniques and their subsequent applications have completely transformed the study of biology. The analysis, interpretation and storage of the resulting large volumes of data have created a wide range of computational challenges and opportunities that have driven the majority of recent bioinformatics research. In this thesis we focus on four research questions grounded in functional genomics and epigenomics, yielding novel methodologies and biological insights.
The first research question is whether miRNA activity, as a general regulatory effect, is a heritable trait. To address this, we used Affymetrix Human Exon Microarray and RNA-seq data from the International HapMap project. We confirmed such an association in humans using the regulatory effect score (RE-score) of a miRNA, which has previously been defined as the difference in the gene expression rank of targets of the miRNA compared to non-targeted genes. We also identified a SNP in the miRNA processing gene DROSHA that is associated with inter-individual differences in miRNA regulatory effect.
During this analysis we noted that correlations between gene expression measures from RNA-seq and gene expression microarray platforms were often relatively poor. This led to the second research question: whether the estimation of gene expression from microarrays can be improved. Our method uses samples for which both microarray and RNA-seq data are available and builds statistical models that learn the relationship between probe-level gene expression, as measured by the microarrays, and gene-level expression, as measured by RNA-seq. These models can then be used to estimate gene expression on separate sets of microarray samples. We assessed the performance of our method in comparison to Affymetrix Power Tools (APT) by fitting models for all genes on a training set of the HapMap YRI samples and testing performance on the HapMap CEU samples (both microarray and RNA-seq data are available for all of these samples). Overall, our method substantially improves within-sample correlations with RNA-seq, but does not achieve the same level of performance as APT in terms of across-sample correlations.
The third research question was whether a consistent pattern of differential methylation could be ascertained in a limited number of ulcerative colitis (UC) biopsies, using data generated with the Agilent Human CpG Island microarray. Although there were no statistically significant differences between the sample groups at CpG island or probe level, we did uncover evidence of overall CpG island hypermethylation in UC. Subsequently, gene set analysis (GSA) revealed highly significant results for several GO biological processes. It became apparent that these results were a consequence of a sampling effect, which stems from the large differences in the numbers of probes (targeting CpG sites) associated with genes in different gene sets.
The fourth and final research question consisted of the development of a method to correct this bias in GSA of these data. We applied our method to both the UC microarray dataset and a previously published genome-wide CpG island study of DNA methylation in lung cancer. We obtained novel biological insights into both of these conditions, consistent with their respective pathologies. Finally, we showed that this bias is also found with next-generation sequencing-based methylation assays, which we demonstrated using a HELP-seq dataset.
In conclusion, this thesis presents novel analytical strategies encompassing gene expression and genome-wide methylation, and it also introduces methodologies that link microarray and RNA-seq measures of expression. It documents for the first time a correction for an intrinsic bias in GSA associated with many CpG island methylation platforms, and yields results of biological consequence with regard to endogenous RNAi regulatory processes and the epigenetic characterization of several human diseases.
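As an illustration of the RE-score idea described in the abstract (the difference in gene expression rank between a miRNA's targets and non-targeted genes), the following is a minimal sketch only. The function name `re_score`, the use of mean ranks, the sign convention (non-targets minus targets) and the tie-breaking rule are all assumptions made for this example; the definition used in the thesis may differ.

```python
from typing import Dict, Set


def re_score(expression: Dict[str, float], targets: Set[str]) -> float:
    """Hypothetical RE-score for one sample.

    Returns the mean expression rank of non-targeted genes minus the mean
    expression rank of the miRNA's targets (assumed sign convention), so a
    positive value means the targets tend to be expressed at lower levels.
    """
    # Rank genes from lowest (rank 1) to highest expression.
    # Ties are broken by gene name purely for determinism (an assumption).
    ordered = sorted(expression, key=lambda gene: (expression[gene], gene))
    ranks = {gene: position + 1 for position, gene in enumerate(ordered)}

    target_ranks = [rank for gene, rank in ranks.items() if gene in targets]
    other_ranks = [rank for gene, rank in ranks.items() if gene not in targets]
    if not target_ranks or not other_ranks:
        raise ValueError("need at least one target and one non-target gene")

    mean_target = sum(target_ranks) / len(target_ranks)
    mean_other = sum(other_ranks) / len(other_ranks)
    return mean_other - mean_target


# Toy example with made-up expression values for four genes.
sample = {"geneA": 1.0, "geneB": 2.0, "geneC": 5.0, "geneD": 8.0}
print(re_score(sample, targets={"geneA", "geneB"}))
```

Computing such a score per individual would give one value per miRNA per sample, which is the kind of quantity that could then be tested for heritability or for association with a SNP.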
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland