
Investigation of intron coevolution and the autoimmune potential of alternative splicing

Keane, Peter
Abstract
Introns are non-coding intragenic sequences that are routinely excised from nascent pre-mRNA transcripts by a process known as splicing. Although they do not contribute directly to the protein-coding sequences of genes, they are known to have a number of important functions. In this thesis, we explore some of the functional implications of the presence of introns in mammalian genomes. In the first part of this thesis, we considered the hypothesis that tissue-specific alternative pre-mRNA splicing may result in autoimmune responses against self-antigens and showed that such restricted splice isoforms are often expressed in thymic epithelial cells, contributing to the establishment of self-tolerance of T lymphocytes. In the second part, we carried out an investigation of intron length coevolution as a means to explore the functional implications of the time taken to transcribe introns for biological processes that require precise temporal coordination.
Immune self-tolerance of T lymphocytes is established during their development in the thymus. This process, called negative selection, involves the exposure of the developing T cells to a range of self-peptides. This includes peptides that are normally only expressed in specific tissues outside of the thymus. The expression of these tissue-restricted antigens (TRAs) is under the control of AIRE, a transcription factor that is expressed in the thymus by medullary thymic epithelial cells (mTECs). Tissue-specific alternative splicing also has the potential to introduce TRAs, but the expression of tissue-specific isoforms in the thymus had yet to be investigated. Through the re-analysis of publicly available next-generation sequencing data from thymic epithelial cells, we show that mTECs ectopically express a range of tissue-specific splice isoforms, and that the diversity of splice isoforms expressed in mTECs is greater than for any other tissue. This increased diversity is likely to be under the control, at least partially, of AIRE, as in the absence of AIRE there was a significant decrease in the splicing diversity and number of exons detected in mTECs.
Remarkably, mTECs are known to express almost all protein-coding genes, providing a comprehensive coverage of the proteome to developing T cells during negative selection. In a single mTEC, however, only a small portion of the total proteome is expressed, suggesting that the total breadth of expression in mTECs is due to a highly diverse cell population. To assess the diversity of alternative splicing at the single mTEC level, we analyzed published scRNA-Seq datasets from mTECs and compared them to other similar cell types. We found that while in general the increased splicing diversity of the mTEC population was also apparent to some extent in single cells, this splicing diversity was greatly enhanced by the pooling of multiple cells. At the population level, we also calculated gene expression entropy as a measure of the total transcriptome diversity in mTECs, and found that the diversity of gene expression in mTECs is greater than in any other tissue. Overall, our results suggest that the diversity of the mTEC transcriptome is greater than that of any other cell type, in terms of both alternative splicing and gene expression. This diversity is somewhat apparent in single mTECs, but is enhanced by the pooling of multiple cells. This diversity is partially under the control of AIRE, and reflects the role of AIRE in establishing and maintaining T cell tolerance to self.
Precise regulation of the timing of gene expression is functionally relevant in some biological processes. This is particularly important for developmental processes, where intron delays coupled with negative feedback loops can establish oscillatory patterns of gene expression that are required for normal embryonic development. It has previously been suggested that the intron content of a set of genes involved in development is under purifying selection, suggesting that natural selection does act on intron length. In this thesis, we carried out an investigation of intron length coevolution in mammals to test the hypothesis that sets of genes that require precise coordination in the timing of their expression may be sensitive to evolutionary changes in intron length, and that such changes, when they occur, should be correlated among these sets of genes. We found strong evidence for intron length coevolution in sets of genes enriched for biological processes related to development and the cell cycle. We also found that genes that belong to the same protein complex or which are co-expressed are more likely to show evidence of intron length coevolution than randomly sampled genes. Overall, our results suggest that intron length may be functionally relevant in these gene sets, and that natural selection acts to maintain the relative intron length and transcriptional timing in these genes, revealing a novel aspect of intron evolution and function.
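The gene expression entropy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which entropy definition or normalization was used, so this assumes Shannon entropy (in bits) over expression values normalized to proportions; the function name `expression_entropy` and the toy profiles are hypothetical and not taken from the thesis.

```python
import math


def expression_entropy(expression):
    """Shannon entropy (bits) of an expression profile.

    Assumption: `expression` is a list of non-negative abundance values
    (e.g. one value per gene); the thesis may use a different definition.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression profile must have a positive total")
    entropy = 0.0
    for value in expression:
        if value > 0:  # zero-expression genes contribute nothing
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical toy profiles: evenly spread expression vs. one dominant gene.
uniform = [10.0, 10.0, 10.0, 10.0]
skewed = [97.0, 1.0, 1.0, 1.0]

print(expression_entropy(uniform))
print(expression_entropy(skewed))
```

Under this definition, a profile in which expression is spread evenly across many genes scores higher than one dominated by a few genes, which is the sense in which a broader transcriptome is "more diverse".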
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland