Investigation of T cell receptor and immunoglobulin repertoire through next generation sequencing data
Yu, Yaxuan
Publication Date
2017-08-08
Type
Thesis
Abstract
The diversity of the immunological repertoire has long been a focus of research, providing important insights into the adaptive immune system. Rapid developments in next generation sequencing technologies have revolutionized the way immunological repertoires are analyzed, providing data of unprecedented resolution. Nonetheless, these high-throughput approaches also present unique computational challenges that must be addressed by developing accurate and efficient bioinformatics pipelines. In this thesis, we demonstrated a complete bioinformatics workflow for processing and analyzing high-throughput sequences from immune receptors, and applied these tools to research questions relating to the diversity of immune receptor genes in human populations. One aspect of the immunological repertoire that is frequently of immediate interest to immunologists is the distribution of immune receptor clonotypes among individuals, as knowledge of this distribution could lead to a better understanding of the dynamics of the immune system under different conditions. We first implemented a bioinformatics pipeline to analyze next generation sequencing data from T cell receptors and immunoglobulins. This pipeline featured an accurate, ultra-fast tag-searching algorithm for VDJ alignment, which outperformed all other similar pipelines in benchmarking. In addition, the pipeline included two novel functional components. The first was polymorphism analysis, which reports putative novel SNPs found in the input sequences. The second was the construction of lineage mutation trees describing the affinity maturation process of immunoglobulins. However sophisticated the alignment algorithm, accurate gene alignment always requires an appropriate reference database.
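The abstract does not describe the internals of the tag-searching algorithm. As a minimal sketch of the general idea only, assuming that "tags" are k-mers occurring in exactly one reference gene, a read can be assigned to a gene by sliding a window across it and voting for the genes whose tags match exactly. The gene names, toy sequences and tag length below are hypothetical, not real IMGT or Lym1k entries.

```python
from collections import Counter

def build_tag_index(references, k):
    """Map each k-mer that occurs in exactly one reference gene to that gene."""
    owners = {}
    for gene, seq in references.items():
        for i in range(len(seq) - k + 1):
            owners.setdefault(seq[i:i + k], set()).add(gene)
    # k-mers shared by several genes cannot discriminate between them, so drop them.
    return {tag: next(iter(genes)) for tag, genes in owners.items() if len(genes) == 1}

def assign_gene(read, tag_index, k):
    """Vote for genes whose unique tags occur exactly in the read; None if no tag hits."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        gene = tag_index.get(read[i:i + k])
        if gene is not None:
            hits[gene] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]

# Hypothetical toy reference genes (not real immune receptor sequences).
TAG_LENGTH = 5
toy_refs = {
    "GENE_A": "ACGTACGTTTGACCA",
    "GENE_B": "ACGTACGAAAGTCCA",
}
toy_index = build_tag_index(toy_refs, TAG_LENGTH)
print(assign_gene("TTACGTTTGACC", toy_index, TAG_LENGTH))  # GENE_A
```

A dictionary lookup per window keeps the sketch short; a real tool would more likely use a multi-pattern matcher such as Aho-Corasick and would also handle sequencing errors, which exact tag matching alone ignores.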
Unfortunately, the IMGT database, the most widely used reference database in immunological repertoire analysis pipelines, has been shown to be incomplete and to contain numerous errors. The second task undertaken in this PhD thesis was therefore to create a more comprehensive reference database for T cell receptor and immunoglobulin genes by exploiting the large volume of publicly available human genome resequencing data generated in recent years. Based on variant calls retrieved from the 1000 Genomes Project and the current human reference genomes, we inferred a set of putative alleles of immune receptor genes. Lym1k, our database of these inferred alleles, provided a more comprehensive collection of the immune receptor alleles found in global human populations, as evidenced by significantly improved alignment performance on real datasets compared with IMGT. The immune receptor loci are among the most dynamic regions of the human genome, with a high rate of structural variation and high allelic diversity. Previous analyses of the allelic diversity of immune receptor genes in global human populations were constrained by the limited amount of human genome resequencing data. In our last research chapter, we addressed three research questions relating to the allelic diversity of immune receptor genes. First, because many studies have shown that African populations have greater overall allelic richness than other human populations, we compared the allelic diversity of immune receptor genes between African and non-African populations. Not surprisingly, immune receptor alleles were more diverse in African populations than in non-African populations.
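The abstract says only that putative alleles were inferred from 1000 Genomes variant calls and the reference genome. One plausible sketch, assuming phased, SNV-only haplotypes with 1-based VCF-style positions on the forward strand, is to substitute each haplotype's variants into the reference gene sequence and count the distinct sequences that result. Indels, strand orientation and call-quality filtering are ignored, and the function names and toy data are hypothetical.

```python
from collections import Counter

def apply_snvs(ref_seq, gene_start, snvs):
    """Apply single-nucleotide variants to a reference gene sequence.

    ref_seq:    reference gene sequence on the forward strand
    gene_start: 1-based genomic coordinate of ref_seq[0] (VCF-style POS)
    snvs:       iterable of (pos, ref_base, alt_base) with 1-based positions
    """
    seq = list(ref_seq)
    for pos, ref_base, alt_base in snvs:
        offset = pos - gene_start
        if not 0 <= offset < len(seq):
            continue  # variant falls outside this gene
        if seq[offset] != ref_base:
            raise ValueError(f"REF mismatch at {pos}: genome has {seq[offset]}, call has {ref_base}")
        seq[offset] = alt_base
    return "".join(seq)

def infer_alleles(ref_seq, gene_start, haplotypes):
    """Count each distinct allele sequence across phased haplotypes."""
    return Counter(apply_snvs(ref_seq, gene_start, hap) for hap in haplotypes)

# Hypothetical toy gene starting at genomic position 1001, with four haplotypes.
toy_haplotypes = [
    [],                                    # carries the reference allele
    [(1003, "G", "A")],
    [(1003, "G", "A")],
    [(1008, "T", "C"), (2000, "A", "G")],  # second SNV lies outside the gene
]
toy_alleles = infer_alleles("ACGTACGT", 1001, toy_haplotypes)
print(toy_alleles.most_common())
```

Counting haplotypes per allele, rather than just collecting the distinct sequences, keeps the population frequencies needed for later diversity comparisons.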
Second, because immune receptor genes of the same type are located adjacent to each other on the chromosome, we investigated whether genomic location was associated with allelic diversity, potentially reflecting differences in usage frequency between genes located towards the proximal or distal ends of the array of genes of a given type. However, we found no effect of position on allelic diversity. Lastly, we hypothesized that immune receptor genes that are more frequently selected during rearrangement are under stronger diversifying selection pressure, which would lead to higher allelic diversity. Surprisingly, no such correlation was found for most gene types; only TCRA genes showed a weak positive correlation. In conclusion, this thesis demonstrated several novel high-throughput approaches and strategies for immunological repertoire analysis. It also addressed important biological questions relating to the allelic diversity of immune receptor genes by exploiting public biological resources, which could inform subsequent studies.
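The abstract does not state which diversity measure or correlation statistic was used to test the usage hypothesis. As a hedged illustration, each gene's allelic diversity could be summarized with the Gini-Simpson index (expected heterozygosity, 1 minus the sum of squared allele frequencies), and its association with gene usage frequency measured by Spearman rank correlation, which makes no assumption that the relationship is linear. All usage frequencies and allele counts below are invented.

```python
from statistics import mean

def gini_simpson(allele_counts):
    """Expected heterozygosity, 1 - sum(p_i^2), from a list of allele counts."""
    total = sum(allele_counts)
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of average ranks (raises ZeroDivisionError if an input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical genes: rearrangement usage frequency and per-allele haplotype counts.
usage = [0.30, 0.20, 0.10, 0.05]
allele_counts = [[50, 50], [60, 40], [90, 10], [100]]
diversity = [gini_simpson(c) for c in allele_counts]
rho = spearman_rho(usage, diversity)
print(round(rho, 6))
```

For real data, `scipy.stats.spearmanr` returns both the correlation and a p-value, which matters when judging whether a weak correlation such as the one reported for TCRA genes is distinguishable from zero.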
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland