
Development, implementation, and evaluation of computational methods to quantify intratumoral heterogeneity and microsatellite instability

Abstract
Tumors are populations of aberrant cells that differ genetically, epigenetically, and phenotypically from normal cells. When such differences also exist among the cancer cells of a single tumor, this is referred to as intratumoral heterogeneity (ITH), and ITH is associated with several major problems in cancer, including treatment resistance, relapse, and metastasis. One enabling characteristic of cancer that drives these differences is genome instability, an increased rate of genetic change in a cell's genome. When this instability manifests in the short tandem repeats of the genome, it is known as microsatellite instability (MSI), and MSI status is used to guide treatment with immune checkpoint inhibitors. Given the clinical relevance of both ITH and MSI, researchers have created methods and tools to detect them in sequencing data, with varying levels of success. Several MSI detection tools are reported to have near-perfect performance but do not sufficiently disclose which types of data they can be used with. ITH, on the other hand, is poorly defined, and few methods capture its overall genetic aspect. Furthermore, there has yet to be an in-depth investigation into whether MSI itself is a heterogeneous phenomenon. The central aim of this thesis is to investigate novel ways to quantify the genetic aspects of ITH and to determine whether MSI is a subclonal phenomenon. This aim is pursued through three goals: (1) quantify somatic and germline variation using population genetics and determine its relationship to relapse and MSI, (2) benchmark the leading tools used to identify MSI to clarify their scope and performance, and (3) quantify ITH in MSI at the single-cell level.

Chapter 2 addressed the first goal by exploring the use of population genetics statistics in large pan-cancer data. We first investigated whether these statistics, when used to measure the somatic variation of a cancer sample, could predict whether an individual will relapse.
Although we identified several cancer-type-specific results related to relapse, we were not able to replicate all of these findings after accounting for tumor purity and ploidy. We also investigated whether a second statistic, which measured individual germline heterozygosity, had any relationship to MSI score. Similarly, we did not find a relationship between germline variation and MSI score, but we did discover relationships between MSI score and two potential confounding factors: tumor purity and self-reported population group. The impact of these factors should be taken into account when MSI score is used as a clinical biomarker.

Next, in Chapter 3, we assessed the performance of the leading tools used to detect MSI in sequencing data by examining how each tool performed on several curated sequencing datasets. Using these data, we validated most of the published performance metrics of these tools but identified several previously unreported shortcomings. The most significant was a large drop in performance when tools originally evaluated on whole exome sequencing data were applied to whole genome sequencing data. We also found that an as-yet unpublished tool outperformed nearly all others on most data types.

Lastly, we used the knowledge gained in the previous two chapters to investigate whether MSI tumors consist of a mixture of cells with and without MSI. To do this, we collected all publicly available single-cell sequencing data with matched clinical MSI status. We then created a novel computational pipeline built around two machine-learning-based methods that classify cancer cells as having MSI based on gene expression. This led to several findings, the most important being that approximately one-third of all individuals in the analysis had evidence of cells both with and without MSI in their samples. This directly challenges the binary classification approaches currently used in research and clinical settings.
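The benchmarking in Chapter 3 compares each tool's MSI calls against clinical MSI status. As a minimal sketch, assuming binary calls (MSI vs. microsatellite stable) and a binary clinical reference, the standard confusion-matrix metrics can be computed as below. The function name `confusion_metrics` is hypothetical and not taken from any of the benchmarked tools or from the thesis code.

```python
from typing import Dict, Sequence


def confusion_metrics(predicted: Sequence[bool], truth: Sequence[bool]) -> Dict[str, float]:
    """Compare binary MSI calls (True = MSI) with clinical MSI status.

    Hypothetical helper for illustration only. Returns sensitivity,
    specificity, precision and accuracy; a metric whose denominator
    is zero is reported as NaN.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    # Count the four cells of the 2x2 confusion matrix.
    tp = sum(p and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    fp = sum(p and (not t) for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # MSI samples correctly called MSI
        "specificity": ratio(tn, tn + fp),  # non-MSI samples correctly called non-MSI
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(truth)),
    }


calls = [True, True, False, False, True]
clinical = [True, False, False, False, True]
print(confusion_metrics(calls, clinical))
```

Computing these metrics separately for each sequencing type (for example, whole exome versus whole genome) is one straightforward way to surface the kind of data-type-dependent performance drop reported in Chapter 3.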
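The single-cell analysis contrasts one sample-level MSI label with per-cell calls. A minimal sketch of that contrast, assuming each cancer cell has already been given a binary MSI call by some classifier: an individual is labelled "mixed" when MSI and non-MSI cells each make up at least a chosen minimum fraction of the cells. The function name `summarize_sample` and the 10% default threshold are hypothetical choices for illustration, not the criteria used in the thesis pipeline.

```python
from typing import Dict, List, Union


def summarize_sample(cell_calls: List[bool], min_fraction: float = 0.10) -> Dict[str, Union[float, str]]:
    """Summarize per-cell MSI calls (True = MSI) for one individual.

    Hypothetical rule: 'mixed' if MSI and non-MSI cells each make up at
    least `min_fraction` of all cells; otherwise the majority label
    ('MSI' or 'MSS', i.e. microsatellite stable).
    """
    if not cell_calls:
        raise ValueError("no cells to summarize")
    msi_fraction = sum(cell_calls) / len(cell_calls)
    if min_fraction <= msi_fraction <= 1 - min_fraction:
        label = "mixed"
    elif msi_fraction > 0.5:
        label = "MSI"
    else:
        label = "MSS"
    return {"msi_fraction": msi_fraction, "label": label}


print(summarize_sample([True] * 3 + [False] * 7))  # 30% MSI cells
```

A binary sample-level label would collapse the first example to a single class; keeping the per-cell fraction alongside the label preserves the heterogeneity that a binary classification discards.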
Publisher
University of Galway
Rights
CC BY-NC-ND