Publication

Analysis of clonal mutations in cancer as a means of studying variation in somatic mutation processes

Cleary, Siobhán
Abstract
Somatic mutations are mutations that arise throughout a person’s lifetime. They contribute to ageing, cancer and other age-related disorders. Recent technological advances have led to many studies investigating somatic mutations in normal tissues. However, somatic mutations are hard to identify in normal tissues because of their low frequency and the difficulty of distinguishing real mutations from errors introduced during experimental processing. Studies of somatic mutations in normal tissues suggest that there is still much unknown about how somatic mutations contribute to cancer. Somatic mutations can also be studied by analysing cancer samples. Generally, somatic mutations in cancer samples are studied to understand cancer progression and response to treatment. This thesis aimed to investigate somatic mutations present in all cancer cells of a sample (clonal mutations) as a means to understand what is happening in normal tissue.

Chapter 2 describes a method to predict the total clonal mutation load of a cancer sample and the use of this approach to investigate the relationship between variation in clonal somatic mutation load and differences between tissues in the risk of developing cancer. Before predicting the total clonal load, we first needed to distinguish between clonal mutations and mutations present in only a subset of cells (subclonal). We adjusted variant frequency for tumour purity and local copy number variation to classify variants as clonal or subclonal. We used the linear relationship between clonal variants and age to predict the total clonal burden for each tissue type. Under the assumption that subclonal mutation accumulation does not correlate with age, we determined what proportion of true clonal variants were classified as clonal. By adjusting various thresholds for classifying variants as clonal, we could classify, at best, 45% of the true clonal variants as clonal.
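The adjustment of variant frequency for tumour purity and local copy number can be illustrated with a minimal sketch. It uses a commonly cited cancer cell fraction (CCF) formula and is not necessarily the procedure used in the thesis. The function names, the assumed multiplicity of one mutated copy per cell, the diploid normal copy number, and the 0.9 threshold are all illustrative assumptions rather than values taken from the abstract.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn, normal_cn=2, multiplicity=1):
    """Estimate the fraction of cancer cells carrying a variant.

    Assumptions (illustrative, not from the thesis): normal cells are
    diploid at the locus and the variant sits on `multiplicity` copies
    in each mutated cancer cell.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if tumour_cn < 1 or multiplicity < 1:
        raise ValueError("tumour_cn and multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    average_copies = purity * tumour_cn + (1 - purity) * normal_cn
    # Rescale the observed variant allele frequency to a per-cancer-cell fraction.
    return vaf * average_copies / (purity * multiplicity)


def classify_variant(vaf, purity, tumour_cn, threshold=0.9):
    """Label a variant as clonal when its estimated CCF reaches the threshold.

    The default threshold of 0.9 is a hypothetical choice for illustration.
    """
    ccf = cancer_cell_fraction(vaf, purity, tumour_cn)
    return "clonal" if ccf >= threshold else "subclonal"
```

Under these assumptions, a variant observed at a frequency of 0.35 in a sample with 70% purity and two local copies has an estimated CCF close to 1 and would be labelled clonal, while the same sample with a variant at a frequency of 0.10 would give a CCF near 0.29 and a subclonal label. Raising or lowering `threshold` trades off how many true clonal variants are captured against how many subclonal variants are misclassified.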
We then used the relationship between clonal mutation burden and age to estimate the true clonal load for our samples. To investigate whether the estimated clonal mutation burden could be used as a proxy for the number of somatic mutations in healthy cells, we compared our results to somatic mutation burdens that have been measured directly in normal tissues (matched for age and tissue type with the cancer samples). We also found that the predicted clonal load was correlated with lifetime cancer risk. Our findings suggest that predicted clonal load from cancer samples can be used to investigate somatic mutations in normal tissue. This approach has the advantage of drawing on the large volume of cancer genomics data that has already been generated to extend our understanding of the accumulation of somatic mutations in normal tissues.

The major histocompatibility complex (MHC) can present neoantigens resulting from somatic mutations on the cell surface, potentially directing an immune response against the presenting cell. In Chapter 3, we investigated whether gene expression explains the lack of signal of immunoediting observed among clonal passenger mutations. This hypothesis stemmed from two publications that reported that driver mutations arise in gaps in the capacity of the immune system to recognize them. We investigated whether passenger mutations capable of eliciting an immune response occur preferentially in lowly expressed genes, or whether the mutant allele has lower expression than the reference allele, a phenomenon termed allele-specific expression (ASE). The neoantigen must be expressed to be presented by the MHC on the cell surface, so a reduction in expression could be a means by which immunogenic mutations are tolerated. After accounting for gene length and sequence context, we found no difference in the expression of genes harbouring immunogenic mutations compared to those harbouring nonimmunogenic or synonymous mutations.
Additionally, there was no evidence that the mutant allele exhibited ASE more often for immunogenic mutations than for nonimmunogenic mutations. Using simulations, we also estimated an upper bound for the impact of immunoediting on the mutational landscape in cancer, showing that at most 5% of missense mutations could be removed by this process. To our knowledge, this was the first attempt to quantify the proportion of missense mutations removed through immunoediting.

Finally, in Chapter 4, we extended our analysis of the relationship between gene expression and somatic mutation accumulation by investigating the relationship between germline ASE and cancer risk. Here, we investigated the hypothesis that a single score representing germline ASE across all tumour suppressor genes (TSGs) for an individual would be associated with an increased cancer risk, because only mutations on the expressed copy would be required to disrupt the function of the gene. To assess this, we first tested the ability of two methods to predict ASE using genotype data. We modified PrediXcan, a tool that predicts overall gene expression, to predict the expression of each haplotype, and generated a ratio from the predicted values. We also applied logistic regression models using heterozygous SNP status as predictors and ASE status as the outcome. Although the performance of ASE predictions was poor for many genes using both methods, our results indicate that it may be possible to generate more accurate predictions from genotype data as more data becomes available. As a pilot study, we generated a single TSG ASE score using the genes for which the predictions worked well and assessed its relationship with breast cancer risk. We found no statistically significant relationship between TSG ASE and cancer risk, which is likely due to our inability to predict ASE in the TSGs that contribute to cancer risk in this tissue type, as assessed using cancer data.

In conclusion, this thesis presented a novel approach to predict the true clonal load of cancer samples and demonstrated its similarity to the observed somatic mutation load in normal tissue. We also provided further insight into the role of the immune system in shaping the mutational landscape of cancer samples and, using a novel method, generated an estimate for the proportion of missense mutations removed through immunoediting. Finally, we presented a novel approach to predict germline ASE from genotype data, showing that it is feasible for some genes and that performance is likely to improve as more data becomes available.
Publisher
NUI Galway
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland
CC BY-NC-ND 3.0 IE