Publication

Analysis of putative somatic mutations in 200,000 human exomes

Bennett, Declan
Abstract
Somatic mutations accumulate throughout life and contribute significantly to disease risk. While research into somatic mutation is well established in cancer, it is only in recent years that advances in sequencing technologies and protocols have made it feasible to investigate the implications of somatic mutations in healthy tissues. The need for specialist techniques has, however, limited the study of somatic mutations in healthy tissues to small sample sizes, which do not allow the impact of somatic mutations on human health to be assessed on a population scale. We posited that it may be possible to study variation in the somatic mutation rate between individuals and across the genome through analysis of low-depth sequencing data, by developing strategies to distinguish somatic mutations from sequencing errors, DNA damage and other artefacts among the mismatches (relative to the reference genome) observed in these data. Using somatic mutation rates obtained from the literature, we estimated that 0.4% of the mismatches between the UK Biobank exome sequencing reads and the reference genome were due to somatic mutations. We demonstrated that this proportion was sufficient to induce a relationship between the abundance of mismatches and age, when individuals were grouped by integer age. We then searched for additional sample properties that are correlated with the mismatch burden and found positive correlations with cancer diagnosis and smoking status. However, by carefully examining the UK Biobank exome sequencing data, we uncovered previously unreported batch effects related to sequencing run. The observed associations with cancer diagnosis and smoking status were lost when we corrected for this batch effect, whereas the correlation between age and mismatch load improved.
Individuals diagnosed with Lynch syndrome have increased somatic mutation loads due to deficiencies in mismatch repair genes, and we investigated whether this effect could be detected in the exome sequencing data. In the UK Biobank, we identified 160 individuals with pathogenic variants associated with Lynch syndrome. Using the COSMIC signatures associated with mismatch repair, we compared the contribution of mismatch repair mutational signatures between the Lynch syndrome samples and the remaining samples. We detected a marginally statistically significant difference in the contribution of SBS18 between the two sample groups; however, this result did not survive multiple testing correction.
Publisher
NUI Galway
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland
CC BY-NC-ND 3.0 IE