The role of genomic data in stratifying patients within predictive models for breast cancer survival outcome
King, Lydia
Publication Date
2024-04-15
Keywords
Mathematical and Statistical Sciences, Science and Engineering, Genomics Data Science, Cancer Genomics, Bioinformatics, Copy Number Alterations, Allele-Specific Copy Number Profiling, Survival Analysis, Copy Number Changepoints, Breast Cancer, Breast Cancer Survival, Gene Expression, Microarray Analysis, Predictive Models
Type
Thesis
Abstract
Genomic instability (GI), defined as an increased tendency for genomic alterations to occur, is a common feature of cancers and is recognised as a “facilitating” hallmark of cancer. Genomic alterations include base substitutions, indels, rearrangements and copy number alterations (CNAs). CNAs in cancer have been extensively profiled. However, because of the complexity of cancer genomes, frequent deviations from diploidy (i.e. having two sets of homologous chromosomes) and the presence of both tumour and non-tumour cells, many studies have been limited to reporting total copy number, that is, the sum of the copy numbers of the two homologous chromosomes. Determining the CNA landscape of each homologous chromosome, i.e. the allele-specific copy number, is important for characterising certain genomic aberrations and inferring their clonal history. Breast cancer is largely dominated by CNAs rather than by mutations in a single gene, and there is increasing evidence that the genomic landscape of the tumour is associated with survival and that incorporating this information into treatment decisions benefits the patient. This thesis uses total and allele-specific CNA data to explore the CNA landscape of breast tumours and its association with survival. Focusing on observations from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort, we define novel metrics for total CNA measurements and estimate the distributions of these metrics while allowing for missing values. Comparing the distributions of the CNA metrics across groups of patients stratified by molecular classification indicates that subtypes associated with worse survival outcomes tend to have significantly higher levels of GI, and a higher deletion burden, than subtypes associated with better survival outcomes.
Further investigation of these CNA metrics in the context of survival indicates that, for molecular classifications displaying low levels of GI, the CNA metrics can partition patients by survival outcome and help identify patients who may be at greater risk. The CNA metrics consistently selected as useful predictors of survival outcome include those measuring the copy number deletion landscape, further indicating that deletions are more harmful than amplifications. Differential gene expression analysis is carried out to investigate the effect of CNAs on gene expression. Genes observed to be dysregulated in patients with decreased survival outcomes are known to facilitate cell proliferation, tumour progression and invasion. Investigating the direct relationship between a gene’s CNA state and its expression using a modified limma pipeline produces two differentially expressed gene sets; these show some degree of congruence with published predictive and prognostic assays, and additional genes emerge as a new focus. Allele-specific copy number profiles are derived using Allele-Specific Copy number Analysis of Tumours (ASCAT), and models, including allele independent (AI) and allele dependent (AD) models, are proposed and assessed to identify and model features of changepoints in these profiles. Applying the AD models to defined intervals, including gene regions and genomic segments of specified length, identifies a number of gene and non-gene regions of interest.
Publisher
NUI Galway