Post-genome-wide association study investigations of the polygenic basis of schizophrenia and cognition
Fahey, Laura
Publication Date
2022-10-14
Type
Thesis
Abstract
Schizophrenia (SCZ) is a chronic, severe mental disorder affecting approximately 1% of the population worldwide. A core feature of SCZ is cognitive dysfunction in the form of impairments in memory, attention and IQ. Both SCZ and cognition have a complex aetiology influenced by genetic and environmental factors. The heritability of SCZ is estimated to be approximately 80% and that of cognition approximately 50% (averaged across the lifetime). Genome-wide association studies (GWASs) have revealed that the genetic component of these phenotypes is based on the cumulative contribution of a large number of variants, each with a small effect size; this is referred to as the polygenic model. The variants identified by SCZ GWASs map to genes that are enriched for expression in the brain (mainly in neuronal cell types) and immune-related tissues, as well as genes with synaptic function. Similarly, GWASs of intelligence have implicated genes expressed in the brain, with involvement in the development of the central nervous system and synaptic function.
This thesis comprises four studies. The first three focused on investigating the downstream proteins influenced by SCZ risk transcription factors. I created sets of genes directly regulated by these transcription factors in biologically relevant cell types and investigated them for enrichment of common and rare genetic variation associated with SCZ and related phenotypes, for the purpose of identifying perturbed functional pathways. My final study investigated whether modelling interactions between genetic variants, using machine learning, results in better performance than the standard, linear polygenic score method in the prediction of intelligence from genotype data.
My first study investigated sets of genes regulated by the transcription factor encoded by the SCZ risk gene B-cell chronic lymphocytic leukaemia/lymphoma 11B (BCL11B) for enrichment of common and de novo genetic variation associated with SCZ. BCL11B is important for the regulation of cell differentiation and development in both the central nervous and immune systems, both of which have previously been linked to SCZ. We identified direct target genes of BCL11B in neuronal and immune cell types using functional genomics data. We found that direct target genes of BCL11B in double positive (DP) developing T cells were enriched for genes containing missense de novo mutations (DNMs) reported in SCZ patients, with the signal concentrated in genes negatively regulated by the transcription factor. Biological processes enriched among genes negatively regulated by BCL11B in DP T cells included immune system development and cytokine signalling. The conclusion from this work was that DNMs in immune pathways contribute to SCZ risk.
My second study investigated genes regulated by myocyte enhancer factor 2C (MEF2C), an important transcription factor during neurodevelopment. Mutation or deletion of MEF2C causes intellectual disability (ID), and common variants within MEF2C are associated with cognitive function and SCZ risk. We used a set of 1,055 genes that were differentially expressed in the adult mouse brain following early embryonic deletion of Mef2c in excitatory cortical neurons. We found these differentially expressed genes (DEGs) to be enriched for common genetic variation associated with SCZ, intelligence and educational attainment (EA), and enriched for genes containing DNMs reported in autism spectrum disorder (ASD) and ID. Using single-cell RNA-seq data, we identified that several types of excitatory glutamatergic neuron in the cortex were enriched for these DEGs, including deep layer pyramidal cells and cells in the retrosplenial cortex, entorhinal cortex and subiculum; these cell types are also enriched for FMRP target genes. The involvement of MEF2C in synapse elimination suggests that disruption of this process in these cell types during neurodevelopment contributes to cognitive function and risk of neurodevelopmental disorders.
My third study used differentially expressed genes from the cortex of an updated Mef2c mouse model with a heterozygous DNA binding-deficient mutation of Mef2c (Mef2c-het). We combined these data with MEF2C ChIP-seq data from cortical neurons and single-cell data from the mouse brain to create a set of genes that are differentially regulated in Mef2c-het mice, are direct target genes of MEF2C and show elevated expression in cortical neurons. We found this gene set to be enriched for genes containing common genetic variation associated with IQ and EA, with the signal concentrated in those genes positively regulated by MEF2C. These positively regulated genes are enriched for functionality in the adenylyl cyclase signalling system, which is known to positively regulate synaptic transmission and has been linked to learning and memory. Our results suggest that MEF2C, through regulation of genes involved in the adenylyl cyclase signalling system, influences synaptic function and in turn contributes to cognitive function. Further studies are required to understand the exact molecular mechanisms by which MEF2C functions in this system and to investigate its potential as a drug target to alleviate the cognitive deficits in disorders such as SCZ.
My fourth study compared the performance of the machine learning model XGBoost with that of a conventional polygenic score (PGS) method for predicting intelligence from genotype data. We hypothesised that XGBoost would perform better than the PGS method because it models interactions between genetic variants, rather than testing each variant for association with the phenotype independently. We found that XGBoost was capable of detecting small effects contributing to the complex polygenic phenotype of intelligence. Although it did not outperform the classical PGS method, it achieved the same prediction performance with a fraction of the SNPs, and the top SNPs identified as important for predictive performance were biologically relevant.
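The linear PGS baseline referred to above can be sketched as a weighted sum of allele counts. This is a minimal illustration only, not the pipeline used in the thesis: the function name `polygenic_score`, the per-SNP weights and the genotypes are all hypothetical toy values.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the additive polygenic score for one individual.

    dosages: count of the effect allele at each SNP (0, 1 or 2).
    weights: per-SNP effect sizes, e.g. taken from GWAS summary statistics.
    Both arguments are assumed to be in the same SNP order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same SNPs")
    # Each SNP contributes independently; no interaction terms are modelled.
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three SNPs (toy values, not real estimates).
weights = [0.5, -0.25, 0.125]

# Hypothetical effect-allele counts for two individuals.
person_a = [2, 0, 1]
person_b = [0, 2, 1]

score_a = polygenic_score(person_a, weights)
score_b = polygenic_score(person_b, weights)
print(score_a, score_b)
```

Because every SNP enters the sum separately, a score of this form cannot represent interactions between variants, which is the limitation that the comparison with a tree-based model such as XGBoost was designed to probe.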
Publisher
NUI Galway