Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies

Authors: Helian Feng aff001; Nicholas Mancuso aff003; Alexander Gusev aff005; Arunabha Majumdar aff008; Megan Major aff010; Bogdan Pasaniuc aff008; Peter Kraft aff001
Author affiliations: Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America aff001; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America aff002; Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America aff003; Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America aff004; Department of Medical Oncology, Dana-Farber Cancer Institute & Harvard Medical School, Boston, Massachusetts, United States of America aff005; Division of Genetics, Brigham & Women’s Hospital, Boston, MA, United States of America aff006; Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America aff007; Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America aff008; Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, California, United States of America aff009; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, California, United States of America aff010
Published in: Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies. PLoS Genet 17(4): e1008973. doi:10.1371/journal.pgen.1008973
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1008973


Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when the expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome and the GWAS sample size is 20,000, the sCCA+ACAT test (the ACAT combination of sCCA-feature and single-tissue tests) had higher average power than single-tissue tests combined with the Generalized Berk-Jones (GBJ) method, single-tissue tests combined with S-MultiXcan, UTMOST, and Principal Component Analysis (PCA) summaries of cross-tissue expression patterns, by 5%, 8%, 5% and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test increased the number of testable genes and identified on average an additional 400 gene-trait associations that single-tissue TWAS missed.
Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.
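
To make the aggregation step concrete, the following is a minimal sketch of the Cauchy combination statistic that underlies ACAT, which turns a set of per-test p-values into a single combined p-value. It is an illustration under stated assumptions, not the authors' implementation: the function name `acat`, the default of equal weights, and the example p-values are all hypothetical and are not taken from the paper.

```python
import math


def acat(p_values, weights=None):
    """Combine p-values with the Cauchy combination statistic (sketch).

    Equal weights are an assumption made here for illustration.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    total = sum(weights)
    # Statistic: T = sum_i w_i * tan((0.5 - p_i) * pi), with weights summing to 1.
    # tan((0.5 - p) * pi) equals 1 / tan(p * pi), which is more accurate
    # numerically when p is very small.
    t = sum((w / total) / math.tan(p * math.pi) for w, p in zip(weights, p_values))
    # Under the null, T is approximately standard Cauchy, so the combined
    # p-value is 0.5 - atan(T) / pi; atan2(1, T) / pi is the same quantity
    # written in a form that stays accurate for large T.
    return math.atan2(1.0, t) / math.pi


# Hypothetical p-values for one gene: two single-tissue tests and one
# sCCA cross-tissue feature test (made-up numbers, not results from the paper).
combined = acat([0.20, 0.04, 0.001])
print(f"combined p-value: {combined:.4g}")
```

Because the tangent transform is dominated by the smallest p-values, a single strong signal among many null tests still yields a small combined p-value, which is one reason this kind of aggregate test suits the setting where only some tissues or cross-tissue features carry the association.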

Keywords:

Body weight – Gene expression – Genetic polymorphism – Genetics – Genome-wide association studies – Heredity – Phenotypes – Research errors


