Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models
Authors: Sahir R. Bhatnagar aff001; Yi Yang aff003; Tianyuan Lu aff004; Erwin Schurr aff006; JC Loredo-Osti aff007; Marie Forest aff008; Karim Oualkacha aff009; Celia M. T. Greenwood aff001
Author affiliations: Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada aff001; Department of Diagnostic Radiology, McGill University, Montréal, Québec, Canada aff002; Department of Mathematics and Statistics, McGill University, Montréal, Québec, Canada aff003; Quantitative Life Sciences, McGill University, Montreal, Québec, Canada aff004; Lady Davis Institute, Jewish General Hospital, Montréal, Québec, Canada aff005; Department of Medicine, McGill University, Montréal, Québec, Canada aff006; Department of Mathematics and Statistics, Memorial University, St. John’s, Newfoundland and Labrador, Canada aff007; École de Technologie Supérieure, Montréal, Québec, Canada aff008; Département de Mathématiques, Université du Québec à Montréal, Montréal, Québec, Canada aff009; Gerald Bronfman Department of Oncology, McGill University, Montréal, Québec, Canada aff010; Department of Human Genetics, McGill University, Montreal, Quebec, Canada aff011
Published in: Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models. PLoS Genet 16(5): e1008766. doi:10.1371/journal.pgen.1008766
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1008766
Abstract
Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings, where the number of fixed-effect predictors greatly exceeds the number of samples. Two-stage approaches, in which the residuals from a null model adjusted for the subjects’ relationship structure are used as the response in a standard penalized regression model, can produce false positives or false negatives. To overcome these challenges, we develop ggmix, a general penalized LMM with a single random effect, for simultaneous SNP selection and adjustment for population structure in high-dimensional prediction models. We develop a blockwise coordinate descent algorithm with automatic tuning parameter selection that is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations and three real data examples, we show that ggmix leads to more parsimonious models with better prediction accuracy than the two-stage approach or principal component adjustment. Our method performs well even in the presence of highly correlated markers, and when the causal SNPs are included in the kinship matrix. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are implemented in an R package available on CRAN (https://cran.r-project.org/package=ggmix).
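The abstract describes the ggmix algorithm only at a high level. As a rough illustration of what "a penalized LMM with a single random effect fitted by blockwise coordinate descent" can look like, the Python sketch below alternates between a weighted lasso update for the SNP effects and an update of the variance parameters. It is not the ggmix implementation. The variance decomposition y = Xβ + u + e with u ~ N(0, ησ²K) and e ~ N(0, (1 − η)σ²I), the eigen-rotation by the kinship matrix K, the fixed penalty `lam`, the grid search for η, and the name `penalized_lmm_sketch` are all assumptions made for illustration. ggmix selects its tuning parameter automatically, which this sketch does not attempt.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def penalized_lmm_sketch(X, y, K, lam, n_outer=50, n_inner=100, tol=1e-6):
    """Hypothetical blockwise coordinate descent for an L1-penalized LMM.

    Assumed (illustrative) model, NOT code from the ggmix package:
        y = X @ beta + u + e,  u ~ N(0, eta*sigma2*K),  e ~ N(0, (1-eta)*sigma2*I).
    X and y are assumed centred, so no intercept is fitted.
    Objective: negative log-likelihood + lam * sum(|beta_j|).
    """
    n, p = X.shape
    # K = U diag(d) U^T; rotating by U^T makes the observations independent
    # with variances sigma2 * (1 + eta * (d_i - 1)).
    d, U = np.linalg.eigh(K)
    d = np.clip(d, 0.0, None)  # guard against tiny negative eigenvalues
    Xt, yt = U.T @ X, U.T @ y
    beta = np.zeros(p)
    eta, sigma2 = 0.5, np.var(y)
    eta_grid = np.linspace(0.01, 0.99, 99)
    for _ in range(n_outer):
        beta_old = beta.copy()
        # Block 1: (eta, sigma2) fixed -> weighted lasso by cyclic coordinate descent.
        w = 1.0 / (1.0 + eta * (d - 1.0))  # inverse variances, up to sigma2
        r = yt - Xt @ beta
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                xj = Xt[:, j]
                r_partial = r + xj * beta[j]  # residual without feature j
                z = np.sum(w * xj * r_partial)
                new = soft_threshold(z, lam * sigma2) / np.sum(w * xj * xj)
                r = r_partial - xj * new
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
            if max_change < tol:
                break
        # Block 2: beta fixed -> sigma2 has a closed form for each eta;
        # eta is chosen by a crude grid search over (0, 1).
        r2 = (yt - Xt @ beta) ** 2
        best = None
        for e in eta_grid:
            v = 1.0 + e * (d - 1.0)
            s2 = np.mean(r2 / v)
            obj = 0.5 * n * np.log(s2) + 0.5 * np.sum(np.log(v))
            if best is None or obj < best[0]:
                best = (obj, e, s2)
        _, eta, sigma2 = best
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta, eta, sigma2


# Toy usage: 50 families of 4 with kinship 0.5 within a family, 3 causal SNPs.
rng = np.random.default_rng(1)
n_fam, fam_size, p = 50, 4, 40
n = n_fam * fam_size
K = np.kron(np.eye(n_fam), 0.5 * np.ones((fam_size, fam_size)) + 0.5 * np.eye(fam_size))
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.8]
u = np.linalg.cholesky(K) @ rng.normal(size=n) * np.sqrt(0.6)  # eta=0.6, sigma2=1
y = X @ beta_true + u + rng.normal(size=n) * np.sqrt(0.4)
y = y - y.mean()
beta_hat, eta_hat, sigma2_hat = penalized_lmm_sketch(X, y, K, lam=30.0)
print("selected SNPs:", np.flatnonzero(beta_hat), "eta:", round(eta_hat, 2))
```

The point of the blockwise structure is that each block is easy on its own: with the variance parameters fixed, the rotated problem is an ordinary weighted lasso, and with β fixed, σ² has a closed form, leaving a one-dimensional search over η. By contrast, the two-stage approach mentioned above would estimate the variance parameters once under the null model and never revisit them after SNPs enter the model.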
Keywords:
Algorithms – Covariance – Genetic loci – Genome-wide association studies – Mathematical models – Molecular genetics – Simulation and modeling – Variant genotypes