Scalable probabilistic PCA for large-scale genetic variation data
Authors: Aman Agrawal aff001; Alec M. Chiu aff002; Minh Le aff003; Eran Halperin aff002, aff003; Sriram Sankararaman aff002, aff003
Author affiliations: Department of Computer Science, Indian Institute of Technology, Delhi, India aff001; Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States of America aff002; Department of Computer Science, University of California, Los Angeles, California, United States of America aff003; Department of Human Genetics, University of California, Los Angeles, California, United States of America aff004; Department of Anesthesiology and Perioperative Medicine, University of California, Los Angeles, California, United States of America aff005; Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America aff006; Institute of Precision Health, University of California, Los Angeles, California, United States of America aff007
Published in: Scalable probabilistic PCA for large-scale genetic variation data. PLoS Genet 16(5): e1008773. doi:10.1371/journal.pgen.1008773
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1008773
Summary
Principal component analysis (PCA) is a key tool for understanding population structure and controlling for population stratification in genome-wide association studies (GWAS). With the advent of large-scale datasets of genetic variation, there is a need for methods that can compute principal components (PCs) with scalable computational and memory requirements. We present ProPCA, a highly scalable method based on a probabilistic generative model, which computes the top PCs on genetic variation data efficiently. We applied ProPCA to compute the top five PCs on genotype data from the UK Biobank, consisting of 488,363 individuals and 146,671 SNPs, in about thirty minutes. To illustrate the utility of computing PCs in large samples, we leveraged the population structure inferred by ProPCA within White British individuals in the UK Biobank to identify several novel genome-wide signals of recent putative selection including missense mutations in RPGRIP1L and TLR4.
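The summary says only that ProPCA is built on a probabilistic generative model. As an illustration of that family of methods, the sketch below runs the classical EM iteration for PCA in the zero-noise limit of probabilistic PCA. It is not the ProPCA implementation: the function name `em_pca`, its parameters, and the dense NumPy arithmetic are assumptions made for this example, and none of the optimizations a biobank-scale tool would need are included.

```python
import numpy as np


def em_pca(Y, k, n_iter=100, seed=0):
    """Illustrative EM iteration for PCA (zero-noise limit of probabilistic PCA).

    Y is an (m, n) array with m SNPs as rows and n individuals as columns;
    each row is assumed to be mean-centered already. Returns (U, s, Vt):
    approximate top-k left singular vectors, singular values, and right
    singular vectors of Y.
    """
    rng = np.random.default_rng(seed)
    m, _ = Y.shape
    C = rng.standard_normal((m, k))  # random initial loadings
    for _ in range(n_iter):
        # E-step: latent coordinates X = (C^T C)^{-1} C^T Y
        X = np.linalg.solve(C.T @ C, C.T @ Y)
        # M-step: loadings C = Y X^T (X X^T)^{-1}
        C = np.linalg.solve(X @ X.T, X @ Y.T).T
    # C now spans (approximately) the top-k principal subspace.
    # Orthonormalize it, then rotate to the principal axes with a small SVD.
    Q, _ = np.linalg.qr(C)
    U_small, s, Vt = np.linalg.svd(Q.T @ Y, full_matrices=False)
    return Q @ U_small, s, Vt


if __name__ == "__main__":
    # Synthetic low-rank data standing in for a genotype matrix (an assumption).
    rng = np.random.default_rng(1)
    Y = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
    Y += 0.01 * rng.standard_normal(Y.shape)
    Y -= Y.mean(axis=1, keepdims=True)
    U, s, Vt = em_pca(Y, k=3)
    print("top singular values:", s)
```

Each iteration touches the data only through products of Y with thin matrices of width k, so the cost per iteration grows linearly with the number of genotype entries. That property is why iterations of this kind are attractive for large samples.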
Keywords:
Algorithms – Genome-wide association studies – Genomic signal processing – Genomics statistics – Molecular genetics – Principal component analysis – Singular value decomposition – Variant genotypes
Article published in the journal PLOS Genetics, 2020, Issue 5