Genotype imputation using the Positional Burrows Wheeler Transform
Authors: Simone Rubinacci aff001; Olivier Delaneau aff001; Jonathan Marchini aff003
Author affiliations: Department of Computational Biology, University of Lausanne, Lausanne, Switzerland aff001; Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland aff002; Regeneron Genetics Center, Tarrytown, New York, USA aff003
Published in: Genotype imputation using the Positional Burrows Wheeler Transform. PLoS Genet 16(11): e1009049. doi:10.1371/journal.pgen.1009049
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1009049
Abstract
Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years, reference panels have increased in size by more than 100-fold. Increasing reference panel size improves the accuracy of markers with low minor allele frequencies but poses ever-increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method that accuracy is optimized by using a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best-matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ∼65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both of these methods. Using simulated reference panels, we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size, IMPUTE5 is able to utilize a smaller number of reference haplotypes, thus reducing computational cost.
Keywords:
Algorithms – Consortia – Gene mapping – Genome-wide association studies – Genomics – Genotyping – Haplotypes – Hidden Markov models
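As a rough illustration of the PBWT-based haplotype selection described in the abstract, the sketch below builds the positional prefix order site by site and, at each genotyped marker, collects the reference haplotypes adjacent to the target in that order. It is a toy under stated assumptions, not the IMPUTE5 implementation: alleles are assumed binary (0/1), the target is assumed to be already phased and is simply appended to the panel, the divergence array used to measure match lengths is omitted, and the names `pbwt_order_step`, `select_conditioning_states` and `n_neighbours` are hypothetical.

```python
from typing import List, Sequence, Set


def pbwt_order_step(column: Sequence[int], order: List[int]) -> List[int]:
    """Update the positional prefix order after reading one site.

    `order` lists haplotype indices sorted by their reversed prefixes up to
    the previous site. A stable partition on the current allele (0s first,
    then 1s) yields the order that includes the current site.
    """
    zeros = [h for h in order if column[h] == 0]
    ones = [h for h in order if column[h] == 1]
    return zeros + ones


def select_conditioning_states(
    reference: Sequence[Sequence[int]],
    target: Sequence[int],
    n_neighbours: int = 2,
) -> Set[int]:
    """Return indices of reference haplotypes that sit next to the target
    in the prefix order at any site (a simplified selection rule)."""
    # Assumption: the target is treated as one extra haplotype in the panel.
    haps = [list(h) for h in reference] + [list(target)]
    t = len(haps) - 1
    order = list(range(len(haps)))
    selected: Set[int] = set()
    for k in range(len(target)):
        column = [h[k] for h in haps]
        order = pbwt_order_step(column, order)
        pos = order.index(t)
        lo = max(0, pos - n_neighbours)
        hi = min(len(order), pos + n_neighbours + 1)
        # Neighbours in prefix order share the longest matches ending at site k.
        selected.update(h for h in order[lo:hi] if h != t)
    return selected


if __name__ == "__main__":
    panel = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
    print(select_conditioning_states(panel, [1, 1, 1, 1], n_neighbours=1))
```

In this toy, the selected set grows with `n_neighbours` rather than with the panel size, which is the intuition behind conditioning on a small, locally well-matched subset of a very large reference panel.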
Published in the journal: PLOS Genetics, 2020, Issue 11