UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts

Authors: Alex Diaz-Papkovich aff001;  Luke Anderson-Trocmé aff002;  Chief Ben-Eghan aff002;  Simon Gravel aff002
Author affiliations: Quantitative Life Sciences, McGill University, Montreal, Québec, Canada aff001;  McGill University and Genome Quebec Innovation Centre, Montreal, Québec, Canada aff002;  Department of Human Genetics, McGill University, Montreal, Quebec, Canada aff003
Published in: UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts. PLoS Genet 15(11): e1008432. doi:10.1371/journal.pgen.1008432
Category: Research Article
doi: 10.1371/journal.pgen.1008432


Human populations feature both discrete and continuous patterns of variation. Current analysis approaches struggle to jointly identify these patterns because of modelling assumptions, mathematical constraints, or numerical challenges. Here we apply uniform manifold approximation and projection (UMAP), a non-linear dimension reduction tool, to three well-studied genotype datasets and discover overlooked subpopulations within the American Hispanic population, fine-scale relationships between geography, genotypes, and phenotypes in the UK population, and cryptic structure in the Thousand Genomes Project data. This approach is well-suited to the influx of large and diverse data and opens new lines of inquiry in population-scale datasets.

Keywords:

African people – Caribbean – Data visualization – Ethnicities – Europe – Hispanic people – Chinese people – Principal component analysis


