Genome-wide DNA methylation and gene expression patterns reflect genetic ancestry and environmental differences across the Indonesian archipelago

Authors: Heini M. Natri (aff001); Katalina S. Bobowik (aff003); Pradiptajati Kusuma (aff006); Chelzie Crenna Darusallam (aff006); Guy S. Jacobs (aff007); Georgi Hudjashov (aff008); J. Stephen Lansing (aff009); Herawati Sudoyo (aff006); Nicholas E. Banovich (aff002); Murray P. Cox (aff008); Irene Gallego Romero (aff003)
Author affiliations: Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America (aff001); The Translational Genomics Research Institute, Phoenix, Arizona, United States of America (aff002); Melbourne Integrative Genomics, University of Melbourne, Parkville, Australia (aff003); School of BioSciences, University of Melbourne, Parkville, Australia (aff004); Centre for Stem Cell Systems, University of Melbourne, Parkville, Australia (aff005); Genome Diversity and Diseases Laboratory, Eijkman Institute for Molecular Biology, Jakarta, Indonesia (aff006); Complexity Institute, Nanyang Technological University, Singapore, Singapore (aff007); Statistics and Bioinformatics Group, School of Fundamental Sciences, Massey University, Palmerston North, New Zealand (aff008); Santa Fe Institute, Santa Fe, New Mexico, United States of America (aff009); Vienna Complexity Science Hub, Vienna, Austria (aff010); Stockholm Resilience Center, Kräftriket, Stockholm, Sweden (aff011); Department of Medical Biology, Faculty of Medicine, University of Indonesia, Jakarta, Indonesia (aff012); Sydney Medical School, University of Sydney, Sydney, NSW, Australia (aff013)
Published in: PLoS Genet 16(5): e1008749. doi:10.1371/journal.pgen.1008749
Category: Research Article


Indonesia is the world’s fourth most populous country, host to striking levels of human diversity, regional patterns of admixture, and varying degrees of introgression from both Neanderthals and Denisovans. However, it has been largely excluded from the human genomics sequencing boom of the last decade. To provide a benchmark dataset of molecular phenotypes across the region, we generated genome-wide CpG methylation and gene expression measurements in over 100 individuals from three locations that capture the major genomic and geographical axes of diversity across the Indonesian archipelago. Investigating between- and within-island differences, we find that up to 10.55% of tested genes are differentially expressed between the islands of Sumba and New Guinea. Variation in gene expression is closely associated with DNA methylation: expression levels of 9.80% of genes correlate with nearby promoter CpG methylation, and many of these genes are differentially expressed between islands. Genes identified in our differential expression and methylation analyses are enriched in immune-related pathways. This highlights Indonesia's role as a tropical source of infectious disease diversity and the strong selective pressures these diseases have exerted on humans. Finally, we identify robust within-island variation in DNA methylation and gene expression, likely driven by fine-scale environmental differences across sampling sites. Together, these results strongly suggest complex relationships between DNA methylation, transcription, archaic hominin introgression and immunity, all jointly shaped by the environment. These findings have implications for the application of genomic medicine, both in critically understudied Indonesia and globally. They will also allow a better understanding of the interacting roles of genomic and environmental factors in shaping molecular and complex phenotypes.
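A statement such as "expression levels of 9.80% of genes correlate with nearby promoter CpG methylation" rests on a per-gene association test followed by multiple-testing correction. The block below is a minimal sketch of that idea only. It assumes one promoter methylation summary (for example a mean beta value) and one expression value per individual for each gene. It uses Spearman rank correlation, a permutation p-value and Benjamini-Hochberg FDR control. This is a generic, swapped-in technique for illustration, not the authors' pipeline. All function names and the toy gene names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no rank information
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm: int, rng: random.Random) -> float:
    """Two-sided permutation p-value for |rho|, shuffling y across individuals."""
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), capped at 1."""
    m = len(pvals)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    descending = sorted(range(m), key=lambda k: pvals[k], reverse=True)
    for pos, i in enumerate(descending):
        rank = m - pos  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def methylation_expression_scan(
    meth: Dict[str, Sequence[float]],
    expr: Dict[str, Sequence[float]],
    fdr: float = 0.05,
    n_perm: int = 999,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Return {gene: rho} for genes below the FDR cut-off, plus their fraction of tested genes."""
    genes = sorted(set(meth) & set(expr))  # only genes measured on both assays
    rng = random.Random(seed)
    rhos = [spearman(meth[g], expr[g]) for g in genes]
    pvals = [permutation_p(meth[g], expr[g], n_perm, rng) for g in genes]
    qvals = benjamini_hochberg(pvals)
    hits = {g: r for g, r, q in zip(genes, rhos, qvals) if q < fdr}
    return hits, len(hits) / len(genes)


if __name__ == "__main__":
    # Toy data for 12 hypothetical individuals.
    meth = {"GENE_A": [0.05 * i for i in range(1, 13)],
            "GENE_B": [0.05 * i for i in range(1, 13)]}
    expr = {"GENE_A": [20.0 - i for i in range(1, 13)],     # falls as methylation rises
            "GENE_B": [6, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6]}  # no monotone trend
    hits, fraction = methylation_expression_scan(meth, expr)
    print(hits, fraction)  # → {'GENE_A': -1.0} 0.5
```

Permutation p-values keep the sketch dependency-free. With `n_perm = 999` the smallest attainable p-value is 0.001, so a scan over thousands of genes would need more permutations or an analytic test.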

Keywords:

DNA methylation – Gene expression – Genomic medicine – Indonesia – Introgression – Islands – Population genetics – RNA sequencing



