Abundance of ethnically biased microsatellites in human gene regions

Authors: Nick Kinney aff001; Lin Kang aff001; Laurel Eckstrand aff003; Arichanah Pulenthiran aff001; Peter Samuel aff001; Ramu Anandakrishnan aff001; Robin T. Varghese aff001; P. Michalak aff001; Harold R. Garner aff001
Author affiliations: Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America aff001; Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America aff002; Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States of America aff003; Institute of Evolution, University of Haifa, Haifa, Israel aff004
Published in: PLoS ONE 14(12)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0225216


Microsatellites, a type of short tandem repeat (STR), have been used for decades as putatively neutral markers to study the genetic structure of diverse human populations. However, recent studies have demonstrated that some microsatellites contribute to gene expression, cis heritability, and phenotype. As a corollary, some microsatellites may contribute to differential gene expression and RNA/protein structure stability in distinct human populations. To test this hypothesis, we investigate genotype frequencies, functional relevance, and adaptive potential of microsatellites in five super-populations (ethnicities) drawn from the 1000 Genomes Project. We discover 3,984 ethnically biased microsatellite loci (EBML); for each EBML, at least one ethnicity has genotype frequencies statistically different from those of the remaining four. South Asian, East Asian, European, and American EBML show significant overlap; in contrast, the set of African EBML is mostly unique. We cross-reference the 3,984 EBML with 2,060 previously identified expression STRs (eSTRs); repeats known to affect gene expression (64 total) are over-represented. The most significant pathway enrichments are those associated with the matrisome: a broad collection of genes encoding the extracellular matrix and its associated proteins. At least 14 of the EBML have established links to human disease. Analysis of the 3,984 EBML with respect to known selective sweep regions in the genome shows that allelic variation in some of them is likely associated with adaptive evolution.
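
As an illustration of the per-locus comparison described above, the following minimal Python sketch computes a Pearson chi-square statistic for one population against the pooled remaining four. The abstract does not name the statistical test; a chi-square test on a 2 x k contingency table is an assumption here, and the function name `chi_square_2xk` and all genotype counts are hypothetical.

```python
from typing import Sequence, Tuple


def chi_square_2xk(group: Sequence[int], rest: Sequence[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    group: genotype-class counts for one population (hypothetical input).
    rest:  genotype-class counts pooled over the remaining populations.
    """
    if len(group) != len(rest):
        raise ValueError("both rows need the same genotype classes")
    # Drop genotype classes that nobody carries; they add no information.
    cols = [(g, r) for g, r in zip(group, rest) if g + r > 0]
    row_totals = (sum(g for g, _ in cols), sum(r for _, r in cols))
    total = sum(row_totals)
    if min(row_totals) == 0 or len(cols) < 2:
        raise ValueError("need two non-empty rows and at least two classes")
    stat = 0.0
    for g, r in cols:
        col_total = g + r
        for observed, row_total in ((g, row_totals[0]), (r, row_totals[1])):
            # Expected count under the hypothesis of equal genotype frequencies.
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(cols) - 1


# Invented counts for three genotype classes at a single locus.
one_population = [120, 300, 241]
remaining_four = [900, 700, 243]
statistic, dof = chi_square_2xk(one_population, remaining_four)
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
```

Turning the statistic into a p-value requires the chi-square distribution, for example via `scipy.stats.chi2_contingency`, and a genome-wide scan over many loci would also need a multiple-testing correction; neither is shown in this sketch.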

Keywords:

Contingency tables – Ethnicities – Gene expression – Human genomics – Introns – Population genetics – Principal component analysis – Sequence motif analysis



Article published in: PLoS ONE, 2019, Issue 12