Comparing bioinformatic pipelines for microbial 16S rRNA amplicon sequencing

Authors: Andrei Prodan aff001; Valentina Tremaroli aff002; Harald Brolin aff002; Aeilko H. Zwinderman aff003; Max Nieuwdorp aff001; Evgeni Levin aff001
Author affiliations: Department of Experimental Vascular Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands aff001; Wallenberg Laboratory for Cardiovascular and Metabolic Research, Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden aff002; Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam University Medical Centers, Amsterdam, The Netherlands aff003; Horaizon BV, Delft, The Netherlands aff004
Published in: PLoS ONE 15(1)
Category: Research Article
doi: 10.1371/journal.pone.0227434


Microbial amplicon sequencing studies are an important tool in biological and biomedical research. Widespread 16S rRNA gene microbial surveys have shed light on the structure of many ecosystems inhabited by bacteria, including the human body. However, specialized software and algorithms are needed to convert raw sequencing data into biologically meaningful information (i.e., tables of bacterial counts). Although different bioinformatic pipelines are available in this rapidly changing and improving field, users are often unaware of the limitations and biases associated with individual pipelines, and there is a lack of agreement regarding best practices. Here, we compared six bioinformatic pipelines for the analysis of amplicon sequence data: three operational taxonomic unit (OTU)-level flows (QIIME-uclust, MOTHUR, and USEARCH-UPARSE) and three amplicon sequence variant (ASV)-level flows (DADA2, Qiime2-Deblur, and USEARCH-UNOISE3). We tested workflows with different quality control options, clustering algorithms, and cutoff parameters on a mock community as well as on a large (N = 2170), recently published fecal sample dataset from the multi-ethnic HELIUS study. We assessed the sensitivity, specificity, and degree of consensus of the different outputs. DADA2 offered the best sensitivity, at the expense of decreased specificity compared to USEARCH-UNOISE3 and Qiime2-Deblur. USEARCH-UNOISE3 showed the best balance between resolution and specificity. The OTU-level pipelines USEARCH-UPARSE and MOTHUR performed well, but with lower specificity than the ASV-level pipelines. QIIME-uclust produced a large number of spurious OTUs as well as inflated alpha-diversity measures and should be avoided in future studies. This study provides guidance for researchers using amplicon sequencing to gain biological insights.
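As a rough illustration of how a pipeline's output might be scored against a mock community, the sketch below compares the set of sequences a pipeline reports with the set of sequences known to be in the mock. The abstract does not say how sensitivity and specificity were computed, so everything here is an assumption: the function name `mock_community_metrics` is hypothetical, sequences are matched by exact identity, and precision (the fraction of reported sequences that are real) stands in for specificity because true negatives are not naturally defined for a mock community.

```python
def mock_community_metrics(expected, observed):
    """Score a pipeline's reported sequences against a mock community.

    expected: iterable of sequences (or IDs) known to be in the mock community.
    observed: iterable of sequences (or IDs) reported by the pipeline.
    Matching is by exact identity, which is an assumption of this sketch.
    """
    expected = set(expected)
    observed = set(observed)

    recovered = expected & observed   # known members the pipeline found
    missed = expected - observed      # known members the pipeline failed to report
    spurious = observed - expected    # reported sequences absent from the mock

    # Sensitivity: share of known members that were recovered.
    sensitivity = len(recovered) / len(expected) if expected else 0.0
    # Precision: share of reported sequences that are real mock members
    # (used here as a stand-in for specificity).
    precision = len(recovered) / len(observed) if observed else 0.0

    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "n_missed": len(missed),
        "n_spurious": len(spurious),
    }


# Hypothetical toy data: four known members, one missed, one spurious call.
example = mock_community_metrics(
    expected=["seqA", "seqB", "seqC", "seqD"],
    observed=["seqA", "seqB", "seqC", "seqX"],
)
print(example)
```

Under these assumptions, a pipeline that reports many extra sequences would keep a high sensitivity while its precision drops. This is the kind of trade-off the abstract describes between DADA2 and the other ASV-level pipelines.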

Keywords:

Bacteria – Bioinformatics – Clustering algorithms – DNA sequencing – Quality control – Ribosomal RNA – Sequence alignment – Sequence databases



Article published in: PLoS ONE, 2020, Issue 1