Sequencing artifacts derived from a library preparation method using enzymatic fragmentation

Authors: Norio Tanaka aff001; Akihisa Takahara aff001; Taichi Hagio aff001; Rika Nishiko aff001; Junko Kanayama aff001; Osamu Gotoh aff001; Seiichi Mori aff001
Author affiliations: Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Ariake, Koto-ku, Tokyo, Japan aff001; Data4C’s Co. Ltd., Minami-azabu, Minato-ku, Tokyo, Japan aff002
Published in: PLoS ONE 15(1)
Category: Research Article
doi: 10.1371/journal.pone.0227427


DNA fragmentation is a fundamental step during library preparation in hybridization capture-based, short-read sequencing. Ultrasonication has been used thus far to prepare DNA of an appropriate size, but this method is associated with a considerable loss of DNA sample. More recently, studies have employed library preparation methods that rely on enzymatic fragmentation with DNA endonucleases to minimize DNA loss, particularly in nano-quantity samples. Yet, despite the wide use of these methods, the effect of enzymatic fragmentation on the resultant sequences has not been carefully assessed. Here, we performed pairwise comparisons of somatic variants called from the same tumor DNA samples prepared using ultrasonic and enzymatic fragmentation methods. Our analysis revealed a substantially larger number of recurrent artifactual SNVs/indels in endonuclease-treated libraries than in those prepared by ultrasonication. These artifacts were marked by palindromic structure in the genomic context, positional bias in sequenced reads, and multi-nucleotide substitutions. Taking advantage of these distinctive features, we developed a filtering algorithm to distinguish genuine somatic mutations from artifactual noise with high specificity and sensitivity. Removing this noise recovered the composition of the mutational signatures in the tumor samples. Thus, we provide an informatics algorithm as a solution to the sequencing errors produced as a consequence of endonuclease-mediated fragmentation, highlighted for the first time in this study.
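To illustrate the first of the artifact features named in the abstract (palindromic structure in the genomic context), the following is a minimal sketch of how a variant's flanking sequence could be screened for inverted repeats. It is not the authors' filtering algorithm: the function names, the `max_loop` and `min_arm` parameters, and their default values are hypothetical choices made only for illustration.

```python
# Hypothetical sketch: flag variants whose flanking reference sequence
# contains an inverted repeat (a palindrome in the double-stranded sense).
# Thresholds are illustrative assumptions, not values from the study.

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def longest_inverted_repeat(context: str, max_loop: int = 10) -> int:
    """Return the longest arm length of any inverted repeat in `context`.

    An inverted repeat is a left arm followed by an optional loop of up to
    `max_loop` bases and a right arm that is the reverse complement of the
    left arm. Bases other than A/C/G/T never pair.
    """
    seq = context.upper()
    best = 0
    for left in range(len(seq)):
        for loop in range(max_loop + 1):
            right = left + 1 + loop
            arm = 0
            # Extend the two arms outward from the loop while bases pair.
            while (
                left - arm >= 0
                and right + arm < len(seq)
                and PAIR.get(seq[left - arm]) == seq[right + arm]
            ):
                arm += 1
            best = max(best, arm)
    return best


def looks_palindromic(context: str, min_arm: int = 4, max_loop: int = 10) -> bool:
    """Flag a context whose longest inverted-repeat arm reaches `min_arm`."""
    return longest_inverted_repeat(context, max_loop) >= min_arm


if __name__ == "__main__":
    # A stem-loop-like context: arm GGGCC, loop TTTT, arm GGCCC.
    print(looks_palindromic("GGGCCTTTTGGCCC"))
    # A homopolymer has no complementary arms.
    print(looks_palindromic("AAAAAAAA"))
```

In practice such a check would be only one of several signals; the abstract also names positional bias in sequenced reads and multi-nucleotide substitutions, which would need read-level data that this sketch does not model.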

Keywords:

Cancer treatment – DNA fragmentation – DNA libraries – DNA sequencing – Mutation databases – Nucleotide sequencing – Substitution mutation – DNA fragmentation techniques



Article published in: PLoS ONE, 2020, Issue 1