Bioinformatics and Next-generation Sequencing

Authors: A. Krejčí; P. Müller; B. Vojtěšek
Authors' workplace: Regionální centrum aplikované molekulární onkologie, Masarykův onkologický ústav, Brno
Published in: Klin Onkol 2015; 28(Supplementum 2): 91-96
doi: https://doi.org/10.14735/amko20152S91

Overview

Next-generation sequencing (NGS) technologies are now well established in research and are steadily moving into clinical practice. Sequencers produce vast amounts of data, so bioinformatics methods are needed to process them; without computational analysis, sequencing could not yield biologically relevant information. In this review, we introduce the basic NGS-related bioinformatics methods commonly used in oncological research. We also describe some of the common problems that complicate data processing and the interpretation of results.

Key words:
bioinformatics – high-throughput nucleotide sequencing – mutations – cancer research – clinical application

This study was supported by the European Regional Development Fund and the State Budget of the Czech Republic (RECAMO, CZ.1.05/2.1.00/03.0101), by the project MEYS – NPS I – LO1413, MH CZ – DRO (MMCI, 00209805) and BBMRI_CZ (LM2010004).

The authors declare they have no potential conflicts of interest concerning drugs, products, or services used in the study.

The Editorial Board declares that the manuscript met the ICMJE “uniform requirements” for biomedical papers.

Submitted: 21. 4. 2015

Accepted: 26. 6. 2015
