1. Zaharia M, Chowdhury M, Franklin MJ, Shenker S and Stoica I. “Spark: cluster computing with working sets”, HotCloud’10, USENIX Association, Berkeley, CA, USA.
2. Van der Auwera GA, Carneiro M, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, et al. “From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline”, Current Protocols in Bioinformatics, 43:11.10.1–11.10.33, 2013.
3. Mushtaq H, Liu F, Costa C, Liu G, Hofstee P and Al-Ars Z. “SparkGA: A Spark Framework for Cost Effective, Fast and Accurate DNA Analysis at Scale”, Proc. ACM Conference Bioinformatics, Computational Biology and Health Informatics, 2017.
4. Jones DC, Ruzzo WL, Peng X and Katze MG. “Compression of next-generation sequencing reads aided by highly efficient de novo assembly”, Nucleic Acids Research, 2012. doi: 10.1093/nar/gks754
5. Langmead B and Salzberg SL. “Fast gapped-read alignment with Bowtie 2”, Nature Methods, vol. 9, no. 4, pp. 357–359, 2012. doi: 10.1038/nmeth.1923
6. Li H. “Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM”, arXiv:1303.3997 [q-bio.GN], 2013.
7. Kelly BJ, Fitch JR, Hu Y, Corsmeier DJ, Zhong H, Wetzel AN, et al. “Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics”, Genome Biology, vol. 16, no. 6, 2015.
8. Decap D, Reumers J, Herzeel C, Costanza P and Fostier J. “Halvade: scalable sequence analysis with MapReduce”, Bioinformatics, btv179v2–btv179, 2015.
9. Deng L, Huang G, Zhuang Y, Wei J and Yan Y. “HiGene: A high-performance platform for genomic data analysis”, Proc. IEEE Inte’l Conf. Bioinformatics and Biomedicine, (BIBM16), Shenzhen, China, pp. 576–583, 2016.
10. Mushtaq H and Al-Ars Z. “Cluster-based Apache Spark implementation of the GATK DNA analysis pipeline”, 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, pp. 1471–1477, 2015.
11. Mushtaq H, Ahmed N and Al-Ars Z. “Streaming Distributed DNA Sequence Alignment Using Apache Spark”, 17th IEEE International Conference on BioInformatics and BioEngineering, 2017.