Structural variation and its potential impact on genome instability: Novel discoveries in the EGFR landscape by long-read sequencing

Authors: George W. Cook aff001; Michael G. Benton aff002; Wallace Akerley aff003; George F. Mayhew aff004; Cynthia Moehlenkamp aff004; Denise Raterman aff004; Daniel L. Burgess aff004; William J. Rowell aff005; Christine Lambert aff005; Kevin Eng aff005; Jenny Gu aff005; Primo Baybayan aff005; John T. Fussell aff001; Heath D. Herbold aff001; John M. O’Shea aff006; Thomas K. Varghese aff007; Lyska L. Emerson aff008
Author affiliations: Sentry Genomics, Baton Rouge, LA, United States of America aff001; Department of Chemical Engineering, Louisiana State University, Baton Rouge, LA, United States of America aff002; Huntsman Cancer Institute, University of Utah School of Medicine, Department of Oncological Sciences, Salt Lake City, UT, United States of America aff003; Roche Sequencing Solutions, Madison, WI, United States of America aff004; Pacific Biosciences, Menlo Park, CA, United States of America aff005; Huntsman Cancer Institute, Biorepository Molecular Pathology, Salt Lake City, UT, United States of America aff006; Huntsman Cancer Institute, University of Utah School of Medicine, Department of Surgery, Division of Thoracic Surgery, Salt Lake City, UT, United States of America aff007; Huntsman Cancer Institute, University of Utah School of Medicine, Department of Pathology, Salt Lake City, UT, United States of America aff008
Published in: PLoS ONE 15(1)
Category: Research Article


Structural variation (SV) is typically defined as variation within the human genome that exceeds 50 base pairs (bp). SV may be copy-number neutral, or it may involve duplications, deletions, and complex rearrangements. Recent studies have shown SV to be associated with many human diseases. However, studying SV has been challenging because of technological constraints. With the advent of third-generation (long-read) sequencing technology, it has become possible to explore longer stretches of DNA that were previously difficult to examine. In the present study, we used long-read sequencing to examine SV in the EGFR landscape of four haplotypes derived from two human samples. We analyzed the EGFR gene and its landscape (±500,000 bp) using this approach and identified a region of non-coding DNA with over 90% similarity to the most common activating EGFR mutation in non-small cell lung cancer. Based on previously published Alu-element genome instability algorithms, we propose a molecular mechanism to explain how this non-coding region of DNA may interact with and affect the stability of the EGFR gene, potentially generating this cancer-driver gene. Using these techniques, we also identified previously hidden SV in the four haplotypes and in the human reference genome (hg38). We applied previously published algorithms to compare the relative stabilities of these five EGFR gene landscape haplotypes (the four sample haplotypes plus hg38) and to estimate their relative potential to generate the canonical 15 bp deletion in EGFR exon 19. To our knowledge, the present study is the first to use differences in genomic architecture between targeted, cancer-linked, phased haplotypes to estimate their relative potential to form a common cancer-linked driver mutation.

Keywords:

Alu elements – Cancer genomics – Genetic networks – Haplotypes – Human genomics – Protein structure networks – Sequence alignment – Sequence motif analysis


Published in PLoS ONE, 2020, Issue 1