Restriction enzymes use a 24 dimensional coding space to recognize 6 base long DNA sequences

Authors: Thomas D. Schneider aff001; Vishnu Jejjala aff002
Author affiliations: National Institutes of Health, National Cancer Institute, Center for Cancer Research, RNA Biology Laboratory, Frederick, Maryland, United States of America aff001; Mandelstam Institute for Theoretical Physics, School of Physics, NITheP, and CoE-MaSS, University of the Witwatersrand, Johannesburg, South Africa aff002; David Rittenhouse Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America aff003
Published in: PLoS ONE 14(10)
Category: Research Article
doi: 10.1371/journal.pone.0222419


Restriction enzymes recognize and bind to specific sequences on invading bacteriophage DNA. Like a key in a lock, these proteins require many contacts to specify the correct DNA sequence. Using information theory, we develop an equation that defines the number of independent contacts, which is the dimensionality of the binding. We show that EcoRI, which binds to the sequence GAATTC, functions in 24 dimensions. Information theory represents messages as spheres in high-dimensional spaces. Better sphere packing leads to better communication systems. In 24 dimensions, the densest known packing of hyperspheres occurs on the Leech lattice. We suggest that a single EcoRI protein molecule employs a Leech lattice in its operation. Optimizing the density of sphere packing explains why 6-base restriction enzymes are so common.
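Two of the numbers behind this summary can be checked with simple arithmetic. The sketch below is only illustrative and does not reproduce the dimensionality equation developed in the paper, which is not given on this page. It assumes four equally likely bases, so that picking one 6-base site such as GAATTC takes 2 bits per base, and it uses the standard closed form π^12 / 12! for the packing density of the Leech lattice. The function names are hypothetical.

```python
import math


def bits_to_specify(site_length: int, alphabet_size: int = 4) -> float:
    """Bits needed to pick one sequence out of alphabet_size**site_length.

    Assumes every symbol is equally likely (an idealization).
    """
    return site_length * math.log2(alphabet_size)


def leech_packing_density() -> float:
    """Fraction of 24-dimensional space covered by spheres centered
    on the Leech lattice, using the closed form pi**12 / 12!."""
    return math.pi ** 12 / math.factorial(12)


site_bits = bits_to_specify(6)  # GAATTC is 6 bases long
density = leech_packing_density()

print(f"bits to specify a 6-base site: {site_bits:.1f}")
print(f"Leech lattice packing density: {density:.6f}")
```

The density is small in absolute terms because spheres fill very little of a high-dimensional space. The point made in the abstract is that no other arrangement in 24 dimensions is known to do better.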

Keywords:

DNA-binding proteins – Information theory – Sequence databases – Molecular machines – Leeches – Coding theory – Channel capacity – Packing density



Article published in: PLoS ONE, 2019, Issue 10