Improving graphs of cycles approach to structural similarity of molecules

Autoři: Stefi Nouleho Ilemo aff001;  Dominique Barth aff001;  Olivier David aff002;  Franck Quessette aff001;  Marc-Antoine Weisser aff003;  Dimitri Watel aff004
Působiště autorů: DAVID, Department of Computer Science, University of Versailles Saint Quentin, Versailles, France aff001;  ILV, Department of Chemistry, University of Versailles, Versailles, France aff002;  LRI, CentraleSupelec, Paris-Saclay University, Evry, France aff003;  ENSIIE, Evry, France aff004;  SAMOVAR, Telecom SudParis, Evry, France aff005
Vyšlo v časopise: PLoS ONE 14(12)
Kategorie: Research Article
doi: 10.1371/journal.pone.0226680


This paper focuses on determining the structural similarity of two molecules, i.e., the similarity of the interconnection of all the elementary cycles in the corresponding molecular graphs. In this paper, we propose and analyze an algorithmic approach based on the resolution of the Maximum Common Edge Subgraph (MCES) problem with graphs representing the interaction of cycles molecules. Using the ChEBI database, we compare the effectiveness of this approach in terms of structural similarity and computation time with two calculations of similarity of molecular graphs, one based on the MCES, the other on the use of different fingerprints (Daylight, ECFP4, ECFP6, FCFP4, FCFP6) to measure Tanimoto coefficient. We also analyze the obtained structural similarity results for a selected subset of molecules.

Klíčová slova:

Algorithms – Amphotericin – Daylight – Graph theory – Graphs – Molecular structure – Quinine – Molecular computing


1. Reaxys. Accessed Nov, 2017.

2. Chemical entities of biological interest (chebi). Accessed April 1, 2015.

3. Birchall K. and Gillet V. J. Reduced Graphs and Their Applications in Chemoinformatics, pages 197–212. Humana Press, 2011.

4. C. Berge. Théorie des graphes et ses applications. Dunod, 1963.

5. Gasteiger J. Handbook of Chemoinformatics: From Data to Knowledge (Representation of Molecular Structures). Wiley, 1 edition, 2003.

6. Raymond J. W., Gardiner E. J., and W. Peter. Rascal: Calculation of graph similarity using maximum common edge subgraphs. The Computer Journal, 45(6), 2002. doi: 10.1093/comjnl/45.6.631

7. Zager L. A. and Verghese G. C. Graph similarity scoring and matching. Applied Mathematics Letters, 45(21):86–94, 2008. doi: 10.1016/j.aml.2007.01.006

8. Johnson M. A. and Maggiora G. M. Concepts and applications of molecular similarity. The American Chemical Society, 1988.

9. Eckert H. and Bajorath J. Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discovery Today, 12(5):225–233, 2007. doi: 10.1016/j.drudis.2007.01.011 17331887

10. Cereto-Massagué A., Ojeda M. J., Valls C., Mulero M., Garcia-Vallvé S., and Pujadas G. Molecular fingerprint similarity search in virtual screening. Methods, 71:58–63, 2015. doi: 10.1016/j.ymeth.2014.08.005 25132639

11. B. Andreas, Jeremy J.L., S. Josef, Sai S.C.K., G. Meir, and D. John W. How similar are similarity searching methods? a principal component analysis of molecular descriptor space. J. Chem. Inf. Model., 49(1), 2009.

12. d. G. Kurt and C. F. Molecular graph augmentation with rings and functional groups. J. Chem. Inf. Model., 50(9):1660–1668, 2010. doi: 10.1021/ci9005035

13. G. Benoit, B. L., and V. D. Graph kernels in chemoinformatics. Quantitative Graph Theory Mathematical Foundations and Applications, CRC Press, pages 425–470, 2015.

14. Flower D. On the properties of bit string-based measures of chemical similarity. Journal of Chemical Information and Computer Sciences, 38:379–386, 05 1998. doi: 10.1021/ci970437z

15. F. Abu-khzam, N. Samatova, M. A. Rizk, and M. Langston. The maximum common subgraph problem: Faster solutions via vertex cover. IEEE/ACS International Conference on Computer Systems and Applications, pages 367–373, 2007.

16. A. Tatsuya and N. Hiroshi. Comparison and enumeration of chemical graphs. Comput Struct Biotechnol J., 5, 2013.

17. N. Michel and B. Horst. Bridging the Gap Between Graph Edit Distance and Kernel Machines. World Scientific Publishing Co., Inc., 2007.

18. S. Roger, M. John, N. O. Boyle, G. Andrew J., S. Stefan, and G. Darren V.S. Chemical similarity based on graph edit distance:efficient implementation and the challenges of evaluation. 7th Joint Sheffield Conference on Chemoinformatics, 2015.

19. Rogers D. J. and Tanimoto T. T. A computer program for classifying plants. Science, 132(3434):1115–1118, 1960. doi: 10.1126/science.132.3434.1115 17790723

20. G. Benoit, B. L., and V. D. Relevant cycle hypergraph representation for molecules. 9th IAPR-TC-15 Graph-Based Representations in Pattern Recognition, page 111, 2013.

21. Horváth, Tamás and Gärtner, Thomas and Wrobel, Stefan. Cyclic pattern kernels for predictive graph mining. KDD-2004—Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 158–167, 2004.

22. A. N. Flachsenberg Florian and Rarey M. Ringdecomposerlib: An open-source implementation of unique ring families and other cycle bases. J. Chem. Inf. Model., 2(57):122–126, 2017.

23. Vismara P. and Laurenco C. An abstract representation for molecular graphs. Discrete Mathematics and Theoretical Computer Science, 51:343–366, 2000.

24. L. Michael F. and H. John D. The sheffield generic structures project—a retrospective review. J. Chem. Inf. Comput. Sci., 36(5):930–936, 1996. doi: 10.1021/ci950173l

25. G. Valerie J., D. Geoffrey M., H. John D., L. Michael F., and D. Winfried. Computer storage and retrieval of generic chemical structures in patents. 13. reduced graph generation. J. Chem. Inf. Comput. Sci., 31(2):260–270, 1991. doi: 10.1021/ci00002a011

26. Glem Robert C., Bender Andreas, Hasselgren Catrin, Carlsson Lars, Boyer Scott and Smith James. Circular fingerprints: Flexible molecular descriptors with applications from physical chemistry to ADME. IDrugs: the investigational drugs journal, 9:199–204, 2006. 16523386

27. Horton J.D. A polynomial-time algorithm fo find the shortest cycle basis of a graph. SIAM Journal on Computing, 16(2):358–366, 1987. doi: 10.1137/0216026

28. Vismara P. Union of all minimum cycle bases of a graph. Electr. J. Comb., 4(1):73–87, 1997.

29. SCIP Accessed Nov, 2017.

30. Santra AK., and Christy Josephine. Genetic Algorithm and Confusion Matrix for Document Clustering International Journal of Computer Science Issues, 2012.

Článek vyšel v časopise


2019 Číslo 12