Cell fishing: A similarity based approach and machine learning strategy for multiple cell lines-compound sensitivity prediction

Authors: E. Tejera aff001;  I. Carrera aff003;  Karina Jimenes-Vargas aff001;  V. Armijos-Jaramillo aff001;  A. Sánchez-Rodríguez aff002;  M. Cruz-Monteagudo aff006;  Y. Perez-Castillo aff002
Author affiliations: Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador aff001;  Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador aff002;  Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Ecuador aff003;  Departamento de Ciências de Computadores, Faculdade de Ciências, Universidade do Porto, Porto, Portugal aff004;  Universidad Técnica Particular de Loja, Loja, Ecuador aff005;  Center for Computational Science (CCS), University of Miami (UM), Miami, FL, United States of America aff006;  West Coast University, Miami, Florida, United States of America aff007;  Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, Ecuador aff008
Published in: PLoS ONE 14(10)
Category: Research Article
doi: 10.1371/journal.pone.0223276


Predicting the sensitivity of cell lines to a given set of compounds is an important factor in the optimization of in vitro assays. To date, the most common prediction strategies are based on machine learning or other quantitative structure-activity relationship (QSAR) approaches. In the present research, we propose and discuss a straightforward strategy that does not rely on any learned model but exclusively on the chemical similarity of a query compound to reference compounds with annotated activity against cell lines. We also compare the performance of the proposed method with machine learning predictions on the same problem. A curated database of compound-cell line associations derived from ChEMBL version 22 was created for algorithm construction and cross-validation. Validation was performed using 10-fold cross-validation and by testing the models on new data obtained from ChEMBL version 25. In terms of accuracy, both methods perform similarly, with values around 0.65 across 750 cell lines in 10-fold cross-validation experiments. By combining both methods, it is possible to achieve a correct classification rate of 66% on more than 26,000 newly reported interactions comprising 11,000 new compounds. A web service implementing the described approaches (both similarity-based and machine learning-based models) is freely available at: http://bioquimio.udla.edu.ec/cellfishing.
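
The similarity-based idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fingerprint representation (plain sets of integer bit positions), the Tanimoto coefficient, the 0.5 threshold, the function names, and the toy reference data are all assumptions made for illustration only. The sketch assigns a query compound to every cell line against which a sufficiently similar reference compound is annotated as active.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumption: a fingerprint is the set of "on" bit positions of a compound.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_cell_lines(
    query: Fingerprint,
    references: List[Tuple[Fingerprint, Set[str]]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, float]:
    """Map each predicted cell line to the highest similarity between the
    query and any reference compound annotated as active against it.
    References less similar than `threshold` are ignored."""
    best: Dict[str, float] = {}
    for ref_fp, active_cell_lines in references:
        sim = tanimoto(query, ref_fp)
        if sim < threshold:
            continue
        for cell_line in active_cell_lines:
            if sim > best.get(cell_line, 0.0):
                best[cell_line] = sim
    return best


# Toy reference set (invented data): fingerprint plus cell lines with
# annotated activity for that reference compound.
REFERENCES: List[Tuple[Fingerprint, Set[str]]] = [
    (frozenset({1, 2, 3, 4}), {"MCF7", "A549"}),
    (frozenset({1, 2, 3, 9}), {"HeLa"}),
    (frozenset({20, 21}), {"K562"}),
]

QUERY: Fingerprint = frozenset({1, 2, 3, 4, 5})
PREDICTIONS = predict_cell_lines(QUERY, REFERENCES)

if __name__ == "__main__":
    # List predicted cell lines, most similar reference first.
    for cell_line, score in sorted(PREDICTIONS.items(), key=lambda kv: -kv[1]):
        print(f"{cell_line}: {score:.2f}")
```

Because no model is trained, the reference set can be extended with newly annotated compounds without any refitting; the cost is that predictions are only available for queries that have at least one sufficiently similar reference compound.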

Keywords:

Algorithms – Database and informatics methods – Gene expression – Machine learning – Machine learning algorithms – Optimization – Support vector machines – Kernel functions



Article published in the journal


2019, Issue 10