A novel one-class classification approach to accurately predict disease-gene association in acute myeloid leukemia cancer

Authors: Akram Vasighizaker aff001;  Alok Sharma aff002;  Abdollah Dehzangi aff007
Author affiliations: Electrical & Computer Engineering Department, Tarbiat Modares University, Tehran, Iran aff001;  Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Queensland, Australia aff002;  Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, Japan aff003;  Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan aff004;  School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji aff005;  CREST, JST, Tokyo, Japan aff006;  Department of Computer Science, Morgan State University, Baltimore, Maryland, United States of America aff007
Published in: PLoS ONE 14(12)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0226115


Identifying disease-causing genes is considered an important step towards drug design and drug discovery. In disease gene identification and classification, the main aim is to identify disease genes, while identifying non-disease genes is of little or no significance. Hence, this task can be defined as a one-class classification problem. Existing machine learning methods typically treat known disease genes as the positive training set and unknown genes as negative samples to build a binary classification model. Here we propose a new One-class Classification Support Vector Machines (OCSVM) method to precisely classify candidate disease genes. Our aim is to build a model that concentrates on detecting known disease-causing genes in order to increase sensitivity and precision. We investigate the impact of our proposed model using a benchmark consisting of the gene expression dataset for Acute Myeloid Leukemia (AML) cancer. Compared with traditional methods, our experimental results show the superiority of the proposed method in terms of precision, recall, and F-measure for detecting disease-causing genes in AML. The OCSVM code and our extracted AML benchmark are publicly available at: https://github.com/imandehzangi/OCSVM.
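The one-class framing can be illustrated with a small sketch: a model is fit only on known disease genes and then asked whether a candidate gene falls inside the region those genes occupy. The Python below is a from-scratch ν-one-class SVM in the style of Schölkopf et al. (2001), solved with SMO-style pairwise updates, plus the precision, recall and F-measure used for evaluation. It is not the authors' code (that is in the linked repository); the RBF kernel, the values `nu=0.1` and `gamma=0.5`, the solver, and the 2-D toy feature vectors are assumptions chosen for illustration, not the paper's AML features or settings.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class OneClassSVM:
    """Toy nu-one-class SVM (Schölkopf et al., 2001) solved with SMO-style pairwise updates.

    Dual: minimise 0.5 * a^T K a  s.t.  0 <= a_i <= 1 / (nu * n)  and  sum(a) = 1.
    Decision: f(x) = sum_i a_i k(x_i, x) - rho; f(x) >= 0 means "resembles the training class".
    """

    def __init__(self, nu: float = 0.1, gamma: float = 0.5, tol: float = 1e-6, max_iter: int = 100_000):
        if not 0.0 < nu < 1.0:
            raise ValueError("nu must lie in (0, 1)")
        self.nu, self.gamma, self.tol, self.max_iter = nu, gamma, tol, max_iter

    def fit(self, X: List[Vector]) -> "OneClassSVM":
        n = len(X)
        C = 1.0 / (self.nu * n)  # upper bound on every dual variable
        K = [[rbf_kernel(a, b, self.gamma) for b in X] for a in X]
        alpha = [1.0 / n] * n  # feasible start: sums to 1 and each entry is below C
        grad = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]  # grad = K a
        for _ in range(self.max_iter):
            # Maximal violating pair: raise alpha[i] (smallest gradient, still below C) and
            # lower alpha[j] (largest gradient, still above 0); this keeps sum(alpha) == 1.
            i = min((k for k in range(n) if alpha[k] < C), key=lambda k: grad[k])
            j = max((k for k in range(n) if alpha[k] > 0), key=lambda k: grad[k])
            if grad[j] - grad[i] < self.tol:  # KKT conditions hold up to tol
                break
            eta = max(K[i][i] + K[j][j] - 2.0 * K[i][j], 1e-12)  # curvature along the pair
            t = min((grad[j] - grad[i]) / eta, C - alpha[i], alpha[j])  # clipped step
            alpha[i] += t
            alpha[j] -= t
            for k in range(n):
                grad[k] += t * (K[k][i] - K[k][j])
        # rho: average gradient over free support vectors, else midpoint of the KKT interval.
        free = [k for k in range(n) if 0.0 < alpha[k] < C]
        if free:
            self.rho = sum(grad[k] for k in free) / len(free)
        else:
            lo = max(grad[k] for k in range(n) if alpha[k] > 0)
            hi = min(grad[k] for k in range(n) if alpha[k] < C)
            self.rho = 0.5 * (lo + hi)
        self.support = [(X[k], alpha[k]) for k in range(n) if alpha[k] > 0]
        return self

    def decision_function(self, x: Vector) -> float:
        return sum(a * rbf_kernel(sv, x, self.gamma) for sv, a in self.support) - self.rho

    def predict(self, x: Vector) -> int:
        """+1 = looks like a known disease gene, -1 = outlier."""
        return 1 if self.decision_function(x) >= 0.0 else -1


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure with +1 as the positive (disease-gene) label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors of "known disease genes" (a 3x3 grid); NOT the AML benchmark.
    known = [(x, y) for x in (-0.1, 0.0, 0.1) for y in (-0.1, 0.0, 0.1)]
    model = OneClassSVM(nu=0.1, gamma=0.5).fit(known)
    candidates = [(0.02, 0.01), (3.0, 3.0)]
    print([model.predict(c) for c in candidates])  # prints [1, -1]
```

In practice an established solver such as scikit-learn's `sklearn.svm.OneClassSVM` (which exposes `nu`, `kernel` and `gamma`) would be the more likely choice; the hand-written solver only makes the dual problem and the decision rule f(x) explicit. The parameter ν is an upper bound on the fraction of training genes left outside the learned region, so it controls how tightly the boundary hugs the known disease genes.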

Keywords:

Acute myeloid leukemia – Algorithms – Drug discovery – Gene expression – Gene prediction – Machine learning – Support vector machines – Kernel methods


This article was published in PLoS ONE, 2019, Issue 12.