1. Levitt M. Nature of the protein universe. Proceedings of the National Academy of Sciences of the United States of America. 2009; 106 (27): 11079–84. doi: 10.1073/pnas.0905029106 19541617
2. Yau ST, Yu C, He RL. A protein map and its application. DNA and Cell Biology. 2008; 27: 241250.
3. Yu C, Cheng SY, He RL, Yau ST. Protein map: An alignment-free sequence comparison method based on various properties of amino acids. Gene. 2011; 486(1–2): 110–118. doi: 10.1016/j.gene.2011.07.002 21803133
4. Yu C, Deng M, Cheng SY, Yau SC, He RL, Yau ST. Protein space: A natural method for realizing the nature of protein universe. Journal of Theoretical Biology. 2013; 318:197–204. doi: 10.1016/j.jtbi.2012.11.005 23154188
5. Zhao B, He RL, Yau ST. A new distribution vector and its application in genome clustering. Molecular Phylogenetics and Evolution. 2011; 59: 438–443. doi: 10.1016/j.ympev.2011.02.020 21385621
6. Zhao X, Wan X, He RL, Yau ST. A new method for studying the evolutionary origin of the SAR11 clade marine bacteria. Molecular Phylogenetics and Evolution. 2016; 98: 271–279. doi: 10.1016/j.ympev.2016.02.015 26926946
7. Yu C, He RL, Yau ST. Protein sequence comparison based on K-string dictionary. Gene. 2013; 529: 250–256. doi: 10.1016/j.gene.2013.07.092 23939466
8. Ding CHQ, Dubchak I. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics. 2001; 17(4), 349–358. doi: 10.1093/bioinformatics/17.4.349 11301304
9. Edler L, Grassmann J, Suhai S. Role and results of statistical methods in protein fold class prediction. Mathematical and Computer Modelling. 2001; 33(12–13): 1401–1417.
10. Huang CD, Lin CT, Pal NR. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification. IEEE transactions on NanoBioscience. 2003; 2(4): 221–232. doi: 10.1109/tnb.2003.820284 15376912
11. Jo T, Hou J, Eickholt J, Cheng J. Improving protein fold recognition by deep learning networks. Scientific reports. 2015; 5: 17573. doi: 10.1038/srep17573 26634993
12. Khan MA, Shahzad W, Baig AR. Protein classification via an ant-inspired association rules-based classifier. International Journal of Bio-Inspired Computation. 2016; 8(1): 51–65.
13. Markowetz F, Edler L, Vingron M. Support vector machines for protein fold class prediction. Biometrical Journal: Journal of Mathematical Methods in Biosciences. 2003; 45(3): 377–389.
14. Tan AC, Gilbert D, Deville Y. Multi-class protein fold classification using a new ensemble machine learning approach. Genome Informatics. 2003; 14: 206–217. 15706535
15. Wei L, Liao M, Gao X, Zou Q. Enhanced protein fold prediction method through a novel feature extraction technique. IEEE transactions on nanobioscience. 2015; 14(6): 649–659. doi: 10.1109/TNB.2015.2450233 26335556
16. Wei L, Zou Q. Recent progress in machine learning-based methods for protein fold recognition. International journal of molecular sciences. 2016; 17(12): 2118.
17. Wang J, Wang Z, Tian X. Bioinformatics: Fundamentals and Applications. Tsinghua University Press. 2014.
18. Rackovsky S. Sequence physical properties encode the global organization of protein structure space. PNAS. 2009; 106(34): 14345–14348. doi: 10.1073/pnas.0903433106 19706520
19. Duda RO, Hart PE, Stork DG. Pattern Classification, second Edition. China Machine Press. 2001.
20. Tian K, Zhao X, Yau ST. Convex hull analysis of evolutionary and phylogenetic relationships between biological groups. Journal of Theoretical Biology. 2018; 456: 34–40. doi: 10.1016/j.jtbi.2018.07.035 30059661
21. Shen HB, Chou KC. PseAAC: A flexible web server for generating various kinds of protein pseudo amino acid composition. Analytical Biochemistry. 2008; 373(2): 386–388. doi: 10.1016/j.ab.2007.10.012 17976365
22. Liu B, Liu F, Wang X, Chen J, Fang L and Chou KC. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Research. 2015; 43 (W1): W65–W71. doi: 10.1093/nar/gkv458 25958395
23. Xu Y, Ding J, Wu LY, Chou KC. iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS ONE. 2013; 8(2): e55844. doi: 10.1371/journal.pone.0055844 23409062
24. Gribskov M, Mclachlan AD, Eisenberg D. Profile analysis: detection of distantly related proteins. Proceedings of the National Academy of Sciences. 1987; 84(13), 4355–4358.
25. Jeong JC, Lin X, Chen XW. On position-specific scoring matrix for protein function prediction. IEEE/ACM Transactions on Computational Biology & Bioinformatics. 2011; 8 (2), 308–315.
26. Hsu C, Chang C, Lin C. A practical guide to support vector classification. BJU International. 2008; 101(1):1396–1400.
27. Breiman L. Random Forests. Machine Learning. 2001; 45 (1): 5–32.
28. Lim A., Breiman L, Cutler A. Big random forests: classification and regression forests for large data sets. 2014.
29. Kidera A, Konishi Y, Oka M, Ooi T, Scheraga HA. Statistical analysis of the physical properties of the 20 naturally occurring amino acids. Journal of Protein Chemistry. 1985; 4(1): 23–55.
30. Kidera A, Konishi Y, Ooi T, Scheraga HA. Relation between sequence similarity and structural similarity in proteins: Role of important properties of amino acids. Journal of Protein Chemistry. 1985; 4(5):265–297.
31. Chang CC and Lin CJ. LibSVM: A Library for support vector machines. ACM Transactions on Intelligent Systems & Technology. 2011; 2(3): 27.
32. Lin C, Chen W, Qiu C, Wu Y, Krishnan S, Zou Q. LibD3C: Ensemble Classifiers with a Clustering and Dynamic Selection Strategy. Neurocomputing. 2014; 123: 424–435.
33. Lin C, Zou Y, Qin J, Liu X, Jiang Y, Ke C, et al. Hierarchical classification of protein folds using a novel ensemble classifier. PLoS ONE. 2013; 8(2): e56499. doi: 10.1371/journal.pone.0056499 23437146