iCrotoK-PseAAC: Identify lysine crotonylation sites by blending position relative statistical features according to the Chou’s 5-step rule

Authors: Sharaf Jameel Malebary aff001; Muhammad Safi ur Rehman aff002; Yaser Daanial Khan aff002
Author affiliations: Department of Information Technology, King Abdul Aziz University, Rabigh, Kingdom of Saudi Arabia aff001; Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan aff002
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0223993


Among the different post-translational modifications (PTMs), lysine crotonylation in proteins is one of the most important. Its role in various diseases and essential biological processes should not be underestimated. Identifying crotonylation sites and uncovering the hidden mechanisms of crotonylation both depend on a thorough understanding of this biological process. In previously reported studies, researchers have used different techniques, such as the position weight matrix (PWM), support vector machine (SVM), k-nearest neighbors (KNN), and many others. However, the highest prediction accuracy achieved so far has not been very high. To address this, we propose an improved predictor for lysine crotonylation sites named iCrotoK-PseAAC, in which we incorporate various position- and composition-relative features, along with statistical moments, into PseAAC. Self-consistency testing yielded 100% accuracy, while 10-fold cross-validation yielded 99.0% accuracy. Based on the validation and comparison of the model, we conclude that iCrotoK-PseAAC is more accurate than previously proposed models.
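The two evaluation protocols named in the abstract differ in which samples the model is scored on. The following minimal Python sketch illustrates that difference only. It is not the authors' pipeline: the nearest-centroid classifier, the two-dimensional toy feature vectors, and all function names (`make_toy_data`, `fit`, `predict`, `self_consistency_accuracy`, `k_fold_accuracy`) are hypothetical stand-ins chosen for illustration, whereas the paper uses PseAAC-based features and its own classifier.

```python
import random


def make_toy_data(n_per_class=20, seed=1):
    """Build a toy two-class dataset (assumption: NOT the paper's PseAAC features)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            X.append([center + rng.uniform(-0.3, 0.3) for _ in range(2)])
            y.append(label)
    return X, y


def fit(X, y):
    """Nearest-centroid 'training': store the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(model, row):
    """Return the class whose centroid is closest to the sample."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, model[label]))
    return min(model, key=squared_distance)


def self_consistency_accuracy(X, y):
    """Self-consistency test: train on all samples, then score the same samples."""
    model = fit(X, y)
    hits = sum(predict(model, row) == label for row, label in zip(X, y))
    return hits / len(y)


def k_fold_accuracy(X, y, k=10, seed=0):
    """k-fold cross-validation: every sample is scored by a model that never saw it."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint held-out subsets
    hits = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits += sum(predict(model, X[i]) == y[i] for i in held_out)
    return hits / len(y)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("self-consistency accuracy:", self_consistency_accuracy(X, y))
    print("10-fold CV accuracy:", k_fold_accuracy(X, y, k=10))
```

Because the self-consistency test scores the model on its own training samples, it measures how well the model fits the data, while cross-validation estimates how it performs on samples it has not seen.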

Keywords:

Artificial neural networks – Database searching – Lysine – Post-translational modification – Protein sequencing – Sequence databases – Support vector machines



This article was published in the journal issue:

2019, Issue 11