TranCEP: Predicting the substrate class of transmembrane transport proteins using compositional, evolutionary, and positional information

Authors: Munira Alballa aff001; Faizah Aplop aff003; Gregory Butler aff001
Author affiliations: Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada aff001; College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia aff002; School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, Malaysia aff003; Centre for Structural and Functional Genomics, Concordia University, Montréal, Québec, Canada aff004
Published in: PLoS ONE 15(1)
Category: Research Article
doi: 10.1371/journal.pone.0227683


Transporters mediate the movement of compounds across the membranes that separate the cell from its environment and across the inner membranes surrounding cellular compartments. It is estimated that one third of a proteome consists of membrane proteins, and many of these are transport proteins. Given the growing number of sequenced genomes, there is a need for computational tools that predict the substrates carried by transmembrane transport proteins. In this paper, we present TranCEP, a predictor of the type of substrate transported by a transmembrane transport protein. TranCEP combines the traditional use of the protein's amino acid composition with evolutionary information captured in a multiple sequence alignment (MSA) and a restriction to the alignment positions that play a role in determining the protein's specificity. Our experimental results show that TranCEP significantly outperforms the state-of-the-art predictors, and they quantify the contribution made by each type of information used.
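To make the three kinds of information concrete, the following is a minimal sketch, not the TranCEP implementation. It assumes a toy alignment and a hand-picked list of column indices (`toy_msa` and `informative_columns` are hypothetical), and it simply averages the amino acid composition over the aligned sequences after restricting them to the chosen columns. How TranCEP actually builds the MSA, selects positions, and classifies the resulting features is not described in this abstract.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Fraction of each standard amino acid in a sequence (gaps ignored)."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def msa_composition(alignment, keep_columns=None):
    """Average composition over aligned rows, optionally on selected columns."""
    if keep_columns is not None:
        # Positional information: keep only the chosen alignment columns.
        alignment = ["".join(row[i] for i in keep_columns) for row in alignment]
    # Evolutionary information: pool the composition across all aligned rows.
    per_row = [composition(row) for row in alignment]
    return [sum(values) / len(per_row) for values in zip(*per_row)]


# Hypothetical toy alignment; '-' marks a gap.
toy_msa = ["MKT-LLA", "MRTALLA", "MKS-LIA"]
# Hypothetical column indices standing in for specificity-related positions.
informative_columns = [1, 2, 5]

features = msa_composition(toy_msa, informative_columns)
```

The resulting 20-element vector could then be passed to any standard classifier. Comparing it with `msa_composition(toy_msa)` or `composition(toy_msa[0])` shows how each source of information changes the features.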

Keywords:

Anions – Cations – Membrane proteins – Multiple alignment calculation – Protein sequencing – Sequence alignment – Sequence databases – Transmembrane transport proteins


Article published in: PLoS ONE, 2020, Issue 1