Deep2Full: Evaluating strategies for selecting the minimal mutational experiments for optimal computational predictions of deep mutational scan outcomes

Authors: C. K. Sruthi; Meher Prakash
Author affiliation: Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore, India
Published in: PLoS ONE 15(1)
Category: Research Article
doi: 10.1371/journal.pone.0227621


Performing a complete deep mutational scan with all single point mutations may not be practical, and may not even be required, especially if predictive computational models can be developed. Computational models are, however, naive to the cellular response under the myriad possible assay conditions. In a realistic paradigm of assay-context-aware predictive hybrid models, which combine minimal experimental data from deep mutational scans with structure and sequence information and computational models, we define and evaluate different strategies for choosing this minimal set. We evaluated the trivial strategy of systematically reducing the number of mutational studies from 85% to 15%, along with several others concerning the choice of mutation types, such as random versus site-directed, at the same 15% data completeness. Interestingly, the predictive capabilities obtained by training on a random set of mutations and by training on a systematic substitution of all amino acids to alanine, asparagine and histidine (ANH) were comparable. Another strategy we explored, augmenting the training data with measurements of the same mutants under multiple assay conditions, did not improve the prediction quality. For the six proteins we analyzed, the bin-wise error in prediction is optimal when 50-100 mutations per bin are used in training the computational model, suggesting that good prediction quality may be achieved with a library of 500-1000 mutations.
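
The two training-set choices compared in the abstract, a random subset of all single point mutations and a systematic ANH scan, can be sketched as below. This is a minimal illustration and not the authors' pipeline: the toy sequence, the function names, and the random seed are all hypothetical, and no fitness data or predictive model is involved. It only shows how the two candidate mutation libraries would be enumerated for a given sequence.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_mutants(seq):
    """Enumerate every single point substitution as (position, wild_type, mutant).

    Positions are 1-based. Each site contributes 19 mutants.
    """
    return [
        (i + 1, wt, aa)
        for i, wt in enumerate(seq)
        for aa in AMINO_ACIDS
        if aa != wt
    ]


def random_subset(mutants, fraction, seed=0):
    """Pick a random fraction of the mutants without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    k = round(fraction * len(mutants))
    return rng.sample(mutants, k)


def anh_scan(seq, targets="ANH"):
    """Substitute every site with alanine, asparagine and histidine.

    Sites whose wild-type residue is already one of the targets
    simply contribute one mutant fewer.
    """
    return [
        (i + 1, wt, aa)
        for i, wt in enumerate(seq)
        for aa in targets
        if aa != wt
    ]


seq = "MKTAYIAKQR"  # hypothetical toy sequence, not one of the six proteins
full = all_single_mutants(seq)
rand15 = random_subset(full, 0.15)
anh = anh_scan(seq)

print(f"all single mutants: {len(full)}")
print(f"random 15% subset:  {len(rand15)}")
print(f"ANH scan:           {len(anh)} ({len(anh) / len(full):.1%} of all mutants)")
```

Three target residues out of 19 possible substitutions per site is close to 16% of the full scan, so an ANH scan is naturally similar in size to a 15% random subset. Either list could then be passed to whatever model is being trained, with the remaining mutants held out for testing.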

Keywords:

Alanine – Amino acid substitution – Human mobility – Mutation detection – Neural networks – Point mutation – Protein sequencing – Substitution mutation


The article was published in PLoS ONE, 2020, Issue 1.