In silico analysis and high-risk pathogenic phenotype predictions of non-synonymous single nucleotide polymorphisms in human Crystallin beta A4 gene associated with congenital cataract

Authors: Zhenyu Wang ^aff001; Chen Huang ^aff001; Huibin Lv ^aff001; Mingzhou Zhang ^aff001; Xuemin Li ^aff001
Authors place of work: Department of Ophthalmology, Peking University Third Hospital, Beijing, China ^aff001; Beijing Key Laboratory of Restoration of Damaged Ocular Nerve, Peking University Third Hospital, Beijing, China ^aff002; Medical Research Center, Peking University Third Hospital, Beijing, China ^aff003
Published in the journal: PLoS ONE 15(1)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0227859

Summary

In order to provide a cost-effective method to narrow down the number of pathogenic Crystallin beta A4 (CRYBA4) non-synonymous single nucleotide polymorphisms (nsSNPs), we collected nsSNP information of the CRYBA4 gene from SNP databases and literature, predicting the pathogenicity and possible changes of protein properties and structures using multiple bioinformatics tools. The nsSNP data of the CRYBA4 gene were collected from 4 databases and published literature. According to 12 criteria, six bioinformatics tools were chosen to predict the pathogenicity. I-Mutant 2.0, Mupro and INPS online tools were used to analyze the effects of amino acid substitution on protein stability by calculating the value of ΔΔG. ConSurf, SOPMA, GETAREA and HOPE online tools were used to predict the evolutionary conservation of amino acids, solvent accessible surface areas, and the physical and chemical properties and changes of protein structure. All 157 CRYBA4 nsSNPs were analyzed. Forty-four CRYBA4 high-risk pathogenic nsSNPs (predicted to be pathogenic by all six software tools) were detected out of the 157 CRYBA4 nsSNPs, four of which (c.283C>T, p.R95W; c.449T>A, p.V150D; c.475G>A, p.G159R; c.575G>C, p.R192P) should be focused on because of their high potential pathogenicity and possibility of changing protein properties. Thirty high-risk nsSNPs were predicted to cause a decrease of protein stability. Twenty-nine high-risk nsSNPs occurred in evolutionary conserved positions. Twenty-two high-risk nsSNPs occurred in the core of the protein. It is predicted that these high-risk pathogenic nsSNPs can cause changes in the physical and chemical properties of amino acids, resulting in structural changes of proteins and changes in the interactions between domains and other molecules, thus affecting the function of proteins. This study provides important reference value when narrowing down the number of pathogenic CRYBA4 nsSNPs and studying the pathogenesis of congenital cataracts. By using this method, we can easily find 44 high-risk pathogenic nsSNPs out of 157 CRYBA4 nsSNPs.

Keywords:

Protein interactions – Protein structure – Pathogenesis – Mutation databases – Amino acid analysis – Protein structure prediction – Software tools – Cataracts

Introduction

Crystallin beta A4 (CRYBA4) is one of the pathogenic genes associated with congenital cataract. Mutations in these genes can cause abnormal lens metabolism in the embryonic stage and slow lens development and formation of opaque scar tissue in the lens, which eventually leads to congenital cataracts [1–3]. The annual incidence of congenital cataracts is 1.8–3.6/10,000 children [1], and inherited cataracts account for 8–25%. Among all hereditary modes, the autosomal dominant congenital cataract (ADCC) is the most common, accounting for about 73% [4–6] of cases. It has also been shown that variable clinical features are presented in different families with ADCC [7,8].

CRYBA4 belongs to the crystallin family. Mammalian lens crystallins are divided into alpha, beta and gamma families. Alpha and beta families are further divided into acidic and basic groups. CRYBA4, a beta acidic group member, is part of a gene cluster along with beta-B1 (CRYBB1), beta-B2 (CRYBB2) and beta-B3 (CRYBB3). The human CRYBA4 gene is located in chromosome 22 (22q12.1) and contains seven exons. There is one CRYBA4 transcript (NM_001886.3) and one protein product (NP_001877.1). During the development of the human eye, CRYBA4 is one of several crystallins expressed in mature lens fiber cells, constitutes 5% of the proteins of vertebrate eye lens, and maintains the transparency and refractive index of the lens [9,10]. CRYBA4 is higher in neonatal lenses than adult lenses [11]. Therefore, mutations of CRYBA4 may lead to visual impairment of neonates.

In the human genome, single nucleotide polymorphisms (SNPs) account for more than 90% of all nucleic acid sequence variations. Non-synonymous single nucleotide polymorphisms (nsSNPs) in the protein coding region cause amino acid substitution, which may affect protein function and lead to pathogenic phenotypes [12]. In 2006, the first mutation in the CRYBA4 gene (c.242T>C, p.L69P) was found and was reported to be associated with the autosomal dominant congenital cataract [11]. Our team also found in 2018 that two heterozygous mutations of the CRYBA4 gene might be pathogenic mutations in the Chinese autosomal dominant congenital cataract families. So far, six nsSNPs of the CRYBA4 gene associated with congenital cataracts have been confirmed (A9V, G64W, Y67N, L69P, S93P, F94S) [11,13–17].

It has been speculated there are more ADCC-related CRYBA4 nsSNPs that have not been found. It is necessary to seriously study the relationship between CRYBA4 nsSNPs and the CRYBA4 gene pathogenic phenotype, which will help analyze the pathogenesis of ADCC. With current expanding applications of next-generation sequencing and increasingly developed computational methods, the preliminary pathogenicity screening of gene mutation sites based on in silico analysis using bioinformatics tools has been widely recognized and applied by researchers. The performances of different bioinformatics tools under various conditions are different. Li. et al [18] compared 12 performance measures of 23 methods based on three independent benchmark datasets. The 12 criteria included positive predictive value (PPV), negative predictive value (NPV), false negative rate (FNR), sensitivity, false positive rate (FPR), specificity, Mathew correlation coefficient (MCC), receiver operating characteristic (ROC) curve, area under the curve (AUC), high-sensitivity regional AUC (hser-AUC) and high-specificity regional AUC (hspr-AUC). In this study, we systematically collect the nsSNP information of the CRYBA4 gene from SNP databases and literature, screening the nsSNPs for high-risk pathogenicity using multiple bioinformatics software tools based on their high accuracy and frequency of use [18,19]. This study is undertaken to explore the nsSNPs of the CRYBA4 gene responsible for ADCC and to predict the deleterious nature of the mutation.

Materials & methods

Nonsynonymous SNP data retrieval

The nsSNP data were retrieved from four databases, including the Single Nucleotide Polymorphism database (dbSNP, RRID: SCR_002338, http://www.ncbi.nlm.nih.gov/projects/SNP/) [20], Clinvar database (RRID: SCR_006169, https://www.ncbi.nlm.nih.gov/clinvar) [21], the Human Gene Mutation Database (HGMD, RRID: SCR_001888, http://www.hgmd.cf.ac.uk/ac/index.php) [22] and DisGeNET database (RRID: SCR_006178, http://www.disgenet.org) [23]. In order to avoid omitting additional nsSNPs from the literature, we used “CRYBA4”, “nsSNP” and “congenital cataract” as search terms to retrieve data. The relevant information of the CRYBA4 nsSNP included SNP ID, chromosome loci, nucleic acid changes, amino acid changes and whether they were reported. The information on mRNA (NM_002473.5) and protein (NP_002464.1) was obtained from the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov). All authors didn’t have access to information that could identify individual participants during or after data collection. Choosing pathogenicity predictors in our study was decided based on high accuracy and frequency of use [18].

Prediction of pathogenicity of nsSNPs

To analyze and predict the pathogenicity of CRYBA4 nsSNPs precisely, six software tools were used including Mutpred2 (RRID: SCR_010778, http://mutpred.mutdb.org/) [24], PANTHER-PSEP (PANTHER Evolutionary analysis of coding SNPs, RRID: SCR_005145, http://pantherdb.org/tools/csnpScoreForm.jsp) [25], PhD-SNP (RRID: SCR_010782, http://snps.biofold.org/phd-snp/phd-snp.html) [26], PolyPhen 2.0 (Polymorphism phenotyping 2.0, RRID: SCR_013189, http://genetics.bwh.harvard.edu/pph2/) [27], PROVEAN (Protein variation effect analyzer, RRID: SCR_002182, http://provean.jcvi.org/index.php) [28], and SIFT (Sorting intolerant from tolerant, RRID: SCR_012813, http://sift.bii.a-star.edu.sg/) [29].

Mutpred2 is a sequence-homology based bioinformatics tools predicting potential pathogenicity and changes of protein properties. An amino acid substitution with the general score of >0.5–1 was considered to be pathogenic.

PANTHER-PSEP is able to calculate the score of position-specific evolutionary preservations. It measures the length of time (in millions of years) a position in a current protein has been preserved by tracing back to its reconstructed direct ancestors. The longer a position has been preserved, the more likely that it will have a deleterious effect. The thresholds were: "probably damaging" (time > 450my, corresponding to a false positive rate of ~0.2 as tested on HumVar), "possibly damaging" (450my > time > 200my, corresponding to a false positive rate of ~0.4) and "probably benign" (time < 200my).

PhD-SNP is a pathogenicity-predicting tool based on support vector machines. The Swiss-prot database was used for training and the prediction was used with 20-fold cross validation.

PolyPhen 2.0 is a widely used supervised machine-learning Na ve Bayes classifier. The two algorithms within PolyPhen2 were HumDiv and HumVar. The cut off score was 0.50 and mutations with scores over 0.50 were predicted to be pathogenic. HumDiv was trained using all 3,155 damaging alleles annotated in the UniProt database as causing human Mendelian diseases and affecting protein stability or function, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging. HumVar was trained using all 13,032 human disease-causing mutations from UniProt and 8,946 human nsSNPs without annotated involvement in disease, which the operators treated as non-damaging. The operators of PolyPhen2 indicated that HumVar had lower accuracy than HumDiv. One reason was that nsSNPs assumed to be non-damaging in the HumVar dataset included a sizable fraction of mildly deleterious alleles that may not be fixed in the evolving lineage. Also, the HumDiv dataset used extra criteria to avoid possible erroneous annotations of damaging mutations. Therefore, we chose the predicting results of HumDiv [27].

PROVEAN and SIFT are sequence homology-based tools which have independently developed algorithms. The cutoff score of PROVEAN is -2.5 and mutations with scores over -2.5 are predicted to be pathogenic. The cutoff score of SIFT is 0.05 and mutations with scores over 0.05 are predicted to be pathogenic.

In order to summarize the predictive results of the software tools, we set a scoring criterion. If a nsSNP was predicted to be “benign” or “harmless” by one of the six software tools, it would score 0. However, it would get 1 point if one software tool predicted it as a “pathogenic” or “harmful” nsSNP. The nsSNP that was predicted to be “pathogenic” by all six software tools would be defined as a “CRYBA4 high-risk pathogenic nsSNP” with 6 points.

Prediction of protein stability

Protein stability is the basic characteristic that affects the function, activity and regulation of biological molecules. Free energy of protein unfolding is a key index of protein stability. By analyzing the influence of mutation on free energy, the effect of mutation on protein stability could be accurately determined. I-Mutant 2.0 (http://folding.biofold.org/i-mutant/i-mutant 2.0) [30], Mupro (http://mupro.proteomics.ics.uci.edu/) [31] and INPS (Impact of Non-synonymous mutations on Protein Stability, http://inpsmd.biocomp.unibo.it) [32]. In the case of unknown protein structure, the online software tools could calculate and predict whether the substitution could change the stability of protein by using support vector machines (SVM) with the amino acid sequence and amino acid substitution infomration. The software predicted the change of stability by calculating the change of thermodynamic free energy (ΔΔG) and the direction of change after single point mutation of protein: ΔΔG > 0 indicated stabilization while a negative value indicated destabilization.

Evolutionary conservation analysis of nsSNPs and prediction of protein structure and property changes

The CRYBA4 domain was retrieved using the Uniprot database (https://www.uniprot.org/). ConSurf online software (http://consurf.tau.ac.il) [33] was used to analyze the evolutionary conservation of amino acids by calculating the conservation score through unique algorithm. The amino acids with scores between 7 and 9 were evolutionary conservative amino acids. The SOPMA online software (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html) [34] used five independent algorithms to predict the secondary structure of proteins. Then, the GETAREA online software (http://curie.utmb.edu/getarea.html) [35] was used to predict the effect of amino acid mutation by calculating the solvent accessible surface areas of the proteins. The HOPE online software (Have (y) Our Protein Explained, http://www.cmbi.ru.nl/hope/input/) [36] was used to predict the effect of amino acid mutation on physical and chemical properties, hydrophobicity, spatial structure and function of proteins.

Results

Collection and collation of nsSNP data

We retrieved 176 nsSNPs related to the CRYBA4 gene from four databases (155 from the db SNP database, seven from the ClinVar database, five from the HGMD database, and three from the DisGeNET database). After manual screening and eliminating duplicate records, there were 156 CRYBA4 nsSNPs included in four databases. It is noteworthy that five (A9V, G64W, Y67N, L69P, F94S) of the 156 nsSNPs in various databases have been reported to be associated with congenital cataracts. There is one reported nsSNP (c. 277T>C, p. S93P) that has not yet been included in any database (Fig 1). According to the results of database screening and literature reports, there are only three nsSNPs (G64W, L69P, F94S) that have been reported in four databases and in the literature.

**Fig. 1. CRYBA4 coding region nsSNP dataset intersection Wayne map.**

Prediction of pathogenicity of nsSNPs

Six pathogenicity prediction software tools (Mutpred2, PANTHER-PSEP, PhD-SNP, PolyPhen 2.0, PROVEAN and SIFT) were used to predict the pathogenicity of 157 nsSNPs. Fig 2A shows the prediction results of six computation tools. As a result of integrating six software tools, nine nsSNPs were predicted to be "benign" with a score of 0, while 19, 22, 22, 19 and 22 nsSNPs got a score of 1, 2, 3, 4 and 5, respectively, for "pathogenic" or "harmful" (Fig 2B). There were 44 nsSNPs that were predicted to be "pathogenic" or "harmful" by all the six software tools and received 6 points. We defined these nsSNPs as "CRYBA4 high-risk pathogenic nsSNPs". As shown in Table 1, these 44 nsSNPs can lead to the substitution of 39 amino acids in the CRYBA4 amino acid sequence. Among these nsSNPs, four nsSNPs (G64W, L69P, S93P, F94S) have been reported to be associated with congenital cataracts. The other 40 nsSNPs are the newly identified high-risk pathogenic nsSNPs.

The other two reported pathogenic nsSNPs were Y67N and A9V. Y67N got 5 points and only PANTHER predicted it as a “benign” nsSNP. A9V was predicted to be "benign" by all six software tools and scored 0 points. The predictions above indicate that the accuracy when integrating the six pieces of predicting software reaches 66.67% (4/6), although there is a certain false negative rate in the results due to insufficiencies of the predicting algorithm of each software.

Prediction of protein stability

In order to predict the effects of amino acid substitution caused by nsSNP on protein stability, we used I-Mutant 2.0, Mupro, INPS online software and the CRYBA4 protein (NP_001877.1) information to compare the effects of mutant and wild type amino acids on protein free energy. Among 157 nsSNPs, I-Mutant 2.0 predicted that 126 amino acid substitutions caused by nsSNPs resulted in the decrease of protein free energy. Mupro software and INPS software predicted 148 and 123 mutations, respectively, leading to a decrease in protein stability. When integrating the predicted results, all three software tools predicted that the free energy of 105 sites of protein would decrease after the mutation of amino acids and the ΔΔG < 0, resulting in protein stability decline. Among these, there were six sites (V15A, R26P, F94S, Y126S, V163A and W179R) that were scored below –1 by all three software tools, indicating that the stability of protein decreased sharply after the mutations.

As shown in Fig 3, 44 "CRYBA4 high-risk pathogenic nsSNPs" can cause 39 wild-type amino acids to be replaced by 44 mutant amino acids. Three software tools (I-Mutant 2.0, Mupro and INPS) suggest that 30 of the 44 nsSNPs may lead to a decline in protein stability. Among the 30 nsSNPs, there are 28 newly discovered high-risk pathogenic nsSNPs: W12R, F22L, R25W, R26W, R26P, E28K, E32K, S46Y, G56D, E58K, L69P, R71P, E73K, Y86S, F94S, R95W, D104G, E111K, E138G, V150D, C151F, G159R, Q161H, Y162C, G171S, R177W, E178K, W179R, R192H, and R192P. The two nsSNPs, L69P and F94S, were previously reported pathogenic variants identified in families.

**Fig. 3. I-Mutant 2.0, Mupro and INPS software predicted the change of protein free energy caused by nsSNPs.**

Evolutionary Conservation Analysis and Protein Structure Analysis

Evolutionary conservation is a key property of amino acid. ConSurf and SOPMA online tools were used to analyze the evolutionary conservation of CRYBA4 amino acid and protein structure. CRYBA4 protein structure is shown in Fig 4A, including the "Crystall" domain and "XTALbg" domain. According to the analysis of the Consurf online tool, there was a total of 60 evolutionary conserved positions with scores between 7 and 9 (Fig 4B). SOPMA online software predicted the secondary structure of the CRYBA4 protein. As shown in Fig 4C, CRYBA4 protein is composed of 196 amino acids, consisting of four secondary structures, 23 amino acids in the alpha helix (accounting for 11.73%), 45 amino acids in the extended strand (accounting for 22.96%), 22 amino acids in the beta turn (accounting for 11.22%), and 106 amino acids in the random coil (accounting for 54.08%).

**Fig. 4. Structural domains, evolutionary conservation and protein structure analysis of CRYBA4 high-risk pathogenic nsSNPs.**

Further analysis was carried out based on the pathogenicity prediction results. As shown in Fig 4 and Table 2, 39 wild-type amino acids can be replaced by 44 CRYBA4 high-risk pathogenic nsSNPs, including 18 wild-type amino acids in the "Crystall" domain, 18 in the "XTALbg" domain and three in the other regions. Evolutionary conservation analysis showed that 29 amino acids were in conserved positions (12 amino acids got 9 points, seven amino acids got 8 points, five amino acids got 7 points, and five amino acids got 6 points). This suggested that they were highly conserved in evolution. At the same time, the protein secondary structure prediction indicated that one amino acid was located in the alpha helix, accounting for 2.56%; 11 amino acids in the extension strand, accounting for 28.21%; seven amino acids in beta turn, accounting for 17.95%; and 20 amino acids in random coil, accounting for 51.28%.

**Tab. 2. Evolutionary conservativeness analyses and protein structure prediction of CRYBA4 high-risk pathogenic nsSNPs.**

Protein property analysis

GETAREA and HOPE online tools were used to analyze the solvent accessible surface areas, physical and chemical properties, and changes in protein structure. Using "3lwk.1 crystal structure of human Beta-crystallin A4" as a template, 39 "CRYBA4 high-risk pathogenic nsSNP" related amino acids were analyzed by GETAREA and HOPE online software. GETAREA calculated the solvent-accessible surface areas of the protein (Fig 5). Except for seven (4%) unrecorded amino acids, the software calculated the solvent-accessible surface areas of 189 amino acids. There were 51 (26%) amino acids in the surface of the protein, 95 (48%) in the core of the protein and 43 (22%) in other parts of the protein. Among the 39 CRYBA4 high-risk pathogenic nsSNP related amino acids, there were eight (21%) amino acids in the surface of the protein, 22 (56%) in the core of the protein and nine (23%) in other parts of the protein. This suggested that mutations in the core of the protein were more likely to have harmful effects on protein structure and function.

**Fig. 5. The solvent accessible surface areas of the protein result calculated by GETAREA software.**

HOPE predicted the changes of amino acids before and after mutation on the physical and chemical properties, hydrophobicity, spatial structure, and function of protein (Table 2). All 44 CRYBA4 high-risk pathogenic nsSNPs caused changes in size of the amino acids (18 smaller and 26 larger). There were 28 nsSNPs that caused change of charge. Also, 29 nsSNPs changes the hydrophobicity of the amino acids (10 lower and 19 higher). The changes of physical and chemical properties of these sites before and after amino acid mutation led to changes in protein structure and interaction between structural domains and other molecules, affecting the protein function.

Discussion

In a genome, nsSNPs account for about 50% of allele variation of all hereditary human diseases [37]. Identifying nsSNPs responsible for specific phenotypes using molecular approaches is expensive and time-consuming [38], and so bioinformatics predicting approaches can help in narrowing down the number of high-risk pathogenic nsSNPs to be screened in genetic association studies, and in a better understanding of the function and structure of protein products.

However, the performances of different bioinformatics tools under various conditions were different. It is important to choose tools with good performance. In order to objectively evaluate performances of different tools, Li. et al [18] compared 12 performance measures of 23 methods based on three independent benchmark datasets. Luxembourg. et al [19] used SERPINC1 missense mutations to test 12 tools. According to their tentative guidance for optimal tool section, we chose six bioinformatics tools as pathogenic predictors (Mutpred2, PANTHER-PSEP, PhD-SNP, PolyPhen 2.0, PROVEAN and SIFT) for their high accuracy and frequency of use.

In this study, 44 nsSNPs that were predicted to be pathogenic by all software tools were defined as “CRYBA4 high-risk pathogenic nsSNPs”. In order to further narrow down the number of possible pathogenic nsSNPs, the prediction of protein stability, the evolutionary conservation of amino acids, the physical and chemical properties, and changes of protein structure after mutations were analyzed by I-Mutant 2.0, Mupro, INPS, ConSurf, SOPMA, GETAREA and HOPE online software tools.

Protein stability is an important characteristic influencing the function, activity and regulation of proteins. Changes in protein stability are usually accompanied by changes in free energy (ΔΔG). When ΔΔG < 0, the stability decreases [30–32]. It has been reported that the breakdown of lens microarchitecture and the aggregation of crystallins can cause cataracts [39,40]. Human γD crystallin (HγDC) is one of the most abundant crystallins in the central lens nucleus, and its thermodynamic stability and aggregation are associated with the development of age-related cataracts [10]. Aguayo-Ortiz R et al, [41] have used alchemical free energy calculations to predict changes in the thermodynamic stability (ΔΔG) of 10 alanine-scanning variants and 12 HγDC mutations associated with the development of congenital cataracts. These results showed that changes in ΔΔG are associated with the significant position of the motif, thus affecting thermodynamic stability. Our analyses clearly show that 68.2% (30/44) of the CRYBA4 high-risk pathogenic nsSNPs could lead to a decrease in protein stability, and only two of these have been reported in published literature. These mutations of single or multiple amino acids can cause changes and damage in hydrophobicity, protein folding, main chain tension, and electrostatic force, leading to changes in protein stability [42]. We recommend further investigations on nsSNPs that lead to a decrease in protein stability. In addition, we recommend conducting experiments about molecular dynamics in order to investigate the effect of mutations on the stability of protein structure.

In addition to analyzing changes in free energy after mutations, the effect of mutation on protein stability is still based on changes of protein structure. Highly evolutionary conserved amino acids are important parts of secondary structure and domains. Mutations that cause substitution of highly conserved amino acids may change the secondary structure and interaction between structural domains and other molecules, which would affect protein function. Our study shows that 74.4% (29/39) of the amino acids related to “CRYBA4 high-risk pathogenic nsSNPs” are conserved and 92.3% (36/39) of them are in important functional domains. The high-risk pathogenic nsSNPs leads to changes in size, charge and hydrophobicity of most amino acids. According to Table 2, all 44 CRYBA4 high-risk pathogenic nsSNPs led to changes in amino acid size. 63.6% (28/44) of the CRYBA4 high-risk pathogenic nsSNPs changed the charge through substitution of amino acids and 65.9% (29/44) of nsSNPs changed the hydrophobicity. Thirteen residues are located on the surface of the CRYBA4 protein. Mutation in these locations may affect the disturbance interactions with other molecules or other parts of the protein. We used String (https://string-db.org) software and found three molecules (Beta-crystallin B1, CRYBB1; Beta-crystallin B2, CRYBB2 and DNA damage-binding protein 2, DDB2) that had interactions with CRYBA4. CRYBA4, CRYBB1 and CRYBB2 all belong to crystallin family. These three proteins have similar sequences and are co-expressed in humans and other species. DDB2 is required for DNA repair. It can bind to DNA damage-binding protein 1 (DDB1) to form the UV-damaged DNA-binding protein complex (the UV-DDB complex). It has been reported that tritiated water exposure disrupts myofibril structure and induces misregulation of eye opacity through the mutation of CRYBA4 and DNA repair genes in early zebrafish life stages [43]. However, more experiments are needed to fully confirm the connection between CRYBA4 protein and DDB2 protein. Our study suggests that nsSNPs in conserved positions may affect the function and properties of protein by changing the protein structure.

Our study provides predictive results of pathogenicity, as well as analysis results of protein properties and structural changes by nsSNPs. In order to test the accuracy of our predictive results, we conducted the comparison between theoretical prediction results and experimental studies in literature. As we found, the accuracy of prediction results was good (Table 3). For example, we predicted that G64W cause unusual torsion angles, thus forcing the local backbone into an incorrect conformation and disturbing the local structure and interactions with other molecules. According to Li et al[44], CRYBA4 p.G64W is prone to form inclusion body when expressed in E. coli which indicated that the p.G64W mutant might affect its folding properties of CRYBA4. Besides, G64W increases self-interaction in vitro and in mammalian cell. CRYBA4 p.G64W is less stable because it can affect CRYBA4 folding and block its interaction with CRYBB1.However, we also found that the literature data was insufficient. Further work is required to verify the accuracy of bioinformatics analysis.

**Tab. 3. Comparison between theoretical prediction results and biophysical experimental data available in the literature on beta crystallins.**

By using our predictive methods, we obtained four important nsSNPs (c.283C>T, p.R95W; c.449T>A, p.V150D; c.475G>A, p.G159R; c.575G>C, p.R192P) from 44 CRYBA4 high-risk pathogenic nsSNPs to focus on because: (i) the four nsSNPs were predicted to be pathogenic by all six predicting tools; (ii) according to three protein stability analyzing tools, they could all lead to ΔΔG < 0, resulting in protein stability decline; (iii) the results of evolutionary conservation analyzing tools showed that the four nsSNPs were all in highly conserved sites; and (iv) protein structural analyzing tools indicated that all four nsSNPs could cause changes in the size, charge and hydrophobicity of CRYBA4 protein.

Admittedly, there were several limitations of this study. The first limitation of this study is that it describes a comparative analysis using public data. We have found and reported ADCC pedigree with SNPs of the CRYBA4 gene. In our previous study, we found a new CRYBA4 nsSNP (c.169T>C, p.F57L) in an ADCC pedigree. We used methods in our article to predict the pathogenicity of this nsSNP and predicted it to be pathogenic. We have conducted experiments to confirm its pathogenicity. We used the methods mentioned to predict its pathogenicity and conducted experiments to explore the relationship between the SNPs and ADCC. Second, although we tried to improve the accuracy of our prediction by using multiple pathogenic predictors and analyzing their possible impacts on protein structure and properties, there were still several pitfalls for their false positive and negative predictions. The six bioinformatics tools were chosen based on their high accuracy and frequency of use. However, the six tools in the first stage use somewhat overlapping concepts to make their predictions. They rely heavily on conservation and other easy methods to calculate differences between amino acids. More information on ADCC pedigree should be collected to train and improve the accuracy of predicting tools. Third, the population allele frequencies should be taken into consideration. According to our clinical experience, mutations of the same gene can result in distinct cataract phenotypes. They may be neglected if mild opacity does not seriously affect visual acuity and appearance. Therefore, the disease may be missed and the incidence of congenital cataracts would not be reported. This problem requires the attention of ophthalmologists and researchers, and is one of the main purposes of our research.

Conclusions

In conclusion, our study provides not only predictive results of pathogenicity, but also the analysis of protein properties and structural changes by nsSNPs, allowing us to select 44 CRYBA4 high-risk pathogenic nsSNPs that are likely to have functional impact on the CRYBA4 gene and contribute to the pathogenesis of ADCC. Among these, there are four nsSNPs (c.283C>T, p.R95W; c.449T>A, p.V150D; c.475G>A, p.G159R; c.575G>C, p.R192P) that should be focused on because of their high potential pathogenicity predicted using bioinformatics tools. This computational study provides a theoretical basis for the molecular mechanism of ADCC caused by CRYBA4 gene mutation. This cost-effective and easy method of analyzing the predicted deleterious nsSNPs of the CRYBA4 gene has important reference value and will contribute to further studies of the treatment of CRYBA4 related ADCC.

Supporting information

S1 File [xlsx]
RAW Data of nsSNPs predictions.

Zdroje

1. Sheeladevi S, Lawrenson JG, Fielder AR, Suttle CM. Global prevalence of childhood cataract: a systematic review. Eye (Lond). 2016;30(9):1160–9.

2. Shiels A, Hejtmancik JF. Mutations and mechanisms in congenital and age-related cataracts. Exp Eye Res. 2017;156 : 95–102. doi: 10.1016/j.exer.2016.06.011 27334249

3. Pichi F, Lembo A, Serafino M, Nucci P. Genetics of Congenital Cataract. Dev Ophthalmol. 2016;57 : 1–14. doi: 10.1159/000442495 27043388

4. Gilbert C, Foster A. Childhood blindness in the context of VISION 2020—the right to sight. Bull World Health Organ. 2001;79(3):227–32. 11285667

5. Yi J, Yun J, Li ZK, Xu CT, Pan BR. Epidemiology and molecular genetics of congenital cataracts. Int J Ophthalmol. 2011;4(4):422–32. doi: 10.3980/j.issn.2222-3959.2011.04.20 22553694

6. Huang B, He W. Molecular characteristics of inherited congenital cataracts. Eur J Med Genet. 2010;53(6):347–57. doi: 10.1016/j.ejmg.2010.07.001 20624502

7. Santana A, Waiswo M. The genetic and molecular basis of congenital cataract. Arq Bras Oftalmol. 2011;74(2):136–42. doi: 10.1590/s0004-27492011000200016 21779674

8. Gill D, Klose R, Munier FL, McFadden M, Priston M, Billingsley G, et al. Genetic heterogeneity of the Coppock-like cataract: a mutation in CRYBB2 on chromosome 22q11.2. Invest Ophthalmol Vis Sci. 2000;41(1):159–65. 10634616

9. Graw J. Genetics of crystallins: cataract and beyond. Exp Eye Res. 2009;88(2):173–89. doi: 10.1016/j.exer.2008.10.011 19007775

10. Lampi KJ, Ma Z, Shih M, Shearer TR, Smith JB, Smith DL, et al. Sequence analysis of betaA3, betaB3, and betaA4 crystallins completes the identification of the major proteins in young human lens. J Biol Chem. 1997;272(4):2268–75. doi: 10.1074/jbc.272.4.2268 8999933

11. Billingsley G, Santhiya ST, Paterson AD, Ogata K, Wodak S, Hosseini SM, et al. CRYBA4, a novel human cataract gene, is also involved in microphthalmia. Am J Hum Genet. 2006;79(4):702–9. doi: 10.1086/507712 16960806

12. Collins FS, Brooks LD, Chakravarti A. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 1998;8(12):1229–31. doi: 10.1101/gr.8.12.1229 9872978

13. Sun W, Xiao X, Li S, Guo X, Zhang Q. Exome sequencing of 18 Chinese families with congenital cataracts: a new sight of the NHS gene. PLoS One. 2014;9(6):e100455. doi: 10.1371/journal.pone.0100455 24968223

14. Zhou G, Zhou N, Hu S, Zhao L, Zhang C, Qi Y. A missense mutation in CRYBA4 associated with congenital cataract and microcornea. Mol Vis. 2010;16 : 1019–24. 20577656

15. Kumar M, Agarwal T, Kaur P, Kumar M, Khokhar S, Dada R. Molecular and structural analysis of genetic variations in congenital cataract. Mol Vis. 2013;19 : 2436–50. 24319337

16. Chen J, Wang Q, Cabrera PE, Zhong Z, Sun W, Jiao X, et al. Molecular Genetic Analysis of Pakistani Families With Autosomal Recessive Congenital Cataracts by Homozygosity Screening. Invest Ophthalmol Vis Sci. 2017;58(4):2207–17. doi: 10.1167/iovs.17-21469 28418495

17. Li J, Leng Y, Han S, Yan L, Lu C, Luo Y, et al. Clinical and genetic characteristics of Chinese patients with familial or sporadic pediatric cataract. Orphanet J Rare Dis. 2018;13(1):94. doi: 10.1186/s13023-018-0828-0 29914532

18. Li J, Zhao T, Zhang Y, Zhang K, Shi L, Chen Y, et al. Performance evaluation of pathogenicity-computation methods for missense variants. Nucleic Acids Res. 2018;46(15):7793–804. doi: 10.1093/nar/gky678 30060008

19. Luxembourg B, D'Souza M, Korber S, Seifried E. Prediction of the pathogenicity of antithrombin sequence variations by in silico methods. Thromb Res. 2015;135(2):404–9. doi: 10.1016/j.thromres.2014.11.022 25496998

20. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29(1):308–11. doi: 10.1093/nar/29.1.308 11125122

21. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):D1062–D7. doi: 10.1093/nar/gkx1153 29165669

22. Stenson PD, Mort M, Ball EV, Evans K, Hayden M, Heywood S, et al. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet. 2017;136(6):665–77. doi: 10.1007/s00439-017-1779-6 28349240

23. Pinero J, Bravo A, Queralt-Rosinach N, Gutierrez-Sacristan A, Deu-Pons J, Centeno E, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 2017;45(D1):D833–D9. doi: 10.1093/nar/gkw943 27924018

24. Pejaver V UJ, Lugo-Martinez J, Pagel KA, Lin GN, Nam HJ, Mort M, Cooper DN, Sebat J, Iakoucheva LM, Mooney SD, Radivojac P. MutPred2: inferring the molecular and phenotypic impact of amino acid variants.

25. Tang H, Thomas PD. PANTHER-PSEP: predicting disease-causing genetic variants using position-specific evolutionary preservation. Bioinformatics. 2016;32(14):2230–2. doi: 10.1093/bioinformatics/btw222 27193693

26. Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005;33(Web Server issue):W306–10. doi: 10.1093/nar/gki375 15980478

27. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7(4):248–9. doi: 10.1038/nmeth0410-248 20354512

28. Choi Y, Chan AP. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics. 2015;31(16):2745–7. doi: 10.1093/bioinformatics/btv195 25851949

29. Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 2012;40(Web Server issue):W452–7. doi: 10.1093/nar/gks539 22689647

30. Capriotti E, Calabrese R, Casadio R. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics. 2006;22(22):2729–34. doi: 10.1093/bioinformatics/btl423 16895930

31. Cheng J, Randall A, Baldi P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins. 2006;62(4):1125–32. doi: 10.1002/prot.20810 16372356

32. Savojardo C, Fariselli P, Martelli PL, Casadio R. INPS-MD: a web server to predict stability of protein variants from sequence and structure. Bioinformatics. 2016;32(16):2542–4. doi: 10.1093/bioinformatics/btw192 27153629

33. Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 2016;44(W1):W344–50. doi: 10.1093/nar/gkw408 27166375

34. Geourjon C, Deleage G. SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Comput Appl Biosci. 1995;11(6):681–4. doi: 10.1093/bioinformatics/11.6.681 8808585

35. Fraczkiewicz RaB W. Exact and Efficient Analytical Calculation of the Accessible Surface Areas and Their Gradients for Macromolecules. J Comp Chem. 1998;19 : 319–33.

36. Moritz C, Brown WM. Tandem duplications in animal mitochondrial DNAs: variation in incidence and gene content among lizards. Proc Natl Acad Sci U S A. 1987;84(20):7183–7. doi: 10.1073/pnas.84.20.7183 3478691

37. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(Database issue):D514–7. doi: 10.1093/nar/gki033 15608251

38. Chen X, Sullivan PF. Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput. Pharmacogenomics J. 2003;3(2):77–96. doi: 10.1038/sj.tpj.6500167 12746733

39. Bloemendal H, de Jong W, Jaenicke R, Lubsen NH, Slingsby C, Tardieu A. Ageing and vision: structure, stability and function of lens crystallins. Prog Biophys Mol Biol. 2004;86(3):407–85. doi: 10.1016/j.pbiomolbio.2003.11.012 15302206

40. Hejtmancik JF. Congenital cataracts and their molecular genetics. Semin Cell Dev Biol. 2008;19(2):134–49. doi: 10.1016/j.semcdb.2007.10.003 18035564

41. Aguayo-Ortiz R, Gonzalez-Navejas A, Palomino-Vizcaino G, Rodriguez-Meza O, Costas M, Quintanar L, et al. Thermodynamic Stability of Human gammaD-crystallin Mutants Using Alchemical Free Energy Calculations. J Phys Chem B. 2019.

42. Khan S, Vihinen M. Performance of protein stability predictors. Hum Mutat. 2010;31(6):675–84. doi: 10.1002/humu.21242 20232415

43. Arcanjo C, Armant O, Floriani M, Cavalie I, Camilleri V, Simon O, et al. Tritiated water exposure disrupts myofibril structure and induces mis-regulation of eye opacity and DNA repair genes in zebrafish early life stages. Aquat Toxicol. 2018;200 : 114–26. doi: 10.1016/j.aquatox.2018.04.012 29751158

44. Li W, Ji Q, Wei Z et al., Biochemical characterization of G64W mutant of acidic beta-crystallin 4. Exp Eye Res, 2019. 186 : 107712. doi: 10.1016/j.exer.2019.107712 31254514

45. Zhai Y, Li J, Yu W et al., Targeted Exome Sequencing of Congenital Cataracts Related Genes: Broadening the Mutation Spectrum and Genotype-Phenotype Correlations in 27 Chinese Han Families. Sci Rep, 2017. 7(1):1219. doi: 10.1038/s41598-017-01182-9 28450710

Top novinky

Nové kurzy

Top články

Nové číslo

Top novinky

Nejčtenější

Nová videa

Nová videa

Nové podcasty

Doporučené pozice

Top novinky

In silico analysis and high-risk pathogenic phenotype predictions of non-synonymous single nucleotide polymorphisms in human Crystallin beta A4 gene associated with congenital cataract

Summary

Keywords:

Introduction

Materials & methods

Nonsynonymous SNP data retrieval

Prediction of pathogenicity of nsSNPs

Prediction of protein stability

Evolutionary conservation analysis of nsSNPs and prediction of protein structure and property changes

Results

Collection and collation of nsSNP data

Prediction of pathogenicity of nsSNPs

Prediction of protein stability

Evolutionary Conservation Analysis and Protein Structure Analysis

Protein property analysis

Discussion

Conclusions

Supporting information

Zdroje

PLOS One

Svět praktické medicíny 2/2026 (znalostní test z časopisu)

Citikolin v neuroprotekci a neuroregeneraci – nové poznatky

Revma Focus: Spondyloartritidy

Denzitometrie v praxi: od kvalitního snímku po správnou interpretaci

Čelistně-ortodontické kazuistiky od A do Z