The protein architecture in Bacteria and Archaea identifies a set of promiscuous and ancient domains

Authors: Rafael Hernandez-Guerrero aff001; Edgardo Galán-Vásquez aff002; Ernesto Pérez-Rueda aff001
Author affiliations: Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México aff001; Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Ciudad Universitaria, Universidad Nacional Autónoma de México, Ciudad de México, México aff002; Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile aff003
Published in: PLoS ONE 14(12)
Category: Research Article
doi: 10.1371/journal.pone.0226604


In this work, we describe a systematic comparative genomic analysis of promiscuous domains in the genomes of Bacteria and Archaea. A quantitative measure of domain promiscuity, the weighted domain architecture score (WDAS), was applied to 1317 domains in 1320 genomes of Bacteria and Archaea. A functional analysis of the WDAS per genome showed that 18 of 50 functional categories were significantly enriched in promiscuous domains; in particular, small-molecule binding domains, transferase domains, DNA-binding domains (transcription factors), and signal transduction domains were identified as promiscuous. In contrast, non-promiscuous domains were associated with 6 of 50 functional categories, and the category Function unknown was enriched. In addition, the WDASs of 52 domains correlated with genome size: WDAS values decreased as genome size increased, suggesting that the number of domain combinations increases in larger genomes. These domains include members of the superfamilies Winged helix-turn-helix and P-loop-containing nucleoside triphosphate hydrolases. Finally, based on a classification of the domains according to their ancestry, we determined that the set of 52 promiscuous domains is also ancient and abundant among all the genomes, in contrast to the non-promiscuous domains. In summary, we consider that the association between these two classes of protein domains (promiscuous and non-promiscuous) provides bacterial and archaeal cells with the ability to respond to diverse environmental challenges.
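To make the two quantitative ideas in the abstract concrete, the sketch below may help. It does not implement the WDAS, whose formula is not given here. As a simplified stand-in, it counts the distinct adjacent partners of each domain, then computes a Spearman rank correlation between a per-genome score and genome size. The function names, the toy architectures, and the numeric values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def neighbor_counts(architectures):
    """Count distinct adjacent partner domains for each domain.

    `architectures` is a list of proteins, each a list of domain names
    in N-to-C order. This is a simple promiscuity proxy, NOT the WDAS.
    """
    neighbors = defaultdict(set)
    for arch in architectures:
        for domain in arch:
            neighbors[domain]  # register domains that have no partners
        for left, right in zip(arch, arch[1:]):
            if left != right:  # ignore tandem repeats of the same domain
                neighbors[left].add(right)
                neighbors[right].add(left)
    return {d: len(partners) for d, partners in neighbors.items()}


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy proteome: four proteins and their domain architectures.
toy_proteome = [
    ["wHTH", "P-loop"],
    ["wHTH", "kinase"],
    ["P-loop"],
    ["wHTH", "wHTH"],
]
partner_counts = neighbor_counts(toy_proteome)

# Hypothetical genome sizes (Mb) and per-genome scores for one domain.
genome_sizes = [1.0, 2.5, 4.0, 6.0]
scores = [0.9, 0.7, 0.6, 0.2]
rho = spearman(genome_sizes, scores)
```

In this toy case the score falls strictly as genome size rises, so `rho` is negative, which mirrors the direction of the trend reported for the 52 domains. A real analysis would use the published WDAS definition and an established statistics library instead of these hand-written helpers.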

Keywords:

Bacterial genomics – Comparative genomics – Functional genomics – Gene prediction – Genomic signal processing – Hydrolases – Paleogenetics – Protein domains



Article published in journal

2019, Issue 12