Clustering of the structures by using “snakes-&-dragons” approach, or correlation matrix as a signal

Authors: Victor P. Andreev (1); Gang Liu (1); Jarcy Zee (1); Lisa Henn (1); Gilberto E. Flores (2); Robert M. Merion (1)
Author affiliations: (1) Arbor Research Collaborative for Health, Ann Arbor, Michigan, United States of America; (2) Department of Biology, California State University, Northridge, California, United States of America
Published in: PLoS ONE 14(10)
Category: Research Article


Biological, ecological, social, and technological systems are complex structures with multiple interacting parts, often represented by networks. Correlation matrices that describe the interdependency of the variables in such structures provide key information for comparing and classifying these systems. Classification based on correlation matrices could supplement or improve classification based on variable values, because the former reveals similarities in system structure, whereas the latter relies on similarities in system state. Importantly, clustering correlation matrices is different from clustering the elements of a correlation matrix: the goal is to compare and cluster multiple networks, not the nodes within a network. A novel approach for clustering correlation matrices, named “snakes-&-dragons,” is introduced and illustrated with examples from neuroscience, the human microbiome, and macroeconomics.
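
The distinction between clustering by structure and clustering by state can be illustrated with a small sketch. The Python code below is not the snakes-&-dragons algorithm, which this abstract does not describe; it is a generic illustration that assumes each system is summarized by the upper triangle of its Pearson correlation matrix and that systems are grouped by plain average-linkage hierarchical clustering. All function names and the toy data (`simulate_system`, `correlation_signature`, `average_linkage`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_signature(series):
    """Flatten the upper triangle of one system's correlation matrix.

    `series` is a list of variables, each a list of observations.
    """
    p = len(series)
    return [pearson(series[i], series[j])
            for i in range(p) for j in range(i + 1, p)]


def distance(u, v):
    """Euclidean distance between two correlation signatures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(signatures, n_clusters):
    """Naive agglomerative clustering of whole systems (not of nodes)."""
    clusters = [[i] for i in range(len(signatures))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(distance(signatures[i], signatures[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = [0] * len(signatures)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def simulate_system(rng, positive, n_obs=200):
    """Toy system (assumed data): x0 and x1 are coupled with sign +1 or -1."""
    x0 = [rng.gauss(0, 1) for _ in range(n_obs)]
    sign = 1.0 if positive else -1.0
    x1 = [sign * a + 0.3 * rng.gauss(0, 1) for a in x0]
    x2 = [rng.gauss(0, 1) for _ in range(n_obs)]
    return [x0, x1, x2]


rng = random.Random(0)
# Four systems with positive coupling, then four with negative coupling.
systems = [simulate_system(rng, positive) for positive in [True] * 4 + [False] * 4]
signatures = [correlation_signature(s) for s in systems]
labels = average_linkage(signatures, n_clusters=2)
print(labels)
```

In this toy setup every variable has roughly the same distribution of values in all eight systems, so the systems look alike in state. They differ only in the sign of the coupling between `x0` and `x1`, which is the structural information the correlation signatures carry and the clustering uses.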

Keywords:

Functional magnetic resonance imaging – Hierarchical clustering – Macroeconomics – Microbiome – Shannon index – Snakes – Tongue – Consensus clustering



Published in journal issue: 2019, No. 10