A new resolution function to evaluate tree shape statistics

Authors: Maryam Hayati; Bita Shadgar; Leonid Chindelevitch
Author affiliations: School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0224197


Phylogenetic trees are frequently used in biology to study the relationships among species or organisms. The shape of a phylogenetic tree carries useful information about patterns of speciation and extinction, so powerful tools are needed to investigate it. Tree shape statistics are a common approach: they quantify the shape of a phylogenetic tree by encoding it as a single number. In this article, we propose a new resolution function to evaluate the power of different tree shape statistics to distinguish between dissimilar trees. We show that the new resolution function requires less time and space than the previously proposed resolution function for tree shape statistics. We also introduce a new class of tree shape statistics, namely linear combinations of two existing statistics that are chosen to be optimal with respect to a resolution function, and show evidence that the statistics in this class converge to a limiting linear combination as the size of the tree increases. Our implementation is freely available at https://github.com/WGS-TB/TreeShapeStats.
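To make the idea of encoding a tree shape as a single number concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Sackin and Colless indices as the two example statistics (the abstract does not say which two statistics are combined), represents a rooted binary tree as nested Python tuples, and uses a hypothetical weight `alpha` that is fixed by hand rather than optimized against any resolution function.

```python
# Illustrative sketch only; names and the tuple representation are assumptions.
# A leaf is any non-tuple label; an internal node is a pair (left, right).

def leaf_count(tree):
    """Number of leaves in the subtree rooted at this node."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)


def sackin(tree, depth=0):
    """Sackin index: sum of the depths of all leaves."""
    if not isinstance(tree, tuple):
        return depth
    left, right = tree
    return sackin(left, depth + 1) + sackin(right, depth + 1)


def colless(tree):
    """Colless index: sum over internal nodes of |left leaves - right leaves|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    imbalance = abs(leaf_count(left) - leaf_count(right))
    return imbalance + colless(left) + colless(right)


def combined(tree, alpha):
    """Hypothetical linear combination: alpha * Sackin + (1 - alpha) * Colless."""
    return alpha * sackin(tree) + (1 - alpha) * colless(tree)


# Two four-leaf trees with different shapes.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

for name, tree in [("balanced", balanced), ("caterpillar", caterpillar)]:
    print(name, sackin(tree), colless(tree), combined(tree, 0.5))
```

Both statistics assign a larger value to the caterpillar tree than to the balanced one, which is the sense in which a single number can separate dissimilar shapes. How well a given statistic does this across all trees is what a resolution function is meant to measure.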

Keywords:

Computing methods – Eigenvalues – Epidemiological statistics – Leaves – Phylogenetic analysis – Phylogenetics – Speciation – Viral evolution



This article was published in the journal in 2019, Issue 11.