Clustering via hypergraph modularity


Autoři: Bogumił Kamiński aff001;  Valérie Poulin aff002;  Paweł Prałat aff003;  Przemysław Szufel aff001;  François Théberge aff002
Působiště autorů: SGH Warsaw School of Economics, Warsaw, Poland aff001;  The Tutte Institute for Mathematics and Computing, Ottawa, ON, Canada aff002;  Department of Mathematics, Ryerson University, Toronto, ON, Canada aff003
Vyšlo v časopise: PLoS ONE 14(11)
Kategorie: Research Article
doi: 10.1371/journal.pone.0224307

Souhrn

Despite the fact that many important problems (including clustering) can be described using hypergraphs, theoretical foundations as well as practical algorithms using hypergraphs are not well developed yet. In this paper, we propose a hypergraph modularity function that generalizes its well established and widely used graph counterpart measure of how clustered a network is. In order to define it properly, we generalize the Chung-Lu model for graphs to hypergraphs. We then provide the theoretical foundations to search for an optimal solution with respect to our hypergraph modularity function. A simple heuristic algorithm is described and applied to a few illustrative examples. We show that using a strict version of our proposed modularity function often leads to a solution where a smaller number of hyperedges get cut as compared to optimizing modularity of 2-section graph of a hypergraph.

Klíčová slova:

Algorithms – Clustering algorithms – Community structure – Computer and information sciences – Finance – Source code – Telecommunications – Transportation


Zdroje

1. Fortunato S. Community detection in graphs, Physics Reports. 2010; 486: 75–174. doi: 10.1016/j.physrep.2009.11.002

2. Girvan M, Newman MEJ. Community structure in social and biological networks. Proceedings of the National Academy of Sciences. 2002; 99: 7821–7826. doi: 10.1073/pnas.122653799

3. Zhang Z, Liu C. A hypergraph model of social tagging networks. Journal of Statistical Mechanics: Theory and Experiment, vol. 2010, no 10, 2010. doi: 10.1088/1742-5468/2010/10/P10005

4. Shepherd MA, Watters CR., Cai Y. Transient hypergraphs for citation networks. Information Processing & Management, 26(3), 395–412, 2010. doi: 10.1016/0306-4573(90)90099-N

5. Gallo G, Longo G, Pallottino Stefano, Nguyen S. Directed hypergraphs and applications. Discrete applied mathematics, vol. 42, no 2-3, pp 177–201, 1993. doi: 10.1016/0166-218X(93)90045-P

6. Samaga R, Klamt S. Modeling approaches for qualitative and semi-quantitative analysis of cellular signaling networks. Cell communication and signaling, vol. 11 no 1, 2013. doi: 10.1186/1478-811X-11-43

7. Sarkar S, Sivarajan KN. Hypergraph models for cellular mobile communication systems. IEEE Transactions on Vehicular Technology, 47(2), 460–471, 1998. doi: 10.1109/25.669084

8. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys. Rev. E. 2004; 69: 026–113. doi: 10.1103/PhysRevE.69.066133

9. Chung F, Lu L, Connected components in random graphs with given expected degree sequence. Annals of Combinatorics. 2002; 6: 125–145. doi: 10.1007/PL00012580

10. Zhou D, Huang J, Scholkopf B. Learning with hypergraphs: clustering, classification and embedding. Advances in neural information processing systems. 2006; 19:1601–1608.

11. Shi J, Malik J. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence. 2000; 22: 888–905. doi: 10.1109/34.868688

12. Rodriguez JA. On the Laplacian spectrum and walk-regular hypergraphs. Linear and Multi-linear Algebra. 2003; 51: 285–297. doi: 10.1080/0308108031000084374

13. Zien J, Schlag M, Chan P. Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 1999; 18: 1389–1399. doi: 10.1109/43.784130

14. Agarwal S, Lim J, Zelnik-Manor L, Perona P, Kriegman D, Belongie S. Beyond pairwise clustering. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2005; 2: 838–845.

15. Agarwal S, Branson K, Belongie S. Higher Order Learning with Graphs, Proc. Int’l Conf. Machine Learning. 2006; 148: 17–24.

16. Voloshim VI, Introduction to Graph and Hypergraph Theory. Nova Kroshka Books; 2013.

17. Karypis G, Kumar V, Multilevel K-Way Hypergraph Partitioning. VLSI Design. 2000; 11: 285–300. doi: 10.1155/2000/19436

18. Chien I, Chung-Yi L, I-Hsiang W. Community detection in hypergraphs: Optimal statistical limit and efficient algorithms. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics. 2018: 84: 871–879.

19. GitHub gist repository, https://gist.github.com/pszufe/02666497d2c138d1b2de5b7f67784d2b.

20. Chung FRK, Lu L. Complex Graphs and Networks. American Mathematical Society; 2006.

21. Seshadhri C, Kolda TG, Pinar A. Community structure and scale-free collections of Erdös-Rényi graphs. Physical Review E. 2012; 85: 056109. doi: 10.1103/PhysRevE.85.056109

22. Kolda TG, Pinar A, Plantenga T, Seshadhri C. A scalable generative graph model with community structure. SIAM Journal on Scientific Computing. 2014; 36: C424–C452. doi: 10.1137/130914218

23. Winlaw M, DeSterck H, Sanders G. An In-Depth Analysis of the Chung-Lu Model. Lawrence Livermore Technical Report LLNL-TR-678729. 2015.

24. Newman M. Networks: An Introduction. Oxford University Press; 2010.

25. Fortunato S, Barthelemy M. Resolution limit in community detection. Proc. Natl. Acad. Sci. USA. 2007: 104: 36–41. doi: 10.1073/pnas.0605965104 17190818

26. Clauset A, Newman MEJ, Moore C. Finding community structure in very large networks. Phys. Rev. E. 2004; 70: 066111. doi: 10.1103/PhysRevE.70.066111

27. Lancichinetti A, Fortunato S. Limits of modularity maximization in community detection. Phys. Rev. E. 2011; 84: 066122. doi: 10.1103/PhysRevE.84.066122

28. Newman MEJ. Fast algorithm for detecting community structure in networks. Phys. Rev. E. 2004; 69: 066133. doi: 10.1103/PhysRevE.69.066133

29. McDiarmid C, Skerman F. Modularity in random regular graphs and lattices. Electronic Notes in Discrete Mathematics. 2013; 43: 431–437. doi: 10.1016/j.endm.2013.07.063

30. McDiarmid C, Skerman F. Modularity of tree-like and random regular graphs. Oxford Journal of Complex Networks. 2018; 6: 596–619. doi: 10.1093/comnet/cnx046

31. Ostroumova Prokhorenkova L, Prałat P, and Raigorodskii A. Modularity of complex networks models. Internet Mathematics. 2017. doi: 10.24166/im.12.2017

32. McDiarmid C, Skerman F. Modularity of Erdos-Renyi random graphs; 2018. Preprint. Available from: https://arxiv.org/abs/1808.02243. Cited 30 April 2019.

33. Leordeanu M, Sminchisescu C. Efficient Hypergraph Clustering. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics. 2012; 22: 676–684.

34. Blondel V.D., Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. 2008; 10: P10008. doi: 10.1088/1742-5468/2008/10/P10008

35. Poulin V, Théberge F. Comparing Graph Clusterings: Set partition measures vs. Graph-aware measures. 2018. Preprint. Available from: https://arxiv.org/abs/1806.11494. Cited 30 April 2019.

36. Newman MEJ. Community detection in networks: Modularity optimization and maximum likelihood are equivalent. Phys. Rev. E. 2016; 94: 052315.

37. Lu X, Szymanski BK. Asymptotic resolution bounds of generalized modularity and statistically significant community detection. 2019. Preprint. Available from: https://arxiv.org/abs/1902.04243. Cited 30 April 2019.


Článek vyšel v časopise

PLOS One


2019 Číslo 11