Exponential random graph model parameter estimation for very large directed networks


Autoři: Alex Stivala aff001;  Garry Robins aff003;  Alessandro Lomi aff001
Působiště autorů: Institute of Computational Science, Università della Svizzera italiana, Lugano, Ticino, Switzerland aff001;  Centre for Transformative Innovation, Swinburne University of Technology, Melbourne, Victoria, Australia aff002;  Melbourne School of Psychological Sciences, The University of Melbourne, Melbourne, Victoria, Australia aff003;  The University of Exeter Business School, The University of Exeter, Exeter, Devon, United Kingdom aff004
Vyšlo v časopise: PLoS ONE 15(1)
Kategorie: Research Article
doi: 10.1371/journal.pone.0227804

Souhrn

Exponential random graph models (ERGMs) are widely used for modeling social networks observed at one point in time. However the computational difficulty of ERGM parameter estimation has limited the practical application of this class of models to relatively small networks, up to a few thousand nodes at most, with usually only a few hundred nodes or fewer. In the case of undirected networks, snowball sampling can be used to find ERGM parameter estimates of larger networks via network samples, and recently published improvements in ERGM network distribution sampling and ERGM estimation algorithms have allowed ERGM parameter estimates of undirected networks with over one hundred thousand nodes to be made. However the implementations of these algorithms to date have been limited in their scalability, and also restricted to undirected networks. Here we describe an implementation of the recently published Equilibrium Expectation (EE) algorithm for ERGM parameter estimation of large directed networks. We test it on some simulated networks, and demonstrate its application to an online social network with over 1.6 million nodes.

Klíčová slova:

Algorithms – Directed graphs – Network analysis – Network reciprocity – Random graphs – Research errors – Social networks – Statistical distributions


Zdroje

1. Lusher D, Koskinen J, Robins G, editors. Exponential random graph models for social networks. Structural Analysis in the Social Sciences. New York: Cambridge University Press; 2013.

2. Amati V, Lomi A, Mira A. Social network modeling. Annu Rev Stat Appl. 2018;5:343–369. doi: 10.1146/annurev-statistics-031017-100746

3. Corander J, Dahmström K, Dahmström P. Maximum likelihood estimation for Markov graphs. Stockholm University, Department of Statistics; 1998. 8.

4. Corander J, Dahmström K, Dahmström P. Maximum likelihood estimation for exponential random graph models. In: Hagberg J, editor. Contributions to social network analysis, information theory, and other topics in statistics; a Festschrift in honour of Ove Frank. Department of Statistics, University of Stockholm; 2002. p. 1–17.

5. Snijders TAB. Markov chain Monte Carlo estimation of exponential random graph models. J Soc Struct. 2002;3(2):1–40.

6. Hunter DR, Handcock MS. Inference in curved exponential family models for networks. J Comput Graph Stat. 2006;15(3):565–583. doi: 10.1198/106186006X133069

7. Robins G, Snijders T, Wang P, Handcock M, Pattison P. Recent developments in exponential random graph (p*) models for social networks. Soc Networks. 2007;29(2):192–215. doi: 10.1016/j.socnet.2006.08.003

8. Caimo A, Friel N. Bayesian inference for exponential random graph models. Soc Networks. 2011;33:41–55. doi: 10.1016/j.socnet.2010.09.004

9. Hummel RM, Hunter DR, Handcock MS. Improving simulation-based algorithms for fitting ERGMs. J Comput Graph Stat. 2012;21(4):920–939. doi: 10.1080/10618600.2012.679224 26120266

10. Hunter DR, Krivitsky PN, Schweinberger M. Computational statistical methods for social network models. J Comput Graph Stat. 2012;21(4):856–882. doi: 10.1080/10618600.2012.732921 23828720

11. Byshkin M, Stivala A, Mira A, Krause R, Robins G, Lomi A. Auxiliary parameter MCMC for exponential random graph models. J Stat Phys. 2016;165(4):740–754. doi: 10.1007/s10955-016-1650-5

12. Byshkin M, Stivala A, Mira A, Robins G, Lomi A. Fast maximum likelihood estimation via Equilibrium Expectation for large network data. Sci Rep. 2018;8:11509. doi: 10.1038/s41598-018-29725-8 30065311

13. Robins G, Pattison P, Wang P. Closure, connectivity and degree distributions: Exponential random graph (p*) models for directed social networks. Soc Networks. 2009;31(2):105–117. doi: 10.1016/j.socnet.2008.10.006

14. Coleman JS. Relational analysis: the study of social organizations with survey methods. Hum Organ. 1958;17(4):28–36. doi: 10.17730/humo.17.4.q5604m676260q8n7

15. Goodman LA. Snowball sampling. Ann Math Stat. 1961;32:148–170. doi: 10.1214/aoms/1177705148

16. Goodman LA. Comment: On respondent-driven sampling and snowball sampling in hard-to-reach populations and snowball sampling not in hard-to-reach populations. Sociol Methodol. 2011;41(1):347–353. doi: 10.1111/j.1467-9531.2011.01242.x

17. Heckathorn DD. Comment: Snowball versus respondent-driven sampling. Sociol Methodol. 2011;41(1):355–366. doi: 10.1111/j.1467-9531.2011.01244.x 22228916

18. Handcock MS, Gile KJ. Comment: On the concept of snowball sampling. Sociol Methodol. 2011;41(1):367–371. doi: 10.1111/j.1467-9531.2011.01243.x

19. Handcock MS, Gile KJ. Modeling social networks from sampled data. Ann Appl Stat. 2010;4(1):5–25. doi: 10.1214/08-AOAS221 26561513

20. Stivala AD, Koskinen JH, Rolls DA, Wang P, Robins GL. Snowball sampling for estimating exponential random graph models for large networks. Soc Networks. 2016;47:167–188. doi: 10.1016/j.socnet.2015.11.003

21. Pattison PE, Robins GL, Snijders TAB, Wang P. Conditional estimation of exponential random graph models from snowball sampling designs. J Math Psychol. 2013;57(6):284–296. doi: 10.1016/j.jmp.2013.05.004

22. Snijders TAB, Baerveldt C. A multilevel network study of the effects of delinquent behavior on friendship evolution. J Math Sociol. 2003;27(2–3):123–151. doi: 10.1080/00222500305892

23. Efron B. Better bootstrap confidence intervals. J Am Stat Assoc. 1987;82(397):171–185. doi: 10.2307/2289153

24. Hunter DR, Goodreau SM, Handcock MS. Goodness of fit of social network models. J Am Stat Assoc. 2008;103(481):248–258. doi: 10.1198/016214507000000446

25. Borisenko A, Byshkin M, Lomi A. A simple algorithm for scalable Monte Carlo inference; 2019. Preprint. Available from: arXiv:1901.00533v3. Cited 17 April 2019.

26. Thiemichen S, Kauermann G. Stable exponential random graph models with non-parametric components for large dense networks. Soc Networks. 2017;49:67–80. doi: 10.1016/j.socnet.2016.12.002

27. Babkin S, Schweinberger M. Massive-scale estimation of exponential-family random graph models with local dependence; 2017. Preprint. Available from: arXiv:1703.09301v1. Cited 17 April 2019.

28. Schweinberger M, Krivitsky PN, Butts CT, Stewart J. Exponential-family models of random graphs: Inference in finite-, super-, and infinite-population scenarios; 2019. Preprint. Available from: arXiv:1707.04800v4. Cited 15 October 2019.

29. Snijders TAB, Pattison PE, Robins GL, Handcock MS. New specifications for exponential random graph models. Sociol Methodol. 2006;36(1):99–153. doi: 10.1111/j.1467-9531.2006.00176.x

30. Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M. statnet: Software tools for the representation, visualization, analysis and simulation of network data. J Stat Softw. 2008;24(1):1–11. doi: 10.18637/jss.v024.i01

31. Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M. ergm: A package to fit, simulate and diagnose exponential-family models for networks. J Stat Softw. 2008;24(3):1–29. doi: 10.18637/jss.v024.i03

32. Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Bender-deMoll S, et al. statnet: Software tools for the statistical analysis of network data; 2016. Available from: CRAN.R-project.org/package=statnet.

33. Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M. ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks; 2016. Available from: http://CRAN.R-project.org/package=ergm.

34. Wang P. Exponential random graph models for affiliation networks [PhD thesis]. The University of Melbourne. Melbourne, Australia; 2012.

35. Morris M, Handcock M, Hunter D. Specification of exponential-family random graph models: Terms and computational aspects. J Stat Softw. 2008;24(4):1–24. doi: 10.18637/jss.v024.i04

36. Younes L. Estimation and annealing for Gibbsian fields. Ann Inst Henri Poincaré B. 1988;24:269–294.

37. Tieleman T. Training restricted Boltzmann machines using approximations to the likelihood gradient. In: Proceedings of the 25th international conference on machine learning. ACM; 2008. p. 1064–1071.

38. Barndorff-Nielsen O. Information and exponential families in statistical theory. John Wiley & Sons; 2014.

39. Hinton GE. Training products of experts by minimizing contrastive divergence. Neural Comput. 2002;14(8):1771–1800. doi: 10.1162/089976602760128018 12180402

40. Asuncion A, Liu Q, Ihler A, Smyth P. Learning with blocks: Composite likelihood and contrastive divergence. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics; 2010. p. 33–40.

41. Krivitsky PN. Using contrastive divergence to seed Monte Carlo MLE for exponential-family random graph models. Comput Stat Data Anal. 2017;107:149–161. doi: 10.1016/j.csda.2016.10.015

42. Wang P, Robins G, Pattison P. PNet: program for the simulation and estimation of exponential random graph (p*) models; 2009.

43. Bloom BH. Space/time trade-offs in hash coding with allowable errors. Commun ACM. 1970;13(7):422–426. doi: 10.1145/362686.362692

44. Jones GL, Haran M, Caffo BS, Neath R. Fixed-width output analysis for Markov chain Monte Carlo. J Am Stat Assoc. 2006;101(476):1537–1547. doi: 10.1198/016214506000000492

45. Vats D, Flegal JM, Jones GL. Multivariate output analysis for Markov chain Monte Carlo; 2017. Preprint. Available from: arXiv:1512.07713v4. Cited 17 April 2019.

46. Flegal JM, Hughes J, Vats D. mcmcse: Monte Carlo standard errors for MCMC; 2016. Available from: https://cran.r-project.org/package=mcmcse.

47. Hartung J, Knapp G, Sinha BK. Statistical meta-analysis with applications. Hoboken, NJ: John Wiley & Sons; 2008.

48. Hanson TD. uthash; 2018. https://github.com/troydhanson/uthash.

49. Salmon JK, Moraes MA, Dror RO, Shaw DE. Parallel random numbers: As easy as 1, 2, 3. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. ACM; 2011. p. 16.

50. Csárdi G, Nepusz T. The igraph software package for complex network research. InterJournal Complex Systems: 1695. 2006.

51. Leskovec J, Sosič R. SNAP: A general-purpose network analysis and graph-mining library. ACM Trans Intell Syst Technol. 2016;8(1):1. doi: 10.1145/2898361 28344853

52. Wickham H. ggplot2: elegant graphics for data analysis. Springer New York; 2009. Available from: http://had.co.nz/ggplot2/book.

53. Scherer R. PropCIs: Various confidence interval methods for proportions; 2014. Available from: https://CRAN.R-project.org/package=PropCIs.

54. Gillespie CS. Fitting heavy tailed distributions: The poweRlaw package. J Stat Softw. 2015;64(2):1–16. doi: 10.18637/jss.v064.i02

55. Takac L, Zabovsky M. Data analysis in public social networks. In: International Scientific Conference and International Workshop Present Day Trends of Innovations. vol. 1; 2012. p. 1–6. Available from: http://snap.stanford.edu/data/soc-pokec.pdf.

56. Leskovec J, Krevl A. SNAP Datasets: Stanford large network dataset collection; 2014. http://snap.stanford.edu/data.

57. Kleineberg KK, Boguñá M. Evolution of the digital society reveals balance between viral and mass media influence. Phys Rev X. 2014;4(3):031046.

58. Clauset A, Shalizi CR, Newman ME. Power-law distributions in empirical data. SIAM Rev. 2009;51(4):661–703. doi: 10.1137/070710111

59. Stumpf MP, Porter MA. Critical truths about power laws. Science. 2012;335(6069):665–666. doi: 10.1126/science.1216142 22323807

60. Broido AD, Clauset A. Scale-free networks are rare. Nat Commun. 2019;10(1):1017. doi: 10.1038/s41467-019-08746-5 30833554

61. Van Duijn MA, Gile KJ, Handcock MS. A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models. Soc Networks. 2009;31(1):52–62. doi: 10.1016/j.socnet.2008.10.003 23170041

62. Wilson EB. Probable inference, the law of succession, and statistical inference. J Am Stat Assoc. 1927;22(158):209–212. doi: 10.1080/01621459.1927.10502953

63. Granovetter MS. The strength of weak ties. Am J Sociol. 1973;78(6):1360–1380. doi: 10.1086/225469

64. An C, O’Malley AJ, Rockmore DN, Stock CD. Analysis of the US patient referral network. Stat Med. 2018;37(5):847–866. doi: 10.1002/sim.7565 29205445

65. Robbins H, Monro S. A stochastic approximation method. Ann Math Stat. 1951;22(3):400–407. doi: 10.1214/aoms/1177729586

66. Geyer CJ, Thompson EA. Constrained Monte Carlo maximum likelihood for dependent data. J Roy Stat Soc B Met. 1992;54(3):657–699.

67. Fellows IE. Why (and when and how) contrastive divergence works; 2014. Preprint. Available from: arXiv:1405.0602v1. Cited 17 April 2019.


Článek vyšel v časopise

PLOS One


2020 Číslo 1