Sample size issues in multilevel logistic regression models

Authors: Amjad Ali [1]; Sabz Ali [1]; Sajjad Ahmad Khan [1]; Dost Muhammad Khan [2]; Kamran Abbas [3]; Alamgir Khalil [4]; Sadaf Manzoor [1]; Umair Khalil [2]
Author affiliations: [1] Department of Statistics, Islamia College, Peshawar, Pakistan; [2] Department of Statistics, Abdul Wali Khan University Mardan, Pakistan; [3] Department of Statistics, University of Azad Jammu & Kashmir, Muzaffarabad, Pakistan; [4] Department of Statistics, University of Peshawar, Pakistan
Published in: PLoS ONE 14(11)
Category: Research Article
doi: 10.1371/journal.pone.0225427


Researchers in education, psychology, the social sciences, epidemiology, and medicine often work with multilevel data. When the response variable in such data is categorical, it needs to be analyzed with multilevel logistic regression models. This paper provides guidelines to help analysts select an appropriate sample size when fitting multilevel logistic regression models under different threshold parameters and different estimation methods. Simulation studies were performed to determine the optimum sample size for the Penalized Quasi-likelihood (PQL) and Maximum Likelihood (ML) methods of estimation. Our results suggest that, under the chosen conditions, the ML method performs better than the PQL method and requires a relatively small sample. To achieve sufficient accuracy under the ML method, we establish a "50/50" rule for fixed effects and a "120/50" rule for random effects. On the basis of our findings, we also recommend "50/60" and "120/70" rules, respectively, under the PQL method.
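
As a rough illustration of the kind of data such simulation studies generate, the sketch below draws binary responses from a two-level random-intercept logistic model. The function name, the single covariate, the parameter values, and the choice of 50 groups of size 50 are assumptions made purely for illustration; they are not taken from the paper's simulation design.

```python
import numpy as np


def simulate_two_level_logistic(n_groups, group_size, beta0=-1.0, beta1=0.5,
                                sigma_u=1.0, seed=0):
    """Simulate binary responses from a random-intercept logistic model.

    All default parameter values are illustrative assumptions,
    not settings reported in the paper.
    """
    rng = np.random.default_rng(seed)
    # One random intercept per group: u_j ~ N(0, sigma_u^2)
    u = rng.normal(0.0, sigma_u, size=n_groups)
    # Group label for every individual (balanced design)
    group = np.repeat(np.arange(n_groups), group_size)
    # One individual-level covariate: x_ij ~ N(0, 1)
    x = rng.normal(0.0, 1.0, size=n_groups * group_size)
    # Linear predictor on the logit scale
    eta = beta0 + beta1 * x + u[group]
    # Inverse logit gives the success probability
    p = 1.0 / (1.0 + np.exp(-eta))
    # Bernoulli draw for each individual
    y = rng.binomial(1, p)
    return group, x, y


group, x, y = simulate_two_level_logistic(n_groups=50, group_size=50)
print(y.shape, round(float(y.mean()), 3))
```

In a full simulation study, data sets like this would be generated repeatedly for each combination of number of groups and group size, fitted with each estimation method, and the resulting estimates compared with the known generating values.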

Keywords:

Analysis of variance – Generalized linear model – Normal distribution – Psychological and psychosocial issues – Psychologists – Simulation and modeling – Statistical models – Social epidemiology


This article appeared in issue 11, 2019.