Precluding rare outcomes by predicting their absence

Authors: Eric W. Schoon [1]; David Melamed [1]; Ronald L. Breiger [2]; Eunsung Yoon [2]; Christopher Kleps [1]
Author affiliations: [1] Department of Sociology, The Ohio State University, Columbus, Ohio, United States of America; [2] School of Sociology, University of Arizona, Tucson, Arizona, United States of America
Published in: PLoS ONE 14(10)
Category: Research Article


Forecasting extremely rare events is a pressing problem, but efforts to model such outcomes are often limited by the presence of multiple causes within classes of events and by having too few observations of the outcome, which both prevents assessment of fit and biases estimates. We introduce a novel approach for analyzing rare event data that addresses these challenges by turning attention to the conditions under which rare outcomes do not occur. We detail how configurational methods can be used to identify conditions or sets of conditions that would preclude the occurrence of a rare outcome. Results from Monte Carlo experiments show that our approach can be used to systematically preclude up to 78.6% of observations, and an application to ground-truth data, coupled with a bootstrap inferential test, illustrates how our approach can also yield novel substantive insights that are obscured by standard statistical analyses.
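
To make the idea of precluding configurations concrete, the following is a minimal sketch and not the authors' procedure: it only groups observations by their exact combination of binary conditions and flags combinations in which the outcome never occurs, without the Boolean minimization or bootstrap test used in configurational analysis. The function names, the `outcome` key, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def precluding_configurations(rows, outcome_key="outcome"):
    """Return {configuration: n_observations} for every configuration of
    binary conditions in which the outcome is never observed.

    Hypothetical helper: a bare truth-table tally, not a full QCA.
    """
    counts = defaultdict(lambda: [0, 0])  # config -> [n_obs, n_outcome]
    for row in rows:
        # A configuration is the sorted tuple of (condition, value) pairs.
        config = tuple(sorted((k, v) for k, v in row.items() if k != outcome_key))
        counts[config][0] += 1
        counts[config][1] += row[outcome_key]
    return {c: n for c, (n, positives) in counts.items() if positives == 0}


def share_precluded(rows, outcome_key="outcome"):
    """Fraction of all observations that fall in a precluding configuration."""
    precluded = precluding_configurations(rows, outcome_key)
    return sum(precluded.values()) / len(rows)


# Toy data (invented): two conditions A and B, one rare outcome.
toy = [
    {"A": 1, "B": 0, "outcome": 0},
    {"A": 1, "B": 0, "outcome": 0},
    {"A": 0, "B": 0, "outcome": 0},
    {"A": 1, "B": 1, "outcome": 1},
    {"A": 1, "B": 1, "outcome": 0},
]

if __name__ == "__main__":
    print(precluding_configurations(toy))
    print(share_precluded(toy))
```

In this toy data the outcome only ever appears when both A and B are present, so the three observations lacking B fall into precluding configurations. With real data, an empty cell for the outcome may simply reflect a small sample, which is why an inferential check such as the bootstrap test mentioned above would still be needed.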

Keywords:

Algorithms – Experimental design – Sociolinguistics – Statistical data – Statistical distributions – Analysts



Article published in the journal: 2019, Issue 10