Robust policy evaluation from large-scale observational studies

Authors: Md Saiful Islam [1]; Md Sarowar Morshed [1]; Gary J. Young [2]; Md. Noor-E-Alam [1]
Author affiliations: [1] Mechanical and Industrial Engineering, Northeastern University, Boston, Massachusetts, United States of America; [2] Center for Health Policy and Healthcare Research, Northeastern University, Boston, Massachusetts, United States of America; [3] D’Amore-McKim School of Business, Northeastern University, Boston, Massachusetts, United States of America; [4] Bouvé College of Health Sciences, Northeastern University, Boston, Massachusetts, United States of America
Published in: PLoS ONE 14(10)
Category: Research Article
DOI: 10.1371/journal.pone.0223360


Under the current policy decision-making paradigm, we make or evaluate a policy decision by intervening on different socio-economic parameters and analyzing the impact of those interventions. This process involves identifying the causal relation between interventions and outcomes. Matching is one of the popular techniques for identifying such causal relations. However, in one-to-one matching, when a treatment or control unit has multiple pair assignment options with similar match quality, different matching algorithms often assign different pairs. Since all matching algorithms assign pairs without considering the outcomes, different experimenters working with the same data and the same hypothesis can reach different conclusions, which creates uncertainty in policy decision making. This problem becomes more prominent in large-scale observational studies, as there are more pair assignment options. Recently, a robust approach has been proposed to tackle this uncertainty; it uses an integer programming model to explore all possible assignments. Though the proposed integer programming model is very efficient in making robust causal inferences, it is not scalable to big-data observational studies. With the current approach, an observational study with 50,000 samples will generate hundreds of thousands of binary variables. Solving such an integer programming problem is computationally expensive, and the cost grows with the sample size. In this work, we consider causal inference testing with binary outcomes and propose computationally efficient algorithms that are adaptable to large-scale observational studies. By leveraging the structure of the optimization model, we propose a robustness condition that further reduces the computational burden.
We validate the efficiency of the proposed algorithms by testing the causal relation between the Medicare Hospital Readmissions Reduction Program (HRRP) and non-index readmissions (i.e., readmissions to a hospital other than the one that discharged the patient), using the State of California Patient Discharge Database for 2010 to 2014. Our results show that HRRP has a causal relation with the increase in non-index readmissions. The proposed algorithms proved to be highly scalable in testing causal relations from large-scale observational studies.
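
The matching ambiguity described in the abstract can be illustrated with a toy sketch. It is not the authors' integer programming model or their proposed algorithms: it simply enumerates every admissible one-to-one matching by brute force and reports the smallest and largest test statistic. The choice of an uncorrected McNemar z-statistic, the caliper rule on a single covariate, the data, and all function names are assumptions made for illustration.

```python
from itertools import permutations

def mcnemar_z(pairs):
    """Uncorrected McNemar z-statistic for (treated_outcome, control_outcome) pairs."""
    b = sum(1 for t, c in pairs if t == 1 and c == 0)  # treated has event, control does not
    c = sum(1 for t, c in pairs if t == 0 and c == 1)  # control has event, treated does not
    if b + c == 0:
        return 0.0  # no discordant pairs: no evidence either way
    return (b - c) / (b + c) ** 0.5

def statistic_range(treated, control, caliper):
    """Brute-force the (min, max) statistic over all admissible matchings.

    Each unit is a (covariate, binary_outcome) tuple. A matching pairs every
    treated unit with a distinct control whose covariate lies within `caliper`.
    Only feasible for tiny samples; this is an illustration, not a scalable method.
    """
    stats = []
    for perm in permutations(range(len(control)), len(treated)):
        admissible = all(
            abs(treated[i][0] - control[j][0]) <= caliper
            for i, j in enumerate(perm)
        )
        if admissible:
            pairs = [(treated[i][1], control[j][1]) for i, j in enumerate(perm)]
            stats.append(mcnemar_z(pairs))
    return min(stats), max(stats)

# Hypothetical toy data: every control is an equally acceptable match.
treated = [(0.50, 1), (0.51, 0)]
control = [(0.49, 0), (0.50, 1), (0.51, 0), (0.52, 1)]

lo, hi = statistic_range(treated, control, caliper=0.05)
print(lo, hi)
```

In this toy case the statistic changes sign depending on which equally good matching is chosen, which is the uncertainty the abstract refers to. The enumeration grows factorially with sample size, which is why exhaustive search is not an option at the scale the abstract discusses.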

Keywords:

Algorithms – California – Health care policy – Medicare – Observational studies – Optimization – Test statistics – Experimental economics


Article published in the journal issue: 2019, No. 10