Open Access

Observational vs. Experimental Data When Making Automated Decisions Using Machine Learning

Carlos Fernández-Loría (Corresponding Author)
https://orcid.org/0000-0003-4509-3768
School of Business and Management, Hong Kong University of Science and Technology, New Territories, Hong Kong

Foster Provost
Leonard N. Stern School of Business, New York University, New York, New York 10012

Published Online: 3 Jun 2025
https://doi.org/10.1287/ijds.2023.0012

INFORMS Journal on Data Science

Volume 4, Issue 3

July-September 2025

Pages iii-vi, 197-282, ii

Information

Received: June 19, 2023
Accepted: March 21, 2025
Published Online: June 3, 2025

Cite as

Carlos Fernández-Loría, Foster Provost (2025) Observational vs. Experimental Data When Making Automated Decisions Using Machine Learning. INFORMS Journal on Data Science 4(3):197-229.

https://doi.org/10.1287/ijds.2023.0012

Acknowledgments

The authors thank their research assistant, Yanfang Hou, for her valuable contributions to this paper. Her assistance was instrumental in proving Lemma D.2 in the appendix and in implementing some of the code for Sections 4 and 6. The authors also thank the Fubon Center and Ira Rennert for supporting research on data science at NYU/Stern.
