Causal Decision Making and Causal Effect Estimation Are Not the Same…and Why It Matters

Carlos Fernández-Loría
[email protected]
https://orcid.org/0000-0003-4509-3768
Department of Information Systems, Business Statistics, and Operations Management, HKUST Business School, Hong Kong University of Science and Technology, New Territories, Hong Kong
Foster Provost
[email protected]
Department of Technology, Operations, and Statistics, NYU Stern School of Business, New York University, New York, New York 10012; Compass Inc., New York, New York 10011

Published Online: 10 Mar 2022
https://doi.org/10.1287/ijds.2021.0006

INFORMS Journal on Data Science

Volume 1, Issue 1

April-June 2022

Pages 1-113, C2

Received: April 05, 2021
Accepted: September 26, 2021
Published Online: March 10, 2022

Cite as

Carlos Fernández-Loría, Foster Provost (2022) Causal Decision Making and Causal Effect Estimation Are Not the Same…and Why It Matters. INFORMS Journal on Data Science 1(1):4-16.

https://doi.org/10.1287/ijds.2021.0006
