Causal Decision Making and Causal Effect Estimation Are Not the Same…and Why It Matters

Published Online: https://doi.org/10.1287/ijds.2021.0006
