Calibration of Heterogeneous Treatment Effects in Randomized Experiments

Yan Leng (Corresponding Author)
https://orcid.org/0000-0002-7084-2700
McCombs School of Business, The University of Texas at Austin, Austin, Texas 78705

Drew Dimmery
https://orcid.org/0000-0001-9602-6325
Research Network Data Science, University of Vienna, 1090 Vienna, Austria

Published Online: 12 Jan 2024
https://doi.org/10.1287/isre.2021.0343

Information Systems Research

Volume 35, Issue 4

December 2024

Pages iii-x, 1507-2085, C2

Article Information

Received: June 28, 2021
Accepted: October 13, 2023
Published Online: January 12, 2024

Cite as

Yan Leng, Drew Dimmery (2024) Calibration of Heterogeneous Treatment Effects in Randomized Experiments. Information Systems Research 35(4):1721-1742.

https://doi.org/10.1287/isre.2021.0343

Acknowledgments

The authors thank the senior editor, the associate editor, and the anonymous reviewers for their insightful comments.
