Nonstationary A/B Tests: Optimal Variance Reduction, Bias Correction, and Valid Inference
Published online: 18 Sep 2024. https://doi.org/10.1287/mnsc.2022.01205

