Minimax Optimal Estimation of Stability Under Distribution Shift