Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach

John C. Duchi
Department of Electrical Engineering and Statistics, Stanford University, Stanford, California 94305

Peter W. Glynn
Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Hongseok Namkoong (Corresponding Author)
https://orcid.org/0000-0002-5708-4044
Decision, Risk, and Operations Division, Columbia Business School, New York City, NY 10027

Published Online: 27 Jan 2021
https://doi.org/10.1287/moor.2020.1085

Mathematics of Operations Research

Volume 46, Issue 3

August 2021

Pages 835-1234, C2

Information

Received: July 13, 2018
Accepted: July 29, 2020
Published Online: January 27, 2021

Cite as

John C. Duchi, Peter W. Glynn, Hongseok Namkoong (2021) Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach. Mathematics of Operations Research 46(3):946-969.

https://doi.org/10.1287/moor.2020.1085
