Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach

Published Online: https://doi.org/10.1287/moor.2020.1085

References

  • [1] Ali SM, Silvey SD (1966) A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B 28:131–142.
  • [2] Arcones MA, Yu B (1994) Central limit theorems for empirical and U-processes of stationary mixing sequences. J. Theoret. Probab. 7(1):47–71.
  • [3] Artzner P, Delbaen F, Eber J-M, Heath D (1999) Coherent measures of risk. Math. Finance 9(3):203–228.
  • [4] Baggerly KA (1998) Empirical likelihood as a goodness-of-fit measure. Biometrika 85(3):535–547.
  • [5] Bartlett PL, Bousquet O, Mendelson S (2005) Local Rademacher complexities. Ann. Statist. 33(4):1497–1537.
  • [6] Bean D, Bickel P, El Karoui N, Yu B (2013) Optimal M-estimation in high-dimensional regression. Proc. Natl. Acad. Sci. USA 110(36):14563–14568.
  • [7] Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust Optimization (Princeton University Press, Princeton, NJ).
  • [8] Ben-Tal A, Hazan E, Koren T, Mannor S (2015) Oracle-based robust optimization via online learning. Oper. Res. 63(3):628–638.
  • [9] Ben-Tal A, den Hertog D, De Waegenaere A, Melenberg B, Rennen G (2013) Robust solutions of optimization problems affected by uncertain probabilities. Management Sci. 59(2):341–357.
  • [10] Bertail P (2006) Empirical likelihood in some semiparametric models. Bernoulli 12(2):299–331.
  • [11] Bertail P, Gautherat E, Harari-Kermadec H (2014) Empirical φ∗-divergence minimizers for Hadamard differentiable functionals. Akritas MG, Lahiri SN, Politis DN, eds. Topics in Nonparametric Statistics (Springer, New York), 21–32.
  • [12] Bertsekas DP (1973) Stochastic optimization problems with nondifferentiable cost functionals. J. Optim. Theory Appl. 12(2):218–231.
  • [13] Bertsimas D, Gupta V, Kallus N (2014) Robust sample average approximation. Preprint, submitted August 19, https://arxiv.org/abs/1408.4445.
  • [14] Bertsimas D, Gupta V, Kallus N (2018) Data-driven robust optimization. Math. Programming 167(2):235–292.
  • [15] Billingsley P (1986) Probability and Measure, 2nd ed. (Wiley, New York).
  • [16] Black F (1976) Studies of stock price volatility changes. Proc. 1976 Meetings Amer. Statist. Assoc. (American Statistical Association, Washington, DC), 177–181.
  • [17] Blanchet J, Murthy K (2019) Quantifying distributional model risk via optimal transport. Math. Oper. Res. 44(2):565–600.
  • [18] Blanchet J, Kang Y, Murthy K (2019) Robust Wasserstein profile inference and applications to machine learning. J. Appl. Probab. 56(3):830–857.
  • [19] Bradley RC (2005) Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surveys 2:107–144.
  • [20] Bravo F (2003) Second-order power comparisons for a class of nonparametric likelihood-based tests. Biometrika 90(4):881–890.
  • [21] Bravo F (2006) Bartlett-type adjustments for empirical discrepancy test statistics. J. Statist. Planning Inference 136(3):537–554.
  • [22] Bubeck S, Eldan R, Lehec J (2015) Finite-time analysis of projected Langevin Monte Carlo. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 28 (Neural Information Processing Systems Foundation, San Diego), 1243–1251.
  • [23] Candès E, Sur P (2020) The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression. Ann. Statist. 48(1):27–42.
  • [24] Chen SX, Peng L, Qin YL (2009) Effects of data dimension on empirical likelihood. Biometrika 96(3):711–722.
  • [25] Chen X, Lee JD, Tong XT, Zhang Y (2020) Statistical inference for model parameters in stochastic gradient descent. Ann. Statist. 48(1):251–273.
  • [26] Christie AA (1982) The stochastic behavior of common stock variances: Value, leverage and interest rate effects. J. Financial Econom. 10(4):407–432.
  • [27] Clarkson K, Hazan E, Woodruff D (2012) Sublinear optimization for machine learning. J. ACM 59(5):23.
  • [28] Corcoran SA (1998) Bartlett adjustment of empirical discrepancy statistics. Biometrika 85(4):967–972.
  • [29] Cressie N, Read TR (1984) Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. B 46(3):440–464.
  • [30] Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observation. Studia Sci. Math. Hungar. 2:299–318.
  • [31] Danskin JM (1967) The Theory of Max-Min and Its Application to Weapons Allocation Problems (Springer, Berlin).
  • [32] Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Neural Information Processing Systems Foundation, San Diego), 1646–1654.
  • [33] Delage E, Ye Y (2010) Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3):595–612.
  • [34] DiCiccio T, Hall P, Romano J (1988) Bartlett adjustment for empirical likelihood. Technical Report 298. Department of Statistics, Stanford University, Stanford, CA.
  • [35] DiCiccio T, Hall P, Romano J (1991) Empirical likelihood is Bartlett-correctable. Ann. Statist. 19(2):1053–1061.
  • [36] Donoho D, Montanari A (2016) High dimensional robust M-estimation: Asymptotic variance via approximate message passing. Probab. Theory Related Fields 166(3–4):935–969.
  • [37] Doukhan P (1994) Mixing, Properties and Examples (Springer, New York).
  • [38] Doukhan P, Massart P, Rio E (1995) Invariance principles for absolutely regular empirical processes. Annales de l’IHP probabilités et statistiques 31(2):393–427.
  • [39] Duchi JC, Namkoong H (2016) Variance-based regularization with convex objectives. Preprint, submitted October 8, https://arxiv.org/abs/1610.02581.
  • [40] Duchi JC, Namkoong H (2019) Variance-based regularization with convex objectives. J. Machine Learn. Res. 20(68):1–55.
  • [41] Duchi JC, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J. Machine Learn. Res. 12(61):2121–2159.
  • [42] Dupačová J, Wets R (1988) Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Statist. 16(4):1517–1549.
  • [43] Esfahani PM, Kuhn D (2018) Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Programming 171(1–2):115–166.
  • [44] Ethier SN, Kurtz TG (2009) Markov Processes: Characterization and Convergence (Wiley, New York).
  • [45] Fournier N, Guillin A (2015) On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Related Fields 162(3–4):707–738.
  • [46] Glynn PW, Zeevi A (2008) Bounding stationary expectations of Markov processes. Ethier SN, Feng J, Stockbridge RH, eds. Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz (Institute of Mathematical Statistics, Beachwood, OH), 195–214.
  • [47] Gupta V (2019) Near-optimal Bayesian ambiguity sets for distributionally robust optimization. Management Sci. 65(9):4242–4260.
  • [48] Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.
  • [49] Hiriart-Urruty J, Lemaréchal C (1993) Convex Analysis and Minimization Algorithms I (Springer, New York).
  • [50] Hiriart-Urruty J, Lemaréchal C (1993) Convex Analysis and Minimization Algorithms II (Springer, New York).
  • [51] Hjort NL, McKeague IW, Van Keilegom I (2009) Extending the scope of empirical likelihood. Ann. Statist. 37(3):1079–1111.
  • [52] Ibragimov IA (1962) Some limit theorems for stationary processes. Theory Probab. Appl. 7(4):349–382.
  • [53] Imbens G (2002) Generalized method of moments and empirical likelihood. J. Bus. Econom. Statist. 20(4):493–506.
  • [54] Jiang R, Guan Y (2016) Data-driven chance constrained stochastic program. Math. Programming 158(1–2):291–327.
  • [55] Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 26 (Neural Information Processing Systems Foundation, San Diego), 315–323.
  • [56] King AJ (1989) Generalized delta theorems for multivalued mappings and measurable selections. Math. Oper. Res. 14(4):720–736.
  • [57] King AJ, Rockafellar RT (1993) Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18(1):148–162.
  • [58] King AJ, Wets RJ (1991) Epi-consistency of convex stochastic programs. Stochastics Stochastic Rep. 34(1–2):83–92.
  • [59] Kosorok MR (2008) Introduction to empirical processes. Introduction to Empirical Processes and Semiparametric Inference (Springer, New York), 77–79.
  • [60] Krokhmal PA (2007) Higher moment coherent risk measures. Quant. Finance 7(4):373–387.
  • [61] Lam H (2016) Robust sensitivity analysis for stochastic systems. Math. Oper. Res. 41(4):1248–1275.
  • [62] Lam H (2018) Sensitivity to serial dependency of input processes: A robust approach. Management Sci. 64(3):1311–1327.
  • [63] Lam H, Zhou E (2017) The empirical likelihood approach to quantifying uncertainty in sample average approximation. Oper. Res. Lett. 45(4):301–307.
  • [64] Lan G, Nemirovski A, Shapiro A (2012) Validation analysis of robust stochastic approximation method. Math. Programming 134(2):425–458.
  • [65] Lehmann EL, Romano JP (2005) Testing Statistical Hypotheses, 3rd ed. (Springer, New York).
  • [66] Li T, Liu L, Kyrillidis A, Caramanis C (2018) Statistical inference using SGD. Thirty-Second AAAI Conf. Artificial Intelligence (Association for the Advancement of Artificial Intelligence, Menlo Park, CA), 3571–3578.
  • [67] Mak W-K, Morton DP, Wood RK (1999) Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24(1):47–56.
  • [68] Mandt S, Hoffman M, Blei D (2017) Stochastic gradient descent as approximate Bayesian inference. J. Machine Learn. Res. 18(134):1–35.
  • [69] Markowitz H (1952) Portfolio selection. J. Finance 7(1):77–91.
  • [70] Meyn S, Tweedie RL (2009) Markov Chains and Stochastic Stability, 2nd ed. (Cambridge University Press, New York).
  • [71] Mokkadem A (1990) Propriétés de mélange des processus autorégressifs polynomiaux. Ann. Inst. Henri Poincaré Probab. Statist. 26(2):219–260.
  • [72] Namkoong H, Duchi JC (2016) Stochastic gradient methods for distributionally robust optimization with f-divergences. Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 29 (Neural Information Processing Systems Foundation, San Diego), 2208–2216.
  • [73] Namkoong H, Duchi JC (2017) Variance-based regularization with convex objectives. Guyon I, von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Neural Information Processing Systems Foundation, San Diego), 2971–2980.
  • [74] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.
  • [75] Newey W, Smith R (2004) Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 72(1):219–255.
  • [76] Nobel A, Dembo A (1993) A note on uniform laws of averages for dependent processes. Statist. Probab. Lett. 17(3):169–172.
  • [77] Nummelin E, Tweedie RL (1978) Geometric ergodicity and r-positivity for general Markov chains. Ann. Probab. 6(3):404–420.
  • [78] Owen A (1990) Empirical likelihood ratio confidence regions. Ann. Statist. 18(1):90–120.
  • [79] Owen AB (1988) Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75(2):237–249.
  • [80] Owen AB (2001) Empirical Likelihood (CRC Press, Boca Raton, FL).
  • [81] Pflug G, Wozabal D (2007) Ambiguity in portfolio selection. Quant. Finance 7(4):435–442.
  • [82] Polyak BT, Juditsky AB (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.
  • [83] Rio E (2017) Asymptotic Theory of Weakly Dependent Random Processes (Springer, New York).
  • [84] Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J. Risk 2(3):21–42.
  • [85] Rockafellar RT, Wets RJB (1998) Variational Analysis (Springer, New York).
  • [86] Römisch W (2005) Delta method, infinite dimensional. Kotz S, Read CB, Balakrishnan N, Vidakovic B, eds. Encyclopedia of Statistical Sciences (Wiley, Hoboken, NJ).
  • [87] Scarsini M (1999) Multivariate convex orderings, dependence, and stochastic equality. J. Appl. Probab. 35(1):93–103.
  • [88] Shafieezadeh-Abadeh S, Esfahani PM, Kuhn D (2015) Distributionally robust logistic regression. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 28 (Neural Information Processing Systems Foundation, San Diego), 1576–1584.
  • [89] Shalev-Shwartz S, Wexler Y (2016) Minimizing the maximal loss: How and why? Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 793–801.
  • [90] Shapiro A (1989) Asymptotic properties of statistical estimators in stochastic programming. Ann. Statist. 17(2):841–858.
  • [91] Shapiro A (1990) On differential stability in stochastic programming. Math. Programming 47(1–3):107–116.
  • [92] Shapiro A (1991) Asymptotic analysis of stochastic programs. Ann. Oper. Res. 30(1):169–186.
  • [93] Shapiro A (1993) Asymptotic behavior of optimal solutions in stochastic programming. Math. Oper. Res. 18(4):829–845.
  • [94] Shapiro A, Dentcheva D, Ruszczyński A (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM and Mathematical Programming Society, Philadelphia).
  • [95] Sinha A, Namkoong H, Volpi R, Duchi JC (2017) Certifiable distributional robustness with principled adversarial training. Preprint, submitted October 29, https://arxiv.org/abs/1710.10571.
  • [96] Udell M, Mohan K, Zeng D, Hong J, Diamond S, Boyd S (2014) Convex optimization in Julia. First Workshop High Performance Tech. Comput. Dynam. Languages (IEEE, New York), 18–28.
  • [97] van der Vaart AW (1998) Asymptotic Statistics (Cambridge University Press, New York).
  • [98] van der Vaart AW, Wellner JA (1996) Weak Convergence and Empirical Processes with Applications to Statistics (Springer, New York).
  • [99] Wang Z, Glynn P, Ye Y (2016) Likelihood robust optimization for data-driven problems. Comput. Management Sci. 13:241–261.
  • [100] Wozabal D (2012) A framework for optimization under ambiguity. Ann. Oper. Res. 193(1):21–47.
  • [101] Xu H, Caramanis C, Mannor S (2009) Robustness and regularization of support vector machines. J. Machine Learn. Res. 10:1485–1510.
  • [102] Yu B (1994) Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab. 22(1):94–116.