Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach
Published Online: 27 Jan 2021. https://doi.org/10.1287/moor.2020.1085

