Generalization Bounds in the Predict-Then-Optimize Framework

Published Online:https://doi.org/10.1287/moor.2022.1330

References

  • [1] Amos B, Kolter JZ (2017) Optnet: Differentiable optimization as a layer in neural networks. Internat. Conf. Machine Learn. (PMLR), 136–145.Google Scholar
  • [2] Ban G-Y, Rudin C (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.LinkGoogle Scholar
  • [3] Bartlett PL, Mendelson S (2002) Rademacher and Gaussian complexities: Risk bounds and structural results. J. Machine Learn. Res. 3:463–482.Google Scholar
  • [4] Berthet Q, Blondel M, Teboul O, Cuturi M, Vert J-P, Bach F (2020) Learning with differentiable perturbed optimizers. Adv. Neural Inform. Processing Systems 33:9508–9519.Google Scholar
  • [5] Bertsimas D, Kallus N (2020) From predictive to prescriptive analytics. Management Sci. 66(3):1025–1044.LinkGoogle Scholar
  • [6] Chu H, Zhang W, Bai P, Chen Y (2021) Data-driven optimization for last-mile delivery. Complex Intelligent Systems 1–14.Google Scholar
  • [7] Daniely A, Shalev-Shwartz S (2014) Optimal learners for multiclass problems. Conf. Learn. Theory 287–316.Google Scholar
  • [8] Daniely A, Sabato S, Ben-David S, Shalev-Shwartz S (2015) Multiclass learnability and the ERM principle. J. Machine Learn. Res. 16(1):2377–2404.Google Scholar
  • [9] Demirovic E, Stuckey P, Bailey J, Chan J, Leckie C, Ramamohanarao K, Guns T (2019) Predict+optimise with ranking objectives: Exhaustively learning linear functions. IJCAI-19, 1078–1085.Google Scholar
  • [10] Demirovic E, Stuckey P, Bailey J, Chan J, Leckie C, Ramamohanarao K, Guns T (2020) Dynamic programming for predict+optimise. Proc. 34th AAAI Conf. Artificial Intelligence, 1444–1451.Google Scholar
  • [11] Donti P, Amos B, Kolter JZ (2017) Task-based end-to-end model learning in stochastic optimization. Adv. Neural Inform. Processing Systems 31:5484–5494.Google Scholar
  • [12] El Balghiti O, Elmachtoub AN, Grigas P, Tewari A (2019) Generalization bounds in the predict-then-optimize framework. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems, vol. 32, 14412–14421.Google Scholar
  • [13] Elmachtoub A, Grigas P (2022) Smart “predict, then optimize.” Management Sci. 68(1):9–26.LinkGoogle Scholar
  • [14] Elmachtoub A, Jason CNL, McNellis R (2020) Decision trees for decision-making under the predict-then-optimize framework. Internat. Conf. Machine Learn. (PMLR), 2858–2867.Google Scholar
  • [15] Ferber A, Wilder B, Dilkina B, Tambe M (2020) MIPaaL: Mixed integer program as a layer. Proc. Conf. AAAI Artificial Intelligence, vol. 34, 1504–1511.Google Scholar
  • [16] Foster DJ, Rakhlin A (2019) ℓ∞ vector contraction for Rademacher complexity. Preprint, submitted November 15, https://arxiv.org/abs/1911.06468.Google Scholar
  • [17] Garber D, Hazan E (2015) Faster rates for the Frank-Wolfe method over strongly-convex sets. Internat. Conf. Machine Learn. (PMLR), 541–549.Google Scholar
  • [18] Guermeur Y (2007) VC theory of large margin multi-category classifiers. J. Machine Learn. Res. 8:2551–2594.Google Scholar
  • [19] Ho CP, Hanasusanto GA (2019) On data-driven prescriptive analytics with side information: A regularized Nadaraya–Watson approach. Accessed June 1, 2022, http://www.optimization-online.org/DBFILE/2019/01/7043.pdf.Google Scholar
  • [20] Ho-Nguyen N, Kılınç-Karzan F (2022) Risk guarantees for end-to-end prediction and optimization processes. Management Sci. 68(12):8680–8698.Google Scholar
  • [21] Hu Y, Kallus N, Mao X (2022) Fast rates for contextual linear optimization. Management Sci. 68(6):4236–4245.LinkGoogle Scholar
  • [22] Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J. Machine Learn. Res. 11:517–553.Google Scholar
  • [23] Kakade SM, Shalev-Shwartz S, Tewari A (2012) Regularization techniques for learning with matrices. J. Machine Learn. Res. 13:1865–1890.Google Scholar
  • [24] Kakade SM, Sridharan K, Tewari A (2008) On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. Adv. Neural Inform. Processing Systems 21:793–800.Google Scholar
  • [25] Kallus N, Mao X (2022) Stochastic optimization forests. Management Sci. ePub ahead of print June 28, https://doi.org/10.1287/mnsc.2022.4458.Google Scholar
  • [26] Kao Y, Roy BV, Yan X (2009) Directed regression. Adv. Neural Inform. Processing Systems 22:889–897.Google Scholar
  • [27] Koltchinskii V, Panchenko D (2002) Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30(1):1–50.CrossrefGoogle Scholar
  • [28] Koltchinskii V, Panchenko D, Lozano F (2001) Some new bounds on the generalization error of combined classifiers. Adv. Neural Inform. Processing Systems 13:245–251.Google Scholar
  • [29] Kotary J, Fioretto F, Van Hentenryck P, Wilder B (2021) End-to-end constrained optimization learning: A survey. 30th International Joint Conference on Artificial Intelligence, IJCAI, 4475–4482.Google Scholar
  • [30] Lei Y, Dogan U, Binder A, Kloft M (2015) Multi-class SVMs: From tighter data-dependent generalization bounds to novel algorithms. Adv. Neural Inform. Processing Systems 28:2035–2043.Google Scholar
  • [31] Li J, Liu Y, Yin R, Zhang H, Ding L, Wang W (2018) Multi-class learning: From theory to algorithm. Adv. Neural Inform. Processing Systems 31:1586–1595.Google Scholar
  • [32] Loke GG, Tang Q, Xiao Y (2022) Decision-driven regularization: A blended model for predict-then-optimize. Preprint, submitted June 17, 2020, https://dx.doi.org/10.2139/ssrn.3623006.Google Scholar
  • [33] Mandi J, Guns T (2020) Interior point solving for LP-based prediction+optimisation. Adv. Neural Inform. Processing Systems 33:7272–7282.Google Scholar
  • [34] Mandi J, Demirovic E, Stuckey PJ, Guns T (2020) Smart predict-and-optimize for hard combinatorial optimization problems. Proc. AAAI Conf. Artificial Intelligence, vol. 34, 1603–1610.Google Scholar
  • [35] Maurer A (2016) A vector-contraction inequality for Rademacher complexities. Internat. Conf. Algorithmic Learn. Theory (Springer, Berlin), 3–17.Google Scholar
  • [36] Mohri M, Rostamizadeh A, Talwalkar A (2018) Foundations of Machine Learning, 2nd ed. (MIT Press, Cambridge, MA).Google Scholar
  • [37] Natarajan BK (1989) On learning sets and functions. Machine Learn. 4(1):67–97.CrossrefGoogle Scholar
  • [38] Pogančić MV, Paulus A, Musil V, Martius G, Rolinek M (2019) Differentiation of blackbox combinatorial solvers. Internat. Conf. Learn. Representations.Google Scholar
  • [39] Rockafellar RT, Wets RJB (2009) Variational Analysis, vol. 317 (Springer Science & Business Media, Berlin).Google Scholar
  • [40] Shalev-Shwartz S, Ben-David S (2014) Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge).CrossrefGoogle Scholar
  • [41] Tibshirani R, Wainwright M, Hastie T (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (Chapman and Hall/CRC, Boca Raton).Google Scholar
  • [42] Vial J-P (1983) Strong and weak convexity of sets and functions. Math. Oper. Res. 8(2):231–259.LinkGoogle Scholar
  • [43] Wang K, Wilder B, Perrault A, Tambe M (2020) Automatically learning compact quality-aware surrogates for optimization problems. Adv. Neural Inform. Processing Systems 33:9586–9596.Google Scholar
  • [44] Wilder B, Dilkina B, Tambe M (2019) Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization. Proc. Conf. AAAI Artificial Intelligence 33:1658–1665.Google Scholar
  • [45] Wilder B, Ewing E, Dilkina B, Tambe M (2019) End to end learning and optimization on graphs. Adv. Neural Inform. Processing Systems 33:4674–4685.Google Scholar
  • [46] Yan R, Wang S, Fagerholt K (2020) A semi-“smart predict then optimize”(semi-SPO) method for efficient ship inspection. Transportation Res. Part B Methodological 142:100–125.CrossrefGoogle Scholar
  • [47] Zatarain-Vera O (2019) A vector-contraction inequality for Rademacher complexities using p-stable variables. Preprint, submitted December 20, https://arxiv.org/abs/1912.10136.Google Scholar
  • [48] Ziegler GM (2012) Lectures on Polytopes, vol. 152 (Springer Science & Business Media, Berlin).Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.