Generalization Bounds in the Predict-Then-Optimize Framework

Othman El Balghiti
Othman El Balghiti
[email protected]
Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027;
Search for more papers by this author
,
Adam N. Elmachtoub
Corresponding Author
Adam N. Elmachtoub
[email protected]
https://orcid.org/0000-0003-0729-4999
Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027;
Search for more papers by this author
,
Paul Grigas
Paul Grigas
[email protected]
https://orcid.org/0000-0002-5617-1058
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720;
Search for more papers by this author
,
Ambuj Tewari
Ambuj Tewari
[email protected]
https://orcid.org/0000-0001-6969-7844
Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109
Search for more papers by this author

Othman El Balghiti

[email protected]

Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027;

Search for more papers by this author

Adam N. Elmachtoub

Corresponding Author

Adam N. Elmachtoub

[email protected]

https://orcid.org/0000-0003-0729-4999

Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027;

Search for more papers by this author

Paul Grigas

[email protected]

https://orcid.org/0000-0002-5617-1058

Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720;

Search for more papers by this author

Ambuj Tewari

[email protected]

https://orcid.org/0000-0001-6969-7844

Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109

Search for more papers by this author

Published Online:16 Nov 2022https://doi.org/10.1287/moor.2022.1330

References

[1] Amos B, Kolter JZ (2017) Optnet: Differentiable optimization as a layer in neural networks. Internat. Conf. Machine Learn. (PMLR), 136–145.Google Scholar
[2] Ban G-Y, Rudin C (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.Link, Google Scholar
[3] Bartlett PL, Mendelson S (2002) Rademacher and Gaussian complexities: Risk bounds and structural results. J. Machine Learn. Res. 3:463–482.Google Scholar
[4] Berthet Q, Blondel M, Teboul O, Cuturi M, Vert J-P, Bach F (2020) Learning with differentiable perturbed optimizers. Adv. Neural Inform. Processing Systems 33:9508–9519.Google Scholar
[5] Bertsimas D, Kallus N (2020) From predictive to prescriptive analytics. Management Sci. 66(3):1025–1044.Link, Google Scholar
[6] Chu H, Zhang W, Bai P, Chen Y (2021) Data-driven optimization for last-mile delivery. Complex Intelligent Systems 1–14.Google Scholar
[7] Daniely A, Shalev-Shwartz S (2014) Optimal learners for multiclass problems. Conf. Learn. Theory 287–316.Google Scholar
[8] Daniely A, Sabato S, Ben-David S, Shalev-Shwartz S (2015) Multiclass learnability and the ERM principle. J. Machine Learn. Res. 16(1):2377–2404.Google Scholar
[9] Demirovic E, Stuckey P, Bailey J, Chan J, Leckie C, Ramamohanarao K, Guns T (2019) Predict+optimise with ranking objectives: Exhaustively learning linear functions. IJCAI-19, 1078–1085.Google Scholar
[10] Demirovic E, Stuckey P, Bailey J, Chan J, Leckie C, Ramamohanarao K, Guns T (2020) Dynamic programming for predict+optimise. Proc. 34th AAAI Conf. Artificial Intelligence, 1444–1451.Google Scholar
[11] Donti P, Amos B, Kolter JZ (2017) Task-based end-to-end model learning in stochastic optimization. Adv. Neural Inform. Processing Systems 31:5484–5494.Google Scholar
[12] El Balghiti O, Elmachtoub AN, Grigas P, Tewari A (2019) Generalization bounds in the predict-then-optimize framework. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems, vol. 32, 14412–14421.Google Scholar
[13] Elmachtoub A, Grigas P (2022) Smart “predict, then optimize.” Management Sci. 68(1):9–26.Link, Google Scholar
[14] Elmachtoub A, Jason CNL, McNellis R (2020) Decision trees for decision-making under the predict-then-optimize framework. Internat. Conf. Machine Learn. (PMLR), 2858–2867.Google Scholar
[15] Ferber A, Wilder B, Dilkina B, Tambe M (2020) MIPaaL: Mixed integer program as a layer. Proc. Conf. AAAI Artificial Intelligence, vol. 34, 1504–1511.Google Scholar
[16] Foster DJ, Rakhlin A (2019) ℓ∞ vector contraction for Rademacher complexity. Preprint, submitted November 15, https://arxiv.org/abs/1911.06468.Google Scholar
[17] Garber D, Hazan E (2015) Faster rates for the Frank-Wolfe method over strongly-convex sets. Internat. Conf. Machine Learn. (PMLR), 541–549.Google Scholar
[18] Guermeur Y (2007) VC theory of large margin multi-category classifiers. J. Machine Learn. Res. 8:2551–2594.Google Scholar
[19] Ho CP, Hanasusanto GA (2019) On data-driven prescriptive analytics with side information: A regularized Nadaraya–Watson approach. Accessed June 1, 2022, http://www.optimization-online.org/DBFILE/2019/01/7043.pdf.Google Scholar
[20] Ho-Nguyen N, Kılınç-Karzan F (2022) Risk guarantees for end-to-end prediction and optimization processes. Management Sci. 68(12):8680–8698.Google Scholar
[21] Hu Y, Kallus N, Mao X (2022) Fast rates for contextual linear optimization. Management Sci. 68(6):4236–4245.Link, Google Scholar
[22] Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J. Machine Learn. Res. 11:517–553.Google Scholar
[23] Kakade SM, Shalev-Shwartz S, Tewari A (2012) Regularization techniques for learning with matrices. J. Machine Learn. Res. 13:1865–1890.Google Scholar
[24] Kakade SM, Sridharan K, Tewari A (2008) On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. Adv. Neural Inform. Processing Systems 21:793–800.Google Scholar
[25] Kallus N, Mao X (2022) Stochastic optimization forests. Management Sci. ePub ahead of print June 28, https://doi.org/10.1287/mnsc.2022.4458.Google Scholar
[26] Kao Y, Roy BV, Yan X (2009) Directed regression. Adv. Neural Inform. Processing Systems 22:889–897.Google Scholar
[27] Koltchinskii V, Panchenko D (2002) Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30(1):1–50.Crossref, Google Scholar
[28] Koltchinskii V, Panchenko D, Lozano F (2001) Some new bounds on the generalization error of combined classifiers. Adv. Neural Inform. Processing Systems 13:245–251.Google Scholar
[29] Kotary J, Fioretto F, Van Hentenryck P, Wilder B (2021) End-to-end constrained optimization learning: A survey. 30th International Joint Conference on Artificial Intelligence, IJCAI, 4475–4482.Google Scholar
[30] Lei Y, Dogan U, Binder A, Kloft M (2015) Multi-class SVMs: From tighter data-dependent generalization bounds to novel algorithms. Adv. Neural Inform. Processing Systems 28:2035–2043.Google Scholar
[31] Li J, Liu Y, Yin R, Zhang H, Ding L, Wang W (2018) Multi-class learning: From theory to algorithm. Adv. Neural Inform. Processing Systems 31:1586–1595.Google Scholar
[32] Loke GG, Tang Q, Xiao Y (2022) Decision-driven regularization: A blended model for predict-then-optimize. Preprint, submitted June 17, 2020, https://dx.doi.org/10.2139/ssrn.3623006.Google Scholar
[33] Mandi J, Guns T (2020) Interior point solving for LP-based prediction+optimisation. Adv. Neural Inform. Processing Systems 33:7272–7282.Google Scholar
[34] Mandi J, Demirovic E, Stuckey PJ, Guns T (2020) Smart predict-and-optimize for hard combinatorial optimization problems. Proc. AAAI Conf. Artificial Intelligence, vol. 34, 1603–1610.Google Scholar
[35] Maurer A (2016) A vector-contraction inequality for Rademacher complexities. Internat. Conf. Algorithmic Learn. Theory (Springer, Berlin), 3–17.Google Scholar
[36] Mohri M, Rostamizadeh A, Talwalkar A (2018) Foundations of Machine Learning, 2nd ed. (MIT Press, Cambridge, MA).Google Scholar
[37] Natarajan BK (1989) On learning sets and functions. Machine Learn. 4(1):67–97.Crossref, Google Scholar
[38] Pogančić MV, Paulus A, Musil V, Martius G, Rolinek M (2019) Differentiation of blackbox combinatorial solvers. Internat. Conf. Learn. Representations.Google Scholar
[39] Rockafellar RT, Wets RJB (2009) Variational Analysis, vol. 317 (Springer Science & Business Media, Berlin).Google Scholar
[40] Shalev-Shwartz S, Ben-David S (2014) Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge).Crossref, Google Scholar
[41] Tibshirani R, Wainwright M, Hastie T (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (Chapman and Hall/CRC, Boca Raton).Google Scholar
[42] Vial J-P (1983) Strong and weak convexity of sets and functions. Math. Oper. Res. 8(2):231–259.Link, Google Scholar
[43] Wang K, Wilder B, Perrault A, Tambe M (2020) Automatically learning compact quality-aware surrogates for optimization problems. Adv. Neural Inform. Processing Systems 33:9586–9596.Google Scholar
[44] Wilder B, Dilkina B, Tambe M (2019) Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization. Proc. Conf. AAAI Artificial Intelligence 33:1658–1665.Google Scholar
[45] Wilder B, Ewing E, Dilkina B, Tambe M (2019) End to end learning and optimization on graphs. Adv. Neural Inform. Processing Systems 33:4674–4685.Google Scholar
[46] Yan R, Wang S, Fagerholt K (2020) A semi-“smart predict then optimize”(semi-SPO) method for efficient ship inspection. Transportation Res. Part B Methodological 142:100–125.Crossref, Google Scholar
[47] Zatarain-Vera O (2019) A vector-contraction inequality for Rademacher complexities using p-stable variables. Preprint, submitted December 20, https://arxiv.org/abs/1912.10136.Google Scholar
[48] Ziegler GM (2012) Lectures on Polytopes, vol. 152 (Springer Science & Business Media, Berlin).Google Scholar

cover image Mathematics of Operations Research

Volume 48, Issue 4

November 2023

Pages 1811-2382, C2

Article Information

Metrics

Information

Received:July 23, 2021
Accepted:October 11, 2022
Published Online:November 16, 2022

Cite as

Othman El Balghiti, Adam N. Elmachtoub, Paul Grigas, Ambuj Tewari (2022) Generalization Bounds in the Predict-Then-Optimize Framework. Mathematics of Operations Research 48(4):2043-2065.

https://doi.org/10.1287/moor.2022.1330

Keywords

Acknowledgments

The authors thank the review team for their excellent feedback, which helped improve the paper.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Generalization Bounds in the Predict-Then-Optimize Framework

References

Volume 48, Issue 4

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News