Solving Optimization Problems with Blackwell Approachability

Julien Grand-Clément
Corresponding Author
Julien Grand-Clément
[email protected]
https://orcid.org/0000-0002-2864-8779
Department of Information Systems and Operations Management, École des Hautes Études Commerciales de Paris (HEC Paris), 78350 Jouy-en-Josas, France;
Search for more papers by this author
,
Christian Kroer
Christian Kroer
[email protected]
https://orcid.org/0000-0002-9009-8683
Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027
Search for more papers by this author

Corresponding Author

Julien Grand-Clément

Department of Information Systems and Operations Management, École des Hautes Études Commerciales de Paris (HEC Paris), 78350 Jouy-en-Josas, France;

Search for more papers by this author

Christian Kroer

[email protected]

https://orcid.org/0000-0002-9009-8683

Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027

Search for more papers by this author

Published Online:16 May 2023https://doi.org/10.1287/moor.2023.1376

References

[1] Abernethy J, Bartlett PL, Hazan E (2011) Blackwell approachability and no-regret learning are equivalent. Proc. 24th Annual Conf. Learn. Theory, 27–46.Google Scholar
[2] Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Servedio RA, Zhang T, eds. Proc. Conf. Learn. Theory (COLT) (Omnipress), 263–274.Google Scholar
[3] Alagoz O, Hsu H, Schaefer AJ, Roberts MS (2010) Markov decision processes: A tool for sequential decision making under uncertainty. Medical Decision Making 30(4):474–483.Crossref, Google Scholar
[4] Archibald T, McKinnon K, Thomas L (1995) On the generation of Markov decision processes. J. Oper. Res. Soc. 46(3):354–361.Crossref, Google Scholar
[5] Aumann RJ, Maschler M, Stearns RE (1995) Repeated Games with Incomplete Information (MIT Press, Cambridge, MA).Google Scholar
[6] Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.Crossref, Google Scholar
[7] Ben-Tal A, Nemirovski A (2001) Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2 (SIAM, Philadelphia).Crossref, Google Scholar
[8] Ben-Tal A, Hazan E, Koren T, Mannor S (2015) Oracle-based robust optimization via online learning. Oper. Res. 63(3):628–638.Link, Google Scholar
[9] Bertsimas D, Den Hertog D, Pauphilet J (2021) Probabilistic guarantees in robust optimization. SIAM J. Optim. 31(4):2893–2920.Google Scholar
[10] Bhatnagar S, Sutton R, Bowling M, Ghavamzadeh M, Lee M (2007) Natural-gradient Actor-Critic algorithms. Proc. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 19–26.Google Scholar
[11] Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.Crossref, Google Scholar
[12] Blackwell D (1954) Controlled random walks. Proc. Internat. Congress Math., vol. 3, 336–338.Google Scholar
[13] Bowling M, Burch N, Johanson M, Tammelin O (2015) Heads-up limit hold’em poker is solved. Science 347(6218):145–149.Crossref, Google Scholar
[14] Brown N, Sandholm T (2018) Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374):418–424.Crossref, Google Scholar
[15] Brown N, Sandholm T (2019) Solving imperfect-information games via discounted regret minimization. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 33, 1829–1836.Google Scholar
[16] Brown N, Sandholm T (2019) Superhuman AI for multiplayer poker. Science 365(6456):885–890.Crossref, Google Scholar
[17] Burch N, Moravčík M, Schmid M (2019) Revisiting CFR+ and alternating updates. J. Artificial Intelligence Res. 64:429–443.Crossref, Google Scholar
[18] Chambolle A, Pock T (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40(1):120–145.Crossref, Google Scholar
[19] Chambolle A, Pock T (2016) On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Programming 159(1–2):253–287.Crossref, Google Scholar
[20] Chiang C-K, Yang T, Lee C-J, Mahdavi M, Lu C-J, Jin R, Zhu S (2012) Online optimization with gradual variations. Proc. Conf. Learn. Theory (COLT), vol. 23, 6.1–6.20.Google Scholar
[21] Chzhen E, Giraud C, Stoltz G (2021) A unified approach to fair online learning via blackwell approachability. Adv. Neural Inform. Processing Systems, vol. 34 (NeurIPS, SanDiego), 18280–18292.Google Scholar
[22] Combettes PL, Reyes NN (2013) Moreau’s decomposition in Banach spaces. Math. Programming 139(1):103–114.Crossref, Google Scholar
[23] De Rooij S, Van Erven T, Grünwald PD, Koolen WM (2014) Follow the leader if you can, hedge if you must. J. Machine Learn. Res. 15(1):1281–1316.Google Scholar
[24] Duchi J, Shalev-Shwartz S, Singer Y, Chandra T (2008) Efficient projections onto the ℓ1 ball for learning in high dimensions. Proc. 25th Internat. Conf. Machine Learn. (PMLR, New York), 272–279.Google Scholar
[25] Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barcelo-Vidal C (2003) Isometric logratio transformations for compositional data analysis. Math. Geology 35(3):279–300.Crossref, Google Scholar
[26] Farina G, Kroer C, Sandholm T (2019) Online convex optimization for sequential decision processes and extensive-form games. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 33, 1917–1925.Google Scholar
[27] Farina G, Kroer C, Sandholm T (2019) Optimistic regret minimization for extensive-form games via dilated distance-generating functions. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 5222–5232.Google Scholar
[28] Farina G, Kroer C, Sandholm T (2021) Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent. Proc. AAAI Conf. on Artificial Intelligence 35(6):5363–5371.Google Scholar
[29] Gao Y, Kroer C, Goldfarb D (2021) Increasing iterate averaging for solving saddle-point problems. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 35, 7537–7544.Google Scholar
[30] Goyal V, Grand-Clement J (2023) Robust Markov decision processes: Beyond rectangularity. Math. Oper. Res. 48(1):203–226.Google Scholar
[31] Grand-Clément J, Kroer C (2020) First-order methods for Wasserstein distributionally robust MDP. Internat. Conf. Machine Learn. (PMLR, New York), 2010–2019.Google Scholar
[32] Grand-Clément J, Kroer C (2021) Conic Blackwell algorithm: Parameter-free convex-concave saddle-point solving. Adv. Neural Inform. Processing Systems, vol. 34 (NeurIPS, San Diego), 9587–9599.Google Scholar
[33] Grand-Clément J, Kroer C (2021) Scalable first-order methods for robust MDPs. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 35, 12086–12094.Google Scholar
[34] Grand-Clément J, Chan CW, Goyal V, Escobar G (2022) Robustness of Proactive Intensive Care Unit Transfer Policies. Oper. Res., ePub ahead of print November 22, https://doi.org/10.1287/opre.2022.2403.Google Scholar
[35] Hart S, Mas-Colell A (2000) A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5):1127–1150.Crossref, Google Scholar
[36] Herbster M, Warmuth MK (1998) Tracking the best expert. Machine Learn. 32(2):151–178.Crossref, Google Scholar
[37] Iyengar G (2005) Robust dynamic programming. Math. Oper. Res. 30(2):257–280.Link, Google Scholar
[38] Jin Y, Sidford A (2020) Efficiently solving MDPs with stochastic mirror descent. Internat. Conf. Machine Learn. (PMLR), 4890–4900.Google Scholar
[39] Kroer C (2020) IEOR8100: Economics, AI, and optimization lecture note 5: Computing Nash equilibrium via regret minimization. Working paper.Google Scholar
[40] Kroer C, Farina G, Sandholm T (2018) Solving large sequential games with the excessive gap technique. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 864–874.Google Scholar
[41] Kroer C, Peysakhovich A, Sodomka E, Stier-Moses NE (2022) Computing large market equilibria using abstractions. Oper. Res. 70(1):329–351.Link, Google Scholar
[42] Kroer C, Waugh K, Kılınç-Karzan F, Sandholm T (2020) Faster algorithms for extensive-form game solving via improved smoothing functions. Math. Programming 179:385–417.Crossref, Google Scholar
[43] McMahan B (2011) Follow-the-regularized-leader and mirror descent: Equivalence theorems and l1 regularization. Proc. Fourteenth Internat. Conf. Artificial Intelligence Statist., 525–533.Google Scholar
[44] Milman E (2006) Approachable sets of vector payoffs in stochastic games. Games Econom. Behav. 56(1):135–147.Crossref, Google Scholar
[45] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.Google Scholar
[46] Moravčík M, Schmid M, Burch N, Lis`y V, Morrill D, Bard N, Davis T, Waugh K, Johanson M, Bowling M (2017) Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337):508–513.Crossref, Google Scholar
[47] Namkoong H, Duchi JC (2016) Stochastic gradient methods for distributionally robust optimization with f-divergences. NIPS, vol. 29, 2208–2216.Google Scholar
[48] Nemirovski A (2004) Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1):229–251.Crossref, Google Scholar
[49] Nemirovski A, Yudin D (1983) Problem Complexity and Method Efficiency in Optimization (Wiley, New York).Google Scholar
[50] Nesterov Y (2005) Smooth minimization of non-smooth functions. Math. Programming 103(1):127–152.Crossref, Google Scholar
[51] Niazadeh R, Golrezaei N, Wang J, Susan F, Badanidiyuru A (2021) Online learning via offline greedy algorithms: Applications in market design and optimization. Proc. 22nd ACM Conf. Econom. Comput. (ACM, New York), 737–738.Google Scholar
[52] Orabona F, Pál D (2015) Scale-free algorithms for online linear optimization. Internat. Conf. Algorithmic Learn. Theory (Springer), 287–301.Google Scholar
[53] Perchet V (2010) Approachability, calibration and regret in games with partial observations. PhD thesis, Université Pierre et Marie Curie, Paris.Google Scholar
[54] Perchet V (2014) Approachability, regret and calibration: Implications and equivalences. J. Dynamics Games 1(2):181–254.Google Scholar
[55] Puterman M (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).Crossref, Google Scholar
[56] Rahimian H, Mehrotra S (2019) Distributionally robust optimization: A review. Open J. Math. Optim. 3(2022).Google Scholar
[57] Rakhlin A, Sridharan K (2013) Online learning with predictable sequences. Conf. Learn. Theory (PMLR, New York), 993–1019.Google Scholar
[58] Sidford A, Tian K (2018) Coordinate methods for accelerating l∞ regression and faster approximate maximum flow. Proc. 2018 IEEE 59th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 922–933.Google Scholar
[59] Steimle LN, Denton BT (2017) Markov decision processes for screening and treatment of chronic diseases. Boucherie R, van Dijk N, eds. Markov Decision Processes in Practice, International Series in Operations Research & Management Science, vol. 248 (Springer, Cham, Switzerland), 189–222.Crossref, Google Scholar
[60] Syrgkanis V, Agarwal A, Luo H, Schapire RE (2015) Fast convergence of regularized learning in games. Adv. Neural Inform. Processing Systems 28 (NIPS 2015) (NeurIPS, San Diego).Google Scholar
[61] Tammelin O, Burch N, Johanson M, Bowling M (2015) Solving heads-up limit Texas hold’em. Proc. Twenty-Fourth Internat. Joint Conf. Artificial Intelligence.Google Scholar
[62] Tseng P (1995) On linear convergence of iterative methods for the variational inequality problem. J. Comput. Appl. Math. 60(1–2):237–252.Crossref, Google Scholar
[63] von Stengel B (1996) Efficient computation of behavior strategies. Games Econom. Behav. 14(2):220–246.Crossref, Google Scholar
[64] Wei C-Y, Lee C-W, Zhang M, Luo H (2020) Linear last-iterate convergence in constrained saddle-point optimization. Internat. Conf. Learn. Representations (ICLR, Appleton, WI).Google Scholar
[65] Wiesemann W, Kuhn D, Rustem B (2013) Robust Markov decision processes. Math. Oper. Res. 38(1):153–183.Link, Google Scholar
[66] Zinkevich M, Johanson M, Bowling M, Piccione C (2007) Regret minimization in games with incomplete information. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 1729–1736.Google Scholar

cover image Mathematics of Operations Research

Volume 49, Issue 2

May 2024

Pages 653-1302, C2

Article Information

Metrics

Information

Received:August 05, 2022
Accepted:April 07, 2023
Published Online:May 16, 2023

Cite as

Julien Grand-Clément, Christian Kroer (2023) Solving Optimization Problems with Blackwell Approachability. Mathematics of Operations Research 49(2):697-728.

https://doi.org/10.1287/moor.2023.1376

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Solving Optimization Problems with Blackwell Approachability

References

Volume 49, Issue 2

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News