Solving Optimization Problems with Blackwell Approachability

Published Online:https://doi.org/10.1287/moor.2023.1376

References

  • [1] Abernethy J, Bartlett PL, Hazan E (2011) Blackwell approachability and no-regret learning are equivalent. Proc. 24th Annual Conf. Learn. Theory, 27–46.Google Scholar
  • [2] Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Servedio RA, Zhang T, eds. Proc. Conf. Learn. Theory (COLT) (Omnipress), 263–274.Google Scholar
  • [3] Alagoz O, Hsu H, Schaefer AJ, Roberts MS (2010) Markov decision processes: A tool for sequential decision making under uncertainty. Medical Decision Making 30(4):474–483.CrossrefGoogle Scholar
  • [4] Archibald T, McKinnon K, Thomas L (1995) On the generation of Markov decision processes. J. Oper. Res. Soc. 46(3):354–361.CrossrefGoogle Scholar
  • [5] Aumann RJ, Maschler M, Stearns RE (1995) Repeated Games with Incomplete Information (MIT Press, Cambridge, MA).Google Scholar
  • [6] Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.CrossrefGoogle Scholar
  • [7] Ben-Tal A, Nemirovski A (2001) Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2 (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [8] Ben-Tal A, Hazan E, Koren T, Mannor S (2015) Oracle-based robust optimization via online learning. Oper. Res. 63(3):628–638.LinkGoogle Scholar
  • [9] Bertsimas D, Den Hertog D, Pauphilet J (2021) Probabilistic guarantees in robust optimization. SIAM J. Optim. 31(4):2893–2920.Google Scholar
  • [10] Bhatnagar S, Sutton R, Bowling M, Ghavamzadeh M, Lee M (2007) Natural-gradient Actor-Critic algorithms. Proc. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 19–26.Google Scholar
  • [11] Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.CrossrefGoogle Scholar
  • [12] Blackwell D (1954) Controlled random walks. Proc. Internat. Congress Math., vol. 3, 336–338.Google Scholar
  • [13] Bowling M, Burch N, Johanson M, Tammelin O (2015) Heads-up limit hold’em poker is solved. Science 347(6218):145–149.CrossrefGoogle Scholar
  • [14] Brown N, Sandholm T (2018) Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374):418–424.CrossrefGoogle Scholar
  • [15] Brown N, Sandholm T (2019) Solving imperfect-information games via discounted regret minimization. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 33, 1829–1836.Google Scholar
  • [16] Brown N, Sandholm T (2019) Superhuman AI for multiplayer poker. Science 365(6456):885–890.CrossrefGoogle Scholar
  • [17] Burch N, Moravčík M, Schmid M (2019) Revisiting CFR+ and alternating updates. J. Artificial Intelligence Res. 64:429–443.CrossrefGoogle Scholar
  • [18] Chambolle A, Pock T (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40(1):120–145.CrossrefGoogle Scholar
  • [19] Chambolle A, Pock T (2016) On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Programming 159(1–2):253–287.CrossrefGoogle Scholar
  • [20] Chiang C-K, Yang T, Lee C-J, Mahdavi M, Lu C-J, Jin R, Zhu S (2012) Online optimization with gradual variations. Proc. Conf. Learn. Theory (COLT), vol. 23, 6.1–6.20.Google Scholar
  • [21] Chzhen E, Giraud C, Stoltz G (2021) A unified approach to fair online learning via blackwell approachability. Adv. Neural Inform. Processing Systems, vol. 34 (NeurIPS, SanDiego), 18280–18292.Google Scholar
  • [22] Combettes PL, Reyes NN (2013) Moreau’s decomposition in Banach spaces. Math. Programming 139(1):103–114.CrossrefGoogle Scholar
  • [23] De Rooij S, Van Erven T, Grünwald PD, Koolen WM (2014) Follow the leader if you can, hedge if you must. J. Machine Learn. Res. 15(1):1281–1316.Google Scholar
  • [24] Duchi J, Shalev-Shwartz S, Singer Y, Chandra T (2008) Efficient projections onto the ℓ1 ball for learning in high dimensions. Proc. 25th Internat. Conf. Machine Learn. (PMLR, New York), 272–279.Google Scholar
  • [25] Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barcelo-Vidal C (2003) Isometric logratio transformations for compositional data analysis. Math. Geology 35(3):279–300.CrossrefGoogle Scholar
  • [26] Farina G, Kroer C, Sandholm T (2019) Online convex optimization for sequential decision processes and extensive-form games. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 33, 1917–1925.Google Scholar
  • [27] Farina G, Kroer C, Sandholm T (2019) Optimistic regret minimization for extensive-form games via dilated distance-generating functions. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 5222–5232.Google Scholar
  • [28] Farina G, Kroer C, Sandholm T (2021) Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent. Proc. AAAI Conf. on Artificial Intelligence 35(6):5363–5371.Google Scholar
  • [29] Gao Y, Kroer C, Goldfarb D (2021) Increasing iterate averaging for solving saddle-point problems. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 35, 7537–7544.Google Scholar
  • [30] Goyal V, Grand-Clement J (2023) Robust Markov decision processes: Beyond rectangularity. Math. Oper. Res. 48(1):203–226.Google Scholar
  • [31] Grand-Clément J, Kroer C (2020) First-order methods for Wasserstein distributionally robust MDP. Internat. Conf. Machine Learn. (PMLR, New York), 2010–2019.Google Scholar
  • [32] Grand-Clément J, Kroer C (2021) Conic Blackwell algorithm: Parameter-free convex-concave saddle-point solving. Adv. Neural Inform. Processing Systems, vol. 34 (NeurIPS, San Diego), 9587–9599.Google Scholar
  • [33] Grand-Clément J, Kroer C (2021) Scalable first-order methods for robust MDPs. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 35, 12086–12094.Google Scholar
  • [34] Grand-Clément J, Chan CW, Goyal V, Escobar G (2022) Robustness of Proactive Intensive Care Unit Transfer Policies. Oper. Res., ePub ahead of print November 22, https://doi.org/10.1287/opre.2022.2403.Google Scholar
  • [35] Hart S, Mas-Colell A (2000) A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5):1127–1150.CrossrefGoogle Scholar
  • [36] Herbster M, Warmuth MK (1998) Tracking the best expert. Machine Learn. 32(2):151–178.CrossrefGoogle Scholar
  • [37] Iyengar G (2005) Robust dynamic programming. Math. Oper. Res. 30(2):257–280.LinkGoogle Scholar
  • [38] Jin Y, Sidford A (2020) Efficiently solving MDPs with stochastic mirror descent. Internat. Conf. Machine Learn. (PMLR), 4890–4900.Google Scholar
  • [39] Kroer C (2020) IEOR8100: Economics, AI, and optimization lecture note 5: Computing Nash equilibrium via regret minimization. Working paper.Google Scholar
  • [40] Kroer C, Farina G, Sandholm T (2018) Solving large sequential games with the excessive gap technique. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 864–874.Google Scholar
  • [41] Kroer C, Peysakhovich A, Sodomka E, Stier-Moses NE (2022) Computing large market equilibria using abstractions. Oper. Res. 70(1):329–351.LinkGoogle Scholar
  • [42] Kroer C, Waugh K, Kılınç-Karzan F, Sandholm T (2020) Faster algorithms for extensive-form game solving via improved smoothing functions. Math. Programming 179:385–417.CrossrefGoogle Scholar
  • [43] McMahan B (2011) Follow-the-regularized-leader and mirror descent: Equivalence theorems and l1 regularization. Proc. Fourteenth Internat. Conf. Artificial Intelligence Statist., 525–533.Google Scholar
  • [44] Milman E (2006) Approachable sets of vector payoffs in stochastic games. Games Econom. Behav. 56(1):135–147.CrossrefGoogle Scholar
  • [45] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.Google Scholar
  • [46] Moravčík M, Schmid M, Burch N, Lis`y V, Morrill D, Bard N, Davis T, Waugh K, Johanson M, Bowling M (2017) Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337):508–513.CrossrefGoogle Scholar
  • [47] Namkoong H, Duchi JC (2016) Stochastic gradient methods for distributionally robust optimization with f-divergences. NIPS, vol. 29, 2208–2216.Google Scholar
  • [48] Nemirovski A (2004) Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1):229–251.CrossrefGoogle Scholar
  • [49] Nemirovski A, Yudin D (1983) Problem Complexity and Method Efficiency in Optimization (Wiley, New York).Google Scholar
  • [50] Nesterov Y (2005) Smooth minimization of non-smooth functions. Math. Programming 103(1):127–152.CrossrefGoogle Scholar
  • [51] Niazadeh R, Golrezaei N, Wang J, Susan F, Badanidiyuru A (2021) Online learning via offline greedy algorithms: Applications in market design and optimization. Proc. 22nd ACM Conf. Econom. Comput. (ACM, New York), 737–738.Google Scholar
  • [52] Orabona F, Pál D (2015) Scale-free algorithms for online linear optimization. Internat. Conf. Algorithmic Learn. Theory (Springer), 287–301.Google Scholar
  • [53] Perchet V (2010) Approachability, calibration and regret in games with partial observations. PhD thesis, Université Pierre et Marie Curie, Paris.Google Scholar
  • [54] Perchet V (2014) Approachability, regret and calibration: Implications and equivalences. J. Dynamics Games 1(2):181–254.Google Scholar
  • [55] Puterman M (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).CrossrefGoogle Scholar
  • [56] Rahimian H, Mehrotra S (2019) Distributionally robust optimization: A review. Open J. Math. Optim. 3(2022).Google Scholar
  • [57] Rakhlin A, Sridharan K (2013) Online learning with predictable sequences. Conf. Learn. Theory (PMLR, New York), 993–1019.Google Scholar
  • [58] Sidford A, Tian K (2018) Coordinate methods for accelerating l∞ regression and faster approximate maximum flow. Proc. 2018 IEEE 59th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 922–933.Google Scholar
  • [59] Steimle LN, Denton BT (2017) Markov decision processes for screening and treatment of chronic diseases. Boucherie R, van Dijk N, eds. Markov Decision Processes in Practice, International Series in Operations Research & Management Science, vol. 248 (Springer, Cham, Switzerland), 189–222.CrossrefGoogle Scholar
  • [60] Syrgkanis V, Agarwal A, Luo H, Schapire RE (2015) Fast convergence of regularized learning in games. Adv. Neural Inform. Processing Systems 28 (NIPS 2015) (NeurIPS, San Diego).Google Scholar
  • [61] Tammelin O, Burch N, Johanson M, Bowling M (2015) Solving heads-up limit Texas hold’em. Proc. Twenty-Fourth Internat. Joint Conf. Artificial Intelligence.Google Scholar
  • [62] Tseng P (1995) On linear convergence of iterative methods for the variational inequality problem. J. Comput. Appl. Math. 60(1–2):237–252.CrossrefGoogle Scholar
  • [63] von Stengel B (1996) Efficient computation of behavior strategies. Games Econom. Behav. 14(2):220–246.CrossrefGoogle Scholar
  • [64] Wei C-Y, Lee C-W, Zhang M, Luo H (2020) Linear last-iterate convergence in constrained saddle-point optimization. Internat. Conf. Learn. Representations (ICLR, Appleton, WI).Google Scholar
  • [65] Wiesemann W, Kuhn D, Rustem B (2013) Robust Markov decision processes. Math. Oper. Res. 38(1):153–183.LinkGoogle Scholar
  • [66] Zinkevich M, Johanson M, Bowling M, Piccione C (2007) Regret minimization in games with incomplete information. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 1729–1736.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.