Solving Optimization Problems with Blackwell Approachability
Published Online:16 May 2023https://doi.org/10.1287/moor.2023.1376
References
- [1] (2011) Blackwell approachability and no-regret learning are equivalent. Proc. 24th Annual Conf. Learn. Theory, 27–46.Google Scholar
- [2] Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Servedio RA, Zhang T, eds. Proc. Conf. Learn. Theory (COLT) (Omnipress), 263–274.Google Scholar
- [3] (2010) Markov decision processes: A tool for sequential decision making under uncertainty. Medical Decision Making 30(4):474–483.Crossref, Google Scholar
- [4] (1995) On the generation of Markov decision processes. J. Oper. Res. Soc. 46(3):354–361.Crossref, Google Scholar
- [5] (1995) Repeated Games with Incomplete Information (MIT Press, Cambridge, MA).Google Scholar
- [6] (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.Crossref, Google Scholar
- [7] (2001) Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2 (SIAM, Philadelphia).Crossref, Google Scholar
- [8] (2015) Oracle-based robust optimization via online learning. Oper. Res. 63(3):628–638.Link, Google Scholar
- [9] Bertsimas D, Den Hertog D, Pauphilet J (2021) Probabilistic guarantees in robust optimization. SIAM J. Optim. 31(4):2893–2920.Google Scholar
- [10] Bhatnagar S, Sutton R, Bowling M, Ghavamzadeh M, Lee M (2007) Natural-gradient Actor-Critic algorithms. Proc. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 19–26.Google Scholar
- [11] (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.Crossref, Google Scholar
- [12] (1954) Controlled random walks. Proc. Internat. Congress Math., vol. 3, 336–338.Google Scholar
- [13] (2015) Heads-up limit hold’em poker is solved. Science 347(6218):145–149.Crossref, Google Scholar
- [14] (2018) Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374):418–424.Crossref, Google Scholar
- [15] (2019) Solving imperfect-information games via discounted regret minimization. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 33, 1829–1836.Google Scholar
- [16] (2019) Superhuman AI for multiplayer poker. Science 365(6456):885–890.Crossref, Google Scholar
- [17] (2019) Revisiting CFR+ and alternating updates. J. Artificial Intelligence Res. 64:429–443.Crossref, Google Scholar
- [18] (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40(1):120–145.Crossref, Google Scholar
- [19] (2016) On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Programming 159(1–2):253–287.Crossref, Google Scholar
- [20] Chiang C-K, Yang T, Lee C-J, Mahdavi M, Lu C-J, Jin R, Zhu S (2012) Online optimization with gradual variations. Proc. Conf. Learn. Theory (COLT), vol. 23, 6.1–6.20.Google Scholar
- [21] Chzhen E, Giraud C, Stoltz G (2021) A unified approach to fair online learning via blackwell approachability. Adv. Neural Inform. Processing Systems, vol. 34 (NeurIPS, SanDiego), 18280–18292.Google Scholar
- [22] (2013) Moreau’s decomposition in Banach spaces. Math. Programming 139(1):103–114.Crossref, Google Scholar
- [23] (2014) Follow the leader if you can, hedge if you must. J. Machine Learn. Res. 15(1):1281–1316.Google Scholar
- [24] (2008) Efficient projections onto the ℓ1 ball for learning in high dimensions. Proc. 25th Internat. Conf. Machine Learn. (PMLR, New York), 272–279.Google Scholar
- [25] (2003) Isometric logratio transformations for compositional data analysis. Math. Geology 35(3):279–300.Crossref, Google Scholar
- [26] (2019) Online convex optimization for sequential decision processes and extensive-form games. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 33, 1917–1925.Google Scholar
- [27] (2019) Optimistic regret minimization for extensive-form games via dilated distance-generating functions. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 5222–5232.Google Scholar
- [28] Farina G, Kroer C, Sandholm T (2021) Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent. Proc. AAAI Conf. on Artificial Intelligence 35(6):5363–5371.Google Scholar
- [29] (2021) Increasing iterate averaging for solving saddle-point problems. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 35, 7537–7544.Google Scholar
- [30] Goyal V, Grand-Clement J (2023) Robust Markov decision processes: Beyond rectangularity. Math. Oper. Res. 48(1):203–226.Google Scholar
- [31] (2020) First-order methods for Wasserstein distributionally robust MDP. Internat. Conf. Machine Learn. (PMLR, New York), 2010–2019.Google Scholar
- [32] (2021) Conic Blackwell algorithm: Parameter-free convex-concave saddle-point solving. Adv. Neural Inform. Processing Systems, vol. 34 (NeurIPS, San Diego), 9587–9599.Google Scholar
- [33] (2021) Scalable first-order methods for robust MDPs. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), vol. 35, 12086–12094.Google Scholar
- [34] (2022) Robustness of Proactive Intensive Care Unit Transfer Policies. Oper. Res., ePub ahead of print November 22, https://doi.org/10.1287/opre.2022.2403.Google Scholar
- [35] (2000) A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5):1127–1150.Crossref, Google Scholar
- [36] (1998) Tracking the best expert. Machine Learn. 32(2):151–178.Crossref, Google Scholar
- [37] (2005) Robust dynamic programming. Math. Oper. Res. 30(2):257–280.Link, Google Scholar
- [38] (2020) Efficiently solving MDPs with stochastic mirror descent. Internat. Conf. Machine Learn. (PMLR), 4890–4900.Google Scholar
- [39] (2020) IEOR8100: Economics, AI, and optimization lecture note 5: Computing Nash equilibrium via regret minimization. Working paper.Google Scholar
- [40] (2018) Solving large sequential games with the excessive gap technique. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 864–874.Google Scholar
- [41] (2022) Computing large market equilibria using abstractions. Oper. Res. 70(1):329–351.Link, Google Scholar
- [42] (2020) Faster algorithms for extensive-form game solving via improved smoothing functions. Math. Programming 179:385–417.Crossref, Google Scholar
- [43] (2011) Follow-the-regularized-leader and mirror descent: Equivalence theorems and l1 regularization. Proc. Fourteenth Internat. Conf. Artificial Intelligence Statist., 525–533.Google Scholar
- [44] (2006) Approachable sets of vector payoffs in stochastic games. Games Econom. Behav. 56(1):135–147.Crossref, Google Scholar
- [45] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.Google Scholar
- [46] (2017) Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337):508–513.Crossref, Google Scholar
- [47] (2016) Stochastic gradient methods for distributionally robust optimization with f-divergences. NIPS, vol. 29, 2208–2216.Google Scholar
- [48] (2004) Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1):229–251.Crossref, Google Scholar
- [49] (1983) Problem Complexity and Method Efficiency in Optimization (Wiley, New York).Google Scholar
- [50] (2005) Smooth minimization of non-smooth functions. Math. Programming 103(1):127–152.Crossref, Google Scholar
- [51] (2021) Online learning via offline greedy algorithms: Applications in market design and optimization. Proc. 22nd ACM Conf. Econom. Comput. (ACM, New York), 737–738.Google Scholar
- [52] (2015) Scale-free algorithms for online linear optimization. Internat. Conf. Algorithmic Learn. Theory (Springer), 287–301.Google Scholar
- [53] (2010) Approachability, calibration and regret in games with partial observations. PhD thesis, Université Pierre et Marie Curie, Paris.Google Scholar
- [54] Perchet V (2014) Approachability, regret and calibration: Implications and equivalences. J. Dynamics Games 1(2):181–254.Google Scholar
- [55] (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).Crossref, Google Scholar
- [56] (2019) Distributionally robust optimization: A review. Open J. Math. Optim. 3(2022).Google Scholar
- [57] (2013) Online learning with predictable sequences. Conf. Learn. Theory (PMLR, New York), 993–1019.Google Scholar
- [58] (2018) Coordinate methods for accelerating l∞ regression and faster approximate maximum flow. Proc. 2018 IEEE 59th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 922–933.Google Scholar
- [59] (2017) Markov decision processes for screening and treatment of chronic diseases. Boucherie R, van Dijk N, eds. Markov Decision Processes in Practice, International Series in Operations Research & Management Science, vol. 248 (Springer, Cham, Switzerland), 189–222.Crossref, Google Scholar
- [60] (2015) Fast convergence of regularized learning in games. Adv. Neural Inform. Processing Systems 28 (NIPS 2015) (NeurIPS, San Diego).Google Scholar
- [61] (2015) Solving heads-up limit Texas hold’em. Proc. Twenty-Fourth Internat. Joint Conf. Artificial Intelligence.Google Scholar
- [62] (1995) On linear convergence of iterative methods for the variational inequality problem. J. Comput. Appl. Math. 60(1–2):237–252.Crossref, Google Scholar
- [63] (1996) Efficient computation of behavior strategies. Games Econom. Behav. 14(2):220–246.Crossref, Google Scholar
- [64] Wei C-Y, Lee C-W, Zhang M, Luo H (2020) Linear last-iterate convergence in constrained saddle-point optimization. Internat. Conf. Learn. Representations (ICLR, Appleton, WI).Google Scholar
- [65] (2013) Robust Markov decision processes. Math. Oper. Res. 38(1):153–183.Link, Google Scholar
- [66] (2007) Regret minimization in games with incomplete information. Adv. Neural Inform. Processing Systems (NeurIPS, San Diego), 1729–1736.Google Scholar

