Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria

Published Online:https://doi.org/10.1287/opre.2021.0633

References

  • Abernethy JD, Rakhlin A (2009) Beating the adaptive bandit with high probability. COLT 2009 - 22nd Conf. Learn. Theory (Montreal, Quebec), 280.Google Scholar
  • Bai Y, Jin C, Mei S, Song Z, Yu T (2022) Efficient phi-regret minimization in extensive-form games via online mirror descent. NIPS’22 Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 22313–22325.Google Scholar
  • Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.CrossrefGoogle Scholar
  • Ben-Tal A, Nemirovski A (2001) Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Application, vol. 2 (Society for Industrial and Applied Mathematics, Philadelphia).CrossrefGoogle Scholar
  • Ben-Tal A, Nemirovski A (2005) Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Programming 102(3):407–456.CrossrefGoogle Scholar
  • Bowling M, Burch N, Johanson M, Tammelin O (2015) Heads-up limit hold’em poker is solved. Science 347(6218):145–149.CrossrefGoogle Scholar
  • Brown N, Sandholm T (2017) Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374):418–424.CrossrefGoogle Scholar
  • Brown N, Sandholm T (2019a) Solving imperfect-information games via discounted regret minimization. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (Association for the Advancement of Artificial Intelligence, Washington, DC), 1829–1836.Google Scholar
  • Brown N, Sandholm T (2019b) Superhuman AI for multiplayer poker. Science 365(6456):885–890.CrossrefGoogle Scholar
  • Celli A, Gatti N (2018) Computational results for extensive-form adversarial team games. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (Association for the Advancement of Artificial Intelligence, Washington, DC), 965–972.Google Scholar
  • Chakrabarti D, Diakonikolas J, Kroer C (2024) Block-coordinate methods and restarting for solving extensive-form games. NIPS’23 Proc. 37th Internat. Conf. Neural Inform. Processing Systems, vol. 36 (Curran Associates Inc., Red Hook, NY), 10692–10720.Google Scholar
  • Condat L (2016) Fast projection onto the simplex and the ℓ1 ball. Math. Programming 158(1):575–585.CrossrefGoogle Scholar
  • Farina G, Sandholm T (2020) Polynomial-time computation of optimal correlated equilibria in two-player extensive-form games with public chance moves and beyond. NIPS ‘20 Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 19609–19619.Google Scholar
  • Farina G, Sandholm T (2022) Fast payoff matrix sparsification techniques for structured extensive-form games. Proc. 36th AAAI Conf. Artificial Intelligence (Association for the Advancement of Artificial Intelligence, Washington, DC), 4999–5007.Google Scholar
  • Farina G, Bianchi T, Sandholm T (2020a) Coarse correlation in extensive-form games. Proc. AAAI Conf. Artificial Intelligence, vol. 34 (Association for the Advancement of Artificial Intelligence, Washington, DC), 1934–1941.Google Scholar
  • Farina G, Kroer C, Sandholm T (2019a) Online convex optimization for sequential decision processes and extensive-form games. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (Association for the Advancement of Artificial Intelligence, Washington, DC), 1917–1925.Google Scholar
  • Farina G, Kroer C, Sandholm T (2019b) Optimistic regret minimization for extensive-form games via dilated distance-generating functions. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (NeurIPS) (Curran Associates Inc., Red Hook, NY), 5222–5232.Google Scholar
  • Farina G, Kroer C, Sandholm T (2020b) Stochastic regret minimization in extensive-form games. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 3018–3028.Google Scholar
  • Farina G, Kroer C, Sandholm T (2021b) Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent. Proc. AAAI Conf. Artificial Intelligence, vol. 35 (Association for the Advancement of Artificial Intelligence, Washington, DC), 5363–5371.Google Scholar
  • Farina G, Schmucker R, Sandholm T (2021c) Bandit linear optimization for sequential decision making and extensive-form games. Proc. AAAI Conf. Artificial Intelligence, vol. 35 (Association for the Advancement of Artificial Intelligence, Washington, DC), 5372–5380.Google Scholar
  • Farina G, Celli A, Gatti N, Sandholm T (2021a) Connecting optimal ex-ante collusion in teams to extensive-form correlation: Faster algorithms and positive complexity results. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 3164–3173.Google Scholar
  • Farina G, Lee C-W, Luo H, Kroer C (2022b) Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games. Internat. Conf. Machine Learning, ICML 2022, Proceedings of Machine Learning Research, vol. 162 (PMLR, New York), 6337–6357.Google Scholar
  • Farina G, Ling CK, Fang F, Sandholm T (2019c) Correlation in extensive-form games: Saddle-point formulation and benchmarks. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (NeurIPS) (Curran Associates Inc., Red Hook, NY), 9233–9243.Google Scholar
  • Farina G, Ling CK, Fang F, Sandholm T (2019d) Efficient regret minimization algorithm for extensive-form correlated equilibrium. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (NeurIPS) (Curran Associates Inc., Red Hook, NY), 5186–5196.Google Scholar
  • Farina G, Anagnostides I, Lee C-W, Luo H, Kroer C, Sandholm T (2022a) Near-optimal no-regret learning for general convex games. NIPS’22 Proc. 36th Internat. Conf. Neural Inform. Processing Systems (NeurIPS) (Curran Associates Inc., Red Hook, NY), 39076–39089.Google Scholar
  • Fiegel C, Ménard P, Kozuno T, Munos R, Perchet V, Valko M (2023) Adapting to game trees in zero-sum imperfect information games. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Internat. Conf. Machine Learn., ICML 2023, Proceedings of Machine Learning Research, vol. 202 (PMLR, New York), 10093–10135.Google Scholar
  • Gibson R, Lanctot M, Burch N, Szafron D, Bowling M (2012) Generalized sampling and variance in counterfactual regret minimization. Proc. AAAI Conf. Artificial Intelligence, vol. 26 (Association for the Advancement of Artificial Intelligence, Washington, DC), 1355–1361.Google Scholar
  • Gilpin A, Sandholm T (2007) Lossless abstraction of imperfect information games. J. ACM 54(5):25.CrossrefGoogle Scholar
  • Gilpin A, Peña J, Sandholm T (2012) First-order algorithm with O(ln(1/ϵ)) convergence for ϵ-equilibrium in two-person zero-sum games. Math. Programming 133(1–2):279–298.CrossrefGoogle Scholar
  • Held M, Wolfe P, Crowder HP (1974) Validation of subgradient optimization. Math. Programming 6(1):62–88.CrossrefGoogle Scholar
  • Hoda S, Gilpin A, Peña J, Sandholm T (2010) Smoothing techniques for computing nash equilibria of sequential games. Math. Oper. Res. 35(2):494–512.LinkGoogle Scholar
  • Koller D, Megiddo N, von Stengel B (1996) Efficient computation of equilibria for extensive two-person games. Games Econom. Behav. 14(2):247–259.CrossrefGoogle Scholar
  • Kroer C, Farina G, Sandholm T (2018) Solving large sequential games with the excessive gap technique. NIPS’18 Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 872–882.Google Scholar
  • Kroer C, Waugh K, Il Inç-Karzan FK, Sandholm T (2015) Faster first-order methods for extensive-form game solving. EC’15 Proc. 16th ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 817–834.Google Scholar
  • Kroer C, Waugh K, Il Inç-Karzan FK, Sandholm T (2017) Theoretical and practical advances on smoothing for extensive-form games. EC’17 Proc. 2017 ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 693.Google Scholar
  • Kroer C, Waugh K, Il Inç-Karzan FK, Sandholm T (2020) Faster algorithms for extensive-form game solving via improved smoothing functions. Math. Programming 179:385–417.CrossrefGoogle Scholar
  • Lanctot M, Waugh K, Zinkevich M, Bowling M (2009) Monte Carlo sampling for regret minimization in extensive games. NIPS’09 Proc. 23rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1078–1086.Google Scholar
  • Lee C-W, Kroer C, Luo H (2021) Last-iterate convergence in extensive-form games. NIPS’21 Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 14293–14305.Google Scholar
  • Moravčík M, Schmid M, Burch N, Lisý V, Morrill D, Bard N, Davis T, Waugh K, Johanson M, Bowling M (2017) DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337):508–513.CrossrefGoogle Scholar
  • Moulin H, Vial J-P (1978) Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon. Internat. J. Game Theory 7(3–4):201–221.CrossrefGoogle Scholar
  • Nemirovski A (2004) Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1):229–251.CrossrefGoogle Scholar
  • Nesterov Y (2005a) Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16(1):235–249.CrossrefGoogle Scholar
  • Nesterov Y (2005b) Smooth minimization of non-smooth functions. Math. Programming 103:127–152.CrossrefGoogle Scholar
  • Romanovskii I (1962) Reduction of a game with complete memory to a matrix game. Soviet Math. 3:678–681.Google Scholar
  • Schmid M, Burch N, Lanctot M, Moravcik M, Kadlec R, Bowling M (2019) Variance reduction in Monte Carlo counterfactual regret minimization (VR-MCCFR) for extensive form games using baselines. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (Association for the Advancement of Artificial Intelligence, Washington, DC), 2157–2164.Google Scholar
  • Sokota S, D’Orazio R, Kolter JZ, Loizou N, Lanctot M, Mitliagkas I, Brown N, Kroer C (2023) A unified approach to reinforcement learning, quantal response equilibria, and two-player zero-sum games. Eleventh Internat. Conf. Learn. Representations, ICLR 2023 (Kigali, Rwanda).Google Scholar
  • Syrgkanis V, Agarwal A, Luo H, Schapire RE (2015) Fast convergence of regularized learning in games. NIPS’15 Proc. 29th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 2989–2997.Google Scholar
  • Tseng P (1995) On linear convergence of iterative methods for the variational inequality problem. J. Comput. Appl. Math. 60(1–2):237–252.CrossrefGoogle Scholar
  • von Stengel B (1996) Efficient computation of behavior strategies. Games Econom. Behav. 14(2):220–246.CrossrefGoogle Scholar
  • von Stengel B, Forges F (2008) Extensive-form correlated equilibrium: Definition and computational complexity. Math. Oper. Res. 33(4):1002–1022.LinkGoogle Scholar
  • Wei C-Y, Wei Lee C, Zhang M, Luo H (2021) Linear last-iterate convergence in constrained saddle-point optimization. 9th Internat. Conf. Learn. Representations, ICLR 2021 (Vienna, Austria).Google Scholar
  • Xu H, Li K, Fu H, Fu Q, Xing J, Cheng J (2024) Dynamic discounted counterfactual regret minimization. Twelfth Internat. Conf. Learn. Representations, ICLR 2024 (Vienna, Austria).Google Scholar
  • Zhang B, Sandholm T (2020) Sparsified linear programming for zero-sum equilibrium finding. ICML’20 Proc. 37th Internat. Conf. Machine Learning, Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 11256–11267.Google Scholar
  • Zhang BH, Farina G, Sandholm T (2023) Team belief DAG: Generalizing the sequence form to team games for fast computation of correlated team max-min equilibria via regret minimization. Internat. Conf. Machine Learn., ICML 2023, Proceedings of Machine Learning Research, vol. 202 (PMLR, New York), 40996–41018.Google Scholar
  • Zhang N, McAleer S, Sandholm T (2024) Faster game solving via hyperparameter schedules. Preprint, submitted April 13, https://doi.org/10.48550/arXiv.2404.09097.Google Scholar
  • Zhang BH, Farina G, Celli A, Sandholm T (2022) Optimal correlated equilibria in general-sum extensive-form games: Fixed-parameter algorithms, hardness, and two-sided column-generation. EC’22 Proc. 23rd ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 1119–1120.Google Scholar
  • Zinkevich M, Bowling M, Johanson M, Piccione C (2007) Regret minimization in games with incomplete information. NIPS’07 Proc. 21st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1729–1736.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.