Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria

Gabriele Farina
Gabriele Farina
[email protected]
https://orcid.org/0000-0002-3976-0061
MIT EECS, Cambridge, Massachusetts 02139
Search for more papers by this author
,
Christian Kroer
Corresponding Author
Christian Kroer
[email protected]
https://orcid.org/0000-0002-9009-8683
Industrial Engineering and Operations Research Department, Columbia University, New York, New York 10027
Search for more papers by this author
,
Tuomas Sandholm
Tuomas Sandholm
[email protected]
https://orcid.org/0000-0001-8861-9366
Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; and Strategy Robot, Inc., Pittsburgh, Pennsylvania 15232; and Optimized Markets, Inc., Pittsburgh, Pennsylvania 15232; and Strategic Machine, Inc., Pittsburgh, Pennsylvania 15232
Search for more papers by this author

MIT EECS, Cambridge, Massachusetts 02139

Search for more papers by this author

Christian Kroer

Corresponding Author

Christian Kroer

[email protected]

https://orcid.org/0000-0002-9009-8683

Industrial Engineering and Operations Research Department, Columbia University, New York, New York 10027

Search for more papers by this author

Tuomas Sandholm

[email protected]

https://orcid.org/0000-0001-8861-9366

Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; and Strategy Robot, Inc., Pittsburgh, Pennsylvania 15232; and Optimized Markets, Inc., Pittsburgh, Pennsylvania 15232; and Strategic Machine, Inc., Pittsburgh, Pennsylvania 15232

Search for more papers by this author

Published Online:28 Feb 2025https://doi.org/10.1287/opre.2021.0633

References

Abernethy JD, Rakhlin A (2009) Beating the adaptive bandit with high probability. COLT 2009 - 22nd Conf. Learn. Theory (Montreal, Quebec), 280.Google Scholar
Bai Y, Jin C, Mei S, Song Z, Yu T (2022) Efficient phi-regret minimization in extensive-form games via online mirror descent. NIPS’22 Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 22313–22325.Google Scholar
Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.Crossref, Google Scholar
Ben-Tal A, Nemirovski A (2001) Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Application, vol. 2 (Society for Industrial and Applied Mathematics, Philadelphia).Crossref, Google Scholar
Ben-Tal A, Nemirovski A (2005) Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Programming 102(3):407–456.Crossref, Google Scholar
Bowling M, Burch N, Johanson M, Tammelin O (2015) Heads-up limit hold’em poker is solved. Science 347(6218):145–149.Crossref, Google Scholar
Brown N, Sandholm T (2017) Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374):418–424.Crossref, Google Scholar
Brown N, Sandholm T (2019a) Solving imperfect-information games via discounted regret minimization. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (Association for the Advancement of Artificial Intelligence, Washington, DC), 1829–1836.Google Scholar
Brown N, Sandholm T (2019b) Superhuman AI for multiplayer poker. Science 365(6456):885–890.Crossref, Google Scholar
Celli A, Gatti N (2018) Computational results for extensive-form adversarial team games. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (Association for the Advancement of Artificial Intelligence, Washington, DC), 965–972.Google Scholar
Chakrabarti D, Diakonikolas J, Kroer C (2024) Block-coordinate methods and restarting for solving extensive-form games. NIPS’23 Proc. 37th Internat. Conf. Neural Inform. Processing Systems, vol. 36 (Curran Associates Inc., Red Hook, NY), 10692–10720.Google Scholar
Condat L (2016) Fast projection onto the simplex and the ℓ1 ball. Math. Programming 158(1):575–585.Crossref, Google Scholar
Farina G, Sandholm T (2020) Polynomial-time computation of optimal correlated equilibria in two-player extensive-form games with public chance moves and beyond. NIPS ‘20 Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 19609–19619.Google Scholar
Farina G, Sandholm T (2022) Fast payoff matrix sparsification techniques for structured extensive-form games. Proc. 36th AAAI Conf. Artificial Intelligence (Association for the Advancement of Artificial Intelligence, Washington, DC), 4999–5007.Google Scholar
Farina G, Bianchi T, Sandholm T (2020a) Coarse correlation in extensive-form games. Proc. AAAI Conf. Artificial Intelligence, vol. 34 (Association for the Advancement of Artificial Intelligence, Washington, DC), 1934–1941.Google Scholar
Farina G, Kroer C, Sandholm T (2019a) Online convex optimization for sequential decision processes and extensive-form games. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (Association for the Advancement of Artificial Intelligence, Washington, DC), 1917–1925.Google Scholar
Farina G, Kroer C, Sandholm T (2019b) Optimistic regret minimization for extensive-form games via dilated distance-generating functions. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (NeurIPS) (Curran Associates Inc., Red Hook, NY), 5222–5232.Google Scholar
Farina G, Kroer C, Sandholm T (2020b) Stochastic regret minimization in extensive-form games. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 3018–3028.Google Scholar
Farina G, Kroer C, Sandholm T (2021b) Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent. Proc. AAAI Conf. Artificial Intelligence, vol. 35 (Association for the Advancement of Artificial Intelligence, Washington, DC), 5363–5371.Google Scholar
Farina G, Schmucker R, Sandholm T (2021c) Bandit linear optimization for sequential decision making and extensive-form games. Proc. AAAI Conf. Artificial Intelligence, vol. 35 (Association for the Advancement of Artificial Intelligence, Washington, DC), 5372–5380.Google Scholar
Farina G, Celli A, Gatti N, Sandholm T (2021a) Connecting optimal ex-ante collusion in teams to extensive-form correlation: Faster algorithms and positive complexity results. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 3164–3173.Google Scholar
Farina G, Lee C-W, Luo H, Kroer C (2022b) Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games. Internat. Conf. Machine Learning, ICML 2022, Proceedings of Machine Learning Research, vol. 162 (PMLR, New York), 6337–6357.Google Scholar
Farina G, Ling CK, Fang F, Sandholm T (2019c) Correlation in extensive-form games: Saddle-point formulation and benchmarks. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (NeurIPS) (Curran Associates Inc., Red Hook, NY), 9233–9243.Google Scholar
Farina G, Ling CK, Fang F, Sandholm T (2019d) Efficient regret minimization algorithm for extensive-form correlated equilibrium. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (NeurIPS) (Curran Associates Inc., Red Hook, NY), 5186–5196.Google Scholar
Farina G, Anagnostides I, Lee C-W, Luo H, Kroer C, Sandholm T (2022a) Near-optimal no-regret learning for general convex games. NIPS’22 Proc. 36th Internat. Conf. Neural Inform. Processing Systems (NeurIPS) (Curran Associates Inc., Red Hook, NY), 39076–39089.Google Scholar
Fiegel C, Ménard P, Kozuno T, Munos R, Perchet V, Valko M (2023) Adapting to game trees in zero-sum imperfect information games. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Internat. Conf. Machine Learn., ICML 2023, Proceedings of Machine Learning Research, vol. 202 (PMLR, New York), 10093–10135.Google Scholar
Gibson R, Lanctot M, Burch N, Szafron D, Bowling M (2012) Generalized sampling and variance in counterfactual regret minimization. Proc. AAAI Conf. Artificial Intelligence, vol. 26 (Association for the Advancement of Artificial Intelligence, Washington, DC), 1355–1361.Google Scholar
Gilpin A, Sandholm T (2007) Lossless abstraction of imperfect information games. J. ACM 54(5):25.Crossref, Google Scholar
Gilpin A, Peña J, Sandholm T (2012) First-order algorithm with O(ln(1/ϵ)) convergence for ϵ-equilibrium in two-person zero-sum games. Math. Programming 133(1–2):279–298.Crossref, Google Scholar
Held M, Wolfe P, Crowder HP (1974) Validation of subgradient optimization. Math. Programming 6(1):62–88.Crossref, Google Scholar
Hoda S, Gilpin A, Peña J, Sandholm T (2010) Smoothing techniques for computing nash equilibria of sequential games. Math. Oper. Res. 35(2):494–512.Link, Google Scholar
Koller D, Megiddo N, von Stengel B (1996) Efficient computation of equilibria for extensive two-person games. Games Econom. Behav. 14(2):247–259.Crossref, Google Scholar
Kroer C, Farina G, Sandholm T (2018) Solving large sequential games with the excessive gap technique. NIPS’18 Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 872–882.Google Scholar
Kroer C, Waugh K, Il Inç-Karzan FK, Sandholm T (2015) Faster first-order methods for extensive-form game solving. EC’15 Proc. 16th ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 817–834.Google Scholar
Kroer C, Waugh K, Il Inç-Karzan FK, Sandholm T (2017) Theoretical and practical advances on smoothing for extensive-form games. EC’17 Proc. 2017 ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 693.Google Scholar
Kroer C, Waugh K, Il Inç-Karzan FK, Sandholm T (2020) Faster algorithms for extensive-form game solving via improved smoothing functions. Math. Programming 179:385–417.Crossref, Google Scholar
Lanctot M, Waugh K, Zinkevich M, Bowling M (2009) Monte Carlo sampling for regret minimization in extensive games. NIPS’09 Proc. 23rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1078–1086.Google Scholar
Lee C-W, Kroer C, Luo H (2021) Last-iterate convergence in extensive-form games. NIPS’21 Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 14293–14305.Google Scholar
Moravčík M, Schmid M, Burch N, Lisý V, Morrill D, Bard N, Davis T, Waugh K, Johanson M, Bowling M (2017) DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337):508–513.Crossref, Google Scholar
Moulin H, Vial J-P (1978) Strategically zero-sum games: The class of games whose completely mixed equilibria cannot be improved upon. Internat. J. Game Theory 7(3–4):201–221.Crossref, Google Scholar
Nemirovski A (2004) Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1):229–251.Crossref, Google Scholar
Nesterov Y (2005a) Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16(1):235–249.Crossref, Google Scholar
Nesterov Y (2005b) Smooth minimization of non-smooth functions. Math. Programming 103:127–152.Crossref, Google Scholar
Romanovskii I (1962) Reduction of a game with complete memory to a matrix game. Soviet Math. 3:678–681.Google Scholar
Schmid M, Burch N, Lanctot M, Moravcik M, Kadlec R, Bowling M (2019) Variance reduction in Monte Carlo counterfactual regret minimization (VR-MCCFR) for extensive form games using baselines. Proc. AAAI Conf. Artificial Intelligence, vol. 33 (Association for the Advancement of Artificial Intelligence, Washington, DC), 2157–2164.Google Scholar
Sokota S, D’Orazio R, Kolter JZ, Loizou N, Lanctot M, Mitliagkas I, Brown N, Kroer C (2023) A unified approach to reinforcement learning, quantal response equilibria, and two-player zero-sum games. Eleventh Internat. Conf. Learn. Representations, ICLR 2023 (Kigali, Rwanda).Google Scholar
Syrgkanis V, Agarwal A, Luo H, Schapire RE (2015) Fast convergence of regularized learning in games. NIPS’15 Proc. 29th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 2989–2997.Google Scholar
Tseng P (1995) On linear convergence of iterative methods for the variational inequality problem. J. Comput. Appl. Math. 60(1–2):237–252.Crossref, Google Scholar
von Stengel B (1996) Efficient computation of behavior strategies. Games Econom. Behav. 14(2):220–246.Crossref, Google Scholar
von Stengel B, Forges F (2008) Extensive-form correlated equilibrium: Definition and computational complexity. Math. Oper. Res. 33(4):1002–1022.Link, Google Scholar
Wei C-Y, Wei Lee C, Zhang M, Luo H (2021) Linear last-iterate convergence in constrained saddle-point optimization. 9th Internat. Conf. Learn. Representations, ICLR 2021 (Vienna, Austria).Google Scholar
Xu H, Li K, Fu H, Fu Q, Xing J, Cheng J (2024) Dynamic discounted counterfactual regret minimization. Twelfth Internat. Conf. Learn. Representations, ICLR 2024 (Vienna, Austria).Google Scholar
Zhang B, Sandholm T (2020) Sparsified linear programming for zero-sum equilibrium finding. ICML’20 Proc. 37th Internat. Conf. Machine Learning, Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 11256–11267.Google Scholar
Zhang BH, Farina G, Sandholm T (2023) Team belief DAG: Generalizing the sequence form to team games for fast computation of correlated team max-min equilibria via regret minimization. Internat. Conf. Machine Learn., ICML 2023, Proceedings of Machine Learning Research, vol. 202 (PMLR, New York), 40996–41018.Google Scholar
Zhang N, McAleer S, Sandholm T (2024) Faster game solving via hyperparameter schedules. Preprint, submitted April 13, https://doi.org/10.48550/arXiv.2404.09097.Google Scholar
Zhang BH, Farina G, Celli A, Sandholm T (2022) Optimal correlated equilibria in general-sum extensive-form games: Fixed-parameter algorithms, hardness, and two-sided column-generation. EC’22 Proc. 23rd ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 1119–1120.Google Scholar
Zinkevich M, Bowling M, Johanson M, Piccione C (2007) Regret minimization in games with incomplete information. NIPS’07 Proc. 21st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1729–1736.Google Scholar

Volume 73, Issue 5

September-October 2025

Pages iii-vii, 2297-2866, C2-C3

Article Information

Supplemental Material

Metrics

Information

Received:September 24, 2021
Accepted:November 26, 2024
Published Online:February 28, 2025

Cite as

Gabriele Farina, Christian Kroer, Tuomas Sandholm (2025) Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria. Operations Research 73(5):2430-2457.

https://doi.org/10.1287/opre.2021.0633

Keywords

Acknowledgments

The authors gratefully acknowledge the insightful comments from the anonymous reviewers at Operations Research and the ACM Conference on Economics and Computation. These comments led to a greatly improved manuscript.

This paper first appeared as an extended abstract at the ACM Conference on Economics and Computation in 2021.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria

References

Volume 73, Issue 5

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News