Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria

Published Online: https://doi.org/10.1287/opre.2021.0633

We study the application of iterative first-order methods to the problem of computing equilibria of large-scale extensive-form games. First-order methods must typically be instantiated with a regularizer that serves as a distance-generating function (DGF) for the decision sets of the players. In this paper, we introduce a new weighted entropy-based distance-generating function. We show that this function is equivalent to a particular set of new weights for the dilated entropy distance-generating function on a treeplex while retaining the simpler structure of the regular entropy function for the unit cube. This function achieves significantly better strong-convexity properties than existing weight schemes for the dilated entropy while maintaining the same easily implemented closed-form proximal mapping as the prior state of the art. Extensive numerical simulations show that these superior theoretical properties translate into better numerical performance as well. We then generalize our new entropy distance function, as well as general dilated distance functions, to the scaled extension operator. The scaled extension operator is a way to recursively construct convex sets, which generalizes the decision polytope of extensive-form games as well as the convex polytopes corresponding to correlated and team equilibria. Correspondingly, we give the first efficiently computable distance-generating function for all those strategy polytopes. By instantiating first-order methods with our regularizers, we achieve several new results, such as the first method for computing ex ante correlated team equilibria with a guaranteed 1/T rate of convergence and efficient proximal updates. Similarly, we show that our regularizers can be used to speed up the computation of correlated solution concepts.
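
As a rough illustration of what a closed-form proximal mapping looks like, the sketch below shows the standard entropy proximal step on a single probability simplex, which is the building block that dilated entropy functions apply at each decision point. It is not the weighted DGF introduced in the paper; the function name `entropy_prox_step` and its parameters are assumptions made for this example.

```python
import math


def entropy_prox_step(x, grad, step_size):
    """One proximal step on the probability simplex under negative entropy.

    Returns the minimizer over the simplex of
        <grad, z> + (1 / step_size) * KL(z || x),
    which has the closed form z_i proportional to x_i * exp(-step_size * grad_i).
    Hypothetical helper for illustration only.
    """
    # Subtracting the smallest gradient entry leaves the normalized result
    # unchanged and keeps the exponentials from overflowing.
    shift = min(grad)
    weights = [xi * math.exp(-step_size * (gi - shift)) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Start from the uniform strategy over three actions and take one step
    # against an arbitrary loss vector.
    print(entropy_prox_step([1 / 3, 1 / 3, 1 / 3], [1.0, 0.0, 2.0], 0.5))
```

The update needs only one exponential per action and a normalization, which is why entropy-type regularizers are attractive for large decision sets.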

Funding: G. Farina was supported by the National Science Foundation [Grant CCF-2443068] and by T. Sandholm’s grants listed below and a Facebook fellowship. C. Kroer was supported by the Office of Naval Research [Grants N00014-22-1-2530 and N00014-23-1-2374] and the National Science Foundation [Grants IIS-2147361 and IIS-2238960]. T. Sandholm was supported by the Vannevar Bush Faculty Fellowship, Office of Naval Research [Grant ONR N00014-23-1-2876], the National Science Foundation Division of Information and Intelligent Systems [Grants RI-1718457, RI-2312342, RI-1901403, and CCF-1733556], the Army Research Office [Grants W911NF2010081 and W911NF2210266], and the National Institutes of Health [Grant A240108S001]. This work was further supported by the National Science Foundation Division of Information and Intelligent Systems [Grant 1617590] and the Army Research Office [Grant W911NF-17-1-0082].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2021.0633.
