Convergence and Stability of Coupled Belief-Strategy Learning Dynamics in Continuous Games

Manxi Wu (Corresponding Author)
[email protected]
https://orcid.org/0000-0001-5334-4163
School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14850

Saurabh Amin
[email protected]
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Asuman Ozdaglar
[email protected]
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Published Online: 12 Mar 2024
https://doi.org/10.1287/moor.2022.0161


Mathematics of Operations Research

Volume 50, Issue 1

February 2025

Pages 1-781 C2

Article Information

Received: June 10, 2022
Accepted: January 07, 2024
Published Online: March 12, 2024

Cite as

Manxi Wu, Saurabh Amin, Asuman Ozdaglar (2024) Convergence and Stability of Coupled Belief-Strategy Learning Dynamics in Continuous Games. Mathematics of Operations Research 50(1):459-481.

https://doi.org/10.1287/moor.2022.0161

Acknowledgments

The authors are grateful to the area editor, the associate editor, and the three reviewers for useful suggestions and constructive feedback. The authors thank participants of the Cornell ORIE Colloquium (2021); the Games, Decisions, and Networks Seminar (2021); the seminar at Simons Institute for the Theory of Computing (2022); and the Spring 2022 Colloquium at the C3.ai Digital Transformation Institute for useful discussions.
