Multiagent Online Learning in Time-Varying Games

Benoit Duvocelle
https://orcid.org/0000-0002-7191-389X
Department of Quantitative Economics, Maastricht University, NL–6200 MD Maastricht, Netherlands
Panayotis Mertikopoulos
https://orcid.org/0000-0003-2026-9616
Université Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France; Criteo AI Lab, 38130 Echirolles, France
Mathias Staudigl (corresponding author)
https://orcid.org/0000-0003-2481-0019
Department of Advanced Computing Sciences, Maastricht University, NL–6200 MD Maastricht, Netherlands
Dries Vermeulen
Department of Quantitative Economics, Maastricht University, NL–6200 MD Maastricht, Netherlands

Published Online: 1 Jul 2022, https://doi.org/10.1287/moor.2022.1283

Abstract

We examine the long-run behavior of multiagent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to a Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit, and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback, the latter referring to the case where players only get to observe the payoffs of their chosen actions.
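To make the setting of the abstract more concrete, here is a minimal sketch of one member of the mirror-descent family: with a Euclidean regularizer, mirror descent reduces to projected gradient ascent on each player's own payoff. Everything specific below is a hypothetical choice made for illustration and is not taken from the paper: the two-player quadratic stage game u_i(x) = -x_i^2/2 + theta_t x_i - c x_i x_j on the action set [0, 1], the parameter drift theta_t = 0.5 + 1/(t+1), the constant step size, and all function names. For |c| < 1 the pseudo-gradient of each stage game has a negative-definite Jacobian, so every stage game is strongly monotone, and the limit game (theta = 0.5, c = 0.25) has the unique Nash equilibrium x_1 = x_2 = theta/(1 + c) = 0.4. The sketch uses gradient feedback only and does not illustrate the payoff-based variant.

```python
def clip(z, lo=0.0, hi=1.0):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, z))


def payoff_gradients(x, theta, c=0.25):
    """Individual payoff gradients v_i(x) = du_i/dx_i for the hypothetical
    two-player stage game u_i(x) = -x_i**2 / 2 + theta * x_i - c * x_i * x_j."""
    x1, x2 = x
    return (-x1 + theta - c * x2, -x2 + theta - c * x1)


def run_projected_gradient(T=2000, step=0.1, x0=(0.0, 1.0), c=0.25):
    """Simultaneous play: at each stage t, every player takes a Euclidean
    mirror step (projected gradient ascent) on its payoff in the current game."""
    x = tuple(float(xi) for xi in x0)
    for t in range(T):
        # Hypothetical drift: the stage games stabilize as theta_t -> 0.5.
        theta_t = 0.5 + 1.0 / (t + 1)
        v = payoff_gradients(x, theta_t, c)
        x = tuple(clip(xi + step * vi) for xi, vi in zip(x, v))
    return x


if __name__ == "__main__":
    x_final = run_projected_gradient()
    # Limit game (theta = 0.5): unique Nash equilibrium at theta / (1 + c) = 0.4.
    print(x_final)
```

Running the script should print a pair of values close to 0.4. The constant step size is used here only for simplicity; the precise step-size and feedback conditions under which the paper's guarantees hold are not reproduced in this sketch.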

Funding: This research was partially supported by the European Cooperation in Science and Technology COST Action [Grant CA16228] “European Network for Game Theory” (GAMENET). P. Mertikopoulos is grateful for financial support from the French National Research Agency (ANR) in the framework of the “Investissements d’avenir” program [Grant ANR-15-IDEX-02], the LabEx PERSYVAL [Grant ANR-11-LABX-0025-01], MIAI@Grenoble Alpes [Grant ANR-19-P3IA-0003], and the ALIAS [Grant ANR-19-CE48-0018-01].

Mathematics of Operations Research

Volume 48, Issue 2

May 2023

Pages 603-1211, C2

Article Information

Received: September 08, 2018
Accepted: May 07, 2022
Published Online: July 01, 2022

Cite as

Benoit Duvocelle, Panayotis Mertikopoulos, Mathias Staudigl, Dries Vermeulen (2022) Multiagent Online Learning in Time-Varying Games. Mathematics of Operations Research 48(2):914-941.

https://doi.org/10.1287/moor.2022.1283

Acknowledgments

The authors are deeply grateful to the associate editor and two anonymous referees for providing many insightful comments and remarks that greatly improved the manuscript.
