Multiagent Online Learning in Time-Varying Games

Published Online: https://doi.org/10.1287/moor.2022.1283

We examine the long-run behavior of multiagent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to a Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit, and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback; in the latter case, players only get to observe the payoffs of their chosen actions.
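As a rough illustration of the setting described above, the following sketch runs gradient-based mirror descent with a Euclidean regularizer (that is, projected gradient ascent) in a two-player game whose payoffs drift over time and settle to a strictly monotone limit. The Cournot-style payoff, the drift term `1/t`, the step-size schedule, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
def payoff_gradient(x_own, x_other, t):
    """Individual payoff gradient in the stage game at time t (hypothetical game).

    Assumed payoff: u_i(x) = x_i * (a_t - x_i - x_j), with a_t = 1.5 + 1/t,
    so the stage games drift toward a limit game with a = 1.5.
    """
    a_t = 1.5 + 1.0 / t
    return a_t - 2.0 * x_own - x_other


def project(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the action interval [lo, hi]."""
    return min(hi, max(lo, x))


def run_mirror_descent(num_steps=5000):
    """Both players update simultaneously with a vanishing step size."""
    x = [0.0, 1.0]  # arbitrary initial actions
    for t in range(1, num_steps + 1):
        step = 1.0 / t ** 0.6  # assumed step-size schedule
        grads = [
            payoff_gradient(x[0], x[1], t),
            payoff_gradient(x[1], x[0], t),
        ]
        # Euclidean mirror step: move along the gradient, then project.
        x = [project(x[i] + step * grads[i]) for i in range(2)]
    return x


final_play = run_mirror_descent()
print(final_play)
```

In the assumed limit game, the first-order condition 1.5 - 2x - x = 0 gives the symmetric Nash equilibrium x = 0.5 for each player, so the printed actions should end up close to that value. The payoff-based case would replace `payoff_gradient` with an estimate built from observed payoffs only, which this sketch does not attempt.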

Funding: This research was partially supported by the European Cooperation in Science and Technology COST Action [Grant CA16228] “European Network for Game Theory” (GAMENET). P. Mertikopoulos is grateful for financial support by the French National Research Agency (ANR) in the framework of the “Investissements d’avenir” program [Grant ANR-15-IDEX-02], the LabEx PERSYVAL [Grant ANR-11-LABX-0025-01], MIAI@Grenoble Alpes [Grant ANR-19-P3IA-0003], and the ALIAS [Grant ANR-19-CE48-0018-01].
