Entropy Regularization for Mean Field Games with Learning

Xin Guo
[email protected]
https://orcid.org/0000-0002-3350-4606
Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720; Tsinghua-UC Berkeley Shenzhen Institute, Shenzhen 518055, China
Renyuan Xu
[email protected]
https://orcid.org/0000-0003-4293-3450
Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90089; Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
Thaleia Zariphopoulou
[email protected]
https://orcid.org/0000-0002-4213-3720
Departments of Mathematics and IROM, The University of Texas at Austin, Austin, Texas 78712

Published Online: 25 Feb 2022
https://doi.org/10.1287/moor.2021.1238

Abstract

Entropy regularization has been extensively adopted to improve the efficiency, stability, and convergence of algorithms in reinforcement learning. This paper analyzes, both quantitatively and qualitatively, the impact of entropy regularization on mean field games (MFGs) with learning over a finite time horizon. Our study provides a theoretical justification that entropy regularization yields time-dependent policies and, furthermore, helps stabilize and accelerate convergence to the game equilibrium. In addition, this study leads to a policy-gradient algorithm with exploration in MFGs. With this algorithm, agents are able to learn the optimal exploration scheduling, with stable and fast convergence to the game equilibrium.

Mathematics of Operations Research

Volume 47, Issue 4

November 2022

Pages 2547-3399, C2

Article Information

Received: October 05, 2020
Accepted: October 28, 2021
Published Online: February 25, 2022

Cite as

Xin Guo, Renyuan Xu, Thaleia Zariphopoulou (2022) Entropy Regularization for Mean Field Games with Learning. Mathematics of Operations Research 47(4):3239-3260.

https://doi.org/10.1287/moor.2021.1238

Acknowledgments

This work was presented at the Summer School of the Bachelier Finance Society, the Mathematical Finance Colloquium at the University of Southern California, the SIAG/FME Virtual Seminars Series, the Control and Optimization Seminar at the University of Connecticut, the Department of Systems Engineering and Engineering Management at The Chinese University of Hong Kong, and the Actuarial and Financial Mathematics Seminar at the Quantact Laboratory. The authors thank the participants for their comments and suggestions.
