Entropy Regularization for Mean Field Games with Learning

Xin Guo
https://orcid.org/0000-0002-3350-4606
Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720; Tsinghua-UC Berkeley Shenzhen Institute, Shenzhen 518055, China

Renyuan Xu
https://orcid.org/0000-0003-4293-3450
Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90089; Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom

Thaleia Zariphopoulou
https://orcid.org/0000-0002-4213-3720
Departments of Mathematics and IROM, The University of Texas at Austin, Austin, Texas 78712

Published Online: 25 Feb 2022
https://doi.org/10.1287/moor.2021.1238

Mathematics of Operations Research

Volume 47, Issue 4

November 2022

Pages 2547-3399, C2

Article Information

Received: October 05, 2020
Accepted: October 28, 2021
Published Online: February 25, 2022

Cite as

Xin Guo, Renyuan Xu, Thaleia Zariphopoulou (2022) Entropy Regularization for Mean Field Games with Learning. Mathematics of Operations Research 47(4):3239-3260.

https://doi.org/10.1287/moor.2021.1238

Acknowledgments

This work was presented at the Summer School of the Bachelier Finance Society, the Mathematical Finance Colloquium at the University of Southern California, the SIAG/FME Virtual Seminars Series, the Control and Optimization Seminar at the University of Connecticut, the Department of Systems Engineering and Engineering Management at The Chinese University of Hong Kong, and the Actuarial and Financial Mathematics Seminar at the Quantact Laboratory. The authors thank the participants for their comments and suggestions.
