Entropy Regularization for Mean Field Games with Learning

Published Online: https://doi.org/10.1287/moor.2021.1238

References

  • [1] Ahmed Z, Le Roux N, Norouzi M, Schuurmans D (2019) Understanding the impact of entropy on policy optimization. Proc. Internat. Conf. Machine Learning 97:151–160.
  • [2] Anahtarci B, Kariksiz CD, Saldi N (2020) Q-learning in regularized mean-field games. Preprint, submitted March 24, https://arxiv.org/abs/2003.12151.
  • [3] Åström KJ, Wittenmark B (2013) Adaptive Control (Courier Corporation, North Chelmsford, MA).
  • [4] Bardi M (2012) Explicit solutions of some linear-quadratic mean field games. Networks Heterogeneous Media 7(2):243–261.
  • [5] Bensoussan A, Sung KCJ, Yam SCP, Yung S-P (2016) Linear-quadratic mean field games. J. Optim. Theory Appl. 169(2):496–529.
  • [6] Bhandari J, Russo D (2019) Global optimality guarantees for policy gradient methods. Preprint, submitted June 5, https://arxiv.org/abs/1906.01786.
  • [7] Cardaliaguet P, Hadikhanloo S (2017) Learning in mean field games: the fictitious play. ESAIM Control Optim. Calculus Variations 23(2):569–591.
  • [8] Cardaliaguet P, Delarue F, Lasry J-M, Lions P-L (2019) The Master Equation and the Convergence Problem in Mean Field Games (Princeton University Press, Princeton, NJ).
  • [9] Carmona R, Delarue F (2013) Mean field forward-backward stochastic differential equations. Electronic Comm. Probab. 18:1–15.
  • [10] Carmona R, Lacker D (2015) A probabilistic weak formulation of mean field games and applications. Ann. Appl. Probab. 25(3):1189–1231.
  • [11] Carmona R, Laurière M, Tan Z (2019) Linear-quadratic mean-field reinforcement learning: Convergence of policy gradient methods. Preprint, submitted October 9, https://arxiv.org/abs/1910.04295.
  • [12] Dean S, Mania H, Matni N, Recht B, Tu S (2019) On the sample complexity of the linear quadratic regulator. Foundations Comput. Math. 20:633–679.
  • [13] Fazel M, Ge R, Kakade SM, Mesbahi M (2018) Global convergence of policy gradient methods for the linear quadratic regulator. Internat. Conf. Machine Learn. (PMLR), 1467–1476.
  • [14] Fu Z, Yang Z, Chen Y, Wang Z (2019) Actor-critic provably finds Nash equilibria of linear-quadratic mean-field games. Preprint, submitted October 16, https://arxiv.org/abs/1910.07498.
  • [15] Grau-Moya J, Leibfried F, Vrancx P (2018) Soft Q-learning with mutual-information regularization. Proc. Internat. Conf. Learning Representations.
  • [16] Guo X, Hu A, Xu R, Zhang J (2019) Learning mean-field games. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 4967–4977.
  • [17] Hambly BM, Xu R, Yang H (2021) Policy gradient methods find the Nash equilibrium in n-player general-sum linear-quadratic games. Preprint, submitted August 2, https://dx.doi.org/10.2139/ssrn.3894471.
  • [18] Hambly BM, Xu R, Yang H (2021) Policy gradient methods for the noisy linear quadratic regulator over a finite horizon. SIAM J. Control Optim. 59(5):3359–3391.
  • [19] Hazan E, Kakade S, Singh K, Van Soest A (2019) Provably efficient maximum entropy exploration. Proc. Internat. Conf. Machine Learning 97:2681–2691.
  • [20] Hofbauer J, Sandholm WH (2002) On the global convergence of stochastic fictitious play. Econometrica 70(6):2265–2294.
  • [21] Hong Z-W, Su S-Y, Shann T-Y, Chang Y-H, Lee C-Y (2018) A deep policy inference Q-network for multi-agent systems. Proc. 17th Internat. Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 1388–1396.
  • [22] Houthooft R, Chen X, Duan Y, Schulman J, De Turck F, Abbeel P (2016) VIME: Variational information maximizing exploration. Lee DD, von Luxburg U, Garnett R, Sugiyama M, Guyon I, eds. Proc. 30th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1117–1125.
  • [23] Huang M, Caines PE, Malhamé RP (2007) Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behavior and decentralized ϵ-Nash equilibria. IEEE Trans. Automatic Control 52(9):1560–1571.
  • [24] Iqbal S, Sha F (2019) Actor-attention-critic for multi-agent reinforcement learning. Proc. Internat. Conf. Machine Learning 97:2961–2970.
  • [25] Lacker D (2015) Mean field games via controlled martingale problems: Existence of Markovian equilibria. Stochastic Processes Appl. 125(7):2856–2894.
  • [26] Lacker D, Zariphopoulou T (2017) Mean field and n-agent games for optimal investment under relative performance criteria. Math. Finance 29(4):1003–1038.
  • [27] Lasry J-M, Lions P-L (2007) Mean field games. Japanese J. Math. 2(1):229–260.
  • [28] Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. Preprint, submitted September 9, https://arxiv.org/abs/1509.02971.
  • [29] Lu X, Van Roy B (2019) Information-theoretic confidence bounds for reinforcement learning. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2461–2470.
  • [30] Mania H, Tu S, Recht B (2019) Certainty equivalence is efficient for linear quadratic control. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 10154–10164.
  • [31] Neu G, Jonsson A, Gómez V (2017) A unified view of entropy-regularized Markov decision processes. Preprint, submitted May 22, https://arxiv.org/abs/1705.07798.
  • [32] Plappert M, Houthooft R, Dhariwal P, Sidor S, Chen RY, Chen X, Asfour T, Abbeel P, Andrychowicz M (2017) Parameter space noise for exploration. Preprint, submitted June 6, https://arxiv.org/abs/1706.01905.
  • [33] Raileanu R, Denton E, Szlam A, Fergus R (2018) Modeling others using oneself in multi-agent reinforcement learning. Preprint, submitted February 26, https://arxiv.org/abs/1802.09640.
  • [34] Russo D, Van Roy B (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learning Res. 17(1):2442–2471.
  • [35] Wang H, Xun YZ (2019) Continuous-time mean-variance portfolio selection: A reinforcement learning framework. Preprint, submitted May 29, https://dx.doi.org/10.2139/ssrn.
  • [36] Wang H, Zariphopoulou T, Zhou X (2022) Exploration vs. exploitation in reinforcement learning: A stochastic control approach. J. Machine Learning Res. Forthcoming.
  • [37] Wang W, Han J, Yang Z, Wang Z (2021) Global convergence of policy gradient for linear-quadratic mean-field control/game in continuous time. Internat. Conf. Machine Learn. (PMLR), 10772–10782.
  • [38] Wunder M, Littman ML, Babes M (2010) Classes of multiagent Q-learning dynamics with epsilon-greedy exploration. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Machine Learning (Omnipress, Madison, WI), 1167–1174.
  • [39] Zhang K, Yang Z, Basar T (2019) Policy optimization provably converges to Nash equilibria in zero-sum linear quadratic games. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 11602–11614.