Entropy Regularization for Mean Field Games with Learning

