Dynamic Programming Principles for Mean-Field Controls with Learning
References
- (2020) A McKean–Vlasov approach to distributed electricity generation development. Math. Methods Oper. Res. 91(2):269–310.
- (2011) A maximum principle for SDEs of mean-field type. Appl. Math. Optim. 63(3):341–356.
- (1957) A Markovian decision process. J. Math. Mechanics 6(5):679–684.
- (2013) Mean Field Games and Mean Field Type Control Theory, SpringerBriefs in Mathematics, vol. 101 (Springer, New York).
- (1978) Stochastic Optimal Control: The Discrete-Time Case, Mathematics in Science and Engineering, vol. 139 (Academic Press, New York).
- (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).
- (2011) A general stochastic maximum principle for SDEs of mean-field type. Appl. Math. Optim. 64(2):197–216.
- (2015) Forward–backward stochastic differential equations and controlled McKean–Vlasov dynamics. Ann. Probab. 43(5):2647–2700.
- (2018a) Probabilistic Theory of Mean Field Games with Applications I, Probability Theory and Stochastic Modelling, vol. 83 (Springer, Cham, Switzerland).
- (2018b) Probabilistic Theory of Mean Field Games with Applications II, Probability Theory and Stochastic Modelling, vol. 84 (Springer, Cham, Switzerland).
- (2019a) Linear-quadratic mean-field reinforcement learning: Convergence of policy gradient methods. Preprint, submitted October 9, https://arxiv.org/abs/1910.04295.
- (2019b) Model-free mean-field reinforcement learning: Mean-field MDP and mean-field Q-learning. Preprint, submitted October 28, https://arxiv.org/abs/1910.12802.
- (1998) Bayesian Q-learning. Proc. 15th Natl./10th Conf. Artificial Intelligence/Innovative Appl. Artificial Intelligence (AAAI Press, Palo Alto, CA), 761–768.
- (2019) McKean-Vlasov optimal control: The dynamic programming principle. Preprint, submitted July 20, https://arxiv.org/abs/1907.08860.
- (2000) Reinforcement learning in continuous time and space. Neural Comput. 12(1):219–245.
- (2002) Multiple model-based reinforcement learning. Neural Comput. 14(6):1347–1369.
- (2013) Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto. IEEE Trans. Intelligent Transportation Systems 14(3):1140–1150.
- (2003) Learning rates for Q-learning. J. Machine Learn. Res. 5(1):1–25.
- (2006) Controlled Markov Processes and Viscosity Solutions, Stochastic Modelling and Applied Probability, vol. 25 (Springer Science & Business Media, New York).
- (2013) Large deviations for a mean field model of systemic risk. SIAM J. Financial Math. 4(1):151–184.
- (2002) On choosing and bounding probability metrics. Internat. Statist. Rev. 70(3):419–435.
- (2021) Mean-field controls with Q-learning for cooperative MARL: Convergence and complexity analysis. SIAM J. Math. Data Sci. 3(4):1168–1196.
- (2019) Learning mean-field games. Adv. Neural Inform. Processing Systems 32:4966–4976.
- (2006) Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Commun. Inform. Systems 6(3):221–252.
- (2018) Real-time bidding with multi-agent reinforcement learning in display advertising. Proc. 27th ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 2193–2201.
- (2000) Actor-critic algorithms. Adv. Neural Inform. Processing Systems 12:1008–1014.
- (2015) Mean field games via controlled martingale problems: Existence of Markovian equilibria. Stochastic Processes Appl. 125(7):2856–2894.
- (2017) Limit theory for controlled McKean–Vlasov dynamics. SIAM J. Control Optim. 55(3):1641–1672.
- (2007) Mean field games. Jpn. J. Math. 2(1):229–260.
- (2014) Dynamic programming for mean-field type control. Comptes Rendus Math. Acad. Sci. Paris 352(9):707–713.
- (2019) Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning. WWW’19 World Wide Web Conference (Association for Computing Machinery, New York), 983–994.
- (2015) Continuous control with deep reinforcement learning. Preprint, submitted September 9, https://arxiv.org/abs/1509.02971.
- (2013) Algorithmic aspects of mean–variance optimization in Markov decision processes. Eur. J. Oper. Res. 231(3):645–653.
- (1969) Propagation of chaos for a class of non-linear parabolic equations. Stochastic Differential Equations, Lecture Series in Differential Equations, vol. 7 (Catholic University, Washington, DC), 41–57.
- (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
- (2019) Mean-field Markov decision processes with common noise and open-loop controls. Preprint, submitted December 17, https://arxiv.org/abs/1912.07883.
- (2017) Optimal social policies in mean field games. Appl. Math. Optim. 76(1):29–57.
- (2016) Discrete time McKean–Vlasov control problem: A dynamic programming approach. Appl. Math. Optim. 74(3):487–506.
- (2016) Safe, multi-agent, reinforcement learning for autonomous driving. Preprint, submitted October 11, https://arxiv.org/abs/1610.03295.
- (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
- (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
- (2009) Optimal Transport: Old and New, Grundlehren der Mathematischen Wissenschaften, vol. 338 (Springer, Berlin).
- (2019) AlphaStar: Mastering the real-time strategy game StarCraft II. Accessed June 15, 2019, https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii.
- (2020) Breaking the curse of many agents: Provable mean embedding Q-iteration for mean-field reinforcement learning. Daumé H, Singh A, eds. ICML’20 Proc. 37th Internat. Conf. Machine Learn. (JMLR.org), 10092–10103.
- (1989) Learning from delayed rewards. Unpublished PhD thesis, King’s College, Cambridge, UK.
- (1992) Q-learning. Machine Learn. 8(3-4):279–292.
- (2010) Classes of multiagent Q-learning dynamics with epsilon-greedy exploration. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 1167–1174.

