Mean-Field Multiagent Reinforcement Learning: A Decentralized Network Approach