Optimality of Symmetric Independent Policies Under Decentralized Mean-Field Information Sharing for Stochastic Teams and Equivalence with McKean–Vlasov Control of a Representative Agent

Published Online: https://doi.org/10.1287/moor.2024.0489

We study a class of stochastic exchangeable teams with a finite number of decision makers (DMs) as well as their mean-field limits with infinitely many DMs. In the finite-population regime, we study exchangeable teams under the centralized information structure. For the infinite-population setting, we study both the centralized information structure and the decentralized mean-field information-sharing structure. The paper makes the following main contributions. (i) For finite-population exchangeable teams, we establish the existence of an optimal policy that is exchangeable (permutation invariant) and Markovian. (ii) As our main result, we show that a sequence of exchangeable optimal policies for finite-population settings (which satisfies a measure-valued Markov decision problem (MDP) formulation, following a work by Bäuerle) converges to a decentralized, symmetric (identical), and conditionally independent (given the mean field) policy for the infinite-population problem, which is then globally optimal under both the centralized information structure and the mean-field-sharing information structure. (iii) This result establishes the existence of a symmetric, independent, decentralized optimal randomized policy for the infinite-population problem and proves the optimality of the limiting measure-valued MDP for the representative DM. Our paper thus establishes, for the first time to our knowledge (beyond several special cases), the relation between controlled McKean–Vlasov dynamics and the optimal infinite-population decentralized stochastic control problem, without an a priori restriction of symmetry in the policies of individual agents. We also establish near optimality of a numerical method for solving this problem. (iv) Finally, we show that symmetric, independent, decentralized optimal randomized policies are approximately optimal for the corresponding finite-population team with a large number of DMs under the centralized information structure.

Funding: The research of S. Sanjari and S. Yüksel was supported by the Natural Sciences and Engineering Research Council of Canada.
