Optimality of Symmetric Independent Policies Under Decentralized Mean-Field Information Sharing for Stochastic Teams and Equivalence with McKean–Vlasov Control of a Representative Agent
Abstract
We study a class of stochastic exchangeable teams with a finite number of decision makers (DMs) as well as their mean-field limits with infinitely many DMs. In the finite-population regime, we study exchangeable teams under the centralized information structure. For the infinite-population setting, we study both the centralized information structure and the decentralized mean-field information-sharing structure. The paper makes the following main contributions. (i) For finite-population exchangeable teams, we establish the existence of an optimal policy that is exchangeable (permutation invariant) and Markovian. (ii) As the main result of the paper, we show that a sequence of exchangeable optimal policies for the finite-population settings (which satisfies a measure-valued Markov decision problem (MDP) formulation, following a work by Bäuerle) converges to a decentralized, symmetric (identical), and conditionally independent (given the mean field) policy for the infinite-population problem, which is then globally optimal under both the centralized information structure and the mean-field-sharing information structure. (iii) This result establishes the existence of a symmetric, independent, decentralized optimal randomized policy for the infinite-population problem and proves the optimality of the limiting measure-valued MDP for the representative DM. Our paper thus establishes, for the first time to our knowledge (beyond several special cases), the relation between controlled McKean–Vlasov dynamics and the optimal infinite-population decentralized stochastic control problem, without an a priori restriction of symmetry in the policies of individual agents. We also establish near optimality of a numerical method for solving this problem. (iv) Finally, we show that symmetric, independent, decentralized optimal randomized policies are approximately optimal for the corresponding finite-population team with a large number of DMs under the centralized information structure.
Funding: The research of S. Sanjari and S. Yüksel was supported by the Natural Sciences and Engineering Research Council of Canada.

