Dynamic Programming Principles for Mean-Field Controls with Learning

Haotian Gu
https://orcid.org/0000-0002-0268-7147
Department of Mathematics, University of California, Berkeley, California 94720

Xin Guo (Corresponding Author)
https://orcid.org/0000-0002-3350-4606
Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720

Xiaoli Wei
https://orcid.org/0000-0002-4787-2856
Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518055, China

Renyuan Xu
https://orcid.org/0000-0003-4293-3450
Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90001

Published Online: 12 Jan 2023
https://doi.org/10.1287/opre.2022.2395

Abstract

The dynamic programming principle (DPP) is fundamental for control and optimization, including Markov decision problems (MDPs), reinforcement learning (RL), and, more recently, mean-field controls (MFCs). However, in the learning framework of MFCs, the DPP has not been rigorously established, despite its critical importance for algorithm design. In this paper, we first present a simple example in MFCs with learning where the DPP fails with a misspecified Q function and then propose the correct form of the Q function in an appropriate space for MFCs with learning. This particular form of the Q function is different from the classical one and is called the IQ function. In the special case when the transition probability and the reward are independent of the mean-field information, it integrates the classical Q function for single-agent RL over the state-action distribution. In other words, MFCs with learning can be viewed as lifting classical RL by replacing the state-action space with its probability distribution space. This identification of the IQ function enables us to establish the DPP precisely in the learning framework of MFCs. Finally, we illustrate through numerical experiments the time consistency of this IQ function.
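
The special case mentioned in the abstract can be illustrated with a minimal sketch. It assumes a finite state-action space, a state distribution `mu`, and a randomized local policy `h`, so that the state-action distribution is `mu[s] * h[s, a]`; this notation, the toy MDP, and the helper names `classical_q`, `iq`, and `next_mu` are illustrative assumptions, not taken from the paper. The sketch computes the classical Q function by value iteration, integrates it over the state-action distribution, and checks numerically that the result satisfies a Bellman equation on the lifted space of distributions.

```python
import numpy as np

# Hypothetical toy MDP with 2 states and 2 actions; all numbers are illustrative.
# P[s, a, t] is the probability of moving from state s to state t under action a.
# Neither P nor R depends on the mean field (the special case in the abstract).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])  # R[s, a]: one-step reward
gamma = 0.9


def classical_q(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Classical single-agent optimal Q function via value iteration."""
    Q = np.zeros_like(R)
    for _ in range(max_iter):
        # Bellman optimality update: Q(s,a) = R(s,a) + gamma * E[max_a' Q(s',a')]
        Q_new = R + gamma * (P @ Q.max(axis=1))
        if np.abs(Q_new - Q).max() < tol:
            break
        Q = Q_new
    return Q_new


def iq(mu, h, Q):
    """Integrate Q over the state-action distribution mu[s] * h[s, a]."""
    return float(np.sum(mu[:, None] * h * Q))


def next_mu(mu, h, P):
    """Push the state distribution forward one step under local policy h."""
    return np.einsum("s,sa,sat->t", mu, h, P)


Q = classical_q(P, R, gamma)
mu = np.array([0.3, 0.7])            # state distribution (assumed)
h = np.array([[0.5, 0.5],
              [0.2, 0.8]])           # h[s, a]: probability of action a in state s

# Lifted Bellman equation: integrated reward plus the discounted best
# integrated Q at the next distribution. The supremum over next-step local
# policies is attained by acting greedily in every state.
lhs = iq(mu, h, Q)
rhs = float(np.sum(mu[:, None] * h * R)) + gamma * float(next_mu(mu, h, P) @ Q.max(axis=1))
print(lhs, rhs)
```

In this sketch the two printed values agree up to the value-iteration tolerance. When the transition probability or the reward does depend on the mean field, this simple integration no longer applies, and the lifted formulation over distributions is needed in full.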

Funding: Financial support from the Coleman Fung Chair Endowment Fund and the Tsinghua-Berkeley-Shenzhen-Institute is gratefully acknowledged.

Volume 71, Issue 4

July-August 2023

Pages iii-vi, 1021-1439, C2-C3

Article Information

Received: November 14, 2020
Accepted: August 24, 2022
Published Online: January 12, 2023

Cite as

Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu (2023) Dynamic Programming Principles for Mean-Field Controls with Learning. Operations Research 71(4):1040-1054.

https://doi.org/10.1287/opre.2022.2395

Acknowledgments

The authors thank the area editor, associate editor, and two anonymous referees whose comments helped them significantly strengthen both the theoretical and computational results.
