Policy Learning with Competing Agents

Roshni Sahoo (Corresponding Author)
https://orcid.org/0000-0002-7424-445X
Department of Computer Science, Stanford University, Stanford, California 94305

Stefan Wager
Graduate School of Business, Stanford University, Stanford, California 94305

Published Online: 28 Apr 2025
https://doi.org/10.1287/opre.2022.0687

Abstract

Decision makers often aim to learn a treatment assignment policy under a capacity constraint on the number of agents that they can treat. When agents can respond strategically to such policies, competition arises, complicating estimation of the optimal policy. In this paper, we study capacity-constrained treatment assignments in the presence of such interference. We consider a dynamic model in which the decision maker allocates treatments at each time step and heterogeneous agents myopically best respond to the previous treatment assignment policy. When the number of agents is large but finite, we show that the threshold for receiving treatment under a given policy converges to the policy’s mean-field equilibrium threshold. Based on this result, we develop a consistent estimator for the policy gradient. In a semisynthetic experiment with data from the National Education Longitudinal Study of 1988, we demonstrate that this estimator can be used for learning capacity-constrained policies in the presence of strategic behavior.
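The dynamics described in the abstract can be illustrated with a small simulation. The sketch below is a toy model and is not the authors' model, estimator, or code: the Gaussian baseline scores, the quadratic effort cost, the noise level `sigma`, the effort grid, and the names `best_response` and `simulate` are all assumptions chosen for illustration. At each step the decision maker treats the top `capacity` fraction of agents by noisy score, and each agent picks the effort that maximizes its probability of treatment, net of cost, against the previous step's threshold. Under these assumptions the realized threshold tends to settle near a stable value after a few steps, up to sampling noise.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def best_response(x, cost, prev_threshold, sigma, grid):
    """Myopic best response (hypothetical form): choose the effort on `grid`
    that maximizes P(score >= previous threshold) minus a quadratic cost."""
    def utility(effort):
        p_treated = normal_cdf((x + effort - prev_threshold) / sigma)
        return p_treated - 0.5 * cost * effort * effort
    return max(grid, key=utility)


def simulate(n=2000, capacity=0.3, steps=12, sigma=1.0, seed=0):
    """Return the sequence of realized treatment thresholds over `steps` periods."""
    rng = random.Random(seed)
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]   # heterogeneous agents
    costs = [rng.uniform(1.5, 3.0) for _ in range(n)]    # heterogeneous effort costs
    grid = [0.05 * j for j in range(11)]                 # candidate efforts in [0, 0.5]
    n_treated = int(capacity * n)                        # capacity constraint
    threshold = 0.0                                      # arbitrary starting threshold
    thresholds = []
    for _ in range(steps):
        # Agents respond to the threshold from the previous period.
        scores = [
            x + best_response(x, c, threshold, sigma, grid) + rng.gauss(0.0, sigma)
            for x, c in zip(baseline, costs)
        ]
        # Treat the top n_treated agents; the threshold is the lowest treated score.
        ranked = sorted(scores, reverse=True)
        threshold = ranked[n_treated - 1]
        thresholds.append(threshold)
    return thresholds


if __name__ == "__main__":
    for t, s in enumerate(simulate(), start=1):
        print(f"step {t:2d}: threshold = {s:.3f}")
```

Increasing `n` in this sketch shrinks the step-to-step fluctuation of the threshold, which is the qualitative behavior the abstract attributes to the large-but-finite regime.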

Funding: This work was supported by the National Science Foundation (NSF) [Grant SES-2242876]. R. Sahoo is supported by the NSF Graduate Research Fellowship Program [Grant DGE-1656518], a Stanford University Data Science Fellowship, and a Stanford University Ethics in Society Fellowship.

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2022.0687.

Volume 73, Issue 5

September-October 2025

Pages iii-vii, 2297-2866, C2-C3

Information

Received: December 27, 2022
Accepted: March 23, 2025
Published Online: April 28, 2025

Cite as

Roshni Sahoo, Stefan Wager (2025) Policy Learning with Competing Agents. Operations Research 73(5):2515-2529.

https://doi.org/10.1287/opre.2022.0687

Acknowledgments

The code and data to support the numerical experiments in this paper can be found at https://github.com/roshni714/policy-learning-competing-agents.
