Policy Learning with Competing Agents

Published Online: https://doi.org/10.1287/opre.2022.0687

Decision makers often aim to learn a treatment assignment policy under a capacity constraint on the number of agents that they can treat. When agents can respond strategically to such policies, competition arises, complicating estimation of the optimal policy. In this paper, we study capacity-constrained treatment assignments in the presence of such interference. We consider a dynamic model in which the decision maker allocates treatments at each time step and heterogeneous agents myopically best respond to the previous treatment assignment policy. When the number of agents is large but finite, we show that the threshold for receiving treatment under a given policy converges to the policy’s mean-field equilibrium threshold. Based on this result, we develop a consistent estimator for the policy gradient. In a semisynthetic experiment with data from the National Education Longitudinal Study of 1988, we demonstrate that this estimator can be used for learning capacity-constrained policies in the presence of strategic behavior.
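As a rough illustration of the dynamic described above, the following sketch simulates a capacity-constrained threshold rule with agents who myopically best respond to the previous period's threshold. It is a toy under stated assumptions, not the paper's model or estimator: the Gaussian abilities, Gaussian score noise, quadratic effort cost, grid search over effort, and all function and parameter names (`simulate_thresholds`, `best_response_effort`, `capacity`, `noise_sd`, `cost`) are hypothetical choices made only for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def best_response_effort(ability, prev_threshold, noise_sd, cost, effort_grid):
    """Myopic best response (assumed form): choose the effort that maximizes
    the probability of clearing the PREVIOUS threshold minus a quadratic cost."""
    def utility(effort):
        z = (ability + effort - prev_threshold) / noise_sd
        return normal_cdf(z) - cost * effort ** 2
    return max(effort_grid, key=utility)


def simulate_thresholds(n_agents=1000, n_steps=10, capacity=0.3,
                        noise_sd=1.0, cost=1.0, seed=0):
    """Return the sequence of treatment thresholds over time.

    At each step a fresh population of agents is drawn (an assumption),
    each agent best responds to the last threshold, and the decision maker
    treats the top `capacity` fraction of observed scores.
    """
    rng = random.Random(seed)
    effort_grid = [0.02 * k for k in range(21)]  # candidate efforts in [0, 0.4]
    threshold = 0.0  # arbitrary starting threshold
    history = []
    for _ in range(n_steps):
        scores = []
        for _ in range(n_agents):
            ability = rng.gauss(0.0, 1.0)
            effort = best_response_effort(
                ability, threshold, noise_sd, cost, effort_grid)
            # Observed score = ability + strategic effort + idiosyncratic noise.
            scores.append(ability + effort + rng.gauss(0.0, noise_sd))
        scores.sort(reverse=True)
        n_treated = int(capacity * n_agents)
        threshold = scores[n_treated - 1]  # lowest score that still gets treated
        history.append(threshold)
    return history


if __name__ == "__main__":
    for n in (200, 5000):
        path = simulate_thresholds(n_agents=n, seed=1)
        print(n, [round(s, 3) for s in path])
```

Running the script for a small and a large population gives a feel for the qualitative claim in the abstract: in this toy setup, the threshold path should fluctuate less around its limiting value as the number of agents grows. The specific numbers depend entirely on the assumed distributions and cost.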

Funding: This work was supported by the National Science Foundation (NSF) [Grant SES-2242876]. R. Sahoo is supported by the NSF Graduate Research Fellowship Program [Grant DGE-1656518], a Stanford University Data Science Fellowship, and a Stanford University Ethics in Society Fellowship.

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2022.0687.
