Adaptive Learning in Uncertain and Sequential Competition

Published Online: https://doi.org/10.1287/opre.2024.0825

We investigate an individual’s decision-making problem in a competitive and uncertain environment, where N learners (decision makers) confront unknown objective functions, lack competitor data, and optimize actions over a finite horizon of T epochs. Within a general framework, we explore what conditions ensure good performance of learning policies solely based on individual data. We show that when learner objective functions exhibit a tâtonnement stability property and individual data are informative regarding the learner’s best response to competitor actions, individual data alone are sufficient for designing a learning policy that, when employed by all learners, leads to Nash equilibrium. Specifically, under our learning policy, the worst-off learners within each epoch make progress toward Nash equilibrium. The convergence rate is O(1/T) under noise-free feedback and O(T^{1/3} log T) under noisy feedback, with constants independent of N. Simultaneously, each learner attains sublinear regret relative to a dynamic benchmark: O(log T) under noise-free feedback and O(T^{2/3} log T) under noisy feedback. We illustrate our informative individual data conditions and learning policy using applications from a repeated newsvendor-type competition with demand substitution and a multiseller multiproduct repeated price competition.
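
As a rough illustration of the tâtonnement idea mentioned above, the following minimal sketch iterates best responses in a toy two-seller price competition. It is not the learning policy studied in the paper: the linear demand model, the parameters `a`, `b`, `c`, and the function names are all hypothetical, and each seller is assumed to know its best response exactly, whereas the paper's learners must infer it from their own data.

```python
# Toy tatonnement (iterated best response) in a symmetric two-seller price game.
# Assumed demand for seller i: d_i = a - b * p_i + c * p_j, with zero cost,
# so profit is p_i * d_i. All names and parameter values are illustrative.

def best_response(p_other, a=10.0, b=2.0, c=1.0):
    """Price maximizing p * (a - b * p + c * p_other) for a fixed rival price."""
    return (a + c * p_other) / (2.0 * b)


def nash_price(a=10.0, b=2.0, c=1.0):
    """Symmetric Nash equilibrium price: the fixed point of best_response."""
    return a / (2.0 * b - c)


def tatonnement(p0, rounds=50, **params):
    """Both sellers simultaneously best-respond to the previous round's prices."""
    prices = list(p0)
    for _ in range(rounds):
        prices = [
            best_response(prices[1], **params),
            best_response(prices[0], **params),
        ]
    return prices


if __name__ == "__main__":
    final = tatonnement([0.0, 8.0])
    print("prices after tatonnement:", final)
    print("Nash equilibrium price:  ", nash_price())
```

In this toy model the best-response map is a contraction because c / (2b) < 1, which is why the iteration settles at the equilibrium price from any starting point. This is one simple sense in which a game can be stable under tâtonnement.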

Funding: The work of the authors was supported by the National Science Foundation [grant CMMI-1763035].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2024.0825.
