LEGO: Optimal Online Learning Under Sequential Price Competition
Abstract
We consider price competition among multiple sellers over a selling horizon of T periods. In each period, sellers simultaneously set prices and subsequently observe their own demand realizations, which are unobservable to competitors. The realized demand of each seller depends on the prices of all sellers and follows a private, unknown linear model. We propose a least-squares estimation and then gradient optimization (LEGO) policy, which does not require sellers to share demand information or coordinate price experiments throughout the selling horizon. We show that, when employed by all sellers, our policy converges to the Nash equilibrium prices at a rate on the order of , matching the outcome under full information, whereas each seller achieves regret of optimal order relative to a dynamic benchmark. Our analysis further shows that the unknown individual price sensitivity is the main source of difficulty in dynamic pricing under sequential competition, leading to worst-case regret of order . If each seller knows that his or her individual price sensitivity coefficient, then a gradient-based policy achieves a convergence rate of order to the Nash equilibrium as well as regret of optimal order .
Funding: S. Li and S. Mehrotra acknowledge support from the National Science Foundation [Grant CMMI-1763035]. C. Shi acknowledges support from an Amazon Research Award and a Provost Research Award from the University of Miami.
Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2024.1085.

