LEGO: Optimal Online Learning Under Sequential Price Competition

Published Online:https://doi.org/10.1287/opre.2024.1085

We consider price competition among multiple sellers over a selling horizon of T periods. In each period, sellers simultaneously set prices and subsequently observe their own demand realizations, which are unobservable to competitors. The realized demand of each seller depends on the prices of all sellers and follows a private, unknown linear model. We propose a least-squares estimation and then gradient optimization (LEGO) policy, which does not require sellers to share demand information or coordinate price experiments throughout the selling horizon. We show that, when employed by all sellers, our policy converges to the Nash equilibrium prices at a rate on the order of 1/T, matching the outcome under full information, whereas each seller achieves regret of optimal order T relative to a dynamic benchmark. Our analysis further shows that the unknown individual price sensitivity is the main source of difficulty in dynamic pricing under sequential competition, leading to worst-case regret of order T. If each seller knows that his or her individual price sensitivity coefficient, then a gradient-based policy achieves a convergence rate of order 1/T to the Nash equilibrium as well as regret of optimal order logT.

Funding: S. Li and S. Mehrotra acknowledge support from the National Science Foundation [Grant CMMI-1763035]. C. Shi acknowledges support from an Amazon Research Award and a Provost Research Award from the University of Miami.

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2024.1085.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.