LEGO: Optimal Online Learning Under Sequential Price Competition

Published Online: https://doi.org/10.1287/opre.2024.1085
