LEGO: Optimal Online Learning Under Sequential Price Competition

Shukai Li
https://orcid.org/0000-0003-3540-5035
Operations and Business Analytics, New York University Shanghai, Shanghai 200124, China

Cong Shi (Corresponding Author)
https://orcid.org/0000-0003-3564-3391
Department of Management, Miami Herbert Business School, University of Miami, Coral Gables, Florida 33146

Sanjay Mehrotra
https://orcid.org/0000-0003-1106-1901
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208

Published Online: 3 Jun 2026
https://doi.org/10.1287/opre.2024.1085

References

Abada I, Lambin X (2023) Artificial intelligence: Can seemingly collusive outcomes be avoided? Management Sci. 69(9):5042–5065.
Aksoy-Pierson M, Allon G, Federgruen A (2013) Price competition under mixed multinomial logit demand functions. Management Sci. 59(8):1817–1835.
Allon G, Gurvich I (2010) Pricing and dimensioning competing large-scale service providers. Manufacturing Service Oper. Management 12(3):449–469.
Alptekinoğlu A, Semple JH (2016) The exponomial choice model: A new alternative for assortment and price optimization. Oper. Res. 64(1):79–93.
Aouad A, den Boer AV (2021) Algorithmic collusion in assortment games. Working paper, London Business School, London, UK.
Arcieri K (2025) Algorithmic pricing gets boost in ninth cir. hotel-casino ruling. Bloomberg Law. Accessed October 17, 2025, https://news.bloomberglaw.com/antitrust/algorithmic-pricing-gets-boost-in-ninth-cir-hotel-casino-ruling.
Asker J, Fershtman C, Pakes A (2022) Artificial intelligence, algorithm design, and pricing. AEA Papers Proc. 112:452–456.
Ba W, Lin T, Zhang J, Zhou Z (2025) Doubly optimal no-regret online learning in strongly monotone games with bandit feedback. Oper. Res. 73(6):3219–3244.
Banchio M, Mantegazza G (2023) Artificial intelligence and spontaneous collusion. Working paper, Bocconi University, Milan, Italy.
Bertrand J (1883) Théorie mathématique de la richesse sociale. J. Des Savants 67(1883):499–508.
Besbes O, Sauré D (2016) Product assortment and price competition under multinomial logit demand. Production Oper. Management 25(1):114–127.
Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.
Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.
Birge JR, Chen H, Keskin NB, Ward A (2024) To interfere or not to interfere: Information revelation and price-setting incentives in a multiagent learning environment. Oper. Res. 72(6):2391–2412.
Bravo M, Leslie DS, Mertikopoulos P (2018) Bandit learning in concave N-person games. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 5666–5676.
Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.
Calvano E, Calzolari G, Denicolò V, Pastorello S (2020) Artificial intelligence, algorithmic pricing, and collusion. Amer. Econom. Rev. 110(10):3267–3297.
Cartea Á, Chang P, Penalva J (2022a) Algorithmic collusion in electronic markets: The impact of tick size. Working paper, University of Oxford, Oxford, UK.
Cartea Á, Chang P, Penalva J, Waldon H (2022b) The algorithmic learning equations: Evolving strategies in dynamic games. Working paper, University of Oxford, Oxford, UK.
Cartea Á, Chang P, Penalva J, Waldon H (2026) Algorithmic collusion and a folk theorem from learning with bounded rationality. Games Econom. Behav. 157:1–21.
Chen N, Chen YJ (2021) Duopoly competition with network effects in discrete choice models. Oper. Res. 69(2):545–559.
Chen Y, Shi C (2023) Network revenue management with online inverse batch gradient descent method. Production Oper. Management 32(7):2123–2137.
Chen B, Chao X, Shi C (2021) Nonparametric learning algorithms for joint pricing and inventory control with lost sales and censored demand. Math. Oper. Res. 46(2):726–756.
Cheung WC, Simchi-Levi D, Zhu R (2022) Hedging the drift: Learning to optimize under nonstationarity. Management Sci. 68(3):1696–1713.
Cont R, Xiong W (2024) Dynamics of market making algorithms in dealer markets: Learning and tacit collusion. Math. Finance 34(2):467–521.
Cooper WL, Homem-de-Mello T, Kleywegt AJ (2015) Learning and pricing with models that do not explicitly incorporate competition. Oper. Res. 63(1):86–103.
Cournot AA (1838) Recherches Sur Les Principes Mathématiques de la Théorie Des Richesses (L. Hachette, Paris).
den Boer AV (2023) A (mathematical) definition of algorithmic collusion. Working paper, University of Amsterdam, Amsterdam.
den Boer AV, Zwart B (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.
den Boer AV, Meylahn JM, Schinkel MP (2022) Artificial collusion: Examining supracompetitive pricing by Q-learning algorithms. Working paper, University of Amsterdam, Amsterdam.
Epivent A, Lambin X (2024) On algorithmic collusion and reward-punishment schemes. Econom. Lett. 237:111661.
Eschenbaum N, Mellgren F, Zahn P (2022) Robust algorithmic collusion. Working paper, University of St. Gallen, St. Gallen, Switzerland.
Federgruen A, Hu M (2015) Multi-product price and assortment competition. Oper. Res. 63(3):572–584.
Federgruen A, Hu M (2016) Sequential multiproduct price competition in supply chain networks. Oper. Res. 64(1):135–149.
Federgruen A, Hu M (2021) Global robust stability in a general price and assortment competition model. Oper. Res. 69(1):164–174.
Gallego G, Hu M (2014) Dynamic pricing of perishable assets under competition. Management Sci. 60(5):1241–1259.
Gallego G, Wang R (2014) Multiproduct price optimization and competition under the nested logit model with product-differentiated price sensitivities. Oper. Res. 62(2):450–461.
Gallego G, Huh WT, Kang W, Phillips R (2006) Price competition with the attraction demand model: Existence of unique equilibrium and its stability. Manufacturing Service Oper. Management 8(4):359–375.
Gershgorn D (2024) The best mini desktop PCs. New York Times. Accessed April 15, 2024, https://www.nytimes.com/wirecutter/reviews/best-mini-desktop-pcs/.
Golowich N, Pattathil S, Daskalakis C (2020) Tight last-iterate convergence rates for no-regret learning in multi-player games. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin HT, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 20766–20778.
Golrezaei N, Jaillet P, Liang JCN (2020) No-regret learning in price competitions under consumer reference effects. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin HT, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 21416–21427.
Goyal V, Li S, Mehrotra S (2023) Learning to price under competition for multinomial logit demand. Working paper, Northwestern University, Evanston, IL.
Guo MA, Ying D, Lavaei J, Shen ZJM (2026) Last-iterate convergence in no-regret learning: Games with reference effects under logit demand. Management Sci. 72(2):1007–1024.
Hansen KT, Misra K, Pai MM (2021) Frontiers: Algorithmic collusion: Supra-competitive prices via independent algorithms. Marketing Sci. 40(1):1–12.
Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.
Hettich M (2021) Algorithmic collusion: Insights from deep learning. Working paper, University of Muenster, Muenster, Germany.
Hsieh YG, Antonakopoulos K, Mertikopoulos P (2021) Adaptive learning in continuous games: Optimal regret bounds and convergence to Nash equilibrium. Belkin M, Kpotufe S, eds. Proc. 34th Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 134 (PMLR, Brookline, MA), 2388–2422.
Jordan MI, Lin T, Zhou Z (2025) Adaptive, doubly optimal no-regret learning in strongly monotone and exp-concave games with gradient feedback. Oper. Res. 73(3):1675–1702.
Kachani S, Perakis G, Simon C (2007) Modeling the transient nature of dynamic pricing with demand learning in a competitive environment. Nagurney A, ed. Network Science, Nonlinear Science and Infrastructure Systems (Springer, Berlin), 223–267.
Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.
Kirman AP (1975) Learning by firms about demand conditions. Day RH, Groves T, eds. Adaptive Economic Models (Academic Press, New York), 137–156.
Kirman A (1983) On mistaken beliefs and resultant equilibria. Frydman R, Phelps ES, eds. Individual Forecasting and Aggregate Outcomes (Cambridge University Press, Cambridge, UK), 147–166.
Klein T (2018) Assessing autonomous algorithmic collusion: Q-learning under short-run price commitments. Working paper, University of Amsterdam, Amsterdam.
Klein T (2021) Autonomous algorithmic collusion: Q-learning under sequential pricing. RAND J. Econom. 52(3):538–558.
Lai TL, Robbins H (1982) Iterated least squares in multiperiod control. Adv. Appl. Math. 3(1):50–73.
Li S, Mehrotra S (2026) Adaptive learning in uncertain and sequential competition. Oper. Res. 74(1):301–338.
Li S, Luo Q, Huang Z, Shi C (2025) Online learning for constrained assortment optimization under Markov chain choice model. Oper. Res. 73(1):109–138.
Lin T, Zhou Z, Mertikopoulos P, Jordan MI (2020) Finite-time last-iterate convergence for multi-agent learning in games. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, Brookline, MA), 6161–6171.
Loots T, den Boer AV (2022) Data-driven collusion and competition in a pricing duopoly with multinomial logit demand. Production Oper. Management 31(1):45–63.
Mao W, Zhang K, Zhu R, Simchi-Levi D, Başar T (2025) Model-free nonstationary reinforcement learning: Near-optimal regret and applications in multiagent reinforcement learning and inventory control. Management Sci. 71(2):1564–1580.
Martin N (2019) Uber charges more if they think you’re willing to pay more. Forbes. Accessed April 14, 2024, https://www.forbes.com/sites/nicolemartin1/2019/03/30/uber-charges-more-if-they-think-youre-willing-to-pay-more/?sh=1f8993647365.
McKinsey and Company (2023) What is fast fashion? McKinsey & Company. Accessed April 13, 2024, https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-fast-fashion.
Mertikopoulos P, Zhou Z (2019) Learning in games with continuous action sets and unknown payoff functions. Math. Programming 173:465–507.
Meylahn JM (2023a) Does an intermediate price facilitate algorithmic collusion? Working paper, University of Twente, Enschede, The Netherlands.
Meylahn JM (2023b) Weak acyclicity in games with unique best-responses and implications for algorithmic collusion. Working paper, University of Twente, Enschede, The Netherlands.
Meylahn JM, den Boer AV (2022) Learning to collude in a pricing duopoly. Manufacturing Service Oper. Management 24(5):2577–2594.
Morrow WR, Skerlos SJ (2011) Fixed-point approaches to computing Bertrand-Nash equilibrium prices under mixed-logit demand. Oper. Res. 59(2):328–345.
Nemirovski AS, Yudin DB (1983) Problem Complexity and Method Efficiency in Optimization (Wiley-Interscience, New York).
Phillips RL (2005) Pricing and Revenue Optimization (Stanford University Press, Stanford, California).
Qin H (2023) Boba’s boom: Reshaping the U.S. beverage landscape. Michigan Journal of Economics. Accessed April 13, 2024, https://sites.lsa.umich.edu/mje/2023/12/04/bobas-boom-reshaping-the-u-s-beverage-landscape/.
Ren D (2023) Huawei and Xiaomi launch new EV models in China, reigniting worries about price wars in the world’s largest EV market. South China Morning Post. Accessed April 13, 2024, https://www.scmp.com/business/china-business/article/3246414/huawei-and-xiaomi-launch-new-ev-models-china-reigniting-worries-about-price-wars-worlds-largest-ev.
Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.
Scott M, Stillman A, Simon Z, Tanakasempipat P, Burkhardt P, Lorinc J (2023) EV market’s surge toward $57 trillion sparks global flashpoints. Bloomberg. Accessed April 13, 2024, https://www.bloomberg.com/news/features/2023-11-07/the-57-trillion-ev-market-is-a-battleground-for-china-us-eu?itm_source=record&itm_campaign=EV_Slowdown&itm_content=$57_Trillion_Market-4.
Tesauro G, Kephart JO (2002) Pricing in agent economies using multi-agent Q-learning. Autonomous Agents Multi Agent Systems 5(3):289–304.
Valinsky J (2024) Wendy’s will test new menus that change prices throughout the day. CNN. Accessed April 14, 2024, https://www.cnn.com/2024/02/27/food/wendys-test-surge-pricing/index.html.
Waltman L, Kaymak U (2008) Q-learning agents in a Cournot oligopoly model. J. Econom. Dynam. Control 32(10):3275–3293.
Wang R, Ke C, Cui S (2022) Product price, quality, and service decisions under consumer choice models. Manufacturing Service Oper. Management 24(1):430–447.
Yang Y, Lee YC, Chen PA (2024) Competitive demand learning: A noncooperative pricing algorithm with coordinated price experimentation. Production Oper. Management 33(1):48–68.

Articles In Advance

Information

Received: June 04, 2024
Accepted: April 30, 2026
Published Online: June 03, 2026

Cite as

Shukai Li, Cong Shi, Sanjay Mehrotra (2026) LEGO: Optimal Online Learning Under Sequential Price Competition. Operations Research 0(0).

https://doi.org/10.1287/opre.2024.1085

Acknowledgments

The authors thank area editor Professor Xi Chen, the associate editor, and two anonymous referees for their careful reading and constructive comments, which led to several substantial improvements to the paper. Shukai Li and Sanjay Mehrotra acknowledge support from the National Science Foundation [Grant CMMI-1763035]. Cong Shi acknowledges support from an Amazon Research Award and a Provost Research Award from the University of Miami.
