Adaptive Learning in Uncertain and Sequential Competition

Shukai Li
https://orcid.org/0000-0003-3540-5035
New York University Shanghai, Shanghai 200124, China

Sanjay Mehrotra (Corresponding Author)
https://orcid.org/0000-0003-1106-1901
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208

Published Online: 4 Nov 2025
https://doi.org/10.1287/opre.2024.0825

Volume 74, Issue 1

January-February 2026

Pages iii-vii, 1-571, C2-C3

Received: February 18, 2024
Accepted: July 18, 2025
Published Online: November 4, 2025

Cite as

Shukai Li, Sanjay Mehrotra (2025) Adaptive Learning in Uncertain and Sequential Competition. Operations Research 74(1):301-338.

https://doi.org/10.1287/opre.2024.0825
