Volume 72, Issue 3

March 2026

Pages 1727-2679, iv-vi

Information

Received: October 28, 2022
Accepted: August 11, 2024
Published Online: July 21, 2025

Cite as

Xuejun Zhao, Ruihao Zhu, William B. Haskell (2025) Learning to Price Supply Chain Contracts Against a Learning Retailer. Management Science 72(3):2168-2187.

https://doi.org/10.1287/mnsc.2022.03339

Learning to Price Supply Chain Contracts Against a Learning Retailer
