Learning to Price Supply Chain Contracts Against a Learning Retailer

Published Online: https://doi.org/10.1287/mnsc.2022.03339

References

  • Auer P, Gajane P, Ortner R (2018) Adaptively tracking the best arm with an unknown number of distribution changes. Proc. Thirty-Second Conf. Learning Theory, vol. 99 (PMLR, New York), 138–158.
  • Auer P, Chen Y, Gajane P, Lee CW, Luo H, Ortner R, Wei CY (2019) Achieving optimal dynamic regret for non-stationary bandits without prior information. Beygelzimer A, Hsu D, eds. Conf. Learn. Theory (PMLR, New York), 159–163.
  • Ban GY, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.
  • Bastani H, Simchi-Levi D, Zhu R (2021) Meta dynamic pricing: Transfer learning across experiments. Management Sci. 68(3):1865–1881.
  • Ben-Tal A, Den Hertog D, De Waegenaere A, Melenberg B, Rennen G (2013) Robust solutions of optimization problems affected by uncertain probabilities. Management Sci. 59(2):341–357.
  • Besbes O, Zeevi A (2011) On the minimax complexity of pricing in a changing environment. Oper. Res. 59(1):66–79.
  • Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Adv. Neural Inform. Processing Systems, vol. 27 (Curran Associates Inc., Red Hook, NY), 199–207.
  • Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.
  • Besbes O, Gur Y, Zeevi A (2019) Optimal exploration-exploitation in multi-armed-bandit problems with non-stationary rewards. Stoch. Syst. 9(4):319–337.
  • Birge JR, Chen H, Keskin NB (2025) Markdown policies for demand learning with forward-looking customers. Oper. Res., ePub ahead of print May 8, https://doi.org/10.1287/opre.2019.0402.
  • Birge JR, Chen H, Keskin NB, Ward A (2024) To interfere or not to interfere: Information revelation and price-setting incentives in a multiagent learning environment. Oper. Res. 72(6):2391–2412.
  • Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.
  • Cachon GP (2003) Supply chain coordination with contracts. Henderson SG, Nelson BL, eds. Handbooks in Operations Research and Management Science, vol. 11 (North Holland, Amsterdam), 227–339.
  • Cao Y, Wen Z, Kveton B, Xie Y (2019) Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 418–427.
  • Cesa-Bianchi N, Cesari T, Osogami T, Scarsini M, Wasserkrug S (2022) Online learning in supply-chain games. Preprint, submitted July 8, https://arxiv.org/abs/2207.04054.
  • Chen M, Chen ZL (2015) Recent developments in dynamic pricing research: Multiple products, competition, and limited demand information. Production Oper. Management 24(5):704–731.
  • Chen X, Wang Y, Wang YX (2019a) Nonstationary stochastic optimization under lp,q-variation measures. Oper. Res. 67(6):1752–1765.
  • Chen BB, Wang Y, Zhou Y (2022b) Optimal policies for dynamic pricing and inventory control with nonparametric censored demands. Management Sci. 70(5):3362–3380.
  • Chen Y, Lee CW, Luo H, Wei CY (2019b) A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free. Beygelzimer A, Hsu D, eds. Conf. Learn. Theory (PMLR, New York), 696–726.
  • Chen BB, Simchi-Levi D, Wang Y, Zhou Y (2022a) Dynamic pricing and inventory control with fixed ordering cost and incomplete demand information. Management Sci. 68(8):5684–5703.
  • Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.
  • Cheung WC, Simchi-Levi D, Zhu R (2019) Learning to optimize under non-stationarity. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1079–1087.
  • Cheung WC, Simchi-Levi D, Zhu R (2021) Hedging the drift: Learning to optimize under nonstationarity. Management Sci. 68(3):1696–1713.
  • Cohen A, Deligkas A, Koren M (2022) Learning approximately optimal contracts. Internat. Sympos. Algorithmic Game Theory (Springer, Berlin, Heidelberg), 331–346.
  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2022) Introduction to Algorithms (MIT Press, Cambridge, MA).
  • den Boer AV, Keskin NB (2020) Discontinuous demand functions: Estimation and pricing. Management Sci. 66(10):4516–4534.
  • den Boer AV, Keskin NB (2022) Dynamic pricing with demand learning and reference effects. Management Sci. 68(10):7112–7130.
  • EY Americas (2020) How the future of work will change the digital supply chain. Accessed October 4, 2022, https://www.ey.com/en_us/consulting/how-the-future-of-work-will-change-the-digital-supply-chain.
  • Feng Q, Zhu R (2023) Principro: Data-driven algorithms for joint pricing and inventory control under price protection. Preprint, submitted July 24, https://ssrn.com/abstract=4511384.
  • Feng Q, Zhu R, Jasin S (2023) Temporal fairness in learning and earning: Price protection guarantee and phase transitions. Proc. 24th ACM Conf. Econom. Comput. (Association for Computing Machinery, New York).
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
  • Gibbs AL, Su FE (2002) On choosing and bounding probability metrics. Internat. Statist. Rev. 70(3):419–435.
  • Golrezaei N, Manshadi V, Schneider J, Sekar S (2023) Learning product rankings robust to fake users. Oper. Res. 71(4):1171–1196.
  • Han M, Albert M, Xu H (2024) Learning in online principal-agent interactions: The power of menus. Proc. AAAI Conf. Artificial Intell., vol. 38 (AAAI Press, Palo Alto, CA), 17426–17434.
  • Ho CJ, Slivkins A, Vaughan JW (2016) Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. J. Artificial Intell. Res. 55:317–359.
  • Jia H, Shi C, Shen S (2022) Online learning and pricing for network revenue management with reusable resources. Preprint, submitted September 30, https://ssrn.com/abstract=4225832.
  • Karnin ZS, Anava O (2016) Multi-armed bandits: Competing with optimal sequences. Adv. Neural Inform. Processing Systems 29:199–207.
  • Keskin NB, Birge JR (2019) Dynamic selling mechanisms for product differentiation and learning. Oper. Res. 67(4):1069–1089.
  • Keskin NB, Zeevi A (2017) Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 42(2):277–307.
  • Keskin NB, Zeevi A (2018) On incomplete learning and certainty-equivalence control. Oper. Res. 66(4):1136–1167.
  • Keskin NB, Li Y, Song JS (2022) Data-driven dynamic pricing and ordering with perishable inventory in a changing environment. Management Sci. 68(3):1938–1958.
  • Keskin NB, Min X, Song JSJ (2021) The nonstationary newsvendor: Data-driven nonparametric learning. Preprint, submitted June 15, http://dx.doi.org/10.2139/ssrn.3866171.
  • Kleinberg R, Slivkins A, Upfal E (2019) Bandits and experts in metric spaces. J. ACM 66(4):1–77.
  • Kleywegt AJ, Shapiro A, Homem-de-Mello T (2002) The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2):479–502.
  • Levi R, Perakis G, Uichanco J (2015) The data-driven newsvendor problem: New bounds and insights. Oper. Res. 63(6):1294–1306.
  • Liu L, Rong Y (2024) A Stackelberg regret minimizing framework for online learning in newsvendor pricing games. Preprint, submitted March 30, https://arxiv.org/html/2404.00203v1.
  • Luo H, Wei CY, Agarwal A, Langford J (2018) Efficient contextual bandits in non-stationary worlds. Bubeck S, Perchet V, Rigollet P, eds. Conf. Learn. Theory (PMLR, New York), 1739–1776.
  • Lykouris T, Mirrokni V, Paes Leme R (2018) Stochastic bandits robust to adversarial corruptions. Proc. 50th Annual ACM SIGACT Sympos. Theory Comput. (Association for Computing Machinery, New York), 114–122.
  • Wei CY, Luo H (2021) Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach. Belkin M, Kpotufe S, eds. Conf. Learn. Theory (PMLR, New York), 4300–4354.
  • Wei L, Srivastava V (2018) On abruptly-changing and slowly-varying multiarmed bandit problems. 2018 Annual Amer. Control Conf. (ACC) (IEEE, Piscataway, NJ), 6291–6296.
  • Wei CY, Hong YT, Lu CJ (2016) Tracking the best expert in non-stationary stochastic environments. Adv. Neural Inform. Processing Systems, vol. 29 (Curran Associates Inc., Red Hook, NY).
  • Zhang K, Yang Z, Başar T (2021) Multi-agent reinforcement learning: A selective overview of theories and algorithms. Vamvoudakis KG, Wan Y, Lewis FL, Cansever D, eds. Handbook of Reinforcement Learning and Control (Springer, Berlin), 321–384.
  • Zhou TZ, Liu J, Dong C, Dong J (2021) Incentivized bandit learning with self-reinforcing user preferences. Proc. 38th Internat. Conf. Machine Learn. (PMLR, New York).
  • Zhu B, Bates S, Yang Z, Wang Y, Jiao J, Jordan MI (2022) The sample complexity of online contract design. Preprint, submitted November 10, https://arxiv.org/abs/2211.05732.