Learning to Price Supply Chain Contracts Against a Learning Retailer
Published Online: 21 Jul 2025
https://doi.org/10.1287/mnsc.2022.03339

