Online Pricing with Offline Data: Phase Transition and Inverse Square Law

Published Online: https://doi.org/10.1287/mnsc.2022.4322

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Adv. Neural Inform. Processing Systems 24:2312–2320.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the MNL-bandit. Proc. 2017 Conf. on Learn. Theory (PMLR) 65:76–78.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2-3):235–256.
  • Ban GY, Keskin NB (2021) Personalized dynamic pricing with machine learning: High dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.
  • Bastani H, Simchi-Levi D, Zhu R (2022) Meta dynamic pricing: Learning across experiments. Management Sci. 68(3):1865–1881.
  • Bouneffouf D, Parthasarathy S, Samulowitz H, Wistub M (2019) Optimal exploitation of clustering and history information in multi-armed bandit. Preprint, submitted May 31, https://arxiv.org/abs/1906.03979.
  • Bu J, Simchi-Levi D, Xu Y (2020) Online pricing with offline data: Phase transition and inverse square law. Proc. 37th Internat. Conf. Machine Learn. (PMLR), 119:1202–1210.
  • Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press).
  • Correa J, Dütting P, Fischer F, Schewior K (2021) Prophet inequalities for independent and identically distributed random variables from an unknown distribution. Math. Oper. Res., ePub ahead of print December 20, https://doi.org/10.1287/mnsc.2021.1167.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Conf. on Learn. Theory. (COLT 2008), 355–366.
  • den Boer AV (2014) Dynamic pricing with multiple products and partially specified demand distribution. Math. Oper. Res. 39(3):863–888.
  • den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Survey Oper. Res. Management Sci. 20(1):1–18.
  • den Boer AV, Keskin NB (2017) Dynamic pricing with demand learning and reference effects. Preprint, submitted September 16, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3092745.
  • den Boer AV, Zwart B (2013) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.
  • Domb C (2000) Phase Transitions and Critical Phenomena, vol. 19 (Elsevier, New York).
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
  • Filippi S, Cappe O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Adv. Neural Inform. Processing Systems 23:586–594.
  • Gill R, Levit B (2001) Applications of the van Trees inequality: A Bayesian Cramér-Rao bound. Bernoulli 1:59.
  • Gur Y, Momeni A (2019) Adaptive sequential experiments with unknown information flows. Preprint, submitted December 18, https://arxiv.org/abs/1907.00107.
  • Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.
  • Hastie T, Tibshirani R, Friedman J, Franklin J (2005) The elements of statistical learning: data mining, inference and prediction. Math. Intelligencer 27(2):83–85.
  • Hsu CW, Kveton B, Meshi O, Martin M, Szepesvari C (2019) Empirical Bayes regret minimization. Preprint, submitted June 10, https://arxiv.org/abs/1904.02664.
  • Keskin N, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.
  • Keskin NB, Zeevi A (2016) Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 42(2):277–307.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Miao S, Chao X (2020) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing Service Oper. Management 23(2):525–545.
  • Nambiar M, Simchi-Levi D, Wang H (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.
  • Qiang S, Bayati M (2016) Dynamic pricing with demand covariates. Preprint, submitted June 1, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2765257.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Shivaswamy P, Joachims T (2012) Multi-armed bandit problems with history. Proc. 15th Internat. Conf. Artificial Intelligence and Statistics (PMLR), 22:1046–1054.
  • Simchi-Levi D, Xu Y (2019) Phase transitions in bandits with switching constraints. Preprint, submitted March 18, https://arxiv.org/abs/1905.10825.
  • Tsybakov A (2009) Introduction to Nonparametric Estimation (Springer, New York).
  • Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.
  • Ye L, Lin Y, Xie H, Lui J (2020) Combining offline causal inference and online bandit learning for data driven decisions. Preprint, submitted November 7, https://arxiv.org/abs/2001.05699.