Multimodal Dynamic Pricing

Published Online: https://doi.org/10.1287/mnsc.2020.3819

References

  • Abbasi-Yadkori Y, Pal D, Szepesvari C (2012) Online-to-confidence-set conversions and application to sparse stochastic bandits. Proc. Internat. Conf. Artificial Intelligence Statist. (AISTATS), 1–9.
  • Agarwal A, Foster DP, Hsu DJ, Kakade SM, Rakhlin A (2013) Stochastic convex optimization with bandit feedback. SIAM J. Optim. 23(1):213–240.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.
  • Auer P, Ortner R, Szepesvári C (2007) Improved rates for the stochastic continuum-armed bandit problem. Proc. Conf. Comput. Learn. Theory (COLT) (Springer, Berlin, Heidelberg), 454–468.
  • Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. IEEE 54th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 207–216.
  • Bastani H, Bayati M (2020) Online decision-making with high-dimensional covariates. Oper. Res. 68(1):276–294.
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.
  • Besbes O, Zeevi A (2015) On the surprising sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.
  • Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends® Machine Learn. 5(1):1–122.
  • Bubeck S, Munos R, Stoltz G, Szepesvári C (2011) X-armed bandits. J. Machine Learn. Res. 12(May):1655–1695.
  • Bubeck S, Stoltz G, Szepesvári C, Munos R (2009) Online optimization in X-armed bandits. D. Koller, D. Schuurmans, Y. Bengio, L. Bottou, eds. Proc. Adv. Neural Inform. Processing Systems (NIPS), vol. 21 (Curran Associates, Inc.), 201–208.
  • Bull AD (2011) Convergence rates of efficient global optimization algorithms. J. Machine Learn. Res. 12(Oct):2879–2904.
  • Chen N, Gallego G (2019) A primal-dual learning algorithm for personalized dynamic pricing with an inventory constraint. Working paper, Hong Kong University of Science and Technology, Hong Kong.
  • Chen Y, Shi C (2019) Network revenue management with online inverse batch gradient descent method. Working paper, University of Cincinnati, Cincinnati.
  • Chen B, Chao X, Shi C (2021) Nonparametric learning algorithms for joint pricing and inventory control with lost-sales and censored demand. Math. Oper. Res. Forthcoming.
  • Chen Q, Jasin S, Duenyas I (2019) A nonparametric self-adjusting control for joint learning and optimization of multi-product pricing with finite resource capacity. Math. Oper. Res. 44(2):601–631.
  • Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.
  • Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Proc. Internat. Conf. Artificial Intelligence Statist. (AISTATS), 208–214.
  • Cope E (2009) Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces. IEEE Trans. Automat. Control 54(6):1243–1253.
  • Ebert DS, Musgrave FK (2003) Texturing & Modeling: A Procedural Approach (Chapman and Hall/CRC, London).
  • Fan J (1993) Local linear regression smoothers and their minimax efficiencies. Ann. Statist. 21(1):196–216.
  • Fan J, Gijbels I (2018) Local Polynomial Modelling and Its Applications (Routledge, Abingdon-on-Thames, UK).
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
  • Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. Annual ACM-SIAM Sympos. Discrete Algorithms (SODA), 385–394.
  • Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).
  • Goldenshluger A, Zeevi A (2013) A linear response bandit problem. Stochastic Systems 3(1):230–261.
  • Grill J-B, Valko M, Munos R (2015) Black-box optimization of noisy functions with unknown smoothness. Proc. Adv. Neural Inform. Processing Systems (NIPS), 667–675.
  • Gur Y, Momeni A, Wager S (2019) Smoothness-adaptive stochastic bandits. Preprint, submitted October 22, https://arxiv.org/abs/1910.09714.
  • Hazan E, Klivans A, Yuan Y (2018) Hyperparameter optimization: A spectral approach. Proc. Internat. Conf. Learn. Representations (ICLR).
  • Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Applied Math. 6(1):4–22.
  • Lei Y, Jasin S, Sinha A (2019) Near-optimal bisection search for nonparametric dynamic pricing with inventory constraint. Working paper, University of Michigan, Ann Arbor.
  • Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A (2017) Hyperband: A novel bandit-based approach to hyperparameter optimization. J. Machine Learn. Res. 18(1):6765–6816.
  • Malherbe C, Vayatis N (2016) A ranking approach to global optimization. Proc. Internat. Conf. Machine Learn. (ICML), 1539–1547.
  • Malherbe C, Vayatis N (2017) Global optimization of Lipschitz functions. Proc. Internat. Conf. Machine Learn. (ICML), 2314–2323.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Simchi-Levi D, Xu Y (2019) Phase transitions and cyclic phenomena in bandits with switching constraints. Preprint, submitted June 6, https://ssrn.com/abstract=3380783.
  • Wang Y, Balakrishnan S, Singh A (2019) Optimization of smooth functions with noisy observations: Local minimax rates. IEEE Trans. Inform. Theory 65(11):7350–7366.
  • Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.
  • Weber R (1992) On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2(4):1024–1033.
  • Whittle P (1980) Multi-armed bandits and the Gittins index. J. R. Statist. Soc. B 42(2):143–149.