Network Revenue Management with Nonparametric Demand Learning: T-Regret and Polynomial Dimension Dependency

Published Online:https://doi.org/10.1287/moor.2022.0086

References

  • [1] Agarwal A, Dekel O, Xiao L (2010) Optimal algorithms for online convex optimization with multi-point bandit feedback. Conf. Learn. Theory (Association for Computational Learning), 28–40.Google Scholar
  • [2] Agarwal A, Foster DP, Hsu D, Kakade SM, Rakhlin A (2013) Stochastic convex optimization with bandit feedback. SIAM J. Optim. 23(1):213–240.CrossrefGoogle Scholar
  • [3] Agrawal S, Devanur NR (2014) Fast algorithms for online stochastic convex programming. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA (Society for Industrial and Applied Mathematics, Philadelphia), 1405–1424.Google Scholar
  • [4] Agrawal S, Devanur NR (2019) Bandits with global convex constraints and objective. Oper. Res. 67(5):1486–1502.LinkGoogle Scholar
  • [5] Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.LinkGoogle Scholar
  • [6] Audibert J-Y, Bubeck S (2009) Minimax policies for adversarial and stochastic bandits. Conf. Learn. Theory, vol. 7 (Association for Computational Learning), 1–122.Google Scholar
  • [7] Audibert J-Y, Bubeck S (2010) Regret bounds and minimax policies under partial monitoring. J. Machine Learn. Res. 11:2785–2836.Google Scholar
  • [8] Aviv Y, Pazgal A (2005) A partially observed Markov decision process for dynamic pricing. Management Sci. 51(9):1400–1416.Google Scholar
  • [9] Babaioff M, Dughmi S, Kleinberg R, Slivkins A (2015) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):4.CrossrefGoogle Scholar
  • [10] Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):1–55.CrossrefGoogle Scholar
  • [11] Balseiro S, Lu H, Mirrokni V (2020) The best of many worlds: Dual mirror descent for online allocation problems. Preprint, submitted November 18, https://arxiv.org/abs/2011.10124v1.Google Scholar
  • [12] Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.CrossrefGoogle Scholar
  • [13] Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.LinkGoogle Scholar
  • [14] Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.LinkGoogle Scholar
  • [15] Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.LinkGoogle Scholar
  • [16] Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.LinkGoogle Scholar
  • [17] Bitran G, Caldentey R (2003) An overview of pricing models for revenue management. Manufacturing Service Oper. Management 5(3):203–229.LinkGoogle Scholar
  • [18] Boyd S, Boyd SP, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [19] Broadie M, Cicek D, Zeevi A (2011) General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Oper. Res. 59(5):1211–1224.LinkGoogle Scholar
  • [20] Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.LinkGoogle Scholar
  • [21] Bubeck S, Eldan R, Lee YT (2021) Kernel-based methods for bandit convex optimization. J. ACM 68(4):1–35.CrossrefGoogle Scholar
  • [22] Bumpensanti P, Wang H (2020) A re-solving heuristic with uniformly bounded loss for network revenue management. Management Sci. 66(7):2993–3009.LinkGoogle Scholar
  • [23] Chen M, Chen Z-L (2015) Recent developments in dynamic pricing research: Multiple products, competition, and limited demand information. Production Oper. Management 24(5):704–731.CrossrefGoogle Scholar
  • [24] Chen N, Gallego G (2021) Nonparametric pricing analytics with customer covariates. Oper. Res. 69(3):974–984.LinkGoogle Scholar
  • [25] Chen Y, Shi C (2023) Network revenue management with online inverse batch gradient descent method. Production Oper. Management 32(7):2123–2137.CrossrefGoogle Scholar
  • [26] Chen Q, Jasin S, Duenyas I (2016) Real-time dynamic pricing with minimal and flexible price adjustment. Management Sci. 62(8):2437–2455.LinkGoogle Scholar
  • [27] Chen Q, Jasin S, Duenyas I (2019a) Nonparametric self-adjusting control for joint learning and optimization of multiproduct pricing with finite resource capacity. Math. Oper. Res. 44(2):601–631.LinkGoogle Scholar
  • [28] Chen X, Wang Y, Wang Y-X (2019b) Nonstationary stochastic optimization under lp,q-variation measures. Oper. Res. 67(6):1752–1765.LinkGoogle Scholar
  • [29] Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.LinkGoogle Scholar
  • [30] Cohen MC, Lobel I, Paes Leme R (2020) Feature-based dynamic pricing. Management Sci. 66(11):4921–4943.LinkGoogle Scholar
  • [31] Cooper WL (2002) Asymptotic behavior of an allocation policy for revenue management. Oper. Res. 50(4):720–727.LinkGoogle Scholar
  • [32] den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.CrossrefGoogle Scholar
  • [33] den Boer AV, Zwart B (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.LinkGoogle Scholar
  • [34] den Boer AV, Zwart B (2015) Dynamic pricing and learning with finite inventories. Oper. Res. 63(4):965–978.LinkGoogle Scholar
  • [35] Epanechnikov VA (1969) Non-parametric estimation of a multivariate probability density. Theory Probab. Appl. 14(1):153–158.CrossrefGoogle Scholar
  • [36] Farias VF, Van Roy B (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.LinkGoogle Scholar
  • [37] Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.LinkGoogle Scholar
  • [38] Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms SODA (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
  • [39] Frazier PI (2018) A tutorial on Bayesian optimization. Preprint, submitted July 8, https://arxiv.org/abs/1807.02811.Google Scholar
  • [40] Gallego G, Van Ryzin G (1994) Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Sci. 40(8):999–1020.LinkGoogle Scholar
  • [41] Gallego G, Van Ryzin G (1997) A multiproduct dynamic pricing problem and its applications to network yield management. Oper. Res. 45(1):24–41.LinkGoogle Scholar
  • [42] Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.LinkGoogle Scholar
  • [43] Hazan E, Levy K (2014) Bandit convex optimization: Towards tight bounds. Adv. Neural Inform. Processing Systems 27:784–792.Google Scholar
  • [44] Hazan E, Levy KY, Shalev-Shwartz S (2015) Beyond convexity: Stochastic quasi-convex optimization. Preprint, submitted October 28, https://arxiv.org/abs/1507.02030.Google Scholar
  • [45] Jasin S (2014) Reoptimization and self-adjusting price control for network revenue management. Oper. Res. 62(5):1168–1178.LinkGoogle Scholar
  • [46] Jasin S, Kumar S (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.LinkGoogle Scholar
  • [47] Jiang J, Li X, Zhang J (2020) Online stochastic optimization with Wasserstein based non-stationarity. Preprint, submitted December 13, https://arxiv.org/abs/2012.06961v1.Google Scholar
  • [48] Johnson K, Simchi-Levi D, Wang H (2015) Online network revenue management using Thompson sampling. Preprint, submitted April 3, https://doi.org/10.2139/ssrn.2588730.Google Scholar
  • [49] Karimi H, Nutini J, Schmidt M (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases ECML/PKDD (Springer, Berlin), 795–811.Google Scholar
  • [50] Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.LinkGoogle Scholar
  • [51] Kleinberg R, Leighton T (2003a) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. 44th Annual IEEE Sympos. Foundations Comput. Sci. FOCS (IEEE, Piscataway, NJ), 594–605.Google Scholar
  • [52] Kleinberg R, Leighton T (2003b) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Proc. 44th Annual IEEE Sympos. Foundations Comput. Sci. FOCS (IEEE, Piscataway, NJ), 594–605.Google Scholar
  • [53] Lattimore T, Gyorgy A (2021) Improved regret for zeroth-order stochastic convex bandits. Conf. Learn. Theory (PMLR, New York), 2938–2964.Google Scholar
  • [54] Lei Y, Jasin S, Sinha A (2014) Near-optimal bisection search for nonparametric dynamic pricing with inventory constraint. Working paper, Ross School of Business, University of Michigan, Ann Arbor.Google Scholar
  • [55] Lei YM, Jasin S, Sinha A (2015) Multidimensional bisection search for constrained optimization with noisy observations. Working paper, Ross School of Business, University of Michigan, Ann Arbor.Google Scholar
  • [56] Li H, Huh W (2011) Pricing multiple products with the multinomial logit and nested logit models: Concavity and implications. Manufacturing Service Oper. Management 13(4):549–563.LinkGoogle Scholar
  • [57] Miao S, Wang Y, Zhang J (2021) A general framework for resource constrained revenue management with demand learning and large action space. Preprint, submitted May 7, https://doi.org/10.2139/ssrn.3841273.Google Scholar
  • [58] Nambiar M, Simchi-Levi D, Wang H (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.LinkGoogle Scholar
  • [59] Nemirovskij AS, Yudin DB (1985) Problem complexity and method efficiency in optimization. SIAM Rev. 27(2):264–265.Google Scholar
  • [60] Reiman MI, Wang Q (2008) An asymptotically optimal policy for a quantity-based network revenue management problem. Math. Oper. Res. 33(2):257–282.LinkGoogle Scholar
  • [61] Rusmevichientong P, van Roy B, Glynn P (2006) A nonparametric approach to multiproduct pricing. Oper. Res. 54(1):82–98.LinkGoogle Scholar
  • [62] Secomandi N (2008) An analysis of the control-algorithm re-solving issue in inventory and revenue management. Manufacturing Service Oper. Management 10(3):468–483.LinkGoogle Scholar
  • [63] Slivkins A (2019) Introduction to multi-armed bandits. Preprint, submitted April 15, https://arxiv.org/abs/1904.07272v1.Google Scholar
  • [64] Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. Pereira F, Burges CJ, Bottou L, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 25 (Curran Associates, Inc., Red Hook, NY).Google Scholar
  • [65] Talluri KT, Van Ryzin G, Van Ryzin G (2004) The Theory and Practice of Revenue Management, vol. 1 (Springer, New York).CrossrefGoogle Scholar
  • [66] Wang Y, Wang H (2020) Constant regret re-solving heuristics for price-based revenue management. Preprint, submitted December 29, https://arxiv.org/abs/2009.02861.Google Scholar
  • [67] Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.LinkGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.