Network Revenue Management with Nonparametric Demand Learning: -Regret and Polynomial Dimension Dependency
Published Online:10 Oct 2025https://doi.org/10.1287/moor.2022.0086
References
- [1] (2010) Optimal algorithms for online convex optimization with multi-point bandit feedback. Conf. Learn. Theory (Association for Computational Learning), 28–40.Google Scholar
- [2] (2013) Stochastic convex optimization with bandit feedback. SIAM J. Optim. 23(1):213–240.Crossref, Google Scholar
- [3] (2014) Fast algorithms for online stochastic convex programming. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA (Society for Industrial and Applied Mathematics, Philadelphia), 1405–1424.Google Scholar
- [4] (2019) Bandits with global convex constraints and objective. Oper. Res. 67(5):1486–1502.Link, Google Scholar
- [5] (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.Link, Google Scholar
- [6] (2009) Minimax policies for adversarial and stochastic bandits. Conf. Learn. Theory, vol. 7 (Association for Computational Learning), 1–122.Google Scholar
- [7] (2010) Regret bounds and minimax policies under partial monitoring. J. Machine Learn. Res. 11:2785–2836.Google Scholar
- [8] (2005) A partially observed Markov decision process for dynamic pricing. Management Sci. 51(9):1400–1416.Google Scholar
- [9] (2015) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):4.Crossref, Google Scholar
- [10] (2018) Bandits with knapsacks. J. ACM 65(3):1–55.Crossref, Google Scholar
- [11] (2020) The best of many worlds: Dual mirror descent for online allocation problems. Preprint, submitted November 18, https://arxiv.org/abs/2011.10124v1.Google Scholar
- [12] (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.Crossref, Google Scholar
- [13] (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
- [14] (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.Link, Google Scholar
- [15] (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.Link, Google Scholar
- [16] (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
- [17] (2003) An overview of pricing models for revenue management. Manufacturing Service Oper. Management 5(3):203–229.Link, Google Scholar
- [18] (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [19] (2011) General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Oper. Res. 59(5):1211–1224.Link, Google Scholar
- [20] (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.Link, Google Scholar
- [21] (2021) Kernel-based methods for bandit convex optimization. J. ACM 68(4):1–35.Crossref, Google Scholar
- [22] (2020) A re-solving heuristic with uniformly bounded loss for network revenue management. Management Sci. 66(7):2993–3009.Link, Google Scholar
- [23] (2015) Recent developments in dynamic pricing research: Multiple products, competition, and limited demand information. Production Oper. Management 24(5):704–731.Crossref, Google Scholar
- [24] (2021) Nonparametric pricing analytics with customer covariates. Oper. Res. 69(3):974–984.Link, Google Scholar
- [25] (2023) Network revenue management with online inverse batch gradient descent method. Production Oper. Management 32(7):2123–2137.Crossref, Google Scholar
- [26] (2016) Real-time dynamic pricing with minimal and flexible price adjustment. Management Sci. 62(8):2437–2455.Link, Google Scholar
- [27] (2019a) Nonparametric self-adjusting control for joint learning and optimization of multiproduct pricing with finite resource capacity. Math. Oper. Res. 44(2):601–631.Link, Google Scholar
- [28] (2019b) Nonstationary stochastic optimization under lp,q-variation measures. Oper. Res. 67(6):1752–1765.Link, Google Scholar
- [29] (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.Link, Google Scholar
- [30] (2020) Feature-based dynamic pricing. Management Sci. 66(11):4921–4943.Link, Google Scholar
- [31] (2002) Asymptotic behavior of an allocation policy for revenue management. Oper. Res. 50(4):720–727.Link, Google Scholar
- [32] (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.Crossref, Google Scholar
- [33] (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.Link, Google Scholar
- [34] (2015) Dynamic pricing and learning with finite inventories. Oper. Res. 63(4):965–978.Link, Google Scholar
- [35] (1969) Non-parametric estimation of a multivariate probability density. Theory Probab. Appl. 14(1):153–158.Crossref, Google Scholar
- [36] (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.Link, Google Scholar
- [37] (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.Link, Google Scholar
- [38] (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms SODA (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
- [39] (2018) A tutorial on Bayesian optimization. Preprint, submitted July 8, https://arxiv.org/abs/1807.02811.Google Scholar
- [40] (1994) Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Sci. 40(8):999–1020.Link, Google Scholar
- [41] (1997) A multiproduct dynamic pricing problem and its applications to network yield management. Oper. Res. 45(1):24–41.Link, Google Scholar
- [42] (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.Link, Google Scholar
- [43] (2014) Bandit convex optimization: Towards tight bounds. Adv. Neural Inform. Processing Systems 27:784–792.Google Scholar
- [44] (2015) Beyond convexity: Stochastic quasi-convex optimization. Preprint, submitted October 28, https://arxiv.org/abs/1507.02030.Google Scholar
- [45] (2014) Reoptimization and self-adjusting price control for network revenue management. Oper. Res. 62(5):1168–1178.Link, Google Scholar
- [46] (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.Link, Google Scholar
- [47] (2020) Online stochastic optimization with Wasserstein based non-stationarity. Preprint, submitted December 13, https://arxiv.org/abs/2012.06961v1.Google Scholar
- [48] (2015) Online network revenue management using Thompson sampling. Preprint, submitted April 3, https://doi.org/10.2139/ssrn.2588730.Google Scholar
- [49] (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases ECML/PKDD (Springer, Berlin), 795–811.Google Scholar
- [50] (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.Link, Google Scholar
- [51] (2003a) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. 44th Annual IEEE Sympos. Foundations Comput. Sci. FOCS (IEEE, Piscataway, NJ), 594–605.Google Scholar
- [52] (2003b) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Proc. 44th Annual IEEE Sympos. Foundations Comput. Sci. FOCS (IEEE, Piscataway, NJ), 594–605.Google Scholar
- [53] (2021) Improved regret for zeroth-order stochastic convex bandits. Conf. Learn. Theory (PMLR, New York), 2938–2964.Google Scholar
- [54] (2014) Near-optimal bisection search for nonparametric dynamic pricing with inventory constraint. Working paper, Ross School of Business, University of Michigan, Ann Arbor.Google Scholar
- [55] (2015) Multidimensional bisection search for constrained optimization with noisy observations. Working paper, Ross School of Business, University of Michigan, Ann Arbor.Google Scholar
- [56] (2011) Pricing multiple products with the multinomial logit and nested logit models: Concavity and implications. Manufacturing Service Oper. Management 13(4):549–563.Link, Google Scholar
- [57] (2021) A general framework for resource constrained revenue management with demand learning and large action space. Preprint, submitted May 7, https://doi.org/10.2139/ssrn.3841273.Google Scholar
- [58] (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.Link, Google Scholar
- [59] (1985) Problem complexity and method efficiency in optimization. SIAM Rev. 27(2):264–265.Google Scholar
- [60] (2008) An asymptotically optimal policy for a quantity-based network revenue management problem. Math. Oper. Res. 33(2):257–282.Link, Google Scholar
- [61] (2006) A nonparametric approach to multiproduct pricing. Oper. Res. 54(1):82–98.Link, Google Scholar
- [62] (2008) An analysis of the control-algorithm re-solving issue in inventory and revenue management. Manufacturing Service Oper. Management 10(3):468–483.Link, Google Scholar
- [63] (2019) Introduction to multi-armed bandits. Preprint, submitted April 15, https://arxiv.org/abs/1904.07272v1.Google Scholar
- [64] (2012) Practical Bayesian optimization of machine learning algorithms. Pereira F, Burges CJ, Bottou L, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 25 (Curran Associates, Inc., Red Hook, NY).Google Scholar
- [65] (2004) The Theory and Practice of Revenue Management, vol. 1 (Springer, New York).Crossref, Google Scholar
- [66] (2020) Constant regret re-solving heuristics for price-based revenue management. Preprint, submitted December 29, https://arxiv.org/abs/2009.02861.Google Scholar
- [67] (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.Link, Google Scholar

