Network Revenue Management with Nonparametric Demand Learning: $\sqrt{T}$ -Regret and Polynomial Dimension Dependency

Sentao Miao
Sentao Miao
[email protected]
Leeds School of Business, University of Colorado at Boulder, Boulder, Colorado 80309
Search for more papers by this author
,
Yining Wang
Corresponding Author
Yining Wang
[email protected]
https://orcid.org/0000-0001-9410-0392
Naveen Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080
Search for more papers by this author

Sentao Miao

[email protected]

Leeds School of Business, University of Colorado at Boulder, Boulder, Colorado 80309

Search for more papers by this author

Yining Wang

Corresponding Author

Yining Wang

[email protected]

https://orcid.org/0000-0001-9410-0392

Naveen Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080

Search for more papers by this author

Published Online:10 Oct 2025https://doi.org/10.1287/moor.2022.0086

References

[1] Agarwal A, Dekel O, Xiao L (2010) Optimal algorithms for online convex optimization with multi-point bandit feedback. Conf. Learn. Theory (Association for Computational Learning), 28–40.Google Scholar
[2] Agarwal A, Foster DP, Hsu D, Kakade SM, Rakhlin A (2013) Stochastic convex optimization with bandit feedback. SIAM J. Optim. 23(1):213–240.Crossref, Google Scholar
[3] Agrawal S, Devanur NR (2014) Fast algorithms for online stochastic convex programming. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms SODA (Society for Industrial and Applied Mathematics, Philadelphia), 1405–1424.Google Scholar
[4] Agrawal S, Devanur NR (2019) Bandits with global convex constraints and objective. Oper. Res. 67(5):1486–1502.Link, Google Scholar
[5] Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.Link, Google Scholar
[6] Audibert J-Y, Bubeck S (2009) Minimax policies for adversarial and stochastic bandits. Conf. Learn. Theory, vol. 7 (Association for Computational Learning), 1–122.Google Scholar
[7] Audibert J-Y, Bubeck S (2010) Regret bounds and minimax policies under partial monitoring. J. Machine Learn. Res. 11:2785–2836.Google Scholar
[8] Aviv Y, Pazgal A (2005) A partially observed Markov decision process for dynamic pricing. Management Sci. 51(9):1400–1416.Google Scholar
[9] Babaioff M, Dughmi S, Kleinberg R, Slivkins A (2015) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):4.Crossref, Google Scholar
[10] Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):1–55.Crossref, Google Scholar
[11] Balseiro S, Lu H, Mirrokni V (2020) The best of many worlds: Dual mirror descent for online allocation problems. Preprint, submitted November 18, https://arxiv.org/abs/2011.10124v1.Google Scholar
[12] Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.Crossref, Google Scholar
[13] Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
[14] Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.Link, Google Scholar
[15] Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.Link, Google Scholar
[16] Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
[17] Bitran G, Caldentey R (2003) An overview of pricing models for revenue management. Manufacturing Service Oper. Management 5(3):203–229.Link, Google Scholar
[18] Boyd S, Boyd SP, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[19] Broadie M, Cicek D, Zeevi A (2011) General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Oper. Res. 59(5):1211–1224.Link, Google Scholar
[20] Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.Link, Google Scholar
[21] Bubeck S, Eldan R, Lee YT (2021) Kernel-based methods for bandit convex optimization. J. ACM 68(4):1–35.Crossref, Google Scholar
[22] Bumpensanti P, Wang H (2020) A re-solving heuristic with uniformly bounded loss for network revenue management. Management Sci. 66(7):2993–3009.Link, Google Scholar
[23] Chen M, Chen Z-L (2015) Recent developments in dynamic pricing research: Multiple products, competition, and limited demand information. Production Oper. Management 24(5):704–731.Crossref, Google Scholar
[24] Chen N, Gallego G (2021) Nonparametric pricing analytics with customer covariates. Oper. Res. 69(3):974–984.Link, Google Scholar
[25] Chen Y, Shi C (2023) Network revenue management with online inverse batch gradient descent method. Production Oper. Management 32(7):2123–2137.Crossref, Google Scholar
[26] Chen Q, Jasin S, Duenyas I (2016) Real-time dynamic pricing with minimal and flexible price adjustment. Management Sci. 62(8):2437–2455.Link, Google Scholar
[27] Chen Q, Jasin S, Duenyas I (2019a) Nonparametric self-adjusting control for joint learning and optimization of multiproduct pricing with finite resource capacity. Math. Oper. Res. 44(2):601–631.Link, Google Scholar
[28] Chen X, Wang Y, Wang Y-X (2019b) Nonstationary stochastic optimization under lp,q-variation measures. Oper. Res. 67(6):1752–1765.Link, Google Scholar
[29] Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.Link, Google Scholar
[30] Cohen MC, Lobel I, Paes Leme R (2020) Feature-based dynamic pricing. Management Sci. 66(11):4921–4943.Link, Google Scholar
[31] Cooper WL (2002) Asymptotic behavior of an allocation policy for revenue management. Oper. Res. 50(4):720–727.Link, Google Scholar
[32] den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.Crossref, Google Scholar
[33] den Boer AV, Zwart B (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.Link, Google Scholar
[34] den Boer AV, Zwart B (2015) Dynamic pricing and learning with finite inventories. Oper. Res. 63(4):965–978.Link, Google Scholar
[35] Epanechnikov VA (1969) Non-parametric estimation of a multivariate probability density. Theory Probab. Appl. 14(1):153–158.Crossref, Google Scholar
[36] Farias VF, Van Roy B (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.Link, Google Scholar
[37] Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.Link, Google Scholar
[38] Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms SODA (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
[39] Frazier PI (2018) A tutorial on Bayesian optimization. Preprint, submitted July 8, https://arxiv.org/abs/1807.02811.Google Scholar
[40] Gallego G, Van Ryzin G (1994) Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Sci. 40(8):999–1020.Link, Google Scholar
[41] Gallego G, Van Ryzin G (1997) A multiproduct dynamic pricing problem and its applications to network yield management. Oper. Res. 45(1):24–41.Link, Google Scholar
[42] Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.Link, Google Scholar
[43] Hazan E, Levy K (2014) Bandit convex optimization: Towards tight bounds. Adv. Neural Inform. Processing Systems 27:784–792.Google Scholar
[44] Hazan E, Levy KY, Shalev-Shwartz S (2015) Beyond convexity: Stochastic quasi-convex optimization. Preprint, submitted October 28, https://arxiv.org/abs/1507.02030.Google Scholar
[45] Jasin S (2014) Reoptimization and self-adjusting price control for network revenue management. Oper. Res. 62(5):1168–1178.Link, Google Scholar
[46] Jasin S, Kumar S (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.Link, Google Scholar
[47] Jiang J, Li X, Zhang J (2020) Online stochastic optimization with Wasserstein based non-stationarity. Preprint, submitted December 13, https://arxiv.org/abs/2012.06961v1.Google Scholar
[48] Johnson K, Simchi-Levi D, Wang H (2015) Online network revenue management using Thompson sampling. Preprint, submitted April 3, https://doi.org/10.2139/ssrn.2588730.Google Scholar
[49] Karimi H, Nutini J, Schmidt M (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases ECML/PKDD (Springer, Berlin), 795–811.Google Scholar
[50] Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.Link, Google Scholar
[51] Kleinberg R, Leighton T (2003a) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. 44th Annual IEEE Sympos. Foundations Comput. Sci. FOCS (IEEE, Piscataway, NJ), 594–605.Google Scholar
[52] Kleinberg R, Leighton T (2003b) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Proc. 44th Annual IEEE Sympos. Foundations Comput. Sci. FOCS (IEEE, Piscataway, NJ), 594–605.Google Scholar
[53] Lattimore T, Gyorgy A (2021) Improved regret for zeroth-order stochastic convex bandits. Conf. Learn. Theory (PMLR, New York), 2938–2964.Google Scholar
[54] Lei Y, Jasin S, Sinha A (2014) Near-optimal bisection search for nonparametric dynamic pricing with inventory constraint. Working paper, Ross School of Business, University of Michigan, Ann Arbor.Google Scholar
[55] Lei YM, Jasin S, Sinha A (2015) Multidimensional bisection search for constrained optimization with noisy observations. Working paper, Ross School of Business, University of Michigan, Ann Arbor.Google Scholar
[56] Li H, Huh W (2011) Pricing multiple products with the multinomial logit and nested logit models: Concavity and implications. Manufacturing Service Oper. Management 13(4):549–563.Link, Google Scholar
[57] Miao S, Wang Y, Zhang J (2021) A general framework for resource constrained revenue management with demand learning and large action space. Preprint, submitted May 7, https://doi.org/10.2139/ssrn.3841273.Google Scholar
[58] Nambiar M, Simchi-Levi D, Wang H (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.Link, Google Scholar
[59] Nemirovskij AS, Yudin DB (1985) Problem complexity and method efficiency in optimization. SIAM Rev. 27(2):264–265.Google Scholar
[60] Reiman MI, Wang Q (2008) An asymptotically optimal policy for a quantity-based network revenue management problem. Math. Oper. Res. 33(2):257–282.Link, Google Scholar
[61] Rusmevichientong P, van Roy B, Glynn P (2006) A nonparametric approach to multiproduct pricing. Oper. Res. 54(1):82–98.Link, Google Scholar
[62] Secomandi N (2008) An analysis of the control-algorithm re-solving issue in inventory and revenue management. Manufacturing Service Oper. Management 10(3):468–483.Link, Google Scholar
[63] Slivkins A (2019) Introduction to multi-armed bandits. Preprint, submitted April 15, https://arxiv.org/abs/1904.07272v1.Google Scholar
[64] Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. Pereira F, Burges CJ, Bottou L, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 25 (Curran Associates, Inc., Red Hook, NY).Google Scholar
[65] Talluri KT, Van Ryzin G, Van Ryzin G (2004) The Theory and Practice of Revenue Management, vol. 1 (Springer, New York).Crossref, Google Scholar
[66] Wang Y, Wang H (2020) Constant regret re-solving heuristics for price-based revenue management. Preprint, submitted December 29, https://arxiv.org/abs/2009.02861.Google Scholar
[67] Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.Link, Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:March 28, 2022
Accepted:August 01, 2025
Published Online:October 10, 2025

Cite as

Sentao Miao, Yining Wang (2025) Network Revenue Management with Nonparametric Demand Learning:

\sqrt{T}

-Regret and Polynomial Dimension Dependency. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2022.0086

Keywords

Acknowledgments

Author names are listed in alphabetical order.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Network Revenue Management with Nonparametric Demand Learning: $\sqrt{T}$ -Regret and Polynomial Dimension Dependency

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News

Available Issues

Available Issues

Network Revenue Management with Nonparametric Demand Learning: T-Regret and Polynomial Dimension Dependency

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Network Revenue Management with Nonparametric Demand Learning: $\sqrt{T}$ -Regret and Polynomial Dimension Dependency