A Primal-Dual Approach Toward Resource-Constrained Revenue Management with Demand Learning and Large Action Space

Sentao Miao (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-0380-0797
Leeds School of Business, University of Colorado Boulder, Boulder, Colorado 80309

Yining Wang
[email protected]
https://orcid.org/0000-0001-9410-0392
Naveen Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080

Jiawei Zhang
[email protected]
https://orcid.org/0000-0003-4988-6028
Leonard N. Stern School of Business, New York University, New York, New York 10011

Published Online: 5 Dec 2025
https://doi.org/10.1287/opre.2021.0483


Volume 74, Issue 2

March-April 2026

Pages v-ix, 573-1152, iii-iv

Article Information

Received: July 26, 2021
Accepted: October 14, 2025
Published Online: December 5, 2025

Cite as

Sentao Miao, Yining Wang, Jiawei Zhang (2025) A Primal-Dual Approach Toward Resource-Constrained Revenue Management with Demand Learning and Large Action Space. Operations Research 74(2):825-839.

https://doi.org/10.1287/opre.2021.0483

Acknowledgments

The authors are grateful to the review team for their constructive suggestions, which greatly improved the manuscript.
