A Primal-Dual Approach Toward Resource-Constrained Revenue Management with Demand Learning and Large Action Space

Published Online: https://doi.org/10.1287/opre.2021.0483

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY), 2312–2320.
  • Agrawal S, Devanur N (2016) Linear contextual bandits with knapsacks. Advances in Neural Information Processing Systems, vol. 29 (Curran Associates Inc., Red Hook, NY), 3450–3458.
  • Agrawal S, Devanur NR (2019) Bandits with global convex constraints and objective. Oper. Res. 67(5):1486–1502.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the MNL-bandit. Proc. 2017 Conf. Learn. Theory (PMLR, New York), 76–78.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.
  • Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(v):397–422.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2):235–256.
  • Aviv Y, Pazgal A (2005) A partially observed Markov decision process for dynamic pricing. Management Sci. 51(9):1400–1416.
  • Aznag A, Goyal V, Noemie P (2021) MNL-bandit with knapsacks. Preprint, submitted June 2, https://arxiv.org/abs/2106.01135.
  • Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. 2013 IEEE 54th Annual Sympos. Foundations Comput. Sci. (IEEE), 207–216.
  • Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):1–55.
  • Balseiro SR, Lu H, Mirrokni V (2022) The best of many worlds: Dual mirror descent for online allocation problems. Oper. Res. 71(1):101–119.
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.
  • Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723–739.
  • Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY), 2249–2257.
  • Chen Y, Shi C (2023) Network revenue management with online inverse batch gradient descent method. Production Oper. Management 32(7):2123–2137.
  • Chen X, Wang Y (2018) A note on a tight lower bound for capacitated MNL-bandit assortment selection models. Oper. Res. Lett. 46(5):534–537.
  • Chen Q, Jasin S, Duenyas I (2019) Nonparametric self-adjusting control for joint learning and optimization of multiproduct pricing with finite resource capacity. Math. Oper. Res. 44(2):601–631.
  • Chen X, Wang Y, Zhou Y (2021) Dynamic assortment selection under the nested logit models. Production Oper. Management 30(1):85–102.
  • Cheung WC, Simchi-Levi D (2017) Assortment optimization under unknown multinomial logit choice models. Preprint, submitted April 1, https://arxiv.org/abs/1704.00108.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Servedio RA, Zhang T, eds. Proc. Conf. Learn. Theory (Omnipress, Madison, WI), 355–366.
  • Farias VF, Van Roy B (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
  • Frazier PI (2018) A tutorial on Bayesian optimization. Preprint, submitted July 8, https://arxiv.org/abs/1807.02811.
  • Gallego G, Van Ryzin G (1994) Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Sci. 40(8):999–1020.
  • Gallego G, Van Ryzin G (1997) A multiproduct dynamic pricing problem and its applications to network yield management. Oper. Res. 45(1):24–41.
  • Jiang J, Li X, Zhang J (2025) Online stochastic optimization with Wasserstein-based nonstationarity. Management Sci. 71(11):9104–9122.
  • Jin C, Yang Z, Wang Z, Jordan MI (2020) Provably efficient reinforcement learning with linear function approximation. Proc. Thirty Third Conf. Learn. Theory (PMLR, New York), 2137–2143.
  • Klein R, Koch S, Steinhardt C, Strauss AK (2020) A review of revenue management: Recent generalizations and advances in industry applications. Eur. J. Oper. Res. 284(2):397–412.
  • Lei YM, Jasin S, Sinha A (2014) Near-optimal bisection search for nonparametric dynamic pricing with inventory constraint. Ross School of Business Paper No. 1252, Ann Arbor, MI.
  • Lei YM, Jasin S, Uichanco J, Vakhutinsky A (2022) Joint product framing (display, ranking, pricing) and order fulfillment under the multinomial logit model for e-commerce retailers. Manufacturing Service Oper. Management 24(3):1529–1546.
  • Liu Q, Van Ryzin G (2008) On the choice-based linear programming model for network revenue management. Manufacturing Service Oper. Management 10(2):288–310.
  • McFadden D, Train K (2000) Mixed MNL models for discrete response. J. Appl. Econometrics 15(5):447–470.
  • Megiddo N (1979) Combinatorial optimization with rational objective functions. Math. Oper. Res. 4(4):414–424.
  • Miao S, Chao X (2020) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing Service Oper. Management 23(2):525–545.
  • Rusmevichientong P, Shen Z-JM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
  • Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.
  • Talluri KT, Van Ryzin GJ (2006) The Theory and Practice of Revenue Management, vol. 68 (Springer Science & Business Media, New York).
  • Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.
  • Yang L, Shami A (2020) On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 415:295–316.