A Primal-Dual Approach Toward Resource-Constrained Revenue Management with Demand Learning and Large Action Space

