Online Resource Allocation with Personalized Learning
Published Online: 10 May 2022 https://doi.org/10.1287/opre.2022.2294
References
- Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems (NIPS) (Curran Associates, Red Hook, NY), 24:2312–2320.
- Agarwal A, Hsu D, Kale S, Langford J, Li L, Schapire R (2014) Taming the monster: A fast and simple algorithm for contextual bandits. Proc. Internat. Conf. on Machine Learn., 1638–1646.
- Agrawal S, Devanur NR (2014) Bandits with concave rewards and convex knapsacks. Proc. 15th ACM Conf. on Econom. and Comput. (ACM, Palo Alto, CA), 989–1006.
- Agrawal S, Devanur NR (2016) Linear contextual bandits with knapsacks. Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, eds. Adv. Neural Inform. Processing Systems (NIPS) (Curran Associates, Red Hook, NY), 29:3450–3458.
- Agrawal S, Devanur NR, Li L (2016) An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. J. Machine Learn. Res. 49:4–18.
- Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. Machine Learn. Res. 28:127–135.
- Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.
- Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. Proc. 54th Annual Sympos. on Foundations of Computer Sci. (IEEE, Piscataway, NJ), 207–216.
- Badanidiyuru A, Langford J, Slivkins A (2014) Resourceful contextual bandits. Proc. 27th Annual Conf. on Learn. Theory (PMLR), 35:1109–1134.
- Bertsimas D, Kallus N (2020) From predictive to prescriptive analytics. Management Sci. 66(3):1025–1044.
- Bertsimas D, Orfanoudaki A, Weiner RB (2020) Personalized treatment for coronary artery disease patients: A machine learning approach. Health Care Management Sci. 23(4):482–506.
- Bistritz I, Zhou Z, Chen X, Bambos N, Blanchet J (2019) Online EXP3 learning in adversarial bandits with delayed feedback. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems (NIPS) (Curran Associates, Red Hook, NY), 32:11349–11358.
- Buchbinder N, Jain K, Naor JS (2007) Online primal-dual algorithms for maximizing ad-auctions revenue. Algorithms–ESA 2007:253–264.
- Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems (NIPS) (Curran Associates, Red Hook, NY), 24:2249–2257.
- Chen X, Owen Z, Pixton C, Simchi-Levi D (2022) A statistical learning approach to personalization in revenue management. Management Sci. 68(3):1923–1937.
- Cheung WC, Ma W, Simchi-Levi D, Wang X (2022) Inventory balancing with online learning. Management Sci. 68(3):1776–1807.
- Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Servedio RA, Zhang T, eds. Proc. Conf. Learn. Theory (Omnipress, Madison, WI), 355–366.
- Devanur NR, Jain K (2012) Online matching with concave returns. Proc. 44th Annual ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 137–144.
- Dudik M, Hsu D, Kale S, Karampatziakis N, Langford J, Reyzin L, Zhang T (2011) Efficient optimal learning for contextual bandits. Preprint, submitted June 13, https://arxiv.org/abs/1106.2369.
- Feldman J, Liu N, Topaloglu H, Ziya S (2014) Appointment scheduling under patient preference and no-show behavior. Oper. Res. 62(4):794–811.
- Feng Y, Niazadeh R, Saberi A (2020) Near-optimal Bayesian online assortment of reusable resources. Preprint, submitted September 24, revised October 29, 2020, https://ssrn.com/abstract=3714338.
- Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
- Golrezaei N, Nazerzadeh H, Rusmevichientong P (2014) Real-time optimization of personalized assortments. Management Sci. 60(6):1532–1551.
- Gong XY, Goyal V, Iyengar GN, Simchi-Levi D, Udwani R, Wang S (2021) Online assortment optimization with reusable resources. Management Sci., ePub ahead of print November 11, https://doi.org/10.1287/mnsc.2021.4134.
- Gupta D, Denton B (2008) Appointment scheduling in healthcare: Challenges and opportunities. IIE Trans. 40(9):800–819.
- Gupta D, Wang L (2008) Revenue management for a primary-care clinic in the presence of patient choice. Oper. Res. 56(3):576–592.
- Joulani P, György A, Szepesvári C (2013) Online learning under delayed feedback. Proc. Internat. Conf. on Machine Learn., 1453–1461.
- Kell N, Panigrahi D (2016) Online budgeted allocation with general budgets. Proc. ACM Conf. on Econom. and Comput. (Association for Computing Machinery, New York), 419–436.
- Keyvanshokooh E, Shi C, Van Oyen MP (2021) Online advance scheduling with overtime: A primal-dual approach. Manufacturing Service Oper. Management 23(1):246–266.
- Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM, New York), 661–670.
- Liu N, Ziya S, Kulkarni VG (2010) Dynamic scheduling of outpatient appointments under patient no-shows and cancellations. Manufacturing Service Oper. Management 12(2):347–364.
- Mehta A, Saberi A, Vazirani U, Vazirani V (2005) Adwords and generalized on-line matching. FOCS’05: Proc. 46th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE Computer Society, Washington, DC), 264–273.
- Morton DP, Wood RK (1998) On a stochastic knapsack problem and generalizations. Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search (Springer, Berlin), 149–168.
- Pan X, Song J, Zhao J, Truong VA (2020) Online contextual learning with perishable resources allocation. IISE Trans. 52(12):1343–1357.
- Patrick J, Puterman ML, Queyranne M (2008) Dynamic multipriority patient scheduling for a diagnostic resource. Oper. Res. 56(6):1507–1525.
- Pike-Burke C, Agrawal S, Szepesvári C, Grünewälder S (2017) Bandits with delayed, aggregated anonymous feedback. Preprint, submitted September 20, last revised June 13, 2018, https://arxiv.org/abs/1709.06853.
- Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
- Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
- Slivkins A (2019) Introduction to multi-armed bandits. Foundations and Trends in Machine Learning, vol. 12 (Now Publishers, Boston).
- Stein C, Truong VA, Wang X (2020) Advance service reservations with heterogeneous customers. Management Sci. 66(7):2929–2950.
- Truong VA (2015) Optimal advance scheduling. Management Sci. 61(7):1584–1597.
- Van Slyke R, Young Y (2000) Finite horizon stochastic knapsacks with applications to yield management. Oper. Res. 48(1):155–172.
- Vernade C, Cappé O, Perchet V (2017) Stochastic bandit models for delayed conversions. Preprint, submitted June 28, last revised July 12, 2017, https://arxiv.org/abs/1706.09186.
- Wang WY, Gupta D (2011) Adaptive appointment systems with patient preferences. Manufacturing Service Oper. Management 13(3):373–389.
- Wang X, Truong VA, Bank D (2018) Online advance admission scheduling for services with customer preferences. Preprint, submitted May 26, https://arxiv.org/abs/1805.10412.
- Zhou Z, Xu R, Blanchet J (2019) Learning in generalized linear contextual bandits with stochastic delays. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems (NIPS) (Curran Associates, Red Hook, NY), 32:5197–5208.

