Online Resource Allocation with Personalized Learning

Mohammad Zhalechian
[email protected]
https://orcid.org/0000-0002-1174-6102
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Esmaeil Keyvanshokooh
[email protected]
https://orcid.org/0000-0001-9634-3806
Department of Information and Operations Management, Mayes Business School, Texas A&M University, College Station, Texas 77845

Cong Shi
[email protected]
https://orcid.org/0000-0003-3564-3391
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Mark P. Van Oyen (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-8685-7843
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Published Online: 10 May 2022
https://doi.org/10.1287/opre.2022.2294

Volume 70, Issue 4

July-August 2022

Pages iii-vii, 1953-2596, C2-C3

Article Information

Received: January 17, 2020
Accepted: February 11, 2022
Published Online: May 10, 2022

Cite as

Mohammad Zhalechian, Esmaeil Keyvanshokooh, Cong Shi, Mark P. Van Oyen (2022) Online Resource Allocation with Personalized Learning. Operations Research 70(4):2138–2161.

https://doi.org/10.1287/opre.2022.2294
