Online Allocation of Reusable Resources in Nonstationary Environments

Xilin Zhang
Xilin Zhang
[email protected]
https://orcid.org/0009-0008-8351-7262
Industrial System Engineering and Management, National University of Singapore, Singapore 117576
Search for more papers by this author
,
Wang Chi Cheung
Corresponding Author
Wang Chi Cheung
[email protected]
https://orcid.org/0000-0001-9380-5673
Industrial System Engineering and Management, National University of Singapore, Singapore 117576
Search for more papers by this author

Industrial System Engineering and Management, National University of Singapore, Singapore 117576

Search for more papers by this author

Wang Chi Cheung

Corresponding Author

Wang Chi Cheung

[email protected]

https://orcid.org/0000-0001-9380-5673

Industrial System Engineering and Management, National University of Singapore, Singapore 117576

Search for more papers by this author

Published Online:27 Jun 2025https://doi.org/10.1287/moor.2023.0250

References

[1] Agrawal S, Devanur NR (2014) Fast algorithms for online stochastic convex programming. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia), 1405–1424.Google Scholar
[2] Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
[3] Aouad A, Désir A (2022) Representing random utility choice models with neural networks. Preprint, submitted July 26, https://arxiv.org/abs/2207.12877.Google Scholar
[4] Auer P, Gajane P, Ortner R (2019) Adaptively tracking the best bandit arm with an unknown number of distribution changes. Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 99 (PMLR, Phoenix), 138–158.Google Scholar
[5] Auer P, Jaksch T, Ortner R (2008) Near-optimal regret bounds for reinforcement learning. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates Inc., Red Hook, NY).Google Scholar
[6] Balseiro S, Lu H, Mirrokni V (2020) Dual mirror descent for online allocation problems. Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 613–628.Google Scholar
[7] Balseiro SR, Lu H, Mirrokni V (2023) The best of many worlds: Dual mirror descent for online allocation problems. Oper. Res. 71(1):101–119.Link, Google Scholar
[8] Besbes O, Elmachtoub AN, Sun Y (2022) Static pricing: Universal guarantees for reusable resources. Oper. Res. 70(2):1143–1152.Link, Google Scholar
[9] Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates Inc., Red Hook, NY).Google Scholar
[10] Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
[11] Castiglioni M, Celli A, Marchesi A, Romano G, Gatti N (2022) A unifying framework for online optimization with long-term constraints. Oh AH, Agarwal A, Belgrave D, Cho K, eds. Advances in Neural Information Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY).Google Scholar
[12] Chen Y, Levi R, Shi C (2017) Revenue management of reusable resources with advanced reservations. Production Oper. Management 26(5):836–859.Crossref, Google Scholar
[13] Chen Y, Lee CW, Luo H, Wei CY (2019) A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free. Beygelzimer A, Hsu D, eds. Proc. Thirty-Second Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 99 (PMLR, New York), 696–726.Google Scholar
[14] Chen X, Owen Z, Pixton C, Simchi-Levi D (2022) A statistical learning approach to personalization in revenue management. Management Sci. 68(3):1923–1937.Link, Google Scholar
[15] Cheung WC, Simchi-Levi D, Zhu R (2020) Reinforcement learning for non-stationary Markov decision processes: The blessing of (more) optimism. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 1843–1854.Google Scholar
[16] Cheung WC, Simchi-Levi D, Zhu R (2022) Hedging the drift: Learning to optimize under nonstationarity. Management Sci. 68(3):1696–1713.Link, Google Scholar
[17] Devanur NR, Jain K, Sivan B, Wilkens CA (2019) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM 66(1):7.Crossref, Google Scholar
[18] Feng Y, Niazadeh R, Saberi A (2019) Linear programming based online policies for real-time assortment of reusable resources. Preprint, submitted July 17, http://dx.doi.org/10.2139/ssrn.3421227.Google Scholar
[19] Feng Y, Niazadeh R, Saberi A (2022) Near-optimal Bayesian online assortment of reusable resources. Pennock DM, Segal I, Seuken S, eds. Proc. 23rd ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 964–965.Google Scholar
[20] Gajane P, Ortner R, Auer P (2018) A sliding-window algorithm for Markov decision processes with arbitrarily changing rewards and transitions. Preprint, submitted May 25, https://arxiv.org/abs/1805.10066.Google Scholar
[21] Golrezaei N, Nazerzadeh H, Rusmevichientong P (2014) Real-time optimization of personalized assortments. Management Sci. 60(6):1532–1551.Link, Google Scholar
[22] Gong XY, Goyal V, Iyengar G, Simchi-Levi D, Udwani R, Wang S (2022) Online assortment optimization with reusable resources. Management Sci. 68(7):4772–4785.Google Scholar
[23] Goyal V, Iyengar G, Udwani R (2025) Asymptotically optimal competitive ratio for online allocation of reusable resources. Oper. Res., ePub ahead of print January 21, https://doi.org/10.1287/opre.2021.0695.Google Scholar
[24] Goyal V, Iyengar G, Udwani R (2020) Online allocation of reusable resources via algorithms guided by fluid approximations. Preprint, submitted October 8, https://arxiv.org/abs/2010.03983.Google Scholar
[25] Han Y, Pereira FC, Ben-Akiva M, Zegras C (2020) A neural-embedded choice model: Tastenet-MNL modeling taste heterogeneity with flexibility and interpretability. Preprint, submitted February 3, https://arxiv.org/abs/2002.00922.Google Scholar
[26] Immorlica N, Sankararaman KA, Schapire R, Slivkins A (2022) Adversarial bandits with knapsacks. J. ACM 69(6):1–47.Google Scholar
[27] Jiang J, Li X, Zhang J (2025) Online stochastic optimization with Wasserstein based non-stationarity. Management Sci., ePub ahead of print March 3, https://doi.org/10.1287/mnsc.2020.03850.Google Scholar
[28] Kim J, Randhawa RS (2018) The value of dynamic pricing in large queueing systems. Oper. Res. 66(2):409–425.Link, Google Scholar
[29] Lei Y, Jasin S (2020) Real-time dynamic pricing for revenue management with reusable resources, advance reservation, and deterministic service time requirements. Oper. Res. 68(3):676–685.Link, Google Scholar
[30] Levi R, Radovanović A (2010) Provably near-optimal LP-based policies for revenue management in systems with reusable resources. Oper. Res. 58(2):503–507.Link, Google Scholar
[31] Li X, Ye Y (2022) Online linear programming: Dual convergence, new algorithms, and regret bounds. Oper. Res. 70(5):2948–2966.Link, Google Scholar
[32] Li X, Sun C, Ye Y (2020) Simple and fast algorithm for binary integer and online linear programming. Larochelle H, Ranzato MA, Hadsell R, Balcan M-F, Lin H-T, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 9412–9421.Google Scholar
[33] Liu S, Jiang J, Li X (2022) Non-stationary bandits with knapsacks. Oh AH, Agarwal A, Belgrave D, Cho K, eds. Thirty-Sixth Annual Conf. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY).Google Scholar
[34] Luo H, Wei CY, Agarwal A, Langford J (2018) Efficient contextual bandits in non-stationary worlds. Proc. 31st Conf. Learn. Theory (COLT), Proceedings of Machine Learning Research, vol. 75 (PMLR, New York), 1739–1776.Google Scholar
[35] Ma W, Simchi-Levi D (2020) Algorithms for online matching, assortment, and pricing with tight weight-dependent competitive ratios. Oper. Res. 68(6):1787–1803.Link, Google Scholar
[36] Mao W, Zhang K, Zhu R, Simchi-Levi D, Basar T (2021) Near-optimal model-free reinforcement learning in non-stationary episodic MDPs. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn. (ICML), Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 7447–7458.Google Scholar
[37] Rangi A, Franceschetti M, Tran-Thanh L (2019) Unifying the stochastic and the adversarial bandits with knapsack. Kraus S, ed. Proc. Twenty-Eighth Internat. Joint Conf. Artificial Intelligence (IJCAI-19) (International Joint Conferences on Artificial Intelligence Organization, Macao, China), 3311–3317.Google Scholar
[38] Rusmevichientong P, Sumida M, Topaloglu H (2020) Dynamic assortment optimization for reusable products with random usage durations. Management Sci. 66(7):2820–2844.Link, Google Scholar
[39] Russac Y, Vernade C, Cappé O (2019) Weighted linear bandits for non-stationary environments. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY).Google Scholar
[40] Wei CY, Luo H (2021) Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach. Belkin M, Kpotufe S, eds. Proc. Thirty Fourth Conf. Learn. Theory (PMLR, New York), 4300–4354.Google Scholar
[41] Xie X, Gurvich I, Küçükyavuz S (2024) Dynamic allocation of reusable resources: Logarithmic regret in overloaded networks. Oper. Res., ePub ahead of print July 24, https://doi.org/10.1287/opre.2022.0429.Google Scholar
[42] Yu H, Neely M, Wei X (2017) Online convex optimization with stochastic constraints. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates Inc., Red Hook, NY).Google Scholar
[43] Yuan J, Lamperski A (2018) Online convex optimization for cumulative constraints. Bengio S, Chockler H, Wehbe L, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY).Google Scholar
[44] Zhang X, Cheung WC (2022) Online resource allocation for reusable resources. Preprint, submitted December 6, https://arxiv.org/abs/2212.02855.Google Scholar
[45] Zhong Y, Birge JR, Ward A (2022) Learning the scheduling policy in time-varying multiclass many server queues with abandonment. Preprint, submitted April 21, http://dx.doi.org/10.2139/ssrn.4090021.Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:August 14, 2023
Accepted:January 25, 2025
Published Online:June 27, 2025

Cite as

Xilin Zhang, Wang Chi Cheung (2025) Online Allocation of Reusable Resources in Nonstationary Environments. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2023.0250

Keywords

Acknowledgments

The authors thank the area editor, the associate editor, and the reviewers for their detailed and well-thought suggestions on improving the exposition and positioning the research results.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Online Allocation of Reusable Resources in Nonstationary Environments

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News