Online Allocation of Reusable Resources in Nonstationary Environments

Published Online:https://doi.org/10.1287/moor.2023.0250

References

  • [1] Agrawal S, Devanur NR (2014) Fast algorithms for online stochastic convex programming. Proc. Twenty-Sixth Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia), 1405–1424.Google Scholar
  • [2] Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.LinkGoogle Scholar
  • [3] Aouad A, Désir A (2022) Representing random utility choice models with neural networks. Preprint, submitted July 26, https://arxiv.org/abs/2207.12877.Google Scholar
  • [4] Auer P, Gajane P, Ortner R (2019) Adaptively tracking the best bandit arm with an unknown number of distribution changes. Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 99 (PMLR, Phoenix), 138–158.Google Scholar
  • [5] Auer P, Jaksch T, Ortner R (2008) Near-optimal regret bounds for reinforcement learning. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [6] Balseiro S, Lu H, Mirrokni V (2020) Dual mirror descent for online allocation problems. Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 613–628.Google Scholar
  • [7] Balseiro SR, Lu H, Mirrokni V (2023) The best of many worlds: Dual mirror descent for online allocation problems. Oper. Res. 71(1):101–119.LinkGoogle Scholar
  • [8] Besbes O, Elmachtoub AN, Sun Y (2022) Static pricing: Universal guarantees for reusable resources. Oper. Res. 70(2):1143–1152.LinkGoogle Scholar
  • [9] Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [10] Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.LinkGoogle Scholar
  • [11] Castiglioni M, Celli A, Marchesi A, Romano G, Gatti N (2022) A unifying framework for online optimization with long-term constraints. Oh AH, Agarwal A, Belgrave D, Cho K, eds. Advances in Neural Information Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [12] Chen Y, Levi R, Shi C (2017) Revenue management of reusable resources with advanced reservations. Production Oper. Management 26(5):836–859.CrossrefGoogle Scholar
  • [13] Chen Y, Lee CW, Luo H, Wei CY (2019) A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free. Beygelzimer A, Hsu D, eds. Proc. Thirty-Second Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 99 (PMLR, New York), 696–726.Google Scholar
  • [14] Chen X, Owen Z, Pixton C, Simchi-Levi D (2022) A statistical learning approach to personalization in revenue management. Management Sci. 68(3):1923–1937.LinkGoogle Scholar
  • [15] Cheung WC, Simchi-Levi D, Zhu R (2020) Reinforcement learning for non-stationary Markov decision processes: The blessing of (more) optimism. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 1843–1854.Google Scholar
  • [16] Cheung WC, Simchi-Levi D, Zhu R (2022) Hedging the drift: Learning to optimize under nonstationarity. Management Sci. 68(3):1696–1713.LinkGoogle Scholar
  • [17] Devanur NR, Jain K, Sivan B, Wilkens CA (2019) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM 66(1):7.CrossrefGoogle Scholar
  • [18] Feng Y, Niazadeh R, Saberi A (2019) Linear programming based online policies for real-time assortment of reusable resources. Preprint, submitted July 17, http://dx.doi.org/10.2139/ssrn.3421227.Google Scholar
  • [19] Feng Y, Niazadeh R, Saberi A (2022) Near-optimal Bayesian online assortment of reusable resources. Pennock DM, Segal I, Seuken S, eds. Proc. 23rd ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 964–965.Google Scholar
  • [20] Gajane P, Ortner R, Auer P (2018) A sliding-window algorithm for Markov decision processes with arbitrarily changing rewards and transitions. Preprint, submitted May 25, https://arxiv.org/abs/1805.10066.Google Scholar
  • [21] Golrezaei N, Nazerzadeh H, Rusmevichientong P (2014) Real-time optimization of personalized assortments. Management Sci. 60(6):1532–1551.LinkGoogle Scholar
  • [22] Gong XY, Goyal V, Iyengar G, Simchi-Levi D, Udwani R, Wang S (2022) Online assortment optimization with reusable resources. Management Sci. 68(7):4772–4785.Google Scholar
  • [23] Goyal V, Iyengar G, Udwani R (2025) Asymptotically optimal competitive ratio for online allocation of reusable resources. Oper. Res., ePub ahead of print January 21, https://doi.org/10.1287/opre.2021.0695.Google Scholar
  • [24] Goyal V, Iyengar G, Udwani R (2020) Online allocation of reusable resources via algorithms guided by fluid approximations. Preprint, submitted October 8, https://arxiv.org/abs/2010.03983.Google Scholar
  • [25] Han Y, Pereira FC, Ben-Akiva M, Zegras C (2020) A neural-embedded choice model: Tastenet-MNL modeling taste heterogeneity with flexibility and interpretability. Preprint, submitted February 3, https://arxiv.org/abs/2002.00922.Google Scholar
  • [26] Immorlica N, Sankararaman KA, Schapire R, Slivkins A (2022) Adversarial bandits with knapsacks. J. ACM 69(6):1–47.Google Scholar
  • [27] Jiang J, Li X, Zhang J (2025) Online stochastic optimization with Wasserstein based non-stationarity. Management Sci., ePub ahead of print March 3, https://doi.org/10.1287/mnsc.2020.03850.Google Scholar
  • [28] Kim J, Randhawa RS (2018) The value of dynamic pricing in large queueing systems. Oper. Res. 66(2):409–425.LinkGoogle Scholar
  • [29] Lei Y, Jasin S (2020) Real-time dynamic pricing for revenue management with reusable resources, advance reservation, and deterministic service time requirements. Oper. Res. 68(3):676–685.LinkGoogle Scholar
  • [30] Levi R, Radovanović A (2010) Provably near-optimal LP-based policies for revenue management in systems with reusable resources. Oper. Res. 58(2):503–507.LinkGoogle Scholar
  • [31] Li X, Ye Y (2022) Online linear programming: Dual convergence, new algorithms, and regret bounds. Oper. Res. 70(5):2948–2966.LinkGoogle Scholar
  • [32] Li X, Sun C, Ye Y (2020) Simple and fast algorithm for binary integer and online linear programming. Larochelle H, Ranzato MA, Hadsell R, Balcan M-F, Lin H-T, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 9412–9421.Google Scholar
  • [33] Liu S, Jiang J, Li X (2022) Non-stationary bandits with knapsacks. Oh AH, Agarwal A, Belgrave D, Cho K, eds. Thirty-Sixth Annual Conf. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [34] Luo H, Wei CY, Agarwal A, Langford J (2018) Efficient contextual bandits in non-stationary worlds. Proc. 31st Conf. Learn. Theory (COLT), Proceedings of Machine Learning Research, vol. 75 (PMLR, New York), 1739–1776.Google Scholar
  • [35] Ma W, Simchi-Levi D (2020) Algorithms for online matching, assortment, and pricing with tight weight-dependent competitive ratios. Oper. Res. 68(6):1787–1803.LinkGoogle Scholar
  • [36] Mao W, Zhang K, Zhu R, Simchi-Levi D, Basar T (2021) Near-optimal model-free reinforcement learning in non-stationary episodic MDPs. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn. (ICML), Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 7447–7458.Google Scholar
  • [37] Rangi A, Franceschetti M, Tran-Thanh L (2019) Unifying the stochastic and the adversarial bandits with knapsack. Kraus S, ed. Proc. Twenty-Eighth Internat. Joint Conf. Artificial Intelligence (IJCAI-19) (International Joint Conferences on Artificial Intelligence Organization, Macao, China), 3311–3317.Google Scholar
  • [38] Rusmevichientong P, Sumida M, Topaloglu H (2020) Dynamic assortment optimization for reusable products with random usage durations. Management Sci. 66(7):2820–2844.LinkGoogle Scholar
  • [39] Russac Y, Vernade C, Cappé O (2019) Weighted linear bandits for non-stationary environments. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [40] Wei CY, Luo H (2021) Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach. Belkin M, Kpotufe S, eds. Proc. Thirty Fourth Conf. Learn. Theory (PMLR, New York), 4300–4354.Google Scholar
  • [41] Xie X, Gurvich I, Küçükyavuz S (2024) Dynamic allocation of reusable resources: Logarithmic regret in overloaded networks. Oper. Res., ePub ahead of print July 24, https://doi.org/10.1287/opre.2022.0429.Google Scholar
  • [42] Yu H, Neely M, Wei X (2017) Online convex optimization with stochastic constraints. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [43] Yuan J, Lamperski A (2018) Online convex optimization for cumulative constraints. Bengio S, Chockler H, Wehbe L, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [44] Zhang X, Cheung WC (2022) Online resource allocation for reusable resources. Preprint, submitted December 6, https://arxiv.org/abs/2212.02855.Google Scholar
  • [45] Zhong Y, Birge JR, Ward A (2022) Learning the scheduling policy in time-varying multiclass many server queues with abandonment. Preprint, submitted April 21, http://dx.doi.org/10.2139/ssrn.4090021.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.