Bandits atop Reinforcement Learning: Tackling Online Inventory Models with Cyclic Demands

Xiao-Yue Gong (Corresponding Author)
https://orcid.org/0000-0002-4647-3941
Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

David Simchi-Levi
https://orcid.org/0000-0002-4650-1519
Institute for Data, Systems, and Society, Department of Civil and Environmental Engineering and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Published Online: 26 Oct 2023
https://doi.org/10.1287/mnsc.2023.4947

References

Abbasi-Yadkori Y, Bartlett PL, Kanade V, Seldin Y, Szepesvári C (2013) Online learning in Markov decision processes with adversarially chosen transition probability distributions. Burges CJ, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY).
Agrawal S, Jia R (2022) Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management. Oper. Res. 70(3):1646–1664.
Aviv Y, Federgruen A (1997) Stochastic inventory models with limited production capacity and periodically varying parameters. Probability Engrg. Inform. Sci. 11(1):107–135.
Balseiro SR, Golrezaei N, Mahdian M, Mirrokni VS, Schneider J (2019) Contextual bandits with cross-learning. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).
Chatwin RE (1998) Multiperiod airline overbooking with a single fare class. Oper. Res. 46(6):805–819.
Chen B (2021) Production Oper. Management 30(5):1365–1385.
Chen B, Shi C (2019) Tailored base-surge policies in dual-sourcing inventory systems with demand learning. Preprint, submitted September 27, https://dx.doi.org/10.2139/ssrn.3456834.
Cheung WC, Simchi-Levi D, Zhu R (2020) Reinforcement learning for non-stationary Markov decision processes: The blessing of (more) optimism. Daumé III H, Aarti S, eds. Proc. 37th Internat. Conf. Machine Learn. Proceedings of Machine Learning Research Series, vol. 119 (PMLR, New York), 1843–1854.
Dann C, Mansour Y, Mohri M, Sekhari A, Sridharan K (2020) Reinforcement learning with feedback graphs. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin Hm, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 16868–16878.
Davoodi M, Katehakis MN, Yang J (2022) Dynamic inventory control with fixed setup costs and unknown discrete demand distribution. Oper. Res. 70(3):1560–1576.
Dong S, Roy BV, Zhou Z (2019) Provably efficient reinforcement learning with aggregated states. Preprint, submitted December 13, https://doi.org/10.48550/arXiv.1912.06366.
Ehrenthal J, Honhon D, Woensel TV (2014) Demand seasonality in retail inventory management. Eur. J. Oper. Res. 238(2):527–539.
Huh WT, Rusmevichientong P (2009a) A nonparametric asymptotic analysis of inventory planning with censored demand. Math. Oper. Res. 34(1):103–123.
Huh WT, Rusmevichientong P (2009b) A nonparametric asymptotic analysis of inventory planning with censored demand. Math. Oper. Res. 34(1):103–123.
Huh WT, Rusmevichientong P (2014) Online sequential optimization with biased gradients: Theory and applications to censored demand. INFORMS J. Comput. 26(1):150–159.
Huh WT, Janakiraman G, Muckstadt JA, Rusmevichientong P (2009) An adaptive algorithm for finding the optimal base-stock policy in lost sales inventory systems with censored demand. Math. Oper. Res. 34(2):397–416.
Huh WT, Levi R, Rusmevichientong P, Orlin JB (2011) Adaptive data-driven inventory control with censored demand based on Kaplan-Meier estimator. Oper. Res. 59(4):929–941.
Jin C, Allen-Zhu Z, Bubeck S, Jordan MI (2018) Is Q-learning provably efficient? Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY).
Kaggle (2015) Rossmann store sales. Accessed August 15, 2020, https://www.kaggle.com/c/rossmann-store-sales/overview.
Karlin S (1960) Optimal policy for dynamic inventory process with stochastic demands subject to seasonal variations. J. Soc. Industrial Appl. Math. 8(4):611–629.
Lim V (2016) How poor inventory management ruined Target Canada. Accessed April 10, 2020, https://www.tradegecko.com/blog/inventory-management/how-poor-inventory-management-ruined-target-canada.
Markowitz H (1952) Portfolio selection. J. Finance 7(1):77–91.
Perakis G, Roels G (2008) Regret in the newsvendor model with partial information. Oper. Res. 56(1):188–203.
Porteus E (2002) Foundations of Stochastic Inventory Theory (Stanford University Press, Stanford, CA).
Sidford A, Wang M, Wu X, Yang LF, Ye Y (2018) Near-optimal time and sample complexities for solving Markov decision processes with a generative model. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 5192–5202.
Sinclair S, Banerjee S, Yu C (2019) Adaptive discretization for episodic reinforcement learning in metric spaces. Proc. ACM on Measurement and Analysis of Comput. Systems (ACM, New York), 1–44.
Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.
Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction, 2nd ed. (MIT Press, Cambridge, MA).
Watkins C, Dayan P (1992) Technical note: Q-learning. Machine Learn. 8:279–292.
Yuan H, Luo Q, Shi C (2021) Marrying stochastic gradient descent with bandits: Learning algorithms for inventory systems with fixed costs. Management Sci. 67(10):6089–6115.
Zhang H, Chao X, Shi C (2020) Closing the gap: A learning algorithm for lost-sales inventory systems with lead times. Management Sci. 66(5):1962–1980.
Zhao H, Chen W (2019) Stochastic one-sided full-information bandit. Proc. Eur. Conf. on Machine Learn. and Principles and Practice of Knowledge Discovery in Databases (Springer, Cham), 150–166.
Zipkin P (1989) Critical number policies for inventory models with periodic data. Management Sci. 35(1):71–80.
Zipkin P (2000) Foundations of Inventory Management (McGraw-Hill, New York).

Volume 70, Issue 9

September 2024

Pages 5627-6482, iii-v

Information

Received: November 23, 2021
Accepted: September 08, 2022
Published Online: October 26, 2023

Cite as

Xiao-Yue Gong, David Simchi-Levi (2023) Bandits atop Reinforcement Learning: Tackling Online Inventory Models with Cyclic Demands. Management Science 70(9):6139-6157.

https://doi.org/10.1287/mnsc.2023.4947
