UCB-Type Learning Algorithms with Kaplan–Meier Estimator for Lost-Sales Inventory Models with Lead Times

Chengyi Lyu
https://orcid.org/0009-0007-6691-8462
Leeds School of Business, University of Colorado Boulder, Boulder, Colorado 80309

Huanan Zhang (Corresponding Author)
https://orcid.org/0000-0002-0672-5227
Leeds School of Business, University of Colorado Boulder, Boulder, Colorado 80309

Linwei Xin
https://orcid.org/0000-0002-8160-6877
Booth School of Business, University of Chicago, Chicago, Illinois 60637

Published Online: 29 Feb 2024
https://doi.org/10.1287/opre.2022.0273

Volume 72, Issue 4

July-August 2024

Pages iii-vi, 1317-1750, C2-C3

Information

Received: May 26, 2022
Accepted: January 16, 2024
Published Online: February 29, 2024

Cite as

Chengyi Lyu, Huanan Zhang, Linwei Xin (2024) UCB-Type Learning Algorithms with Kaplan–Meier Estimator for Lost-Sales Inventory Models with Lead Times. Operations Research 72(4):1317-1332.

https://doi.org/10.1287/opre.2022.0273

Acknowledgments

The authors thank the department editor (Tava Olsen), the associate editor, and the referees whose comments and guidance throughout the review process have greatly improved both the content and the exposition of the paper.
