A Minibatch Stochastic Gradient Descent-Based Learning Metapolicy for Inventory Systems with Myopic Optimal Policy

Jiameng Lyu (Corresponding Author)
https://orcid.org/0000-0002-4688-5276
Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China; and Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China

Jinxing Xie (Corresponding Author)
https://orcid.org/0000-0002-9269-6468
Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China

Shilin Yuan (Corresponding Author)
https://orcid.org/0009-0002-7892-0344
Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China

Yuan Zhou (Corresponding Author)
https://orcid.org/0009-0008-1706-6539
Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China; and Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China; and Beijing Institute of Mathematical Sciences and Application, Beijing 100084, China


Published Online: 9 Oct 2024
https://doi.org/10.1287/mnsc.2023.00920


Volume 71, Issue 7

July 2025

Pages iv-vi, 5419-6318


Information

Received: March 23, 2023
Accepted: August 18, 2024
Published Online: October 9, 2024

Cite as

Jiameng Lyu, Jinxing Xie, Shilin Yuan, Yuan Zhou (2024) A Minibatch Stochastic Gradient Descent-Based Learning Metapolicy for Inventory Systems with Myopic Optimal Policy. Management Science 71(7):5572-5588.

https://doi.org/10.1287/mnsc.2023.00920

Acknowledgments

The authors thank the department editor, associate editor, and three anonymous referees for detailed and constructive comments that considerably improved the quality of this paper. The authors Jiameng Lyu, Jinxing Xie, Shilin Yuan, and Yuan Zhou are listed in alphabetical order.
