An Online Mirror Descent Learning Algorithm for Multiproduct Inventory Systems

Sichen Guo
[email protected]
https://orcid.org/0009-0002-7637-4829
Department of Management, Miami Herbert Business School, University of Miami, Coral Gables, Florida 33146; and Research Institute for Interdisciplinary Sciences, School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China

Cong Shi
[email protected]
https://orcid.org/0000-0003-3564-3391
Department of Management, Miami Herbert Business School, University of Miami, Coral Gables, Florida 33146

Chaolin Yang
[email protected]
https://orcid.org/0000-0001-8857-5877
Research Institute for Interdisciplinary Sciences, School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China

Christos Zacharias (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-9911-7860
Department of Management Science, Miami Herbert Business School, University of Miami, Coral Gables, Florida 33146

Published Online: 29 Apr 2026
https://doi.org/10.1287/opre.2024.0982


Information

Received: April 24, 2024
Accepted: February 22, 2026
Published Online: April 29, 2026

Cite as

Sichen Guo, Cong Shi, Chaolin Yang, Christos Zacharias (2026) An Online Mirror Descent Learning Algorithm for Multiproduct Inventory Systems. Operations Research 0(0).

https://doi.org/10.1287/opre.2024.0982

Acknowledgments

The authors thank area editor Rouba Ibrahim, the associate editor, and two anonymous referees for their careful reading and constructive comments, which led to several substantial improvements to the paper. Part of this research was conducted while Sichen Guo was at the Miami Herbert Business School at the University of Miami as a visiting PhD student.
