An Online Mirror Descent Learning Algorithm for Multiproduct Inventory Systems

Published Online:https://doi.org/10.1287/opre.2024.0982

References

  • Agrawal S, Devanur NR (2016) Linear contextual bandits with knapsacks. Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 29 (Curran Associates, Inc., Barcelona, Spain), 3450–3458.Google Scholar
  • Agrawal S, Jia R (2022) Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management. Oper. Res. 70(3):1646–1664.LinkGoogle Scholar
  • Amazon (2024) Amazon 2023 annual report. Accessed April 22, 2025, https://s2.q4cdn.com/299287126/files/doc_financials/2024/ar/Amazon-com-Inc-2023-Annual-Report.pdf.Google Scholar
  • Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):13.CrossrefGoogle Scholar
  • Balseiro SR, Lu H, Mirrokni V (2023) The best of many worlds: Dual mirror descent for online allocation problems. Oper. Res. 71(1):101–119.LinkGoogle Scholar
  • Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.CrossrefGoogle Scholar
  • Bekci RY, Gümüş M, Miao S (2023) Inventory control and learning for one-warehouse multistore system with censored demand. Oper. Res. 71(6):2092–2110.LinkGoogle Scholar
  • Besbes O, Muharremoglu A (2013) On implications of demand censoring in the newsvendor problem. Management Sci. 59(6):1407–1424.LinkGoogle Scholar
  • Beyer D, Sethi SP, Sridhar R (2001) Stochastic multi-product inventory models with limited storage. J. Optim. Theory Appl. 111:553–588. CrossrefGoogle Scholar
  • Beyer D, Sethi SP, Sridhar R (2002) Average-cost optimality of a base-stock policy for a multi-product inventory model with limited storage. Zaccour G, ed. Decision and Control in Management Science: Essays in Honor of Alain Haurie (Springer, New York), 241–260.CrossrefGoogle Scholar
  • Cesa-Bianchi N, Freund Y, Haussler D, Helmbold DP, Schapire RE, Warmuth MK (1997) How to use expert advice. J. ACM 44(3):427–485.CrossrefGoogle Scholar
  • Chen B, Shi C (2025) Tailored base-surge policies in dual-sourcing inventory systems with demand learning. Oper. Res. 73(4):1723–1743.LinkGoogle Scholar
  • Chen B, Chao X, Shi C (2021) Nonparametric learning algorithms for joint pricing and inventory control with lost sales and censored demand. Math. Oper. Res. 46(2):726–756.LinkGoogle Scholar
  • Chen W, Shi C, Duenyas I (2020) Optimal learning algorithms for stochastic inventory systems with random capacities. Production Oper. Management 29(7):1624–1649.CrossrefGoogle Scholar
  • Chen B, Jiang J, Zhang J, Zhou Z (2024) Learning to order for inventory systems with lost sales and uncertain supplies. Management Sci. 70(12):8631–8646.LinkGoogle Scholar
  • Chen B, Simchi-Levi D, Wang Y, Zhou Y (2022) Dynamic pricing and inventory control with fixed ordering cost and incomplete demand information. Management Sci. 68(8):5684–5703.LinkGoogle Scholar
  • Costco (2019) Costco wholesale annual report 2019. Accessed April 22, 2025, https://stocklight.com/stocks/us/nasdaq-cost/costco-wholesale/annual-reports/nasdaq-cost-2019-10K-191146791.pdf.Google Scholar
  • Duchi JC, Agarwal A, Johansson M, Jordan MI (2012) Ergodic mirror descent. SIAM J. Optim. 22(4):1549–1578.CrossrefGoogle Scholar
  • Duchi JC, Shalev-Shwartz S, Singer Y, Tewari A (2010) Composite objective mirror descent. Kalai AT, Mohri M, eds. Proc. 23rd Conf. Learn. Theory (COLT) (Omnipress, Haifa, Israel), 14–26.Google Scholar
  • Fang H, Harvey NJ, Portella VS, Friedlander MP (2022) Online mirror descent and dual averaging: Keeping pace in the dynamic case. J. Machine Learn. Res. 23(1):5271–5308.Google Scholar
  • Federgruen A, Guetta D, Iyengar G, Liu X (2022) An asymptotically optimal heuristic for multi-item inventory models with joint inventory constraints. Working paper, Columbia University, New York.Google Scholar
  • Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.CrossrefGoogle Scholar
  • Huh WT, Rusmevichientong P (2009) A nonparametric asymptotic analysis of inventory planning with censored demand. Math. Oper. Res. 34(1):103–123.LinkGoogle Scholar
  • Huh WT, Janakiraman G, Muckstadt JA, Rusmevichientong P (2009) An adaptive algorithm for finding the optimal base-stock policy in lost sales inventory systems with censored demand. Math. Oper. Res. 34(2):397–416.LinkGoogle Scholar
  • Ignall E, Veinott AF (1969) Optimality of myopic inventory policies for several substitute products. Management Sci. 15(5):284–304.LinkGoogle Scholar
  • Jiang Y, Shi C, Shen S (2019) Service level constrained inventory systems. Production Oper. Management 28(9):2365–2389.CrossrefGoogle Scholar
  • Jin Y, Sidford A (2020) Efficiently solving MDPs with stochastic mirror descent. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (PMLR, New York), 4890–4900.Google Scholar
  • Juditsky A, Nemirovski A, Tauvel C (2011) Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems 1(1):17–58.LinkGoogle Scholar
  • Kelly J (2024) What’s causing the warehouse space shortage and how businesses are tackling it. Accessed January 20, 2024, https://www.newcastlesys.com/blog/whats-causing-the-warehouse-space-shortage-and-how-businesses-are-tackling-it.Google Scholar
  • Lan G (2012) An optimal method for stochastic composite optimization. Math. Programming 133(1):365–397.CrossrefGoogle Scholar
  • Lan G (2023) Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes. Math. Programming 198(1):1059–1106.CrossrefGoogle Scholar
  • Miao S, Wang Y, Zhao R (2023) Dynamic learning policy for multi-warehouse multi-store systems with censored demands. Working paper, University of Texas at Dallas, Richardson.Google Scholar
  • Nemirovski AS, Yudin DB (1983) Problem Complexity and Method Efficiency in Optimization (Wiley-Interscience, New York).Google Scholar
  • Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • Nesterov Y (2007) Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Programming 109(2):319–344.CrossrefGoogle Scholar
  • Shahrampour S, Jadbabaie A (2018) Distributed online optimization in dynamic environments using mirror descent. IEEE Trans. Automatic Control 63(3):714–725.CrossrefGoogle Scholar
  • Shalev-Shwartz S (2012) Online learning and online convex optimization. Foundations Trends Machine Learn. 4(2):107–194.CrossrefGoogle Scholar
  • Shi C, Chen W, Duenyas I (2016) Nonparametric data-driven algorithms for multiproduct inventory systems with censored demand. Oper. Res. 64(2):362–370.LinkGoogle Scholar
  • Tang J, Chen B, Shi C (2024) Online learning for dual-index policies in dual-sourcing systems. Manufacturing Service Oper. Management 26(2):758–774.LinkGoogle Scholar
  • Tang J, Chen B, Shi C, Zhou Y (2025a) Fairness-constrained inventory control with demand learning. Working paper, University of Miami, Miami.Google Scholar
  • Tang J, Qi Z, Fang E, Shi C (2025b) Offline feature-based pricing under censored demand: A causal inference approach. Manufacturing Service Oper. Management 27(2):535–553.LinkGoogle Scholar
  • Tomar M, Shani L, Efroni Y, Ghavamzadeh M (2022) Mirror descent policy optimization. Beygelzimer A, Dauphin Y, Liang P, Vaughan JW, eds. Proc. Tenth Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Veinott AF Jr (1965) Optimal policy for a multi-product, dynamic, nonstationary inventory problem. Management Sci. 12(3):206–222.LinkGoogle Scholar
  • Xu Z, Ji X, Chen M, Wang M, Zhao T (2024) Sample complexity of neural policy mirror descent for policy optimization on low-dimensional manifolds. J. Machine Learn. Res. 25(226):1–67.Google Scholar
  • Yuan H, Luo Q, Shi C (2021) Marrying stochastic gradient descent with bandits: Learning algorithms for inventory systems with fixed costs. Management Sci. 67(10):6089–6115.LinkGoogle Scholar
  • Zhang H, Chao X, Shi C (2018) Perishable inventory systems: Convexity results for base-stock policies and learning algorithms under censored demand. Oper. Res. 66(5):1276–1286.LinkGoogle Scholar
  • Zhang H, Chao X, Shi C (2020) Closing the gap: A learning algorithm for lost-sales inventory systems with lead times. Management Sci. 66(5):1962–1980.LinkGoogle Scholar
  • Zhou Z, Mertikopoulos P, Bambos N, Boyd S, Glynn PW (2017) Stochastic mirror descent in variationally coherent optimization problems. Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY), 9397–9406.Google Scholar
  • Zipkin P (2000) Foundations of Inventory Management (McGraw-Hill, New York).Google Scholar
  • Zipkin P (2008) On the structure of lost-sales inventory models. Oper. Res. 56(4):937–944.LinkGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.