Control of Dual-Sourcing Inventory Systems Using Recurrent Neural Networks

Published Online: https://doi.org/10.1287/ijoc.2022.0136

References

  • Allon G, Van Mieghem JA (2010) Global dual sourcing: Tailored base-surge allocation to near- and offshore production. Management Sci. 56(1):110–124.
  • Arrow KJ, Harris T, Marschak J (1951) Optimal inventory policy. Econometrica 19(3):250–272.
  • Asikis T (2021) Multi-objective optimization for value-sensitive and sustainable basket recommendations. Preprint, submitted November 10, https://arxiv.org/abs/2111.05944.
  • Asikis T, Böttcher L, Antulov-Fantulin N (2022) Neural ordinary differential equation control of dynamics on graphs. Physical Rev. Res. 4(1):013221.
  • Axsäter S (2007) A heuristic for triggering emergency orders in an inventory system. Eur. J. Oper. Res. 176(2):880–891.
  • Barankin E (1961) A delivery-lag inventory model with an emergency provision. Naval Res. Logistics Quart. 8:285–311.
  • Barron JT (2017) Continuously differentiable exponential linear units. Preprint, submitted April 24, https://arxiv.org/abs/1704.07483.
  • Baxter J (2000) A model of inductive bias learning. J. Artificial Intelligence Res. 12(1):149–198.
  • Baydin AG, Pearlmutter BA, Radul AA, Siskind JM (2018) Automatic differentiation in machine learning: A survey. J. Machine Learn. Res. 18(1):5595–5637.
  • Bengio Y, Léonard N, Courville A (2013) Estimating or propagating gradients through stochastic neurons for conditional computation. Preprint, submitted August 15, https://arxiv.org/abs/1308.3432.
  • Böttcher L, Asikis T (2022) Near-optimal control of dynamical systems with neural ordinary differential equations. Machine Learn. Sci. Tech. 3(4):045004.
  • Böttcher L, Antulov-Fantulin N, Asikis T (2022) AI Pontryagin or how neural networks learn to control dynamical systems. Nature Comm. 13:333.
  • Böttcher L, Asikis T, Fragkos I (2023) GitHub repository. https://github.com/INFORMSJoC/2022.0136.
  • Botvinick M, Ritter S, Wang JX, Kurth-Nelson Z, Blundell C, Hassabis D (2019) Reinforcement learning, fast and slow. Trends Cognitive Sci. 23(5):408–422.
  • Boute RN, Disney SM, Gijsbrechts J, Van Mieghem JA (2022a) Dual sourcing and smoothing under nonstationary demand time series: Reshoring with speed factories. Management Sci. 68(2):1039–1057.
  • Boute RN, Gijsbrechts J, van Jaarsveld W, Vanvuchelen N (2022b) Deep reinforcement learning for inventory control: A roadmap. Eur. J. Oper. Res. 298(2):401–412.
  • Bozinovski S, Fulgosi A (1976) The influence of pattern similarity and transfer learning upon the training of a base perceptron B2. Proc. Sympos. Informatica, 3–121.
  • Chen B, Shi C (2019) Tailored base-surge policies in dual-sourcing inventory systems with demand learning. Preprint, submitted September 27, https://dx.doi.org/10.2139/ssrn.3456834.
  • Chen RT, Amos B, Nickel M (2021) Learning neural event functions for ordinary differential equations. 9th Internat. Conf. Learn. Representations (ICLR, Appleton, WI).
  • Chen TQ, Rubanova Y, Bettencourt J, Duvenaud D (2018) Neural ordinary differential equations. Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 31 (Montréal, Canada), 6572–6583.
  • DeCroix GA, Arreola-Risa A (1998) Optimal production and inventory policy for multiple products under resource constraints. Management Sci. 44(7):950–961.
  • Degris T, White M, Sutton RS (2012) Linear off-policy actor-critic. Proc. 29th Internat. Conf. Machine Learn. (Omnipress, Madison, WI).
  • Douglas SC, Yu J (2018) Why ReLU units sometimes die: Analysis of single-unit error backpropagation in neural networks. 52nd Asilomar Conf. Signals Systems Comput. (IEEE, Piscataway, NJ), 864–868.
  • Dulac-Arnold G, Evans R, van Hasselt H, Sunehag P, Lillicrap T, Hunt J, Mann T, Weber T, Degris T, Coppin B (2015) Deep reinforcement learning in large discrete action spaces. Preprint, submitted December 24, https://arxiv.org/abs/1512.07679.
  • Feldkamp L, Puskorius G (1993) Neural network control of an unstable process. Proc. 36th Midwest Sympos. Circuits Systems (IEEE, Piscataway, NJ), 35–40.
  • Fox EJ, Metters R, Semple J (2006) Optimal inventory policy with two suppliers. Oper. Res. 54(2):389–393.
  • Fukuda Y (1964) Optimal policies for the inventory problem with negotiable leadtime. Management Sci. 10(4):690–708.
  • Gijsbrechts J, Boute RN, Van Mieghem JA, Zhang DJ (2022) Can deep reinforcement learning improve inventory management? Performance on lost sales, dual-sourcing, and multi-echelon problems. Manufacturing Service Oper. Management 24(3):1349–1368.
  • Goldberg DA, Reiman MI, Wang Q (2021) A survey of recent progress in the asymptotic analysis of inventory systems. Production Oper. Management 30(6):1718–1750.
  • Hanin B, Sellke M (2017) Approximating continuous functions by ReLU nets of minimal width. Preprint, submitted October 31, https://arxiv.org/abs/1710.11278.
  • Hinton G (2016) RMSprop—PyTorch 1.10.0 documentation. Accessed June 11, 2023, https://pytorch.org/docs/stable/generated/torch.optim.RMSprop.html.
  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput. 9(8):1735–1780.
  • Holl P, Thuerey N, Koltun V (2020) Learning to control PDEs with differentiable physics. 8th Internat. Conf. Learn. Representations (ICLR, Appleton, WI).
  • Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2):251–257.
  • Hua Z, Yu Y, Zhang W, Xu X (2015) Structural properties of the optimal policy for dual-sourcing systems with general lead times. IIE Trans. 47(8):841–850.
  • Huggins EL, Olsen TL (2010) Inventory control with generalized expediting. Oper. Res. 58(5):1414–1426.
  • Janakiraman G, Seshadri S, Sheopuri A (2015) Analysis of tailored base-surge policies in dual sourcing inventory systems. Management Sci. 61(7):1547–1561.
  • Jiang Y, Shi C, Shen S (2019) Service level constrained inventory systems. Production Oper. Management 28(9):2365–2389.
  • Jin C, Allen-Zhu Z, Bubeck S, Jordan MI (2018) Is Q-learning provably efficient? Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 31 (NeurIPS, San Diego, CA).
  • Jin W, Wang Z, Yang Z, Mou S (2020) Pontryagin differentiable programming: An end-to-end learning and control framework. Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (NeurIPS, San Diego, CA).
  • Johansen SG, Thorstenson A (2014) Emergency orders in the periodic-review inventory system with fixed ordering costs and compound Poisson demand. Internat. J. Production Econom. 157:147–157.
  • Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L (2021) Physics-informed machine learning. Nature Rev. Phys. 3(6):422–440.
  • LeCun YA, Bottou L, Orr GB, Müller KR (1998) Efficient backprop. Orr GB, Müller KR, eds. Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 1524 (Springer, Berlin), 9–48.
  • Lin X, Hou ZJ, Ren H, Pan F (2019) Approximate mixed-integer programming solution with machine learning technique and linear programming relaxation. Third Internat. Conf. Smart Grid Smart Cities (IEEE, Piscataway, NJ), 101–107.
  • Linnainmaa S (1976) Taylor expansion of the accumulated rounding error. BIT 16(2):146–160.
  • Liu J, Chen W, Yang J, Xiong H, Chen C (2022) Iterative prediction-and-optimization for e-logistics distribution network design. INFORMS J. Comput. 34(2):769–789.
  • Lutter M, Ritter C, Peters J (2019) Deep Lagrangian networks: Using physics as model prior for deep learning. 7th Internat. Conf. Learn. Representations (ICLR, Appleton, WI).
  • Mak HY, Dai T, Tang CS (2021) Managing two-dose COVID-19 vaccine rollouts with limited supply: Operations strategies for distributing time-sensitive resources. Production Oper. Management 31(12):4424–4442.
  • Manary MP, Willems SP (2021) Data set: 187 weeks of customer forecasts and orders for microprocessors from Intel corporation. Manufacturing Service Oper. Management 24(1):682–689.
  • Morton TE (1971) The near-myopic nature of the lagged-proportional-cost inventory problem with lost sales. Oper. Res. 19(7):1708–1716.
  • Mowlavi S, Nabi S (2023) Optimal control of PDEs using physics-informed neural networks. J. Comput. Phys. 473:111731.
  • Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 807–814.
  • Nair V, Bartunov S, Gimeno F, von Glehn I, Lichocki P, Lobov I, O’Donoghue B, et al. (2020) Solving mixed integer programs using neural networks. Preprint, submitted December 23, https://arxiv.org/abs/2012.13349.
  • Park S, Yun C, Lee J, Shin J (2020) Minimum width for universal approximation. Preprint, submitted June 16, https://arxiv.org/abs/2006.08859.
  • Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in PyTorch. This work was part of the NIPS 2017 Autodiff workshop.
  • Polydoros AS, Nalpantidis L (2017) Survey of model-based reinforcement learning: Applications on robotics. J. Intelligent Robotic Systems 86(2):153–173.
  • Powell WB (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality, vol. 703 (John Wiley & Sons, New York).
  • Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378:686–707.
  • Roberts S, Osborne M, Ebden M, Reece S, Gibson N, Aigrain S (2013) Gaussian processes for time-series modelling. Philos. Trans. Roy. Soc. A Math. Physical Engrg. Sci. 371(1984):20110550.
  • Roehrl MA, Runkler TA, Brandtstetter V, Tokic M, Obermayer S (2020) Modeling system dynamics with physics-informed neural networks based on Lagrangian mechanics. IFAC-PapersOnLine 53(2):9195–9200.
  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536.
  • Scarf H, Karlin S (1958) Inventory models of the Arrow-Harris-Marschak type with time lag. Arrow KJ, Karlin S, Scarf HE, eds. Studies in the Mathematical Theory of Inventory and Production (Stanford University Press, Stanford, CA).
  • Scheller-Wolf A, Veeraraghavan S, van Houtum GJ (2007) Effective dual sourcing with a single index policy. Working paper, Carnegie Mellon University, Pittsburgh, PA.
  • Schmidhuber J (2015) Deep learning in neural networks: An overview. Neural Networks 61:85–117.
  • Schmitt S, Hessel M, Simonyan K (2020) Off-policy actor-critic with shared experience replay. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (PMLR, New York), 8545–8554.
  • Sheopuri A, Janakiraman G, Seshadri S (2010) New policies for the stochastic inventory control problem with two supply sources. Oper. Res. 58(3):734–745.
  • Song JS, Zipkin P (1993) Inventory control in a fluctuating demand environment. Oper. Res. 41(2):351–370.
  • Song JS, van Houtum GJ, Van Mieghem JA (2020) Capacity and inventory management: Review, trends, and projections. Manufacturing Service Oper. Management 22(1):36–46.
  • Song JS, Xiao L, Zhang H, Zipkin P (2017) Optimal policies for a dual-sourcing inventory problem with endogenous stochastic lead times. Oper. Res. 65(2):379–395.
  • Song JS, Xiao L, Zhang H, Zipkin P (2021) Smart policies for multisource inventory systems and general tandem queues with order tracking and expediting. Oper. Res. 70(4):2421–2438.
  • Sun J, Van Mieghem JA (2019) Robust dual sourcing inventory management: Optimality of capped dual index policies and smoothing. Manufacturing Service Oper. Management 21(4):912–931.
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
  • Svoboda J, Minner S, Yao M (2021) Typology and literature review on multiple supplier inventory control models. Eur. J. Oper. Res. 293(1):1–23.
  • van Hasselt H, Wiering MA (2009) Using continuous action spaces to solve discrete problems. 2009 Internat. Joint Conf. Neural Networks (IEEE, Piscataway, NJ), 1149–1156.
  • van Hasselt H, Doron Y, Strub F, Hessel M, Sonnerat N, Modayil J (2018) Deep reinforcement learning and the deadly triad. Preprint, submitted December 6, https://arxiv.org/abs/1812.02648.
  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30 (NeurIPS, San Diego, CA).
  • Veeraraghavan S, Scheller-Wolf A (2008) Now or later: A simple policy for effective dual sourcing in capacitated systems. Oper. Res. 56(4):850–864.
  • Veinott AF Jr (1965) The optimal inventory policy for batch ordering. Oper. Res. 13(3):424–432.
  • Wang W, Axelrod S, Gómez-Bombarelli R (2020) Differentiable molecular simulations for control and learning. Preprint, submitted February 27, https://arxiv.org/abs/2003.00868.
  • Wang X, Xiong W, Wang H, Wang WY (2018) Look before you leap: Bridging model-free and model-based reinforcement learning for planned-ahead vision-and-language navigation. Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Computer Vision (Springer International Publishing, Cham, Switzerland), 38–55.
  • Wang YJ, Lin CT (1998) Runge-Kutta neural network for identification of dynamical systems in high accuracy. IEEE Trans. Neural Networks 9(2):294–307.
  • Wang Z, Bapst V, Heess N, Mnih V, Munos R, Kavukcuoglu K, de Freitas N (2017) Sample efficient actor-critic with experience replay. 5th Internat. Conf. Learn. Representations (ICLR, Appleton, WI).
  • Werbos PJ (1990) Backpropagation through time: What it does and how to do it. Proc. IEEE 78(10):1550–1560.
  • Whittemore AS, Saunders S (1977) Optimal inventory under stochastic demand with two supply options. SIAM J. Appl. Math. 32(2):293–305.
  • Wilcoxon F (1992) Individual comparisons by ranking methods. Kotz S, Johnson NL, eds. Breakthroughs in Statistics, Springer Series in Statistics (Springer, New York), 196–202.
  • Williams RJ, Peng J (1990) An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Comput. 2(4):490–501.
  • Xin L (2021a) 1.79-approximation algorithms for continuous review single-sourcing lost-sales and dual-sourcing inventory models. Oper. Res. 70(1):111–128.
  • Xin L (2021b) Understanding the performance of capped base-stock policies in lost-sales inventory models. Oper. Res. 69(1):61–70.
  • Xin L, Goldberg DA (2018) Asymptotic optimality of tailored base-surge policies in dual-sourcing inventory systems. Management Sci. 64(1):437–452.
  • Xin L, Van Mieghem JA (2021) Dual-sourcing, dual-mode dynamic stochastic inventory models: A review. Preprint, submitted September 29, http://dx.doi.org/10.2139/ssrn.3885147.
  • Yang Z, Lee J, Park C (2022) Injecting logical constraints into neural networks via straight-through estimators. Chaudhuri K, Jegelka S, Song L, Szepesvári C, Niu G, Sabato S, eds. Internat. Conf. Machine Learn. (PMLR, New York), 25096–25122.
  • Yarats D, Zhang A, Kostrikov I, Amos B, Pineau J, Fergus R (2021) Improving sample efficiency in model-free reinforcement learning from images. Proc. 35th Conf. AAAI Artificial Intelligence, 33rd Conf. Innovative Appl. Artificial Intelligence IAAI 2021, 11th Sympos. Educational Adv. Artificial Intelligence, EAAI 2021, vol. 35 (AAAI Press), 10674–10681.
  • Yin P, Lyu J, Zhang S, Osher S, Qi Y, Xin J (2018) Understanding straight-through estimator in training activation quantized neural nets. 7th Internat. Conf. Learn. Representations (ICLR, Appleton, WI).
  • Zhong YD, Dey B, Chakraborty A (2020) Symplectic ODE-Net: Learning Hamiltonian dynamics with control. 8th Internat. Conf. Learn. Representations (ICLR, Appleton, WI).
  • Zipkin P (2008a) Old and new methods for lost-sales inventory systems. Oper. Res. 56(5):1256–1263.
  • Zipkin P (2008b) On the structure of lost-sales inventory models. Oper. Res. 56(4):937–944.