Control of Dual-Sourcing Inventory Systems Using Recurrent Neural Networks

Lucas Böttcher (Corresponding Author)
https://orcid.org/0000-0003-1700-1897
Department of Computational Science and Philosophy, Frankfurt School of Finance and Management, 60322 Frankfurt, Germany

Thomas Asikis
https://orcid.org/0000-0003-0163-4622
Game Theory, University of Zurich, 8092 Zurich, Switzerland

Ioannis Fragkos
https://orcid.org/0000-0001-7654-2314
Department of Technology and Operations Management, Rotterdam School of Management, Erasmus University Rotterdam, 3062 Rotterdam, Netherlands

Published Online: 6 Jul 2023, https://doi.org/10.1287/ijoc.2022.0136

INFORMS Journal on Computing

Volume 35, Issue 6

November-December 2023

Pages 1215-1532, C2

Article Information

Received: May 05, 2022
Accepted: May 23, 2023
Published Online: July 06, 2023

Cite as

Lucas Böttcher, Thomas Asikis, Ioannis Fragkos (2023) Control of Dual-Sourcing Inventory Systems Using Recurrent Neural Networks. INFORMS Journal on Computing 35(6):1308-1328.

https://doi.org/10.1287/ijoc.2022.0136
