Multiagent Environments for Vehicle Routing Problems

Published Online: https://doi.org/10.1287/ijoc.2025.1211

References

  • Accorsi L, Lodi A, Vigo D (2022) Guidelines for the computational testing of machine learning approaches to vehicle routing problems. Oper. Res. Lett. 50(2):229–234.
  • Albrecht SV, Christianos F, Schäfer L (2024) Multi-Agent Reinforcement Learning: Foundations and Modern Approaches (MIT Press, Cambridge, MA).
  • Arishi A, Krishnan K (2023) A multi-agent deep reinforcement learning approach for solving the multi-depot vehicle routing problem. J. Management Anal. 10(3):493–515.
  • Balaji B, Bell-Masterson J, Bilgin E, Damianou A, Garcia PM, Jain A, Luo R, Maggiar A, Narayanaswamy B, Ye C (2019) ORL: Reinforcement learning benchmarks for online stochastic optimization problems. Preprint, submitted November 19, https://arxiv.org/abs/1911.10641.
  • Bello I, Pham H, Le QV, Norouzi M, Bengio S (2017) Neural combinatorial optimization with reinforcement learning. Proc. 5th Internat. Conf. Learn. Representations (ICLR).
  • Berto F, Hua C, Zepeda NG, Hottung A, Wouda N, Lan L, Tierney K, Park J (2025a) RouteFinder: Towards foundation models for vehicle routing problems. Trans. Machine Learn. Res. (OpenReview.net).
  • Berto F, Hua C, Luttmann L, Son J, Park J, Ahn K, Kwon C, Xie L, Park J (2025b) PARCO: Parallel autoregressive models for multi-agent combinatorial optimization. Advances in Neural Information Processing Systems (Neural Information Processing Systems Foundation, Inc., San Diego).
  • Berto F, Hua C, Park J, Luttmann L, Ma Y, Bu F, Wang J, et al. (2025c) RL4CO: An extensive reinforcement learning for combinatorial optimization benchmark. Proc. 31st ACM SIGKDD Conf. Knowledge Discovery Data Mining (ACM, New York).
  • Bettini M, Prorok A, Moens V (2024) BenchMARL: Benchmarking multi-agent reinforcement learning. J. Machine Learn. Res. 25(217):1–10.
  • Biagioni D, Tripp CE, Clark S, Duplyakin D, Law J, John PCS (2022) graphenv: A Python library for reinforcement learning on graph search spaces. J. Open Source Software 7(77):4621.
  • Bonnet C, Luo D, Byrne D, Surana S, Coyette V, Duckworth P, Midgley LI, et al. (2024) Jumanji: A diverse suite of scalable reinforcement learning environments in JAX. Internat. Conf. Learn. Representations (Vienna, Austria), 49264–49293.
  • Bono G, Dibangoye JS, Simonin O, Matignon L, Pereyron F (2020) Solving multi-agent routing problems using deep attention mechanisms. IEEE Trans. Intelligent Transportation Systems 22(12):7804–7813.
  • Bou A, Bettini M, Dittert S, Kumar V, Sodhani S, Yang X, Fabritiis GD, Moens V (2024) TorchRL: A data-driven decision-making library for PyTorch. Internat. Conf. Learn. Representations (Vienna, Austria), 1778–1811.
  • Braekers K, Ramaekers K, Van Nieuwenhuyse I (2016) The vehicle routing problem: State of the art classification and review. Comput. Indust. Engrg. 99:300–313.
  • Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI Gym. Preprint, submitted June 5, https://arxiv.org/abs/1606.01540.
  • Fitzpatrick J, Ajwani D, Carroll P (2024) A scalable learning approach for the capacitated vehicle routing problem. Comput. Oper. Res. 171:106787.
  • Fuertes D, del Blanco CR, Jaureguizar F, García N (2025) Top-former: A multi-agent transformer approach for the team orienteering problem. IEEE Trans. Intelligent Transportation Systems 26(9):13799–13810.
  • Gama R, Fernandes H, Fuertes D, del Blanco CR, Cunha R (2026) Multiagent environments for vehicle routing problems. https://doi.org/10.1287/ijoc.2025.1211.cd, https://github.com/INFORMSJoC/2025.1211.
  • Guo F, Wei Q, Wang M, Guo Z, Wallace SW (2023) Deep attention models with dimension-reduction and gate mechanisms for solving practical time-dependent vehicle routing problems. Transportation Res. Part E: Logist. Transportation Rev. 173:103095.
  • Hu S, Zhong Y, Gao M, Wang W, Dong H, Liang X, Li Z, Chang X, Yang Y (2023) MARLlib: A scalable and efficient multi-agent reinforcement learning library. J. Machine Learn. Res. 24(315):1–23.
  • Hubbs CD, Perez HD, Sarwar O, Sahinidis NV, Grossmann IE, Wassick JM (2020) OR-Gym: A reinforcement learning library for operations research problems. Preprint, submitted August 14, https://arxiv.org/abs/2008.06319.
  • Kim M, Park J, Park J (2022) Sym-NCO: Leveraging symmetricity for neural combinatorial optimization. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1936–1949.
  • Kool W, Van Hoof H, Welling M (2019) Attention, learn to solve routing problems! 7th Internat. Conf. Learn. Representations, ICLR 2019, 1–25.
  • Kwon YD, Choo J, Kim B, Yoon I, Gwon Y, Min S (2020) Pomo: Policy optimization with multiple optima for reinforcement learning. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 21188–21198.
  • Li J, Niu Y, Zhu G, Xiao J (2024) Solving pick-up and delivery problems via deep reinforcement learning based symmetric neural optimization. Expert Syst. Appl. 255:124514.
  • Li B, Wu G, He Y, Fan M, Pedrycz W (2022) An overview and experimental study of learning-based optimization algorithms for the vehicle routing problem. IEEE/CAA J. Automatica Sinica 9(7):1115–1138.
  • Liu F, Lin X, Zhang Q, Tong X, Yuan M (2024a) Multi-task learning for routing problem with cross-problem zero-shot generalization. KDD’24: Proc. 30th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1898–1908.
  • Liu Q, Liu C, Niu S, Long C, Zhang J, Xu M (2024b) 2D-Ptr: 2D array pointer network for solving the heterogeneous capacitated vehicle routing problem. AAMAS’24: Proc. 23rd Internat. Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 1238–1246.
  • Mazyavkina N, Sviridov S, Ivanov S, Burnaev E (2021) Reinforcement learning for combinatorial optimization: A survey. Comput. Oper. Res. 134:105400.
  • Menda K, Chen YC, Grana J, Bono JW, Tracey BD, Kochenderfer MJ, Wolpert D (2018) Deep reinforcement learning for event-driven multi-agent decision processes. IEEE Trans. Intelligent Transportation Systems 20(4):1259–1268.
  • Mohanty SP, Nygren E, Laurent F, Schneider M, Scheller CV, Bhattacharya N, Watson JD, et al. (2020) Flatland-RL: Multi-agent reinforcement learning on trains. Preprint, submitted December 10, https://arxiv.org/abs/2012.05893.
  • Nazari M, Oroojlooy A, Snyder LV, Takáč M (2018) Deep reinforcement learning for solving the vehicle routing problem. Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 9861–9871.
  • Pan W, Liu SQ (2023) Deep reinforcement learning for the dynamic and uncertain vehicle routing problem. Appl. Intelligence 53(1):405–422.
  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, et al. (2019) PyTorch: An imperative style, high-performance deep learning. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 8026–8037.
  • Raffin A, Hill A, Gleave A, Kanervisto A, Ernestus M, Dormann N (2021) Stable-Baselines3: Reliable reinforcement learning implementations. J. Machine Learn. Res. 22(268):1–8.
  • Shi R, Niu L (2023) A brief survey on learning based methods for vehicle routing problems. Procedia Comput. Sci. 221:773–780.
  • Shoham Y, Leyton-Brown K (2008) Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations (Cambridge University Press, New York).
  • Terry J, Black B, Grammel N, Jayakumar M, Hari A, Sullivan R, Santos LS, et al. (2021) PettingZoo: Gym for multi-agent reinforcement learning. Ranzato M, Beygelzimer A, Dauphin Y, Liang P, Vaughan JW, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 15032–15043.
  • Thyssens D, Dernedde T, Falkner JK, Schmidt-Thieme L (2023) Routing arena: A benchmark suite for neural routing solvers. Preprint, submitted October 6, https://arxiv.org/abs/2310.04140.
  • Towers M, Kwiatkowski A, Terry J, Balis JU, Cola GD, Deleu T, Goulão M, et al. (2024) Gymnasium: A standard interface for reinforcement learning environments. Preprint, submitted July 24, https://arxiv.org/abs/2407.17032.
  • Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. NIPS’15: Proc. 29th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2692–2700.
  • Wan CP, Li T, Wang JM (2023) RLOR: A flexible framework of deep reinforcement learning for operation research. Preprint, submitted March 23, https://arxiv.org/abs/2303.13117.
  • Wu X, Wang D, Wen L, Xiao Y, Wu C, Wu Y, Yu C, Maskell DL, Zhou Y (2024) Neural combinatorial optimization algorithms for solving vehicle routing problems: A comprehensive survey with perspectives. Preprint, submitted June 1, https://arxiv.org/abs/2406.00415.
  • Xiang C, Wu Z, Tu J, Huang J (2024) Centralized deep reinforcement learning method for dynamic multi-vehicle pickup and delivery problem with crowdshippers. IEEE Trans. Intelligent Transportation Systems 25(8):9253–9267.
  • Zhang Z, Qi G, Guan W (2023b) Coordinated multi-agent hierarchical deep reinforcement learning to solve multi-trip vehicle routing problems with soft time windows. IET Intelligent Transportation Systems 17(10):2034–2051.
  • Zhang K, He F, Zhang Z, Lin X, Li M (2020) Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach. Transportation Res. Part C: Emerging Tech. 121:102861.
  • Zhang Y, Bliek L, da Costa P, Afshar RR, Reijnen R, Catshoek T, Vos D, et al. (2023a) The first AI4TSP competition: Learning to solve stochastic routing problems. Artificial Intelligence 319:103918.
  • Zhou G, Li X, Li D, Bian J (2024a) Learning-based optimization algorithms for routing problems: Bibliometric analysis and literature review. IEEE Trans. Intelligent Transportation Systems 25(11):15273–15290.
  • Zhou J, Cao Z, Wu Y, Song W, Ma Y, Zhang J, Xu C (2024b) MVMoE: Multi-task vehicle routing solver with mixture-of-experts. Proc. 41st Internat. Conf. Machine Learn. (Vienna, Austria).
  • Zong Z, Zheng M, Li Y, Jin D (2022) MAPDP: Cooperative multi-agent reinforcement learning to solve pickup and delivery problems. Proc. AAAI Conf. Artificial Intelligence, vol. 36 (AAAI Press, Washington, DC), 9980–9988.