Multiagent Environments for Vehicle Routing Problems

Ricardo Gama (Corresponding Author)
https://orcid.org/0000-0002-7051-8310
Escola Superior de Tecnologia e Gestão de Lamego, Instituto Politécnico de Viseu, 5100-074 Lamego, Portugal

Ricardo Cunha
Escola Superior de Tecnologia e Gestão de Lamego, Instituto Politécnico de Viseu, 5100-074 Lamego, Portugal

Daniel Fuertes
https://orcid.org/0000-0002-5746-2199
Grupo de Tratamiento de Imágenes (GTI), Information Processing and Telecommunications Center, ETSI, Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain

Carlos R. del-Blanco
https://orcid.org/0000-0003-0618-3488
Grupo de Tratamiento de Imágenes (GTI), Information Processing and Telecommunications Center, ETSI, Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain

Hugo L. Fernandes
Singuli, Brooklyn, New York 11217

Published Online: 23 Jun 2026
https://doi.org/10.1287/ijoc.2025.1211

INFORMS Journal on Computing

Articles In Advance

Received: March 08, 2025
Accepted: May 17, 2026
Published Online: June 23, 2026

Cite as

Ricardo Gama, Ricardo Cunha, Daniel Fuertes, Carlos R. del-Blanco, Hugo L. Fernandes (2026) Multiagent Environments for Vehicle Routing Problems. INFORMS Journal on Computing 0(0).

https://doi.org/10.1287/ijoc.2025.1211
