Dynamic Orienteering on a Network of Queues

Published Online:https://doi.org/10.1287/trsc.2017.0761

We introduce a stochastic orienteering problem on a network of queues in which the traveler must arrive and enter service at locations within the respective time windows to collect rewards, but the traveler may experience stochastic wait time at each location before service can begin. To maximize the expected rewards collected, the traveler must determine which locations to visit and how long to wait in queues at each location. We formally model the problem as a Markov decision process with the objective of maximizing the expected collected rewards. We investigate the existence of optimal control limits and examine conditions under which certain actions cannot be optimal. To solve the problem, we propose an approximate dynamic programming approach based on rollout algorithms. The method introduces a two-stage heuristic estimation that we refer to as compound rollout. In the first stage, the algorithm decides whether to stay at the current location or go to another location. If departing the current location, it chooses the next location in the second stage. We demonstrate the value of our modeling and solution approaches by comparing the dynamic policies to a-priori-route solutions with recourse actions.

The online appendix is available at https://doi.org/10.1287/trsc.2017.0761.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.