Dynamic Orienteering on a Network of Queues
Abstract
We introduce a stochastic orienteering problem on a network of queues in which the traveler must arrive and enter service at locations within the respective time windows to collect rewards, but the traveler may experience stochastic wait time at each location before service can begin. To maximize the expected rewards collected, the traveler must determine which locations to visit and how long to wait in queues at each location. We formally model the problem as a Markov decision process with the objective of maximizing the expected collected rewards. We investigate the existence of optimal control limits and examine conditions under which certain actions cannot be optimal. To solve the problem, we propose an approximate dynamic programming approach based on rollout algorithms. The method introduces a two-stage heuristic estimation that we refer to as compound rollout. In the first stage, the algorithm decides whether to stay at the current location or go to another location. If departing the current location, it chooses the next location in the second stage. We demonstrate the value of our modeling and solution approaches by comparing the dynamic policies to a-priori-route solutions with recourse actions.
The online appendix is available at https://doi.org/10.1287/trsc.2017.0761.