Online Learning and Optimization for Queues with Unknown Arrival Rate and Service Distribution
Abstract
We investigate an optimization problem in a queueing system where the service provider selects the optimal service fee p and service capacity to maximize the cumulative expected profit (the service revenue minus the capacity cost and delay penalty). The conventional predict-then-optimize (PTO) approach takes two steps: First, it estimates the model parameters (e.g., arrival rate and service-time distribution) from data; second, it optimizes a model taking these parameters as input. A major drawback of PTO is that its solution accuracy can often be highly sensitive to the parameter estimation errors because PTO is unable to effectively account for how these errors (Step 1) will impact the solution quality of the downstream optimization (Step 2). To remedy this issue, we develop an online learning framework that automatically incorporates the aforementioned parameter estimation errors in the optimization process; it is an end-to-end approach that can learn the optimal solution without needing to set up the parameter estimation as a separate step as in PTO. Effectiveness of our online learning approach is substantiated by (i) theoretical results including the algorithm convergence and analysis of the regret (“cost” to pay over time for the algorithm to learn the optimal policy) and (ii) engineering confirmation via simulation experiments of a variety of representative examples. We also provide careful comparisons between PTO and our online learning method.
Funding: X. Chen acknowledges financial support from the National Natural Science Foundation of China [Grants 72571237 and 72394361].
Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2023.0304.

