On the Power of (Even a Little) Resource Pooling
Abstract
We propose and analyze a multi-server model that captures a performance trade-off between centralized and distributed processing. In our model, a fraction p of an available resource is deployed in a centralized manner (e.g., to serve a most-loaded station) while the remaining fraction 1 − p is allocated to local servers that can only serve requests addressed specifically to their respective stations.
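To make the model concrete, here is a minimal simulation sketch. It assumes details the abstract does not state: Poisson arrivals at rate λ to each of N stations, exponential service, a total capacity of N normalized so that each local server works at rate 1 − p and the central server at rate pN, and ties among most-loaded stations broken by lowest index. The function name `simulate_avg_queue` and its parameters are hypothetical. The sketch uniformizes the continuous-time Markov chain, so every step is one arrival, local-service, or central-service event, chosen in proportion to its rate.

```python
import random


def simulate_avg_queue(n_stations, lam, p, n_steps=200_000, burn_in=20_000, seed=0):
    """Estimate the steady-state average queue length per station (hypothetical sketch).

    Assumed model: Poisson(lam) arrivals per station, local service rate 1 - p
    per station, central service rate p * n_stations aimed at a longest queue.
    The chain is uniformized: per-station event rates lam, 1 - p and p sum to
    lam + 1, and each step samples one event type in proportion to its rate.
    """
    rng = random.Random(seed)
    q = [0] * n_stations  # queue length at each station
    total = 0  # running sum of all queue lengths
    rate_sum = lam + 1.0  # lam + (1 - p) + p, normalized per station
    acc = 0.0
    for step in range(n_steps):
        u = rng.random() * rate_sum
        if u < lam:
            # Arrival to a uniformly chosen station.
            q[rng.randrange(n_stations)] += 1
            total += 1
        elif u < lam + (1.0 - p):
            # Local server of a uniformly chosen station; idle if that queue is empty.
            j = rng.randrange(n_stations)
            if q[j] > 0:
                q[j] -= 1
                total -= 1
        elif total > 0:
            # Central server removes one job from a most-loaded station.
            j = max(range(n_stations), key=q.__getitem__)
            q[j] -= 1
            total -= 1
        if step >= burn_in:
            # Uniformized steps share one rate, so a step average estimates the time average.
            acc += total / n_stations
    return acc / (n_steps - burn_in)
```

For example, `simulate_avg_queue(20, 0.9, 0.0)` approximates 20 independent M/M/1 queues at load 0.9, while `simulate_avg_queue(20, 0.9, 0.5)` moves half of the capacity to the central server. These are finite-N estimates with sampling noise, not the N → ∞ limit analyzed in the paper.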
Using a fluid model approach, we demonstrate a surprising phase transition in the steady-state delay scaling as p changes: in the limit of a large number of stations, and when any amount of centralization is available (p > 0), the average queue length in steady state scales as $\log_{1/(1-p)} \frac{1}{1-\lambda}$ when the traffic intensity λ goes to 1. This is exponentially smaller than the usual M/M/1-queue delay scaling of $\frac{1}{1-\lambda}$, obtained when all resources are fully allocated to local stations (p = 0). This indicates a strong qualitative impact of even a small degree of resource pooling.
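A quick numeric comparison shows why the gap is called exponential: with pooling, the average queue length grows like log_{1/(1−p)}(1/(1−λ)), that is, logarithmically in 1/(1−λ), while without pooling it grows like 1/(1−λ). The sketch below evaluates only these leading-order expressions, with all constants dropped, so the numbers indicate growth rates rather than actual queue lengths. The helper names are hypothetical.

```python
import math


def pooled_scaling(lam, p):
    """Leading-order order of magnitude log_{1/(1-p)}(1/(1-lam)), for 0 < p < 1."""
    return math.log(1.0 / (1.0 - lam)) / math.log(1.0 / (1.0 - p))


def mm1_scaling(lam):
    """Leading-order M/M/1 order of magnitude 1/(1-lam)."""
    return 1.0 / (1.0 - lam)


# Even with only 10% of capacity pooled (p = 0.1), the ratio shrinks toward 0 as lam -> 1.
for lam in (0.99, 0.999, 0.9999):
    print(lam, mm1_scaling(lam), pooled_scaling(lam, 0.1))
```

At λ = 0.9 and p = 0.1 the pooled expression is actually larger than 1/(1−λ). This is a reminder that the result describes the heavy-traffic limit, where dropped constants no longer matter.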
We prove convergence to a fluid limit, and characterize both the transient and steady-state behavior of the actual system, in the limit as the number of stations N goes to infinity. We show that the sequence of queue-length processes converges to a unique fluid trajectory (over any finite time interval, as N → ∞), and that this fluid trajectory converges to a unique invariant state $v^I$, for which a simple closed-form expression is obtained. We also show that the steady-state distribution of the N-server system concentrates on $v^I$ as N goes to infinity.
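The abstract does not reproduce the closed form of the invariant state, so the following is only an illustrative sketch derived here under assumed fluid dynamics, not an expression quoted from the paper. Let s_i be the fraction of stations with at least i jobs, so s_0 = 1. Assume that total arrivals balance total service, λ = (1 − p)s_1 + p, and that below the top occupied level, arrivals and local services balance across each level boundary: λ(s_{i−1} − s_i) = (1 − p)(s_i − s_{i+1}). Solving this recursion gives s_i = ((1 − λ)(λ/(1 − p))^i − p)/(1 − p − λ), truncated at the last level where it is nonnegative. That last level is about log_{λ/(1−p)}(p/(1 − λ)), which matches the logarithmic scaling described above. The function name `invariant_state` is hypothetical.

```python
import math


def invariant_state(lam, p):
    """Hypothetical fluid fixed point [s_0, s_1, ..., s_{i*}] with s_i = fraction of stations with >= i jobs.

    Derived from the assumed flow-balance equations (not quoted from the paper):
      lam = (1 - p) * s_1 + p
      lam * (s_{i-1} - s_i) = (1 - p) * (s_i - s_{i+1})  below the top level.
    Assumes 0 < p < lam < 1 and lam != 1 - p.
    """
    r = lam / (1.0 - p)
    # Top occupied level: the largest i with (1 - lam) * r**i on the correct side of p.
    i_star = math.floor(math.log(p / (1.0 - lam)) / math.log(r))
    denom = 1.0 - p - lam
    # Clamp tiny negative values that floating-point rounding may produce at i_star.
    return [max(0.0, ((1.0 - lam) * r**i - p) / denom) for i in range(i_star + 1)]


s = invariant_state(0.9, 0.5)
print(s, "average queue length:", sum(s[1:]))
```

Under these assumptions, the average queue length in the fluid limit is the sum of s_i over i ≥ 1. Because the number of nonzero terms grows only logarithmically in 1/(1 − λ), this sum stays small even in heavy traffic.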
This article appears in INFORMS Analytics Collections Vol. 15: 25 Years of INFORMS.

