Delay-minimizing capacity allocation in an infinite server queueing system

We consider a service system with an infinite number of exponential servers sharing a finite service capacity. The servers are ordered according to their speed, and arriving customers join the fastest idle server. A capacity allocation is an infinite sequence of service rates. We study the probabilistic properties of this system by considering overflows from sub-systems with a finite number of servers. Several stability measures are suggested and analysed. The tail of the series of service rates that minimizes the average expected delay (service time) is shown to be approximately geometrically decreasing. We use this property in order to approximate the optimal allocation of service rates by constructing an appropriate dynamic program.


Introduction
We are interested in the optimal allocation of service rates in a system with an infinite number of parallel servers and finite service capacity. Customers join the fastest server available, without jockeying if a faster server becomes available at a later time. This model is appropriate for a system with servers at different physical locations and no possibility to accommodate waiting customers. For example this may be the case in a distributed (cloud) computing system with jobs arriving at a central router that immediately sends them to the best available server. Other applications with ordered entry and heterogeneous servers are conveyor and storage systems. In this setting an allocation is given by a infinite series of service rates. Every such service-rate series determines the probabilistic traits of the system. Our objective is to minimize the stationary expected delay (service time) faced by arrivals. asymptotically optimal geometric tail of the service-rate series. The asymptotic optimal geometric rate is shown to be the square-root of the term in the product representing the aforementioned overflow (blocking) probabilities. Furthermore, we use the geometric approximation of the tail in order to define a finite dynamic program which can be solved efficiently. Numerical analysis suggests that the optimal service-rate series is very close to geometric from the start. This means that instead of solving the original capacity allocation problem we can approximate the solution by the single parameter problem of finding the optimal geometric service-rate series. Moreover, in the special case of a Poisson arrival process the simple heuristic of choosing a geometric service-rate series with decay rate √ ρ is quite close to the approximated optimal solution.
The use of stochastic queueing models in order to model cloud computing, also known as distributed or parallel computing, is very common (e.g. [13] and [25]). For example, the highly cited paper of [13] uses a M/G/m/m + r queueing system to approximate the performance. Our model is related to theirs in the case of r = 0. They state that a common assumption in cloud computing models is that there is some positive blocking probability with a predefined upper bound constraint, and our model can be conveniently used to approximate such a system. The afformentioned papers, and many others that use stochastic queueing models for cloud computing, assume homogeneous servers. However, most cloud computing systems do in fact have heterogeneous servers (see, for example, [4], [30] and [14]). Furthermore, the Fastest Server First is a common policy implemented in such systems (see [30]). In this paper we present a framework that allows for the analysis of constrained capacity allocation in large scale heterogeneous server systems. Another useful aspect of our model is that it provides tools to study the trade-off between blocking probabilities and expected delay in finite server systems. Other relevant applications of our model are large scale conveyor and storage systems (see discussion in [29]).
The research of ordered service systems with heterogeneous servers has mostly focused on analysing the blocking probability in loss systems, and their minimization in particular. In [23] it was shown that the optimal allocation of service rates in terms of minimizing blocking in an ordered Markovian system is heterogeneous. In [17] it was shown that the optimal series of service rates, in terms of minimizing blocking probabilities, is decreasing. Analysis of an ordered system with a general arrival process, along with the comparison methods for different entry order regimes, can be found in [29] and [22]. An interesting observation made in [8] is that the policy of Fastest Server First is not necessarily optimal, for example when the slower server has a lower variance of service time. Optimal assignment to an ordered system with heterogeneous customer types that can only be served by some of the servers was studied in [21]. The work presented here is related to [12] which analysed the capacity allocation and pricing in a loss system possibly with heterogeneous servers. The objective function considered in that paper is different from the others because the objective is maximizing profit and not minimizing blocking probabilities. This objective required analysis of the expected waiting times, which will also be important in the analysis presented here. In [20] routing policies were analysed with the goal of minimizing holding costs. Approximation analysis of ordered homogeneous-server systems can be found in [6] and [16], and heteregenous servers with a single queue (including a waiting buffer) under the Fastest Server First policy appeared in [2]. Another related work is [3] that considered a service capacity allocation problem for a system of parallel queues and heterogeneous customer types using a heavy-traffic approximation.
Paper outline: In Section 2 we present the model and mention some of its known properties. We define system stability along with necessary conditions for finite expected delay in Section 3. In Section 4 we introduce the special class of service-rate series that decreases geometrically. We prove that the tail of the optimal service-rate series is of this type. This fact is due to the product form of the blocking probabilities. In Section 5 we formally define our optimization problem as an infinite horizon dynamic program and suggest a numerical method to approximate its solution using the fact that the tail of the optimal series is geometric. We then proceed to present numerical analysis and examples of the optimal service-rate series in Section 6. The numerical results suggest that the optimal service-rate series is very close to geometric from the start, and not just at the tail. Finally, Section 8 features concluding remarks and a brief discussion of straightforward extensions of our analysis aimed at optimizing other performance measures of the system, apart from expected delay.

Model
Customers arrive at a service system according to a renewal process with mean interarrival time ET 0 = 1 λ . The system is comprised of an infinite number of parallel exponential servers that are ordered according to service-rate; µ 1 ≥ µ 2 ≥ · · · , such that ∞ n=1 µ n = µ > λ. Every arriving customer joins the fastest server available, and does not switch server even if a faster server later becomes available while he is still in the system. For a given µ, our goal in this paper is to find a series of service rates that minimizes the stationary expected sojourn time (delay) of an arriving customer.
Let X = (X 1 , X 2 , X 3 , . . .) be the random sequence of server indicators ,zero if idle and one if busy, at arrival times in the limit. The state space of the process can be defined as follows 1 , where A n = {x ∈ {0, 1} ∞ : n = sup{j : x j = 1}}. In words, A n is the set of all states such that the highest indexed busy server is n. Note that S is a countable collection of finite sets and is therefore countable. The underlying continuous time process is not Markovian, due to the general arrival distribution, however the embedded process at arrival moments is indeed a discrete-time Markov chain.
Remark. The random variables and distributional properties discussed in this work are all with respect to the limiting distribution X at arrival times, which is also the stationary distribution if the process is ergodic. Furthermore, all random variables depend on the service-rate series {µ n } ∞ n=1 , but we omit this from the notations for the sake of brevity.
Let Y := inf{i : X i = 0} denote the random variable of the fastest available server, according to the limiting distribution at arrival times. Further denote by S the respective limiting delay (service time) faced by an arbitrary arriving customer. The expected delay, is the expected service time at the fastest idle server upon arrival, We will soon argue that this limit is well defined even when there is no stationary distribution for the underlying process, in which case ES is infinite. To be specific, each probability P (Y = n) is derived from the limiting distribution of a Markov chain with a finite state space and therefore the infinite series is well defined and so is the sum. The state of any server n ≥ 1 depends only on the arrival process and on the service process at servers i ≤ n. For example, if the arrival process is Poisson then by viewing the first server as an isolated M/M/1/0 system, The distribution of Y is obtained from the blocking probabilities of consecutive subsystems, where P (X i = 1 ∀i ≤ n) is the blocking probability in a GI/M/n/n system with heterogeneous ordered servers. In the following analysis we use the more compact notation: p n := P (X i = 1 ∀i ≤ n), n ≥ 1, q n := P (Y = n), n ≥ 1.
We are interested in computing (1), which can be re-written as q n = p n−1 − p n . If the servers are homogeneous then the well-known Erlang Loss Formula can be applied for the blocking probabilities, but this is not possible for an infinite server system with finite capacity. Otherwise, the blocking probabilities are given in [29]: where p 0 := 1, L 0 (s) is the LST of the exogenous inter-arrival distribution, and is the Laplace-Stieljes Transform (LST) of the stationary time between overflows at server n, T n . Specifically, T n is the time between two consecutive arrivals that find the first n servers busy, and recall that T 0 is the external inter-arrival time. The recursive formula (3) generally relies on a Palm-type theorem for the renewal process of overflows at station n, that is, the probability at overflow times as opposed to the time-average distribution which is different as the counting process of overflows is not Poisson. The original result for homogeneous servers appeared in [24] (see also p.37 of [19]). A generalization to heterogeneous service rates appeared in [23] and a similar model with an additional waiting buffer for customers that find all servers busy upon arrival [7]. The derivative of (3) is and thus the mean time between overflows from server n is We next state a technical lemma that will be useful in the following analysis. We do not prove the lemma as these properties are straightforward extensions or rephrasing of known results. b. L n−1 (s) > L n (s) for any s > 0 and n ≥ 1 (Proof in [29]). c. If µ 1 ≥ µ 2 ≥ · · · then the decreasing series p n is convex in the discrete sense: p n−1 + p n+1 > 2p n , ∀n ≥ 2 (Proof in [28]).
Remark. In the following sections we use the notation a ∧ b := min{a, b}. We will also make frequent use of the notation a n ≈ b n to indicate that the two series, {a n } and {b n }, have the same tail behaviour. Formally, this means that there exists a constant 0 < C < ∞ such that lim n→∞ a n b n = C, and when used for the limits themselves it means that they are proportional: lim n→∞ a n lim n→∞ b n = C.
We use a n ≪ b n to indicate that C = 0. In the numerical analysis we will use when numerical results are close (in a non-accurate sense) to some value.

System stability
In an infinite-server system with infinite capacity, as in the homogeneous server GI/M/∞ model, the system is always stable. Normally a queueing system with finite capacity is stable if ρ < 1, in the sense that the number of customers in the system does not explode (and the underlying embedded Markov process is positive recurrent). However this is not sufficient for our model because the service allocation may cause the effective arrival rate to a subset of the system to exceed its service capacity. For example if the arrival process is Poisson then p 1 = L 0 (µ 1 ) = λ λ+µ 1 , further if µ 1 = µ − ǫ then the effective arrival rate to the system excluding server 1 is λp 1 = λ λ λ+µ−ǫ , which is larger than ǫ when ǫ is chosen small enough.
This paper does not directly address the issue of positive recurrence of the underlying process, X at arrival times, which seems to require a different approach than the blocking probability and delay computations employed here. Rather, we define two different levels of stability: the first simply states that all subsystems have a greater capacity than their effective arrival rate, and the second is finite expected delay -a condition that may not hold even if the underlying process is positive recurrent.
As a first reasonable condition, and as we will show one that is also necessary for finite expected delay, we would like a service-rate series {µ n } ∞ n=1 to satisfy That is, the effective arrival rate into the system excluding the first n servers is smaller than the remaining service capacity. In the memoryless arrival example, the first condition for n = 1 is λ λ λ+µ 1 < µ − µ 1 , or equivalently, Definition. A service-rate series {µ n } ∞ n=1 is feasible if it satisfies condition (6). Denote the set of feasible service-rate series by Proposition 2. For any λ ∈ (0, µ) there exists a feasible service-rate series (M = ∅) that is decreasing.
Proof. If λ < µ then the range for µ 1 given by Let m 1 be the solution to µ 1 = µ − λL 0 (µ 1 ). Recall that L 0 (0) = 1 and that λ < µ, therefore as L 0 is an LST, hence convex, there exists a unique solution m 1 > 0. This argument is illustrated in Figure 1. Any point µ 1 in the interior of the interval (0, m 1 ) satisfies the stability condition for n = 1, in particularμ 1 = αm 1 , for any α ∈ (0, 1). Suppose thatμ 1 , . . . ,μ n is a decreasing series that satisfies (6), then by applying the product form of (2) we have that condition (6) is satisfied for n + 1 if Let m n+1 > 0 be the unique solution to µ n+1 = µ − n i=1 µ i − λp n L n (µ n+1 ). Repeating the argument illustrated in Figure 1, the solution is unique and positive because L n (µ n+1 ) ∈ [0, 1] is convex and condition (6) is satisfied for n. We thus conclude that any µ n+1 ∈ (0, m n+1 ) satisfies (6). In particular, for any α ∈ (0, 1)μ n+1 := α(m n+1 ∧ µ n ) is non-increasing, feasible and ∞ n=1μ n ≤ µ. Recall that regardless of whether the service-rate series is feasible, the sub-system of the first n servers is ergodic for every finite n, hence the limit probabilities p n and q n exist for any service-rate series. Further observe that if condition (6) is satisfied for some N then it is satisfied for all n < N as well, as if this was not the case, i.e., Figure 1 for the case that the solid convex overflow rate line starts above the dotted linear capacity allocation line).

Lemma 3.
For any service-rate series {µ n } ∞ n=1 such that µ n > 0 for all n ≥ 1 and ∞ n=1 µ n = µ, there exists a limit Proof. Iterating the recursion of (3) yields By Lemma 1a each term in the product in the denominator is smaller than one, therefore for any n ≥ 1, The lower and upper bounds are obtained using Lemma 1a: first the fact that the LST is a decreasing function yields , and furthermore every every term in the series L n−1 (µ n ) is bounded from above by 1, hence, which implies that the product in the denominator does not converge to zero. If the limit a := lim n→∞ We will verify the existence of the limit a in three steps as outlined below: 1. We show that a in ∈ (0, 1] is increasing with n and therefore has a limit a i := lim n→∞ a in .
a in converges to a limit b < ∞, and therefore the product n−1 i=1 a in converges to a limit a = e −b .
Step 1: Since µ n > 0 for all n ≥ 1 then the convexity of L i−1 implies that hence, a in is increasing with n and bounded by one, and thus there exists a limit, Step 2: Let b in := | log(a in )| and b i := lim n→∞ b in , and observe that log(a in ) ≤ 0 as Step 3: The monotonicity of b in further implies that and as the sum is bounded when taking n → ∞, for any The above holds for an arbitrary ǫ > 0 and thus we conclude that lim n→∞ We conclude that there exists a limit ℓ = lim n→∞ L n−1 (µ n ) = L 0 (µ) a . We now turn our attention to the expected delay, Definition. A service-rate series {µ n } ∞ n=1 satisfies finite delay (FD) if it belongs to From (1) and (2) we have that is, a non-homogeneous geometric distribution. For n ≥ 1, recall that ℓ n := L n−1 (µ n ), ℓ := lim n→∞ ℓ n and denote p := lim n→∞ p n . If ℓ < 1 then the geometric term tends to the constant ℓ, i.e. the tail is as of a memoryless distribution.
Lemma 4. For any external arrival distribution T 0 and service-rate series Proof.
a. By (2), p n = n i=1 ℓ i . For any positive series {a n } ∞ n=1 the convergence of the product ∞ n=1 (1 − a n ) to a non-zero and finite limit is equivalent to the convergence of the sum ∞ n=1 a n (see [1], p. 209), hence p n hence by the previous property we have that p = 0.
c. An equivalent condition for Y being heavy-tailed is given by Theorem 2.6 of [10]: As P (Y > n) = p n , this is equivalent by the Stolz-Cesáro Theorem (discrete version of L'Hopital's Rule) to log p n − log p n+1 n→∞ − −− → 0, and as p n = ∞ n=1 ℓ n , we conclude that converges to zero if and only if ℓ = 1.
d. Recall the definition of the LST, ℓ n = L n−1 (µ n ) = Ee −µnT n−1 , then by applying Jensen's inequality and (5) we conclude that Lemma 4 suggests that the tail behaviour of the LST series L n−1 (µ n ), and its limit in particular, is a key component in analysing the stability and expected delay in the system. The following proposition summarizes the relationship between feasibility, finite expected delay and the tail behaviour of the LST series. In particular we obtain a necessary and sufficient condition for finite expected delay: any feasible service-rate series that satisfies ℓ < 1 with slower decay rate than ℓ. This will be useful for the optimization problem in the following sections.
Proposition 5. Let λ and {µ n } ∞ n=1 be the arrival rate and service-rate series, such that ∞ n=1 µ n = µ > λ. Then the following properties are satisfied: Proof.
a. This can be seen directly from (6) as the right-hand side tends to zero due to the capacity constraint.
b. The sum of the service-rate series converges to a finite sum, therefore its tail decays faster than that of the harmonic series. Without loss of generality, as we are only interested in the tail behaviour, we assume this is the case for all n ≥ 1: The expected delay then satisfies Hence, ℓ = 1 implies that ES = ∞. In other words, ℓ < 1 is a necessary condition for finite delay.
c. If ℓ < 1 then by Lemma 4c Y is not heavy tailed: there exists η > 0 such that ∞ n=1 q n e ηn < ∞.
If we further assume that then the tail of the service-rate series decays even faster than the exponential term, i.e., 1 µ n > e ηn ⇔ µ n < e −ηn .
Equivalently we can say that µ n < α n for α = e −η . If µ n = β n such that ℓ < β < α then contradicting the assumption that ES = ∞. Hence, if ℓ < 1 and the service-rate series decays slower than ℓ n then the expected delay is finite.
d. Any series {µ} ∞ n=1 that decays at least as fast as ℓ n induces infinite expected delay because ∞ n=1 ℓ n µ n = ∞.
hence the tail of the service-rate series is on the boundary of the feasible range given by (6). This means that the inequality condition of (6) is satisfied for every n although the difference converges to zero, and moreover that any series decreasing at a faster rate is not feasible. We conclude that feasibility of a series, {µ n } ∞ n=1 ∈ M, is a necessary condition for finite delay, {µ n } ∞ n=1 ∈ F D.
Proposition 5 yields a convenient necessary and sufficient condition for a feasible service-rate series to satisfy ES < ∞, by combining parts b and c of the proposition: We conclude this section by pointing out a few open questions and additional refinements of the stability analysis that can be considered in future work on this model. Remark 1. We conjecture that a stronger result than Proposition 5 holds, namely that Proposition 5c establishes that if µ n decays slowly enough then ℓ < 1 is sufficient for FD. Furthermore, if ℓ < 1 and ES = ∞ such that µ n ≤ β n , where β < ℓ, then µn pn ≤ β n ℓ n → 0, and by Lemma 4d we conclude that ℓ n → 1, a contradiction. That is, if the service-rate series decays faster than the blocking probability then ℓ = 1 and the expected delay is infinite. We are left with checking the case of µ n ≈ p n ≈ ℓ n , where ℓ < β. We believe that in this case ℓ n → 1 as well, and this belief is supported by numerical tests, but we were unable to prove this claim. In such a case ℓ n → 1 at a slow rate (in the sense of Lemma 4a). If this is true then indeed ES < ∞ ⇔ ℓ < 1, but we leave this issue as an open question. A more speculative conjecture is that the extreme case on the boundary of the feasibility region, µ n ≈ ℓ n , occurs when the underlying process is null-recurrent.

Remark 2.
An additional open question is whether for any λ < µ there exists a feasible service-rate series {µ n } ∞ n=1 such that ES < ∞, i.e. F D = ∅. We conjecture that this is the case, but have no proof. Note that for any finite N it is possible to construct a series {µ} N n=1 such that µ n decays at a slower rate than p n (by some positive factor), but the difficulty lies in showing that the rates don't coincide when taking N → ∞.
Remark 3. Little's Law implies that a finite expected delay, ES < ∞, is equivalent to a finite expected number of customers in the system. This means that a feasible servicerate series and ℓ < 1 are both necessary, but not sufficient, conditions for the expected number of customers in the system to be finite, which in itself is sufficient but not necessary for general system stability (in terms of positive recurrence of the underlying process). Nevertheless, the probability that a customer that arrives at server n, after being blocked by the previous servers, finds it busy is P (X n = 1|Y ≥ n) = L n−1 (µ n ). This can be seen by considering the blocking probability of the first server in a system with external arrival distribution L n−1 . From Lemma 3b we have that for any service-rate series, This gives us an interesting result: "bad" servers, i.e. large n and slow service-rate µ n , block a fixed proportion of arrivals to them.

Geometric service-rate series
A very natural capacity allocation to consider is using a simple geometric series determined by a single parameter. This is especially called for in light of Proposition 5 that established that if ℓ < 1 and the service-rate series decay slower than a geometric series with rate ℓ then the expected delay is finite. Moreover, such service-rate series satisfy properties that will be useful for dealing with the capacity allocation problem. Namely, the stability and finite delay conditions have a simple form and the tail of the optimal solution is indeed approximately geometric under some invariance conditions which will be elaborated.
Suppose that the service-rate series is determined by a single parameter representing the service capacity allocated to the first server. If we assume without loss of generality that µ = 1 (and then ρ = λ), then the class of such service-rate series is In this formulation, the single parameter is the service allocation of the first server, µ 1 = α. For any {µ n } ∈ M g we have that ∞ i=n+1 µ i = (1 − α) n , and therefore the feasibility condition (6) is simply λp n < (1 − α) n , ∀n ≥ 1, and the finite delay condition (9) is It is possible that the feasibility condition is met but p n → (1 − α) (from below) and then condition (10) is not met. Let ℓ n (α) := L n−1 (α(1 − α) n−1 ) and ℓ(α) = lim n→∞ ℓ n (α). In Figure 2    There are several interesting numerical observations to be made on ℓ n (α). For every n ≥ 2, the function ℓ n (α) is unimodal (attaining a minimum) and ℓ(0) = ℓ(1) = 1. Furthermore, the slope of the functions at ℓ n (0) is decreasing with n, which can be verified by recalling that the derivative of the LST at zero equals the negative of the overflow expectation given by (5): ET n = 1 λpn (which goes to −∞ as n → ∞ and explains why there seems to be a downwards discontinuity at zero as n gets large). This implies that for every n ≥ 1 there exists an α ∈ (0, 1) such that ℓ n (α) < 1 − α. It appears that this is the case also for the limit ℓ(α), which implies the finite delay condition (10), but we currently have no proof to this effect. A proof of this would resolve the open question described in Remark 2 in the previous section. The limiting function ℓ(α) appears to have a region of invariance as n grows, specifically: the function starts at ℓ(0) = 1, sharply decreases after zero, has an interval α ∈ (0, α) which it is almost constant ℓ(α) ℓ, and then sharply increases back to ℓ(α) 1 for α ∈ [α, 1]. In the case of a Poisson arrival process we observe that ℓ = ρ and α = 1 − √ ρ. The latter value is the explicit solution of ℓ 1 (α) = ℓ 2 (α), i.e. the α value where the first and second functions intersect. Interestingly, it appears that all of the functions intersect at around the same point. It is hard to tell whether the limiting function ℓ(α) would have an upward discontinuity to 1 at α or just a sharp and continuous increase as we see for n = 25. We were unable to computationally explore the function for higher values. The invariance of the limit function ℓ(α) also has implications on the delay-minimization problem: if ℓ(α) = ℓ for all α ∈ (0, α) then the tail of the objective function has a very simple form, ℓ n (1−α) n , and the optimal α can be computed as described below. Suppose now that the blocking probability for all n ≥ 1 is p n = ℓ n , and consequently q n = (1 − ℓ)ℓ n−1 , where ℓ < 1. We already established in Lemma 3b and Proposition 5 that this is a reasonable approximation for the tail behaviour of the expected delay for any feasible service with finite delay. In the sequel (specifically in Lemma 8) we will also show that if {µ n } ∞ n=1 ∈ F D then ℓ has a certain degree of invariance to the tail of {µ n } ∞ n=1 , thus providing additional justification for the use of approximation of the optimal solution with a fixed ℓ. The optimal service-rate series for such a system is the solution to an infinite dimensional convex program on a simplex: We refer to (11) as the Tail Approximation Program (TAP). The following proposition asserts that the solution to the TAP is in M g ∩ M with α = (1 − √ ℓ). This solution resembles the square-root optimal capacity allocation in a Jackson network (see [15], p. 329) 2 , but there is no direct link between the models. The program is a convex infinite horizon program, in the sense of [11] (for general optimality conditions see [5], p. 153), which allows us to find the optimal solution as a limit of finite dimensional programs. Proposition 6. The solution to (11) Proof. First of all we argue that the optimal service-rate series is non-increasing by applying a simple interchange argument. Suppose that {µ n } ∞ n=1 is an optimal service-rate series such that µ i < µ j for some i < j. The contribution of elements i and j to the objective function is If i < j then ℓ i > ℓ j , which means that a greater weight is given to 1 µ i which is bigger than 1 µ j . Hence, we can improve the objective without deviating from the capacity constraint by switching the values of µ i and µ j . This contradicts the assumption that the series is optimal. The objective function is an infinite sum of convex single-variable functions. We first consider the finite program for an integer M, Every element of the objective function is unbounded as µ n → 0 and therefore the solution is in the interior of the constraint set. This means that every element satisfies the firstorder condition − ℓ n µ 2 where κ is the Lagrange multiplier for the equality constraint κ M n=1 µ n − 1 = 0.
Simple algebra then yields

Optimization and approximation
We are interested in solving the mathematical program, This program can be formulated as an infinite horizon Markov Decision Process with state and action dependent discount factor (see [26]). The idea is that at every step n we consider a new system with inter-arrival distribution given by the overflows from server n − 1 and the remaining capacity constraint. The discount factor at step n will be given by the blocking probability when a customer overflows to server n, L n−1 (µ n ).
First we define the mappinĝ where L is the space of non-increasing functions from [0, ∞) to [0, 1]. For any n ≥ 1, given the overflow distribution L n−1 we take advantage of the recursive form of q n in (8) to obtain where by (2), The objective function of (12) can then be written as Therefore, an equivalent program to (12) is given by the Bellman equation v(µ, L) = min with the objective v(µ, L 0 ). In every step µ is the total available capacity and L is the LST defining the external arrival process to the system. While (13) has an elegant form it is not straightforward to solve even numerically. This is due to the infinite dimensional state space L, which is a space of continuous functions. We next suggest an equivalent program with a simpler state space that includes the server index and the capacities that have been allocated. For any given exogenous arrival distribution L 0 we can compute the values of L n (s) given the series {µ 1 , . . . , µ n−1 } using the recursive formula (3). The program (13) with capacity constraint µ can then be defined by the Bellman equation where s n := n i=1 µ i . The overall objective is v 1 (∅). Unfortunately there is an additional problem of computational complexity. Specifically, computing q n requires computing the recursion for L n−1 (µ n ) which is of the magnitude of 2 n steps. In the sequel we propose a numerical approximation method that relies on the solution of (14) for a small number of steps with the TAP solution (11) as an initial condition.
Observe that there is no direct restriction for the solution of (14) to be non-increasing, which is necessary if customers always go to the fastest server available. We next argue that the optimal series is indeed non-increasing, even without the explicit constraint. In Proposition 6 the explicit geometric decay rate of the optimal service series was shown to be √ ℓ for the approximation model, whereas in the general case we only know that it decreases but not at what rate. (12) is a non-increasing series. Proof. Suppose that {µ n } ∞ n=1 is an optimal solution such that µ n < µ n+1 for some n ≥ 1. The average expected delay is If the rates of server n and n + 1 are interchanged then first summand is unchanged, while the second is decreased because all blocking probabilities p k for k ≥ n decrease (see [17]), thus, contradicting the optimality of the series.
A nice property of decreasing service-rate series is given to us by Lemma 1c, which states that the series of blocking properties p n is discrete convex: Recall that p n = n i=1 L i−1 (µ i ), so in terms of the LST series this is equivalent to and thus This means that the series q n is decreasing, hence the weights of the increasing series of expected service times, 1 µn , in the objective function of (12) is decreasing.

Approximate solution
If ℓ < 1 then (2) gives us a geometric approximation of the tail behaviour of the blocking probabilities p n ≈ ℓ n . Thus, for large M we set Due to Proposition 6 we have that if the series ℓ n does not vary by much then a good approximation for the optimal solution is given by a geometric service-rate series with decay rate √ ℓ. Specifically, the optimal tail series is and the approximate optimal residual is For small values of M (≤ 25) we can accurately compute q 1 , . . . , q M and approximate the optimal residual by using the TAP solution. This yields the approximation and ℓ M := L M −1 (µ M ). The term r M represents the residual of the expected delay given by the tail approximation. According to Proposition 5b, ℓ < 1 for any series with finite delay. A finite-horizon dynamic program that approximates (14) can now be formulated: for 1 ≤ n ≤ M, v (M ) n (µ 1 , . . . , µ n−1 ) = min µn≤µ−s n−1 with initial condition v (M ) M +1 (µ 1 , . . . , µ M ) = r M and the objective v 1 (µ). We can increase M until r M is lower than some tolerance parameter, or alternatively until the change in the L n−1 (µ n ) series is smaller than some parameter.
The tail series {µ n } ∞ n=M +1 needs to satisfy the capacity constraint, which according to (15) yields In an optimal allocation (17) will have an equality, as there is no gain from not allocating all of the capacity. Lemma 1a implies that for every (µ 1 , . . . , µ M −1 ) there is a unique µ M for which an equality holds. Therefore, the dynamic program effectively only has M − 1 steps.
(3) For any allocation (µ 1 . . . , µ M −1 ) we can compute µ M which yields a single value v (M ) M +1 (µ 1 , . . . , µ M ) = r M . The search at any stage can be carried using a more efficient approach, such as bisection, and then the functions do not have to be evaluated at every point in the continuous search interval for every µ n . In practice, the search finds the optimal value in a very small number of computations using various generic optimization packages and the computational bottleneck is the evaluation of L n () for increasing n.
In the TAP the blocking probability is assumed to be constant, specifically the limit ℓ. For this to be a good approximation the tail of the series ℓ n := L n−1 (µ n ) needs to be somehow insensitive to changes in the tail of the service-rate series. This behaviour appeared in Observation 2 in Section 4 and was illustrated in Figure 2. To strengthen the justification of this approximation we further show that the tail of any service-rate series with finite delay is decreasing and is bounded from below by an increasing series. Proof. According to Lemma 3 the LST series has a lower bound of L 0 (µ), i.e. the external input LST with all of the capacity. This argument can be repeated for every overflow distribution L n−1 into server n, when the remaining capacity is µ − s n−1 . And so at any step n of the dynamic program (16) we have ℓ n = L n−1 (µ n ) ≥ L n−1 (µ − s n−1 ) =: ℓ n .
By (3), and as µ n + µ − s n = µ − s n−1 and the denominator is smaller than one we have that Hence the series of lower bounds, ℓ n , is increasing with n. Furthermore, if the series {µ n } ∞ n=1 is non-increasing and has finite delay then the series qn µn converges to zero and is therefore decreasing at the tail. Using (8), this implies that for large n: which yields 1 − ℓ n 1 − ℓ n+1 < µ n µ n+1 ℓ n < 1.
The last inequality comes from the finite delay condition in Proposition 5c that demands that the decay of the service-rate series be slower than that of the blocking probabilities. We therefore conclude that ℓ n > ℓ n+1 at the tail.
To summarize, for any reasonable service-rate series, that is non-increasing and with finite delay, the series ℓ n is decreasing and is also bounded from below by an increasing series. This shows that changing the service-rate series at the tail has a small, or bounded, effect on the limit ℓ, as long as the finite delay condition is met. In Figure 3 the lower bound series is illustrated alongside the LST series for the approximated optimal solution for an example set of parameters. Indeed, ℓ n approaches the lower bound series ℓ n very quickly. In this example we have that 0.025, which indicates that the error term is very accurate even for M = 15 (and at M = 19 the normalized error is already smaller than 0.001). For other examples similar behaviour was observed, and, as expected, for higher levels of ρ a bigger M was required to achieve good accuracy.

Numerical analysis
In this section we assume µ = 1 and that the external inter-arrival distribution is Gamma(k, kλ). In this case the overall utilization is ρ = kλ k = λ and the variance of the exogenous inter-arrival times is 1 kλ 2 = 1 kρ 2 . We can therefore examine different levels of utilization and variance by changing ρ and k. For high levels of ρ this analysis is quite general as the blocking probabilities in the heavy-traffic approximation of the GI/M/∞ are known to depend only on the first two moments of the arrival distribution (see [27]). Figure 4 illustrates the approximate optimal service-rate series by solving (16) for different parameter values with M = 15. The first thing to observe is that in all examples the approximate optimal service-rate series is very close to geometric.
In Table 1 we present the approximate optimal values of ES and the respective tail approximations r M . For high levels of ρ the contribution of the tail approximation is substantial and hence potentially less accurate. However, we find that the series of L n−1 (µ n ) stabilizes very quickly and therefore the approximation ℓ L M −1 (µ M ) is quite accurate, suggesting the error terms provide a decent approximation. The series of LST of the approximated optimal service-rate series are illustrated in Figure 5. In all examples the series indeed stabilizes very fast, hence the tail approximation using the value of ℓ M is appropriate. This stability was further verified by running simulations for a large number of servers using the approximate optimal service-rate series.      Figure 5: Laplace transform series (L n−1 (µ n )) corresponding to the approximate optimal service-rate series for varying values of ρ and k.
In the special case of Poisson arrivals (k = 1) it is interesting to observe that µ 1 1 − √ ρ, as seen in Figure 6. Thus, a reasonable rough approximation for the optimal service-rate series given by The value √ ρ appeared in two places before: (1) The solution to the equation L 0 (α) = L 1 (α(1 − α)) is α = 1 − √ ρ, as was elaborated in Observation 2 of Section 4 (see also Figure 2). This seems to be a critical point for the limit function ℓ(α) for geometric service-rate series (with rate α).
(2) The optimal tail decay rate given in Proposition 6 is √ ℓ, where in the Poisson case we observe that ℓ Cρ where C is a constant that was in the range of (1, 1.15) in all examples computed. Let ρ 0 = ρ = λ µ and let ρ n := λpn µ− n i=1 µ i denote the effective utilization of the subsystem excluding the first n servers for n ≥ 1. In Figure 7 we see that the effective utilization series, given the approximate optimal service-rate series, is decreasing with n for all parameter values. An interesting result is that in all examples the tail of the utilization level series decays geometrically. The rate of decay is slightly higher than 1−µ 1 , that is to say the effective utilization decreases at a slower rate than the service-rate series, as expected.

Summary of numerical results
The approximated optimal service-rate series is close to geometric with decay rate 1 − µ 1 . In the special case of a Poisson arrival process we observed that µ 1 1 − √ ρ and ℓ was slightly larger than ρ. When considering a fixed ρ, lower variance of the exogenous arrival stream leads to a higher service-rate for the first server, along with a faster decline to zero. However, as ρ increases the service capacity is allocated more "uniformly" (with a lower decay rate). The effective utilization series of sub-systems, ρ n , under the approximate optimal service-rate series is decreasing. Furthermore, this series has a geometric tail with a slower decay rate than the optimal service-rate series.
Recall that in order for the approximation to be reasonable we would like the series of overflow LST L n−1 (µ n ) to approach a constant at a quick rate. Indeed, we observe that it stabilizes very fast, suggesting that our approximation scheme using its limit is accurate even for a small number of DP steps M. This may provide some explanation as to why the optimal series seems geometric from the start. If the series L n−1 (µ n ) is more or less constant from the start then the solution to the TAP from Proposition 6 is close to optimal. It would be interesting to find an analytical explanation for why the optimal service-rate series comes with a stable overflow LST series.

Applications
The analysis presented here can be modified in order to solve several other system design problems, for example: a. Multi-objective optimization: suppose that the system administrator can choose the number of servers as well as the capacity allocation. If n servers are used then the system has a customer loss probability of p n . The administrator may seek an optimal allocation with a constraint on the loss probability, e.g. p n ≤ p < 1. The other way around is also an option: minimize p n subject to a constraint on the delay, ES ≤ w.
b. Suppose that the system administrator wants to maximize the number of users that wait less than some τ > 0 time threshold. This could be an exogenous performance measure or the case if customers do not pay if their delay is too long. The objective is now c. Customers are heterogeneous with respect to the utility from the speed of service, and balk from the system if the fastest available server is slower than their value. Assume that customers are distributed according to a continuous distribution with a convex cdf Λ such that Λ(0) > 0. If the system wants to minimize the blocking probability then the objective becomes min {µn} ∞ n=1 ∞ n=1 q n (1 − Λ(µ n )).
In this case the overflow distribution also requires a modification toL n−1 (µ n ) = L n−1 (µ n )Λ(µ n ), in order to take into account the balking customers.
d. In general, our analysis can be applied to any convex function of the service rate.

Discussion
This paper analyses stability and expected delay in an infinite-server system with finite service capacity. In particular, the expected service-time minimizing allocation of service-rates is examined. It is shown that the optimal service-rate series is geometrically decreasing at the tail. Numerical analysis suggests that the optimal allocation is very close to geometric throughout the series, and not just at the tail. This property is related to the product form of the blocking probabilities from finite sub-systems. An interesting numerical observation is that in the Poisson case we have that ℓ ρ (the limiting term in the blocking probability product) and consequentially the optimal tail decay is simply √ ρ. An open challenge is to find analytical justification for this phenomenon.
The most important conclusion of the paper is that allocating capacity to heterogeneous servers under capacity constraints should be done with caution. Even if there is seemingly enough capacity for the incoming arrival rate, allocating too much capacity to the fast servers can lead to very long expected delay. In this paper we analysed an infinite server system but this conclusion is also relevant for finite server loss systems with very low blocking probability, in which case expected delay would be finite but potentially very big. This is most relevant for very large systems, however the geometric tail behaviour implies that with a "bad" allocation the expected delay can increase very fast with the number of servers and therefore the conclusion is still relevant for moderately sized systems, i.e. not necessarily hundreds of servers.
Stability analysis in the probabilistic sense of the underlying Markov chain X(t) ∈ S is also of interest. Specifically, establishing necessary and sufficient conditions for the process to be positive recurrent. This can perhaps be achieved by considering the embedded Markov chain at arrival times. For the Markovian M/M/∞ ordered system, lower-bound and upper-bound systems with simpler dynamics can possibly be constructed and analysed using matrix-analytic methods. Rigorous characterisation of the stability conditions could potentially also shed some light on the open problems discussed in the end of Section 3. In particular, establishing the exact necessary and sufficient conditions for expected finite delay. The distinction between positive and null recurrence is potentially the additional refinement required for dealing with the case of a service series on the boundary of the feasibility region.
A question that arises is whether there is an asymptotic analog to the optimal geometric series we have presented here, perhaps even in closed form. Detailed heavy-traffic analysis for the homogeneous ranked M/M/∞ system can be found in [18], and approximation analysis of blocking probabilities in multi-server systems can be found in [27]. Approximations of the number of idle servers in many-server systems, such as [9], may also be useful. Approximating our model requires the analysis of systems with non-homogeneous servers (see [3]).