Persistent-Idle Load-Distribution

A parallel server system is considered in which a dispatcher routes incoming jobs to a fixed number of heterogeneous servers, each with its own queue. Much effort has been previously made to design policies that use limited state information (e.g., the queue lengths in a small subset of the set of servers, or the identity of the idle servers). However, existing policies either do not achieve the stability region or perform poorly in terms of job completion time. We introduce Persistent-Idle (PI), a new, perhaps counter-intuitive, load-distribution policy that is designed to work with limited state information. Roughly speaking, PI always routes to the server that has last been idle. Our main result is that this policy achieves the stability region. Since it operates quite differently from existing policies, our proof method differs from standard arguments in the literature. Specifically, large-time properties of reflected random walk, along with a careful choice of a Lyapunov function, are combined to obtain a Lyapunov condition over sufficiently long time intervals. We also provide simulation results that indicate that job completion times under PI are low for different choices of system parameters, compared to several state-of-the-art load-distribution schemes.


Introduction
This paper is concerned with the problem of distributing incoming jobs among a fixed set of K heterogeneous servers working in parallel, each with a dedicated buffer in which a queue can form. Jobs arrive at a dispatcher, which routes them immediately to one of the queues. Many load-distribution schemes have been proposed for this setting, differing from one another in the assumptions on the information available to the dispatcher upon arrival, and the way in which this information is used. It is well known, for example, that if job processing times are available upon arrival, routing each job to the queue with the least amount of workload (the time it will take the server to complete its currently assigned jobs) results in excellent performance in terms of job completion time (e.g., Foss (1982), Daley (1987), Wolff (1987), Koole (1992)). This policy is referred to as join-the-least-workload (JLW). A similar policy, also with appealing performance guarantees (e.g., Winston (1977), Weber (1978), Foschini and Salz (1978)), is the well-known join-the-shortest-queue (JSQ), which uses full knowledge of current queue lengths to route each job to the shortest queue. Unfortunately, obtaining such complete state information upon each arrival is often impossible in applications. Indeed, processing times are often not known in advance, and therefore workload is not observable (e.g., Georgiadis et al. (2006)). Another issue is that queue-length information may incur excessive communication overhead and delay involved in probing the servers for this information (e.g., Lu et al. (2011)).
Several policies that use just a small amount of state information, attempting to approximate JSQ, have been proposed. For example, in the power-of-d-choices policy, denoted SQ(d), the dispatcher is only aware of the current queue lengths in d > 1 queues chosen uniformly at random, and routes the job to the shortest among these queues. Even using d as small as 2 has been proven to result in dramatic performance improvement compared to sending jobs randomly (e.g., Vvedenskaya et al. (1996), Mitzenmacher (2001)). However, it is well known that, when servers are heterogeneous, SQ(d) is not stable (Foss and Chernova (1998)). Here and throughout the paper, we refer to a policy as stable if the underlying Markov chain describing the state of the system (e.g., queue lengths) is positive recurrent whenever the system is sub-critical. The instability of SQ(d) results from the fact that all servers are equally likely to be members of the chosen d.
Thus slow servers receive more work than they can handle when the overall load is high enough.
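The SQ(d) sampling rule is simple to state in code. A minimal sketch (the function name and signature are illustrative, not from the paper):

```python
import random

def sq_d(queue_lengths, d=2, rng=random):
    """Power-of-d-choices SQ(d): sample d distinct queues uniformly at
    random and route to the shortest among them (ties broken by the
    lowest sampled index)."""
    sampled = rng.sample(range(len(queue_lengths)), d)
    return min(sampled, key=lambda k: queue_lengths[k])
```

Because every server is equally likely to be sampled, a slow server's queue is probed as often as a fast server's, which is the source of the instability discussed above.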
Another policy that operates under little information transmission is the power-of-memory policy, denoted SQ(d,m) (Shah and Prabhakar (2002)). Upon a job arrival, the dispatcher samples the m shortest queues from the previous round, in addition to d ≥ m ≥ 1 new randomly chosen queues. The job is routed to the shortest among these d + m queues. Thus the dispatcher state information consists of d + m queue lengths. It was shown that SQ(1,1) is stable (Shah and Prabhakar (2002)). However, under this policy, when a server becomes idle, it may take time for the dispatcher to receive an update on this status, due to the random sampling mechanism, whereas under PI an update on idleness is immediate. This is expected to have a consequence on the cumulative idle times of the servers and, as a result, on job completion times. In our simulations, we indeed observe that SQ(1,1) performs poorly in terms of job completion time compared to PI, as well as to other policies we examine (see §6 for details).
A recently proposed policy that is also aimed at working with reduced state information is join-the-idle-queue (JIQ) (Lu et al. (2011)). In this policy, the dispatcher maintains a list of the servers that are currently idle. When a job arrives, a member of the list is chosen (e.g., the first member), and the job is routed to it. If no servers are idle, the job is routed to one of the K servers, chosen uniformly at random. JIQ was analyzed in the large-system limit (e.g., Lu et al. (2011) and Stolyar (2015)) and was shown to achieve excellent performance for homogeneous systems at scale. However, this policy is not stable in heterogeneous systems with a fixed number of servers, due to the randomization in the case where no servers are idle. This is demonstrated in Example 3 in the Appendix, and supported by the simulation results in §6.
Persistent-Idle (PI) load-distribution. The policy proposed in this paper adopts the approach of JIQ, according to which the dispatcher knows which servers are idle. In addition, the dispatcher remembers the identity of the last server it sent a job to. As in JIQ, when a job arrives and there are idle servers, it is sent to one of them (e.g., chosen uniformly at random). However, in PI, if there are no idle servers, then it is sent to the last server to which a job was sent. Thus, as long as servers are not idle, PI sends all incoming work to a single server, until a new server becomes idle, in which case all new jobs are routed to the new server, and so on. This perhaps counter-intuitive approach is fundamentally different from the policies discussed above, which strive for instant balance of the load across the servers. Accordingly, it raises immediate concerns regarding the duration of the period during which a single queue receives all incoming work, and how this affects stability and job completion time.
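The PI routing rule can be sketched as follows (a minimal sketch; the function name and arguments are illustrative):

```python
import random

def pi_route(idle_servers, last_choice, rng=random):
    """Persistent-Idle routing: if the dispatcher holds a token other
    than the one of its last decision, switch to one of those servers
    (uniformly at random); otherwise keep routing to the last chosen
    server."""
    # Tokens held by the dispatcher, excluding the last decision's token.
    new_tokens = set(idle_servers) - {last_choice}
    if new_tokens:
        # A new server became idle: all subsequent jobs go to it.
        return rng.choice(sorted(new_tokens))
    # Persist: no new token, keep sending work to the same server.
    return last_choice
```

Note that when the only available token is that of the last chosen server, the rule persists with that server, matching the policy description below.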
Our main contribution is in establishing the stability of PI in heterogeneous systems. Due to the unique nature of PI, which does not myopically stabilize the (multidimensional) queue-length process (and in which jobs can even be sent to the longest queue), standard techniques for proving stability fail. We expand on the difficulty of this problem and the motivation for our proof technique in §3.
We also provide simulation results comparing PI and all of the policies discussed above in heterogeneous systems. The simulations indicate that PI significantly outperforms the other reduced-state policies (i.e., SQ(2), SQ(1,1) and JIQ) in terms of job completion time and average workload.
Finally, similarly to token-based policies (e.g., Stolyar (2015)), PI can be implemented using a fixed set of K tokens, each going back and forth between the dispatcher and the server to which it belongs. Namely, a server sends its token when it becomes idle; the dispatcher sends it back only with a job to process. The required communication overhead is at most one message per job (Stolyar (2015)). We also demonstrate this via simulations in §6.
Organization. In §2, the model is described and the main result, stability, is stated. §3 provides the motivation for the proof, which is then presented in §4. §5 states and proves several auxiliary lemmas that are used in §4. Finally, §6 presents a simulation study that compares the performance of the PI policy with that of several load-distribution policies.

Model and main result
We start with the introduction of some notation that is used throughout the paper.

Model. We consider a parallel-server system with a dispatcher and a set [K] of heterogeneous servers, as depicted in Figure 1. Each server is work-conserving and has a buffer of infinite size in which a queue can form. The system evolves in discrete time n ∈ N, such that at each time, first a possible arrival occurs and then service is given (hence a server can work on a job at the time at which it arrives). The resulting state of the queue is then determined. In case a queue is empty, we refer to the corresponding server as being idle. Thus the term idle refers to the state at which a server presently has no jobs to process.

Notation
Idle tokens. An idle token (token for short) is associated with each server. Each of the K tokens can be held by either the corresponding server or the dispatcher. A mechanism that assures that, at all times, a token is held by the dispatcher if and only if the corresponding server is idle operates as follows. The dispatcher initially holds the tokens of the servers that are idle. When an idle server becomes busy, it receives its token from the dispatcher along with the arriving job. When a non-idle server becomes idle, it sends its token back to the dispatcher. All propagation times of jobs and tokens are zero.
Persistent-Idle policy. At each time, the dispatcher selects a server to which any (potential) arrival is to be routed. The dispatcher remembers its last decision and updates it according to the following rule: if it has no tokens that are different from the token corresponding to its last decision, it maintains its decision; otherwise, it selects a server corresponding to one of the tokens that it holds, uniformly at random.
For ease of exposition, we assume that the dispatcher may update its selection at each time, regardless of whether a job arrived or not. This may appear wasteful. However, the analysis remains the same when the dispatcher may only update its selection whenever an actual arrival occurs.
Arrivals and services. Let a probability space (Ω, F, P) be given, on which all the random variables and stochastic processes to be presented are defined. Denote by E the expectation w.r.t. P. Arrivals are modeled by an i.i.d. sequence {a_n} of Bernoulli random variables with P(a_n = 1) = λ. For each k ∈ [K], the service requirements at server k are modeled by an i.i.d. sequence {b_n^(k)} of random variables taking values in N, with E[b_1^(k)] = 1/µ^(k) and Var(b_1^(k)) = (σ^(k))². We assume that on the event {a_n = 1}, if the arriving job is assigned to server k, the amount of time it takes this server to complete the job, which we also refer to as the amount of work associated with it, is given by b_n^(k). Thus, the parameters {µ^(k)}_{k∈[K]} represent the mean service rates of the servers. Since b_n^(k) takes values in N, we have µ^(k) ≤ 1 for all k. The sequences {a_n}, {b_n^(1)}, . . ., {b_n^(K)} are assumed to be mutually independent. We do not make any assumptions on the service policy within each buffer. For example, the server can provide service to completion according to some rule such as FIFO or LIFO, or process a unit of work belonging to each of the present jobs in a round-robin fashion.
The same mathematical model describes a more general situation, in which multiple jobs may arrive at each time, all routed together to the same server according to the aforementioned policy. The event {a_n = 1} is then interpreted to mean that at least one job arrives at time n, and b_n^(k) models the total amount of work associated with these arrivals, provided that they are routed to server k. Whereas the results to be presented apply to this extended model, it is more convenient to stick to the original model as far as the verbal description is concerned.
Routing. Denote by W_n^(k) the workload in buffer k at time n ≥ 0, defined as the total work present in the buffer after possible arrival and service at time n, and let W_n = (W_n^(1), . . ., W_n^(K)). The routing is encoded in a process that takes values in [K], denoted by R_n. If a job arrives at time n, it joins the queue of the server specified by the value of R_n. Recall that under the PI policy, R_{n+1} may differ from R_n if and only if at time n, after arrival and service, the dispatcher holds at least one token that is different from R_n. We refer to such a time n as a New Token Available (NTA) time. Thus, if n_1 and n_2 are two consecutive NTA times, R_{n_1+1} is chosen uniformly at random from the set of available tokens at time n_1, and the routing process remains constant on the interval (n_1, n_2]. Define the indicator

η_n = 1{at time n the dispatcher holds a token different from R_n}. (1)

Then n is an NTA time if and only if η_n = 1. Note that the event {η_n = 0} describes two possible scenarios: (a) there are no idle servers at time n, and (b) there is a single idle server at time n, corresponding to the last used token.
Order of events. For clarity, before turning to the Markov chain formulation, we state the order of events at each time. At time n: (1) R_n is determined with respect to W_{n−1}; (2) there is a possible arrival, which is routed according to R_n; (3) service by each non-idle server is given; (4) the state of the queues, i.e., W_n, is updated.
Markov chain formulation. The state of the system is given by the process X_n = (R_n, W_n), n ≥ 0. This state contains two pieces of information: the routing R_n and the workload W_n at the different servers. This process satisfies the recursion, for n ≥ 1,

W_n^(k) = max(W_{n−1}^(k) + a_n b_n^(k) 1{R_n = k} − 1, 0), k ∈ [K], with R_n = ξ_n on {η_{n−1} = 1} and R_n = R_{n−1} otherwise, (2)

where ξ_n is a r.v. that is independent of (a_n, b_n), and whose conditional distribution, given the past states (X_0, X_1, . . ., X_{n−1}), is uniform on the set {k : W_{n−1}^(k) = 0} on the event {η_{n−1} = 1}, and takes an arbitrary value on the event {η_{n−1} = 0}.
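In plain terms, the workload update adds any routed arrival to its buffer and then serves one unit at each non-empty buffer, reflected at zero. A minimal sketch of one step, with the update rule max(w + arrival − 1, 0) inferred from the stated order of events (names are illustrative):

```python
def step_workloads(w, routed_to, arrival, job_size):
    """One-step workload update.

    w: list of current workloads W_{n-1}^(k).
    routed_to: R_n, the server any arrival joins.
    arrival: a_n in {0, 1}; job_size: b_n^(k) for the chosen server.
    Each buffer then completes one unit of work, reflected at zero.
    """
    nxt = []
    for k, wk in enumerate(w):
        if arrival and k == routed_to:
            wk += job_size          # new work joins buffer k
        nxt.append(max(wk - 1, 0))  # one unit of service, floor at zero
    return nxt
```

An idle server that receives a job of size b ends the slot with b − 1 units, reflecting that service can start in the arrival slot.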
We define the set S ⊂ Ŝ as the collection of all states s ∈ Ŝ such that s can be reached with positive probability from any of the states of the form (k, 0), k ∈ [K], corresponding to an empty system. The initial condition X_0 = (R_0, W_0) is assumed to lie in S with probability 1. As a consequence, we may and will consider X_n as a Markov chain on the state space S. Using (2), the definition of S, and the fact that the process can reach (k, 0) from (k′, 0) for any k, k′ (thanks to the assumption λ < 1), it follows that X_n is an irreducible, time-homogeneous Markov chain.
Remark 1. The information accessible to the policy consists of the set of servers that are currently idle and the last server a job was sent to. Whereas the workload at the different buffers is required in order to describe the system dynamics by means of a Markov chain, this information is not available to the algorithm. In other words, workload, queue-length and individual job-size information are all used in analyzing the algorithm, but are not accessible to the policy.
Our main result is the following.

Theorem 1. Assume λ < min(1, Σ_{k∈[K]} µ^(k)). Then: (i) X_n is positive recurrent. Consequently, (ii) X_n has a unique stationary distribution, denoted by π_X, and (iii) for any initial state s ∈ S and any B ⊂ S, the long-run fraction of time the chain spends in B converges a.s. to π_X(B).

Recall that the model is defined in such a way that one always has λ ≤ 1. In the case when Σ_{k∈[K]} µ^(k) ≤ 1, our result states that the policy is stable. When Σ_{k∈[K]} µ^(k) > 1, the above is still the best possible result, because the limitation that λ cannot lie in the interval (1, Σ_{k∈[K]} µ^(k)) is due to the model rather than the policy.

Proof motivation
A standard approach to proving stability for irreducible Markov processes in general, and for queueing models in particular, relies on the Lyapunov function technique, where a function L is constructed, mapping the (countable) state space to R_+, possessing a negative drift bounded away from zero at all states x outside of some finite set F ⊂ S, namely

E[L(X_1) | X_0 = x] − L(x) ≤ −ε for all x ∉ F, (3)

for some ε > 0, together with a finite expectation E[L(X_1) | X_0 = x] for x ∈ F. The existence of such a function ensures several stability properties, such as positive recurrence and convergence to a unique invariant measure (Lemma I.3.10 of Asmussen (2008)).
The stability proofs of load-balancing policies such as JSQ (Tassiulas and Ephremides (1992), Georgiadis et al. (2006)), SQ(d,m) (Shah and Prabhakar (2002), Shah (2017)) and JIQ (Lu et al. (2011)), as well as random routing (see Example 6.12 of Hajek (2015)), use this technique. Specifically, they all rely on a quadratic Lyapunov function, L(x) = x^T A x, where A is positive semi-definite, with the state defined as the queue-length vector. Some works on the stability of related models use sub-quadratic Lyapunov functions (Andrews et al. (2004), Walton (2013)).
In the case of PI, recall that the state descriptor includes the last routing decision in addition to workload, and so a Lyapunov function may, in general, depend on both components. However, the Lyapunov function that we construct does not depend on the last routing decision. We thus restrict the discussion to functions that depend only on workload or queue length.
Note that outside of any finite set there are states where the server with the largest queue receives all the incoming work with probability 1 (this corresponds to cases where the queue of the server that was idle last became the largest while none of the other queues reached zero). As a result, it is not hard to see that quadratic and sub-quadratic functions do not satisfy the negative drift condition in a single time step.
Intuitively, what makes the Lyapunov function technique effective for all of the aforementioned load-balancing policies is that they all strive to push the state of the system 'towards the origin' at every time step. As already mentioned, for PI this is not the case. There could be periods of time during which a large queue keeps receiving all incoming jobs, and this is maintained as long as all other queues are non-empty. Thus the stabilizing nature of the policy (if there is any) is not to be observed in a single time step, but rather over potentially many time steps.
This leads us to looking at the process at certain sampling times.
By working with workload rather than queue length, we accomplish the following two goals: (1) given a state, it is known precisely when a new server will become idle; this is given by the minimal workload among the servers that do not receive work; (2) we do not need to make any assumptions on service-time distributions or service policies within each buffer. The dynamics of the system are captured by a Markov process without the need to keep track of residual service or arrival times.
As for the sampling times, the construction that we provide uses a state-dependent number of steps (as in Malyshev and Men'shikov (1979), Meyn and Tweedie (1994)). Thus the calculation of the drift is performed with respect to time intervals that depend on the states the process traverses. The sampling times are chosen to be the NTA times, namely the times at which a token is available at the dispatcher and it is not the last used token. At these times the workload vector is always at the boundary of the positive orthant. The drift condition then only needs to be verified at these special states, rather than on the whole space. Denote the set of these states by S_b (where b is mnemonic for boundary). Because the Lyapunov function is observed at sampling times, the negative drift condition must be stated in terms of the duration of the intervals. That is, in place of (3), one must show

E[L(X_τ) − L(X_0) | X_0 = x] ≤ −ε E[τ | X_0 = x]

for all but finitely many x in the set S_b, where ε > 0 and τ denotes the next sampling time.
Whereas the dynamics of the chain are somewhat complicated, there is one element that can be described in relatively simple terms. Namely, between two consecutive sampling times, there is a single buffer that receives incoming traffic. During this interval, the workload at this particular buffer follows a reflected random walk starting at zero. Indeed, in this buffer, the increment of workload over a time step is given by the arriving workload minus 1, subject to the constraint that the workload remains non-negative. An additional useful fact is that the workload at all other queues decreases (or stays put) during the aforementioned interval. The significance of these observations is that they allow us to provide probabilistic estimates on the drift.
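The reflected-random-walk description can be sketched directly. A minimal sketch of the workload path at the single receiving buffer between two sampling times, assuming geometric job sizes for concreteness (names and parameters are illustrative):

```python
import random

def geometric(p, rng):
    """Geometric job size with mean 1/p, supported on {1, 2, ...}."""
    n = 1
    while rng.random() > p:
        n += 1
    return n

def receiving_buffer_path(steps, lam, mu, seed=0):
    """Path of the buffer that receives all arrivals between two
    sampling times: a random walk reflected at zero, whose step is the
    arriving workload minus one unit of service (mean lam/mu - 1)."""
    rng = random.Random(seed)
    w, path = 0, []
    for _ in range(steps):
        inc = geometric(mu, rng) if rng.random() < lam else 0
        w = max(w + inc - 1, 0)  # arrival minus one unit, floored at 0
        path.append(w)
    return path
```

When lam/mu < 1 the walk has negative mean drift and the path keeps returning to zero, which is the large-time behavior the proof exploits.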
Our proposed Lyapunov function is given by

L(w) = Σ_{k∈[K]} µ^(k) (w^(k))^{1+α}, (4)

where α ∈ (0, 1). Before we present the proof in the next section, we show that when α = 0 or α = 1, the function L of (4) does not, in general, meet the desired negative drift condition.
Example 1 (Counterexample in the case α = 0). Consider a system with 3 servers, each with rate µ < 1/3, and assume 2.5µ < λ < 3µ < 1. Now, suppose that a finite set is already chosen, and consider a state outside of it with the workload vector x_0 = (x, 0, 0). Therefore, if α = 0, then L(x_0) = µx. Now, the state at the next sampling time must be of the form x_1 = (x − 1, 0, y) or x_1 = (x − 1, y, 0). The server which receives the work receives on average λ/µ units of work per time step and completes one unit if it is non-idle. Thus, roughly, the average workload of this server is at least y = λ/µ − 1, which is larger than 1.5. Thus, L(x_1) = µ(x − 1 + y) > µ(x + 0.5) > L(x_0), and the drift is positive.
Example 2 (Counterexample in the case α = 1). Again, consider a system with 3 servers, each with rate µ < 1/3, and assume 2.5µ < λ < 3µ < 1. Now, suppose that a finite set is already chosen, and consider a state outside of it with the workload vector x_0 = (x, x, 0). Therefore, if α = 1, then L(x_0) = 2µx². Now, the state at the next sampling time must be of the form x_1 = (0, 0, y). At each time during this interval of duration x, the third server receives on average λ/µ units of work and completes one unit if it is non-idle. Thus, roughly, the average workload of the third server is at least (λ/µ − 1)x, which is larger than 1.5x. Thus, L(x_1) > 2.25µx² > 2µx² = L(x_0), and the drift is positive.
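The two counterexamples can be checked numerically. A minimal sketch, assuming the weighted form L(w) = Σ_k µ^(k) (w^(k))^{1+α} that the service-rate weights suggest; all concrete numbers below are illustrative choices satisfying 2.5µ < λ < 3µ < 1:

```python
def lyapunov(w, mu, alpha):
    """Candidate Lyapunov function: sum_k mu_k * w_k ** (1 + alpha)."""
    return sum(m * x ** (1 + alpha) for m, x in zip(mu, w))

mu, lam, x = [0.3, 0.3, 0.3], 0.8, 100.0
y = lam / mu[0] - 1  # average workload built up per time step, > 1.5

# Example 1 (alpha = 0): from (x, 0, 0) to roughly (x - 1, 0, y).
drift0 = lyapunov([x - 1, 0, y], mu, 0) - lyapunov([x, 0, 0], mu, 0)

# Example 2 (alpha = 1): from (x, x, 0) to roughly (0, 0, y * x).
drift1 = lyapunov([0, 0, y * x], mu, 1) - lyapunov([x, x, 0], mu, 1)
```

Both drifts come out positive, matching the conclusion that the extreme choices α = 0 and α = 1 fail and an intermediate α ∈ (0, 1) is needed.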
We finally comment that the parameter α with which the function (4) satisfies the drift condition depends on the arrival rate λ. Hence the Lyapunov function depends on λ, which is quite uncommon in the literature on the stability of queueing systems.

Proof of main result
For F ⊂ S define the hitting time τ_F = inf{n ≥ 1 : X_n ∈ F}. Our goal is to prove that there exists a nonempty finite set F ⊂ S such that

E_s[τ_F] < ∞ for all s ∈ F, (5)

where throughout the proof E_s denotes expectation conditioned on the initial state X_0 = s. The irreducibility of X_n and (5) imply positive recurrence, and hence part (i) of Theorem 1, by the following lemma:

Lemma 1 (Lemma I.3.10, Asmussen (2008)). Let {X_n} be an irreducible Markov chain and F a finite subset of its state space. Then the chain is positive recurrent provided that E_s[τ_F] < ∞ for all s ∈ F.
Part (ii) then follows by I.3.6 of Asmussen (2008). Finally, by the ergodic theorem for Markov chains, Theorem I.4.2 of Asmussen (2008), part (iii) will follow once it is shown that the chain is aperiodic. To this end, it suffices to show that there exists a state with a self-transition. This is clearly the case for any of the states (k, 0), again owing to the assumption λ < 1.
It thus remains to show (5). Our main idea is to sample X_n at the NTA times {n : η_n = 1}, namely, when a token is available and it is not the last used token. To define these random times, we first define the filtration

F_n = σ(X_0, . . ., X_n, ξ_{n+1}), n ≥ 0. (6)

While including ξ_{n+1} in (6) is not mandatory, we have chosen to do so because knowing the identity of the newly used token simplifies the analysis. Using (1), define {N_i, i ≥ 0}, an increasing sequence of F_n-stopping times, by

N_0 = inf{n ≥ 0 : η_n = 1} and N_{i+1} = inf{n > N_i : η_n = 1}, i ≥ 0. (7)

We have

Lemma 2. The stopping times N_i are finite a.s. for all i.
The proof is deferred to Section 5.
Define the sampled chain Y_i = X_{N_i}. For F ⊂ S, define the stopping time

σ_F = inf{i ≥ 0 : Y_i ∈ F}.

Thus σ_F equals the number of steps the sampled chain makes until reaching F. With a slight abuse of notation, we write σ = σ_F, where the corresponding set F should be clear from the context.
Denote Ñ_i = N_{i∧σ} and ∆_i = Ñ_{i+1} − Ñ_i. If X_{Ñ_i} ∈ F, the chain has stopped and ∆_i = 0. If X_{Ñ_i} ∉ F and there are several tokens available, then Ñ_{i+1} = Ñ_i + 1. Finally, if X_{Ñ_i} ∉ F and there is a single token available, then its corresponding server must be idle and chosen as R_{Ñ_i+1}, while all other servers are not idle. This means that during (Ñ_i, Ñ_{i+1}] all incoming work is routed to the server given by R_{Ñ_i+1}. Thus ∆_i equals the number of workload units at the server with the minimal workload at time Ñ_i, not including R_{Ñ_i+1} (which must be idle at time Ñ_i).
Our main step towards proving (5) is proving that a state-dependent drift criterion is fulfilled by the skeleton chain Y.
Lemma 3. There exist ε > 0, a function L : S → R_+ and a non-empty finite set F ⊂ S, such that, for all s ∈ S* \ F and i < σ,

E_s[L(Y_{i+1}) − L(Y_i) | F_{Ñ_i}] ≤ −ε ∆_i. (10)

Proof. For ease of exposition, throughout this proof we drop the subscript s and write E instead of E_s. We begin by fixing ε > 0 and a function L. We then consider a finite set F of the form

F = {s ∈ S : L(s) ≤ C} (11)

and show that if C is large enough, (10) holds. Denote µ_min = min_k µ^(k). Let ε_0 = Σ_{k∈[K]} µ^(k) − λ > 0, and let α ∈ (0, 1) be chosen appropriately as a function of λ and ε_0. The existence of such an α is guaranteed by Lemma 4, whose proof is deferred to Section 5.
The reason for analyzing E_1 and E_2 separately is that, when considering two consecutive sampling times n_1 and n_2, the drift behavior of the system under the events E_1 or E_2 at n_1 is qualitatively different.
Specifically, under E_1, during the interval (n_1, n_2] there is a single server i which is idle at time n_1 and receives all incoming work, while all other servers are non-idle. As a result, the drift during (n_1, n_2] is determined by the weight the Lyapunov function gives to the reflected random walk at server i, as opposed to the decrease at all other servers. As discussed in Section 3, since the workload at server i at time n_2 may be the maximal workload in the system, the Lyapunov function may need to be sub-quadratic, i.e., α < 1, depending on λ.
On the other hand, under E_2, there are more idle servers at n_1 and the next sampling time is n_1 + 1. The drift in this case is determined by the weight the Lyapunov function gives to the possible arrival at an idle server, as opposed to the decrease at the other non-idle servers in the system.
As discussed in Section 3, since there are more idle servers, if λ is sufficiently large, the expected amount of work that enters the system may be larger than the work completed by the non-idle servers. This suggests using a super-linear Lyapunov function, i.e., α > 0.
On the event E_1.

In this case, since i < σ, we have Y_i ∉ F. Additionally, since |{k : W_{Ñ_i}^(k) = 0}| = 1 and Ñ_i is a sampling time, there is a single idle server at Ñ_i, which is not R_{Ñ_i} and must be chosen as r_i. Therefore, during the time period [Ñ_i, Ñ_{i+1}), all servers except r_i are non-idle. Hence, by (2) and (8), for k ≠ r_i, we have

W_{Ñ_{i+1}}^(k) = W_{Ñ_i}^(k) − ∆_i. (14)

Since ξ_{n+1} is included in F_n, r_i is known given F_{Ñ_i}. Thus, using (14), we obtain (15). To bound the right hand side of (15) we use Lemma 5, whose proof is deferred to Section 5.
Using Lemma 5 in (15), for α > 0, there exists δ(α) > 0 such that (17) holds. Using (15) and (17), we obtain a bound on the first term on the right hand side of (13). We now bound the second term on the right hand side of (13). The server r_i is idle at time Ñ_i and receives all incoming work during (Ñ_i, Ñ_{i+1}]. During this time period, by (2), the workload dynamics at r_i are given by a random walk reflected at zero, with an average step size β^(r_i) = λ/µ^(r_i) − 1.
Note that the service-rate weights in the Lyapunov function (12) are needed to make sure that the right hand side of (22) is negative regardless of the value of r_i. Combining (21), (22) and (23) yields the desired bound on the drift in this case.

Finally, we have

Lemma 7. There exists a constant C_1 > 0 such that, for all C ≥ C_1 in (11), the drift bound (10) holds on E_1. The proof is deferred to Section 5.

Therefore, taking C ≥ C_1 in (11) completes the proof of this case.
On the event E_2.
In this case, |{k : W_{Ñ_i}^(k) = 0}| ≥ 2, i.e., the dispatcher has more than a single token available at Ñ_i. Therefore Ñ_{i+1} = Ñ_i + 1 and ∆_i = 1. Additionally, Y_i ∉ F. We provide a condition on F which ensures that the decrease at the server with the maximal workload results in a negative average drift. Similarly to (19), since r_i is measurable with respect to F_{Ñ_i} and W_{Ñ_i}^(r_i) = 0, by (24) and Lemma 6, there exists a constant γ > 0 such that (27) holds. Denote C_min = min_k µ^(k), and let C be large enough accordingly. Therefore there exists a k* ≠ r_i such that W_{Ñ_i}^(k*) > 1. Thus, by Lemma 5, (29) holds. By (27) and (29), we obtain (30). Using (25), (28) and (30) in (13), followed by (26) and (24), we obtain the required drift bound in this case as well.

On the event E_3.
Fix an initial state s_0 = (r_0, w_0^(1), . . ., w_0^(K)). We argue that, conditioned on X_0 = s_0, n_0 is a sampling time and consequently one has X_{n_0} ∈ S*.
Case 1: s_0 ∉ S*. The dispatcher has no tokens available which are different from r_0, and w^(k) > 0 for k ≠ r_0. Thus the first sampling time occurs when the least loaded server becomes idle.
Case 2: s_0 ∈ S*. In this case, time 0 is a sampling time and there is a token available different from r_0. If w^(r_0) = 0, time 1 is necessarily a sampling time as well. Otherwise, r_0 receives no work until it becomes idle, and time w^(r_0) is a sampling time. Note that times belonging to (0, w^(r_0)) may be sampling times as well.
Denote the n-step transition probabilities by p_{ss′}^(n) = P(X_n = s′ | X_0 = s). Also, p_{s_0 s}^(n_0) = 0 for s ∉ S*. We have

E_{s_0}[τ_F] ≤ n_0 + Σ_{s∈S*} p_{s_0 s}^(n_0) E_s[τ_F], (31)

where the last inequality is due to X_n being a Markov chain. To upper-bound E_s[τ_F], we use the fact that if X_0 = s ∈ S* \ F, then τ_F ≤ N_σ a.s., where N_σ is the time it takes the sampled chain Y to hit F. As a consequence of Lemma 3, σ < ∞ a.s., as we prove in Lemma 8; thus N_σ is well defined. Therefore, we proceed with finding an upper bound on E_s[N_σ]. By (7) and the fact that N_σ = N_0 + Σ_{i≥0} ∆_i, it suffices to bound the expected sum of the ∆_i. We now apply Lemma 3. Taking expectation on both sides of (10), we obtain E_s[L(Ỹ_{i+1})] − E_s[L(Ỹ_i)] ≤ −ε E_s[∆_i]. Summing over i ∈ [0, m − 1] and using (32) yields E_s[L(Ỹ_m)] − L(s) ≤ −ε E_s[Ñ_m − Ñ_0]. Using the fact that E_s[L(Ỹ_m)] ≥ 0 and rearranging yields (33).

Lemma 8. Assume the conditions of Lemma 3 hold and s ∈ S* \ F. Then E_s[σ] < ∞ and consequently σ < ∞ a.s.
The proof is deferred to Section 5.
Recall Ñ_i = N_{i∧σ}. By Lemma 8, the monotone convergence theorem and (33), we obtain (34). Revisiting (31), using E_s[τ_F] ≤ E_s[N_σ] and (34), we obtain (5). This concludes the proof. Q.E.D.

Lemmas
In this section we prove several lemmas that were used in the proof of the main result. For the reader's convenience, we restate these lemmas before providing their proofs.
Lemma 2. The stopping times N_i are finite a.s. for all i.

Proof. Fix i ≥ 0, and consider the event {N_i = ∞}. On this event, there exists an infinite sequence of consecutive times such that, apart from one server, all other servers are busy and receive no input. Therefore the workload in these servers must have been infinite at some point in time, which contradicts the finiteness of the initial condition and the arrival process. Thus the probability of this event is zero and the claim follows. Q.E.D.
Lemma 6. Let a sequence of i.i.d. Bernoulli r.v.s {a_n} be given, with E[a_1] = λ. Let µ > 0 and a sequence of i.i.d. r.v.s {b_n} be given, with E[b_1] = 1/µ and Var(b_1) = σ². Let H_n be a random walk such that H_0 = 0 and, for n ≥ 1, H_n = H_{n−1} + a_n b_n − 1. Define the corresponding reflected random walk W_n by W_0 = 0 and W_n = max(W_{n−1} + a_n b_n − 1, 0). Then the moment bound (40), with exponent 1 + α, holds.

Proof. First, since 0 < (1 + α)/2 ≤ 1, by Jensen's inequality we have (41). The process M_n is a symmetric random walk, and therefore a martingale. We have (42), where in the first inequality we have used the fact that W_n ≥ 0. By the L^p maximal inequality, and using Jensen's inequality again, we obtain (43) and (44). Using (42), (43) and (44) we obtain (45). By (41) and (45), the claim follows. Q.E.D.

Proof of Lemma 7. On E_1, i < σ and therefore ∆_i ≥ 1. Denote by u the threshold appearing in (47). We consider two cases corresponding to the sign of d_2 − ∆_i^{(1+α)/4} in (46).
Case 2: ∆_i < u. In this case, the first term in (46) can be bounded as in (48). Second, we show that (49) holds. Applying (48) and (49) in (46), we obtain the required negative bound, where the last inequality is due to (47). Thus, it remains to prove (49). By (47), u ≤ C_1/K. Since ∆_i < u and Y_i ∉ F, the sum over k ∈ [K] of the weighted workloads exceeds C. Therefore, there exists a k* ≠ r_i such that W_{Ñ_i}^(k*) is bounded below as in (50); (49) then follows, where the last inequality is due to (50). This concludes the proof. Q.E.D.

Simulations
We turn to present simulation results, which compare PI's performance with that of several load-distribution policies, namely JLW, JSQ, SQ(2), SQ(1,1) and JIQ. As shall be seen, the simulations indicate that PI is indeed stable. Moreover, they consistently show that its performance is comparable to JSQ's, even though JSQ utilizes full queue-length state information. They also indicate that JIQ and SQ(2) are not stable for heterogeneous systems, and that SQ(1,1) achieves poor performance with respect to job completion time.

Simulation settings
We consider a system with 10 heterogeneous servers, some being slow servers, working at a rate µ, and the others being fast servers, working at a rate 10µ. There are three scenarios, which correspond to different partitions of the servers into the two classes. More specifically, we consider the following slow:fast ratios: 1:9, 5:5 and 9:1. Given the ratio, the exact service rates are determined such that their sum is one. For example, if the ratio is 1:9, there is one slow server and nine fast servers. Their service rates sum to µ + 9 · 10µ = 91µ, so µ is set to equal 1/91. Thus, for λ < 1, the system is sub-critical. The service time distribution is geometric, with a mean depending on the server (e.g., mean 1/µ for slow servers).
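The rate normalization above can be sketched in a few lines (a minimal sketch; the function name is illustrative):

```python
def service_rates(n_slow, n_fast, ratio=10):
    """Service rates for the simulated heterogeneous system: fast servers
    run `ratio` times faster than slow ones, and all rates sum to one."""
    mu = 1.0 / (n_slow + ratio * n_fast)  # e.g., 1:9 gives mu = 1/91
    return [mu] * n_slow + [ratio * mu] * n_fast
```

With rates summing to one, any arrival rate λ < 1 makes the system sub-critical, as in the text.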
For each scenario, we run simulations over a large number of time slots for different loads, where all policies receive the same input process but possibly make different routing decisions. The number of time slots was chosen such that the differences between the outputs of different runs at the maximal load were negligible. We present the following graphs. Average workload in steady state versus load. For each simulation run with a given load, we calculate the average workload in the system (over all time slots, after an initial duration required for convergence). Under all policies, whenever the chain is positive recurrent, it is also ergodic; thus these time averages converge. Message rate versus load. Compared to the remaining policies, PI's message rate is significantly smaller and keeps decreasing as the offered load increases.

Appendix. Instability of JIQ
The example below demonstrates the claim made in the introduction that JIQ is not always stable.
Example 3. Consider the following simple example in continuous time, where the arrival process and service times are deterministic. There are 2 servers, with service rates µ_1 = 2/11 and µ_2 = 10/11, such that a job processed by the first (respectively, second) server takes precisely 11/2 (respectively, 11/10) time units. A new job arrives at every unit of time. Thus, the input rate is 1, and the sum of the service rates is 12/11; hence the system is sub-critical. If server 2 is idle when a job arrives, it is necessarily not idle when the next job arrives (because it requires more than a single unit of time to complete a job). Thus, at least 1/2 of the jobs are either randomly routed to one of the servers, or routed to server 1 if it is idle. Therefore, on average, server 1 receives at least 1/4 of the jobs. Since 1/4 > 2/11, server 1 is overloaded, and the system is unstable.
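The overload argument can be checked by a direct simulation of this deterministic-service example (a sketch; randomization is used among idle servers and when neither server is idle, as in JIQ):

```python
import random

def jiq_sim(n_jobs, seed=0):
    """JIQ on the 2-server example: one arrival per unit of time; server 0
    needs 11/2 time units per job, server 1 needs 11/10."""
    rng = random.Random(seed)
    service = [11 / 2, 11 / 10]
    busy_until = [0.0, 0.0]  # backlog expressed as a clearing time
    sent = [0, 0]
    for n in range(n_jobs):
        t = float(n)  # arrival epoch
        idle = [k for k in (0, 1) if busy_until[k] <= t]
        # JIQ: pick an idle server if any, otherwise route uniformly.
        k = rng.choice(idle) if idle else rng.randrange(2)
        busy_until[k] = max(busy_until[k], t) + service[k]
        sent[k] += 1
    return sent, busy_until
```

In line with the argument above, the slow server receives well over a fifth of the jobs and its backlog grows without bound, while the fast server keeps clearing its queue.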

Figure 1. A system of K parallel heterogeneous servers and a single dispatcher. The dispatcher routes incoming jobs using the Persistent-Idle load-distribution policy.