Randomized Assignment of Jobs to Servers in Heterogeneous Clusters of Shared Servers for Low Delay

We consider the job assignment problem in a multi-server system consisting of $N$ parallel processor sharing servers, categorized into $M$ ($\ll N$) different types according to their processing capacity or speed. Jobs of random sizes arrive at the system according to a Poisson process with rate $N \lambda$. Upon each arrival, a small number of servers from each type is sampled uniformly at random. The job is then assigned to one of the sampled servers based on a selection rule. We propose two schemes, each corresponding to a specific selection rule that aims at reducing the mean sojourn time of jobs in the system. We first show that both methods achieve the maximal stability region. We then analyze the system operating under the proposed schemes as $N \to \infty$ which corresponds to the mean field. Our results show that asymptotic independence among servers holds even when $M$ is finite and exchangeability holds only within servers of the same type. We further establish the existence and uniqueness of stationary solution of the mean field and show that the tail distribution of server occupancy decays doubly exponentially for each server type. When the estimates of arrival rates are not available, the proposed schemes offer simpler alternatives to achieving lower mean sojourn time of jobs, as shown by our numerical studies.

1. Introduction. Consider a stream of jobs arriving at a multi-server system consisting of a large number of parallel processor sharing servers. The servers are categorized into different types or clusters according to their processing capabilities. Each job, upon arrival, is assigned to a server where it completes its service and leaves the system. The objective is to design job assignment schemes that reduce the average sojourn, or response, time of jobs in the system. online search, social networking etc. In these systems, a small increase in the average response time of requests may cause significant loss of revenue and users [15]. Therefore, it is critical to reduce the average response time of jobs in such systems.
Reduction in the average response time can be achieved by assigning arrivals to less congested servers [19,7,21] in the system. However, in today's systems, where the number of front end servers is large, obtaining state information of all the servers incurs a significant communication overhead. For such systems, randomized job assignment schemes, in which each assignment decision is made based on comparing the states of a random subset of d (≥ 2) servers, are promising solutions. For systems with identical servers (homogeneous), such randomized schemes have been shown [18,10,6] to result in a significant reduction in mean response time of jobs as compared to state independent schemes, in which job assignments are made independent of server states. This implies that for large homogeneous systems, a small, randomly chosen subset of servers is representative of the distribution of load in the overall system.
In this paper, we consider heterogeneous systems where servers are grouped into different types or clusters, often geographically separated, based on their capacities. Motivated by the aforementioned intuition arising from the homogeneous case, we consider randomized job assignment schemes, in which a small random subset of servers is sampled from each server type. The least loaded servers of each type are then compared based on the instantaneous processing rates they offer. The job is then assigned to the server that provides the highest processing rate. We consider processor sharing (PS) as the service discipline in this paper since it closely approximates roundrobin discipline with small granularity [14] usually employed in server farms. Moreover, processor sharing discipline has the desirable property of being insensitive to job length distribution type [8].
1.2. Related literature. Randomized job assignment schemes have been primarily studied in the literature for a system consisting of N identical first come first serve (FCFS) servers, which is also referred to as the supermarket model. Most studies consider the so called shortest-queue-d (SQ(d)) scheme in which each job is assigned to the shortest of d randomly chosen queues.
For d ≥ 2, [18] showed, using the theory of operator semigroups, that the equilibrium queue sizes decay doubly exponentially in the limit as the system size increases (as N → ∞). Mitzenmacher in [10,11] derived the same result using an extension of Kurtz's theorem [5]. In [17], a coupling argument was used to show that larger values of d result in more even distri-bution of loads among the servers. Chaoticity on path space (or asymptotic independence among queue length processes) was established in [6] using empirical measures on the path space. Results of [18] were generalized to the case of open Jackson networks in [9].
Recently, in [3], the SQ(d) scheme was analyzed under more general service disciplines and service time distributions. It was shown that in the case of FCFS discipline and power-law service time distribution, the equilibrium queue sizes decay doubly exponentially, exponentially, or just polynomially, depending on the power-law exponent and the number of choices, d. The stability of more general randomized schemes for non-idling service disciplines was analyzed in [2], which derived a sufficient condition under which such networks are stable. Asymptotic independence of servers in equilibrium was proposed in [4] under local service disciplines and general service time distributions. However, the result was proved only for FCFS service discipline and service time distributions having decreasing hazard rate (DHR) functions.
The tradeoff between sampling cost of servers and the expected sojourn time seen by a customer in the supermarket model was studied under a game theoretic framework in [22]. It was shown that for arrival rates within the stability region of the network, a symmetric Nash equilibrium for identical customers exists in which each customer chooses a fixed number of queues to sample.
Recently, in [13,12], the SQ(d) scheme was considered for a system of parallel processor sharing servers with heterogeneous service rates. It was shown that, in the heterogeneous setting, random sampling of d servers from the entire system reduces the stability region. However, it can be recovered using the SQ(d) scheme over a randomly chosen server type.
1.3. Main results. In this paper, we propose two new randomized schemes for job assignment in the heterogeneous scenario. In both the schemes, upon arrival of a job, a small number of servers of each type is randomly sampled. The sampled servers are then compared based on their states and the arrival is assigned to the best server among the chosen set of servers. The metric for choosing the best server distinguishes the two schemes.
This represents a scenario where a centralized dispatcher first requests information from each bank or type of servers and then routes the job to the server that is going to give the lowest response time among the sampled servers. The number of servers sampled from a given type depends on the tradeoff between the sampling cost and the likely sojourn time as in the supermarket model in [22]. We do not address the precise tradeoffs in this paper suffice to say that we assume that they could be different at each server type. We describe the precise mechanisms below.
In the first scheme, each arrival is assigned to the sampled server with the least number of unfinished jobs. In the second, each arrival is assigned to the sampled server offering the maximum processing rate per unfinished job. Note that, in the both the schemes, the sampled set contains servers of all types. We show that such sampling achieves the maximum possible stability region.
We analyze the performance of the proposed schemes in the limit as the system size N → ∞ using the mean field approach. Our analysis shows the following.
• The stationary tail distribution of server occupancies decay doubly exponentially in the limiting system. We devise indirect methods to show this, since, unlike the homogeneous case, closed form solutions of the stationary distribution cannot be obtained in the heterogeneous scenario. • We establish the existence and uniqueness of the equilibrium point of the mean field equations in the space of empirical tail measures having finite first moment. Our proof, again, differs from the earlier works since closed form solutions cannot be obtained. • We show that propagation of chaos holds at each finite time and also at the equilibrium. In that, we generalize the earlier results on propagation of chaos to systems where exchangeability holds only among servers of the same type.
We also numerically compare the proposed schemes with existing schemes for the heterogeneous case. It is observed that the proposed schemes result in lower mean response time of jobs in scenarios where arrival rates cannot be estimated.

1.4.
Organization. The rest of the paper is organized as follows. In Section 2, we describe the system model, the proposed job assignment schemes and our notations. We then analyze the proposed schemes in Sections 3, 4, and 5. In Section 6, numerical results are presented that compare the schemes and determine the accuracy of the theoretical results derived in the paper. Finally, we conclude the paper in Section 7 with a summary and a discussion on future work.

Model and notations.
We consider a multi-server system consisting of N parallel processor sharing (PS) servers. The capacity, C (bits/sec), of a server is defined as the time rate at which it processes a single job Job dispatcher System consisting of N parallel processor sharing (PS) servers, categorized into M types. There are γjN servers of type j, each of which has a capacity or rate Cj. Arrivals occur according to a Poisson process with rate N λ. For each arrival, the job dispatcher samples dj servers of type j and routes the arrival to to one of the sampled servers.
present in it. If there are q(t) jobs present at a server of capacity C at time t, then the instantaneous rate at which each job is processed in the server is given by C/q(t). Depending on their capacities, the servers in the system are categorized into M (≪ N ) types. Define J = {1, 2, . . . , M } to be the index set of server types. The capacity of type j servers is denoted by C j , for j ∈ J , and we assume, without loss of generality, that the server capacities are ordered in the following way: Further, for each j ∈ J , we denote the proportion of type j servers in the system by γ j (0 ≤ γ j ≤ 1). Clearly, M j=1 γ j = 1. Jobs arrive at the system according to a Poisson process with rate N λ. Each job is of random length, independent and exponentially distributed with a finite mean 1 µ (bits). 1 The inter-arrival times and the job lengths are assumed to be independent of each other. Upon arrival, a job is assigned to one of the N servers where the job stays till the completion of its service after which it leaves the system. The model is illustrated in Figure 1. We consider the following two job assignment schemes.
2.1. Scheme 1. In this scheme, upon arrival of a job, d j servers of type j are sampled uniformly at random from the set of N γ j servers of type j, for each j ∈ J . Note that this sampling is done at the cluster of type j servers by a local router.
Let q (j,1) denote the vector of occupancies of the d j sampled servers of type j. For each type j ∈ J , a sampled server with index k j is chosen for further comparison where k j is given by In case of ties among sampled servers of type j, the index k j is chosen uniformly at random from the tied servers of that type. The occupancy information of the server corresponding to k j is sent to the central dispatcher.
Using this information from each of the clusters j ∈ J the arriving job is assigned by the dispatcher to the type i sampled server having index k i where Ties across server types are broken by choosing the server type having the highest capacity among the tied servers. Thus, in this scheme, each arrival is assigned to the server having the least instantaneous occupancy among the subset of randomly selected servers.
2.2. Scheme 2. As in Scheme 1, upon arrival of a job, a random subset of d j servers of type j is chosen uniformly, for each j ∈ J . Then from each type j ∈ J , a server with index k j is chosen according to (2.2) for further comparison across different server types. The arriving job is finally assigned to the type i sampled server having index k i if Note that the quantity C j /q (j,k j ) N denotes the processing rate per unfinished job at the sampled type j server with index k j . Thus, in this scheme, an arrival is assigned to the server that provides the highest processing rate per job among the sampled set of servers. Ties are broken in the same way as described in Scheme 1.
It is clear that Scheme 2 differs from Scheme 1 only in the criterion for server selection. In Scheme 1, server selection is done based only on the instantaneous occupancies of the sampled servers, whereas in Scheme 2 server capacities are also taken into account in the selection criterion. Note that in the heterogeneous scenario a server with higher occupancy can still provide a higher processing rate than a server with lower occupancy. Therefore, Scheme 2 provides a finer metric for server selection.
2.3. Notations. We define the following real sequence spaces: Let j∈JŪ   It can be easily verified that under the metric defined in (2.8), the spaceŪ M is compact (and hence complete and separable). Further, for any k ∈ Z + and i, j ∈ J , we define where ⌊x⌋ denotes the greatest integer not exceeding x and ⌈x⌉ denotes the smallest integer greater than or equal to x.
Let (H, H, µ H ) be a measure space and f : H → R be a µ H -integrable function. We define duality brackets as f, µ H = f dµ H . We denote the weak convergence (convergence in distribution) of a sequence of probability measures P n (random variables X n ) to a probability measure P (random variable X) by P n ⇒ P (X n ⇒ X).
3. Stability analysis. In this section, we derive the sufficient condition for the system to have a finite expected number of jobs at all times under Scheme 1 and Scheme 2. In other words, we find the set of arrival rates for which the Markov process describing the time evolution of the system is positive Harris recurrent or stable. We use the stability condition derived in [2] for more general join-the-shortest-queue (JSQ) networks.
Proof. Suppose that the N servers in the system are indexed by the set S = {1, 2, . . . , N }. For each job, we define a selection set to be the subset of j∈J d j servers sampled at its arrival. We denote by p A the probability that the subset A ⊆ S is chosen as the selection set for an arrival. Note that p A , A ⊆ S, defines the job assignment scheme used. Under Scheme 1 and Scheme 2, the probability p A is non-zero only for subsets A which contain d j servers of type j for all j ∈ J and for each such a subset A, the probability p A is given by  [2] to the system under consideration, we obtain the sufficient condition for stability of the system to be where C (n) denotes the capacity of the server with index n in the set S. Clearly, for Scheme 1 and Scheme 2, the term within the braces in (3.3) is non-zero only when the subset B is composed of at least d j servers of type j for all j ∈ J . Let B j (≥ d j ) denote the number of type j servers in B. Using (3.2) and (3.3) we now have It is easy to verify that that the function j∈J ( B j d j ) j∈J B j C j is increasing with respect to B j for each j ∈ J . Hence, the expression within the braces in (3.4) is maximized when we set B j = N γ j . Hence, we have Therefore, from (3.3) and (3.5) we conclude that the system under consideration is stable under Scheme 1 and Scheme 2 if (3.1) holds.
Remark 3.1. An alternative proof of stability via a coupling argument is as follows: Consider a modified scheme in which, upon arrival of each job, one server is chosen from each type uniformly at random (i.e., d j = 1 for all j ∈ J ). The job is then routed to the sampled server of type j with probability γ j C j i∈J γ i C i for each j ∈ J . A coupling argument, similar to the one discussed in the proof of Theorem 3 of [9], shows that the system operating under the modified scheme always has higher number of unfinished jobs than that operating under Scheme 1 or Scheme 2. It is easy to check that the system operating under the modified scheme is stable under (3.1). Hence, the system operating under Scheme 1 and Scheme 2 also must be stable under (3.1).
As discussed in [2], for λ > µ j∈J γ j C j , the system under consideration is unstable under any job assignment policy. Thus, from Theorem 3.1 we conclude that Scheme 1 and Scheme 2 achieve the maximal stability region.

Mean field analysis.
We now analyze the time evolution of the number of jobs in the system under Scheme 1 and Scheme 2. Its exact characterization is difficult since under both the schemes, arrivals at a given server depend on the states of other servers. However, it is possible to analyze the system in the limit as the system size N → ∞. Such a limit is known as the mean field limit [10,18,9] and it exists because under random sampling of a fixed number of servers from each type the statistical properties of the system do not change when states among servers of the same type are permuted.
To formally state our results, we define the process N,n (t) denotes the fraction of type j servers having at least n unfinished jobs at time t. Thus, x (j) N,n (t), n ∈ Z + denotes the empirical tail distribution of occupancy of type j servers at time t. Clearly, x N (t) is a Markov process in the state space j∈JŪ

4.1.
Convergence to the mean field. The main aim of this subsection is to prove the following result.
Theorem 4.1. If x N (0) converges in distribution to some constant g ∈ U M as N → ∞, then the process {x N (t)} t≥0 converges in distribution to a process {u(t)} t≥0 , lying in the spaceŪ M as N → ∞. For Scheme 1, the process u(t) is given by the solution of the following system of differential equations where the mapping l :Ū M → R Z + M is given by For Scheme 2, the process u(t) is given by the solution of where the mappingl : The process {u(t)} t≥0 , defined in the theorem above, is referred to as the mean field. We first note that Theorem 4.1 implicitly assumes that the ordinary differential systems (4.2)-(4.3) and (4.6)-(4.7) have unique solutions in the spaceŪ M . In the following proposition, we show that this is indeed the case. To emphasize the dependence of the solution u(t) on the initial point g, we will often denote u(t) by u(t, g). Proof. The proof is given in Appendix A.
We will prove Theorem 4.1 using the theory of semigroup operators of Markov processes as in [18,9]. Before doing so, we recall the following from [5].
acting on continuous functions f :Ū M → R is defined as In the next proposition, we show that T N (t) converges to T(t) uniformly on bounded intervals. This in conjunction with Theorem 2.11 of Chapter 4 of [5] proves Theorem 4.1.
Proposition 4.2. For both Scheme 1 and Scheme 2, and for any continuous function f :Ū M → R and t ≥ 0, and the convergence is uniform in t within any bounded interval.
Proof. The proof is given in Appendix B.
Remark 4.1. We note that Theorem 4.1 implies that if x N (0) ⇒ g ∈ U M as N → ∞, then the following weaker convergence results also hold: The last assertion follows from the first since x (j) N,k (t) is bounded for each N, j, k, t.

4.2.
Properties of the mean field. In this section, we characterize some important properties of the mean field. In particular, we show that, under the stability condition (3.1), both (4.2)-(4.3) and (4.6)-(4.7) have unique equilibrium points in U M . Further, we show that the equilibrium points are globally asymptotically stable for both systems.
Let P,P denote the equilibrium points of (4.2)-(4.3) and (4.6)-(4.7), respectively. In other words, P andP satisfy l(P) = 0 andl(P) = 0. Hence, for all k ∈ Z + and j ∈ J the following must hold Note that by definition we have P  Hence, if a sequence {z n } n≥1 decays doubly exponentially, then it is summable, i.e., ∞ n=1 z n < ∞.
Then the following equations must hold (4.14)P Further, for each j ∈ J , the sequences P (j) k , k ∈ Z + and P (j) k , k ∈ Z + decrease doubly exponentially. In particular, under the assumption of the proposition, both P (j) k , k ∈ Z + and P (j) k , k ∈ Z + are summable sequences.
Proof. We prove the proposition for P. The proof forP follows along the same line of arguments. For a fix j we add (4.11) for all k ≥ l and use lim k→∞ P (j) Now, multiplying both sides of the above equation by 1 ∆ j and adding over all j ∈ J and using lim k→∞ P (j) k = 0 yields (4.13). From (4.13) we obtain Since by hypothesis, for each j, P k . This proves that the sequence P (j) k , k ∈ Z + decreases doubly exponentially for each j.
The following proposition guarantees that there exists equilibrium points of systems (4.2)-(4.3) and (4.6)-(4.7) in U M . Proof. The proof is given in Appendix C.
The next theorem shows that P andP are the unique globally asymptotically stable equilibrium points of the systems (4.2)-(4.3) and (4.6)-(4.7) in the space U M .  Proof. The proof for Scheme 1 is given in Appendix D. For Scheme 2, the theorem can be similarly proved.
We now show that, under (3.1), the stationary distribution of the process x N converges weakly to the Dirac measure concentrated at the unique equilibrium point of the mean field. Let π N denote the stationary distribution of the process x N . Clearly, π N exists and is unique under (3.1). Further, for Scheme 1 and for Scheme 2.
Proof. We prove the theorem for Scheme 1. The proof for Scheme 2 follows similarly.
Note that since the spaceŪ M is compact, so is the space of probability measures onŪ M . Therefore, the sequence of probability measures {π N } N has limit points. Thus, in order to prove the theorem we need to show that all limit points coincide with δ P .
Due to Theorem 4.1, any limit point π of the sequence π N must be an invariant distribution of the maps g → u(t, g). Hence, by uniqueness proved in Theorem 4.3, it is sufficient to prove that π is concentrated on U M . To prove that π is concentrated on U M it is sufficient to show that This completes the proof.
We have so far established that the interchange property indicated in Figure 2 holds. Note that the convergences indicated in the figure are in distribution. In this subsection, we focus on the occupancies of a given finite set of servers as N → ∞. We show that as the system size grows the server occupancies become independent of each other. Such independence holds at any finite time and also at the equilibrium, provided that the initial server occupancies satisfy certain assumptions. This is formally known as the propagation of chaos [6,16] or asymptotic independence property [4,3] in the literature.
To formally state the results we introduce the following notations. Let q (j,k) N (t), for j ∈ J and k ∈ {1, 2, . . . , N γ j }, denote the occupancy of the k th server of type j at time t ≥ 0. By q (j,k) N (∞) we denote the occupancy of the k th server of type j in equilibrium. Further, let χ (j) N,n (t), for j ∈ J and n ∈ Z + , denote the fraction of type j servers having occupancy n at time t ≥ 0. Define the process χ N (t) = χ (j) N,n (t), j ∈ J , n ∈ Z + . Clearly, χ N,n (t), n ∈ Z + denotes the empirical distribution of occupancies of type j servers and for each n, j, we have χ N (∞) we will denote the empirical distribution occupancies for type j servers in equilibrium. Let the process Q(t) = Q (j) n (t), j ∈ J , n ∈ Z + be defined as Q n , n ∈ Z + . We also define the following notion of exchangeable random variables.
denote a collection of N random variables among which N γ j belong to a particular class j and are indexed by k, where 1 ≤ k ≤ N γ j . The collection is called intra-class exchangeable if the joint law of the collection is invariant under permutation of indices, 1 ≤ k ≤ N γ j , of random variables belonging to the same class.
Proposition 4.4. For the model considered in this paper, for both schemes, q (j,k) Proof. Note that the first part of the proposition is a special case of the second part. Hence, it is sufficient to prove the second part. We will provide a proof for the M = 2 case. The proof can be readily generalized to any M ≥ 2.
Due to the dynamics of the system (under Scheme 1 or Scheme 2) and the hypothesis of the proposition {q (j,k) The hypothesis of the proposition also implies that χ N (t) ⇒ Q(t) as N → ∞ for all t ∈ [0, ∞]. Henceforth, we will omit the variable t in our calculations, which hold for all t ∈ [0, ∞].
To prove the proposition, it is sufficient to show that the following convergence holds as N → ∞.
for all bounded mappings φ k , ψ k : Z + → R + . Now we have Note that the second term on the right hand side of the above inequality vanishes as N → ∞ since χ (j) N ⇒ Q (j) as N → ∞ for j = 1, 2 and Q (1) and Q (2) are constants. Now, due to exchangeability we have Hence, the first term on the right hand side of (4.21) can be bounded as follows where max ( φ k ∞ , ψ k ∞ ) = B. This completes the proof.
Thus, the above proposition shows that in the limiting system server occupancies become independent of each other. It also shows that the stationary occupancy distribution of any type j server is given by n+1 , n ∈ Z + for Scheme 2.
5. Computation of the stationary distribution. So far we have shown that in the limiting system (N → ∞) each finite collection of servers behave independently and the stationary tail distribution of occupancy of a type j ∈ J server in the limiting system is given by P (j) k , k ∈ Z + under Scheme 1 and P (j) k , k ∈ Z + under Scheme 2. Using the independence of servers in the limiting system we conclude the following proposition.
Proposition 5.1. In equilibrium, the arrival process of jobs at any given server in the limiting system is a state dependent Poisson process. Further, the arrival rate of jobs to a server of type j ∈ J when it has occupancy k in the equilibrium is given by for Scheme 1 and for Scheme 2.
Proof. We provide the proof for Scheme 1. The proof for Scheme 2 follows from similar line of arguments.
Consider a tagged type j server in the system and the arrivals that have the tagged server as one of its possible destinations. These arrivals constitute the potential arrival process at the tagged server. The probability that the tagged server is selected as a potential destination server for a new arrival is Thus, due to Poisson thinning, the potential arrival process to the tagged server is a Poisson process with rate γ j . Next, we consider the arrivals that actually join the tagged server. These arrivals constitute the actual arrival process at the server. For finite N , this process is not Poisson since a potential arrival to the tagged server actually joins the server depending on the number of jobs present at the other possible destination servers. However, as N → ∞, due to the asymptotic independence property shown in 4.4 the occupancies of the sampled servers become independent of each other. As a result, in equilibrium the actual arrival process converges to a state dependent Poisson process as N → ∞.
Consider the potential arrivals that occur to the tagged server when its occupancy is k. This arrival actually joins the tagged server with probability 1 x+1 when x other servers among the d j servers of type j have occupancy k, all the d i servers of type i < j have at least occupancy k, and all the d i servers of type i > j have at least occupancy k + 1. Thus, the total arrival rate λ (j) k can be computed as which simplifies to (5.1).
Hence, the above proposition shows that in equilibrium the arrival rate at a given server depends on the stationary tail probabilities P (j) k , k ∈ Z + and j ∈ J .
The stationary tail probabilities can in turn be expressed as functions of the arrival rate. Indeed, in equilibrium the global balance equations (which hold under state dependent Poisson arrivals due to Theorems 3.10 and 3.14 of [8]) yield k+1 . Hence, the equilibrium point P is the unique fixed point of the mapping Θ : U M → U M defined as Θ(P) = F (G(P)), where G(·) denotes the mapping from U M to the space of possible arrival rates (defined by (5.1)) and F (·) denotes the mapping from the space of possible arrival rates to the space U M (defined by (5.4)). Thus, the equilibrium point P can be computed using the fixed point iterations (i.e., by repeatedly applying the mapping Θ(·) to some arbitrary point Q ∈ U M .) Remark 5.1. So far our results have been obtained for exponential job length distributions. Note that the conclusions of Proposition 5.1 continue to hold for any job length distributions due to the Whittle balance criterion [20] that can be shown to hold for the stationary distribution (also see Theorems 3.10 and 3.14 of [8]). In view of the uniqueness of the stationary distribution and propagation of chaos this suggests that in stationarity the servers are asymptotically independent for general job size distributions. In Section 6, we provide numerical evidence to support insensitivity.
Remark 5.2. From Proposition 4.4 it directly follows that the expected occupancy of a type j server at equilibrium is given by ∞ k=1 P (j) k for Scheme 1 and ∞ k=1P (j) k for Scheme 2. Hence, a simple application of the Little's law, yields that the mean sojourn time of jobs in the limiting system is given k for Scheme 2. Thus, the mean sojourn time of jobs in the limiting system can be computed using stationary tail probabilities which in turn can be computed using the fixed point method described in this section.
6. Numerical Results. In this section, we present simulation results to compare the different job assignment schemes discussed in this paper. The results also indicate the accuracy of the asymptotic analyses of the Scheme 1 and Scheme 2 in predicting their performance in a finite system of servers. We set µ = 1 in all our simulations.
To determine accuracy of the asymptotic analysis presented in the paper we first compare the results obtained from the theoretical analysis with that obtained from the simulations. In Figure 3, we plot the mean sojourn time jobs as a function of the normalized arrival rate, λ, for different values of the system size N . We observe a very good match between the analysis and simulation results for N = 100. For N = 10 and N = 20 the relative errors between the analysis and the simulation results are around 10% and 5%, respectively. Thus, we conclude that the asymptotic analysis accurately captures the behaviour of the system for moderately large system sizes.
We now compare the performance of the proposed schemes with that of other existing schemes for heterogeneous scenario. In particular, we consider the following two schemes as benchmarks.  6.1. The state independent scheme. As a baseline, we consider a scheme that assigns an incoming job to a server with a fixed probability, independent of the current state of the servers in the system [1]. We denote by p j , for j ∈ J , the probability with which an arrival is assigned to one of the servers of type j. The probabilities p j , j ∈ J , can be chosen chosen such that the mean sojourn time of the jobs is minimized. Clearly, in this scheme, no communication is required between the job dispatcher and the servers as the job assignment decisions are made independently of the state of the servers.
6.2. The hybrid SQ(d) scheme. In this scheme [13], upon arrival of a new job, the router first chooses a server type j ∈ J with probability p j . Then d j servers of type j are chosen uniformly at random from set of N γ j servers of type j. The job is then assigned to the server having the least number of unfinished jobs among the d j chosen servers. Ties are broken by tossing a fair coin. As in the state independent scheme, the probabilities p j , j ∈ J , can be chosen such that the mean sojourn time of jobs in the system is minimized.
We choose the parameter values as follows: M = 2, C 1 = 1/5, C 2 = 9/5, γ 1 = γ 2 = 0.5, and d 1 = d 2 = 2. Under this parameter setting, the stability region for all the schemes under consideration is λ < 1. In Figure 4, we plot the mean sojourn time of jobs as a function of the normalized arrival rate, λ, for Scheme 1, Scheme 2, the state independent scheme, and the hybrid SQ(d) scheme. We choose the optimal routing probabilities p j , j ∈ J , for both state independent scheme and the hybrid SQ(d) scheme. We observe that the mean sojourn time of jobs under Scheme 1 and is almost the same as that under Scheme 2 for small values of λ. However, for larger values of λ, Scheme 2 outperforms Scheme 1. This is expected for reasons explained in Section 2. We also see that hybrid SQ(d) scheme results in a smaller mean sojourn time of jobs than that in Scheme 1 and Scheme 2, for smaller values of λ. This is because, in the hybrid SQ(4) scheme, the routing probabilities are chosen optimally based on the arrival rate λ. However, for larger values of λ, we observe that Scheme 2 outperforms the hybrid SQ(d) scheme.
To observe the effect of fixing the routing probabilities for the hybrid SQ(d) scheme and the state independent scheme, we choose p i = γ i C i j∈J γ j C j for each server type i ∈ J . This choice of routing probabilities ensures that all arrival rates in the maximal stability region can be supported by the system operating under either the state independent scheme or the Hybrid SQ(d) scheme. We choose the same parameter setting as before and plot mean sojourn time of jobs as a function of λ in Figure 5 for the schemes under consideration. In this case, we notice that both Scheme 1 and Scheme 2 outperform the hybrid SQ(d) scheme. Hence, in the scenarios where estimation of arrival rates is not possible, Scheme 2 is a better choice than the hybrid SQ(d) scheme. We now numerically investigate the behaviour of the proposed schemes under different job length distributions. In Table 1, mean sojourn time of jobs under Scheme 1 is shown as a function of λ, for the following distributions.
1. Constant: We consider job length distribution having the cumulative distribution given by F (x) = 0 for 0 ≤ x < 1, and F (x) = 1, otherwise.
For both distributions we have µ = 1. We choose the following parameter values M = 2, C 1 = 4/3, C 2 = 2/3, N = 100, γ 1 = γ 2 = 1 2 , and d 1 = d 2 = 2. We observe that there is insignificant change in the mean sojourn time of jobs when the job length distribution type is changed. The results, therefore, justify the insensitivity property as discussed in Remark 5.1.

Conclusion.
We considered randomized job assignment schemes in a multi-server system consisting of N parallel processor sharing servers, categorized into M (≪ N ) different types according to their processing capacity or speed. In the proposed schemes, a small number of servers from each type is sampled uniformly at random at each arrival instant. It was shown that due to such sampling the schemes achieve the maximal stability region. Mean field analysis was carried out to show that asymptotic independence among servers holds even when M is finite and exchangeability holds only within servers of the same type. The existence and uniqueness of stationary solution of the mean field and doubly exponentially decreasing nature of the tail distribution of the number of jobs was established. Numerical studies have shown that, when the estimates of arrival rates are not available, the proposed schemes offer simpler alternatives to achieving lower mean sojourn time of jobs.
Define θ(x) = [min(x, 1)] + , where [z] + = max {0, z} and let us consider the following modification of (4.2)-(4.3): where the mappingl : R Z + M → R Z + M is given by Clearly, the right hand side of (4.5) and (A.4) are equal if u ∈Ū M . Therefore, the two systems must have identical solutions inŪ M . Also if g ∈Ū M , then any solution of the modified system remains withinŪ M . This is because of the facts that if u    Using the metric defined in (2.8) and the facts that |x + − y + | ≤ |x − y| for any x, y ∈ R, where u, w ∈ (R Z + ) M , K 1 and K 2 are constants defined as K 1 = λ min j∈J γ j + µ(max j∈J C j ) and K 2 = 4M λ max j∈J d j min j∈J γ j +3µ(max 1≤j≤M C j ). The uniqueness now follows from inequalities (A.5) and (A.6) by using Picard's iteration technique since (R Z + ) M is complete under the metric defined in (2.8).

APPENDIX B
We prove Proposition 4.2 by showing that the generators of the corresponding semigroups converge as N → ∞. We first recollect the following from [5].
• The generator A N of the semigroup {T N (t)} t≥0 acting on functions N , denotes the transition rate from state g to state h.
• The generator A of the semigroup {T(t)} t≥0 acting on functions f : U M → R having bounded partial derivatives is given by Af (g) = lim t↓0 u(t, g))| t=0 . In the following lemma, we characterize the the generator A N associated with the process x N (t). N → R is given by Under Scheme 2, the generator A N of the Markov process x N (t) acting on functions f : M j=1Ū (j) N → R is given by Proof. We only prove the lemma for Scheme 1. For Scheme 2, it can be shown similarly.
We first consider an arrival joining a server of type j with exactly n−1 unfinished jobs, when the state of the system is g. This corresponds to the transition from state g to the state g + e(n,j) N γ j . The term g d i denotes the probability with which an arrival joins a type j server with exactly n − 1 jobs. This is because a job joins a server of type j with exactly n − 1 occupancy only when the following conditions are satisfied: • Among the d j sampled servers of type j, at least one has exactly n − 1 jobs and the rest of them have at least n jobs. • For each i < j, all the d i sampled servers of type i have at least n − 1 jobs. • For each i > j, all the d i servers of type i have at least n jobs.
Since the arrival rate of jobs is N λ, the rate of the above transition is given by Further, the rate at which jobs depart from a server of type j having exactly n jobs is µC j N γ j g (j) n+1 . The expression (B.1) now follows directly from the definition of A N .
We now show that the solutions u(t, g) of (4.2)-(4.3) and (4.6)-(4.7) are smooth with respect to the initial point g and their partial derivatives are bounded.
Lemma B.2. For each j, n, j ′ , n ′ , i, k, and t ≥ 0, the partial derivatives ∂u(t,g) ∂g (j) n , ∂ 2 u(t,g) ∂g (j) n 2 , and ∂ 2 u(t,g) ∂g (j) n ∂g (j ′ ) n ′ exist for g ∈Ū M and satisfy Proof. The proof follows the same line of arguments as the proof of Lemma 3.2 of [9]. We omit the details.
Proof of Proposition 4.2. The proof is essentially the same as the proof Theorem 2 of [9]. We omit the details.

APPENDIX C
We prove the existence of equilibrium point for Scheme 1. Similar arguments apply for Scheme 2. For simplicity of exposition, we further restrict ourselves to the M = 2 case. However, the proof can be extended to any M ≥ 2.
The idea is to construct sequences P (j) k , k ∈ Z + for j = 1, 2 such that they satisfy the following three properties P.1 Equation (4.11) for j = 1, 2.
According to Proposition 4.3, we see that P = P (j) k , k ∈ Z + , j ∈ {1, 2} with components P (j) k satisfying the above properties, must be an equilibrium point of the system (4.2)-(4.3) and also must lie in the space U 2 . Note that if (P.1) holds and P (j) k ≥ 0 for all k and j, then P We now construct the sequences P Combining the above relations we obtain Note that that the sequences P (1) l (α), l ∈ Z + and P (2) l (α), l ∈ Z + are constructed such that they satisfy property (P.1). Hence, the the proof will be complete if for some α ∈ (0, 1) the properties (P.2) and (P.3) are satisfied. We first proceed to find α ∈ (0, 1) such that the sequences P (1) l (α), l ∈ Z + and P (2) l (α), l ∈ Z + are both positive sequences of real numbers in [0, 1]. This will ensure that (P.2) is satisfied.
We now proceed to show that the above sequences satisfy property (P.3). Let lim l→∞ P (1) l (α) = ξ 1 ≥ 0 and lim l→∞ P (2) l (α) = ξ 2 ≥ 0, where α is chosen such that both sequences become positive and non-increasing. Now, taking limit of (C.7) as l → ∞ we have Now using the stability criterion and the fact that 0 ≤ ξ 1 , ξ 2 ≤ 1 we have with equality holding if and only if ξ 2 = 0. Further, we have Hence, by multiplying both sides with ξ 1 we have with equality if and only if ξ 1 = ξ 2 = 0. Again, since ξ 1 ≤ 1 we have Hence, we have shown with equality holding if and only if ξ 1 = ξ 2 = 0. Hence, for (C.21) to hold we must have ξ 1 = ξ 2 = 0. This proves (P.3) and thus completes the proof.

APPENDIX D
To prove Theorem 4.3, we first state the following lemma. We will write g ≤ g ′ to mean that g (j) n ≤ g ′ (j) n holds for all n ∈ Z + and j ∈ J .
Proof. The proof is essentially the same as that of Lemma 3.3 of [9] and hence omitted.
We define v  In particular, Proof. Suppose that u(t, g) ∈ U M holds for all t ≤ τ . Hence, v 1 (τ, g) < ∞ and lim n→∞ u (j) n (τ, g) = 0 for each j ∈ J . Summing (4.4) first over all k ≥ n and then over all j ∈ J yields for all n ≥ 1. Hence, for all sufficiently small h > 0, we have v n (τ +h, g) < ∞ for all n ≥ 1. This implies that u(τ + h, g) ∈ U M for all sufficiently small h > 0. This fact along with g = u(0, g) ∈ U M implies that u(t, g) ∈ U M for all t ≥ 0. Further, (D.1) can be obtained by summing (4.4) first over all k ≥ n and then over all j ∈ J Proof of Theorem 4.3. Clearly, Lemma D.1 implies the following (D.4) u(t, min(g, P)) ≤ u(t, g) ≤ u(t, max(g, P)) Hence, to prove (4.16), it is sufficient to show that the convergence holds for g ≥ P and for g ≤ P.
We first need to check that for each such g, the quantity v 1 (t, g) (and hence also v n (t, g) for n > 1) is bounded uniformly in t. If g ≤ P, then by Lemma D.1 we have u(t, g) ≤ u(t, P) = P for all t ≥ 0. Hence, v 1 (t, g) ≤ v 1 (P).
Since the derivative of u (j) n (t) is bounded for all j ∈ J , the convergence u(t, g) → P will follow from (D.6) ∞ 0 u (j) n (t, g) − P (j) n dt < ∞, j ∈ J , n ≥ 1 in the case g ≥ P, and from (D.7) ∞ 0 P (j) n − u (j) n (t, g) dt < ∞, j ∈ J , n ≥ 1 in the case g ≤ P. Both the bounds can be shown similarly. We discuss the proof of (D.6).
To prove (D.6) it is sufficient to show that for all n ≥ 1. We will use induction starting with n = 1. Using (D.2), we have Since the right hand side is bounded by a constant for all τ , the integral on the left hand side must converge as τ → ∞. Now assume that (D.6) holds for all n ≤ L − 1. We have from (D.1) and (4.13) v L (0, g) − v L (τ, g) = − By the induction hypothesis, the last integral on the right hand side converges as τ → ∞. The left hand side also is uniformly bounded. Hence, the first integral on the left hand side also must converge as required.