A SKILL BASED PARALLEL SERVICE SYSTEM UNDER FCFS-ALIS — STEADY STATE , OVERLOADS , AND ABANDONMENTS By

• A submitted manuscript is the author's version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version and the galley proof are versions of the publication after peer review. • The final published version features the final layout of the paper including the volume, issue and page numbers.


Introduction.
1.1. Background and motivation. It is an inconvenient fact that most queueing models are complicated and often intractable. This is true even for the single server queue, with the shining exception of the M/M/1 queue, which in stationary form has exponential sojourn times and geometric (1−ρ)ρ n queue length distribution, and for which many other quantities can be calculated by explicit formulae. Research on queueing networks would have gone nowhere if it weren't for Jackson's discovery that Jackson networks have a stationary distribution given by (1 − ρ i )ρ n i i . From that time onwards, product form results have been keenly sought after, as they seem the best way to get explicit solutions and useful insight for more general models.
In recent years it has become very important to investigate service systems which serve customers of several types, and which employ servers of various skills. Such service systems are referred to in current literature as queues with skill based parallel service. These systems have a bipartite compatibility graph to indicate which types of servers can serve which types of customers. Applications include such varied fields as call centers, outsourcing, manufacturing processes, cloud computing and health systems. As one might expect, studying these systems immediately gives rise to very complex models. Most queueing papers in this area deal with very limited simple compatibility graphs, such as "V" systems, "Λ" systems, "N" systems, and results are mostly obtained as approximations using various types of scaling. The behavior of queues with skill based parallel service is highly dependent on the type of policy which is used to assign servers to customers, and much effort has gone into trying to define objectives and identify optimal or near optimal policies, see e.g. [12,15,27,31,36].
In this paper we present the first general product form results for an important class of useful skill based parallel service queues. Our focus is the FCFS-ALIS -first come first served, assign longest idle server -policy. This means that, whenever a server becomes available, he will pick the longest waiting customer in the system which he can serve, and whenever a customer arrives to find several idle servers, he will be assigned to the longest idle compatible server. This policy has several attractive features: First and foremost it is fair to both customers and servers -in many systems, e.g. organ donations, public housing assignment, FCFS is dictated by law. ALIS is the best way to equalize the efforts of the servers, and thus it encourages diligent service. The policy is also very natural and easy to implement, and it requires minimal information about the parameters of the system and its current state. As a result, it is useful also when the parameters are time varying, e.g. for systems with periodic fluctuations of the workload.
Our results in this paper complement and extend earlier results, which we have obtained in two recent papers. In [35] we have derived a product form solution for the same model under a different policy, in which service is FCFS, but arriving customers that find several idle servers are assigned to a compatible server randomly. However, those results did not give rise to a practical policy, since product form only held for a singular choice of the assignment distribution. In [4] we considered FCFS matching of an infinite sequence of customers and an infinite sequence of servers, giving rise to a model that is much simpler than a queue, but provides a great deal of insight. Both models are closely related to the model of this paper. Two motivating precursors to these papers were the paper of Talreja and Whitt [34], which introduced FCFS skill based routing in an overloaded system with abandonments, and a paper of Caldentey, Kaplan and Weiss [18], which introduced the infinite matching model. Product form results for skill based routing in a loss system were obtained in [3,4].
Our model in this paper is described in the standard customer server language used for queueing models. However, it should find as much use also to describe the flow of jobs in a manufacturing system with non-homogenous machines, skill based routing of calls to agents in a call center, wireless messages to ad hoc nodes, evaluation threads to computing processors, and so on. See in particular [6,7,9,11,24,25,26,30,32,37].

System description.
We study the following system: There are J parallel servers, labeled m 1 , . . . , m J , and there are several types of customers, labeled a, b, c, . . .. We denote the set of servers by S and the set of customer types by C. Service is skill based, in the sense that each server m j has a non-empty subset of customer types which he can serve, given by C(m j ), the union of which is C, and customers of type c have a non-empty set of servers which can serve them, given by S(c). This is described by a bipartite graph between the servers and the customer types, with directed arc (c, m j ) if c can be served by m j . We assume that this graph is connected. The motivating advantage of such systems is that while they provide custom tailored service to the various types of customers, the overlap of server skills allows for resource pooling and reduced congestion.
We assume that arrivals are Poisson and service is exponential. Customers of type c arrive at the system in independent Poisson streams with rates λ c , c ∈ C. Service times of server m j are independent and exponentially distributed with rate µ m j , j ∈ S. Note that service durations of customers depend on the server providing the service, and not on the customer type. In the literature, this case is also referred to as pool dependent service rates, see e.g. [16].
As stated before, the service policy is FCFS-ALIS. Under this combined FCFS-ALIS policy we adopt the following, somewhat non-standard, Markovian system description. Typically, parallel service systems are described by several queue of waiting customers and the set of parallel servers which are either busy or idle. Here, we imagine all the customers to be waiting in a single queue according to their order of arrival, each of the customers being distinguished by his type. The servers move through the queue to provide service to successive customers, and customers leave their place only on completion of service. A full description of the state of the system consists of three parts: first the list of all the customers in the system in their order of arrivals, including customers which are being served (since they stay in line), second the location in this list of each of the busy servers, where we imagine that the servers are situated in the queue at the position of the customer that they are serving, and third the list of all idle servers, ordered by increasing idle time and waiting at the head of the line for new customers to arrive. The following illustration of the system, and description of its state are similar to those in [35]. We use the same example here to point out how the current system differs from [35]. Consider a system with three servers and three customer types, as shown in Fig. 1: There are three customer types a, b, c and three servers with C(m 1 ) = {a, b}, C(m 2 ) = {a, c}, C(m 3 ) = {a}, the left side of the figure is the compatibility graph, the right side shows the routing of arrivals to the servers.
Three possible states of this system are depicted in Fig. 2, which employs the following way to describe the state of the system: The customers are denoted by circles and the servers by rectangles. Customers in service have a rectangle drawn around them with the identity of the server inside. Idle servers are denoted by rectangles with a * instead of a circle. The customers are ordered from left to right by increasing time of arrival, followed on the right by the idle servers, ordered from left to right by increasing idle time. One can visualize the dynamics of the system with customers arriving from the right, scanning the idle servers to pick the rightmost (longest idle) one that is compatible, and then joining the queue at the rightmost position with this server, or without a server if none is compatible. Concurrently when a service is completed the customer is removed from the queue, and starting from its current position, the server that completed service moves to the right looking for the earliest waiting customer which is compatible, and starts serving it, or if no compatible customer is found the server joins the idle servers in the leftmost position. Note that waiting customers to the left of a server in this picture must be incompatible with that server, because of the FCFS rule. The difference with the system in [35] is that in the current system we need to record the list of idle servers, ordered by increasing idle time, whereas in [35] we only need to know which servers are idle. For product form systems, however, it is not uncommon that apparently small changes immediately render the system intractable. Surprisingly, in this case it appears that product form is preserved. In Fig. 2 part (i) there are 12 customers in the system, and all the servers are busy. Server m 1 is serving the first customer in line, which must therefore be either of type a or of type b. Following the queue to the right, server m 2 is serving the first customer in the line which he can serve, which is the 5th customer in the line, and must therefore be either type a or type c. Server m 3 is serving the first customer in the line (apart from customers 1 and 5) which he can serve, and must be of type a. There are 3 customers waiting between servers m 1 and m 2 . These customers cannot be served by either server m 2 or by server m 3 , so they must be type b customers. There are 4 customers waiting between m 2 and m 3 , those cannot be served by server m 3 , so they must be of types b or c. Finally, there are 2 customers at the tail of the queue, behind server m 3 , which may be of types a, b, or c. The situation in part (ii) of Fig. 2 is that servers m 3 , m 1 are busy, with server m 3 serving the earlier customer, while server m 2 is idle. There can be no customers waiting after server m 3 and before the next server, because servers m 1 , m 2 combined can serve all types, and would have picked up the next customer after server m 3 . The situation in part (iii) of Fig. 2 is that only server m 1 is busy, with 3 type b customers waiting and servers m 2 , m 3 are idle, with server m 3 the one that became idle first. If the next arriving customer is of type a he will go to server m 3 (and not to m 2 , because of ALIS). If he is of type c, he will go to server m 2 . If he is of type b, the two servers will remain idle and the customer will join the queue in the last position.
We will actually aggregate some of the states in this detailed description, to simplify the model while retaining the Markovian property.
1.3. Overview of results. In the first part of this paper (Section 2) we examine our system when it is stable, i.e. positive recurrent. We define a Markov chain which describes the system, derive its transition rates, set up partial balance equations, and solve these equations to obtain conditions for ergodicity and an explicit product-form expression for the stationary distribution of the system. The derivation of the product form solution is similar to that in [35], but the difference in the state description, by including the list of idle servers ordered by increasing idle time, makes it necessary to present the full proof again. We feel that this is also important for the readability of the paper. We then compare, in Section 2.3, the behavior of the system under ALIS with the behavior under the random assignment policy of [35]. Surprisingly, it turns out that they both possess the same stationary distribution. The distribution of the waiting time in the queue is briefly mentioned in Section 2.4, as it is the same as the one derived in [35].
In the second part of the paper (Section 3) we analyze the system under overload conditions. The behavior under FCFS for an overloaded system with several types of customers is trivial for systems with uniform serversall the servers work all the time and customers are served one after the other, while the queue of customers continues to grow at a linear rate. For systems with skill based parallel servers, the behavior under overload is much more intricate: Different types of servers may move through the queue at different rates, and different types of customers will be served at different rates, even though the policy is FCFS. We are not aware of any previous papers which address this question. The main tool in the analysis of the overloaded system is the derivation of the fluid scale dynamics under FCFS-ALIS. The novel derivations of these dynamics are of interest in their own rights, and may prove to be useful in future research. We also require a lemma on local steady state of a Markovian system, which is given in [2]. To analyze the behavior of overloaded systems under FCFS-ALIS, we introduce the concept of complete resource pooling of the system: under complete resource pooling, the system is stable under FCFS-ALIS whenever the total arrival rate λ = c∈C λ c is less than the total service rate µ = J j=1 µ m j . We derive an explicit condition for complete resource pooling. Under complete resource pooling, if the system is overloaded, i.e. λ > µ, the system is transient with the number of customers in the system growing without bound as t → ∞. However, we show that at the same time, as t → ∞, the servers stay together, and only the queue behind the last server grows without bound, while the state of the servers and the customers waiting between them converges to a stationary distribution. Remarkably, this stationary distribution is the same as that obtained for FCFS infinite matching in [4]. We also consider the case when complete resource pooling does not hold. In that case we show, by considering a network maximal flow problem, that the system under FCFS-ALIS policy will, when overloaded, decompose in a unique way to a partition of the servers, where each subset of servers stay together and serve a subset of the customers, while queues between these subsets of servers will grow without bound. Again, the state of each subset of servers, and the customers waiting between them, will converge to a stationary distribution given by [4].
Finally, in Section 4 we outline heuristic conjectures on many server behavior of the FCFS-ALIS parallel service system, under several generalizations, including abandonments, general arrival streams and customer-server type dependent service times. These conjectures may stimulate further research. At the time of writing we do not yet have rigorous proofs of this behavior under many server scaling.
Note: We chose to put most of the proofs in an appendix, for various reasons: The proofs for Section 2 mainly follow results in previous papers [35,4], which the reader may be familiar with. The proofs for Section 3.2 employ techniques of fluid limits similar to [17,21], while the proofs for Section 3.4 employ techniques of flows in networks [23], and the reader may not wish to pursue those. We hope this may improve the readability of the paper.
1.4. Some comments on resource pooling for skill based parallel service. As stated before, the performance of systems with skill based parallel service is highly dependent on the service policy, and on the traffic intensity. When resource pooling occurs, the performance of the system is essentially similar to what could be achieved with a homogeneous pool of servers of similar capacity. In the case of systems which have excess capacity, referred to as QD (Quality Driven) regime, pooling usually reduces waiting times: lack of pooling results in the partition of the servers into several pools, with separate queues at each, so that service is not uniform, and waiting times are much higher. In critically loaded systems, for example in the QED (Quality and Efficiency Driven) regime, lack of pooling will result in some parts of the system being stable and others being unstable. For overloaded systems, which are stabilized by abandonments (in the ED, Efficiency driven regime), resource pooling will as a rule lessen the abandonments. These topics are discussed in [12,13,14,15,16,28], where various policies, seeking optimal performance are discussed. In these papers results are obtained mainly for heavy traffic, and the analysis is based on diffusion approximations. In the current paper, in contrast to all the above papers, we focus on the FCFS-ALIS policy. We obtain resource pooling in the stable case (Section 2), and also in the overloaded case, when our condition for complete resource pooling holds (see Section 3.1 for the definition, and Section 3.3 for the pooling result). The important thing to note is that our condition for complete resource pooling is a minimal condition. It is a necessary condition under any of the policies in the above references. Furthermore, we believe it is a necessary condition for resource pooling under any policy, for any reasonable definition of resource pooling. As we show in the current paper, this minimal necessary condition is also sufficient for resource pooling when the policy is FCFS-ALIS.
2. The stable system. In this section we define a continuous time Markov chain X(t) to describe the dynamics of our queueing system. We derive conditions for ergodicity for this Markov chain, and we obtain its stationary distribution, which is of product form (Sections 2.1, 2.2). We then compare it with the stationary distribution of the system under the policy of [35] (Section 2.3), and mention the derivation of the waiting time distribution (Section 2.4).
To define the Markov chain we aggregate some of the states in the detailed description, to simplify the model while retaining the Markovian property. We retain the identity and location of the busy servers, but we do not specify the type of customer that each of them is serving. Also we only record the number of customers between the busy servers, and do not specify the string of customer types. Finally, we retain the order of the idle servers. Thus the state of the system in Fig. 2(i) is denoted (m 1 , 3, m 2 , 4, m 3 , 2), the state in Fig. 2(ii) is (m 3 , 0, m 1 , 3, m 2 ), and the state in Fig. 2 Note that each busy server is followed by a number which counts how many (could be zero) customers are waiting behind him, while idle servers are not followed by a number. Also note that the location of the servers implicitly contains information on the type of customers waiting in line between them: they have been skipped and thus cannot be served by the servers further down the line. For example, the three customers behind m 1 , in the state (m 1 , 3, m 2 , 4, m 3 , 2) in Fig. 2(i), can only be handled by m 1 and not by m 2 or m 3 . This means that they have to be of type b. Similarly, each of the four customers behind m 2 are either of type b or type c. We can tell that they are of type b with probability λ b λ b +λc and of type c with probability λc λ b +λc . Customers behind m 3 have not been scanned yet and thus may be of any type. We introduce the following notation: M := an arbitrary server M from the set of servers S = {m 1 , . . . , m J }. The capitalised M points to one of the servers m j . Note that the names (or labels) of the servers m j are not capitalised.
C(Y) := total set of customer types that can be handled by the servers in Y ⊆ S, which is equal to m j ∈Y C(m j ).
U (Y) := set of customer types unique to the servers in Y ⊆ S, thus the set of customer types that cannot be served by servers outside Y. We have U (Y) = C(Y), where A denotes the complement of set A. S(X ) := total set of servers that can serve customer types in X ⊆ C, which is equal to c∈X S(c).
2.1. Definition of the system state and the Markov chain. We define the Markov chain X(t) as the process which records the state of the system at time t, where the state in general is: There is a total of i + i j=1 n j customers in the system, i of which are being served.
The state space is denoted by S and to simplify the notation we use s to denote an arbitrary state s = (M 1 , n 1 , . . . , M i , n i , M i+1 , . . . , M J ) ∈ S. Fig. 3 shows a system in state s. There are a few things that are important to note about this state description: First, the waiting customers between servers M j and M j+1 can only be handled by some of the servers M 1 , . . . , M j and not by any of the servers M j+1 , . . . , M J . This is due to the FCFS serving order. Thus waiting customers between servers M j and M j+1 can only be of type c ∈ U ({M 1 , . . . , M j }). The n i waiting customers in the back of the queue cannot be handled by any of the idle servers and have to be of type c ∈ U ({M 1 , . . . , M i }).
Second, since each part of the queue between two servers contains customers from different subsets of customer types, it is necessary to keep these parts separated in the state description. It is not possible to aggregate the state description any further without losing the Markov property. Because the state description does not specify the types of the n j customers between M j and M j+1 we cannot tell the type of each of them, but we do know that he is of type c where c ∈ U ({M 1 , . . . , M j }) with probability , independent of all the others. This is because the queue between servers M j and M j+1 contains all the customers of types U ({M 1 , . . . , M j }) that arrived between the two customers served by M j , M j+1 , in their original order of arrival.
Third, it is possible that the set of customer types U ({M 1 , . . . , M j }) is empty for a certain set of servers {M 1 , . . . , M j }. In this case there are no customers which cannot be handled by any of the servers M j+1 , . . . , M J . Thus there can be no waiting customers between M j and M j+1 , and therefore n j can only be equal to zero. In that case one also has that n 1 , n 2 , . . . , n j−1 = 0.
Fourth, it is important to note that in this state description we lose customer type information about the customers that are in service, since we only denote the server that is handling the customer and not the type of the customer. This aggregation preserves the Markov property since all types are processed by server m j at rate µ m j . We conjecture that specifying the customer types in service will destroy the possibility of a product form solution. This was also indicated in Proposition 8, Section 2, of [35], which illustrates the subtlety of the definition of the system state, being crucial to the existence of product forms.

2.2.
Dynamics of the Markov chain, ergodicity, and product form stationary distribution. The dynamics of the Markov chain X(t) are as follows: When the system is in state s, the following transitions are possible: (i) When a customer of type c arrives, he will activate server M J if c ∈ C(M J ), and he will activate server M j for i . The customer will move to the end of the queue with the activated server which will then become M i+1 with n i+1 = 0. If c ∈ U ({M 1 , . . . , M i }), then the customer will move to the end of the queue and wait, the idle servers will remain unchanged, and n i will increase by one. (ii) When server M j completes service, he will scan the customers in queue from left to right, starting with the n j customers queued behind it, and continuing with the queues behind servers M j+1 , . . . , M i . It will skip a customer in the queue behind M k , k ≥ j, if the customer type is c ∈ U ({M 1 , . . . , M k })\C(M j ), and will pick up the first customer of type c ∈ U ({M 1 , . . . , M k }) ∩ C(M j ). If he finds no customer to serve, he will join the idle servers, in the leftmost position.
It is readily verified that the process X(t) on S is a continuous time Markov chain. Furthermore it is irreducible (cf. Section 3.1 in [35]). The following theorem states the stationary distribution of X(t) which is of product form.
Theorem 2.1. The solution to the equilibrium equations for the process X(t) is given by The Markov chain is ergodic if and only if the two equivalent conditions hold: After normalisation this solution becomes the stationary distribution, with normalization constant: where P is the set of all the permutations of S (by convention empty products are 1).
..,M j }) = 0 and hence, π(s) = 0 for all s = (M 1 , n 1 , . . . , M j , n j , . . . , M J ) with n j > 0, as it should. Setting up the equilibrium equations and verifying Theorem 2.1 is similar to [35]. The proof is given in Appendix A.1 2.3. Comparison with random assignment model. In a recent paper [35], the same queueing system was analyzed, but under a different policy. Surprisingly, we find that the stationary distribution under both policies is the same.
The policy used in [35] is as follows: When a server becomes available, he will take the longest waiting compatible customer, so FCFS. However, when a customer arrives and finds several compatible idle servers, he will not go to the longest idle server, but will instead choose one of them randomly, using a random assignment probability. Thus, if a customer of type c arrives to find servers M 1 , . . . , M i busy, he will go to server M j which is idle and for which c ∈ C(M j ) with probability P (c, M j |{M 1 , . . . , M i }). It is shown in [35] that these probabilities can always be chosen in such a way that the queueing system will have a product form solution, provided it is stable (which holds if and only if (2.2) is satisfied). While these special assignment probabilities may not be unique, they will determine unique values of λ M j ({M 1 , . . . , M i }), the rate at which idle server M j will be activated when servers {M 1 , . . . , M i } are busy. These unique activation rates can be calculated recursively from: These activation rates have the special property that: for every permutation M 1 , . . . , M i of M 1 , . . . , M i . In [35], this property is referred to as the assignment condition.
The dynamics of the system under the random assignment policy are described by a Markov chain Y (t), whose states list the busy servers in order, and the number of customers queueing between them, given by M 1 , n 1 , . . . , n i−1 , M i , n i , where the idle servers are S\{M 1 , . . . , M i } (and now there is no need to record the list of idle servers, in order of increasing idle time). The stationary distribution, when the assignment probabilities and the activation rates are as described above, are given by (see [35], Theorem 2) Theorem 2.2. The system under random assignment, satisfying the assignment condition of [35], and the system under FCFS-ALIS share the same stationary behavior in the sense that: The proof of Theorem 2.2 is given in Appendix A.2, and it is based on the proof of a similar result for a skill based service Erlang loss system [3,5].

2.4.
Waiting time distribution. The waiting time distribution for a customer of type c that arrives at the system is derived for the random assignment policy in [35], by using the distributional form of Little's law. By Theorem 2.2 in the previous section, a customer of type c that arrives at the system when the system is governed by the FCFS-ALIS policy, will, under steady state, see exactly the same distribution of states as under the random assignment policy of [35]. As a consequence, the waiting time distribution for a customer of type c will be exactly the one derived in [35], Theorem 3.
3. The overloaded system. In this section we consider the system as before, with total arrival rate λ = c∈C λ c and total service rate µ = J j=1 µ m j . We introduce the following notations: ρ = λ/µ is the total traffic intensity, α c = λ c /λ is the fraction of arrivals of type c customers, and β m j = µ m j /µ is the fraction of service capacity of server m j . Also, for subsets, α X = λ X /λ, X ⊆ C, and It is convenient in this section to rewrite the stationary probabilities of X(t) as: We will study what happens to X(t) as the total traffic intensity increases. We will assume that α, β are fixed, µ is fixed, and the total arrival rate λ increases. Under these conditions, for ρ > 1 the system becomes unstable with some of the queues growing without bounds. We will discover that when some of the queues grow without bounds, the rest of the system stabilizes and has a stationary behavior, which is identical to that observed for FCFS matching of two infinite sequences (of servers and of customers) as discussed in [4]. We will distinguish a case of complete resource pooling and a case of incomplete resource pooling. For the latter we will find a unique decomposition of the system.
We state the infinite matching model of [4] here (see also [18]): There are two infinite sequences, a sequence of customers c 1 , c 2 , . . . chosen i.i.d. from C with probabilities α, and a sequence of servers (or of services) s 1 , s 2 , . . . chosen i.i.d. from S with probabilities β, and a bipartite graph of compatibilities. Servers and customers are matched on a FCFS basis. Note that this is not a queueing model -each server is used only once, when he is matched to a customer, and there are no service durations and no arrival times, only the order of the customers and of the servers is relevant. The roles of the servers and customers in this model are entirely symmetric. The policy is FCFS in the ordinal sense and not in the temporal sense: a customer that precedes another customer in the sequence of customers is matched to the server in the earliest possible position in the sequence of servers.
A discrete time markov chain, I = (I(N ), N = 0, 1, . . .) is associated with the infinite matching model. I(N ) describes the state of the system after matching the first N servers, and it is defined as follows: After matching the first N servers to customers, the sequence of customers has an initial segment in which all customers are matched, followed by a middle segment in which some are matched and some are not, followed by a last infinite segment where none are matched. I(N ) describes this middle segment. The state is of the form (M 1 , n 1 , . . . , M J−1 , n J−1 , M J ), where M 1 , . . . , M J is a random permutation of S that lists the last matched server of each type according to its order of appearance in the sequence of customers, and n j is the number of unmatched customers between M j and M j+1 . It is shown in [4] that I is an ergodic Markov chain and has a product form stationary distribution given by: whenever B I is positive and finite. The analogy between I and our system is clear: The last matched server of type m j in I(N ) corresponds to the position of server m j in the queue, the permutation M 1 , . . . , M J in I(N ) is the order of the servers in the queue, and the unmatched customers in I(N ) correspond to the queues between the servers. The infinite sequence of customers that follow after M J in the infinite matching model corresponds to waiting customers and to all future arrivals in the queueing model. The rest of this section is structured as follows: In Section 3.1 we define complete resource pooling, under which X(t) is stable for all λ < µ, and show that as λ ր µ the stationary distribution of X(t) converges to that of I(N ). In Section 3.2 we study the fluid approximation to our queueing system. This contains information on the dynamics of the system which cannot be gleaned from its stationary distribution. The rest of the section deals with overloaded systems: in Section 3.3 we study the limiting behavior of the overloaded system under complete resource pooling. We show that while the last queue grows to infinity, the rest of the system converges in law to the stationary distribution of I(N ). In particular, while the last queue grows to infinity, the remaining queues between the servers remain well behaved, and the servers stay close together. In Section 3.4 we study a network maximal flow problem that is related to the stability of our system, and derive a unique decomposition of the system, in the case of incomplete resource pooling. In Section 3.5 we study the limiting behavior of the overloaded system under incomplete resource pooling.
3.1. Complete resource pooling. We say that the system satisfies complete resource pooling if the following three equivalent statements hold: for C ⊂ C, C = ∅, C.
As we state in the next theorem, under complete resource pooling, the process X(t) will be stable for all ρ < 1, and as ρ ր 1 its stationary distribution will converge to the stationary distribution of I(N ), the process describing FCFS matching of two infinite sequences in [4].
We will use the following notation for the marginal distributions : Theorem 3.1. Consider the system with fixed µ, α, β, and let λ vary. Assume that complete resource pooling holds.
(i) The process X(t) is ergodic for all λ < µ.
The same results with the same limiting values hold also for the stationary distribution of the discrete-time Markov chain of jumps.
The proof of Theorem 3.1 is given in Appendix B.1.

3.2.
Fluid limits of the FCFS-ALIS queueing system with multitype customers and servers. The stationary distribution of a Markovian system provides complete information about long term performance measures, and it is therefore very useful. However, it does not provide any information about the dynamics of the system. The dynamics of a Markovian system are of interest when the system is stable, as they provide information about its short term behavior. Furthermore, for unstable systems they provides information on the transient behavior of the system. Exact analysis of the dynamics is almost always intractable, however the dynamics can be approximated by studying fluid or diffusion approximations to the system. In this section we study the fluid approximation of our system. For an introduction to the study of fluid limits, see [17,19,20,21]. We introduce a different notation for our Markov process X(t). We let The total number of customers in the system is |Q(t)| = Q j (t), ordered in order of arrival with Q 1 earlier than Q 2 earlier than Q 3 , and so on. Of the queues at time t, the first k(t) queues are non-empty, the remaining are empty. Server M j (t) is serving the first of the customers of Q j (t) (the 1 + i<j Q i (t) customer in the queue), where j = 1, . . . , k(t), and servers M j (t), j = k(t) + 1, . . . , J are idle, ordered by increasing idle time. The customer in service at M j (t) is of course of type c ∈ C(M j ). The remaining Q j (t)−1 waiting customers are all of them of types in U ({M 1 (t), . . . , M j (t)}).
We also introduce the following processes that give an alternative description of the system. Assume each customer upon arrival is given a number which counts its position in the arrival stream. We let A(t) be the total number of arrivals by time t, where the last arrival was numbered A(t). We also let Y 1 (t) < · · · < Y J (t) be the positions of the servers in the sequence of customers up to time t, so that Y i (t) is the sequence number of the customer currently served by the ith server, which is server M i (t). If the number of busy servers is k(t) < J, we will let the positions of the idle servers The fluid scaling of an arbitrary function z(t) is denotedz n (t) = z(nt)/n. Let z(t, ω) be a stochastic process with paths in D d (real vector functions in the d dimensional Euclidean space which are right continuous with left limits). Let ω denote a fixed sample path, and let r be a divergent subsequence of integers. Ifz r (t, ω) converges to a deterministic functionz(t) as r → ∞, then we callz(t) a fluid limit. Convergence here is in the Skorohod J 1 topology on D d , but ifz(t) is continuous it is equivalent to convergence uniformly on compacts (u.o.c.). To obtain fluid limits withz(0) = 0 one needs to consider a sequence of functions z n (t, ω) which are all defined on the same probability space, and which differ in their initial values z n (0, ω).
To study fluid limits of our system we now define a sequence of systems, by describing the primitives in the probability space which are common to all of them and by defining the sequence of their initial states. For simplicity take λ + µ = 1 and think of the evolution of our systems for all n = 1, 2, . . . and t > 0 as powered by a common Poisson process with rate 1. Each event of the process is either an arrival with probability λ or a service completion with probability µ. An arrival is an arrival of a customer of type c with probability α c . A service completion is by server m j with probability β m j . A customer departs at a service completion only if server m j is not idle. To define the state of the system at time 0 we also have a multi-Bernoulli process of customer types, denoted C = c −1 , c −2 , . . . , c −k , . . ., which represent customers that arrived in the past, prior to time t = 0, ordered by order of arrival, with type c occurring with probability α c . We exclude from this list of previous customers the ≤ J customers which are in service at time 0. These stochastic primitives are common to all the systems.
Next we describe the initial states of the sequence of systems: They only differ by the location of the servers at time 0, within the sequence C: Let q 1,n , . . . , q J,n be some deterministic non-negative integers, with q n = J k=1 q k,n , we then set Y n j (0) = j + j−1 k=1 q k,n , A n (0) = J + q n . The q n customers waiting for service at time 0, between the positions of the servers, are of types c −q n , . . . , c −1 , so that c −q n is the type of the earliest arrival customer, and c −1 is the type of the latest arrival prior to t = 0. To fix our initial limiting state we assume that q j,n /n converge to some fixed values, which determines the fluid limit values of 0 =Ȳ 1 (0) ≤ · · · ≤Ȳ k(0) (0) < A(0) =Ȳ k(0)+1 (0) = · · · =Ȳ J (0). For simplicity we let S n (0) = S 0 where S 0 is a fixed initial permutation which definesS(0). Note that the sequence q n,j and S 0 are fixed deterministic, while the sequence of customers that arrived prior to time 0 is random i.e. it depends on ω.
Fluid limits of A, Y , and S may be obtained when we takeĀ n (t, ω) = A n (nt, ω)/n andȲ n (t, ω) = Y n (nt, ω)/n, andS n (t, ω) = S n (nt, ω) and let n → ∞. Note that S n (·, ω) is a function from [0, ∞) to the finite set of permutations P, so there is no division by n in the definition of S n (t, ω). The following complication does however arise: For all ω, n, t, we have thatȲ n 1 (t, ω) < · · · <Ȳ n J (t, ω), and there is a unique permutation S n 1 (t, ω) < · · · <S n J (t, ω) associated with them. However, in the limit we may haveȲ 1 (t, ω) ≤ · · · ≤Ȳ J (t, ω), and so the fluid limits ofS n (t, ω) will not be a permutation, but will be a partition of a permutation, i.e. an ordered partition. For example, ifȲ 1 (t) <Ȳ 2 (t) =Ȳ 3 (t) =Ȳ 4 (t) < Y 5 (t) =Ȳ 6 (t) <Ȳ 7 (t), then a possible fluid limit will be the ordered parti- . In general we will associate with a fluid limitȲ (t) ofȲ r (t, ω) a fluid limit forS r (t, ω) that will be given by a corresponding ordered partitionS(t) = (S 1 , . . . , S K ) where each S i is a subset of servers, and S 1 , . . . , S K is a partition of S. This is not standard in the literature on fluid limits and we shall make the definition of such a fluid limit precise in Appendix B.2.
When discussing ordered partitions, we may sometimes suppress some information and group some of the subsets of the partition, and replace (S 1 , . . . , S K ) by say (S ′ , S k , S ′′ ), where S ′ = k−1 l=1 S l and S ′′ = K l=i+1 S l . We extend the definition of complete resource pooling (3.4) to subsets.
Definition 3.2. Consider a partition of the servers into subsets S ′ , S, S ′′ . We say that S has complete resource pooling between S ′ and S ′′ (the order of S ′ before S ′′ is important here), if the subsystem which consists of servers m i ∈ S, and the customer types We now have the following theorem, which summarizes the fluid dynamics of the system: Theorem 3.3. Consider a sequence of systems as above. Then A n (t, ω),Ȳ n (t, ω),S n (t, ω),Q n (t, ω) converge almost surely to fluid limits A(t),Ȳ (t),S(t),Q(t) as n → ∞. The fluid limits satisfy: (i) for some k < l and for some t, and letS(t) = (S ′ , {M k , . . . , M l }, S ′′ ) be the corresponding partition of the servers. ThenȲ k , . . .Ȳ l will move together at rates: , i = k, . . . , l.
during t < τ < t + ∆ for some ∆ > 0, if and only if {M 1 , . . . , M k } have complete resource pooling between S ′ and S ′′ .  Theorem 3.3 implies that the processesȲ j (t) will move at any time with constant rates, and in coalescent subsets, so that whenever one subset of servers overtakes another, they either coalesce into a single subset, which is the union of both, or the union of the two subsets breaks up into new subsets, which move away form each other. These coalescent subsets always satisfy complete resource pooling between their predecessors and successors. This can continue until all the servers coalesce and all move together, or until they are partitioned into subsets, which drift apart at constant rates. The front subset of servers may move at the rate of λ if that subset has enough capacity to serve all of its customers, or at a rate < λ if it is overloaded. We show in Section 3.4 that these subsets are uniquely determined, irrespective of the initial fluid stateZ(0). We now present an example that illustrates how Theorem 3.3 determines the dynamics of the fluid limits.
Example. We consider a system with 5 types of customers and 5 servers, where each node in the bipartite graph is of degree 2. The system and values of α, µ are displayed in Fig. 4. The total arrival rate is λ = 60, which is larger than the total service capacity of 50, so the system is overloaded.
Because any four servers can serve all types of customers,Q 1 (t) = 0 for all t. We assume that the initial partition of the servers is  (Fig. 5 (a) ) At t 2 server m 2 overtakes server m 1 , there is complete resource pooling of {m 1 , m 2 }, and they coalesce and continue together. HereQ 4 empties, and stays empty in the next interval ( Fig. 5 (b) ). At t 3 server m 5 overtakes {m 1 , m 2 }, there is no resource pooling, and m 5 moves ahead (Fig. 5 (c) ). At t 4 servers {m 3 , m 4 } overtake servers {m 1 , m 2 }, there is no complete resource pooling, and the two subsets change positions (Fig. 5 (d) ). At t 5 server m 5 catches up with the input, and thereafterQ 5 remains empty, and server m 5 is underloaded Following t 5 there are no more collisions, the queue of server m 5 is stable with fluid 0, and the remaining two fluid queues continue to fill up at constant rates.
The fluid dynamics ofȲ (t) can be calculated from Equation (3.6) of Theorem 3.3, and the fluid dynamics ofQ(t) can be calculated from Equation (3.7) of Theorem 3.3. Table 1 lists the the configurations in each interval, the collision times, the fluid queue lengths at those times, and the rate of change in the queue lengths in each interval. Overlined numbers denote repeated fractions. Fig. 6 plots the fluids in the system over time.
In the long run, after t 5 , servers m 1 , m 2 will lag behind, serving customers of type c 1 , servers m 3 , m 4 will be ahead of them serving customers of types c 2 , c 3 and skipping customers of type c 1 . The queues of customers of these types will continue to fill up. Server m 5 will be ahead, serving customers of type c 4 , c 5 , skipping all other types, and keeping the queue behind him stable. Server m 5 will be processing at rate 7.5 and will be idle a quarter of the time.
3.3. The overloaded system under complete resource pooling. We consider now the behavior of the system under complete resource pooling, when λ > µ. Here clearly the Markov process is transient. However, what we will see is that while the queue behind the last server grows without bound, the servers and the queues between them tend to a limiting distribution, which is again that of the FCFS infinite matching model. We will use the notation and the results on the fluid dynamics from Section 3.2. We will also need the following lemma, the proof of which is given in [2]. Lemma 1. Let X(n) = (X 1 (n), X 2 (n)) be a Markov chain on countable state space with X i (n) ∈ Z + . Assume the following: 1. lim n→∞ X 2 (n) = ∞ almost surely. 2. P (X 1 (n + 1) = j|X 1 (n) = i, X 2 (n) = l) = P i,j , for all values of l > 0, where P i,j are transition probabilities of an ergodic Markov chain with stationary probabilities π j .
Then for all initial i 0 , j 0 : i.e. X 1 (n) converges in distribution to π in total variation norm.
We can now prove the following theorem: Theorem 3.4. Assume complete resource pooling and λ > µ.
Proof. We consider first the fluid limits for the overloaded system. By the results of the previous section, we have that d dtȲ J (t) ≤ µ. This will hold, because by Proposition B.9, the front subset of servers S will move at a rate µ β S α C(S) , and by complete resource pooling, β S < α C(S) . SinceQ J (t) = A(t) −Ȳ J (t), we have: HenceQ J (t) → ∞ as t → ∞, and then of course Q J (t) will diverge almost surely. In particular, this implies that all servers will be busy, so k(t) → J almost surely as t → ∞. Consider now the behavior of our Markovian system when Q J (t) > 0. When Q J (t) > 0, all the servers are busy, and the queues Q j (t), j = 1, . . . , J −1 have transitions which occur as a Poisson process of rate µ, irrespective of the current state. The sequence of states following each transition form a discrete time process, with Markovian transition probabilities, which are exactly those of the FCFS infinite matching model, and do not depend on the value of Q J (t). Hence, the conditions of Lemma 1 are fulfilled, and the discrete time jump process of states will converge in law to π I . As a result, the continuous time process will also converge in law to π I .

3.4.
Unique decomposition under incomplete resource pooling. We again consider the system with fixed α, β, µ and let λ increase, but we now consider the case that complete resource pooling does not hold. We show that there exists a unique decomposition of the system when it is overloaded. To do so, we associate with our system the following network (see Fig. 7): The nodes are c ∈ C, m j ∈ S, a source node o, and a sink node t. The arcs are o t c m j (o, c) of capacity λ c for all c ∈ C, (m j , t) of capacity µ m j for all m j ∈ S, and (c, m j ) of infinite capacity for all (c, m j ) in the bipartite compatibility graph. The proofs for propositions and theorems in this section are given in Appendix B.3, throughout here and in the appendix we use the terminology of network flows, as in [10,23].

Servers Customers
The maximal flow in the network is related to the stability of our system through the following proposition.
Proposition 3.5. A necessary and sufficient condition for stability is that the maximal flow from o to t is λ, and that the cut through arcs (o, c), c ∈ C is the unique minimal cut.
The following theorem describes the solution of the o to t maximum network flow problem, as a function of λ. Fig. 8 illustrates this theorem as well as the following Corollary 3.7.
Theorem 3.6 induces a decomposition of the servers and of the customer types, as detailed in the next corollary. The decomposition is illustrated in Fig. 9. In this figure we have minimal cuts, which belong to a case where there are 4 breakpoints in f (λ), which correspond to 5 intervals, and 5 minimal cuts as numbered.
Corollary 3.7. If the maximal flow f (λ) has breakpoints 0 = λ (0) < λ (1) < · · · < λ (L) < λ (L+1) = ∞, then there is a unique partition of the set of servers into non-empty subsets S (1) , . . . , S (L) and of the set of customer types into non-empty subsets C (1) , . . . , C (L) , such that: The values of the breakpoints are: (iv) Consider the subsystem composed of servers S (i) and of customer types C (i) , with arrival rates λα c for customers of type c. Then this system is stable for λ < λ (i) and unstable for λ ≥ λ (i) . (v) Consider the subsystem composed of servers L k=i S (k) , and of customer types L k=i C (k) , with arrival rates λα c for customers of type c. Then this system is stable for λ < λ (i) and unstable for λ ≥ λ (i) . (vi) For λ (i−1) < λ < λ (i) the maximal flow solution of the whole system will have zero flow on arcs from customers c ∈ C (k) to servers m j ∈ S (l) for all k ≥ i > l.
The following corollary gives another way of solving the max flow problem and decomposing the sets of servers and customer types, and pinpoints the nature of this decomposition: Corollary 3.8. The sets C (i) , S (i) have the following characterization: Intuitively, the picture is as follows: Under complete resource pooling C (1) = C, S (1) = S. When there is no resource pooling, for some subsets of customers, the requirement that α C < β S(C) is violated. As λ increases, the subset of customers, which have the least value of β S(C) /α C , becomes overloaded when λ reaches µ β S(C) α C , and this defines C (1) and S (1) . This leaves servers S\S (1) to serve the remaining customer types C\C (1) . Note that even if c ∈ C\C (1) can be served by a server in S (1) , this will not happen in the max flow solution, since all the servers in S (1) are fully occupied by C (1) . The remaining servers and customer types now behave like a subsystem withα c = αc . If in the remaining subsystem,β S(C) α C ≥ 1 for all subsets, then L = 2 and the minimum of these ratios will be reached by S\S (1) , C\C (1) . Else, if the minimum is again < 1, the subsets C, S(C) with minimal ratio of β S(C)\S (1) /α C will be S (2) , C (2) , and the servers in S (2) will become overloaded when λ = µ β S (2) α C (2) , and so on (see Figs. 8 and 9).
Example (continued). We return to the 5 server 5 customer types system of the example in Section 3.2. We consider long term behavior as a function of the input rate λ.
The system is stable for 0 < λ < 40. 3.5. Limiting behavior of overloaded system under incomplete resource pooling. Following our study of the fluid limits of our system, and the decomposition of the set of servers and of customer types when there is no resource pooling, we can now describe the limiting behavior of our system as t → ∞ in the case that there is no resource pooling, as a function of the total arrival rate λ. In the following theorem we use the notation developed in Section 3.4. Theorem 3.9. Assume the system has incomplete resource pooling, as in Section 3.4 and that the total arrival rate is λ (l) < λ < λ (l+1) . Then as t → ∞ the following convergences will occur for the state of the system, s = (S(t), Q 1 (t), . . . , Q J (t)): (i) The permutation S(t) as t → ∞ will consist of a permutation of S (1) followed by a permutation of S (2) and so on up to a permutation of S (l) , followed by a permutation of the remaining servers. (ii) As t → ∞, the queue between the last server of S (k) and the first server of S (k+1) , for k = 1, . . . , min{l, L − 1} will diverge, growing at a rate µ( )α j≤k C (j) . If l = L, i.e. λ > λ L , the queue after the last server will grow at rate: (iii) For k = 1, . . . , l, the probability distribution of the permutation of S (k) and the queue length between the servers S (k) will converge to the stationary distribution of the FCFS infinite matching model for the subsystem of C (k) , S (k) . (iv) If l < L, then the probability distribution of the permutation of the remaining servers and the queue lengths between them and behind the last of them, and the ordered set of idle servers, will converge to the stationary distribution of the stable system consisting of k>l C (k) , k>l S (k) , as given by Theorem 2.1.
The diverging queues will diverge almost surely. The convergence of the probabilities to stationary probabilities will be in total variation distance. The overloaded subsystems and the remaining stable system will converge in distribution to independent processes.
Proof. It follows from the results of Sections 3.2 and 3.4 that the fluid limits of the processes Y k for the various servers will eventually coalesce into subsets, which will move together, with the servers of the subsets S (k) moving together at rates increasing with k for k = 1, . . . , l, and the remaining servers will move together with A(t) at the rate λ. This implies (i). The rates in (ii) are obtained directly from (3.6) and (3.7).
It is then seen that for each subsystem the transition rates, conditional on the diverging queues being > 0 are exactly those of the FCFS matching model for the subsystems k = 1, . . . , l, and for the remaining subsystem they are equal to those of a stable system, consisting of k>l C (k) , k>l S (k) . Parts (iii) and (iv) then follow by Lemma 1.
The independence follows, since the various subsystems have independent transitions, given that the diverging queues are > 0.
4. Discussion of many server scaling. In this section we give some indications on the behavior of the parallel service system under FCFS-ALIS, when the number of servers is large. Our assumption now is that there are n j servers of type m j , j = 1, . . . , J, so that the total service capacity is µ = n j µ m j of which a fraction β j = n j µ m j /µ belongs to servers of type m j . The total arrival rate is λ = c λ c and a fraction α c of them are of type c. We also assume that customers of type c have patience distribution F c . We fix α c , F c , β j , µ m j and let n j and λ increase. The following discussion is heuristic, and will require further research to verify it.
Our main premise is that the behavior of the system with many servers will be on two time scales: The total number of customers waiting, the rate of abandonment, and the number of idle servers behave similar to a system with a single customer type and a single server type. This is also indicated in the seminal paper of Talreja and Whitt [34]. The allocation of servers to customers will behave as in the infinite matching model, similar to the limiting behavior that we found in Section 3.
We assume first that complete resource pooling holds, and discuss many server behavior under the three regimes, of QD (quality driven, total offered load < 1), ED (efficiency driven, overloaded system), and QED (quality and efficiency driven, critically loaded). Stability in all these regimes is guaranteed by the abandonments. When there is no resource pooling we then discuss decomposition similar to Sections 3.4, 3.5. Finally, we discuss some generalizations.
4.1. Resource pooled system under QD regime. When the number of servers is large and the system is underloaded (λ < µ), there will always be some idle servers, there will be no waiting time and no customers will abandon. In that case, our main premise is that, because of ALIS, all servers, regardless of their types, will have the same idle time distribution. Hence, a server of type m j will have cycles of serving at rate µ m j , and then idling for a random time T which has the same distribution for all j. Furthermore, the types of the servers which become idle in sequence are i.i.d.
As the λ, n j → ∞, the idle time T will become constant, and the fraction of services of type m j will then be The sequence of matches will then behave as in the infinite bipartite matching model, with α andβ.

4.2.
Resource pooled system under ED regime. When the number of servers is large and the system is overloaded (λ > µ), there will always be customers waiting in the system. In that case, our main premise is that, because of FCFS, all customers, regardless of their types, will have the same waiting time distribution (see [34]), from which the abandonment rates can be obtained. Furthermore, the types of the customers that are served in sequence will be i.i.d.
As the λ, n j → ∞, the waiting time W will become constant. The total service rateλ, of which a fractionα c are of type c, will then be: The sequence of matches will then behave as in the infinite bipartite matching model, withα and β.

4.3.
Resource pooled system under QED regime. When λ and n j are balanced, so that µ − λ = κ √ λ, and we let λ, n j → ∞, then servers will be busy almost all the time and almost no customers will abandon. Our premise then is that customer types and server types in sequence will be i.i.d. The sequence of matches will then behave as in the infinite bipartite matching model, with α and β.

4.4.
No resource pooling. In that case the system will decompose into subsystems in a unique way, as in Sections 3.4, 3.5. Each of those subsystems will have complete resource pooling, and will be in one of the above regimes and behave accordingly.

4.5.
Generalizations. We believe that these results also hold under the following conditions: (i) Replace Poisson arrivals by general independent stationary arrival streams of rates λ c . Arrivals will no longer be i.i.d., but dependence between successive arrivals will be short range. (ii) Replace independent exponential service times by independent general service times of rates µ m j . The individual servers will then become available at times which will form a stationary process, and they will be almost independent, so that again, types of serves will be i.i.d. in sequence.
Furthermore, the sequence of matches will still behave as in the infinite bipartite matching model, even if we allow customer-server type dependent service times, i.e., the service duration of a customer depends on both the customer type and the server type. In that case the calculation of the cor-responding ratesα c andβ j will be more involved. In a recent paper [1] we provide some evidence that this is the case by simulation studies.
Acknowledgements. We are very grateful to the referees for pointing out a gap in our proofs and for the valuable suggestions to improve the presentation of the paper.

APPENDIX A: PROOFS FOR SECTION 2 OF THE PAPER
A.1. Proof of Theorem 2.1. We proceed in the following steps: We first obtain the transition rates into state s, we then set up the partial balance equilibrium equations, finally we check that they are satisfied by π X (s), and calculate the normalizing constant. The ergodicity condition follows directly from the expression for the normalizing constant.
Because the proof is quite similar to the proof of product form solution in [35], we skip some details.
This system actually satisfies partial balance equations which are: (i) The total probability flux out of state s due to an arrival that activates a server equals the total probability flux into state s due to a departure which idles a server: The total probability flux out of state s, due to an arrival that joins the queue, equals the total probability flux into state s, due to a departure which is followed by another start of service (so that the set of idle servers is unchanged): The total probability flux out of state s in which n i = 0 due to a departure, equals the total probability flux into state s, due to an arrival of a customer which activates server M i : (iv) The total probability flux out of state s in which n i > 0 due to a departure, equals the total probability flux into state s, due to an arrival of a customer which joins the queue: A.1.3. Verification of the balance equations, calculation of the normalization constant, and ergodicity conditions. To prove Theorem 2.1 it remains to verify that the expression (2.1) satisfies the partial balance equations (A.2)-(A.5). This is quite straightforward, involving at one step a summation over coefficients of a binomial distribution. The verification is similar to the corresponding verification of Theorem 2 in [35].
Calculation of the normalization constant (2.3) is straightforward from summation of the geometric sequences. Note that it is the inverse of a sum of J! terms.
When calculating the normalizing constant, it is clear that it is finite only if all the geometric sums are finite, and that is the case if and only if the ergodicity condition (2.2) holds.
This completes the proof.

A.2. Proof of Theorem 2.2.
Proof. Comparing (2.1) and (2.6), and recalling that π X and π Y both sum to 1, what we have to show is that for some constant D (which is the same for any M 1 , . . . , M i ): We take By the assignment condition (2.5), D is the same for all permutations of M 1 , . . . , M J , and so it is the same for all choices of M 1 , . . . , M i . We note that Hence we need to show that: This is exactly statement (8) in [5], with the following change of notation: (M i+1 , . . . , M J ) is (j m , j m−1 , . . . , j 1 ) in [5], This statement is proved there.

APPENDIX B: PROOFS FOR SECTION 3 OF THE PAPER
B.1. Proof of Theorem 3.1.
Next we observe that the value ofB(ρ) is: All terms in this expression, exceptB(ρ), remain bounded, and hence the whole expression tends to 0 as ρ ր 1. This proves (ii).
On the other hand, when all J servers are busy, we calculate: To prove (iv) we note that the transition rates of the queueing process X(t) are bounded between λ and λ + µ, and so the jump chain and the continuous process are both ergodic or both non-ergodic at the same time. Furthermore, all the stationary probabilities of states, when not all the servers are busy, will tend to zero for the jump chain of the queueing process as well as for the continuous queueing process, as ρ ր 1. Finally, the transition rate at which jumps occur, when all the servers are busy, is λ + µ, independent of the state. Hence the stationary probabilities for the jump chain and for the continuous time process, for states when all servers are busy, will tend to the same limits as ρ ր 1. This proves (iv).
B.2. The fluid model and proof of Theorem 3.3. The proof of Theorem 3.3 proceeds in three logical steps. First we prove that fluid limits exist. Next we show that every fluid limit obeys a set of rules that determine its dynamics. Finally, because these dynamics determine a unique fluid limit, the convergence to this fluid limit follows. We start with some preliminaries.
We add another component to our description of the system: We let T S,j (t) be the accumulated time over (0, t] that the order of the servers is S and the j server in the order of servers is serving a customer, for all j = 1, . . . , J and all S ∈ P where P is the set of all the permutations of servers m 1 , . . . , m J . We let T (t) denote the vector of all the T S,j (t). The dynamics of our sequence of systems are fully described by Z(t) = (A(t), T (t), Y (t), S(t), Q(t)).
For the times of events, the types of customers, and the identities of servers completing service, the functional strong law of large numbers (FSLLN) applies. We now consider only the set of sample paths ω for which there is FSLLN convergence. We exclude all other sample paths, which are a set of measure zero, hence all our following statements hold almost surely. This means in particular that the number of type c customers among c −q n , . . . , c −1 , divided by n converges toĀ(0)α c . Our first proposition follows immediately from the FSLLN: Proposition B.1. Equations (3.5) and (3.7) hold almost surely.
The next three propositions examine the existence of fluid limits.
Proposition B.2. Fluid limits ofT n (t),Ȳ n (t) exist and are absolutely continuous.
Proof. The arguments are similar to Dai and Lin [21]. We consider first T n (t, ω). We have for every S, j that the cumulative time T n S,j (t, ω) is Lipschitz continuous (with constant 1), and this property is retained bȳ T n S,j (t, ω) = T n S,j (nt, ω)/n. Hence for every sample path ω we have a sequence of equi-continuous functions, and therefore (this is the argument of [21], based on the Arzela-Ascoli Theorem, see Royden [33]) there exists a subsequence, indexed by divergent r such thatT r S,j (t, ω) converges toT S,j (t) as r → ∞, u.o.c.. Since the number of components of T n (t) is finite (in fact J × J!), we can by a standard argument select successive subsequences and end up with a subsequence r such that for all components simultaneously, T r (t, ω) →T (t).
To examine Y r (t, ω) we partition the time during which Y r j (t, ω) is evolving into subsets of time given by the T r S,j (t, ω). Consider now the service completions of the server in position j during the time accumulated by T r S,j (t, ω): The server in position j is M j , with service times that are exponentially distributed with rate µ M j . By the memoryless property of the exponential distribution, the total number of service completions by server M j conditional of the value of T r S,j (t) is distributed as N r S,j ∼ Poisson µ M j T r S,j (t, ω) , or to put it more generally, N r S,j (t) is a Poisson process with time dependent rate µ M j d dt T r S,j (t). To find the next customer to serve, while the permutation of the servers is S, server M j will skip all the customers which have already been served by the servers which are ahead of him in the permutation, and he will also skip all the customers which can only be served by servers which are behind him in the permutation. For a fixed permutation S, counting from one customer that M j can serve to the next, the number of customers is distributed as a Geometric random variable. Again, by the memoryless property of the Geometric distribution, we have that for every job that server M j will complete during T r S,j (t, ω), he will move ahead over a random number of customers which is geometrically distributed, with a parameter which we denote p S,j (we will calculate it later).
We let Y r S,j (t, ω) be the total number of customers that the server in position j passes (serving or skipping them) during the accumulated time T r S,j (t, ω). Then by Wald's equation this will have expected value µ M j T r S,j (t, ω)/p S,j (in fact it will be distributed as a Poisson random variable with that parameter). This means that Y r S,j (t) is a non-homogenous compound Poisson process. Hence, by the functional strong law of large numbers, we get thatȲ r S,j (t, ω) = Y r S,j (rt, ω)/r →T S,j (t)µ M j /p S,j =Ȳ S,j (t) as r → ∞ u.o.c. This is for the same sequence of r that defines all thē To complete the proof, we note that sinceT j (t) are Lipschitz continuous with constant 1,Ȳ j (t) are Lipschitz continuous, with a constant max {S,j} µ M j /p S,j .
Limiting fluid behavior of S n (nt, ω) is more involved. We noted that while S(t, ω) is a function from [0, ∞) to the permutations P, fluid limitsS(t) need to be partitioned permutations. To discuss fluid limits ofS n (t, ω) = S n (nt, ω) we need to make some definitions. A simple collision ofȲ (·) and O(·) happens at t if O(τ ) is constant for τ ∈ (t − ǫ, t) and for τ ∈ (t, t + ǫ) for some ǫ > 0, but the partitions in the two intervals are different. In that case the partitions in the two intervals are refinements of O(t). Definition B.5. We say thatS r (t, ω) converges toS(t) as r → ∞ if for any interval t ∈ [0, T ] there is an n 0 , so thatS(t) is an ordered partition of S r (t, ω) for all r > n 0 .
The following two proposition summarize what we can say about fluid limits of S n (nt, ω).
Proposition B.6. Assume that for a divergent sub-sequence r, Y r (t, ω) →Ȳ (t), and consider a closed interval [t ′ , t ′′ ] in which O(t) is constant. Then there exists a constant partitionS =S(t) of the servers according to O(t), such that for some divergent subsequence r ′ of r, the partition of S r ′ (r ′ t, ω) according to O(t) is equal toS for all t ∈ [t ′ , t ′′ ] and for all r ′ .
Proof. Because O(t) is constant over [t ′ , t ′′ ] andȲ (t) are continuous, there will be an ǫ > 0 such that components ofȲ (t) that belong to different subsets in O(t) are never closer than 3ǫ. By the u.o.c. convergence ofȲ r (t, ω) there is a n 0 such that for all r > n 0 , |Ȳ r (t, ω) −Ȳ (t)| < ǫ for all t ∈ [t ′ , t ′′ ], so components ofȲ r (t, ω) for r > n 0 which belong to different parts of O(t) never meet in the interval [t ′ , t ′′ ]. But this implies that for every r > n 0 , the partition of the permutation of servers S r (rt, ω) into the subsets determined by O(t) is a constant partition for all t ∈ [t ′ , t ′′ ] (we are not saying it is the same constant partition for different values of r). But the total number of possible partitions is finite (it is J J+1 (2J)! J! ). So at least one of them appears in infinitely many r > n 0 , which defines the constant partitionS and the subsequence r ′ .
Proposition B.7. Consider a fluid limitȲ r (t, ω) →Ȳ (t). If the number of 'collisions', instants where O(t) changes, is finite, then for arbitrarily large T , there exists a fluid limitS(t) which is compatible withȲ (t) over [0, T ], and a divergent subsequence r ′ of r such thatS r ′ (t, ω) converges toS(t) as r ′ → ∞.
Proof. Assume 0 < t 1 < t 2 < · · · < t M < ∞ are all the collision times. Because there is only a finite number of them they are all simple. Consider the closed intervals [t i − δ, t i + δ], i = 1, . . . , M , and [0, . . , M − 1, for some small δ > 0 and arbitrarily large T . Consider the interval [0, T ]. By the argument of the previous Proposition B.6, we can find ǫ > 0 such that all components of Y (t) which belong to different parts of O(t) are never closer than 3ǫ, and we can then find n 0 such that |Ȳ r j (t, ω) −Ȳ j (t)| < ǫ for all j = 1, . . . , J, 0 ≤ t ≤ T and r > n 0 .
It is now easy to see that for every r > n 0 , when we let δ → 0, that there is a fluid limit candidateS r (t) which is compatible withȲ (t), so thatS r (t) is an ordered partition of the permutationS r (t, ω) for all t.
We now use again the argument of the previous Proposition B. 6: Since there are only 2M +1 intervals, the total possible sets of 2M +1 partitions is a finite number. Hence there is one candidateS(t) =S r (t) which appears with infinitely many r. This gives us the fluid limit, and the divergent subsequence r ′ , for whichS r ′ (t, ω) =S(t) for all r ′ .
To summarize, we now know that fluid limitsĀ(t),Ȳ (t),T (t),Q(t) exist and are Lipschitz continuous for all t ≥ 0. We also know, by Proposition B.7, that ifȲ (t) have only a finite number of collisions then also a partitionS(t) compatible withȲ (t) exists for all t ≥ 0, and by the same argument, if Y (t) have only a finite number of collisions in [0, T ] then a partitionS(t) compatible withȲ (t) exists for t ∈ [0, T ].
Once we established existence of fluid limits we will in the next propositions investigate their dynamics. It will turn out that these determine the fluid limits uniquely for all t ≥ 0.
U ({M 1 , . . . , M i−1 }), and so at each service completion, he will move a random number of places G in the sequence of customers, where G is a geometric random variable with parameter (probability of success) α U ({M 1 ,...,M i }) − α U ({M 1 ,...,M i−1 }) . Hence, the total change in Y r i over the interval I will be: which, by Wald's equation and the FSLLN, converges as r → ∞ tō from which (B.1) follows.
Proposition B.8 clarifies how single isolated servers move in the fluid limits. The next proposition studies movement of servers which stay together in the fluid limit.
The servers in S are working all the time, so they will process a total of L ∼ Poisson(µβ {M k ,...,M l } r∆). They will all start approximately at the same place, and end up approximately at the same place, processing all the customers of types in U ({M 1 , . . . , M l })\U ({M 1 , . . . , M k−1 }, and skipping all the other customers, between their approximately common starting and ending positions. The total distance travelled by all the processors in S will therefore be approximately equal to L l=1 G l where G l are again i.i.d. geometric random variables with probability of success Thus, proceeding to the limit as in the proof of Proposition B.8, we get thatȲ and (B.2) follows.
For the next proposition we will make use of the following elementary lemma, the proof of which may be found in [22] Lemma 2. Let g(t) be an absolutely continuous nonnegative function on t ≥ 0 and letġ(t) denote its derivative whenever it exists.
Complete resource pooling is also necessary for all the servers to move together as the next proposition shows: Proposition B.11. Assume that there is no complete resource pooling. Assume we start fromQ i (0) = 0, i = 1, . . . , J − 1,Q J (0) > 0. Then immediately the set of servers will split into more than one subset, which will move at different rates.
Proof. Assume to the contrary that all servers move together. Then by Proposition B.10, for all servers d dtȲ i (t) = µ, 0 < t < ∆. If there is no resource pooling, then there is a subset of servers S such that β S < α U (S) . We will show that for any time t, some of the servers in S will move at a rate which is < µ. This will prove the proposition.
Assume first that the servers in S move together, and are behind all other servers. Then the rate at which they move will be, by Proposition B.9, d dtȲ i (t) = µ β S α U (S) < µ. Assume next that the servers in S split into S = M 1 ∪ · · · ∪ M L subsets, each of which moves together, and that all these subsets are behind all the servers in S. Then as argued in the proof of Proposition B.10, the servers of the last subset M 1 will move at a rate slower than µ β S α U (S) < µ. Finally, if these subsets of S, which move together, are not behind all the servers in S, then the servers of the last subset M 1 will have more customers to serve than if they were moving in the very back. Hence the rate of moving for i ∈ M 1 will be: We now use the extended definition of complete resource pooling for subsets, Definition 3.2, to prove part (ii) of Theorem 3.3, and equation (3.6), and to complete the proof of the theorem.
Proof of part (ii) of Theorem 3.3. BecauseQ k−1 (t) > 0 and Q l (t) > 0, the servers S = {M k , . . . , M l } will move in isolation of all the other servers during a time interval t < τ < t + ∆, and will be between servers S ′ and S ′′ . In their movement, they will process all the customers in U (S ∪ S ′ )\U (S ′ ), and only those customers. The condition that S has complete resource pooling between S ′ and S ′′ means that in a system that would consist only of servers S and customer types U (S ∪ S ′ )\U (S ′ ), with β,α as defined above, there would be complete resource pooling. Hence, by Propositions B.10, B.11, it would be necessary and sufficient for the servers of these subsystems to move together. But the movement of the servers S when they are between S ′ , S ′′ andQ k−1 (t) > 0 andQ l (t) > 0, is exactly as if they were a separate system, with the only difference that they will actually also skip over all the customers types in C(S ′′ ). Hence, by Propositions B.10, B.11 they will stay together if and only if S has complete resource pooling between S ′ and S ′′ .
Completing the proof of Theorem 3.3. We have shown that fluid limits exist for t ∈ [0, T ] as long as they include only a finite number of simple isolated collisions. We have also shown that they satisfy (3.5), (3.6), (3.7). In Section 3.4 we show, employing results form flows in networks which are independent of the stochastic model, that the decomposition of a set of servers into subsets which have complete resource pooling is unique. Equations (3.5), (3.6), (3.7) and the uniqueness of the decomposition completely determine the evolution ofZ(t) = (Ā(t),T (t),Ȳ (t),S(t),Q(t)), and shows it to have only a finite number of collisions. This means that every fluid limit that starts from givenZ(0) = (Ā(0),T (0),Ȳ (0),S(0),Q(0)) has to follow the same unique trajectory, for almost all ω, and for any subsequence r for which there is convergence to a fluid limit. But this implies that Z n (t, ω) = (Ā n (t, ω),T n (t, ω),Ȳ n (t, ω),S n (t, ω),Q n (t, ω)) converges almost surely to the unique fluid limit.
B.3. Maxflow network analog, and proofs for Section 3. 4. In what follows we use the terms, notation, and results as formulated in Ford and Fulkerson's book [23], pages 1-14, see also [10]. In a directed network with origin and terminal o, t, a cut is given by a partition of the nodes into two sets, one of which includes o and the other includes t. For our network it is given by {o, C, S} and {t, C, S}, for some subset of customer types C and some subset of servers S, where C, S are the complements of C, S.
The capacity of the cut is the sum of the capacities of arcs directed from the o part to the t part. For our network this will be the sum of the capacities of the arcs from o to the nodes in C, the arcs from the nodes of S to t, and the arcs from C to S. For any cut with finite capacity, there are no arcs from C to S, hence S(C) ⊆ S, and C(S) ⊆ C. We will only consider cuts with finite capacity. Rather than talk about the cut as a partition, we will describe a cut by the sets C and S. The capacity of a (C, S) cut is then λ C + µ S . The celebrated max-flow min-cut theorem states that the maximal o to t flow through the network equals the capacity of the minimal cut.
Proof of proposition 3.5. The proof is similar to proofs that were given in [8,18], we give it here for completeness.
Sufficiency: We note first that if max flow is λ and the unique minimal cut consists of the arcs (o, c), c ∈ C, then there exists ǫ > 0 such that for capacities λ c (1 + ǫ) the max flow is λ(1 + ǫ). But then, for any subset of customer types C, the total flow from o to the nodes in C equals (1 + ǫ)λ C , so the total flow from C to the server nodes S(C) equals at least (1+ǫ)λ C (it may be more, since S(C) may receive flow from additional customer nodes), and so the total flow from server nodes S(C) to t equals at least (1 + ǫ)λ C . But the capacity of the arcs from server nodes S(C) to t equals µ S(C) , so we must have have µ S(C) ≥ (1 + ǫ)λ C > λ C . Hence the condition (2.2) for stability of the queueing system with total arrival rate λ holds, and the system is stable.
Necessity: Assume that the maximal flow is < λ or that the maximal flow is λ but the cut (C, ∅) consisting of arcs (o, c), c ∈ C is not the unique minimal cut. Then there exists a minimal cut (C, S) with capacity ≤ λ where C = C and S = ∅. But then, as observed above, S(C) ⊆ S. Hence µ S(C) ≤ µ S ≤ λ − λ C = λ C , which contradicts the condition (2.2).
Proof of Theorem 3.6. (i) The maximal flow problem for each fixed λ is a linear program with feasible and bounded solutions, and when it is considered with varying λ, it is a parametric linear program. As such it will have intervals in which the same basis is optimal, and such intervals will cover the whole line of λ > 0. Within such an interval, the flows of the optimal solution will be affine functions of λ. The optimal maximal flow objective f (λ) is clearly a continuous non-decreasing function of λ. Consider now the optimal flows for λ ′ and λ ′′ with λ ′ < λ ′′ and look at λ = (1 − θ)λ ′ + θλ ′′ . The convex combination of the optimal flows for λ ′ and for λ ′′ is a feasible flow for λ with objective value (1 − θ)f (λ ′ ) + θf (λ ′′ ), which can only be suboptimal. This proves the concavity. Finally, if 0 < λ < min{µ m 1 , . . . , µ m J }, the maximal flow is λ, so the slope of the initial interval Proof of Corollary 3.7. Parts (i) and (ii) and (iii) follow directly from the construction of minimal cuts in Theorem 3.6. Parts (iv) and (v) then follow from Proposition 3.5 and the expression in (iii). Finally, for part (vi), we note that when λ (i−1) < λ < λ (i) , then the servers S (j) receive flow µβ S (j) from customer types C (j) for j = 1, . . . , i − 1, so there can be no additional flow to any of them from nodes of customer types in C (i) , . . . , C (L) .
Proof of Corollary 3.8. One way to see this is that, by Corollary 3.7 (iv), for each C ⊂ C (i) we have β S(C) α C < β S (i) α C (i) , and by (iv), β S (i) α C (i) are monotone increasing in i.