The Join-the-Shortest-Queue System in the Halfin-Whitt Regime: Rates of Convergence to the Diffusion Limit
Abstract
We bound the rate at which the steady-state distribution of the join-the-shortest-queue (JSQ) system converges, in the Halfin-Whitt regime, to its diffusion limit. Our proof uses Stein’s method and, specifically, the recently proposed prelimit generator comparison approach. The JSQ system is nontrivial and high-dimensional and has a state-space collapse component; our analysis may serve as a helpful example to readers wishing to apply the approach to their own setting.
1. Introduction
Consider a queueing system with n identical servers, each with a finite buffer of length b. Customers arrive according to a Poisson process with rate , and service times are independent and identically distributed (i.i.d.), exponentially distributed with rate one. Customers cannot change servers after the initial routing decision, and a customer arriving to a system where all servers are busy and all buffers are full is blocked. This is known as a parallel-server system. A load-balancing policy specifies the manner in which arriving customers are assigned to the servers. In this paper, we consider the classical join-the-shortest-queue (JSQ) policy. Under JSQ, an arriving customer enters service immediately if at least one server is idle; if not, they get routed to the server with the smallest number of customers in its buffer. Ties are broken arbitrarily. We refer to this as the JSQ system.
Parallel-server systems have generated immense interest in recent years, and the JSQ policy is fundamental because it minimizes the expected customer delay and maximizes, with respect to stochastic order, the number of customers served in a given time interval; see, for instance, Winston (1977) and Weber (1978). For a sample of recent work on the JSQ policy, we refer readers to Eryilmaz and Srikant (2012), Mukherjee et al. (2016), Eschenfeldt and Gamarnik (2018), Banerjee and Mukherjee (2019), Gupta and Walton (2019), Liu and Ying (2019), Banerjee and Mukherjee (2020), Braverman (2020), Zhou and Shroff (2020a, b), Cao et al. (2021), Hurtado-Lange and Maguluri (2022), and Zhao et al. (2021). Other popular load-balancing policies include the join-the-idle-queue policy (Stolyar 2015, Mukherjee et al. 2016), the idle-one-first policy (Gupta and Walton 2019), and of course the power-of-d policy (Vvedenskaya et al. 1996, Mitzenmacher 2001); but in this paper, we focus on the JSQ policy. We make no attempt to give a comprehensive review of the literature on parallel-server systems, instead referring the reader to van der Boor et al. (2021) for a recent survey.
Understanding the exact performance of the system is known to be difficult, and much attention has been devoted over the past decade to “heavy-traffic” asymptotics. The term heavy-traffic refers to parameter regimes where the system utilization tends to one. “Conventional heavy traffic” assumes that the number of servers n is fixed and , whereas “many-server heavy traffic” assumes that and jointly. For two examples of work in the conventional heavy-traffic setting, see Eryilmaz and Srikant (2012) and Zhou and Shroff (2020b). In this paper, we use the term heavy-traffic to refer to the many-server setting—the setting considered in most of the papers mentioned in the previous paragraph.
There are multiple many-server heavy-traffic regimes, depending on how n and λ jointly converge to their limit. For example, assuming that yields entirely different asymptotic behavior compared to when . To capture all the possible heavy-traffic regimes, it is common practice to assume that the per-server load λ is related to the number of servers n through for some and . In this paper, we focus on the case when ; that is, . This regime is known as the Halfin-Whitt regime and is ubiquitous across the queueing theory literature. It derives from the work of Halfin and Whitt (1981) and is also known as the quality-and-efficiency-driven regime because it achieves reasonable customer wait times while maintaining high utilization of servers. The full list of parameter regimes is found in Figure 1.

Notes. Higher values of α represent heavier loads. Existing work across the different parameter regimes is reviewed in Section 1.1.
We now state and discuss our main results. Let be the number of servers with i or more customers at time , noting that for . The process is an irreducible continuous-time Markov chain (CTMC) on a finite state space and therefore possesses a unique stationary distribution. We let be the random vector having the stationary distribution of the CTMC. To describe the asymptotic behavior of Q, we let and define the diffusion-scaled random vector by , and for . The results of Eschenfeldt and Gamarnik (2018) and Braverman (2020) imply that X converges in distribution to some limiting -valued random vector Y as . In this paper, we establish an upper bound of order on the rate of convergence to Y.
The random variable Y is distributed according to the stationary distribution of the diffusion process , which satisfies
Our main result is that there exists a constant such that for all , and any function whose first-order and second-order partial derivatives are bounded in magnitude by one,
The assumption that b is finite is used frequently in the proof of (2) and, specifically, in the proof of Proposition 1. We deem the finite buffer assumption to be acceptable because it was shown by Braverman (2020) that even with infinite-sized buffers, for all in the Halfin-Whitt regime, implying that or that the mass concentrates on those states with at most one customer waiting. Moreover, Liu and Ying (2020) showed that assuming finite buffers, as in the even busier super-Halfin-Whitt regime ().
In addition to the novelty of our result, this paper makes a methodological contribution. We prove (2) using Stein’s method, a framework introduced by Stein (1972) that allows one to study the rate of convergence of a sequence of random variables to its limit. Popularized in the area of queueing systems by Gurvich (2014), Ying (2017), Braverman and Dai (2017), and Gast (2017), the generator comparison approach of Stein’s method, attributed to Barbour (1988, 1990) and Götze (1991), is used to study convergence rates of steady-state Markov chain distributions to their diffusion, fluid, or mean-field limits. For a few recent applications of the generator comparison approach in queueing, we refer the reader to Gaunt and Walton (2020), Hurtado-Lange and Maguluri (2022), Lu (2021), and Liu et al. (2022); this list is by no means comprehensive. In this paper, we restrict our attention to the case when the limit is the stationary distribution of a diffusion process, referring the reader to Ying (2017) for a treatment of fluid and mean-field limits.
The generator approach requires bounds on various moments of the prelimit, known as moment bounds, and bounds on the derivatives of the solution to the Poisson equation for the limiting distribution. The latter are called gradient bounds in Braverman and Dai (2017). But in this paper, we stick with the original term “Stein factors” or “Stein factor bounds”; see, for example, Ross (2011). Although moment bounds can be difficult to obtain in some applications, Stein factor bounds are typically the bigger problem. When the limit is one dimensional, Stein factors are bounded using the explicit form of the solution to the Poisson equation—an ordinary differential equation. When the limit is multidimensional, the Poisson equation is a partial differential equation (PDE) that generally does not have an explicit solution, making Stein factor bounds harder to establish. Techniques proposed to obtain multidimensional Stein factor bounds include using a priori Schauder estimates from elliptic PDE theory as in Gurvich (2014), using couplings to analyze and bound the sensitivity of the diffusion to its initial condition as in Barbour (1988) and Mackey and Gorham (2016), and bounding the Stein factors using Malliavin calculus as in Fang et al. (2018) and Jin et al. (2022). A detailed description of these techniques can be found in section 1.1 of Braverman (2022). However, despite progress on multidimensional Stein factor bounds, the JSQ system is not covered by existing results because our limiting diffusion in (1) is constrained to the nonnegative orthant via reflecting boundary conditions.
To deal with the Stein factor bound problem, this paper promotes the use of the prelimit generator comparison approach, which was recently proposed by Braverman (2022) as an alternative to the generator comparison approach. The prelimit approach is the mirror image of the classical generator approach. Whereas the latter requires moment bounds on the prelimit X and Stein factor bounds for limit Y, the former needs moment bounds on Y and Stein factor bounds for the prelimit X. For the moment bounds used in this paper, the result that all moments of Y are finite, proved by Banerjee and Mukherjee (2019), is sufficient because our limit Y does not depend on n. The Stein factor bounds pose a bigger challenge, and we deal with them in Section 3. It was noted in Braverman (2022) that the prelimit and classical generator comparison approaches should be equivalent, in theory, in the sense that any bound on obtained using one of them should be attainable using the other. However, in practice, one approach could be more tractable, or convenient, to work with; see, for instance, the example in section 4 of Braverman (2022). In the case of the JSQ system, we discuss in Remark 1 of Section 3.2.3 how the discrete state space simplifies the analysis of the couplings we use to establish Stein factor bounds because the initial spacing of the coupled systems is preserved until coupling.
The introduction of the prelimit approach in Braverman (2022) was intended to be gentle, with the only example used there being the system. Our application of the approach to the JSQ system exposes all of its moving pieces and can be useful to those who want to apply the prelimit approach to their own setting. For example, some of the technical components of this paper that could be useful in other settings include the regenerative argument used to establish first-order Stein factor bounds in Section 3.1, the approach we use to bound in Section 3.2.1, and our treatment of reflecting boundary conditions in Section A.2.2 in Appendix A.
It should be noted that Hurtado-Lange and Maguluri (2022) and Zhou and Shroff (2020a) used the classical generator comparison approach to obtain rates of convergence of the steady-state total customer count to an exponential random variable for . The former paper was in the continuous-time setting, whereas the latter considered the discrete-time system; the results in both papers also hold for routing policies other than JSQ, such as the power-of-d policy. Because the limiting random variable in both papers is one dimensional, the Stein factors bounds do not pose a challenge there.
1.1. Literature Review
Let us first review the literature on the analysis of the JSQ system in the various many-server heavy-traffic regimes. Most of the work has been done in the setting with infinite buffer sizes, so, unless otherwise noted, we assume that . In Eschenfeldt and Gamarnik (2018), the authors established the process-level convergence of to its diffusion limit in the Halfin-Whitt regime (). That paper triggered a wave of interest in the many-server heavy-traffic asymptotics of the JSQ system. Convergence of the stationary distributions was later established by Braverman (2020), and the behavior of the stationary distribution of the limiting diffusion was studied by Banerjee and Mukherjee (2019, 2020). Our work fits with this group of papers, elevating the steady-state convergence result to one with rates of convergence.
Outside the Halfin-Whitt regime, Mukherjee et al. (2016) studied the transient and steady-state behavior of the JSQ system’s fluid limit when is a fixed constant (α = 0); Gupta and Walton (2019) established process-level convergence to the diffusion limit when α = 1, known as the nondegenerate slowdown (NDS) regime and introduced by Atar (2012). In the sub-Halfin-Whitt regime when , Liu and Ying (2019) assumed finite buffers and obtained bounds on the steady-state total customer count in the system. A similar result was obtained for Coxian-2 service times by Liu et al. (2022) and by Liu and Ying (2020) for the super-Halfin-Whitt regime . Another recent work in the super-Halfin-Whitt regime was by Zhao et al. (2021), who worked with infinite buffers and established transient and steady-state diffusion limits for the normalized total queue length process. Their analysis exploited the regenerative structure of the JSQ system and contained several hitting-time estimates very close to our own estimates needed for the Stein factor bounds in Section 3. Lastly, both Hurtado-Lange and Maguluri (2022) and Zhou and Shroff (2020a) established rates of convergence to the exponential distribution for the steady-state normalized total customer count. Their results covered the case when .
Other works have used Stein’s method in the setting of parallel-server systems beyond Hurtado-Lange and Maguluri (2022) and Zhou and Shroff (2020a). In Liu and Ying (2019, 2020) and Liu et al. (2022), the authors used Stein’s method for mean-field analysis to obtain bounds on steady-state performance metrics of interest, like for instance, for the power-of-d system. Another line of work on power-of-d systems was by Gast (2017), Gast and Van Houdt (2017), and Gast et al. (2019), where the authors showed how to derive refined mean-field models for improved steady-state approximations. More recently, Hairi and Ying (2021) provide calculable error bounds for the mean-field approximation of the power-of-two-choices model.
1.2. Notation
We use to denote the set of integers and let . For any and , we let be the set of all k-times continuously differentiable functions . We let be the vector whose elements all equal one and let be the element with one in the ith entry and zeros otherwise. For any and integer d > 0, we let and define similarly. For any function , we define the forward difference operator in the ith direction as
2. Main Result
Recall that is the number of servers with i or more customers at time and that is an irreducible CTMC with state space given by
Figure 2 gives an example of a state .

Notes. Customers below the dashed horizontal line are in service, while those above are waiting in buffers. Each vertical column corresponds to a server and its buffer.
We assume that for some fixed . Let and define the diffusion-scaled CTMC by
We will often use and interchangeably. Recalling that , for any , the infinitesimal generator of satisfies
The first line of transitions in (5) corresponds to arrivals. We see that for , the jth component of only grows provided the preceding j – 1 horizontal levels, as depicted in Figure 2, are full. The transitions in the second line of (5) correspond to service completions. Using Figure 2 again, we interpret as the number of servers (vertical columns) with exactly j customers.
Recall that and are distributed according to the stationary distributions of the scaled CTMC and the diffusion defined in (1), respectively. Going forward, we note that unless explicitly stated, all expectations are with respect to the stationary distribution at hand, that is, either X or Y. To state our main result, we define
For any , there exists a constant such that for all ,
Note that is also a convergence-determining class because . We prove Theorem 1 in Section 2.1 using the prelimit generator approach of Stein’s method. Multiple parts of the proof assume that n is large enough, say, for some . We can make this assumption without loss of generality by redefining to be larger than .
2.1. Proving Theorem 1
Central to our proof is the ability to extend any grid-valued function to be defined on all of . Although there are infinitely many such extensions, we use a polynomial spline A that extends grid-valued functions to functions . We leave the detailed construction to Section A.2 in Appendix A because for this section, it suffices to know that A is a linear operator, that , and that A applied to a constant equals that constant. Recalling that , the following auxiliary lemma is needed.
Define
There exist some independent of any JSQ model parameters such that
The result follows by repeating the arguments used in the proof of lemma 1 in Braverman (2022). □
Going forward, when we write , the constant C is assumed to be the one in Lemma 1. Furthermore, note that if , then the linearity of A and the fact that A applied to a constant equals that constant implies that satisfies . We therefore, without loss of generality, consider only those such that .
To prove Theorem 1, we bound the right-hand side of (7) with the help of the following two ingredients. The first ingredient is a rate-conservation law for , proved in Appendix A.
Given , define
If and , and if Y(0) is initialized according to Y, then
The second ingredient is the Poisson equation. For and , let
Most applications of Stein’s method have c = 0, but we choose and define
Our choice of c yields , which comes in handy later when we need to bound in Proposition 1. Going forward, we assume that when referring to (10).
Let us give an informal roadmap for bounding (7), with the formal statement of the bounds left to Proposition 2. We bound (6) by comparing the CTMC and diffusion generators. However, the former is defined only on a subset of , which requires the following workaround. Suppose that we are given a set such that (a) for and (b) the probability that goes to zero rapidly (we will make this precise) as . We decompose as
Now extend to by defining for and consider . Provided that and , we can invoke Lemma 2 with there to conclude that
To state the following proposition, define elementwise by . For notational convenience, we also define . The following proposition is proved in Section A.2 in Appendix A.
If , then Ah(Y), , and are integrable and (12) holds. Furthermore, suppose that n > 16, define
There exist independent of h(x) and n such that
Note that and are related to the first and second lines of (12), respectively, whereas and are related to the last line there. From the bounds in Proposition 2, we see that the bound on (12) depends on the CTMC through the function and its differences and on the diffusion through the distribution of Y. The differences of are commonly known as Stein factors; the following proposition, proved in Section 3, exhibits the Stein factor bounds we need to prove Theorem 1.
There exists such that for any and ,
The last component needed for the proof of Theorem 1 is the following lemma.
All moments of Y1 and Y2 are finite. Furthermore, suppose that Y(0) is initialized according to Y. Then for any j > 0,
The finiteness of the moments follows from theorem 2.1 of Banerjee and Mukherjee (2019), and (13) is implied by (9) of Lemma 2 with there. □
Initialize Y(0) according to Y. Using (12) and the definitions of , it follows that
We argue that for any , which implies Theorem 1 when combined with Lemma 1. Because for , applying the Stein factor bounds in Proposition 1 with the bounds on and in Proposition 2 yields
We point out that
Furthermore, applying the Stein factor bounds in Proposition 1 to the bounds on and in Proposition 2, and using (15), we get
Thus, (13) of Lemma 3 implies that
3. Stein Factor Bounds
In this section, we prove Proposition 1. We bound the first-order differences in Section 3.1. This requires the most effort. The second-order differences are bounded at the start of Section 3.2, with Section 3.2.1 showing how they can be used to bound , which may be of independent interest. Section 3.2.2 contains the third-order bounds, and Section 3.2.3 proves two technical lemmas needed for the second-order bounds.
3.1. First-Order Differences
In this section, we bound
For , define . There exists a coupling of whose transient distribution satisfies
Furthermore, if , then
The equality , where .
The pair belongs to for all times .
Let V be a unit-mean exponentially distributed random variable independent of . Then
(17)
Let us construct a joint CTMC by specifying its transitions. For simplicity, we refer to as system 1 and to as system 2. We think of system 2 as a copy of system 1 but with an additional low-priority customer following a preemptive resume rule. That is, service is interrupted, and the extra customer moves to the back of its buffer when a regular customer joins, even if the low-priority customer is currently in service.
Any state in is one where the low-priority customer is in service. The remaining correspond to states where the low-priority customer is assigned to a server with a total of i customers; Figure 3 contains an example of a states in and . Assuming for some , we now describe the possible transitions of the joint chain.
If i = 1, then the low-priority customer is in service. After a unit-mean exponentially distributed amount of time, the customer leaves system 2 and both systems couple. After coupling, systems 1 and 2 are identical in terms of current and future customers, so they coincide on every sample path. All other transitions of the joint chain are based on the standard transitions of the JSQ model. In other words, a service completion by any of the q1 servers working in system 1 results in a customer departure from both systems.
Figure 4 illustrates the effect of arrivals when . Namely, when , a new arrival is assigned to the same idle server in both systems. If a customer arrives when , then system 1 has only one idle server and system 2 has none. In system 1, that customer will be assigned to the last remaining idle server. Recall that when defining our JSQ model, we allowed for an arbitrary tie-breaking decision in routing arrivals. Therefore, in system 2, we assign that customer to the server working on the low-priority customer, causing a service preemption and pushing the low-priority customer to the back of the buffer. An arrival when transitions the joint chain from to .
If , then the low-priority customer is in the back of some server’s buffer. A service completion by any of the q1 servers working in system 1 results in a customer departure from both systems. If, however, the service completion happens at the server containing the low-priority customer, then the chain transitions from to because the low-priority customer is now assigned to a server with i – 1 customers; see Figure 5 for a depiction of such a transition. All new arrivals get assigned to the same server in each system. Note that if an arrival happens when and , then the system transitions from to .
The final case is when . All transitions are identical to the case, except for a customer arrival to a system where and . In that case, system 1 assigns the customer to the last available slot; but system 2 blocks the customer because it is already full. This transition causes the two systems to couple. Note that our construction immediately implies the three claims in Lemma 4. □

Notes. The red circles correspond to customers in Q(t), while the blue circle is the extra customer in . In the figure on the left, the joint chain is in , meaning the blue customer is in service and will leave the system after an exponentially distributed amount of time, coupling the joint chain. In the figure on the right, the joint chain is in because the blue customer is assigned to a server with a total of three customers.

Note. The second arrival results in a transition from to .

Let be the scaled version of . For any with , and any ,
For any ,
Before proving the lemma, we note that the first-order bounds in Proposition 1 are a consequence of (18) and Lemma 5; that is,
Furthermore, note that for any with ,
Recall that and that the definition of S implies that for any and . Combining these facts with (19) yields
We now describe the main idea and introduce several auxiliary lemmas used to prove Lemma 5. Our discussion communicates the main intuition behind the proof, leaving the technical details to Appendix B. Let be a constant independent of n whose precise value will be specified later and define
Additionally, we define the stopping times
We now describe a sequence of cycles, or attempts, such that in each cycle, the probability of the joint chain coupling is bounded from below by a constant independent of n. Given an initial state belonging to some , we wait until , which marks the start of the first cycle. From that point, we wait until . If , then we give up trying to couple this cycle and wait until to start a fresh cycle. If , then there are idle servers and at most nonempty buffers. From such a state, we are guaranteed that coupling happens if the joint CTMC enters and spends an exponentially distributed amount of time there before all servers in become busy; that is, . If , we give up trying to couple this cycle and wait until for the next cycle to restart the coupling attempt. Note that this cycle sequence resembles a renewal sequence, but the new cycle times are not renewal times because the values of can vary at the start of each new cycle.
From our discussion, it follows that coupling is guaranteed in any given cycle if, starting from a state with , the events and occur. In Appendix B, we derive a lower bound, uniform in n, on the probability of coupling in a given cycle, implying that coupling is guaranteed to happen after a geometrically distributed number of cycles. We also derive an upper bound, uniform in n, on the expected time until the start of the first cycle, as well as the expected cycle duration, and then combine these bounds and prove Lemma 5.
3.2. Higher-Order Bounds
To prove the higher-order bounds, we first use the Poisson equation to write in terms of and first-order differences of . With the help of this expression, we use the dynamics of the JSQ model to relate all the second-order differences to each other and prove that
For the following discussion, we assume that with . Recall from (5) that
We rearrange the Poisson equation to see that when , or alternatively ,
Note that because and . Together with the bound on from (19), this implies that
Similarly, if ,
Not all second-order differences can be bounded like this. For example, the equation for would involve the third-order difference , which we have not bounded. Instead, the following lemma relates the remaining second-order differences to and using the structure of the JSQ system. The proof is postponed to Section 3.2.3.
Fix . Then for any with ,
We see from Lemma 6 that to bound the second-order differences, we only need bounds on and . The former is bounded in (24); for the latter term, we note that for any with ,
For all ,
3.2.1. Bounding .
The bounds in (21) and (26) do not yet look like the stated bounds in Proposition 1 because the term is present. However, we can bound this expectation using the Poisson equation as follows. Recall that ; let , and observe that this point is in S. In fact, it is the closest point in S, when rounded up, to the fluid equilibrium of the JSQ system, which happens to be ; see Braverman (2020). From (22) we have
Choosing and noting that yields
To bound we need only bound because because of (19). Note that we cannot use (24) for the second-order difference bound because is present on the right-hand side there. Instead, we exploit the structure of the JSQ model to bound as follows.
Define ; let be the scaled version of the coupling defined in Lemma 4, and let V be the unit-rate exponentially distributed random variable defined in the same lemma. Fix with , and suppose and . Consider the evolution of for . If , the two processes couple and become identical. Otherwise, the joint process is in state . Using the strong Markov property, we conclude that
Choosing , we see that
Choosing and using , we arrive at
The quantities involving are bounded in the following lemma.
There exists a constant such that for all ,
Lemma 8 is proved in Section B.5 in Appendix B. It implies that , and therefore
Combining (32) with (21) proves the second-order bounds in Proposition 1.
Before moving on, let us make a few remarks. The bound in (32) implies that the sequence of steady-state distributions is tight; when combined with process-level convergence of to the diffusion , tightness can be used to imply convergence of the steady-state distributions via a limit-interchange argument. For an example of this applied to the JSQ model, see Braverman (2020). Alternatively, (32) can be recast into a result about the convergence rate to the mean-field equilibrium.
Let , noting that and that ; suppose for the sake of exposition that . One may check that the bound in (30) holds even when , in which case (29) implies that
If we divide both sides by to consider the mean-field scaled version of , we get
Thus, we recover the rate of convergence to the mean field equilibrium that one typically obtains using Stein’s method for the mean-field model, like in Ying (2017). The approach used to show tightness in this section can offer an alternative to the one proposed by Ying (2017), but the difficulty of implementing our approach is directly related to the difficulty of obtaining the relevant Stein factor bounds.
As a final remark, in this section we have shown that establishing tightness, or rates of convergence to the mean-field equilibrium, is equivalent to bounding the first- and second-order differences of at a single point near the fluid equilibrium of the CTMC. In contrast, establishing rates of convergence to the diffusion requires bounds on the second- and third-order differences at all points in the support of Y.
3.2.2 Third-Order Bounds.
To bound , we recall (23), which says that for with ,
Applying to both sides yields
The bounds on the first- and second-order differences of , together with the fact that , imply that
3.2.3. Proving Lemmas 6 and 7.
To conclude the section, we prove the auxiliary lemmas from Section 3.2.
Our first task is to bound
Note that with implies that . Working with the unscaled CTMC, we now construct four processes defined on the time interval , where
We refer to as the ith process. Process four is a copy of . Numbers two and three are copies of four but with one extra customer who is assigned to a server with an empty buffer. The extra customer in two is different from the one in three. Lastly, process one is a copy of four but with two extra customers. The extra customers are the same as those in two and three. Figure 6 visualizes the initial condition of the processes.
Let be the scaled counterparts of these processes. Note that
We refer to the different customers according to their shapes in Figure 6. Define τs and τd to be the service times of the server with the star and diamond customer, respectively. Both are exponentially distributed with unit mean. Setting , we observe that if , then
Therefore,
Because for ,
The remaining bounds are proved similarly, starting with . Fix with , and consider
We again construct a coupling corresponding to the four initial states on the right-hand side above. The initial conditions of the unscaled processes are visualized in Figure 7. Our construction yields
Let . We again let τs and τd be the remaining service time of the server with the star and diamond customer, respectively, and set . Just like we argued before, if , then the integrand in (35) is zero after τm. If, however, , then
Therefore,
Figure 8 illustrates the coupling needed to bound . The idea of the proof is again to wait until and analyze what could happen if one of the servers containing the star or diamond customer completes service before . We leave the details to the reader. □

Notes. The red customers represent those common to all four systems. The diamond and star are the extra customers.

Note. The red customers represent those common to all four systems.

Let us say a few words on the advantage of using the prelimit generator comparison approach over the classical generator comparison approach. Lemma 6 is proved using a synchronous coupling of four JSQ systems. The four systems are initialized one or two customers apart from one another; because of the discrete state space of the CTMC, all four systems stay one or two customers apart until they couple. Had we used the classical generator comparison approach, we would have needed to carry out a similar analysis by coupling four copies of the diffusion . However, unlike the JSQ coupling, the four diffusions would not maintain their initial spacing relative to each other because takes values in a continuous state space. This would further complicate the analysis as we would now need to keep track of the positions of the four diffusions relative to each other.
We want to bound
As we are accustomed to doing by now, let us construct a coupling with
System two has one less idle server and one more customer waiting in a buffer compared to system one, but the total initial customer count is identical across both systems. The initial condition of both systems is visualized in Figure 9. We assume that the diamond and star customers are independent of each other, that the systems see identical arrivals, and that the rest of the customers are identical across both systems.
Now define τd and τs to be the remaining service times of the server that has the diamond and star customer, respectively; let ; set . If or , then for . Letting be the scaled version of , it follows that
To bound the first term on the right-hand side, note that
The first inequality is true because , and the last inequality follows from Lemma 8 with there. Furthermore,
The first inequality follows from the bound on the first-order difference in (19) together with the fact that for all . The second inequality follows by noting that τs is independent of ν1 and using Lemma 8 with , and there. □

Note. The red circles represent customers common to both systems.
4. Conclusion
As stated in the introduction, the Stein factor bounds require the bulk of our efforts. Proving the first-order bounds in Section 3.1 amounts to considering two coupled JSQ systems, initialized with a difference of one customer, and bounding the expected coupling time of this joint chain. We bound the coupling time by considering a sequence of coupling attempts where the probability of coupling in a single attempt is bounded away from zero uniformly in n; the expected interattempt times are also bounded from above, uniformly in n. The coupling time can then be bounded by a sum of a geometrically distributed number of random variables representing the interattempt durations. This renewal-like argument applies more generally to settings where (a) there is a region of the state space where the joint chain is guaranteed to couple provided it spends enough time there and (b) one can control the expected time to reach this region and the probability of coupling in the region before leaving it.
With the first-order Stein factor bounds in hand, the higher-order bounds require less effort. Our proofs of the high-order bounds make heavy use of the transition structure of the JSQ system and, in particular, that increase only at those times when . Readers should not be misled into thinking that high-order Stein factor bounds require less effort than first-order bounds for all models. Indeed, in the classical generator comparison approach, high-order bounds require much more effort; see, for example, Mackey and Gorham (2016), Erdogdu et al. (2018), Jin et al. (2022).
Regarding extending our results, we note that Proposition 2, which compares GX to GY, can be easily adjusted to hold for other parameter regimes and load-balancing policies. The main difficulty would be establishing Stein factor bounds. As mentioned in the introduction, Zhao et al. (2021) considered the super-Halfin-Whitt regime () and established several hitting-time estimates similar to the ones we use in the proof of Lemma 5 to bound the first-order Stein factors. It may be possible to build on their results and obtain rates of convergence for the super-Halfin-Whitt regime too.
Furthermore, it seems that the sub-Halfin-Whitt regime () should present less of a challenge than our own setting. Recall from the discussion in Section 3.1 that coupling of the joint CTMC is guaranteed provided it enters and spends an exponentially distributed amount of time there before all servers become busy. Compared to the Halfin-Whitt regime, the rate at which customers arrive in the sub-Halfin-Whitt regime is much smaller, so the event that all servers are busy should happen less frequently. Indeed, Liu and Ying (2020) showed that the steady-state probability that all servers are busy tends to zero in the sub-Halfin-Whitt regime. Consequently, the Stein factor bounds should be simpler to establish.
Appendix A. Supporting Proofs for Section 2
We first prove Lemma 2 and then introduce the operator A in Section A.1 in Appendix A. Once A is introduced, we prove Proposition 2 in Section A.2 in Appendix A.
Initialize Y(0) according to Y. Because satisfies (1), for any with , Itô’s lemma implies that
If , then follows from the Fubini-Tonelli theorem. □
A.1. The Interpolator A
The operator A discussed in this section is identical to the one introduced in appendix A of Braverman (2022), but we repeat its key properties here as they are needed for the proof of Proposition 2. Consider a one-dimensional function . We can extend it to by defining
The following result is theorem A.1 of Braverman (2022). We use this as an interface that contains the important properties of A without delving into the low-level details behind its construction.
Given a convex set , define
The bound in (A.4) holds almost everywhere when . This bound is the reason we need to use and the four points to the right of it (in each dimension). By using more (fewer) points, one can alter the theorem so that (A.4) holds for larger (smaller) values of . It is worth noting that to prove the results in this paper, we do not go beyond .
Going forward, we let A be the operator described in Theorem A.1. Because Af coincides with f on the grid, we refer to A as an interpolator. For the interested reader, A is a degree-7 polynomial spline. From (A.3) we see that A is a linear operator, and (A.6) implies that A applied to a constant simply equals that constant. Before we can prove Proposition 2, we require one more lemma.
In the setting of Theorem A.1, for any and ,
Furthermore, there exists some satisfying
The proof is identical for all indices, so we assume that j = 1. Fix and let be a function in x1 only. The form of in (A.2), together with (A.5), implies that
It follows that , where is a polynomial defined in (A.1) of Braverman (2022). Furthermore, (A.1) implies that
Now ,
Note that some of the bounds in Theorem A.1 and Lemma A.1 have a constant C(d) depending on the dimension d of the function (e.g., (A.4)). In the JSQ model , but when proving Proposition 2 in the next section, we can assume that d = 2 because of the following. Given a function , we can use (A.2) and (A.5) of Theorem A.1, and the fact that Yi = 0 for i > 2, to see that
Because depends only on Yj, we see that is actually a bivariate function. In Section A.2, we treat any function of the form as a function of two variables.
A.2. Proving Proposition 2
Fix . We recall from (5) that for ,
We first argue that , and , which together imply that (12) holds. The latter two statements follow immediately from the fact that , and therefore , have compact support. Because , inequality (A.4) of Theorem A.1 implies that Ah(Y) is Lipschitz and therefore, because of Lemma 3, which states that the moments of Yi are finite.
Next we argue that for all . Given the Poisson Equation (A.9) and the definition of A in (A.2) of Theorem A.1, it suffices to show that for all . From the definition of SQ in (4) we know that any point satisfies . The corresponding points satisfy , and . The latter inequality says that the combined number of idle servers and servers with at least one person waiting in the buffer cannot exceed n. Now, provided that n > 16, any point δk in
We bound , and in Section A.2.1 and bound in Section A.2.2.
A.2.1. Bounding through .
We begin with the bound on
The fact that is Lipschitz, that , and that imply that
Combining the bounds on the three terms yields the bound on . Lemma A.1 implies the bound on , and (A.10) implies the bound on .
A.2.2. Bounding .
Bounding requires more effort. The first thing to note is that the weighted sum representation of is difficult to work with. Our first task is therefore to write it in a form that is more amenable to analysis. To this end, we extend the domain of to allow either the first or second coordinate to take the value by defining
The form of is tied to the transition structure of the JSQ model and specifically to the “reflection” that occurs near the boundaries and . Furthermore, the definition of A in Theorem A.1 implies that for because on . Having defined , we present the following lemma, which is proved in Section A.2.3.
For any ,
Consequently, for any ,
We now bound using Lemma A.2. Applying Taylor expansion to (A.13), we have
Note that because , so . We now prove the following four bounds, which together imply the bound on :
We begin with (A.15). Observe that because . Furthermore, implies . Combining this with (A.4) of Theorem A.1, we get
We now prove (A.16). As before, implies that , so
Now when and , the definition of in (A.11) implies that
Combining this with (A.19) implies (A.16). To prove (A.17), we note that because , so
The definition of in (A.11) says that , so , implying (A.17). Lastly, we prove (A.18). Theorem A.1 tells us that are degree-7 polynomials in whose coefficients do not depend on kj or δ, so there exists a constant C > 0 such that for j = 1, 2, so
Now
An identical argument allows us to bound the second term on the right-hand side of (A.14), yielding
Using and , we conclude (A.18).
A.2.3. Proving Lemma A.2
To prove Lemma A.2, we need the following result.
Suppose and let . Given , for those k such that and , we define
Then is well defined for those such that and for all , where . Furthermore, for all such x,
The proof is identical to the proof of proposition 3 of Braverman (2022). □
First, we prove (A.12). Any satisfies , or . It follows from the definition of GX in (5) that for ,
Note that , and . Although is technically not defined when , we adopt the convention that . Using the definition of in (A.11), we have
Similarly, because corresponds to ,
Appendix B. Supporting Proofs for Section 3
Apart from the short proof of Lemma 8 in Section B.5, this appendix is devoted to the proof of Lemma 5. Going forward, we fix and recall from Section 3.1 that
Following the proof roadmap of Lemma 5, we need an upper bound on the expected start of the first cycle and the expected duration of a single cycle. The following two lemmas provide the ingredients for these bounds and are proved in Sections B.1 and B.2, respectively.
For all ,
For all and with ,
To bound the probability of coupling in a given cycle, we require the following two lemmas.
There exists a constant such that for all ,
There exists a constant such that for all ,
Lemmas B.3 and B.4 are proved in Sections B.3 and B.4, respectively.
Throughout the proof, we use C to denote a positive constant that may change from line to line but depends only on β and b. Given any initial condition ,
For convenience, we abuse notation and adopt the convention that
Lemma B.2 implies that for any ,
We will argue that if and are the constants from Lemmas B.3 and B.4, then
As a result, choosing implies that
We note that
Provided we can show that
To conclude, we argue that
If , then for all by construction. The joint CTMC couples before if , where V is as in (17). If for , coupling will happen before if the joint CTMC transitions to and then spends V time units there, all before . From the construction of , we know that the time taken to get from to equals the sum of i – 1 unit-mean exponentially distributed random variables, so the worst case is when . Letting represent this sum, it follows that
B.1. Proving Lemma B.1
Define and observe that
Because , it follows that for any with ,
Let M > 0, , and note that for . Dynkin’s formula, for example, lemma 17.2 in Kallenberg (2001), then implies that for any with and ,
Because and , it follows that , so
B.2. Proving Lemma B.12
Recall that and . In this section, we show that if . Our proof is based on a Lyapunov function characterized by the following proposition, proved in Section B.2.1.
There exists a function such that for any and any with ,
Furthermore, there exists a constant such that for any ,
Let V(x) be the function in Lemma B.5, fix with , M > 0, and define . Dynkin’s formula says that
Because for all , Lemma B.5 implies that
Combining this inequality with (B.12) and that and yields
If b = 1, the lemma follows trivially, so we assume that b > 1. It suffices to show that
Let N be the number of customers in the interval that arrive when all servers are busy and all queues have at least one customer in them; that is, . For , let ξi be the time customer i spends being second in line, even if that customer becomes second in line after time M. We argue that conditioned on , each ξi is exponentially distributed with unit mean. Upon entry into the system, if customer i is routed to a busy server with only one other customer waiting in the buffer, then ξi is distributed according to the remaining service time of the server, which is exponentially distributed with unit mean. If the buffer has more than one customer waiting, then ξi equals the service time of the customer two spots ahead of customer i, which is also exponentially distributed with unit mean. Further note that the CTMC can be constructed in such a way that the value of ξi is determined at the instant when customer i enters the system, so
Let ηi be the time spent by the CTMC in a state with before customer i’s arrival. Because the arrivals to the JSQ system are governed by a rate- Poisson process, the arrival of customer i corresponds to a time when ηi accumulates to equal an exponentially distributed random variable with rate ; therefore,
B.2.1. Proving Lemma B.5.
The Lyapunov function in Lemma B.5 is based on the fluid limit of the JSQ system, studied in Braverman (2020). Lemma B.5 was, unfortunately, not proved there; but that paper contains all the necessary ingredients for the proof. We now recall them using notation from Braverman (2020).
Consider the two-dimensional process . Note that the first coordinate is nonpositive, whereas so far we have been using a nonnegative first coordinate. Section 4.1 of Braverman (2020) described the fluid limit of this process. Letting
The following result proved in Section B.2.2 gives us control over the derivatives of V(x).
For any with ,
Furthermore, if we choose and , then for any with , and any ,
Let and and V(x) be the function from Lemma B.6, and recall GX defined in (5). Because V(x) depends only on x1 and x2,
Using Taylor expansion, we get
Now suppose . The bounds on the second-order derivatives of V(x) from Lemma B.6, together with the facts that , and , imply that . Next, we rewrite the first line on the right-hand side of (B.18), for which we note that
Now when , so (B.17) in Lemma B.6 tells us that provided that , which we assume, so
B.2.2. Proof of Lemma B.6.
Fix and . The function was considered in lemma 8 of Braverman (2020), which tells us that
Combining this with
We write the equation for in (B.22), but writing it requires us to introduce some nontrivial objects from Braverman (2020). The first object we need is the family of curves , where is the graph of the unique fluid-limit trajectory that intersects the x2 axis at the point . For the purposes of this proof, it suffices to treat as a two-dimensional geometric object satisfying the following properties:
The set is a graph of a continuous function; that is, for some continuous function .
The intersection .
If and , then .
If , then and lies above .
The first three properties are implied by lemma 5 of Braverman (2020), and the fourth one follows from (39) there. Because is a graph, sets of the form , etc., are well defined. Let us use and to partition Ω into the four sets
The four properties of are sufficient to argue that and that the interiors of Si and Sj are disjoint when ; we refer the reader to section C.2 of Braverman (2020) for more details.
The last object we need is the function , which represents the first time that the fluid limit hits the x2 axis starting from a state . The precise definition of is bulky and involves the Lambert-W function, but we can get by with only a few of its properties. Namely, for any , lemma 6 of Braverman (2020) introduces a nonnegative function with , which is differentiable for all and satisfies
By choosing , we are assured that is defined on the set . Item 1 of lemma 6 in Braverman (2020) tells us that is tied to for any via
We are now ready to bound the derivatives of . Equation (C.9) of Braverman (2020) tells us that
Let us now argue that for any . If , this bound is implied by the inequality for . If , the bound is implied by the fact that and . If , we note that , and , meaning that
Observe that when , which is true for any with , implying the claim about in (B.17). To conclude the proof, it remains to show by differentiating both sides in (B.22). Note that for . When , we use the bound on in (B.23), as well as the fact that , to see that
When ,
Using the expression for in (B.20), we see that
The first inequality follows from because of (B.21) and the fact that for . Lastly, we consider the case when , for which we recall that
To help organize terms, let and note from (B.20) that
We see that because and for because of (B.21). Furthermore, because and for , we conclude that
Let us now differentiate and bound each term on the right-hand side of (B.25) individually. First,
The first inequality is because is nondecreasing and , and the second inequality follows from the fact that and the bound on in (B.23). Differentiating the second term in (B.25), we get
To bound the first term, we use the fact that ; we repeat the argument used to prove (B.24) to see that
Furthermore, the bounds on and in (B.23) and (B.26), together with the fact that , imply that
Combining the pieces yields , proving (B.17).
To conclude, we prove that for by proving that for . The form of below can be found in lemma B.1 of Braverman (2020):
The fact that follows from , the definitions of S1, S2, and S3, and (B.21). We combine all the cases above into the single upper bound
Using the inequality for , and the fact that and , we see that ,
Furthermore, (B.21) and the definitions of S2 and S3 imply that
We conclude by combining all of these bounds with (B.27). □
B.3. Proof of Lemma B.3
Assume without loss of generality that and , because starting from , a state with and must be visited before , so
We bound the right-hand side by relating it to the ruin probability in a certain gambler’s ruin problem. Namely, we construct a random walk with that satisfies
Jumps in the random walk are governed by a Poisson process with rate ; the up-step and down-step probabilities are
Recalling the values of θ1 and θ2 and the fact that , we see that
Recall that and , and let be a copy of , but with the modification that any server with a nonempty buffer permanently halts all its work. Then for all and all because this modified system has the same arrival stream as but serves fewer customers. It follows that
Now consider the process
Note that because is nondecreasing in t, which implies that
Note also that because is nondecreasing in t and increases only when . Hence,
An arrival to increases the value of , and a service completion by a server with an empty buffer decreases its value. However, is still not the random walk we desire because the rate at which it decreases depends on the state of . Instead, we want a random walk with a constant downward rate.
To construct this random walk, for let us define by setting and defining the transitions of the joint process in Tables B.1–B.3. Because we are defining only until the time hits , we do not need to specify the transitions for states where . The intuition for the transition structure is as follows. Because arrivals occur at the constant rate of , we want any arrival to to also occur in . However, we want to keep the rate at which decreases a constant value of . To accomplish this, when , the transitions in Table B.2 have ignore some departures from ; when , we supplement the departures from ; for example, see transition in Table B.3. Having defined , let us define
|
Table B.1. Arrival Transitions for the Joint Process in State
| # | Rate | Transition |
|---|---|---|
| 1 | ||
| 2 | ||
| 3 | ||
| 4 |
|
Table B.2. Departure Transitions for the Joint Process in State with and
| # | Rate | Transition |
|---|---|---|
| 5 | ||
| 6 |
|
Table B.3. Departure Transitions for the Joint Process in State with and
| # | Rate | Transition |
|---|---|---|
| 7 | ||
| 8 | ||
| 9 |
To prove that satisfies (B.28), we show that
Together with (B.30), these inequalities imply that
To see why (B.31) is true, let us study the transitions in Tables B.1–B.3. Table B.1 tells us that and R(t) increase at the same times. The transitions in Table B.2 show that any decrease in , and consequently , must be accompanied by a decrease in and R(t) but not vice versa. The only way can ever drop below is via transition 8, which can happen only if , so the first intersection of and has to occur below . Therefore, for all times
Let us now prove (B.31) by showing that the right-hand side is greater than
Because ,
Furthermore, because is nondecreasing and increases only at those times when , it follows that for all ,
B.4. Proving Lemma B.4
Central to our argument is a result about the moment-generating function of the duration of a gambler’s ruin game. We now describe this result and then prove Lemma B.4. Consider a discrete-time gambler’s ruin problem where the initial player’s wealth is z, the win probability is p, the loss probability is q, and the player keeps playing until he or she goes broke or accumulates a total wealth of a. Let be the number of turns until the game ends, given an initial wealth of z. An expression for the generating function was given in (4.11) and (4.12) in section XIV.4 of Feller (1968):
Now consider the continuous-time gambler’s ruin problem, where the durations between turns are governed by an i.i.d. sequence of rate r exponentially distributed random variables. Given initial wealth z, the duration of the continuous game equals . Because the Ei are independent of Dz, it follows that
Let i and q2 be integers such that and , and define
Consider the continuous-time gambler’s ruin problem with probabilities
As discussed below (B.10), , where is the sum of b + 1 unit-mean exponentially distributed random variables. The same discussion says that represents the time needed by the joint CTMC to transition from to and to then couple by spending an exponentially distributed amount of time in . Thus,
Let us analyze the probability above. At time t = 0, there are q2 servers with nonempty buffers and another server containing the extra customer in . We group these servers together into group A and the remaining servers into group B. Let and be the number of busy group A and B servers, respectively. Because
If a customer arrives when more than one server is idle, we prioritize assigning this customer to servers in group B over group A. Note that this tie-breaking rule is consistent with the tie-breaking rule we imposed in the proof of Lemma 4. Let be the first time that all servers in group B are busy. By construction, , so
The last inequality is true because increasing the value of the initial condition does not increase the chance that . We now relate the right-hand side to the moment-generating function considered in Lemma B.7 and use that lemma to conclude the proof. We can write , where Gi are i.i.d. unit-mean exponentially distributed random variables independent of for because they correspond to service times of the server containing the additional customer in , which is a server in group A.
Fixing and , for , we define
We now show that can be bounded from below by the duration of a gambler’s ruin game, which allows us to apply Lemma B.7. Fix , and consider the time interval , on which we construct the coupling by setting
Note that the only time decreases but does not is when the latter is smaller than , so we are guaranteed that
Recalling the definitions of and , we have
Recall that Gi corresponds to the service time of a group A server and is therefore independent of . Furthermore, because Gi is exponentially distributed with unit mean, conditioning on the value of yields
Applying (B.34) of Lemma B.7 concludes because our construction of implies that is the duration of a gambler’s ruin game with initial wealth , terminal wealth , rate , and up-step and down-step probabilities
|
Table B.4. Transition Rates in State with
| Rate | Transition |
|---|---|
|
Table B.5. Transition Rates in State with
| Rate | Transition |
|---|---|
B.4.1 Proving the Gambler’s Ruin Result.
We require the following auxiliary lemma.
Assume is a sequence that converges to . Then
Let and ; note that for any ,
From the mean-value theorem, we know that there exists some cn between xn and such that
Because , it follows that for n large enough; therefore,
We can make the right-hand side arbitrarily small by increasing n. □
Recall that and that
Fix . To show that , we derive expressions for and . For notational economy, we let . We can write p and q as
Let us first consider , which satisfies
We now show that the terms inside the square brackets have limits as ; that is,
Note that ; recall the definition of r to see that , so
Furthermore, recalling the definition of , we have
We know that exists because q2 is fixed between zero and . This proves (B.37). Recall that and . Because , it follows that
The expressions for and follow similarly. Comparing
For convenience, we define and , so that
It is straightforward to check that using (B.37). Let us now prove that . Using the definition of , we have
Set . We want to show that for any ,
Rearranging terms, this is equivalent to
Fix and treat the left-hand side as a function of x. Both sides are equal when x = 0, so it suffices to show that the derivative of the left-hand side with respect to x is negative. Now
For the right-hand side to be negative, we must have
Because , the left-hand side is bounded from below by c provided that . The right-hand side converges to c as , so we must show that the derivative of the right-hand side is negative. Differentiating yields
The numerator equals zero when y = 0. Its derivative equals
B.5. Proof of Lemma 8
It suffices to show that because
Letting and using yields , which implies that , so
Note that is equivalent to . If , we observe that the right-hand side equals , which verifies (31) when . If, however, , we may use Stirling’s approximation to see that for ,
Setting proves (31) when . To prove (31) when and requires just a little more work. Setting ,
To conclude, we need to bound
Using Taylor expansion,
The argument when is identical. This proves (31) when . □
References
- (2012) A diffusion regime with nondegenerate slowdown. Oper. Res. 60(2):490–500.Link, Google Scholar
- (2019) Join-the-shortest queue diffusion limit in Halfin-Whitt regime: Tail asymptotics and scaling of extrema. Ann. Appl. Probab. 29(2):1262–1309.Google Scholar
- (2020) Join-the-shortest Queue diffusion limit in Halfin-Whitt regime: Sensitivity on the heavy-traffic parameter. Ann. Appl. Probab. 30(1):80–144.Google Scholar
- (1988) Stein’s method and Poisson process convergence. J. Appl. Probab. 25:175–184.Google Scholar
- (1990) Stein’s method for diffusion approximations. Probab. Theory Related Fields 84(3):297–322.Google Scholar
- (2020) Steady-state analysis of the join the shortest queue model in the Halfin-Whitt regime. Math. Oper. Res. 45(3):1069–1103.Link, Google Scholar
- (2022) The prelimit generator comparison approach of Stein’s method. Stochast. Systems 12(2):181–204.Link, Google Scholar
- (2017) Stein’s method for steady-state diffusion approximations of systems. Ann. of Appl. Probab. 27(1):550–581.Google Scholar
- (2001) Stein’s method and birth-death processes. Ann. Probab. 29(3):1373–1403.Google Scholar
- (2021) To pool or not to pool: Queueing design for large-scale service systems. Oper. Res. 69(6):1866–1885.Link, Google Scholar
- (2018) Global non-convex optimization with discretized diffusions. Preprint, submitted October 29, https://arxiv.org/abs/1810.12361v1.Google Scholar
- (2012) Asymptotically tight steady-state queue length bounds implied by drift conditions. Queueing Systems 72(3-4):311–359.Google Scholar
- (2018) Join the shortest queue with many servers: The heavy-traffic asymptotics. Math. Oper. Res. 43(3):867–886.Link, Google Scholar
- (2018) Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Preprint, submitted January 24, https://arxiv.org/abs/1801.07815.Google Scholar
- (1968) An Introduction to Probability Theory and Its Applications, vol. I, 3rd ed. (John Wiley & Sons Inc., New York).Google Scholar
- (2017) Expected values estimated via mean-field approximation are 1/n-accurate. Proc. ACM Measurement Anal. Comput. Syst. 1(1):1–26.Google Scholar
- (2017) A refined mean field approximation. Proc. ACM Measurement Anal. Comput. Syst. 1(2):1–28.Google Scholar
- (2019) Size expansions of mean field approximation: Transient and steady-state analysis. Performance Evaluation 129:60–80.Google Scholar
- (2020) Stein’s method for the single server queue in heavy traffic. Statist. Probab. Lett. 156:108566.Google Scholar
- (1991) On the rate of convergence in the multivariate CLT. Ann. Probab. 19(2):724–739.Google Scholar
- (2019) Load balancing in the nondegenerate slowdown regime. Oper. Res. 67(1):281–294.Link, Google Scholar
- (2014) Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Ann. Appl. Probab. 24(6):2527–2559.Google Scholar
- (2021) Beyond scaling: Calculable error bounds of the power-of-two-choices mean-field model in heavy-traffic. Proc. Twenty-Second Internat. Sympos. Theory, Algorithmic Foundations, Protocol Design Mobile Networks Mobile Comput. (Association for Computing Machinery, New York), 1–10.Google Scholar
- (1981) Heavy-traffic limits for queues with many exponential servers. Oper. Res. 29(3):567–588.Link, Google Scholar
- (2022) A load balancing system in the many-server heavy-traffic asymptotics. Queueing Systems Theory Appl. 101(3–4):353–391.Google Scholar
- (2022) An approximation to the invariant measure of the limiting diffusion of G/Ph/n+GI queues in the Halfin-Whitt regime and related asymptotics. Preprint, submitted September 15, https://arxiv.org/abs/2209.07361.Google Scholar
- (2001) Foundations of Modern Probability, Springer Series in Statistics, Probability and Its Applications, 2nd ed. (Springer, New York).Google Scholar
- (2019) A simple steady-state analysis of load balancing algorithms in the sub-halfin-whitt regime. SIGMETRICS Perform. Eval. Rev. 46(2):15–17.Google Scholar
- (2020) Steady-state analysis of load-balancing algorithms in the Sub-Halfin-Whitt regime. J. Appl. Probab. 57(2):578–596.Google Scholar
- (2022) Steady-state analysis of load balancing with coxian-2 distributed service times. Naval Res. Logist. 69(1):57–75.Google Scholar
- (2021) On a stein method based approximation for a two-dimensional Markov chain. Preprint, submitted June 6, https://arxiv.org/abs/2106.03145.Google Scholar
- (2016) Multivariate Stein factors for a class of strongly log-concave distributions. Electronic Comm. Probab. 21:1–14.Google Scholar
- (2001) The power of two choices in randomized load balancing. IEEE Trans. Parallel Distributed Systems 12(10):1094–1104.Google Scholar
- (2016) Universality of load balancing schemes on the diffusion scale. J. Appl. Probab. 53(4):1111–1124.Google Scholar
- (2011) Fundamentals of Stein’s method. Probab. Surveys 8:210–293.Google Scholar
- (1972)
Probability theory: A bound for the error in the normal approximation to the distribution of a sum of dependent random variables , vol. 2, Proc. Sixth Berkeley Sympos. Math. Statist. Probab. (University of California Press, Berkeley), 583–602.Google Scholar - (2015) Pull-based load distribution in large-scale heterogeneous service systems. Queueing Systems 80(4):341–361.Google Scholar
- (2021) Scalable load balancing in networked systems: A survey of recent advances. Preprint, submitted November 4, https://arxiv.org/abs/1806.05444.Google Scholar
- (1996) Queueing system with selection of the shortest of two queues: An asymptotic approach. Problems Inform. Transmission 32(1):15–27.Google Scholar
- (1978) On the optimal assignment of customers to parallel servers. J. Appl. Probab. 15(2):406–413.Google Scholar
- (1977) Optimality of the shortest line discipline. J. Appl. Probab. 14(1):181–189.Google Scholar
- (2017) Stein’s method for mean field approximations in light and heavy traffic regimes. Proc. ACM Measurement Anal. Comput. Systems 1(1):1–27.Google Scholar
- (2021) Many-server asymptotics for join-the-shortest queue in the super-Halfin-Whitt scaling window. Preprint submitted May 31, https://arxiv.org/abs/2106.00121.Google Scholar
- (2020a) A note on load balancing in many-server heavy-traffic regime. Preprint, submitted April 20, https://arxiv.org/abs/2004.09574v1.Google Scholar
- (2020b) A note on Stein’s method for heavy-traffic analysis. Preprint, submitted Mar 13, https://arxiv.org/abs/2003.06454v1.Google Scholar

