The Join-the-Shortest-Queue System in the Halfin-Whitt Regime: Rates of Convergence to the Diffusion Limit

Published Online:https://doi.org/10.1287/stsy.2022.0102

Abstract

We bound the rate at which the steady-state distribution of the join-the-shortest-queue (JSQ) system converges, in the Halfin-Whitt regime, to its diffusion limit. Our proof uses Stein’s method and, specifically, the recently proposed prelimit generator comparison approach. The JSQ system is nontrivial and high-dimensional and has a state-space collapse component; our analysis may serve as a helpful example to readers wishing to apply the approach to their own setting.

1. Introduction

Consider a queueing system with n identical servers, each with a finite buffer of length b. Customers arrive according to a Poisson process with rate nλ, and service times are independent and identically distributed (i.i.d.), exponentially distributed with rate one. Customers cannot change servers after the initial routing decision, and a customer arriving to a system where all servers are busy and all buffers are full is blocked. This is known as a parallel-server system. A load-balancing policy specifies the manner in which arriving customers are assigned to the servers. In this paper, we consider the classical join-the-shortest-queue (JSQ) policy. Under JSQ, an arriving customer enters service immediately if at least one server is idle; if not, they get routed to the server with the smallest number of customers in its buffer. Ties are broken arbitrarily. We refer to this as the JSQ system.

Parallel-server systems have generated immense interest in recent years, and the JSQ policy is fundamental because it minimizes the expected customer delay and maximizes, with respect to stochastic order, the number of customers served in a given time interval; see, for instance, Winston (1977) and Weber (1978). For a sample of recent work on the JSQ policy, we refer readers to Eryilmaz and Srikant (2012), Mukherjee et al. (2016), Eschenfeldt and Gamarnik (2018), Banerjee and Mukherjee (2019), Gupta and Walton (2019), Liu and Ying (2019), Banerjee and Mukherjee (2020), Braverman (2020), Zhou and Shroff (2020a, b), Cao et al. (2021), Hurtado-Lange and Maguluri (2022), and Zhao et al. (2021). Other popular load-balancing policies include the join-the-idle-queue policy (Stolyar 2015, Mukherjee et al. 2016), the idle-one-first policy (Gupta and Walton 2019), and of course the power-of-d policy (Vvedenskaya et al. 1996, Mitzenmacher 2001); but in this paper, we focus on the JSQ policy. We make no attempt to give a comprehensive review of the literature on parallel-server systems, instead referring the reader to van der Boor et al. (2021) for a recent survey.

Understanding the exact performance of the system is known to be difficult, and much attention has been devoted over the past decade to “heavy-traffic” asymptotics. The term heavy-traffic refers to parameter regimes where the system utilization tends to one. “Conventional heavy traffic” assumes that the number of servers n is fixed and λ1, whereas “many-server heavy traffic” assumes that n and λ1 jointly. For two examples of work in the conventional heavy-traffic setting, see Eryilmaz and Srikant (2012) and Zhou and Shroff (2020b). In this paper, we use the term heavy-traffic to refer to the many-server setting—the setting considered in most of the papers mentioned in the previous paragraph.

There are multiple many-server heavy-traffic regimes, depending on how n and λ jointly converge to their limit. For example, assuming that λ=11/n yields entirely different asymptotic behavior compared to when λ=11/n. To capture all the possible heavy-traffic regimes, it is common practice to assume that the per-server load λ is related to the number of servers n through λ=1β/nα(0,1) for some α0 and β>0. In this paper, we focus on the case when α=1/2; that is, λ=1β/n. This regime is known as the Halfin-Whitt regime and is ubiquitous across the queueing theory literature. It derives from the work of Halfin and Whitt (1981) and is also known as the quality-and-efficiency-driven regime because it achieves reasonable customer wait times while maintaining high utilization of servers. The full list of parameter regimes is found in Figure 1.

Figure 1. The Various Many-Server Heavy-Traffic Regimes
Notes. Higher values of α represent heavier loads. Existing work across the different parameter regimes is reviewed in Section 1.1.

We now state and discuss our main results. Let Qi(t) be the number of servers with i or more customers at time t0, noting that Qi(t)=0 for i>b+1. The process {Q(t)=(Q1(t),,Qb+1(t))} is an irreducible continuous-time Markov chain (CTMC) on a finite state space and therefore possesses a unique stationary distribution. We let Q=(Q1,,Qb+1) be the random vector having the stationary distribution of the CTMC. To describe the asymptotic behavior of Q, we let δ=1/n and define the diffusion-scaled random vector X=(X1,,Xb+1) by X1=δ(nQ1), and Xi=δQi for 2ib+1. The results of Eschenfeldt and Gamarnik (2018) and Braverman (2020) imply that X converges in distribution to some limiting R+b+1-valued random vector Y as n. In this paper, we establish an upper bound of order 1/n on the rate of convergence to Y.

The random variable Y is distributed according to the stationary distribution of the diffusion process {Y(t)R+b+1}, which satisfies

Y1(t)=Y1(0)+2W(t)+βt0t(Y1(s)+Y2(s))ds+U(t),
Y2(t)=Y2(0)+U(t)0tY2(s)ds,Y3(t)==Yb+1(t)=0,(1)
where {W(t)} is standard Brownian motion and {U(t)} is the unique nondecreasing, nonnegative process in the space of càdlàg functions D[0,) satisfying 01(Y1(t)>0)dU(t)=0. The diffusion {Y(t)} was shown to be positive recurrent; see Banerjee and Mukherjee (2019) or Braverman (2020). Furthermore, (1) implies that Y3==Yb+1=0.

Our main result is that there exists a constant C(b,β) such that for all n1, and any function h:R+b+1R whose first-order and second-order partial derivatives are bounded in magnitude by one,

|Eh(X)Eh(Y)|C(b,β)/n.(2)

The assumption that b is finite is used frequently in the proof of (2) and, specifically, in the proof of Proposition 1. We deem the finite buffer assumption to be acceptable because it was shown by Braverman (2020) that even with infinite-sized buffers, EQ3C(β) for all n1 in the Halfin-Whitt regime, implying that X30 or that the mass concentrates on those states with at most one customer waiting. Moreover, Liu and Ying (2020) showed that assuming finite buffers, EQ30 as n in the even busier super-Halfin-Whitt regime (1/2<α<1).

In addition to the novelty of our result, this paper makes a methodological contribution. We prove (2) using Stein’s method, a framework introduced by Stein (1972) that allows one to study the rate of convergence of a sequence of random variables to its limit. Popularized in the area of queueing systems by Gurvich (2014), Ying (2017), Braverman and Dai (2017), and Gast (2017), the generator comparison approach of Stein’s method, attributed to Barbour (1988, 1990) and Götze (1991), is used to study convergence rates of steady-state Markov chain distributions to their diffusion, fluid, or mean-field limits. For a few recent applications of the generator comparison approach in queueing, we refer the reader to Gaunt and Walton (2020), Hurtado-Lange and Maguluri (2022), Lu (2021), and Liu et al. (2022); this list is by no means comprehensive. In this paper, we restrict our attention to the case when the limit is the stationary distribution of a diffusion process, referring the reader to Ying (2017) for a treatment of fluid and mean-field limits.

The generator approach requires bounds on various moments of the prelimit, known as moment bounds, and bounds on the derivatives of the solution to the Poisson equation for the limiting distribution. The latter are called gradient bounds in Braverman and Dai (2017). But in this paper, we stick with the original term “Stein factors” or “Stein factor bounds”; see, for example, Ross (2011). Although moment bounds can be difficult to obtain in some applications, Stein factor bounds are typically the bigger problem. When the limit is one dimensional, Stein factors are bounded using the explicit form of the solution to the Poisson equation—an ordinary differential equation. When the limit is multidimensional, the Poisson equation is a partial differential equation (PDE) that generally does not have an explicit solution, making Stein factor bounds harder to establish. Techniques proposed to obtain multidimensional Stein factor bounds include using a priori Schauder estimates from elliptic PDE theory as in Gurvich (2014), using couplings to analyze and bound the sensitivity of the diffusion to its initial condition as in Barbour (1988) and Mackey and Gorham (2016), and bounding the Stein factors using Malliavin calculus as in Fang et al. (2018) and Jin et al. (2022). A detailed description of these techniques can be found in section 1.1 of Braverman (2022). However, despite progress on multidimensional Stein factor bounds, the JSQ system is not covered by existing results because our limiting diffusion in (1) is constrained to the nonnegative orthant via reflecting boundary conditions.

To deal with the Stein factor bound problem, this paper promotes the use of the prelimit generator comparison approach, which was recently proposed by Braverman (2022) as an alternative to the generator comparison approach. The prelimit approach is the mirror image of the classical generator approach. Whereas the latter requires moment bounds on the prelimit X and Stein factor bounds for limit Y, the former needs moment bounds on Y and Stein factor bounds for the prelimit X. For the moment bounds used in this paper, the result that all moments of Y are finite, proved by Banerjee and Mukherjee (2019), is sufficient because our limit Y does not depend on n. The Stein factor bounds pose a bigger challenge, and we deal with them in Section 3. It was noted in Braverman (2022) that the prelimit and classical generator comparison approaches should be equivalent, in theory, in the sense that any bound on |Eh(X)Eh(Y)| obtained using one of them should be attainable using the other. However, in practice, one approach could be more tractable, or convenient, to work with; see, for instance, the example in section 4 of Braverman (2022). In the case of the JSQ system, we discuss in Remark 1 of Section 3.2.3 how the discrete state space simplifies the analysis of the couplings we use to establish Stein factor bounds because the initial spacing of the coupled systems is preserved until coupling.

The introduction of the prelimit approach in Braverman (2022) was intended to be gentle, with the only example used there being the M/M/1 system. Our application of the approach to the JSQ system exposes all of its moving pieces and can be useful to those who want to apply the prelimit approach to their own setting. For example, some of the technical components of this paper that could be useful in other settings include the regenerative argument used to establish first-order Stein factor bounds in Section 3.1, the approach we use to bound E|X| in Section 3.2.1, and our treatment of reflecting boundary conditions in Section A.2.2 in Appendix A.

It should be noted that Hurtado-Lange and Maguluri (2022) and Zhou and Shroff (2020a) used the classical generator comparison approach to obtain rates of convergence of the steady-state total customer count to an exponential random variable for α>2. The former paper was in the continuous-time setting, whereas the latter considered the discrete-time system; the results in both papers also hold for routing policies other than JSQ, such as the power-of-d policy. Because the limiting random variable in both papers is one dimensional, the Stein factors bounds do not pose a challenge there.

1.1. Literature Review

Let us first review the literature on the analysis of the JSQ system in the various many-server heavy-traffic regimes. Most of the work has been done in the setting with infinite buffer sizes, so, unless otherwise noted, we assume that b=. In Eschenfeldt and Gamarnik (2018), the authors established the process-level convergence of {X(t)}n=1 to its diffusion limit in the Halfin-Whitt regime (α=1/2). That paper triggered a wave of interest in the many-server heavy-traffic asymptotics of the JSQ system. Convergence of the stationary distributions was later established by Braverman (2020), and the behavior of the stationary distribution of the limiting diffusion was studied by Banerjee and Mukherjee (2019, 2020). Our work fits with this group of papers, elevating the steady-state convergence result to one with rates of convergence.

Outside the Halfin-Whitt regime, Mukherjee et al. (2016) studied the transient and steady-state behavior of the JSQ system’s fluid limit when λ=1β<1 is a fixed constant (α = 0); Gupta and Walton (2019) established process-level convergence to the diffusion limit when α = 1, known as the nondegenerate slowdown (NDS) regime and introduced by Atar (2012). In the sub-Halfin-Whitt regime when α(0,1/2), Liu and Ying (2019) assumed finite buffers and obtained bounds on the steady-state total customer count in the system. A similar result was obtained for Coxian-2 service times by Liu et al. (2022) and by Liu and Ying (2020) for the super-Halfin-Whitt regime α(1/2,1). Another recent work in the super-Halfin-Whitt regime was by Zhao et al. (2021), who worked with infinite buffers and established transient and steady-state diffusion limits for the normalized total queue length process. Their analysis exploited the regenerative structure of the JSQ system and contained several hitting-time estimates very close to our own estimates needed for the Stein factor bounds in Section 3. Lastly, both Hurtado-Lange and Maguluri (2022) and Zhou and Shroff (2020a) established rates of convergence to the exponential distribution for the steady-state normalized total customer count. Their results covered the case when α>2.

Other works have used Stein’s method in the setting of parallel-server systems beyond Hurtado-Lange and Maguluri (2022) and Zhou and Shroff (2020a). In Liu and Ying (2019, 2020) and Liu et al. (2022), the authors used Stein’s method for mean-field analysis to obtain bounds on steady-state performance metrics of interest, like EQ2 for instance, for the power-of-d system. Another line of work on power-of-d systems was by Gast (2017), Gast and Van Houdt (2017), and Gast et al. (2019), where the authors showed how to derive refined mean-field models for improved steady-state approximations. More recently, Hairi and Ying (2021) provide calculable error bounds for the mean-field approximation of the power-of-two-choices model.

1.2. Notation

We use to denote the set of integers and let N={0,1,2,}. For any kN and BRd, we let Ck(B) be the set of all k-times continuously differentiable functions f:BR. We let eRd be the vector whose elements all equal one and let e(i) be the element with one in the ith entry and zeros otherwise. For any δ>0 and integer d > 0, we let δd={δk:kd} and define δNd similarly. For any function f:δdR, we define the forward difference operator in the ith direction as

Δif(δk)=f(δ(k+e(i)))f(δk),kd,1id;
for j0, we define
Δij+1f(δk)=Δijf(δ(k+e(i)))Δijf(δk),(3)
with the convention that Δi0f(δk)=f(δk). For a vector aNd, we also let
Δaf(δk)=Δ1a1Δdadf(δk);
if f:RdR, then
axaf(x)=a1x1a1adxdadf(x),
and we adopt the convention that 0f(x)/x0=f(x). For any xRd, we define x1=i=1d|xi| and use |x| to denote the Euclidean norm. For any f:RdR, we let f=supxRd|f(x)|. Throughout the paper, we will often use C to denote a generic positive constant that may change from line to line and that is independent of any parameters not explicitly specified.

2. Main Result

Recall that Qi(t) is the number of servers with i or more customers at time t0 and that {Q(t)=(Qi(t))i=1b+1}t0 is an irreducible CTMC with state space given by

SQ={q{0,,n}b+1:qiqi+1}.(4)

Figure 2 gives an example of a state qSQ.

Figure 2. An Example of a State Q(t) = q in a System Where the Number of Servers n = 5
Notes. Customers below the dashed horizontal line are in service, while those above are waiting in buffers. Each vertical column corresponds to a server and its buffer.

We assume that λ=1β/n for some fixed β>0. Let δ=1/n and define the diffusion-scaled CTMC {X(t)} by

X1(t)=δ(nQ1(t)),Xi(t)=δQi(t),2ib+1,
which takes values on the state space
S={(x1q,x2q,,xb+1q)=(δ(nq1),δq2,,δqb+1):qSQ}.

We will often use xqS and qSQ interchangeably. Recalling that Δif(xq)=f(xq+δe(i))f(xq), for any f:SR, the infinitesimal generator of {X(t)} satisfies

GXf(xq)=1(q1<n)nλΔ1f(xqδe(1))+nλj=1b1(q1==qj=n,qj+1<n)Δj+1f(xq)+(q1q2)Δ1f(xq)j=2b(qjqj+1)Δjf(xqδe(j))qb+1Δb+1f(xqδe(b+1)).(5)

The first line of transitions in (5) corresponds to arrivals. We see that for j2, the jth component of xjq only grows provided the preceding j – 1 horizontal levels, as depicted in Figure 2, are full. The transitions in the second line of (5) correspond to service completions. Using Figure 2 again, we interpret (qjqj+1) as the number of servers (vertical columns) with exactly j customers.

Recall that X=(X1,,Xb+1) and Y=(Y1,Y2,0,,0) are distributed according to the stationary distributions of the scaled CTMC and the diffusion {Y(t)R+b+1} defined in (1), respectively. Going forward, we note that unless explicitly stated, all expectations are with respect to the stationary distribution at hand, that is, either X or Y. To state our main result, we define

Mj={h:Rb+1R,axah(x)1,1a1j},
and dMj(X,Y)=suphMj|Eh(X)Eh(Y)|. We use an asterisk to emphasize that h(x) is defined on the continuum Rb+1. Later we will drop the asterisk to refer to functions defined only on the grid δb+1. It was shown in lemma 2.2 of Mackey and Gorham (2016) that M3 is a convergence-determining class; that is, dM3(U,V)0 implies U and V converge in distribution. The following is our main result.

Theorem 1.

For any 0<b<, there exists a constant C(b,β) such that for all n1,

dM2(X,Y)=suphM2|Eh(X)Eh(Y)|C(b,β)/n.(6)

Note that M2 is also a convergence-determining class because M3M2. We prove Theorem 1 in Section 2.1 using the prelimit generator approach of Stein’s method. Multiple parts of the proof assume that n is large enough, say, n>N(β) for some N(β)>0. We can make this assumption without loss of generality by redefining C(b,β) to be larger than max1nN(β)dM2(X,Y).

2.1. Proving Theorem 1

Central to our proof is the ability to extend any grid-valued function to be defined on all of R+b+1. Although there are infinitely many such extensions, we use a polynomial spline A that extends grid-valued functions f:δNb+1R to functions Af:R+b+1R. We leave the detailed construction to Section A.2 in Appendix A because for this section, it suffices to know that A is a linear operator, that AfC2(R+b+1), and that A applied to a constant equals that constant. Recalling that δ=1/n, the following auxiliary lemma is needed.

Lemma 1.

Define

Mdisc,j(c)={h:δNb+1R,|Δah(δk)|cδa1,1a1j,δkδNb+1}.

There exist some C,C>0 independent of any JSQ model parameters such that

dM2(X,Y)suphMdisc,2(C)|Eh(X)EAh(Y)|+Cδ.(7)

Proof of Lemma 1.

The result follows by repeating the arguments used in the proof of lemma 1 in Braverman (2022). □

Going forward, when we write Mdisc,2(C), the constant C is assumed to be the one in Lemma 1. Furthermore, note that if h(0)0, then the linearity of A and the fact that A applied to a constant equals that constant implies that h˜(x)=h(x)h(0) satisfies Eh˜(X)EAh˜(Y)=Eh(X)EAh(Y). We therefore, without loss of generality, consider only those hMdisc,2(C) such that h(0)=0.

To prove Theorem 1, we bound the right-hand side of (7) with the help of the following two ingredients. The first ingredient is a rate-conservation law for {Y(t)}, proved in Appendix A.

Lemma 2.

Given fC2(R+b+1), define

GYf(x)=(β(x1+x2))x1f(x)x2x2f(x)+2x12f(x),xR+b+1.(8)

If E|f(Y)|< and E|GYf(Y)|<, and if Y(0) is initialized according to Y, then

EGYf(Y)+E(01(x1f(Y(s))+x2f(Y(s)))1(Y1(s)=0)dU(s))=0.(9)

The second ingredient is the Poisson equation. For h:δNb+1R and cR, let

fh(c)(xq)=c+0(Exqh(X(t))Eh(X))dt,xqS,
which is well defined because the CTMC has a finite state space and is therefore exponentially ergodic. Furthermore, lemma 2 of Braverman (2022) (see also lemma 1 of Barbour 1988) implies that
GXfh(c)(xq)=Eh(X)h(xq),xqS.(10)

Most applications of Stein’s method have c = 0, but we choose c=c=fh(0)(0) and define

fh(xq)=fh(c)(xq)=0(Exqh(X(t))Eh(X))dt0(E0h(X(t))Eh(X))dt=0(Exqh(X(t))E0h(X(t)))dt,xqS.(11)

Our choice of c yields fh(0)=0, which comes in handy later when we need to bound |fh(xq)| in Proposition 1. Going forward, we assume that c=c when referring to (10).

Let us give an informal roadmap for bounding (7), with the formal statement of the bounds left to Proposition 2. We bound (6) by comparing the CTMC and diffusion generators. However, the former is defined only on a subset of R+b+1, which requires the following workaround. Suppose that we are given a set BR+b+1 such that (a) Eh(X)Ah(x)=AGXfh(x) for xB and (b) the probability that YB goes to zero rapidly (we will make this precise) as n. We decompose Eh(X)Ah(x) as

Eh(X)Ah(x)=AGXfh(x)1(xB)+(Eh(X)Ah(x))1(xB)
and take expected values with respect to Y (we will show that these are finite) to get
Eh(X)EAh(Y)=E(AGXfh(Y)1(YB))+E((Eh(X)Ah(Y))1(YB)).

Now extend fh(xq) to δNb+1 by defining fh(xq)=0 for xqδNb+1S and consider Afh(x). Provided that E|Afh(Y)|< and E|GYAfh(Y)|<, we can invoke Lemma 2 with f(x)=Afh(x) there to conclude that

Eh(X)EAh(Y)=E((AGXfh(Y)GYAfh(Y))1(YB))+E((Eh(X)Ah(Y)GYAfh(Y))1(YB))E(01(x1Afh(Y(s))+x2Afh(Y(s)))1(Y1(s)=0)dU(s)),(12)
where Y(0) in the third line is initialized according to Y. We bound the first line by showing that GX and GY are close to one another. The middle term is small because of our choice of B, and the last term can be bounded because the JSQ system exhibits reflecting behavior similar to {Y(t)} at the boundary {xS:x1q=0}. As a final remark, our choice of fh(xq)=0 for xqδNb+1S is made for convenience and is not essential to the proof because the probability that YB shrinks rapidly as n.

To state the following proposition, define k:Rb+1b+1 elementwise by kj(x)=xj/δ. For notational convenience, we also define I={i=(i1,i2,0,,0)Nb+1:0i1,i24}. The following proposition is proved in Section A.2 in Appendix A.

Proposition 2.

If hMdisc,2(C), then Ah(Y), Afh(Y), and GYAfh(Y) are integrable and (12) holds. Furthermore, suppose that n > 16, define

B={(x1,x2,0,,0)R+b+1:x2+x1δ(n/28)=(n/28)/n},
and let
ε1(Y)=(AGXfh(Y)GYAfh(Y))1(YB),ε2(Y)=(Eh(X)Ah(Y)GYAfh(Y))1(YB),ε3(Y)=(x1Afh(Y)+x2Afh(Y))1(YB), and ε4(Y)=(x1Afh(Y)+x2Afh(Y))1(YB).

There exist C(β),C(b,β)>0 independent of h(x) and n such that

|ε1(Y)|C(β)(1+δ1Y2)maxiIa1+a2=2|Δ1a1Δ2a2fh(δ(k(Y)+i))|+C(β)δ2maxiI|Δ13fh(δ(k(Y)+i))|+C(β)δ21(Y1δ)maxiIi1=0|(Δ12(Δ1+Δ2))fh(δ(k(Y)+i))|,|ε2(Y)|C(b,β)1(YB)δ2(1+Y1+Y2)maxiI|fh(δ(k(Y)+i))|,|ε3(Y)|C(β)δ11(YB)(|(Δ1+Δ2)fh(δk(Y)|+maxiIa1+a2=2|Δ1a1Δ2a2fh(δ(k(Y)+i))|),|ε4(Y)|C(β)δ11(YB)maxiI|fh(δ(k(Y)+i))|.

Note that ε1(Y) and ε2(Y) are related to the first and second lines of (12), respectively, whereas ε3(Y) and ε4(Y) are related to the last line there. From the bounds in Proposition 2, we see that the bound on (12) depends on the CTMC through the function fh(xq) and its differences and on the diffusion through the distribution of Y. The differences of fh(xq) are commonly known as Stein factors; the following proposition, proved in Section 3, exhibits the Stein factor bounds we need to prove Theorem 1.

Proposition 1.

There exists C(β,b)>0 such that for any n1 and hMdisc,2(C),

|Δ1a1Δ2a2fh(xq)|C(β,b)δa1+a2(1+x2q)a1+a2,
for all a1,a20 with 1a1+a22, and all xqS with x1qδ(na1),x2qδ(na2), and x3q=0. Furthermore,
|fh(xq)|C(β,b)(1+x2q)(x1q+x2q)/δ,xqS,x3q=0,|Δ13fh(xq)|C(β,b)δ3(1+x2q)3,xqS,x1qδ(n3),x3q=0,
and for all xqS with x1q=0,0x2qδ(n1), and x3q=0,
|(Δ1+Δ2)fh(xq)|C(β,b)δ2(1+x2q)2 and |(Δ12(Δ1+Δ2))fh(xq)|C(β,b)δ3(1+x2q)3.

The last component needed for the proof of Theorem 1 is the following lemma.

Lemma 3.

All moments of Y1 and Y2 are finite. Furthermore, suppose that Y(0) is initialized according to Y. Then for any j > 0,

EY2j+1=(01(Y2(s))j1(Y1(s)=0)dU(s)).(13)

Proof of Lemma 3.

The finiteness of the moments follows from theorem 2.1 of Banerjee and Mukherjee (2019), and (13) is implied by (9) of Lemma 2 with f(y)=y2j+1 there. □

Proof of Theorem 1.

Initialize Y(0) according to Y. Using (12) and the definitions of ε1(Y),,ε4(Y), it follows that

Eh(X)EAh(Y)=Eε1(Y)+Eε2(Y)E(01(ε3(Y(s))+ε4(Y(s)))1(Y1(s)=0)dU(s)).

We argue that |Eh(X)EAh(Y)|C(b,β)δ for any hMdisc,2(C), which implies Theorem 1 when combined with Lemma 1. Because δ(k2(Y)+i2)Y2+4δ for iI, applying the Stein factor bounds in Proposition 1 with the bounds on ε1(Y) and ε2(Y) in Proposition 2 yields

|ε1(Y)|C(b,β)1(YB)δ(1+Y2)3,|ε2(Y)|1(YB)C(β)δ3(1+Y1+Y2)3.(14)

We point out that

δ1C(Y1+Y2), for any YB,(15)
which follows from the facts that Y1+Y2δ(n/28)=δ1/2δ for YB, that δ=1/n, and that n > 16. Combining (14), (15), and the fact that the moments of Yi are finite yields
E|ε1(Y)|+E|ε2(Y)|C(b,β)δE(1+Y1+Y2)7C(b,β)δ.

Furthermore, applying the Stein factor bounds in Proposition 1 to the bounds on ε3(Y) and ε4(Y) in Proposition 2, and using (15), we get

|ε3(Y)+ε4(Y)|C(b,β)1(YB)δ(1+Y2)2+C(b,β)1(YB)δ2(1+Y1+Y2)2C(b,β)δ(1+Y1+Y2)5.

Thus, (13) of Lemma 3 implies that

E(01|ε3(Y(s))+ε4(Y(s))|1(Y1(s)=0)dU(s))C(b,β)δ.

3. Stein Factor Bounds

In this section, we prove Proposition 1. We bound the first-order differences in Section 3.1. This requires the most effort. The second-order differences are bounded at the start of Section 3.2, with Section 3.2.1 showing how they can be used to bound E|h(X)|, which may be of independent interest. Section 3.2.2 contains the third-order bounds, and Section 3.2.3 proves two technical lemmas needed for the second-order bounds.

3.1. First-Order Differences

In this section, we bound

Δifh(xq)=0Exq+δe(i)h(X(t))Exqh(X(t))dt
by coupling two copies of the JSQ model initialized one customer apart. The coupling is introduced in the following lemma, which is stated in terms of the unscaled CTMC {Q(t)}.

Lemma 4.

For 1ib+1, define ΘiQ={(q,q˜)SQ×SQ:qi<n,q˜i=qi+1}. There exists a coupling {Q˜(t)} of {Q(t)} whose transient distribution satisfies

{Q˜(t)|(Q(0),Q˜(0))ΘiQ,Q(0)=q}t0=d{Q(t)|Q(0)=(q+e(i))}.(16)

Furthermore, if (Q(0),Q˜(0))i=1b+1ΘiQ, then

  1. The equality Q˜(t)=Q(t)holds for all timestτC, where τC=inf{t0:Q(t)=Q˜(t)}.

  2. The pair (Q(t),Q˜(t)) belongs to i=1b+1ΘiQ for all times t<τC.

  3. Let V be a unit-mean exponentially distributed random variable independent of {Q(t)}. Then

    τC=dmin{inft0{0t1((Q(s),Q˜(s))Θ1Q)ds=V},inft0{Qb+1(t)=n}}.(17)

Proof of Lemma 4.

Let us construct a joint CTMC {(Q(t),Q˜(t))} by specifying its transitions. For simplicity, we refer to {Q(t)} as system 1 and to {Q˜(t)} as system 2. We think of system 2 as a copy of system 1 but with an additional low-priority customer following a preemptive resume rule. That is, service is interrupted, and the extra customer moves to the back of its buffer when a regular customer joins, even if the low-priority customer is currently in service.

Any state in Θ1Q is one where the low-priority customer is in service. The remaining ΘiQ correspond to states where the low-priority customer is assigned to a server with a total of i customers; Figure 3 contains an example of a states in Θ1Q and Θ3Q. Assuming (Q(0),Q˜(0))=(q,q˜)ΘiQ for some 1ib+1, we now describe the possible transitions of the joint chain.

If i = 1, then the low-priority customer is in service. After a unit-mean exponentially distributed amount of time, the customer leaves system 2 and both systems couple. After coupling, systems 1 and 2 are identical in terms of current and future customers, so they coincide on every sample path. All other transitions of the joint chain are based on the standard transitions of the JSQ model. In other words, a service completion by any of the q1 servers working in system 1 results in a customer departure from both systems.

Figure 4 illustrates the effect of arrivals when (q,q˜)Θ1Q. Namely, when q1n2, a new arrival is assigned to the same idle server in both systems. If a customer arrives when q1=n1, then system 1 has only one idle server and system 2 has none. In system 1, that customer will be assigned to the last remaining idle server. Recall that when defining our JSQ model, we allowed for an arbitrary tie-breaking decision in routing arrivals. Therefore, in system 2, we assign that customer to the server working on the low-priority customer, causing a service preemption and pushing the low-priority customer to the back of the buffer. An arrival when q1=n1 transitions the joint chain from Θ1Q to Θ2Q.

If 2ib, then the low-priority customer is in the back of some server’s buffer. A service completion by any of the q1 servers working in system 1 results in a customer departure from both systems. If, however, the service completion happens at the server containing the low-priority customer, then the chain transitions from ΘiQ to Θi1Q because the low-priority customer is now assigned to a server with i – 1 customers; see Figure 5 for a depiction of such a transition. All new arrivals get assigned to the same server in each system. Note that if an arrival happens when qi=n1 and q1==qi1=n, then the system transitions from ΘiQ to Θi+1Q.

The final case is when i=b+1. All transitions are identical to the 2ib case, except for a customer arrival to a system where qb+1=n1 and q1==qb=n. In that case, system 1 assigns the customer to the last available slot; but system 2 blocks the customer because it is already full. This transition causes the two systems to couple. Note that our construction immediately implies the three claims in Lemma 4. □

Figure 3. Two Possible States of the Joint Chain (Q(t),Q˜(t)) Are Depicted
Notes. The red circles correspond to customers in Q(t), while the blue circle is the extra customer in Q˜(t). In the figure on the left, the joint chain is in Θ1Q, meaning the blue customer is in service and will leave the system after an exponentially distributed amount of time, coupling the joint chain. In the figure on the right, the joint chain is in Θ3Q because the blue customer is assigned to a server with a total of three customers.
Figure 4. From Left to Right, the Figures Depict the Arrival of Two Customers
Note. The second arrival results in a transition from Θ1Q to Θ2Q.
Figure 5. From Left to Right, the Server Containing the Blue Customer in Its Buffer Completes Service, Resulting in a Transition from Θ2Q to Θ1Q

Let X˜(t)=(δ(nQ˜1(t)),δQ˜2(t),,δQ˜b+1(t)) be the scaled version of Q˜(t). For any xqS with x1q>0, and any hMdisc,2(C),

|0Exδe(1)h(X(t))Exh(X(t))dt|=|0E(x,xδe(1))(h(X˜(t))h(X(t)))dt||0E(x,xδe(1))(δ1(tτC))dt|=δE(x,xδe(1))τC,(18)
where E(x,xδe(1))(·) denotes the expectation given (X(0),X˜(0))=(x,xδe(1)). The inequality is true because the gap between {X(t)} and {X˜(t)} never increases beyond one customer. The same argument implies that |Δifh(xq)|δE(x,x+δe(i))τC for i2, and we see that bounding the first-order Stein factors amounts to bounding the expected coupling time τC. The following lemma provides the necessary bound. It is worth highlighting that proving this result requires a large amount of effort and JSQ-model-specific insight.

Lemma 5.

For any (q,q˜)i=1b+1ΘiQ,

E(q,q˜)τCC(b,β)(1+δq2).

Before proving the lemma, we note that the first-order bounds in Proposition 1 are a consequence of (18) and Lemma 5; that is,

|Δifh(xq)|C(b,β)δ(1+x2q),i=1,,b+1.(19)

Furthermore, note that for any xqS with x3q=0,

fh(xq)=fh(0)+j1=0x1q/δ1Δ1fh(δj1,0,,0)+j2=0x2q/δ1Δ2fh(x1q,δj2,0,,0).

Recall that fh(0)=0 and that the definition of S implies that δ(j1,j2,0,,0)S for any 0j1x1q/δ and 0j2x2q/δ. Combining these facts with (19) yields

|fh(xq)|C(b,β)(1+x2q)(x1q+x2q)/δ,xqS,x3q=0,(20)
which proves one of the claims from Proposition 1.

We now describe the main idea and introduce several auxiliary lemmas used to prove Lemma 5. Our discussion communicates the main intuition behind the proof, leaving the technical details to Appendix B. Let γ>0 be a constant independent of n whose precise value will be specified later and define

θ1=nnβ/2, and θ2=γn.

Additionally, we define the stopping times

τi(qi)=inf{t0:Qi(t)=qi},qi{0,1,,n},i=1,2.

We now describe a sequence of cycles, or attempts, such that in each cycle, the probability of the joint chain coupling is bounded from below by a constant independent of n. Given an initial state (Q(0),Q˜(0))=(q,q˜) belonging to some ΘiQ, we wait until τ2(θ2), which marks the start of the first cycle. From that point, we wait until min(τ1(θ1),τ2(2θ2)). If τ1(θ1)τ2(2θ2), then we give up trying to couple this cycle and wait until τ2(θ2) to start a fresh cycle. If τ1(θ1)<τ2(2θ2), then there are nβ/2 idle servers and at most 2θ2 nonempty buffers. From such a state, we are guaranteed that coupling happens if the joint CTMC enters Θ1Q and spends an exponentially distributed amount of time there before all servers in {Q(t)} become busy; that is, τC<τ1(n). If τCτ1(n), we give up trying to couple this cycle and wait until τ2(θ2) for the next cycle to restart the coupling attempt. Note that this cycle sequence resembles a renewal sequence, but the new cycle times are not renewal times because the values of Q3(·),,Qb+1(·) can vary at the start of each new cycle.

From our discussion, it follows that coupling is guaranteed in any given cycle if, starting from a state with q2=θ2, the events {τ1(θ1)<τ2(2θ2)} and {τC<τ1(n)} occur. In Appendix B, we derive a lower bound, uniform in n, on the probability of coupling in a given cycle, implying that coupling is guaranteed to happen after a geometrically distributed number of cycles. We also derive an upper bound, uniform in n, on the expected time until the start of the first cycle, as well as the expected cycle duration, and then combine these bounds and prove Lemma 5.

3.2. Higher-Order Bounds

To prove the higher-order bounds, we first use the Poisson equation to write Δ12fh(xq) in terms of h(xq),Eh(X) and first-order differences of fh(xq). With the help of this expression, we use the dynamics of the JSQ model to relate all the second-order differences to each other and prove that

|Δ1a1Δ2a2fh(x1q,x2q,0,,0)|δ2i=1b+1EXi+C(b,β)δ2(1+x2q)2(21)
for a1=2,x1qδ(na1) and x2qδ(na2), followed by a similar bound for |(Δ1+Δ2)fh(0,x2q,0,,0)|. We then bound i=1b+1EXi using the Poisson equation in Section 3.2.1 and bound |Δ13fh(xq)| and |(Δ12(Δ1+Δ2))fh(xq)| in Section 3.2.2. In Section 3.2.3, we prove two technical lemmas needed to establish (21). We also briefly discuss (see Remark 1) the advantage of using the prelimit generator approach and working with finite differences of fh(xq) as opposed to using the classical generator approach and working with the derivatives of the solution to the Poisson equation for the diffusion.

For the following discussion, we assume that xqS with x3q=0. Recall from (5) that

GXfh(xq)=1(q1<n)nλΔ12fh(xqδe(1))+1(q1=n,q2<n)nλ(Δ2+Δ1)fh(xq)+1δ(β(x1q+x2q))Δ1fh(xq)1δx2qΔ2fh(xqδe(2)).(22)

We rearrange the Poisson equation GXfh(xq)=Eh(X)h(xq) to see that when 0<q1<n, or alternatively 0<x1q<δn,

Δ12fh(xqδe(1))=1nλ(Eh(X)h(xq))1nλ1δ(β(x1q+x2q))Δ1fh(xq)+1nλ1δx2qΔ2fh(xqδe(2)).(23)

Note that E|h(X)|CE(X1++Xb+1) because h(0)=0 and hMdisc,2(C). Together with the bound on Δifh(xq) from (19), this implies that

|Δ12fh(xq)|δ2Ci=1b+1EXi+δ2x1q+C(b,β)δ2(1+x2q)2,x1q<δ(n1),x3q=0.(24)

Similarly, if x1q=0,

(Δ2+Δ1)fh(xq)=1nλ(Eh(X)h(xq))1nλ1δ(βx2q)Δ1fh(xq)+1nλ1δx2qΔ2fh(xqδe(2)),(25)
and therefore
|(Δ2+Δ1)fh(0,x2,0,,0)|δ2Ci=1b+1EXi+C(b,β)δ2(1+x2q)2,x2q<δn.(26)

Not all second-order differences can be bounded like this. For example, the equation for Δ22fh(xq) would involve the third-order difference Δ2Δ12fh(xq), which we have not bounded. Instead, the following lemma relates the remaining second-order differences to Δ12fh(xq) and (Δ2+Δ1)fh(0,x2q,0,,0) using the structure of the JSQ system. The proof is postponed to Section 3.2.3.

Lemma 6.

Fix hMdisc,2(C). Then for any xqS with x3q=0,

|Δ12fh(xq)|Cδ2+max0y2qx2q|Δ12fh(0,y2q,0,,0)|, provided xq+2δe(1)S,|Δ2Δ1fh(xq)|Cδ2+max0y2qx2qj=1,2|Δj2fh(0,y2q,0,,0)|, provided xq+δe(1)+δe(2)S,|Δ22fh(xq)|Cδ2+max0y2qx2qj=1,2|Δj2fh(0,y2q,0,,0)|, provided xq+2δe(2)S.

We see from Lemma 6 that to bound the second-order differences, we only need bounds on |Δ12fh(0,x2q,0,,0)| and |Δ22fh(0,x2q,0,,0)|. The former is bounded in (24); for the latter term, we note that for any xqS with x1q=x3q=0,

|Δ22fh(xq)|=|Δ2fh(xq+δe(2))Δ2fh(xq)|=|(Δ2+Δ1)fh(xq+δe(2))Δ1fh(xq+δe(2))Δ2fh(xq)|=|(Δ2+Δ1)fh(xq+δe(2))+(fh(xq)fh(xq+δe(1)+δe(2)))|δ2Ci=1b+1EXi+C(b,β)δ2(1+x2q)2+|fh(0,x2q,0,,0)fh(δ,x2q+δ,0,,0)|,(27)
where the inequality follows from (26). The following lemma bounds the last term on the right-hand side, implying that |Δ22fh(0,x2q,0,,0)|δ2Ci=1b+1EXi+C(b,β)δ2(1+x2q)2 and, consequently, (21). It is proved in Section 3.2.3.

Lemma 7.

For all n1,

|fh(0,x2q,0,,0)fh(δ,x2q+δ,0,,0)|C(b,β)δ2(1+x2q),0x2q<δn.(28)

3.2.1. Bounding i=1b+1EXi.

The bounds in (21) and (26) do not yet look like the stated bounds in Proposition 1 because the term i=1b+1EXi is present. However, we can bound this expectation using the Poisson equation as follows. Recall that λ=1β/n; let x()=(δ(nnλ),0,,0)=(β+δ(nλnλ),0,,0), and observe that this point is in S. In fact, it is the closest point in S, when rounded up, to the fluid equilibrium of the JSQ system, which happens to be (β,0,,0); see Braverman (2020). From (22) we have

GXfh(x())=nλΔ12fh(x()δe(1))+(nλnλ)Δ1fh(x())=Eh(X)h(x()).

Choosing h(xq)=i=1b+1xiq and noting that h(x())=β+δ(nλnλ) yields

nλΔ12fh(x()δe(1))+(nλnλ)Δ1fh(x())βδ(nλnλ)=i=1b+1EXi.(29)

To bound i=1b+1EXi we need only bound Δ12fh(x()δe(1)) because |Δ1fh(x())|δC(b,β) because of (19). Note that we cannot use (24) for the second-order difference bound because i=1b+1EXi is present on the right-hand side there. Instead, we exploit the structure of the JSQ model to bound Δ12f(x()δe(1)) as follows.

Define τ(x1q)=inft0{X(t)=(x1qδ,0,,0)|X(0)=(x1q,0,,0)}; let (X(t),X˜(t)) be the scaled version of the coupling defined in Lemma 4, and let V be the unit-rate exponentially distributed random variable defined in the same lemma. Fix xq=(x1q,0,,0) with x1q2δ, and suppose X(0)=xq and X˜(0)=xqδe(1). Consider the evolution of (X(t),X˜(t)) for t[0,Vτ(x1q)]. If V<τ(x1q), the two processes couple and become identical. Otherwise, the joint process is in state (xqδe(1),xq2δe(1)). Using the strong Markov property, we conclude that

Δ1fh(xqδe(1))=0Exq[(h(X(t))h(X(t)δe(1)))1(t(Vτ(x1q)))]dt+(Vτ(x1q))Δ1fh(xq2δe(1)).

Choosing x1q=x1(), we see that

Δ1fh(x()δe(1))Δ1fh(x()2δe(1))=0Ex()[(h(X(t))h(X(t)δe(1)))1(t(Vτ(x1())))]dt(V<τ(x1()))Δ1fh(x()2δe(1)).

Choosing h(xq)=i=1b+1xiq and using |Δ1f(x())|δC(b,β), we arrive at

|Δ12fh(x()δe(1))|δEτ(x1())+C(b,β)δ(V<τ(x1())).(30)

The quantities involving τ(x1()) are bounded in the following lemma.

Lemma 8.

There exists a constant C(β)>0 such that for all n1,

Eτ(x1q)C(β)δ, and (Vτ(x1q))C(β)δ, for x1q{x1(),δ,2δ}.(31)

Lemma 8 is proved in Section B.5 in Appendix B. It implies that |Δ12fh(x()δe(1))|C(b,β)δ2, and therefore

i=1b+1EXiC(b,β).(32)

Combining (32) with (21) proves the second-order bounds in Proposition 1.

Before moving on, let us make a few remarks. The bound in (32) implies that the sequence of steady-state distributions {X}n=1 is tight; when combined with process-level convergence of {X(t)} to the diffusion {Y(t)}, tightness can be used to imply convergence of the steady-state distributions via a limit-interchange argument. For an example of this applied to the JSQ model, see Braverman (2020). Alternatively, (32) can be recast into a result about the convergence rate to the mean-field equilibrium.

Let h(x)=|x1++xb+1β|β, noting that hMdisc,1(1) and that h(0)=0; suppose for the sake of exposition that nλ=nλ. One may check that the bound in (30) holds even when hMdisc,1(1), in which case (29) implies that

E|i=1b+1Xiβ|=β+nλΔ12fh(x()δe(1))C(b,β).

If we divide both sides by n to consider the mean-field scaled version of i=1b+1Xi, we get

E|(nQ1)/n+i=2b+1Qi/nβ|C(b,β)/n.

Thus, we recover the 1/n rate of convergence to the mean field equilibrium that one typically obtains using Stein’s method for the mean-field model, like in Ying (2017). The approach used to show tightness in this section can offer an alternative to the one proposed by Ying (2017), but the difficulty of implementing our approach is directly related to the difficulty of obtaining the relevant Stein factor bounds.

As a final remark, in this section we have shown that establishing tightness, or rates of convergence to the mean-field equilibrium, is equivalent to bounding the first- and second-order differences of fh(xq) at a single point near the fluid equilibrium of the CTMC. In contrast, establishing rates of convergence to the diffusion requires bounds on the second- and third-order differences at all points in the support of Y.

3.2.2 Third-Order Bounds.

To bound Δ13fh(xq), we recall (23), which says that for 0<x1q<δn with x3=0,

Δ12fh(xqδe(1))=1nλ(Eh(X)h(xq))1nλ1δ(β(x1q+x2q))Δ1fh(xq)+1nλ1δx2qΔ2fh(xqδe(2)).

Applying Δ1 to both sides yields

Δ13fh(xqδe(1))=1nλΔ1h(xq)1nλ1δ(β(x1q+x2q))Δ12fh(xq)+1nλΔ1fh(xq+δe(1))+1nλ1δx2qΔ1Δ2fh(xqδe(2)),0<x1q<δ(n1).

The bounds on the first- and second-order differences of fh(xq), together with the fact that hMdisc,2(C), imply that

|Δ13fh(xq)|C(b,β)δ3(1+x2q)3,xqS,x1qδ(n3),
which matches the inequality in Proposition 1. The bound on |(Δ12(Δ1+Δ2))fh(xq)| when x1q=0 is proved identically by subtracting (Δ1+Δ2)fh(xq) in (25) from Δ12fh(xq) in (23). This concludes the proof of Proposition 1. □

3.2.3. Proving Lemmas 6 and 7.

To conclude the section, we prove the auxiliary lemmas from Section 3.2.

Proof of Lemma 6.

Our first task is to bound

Δ12fh(xq)=0(Exq+2δe(1)h(X(t))2Exq+δe(1)h(X(t))+Exqh(X(t)))dt.

Note that xqS with xq+2δe(1)S implies that q2q12. Working with the unscaled CTMC, we now construct four processes {Q˜(1)(t)},,{Q˜(4)(t)} defined on the time interval [0,τ1(n)], where

τ1(n)=inft0{Q˜1(1)(t)=n}=inft0{X˜1(1)(t)=0}.(33)

We refer to {Q˜(i)(t)} as the ith process. Process four is a copy of {Q(t)}. Numbers two and three are copies of four but with one extra customer who is assigned to a server with an empty buffer. The extra customer in two is different from the one in three. Lastly, process one is a copy of four but with two extra customers. The extra customers are the same as those in two and three. Figure 6 visualizes the initial condition of the processes.

Let {X˜(1)(t)},,{X˜(4)(t)} be the scaled counterparts of these processes. Note that

Δ12fh(xq)=0(Exq+2δe(1)h(X(t))2Exq+δe(1)h(X(t))+Exqh(X(t)))dt=0EX˜(1)(0)=xq((h(X˜(4)(t))h(X˜(3)(t)))(h(X˜(2)(t))h(X˜(1)(t))))dt.

We refer to the different customers according to their shapes in Figure 6. Define τs and τd to be the service times of the server with the star and diamond customer, respectively. Both are exponentially distributed with unit mean. Setting τm=min{τs,τd,τ1(n)}, we observe that if τm=τs, then

X˜(1)(t)=X˜(3)(t),X˜(2)(t)=X˜(4)(t),tτm;
if τm=τd, then
X˜(1)(t)=X˜(2)(t), and X˜(3)(t)=X˜(4)(t),tτm.

Therefore,

0EX˜(1)(0)=x((h(X˜(4)(t))h(X˜(3)(t)))(h(X˜(2)(t))h(X˜(1)(t))))dt=EX˜(1)(0)=x0τm((h(X˜(4)(t))h(X˜(3)(t)))(h(X˜(2)(t))h(X˜(1)(t))))dt+X˜(1)(0)=x(τm=τ1(n))EX˜(1)(0)=x[Δ12fh(0,X˜2(1)(τ1(n)),0,,0)|τm=τ1(n)].(34)

Because X˜(4)(t)=X˜(3)(t)+δe(1)=X˜(2)(t)+δe(1)=X˜(1)(t)+2δe(1) for 0tτm,

|(h(X˜(4)(t))h(X˜(3)(t)))(h(X˜(2)(t))h(X˜(1)(t)))|=|Δ12h(X˜(1)(t))|Cδ2,
where the last inequality follows from hMdisc,2(C). Combining this with the facts that X˜2(1)(τ1(n))X˜2(1)(0) and ExτmEτs=1, we conclude that the right-hand side of (34) is bounded by Cδ2+|Δ12fh(0,x2q,0,,0)|, which proves the bound on |Δ12fh(xq)|.

The remaining bounds are proved similarly, starting with |Δ2Δ1fh(xq)|. Fix xqS with x3q=0, and consider

Δ2Δ1fh(xq)=(fh(xq+δe(1)+δe(2))fh(xq+δe(1)))(fh(xq+δe(2))fh(xq)).

We again construct a coupling {X˜(1)(t)},,{X˜(4)(t)} corresponding to the four initial states on the right-hand side above. The initial conditions of the unscaled processes are visualized in Figure 7. Our construction yields

Δ2Δ1fh(xq)=0EX˜(1)(0)=xq((h(X˜(4)(t))h(X˜(3)(t)))(h(X˜(2)(t))h(X˜(1)(t))))dt.(35)

Let ν1=inft0{Q˜1(3)(t)=n}. We again let τs and τd be the remaining service time of the server with the star and diamond customer, respectively, and set τm=min{τs,τd,ν1}. Just like we argued before, if τm=τs, then the integrand in (35) is zero after τm. If, however, τm=τd, then

X˜2(1)(τm)=X˜2(2)(τm)=X˜2(3)(τm)=X˜2(4)(τm),X˜1(2)(τm)+2δ=X˜1(1)(τm)+δ=X˜1(4)(τm)+δ=X˜1(3)(τm);
if τm=ν1, then X˜1(i)(τm)=0 for 1i4 and
X˜2(2)(τm)=X˜2(1)(τm)+δ=X˜2(4)(τm)+δ=X˜2(3)(τm)+2δ.

Therefore,

Δ2Δ1fh(xq)=EX˜(1)(0)=xq0τm((h(X˜(4)(t))h(X˜(3)(t)))(h(X˜(2)(t))h(X˜(1)(t))))dt+X˜(1)(0)=xq(τm=τd)Ex[Δ12fh(X˜(2)(τd))|τm=τd]+X˜(1)(0)=xq(τm=ν1)Ex[Δ22fh(0,X˜2(3)(ν1),0,,0)|τm=ν1]Cδ2+|Δ12fh(0,x2q,0,,0)|+|Δ22fh(0,x2q,0,,0)|.(36)

Figure 8 illustrates the coupling needed to bound |Δ22fh(xq)|. The idea of the proof is again to wait until τ1(n) and analyze what could happen if one of the servers containing the star or diamond customer completes service before τ1(n). We leave the details to the reader. □

Figure 6. The Initial State of the Four Systems
Notes. The red customers represent those common to all four systems. The diamond and star are the extra customers.
Figure 7. The Initial State of the Four Systems
Note. The red customers represent those common to all four systems.
Figure 8. The Coupling Needed to Bound |Δ22fh(xq)|
Remark 1.

Let us say a few words on the advantage of using the prelimit generator comparison approach over the classical generator comparison approach. Lemma 6 is proved using a synchronous coupling of four JSQ systems. The four systems are initialized one or two customers apart from one another; because of the discrete state space of the CTMC, all four systems stay one or two customers apart until they couple. Had we used the classical generator comparison approach, we would have needed to carry out a similar analysis by coupling four copies of the diffusion {Y(t)}. However, unlike the JSQ coupling, the four diffusions would not maintain their initial spacing relative to each other because {Y(t)} takes values in a continuous state space. This would further complicate the analysis as we would now need to keep track of the positions of the four diffusions relative to each other.

Proof of Lemma 7.

We want to bound

|fh(0,x2q,0,,0)fh(δ,x2q+δ,0,,0)|=|0(E(0,x2q,0,,0)h(X(t))E(δ,x2q+δ,0,,0)h(X(t)))dt|.

As we are accustomed to doing by now, let us construct a coupling {Q˜(1)(t),Q˜(2)(t)} with

Q˜(1)(0)=(n,q2,0,,0), and Q˜(2)(0)=(n1,q2+1,0,,0).

System two has one less idle server and one more customer waiting in a buffer compared to system one, but the total initial customer count is identical across both systems. The initial condition of both systems is visualized in Figure 9. We assume that the diamond and star customers are independent of each other, that the systems see identical arrivals, and that the rest of the customers are identical across both systems.

Now define τd and τs to be the remaining service times of the server that has the diamond and star customer, respectively; let ν1=inft0{Q˜1(2)(t)=n}; set τm=min{τs,τd,ν1}. If τm=ν1 or τm=τd, then Q˜(1)(t)=dQ˜(2)(t) for tτm. Letting {X˜(i)(t)} be the scaled version of {Q˜(i)(t)}, it follows that

fh(0,x2q,0,,0)fh(δ,x2q+δ,0,,0)=EX˜(1)(0)=(0,x2q,0,,0)0τm(h(X˜(1)(t))h(X˜(2)(t)))dt+X˜(1)(0)=(0,x2q,0,,0)(τm=τs)Exq[Δ2fh(X˜(1)(τs))|τm=τs].

To bound the first term on the right-hand side, note that

|EX˜(1)(0)=(0,x2q,0,,0)0τm(h(X˜(1)(t))h(X˜(2)(t)))dt|CδEX˜(2)(0)=(δ,x2q+δ,0,,0)ν1C(β)δ2.

The first inequality is true because hMdisc,2(C), and the last inequality follows from Lemma 8 with x1q=δ there. Furthermore,

X˜(1)(0)=(0,x2q,0,,0)(τm=τs)|Ex[Δ2fh(X˜(1)(τs))|τm=τs]|X˜(1)(0)=(0,x2q,0,,0)(τs<ν1)C(b,β)δ(1+x2q)C(b,β)δ2(1+x2q).

The first inequality follows from the bound on the first-order difference in (19) together with the fact that X˜2(1)(t)x2q for all t[0,τm]. The second inequality follows by noting that τs is independent of ν1 and using Lemma 8 with x1q=δ,τ(x1q)=ν1, and V=τs there. □

Figure 9. The Initial State of the Two Systems in an Example Where n = 6
Note. The red circles represent customers common to both systems.

4. Conclusion

As stated in the introduction, the Stein factor bounds require the bulk of our efforts. Proving the first-order bounds in Section 3.1 amounts to considering two coupled JSQ systems, initialized with a difference of one customer, and bounding the expected coupling time of this joint chain. We bound the coupling time by considering a sequence of coupling attempts where the probability of coupling in a single attempt is bounded away from zero uniformly in n; the expected interattempt times are also bounded from above, uniformly in n. The coupling time can then be bounded by a sum of a geometrically distributed number of random variables representing the interattempt durations. This renewal-like argument applies more generally to settings where (a) there is a region of the state space where the joint chain is guaranteed to couple provided it spends enough time there and (b) one can control the expected time to reach this region and the probability of coupling in the region before leaving it.

With the first-order Stein factor bounds in hand, the higher-order bounds require less effort. Our proofs of the high-order bounds make heavy use of the transition structure of the JSQ system and, in particular, that Q2(t),,Qb+1(t) increase only at those times when Q1(t)=n. Readers should not be misled into thinking that high-order Stein factor bounds require less effort than first-order bounds for all models. Indeed, in the classical generator comparison approach, high-order bounds require much more effort; see, for example, Mackey and Gorham (2016), Erdogdu et al. (2018), Jin et al. (2022).

Regarding extending our results, we note that Proposition 2, which compares GX to GY, can be easily adjusted to hold for other parameter regimes and load-balancing policies. The main difficulty would be establishing Stein factor bounds. As mentioned in the introduction, Zhao et al. (2021) considered the super-Halfin-Whitt regime (1/2<α<1) and established several hitting-time estimates similar to the ones we use in the proof of Lemma 5 to bound the first-order Stein factors. It may be possible to build on their results and obtain rates of convergence for the super-Halfin-Whitt regime too.

Furthermore, it seems that the sub-Halfin-Whitt regime (0<α<1/2) should present less of a challenge than our own setting. Recall from the discussion in Section 3.1 that coupling of the joint CTMC is guaranteed provided it enters Θ1Q and spends an exponentially distributed amount of time there before all servers become busy. Compared to the Halfin-Whitt regime, the rate at which customers arrive in the sub-Halfin-Whitt regime is much smaller, so the event that all servers are busy should happen less frequently. Indeed, Liu and Ying (2020) showed that the steady-state probability that all servers are busy tends to zero in the sub-Halfin-Whitt regime. Consequently, the Stein factor bounds should be simpler to establish.

Appendix A. Supporting Proofs for Section 2

We first prove Lemma 2 and then introduce the operator A in Section A.1 in Appendix A. Once A is introduced, we prove Proposition 2 in Section A.2 in Appendix A.

Proof of Lemma 2.

Initialize Y(0) according to Y. Because {Y(t)} satisfies (1), for any fC2(R+b+1) with E|f(Y)|<, Itô’s lemma implies that

0=Ef(Y(1))Ef(Y(0))=E01GYf(Y(s))ds+E(01(x1f(Y(s))+x2f(Y(s)))1(Y1(s)=0)dU(s)).(A.1)

If E|GYf(Y)|<, then E01GYf(Y(s))ds=EGYf(Y) follows from the Fubini-Tonelli theorem. □

A.1. The Interpolator A

The operator A discussed in this section is identical to the one introduced in appendix A of Braverman (2022), but we repeat its key properties here as they are needed for the proof of Proposition 2. Consider a one-dimensional function f:δR. We can extend it to R by defining

Af(x)=i=04αk(x)+ik(x)(x)f(δ(k(x)+i)),
where k(x)=x/δ and αk+ik:RR are weights defined for all k and i=0,,4. The function Af(x) is a weighted sum of the five points f(δk(x)),,f(δ(k(x)+4)). We mention the reason for using five points after stating Theorem A.1. Note that if f(x) is defined only on a subset of δ, then Af(x) can still be defined provided that f(δk(x)),,f(δ(k(x)+4)) are defined. Braverman (2022) described how to choose these weights to make Af(x) coincide with f(·) on grid points and also to make it a differentiable function whose derivatives behave like the corresponding finite differences of f(·). The idea can be applied to multidimensional grid-valued functions as well.

The following result is theorem A.1 of Braverman (2022). We use this as an interface that contains the important properties of A without delving into the low-level details behind its construction.

Theorem A.1.

Given a convex set KRd, define

K4={xKδd:δ(k(x)+i)Kδd for all 0i4e};
let Conv(K4) be the convex hull of K4; and, for xRd, define k(x) by kj(x)=xj/δ. There exist weights {αk+ik:RR,k,i=0,1,2,3,4} such that for any f:KδdR, the function
Af(x)=id=04αkd(x)+idkd(x)(xd)i1=04αk1(x)+i1k1(x)(x1)f(δ(k(x)+i))=i1,,id=04(j=1dαkj(x)+ijkj(x)(xj))f(δ(k(x)+i)),xConv(K4)(A.2)
satisfies Af(x)C3(Conv(K4)), where i=(i1,,id) in (A.2). Additionally, Af(x) is infinitely differentiable almost everywhere on Conv(K4),
Af(δk)=f(δk),δkK4,(A.3)
and there exists a constant C(d) > 0 independent of f(·), x, and δ such that
|axaAf(x)|C(d)δa1max0ij4ajj=1,,d|Δ1a1Δdadf(δ(k(x)+i))|,xConv(K4),(A.4)
for 0a13; (A.4) also holds when a1=4 for almost all xConv(K4). Additionally, the weights {αk+ik:RR,k,i=0,1,2,3,4} are degree-7 polynomials in (xδk)/δ whose coefficients do not depend on k or δ. They satisfy
αkk(δk)=1, and αk+ik(δk)=0,k,i=1,2,3,4,(A.5)
i=04αk+ik(x)=1,k,xR,(A.6)
and also the following translational invariance property:
αk+j+ik+j(x+δj)=αk+ik(x),i,j,k,xR.(A.7)

Remark A.1.

The bound in (A.4) holds almost everywhere when a1=4. This bound is the reason we need to use f(δk(x)) and the four points to the right of it (in each dimension). By using more (fewer) points, one can alter the theorem so that (A.4) holds for larger (smaller) values of a1. It is worth noting that to prove the results in this paper, we do not go beyond a1=3.

Going forward, we let A be the operator described in Theorem A.1. Because Af coincides with f on the grid, we refer to A as an interpolator. For the interested reader, A is a degree-7 polynomial spline. From (A.3) we see that A is a linear operator, and (A.6) implies that A applied to a constant simply equals that constant. Before we can prove Proposition 2, we require one more lemma.

Lemma A.1.

In the setting of Theorem A.1, for any kK4 and 1jd,

xjAf(x)|x=δk=δ1(Δj12Δj2+13Δj3)f(δk).(A.8)

Furthermore, there exists some ϵ:Conv(K4)R satisfying

|ϵ(x)|C(d)δ1max0i4ea1=2|Δ1a1Δdadf(δ(k(x)+i))|
such that for any xConv(K4),
xjAf(x)=δ1Δjf(δk(x))+ϵ(x).

Proof of Lemma A.1.

The proof is identical for all indices, so we assume that j = 1. Fix δkK4 and let g(x1)=Af(x1,δk2,,δkd) be a function in x1 only. The form of Af(x) in (A.2), together with (A.5), implies that

x1Af(x)|x=δk=g(δk1).

It follows that g(δk1)=Pk1(δk1), where Pk1(x) is a polynomial defined in (A.1) of Braverman (2022). Furthermore, (A.1) implies that

Pk1(δk1)=δ1(Δ112Δ12+13Δ13)=g(δk1)=δ1(Δ112Δ12+13Δ13)f(δk),
from which (A.8) follows. To prove the second claim of the lemma, we write
xjAf(x)=δ1(Δj12Δj2+13Δj3)f(δk(x))+xjAf(x)xjAf(x)|x=δk(x).

Now |Δj2f(δk(x))|max{|Δ1a1Δdadf(δ(k(x)+i))|:0i4e,a1=2},

|Δj3f(δk(x))|=|Δj2f(δ(k(x)+e(j)))Δj2f(δk(x))|max0i4ea1=2|Δ1a1Δdadf(δ(k(x)+i))|,
and
|xjAf(x)xjAf(x)|x=δk(x)|j=1d|xjδkj(x)||2xjxjAf(ξ)|C(d)δ1max0i4ea1=2|Δ1a1Δdadf(δ(k(x)+i))|,
where ξ is some point between δk(x) and x. The last inequality follows from (A.4) and the fact that |xjδkj(x)|δ. □

Note that some of the bounds in Theorem A.1 and Lemma A.1 have a constant C(d) depending on the dimension d of the function (e.g., (A.4)). In the JSQ model d=b+1, but when proving Proposition 2 in the next section, we can assume that d = 2 because of the following. Given a function f:δNb+1R, we can use (A.2) and (A.5) of Theorem A.1, and the fact that Yi = 0 for i > 2, to see that

Af(Y)=ib+1=04αib+10(0)i3=04αi30(0)i2=04αk2(Y)+i2k2(Y)(Y2)i1=04αk1(Y)+i1k1(Y)(Y1)f(δ(k(Y)+i))=i2=04αk2(Y)+i2k2(Y)(Y2)i1=04αk1(Y)+i1k1(Y)(Y1)f(δ(k1(Y)+i1),δ(k2(Y)+i2),0,,0).

Because kj(Y) depends only on Yj, we see that Af(Y) is actually a bivariate function. In Section A.2, we treat any function of the form Af(Y) as a function of two variables.

A.2. Proving Proposition 2

Fix hMdisc,2(C). We recall from (5) that for xqS,

GXf(xq)=1(q1<n)nλΔ1f(xqδe(1))+nλj=1b1(q1==qj=n,qj+1<n)Δj+1f(xq)+(q1q2)Δ1f(xq)j=2b(qjqj+1)Δjf(xqδe(j))qb+1Δb+1f(xqδe(b+1))
and fh(xq) is the unique solution to the Poisson equation
GXfh(xq)=Eh(X)h(xq),xqS(A.9)
with fh(0)=0. Also recall that we extended fh(xq) by setting fh(xq)=0 for xqδNb+1S and defined
B={(x1,x2,0,,0)R+b+1:x2+x1δ(n/28)=(n/28)/n} and I={i=(i1,i2,0,,0)Nb+1:0i1,i24}.

We first argue that E|Ah(Y)|<,E|Afh(Y)|<, and E|GYAfh(Y)|<, which together imply that (12) holds. The latter two statements follow immediately from the fact that fh(xq), and therefore Afh(x), have compact support. Because hMdisc,2(C), inequality (A.4) of Theorem A.1 implies that Ah(Y) is Lipschitz and therefore, E|Ah(Y)|< because of Lemma 3, which states that the moments of Yi are finite.

Next we argue that AGXfh(x)=Eh(X)Ah(x) for all xB. Given the Poisson Equation (A.9) and the definition of A in (A.2) of Theorem A.1, it suffices to show that δ(k(x)+i)S for all iI. From the definition of SQ in (4) we know that any point qSQ satisfies 0q2q1n. The corresponding points xqS satisfy x1q0,x2q0, and x1q+x2q=δ(nq1)+δq2δn. The latter inequality says that the combined number of idle servers and servers with at least one person waiting in the buffer cannot exceed n. Now, provided that n > 16, any point δk in

BδNb+1={(x1,x2,0,,0)R+b+1:x2+x1δ(n/28)}δNb+1
must satisfy δ(k+i)S for all iI because δ(k1+i1)+δ(k2+i2)δn/2. Finally, recall that
ε1(Y)=(AGXfh(Y)GYAfh(Y))1(YB),ε2(Y)=(Eh(X)Ah(Y)GYAfh(Y))1(YB),ε3(Y)=(x1Afh(Y)+x2Afh(Y))1(YB), and ε4(Y)=(x1Afh(Y)+x2Afh(Y))1(YB).

We bound ε2(Y),ε3(Y), and ε4(Y) in Section A.2.1 and bound ε1(Y) in Section A.2.2.

A.2.1. Bounding ε2(Y) through ε4(Y).

We begin with the bound on

|ε2(Y)||Ah(Y)|1(YB)+1(YB)E|h(X)|+|GYAfh(Y)|1(YB).

The fact that Ah(Y) is Lipschitz, that Ah(0)=h(0)=0, and that hMdisc,2(C) imply that

|Ah(Y)1(YB)|C(Y1+Y2)1(YB) and 1(YB)E|h(X)|1(YB)CE(X1++Xb+1)1(YB)C(b,β),
where the last inequality follows from Inequality (32). To bound the remaining term, we recall (A.4) of Theorem A.1, which says that
|axaAf(Y)|Cδa1maxiI0ij4aj|Δ1a1Δ2a2f(δ(k(Y)+i))|Cδa1maxiI|f(δ(k(Y)+i))|(A.10)
for 1a13. Combined with this bound, the definition of GY in (8) implies that
|GYAfh(Y)|=|(β(Y1+Y2))x1Afh(Y)Y2x2Afh(Y)+2x12Afh(Y)|C(β)δ2(1+Y1+Y2)maxiI|f(δ(k(Y)+i))|.

Combining the bounds on the three terms yields the bound on ε2(Y). Lemma A.1 implies the bound on ε3(Y), and (A.10) implies the bound on ε4(Y).

A.2.2. Bounding ε1(Y).

Bounding ε1(Y) requires more effort. The first thing to note is that the weighted sum representation of AGXfh(Y) is difficult to work with. Our first task is therefore to write it in a form that is more amenable to analysis. To this end, we extend the domain of fh(xq) to allow either the first or second coordinate to take the value δ by defining

f^h(xq)=fh(xq),xqδNb+1,f^h(δ,x2q,,xb+1q)=fh(0,x2q+δ,x3q,,xb+1q),(0,x2q,,xb+1q)δNb+1,f^h(x1q,δ,x3q,,xb+1q)=(1Δ2)fh(x1q,0,x3q,,xb+1q),(x1q,0,x3q,,xb+1q)δNb+1.(A.11)

The form of f^h(xq) is tied to the transition structure of the JSQ model and specifically to the “reflection” that occurs near the boundaries {x1q=0} and {x2q=0}. Furthermore, the definition of A in Theorem A.1 implies that Afh(x)=Af^h(x) for xR+b+1 because f^h=fh on δNb+1. Having defined f^h(xq), we present the following lemma, which is proved in Section A.2.3.

Lemma A.2.

For any xqBδNb+1,

GXfh(xq)=nλ(f^h(xqδe(1))f^h(xq))+(n(x1q+x2q)/δ)(f^h(xq+δe(1))f^h(xq))+1δx2q(f^h(xqδe(2))f^h(xq)).(A.12)

Consequently, for any xB,

AGXfh(x)=nλ(Af^h(xδe(1))Af^h(x))+(n(x1+x2)/δ)(Af^h(x+δe(1))Af^h(x))+1δx2(Af^h(xδe(2))Af^h(x))+ε5(x),(A.13)
where
ε5(x)=i2=04αk2(x)+i2k2(x)(x2)i1=04αk1(x)+i1k1(x)(x1)1δ(δ(k2(x)+i2)x2)×(Δ2f^h(δ(k(x)+ie(2)))+Δ2f^h(δ(k(x)e(2)))+i2=04αk2(x)+i2k2(x)(x2)i1=04αk1(x)+i1k1(x)(x1)1δ(δ(k1(x)+i1+k2(x)+i2)+x1+x2)×(Δ1f^h(δ(k(x)+i))Δ1f^h(δk(x))).(A.14)

We now bound ε1(Y) using Lemma A.2. Applying Taylor expansion to (A.13), we have

AGXfh(Y)=nλ(δx1Af^h(Y)+12δ22x12Af^h(Y)16δ33x13Af^h(ξ1))+(n(Y1+Y2)/δ)(δx1Af^h(Y)+12δ22x12Af^h(Y)+16δ33x13Af^h(ξ2))+1δY2(δx2Af^h(Y)+12δ22x22Af^h(ξ3))+ε5(Y),
where ξ1,ξ2, and ξ3 are points strictly between Yδe(1) and Y, Y and Y+δe(1), and Yδe(2) and Y, respectively. Recall that δ2=1/n,δ(nnλ)=β, and GY from (8), which imply that
AGXfh(Y)GYAf^h(Y)=16δλ3x13Af^h(ξ1)+16δ(1δ(Y1+Y2))3x13Af^h(ξ2)+δY22x22Af^h(ξ3)+ε5(Y).

Note that Af^h(Y)=Afh(Y) because Y0, so GYAf^h(Y)=GYAfh(Y). We now prove the following four bounds, which together imply the bound on ε1(Y):

|16δ(1δ(Y1+Y2))3x13Af^h(ξ2)|Cδ2maxiI|Δ13fh(δ(k(Y)+i))|,(A.15)
|16δλ3x13Af^h(ξ1)|Cδ2maxiI|Δ13fh(δ(k(Y)+i))|+Cδ21(Y1δ)maxiIi1=0|(Δ12(Δ1+Δ2))fh(δ(k(Y)+i))|,(A.16)
|δY22x22Af^h(ξ3)|Cδ1Y2maxiI|Δ22fh(δ(k(Y)+i))|, and (A.17)
|ε5(Y)|Cmaxa1+a2=2iI|Δ1a1Δ2a2fh(δ(k(Y)+i))|.(A.18)

We begin with (A.15). Observe that (1δ(Y1+Y2))(0,1/2) because YB. Furthermore, Y<ξ2<Y+δe(1) implies k(ξ2)=k(Y)0. Combining this with (A.4) of Theorem A.1, we get

|16δ(1δ(Y1+Y2))3x13Af^h(ξ2)|Cδ2maxiI|Δ13f^h(δ(k(Y)+i))|=Cδ2maxiI|Δ13fh(δ(k(Y)+i))|.

We now prove (A.16). As before, Yδe(1)<ξ1<Y implies that k(ξ1)=k(Yδe(1))=k(Y)e(1), so

|16δλ3x13Af^h(ξ1)|Cδ2maxiI|Δ13f^h(δ(k(Y)e(1)+i))|.(A.19)

Now when Y[0,δ) and i1=0, the definition of f^h(xq) in (A.11) implies that

f^h(δ(k(Y)e(1)+i))=f^h(δ,δ(k2(Y)+i2),0,,0)=fh(0,δ(k2(Y)+i2+1),0,,0),
from which we see that Δ1f^h(δ,δ(k2(Y)+i2),0,,0)=Δ2fh(0,δ(k2(Y)+i2),0,,0), and therefore
Δ13f^h(δ,δ(k2(Y)+i2),0,,0)=(Δ12(Δ1+Δ2))fh(0,δ(k2(Y)+i2),0,,0).

Combining this with (A.19) implies (A.16). To prove (A.17), we note that k(ξ3)=k(Y)e(2) because Yδe(2)<ξ3<Y, so

|δY22x22Af^h(ξ3)|Cδ1Y2maxiI|Δ22f^h(δ(k(Y)e(2)+i))|.

The definition of f^h(xq) in (A.11) says that Δ2f^h(Y1,δ,0,,0)=Δ2f^h(Y1,0,,0), so Δ22f^h(Y1,δ,0,,0)=0, implying (A.17). Lastly, we prove (A.18). Theorem A.1 tells us that αkj+ijkj(xj) are degree-7 polynomials in (xjδkj)/δ whose coefficients do not depend on kj or δ, so there exists a constant C > 0 such that |αkj(x)+ijkj(x)(xj)|C for j = 1, 2, so

|i2=04αk2(x)+i2k2(x)(x2)i1=04αk1(x)+i1k1(x)(x1)1δ(δ(k2(x)+i2)x2)×(Δ2f^h(δ(k(x)+ie(2)))+Δ2f^h(δ(k(x)e(2)))|CmaxiI|Δ2f^h(δ(k(x)+ie(2)))Δ2f^h(δ(k(x)e(2))|.

Now

Δ2f^h(δ(k(x)+ie(2)))Δ2f^h(δ(k(x)e(2)))=Δ2f^h(δ(k(x)+ie(2)))Δ2f^h(δ(k(x)+i2e(2)e(2)))+Δ2f^h(δ(k(x)+i2e(2)e(2)))Δ2f^h(δ(k(x)e(2)))=i1=0i11Δ1Δ2f^h(δ(k(x)+i1e(1)+i2e(2)e(2)))+i2=0i21Δ22f^h(δ(k(x)+i2e(2)e(2))),
implying that
CmaxiI|Δ2f^h(δ(k(x)+ie(2)))Δ2f^h(δ(k(x)e(2))|CmaxiI|Δ22f^h(δ(k(Y)e(2)+i))|+CmaxiI|Δ1Δ2f^h(δ(k(Y)e(2)+i))|.

An identical argument allows us to bound the second term on the right-hand side of (A.14), yielding

ε5(Y)CmaxiI|Δ22f^h(δ(k(Y)e(2)+i))|+CmaxiI|Δ1Δ2f^h(δ(k(Y)e(2)+i))|+CmaxiI|Δ12f^h(δ(k(Y)+i))|+CmaxiI|Δ1Δ2f^h(δ(k(Y)+i))|.

Using Δ2f^h(Y1,δ,0,,0)=Δ2f^h(Y1,0,,0) and Δ22f^h(Y1,δ,0,,0)=0, we conclude (A.18).

A.2.3. Proving Lemma A.2

To prove Lemma A.2, we need the following result.

Lemma A.3.

Suppose Pδd and let f,g:PR. Given d, for those k such that δkP and δ(k+)P, we define

F(δk)=g(δk)(f(δ(k+))f(δk)).

Then AF(x) is well defined for those xRd such that δ(k(x)+i)P and δ(k(x)++i)P for all 0i4e, where ki(x)=xi/δ. Furthermore, for all such x,

AF(x)=Ag(x)(Af(x+δ)Af(x))+i1,,id=04(j=1dαkj(x)+ijkj(x)(xj))(g(δ(k(x)+i))Ag(x))×(f(δ(k(x)++i))f(δ(k(x)+i))(f(δ(k(x)+))f(δk(x)))).

Proof of Lemma A.3.

The proof is identical to the proof of proposition 3 of Braverman (2022). □

Proof of Lemma A.2.

First, we prove (A.12). Any xq=BδNb+1 satisfies x2qδ(n/28), or q2n/28. It follows from the definition of GX in (5) that for xqBδNb+1,

GXfh(xq)=1(q1<n)nλΔ1fh(xqδe(1))+1(q1=n)nλΔ2f(xq)+(q1q2)Δ1fh(xq)q2Δ2fh(xqδe(2)).

Note that q1q2=n(nq1)q2=n(x1q+x2q)/δ, and q2=x2q/δ. Although Δ2fh(xqδe(2)) is technically not defined when x2q=0, we adopt the convention that 1(q2=0)q2Δ2fh(xqδe(2))=0. Using the definition of f^(xq) in (A.11), we have

1(q2=0)q2Δ2f(xqδe(2))=0=1(q2=0)q2Δ2f^(xqδe(2)).

Similarly, because q1=n corresponds to x1q=0,

nλ1(q1=n)Δ2f(xq)=1(q1=n)nλΔ1f^(xqδe(1)),
which proves (A.12). To prove (A.13), note that if g(xq)=(nx1qx2q)/δ, then Ag(x)=n(x1q+x2q)/δ. To see why, note that ΔiΔjg(xq)=0 for any i, j, so Theorem A.1 implies that all second-order partial derivatives of Ag(x) are zero. Because Ag(x) is twice continuously differentiable, it must be a linear function, and the only linear function that coincides with g(xq) on the grid is Ag(x)=n(x1q+x2q)/δ. Similarly, if g(xq)=q2=x2q/δ, then Ag(x)=x2/δ. Applying Lemma A.3 to each of the three terms on the right-hand side of (A.12) proves (A.13). □

Appendix B. Supporting Proofs for Section 3

Apart from the short proof of Lemma 8 in Section B.5, this appendix is devoted to the proof of Lemma 5. Going forward, we fix γ=2(17/β+β+1) and recall from Section 3.1 that

θ1=nnβ/2,θ2=γn,τi(qi)=inf{t0:Qi(t)=qi},qi{0,1,,n},i=1,2.

Following the proof roadmap of Lemma 5, we need an upper bound on the expected start of the first cycle and the expected duration of a single cycle. The following two lemmas provide the ingredients for these bounds and are proved in Sections B.1 and B.2, respectively.

Lemma B.1.

For all n1,

maxθ1<q1nq2=θ2,qSQEq(τ2(2θ2)τ1(θ1))C(b,β).(B.1)

Lemma B.2.

For all n1 and qSQ with q2>θ2,

Eqτ2(θ2)C(b,β)(1+δq2)=C(b,β)(1+x2q),qSQ with q2>θ2.

To bound the probability of coupling in a given cycle, we require the following two lemmas.

Lemma B.3.

There exists a constant p1(β)(0,1) such that for all n1,

minθ1<q1nq2=θ2,qSQq(τ1(θ1)<τ2(2θ2))p1(β).

Lemma B.4.

There exists a constant p2(b,β)(0,1) such that for all n1,

min0q1θ10q22θ2qSQ(τC<τ1(n)|Q(0)=q,(Q(0),Q˜(0))i=1b+1ΘiQ)p2(b,β).

Lemmas B.3 and B.4 are proved in Sections B.3 and B.4, respectively.

Proof of Lemma 5.

Throughout the proof, we use C to denote a positive constant that may change from line to line but depends only on β and b. Given any initial condition (q,q˜)i=1b+1ΘiQ,

E(q,q˜)τCC(b,β)(1+δq2).

For convenience, we abuse notation and adopt the convention that

EqτC=maxq˜:(q,q˜)i=1b+1ΘiQE[τC|(Q(0),Q˜(0))=(q,q˜)],qSQ,
but Eq(W)=E(W|Q(0)=q) for any random variable W other than τC. We also assume that every max operator in this proof automatically considers the maximum over all qSQ; that is,
maxq2=θ2EqτC=maxqSQq2=θ2EqτC.

Lemma B.2 implies that for any qSQ,

EqτCEqτ2(θ2)+maxq2=θ2EqτCC(1+δq2)+maxq2=θ2EqτC.(B.2)

We will argue that if p1=p1(β) and p2=p2(b,β) are the constants from Lemmas B.3 and B.4, then

maxθ1q1nq2=θ2EqτCC+(1p1p2)maxq2=θ2EqτC, and (B.3)
max0q1<θ1q2=θ2EqτCC+(1p2)maxq2=θ2EqτC.(B.4)

As a result, choosing p3=max{(1p1p2),(1p2)}(0,1) implies that

maxq2=θ2EqτC=max{max0q1<θ1q2=θ2EqτC,maxθ1q1nq2=θ2EqτC}C+p3maxq2=θ2EqτC,
and therefore maxq2=θ2EqτCC(1p3)1C. Combining this with (B.2) implies the lemma. We now prove (B.3), followed by (B.4). Defining τM=τ2(2θ2)τ1(θ1), we have
maxθ1q1nq2=θ2EqτCmaxθ1q1nq2=θ2EqτM+maxθ1q1nq2=θ2Eq[EQ(τM)τC]C+maxθ1q1nq2=θ2Eq[EQ(τM)τC],(B.5)
where in the second inequality we used (B.1) of Lemma B.2. To bound the right-hand side, let us define the events
E1={τ1(θ1)<τ2(2θ2)}, and E2={τC<τ1(n)},
and their complements E1c and E2c, respectively. Note that if Q2(0)<2θ2, and then the event E1c implies that Q(τM)=(n,2θ2) because Q2(t) increases only at times when Q1(t)=n. Using the law of total probability,
maxθ1q1nq2=θ2Eq[EQ(τM)τC]maxθ1q1nq2=θ2{q(E1c)maxq1=nq2=2θ2EqτC+q(E1)maxq1=θ10q22θ2EqτC}.(B.6)

We note that

maxq1=nq2=2θ2EqτCmaxq1=n0q22θ2EqτCmaxq1=n0q22θ2Eqτ2(θ2)+maxq2=θ2EqτCC+maxq2=θ2EqτC,(B.7)
where we used Lemma B.2 in the last inequality, so
q(E1c)maxq1=nq2=2θ2EqτCq(E1c)(C+maxq2=θ2EqτC).(B.8)

Provided we can show that

q(E1)maxq1=θ10q22θ2EqτCq(E1)(C+(1p2)maxq2=θ2EqτC),(B.9)
we can combine (B.8) and (B.9) with (B.6) to get
maxθ1q1nq2=θ2Eq[EQ(τM)τC]maxθ1q1nq2=θ2{q(E1c)(C+maxq2=θ2EqτC)+q(E1)(C+(1p2)maxq2=θ2EqτC)}=C+(maxq2=θ2EqτC)maxθ1q1nq2=θ2{1p2q(E1)}C+(1p2p1)maxq2=θ2EqτC,
where the last inequality follows from the lower bound on q(E1) in Lemma B.3. Combining this bound with (B.5) proves (B.3). We now prove (B.9). Recall that E2={τC<τ1(n)} and observe that
maxq1=θ10q22θ2EqτCmaxq1=θ10q22θ2Eq([τCτ1(n)]1(E2))+maxq1=θ10q22θ2Eq([τCτ1(n)+EQ(τ1(n))τC]1(E2c))2maxq1=θ10q22θ2Eq[τCτ1(n)]+maxq1=θ10q22θ2(E2c|Q(0)=q,(Q(0),Q˜(0))i=1b+1ΘiQ)Eq[EQ(τ1(n))τC]2maxq1=θ10q22θ2Eq[τCτ1(n)]+(1p2)maxq1=n0q22θ2EqτC,
where in the last inequality we used Lemma B.4 and the fact that Q2(τ1(n))Q2(0) because Q2(t) increases only at times when Q1(t)=n. Applying (B.7) to the right-hand side, we arrive at
maxq1=θ10q22θ2EqτC2maxq1=θ10q22θ2Eq[τCτ1(n)]+C+(1p2)maxq2=θ2EqτC.

To conclude, we argue that

maxq1=θ10q22θ2Eq[τCτ1(n)]b+1.(B.10)

If (Q(0),Q˜(0))Θ1Q, then (Q(t),Q˜(t))Θ1Q for all t[0,τ1(n)] by construction. The joint CTMC couples before τ1(n) if τ1(n)>V, where V is as in (17). If (Q(0),Q˜(0))ΘiQ for i2, coupling will happen before τ1(n) if the joint CTMC transitions to Θ1Q and then spends V time units there, all before τ1(n). From the construction of Q˜(·), we know that the time taken to get from ΘiQ to Θ1Q equals the sum of i – 1 unit-mean exponentially distributed random variables, so the worst case is when i=b+1. Letting Γb+1 represent this sum, it follows that

maxq1=θ10q22θ2Eq[τCτ1(n)]maxq1=θ10q22θ2Eq[Γb+1τ1(n)]E(Γb+1)b+1,
which proves (B.10). Our argument for (B.9) can be repeated to prove (B.4). □

B.1. Proving Lemma B.1

Proof of Lemma B.1.

Define V(xq)=i=1b+1qi and observe that

GXV(xq)=nλ1(qb+1<n)q1,xqS.

Because θ1=nnβ/2, it follows that for any qSQ with θ1<q1n,

GXV(xq)=nλq1nλ(nnβ/2)=βn+nβ/2nβ/2.

Let M > 0, t(M)=min{τ1(θ1),τ2(2θ2),M}, and note that Q1(t)nnβ/2 for tt(M). Dynkin’s formula, for example, lemma 17.2 in Kallenberg (2001), then implies that for any qSQ with θ1<q1n and q2=θ2,

ExqV(X(t(M)))V(xq)=Exq0t(M)GXV(X(s))dsnβ2Exqt(M).

Because Q1(t(M))nnβ/2 and θ1<q1n, it follows that q1Q1(t(M))nβ/2, so

nβ2Exqt(M)V(xq)ExqV(X(t(M)))q1ExqQ1(t(M))+i=2b+1qinβ/2+bθ2,
where in the last inequality we used q2q3qb+1. Dividing both sides by n, and noting that θ2/nγ=2(17/β+β+1), yields Exqt(M)C(b,β). We conclude by taking M and using the monotone convergence theorem. □

B.2. Proving Lemma B.12

Recall that θ2=γn and γ=2(17/β+β+1). In this section, we show that Eqτ2(θ2)C(b,β)(1+δq2) if q2>θ2. Our proof is based on a Lyapunov function characterized by the following proposition, proved in Section B.2.1.

Lemma B.5.

There exists a function V:R+b+1R such that for any n1 and any xqS with x2q2(17/β+β)+δ,

GXV(xq)3/17+δβ(q31(b>1)nλ1(q1=q2=n)).(B.11)

Furthermore, there exists a constant C(β)>0 such that for any n1,

0V(x)C(β)(1+x2),xRb+1 with x22(17/β+β).

Proof of Lemma B.2.

Let V(x) be the function in Lemma B.5, fix X(0)=xqS with x2qδθ2, M > 0, and define τ2M(θ2)=Mτ2(θ2). Dynkin’s formula says that

ExqV(X(τ2M(θ2)))V(xq)=Exq0τ2M(θ2)GXV(X(t))dt.(B.12)

Because X2(t)δθ22(17/β+β)+δ for all t[0,τ2M(θ2)], Lemma B.5 implies that

GXV(X(t))3/17+δβ(Q3(t)1(b>1)nλ1(Q1(t)=Q2(t)=n)),t[0,τ2M(θ2)].

Combining this inequality with (B.12) and that V(X(τ2M(θ2)))0 and V(xq)C(β)x2q yields

317Exq(τ2M(θ2))C(β)x2q+δβExq0τ2M(θ2)(Q3(t)1(b>1)nλ1(Q1(t)=Q2(t)=n))dt.

If b = 1, the lemma follows trivially, so we assume that b > 1. It suffices to show that

Exq0M(Q3(t)1(b>1)nλ1(Q1(t)=Q2(t)=n))dti=3b+1qi
because i=3b+1qibq2. Because Q3(t) is the number of servers with at least two customers in their buffers, it is also the number of customers that are second in line at time t. Thus, 0MQ3(t)dt is the cumulative time spent by customers being second in line. This cumulative time is contributed to by customers already in the system at time t = 0 and by new arrivals after t = 0. Of those customers present in the system at t = 0, the number that are, or could at some point become, second in line is i=3b+1qi, and each will spend at most one unit of time being second in line, in expectation.

Let N be the number of customers in the interval [0,M] that arrive when all servers are busy and all queues have at least one customer in them; that is, Q1(t)=Q2(t)=n. For 1iN, let ξi be the time customer i spends being second in line, even if that customer becomes second in line after time M. We argue that conditioned on {Ni}, each ξi is exponentially distributed with unit mean. Upon entry into the system, if customer i is routed to a busy server with only one other customer waiting in the buffer, then ξi is distributed according to the remaining service time of the server, which is exponentially distributed with unit mean. If the buffer has more than one customer waiting, then ξi equals the service time of the customer two spots ahead of customer i, which is also exponentially distributed with unit mean. Further note that the CTMC can be constructed in such a way that the value of ξi is determined at the instant when customer i enters the system, so

Exq0MQ3(t)dti=3b+1qi+Exqi=1ξi1(Ni)=i=3b+1qi+i=1Exq(ξi|Ni)(Ni)=i=3b+1qi+ExqN.

Let ηi be the time spent by the CTMC in a state with Q1(t)=Q2(t)=n before customer i’s arrival. Because the arrivals to the JSQ system are governed by a rate-nλ Poisson process, the arrival of customer i corresponds to a time when ηi accumulates to equal an exponentially distributed random variable with rate nλ; therefore,

Exq0M1(Q1(t)=Q2(t)=n)dtExqi=1ηi1(Ni)=i=1Exq(ηi|Ni)(Ni)=1nλExqN.

B.2.1. Proving Lemma B.5.

The Lyapunov function in Lemma B.5 is based on the fluid limit of the JSQ system, studied in Braverman (2020). Lemma B.5 was, unfortunately, not proved there; but that paper contains all the necessary ingredients for the proof. We now recall them using notation from Braverman (2020).

Consider the two-dimensional process {(Q1(t)n)/n,Q2(t)/n}. Note that the first coordinate is nonpositive, whereas so far we have been using a nonnegative first coordinate. Section 4.1 of Braverman (2020) described the fluid limit of this process. Letting

Ω={xR2:x10,x20},
the fluid limit is a dynamical system v:R+Ω with initial condition v(0)=xΩ; we write vx(t) to emphasize the relationship on x. Postponing the discussion of the behavior of vx(t), for ,uR with <u define the smoothed indicator function ϕ(,u):R[0,1] by
ϕ(,u)(x)={0,x,(x)2((x)((u+)/2)2(u)+2((u+)/2)(u)),x[,(u+)/2],1(xu)2((xu)((u+)/2u)2(u)2((u+)/2u)(u)),x[(u+)/2,u],1,xu,(B.13)
and let
f(2)(x)=0ϕ(δκ1,δκ2)(vx(t))dt,xΩ,
where δ=1/n and κ1,κ2R are to be determined. The function f(2)(x) appeared in section 5.1 of Braverman (2020), where it was used as a Lyapunov function for the diffusion limit of the JSQ system; that is, the process {Y(t)} in (1). We show that this is also a Lyapunov function for the CTMC. Define
V(x)=f(2)(δx1,δx2),xR+b+1.(B.14)

The following result proved in Section B.2.2 gives us control over the derivatives of V(x).

Lemma B.6.

For any xR+b+1 with x2κ2,

(β(x1+x2))V(x)x1δx2V(x)x2=1, and 1(x1=0)(x1+x2)V(x)=0.(B.15)

Furthermore, if we choose κ1=17/β+β and κ2=2κ1, then for any xR+b+1 with x2κ2, and any x2x2,

2x12V(x)9/17,(B.16)
x2V(x)x2V(0,x2)=1β,2x22V(x)5/17;(B.17)
and there exists a constant C(β) such that 0V(x)C(β)(1+x2).

Proof of Lemma B.5.

Let κ1=17/β+β and κ2=2κ1 and V(x) be the function from Lemma B.6, and recall GX defined in (5). Because V(x) depends only on x1 and x2,

GXV(xq)=1(q1<n)nλ(Δ1V(xqδe(1)))+nλ1(q1=n,q2<n)Δ2V(xq)+(q1q2)Δ1V(xq)+(q2q31(b>1))(Δ2V(xqδe(2))),xqS.

Using Taylor expansion, we get

Δ1V(xqδe(1))=V(xqδe(1))V(xq)=δx1V(xq)+x1qδx1q(u(x1qδ))2x12V(u,x2q)du,Δ1V(xq)=V(x+δe(1))V(xq)=δx1V(xq)+x1qx1q+δ(x1q+δu)2x12V(u,x2q)du;
and a similar expression holds for Δ2V(xq) and Δ2V(xqδe(2)). Therefore,
GXV(xq)=δ(1(q1<n)nλ(q1q2))x1V(xq)+δ(1(q1=n,q2<n)nλq2)x2V(xq)q31(b>1)(Δ2V(xqδe(2)))+ψ(xq),(B.18)
where
ψ(xq)=nλ1(q1<n)x1qδx1q(u(x1qδ))2x12V(u,x2q)du+nλ1(q1=n,q2<n)x2qx2q+δ(x2q+δu)2x22V(x1q,u)du+(q1q2)x1qx1q+δ(x1q+δu)2x12V(u,x2q)du+q2x2qδx2q(u(x2qδ))2x22V(x1q,u)du.

Now suppose x2qκ2+δ. The bounds on the second-order derivatives of V(x) from Lemma B.6, together with the facts that q1q20,q20,δ2nλ1, and δ2qi1, imply that ψ(xq)14/17. Next, we rewrite the first line on the right-hand side of (B.18), for which we note that

λ=1β/n,x1q=δ(nq1),1(q1=n)=1(x1q=0),1(q1<n)=11(x1q=0),1(q1=n,q2<n)=1(x1q=0)1(q1=q2=n),
so
δ(1(q1<n)nλ(q1q2))x1V(xq)+δ(1(q1=n,q2<n)nλq2)x2V(xq)=(β(x1q+x2q))x1V(xq)x2qx2V(x)1(q1=q2=n)δnλx2V(xq)+δnλ1(x1q=0)(x1+x2)V(xq)=11(q1=q2=n)δnλx2V(xq),
where the last equality is due to (B.15) from Lemma B.6. We have thus shown that
GXV(xq)1+14/171(q1=q2=n)δnλx2V(xq)q31(b>1)(Δ2V(xqδe(2))).

Now V(xq)=V(0,n) when q1=q2=n, so (B.17) in Lemma B.6 tells us that V(0,n)=1/β provided that nκ2=2(β/17+β), which we assume, so

1(q1=q2=n)δnλx2V(x)q31(b>1)(Δ2V(xqδe(2)))=1(q1=q2=n)δnλ1β+q31(b>1)x2qδx2qx2V(x1q,u)duδβ(q31(b>1)nλ1(q1=q2=n)),
where the inequality follows from (B.17) in Lemma B.6. □

B.2.2. Proof of Lemma B.6.

Fix κ1=17/β+β and κ2=2κ1. The function f(2)(x) was considered in lemma 8 of Braverman (2020), which tells us that

(βδ+x1x2)x1f(2)(x)x2x2f(2)(x)=1,xΩ with x2>κ2/n,x1f(2)(x)=x2f(2)(x),xΩ with x1=0.

Combining this with

x1V(x)=δx1f(2)(δx1,δx2),x2V(x)=δx2f(2)(δx1,δx2)(B.19)
gives us (B.15). Going forward, we assume that xΩ. Let us bound the derivatives of V(x). On page 1100 of Braverman (2020), it was shown that
2x12f(2)(x)nβ(κ1β)+κ1κ1β4nβ(κ2κ1)=n17+17/β+β17/β4nβ(17/β+β)=5n17,xΩ,
implying the bound on 2V(x)/x12 in (B.16). We now prove (B.17), followed by the bound on V(x). Unfortunately, f(2)(x)/x2 and 2f(2)(x)/x22 are not bounded in Braverman (2020), so we must bound these partial derivatives ourselves.

We write the equation for f(2)(x)/x2 in (B.22), but writing it requires us to introduce some nontrivial objects from Braverman (2020). The first object we need is the family of curves {Γ(κ)Ω}κβ, where Γ(κ) is the graph of the unique fluid-limit trajectory that intersects the x2 axis at the point (0,κ/n). For the purposes of this proof, it suffices to treat Γ(κ) as a two-dimensional geometric object satisfying the following properties:

  1. The set Γ(κ) is a graph of a continuous function; that is, Γ(κ)={(x1,f(x1)} for some continuous function f:R+R+.

  2. The intersection Γ(κ){xΩ:x1=0}=(0,κ/n).

  3. If xΓ(κ) and x1<0, then x2>κ/n.

  4. If κ>κ, then Γ(κ)Γ(κ)= and Γ(κ) lies above Γ(κ).

The first three properties are implied by lemma 5 of Braverman (2020), and the fourth one follows from (39) there. Because Γ(κ) is a graph, sets of the form {x<Γ(κ)},{xΓ(κ)}, etc., are well defined. Let us use Γ(κ1) and Γ(κ2) to partition Ω into the four sets

S0={xΩ:x2κ1/n},S1={xΩ:x2κ1/n,xΓ(κ1)},S2={xΩ:Γ(κ1)xΓ(κ2)},S3={xΩ:xΓ(κ2)}.

The four properties of Γ(κ) are sufficient to argue that S0S1S2S3=Ω and that the interiors of Si and Sj are disjoint when ij; we refer the reader to section C.2 of Braverman (2020) for more details.

The last object we need is the function τ(x), which represents the first time that the fluid limit hits the x2 axis starting from a state x>Γ(β). The precise definition of τ(x) is bulky and involves the Lambert-W function, but we can get by with only a few of its properties. Namely, for any κ>β, lemma 6 of Braverman (2020) introduces a nonnegative function τ:{xΩ:xΓ(κ)}R+ with τ(0,x2)=0, which is differentiable for all x{xΩ:xΓ(κ)} and satisfies

x1τ(x)=eτ(x)x2eτ(x)β/n0,x2τ(x)=τ(x)x1τ(x)0,x{xΩ:xΓ(κ)}.(B.20)

By choosing κ=κ1=17/β+β, we are assured that τ(x) is defined on the set {xΩ:xΓ(κ1)}=S2S3. Item 1 of lemma 6 in Braverman (2020) tells us that τ(x) is tied to Γ(κ) for any κ>β via

x2eτ(x)κ/n,xΓ(κ).(B.21)

We are now ready to bound the derivatives of f(2)(x). Equation (C.9) of Braverman (2020) tells us that

x2f(2)(x)={0,xS0,1x2ϕ(x2),xS1,1x2(ϕ(x2)ϕ(x2eτ(x)))+ϕ(x2eτ(x))nβeτ(x)(τ(x)+1),xS2,nβeτ(x)(τ(x)+1),xS3,(B.22)
where ϕ(x)=ϕ(δκ1,δκ2)(x) is the smoothed indicator defined in (B.13). By differentiating both sides of (B.13), it is straightforward to check that ϕ(x) is nondecreasing, and
ϕ(x)4δ(κ2κ1)=4n17/β+β.(B.23)

Let us now argue that f(2)(x)/x2n/β for any xΩ. If xS3, this bound is implied by the inequality et(t+1)1 for t0. If xS1, the bound is implied by the fact that ϕ(x2)1 and 1/x2n/κ1n/β. If xS2, we note that ϕ(x2)ϕ(x2eτ(x))0, and 1/x2n/β, meaning that

x2f(2)(x)=1x2(ϕ(x2)ϕ(x2eτ(x)))+ϕ(x2eτ(x))nβeτ(x)(τ(x)+1)nβ(ϕ(x2)ϕ(x2eτ(x)))+ϕ(x2eτ(x))nβ=nβ.

Observe that f(2)(x)/x2=n/β when τ(x)=0, which is true for any xS2S3 with x1=0, implying the claim about V(x)/x2 in (B.17). To conclude the proof, it remains to show 2V(x)/x229/17 by differentiating both sides in (B.22). Note that 2f(2)(x)/x22=0 for xS0. When xS1, we use the bound on ϕ(x) in (B.23), as well as the fact that 1/x2n/β, to see that

2x22f(2)(x)=1x22ϕ(x2)+1x2ϕ(x2)1x2ϕ(x2)nβ4n17/β+β4n17,xS1.

When xS3,

2x22f(2)(x)=nβeτ(x)(τ(x)+1)x2τ(x)+nβeτ(x)x2τ(x)=nβeτ(x)τ(x)x2τ(x).

Using the expression for τ(x)/x2 in (B.20), we see that

2x22f(2)(x)=eτ(x)x2eτ(x)β/nτ2(x)nβeτ(x)n7β(κ2β)n7β(34/β+β)4n17,xS3.(B.24)

The first inequality follows from x2eτ(x)κ2/n because of (B.21) and the fact that t2e2t1/7 for t0. Lastly, we consider the case when xS2, for which we recall that

x2f(2)(x)=1x2(ϕ(x2)ϕ(x2eτ(x)))+ϕ(x2eτ(x))nβeτ(x)(τ(x)+1),xS2.(B.25)

To help organize terms, let g(x2)=x2eτ(x) and note from (B.20) that

g(x2)=eτ(x)(1x2x2τ(x))=eτ(x)(1+x2eτ(x)x2eτ(x)β/nτ(x))=eτ(x)(1+τ(x)+β/nx2eτ(x)β/nτ(x)).

We see that g(x2)0 because τ(x)0 and x2eτ(x)κ1/n for xS2 because of (B.21). Furthermore, because ett1 and et(t+1)1 for t0, we conclude that

0g(x2)1+β/nx2eτ(x)β/n1+βκ1β=1+β217.(B.26)

Let us now differentiate and bound each term on the right-hand side of (B.25) individually. First,

x2(1x2(ϕ(x2)ϕ(x2eτ(x))))=1x22(ϕ(x2)ϕ(x2eτ(x)))+1x2(ϕ(x2)g(x2)ϕ(x2eτ(x)))1x2ϕ(x2)nβ4n17/β+β=4n17.

The first inequality is because ϕ(x) is nondecreasing and g(x2)0, and the second inequality follows from the fact that 1/x2n/β and the bound on ϕ(x) in (B.23). Differentiating the second term in (B.25), we get

x2(ϕ(x2eτ(x))nβeτ(x)(τ(x)+1))=ϕ(x2eτ(x))x2(nβeτ(x)(τ(x)+1))+ϕ(x2eτ(x))g(x2)nβeτ(x)(τ(x)+1).

To bound the first term, we use the fact that ϕ(x)1; we repeat the argument used to prove (B.24) to see that

ϕ(x2eτ(x))x2(nβeτ(x)(τ(x)+1))n7β(κ1β)=n7β(17/β)=n119.

Furthermore, the bounds on ϕ(x) and g(x2) in (B.23) and (B.26), together with the fact that et(t+1)1, imply that

ϕ(x2eτ(x))g(x2)nβeτ(x)(τ(x)+1)4n17/β+β(1+β217)nβ=4nβ17+β217+β217nβ=4n17.

Combining the pieces yields 2f(2)(x)/x229n/17, proving (B.17).

To conclude, we prove that 0V(x)C(β)(1+x2) for x2κ2 by proving that 0f(2)(x)C(β)(1+nx2) for x2κ2/n. The form of f(2)(x) below can be found in lemma B.1 of Braverman (2020):

f(2)(x)={0,   xS0,0log(nx2/κ1)ϕ(x2et)dt,x2κ2/n,xS1,log(nx2/κ2)+0log(κ2/κ1)ϕ(κ2net)dt,x2κ2/n,xS1,0τ(x)ϕ(x2et)dt+nβκ1/nx2eτ(x)ϕ(t)dt,x2κ2/n,xS2,log(nx2/κ2)+log(nx2/κ2)τ(x)ϕ(x2et)dt+nβκ1/nx2eτ(x)ϕ(t)dt, x2κ2/n,xS2,τ(x)+x2eτ(x)κ2/nβ/n+nβκ1/nκ2/nϕ(t)dt,   xS3.

The fact that f(2)(x)0 follows from ϕ(x),τ(x)0, the definitions of S1, S2, and S3, and (B.21). We combine all the cases above into the single upper bound

f(2)(x)log(nx2/κ1)1(xS1S2S3)+log(κ2/κ1)+τ(x)1(xS2S3)+nβ(x2eτ(x)κ1/n)1(xS2)+nβ(x2eτ(x)κ2/n)1(xS3)+κ2κ1β.(B.27)

Using the inequality log(t)1+t for t0, and the fact that κ1=17/β+β and κ2=2κ1, we see that log(κ2/κ1)=log(2),(κ2κ1)/β=1+17/β2,

log(nx2/κ1)1(xS1S2S3)1+nx2κ11+nx2β,nβ(x2eτ(x)κ1/n)1(xS2)nx2β, and nβ(x2eτ(x)κ2/n)1(xS3)nx2β.

Furthermore, (B.21) and the definitions of S2 and S3 imply that

τ(x)1(xS2S3)log(x2n/κ1)1+nx2κ1=1+nx2β.

We conclude by combining all of these bounds with (B.27). □

B.3. Proof of Lemma B.3

Assume without loss of generality that Q1(0)=n and Q2(0)=θ2, because starting from (q1,θ2,q3,,qb+1)SQ, a state with q1=n and q2=θ2 must be visited before τ2(2θ2), so

minq2=θ2qSQq(τ1(θ1)<τ2(2θ2))minq2=θ2,q1=nqSQq(τ1(θ1)<τ2(2θ2)).

We bound the right-hand side by relating it to the ruin probability in a certain gambler’s ruin problem. Namely, we construct a random walk {R¯(t)} with R¯(0)=0 that satisfies

minq2=θ2,q1=nqSQq(τ1(θ1)<τ2(2θ2))(inft0{R¯(t)=γn}>inft0{R¯(t)=nβ/2}).(B.28)

Jumps in the random walk are governed by a Poisson process with rate nλ+θ13θ2; the up-step and down-step probabilities are

nλnλ+θ13θ2 and θ13θ2nλ+θ13θ2,(B.29)
respectively. Note that we implicitly assume n is large enough so that θ13θ2>0. The right-hand side in (B.28) is therefore the ruin probability in a gambler’s ruin problem with initial wealth nβ/2 and opponent’s wealth nβ/2+γn. A formula for the ruin probability was given by equation (2.4) in Section XIV.2 of Feller (1968):
(inft0{R¯(t)=γn}>inft0{R¯(t)=nβ/2})=11((θ13θ2)/nλ)nβ/21((θ13θ2)/nλ)nβ/2+γn.

Recalling the values of θ1 and θ2 and the fact that γ>β, we see that

θ13θ2nλ=nnβ/23γnnλ=1βn+nβ/2+3γnnλ<1,
and therefore,
limn1((θ13θ2)/nλ)nβ/21((θ13θ2)/nλ)nβ/2+γn=limn1(1βn+nβ/2+3γnnλ)nβ/21(1βn+nβ/2+3γnnλ)nβ/2+γn<1,
implying Lemma B.3. It remains to construct {R¯(t)}.

Recall that Q1(0)=n and Q2(0)=θ2, and let {Q^(t)} be a copy of {Q(t)}, but with the modification that any server with a nonempty buffer permanently halts all its work. Then Q^i(t)Qi(t) for all t0 and all 1ib+1 because this modified system has the same arrival stream as {Q(t)} but serves fewer customers. It follows that

τ1(θ1)=inft0{Q1(t)=θ1}inft0{Q^1(t)=θ1},τ2(2θ2)=inft0{Q2(t)=2θ2}inft0{Q^2(t)=2θ2}, and minq2=θ2,q1=nqSQq(τ1(θ1)<τ2(2θ2))minq2=θ2,q1=nqSQq(inft0{Q^2(t)=2θ2}>inft0{Q^1(t)=θ1}).

Now consider the process

R(t)=Q^1(t)+Q^2(t)Q^1(0)Q^2(0)=Q^1(t)+Q^2(t)(n+θ2).

Note that R(t)Q^1(t)Q^1(0)=Q^1(t)n because Q^2(t) is nondecreasing in t, which implies that

inft0{R(t)=nβ/2}inft0{Q^1(t)=nnβ/2}=inft0{Q^1(t)=θ1}.

Note also that inft0{Q^2(t)=2θ2}=inft0{R(t)=θ2} because Q^2(t) is nondecreasing in t and Q^2(t) increases only when Q^1(t)=n. Hence,

minq2=θ2,q1=nqSQq(inft0{R(t)=θ2}>inft0{R(t)=nβ/2})minq2=θ2,q1=nqSQq(inft0{Q^2(t)=2θ2}>inft0{Q^1(t)=θ1})minq2=θ2,q1=nqSQq(τ1(θ1)<τ2(2θ2)).(B.30)

An arrival to {Q^(t)} increases the value of {R(t)}, and a service completion by a server with an empty buffer decreases its value. However, {R(t)} is still not the random walk we desire because the rate at which it decreases depends on the state of Q^(t). Instead, we want a random walk with a constant downward rate.

To construct this random walk, for 0tinft0{Q^2(t)=2θ2} let us define {Q¯(t)=(Q¯1(t),Q¯2(t))} by setting Q¯(0)=Q^(0) and defining the transitions of the joint process {(Q^(t),Q¯(t))} in Tables B.1B.3. Because we are defining Q¯(t) only until the time Q^2(t) hits 2θ2, we do not need to specify the transitions for states where Q^2(t)>2θ2. The intuition for the transition structure is as follows. Because arrivals occur at the constant rate of nλ, we want any arrival to {Q^(t)} to also occur in {Q¯(t)}. However, we want to keep the rate at which {Q¯(t)} decreases a constant value of θ13θ2. To accomplish this, when Q^1(t)θ1θ2, the transitions in Table B.2 have {Q¯(t)} ignore some departures from {Q^(t)}; when Q^1(t)<θ1θ2, we supplement the departures from {Q^(t)}; for example, see transition #8 in Table B.3. Having defined Q¯(t), let us define

R¯(t)=Q¯1(t)Q¯1(0)+Q¯2(t)Q¯2(0),tinft0{Q^2(t)=2θ2}.

Table

Table B.1. Arrival Transitions for the Joint Process in State ((u^1,u^2),(u¯1,u¯2))

Table B.1. Arrival Transitions for the Joint Process in State ((u^1,u^2),(u¯1,u¯2))

#RateTransition
1nλ1(u^1<n,u¯1<n)((u^1+1,u^2),(u¯1+1,u¯2))
2nλ1(u^1=n,u¯1<n)((u^1,u^2+1),(u¯1+1,u¯2))
3nλ1(u^1<n,u¯1=n)((u^1+1,u^2),(u¯1,u¯2+1))
4nλ1(u^1=n,u¯1=n)((u^1,u^2+1),(u¯1,u¯2+1))
Table

Table B.2. Departure Transitions for the Joint Process in State ((u^1,u^2),(u¯1,u¯2)) with u^22θ2 and u^1θ1θ2

Table B.2. Departure Transitions for the Joint Process in State ((u^1,u^2),(u¯1,u¯2)) with u^22θ2 and u^1θ1θ2

#RateTransition
5θ13θ2((u^11,u^2),(u¯11,u¯2))
6u^1u^2(θ13θ2)((u^11,u^2),(u¯1,u¯2))
Table

Table B.3. Departure Transitions for the Joint Process in State ((u^1,u^2),(u¯1,u¯2)) with u^22θ2 and u^1<θ1θ2

Table B.3. Departure Transitions for the Joint Process in State ((u^1,u^2),(u¯1,u¯2)) with u^22θ2 and u^1<θ1θ2

#RateTransition
7(u^12θ2)1(u^12θ2)((u^11,u^2),(u¯11,u¯2))
8θ1θ2u^12θ2((u^1,u^2),(u¯11,u¯2))
92θ2u^1u^2((u^11,u^2),(u¯1,u¯2))

To prove that {R¯(t)} satisfies (B.28), we show that

R¯(t)R(t) for all times tmin{inft0{R¯(t)=γn},inft0{R¯(t)=nβ/2}}(B.31)
and, as a result,
inft0{R¯(t)=γn}inft0{R(t)=γn},inft0{R¯(t)=nβ/2}inft0{R(t)=nβ/2}.

Together with (B.30), these inequalities imply that

minq2=θ2,q1=nqSQq(inft0{R¯(t)=γn}>inft0{R¯(t)=nβ/2})minq2=θ2,q1=nqSQq(inft0{R(t)=γn}>inft0{R(t)=nβ/2})minq2=θ2,q1=nqSQq(τ1(θ1)<τ2(2θ2)).

To see why (B.31) is true, let us study the transitions in Tables B.1B.3. Table B.1 tells us that R¯(t) and R(t) increase at the same times. The transitions in Table B.2 show that any decrease in Q¯1(t), and consequently R¯(t), must be accompanied by a decrease in Q^1(t) and R(t) but not vice versa. The only way Q¯1(t) can ever drop below Q^1(t) is via transition 8, which can happen only if Q^1(t)<θ1θ2, so the first intersection of Q¯1(t) and Q^1(t) has to occur below θ1θ2. Therefore, R¯(t)R(t) for all times

tmin{inft0{Q¯1(t)=θ1θ2},inft0{Q^2(t)=2θ2}}=min{inft0{Q¯1(t)=θ1θ2},inft0{R(t)=θ2}}.

Let us now prove (B.31) by showing that the right-hand side is greater than

min{inft0{R¯(t)=γn},inft0{R¯(t)=nβ/2}}.

Because R¯(t)R(t),

min{inft0{Q¯1(t)=θ1θ2},inft0{R(t)=θ2}}min{inft0{Q¯1(t)=θ1θ2},inft0{R¯(t)=θ2}}.

Furthermore, because Q¯2(t) is nondecreasing and increases only at those times when Q¯1(t)=n, it follows that for all tinft0{R¯(t)=θ2},

R¯(t)=Q¯1(t)+Q¯2(t)nθ2Q¯1(t)n+θ2,
and therefore
min{inft0{Q¯1(t)=θ1θ2},inft0{R¯(t)=θ2}}=min{inft0{Q¯1(t)=nnβ/2θ2},inft0{R¯(t)=γn}}min{inft0{R¯(t)=nβ/2},inft0{R¯(t)=γn}}.

B.4. Proving Lemma B.4

Central to our argument is a result about the moment-generating function of the duration of a gambler’s ruin game. We now describe this result and then prove Lemma B.4. Consider a discrete-time gambler’s ruin problem where the initial player’s wealth is z, the win probability is p, the loss probability is q, and the player keeps playing until he or she goes broke or accumulates a total wealth of a. Let Dz+ be the number of turns until the game ends, given an initial wealth of z. An expression for the generating function EsDz was given in (4.11) and (4.12) in section XIV.4 of Feller (1968):

EsDz=λ1a(s)λ2z(s)λ1z(s)λ2a(s)λ1a(s)λ2a(s)+λ1z(s)λ2z(s)λ1a(s)λ2a(s),s(0,1),(B.32)
where
λ1(s)=1+14pqs22ps, and λ2(s)=114pqs22ps,s(0,1).

Now consider the continuous-time gambler’s ruin problem, where the durations between turns are governed by an i.i.d. sequence {Ei} of rate r exponentially distributed random variables. Given initial wealth z, the duration of the continuous game equals i=1DzEi. Because the Ei are independent of Dz, it follows that

Eei=1DzEi=E(EeE1)Dz=E(rr+1)Dz,
so Eei=1DzEi is related to (B.32). The following result proved in Section B.4.1 is needed to prove Lemma B.4.

Lemma B.7.

Let i and q2 be integers such that 1ib+1 and 0q2θ2, and define

q(B,i)=nq21nβ/2+nβ/2(b+1).

Consider the continuous-time gambler’s ruin problem with probabilities

p=nλnλ+q(B,i)nβ/2, and q=q(B,i)nβ/2nλ+q(B,i)nβ/2,
rate r=nλ+q(B,i)nβ/2, initial wealth z and terminal wealth a given by
z=nβ/2, and a=nβ/2+nβ/2(b+1),(B.33)
and game duration i=1DzEi. Then
limnmax0q22γnEei=1DzEi<1.(B.34)

Proof of Lemma B.4.

As discussed below (B.10), {τC<τ1(n)}{Γb+1<τ1(n)}, where Γb+1 is the sum of b + 1 unit-mean exponentially distributed random variables. The same discussion says that Γb+1 represents the time needed by the joint CTMC (Q(t),Q˜(t)) to transition from Θb+1Q to Θ1Q and to then couple by spending an exponentially distributed amount of time in Θ1Q. Thus,

min0q1θ10q22θ2qSQ(τC<τ1(n)|Q(0)=q,(Q(0),Q˜(0))i=1b+1ΘiQ)min0q1θ10q22θ2qSQ(Γb+1<τ1(n)|Q(0)=q).

Let us analyze the probability above. At time t = 0, there are q2 servers with nonempty buffers and another server containing the extra customer in {Q˜(t)}. We group these q2+1 servers together into group A and the remaining nq21 servers into group B. Let Q1(A)(t) and Q1(B)(t) be the number of busy group A and B servers, respectively. Because

Q1(A)(0)=q2+1, and Q1(A)(0)+Q1(B)(0)=Q1(0)nnβ/2,
it follows that Q1(B)(0)nq21nβ/2. We are implicitly assuming that n is large enough so nq21nβ/20. Note that the buffer of any group B server is empty for all tτ1(n).

If a customer arrives when more than one server is idle, we prioritize assigning this customer to servers in group B over group A. Note that this tie-breaking rule is consistent with the tie-breaking rule we imposed in the proof of Lemma 4. Let τB=inft0{Q1(B)(t)=nq21} be the first time that all servers in group B are busy. By construction, τBτ1(n), so

min0q1θ10q22θ2qSQ(Γb+1<τ1(n)|Q(0)=q)min0q22θ20q(B)nq21nβ/2(Γb+1<τB|Q1(B)(0)=q(B))min0q22γn(Γb+1<τB|Q1(B)(0)=nq21nβ/2).

The last inequality is true because increasing the value of the initial condition Q1(B)(0) does not increase the chance that Γb+1<τB. We now relate the right-hand side to the moment-generating function considered in Lemma B.7 and use that lemma to conclude the proof. We can write Γb+1=i=1b+1Gi, where Gi are i.i.d. unit-mean exponentially distributed random variables independent of Q1(B)(t) for t[0,τB] because they correspond to service times of the server containing the additional customer in {Q˜(t)}, which is a server in group A.

Fixing 0q22θ2 and Q1(B)(0)=nq21nβ/2, for 0ib+1, we define

q(B,i)=nq21nβ/2+nβ/2ib+1, and τB,i=inft0{Q1(B)(t)Q1(B)(0)=nβ/2ib+1}=inft0{Q1(B)(t)=q(B,i)}
and note that τB=τB,b+1. We are guaranteed that Γb+1<τB if for each 1ib+1, the exponentially distributed Gi is smaller than the time it takes for Q1(B)(t) to reach q(B,i) if started from q(B,i1), so
(Γb+1<τB|Q1(B)(0)=nq21nβ/2)i=1b+1(Gi<τB,i|Q1(B)(0)=q(B,i1)).

We now show that τB,i can be bounded from below by the duration of a gambler’s ruin game, which allows us to apply Lemma B.7. Fix 1ib+1, and consider the time interval t[0,τB,i], on which we construct the coupling {(Q1(B)(t),Q¯1(B)(t))} by setting

Q¯1(B,i)(0)=Q1(B)(0)=q(B,i1)
and defining the transitions of the joint process in Tables B.4 and B.5. We implicitly assume that n is large enough that q(B,i)nβ/2>0.

Note that the only time Q¯1(B,i)(t) decreases but Q1(B)(t) does not is when the latter is smaller than q(B,i1)nβ/2, so we are guaranteed that

Q¯1(B,i)(t)Q1(B)(t), for all tmin{τB,i,inft0{Q¯1(B,i)(t)=q(B,i1)nβ/2}}.(B.35)

Recalling the definitions of τB,i and q(B,i), we have

min{τB,i,inft0{Q¯1(B,i)(t)=q(B,i1)nβ/2}}=min{inft0{Q1(B)(t)=q(B,i)},inft0{Q¯1(B,i)(t)=q(B,i1)nβ/2}}=min{inft0{Q1(B)(t)=q(B,i1)+nβ/2/(b+1)},inft0{Q¯1(B,i)(t)=q(B,i1)nβ/2}}min{inft0{Q¯1(B,i)(t)=q(B,i1)+nβ/2/(b+1)},inft0{Q¯1(B,i)(t)=q(B,i1)nβ/2}},
where the last inequality follows from (B.35). Let τ¯B,i equal the right-hand side and note that
τ¯B,i=inft0{(Q¯1(B,i)(t)Q¯1(B,i)(0)){nβ/2,nβ/2/(b+1)}}
because Q¯1(B,i)(0)=q(B,i1). Because τ¯B,iτB,i, it follows that
min0q22γni=1b+1(Gi<τB,i|Q1(B)(0)=q(B,i1))min0q22γni=1b+1(Gi<τ¯B,i|Q1(B)(0)=q(B,i1)).

Recall that Gi corresponds to the service time of a group A server and is therefore independent of τ¯B,i. Furthermore, because Gi is exponentially distributed with unit mean, conditioning on the value of τ¯B,i yields

min0q22γn(Gi<τ¯B,i|Q¯1(B,i)(0)=q(B,i1))=min0q22γn(1E(eτ¯B,i|Q¯1(B,i)(0)=q(B,i1)))=1max0q22γnE(eτ¯B,i|Q¯1(B,i)(0)=q(B,i1)).

Applying (B.34) of Lemma B.7 concludes because our construction of {Q¯1(B,i)(t)} implies that τ¯B,i is the duration of a gambler’s ruin game with initial wealth z=nβ/2, terminal wealth a=nβ/2+nβ/2/(b+1), rate nλ+q(B,i)nβ/2, and up-step and down-step probabilities

nλnλ+q(B,i)nβ/2 and q(B,i)nβ/2nλ+q(B,i)nβ/2.

Table

Table B.4. Transition Rates in State (u,u¯) with uq(B,i1)nβ/2

Table B.4. Transition Rates in State (u,u¯) with uq(B,i1)nβ/2

RateTransition
nλ(u+1,u¯+1)
q(B,i1)nβ/2(u1,u¯1)
u(q(B,i1)nβ/2)(u1,u¯)
Table

Table B.5. Transition Rates in State (u,u¯) with u<q(B,i1)nβ/2

Table B.5. Transition Rates in State (u,u¯) with u<q(B,i1)nβ/2

RateTransition
u(u1,u¯1)
q(B,i1)nβ/2u(u,u¯1)
B.4.1 Proving the Gambler’s Ruin Result.

We require the following auxiliary lemma.

Lemma B.8.

Assume {xnR} is a sequence that converges to x¯. Then

limn(1+xnn)nex¯.

Proof of Lemma B.8.

Let f(x)=ex and fn(x)=(1+xn)n; note that for any n0,

|fn(xn)ex¯||fn(xn)fn(x¯)|+|fn(x¯)ex¯|.

From the mean-value theorem, we know that there exists some cn between xn and x¯ such that

|fn(xn)fn(x¯)||xnx¯|fn(cn)=|xnx¯|(1+cnn)n1.

Because xnx¯, it follows that (1+cn/n)n1(1+2|x¯|/n)n1 for n large enough; therefore,

|fn(xn)ex¯||xnx¯|(1+2|x¯|n)n1+|fn(x¯)ex¯|.

We can make the right-hand side arbitrarily small by increasing n. □

Proof of Lemma B.7.

Recall that Eei=1DzEi=E(r/(r+1))Dz and that

EsDz=λ1a(s)λ2z(s)λ1z(s)λ2a(s)λ1a(s)λ2a(s)+λ1z(s)λ2z(s)λ1a(s)λ2a(s)=λ2z(s)(λ1a(s)1)λ1z(s)(λ2a(s)1)λ1a(s)λ2a(s),
where
λ1(s)=1+14pqs22ps and λ2(s)=114pqs22ps,s(0,1).

Fix s=r/(r+1). To show that limnEsDz<1, we derive expressions for limnλjz(s) and limnλja(s). For notational economy, we let θ3=nβ/2. We can write p and q as

p=nλnλ+q(B,i)θ3=12+12nλ(q(B,i)θ3)nλ+q(B,i)θ3,q=1212nλ(q(B,i)θ3)nλ+q(B,i)θ3,
and
pq=1414(nλ(q(B,i)θ3)nλ+q(B,i)θ3)2.

Let us first consider λ1(s), which satisfies

λ1(s)=(1+1s2+(nλ(q(B,i)θ3)nλ+q(B,i)θ3)2s2)s1(1+nλ(q(B,i)θ3)nλ+q(B,i)θ3)1=(1+1n[n(1s2)+n(nλ(q(B,i)θ3)nλ+q(B,i)θ3)2s2])s1(1+1n[nnλ(q(B,i)θ3)nλ+q(B,i)θ3])1.(B.36)

We now show that the terms inside the square brackets have limits x¯,y¯R as n; that is,

limnn(1s2)+n(nλ(q(B,i)θ3)nλ+q(B,i)θ3)2s2=x¯ and limnnnλ(q(B,i)θ3)nλ+q(B,i)θ3=y¯.(B.37)

Note that limns2=1; recall the definition of r to see that limnr/n=1+λ, so

limnn(1s2)=limnn(2r+1)1+2r+r2=limn(2r+1)/n(1+2r+r2)/n2=limn2r/n=21+λ.

Furthermore, recalling the definition of q(B,i), we have

limnn(nλ(q(B,i)θ3)nλ+q(B,i)θ3)2=limnn(βn+q2+1+2nβ/2nβ/2i1b+1nλ+nq212nβ/2+nβ/2i1b+1)2=(limnq2/ni1b+1β/2λ+1)2.(B.38)

We know that limnq2/n exists because q2 is fixed between zero and 2γn. This proves (B.37). Recall that z=nβ/2 and a=nβ/2+nβ/21b+1. Because r/n1+λ, it follows that

limnsa=limn(11/(r+1))nβ/2+nβ/2/(b+1)=1 and limnsz=limnsnβ/2=1;
combined with (B.36), (B.37), and Lemma B.8, this implies that
limnλ1z(s)=limnλ1nβ/2(s)=exp(x¯β2)exp(y¯β2);limnλ1a(s)=limnλ1nβ/2+nβ/21b+1(s)=exp(x¯β2b+2b+1)exp(y¯β2b+2b+1).

The expressions for limnλ2z(s) and limnλ2a(s) follow similarly. Comparing

λ2(s)=(11s2+(nλ(q(B,i)θ3)nλ+q(B,i)θ3)2s2)s1(1+nλ(q(B,i)θ3)nλ+q(B,i)θ3)1
to the form of λ1(s) in (B.36), we see that we can use (B.37) and Lemma B.8 again to conclude that
limnλ2z(s)=exp(x¯β2)exp(y¯β2), and limnλ2a(s)=exp(x¯β2b+2b+1)exp(y¯β2b+2b+1).

For convenience, we define x=(x¯y¯)β/2 and y=(x¯+y¯)β/2, so that

limnλ1z(s)=ex,limnλ1a(s)=ex(b+2)/(b+1),limnλ2z(s)=ey,limnλ1a(s)=ey(b+2)/(b+1).

It is straightforward to check that x,y>0 using (B.37). Let us now prove that limnEsDz<1. Using the definition of EsDz, we have

limnEsDz=limnλ2z(s)(λ1a(s)1)λ1z(s)(λ2a(s)1)λ1a(s)λ2a(s)=ey(ex(b+2)/(b+1)1)ex(ey(b+2)/(b+1)1)ex(b+2)/(b+1)ey(b+2)/(b+1).

Set c=(b+2)/(b+1). We want to show that for any x,y>0,

ey(exc1)ex(eyc1)<exceyc, or eyexcexeyc<exceyc+eyex.

Rearranging terms, this is equivalent to

exc(ey1)ex(eyc1)<eyc+ey.

Fix y>0 and treat the left-hand side as a function of x. Both sides are equal when x = 0, so it suffices to show that the derivative of the left-hand side with respect to x is negative. Now

x(exc(ey1)ex(eyc1))=cexc(ey1)ex(eyc1).(B.39)

For the right-hand side to be negative, we must have

cex(c1)>1eyc1ey.

Because c=(b+2)/(b+1)>1, the left-hand side is bounded from below by c provided that x0. The right-hand side converges to c as y0, so we must show that the derivative of the right-hand side is negative. Differentiating yields

y1eyc1ey=ceyc(1ey)ey(1eyc)(1ey)2=ey×cey(c1)ceyc1+eyc(1ey)2.

The numerator cey(c1)(c1)ey(c1)1 equals zero when y = 0. Its derivative equals

c(c1)ey(c1)+(c1)2eyc<c(c1)ey(c1)+(c1)2ey(c1)cc1=0,y0,
where the inequality is due to ey1<c/(c1). Therefore, the numerator is strictly negative for y > 0, meaning that (B.39) holds. □

B.5. Proof of Lemma 8

It suffices to show that Eτ(x1q)C(β)δ because

(Vτ(x1q))=0(Vt)dF(t)=0(1et)dF(t)=1Eeτ(x1q)Eτ(x1q),
where F(t) is the distribution function of τ(x1q). Define
τ+(q1)=inft0{Q(t)=(q1+1,0,,0)|Q(0)=(q1,0,,0)},0q1n1,
and note that τ+(q1)=τ(x1q). If we let {πq}qSQ be the stationary distribution of the unscaled CTMC, it follows from (2.11) of Brown and Xia (2001) that
Eτ+(q1)=i=0q1πi,0,,0nλπq1,0,,0.

Letting f(xq)=1(x1qi) and using EGXf(X)=0 yields nλπi,0,,0=(i+1)πi+1,0,,0, which implies that πi,0,,0=π0,,0(nλ)i/i!, so

Eτ+(q1)=k=0q1(nλ)kk!nλ(nλ)q1q1!=q1!(nλ)q1+1k=0q1(nλ)kk!.

Note that x1q=β+δ(nλnλ) is equivalent to q1=nλ. If nλ=0, we observe that the right-hand side equals 1/(nλ), which verifies (31) when x1q=β+δ(nλnλ). If, however, nλ>0, we may use Stirling’s approximation to see that for q1>0,

q1!(nλ)q1+1k=0q1(nλ)kk!3q1q1+1/2eq1(nλ)q1+1k=0q1(nλ)kk!3q1q1+1/2eq1(nλ)q1+1enλ.

Setting q1=nλ proves (31) when x1=β+δ(nλnλ). To prove (31) when x1q=δ and x1q=2δ requires just a little more work. Setting q1=n1,

Eτn1+3(n1)n1/2e(n1)(nλ)nenλ3en1nn(nβn)nenenλ=3en1(1βn)neβn.

To conclude, we need to bound

((1βn)neβ)n=(exp(nlog(1βn)β))n.

Using Taylor expansion,

log(1βn)=βn12(βn)21(1+ξ(β/n))2,
where ξ(β/n)[β/n,0]. Therefore,
(exp(nlog(1βn)β))n=exp(β2/2(1+ξ(β/n))2),
and we conclude that
supn0((1βn)neβ)n<.

The argument when q1=n2 is identical. This proves (31) when x1=δ,2δ. □

References

  • Atar R (2012) A diffusion regime with nondegenerate slowdown. Oper. Res. 60(2):490–500.LinkGoogle Scholar
  • Banerjee S, Mukherjee D (2019) Join-the-shortest queue diffusion limit in Halfin-Whitt regime: Tail asymptotics and scaling of extrema. Ann. Appl. Probab. 29(2):1262–1309.Google Scholar
  • Banerjee S, Mukherjee D (2020) Join-the-shortest Queue diffusion limit in Halfin-Whitt regime: Sensitivity on the heavy-traffic parameter. Ann. Appl. Probab. 30(1):80–144.Google Scholar
  • Barbour AD (1988) Stein’s method and Poisson process convergence. J. Appl. Probab. 25:175–184.Google Scholar
  • Barbour A (1990) Stein’s method for diffusion approximations. Probab. Theory Related Fields 84(3):297–322.Google Scholar
  • Braverman A (2020) Steady-state analysis of the join the shortest queue model in the Halfin-Whitt regime. Math. Oper. Res. 45(3):1069–1103.LinkGoogle Scholar
  • Braverman A (2022) The prelimit generator comparison approach of Stein’s method. Stochast. Systems 12(2):181–204.LinkGoogle Scholar
  • Braverman A, Dai JG (2017) Stein’s method for steady-state diffusion approximations of M/Ph/n+M systems. Ann. of Appl. Probab. 27(1):550–581.Google Scholar
  • Brown TC, Xia A (2001) Stein’s method and birth-death processes. Ann. Probab. 29(3):1373–1403.Google Scholar
  • Cao P, He S, Huang J, Liu Y (2021) To pool or not to pool: Queueing design for large-scale service systems. Oper. Res. 69(6):1866–1885.LinkGoogle Scholar
  • Erdogdu MA, Mackey L, Shamir O (2018) Global non-convex optimization with discretized diffusions. Preprint, submitted October 29, https://arxiv.org/abs/1810.12361v1.Google Scholar
  • Eryilmaz A, Srikant R (2012) Asymptotically tight steady-state queue length bounds implied by drift conditions. Queueing Systems 72(3-4):311–359.Google Scholar
  • Eschenfeldt P, Gamarnik D (2018) Join the shortest queue with many servers: The heavy-traffic asymptotics. Math. Oper. Res. 43(3):867–886.LinkGoogle Scholar
  • Fang X, Shao QM, Xu L (2018) Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Preprint, submitted January 24, https://arxiv.org/abs/1801.07815.Google Scholar
  • Feller W (1968) An Introduction to Probability Theory and Its Applications, vol. I, 3rd ed. (John Wiley & Sons Inc., New York).Google Scholar
  • Gast N (2017) Expected values estimated via mean-field approximation are 1/n-accurate. Proc. ACM Measurement Anal. Comput. Syst. 1(1):1–26.Google Scholar
  • Gast N, Van Houdt B (2017) A refined mean field approximation. Proc. ACM Measurement Anal. Comput. Syst. 1(2):1–28.Google Scholar
  • Gast N, Bortolussi L, Tribastone M (2019) Size expansions of mean field approximation: Transient and steady-state analysis. Performance Evaluation 129:60–80.Google Scholar
  • Gaunt RE, Walton N (2020) Stein’s method for the single server queue in heavy traffic. Statist. Probab. Lett. 156:108566.Google Scholar
  • Götze F (1991) On the rate of convergence in the multivariate CLT. Ann. Probab. 19(2):724–739.Google Scholar
  • Gupta V, Walton N (2019) Load balancing in the nondegenerate slowdown regime. Oper. Res. 67(1):281–294.LinkGoogle Scholar
  • Gurvich I (2014) Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Ann. Appl. Probab. 24(6):2527–2559.Google Scholar
  • Hairi LX, Ying L (2021) Beyond scaling: Calculable error bounds of the power-of-two-choices mean-field model in heavy-traffic. Proc. Twenty-Second Internat. Sympos. Theory, Algorithmic Foundations, Protocol Design Mobile Networks Mobile Comput. (Association for Computing Machinery, New York), 1–10.Google Scholar
  • Halfin S, Whitt W (1981) Heavy-traffic limits for queues with many exponential servers. Oper. Res. 29(3):567–588.LinkGoogle Scholar
  • Hurtado-Lange D, Maguluri ST (2022) A load balancing system in the many-server heavy-traffic asymptotics. Queueing Systems Theory Appl. 101(3–4):353–391.Google Scholar
  • Jin X, Pang G, Xu L, Xu X (2022) An approximation to the invariant measure of the limiting diffusion of G/Ph/n+GI queues in the Halfin-Whitt regime and related asymptotics. Preprint, submitted September 15, https://arxiv.org/abs/2209.07361.Google Scholar
  • Kallenberg O (2001) Foundations of Modern Probability, Springer Series in Statistics, Probability and Its Applications, 2nd ed. (Springer, New York).Google Scholar
  • Liu X, Ying L (2019) A simple steady-state analysis of load balancing algorithms in the sub-halfin-whitt regime. SIGMETRICS Perform. Eval. Rev. 46(2):15–17.Google Scholar
  • Liu X, Ying L (2020) Steady-state analysis of load-balancing algorithms in the Sub-Halfin-Whitt regime. J. Appl. Probab. 57(2):578–596.Google Scholar
  • Liu X, Gong K, Ying L (2022) Steady-state analysis of load balancing with coxian-2 distributed service times. Naval Res. Logist. 69(1):57–75.Google Scholar
  • Lu Y (2021) On a stein method based approximation for a two-dimensional Markov chain. Preprint, submitted June 6, https://arxiv.org/abs/2106.03145.Google Scholar
  • Mackey L, Gorham J (2016) Multivariate Stein factors for a class of strongly log-concave distributions. Electronic Comm. Probab. 21:1–14.Google Scholar
  • Mitzenmacher M (2001) The power of two choices in randomized load balancing. IEEE Trans. Parallel Distributed Systems 12(10):1094–1104.Google Scholar
  • Mukherjee D, Borst SC, van Leeuwaarden JSH, Whiting PA (2016) Universality of load balancing schemes on the diffusion scale. J. Appl. Probab. 53(4):1111–1124.Google Scholar
  • Ross N (2011) Fundamentals of Stein’s method. Probab. Surveys 8:210–293.Google Scholar
  • Stein C (1972) Probability theory: A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, vol. 2, Proc. Sixth Berkeley Sympos. Math. Statist. Probab. (University of California Press, Berkeley), 583–602.Google Scholar
  • Stolyar AL (2015) Pull-based load distribution in large-scale heterogeneous service systems. Queueing Systems 80(4):341–361.Google Scholar
  • van der Boor M, Borst SC, van Leeuwaarden JSH, Mukherjee D (2021) Scalable load balancing in networked systems: A survey of recent advances. Preprint, submitted November 4, https://arxiv.org/abs/1806.05444.Google Scholar
  • Vvedenskaya N, Dobrushin R, Karpelevich F (1996) Queueing system with selection of the shortest of two queues: An asymptotic approach. Problems Inform. Transmission 32(1):15–27.Google Scholar
  • Weber RR (1978) On the optimal assignment of customers to parallel servers. J. Appl. Probab. 15(2):406–413.Google Scholar
  • Winston W (1977) Optimality of the shortest line discipline. J. Appl. Probab. 14(1):181–189.Google Scholar
  • Ying L (2017) Stein’s method for mean field approximations in light and heavy traffic regimes. Proc. ACM Measurement Anal. Comput. Systems 1(1):1–27.Google Scholar
  • Zhao Z, Banerjee S, Mukherjee D (2021) Many-server asymptotics for join-the-shortest queue in the super-Halfin-Whitt scaling window. Preprint submitted May 31, https://arxiv.org/abs/2106.00121.Google Scholar
  • Zhou X, Shroff N (2020a) A note on load balancing in many-server heavy-traffic regime. Preprint, submitted April 20, https://arxiv.org/abs/2004.09574v1.Google Scholar
  • Zhou X, Shroff N (2020b) A note on Stein’s method for heavy-traffic analysis. Preprint, submitted Mar 13, https://arxiv.org/abs/2003.06454v1.Google Scholar