Stein's method for steady-state diffusion approximations: an introduction through the Erlang-A and Erlang-C models

This paper provides an introduction to the Stein method framework in the context of steady-state diffusion approximations. The framework consists of three components: the Poisson equation and gradient bounds, generator coupling, and moment bounds. Working in the setting of the Erlang-A and Erlang-C models, we prove that both the Wasserstein and Kolmogorov distances between the stationary distribution of a normalized customer-count process and that of an appropriately defined diffusion process decrease at a rate of $1/\sqrt{R}$, where $R$ is the offered load. Furthermore, these error bounds are \emph{universal}, valid in any load condition from lightly loaded to heavily loaded.

1. Introduction. In [10], the authors developed a framework based on Stein's method [53,54] to prove rates of convergence for steady-state diffusion approximations. Using their framework, they proved convergence rates for the steady-state approximation of the M/Ph/n + M system, the many-server queue with customer abandonment and phase-type service times, in the Halfin-Whitt regime [32]. The framework in [10] is modular and has four components: the Poisson equation and gradient bounds, generator coupling, moment bounds, and state space collapse (SSC). The purpose of this paper is to provide an accessible introduction to the Stein framework, focusing on two simple and yet fundamental systems: the M/M/n + M system, known as the Erlang-A system, and the M/M/n system, known as the Erlang-C system. The accessibility is due to the fact that both systems can be represented by a one-dimensional continuous-time Markov chain (CTMC). In addition, by focusing on these two systems we are able to present some sharp results that serve as benchmarks that future research should aspire to meet.
Stein's method is a powerful tool for studying approximations of probability distributions, and is best known for its ability to establish convergence rates. It has been widely used in probability, statistics, and their wide range of applications such as bioinformatics; see, for example, the survey papers [13,52], the recent book [15], and the references therein. Applications of Stein's method always involve some unknown distribution to be approximated, and an approximating distribution. For instance, the first appearance of the method in [53] involved the sum of identically distributed dependent random variables as the unknown, and the normal as the approximating distribution. Other approximating distributions include the Poisson [14], binomial [19], and multinomial [46] distributions, just to name a few. For each approximating distribution, one needs to establish separately gradient bounds, also known as Stein factors [4,60], like those in Lemma 3 in Section 3. In this paper, the approximating distribution is the stationary distribution of a diffusion process, and the unknown is the stationary distribution of the CTMC introduced in (1.1) below.
Both the Erlang-A and Erlang-C systems have n homogeneous servers that serve customers in a first-come-first-served manner. Customers arrive according to a Poisson process with rate λ, and customer service times are assumed to be i.i.d., exponentially distributed with mean 1/µ. In the Erlang-A system, each customer has a patience time, and when his waiting time in queue exceeds his patience time, he abandons the queue without service; the patience times are assumed to be i.i.d., exponentially distributed with mean 1/α. The Erlang-A system is a special case of the systems studied in [10], where SSC played an important role. The systems in this paper can be represented by a one-dimensional CTMC, meaning that there is no need to invoke SSC. Therefore, this paper illustrates only the first three components of the framework proposed in [10].
We will study the birth-death process
(1.1) X = {X(t), t ≥ 0},
where X(t) is the number of customers in the system at time t. In the Erlang-A system, α is assumed to be positive and therefore the mean patience time is finite. This guarantees that the CTMC X is positive recurrent. In the Erlang-C system, α = 0, and in order for the CTMC to be positive recurrent we need to assume that the offered load to the system, defined as R = λ/µ, satisfies
(1.2) R < n.
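To make the birth-death structure of (1.1) concrete, the following sketch computes the stationary distribution of X via detailed balance. It is an illustration, not code from the paper: the function name, the truncation level kmax, and the log-space normalization are our own choices.

```python
import math

def stationary_dist(lam, mu, n, alpha, kmax):
    """Stationary distribution of the Erlang-A/C birth-death chain,
    truncated at kmax customers.  Detailed balance gives
    pi(k+1)/pi(k) = lam / d(k+1), where
    d(k) = mu*min(k, n) + alpha*max(k - n, 0) is the departure rate."""
    logp = [0.0]  # log of the unnormalized pi(k), with pi(0) := 1
    for k in range(1, kmax + 1):
        d = mu * min(k, n) + alpha * max(k - n, 0)
        logp.append(logp[-1] + math.log(lam / d))
    m = max(logp)                           # normalize in log space for stability
    w = [math.exp(v - m) for v in logp]
    total = sum(w)
    return [v / total for v in w]

# Sanity check against the M/M/1 queue (n = 1, alpha = 0, rho = lam/mu):
# there pi(k) = (1 - rho) * rho**k, so pi(0) = 0.5 when lam = 0.5, mu = 1.
pi = stationary_dist(lam=0.5, mu=1.0, n=1, alpha=0.0, kmax=200)
```

For α = 0 and n = 1 this reduces to the M/M/1 queue, whose geometric stationary distribution provides a convenient sanity check.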
For both Erlang-A and Erlang-C systems, we use X(∞) to denote the random variable having the stationary distribution of X.
Consider the case when α = 0 and (1.2) is satisfied. Set
(1.3) δ = 1/√R and X̃(∞) = δ(X(∞) − R),
and let Y(∞) denote a continuous random variable on ℝ having density
(1.4) ν(x) = κ exp(−x²/2) for x ≤ (n − R)δ, and ν(x) = κ exp(−(n − R)δ x + ((n − R)δ)²/2) for x ≥ (n − R)δ,
where κ > 0 is a normalizing constant that makes the density integrate to one. Although our choice of notation does not make this explicit, we highlight that the random variable Y(∞) depends on λ, µ, and n, meaning that we are actually dealing with a family of random variables {Y^(λ,µ,n)(∞)}_(λ,µ,n). This plays a role in Lemma 3 in Section 3, for example, where we need to know exactly how the gradient bounds depend on λ, µ, and n. The following theorem illustrates the type of result that can be obtained by Stein's method.
Theorem 1. Consider the Erlang-C system (α = 0). For all n ≥ 1, λ > 0, and µ > 0 satisfying 1 ≤ R < n,
(1.5) d_W(X̃(∞), Y(∞)) = sup_{h ∈ Lip(1)} |E h(X̃(∞)) − E h(Y(∞))| ≤ 205/√R,
where Lip(1) is the class of functions h: ℝ → ℝ with |h(x) − h(y)| ≤ |x − y|.
The framework developed in [10] was inspired largely by the work of Gurvich in [28], who developed methodologies to prove statements similar to Theorem 1 for a broad class of multidimensional CTMCs. Along the way, he independently rediscovered many of the ideas central to Stein's method in the setting of steady-state diffusion approximations. See [10] for a more detailed discussion of Gurvich's results.
Several points are worth mentioning. First, we note that Theorem 1 is not a limit theorem. Steady-state approximations are usually justified by some kind of limit theorem. That is, one considers a sequence of queueing systems and proves that the corresponding sequence of steady-state distributions converges to some limiting distribution as the traffic intensity approaches one, or as the number of servers goes to infinity. In contrast, our theorem holds for any finite parameter choices of λ, n, and µ satisfying (1.2) and R ≥ 1. Second, the error bound in (1.5) is universal, as it does not assume any relationship between λ, n, and µ, other than the stability condition (1.2) and the condition that R ≥ 1. The latter condition is a mere convenience, as Theorem 1 could be restated without it, but the error bound would then contain some terms involving 1/R. One consequence of universality is that the error bound holds when the parameters λ, n, and µ fall in any one of the following asymptotic regimes:
n = (1 + β)R, n = R + β√R, or n = R + β,
where β > 0 is fixed, while R → ∞. The first two parameter regimes above describe the quality-driven (QD) and quality-and-efficiency-driven (QED) regimes, respectively. The last regime is the nondegenerate-slowdown (NDS) regime, which was studied in [1,62]. Universal approximations were previously studied in [31,59]. Third, as part of the universality of Theorem 1, we see that
(1.6) |E X(∞) − R − √R E Y(∞)| ≤ 205.
For a fixed n, let ρ = R/n ↑ 1. One expects E X(∞) to be on the order of 1/(1 − ρ). Conventional heavy-traffic limit theorems often guarantee only that the left hand side of (1.6) is o(1/√(1 − ρ)), whereas our error is bounded by a constant regardless of the load condition. This suggests that the diffusion approximation for the Erlang-C system is accurate not only as R → ∞, but also in the heavy-traffic setting when R → n. Table 1 contains some numerical results where we calculate the error on the left side of (1.6). The constant 205 in (1.6) is unlikely to be a sharp upper bound.
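To illustrate the kind of computation behind Table 1 and the constant bound in (1.6), the sketch below compares the exact mean E X(∞) of an Erlang-C system with the diffusion prediction R + √R·E Y(∞). Hedges: the piecewise normal/exponential density used here for Y(∞) is our reading of the diffusion model of this paper (standard-normal shape below the threshold (n − R)/√R, exponential decay above it), and all function names are ours.

```python
import math

def erlang_c_mean(lam, mu, n, kmax=2000):
    """Exact E X(inf) for the M/M/n queue, from detailed balance."""
    logp = [0.0]
    for k in range(1, kmax + 1):
        logp.append(logp[-1] + math.log(lam / (mu * min(k, n))))
    m = max(logp)
    w = [math.exp(v - m) for v in logp]
    z = sum(w)
    return sum(k * p for k, p in enumerate(w)) / z

def diffusion_mean(beta, h=1e-3, cut=30.0):
    """E Y(inf) for the assumed piecewise density:
    nu(x) ~ exp(-x^2/2) for x <= beta, ~ exp(-beta*x + beta^2/2) for x >= beta,
    where beta = (n - R)/sqrt(R).  Simple Riemann-sum quadrature."""
    m = int(2 * cut / h) + 1
    xs = [-cut + i * h for i in range(m)]
    dens = [math.exp(-x * x / 2) if x <= beta
            else math.exp(-beta * x + beta * beta / 2) for x in xs]
    z = sum(dens)
    return sum(x * d for x, d in zip(xs, dens)) / z

lam, mu, n = 4.0, 1.0, 5
R = lam / mu
approx = R + math.sqrt(R) * diffusion_mean((n - R) / math.sqrt(R))
err = abs(erlang_c_mean(lam, mu, n) - approx)
# err is a small constant here, consistent with the constant bound in (1.6)
```

Even for this small system (n = 5, ρ = 0.8) the unscaled mean error is a small constant, far below the conservative bound 205.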
In this paper we did not focus too much on optimizing the upper bound, as Stein's method is not known for producing sharp constants. From Theorem 1 we know that the first moment of X̃(∞) can be approximated universally by the first moment of Y(∞). It is natural to ask what can be said about the approximation of higher moments. We performed some numerical experiments in which we approximate the second and tenth moments of X̃(∞) in a system with n = 500; the results are displayed in Table 2 (Table 2: Approximating the second and tenth moments of X̃(∞) with n = 500. The approximation error grows as R approaches n and suggests that the diffusion approximation of higher moments is not universal). One can see that the approximation errors grow as the offered load R gets closer to n. We will see in Section 6 that this happens because the
(m − 1)th moment appears in the approximation error of the mth moment. A similar phenomenon was first observed for the M/GI/1 + GI model in Theorem 1 of [30]. Theorem 1 provides rates of convergence under the Wasserstein metric [52]. The Wasserstein metric is one of the most commonly studied metrics in the context of Stein's method. This is because the space Lip(1) is relatively simple to work with, but is also rich enough so that convergence under the Wasserstein metric implies convergence in distribution [25]. Another metric commonly studied in problems involving Stein's method is the Kolmogorov metric, which measures the distance between the cumulative distribution functions of two random variables. The Kolmogorov distance between X̃(∞) and Y(∞) is
(1.8) d_K(X̃(∞), Y(∞)) = sup_{h ∈ H_K} |E h(X̃(∞)) − E h(Y(∞))|, where H_K = {1_{(−∞,a]}(·): a ∈ ℝ}.
Theorems 3 and 4 of Section 2 involve the Kolmogorov metric. A general trend in Stein's method is that establishing convergence rates for the Kolmogorov metric often requires much more effort than establishing rates for the Wasserstein metric, and our problem is no exception. The extra difficulty always comes from the fact that the test functions belonging to the class H_K are discontinuous, whereas the ones in Lip(1) are Lipschitz-continuous. In Section 5, we describe how to overcome this difficulty in our model setting.
The first paper to establish convergence rates for steady-state diffusion approximations was [31], which studied the Erlang-A system using an excursion-based approach. Their approximation error bounds are universal. Although the authors in [31] did not study the Erlang-C system, their approach appears to be extendable to it as well. However, their method is not readily generalizable to the multi-dimensional setting.
We wish to point out that the results proved in this paper are quite sharp, and proving analogous results in a high dimensional setting is likely much more difficult. For instance, the constants in the error bounds in each theorem of this paper can be recovered explicitly, but this is not true in the problem studied in [10]. Moreover, the result of [10] is restricted to the Halfin-Whitt regime, and is not universal. All of this is because the model considered there is high dimensional.
1.1. Related literature. Among the papers applying Stein's method to birth-death processes, [11] and the more recent [44] are the most relevant to this work. The former studies one-dimensional birth-death processes, with the focus being that many common distributions such as the Poisson, binomial, hypergeometric, negative binomial, etc., can be viewed as stationary distributions of a birth-death process. Although the Erlang-A and Erlang-C models are also birth-death processes, the focus in our paper is on how well these models can be approximated by diffusions, e.g. qualitative features of the approximation like the universality in Theorem 1. Diffusion approximations go beyond approximations of birth-death processes, with the real interest lying in cases when a higher-dimensional Markov chain collapses to a one-dimensional diffusion, e.g. [18,55,58], or when the diffusion approximation is multi-dimensional [9,33,49,51,63].
In [44], the authors apply Stein's method to one-dimensional diffusions. The motivation is again that many common distributions like the gamma, uniform, beta, etc., happen to be stationary distributions of diffusions. Their chief result is to establish gradient bounds for a very large class of diffusion processes, requiring only the mild condition that the drift of the diffusion be a decreasing function. However, their result cannot be applied here, because it is impossible to say how their gradient bounds depend on the parameters of the diffusion. Detailed knowledge of this dependence is crucial, because we are dealing with a family of approximating distributions; cf. (1.3) and the comments below (1.4).
Outside the diffusion approximation domain, Ying has recently successfully applied Stein's framework to establish error bounds for steady-state mean-field approximations [65,66]. There is one additional recent line of work [6,39,40,41,67] that deserves mention, where the theme is corrected diffusion approximations using asymptotic series expansions. In particular, [41] considers the Erlang-C system and [67] considers the Erlang-A system. In these papers, the authors derive series expansions for various steady-state quantities of interest like the probability of waiting P(X(∞) ≥ n). These types of series expansions are very powerful because they allow one to approximate steady-state quantities of interest to arbitrary precision. However, while accurate, these expansions vary for different performance metrics (e.g. waiting probability, expected queue length), and require nontrivial effort to derive. They also depend on the choice of parameter regime, e.g. Halfin-Whitt. In contrast, the results provided by the Stein approach can be viewed as more robust because they capture multiple performance metrics and multiple parameter regimes at the same time.
1.2. Notation. For two random variables U and V, define their Wasserstein distance to be
(1.7) d_W(U, V) = sup_{h ∈ Lip(1)} |E h(U) − E h(V)|, where Lip(1) = {h: ℝ → ℝ : |h(x) − h(y)| ≤ |x − y|}.
It is known (see, for example, [52]) that convergence under the Wasserstein metric implies convergence in distribution. When Lip(1) in (1.7) is replaced by H_K = {1_{(−∞,a]}(·): a ∈ ℝ}, the corresponding distance is the Kolmogorov distance, denoted by d_K(U, V). For a, b ∈ ℝ, we use a⁺, a⁻, a ∧ b, and a ∨ b to denote max(a, 0), max(−a, 0), min(a, b), and max(a, b), respectively.
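For one-dimensional random variables both metrics can be computed directly from CDFs: d_W(U, V) = ∫ |F_U − F_V| dx and d_K(U, V) = sup_x |F_U(x) − F_V(x)|. A grid-based sketch (the function name and discretization are our own):

```python
def wasserstein_kolmogorov(F, G, lo, hi, m=300001):
    """Approximate d_W = integral of |F - G| and d_K = sup |F - G|
    for two CDFs F, G given as callables, on a grid over [lo, hi]."""
    h = (hi - lo) / (m - 1)
    diffs = [abs(F(lo + i * h) - G(lo + i * h)) for i in range(m)]
    d_w = sum(diffs) * h          # Riemann sum for the L1 distance of the CDFs
    d_k = max(diffs)
    return d_w, d_k

# Example: U ~ Uniform[0, 1] versus V identically 0 (a point mass at zero).
F = lambda x: min(max(x, 0.0), 1.0)      # Uniform[0,1] CDF
G = lambda x: 1.0 if x >= 0 else 0.0     # point-mass CDF
dw, dk = wasserstein_kolmogorov(F, G, -1.0, 2.0)
# Exact values: d_W = integral over [0,1] of (1 - x) dx = 1/2 and d_K = 1.
```

The example also shows why d_K is the harsher metric for discrete-versus-continuous comparisons: the point mass forces d_K = 1 while d_W remains 1/2.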
The rest of the paper is structured as follows. In Section 2 we state our main results. In Section 3 we present the Stein framework needed to prove our main results, Theorems 1-4, by introducing the three ingredients central to our framework: the Poisson equation and gradient bounds, generator coupling, and moment bounds. In Section 4 we prove Theorem 1, which deals with the Wasserstein metric. The Kolmogorov metric presents an additional challenge, because the test functions are discontinuous. In Section 5 we address this new challenge. In Section 6, we discuss the approximation of higher moments in the Erlang-C model. Proofs of technical lemmas are left to the four appendices.
2. Main results. Recall the offered load R = λ/µ. For notational convenience we define δ > 0 as δ = 1/√R. Let x(∞) be the unique solution to the flow balance equation
(2.1) λ = µ(x(∞) ∧ n) + α(x(∞) − n)⁺.
Here, x(∞) is interpreted as the equilibrium number of customers in the corresponding fluid model, and is the point at which the arrival rate equals the departure rate. The latter is the sum of the service completion rate and the customer abandonment rate with x(∞) customers in the system. One can check that the flow balance equation has a unique solution x(∞) given by
(2.2) x(∞) = R if R ≤ n, and x(∞) = n + (λ − nµ)/α if R > n.
By noting that the number of busy servers x(∞) ∧ n equals n minus the number of idle servers (x(∞) − n)⁻, the equation in (2.1) becomes
(2.3) λ = nµ − µ(x(∞) − n)⁻ + α(x(∞) − n)⁺.
We note that x(∞) is well-defined even when α = 0, because in that case we always assume that R < n. We consider the CTMC
(2.4) X̃ = {X̃(t) = δ(X(t) − x(∞)), t ≥ 0},
and let the random variable X̃(∞) have its stationary distribution. Define
(2.5) ζ = δ(x(∞) − n),
with the convention that α is set to be zero in the Erlang-C system. For intuition about the quantity ζ, we note that in the Erlang-C system satisfying (1.2),
(2.6) −ζ = (n − R)/√R.
Thus, −ζ = |ζ| > 0 is precisely the "safety coefficient" in the square-root safety-staffing principle [24, equation (15)]. We point out that the event {X̃(t) = −ζ} corresponds to the event {X(t) = n}. Throughout this paper, let Y(∞) denote a continuous random variable on ℝ having density
(2.7) ν(x) = κ exp(∫₀ˣ b(y)/µ dy), x ∈ ℝ,
where κ > 0 is a normalizing constant that makes the density integrate to one, and b(·) is the drift given below in (3.2). Note that these definitions are consistent with (1.3) and (1.4).
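The fluid quantities x(∞), δ, and ζ are elementary to compute; a small sketch (assuming the piecewise form of the flow-balance solution described above; the function name is ours):

```python
import math

def fluid_quantities(lam, mu, n, alpha):
    """x_inf solves lam = mu*min(x, n) + alpha*max(x - n, 0);
    delta = 1/sqrt(R) and zeta = delta*(x_inf - n), with R = lam/mu."""
    R = lam / mu
    if R <= n:
        x_inf = R                            # underloaded: no abandonment in fluid
    else:
        x_inf = n + (lam - n * mu) / alpha   # overloaded: requires alpha > 0
    delta = 1.0 / math.sqrt(R)
    zeta = delta * (x_inf - n)
    return x_inf, delta, zeta

# Erlang-C example: lam=4, mu=1, n=5  ->  x_inf = 4, delta = 0.5, zeta = -0.5.
# Overloaded Erlang-A: lam=12, mu=1, n=10, alpha=2  ->  x_inf = 11.
```

For the Erlang-C example, −ζ = 0.5 is exactly the safety coefficient (n − R)/√R mentioned above.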
Theorem 2. Consider the Erlang-A system (α > 0). There exists an increasing function C_W: ℝ₊ → ℝ₊ such that for all n ≥ 1, λ > 0, µ > 0, and α > 0 satisfying R ≥ 1,
(2.8) d_W(X̃(∞), Y(∞)) ≤ C_W(α/µ) δ.
Remark 1. The proofs of Theorems 1 and 2 use the same ideas. Therefore, for the sake of brevity, we only give an outline of the proof of Theorem 2 and its ingredients in Appendix C.1, without filling in all the details. It is for this reason that we do not write out the explicit form of C_W(α/µ), although it can be obtained from the proof. The same is true for Theorem 4 below.
Given two random variables U and V, [52, Proposition 1.2] implies that when V has a density that is bounded by C > 0,
(2.9) d_K(U, V) ≤ √(2C d_W(U, V)).
At best, (2.9) and Theorems 1 and 2 imply a convergence rate of √δ for d_K(X̃(∞), Y(∞)). However, this bound is typically too crude, and the following two theorems show that convergence happens at rate δ. Theorem 3 is proved in Section 5.3. The proof of Theorem 4 is outlined in Appendix C.2.
Theorem 3. Consider the Erlang-C system (α = 0). For all n ≥ 1, λ > 0, and µ > 0 satisfying 1 ≤ R < n,
(2.10) d_K(X̃(∞), Y(∞)) ≤ Cδ, where C > 0 is an explicit numerical constant.
Theorem 4. Consider the Erlang-A system (α > 0). There exists an increasing function C_K: ℝ₊ → ℝ₊ such that for all n ≥ 1, λ > 0, µ > 0, and α > 0 satisfying R ≥ 1,
(2.11) d_K(X̃(∞), Y(∞)) ≤ C_K(α/µ) δ.
Theorems 1 and 3 are new, but versions of Theorems 2 and 4 were first proved in the pioneering paper [31] using an excursion-based approach. However, our notion of universality in those theorems is stronger than the one in [31], because most of their results require µ and α to be fixed. The only exception is in Appendix C of that paper, where the authors consider the NDS regime with µ = µ(λ) = β√λ and λ = nµ + β₁µ for some β > 0 and β₁ ∈ ℝ.
We emphasize that both constants C W and C K are increasing in α/µ. That is, for an Erlang-A system with a higher abandonment rate with respect to its service rate, our error bound becomes larger. The reader may wonder why these constants depend on α/µ, while the constant in the Erlang-C theorems does not depend on anything. Despite our best efforts, we were unable to get rid of the dependency on α/µ. The reason is that the Erlang-C model depends on only three parameters (λ, µ, n), while the Erlang-A model also depends on α. As a result, both the gradient bounds and moment bounds have an extra factor α/µ in the Erlang-A model. For example, compare Lemma 3 in Section 3.5 with Lemma 13 in Appendix B.2.
3. Outline of the Stein Framework. In this section we introduce the main tools needed to prove Theorems 1-4. At this point we do not restrict ourselves to either the Erlang-A or Erlang-C systems, as the framework outlined here is generic and holds for both systems.
The following is an informal outline of the rest of this section. We know that X̃(∞) follows the stationary distribution of the CTMC X̃, and that this CTMC has a generator G_X̃. To Y(∞), we will associate a diffusion process with generator G_Y. We will start by fixing a test function h: ℝ → ℝ and deriving the identity
(3.1) E h(X̃(∞)) − E h(Y(∞)) = E G_Y f_h(X̃(∞)).
We then focus on bounding the right hand side of (3.1), which is easier to handle than the left hand side. This is done by performing a Taylor expansion of G_X̃ f_h(x) in Section 3.3. To bound the error term from the Taylor expansion, we require bounds on various moments of X̃(∞), as well as on the derivatives of f_h(x). We refer to the former as moment bounds, and the latter as gradient bounds. These are presented in Sections 3.4 and 3.5, respectively.

3.1. The Poisson Equation of a Diffusion Process. The random variable Y(∞) in Theorems 1-4 is well-defined and its density is given in (2.7). It turns out that Y(∞) has the stationary distribution of a diffusion process Y = {Y(t), t ≥ 0}, which we will define shortly. We do not prove this claim, since it is not used anywhere in this paper. Nevertheless, it is helpful to think of Y(∞) in the context of diffusion processes. The diffusion process Y is the one-dimensional piecewise Ornstein-Uhlenbeck (OU) process. Its generator is given by
(3.2) G_Y f(x) = b(x) f′(x) + µ f″(x), x ∈ ℝ, where b(x) = −µ((x + ζ) ∧ 0 − ζ ∧ 0) − α((x + ζ)⁺ − ζ⁺).
Since the diffusion process Y depends on the parameters λ, n, µ, and α, which are arbitrary, there is no appropriate way to talk about the limit of Y(∞) in terms of these parameters. Therefore, we call Y a diffusion model, as opposed to a diffusion limit. Having a diffusion model whose input parameters are directly taken from the corresponding Markov chain model is critical to achieving universal accuracy. In other words, this diffusion model is accurate in any parameter regime, from underloaded, to critically loaded, to overloaded. Diffusion models, not limits, of queueing networks with a given set of parameters have been advanced in [16,28,30,31,34,35,59].
The main tool we use is known as the Poisson equation. It allows us to say that Y(∞) is a good estimate for X̃(∞) if the generator of Y behaves similarly to the generator of X̃, where X̃ is defined in (2.4). Let H be a class of functions h: ℝ → ℝ, to be specified shortly. For each function h(x) ∈ H, consider the Poisson equation
(3.3) G_Y f_h(x) = b(x) f_h′(x) + µ f_h″(x) = h(x) − E h(Y(∞)), x ∈ ℝ.
One may verify by differentiation that for all functions h: ℝ → ℝ satisfying |E h(Y(∞))| < ∞, the Poisson equation has a family of solutions of the form
(3.4) f_h(x) = a₁ + ∫₀ˣ (1/(µν(t))) (a₂ + ∫_{−∞}^t (h(y) − E h(Y(∞))) ν(y) dy) dt,
where a₁, a₂ ∈ ℝ are arbitrary constants, and ν(x) is as in (2.7).
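To see the Poisson equation in action, the sketch below implements the generic one-dimensional recipe with a₂ = 0: it builds ν(x) ∝ exp(∫₀ˣ b/µ) on a grid and computes f_h′(x) = (µν(x))⁻¹ ∫_{−∞}^x (h(y) − E h(Y(∞))) ν(y) dy, under the sign convention G_Y f_h = h − E h(Y(∞)). The formulation and all names are ours. It is tested on the standard OU case b(x) = −x, µ = 1, h(x) = x, where the exact solution satisfies f_h′ ≡ −1.

```python
import math

def f_prime_grid(b, h_fun, mu, lo=-8.0, hi=8.0, m=16001):
    """Numerically solve  b f' + mu f'' = h - E h(Y)  via the density
    nu(x) ~ exp(int_0^x b/mu)  and the a2 = 0 solution formula.
    Accuracy degrades in the far right tail; interior values are reliable."""
    step = (hi - lo) / (m - 1)
    xs = [lo + i * step for i in range(m)]
    # cumulative trapezoid rule for log nu (anchored at lo; the constant cancels)
    lognu = [0.0]
    for i in range(1, m):
        lognu.append(lognu[-1] + 0.5 * step * (b(xs[i - 1]) + b(xs[i])) / mu)
    nu = [math.exp(v) for v in lognu]
    Eh = sum(h_fun(x) * d for x, d in zip(xs, nu)) / sum(nu)
    fp, acc = [], 0.0
    for x, d in zip(xs, nu):
        acc += (h_fun(x) - Eh) * d * step        # running integral of (h - Eh) nu
        fp.append(acc / (mu * d))
    return xs, fp

# OU test case: b(x) = -x, mu = 1, h(x) = x; the exact solution has f' = -1.
xs, fp = f_prime_grid(lambda x: -x, lambda x: x, 1.0)
```

Checking the computed f′ against the exact constant solution is a useful way to validate the quadrature before applying it to the piecewise OU drift.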
In this paper, we take H = Lip(1) when we deal with the Wasserstein metric (Theorems 1 and 2), and we choose H = H_K when we deal with the Kolmogorov metric (Theorems 3 and 4). We claim that |E h(Y(∞))| < ∞. Indeed, when H = H_K, this clearly holds. When H = Lip(1), without loss of generality we take h(0) = 0 in (3.3), and use the Lipschitz property of h(x) to see that |E h(Y(∞))| ≤ E|h(Y(∞))| ≤ E|Y(∞)| < ∞, where the finiteness of E|Y(∞)| will be proved in (B.16).
From (3.3), one has
(3.5) |E h(X̃(∞)) − E h(Y(∞))| = |E G_Y f_h(X̃(∞))|.
In (3.5), X̃(∞) has the stationary distribution of the CTMC X̃, which is not necessarily defined on the same probability space as Y(∞). Actually, X̃(∞) in (3.5) can be replaced by any other random variable, although one does not expect the error on the right side to be small if this random variable has no relationship with the diffusion process Y.

3.2. Comparing Generators.
To prove Theorems 1-4, we need to bound the right side of (3.5). The CTMC X̃ defined in (2.4) also has a generator. We bound the right side of (3.5) by showing that the diffusion generator in (3.2) is similar to the CTMC generator.
For any k ∈ Z₊, we define x = x_k = δ(k − x(∞)). Then for any function f: ℝ → ℝ, the generator of X̃ is given by
(3.6) G_X̃ f(x) = λ(f(x + δ) − f(x)) + d(k)(f(x − δ) − f(x)),
where
(3.7) d(k) = µ(k ∧ n) + α(k − n)⁺
is the departure rate corresponding to the system having k customers. One may check that
(3.8) b(x) = δ(λ − d(k)), x = x_k, k ∈ Z₊.
The relationship between G_X̃ and the stationary distribution of X̃ is illustrated by the following lemma.
Lemma 1. Suppose the function f: ℝ → ℝ grows at most quadratically (i.e., |f(x)| is dominated by a quadratic function), and assume that the CTMC X̃ is positive recurrent. Then E G_X̃ f(X̃(∞)) = 0.
Remark 2. We will see in Lemma 3 later this section, in Lemmas 4 and 5 of Section 5, and in Lemma 13 of Appendix B.2 that there is a family of solutions to the Poisson equation (3.3) whose first derivatives grow at most linearly in both the Wasserstein and Kolmogorov settings, meaning that these solutions satisfy the conditions of Lemma 1.
The proof of Lemma 1 is provided in Appendix D.1, and relies on Proposition 3 of [26]. Suppose for now that for any h(x) ∈ H, the solution to the Poisson equation f_h(x) satisfies the conditions of Lemma 1. We can apply Lemma 1 to (3.5) to see that
(3.9) |E h(X̃(∞)) − E h(Y(∞))| = |E G_Y f_h(X̃(∞)) − E G_X̃ f_h(X̃(∞))|.
While the two random variables on the left side of (3.9) are usually defined on different probability spaces, the two random variables on the right side of (3.9) are both functions of X̃(∞). Thus, we have achieved a coupling through Lemma 1. Setting up the Poisson equation is a generic first step one performs any time one wishes to apply Stein's method to a problem. The next step is to bound the equivalent of our E G_Y f_h(X̃(∞)). This is usually done by using a coupling argument. However, this coupling is always problem specific, and is one of the greatest sources of difficulty one encounters when applying Stein's method. In our case, this generator coupling is natural because we deal with Markov processes X̃ and Y.
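The identity E G_X̃ f(X̃(∞)) = 0 asserted by Lemma 1 (the basic adjoint relationship) is easy to check numerically for the Erlang-C chain; a sketch (parameter choices, truncation, and naming are ours) with the linearly growing test function f(x) = x:

```python
import math

# Numerical check of E[G f(X~(inf))] = 0 for the Erlang-C chain, with the
# centered and scaled state x_k = delta*(k - R) and test function f(x) = x.
lam, mu, n, kmax = 4.0, 1.0, 5, 400
R = lam / mu
delta = 1.0 / math.sqrt(R)

# stationary distribution via detailed balance, truncated at kmax
logp = [0.0]
for k in range(1, kmax + 1):
    logp.append(logp[-1] + math.log(lam / (mu * min(k, n))))
mx = max(logp)
w = [math.exp(v - mx) for v in logp]
z = sum(w)
pi = [p / z for p in w]

f = lambda x: x
total = 0.0
for k, p in enumerate(pi):
    x = delta * (k - R)
    d = mu * min(k, n)                                  # departure rate d(k)
    gen = lam * (f(x + delta) - f(x)) + d * (f(x - delta) - f(x))
    total += p * gen
# total should vanish, up to truncation error at kmax
```

With f(x) = x the check reduces to the flow-balance identity E d(X(∞)) = λ, which holds exactly in stationarity.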
Since the generator completely characterizes the behavior of a Markov process, it is natural to expect that convergence of generators implies convergence of Markov processes. Indeed, the question of weak convergence was studied in detail, for instance in [20], using the martingale problem of Stroock and Varadhan [57]. However, (3.9) lets us go beyond weak convergence, both because different choices of h(x) lead to different metrics of convergence, and also because the question of convergence rates can be answered. One interpretation of the Stein approach is to view f_h(x) as a Lyapunov function that gives us information about h(x). Instead of searching very hard for this Lyapunov function, the Poisson equation (3.3) removes the guesswork. However, this comes at the cost of f_h(x) being defined implicitly as the solution to a differential equation.

3.3. Taylor Expansion.
To bound the right side of (3.9), we study the difference G_X̃ f_h(x) − G_Y f_h(x). For that we perform a Taylor expansion of G_X̃ f_h(x). To illustrate this, suppose that f_h″(x) exists for all x ∈ ℝ and is absolutely continuous. Then, expanding (3.6) for each k ∈ Z₊ and x = x_k,
(3.10) |E h(X̃(∞)) − E h(Y(∞))| ≤ (δ/2) E|b(X̃(∞)) f_h″(X̃(∞))| + (δ/6) E[(2µ − δ b(X̃(∞))) sup_{y: |y − X̃(∞)| ≤ δ} |f_h‴(y)|].
As one can see, to show that the right hand side of (3.10) vanishes as δ → 0, we must be able to bound the derivatives of f_h(x); we refer to these as gradient bounds. Furthermore, we will also need bounds on moments of X̃(∞); we refer to these as moment bounds. Both moment and gradient bounds will vary between the Erlang-A and Erlang-C settings, and the gradient bounds will be different for the Wasserstein and Kolmogorov settings. Moment bounds will be discussed shortly, and gradient bounds in the Wasserstein setting will be presented in Section 3.5. We discuss the Kolmogorov setting separately in Section 5. In that case we face an added difficulty because f_h″(x) has a discontinuity, and we cannot use (3.10) directly.
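Concretely, the expansion can be sketched as follows; this is our reconstruction of the standard second-order Taylor argument, and should be read against the paper's displayed version of (3.10):

```latex
\begin{align*}
G_{\tilde X} f_h(x)
  &= \lambda\bigl(f_h(x+\delta)-f_h(x)\bigr) + d(k)\bigl(f_h(x-\delta)-f_h(x)\bigr) \\
  &= \delta\bigl(\lambda - d(k)\bigr) f_h'(x)
     + \frac{\delta^2}{2}\bigl(\lambda + d(k)\bigr) f_h''(x) + \mathcal{R}(x),
\qquad
|\mathcal{R}(x)| \le \frac{\delta^3}{6}\bigl(\lambda + d(k)\bigr)
  \sup_{|y-x|\le\delta}|f_h'''(y)|.
\end{align*}
Using $b(x) = \delta(\lambda - d(k))$ and $\delta^2\lambda = \mu$, so that
$\frac{\delta^2}{2}(\lambda + d(k)) - \mu = -\frac{\delta}{2}\,b(x)$,
\[
G_{\tilde X} f_h(x) - G_Y f_h(x)
  = -\frac{\delta}{2}\, b(x)\, f_h''(x) + \mathcal{R}(x),
\]
so the approximation error is controlled by moments of $b(\tilde X(\infty))$
(moment bounds) and by $f_h''$ and $f_h'''$ (gradient bounds).
```

This makes explicit why the drift b(x) and the second and third derivatives of f_h(x) are exactly the objects that the moment and gradient bounds must control.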
3.4. Moment Bounds. The following lemma presents the necessary moment bounds to bound (3.10) in the Erlang-C model, and is proved in Appendix A.1. These moment bounds are used in both the Wasserstein and the Kolmogorov metric settings.
One may wonder why the bounds are separated using the indicators 1{X̃(∞) ≤ −ζ} and 1{X̃(∞) ≥ −ζ}. This is related to the drift b(x) appearing in (3.10), and the fact that b(x) takes different forms on the regions x ≤ −ζ and x ≥ −ζ. Furthermore, it may be unclear at this point why both (3.12) and (3.13) are needed, as the left hand side in both bounds is identical. The reason is that (3.12) is an O(1) bound (we think of δ ≤ 1), whereas (3.13) is an O(|ζ|) bound. The latter is only useful when |ζ| is small, but it is nevertheless an essential bound for achieving universal results. As we will see later, it negates 1/|ζ| terms that appear in (3.10) from f_h″(x) and f_h‴(x). For the Erlang-A model, we also require moment bounds similar to those stated in Lemma 2. Both the proof, and subsequent usage, of the Erlang-A moment bounds are similar to those of the Erlang-C moment bounds. We therefore delay the precise statement of the Erlang-A bounds until Lemma 11 in Appendix A.2, to avoid distracting the reader with a bulky lemma.
3.5. Gradient Bounds. The following lemma, Lemma 3, states the gradient bounds used in the Wasserstein setting: any solution f_h(x) of the Poisson equation (3.3) with h(x) ∈ Lip(1) and a₂ = 0 in (3.4) is twice continuously differentiable with an absolutely continuous second derivative, and then for all n ≥ 1, λ > 0, and µ > 0 satisfying 0 < R < n, its derivatives satisfy the explicit bounds (3.17)-(3.19).
Remark 3. This lemma validates the Taylor expansion used to obtain (3.10).
Gradient bounds, also known as Stein factors, are central to any application of Stein's method. The problem of gradient bounds for diffusion approximations can be divided into two cases: the one-dimensional case, and the multi-dimensional case. In the former, the Poisson equation in (3.3) is an ordinary differential equation (ODE) corresponding to a one-dimensional diffusion process. In the latter, the Poisson equation is a partial differential equation (PDE) corresponding to a multi-dimensional diffusion process.
The one-dimensional case is simpler, because the explicit form of f_h(x) is given to us by (3.4). To bound f_h′(x) and f_h″(x) we can analyze (3.4) directly, as we do in the proof of Lemma 3. This direct analysis can be used as a go-to method for one-dimensional diffusions, but fails in the multi-dimensional case, because closed-form solutions for PDEs are not typically known. In that case, it helps to exploit the fact that f_h(x) satisfies the probabilistic representation (3.20); one may then use coupling arguments to bound finite differences of f_h(x). For examples of coupling arguments, see [3,4,5,11,23]. A related paper to these types of gradient bounds is [56], where the author used a variant of (3.20) for the fluid model of a flexible-server queueing system as a Lyapunov function. As an alternative to coupling, one may combine (3.20) with a-priori Schauder estimates from PDE theory, as was done in [28].
Just like we did with the moment bounds, we delay the Erlang-A gradient bounds to Lemma 13 in Appendix B.2. We are now ready to prove Theorem 1.
4. Proof of Theorem 1. In this section we prove Theorem 1. Fix h(x) ∈ Lip(1), and recall that the family of solutions to the Poisson equation is given by (3.4). For the remainder of Section 4, we fix one such solution f_h(x) with a₂ = 0. Then by Lemma 3, f_h″(x) is absolutely continuous, implying that (3.10) holds; we recall it here as
(4.1) |E h(X̃(∞)) − E h(Y(∞))| ≤ (δ/2) E|b(X̃(∞)) f_h″(X̃(∞))| + (δ/6) E[(2µ − δ b(X̃(∞))) sup_{y: |y − X̃(∞)| ≤ δ} |f_h‴(y)|].
The proof of Theorem 1 simply involves applying the moment bounds and gradient bounds to show that the error bound in (4.1) is small.
Proof of Theorem 1. Throughout the proof we assume that R ≥ 1, or equivalently, δ ≤ 1. We bound each of the terms on the right side of (4.1) individually. We recall here that the support of X̃(∞) is a δ-spaced grid, and in particular this grid contains the point −ζ. In the bounds that follow, we often consider separately the cases where X̃(∞) ≤ −ζ − δ and where X̃(∞) ≥ −ζ, and apply the moment bounds (3.12), (3.13), and the gradient bound (3.18). Next, we use (3.15) and the gradient bound in (3.19). By a similar argument, with the only difference being the cases considered, we obtain an analogous bound. Lastly, we use the form of b(x), the moment bounds (3.12), (3.13), and (3.16), and the gradient bound (3.19). Hence, from (3.10) we conclude that for all R ≥ 1 and h(x) ∈ Lip(1), the right side of (4.1) is bounded as stated in (1.5), which proves Theorem 1.

5. The Kolmogorov Metric.
In this section we prove Theorem 3, which is stated in the Kolmogorov setting. The biggest difference between the Wasserstein and Kolmogorov settings is that in the latter, the test functions h(x) used in the Poisson equation (3.3) are discontinuous. For this reason, the gradient bounds from Lemma 3 and Lemma 13 in Appendix B.2 do not hold anymore, and new gradient bounds need to be derived separately for the Kolmogorov setting; we present these new gradient bounds in Section 5.1. Furthermore, the solution to the Poisson equation no longer has a continuous second derivative, meaning that the Taylor expansion we used to derive the upper bound in (3.10) is invalid. We discuss an alternative to (3.10) in Section 5.2. This alternative bound contains a new error term that cannot be handled by the gradient bounds, nor the moment bounds. This term appears because the solution to the Poisson equation has a discontinuous second derivative, and to bound it we present Lemma 6. We then prove Theorem 3 in Section 5.3.

5.1. Kolmogorov Gradient Bounds.
Recall that in the Kolmogorov setting, we take the class of test functions for the Poisson equation (3.3) to be H_K defined in (1.8). For the statement of the following two lemmas, we fix a ∈ ℝ and set h(x) = 1_{(−∞,a]}(x). We use f_a(x) instead of f_h(x) to denote a solution to the Poisson equation. We recall that the family of solutions to the Poisson equation is parametrized by constants a₁, a₂ ∈ ℝ. The following lemmas state the gradient bounds in the Kolmogorov setting.
Lemma 4. Consider the Erlang-C model (α = 0). Any solution f_a(x) to the Poisson equation is continuously differentiable, with an absolutely continuous derivative. Fix the solution in (3.4) with parameter a₂ = 0. Then for all n ≥ 1, λ > 0, and µ > 0 satisfying 0 < R < n, explicit bounds on f_a′(x) and f_a″(x) hold for all x ∈ ℝ, where f_a″(a) is understood to be the left derivative at the point x = a.
Lemma 5. Consider the Erlang-A model (α > 0). Any solution f_a(x) to the Poisson equation is continuously differentiable, with an absolutely continuous derivative. Fix a solution in (3.4) with parameter a_2 = 0, and fix n ≥ 1, λ > 0, µ > 0, and α > 0. If 0 < R ≤ n (an underloaded system), then the first stated bound holds, and if n ≤ R (an overloaded system), then the second holds. Moreover, for all λ > 0, n ≥ 1, µ > 0, and α > 0, and all x ∈ R, the final bound holds, where f''_a(x) is understood to be the left derivative at the point x = a.
Lemmas 4 and 5 are proved in Appendix B.3. Unlike the Wasserstein setting, these lemmas do not guarantee that f''_a(x) is absolutely continuous. Indeed, for any a ∈ R, substituting h(x) = 1_(−∞,a](x) into the Poisson equation shows that f''_a(x) has a jump at the point x = a. Thus, we can no longer use the error bound in (3.10), and require a different expansion of GX f_a(x).

5.2. Alternative Taylor Expansion.
To get an error bound similar to (3.10), we first define the error terms ε_1(x) and ε_2(x) in (5.6) and (5.7). For k ∈ Z_+ and x = x_k = δ(k − x(∞)), we recall the forms of G_Y f_a(x) and GX f_a(x) from (3.2) and (3.6), where in the second equality we used the fact that b(x) = δ(λ − d(k)), and in the last equality we used that δ²λ = µ. Combining this with (3.9), we have an error bound similar to (3.10), namely (5.8), where ε_1(x) and ε_2(x) are as in (5.6) and (5.7). To bound the error terms in (5.8) that are associated with ε_1(x) and ε_2(x), we need to analyze them further; the resulting inequalities contain the indicators 1_(a−δ,a](x) and 1_(a,a+δ](x). When we consider the upper bound in (5.8), these indicators will manifest themselves as the probabilities P(a − δ < X(∞) ≤ a) and P(a < X(∞) ≤ a + δ).
To this end we present the following lemma, which will be used in the proof of Theorem 3.
Lemma 6. Recall that d_K(X(∞), W) is the Kolmogorov distance between X(∞) and W. Then for any a ∈ R, n ≥ 1, and 0 < R < n, the stated bound holds.

This lemma is proved in Appendix D.2. We will apply Lemma 6 with W = Y(∞) in the proof of Theorem 3 that follows. The following lemma guarantees that the modulus of continuity of the cumulative distribution function of Y(∞) is bounded by a constant independent of λ, n, and µ. Its proof is provided in Appendix D.3.
Lemma 7. Consider the Erlang-C model (α = 0), and let ν(x) be the density of Y(∞). Then for all n ≥ 1, λ > 0, and µ > 0 satisfying 0 < R < n, the stated bound holds.

Lemmas 6 and 7 are stated for the Erlang-C model, but one can easily repeat the arguments in the proofs of those lemmas to prove analogues for the Erlang-A model. Therefore, we state the following lemmas without proof.

5.3. Proof of Theorem 3.
Proof of Theorem 3. Throughout the proof we assume that R ≥ 1, or equivalently, δ ≤ 1. For h(x) = 1_(−∞,a](x), we let f_a(x) be a solution to the Poisson equation (3.3) with parameter a_2 = 0. In this proof we will show that (5.11) holds for all a ∈ R. The upper bound in (5.11) is similar to (4.3); however, (5.11) has the extra term (5.12). The reason this term appears in the Kolmogorov setting but not in the Wasserstein setting is that f''_a(x) is discontinuous in the Kolmogorov case, as opposed to the Wasserstein case, where f''_h(x) is continuous. Applying Lemmas 6 and 7 to the right-hand side of (5.11), and taking the supremum over all a ∈ R on both sides, we obtain the claimed bound. We want to add that Lemma 6 makes heavy use of the birth-death structure of the Erlang-C model, and that it is not obvious how to handle (5.12) more generally.
To prove Theorem 3 it remains to verify (5.11), which we now do. The argument we will use is similar to the argument used to prove (4.3) in Theorem 1. We will bound each of the terms in (5.8). We also recall the form of b(x) from (4.2). We use the moment bounds (3.12) and (3.16), and the gradient bound (5.2), to bound the first term. Next, we use (5.9), (5.13), and the gradient bound (5.1) to bound the second term, where in the last inequality we used a bound valid for y ∈ [X(∞), X(∞) + δ]. By a similar argument, one can bound the third term, with the only difference in the argument being that we consider the cases when X(∞) ≤ −ζ and X(∞) ≥ −ζ + δ, instead of X(∞) ≤ −ζ − δ and X(∞) ≥ −ζ. Lastly, we use the first inequality in (5.10) to bound the last term, where in the last inequality we used (5.13) and the moment bound (3.12). Combining these bounds with (3.11) and (3.16) verifies (5.11) and concludes the proof of Theorem 3.
6. Extension: Erlang-C Higher Moments. In this section we consider the approximation of higher moments for the Erlang-C model. We begin with the following result.
Theorem 5. Consider the Erlang-C system (α = 0), and fix an integer m > 0. There exists a constant C = C(m) such that for all n ≥ 1, λ > 0, and µ > 0 satisfying 1 ≤ R < n, the bound (6.1) holds. The proof of this theorem follows the standard Stein framework in Section 3, but we do not provide it in this paper. The most interesting aspect of (6.1) is the appearance of 1/|ζ|^{m−1} in the bound on the right-hand side, which of course only matters when |ζ| is small. To check whether the bound is sharp, we performed the numerical experiments illustrated in Table 3. The results suggest that the approximation error does indeed grow like 1/|ζ|^{m−1}.
Lemma 10. For any integer m ≥ 1, and all n ≥ 1, λ > 0, and µ > 0 satisfying R < n, the stated bound holds.

Multiplying both sides of (6.1) by |ζ|^m and applying Lemma 10, we see that for all n ≥ 1, λ > 0, and µ > 0 satisfying 1 ≤ R < n, the scaled error is bounded. In other words, we can rewrite (6.1) with a redefined constant C̃(m) in place of C(m). That the approximation error in Table 3 increases is then attributed to the fact that EX(∞) increases as ζ ↑ 0. As we mentioned before, the appearance of the (m − 1)th moment in the approximation error of the mth moment was also observed recently in [30] for the virtual waiting time in the M/GI/1 + GI model, potentially suggesting a general trend.

Table 3: The error term above equals E(X(∞))² − E(Y(∞))² and grows as R → n. The error term still grows when multiplied by |ζ|^{0.5}, and it shrinks to zero when multiplied by |ζ|^{1.5}. However, when multiplied by |ζ|, the error term appears to converge to some limiting value, suggesting that the error does indeed grow at a rate of 1/|ζ|. We observed consistent behavior for higher moments of X(∞) as well.
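The kind of experiment behind Table 3 can be loosely replicated. The sketch below is our own (the parameters n = 100 and the integration scheme are arbitrary choices): it computes E[X̂(∞)²] from the exact M/M/n stationary distribution and E[Y(∞)²] from the piecewise density of Appendix B.1, and compares the error for two values of |ζ|.

```python
import math

def erlang_c_pmf(R, n, kmax):
    """Exact stationary distribution of M/M/n (depends on lam, mu only
    through the offered load R = lam/mu)."""
    logp = [0.0]
    for k in range(1, kmax + 1):
        logp.append(logp[-1] + math.log(R) - math.log(min(k, n)))
    m = max(logp)
    p = [math.exp(v - m) for v in logp]
    s = sum(p)
    return [v / s for v in p]

def second_moment_error(R, n):
    """|E[hat-X(inf)^2] - E[Y(inf)^2]| for the Erlang-C model."""
    delta = 1.0 / math.sqrt(R)
    z = (n - R) / math.sqrt(R)            # z = |zeta|, since R < n
    kmax = int(n + 80 * (math.sqrt(R) / z + math.sqrt(R)))
    pmf = erlang_c_pmf(R, n, kmax)
    ex2 = sum(pk * (delta * (k - R)) ** 2 for k, pk in enumerate(pmf))
    # diffusion density: a_-*exp(-t^2/2) on (-inf, z], a_+*exp(-z*t) on [z, inf)
    phi = 0.5 * math.erfc(-z / math.sqrt(2.0))
    a_minus = 1.0 / (math.sqrt(2 * math.pi) * phi + math.exp(-z * z / 2) / z)
    a_plus = a_minus * math.exp(z * z / 2)
    # E[Y^2]: Riemann sum on the Gaussian piece, closed form on the tail,
    # using int_z^inf t^2 e^{-z t} dt = e^{-z^2} (z + 2/z + 2/z^3)
    step, t, ey2 = 1e-3, -12.0, 0.0
    while t < z:
        ey2 += (t * t) * a_minus * math.exp(-t * t / 2) * step
        t += step
    ey2 += a_plus * math.exp(-z * z) * (z + 2.0 / z + 2.0 / z ** 3)
    return abs(ex2 - ey2)
```

Comparing, say, R = 90 (|ζ| ≈ 1.05) against R = 97.5 (|ζ| ≈ 0.25) with n = 100, the error for the smaller |ζ| is noticeably larger, in line with the table's behavior.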
Acknowledgement. This research is supported in part by NSF Grants CNS-1248117, CMMI-1335724, and CMMI-1537795. The first two authors benefited from discussions with the participants of the 2015 Workshop on New Directions in Stein's Method, held at the Institute for Mathematical Sciences at the National University of Singapore, and would like to thank the Institute for its financial support. The authors also wish to thank two anonymous referees for suggestions that improved the presentation of the paper.

APPENDICES
Appendix A handles the moment bounds, while Appendix B handles the gradient bounds. Appendix C contains outlines for the proofs of Theorems 2 and 4, and in Appendix D we prove several miscellaneous lemmas.

APPENDIX A: MOMENT BOUNDS
In Section A.1, we first prove Lemma 2, establishing the moment bounds for the Erlang-C model. In Section A.2, we state and prove Lemma 11, establishing the moment bounds for the Erlang-A model.

A.1. Erlang-C Moment Bounds.
Proof of Lemma 2. We first prove (3.11), (3.12), and (3.14). Recalling the generator GX defined in (3.6), we apply it to the function V(x) = x² for k ∈ Z_+ and x = x_k = δ(k − x(∞)) to obtain (A.1). Instead of splitting the last two lines into the cases x ≤ −ζ and x > −ζ, we could have also considered x < −ζ and x ≥ −ζ, and would have obtained (A.2). We take expected values on both sides of (A.1) with respect to X(∞), and apply Lemma 1 to arrive at (A.3). This yields one bound when |ζ| > δ/2 and another when |ζ| ≤ δ/2; therefore, (3.11) follows. Jensen's inequality immediately gives us (3.12). Furthermore, (A.3) also gives us an inequality that is not quite (3.14), because it has 1(X(∞) > −ζ) as opposed to 1(X(∞) ≥ −ζ) as in (3.14). However, we can use (A.2) to get the stronger bound, which proves (3.14). We now prove (3.13). By the triangle inequality, it suffices to control the second term on the right-hand side, which is just the expected number of idle servers, scaled by δ. We now show that this expected value equals |ζ|. Applying the generator GX to the test function f(x) = x, one sees that (A.5) holds for all k ∈ Z_+ and x = x_k = δ(k − x(∞)). Taking expected values with respect to X(∞) on both sides, and applying Lemma 1, we arrive at (3.13).
We move on to prove (3.15). Let I be the unscaled expected number of idle servers; then (A.7) follows from (A.5). Now let {ν_k}_{k=0}^∞ be the stationary distribution of X (the unscaled CTMC). We want to prove an upper bound on the probability that the system has idle servers. Let k* be the first index that maximizes {ν_k}_{k=0}^∞, i.e. k* = inf{k ≥ 0 : ν_k ≥ ν_j for all j ≥ 0}. Applying GX to the test function f(x) = k ∧ k*, taking expected values with respect to X(∞), and applying Lemma 1 together with the fact that k* ≤ n, we obtain the required bound. The fact that k* ≤ n is a consequence of λ < nµ, and can be verified through the flow balance equations of the CTMC X. We combine this bound with (A.7) to arrive at (3.15), which concludes the proof of this lemma.
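The flow-balance identity derived above (applying the generator to f(x) = x shows that the expected number of idle servers, scaled by δ, equals |ζ|, equivalently E[min(X(∞), n)] = R for the unscaled chain) is easy to confirm numerically. A minimal sketch, with our own illustrative parameters:

```python
import math

def idle_server_identity(R, n):
    """Check E[min(X, n)] = R and delta*E[(n - X)+] = |zeta| for M/M/n,
    both consequences of applying the generator to f(x) = x."""
    kmax = int(n + 50 * math.sqrt(R) + 50)
    logp = [0.0]
    for k in range(1, kmax + 1):
        logp.append(logp[-1] + math.log(R) - math.log(min(k, n)))
    m = max(logp)
    p = [math.exp(v - m) for v in logp]
    s = sum(p)
    p = [v / s for v in p]
    busy = sum(pk * min(k, n) for k, pk in enumerate(p))      # E[min(X, n)]
    idle = sum(pk * max(n - k, 0) for k, pk in enumerate(p))  # E[(n - X)+]
    return busy, idle / math.sqrt(R)  # second entry should equal |zeta|
```

For R = 50 and n = 60, the first return value is 50 and the second is (60 − 50)/√50 = √2, up to truncation error that is far below floating-point noise.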

A.2. Erlang-A Moment Bounds.
The following lemma states the necessary moment bounds for the Erlang-A model. The underloaded and overloaded cases have to be handled separately. Since the drift b(x) is different between the Erlang-A and Erlang-C models, the quantities bounded in the following lemma will resemble those in Lemma 2, but will not be identical.
Lemma 11. Consider the Erlang-A model (α > 0). Fix n ≥ 1, λ > 0, µ > 0, and α > 0. If 0 < R ≤ n (an underloaded system), then the bounds (A.8)–(A.15) hold, and if n ≤ R (an overloaded system), then the bounds (A.16)–(A.24) hold.

A.2.1. Proof Outline for Lemma 11: The Underloaded System. The proof of the underloaded case of Lemma 11 is very similar to that of Lemma 2. Therefore, we only outline some key intermediate steps needed to obtain the results. We remind the reader that when R ≤ n, then ζ ≤ 0. We first show how to establish (A.8), which is proved in a similar fashion to (3.11) of Lemma 2: by applying the generator GX to the Lyapunov function V(x) = x². The following are some useful intermediate steps for any reader wishing to produce a complete proof. The first step to prove (A.8) is to get an analogue of (A.1), namely one inequality when x ≤ −ζ and another when x ≥ −ζ. From here, we use Lemma 1 to get a statement similar to (A.3), from which we can infer (A.8), and by applying Jensen's inequality to (A.8), we get (A.9). Observe that this procedure yields (A.12), (A.13), and (A.14) as well. We now describe how to prove (A.11), which requires only a slight modification of (A.25): namely, a refined inequality for x ≥ −ζ. From this, one can deduce that since x ≥ −ζ, Lemma 1 can be applied as before to see that both 2µ|ζ|E[X(∞)1(X(∞) ≥ −ζ)] and the analogous term with factor 2(µ ∧ α) are suitably bounded. Applying the generator GX to the test function f(x) = x and taking expected values with respect to X(∞), we get Eb(X(∞)) = 0. When combined with (A.9), this yields a bound to which we can apply the triangle inequality and (A.13) to conclude (A.10). Lastly, the proof of (A.15) is nearly identical to the proof of (3.15) in Lemma 2. The key step is to obtain an analogue of (A.7).
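The identity Eb(X(∞)) = 0 used in the outline says that, in stationarity, the mean death rate of the Erlang-A chain equals the arrival rate: µE[min(X, n)] + αE[(X − n)⁺] = λ. A quick numerical check, under parameter values of our own choosing:

```python
import math

def erlang_a_balance(lam, mu, alpha, n):
    """Check mu*E[min(X, n)] + alpha*E[(X - n)+] = lam for the Erlang-A
    birth-death chain with death rate mu*min(k, n) + alpha*max(k - n, 0)."""
    kmax = n + int(lam / alpha) + 10 * int(math.sqrt(lam / alpha + lam)) + 50
    logp = [0.0]
    for k in range(1, kmax + 1):
        death = mu * min(k, n) + alpha * max(k - n, 0)
        logp.append(logp[-1] + math.log(lam) - math.log(death))
    m = max(logp)
    p = [math.exp(v - m) for v in logp]
    s = sum(p)
    p = [v / s for v in p]
    # stationary expectation of the total death rate
    return sum(pk * (mu * min(k, n) + alpha * max(k - n, 0))
               for k, pk in enumerate(p))
```

For an overloaded example (λ = 120, µ = 1, α = 0.5, n = 100), the returned value matches λ to well within truncation error.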
A.2.2. Proof Outline for Lemma 11: The Overloaded System. The proof of the overloaded case of Lemma 11 is also similar to that of Lemma 2. Therefore, we only outline some key intermediate steps needed to obtain the results; the bounds in this lemma are not proved in the order in which they are stated. We remind the reader that when R ≥ n, then ζ ≥ 0. We start by proving (A.18). Although the left-hand side of (A.18) is slightly different from (3.11) of Lemma 2, it is proved using the same approach: by applying the generator GX to the Lyapunov function V(x) = x². The following are some useful intermediate steps for any reader wishing to produce a complete proof. The first step to prove (A.18) is to get an analogue of (A.1), namely (A.28) when x ≤ −ζ and a companion inequality when x ≥ −ζ. We then prove (A.16) and (A.17), which requires only a slight modification of (A.28). Namely, we use a refined inequality for x ≤ −ζ, from which one can deduce that GX V(x) ≤ −2α|ζ||x| + 2µ.
Then Lemma 1 and Jensen's inequality can be applied as before to get both (A.16) and (A.17).
We now prove (A.23). Applying the generator GX to the function f(x) = x and taking expected values with respect to X(∞) gives Eb(X(∞)) = 0. Therefore, we can invoke (A.19) to conclude (A.23). We now prove (A.24), which requires additional arguments that we have not used in the proof of Lemma 2. We assume for now that (A.30) holds. Fix γ ∈ (0, 1/2), and define the quantities J_1 and J_2 below, where {ν_k}_{k=0}^∞ is the stationary distribution of X. We note that (A.30) implies that n − γ√R > 0. To bound J_1, we combine (A.20)–(A.23). Now to bound J_2, we apply GX to the test function f(x) = k ∧ n, where x = δ(k − x(∞)), and take the expectation with respect to X(∞). The flow balance equations λν_{k−1} = kµν_k, k = 1, 2, ..., n, imply that ν_0 < ν_1 < · · · < ν_{n−2} < ν_{n−1} ≤ ν_n. We use (A.30), the fact that γ ∈ (0, 1/2), and that R ≥ n ≥ 1; then by rearranging terms in (A.34) and applying (A.32) we conclude the desired bound. Hence, we have just shown (A.24) under assumption (A.30), where to get the last inequality we fixed γ ∈ (0, 1/2) solving γ + 1/γ = 3. We now wish to establish the same result without assumption (A.30), i.e. when λ exceeds the threshold there. For this, we rely on the following comparison result. Fix n, µ, and α, and let X^{(λ)}(∞) be the steady-state customer count in an Erlang-A system with arrival rate λ, service rate µ, number of servers n, and abandonment rate α. Then for any 0 < λ_1 < λ_2, (A.35) holds. This says that with all other parameters being held fixed, an Erlang-A system with a higher arrival rate is less likely to have idle servers. For a simple proof involving a coupling argument, see page 163 of [45].
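The comparison result (A.35) — more arrivals, fewer idle servers — can also be sanity-checked numerically, without the coupling argument. The sketch below, with hypothetical parameters of our own choosing, computes P(X^{(λ)}(∞) ≤ n − 1) exactly for the (truncated) birth-death chain:

```python
import math

def idle_probability(lam, mu, alpha, n):
    """P(X(inf) <= n - 1) for an Erlang-A (M/M/n+M) system: the
    probability that at least one server is idle, from the stationary pmf
    of the birth-death chain."""
    kmax = n + int(lam / alpha) + 10 * int(math.sqrt(lam)) + 100
    logp = [0.0]
    for k in range(1, kmax + 1):
        death = mu * min(k, n) + alpha * max(k - n, 0)
        logp.append(logp[-1] + math.log(lam) - math.log(death))
    m = max(logp)
    p = [math.exp(v - m) for v in logp]
    return sum(p[:n]) / sum(p)
```

With µ = 1, α = 0.5, n = 100 held fixed, the idle probability at λ = 110 comes out strictly below the one at λ = 90, as the monotonicity in λ predicts.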
Furthermore, recall that ν(x) is the density of Y (∞), and satisfies where κ is a normalizing constant. Now recall that the family of solutions to the Poisson equation is given by (3.4), and is parametrized by constants a 1 , a 2 ∈ R. We fix a solution f h (x) with a 2 = 0, and see that for this solution which implies that We will see that to establish gradient bounds, we will have to make use of both expressions for f ′ h (x) in (B.1) and (B.2), depending on the value of x. It is here that the relationship between the diffusion process Y and the random variable Y (∞) surfaces. If Eh(Y (∞)) were to be replaced by any other constant, then (B.2) would not hold. The reason Eh(Y (∞)) is the "correct" constant is because Y (∞) has the stationary distribution of the diffusion process Y .
We now proceed to prove the gradient bounds. We do this first in the case when H = Lip(1) (the Wasserstein setting), and then when H = H_K (the Kolmogorov setting). Fix h(x) ∈ Lip(1); without loss of generality we assume that h(0) = 0. We now derive some equations that will be useful to prove Lemma 3. From the form of f'_h(x) in (B.1) and (B.2), we have

Now by differentiating the Poisson equation (3.3), we see that for those
x ∈ R where h′(x) and b′(x) are defined. By observing that b(x) = µν′(x)/ν(x), we can rearrange the terms above and multiply both sides by ν(x). Suppose we know that (B.4) holds; since lim_{x→∞} ν(x)f''_h(x) = 0, a companion identity is also true. To summarize, we have just shown that provided (B.4) holds, the bounds (B.5) and (B.6) are valid. The multiple bounds for f'_h(x) are convenient, because we can choose which one to use based on the value of x. For example, suppose ζ ≤ 0. Then when x ≤ 0, we will use (B.5); when x ≥ −ζ we will use (B.6); and when x ∈ [0, −ζ] we will use both (B.5) and (B.6) and choose the better bound. The same applies to f''_h(x).
B.1.1. Proof of Lemma 3. The following lemma presents several bounds that will be used to prove Lemma 3. We prove it at the end of this section.
Lemma 12. Consider the Erlang-C model (α = 0), and let ν(x) be the density of Y(∞). Then the bounds (B.10)–(B.16) hold.

Proof of Lemma 3. We begin by bounding f'_h(x) by applying Lemma 12 to (B.5) and (B.6). For x ≤ −ζ, we apply (B.10), (B.12), and (B.16) to (B.5), and for x ≥ 0, we apply (B.11), (B.13), and (B.16) to (B.6), to arrive at (B.18). To do so, we rearrange the Poisson equation (3.3). It is now obvious that, since h(x) ∈ Lip(1), the drift b(x) is piecewise linear and f'_h(x) is bounded as in (B.18), while on the other hand ν(x) decays exponentially fast as x → ∞, and decays even faster as x → −∞. To bound |f''_h(x)|, we use (B.7) and (B.8), together with the fact that h′ ≤ 1. We know |f'_h(x)| is bounded as in (B.18). For x ≤ −ζ, we apply (B.10) to (B.19), and for x ≥ 0 we apply (B.11) to (B.20), to conclude (B.21). By considering separately the cases when |ζ| ≤ 1 and |ζ| ≥ 1, we arrive at (B.23). Lastly, we bound |f'''_h(x)|, which exists for all x ∈ R where h′(x) and b′(x) exist. The fact that h(x) ∈ Lip(1), together with (B.9), gives a first bound. Although tempting, it is not sufficient to use the bound on |f''_h(x)| in (B.21) and the form of b(x) to bound |f''_h(x)b(x)| for all x ≤ −ζ. Instead, we multiply both sides of (B.19) and (B.20) by |b(x)| to obtain (B.24). By invoking (B.14) and (B.15), together with the bound on |f'_h(x)| from (B.18), we conclude the proof of Lemma 3.
Proof of Lemma 12. Recall that in the Erlang-C model we set α = 0. The density of Y(∞) has the form (B.25), where a_− and a_+ are normalizing constants that make ν(x) continuous at the point x = −ζ.
In the proof of this lemma we often rely on the fact that (B.26) holds for any c > 0 and x ≥ 0. One can verify that the left-hand side of (B.26) peaks at x = 0 by checking that its derivative is negative for x > 0. We now prove (B.10). When x ≤ 0, we use (B.26) and the symmetry of the function e^{−y²/2}, and when x ∈ [0, −ζ], we use (B.26) again. We now prove (B.11). The case x ≥ −ζ is direct, and when x ∈ [0, −ζ], we use the fact that a_− = a_+e^{−ζ²/2} together with (B.26). Moving on to show (B.12), we consider separately the cases x ≤ 0, x ∈ [0, −ζ], and x ≥ −ζ. We move on to prove (B.14) and (B.15). When x ≤ 0, we use (B.27), and when x ∈ [0, −ζ], we also use the fact that a_+ = a_−e^{ζ²/2}. Lastly we prove (B.16). Letting V(x) = x², and recalling the form of G_Y from (3.2), the standard Foster-Lyapunov condition (see [47, Theorem 4.3] for example) yields the claim, where we applied Jensen's inequality in the second set of inequalities. This concludes the proof of Lemma 12.
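The display (B.26) itself is lost to extraction; a standard bound of this shape is the Gaussian Mills-ratio bound, in which the quantity e^{cx²/2}∫_x^∞ e^{−cy²/2} dy is maximized at x = 0. Assuming that is the intended form (an assumption on our part), the sketch below verifies the peak-at-zero claim numerically for c = 1:

```python
import math

def scaled_gaussian_tail(x, c=1.0):
    """g_c(x) = exp(c*x^2/2) * int_x^inf exp(-c*y^2/2) dy, written via the
    complementary error function, using
    int_x^inf exp(-c*y^2/2) dy = sqrt(pi/(2c)) * erfc(x*sqrt(c/2))."""
    return math.exp(c * x * x / 2) * math.sqrt(math.pi / (2 * c)) \
        * math.erfc(x * math.sqrt(c / 2))
```

Evaluating at a few points shows the function is strictly decreasing on [0, ∞), with maximum value √(π/(2c)) attained at x = 0 (for c = 1, that is √(π/2) ≈ 1.2533).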
B.2. Erlang-A Wasserstein Gradient Bounds. Below we state the Erlang-A gradient bounds, the proof of which is similar to that of Lemma 3. We only outline the necessary steps needed for a proof, and emphasize all the differences with the proof of Lemma 3. Furthermore, we do not seek an explicit value for the constant C below, although it can certainly be recovered.
Lemma 13. Consider the Erlang-A model (α > 0). The solution to the Poisson equation f_h(x) is twice continuously differentiable, with an absolutely continuous second derivative. Fix a solution in (3.4) with parameter a_2 = 0. Then there exists a constant C > 0 independent of λ, n, µ, and α such that one set of bounds holds for all n ≥ 1, λ > 0, µ > 0, and α > 0 satisfying 0 < R ≤ n (an underloaded system), and another for all n ≥ 1, λ > 0, µ > 0, and α > 0 satisfying n ≤ R (an overloaded system).

Lemma 14. Consider the Erlang-A model (α > 0) with 0 < R ≤ n, and let ν(x) be the density of Y(∞). Then the bounds (B.39)–(B.45) hold.

To prove this lemma, we first observe that the density has the piecewise form (B.46), where a_− and a_+ are normalizing constants that make ν(x) continuous and integrate to one. By comparing the density in (B.46) to (B.25) for the region x ≤ −ζ, we immediately see that (B.39), (B.41), and (B.43) are restatements of (B.10), (B.12), and (B.14) from Lemma 12, and hence have already been established. The proof of (B.45) involves applying G_Y to the Lyapunov function V(x) = x². One can compare the resulting inequalities to (B.28) in the proof of Lemma 12 to see that (B.45) follows by the Foster-Lyapunov condition.
We now go over the proofs of (B.40), (B.42), and (B.44). We first prove (B.40) when x ∈ [0, −ζ], where in the first inequality we use a change of variables and the fact that a_+/a_− = e^{−ζ²/2}e^{(α/2µ)(µζ/α)²}, and in the last inequality we use both (B.26) and (B.27), and the fact that e^{(x²−ζ²)/2} ≤ 1. The rest of (B.40) is proved identically. We now prove (B.42) by considering separately x ∈ [0, −ζ] and x ≥ −ζ. We then apply (B.29), (B.40), and the fact that ν(−ζ)/ν(x) ≤ 1 to conclude (B.30). The proof of (B.32) relies on (B.9).

Lemma 15. Consider the Erlang-A model (α > 0) with R ≥ n, and let ν(x) be the density of Y(∞). Then the stated bounds hold.

To prove this lemma, we first observe that the density has a piecewise form, where a_− and a_+ are normalizing constants that make ν(x) continuous and integrate to one, and that in the region x ≥ −ζ the density can be compared to its Erlang-C counterpart. The proof of (B.56) involves applying G_Y to the Lyapunov function V(x) = x²; one can compare the resulting inequality to (B.28) in the proof of Lemma 12 to see that (B.56) follows by the Foster-Lyapunov condition. In the last inequality we used both (B.26) and (B.27). For x ∈ [−ζ, 0], repeating arguments from (B.58) and using a_−/a_+ = e^{−(α/2µ)ζ²}e^{(αζ/µ)²/2}, we can show that the first term above satisfies the required bound, and by computing the second term explicitly, we conclude the claim, which follows by rewriting the Poisson equation (3.3) and using the Lipschitz property of h(x). We now prove (B.36)–(B.38). We recall (B.9), and the difference between (B.37) and (B.38) lies in the way the relevant quantity is bounded. To get (B.37), we simply apply the bounds on f''_h(x) from (B.34).
To prove (B.38), we will first argue that where C is some positive constant independent of everything else; this will imply (B.38). The only difference between the proof of (B.59) and the bound on f ′′′ h (x) in (B.37) is in how |f ′′ h (x)b(x)| is bounded; we now describe the different way to bound |f ′′ h (x)b(x)|. When x ≥ 0, we multiply both sides of (B.8) by b(x) and use the bounds in (B.33) and (B.55) to bound |f ′′ h (x)b(x)|. When x ∈ [−ζ, 0], we want to prove that which, after considering separately the cases when ζ ≤ µ/α and ζ ≥ µ/α, implies that We can then use this fact to bound |f ′′ h (x)b(x)| = α |x| |f ′′ h (x)|. To prove (B.60) for ζ ≤ µ/α, we bound (B.8) using (B.33) and (B.51). To prove (B.60) for ζ ≥ µ/α, we bound (B.7) using (B.33) and (B.50). We point out that to bound (B.7) we need to perform a manipulation similar to the one in (B.49). This concludes the proof outline for the overloaded case.

B.3. Kolmogorov Gradient Bounds: Proof of Lemmas 4 and 5.
For the remainder of this section, we take H = H K in (3.3). With this choice of test functions, any solution to the Poisson equation will have a discontinuity in its second derivative, which makes the gradient bounds for it differ from the Wasserstein setting. Fix a ∈ R and consider the Poisson equation We now prove the Kolmogorov gradient bounds for the Erlang-C model.
where f''_a(x) is understood to be the left derivative at the point x = a.
Proof of Lemma 5. The proof of this lemma is almost identical to the proof of Lemma 4. By using the analogues of (B.14) and (B.15) from Lemmas 14 and 15, it is not hard to check that (B.62) holds for the Erlang-A model as well. To prove the bounds on f'_a(x), we obtain inequalities similar to (B.61) by using analogues of (B.10) and (B.11) from Lemmas 14 and 15. These inequalities will imply (5.3) and (5.4) once we consider separately the cases when |ζ| ≤ 1 and |ζ| ≥ 1.

APPENDIX C: PROOF OUTLINES OF ERLANG-A THEOREMS
Sections C.1 and C.2 contain an outline for the proofs of Theorems 2 and 4, respectively.
C.1. Outline for Theorem 2. Proving Theorem 2 consists of bounding the four error terms in (3.10). Since the procedure is very similar to the proof of Theorem 1, we will only outline which gradient and moment bounds need to be used to bound each error term.
We start with the underloaded case, when R ≤ n, and bound each term in (3.10) using the gradient bounds of Lemma 13 together with the moment bounds of Lemma 11; the overloaded case is handled analogously.

C.2. Outline for Theorem 4. The proof of this theorem is nearly identical to the proof of Theorem 3. Therefore, we only outline the key steps and differences. The goal is to obtain a version of (5.11), from which the theorem follows by applying Lemmas 8 and 9. To get a version of (5.11), we bound each of the terms in (5.8), just like we did in the proof of Theorem 3. The proof varies between the underloaded and overloaded cases.
We begin with the underloaded case (1 ≤ R ≤ n). To bound the first term in (5.8), we use the moment bounds (A.9), (A.11), and (A.13), together with the gradient bound (5.5). For the second and third terms in (5.8) we use the gradient bound in (5.3). For the fourth error term, we use the gradient bound, where GX(x, x) is the diagonal entry of the generator matrix GX corresponding to state x.
The case of the Erlang-A model is not very different. When α > 0, the transition rates of the CTMC depend linearly on its state. Hence, to satisfy (D.1) we need to show that E(X(∞)) 3 < ∞. This is readily proven by repeating the procedure above with the Lyapunov function V (k) = k 4 , and we omit the details.
There are now four cases to consider, with the first three being simple to handle. Recall that ω(F_W) is the modulus of continuity of F_W(w).