An asymptotic optimality result for the multiclass queue with finite buffers in heavy traffic

For a multiclass G/G/1 queue with finite buffers, admission and scheduling control, and holding and rejection costs, we construct a policy that is asymptotically optimal in the heavy traffic limit. The policy is specified in terms of a single parameter, the free boundary point from the Harrison-Taksar free boundary problem, but otherwise depends "explicitly" on the problem data. The policy also uses the cµ priority rule, but in a novel way that differs, in particular, from its use in problems with infinite buffers. We also address an analogous problem where buffer constraints are replaced by throughput time constraints.


Introduction
In this work we consider the problem of finding asymptotically optimal (AO) controls for the multiclass G/G/1 queue with finite buffers, in heavy traffic. Upon arrival of a class-i customer into queue i (with i ∈ {1, . . . , I} and where I denotes the number of classes), a decision maker may either accept or reject the job. In addition, the decision maker controls the fraction of effort devoted by the server to the customer at the head of queue i, for each i. We refer to the two elements of the control as admission control and scheduling, respectively. The problem considered is to minimize a combination of holding and rejection costs. The term 'heavy traffic' refers to assuming a critical load condition and observing the model at diffusion scale. Our interest in this problem stems from recent developments in the application area of cloud computing. In a hybrid cloud where a private cloud (namely, a local server) has a given capacity and memory limits, tasks that cannot be queued in real time are rejected from the local system and sent to a public cloud, where a fixed charge per usage applies. For further details on modeling toward these applications, see [38]. For a more general modeling framework of data centers, see [12]. The analysis of the model leads in the scaling limit to a control problem associated with Brownian motion (BM), often referred to in this context as a Brownian control problem (BCP). Our main result is the convergence of the queueing control problem (QCP) value function to that of the BCP, and the construction of a particular AO admission/scheduling policy. The policy is specified in terms of a free boundary point that is used in solving the BCP, but otherwise it depends explicitly on the problem data.
A line of research starting from Harrison [25] and continuing with Harrison and van Mieghem [29], Harrison [26], [27] and Harrison and Williams [30] has treated BCP associated with a broad family of models called stochastic processing networks. These problems, aimed at describing the heavy traffic limits of QCP, were shown to be equivalent to reduced BCP (RBCP), in which workload plays the role of a state process. RBCP simplify BCP in two ways: Their state lies in lower dimension, and their form, specifically, that of a singularly controlled diffusion, makes control theoretic tools applicable. Addressing these models at the same level of generality, Atar and Budhiraja [6] and Atar, Budhiraja and Williams [7] use such control theoretic tools to characterize the BCP (equivalently, RBCP) value functions as solutions to Hamilton-Jacobi-Bellman (HJB) equations, and Budhiraja and Ghosh [15] and [16] prove convergence of QCP value function to BCP value functions. Many other works address these models in situations where the BCP are explicitly solvable, see e.g. Ata and Kumar [2] and references therein.
As far as BCP are concerned, the model studied here is a special case of the models considered in some of the aforementioned papers. In particular, BCP and RBCP play here important roles, where the reduction from an I-dimensional BCP to a one-dimensional RBCP is a special case of [30]. Moreover, the HJB equation, that in the present setting is an ordinary differential equation and will be referred to merely as a Bellman equation, is a special case of the partial differential equations treated in [7]. In addition, our specific one-dimensional RBCP, its relation to the Bellman equation, and its solution go back to Harrison and Taksar [28], where a singular control problem for a BM is solved. The solution is given by a reflected BM (RBM), with supporting interval determined by a free boundary problem associated with the Bellman equation. This type of free boundary problem first appeared in [28], and we therefore refer to it as the Harrison-Taksar free boundary problem. In our case, the interval is always of the form [0, x * ], and we call x * the free boundary point.
On the other hand, the works [15], [16] and [2], despite their vast generality, do not cover the present model as they do not treat admission control and rejection penalties. Thus, while the BCP is well understood, convergence and AO issues have not been addressed before. Addressing these issues is the main contribution of this paper. This is done by proving that the BCP value function constitutes a lower bound on the limit inferior of QCP costs under any sequence of policies (Theorem 3.1), and then constructing a specific policy that asymptotically achieves this lower bound (Theorem 4.1). This AO policy depends explicitly on the system parameters, except that it also depends on the quantity x * . Moreover, it uses the well-known cµ rule in a novel way, as we now explain.
The structure of the policy alluded to above is simple enough to describe without introducing much notation. The notation needed is as follows. For class-i customers, denote holding cost per unit time by h i , rejection penalty per customer by r i , and reciprocal mean service time by µ i . The policy is defined in terms of three elements: The index h i µ i , the index r i µ i , and the free boundary point x * . The first index is used for scheduling. It is precisely the index used for the cµ priority rule (a terminology used when c, rather than h, denotes holding cost per unit time), where classes are prioritized in the order of h i µ i , the highest priority given to the class i with greatest h i µ i . As observed first by Smith [39] and Cox and Smith [18], the cµ priority rule is exactly optimal for holding costs. Many extensions to this result have been shown (see e.g., [17], [40] and discussions therein). Our scheduling policy uses the same index to assign priorities, but in a state-dependent fashion, as follows. At any given time, the lowest priority is assigned to the class i having lowest index h i µ i among classes for which the buffers are not nearly full. We give precise meaning to the term 'nearly full'.
Let us contrast this with the case of infinite buffers and no rejection. For this model, an AO policy applying dynamic priorities, in the form of an extended version of the cµ rule, was developed by van Mieghem [40] to address nonlinear delay costs. When costs are linear, as they are in the present paper, it is the fixed priority rule according to h i µ i that is AO. Suppose now that I is the class that has lowest h i µ i value, so that class I is assigned lowest priority by this rule. Then, as is well-known since Whitt [42], the multiclass G/G/1 queue behaves in such a way that all classes i < I exhibit vanishing queuelength in the heavy traffic limit. Consequently, it is not only the aforementioned assignment rule that is AO. Any priority policy assigning lowest priority to the class I performs equally well, and is therefore AO for such a QCP. In other words, the only aspect of the index policy which is important for AO in the problem with infinite buffers and no rejections, is the class assigned the lowest priority. Thus there is a major difference between the way in which the index is used in the infinite buffer setting and in this paper. In the latter case, the full information on the ordering of classes is important.
The admission control is based on the other index, r i µ i , and the free boundary point x * . The significance of this index for admission control in heavy traffic was first noticed by Plambeck, Kumar and Harrison [34] (see below). Our policy acts as follows. When the diffusion-scaled workload level exceeds the level x * , all arrivals of one particular class are rejected. This is the class i having the least r i µ i value. When the workload level is below x * , all arrivals are admitted, except rejections that must take place so as to keep the buffer size constraint valid (namely arrivals that occur at a time when the corresponding buffer is full). We call these forced rejections. A property of the policy that is important for AO is that it maintains, with high probability, a low number of forced rejections. As a result, nearly all rejections occur when the workload exceeds x * , and only from one class. It is to this end that the scheduling policy prioritizes classes with nearly full buffer.
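The verbal description of the policy can be illustrated by a minimal Python sketch. It is an illustration under stated assumptions, not the construction analyzed in Section 4: the function names, the representation of queue lengths and buffer sizes as integer lists, and the `slack` parameter used to model "nearly full" are all hypothetical, and the free boundary point `x_star` is taken as a given input.

```python
from typing import Optional, Sequence


def rejection_class(r: Sequence[float], mu: Sequence[float]) -> int:
    """Index i* of the class with the smallest r_i * mu_i."""
    return min(range(len(r)), key=lambda i: r[i] * mu[i])


def admit(i: int, queue_len: Sequence[int], buffer_size: Sequence[int],
          scaled_workload: float, x_star: float,
          r: Sequence[float], mu: Sequence[float]) -> bool:
    """Decide whether a class-i arrival is admitted.

    A full buffer gives a forced rejection. Otherwise only class i*
    is rejected, and only while the scaled workload exceeds x_star.
    """
    if queue_len[i] >= buffer_size[i]:
        return False  # forced rejection: no room in buffer i
    if scaled_workload > x_star and i == rejection_class(r, mu):
        return False  # threshold rejection from the cheapest class
    return True


def lowest_priority_class(h: Sequence[float], mu: Sequence[float],
                          queue_len: Sequence[int],
                          buffer_size: Sequence[int],
                          slack: int) -> Optional[int]:
    """Class with the smallest h_i * mu_i among buffers not nearly full.

    'Nearly full' is modelled (as an assumption) as being within
    `slack` customers of capacity. Returns None if every buffer is
    nearly full.
    """
    candidates = [i for i in range(len(h))
                  if queue_len[i] < buffer_size[i] - slack]
    if not candidates:
        return None
    return min(candidates, key=lambda i: h[i] * mu[i])
```

With three classes, `h = [3, 2, 1]`, `r = [5, 1, 4]` and unit service rates, the sketch rejects only class index 1 above the threshold, and it moves the lowest priority from class index 2 to class index 1 once buffer 2 is nearly full.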
The aforementioned paper [34] studies the problem of minimizing rejection penalties, subject to throughput time constraints, for the multiclass G/G/1 queue in heavy traffic (see also Ata [1] for a closely related formulation). Each class has a deterministic constraint on the throughput time, and arrivals that are admitted into the system are assured that, with high probability, their throughput time constraint will be kept. This property of the policy is referred to as asymptotic compliance. The policy of [34] admits all arrivals except those from the class having lowest r i µ i value, and only when the workload exceeds a threshold value. Thus our admission policy resembles that of [34], except that our threshold level is characterized by the free boundary problem, whereas it is explicit in [34] (their scheduling policy is different than ours).
But the relation of our work to [34] is deeper than similarities in the admission policies. Reiman's snapshot principle [36], and the pathwise Little's law, state that, under suitable assumptions, a deterministic relation holds in the heavy traffic limit between throughput time and queuelength processes. Accordingly, buffer constraints on queuelength should be asymptotically equivalent to throughput time constraints. We follow this rationale in the last section of this paper, where we formulate a QCP that parallels the QCP addressed in the main body of the paper, but with finite buffer constraints replaced by throughput time constraints. This may be regarded as an extended version of the QCP of [34] that accommodates holding costs. We do not succeed in fully solving this problem here; our purpose in this part of the work is mainly to pose the problem and to discuss similarities with the main body of this paper, leaving the main question open. We begin by proving a pathwise Little's law in the form of a conditional result (Proposition 5.1). There is no guarantee that queuelength and throughput times satisfy Little's law under an arbitrary sequence of controls. We show that C-tightness of the processes involved suffices. Using this result we can show that the policy we develop for the finite buffer problem satisfies the throughput time constraints and that its limit performance is dominated by the BCP value (Theorem 5.1). In order to deduce that it is AO, a lower bound in the same form is also needed. However, because Little's law need not hold for general sequences of policies, we can only show AO in a restricted class of policies (Proposition 5.2). The broader problem, and hence the question of AO, remains open (see Conjecture 5.1).
Under the AO policy, the I-dimensional queuelength process converges to the process solving the BCP. This convergence is a form of state space collapse (SSC), a term referring to a behavior where queuelength process limits are dictated by workload process limits. SSC is an important ingredient in the analysis of queueing network models in heavy traffic. It has been considered in many works, and in particular in a general setting by Bramson [14] and Williams [43]. The form of the SSC obtained in this paper involves spatial inhomogeneity due to the dynamic priorities, and is not covered by [14], [43], or, to the best of our knowledge, any other work on SSC. A part of the proof of Theorem 4.1 is aimed at showing an SSC result.
For a different formulation of a QCP with finite buffers and rejection costs, see Ghosh and Weerasinghe [24]. For a formulation other than [34] that combines asymptotic compliance and asymptotic optimality see Plambeck [35]. See Ward and Kumar [41], Rubino and Ata [37] and Ata and Olsen [3] for other treatments of AO in heavy traffic via a Bellman equation with free boundary, and Dai and Dai [19] for results on heavy traffic for systems with finite buffers without optimal control aspects. Finally, see Ghamami and Ward [23] for asymptotic optimality results based on a Bellman equation for the BCP, for a model with customer abandonment rather than rejection.
We will use the following notation. Given k ∈ N, {e^(i), i = 1, . . . , k} denotes the standard basis of R^k [...] The modulus of continuity of y is given by [...]
The rest of this paper is organized as follows. In the next section we introduce the queueing model and QCP. Then we formulate the BCP and the RBCP, and state their solution via the Harrison-Taksar free boundary problem. We then discuss the interpretation of the solution. Section 3 shows that the BCP value function is a lower bound on the limit inferior of the sequence of value functions for the QCP. Section 4 constructs a policy for the QCP and proves that it is AO. Section 5 proves pathwise Little's law and relates the main body of the paper to the throughput time constraints formulation of [34].

Queueing and diffusion models

The multiclass G/G/1 model
A sequence of systems is considered, indexed by n ∈ N. Quantities that depend on n have n as superscript in their notation. The system has a single server and I ≥ 1 buffers, where each buffer is dedicated to a class of customers. The capacity of each of the buffers is limited, where the precise formulation of capacity is presented later. Customers that arrive at the system may either be accepted or rejected. Those that are accepted are queued in the corresponding buffers. Within each class, service is provided in the order of arrival, where the server only serves the customer at the head of each line. Processor sharing is allowed, in the sense that the server is capable of serving up to I customers (of distinct classes) simultaneously. An allocation vector, representing the fraction of effort dedicated to each of the classes, is any member of

$$\mathcal{B} = \Big\{\beta \in \mathbb{R}_+^I : \sum_{i \in \mathcal{I}} \beta_i \le 1\Big\}, \tag{1}$$

where, throughout, $\mathcal{I} = \{1, 2, \dots, I\}$.
A probability space (Ω, F, P) is given, on which all random variables and stochastic processes involved in describing the model will be defined. Expectation w.r.t. P is denoted by E. Arrivals occur according to independent renewal processes. Let parameters $\lambda^n_i > 0$, $i \in \mathcal{I}$, n ∈ N, be given, representing the reciprocal mean inter-arrival times of class-i customers in the n-th system. Let $\{IA_i(l) : l \in \mathbb{N}\}$, $i \in \mathcal{I}$, be independent sequences of strictly positive i.i.d. random variables with mean $E[IA_i(1)] = 1$, $i \in \mathcal{I}$, and squared coefficient of variation $\mathrm{Var}(IA_i(1))/(E[IA_i(1)])^2 = C^2_{IA_i} \in (0, \infty)$. With the convention $\sum_{k=1}^{0} = 0$, the number of arrivals of class-i customers up to time t, for the n-th system, is given by

$$A^n_i(t) = \sup\Big\{ l \ge 0 : \sum_{k=1}^{l} \frac{IA_i(k)}{\lambda^n_i} \le t \Big\}, \qquad t \ge 0. \tag{2}$$

The parameters $\lambda^n_i$ satisfy

$$n^{-1/2}(\lambda^n_i - n\lambda_i) \to \hat\lambda_i \quad \text{as } n \to \infty, \tag{3}$$

where $\lambda_i > 0$ and $\hat\lambda_i \in \mathbb{R}$ are fixed.
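As a small illustration of the renewal counting process, the following Python sketch counts arrivals up to time t from a finite sample of inter-arrival times. It assumes the usual reading of the definition, namely that the count at time t is the largest l for which the first l inter-arrival times, each divided by the rate, sum to at most t. The function name `arrivals_up_to` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


def arrivals_up_to(t: float, inter_arrivals: Sequence[float],
                   rate: float) -> int:
    """Number of arrivals in [0, t] for a finite sample.

    The k-th arrival epoch is the sum of the first k inter-arrival
    times divided by `rate`; the count is the number of epochs <= t.
    """
    # Arrival epochs are strictly increasing because the inter-arrival
    # times are strictly positive, so binary search applies.
    epochs = list(accumulate(x / rate for x in inter_arrivals))
    return bisect_right(epochs, t)
```

For example, with inter-arrival samples `[1.0, 0.5, 1.5]` and rate 2 the arrival epochs are 0.5, 0.75 and 1.5, so the count at time 1.0 is 2.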
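Similarly, let parameters $\mu^n_i > 0$, $i \in \mathcal{I}$, n ∈ N, be given, representing reciprocal mean service times for service to class i in the n-th system. Let independent sequences $\{ST_i(l) : l \in \mathbb{N}\}$, $i \in \mathcal{I}$, of strictly positive i.i.d. random variables (independent of the sequences $\{IA_i\}$) be given, with mean $E[ST_i(1)] = 1$ and squared coefficient of variation $\mathrm{Var}(ST_i(1))/(E[ST_i(1)])^2 = C^2_{ST_i} \in (0, \infty)$. The time required to complete the l-th service to a class-i customer in the n-th system is given by $ST_i(l)/\mu^n_i$ units of time dedicated by the server to this class. This can otherwise be stated in terms of the potential service time processes, given by

$$S^n_i(t) = \sup\Big\{ l \ge 0 : \sum_{k=1}^{l} \frac{ST_i(k)}{\mu^n_i} \le t \Big\}, \qquad t \ge 0. \tag{4}$$

Thus $S^n_i(t)$ is the number of class-i jobs completed by the time when the server has dedicated t units of time to work on jobs of this class. It is assumed that $\mu^n_i$ satisfy

$$n^{-1/2}(\mu^n_i - n\mu_i) \to \hat\mu_i \quad \text{as } n \to \infty, \tag{5}$$

where $\mu_i > 0$ and $\hat\mu_i \in \mathbb{R}$ are fixed. The first order quantities $\lambda_i$ and $\mu_i$ are assumed to satisfy the critical load condition

$$\sum_{i \in \mathcal{I}} \rho_i = 1, \qquad \rho_i = \frac{\lambda_i}{\mu_i}. \tag{6}$$

The number of class-i rejections until time t in the n-th system is denoted by $Z^n_i(t)$. Since rejections occur only at times of arrival, we have

$$Z^n_i(t) = \int_{[0,t]} z^{n,i}(s)\, dA^n_i(s) \tag{7}$$

for some process $z^{n,i}$.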
The number of class-i customers present in the n-th system at time t is denoted by $X^n_i(t)$. For simplicity, the initial number of customers, $X^n_i(0)$, is deterministic, and it is assumed that no partial service has been provided to any of the jobs present in the system at time zero. We will call $X^n = (X^n_i)_{i \in \mathcal{I}}$ the queuelength process. Let $B^n = (B^n_i)_{i \in \mathcal{I}}$ be a process taking values in the set $\mathcal{B}$, representing the fraction of effort devoted by the server to the various customer classes. Then

$$T^n_i(t) = \int_0^t B^n_i(s)\, ds \tag{8}$$

is the time devoted to class i by time t, the number of class-i departures by time t is

$$D^n_i(t) = S^n_i(T^n_i(t)), \tag{9}$$

and the queuelength satisfies

$$X^n_i(t) = X^n_i(0) + A^n_i(t) - D^n_i(t) - Z^n_i(t). \tag{10}$$

Use the notation $A^n$ for $(A^n_i)_{i \in \mathcal{I}}$ and similarly for the processes $S^n$, $Z^n$, $D^n$, $X^n$, $T^n$. It is assumed that $B^n$ has RCLL sample paths. By construction, the arrival and potential service processes also have RCLL paths, and accordingly, so do $D^n$, $Z^n$ and $X^n$.
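We define a rescaled version of the processes at diffusion scale as

$$\hat A^n_i(t) = n^{-1/2}(A^n_i(t) - \lambda^n_i t), \quad \hat S^n_i(t) = n^{-1/2}(S^n_i(t) - \mu^n_i t), \quad \hat X^n_i(t) = n^{-1/2} X^n_i(t), \quad \hat Z^n_i(t) = n^{-1/2} Z^n_i(t). \tag{11}$$

We now come to the buffer structure. A bounded closed convex set with nonempty interior $\mathcal{X} \subset \mathbb{R}^I_+$ is given, satisfying $0 \in \mathcal{X}$. It is assumed that, for every n, the rescaled initial condition $\hat X^n(0)$ lies in $\mathcal{X}$, and that the rejection mechanism assures that the buffer constraint is always met, namely

$$\hat X^n(t) \in \mathcal{X}, \qquad t \ge 0, \text{ a.s.} \tag{12}$$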
For example, the case $\mathcal{X} = \{y \in \mathbb{R}^I_+ : y_i \le b_i,\ i \in \mathcal{I}\}$ corresponds to a system having a dedicated buffer, of size $b_i\sqrt{n}$, for each class i. A single, shared buffer of size $b\sqrt{n}$ can be modeled by letting $\mathcal{X} = \{y \in \mathbb{R}^I_+ : \sum_i y_i \le b\}$. In any case, the actual (un-normalized) buffer size scales like $\sqrt{n}$. To meet the constraint (12), the control mechanism must reject some of the arrivals. In particular, consider a class-i arrival occurring at a time t when

$$\hat X^n(t-) + n^{-1/2} e^{(i)} \notin \mathcal{X}. \tag{13}$$

This arrival has to be rejected so as to keep (12) valid. Physically, this situation represents buffers being full, with no available space to accommodate new arrivals. Such rejections, which occur when (13) holds, are often called loss in the literature. In our setting, admission/rejection decisions are controlled by the decision maker, and it is natural to refer to these as part of the rejection control process. We will refer to them as forced rejections, to distinguish them from rejections that occur when the buffers are not full (i.e., when (13) does not hold).
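The two buffer structures and the notion of a forced rejection can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, and the forced rejection test assumes that the condition in question is that admitting one more class-i customer, which adds 1/√n to the i-th scaled coordinate, would take the scaled queuelength outside the buffer set.

```python
import math
from typing import Callable, List, Sequence


def in_dedicated(y: Sequence[float], b: Sequence[float]) -> bool:
    """Membership in {y >= 0 : y_i <= b_i for all i}."""
    return all(0.0 <= yi <= bi for yi, bi in zip(y, b))


def in_shared(y: Sequence[float], b: float) -> bool:
    """Membership in {y >= 0 : sum_i y_i <= b}."""
    return all(yi >= 0.0 for yi in y) and sum(y) <= b


def is_forced_rejection(x_hat: Sequence[float], i: int, n: int,
                        in_set: Callable[[List[float]], bool]) -> bool:
    """True if one more class-i customer would violate the constraint.

    x_hat is the scaled queuelength just before the arrival; a single
    customer contributes n**-0.5 in diffusion scale.
    """
    y = list(x_hat)
    y[i] += 1.0 / math.sqrt(n)
    return not in_set(y)
```

For instance, `is_forced_rejection([0.95, 0.2], 0, 100, lambda y: in_dedicated(y, [1.0, 1.0]))` evaluates to `True`, because the extra customer would push the first coordinate to 1.05.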
The process $U^n = (Z^n, B^n)$ is regarded as a control that is determined based on observations of past (and present) events in the system. The precise definition is as follows.
Definition 2.1. (Admissible control, QCP) Fix n ∈ N and consider fixed processes $(A^n, S^n)$ given by (2) and (4). $A^n$ and $S^n$ are called the primitive processes. A process $U^n = (Z^n, B^n)$, taking values in $\mathbb{R}^I_+ \times \mathcal{B}$, having RCLL sample paths with the processes $Z^n_i$, $i \in \mathcal{I}$, having nondecreasing sample paths and given in the form (7), is said to be an admissible control for the n-th system if the following holds. Let the processes $T^n$, $D^n$, $X^n$ be defined by the primitive and control processes, $(A^n, S^n)$ and $(Z^n, B^n)$, via equations (8), (9) and (10), respectively. Then
• $U^n$ is adapted to the filtration generated by the arrival and departure events up to the present time [...];
• One has a.s. that, for all $i \in \mathcal{I}$ and t ≥ 0, $X^n_i(t) = 0$ implies $B^n_i(t) = 0$.
An admissible control under which the scaled version $\hat X^n$ of $X^n$ satisfies (12) is said to satisfy the buffer constraints.
The first bullet above asserts that control decisions are based on the past arrival and departure events. The second bullet expresses the fact that jobs from a certain class can be processed only if there is at least one customer of that class in the system. We denote the class of all admissible controls U n byŨ n , and the subset of those members ofŨ n satisfying the buffer constraints, by U n . Except for the last section of this paper, we will refer to processes in U n as merely admissible controls, for short. Note that the class U n depends on the processes A n and S n , but we consider these processes to be fixed.
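Fix α > 0, $h \in (0, \infty)^I$ and $r \in (0, \infty)^I$. For each n ∈ N consider the cost

$$J^n(U^n) = E\Big[\int_0^\infty e^{-\alpha t}\big(h \cdot \hat X^n(t)\, dt + r \cdot d\hat Z^n(t)\big)\Big]. \tag{15}$$

It will be assumed throughout that, for some $x_0 \in \mathcal{X}$,

$$\hat X^n(0) \to x_0 \quad \text{as } n \to \infty. \tag{16}$$

The QCP value is given by

$$V^n = \inf_{U^n \in \mathcal{U}^n} J^n(U^n). \tag{17}$$

We will be interested in the asymptotic behavior of $V^n$.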
Denote $\theta^n = (\theta^n_i)_{i \in \mathcal{I}}$, $\theta^n_i = 1/\mu^n_i$, and $\theta = (\theta_i)_{i \in \mathcal{I}}$, $\theta_i = 1/\mu_i$. The process $\theta^n \cdot X^n$, its normalized version $\theta^n \cdot \hat X^n$ and the formal limit of the latter, $\theta \cdot X$, will play an important role in reducing the dimensionality of the problem. These processes are often referred to as the nominal workload (e.g., in [34]), but we will refer to them simply as workload.

The Brownian control problems
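Using (6), (10) and the definition of the rescaled processes, a simple calculation shows that the following identity holds for $i \in \mathcal{I}$ and t ≥ 0:

$$\hat X^n_i(t) = \hat X^n_i(0) + \hat W^n_i(t) + \hat Y^n_i(t) - \hat Z^n_i(t), \tag{18}$$

where, denoting $T^n_i(t) = \int_0^t B^n_i(s)\, ds$ and $\rho_i = \lambda_i/\mu_i$,

$$\hat W^n_i(t) = \hat A^n_i(t) - \hat S^n_i(T^n_i(t)) + m^n_i t, \tag{20}$$

$$\hat Y^n_i(t) = \frac{\mu^n_i}{\sqrt{n}}\big(\rho_i t - T^n_i(t)\big), \tag{21}$$

$$m^n_i = n^{-1/2}(\lambda^n_i - \rho_i \mu^n_i).$$

Since $\sum_i \rho_i = 1$ and one always has $\sum_i B^n_i(t) \le 1$, it follows that $\theta^n \cdot \hat Y^n$ is a nonnegative, nondecreasing process.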
We derive from (18)-(22) and (15)-(17) a control problem associated with diffusion by taking formal limits. Consider equation (18). The scaled initial conditions converge to $x_0$ by (16). Next, the centered, rescaled renewal process $\hat A^n_i$ [resp., $\hat S^n_i$] converges weakly to a BM starting from zero, with zero mean and diffusion coefficient $\sqrt{\lambda_i}\, C_{IA_i}$ [resp., $\sqrt{\mu_i}\, C_{ST_i}$] (see Section 17 of [13]). Heuristically, if the processes involved in (18) are to give rise to a limiting BCP then in particular $\hat Y^n$ are order one as n → ∞. Thus by (21) one has that $T^n(t)$ converge to ρt. Thus, taking into account the time change in the second term of (20), $\hat W^n$ is to be replaced by a BM starting from zero, with drift vector $m = (m_i)_{i \in \mathcal{I}}$ and diffusion matrix $\sigma = \mathrm{diag}(\sigma_i)$, where

$$m_i = \hat\lambda_i - \rho_i \hat\mu_i, \qquad \sigma_i^2 = \lambda_i\big(C^2_{IA_i} + C^2_{ST_i}\big).$$

Such a process will be called an (m, σ)-BM. Finally, $\hat Y^n$ gives rise to a process Y for which θ · Y is nonnegative and nondecreasing, whereas $\hat Z^n$ gives rise to a process having nonnegative, nondecreasing components.

The BCP
Definition 2.2. (Admissible control, BCP) An admissible control for the initial condition for which there exist an (m, σ)-BM, W , and a process U = (Y, Z) taking values in (R I + ) 2 , with RCLL sample paths, such that the following conditions hold: • θ · Y and Z i , i = 1, . . . , I, are nondecreasing; • With one has X(t) ∈ X for all t, P ′ -a.s.
We write A(x 0 ) for the class of admissible controls for the initial condition x 0 . When we write (Y, Z) ∈ A(x 0 ) it will be understood that these processes carry with them a filtered probability space and the processes W and X. Moreover, with a slight abuse of notation, we will write E for the expectation corresponding to this probability space. For (Y, Z) ∈ A(x 0 ), let The BCP is to find (Y, Z) that minimize J(Y, Z) and achieve the value

The RBCP
The BCP is treated by reduction to a one-dimensional problem. This is obtained by multiplying equation (25) and the processes involved in it by θ.
Definition 2.3. (Admissible control, RBCP) An admissible control for the initial condition for which there exist an (m,σ)-BM,W , and a processŪ = (Ȳ ,Z) taking values in R 2 + , with RCLL sample paths, such that the following conditions hold: We writeĀ(x 0 ) for the class of admissible controls for the initial conditionx 0 . Given (Ȳ ,Z) ∈ A(x 0 ), letJ . Note thath is convex by convexity of the set X (in case when X is polyhedral,h is also piecewise linear). Note also that as members of (0, ∞) I , θ and h cannot be orthogonal, thush(w) > 0 for any w > 0. Sinceh(0) = 0, it follows thath is strictly increasing. Let Toward relating the two problems, we will need the following additional definitions. First, the extremal points of the set {z ∈ R I Fix such i * and the corresponding ζ * . Note that i * can alternatively be characterized via Next, let γ : [0, x] → X be Borel measurable, satisfying (For the existence of a measurable selection see Corollary 10.3 in the appendix of [22]). Note that, by definition, γ(w) ∈ X , θ · γ(w) = w, and h · γ(w) =h(w) ≤ h · ξ for every ξ ∈ X for which θ · ξ = w. The relation between the problems is as follows.
forx 0 for the RBCP be given, and assume the probability space supports an (m, σ)-BM W . Assume W is {F ′ t }-adapted and satisfies θ · W =W and (23). Construct (X, Y, Z) by Proof. i. We verify that Definition 2.3 is satisfied by (W ,X,Ȳ ,Z). The first three bullets in that definition are straightforward. Equation (31) follows from (25), while (32) from (26).
and by definition ofr, Therefore ii. We show that (Y, Z) ∈ A(x 0 ) by verifying that Definition 2.2 is satisfied. The adaptedness follows by the assumption on W and the construction of X, Y and Z. Property (23) holds by assumption. By construction, (X, Y, Z) satisfy (25). Property (26) holds because, by definition, Hence θ · Y is nonnegative, nondecreasing. As a result, (Y, Z) ∈ A(x 0 ). Therefore iii. The last assertion will follow from the first two once we show that, in (ii), one can always find W with the stated properties. This is possible by supplementing the one-dimensional BM W with an (I − 1)-dimensional BM, independent ofW , and augmenting the probability space accordingly. Specifically, ifW is a (one-dimensional) (m,σ)-BM w.r.t. a filtration {F t } t≥0 and W is a standard (I − 1)-dimensional BM independent ofW then it is not hard to see that an I × (I − 1) matrixÂ and I-dimensional vectorsĀ and a can be found so that the I-dimensional process W (t) =ÂŴ (t) +ĀW (t) + at is an (m, σ)-BM and one has θ · W (t) =W (t), t ≥ 0. Letting then gives a filtration with which all conditions of an admissible control for the BCP are satisfied.

The Harrison-Taksar free boundary problem
The solution to the one-dimensional problem has been studied by Harrison and Taksar [28] via the Bellman equation. They showed that the functionV is C 2 [0, x] and solves the equation It follows from their work that an optimal control is one under which the processX is a RBM on a certain subinterval of [0, x]. We will consider a RBM as a path transformation of a BM by a Skorohod map, a map that will later be used in a wider context. To introduce this map, let a > 0.
It is characterized as the solution map ψ → (ϕ, η 1 , η 2 ) to the so called Skorohod Problem, namely the problem of finding, for a given ψ, a triplet (ϕ, η 1 , η 2 ), such that η i are nonnegative and nondecreasing, η i (0−) = 0, and See [32] for existence and uniqueness of solutions, and continuity and further properties of the map. In particular, it is well-known that Γ [a,b] is continuous in the uniformly-on-compacts topology.
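For intuition, the two-sided Skorohod map on an interval [0, a] can be computed exactly for discrete (piecewise constant) paths by a clipping recursion. The Python sketch below is illustrative only: it assumes the standard formulation in which ϕ = ψ + η1 − η2 takes values in [0, a], η1 increases only when ϕ is at 0, and η2 increases only when ϕ is at a. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def skorokhod_two_sided(
        psi: Sequence[float], a: float
) -> Tuple[List[float], List[float], List[float]]:
    """Two-sided reflection of a discrete path psi on [0, a], a > 0.

    Returns (phi, eta1, eta2) with phi = psi + eta1 - eta2, where
    eta1 pushes up at 0 and eta2 pushes down at a.
    """
    phi: List[float] = []
    eta1: List[float] = []
    eta2: List[float] = []
    up = down = 0.0          # cumulative pushing at 0 and at a
    prev_phi = prev_psi = 0.0
    for k, value in enumerate(psi):
        # Unconstrained position after applying the increment of psi.
        pre = value if k == 0 else prev_phi + (value - prev_psi)
        up += max(0.0, -pre)        # amount needed to stay >= 0
        down += max(0.0, pre - a)   # amount needed to stay <= a
        prev_phi = min(max(pre, 0.0), a)
        prev_psi = value
        phi.append(prev_phi)
        eta1.append(up)
        eta2.append(down)
    return phi, eta1, eta2
```

Applied to increments of a simulated Brownian path, this recursion gives a simple approximation of a RBM on [0, a], with `eta1` and `eta2` playing the roles of the boundary terms at 0 and a.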
We now go back to the RBCP. The following is mostly a result of [28].
W be an (m,σ)-BM and letX,Ȳ andZ be the corresponding RBM on [0, x * ] and boundary terms for 0 and x * , defined as Proof. The fact thatV is C 2 and solves the equation is proved in [28], Proposition 6.6 and the discussion that follows. Uniqueness follows from the uniqueness of solutions in the viscosity sense, for a class of equations for which the above is a special case [7]. Let us explain how. It follows from the main result of [7] that uniqueness of viscosity solutions holds for (41) where the Neumann boundary condition (BC) is replaced by a state constraint BC (see [7] for the definitions of viscosity solutions and state constraint BC). As is well-known (and follows directly from the definition), any C 2 function satisfying equation (41) is also a viscosity solution in the interior (0, x). As for the state constraint BC, it is easy to check (again following directly from the definition) that any smooth function satisfying the Neumann BC f ′ (0) = 0 and f ′ (x) =r, also satisfies the state constraint BC. This gives the uniqueness.
It is shown in [28] (see the discussion preceding (6.9) therein) that Next, it is shown in [28] that the control under whichX is a RBM on [a, x * ], for some 0 ≤ a < x * , is optimal. It remains to show that, in the case considered in this paper, a = 0. The argument relies on the fact thath is strictly increasing, as shown in the discussion following (33).
Arguing by contradiction, assume a > 0. The interval [a, x * ] is independent of the initial condition, and so we are free to choose anyx 0 . Considerx 0 = 0. Consider a BMW and the processX =x 0 +W +Ȳ −Z that initially has the value a and is given as a RBM on [a, x * ], driven byW . By the result of [28] alluded to above, (Ȳ ,Z) is optimal, i.e.,J(0,Ȳ ,Z) =V (0). Let τ be the first hitting time ofX at a + ε < x * . Next, construct on the same probability space another triplet (X,Ỹ ,Z), whereX behaves as a RBM on [0, x * ], driven byW , up to the time τ , and starting at time τ agrees with X (in particular, it has a jump at time τ ). In other words, is equal to that incurred by (X,Ȳ ,Z) on that interval, while, owing to the strict monotonicity of h and the positivity of τ , Note that no cost of the formrdZ is incurred during the time interval [0, τ ). Taking expectations showsJ(0,Ỹ ,Z) <J(0,Ȳ ,Z) =V (0), a contradiction. This shows a = 0.
As an immediate consequence of the above two results, we obtain an optimal control for the BCP.

Discussion
A brief description of the solution to the BCP is as follows. The workload process $\bar X = \theta \cdot X$ is given as a RBM on [0, x*], where the free boundary point x* is dictated by the Bellman equation. The multidimensional queuelength process X is recovered from $\bar X$ by $X = \gamma(\bar X)$. The multidimensional rejection process Z has only one nonzero component, namely the i*-th component, which increases only when $\bar X \ge x^*$. This structure has an interpretation for the queueing model that can be used to identify asymptotically optimal policies. Our main interest will be in the case of a rectangular domain, namely

$$\mathcal{X} = \{y \in \mathbb{R}^I_+ : y_i \le b_i,\ i \in \mathcal{I}\} \tag{45}$$

for some fixed $b_i > 0$, representing a system where each class has a dedicated buffer (this will be our assumption in Section 4, although in Section 3 we allow general domains). In this case, the parameter x associated with the RBCP (defined in (29)) is given by θ · b.
The BCP solution suggests that, in the queueing model, rejections should occur only when the scaled workload exceeds the level x * , and only from class i * . Recall from (34) that this class is the class for which r i µ i is minimal. As explained in [34], i * is the class for which the rejection penalty per unit of work is smallest.
Next, the relation

$$\hat X^n = \gamma(\theta \cdot \hat X^n) + o(1) \tag{46}$$

between the queuelength and workload processes should hold. This is a requirement on the scheduling control. As mentioned in the introduction, when a critically loaded multiclass G/G/1 queue operates under fixed priority, the queuelength of all classes but one is asymptotically zero in diffusion scale, the exception being the class with least priority [42]. This is a simple example of a scheduling policy that dictates a relation of the form (46), where here $\gamma(w) = (0, \dots, 0, w\mu_I)$. Relation (46) with a more complicated γ appears implicitly when applying the generalized cµ rule of [40]. In [10] and [8] the scheduling policies maintain (46) where γ is a generic minimizing curve.
We can solve for the minimizing curve γ in the present setting, where $\mathcal{X}$ takes the form (45). Equation (35) can in this case be written as

$$\gamma(w) \in \arg\min\{h \cdot \xi : \xi \in \mathbb{R}^I_+,\ \xi_i \le b_i,\ i \in \mathcal{I},\ \theta \cdot \xi = w\}.$$

Assume that the classes are labeled in such a way that

$$h_1\mu_1 \ge h_2\mu_2 \ge \cdots \ge h_I\mu_I.$$

Given w ∈ [0, x], let (j, ξ) be the unique pair determined by the relation

$$w = \sum_{i=j+1}^{I} \theta_i b_i + \theta_j \xi, \qquad j \in \{1, \dots, I\},\ \xi \in [0, b_j).$$

An exception is the special case w = x = θ · b, where one lets j = 1 and $\xi = b_1$. In other words, denote $\hat b_j = \sum_{i=j+1}^{I} \theta_i b_i$, j ∈ {0, . . . , I}, and note that $0 = \hat b_I < \hat b_{I-1} < \cdots < \hat b_1 < \hat b_0 = \theta \cdot b = x$. Then j = j(w) is the index for which $w \in [\hat b_j, \hat b_{j-1})$, and $\xi = \xi(w) = (w - \hat b_j)/\theta_j$. Thus γ can be written explicitly as

$$\gamma(w) = \sum_{i=j+1}^{I} b_i e^{(i)} + \xi e^{(j)}.$$

Simple examples are depicted in Figure 1. While the usual use of the cµ rule is by assigning fixed priority, here the index shows up differently. When buffer I becomes full and workload is increased, a queue in buffer I − 1 starts building up, and so on. A policy that aims at achieving (46) is developed in Section 4. Examples for the case of a shared buffer are depicted in Figure 2. As shown in this figure, the case of two classes with a shared buffer leads to a triangular domain. In higher dimension one may think of one set of classes sharing one buffer, another set sharing another buffer etc., leading to more examples of non-rectangular domains. General domains are covered in this paper as far as the lower bound is concerned, but we only address AO controls for the case of rectangular domains.
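The explicit form of the minimizing curve for a rectangular domain amounts to filling buffers in the order I, I − 1, . . . , 1 until the workload w is accounted for. The Python sketch below illustrates this under the assumption that classes are already labeled so that class I has the smallest h_i µ_i. The function names `gamma` and `h_bar` are hypothetical, and `h_bar` evaluates the reduced holding cost as h · γ(w).

```python
from typing import List, Sequence


def gamma(w: float, theta: Sequence[float],
          b: Sequence[float]) -> List[float]:
    """Queuelength vector of workload w with minimal holding cost.

    Classes are assumed ordered so that h_i * mu_i is nonincreasing
    in i; the last class is filled first, then the one before it,
    and so on.
    """
    total = sum(t * bi for t, bi in zip(theta, b))
    if w < 0.0 or w > total:
        raise ValueError("workload must lie in [0, theta . b]")
    x = [0.0] * len(b)
    remaining = w
    for j in reversed(range(len(b))):
        capacity = theta[j] * b[j]       # work held by a full buffer j
        if remaining >= capacity:
            x[j] = b[j]                  # buffer j is full
            remaining -= capacity
        else:
            x[j] = remaining / theta[j]  # partially filled buffer
            break
    return x


def h_bar(w: float, h: Sequence[float], theta: Sequence[float],
          b: Sequence[float]) -> float:
    """Reduced holding cost rate h . gamma(w)."""
    return sum(hi * xi for hi, xi in zip(h, gamma(w, theta, b)))
```

With `theta = [1.0, 0.5, 0.25]` and `b = [1.0, 2.0, 4.0]`, a workload of 1.5 gives `gamma(1.5, theta, b) == [0.0, 1.0, 4.0]`: buffer 3 is full and the remaining half unit of work sits in buffer 2.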
Example 2.1. (Numerical solution of the BCP) In this example we consider a specific three-dimensional BCP and provide its solution explicitly. The parameters are given in the following table: The function $\bar h$ defined by A numerical solution of the equation is shown in Figure 3 below.
The free boundary point in this case, found numerically, is the point $x^* \approx 1.47$ at which $V' = \bar r$. The curve γ from (48)-(49) is given by (We have not specified γ for values of w beyond the free boundary point 1.47.) Note that the structure of this curve is of the form depicted in Figure 1 (right).

Example 2.2. (Numerical solution of the QCP)
Here we present simulation results for the behavior of a two-class M/M/1 queue operating under the optimal policy. While for general service time and inter-arrival time distributions finding the optimal policy is hard, in the case of Poisson arrivals and exponential service times the problem has the form of a Markov decision process and one has access to the optimal policy by means of the corresponding Bellman equation on the discrete 2d grid. We have solved this equation numerically, computed the optimal policy based on the solution, and run a simulation for the behavior of the resulting queuelength process. Figure 4 depicts histograms for the position of the two-dimensional queueing process, where gray levels encode the frequency of visits to each site in the state space (darker gray corresponds to more often visited sites). The histograms are depicted for an increasing value of the heavy traffic parameter. The results clearly indicate that the behavior becomes closer and closer to that given by the limit curve of the form depicted in Figure 1 (left).
While this numerical analysis is related to our results, note carefully that the relation is indirect: the simulation runs demonstrate the behavior under the optimal policy, whereas our results address the asymptotics under a sub-optimal (but AO) policy. The two are related in that both show convergence to the limit behavior identified by the BCP solution.
Finally, we have also simulated the performance of the sub-optimal policy that we propose. The graph in Figure 5 shows the ratio between the cost under the proposed policy and the optimal cost for different values of n.
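The Bellman equation on the two-dimensional grid mentioned in this example can be solved by value iteration after uniformization. The following is a minimal sketch under stated assumptions, not the authors' numerical scheme (which the text does not specify): the function name `solve_mdp`, the preemptive choice of one served class, the option to idle, and all parameter values are hypothetical.

```python
import itertools


def solve_mdp(lam, mu, h, r, b, alpha, iters=300):
    """Value iteration for a uniformized two-class M/M/1 queue with
    admission and scheduling control, finite buffers b, holding costs h,
    rejection costs r and discount rate alpha (all values are assumptions)."""
    mu_max = max(mu)
    rate = sum(lam) + mu_max                      # uniformization rate
    states = list(itertools.product(range(b[0] + 1), range(b[1] + 1)))
    V = {s: 0.0 for s in states}

    def backup(V, s):
        total = h[0] * s[0] + h[1] * s[1]         # holding cost rate
        for i in (0, 1):                          # arrival of class i
            reject = r[i] + V[s]
            if s[i] < b[i]:
                up = (s[0] + (i == 0), s[1] + (i == 1))
                total += lam[i] * min(V[up], reject)   # accept or reject
            else:
                total += lam[i] * reject               # forced rejection
        options = [mu_max * V[s]]                 # idling
        for k in (0, 1):                          # serve class k
            if s[k] > 0:
                down = (s[0] - (k == 0), s[1] - (k == 1))
                options.append(mu[k] * V[down] + (mu_max - mu[k]) * V[s])
        total += min(options)
        return total / (alpha + rate)

    residual = float("inf")
    for _ in range(iters):
        new_V = {s: backup(V, s) for s in states}
        residual = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
    return V, residual


if __name__ == "__main__":
    V, residual = solve_mdp(lam=(0.45, 0.45), mu=(1.0, 1.0), h=(2.0, 1.0),
                            r=(5.0, 3.0), b=(4, 4), alpha=0.5)
    print(V[(0, 0)], residual)
```

The backup operator is a contraction with modulus rate/(alpha + rate), so the residual decays geometrically. An optimal policy can be read off from the minimizing actions in `backup`, and the heavy traffic regime corresponds to letting the arrival and service rates grow with the load approaching one.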

A general lower bound
Recall that $V^n$ is defined for the specific initial condition $\hat X^n(0)$, and that by (16), $\hat X^n(0) \to x_0$ as $n \to \infty$. The main result of this section asserts that the performance of any sequence of policies for the queueing model is asymptotically bounded below by the BCP value function. With an eye toward the last section, we will, in fact, prove a slightly stronger result. Instead of assuming the hard constraint (12), which is part of the definition of 'admissible controls satisfying the buffer constraint', we will assume throughout this section the following weaker condition.
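For every open set $\tilde X \subset \mathbb{R}^I$ with $X \subset \tilde X$, and every $T > 0$, Denote by $I$ the operator $I\varphi(t) = \int_0^t \varphi(s)\,ds$, $t \ge 0$, for locally integrable functions $\varphi$.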
Proof. The identity will follow from integration by parts once we show that the three terms $e^{-\alpha t} I\hat X^n(t)$, $e^{-\alpha t}\hat Z^n(t)$ and $e^{-\alpha t} I\hat Z^n(t)$ converge to zero a.s. as $t \to \infty$. Note by (10) that As a renewal process with finite expectation, $A^n_i$ satisfies a law of large numbers in the sense that $A^n_i(t)/t$ converges a.s. as $t \to \infty$. Thus the three terms alluded to above converge to zero a.s., and the identity follows.
Before stating the following lemma we introduce some additional notation. Let A P (x 0 ) denote the class of controls for the BCP, defined as in Definition 2.2, except that instead of having RCLL paths, the processes are only assumed to be progressively measurable. More precisely, an element of A P (x 0 ) is a filtered probability space (Ω ′ , F ′ , {F ′ t }, P ′ ) with an (m, σ)-BM, W , and a progressively measurable process (Y, Z) taking values in (R I + ) 2 , such that W is adapted, , and, on an event having full P ′ -measure one has: θ · Y is a.e. equal to a process with nondecreasing sample paths; the same holds for each of the processes Z i , i = 1, . . . , I; and, with Note that A(x 0 ) ⊂ A P (x 0 ).
The purpose of introducing this extended class of controls is as follows. The technique employed in the proof of Theorem 3.1 below is based on tightness of the processes $I\hat X^n$, $I\hat Y^n$ and $I\hat Z^n$ rather than $\hat X^n$, $\hat Y^n$ and $\hat Z^n$. It is established that the limits of these processes have Lipschitz continuous sample paths, and as a result they are a.e. differentiable. In order to connect these limits to the BCP one needs to construct from them an admissible control for the latter, but since the derivatives of Lipschitz functions need not be RCLL, the class of controls $A(x_0)$ is too small for this purpose.
Using instead the class A P (x 0 ) is possible thanks to a result from [20] (see below) that shows that progressively measurable a.e. derivatives always exist.
The following lemma shows that working with the extended class of controls does not vary the value function. (27), (28) of J and V ). Then Proof. Given x 0 , consider the specific control that is optimal for V (x 0 ), namely (X, Y, Z) given in Proposition 2.1(ii), where (X,Ȳ ,Z) is the RBM on [0, x * ]. In particular, Z(t) = ζ * Z (t), whereZ is one of the boundary terms of a RBM. It is well known that e −αt (Z(t) + IZ(t)) → 0 a.s., as t → ∞. As a result, a similar statement holds for e −αt (Z i (t) + IZ i (t)), for each i ∈ I. Using integration by parts, this shows that V ( Next, let ε > 0 and consider an ε-optimal control for V P (x 0 ), again denoted by (X, Y, Z). Fix T > 0. Construct processes (X,Ỹ ,Z) that are identical to (X, Y, Z) on [0, T ). As for the time interval [T, ∞), letX be a RBM on [0, x * ] starting fromX(T ) = 0, and let (X,Ỹ ,Z) be constructed from this RBM in the same fashion that (X, Y, Z) are constructed fromX in the first part of the proof. In particular,Z satisfies e −αt (Z i (t) + IZ i (t)) → 0, for each i ∈ I (and it may have a jump at T ). By construction, (Ỹ ,Z) is progressively measurable. The constructed processes thus form an element of A P (x 0 ), and owing to the above tail condition, using integration by parts, J(x 0 ,Ỹ ,Z) =J(x 0 ,Ỹ ,Z). Now, using the equalityZ = Z on [0, T ). Since on [T, ∞),Z −Z(T ) is the boundary term of an RBM on a fixed interval, it is a standard fact that the second term in the above display converges to zero as T → ∞. As for the last term, sinceJ(x 0 ,Ỹ ,Z) < ∞, one has E ∞ T e −αt r · IZ(t)dt → 0 as T → ∞, thus using monotonicity of r ·Z, where a(T ) → 0 as T → ∞. Taking T → ∞ shows J(x 0 ,Ỹ ,Z) ≤ V P (x 0 ) + ε. Thus to complete the proof, it suffices to show that J(x 0 ,Ỹ ,Z) ≥ V (x 0 ). This is not immediate, because (Ỹ ,Z) is an element of A P (x 0 ) whereas V is defined with the smaller class A(x 0 ). We will argue by passing to the one-dimensional problem. 
To this end, note that (38) and (39) are valid for the progressively measurable processes, thus Now, the processes θ ·Ỹ and θ ·Z are pathwise nondecreasing, due to the definition of A P (x 0 ). Hence, if we defineŶ (t) = lim s↓t θ · Y (s),Ẑ(t) = lim s↓t θ · Z(s) andX = θ · x 0 + θ · W +Ŷ −Ẑ, then X,Ŷ andẐ are RCLL. Moreover, they satisfy all assumptions of Definition 2.3, withx 0 = θ · x 0 andW = θ · W . As a result, they are inĀ(x 0 ), and so J(x 0 ,Ỹ ,Z) ≥J(x 0 ,Ŷ ,Ẑ) ≥V (x 0 ). By Proposition 2.1,V (x 0 ) = V (x 0 ). We have thus shown that V (x 0 ) ≤ V P (x 0 ) + ε, and the result follows on taking ε → 0.
In the proof below and in the next section we will use the following characterization of C-tightness for processes with sample paths in $D_{\mathbb{R}}$ (see Proposition VI.3.26 of [31]): C-tightness of $\{X^N\}$, $N \in \mathbb{N}$, is equivalent to the following two conditions.

(53) The sequence of random variables $\|X^N\|_T$ is tight for every fixed $T < \infty$.

(54) For every $T < \infty$, $\varepsilon > 0$ and $\eta > 0$ there exist $N_0$ and $\theta > 0$ such that $N \ge N_0$ implies $P(\bar w_T(X^N; \theta) > \eta) < \varepsilon$,

where $\bar w$ is defined in (1).
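To make the second condition concrete, the modulus of continuity of a sampled path can be computed directly. The sketch below assumes that (1) is the usual definition, namely the supremum of |x(s) − x(t)| over s, t in [0, T] with |s − t| ≤ θ; the function name `modulus` and the uniform sampling grid are illustrative assumptions.

```python
def modulus(path, dt, T, theta):
    """Modulus of continuity of a path sampled on the grid 0, dt, 2*dt, ...

    Returns max |path[j] - path[i]| over grid times in [0, T] that are
    at most theta apart.
    """
    n = min(len(path) - 1, int(round(T / dt)))   # last grid index in [0, T]
    lag = int(theta / dt + 1e-9)                 # largest admissible index gap
    best = 0.0
    for i in range(n + 1):
        for j in range(i + 1, min(i + lag, n) + 1):
            best = max(best, abs(path[j] - path[i]))
    return best


if __name__ == "__main__":
    linear = [0.0, 1.0, 2.0, 3.0, 4.0]           # slope one, dt = 1
    print(modulus(linear, dt=1.0, T=4.0, theta=2.0))
```

For a path with a jump, the modulus does not shrink as θ decreases to the grid spacing, which is why condition (54) forces limits with continuous sample paths.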
Proof of Theorem 3.1. The structure of the proof is as follows. We invoke Lemma 3.1, which allows us to work with the cost associated with the integrated versions of the processes. We establish C-tightness of the integrated processes; more precisely, of the sequence $(\hat W^n, I\hat X^n, I\hat Y^n, I\hat Z^n)$. The rest of the proof is devoted to showing that any subsequential limit of this sequence gives rise to a control within the extended class $A_P$, where the justification for working with the extended class is provided by Lemma 3.2.
We thus will rely on Lemma 3.1 and work withJ n . Using (17) and Lemma 3.1, V n = inf U nJ n (U n ). Fix a subsequence {n ′ } along which limJ n ′ (U n ′ ) = V , and relabel it as {n}. Assume, without loss of generality, thatJ n (U n ) < V (x 0 ) + 1 for all n. ThenJ n (U n ) is bounded, and so is J n (U n ), and therefore, for every T < ∞, This shows that Ẑ n (T ) , n ∈ N, is tight as a sequence of r.v.s, for each T .
Recall that $\hat A^n$ and $\hat S^n$ converge u.o.c. to BMs, and note by (8) that $T^n_i(t) \le t$ for every $t$. Using this together with equations (19) and (20), one sees that the sequence of processes $\hat W^n$ is C-tight.
Given T , using the monotonicity ofẐ n i (·), the Lipschitz constant of IẐ n i | [0,T ] is bounded by Ẑ n (T ) . Thus, using the characterization (53)-(54), the tightness ofẐ n (T ) for each T implies that IẐ n is a C-tight sequence of processes. The condition (51) implies that, for every T , X n T , n ∈ N, is a tight sequence of random variables. As a result, by (53)-(54), the sequence IX n is also C-tight. Next, by (18), It follows from this discussion that, for each T , is a tight sequence of r.v.s, and that (IX n , IŶ n , IẐ n ) is C-tight, with bound L n (T ) on the Lipschitz constant over the interval [0, T ]. Since L n (T ) are tight for each T , any weak limit point of the C-tight sequence is a process having locally Lipschitz paths a.s.
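The Lipschitz bound used above is elementary: if Z is nonnegative and nondecreasing, the integrated path IZ(t) = ∫_0^t Z(s) ds has slope at most Z(T) on [0, T]. The following sketch checks this numerically on a randomly generated path; the helper names `integrate` and `max_slope`, the left Riemann sum discretization, and the test path are illustrative assumptions.

```python
import random


def integrate(path, dt):
    """Left Riemann sums: out[k] approximates the integral over [0, k*dt]."""
    out = [0.0]
    for value in path[:-1]:
        out.append(out[-1] + value * dt)
    return out


def max_slope(path, dt):
    """Largest absolute difference quotient between neighbouring grid points."""
    return max(abs(path[k + 1] - path[k]) / dt for k in range(len(path) - 1))


random.seed(0)
dt = 0.01
Z = [0.0]
for _ in range(1000):
    Z.append(Z[-1] + random.random() * dt)   # nonnegative, nondecreasing path

IZ = integrate(Z, dt)

if __name__ == "__main__":
    # the slope of IZ never exceeds the terminal value Z(T)
    print(max_slope(IZ, dt), Z[-1])
```

The integrated path is also convex, since its increments Z(k dt) dt are nondecreasing in k; this is the property exploited at the end of the proof to recover monotonicity of the limit.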
Next, since for each T , the sequence Ŷ n T is tight, we have by (21) and the fact µ n i / √ n → ∞, (20), using a lemma regarding random change of time [13], p. 151, it follows thatŴ n ⇒ W , where we recall that W is an (m, σ)-BM.
By tightness of (Ŵ n , IX n , IŶ n , IẐ n ), there exists a convergent subsequence. Denote its limit by (W, IX, IY, IZ). Note that the last three terms have Lipschitz sample paths. By an argument as in section IV.17 of [20], they possess a.e. derivatives that are progressively measurable w.r.t. the filtration Then (X, Y, Z) are progressively measurable, and IX = IX. We will show below that these processes along with the filtration {F ′ t } form an element of the class A P (x 0 ). Consequently, using Lemma 3.1 and Fatou's lemma for the subsequence under consideration, where in the last equality we used Lemma 3.2.
It thus remains to show that the progressively measurable processes we have constructed form an element of the class A P (x 0 ). To show (23), we borrow a few lines from the proof of Lemma 6 of [9]. Fix 0 ≤ s ≤ t < t + u. Let α n = (Ŵ n (s), IX n (s), IŶ n (s), IẐ n (s)) and α = (W (s), IX(s), IY (s), IZ(s)). For i ∈ I let t n i [resp., τ n i ] denote the renewal epoch of A n i [resp., S n i ] following t [resp., T n i (t)]. That is, Let β n = (β n i ) i∈I be defined by Then α n and β n are mutually independent. As a result, α n and γ n = (γ n i ) i∈I are mutually independent, where Recall the definition (20) ofŴ n . We have t n i ⇒ t, and T n (t) ⇒T (t) by which τ n i ⇒ ρ i t. As a result,Ŵ n i (t + u) −Ŵ n i (t) − γ n ⇒ 0. This shows that α and W (t + u) − W (t) are mutually independent. Since u > 0 and s ≤ t are arbitrary, an application of Theorem 1.4.2 of [21] shows that all increments W (t + u) − W (t) and F ′ s are independent. Let X δ = {x ∈ R I : dist(x, X ) < δ}, δ > 0. Condition (51) implies that for every s < t and δ > 0, (t − s) −1 (IX n (t) − IX n (s)) ∈ X δ occurs with probability tending to 1 as n → ∞. As a result, (t − s) −1 (IX(t) − IX(s)) ∈ X δ a.s. Since X is closed and convex, the intersection of X δ over δ > 0 gives X , so (t − s) −1 (IX(t) − IX(s)) ∈ X a.s. Thus X(t) ∈ X for a.e. t, a.s. Now, each IẐ n i is nonnegative, nondecreasing and convex, hence so is IZ i . Therefore Z i is nonnegative and nondecreasing. As for θ · Y , note that it is a.e. equal to the pathwise left-derivative of the process θ · IY , which, for reasons as above, has convex sample paths a.s. Hence θ · Y is a.e. equal to a nondecreasing process. This shows that (X, Y, Z) ∈ A P (x 0 ) and completes the proof.

A nearly optimal policy in the case of a rectangle
In this section we consider the case of a rectangular domain, where each customer class has a dedicated buffer. We have introduced in Section 2.3 some notation for this case, and identified the curve γ. In particular, the domain X is given by (45), where $b_i > 0$ are fixed constants, and the parameter x is given by θ · b. The classes are labeled so that $h_1\mu_1 \ge h_2\mu_2 \ge \cdots \ge h_I\mu_I$. We denote $\hat b_j = \sum_{i=j+1}^{I}\theta_i b_i$, $j \in \{0, \ldots, I\}$, and one has $0 = \hat b_I < \hat b_{I-1} < \cdots < \hat b_1 < \hat b_0 = \theta\cdot b = x$. With this notation, γ is given (as in (49)) by
$$\gamma(w) = \sum_{i=j+1}^{I} b_i e^{(i)} + \frac{w-\hat b_j}{\theta_j}\, e^{(j)}, \qquad w \in [\hat b_j, \hat b_{j-1}),\ j \in \{1,\ldots,I\},$$
and $\gamma(x) = b$.

The difficulty in treating the queueing model according to the BCP solution, as described in terms of γ, is that this curve lies along the boundary of X, in particular, along the part $\partial^+ X := \{x \in X : x_i = b_i \text{ for some } i\}$ of the boundary ∂X. This part corresponds to states at which some of the buffers are full. This sets up contradictory goals: keeping some of the buffers (nearly) full while at the same time avoiding any rejections except when the workload process reaches the level $x^*$. The policy we propose is based on an approximation of γ by another curve that is bounded away from the buffer limit boundary.
Let $\varepsilon \in (0, \min_i b_i)$ be given. Let $a_i = b_i - \varepsilon$, $i \in I$, and $a^* := x^* \wedge (\theta\cdot a) < x = \theta\cdot b$. Note that if ε is small then $a^* = x^*$ (unless $x^* = x$). We define an approximation $\gamma^a : [0, x] \to X$ of γ by first defining it on $[0, \theta\cdot a]$ as the function obtained upon replacing the parameters $(b_i)$ by $(a_i)$ in (48) and (49). That is, for $w \in [0, \theta\cdot a)$, the variables $j = j(w)$ and $\xi = \xi(w)$ are determined via
$$w = \sum_{i=j+1}^{I}\theta_i a_i + \theta_j\xi, \qquad j \in \{1,\ldots,I\},\ \xi \in [0, a_j), \tag{56}$$
and
$$\gamma^a(w) = \sum_{i=j+1}^{I} a_i e^{(i)} + \xi e^{(j)}. \tag{57}$$
Given $w \in [0, \theta\cdot a)$, we will sometimes refer to the unique pair (j, ξ) alluded to above as the representation (j, ξ) of w via (56). Next, on $[\theta\cdot a, \theta\cdot b]$ we only need the function $\gamma^a$ to be continuous and satisfy the relation $\theta\cdot\gamma^a(w) = w$. For concreteness we may define it as the linear interpolation between the points (θ · a, a) and (θ · b, b):
$$\gamma^a(w) = a + \frac{w - \theta\cdot a}{\theta\cdot b - \theta\cdot a}\,(b - a), \qquad w \in [\theta\cdot a, \theta\cdot b].$$
We also define $\hat a_j = \sum_{i=j+1}^{I}\theta_i a_i$, $j \in \{0, 1, \ldots, I\}$, similarly to $\hat b_j$. The definition of the policy is provided by specifying $(Z^n(t), B^n(t))$ as a function of $X^n(t)$.

Figure caption. The figures depict possible states $\hat X^n(t) = x$ at a time when the normalized workload $\theta\cdot x = x_1 + \cdots + x_6$ is around 3.5. The target population distribution is then γ(3.5) = (0, 0, 0.5, 1, 1, 1). The class of low priority (L) is the maximal i with $x_i < a_i$. The classes served (S) are the high priority classes having positive population. Thus, for all i, class i is being served provided that $x_i$ exceeds the target population $\gamma_i(3.5)$.
Rejection policy: As under any policy, in order to respect the buffer size constraint (12), all forced rejections take place. That is, if a class-i arrival occurs at a time t when $\hat X^n_i(t-) + n^{-1/2} > b_i$, then it is rejected. Apart from that, no rejections occur from any class except class $i^*$, and no rejections occur (from any class) when $\theta\cdot\hat X^n < a^*$. When $\theta\cdot\hat X^n \ge a^*$, all class-$i^*$ arrivals are rejected.
Service policy: For each $x \in X$ define the class of low priority, $L(x)$, as the maximal $i$ with $x_i < a_i$, and let $H(x)$ denote the set of remaining (high priority) classes. When there is at least one class among H(x) having at least one customer in the system, L(x) receives no service, and all classes within H(x) having at least one customer receive service at a fraction proportional to their traffic intensities. More formally, denote $H^+(x) = \{i \in H(x) : x_i > 0\}$, and define $\rho'(x) \in \mathbb{R}^I$ as
$$\rho'_i(x) = \begin{cases} 0, & \text{if } x = 0,\\[2pt] \dfrac{\rho_i 1_{\{i \in H^+(x)\}}}{\sum_{k \in H^+(x)} \rho_k}, & \text{if } H^+(x) \ne \emptyset,\\[2pt] 1_{\{i = I\}}, & \text{if } x_i = 0 \text{ for all } i < I \text{ and } x_I > 0. \end{cases} \tag{58}$$
(Note that $H^+(x) = \emptyset$ can only happen if $x_i = 0$ for all $i < I$, which is covered by the first and last cases in the above display.) Then for each t,
$$B^n(t) = \rho'(\hat X^n(t)). \tag{59}$$
Note that when $H^+(x) \ne \emptyset$,
$$\rho'_i(x) > \rho_i, \qquad i \in H^+(x). \tag{60}$$
That is, all prioritized classes receive a fraction of effort strictly greater than the respective traffic intensity. Also note that $\sum_i B^n_i = 1$ whenever $\hat X^n$ is nonzero. This is therefore a work conserving policy. See Figure 3 for an example of how the class with low priority and the served classes are determined. Although we have assumed ε > 0, the policy is well defined even for ε = 0, in which case $a = b$, $a^* = x^*$ and $\gamma^a = \gamma$. This policy is not used here but it is used in the next section.
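The verbal description of the rejection and service rules can be turned into a small state-feedback sketch. This is one reading of the description, not the paper's formal definition: the function names, the 0-based class indices, the fallback used when no class is below its threshold, and passing the rejected class `i_star` as a parameter are all assumptions made for illustration.

```python
def low_priority_class(x, a):
    """Largest index i with x[i] < a[i] (0-based).
    Assumption: fall back to the last class if no such index exists."""
    below = [i for i in range(len(x)) if x[i] < a[i]]
    return max(below) if below else len(x) - 1


def service_allocation(x, a, rho):
    """Fractions of effort per class in scaled state x, given thresholds a
    and traffic intensities rho (assumed to sum to one)."""
    n_classes = len(x)
    if all(v == 0 for v in x):
        return [0.0] * n_classes                  # empty system: no service
    low = low_priority_class(x, a)
    served = [i for i in range(n_classes) if i != low and x[i] > 0]
    if not served:                                # only the low class is present
        effort = [0.0] * n_classes
        effort[low] = 1.0
        return effort
    total = sum(rho[i] for i in served)
    # prioritized nonempty classes share effort in proportion to rho
    return [rho[i] / total if i in served else 0.0 for i in range(n_classes)]


def reject(i, x, b, theta, a_star, i_star, n):
    """True if a class-i arrival in scaled state x is rejected."""
    forced = x[i] + n ** -0.5 > b[i]              # buffer i would overflow
    workload = sum(t * v for t, v in zip(theta, x))
    return forced or (i == i_star and workload >= a_star)


if __name__ == "__main__":
    x = [0.0, 0.2, 0.6, 0.9, 1.0, 1.0]
    print(service_allocation(x, [1.0] * 6, [1.0 / 6] * 6))
```

In the usage example the low priority class is the fourth one (index 3), and each served class receives effort 1/4, which exceeds its traffic intensity 1/6. This excess is what drives the state toward the target curve.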
Arguing by induction on the times when the driving processes A n and S n jump, it is clear that there exists a unique solution to the set of equations (7)-(10), (59) along with the verbal description of the rejection mechanism. Thus the policy is well-defined.
Proof. We fix ε and write U n = (Z n , B n ) for U n (ε). We denote by τ n the time of the first forced rejection. A crucial point about the proof idea is that most of the analysis is performed on the processes up to the first forced rejection. It is established that the target state is asymptotically achieved by the proposed policy, in the sense of weak convergence as n → ∞. This is done in two steps: First, the workload process θ n ·X n is shown to converge to a RBM, and then it is shown that X n lies close to the minimizing curve at all times. Once these elements are established, it follows that in any finite time, τ n is not reached, and as a result one has that (i) only rejections from class i * occur, and only when θ n ·X n ≈ a * ; (ii) the running cost is minimized locally. These elements are then combined with some integrability conditions at the last step of the proof.
We begin with the case where the system starts with an initial condition close to the minimizing curve. More precisely, $\hat X^n(0) - \gamma^a(\theta\cdot\hat X^n(0)) \to 0$ as $n \to \infty$, and $\theta^n\cdot\hat X^n(0) \in [0, a^*]$ for all n large.
At the last step of the proof we relax this assumption.
Let $W^{\bullet,n} := W^{\#,n}(\cdot \wedge \tau^n)$ denote the process $W^{\#,n}$ stopped at the time $\tau^n$. Define similarly $X^{\bullet,n}$, $Y^{\bullet,n}$ and $Z^{\bullet,n}$. Our goal in this step is to show that the sequence $(W^{\bullet,n}, X^{\bullet,n}, Y^{\bullet,n}, Z^{\bullet,n})$ is C-tight, and that any subsequential limit $(\tilde W, \tilde X, \tilde Y, \tilde Z)$ satisfies a.s., To this end, note first that the argument for C-tightness of the processes $\hat W^n$, given in the proof of the lower bound, is valid here. As a result, $W^{\#,n}$ are C-tight. Hence so are $W^{\bullet,n}$.
By construction (see (59)), the policy is work conserving, namely i B n i (t) = 1 wheneverX n (t) is nonzero. By the relations (8) and (21), it follows that the nondecreasing processŶ #,n does not increase when X #,n > 0. A similar property then holds for the stopped processes, and this can be expressed as Fix T > 0. We show next that, as n → ∞, For ε ′ > 0 consider the event Ω n 1 := {sup t∈[0,T ] X •,n (t) > a * + ε ′ }. On this event there exist random times 0 ≤ τ n 1 < τ n 2 ≤ τ n such that X #,n (τ n 1 ) ≤ a * + ε ′ /2, X #,n (τ n 2 ) ≥ a * + ε ′ and X #,n (t) > a * for all t ∈ [τ n 1 , τ n 2 ]. Thus by (63) and the fact that Y #,n does not increase on an interval where the system is not empty, denoting here and in the sequel A[s, t] = A(t) − A(s) for any process A, where we used the fact that the policy rejects all class-i * jobs when X #,n > a * . Fix a sequence r n > 0, r n → 0, such that √ nr n → ∞. In case τ n 2 − τ n 1 < r n , the above implies ε ′ /2 ≤ W #,n [τ n 1 , τ n 2 ] ≤w T (W #,n ; r n ).
In case τ n 2 − τ n 1 ≥ r n , for some positive constant c. Combining the two cases, the C-tightness of W #,n and the tightness ofÂ n shows that P(Ω n 1 ) → 0 as n → ∞. Since ε ′ > 0 is arbitrary, (66) follows. Since rejections occur only when X •,n ≥ a * , we have Moreover, we can use (63) to write Combining these relations with (65) shows that the defining relations of the Skorohod problem, namely (42)- (43), are valid here, implying By (66), E n ⇒ 0 uniformly on compacts. Recall that W •,n are C-tight. IfW denotes a subsequential limit of it, using the continuity of Γ [0,a * ] and using (66) once again, shows that along the same subsequence, (W •,n , X •,n , Y •,n , Z •,n ) converges, and that its limit satisfies (64), as claimed. The Skorohod map maps continuous paths starting in [0, a * ] to continuous paths. Hence (W ,X,Ỹ ,Z) have continuous paths a.s. This proves the claimed C-tightness of these processes.
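The Skorohod map on an interval [0, a], which identifies the limit in this step, has a simple discrete-time analogue. The following sketch is illustrative only: the function name `reflect_two_sided` and the Gaussian increments are assumptions, and the recursion is a standard discretization rather than the construction used in the proof.

```python
import random


def reflect_two_sided(increments, x0, a):
    """Discrete two-sided reflection of a path on [0, a].

    Returns (X, Y, Z) with X = x0 + W + Y - Z, where W is the running sum
    of the increments, Y pushes up at 0 and Z pushes down at a.
    """
    x, y, z = x0, 0.0, 0.0
    X, Y, Z = [x], [y], [z]
    for dw in increments:
        free = x + dw                 # unconstrained next position
        if free < 0.0:
            y += -free                # minimal push at the lower boundary
            x = 0.0
        elif free > a:
            z += free - a             # minimal push at the upper boundary
            x = a
        else:
            x = free
        X.append(x)
        Y.append(y)
        Z.append(z)
    return X, Y, Z


if __name__ == "__main__":
    random.seed(1)
    steps = [random.gauss(0.0, 0.1) for _ in range(2000)]
    X, Y, Z = reflect_two_sided(steps, x0=0.5, a=1.0)
    print(min(X), max(X), Y[-1], Z[-1])
```

Here Z plays the role of the rejection process, which increases only when the workload sits at the upper level, and Y plays the role of the idleness process, which increases only when the system is empty.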
Step 2. State space collapse. The next major step is to show that the multidimensional process $\hat X^n$ lies close to the minimizing curve. More precisely, we will show that, as $n \to \infty$, uniformly on compacts.
Denote by G = {x ∈ X : θ · x ≤ a * , x = γ a (θ · x)} the set of points lying on the minimizing curve, and recall the set ∂ + X = {x ∈ X : x i = b i for some i} corresponding to the buffer limit boundary. These two compact sets do not intersect. As a result, there exists ε 0 > 0 such that for any 0 < ε ′ < ε 0 , G ε ′ and (∂ + X ) ε ′ do not intersect, where for a set A ∈ R I we denote In what follows, it is always assumed that ε ′ < ε 0 . Forced rejections occur only at times when X n lies in (∂ + X ) ε ′ (for all n large). As a result, as long as the processX n lies in G ε ′ , no forced rejections occur. This observation can be used to deduce that σ n ≤ τ n , where σ n =ζ n ∧ ζ n , ζ n = inf{t : X #,n ≥ a * + ε ′ }, ζ n = inf{t : max Note carefully that σ n is not precisely given as inf{t :X n (t) / ∈ G ε ′ }, because X #,n is defined using θ n while γ a and G are defined with θ. However, since θ n → θ andX n remains bounded, the conclusion that σ n ≤ τ n , provided that n is sufficiently large, is valid.
We turn to proving (67). It suffices to show that P(σ n < T ) → 0, for any small ε ′ > 0 and any T . Fix ε ′ and T . Thanks to the fact that σ n ≤ τ n , P(σ n < T ) = P(σ n < T, σ n ≤ τ n ) We have established in Step 1 the convergence (66), from which it follows that P(ζ n ≤ T ∧ τ n ) → 0 as n → ∞. It therefore suffices to prove the following.
Proof. On ζ n ≤ T ∧ τ n let x n := X #,n (ζ n ) = X •,n (ζ n ) and let j = j n and ξ n be the corresponding components from the representation (j, ξ) of x n (with w = x n ).
Fix such δ and δ ′ . Then for all large n, where, denoting by T n the interval [(ζ n − δ ′ ∨ 0), ζ n ], Note that we have used the identity X #,n = X •,n on [0, τ n ]. We fix k and analyze Ω n,k , by an argument similar to (but somewhat more complicated than) that used in Step 1 to treat Ω n 1 . The value assigned by the policy to B n (see (59)) remains fixed asX n varies within any of the intervals (â j ,â j+1 ). Aiming at showing that P(Ω n,k ) → 0 as n → ∞, for each k, we will first consider the case whereX n remains in one of these intervals during the time window T n ; that is, (I)Ξ k ⊂ (0, a * ) and for all j,â j / ∈Ξ k . Then we consider the cases (II)Ξ k ⊂ (0, a * ) butâ j ∈Ξ k for some j ∈ {1, 2, . . . , I − 1}.
There may be additional intervalsΞ k , but they are all subsets of (a * , ∞) and therefore not important for our purpose.
(I)Ξ k ⊂ (0, a * ) and for all j,â j / ∈Ξ k . Note that this means that all points x inΞ k lead to the same j in the representation (j, ξ) of x given by (56). Note that j = j(k) depends on k only, and in particular does not vary with n. Also, j = j n under Ω n,k . In what follows, j = j(k).
Note that γ a is continuous and that ∆ n (0) → 0 as n → ∞, by (61). Using the fact that the jumps ofX n are of size n −1/2 , on the event indicated in (71) there must exist η n ∈ [0, ζ n ] with the properties thatX On this event, during the time interval [η n , ζ n ], i is always a member of H(X n ), and therefore by (59)-(60), B n i (t) = ρ ′ i (X n (t)) > ρ i + c, for some constant c > 0. Thus by (21) Moreover, if we defineη n = η n ∨ (ζ n − δ ′ ) then for all t ∈ [η n , ζ n ] one hasX n (t) ∈Ξ k ⊂ (0, a * ) and therefore no rejections occur. Using these facts in (18), we havê Again, fix a sequence r n > 0 with r n → 0 and r n √ n → ∞. If ζ n − η n < r n and n is sufficiently large thenη n = η n , thus by (71) and the definition of η n ,X n i [η n , ζ n ] ≥ ε ′′ /2. As a result, w T (Ŵ n i ; r n ) ≥Ŵ n i [η n , ζ n ] ≥ ε ′′ /2 must hold. If, on the other hand, ζ n − η n ≥ r n then by (73), for some constant c > 0. Hence the probability in (71) is bounded by which converges to zero as n → ∞, by C-tightness ofŴ n . This proves (71).
Suppose that we show (except in the case j = I) for any fixed ε ′′ and all large n that on the event indicated in (76), j ∈ H(X n (t)) whenever, prior to ζ n , one has ∆ n j (t) ∈ (ε ′′ /2, ε ′′ ).
Then we can argue as in the case of i > j, with the following modifications. Let C n (t) = γ a j (X •,n (t)). Then ∆ n j =X n j − C n , and similarly to (72), there exists η n ≤ ζ n such that ∆ n j (η n ) < ε ′′ /2, ∆ n j (t) > 0 for all t ∈ [η n , ζ n ].
Since by (77) j is high priority during this interval we will still have identity (73) valid. Arguing separately for the cases ζ n − η n < r n and ζ n − η n ≥ r n , leads, in analogy to (74), to the conclusion that the probability in (76) is bounded by In addition to the C-tightness ofŴ , we now invoke that of C n , which follows from the continuity of γ a and the C-tightness of X •,n . This shows (74). Now, since θ · γ a (θ · x) = θ · x for all x ∈ X , θ n → θ, and γ a uniformly continuous and X bounded, we have q n := sup x∈X |θ · γ a (θ n · x) − θ · x| → 0, as n → ∞.
To show that (77) holds (except in the case j = I), note by (79) where we used γ a i = γ a i (X •,n ) = 0 for i < j and γ a i = a i for i > j. For all large n, this implieŝ X n i < a i for at least one i > j, by which j ∈ H(X n ).
Since ε ′′ is arbitrarily small, it follows from the definition of ζ n that P(Ω n,k ) → 0 as n → ∞.
(II)Ξ k ⊂ (0, a * ) butâ j ∈Ξ k for some j ∈ {1, 2, . . . , I − 1}. Let (j n (t), ξ n (t)) denote the representation (56) for X #,n (t). The difficulty here is that in the time window T n , j n varies between two values, namely j and j + 1, and it is no longer true that γ a j+1 (X #,n ) = a j+1 on that time interval. The way we treat this is by bounding ∆ n from above by a quantity that depends on ε 1 , rather than by an arbitrarily small ε ′′ . To this end, let us show that on Ω n,k , where c 1 = 4/θ min and θ min = min i θ i . Indeed, we have for any w ∈Ξ k , |w −â j | ≤ 4ε 1 , sinceâ j is also inΞ k . Now, if w ≥â j then γ a j+1 (w) = a j+1 . Otherwise, w =â j+1 + θ j+1 ξ =â j − θ j+1 a j+1 + θ j+1 ξ, thus |a j+1 − ξ| ≤ 4θ −1 j+1 ε 1 , whence follows (81). Now, (71) is valid for all i > j + 1, by the proof given in case (I). For i = j + 1 it is also valid, even though γ a j+1 (X #,n (t)) is not necessarily equal to a j+1 . For i < j, (75) is valid with the same proof. As for i = j, (76) is valid with same proof (the fact that γ a j may assume the value zero does not affect this proof).
(III) $0 \in \Xi_k$. The only difference between this case and case (I) is that during $T^n$, $\hat X^n$ may hit zero, and so by (58) and (59), $B^n$ will be zero. However, the analysis in case (I) is performed only on intervals where $\hat X^n \ne 0$, and as a result gives rise to the same conclusion, namely $P(\Omega^{n,k}) \to 0$ as $n \to \infty$.
(IV) a * ∈Ξ k . In this case, during T n , θ ·X n may exceed a * , and so rejections of class i * customers may occur. The only way it affects the proof of case (I) is by adding a negative term to the r.h.s. of (73). However, the consequences of (73) remain valid with this addition. (Note that for all sufficiently small ε one hasâ i = a * for all i, hence assuming ε is sufficiently small, we do not need to check case (II) here.) Having shown that P(Ω n,k ) → 0 in all cases, using (70) and the fact that δ > 0 is arbitrary completes the proof of the lemma.
Step 3. Weak convergence. Having shown that P(σ n < T ) → 0, we have, using σ n ≤ τ n , that P(τ n < T ) → 0. As a result, the conclusion of Step 1 regarding the stopped processes holds also for the unstopped ones. That is, (W #,n , X #,n , Y #,n , Z #,n ) are C-tight, and any subsequential limit satisfies (64) a.s.
For any finite T, the sequence $Z^{\#,n}(T)$ is tight. On the event $\tau^n > T$, whose probability tends to one, $\hat Z^n(T) = \hat Z^n_{i^*}(T)\, e^{(i^*)}$, hence $\|\hat Z^n(T)\|$ is a tight sequence. The bound (55) thus gives the tightness of $\|\hat Y^n(T)\|$. The argument from the lower bound in the paragraph following (55) shows that $\hat W^n \Rightarrow W$ as $n \to \infty$. Thus (64) determines the limit of the one-dimensional processes, namely $(W^{\#,n}, X^{\#,n}, Y^{\#,n}, Z^{\#,n}) \Rightarrow (\bar W, \bar X^a, \bar Y^a, \bar Z^a)$. Moreover, $(\hat W^n, \hat X^n, \hat Y^n, \hat Z^n) \Rightarrow (W, X, Y, Z)$, where $\theta\cdot X = \bar X^a$, $X = \gamma^a(\bar X^a)$ (by (67)), $Z = \zeta^* \bar Z^a$ (by (82)) and $Y = X - x_0 - W + Z$, by (18). We obtain precisely the relations from Proposition 2.1, except that the reflection interval is $[0, a^*]$ rather than $[0, x^*]$.
We have shown that, as n → ∞, Step 4. Convergence of costs. SinceX n are uniformly bounded, we immediately obtain E ∞ 0 e −αt h ·X n (t)dt → E ∞ 0 e −αt h · γ a (X a (t))dt. As for the second term, we borrow an argument from [11]. Consider the probability space (R + × Ω, B(R + ) × F, m × P), where dm = αe −αt dt. Then the result of the previous step can be expressed as the convergence in law, r ·Ẑ n →rZ a , w.r.t. the probability measure m × P. Thus to obtain E ∞ 0 e −αt r ·Ẑ n (t)dt → E ∞ 0 e −αtrZ a (t)dt, it suffices to show the m × P-uniform integrability (UI) of r ·Ẑ n . For this, it suffices that lim sup n E ∞ 0 e −αt Ẑ n (t) 2 dt < ∞.
It is established in equation (172) of [11] that for a constant c independent of n and t, with the same estimate holding forÂ n . In what follows, we show that we can deduce (83) from (84).
To this end, recall that rejections occur only when either θ ·X n ≥ a * or, for some i,X n i ≥ a i − n −1/2 . In particular, if we letā = a * ∧ min i (a i θ i /2), then using the convergence θ n → θ, we have, for all large n, that no rejections take place when X #,n = θ n ·X n <ā. Consider the truncated version X 1,n :=ā ∧ X #,n of X #,n . Then by (63), X 1,n (t) = W 1,n + Y #,n − Z #,n , where we denote W 1,n = W #,n + E n , E n = X #,n (0) + X 1,n − X #,n .
Step 5. General initial condition. Finally, we relax the assumption (61) on the initial condition. Here we do not give the proof in full detail, but only a brief sketch. Let ε be given and let a, a * be as before. Let τ n 0 denote the first time when a condition analogous to (61) holds; more precisely, let α n > 0, α n → 0 and τ n 0 = inf{t : X n (t) − γ a (θ ·X n (t)) ≤ α n and θ n ·X n (t) ∈ [0, a * ]}.
The idea is to show that (i) with a suitable choice of $\alpha_n$, one has $\tau^n_0 \to 0$ in probability, and (ii) starting from $(\tau^n_0, \hat X^n(\tau^n_0))$ in place of $(0, \hat X^n(0))$, the arguments throughout the proof can be repeated without additional effort. While (ii) is straightforward, though notationally heavy, (i) follows by an argument similar to the proof of (67). We omit the details.

Proof.
Step 4 in the proof of Theorem 4.1 gives the upper bound on the cost. Thus it suffices to show asymptotic compliance. Now, it follows from the proof of Theorem 4.1 that the controls under consideration satisfy the assumptions of Proposition 5.1. Using this proposition along with the fact that P(σ n < T ) → 0 (see the proof of Theorem 4.1), gives the result.
If the above is true then, by Theorem 5.1, {U n * } are AO for the problem under consideration. One might approach the conjecture by using Proposition 5.1 to connect to the lower bound of Theorem 3.1. The difficulty here is that one must consider an arbitrary sequence of controls, and there is no guarantee that the assumptions of Proposition 5.1, particularly the C-tightness ofŶ n , hold in such generality. We are able to show a partial result.
We address only policies that give rise to state space collapse. More precisely, consider a sequence {U n } n∈N ∈ AC, and write {U n } ∈ AC if it satisfies the following. (i) Each U n is work conserving; (ii) for somex ∈ [0,x), rejections occur only when the scaled workload exceedsx, and only from one particular class (save forced rejections); and (iii) for some continuousγ : [0, x] → X satisfying {x ∈ X : θ · x ≤x, x =γ(θ · x)} ∩ ∂ + X = ∅, one hasX n −γ(θ ·X n ) ⇒ 0 as n → ∞. Set Proposition 5.2. One has V AC ≥ V (x 0 ).
This result is far from being satisfactory. However, it shows that the two formulations are equivalent at least for this restricted class of policies. The proof is based on various elements of the proofs of the finite buffer results.
Proof. Assume, without loss of generality, that V AC < ∞, and consider {U n } ∈ AC with lim inf J n (U n ) < ∞. Finiteness of this quantity gives, along the lines of the proof of Theorem 3.1, thatŴ n are C-tight. The assumptions on {U n } as a sequence in AC imply, by arguments as in the proof of Theorem 4.1, that the one-dimensional processes (W •,n , X •,n , Y •,n , Z •,n ) are C-tight, and any subsequential limit (W ,X,Ỹ ,Z) satisfies a.s., (64). The state space collapse assumption, along with (92) imply that, for any T , P(τ n < T ) → 0, and that the un-stopped processes (X #,n , Y #,n , Z #,n ) as well asX n are C-tight. Finally, C-tightness ofẐ n follows from that of Z #,n by arguments as in the same proof, and that of the processesŶ n follows equation (18) now that we have C-tightness of all the other processes involved. This verifies the assumptions of Proposition 5.1.
As a result $\tilde E^n \Rightarrow 0$ (where $\tilde E^n$ are defined in (91)). The asymptotic compliance of the sequence of controls, along with the convergence $\tilde E^n \Rightarrow 0$, implies the validity of the relaxed assumption (51) under which the lower bound, Theorem 3.1, is proved. Thus we conclude from Theorem 3.1 that $\liminf J^n(U^n) \ge V(x_0)$ for any $\{U^n\} \in AC$.