Asymptotic analysis of a multiclass queueing control problem under heavy-traffic with model uncertainty

We study a multiclass M/M/1 queueing control problem with finite buffers under heavy traffic, where the decision maker is uncertain about the arrival and service rates of the system and acts, via scheduling and admission/rejection decisions, to minimize a discounted cost that accounts for the uncertainty. The main result is the asymptotic optimality of a $c\mu$-type policy derived via underlying stochastic differential games studied in [16]. Under this policy, with high probability, rejections are not performed while the workload lies below a cut-off that depends on the ambiguity level. When the workload exceeds this cut-off, rejections are carried out, and only from the buffer with the cheapest rejection cost weighted by the mean service rate in some reference model. The allocation part of the policy is the same for all ambiguity levels. This is the first work to address a heavy-traffic queueing control problem with model uncertainty.


Introduction
We consider a multiclass M/M/1 queueing model under diffusion-scaled heavy traffic where the decision maker (DM) is uncertain about the parameters and acts to optimize an overall cost that accounts for this uncertainty. The model consists of a server whose effort is allocated by the DM, at every time instant, among customers of several classes. Customers of each class are kept in a finite buffer. Apart from the scheduling control, upon arrival of a customer, the DM has to decide whether to reject it or to assign it to the buffer that corresponds to its class type. The DM has ambiguity about the rates of arrivals and the mean service times. The cost accounts for the ambiguity, the holding of customers in the buffers, and rejections of new arrivals.
The problem without ambiguity, also referred to as the risk-neutral problem, was analyzed by Atar and Shifrin in [6] under the G/G/1 framework. Plambeck et al. studied in [33] a similar non-robust problem with time constraints instead of finite-buffer constraints. In these problems, as well as in many other classical models of queueing control problems (QCPs), see e.g., [8,12,13] and the references therein, there is a fixed random model; that is, the DM is certain about the evolution of the system, which, moreover, does not change over time. Such an assumption is not realistic, and a robust analysis is desirable.
We assume that based on the available data, the DM has a reference model in mind, which, up to some degree, describes the situation she is facing. To model the uncertainty about the reference model, the DM takes into account other models and penalizes them based on their deviation from the reference model. The penalization then depends on how averse the DM is to ambiguity. Such ambiguity models are sometimes referred to as model uncertainty or Knightian uncertainty, see e.g., [30,21,20,7], and in the context of queueing systems see [23,11,29]. We allow for different levels of model uncertainty for each of the arrival processes as well as for each of the service processes. More specifically, we consider the following cost function, which the DM aims to minimize,
where $I$ is the number of classes. The vectors $\hat h$ and $\hat r$ stand for the holding and the rejection costs, respectively. The $I$-dimensional processes $X$ and $R$ represent the queue lengths and the rejection processes, respectively. The supremum is taken over the product measures $Q = \prod_{i=1}^{I} (Q_{1,i} \times Q_{2,i})$ (with some additional technical conditions), where the components with index 1 (resp., 2) refer to the arrival (resp., service) processes. The function $L$ is a discounted variant of the Kullback-Leibler divergence and measures how far $Q_{j,i}$ is from the reference measure $P_{j,i}$. The parameters $\kappa_{j,i} > 0$ are the ambiguity parameters, which penalize the deviation from the reference measures. Since the QCP is formulated under heavy traffic, throughout the paper we consider a sequence of queueing systems labeled by the scaling parameter. This parameter is omitted above in order to make the presentation clear and concise.
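In schematic form (suppressing the scaling parameter; this is only a sketch of the actual display in (2.4)-(2.5), whose exact normalization may differ), the description above corresponds to a cost of the type

```latex
\sup_{Q}\ \Bigl\{\, \mathbb{E}^{Q}\Bigl[\int_0^\infty e^{-\varrho t}
  \bigl(\hat h \cdot X(t)\,dt + \hat r \cdot dR(t)\bigr)\Bigr]
  \;-\; \sum_{i=1}^{I}\sum_{j=1}^{2}\frac{1}{\kappa_{j,i}}\,
  L_j\bigl(Q_{j,i}\,\|\,P_{j,i}\bigr)\Bigr\},
```

where $\varrho > 0$ is the discount factor: the maximizing measure trades off a larger expected holding/rejection cost against the divergence penalty, and a smaller $\kappa_{j,i}$ makes deviations from $P_{j,i}$ more expensive.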
To tackle QCPs under heavy traffic one often solves a limiting control problem associated with a Brownian motion, called a Brownian control problem (BCP), and uses its solution to construct an asymptotically optimal policy in the QCP. This concept was first introduced by [22]; for further reading on BCPs see e.g., [8,12,13] and the references therein. In our case, the cost function given above suggests that the BCP is in fact a stochastic game. The players in this game are the DM and nature, which, according to their goals, are referred to as the minimizer and the maximizer, respectively. Translating the roles of the processes from the QCP to a multidimensional stochastic differential game (MSDG), the minimizer controls the server's effort allocation and the admission/rejection decisions, while the maximizer is free to choose a probability measure and is penalized in accordance with the deviation of the chosen measure from the reference measure. This game was analyzed in [16], where it was also shown that a state-space collapse property holds. This is done by considering a reduced stochastic differential game (RSDG), which emerges from the workload process, and showing that the games share the same value and that, given an equilibrium in either one of the games, one can construct an equilibrium in the other. Further properties of the games, such as the dependency on the ambiguity parameters, are also given there. This paper is devoted to the connection between the QCP and the BCPs (namely, the MSDG and RSDG); we show that the value function of the QCP is approximately the value function of the BCPs and that the minimizer's optimal strategy in the MSDG leads to an asymptotically optimal stationary policy in the QCP.
Roughly speaking, this strategy suggests that the DM should use a $c\mu$-type rule and fill the buffers in accordance with their holding costs without using rejections, unless a cut-off workload level, which depends on the ambiguity level, has been reached, and then use rejections only from the buffer with the cheapest rejection cost, weighted by the mean service rate, until the workload level drops below the cut-off. More specifically, let $\mu_i$ be the mean service rate of class $i$ customers under the reference model and recall that $\hat h_i$ is the holding cost per customer of class $i$. The DM should prioritize the classes in the order of $\{\hat h_i\mu_i\}_{i=1}^I$, where the lowest priority is given to the class with the lowest $\hat h_i\mu_i$ among the classes whose buffers are not 'almost' full. As for the admission control part of the policy, whenever the workload level remains below the mentioned cut-off, use rejections only if there is a new arrival to a full buffer; as is shown, the probability of such an event vanishes with the scaling parameter. If the workload level exceeds the cut-off, the DM rejects all incoming customers of the class with the lowest $\hat r_i\mu_i$. This policy (with a different cut-off level) was shown to be asymptotically optimal in the risk-neutral setup and in a QCP with the same mechanism but under the moderate-deviation heavy-traffic regime (instead of the diffusion scaling) and a risk-sensitive cost criterion. The latter QCP models a situation of a 'very' risk-averse DM. The only difference between the policies in the three models is in the position of the cut-off point. Specifically, the allocation policy is the same. This demonstrates the usefulness of the allocation policy, as it is robust to ambiguity. The optimality of the same policy (with differences only in the cut-off level) in these three models is not obvious due to the existence of the maximizer, which leads to a non-stationary problem.
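To make the verbal description concrete, here is a minimal discrete-review sketch of such a $c\mu$-type rule in Python. All names, the 'almost full' slack, and the tie-breaking details are illustrative assumptions, not the paper's exact construction.

```python
# Illustrative sketch of the c-mu-type rule described above.
# x, b, h, r, mu: per-class queue lengths, buffer caps, holding costs,
# rejection costs, and reference service rates; beta is the workload cut-off
# (which, in the paper, depends on the ambiguity level).

def cmu_policy(x, b, h, r, mu, beta, near_full_slack=1):
    """Return (serve_class or None, reject_class or None)."""
    theta = [1.0 / m for m in mu]                 # workload vector
    workload = sum(t * q for t, q in zip(theta, x))
    candidates = [i for i in range(len(x)) if x[i] > 0]
    if not candidates:
        serve = None
    else:
        # Lowest priority: smallest h_i * mu_i among classes whose buffer
        # is not 'almost' full; serve some other nonempty class first.
        not_full = [i for i in candidates if x[i] < b[i] - near_full_slack]
        low = min(not_full or candidates, key=lambda i: h[i] * mu[i])
        others = [i for i in candidates if i != low]
        serve = min(others, key=lambda i: -h[i] * mu[i]) if others else low
    # Rejections only above the cut-off, from the class with cheapest r_i * mu_i.
    reject = min(range(len(x)), key=lambda i: r[i] * mu[i]) if workload > beta else None
    return serve, reject
```

For instance, with two classes, $\hat h = (1, 2)$, $\hat r = (5, 1)$, and unit service rates, class 1 is served first, and once the workload exceeds the cut-off, rejections occur from class 1 (the cheapest $\hat r_i\mu_i$).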
For further reading about the moderate-deviation heavy-traffic regime see [1,10,2,5,3,4] and, in the context of our paper, also the discussion in [16, p. 3]. The current paper does not aim to establish a rigorous relationship between the QCP with the model uncertainty described below and the moderate-deviation heavy-traffic regime QCP.

We now make some comments on proof techniques. The proof is divided into two parts: showing that the value function of the RSDG forms an asymptotic lower bound for the QCP, and that the expected cost associated with the candidate policy is asymptotically bounded above by the value function of the RSDG. Keep in mind that the RSDG shares the same value as the MSDG. Also, recall that we consider a sequence of queueing systems. For the lower bound we assign the maximizer a strategy that is derived from the equilibrium strategy in the MSDG and use it to show that, for any sequence of strategies used by the minimizer, the expected cost is bounded below by the value of the RSDG. By its structure, the maximizer's strategy preserves the critical load of the system. The main technical difficulty in this part is that the sequence of strategies of the minimizer is arbitrary and, due to the nature of the control in the QCP, which is replaced by a singular control in the BCP, compactness arguments do not apply here. In the risk-neutral case, where there is no maximizing player, Atar and Shifrin [6] managed to bypass this issue by C-tightness arguments applied to the integrands of the relevant processes, later on taking the derivatives of the implied limiting processes. In our setup, the maximizer's strategy depends on the scaled queue lengths process and not on its integrand. Therefore, the C-tightness of the sequence of the scaled queue lengths processes is required. As a result, the integrand-derivative method cannot be applied in our case.
We use the time-stretching method, which was introduced by Meyer and Zheng in [32] and studied in the same framework by Kurtz in [26]. In the context of stochastic control the method was first used by Kushner and Martins in [31,27] and was adopted in [12,14,15]. We set up a random time transformation for each system such that the controls are Lipschitz continuous with Lipschitz constant 1. Then we can apply C-tightness arguments to the sequence of the relevant time-stretched processes (including the scaled queue lengths process) and obtain limiting processes. Using an inverse time transformation we go back to the original scale. Finally, we connect the costs associated with the time-rescaled limiting processes with the value of the RSDG.
To show that the expected cost associated with the candidate policy is asymptotically bounded above by the value function of the RSDG we do the following. We consider an arbitrary sequence of strategies for the maximizer. This sequence of probability measures is not forced to satisfy the critical load condition. Therefore, as a first step we show that it is 'too costly' for the maximizer to deviate substantially from the reference measure, and thus restrict the maximizer to strategies that on average do not deviate much from the reference measure (Proposition 4.2). Then, in Proposition 4.3, we adapt a state-space collapse result from the risk-neutral case to ours and show that, under the sequence of strategies chosen by the maximizer, the underlying stochastic dynamics of the scaled queue lengths lie close to a specific path, so that the properties of the policy mentioned before the previous paragraph hold. At this point one can use C-tightness arguments to conclude the convergence of the relevant dynamics, including the holding and rejection cost parts. However, in order to estimate the change of measure penalty given by the Kullback-Leibler divergence, one needs a further reduction and must 'truncate' the maximizer's strategies so that the critical load condition is almost surely preserved. Then we show that the relevant processes and all the cost components associated with the two levels of restrictions of the maximizer's strategies are close to each other (Proposition 4.4). This approximation relies also on the state-space collapse. Thus, the rest of the analysis is performed in the more convenient case, where the critical load condition holds and the expected cost is shown to be asymptotically bounded from above by the value function of the RSDG. For this, we reduce to one-dimensional dynamics and estimate from below the change of measure penalty, and together with the convergence of the holding and rejection cost components, we conclude the upper bound.
The paper is organized as follows. Section 2 presents the model. Section 3 collects a few results from [16] required for the proof. In Section 4 we provide and prove the main result (Theorem 4.1), which states that the QCP's value converges to that of the BCP from [16], and an asymptotically optimal policy is provided. Some auxiliary results appear in the Appendix.

Notation
We use the following notation. For $a, b \in \mathbb R$, $a \wedge b = \min\{a, b\}$ and $a \vee b = \max\{a, b\}$. For a positive integer $k$ and $c, d \in \mathbb R^k$, $c \cdot d$ denotes the usual scalar product and $\|c\| = (c \cdot c)^{1/2}$. We denote $[0, \infty)$ by $\mathbb R_+$. The infimum of the empty set is taken to be $\infty$. For subintervals $I_1, I_2 \subseteq \mathbb R$ and $m \in \{1, 2\}$ we denote by $C(I_1, I_2)$, $C^m(I_1, I_2)$, and $D(I_1, I_2)$ the spaces of continuous functions [resp., functions with continuous derivatives up to order $m$, functions that are right-continuous with finite left limits (RCLL)] mapping $I_1 \to I_2$. The space $D(I_1, I_2)$ is endowed with the usual Skorohod topology. For $T, \delta > 0$ and a function $f : \mathbb R_+ \to \mathbb R^k$, $w_T(f, \delta) := \sup\{\|f(s) - f(t)\| : s, t \in [0, T], |s - t| \le \delta\}$ denotes the modulus of continuity. For any RCLL processes $X, Y$, the quadratic variation of $X$ is denoted by $[X]$ and the quadratic covariation of $X$ and $Y$ is denoted by $[X, Y]$.

The queueing model
We consider a QCP with customers of $I$ different classes that arrive at the system to be served by a single server. Upon arrival the customers are queued in $I$ finite-capacity buffers in accordance with their class. Processor sharing is allowed and the server may serve up to $I$ customers at a time, where two customers of the same class cannot be served simultaneously. We study the system under heavy traffic. Hence, we consider a sequence of systems, indexed by the scaling parameter $n \in \mathbb N$.

The reference model
For every $i \in [I] := \{1, \ldots, I\}$ and $n \in \mathbb N$ we consider the probability spaces $(\Omega^n_{1,i}, \mathcal G^n_{1,i}, P^n_{1,i})$ and $(\Omega^n_{2,i}, \mathcal G^n_{2,i}, P^n_{2,i})$ that respectively support a Poisson process $A^n_i$ with a given rate $\lambda^n_i$ and a Poisson process $S^n_i$ with rate $\mu^n_i$. The process $A^n_i$ counts the number of arrivals to the $i$-th buffer, and $S^n_i(t)$ stands for the number of service completions of class $i$ customers had service been given to class $i$ for $t$ units of time. Denote $A^n = (A^n_i)_{i=1}^I$ and $S^n = (S^n_i)_{i=1}^I$. We consider a different level of uncertainty about each of the $2I$ components of $(A^n, S^n)$, and thus set a reference probability space $(\Omega^n, \mathcal G^n, P^n)$, given by the product of the spaces above, that supports these processes. By the structure of the probability space, under the measure $P^n$, the $2I$ processes $A^n_1, S^n_1, \ldots, A^n_I, S^n_I$ are mutually independent. Moreover, the distribution of $A^n_i$ (resp., $S^n_i$) under $P^n$ is identical to its distribution under $P^n_{1,i}$ (resp., $P^n_{2,i}$).
The process $U^n_i(t)$ represents the fraction of effort the server dedicates to class $i$ at time $t$ (recall that processor sharing is allowed). Then $T^n_i(t) := \int_0^t U^n_i(s)\,ds$ gives the time the server has dedicated to class $i$ customers present in the system by time $t$, and the number of class $i$ job completions by time $t$ is thus $S^n_i(T^n_i(t))$. This is a Cox process with rate $\mu^n_i U^n_i$. We allow rejections of customers (only) upon arrival, and a rejected customer never returns to the system. The number of rejections from class $i$ until time $t$ is denoted by $R^n_i(t)$ and is determined by some process $z^n_i$ of rejection decisions. Denote by $X^n_i(t)$ the number of class $i$ customers in the system at time $t$. Then the balance equation is given by $X^n_i(t) = X^n_i(0) + A^n_i(t) - S^n_i(T^n_i(t)) - R^n_i(t)$. For simplicity, we assume that $X^n_i(0)$ is deterministic. We use the notation $L^n = (L^n_i)_{i=1}^I$ for $L \in \{X, R, T\}$. Since $U^n$ is an RCLL process, and by construction so are $A^n$ and $S^n$, we conclude that $X^n$ and $R^n$ are RCLL as well. The capacity of buffer $i$ is given by $b^n_i := b_i n^{1/2}$ for some constant $b_i \in (0, \infty)$, $i \in [I]$.
We now introduce the diffusion scaling and the heavy-traffic condition. First, we assume that for some fixed constants $\lambda_i, \mu_i \in (0, \infty)$ and $\hat\lambda_i, \hat\mu_i \in \mathbb R$, $\lambda^n_i = \lambda_i n + \hat\lambda_i n^{1/2} + o(n^{1/2})$ and $\mu^n_i = \mu_i n + \hat\mu_i n^{1/2} + o(n^{1/2})$. Moreover, the system is assumed to be critically loaded; that is, with $\rho_i := \lambda_i/\mu_i$, $\sum_{i=1}^I \rho_i = 1$. The process $(U^n, R^n)$ is regarded as a control in the $n$-th system and is now defined rigorously. Definition 2.1 (admissible control for the decision maker, QCP) An admissible control for the minimizer for any initial state $X^n(0)$ is a process $(U^n, R^n)$ taking values in $\mathcal U \times \mathbb R^I_+$ that satisfies the following: (i) $(U^n, R^n)$ is adapted to the filtration $\mathcal G^n_t = \mathcal G^n(t) := \sigma\{A^n_i(s), S^n_i(T^n_i(s)), i \in [I], s \le t\}$ and has RCLL sample paths; (ii) the processes $\{R^n_i\}$ are nondecreasing; (iii) for each $i \in [I]$ and $t \in \mathbb R_+$, $X^n_i(t) = 0$ implies $U^n_i(t) = 0$; (iv) the buffer constraints $0 \le X^n_i(t) \le b^n_i$ hold. The first condition expresses the fact that the DM makes her decisions based on past observations. The second condition follows since rejections accumulate. The third condition asserts that service cannot be given to an empty buffer. We denote the set of admissible controls for the DM in the $n$-th system by $\mathcal A^n(X^n(0))$.
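The heavy-traffic parametrization can be checked numerically. The sketch below (with illustrative rate values, not taken from the paper) verifies the critical load condition and the fact that the scaled drift $\hat m^n_i = n^{-1/2}(\lambda^n_i - \rho_i\mu^n_i)$ stays bounded as $n$ grows.

```python
# Numeric illustration of the heavy-traffic parametrization:
# lambda^n_i = lambda_i*n + lhat_i*sqrt(n), mu^n_i = mu_i*n + mhat_i*sqrt(n),
# with sum_i lambda_i/mu_i = 1 (critical load). All values are illustrative.
import math

lam = [0.5, 0.3, 0.2]        # first-order arrival rates lambda_i
mu = [1.5, 0.9, 0.6]         # first-order service rates mu_i
lhat = [0.4, -0.1, 0.2]      # second-order perturbations (hedged choices)
mhat = [0.0, 0.3, -0.2]

rho = [l / m for l, m in zip(lam, mu)]
assert abs(sum(rho) - 1.0) < 1e-9          # critical load condition

def rates(n):
    """Rates of the n-th system under the diffusion parametrization."""
    lam_n = [l * n + lh * math.sqrt(n) for l, lh in zip(lam, lhat)]
    mu_n = [m * n + mh * math.sqrt(n) for m, mh in zip(mu, mhat)]
    return lam_n, mu_n

# Scaled drift m^n_i = n^{-1/2}(lambda^n_i - rho_i*mu^n_i): since the
# first-order terms cancel, it equals lhat_i - rho_i*mhat_i for every n.
for n in [10**2, 10**4, 10**6]:
    lam_n, mu_n = rates(n)
    m_n = [(l - r * m) / math.sqrt(n) for l, m, r in zip(lam_n, mu_n, rho)]
    for mi, lh, mh, r in zip(m_n, lhat, mhat, rho):
        assert abs(mi - (lh - r * mh)) < 1e-6
```

The cancellation of the order-$n$ terms is exactly what makes the diffusion scale $n^{-1/2}$ the right one for the queue length process.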

The optimization problem with model uncertainty
Recall that we consider a DM who is uncertain about the underlying reference probability measure $P^n$; in other words, she suspects that the rates/intensities $\{\lambda^n_i\}_{i=1}^I$ and $\{\mu^n_i\}_{i=1}^I$ may be misspecified or may even change over time. Therefore, instead of optimizing under the reference measure $P^n$, she considers a set of candidate measures (provided in the sequel) and penalizes their deviation from $P^n$. The penalization is done by using a discounted variant of the Kullback-Leibler divergence. More explicitly, the QCP is set up as a stochastic game that models a type of worst-case scenario. The players are the DM, who chooses a policy that minimizes a cost, and a maximizing player, also referred to as nature or the maximizer, who has access to the policy chosen by the minimizer and to the history, and who is penalized for deviating from the reference model. We are interested in a cost that accounts for the scaled queue lengths and rejections, in addition to the uncertainty about the model. Denote the scaled headcount process and the scaled rejection count by $\hat X^n := n^{-1/2} X^n$ and $\hat R^n := n^{-1/2} R^n$. Fix a discount factor $\varrho > 0$, vectors of holding and rejection costs $\hat h, \hat r \in (0, \infty)^I$, and ambiguity parameters $\kappa = (\kappa_{j,i})_{j \in \{1,2\},\, i \in [I]} \in (0, \infty)^{2I}$. The DM faces a robust optimization problem whose cost, $J^n(X^n(0), U^n, R^n, \hat Q^n; \kappa)$, is given in (2.4), with the change of measure penalty defined in (2.5). The last two sums in (2.4) are referred to as the change of measure penalty. When $\kappa_{j,i}$ is 'small' (resp., 'big') we say that there is weak (resp., strong) ambiguity about the rates of the processes $A^n_i$ and $S^n_i(T^n_i) := S^n_i(T^n_i(\cdot))$. The idea is that for small $\kappa_{j,i}$'s there is a large penalty per unit of deviation from the reference measure, and therefore the measures $\hat Q^n_{j,i}$ and $P^n_{j,i}$ should be close to each other, and as a consequence so are the relevant expectations. However, one needs to make sure that the total penalty given by $\frac{1}{\kappa_{j,i}} L_j(\hat Q^n_{j,i} \| P^n_{j,i})$ is also small.
In [16, Theorems 5.1 and 5.2] we show that as the ambiguity parameters converge to zero, the stochastic differential games, which are provided in the same paper and are summarized below in Section 3, converge to the risk-neutral BCP studied in [6]. Therefore, our problem indeed models ambiguity with respect to (w.r.t.) the risk-neutral model.
The set of candidate measures $\mathcal Q^n(X^n(0))$ consists of all the product measures $\hat Q^n = \prod_{i=1}^I (\hat Q^n_{1,i} \times \hat Q^n_{2,i})$ that for every $i \in [I]$ and $t \in \mathbb R_+$ satisfy (2.6) for $\{\mathcal G^n_t\}$-predictable, measurable, and positive processes $\psi^n_{j,i}$, $j \in \{1, 2\}$, satisfying $\int_0^t \psi^n_{j,i}(s)\,ds < \infty$ $P^n$-almost surely (a.s.), for every $t \in \mathbb R_+$. These conditions assure, first, that the right-hand sides in (2.6) are $P^n_{j,i}$-martingales, $j \in \{1, 2\}$, and second, that under the measure $\hat Q^n_{1,i}$ (resp., $\hat Q^n_{2,i}$), $A^n_i$ (resp., $S^n_i(T^n_i)$) is a counting process with intensity $\psi^n_{1,i}$ (resp., $\psi^n_{2,i} U^n_i$). Notice also that under the measures $\hat Q^n_{j,i}$, $j \in \{1, 2\}$, the critical load condition might be violated, since we do not restrict the intensities $\{\psi^n_{j,i}\}_{j,i,n}$ in such a way. However, as we argue in (4.12), Proposition 4.2, and Section 4.3.1, such changes of measure are 'too costly' and will be avoided by the maximizer, so that 'on average', $\psi^n_{1,i}(t) = \lambda^n_i + O(n^{1/2})$ and $\psi^n_{2,i}(t) = \mu^n_i + O(n^{1/2})$.
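The candidate measures act by tilting the intensities of the counting processes. As a purely illustrative aside (not part of the paper's construction), a counting process with a given time-varying intensity $\psi$ can be simulated by standard Lewis-Ogata thinning:

```python
# Thinning sketch: sample a counting process with intensity psi(t) on
# [0, horizon], given a bound psi_max >= psi(t). Purely illustrative.
import random

def thinning(psi, psi_max, horizon, rng):
    """Return the (sorted) event times of the counting process."""
    t, events = 0.0, []
    while True:
        t += rng.expovariate(psi_max)        # candidate event at rate psi_max
        if t > horizon:
            return events
        if rng.random() < psi(t) / psi_max:  # accept with prob psi(t)/psi_max
            events.append(t)

rng = random.Random(0)
# Constant intensity 2.0 on [0, 100]: roughly 200 events are expected.
events = thinning(lambda t: 2.0, 2.5, 100.0, rng)
assert 150 < len(events) < 250
```

Under a candidate measure the maximizer effectively replaces the Poisson rate $\lambda^n_i$ by such a $\psi^n_{1,i}$, at the price of the divergence penalty.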
Notice that the QCP is a stochastic game that models a type of worst-case scenario. The minimizer chooses a strategy and the maximizer, who is penalized for deviating from the reference measure, responds to this strategy by choosing a worst-case scenario. For further reading about the structure of the information in control problems with model uncertainty, the reader is referred to [35].

The BCPs
We now present two BCPs that have the form of stochastic differential games, that approximate the QCP, and that were fully analyzed in [16]. One game is $I$-dimensional, like the QCP, and the other is one-dimensional, with the workload process as its underlying state. We also provide some of the results from [16] that are relevant to the present paper.

The multidimensional stochastic differential game (MSDG)
To derive the game we introduce some scaled processes and constants, in addition to $\hat X^n$ and $\hat R^n$ introduced earlier. Set for every $i \in [I]$ and $t \in \mathbb R_+$: $\hat A^n_i(t) := n^{-1/2}(A^n_i(t) - \lambda^n_i t)$, $\hat S^n_i(t) := n^{-1/2}(S^n_i(t) - \mu^n_i t)$, $\hat Y^n_i(t) := \mu^n_i n^{-1/2}(\rho_i t - T^n_i(t))$, $\hat m^n_i := n^{-1/2}(\lambda^n_i - \rho_i \mu^n_i)$. (3.1) We use the notation $\hat L^n = (\hat L^n_i)_{i=1}^I$ for $L \in \{X, R, A, S, Y, m\}$. The scaled version of (2.2) then gives the dynamics of $\hat X^n$, and an admissible policy satisfies the corresponding scaled constraints. Under the reference measure $P^n$, $\{(\hat A^n, \hat S^n)\}_n$ weakly converges to a $2I$-dimensional $(0, \hat\sigma)$-Brownian motion, where $\hat\sigma$ is the diagonal matrix determined by the rates $\lambda$ and $\mu$. As we show rigorously in the proof of Lemma 4.4, $\hat Y^n$ is of order one as $n \to \infty$. Hence its definition implies that $T^n(t) \to (\rho_1, \ldots, \rho_I)t$, $t \in \mathbb R_+$, and therefore, under $P^n$, $\{(\hat A^n_i - \hat S^n_i(T^n_i))_{i=1}^I\}_n$ weakly converges to an $I$-dimensional $(0, \sigma)$-Brownian motion, where $\hat S^n_i(T^n_i) := \hat S^n_i(T^n_i(\cdot))$, $i \in [I]$. Recall that in the QCP an admissible control is of the form $(U^n, R^n)$. Notice that $(\hat Y^n(t), \hat R^n(t))$ is uniquely determined by $(U^n(s), R^n(s))_{0 \le s \le t}$. Hence, we take $(\hat Y, \hat R)$ as the control. Definition 3.1 (admissible controls, MSDG) An admissible control for the minimizer for any initial state $\hat x_0 \in \mathcal X$ is a filtered probability space supporting a Brownian motion and a process $(\hat Y, \hat R)$ with RCLL sample paths, adapted to the filtration, such that (iii) the controlled process satisfies the corresponding dynamics. An admissible control for the maximizer is a product measure $\hat Q$ defined through an $\{\mathcal F_t\}$-progressively measurable process $\hat\psi = (\hat\psi_1, \ldots, \hat\psi_I)$ satisfying the analogous integrability conditions. The motivation for Condition (ii) is that the rejection process (in the QCP) and also $\theta^n \cdot \hat Y^n$ are nondecreasing, where $\theta^n := (1/\mu^n_1, \ldots, 1/\mu^n_I)$. Denote by $\hat{\mathcal A}(\hat x_0)$ (resp., $\hat{\mathcal Q}(\hat x_0)$) the set of all admissible controls for the minimizer (resp., maximizer), given the initial condition $\hat x_0$.
In [16, (2.17)] it is argued that the controlled process can alternatively be written in a form in which the difference between the processes $\hat A^n_i$ and $\hat S^n_i(T^n_i)$ approximately behaves like a Brownian motion (without drift under $P_i$ and with drift under $\hat Q_i$). Hence, we consider only $I$ changes of measure instead of $2I$. For this, in the cost function we consider new ambiguity parameters; the intuition behind their form is given in [16, p. 10] and is rigorously justified in (4.88). The cost associated with the initial condition $\hat x_0$ and the strategies $(\hat Y, \hat R)$ and $\hat Q$, and the DM's robust optimization problem, are then defined analogously to the QCP.

The reduced stochastic differential game (RSDG)
The I-dimensional game can be reduced to a game with one-dimensional dynamics by projecting the controlled process onto the workload vector θ. This game is referred to as the reduced stochastic differential game (RSDG).
To introduce the game, we need the following notation. Definition 3.2 (admissible controls, RSDG) An admissible control for the minimizer for any initial state $x_0 \in [0, b]$ is a filtered probability space $(\Omega, \mathcal F, \{\mathcal F_t\}, P)$ that supports a one-dimensional standard Brownian motion $B$ and a process $(Y, R)$ taking values in $\mathbb R^2_+$ with RCLL sample paths, both adapted to the filtration $\{\mathcal F_t\}$, satisfying the following properties: (ii) $Y$ and $R$ are nonnegative and nondecreasing; (iii) the controlled process satisfies the one-dimensional dynamics. An admissible control for the maximizer is a measure $Q$ defined on $(\Omega, \mathcal F, \{\mathcal F_t\})$ through an $\{\mathcal F_t\}$-progressively measurable process $\psi$ satisfying the analogous conditions. Denote by $\mathcal A(x_0)$ (resp., $\mathcal Q(x_0)$) the set of all admissible controls for the minimizer (resp., maximizer), given the initial condition $x_0$. The cost associated with the initial condition $x_0$ and the controls $(Y, R)$ and $Q$ is defined accordingly.

$r := \min\{\hat r \cdot q : q \in \mathbb R^I_+,\ \theta \cdot q = 1\}$, (3.14) and $L(Q\|P)$ is given by (3.9) with $(Q, P)$ replacing $(Q_i, P_i)$. By the convexity of $\mathcal X$ it follows that $h$ is convex. In fact, $h$ is piecewise linear and Lipschitz continuous. Moreover, $h(x) \ge 0$ for $x \ge 0$ and equality holds if and only if $x = 0$. Therefore, $h$ is strictly increasing. In [6, page 568] it is shown that there is $i^* \in [I]$ such that (3.15) holds. The index $i^*$ stands for the class with the smallest rejection cost, weighted by the mean service rate. In fact, as was shown in [16, Theorem 4.1], in an equilibrium of the MSDG, rejections are performed only from this class. The candidate asymptotically nearly optimal policy in Section 4.1 asymptotically satisfies this condition. In [16, (2.23)] an alternative representation is derived, which will turn out to be useful in the approximation procedure; see (4.35) and (4.89) below.
The value function is given by

Properties of the games
The RSDG admits a simple optimal strategy for the minimizer that keeps the workload in a specific interval of the form $[0, \beta]$ with minimal effort. To rigorously define such a strategy we make use of the Skorokhod map on an interval. Fix $\beta > 0$. For any $\eta \in D(\mathbb R_+, \mathbb R)$ there exists a unique triplet of functions $(\chi, \zeta_1, \zeta_2) \in D(\mathbb R_+, \mathbb R^3)$ satisfying the following properties: $\chi = \eta + \zeta_1 - \zeta_2$ takes values in $[0, \beta]$, and $\zeta_1, \zeta_2$ are nondecreasing functions that increase only when $\chi = 0$ and $\chi = \beta$, respectively. See [25] for existence and uniqueness of the solution, and for continuity and further properties of the map. In particular, we have the following.
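For intuition, the Skorokhod map on $[0, \beta]$ can be approximated in discrete time by clipping the free motion to the interval and accumulating the boundary pushes. The following sketch is illustrative only (it is not the construction in [25]):

```python
# Discrete-time sketch of the Skorokhod map on [0, beta]: given a sampled
# path eta, produce (chi, zeta1, zeta2) with chi = eta + zeta1 - zeta2
# confined to [0, beta], pushing minimally at the two boundaries.

def skorokhod_interval(eta, beta):
    """eta: list of floats (sampled path); returns (chi, zeta1, zeta2)."""
    chi = [min(max(eta[0], 0.0), beta)]
    z1 = [max(-eta[0], 0.0)]          # initial push at 0, if needed
    z2 = [max(eta[0] - beta, 0.0)]    # initial push at beta, if needed
    for k in range(1, len(eta)):
        free = chi[-1] + (eta[k] - eta[k - 1])  # uncontrolled move
        push_up = max(-free, 0.0)               # reflection at 0
        push_dn = max(free - beta, 0.0)         # reflection at beta
        chi.append(min(max(free, 0.0), beta))
        z1.append(z1[-1] + push_up)
        z2.append(z2[-1] + push_dn)
    return chi, z1, z2

chi, z1, z2 = skorokhod_interval([0.0, 0.5, 1.5, 2.5, 1.0, -1.0], beta=1.0)
assert all(0.0 <= x <= 1.0 for x in chi)   # the path stays in [0, beta]
```

Note that $\zeta_1$ (resp., $\zeta_2$) increases only at steps where the free motion would exit through 0 (resp., $\beta$), mirroring the minimality property of the map.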
The next proposition provides a characterization of the value function of the RSDG and an equilibrium.
Moreover, with $\beta_\varepsilon$ defined as above, the $\beta_\varepsilon$-reflecting strategy is admissible and optimal for the minimizer, and $V(\cdot) = V(\cdot; \varepsilon)$ satisfies the corresponding equation. Moreover, let $Q_V = Q_{V(\cdot;\varepsilon)}$ be the measure driven by $\psi_V(t) := \varepsilon\sigma V'(X(t); \varepsilon)$. Then the $\beta_\varepsilon$ strategy and the measure $Q_V$ form an equilibrium. From the definition in (3.18) and the proposition above we get the following corollary, which is given for reference purposes.
The strategy of the maximizer in the RSDG under equilibrium plays an important role in the asymptotics given in Section 4.2.
Equilibrium in the MSDG. Given an equilibrium in the RSDG, one can construct an equilibrium in the MSDG. This construction is summarized now. Without loss of generality assume that (3.21) holds, where $\{e_1, \ldots, e_I\}$ is the standard basis of $\mathbb R^I$. The curve $\gamma(x)$, $x \in [0, b]$, is continuous and located on the edges of $\mathcal X$; see Figure 1. The idea is as follows. Recall that the components of $\hat Y = (\hat Y_i)_{i=1}^I$ can be positive or negative, as long as $\theta \cdot \hat Y$ is nonnegative and nondecreasing. Now, as the workload changes in the interval $[0, b)$, the minimizer can use only the process $\hat Y$, without the need of the process $\hat R$, so that $\hat X$ moves along the curve $\gamma$. As we argue now, the process $\hat R$ should be active only when the workload exceeds the level $\beta_\varepsilon$, and only through the coordinate with the cheapest rejection cost, weighted by the service rate. More explicitly, let $(Y^{\beta_\varepsilon}, R^{\beta_\varepsilon})$ be an optimal reflecting strategy for the minimizer, where $i^*$ is given immediately after (3.15), and for any $t \in \mathbb R_+$, $\hat X^{\beta_\varepsilon}(t) := \gamma(X^{\beta_\varepsilon}(t))$.
Proposition 3.2 (Theorem 4.1 in [16]) The value functions of both games coincide, and moreover, the strategies $(\hat Y^{\beta_\varepsilon}, \hat R^{\beta_\varepsilon})$ and $\hat Q_V$ form an equilibrium in the MSDG.
The construction of the nearly optimal policy in the QCP relies on the equilibrium strategy of the minimizer in the MSDG we just described.

Nearly optimal policy
As mentioned in the previous section, the policy relies on the function $\gamma$ that maps workloads to points on a minimizing curve in the set $\mathcal X$. The pre-limit process is discrete and therefore we use a neighborhood of the minimizing curve. We show that with high probability it is possible to 'almost' trace the minimizing curve without rejections, unless the workload level $\beta_\varepsilon$ is reached, in which case rejections occur, and only from the buffer with the cheapest rejection cost. Again, without loss of generality, we assume that the classes are labeled in such a way that (3.21) holds. Recall the definition of the cut-off parameter $\beta_\varepsilon$. Fix $\delta_0 > 0$ and let $\hat a = (\hat a_1, \ldots, \hat a_I)$ be suitably chosen. On the interval $[\theta \cdot \hat a, b]$ define $\gamma_a$ as the linear interpolation determined by the point $(\theta \cdot \hat a, \hat a)$. Recall the definition of $h$ from (3.13), and define the moduli $\omega_1$ and $\omega_2$ accordingly. By the choice of $\hat a$ it is clear that $\omega_1(0+) = 0$.
Rejection policy: If a class-$i$ arrival occurs at a time $t$ when $\hat X^n_i(t-) + n^{-1/2} > b_i$, then it is rejected. Such rejections are called forced rejections. Whenever $\theta \cdot \hat X^n \ge \theta \cdot \hat a$, all class-$i^*$ (see the paragraph preceding (3.15)) arrivals are rejected; these rejections are called overload rejections. Apart from that, no rejections occur from any class.
Service policy: For each $\hat x = (\hat x_1, \ldots, \hat x_I) \in \mathcal X$, define the low-priority class $L(\hat x)$ as the smallest index $i$ with $\hat x_i < \hat a_i$, provided such an index exists, and set $L(\hat x) = I$ otherwise. The complement set $H(\hat x)$ is the set of high-priority classes. When at least one class in $H(\hat x)$ is not empty, the class $L(\hat x)$ receives no service, and all nonempty classes within $H(\hat x)$ receive service at a fraction proportional to their traffic intensities. Namely, denote by $H_+(\hat x)$ the set of nonempty high-priority classes, where recall that $e_I = (0, \ldots, 0, 1) \in \mathbb R^I$. Note that $H_+(\hat x) = \emptyset$ can only happen if $\hat x_i = 0$ for all $i < I$, which is covered by the first and last cases in the above display. Then for each $t \in \mathbb R_+$, the allocation $U^n(t)$ is determined accordingly. That is, all prioritized classes receive a fraction of effort strictly greater than their respective traffic intensities. Also note that $\sum_i U^n_i = 1$ whenever $\hat X^n$ is nonzero; this is therefore a work-conserving policy. Theorem 4.1 Assuming that $x_0 := \lim_{n\to\infty} \theta^n \cdot \hat X^n(0)$ exists, then (4.6) holds. Moreover, for every $n \in \mathbb N$, denote the policy constructed above by $(U^n(\hat a), R^n(\hat a))$. Then (4.7) holds, where $\omega_2(0+) = 0$.
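The allocation rule just described can be sketched in a few lines. The function below is an illustrative snapshot (the names `x`, `a`, `rho` stand in for $\hat x$, $\hat a$, and the traffic intensities; ties and boundary cases are simplified), not the paper's exact construction.

```python
# Sketch of the service (allocation) part of the policy: the low-priority
# class gets no effort while some high-priority class is nonempty, and the
# nonempty high-priority classes split the full effort in proportion to
# their traffic intensities rho_i (hence strictly more than rho_i each).

def allocate(x, a, rho):
    """Return the vector of effort fractions U = (U_1, ..., U_I)."""
    I = len(x)
    # low priority: first class whose buffer is below its threshold, else I-1
    low = next((i for i in range(I) if x[i] < a[i]), I - 1)
    high_nonempty = [i for i in range(I) if i != low and x[i] > 0]
    U = [0.0] * I
    if high_nonempty:
        tot = sum(rho[i] for i in high_nonempty)
        for i in high_nonempty:
            U[i] = rho[i] / tot          # proportional, strictly > rho_i
    elif x[low] > 0:
        U[low] = 1.0                     # work conserving
    return U

U = allocate(x=[2, 5, 1], a=[3, 3, 3], rho=[0.3, 0.5, 0.2])
assert U[0] == 0.0 and abs(sum(U) - 1.0) < 1e-12
```

Since the total effort sums to 1 whenever the system is nonempty, the sketch reproduces the work-conserving property noted above.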
By a diagonalization argument, one can deduce an asymptotically optimal policy generated from $U^n(\hat a)$. The proof of the theorem occupies the next two sections. In Section 4.2 we show that the game's value function $V$ bounds from below the liminf of the QCP's value functions; that is, we establish (4.8). In Section 4.3 we prove (4.7). Together, we obtain (4.6).

Proof of (4.8)
For every t ∈ R + and i ∈ [I], set the processes given in (4.9)-(4.10). Also, let {Q n j,i } j,i,n be the relevant measures defined as in (2.6). Notice that from Corollary 3.1 it follows that all the processes mentioned in (4.9)-(4.10) are uniformly bounded by some constant; namely, there exists a constant C 0 > 0 such that for every j ∈ {1, 2}, i ∈ [I], n ∈ N, and t ∈ R + , the bound (4.11) holds. We now simplify the change of measure penalty from (2.5). Since A n i (·) − ∫ 0 · ψ n 1,i (s)ds is a martingale underQ n 1,i , we get the following sequence of equations, where the last equality follows by changing the order of integration. Similar calculations apply to the change of measure penalty associated with the service time. Set y n =ψ n 1,i (t)(λ i n) 1/2 /λ n i . Then, |y n | ≤ C 1 n −1/2 + o(n −1/2 ). Noticing that |(1 + y n ) log(1 + y n ) − y n | is uniformly bounded over n and recalling (4.11), we get that there exists a constant C 2 > 0, independent of n and t, such that (4.13) holds. Fix an arbitrary sequence of controls {(Ŷ n ,R n )} n . Without loss of generality, we may assume that for every n ∈ N, J n (X n (0), U n , R n ,Q n ; κ) < V (x 0 ; ε) + 1. (4.14) Therefore, for every t ∈ R + , (4.15) holds. This property serves us in Lemma 4.1 below when we claim tightness of a time-rescaled version ofR n . Our goal here is to obtain the cost associated with the maximizer's equilibrium measure in the RSDG. Recall that the intensity of S n i (T n i ) is ψ n 2,i U n i and that dT n i (s) = U n i (s)ds. Then, under the measureQ n , the dynamics ofX n satisfy (4.16), where the processes defined in (4.17)-(4.18) are G n t -martingales underQ n ; recall that G n t is given in Definition 2.1.(i). The latter claim follows by standard martingale techniques, see e.g., the arguments given in the proof of Theorem 3.4 in [28]. SetǍ n = (Ǎ n 1 , . . . ,Ǎ n I ) and similarlyĎ n = (Ď n 1 , . . . ,Ď n I ). Recalling (3.6) and (4.9)-(4.10), we expect to obtain a limiting process of the form (4.19), where Q V is the measure associated withψ V given in (3.23) andBQ V = (BQ V 1 , . . . ,BQ V I ) is an I-dimensional standard Brownian motion under the measureQ V .
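The uniform boundedness of |(1 + y n ) log(1 + y n ) − y n | invoked above is an elementary Taylor estimate; a sketch (a standard fact, stated here for completeness, not reproduced from the paper):

```latex
% Write \ell(y) := (1+y)\log(1+y) - y, so that \ell(0) = \ell'(0) = 0 and
% \ell''(y) = 1/(1+y). For |y| \le 1/2 one has \ell''(y) \le 2, hence by
% Taylor's theorem,
0 \;\le\; \ell(y) \;\le\; y^2, \qquad |y| \le \tfrac12 .
% With |y^n| \le C_1 n^{-1/2} + o(n^{-1/2}), this yields \ell(y^n) = O(n^{-1}),
% uniformly in t, which is the type of bound recorded in (4.13).
```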
Since {(Ŷ n ,R n )} n is an arbitrary sequence of singular controls (up to the restriction in (4.14)), one cannot expect this sequence to be tight. Therefore, as mentioned in the introduction, we use time stretching in order to prove the lower bound. At this point, it is worth mentioning that Atar and Shifrin [6] managed to bypass this issue by arguing C-tightness of the integrands of the relevant processes. Repeating the same arguments, one may show that the integrands ofX n i ,Ŷ n i , andR n i are C-tight and, since V is bounded, see Corollary 3.1, the corresponding sequence is bounded as well. However, since we wish to obtain the specific integral given in (4.19), we still need to argue tightness ofX n . Therefore, a time-stretching method is used. The idea is as follows. Take for example the processR n . In Section 4.2.1 we stretch the time (using the same transformation for all the processes) and generate a processR n in such a way that {R n } n is Lipschitz continuous with Lipschitz constant 1, and is therefore tight and converges to a processR. Then in Section 4.2.2 we go back to the original scale by an inverse time transformation and get the processR, which is used in Section 4.2.3 to get the value function of the RSDG.
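A sketch of the stretched clock and the Lipschitz property, assuming the clock has the form $\tilde\tau^n(t) = t + \theta\cdot\hat R^n(t) + \theta\cdot\hat Y^n(t)$ (this form is consistent with the identity used in the proof of Proposition 4.1):

```latex
\tilde\tau^n(t) := t + \theta\cdot\hat R^n(t) + \theta\cdot\hat Y^n(t),
\qquad
\bar\tau^n(t) := \inf\{s \ge 0 : \tilde\tau^n(s) > t\},
\qquad
\tilde L^n := \hat L^n \circ \bar\tau^n .
% All three summands of \tilde\tau^n are nondecreasing, so for 0 \le s \le t,
%   \theta\cdot\big(\tilde R^n(t) - \tilde R^n(s)\big)
%     \le \tilde\tau^n(\bar\tau^n(t)) - \tilde\tau^n(\bar\tau^n(s)) \le t - s,
% i.e., \tilde R^n is Lipschitz with constant 1 (in the \theta-weighted norm);
% the same bound applies to \tilde Y^n and to \bar\tau^n itself.
```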

Lemma 4.1 The sequence of processes
{(Ǎ n ,Ď n ,Ã n ,D n ,X n ,τ n ,Ỹ n ,R n ,T n )} n is C-tight. Let (Ǎ,Ď,Ã,D,X,τ ,Ỹ ,R,T ) be a limit of a weakly convergent subsequence. Then, for every t ∈ R + , (4.21)-(4.23) hold. Proof. The C-tightness of {(τ n ,Ỹ n ,R n )} n follows by the following observation, which relies on the identity τ n (τ n (t)) = t and the definition of τ n .
Next, since the sequence {(τ n ,Ỹ n )} n is C-tight, we get by (3.1) and the limit µ n i n −1/2 → ∞ thatT n =⇒T underQ n . From (4.16), the C-tightness of {(τ n ,Ỹ n ,R n )} n , and (4.12), the C-tightness of {X n } n follows once we show that {(Ã n ,D n )} n is C-tight. Recalling that {τ n } n is C-tight, in order to prove the latter statement, it is sufficient to show the C-tightness of {(Ǎ n ,Ď n )} n .
For every i ∈ [I] and n ∈ N, the processes ψ n 1,i and ψ n 2,i U n i are the intensities of A n i and D n i := S n i (T n i ), respectively. Denote byW n = (W n 1 , . . . ,W n 2I ) the associated rescaled compensated processes. Then, for every j ∈ {1, . . . , 2I},W n j is a martingale w.r.t. its own filtration, under the measureQ n , and the quadratic variation ofW n j satisfies the corresponding bound for some constants C 3 , C 4 > 0, independent of n, δ, and π. Therefore, Aldous's criterion for tightness holds (see e.g., [9, Theorem 16.10]). Since the jumps of these processes are of order O(n −1/2 ), any limit process has continuous paths with probability 1, and the C-tightness of {W n } n is proved. The first two identities in (4.23) follow by the convergence (W n ,W n ,τ n ) ⇒ (W ,W ,τ ), which in turn follows by the tightness of {(W n ,W n ,τ n )} n . Finally, the quadratic variation of B follows by (4.11), (4.23), and the martingale central limit theorem, see [19, Theorem 7.1.4].
The following lemma states that the time-rescaled processτ (t) grows to infinity together with t. The lemma plays an important role in the proof of Proposition 4.1 below.
Lemma 4.2 Q * -a.s.,τ (t) → ∞ as t → ∞.
Proof. Fix 0 < s < t. From (4.21), it follows that (4.27) holds. From (4.12), (4.16), and the inequality T n (u) ≤ u, u ∈ R + , we get that there is a constant C 5 > 0, independent of n and t, such that the corresponding bound holds. Recalling thatǍ n i −Ď n i is a martingale, and (4.15), we get that the last term in (4.27) is bounded above by C 6 /(t − s), for some C 6 > 0, independent of n and t. Now, since the events {τ (t) < s} are decreasing with t, the limit lim t→∞ Q * (τ (t) < s) exists. Using the convergence in lawτ n ⇒τ , and since Q * (τ n (t) < s) ≤ C 6 /(t − s), we conclude that lim t→∞ Q * (τ (t) < s) = 0, which proves the lemma.

Back to the original scale
We now define the inverse ofτ , which brings the limit processes back to the original scale. Set τ (t) := inf{s ≥ 0 :τ (s) > t}, t ∈ R + . One can verify that τ is right-continuous and strictly increasing. Moreover, lim t→∞ τ (t) = ∞ Q * -a.s., and from Lemma 4.2, for every t ∈ R + , τ (t) < ∞ a.s. Finally, for every t ∈ R + , τ (τ (t)) = t andτ (τ (t)) ≥ t. The time-transformed processes are defined in (4.28). From (4.22), the equalityτ (τ (t)) = t, and Lemma A.1, we obtain the corresponding identities for every t ∈ R + . We now turn to showing that the processes within (4.28) satisfy the properties of Definition 3.1 for an appropriate filtration, where the hats are replaced by the superscript * . Property (ii) in Definition 3.1 follows since the processesŶ n andR n satisfy an equivalent condition, see the paragraph that comes after the definition. The rest of the properties will follow once we show that B * is a standard Brownian motion w.r.t. the chosen filtration, see (3.6). Thus, we now turn to define the relevant filtration. SetG t =G (t) := σ{(X(s),Ã(s),D(s),Ỹ (s),R(s)), 0 ≤ s ≤ t}. Notice that for every 0 ≤ s < t < ∞, {τ (s) < t} = {τ (t) > s} ∈G (t). Therefore, τ (s) is an optional time for the filtration {G (t)}. From [24, Corollary 2.4], τ (s) is a stopping time for the complete right-continuous filtrationG t =G(t) :=G (t+) ∨ N , where N is the collection of Q * -null sets. Using the monotonicity of t → τ (t), we get that G * t :=G(τ (t)) is a filtration. We start by arguing the continuity of B * . From (4.26), (B n ,τ n ,B n ) → (B,τ ,B) Q * -a.s., u.o.c., and therefore,B(·) =B(τ (·)), Q * -a.s., u.o.c. Thus, B * (·) =B(τ (·)) =B(τ (τ (·))) =B(·), which is continuous Q * -a.s., see Lemma 4.1. The proof that B * (t)(B * (t)) T − It is a local martingale follows the same lines as the proof that B * (t) is a local martingale. We start with the proof of the latter and then add the missing details for the former. Recall the definition of G n from Definition 2.1.(i). Fix t, s ∈ R + and n ∈ N.
Notice that {τ n (s) ≤ t} = {τ n (t) ≥ s} = {t +θ ·R n (t) +θ ·Ŷ n (t) ≥ s} ∈ G n t . Thus,τ n (s) is a G n t -stopping time. Recall thatǍ n andĎ n are G n t -martingales; then the optional sampling theorem (see Problem 1.3.24 in [20]) yields thatB n (t + ·) =B n (τ (t + ·)) is a G n (τ n (t))-martingale. As a consequence, for every i ∈ [I] and every G n (τ n (t))-measurable random variable ζ n , the corresponding identity holds. Recall that t →τ n (t) is nondecreasing and therefore G n (τ n (t)) is a filtration. Now, consider the analogous identity for every bounded continuous function g. From (4.12), (4.30), and the bound T n (u) ≤ u, u ∈ R + , we get that there are constants C 7 , C 8 > 0, independent of t, such that for sufficiently large n the stated estimate holds, where the last inequality follows by the same arguments leading to (4.24). From (4.26), (4.29), and the bounded convergence theorem, we obtain the limit identity, which implies thatB is aG t -martingale. Next, for every i ∈ [I], we show that the composition B * i (t) =B i (τ (t)) is a G * t -local martingale. For this, we need to define the following stopping times: fix M > 0 and set the stopping times π i,M andπ i,M . One can verify that τ (π i,M ) =π i,M .
We now show that B * i (t ∧ π i,M ), which by definition equalsB i (τ (t ∧ π i,M )), is a G * t -martingale. Since lim M →∞ π i,M = ∞, Q * -a.s., this implies that B * is a G * t -local martingale. To this end, setB i,M (·) :=B i (· ∧π i,M ). We use the optional sampling theorem given in [19, Theorem 2.2.13], which (in adaptation to our notation) states that if (4.32) and (4.33) hold for every t ∈ R + , then the sampling identity holds for every 0 ≤ s ≤ t, where we used the identity G * (t) =G(τ (t)). Therefore,B i,M (τ (t ∧ π i,M )) is a G * t -martingale. Notice that the displayed chain of equalities holds, and therefore B * i (t ∧ π i,M ) is a G * t -martingale. Indeed, the first and last equalities follow by the definitions of B * andB i,M , respectively. The second and the fourth equalities follow by the monotonicity of the function t → τ (t). Finally, the third equality follows since τ (π i,M ) =π i,M .
We now prove that Properties (4.32) and (4.33) hold. Property (4.32) follows by the definition ofB i,M . To prove Property (4.33), notice that the displayed bound holds. Now, from Lemma 4.2, the l.h.s. of the above approaches 0 as T → ∞, and (4.33) is proven.
We end the proof by providing the missing arguments for the proof that B * (t)(B * (t)) T − It is a martingale. One may go over the proof above and replace theB i (t)'s, i ∈ [I], by the corresponding entriesÑ ij (t). The only difference between the proofs lies in proving thatÑ (t) =B(t)(B(t)) T − It is aG t -martingale, or, equivalently, in showing that (4.31) holds withÑ ij replacingB i . To this end, recall thatQ n = ⊗ I i=1 (Q n 1,i ×Q n 2,i ). Thus, {A n 1 , . . . , A n I , D n 1 , . . . , D n I } are mutually independent underQ n and therefore also under Q * . Moreover, using the continuity ofτ n and the notation ∆L(t) = L(t) − L(t−), we get that for any i, j ∈ [I], i ≠ j, the corresponding equalities hold Q * -a.s., involving the sums of squared jumps (∆A n i (τ n (s))) 2 + (∆D n i (τ n (s))) 2 , where the limit holds by the strong law of large numbers. SinceB n i (t)B n j (t) − [B n i ,B n j ](t) is a G n (τ (t))-martingale, by taking the limit n → ∞, one deduces that (4.31) holds withÑ ij replacingB i .
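The inverse-clock identities above (τ undoing the stretched clockτ , andτ (τ (t)) ≥ t) can be checked numerically. The following is purely an illustration with a hypothetical one-jump rejection process, not a construction from the paper:

```python
import numpy as np

# Hypothetical stretched clock: tau_tilde(t) = t + R(t), where R is a single
# rejection jump of size 0.5 at time 1.0 (the theta·Y term is dropped here,
# purely for simplicity of the illustration).
def tau_tilde(t: float) -> float:
    return t + (0.5 if t >= 1.0 else 0.0)

# Right-continuous generalized inverse tau(t) = inf{s >= 0 : tau_tilde(s) > t},
# approximated on a fine grid.
_s = np.linspace(0.0, 10.0, 1_000_001)       # grid step 1e-5
_vals = _s + np.where(_s >= 1.0, 0.5, 0.0)   # tau_tilde evaluated on the grid

def tau(t: float) -> float:
    idx = np.searchsorted(_vals, t, side="right")
    return float(_s[min(idx, len(_s) - 1)])

# tau undoes tau_tilde (up to grid resolution):
assert abs(tau(tau_tilde(0.5)) - 0.5) < 1e-4
# tau_tilde(tau(t)) >= t, even for t inside the jump gap (1.0, 1.5):
assert tau_tilde(tau(1.2)) >= 1.2
assert tau(1.2) == 1.0   # the inverse is constant across the gap
```

The constancy of the inverse across the jump gap is exactly why the time-changed processes remain well defined at rejection times.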

Asymptotic lower bound
We are now ready to analyze the cost function. We start with a lower bound for the limit of the Kullback–Leibler divergences from (2.5). Recall (4.13) and consider y n =ψ n 1,i (t)(λ i n) 1/2 /λ n i . By (4.9)-(4.10) and Corollary 3.1, y n ≥ 0. Notice the elementary inequality that holds for every y ≥ 0. Since λ i n/λ n i → 1 as n → ∞, we get from (4.13) the corresponding lower bound. The same calculation yields the analogous bound for the service component, in which the last expectation is of order o(1). Indeed, by (4.9)-(4.10), (ψ n 2,i (t)) 2 = c i (V (θ ·X n (t−); ε)) 2 for some c i > 0. From Lemma A.1, and then from Lemmas 4.1 and A.2, the last integral converges to 0, Q * -a.s. Although the mentioned lemma is stated for a finite time interval, the discounted cost, the boundedness of V , and the bound T n (t) ≤ t allow us to take the integral's upper limit to be infinity. Now the uniform boundedness of the integral implies that the expectation of the last integral above converges to 0 as well. Using the representations above together with (3.7), (3.10), and (4.9)-(4.10), the lower bound on the penalty follows. Notice thatX n has only a countable number of jumps during the time interval [0, ∞), and therefore we may replaceX n (t−) byX n (t) without affecting the integral. From (2.4), the above, and Lemma A.1, one has the bound on J n (X n (0), U n , R n ,Q n ; κ) given in (4.36). Recall thatX n and V are uniformly bounded, and also thatR n is nondecreasing, continuous, and bounded on any compact time interval. Then from (4.26), Lemma A.2, and the bounded convergence theorem, we obtain the corresponding limit for every s ∈ R + . Sinceĥ andr have positive entries and 0 ≤ V ≤ r, the integrands are bounded accordingly. Using again the uniform bound ofX n and V , the fact thatR n is nondecreasing, and Lemma 4.2, and taking s → ∞ in the above, we obtain the inequality in which we used Lemma A.1 to get the equality; indeed, recall thatτ (τ (t)) = t, X * (t) =X(τ (t)), R * (t) =R(τ (t)), and Lemma 4.2.
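One standard choice for the elementary inequality invoked at the start of this argument, for y ≥ 0, is the following (a sketch; any quadratic lower bound of this type serves the same purpose):

```latex
% For y \ge 0 and \ell(y) := (1+y)\log(1+y) - y, compare second derivatives:
%   \ell''(y) = \frac{1}{1+y} \;\ge\; 1 - y
%             = \frac{d^2}{dy^2}\Big(\frac{y^2}{2} - \frac{y^3}{6}\Big),
% since (1-y)(1+y) = 1 - y^2 \le 1. As \ell(0) = \ell'(0) = 0, integrating
% twice preserves the inequality and gives
(1+y)\log(1+y) - y \;\ge\; \frac{y^2}{2} - \frac{y^3}{6}, \qquad y \ge 0 .
```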
The last inequality follows since, by the definitions of h and r (see (3.13)-(3.14) and [16, (2.45)-(2.46)]), h(θ · X * (t)) ≤ĥ · X * (t), and the analogous bound holds for the rejection term. Then, from (4.22), together with (3.7), (3.10), and the limit x 0 = lim n→∞ θ n ·X n (0), we obtain the stated inequality. From Proposition 4.1 we get that B is a standard one-dimensional Brownian motion w.r.t. G * t . Hence, (3.16) and the definition of Q V in Proposition 3.1 imply that the last expectation in the sequence of relations in (4.37) equals J(x 0 , Y, R, Q V ; ε). Together with (4.36) we have, lim inf n→∞ J n (X n (0), U n , R n ,Q n ; κ) ≥ J(x 0 , Y, R, Q V ; ε) ≥ V (x 0 ; ε).

Proof of (4.7)
In this section we consider a sequence of arbitrary strategies for the maximizer and show that, in the limit, the cost under the DM's candidate policy is bounded above by the value function of the RSDG. We start by considering an arbitrary sequence of strategies for the maximizer that is not too costly. Then, in Section 4.3.2, we show that under the candidate policy the dynamics of the buffers' sizes stay close to γ from (3.22). In Section 4.3.3 we asymptotically bound the expected cost; in order to estimate the change of measure penalty, we truncate the processes {ψ n j,i } n,j,i and show that by doing so the penalty does not change much.

The maximizer's perspective
Consider an arbitrary sequence of measures chosen by the maximizer in the QCP, {Q n } n∈N , where eachQ n ∈Q n (X n (0)). Recall that every measureQ n ∈Q n (X n (0)) is associated with the processes {ψ n j,i } j,i , see (2.6). These processes stand for the 'new' intensities of the processes {A n i } i,n and {S n i (T n i )} i,n . To simplify the notation and some of the arguments, we consider one probability space (Ω, G,Q) that supports the processes {(A n i , S n i (T n i ))} i,n and under which, for every n ∈ N, the relevant intensities are {ψ n j,i } j,i,n . However, occasionally, when we want to emphasize the relevant measure, we use the measures {Q n j,i } n,j,i and {Q n } n . Notice that the same calculation given in (4.13) is valid here as well, so (4.38) holds. We now show that, without any loss, the maximizer can be restricted to measuresQ n ∈Q n (X n (0)) that are 'not too far away' from the reference measure P n . The idea behind this, made rigorous in the proof below, is that by changing the rates, the rejection cost contributes at most a linear term, while the penalty for the change of measure is superlinear. Without loss of generality, we may and will assume that (4.39) holds. Recall that in the previous subsection the maximizer was given a specific strategy for which sup t,j,i |ψ n j,i (t)| was bounded. Since we now consider a sequence of arbitrary strategies for the maximizer, such a bound need not hold. However, as we show in Proposition 4.2 and in Section 4.3.3, this is approximately the case.
Proof. The second bound follows from the first one together with inequality (4.34). Thus, we only prove the first one. As argued in [6], at the bottom of page 595, the corresponding bound holds for every t ∈ R + , where in this expression, and in the rest of the proof, C refers to a finite positive constant that is independent of n and t and which can change from one line to the next. Recall the definitions ofǍ n ,Ď n given in (4.17) and (4.18). Applying the Burkholder–Davis–Gundy inequality toǍ n andĎ n , and noticing that n −1 |ψ n j,i | ≤ C(1 + n −1/2 |ψ n j,i |), we obtain the estimate (4.44). Notice that the first term in (4.44) is non-positive. Taking expectations on both sides of it and using the bound (4.43), we obtain (4.45). Clearly,ĥ ·X n is bounded. Then (4.39) and the last bound yield that EQ ∫ 0 ∞ e −t [(ψ n 1,i (t)) 2 dt + (ψ n 2,i (t)) 2 dT n i (t)] ≤ C 0 for some C 0 > 0, independent of n. Since the function z → z 2 is superlinear, there is a constant C 1 ∈ R such that for any z ∈ R, z 2 ≥ C 1 + 2C 0 z. Together with (4.45), we obtain the required bound, and the result holds by another application of (4.45) and the bound above. □
Fix δ, η, K > 0 and for every n ∈ N set P n as in (4.46). Clearly, for any K > 0, (4.47) holds. We now examine the second term on the r.h.s. of (4.47). On the event {P n ≤ K}, Jensen's inequality implies that there exists a constant C T > 0, independent of n, such that the oscillation bound holds for every 0 ≤ s < t ≤ T . As a result, lim δ→0+ lim n→∞Q (osc T ((Ψ n 1 ,Ψ n 2 ), δ) ≥ η, P n ≤ K) = 0.
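The superlinearity step used in the proof above is just completing the square: with C 0 as fixed there,

```latex
z^2 - 2C_0 z \;=\; (z - C_0)^2 - C_0^2 \;\ge\; -C_0^2,
\qquad\text{so, with } C_1 := -C_0^2,\qquad
z^2 \;\ge\; C_1 + 2C_0 z \quad\text{for every } z \in \mathbb{R}.
```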

Staying close to the minimizing curve
We consider a set of one-dimensional processes, obtained by multiplying the scaled processes by θ n . For its definition, denotê W n :=Ǎ n −Ď n +m n ·, (4.49) and W ,n := θ n · (Ŵ n ), X ,n := θ n ·X n , Y ,n := θ n ·Ŷ n , R ,n := θ n ·R n . (4.50) The identity from (4.16) is valid here as well and can be expressed as (4.51). As a result, (4.52) holds. The next lemma states that under the candidate policy given in Section 4.1, the cumulative service effort allocated to every buffer converges to the one prescribed by its traffic intensity. Proof. By (4.52) we have, Ŷ n T ≤ X n T + Ǎ n T + Ď n T + m n T + C 2 (Ψ n 1 ,Ψ n 2 ) T + R n T , (4.54) where C 2 depends solely on {(λ i , µ i )} i . From (4.41) it follows that there exists a constant C 3 > 0, independent of n and T , such that (4.55) holds for every T ∈ R + . Recall that X n T is bounded and, from Lemma 4.3, (Ψ n 1 ,Ψ n 2 ) T is tight. Then, once we show that {(Ǎ n ,Ď n )} n is C-tight, from (4.54) and (4.55) we get that { Ŷ n T } n is tight. Now, by the definitions ofŶ n and µ n , see (3.1) and (2.3), we get that {T n i } n converges u.o.c. toT i , whereT i (t) = ρ i t.
We now argue the C-tightness of {(Ǎ n ,Ď n )} n . From (4.42) and Proposition 4.2, it follows that the corresponding bound holds for every T > 0. As argued in the proof of Lemma 4.3, it is sufficient to prove the claim, for every K > 0, on the event {P n ≤ K}, where P n is given in (4.46). This follows since under the event {P n < K}, the jumps of (Ǎ n ,Ď n ) are of order n −1/2 and therefore the limit process is continuous.
Finally, the C-tightness of {Ŵ n } n now follows by its definition.
2 We now define another set of processes generated from the last set. Let τ n be the first time a forced rejection occurs in the n-th system and set L •,n := L ,n (· ∧ τ n ), L ∈ {W, X, Y, R, Ψ 1 , Ψ 2 }. (4.56) We continue the proof of (4.7) with the case where the initial state lies close to the minimizing curve; that is, (4.57) holds, where recall that x 0 = lim n→∞ θ n ·X n (0). As argued in [6, Step 5, p. 596], this condition can be relaxed by considering a stopping time that indicates when the state is 'sufficiently close' to the minimizing path. This stopping time converges to zero, and we may then continue from that stopping time in the same way. The proof follows along the same lines as in the case where the initial state lies close to the minimizing curve, but with heavier notation, and is therefore omitted. We now state a couple of results from [6] that are needed here. The proofs in our case are almost identical, yet require some technical modifications; for completeness of presentation we provide them. A minor modification that can be observed immediately is that we definedŴ n and W ,n usingǍ n andĎ n instead ofÂ n andD n . We start with the arguments that appear in [6, Step 1, pp. 586-588]. In particular, the sequence {(W •,n , X •,n , Y •,n , R •,n , Ψ •,n 1 , Ψ •,n 2 )} n is C-tight.
Proof. By the definition of the candidate policy, rejections do not occur when X •,n < a, and the policy is work conserving, namely, $\sum_{i=1}^I U^n_i(t) = 1$ wheneverX n (t) > 0. Therefore, (4.58) holds.
as n → ∞ and, moreover, for every T > 0, lim n→∞Q (τ n < T ) = 0. This is the equivalent of Equation (67) and the conclusion in Step 3 at the bottom of page 593 in [6]. The proof there is given in Step 2 and spans pages 588-593. We now provide an adaptation of [6, Step 2] to our case. For simplicity we share most of the notation and present the claims in the same order.
Proof. Denote by G = {x ∈ X : θ ·x ≤ a,x = γ a (θ ·x)} the set of points lying on the minimizing curve, and ∂ + X = {x ∈ X :x i =b i for some i}. These two sets are compact and disjoint. Hence, there exists ν 0 > 0 such that for any 0 < ν < ν 0 , G ν ∩ (∂ + X ) ν = ∅, where for a set F ⊂ R I we denote by F ν its closed ν-neighborhood. In the rest of the proof we consider only ν strictly smaller than ν 0 . For sufficiently large n, forced rejections occur only whenX n lies in (∂ + X ) ν . As a result, as long as the processX n lies in G ν , no forced rejections occur. Therefore, σ n ≤ τ n , where σ n =ζ n ∧ ζ n , ζ n = inf{t : X ,n ≥ a + ν }, and ζ n = inf{t : max i |X n i (t) − γ a i (X ,n (t))| ≥ ν }. As a result, in order to prove the first limit in the lemma, it is sufficient to show thatQ(σ n < T ) → 0 for any small ν > 0 and any T . Fix ν and T . Since σ n ≤ τ n , Q(σ n < T ) =Q(σ n < T, σ n ≤ τ n ) ≤Q(ζ n ∧ ζ n ≤ T ∧ τ n ) ≤Q(ζ n ≤ T ∧ τ n ) +Q(ζ n ≤ T ∧ τ n ).
Then, for all large n, the bound (4.63) in terms ofQ (Ω n,k ) holds, where we used the identity X ,n = X •,n on [0, τ n ]. We fix k ∈ {1, . . . , K − 1} and use a similar (but more involved) argument to the one given in the proof of Lemma 4.5 to analyze Ω n,k . The value assigned by the policy to U n (see (4.3)) remains fixed asX n varies within any of the intervals (α j ,α j+1 ), whereα i := Σ I k=i+1 θ kâk , i ∈ [I]. We now consider four separate cases and show that under each of them, for each k, lim n→∞Q (Ω n,k ) = 0: (I)Ξ k ⊂ (0, a) and for all j,α j ∉Ξ k ; (II)Ξ k ⊂ (0, a) butα j ∈Ξ k for some j ∈ {1, 2, . . . , I − 1}; (III) 0 ∈Ξ k ; (IV) a ∈Ξ k . There may be additional intervalsΞ k , but they are all subsets of (a, ∞) and therefore not important for our purpose.
We analyze only case (I) (and afterwards comment on the other ones). Case (I) means that in the representation of γ a , the j-th component is the same for all the points x ∈Ξ k . Note that j := j(k) depends on k only, and in particular does not vary with n.
Fix i ∈ {j + 1, . . . , I}. We estimate the probability that, on Ω n,k , ζ n ≤ T ∧ τ n occurs by havingX n i (ζ n ) − γ a i (X ,n (ζ n )) ≥ ν ; more precisely, we bound the probability in (4.64). Since i > j, γ a i (x n ) = a i . We will show that for every ν ∈ (0, ν ),Q(Ω n,k ∩ {X n i (ζ n ) > a i + ν }) → 0 as n → ∞. Recall the convergence of the initial condition given in (4.57) and that γ a is continuous. Now, the jumps ofX n are of size n −1/2 , and therefore on the event indicated in (4.64) there must exist η n ∈ [0, ζ n ] with the properties that X n i (η n ) < a i + ν /2 andX n i (t) > a i for all t ∈ [η n , ζ n ]. (4.65) On this event, during the time interval [η n , ζ n ], i is always a member of H(X n ), and therefore by (4.4)-(4.5), U n i (t) = ρ i (X n (t)) > ρ i + c, for some constant c > 0, independent of n. By the definition ofŶ n i from (3.1), dŶ n i /dt ≤ −(µ n i / √ n) c. Setη n = η n ∨ (ζ n − δ ). Then for every t ∈ [η n , ζ n ] one hasX n (t) ∈Ξ k ⊂ (0, a) and therefore no rejections occur. Combining these facts with (4.52) and the definitions ofǍ n ,Ď n , andŴ n given in (4.17), (4.18), and (4.49), we obtain (4.66), where we used the notation L[s, t] = L(t) − L(s) for any process L and 0 ≤ s ≤ t. As in the proof of Lemma 4.5, fix a sequence r n > 0 with r n → 0 and r n √ n → ∞. If ζ n − η n < r n and n is sufficiently large, thenη n = η n ; thus, by (4.64) and the definition of η n ,X n i [η n , ζ n ] ≥ ν /2, and as a result a corresponding lower bound on the increments in (4.66) must hold. If, on the other hand, ζ n − η n ≥ r n , then by (4.66), the complementary bound holds for some constant c > 0. Hence the probability in (4.64) is bounded above by the sum of the two corresponding probabilities, and the rest of the proof follows along the same lines as in [6], where againŴ n is replaced byŴ n i + λ 1/2 iΨ n 1,i − µ 1/2 iΨ n 2,i . The properties needed are the C-tightness of {X •,n } n and {Ŵ n } n , the uniform continuity of γ a , the convergence θ n → θ and of the initial condition given in (4.57), the boundedness of X , and the fact that the jumps ofX n are of size n −1/2 .
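The drift bound dŶ n i /dt ≤ −(µ n i / √ n) c used above follows in one line if, as is standard in this setting (the exact form of (3.1) is not reproduced here, so this is an assumption),Ŷ n i measures the diffusion-scaled idleness relative to the traffic intensity:

```latex
% Sketch, assuming the standard diffusion-scaled form of (3.1):
\hat Y^n_i(t) = \frac{\mu^n_i}{\sqrt n}\,\big(\rho_i t - T^n_i(t)\big),
\qquad
\frac{d}{dt}\hat Y^n_i(t) = \frac{\mu^n_i}{\sqrt n}\,\big(\rho_i - U^n_i(t)\big)
\;\le\; -\frac{\mu^n_i}{\sqrt n}\,c ,
% on [\eta_n, \zeta_n'], since dT^n_i(t) = U^n_i(t)\,dt and
% U^n_i(t) > \rho_i + c there.
```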

Asymptotic bound for the costs
In this subsection we take advantage of the convergence of the dynamics to the minimizing curve that we just argued. We start by showing that the expected holding and rejection components of the cost can be approximated by equivalent components associated with one-dimensional dynamics. Then we approximate the one-dimensional dynamics by simpler dynamics in which the intensities used by the maximizer are truncated in some sense; the difference between the expected holding and rejection cost components of these dynamics is shown to be small. The reason for this reduction is as follows. Although we know that theQ-a.s. absolutely continuous processes (Ψ n 1 ,Ψ n 2 ) are C-tight and therefore have a converging subsequence whose limit is denoted by (Ψ 1 ,Ψ 2 ), this does not imply thatQ-a.s. the paths of (Ψ 1 ,Ψ 2 ) are absolutely continuous. Hence, we cannot argue, for example, thatΨ 1 is of the formΨ 1,i = ∫ 0 ·ψ 1,i (t)dt for someψ 1,i . As a consequence, we cannot express the limiting measure for the maximizer, nor the change of measure penalty, usingψ 1,i , as can be done for example in (4.38) by substituting ψ n 1,i = λ n i +ψ n 1,i (λ i n) 1/2 . After this reduction we bound the change of measure penalty. Finally, the expected cost associated with the uniformly bounded rates dynamics is approximated by the value function of the RSDG.
One dimensional reduction. We start by showing the following uniform bound: lim sup n→∞ EQ ∫ 0 ∞ e −t (R n (t)) 2 dt < ∞. (4.67) To establish this, recall the bounds in (4.41)-(4.42); as a result, the corresponding estimate holds for some constant C > 0, independent of n and t, and together with Proposition 4.2, (4.67) follows. We now turn to the holding and rejection costs. By the definitions of the rejection mechanism and of R ,n in Section 4.3.2, on the event {τ n > T }, one has the relations in (4.68). Recall that h a (w) =ĥ · γ a (w). Then, Proposition 4.3, the boundedness ofX n and h a (X ,n ), the bound R ,n ≤ θ n R n , and (4.67) imply the corresponding approximation, where we used the identity r = r i * µ i * , see (3.15). Using the limit lim n→∞Q (τ n < T ) = 0 and (4.68), we deduce that, in the limit, the cost components can be replaced by ∫ e −t {h a (X •,n (t)) dt + r dR •,n (t)}; see (4.69). Truncated intensities. We now show that the maximizer can use probability measures for which {ψ n j,i } j,i,n are uniformly bounded from above by a sufficiently large constant k without too much loss.
Recall that underQ, A n i (·) − ∫ 0 · ψ n 1,i (t)dt and S n i (T n i (·)) − ∫ 0 · ψ n 2,i (t)dT n i (t) are martingales. We now truncate the intensities and consider theQ-martingales A n,k i (·) − ∫ 0 · ψ n,k 1,i (t)dt and S n,k i (T n,k i (·)) − ∫ 0 · ψ n,k 2,i (t)dT n,k i (t), where T n,k = (T n,k 1 , . . . , T n,k I ) is the DM's control associated with the intensities {ψ n,k j,i } j,i . Denoteψ n,k j,i (·) :=ψ n j,i (·)1 {|ψ n j,i (·)|≤k} , j ∈ {1, 2}. Clearly, |ψ n,k j,i | ≤ k. The processes (A n,k , D n,k ) and (A n , D n ) are coupled in the natural way; for every i ∈ [I], setǍ n,k i in analogy with (4.17), using A n,k i and ∫ 0 · ψ n,k 1,i (t)dt, and similarly defineĎ n,k . For every L ∈ {W, X, Y, R, Ψ 1 , Ψ 2 }, letL n,k and L ,n,k be defined asL n and L ,n , where the intensities (ψ n,k 1,i , ψ n,k 2,i ) replace (ψ n 1,i , ψ n 2,i ), i ∈ [I], and let T n,k i be the analogue of T n i in this setup. Also, let τ n,k be the first time a forced rejection occurs in this setup and set L •,n,k := L ,n,k (· ∧ τ n,k ). Clearly, Lemmas 4.4, 4.5, and Proposition 4.3 hold in this case as well, where the superscript n is replaced by n, k. For the sake of exposition, we state all the necessary results here.
As a special case of Lemma 4.4 we get that {T n,k } n converges u.o.c. toT = (T 1 , . . . ,T I ), where recall thatT i (t) = ρ i t, t ∈ R + . Proposition 4.3 implies that for sufficiently small δ 0 , the following limits hold:X n,k − γ a (X ,n,k ) =⇒ 0 underQ, and for every k, T > 0, lim n→∞Q (τ n,k < T ) = 0. The sequence {(W •,n,k , X •,n,k , Y •,n,k , R •,n,k ,Ψ n,k 1 ,Ψ n,k 2 )} n is C-tight, and any subsequential limit of it, (W •,k , X •,k , Y •,k , R •,k ,Ψ k 1 ,Ψ k 2 ), satisfies (4.74). Moreover, there are processes {φ k j,i } j,i , progressively measurable w.r.t. G n t , such that (4.77) holds. Proof. The Skorokhod mapping formulation of the pre-limit processes, the limit in (4.73), and the C-tightness are special cases of Lemma 4.5. These results imply the Skorokhod formulation of the limiting process in (4.74).
The expressions in (4.75) follow since θ n → θ and from (4.51). As a result, from the definitions ofŴ n , W ,n,k , and W •,n,k (see their equivalences in (4.49), (4.50), and (4.56)), we get thatW •,k is an (m, σ)-Brownian motion w.r.t. its own filtration. The proof that its filtration can be replaced by F t follows along the same lines as the proof of Proposition 4.1 and is therefore omitted.
2 The next proposition tells us that by truncating the intensities, the expected cost does not change much.
whereQ n,k 1,i andQ n,k 2,i are the measures associated with ψ n,k 1,i (t) = λ n i + (λ i n) 1/2ψ n,k 1,i (t) and ψ n,k 2,i (t) = µ n i + (µ i n) 1/2ψ n,k 2,i (t), and, as stated in (4.85), lim n→∞ EQ ∫ 0 ∞ e −t |f n 2,i (ψ n,k 2,i (t)) − f n 2,i (ψ n 2,i (t))|dT n i (t) = 0. (4.85) We start with the limit in (4.81). Simple algebraic manipulation gives the bound W •,n,k − W •,n t ≤ C( Ǎ n,k −Ǎ n t + Ď n,k −Ď n t ), where C refers to a finite positive constant that is independent of n and t and which can change from one line to the next. From this bound, similar estimates apply for j = 2 as well. We may continue in the same way as we did in (4.86), where now the n −1/2 term is absent, using the two bounds in Proposition 4.2.
Since the constant δ 1 (> 0) and the measures {Q n } n were chosen arbitrarily, we obtain (4.7).
t ∈ R + . Let f be a nonnegative Borel-measurable function on R + and let F : R + → R + be a right-continuous, nondecreasing function. Then the change-of-variables identity holds, where we use the convention that the contribution to the integrals above at 0 is f (0)F (0).
Lemma A.3 Let {f n } n∈N and f be bounded integrable functions mapping R + to R. Also, let {ζ n } n∈N and ζ be nondecreasing and continuous functions mapping R + to itself such that ζ n (t) − ζ n (s) ≤ t − s for every 0 ≤ s ≤ t. Assume that ζ n → ζ and that f n → f in the appropriate sense. Thus, it is sufficient to prove the required bound with T replacing ∞ in the upper limit of the integrals. Fix ν > 0. From the assumptions in the lemma, there exists n 0 ∈ N such that for every n ≥ n 0 the corresponding bound holds. Every measurable function is the (a.s. w.r.t. Lebesgue measure) pointwise limit of step functions (see [36, Theorem 4.3]); denote them by {g m } m∈N . Since f is bounded, we may and will take the step functions to be uniformly bounded. From Egorov's theorem it follows that there is a set B ν with Lebesgue measure smaller than ν such that lim m→∞ sup s ∉B ν |g m (s) − f (s)| = 0, and together with the last inequality, the result holds. 2