Jumping Fluid Models and Delay Stability of Max-Weight Dynamics Under Heavy-Tailed Traffic

Published Online:https://doi.org/10.1287/stsy.2023.0110

Abstract

We say that a random variable is light-tailed if moments of order 2+ϵ are finite for some ϵ>0; otherwise, we say that it is heavy-tailed. We study queueing networks that operate under the max-weight scheduling policy for the case in which some queues receive heavy-tailed and some receive light-tailed traffic. Queues with light-tailed arrivals are often delay stable (that is, expected queue sizes are uniformly bounded over time) but can also become delay unstable because of resource sharing with other queues that receive heavy-tailed arrivals. Within this context and for any given “tail exponents” of the input traffic, we develop a necessary and sufficient condition under which a queue is robustly delay stable, in terms of jumping fluid models—an extension of traditional fluid models that allows for jumps along coordinates associated with heavy-tailed flows. Our result elucidates the precise mechanism that leads to delay instability through a coordination of multiple abnormally large arrivals at possibly different times and queues and settles an earlier open question on the sufficiency of a particular fluid-based criterion. Finally, we explore the power of Lyapunov functions in the study of delay stability.

1. Introduction

We say that a random variable is light-tailed if moments of order 2+ϵ are finite for some ϵ>0; otherwise, we say that it is heavy-tailed. We study queueing networks that operate under the max-weight (MW) scheduling policy for the case in which some queues receive heavy-tailed traffic, whereas some other queues receive light-tailed traffic; our motivation stems from the fact that heavy-tailed processes are often natural models of the inputs to computer and communications networks (Foss et al. 2011). Queues that receive heavy-tailed traffic are naturally delay unstable, that is, they incur infinite expected delay as an immediate consequence of the Pollaczek–Khintchine formula. However, it is also known that, because of the relatively complex max-weight dynamics, some of the queues that receive light-tailed traffic may also end up delay unstable.1 Our aim is to develop conditions that determine whether any particular queue is delay stable or not.

This problem is studied extensively (Markakis 2013; Markakis et al. 2014, 2018; Nair et al. 2015), and a necessary condition for delay stability is given in Markakis et al. (2014, 2018). In particular, Markakis et al. (2018) consider the associated fluid model, initialized at zero, except for a positive initial condition at some queue that receives heavy-tailed arrivals. If another queue happens to eventually become positive under that fluid model, one can then conclude that the latter queue is delay unstable. This result leads to the natural question whether this condition is also sufficient, that is, whether delay stability is guaranteed when any such fluid trajectory (with a positive initialization at any single one of the queues that receive heavy-tailed traffic) keeps the queue of interest at zero level. Such a sufficiency result might appear plausible because, in models involving heavy-tailed random variables, large fluctuations are often the consequence of a single abnormally large value in the underlying heavy-tailed random variables (Pakes 1975, Veraverbeke 1977, Anantharam 1989, Asmussen 1996, Durrett 1980, Foss et al. 2011).

1.1. Our Contributions

We start in Section 3 by showing that the aforementioned possible sufficiency result does not hold. We accomplish this by providing a fairly simple example in which a large arrival at any single heavy-tailed queue does not cause a certain queue of interest to grow, but a combination of two large arrivals, at two different heavy-tailed queues, can result in large backlogs at the queue of interest. We also provide necessary and sufficient conditions for delay stability in that particular example, and then provide intuition for a possible more general result. Interestingly, delay instability manifests itself only when the tail exponents (defined precisely in Section 2.2) of the heavy-tailed arrival processes lie in a specific range. As a consequence, we need to take these exponents explicitly into account, something that traditional fluid models cannot do.

We then generalize, by developing general and tight (necessary and sufficient) conditions for delay stability, in terms of deterministic fluid-like models in which there can be multiple jumps (in different queues and possibly at different times) at the heavy-tailed queues.

Our conditions are not easy to test computationally, but this seems unavoidable: because the conditions are necessary and sufficient, the complexity of testing them reflects the intrinsic complexity of testing delay stability. On the positive side, our conditions

  1. Provide a conceptual understanding of the mechanism that results in delay instability.

  2. Can be checked in special cases, for instance, for the example in Section 3; see Section 5.3.

On the technical side, our general conditions involve a small but technically crucial reformulation of the delay stability problem. To understand the underlying issue, note that fluid models do not always lead to definite conclusions when the underlying system is marginally stable but are generally effective when used to analyze “robust” properties. For this reason, we consider a network with given nominal arrival rates and focus on robust stability. Namely, we ask whether a certain queue is delay stable for all arrival processes with given tail exponents and for all (possibly time-varying) arrival rates that lie in some open ball around the nominal ones. When the problem is framed that way, definitive necessary and sufficient conditions for (robust) delay stability become possible. Furthermore, with this formulation, it is only the nominal arrival rates and the tail exponents that matter as opposed to the details of the distribution of the input traffic.

Finally, earlier works Markakis (2013) and Markakis et al. (2018) show that Lyapunov functions with certain structural properties can be used to certify delay stability. But it was not known whether this methodology is “complete,” that is, whether delay stability can always be established through a suitable Lyapunov function. Our results make progress in this direction, for the case of “very heavy” tails, that is, when there exists some γ>0 for which arriving traffic moments of order 1+γ are infinite but γ can be arbitrarily small. For this regime, we derive some necessary and sufficient conditions for delay stability: we show that a Lyapunov function of a special kind can be used to certify delay stability together with a partial converse.

1.2. Related Works on Multiple Big Jumps

As already mentioned, in several systems that involve heavy-tailed random variables, large fluctuations are the consequence of a single large jump in the underlying heavy-tailed variables. However, this is not always the case. There are known systems that have small fluctuations under a single large jump, and large deviations can only arise as a consequence of multiple jumps in heavy-tailed random variables with suitable timing. Early examples are studied in Jelenković and Momčilović (2003) and Zwart et al. (2004) for a single-buffer system with multiple on/off input processes with heavy-tailed periods. Furthermore, Foss and Korshunov (2012) provide conditions on the finiteness of moments of queue sizes in a multiserver G/G/s queue and show the special effects caused by multiple jumps. Perhaps the closest work to ours that demonstrates the necessity of multiple big jumps is Chen et al. (2019), who propose a rare-event simulation technique to estimate the probability of rare events in stochastic systems that receive heavy-tailed inputs. As an application, they consider a queuing network with fixed fluid services and fixed routing in which each queue receives independent heavy- or light-tailed exogenous arrivals. They characterized the tail exponent of the queues in terms of a knapsack-type constraint that involves the sum of numbers of large jumps in the different inputs that are required to make the queue of interest large, in which the sum is weighted by the tail exponents of the corresponding inputs. Such a knapsack-type constraint also appears in our bounds, and the tail exponent fully determines the delay stability/instability of the queue of interest. Thus, our results parallel those in Chen et al. (2019) but for a different setting. The main differences are that our network operates under the more complex max-weight scheduler, and our assumptions allow for non–independent and identically distributed (i.i.d.) arrival processes. Furthermore, our proof techniques are very different from those in Chen et al. (2019); it would be interesting to explore whether the techniques in Chen et al. (2019) can be used to obtain alternative proofs for our setting (Theorem 1).

1.3. Outline

The rest of the paper is organized as follows. We start in the next section with the details of our model and some definitions. In Section 3, we discuss an example that provides insights on the ways that arrival rates and tail exponents affect delay stability and demonstrate that criteria based on traditional fluid models are inadequate for the purpose of deciding delay stability. In Section 4, we introduce a fluid-like model, which we call the jumping fluid (JF) model and underlies our main result, the necessary and sufficient conditions for (robust) delay stability that we present in Section 5. In Section 6, we study the power of Lyapunov functions for our problem. In Sections 79, we provide the proofs of our results. As the proofs are quite involved, they are presented as a sequence of lemmas with the proofs of the lemmas provided in Appendices B and C. We discuss the results and directions for future research in Section 10. Finally, in Appendix A, we explore alternative definitions of robust stability and corresponding variants of our jumping fluid conditions and explain why they are unlikely to yield sharp necessary and sufficient conditions.

1.4. Notation

We collect here some notational conventions to be used throughout the paper. We use boldface symbols to denote vectors and ordinary font to denote scalars. For any vector v, we use vi to denote its ith component and |v| to denote the sum |v1|++|vn|. We also use the notation [v]+ to denote the vector with components max{0,vi}. Finally, we let ej stand for the jth unit vector.

We use R+ and Z+ to denote the sets of nonnegative reals and nonnegative integers, respectively. Furthermore, for a vector v, we write v0 (respectively, v0) to indicate that all components are nonnegative (positive). For any set S, we denote its convex hull by conv(S).

Throughout, · stands for the Euclidean norm. Sometimes, we use the alternative notation d(x,y) in place of xy. We also let d(x,S) be the distance of a vector x from a set S, that is, d(x,S)=infySxy. We finally use 𝟙(·) to denote the indicator function and log to denote the natural logarithm.

For any time function x(·) that is right-continuous with left limits, (dx/dt)(t) or x˙(t) stand for the right derivative of x(t) with the implicit assumption that it exists, and x(t) stands for limτtx(τ).

2. The Model

2.1. Network Model and the Max-Weight Policy

We consider a switched network that operates in discrete time. For simplicity and ease of presentation, we restrict ourselves to single-hop networks. However, our results are easily generalized to multihop networks of the type considered in Sharifnassab et al. (2020).

The network consists of queues that buffer incoming packets (or jobs). For any tZ+, we let Q(t) be a nonnegative vector whose jth component is the length of the jth queue at time t. Packets arrive to the queues according to a nonnegative stochastic vector arrival process, A(·). In particular, Aj(t) stands for the exogenous arrival to the jth queue at time t. We assume that the random variables Aj(t), for different j and t, are independent. We refer to E[A(t)] as the arrival rate vector at time t.

At each time t, the amount of service received by the queues is a nonnegative vector μ(t), which is chosen by a scheduler from a finite set M of possible service vectors. The queue lengths then evolve according to

Q(t+1)=[Q(t)μ(t)]++A(t).(1)

As in Sharifnassab et al. (2020), we assume throughout the paper that, for any μM, the set M also contains all vectors that result from setting some entries of μ to zero. This assumption is naturally valid in most contexts.

We focus exclusively on the popular MW scheduling policy,

μ(t)argmaxνMνTQ(t),(2)
which is known to have favorable stability properties (Tassiulas and Ephremides 1992): whenever there exists a policy under which the queues remain stable, MW results in stable queues. More specifically, let us consider the set M¯, defined as the convex hull, conv(M), of the set of all possible service vectors, which is the so-called capacity region of the network. For the case of i.i.d. arrivals and under common stochastic assumptions, it is known that, if the arrival rate vector lies in the interior of the capacity region, then MW results in stable queues; conversely, if the arrival rate vector lies outside the capacity region, the queues are unstable under every scheduling policy (Tassiulas and Ephremides 1992).

2.2. Light- and Heavy-Tailed Arrivals

In this section, we present some definitions related to the tails of the arrival process distributions.

To any nonnegative random variable X, we associate a tail exponent, defined as the value of γ at which E[X1+γ] switches from finite to infinite:

γ*=sup{γ:E[X1+γ]<}.(3)

As an example, consider a continuous random variable whose probability density function f(·) satisfies

c·x(2+γ)f(x)logkx·x(2+γ),xx0,(4)
where c, γ, k, and x0 are positive constants. Such a distribution has a tail exponent equal to γ.

For an i.i.d. arrival process, a tail exponent is unambiguously defined as the tail exponent of the marginal distribution at an arbitrary time. However, once we bring robustness into the picture, we are led to consider arrival processes with nonidentically distributed A(t). To any arrival process Aj(·), we associate a tail exponent, γj, defined as the largest value of γ such that Aj(t) is dominated by some nonnegative random variable X with tail exponent γ for all times t:

γj=sup{γ:thereexistsar.v.X0s.t.X dominates Aj(t) for all t0,and E[X1+γ]<}.(5)

Here, the term “dominates” refers to stochastic dominance: a random variables X dominates a random variable Y if P(X>a)P(Y>a) for all aR. We say that Aj(·) is heavy-tailed if γj1 and light-tailed otherwise. Note that this definition of a heavy-tailed process is aimed to capture the boundedness of variance of the input distribution and differs from the conventional definition of heavy-tailed random variables, which requires subexponential decay of tail probabilities (Nair et al. 2020). The tail behavior of the different arrival processes is summarized by the vector γ=(γ1,,γ).

We note that, as long as E[Aj(t)]μ¯ for some constant μ¯ and for all times t, the tail exponent γj is well-defined and lies in the range [0,]. Indeed, suppose that λj(t)=E[Aj(t)]μ¯ for some finite constant μ¯ and for all t. Consider a random variable X with probability density function fX(x)=(μ¯/x2)𝟙(xμ¯). The Markov inequality implies that

P(Aj(t)>a)min{1,E[Aj(t)]/a}min{1,μ¯/a}=P(X>a),a>0.

In particular, X dominates Aj(t) for all t. Furthermore, E[X1+γ]<, for every γ<0. Thus, γj is at least as large as any negative number, which implies that γj0.

Conversely, if γj>0, then E[Aj(t)] is finite and bounded as a function of t. We finally note that γj may be infinite; this is the case for bounded or, more generally, exponential-type distributions.

2.3. Robust Delay Stability

As already mentioned, we are interested in the question of delay stability under the MW policy in the presence of heavy-tailed arrivals. Given a set of arrival processes, we say that queue m is delay stable if, starting from Q(0)=0, we have suptE[Qm(t)]<.

The arrival rates and the tail exponents do not provide enough information to decide whether we have delay stability or even stability. For example, there is sometimes indeterminacy on the boundary of the capacity region. Furthermore, a tail exponent of γj=1 is compatible with E[Aj2(t)] being either finite or infinite, and this may be critical as far as delay stability is concerned (cf. the Pollaczek–Khintchine formula). These difficulties, all related to indeterminacy at certain boundaries, can be circumvented by focusing on a robust version of delay stability that incorporates two distinct elements.

  1. We require delay stability for all arrival process distributions with given tail exponents. This allows us to focus on conditions that involve the tail exponents and ignore other details of these distributions.

  2. We require delay stability for all arrival rates, possibly time-varying, in the vicinity of a given nominal rate. This allows us to avoid issues of indeterminacy when the arrival rate lies at a threshold between delay stability and instability.

Definition 1

(Robust Delay Stability). Let us fix a network with nodes and a set M of possible service vectors. We consider a tail exponent vector γ with components in [0,] and a nominal arrival rate vector λ*0.

  1. Given some δ0, an arrival process A(·) belongs to the class Aδ(γ;λ*) if

    1. The random variables Aj(t), for different j and t, are independent and have finite means.

    2. For every j, the tail exponent of the process Aj(·) is γj.

    3. For every t0, we have E[A(t)]λ*δ.

  2. For m{1,,}, we say that queue m is robustly delay stable (RDS) if there exists some δ>0 such that queue m is delay stable for all arrival processes in the class Aδ(γ;λ*).

3. A Counterexample and the Insufficiency of Fluid Models

In this section, we discuss a simple example with the following features:

  1. Similar to existing examples, heavy-tailed arrivals to some queues can cause delay instability at a queue that receives light-tailed traffic.

  2. Tight criteria for delay instability need to take into account the values of the tail exponents. In particular, criteria that are based on traditional fluid models cannot be conclusive because they do not involve the tail exponents.

  3. Delay instability may emerge from coordinated large arrivals at multiple heavy-tailed queues.

Consider the three-queue network in Figure 1. The set of possible service vectors M is such that, at any time slot, up to two queues can be served, each with rate one, but not all three queues can be served simultaneously. Thus, at each time step, the MW policy chooses two queues with the largest backlogs and serves each one of them with rate one.

Figure 1. A Single-Hop Network with Three Queues, Two of Which Receive Heavy-Tailed Traffic, Whereas the Third One Receives Deterministic Traffic
Notes. In this figure, the dotted ellipses illustrate the different possible service vectors. The third queue is delay unstable for certain ranges of λ and γ.

The third queue receives deterministic arrivals with A3(t)=λ<1 for all t0 so that γ3=. The other two queues receive heavy-tailed traffic with a density of the form (4) and tail exponent γ(0,1) and with rate 0.5; that is, E[A1(t)]=E[A2(t)]=0.5.

The total arrival rate is 1+λ, which is less than two, and the network is stable in the conventional sense. The first two queues are automatically delay unstable because they receive heavy-tailed arrivals. However, the third queue can be either delay stable or delay unstable, depending on the values of λ and γ. In what follows, we provide an informal discussion of the different cases.

Case 1

(0.5<λ<1). In this case, queue 3 is delay unstable through a scenario similar to those considered in earlier works (Markakis et al. 2014). Intuitively, because of its heavy-tailed arrivals, Q1 occasionally receives large inputs. When that happens, with A1(t0) being large at some time t0>0, Q1 becomes and stays largest for some time. During that time, the MW policy keeps serving Q1, while the remaining service capacity is split between Q2 and Q3. Because λ>0.5, the sum of the arrival rates to Q2 and Q3 exceeds the aggregate service rate to these two queues over a time interval of duration Ω(A1(t0)). Thus, Q2 and Q3 build up to size Ω(A1(t0)). Because γ<1, we have E[A12(t)]=, and using this property, it can be shown that E[Q3(t)] grows unbounded.

The intuition behind this argument is captured by an existing criterion from Markakis et al. (2014) that examines certain trajectories q(·) of a corresponding fluid model (also called fluid trajectories). In that fluid model, the arrival processes are replaced by deterministic flows with the same rates. The initial conditions are qh(0)=1 for some heavy-tailed queue (in our example, h = 1 or h = 2) and qj(0)=0 for jh. (Because of symmetry, we only need to consider the case in which h = 1.) Let us say that the “zero fluid” (ZF) condition holds if the solution to the fluid model (known to be unique for the MW policy) keeps q3(·) at zero. According to the criterion in Markakis et al. (2014), the failure of the ZF condition certifies the delay instability of queue 3. This is indeed the case here: starting with the initial conditions q(0)=(1,0,0), it can be verified that, for small positive times t, we have q3(t)=(λ0.5)t/2>0.

In summary, for this particular case, we have delay instability, and this is correctly predicted by the failure of the ZF condition and available results.

Case 2

(0<λ<0.5 and γ>0.5). In this case, queue 3 turns out to be delay stable (in fact, robustly delay stable). This is a consequence of our general result in Section 5.

For this case, the fluid model initialized at q(0)=(1,0,0) satisfies q3(t)=0 for all positive times, and the ZF condition holds. In particular, the ZF condition is “aligned” with delay stability. Observations of this nature lead to the question whether the ZF condition can be used as a certificate of delay stability (Markakis et al. 2018). However, this is not the case as we discuss next.

Case 3

(0<λ<0.5 and γ0.5). In this case, the ZF condition holds, similar to Case 2. However, queue 3 turns out to be delay unstable; this follows from the proof2 of our main result, Theorem 1.

We summarize here the underlying intuition. When γ0.5, there is considerable probability that both queues 1 and 2 receive large inputs within a certain time interval. More concretely, for large values of M, there is probability Ω(1/M) that both Q1 and Q2 receive aggregate arrivals of size at least 3M within the time interval [0,M]. If this happens, both Q1 and Q2 become large enough so that Q3 receives no service during the interval [M,2M]. As a result, Q3(2M)λM with probability Ω(1/M). One can then use this fact to show that, as t increases, E[Q3(t)] grows unbounded, and queue 3 is delay unstable.

In summary, the presence or absence of delay stability depends on the tail exponents in a nontrivial manner. Furthermore, the ZF condition cannot discriminate between Cases 2 and 3 and, thus, cannot account for the different outcomes (delay stability in Case 2, delay instability in Case 3). In fact, the same obstacle arises with any other criterion that relies on traditional fluid models because fluid models do not take the tail exponents into account. In order to make progress, we need to consider the probability that large inputs (or jumps) may arrive within a certain time interval as a function of the tail exponents (the probability is larger when the tail exponents are smaller). Furthermore, as illustrated by Case 3, we may have to consider the effect of “coordinated” large jumps at more than one queue within the same time interval. This is accomplished by the model in the next section: it is still in the spirit of traditional fluid models except that it allows for jumps along the heavy-tailed flows, subject to a “budget” on allowed jumps as determined by the tail exponents.

4. Jumping Fluid Models

In this section, we introduce a generalization of the fluid model that allows for jumps along certain coordinates. We proceed by first defining a traditional fluid model and then modifying it.

4.1. The Fluid Model

A fluid model is a deterministic continuous-time dynamical system that replaces the arrival process with a fluid stream of arrivals and updates queue lengths along max-weight drifts. The literature provides a few, somewhat different but equivalent, definitions of the fluid model (Shah and Wischik 2012, Markakis et al. 2018), which typically involve differential equations with boundary conditions. Here, we adopt an equivalent but somewhat simpler definition3 from Sharifnassab et al. (2020).

Recall the definition of M as the set of all possible service vectors. For any xR+, we define

M(x)={μM|μTxνTx,νM}=argmaxνMνTx,(6)
which is the set of all possible service vectors that, for the given x, attain the maximum in the definition of the MW policy; see (2). We also let
M¯(x)=conv(M(x)).(7)

Furthermore, given an arrival rate vector λ0 and any xR+, we let

D¯λ(x)=λM¯(x)=conv({λμ|μargmaxνMνTx}),(8)
which is the set of candidate drifts when the queue length vector is x.

For the definitions that follow, recall our convention that q˙(t) denotes the right derivative of q(·) with respect to time at time t.

Definition 2

(Fluid Trajectories). Let us fix a network with nodes, a set M of possible service vectors, and an arrival rate vector λ0. A fluid trajectory corresponding to λ is a nonnegative, continuous, and right-differentiable -dimensional function q(·) that satisfies the differential inclusion

q˙(t)D¯λ(q(t)),t0.(9)

Given some λ0 and q(0)0, there always exists a unique (and nonnegative) fluid trajectory q(·) corresponding to λ and initialized at q(0) (Markakis et al. 2018, Sharifnassab et al. 2020).

4.2. Adding the Jumps

We now introduce JF trajectories. Given a vector n=(n1,,n) of nonnegative integers, an ϵ-JF(n) trajectory is a fluid trajectory with jumps with nj positive jumps in its jth component, allowing for ϵ-changes in the arrival rate λ. More concretely, we have the following definition.

Definition 3

(ϵ-Jumping Fluid Trajectories). Let us fix a network with nodes and a set M of possible service vectors. We are given an arrival rate vector λ*0, a nonnegative integer vector n, and some ϵ0. An ϵ-JF(n) trajectory corresponding to λ* is a nonnegative -dimensional function q(·), which is right-continuous with left limits and right-differentiable initialized with q(t)=0 for all t < 0 and with the following properties:

  1. Each component qj(·) has nj points of discontinuity.

  2. If qj(·) has a discontinuity at some time t, then qj(t)>qj(t).

  3. For every t0,

    q˙(t)D¯λ(t)(q(t)),(10)
    for some nonnegative function λ(·) that is right-continuous and piecewise constant with finitely many points of discontinuity and satisfies λ(t)λ*ϵ for all t.

Finally, an ϵ-JF(n) trajectory with4 γTn=j=1γjnj1, is called an ϵ-JF(γ) trajectory.

Note that we allow jumps at time zero, in which case q(0)0. Note also that, if ϵ = 0 and jumps can only happen at time zero, then an ϵ-JF trajectory is just a fluid trajectory with the jumps of the JF trajectory determining the initial conditions of the fluid trajectory.

More generally, an ϵ-JF trajectory consists of a concatenation of fluid trajectories over the intervals in which λ(·) stays constant together with a finite number of jumps. For this reason, once the jump times, jump sizes, and function λ(·) are specified, the results for fluid models extend and establish the existence and uniqueness of the ϵ-JF(n) trajectory.

Our next definition formalizes the requirement that a certain queue must stay at zero under all ϵ-jumping fluid trajectories.

Definition 4

(JF Conditions). Let us fix a network with nodes and a set M of possible service vectors. We are given an arrival rate vector λ*0 and a particular queue, m, of interest.

  1. Given a vector γ with components in [0,] and some ϵ0, we say that the ϵ-JF(γ) condition holds for queue m and λ* if, for every ϵ-JF(γ) trajectory corresponding to λ* and every t0, we have qm(t)=0.

  2. Given a vector γ with components in [0,], we say that the robust jumping fluid condition (RJF(γ)) holds for queue m and λ* if there exists some ϵ>0 such that the ϵ-JF(γ) condition holds for queue m and λ*.

Note the restriction on the number of jumps in terms of the tail exponents: the heavier the arrival processes (i.e., the smaller the tail exponents γj), the larger the number nj of jumps that we allow. As an example, if γ11 and queue 1 is the only heavy-tailed queue, then we allow up to 1/γ1 jumps at queue 1 and no jumps at the other queues.

5. Main Result

Our main result provides a necessary and sufficient condition for robust delay stability in terms of JF trajectories. The proof is given in Sections 7 and 8.

Theorem 1.

Let us fix a network with  nodes; a set M of possible service vectors; an arrival rate vector λ*0; a particular queue, m, of interest; and a vector γ of tail exponents with components in (0,]. The queue m is RDS if and only if the RJF(γ) condition holds for queue m and λ*.

5.1. Some Intuition

We provide here a high-level explanation of our result. Some more refined intuition is provided by the proof outlines in Sections 7.1 and 8.1.

Let M be a large constant. We say that the stochastic process Aj(t) has a jump whenever it is larger than (approximately) M. Let Nj be the number of jumps of Aj(t) during the interval [0,M] and let N=(N1,,N). It turns out that, for any nonnegative integer vector n, the probability that the event N=n scales (approximately) like MγTn. The latter quantity is “significant” (in the sense that it makes an unbounded contribution to certain expected values) if and only if γTn1. Thus, over an interval of length M, we can focus on sample paths for which the realized vector n of jump counts satisfies γTn1 and examine whether such sample paths can cause the queue of interest to become large. We then argue that these sample paths are well-approximated by the ϵ-JF(n) trajectories involved in the ϵ-JF(γ) condition.

5.2. Remarks

We continue with some remarks on the scope of our result.

5.2.1. Heavy-Tailed Queues.

If queue m is heavy-tailed, that is, γm1, then the condition γTn1 allows n to be the mth unit vector. With such a vector n, we can have an ϵ-JF(n) trajectory with a positive jump in the mth component, resulting in a positive value of qm(t). Thus, the RJF(γ) condition does not hold, and queue m is not RDS. This is just a variation of the well-known fact that a queue with heavy-tailed arrivals is not delay stable.

5.2.2. Unstable Systems.

Theorem 1 makes no stability assumptions. For unstable (or marginally stable) systems, some components of ϵ-JF trajectories can grow arbitrarily large. On the other hand, these components do not necessarily have a substantial effect on the queue, m, of interest. As long as the mth component of all ϵ-JF trajectories stays at zero, queue m is RDS.

5.2.3. Comparison with the ZF Condition.

If γj>1/2 for all j, then an ϵ-JF trajectory can have at most one jump. For stable systems, the RJF condition boils down to a robust version of the ZF condition introduced in Section 3. In other words, for this case, a robust version of the ZF condition is a necessary and sufficient condition for robust delay stability. On the other hand, because the (robust version of) the ZF condition is strictly weaker than the RJF condition, it does not provide necessary and sufficient conditions for general γ.

5.2.4. The Light-Tailed Case.

Suppose that γj>1 for all j so that all arrival processes are light-tailed. In this case, no jumps are allowed, and the RJF condition boils down to considering ordinary fluid trajectories with slightly perturbed arrival rates. We have the following possibilities:

  1. If λ* is in the interior of the capacity region M¯, then 0 is in the interior of D¯λ*(0)=λ*M¯. It then turns out that 0 is an attracting fixed point of the fluid dynamics, the RJF condition holds, and we have RDS for all queues. This is in line with existing results (e.g., see theorem 4.5 of Georgiadis et al. 2006).

  2. If λ* is on the boundary or outside the capacity region and, similar to our earlier discussion of unstable systems, queue m could be either RDS or non-RDS, depending on whether (perturbed) fluid trajectories cause qm to become positive or not.

5.2.5. The Case of Zero Tail Exponents.

Our definitions in Sections 2 and 4 are formulated for a nonnegative vector γ. However, our result is restricted to the case in which this vector is positive. We comment on the reasons for this.

When γj=0, we are dealing with an arrival process for which E[Aj(t)] is finite, whereas E[Aj(t)1+γ] may be infinite for every γ>0. Our proofs involve, at certain places, a division by γj and break down if γj=0. It is not clear whether a similar result is possible when some of the tail exponents are zero.

5.2.6. Computational Issues.

As already hinted in the introduction, checking the RJF condition algorithmically appears to be a hard computational problem, amenable only to impractical Tarski-like elimination algorithms. In one possible simplification, we might just consider ϵ-JF trajectories with ϵ = 0 so that λ(t)=λ* for all times t. However, this would still leave the indeterminacy of the jump times and sizes to be reckoned with. Even worse, a restriction to this limited class of trajectories does not seem to lead to necessary and sufficient conditions for any suitably modified notion of stability. See Appendix A.2 for further discussion. A related question is whether we could, without loss of generality, require all the jumps to occur at the same time, for example, at time zero, thus eliminating the need to consider all possible values of the jump times. Unfortunately, this is not the case: there exist examples in which ϵ-JF trajectories can drive a queue m to a positive value, but this can happen only if we allow the jumps to occur at different times; see Appendix A.5 for an example.

5.3. Our Example, Revisited

On the positive side, Theorem 1 elucidates the precise mechanism that leads to delay instability through a coordination of multiple abnormally large arrival vectors at possibly different times and queues. Furthermore, it allows us to analyze simple problems, such as the one discussed in Section 3, which we do next.

Recall the network of three queues in Section 3, in which γ=(γ,γ,). For γ in the range (0,0.5], consider a jumping fluid trajectory q(·) with n=(1,1,0) in which both q1 and q2 undergo unit jumps at time 0. With this trajectory, q3 immediately starts to grow positive. Because γTn=2γ1, it follows that the RJF(γ) condition fails to hold. Theorem 1 then establishes that queue 3 is not robustly delay stable.

On the other hand, when γ is in the range (0.5,1], the constraint γTn1 allows at most one jump in either q1 or q2. Without loss of generality, we can assume that this jump takes place at time 0. It turns out that the fluid trajectories that start from either q(0)=(1,0,0) or q(0)=(0,1,0) keep q3(·) at zero if and only if λ0.5. Therefore, for γ in range (0.5,1] and also taking robustness into account, queue 3 is RDS if and only if λ<0.5.

6. Robust Delay Stability via Lyapunov Functions

Lyapunov functions are a powerful tool for the stability analysis of queueing networks (Tassiulas and Ephremides 1992, Bertsimas et al. 2001, Maguluri et al. 2016), for example, in throughput optimality proofs for the MW policy (Tassiulas and Ephremides 1992, Neely 2010). Markakis (2013) and Markakis et al. (2018) provide a sufficient condition for delay stability based on a class of piecewise linear Lyapunov functions and use it to derive a sharp characterization of delay stability for a special class of networks, namely, networks with disjoint schedules. Nevertheless, the Lyapunov approach in Markakis (2013) and Markakis et al. (2018) has some drawbacks: (a) the condition provided therein is, in general, only sufficient for delay stability; (b) it does not take into account the tail exponents even though they play an essential role in delay stability as already discussed in Sections 3 and 5.3; (c) the Lyapunov functions considered are piecewise linear, which is perhaps inadequate for the purpose of tight delay stability conditions.

In this section, we explore the power of Lyapunov functions for the case in which the tail exponents of the heavy-tailed queues may be arbitrarily close to zero so that the RJF condition allows an arbitrarily large number of jumps.

For the remainder of this section, we assume that queues 1,,h can be heavy-tailed, where h<, whereas the remaining queues are light-tailed. Formally, we consider the set Γ of tail coefficients, defined by

Γ={γ0:γj>1, for j=h+1,,}.

We also fix an arrival rate vector λ*0 and a light-tailed queue m > h of interest.

Definition 5

(Special Lyapunov Function). For any ϵ>0, we say that a function V:R+R+ is a special ϵ-Lyapunov function if

  1. V is Lipschitz continuous with Lipschitz constant one.

  2. V˙(q(t))ϵ whenever V(q(t))>0 for all fluid trajectories q(·) corresponding to arrival rate λ*.

  3. We have V(0)=0. Furthermore, if qm>0, then V(qm)>0.

  4. V is nonincreasing along the coordinates associated with heavy-tailed queues; that is, for j=1,,h, for any qR+n, and any α>0, we have V(q+αej)V(q), where ej is the jth unit vector.

Special ϵ-Lyapunov functions as defined are quite similar to the functions considered in theorem 2 of Markakis et al. (2018). However, in contrast to Markakis et al. (2018), our special Lyapunov functions need not be piecewise linear. Our next result establishes a strong connection between the ϵ-JF(γ) condition and the existence of special ϵ-Lyapunov functions. The proof is given in Section 9.

Theorem 2.

For any ϵ>0, there exists a special ϵ-Lyapunov function if and only if the ϵ-JF(γ) condition holds for every γΓ.

The special ϵ-Lyapunov functions constructed in the proof of Theorem 2 are not piecewise linear. We do not know whether Theorem 2 remains valid if we restrict to piecewise linear functions.

Combining Theorems 1 and 2, we can establish a strong connection between robust delay stability and special ϵ-Lyapunov functions. In what follows, we say that queue m is ϵ-RDS(γ) if it is delay stable under all arrival processes in the class Aϵ(γ;λ*) in Definition 1(a).

Corollary 1.

Let us fix a network with  nodes, a set M of possible service vectors, an arrival vector λ*0, the number h of heavy-tailed queues, and a light-tailed queue m > h.

  1. If there exists a special ϵ-Lyapunov function for some ϵ>0, then queue m is RDS for all tail exponents γΓ.

  2. Suppose that there exists some ϵ>0 such that, for all γΓ, queue m is ϵ-RDS(γ). Then, there exists an ϵ-special Lyapunov function.

The proof is provided in Section 9.3. We conjecture that Corollary 1(b) can be strengthened to provide a converse to part (a).

Hypothesis 1.

If queue m is RDS for all tail exponents γΓ, then for some ϵ>0, there exists a special ϵ-Lyapunov function.

If Hypothesis 1 is true, we have a tight characterization: a queue is RDS for all tail exponents γΓ if and only if there exists a special ϵ-Lyapunov function that testifies to this. Establishing the conjecture appears to be difficult. Technically it amounts to reversing the order of the quantifiers in the clause “there exists ϵ>0 such that, for all γΓ” in Corollary 1(b) and showing equivalence with the statement “for every γΓ, there exists some ϵ>0…”

Our Lyapunov-based results are relevant to the case in which nothing is known about the tail exponents of the heavy-tailed queues other than the fact that they are positive. On the other hand, Lyapunov functions are unlikely to provide useful characterizations of robust delay stability for specific values of the tail exponents because there is no apparent way of accounting for the number of jumps through Lyapunov functions.

7. Proof of the “if” Direction of Theorem 1 (RJFRDS)

In this section, we provide the proof of the “if” direction of Theorem 1, that is, that the RJF condition implies RDS.

Throughout this section, we consider a network with nodes and a set M of possible service vectors. We fix an arrival rate vector λ*0; a particular queue, m, of interest; a vector γ of tail exponents with components in (0,]; and some ϵ>0 for which the ϵ-JF(γ) condition holds for λ* and queue m. Our goal is to establish robust delay stability for queue m.

The proof is organized in a sequence of lemmas whose proofs are collected in Appendix B. However, before proceeding to the formal arguments, it is helpful to provide an overview of the proof.

7.1. Outline of the Proof

Let us fix an arbitrary time T. We aim at upper bounds for the probability P(Qm(T)M) as M gets large. As long as these bounds are a summable function of M and independent of T it follows that E[Qm(T)] is finite and a bounded function of T, which is our goal. Let us now fix some M and keep it fixed throughout except for the end of the proof.

The proof relies on various probabilistic bounds as well as on deterministic properties of the MW dynamics. Let us start with the probabilistic part, which is focused on showing that the stochastic system mostly follows the deterministic fluid dynamics except for certain jumps caused by the heavy tails of the arrival processes. We define a threshold for what constitutes a jump and then develop a probabilistic bound on the numbers of jumps. A difficulty here is that, if we use a fixed threshold and because T is arbitrary, a bound on the number of jumps is not possible. We handle this issue by using a threshold θt that increases almost linearly as we move further to the past of the form

θt=M+Ttηlog(M+Tt),t=0,1,,T.(11)

This is for some positive constant η to be defined later. At any time tT and for any index j, we say that Aj(t) is a jump if Aj(t)>θt. Ignoring logarithmic factors, we show that the jump probability P(Aj(t)>θt) is of order at most 1/(M+Tt)1+γj, where γj is slightly smaller than the tail exponent γj. By summing over t and after some elementary calculations, we then obtain that P(Nj=nj) is of order at most 1/Mnjγj, where Nj is the number of jumps of the jth arrival process during the interval [0,T]. Then, a further calculation shows that, if N=(N1,,N), then P(γTN>1) is of order at most 1/Mβ for some constant β(1,2) and is, therefore, a summable function of M; see Lemma 1.

We then consider stochastic fluctuations in the arrival process other than jumps. We argue that they average out so that the cumulative arrival process follows its fluid counterpart. We refer to this as the “small fluctuations event,” and show that it occurs with probability at least 1/M2; see (21) and Lemma 3.

Having completed the probabilistic analysis, we then switch to deterministic (sample path) considerations. Let W(n) be the set of all points q that can be reached by some ϵ-JF(n) trajectory (see Definition 6). As a first step, we exploit some special properties of the MW dynamics and show that W(n) is ϵ-attracting; that is, any fluid trajectory that starts outside W(n) moves toward that set with rate at least ϵ (Lemma 4). We then consider a “nice” sample path, that is, a sample path for which the small fluctuations event occurs and for which the realized vector of jump counts n satisfies γTn1. (As discussed earlier, nice sample paths have probability 1O(Mβ) with β>1.) Our deterministic analysis, outlined in the next paragraph, shows that any such sample path stays within O(M) distance from the set W(n). Because γTn1, the ϵ-JF(γ) condition implies that any point in W(n) satisfies qm = 0. Thus, for nice sample paths, we have Qm(T)=O(M), and therefore, P(Qm(T)M) is of order at most 1/Mβ for the constant β(1,2) mentioned earlier. This readily implies a uniform upper bound on E[Qm(T)].

The analysis of the dynamics under nice sample paths has two parts. We first study the dynamics between jumps: we rely on the small fluctuations event and then make use of a result from Sharifnassab et al. (2020, theorem 4) to ensure that small fluctuations in the arrivals result into comparably small changes in the resulting stochastic trajectory; cf. Lemma 5. Second, to understand what happens at jump times, we recall that the jump vectors n associated to nice sample paths satisfy γTn1. Such vectors n are allowed in ϵ-JF(γ) trajectories, and therefore, the jumps cannot take the ϵ-JF(n) trajectory away from W(n). This implies that a nice sample path stays “close” to an ϵ-JF(n) trajectory and, therefore, has a “small” Qm.

7.2. Sensitivity of Max-Weight Dynamics

The proof for both directions of the theorem requires fairly precise bounding of the fluctuations of the stochastic trajectories. To this effect, we rely heavily on a fluctuation bound for the MW dynamics established in Sharifnassab et al. (2020).

Theorem 3

(Sharifnassab et al. 2020, theorem 2). Fix a network (i.e., the number of nodes and the set M of possible service vectors) operating under the MW policy and let Q(·) be the corresponding queue length stochastic process. There exists a (deterministic) constant C1 such that, for any arrival rate vector λ0, any q(0)0, any t0, and any sample path, if q(0)=Q(0), then

Q(t)q(t)C(1+λ+maxk<tτ=0k(A(τ)λ)),(12)
where q(·) is the fluid trajectory corresponding to λ initialized at q(0).

We actually use the following variant of Theorem 3, which allows for different initial conditions. The proof is given in Appendix B.1.

Theorem 4.

Under the same assumptions as in Theorem 3 except that we allow for Q(0) and q(0) to be different and for the same constant C, we have

Q(t)q(t)Q(0)q(0)+C(1+λ+maxk<tτ=0k(A(τ)λ)).

7.3. Arrival Process and Jumps

We now return to the formal proof. Recall that, throughout this section, we fix the network, λ*0 and the tail exponents γj(0,]. We assume that the ϵ-JF(γ) condition holds for λ*, queue m, and some ϵ>0. We fix some positive integers M and T; these remain fixed throughout except for the end of the proof and for some additional assumptions that M is “large enough.”

We consider an arrival process A(·) in the class Aδ(γ;λ*) introduced in Definition 1 with δ=γϵ/20C, where C is the constant in Theorem 4 and

γ=minj=1,,γj.(13)

In particular,

E[A(t)]λ*γϵ20C,t0.(14)

Our goal is to derive an upper bound on P(Qm(T)M) that holds uniformly for every arrival process A(·) in Aδ(γ;λ*), every T, and every large enough M. This is then used to conclude that the RDS property holds.

Because we assume that the tail exponents are nonzero, we have γ>0. Furthermore, in order to simplify the proof, it is convenient to assume that

γ1.(15)

Claim 1.

The assumption γ1 can be made without loss of generality.

Proof.

Given a system, call it S, consider a new system S in which we add one more queue, queue 0, with γ0<1, which does not interact with the others; for example, for any allowed service vector μ in system S, we introduce a corresponding service vector (1,μ) in system S. This way, the dynamics of queues 1,, are the same in the two systems S and S. It follows that, for m1, queue m is RDS in system S if and only if it is RDS in system S. Furthermore, because of the lack of interaction, the RJF condition for queue m holds in system S if and only if it holds in system S.

Once we prove the result for the case γ1, we apply it to system S and obtain the equivalence of RDS and RJF for the latter system. Based on this discussion, this also establishes the equivalence of RDS and RJF for the original system S. □

We define some more constants:

μ¯=1+maxμMμ+λ*+ϵ,(16)
and
η=8000C22μ¯(γϵ)2.(17)

Having fixed M, T, and η, we finally define θt as in (11).

For j=1,, and t=0,,T1, we say that t is a jump time for Aj(·) if Aj(t)>θt. For any j and any τ[0,T), we let Nj(τ) be the (random) number of jumps in Aj(·) that occur during [0,τ] and also define the corresponding vector N(τ)=(N1(τ),,N(τ)). To simplify notation and as long as T is fixed, we use N and Nj to refer to N(T1) and Nj(T1), respectively. Note that, with this definition, N=N(T1) includes all jumps that can affect Q(T).

We consider the event

Ejump(T,M)={γTN1}.(18)

We argue that it occurs with high probability.

Let F be the set of all n0 such that 1<γTn2. If F is empty, then whenever γTn>1, we must have γTn>2. If F is nonempty, it has finitely many elements, which implies that the minimum minnFγTn is attained and its value is greater than one. In either case, we obtain that there exists a constant β>1 such that

if γTn>1, then γTnβ3.(19)

Without loss of generality, we can take β to satisfy 1<β<2. In the next lemma, we show the probability that γTN>1 decays at least as fast as 1/Mβ.

Lemma 1.

There exists a constant M10 independent of T such that, if MM1, then

P(Ejump(T,M))=P(γTN1)1Mβ.

The proof is given in Appendix B.2 and relies on the intuition that the probability of a jump in the jth arrival process scales at most as Mγj (approximately), which then implies that the probability of the event {N=n} scales at most as MγTn (approximately).

7.4. Fluctuations of the Arrival Process

We now study the remaining fluctuations of the cumulative arrival processes after we exclude the jumps. We begin with a concentration inequality for the sum of independent random variables. The proof of our next lemma is given in Appendix B.3 and is essentially a reformulation of the Bernstein inequality; see, for example, (1.21) in appendix 1 of Anthony and Bartlett (2009).

Lemma 2.

Suppose that X1,,Xn are independent random variables that satisfy Xi[0,b] and E[Xi]λ¯ for some b,λ¯>0. Let Y=X1++Xn. Then, for any z0,

P(|YE[Y]|>z)2exp(z22b(λ¯n+z/3)).(20)

We consider the truncated process A*(t)=min{A(t),θt}, where the minimum is taken component-wise. We define the small fluctuations event

Efluc(T,M)={τ=t0t(A*(τ)λ*)γϵ10C(M+Tt0),for 0t0t<T},(21)
where C is the constant in Theorem 4.

Lemma 3.

There exists a constant M20 independent of T such that, if MM2, then P(Efluc(T,M))1M2.

The proof is somewhat long but straightforward. It relies on the concentration inequality in Lemma 2 and is given in Appendix B.4.

7.5. Deterministic Analysis of the Dynamics

We start with some definitions. It is useful to recall here that ϵ-JF trajectories start at zero just before time 0 but can become nonzero at time 0 or later because of jumps or unstable drifts.

Definition 6.

For any nonnegative integer vector n, we define W(n) as the set of all points in R+ that can be reached by some ϵ-JF(n) trajectory.

Because ϵ-JF trajectories are by definition nonnegative, W(n) is a subset of R+. Note that the set W(n) depends on ϵ, but because ϵ is held fixed, we suppress this dependence from our notation.

Definition 7

(ϵ-Attracting and ϵ-Invariant Sets).

  1. A subset W of R+ is ϵ-attracting if every fluid trajectory q(·) corresponding to λ* and initialized at some arbitrary q(0)0 satisfies

    ddtd(q(t),W)ϵ,(22)
    whenever q(t)W.

  2. A subset W of R+ is ϵ-invariant if, for any λ0 that satisfies λλ*ϵ and any fluid trajectory q(·) corresponding to λ and initialized at some arbitrary q(0)W, we have q(τ)W for all τ0.

We observe that if W is ϵ-attracting and 0t0<t1, then every fluid trajectory satisfies

d(q(t1),W)max{0,d(q(t0),W)(t1t0)ϵ}.(23)

It can be shown that an ϵ-attracting set is ϵ-invariant, but we do not need this fact. We are interested instead in the converse statement, that every ϵ-invariant set is ϵ-attracting, which we then apply to the set W(n); this is the subject of the next lemma.

Lemma 4.

  1. Every ϵ-invariant set is ϵ-attracting.

  2. For any nZ+, the set W(n) in Definition 6 is ϵ-invariant and, a fortiori, ϵ-attracting.

The proof is given in Appendix B.5 and relies on the intuition that, for an ϵ-invariant set W, ϵ-perturbations of λ* cannot take the trajectories away from W. This means that there must be a drift toward W that locally counteracts such perturbations. The nonexpansive property of the max-weight dynamics then enables us to extend this local counteraction result into the desired global attraction property.

Let us fix a sample path of the arrival process. For any t[0,T), let n(t) be the realized value of N(t), that is, n(t) is the vector with the realized number of jumps in the process A(·) up to time t (see the paragraph preceding (18)). With a bit of abuse of notation, we let

W(t)=W(n(t1)).(24)

(The reason for the t – 1 term on the right-hand side is that we want to compare W(·) and Q(·), but Q(t) is only affected by jumps that happen before time t.)

7.6. Some More Intuition

The general idea is to show that Q(·) stays close to the set W(·) so that we can ultimately exploit the fact that qm = 0 for every qW(T). There are two parts to the argument:

  1. If, at a certain time, Qj(t) has a jump (i.e., a large increase), the set W(t) expands along the jth coordinate, and so the distance between Q(t) and W(t) does not increase.

  2. In between jumps, Q(t) follows the fluid trajectory, plus some fluctuations, within the range allowed by Lemma 3. These fluctuations get “eliminated” because the fluid trajectory is attracted to W(t) (Lemma 4). Note that (21) allows for larger fluctuations in the far past (see the term M+Tt0); however, for fluctuations in the far past, the motion in the direction of W(t) happens for a longer time period, enough to eliminate them. This explains the choice of the threshold θt in (11); the logarithmic term in the denominator is included for technical reasons.

7.7. The Distance from the Invariant Set

Given times that satisfy 0t0<t1T, we say that the interval (t0, t1) is jump-free if Aj(τ)θτ for all j and all τ(t0,t1). Note that the initial time t0 and the end time t1 are allowed to be jump times. Let M31 be a constant, independent of T, such that, for any MM3,

γϵM10+MηlogM+ϵ+(λ*+1)C+μ¯γϵM6.(25)

Lemma 5.

Suppose that MM3 and fix times that satisfy 0t0<t1T. Consider a sample path under which the event Efluc(T,M) occurs and the interval (t0, t1) is jump-free. Then,

d(Q(t1),W(t1))max{0,d(Q(t0),W(t0))(t1t0)ϵ}+ϵγ6(M+Tt0).(26)

Note that the lemma also applies when t1=t0+1 so that the set of integers in (t0, t1) is empty. The proof, which is given in Appendix B.6, relies on the sensitivity bound in Theorem 4 and the fact that W(t0) is ϵ-attractive. The first term on the right-hand side reflects the fact that a fluid trajectory is attracted to W(·) during the jump-free interval; the second term reflects the effect of the smaller fluctuations during the interval (t0, t1).

We then apply Lemma 5 and use strong induction on t for tT to establish the next lemma. Its proof is given in Appendix B.7.

Lemma 6.

Suppose that MM3. Consider a sample path under which the events Ejump(T,M) and Efluc(T,M) occur. Then, d(Q(T),W(T))Mϵ/2.

7.8. Bounding E[Qm(t)]

Let M¯=max{M1,M2,M3}, where M1, M2, and M3 are the constants in Lemma 1, Lemma 3, and (25), respectively. Because these constants are independent of T, the constant M¯ is also independent of T.

Let us consider some MM¯ and a sample path under which the events Ejump(T,M) and Efluc(T,M) occur. In particular, we have γTn1, where n is the realized value of N, that is, the vector with the number of jumps until time T – 1 for that particular sample path. The ϵ-JF(γ) condition, which we assume to hold, implies that every ϵ-JF(n) trajectory with γTn1 keeps qm at zero, and therefore, every vector qW(T) has qm = 0. Consequently, for every sample path in Ejump(T,M)Efluc(T,M), we have

Qm(T)d(Q(T),W(T))Mϵ2,(27)
where the second inequality follows from Lemma 6. Therefore, for any MM¯, we have
P(Qm(T)>Mϵ2)(1P(Ejump(T,M)))+(1P(Efluc(T,M)))Mβ+M2,(28)
where the first inequality follows from (27) and the union bound and the second inequality is due to Lemmas 1 and 3. Because β is by definition greater than one, the formula E[Qm(T)]=0P(Qm(t)>M)dM implies that E[Qm(T)] is bounded above by a constant that does not depend on T. Furthermore, note that this bound applies uniformly to all processes in the class Aδ(γ;λ*) for δ=γϵ/20C (see (14)). This shows that queue m is robustly delay stable and completes the proof of the first direction of Theorem 1.

8. Proof of the Reverse Direction of Theorem 1 (RDS RJF)

In this section, we prove the reverse (“only if”) direction of Theorem 1, that is, that RDS implies that the ϵ-JF(γ) condition holds for some ϵ>0. The proof is organized in a sequence of lemmas whose proofs are collected in Appendix C.

We actually prove the contrapositive and start by assuming that, for every ϵ>0, there exists an ϵ-JF(n) trajectory qϵ(·) with γTn1 and such that qmϵ(t)>0 at some positive time t.

We keep λ*,γ, and ϵ fixed throughout the proof and show that there exists an arrival processes in the class Aϵ(γ;λ*) for which queue m is not delay stable. Because ϵ can be arbitrarily small, this implies that there exists no δ such that queue m is delay stable for all arrival processes in the class Aδ(γ;λ*), and therefore, queue m is not RDS. However, before proceeding to the formal arguments, we overview informally the key ideas in the proof.

8.1. Outline of the Proof

The main idea is to construct a certain arrival process A(·) in the class Aϵ(γ;λ*), whose arrival rate is a time-scaled version of the piecewise constant rate λ(·) associated with the ϵ-JF(n) trajectory. We then use the bounded sensitivity property of the MW dynamics (Theorem 4) to show that the resulting process Q(·) tracks a suitably scaled (by a factor of T) version of the ϵ-JF trajectory qϵ(·) with substantial probability. In particular, Q(T) is (with substantial probability) comparable to Tqϵ(1), leading to a large value of E[Qm(T)]. This part of the argument capitalizes on the fact that the number of jumps of the ϵ-JF(n) trajectory is limited by the condition γTn1.

On the technical side, the tracking result involves two separate arguments:

  1. Whenever the ϵ-JF(n) trajectory has a jump, at some time τ, there is substantial probability that the stochastic process also has a jump at some time near the scaled counterpart, Tτ, of τ; see Lemma 8.

  2. In between jump times of the ϵ-JF(n) trajectory, we use concentration inequalities to show that there is a fairly large probability that the stochastic process stays close to its fluid counterpart.

8.2. Jumps of the ϵ-JF(n) Trajectory

We fix a network with nodes; a set M of possible service vectors; an arrival rate vector λ*0; a particular queue, m, of interest; and a vector γ of tail exponents with components in (0,]. We also fix some ϵ>0 and assume that the ϵ-JF(γ) condition fails to hold.

We start with a few elementary observations, namely, that the ϵ-JF(n) trajectories of interest can be taken, without loss of generality, through scaling and perturbations, to have some convenient properties that allow us to simplify subsequent notation and arguments. The proof is given in Appendix C.1.

Lemma 7.

If the ϵ-JF(γ) condition fails to hold, then there exists some n0 with γTn1 and an ϵ-JF(n) trajectory qϵ(·) with the following properties:

  1. qmϵ(1)>0.

  2. The times at which qϵ(·) is discontinuous (the jump times) all belong to the open interval (0, 1).

  3. At each jump time, exactly one of the components of qϵ(·) is discontinuous.

  4. The arrival rate associated to qϵ(·) satisfies inftλj(t)>0 for all j.

For the rest of the proof, we fix an ϵ-JF(n) trajectory with the properties in Lemma 7, together with the associated vector n and rate function λ(·). We define n=n1++n, which is the total number of jumps.5

We define Θ0=0,Θn+1=1, and for k=1,,n, let Θk be the kth jump time. In particular,

0=Θ0<Θ1<<Θn<Θn+1=1.(29)

We use jk to refer to the queue at which the kth jump occurs and ak to refer to the size of the kth jump. In particular, at time Θk, for k=1,,n,qj(·) is continuous for every jjk, and

qjkϵ(Θk)=qjkϵ(Θk)+ak.

8.3. Defining Certain Constants

As in (16), we define

μ¯=1+maxμMμ+λ*+ϵ.(30)

Moreover, similar to (13), we let

γ=minj=1,,γj>0.(31)

By arguing as in Claim 1, we can and do assume, without loss of generality, that γ1. We then define a positive constant

c=qmϵ(1),(32)
and also let
d=12min{γc4(1+4μ¯),mink=0,,n{Θk+1Θk},mink=1,,nak1+2μ¯}.(33)

(In case n = 0, the last term inside the brackets is taken to be zero.)

8.4. Defining the Stochastic Arrivals

Let us fix some constants t00 and T > 0 and keep them fixed until the end of the proof in Section 8.7. In this section, we define a stochastic arrival process over an interval of the form [t0,t0+T), thus constructing what we call an episode of the overall process. Later, in Section 8.7, we concatenate multiple episodes to construct the stochastic process over the entire timeline [0,).

Consider the arrival rate function λ(·) of the ϵ-JF(n) trajectory; in particular, λ(t)λ*ϵ. The arrival rate vector for the stochastic process during the episode is a time-scaled (by a factor of T) and shifted (by t0) version of λ(·). More concretely, we let

λ¯(t)=λ(tt0T),t[t0,t0+T).(34)

Clearly,

λ¯(t)λ*ϵ,t[t0,t0+T).(35)

Furthermore, it follows from Lemma 7(d) that inftλ¯j(t)>0 for every j.

We now digress to introduce certain constants that are used to specify the exact form of the distribution of the arrival processes. For any α>0, we let

σ(α)=μ¯x(1+α)log(x+1)dx,(36)
where μ¯ is defined in (30). Then, for any α>0, we have
σ(α)σ(1+α)=μ¯x·x(2+α)log(x+1)dxμ¯x(2+α)log(x+1)dxμ¯μ¯x(2+α)log(x+1)dxμ¯x(2+α)log(x+1)dx=μ¯.(37)

Definition 8

(Arrivals During an Episode). Given T > 0 and t00, we define the arrival process over the interval t[t0,t0+T) (which we call an episode) as follows:

  1. If γj=, then Aj(t)=λ¯j(t) (deterministically).

  2. If γj<, then Aj(t) is a random variable with probability density function

    fAj(t)(x)=λ¯j(t)σ(γj)·x(2+γj)log(x+1)𝟙(xμ¯)+(1λ¯j(t)σ(1+γj)σ(γj))δ(x),(38)
    where 𝟙(·) denotes the indicator function, δ(·) is Dirac’s delta function, and σ(·) is as defined in (36).

  3. The Aj(t) for different j and t are independent.

We refer to {A(t):t[t0,t0+T)} as an episode-adjusted arrival process.

From (37), we obtain σ(γj)/σ(1+γj)μ¯λ¯j(t), where the last inequality is due to (30) and (35). Therefore, the coefficient of the delta function is nonnegative, and we have a well-defined distribution. It is also easy to verify that 0fAj(t)(x)dx=1. Furthermore, from (35), λj¯(t) is bounded above. It follows that each Aj(t) is dominated by a random variable A¯j with tail exponent γj, and consequently, the process also has tail exponent γj. Moreover, E[Aj(t)]=0xfAj(t)(x)dx=λ¯j(t). Therefore, using again (35), we have E[A(t)]λ*ϵ, for all t[t0,t0+T). In particular, the process A(·) belongs to the class Aϵ(γ;λ*) as desired.

8.5. Probabilistic Analysis

We aim to show that, during an episode and with significant probability, the queue vector process Q(·) stays close to a scaled version of the ϵ-JF trajectory, that is, that

Q(t)Tqϵ(tt0T),t[t0,t0+T).

To accomplish this, we consider each interval of the form [t0+ΘkT,t0+Θk+1T) for k=0,1,,n and show that there is a substantial probability that the arrival process we have defined has the following properties: (a) as long as k > 0, it has a jump in a small segment in the beginning of the interval, and (b) it has small fluctuations in the rest of the interval. We define events that capture these two properties.

For k=1,,n, let Bk be the cumulative arrival vector over the interval [t0+ΘkT,t0+ΘkT+dT), that is,6

Bk=t=t0+ΘkTt0+ΘkT+dT1A(t).(39)

We let Ekjump be the event that Bk emulates the jump in qϵ(·) at time Θk scaled by T, that is,

Ekjump={BkTakejkdT(1+2μ¯)},(40)
where jk is defined as the index of the queue at which the kth jump takes place and d is the constant defined in (33).

Lemma 8.

There exist ψ(0,1) and T¯1>0 such that, if TT1¯, then for k=1,,n and for the episode-adjusted arrival process over an episode [t0,t0+T), we have P(Ekjump)ψTγjklogT.

The proof is given in Appendix C.2.

Recall now that the function λ(·) driving the trajectory qϵ(·) is piecewise constant with a finite number of pieces. We use r to denote the number of such pieces. We then proceed to define certain small fluctuation events. For k=0,1,,n, we let Ekfluc be the event that the cumulative fluctuations of A(·) over the interval [t0+ΘkT+dT,t0+Θk+1T) are small, that is,

Ekfluc={τ=t0+ΘkT+dTt(A(τ)λ¯(τ))γcT32Cr,t[t0+ΘkT+dT,t0+Θk+1T)}.(41)

Lemma 9.

There exists a T¯20 such that, if TT¯2, then for k=0,1,,n and for the episode-adjusted process over an episode [t0,t0+T), we have P(Ekfluc)1/2.

The proof is given in Appendix C.3.

Note that each one of the events Ekjump for k=1,,n and Ekfluc for k=0,,n is determined by the arrival process during a particular interval and all of these intervals are disjoint. Thus, the independence assumption on the arrival process implies that all of these events are independent. Therefore, if Tmax{T¯1,T¯2}, then

P(E1jump,,Enjump,E0fluc,,Enfluc)=k=1nP(Ekjump)·k=0nP(Ekfluc)k=1nψTγjklogT·k=0n12=12(ψlogT2)nTk=1nγjk=12(ψlogT2)nTγTn12(ψ2)nTγTnlogT12(ψ2)1/γTγTnlogT12(ψ2)1/γT1logT,(42)
where the first inequality follows from Lemmas 8 and 9. The second inequality is because n1 and (without loss of generality) logT1. The third inequality is due to nγ=j=1njγj=1njγj1 so that n1/γ together with the fact ψ1 (see Lemma 8). The last inequality is again because γTn1.

8.6. E[Qm] is Large at the End of an Episode

We are now ready to argue that, if the events E1jump,,Enjump and E0fluc,,Enfluc occur, then, over an episode [t0,t0+T), the process Q(·) stays close to the suitably scaled ϵ-JF trajectory. As a consequence, the value of Qm at the end of the episode becomes of order cT, where c=qmϵ(1)>0 as in (32). This argument is entirely deterministic. It is carried out in the course of the proof of the next lemma (in Appendix C.4) and relies on the sensitivity bound in Theorem 4.

Lemma 10.

There exists a T¯30 such that, if TT¯3, then the following holds. If a sample path of the episode-adjusted process over the episode [t0,t0+T) satisfies the events E1jump,,Enjump and E0fluc,,Enfluc and if Q(t0)cT/5, then Qm(t0+T)cT/2.

Let ρ=(ψ/2)1/γ/4 and T¯=max{2,T¯1,T¯2,T¯3}, where T¯1,T¯2, and T¯3 are the constants in Lemmas 810, respectively. We note that all of these constants, T¯1,T¯2,T¯3, and therefore T¯ as well, are defined in terms of general parameters and properties of the particular ϵ-JF trajectory and are deterministic. Lemma 10 and (42) imply that, for an episode (t0,t0+T) with TT¯ and initialized so that Q(t0)cT/5, we have

P(Qm(t0+T)cT2)P(E1jump,,Enjump,E0fluc,,Enfluc)2ρT1logT,(43)
which implies that
E[Qm(t0+T)]cT2P(Qm(t0+T)cT2)cT2·2ρT1logT=ρclogT.(44)

8.7. Concatenating Episodes over the Entire Timeline

So far, we have defined and studied an arrival process over an episode [t0,t0+T). We now concatenate a sequence of such episodes of increasing duration, which defines an arrival process over an infinite timeline.

We define times T0,T1,T2,, and arrival processes for the intervals [Ti,Ti+1), recursively, as follows. We let T0=0 and T1=T¯, where T¯ is defined in the last paragraph of Section 8.6. We also let A(·) for t[T0,T1) be the corresponding episode-adjusted process as in Definition 8. Suppose now that we have defined Ti for some i1 as well as the arrival process for t[0,Ti). We then let

Ti+1=Ti+max{Ti,10E[Q(Ti)]c}.(45)

Finally, we define the arrival process over the episode [Ti,Ti+1) to be the corresponding episode-adjusted process. With this recursion, the arrival process is now well-defined for all times t0.

Note that E[Q(Ti)]E[t=0Ti1A(t)]=t=0Ti1E[A(t)]Ti(λ*+ϵ)<. This guarantees that all Ti are finite and we have an infinite number of episodes. Moreover, note that, for i=1,2,, we have Ti+12Ti. As a result, Ti2i1T¯, and also,

log(Ti+1Ti)logTiilog2,(46)
where the last inequality is because Ti2i1T¯2i.

We define the event

Bi={Q(Ti)(Ti+1Ti)c5},
and use the Markov inequality to obtain
P(Bi)1E[Q(Ti)](Ti+1Ti)c/51E[Q(Ti)](10E[Q(Ti)])/5=12,(47)
where the second inequality is due to (45).

Recall now that Inequality (44), which is about an episode of length T, makes use of the assumption Q(t0)cT/5, where t0 is the start time of the episode. According to (47), this assumption is satisfied at the start time of the episode [Ti,Ti1) with probability at least 1/2. By interpreting (44) as a statement about conditional expectations and with t0 and t0+T replaced by Ti and Ti+1, respectively, we obtain

E[Qm(Ti+1)]P(Bi)·E[Qm(Ti+1)|Bi]12·ρclog(Ti+1Ti)ρcilog22,
where the last inequality follows from (46). Therefore, E[Qm(Ti)] grows unbounded as i increases. Consequently, under the arrival process that we constructed, queue m is not delay stable. This conclusion is obtained for any positive choice of ϵ, no matter how small, and establishes that queue m is not RDS. This completes the proof of the second direction of Theorem 1.

9. Proof of Theorem 2

9.1. Proof of the First Direction

Let us fix some ϵ>0. To establish one direction of the result, we assume that the ϵ-JF(γ) condition holds for every γΓ. We show that there exists a special ϵ-Lyapunov function.

Let N be the set of all nonnegative integer vectors n such that nj = 0 for j > h; that is, we allow arbitrarily many jumps at the heavy-tailed queues and no jumps at the light-tailed ones. As in Definition 6, for any nonnegative integer vector n, let W(n) be the set of all points in R+ that are reachable by ϵ-JF(n) trajectories. Let W=nNW(n) and consider a Lyapunov function V(·) equal to the distance from W, that is, V(x)=d(x,W), for any xR+. We show that this Lyapunov function has the desired properties.

The distance function is clearly Lipschitz continuous with a Lipschitz constant equal to one, which implies the first property in the definition of special ϵ-Lyapunov functions.

For the second property, Lemma 4(b) applies and shows that each set W(n) is ϵ-invariant. It can be seen that the union, W, of the ϵ-invariant sets W(n) is also ϵ-invariant. It then follows from Lemma 4(a) that W is ϵ-attracting. This proves the second property in Definition 5.

Note that every nN satisfies the inequality γTn1 for some γΓ. Because the ϵ-JF(γ) condition holds for every γΓ, it follows that every ϵ-JF(n) trajectory with nN satisfies qm(t)=0 for all t. Hence, qm = 0 for all qW. Furthermore, because 0W, we have V(0)=0. This establishes the third property in Definition 5.

Finally, W is closed under jumps along coordinates associated with heavy-tailed arrivals. Therefore, V(·) is nonincreasing along those directions, and the fourth property in Definition 5 follows. Thus, V(·) has all the required properties of special ϵ-Lyapunov functions. This completes the proof of one direction of the theorem.

9.2. Proof of the Reverse Direction

We continue with the proof of the reverse direction. We fix some ϵ>0 and assume that there exists a special ϵ-Lyapunov function V(·) and let W={xR+|V(x)=0}. The argument rests on the ϵ-invariance of W, which, in turn, relies on some properties of the MW dynamics that we discuss next.

For any λ,xR+, let

ξλ(x)=q˙(0),(48)
where q(·) is the fluid trajectory corresponding to arrival rate λ and initialized with q(0)=x. In view of (9), we have ξλ(x)D¯λ(x). Moreover, it is shown in lemma 2(a) of Sharifnassab et al. (2019) that ξλ(x) has the minimum norm among all vectors in D¯λ(x), that is,
ξλ(x)=argminνD¯λ(x)ν,xR+.(49)

Here, the minimizer is unique.

Given a closed and convex set AR and a point xR, we denote by πA(x) the projection of x on A, defined as the point in A that is closest to x. With this terminology, ξλ(x) is the projection πD¯λ(x)(0) of the zero vector on the set D¯λ(x).

In what follows, we also make use of an elementary property of projections: if A is a closed convex set, b is some vector, and B=A+b, then

πA(x)πB(x)b.(50)

As a consequence of this, for any λ1,λ2, and x in R+,

ξλ1(x)ξλ2(x)=πD¯λ1(x)(0)πD¯λ2(x)(0)=πD¯λ1(x)(0)πD¯λ1(x)+λ2λ1(0)λ1λ2,(51)
where the second equality is because
D¯λ2(x)=λ2M¯(x)=λ1M¯(x)+λ2λ1=D¯λ1(x)+λ2λ1,
and the inequality follows from (50).

Lemma 11.

The set W is ϵ-invariant.

Proof.

Because V is a special ϵ-Lyapunov function, it is Lipschitz continuous with Lipschitz constant one. Let λR+ be such that λλ*ϵ and consider fluid trajectories q(·) and p(·) corresponding to arrival rates λ and λ*, respectively, initialized with the same nonnegative vector q(0)=p(0)W. Then,

V˙(q(t))|t=0=limδ0V(q(δ))V(q(0))δ=limδ0V(p(δ))+V(q(δ))V(p(δ))V(q(0))δlimδ0[V(p(δ))+q(δ)p(δ)]V(q(0))δ=limδ0(V(p(δ))V(p(0))δ+q(δ)p(δ)δ)limδ0(V(p(δ))V(p(0))δ+λλ*)=V˙(p(t))|t=0+λλ*ϵ+ϵ=0,(52)
where the first inequality is because V has a Lipschitz constant equal to one, the third equality is due to q(0)=p(0), and the second inequality follows from (51). The last inequality follows from the second property of special ϵ-Lyapunov functions in Definition 5 and the assumption λλ*ϵ.

This argument shows that, for any λR+ with λλ*ϵ and for any fluid trajectory q(·) corresponding to arrival rate λ, the distance from the set W cannot increase. In particular, if q(·) is initialized with q(0)W, it must stay in W. Therefore, using the terminology in Definition 7, W is ϵ-invariant. □

From the third property in the definition of special ϵ-Lyapunov functions, we have V(0)=0 and, therefore, 0W. Thus, every ϵ-JF(n) trajectory starts in W. From the fourth property in the definition of special ϵ-Lyapunov functions, W is closed with respect to positive jumps along the coordinates associated with heavy-tailed arrivals. Using also Lemma 11 for the times between jumps, we see that, for any nN, every ϵ-JF(n) trajectory stays in W. Equivalently, for any γΓ, every ϵ-JF(γ) trajectory stays in W. Finally, employing again the third property of special ϵ-Lyapunov functions, we have qm = 0 for all qW. This implies that qm(t)=0 for all ϵ-JF(γ) trajectories q(·) with γΓ and all t0. This establishes the ϵ-JF(γ) condition for all γΓ and completes the proof of Theorem 2.

9.3. Proof of Corollary 1

For part (a), fix some ϵ>0 and suppose that there exists a special ϵ-Lyapunov function. Theorem 2 implies that the ϵ-JF(γ) condition holds for every γΓ. It then follows from Theorem 1 that queue m is RDS for every γΓ.

For part (b), we fix some ϵ>0 and assume that queue m is ϵ-RDS for all γΓ. In particular, queue m is RDS, and Theorem 1 implies that the ϵ-JF(γ) condition holds for some ϵ>0. However, a close inspection of the proof of the reverse part of Theorem 1 reveals that we can in fact choose ϵ to be the same as ϵ. Thus, the ϵ-JF(γ) condition holds, and Theorem 2 implies that there exists a special ϵ-Lyapunov function.

10. Discussion

In this section, we summarize some key points and conclude with a few open questions.

10.1. Framing and Results

We have addressed the problem of delay stability for a class of queueing networks that operate under the max-weight scheduling policy when some arrival processes are heavy-tailed and some are light-tailed. The overall purpose was to develop conditions for delay stability in terms of fluid-like models. However, as illustrated by the example in Section 3, delay instability can be the result of multiple coordinated large jumps. The probabilities of such large jumps are, in turn, affected by the tail exponents of the arrival processes. Given that traditional fluid models are oblivious to the tail exponents, we introduce JF models, a generalization that allows for jumps along the coordinates associated with heavy-tailed flows subject to a budget on the number of jumps with the budget being determined by the tail exponents.

At the same time, it became clear that tight conditions for delay stability that do not depend on the details of the arrival distributions are only possible under a suitable robust formulation with respect to both the arrival rates and the arrival process distributions. With a careful choice of definitions, we were finally able to establish necessary and sufficient conditions for robust delay stability in terms of ϵ-JF models.

In Section 3, we also discuss a related, so-called ZF condition. The ZF condition essentially examines fluid trajectories that start at zero and involve a single jump and leads to a necessary condition for delay stability (Markakis et al. 2018), but the question whether it can also form the basis for a sufficient condition was open. Our results show that, in order to obtain necessary and sufficient conditions, we need to examine a richer set of trajectories that involve multiple jumps.

Finally, earlier works (Markakis 2013, Markakis et al. 2018) show that Lyapunov functions with certain structural properties can yield sufficient conditions for delay stability. But it was not clear if and when delay stability is equivalent to the existence of such Lyapunov functions. Our results make progress toward establishing the completeness of such a Lyapunov-based methodology for the regime in which the heavy-tailed flows can have arbitrarily small tail exponents.

Our RJF condition is difficult to test for general networks. In some sense, this reflects the intrinsic complexity of the (robust) delay stability problem. The Lyapunov-based condition also appears to be hard to test for general networks.

10.2. Alternative Formulations

Given the complexity of the RJF condition, it is natural to inquire about simpler alternatives. For example, is it possible to obtain tight delay stability results (without robustness) if we consider JF models with a constant rate λ(·)? In the same spirit, can we restrict to the case in which all jumps in ϵ-JF models take place at the same time? Might it be easier to consider concrete arrival processes instead of focusing on delay stability for all arrival processes with given exponents?

For all three of these questions, the answer is negative. We discuss such variations and related (counter)examples in Appendix A.

10.3. Open Problems

We collect here a few open problems and possible future research directions.

  1. Can the results be generalized to other scheduling policies, for example, MW-α policies (Neely 2010), an extension of the MW policy considered in this paper, or more generally to other stochastic networks whose stability has been studied using fluid models? One obstacle here is that our main result relies heavily on a particular fluctuation bound, which is established specifically for the MW dynamics (Sharifnassab et al. 2020). However, progress may be possible if we rely on alternative stochastic bounding techniques.

  2. Can we identify some special classes of networks for which our criteria (either the RJF condition or the Luyapunov-based condition) can be tested in polynomial time?

  3. Is the RJF condition, which involves time-varying λ(t) with λ(t)λ* equivalent to a similar condition in which we only consider ϵ-JF trajectories with time-invariant λ(t)=λ with λλ*? See Appendix A for further discussion and some conjectures.

Acknowledgments

We are grateful to Dr. Bert Zwart for providing us with pointers to relevant works in the literature and for informative technical discussions. Various stages of this work were completed while A. Sharifnassab was affiliated with the Massachusetts Institute of Technology, Sharif University of Technology, and the University of Alberta.

Appendix A. Exploring Alternative Formulations

Our formulation involves rate-robustness (robustness with respect to variations in the arrival rate) as well as distributional robustness (by considering the worst case over all distributions with given tail exponents). It would be preferable to develop conditions that characterize delay stability for specific systems (with fixed arrival rates and arrival distributions). However, this seems to be impossible for reasons that become clearer in this appendix. In particular, the distributional robustness aspect appears to be inevitable as long as we are aiming at conditions that are both necessary and sufficient; see Appendix A.4. For this reason, most of this appendix is devoted to exploring variants of rate robustness. In the interest of brevity, we keep the discussion informal without rigorous proofs.

A.1. Variations of Our Definitions

In this section, we present a number of variations to our definitions of RDS and the RJF condition. In later sections, we elaborate on their relations. Throughout this appendix, we assume that the tail exponent vector γ is fixed with every γj in (0,]. We also fix some λ*>0. The various definitions that we offer differ only with respect to the choice of allowed functions λ(·).

Let F be a class of discrete-time functions λ(·). We say that queue m is F-RDS if it obeys Definition 1 except that the allowed arrival rates E[A(t)] are also required to belong to F.

We consider the following choices for F, leading to three alternative definitions of robust delay stability, namely, G-RDS, C-RDS, and 0-RDS:

  • G (general): Here, we impose no additional restrictions on E[A(t)]. Thus, G-RDS is identical to the RDS condition that we study.

  • C (constant): Here, we require E[A(t)] to be constant. Effectively, we are considering small but constant perturbations of λ*.

  • 0 (zero): Here, we require E[A(t)] to be equal to λ* for all times t.

We continue similarly to define variants of the RJF condition. Let F be a class of continuous-time functions λ(·). The F-RJF condition is defined exactly as in Definition 4 except that we only consider ϵ-JF trajectories for which the rate function λ(·) is also required to belong to the class F. We consider four possible choices for F, leading to four variants of the RJF condition, namely, UC-RJF, PC-RJF, C-RJF, 0-RJF:

  • UC (uniformly continuous): Here, we remove the requirement in Definition 3 that λ(·) be piecewise constant. Instead, we require λ(·) to be (i) piecewise continuous with a finite number of discontinuities and (ii) uniformly continuous on any interval in which it is continuous.

  • PC (piecewise constant): Here, λ(·) is exactly as in Definition 3, and in particular, piecewise constant. Thus, the PC-RJF condition coincides with the RJF condition we are studying.

  • C (constant): Here, we require λ(·) to be constant. Effectively, we are considering small but constant perturbations of λ*.

  • 0 (zero): Here, we require λ(·) to be equal to λ* for all times t.

A.2. Relations Between Alternative Definitions

In this section, we explore the relation between F-RDS and F-RJF conditions for different choices of F; see Figure A.1 for a visual summary.

Figure A.1. Relation Between the Various Conditions

It is clear that, when we restrict to a smaller class, the RDS or RJF conditions are easier to satisfy. Thus,

G-RDSC-RDS0-RDS,
and
UC-RJFPC-RJFC-RJF0-RJF.

Furthermore, Theorem 1 establishes that G-RDS is equivalent to PC-RJF.

A.2.1. PC-RJF UC-RJF.

An arbitrary (continuous-time) function λ(·) in the class UC can be approximated by a piecewise constant function with finitely many pieces uniformly over a compact set. Furthermore, it can be shown that, if we perturb by ϵ the vector λ(·) that drives a JF-trajectory, the resulting trajectory is perturbed by at most ϵ over a time interval of length one. It follows that, if the UC-RJF condition fails, we can construct piecewise-constant approximations of λ(·) that demonstrate that the PC-RJF condition also fails. Therefore, PC-RJFUC-RJF.

Taking Theorem 1 also into account, we see that all three conditions, G-RDS, PC-RJF, and UC-RJF, are equivalent. An alternative path to the same conclusion consists of modifying the proof in Section 8 and showing that G-RDSUC-RJF. This is possible, but quite tedious.

A.2.2. (C-RDS C-RJF) and (0-RDS 0-RJF).

These two implications are true because the proof in Section 8 applies verbatim. Indeed, if we assume that C-RJF fails to hold, we start with a trajectory q(·) that is driven by a constant rate λ and drives queue m to a positive value. The construction of the arrival process in Section 8.4 yields a process with a constant rate λ¯. Thus, the same proof establishes that failure of the C-RJF condition leads to failure of C-RDS; equivalently, C-RDS implies C-RJF. The argument that 0-RDS0-RJF is the same.

A.2.3. 0-RJF 0-RDS.

This fact exemplifies the difficulty of obtaining necessary and sufficient conditions in the absence of robustness considerations with respect to the arrival rates.

The argument is simple. Consider a single queue that is served at unit rate and let λ*=1. Suppose that the tail exponent is larger than one so that no jumps are allowed. In that case, there is only one possible JF trajectory, which obeys q˙=11=0. When initialized at zero, the JF trajectory stays at zero. Thus, the 0-RJF condition holds for the single queue of interest. On the other hand, as long as the arrivals are not deterministic, the stochastic system is marginally unstable, the expected queue length grows to infinity, and the 0-RDS condition does not hold.

This example involves a system operating at the boundary of its capacity region (marginally unstable). We can also construct simple examples (involving two queues) in which the system operates in the interior of the stability region, is stable and satisfies the 0-RJF condition but is not 0-RDS. Such an example (which we omit) involves a system that operates at the threshold between robust delay stability and instability.

A.2.4. 0-RJF C-RJF.

This is again a simple observation. Consider the same single-queue system as in the previous paragraph with λ*=1. As long as the rate is fixed at one, the JF trajectory stays at zero, and the 0-RJF condition holds. On the other hand, a small constant perturbation that results in λ>1 yields a divergent JF trajectory, and therefore, the C-RJF condition does not hold.

A.3. Conjectures and Open Problems

We list here a number of questions and conjectures.

A.3.1. 0-RDS ? C-RDS.

We conjecture that, when λ*0,7 0-RDS implies C-RDS. Ultimately, this amounts to showing that the set of positive arrival rate vectors λ* for which the system is delay stable (robustly over all distributions with given tail exponents) is open. The rationale behind this conjecture is that, in more standard settings (ordinary stability) the set of positive vectors λ* that lead to a stable system is open.

A.3.2. C-RJF ? PC-RJF.

We conjecture that this implication is true although we do not see how to establish it. If it is true, it would follow from the diagram in Figure A.1 that C-RJF and C-RDS are equivalent to G-RDS, UC-RJF, and PC-RJF.

An indirect approach to establishing the conjecture is to show that (i) C-RJF C-RDS and (ii) C-RDS G-RDS. However, this appears to be difficult. Our proof that PC-RJF implies G-RDS involves the set W of points reachable by ϵ-JF trajectories; see Definition 6. However, when we restrict λ(·) to be constant, this set is no longer ϵ-invariant, and Lemma 4(ii) fails to go through.

A.3.3. Generic Considerations.

A fundamental reason behind the mismatch between 0-RJF and G-RDS is that, at least for simple examples, the set of nonnegative nominal rates λ* for which 0-RJF holds is closed, whereas the set of positive nominal rates λ* for which G-RDS holds is open. It is conceivable, however, that one set is the closure of the other and the difference between the two sets is just a lower dimensional boundary. This leads us to the conjecture that 0-RJF and G-RDS are generically equivalent.

Hypothesis A.1.

Let us fix a network and some γ. The set of nonnegative nominal arrival vectors λ* for which the 0-RJF condition holds but G-RDS does not hold has zero Lebesgue measure.

A.4. The Details of the Arrival Distribution May Matter

Our discussion so far has been about distributionally robust results, dealing with delay stability for all arrival distributions with the given tail exponents γ. The reason for this was that JF models cannot take into account any further properties of these distributions.

Once we start inquiring about delay stability for a fixed, fully specified system, the situation is more complex: necessary and sufficient conditions for delay stability appear to be impossible. We illustrate the situation by stating a positive result and disciussing the obstacles in establishing a converse.

A.4.1. Delay Stability Implies the 0-RJF Condition Under a Regularity Assumption.

Suppose that a particular system (with a constant arrival rate λ* and given, i.i.d. arrival distributions) is delay stable. Suppose, furthermore, that the distribution of each Aj(t) satisfies (3) with γ replaced by the appropriate γj. Then, it can be shown that the 0-RJF condition holds. The argument involves similar ideas as the proof in Section 8. That is, we can show that the stochastic system can track an ϵ-JF trajectory with significant probability.

Note, however, that the 0-RJF condition does not imply delay stability,even under such a regularity assumption. The argument is the same as in our earlier example that showed that the 0-RJF condition does not imply the 0-RDS condition.

A.4.2. Without a Regularity Assumption, Delay Stability Need Not Imply the 0-RJF Condition.

In contrast to the aforementioned result, we have strong reasons to conjecture that there exist systems that are delay stable, and yet, the 0-RJF condition fails to hold. The intuition behind this conjecture is as follows.

Consider a system with two heavy-tailed arrival streams together with some light-tailed ones. Suppose that the tail exponents of the heavy-tailed arrivals are larger than 1/3. We can arrange the system so that a JF trajectory drives the light-tailed queue of interest to a positive value if and only if we have one jump at each heavy-tailed queue, the two jump times are approximately equal, and the jump sizes are comparable (within a constant factor of each other). Such a system does not satisfy the 0-RJF condition.

As in the proof in Section 8, we might expect that the stochastic system can track this JF trajectory. However, we can arrange the arrival process distributions for the two heavy-tailed queues to be such that their supports are wide apart. For example, one distribution may be supported on integers of the form 102i and the other on integers of the form 102i+1. In that case, equal-size jumps are essentially impossible. As a consequence, the stochastic system should be unable to emulate the JF trajectory, the instability mechanism suggested by the JF trajectory need not be present, and the queue of interest may turn out to be delay stable.

A.5. The Timing of the Jumps

The definition of ϵ-JF trajectories allows for jumps at different times. On the other hand, our examples so far rely on jumps that happen simultaneously. This raises the question whether the RJF condition is equivalent to an analogous condition in which we only consider trajectories with simultaneous jumps. It turns out that this is not possible. We give an example with four queues in which an ϵ-JF trajectory drives a certain queue to a positive value, but this is only possible if we allow jumps to occur at different times.

Consider the system in Figure A.2. The first queue receives heavy-tailed arrivals with γ1=1/2, whereas the three other queues receive light-tailed arrivals. There are three possible service vectors as shown in the figure, and the arrival rate vector is λ*=(1,2,1,1). Note that the condition γTn1 allows up to two jumps at queue 1. If we restrict to simultaneous jumps, this essentially limits us to a single jump at queue 1.

Figure A.2. A Network with Three Light-Tailed Queues and One Heavy-Tailed Queue, Which Demonstrates the Importance of the Timing of Multiple Jumps
Notes. In the example of this figure, if q1 undergoes a single jump at time 0, q4 stays at zero. However, two jumps in q1 at suitably arranged times can result in a positive q4.

Suppose that q1 has a jump of size 27 at time 0. Then, the 0-JF trajectory is piecewise linear with breakpoints q(0)=(27,0,0,0),q(3)=(6,6,0,0),q(5)=(0,2,2,0), and q(9)=0. This can be easily verified by noticing that the MW policy chooses service vector μ1 for t[0,3) and service vector μ2 for t(3,5), and the service capacity is split between μ2 and μ3 with ratios 5/8 and 3/8 for t(5,9). It then follows from the form of this piecewise linear fluid trajectory that, given a single jump at time 0, q4 stays at zero for all subsequent times. We now argue that this is not the case if q1 undergoes two jumps at different times.

Suppose that q1 has a jump of size 27 at time 0 and a jump of size 2 at time 5. Let p(·) be the associated jumping fluid trajectory. Then, right before time 5, we have p(5)=q(5)=(0,2,2,0). Therefore, after the second jump, we have p(5)=(2,2,2,0). In this case, μ3 is the dominant service vector for some positive time interval starting from time 5. Because q4 receives no service under μ3, it starts to build up and become positive.

In this example, the 0-RJF condition fails to hold, and the system is not 0-RDS. On the other hand, if we were to restrict to simultaneous jumps, we would not be able to tell that this is the case. Finally, using the same example, we see that we should also consider nonsimultaneous jumps when examining the C-RJF or PC-RJF conditions.

Appendix B. Proofs of Lemmas for the First Direction of Theorem 1 (ϵ-JF RDS)

B.1. Proof of Theorem 4

We compare the fluid trajectory q(·), which is initialized with q(0)Q(0), with another fluid trajectory q˜(·), initialized with q˜(0)=Q(0). From the triangle inequality,

Q(t)q(t)Q(t)q˜(t)+q˜(t)q(t).

We apply Theorem 3 to bound the first term on the right-hand side. Because of the nonexpansive property of the MW dynamics, we also have

q˜(t)q(t)|q˜(0)q(0)=Q(0)q(0),
and the result follows.

B.2. Proof of Lemma 1

Let us fix T throughout this proof. Recall the constant β(1,2) introduced in the context of (19). For j=1,,, we define

γj={γj/βif γj<β3,β2if γjβ3.

We then let γ=(γ1,,γ), and γ=minjγj. Note that, in all cases, we have γjγj/β so that γγ/β. As argued in Claim 1, we can and do (without loss of generality) assume that γ=minjγj1, and thus, γ<1. Finally, in view of (19), it can be seen that, for any nonnegative integer vector n,

ifγTn>1,then(γ)Tnβ2>1.(B.1)

For every j and according to our definition (5) of the tail exponent of an arrival process, there is a random variable A¯j that dominates Aj(t) for all t0 and for which all moments of order less than 1+γj are finite; see (3). We define

Γj=E[A¯j1+γj],j=1,,,(B.2)
which is finite because γj<γj.

For t=0,,T1, and j=1,,, let

pj,t=P(Aj(t)>θt).

For any j and tT1, the Markov inequality yields

pj,t=P(Aj(t)>θt)P(A¯j>θt)=P(A¯j1+γj>θt1+γj)E[A¯j1+γj]θt1+γj=Γjη1+γjlog1+γj(M+Tt)(M+Tt)1+γj,(B.3)
where the last equality is due to the definitions of θt and Γj in (11) and (B.2), respectively.

Let ϕ=121/ and note that 0<ϕ<1. Because 0<(11/β)γ<γj for every j, there exists some M11 such that, if MM1, then

Γjη1+γjlog1+γjMϕ·(γj(11/β)γ)·M(11/β)γ,j=1,,.(B.4)

We fix such an M1. Then, for MM1, we have

t=0T1pj,tΓjη1+γjt=0T1log1+γj(M+Tt)(M+Tt)1+γjΓjη1+γjτ=1log1+γj(M+τ)(M+τ)1+γjϕ(γj(11/β)γ)τ=1(M+τ)(11/β)γ(M+τ)1+γjϕ(γj(11/β)γ)Mx(11/β)γx1+γjdx=ϕM(11/β)γγj,(B.5)
where the first inequality is due to (B.3) and the third inequality follows from (B.4).

Let Xj,t be the event {Aj(t)>θt}. Recall that Nj stands for the number of jumps at queue j during the interval [0,T). For any j and any nonnegative integer n, we have

P(Nj=n)0τ1<<τnT1P(Xj,τ1Xj,τn)=0τ1<<τnT1pj,τ1pj,τn(t=0T1pj,t)nϕnM(11/β)γnγjn.(B.6)

Here, the first inequality follows from the union bound, the equality is from the independence of the events Xj,τ1,,Xj,τn, and the last inequality is due to (B.5). Therefore, for any nZ+,

P(N=n)=j=1P(Nj=nj)j=1ϕnjM(11/β)γnjMγjnj=ϕ|n|M(11/β)γ|n|nTγ,(B.7)
where |n|=n1++n. If γTn>1, then (B.1) asserts that nTγβ2. As a result and because γ is the smallest component of γ,
(11β)γ|n|nTγ(11β)nTγnTγ=nTγββ.(B.8)

Hence,

P(γTN>1)=n:γTn>1P(N=n)n:γTn>1ϕ|n|M(11/β)γ|n|nTγMβn:γTn>1ϕ|n|Mβ((n1,,n0ϕ|n|)1)=Mβ((j=1nj=0ϕnj)1)=Mβ(1(1ϕ)1)=Mβ,(B.9)
where the first inequality is from (B.7) and the second inequality follows from (B.8). The last equality is because 1ϕ=21/. This completes the proof of Lemma 1.

B.3. Proof of Lemma 2

The Bernstein inequality (see, e.g., (1.21) in appendix 1 of Anthony and Bartlett 2009) asserts that, for any z0, we have

P(|YE[Y]|>z)2exp(z2/2i=1nE[(XiE[Xi])2]+bz/3).(B.10)

We note that E[(XiE[Xi])2]E[Xi2]E[Xib]bλ¯. Plugging this into (B.10), we obtain

P(|YE[Y]|>z)2exp(z2/2nbλ¯+bz/3).(B.11)

This implies (20) and completes the proof of Lemma 2.

B.4. Proof of Lemma 3

For every j, and according to our definition (5) of the tail exponent of an arrival process, there is a random variable A¯j that dominates Aj(t) for all t0 and for which all moments of order less than 1+γj are finite; see (3). Because γj>0, we have 0P(A¯j>x)dx=E[A¯j]<. Therefore, there exists a constant M2>0 such that, for any M>M2 and every j,

M/ηlogMP(A¯j>x)dxγϵ60C.(B.12)

Note that the choice of M2 is independent of T. For the rest of the proof, we fix M2 and assume that M>M2.

Recall that A*(τ)=min{A(τ),θτ} so that

Aj(τ)Aj*(τ)=max{0,Aj(τ)θτ}.(B.13)

Therefore,

E[Aj(τ)]E[Aj*(τ)]=E[max{0,Aj(τ)θτ}]=0P(max{0,Aj(τ)θτ}>x)dx=0P(Aj(τ)θτ>x)dx=θτP(Aj(τ)>x)dxM/ηlogMP(Aj(τ)>x)dxM/ηlogMP(A¯j>x)dxγϵ60C,(B.14)
where the fourth equality uses a change of variables from x+θτ to x, the first inequality uses the fact θτM/ηlogM for all τ0 (see the definition (11) of θt), the second inequality is because A¯j dominates Aj(τ), and the last inequality follows from (B.12). Therefore, for any τ0,
E[A(τ)]E[A*(τ)]maxj=1,,{E[Aj(τ)]E[Aj*(τ)]}γϵ60C.(B.15)

We now introduce some “simpler” events whose occurrence are shown to imply the event Efluc(T,M) that is introduced in (21):

E*={τ=t0t(A*(τ)E[A*(τ)])γϵ30C(M+Tt0),for 0t0t<T},(B.16)
and for every j,
Ej*={|τ=t0t(Aj*(τ)E[Aj*(τ)])|γϵ30C(M+Tt0),for 0t0t<T},(B.17)
and we observe that the events E1*,,E* imply the event E*. Then, the union bound, applied to the complements of these events, implies that
P(E*)1j=1(1P(Ej*)).(B.18)

We now argue that the event E* implies the event Efluc(T,M) defined in (21). Indeed, suppose that event E* occurs. Then, as long as 0t0t<T, we obtain

τ=t0t(A*(τ)λ*)=τ=t0t(A*(τ)E[A*(τ)])+τ=t0t(E[A*(τ)]E[A(τ)])+τ=t0t(E[A(τ)]λ*)τ=t0t(A*(τ)E[A*(τ)])+τ=t0tE[A*(τ)]E[A(τ)]+τ=t0tE[A(τ)]λ*γϵ30C(M+Tt0)+γϵ60C(tt0)+γϵ20C(tt0)γϵ10C(M+Tt0),(B.19)
where, in the second inequality, we use the definition of the event E* to bound the first term, (B.15) to bound the second term, and (14) to bound the third term. The last inequality is because t < T. Thus, the event E* implies the event Efluc(T,M), and
P(Efluc(T,M))P(E*)1j=1(1P(Ej*)),(B.20)
where the second inequality follows from (B.18).

To complete the proof, we derive an upper bound for 1P(Ej*). As a first step, we obtain a relation between various constants, which reflects the fact that η is chosen large enough. We have

γ2ϵ2η2(30C)2μ¯+20γϵCγ2ϵ2η1800C22μ¯+20μ¯Cγ2ϵ21820C22μ¯·η=γ2ϵ21820C22μ¯·8000C22μ¯γ2ϵ2>4,(B.21)
where the first inequality is due to the assumptions γ1 and the fact ϵ<μ¯, which is evident from the definition (16) of μ¯; the second inequality is because C1, and the equality follows from the definition of η in (17).

Finally, we note that, for any τ0 and every j,

E[Aj*(τ)]E[Aj(τ)]E[A(τ)]λ*+ϵμ¯.(B.22)

For any t0 and t with 0t0t<T, using the fact that Aj*(τ) is bounded above by θτ, we have

P(|τ=t0t(Aj*(τ)E[Aj*(τ)])|>γϵ30C(M+Tt0))2exp((γϵ30C(M+Tt0))22M+Tt0ηlog(M+Tt0)·(μ¯(tt0+1)+γϵ90C(M+Tt0)))2exp(γ2ϵ2ηlog(M+Tt0)2(30C)2μ¯+20γϵC)2exp(4log(M+tt0))=2(M+Tt0)4,(B.23)
where the first inequality follows from Lemma 2 with z=(γϵ/30C)(M+Tt0),b=(M+Tt0)/ηlog(M+Tt0)=θt0θτ,λ¯=μ¯ (see (B.22)), and n=tt0+1; the second inequality is because tt0+1<M+Tt0, and the third inequality is due to (B.21). Therefore, for every j,
1P(Ej*)t0=0T1t=t0T1P(|τ=t0t(Aj*(τ)E[Aj*(τ)])|>γϵ30C(M+Tt0))t0=0T1t=t0T12(M+Tt0)4=t0=0T12(Tt0)(M+Tt0)4t0=0T12(M+Tt0)3τ=12(M+τ)302(M+x)3dx=M2,(B.24)
where the first inequality is from the union bound and the second inequality is due to (B.23). Plugging (B.24) into (B.20), we obtain P(Efluc(T,M))1M2. This completes the proof of Lemma 3.

B.5. Proof of Lemma 4

We start with the proof of the first part and assume that W is ϵ-invariant. We prove the result under the additional assumption that the set W is closed. This is without loss of generality for the following reason. Given an ϵ-invariant set W, let W¯ be its closure. Because fluid trajectories under the MW policy are continuous functions of their initial conditions, it is not hard to see that W¯ is also ϵ-invariant. Once we show the result for closed sets, we have established that W¯ is ϵ-attracting. Finally, because d(x,W)=d(x,W¯) for all x, we can conclude that W is also ϵ-attracting.

Having assumed that W is closed, we now consider a fluid trajectory q(·) initialized with q(0)=q00 with q0W. Then, there exists some x0W, which is closest to q0. Let x(·) be a fluid trajectory initialized at x(0)=x0 and corresponding to the rate vector

λ=λ*+ϵq0x0q0x0.(B.25)

Because W is ϵ-invariant and λλ*=ϵ, we have x(t)W for all t0. In particular, d(q(t),W)q(t)x(t) for all t0. Furthermore, equality holds at time t = 0. This implies that

ddtd(q(t),W)|t=0ddtq(t)x(t)|t=0.(B.26)

From the fluid equations (see Definition 2), we have x˙(0)D¯λ(x0)=λM¯(x0). Equivalently, there exist coefficients αμ0 for μM(x0) such that μM(x0)αμ=1 and

x˙(0)=λμM(x0)αμμ.(B.27)

Similarly, let βν0 for νM(q0) be a set of coefficients such that νM(q0)βν=1 and

q˙(0)=λ*νM(q0)βνν.(B.28)

For any μM(x0) and any νM(q0) because, by definition, μ is a maximizer of uTx0 over all uM, we have (μν)Tx00. Similarly, (νμ)Tq00. Combining these two inequalities, we obtain

(μν)T(x0q0)=(μν)Tx0+(νμ)Tq00,μM(x0),νM(q0).(B.29)

Because μM(x0)αμ=νM(q0)βν=1, it follows that

(μM(x0)αμμνM(q0)βνν)T(x0q0)=μM(x0)νM(q0)αμβν(μν)T(x0q0)0.(B.30)

Using (B.26), we have

ddtd(q(t),W)|t=0ddtq(t)x(t)|t=0=(x˙(0)q˙(0))T(x0q0)x0q0=1x0q0(λμM(x0)αμμλ*+νM(q0)βνν)T(x0q0)=(λλ*)T(x0q0)x0q0(μM(x0)αμμνM(q0)βνν)Tx0q0x0q0(λλ*)T(x0q0)x0q0=(ϵ(q0x0)x0q0)T(x0q0)x0q0=ϵ.(B.31)

Here, the second equality follows from (B.27) and (B.28), the second inequality is due to (B.30), and the fourth equality uses the definition of λ in (B.25). Thus, W is ϵ-attracting.

For the proof of the second part, we fix some n and consider the set W(n) as in Definition 6. Let x be an element of W(n). Then, there exists an ϵ-JF(n) trajectory x(·) that reaches x. Consider now a fluid trajectory q(·), corresponding to λ, for some λ with λλ*ϵ and initialized at x. The concatenation of the trajectories x(·) and q(·) is an ϵ-JF(n) trajectory, and therefore, any point that it can reach also belongs to W(n). Thus, W(n) is ϵ-invariant as claimed.

B.6. Proof of Lemma 5

As in the statement of the lemma, we fix some MM3 and times that satisfy 0t0<t1T. We also fix a sample path under which the event Efluc(T,M) occurs, and the interval (t0, t1) is jump-free.

Let q(·) be a fluid trajectory corresponding to the arrival rate λ* and initialized with q(t0+1)=Q(t0+1). Then,

Q(t1)q(t1)C(1+λ*+maxt(t0,t1)τ=t0+1t(A(τ)λ*))=C(1+λ*+maxt(t0,t1)τ=t0+1t(A*(τ)λ*))C(1+λ*+γϵ10C(M+Tt0))=γϵ10(M+Tt0)+(λ*+1)C,(B.32)
where the first inequality follows from Theorem 4, the first equality is because (t0, t1) is jump-free, and as a result, A*(τ)=A(τ) for all τ(t0,t1), and the second inequality is because of the occurrence of the event Efluc(T,M).8

Moreover, from Lemma 4, W(t0+1) is ϵ-attracting. Furthermore, because (t0, t1) is a jump-free interval, we have W(t1)=W(t0+1). Therefore,

d(q(t1),W(t1))=d(q(t1),W(t0+1))max{0,d(q(t0+1),W(t0+1))(t1t01)ϵ},(B.33)
where the inequality rests on (23).

We now consider the effect of possible jumps at time t0. Let k0 be the number of jumps that occur at time t0 in different entries of A(t0). Without loss of generality, suppose that A1,,Ak undergo a jump at time t0. Let J be an -dimensional vector with the first k entries equal to A1(t0),,Ak(t0) and all other entries equal to zero. Then, A(t0)J is an -dimensional vector whose first k entries are zero; all other entries are jump-free and are, therefore, bounded by θt0=(M+Tt0)/(ηlog(M+Tt0)). Thus,

A(t0)J(M+Tt0)ηlog(M+Tt0).(B.34)

A key consequence of our definition of the sets W(t) is that if t0 is a jump time, then some components of the vector N(t0) are larger than those of N(t01) so that the set W(t0+1) is larger than the set W(t0). In particular, if xW(t0), then x+JW(t0+1). Now, let x be a point in the closure of W(t0) that is closest to Q(t0), that is,

xargminyclosure(W(t0))yQ(t0).

Because xclosure(W(t0)), it follows that x+Jclosure(W(t0+1)). Therefore,

d(q(t0+1),W(t0+1))d(q(t0+1),x+J)=d([Q(t0)μ(t0)]++A(t0),x+J)d([Q(t0)μ(t0)]++A(t0),Q(t0)+A(t0))+d(Q(t0)+A(t0),Q(t0)+J)+d(Q(t0)+J,x+J)=d([Q(t0)μ(t0)]+,Q(t0))+d(A(t0),J)+d(Q(t0),x)μ(t0)+d(A(t0),J)+d(Q(t0),x)μ¯+(M+Tt0)ηlog(M+Tt0)+d(Q(t0),x)=μ¯+(M+Tt0)ηlog(M+Tt0)+d(Q(t0),W(t0)),(B.35)
where the first inequality is because x+Jclosure(W(t0+1)), the first equality is from the evolution formula for the MW dynamics in (1) and the initialization q(t0+1)=Q(t0+1) for q(·), the last inequality follows from (B.34), and the last equality is from the definition of x.

Combining (B.32), (B.33), and (B.35), we obtain, for MM3,

d(Q(t1),W(t1))d(Q(t1),q(t1))+d(q(t1),W(t1))γϵ(M+Tt0)10+(λ*+1)C+d(q(t1),W(t1))γϵ(M+Tt0)10+(λ*+1)C+max{0,d(q(t0+1),W(t0+1))(t1t01)ϵ}γϵ(M+Tt0)10+(λ*+1)C+max{0,[d(Q(t0),W(t0))+μ¯+(M+Tt0)ηlog(M+Tt0)](t1t01)ϵ}max{0,d(Q(t0),W(t0))(t1t01)ϵ}+μ¯+(M+Tt0)ηlog(M+Tt0)+γϵ(M+Tt0)10+(λ*+1)Cmax{0,d(Q(t0),W(t0))(t1t0)ϵ}+ϵ+(M+Tt0)ηlog(M+Tt0)+γϵ(M+Tt0)10+(λ*+1)C+μ¯max{0,d(Q(t0),W(t0))(t1t0)ϵ}+γϵ(M+Tt0)6,(B.36)
where the second inequality is from (B.32), the third inequality is due to (B.33), the fourth inequality follows from (B.35), and the last inequality is because M has been chosen large enough as in (25). This completes the proof of Lemma 5.

B.7. Proof of Lemma 6

Let N(t) be the cardinality of N(t), that is, the number of jumps up to time t. We also use the convention N(1)=0. When the sample path is such that the event Ejump(T,M) occurs, then N(t)1/γ for all t < T. Thus, for any t[0,T),

(N(t)+2)γϵ3(1/γ+2)γϵ3=(1+2γ)ϵ3ϵ,(B.37)
where we have used our assumption that γ1.

We fix a sample path of the arrival process A(·) under which both events Ejump(T,M) and Efluc(T,M) occur. We use strong induction to prove that, for any t[0,T],

d(Q(t),W(t))(N(t1)+2)γϵ6(M+Tt).(B.38)

To establish the base case of the induction, we use the assumption Q(0)=0 and the fact that 0W(0) (because ϵ-JF trajectories are zero for negative times). Thus, d(Q(0),W(0))=0, and therefore, (B.38) holds for t = 0.

For the induction step, we consider a time t1(0,T] and assume that (B.38) holds for all t<t1. We show that (B.38) also holds for time t1. Let

t0=max{0,T(2(Tt1)+M)}=max{0,2t1TM}.(B.39)

Note that either t0=0<t1 or t0=2t1TMt1M<t1 so that we always have t0<t1. We consider two cases.

Case B.1.

For the first case, we assume that the interval (t0,t1) is jump-free.9 We consider two subcases. If t0=0, then

d(Q(t1),W(t1))max{0,d(Q(t0),W(t0))(t1t0)ϵ}+ϵγ6(M+Tt0)=ϵγ6(M+Tt0)ϵγ6(M+T(2t1TM))=2ϵγ6(M+Tt1),(B.40)
where the first inequality is due to Lemma 5 and the first equality is because of the assumptions t0=0 and Q(0)=0 together with the observation that 0W(0). The second inequality follows from (B.39), which implies that 0=t02t1TM. In particular, (B.38) holds for t = t1.

For the second subcase, we assume that t0>0, in which case, t0=2t1TM. Then,

d(Q(t1),W(t1))max{0,d(Q(t0),W(t0))(t1t0)ϵ}+ϵγ6(M+Tt0)max{0,(N(t01)+2)γϵ6(M+Tt0)(t1t0)ϵ}+ϵγ6(M+Tt0)=max{0,(N(t01)+2)γϵ6(2M+2T2t1)(M+Tt1)ϵ}+ϵγ6(2M+2T2t1)=max{0,((N(t01)+2)γϵ3ϵ)(M+Tt1)}+2ϵγ6(M+Tt1)=2ϵγ6(M+Tt1),(B.41)
where the first inequality follows from Lemma 5, the second inequality is due to the induction hypothesis (B.38), the first equality uses the substitution t0=2t1TM, and the last equality is due to (B.37). Thus, (B.38) again holds for t=t1. This completes the induction step for Case B.1.

Case B.2.

For the second case, we assume that there is a jump in the interval (t0,t1). Let t^0 be the last jump time in the interval (t0,t1) so that (t^0,t1) is jump-free. Because t^0 is the last jump time, we have

N(t11)=N(t^0)N(t^01)+1,(B.42)
where the inequality is because there is at least one jump at t^0 (we say “at least” because, at time t^0, we could have jumps at multiple components of A(·). Consequently,
d(Q(t1),W(t1))max{0,d(Q(t^0),W(t^0))(t1t^0)ϵ}+ϵγ6(M+Tt^0)max{0,(N(t^01)+2)γϵ6(M+Tt^0)(t1t^0)ϵ}+ϵγ6(M+Tt^0)=max{ϵγ6(M+Tt^0),(N(t^01)+3)γϵ6(M+Tt^0)(t1t^0)ϵ}max(ϵγ6(M+Tt^0),(N(t11)+2)γϵ6(M+Tt^0)(t1t^0)ϵ)max{ϵγ6(2M+2T2t1),(N(t11)+2)γϵ6(M+Tt1)+((N(t11)+2)γϵ6ϵ)(t1t^0)}max{2ϵγ6(M+Tt1),(N(t11)+2)γϵ6(M+Tt1)}=(N(t11)+2)γϵ6(M+Tt1),(B.43)
where the first inequality follows from Lemma 5 and the assumption that (t^0,t1) is jump-free, the second inequality is due to the induction hypothesis (B.38), the third inequality is from (B.42), the fourth inequality is because t^0t02t1TM, and the last inequality is due to (B.37). Therefore, the induction step goes through for Case B.2 as well. This completes the proof of the induction and implies (B.38) for all t[0,T].

Finally, letting t = T, (B.38) becomes

d(Q(T),W(T))(N(T1)+2)γϵ6MMϵ2,(B.44)
where the last inequality is due to (B.37). This completes the proof of Lemma 6.

Appendix C. Proof of Lemmas for the Second Direction of Theorem 1: RDSϵ-JF Condition

C.1. Proof of Lemma 7

Suppose that the ϵ-JF(γ) condition fails to hold. Then, there exists a nonnegative integer vector n with γTn1, an ϵ-JF(n) trajectory q(·), and some time T such that qm(T)>0. If T = 0, we can use right-continuity of trajectories to see that, without loss of generality, we can take T to be positive. We then define a new trajectory q(·) by letting q(t)=q(tT)/T for all t0. It is not hard to verify that q(·) is also an ϵ-JF(n) trajectory, and satisfies qm(1)=qm(T)/T>0 so that the first property is satisfied.

Suppose now that some of the jumps of q(·) happen after time 1. Let n be the vector that counts the number of jumps that take place until time 1. Starting with q(·), we eliminate the jumps that happen after time 1 to obtain an ϵ-JF(n) trajectory with γTnγTn1, all jumps in [0,1], and qm(1)>0. By slightly perturbing the jump times and using a continuity argument, we can ensure that no two components have simultaneous jumps and also that all jump times belong to (0, 1) so that properties (b) and (c) in Lemma 7 are satisfied.

Finally, we can replace the arrival rates λj(t) that drive the ϵ-JF trajectory by max{λj(t),ϵ}, where ϵ<ϵ is a small positive constant. This ensures that inftλj(t)>0. Furthermore, using a continuity argument and as long as ϵ is small enough, the property qm(1)>0 is preserved. This proves condition (d) in Lemma 7 and concludes the proof of the lemma.

C.2. Proof of Lemma 8

For simplicity, and without loss of generality, we present the proof for the case in which t0=0. We let T¯1 be a large enough constant such that, for all TT¯1 and all k, we have

(akd)Tmax{μ¯,T}.(C.1)

This is possible because, according to the definition of d in (33), we have minkak>d.

We fix some TT¯1 as well as some k{1,,n}. We aim to show that the process has a substantial probability of a jump of size approximately akT during the interval [ΘkT,(Θk+d)T). Within the proof of this lemma, we use the symbol j (instead of jk) to denote the index of the queue at which the kth jump took place.

From (C.1), we have log((akd)T)(1/2)logT. We then obtain, for any t, any j, and any x[(akd)T,akT],

fAj(t)(x)=λ¯j(t)σ(γj)·x(2+γj)log(x+1)𝟙(xμ¯)=λ¯j(t)σ(γj)·x(2+γj)log(x+1)λ¯j(t)σ(γj)·(akT)(2+γj)log((akd)T)ζT(2+γj)logT,(C.2)
where the second equality is due to (C.1) and ζ is a positive constant chosen so that ζλ¯i(t)/2σ(γi)ak(2+γj) for every i, k, and t. Note that ζ can be taken positive because, according to Lemma 7, we can assume that inftλ¯j(t)>0.

We define ϕ=ζd. Then,

P(Aj(t)[(akd)T,akT])=(akd)TakTfAj(t)(x)dx(dT)·ζT(2+γj)logT=ϕT(1+γj)logT.(C.3)

As in (39), but with t0=0, we define

Bk=t=ΘkTΘkT+dT1A(t),(C.4)
and for t[ΘkT,(Θk+d)T),
Ut=BkAj(t)ej,(C.5)
and note that UtBk for every k and t. We have
P(Ut2dTμ¯)E[Ut]2dTμ¯τ=ΘkT(Θk+d)T1E[A(τ)]2dTμ¯dT(λ*+ϵ)2dTμ¯12.(C.6)
where the first step makes use of the Markov inequality, and the last step uses the fact λ*+ϵμ¯.

For t[ΘkT,(Θk+d)T), let Et be the event that Aj(t)[(akd)T,akT] and Ut2dTμ¯. Note that the term Aj(t) is omitted from Ut, and therefore, Aj(t) and Ut are independent. Thus, using (C.3) and (C.6), we obtain

P(Et)ϕ2T(1+γj)logT,t[ΘkT,(Θk+d)T).(C.7)

In light of the definition of d in (33), we have d(1+2μ¯)<ak. Therefore,

2dTμ¯<(akd)T.(C.8)

Thus, for any t,τ[ΘkT,(Θk+d)T) with τt, if Ut2dTμ¯, then

Aj(τ)Ut2dTμ¯<(akd)T.(C.9)

Here, the first inequality follows because, if τt, then Aj(τ) is one of the summands in the definition (C.5) of Ut, and the last inequality is due to (C.8). Consequently, if Ut2dTμ¯, then Eτ holds for no τ[ΘkT,(Θk+d)T) with τt, that is, for tτ, the events Et and Eτ are disjoint.

We now claim that, for any t[ΘkT,ΘkT+dT), the event Et implies the event Ekjump defined in (40). Indeed, when event Et occurs, we obtain

BkakTej=Ut+Aj(t)ejakTejUt+Aj(t)ejakTej=Ut+|Aj(t)akT|2dTμ¯+dT,(C.10)
where the last inequality follows from the definition of Et. This shows that the event Ekjump occurs as claimed. Let ψ=min{1,dϕ/2}. We then have
P(Ekjump)P(t[ΘkT,ΘkT+dT)Et)=t=ΘkT(Θk+d)T1P(Et)dT·ϕT(1+γj)logT2ψTγjlogT,(C.11)
where the first equality is because, for tτ, the events Et and Eτ are disjoint; the second inequality is due to (C.7), and the last one uses the definition of ψ. This completes the proof of Lemma 8.

C.3. Proof of Lemma 9

This proof is similar to the proof of Lemma 3 in Appendix B.4 although some of the details are different.

Recall that r is the number of piecewise constant pieces in the trajectory qϵ(·). We fix some k{0,1,,n} and let

α=γc32Cr,(C.12)
and
η=8(6μ¯+α)3α2.(C.13)

Claim C.1.

There exists a T¯28 such that, for any TT¯2, any j, and any t[ΘkT+dT,Θk+1T), we have

T/ηlogTP(Aj(t)x)dxα2,(C.14)
and
P(Aj(t)>TηlogT)14T.(C.15)

Proof.

Because all γj are positive, we can fix some δ such that 0<δ<γj for every j. Then, the density fAj(t)(·) in Definition 8 decays at least as fast as x(2+δ). More concretely, there exists a constant χ such that for all j and t, we have

fAj(t)(x)χx(2+δ),xμ¯.

We then have, for any time t[ΘkT+dT,Θk+1T), any y1, and any j,

P(Aj(t)y)χyx(2+δ)dx=χ1+δ·x(1+δ).(C.16)

It then follows that, as T goes to infinity, both T·P(Aj(τ)>T/ηlogT) and T/ηlogTP(Aj(τ)x)dx go to zero uniformly over all j and τ. Therefore, there exists a T¯2 such that, for any TT¯2, (C.15) and (C.14) hold, and the claim follows. □

For the rest of the proof, we fix such a T¯2 and assume that TT¯2. For any τ[ΘkT+dT,Θk+1T) and every j, let

Aj*(τ)=min{Aj(τ),TηlogT}.(C.17)

We define the “no jumps” event E= as follows:

E=={A(τ)=A*(τ),forallτ[ΘkT+dT,Θk+1T)}.(C.18)

Then,

1P(E=)j=1τ=ΘkT+dTΘk+1TP(Aj(τ)>TηlogT)j=1τ=ΘkT+dTΘk+1T14T14,(C.19)
where the first inequality uses the union bound, the second inequality follows from (C.15), and the last inequality uses the fact that d as defined in (33) is no larger than one. Moreover, for any j and any τ[ΘkT+dT,Θk+1T) and using the same steps as in (B.14),
E[Aj(τ)]E[Aj*(τ)]T/ηlogTP(Aj(τ)>x)dxα2,
where the last inequality follows from (C.14). Therefore, for any τ0,
E[A(τ)]E[A*(τ)]maxj(E[Aj(τ)]E[Aj*(τ)])α2.(C.20)

Consider now the following events:

E*={τ=ΘkT+dTt(A*(τ)E[A*(τ)])αT2,t[ΘkT+dT,Θk+1T)},(C.21)
and for j=1,,,
Ej*={|τ=ΘkT+dTt(Aj*(τ)E[Aj*(τ)])|αT2,t[ΘkT+dT,Θk+1T)}.(C.22)

Note that, if all of the events E1*,,E* occur, then E* also occurs. By applying the union bound to the complement of these events, we have

1P(E*)j=1(1P(Ej*)).(C.23)

Consider a sample path under which the events E= and E* occur. Then, for any t[ΘkT+dT,Θk+1T),

τ=ΘkT+dTt(A(τ)E[A(τ)])=τ=ΘkT+dTt(A*(τ)E[A(τ)])=τ=ΘkT+dTt(A*(τ)E[A*(τ)])+τ=ΘkT+dTt(E[A*(τ)]E[A(τ)])τ=ΘkT+dTt(A*(τ)E[A*(τ)])+τ=ΘkT+dTtE[A*(τ)]E[A(τ)]τ=ΘkT+dTt(A*(τ)E[A*(τ)])+αT2αT2+αT2=γcT32Cr,(C.24)
where the first equality is due to E=, the second inequality is due to (C.20), the third inequality follows from E*, and the last equality is from the definition of α in (C.12). Therefore, the events E= and E* imply the event Ekfluc. Using the union bound on the complements of these events,
1P(Ekfluc)(1P(E=))+(1P(E*))14+j=1(1P(Ej*)),(C.25)
where the last inequality follows from (C.19) and (C.23). To complete the proof, we develop an upper bound on 1P(Ej*).

For any τ0 and every j and as in (B.22), we have

E[Aj*(τ)]E[Aj(τ)]E[A(τ)]μ¯.(C.26)

For any t[ΘkT+dT,Θk+1T), we have

P(|τ=ΘkT+dTt(Aj*(τ)E[Aj*(τ)])|>αT2)2exp((αT/2)22TηlogT·(μ¯(tΘkTdT+1)+αT/6))2exp(3α2ηlogT4(6μ¯+α))=2exp(2logT)=2T2,(C.27)
where the first inequality follows from Lemma 2 with the identifications z=αT/2,b=T/ηlogT,λ¯=μ¯ (see (C.26)), and n=tΘkTdT+1; the second inequality is because nT; the equality follows from the definition of η in (C.13). Therefore, for every j,
1P(Ej*)t=ΘkT+dTTP(|τ=ΘkT+dTt(Aj*(τ)E[Aj*(τ)])|>αT2)t=1T2T2=2T114,(C.28)
where the first inequality is from the union bound, the second inequality is due to (C.27), and the last inequality is from the condition TT¯28 in Claim C.1. Plugging (C.28) into (C.25), we obtain P(Ekfluc)1/2. This completes the proof of Lemma 9.

C.4. Proof of Lemma 10

The proof of this lemma is essentially a continuity argument together with an induction that pieces together different time segments.

For simplicity and without loss of generality, we assume that the constant t0 in the statement of the lemma is equal to zero. Note, however, that, with this convention Q(0) is, in general, nonzero.

For any λ,xR+, let

ξλ(x)=q˙(0),(C.29)
where q(·) is the fluid trajectory corresponding to arrival rate λ and initialized with q(0)=x. In view of (9), we have ξλ(x)D¯λ(x). From the definition (8) of the set D¯λ(x) of possible drifts and the standing assumption λ(τ)λ*ϵ for all τ, we have
ξλ(τ)(qϵ(τ))μ¯,τ0,(C.30)
where μ¯ is defined in (30). By taking into account the jump akejk at time Θk and then integrating the drift ξλ(τ) over the jump-free interval [Θk,Θk+d], we have
qϵ(Θk+d)qϵ(Θk)dμ¯.

Noting also that qϵ(Θk)=qϵ(Θk)+akejk, we obtain

qϵ(Θk+d)qϵ(Θk)akejkdμ¯.(C.31)

For τ0, let μ¯a(τ)=min{μ(τ),Q(τ)}, where the minimum is taken component-wise; thus, μa(τ) is the actual service received at time τ. It then follows from (1) that

Q(τ+1)=Q(τ)μa(τ)+A(τ).(C.32)

We start by considering the intervals [ΘkT,(Θk+d)T] associated with jumps for k=1,,n. We are working with a sample path for which the event Ekjump occurs, Therefore,

Q((Θk+d)T)(Q(ΘkT)+Takejk)=(Q(ΘkT)+τ=ΘkTΘkT+dT1(A(τ)μa(τ)))(Q(ΘkT)+Takejk)(τ=ΘkTΘkT+dT1A(τ))Takejk+τ=ΘkTΘkT+dT1μa(τ)dT(1+2μ¯)+dTμ¯=dT(1+3μ¯),(C.33)
where the first equality is due to (C.32). The second inequality follows from the definition (40) of Ekjump and the fact μa(τ)μ(τ)μ¯ for all τ0.

Combining (C.33) with (C.31), it follows that, for k=1,,n,

Q((Θk+d)T)Tqϵ(Θk+d)Q((Θk+d)T)(Q(ΘkT)+Takejk)+(Q(ΘkT)+Takejk)T(qϵ(Θk)+akejk)+T(qϵ(Θk)+akejk)Tqϵ(Θk+d)Q((Θk+d)T)(Q(ΘkT)+Takejk)+Q(ΘkT)Tqϵ(Θk)+dTμ¯dT(1+3μ¯)+Q(ΘkT)Tqϵ(Θk)+dTμ¯=Q(ΘkT)Tqϵ(Θk)+dT(1+4μ¯),
where the second and third inequalities are due to (C.31) and (C.33), respectively. Using the definition (33) of d, we conclude that
Q((Θk+d)T)Tqϵ(Θk+d)Q(ΘkT)Tqϵ(Θk)+γcT8.(C.34)

We have so far established that, if the two trajectories Q(·) and qϵ(·) are close at the beginning of an interval [ΘkT,(Θk+d)T], they are also close at the end. We now need to establish a similar conclusion over intervals of the form [(Θk+d)T,Θk+1T]. We wish to capitalize on Theorem 4. That result, however, refers to fluid models with constant arrival rates. In contrast, our stochastic process has a piecewise constant arrival rate, and the same holds for the associated JF trajectory. We deal with this issue by applying Theorem 4 repeatedly for each one of the piecewise constant segments.

Let us fix some k{1,,n} and recall that r is an upper bound on the number of subintervals in [Θk+d,Θk+1) during which λ(·) stays constant. Let us then fix some times θi for i=1,,r+1 such that

Θk+d=θ1<<θr<θr+1=Θk+1,
and such that λ(·) is constant during each one of the intervals (θi,θi+1) for i=1,,r.

Under our assumption that the sample path satisfies the event Ekfluc, we see that, during each one of the intervals [θiT,θi+1T] and for i=1,r, we have

maxt[θiT,θi+1T)τ=θiTt(A(τ)λ¯(τ))2maxt[ΘkT+dT,Θk+1T)τ=ΘkT+dTt(A(τ)λ¯(τ))γcT16Cr.(C.35)

We now note that the function q˜ϵ(·) defined by q˜ϵ(t)=Tqϵ(t/T) is also an ϵ-JF trajectory and, in particular, is a fluid trajectory during each interval [θiT,θi+1T). We apply Theorem 4 over this interval. Using also the fact that 1+λ(τ)μ¯, we obtain

Q(θi+1T)Tqϵ(θi+1)Q(θiT)Tqϵ(θi)+Cμ¯+C·γcT16Cr.(C.36)

By summing such inequalities, for i=1,,r and using the facts θ1=Θk+d,θr+1=Θk+1, and qϵ((Θk+d))=qϵ(Θk+d), we obtain

Q(Θk+1T)Tqϵ(Θk+1)Q((Θk+d)T)Tqϵ(Θk+d)+rCμ¯+γcT16.(C.37)

We now add (C.34) and (C.37) to obtain

Q(Θk+1T)Tqϵ(Θk+1)Q(ΘkT)Tqϵ(Θk)+rCμ¯+3γcT16.

We finally sum these bounds for k=1,,n and use the fact qϵ(Θn+1)=qϵ(1) to conclude that

Q(T)Tqϵ(1)Q(Θ1T)Tqϵ(Θ1)+nrCμ¯+3nγcT16Q(Θ1T)Tqϵ(Θ1)+nrCμ¯+3cT16,(C.38)
where we also make use of the property nγγTn1.

The interval [0,Θ1) is to be treated a little differently as Θ0=0 is not a jump time. Even so, the argument leading to (C.37) applies verbatim and shows that

Q(Θ1T)Tqϵ(Θ1)Q(0)Tqϵ(0)+rCμ¯+γcT16Q(0)+rCμ¯+cT16,
where we make use of the fact that the fluid trajectory is initialized at zero and the inequality γ1. Combining with (C.38), we finally conclude that
Q(T)Tqϵ(1)Q(0)+(n+1)rCμ¯+cT4.

We now let T¯3 be large enough so that, for any TT¯3, we have (n+1)rCμ¯+(cT/4)+(cT/5)cT/2. As long as TT¯3 and using the assumption Q(0)cT/5, we obtain

Q(T)Tqϵ(1)cT5+(n+1)rCμ¯+cT4cT2.

Finally, using c=qmϵ(1), we have

Qm(T)=Tqmϵ(1)(Tqmϵ(1)Qm(T))Tqmϵ(1)Tqϵ(1)Q(T)Tqmϵ(1)cT2=cTcT2=cT2,(C.39)
and this completes the proof of Lemma 10.

Endnotes

1 This phenomenon can arise under other scheduling policies as well, for example, the generalized processor sharing policy (Borst et al. 2003).

2 Strictly speaking, the results in Section 5 only establish the absence of robust delay stability, not delay instability for the specific arrival process distributions of our example. However, a slight modification of the proof in Section 8 shows that queue 3 is indeed delay unstable.

3 Recall our assumption in Section 2 that, for any μM, the set M also contains all vectors that result from setting some entries of μ equal to zero. It is shown in proposition 2 of Sharifnassab et al. (2020) that, under this assumption, the fluid model of Definition 2 is equivalent to the more standard, albeit more complicated, definitions of fluid models based on differential equations with boundary conditions.

4 If γj= and nj = 0, we use the convention ·0=0.

5 We note that n may be equal to zero. For example, for a single unstable queue, an ϵ-JF trajectory becomes positive even in the absence of jumps. Such a system is not RDS, consistent with our result.

6 To avoid notation clutter, we present the proof as if ΘkT or ΘkT+dT were integer, which is not necessarily the case. Everything goes through, with occasional trivial modifications, if a sum of the form t=abct is interpreted as t=abct.

7 The reason for the condition λ*0 is that, if λ*=0, then a system is trivially 0-RDS, but a small perturbation that leads to positive arrival rates can result in a delay unstable system.

8 In case t1=t0+1, the interval (t0, t1) is empty, and the term involving a maximum over t(t0,t1) is interpreted as zero.

9 This includes the case in which t1=t0+1 so that the interval (t0, t1) is empty.

References

  • Anantharam V (1989) How large delays build up in a GI/G/1 queue. Queueing Systems 5(4):345–367.Google Scholar
  • Anthony M, Bartlett PL (2009) Neural Network Learning: Theoretical Foundations (Cambridge University Press, New York).Google Scholar
  • Asmussen S (1996) Rare events in the presence of heavy tails. Glasserman P, Sigman K, Yao DD, eds. Stochastic Networks (Springer, New York), 197–214.Google Scholar
  • Bertsimas D, Gamarnik D, Tsitsiklis JN (2001) Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Probab. 11(4):1384–1428.Google Scholar
  • Borst S, Boxma O, Jelenković P (2003) Reduced-load equivalence and induced burstiness in GPS queues with long-tailed traffic flows. Queueing Systems 43(4):273–306.Google Scholar
  • Chen B, Blanchet J, Rhee CH, Zwart B (2019) Efficient rare-event simulation for multiple jump events in regularly varying random walks and compound Poisson processes. Math. Oper. Res. 44(3):919–942.LinkGoogle Scholar
  • Durrett R (1980) Conditioned limit theorems for random walks with negative drift. Zeitschrift Wahrscheinlichkeitstheorie Verwandte Gebiete 52(3):277–287.Google Scholar
  • Foss S, Korshunov D (2012) On large delays in multi-server queues with heavy tails. Math. Oper. Res. 37(2):201–218.LinkGoogle Scholar
  • Foss S, Korshunov D, Zachary S (2011) An Introduction to Heavy-Tailed and Subexponential Distributions (Springer, New York).Google Scholar
  • Georgiadis L, Neely MJ, Tassiulas L (2006) Resource Allocation and Cross-Layer Control in Wireless Networks (Now Publishers Inc., Hanover, MA).Google Scholar
  • Jelenković P, Momčilović P (2003) Asymptotic loss probability in a finite buffer fluid queue with hetergeneous heavy-tailed on–off processes. Ann. Appl. Probab. 13(2):576–603.Google Scholar
  • Maguluri ST, Burle SK, Srikant R (2016) Optimal heavy-traffic queue length scaling in an incompletely saturated switch. ACM SIGMETRICS Performance Evaluation Rev. 44(1):13–24.Google Scholar
  • Markakis MG (2013) Scheduling in switched queueing networks with heavy-tailed traffic. Unpublished PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
  • Markakis MG, Modiano E, Tsitsiklis JN (2014) Max-weight scheduling in queueing networks with heavy-tailed traffic. IEEE/ACM Trans. Networking 22(1):257–270.Google Scholar
  • Markakis MG, Modiano E, Tsitsiklis JN (2018) Delay analysis of the max-weight policy under heavy-tailed traffic via fluid approximations. Math. Oper. Res. 43(2):460–493.LinkGoogle Scholar
  • Nair J, Jagannathan K, Wierman A (2015) When heavy-tailed and light-tailed flows compete: The response time tail under generalized max-weight scheduling. IEEE/ACM Trans. Networking 24(2):982–995.Google Scholar
  • Nair J, Wierman A, Zwart B (2020) The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation (Cambridge University Press, Cambridge, UK).Google Scholar
  • Neely MJ (2010) Stochastic Network Optimization with Application to Communication and Queueing Systems. Ying L, ed., Synthesis Lectures on Communication Networks and Algorithms 3.1 (Springer Nature, Cham, Switzerland).Google Scholar
  • Pakes A (1975) On the tails of waiting-time distributions. J. Appl. Probab. 12(3):555–564.Google Scholar
  • Shah D, Wischik D (2012) Switched networks with maximum weight policies: Fluid approximation and multiplicative state space collapse. Ann. Appl. Probab. 22(1):70–127.Google Scholar
  • Sharifnassab A, Tsitsiklis JN, Golestani SJ (2019) Sensitivity to cumulative perturbations for a class of piecewise constant hybrid systems. IEEE Trans. Automatic Control 65(3):1057–1072.Google Scholar
  • Sharifnassab A, Tsitsiklis JN, Golestani SJ (2020) Fluctuation bounds for the max-weight policy with applications to state space collapse. Stochastic Systems 10(3):223–250.LinkGoogle Scholar
  • Tassiulas L, Ephremides A (1992) Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Trans. Automatic Control 37(12):1936–1948.Google Scholar
  • Veraverbeke N (1977) Asymptotic behaviour of Wiener-Hopf factors of a random walk. Stochastic Processes Appl. 5(1):27–37.Google Scholar
  • Zwart B, Borst S, Mandjes M (2004) Exact asymptotics for fluid queues fed by multiple heavy-tailed on-off flows. Ann. Appl. Probab. 14(2):903–957.Google Scholar