Jumping Fluid Models and Delay Stability of Max-Weight Dynamics Under Heavy-Tailed Traffic
Abstract
We say that a random variable is light-tailed if moments of order are finite for some ; otherwise, we say that it is heavy-tailed. We study queueing networks that operate under the max-weight scheduling policy for the case in which some queues receive heavy-tailed and some receive light-tailed traffic. Queues with light-tailed arrivals are often delay stable (that is, expected queue sizes are uniformly bounded over time) but can also become delay unstable because of resource sharing with other queues that receive heavy-tailed arrivals. Within this context and for any given “tail exponents” of the input traffic, we develop a necessary and sufficient condition under which a queue is robustly delay stable, in terms of jumping fluid models—an extension of traditional fluid models that allows for jumps along coordinates associated with heavy-tailed flows. Our result elucidates the precise mechanism that leads to delay instability through a coordination of multiple abnormally large arrivals at possibly different times and queues and settles an earlier open question on the sufficiency of a particular fluid-based criterion. Finally, we explore the power of Lyapunov functions in the study of delay stability.
1. Introduction
We say that a random variable is light-tailed if moments of order are finite for some ; otherwise, we say that it is heavy-tailed. We study queueing networks that operate under the max-weight (MW) scheduling policy for the case in which some queues receive heavy-tailed traffic, whereas some other queues receive light-tailed traffic; our motivation stems from the fact that heavy-tailed processes are often natural models of the inputs to computer and communications networks (Foss et al. 2011). Queues that receive heavy-tailed traffic are naturally delay unstable, that is, they incur infinite expected delay as an immediate consequence of the Pollaczek–Khintchine formula. However, it is also known that, because of the relatively complex max-weight dynamics, some of the queues that receive light-tailed traffic may also end up delay unstable.1 Our aim is to develop conditions that determine whether any particular queue is delay stable or not.
This problem is studied extensively (Markakis 2013; Markakis et al. 2014, 2018; Nair et al. 2015), and a necessary condition for delay stability is given in Markakis et al. (2014, 2018). In particular, Markakis et al. (2018) consider the associated fluid model, initialized at zero, except for a positive initial condition at some queue that receives heavy-tailed arrivals. If another queue happens to eventually become positive under that fluid model, one can then conclude that the latter queue is delay unstable. This result leads to the natural question whether this condition is also sufficient, that is, whether delay stability is guaranteed when any such fluid trajectory (with a positive initialization at any single one of the queues that receive heavy-tailed traffic) keeps the queue of interest at zero level. Such a sufficiency result might appear plausible because, in models involving heavy-tailed random variables, large fluctuations are often the consequence of a single abnormally large value in the underlying heavy-tailed random variables (Pakes 1975, Veraverbeke 1977, Anantharam 1989, Asmussen 1996, Durrett 1980, Foss et al. 2011).
1.1. Our Contributions
We start in Section 3 by showing that the aforementioned possible sufficiency result does not hold. We accomplish this by providing a fairly simple example in which a large arrival at any single heavy-tailed queue does not cause a certain queue of interest to grow, but a combination of two large arrivals, at two different heavy-tailed queues, can result in large backlogs at the queue of interest. We also provide necessary and sufficient conditions for delay stability in that particular example, and then provide intuition for a possible more general result. Interestingly, delay instability manifests itself only when the tail exponents (defined precisely in Section 2.2) of the heavy-tailed arrival processes lie in a specific range. As a consequence, we need to take these exponents explicitly into account, something that traditional fluid models cannot do.
We then generalize, by developing general and tight (necessary and sufficient) conditions for delay stability, in terms of deterministic fluid-like models in which there can be multiple jumps (in different queues and possibly at different times) at the heavy-tailed queues.
Our conditions are not easy to test computationally, but this seems unavoidable: because the conditions are necessary and sufficient, the complexity of testing them reflects the intrinsic complexity of testing delay stability. On the positive side, our conditions
Provide a conceptual understanding of the mechanism that results in delay instability.
Can be checked in special cases, for instance, for the example in Section 3; see Section 5.3.
On the technical side, our general conditions involve a small but technically crucial reformulation of the delay stability problem. To understand the underlying issue, note that fluid models do not always lead to definite conclusions when the underlying system is marginally stable but are generally effective when used to analyze “robust” properties. For this reason, we consider a network with given nominal arrival rates and focus on robust stability. Namely, we ask whether a certain queue is delay stable for all arrival processes with given tail exponents and for all (possibly time-varying) arrival rates that lie in some open ball around the nominal ones. When the problem is framed that way, definitive necessary and sufficient conditions for (robust) delay stability become possible. Furthermore, with this formulation, it is only the nominal arrival rates and the tail exponents that matter as opposed to the details of the distribution of the input traffic.
Finally, earlier works Markakis (2013) and Markakis et al. (2018) show that Lyapunov functions with certain structural properties can be used to certify delay stability. But it was not known whether this methodology is “complete,” that is, whether delay stability can always be established through a suitable Lyapunov function. Our results make progress in this direction, for the case of “very heavy” tails, that is, when there exists some for which arriving traffic moments of order are infinite but γ can be arbitrarily small. For this regime, we derive some necessary and sufficient conditions for delay stability: we show that a Lyapunov function of a special kind can be used to certify delay stability together with a partial converse.
1.2. Related Works on Multiple Big Jumps
As already mentioned, in several systems that involve heavy-tailed random variables, large fluctuations are the consequence of a single large jump in the underlying heavy-tailed variables. However, this is not always the case. There are known systems that have small fluctuations under a single large jump, and large deviations can only arise as a consequence of multiple jumps in heavy-tailed random variables with suitable timing. Early examples are studied in Jelenković and Momčilović (2003) and Zwart et al. (2004) for a single-buffer system with multiple on/off input processes with heavy-tailed periods. Furthermore, Foss and Korshunov (2012) provide conditions on the finiteness of moments of queue sizes in a multiserver G/G/s queue and show the special effects caused by multiple jumps. Perhaps the closest work to ours that demonstrates the necessity of multiple big jumps is Chen et al. (2019), who propose a rare-event simulation technique to estimate the probability of rare events in stochastic systems that receive heavy-tailed inputs. As an application, they consider a queuing network with fixed fluid services and fixed routing in which each queue receives independent heavy- or light-tailed exogenous arrivals. They characterized the tail exponent of the queues in terms of a knapsack-type constraint that involves the sum of numbers of large jumps in the different inputs that are required to make the queue of interest large, in which the sum is weighted by the tail exponents of the corresponding inputs. Such a knapsack-type constraint also appears in our bounds, and the tail exponent fully determines the delay stability/instability of the queue of interest. Thus, our results parallel those in Chen et al. (2019) but for a different setting. The main differences are that our network operates under the more complex max-weight scheduler, and our assumptions allow for non–independent and identically distributed (i.i.d.) arrival processes. Furthermore, our proof techniques are very different from those in Chen et al. (2019); it would be interesting to explore whether the techniques in Chen et al. (2019) can be used to obtain alternative proofs for our setting (Theorem 1).
1.3. Outline
The rest of the paper is organized as follows. We start in the next section with the details of our model and some definitions. In Section 3, we discuss an example that provides insights on the ways that arrival rates and tail exponents affect delay stability and demonstrate that criteria based on traditional fluid models are inadequate for the purpose of deciding delay stability. In Section 4, we introduce a fluid-like model, which we call the jumping fluid (JF) model and underlies our main result, the necessary and sufficient conditions for (robust) delay stability that we present in Section 5. In Section 6, we study the power of Lyapunov functions for our problem. In Sections 7–9, we provide the proofs of our results. As the proofs are quite involved, they are presented as a sequence of lemmas with the proofs of the lemmas provided in Appendices B and C. We discuss the results and directions for future research in Section 10. Finally, in Appendix A, we explore alternative definitions of robust stability and corresponding variants of our jumping fluid conditions and explain why they are unlikely to yield sharp necessary and sufficient conditions.
1.4. Notation
We collect here some notational conventions to be used throughout the paper. We use boldface symbols to denote vectors and ordinary font to denote scalars. For any vector v, we use vi to denote its ith component and to denote the sum . We also use the notation to denote the vector with components . Finally, we let ej stand for the jth unit vector.
We use and to denote the sets of nonnegative reals and nonnegative integers, respectively. Furthermore, for a vector v, we write (respectively, ) to indicate that all components are nonnegative (positive). For any set S, we denote its convex hull by .
Throughout, stands for the Euclidean norm. Sometimes, we use the alternative notation in place of We also let be the distance of a vector from a set S, that is, . We finally use to denote the indicator function and to denote the natural logarithm.
For any time function that is right-continuous with left limits, or stand for the right derivative of with the implicit assumption that it exists, and stands for .
2. The Model
2.1. Network Model and the Max-Weight Policy
We consider a switched network that operates in discrete time. For simplicity and ease of presentation, we restrict ourselves to single-hop networks. However, our results are easily generalized to multihop networks of the type considered in Sharifnassab et al. (2020).
The network consists of queues that buffer incoming packets (or jobs). For any , we let be a nonnegative vector whose jth component is the length of the jth queue at time t. Packets arrive to the queues according to a nonnegative stochastic vector arrival process, . In particular, stands for the exogenous arrival to the jth queue at time t. We assume that the random variables , for different j and t, are independent. We refer to as the arrival rate vector at time t.
At each time t, the amount of service received by the queues is a nonnegative vector , which is chosen by a scheduler from a finite set of possible service vectors. The queue lengths then evolve according to
As in Sharifnassab et al. (2020), we assume throughout the paper that, for any , the set also contains all vectors that result from setting some entries of to zero. This assumption is naturally valid in most contexts.
We focus exclusively on the popular MW scheduling policy,
2.2. Light- and Heavy-Tailed Arrivals
In this section, we present some definitions related to the tails of the arrival process distributions.
To any nonnegative random variable X, we associate a tail exponent, defined as the value of γ at which switches from finite to infinite:
As an example, consider a continuous random variable whose probability density function satisfies
For an i.i.d. arrival process, a tail exponent is unambiguously defined as the tail exponent of the marginal distribution at an arbitrary time. However, once we bring robustness into the picture, we are led to consider arrival processes with nonidentically distributed . To any arrival process , we associate a tail exponent, γj, defined as the largest value of γ such that is dominated by some nonnegative random variable X with tail exponent γ for all times t:
Here, the term “dominates” refers to stochastic dominance: a random variables X dominates a random variable Y if for all . We say that is heavy-tailed if and light-tailed otherwise. Note that this definition of a heavy-tailed process is aimed to capture the boundedness of variance of the input distribution and differs from the conventional definition of heavy-tailed random variables, which requires subexponential decay of tail probabilities (Nair et al. 2020). The tail behavior of the different arrival processes is summarized by the vector .
We note that, as long as for some constant and for all times t, the tail exponent γj is well-defined and lies in the range . Indeed, suppose that for some finite constant and for all t. Consider a random variable X with probability density function . The Markov inequality implies that
In particular, X dominates for all t. Furthermore, , for every . Thus, γj is at least as large as any negative number, which implies that .
Conversely, if , then is finite and bounded as a function of t. We finally note that γj may be infinite; this is the case for bounded or, more generally, exponential-type distributions.
2.3. Robust Delay Stability
As already mentioned, we are interested in the question of delay stability under the MW policy in the presence of heavy-tailed arrivals. Given a set of arrival processes, we say that queue m is delay stable if, starting from , we have .
The arrival rates and the tail exponents do not provide enough information to decide whether we have delay stability or even stability. For example, there is sometimes indeterminacy on the boundary of the capacity region. Furthermore, a tail exponent of is compatible with being either finite or infinite, and this may be critical as far as delay stability is concerned (cf. the Pollaczek–Khintchine formula). These difficulties, all related to indeterminacy at certain boundaries, can be circumvented by focusing on a robust version of delay stability that incorporates two distinct elements.
We require delay stability for all arrival process distributions with given tail exponents. This allows us to focus on conditions that involve the tail exponents and ignore other details of these distributions.
We require delay stability for all arrival rates, possibly time-varying, in the vicinity of a given nominal rate. This allows us to avoid issues of indeterminacy when the arrival rate lies at a threshold between delay stability and instability.
(Robust Delay Stability). Let us fix a network with nodes and a set of possible service vectors. We consider a tail exponent vector with components in and a nominal arrival rate vector .
Given some , an arrival process belongs to the class if
The random variables , for different j and t, are independent and have finite means.
For every j, the tail exponent of the process is γj.
For every , we have .
For , we say that queue m is robustly delay stable (RDS) if there exists some such that queue m is delay stable for all arrival processes in the class .
3. A Counterexample and the Insufficiency of Fluid Models
In this section, we discuss a simple example with the following features:
Similar to existing examples, heavy-tailed arrivals to some queues can cause delay instability at a queue that receives light-tailed traffic.
Tight criteria for delay instability need to take into account the values of the tail exponents. In particular, criteria that are based on traditional fluid models cannot be conclusive because they do not involve the tail exponents.
Delay instability may emerge from coordinated large arrivals at multiple heavy-tailed queues.
Consider the three-queue network in Figure 1. The set of possible service vectors is such that, at any time slot, up to two queues can be served, each with rate one, but not all three queues can be served simultaneously. Thus, at each time step, the MW policy chooses two queues with the largest backlogs and serves each one of them with rate one.

Notes. In this figure, the dotted ellipses illustrate the different possible service vectors. The third queue is delay unstable for certain ranges of λ and γ.
The third queue receives deterministic arrivals with for all so that . The other two queues receive heavy-tailed traffic with a density of the form (4) and tail exponent and with rate 0.5; that is, .
The total arrival rate is , which is less than two, and the network is stable in the conventional sense. The first two queues are automatically delay unstable because they receive heavy-tailed arrivals. However, the third queue can be either delay stable or delay unstable, depending on the values of λ and γ. In what follows, we provide an informal discussion of the different cases.
(). In this case, queue 3 is delay unstable through a scenario similar to those considered in earlier works (Markakis et al. 2014). Intuitively, because of its heavy-tailed arrivals, Q1 occasionally receives large inputs. When that happens, with being large at some time , Q1 becomes and stays largest for some time. During that time, the MW policy keeps serving Q1, while the remaining service capacity is split between Q2 and Q3. Because , the sum of the arrival rates to Q2 and Q3 exceeds the aggregate service rate to these two queues over a time interval of duration . Thus, Q2 and Q3 build up to size . Because , we have , and using this property, it can be shown that grows unbounded.
The intuition behind this argument is captured by an existing criterion from Markakis et al. (2014) that examines certain trajectories of a corresponding fluid model (also called fluid trajectories). In that fluid model, the arrival processes are replaced by deterministic flows with the same rates. The initial conditions are for some heavy-tailed queue (in our example, h = 1 or h = 2) and for . (Because of symmetry, we only need to consider the case in which h = 1.) Let us say that the “zero fluid” (ZF) condition holds if the solution to the fluid model (known to be unique for the MW policy) keeps at zero. According to the criterion in Markakis et al. (2014), the failure of the ZF condition certifies the delay instability of queue 3. This is indeed the case here: starting with the initial conditions , it can be verified that, for small positive times t, we have .
In summary, for this particular case, we have delay instability, and this is correctly predicted by the failure of the ZF condition and available results.
( and ). In this case, queue 3 turns out to be delay stable (in fact, robustly delay stable). This is a consequence of our general result in Section 5.
For this case, the fluid model initialized at satisfies for all positive times, and the ZF condition holds. In particular, the ZF condition is “aligned” with delay stability. Observations of this nature lead to the question whether the ZF condition can be used as a certificate of delay stability (Markakis et al. 2018). However, this is not the case as we discuss next.
( and ). In this case, the ZF condition holds, similar to Case 2. However, queue 3 turns out to be delay unstable; this follows from the proof2 of our main result, Theorem 1.
We summarize here the underlying intuition. When , there is considerable probability that both queues 1 and 2 receive large inputs within a certain time interval. More concretely, for large values of M, there is probability that both Q1 and Q2 receive aggregate arrivals of size at least 3M within the time interval . If this happens, both Q1 and Q2 become large enough so that Q3 receives no service during the interval . As a result, with probability . One can then use this fact to show that, as t increases, grows unbounded, and queue 3 is delay unstable.
In summary, the presence or absence of delay stability depends on the tail exponents in a nontrivial manner. Furthermore, the ZF condition cannot discriminate between Cases 2 and 3 and, thus, cannot account for the different outcomes (delay stability in Case 2, delay instability in Case 3). In fact, the same obstacle arises with any other criterion that relies on traditional fluid models because fluid models do not take the tail exponents into account. In order to make progress, we need to consider the probability that large inputs (or jumps) may arrive within a certain time interval as a function of the tail exponents (the probability is larger when the tail exponents are smaller). Furthermore, as illustrated by Case 3, we may have to consider the effect of “coordinated” large jumps at more than one queue within the same time interval. This is accomplished by the model in the next section: it is still in the spirit of traditional fluid models except that it allows for jumps along the heavy-tailed flows, subject to a “budget” on allowed jumps as determined by the tail exponents.
4. Jumping Fluid Models
In this section, we introduce a generalization of the fluid model that allows for jumps along certain coordinates. We proceed by first defining a traditional fluid model and then modifying it.
4.1. The Fluid Model
A fluid model is a deterministic continuous-time dynamical system that replaces the arrival process with a fluid stream of arrivals and updates queue lengths along max-weight drifts. The literature provides a few, somewhat different but equivalent, definitions of the fluid model (Shah and Wischik 2012, Markakis et al. 2018), which typically involve differential equations with boundary conditions. Here, we adopt an equivalent but somewhat simpler definition3 from Sharifnassab et al. (2020).
Recall the definition of as the set of all possible service vectors. For any , we define
Furthermore, given an arrival rate vector and any , we let
For the definitions that follow, recall our convention that denotes the right derivative of with respect to time at time t.
(Fluid Trajectories). Let us fix a network with nodes, a set of possible service vectors, and an arrival rate vector . A fluid trajectory corresponding to is a nonnegative, continuous, and right-differentiable -dimensional function that satisfies the differential inclusion
Given some and , there always exists a unique (and nonnegative) fluid trajectory corresponding to and initialized at (Markakis et al. 2018, Sharifnassab et al. 2020).
4.2. Adding the Jumps
We now introduce JF trajectories. Given a vector of nonnegative integers, an ϵ-JF trajectory is a fluid trajectory with jumps with nj positive jumps in its jth component, allowing for ϵ-changes in the arrival rate . More concretely, we have the following definition.
(ϵ-Jumping Fluid Trajectories). Let us fix a network with nodes and a set of possible service vectors. We are given an arrival rate vector , a nonnegative integer vector n, and some . An ϵ-JF trajectory corresponding to is a nonnegative -dimensional function , which is right-continuous with left limits and right-differentiable initialized with for all t < 0 and with the following properties:
Each component has nj points of discontinuity.
If has a discontinuity at some time t, then .
For every ,
(10)for some nonnegative function that is right-continuous and piecewise constant with finitely many points of discontinuity and satisfies for all t.
Finally, an ϵ-JF(n) trajectory with4 , is called an ϵ-JF() trajectory.
Note that we allow jumps at time zero, in which case . Note also that, if ϵ = 0 and jumps can only happen at time zero, then an ϵ-JF trajectory is just a fluid trajectory with the jumps of the JF trajectory determining the initial conditions of the fluid trajectory.
More generally, an ϵ-JF trajectory consists of a concatenation of fluid trajectories over the intervals in which stays constant together with a finite number of jumps. For this reason, once the jump times, jump sizes, and function are specified, the results for fluid models extend and establish the existence and uniqueness of the ϵ-JF(n) trajectory.
Our next definition formalizes the requirement that a certain queue must stay at zero under all ϵ-jumping fluid trajectories.
(JF Conditions). Let us fix a network with nodes and a set of possible service vectors. We are given an arrival rate vector and a particular queue, m, of interest.
Given a vector with components in and some , we say that the ϵ-JF condition holds for queue m and if, for every ϵ-JF() trajectory corresponding to and every , we have .
Given a vector with components in , we say that the robust jumping fluid condition (RJF()) holds for queue m and if there exists some such that the ϵ-JF condition holds for queue m and .
Note the restriction on the number of jumps in terms of the tail exponents: the heavier the arrival processes (i.e., the smaller the tail exponents γj), the larger the number nj of jumps that we allow. As an example, if and queue 1 is the only heavy-tailed queue, then we allow up to jumps at queue 1 and no jumps at the other queues.
5. Main Result
Our main result provides a necessary and sufficient condition for robust delay stability in terms of JF trajectories. The proof is given in Sections 7 and 8.
Let us fix a network with nodes; a set of possible service vectors; an arrival rate vector ; a particular queue, m, of interest; and a vector of tail exponents with components in . The queue m is RDS if and only if the RJF condition holds for queue m and .
5.1. Some Intuition
We provide here a high-level explanation of our result. Some more refined intuition is provided by the proof outlines in Sections 7.1 and 8.1.
Let M be a large constant. We say that the stochastic process has a jump whenever it is larger than (approximately) M. Let Nj be the number of jumps of during the interval and let . It turns out that, for any nonnegative integer vector n, the probability that the event scales (approximately) like . The latter quantity is “significant” (in the sense that it makes an unbounded contribution to certain expected values) if and only if . Thus, over an interval of length M, we can focus on sample paths for which the realized vector n of jump counts satisfies and examine whether such sample paths can cause the queue of interest to become large. We then argue that these sample paths are well-approximated by the ϵ-JF(n) trajectories involved in the ϵ-JF condition.
5.2. Remarks
We continue with some remarks on the scope of our result.
5.2.1. Heavy-Tailed Queues.
If queue m is heavy-tailed, that is, , then the condition allows n to be the mth unit vector. With such a vector n, we can have an ϵ-JF(n) trajectory with a positive jump in the mth component, resulting in a positive value of . Thus, the RJF() condition does not hold, and queue m is not RDS. This is just a variation of the well-known fact that a queue with heavy-tailed arrivals is not delay stable.
5.2.2. Unstable Systems.
Theorem 1 makes no stability assumptions. For unstable (or marginally stable) systems, some components of ϵ-JF trajectories can grow arbitrarily large. On the other hand, these components do not necessarily have a substantial effect on the queue, m, of interest. As long as the mth component of all ϵ-JF trajectories stays at zero, queue m is RDS.
5.2.3. Comparison with the ZF Condition.
If for all j, then an ϵ-JF trajectory can have at most one jump. For stable systems, the RJF condition boils down to a robust version of the ZF condition introduced in Section 3. In other words, for this case, a robust version of the ZF condition is a necessary and sufficient condition for robust delay stability. On the other hand, because the (robust version of) the ZF condition is strictly weaker than the RJF condition, it does not provide necessary and sufficient conditions for general .
5.2.4. The Light-Tailed Case.
Suppose that for all j so that all arrival processes are light-tailed. In this case, no jumps are allowed, and the RJF condition boils down to considering ordinary fluid trajectories with slightly perturbed arrival rates. We have the following possibilities:
If is in the interior of the capacity region , then is in the interior of . It then turns out that is an attracting fixed point of the fluid dynamics, the RJF condition holds, and we have RDS for all queues. This is in line with existing results (e.g., see theorem 4.5 of Georgiadis et al. 2006).
If is on the boundary or outside the capacity region and, similar to our earlier discussion of unstable systems, queue m could be either RDS or non-RDS, depending on whether (perturbed) fluid trajectories cause qm to become positive or not.
5.2.5. The Case of Zero Tail Exponents.
Our definitions in Sections 2 and 4 are formulated for a nonnegative vector . However, our result is restricted to the case in which this vector is positive. We comment on the reasons for this.
When , we are dealing with an arrival process for which is finite, whereas may be infinite for every . Our proofs involve, at certain places, a division by γj and break down if . It is not clear whether a similar result is possible when some of the tail exponents are zero.
5.2.6. Computational Issues.
As already hinted in the introduction, checking the RJF condition algorithmically appears to be a hard computational problem, amenable only to impractical Tarski-like elimination algorithms. In one possible simplification, we might just consider ϵ-JF trajectories with ϵ = 0 so that for all times t. However, this would still leave the indeterminacy of the jump times and sizes to be reckoned with. Even worse, a restriction to this limited class of trajectories does not seem to lead to necessary and sufficient conditions for any suitably modified notion of stability. See Appendix A.2 for further discussion. A related question is whether we could, without loss of generality, require all the jumps to occur at the same time, for example, at time zero, thus eliminating the need to consider all possible values of the jump times. Unfortunately, this is not the case: there exist examples in which ϵ-JF trajectories can drive a queue m to a positive value, but this can happen only if we allow the jumps to occur at different times; see Appendix A.5 for an example.
5.3. Our Example, Revisited
On the positive side, Theorem 1 elucidates the precise mechanism that leads to delay instability through a coordination of multiple abnormally large arrival vectors at possibly different times and queues. Furthermore, it allows us to analyze simple problems, such as the one discussed in Section 3, which we do next.
Recall the network of three queues in Section 3, in which . For γ in the range , consider a jumping fluid trajectory with in which both q1 and q2 undergo unit jumps at time 0. With this trajectory, q3 immediately starts to grow positive. Because , it follows that the RJF condition fails to hold. Theorem 1 then establishes that queue 3 is not robustly delay stable.
On the other hand, when γ is in the range , the constraint allows at most one jump in either q1 or q2. Without loss of generality, we can assume that this jump takes place at time 0. It turns out that the fluid trajectories that start from either or keep at zero if and only if . Therefore, for γ in range and also taking robustness into account, queue 3 is RDS if and only if .
6. Robust Delay Stability via Lyapunov Functions
Lyapunov functions are a powerful tool for the stability analysis of queueing networks (Tassiulas and Ephremides 1992, Bertsimas et al. 2001, Maguluri et al. 2016), for example, in throughput optimality proofs for the MW policy (Tassiulas and Ephremides 1992, Neely 2010). Markakis (2013) and Markakis et al. (2018) provide a sufficient condition for delay stability based on a class of piecewise linear Lyapunov functions and use it to derive a sharp characterization of delay stability for a special class of networks, namely, networks with disjoint schedules. Nevertheless, the Lyapunov approach in Markakis (2013) and Markakis et al. (2018) has some drawbacks: (a) the condition provided therein is, in general, only sufficient for delay stability; (b) it does not take into account the tail exponents even though they play an essential role in delay stability as already discussed in Sections 3 and 5.3; (c) the Lyapunov functions considered are piecewise linear, which is perhaps inadequate for the purpose of tight delay stability conditions.
In this section, we explore the power of Lyapunov functions for the case in which the tail exponents of the heavy-tailed queues may be arbitrarily close to zero so that the RJF condition allows an arbitrarily large number of jumps.
For the remainder of this section, we assume that queues can be heavy-tailed, where , whereas the remaining queues are light-tailed. Formally, we consider the set Γ of tail coefficients, defined by
We also fix an arrival rate vector and a light-tailed queue m > h of interest.
(Special Lyapunov Function). For any , we say that a function is a special ϵ-Lyapunov function if
V is Lipschitz continuous with Lipschitz constant one.
whenever for all fluid trajectories corresponding to arrival rate .
We have . Furthermore, if , then .
V is nonincreasing along the coordinates associated with heavy-tailed queues; that is, for , for any , and any , we have , where is the jth unit vector.
Special ϵ-Lyapunov functions as defined are quite similar to the functions considered in theorem 2 of Markakis et al. (2018). However, in contrast to Markakis et al. (2018), our special Lyapunov functions need not be piecewise linear. Our next result establishes a strong connection between the ϵ-JF condition and the existence of special ϵ-Lyapunov functions. The proof is given in Section 9.
For any , there exists a special ϵ-Lyapunov function if and only if the ϵ-JF condition holds for every .
The special ϵ-Lyapunov functions constructed in the proof of Theorem 2 are not piecewise linear. We do not know whether Theorem 2 remains valid if we restrict to piecewise linear functions.
Combining Theorems 1 and 2, we can establish a strong connection between robust delay stability and special ϵ-Lyapunov functions. In what follows, we say that queue m is ϵ-RDS if it is delay stable under all arrival processes in the class in Definition 1(a).
Let us fix a network with nodes, a set of possible service vectors, an arrival vector , the number h of heavy-tailed queues, and a light-tailed queue m > h.
If there exists a special ϵ-Lyapunov function for some , then queue m is RDS for all tail exponents .
Suppose that there exists some such that, for all , queue m is ϵ-RDS. Then, there exists an ϵ-special Lyapunov function.
The proof is provided in Section 9.3. We conjecture that Corollary 1(b) can be strengthened to provide a converse to part (a).
If queue m is RDS for all tail exponents , then for some , there exists a special ϵ-Lyapunov function.
If Hypothesis 1 is true, we have a tight characterization: a queue is RDS for all tail exponents if and only if there exists a special ϵ-Lyapunov function that testifies to this. Establishing the conjecture appears to be difficult. Technically it amounts to reversing the order of the quantifiers in the clause “there exists such that, for all ” in Corollary 1(b) and showing equivalence with the statement “for every , there exists some …”
Our Lyapunov-based results are relevant to the case in which nothing is known about the tail exponents of the heavy-tailed queues other than the fact that they are positive. On the other hand, Lyapunov functions are unlikely to provide useful characterizations of robust delay stability for specific values of the tail exponents because there is no apparent way of accounting for the number of jumps through Lyapunov functions.
7. Proof of the “if” Direction of Theorem 1 (RJFRDS)
In this section, we provide the proof of the “if” direction of Theorem 1, that is, that the RJF condition implies RDS.
Throughout this section, we consider a network with nodes and a set of possible service vectors. We fix an arrival rate vector ; a particular queue, m, of interest; a vector of tail exponents with components in ; and some for which the ϵ-JF condition holds for and queue m. Our goal is to establish robust delay stability for queue m.
The proof is organized in a sequence of lemmas whose proofs are collected in Appendix B. However, before proceeding to the formal arguments, it is helpful to provide an overview of the proof.
7.1. Outline of the Proof
Let us fix an arbitrary time T. We aim at upper bounds for the probability as M gets large. As long as these bounds are a summable function of M and independent of T it follows that is finite and a bounded function of T, which is our goal. Let us now fix some M and keep it fixed throughout except for the end of the proof.
The proof relies on various probabilistic bounds as well as on deterministic properties of the MW dynamics. Let us start with the probabilistic part, which is focused on showing that the stochastic system mostly follows the deterministic fluid dynamics except for certain jumps caused by the heavy tails of the arrival processes. We define a threshold for what constitutes a jump and then develop a probabilistic bound on the numbers of jumps. A difficulty here is that, if we use a fixed threshold and because T is arbitrary, a bound on the number of jumps is not possible. We handle this issue by using a threshold θt that increases almost linearly as we move further to the past of the form
This is for some positive constant η to be defined later. At any time and for any index j, we say that is a jump if . Ignoring logarithmic factors, we show that the jump probability is of order at most , where is slightly smaller than the tail exponent γj. By summing over t and after some elementary calculations, we then obtain that is of order at most , where Nj is the number of jumps of the jth arrival process during the interval . Then, a further calculation shows that, if , then is of order at most for some constant and is, therefore, a summable function of M; see Lemma 1.
We then consider stochastic fluctuations in the arrival process other than jumps. We argue that they average out so that the cumulative arrival process follows its fluid counterpart. We refer to this as the “small fluctuations event,” and show that it occurs with probability at least ; see (21) and Lemma 3.
Having completed the probabilistic analysis, we then switch to deterministic (sample path) considerations. Let be the set of all points q that can be reached by some ϵ-JF trajectory (see Definition 6). As a first step, we exploit some special properties of the MW dynamics and show that is ϵ-attracting; that is, any fluid trajectory that starts outside moves toward that set with rate at least ϵ (Lemma 4). We then consider a “nice” sample path, that is, a sample path for which the small fluctuations event occurs and for which the realized vector of jump counts n satisfies . (As discussed earlier, nice sample paths have probability with .) Our deterministic analysis, outlined in the next paragraph, shows that any such sample path stays within O(M) distance from the set . Because , the ϵ-JF() condition implies that any point in satisfies qm = 0. Thus, for nice sample paths, we have , and therefore, is of order at most for the constant mentioned earlier. This readily implies a uniform upper bound on .
The analysis of the dynamics under nice sample paths has two parts. We first study the dynamics between jumps: we rely on the small fluctuations event and then make use of a result from Sharifnassab et al. (2020, theorem 4) to ensure that small fluctuations in the arrivals result into comparably small changes in the resulting stochastic trajectory; cf. Lemma 5. Second, to understand what happens at jump times, we recall that the jump vectors n associated to nice sample paths satisfy . Such vectors n are allowed in ϵ-JF() trajectories, and therefore, the jumps cannot take the ϵ-JF(n) trajectory away from . This implies that a nice sample path stays “close” to an ϵ-JF(n) trajectory and, therefore, has a “small” Qm.
7.2. Sensitivity of Max-Weight Dynamics
The proof for both directions of the theorem requires fairly precise bounding of the fluctuations of the stochastic trajectories. To this effect, we rely heavily on a fluctuation bound for the MW dynamics established in Sharifnassab et al. (2020).
(Sharifnassab et al. 2020, theorem 2). Fix a network (i.e., the number of nodes and the set of possible service vectors) operating under the MW policy and let be the corresponding queue length stochastic process. There exists a (deterministic) constant such that, for any arrival rate vector , any , any , and any sample path, if , then
We actually use the following variant of Theorem 3, which allows for different initial conditions. The proof is given in Appendix B.1.
Under the same assumptions as in Theorem 3 except that we allow for and to be different and for the same constant C, we have
7.3. Arrival Process and Jumps
We now return to the formal proof. Recall that, throughout this section, we fix the network, and the tail exponents . We assume that the ϵ-JF condition holds for , queue m, and some . We fix some positive integers M and T; these remain fixed throughout except for the end of the proof and for some additional assumptions that M is “large enough.”
We consider an arrival process in the class introduced in Definition 1 with , where C is the constant in Theorem 4 and
In particular,
Our goal is to derive an upper bound on that holds uniformly for every arrival process in , every T, and every large enough M. This is then used to conclude that the RDS property holds.
Because we assume that the tail exponents are nonzero, we have . Furthermore, in order to simplify the proof, it is convenient to assume that
The assumption can be made without loss of generality.
Given a system, call it S, consider a new system in which we add one more queue, queue 0, with , which does not interact with the others; for example, for any allowed service vector in system S, we introduce a corresponding service vector in system . This way, the dynamics of queues are the same in the two systems S and . It follows that, for , queue m is RDS in system S if and only if it is RDS in system . Furthermore, because of the lack of interaction, the RJF condition for queue m holds in system S if and only if it holds in system .
Once we prove the result for the case , we apply it to system and obtain the equivalence of RDS and RJF for the latter system. Based on this discussion, this also establishes the equivalence of RDS and RJF for the original system S. □
We define some more constants:
Having fixed M, T, and η, we finally define θt as in (11).
For and , we say that t is a jump time for if . For any j and any , we let be the (random) number of jumps in that occur during and also define the corresponding vector . To simplify notation and as long as T is fixed, we use N and Nj to refer to and , respectively. Note that, with this definition, includes all jumps that can affect .
We consider the event
We argue that it occurs with high probability.
Let F be the set of all such that . If F is empty, then whenever , we must have . If F is nonempty, it has finitely many elements, which implies that the minimum is attained and its value is greater than one. In either case, we obtain that there exists a constant such that
Without loss of generality, we can take β to satisfy . In the next lemma, we show the probability that decays at least as fast as .
There exists a constant independent of T such that, if , then
The proof is given in Appendix B.2 and relies on the intuition that the probability of a jump in the jth arrival process scales at most as (approximately), which then implies that the probability of the event scales at most as (approximately).
7.4. Fluctuations of the Arrival Process
We now study the remaining fluctuations of the cumulative arrival processes after we exclude the jumps. We begin with a concentration inequality for the sum of independent random variables. The proof of our next lemma is given in Appendix B.3 and is essentially a reformulation of the Bernstein inequality; see, for example, (1.21) in appendix 1 of Anthony and Bartlett (2009).
Suppose that are independent random variables that satisfy and for some . Let . Then, for any ,
We consider the truncated process , where the minimum is taken component-wise. We define the small fluctuations event
There exists a constant independent of T such that, if , then .
The proof is somewhat long but straightforward. It relies on the concentration inequality in Lemma 2 and is given in Appendix B.4.
7.5. Deterministic Analysis of the Dynamics
We start with some definitions. It is useful to recall here that ϵ-JF trajectories start at zero just before time 0 but can become nonzero at time 0 or later because of jumps or unstable drifts.
For any nonnegative integer vector n, we define as the set of all points in that can be reached by some ϵ-JF trajectory.
Because ϵ-JF trajectories are by definition nonnegative, is a subset of . Note that the set depends on ϵ, but because ϵ is held fixed, we suppress this dependence from our notation.
(ϵ-Attracting and ϵ-Invariant Sets).
A subset W of is ϵ-attracting if every fluid trajectory corresponding to and initialized at some arbitrary satisfies
(22)whenever .A subset W of is ϵ-invariant if, for any that satisfies and any fluid trajectory corresponding to and initialized at some arbitrary , we have for all .
We observe that if W is ϵ-attracting and , then every fluid trajectory satisfies
It can be shown that an ϵ-attracting set is ϵ-invariant, but we do not need this fact. We are interested instead in the converse statement, that every ϵ-invariant set is ϵ-attracting, which we then apply to the set ; this is the subject of the next lemma.
Every ϵ-invariant set is ϵ-attracting.
For any , the set in Definition 6 is ϵ-invariant and, a fortiori, ϵ-attracting.
The proof is given in Appendix B.5 and relies on the intuition that, for an ϵ-invariant set W, ϵ-perturbations of cannot take the trajectories away from W. This means that there must be a drift toward W that locally counteracts such perturbations. The nonexpansive property of the max-weight dynamics then enables us to extend this local counteraction result into the desired global attraction property.
Let us fix a sample path of the arrival process. For any , let be the realized value of , that is, is the vector with the realized number of jumps in the process up to time t (see the paragraph preceding (18)). With a bit of abuse of notation, we let
(The reason for the t – 1 term on the right-hand side is that we want to compare and , but is only affected by jumps that happen before time t.)
7.6. Some More Intuition
The general idea is to show that stays close to the set so that we can ultimately exploit the fact that qm = 0 for every . There are two parts to the argument:
If, at a certain time, has a jump (i.e., a large increase), the set W(t) expands along the jth coordinate, and so the distance between and W(t) does not increase.
In between jumps, follows the fluid trajectory, plus some fluctuations, within the range allowed by Lemma 3. These fluctuations get “eliminated” because the fluid trajectory is attracted to W(t) (Lemma 4). Note that (21) allows for larger fluctuations in the far past (see the term ); however, for fluctuations in the far past, the motion in the direction of W(t) happens for a longer time period, enough to eliminate them. This explains the choice of the threshold θt in (11); the logarithmic term in the denominator is included for technical reasons.
7.7. The Distance from the Invariant Set
Given times that satisfy , we say that the interval (t0, t1) is jump-free if for all j and all . Note that the initial time t0 and the end time t1 are allowed to be jump times. Let be a constant, independent of T, such that, for any ,
Suppose that and fix times that satisfy . Consider a sample path under which the event occurs and the interval (t0, t1) is jump-free. Then,
Note that the lemma also applies when so that the set of integers in (t0, t1) is empty. The proof, which is given in Appendix B.6, relies on the sensitivity bound in Theorem 4 and the fact that is ϵ-attractive. The first term on the right-hand side reflects the fact that a fluid trajectory is attracted to during the jump-free interval; the second term reflects the effect of the smaller fluctuations during the interval (t0, t1).
We then apply Lemma 5 and use strong induction on t for to establish the next lemma. Its proof is given in Appendix B.7.
Suppose that . Consider a sample path under which the events and occur. Then, .
7.8. Bounding
Let , where M1, M2, and M3 are the constants in Lemma 1, Lemma 3, and (25), respectively. Because these constants are independent of T, the constant is also independent of T.
Let us consider some and a sample path under which the events and occur. In particular, we have , where n is the realized value of N, that is, the vector with the number of jumps until time T – 1 for that particular sample path. The ϵ-JF() condition, which we assume to hold, implies that every ϵ-JF(n) trajectory with keeps qm at zero, and therefore, every vector has qm = 0. Consequently, for every sample path in , we have
8. Proof of the Reverse Direction of Theorem 1 (RDS RJF)
In this section, we prove the reverse (“only if”) direction of Theorem 1, that is, that RDS implies that the ϵ-JF condition holds for some . The proof is organized in a sequence of lemmas whose proofs are collected in Appendix C.
We actually prove the contrapositive and start by assuming that, for every , there exists an ϵ-JF trajectory with and such that at some positive time t.
We keep , and ϵ fixed throughout the proof and show that there exists an arrival processes in the class for which queue m is not delay stable. Because ϵ can be arbitrarily small, this implies that there exists no δ such that queue m is delay stable for all arrival processes in the class , and therefore, queue m is not RDS. However, before proceeding to the formal arguments, we overview informally the key ideas in the proof.
8.1. Outline of the Proof
The main idea is to construct a certain arrival process in the class , whose arrival rate is a time-scaled version of the piecewise constant rate associated with the ϵ-JF(n) trajectory. We then use the bounded sensitivity property of the MW dynamics (Theorem 4) to show that the resulting process tracks a suitably scaled (by a factor of T) version of the ϵ-JF trajectory with substantial probability. In particular, is (with substantial probability) comparable to , leading to a large value of . This part of the argument capitalizes on the fact that the number of jumps of the ϵ-JF(n) trajectory is limited by the condition .
On the technical side, the tracking result involves two separate arguments:
Whenever the ϵ-JF(n) trajectory has a jump, at some time τ, there is substantial probability that the stochastic process also has a jump at some time near the scaled counterpart, , of τ; see Lemma 8.
In between jump times of the ϵ-JF(n) trajectory, we use concentration inequalities to show that there is a fairly large probability that the stochastic process stays close to its fluid counterpart.
8.2. Jumps of the ϵ-JF(n) Trajectory
We fix a network with nodes; a set of possible service vectors; an arrival rate vector ; a particular queue, m, of interest; and a vector of tail exponents with components in . We also fix some and assume that the ϵ-JF() condition fails to hold.
We start with a few elementary observations, namely, that the ϵ-JF(n) trajectories of interest can be taken, without loss of generality, through scaling and perturbations, to have some convenient properties that allow us to simplify subsequent notation and arguments. The proof is given in Appendix C.1.
If the ϵ-JF() condition fails to hold, then there exists some with and an ϵ-JF(n) trajectory with the following properties:
.
The times at which is discontinuous (the jump times) all belong to the open interval (0, 1).
At each jump time, exactly one of the components of is discontinuous.
The arrival rate associated to satisfies for all j.
For the rest of the proof, we fix an ϵ-JF(n) trajectory with the properties in Lemma 7, together with the associated vector n and rate function . We define , which is the total number of jumps.5
We define , and for , let Θk be the kth jump time. In particular,
We use jk to refer to the queue at which the kth jump occurs and ak to refer to the size of the kth jump. In particular, at time Θk, for is continuous for every , and
8.3. Defining Certain Constants
As in (16), we define
Moreover, similar to (13), we let
By arguing as in Claim 1, we can and do assume, without loss of generality, that . We then define a positive constant
(In case n = 0, the last term inside the brackets is taken to be zero.)
8.4. Defining the Stochastic Arrivals
Let us fix some constants and T > 0 and keep them fixed until the end of the proof in Section 8.7. In this section, we define a stochastic arrival process over an interval of the form , thus constructing what we call an episode of the overall process. Later, in Section 8.7, we concatenate multiple episodes to construct the stochastic process over the entire timeline .
Consider the arrival rate function of the ϵ-JF(n) trajectory; in particular, . The arrival rate vector for the stochastic process during the episode is a time-scaled (by a factor of T) and shifted (by t0) version of . More concretely, we let
Clearly,
Furthermore, it follows from Lemma 7(d) that for every j.
We now digress to introduce certain constants that are used to specify the exact form of the distribution of the arrival processes. For any , we let
(Arrivals During an Episode). Given T > 0 and , we define the arrival process over the interval (which we call an episode) as follows:
If , then (deterministically).
If , then is a random variable with probability density function
(38)where denotes the indicator function, is Dirac’s delta function, and is as defined in (36).The for different j and t are independent.
We refer to as an episode-adjusted arrival process.
From (37), we obtain , where the last inequality is due to (30) and (35). Therefore, the coefficient of the delta function is nonnegative, and we have a well-defined distribution. It is also easy to verify that . Furthermore, from (35), is bounded above. It follows that each is dominated by a random variable with tail exponent γj, and consequently, the process also has tail exponent γj. Moreover, . Therefore, using again (35), we have , for all . In particular, the process belongs to the class as desired.
8.5. Probabilistic Analysis
We aim to show that, during an episode and with significant probability, the queue vector process stays close to a scaled version of the ϵ-JF trajectory, that is, that
To accomplish this, we consider each interval of the form for and show that there is a substantial probability that the arrival process we have defined has the following properties: (a) as long as k > 0, it has a jump in a small segment in the beginning of the interval, and (b) it has small fluctuations in the rest of the interval. We define events that capture these two properties.
For , let be the cumulative arrival vector over the interval , that is,6
We let be the event that emulates the jump in at time Θk scaled by T, that is,
There exist and such that, if , then for and for the episode-adjusted arrival process over an episode , we have .
The proof is given in Appendix C.2.
Recall now that the function driving the trajectory is piecewise constant with a finite number of pieces. We use r to denote the number of such pieces. We then proceed to define certain small fluctuation events. For , we let be the event that the cumulative fluctuations of over the interval are small, that is,
There exists a such that, if , then for and for the episode-adjusted process over an episode , we have .
The proof is given in Appendix C.3.
Note that each one of the events for and for is determined by the arrival process during a particular interval and all of these intervals are disjoint. Thus, the independence assumption on the arrival process implies that all of these events are independent. Therefore, if , then
8.6. is Large at the End of an Episode
We are now ready to argue that, if the events and occur, then, over an episode , the process stays close to the suitably scaled ϵ-JF trajectory. As a consequence, the value of Qm at the end of the episode becomes of order cT, where as in (32). This argument is entirely deterministic. It is carried out in the course of the proof of the next lemma (in Appendix C.4) and relies on the sensitivity bound in Theorem 4.
There exists a such that, if , then the following holds. If a sample path of the episode-adjusted process over the episode satisfies the events and and if , then .
Let and , where , and are the constants in Lemmas 8–10, respectively. We note that all of these constants, , and therefore as well, are defined in terms of general parameters and properties of the particular ϵ-JF trajectory and are deterministic. Lemma 10 and (42) imply that, for an episode with and initialized so that , we have
8.7. Concatenating Episodes over the Entire Timeline
So far, we have defined and studied an arrival process over an episode . We now concatenate a sequence of such episodes of increasing duration, which defines an arrival process over an infinite timeline.
We define times and arrival processes for the intervals , recursively, as follows. We let and , where is defined in the last paragraph of Section 8.6. We also let for be the corresponding episode-adjusted process as in Definition 8. Suppose now that we have defined Ti for some as well as the arrival process for . We then let
Finally, we define the arrival process over the episode to be the corresponding episode-adjusted process. With this recursion, the arrival process is now well-defined for all times .
Note that . This guarantees that all Ti are finite and we have an infinite number of episodes. Moreover, note that, for , we have . As a result, , and also,
We define the event
Recall now that Inequality (44), which is about an episode of length T, makes use of the assumption , where t0 is the start time of the episode. According to (47), this assumption is satisfied at the start time of the episode with probability at least 1/2. By interpreting (44) as a statement about conditional expectations and with t0 and replaced by Ti and , respectively, we obtain
9. Proof of Theorem 2
9.1. Proof of the First Direction
Let us fix some . To establish one direction of the result, we assume that the ϵ-JF condition holds for every . We show that there exists a special ϵ-Lyapunov function.
Let be the set of all nonnegative integer vectors n such that nj = 0 for j > h; that is, we allow arbitrarily many jumps at the heavy-tailed queues and no jumps at the light-tailed ones. As in Definition 6, for any nonnegative integer vector n, let be the set of all points in that are reachable by ϵ-JF trajectories. Let and consider a Lyapunov function equal to the distance from W, that is, , for any . We show that this Lyapunov function has the desired properties.
The distance function is clearly Lipschitz continuous with a Lipschitz constant equal to one, which implies the first property in the definition of special ϵ-Lyapunov functions.
For the second property, Lemma 4(b) applies and shows that each set is ϵ-invariant. It can be seen that the union, W, of the ϵ-invariant sets is also ϵ-invariant. It then follows from Lemma 4(a) that W is ϵ-attracting. This proves the second property in Definition 5.
Note that every satisfies the inequality for some . Because the ϵ-JF condition holds for every , it follows that every ϵ-JF(n) trajectory with satisfies for all t. Hence, qm = 0 for all . Furthermore, because , we have . This establishes the third property in Definition 5.
Finally, W is closed under jumps along coordinates associated with heavy-tailed arrivals. Therefore, is nonincreasing along those directions, and the fourth property in Definition 5 follows. Thus, has all the required properties of special ϵ-Lyapunov functions. This completes the proof of one direction of the theorem.
9.2. Proof of the Reverse Direction
We continue with the proof of the reverse direction. We fix some and assume that there exists a special ϵ-Lyapunov function and let . The argument rests on the ϵ-invariance of W, which, in turn, relies on some properties of the MW dynamics that we discuss next.
For any , let
Here, the minimizer is unique.
Given a closed and convex set and a point , we denote by the projection of x on , defined as the point in that is closest to x. With this terminology, is the projection of the zero vector on the set .
In what follows, we also make use of an elementary property of projections: if is a closed convex set, b is some vector, and , then
As a consequence of this, for any , and x in ,
The set W is ϵ-invariant.
Because V is a special ϵ-Lyapunov function, it is Lipschitz continuous with Lipschitz constant one. Let be such that and consider fluid trajectories and corresponding to arrival rates and , respectively, initialized with the same nonnegative vector . Then,
This argument shows that, for any with and for any fluid trajectory corresponding to arrival rate , the distance from the set W cannot increase. In particular, if is initialized with , it must stay in W. Therefore, using the terminology in Definition 7, W is ϵ-invariant. □
From the third property in the definition of special ϵ-Lyapunov functions, we have and, therefore, . Thus, every ϵ-JF(n) trajectory starts in W. From the fourth property in the definition of special ϵ-Lyapunov functions, W is closed with respect to positive jumps along the coordinates associated with heavy-tailed arrivals. Using also Lemma 11 for the times between jumps, we see that, for any , every ϵ-JF(n) trajectory stays in W. Equivalently, for any , every ϵ-JF() trajectory stays in W. Finally, employing again the third property of special ϵ-Lyapunov functions, we have qm = 0 for all . This implies that for all ϵ-JF trajectories with and all . This establishes the ϵ-JF condition for all and completes the proof of Theorem 2.
9.3. Proof of Corollary 1
For part (a), fix some and suppose that there exists a special ϵ-Lyapunov function. Theorem 2 implies that the ϵ-JF() condition holds for every . It then follows from Theorem 1 that queue m is RDS for every .
For part (b), we fix some and assume that queue m is ϵ-RDS for all . In particular, queue m is RDS, and Theorem 1 implies that the -JF() condition holds for some . However, a close inspection of the proof of the reverse part of Theorem 1 reveals that we can in fact choose to be the same as ϵ. Thus, the ϵ-JF() condition holds, and Theorem 2 implies that there exists a special ϵ-Lyapunov function.
10. Discussion
In this section, we summarize some key points and conclude with a few open questions.
10.1. Framing and Results
We have addressed the problem of delay stability for a class of queueing networks that operate under the max-weight scheduling policy when some arrival processes are heavy-tailed and some are light-tailed. The overall purpose was to develop conditions for delay stability in terms of fluid-like models. However, as illustrated by the example in Section 3, delay instability can be the result of multiple coordinated large jumps. The probabilities of such large jumps are, in turn, affected by the tail exponents of the arrival processes. Given that traditional fluid models are oblivious to the tail exponents, we introduce JF models, a generalization that allows for jumps along the coordinates associated with heavy-tailed flows subject to a budget on the number of jumps with the budget being determined by the tail exponents.
At the same time, it became clear that tight conditions for delay stability that do not depend on the details of the arrival distributions are only possible under a suitable robust formulation with respect to both the arrival rates and the arrival process distributions. With a careful choice of definitions, we were finally able to establish necessary and sufficient conditions for robust delay stability in terms of ϵ-JF models.
In Section 3, we also discuss a related, so-called ZF condition. The ZF condition essentially examines fluid trajectories that start at zero and involve a single jump and leads to a necessary condition for delay stability (Markakis et al. 2018), but the question whether it can also form the basis for a sufficient condition was open. Our results show that, in order to obtain necessary and sufficient conditions, we need to examine a richer set of trajectories that involve multiple jumps.
Finally, earlier works (Markakis 2013, Markakis et al. 2018) show that Lyapunov functions with certain structural properties can yield sufficient conditions for delay stability. But it was not clear if and when delay stability is equivalent to the existence of such Lyapunov functions. Our results make progress toward establishing the completeness of such a Lyapunov-based methodology for the regime in which the heavy-tailed flows can have arbitrarily small tail exponents.
Our RJF condition is difficult to test for general networks. In some sense, this reflects the intrinsic complexity of the (robust) delay stability problem. The Lyapunov-based condition also appears to be hard to test for general networks.
10.2. Alternative Formulations
Given the complexity of the RJF condition, it is natural to inquire about simpler alternatives. For example, is it possible to obtain tight delay stability results (without robustness) if we consider JF models with a constant rate ? In the same spirit, can we restrict to the case in which all jumps in ϵ-JF models take place at the same time? Might it be easier to consider concrete arrival processes instead of focusing on delay stability for all arrival processes with given exponents?
For all three of these questions, the answer is negative. We discuss such variations and related (counter)examples in Appendix A.
10.3. Open Problems
We collect here a few open problems and possible future research directions.
Can the results be generalized to other scheduling policies, for example, MW-α policies (Neely 2010), an extension of the MW policy considered in this paper, or more generally to other stochastic networks whose stability has been studied using fluid models? One obstacle here is that our main result relies heavily on a particular fluctuation bound, which is established specifically for the MW dynamics (Sharifnassab et al. 2020). However, progress may be possible if we rely on alternative stochastic bounding techniques.
Can we identify some special classes of networks for which our criteria (either the RJF condition or the Luyapunov-based condition) can be tested in polynomial time?
Is the RJF condition, which involves time-varying with equivalent to a similar condition in which we only consider ϵ-JF trajectories with time-invariant with ? See Appendix A for further discussion and some conjectures.
We are grateful to Dr. Bert Zwart for providing us with pointers to relevant works in the literature and for informative technical discussions. Various stages of this work were completed while A. Sharifnassab was affiliated with the Massachusetts Institute of Technology, Sharif University of Technology, and the University of Alberta.
Appendix A. Exploring Alternative Formulations
Our formulation involves rate-robustness (robustness with respect to variations in the arrival rate) as well as distributional robustness (by considering the worst case over all distributions with given tail exponents). It would be preferable to develop conditions that characterize delay stability for specific systems (with fixed arrival rates and arrival distributions). However, this seems to be impossible for reasons that become clearer in this appendix. In particular, the distributional robustness aspect appears to be inevitable as long as we are aiming at conditions that are both necessary and sufficient; see Appendix A.4. For this reason, most of this appendix is devoted to exploring variants of rate robustness. In the interest of brevity, we keep the discussion informal without rigorous proofs.
A.1. Variations of Our Definitions
In this section, we present a number of variations to our definitions of RDS and the RJF condition. In later sections, we elaborate on their relations. Throughout this appendix, we assume that the tail exponent vector is fixed with every γj in . We also fix some . The various definitions that we offer differ only with respect to the choice of allowed functions .
Let be a class of discrete-time functions . We say that queue m is -RDS if it obeys Definition 1 except that the allowed arrival rates are also required to belong to .
We consider the following choices for , leading to three alternative definitions of robust delay stability, namely, G-RDS, C-RDS, and 0-RDS:
G (general): Here, we impose no additional restrictions on . Thus, G-RDS is identical to the RDS condition that we study.
C (constant): Here, we require to be constant. Effectively, we are considering small but constant perturbations of .
0 (zero): Here, we require to be equal to for all times t.
We continue similarly to define variants of the RJF condition. Let be a class of continuous-time functions . The -RJF condition is defined exactly as in Definition 4 except that we only consider ϵ-JF trajectories for which the rate function is also required to belong to the class . We consider four possible choices for , leading to four variants of the RJF condition, namely, UC-RJF, PC-RJF, C-RJF, 0-RJF:
UC (uniformly continuous): Here, we remove the requirement in Definition 3 that be piecewise constant. Instead, we require to be (i) piecewise continuous with a finite number of discontinuities and (ii) uniformly continuous on any interval in which it is continuous.
PC (piecewise constant): Here, is exactly as in Definition 3, and in particular, piecewise constant. Thus, the PC-RJF condition coincides with the RJF condition we are studying.
C (constant): Here, we require to be constant. Effectively, we are considering small but constant perturbations of .
0 (zero): Here, we require to be equal to for all times t.
A.2. Relations Between Alternative Definitions
In this section, we explore the relation between -RDS and -RJF conditions for different choices of F; see Figure A.1 for a visual summary.

It is clear that, when we restrict to a smaller class, the RDS or RJF conditions are easier to satisfy. Thus,
Furthermore, Theorem 1 establishes that G-RDS is equivalent to PC-RJF.
A.2.1. PC-RJF UC-RJF.
An arbitrary (continuous-time) function in the class UC can be approximated by a piecewise constant function with finitely many pieces uniformly over a compact set. Furthermore, it can be shown that, if we perturb by ϵ the vector that drives a JF-trajectory, the resulting trajectory is perturbed by at most ϵ over a time interval of length one. It follows that, if the UC-RJF condition fails, we can construct piecewise-constant approximations of that demonstrate that the PC-RJF condition also fails. Therefore, PC-RJFUC-RJF.
Taking Theorem 1 also into account, we see that all three conditions, G-RDS, PC-RJF, and UC-RJF, are equivalent. An alternative path to the same conclusion consists of modifying the proof in Section 8 and showing that G-RDSUC-RJF. This is possible, but quite tedious.
A.2.2. (C-RDS C-RJF) and (0-RDS 0-RJF).
These two implications are true because the proof in Section 8 applies verbatim. Indeed, if we assume that C-RJF fails to hold, we start with a trajectory that is driven by a constant rate and drives queue m to a positive value. The construction of the arrival process in Section 8.4 yields a process with a constant rate . Thus, the same proof establishes that failure of the C-RJF condition leads to failure of C-RDS; equivalently, C-RDS implies C-RJF. The argument that 0-RDS0-RJF is the same.
A.2.3. 0-RJF 0-RDS.
This fact exemplifies the difficulty of obtaining necessary and sufficient conditions in the absence of robustness considerations with respect to the arrival rates.
The argument is simple. Consider a single queue that is served at unit rate and let . Suppose that the tail exponent is larger than one so that no jumps are allowed. In that case, there is only one possible JF trajectory, which obeys . When initialized at zero, the JF trajectory stays at zero. Thus, the 0-RJF condition holds for the single queue of interest. On the other hand, as long as the arrivals are not deterministic, the stochastic system is marginally unstable, the expected queue length grows to infinity, and the 0-RDS condition does not hold.
This example involves a system operating at the boundary of its capacity region (marginally unstable). We can also construct simple examples (involving two queues) in which the system operates in the interior of the stability region, is stable and satisfies the 0-RJF condition but is not 0-RDS. Such an example (which we omit) involves a system that operates at the threshold between robust delay stability and instability.
A.2.4. 0-RJF C-RJF.
This is again a simple observation. Consider the same single-queue system as in the previous paragraph with . As long as the rate is fixed at one, the JF trajectory stays at zero, and the 0-RJF condition holds. On the other hand, a small constant perturbation that results in yields a divergent JF trajectory, and therefore, the C-RJF condition does not hold.
A.3. Conjectures and Open Problems
We list here a number of questions and conjectures.
A.3.1. 0-RDS C-RDS.
We conjecture that, when ,7 0-RDS implies C-RDS. Ultimately, this amounts to showing that the set of positive arrival rate vectors for which the system is delay stable (robustly over all distributions with given tail exponents) is open. The rationale behind this conjecture is that, in more standard settings (ordinary stability) the set of positive vectors that lead to a stable system is open.
A.3.2. C-RJF PC-RJF.
We conjecture that this implication is true although we do not see how to establish it. If it is true, it would follow from the diagram in Figure A.1 that C-RJF and C-RDS are equivalent to G-RDS, UC-RJF, and PC-RJF.
An indirect approach to establishing the conjecture is to show that (i) C-RJF C-RDS and (ii) C-RDS G-RDS. However, this appears to be difficult. Our proof that PC-RJF implies G-RDS involves the set W of points reachable by ϵ-JF trajectories; see Definition 6. However, when we restrict to be constant, this set is no longer ϵ-invariant, and Lemma 4(ii) fails to go through.
A.3.3. Generic Considerations.
A fundamental reason behind the mismatch between 0-RJF and G-RDS is that, at least for simple examples, the set of nonnegative nominal rates for which 0-RJF holds is closed, whereas the set of positive nominal rates for which G-RDS holds is open. It is conceivable, however, that one set is the closure of the other and the difference between the two sets is just a lower dimensional boundary. This leads us to the conjecture that 0-RJF and G-RDS are generically equivalent.
Let us fix a network and some . The set of nonnegative nominal arrival vectors for which the 0-RJF condition holds but G-RDS does not hold has zero Lebesgue measure.
A.4. The Details of the Arrival Distribution May Matter
Our discussion so far has been about distributionally robust results, dealing with delay stability for all arrival distributions with the given tail exponents . The reason for this was that JF models cannot take into account any further properties of these distributions.
Once we start inquiring about delay stability for a fixed, fully specified system, the situation is more complex: necessary and sufficient conditions for delay stability appear to be impossible. We illustrate the situation by stating a positive result and disciussing the obstacles in establishing a converse.
A.4.1. Delay Stability Implies the 0-RJF Condition Under a Regularity Assumption.
Suppose that a particular system (with a constant arrival rate and given, i.i.d. arrival distributions) is delay stable. Suppose, furthermore, that the distribution of each satisfies (3) with γ replaced by the appropriate γj. Then, it can be shown that the 0-RJF condition holds. The argument involves similar ideas as the proof in Section 8. That is, we can show that the stochastic system can track an ϵ-JF trajectory with significant probability.
Note, however, that the 0-RJF condition does not imply delay stability,even under such a regularity assumption. The argument is the same as in our earlier example that showed that the 0-RJF condition does not imply the 0-RDS condition.
A.4.2. Without a Regularity Assumption, Delay Stability Need Not Imply the 0-RJF Condition.
In contrast to the aforementioned result, we have strong reasons to conjecture that there exist systems that are delay stable, and yet, the 0-RJF condition fails to hold. The intuition behind this conjecture is as follows.
Consider a system with two heavy-tailed arrival streams together with some light-tailed ones. Suppose that the tail exponents of the heavy-tailed arrivals are larger than 1/3. We can arrange the system so that a JF trajectory drives the light-tailed queue of interest to a positive value if and only if we have one jump at each heavy-tailed queue, the two jump times are approximately equal, and the jump sizes are comparable (within a constant factor of each other). Such a system does not satisfy the 0-RJF condition.
As in the proof in Section 8, we might expect that the stochastic system can track this JF trajectory. However, we can arrange the arrival process distributions for the two heavy-tailed queues to be such that their supports are wide apart. For example, one distribution may be supported on integers of the form and the other on integers of the form . In that case, equal-size jumps are essentially impossible. As a consequence, the stochastic system should be unable to emulate the JF trajectory, the instability mechanism suggested by the JF trajectory need not be present, and the queue of interest may turn out to be delay stable.
A.5. The Timing of the Jumps
The definition of ϵ-JF trajectories allows for jumps at different times. On the other hand, our examples so far rely on jumps that happen simultaneously. This raises the question whether the RJF condition is equivalent to an analogous condition in which we only consider trajectories with simultaneous jumps. It turns out that this is not possible. We give an example with four queues in which an ϵ-JF trajectory drives a certain queue to a positive value, but this is only possible if we allow jumps to occur at different times.
Consider the system in Figure A.2. The first queue receives heavy-tailed arrivals with , whereas the three other queues receive light-tailed arrivals. There are three possible service vectors as shown in the figure, and the arrival rate vector is . Note that the condition allows up to two jumps at queue 1. If we restrict to simultaneous jumps, this essentially limits us to a single jump at queue 1.

Notes. In the example of this figure, if q1 undergoes a single jump at time 0, q4 stays at zero. However, two jumps in q1 at suitably arranged times can result in a positive q4.
Suppose that q1 has a jump of size 27 at time 0. Then, the 0-JF trajectory is piecewise linear with breakpoints , and . This can be easily verified by noticing that the MW policy chooses service vector for and service vector for , and the service capacity is split between and with ratios 5/8 and 3/8 for . It then follows from the form of this piecewise linear fluid trajectory that, given a single jump at time 0, q4 stays at zero for all subsequent times. We now argue that this is not the case if q1 undergoes two jumps at different times.
Suppose that q1 has a jump of size 27 at time 0 and a jump of size 2 at time 5. Let be the associated jumping fluid trajectory. Then, right before time 5, we have . Therefore, after the second jump, we have . In this case, is the dominant service vector for some positive time interval starting from time 5. Because q4 receives no service under , it starts to build up and become positive.
In this example, the 0-RJF condition fails to hold, and the system is not 0-RDS. On the other hand, if we were to restrict to simultaneous jumps, we would not be able to tell that this is the case. Finally, using the same example, we see that we should also consider nonsimultaneous jumps when examining the C-RJF or PC-RJF conditions.
Appendix B. Proofs of Lemmas for the First Direction of Theorem 1 (ϵ-JF RDS)
B.1. Proof of Theorem 4
We compare the fluid trajectory , which is initialized with , with another fluid trajectory , initialized with . From the triangle inequality,
We apply Theorem 3 to bound the first term on the right-hand side. Because of the nonexpansive property of the MW dynamics, we also have
B.2. Proof of Lemma 1
Let us fix T throughout this proof. Recall the constant introduced in the context of (19). For , we define
We then let , and . Note that, in all cases, we have so that . As argued in Claim 1, we can and do (without loss of generality) assume that , and thus, . Finally, in view of (19), it can be seen that, for any nonnegative integer vector n,
For every j and according to our definition (5) of the tail exponent of an arrival process, there is a random variable that dominates for all and for which all moments of order less than are finite; see (3). We define
For , and , let
For any j and , the Markov inequality yields
Let and note that . Because for every j, there exists some such that, if , then
We fix such an M1. Then, for , we have
Let be the event . Recall that Nj stands for the number of jumps at queue j during the interval . For any j and any nonnegative integer n, we have
Here, the first inequality follows from the union bound, the equality is from the independence of the events , and the last inequality is due to (B.5). Therefore, for any ,
Hence,
B.3. Proof of Lemma 2
The Bernstein inequality (see, e.g., (1.21) in appendix 1 of Anthony and Bartlett 2009) asserts that, for any , we have
We note that . Plugging this into (B.10), we obtain
This implies (20) and completes the proof of Lemma 2.
B.4. Proof of Lemma 3
For every j, and according to our definition (5) of the tail exponent of an arrival process, there is a random variable that dominates for all and for which all moments of order less than are finite; see (3). Because , we have . Therefore, there exists a constant such that, for any and every j,
Note that the choice of M2 is independent of T. For the rest of the proof, we fix M2 and assume that .
Recall that so that
Therefore,
We now introduce some “simpler” events whose occurrence are shown to imply the event that is introduced in (21):
We now argue that the event implies the event defined in (21). Indeed, suppose that event occurs. Then, as long as , we obtain
To complete the proof, we derive an upper bound for . As a first step, we obtain a relation between various constants, which reflects the fact that η is chosen large enough. We have
Finally, we note that, for any and every j,
For any t0 and t with , using the fact that is bounded above by , we have
B.5. Proof of Lemma 4
We start with the proof of the first part and assume that W is ϵ-invariant. We prove the result under the additional assumption that the set W is closed. This is without loss of generality for the following reason. Given an ϵ-invariant set W, let be its closure. Because fluid trajectories under the MW policy are continuous functions of their initial conditions, it is not hard to see that is also ϵ-invariant. Once we show the result for closed sets, we have established that is ϵ-attracting. Finally, because for all x, we can conclude that W is also ϵ-attracting.
Having assumed that W is closed, we now consider a fluid trajectory initialized with with . Then, there exists some , which is closest to . Let be a fluid trajectory initialized at and corresponding to the rate vector
Because W is ϵ-invariant and , we have for all . In particular, for all . Furthermore, equality holds at time t = 0. This implies that
From the fluid equations (see Definition 2), we have . Equivalently, there exist coefficients for such that and
Similarly, let for be a set of coefficients such that and
For any and any because, by definition, is a maximizer of over all , we have . Similarly, . Combining these two inequalities, we obtain
Because , it follows that
Using (B.26), we have
Here, the second equality follows from (B.27) and (B.28), the second inequality is due to (B.30), and the fourth equality uses the definition of in (B.25). Thus, W is ϵ-attracting.
For the proof of the second part, we fix some n and consider the set as in Definition 6. Let x be an element of . Then, there exists an ϵ-JF(n) trajectory that reaches x. Consider now a fluid trajectory , corresponding to , for some with and initialized at x. The concatenation of the trajectories and is an ϵ-JF(n) trajectory, and therefore, any point that it can reach also belongs to . Thus, is ϵ-invariant as claimed.
B.6. Proof of Lemma 5
As in the statement of the lemma, we fix some and times that satisfy . We also fix a sample path under which the event occurs, and the interval (t0, t1) is jump-free.
Let be a fluid trajectory corresponding to the arrival rate and initialized with . Then,
Moreover, from Lemma 4, is ϵ-attracting. Furthermore, because (t0, t1) is a jump-free interval, we have . Therefore,
We now consider the effect of possible jumps at time t0. Let be the number of jumps that occur at time t0 in different entries of . Without loss of generality, suppose that undergo a jump at time t0. Let J be an -dimensional vector with the first k entries equal to and all other entries equal to zero. Then, is an -dimensional vector whose first k entries are zero; all other entries are jump-free and are, therefore, bounded by . Thus,
A key consequence of our definition of the sets W(t) is that if t0 is a jump time, then some components of the vector are larger than those of so that the set is larger than the set . In particular, if , then . Now, let x be a point in the closure of that is closest to , that is,
Because , it follows that . Therefore,
Combining (B.32), (B.33), and (B.35), we obtain, for ,
B.7. Proof of Lemma 6
Let N(t) be the cardinality of , that is, the number of jumps up to time t. We also use the convention . When the sample path is such that the event occurs, then for all t < T. Thus, for any ,
We fix a sample path of the arrival process under which both events and occur. We use strong induction to prove that, for any ,
To establish the base case of the induction, we use the assumption and the fact that (because ϵ-JF trajectories are zero for negative times). Thus, , and therefore, (B.38) holds for t = 0.
For the induction step, we consider a time and assume that (B.38) holds for all . We show that (B.38) also holds for time t1. Let
Note that either or so that we always have . We consider two cases.
For the first case, we assume that the interval is jump-free.9 We consider two subcases. If , then
For the second subcase, we assume that , in which case, . Then,
For the second case, we assume that there is a jump in the interval . Let be the last jump time in the interval so that is jump-free. Because is the last jump time, we have
Finally, letting t = T, (B.38) becomes
Appendix C. Proof of Lemmas for the Second Direction of Theorem 1: RDS-JF Condition
C.1. Proof of Lemma 7
Suppose that the ϵ-JF() condition fails to hold. Then, there exists a nonnegative integer vector with , an ϵ-JF() trajectory , and some time T such that . If T = 0, we can use right-continuity of trajectories to see that, without loss of generality, we can take T to be positive. We then define a new trajectory by letting for all . It is not hard to verify that is also an ϵ-JF() trajectory, and satisfies so that the first property is satisfied.
Suppose now that some of the jumps of happen after time 1. Let n be the vector that counts the number of jumps that take place until time 1. Starting with , we eliminate the jumps that happen after time 1 to obtain an ϵ-JF(n) trajectory with , all jumps in , and . By slightly perturbing the jump times and using a continuity argument, we can ensure that no two components have simultaneous jumps and also that all jump times belong to (0, 1) so that properties (b) and (c) in Lemma 7 are satisfied.
Finally, we can replace the arrival rates that drive the ϵ-JF trajectory by , where is a small positive constant. This ensures that . Furthermore, using a continuity argument and as long as is small enough, the property is preserved. This proves condition (d) in Lemma 7 and concludes the proof of the lemma.
C.2. Proof of Lemma 8
For simplicity, and without loss of generality, we present the proof for the case in which . We let be a large enough constant such that, for all and all k, we have
This is possible because, according to the definition of d in (33), we have .
We fix some as well as some . We aim to show that the process has a substantial probability of a jump of size approximately akT during the interval . Within the proof of this lemma, we use the symbol j (instead of jk) to denote the index of the queue at which the kth jump took place.
From (C.1), we have . We then obtain, for any t, any j, and any ,
We define . Then,
As in (39), but with , we define
For , let be the event that and . Note that the term is omitted from Ut, and therefore, and Ut are independent. Thus, using (C.3) and (C.6), we obtain
In light of the definition of d in (33), we have . Therefore,
Thus, for any with , if , then
Here, the first inequality follows because, if , then is one of the summands in the definition (C.5) of , and the last inequality is due to (C.8). Consequently, if , then holds for no with , that is, for , the events and are disjoint.
We now claim that, for any , the event implies the event defined in (40). Indeed, when event occurs, we obtain
C.3. Proof of Lemma 9
This proof is similar to the proof of Lemma 3 in Appendix B.4 although some of the details are different.
Recall that r is the number of piecewise constant pieces in the trajectory . We fix some and let
There exists a such that, for any , any j, and any , we have
Because all γj are positive, we can fix some δ such that for every j. Then, the density in Definition 8 decays at least as fast as . More concretely, there exists a constant χ such that for all j and t, we have
We then have, for any time , any , and any j,
It then follows that, as T goes to infinity, both and go to zero uniformly over all j and τ. Therefore, there exists a such that, for any , (C.15) and (C.14) hold, and the claim follows. □
For the rest of the proof, we fix such a and assume that . For any and every j, let
We define the “no jumps” event as follows:
Then,
Consider now the following events:
Note that, if all of the events occur, then also occurs. By applying the union bound to the complement of these events, we have
Consider a sample path under which the events and occur. Then, for any ,
For any and every j and as in (B.22), we have
For any , we have
C.4. Proof of Lemma 10
The proof of this lemma is essentially a continuity argument together with an induction that pieces together different time segments.
For simplicity and without loss of generality, we assume that the constant t0 in the statement of the lemma is equal to zero. Note, however, that, with this convention is, in general, nonzero.
For any , let
Noting also that , we obtain
For , let , where the minimum is taken component-wise; thus, is the actual service received at time τ. It then follows from (1) that
We start by considering the intervals associated with jumps for . We are working with a sample path for which the event occurs, Therefore,
Combining (C.33) with (C.31), it follows that, for ,
We have so far established that, if the two trajectories and are close at the beginning of an interval , they are also close at the end. We now need to establish a similar conclusion over intervals of the form . We wish to capitalize on Theorem 4. That result, however, refers to fluid models with constant arrival rates. In contrast, our stochastic process has a piecewise constant arrival rate, and the same holds for the associated JF trajectory. We deal with this issue by applying Theorem 4 repeatedly for each one of the piecewise constant segments.
Let us fix some and recall that r is an upper bound on the number of subintervals in during which stays constant. Let us then fix some times θi for such that
Under our assumption that the sample path satisfies the event , we see that, during each one of the intervals and for , we have
We now note that the function defined by is also an ϵ-JF trajectory and, in particular, is a fluid trajectory during each interval . We apply Theorem 4 over this interval. Using also the fact that , we obtain
By summing such inequalities, for and using the facts , and , we obtain
We now add (C.34) and (C.37) to obtain
We finally sum these bounds for and use the fact to conclude that
The interval is to be treated a little differently as is not a jump time. Even so, the argument leading to (C.37) applies verbatim and shows that
We now let be large enough so that, for any , we have . As long as and using the assumption , we obtain
Finally, using , we have
1 This phenomenon can arise under other scheduling policies as well, for example, the generalized processor sharing policy (Borst et al. 2003).
2 Strictly speaking, the results in Section 5 only establish the absence of robust delay stability, not delay instability for the specific arrival process distributions of our example. However, a slight modification of the proof in Section 8 shows that queue 3 is indeed delay unstable.
3 Recall our assumption in Section 2 that, for any , the set also contains all vectors that result from setting some entries of equal to zero. It is shown in proposition 2 of Sharifnassab et al. (2020) that, under this assumption, the fluid model of Definition 2 is equivalent to the more standard, albeit more complicated, definitions of fluid models based on differential equations with boundary conditions.
4 If and nj = 0, we use the convention .
5 We note that n may be equal to zero. For example, for a single unstable queue, an ϵ-JF trajectory becomes positive even in the absence of jumps. Such a system is not RDS, consistent with our result.
6 To avoid notation clutter, we present the proof as if or were integer, which is not necessarily the case. Everything goes through, with occasional trivial modifications, if a sum of the form is interpreted as .
7 The reason for the condition is that, if , then a system is trivially 0-RDS, but a small perturbation that leads to positive arrival rates can result in a delay unstable system.
8 In case , the interval (t0, t1) is empty, and the term involving a maximum over is interpreted as zero.
9 This includes the case in which so that the interval (t0, t1) is empty.
References
- (1989) How large delays build up in a GI/G/1 queue. Queueing Systems 5(4):345–367.Google Scholar
- (2009) Neural Network Learning: Theoretical Foundations (Cambridge University Press, New York).Google Scholar
- (1996) Rare events in the presence of heavy tails. Glasserman P, Sigman K, Yao DD, eds. Stochastic Networks (Springer, New York), 197–214.Google Scholar
- (2001) Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Probab. 11(4):1384–1428.Google Scholar
- (2003) Reduced-load equivalence and induced burstiness in GPS queues with long-tailed traffic flows. Queueing Systems 43(4):273–306.Google Scholar
- (2019) Efficient rare-event simulation for multiple jump events in regularly varying random walks and compound Poisson processes. Math. Oper. Res. 44(3):919–942.Link, Google Scholar
- (1980) Conditioned limit theorems for random walks with negative drift. Zeitschrift Wahrscheinlichkeitstheorie Verwandte Gebiete 52(3):277–287.Google Scholar
- (2012) On large delays in multi-server queues with heavy tails. Math. Oper. Res. 37(2):201–218.Link, Google Scholar
- (2011) An Introduction to Heavy-Tailed and Subexponential Distributions (Springer, New York).Google Scholar
- (2006) Resource Allocation and Cross-Layer Control in Wireless Networks (Now Publishers Inc., Hanover, MA).Google Scholar
- (2003) Asymptotic loss probability in a finite buffer fluid queue with hetergeneous heavy-tailed on–off processes. Ann. Appl. Probab. 13(2):576–603.Google Scholar
- (2016) Optimal heavy-traffic queue length scaling in an incompletely saturated switch. ACM SIGMETRICS Performance Evaluation Rev. 44(1):13–24.Google Scholar
- (2013) Scheduling in switched queueing networks with heavy-tailed traffic. Unpublished PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
- (2014) Max-weight scheduling in queueing networks with heavy-tailed traffic. IEEE/ACM Trans. Networking 22(1):257–270.Google Scholar
- (2018) Delay analysis of the max-weight policy under heavy-tailed traffic via fluid approximations. Math. Oper. Res. 43(2):460–493.Link, Google Scholar
- (2015) When heavy-tailed and light-tailed flows compete: The response time tail under generalized max-weight scheduling. IEEE/ACM Trans. Networking 24(2):982–995.Google Scholar
- (2020) The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation (Cambridge University Press, Cambridge, UK).Google Scholar
- (2010) Stochastic Network Optimization with Application to Communication and Queueing Systems. Ying L, ed., Synthesis Lectures on Communication Networks and Algorithms 3.1 (Springer Nature, Cham, Switzerland).Google Scholar
- (1975) On the tails of waiting-time distributions. J. Appl. Probab. 12(3):555–564.Google Scholar
- (2012) Switched networks with maximum weight policies: Fluid approximation and multiplicative state space collapse. Ann. Appl. Probab. 22(1):70–127.Google Scholar
- (2019) Sensitivity to cumulative perturbations for a class of piecewise constant hybrid systems. IEEE Trans. Automatic Control 65(3):1057–1072.Google Scholar
- (2020) Fluctuation bounds for the max-weight policy with applications to state space collapse. Stochastic Systems 10(3):223–250.Link, Google Scholar
- (1992) Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Trans. Automatic Control 37(12):1936–1948.Google Scholar
- (1977) Asymptotic behaviour of Wiener-Hopf factors of a random walk. Stochastic Processes Appl. 5(1):27–37.Google Scholar
- (2004) Exact asymptotics for fluid queues fed by multiple heavy-tailed on-off flows. Ann. Appl. Probab. 14(2):903–957.Google Scholar

