On Queue-Size Scaling for Input-Queued Switches

We study the optimal scaling of the expected total queue size in an $n\times n$ input-queued switch, as a function of the number of ports $n$ and the load factor $\rho$; this scaling has been conjectured to be $\Theta(n/(1-\rho))$. In recent work, the validity of this conjecture was established for the regime where $1-\rho = O(1/n^2)$. In this paper, we make further progress in the direction of this conjecture. We provide a new class of scheduling policies under which the expected total queue size scales as $O(n^{1.5}(1-\rho)^{-1}\log(1/(1-\rho)))$ when $1-\rho = O(1/n)$. This is an improvement over the state of the art; for example, for $\rho = 1 - 1/n$ the best known bound was $O(n^3)$, while ours is $O(n^{2.5}\log n)$.


1. Introduction. An input-queued switch is a popular and commercially available architecture for scheduling data packets in an internet router. In general, an input-queued switch maintains a number of virtual queues to which packets arrive. Packets to be served at each time slot are selected according to a scheduling policy, subject to system constraints that specify which queues can be served simultaneously.
The input-queued switch model is an important example of so-called "stochastic processing networks," formalized by Harrison [4, 5], which have become a canonical model of a variety of dynamic resource allocation scenarios. While the most basic questions concerning throughput and stability are relatively well understood for general stochastic processing networks (see, e.g., [9], [7], [6], [2], [16], [10], [17]), much less is known on the subject of more refined performance measures (e.g., results on the distribution and scaling of queue sizes).

[Table 1: Best known scalings of the expected total queue size in various regimes. Here, $\rho$ is the load factor and $n$ is the number of input ports. Columns: Regime, Scaling, References.]

A batching policy collects arriving packets into batches and computes schedules based on the contents of the entire batch. By choosing the batch length large enough (deterministically or randomly), the total number of arriving packets is close to its expected value and can be served efficiently. In general, a longer batching interval improves efficiency, because the effect of random fluctuations is less pronounced, but on the other hand it leads to larger delays and queue sizes.

For this reason, a good batching policy, as for example in [11], selects the smallest possible batch length that will guarantee stability; in [11], this led to a bound of $O(n\log n/(1-\rho)^2)$ on the expected total queue size. Given the stability requirement, we cannot hope to improve delay by reducing the batch length. On the other hand, the policy that we consider starts serving packets from a given batch much earlier, before the arrival of the entire batch. By starting to serve early, the expected delay (and hence queue size) is reduced. When the arrival rates at each queue are all equal, we show that the arrival process has sufficient regularity at a time scale shorter than the batch length. Consequently, the policy can indeed start serving the arriving packets early, while making sure that the stochastic fluctuations lead to only a small number of unserved packets, which can be "cleared" efficiently at the end of the batch. The combination of these ideas results in a substantial improvement over the standard batching policy.
A few remarks are in order regarding the proposed policy. Our policy relies on the assumption of uniform arrival rates. In contrast, some existing policies, such as the maximum weight policy or the one in [14], are based only on the observed system state (the queue sizes) and are effective even with non-uniform arrival rates. However, we believe that our policy and its analysis can be modified to account for general (non-uniform) arrival rates.
1.1. Organization. The rest of the paper is organized as follows. In Section 2, we describe the input-queued switch model. In Section 3, we state our main theorem. In Section 4, we introduce some preliminary facts and theorems, which will be used in later sections. In Section 5, we describe our policy. In Section 6, we provide the proof of the main theorem. We conclude with some discussion in Section 7.
2. Input-queued switch model. An $n \times n$ input-queued switch has $n$ input ports and $n$ output ports. The switch operates in discrete time, indexed by $\tau \in \{1, 2, \ldots\}$. In each time slot, and for each port pair $(i, j)$, a unit-sized packet may arrive at input port $i$ destined for output port $j$, according to an exogenous arrival process. Let $A_{i,j}(\tau)$ denote the cumulative number of such arriving packets during time slots $1, \ldots, \tau$. We assume that the processes $A_{i,j}(\cdot)$ are independent for different pairs $(i, j)$. Furthermore, for every input-output pair $(i, j)$, $\{A_{i,j}(\tau) - A_{i,j}(\tau-1)\}_{\tau \in \mathbb{N}}$ is a Bernoulli process with parameter $\rho/n$, with the convention that $A_{i,j}(0) = 0$. In particular, $\mathbb{E}[A_{i,j}(\tau)] = (\rho/n)\,\tau$, for all $i, j$, and all $\tau \geq 1$.
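As a sanity check on the arrival model, the following minimal sketch simulates the $n^2$ independent Bernoulli arrival processes and compares the empirical mean of the cumulative counts with $(\rho/n)\,\tau$. The function name and parameters are ours, for illustration only.

```python
import random

def simulate_arrivals(n, rho, T, seed=0):
    """Simulate cumulative Bernoulli(rho/n) arrival counts A[i][j] over T slots.

    Each (input, output) pair receives a packet independently with
    probability rho/n in every time slot, as in the model above.
    """
    rng = random.Random(seed)
    p = rho / n
    A = [[0] * n for _ in range(n)]
    for _ in range(T):
        for i in range(n):
            for j in range(n):
                if rng.random() < p:
                    A[i][j] += 1
    return A

# Empirical check of E[A_ij(T)] = (rho/n) * T, here 1125.
n, rho, T = 4, 0.9, 5000
A = simulate_arrivals(n, rho, T)
mean = sum(sum(row) for row in A) / (n * n)
```

Averaged over the 16 queues, the empirical mean concentrates tightly around $(\rho/n)T = 1125$, in line with the Chernoff-type concentration used later in the analysis.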
We are only interested in systems that can be made stable under a suitable policy, and for this reason we assume that $\rho < 1$, i.e., that the system is underloaded. Furthermore, we consider a system load $\rho$ of the form $\rho = 1 - 1/f_n$, where the sequence $\{f_n\}$ satisfies $f_n \geq n$ for all $n$.
For every input-output pair $(i, j)$, the associated arriving packets are stored in separate queues, so that we have a total of $n^2$ queues. Let $Q_{i,j}(\tau)$ be the number of packets waiting at input port $i$, destined for output port $j$, at the beginning of time slot $\tau$.
In each time slot, the switch can transmit a number of packets from input ports to output ports, subject to the following two constraints: (i) each input port can transmit at most one packet; and (ii) each output port can receive at most one packet. In other words, the actions of a switch at a particular time slot constitute a matching between input and output ports.
A matching, or schedule, can be described by an array $\sigma \in \{0, 1\}^{n \times n}$, where $\sigma_{i,j} = 1$ if input port $i$ is matched to output port $j$, and $\sigma_{i,j} = 0$ otherwise. Thus, at any given time, the set of all feasible schedules is
$$\mathcal{S} = \Big\{\sigma \in \{0,1\}^{n\times n} : \sum_{j=1}^n \sigma_{i,j} \leq 1 \ \ \forall i, \quad \sum_{i=1}^n \sigma_{i,j} \leq 1 \ \ \forall j\Big\}.$$
A scheduling policy (or simply policy) is a rule that, at any given time $\tau$, chooses a schedule $\sigma(\tau) = [\sigma_{i,j}(\tau)] \in \mathcal{S}$, based on the past history and the current queue sizes $Q_{i,j}(\tau)$. If $\sigma_{i,j}(\tau) = 1$ and $Q_{i,j}(\tau) > 0$, then one packet is removed from the queue associated with the pair $(i, j)$.
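The feasibility constraints (at most one packet per input port and per output port) amount to checking that every row sum and column sum of $\sigma$ is at most 1. A minimal sketch:

```python
def is_feasible(sigma):
    """Check that a 0/1 array is a valid schedule: each input port
    transmits at most one packet (row sums <= 1) and each output
    port receives at most one packet (column sums <= 1)."""
    n = len(sigma)
    row_ok = all(sum(sigma[i]) <= 1 for i in range(n))
    col_ok = all(sum(sigma[i][j] for i in range(n)) <= 1 for j in range(n))
    return row_ok and col_ok

assert is_feasible([[1, 0], [0, 1]])       # identity matching
assert not is_feasible([[1, 1], [0, 0]])   # input 0 would transmit twice
```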
Regarding the details of the model, we adopt the following timing conventions. At the beginning of time slot $\tau$, the queue sizes $Q_{i,j}(\tau)$ are observed by the policy. The schedule $\sigma(\tau)$ is applied in the middle of the time slot. Finally, at the end of the time slot, new arrivals happen. Mathematically, for all $i$, $j$, and $\tau \in \mathbb{N}$, we have
$$Q_{i,j}(\tau+1) = Q_{i,j}(\tau) - \sigma_{i,j}(\tau)\,\mathbf{1}_{\{Q_{i,j}(\tau) > 0\}} + A_{i,j}(\tau) - A_{i,j}(\tau-1), \qquad (1)$$
where for a set $B$, $\mathbf{1}_B$ is its indicator function. We assume throughout the paper that the system starts empty, i.e., $Q_{i,j}(1) = 0$, for all $i, j$. Summing Eq. (1) over time and using the assumption $Q_{i,j}(1) = 0$, we get the following equivalent expression, for $\tau \in \mathbb{N}$:
$$Q_{i,j}(\tau+1) = A_{i,j}(\tau) - \sum_{t=1}^{\tau} \sigma_{i,j}(t)\,\mathbf{1}_{\{Q_{i,j}(t) > 0\}}. \qquad (2)$$
We define
$$S_{i,j}(\tau) = \sum_{t=1}^{\tau} \sigma_{i,j}(t)\,\mathbf{1}_{\{Q_{i,j}(t) > 0\}},$$
so that (2) reduces to
$$Q_{i,j}(\tau+1) = A_{i,j}(\tau) - S_{i,j}(\tau).$$
We call $S_{i,j}(\tau)$ the actual service offered to queue $(i, j)$ during the first $\tau$ time slots. Note that $S_{i,j}(\tau)$ may be different from $\sum_{t=1}^{\tau} \sigma_{i,j}(t)$, which is the cumulative service offered to queue $(i, j)$ during the first $\tau$ slots.
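The one-slot dynamics and the cumulative identity $Q_{i,j}(\tau+1) = A_{i,j}(\tau) - S_{i,j}(\tau)$ can be checked in a small simulation. The sketch below uses uniformly random permutation schedules purely for illustration (this is not the paper's policy):

```python
import random

def step(Q, sigma, arrivals):
    """One slot of the switch dynamics: serve per sigma (only nonempty
    queues actually lose a packet), then add the slot's arrivals."""
    n = len(Q)
    return [[Q[i][j] - (1 if sigma[i][j] and Q[i][j] > 0 else 0) + arrivals[i][j]
             for j in range(n)] for i in range(n)]

rng = random.Random(1)
n, T, p = 3, 200, 0.2
Q = [[0] * n for _ in range(n)]   # system starts empty
A = [[0] * n for _ in range(n)]   # cumulative arrivals
S = [[0] * n for _ in range(n)]   # actual (cumulative) service
for _ in range(T):
    perm = list(range(n))
    rng.shuffle(perm)             # a random matching, for illustration
    sigma = [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            S[i][j] += 1 if sigma[i][j] and Q[i][j] > 0 else 0
    arrivals = [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(n)]
    Q = step(Q, sigma, arrivals)
    for i in range(n):
        for j in range(n):
            A[i][j] += arrivals[i][j]

# Cumulative identity Q(T+1) = A(T) - S(T), queue by queue:
assert all(Q[i][j] == A[i][j] - S[i][j] for i in range(n) for j in range(n))
```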
3. Main Result. The main result of this paper is as follows.

Theorem 3.1. Consider an $n \times n$ input-queued switch in which the arrival processes are independent Bernoulli processes with a common arrival rate $\rho/n$, where $\rho = 1 - 1/f_n$ and $f_n \geq n$. For any $n$, there exists a scheduling policy under which the expected total queue size is upper bounded by $c\,n^{1.5} f_n \log f_n$, where $c$ is a constant that does not depend on $n$.
4. Preliminaries. We will use a standard Chernoff-type concentration bound for sums of independent Bernoulli random variables.

Theorem 4.1 (Chernoff bound). Let $X_i$ be independent Bernoulli random variables with parameter $p$, for $i = 1, 2, \ldots, m$. Let $X = \sum_{i=1}^m X_i$, so that $\mathbb{E}[X] = mp$. Then, for any $x > 0$, we have
$$\Pr\big(|X - mp| \geq x\big) \leq 2\exp\Big(-\frac{x^2}{2(mp + x/3)}\Big).$$

Kingman bound for the discrete-time G/G/1 queue. Consider a discrete-time G/G/1 queueing system. More precisely, let $X(\tau)$ be the number of packets that arrive during time slot $\tau$, let $Y(\tau)$ be the number of packets that can be served during slot $\tau$, and let $Z(\tau)$ be the queue size at the beginning of time slot $\tau$. Suppose that the $X(\tau)$ are i.i.d. across time, and so are the $Y(\tau)$. Furthermore, the processes $X(\cdot)$ and $Y(\cdot)$ are independent. The queueing dynamics are given by
$$Z(\tau+1) = \max\{Z(\tau) + X(\tau) - Y(\tau),\, 0\}. \qquad (5)$$
Let $\lambda = \mathbb{E}[X(1)]$ and $\mu = \mathbb{E}[Y(1)]$, and suppose that $\lambda < \mu$. The following bound is proved in [15] (Theorem 3.4.2), using a standard argument based on a quadratic Lyapunov function.
Theorem 4.2 (Discrete-time Kingman bound). Suppose that $Z(1) = 0$ and that $\lambda < \mu$. Then, for all $\tau \in \mathbb{N}$,
$$\mathbb{E}[Z(\tau)] \leq \frac{\mathrm{Var}(X(1)) + \mathrm{Var}(Y(1))}{2(\mu - \lambda)}.$$
In fact, the above theorem is proved in [15] for the expected queue size in steady state. However, since we assume that $Z(1) = 0$, a standard coupling argument shows that the same bound holds for $\mathbb{E}[Z(\tau)]$ at any time $\tau$.
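The recursion (5) is easy to simulate, and the time-average queue size can be compared against the standard Kingman-style variance bound $(\mathrm{Var}\,X + \mathrm{Var}\,Y)/(2(\mu-\lambda))$. The Bernoulli arrival/service choice and all names below are ours, for illustration:

```python
import random

def simulate_gg1(lam, mu, T, seed=0):
    """Discrete-time G/G/1 queue: Z(t+1) = max(Z(t) + X(t) - Y(t), 0),
    with X ~ Bernoulli(lam) arrivals and Y ~ Bernoulli(mu) service offers.
    Returns the time-average queue size over T slots."""
    rng = random.Random(seed)
    Z, total = 0, 0
    for _ in range(T):
        X = 1 if rng.random() < lam else 0
        Y = 1 if rng.random() < mu else 0
        Z = max(Z + X - Y, 0)
        total += Z
    return total / T

lam, mu = 0.4, 0.6
avg = simulate_gg1(lam, mu, 100_000)
# Kingman-style bound: (Var X + Var Y) / (2 (mu - lam)) = 1.2 here,
# while the chain's exact stationary mean works out to 0.8.
bound = (lam * (1 - lam) + mu * (1 - mu)) / (2 * (mu - lam))
```

For these parameters the simulated average settles near 0.8, comfortably below the bound of 1.2, which illustrates that the Lyapunov-based bound is valid but not tight.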
Optimal Clearing Policy. Similar to [11], we will use the concept of the minimum clearance time of a queue matrix. Consider a queue matrix $[Q_{i,j}]$, where $Q_{i,j}$ denotes the number of packets at input port $i$ destined for output port $j$. Suppose that no new packets arrive, and that the goal is to simply clear all packets present in the system, in the least possible amount of time, using only feasible schedules/matchings. We call this minimal required time the minimum clearance time of the given queue matrix, and we denote it by $L$. Then, $L$ is characterized exactly as follows.
Theorem 4.3. Let $[Q_{i,j}]_{i,j=1}^n$ be a queue matrix. Let
$$R_i = \sum_{j=1}^n Q_{i,j} \quad \text{and} \quad C_j = \sum_{i=1}^n Q_{i,j}$$
be the $i$th row sum and the $j$th column sum, respectively. Then, the minimum clearance time, $L$, is equal to the largest of the row and column sums:
$$L = \max\Big\{\max_i R_i,\ \max_j C_j\Big\}. \qquad (7)$$
The proof of Theorem 4.3 is a simple modification of the proof of Theorem 5.1.9 in [3].
Note that in each time slot at most one packet can depart from each input/output port, and therefore each $R_i$ and $C_j$ decreases by at most 1 per slot. Thus, the minimum clearance time cannot be smaller than the right-hand side of (7). Theorem 4.3 states that there actually exists an optimal clearing policy that clears all packets within exactly $\max\{\max_i R_i, \max_j C_j\}$ time slots.
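The characterization in Theorem 4.3 makes the minimum clearance time trivial to compute, even though constructing the optimal clearing schedule itself requires a matching decomposition. A minimal sketch of the computation:

```python
def min_clearance_time(Q):
    """Minimum number of slots needed to clear queue matrix Q,
    per Theorem 4.3: the largest row or column sum."""
    n = len(Q)
    rows = [sum(Q[i]) for i in range(n)]
    cols = [sum(Q[i][j] for i in range(n)) for j in range(n)]
    return max(max(rows), max(cols))

Q = [[2, 0, 1],
     [0, 3, 0],
     [1, 0, 0]]
# Row sums: 3, 3, 1; column sums: 3, 3, 1 -> L = 3
assert min_clearance_time(Q) == 3
```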

5. Policy Description. To describe our policy, we introduce three parameters, $b$, $d$, and $s$, which specify the lengths of certain time intervals, and which, in turn, delineate the different phases of the policy. Their values are specified in terms of $n$ and $f_n$, together with positive constants $c_b$, $c_d$, and $c_s$ (independent of $n$) that will be appropriately chosen. Without loss of generality, we will always assume that $n \geq 3$, so that $\log f_n > 1$. As will be seen in the course of the proof, it suffices to choose the constants so that the inequalities in (11) hold, which we henceforth assume. We note that these inequalities do not necessarily lead to the best choices for the constants, but they are imposed in order to simplify the details of the proof.

For an $n \times n$ input-queued switch, we also define $n$ particular schedules $\sigma^{(1)}, \sigma^{(2)}, \ldots, \sigma^{(n)}$. For $m \in \{1, 2, \ldots, n\}$, $\sigma^{(m)}$ is defined by
$$\sigma^{(m)}_{i,j} = \begin{cases} 1, & \text{if } j - i \equiv m \ (\mathrm{mod}\ n), \\ 0, & \text{otherwise}. \end{cases}$$
To illustrate, when $n = 3$, the schedules $\sigma^{(1)}$, $\sigma^{(2)}$, and $\sigma^{(3)}$ are given by
$$\sigma^{(1)} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad \sigma^{(2)} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad \sigma^{(3)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Note that $\sum_{m=1}^n \sigma^{(m)} = E$, the $n \times n$ matrix of all 1s.

We now proceed with the description of the policy. Time is divided into consecutive intervals, which we call arrival periods, of length $b$. For $k = 0, 1, 2, \ldots$, the $k$th arrival period consists of slots $kb+1, kb+2, \ldots, (k+1)b$. Arrivals that occur during the $k$th arrival period are said to belong to the $k$th batch.
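The $n$ round-robin schedules can be built as cyclic shifts; the indexing convention below ($\sigma^{(m)}_{i,j} = 1$ iff $j - i \equiv m \pmod n$) is one natural choice, adopted here for illustration. The key properties, that each $\sigma^{(m)}$ is a perfect matching and that together they cover every $(i,j)$ pair exactly once, are easy to verify:

```python
def round_robin_schedules(n):
    """n cyclic-shift schedules: sigma^(m)[i][j] = 1 iff j - i = m (mod n),
    for m = 1, ..., n. (Exact indexing is our convention, for illustration.)"""
    return [[[1 if (j - i) % n == m % n else 0 for j in range(n)]
             for i in range(n)] for m in range(1, n + 1)]

n = 3
sigmas = round_robin_schedules(n)
# Each sigma^(m) is a perfect matching (a permutation matrix) ...
for s in sigmas:
    assert all(sum(row) == 1 for row in s)
    assert all(sum(s[i][j] for i in range(n)) == 1 for j in range(n))
# ... and together they sum to the all-ones matrix E:
total = [[sum(s[i][j] for s in sigmas) for j in range(n)] for i in range(n)]
assert total == [[1] * n for _ in range(n)]
```

Cycling through $\sigma^{(1)}, \ldots, \sigma^{(n)}$ therefore offers every queue exactly one service opportunity per round of $n$ slots.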
The general idea behind the policy is as follows. The policy aims to serve all of the packets in the $k$th batch during the $k$th service period, of length $b$, which is offset from the arrival period by a delay of $d$. Thus, the $k$th service period consists of time slots $kb+d+1, \ldots, (k+1)b+d$. If the policy does not succeed in serving all of the packets in the $k$th batch, the unserved packets will be considered backlogged and will be handled together with newly arriving packets from subsequent batches, in subsequent service periods. As it will turn out, however, the number of backlogged packets will be zero, with high probability.
We now continue with a precise description, by considering what happens during the $k$th service period. Note that the time slots $kb+1, \ldots, kb+d$ do not belong to the $k$th service period. Packets from the $k$th batch will accumulate during these time slots, but none of them will be served. At the beginning of the $k$th service period (the beginning of time slot $kb+d+1$), we may have some backlogged packets from previous service periods, and we denote their number by $B_k$. We assume that $B_0 = 0$.
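The batch and service-period index arithmetic above is easy to get off by one, so a small helper can pin it down. The concrete values of $b$ and $d$ below are hypothetical placeholders; the paper chooses them in terms of $n$ and $f_n$:

```python
def arrival_batch(tau, b):
    """Batch index k of time slot tau: the kth arrival period is
    slots kb+1, ..., (k+1)b (slots are 1-indexed, batches 0-indexed)."""
    return (tau - 1) // b

def service_period(k, b, d):
    """First and last slots of the kth service period, which is the
    arrival period shifted by a delay of d slots."""
    return (k * b + d + 1, (k + 1) * b + d)

b, d = 10, 4    # illustrative values only
assert arrival_batch(1, b) == 0 and arrival_batch(10, b) == 0
assert arrival_batch(11, b) == 1
assert service_period(0, b, d) == (5, 14)   # slots 5..14 serve batch 0
```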
The $k$th service period consists of three phases, which are described below and are illustrated in Fig. 1.
7. Discussion. We presented a novel scheduling policy for a general $n \times n$ input-queued switch. In the regime where the system load satisfies $\rho = 1 - 1/n$, and the arrival rates at the different queues are all equal, our policy achieves an upper bound of order $O(n^{2.5}\log n)$ on the expected total queue size, a substantial improvement upon earlier upper bounds, all of which were of order $O(n^3)$, ignoring poly-logarithmic dependence on $n$. Our policy is of the batching type. However, instead of waiting until an entire batch has arrived, our policy only waits for enough arrivals to take place for the system to exhibit a desired level of regularity, and then starts serving the batch. This idea may be of independent interest.
Our policy uses detailed knowledge of the arrival statistics, and is heavily dependent on the fact that all arrival rates are the same. While we believe that similar policies can be devised for arbitrary arrival rates (within the regime considered in this paper), the policy description and analysis are likely to be more involved.
Finally, for the regime where $\rho \approx 1 - 1/n$, there is an $\Omega(n^2)$ lower bound on the expected total queue size under any policy (see [12]), whereas our upper bound is of order $O(n^{2.5}\log n)$. It is an interesting open question whether this gap between the upper and lower bounds can be closed. Our policy uses a prespecified sequence of schedules (round-robin) until the entire batch has arrived, and then uses an "adaptive" sequence of schedules to clear remaining packets after the end of the batch. Within the class of policies of this type, with perhaps different choices of the parameters involved, it appears to be impossible to obtain an upper bound of $O(n^\alpha)$ for $\alpha < 2.5$. Thus, in order to come closer to the $\Omega(n^2)$ lower bound, we would have to use an adaptive sequence of schedules early on, before the entire batch has arrived. In fact, if one were to achieve an upper bound close to $O(n^2)$, we would have an approximately constant expected number of packets in each queue. This means that, with positive probability, many of the queues will be empty. Therefore, an elaborate policy would be needed to avoid offering service to empty queues and thus avoid queue buildup. But the analysis of such elaborate policies appears to be a difficult challenge.

SHAH & TSITSIKLIS & ZHONG
7. Discussion.We presented a novel scheduling policy for a [gene n × n input-queued switch.In the regime where the system load satis ρ = 1 − 1/n, and the arrival rates at the different queues are all eq our policy achieves an upper bound of order O(n 2.5 log n) on the expec total queue size, a substantial improvement upon earlier upper bounds, a which were of order O(n 3 ), ignoring poly-logarithmic dependence on n. policy is of the batching type.However, instead of waiting until an en batch has arrived, our policy only waits for enough arrivals to take place the system to exhibit a desired level of regularity, and then starts serv the batch.This idea may be of independent interest.
Our policy uses detailed knowledge of the arrival statistics, and is hea dependent on the fact that all arrival rates are the same.While we bel that similar policies can be devised for arbitrary arrival rates (within regime considered in this paper) the policy description and analysis are lik to more involved.
Finally, for the regime where ρ ≈ 1− 1/n, there is a Ω(n 2 ) lower bound the expected total queue size under any policy (see [12]), whereas our up bound is of order O(n 2.5 log n).It is an interesting open question whet this gap between the upper and lower bound can be closed.Our policy u a prespecified sequence of schedules (round-robin) until the entire batch arrived and then uses an "adaptive" sequence of schedules to clear remain packets after the end of the batch.Within the class of policies of this ty with perhaps different choices of the parameters involved, it appears to impossible to obtain an upper bound of O(n α ) for α < 2.5.Thus, in or to come closer to the Ω(n 2 ) lower bound, we will have to use an adap sequence of schedules early on, before the entire batch has arrived.In f if one were to achieve an upper bound close to O(n 2 ), we would have approximately constant expected number of packets in each queue.T means that with positive probability, many of the queues will be em Therefore, an elaborate policy would be needed to avoid offering servic empty queues and thus avoid queue buildup.But the analysis of policie such elaborate policies appears to be a difficult challenge.7. Discussion.We presented a novel scheduling policy for a [gene n × n input-queued switch.In the regime where the system load satis ρ = 1 − 1/n, and the arrival rates at the different queues are all eq our policy achieves an upper bound of order O(n 2.5 log n) on the expec total queue size, a substantial improvement upon earlier upper bounds, a which were of order O(n 3 ), ignoring poly-logarithmic dependence on n.O policy is of the batching type.However, instead of waiting until an en batch has arrived, our policy only waits for enough arrivals to take place the system to exhibit a desired level of regularity, and then starts serv the batch.This idea may be of independent interest.
Our policy uses detailed knowledge of the arrival statistics, and is hea dependent on the fact that all arrival rates are the same.While we bel that similar policies can be devised for arbitrary arrival rates (within regime considered in this paper) the policy description and analysis are lik to more involved.
Finally, for the regime where ρ ≈ 1−1/n, there is a Ω(n 2 ) lower bound the expected total queue size under any policy (see [12]), whereas our up bound is of order O(n 2.5 log n).It is an interesting open question whet this gap between the upper and lower bound can be closed.Our policy u a prespecified sequence of schedules (round-robin) until the entire batch arrived and then uses an "adaptive" sequence of schedules to clear remain packets after the end of the batch.Within the class of policies of this ty with perhaps different choices of the parameters involved, it appears to impossible to obtain an upper bound of O(n α ) for α < 2.5.Thus, in or to come closer to the Ω(n 2 ) lower bound, we will have to use an adap sequence of schedules early on, before the entire batch has arrived.In f if one were to achieve an upper bound close to O(n 2 ), we would have approximately constant expected number of packets in each queue.T means that with positive probability, many of the queues will be em Therefore, an elaborate policy would be needed to avoid offering servic empty queues and thus avoid queue buildup.But the analysis of policie such elaborate policies appears to be a difficult challenge.7. Discussion.We presented a novel scheduling policy for a [general] n × n input-queued switch.In the regime where the system load satisfies ρ = 1 − 1/n, and the arrival rates at the different queues are all equal, our policy achieves an upper bound of order O(n 2.5 log n) on the expected total queue size, a substantial improvement upon earlier upper bounds, all of which were of order O(n 3 ), ignoring poly-logarithmic dependence on n.Our policy is of the batching type.However, instead of waiting until an entire batch has arrived, our policy only waits for enough arrivals to take place for the system to exhibit a desired level of regularity, and then starts serving the batch.This idea may be of independent interest.
Our policy uses detailed knowledge of the arrival statistics, and is heavily dependent on the fact that all arrival rates are the same.While we believe that similar policies can be devised for arbitrary arrival rates (within the regime considered in this paper) the policy description and analysis are likely to more involved.
Finally, for the regime where ρ ≈ 1−1/n, there is a Ω(n 2 ) lower bound on the expected total queue size under any policy (see [12]), whereas our upper bound is of order O(n 2.5 log n).It is an interesting open question whether this gap between the upper and lower bound can be closed.Our policy uses a prespecified sequence of schedules (round-robin) until the entire batch has arrived and then uses an "adaptive" sequence of schedules to clear remaining packets after the end of the batch.Within the class of policies of this type, with perhaps different choices of the parameters involved, it appears to be impossible to obtain an upper bound of O(n α ) for α < 2.5.Thus, in order to come closer to the Ω(n 2 ) lower bound, we will have to use an adaptive sequence of schedules early on, before the entire batch has arrived.In fact, if one were to achieve an upper bound close to O(n 2 ), we would have an approximately constant expected number of packets in each queue.This means that with positive probability, many of the queues will be empty.Therefore, an elaborate policy would be needed to avoid offering service to empty queues and thus avoid queue buildup.But the analysis of policies of such elaborate policies appears to be a difficult challenge.7. Discussion.We presented a novel scheduling polic n × n input-queued switch.In the regime where the syst ρ = 1 − 1/n, and the arrival rates at the different queu our policy achieves an upper bound of order O(n 2.5 log n) total queue size, a substantial improvement upon earlier upp which were of order O(n 3 ), ignoring poly-logarithmic depen policy is of the batching type.However, instead of waitin batch has arrived, our policy only waits for enough arrivals the system to exhibit a desired level of regularity, and th the batch.This idea may be of independent interest.
Our policy uses detailed knowledge of the arrival statisti dependent on the fact that all arrival rates are the same.that similar policies can be devised for arbitrary arrival r regime considered in this paper) the policy description and a to more involved.
Finally, for the regime where ρ ≈ 1−1/n, there is a Ω(n 2 the expected total queue size under any policy (see [12]), w bound is of order O(n 2.5 log n).It is an interesting open q this gap between the upper and lower bound can be closed a prespecified sequence of schedules (round-robin) until the arrived and then uses an "adaptive" sequence of schedules t packets after the end of the batch.Within the class of poli with perhaps different choices of the parameters involved, impossible to obtain an upper bound of O(n α ) for α < 2.5 to come closer to the Ω(n 2 ) lower bound, we will have to sequence of schedules early on, before the entire batch has if one were to achieve an upper bound close to O(n 2 ), w approximately constant expected number of packets in e means that with positive probability, many of the queue Therefore, an elaborate policy would be needed to avoid o empty queues and thus avoid queue buildup.But the anal such elaborate policies appears to be a difficult challenge.7. Discussion.We presented a novel scheduling poli n × n input-queued switch.In the regime where the syst ρ = 1 − 1/n, and the arrival rates at the different queu our policy achieves an upper bound of order O(n 2.5 log n) total queue size, a substantial improvement upon earlier up which were of order O(n 3 ), ignoring poly-logarithmic depe policy is of the batching type.However, instead of waitin batch has arrived, our policy only waits for enough arrivals the system to exhibit a desired level of regularity, and th the batch.This idea may be of independent interest.
Our policy uses detailed knowledge of the arrival statist dependent on the fact that all arrival rates are the same.that similar policies can be devised for arbitrary arrival regime considered in this paper) the policy description and to more involved.
Finally, for the regime where ρ ≈ 1 − 1/n, there is a Ω(n 2 the expected total queue size under any policy (see [12]), w bound is of order O(n 2.5 log n).It is an interesting open this gap between the upper and lower bound can be closed a prespecified sequence of schedules (round-robin) until th arrived and then uses an "adaptive" sequence of schedules t packets after the end of the batch.Within the class of pol with perhaps different choices of the parameters involved impossible to obtain an upper bound of O(n α ) for α < 2. to come closer to the Ω(n 2 ) lower bound, we will have to sequence of schedules early on, before the entire batch ha if one were to achieve an upper bound close to O(n 2 ), w approximately constant expected number of packets in means that with positive probability, many of the queue Therefore, an elaborate policy would be needed to avoid o empty queues and thus avoid queue buildup.But the ana such elaborate policies appears to be a difficult challenge.For an n × n input-queued switch, we also define n particular schedule σ (1) , σ (2) , . . ., σ (n) .For m ∈ {1, 2, . . ., n}, σ (m) is defined by To illustrate, when n = 3, the schedules σ (1) , σ (2) , and σ (3) are given by the n × n matrix of all 1s.arrival period We now proceed with the description of the policy.Time is divided int consecutive intervals, which we call arrival periods, of length b.For k = 0, 1, 2, . .., the kth arrival period consists of slots kb + 1, kb + 2, . . ., (k + 1)b Arrivals that occur during the kth arrival period are said to belong to th kth batch.
The general idea behind the policy is as follows. The policy aims to serve all of the packets in the kth batch during the kth service period, of length b, which is offset from the arrival period by a delay of d. Thus, the kth service period consists of time slots kb + d + 1, . . ., (k + 1)b + d. If the policy does not succeed in serving all of the packets in the kth batch, the unserved packets will be considered backlogged and will be handled together with newly arriving packets from subsequent batches, in subsequent service periods. As it will turn out, however, the number of backlogged packets will be zero, with high probability.
We now continue with a precise description, by considering what happens during the kth service period. Note that the time slots kb + 1, . . ., kb + d do not belong to the kth service period. Packets from the kth batch will accumulate during these time slots, but none of them will be served. At the beginning of the kth service period (the beginning of time slot kb + d + 1), we may have some backlogged packets from previous service periods, and we denote their number by B_k. We assume that B_0 = 0.
The kth service period consists of three phases, which are described below.

Massachusetts Institute of Technology

1. The first b − d slots of the kth service period, namely, slots kb + d + 1, . . ., (k + 1)b, comprise a round-robin phase: we cycle through the schedules σ^(1), σ^(2), . . ., σ^(n) in a round-robin manner. However, during this phase, we do not serve any of the backlogged packets; we only serve packets that belong to the kth batch. Furthermore, even though packets from the (k + 1)st batch may have started to arrive, we do not serve any of them. By the beginning of the subsequent normal clearing phase, all of the packets from the kth batch have already arrived. Some of them have already been served during the round-robin phase.
To those that remain, we apply the optimal clearing policy described earlier; cf. Theorem 4.3. However, there is a possibility that the phase terminates before we succeed in serving all of the remaining packets from the kth batch. Let U_k be the number of the packets from the kth batch that were left unserved during this phase. These U_k packets are considered backlogged and are added to the backlog B_k from earlier periods.

3. The last r = b − s slots, namely slots kb + d + s + 1, . . ., (k + 1)b + d, comprise the kth backlog clearing phase. During this phase, we serve backlogged packets using some arbitrary policy. The only requirement is that the policy serve at least one packet at each slot at which a backlogged packet is available. However, we do not serve any of the newly arrived packets from the (k + 1)st batch. Any backlogged packets that are not served during this phase remain backlogged and comprise the number B_{k+1} of backlogged packets at the beginning of the next service period. Since at least one backlogged packet is served (whenever available) during each one of these r slots, and since there are no additions to the backlog during this phase, we have

(12)    B_{k+1} ≤ max{B_k + U_k − r, 0}.

The total length of the three phases is

(b − d) + (d + s − b) + (b − s) = b,

so that the length of a service period is equal to the length of an arrival period. However, before continuing, we need to make sure that the duration of each phase is a positive number, so that the policy is well-defined. This is accomplished in the next two lemmas, which also provide order-of-magnitude information on the durations of these phases.
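The backlog dynamics across service periods can be sketched as a one-line recursion; the bound B_{k+1} ≤ max{B_k + U_k − r, 0} follows from the description above, and the sequence of U_k values below is illustrative:

```python
def backlog_evolution(U, r, B0=0):
    """Track the upper bound on the backlog across service periods.

    During each backlog clearing phase, at least one backlogged packet
    is served in each of the r slots (when one is available) and no
    packets are added, so B_{k+1} <= max(B_k + U_k - r, 0).
    """
    B = [B0]
    for u in U:
        B.append(max(B[-1] + u - r, 0))
    return B

# A rare nonzero U_k drains at rate r per period, as in a G/D/1 queue.
print(backlog_evolution([0, 0, 7, 0, 0, 0], r=3))  # → [0, 0, 0, 4, 1, 0, 0]
```

When U_k = 0 with high probability, as shown in Section 6, this bound stays at zero except for rare excursions that drain at rate r per period.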
Lemma 5.1. The length r = b − s of the backlog clearing phase satisfies r ≥ c_r √(n f_n) log f_n, where c_r > 0. In particular, when n is large enough, we have r ≥ 1.

Proof. Using the assumption in Eq. (11) on the policy parameters, we have r = b − s ≥ c_r √(n f_n) log f_n. The fact that c_r > 0 follows from our assumption in Eq. (11).

Proof of Lemma 5.2. Recall that r = b − s. It follows that ℓ = d + s − b = d − r. Note that c_r < c_b < c_d (cf. Lemma 5.1 and Eq. (11)), which implies that c_ℓ > 0.
6. Policy Analysis. The performance analysis of the proposed policy involves the following line of argument for what happens during the kth arrival and service period.

(a) In the first d slots of the kth arrival period, we have an expected number O(nd) of arrivals.

(b) With high probability, at every time slot during the round-robin phase, there is a positive number of packets from the kth arrival batch at each queue; cf. Lemmas 6.1 and 6.2. Therefore, offered service is never wasted. In particular, at least as many packets are served as arrive (in the expected value sense), and the total queue size does not grow.

(c) With high probability, all of the packets from the kth batch that are in queue at the beginning of the normal clearing phase get cleared, and therefore the number U_k of newly backlogged packets is zero; cf. Lemma 6.4.

(d) The number B_k of backlogged packets evolves similarly to a discrete-time G/D/1 queue; cf. Eq. (12). Because U_k is zero with high probability, the Kingman bound (Theorem 4.2) implies that the expected number of backlogged packets, at any time, is small; cf. Lemma 6.5.
The above steps, when translated into precise bounds on queue sizes, will lead to an O(nd) bound on the expected total queue size at any time.

6.1. No waste during the round-robin phase. In this subsection, we establish that during the round-robin phase, every queue contains a nonzero number of packets from the current arrival batch, with high probability.

We first introduce some convenient notation. We will use the variable t ∈ {1, . . ., b + 1} to index the b slots of the kth arrival period together with the first slot of the subsequent normal clearing phase. For t ∈ {1, . . ., b}, we let A^k_{i,j}(t) be the number of arrivals to the (i, j)th queue during the first t time slots of the kth arrival period; these are the time slots kb + 1, kb + 2, . . ., kb + t. Similarly, for t ∈ {1, . . ., b}, we let S^k_{i,j}(t) be the number of packets that arrive to queue (i, j) during the kth arrival period and get served during the first t time slots of the kth arrival period. Finally, for t ∈ {1, . . ., b + 1}, we let Q^k_{i,j}(t) be the number of packets from the kth arrival batch that are in queue (i, j) at the beginning of the tth slot of the kth arrival period. With these definitions, we have

(13)    Q^k_{i,j}(t + 1) = A^k_{i,j}(t) − S^k_{i,j}(t),    t ∈ {1, . . ., b}.

We are interested in conditions under which no offered service is wasted during the round-robin phase. Equivalently, we are interested in conditions under which all queues have a positive number of packets from the kth batch. Note that the round-robin phase involves slots for which t ∈ {d + 1, . . ., b}.
We have the following observation on the queue sizes at the beginning of these slots.

Lemma 6.1. Suppose that t ∈ {d, . . ., b − 1} and that A^k_{i,j}(t) > (t − d)/n + 1. Then, Q^k_{i,j}(t + 1) > 0.
Proof. Note that for the first d time slots, packets from the kth batch do not receive any service. Starting from the (d + 1)st slot, we are in the round-robin phase, and queue (i, j) is offered service once every n slots. Therefore,

S^k_{i,j}(t) ≤ (t − d)/n + 1.

The result follows from Eq. (13).
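The counting in this proof can be sanity-checked by simulation; a sketch for a single queue, with an illustrative Bernoulli arrival rate and parameters that are not the ones prescribed by the policy:

```python
import random

def queue_positive_check(n=4, d=8, b=40, trials=200, seed=0):
    """Monte Carlo check of Lemma 6.1 for a single queue.

    During the round-robin phase the queue is offered service once
    every n slots, so by slot t it has served at most (t - d)/n + 1
    packets; hence A(t) > (t - d)/n + 1 forces Q(t + 1) > 0.
    The 0.3 arrival rate is purely illustrative.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        offset = rng.randrange(n)  # the queue's position in the service cycle
        A = S = 0
        for t in range(1, b + 1):
            A += rng.random() < 0.3          # Bernoulli arrival in slot t
            if t > d and (t - d - 1) % n == offset and A - S > 0:
                S += 1                       # serve one packet if nonempty
            if t >= d and A > (t - d) / n + 1:
                assert A - S > 0             # conclusion of Lemma 6.1
    return True

print(queue_positive_check())  # → True
```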
The previous lemma highlights the importance of the events A^k_{i,j}(t) > (t − d)/n + 1. We will show that the complements of these events have, collectively, small probability. To this effect, let W^k_{i,j}(t) be the event defined by

W^k_{i,j}(t) = { A^k_{i,j}(t) ≤ (t − d)/n + 1 }.

Let also W^k be the union of these events, over all queues, and over all indices t that are relevant to the round-robin phase:

W^k = ∪_{i,j} ∪_{t=d}^{b−1} W^k_{i,j}(t).

Lemma 6.2. For n sufficiently large, we have P(W^k) ≤ 1/(2f_n^{13}), for all k.
Proof. Let us fix some (i, j) and some t ∈ {d, . . ., b − 1}. Note that E[A^k_{i,j}(t)] = ρt/n. Therefore, the event W^k_{i,j}(t) is the same as the event

A^k_{i,j}(t) − E[A^k_{i,j}(t)] ≤ t(1 − ρ)/n − (d/n − 1),

which is a lower-tail deviation of A^k_{i,j}(t) from its mean. Using the facts t − d ≤ b and 1 − ρ = 1/f_n, the first term on the right-hand side is bounded above (in absolute value) by b/(nf_n). For the second term, we use the definition of d; cf. Eq. (11). Now, for n large enough, we have c_b + 1 ≤ (c_d/4)√n, and this implies that the right-hand side is negative, with magnitude of order d/n. Using Eq. (3) (the lower tail bound in Theorem 4.1), together with Eq. (14), we obtain an upper bound on P(W^k_{i,j}(t)), where the last inequality follows from our assumption that c_d^2 ≥ 640c_b; cf. Eq. (11). Consequently, P(W^k_{i,j}(t)) ≤ 1/(2n^2 b f_n^{13}), as long as n is large enough so that c_b ≤ f_n. Therefore, using the union bound over the at most n^2 b events W^k_{i,j}(t), we obtain the bound claimed in Lemma 6.2.

6.2. The probability of no new backlog. In this subsection we show that U_k, the additional backlog generated during the kth service period, is zero with high probability. Our analysis builds on an upper bound on the probability that the number of packets in the kth batch that are associated with a particular port is appreciably larger than its expected value. Towards this purpose, we define the row and column sums for the arrivals in the kth batch:

R^k_i = Σ_j A^k_{i,j}(b),    C^k_j = Σ_i A^k_{i,j}(b).

We also define the events H^k_i, in which the row sum R^k_i exceeds its expected value ρb by a prescribed margin, and similarly for the columns; we let H^k be the union of these 2n events. In what follows, we first show that the event H^k has low probability. We then show that if neither of the events W^k or H^k occurs (which has high probability), then U_k is equal to zero.

Lemma 6.3. For n sufficiently large, we have P(H^k) ≤ 1/(2f_n^{13}), for all k.
Proof. Let us focus on one of these events, say the event that the row sum R^k_i exceeds its expected value by the prescribed margin. We have, using Eq. (4) (the upper tail bound in Theorem 4.1) in the last step, an upper bound on the probability of this event. Therefore, when n ≥ 4, this probability is bounded above by 1/(4f_n^{14}), where the last inequality follows from our assumption that c_s ≥ 30; cf. Eq. (11). The event H^k is the union of 2n events, each with probability bounded above by 1/(4f_n^{14}). Using the union bound and the assumption n ≤ f_n, we obtain P(H^k) ≤ 1/(2f_n^{13}).
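The shape of this argument (a per-sum tail bound, multiplied by the 2n row and column events) can be illustrated numerically; the load, margin, and sizes below are illustrative and unrelated to the constants c_s, f_n of the analysis:

```python
import math
import random

def tail_union_estimate(n=8, b=400, rho=0.9, margin=3.0, trials=4000, seed=1):
    """Estimate the upper tail of a single Binomial(b, rho) row sum,
    exceeding its mean rho*b by `margin` standard deviations, then
    report the union bound over the 2n row/column events.
    All parameters here are illustrative."""
    rng = random.Random(seed)
    thresh = rho * b + margin * math.sqrt(b * rho * (1 - rho))
    single = sum(
        sum(rng.random() < rho for _ in range(b)) > thresh
        for _ in range(trials)
    ) / trials
    return single, min(1.0, 2 * n * single)  # per-sum tail, union bound

print(tail_union_estimate())
```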
Lemma 6.4. (a) Consider a sample path under which neither W^k nor H^k occurs. Then, U_k = 0.

Proof. (a) We assume that neither W^k nor H^k occurs. Using Eq. (13), the queue sizes (where we only count packets from the kth batch) at the beginning of the normal clearing period are equal to

(16)    Q^k_{i,j}(b + 1) = A^k_{i,j}(b) − S^k_{i,j}(b).

Now consider a fixed i. Note that the schedules σ^(m) used during the round-robin phase have the property Σ_j σ^(m)_{i,j} = 1; that is, each input port is offered exactly one unit of service at each time slot. Furthermore, since event W^k does not occur, Lemma 6.1 implies that all queues are positive at the beginning of each slot of the round-robin phase; that is, Q^k_{i,j}(t + 1) > 0, for t = d, . . ., b − 1. Therefore, the offered service is never wasted during the b − d slots of the round-robin phase. It follows that the total actual service at input port i during the round-robin phase is exactly b − d:

Σ_j S^k_{i,j}(b) = b − d.

Then, using the bound in (6), we obtain an upper bound whose right-hand side, as n increases, converges to 1/2, and is therefore bounded above by 1 when n is sufficiently large.
6.4. Queue size analysis. In this subsection we show that at any time, the sum of the queue sizes is of order O(nd). We fix some time τ and consider two cases, depending on whether this time belongs to a round-robin phase or not.
Queue sizes during the round-robin phase. Suppose that τ satisfies kb + d + 1 ≤ τ ≤ (k + 1)b, so that τ belongs to the round-robin phase of the kth service period, and let us look at the queue size Q_{i,j}(τ + 1). This queue size may include some packets that arrived during earlier arrival periods and that were backlogged; their total expected number (summed over all i and j) is bounded by a constant; cf. Lemma 6.5. Let us now turn our attention to packets that belong to the kth batch. Recall that the number of such packets in queue (i, j) at the beginning of the (t + 1)st slot (equivalently, the end of the tth slot) of the kth arrival period is denoted by Q^k_{i,j}(t + 1). For t = d + 1, . . ., b, we have, as in Eq. (13), Q^k_{i,j}(t + 1) = A^k_{i,j}(t) − S^k_{i,j}(t), and

E[Σ_{i,j} Q^k_{i,j}(t + 1)] = E[Σ_{i,j} A^k_{i,j}(t)] − E[Σ_{i,j} S^k_{i,j}(t)].
By the same argument as in the proof of Lemma 6.4(a), if event W^k does not occur, the service during the round-robin phase is never wasted: a total of n packets are served at each time slot, and for t = d + 1, . . ., b, a total of n(t − d) packets are served by the tth slot of the kth arrival period. Using also the inequality of Lemma 6.2, we obtain

(17)    E[Σ_{i,j} Q^k_{i,j}(t + 1)] ≤ nρt − nρ(t − d) = nρd ≤ nd,    t = d + 1, . . ., b.

Therefore, the expected total number of packets at any time during the round-robin phase is bounded above by nd plus a constant, which is an upper bound of the desired form.
Queue sizes outside the round-robin phase. Suppose now that τ satisfies (k + 1)b + 1 ≤ τ ≤ (k + 1)b + d, so that τ belongs to one of the last two phases of the kth service period, and let us look again at the queue size Q_{i,j}(τ + 1). As before, we may have some backlogged packets. These are packets backlogged either during the current (kth) period or during previous periods. Their total expected number (summed over all i and j) at any time in this range is bounded by a constant; cf. Lemma 6.5. Let us now turn our attention to packets that belong to the kth batch. Since there are no further arrivals from the kth batch from slot (k + 1)b + 1 onwards, the number of such packets is largest at the beginning of slot (k + 1)b + 1. Their expected value at that time satisfies

E[Σ_{i,j} Q^k_{i,j}(b + 1)] ≤ nρd ≤ nd,

where in the inequality we used Eq. (17) with t = b. Finally, we need to account for arrivals that belong to the (k + 1)st arrival batch. The total number of such accumulated arrivals is largest when we consider the largest value of τ, namely, τ = (k + 1)b + d. By that time, we have had a total of d slots of the (k + 1)st arrival period, and a total expected number of arrivals equal to ρnd, which is bounded above by nd.
Putting together all of the bounds that we have developed, we see that at any time, the expected total number of packets is bounded above by 2nd + 2 ≤ 3nd. Since this holds for all sufficiently large n, Theorem 3.1 is established.
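The closing arithmetic can be written out as a small sketch; the constant 2 for the expected backlog is taken from the bound quoted above:

```python
def total_queue_bound(n, d):
    """Add the three contributions to the expected total queue size:
    a constant expected backlog (2, as in the text), at most nd packets
    of the current batch, and at most nd newly arrived packets."""
    backlog = 2
    current_batch = n * d
    new_arrivals = n * d
    return backlog + current_batch + new_arrivals

# The bound 2nd + 2 is at most 3nd whenever nd >= 2.
assert all(total_queue_bound(n, d) <= 3 * n * d
           for n in range(2, 50) for d in range(1, 50))
print(total_queue_bound(10, 5))  # → 102
```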

7.
Discussion. We presented a novel scheduling policy for an n × n input-queued switch. In the regime where the system load satisfies ρ = 1 − 1/n, and the arrival rates at the different queues are all equal, our policy achieves an upper bound of order O(n^2.5 log n) on the expected total queue size, a substantial improvement upon earlier upper bounds, all of which were of order O(n^3), ignoring poly-logarithmic dependence on n. Our policy is of the batching type. However, instead of waiting until an entire batch has arrived, our policy only waits for enough arrivals to take place for the system to exhibit a desired level of regularity, and then starts serving the batch. This idea may be of independent interest. Our policy uses detailed knowledge of the arrival statistics, and is heavily dependent on the fact that all arrival rates are the same. While we believe that similar policies can be devised for arbitrary arrival rates (within the regime considered in this paper), the policy description and analysis are likely to be more involved.
Finally, for the regime where ρ ≈ 1 − 1/n, there is an Ω(n^2) lower bound on the expected total queue size under any policy (see [13]), whereas our upper bound is of order O(n^2.5 log n). It is an interesting open question whether this gap between the upper and lower bounds can be closed. Our policy uses a prespecified sequence of schedules (round-robin) until the entire batch has arrived and then uses an "adaptive" sequence of schedules to clear remaining packets after the end of the batch. Within the class of policies of this type, with perhaps different choices of the parameters involved, it appears to be impossible to obtain an upper bound of O(n^α) for α < 2.5. Thus, in order to come closer to the Ω(n^2) lower bound, we will have to use an adaptive sequence of schedules early on, before the entire batch has arrived. In fact, if one were to achieve an upper bound close to O(n^2), we would have an approximately constant expected number of packets in each queue. This means that with positive probability, many of the queues will be empty. Therefore, an elaborate policy would be needed to avoid offering service to empty queues and thus avoid queue buildup. But the analysis of such elaborate policies appears to be a difficult challenge.

ON QUEUE-SIZE SCALING FOR INPUT-QUEUED SWITCHES. By D. Shah, J. N. Tsitsiklis, and Y. Zhong.

Fig 1: Illustration of a typical arrival period and the phases of a service period (round-robin, normal clearing, and backlog clearing). Slots are numbered consecutively, starting with the first slot of the arrival period.

2. The next ℓ = d + s − b slots, namely slots (k + 1)b + 1, . . ., kb + d + s, comprise the kth normal clearing phase. Similar to the round-robin phase, we do not serve any backlogged packets during this phase.

Lemma 5.2. The length ℓ = d + s − b of the normal clearing phase satisfies ℓ ≥ c_ℓ √(n f_n) log f_n, where c_ℓ = c_d − c_r > 0. In particular, when n is large enough, we have ℓ ≥ 1.