Asynchronous Optimization over Weakly Coupled Renewal Systems

This paper considers optimization over multiple renewal systems coupled by time average constraints. These systems act asynchronously over variable length frames. At the beginning of each renewal frame, each system chooses an action which affects the duration of its own frame, the penalty, and the resource expenditure throughout the frame. The goal is to minimize the overall time average penalty subject to several overall time average resource constraints which couple these systems. This problem has applications to task processing networks and coupled Markov decision processes (MDPs), among others. We propose a distributed algorithm in which each system makes its own decision after observing a global multiplier that is updated slot-wise. We show that this algorithm satisfies the desired constraints and achieves $\mathcal{O}(\varepsilon)$ near optimality with $\mathcal{O}(1/\varepsilon^2)$ convergence time.

1. Introduction. Consider $N$ renewal systems that operate over a slotted timeline ($t \in \{0, 1, 2, \ldots\}$). The timeline for each system $n \in \{1, \ldots, N\}$ is segmented into back-to-back intervals of time slots called renewal frames. The start of each renewal frame for a particular system is called a renewal time, or simply a renewal, for that system. The duration of each renewal frame is a random positive integer whose distribution depends on a control action chosen by the system at the start of the frame. The decision at each renewal frame also determines the penalty and a vector of performance metrics during this frame. The systems are coupled by time average constraints placed on these metrics over all systems. The goal is to design a decision strategy for each system so that the overall time average penalty is minimized subject to the time average constraints.
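To make the asynchronous frame structure concrete, the following Python sketch generates the renewal times of a single system: at each renewal the system picks an action, the action fixes the distribution of the next frame length, and the next renewal occurs that many slots later. The function names `renewal_timeline` and `geometric`, and the example actions, are illustrative assumptions rather than part of the model.

```python
import random
from typing import Callable, List, Tuple


def renewal_timeline(choose_action: Callable[[random.Random], str],
                     sample_frame: Callable[[str, random.Random], int],
                     horizon: int,
                     rng: random.Random) -> Tuple[List[int], List[str]]:
    """Return the renewal times 0 = t_0 < t_1 < ... <= horizon of one system,
    together with the action chosen at each of these renewals."""
    times: List[int] = [0]
    actions: List[str] = []
    while True:
        a = choose_action(rng)          # decision made at renewal times[-1]
        length = sample_frame(a, rng)   # frame length; its law depends on a
        if length < 1:
            raise ValueError("frame lengths must be positive integers")
        actions.append(a)
        if times[-1] + length > horizon:
            return times, actions       # the last frame runs past the horizon
        times.append(times[-1] + length)


def geometric(p: float, rng: random.Random) -> int:
    """Number of Bernoulli(p) trials up to and including the first success."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k


if __name__ == "__main__":
    rng = random.Random(1)
    # Two hypothetical systems whose frames follow different action-dependent laws.
    sys1 = renewal_timeline(lambda r: r.choice(["fast", "slow"]),
                            lambda a, r: 1 if a == "fast" else geometric(0.3, r),
                            30, rng)
    sys2 = renewal_timeline(lambda r: "fixed", lambda a, r: 4, 30, rng)
    print("system 1 renewals:", sys1[0])
    print("system 2 renewals:", sys2[0])
```

Here the second system renews every 4 slots (0, 4, ..., 28), while the renewals of the first system generally fall on other slots, which is exactly the lack of synchronization the analysis has to handle.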
We use $k = 0, 1, 2, \ldots$ to index the renewals. Let $t^n_k$ be the time slot corresponding to the $k$-th renewal of the $n$-th system, with the convention that $t^n_0 = 0$. Let $\mathcal{T}^n_k$ be the set of all slots from $t^n_k$ to $t^n_{k+1} - 1$. At time $t^n_k$, the $n$-th system chooses a possibly random decision $\alpha^n_k$ in a set $\mathcal{A}^n$. This action determines the distributions of the following random variables:
• The duration of the $k$-th renewal frame $T^n_k := t^n_{k+1} - t^n_k$, which is a positive integer.
• A vector of performance metrics at each slot of that frame, $z^n[t] := (z^n_1[t], z^n_2[t], \ldots, z^n_L[t])$, $t \in \mathcal{T}^n_k$.
• A penalty incurred at each slot of the frame, $y^n[t]$, $t \in \mathcal{T}^n_k$.
We assume each system has the renewal property that, given $\alpha^n_k = \alpha^n \in \mathcal{A}^n$, the random variables $T^n_k$, $z^n[t]$ and $y^n[t]$, $t \in \mathcal{T}^n_k$, are independent of the information of all systems from the slots before $t^n_k$, with the following known conditional expectations: $\mathbb{E}(T^n_k \mid \alpha^n_k = \alpha^n)$, $\mathbb{E}\big(\sum_{t \in \mathcal{T}^n_k} y^n[t] \mid \alpha^n_k = \alpha^n\big)$ and $\mathbb{E}\big(\sum_{t \in \mathcal{T}^n_k} z^n_l[t] \mid \alpha^n_k = \alpha^n\big)$, $l \in \{1, \ldots, L\}$. In addition, we have an uncontrollable external i.i.d. random process $\{d[t]\}_{t=0}^\infty \subseteq \mathbb{R}^L$ which can be observed during each time slot. Let $d_l := \mathbb{E}(d_l[t])$. The goal is to minimize the total time average penalty of these $N$ renewal systems subject to $L$ total time average constraints on the performance metrics related to the external i.i.d. process, i.e., we aim to solve the following optimization problem:
$$\min\ \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} \mathbb{E}(y^n[t]) \tag{1}$$
$$\text{s.t.}\ \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} \mathbb{E}(z^n_l[t]) \le d_l, \quad l \in \{1, \ldots, L\}. \tag{2}$$
This problem is challenging because these $N$ systems are weakly coupled by the time average constraints (2), yet each of them operates over its own renewal frames. The renewals of different systems do not have to be synchronized, and they do not have to occur at the same rate. Fig. 1 plots a sample timeline of three parallel renewal systems. In Section 2, we develop an algorithm with a provable performance guarantee that does not need knowledge of $d_l = \mathbb{E}(d_l[t])$.
1.1.1. Multi-server energy-aware scheduling. Consider a slotted time system with $L$ classes of jobs and $N$ servers. Job arrivals are Poisson distributed with rates $\lambda_1, \ldots, \lambda_L$, respectively. These jobs are stored in separate queues, denoted $Q_1[t], \ldots, Q_L[t]$, in a router waiting to be served. Assume the system is empty at time $t = 0$ so that $Q_l[0] = 0$, $\forall l \in \{1, 2, \ldots, L\}$. Let $\lambda_l[t]$ be the number of class $l$ job arrivals at slot $t$; then $\mathbb{E}(\lambda_l[t]) = \lambda_l$, $\forall l \in \{1, 2, \ldots, L\}$. Let $\mu^n_l[t]$ and $e^n[t]$ be the number of class $l$ jobs served and the energy consumption of server $n$ at time slot $t$, respectively. Fig. 2 sketches an example architecture of the system with 3 classes of jobs and 10 servers.
Each server makes decisions over renewal frames, and the first frame starts at time slot $t = 0$. Successive renewals can happen at different slots for different servers. For the $n$-th server, at the beginning of the $k$-th frame ($k \in \mathbb{N}$), it chooses a processing mode $m^n_k$ within the set of all modes $\mathcal{M}^n$. The processing mode $m^n_k$ determines distributions on the number of jobs served, the service time, and the energy expenditure, with conditional expectations: Thus, we have formulated the problem in the form (1)-(2). Note that the external process in this example is the arrival process of the $L$ classes of jobs, with potentially unknown arrival rates $\lambda_l$.
Previously, [12] treats a special case of this problem where all energy and service quantities are deterministic functions of the processing modes. The newly developed algorithm in the current paper can be used to solve this problem under considerably more general stochastic assumptions.
1.1.2. Coupled ergodic MDPs. Consider $N$ discrete time Markov decision processes (MDPs) over an infinite horizon. Each MDP consists of a finite state space $\mathcal{S}^n$ and an action space $\mathcal{U}^n$ at each state $s \in \mathcal{S}^n$.¹ For each state $s \in \mathcal{S}^n$, we use $P^n_u(s, s')$ to denote the transition probability from $s \in \mathcal{S}^n$ to $s' \in \mathcal{S}^n$ when taking action $u \in \mathcal{U}^n$, i.e.,
$$P^n_u(s, s') = \Pr\big(s[t+1] = s' \mid s[t] = s,\ u[t] = u\big),$$
where $s[t]$ and $u[t]$ are the state and action at time slot $t$.
, where these functions are all bounded mappings from S n × U n to R. For simplicity we write ).The goal is to minimize the time average overall penalty with constraints on time average overall costs, where these MDPs are weakly coupled through the time average constraints.This problem can be written in the form (1)- (2).
In order to define the renewal frame, we need one more assumption on the MDPs. We assume each of the MDPs is ergodic, i.e., there exists a state which is recurrent and the corresponding Markov chain is aperiodic under any randomized stationary policy,² with bounded expected recurrence time. Under this assumption, the renewals for each MDP can be defined as successive revisits to the recurrent state, and the action set $\mathcal{A}^n$ in such a scenario is defined as the set of all randomized stationary policies that can be implemented in one renewal frame. Thus, our formulation includes coupled ergodic MDPs. We refer to [1], [3], and [17] for more details on MDP theory and related topics.
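For a single MDP, the renewal construction above amounts to cutting the state trajectory at the visits to the designated recurrent state. The sketch below assumes the randomized stationary policy is already folded into a transition matrix `P` and that the chain starts in the recurrent state (so that $t^n_0 = 0$); the 3-state chain and the helper names are hypothetical.

```python
import random
from typing import List, Sequence


def simulate_chain(P: Sequence[Sequence[float]], start: int, steps: int,
                   rng: random.Random) -> List[int]:
    """Sample a state path of a finite Markov chain with transition matrix P.
    P is assumed to already include the fixed randomized stationary policy."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(P)), weights=P[path[-1]])[0])
    return path


def mdp_renewals(path: Sequence[int], recurrent_state: int) -> List[int]:
    """Renewal times = slots at which the path (re)visits recurrent_state.
    Requires path[0] == recurrent_state so that t_0 = 0."""
    if not path or path[0] != recurrent_state:
        raise ValueError("path must start in the recurrent state")
    return [t for t, s in enumerate(path) if s == recurrent_state]


def frame_lengths(renewals: Sequence[int]) -> List[int]:
    """Lengths T_k = t_{k+1} - t_k of the completed frames."""
    return [b - a for a, b in zip(renewals, renewals[1:])]


if __name__ == "__main__":
    # Hypothetical 3-state chain; state 0 plays the role of the recurrent state.
    P = [[0.1, 0.9, 0.0],
         [0.0, 0.2, 0.8],
         [1.0, 0.0, 0.0]]
    path = simulate_chain(P, start=0, steps=1000, rng=random.Random(0))
    T = frame_lengths(mdp_renewals(path, 0))
    print("completed frames:", len(T), "mean frame length:", sum(T) / len(T))
```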
As a side remark, this multi-MDP problem can be viewed as a single MDP on an enlarged state space. Constrained MDPs are discussed previously in [1]. One can show that under the previous ergodic assumption, the minimum of (1)-(2) is achieved by a randomized stationary policy, and furthermore, such a policy can be obtained offline by solving a linear program reformulated from (1)-(2). However, formulating such an LP requires knowledge of all the parameters in the problem, including the statistics of the external process $\{d[t]\}_{t=0}^\infty$, and the resulting LP is often computationally intractable when the number of MDPs is very large. In contrast, our newly developed algorithm is carried out in an online manner, does not require the statistics of the external process, and enjoys a natural "decoupled" structure, effectively reducing the computational load.
1.2. Challenges and previous approaches. As mentioned above, for the special case of coupled ergodic MDPs, this problem can be solved via a linear program (see [1] and also [8] for detailed discussions on formulating MDPs as linear programs). However, this approach becomes intractable as the number of MDPs gets very large. On the other hand, existing asynchronous algorithms and analyses (e.g., [4], [6], [15], [19]) treat only the case where system delays (frames) have a fixed distribution independent of the actions, or are even deterministic; these are not readily extendable to our problem.
The main technical challenge is the dilemma of how to pick the correct time scale on which to carry out an algorithm and the corresponding analysis. On one hand, since time is slotted, one would naturally think of synchronizing all systems on the slot scale and designing a slot-based algorithm. However, since each renewal spans multiple slots, any such algorithm would essentially cut some renewals in the middle, and it would be difficult to analyze any particular system. On the other hand, if we analyze each system over its own renewal frames, it is not clear how to piece together these individual analyses.
Prior approaches treat this challenge only in special cases. The works [12] and [13] consider a special case where all quantities introduced above are deterministic functions of the actions. The work [20] considers another special case of the current formulation in server scheduling, where there is only one queue stability constraint and it can be easily met by controlling the arrival rate to the system. These two methods circumvent the aforementioned dilemma by making extra assumptions on the system and thus cannot be generalized to the current setting.
1.3. Our contributions. The current paper develops a new algorithm where each system operates on its own renewal frames. It is fully analyzed, with convergence as well as convergence time results. As a first technical contribution, we fully characterize the fundamental performance region of the problem (1)-(2) (Lemma 3.2). We then resolve the aforementioned dilemma by constructing a supermartingale along with a stopping time to "synchronize" all systems on a slot basis, by which we can piece together the analyses of the individual systems to prove the convergence of the proposed algorithm. Furthermore, combining this new idea with convex analysis tools, we prove the $\mathcal{O}(1/\varepsilon^2)$ convergence time of the proposed algorithm to reach $\mathcal{O}(\varepsilon)$ near optimality under a mild assumption on the existence of a Lagrange multiplier (Section 4). Specifically, we show that for any accuracy $\varepsilon > 0$ and any time $T \ge 1/\varepsilon^2$, the sequences $\{y^n[t]\}$ and $\{z^n[t]\}$ produced by our algorithm satisfy
$$\frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} \mathbb{E}(y^n[t]) \le f^* + \mathcal{O}(\varepsilon), \qquad \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} \mathbb{E}(z^n_l[t]) \le d_l + \mathcal{O}(\varepsilon), \quad l \in \{1, \ldots, L\},$$
where $f^*$ denotes the optimal objective value of (1)-(2). Simulation experiments on the aforementioned multi-server energy-aware scheduling problem also demonstrate the effectiveness of the proposed algorithm.
1.4. Other related works. The problem considered in the current paper is a generalization of optimization over a single renewal system. It is shown in [14] that for the single renewal system with a finite action set, the problem can be solved (offline) via a linear fractional program. Methods for solving linear fractional programs can be found in [5] and [18]. The drift-plus-penalty ratio approach is also developed in [11] and [14] for the single renewal system.
On the other hand, our problem is also related to multi-server scheduling, as shown in one of the example applications. When assuming proper statistics of the arrivals and/or services, energy optimization problems in multi-server systems can also be treated via queueing theory. Specifically, by assuming both arrivals and services are Poisson distributed, [9] treats the multi-server system as an M/M/k/setup queue and explicitly computes several performance metrics via the renewal reward theorem. By assuming Poisson arrivals and a single server, [10] and [21] treat the system as a multi-class M/G/1 queue and optimize the average energy consumption via polymatroid optimization.
The rest of the paper is organized as follows: Section 2 introduces the proposed algorithm along with technical assumptions. Section 3 introduces our main technical argument proving the convergence of the proposed algorithm via supermartingale and stopping time constructions. Building upon these technical tools, Section 4 takes one step further and proves the convergence time of the proposed algorithm. Finally, a simulation study regarding multi-server energy-aware scheduling is given in Section 5.
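2.1. Technical preliminaries. Throughout the paper, we make the following assumptions. Following this assumption, we define $f^*$ as the infimum objective value for (1)-(2) over all decision sequences that satisfy the constraints. Assumption 2.2 (Boundedness). For any $k \in \mathbb{N}$ and any $n \in \{1, 2, \ldots, N\}$, there exist absolute constants $y_{\max}$, $z_{\max}$ and $d_{\max}$ such that
$$|y^n[t]| \le y_{\max}, \quad |z^n_l[t]| \le z_{\max}, \quad |d_l[t]| \le d_{\max}, \quad \forall t \in \mathcal{T}^n_k,\ l \in \{1, \ldots, L\}.$$
Furthermore, there exists an absolute constant $B \ge 1$ such that for every fixed $\alpha^n \in \mathcal{A}^n$ and every $s \in \mathbb{N}$ for which $\Pr(T^n_k > s \mid \alpha^n_k = \alpha^n) > 0$,
$$\mathbb{E}\big((T^n_k - s)^2 \mid \alpha^n_k = \alpha^n,\ T^n_k > s\big) \le B. \tag{5}$$
Remark 2.1. The quantity $T^n_k - s$ is usually referred to as the residual lifetime. In the special case where $s = 0$, (5) gives the uniform second moment bound on the renewal frames, $\mathbb{E}\big((T^n_k)^2 \mid \alpha^n_k = \alpha^n\big) \le B$. Note that (5) is satisfied for a large class of problems. In particular, it can be shown to hold in the following three cases:
1. If the inter-renewal time $T^n_k$ is deterministically bounded.
2. If the inter-renewal time $T^n_k$ is geometrically distributed.
3. If each system is a finite state ergodic MDP with a finite action set.
Definition 2.1. For any $\alpha^n \in \mathcal{A}^n$, let
$$\hat{y}^n(\alpha^n) := \mathbb{E}\Big(\sum_{t \in \mathcal{T}^n_k} y^n[t] \,\Big|\, \alpha^n_k = \alpha^n\Big), \quad \hat{z}^n_l(\alpha^n) := \mathbb{E}\Big(\sum_{t \in \mathcal{T}^n_k} z^n_l[t] \,\Big|\, \alpha^n_k = \alpha^n\Big), \quad \hat{T}^n(\alpha^n) := \mathbb{E}\big(T^n_k \mid \alpha^n_k = \alpha^n\big),$$
and let $f^n(\alpha^n) := \hat{y}^n(\alpha^n)/\hat{T}^n(\alpha^n)$ and $g^n_l(\alpha^n) := \hat{z}^n_l(\alpha^n)/\hat{T}^n(\alpha^n)$, so that $(f^n(\alpha^n), g^n(\alpha^n))$ is a performance vector under the action $\alpha^n$.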
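Note that by Assumption 2.2, $\hat{y}^n(\alpha^n)$ and $\hat{z}^n(\alpha^n)$ in Definition 2.1 are both bounded, and $1 \le \hat{T}^n(\alpha^n) \le \sqrt{B}$; hence the set $\{(\hat{y}^n(\alpha^n), \hat{z}^n(\alpha^n), \hat{T}^n(\alpha^n)) : \alpha^n \in \mathcal{A}^n\}$ is bounded. The following mild assumption states that this set is also closed.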
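Finally, we define the performance region of each individual system as follows.
Definition 2.2. Let $\mathcal{S}^n$ be the convex hull of $\{(\hat{y}^n(\alpha^n), \hat{z}^n(\alpha^n), \hat{T}^n(\alpha^n)) : \alpha^n \in \mathcal{A}^n\} \subseteq \mathbb{R}^{L+2}$, and define
$$\mathcal{P}^n := \left\{\left(\frac{y}{T}, \frac{z}{T}\right) : (y, z, T) \in \mathcal{S}^n\right\}$$
as the performance region of system $n$.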
2.2. Proposed algorithm. In this section, we propose an algorithm where each system can make its own decision after observing a global vector of multipliers, which is updated using the global information from all systems. We start by defining a vector of virtual queues $Q[t] := (Q_1[t], Q_2[t], \ldots, Q_L[t])$, which are 0 at $t = 0$ and updated as follows:
$$Q_l[t+1] = \max\Big\{Q_l[t] + \sum_{n=1}^{N} z^n_l[t] - d_l[t],\ 0\Big\}, \quad l \in \{1, \ldots, L\}. \tag{6}$$
These virtual queues serve as global multipliers that control the growth of the corresponding resource consumptions.
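Then, the proposed algorithm runs as follows with a fixed trade-off parameter $V > 0$:
• At the beginning of the $k$-th frame of system $n$, the system observes the vector of virtual queues $Q[t^n_k]$ and makes a decision $\alpha^n_k \in \mathcal{A}^n$ so as to solve the following subproblem:
$$\min_{\alpha^n_k \in \mathcal{A}^n}\ \frac{\mathbb{E}\big(D^n_k \mid Q[t^n_k],\ \alpha^n_k\big)}{\mathbb{E}\big(T^n_k \mid Q[t^n_k],\ \alpha^n_k\big)}, \qquad D^n_k := \sum_{t \in \mathcal{T}^n_k} \Big(V y^n[t] + \sum_{l=1}^{L} Q_l[t^n_k]\, z^n_l[t]\Big). \tag{7}$$
• Update the virtual queues after each slot according to (6).
Note that using the notation specified in Definition 2.1, we can rewrite (7) more concisely as follows:
$$\min_{\alpha^n \in \mathcal{A}^n}\ \frac{V \hat{y}^n(\alpha^n) + \sum_{l=1}^{L} Q_l[t^n_k]\, \hat{z}^n_l(\alpha^n)}{\hat{T}^n(\alpha^n)}, \tag{8}$$
which is a deterministic optimization problem. Then, by the compactness assumption (Assumption 2.3), there always exists a solution to this subproblem.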
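A minimal simulation sketch of the two steps above, under simplifying assumptions that are not part of the paper: every action has a constant per-slot penalty and cost and a deterministic frame length (so $\hat{y} = yT$, $\hat{z} = zT$, $\hat{T} = T$), the action sets are finite so (8) is solved by enumeration, and $d[t]$ is constant. The toy instance (two hypothetical servers and one constraint, with $z = -1$ per slot while serving and $d = -1.5$, i.e., at least 1.5 jobs per slot must be served on average) is made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical action model (an assumption for this sketch, not from the paper):
# (penalty per slot, cost vector per slot, deterministic frame length).
Action = Tuple[float, List[float], int]


def frame_ratio(V: float, Q: List[float], act: Action) -> float:
    """Objective of (8): (V*y_hat + sum_l Q_l*z_hat_l) / T_hat."""
    y, z, T = act
    # With constant per-slot values, y_hat = y*T, z_hat = z*T and T_hat = T.
    y_hat, z_hat, T_hat = y * T, [zl * T for zl in z], float(T)
    return (V * y_hat + sum(q * zl for q, zl in zip(Q, z_hat))) / T_hat


def simulate(systems: List[Dict[str, Action]], d: List[float], V: float,
             horizon: int) -> Tuple[float, List[float], List[float]]:
    """Run the asynchronous algorithm; return (avg penalty, avg costs, final queues)."""
    L = len(d)
    Q = [0.0] * L                       # virtual queues, Q[0] = 0
    current = [""] * len(systems)       # action in force for each system
    remaining = [0] * len(systems)      # slots left in each system's current frame
    total_y, total_z = 0.0, [0.0] * L
    for _ in range(horizon):
        slot_z = [0.0] * L
        for n, actions in enumerate(systems):
            if remaining[n] == 0:
                # Renewal of system n: observe Q and solve (8) by enumeration.
                current[n] = min(actions, key=lambda a: frame_ratio(V, Q, actions[a]))
                remaining[n] = actions[current[n]][2]
            y, z, _ = actions[current[n]]
            total_y += y
            slot_z = [s + zl for s, zl in zip(slot_z, z)]
            remaining[n] -= 1
        # Virtual queue update (6) with the observed d[t] (held constant in this toy).
        Q = [max(q + s - dl, 0.0) for q, s, dl in zip(Q, slot_z, d)]
        total_z = [tz + s for tz, s in zip(total_z, slot_z)]
    return total_y / horizon, [tz / horizon for tz in total_z], Q


if __name__ == "__main__":
    server_a = {"idle": (0.0, [0.0], 1), "serve": (2.0, [-1.0], 2)}
    server_b = {"idle": (0.0, [0.0], 1), "serve": (3.0, [-1.0], 3)}
    avg_y, avg_z, Q = simulate([server_a, server_b], d=[-1.5], V=10.0, horizon=20000)
    print(f"avg penalty {avg_y:.3f}, avg cost {avg_z[0]:.4f}, final queue {Q[0]:.1f}")
```

In this toy instance the cheaper server A ends up serving essentially all the time and server B serves about half of its slots, so the average penalty approaches $2 + 0.5 \cdot 3 = 3.5$, the cheapest way to serve 1.5 jobs per slot. A larger $V$ raises the queue level at which the servers switch to serving, which is the backlog-versus-optimality trade-off controlled by $V$.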
This algorithm requires only the conditional expectations associated with each system's own performance vectors (Definition 2.1), together with the common multipliers $Q[t^n_k]$; therefore, it decouples these systems. Furthermore, the virtual queue update uses the observed $d_l[t]$ and does not require knowledge of the distribution or mean of $d_l[t]$.
In addition, we call $Q[t]$ "virtual queues" for the following two reasons. First, they can be mapped to real queues in applications (such as the server scheduling problem mentioned in Section 1.1.1), where $d[t]$ stands for the arrival process and $z[t]$ for the service process. Second, stabilizing these virtual queues implies that the constraints (2) are satisfied, as illustrated in the following lemma, whose proof is given in the appendix.
imsart-ssy ver.2014/02/20 file: asyn-theory.texdate: May 23, 2018
2.3. Computing subproblems. Since a key step in the algorithm is to solve the optimization problem (8), we make several comments on the computation of the ratio minimization (8). In general, one can solve the ratio optimization problem (7) (and therefore (8)) via a bisection search algorithm; for more details, see Section 7 of [11]. However, bisection search is often not the most efficient approach. We discuss two special cases arising from applications where the subproblem can be solved more simply.
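As an illustration of the bisection idea (a generic sketch, not necessarily the exact procedure of [11]): write $N(\alpha) = V\hat{y}^n(\alpha) + \sum_l Q_l[t^n_k]\hat{z}^n_l(\alpha)$ and $D(\alpha) = \hat{T}^n(\alpha) \ge 1$. For any $\theta$, $h(\theta) = \min_\alpha [N(\alpha) - \theta D(\alpha)]$ is nonnegative exactly when $\theta$ does not exceed the optimal ratio $\theta^*$, so $\theta^*$ can be bracketed by bisection, each step requiring only a linear (non-fractional) minimization. By Assumption 2.2, $|N(\alpha)/D(\alpha)| \le V y_{\max} + z_{\max}\sum_l Q_l[t^n_k]$, which provides an initial bracket. The oracle `make_inner_min` over a finite candidate list and the numbers below are hypothetical.

```python
from typing import Callable, List, Tuple


def bisection_ratio_min(inner_min: Callable[[float], float],
                        lo: float, hi: float, tol: float = 1e-9) -> float:
    """Approximate theta* = min N/D, given inner_min(theta) = min [N - theta*D] with D > 0.
    Requires lo <= theta* <= hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inner_min(mid) >= 0.0:
            lo = mid    # every ratio is >= mid, so theta* >= mid
        else:
            hi = mid    # some ratio is < mid, so theta* < mid
    return 0.5 * (lo + hi)


def make_inner_min(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Oracle for a finite list of (numerator, denominator) pairs (illustration only)."""
    return lambda theta: min(n - theta * d for n, d in points)


if __name__ == "__main__":
    V, Q = 1.0, [2.0]
    # Hypothetical (y_hat, z_hat, T_hat) triples for three actions.
    actions = [(2.0, [-1.0], 2.0), (6.0, [-4.0], 3.0), (0.0, [0.0], 1.0)]
    pts = [(V * y + sum(q * zl for q, zl in zip(Q, z)), T) for y, z, T in actions]
    print(bisection_ratio_min(make_inner_min(pts), -10.0, 10.0))  # any valid bracket works
```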
First, when there are only a finite number of actions in the set $\mathcal{A}^n$, one can solve (8) simply by enumeration. This is a typical scenario in energy-aware scheduling, where a finite action set consists of different processing modes that can be chosen by servers.
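Second, when the set $\{(\hat{y}^n(\alpha^n), \hat{z}^n(\alpha^n), \hat{T}^n(\alpha^n)) : \alpha^n \in \mathcal{A}^n\} \subseteq \mathbb{R}^{L+2}$ is itself the convex hull of a finite sequence $\{(y_j, z_j, T_j)\}_{j=1}^m$, (8) can be rewritten as a simple enumeration:
$$\min_{j \in \{1, \ldots, m\}}\ \frac{V y_j + \sum_{l=1}^{L} Q_l[t^n_k]\, z_{j,l}}{T_j}.$$
To see this, note that by the definition of convex hull, for any $\alpha^n \in \mathcal{A}^n$,
$$(\hat{y}^n(\alpha^n), \hat{z}^n(\alpha^n), \hat{T}^n(\alpha^n)) = \sum_{j=1}^{m} p_j\, (y_j, z_j, T_j)$$
for some $\{p_j\}_{j=1}^m$ with $p_j \ge 0$ and $\sum_{j=1}^m p_j = 1$. Thus,
$$\frac{V \hat{y}^n(\alpha^n) + \sum_{l} Q_l[t^n_k]\, \hat{z}^n_l(\alpha^n)}{\hat{T}^n(\alpha^n)} = \frac{\sum_{i=1}^{m} p_i \big(V y_i + \sum_{l} Q_l[t^n_k]\, z_{i,l}\big)}{\sum_{j=1}^{m} p_j T_j} = \sum_{i=1}^{m} q_i\, \frac{V y_i + \sum_{l} Q_l[t^n_k]\, z_{i,l}}{T_i},$$
where we let $q_i = \frac{p_i T_i}{\sum_{j=1}^m p_j T_j}$. Note that $q_i \ge 0$ and $\sum_{i=1}^m q_i = 1$ (the denominators are positive because $T_i \ge 1$). Hence, solving (8) is equivalent to choosing $\{q_i\}_{i=1}^m$ to minimize the above expression, which boils down to choosing a single $(y_i, z_i, T_i)$ among $\{(y_j, z_j, T_j)\}_{j=1}^m$ that achieves the minimum. Such a convex hull case stands out not only because it yields a simple solution, but also because the ergodic coupled MDPs discussed in Section 1.1.2 have a region of exactly this form, where each point $(y_j, z_j, T_j)$ results from a pure stationary policy ([1]).³ Thus, solving (8) for the ergodic coupled MDPs reduces to choosing a pure policy among a finite number of pure policies.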
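The re-weighting step above can be checked numerically. In the sketch below, `nums[i]` stands for $V y_i + \sum_l Q_l[t^n_k] z_{i,l}$ and `Ts[i]` for $T_i \ge 1$; the random instances are hypothetical. The check confirms that the objective of (8) at any convex combination equals the $q$-weighted average of the vertex ratios, and is therefore never below the best vertex ratio.

```python
import random
from typing import List, Sequence


def combo_ratio(p: Sequence[float], nums: Sequence[float], Ts: Sequence[float]) -> float:
    """Objective of (8) at the convex combination sum_i p_i (y_i, z_i, T_i)."""
    return sum(pi * ni for pi, ni in zip(p, nums)) / sum(pi * Ti for pi, Ti in zip(p, Ts))


def q_weights(p: Sequence[float], Ts: Sequence[float]) -> List[float]:
    """q_i = p_i T_i / sum_j p_j T_j."""
    s = sum(pi * Ti for pi, Ti in zip(p, Ts))
    return [pi * Ti / s for pi, Ti in zip(p, Ts)]


def check_instance(rng: random.Random, m: int = 5) -> bool:
    """Compare both sides of the identity and the vertex lower bound on a random instance."""
    nums = [rng.uniform(-5.0, 5.0) for _ in range(m)]
    Ts = [float(rng.randint(1, 6)) for _ in range(m)]
    w = [rng.random() for _ in range(m)]
    p = [wi / sum(w) for wi in w]
    q = q_weights(p, Ts)
    lhs = combo_ratio(p, nums, Ts)
    rhs = sum(qi * ni / Ti for qi, ni, Ti in zip(q, nums, Ts))
    best_vertex = min(ni / Ti for ni, Ti in zip(nums, Ts))
    return abs(lhs - rhs) < 1e-9 and abs(sum(q) - 1.0) < 1e-12 and lhs >= best_vertex - 1e-12


if __name__ == "__main__":
    rng = random.Random(0)
    print(all(check_instance(rng) for _ in range(1000)))
```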
3. Limiting Performance.For the rest of the paper, the underlying probability space is denoted as the tuple (Ω, F, P ).Let F[t] be the system history up until time slot t.Formally, t ≥ 1 is the σ-algebra generated by all random variables from slot 0 to t.
For the rest of the paper, we always assume Assumptions 2.1-2.3 hold without explicitly mentioning them.
3.1.Convexity.The following lemma demonstrates the convexity of P n in Definition 2.2.
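Lemma 3.1. The performance region $\mathcal{P}^n$ specified in Definition 2.2 is convex for any $n \in \{1, 2, \ldots, N\}$. Furthermore, it is the convex hull of the set $\{(f^n(\alpha^n), g^n(\alpha^n)) : \alpha^n \in \mathcal{A}^n\}$.
Proof. We first prove the convexity of $\mathcal{P}^n$. Consider any two points $(f_1, g_1), (f_2, g_2) \in \mathcal{P}^n$. We aim to show that for any $q \in (0, 1)$,
$$(qf_1 + (1-q)f_2,\ qg_1 + (1-q)g_2) \in \mathcal{P}^n. \tag{9}$$
Notice that by the definition of $\mathcal{P}^n$, there exist $(y_1, z_1, T_1), (y_2, z_2, T_2) \in \mathcal{S}^n$ such that $(f_i, g_i) = (y_i/T_i, z_i/T_i)$, $i = 1, 2$. To show (9), we make a change of variable by letting $p = \frac{qT_2}{qT_2 + (1-q)T_1} \in (0, 1)$, which gives
$$qf_1 + (1-q)f_2 = \frac{py_1 + (1-p)y_2}{pT_1 + (1-p)T_2}, \qquad qg_1 + (1-q)g_2 = \frac{pz_1 + (1-p)z_2}{pT_1 + (1-p)T_2}.$$
Since $\mathcal{S}^n$ is convex, $(py_1 + (1-p)y_2,\ pz_1 + (1-p)z_2,\ pT_1 + (1-p)T_2) \in \mathcal{S}^n$. Thus, by the definition of $\mathcal{P}^n$ again, (9) holds and the first part of the proof is finished.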
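For completeness, the algebra behind the change of variable in the first part of the proof is spelled out below, with $f_i = y_i/T_i$, $g_i = z_i/T_i$ and $T_1, T_2 \ge 1$ (so that $p \in (0,1)$ whenever $q \in (0,1)$):

```latex
\begin{aligned}
p &= \frac{qT_2}{qT_2 + (1-q)T_1}, \qquad 1 - p = \frac{(1-q)T_1}{qT_2 + (1-q)T_1},\\
pT_1 + (1-p)T_2 &= \frac{qT_1T_2 + (1-q)T_1T_2}{qT_2 + (1-q)T_1} = \frac{T_1T_2}{qT_2 + (1-q)T_1},\\
py_1 + (1-p)y_2 &= \frac{qT_2y_1 + (1-q)T_1y_2}{qT_2 + (1-q)T_1},\\
\frac{py_1 + (1-p)y_2}{pT_1 + (1-p)T_2} &= \frac{qT_2y_1 + (1-q)T_1y_2}{T_1T_2}
  = q\,\frac{y_1}{T_1} + (1-q)\,\frac{y_2}{T_2} = qf_1 + (1-q)f_2.
\end{aligned}
```

Replacing $y_i$ by $z_i$ in the last two lines gives $qg_1 + (1-q)g_2$ in the same way.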
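To show the second part of the claim, let
$$\mathcal{Q}^n := \{(f^n(\alpha^n), g^n(\alpha^n)) : \alpha^n \in \mathcal{A}^n\},$$
and let $\mathrm{conv}(\mathcal{Q}^n)$ be the convex hull of $\mathcal{Q}^n$. First of all, by Definition 2.2, $\mathcal{Q}^n \subseteq \mathcal{P}^n$; since $\mathcal{P}^n$ is convex by the first part, $\mathrm{conv}(\mathcal{Q}^n) \subseteq \mathcal{P}^n$. To show the reverse inclusion $\mathcal{P}^n \subseteq \mathrm{conv}(\mathcal{Q}^n)$, note that any point in $\mathcal{P}^n$ can be written in the form $(y/T, z/T)$, where $(y, z, T) \in \mathcal{S}^n$. Since $\mathcal{S}^n$ is by definition the convex hull of $\{(\hat{y}^n(\alpha^n), \hat{z}^n(\alpha^n), \hat{T}^n(\alpha^n)) : \alpha^n \in \mathcal{A}^n\}$, the point $(y, z, T)$ can be written as a convex combination of $m$ points in this set. Let $(\hat{y}^n(\alpha_i), \hat{z}^n(\alpha_i), \hat{T}^n(\alpha_i))$, $i = 1, \ldots, m$, be these points, so that
$$(y, z, T) = \sum_{i=1}^{m} p_i\, (\hat{y}^n(\alpha_i), \hat{z}^n(\alpha_i), \hat{T}^n(\alpha_i)), \quad p_i \ge 0,\ \sum_{i=1}^{m} p_i = 1.$$
As a result, we have
$$\left(\frac{y}{T}, \frac{z}{T}\right) = \frac{\sum_{i=1}^{m} p_i\, (\hat{y}^n(\alpha_i), \hat{z}^n(\alpha_i))}{\sum_{j=1}^{m} p_j\, \hat{T}^n(\alpha_j)}.$$
We make a change of variable by letting $q_i = \frac{p_i \hat{T}^n(\alpha_i)}{\sum_{j=1}^m p_j \hat{T}^n(\alpha_j)}$, so that $(y/T, z/T) = \sum_{i=1}^m q_i\, (f^n(\alpha_i), g^n(\alpha_i))$. Since $\sum_{i=1}^m q_i = 1$ and $q_i \ge 0$, it follows that any point in $\mathcal{P}^n$ can be written as a convex combination of a finite number of points in $\mathcal{Q}^n$, which implies $\mathcal{P}^n \subseteq \mathrm{conv}(\mathcal{Q}^n)$. Overall, we have $\mathcal{P}^n = \mathrm{conv}(\mathcal{Q}^n)$. Finally, by Assumption 2.3, $\mathcal{Q}^n$ is the image of a compact set under the continuous map $(y, z, T) \mapsto (y/T, z/T)$ on $\{T \ge 1\}$, and hence is compact. Thus, $\mathcal{P}^n$, being the convex hull of a compact set in a finite-dimensional space, is also compact.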
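3.2. Key-feature inequality and supermartingale construction. First of all, we have the following fundamental performance lemma, which states that the optimality of (1)-(2) is achievable within the $\mathcal{P}^n$ specified in Definition 2.2.
Lemma 3.2. There exist $(f^n_*, g^n_*) \in \mathcal{P}^n$, $n \in \{1, 2, \ldots, N\}$, such that
$$\sum_{n=1}^{N} f^n_* = f^*, \qquad \sum_{n=1}^{N} g^n_{l,*} \le d_l, \quad l \in \{1, 2, \ldots, L\},$$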
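where $f^*$ is the optimal objective value for problem (1)-(2), i.e., the optimality is achievable within $\otimes_{n=1}^N \mathcal{P}^n$, the Cartesian product of the $\mathcal{P}^n$. Furthermore, for any $(f^n, g^n) \in \mathcal{P}^n$, $n \in \{1, 2, \ldots, N\}$, satisfying $\sum_{n=1}^N g^n_l \le d_l$, $l \in \{1, 2, \ldots, L\}$, we have $\sum_{n=1}^N f^n \ge f^*$, i.e., one cannot achieve better performance than (1)-(2) within $\otimes_{n=1}^N \mathcal{P}^n$. The proof of this lemma is delayed to Appendix A. In particular, the proof uses the following lemma, which also plays an important role in several later lemmas. where $f^n[t] := f^n(\alpha^n_k)$ and $g^n[t] := g^n(\alpha^n_k)$, $t \in \mathcal{T}^n_k$, are constant over each renewal frame of system $n$. The proof of this lemma is delayed to Appendix A.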
Remark 3.1. Note that directly computing the $f^n_*$ and $g^n_{l,*}$ indicated by Lemma 3.2 would be difficult because of the fractional nature of $\mathcal{P}^n$, the coupling between different systems through the time average constraints, and the fact that $d_l = \mathbb{E}(d_l[t])$ might be unknown. However, Lemma 3.2 can be used to prove important performance theorems regarding our proposed algorithm, as indicated by the following lemma.
The following key-feature inequality connects our proposed algorithm with the performance vectors inside $\mathcal{P}^n$. Lemma 3.4. Consider the stochastic processes $\{y^n[t]\}_{t=0}^\infty$, $\{z^n[t]\}_{t=0}^\infty$, and $\{T^n_k\}_{k=0}^\infty$ resulting from the proposed algorithm. For any system $n$, the following holds for any $k \in \mathbb{N}$ and any Proof. First of all, since the proposed algorithm solves (7) over all possible decisions in $\mathcal{A}^n$, it must achieve a value less than or equal to that of any action $\alpha^n \in \mathcal{A}^n$ at the same frame. This gives, where $D^n_k$ is defined in (7) and the equality follows from the renewal property of the system that Since $\mathcal{S}^n$ specified in Definition 2.2 is the convex hull of $\{(\hat{y}^n(\alpha^n), \hat{z}^n(\alpha^n), \hat{T}^n(\alpha^n)) : \alpha^n \in \mathcal{A}^n\}$, it follows that for any vector $(y, z, T) \in \mathcal{S}^n$, we have Dividing both sides by $T$ and using the definition of $\mathcal{P}^n$ in Definition 2.2 gives Finally, since $\{y^n[t]\}_{t=0}^\infty$, $\{z^n[t]\}_{t=0}^\infty$, and $\{T^n_k\}_{k=0}^\infty$ result from the proposed algorithm and the action chosen is determined by $Q[t^n_k]$ as in (7).
This finishes the proof.
Our next step is to give a frame-based analysis for each system by constructing a supermartingale on the per-frame timescale. Recall that $\{\mathcal{F}[t]\}_{t=0}^\infty$ is a filtration (with $\mathcal{F}[t]$ representing the system history during slots $\{0, \ldots, t\}$). Fix a system $n$ and recall that $t^n_k$ is the time slot where the $k$-th renewal occurs for system $n$. We would like to define a filtration corresponding to the random times $t^n_k$. To this end, define the collection of sets $\{\mathcal{F}^n_k\}_{k=0}^\infty$ such that for each $k$, For example, the following set $A$ is an element of $\mathcal{F}^n_3$: The following technical lemma is proved in the appendix.
is also adapted to $\{\mathcal{F}^n_k\}_{k=1}^\infty$, where for any $t \in \mathbb{N}$, $G_t(\cdot)$ is a fixed real-valued measurable mapping. That is, for any $k$, it holds that any measurable function of ) is determined by events in $\mathcal{F}^n_k$.
With Lemma 3.4 and Lemma 3.5, we can construct a supermartingale as follows. Lemma 3.6. Consider the stochastic processes where $B$, $z_{\max}$ and $d_{\max}$ are as defined in Assumption 2.2. Furthermore, define a real-valued process $\{Y^n_K\}_{K=0}^\infty$ on the frame such that $Y^n_0 = 0$ and Proof. Consider any $t \in \mathcal{T}^n_k$; then we can decompose $X^n[t]$ as follows By the queue updating rule (6), we have for any $l \in \{1, 2, \ldots, L\}$ and any $t > t^n_k$,
$$|Q_l[t] - Q_l[t^n_k]| \le (t - t^n_k)(N z_{\max} + d_{\max}). \tag{15}$$
Thus, for the last term in (14), by Hölder's inequality, where the second inequality follows from (15) and the last inequality follows from the boundedness assumption (Assumption 2.2) on the corresponding quantities. Substituting the above bound into (14) gives a bound on E where we use the fact that 0 in the last inequality. Next, by the queue updating rule (6) outcomes from the slots before $t^n_k$. This implies the following display, By Lemma 3.4, we have the following: Thus, rearranging terms in the above inequality shows that the expectation on the right-hand side of (17) is no greater than 0, and hence the first expectation on the right-hand side of (16) is also no greater than 0. For the second expectation in (16), using (5) in Assumption 2.2 gives $\mathbb{E}\big((T^n_k)^2 \mid \mathcal{F}^n_k\big) \le B$, and the first part of the lemma is proved. For the second part of the lemma, by Lemma 3.5 and the definition of $Y^n_K$, the process Thus, $\mathbb{E}(|Y^n_K|) < \infty$, $\forall K \in \mathbb{N}$, i.e., it is absolutely integrable. Furthermore, by the first part of the lemma, finishing the proof.
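The queue bound used in this proof relies on two facts only: the per-slot increment $\sum_n z^n_l[t] - d_l[t]$ is at most $N z_{\max} + d_{\max}$ in absolute value (Assumption 2.2), and the map $Q \mapsto \max\{Q + a, 0\}$ moves a nonnegative $Q$ by at most $|a|$. The randomized check below (hypothetical parameters $N = 3$, $z_{\max} = 1$, $d_{\max} = 2$) illustrates $|Q_l[t] - Q_l[s]| \le (t - s)(N z_{\max} + d_{\max})$ for the update (6).

```python
import random
from typing import List


def queue_path(q0: float, increments: List[float]) -> List[float]:
    """Iterate the virtual queue update (6): Q <- max(Q + a, 0), a = sum_n z_l^n[t] - d_l[t]."""
    path = [q0]
    for a in increments:
        path.append(max(path[-1] + a, 0.0))
    return path


def random_increments(n_sys: int, z_max: float, d_max: float, steps: int,
                      rng: random.Random) -> List[float]:
    """Bounded increments: N costs in [-z_max, z_max] minus d in [-d_max, d_max]."""
    return [sum(rng.uniform(-z_max, z_max) for _ in range(n_sys)) - rng.uniform(-d_max, d_max)
            for _ in range(steps)]


def max_violation(path: List[float], per_slot_bound: float) -> float:
    """Largest |Q[t] - Q[s]| - (t - s) * bound over all s < t (<= 0 when the bound holds)."""
    worst = float("-inf")
    for s in range(len(path)):
        for t in range(s + 1, len(path)):
            worst = max(worst, abs(path[t] - path[s]) - (t - s) * per_slot_bound)
    return worst


if __name__ == "__main__":
    N, z_max, d_max = 3, 1.0, 2.0
    rng = random.Random(0)
    path = queue_path(0.0, random_increments(N, z_max, d_max, 200, rng))
    print(max_violation(path, N * z_max + d_max) <= 1e-12)
```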
3.3. Synchronization lemma. So far, we have analyzed the processes related to each individual system over its renewal frames. However, due to the asynchronous behavior of different systems, the supermartingales of the individual systems cannot be immediately summed.
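In order to get a global performance bound, we have to eliminate any index tied to the renewal frames of an individual system. In other words, we need to look at the system at an arbitrary time slot $T$ as opposed to a renewal time $t^n_k$. We start with the following standard definition of a stopping time:
Definition 3.1. Given a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\{\mathcal{F}_i\}_{i=0}^\infty$ with $\mathcal{F}_0 = \{\emptyset, \Omega\}$, a random variable $\tau$ taking values in $\mathbb{N}$ is a stopping time with respect to $\{\mathcal{F}_i\}_{i=0}^\infty$ if $\{\tau = i\} \in \mathcal{F}_i$ for all $i \in \mathbb{N}$, i.e., the event that the stopping time occurs at time $i$ is contained in the information during slots $0, 1, 2, \ldots, i-1$.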
Next, for any fixed slot $T > 0$, let $S^n[T]$ be the number of renewals up to (and including) time slot $T$, with the convention that the first renewal occurs at time $t = 0$, so $t^n_0 = 0$ and $S^n[0] = 1$. The next lemma shows that $S^n[T]$ is a valid stopping time; its proof is in the appendix.
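A small sketch of the counting variable, assuming the renewal times of one system are available as a strictly increasing list that starts at 0 and extends beyond $T$ (the helper name `renewal_count` is hypothetical). It also checks that the event $\{S^n[T] = k\}$ coincides with $\{t^n_{k-1} \le T < t^n_k\}$, which is determined by the first $k+1$ renewal times; this is the intuition behind the stopping-time property.

```python
from bisect import bisect_right
from typing import Sequence


def renewal_count(renewal_times: Sequence[int], T: int) -> int:
    """S^n[T]: number of renewals t_k <= T, counting t_0 = 0 (so S^n[0] = 1).
    Assumes renewal_times is strictly increasing, starts at 0, and extends past T."""
    return bisect_right(renewal_times, T)


if __name__ == "__main__":
    times = [0, 3, 5, 9, 10, 14]
    for T in range(14):
        k = renewal_count(times, T)
        # {S[T] = k} is the event {t_{k-1} <= T < t_k}.
        assert times[k - 1] <= T < times[k]
        assert k <= T + 1  # at most one renewal per slot
    print([renewal_count(times, T) for T in range(14)])
```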
The following theorem states that a supermartingale stopped at a stopping time is still a supermartingale.
Theorem 3.1 (Theorem 5.2.6 in [7]). If $\tau$ is a stopping time and $\{Y_K\}_{K=0}^\infty$ is a supermartingale, then $\{Y_{K \wedge \tau}\}_{K=0}^\infty$ is also a supermartingale. With this theorem and the above stopping time construction, we have the following lemma: Lemma 3.8. For each $n \in \{1, 2, \ldots, N\}$ and any fixed $T \in \mathbb{N}$, we have where Proof. First, note that the renewal index $k$ starts from 0. Thus, for any fixed $T \in \mathbb{N}$, , and where the third equality follows from the definition of $Y^n_K$ in Lemma 3.6 and the last inequality follows from the fact that the number of renewals up to time slot $T$ is no more than the total number of slots, i.e., $S^n[T] \le T + 1$. For the term $\mathbb{E}\big(Y^n_{S^n[T]}\big)$, we apply Theorem 3.1 with $\tau = S^n[T]$ and index $K$ to obtain For the last term in (18), by the queue updating rule (6), for any $l \in \{1, 2, \ldots, L\}$, it then follows from Hölder's inequality again that where in the second-to-last inequality we use (5) of Assumption 2.2, namely that the residual lifetime has conditional second moment at most $B$, and in the last inequality we use the fact that $B \ge 1$, thus $\sqrt{B} \le B$. Substituting the above bound into (18) gives where $f^*$ is the optimal objective of (1)-(2), $C_1$ is defined in Lemma 3.8 and Proof. Define the drift-plus-penalty expression at time $t$ as (19) By the queue updating rule (6), we have where the second inequality follows from the boundedness assumption (Assumption 2.2) that $\sum_{l=1}^{L} \big(\sum_{n=1}^{N} z^n_l[t] - d_l[t]\big)^2 \le (N z_{\max} + d_{\max})^2 L$, and the equality follows from the fact that $d_l[t]$ is i.i.d. and independent of $Q_l[t]$; thus, $\mathbb{E}(Q_l[t]\, d_l[t]) = d_l\, \mathbb{E}(Q_l[t])$. For simplicity, define $C_3 = \frac{1}{2}(N z_{\max} + d_{\max})^2 L$. Now, by the achievability of optimality in $\otimes_{n=1}^N \mathcal{P}^n$ (Lemma 3.2), we have $\sum_{n=1}^{N} g^n_{l,*} \le d_l$; thus, substituting this inequality into the above bound for $P[t]$ gives where we use the definition of $X^n[t]$ in (13) by substituting $(f^n, g^n)$ with $(f^n_*, g^n_*)$ from Lemma 3.2 in the final equality. Now, by Lemma 3.8, we have for any $T \in \mathbb{N}$, On the other hand, by the definition of $P[t]$ in (19) and then telescoping sums with $Q[0] = 0$, we have Combining this with inequality (20) gives Since the remaining squared term is nonnegative, we can throw it away and the inequality still holds, i.e.
Taking $\limsup_{T \to \infty}$ on both sides gives the near optimality in the theorem. To get the constraint violation bound, we use Assumption 2.2 that $|y^n[t]| \le y_{\max}$; then, by (21) again, we have Sending $T \to \infty$ gives lim Finally, by Lemma 2.1, all constraints are satisfied.
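The per-slot drift bound used in this proof can be sanity-checked pathwise (before expectations are taken). For $Q \ge 0$ and $Q^+ = \max\{Q + a, 0\}$, one has $(Q^+)^2 \le (Q + a)^2$, so $\frac{1}{2}\big((Q^+)^2 - Q^2\big) \le Qa + \frac{1}{2}a^2$; summing over $l$ with $a_l = \sum_n z^n_l[t] - d_l[t]$ and $|a_l| \le N z_{\max} + d_{\max}$ yields the constant $C_3 = \frac{1}{2}(N z_{\max} + d_{\max})^2 L$. The random instances below are hypothetical.

```python
import random
from typing import List


def half_square_drift(Q: List[float], a: List[float]) -> float:
    """(1/2) * sum_l (Q_l[t+1]^2 - Q_l[t]^2) under Q_l[t+1] = max(Q_l[t] + a_l, 0)."""
    return 0.5 * sum(max(q + al, 0.0) ** 2 - q ** 2 for q, al in zip(Q, a))


def drift_bound(Q: List[float], a: List[float], n_sys: int, z_max: float, d_max: float) -> float:
    """sum_l Q_l a_l + C_3, with C_3 = (1/2) (N z_max + d_max)^2 L."""
    c3 = 0.5 * (n_sys * z_max + d_max) ** 2 * len(Q)
    return sum(q * al for q, al in zip(Q, a)) + c3


if __name__ == "__main__":
    rng = random.Random(0)
    N, L, z_max, d_max = 4, 3, 1.0, 2.5
    ok = True
    for _ in range(10000):
        Q = [rng.uniform(0.0, 50.0) for _ in range(L)]
        a = [sum(rng.uniform(-z_max, z_max) for _ in range(N)) - rng.uniform(-d_max, d_max)
             for _ in range(L)]
        ok = ok and half_square_drift(Q, a) <= drift_bound(Q, a, N, z_max, d_max) + 1e-9
    print(ok)
```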
By Lemma 3.1, each $\mathcal{P}^n$ is convex, so $\otimes_{n=1}^N \mathcal{P}^n$ is also convex. Thus, (24)-(26) is a convex program. Furthermore, by Lemma 3.2, (24)-(26) is feasible if and only if (1)-(2) is feasible, and when feasible, the two problems have the same optimal value $f^*$, as specified in Lemma 3.2.
Since P n is convex, one can show (see Proposition 5.1.1 of [2]) that there always exists a sequence (γ 0 , γ i.e. there always exists a hyperplane parametrized by (γ and containing the set side. This hyperplane is called a "separating hyperplane". The following assumption stems from this property and simply assumes this separating hyperplane to be non-vertical (i.e., $\gamma_0 > 0$): Assumption 4.1. There exist nonnegative finite constants $\gamma_1, \gamma_2, \ldots, \gamma_L$ such that the following holds, i.e., there exists a separating hyperplane parametrized by Remark 4.1. The parameters $\gamma_1, \ldots, \gamma_L$ are called Lagrange multipliers, and this assumption is equivalent to the existence of Lagrange multipliers for the constrained convex program (24)-(26). It is known that Lagrange multipliers exist if Slater's condition holds ([2]), which requires a feasible point of the convex program at which all inequality constraints hold strictly. Slater's condition is very common in convex optimization theory and plays an important role in convergence rate analysis, such as the analysis of interior point algorithms ([5]). In the current context, this condition is satisfied, for example, in energy-aware server scheduling problems if the highest possible sum of service rates from all servers is strictly higher than the arrival rate.
are processes resulting from the proposed algorithm. Under Assumption 4.1, where Proof. First of all, from the statement of Lemma 3.3, for the proposed algorithm, we can define the corresponding processes $(f^n[t], g^n[t])$ for all $n$ as where the last equality follows from the definition of $f^n(\alpha^n)$ and )) ∈ P n , ∀t, ∀n.
By Assumption 4.1, we have Rearranging terms gives Taking the time average from 0 to T − 1 gives For the left hand side of (27), we have where the inequality follows from (10) in Lemma 3.3.For the right hand side of (27), we have where the inequality follows from the fact that γ l ≥ 0, ∀l and (11) in Lemma 3.3.Substituting (28) and ( 29) into (27) finishes the proof.

5. Simulation Study in Energy-aware Scheduling. Here, we apply the algorithm introduced in Section 2 to the energy-aware scheduling problem described in Section 1.1. To be specific, we consider a scenario with 5 homogeneous servers and 3 different classes of jobs, i.e., $N = 5$ and $L = 3$. We assume that each server can only choose one class of jobs to serve during each frame. Thus, the mode set $\mathcal{M}^n$ contains three actions $\{1, 2, 3\}$, where action $i$ stands for serving the $i$-th class of jobs, and we count the number of served jobs at the end of each service duration. The action $m^n_k$ determines the following quantities:
• The uniformly distributed total number of class $l$ jobs that can be served with expectation
• The geometrically distributed idle/setup time $I^n_k$ slots with constant energy consumption $p^n$ per slot and zero job service. The expectation $\mathbb{E}(I^n_k \mid m^n_k) := I^n(m^n_k)$.
The idle/setup cost is $p^n = 3$ units per slot, and the rest of the parameters are listed in Table 1.
Following the algorithm description in Section 2, the proposed algorithm has the queue updating rule and each system minimizes ( 7) each frame, which can be written as min .
Each plot for the proposed algorithm is the result of running 1 million slots and taking the time average as the performance of the proposed algorithm.The benchmark is the optimal stationary performance obtained by performing a change of variable and solving a linear program, knowing the arrival rates (see also [12] for details).
Fig. 3 shows that as the trade-off parameter $V$ gets larger, the time average energy consumption under the proposed algorithm approaches the optimal energy consumption. Fig. 4 shows that as $V$ gets larger, the time average number of services also approaches the optimal service rate for each class of jobs. In Fig. 5, we plot the time average queue backlog for each class of jobs versus the parameter $V$. We see that the queue backlog for the first class is always low, whereas the other queue backlogs scale up linearly with $V$. This is because the service rate for the first class is always strictly larger than its arrival rate, whereas for the other classes, as $V$ gets larger, the service rates approach the arrival rates. This plot, together with Fig. 3, also demonstrates that $V$ is indeed a trade-off parameter which trades queue backlog for near optimality.
APPENDIX A: ADDITIONAL LEMMAS AND PROOFS.
).For each summand, by queue updating rule (6), Thus, by the assumption  Taking expectations of both sides with Dividing both sides by T and passing to the limit gives lim sup finishing the proof.
Proof of Lemma 3.3. We prove bound (10) ((11) is proved similarly). By the definition of $f^n(\alpha^n)$ in Definition 2.1, we have for any By the renewal property of the system, given $\alpha^n_k = \alpha^n$, $T^n_k$ and $\sum_{t \in \mathcal{T}^n_k} y^n[t]$ are independent of the past information before $t^n_k$. Thus, the same equality holds if we also condition on $\mathcal{F}^n_k$, i.e., Hence, By the definition of $f^n[t]$, this further implies that Since the number of renewals is always bounded by the number of slots at any time, i.e., $S^n[T] \le T + 1$, it follows where the last inequality follows from Assumption 2.2 for the residual lifetime. Thus, Dividing both sides by $T$ finishes the proof.
Proof of Lemma 3.5. Recall that $t^n_k$ is the time slot at which the k-th renewal occurs ($k = 0, 1, 2, \dots$). It follows from the definition of a stopping time [7] that $\{t^n_k\}_{k=0}^\infty$ is a sequence of stopping times with respect to $\{\mathcal{F}[t]\}_{t=0}^\infty$ satisfying $t^n_k < t^n_{k+1}$, $\forall k$. Thus, by definition of where the last step follows from the assumption that the random variable $Z^n[t-1]$ is measurable with respect to $\mathcal{F}[t]$ for any $t > 0$ and that $t^n_k$ is a stopping time with respect to $\{\mathcal{F}[t]\}_{t=0}^\infty$ for all $k \ge 1$. This gives the second part of the claim.
Proof of Lemma 3.7. We aim to prove $\{S^n[T] = k\} \in \mathcal{F}^n_k$, $\forall k \in \mathbb{N}$. First of all, recall that the index of the renewals starts from $k = 0$ and $t^n_0 = 0$; thus, for any k Consider two cases as follows: The rest of the section is devoted to the proof of Lemma 3.2.
Proof of Lemma 3.2. To prove the first part of the claim, we define the following notation: is a vector of lim sups. By definition, any vector in $\oplus_{n=1}^N \mathcal{P}^n$ can be constructed from $\otimes_{n=1}^N \mathcal{P}^n$; thus, it is enough to show that there exists a vector $r^* \in \oplus_{n=1}^N \mathcal{P}^n$ such that $r^*_0 = f^*$ and the remaining entries satisfy $r^*_l \le d_l$, $l = 1, 2, \dots, L$. By the feasibility assumption for (1)-(2), we can consider any algorithm that achieves the optimality of (1)-(2) and the corresponding process $\{(f^n[t], g^n[t])\}_{t=0}^\infty$ defined in Lemma 3.3 for Thus, by our preceding assumption that the algorithm under consideration achieves the optimality of (1)-(2 Overall, we have shown that $r^* \in \oplus_{n=1}^N \mathcal{P}^n$ achieves the optimality of (1)-(2), and the first part of the lemma is proved.
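To make the relation between the two sets concrete, read $\otimes_{n=1}^N \mathcal{P}^n$ as the Cartesian product and $\oplus_{n=1}^N \mathcal{P}^n$ as the Minkowski (set) sum (our reading of the notation), and assume, purely for illustration, that each $\mathcal{P}^n$ is a finite list of points. Each element of the sum is then the entrywise sum of some tuple in the product, which is the sense in which vectors of $\oplus$ are constructed from $\otimes$. The point sets below are hypothetical.

```python
from itertools import product


def cartesian(sets):
    """Elements of the Cartesian product: one point chosen from each system's set."""
    return list(product(*sets))


def minkowski_sum(sets):
    """Entrywise sums of all tuples in the Cartesian product."""
    return {tuple(sum(coords) for coords in zip(*choice)) for choice in cartesian(sets)}


# Hypothetical per-system points (f^n, g^n_1) for N = 2 systems and L = 1.
P1 = [(1.0, 0.5), (2.0, 0.0)]
P2 = [(0.5, 1.0), (0.0, 2.0)]
print(sorted(minkowski_sum([P1, P2])))
```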
To prove the second part of the lemma, we show that any point in $\otimes_{n=1}^N \mathcal{P}^n$ is achievable by the corresponding time averages of some algorithm. Specifically, consider the following class of randomized stationary algorithms: for each system n, at the beginning of the k-th frame, the controller independently chooses an action $\alpha^n_k$ from the set $\mathcal{A}^n$ according to a fixed probability distribution. Thus, the actions $\{\alpha^n_k\}_{k=0}^\infty$ resulting from any randomized stationary algorithm are i.i.d. By the renewal property of each system, the frame-level process $\{(\sum_{t\in\mathcal{T}^n_k} y^n[t], \sum_{t\in\mathcal{T}^n_k} z^n[t], T^n_k)\}_{k=0}^\infty$ is also i.i.d. for each system n.
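A randomized stationary algorithm and the resulting time average can be illustrated by simulation. In the minimal sketch below, a hypothetical system has two actions: action 0 yields a one-slot frame with total penalty 4, and action 1 yields a three-slot frame with total penalty 0; at every renewal the controller picks each action independently with probability 1/2. Because the frames are i.i.d., the long-run time average penalty should approach the ratio of one-shot expectations $\mathbb{E}(\sum y)/\mathbb{E}(T) = 2/2 = 1$, not the average of the per-frame rates $(4 + 0)/2 = 2$.

```python
import random


def time_average_penalty(probs, frames, num_slots, seed=0):
    """Simulate a randomized stationary policy on one renewal system.

    At each renewal, action i is drawn i.i.d. with probability probs[i];
    frames[i] = (frame length, total penalty over the frame) for action i.
    Runs whole frames until at least num_slots slots have elapsed and
    returns total penalty divided by total slots.
    """
    rng = random.Random(seed)
    slots, penalty = 0, 0.0
    while slots < num_slots:
        i = rng.choices(range(len(frames)), weights=probs)[0]
        length, frame_penalty = frames[i]
        slots += length
        penalty += frame_penalty
    return penalty / slots


frames = [(1, 4.0), (3, 0.0)]  # hypothetical (T, sum of y over the frame) per action
probs = [0.5, 0.5]
ratio_of_means = (sum(p * y for p, (_, y) in zip(probs, frames))
                  / sum(p * length for p, (length, _) in zip(probs, frames)))
print(time_average_penalty(probs, frames, 200_000), ratio_of_means)
```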
Next, we would like to show that any point in $\mathcal{S}^n$ can be achieved by the corresponding expectations of some randomized stationary algorithm. Recall that $\mathcal{S}^n$, defined in Definition 2.2, is the convex hull of We can then use $\{p_i\}_{i=1}^m$ to construct the following randomized stationary algorithm: at the start of each frame k, the controller independently chooses action $\alpha^n_i \in \mathcal{A}^n$ with probability $p_i$ defined above, for $i = 1, 2, \dots, m$. Then, the one-shot expectation of this particular randomized stationary algorithm on system n satisfies Next, by the definition of $\mathcal{P}^n$ in Definition 2.2, any $(f^n, g^n) \in \mathcal{P}^n$ can be written as $(f^n, g^n) = (y/T, z/T)$, where $(y, z, T) \in \mathcal{S}^n$. Thus, it is achievable by the ratio of one-shot expectations from a randomized stationary algorithm, i.e. For the first part on the right hand side of (38), since

Fig 1. The sample timelines of three asynchronous parallel renewal systems, where the numbers underneath the figure index time slots and the numbers inside the blocks index the renewals of each system.
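imsart-ssy ver. 2014/02/20 file: asyn-theory.tex date: May 23, 2018

• $T^n(m^n_k) := \mathbb{E}(T^n_k \mid m^n_k)$: the expected frame size.
• $\mu^n_l(m^n_k) = \mathbb{E}\big(\sum_{t\in\mathcal{T}^n_k} \mu^n_l[t] \mid m^n_k\big)$: the expected number of class l jobs served.
• $e^n(m^n_k) = \mathbb{E}\big(\sum_{t\in\mathcal{T}^n_k} e^n[t] \mid m^n_k\big)$: the expected energy consumption.

The goal is to minimize the time average energy consumption subject to the queue stability constraints, i.e.
$$\min\ \limsup_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\Big(\sum_{n=1}^N e^n[t]\Big)$$
$$\text{s.t.}\ \liminf_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\Big(\sum_{n=1}^N \mu^n_l[t]\Big) \ge \lambda_l, \quad \forall l \in \{1, 2, \dots, L\}. \qquad (4)$$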

Fig 2. Illustration of an energy-aware scheduling system with 3 classes of jobs and 10 parallel servers.

1.5. Notation and organization of the paper. Throughout the paper, we use the superscript $n \in \{1, 2, \dots, N\}$ to index different systems, the subscript $l \in \{1, 2, \dots, L\}$ to index different constraints, and the subscript $k \in \mathbb{N}$ to index the frames. For any vector $x \in \mathbb{R}^d$, the norms considered are $\|x\| := \sqrt{\sum_{i=1}^d x_i^2}$, $\|x\|_1 := \sum_{i=1}^d |x_i|$ and $\|x\|_\infty := \max_i |x_i|$.
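For reference, the three norms can be written directly from their definitions; the sketch below is a plain illustration of the formulas, with a hypothetical test vector.

```python
import math


def norm2(x):
    """Euclidean norm ||x|| = sqrt(sum_i x_i^2)."""
    return math.sqrt(sum(xi * xi for xi in x))


def norm1(x):
    """l1 norm ||x||_1 = sum_i |x_i|."""
    return sum(abs(xi) for xi in x)


def norm_inf(x):
    """l-infinity norm ||x||_inf = max_i |x_i|."""
    return max(abs(xi) for xi in x)


x = [3.0, -4.0, 0.0]
print(norm2(x), norm1(x), norm_inf(x))  # 5.0 7.0 4.0
```

For any vector these satisfy $\|x\|_\infty \le \|x\| \le \|x\|_1$.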

Fig 3. Time average energy consumption versus the V parameter over 1 million slots.

$\mathcal{G}^n := \{(y^n(\alpha^n), z^n(\alpha^n), T^n(\alpha^n)) : \alpha^n \in \mathcal{A}^n\} \subseteq \mathbb{R}^{L+2}$. By the definition of convex hull, any point $(y, z, T) \in \mathcal{S}^n$ can be written as a convex combination of a finite number of points from the set $\mathcal{G}^n$. Letting $\{(y^n(\alpha^n_i), z^n(\alpha^n_i), T^n(\alpha^n_i))\}_{i=1}^m$ be these points, there exists a finite sequence $\{p_i\}_{i=1}^m$ such that
$$(y, z, T) = \sum_{i=1}^m p_i \cdot \big(y^n(\alpha^n_i), z^n(\alpha^n_i), T^n(\alpha^n_i)\big), \quad p_i \ge 0, \quad \sum_{i=1}^m p_i = 1.$$

$$\sum_{i=1}^m p_i \cdot \big(y^n(\alpha^n_i), z^n(\alpha^n_i), T^n(\alpha^n_i)\big) = (y, z, T),$$
which implies that any point in $\mathcal{S}^n$ can be achieved by the corresponding expectations of a randomized stationary algorithm.
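The construction can be checked with exact arithmetic. In the sketch below the points of $\mathcal{G}^n$ and the weights $p_i$ are hypothetical, with $L = 1$ so that each point is $(y, z_1, T)$. The one-shot expectation of the randomized stationary algorithm is the $p$-weighted combination of the points, and the corresponding point of $\mathcal{P}^n$ is obtained by dividing by the expected frame length, as in $(f^n, g^n) = (y/T, z/T)$.

```python
from fractions import Fraction


def one_shot_expectation(points, probs):
    """Expected (y, z_1, T) when action i is chosen with probability probs[i]."""
    if any(p < 0 for p in probs) or sum(probs) != 1:
        raise ValueError("probs must be a probability vector")
    dim = len(points[0])
    return tuple(sum(p * pt[j] for p, pt in zip(probs, points)) for j in range(dim))


def to_rate_point(y, z, T):
    """Map (y, z, T) in S^n to (f, g) = (y / T, z / T) in P^n."""
    return (y / T, z / T)


points = [(Fraction(4), Fraction(1), Fraction(1)),   # hypothetical (y, z_1, T) for action 1
          (Fraction(0), Fraction(3), Fraction(3))]   # hypothetical (y, z_1, T) for action 2
probs = [Fraction(1, 2), Fraction(1, 2)]
y, z, T = one_shot_expectation(points, probs)
print((y, z, T), to_rate_point(y, z, T))
```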
Now we claim that, for $y^n[t]$, $z^n[t]$ and $T^n_k$ resulting from the randomized stationary algorithm, (36) and (37) hold. We prove (36); (37) is shown in a similar way. Consider any fixed T, and let $S^n[T]$ be the number of renewals up to (and including) time T. Then, from Lemma 3.7 in Section 3, $S^n[T]$ is a valid stopping time with respect to the filtration $\{\mathcal{F}^n_k\}_{k=0}^\infty$.
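In the proof of (36), the slots from $T$ up to the next renewal $t^n_{S^n[T]}$ form a boundary term. The sum over these slots has exactly $t^n_{S^n[T]} - T$ terms (the residual life), so if $|y^n[t]| \le y_{\max}$ its magnitude is at most $y_{\max}$ times the residual life, and it vanishes after dividing by $T$ whenever the residual life stays bounded. The deterministic check below assumes hypothetical frames of at most 5 slots and hypothetical penalties in $[-y_{\max}, y_{\max}]$; it does not model Assumption 2.2 itself.

```python
import random
from itertools import accumulate


def boundary_term(y, times, T):
    """Return (sum of y[t] for T <= t < t_{S[T]}, residual life t_{S[T]} - T)."""
    S = sum(1 for t in times if t <= T)  # renewals up to and including T
    end = times[S]                        # first renewal strictly after T
    return sum(y[T:end]), end - T


rng = random.Random(1)
frames = [rng.randint(1, 5) for _ in range(2000)]  # hypothetical frames, at most 5 slots
times = [0] + list(accumulate(frames))
y_max = 2.0
y = [rng.uniform(-y_max, y_max) for _ in range(times[-1])]  # hypothetical penalties

for T in (10, 100, 1000, 5000):
    b, residual = boundary_term(y, times, T)
    print(T, residual, abs(b) / T)
```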

For the second part on the right hand side of (38), by Assumption 2.2,
$$\left|\mathbb{E}\Big(\sum_{t=T}^{t^n_{S^n[T]}-1} y^n[t]\Big)\right| \le y_{\max} \cdot \mathbb{E}\big(t^n_{S^n[T]} - T\big) \le \sqrt{B}\, y_{\max},$$
which implies $\lim_{T\to\infty} \frac{1}{T}\mathbb{E}\Big(\sum_{t=T}^{t^n_{S^n[T]}-1} y^n[t]\Big) = 0$. Overall, (36) holds.

At time slot t, after observing the state $s[t] \in \mathcal{S}^n$ and choosing the action $u[t] \in \mathcal{U}^n$, the n-th MDP receives a penalty $y^n(u[t], s[t])$ and L types of resource costs $z^n_1$

Definition 2.2. Let $\mathcal{S}^n$ be the convex hull of $y^n$

Thus, by Lemma 3.5, $Q[t^n_k]$ is determined by $\mathcal{F}^n_k$. For the proposed algorithm, each system makes decisions purely based on the virtual queue state $Q[t^n_k]$, and by the renewal property of each system, given the decision at the k-th renewal, the random quantities $T^n_k$, $z^n[t]$ and $y^n[t]$, $t \in \mathcal{T}^n_k$, are independent of the

$L z_{\max}(z_{\max} + d_{\max})B$ from Lemma 3.6 in the equality and use $T + 1 \le 2T$ in the final inequality. Dividing both sides by T finishes the proof.

Theorem 3.2. The sequences $\{y^n[t]\}_{t=0}^\infty$ and $\{z^n[t]\}_{t=0}^\infty$ produced by the proposed algorithm satisfy all the constraints in (2) and achieve $\mathcal{O}(1/V)$ near optimality, i.e.
3.4. Achieving near optimality. The following theorem gives the performance bound of our proposed algorithm.

is a martingale. Consider any fixed $T \in \mathbb{N}$ and define $S^n[T]$ as the number of renewals up to T. Lemma 3.7 shows that $S^n[T]$ is a valid stopping time with respect to the filtration $\{\mathcal{F}^n_k\}_{k=0}^\infty$. Furthermore, $\{F^n_{K\wedge S^n[T]}\}_{K=0}^\infty$ is a supermartingale by Theorem 3.1, where $a \wedge b := \min\{a, b\}$. For this fixed T, we have

Thus, $A \in \mathcal{F}^n_{k+1}$, which implies $\mathcal{F}^n_k \subseteq \mathcal{F}^n_{k+1}$, $\forall k$, and $\{\mathcal{F}^n_k\}_{k=0}^\infty$ is indeed a filtration. This finishes the first part of the proof. Next, we would like to show that $G_{t^n_k}$

1. $t \le T$. In this case, the set (35) is empty and obviously belongs to $\mathcal{F}[t]$.
2. $t > T$. In this case, we have $\{t^n$

Overall, we have $\{S^n[T] = k\} \cap \{t^n_k \le t\} \in \mathcal{F}[t]$, $\forall t \in \mathbb{N}$. Thus, $\{S^n[T] = k\} \in \mathcal{F}^n_k$ and $S^n[T]$ is indeed a valid stopping time with respect to the filtration $\{\mathcal{F}^n_k\}_{k=0}^\infty$.
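The case analysis above reflects the fact that whether $S^n[T] = k$ holds can be decided from the first $k + 1$ renewal times alone, which is what makes $S^n[T]$ a stopping time for $\{\mathcal{F}^n_k\}$. Since renewal times are strictly increasing with $t^n_0 = 0$, $S^n[T] = k$ holds exactly when $t^n_{k-1} \le T < t^n_k$. The minimal sketch below, with hypothetical frame lengths, decides this event from a prefix of the renewal times and compares it with the count taken over the full path.

```python
from itertools import accumulate


def s_equals_k(prefix_times, T):
    """Decide {S[T] = k} from the first k+1 renewal times t_0, ..., t_k only.

    Renewal times are strictly increasing with t_0 = 0, so S[T] = k
    exactly when t_{k-1} <= T < t_k (here k >= 1 because t_0 <= T always).
    """
    k = len(prefix_times) - 1
    return k >= 1 and prefix_times[k - 1] <= T < prefix_times[k]


frames = [2, 1, 3, 2, 4]                 # hypothetical frame lengths
times = [0] + list(accumulate(frames))   # [0, 2, 3, 6, 8, 12]
T = 7
k_star = sum(1 for t in times if t <= T)  # S[T] counted with the full path
print(k_star, [k for k in range(1, len(times)) if s_equals_k(times[:k + 1], T)])
```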