Analysis of a Splitting Estimator for Rare Event Probabilities in Jackson Networks

We consider a standard splitting algorithm for the rare-event simulation of overflow probabilities in any subset of stations in a Jackson network at level n, starting at a fixed initial position. It was shown by Dean and Dupuis [8] that a subsolution to the Isaacs equation guarantees that a subexponential number of function evaluations (in n) suffices to estimate such overflow probabilities within a given relative accuracy. Our analysis here shows that in fact $O(n^{2\beta+1})$ function evaluations suffice to achieve a given relative precision, where $\beta$ is the number of bottleneck stations in the network. This is the first rigorous analysis that allows one to favorably compare splitting against directly computing the overflow probability of interest, which can be evaluated by solving a linear system of equations with $O(n^{d})$ variables.


Introduction
The development of rare-event simulation algorithms for overflow probabilities in stable open Jackson networks has been the subject of a substantial number of papers during the last decades (see Section 2 for the specification of an open Jackson network). A couple of early references on the subject are [20] and [1]. Subsequent work which has also been very influential in the development of efficient algorithms for overflows of Jackson networks includes [22], [12], [13], [16], [14], [9], [19], [11] and [8]. The survey papers [15] and [7] provide additional references on this topic.
The two most popular approaches applied to the construction of efficient rare-event simulation algorithms are importance sampling and splitting (see [3]). Importance sampling involves simulating the system under consideration (in our case the Jackson network) according to a different set of probabilities in order to induce the occurrence of the rare event. Then, one attaches a weight to each simulation corresponding to the likelihood ratio of the observed outcome relative to the nominal (original) distribution. In splitting, on the other hand, there is no attempt to bias the behavior of the system. Instead, the idea is to decompose the occurrence of the rare event of interest (in our case overflow in a Jackson network) into a sequence of nested "milestone" events whose successive occurrences are not rare. The rare event occurs when the last of the milestone events occurs. The idea is to keep splitting the particles as they reach subsequent milestones. Of course, each particle is assigned a weight corresponding to the total number of times it has split, so that the overall estimate (which is the sum of the weights corresponding to the particles that make it to the last milestone) provides an unbiased estimator of the probability of interest.
The most popular performance measure for the efficiency analysis of rare-event simulation algorithms for Jackson networks is that of "asymptotic optimality" or "weak efficiency". In order both to explain the computational complexity implied by this notion and to put our contributions in perspective, let us discuss the class of problems we are interested in. Starting from any fixed state, we consider the problem of computing the probability that the total number of customers in any fixed set of stations in the network reaches level $n$ prior to reaching the origin. That is, the probability that the sum of the queue lengths in any given set of stations reaches level $n$ within a busy period. The number of stations in the whole network is assumed to be $d$ and the number of bottleneck stations (i.e. stations with the maximum traffic intensity in equilibrium) is $\beta$.
Weak efficiency guarantees that a subexponential number of replications (as a function of the overflow level, say $n$) suffices to compute the underlying overflow probability of interest within a given relative accuracy. In contrast, as we shall explain in Section 2, overflow probabilities in the setting of Jackson networks can be computed by solving a linear system of equations with $O(n^{d})$ unknowns. It is well known that Gaussian elimination then takes $O(n^{3d})$ operations to find an exact solution to such a linear system. Moreover, since in our case the associated linear system has some sparsity properties, the linear equations can be solved in as few as $O(n^{3d-2})$ operations (see the discussion in Section 2). Our analysis of the solution of the associated linear system of equations is not intended to be exhaustive. Our objective is simply to make the point that naive Monte Carlo (which indeed takes an exponential number of replications in $n$ to achieve a given relative accuracy) is not the natural benchmark that one should use in order to test the performance of an efficient simulation estimator for overflows in Jackson networks. Rather, a more natural benchmark is the application of a straightforward method for solving the associated system of linear equations. It would be interesting to provide a detailed study of various methods for solving linear systems of equations (such as multigrid procedures) that are suitable for our environment, and even to combine them with the ideas behind efficient simulation procedures. This, however, would be the subject of an entire paper and is therefore left as a topic for future research.
Our goal here is to analyze a class of splitting algorithms similar to those introduced in [22] for the evaluation of overflow probabilities at level $n$. Further analysis was given in [8], where the authors provide necessary and sufficient conditions on the design of the "milestone events" in order to achieve subexponential complexity in $n$. Our contribution is to show that if the milestone events are properly placed as suggested by [8], the splitting algorithm requires $O(n^{2\beta+1})$ function evaluations to achieve a fixed relative error. Since the number of bottleneck stations $\beta$ is clearly at most $d$, the complexity of splitting is $O(n^{2d+1})$, which is substantially smaller than that of the direct solution of the associated linear system. Our analysis therefore provides theoretical justification for the superior performance observed when applying splitting algorithms compared to directly solving the associated linear system.
We believe that our results shed light on the type of performance that can be expected when applying particle algorithms beyond the setting of Jackson networks. This feature should be emphasized, especially given the fact that a linear-time algorithm for computing overflows in Jackson networks has been developed very recently (see [4]). Contrary to particle methods, which are versatile and can in principle be applied in great generality, the algorithm in [4] takes advantage of certain properties of Jackson networks which are not shared by all classes of systems.
In addition, our results provide interesting connections to recent performance analyses in the context of state-dependent importance sampling algorithms for a class of Jackson networks. These connections might eventually help users of rare-event simulation algorithms decide when to apply importance sampling or splitting. For instance, consider the overflow at level $n$ of the total population of a tandem network with $d$ stations. The work of [9] proposes an importance sampling estimator based on a subsolution of an associated Isaacs equation. In particular, [9] shows that if exponential tiltings are applied using the gradient of the associated subsolution as the tilting parameter (depending on the current state), the corresponding algorithm is weakly efficient. It turns out that there are many subsolutions that can be constructed by varying certain so-called "mollification parameters". A recent analysis based on Lyapunov inequalities given in [6] shows that a natural selection of mollification parameters guarantees $O(n^{2(d-\beta)+1})$ function evaluations to achieve a given relative error. Our analysis here therefore guarantees that one can achieve a running time of order $O(n^{d+1})$ by choosing importance sampling when there are more than $d/2$ bottleneck stations in the network and splitting when there are fewer than $d/2$ bottleneck stations. Although our analysis is still not sharp, we believe that our results provide a significant step toward understanding the connections between splitting and importance sampling.
The rest of the paper is organized as follows. A brief discussion on complexity and efficiency considerations is given in Section 2. Then we discuss the necessary large deviations asymptotics for Jackson networks required for our analysis in Section 3. The introduction of the splitting algorithm as well as connections to the theory developed in [8] is given in Section 4. Our complexity analysis is finally given in Section 5.

Complexity and Efficiency
We shall review concepts of efficiency and complexity in rare-event simulation. We start our discussion in the context of a generic class of rare-event simulation problems. Consider a sequence of events $\{E_n : n = 1, 2, \ldots\}$ with $p_n \triangleq P(E_n) \to 0$ as $n \to \infty$. (Without loss of generality, we might assume that $p_n \to 0$ exponentially fast as $n \nearrow \infty$.) The design of an efficient rare-event simulation algorithm is typically associated with the construction of an unbiased estimator, say $\hat{p}_n$, such that $p_n = E\hat{p}_n$. Let $\bar{p}_n(m)$ denote the average of $m$ i.i.d. replications of $\hat{p}_n$. By virtue of Chebyshev's inequality we obtain the following bound for the relative error, $|\bar{p}_n(m) - p_n|/p_n$, of the estimate:
$$P\left(\frac{|\bar{p}_n(m) - p_n|}{p_n} > \epsilon\right) \le \frac{\mathrm{Var}(\hat{p}_n)}{m \epsilon^2 p_n^2} = \frac{cv_n^2}{m \epsilon^2}. \qquad (1)$$
Hence, for a pre-determined upper bound $\epsilon$ on the relative error, if we choose the number of replications $m$ such that
$$m \ge \frac{cv_n^2}{\epsilon^2 \delta}, \qquad (2)$$
where $cv_n^2 = \mathrm{Var}(\hat{p}_n)/p_n^2$ is the squared coefficient of variation of $\hat{p}_n$, we can guarantee that the relative error is no larger than $\epsilon$ with probability at least $1 - \delta$.
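As a concrete illustration, the replication count prescribed by (2) can be computed directly; a minimal sketch (the function name is ours):

```python
import math

def replications_needed(cv2, eps, delta):
    """Smallest m with cv2/(m*eps**2) <= delta, i.e. Chebyshev's bound
    guarantees relative error at most eps with probability >= 1 - delta."""
    return math.ceil(cv2 / (eps**2 * delta))

# Squared coefficient of variation 100, 10% relative error,
# 95% confidence:
print(replications_needed(100.0, 0.1, 0.05))   # -> 200000
```

The quadratic dependence on $1/\epsilon$ is what makes the growth rate of $cv_n^2$ the decisive quantity.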
Equation (2) stipulates that $m$ needs to grow at least at the same rate as $cv_n^2$ does in order to keep the relative error within a desirable threshold. If $cv_n^2$ grows at a subexponential rate (i.e. if $\log(cv_n^2) = o(\log p_n)$ as $n \nearrow \infty$) the estimator is said to be asymptotically optimal, logarithmically efficient, or weakly efficient. In this case, the number of replications needs to increase only subexponentially in $n$ to achieve a prescribed level of relative accuracy. The name "asymptotic optimality" is derived from the fact that weak efficiency implies that the exponential rate of decay to zero of $E\hat{p}_n^2$ coincides with that of $p_n^2$ and therefore is maximal (by virtue of Jensen's inequality).
Obviously, one has to keep in mind that weak efficiency measures the optimality of the estimator for a given computational budget. Coming to the splitting algorithm, it is apparent that the computational effort varies drastically with the degree of splitting performed; one must therefore take into account the cost involved in generating each replication of $\hat{p}_n$. We measure such cost in terms of the number of elementary function evaluations, which we take to be simple additions, multiplications, comparisons, and the generation of a single uniform random variable. When we incorporate the computational cost per replication of the estimator, (2) says that the total number of function evaluations needed has to keep pace with the work-normalized squared coefficient of variation, i.e., $cv_n^2 \cdot N_n$, where $N_n$ is the cost per replication of $\hat{p}_n$. We will show in Section 5 that $N_n$ is closely related to the expected total number of surviving particles in a single run of the splitting algorithm.
Coming back to the setting of Jackson networks, it is important to recognize that overflow probabilities in such a setting can be obtained by solving a system of linear equations. So, a reasonable benchmark for any simulation-based algorithm to be regarded as "efficient" is indeed how fast one can solve such a system of linear equations by a direct procedure. Jackson networks are basically multidimensional simple random walks with constrained behavior at the boundaries. In particular, they are Markov chains living on a countable state-space. The overflow probabilities can be conveniently expressed as first passage time probabilities, which in turn can be characterized as the solution to a certain linear system of equations thanks to the countable state-space Markov chain structure. We shall quickly review how to obtain such a linear system for a generic Markov chain $Q = \{Q_k : k \ge 0\}$ living on a countable state-space $S$ with transition matrix $\{K(x, y) : x, y \in S\}$. Let $A, B$ be two disjoint subsets of $S$, define $T_A \triangleq \inf\{k \ge 0 : Q_k \in A\}$, $T_B \triangleq \inf\{k \ge 0 : Q_k \in B\}$ and put $p(x) = P_x(T_A \le T_B)$. A simple conditioning argument on the first transition leads to
$$p(x) = \sum_{y \in S} K(x, y) p(y), \quad x \notin A \cup B, \qquad (3)$$
subject to the boundary conditions $p(x) = 1$ for $x \in A$ and $p(x) = 0$ for $x \in B$. In fact, $p(\cdot)$ is the minimal non-negative solution to the above system (see [5]). Now, if $Q$ describes the state of the embedded discrete time Markov chain corresponding to a Jackson network with $d$ stations then $S = \mathbb{Z}_+^d$. The transition dynamics of a Jackson network are specified as follows (see [21] p. 92). Inter-arrival times and service times are all independent and exponentially distributed random variables. The arrival rates are given by the vector $\lambda = (\lambda_1, \ldots, \lambda_d)^T$ and the service rates by $\mu = (\mu_1, \ldots, \mu_d)^T$. (By convention all vectors in this paper are taken to be column vectors and $T$ denotes transposition.)
A job that leaves station $i$ joins station $j$ with probability $P_{i,j}$ and leaves the system with probability $P_{i,0} = 1 - \sum_{j=1}^d P_{i,j}$. We shall consider open Jackson networks, which satisfy the following conditions: i) $\forall i$, either $\lambda_i > 0$ or $\lambda_{j_1} P_{j_1 j_2} \cdots P_{j_k i} > 0$ for some $j_1, \ldots, j_k$; ii) $\forall i$, either $P_{i0} > 0$ or $P_{i j_1} P_{j_1 j_2} \cdots P_{j_k 0} > 0$ for some $j_1, \ldots, j_k$.
iii) The network is stable (i.e. a stationary distribution exists).
These conditions simply require that each station receives jobs either directly from the outside or routed from other stations, and that each job eventually leaves the system. Our main interest lies in the evaluation of $p_n(x)$ assuming that $B = \{0\}$ and $A_n = \{y : v^T y = n\}$, where $v$ is a binary vector which encodes a particular subset of the network (i.e., the $i$-th position of the vector $v$ is 1 if station $i$ falls in the subset of interest, and 0 otherwise). We shall denote by $V(x) = x^T v$ the mapping recording the total population in the stations corresponding to the vector $v$. The case $v = 1 = (1, 1, \ldots, 1)^T$ corresponds to the total population of the system. So, $p_n(x)$, or more precisely $p_n^V(x)$, corresponds to the overflow probability in the subset encoded by $v$ within a busy period, starting from $x$. In this setting, it follows (as we shall review in the next section) that $p_n^V(x) \to 0$ exponentially fast in $n$ as $n \nearrow \infty$ and the system of equations (3) has $O(n^d)$ unknowns. Gaussian elimination requires $O(n^{3d})$ function evaluations to find the solution of such a system. But since each state of the Markov chain in this case interacts with only a small fraction of the entire state-space, it is possible to permute the states (say in lexicographic order) so that the system is banded (i.e. the associated matrix is sparse in the sense that its non-zero entries fall within a diagonal band). One can show that the bandwidth is $O(n^{d-1})$, and therefore solving such a banded linear system requires $O(n^d \cdot (n^{d-1})^2) = O(n^{3d-2})$ operations (see, e.g., [2]).
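For intuition, in the one-dimensional case ($d = 1$, a single birth-death queue) the system (3) is tridiagonal and can be solved directly, then checked against the classical gambler's-ruin formula; a minimal sketch (the function name is ours):

```python
import numpy as np

def overflow_prob(n, p_up):
    """Solve the first-passage system p(x) = p_up*p(x+1) + (1-p_up)*p(x-1)
    for 0 < x < n, with boundary conditions p(0) = 0 and p(n) = 1,
    giving p(x) = P_x(reach n before 0) for a birth-death chain."""
    A = np.eye(n + 1)
    b = np.zeros(n + 1)
    b[n] = 1.0                       # boundary condition p(n) = 1
    for x in range(1, n):
        A[x, x + 1] = -p_up
        A[x, x - 1] = -(1.0 - p_up)
    return np.linalg.solve(A, b)

# Sanity check against p(x) = ((q/p)^x - 1)/((q/p)^n - 1), q/p = 2 here:
n, p_up = 10, 1.0 / 3.0
p = overflow_prob(n, p_up)
exact = (2.0**1 - 1.0) / (2.0**n - 1.0)   # starting state x = 1
print(abs(p[1] - exact) < 1e-12)          # -> True
```

For $d > 1$ the same construction applies state by state, but the matrix has $O(n^d)$ rows, which is the source of the $O(n^{3d-2})$ banded-solve cost quoted above.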
Estimators that possess weak efficiency (in a work-normalized sense) are only guaranteed to run in subexponential time. When compared to the polynomial-time algorithms above for solving systems of linear equations, such an efficiency analysis is therefore insufficient. We will show in later analysis that the multilevel splitting algorithm suggested by Dean and Dupuis [8], applied to estimate overflow probabilities in Jackson networks, requires fewer function evaluations than directly solving the associated system of linear equations.

Jackson Networks: Notation and Properties
As we mentioned in the previous section, a Jackson network is encoded by the two vectors of arrival and service rates, $\lambda$ and $\mu$, together with the routing matrix $P$. The network is assumed to be open and stable, so conditions i), ii), and iii) described in the previous section are in place.
Given the stability assumption, the system of traffic equations
$$\phi^T = \lambda^T + \phi^T P$$
admits a unique solution $\phi^T = \lambda^T (I - P)^{-1}$ (see [3]). The traffic intensity at station $i$ in equilibrium is given by $\rho_i \triangleq \phi_i / \mu_i$, which satisfies $\rho_i \in (0, 1)$ for all $i = 1, 2, \ldots, d$. Define $\rho_* = \max_{1 \le i \le d} \rho_i$ and let $\beta$ be the cardinality of the set $\{i : \rho_i = \rho_*\}$. We shall study the queueing network by means of the embedded discrete time Markov chain $Q = \{Q(k) : k \ge 0\}$, where $Q(k) = (Q_1(k), \ldots, Q_d(k))^T$. For each $k$, $Q_i(k)$ represents the number of customers in station $i$ immediately after the $k$-th transition epoch of the system. As mentioned before, the process $Q$ lives in the space $S = \mathbb{Z}_+^d$. Let $V(y) = y^T v$ be the total population in the stations corresponding to the binary vector $v$. We are interested in the overflow probability in any given subset of the Jackson network. More precisely, we wish to estimate $p_n^V = P\{$total population in the stations encoded by $v$ reaches $n$ before returning to 0, starting from 0$\}$.
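The traffic equations can be solved numerically in a few lines; a sketch (the two-station tandem data below is purely illustrative):

```python
import numpy as np

def traffic_intensities(lam, mu, P):
    """Solve the traffic equations phi^T = lam^T + phi^T P, i.e.
    phi^T = lam^T (I - P)^{-1}, and return rho_i = phi_i / mu_i."""
    lam, mu, P = np.asarray(lam), np.asarray(mu), np.asarray(P)
    phi = np.linalg.solve(np.eye(len(lam)) - P.T, lam)
    return phi / mu

# Two-station tandem: external arrivals enter station 1 only and are
# routed 1 -> 2 -> out.  Then phi = (1, 1) and rho = (1/3, 1/2), so
# station 2 is the unique bottleneck (beta = 1, rho_* = 1/2).
rho = traffic_intensities([1.0, 0.0], [3.0, 2.0], [[0.0, 1.0], [0.0, 0.0]])
```

Note that $\phi^T = \lambda^T(I-P)^{-1}$ is computed as the solution of $(I - P^T)\phi = \lambda$, avoiding an explicit matrix inverse.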
In turn, $p_n^V$ can be expressed in terms of the stopping times
$$T_{A_n} \triangleq \inf\{k \ge 0 : V(Q(k)) \ge n\}, \qquad T_{\{0\}} \triangleq \inf\{k \ge 0 : Q(k) = 0\}.$$
Indeed, if we use the notation $P_x(\cdot) \triangleq P(\cdot \mid Q(0) = x)$ then we can rewrite $p_n^V(x) = P_x(T_{A_n} \le T_{\{0\}})$; similarly, $p_n^V$ is obtained by starting the chain at the origin and replacing $T_{\{0\}}$ by the first return time to the origin. The asymptotic analysis of $p_n^V(x)$ can be studied by means of large deviations theory. We shall indicate how this theory can be applied to specify an efficient splitting algorithm in the next section. In the meantime, let us provide a representation for the dynamics of the queue length process that will be convenient in order to motivate the elements of the efficient splitting algorithm that we shall analyze.
As we mentioned earlier, Jackson networks are basically constrained random walks. The constraints arise because the number of customers in each station must be non-negative. Thinking about Jackson networks as constrained random walks facilitates the introduction and motivation of the necessary large deviations elements behind the description of the splitting algorithm. In order to specify the dynamics of the embedded discrete time Markov chain in terms of a random walk type representation we need to introduce notation which will be useful to specify the transitions at the boundaries induced by the non-negativity constraints.
The state-space $\mathbb{Z}_+^d$ can be partitioned into $2^d$ different regions, indexed by the subsets $E \subseteq \{1, \ldots, d\}$. The region encoded by a given subset $E$ is defined as
$$\partial_E \triangleq \{x \in \mathbb{Z}_+^d : x_i = 0 \text{ for } i \in E, \; x_i > 0 \text{ for } i \notin E\}.$$
The interior of the domain is given by $\partial_\emptyset$ and the origin is represented by $\partial_{\{1,2,\ldots,d\}}$. Subsets other than the empty set represent the "boundaries" of the state-space and correspond to system configurations in which at least one station is empty. The collection of all possible values that the increments of the process $Q$ can take depends on the current region in which $Q$ is positioned. However, in any case, such a collection is a subset of
$$\{e_i, \; -e_i + e_j, \; -e_i : 1 \le i, j \le d\},$$
where $e_i$ is the vector whose $i$-th component is one and the rest are zero. An element of the form $e_i$ represents an arrival at station $i$, an element of the form $-e_i + e_j$ represents a departure from station $i$ that flows to station $j$, and an element of the form $-e_i$ represents a departure from station $i$ out of the system. Because of the non-negativity constraints at the boundaries of the system we have to be careful when specifying the transition dynamics: no departure can occur from an empty station. First we define a sequence of i.i.d. random variables $\{Y(k) : k \ge 1\}$ encoding the proposed increments. The dynamics of the queue-length process then admit the random walk type representation
$$Q(k+1) = Q(k) + \zeta(Q(k), Y(k+1)), \qquad (8)$$
where $\zeta(\cdot)$ is the constraining mapping, defined so that transitions that would make some queue length negative are suitably suppressed. The large deviations theory associated to Jackson networks is somewhat similar (at least in form) to that of random walks. One has to recognize, of course, that the non-smoothness of the constraining mapping as a function of the state of the system creates substantial technical complications, but we will leave this issue aside in our discussion because our objective is simply to describe the form of the necessary large deviations results for our purposes. An extremely important role in the development of large deviations theory for light-tailed random walks is played by the log-moment generating function of the increment distribution.
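The constrained dynamics can be sketched in code. The precise definition of $\zeta$ at the boundaries is not reproduced above, so the sketch below adopts one standard convention for the embedded chain: the chain is uniformized and a service completion at an empty station is treated as a self-loop (all names are ours):

```python
import numpy as np

def step(q, lam, mu, P, rng):
    """One transition of a uniformized embedded chain for a Jackson
    network: pick an event with probability proportional to its rate;
    a service completion at an empty station leaves the state unchanged,
    which is one standard way to realize the constraint mapping zeta."""
    d = len(q)
    rates = np.concatenate([lam, mu])      # arrivals, then services
    i = rng.choice(2 * d, p=rates / rates.sum())
    q = q.copy()
    if i < d:                              # arrival at station i
        q[i] += 1
    else:                                  # service at station s = i - d
        s = i - d
        if q[s] > 0:                       # departures only if non-empty
            q[s] -= 1
            # route to station j w.p. P[s, j]; leave w.p. 1 - sum_j P[s, j]
            probs = np.append(P[s], 1.0 - P[s].sum())
            j = rng.choice(d + 1, p=probs)
            if j < d:
                q[j] += 1
    return q

# Illustrative two-station tandem; the walk never leaves Z_+^d.
rng = np.random.default_rng(0)
lam, mu = np.array([1.0, 0.0]), np.array([3.0, 2.0])
P = np.array([[0.0, 1.0], [0.0, 0.0]])
q, ok = np.zeros(2, dtype=int), True
for _ in range(500):
    q = step(q, lam, mu, P, rng)
    ok = ok and bool((q >= 0).all())
```

The self-loop convention is only one choice; other constructions of $\zeta$ (e.g. redistributing the suppressed probability mass) lead to the same continuous-time law after uniformization.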
So, given the similarities suggested by the dynamics of (8) and those of a simple random walk, it is not surprising that the log-moment generating function of the increments, namely
$$\psi(y, \theta) \triangleq \log E \exp(\theta^T \zeta(y, Y(1))),$$
also plays a crucial role in the large deviations behavior of $p_n^V(x)$ as $n \nearrow \infty$. In order to understand the large deviations behavior of $p_n^V$ it is useful to scale space by $1/n$, thereby introducing a scaled queue length process $\{Q^n(k) : k \ge 0\}$ which evolves according to
$$Q^n(k+1) = Q^n(k) + \frac{1}{n} \zeta(Q^n(k), Y(k+1)).$$
Suppose that $Q^n(0) = y = x/n$, and note that, using the scaled queue length process, $p_n^V(x)$ is the probability that $V(Q^n(\cdot))$ reaches level 1 before $Q^n(\cdot)$ returns to the origin. Large deviations theory dictates that
$$p_n^V(ny) = \exp(-n W^V(y) + o(n)) \qquad (11)$$
as $n \nearrow \infty$ for some non-negative function $W^V(\cdot)$. In order to characterize $W^V(\cdot)$ we can combine the previous expression with (10) and a formal Taylor expansion to obtain $\psi(y, -\partial W^V(y)) = o(1)$.
Sending $n \nearrow \infty$ we formally arrive at the equation
$$\psi(y, -\partial W^V(y)) = 0 \qquad (12)$$
together with the boundary condition $W^V(y) = 0$ if $V(y) \ge 1$. The previous equation is the so-called Isaacs equation which characterizes the large deviations behavior of $p_n^V(\cdot)$; it was introduced, together with a game-theoretic interpretation, by Dupuis and Wang in [10]. The solution to (12) is understood in a weak sense because the function $W^V(\cdot)$ is typically not differentiable everywhere. Nevertheless, it coincides with a certain calculus of variations representation which can be obtained from the local large deviations rate function for Jackson networks (see [17]).
An asymptotic lower bound for $W^V(y)$ can be obtained by finding an appropriate subsolution to the Isaacs equation, in which the equality signs in (12) are appropriately replaced by inequalities. In particular, $\bar{W}^V(\cdot)$ is said to be a subsolution to the Isaacs equation if
$$\psi(y, -\partial \bar{W}^V(y)) \ge 0, \qquad \bar{W}^V(y) \le 0 \text{ for } V(y) \ge 1, \qquad (13)$$
which translates into the asymptotic logarithmic upper bound $p_n^V(ny) \le \exp(-n \bar{W}^V(y) + o(n))$. The subsolution is said to be maximal at zero if $\bar{W}^V(0) = W^V(0)$. Not surprisingly, subsolutions are easier to construct than solutions and, as we shall discuss in the next section, beyond their use in the development of asymptotic upper bounds they can be applied to the design of efficient simulation procedures. The use of subsolutions to the Isaacs equation for the design of efficient simulation algorithms was introduced in [10]. A derivation of the subsolution inequality (13), in the same spirit leading to (12) and using Lyapunov inequalities, is given in [6].
As we mentioned in Section 2, the efficiency analysis of a rare-event simulation estimator depends on the growth rate of its coefficient of variation. We are interested in an asymptotic analysis that goes beyond the error term $\exp(o(n))$ given by the large deviations approximation (11). So, we must enhance the large deviations approximations in order to provide a more precise estimate for $p_n^V$. Developing such an estimate is the aim of the following proposition, which follows as a direct consequence of Proposition 2 and the analysis in Section 5 of [4]; see also Section 4 of this paper for a sketch of the proof.
Proposition 1. There exists $K > 0$ (independent of $x$ and $n$) such that
$$p_n^V(x) \le K \, P(V(Q_\infty) \ge n),$$
where $Q_\infty$ is the steady-state queue length. Moreover, if $x \le c$ for some $c \in (0, \infty)$, then $p_n^V(x) = \Theta(P(V(Q_\infty) \ge n))$ as $n \nearrow \infty$.
Remark. It is important to keep in mind that we shall mostly work with the process $Q(\cdot)$ directly, as opposed to the scaled version $Q^n(\cdot)$ which is used in the analysis of [8].
The previous proposition provides the necessary means to estimate $p_n^V$ up to a constant; we just need to recall that the distribution of $Q(\infty)$ is computable in closed form (see [21] p. 95). In particular, we have that
$$P(Q(\infty) = x) = \prod_{j=1}^d \rho_j^{x_j}(1 - \rho_j), \quad x \in \mathbb{Z}_+^d.$$
We shall use $\pi(\cdot)$ to denote this stationary measure of $Q$. In simple words, the previous equation says that the steady-state queue length vector has independent components which are geometrically distributed; in particular, $P(Q_j(\infty) = m) = \rho_j^m(1 - \rho_j)$ for $m \ge 0$. The next proposition follows directly from standard properties of the geometric distribution (see Proposition 3 in [4]).
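As a small worked example of these geometric-tail properties: for two independent stations with the same load $\rho$, the tail of their total population has the closed form $\rho^n(n + 1 - n\rho) = \Theta(n\rho^n)$, i.e. $n^{\beta_V - 1}\rho_*^n$ growth with $\beta_V = 2$. This can be checked numerically:

```python
# P(Q1 + Q2 = m) for two independent Geometric(rho) queues is the
# negative-binomial pmf (m + 1) * rho^m * (1 - rho)^2; summing the
# series gives the exact tail P(Q1 + Q2 >= n) = rho^n * (n + 1 - n*rho).
rho, n = 0.5, 20
pmf = [(m + 1) * rho**m * (1 - rho)**2 for m in range(n)]
tail = 1.0 - sum(pmf)                      # exact tail via complement
closed_form = rho**n * (n + 1 - n * rho)
print(abs(tail - closed_form) < 1e-9)      # -> True
```

Stations with a strictly smaller load only contribute multiplicative constants to the tail, which is why only the bottleneck count $\beta_V$ appears in the polynomial factor.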
Proposition 2. Let $\rho_*^V = \max\{\rho_i : v_i = 1\}$ and let $\beta_V = \#\{i : v_i = 1, \rho_i = \rho_*^V\}$ be the number of bottleneck stations in the target subset corresponding to $v$. Then, as $n \nearrow \infty$,
$$P(V(Q_\infty) \ge n) = \Theta(n^{\beta_V - 1} (\rho_*^V)^n).$$

The Splitting Algorithm
The previous section discussed some large deviations properties required to guide the construction of an efficient splitting scheme using the theory developed in the work of Dean and Dupuis [8]. In order to explain the construction suggested by Dean and Dupuis, let us first discuss the general idea behind the splitting algorithm that we shall analyze; a variation of it was first applied to Jackson networks by Villen-Altamirano and Villen-Altamirano [18].
The strategy is to divide the state-space into a collection of nested regions $\{C_j^n : 0 \le j \le M_n\}$ that help define "milestone" events interpolating between the initial position of the process and the target set, which corresponds to the region $C_0^n$. That is, in our setting we put $C_0^n \triangleq \{y \in S : V(y) \ge n\}$ and the remaining $C_j^n$'s are placed so that $C_0^n \subseteq C_1^n \subseteq \cdots \subseteq C_{M_n}^n$. How to construct the level sets $C_j^n$ in order to induce efficiency will be discussed below. An observation that is intuitive at this point, however, is that one should have $M_n = \Theta(n)$ so that the next milestone event remains accessible from the current level. For the moment, let us assume that the $C_j^n$'s have been placed. The splitting algorithm proceeds as follows.

Algorithm SA
1.-Initiate the simulation procedure with a single particle starting from position $x \in C_k^n$ for a given $k \ge 1$. Let $w_1 = 1$ be the initial weight associated with this particle.
2.-Evolve the initial particle until either it hits $\{0\}$ or it reaches level $C_{k-1}^n$. If the particle hits $\{0\}$, then the particle is said to die. If the particle reaches level $C_{k-1}^n$ then it is replaced by $r$ identical particles (for a given integer $r > 1$). The replacing particles are called the immediate descendants or children of the initial particle, which in turn is said to be their parent. The children are positioned precisely at the place where the parent particle reached level $C_{k-1}^n$. The weight $w_j$ associated with the $j$-th child (enumerating the children arbitrarily) equals the weight of the parent particle multiplied by $1/r$.
3.-The procedure starting from step 1 is replicated for each of the offspring particles in place; carrying over the value of each of the weights at each level for the surviving particles (the weights of the particles that die can be disregarded).
4.-Steps 1 to 3 are repeated until all the particles either die or reach level C n 0 .
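A minimal sketch of Algorithm SA for a single birth-death queue, with milestones one unit apart (so this is an illustration rather than the general level construction discussed below); the estimate is compared against the exact gambler's-ruin probability. All names are ours:

```python
import random

def run_splitting(n, p_up, r, start, rng):
    """One run of Algorithm SA for a birth-death chain with
    up-probability p_up and milestones C_j = {y >= n - j}.  A particle
    at milestone index k evolves until it hits 0 (dies) or reaches
    height n - (k - 1) (splits into r children).  Returns the number of
    particles reaching level n times the final weight r^-(n - start)."""
    survivors = 0
    stack = [(start, n - start)]          # (position, milestone index)
    while stack:
        x, k = stack.pop()
        target = n - (k - 1)              # height of the next milestone
        while 0 < x < target:
            x += 1 if rng.random() < p_up else -1
        if x >= target:                   # next milestone reached
            if k == 1:
                survivors += 1            # that milestone was C_0
            else:
                stack.extend([(x, k - 1)] * r)
    return survivors / r**(n - start)

rng = random.Random(7)
n, p_up, r, start, reps = 6, 1.0 / 3.0, 2, 1, 4000
est = sum(run_splitting(n, p_up, r, start, rng) for _ in range(reps)) / reps
exact = 1.0 / (2.0**n - 1.0)   # gambler's ruin, q/p = 2, start = 1
```

Averaging over independent runs, the weighted count is an unbiased estimator of the overflow probability, as each surviving particle contributes exactly the product of its splitting weights.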
Dean and Dupuis in [8] show how to apply large deviations theory to select the $C_j^n$'s in order to obtain a weakly efficient splitting algorithm. One needs to balance the number of the $C_j^n$'s so that it is not unlikely for a given particle to reach the next level, while keeping the total number of particles controlled. We now provide a formal motivation for the use of large deviations in constructing the $C_j^n$'s in a balanced way. It is convenient, as in our formal large deviations discussion in the previous section, to consider the scaled process $Q^n(\cdot)$. Let us assume that the splitting mechanism indicated in Algorithm SA is in place and that our initial position is set at $Q(0) = x$, so that $Q^n(0) = y = x/n$. The $C_j^n$'s are typically constructed in terms of the level sets of a so-called importance function, which we shall denote by $U(\cdot)$. In particular, put $D^n \triangleq \{y \in n^{-1} S : V(y) < 1\}$ and set $C_j^n = n L_{z_n(j)}$, where
$$L_z \triangleq \{y \in D^n : U(y) \le z\} \qquad (15)$$
and the $z_n(j)$'s will be chosen momentarily. Then, define $l_n(y) = \min\{j \ge 0 : ny \in C_j^n\} = \min\{j \ge 0 : x \in C_j^n\}$.
The total weight of a particle that reaches level $C_0^n$, given that it started at level $l_n(y)$, is $r^{-l_n(y)}$. In order to have at least a weakly efficient algorithm we wish to satisfy two constraints. The first one requires the total weight of a particle reaching level $C_0^n$ to be $p_n^V(x) \exp(o(n))$; this guarantees that the second moment of the resulting estimator achieves asymptotic optimality. The second constraint dictates that the expected number of particles that make it to $C_0^n$, which is roughly $r^{l_n(y)} p_n^V(x)$, exhibits subexponential growth (i.e. is $\exp(o(n))$); this guarantees a cost per replication that is subexponential. Note that both constraints lead to the requirement $r^{l_n(y)} p_n^V(x) = \exp(o(n))$. So, given a subsolution $\bar{W}^V(\cdot)$ to the corresponding Isaacs equation, we have $p_n^V(x) \le \exp(-n \bar{W}^V(y) + o(n))$, which implies that it suffices to ensure that
$$l_n(y) \log(r) - n \bar{W}^V(y) = o(n). \qquad (16)$$
The behavior of $l_n(y)$ as $n \nearrow \infty$ depends only on the properties of the function $U(\cdot)$ and is really independent of the large deviations behavior of the system. In particular, picking $z_n(j) = \Delta j/n$ with $\Delta \in (0, 1]$ yields $l_n(y) = \lceil n U(y)/\Delta \rceil$, and therefore equation (16) suggests that one should select $U(y) = \Delta \bar{W}^V(y)/\log(r)$ with $\bar{W}^V(0) = W^V(0)$ in order to obtain a weakly efficient estimator for $p_n^V$. This is precisely the conclusion obtained in the work of [8], which presents a rigorous analysis justifying the previous heuristic discussion. Our development in the next section will sharpen the efficiency properties of the sampler proposed in [8] when applied to Jackson networks. So, we content ourselves with the previous heuristic motivation for the splitting method that we will analyze in the next section, which in turn is based on the viscosity subsolution
$$\bar{W}^V(y) \triangleq -\log \rho_*^V - \varrho^T y, \qquad (17)$$
where $\varrho_i = -\log \rho_i$ for $i = 1, \ldots, d$; see e.g., [11] and [8].
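To make the level placement concrete in the simplest case: for a single M/M/1 queue with load $\rho$, the overflow rate function is $W(y) = -\log(\rho)(1 - y)$, so $l_n(y) = \lceil n U(y)/\Delta \rceil = \lceil n(-\log\rho)(1-y)/\log r \rceil$, i.e. $\Theta(n)$ milestones, each reached from the previous one with probability roughly $1/r$. A quick check (function name is ours):

```python
import math

def num_levels(n, rho, r, y=0.0):
    """l_n(y) for the M/M/1 importance function U(y) = Delta*W(y)/log(r)
    with W(y) = -log(rho)*(1 - y); Delta cancels in n*U(y)/Delta."""
    return math.ceil(n * (-math.log(rho)) * (1.0 - y) / math.log(r))

print(num_levels(20, 0.5, 2))    # -> 20: one milestone per customer
print(num_levels(20, 0.25, 2))   # -> 40: heavier tilting, denser levels
```

Note that with $\rho = 1/2$ and $r = 2$ the milestones sit exactly one unit of queue length apart, which is the configuration used in the splitting sketch of the previous section.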
We close this section with a precise definition of the estimator that we will analyze. First, given a constant $\Delta \in (0, 1]$ (the level size), define $\bar{W}^V(\cdot)$ as indicated in (17) for each $y = x/n$ with $x \in S$. Then, select an integer $r > 1$ and define $U(y) = \Delta \bar{W}^V(y)/\log(r)$. Given the initial position $x$, put $y = x/n$ and define the sets $\{C_j^n : 1 \le j \le l_n(y)\}$ as indicated above (see equation (15)). Run Algorithm SA and let $N_n$ be the number of particles that survive up to $C_0^n$; their corresponding final weight is $1/r^{l_n(x)}$. Our estimator for $p_n^V(x)$ is simply
$$R_n(x) = \frac{N_n(x)}{r^{l_n(x)}}.$$
Now, for the sake of analytical convenience, when analyzing the second moment of $R_n(x)$ we will adopt the so-called fully branching representation of the previous estimator (see [8]). Such a fully branching representation is obtained by also splitting dead particles at level zero. It is useful to think about fully branching conceptually in the following terms. Start with a particle at position $x$ in level $C_{l_n(x)}^n$ and run step 2 of Algorithm SA, but instead of killing the particle if it hits $\{0\}$ before reaching $C_{l_n(x)-1}^n$, allow it to also split into $r$ particles, updating the weights of the children as indicated in Algorithm SA when a particle reaches $C_{l_n(x)-1}^n$.
Step 3 continues; now the particles sitting in level $C_{l_n(x)-1}^n$ evolve, and the dead particles once again split and remain at state 0 (so state 0 becomes populated with dead particles). After $l_n(x)$ iterations we have $r^{l_n(x)}$ total particles labeled $1, 2, \ldots, r^{l_n(x)}$, each with weight $1/r^{l_n(x)}$. We define $I_j$ as the indicator of the event that the $j$-th particle is in $C_0^n$, so that $N_n(x) = \sum_{j=1}^{r^{l_n(x)}} I_j$. The fully branching representation of $R_n(x)$ is simply
$$R_n(x) = \frac{1}{r^{l_n(x)}} \sum_{j=1}^{r^{l_n(x)}} I_j.$$

Analysis of Splitting Estimators
We are now in a good position to perform a refined efficiency analysis of the estimator $R_n(x)$. We shall break our analysis into two parts. The first part corresponds to the expected number of particles generated per run and the second part deals with the second moment of $R_n(x)$. First, we review a technique studied in [4] based on the time-reversed chain associated to the queue length process $Q$. For both quantities we are able to obtain an upper bound, and we then reach the conclusion that this multilevel splitting algorithm substantially outperforms the direct polynomial-time algorithm for solving the associated system of linear equations. Our analysis takes advantage of the time-reversed process associated to the underlying Jackson network, which we now define. Given the transition matrix $\{K(x, y) : x, y \in S\}$ of the process $Q$, we define the reversed Markov chain $\tilde{Q} = \{\tilde{Q}(k) : k \ge 0\}$ via the transition matrix
$$\tilde{K}(y, x) = K(x, y)\pi(x)/\pi(y), \quad x, y \in S.$$
It turns out that $\tilde{Q}$ also describes the queue length process of an open stable Jackson network with stationary distribution equal to $\pi(\cdot)$ (see [21] p. 95). We will use $\tilde{P}_x(\cdot)$ to denote the probability measure on path space associated to $\tilde{Q}$ given that $\tilde{Q}(0) = x$.
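The reversal identity is easy to verify numerically on a small finite chain; a sketch (the 3-state kernel below is illustrative):

```python
import numpy as np

def reversed_kernel(K, pi):
    """Time-reversed transition matrix K_tilde(y, x) = K(x, y)*pi(x)/pi(y)."""
    K, pi = np.asarray(K), np.asarray(pi)
    return (K * pi[:, None]).T / pi[:, None]

# A small irreducible 3-state chain.
K = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])
# pi: left eigenvector of K for eigenvalue 1, normalized to sum to one.
w, v = np.linalg.eig(K.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
Kt = reversed_kernel(K, pi)
print(np.allclose(Kt.sum(axis=1), 1.0))   # K_tilde is stochastic -> True
print(np.allclose(pi @ Kt, pi))           # same stationary law -> True
```

Stochasticity of $\tilde{K}$ follows from stationarity of $\pi$ under $K$, and the computation shows $\pi$ is also stationary for $\tilde{K}$, as used repeatedly in the arguments below.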
The following result is similar to that of Proposition 2 in [4]. However, our representation in (20) is slightly more useful for our purposes.
Proof. We assume that $x \ne 0$; the case $x = 0$ is included in the analysis of (21). First, we observe the starting identity; following the same technique as in Proposition 2 in [4], we then obtain the corresponding expansion. Letting $y'_i = y_{k-i}$ for $i = 1, \dots, k$, we see that the summation in each of the terms above ranges over paths $y'_0, \dots, y'_k$ satisfying $y'_0 \in C^n_0$ and $\tilde T_{\{x\}} = k$ (so in particular $y'_k = x$), and also $\tilde T_{\{0\}} \ge k$ and $\tilde T_{C^n_0} > k$. We can therefore interpret the previous sum as a probability under the reversed chain; this yields part (20). Part (21) corresponds to Proposition 2 of [4]. Finally, (22) also follows as in Proposition 7 of [4].
Propositions 1 and 2 from Section 2 follow as a consequence of this result; the remaining details are given in Section 5 of [4]. Given the subsolution we proposed in Section 4, the importance function can be written as in (23), where $C = -\log \rho^V_* / \log r$ and $\alpha = \varrho / \log \rho^V_*$, and the level index function simplifies correspondingly to (24). We shall first look at the expected number of surviving particles of the splitting algorithm, which characterizes the stability of the algorithm. One should keep in mind that, as far as the complexity of the splitting algorithm is concerned, what actually matters is the total number of function evaluations per run. We obtain an upper bound for this quantity, measured by the sum of all particles generated at interim levels, each weighted by the maximum remaining number of function evaluations associated with it. We first have the following result.
where $\beta_V$, introduced in Proposition 2, denotes the number of bottleneck stations corresponding to the vector $v$.
As pointed out earlier, the number of terminal surviving particles, although it serves as a reasonable proxy for the stability of the algorithm, is not suitable for quantifying the complexity. We must also take into account the number of function evaluations required to generate $R_n(x)$. The next result addresses precisely this issue.
Proposition 5. The expected computational effort required to generate a single replication of $R_n(x)$ is $O(n^{\beta_V + 1})$.
Proof. Let $N^n_m$, $m = 0, \dots, l_n(x)$, be the number of particles that survive to level $C^n_{l_n(x)-m}$. Also, let $\eta_{m,j}$ be the remaining computational effort of the $j$-th particle from the start of the $m$-th level until it either reaches the next level or dies out, and let $\bar\eta_{m,j}(x_j)$ be the expectation of $\eta_{m,j}$ given that the position of the $j$-th particle at the start of level $m$ is $x_j$. Note that the norm of $x_j$ is less than $c \cdot m$ for a constant $c$ that depends on the traffic intensities of the system but not on the position of the particle itself. Therefore, it is easy to see that $\sup_{1 \le j \le N^n_m} \bar\eta_{m,j}(x_j) \le c \cdot m$ for some $c \in (0, \infty)$. Intuitively, each particle at level $m$ either advances to the next level or dies out by hitting the zero level before doing so; since it takes $\Theta(1)$ work to cross a single layer, $\eta_{m,j}$ is dominated by the work required to die out, and hence its mean is bounded from above by $c \cdot m$ for some constant $c$. Summing these contributions over all levels, together with the bound on the expected number of particles per level, yields the stated bound on the expected total work per run for some positive constant $c$; the first inequality in this estimate is due to independence.
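In display form, the counting argument just described can be sketched as follows (our reconstruction of the first step, consistent with the surrounding prose rather than a quotation of the paper's original display):

```latex
E_x[\text{work per run}]
  \;=\; E_x\Big[\sum_{m=0}^{l_n(x)} \sum_{j=1}^{N^n_m} \eta_{m,j}\Big]
  \;\le\; c \sum_{m=0}^{l_n(x)} m \, E_x\big[N^n_m\big],
```

after which the level-by-level bound on the expected number of surviving particles delivers the stated $O(n^{\beta_V + 1})$.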
To facilitate the analysis of the second moment of $R_n(x)$ we introduce some additional notation, following the analysis in [8] to keep the exposition self-contained. For a given generation $m$, denote by $Q_{m,j}$ the position of the $j$-th particle; recall that the accumulated weight of such a particle up to the $m$-th stage is $r^{-m}$. Let $\chi_{m,j}$ denote the disjoint grouping of the particles in the next generation (i.e., $m+1$) according to their "parents" in generation $m$. For $k \in \chi_{m,j}$, denote by $d_k$ the number of offspring of this particle at the final stage $l_n(x)$. We then have an expansion of the second moment of $R_n(x)$ in which $I_{m_k}$ denotes the indicator of the event that particle $m_k$ lies in the set $C^n_0$. The second term in this expansion collects the diagonal terms of the second moment (26); for the off-diagonal terms we categorize, within each generation, particles according to their common ancestors, a technique used in [8]. Handling the first term by conditioning on the whole genealogy up to step $m$, and combining the result with the diagonal term in (26), which can readily be expressed as $r^{-l_n(x)} p^V_n(x)$, we arrive at the expansion (27) of the second moment of $R_n(x)$. The next result exploits expression (27) to obtain an upper bound for $E_x[R_n(x)^2]$.
Proposition 6. The second moment of $R_n(x)$ satisfies $E_x\big[R_n(x)^2\big] = O\big(n^{\beta}\big)\, p^V_n(x)^2$, where $\beta = \sum_{i=1}^d I(\rho_i = \rho_*)$ is the number of bottleneck stations in the whole network.
In order to prove the previous result, we will show that the second moment of $R_n(x)$ is dominated by the first term on the right-hand side of the equality in (27). In turn, the asymptotic behaviour of that term hinges on the conditional distribution of the exact position $Q_{m,1}$ of the particle in generation $m$ inside $C^n_{l_n(x)-m}$. Proof. We begin the proof with an important property implied by the splitting algorithm, namely (29), where we use the representations of $U(\cdot)$ and $l_n(x)$ in (23) and (24) and the definition of $L_z$ in (14). In other words, if a particle survives $m$ generations then its current position lies beyond the $m$-th level, which implies that the weighted sum of the system population, with weights given by the vector $\varrho$, is bounded from above by that of the initial position adjusted by a linear function of $m$. Defining the stopping time $T^m_C$ accordingly, the above property also implies that $V(Q_{m,1}) > 0 \Rightarrow T^m_C < T_0$. Now, the expectation term in the sum of (27) can be expressed as in (30), where we use the property derived in (29). The first term in (30) can be bounded as in (31), where $c_1$ and $K$ are constants independent of $n$; here we use Propositions 1 and 2 for the last two inequalities, respectively. As for the expectation term in (31), since each coordinate of the process $Q(\cdot)$ has increments of at most unit size, we can bound it as in (32), where $c_2$, $c_3$ and $\delta$ are positive constants. It remains to bound the term $P_x(T^m_C < T_0)$; Proposition 2 yields the estimate (33). Note that $\bar\alpha_i \in (0,1)$. To finish the proof we need the following lemma.
Proof. One direction is elementary: since $\bar\alpha^T Q(\infty) \ge Z(\beta, 1-\rho_*)$, the corresponding lower bound is clear. For the other direction, note that there exists a constant $\bar\rho < \rho_*$ for which a stochastic comparison holds; as a result, $\bar\alpha^T Q(\infty)$ is stochastically dominated by the analogous quantity built from $\bar\rho$, where "$\le_{\mathrm{st}}$" denotes that the left-hand side is stochastically dominated by the right-hand side. But since $1 - \rho_* < 1 - \bar\rho$, an argument similar to that of Proposition 3 in [4] allows us to obtain the bound (34) for some finite constant $c_0$ that is independent of $m$. Combining (33) and (34), and then using Proposition 3 of [4] once more, we reach the conclusion (36). Going back to (30), the above lemma allows us to bound the expectation term there. Combining (31), (32) and (36), we obtain the upper bound (37) for the expectation term in the sum of expression (27), where for the first equality we use the facts that $\rho^V_* \le \rho_*$ and $\hat C \le C$, and for the second equality we use $\rho^V_* = r^{-C}$. Substituting the bound (37) into the sum in the first term of (27) yields (38). Finally, note that the second term of (27) is dominated by (38), and the stated bound follows immediately. Equipped with these results, we are ready to summarize our discussion in the statement of the following theorem, which is the main result of this paper.

Theorem 1. To estimate the overflow probability $p^V_n(x)$ using $R_n(x)$, the number of function evaluations needed to achieve a given level of relative error is $O(n^{\beta_V + \beta + 1})$.
Proof. Recall from Section 2 that the number of function evaluations sufficient to achieve a predetermined level of relative accuracy for the splitting estimator is proportional to the work-normalized squared coefficient of variation. The result is therefore immediate upon combining the upper bound on the computational effort per run in Proposition 5 with the upper bound on the second moment of $R_n(x)$ in Proposition 6.
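Schematically, the accounting runs as follows (our paraphrase, taking the per-run work $W_n = O(n^{\beta_V+1})$ from Proposition 5 and reading Proposition 6 as the second-moment bound $E_x[R_n(x)^2] = O(n^\beta)\, p^V_n(x)^2$): the number $N(\epsilon)$ of independent runs needed for relative error $\epsilon$ satisfies

```latex
N(\epsilon) \;=\; O\!\left( \frac{E_x[R_n(x)^2]}{\epsilon^2 \, p^V_n(x)^2} \right)
            \;=\; O\!\left( \epsilon^{-2} n^{\beta} \right),
\qquad
N(\epsilon) \cdot W_n \;=\; O\!\left( \epsilon^{-2} n^{\beta_V + \beta + 1} \right).
```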
A direct comparison with the $O(n^{3d-2})$ complexity of solving the associated system of linear equations (see Section 2) immediately shows that the splitting algorithm is "efficient" in the sense that it improves over the "benchmark" polynomial algorithm. Even in the worst case, when we consider the total population of the network and the network is completely symmetric, i.e., all stations are bottlenecks ($\beta_V = \beta = d > 3$), the direct method requires a substantial factor of $n^{d-3}$ more flops than splitting. In the case $\beta_V = \beta = 1$, the algorithm requires a number of function evaluations that grows at most cubically in the overflow level $n$. Furthermore, if fewer than half of the stations are bottlenecks, i.e., $\beta < d/2$, the splitting algorithm enjoys a running time of order smaller than $O(n^d)$, which is no worse than storing the vector that encodes the solution of the associated linear system. If, on the other hand, more than half of the stations are bottlenecks, faster importance-sampling-based algorithms do exist, at least for tandem networks; the analysis in [6] implies that $O(n^{2(d-\beta)+1})$ function evaluations suffice to obtain an estimator with a given relative precision. Overall, the analysis thus provides guidance on the choice of simulation algorithm. It is worth pointing out that the previous comparison is not based on the sharpest possible analysis; indeed, we resort to a rather crude upper bound in the analysis of the second moment of $R_n(x)$ in (31). A sharper result is possible by bounding the expectation term in (20) with more care. But, as pointed out in the Introduction, even though there is still room for a more refined analysis, we believe our work provides substantial insight into the relation between these two classes of algorithms.
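For a quick sanity check on these comparisons, the polynomial exponents can be tabulated directly (the function names are ours; the formulas are exactly those quoted above):

```python
def splitting_exponent(beta_V, beta):
    # Theorem 1: O(n^{beta_V + beta + 1}) function evaluations suffice.
    return beta_V + beta + 1

def linear_solve_exponent(d):
    # Direct solution of the associated linear system: O(n^{3d - 2}) flops.
    return 3 * d - 2

def importance_sampling_exponent(d, beta):
    # The importance sampling scheme discussed for tandem networks:
    # O(n^{2(d - beta) + 1}) function evaluations.
    return 2 * (d - beta) + 1

# Fully symmetric worst case: all d stations are bottlenecks.
d = 5
gap = linear_solve_exponent(d) - splitting_exponent(d, d)   # equals d - 3
```

For example, with $d = 5$ fully symmetric stations the direct method costs $O(n^{13})$ flops versus $O(n^{11})$ function evaluations for splitting, recovering the $n^{d-3}$ gap; with a single bottleneck, $\beta_V = \beta = 1$ gives the cubic growth mentioned above.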