Efficiency of Parallel and Restart Exploration Strategies in Model-Free Stochastic Simulations

Published Online:https://doi.org/10.1287/stsy.2025.0108

Abstract

We analyze the efficiency of parallelization and restart mechanisms for stochastic simulations in model-free settings, where the underlying system dynamics are unknown. Such settings are common in Reinforcement Learning (RL) and rare-event estimation, where standard variance-reduction techniques like importance sampling are inapplicable. Focusing on the challenge of reaching rare states under a finite computational budget, we model exploration via random walks and Lévy processes. Based on rigorous probability analysis, our work reveals a phase transition in the success probability as a function of the number of parallel simulations: an optimal number N* exists, balancing exploration diversity and time allocation per simulation. Beyond this threshold, performance degrades exponentially. Furthermore, we demonstrate that a restart strategy, which reallocates resources from stagnant trajectories to promising regions, can yield an exponential improvement in success probability. In the context of RL, these strategies can improve policy gradient methods by enabling more efficient state-space exploration, leading to more accurate policy gradient estimates.

Funding: This research was supported by the SticAmsud LAGOON project, ANR EPLER, IRL-CNRS IFUMI-2030, and Action international CNRS.

1. Introduction

The efficient simulation of stochastic systems is a cornerstone of many critical applications, ranging from rare-event estimation to gradient estimation in Reinforcement Learning (RL). In many practical scenarios, the underlying dynamics of a system are complex, poorly characterized, or entirely unknown, placing us essentially in a model-free setting. In such contexts, strategies for exploring the state space where the stochastic dynamics live and gathering informative samples are paramount. This work investigates two powerful and general mechanisms for enhancing the efficiency of stochastic simulations: parallelization and restarting. We aim to provide a rigorous probabilistic analysis of these strategies, with applications focused on key model-free problems such as estimating small probabilities for stochastic processes with unknown dynamics and the challenge of efficient exploration in RL.

A canonical problem in probability and statistics is the estimation of the probability that a stochastic process reaches a rare, distant state within a finite time horizon. Standard Monte Carlo methods fail as the event of interest is rarely observed. A common technique for accelerating rare-event simulation is importance sampling (Asmussen and Glynn 2007), which involves simulating the process under an alternative “twisted” measure that makes the rare event more likely, and then correcting this change of measure via likelihood ratios. However, importance sampling is not a viable solution in the model-free settings as its application requires exact knowledge of the underlying process dynamics to construct the optimal change of measure. When the dynamics are unknown or too complex to model—as is often the case in large-scale RL environments or with intractable stochastic processes—this precise knowledge is unavailable, rendering importance sampling inapplicable. Moreover, an estimation scheme for the change of measure is usually of very large variance. It is precisely this limitation that motivates the use of blind or model-free strategies, such as parallelization and restarting, which do not require explicit knowledge of the system dynamics.

Parallelization—running multiple independent simulations concurrently—seems a natural remedy, as it increases the chances of observing the rare event. However, under a fixed total computational budget, a critical trade-off emerges: should one allocate the entire budget to a single, long simulation, or distribute it among many shorter, parallel runs? The answer is not trivial, as splitting the budget too finely may deprive each simulation of the necessary time to reach the rare state. We analyze this trade-off quantitatively, identifying a sharp phase transition in the success probability as a function of the number of parallel runs.

Beyond parallelization, restart strategies offer another avenue for improvement. Instead of letting simulations run in potentially unfruitful regions of the state space, a restart mechanism halts them and reinitializes them from more promising states, effectively redirecting computational effort. This restarting mechanism is particularly powerful in scenarios where certain regions are more likely to lead to the target event. A well-designed restart policy can prevent the waste of resources on trajectories with a low probability of success and serves as a practical, model-free alternative to importance sampling.

Several recent works have addressed the efficient exploration challenge, in particular for RL. For example, the Go-Explore approach in Ecoffet et al. (2021) introduces an innovative solution by distinctly separating the exploration and exploitation phases. This method allows the algorithm to maintain a repository of promising states, which can be revisited for further exploration, significantly enhancing the ability to discover and leverage rare or hard-to-reach states. Another example is found in Mastropietro et al. (2025), based on the paradigm of the Fleming-Viot particle system, introduced by Burdzy et al. (1996). In this framework, a population of particles (which can represent different simulations in parallel) evolves over time, exploring the state space in parallel. When a particle falls into a less promising region, it is “restarted” by being replaced with a copy of another, more promising particle. This ensures that resources are concentrated on exploring the most fruitful areas of the state space.

Beyond RL, stochastic restarting has been studied extensively across disciplines such as statistical physics, computer science, and network theory, primarily to minimize expected task-completion times modeled via first-passage times (see, e.g., Luby et al. 1993, Avrachenkov et al. 2018, and Evans et al. 2020). In contrast, our work focuses on the asymptotic behavior of the full distribution of passage times, studying large deviation regimes and rare-event phenomena through asymptotic estimates, bounds, and threshold effects rather than expectations. Although related analytic frameworks for Markov processes with restart—addressing ergodicity, quasi-stationary distributions (QSDs), and connections to Fleming-Viot systems—are developed in Grigorescu and Kang (2013), our results concern finer asymptotics for more restricted process classes and different probabilistic questions. Our work also differs from Monthus (2021), which analyzes large deviations for the long-time asymptotics with stochastic restart to a fixed state. In contrast, we consider large deviations with respect to a spatial parameter modeling the complexity of exploring the state space.

Despite the promise of parallel simulation techniques and restart mechanisms, there is a lack of quantitative evidence of their potential benefits. Specifically, the extent to which these strategies improve exploration efficiency and yield better learning outcomes remains unclear. We aim to fill this gap by providing a detailed analysis of the impact of parallel simulations combined with restart mechanisms on the exploration process, modeled as the task of reaching a specific subset of the state space. To achieve this, we use a simplified model based on toy dynamics, specifically random walks and Lévy processes. These processes are chosen for their mathematical tractability and their ability to capture fundamental aspects of the problem, allowing explicit analytical results. Although these dynamics are simple, they provide crucial insights into the mechanisms that govern exploration, forming the basis for future work with more involved Markovian dynamics. By employing this framework, we examine two key aspects of exploration:

  • Exploration complexity, encapsulated by a parameter that quantifies the fluctuations necessary to achieve effective exploration and reach the target subset of the state space;

  • Exploration diversity, represented by the number of parallel simulations employed, which increases the breadth of the state-space search and enhances the likelihood of encountering rare, high-reward states within the target subset.

We focus our analysis on a specific regime where the exploration complexity scales linearly with the simulation budget. The toy dynamics framework allows us to examine these parameters within finite time budgets systematically and to quantify how parallel simulations improve efficiency by leveraging diversity, whereas restarting mechanisms enable particles to avoid stagnation by redirecting effort toward more promising regions of the state space, ultimately facilitating the achievement of the exploration goal.

Our approach relies on the systematic analysis of these strategies under finite time budgets, using analytical techniques based on exponential martingales to quantify the effects of parallel simulations and restarting mechanisms. By addressing the interplay between exploration diversity and the power of parallelization and restarting mechanisms, our work provides novel insights into efficient exploration strategies and lays the groundwork for further investigations into more complex stochastic dynamics.

Through this investigation, our goal is not only to demonstrate the potential utility of these techniques but also to delineate their limitations. Understanding these nuances is crucial for developing more efficient stochastic simulations in model-free settings such as RL, particularly in environments characterized by sparse rewards. Ultimately, our findings aim to provide actionable insights into when and how to leverage parallel simulations and restarting strategies to optimize exploration, thereby advancing the field in both theory and practice.

1.1. Problem Statement and Contributions

In this work, we model exploration as a one-dimensional stochastic process, often referred to as a particle, aiming to reach a target subset of the state space. This target, represented as a barrier, captures a potentially informative state that is difficult to attain, making it essential to devise strategies that maximize the probability of success within a finite time budget and to quantify the resulting improvements. In the following we summarize our main findings and key take-away messages.

  1. Phase Transition in Reaching Probabilities. We identify a striking phase transition in the probability of reaching the target subset of the state space as a function of the number of parallel simulations N. In Theorems 1 and 2, we rigorously show that when the time budget scales linearly with the barrier level, there is a sharp transition in performance. Our analysis is based on large deviations techniques, and the transition happens at a certain threshold which we present explicitly. Below this threshold, concentrating the entire budget on a single particle is optimal. If the budget, when split into N parts, remains above the threshold, then distributing it among N particles significantly improves the probability of success. However, using too many particles—thereby reducing the time available to each beyond the threshold—results in exponentially diminished performance.

  2. Optimal Number of Parallel Simulations. In view of the previous point, a critical contribution is then the determination of an optimal number of parallel simulations, N*, that balances the trade-off between exploration diversity and the time allocated to each particle. This optimal N* depends intricately on the time budget and the large deviations characteristics of the particle dynamics, but in terms of the phase transition mentioned above, it can be characterized as the largest integer for which the split budget exceeds the threshold.

  3. Exponential Boost via Restarting Strategy. To overcome stagnation and improve the probability of reaching the target, we introduce a restarting mechanism inspired by mechanisms described, for instance, in Mastropietro et al. (2025). We study an idealized situation in which this mechanism replaces particles with a low likelihood of success and restarts them at new initial positions sampled from a given measure. Our theoretical results (Theorem 3) demonstrate that this restarting strategy can substantially enhance the probability of success. The improvement is quantified by a factor proportional to the time budget and the exponential moments of the measure associated with the restarting mechanism. A particularly interesting restarting measure is a QSD over a set representing high-probability regions for successful trajectories. For these restarting measures, we can obtain sharper asymptotics using different methods that exploit the QSD properties. We note that this restarting strategy can be well approximated in practice by selection mechanisms such as the Fleming-Viot or Aldous-Flanary-Palacios algorithms (see Budhiraja et al. 2022 and the references therein). These schemes enable practical implementation and adequate sampling of initial positions, making the restarting mechanism computationally feasible in real-world scenarios. Finally, from a practical point of view, we present contributions in two directions for queueing systems. The first one corresponds to the problem of estimating small probabilities. We analyze the interplay between the probability of exploration and the efficiency of the associated estimation method. For the particular case of an M/M/1 queue, we show that the variance of the vanilla Monte Carlo estimator is essentially determined by the typical exploration time. The second, with broader impact, addresses efficient exploration in RL. In RL environments with sparse rewards, the core of the efficient exploration challenge lies in the inability of algorithms such as policy gradient methods to perform effective policy updates (see Ecoffet et al. 2021 and references therein). These methods estimate the gradient of the expected return with respect to the policy parameters. This estimation relies crucially on exploring the state space; if the agent fails to reach rewarding states, the gradient estimates are biased and ineffective. Our work addresses this specific issue: we aim to improve gradient estimation (hence policy evaluation under a fixed policy) not by changing the policy, but by enhancing the exploration process used to evaluate it. The M/M/1/K queue considered in Mastropietro et al. (2025) is reanalyzed here to illustrate our results.

1.2. Organization of the Paper

In Section 2, we introduce the models for the exploration process, which represent the dynamics of the exploration phase in RL. Like the rest of the paper, this section is divided into two parts. The first part analyzes the effects of parallel exploration, and the second part analyzes our restarting mechanism. Section 3 presents our main results for the two scenarios described above, after some necessary preliminaries. Main results are discussed in generality and illustrated through numerical simulations for concrete instances. In Section 4, we prove Theorems 1 and 2 related to parallel exploration. In Section 5 we prove Theorem 3 for exploration with a general restarting measure, and in Section 6 we prove Theorem 4 for a QSD restarting measure. Finally, appendices are devoted to practical applications of our work.

2. Exploration Strategies

We now formally introduce the models for exploration, using random walks and Lévy processes as simplified toy models. We focus on the probability of reaching a distant state x from an initial position of zero, with the drift component encoding the difficulty of exploration. The drift represents the inherent challenge in progressing toward the target state, where a stronger drift away from the target reflects a more difficult exploration task.

We consider scenarios where the total time budget scales with x, allowing us to examine how exploration strategies perform as the difficulty of reaching the target increases. Two specific strategies are explored: the first investigates parallel exploration under a fixed time budget, whereas the second examines the effects of restarting the process when it crosses a predefined threshold. Although these models are simplified, they provide critical insights into the fundamental principles that govern the effectiveness of exploration strategies in RL, particularly in challenging environments where reaching an unlikely target state is the goal.

2.1. Strategy I: Parallel Exploration

Let {Zi(t)}tI,i{1,,N}, where I=N or I=R+, be a family of one-dimensional random walks or Lévy processes. Define further

τi(x)inf{tI:Zi(t)x},(1)
τ(N)(x)min{τ1(x),τ2(x),,τN(x)},(2)
the time it takes particle i to reach state x, and the first time one of N particles reaches x, respectively.

First, the case of N parallel independent simulations is analyzed. We introduce as performance criterion for exploration the probability that starting from zero, one of the simulations reaches the rare state x within a given time budget B(x)/N, that is,

P(τ(N)(x)B(x)/N).

We aim at characterizing this probability (as a function of N) in the regime where the total simulation budget, B(x), satisfies x=C·B(x)+o(B(x)), for x large and C some positive constant. This is the focus of Theorem 1 for random walks and Theorem 2 for Lévy processes.

2.2. Strategy II: Exploration with Restart

Our second model incorporates a restart mechanism into the dynamics of a random walk or a Lévy process. The ingredients of the model are a positive number x>0 (the rare state we want the process to reach), a probability measure νx supported on (0,x), and a discrete random walk or a Lévy process Z={Z(t)}tR+. The trajectories of the process under study can be viewed as a concatenation in time of independent and identically distributed cycles, each consisting of sampling a state y0 from νx and simulating the trajectory of y0+Z(t) until the first time it exits the interval (0,x).

Our goal is to analyze the first passage time over x of the restarted process, and compare it with the corresponding one for the process without restart. The conditions required on the family of restart measures to improve algorithmic performance are quite general. Essentially, it suffices that each νx be stochastically dominated by a nondegenerate measure possessing a finite second moment. We also examine in detail the case where restart occurs according to the unique quasi-stationary measure of the process Z absorbed at R+(0,x) (see Subsection 3.4 for precise definitions). Our motivation for this particular mechanism (restarting from quasi-stationary distributions) is that the resulting process approximates complex particle dynamics built upon a given Markov process, a key example, as noted earlier, being the class of Fleming-Viot particle systems. Our model is indeed an approximation of the Fleming-Viot dynamics because the stationary distribution of the N particle system approximates the quasi-stationary distribution associated with the underlying Markov process absorbed upon reaching A, as N goes to infinity (see Appendix B for more details).

3. Main Results

After introducing due notation, we present our results for each strategy.

3.1. Parallel Exploration Driven by a Random Walk

Let Z(t)=X1++Xt, tN be a random walk with independent increments all distributed as a random variable X. We will always assume that X is not deterministic, meaning that it is not concentrated on a single value, and that it may take both positive or negative values with positive probability. The moment-generating function (mgf) of X, φ(λ)E[exp(λX)], is strictly convex on the interior points of the set Λ (int(Λ)) where Λ{λR:φ(λ)<+}. The following condition is usually referred to as the right Cramér condition:

Λ+(0,+)int(Λ)is nonempty.(3)

We will work under further additional assumptions:

φ is atleast twice differentiable in Λ+,(4)
and
There existsλ*Λ+:ψ(λ*)=0 where ψ(λ)log φ(λ).(5)

Note that E[X]=ψ(0) and Assumptions (3), (4), and (5) imply that E[X]<0 and ψ(λ*)>0, because ψ(λ*)=ψ(0) and φ is strictly convex. Let τ(x) denote the first time the random walk reaches level x, defined as

τ(x)inf{t0:Z(t)x}.(6)

Furthermore, let τ(N)(x) represent the minimum of N independent random times, each distributed as τ(x), that is,

τ(N)(x)min{τ1(x),τ2(x),,τN(x)},(7)
which corresponds to the first time any of N independent random walks reaches level x.

Our first main theorem states that the interesting behavior of τ(x) and τ(N)(x) occurs while comparing them to functions of the following classes: for λΛ+ let

L(λ){f:RR such that x=ψ(λ)·f(x)+o(f(x))as x+}.(8)

This set consists of functions f(x) that asymptotically behave like x/ψ(λ) with an error term of order smaller than f(x). The result below is stated for the class of functions defined above, but the reader can think of a specific example being B(x)=x/ψ(λ) for any λΛ+.

We now state our main result regarding parallel exploration for random walks:

Theorem 1.

Let Z denote a random walk whose increments satisfy Assumptions (3), (4), and (5), and let B(·) belong to L(λ) for some λΛ+. Then for N2,

limx+P(τ(N)(x)B(x)N)P(τ(x)B(x))={N, if Nψ(λ)<ψ(λ*);0, if Nψ(λ)>ψ(λ*).(9)

Corollary 1

(Optimal Number of Particles). For a given state x and a budget B(x)L(λ). If ψ(λ)<ψ(λ*), the asymptotically optimal number of particles is

N*=max{N1:Nψ(λ)<ψ(λ*)}=ψ(λ*)ψ(λ)1,(10)
where · is the ceiling function. Otherwise, N*=1.

In the regime in which the theorem is stated, probabilities decay exponentially as x goes to infinity (see Proposition 1 in Section 4). The result shows that using too many particles (thus giving too little time to each) entails an exponentially worse performance, meaning that P(τ(N)(x)B(x)N) decays with a faster exponential rate than P(τ(x)B(x)). On the other hand, using too few particles comes only at a linear cost in performance.

The main intuition behind the results stated above is that within the (rare) event {τ(x)<+}, the probability is concentrated on those trajectories with drift ψ(λ*), for which the time needed to reach x is τ(x)x/ψ(λ*). This idea is formalized using exponential martingales and is at the core of Proposition 1. Our results show that allowing a budget smaller than x/ψ(λ*) substantially reduces the probability of success, whereas allowing too much budget results in no significant improvement. In the latter case, by splitting it into several independent particles, each one provides an “extra chance” of success.

Even though in general λ* and (thus) the threshold ψ(λ*)1 depend in a nontrivial way on the distribution of the increments, in some cases it is possible to find simple expressions for these parameters. We present some of the important examples here.

Example 1

(Normal Increments). If X has a normal distribution with mean μ<0 and variance σ2=1, then

ψ(λ)=λμ+12λ2;
hence, λ*=2μ and ψ(λ*)=μ and the threshold can be reliably estimated in practice if needed. Moreover, the function ψ:Λ+[0,+) is surjective, so for every C>0 there exists λΛ+ such that ψ(λ)=1/C. Theorem 1 in this case states that if, in order to reach a high state x>0, we are allowed a budget B(x)=Cx for some C>0, then the (asymptotically) optimal number of independent particles to use is N*=Cμ1 if Cμ>1 and N*=1 if not.

Example 2

(Birth-and-Death Chains). If X only takes values 1 and 1 with probabilities p and 1p, respectively, with p<1/2, then λ*=log(1p)log p and ψ(λ*)=12p=EX. Because X is bounded, the image of ψ contains the positive real numbers as in the previous example. Then, if we are allowed a budget B(x)=Cx, the number of particles maximizing our chances is N*=C(12p)1 if C(12p)>1 and N*=1 if not.

3.1.1. Simulations for Parallel Random Walks.

Consider the random walk case described in Example 2. We use the parameters p=0.45 and B(x)=300·x, which imply ψ(λ*)=0.1. According to Theorem 1 the ratio

P(τ(N)(x)B(x)N)P(τ(x)B(x))(11)
is expected to approach the identity for N<300·0.1=30 and zero for N>30, as x. This is the behavior observed in Figure 1.

Figure 1. Parallel Exploration for Random Walks with Negative Mean p(1p)=0.1
Notes. Estimated ratio between P(τ(N)(x)B(x)/N) and P(τ(x)B(x)) as a function of number N with B(x)=C·x=300·x. The phase transition is observed at the expected threshold N*=C(12p)1=29.

For the simulation we used the measure Pλ* as defined in (20), under which the process is distributed as a birth-and-death random walk with interchanged parameters, namely, p*=Pλ*(X1=1)=1Pλ*(X1=1)=1p. Under this measure, passage over a given xN is not a rare event anymore, so simulations are feasible. For each large deviation parameter x=10k, k{2,3,4,5}, and for each number of particles N between 1 and 100, we simulate and average over 1,000 random walks to get an estimate of Pλ*(τB(x)/N). By reverting the measure change, we obtain estimates for the original ratio.

3.2. Parallel Exploration Driven by a Lévy Process

Random walks have an analogue in continuous time known as Lévy processes. Under analogous assumptions on the exponential moments, we can extend Theorem 1 to the case of Lévy processes on R. This is based on the fact that the asymptotics for first passage times over high barriers are similar in the cases of random walks and Lévy processes; see Section 4. We now review some preliminaries on Lévy processes, all of which may be found in classical references such as Kyprianou (2006) and Bertoin (1998).

Let (Ω,F,{Ft}t,P) be a filtered probability space, where the filtration {Ft}t is assumed augmented and right-continuous. A Lévy process Z={Z(t)}tI, I=[0,+) is a stochastic process such that: for all 0st, Z(t) is Ft measurable, Z(0)=0, Z(t)Z(s) is independent of Fs and has the same distribution as Z(ts), and is continuous in probability, namely, that Z(t+s)Z(t) in probability if s0. We will always work with a càdlàg version of Z. Let ψ denote the Lévy exponent of Z:

EeλZ(t)=etψ(λ).

The Lévy-Khintchine formula introduces the characteristic triplet (μ,σ,Π) of Z:

ψ(λ)=log EeλZ(1)=μλ+(σλ)22++(exp(λy)1λy1(|y|1))Π(dy),(12)
where μR is called the drift coefficient, σ(0,+) is the diffusion coefficient (note that we exclude the finite-variation case), and Π is a measure in R{0} such that
+(1y2)Π(dy)<+,
called the Lévy measure (or jump measure). As in the discrete-time case we will work under Cramér’s condition: E(eλZ(1)) is finite for λ[0,λmax) and there exists λ*(0,λmax) such that ψ(λ*)=0. This condition implies in particular that EZ(1)<0 and that Z drifts to almost surely.

Let τ(x) and τ(N)(x) be defined as (6) and (7), respectively. Up to taking a continuous time parameter, our result on the number of particles under finite time budget constraints for Lévy processes is the same as for random walks:

Theorem 2.

Let Z denote a Lévy process such that Z(1) satisfies Assumptions (3), (4), and (5). Let B(·) belong to the class L(λ) for some λΛ+ (as defined in (8)). Then for N2,

limx+P(τ(N)(x)B(x)N)P(τ(x)B(x))={N, if Nψ(λ)<ψ(λ*);0, if Nψ(λ)>ψ(λ*).(13)

In view of the above result, we obtain an optimal number of particles for the Lévy process, in the same way as in Corollary 1.

The following is an example where all the parameters can be computed explicitly.

Example 3

(Linear Brownian Motion). If the Lévy measure is zero everywhere, the process is a Brownian motion with drift, namely, a solution of the stochastic differential equation (SDE)

dZ(t)=μdt+σdB(t).

We assume further that σ=1. Because its distribution at time 1 is Gaussian with parameters (μ,1), from Example 1 we conclude that λ*=2μ and ψ(λ*)=μ. The same conclusion on the optimal number of particles as in Example 1 holds.

3.2.1. Simulations for Parallel Lévy Processes.

For the simulations in Figure 2 we considered a Lévy process with positive and negative jumps. Positive (resp. negative) jumps occur at times distributed as a Poisson process of intensity r=2 (resp. s=3) and whose lengths are exponentially distributed with rate α=4 (resp. β=1). Following the notation in (12), we further take μ=σ=1. It can be checked that the Lévy measure is given by

Π(dy)=1(y<0)3 exp(y)dy+1(y>0)8 exp(4y)dy.

Figure 2. Parallel Exploration Lévy Processes
Note. Estimated ratio between P(τ(N)(x)B(x)/N) and P(τ(x)B(x)) as a function of the number N of independent Lévy processes with exponential jumps, with parameters described in Subsection 3.2.1.

For the considered process, ψ(λ)=λ+12λ23λλ+1+2λ4λ,λ<4 and its positive root turns out to be λ*=2. Moreover, ψ(2)=8/3, which gives, for B taken as B(x)=15·x, an optimal number N*=40 of particles.

For simulation purposes, as in the discrete-time case, we use the exponential martingale associated with λ* and the measure Pλ*, as defined in Section 4. Specifically, the parameters of the process were chosen in such a way that under the measure Pλ*, the process is again Lévy with exponential jumps. Moreover, the time intensities and the jump rates are permuted: under the measure Pλ* the parameters of the Lévy measure are r*=4, α*=2, s*=1, and β*=3.

Remark 1

(A Remark on Reflected Process). In terms of applications, it may be desirable to include reflection at a low barrier in our models to convey the idea that complexity of an exploration task cannot diverge to in realistic scenarios. However, in the context of rare events we are in, there is no substantial difference in introducing reflection, because estimates vary at most by a multiplicative factor. To see this, consider the process Z˜(t)=Z(t)[inf0stZ(s)(1)], namely, the reflected version of Z at 1. By considering the coupling (Z(t),Z˜(t)) and observing that P(Z(t)Z˜(t))=1 for every t0, it follows that

P(τ(x)t)P(τ˜(x)t), for every t0.

On the other hand, the event {τ˜(x)<t} may be decomposed into trajectories that reach x before the first reflection at 1 and those that do not. The first subset may be treated as a process without reflection, thus contributing a term of the same order as that of the process without reflection. Trajectories of the latter subset have a last reflection time before τ˜(x), so there is an excursion of length x+1 during a shorter time. This provides an extra term of order at most P(τ(x+1)t)<P(τ(x)t). Thus,

P(τ˜(x)t)2P(τ(x)t),
and we may then safely restrict ourselves to processes without reflection. The interested reader may consult Hansen (2009) for the maximum of reflected Lévy processes.

3.3. Exploration with Restart for Random Walks and Lévy Processes

Our Strategy II consists of restarting a random walk or a Lévy process such as those studied in Theorems 1 and 2 upon leaving (0,x) with a given probability measure νx supported in the interval (0,x). Our first result in this direction, stated in Theorem 3, gives asymptotic performance guarantees of restart strategies under mild assumptions on the restart measures νx, as x+.

In the next section we will specialize to a particularly interesting family of measures: quasi-stationary measures, for which better estimates are obtained (in particular matching bounds up to multiplicative constants). We remark that for the QSD case we rely exclusively on quasi-stationarity and related properties but not on Theorem 3.

Definition 1.

Let ν1,ν2 be two positive finite measures on R. ν1 is said to be stochastically dominated by ν2, denoted ν1stochν2, if for every measurable nondecreasing function u,

Ru(y)ν1(dy)Ru(y)ν2(dy).(14)

This notion is sometimes called first-order stochastic domination and is equivalent to ν1(x,+) being less than or equal to ν2(x,+) for every xR.

The next theorem provides estimates on the time it takes for the restarted process to reach the desired region (here an interval [x,+) for large x) under mild assumptions on the restart measures (νx)x1, for which we assume without loss of generality that x1.

Theorem 3.

Let {Z(t)}t>0 denote a random walk or a Lévy process satisfying Cramér’s condition, and letλ*be given by(5). For each x>0, let {Zνx(t)}t>0 be the restarted process with restart measures (νx)x1 as described at the beginning of this section, and let τνx(x) be the first time a cycle of the process ends above x. Assume that there exists a finite positive measure ν with the following properties:

  1. ν is not a delta measure on zero;

  2. νxstochν for every positive x;

  3. ν has a finite second moment: y2ν(dy)<+.

Then, B(x) growing faster than ψ(λ*)1x but slower than eλ*x, the ratio

P(τνx(x)B(x))P(τ(x)B(x))1B(x)0xexp(λ*y)νx(dy)(15)
is bounded away from zero uniformly in x>0.

A typical trajectory of the restarted process is a concatenation of independent cycles, each of which consists of a trajectory {y0+Z(t):0tτ} for some y0(0,x) sampled according to νx and τinf{t>0:y0+Z(t)(0,x)}. An intuition behind the result is that adding νx as an initial distribution boosts the performance by a factor 0xexp(λ*y)νx(dy) for each cycle. Because the cycles’ duration is stochastically bounded, we expect to observe an asymptotically linear number of cycles within time B(x). However, a matching upper bound remains elusive for this general context.

Example 4.

Consider some fixed probability measure ν on (0,+) such that ν((0,1))>0 and satisfying properties 1, 2, and 3 listed in Theorem 3. Now define the sequence (νx)x1 as the conditional laws

νx(·)ν(·(0,x))ν((0,x)).

Then νx(y,+)ν((0,1))1ν(y,+), and Theorem 3 is applicable for this family of measures.

Remark 2.

In a more general fashion than the previous example, one may wonder if the hypotheses of Theorem 3 are satisfied whenever the measures νx have a weak limit ν. This is indeed the case under some assumptions on the moments of the νx’s, but the dominating measure need not be related to the weak limit. Define ν*((,y])infx1νx((,y]). Then ν* defines a càdlàg function with limyν*((,y])=0. Because the νx’s have a weak limit, limyν*((,y])=1 also holds and ν* defines a probability measure that dominates νx for every x1. Note as well that ν* is not a delta measure at zero unless νx=δ0 for all x1. Let us now give a precise condition under which ν* satisfies property 3 in Theorem 3. Assume that the νx’s have uniformly bounded moments of order p for some p>2, that is,

Csupx10+ypνx(dy)<+.(16)

We bound the tails of the νx by Markov’s inequality:

νx(y0,+)y0p0+ypνx(dy)Cy0p.

Then, using the Layer Cake Representation, the desired property holds:

0+y2ν*(dy)=20+yν*(y,+)dy=20+ysupx1νx(y,+)dy2C0+y1pdy<+.

As a final example for this section, we take a simple extension of Example 4 that shows that Theorem 3 covers cases beyond weak convergence:

Example 5.

Take ν1 and ν2 two (distinct) probability measures on (0,+) satisfying the conditions of Theorem 3 and νi((0,1))>0,i=1,2. We may build an alternating family of measures for x1 as follows:

νx(·){ν1(·(0,x))ν1((0,x)) if x is odd, ν2(·(0,x))ν2((0,x)) if x is even. 

In this case Theorem 3 applies, but there is not a weak limit of the whole sequence of measures.

3.4. Restart with Quasi-stationary Measures

We now specialize to an important family of restarting measures for which we can obtain sharper asymptotic results than those provided by Theorem 3.

Throughout this section we assume that νx is the QSD of the process restricted to the interval (0,x). We begin with a few brief preliminaries on quasi-stationary distributions, and refer the reader to the monograph of Collet et al. (2013) for a comprehensive treatment of the subject.

Definition 2

(Quasi-stationary Measure). Let Z={Z(t)}t0 be a Markov process with state space S and let A be a subset of S. The absorbed process, denoted by ZA, is defined as ZA(t)=Z(tτ(A)), where τ(A)inf{t>0:Z(t)A}. A quasi-stationary measure (QSD) ν is a probability measure on SA which is invariant when conditioned on nonabsorption, which means

Pν(Z(t)B|t<τ(A))=Pν(ZA(t)B)Pν(t<τ(A))=ν(B),
for every t>0 and every measurable set BSA.

The study of QSD for a given Markov process is a subtle matter; in general, they may not exist, and when they do, they may not be unique.

For the particular case of a one-dimensional Lévy process with absorbing set A=R(0,a) for some a>0, by Kolb and Savov (2014, theorem 2.1 and remark 2.2) there is a unique QSD, provided the distribution of the process without absorption at any time t>0 is absolutely continuous with respect to Lebesgue in R, which is satisfied because we are assuming a strictly positive diffusion coefficient. When there is a unique QSD, it is obtained as the Yaglom limit

limt+Py(Z(t)·|t<τ(A)),
which is independent of the starting position ySA.

The existence of a Yaglom limit for a Lévy process when the absorbing set is (,0] was proved in Kyprianou and Palmowski (2006),1 extending the case without jumps of Martinez and San Martin (1994).

To the best of our knowledge, there is no analogous existence-and-uniqueness result specific to discrete-time random walks conditioned on nonabsorption on bounded intervals. If the law of the increments of a random walk is absolutely continuous with respect to the Lebesgue measure, then one may consider the associated compound Poisson process (CPP) with rate 1, thereby obtaining a Lévy process with an absolutely continuous law for every time larger than its first jump. By checking that for large t the probability of not having jumped is negligible for the CPP even when conditioned on nonabsorption by time t, one obtains the same Yaglom limit as the conditioned CPP and in particular the existence and uniqueness given by Kolb and Savov (2014). This can be carried out using the asymptotics on passage times from Denisov and Shneer (2013). These asymptotics for conditioned processes are used in the proof of Lemma 4 of Section 6. Yaglom limits for random walks with absorption at (,0] were studied in Iglehart (1974), and under the Cramér condition in Doney (1985).

We may now state our main theorem for restarted processes:

Theorem 4.

Let Z be a random walk on discrete time or a Lévy process satisfying (3), (4), and (5) as in Theorem 1. For each positive x(0,+), let νx denote the quasi-stationary probability measure on (0,x) for the process Z absorbed upon exiting (0,x). Let Zνx be the restarted version of Z on (0,x) with restart measure νx and τνx(x) the associated passage time over x. Let the time budget B(x) grow faster than x/ψ(λ*) as x+ but slower than eλ*x. Then there exist constants c,C>0 such that

cP(τνx(x)<B(x))P(τ(x)<B(x))1B(x)0xeλ*yνx(dy)C(17)
for all x>0.

Corollary 2.

In the particular case when Z is a linear Brownian motion, with a negative drift equal to μ, we have

0xeλ*yνx(dy)=eμx,(18)
and the ratio
P(τνx(x)<B(x))P(τ(x)<B(x))1B(x)eμx(19)
is bounded away from zero and infinity for B(x) growing faster than x/μ.

The theorem states that the application of a restarting mechanism with quasi-stationary distributions improves performance by a factor comparable with the time budget times an exponential moment of the respective measure. In the Brownian motion case, this improvement factor is exponential in the complexity of the task. Indeed, although P(τ(x)B(x)) is of order e2μx, with the restarting mechanism we get order B(x)eμx. For a general Lévy process, exponential moments of its quasi-stationary measures are not easy to estimate. We conjecture that they grow as exp[(λ*λ0)x], as x+, where λ0 is the positive solution of ψ(λ)=0. Note that this is exactly the case for the linear Brownian motion. For spectrally negative Lévy processes, namely, when Π(0,+)=0, the conjecture might be proved with the use of a Girsanov transformation of the process into a zero-drift one (thus providing the conjectured exponential factor) and applying the results of Lambert (2000). We do not pursue this further here; it is left as a direction of further study.

3.4.1. Simulations for Restarted Lévy Processes.

We now present numerical experiments for restarted processes. As a case study, we consider the Lévy process with exponential jumps introduced in Subsection 3.2.1, equipped with a restart mechanism activated upon exiting the interval (0,50)R.

For the process without restart, recall that the Cramér exponent is λ*=2. Consequently, when the process is started from the origin, the probability of ever reaching the upper barrier at 50 is of order exp(100). As restart measure, we consider an exponential distribution with mean 10, truncated to (0,50), namely,

ν50(dx)=0.1e0.1x1e51(0,50)(x)dx.

Its exponential moment of order λ* satisfies

050e2xν50(dx)9.6×1039.

Multiplying by P0(τ(50)<+) yields

P0(τ(50)<+)050e2xν50(dx)3.6×104.

Although this estimate is only heuristic, in view of Theorem 3 it suggests that the probability of exceeding the level x=50 within a finite time horizon B(50) under this restart strategy is no longer negligible. This behavior is confirmed numerically in Figure 3, which reports empirical mean estimates based on 100 independent replications for each time budget, together with 95% confidence intervals. The figure illustrates how the restart mechanism makes the exceedance of the upper barrier observable on moderate time scales, with confidence intervals (CI) indicating the associated variability.

Figure 3. Probability of Exceeding the Level x=50 for the Restarted Lévy Process as a Function of the Time Horizon
Notes. The underlying process combines a Brownian motion with drift μ=1 and volatility σ=1 with exponential jumps: positive jumps arrive at rate λ+=2 with jump sizes of rate 4, whereas negative jumps arrive at rate λ=3 with jump sizes of rate 1. Whenever the process exits the interval (0,50), it is restarted according to a truncated exponential distribution with mean 10.

We emphasize that the simulations reported in this section do not rely on importance sampling or on any other variance-reduction technique. The numerical results therefore illustrate that introducing a restart mechanism alone can already lead to a significant improvement in the simulation of rare events. In particular, restarting makes it possible to turn probabilities that are extremely small for the original process into quantities that can be reliably estimated within reasonable computational time.

4. Proof of Theorems 1 and 2

We begin the section by introducing some preliminaries on exponential martingales, first for random walks and then for the Lévy process, which will be used in the proofs. They are standard techniques for the study of large deviations events for light tails, as is our case.

4.1. Preliminaries

By Assumption (3), the family of exponential martingales indexed by λΛ+ may be defined as

Mtλexp(λZ(t)tψ(λ)),tN.

Indeed, Mλ is a martingale for every λΛ+, which acts as a density for a new probability measure on the same probability space. Specifically, we define the measure Pλ as

Pλ(A)E[Mτλ1(A)],(20)
where A depends on {Z(t)}tτ, and τ is any almost-surely finite stopping time. The measure Pλ*, associated with λ* in Equation (5), will be of particular importance.

We also observe that the density of P|Fτ with respect to Pλ|Fτ is given by (Mτλ)1.

Exponential martingales are particularly useful for computing probabilities of rare events, as these events may become more frequent under Pλ if λ is appropriately chosen. Under the measure Pλ, the increments Xj, j1, are i.i.d. (making Z a random walk), with mean and variance given by

μ(λ)Eλ[X1]=ψ(λ) and σ(λ)Eλ[|X1μ(λ)|2]=ψ(λ).

The threshold in Theorem 1 can thus be interpreted in terms of the measure Pλ*: the random time τ(x) is Pλ*-almost surely finite (because ψ(λ*)>0) with mean xψ(λ*)1, asymptotically as x+.

For the Lévy process we have analogous definitions: under Cramér’s condition the family {Mtλ}0tλmax of exponential martingales is defined as

Mtλexp(λZ(t)tψ(λ)),
and the corresponding family of measures is given by
dPλdP|Ft=Mtλ.

Under any of the measures Pλ, Z is again a Lévy process with Lévy exponent

ψ(λ)(r)=ψ(λ+r)ψ(λ),(21)
with mean and variance again given by
μ(λ)Eλ[X1]=ψ(λ) and σ(λ)Eλ[|X1μ(λ)|2]=ψ(λ).

From (21) it is straightforward to check that the corresponding characteristic triplet is given by

μλ=μ+λσ2+11y(eλy1)Π(dy),σλ2=σ2,Πλ(dy)=eλyΠ(dy).(22)

The measure Pλ* provides an intuition to Theorem 2 as in the random walk case: the law of the process conditioned to reach the high barrier is distributed as Pλ*. We do not prove this exact result here, but it lies at the core of the references cited for our proof (Bertoin and Doney 1994, Palmowski and Pistorius 2009).

Remark 3.

μλ in (22) should not be confused with μ(λ). If the jump measure Π is nonzero, they might not coincide.

4.2. Proofs

Remark 4.

We introduce some pieces of notation: f(x)g(x) will be used to denote asymptotic equivalence limx+f(x)g(x)1=1 and f(x)g(x) for C1<f(x)g(x)1<C for some constant C>0 for all x>0.

Theorems 1 and 2 will follow from the following asymptotics of the passage times of random walks and Lévy processes with the assumptions on their exponential moments stated in Subsection 3.1.

The following proposition unifies Höglund (1990, corollary 2.2) for the case of random walks and Palmowski and Pistorius (2009, theorem 1) for Lévy processes:

Proposition 1.

Let Z={Z(t)}tI be either a random walk (I=N) or a Lévy process (I=R+) taking values in R and assume that Conditions (3), (4), and (5) are satisfied (for Z(1) in the case of a Lévy process). Fix some λ(0,+)Λ such that μ(λ)=ψ(λ)>0. If x and t=t(x) go to infinity in such a way that tL(λ), then there exist positive constants C (independent of λ) and D(λ) such that

P(τ(x)t){C exp(λ*x),if μ(λ)<μ(λ*)D(λ)t1/2 exp(ζ[μ(λ)]t),if μ(λ)>μ(λ*),(23)
where ζ is the convex conjugate of ψ:
ζ[s]supλΛ{λsψ(λ)}.(24)

Some remarks are now in order:

Remark 5.

The constants C and D(λ) are given in Höglund (1990) and Palmowski and Pistorius (2009) in terms of the increments of the process in each case. Their explicit values are not relevant for our analysis.

Remark 6.

If s=μ(λ), then λ realizes the supremum in the definition of ζ[s], that is,

ζ[μ(λ)]=λμ(λ)ψ(λ),λΛ.(25)

Remark 7.

The case μ(λ)<μ(λ*) in the proposition shows that for t growing sufficiently fast with x, P(τ(x)t) and P(τ(x)<+) have the same exponential order:

P(τ(x)t)P(τ(x)<+)=Eλ*[1(τ(x)<+) exp(λ*Z(τ(x)))]=eλ*xEλ*[1(τ(x)<+) exp(λ*[Z(τ(x))x])]eλ*x,
where we used that Pλ*(τ(x)<+)=1 and that the overshoot at x, U(x)Z(τ(x))x, is positive. In fact, if one assumes that the jump measure of Z is not supported on a lattice, then Pλ*(τ(x)<+)exp(λ*x) converges to a positive constant (see Kyprianou 2014, theorem 7.6; Bertoin and Doney 1994; and Palmowski and Pistorius 2009). We will recall this fact repeatedly.

Proof of Theorems 1 and 2.

We begin by noticing that for any time t (discrete or continuous),

P(τ(N)(x)t)=NP(τ(x)t)+O((NN/2)P(τ(x)t)2);(26)
so, for N fixed, P(τ(N)(x)t)NP(τ(x)t). Recall that · denotes the ceiling function.

Case I: Nμ(λ)<μ(λ*). In this case B(x) and B(x)/N are in the same regime of Proposition 1; thus, by (26),

limx+P(τ(N)(x)B(x)N)P(τ(x)B(x))=limx+NP(τ(x)B(x)N)P(τ(x)B(x))=N.(27)

For the remaining cases some considerations will be needed: for each λ such that μ(λ)>0, let

S(λ){s1:λs>λ, s.t. μ(λs)=sμ(λ)}.(28)

The set S(λ) contains a (nontrivial) right neighbourhood of one as long as λ<λmax. We also note that for each λ as before, the map sλs is continuously differentiable and

ddsλs=μ(λ)μ(λs)=μ(λ)σ2(λs).

Case II: μ(λ)<μ(λ*)<Nμ(λ). As a subcase, assume first that NS(λ), so that there exists λN(λ,λmax) such that

μ(λN)=Nμ(λ).(29)

In this case, we have that x=μ(λN)B(x)N+o(x), so the exponential rate of P(τ(x)B(x)N) is given by Proposition 1:

B(x)Nζ[μ(λN)]=B(x)N(λNμ(λN)ψ(λN)).(30)

On the other hand, P(τ(x)B(x))eλ*x. Then, for some constant D˜(λ),

limx+P(τ(x)B(x)N)P(τ(x)B(x))limx+D˜(λ) exp(x(λNλ*)+B(x)Nψ(λN))=limx+D˜(λ) exp(x(λNλ*)(1ψ(λN)ψ(λ*)μ(λN)(λNλ*))+λNo(x)).(31)

Using the Mean Value Theorem and the fact that μ is strictly increasing in the interval (λ*,λN), we conclude that the limit in (31) is zero.

For the second subcase, let NS(λ), so that Nμ(λ) is at a positive distance above the range of μ(·). A bound may be obtained from any sS(λ){1} by monotonicity of tP(τ(x)t):

limx+P(τ(N)(x)B(x)N)P(τ(x)B(x))limx+P(τ(N)(x)xμ(λs)+o(x))P(τ(x)B(x))=0.(32)

Case III: μ(λ*)<μ(λ). As in the subcases considered above, assume first that NS(λ). Now both B(x) and B(x)/N lie in the right-hand side (RHS) of the threshold in (23). It suffices to show that the function

sS(λ)1sζ[μ(λs)]
is strictly increasing. Indeed, in that case
limx+P(τ(N)(x)B(x)N)P(τ(x)B(x))=limx+exp{B(x)(ζ[μ(λ)]1Nζ[μ(λN)])}=0,(33)
and the result is proved.

It only remains to prove that 1sζ[μ(λs)] is increasing, and using Remark 6 this is seen to hold:

dds1sζ[μ(λs)]=ddsλsμ(λ)+ψ(λs)s21sμ(λs)ddsλs=ψ(λs)s2,(34)
which is positive because λs>λ>λ*.

The proof is finished by arguing in the same way as in Case II for NS(λ). □

5. Proof of Theorem 3

For the proof of Theorem 3 we will need the following auxiliary lemmas:

Lemma 1.

Let

q(x)Pνx(τ(x)<τ(0)).(35)

Then under the assumptions of Theorem 3,

limx+q(x)=0.(36)

Moreover, there exists a constant c>0 such that

c0xeλ*yνx(dy)eλ*xq(x)0xeλ*yνx(dy).(37)

Lemma 2.

Let η and ζ be random variables defined by

P(ηt)=Pνx(τ(0)t|τ(0)<τ(x))
and
P(ζt)=Pνx(τ(x)t|τ(x)<τ(0)).

Under the assumptions of Theorem 3,

lim supx+E[η]<+
and2
E[ζ]xψ(λ*).

Lemma 3.

Let η,ζ be as in Lemma 2 and consider an i.i.d. sequence η1,η2, with the same distribution as η. For a fixed δ>0, define the events

Eδm{j=1mηjm(1+δ)Eη1} and Fδ{ζ(1+δ)Eζ},
and for any T>0 let
nδ(T)T(1+δ)Eζ(1+δ)Eη1.

In the setting of Theorem 3, the following holds:

limx+P(Eδnδ(B(x))Fδ)=1.(38)

Assuming these, we may now give the main result for restarted processes.

Proof of Theorem 3.

The first time the restarted process goes above x has a regenerative structure with cycles determined by the restart times. Formally,

τνx(x)=j=1Nxηj+ζ,(39)
where Nx{0,1,} is defined as the number of “failed” cycles (i.e., those ending at time τ(0)) before a successful one (i.e., one ending at time τ(x)) and has a geometric distribution with success parameter q(x)Pνx(τ(x)<τ(0)). For each x>0 the laws of the ηj’s and of ζ are those of τ(0) conditioned on the event {τ(0)<τ(x)} and of τ(x) conditioned on {τ(x)<τ(0)}, respectively. We remark that although it is not explicit in our notation, the laws of η1 and ζ both depend on x.

Lemma 1 studies the behavior of q(x) for large x; in particular (36) shows that it becomes increasingly harder to observe the events of interest as x grows.

A lower bound for P(τνx(x)B(x)) is given by the probability that Nx in (39) takes an unusually small value. To this end, consider for a fixed small δ>0 the events

Eδm{j=1mηjm(1+δ)Eη1}andFδ{ζ(1+δ)Eζ},
and for each T>0,
nδ(T)T(1+δ)Eζ(1+δ)Eη1.

Then by Lemma 3,

lim infx+P(τνx(x)B(x))lim infx+P({Nxnδ(B(x))}Eδnδ(B(x))Fδ)=lim infx+P(Nxnδ(B(x))).

Finally, we can conclude the desired uniform lower bound for (15) because

P(Nxnδ(B(x)))=1(1q(x))nδ(B(x))=nδ(B(x))q(x)(1+o(1))
and nδ(B(x))/B(x) is bounded away from zero and infinity by Lemma 2. It follows that for B(x) in the regime considered
P(τ(x)B(x))B(x)q(x)P(τ(x)B(x))B(x)0xexp(λ*y)νx(dy).

The proof of Theorem 3 is now concluded. □

5.1. Proofs of Auxiliary Lemmas

Proof of Lemma 1.

We begin with the proof of the upper bound of (37) by a martingale argument. We have

0xeλ*yνx(dy)=EνxM0=EνxMτ(x)τ(0)=eλ*xEνx[eλ*(Z(τ(x))x)1(τ(x)<τ(0))]+Eνx[eλ*Z(τ(0))1(τ(x)>τ(0))]eλ*xEνx[1(τ(x)<τ(0))]=eλ*xq(x).

We now turn to the lower bound. Our main ingredient is Bertoin and Doney (1994),3 who state that there exists a constant C(0,1) such that

limx+eλ*xP(τ(x)<+)=C.(40)

Using that τ(0) is almost surely finite, for every x and y(0,x)

Py(τ(x)<+)=Py(τ(0)<τ(x)<+)+Py(τ(x)<τ(0)).(41)

We first claim that

l=lim infx+0xeλ*yνx(dy)>1.(42)

Indeed, l1 and equality cannot hold because otherwise the measures νx would converge (up to taking a subsequence) to the delta measure on zero, thus contradicting the positive limit of the absorption rates in Lemma 6, and the claim follows.

In view of the claim above, let δ>0 be small enough such that C(1δ)>C(1+δ)l, and let also xδ be given by (40) such that for x>xδ

C(1δ)eλ*xP(τ(x)<+)C(1+δ).

Noticing that by the Markov property Py(τ(0)<τ(x)<+)P0(τ(x)<+)=P(τ(x)<+), we then have

q(x)=0xPy(τ(x)<τ(0))νx(dy)=0xP(τ(xy)<+)Py(τ(0)<τ(x)<+)νx(dy)0xP(τ(xy)<+)P(τ(x)<+)νx(dy)C(1δ)eλ*x0xxδeλ*yνx(dy)+xxδxP(τ(xy)<+)νx(dy)C(1+δ)eλ*xC(1δ)eλ*x0xxδeλ*yνx(dy)+P(τ(xδ)<+)νx([xxδ,x])C(1+δ)eλ*x.(43)

It then suffices to find a strictly positive lower bound, independent of x for the following:

q(x)eλ*x0xeλ*yνx(dy)C(1δ)0xxδeλ*yνx(dy)0xeλ*yνx(dy)+P(τ(xδ)<+)νx([xxδ,x])eλ*x0xeλ*yνx(dy)C(1+δ)10xeλ*yνx(dy).(44)

There are two cases to distinguish: whether l(δ)=lim infx+0xxδeλ*yνx(dy)0xeλ*yνx(dy)is equal to one or not.

In the case l(δ)=1, the first term in the RHS of (44) dominates the third and the desired lower bound is C(1δ)C(1+δ)/l>0.

If we now assume that l(δ)<1ε for some positive ε, then the middle term in (44) is at least εP(τ(xδ)<+) for large x. The proof concludes by showing that either the first and third terms vanish or the first dominates the third: assume first that there exists a sequence xn+ such that

0xnxδeλ*yνxn(dy)<1ε
for some ε>0 and all n; then νxn([xnxδ,xn])ε and consequently
0xneλ*yνxn(dy)+.

For such a sequence, the first and third terms on the RHS of (44) vanish as x+. On the other hand, if there is no such sequence (and hence liminf0xnxδeλ*yνxn(dy)1), the first term dominates the third one and εP(τ(xδ)<+) is still a lower bound for (44).

Let us now observe that q(x) goes to zero as x+, so the events of interest are indeed rare events: for any ε>0, there is kε such that ν[kε,+)<ε; then for x big enough, we have that

q(x)exp(λ*x)0xexp(λ*y)νx(dy)exp(λ*x)0kεexp(λ*y)νx(dy)+νx[kε,x).

Because kε is fixed, the above limit is upper bounded by ε, and ε is arbitrary. Hence, q(x)0 as x+.

The proof of Lemma 1 is then complete. □

Proof of Lemma 2.

The statement on η follows easily from stochastic domination by noticing that the function yEyτ(0) is nondecreasing. Indeed,

lim supx+E[η]=lim supx+Eνx[τ(0)1(τ(0)<τ(x))]1q(x)Eν[τ(0)]1+o(1)<+.(45)

We now turn to ζ. For each x>0 denote by U(x) the overshoot at x: U(x)=Z(τ(x))x.

by means of the exponential change of measure and the asymptotics of

limxE[ζ]=limxEνx[τ(x)|τ(x)<τ(0)]=limxEνxλ*[eλ*U(x)+λ*y0;τ(x);τ(x)<τ(0)]Eνxλ*[eλ*U(x)+λ*y0;τ(x)<τ(0)]Eνxλ*[τ(x)]1ψ(λ*)0x(xy0)νx(dy0).

Using that ν has finite first moment, the last term above is seen to scale as x/ψ(λ*) up to multiplicative constants. □

Proof of Lemma 3.

To lighten the notation, let us denote nδ(B(x)) by nδ. We will show separately that

limxP(1nδj=1nδηj(1+δ)Eη1)=0(46)
and
limxP(ζ>(1+δ)Eζ)=0.(47)

Together they imply (38).

Let us start with (46). For this part of the proof, we apply Chebyshev’s inequality:

limxP(1nδj=1nδηj(1+δ)Eη1)limxP(1nδ2j=1nδ|ηjEηj|2(δEη1)2)Var(η1)nδ(δEη1)21nδ(δEη1)20+Ey[τ(0)2]ν(dy)1nδ(δEη1)20+(xy)2ν(dy).

The asymptotic equivalence of the last line above is justified by Doney and Maller (2004, theorem 3) once we note that P(Z(t)x0) decays exponentially to zero with t for every fixed x0>0 (this is easily checked with an exponential martingale argument using any parameter θ>0 such that ψ(θ)<0).

Because ν has finite second moment and does not equal δ0, E[η1] does not go to zero with x+ and we conclude (46).

Let us now prove (47). On the one hand, the Strong Law of Large Numbers (see Kyprianou 2014, exercise 7.2, p. 224) implies that

limx+Z(τ(x))τ(x)Eλ*Z(1)=limx+x+U(x)τ(x)ψ(λ*)=1,Pλ*a.s.

Here as before U(x) denotes the overshoot at x. On the other hand, as seen in Lemma 2, for large x, Eζ is asymptotically equivalent to x/ψ(λ*). Under Cramér’s condition U(x)/x converges to zero almost surely, so we conclude that

limx+τ(x)Eλ*[τ(x)]=1,Pλ*a.s.
and (47) follows. □

6. Proof of Theorem 4

In this section Z denotes either a random walk on discrete time or a Lévy process, satisfying Cramér’s condition, and we will consider as restarting measures the family of quasi-stationary distributions (νx)x1. The proof of Theorem 4 relies on some auxiliary results.

Lemma 4.

The sequence of QSD measures (νx)x1 converges in distribution to the Yaglom limit on (0,+), given by

ν(B)limt+Px0(Z(t)B|t<τ(0)),
where τ(0)=inf{t>0:Z(t)0} and x0(0,+) is an arbitrary point.

Once the sequence of measures admits a weak limit, Remark 2 provides a route to apply Theorem 3, thereby yielding the lower bound in Theorem 4. We adopt, however, a different approach that relies more heavily on properties associated with quasi-stationarity.

Lemma 5.

For the restarted process Zνx, τνx(x) has an exponential distribution with parameter β(x)q(x), where β(x) is the exponential rate of absorption associated with νx and q(x) is defined in (35).

Lemma 6.

The function β(x), as defined in Lemma 5, is decreasing in x>0 and has a positive limit β>0 as x+.

If we assume the lemmas above, we may prove the main result as follows:

Proof of Theorem 4.

By Lemma 5 we have that P(τνx(x)B(x))=1exp[β(x)q(x)B(x)]. Because B(x) grows slower than eλ*x with x, β(x)q(x)B(x)0 and P(τνx(x)B(x))β(x)q(x)B(x) as x+, with the notation defined at the beginning of Subsection 4.2.

Also, by Lemma 6, β(x) has a positive limit, and because P(τ(x)B(x)) has order exp(λ*x), by Proposition 1 we conclude that the ratio

P(τνx(x)<B(x))P(τ(x)<B(x))1B(x)0xeλ*yνx(dy)(48)
is bounded away from zero and infinity uniformly in x. □

Before proving the lemmas, we compute the quasi-stationary distribution, its exponential moment of order λ*, and the absorption rate for a linear Brownian motion explicitly.

Proof of Corollary 2.

In the case in which Z is distributed as a Brownian motion, we can compute all ingredients explicitly. Brownian motion with constant drift μ<0 has infinitesimal generator

L=12d2dx2μddx,
whose domain contains the set of twice continuously differentiable functions with right limit (resp. left limit) equal to zero at zero (resp. x), denoted by C02(0,x). The adjoint operator is
L*=12d2dx2+μddx
and has u(y)=sin(πy/x)eμy as a positive eigenfunction. Indeed,
L*u=12(μ2+π2x2)u.

By the spectral characterization of quasi-stationary measures (see, e.g., Méléard and Villemonais 2012, proposition 4), the measure

νx(dy)=D sin(πy/x)eμydy(49)
is then quasi-stationary for Z, where D=μ2x2+π2πx(eμx+1) is the normalization constant. The absorption time Tx=τ(x)τ(0) is exponentially distributed with rate β(x)=12(μ2+π2x2), that is,
Pνx(Tx>t)=e12(μ2+π2x2)t.(50)

With the explicit form for the quasi-stationary distribution, and recalling that λ*=2μ, the exponential moment follows:

0xeλ*yνx(dy)=D0xeμy sin(πxy)dy=Dπxeμx+1μ2x2+π2=eμx.(51)

The result now follows from Theorem 4. □

6.1. Proofs of Auxiliary Lemmas

We now turn to the proof of the auxiliary lemmas.

Proof of Lemma 4.

Let us recall the piece of notation Tx=inf{t>0:Z(t)(0,x)}=τ(0)τ(x) and take x0 any number in (0,+). We shall prove that the following limit holds for any bounded measurable function f:

limx+limt+|Ex0[f(Z(t))|t<τ(0)]Ex0[f(Z(t))|t<Tx]|=0.(52)

The result relies on proving that Px0(t<τ(x)τ(0))Px0(t<τ(0)) when taking t and x to + in the same order as in (52). To check this fact, we start by using the strong Markov property

1Px0(t<Tx)Px0(t<τ(0))=Px0(τ(x)t<τ(0))Px0(t<τ(0))=Ex0{1(τ(x)t)PZ(τ(x))(tτ(x)<τ(0))}Px0(t<τ(0)).(53)

In Denisov and Shneer (2013, theorem 3.5), it is proved that for Lévy processes, for large t,

Px0(t<τ(0))const V(x0)exp(tψ(λ0))t3/2,
where λ0(0,λ*) is the unique positive solution to ψ(λ)=0, and the function V as defined in Denisov and Shneer (2013, theorem 2.1) satisfies
g(x)eλ0xV(x)eλ0xC(54)
for some constant C>0 and some bounded increasing function g. It now follows that the right-hand side of (53) converges to zero. Indeed,
1Px0(t<Tx)Px0(t<τ(0))constEx0{1(τ(x)t)V(Z(τ(x)))}V(x0)t+const1exp(λ0x0)Ex0{1(τ(x)+) exp{λ0(xx0+U(x))}}constEλ*{exp((λ*λ0)(xx0+U(x)))}=o(1),x+.

Here we used the exponential martingale change of measure and the overshoot U(x) once more (recall Remark 7). In the case of a random walk under Cramér’s condition, Denisov and Shneer (2013, theorem 3.5) provide an analogous estimate but with a spatial factor V=1, so all the above considerations still hold. We may now conclude (52):

|Ex0[f(Z(t))|t<τ(0)]Ex0[f(Z(t))|t<Tx]|=|Ex0[f(Z(t));t<τ(0)](1+o(1))Ex0[f(Z(t));t<Tx]|Px0(t<τ(0))|Ex0[f(Z(t));τ(x)t<τ(0)]|Px0(t<τ(0))+o(1)|Ex0[f(Z(t));t<Tx]|Px0(t<τ(0))fo(1),(55)
and the proof is complete. □

Proof of Lemma 5.

We recall that for the process Z absorbed upon exiting (0,x), the absorption time τ(x)τ(0) is exponentially distributed with parameter β(x) if started with the quasi-stationary distribution νx (see, e.g., Collet et al. 2013, theorem 2.2). Observe that the τνx(x) can be decomposed into a sum of a geometric number of duration of cycles, namely,

τνx(x)=j=1Nxτj,(56)
where Nx is the index of the first cycle at which Z exits through [x,+), and τj is the duration of the j-th cycle. Thus, Nx is geometric with parameter q(x)=Pνx(τ(x)<τ(0)) and the times τj are independent and distributed as τ(x)τ(0). Then τνx(x) is a geometric sum of i.i.d. exponential random variables. Moreover, because νx is a quasi-stationary distribution, the time and position at which Z exits (0,x) are independent (Collet et al. 2013, theorem 2.6), and hence so is Nx of τ1,,τNx. Being a geometric sum of exponential random variables which are mutually independent and independent of the number of terms, we conclude that τνx(x) has an exponential distribution of parameter β(x)q(x). □

Proof of Lemma 6.

Let Tx=τ(x)τ(0)=inf{t>0:Z(t)(0,x)} be the first exit time from (0,x); because the quasi-stationary measure is the unique Yaglom limit for the system under consideration, the rate of absorption β(x) is obtained as

β(x)=limt+1tlog Py(Tx>t)(57)
for any y(0,x). Because x>x>y>0 implies TxTxPyalmost surely, it follows that β(x)β(x). The sequence of absorption rates is nonincreasing with x, and hence has a limit β. The same reasoning shows that if β denotes the absorption rate of the QSD on (0,+), then β(x)β for every x1, which implies that ββ>0. □

7. Conclusion

This paper investigates various exploration strategies under time constraints in environments with unknown stochastic dynamics focusing on their impact on performance as measured by the time required to reach a set of rare states.

We aim for this work to be a foundational step toward developing a more qualitative theory of exploration, specifically by incorporating the time needed to observe meaningful signals for the first time. For example, in Reinforcement Learning environments with sparse rewards, exploration of the state space or action-state space is widely recognized as a critical bottleneck to efficiency. By addressing this crucial aspect, we seek to contribute to more efficient and effective exploration strategies in such contexts and beyond.

In this work, we focused on space-invariant one-dimensional dynamics, which provide highly interpretable results and for which we can provide explicit performance guarantees. Building on this understanding, future work will consider more general Markovian dynamics, providing a more comprehensive and realistic framework for exploring challenging environments. A natural extension of this work is to higher-dimensional settings. Although explicit hitting-time estimates of the kind used here are generally unavailable in that regime, the “too-many-particles” phenomenon and the core conclusions of our main results are expected to carry over to considerably more general settings.

Acknowledgments

The authors are grateful to the anonymous referees for their thorough and insightful reviews; their suggestions and observations led to substantial improvements in this work. P.B. and E.G. also express their deep gratitude to professors E. Mordecki and J. R. León for very valuable discussions.

Appendix A. M/M/1

In this section we investigate the interplay between the probability of exploration, under the time regime of interest, and the efficiency of the associated estimation procedure. In particular, we show that the variance of the natural Monte Carlo estimator mirrors the typical exploration time.

Consider an M/M/1 queue on the truncated state space {0,1,,n} with load parameter ρ<1, and assume that X(0)=0 almost surely. The chain is ergodic, and its invariant measure is geometrically decaying: for any kN with kn, the stationary weight is proportional to ρk. By the Ergodic Theorem, the stationary measure π satisfies

π(k)=limt1t0t1(X(s)=k)ds.

A Monte Carlo estimator for π(k), based on a time horizon B(k), is therefore

π^(k)=1B(k)0B(k)1(X(s)=k)ds.

We focus on restrictive time budgets and assume that the estimation of π(k) for large k is performed with a linear-in-k horizon,

B(k)k,k.

In such a regime, the process is expected to spend exceedingly little time in state k: indeed, the expected hitting time of k grows exponentially fast in k. As we show next, this scarcity of observations is the principal obstruction to reliable estimation, and it directly governs the variance of the Monte Carlo estimator.

Fix c>0 and introduce the surrogate estimator

ξ(k)c1(τ(k)<B(k)),
where τ(k) denotes the hitting time of state k. Then, by noting that π^(k)+ξ(k) is almost surely bounded by 1+c, and that |π^(k)ξ(k)|1(τ(k)>B(k))=0, one obtains
|Varπ^(k)Varξ(k)|=|E0[(π^(k)ξ(k))(π^(k)+ξ(k))]E0(π^(k)ξ(k))E0(π^(k)+ξ(k))|2(1+c)·E0|π^(k)ξ(k)|const·P0(τ(k)B(k)),
for some constant independent of k, thus showing that the variance is essentially governed by the rare-event probability of reaching k within the available time budget.

Appendix B. The Fleming-Viot (FV) Particle System and FVRL

The approach found in Mastropietro et al. (2025) for defining a restarted mechanism is based on the paradigm of the FV particle system, introduced by Burdzy et al. (1996). In this framework, a population of particles (which can represent different simulations in parallel) evolves over time, exploring the state space in parallel. When a particle falls into a less promising region A, it is “restarted” by being replaced with a copy of another, more promising particle. This ensures that resources are concentrated on exploring the most fruitful areas of the state space. The FV strategy is particularly effective in scenarios where certain regions of the state space are more likely to yield valuable rewards, as it dynamically reallocates exploration effort toward these regions, thereby increasing the overall efficiency of the exploration process.

Note that various papers (Ferrari and Marić 2007, Asselah et al. 2011, Villemonais 2011) have shown that FV empirical measures converge as the number of particles tends to infinity to the conditioned evolution of the process, that is,

mtN(A)P(XtA|Tt),N,
where T is the hitting time of A. On the other hand, P(XtA|Tt)ν(A), as t where ν is the QSD (a QSD, for countable state space) associated with the process absorbed in A. This is the motivation for restarting according to the QSD in our work.

Appendix C. An Example in RL: The Control of Blocking in an M/M/1/K Queue

This example has been considered in Mastropietro et al. (2025) and generalized to multidimensional settings (several coupled queues). It serves here as a canonical illustration of Reinforcement Learning in environments with sparse and rare rewards where the exploration is the main bottleneck. We summarize the model and the FVRL strategy introduced in Mastropietro et al. (2025) as a motivation for our theoretical results. Consider an M/M/1/K queue with fixed buffer capacity K, arrival rate λ, service rate μ, and load ρ=λ/μ<1. The state space is S={0,1,,K}, representing queue occupancy. The control action a{0,1} determines whether to accept (a=1) or block (a=0) arriving jobs. The reward function is structured to penalize both the underutilisation of resource and large blocking probabilities

r(x,a)={0if a=1B(1+bxxref)if a=0
with B>0, b>1, and reference state xref. This structure induces threshold-optimal policies (blocking all the incoming traffic after a given threshold). For threshold policies parameterized by θ, we define the policy-dependent blocking threshold Kθ=θ+1, representing the state at which the policy πθ begins rejecting incoming jobs. To ensure that the blocking threshold never exceeds the physical buffer, the policy parameter is restricted to the compact domain Θ=[0,K1], so that KθK is guaranteed at all times. Under the average reward criterion and using a policy gradient strategy, one aims at minimizing
Jπθ=xSpπθ(x)aπθ(a|x)Qθ(x,a),
where θΘ is the parameterization of the policy. The policy gradient theorem yields
θJπθ=xSpπθ(x)aQθ(x,a)θπθ(a|x).(C.1)

For threshold policies parameterized by θΘ, this simplifies to

θJπθpπθ(Kθ1)[Qθ(Kθ1,1)Qθ(Kθ1,0)],(C.2)
where Kθ=θ+1 is the learned blocking threshold, and the gradient is concentrated at state Kθ1=θ, the last state before the policy blocks under the current parameterization θΘ.

C.1. The Gradient Estimation Problem

Given this framework, the RL strategy will be efficient if the gradient estimates can be informative. We discuss here only the estimate of the stationary probability pπθ(Kθ1). The stationary distribution for state Kθ, under the threshold policy πθ, is given by the M/M/1/Kθ formula:

pπθ(Kθ)=(1ρ)ρKθ1ρKθ+1.

For typical parameters (ρ=0.7, Kθ=40), pπθ(Kθ)O(107). Because pπθ(Kθ1)pπθ(Kθ)/ρ, both are exponentially small.

Consequently,

  • Vanilla Monte Carlo (MC) estimators of pπθ(Kθ1) exhibit prohibitive variance (in terms of multiplicative errors);

  • Much more importantly, the policy gradient in (C.2) becomes numerically zero under finite sampling. Hence, the learning signal vanishes, preventing convergence to an optimal θ.

C.2. Parallel Sampling

In order to compare our results with those of Mastropietro et al. (2025), we present a set of simulations for parallel M/M/1 queues under different time regimes. The hyperparameters used in Figure C.1 are total queue capacity K=40, initial condition J=12, arrival rate λ=0.7, and service rate μ=1. In this numerical illustration, Kθ=K=40, so that pπθ(K) is the exponentially rare quantity identified in Subsection C.1.

Figure C.1. Simulation of Parallel M/M/1 Queues
Notes. Simulation of parallel M/M/1 queues with K=40, J=12, λ=0.7, and μ=1, comparing renewal-based stationary probability estimates across time horizons 105, 106, and 107. The number of parallel copies to deploy in each time regime is computed using the results of our main theorems.

Following Mastropietro et al. (2025), we estimate the stationary probability via a renewal-type decomposition based on successive returns to J. More precisely, we estimate (i) the expected return time to state J, which in our setting is of order 102, and independently (ii) the probability of hitting K starting from J. For the latter we report estimators under time horizons B(K) of magnitudes 105, 106, and 107, and we compare them with the true stationary value pπθ(K) shown in Figure C.1.

To ensure a fair comparison with the experiment of Mastropietro et al. (2025, section 4.1), we remark that the total number of events observed in our simulations—counting both arrivals and departures—is one to two orders of magnitude larger than theirs. Indeed, across our time regimes we record approximately 1.7×10r events for r=5,6,7, which exceeds the event counts reported in Mastropietro et al. (2025, figure 2). This discrepancy is natural: correlations in the Fleming-Viot particle system substantially facilitate the exploration of the rare state K.

In summary, and as expected, the estimation accuracy with independent particles lies between that of naive Monte Carlo (which rarely reaches K even for time of order 106) and that of the Fleming-Viot particle system.

C.3. Fleming-Viot Solution

The FVRL method addresses this by introducing an absorption set, where prior knowledge is used on the fact that no rewards are granted in A={0,1,,J1} with J<K. The Fleming-Viot particle system consists of N particles (ξtν(i))i=1N evolving in Ac, with a resetting mechanism: particles hitting A jump to positions of randomly selected surviving particles. This particle system approximately estimates the quasi-stationary distribution

νQπ(x)=limtPAc(Xtπ=x|TK>t),
where TK is the hitting time of A. The FVRL algorithm leverages this to construct gradient estimators:
θvπθ^=xÅcp^πθ(x)aQ^θ(x,a)θπθ(a|x),(C.3)
where p^πθ is estimated via the FV particle system.

For K=40, ρ=0.7:

  • Vanilla MC estimates pπ(K)0 even with 106 samples;

  • FV provides accurate estimates with N103 particles;

  • FVRL converges to K* whereas MC-based policy gradient fails completely.

The method effectively trades the rare-event probability pπ(K) for the substantially larger quasi-stationary probability νQπ(K), overcoming the exponential sample complexity of vanilla approaches. We do not compare here our results with the estimates constructed in Mastropietro et al. (2025) because those are more complex than the idealized situation described in Subsection 3.4, which can, however, serve as a rule of thumb for practitioners.

Endnotes

1 The processes we consider fall in their class A category with parameter α=2.

2 The notation was introduced in Remark 4.

3 See Borovkov (2013) for the analogous result for discrete-time random walks.

References

  • Asmussen S, Glynn PW (2007) Stochastic Simulation: Algorithms and Analysis, Stochastic Modelling and Applied Probability, vol. 57 (Springer, New York).Google Scholar
  • Asselah A, Ferrari PA, Groisman P (2011) Quasistationary distributions and Fleming-Viot processes in finite spaces. J. Appl. Probab. 48(2):322–332.Google Scholar
  • Avrachenkov K, Piunovskiy A, Zhang Y (2018) Hitting times in Markov chains with restart and their application to network centrality. Methodology Comput. Appl. Probab. 20(4):1173–1188.Google Scholar
  • Bertoin J (1998) Lévy Processes, Cambridge Tracts in Mathematics, vol. 121 (Cambridge University Press, Cambridge, UK).Google Scholar
  • Bertoin J, Doney RA (1994) Cramér’s estimate for Lévy processes. Statist. Probab. Lett. 21(5):363–365.Google Scholar
  • Borovkov A (2013) Probability Theory (Springer, London).Google Scholar
  • Budhiraja A, Fraiman N, Waterbury A (2022) Approximating quasi-stationary distributions with interacting reinforced random walks. ESAIM Probab. Statist. 26:69–125.Google Scholar
  • Burdzy K, Hołyst R, Ingerman D, March P (1996) Configurational transition in a Fleming-Viot-type model and probabilistic interpretation of Laplacian eigenfunctions. J. Phys. A Math. General 29(11):2633–2642.Google Scholar
  • Collet P, Martínez S, San Martín J (2013) Quasi-stationary Distributions., Markov Chains, Diffusions and Dynamical Systems (Springer, Berlin).Google Scholar
  • Denisov D, Shneer V (2013) Asymptotics for the first passage times of Lévy processes and random walks. J. Appl. Probab. 50(1):64–84.Google Scholar
  • Doney RA (1985) Conditional limit theorems for asymptotically stable random walks. Zeitschrift Wahrscheinlichkeitstheor. Verwandte Gebiete 70:351–360.Google Scholar
  • Doney RA, Maller RA (2004) Moments of passage times for Lévy processes. Ann. Inst. Henri Poincaré Probab. Statist. 40(3):279–297.Google Scholar
  • Ecoffet A, Huizinga J, Lehman J, Stanley KO, Clune J (2021) First return, then explore. Nature 590:580–586.Google Scholar
  • Evans MR, Majumdar SN, Schehr G (2020) Stochastic resetting and applications. J. Phys. A Math. Theoret. 53(19):193001.Google Scholar
  • Ferrari PA, Marić N (2007) Quasi stationary distributions and Fleming-Viot processes in countable spaces. Electronic J. Probab. 12:684–702.Google Scholar
  • Grigorescu I, Kang M (2013) Markov processes with redistribution. Markov Processes Related Fields 19(3):497–520.Google Scholar
  • Hansen NR (2009) The maximum of a Lévy process reflected at a general barrier. Stochastic Processes Appl. 119(7):2336–2356.Google Scholar
  • Höglund T (1990) An asymptotic expression for the probability of ruin within finite time. Ann. Probab. 18(1):378–389.Google Scholar
  • Iglehart DL (1974) Random walks with negative drift conditioned to stay positive. J. Appl. Probab. 11(4):742–751.Google Scholar
  • Kolb M, Savov M (2014) Exponential ergodicity of killed Lévy processes in a finite interval. Electronic Comm. Probab. 19:1–9.Google Scholar
  • Kyprianou AE (2006) Introductory Lectures on Fluctuations of Lévy Processes with Applications (Springer, Berlin).Google Scholar
  • Kyprianou AE (2014) Fluctuations of Lévy Processes with Applications, Introductory Lectures, 2nd ed. (Springer, Berlin).Google Scholar
  • Kyprianou AE, Palmowski Z (2006) Quasi-stationary distributions for Lévy processes. Bernoulli 12(4):571–581.Google Scholar
  • Lambert A (2000) Completely asymmetric Lévy processes confined in a finite interval. Ann. Inst. Henri Poincaré Probab. Statist. 36(2):251–274.Google Scholar
  • Luby M, Sinclair A, Zuckerman D (1993) Optimal speedup of Las Vegas algorithms. Inform. Processing Lett. 47(4):173–180.Google Scholar
  • Martinez S, San Martin J (1994) Quasi-stationary distributions for a Brownian motion with drift and associated limit laws. J. Appl. Probab. 31(4):911–920. Google Scholar
  • Mastropietro D, Ayesta U, Jonckheere M, Majewski S (2025) Fast-exploring reinforcement learning with applications to stochastic networks. Queueing Systems 109(3):23.Google Scholar
  • Méléard S, Villemonais D (2012) Quasi-stationary distributions and population processes. Probab. Surveys 9:340–410.Google Scholar
  • Monthus C (2021) Large deviations for Markov processes with stochastic resetting: Analysis via the empirical density and flows or via excursions between resets. J. Statist. Mech. Theory Experiment 2021(3):033201.Google Scholar
  • Palmowski Z, Pistorius M (2009) Cramér asymptotics for finite time first passage probabilities of general Lévy processes. Statist. Probab. Lett. 79(16):1752–1758.Google Scholar
  • Villemonais D (2011) Interacting particle systems and Yaglom limit approximation of diffusions with unbounded drift. Electronic J. Probab. 16:1663–1692.Google Scholar