Convergence properties of weighted particle islands with application to the double bootstrap algorithm

Particle island models (Verg\'e et al., 2013) provide a means of parallelization of sequential Monte Carlo methods, and in this paper we present novel convergence results for algorithms of this sort. In particular we establish a central limit theorem - as the number of islands and the common size of the islands tend jointly to infinity - of the double bootstrap algorithm with possibly adaptive selection on the island level. For this purpose we introduce a notion of archipelagos of weighted islands and find conditions under which a set of convergence properties are preserved by different operations on such archipelagos. This theory allows arbitrary compositions of these operations to be straightforwardly analyzed, providing a very flexible framework covering the double bootstrap algorithm as a special case. Finally, we establish the long-term numerical stability of the double bootstrap algorithm by bounding its asymptotic variance under weak and easily checked assumptions satisfied for a wide range of models with possibly non-compact state space.

Particle island models [26] provide a means of parallelization of sequential Monte Carlo methods, and in this paper we present novel convergence results for algorithms of this sort. In particular we establish a central limit theorem-as the number of islands and the common size of the islands tend jointly to infinity-of the double bootstrap algorithm with possibly adaptive selection on the island level. For this purpose we introduce a notion of archipelagos of weighted islands and find conditions under which a set of convergence properties are preserved by different operations on such archipelagos. This theory allows arbitrary compositions of these operations to be straightforwardly analyzed, providing a very flexible framework covering the double bootstrap algorithm as a special case. Finally, we establish the long-term numerical stability of the double bootstrap algorithm by bounding its asymptotic variance under weak and easily checked assumptions satisfied for a wide range of models with possibly noncompact state space.
1. Introduction. This paper discusses approaches to parallelization of sequential Monte Carlo (SMC) methods (or particle filters) approximating normalized Feynman-Kac distribution flows. At present, SMC methods are used successfully for online sampling from sequences of complex distributions in a wide range of applications, including nonlinear filtering, signal processing, data assimilation [see, e.g., 17,4,24,1,7, and the references therein], and rare event analysis [9,3]. These algorithms evolve, recursively and randomly in time, a sample of random draws, particles, with associated importance weights, and the Feynman-Kac distribution flow is approximated by the weighted empirical measures associated with this sample. The particle cloud is updated through selection and mutation operations, where the former duplicates or eliminates, through resampling, particles with large or small importance weights, respectively, while the latter disseminates randomly the particles over the state space and updates accordingly the importance weights for further selection.
SMC methods are computationally intensive, which may be critical in online applications where only a limited computational power is at hand. In particular, since the particle interaction enforced by the selection operation is of "global" nature (as it draws, with replacement, each particle from the entire particle population rather than from a subset of the same), running SMC methods in parallel on multicore processors is not straightforward.
The natural ideal of [26], which is the basis also for the present paper, is to parallelize the algorithm by, instead of considering a single batch of N particles, simply dividing the particle population into N 1 batches of each N 2 particles (i.e., N = N 1 N 2 ), where each batch is referred to as a particle island (or simply an island). In this framework, each island evolves according to the standard SMC scheme subjecting alternatingly the subpopulation to selection and mutation. Unfortunately, the division of the particle population introduces additional bias which may be of note for moderate island sizes N 2 . Thus, in [26] it is proposed to reduce this bias by performing additional selection also on the island level by resampling multinomially, when needed, the islands according to probabilities proportional to the weight averages over the different subpopulations. Selection on the island level may be performed systematically, as in the double bootstrap (B 2 ) algorithm (in the present paper we have chosen to denote the algorithm "B 2 " rather than "2B", as we consider it more correctly described as a "square bootstrap" rather than a "double bootstrap"; nevertheless, the algorithm must not be confused with the SMC square (SMC 2 ) algorithm proposed in [6], which is, if still of a related form, of a different nature) or may be activated adaptively by some criterion measuring the skewness of the island weights. The latter approach will be referred to by us as the double bootstrap algorithm with adaptive selection on the island level (B 2 ASIL). At the end of the day, a sequence of Monte Carlo estimators is obtained by weighing up, using the island weights, the self-normalized empirical measures associated with the different particle islands. Needless to say, the theoretical analysis of B 2 -type algorithms is challenging due to the intricate dependence structure imposed by the island selection operation and the "double asymptotics" introduced by the two sample sizes N 1 and N 2 . The authors of [26], who base their theoretical analysis on a reformulation of the particle island model as an extended Feynman-Kac model on an augmented space of dimension N 2 , detour the latter difficulty by letting first the number N 1 of islands and then the number N 2 of individuals of each island tend to infinity. By separating the asymptotics in this manner, the analysis can, not surprisingly, at least in the case of the B 2 algorithm, be handled using classical techniques from SMC analysis, and in this way the authors establish convergence of bias and variance when these quantities are scaled with the size N of the system. However, working with this somewhat synthetic mode of convergence (with separated limits), the authors fail to supplement their consistency results with a central limit theorem (CLT). Moreover, they do not provide any convergence results for the B 2 ASIL algorithm.
Nevertheless, even though the islands are allowed to interact through selection, any two individuals of the system should become more and more statistically independent as the number of islands as well as the size of the islands grow (cf. the propagation of chaos property of standard SMC methods [8]). Thus, we may expect a law of large number as well as a CLT to hold when N 1 and N 2 tend jointly to infinity. Moreover, in analogy with similar result for standard, single batch SMC methods [see 10,5,19,13], we may expect the rate of such a CLT to be √ N. The aim of the present paper is to improve the existing theoretical analysis of particle island models by establishing results of the previous type. For this purpose we will introduce a notion of archipelagos of weighted islands that generalizes the particle models studied in [26] and consider three kinds of convergence properties of such archipelagos, namely consistency (convergence in probability), asymptotic normality (convergence in distribution in terms of a CLT with rate √ N ), and large deviation (an exponential inequality of Hoeffing-type that holds uniformly over all islands). After this, we perform single-step analyses of three kinds of operations on archipelagos, namely selection on the island level, selection on the individual level, and mutation, and show how these operations preserve the convergence properties under consideration. As a consequence, we are able to establish that the convergence properties in question are preserved through an arbitrary composition of the mentioned operations, including the B 2 algorithm as a special case, and to provide explicit expressions of the associated asymptotic variance. Moreover, the flexibility of our results, which generalize those obtained in [13] for standard, single batch SMC methods, makes these well-suited for analyzing particle island algorithms with adaptive resampling strategies such as the B 2 ASIL scheme, for which we provide a detailed analysis (including a CLT). In our proofs, which rely on limit theorems for triangular arrays obtained in [13], the working process is highly inductive. Since the intricate dependence structures of the particle model force us to define triangular arrays on the island level, we will often, when establishing the preservation of a certain convergence property of a certain operation, face a situation where the only way of obtaining some critical limit or bound is to add the same to the list of induction hypotheses. After this, one establishes that the operation in question preserves also this additional property (limit or bound), by possibly adding, if needed, further assumptions to the list, and so on. At the end of the day, we have obtained a more or less minimal set (a hexad in the case of asymptotic normality) of properties that need to be checked at each induction step. In this machinery, the large deviation property is a critical component, since it provides, as a consequence of the distribution-free character of Hoeffding-type inequalities, uniform control of the deviation of the empirical measures associated with the different islands from their common mean.
As a last contribution, we establish the numerical stability of the B 2 algorithm by bounding uniformly the asymptotic variance of its output. We carry through this analysis under a strong mixing condition as well a local Doeblin condition (see Section 5.10 for details), where the latter is considerably weaker than the former and easily verified for a large variety of models with possibly non-compact state space. When operating under the local Doeblin condition, we let the Feynman-Kac model be indexed by a strictly stationary sequence of random parameters (corresponding, e.g., to random observations in the case of optimal filtering in hidden Markov models) and show, using novel results in [15], that the sequence of asymptotic variances is stochastically bounded (tight) in this setting. On the other hand, imposing the strong mixing assumption, which is classical in the literature of SMC analysis [11,8], allows an explicit, deterministic uniform variance bound to be obtained using standard methods.
To sum up, the contribution of the present paper is threefold, since we • introduce a general theory of archipelagos of weighted particle islands and analyze thoroughly the convergence properties, as the number N 1 of islands and the common size N 2 of the islands tend jointly to infinity, of such objects when subjected to certain operations. For this purpose, we develop a machinery that allows triangular arrays defined on the island level to be analyzed and which may be used for handling double asymptotics appearing in other kinds of island-type particle algorithms. • apply the previous theoretical results to the B 2 and B 2 ASIL algorithms, yielding laws and large numbers and CLTs for these schemes. • establish the long-term stability of the B 2 algorithm under weak and easily checked assumptions.
The paper is organized as follows. In Section 2 we introduce, after some prefatory notation, the concept of archipelagos of weighted islands, and define the three different convergence properties of such archipelagos. Our main results are, along with the three different operations under consideration, presented in Section 3, and Section 4 discusses the application of these results to the B 2 ASIL algorithm. In particular, in Corollary 3 we establish the asymptotic normality of this algorithm, which implies the asymptotic normality of the B 2 algorithm as a special case (see Corollary 4), and provide a formula for the asymptotic variance; moreover, in Section 4.3 establish the long-term stability of the algorithm by showing that the asymptotic variance of the B 2 algorithm may, under suitable assumptions, be bounded uniformly. The most significative proofs are gathered in Section 5, and in order to avoid repetition we have put some additional proofs using similar techniques in the supplementary paper [25]. Finally, Appendix A provides some technical results that are used frequently in Section 5.

Preliminaries.
2.1. Some notation. For (m, n) ∈ Z 2 such that m ≤ n we denote m, n := {m, m + 1, . . . , n} ⊂ Z. Moreover, we denote by and R + and R * + the sets of nonnegative and positive real numbers, respectively, and by N * the set of positive integers. For any quantities {a ℓ } n ℓ=m we will use the vector notation a m:n = (a m , . . . , a n ) with the convention a m:n = ∅ if m > n.
In the sequel we assume that all random variables are defined on a common probability space (Ω, F, P). For some given measurable space (X, X ) we denote by M(X ) and M 1 (X ) ⊂ M(X ) the set of measures and probability measures on (X, X ), respectively. In addition, we denote by F(X ) the set of real-valued measurable functions on (X, X ) and by F b (X ) ⊂ F(X ) the set of bounded such functions. For f ∈ F b (X ) we denote the sup norm h ∞ := sup x∈X |h(x)| and the oscillator norm osc(h) := sup (x,x ′ )∈X 2 |h(x)−h(x ′ )|. For any ν ∈ M(X ) and f ∈ F(X ) we denote by νf := f (x) ν(dx) the Lebesgue integral of f under ν whenever this is well-defined. Now, given also some other (Y, Y) measurable space, an unnormalized transition kernel K from (X, X ) to (Y, Y) is a mapping from X × Y to R + such that for all A ∈ Y, x → K(x, A) is a nonnegative measurable function on X and for all x ∈ X, A → K(x, A) is a measure on (Y, Y). If K(x, Y) = 1 for all x ∈ X, then K is called a transition kernel (or simply a kernel). The kernel K induces two integral operators, one acting on functions and the other on measures. More specifically, let f ∈ F(X ) and ν ∈ M(X ) and define the measurable function whenever these quantities are well-defined. Finally, let K be as above and let L be another unnormalized transition kernels from (Y, Y) to some third measurable space (Z, Z); then we define the product of K and L as the unnormalized transition kernel from (X, X ) to (Z, Z) whenever this is well-defined.
2.2. Weighted particle islands and archipelagos. Let {N 1 (N )} N ∈N * and {N 2 (N )} N ∈N * be sequences of positive integers such that N 1 (N )N 2 (N ) = N for all N ∈ N * and N 1 (N ) → ∞ and N 2 (N ) → ∞ as N → ∞. For lucidity we will often omit the index N from the notation and write simply N 1 and N 2 . In the following, let {(ξ N (i, j), ω N (i, j)); (i, j) ∈ 1, N 1 × 1, N 2 } be an array of X-valued random variables (the ξ N ) with associated nonnegative (possibly unnormalized) weights (the ω N ). Each subset {(ξ N (i, j), ω N (i, j))} N 2 j=1 , i ∈ 1, N 1 , of the array will be referred to as an island. With this terminology, a random variable ξ N (i, j) in the array will be referred to as an individual or a particle. Finally, we associate each island {(ξ N (i, j), ω N (i, j))} N 2 j=1 with a nonnegative (possibly unnormalized) weight Ω N (i). In the following, the set {(Ω N (i), {(ξ N (i, j), ω N (i, j))} N 2 j=1 )} N 1 i=1 of islands with associated weights will be referred to as an archipelago on (X, X ). We will always require the island weights to be positive and the particle weights to be positive and uniformly bounded, i.e., there exists some constant |ω| ∞ such that 0 < |ω N (i, j)| ≤ |ω| ∞ for all (i, j) ∈ 1, N 1 (N ) × 1, N 2 (N ) and N ∈ N * . 2.3. Convergence properties of archipelagos. In the following, any limit (−→), limit in probability ( P −→), and limit in distribution ( D −→) is supposed to hold as N → ∞ if not specified differently.
Note that the estimator in (C1) assigns the weight associated with island i ∈ 1, N 1 , and the smallness condition (C2) formalizes the fact that this weight, and thus the contribution of each island to the estimator associated with the archipelago as a whole, should vanish asymptotically as N → ∞.
Definition 2.2 (exponential deviation). In the following, let η ∈ M 1 (X ) and ̺ and The exponential deviation inequality in (D) provides uniform control on the deviations of the unnormalized importance sampling estimators N 2 j=1 ω N (i, j)h(ξ N (i, j))/N 2 , i ∈ 1, N 1 , associated with the different islands from their common mean level ̺ × ηh. The factor N 1 on the right hand side of the equality is required to compensate for the maximum with respect to the island index. Assumption (D) implies, by a straightforward extension of the generalized Hoeffding inequality derived in [12,Lemma 4], that also the deviations of the properly normalized importance sampling estimators associated with the different islands from the expectations targeted by the archipelago can be uniformly controlled as follows.
Finally, we introduce a third convergence property describing weak convergence in the sense of a CLT. Let N denote the Gaussian distribution.
Definition 2.4 (asymptotic normality). In the following, let and, in addition, Here (AN1) corresponds to a CLT and implies straightforwardly (C1) . In addition, since (AN6) implies immediately (C2) we may conclude that asymptotic normality is stronger than consistency.

Main results.
3.1. Operations on weighted archipelagos. We will next consider three different operations on archipelagos and show how these operations preserve the three convergence properties listed in the previous section. In the following we let P({a(i)} M i=1 ) denote the discrete probability distribution induced by a set {a(i)} M i=1 of positive (possibly unnormalized) numbers; thus, writing V ∼ P({a(i)} M i=1 ) means that the random variable V takes the value i ∈ 1, M with probability a(i)/ M i ′ =1 a(i ′ ).
3.1.1. Selection on the island level. The first operation, described in Algorithm 1 below, is referred to as multinomial selection on the island level (SIL). The operation in question consists in converting an archipelago with uniform island weights targeting the same distribution η. This step allows islands with small/large weights to be eliminated/duplicated, respectively. More precisely, a new family of islands is generated from the existing ones by selecting, conditionally independently given the input archipelago, new islands according to probabilities proportional to the island weights After this, the weights and the particles of the selected islands are copied deterministically (which of course implies that the particle weights of the new archipelago are bounded by the same constant |ω| ∞ as the ancestor archipelago).
Algorithm 1: Multinomial selection on the island level (SIL) In the following we will abbreviate Algorithm 1 by writing The following theorems state conditions under which SIL preserves consistency, exponential deviation, and asymptotic normality.
denote consequently the input and output, respectively, of Algorithm 1 and all proofs are found in Section 5.
We impose the following assumption, guaranteeing that N 1 grows only subexponentially fast with respect to N 2 .

3.1.2.
Selection on the individual level. A second operation, described in Algorithm 2 below, is referred to as multinomial selection on the individual level (SiL), and consists in converting a weighted archipelago with uniform particle weights targeting the same distribution η. This step allows particles with large/small weights to be duplicated/eliminated, respectively. Note that the island weights remain unaffected.
Algorithm 2: Multinomial resampling on the individual level (SiL) Trivially, the particle weights are bounded by |ω| ∞ = 1 in this case. As for the SIL operation, we will express Algorithm 2 in a compact form by writing As established by the following theorems, where denote the input and output of Algorithm 2, respectively, also SiL preserves consistency, exponential deviation inequality, and asymptotic normality.
Again, proofs are found in Section 5.

3.1.3.
Mutation. The last operation considered by us is Mutation, which is described in Algorithm 3 below. This operation converts, using importance sampling on the individual level, an archipelago targeting some other probability distributionη, defined on another state space (X,X ). The distributionη is related to η through the identity where Q : X ×X → R + is a possibly unnormalized transition kernel. In the algorithm that follows, let R : X ×X → R + be a (normalized) transition kernel such that Q(x, ·) ≪ R(x, ·) for all x ∈ X, and denote the corresponding Radon-Nikodym derivatives by In the sequel, we will refer to the mapping w as the importance weight function and assume that w ∈ F b (X X ) and Q½X ∈ F b (X ).
Algorithm 3: Mutation As before, we will abbreviate Algorithm 3 by writing where the kernel Q is included in the notation for the sake of completeness. Note that the Mutation operation forms indeed a proper weighted archipelago with |ω| ∞ = |ω| ∞ w ∞ .
In conformity with the SIL and SiL operations, the Mutation operation preserves consistency, exponential deviation, and asymptotic normality. This is established below, where denote consequently the input and output of Algorithm 3, respectively.
Remark 10. Note that Theorem 3.6 and Theorem 3.9 hold true regardless of the intermutual rates by which N 1 and N 2 tend to infinity with N . In particular, these results do not, on the contrary to Theorem 3.3, require the condition (S) . This is in line with what we expect, as the SiL and Mutation operations do not involve any island interaction.

Applications.
4.1. Feynman-Kac models. For a sequence of unnormalized transition kernels {Q n } n∈N defined on some common measurable space (X, X ) and some probability distribution η 0 ∈ M 1 (X ), a sequence {η n } n∈N of Feynman-Kac measures is defined by (with usual convention n p=m a p = 1 when m > n). We may express recursively the sequences of unnormalized and normalized Feynman-Kac measures as, for h ∈ F b (X ) and (m, n) ∈ N with m ≤ n, In particular, which means that we may cast the model into the framework considered in Section 3.1.3.
Example 1. A special instance of the previous framework is formed naturally by specifying, first, a sequence {M n } n∈N of normalized (Markov) transition kernels on (X, X ) with an associated initial distribution χ and, second, potential functions {g n } n∈N * , where g n : X → R * + for all n ∈ N * , and letting Q n h(x) := M n (g n+1 h)(x), n ∈ N * , x ∈ X, and h ∈ F b (X ). In addition, η 0 := χ. This setup covers a large variety of important models in probability and statistics, such as optimal filtering in hidden Markov models (or state-space models; see, e.g., [2]) and models for the analysis of rare events [9,3]. We will return to this setting in Section 4.3.
Using a Feynman-Kac model in practice is typically non-trivial as neither the distribution flow {γ n } n∈N nor {η n } n∈N can be computed in a closed-form in general (with the exception of the very specific cases of optimal filtering in linear state-space models, in which case the solution is provided by the Kalman filter, or hidden Markov models with finite state space).

4.2.
The double bootstrap algorithm with adaptive selection. In this section, our aim is to form online a sequence of archipelagos targeting the Feynman-Kac flow {η n } n∈N by using sequentially the operations described in Section 3. A special feature of the approach that we consider is that the SIL operation is not performed systematically at every iteration of the algorithm, but only when the island weights fail to satisfy some appropriately defined skewness criterion. In this way we avoid adding unnecessary variance to the estimator. More specifically, we will analyze an algorithm proposed in [26,Algorithm 3], where SIL is executed on the basis of the so-called coefficient of variation (CV; see [18] and [21]) given by CV 2 The CV is closely related to the efficient sample size (ESS, proposed in [20]), which is the criterion used in [26]; nevertheless, since the ESS can be expressed as , the two criteria are equivalent. Note that the CV is minimal (zero) when all island weights are perfectly equal and maximal (N 1 − 1) in the situation of maximal skewness, i.e., when the total mass of the system is carried by a single island (a situation which is however not possible in our framework, as we always assume the island weights to be strictly positive). More specifically, as long as the CV stays below a specified threshold τ > 0, we let the N 1 islands evolve without interaction according to mutation and selection on the individual level. However, when the island weights get too dispersed as measured by the CV criterion, the islands are rejuvenated by SIL. The scheme, referred to by us as the double bootstrap with adaptive selection on the island level (B 2 ASIL), is described in Algorithm 4, where we have added the iteration index p to the weighted archipelagos returned by the algorithm.
Algorithm 4: The B 2 ASIL algorithm Using the theoretical results obtained in Section 3 we may prove the following result, establishing that exponential deviation and asymptotic normality are preserved through one iteration of the B 2 ASIL algorithm. As a by product we obtain the incremental asymptotic variance caused by an iteration. Since focus is set on asymptotic normality, we provide recursive formulas describing precisely the evolution of the functionals and measures involved in (AN1-5) , while leaving the derivation of the analogous formulas for the constants of the exponential deviation bound (D) to the reader. The proof is this result provides a nice illustration of the efficiency by which the theoretical results obtained in Section 3, despite appearing somewhat involved at a first sight, can be applied for analyzing sequences of arhipelagos produced by executing alternatingly the SIL, SiL, and Mutation operations in an arbitrary order.
satisfies exponential deviation and is asymptotically normal for (η n , σ 2 n , ν 2 n , {µ where η n+1 is given by (4.2) and for all h ∈ F b (X ), Proof. First, note that since the input archipelago satisfies (AN3), it holds that where ε n is defined in the statement of the theorem. Consequently, after the if -else statement in Algorithm 4, the resulting archipelago {(Ω (p) satisfies, by Theorem 3.2 and Theorem 3.3, exponential deviation and asymptotic normality, the latter for (η n , σ 2 n , ν 2 n , {µ 3 ) if ε n = 1.
Finally, considering also the final Mutation operation in Algorithm 4, and propagating, for the two different values of ε n , the quantities of the previous display through the updating formulas of Theorem 3.9, establishes, together with Theorem 3.8, the statement of the theorem.
, n ∈ N, produced by the B 2 ASIL algorithm satisfies exponential deviation and asymptotic normality, where for h ∈ F b (X ) and n ∈ N * , (under the standard conventions that n ℓ=m a ℓ = 1, n ℓ=m a ℓ = 0, and Q m · · · Q n = id if m > n), and {ε n } n∈N * is given in Theorem 4.2. In addition, µ Proof. The non-recursive expression above are verified using induction. More specifically, one assumes that the given expressions of (σ 2 n , ν 2 n , µ 1 ) hold true for some n ∈ N (and for all h ∈ F b (X )) and plug the same into the recursive expressions established in Theorem 4.2 under repeated use of the identities and η ℓ Q ℓ · · · Q n−1 ½ X × η n Q n ½ X = η ℓ Q ℓ · · · Q n ½ X (ℓ ∈ N).
We leave this as an exercise to the reader. To verify the base case n = 1, note that the initial archipelago } is, by the standard CLT and law of large numbers of for independent random variables, asymptotically normal for (η 0 , , and satisfies, by Hoeffding's inequality, exponential deviation for (η 0 , 1, 2, 1/2). Now, by Theorem 3.9 and Theorem 3.9 also the weighted archipelago {(Ω (1) , obtained by mutating the initial archipelago, satisfies exponential deviation and asymptotic normality for µ (1) 1 = η 1 and Under the standard conventions, this is however in agreement with the formula in the statement of the theorem. This completes the proof.
Of special interest is of course the special case where SIL is applied systematically at every iteration, corresponding to τ = 0. This yields the standard B 2 algorithm, in which case the asymptotic variance is given by the following corollary.
, n ∈ N, produced by the B 2 algorithm satisfies exponential deviation and asymptotic normality, where for h ∈ F b (X ) and n ∈ N * , Proof. The result is an immediate consequence of Corollary 3, as τ = 0 implies that ε n = 1 for all n ∈ N * .
On the other hand, letting ε n = 0 for all n ∈ N * in Corollary 3, corresponding to the case where SIL is never applied, yields the variance which we recognize as the well-known formula for the asymptotic variance of the standard SMC algorithm (more specifically, the sequential importance sampling with resampling, SISR, algorithm). This is completely in line with our expectations, as such an algorithm would simply propagate N 1 independent (non-interacting) islands, each island evolving as a standard SMC algorithm based on N 2 particles.

4.3.
Long-term stability of the double bootstrap algorithm. As a last part of our study we establish the long-term numerical stability of the B 2 algorithm by providing a time uniform bound on the asymptotic variance of its output. Throughout this section we will, in the spirit of Example 1, assume that each unnormalized transition kernel Q p , p ∈ N, can be decomposed into a normalized transition kernel M p : X × X → [0, 1] and a nonnegative potential potential function g p+1 : X → R + , i.e., for all h ∈ F b (X ) and x ∈ X, In this setting, given a sequence {R p } p∈N of proposal kernels such that M p (x, ·) ≪ R p (x, ·) for all x ∈ X and p ∈ N, the importance weight function is given by Remark 5. Instead of letting the Feynman-Kac distribution flow be generated by the unnormalized kernel (4.6), one could, as in [26], consider an alternative model with a flow {η p } p∈N generated by withQ 0 = M 0 andη 0 = χ. In [8] the two models (4.6) and (4.7) are referred to as updated and prediction Feynman-Kac models, respectively. For the prediction model, it is, in the case of the B 2 algorithm, possible to achieve full adaptation (borrowing the terminology of [23]) of the algorithm, i.e., to generate archipelagos with uniformly weighted islands and individuals targeting the distribution sequence of interest, by letting R p = M p for all p ∈ N and decomposing the dynamics (4.7) into the product , is the Boltzmann multiplicative operator associated with the potential g p . Now (4.8) allows also the Feynman-Kac transition according toQ p to be decomposed into two subsequent Feynman-Kac sub-transitions, the first according to G p and the other according to M p . The former corresponds to the Mutation operation (4.9) {(Ω (p) which simply assigns each particle and island the weightsω N (i, j))/N 2 , respectively (where we assumed that we start with uniformly weighted islands and individuals). After this weighing operation, the output (4.9) is, in accordance with Algorithm 4 (with τ = 0), subjected to the SIL and SiL operations followed by the Mutation operation yielding an archipelago with perfectly uniform island and individual weights approximating the Feynman-Kac distributionη p+1 at the next time point. Also this algorithm may be analyzed straightforwardly using our results, and carrying through this analysis retrieves exactly the variance expression obtained in [26,Equation 43]. We leave this as an exercise to the interested reader.
The previous way of obtaining an archipelago with uniformly weighted islands and individuals approximating the prediction Feynman-Kac distribution flow can be viewed as a special instance of a general auxiliary double bootstrap algorithm (extending the so-called auxiliary particle filter proposed in [23]) based on the decomposition , is a Boltzmann multiplicative operator associated with some positive auxiliary importance weight function t p ∈ F b (X ), anď In analogy with the previous, we may thus construct an alternative algorithm approximating {η p } p∈N by furnishing the main loop of the B 2 algorithm with a prefatory weighing operation

and, after intermediate SIL and SiL operations, a terminating Mutation operation
where, consequently, the all weights are given by the importance weight functioň Thus, choosing t p (x) as some prediction of the value of the derivative dQ p (x, ·)/dR p (x, ·) in the support of R p (x, ·) yields close to uniformly weighted islands and individuals (i.e., a close to fully adapted algorithm); for instance, following [23], a possible design is t p (x) = dQ p (x, ·)/dR p (x, ·)(R p id(x)). Of course, also this algorithm can be analyzed easily using our results (we refer to [14] for such an analysis of the standard auxiliary particle filter).

4.4.
Time uniform convergence under the strong mixing assumption. When studying the numerical stability of the B 2 algorithm we will first work under the following strong mixing condition.
The assumption (M) (i), implying that each M p allows the whole state space X as a 1-small set, is rather restrictive and requires typically the state space X to be a compact set. Still, it plays a vital role in the literature of SMC analysis [see, e.g., 11,8,2,14,22,16]. On the other hand, the weaker assumption (M) (ii) is satisfied for most applications and (M)(iii) does not require the potential functions to be uniformly bounded from below; the latter is a condition that appears frequently in the literature. Under (M), denote (4.11) ρ then the previous assumptions allow the following explicit time uniform bound to be derived.
Corollary 6. Suppose (M) . Then the sequence of asymptotic variances of the B 2 algorithm (see Corollary 4) satisfies, for all n ∈ N and h ∈ F b (X ), where ρ is defined in (4.11).
The proof is found in Section 5.10.

4.5.
Time uniform convergence under a local Doeblin condition. The explicitness and simplicity of the variance bound in Corollary 6 are obtained at the cost of restrictive model assumptions that are rarely satisfied in real-world applications. Thus, in this section we will discuss how the assumptions of (M) can be lightened considerably and turned into easily verifiable conditions, satisfied for many models of interest, by considering assumptions under which the asymptotic variance is stochastically bounded (tight) rather than bounded by a deterministic constant. Since the asymptotic variance (4.4) of the B 2 algorithm differs only from that of the SISR algorithm (see (4.5)) by the factors n − ℓ, the results obtained in this section will rely heavily on similar results obtained in [15] for the standard bootstrap particle filter. For this purpose, assume that each potential function depends on time through some random parameter only, i.e., for all p ∈ N * , g p = g Z p , where {Z p } p∈N is some stochastic process taking values in some state space (Z, Z) and g z ∈ F b (X ) for all z ∈ Z. Moreover, we assume that the normalized transition kernels of the model are time homogeneous, i.e., M p = M for all p ∈ N, and that Mutation is based on the underlying dynamics of the model, i.e., R p = R = M , and, consequently, w p (x, x ′ ) = w p Z p+1 (x, x ′ ) = g Z p+1 (x ′ ) for all p ∈ N and (x, x ′ ) ∈ X 2 . Thus, in this case the model generates a parameter dependent Feynman-Kac flow {η p Z 0:p } p∈N . (For instance, in the case of a hidden Markov model, the sequence {Z p } p∈N plays the role of noisy observations of some Markov chain (the state process) {X p } p∈N with transition kernel M on (X, X ). Conditionally on the state process, the observations are assumed to be independent and such that the conditional density Z ∋ z → g z (x) of each Z p depends on the corresponding state X p = x ∈ X only. In this important framework, η p Z 0:p is the so-called filter distribution at time p, i.e., the conditional distribution of the latent state X p given the observations Z 0:p . In this case, the asymptotic variance σ 2 n (h) of the B 2 algorithm is a function of the random vector Z 0:n , and we write σ 2 n Z 0:n (h) to emphasize this fact. We will replace the condition (M)(i) by a considerably weaker condition of the following type.
Definition 4.7. A set C ∈ X is local Doeblin with respect to M if there is ϕ C ∈ M 1 (X ) with ϕ C (C) = 1 and constants 0 < σ − C < σ + C such that for all x ∈ C and A ∈ X , Now, impose the following assumption.
(L) The process {Z p } p∈N is strictly stationary and ergodic. Moreover, there exists a set K ∈ Z such that the following holds.
(ii) For all ε > 0 there exists a local Doeblin set C such that for all z ∈ Z, (iii) There exists a set D ∈ X such that inf x∈D M (x, D) > 0 and The condition (L) can be checked easily for a large variety of models; see [15,Section 4] for examples.
Remark 8. The condition (L) can be weakened further by requiring the local Doeblin condition to hold only for some iterate Q z 1 · · · Q z r , z 1:r ∈ Z r , with a minorizing measure ϕ C and constants σ − C , σ + C possibly depending on the block z 1:r ; we refer to [15] for details. In this paper we have however chosen to state the most basic version of the condition (corresponding to r = 1) for simplicity.
Then the following holds true.
Corollary 9. Assume (L) and suppose in addition that η 0 ∈ M 1 (X , D). Then for all h ∈ F b (X ), the sequence {σ 2 n Z 0:n (h)} n∈N of asymptotic variances of the output of the B 2 algorithm is tight, i.e., it satisfies The previous result is obtained by inspecting the proof of [15,Theorem 11]; see Section 5.11 for some details.

Proofs.
In this section we will use repeatedly the same notation {U N (i)} N 1 i=1 to denote triangular arrays (in the sense of Theorem A.1 and Theorem A.2), even though the roles of these arrays change throughout the proofs. 5.1. Proof of Theorem 3.1. We apply Theorem A.1. For this purpose, define the triangular array and filtration ) it holds, as the ancestor sample is assumed to be consistent, Thus, since |U N (i)| ≤ h ∞ /N 1 < ∞ for all i ∈ 1, N 1 , it is enough to check the conditions (A1) and (A2) in Theorem A.1. The tightness condition (A1) is straightforwardly satisfied as sequences that converge in probability are tight. Moreover, to check (A2) we may apply Lemma A.3 with V N = h ∞ /N 1 and Y N = X N = 0. Thus, the limits, in probability as coincide, which completes the proof.

Proof of Theorem 3.2. Trivially, since {I
where the right hand side has an exponential bound by assumption. This completes the proof.

Proof of Theorem 3.3.
In order to check (AN1) using Theorem A.2, define the array (5.3) and let {F N } N ∈N * be the filtration (5.1). We first note that E[U 2 Along the lines of (5.2), By assumption, the second term on the right hand side of (5.4) converges in distribution to a Gaussian random variable with zero mean and variance σ 2 (h). To treat the first term using Theorem A.2 we first consider where the limit holds as the ancestor archipelago is assumed to satisfy (AN2). In addition, as the right hand side tends, as the ancestor archipelago satisfies (AN1), in distribution to a scaled χ 2 -distributed random variable as N → ∞. Combining the two previous displays shows that the condition (B1) in Theorem A.2 holds with limit ς 2 (h) = ν 2 (h). To check the condition (B2) in the same lemma, we note that max i∈ 1, Note that the sequence {V N } N ∈N * is F N -adapted and vanishes in probability as N → ∞ since the ancestor archipelago is assumed to satisfy (D) and thus Equation 2.1. We may then apply Lemma A.3 to check the condition (B2) in Theorem A.2, implying that for all u ∈ R, Now, using this limit, the decomposition (5.4), and hypothesis that the ancestor archipelago satisfies (AN1), we conclude, via Lemma A.5, that for all u ∈ R, which concludes the proof of (AN1).
To check that the output of Algorithm 1 satisfies also (AN2-5) we proceed along similar lines; we refer the reader to the supplementary paper [25, Section 1.1] for all details.

Proof of Theorem 3.4.
We first note that (C2) is trivially satisfied. In order to check (C1) we apply Theorem A.1 to the array associated with {F N } N ∈N given by (5.1). Note that all indices {J N (i, j) ∈ 1, N 1 × 1, N 2 } are conditionally independent given F N . Moreover, for all i ∈ 1, where convergence holds by assumption. First, note that U N (i) ≤ h ∞ < ∞ for all i ∈ 1, N 1 and N ∈ N * . Moreover, (A1) is trivially satisfied. Thus, consistency is established by showing that (A2) is satisfied, which is an immediate implication of Lemma A.3 with , which is F N -adapted and tends to zero in probability thanks to (C2). Hence, by Theorem A.1, the series have the same limit ηh in probability. This completes the proof.

5.5.
Proof of Theorem 3.5. We may bound the quantity of interest according to max i∈ 1,N 1 where we have set (5.7) By Lemma 2.3, the second term on the right hand side satisfies For each i ∈ 1, N 1 , the variables {δ N (i, j)} N 2 j=1 are, conditionally on F N , independent and identically distributed with zero mean; moreover, as |δ N (i, j)| ≤ 2 h ∞ for all (i, j) ∈ 1, N 1 × 1, N 2 , Hoeffding's inequality implies that for all ε > 0, Combining the previous two displays show that (D) is satisfied with the choice ofc 1 and c 2 given in the theorem. 5.6. Proof of Theorem 3.6. We start with (AN1) . In order to apply Theorem A.2, define the array (5.9) equipped with the usual filtration {F N } N ∈N given by Equation 5.1. We first note that N 1 and N ∈ N * . In order to check (B1), write, following the arguments of the proof of Theorem 3.4, Moreover, since Since the ancestor archipelago satisfies (2.1) and is consistent for η, we deduce that Then, since the ancestor archipelago also satisfies (AN3) we conclude that the variance (5.10) tends in probability to µ 1 {(h − ηh) 2 }. Consequently, the triangular array satisfies Assumption (B1) with limit µ 1 {(h − ηh) 2 }. In order to check Assumption (B2) we may apply Lemma A.3 by bounding where the δ N s are defined in (5.7). Here {V N } N ∈N * is F N -adapted and tends to zero in probability by (AN6) and Lemma 2.3. In addition, {X N } N ∈N * is F N -adapted and, by (AN6), tight. Moreover, for all N ∈ N * , Y N has, by (5.8), a tail of the type Thus, Lemma A.3 applies, which establishes (B2) . Finally, we may conclude, by using Lemma A.5, the proof of (AN1) to obtain thatσ 2 The conditions (AN2-6) are checked using similar techniques. We refer the reader to the supplementary paper [25, Section 1.2] for all details. 5.7. Proof of Theorem 3.7. First, note that (5.11) in Algorithm 3. In order to determine the limit in probability of this quantity we apply Theorem A.1 to the array associated with the filtration {F N } N ∈N * given in (5.1). For each (i, j) ∈ 1, N 1 × 1, N 2 , the conditional distribution ofξ N (i, j) given F N is R(ξ N (i, j), ·); thus, implying that (5.13) where convergence holds since the ancestor archipelago satisfies Assumption (C1) . This implies (A1). To check also the condition (A2) we apply Lemma A.
where V N is F N -adapted and tends to zero in probability by the assumption (C2). Hence, Theorem A.1 ensures that the two series have the same limit ηQh in probability.
Moreover, by setting h is equal to the constant function ½X we deduce that (5.14) which allows us to complete the proof of (C1) using Slutsky's lemma. Finally, Assumption (C2) is checked straightforwardly by just noting that max i∈ 1,N 1 , where the right hand side tends to zero in probability by (5.14) and the fact that the ancestor archipelago satisfies (C1).
5.8. Proof of Theorem 3.8. Note that̺ ×ηh = ̺ × ηQh and bound the quantity of interest according to (5.15) max Since the input archipelago satisfies (D) it holds that For each i ∈ 1, N 1 , the random variables {δ N (i, j)} N 2 j=1 are, conditionally on F N , independent and, by (5.12), zero mean. Moreover, since for all (i, j) ∈ 1, where δ is defined in the statement of theorem, Hoeffding's inequality implies that for all ε > 0, By combining the two previous displays we may conclude that (D) is satisfied withc 1 and c 2 defined as in the theorem statement. 5.9. Proof of Theorem 3.9. We preface the proof by the following auxiliary result, which is obtained as a straightforward extension of the generalized hoeffding inequality in [12,Lemma 4].
Lemma 5.1. Let the assumptions of Theorem 3.8 hold. Then for all N 1 ∈ N * , N 2 ∈ N * , and ε > 0, To check (AN1), take h ∈ F b (X ) and assume without loss of generality thatηh = 0 and, consequently, ηQh = 0. We again rewrite the estimator according to (5.11) and apply Theorem A.2 to the second factor. For this purpose, define the array and furnish the same with the filtration {F N } N ∈N * defined in (5.1). We may now write (5.20) where, by assumption, since Qh ∞ ≤ h ∞ Q½X ∞ , the second term on the right hand side satisfies the CLT Our main challenge will be to handle the first term on the right hand side of (5.20). Since all individuals of the mutated archipelago are conditionally independent given F N , we notice that for all i ∈ 1, N 1 , Using this, we turn to the variance and deduce the expression which tends in probability to µ 2 R(w 2 h 2 ) − µ 2 (Qh) 2 as the input archipelago satisfies Assumption (AN4) . This implies that Assumption (B1) in Theorem A.2 holds with the same limit. To verify the Lindeberg condition (B2) in Theorem A.2, note that proceeding as in (5.15) yields where, for N ∈ N * , N (i, j)) , .
Here {V N } N ∈N * is F N -adapted and tends to zero in probability by (AN6) and Lemma 2.3, Thus, by Lemma A.3, (B2) holds true, and we may conclude the proof of (AN1) using first Lemma A.5 and then Slutsky's lemma. We turn to (AN2) and decompose the quantity under consideration according to The convergence in probability of the second term on the right hand side will now to be established using Theorem A.1. For this purpose, define the triangular array (5.23) and associate the same with the σ-field F N defined in (5.1). We now apply the previous machinery and study the convergence of the series which tends in probability to ν 2 (Qh) + µ 3 R(w 2 h 2 ) − µ 3 (Q 2 h) as the ancestor archipelago satisfies (AN2) and (AN5) . Thus, the condition (A1) in Theorem A.1 is checked. In addition, (A2) is checked using Lemma A.3, as where {X N } N ∈N * is F N -adapted and tight by (AN6) and each Y N has, by [12,Lemma 4], since the input and output archipelagos satisfy (D) , a tail of the form (A.1) (with α = 1). Thus, (A1) holds true, and we may conclude that the series N 1 i=1 U N (i) and tend to the same limit in probability. We turn to the first term of (5.22) and show that this tends to zero in probability. Indeed, note that the absolute value of the same is, up to the factor N 1 i ′ =1 Ω N (i ′ )/ N 1 i ′′ =1Ω N (i ′′ ), which converges in probability by (5.14), bounded by where the first factor vanishes in probability by Lemma 5.1 and Lemma A.4, and the convergence of the second factor was established above. This establishes (AN2).
To check Assumption (AN6), consider the bound where the second factor on the right hand side is tight as the ancestor archipelago is assumed to satisfy (AN6). Moreover, as the third factor tends to 1/ηQ½X in probability by (5.14) we conclude that (AN6) holds true also for the output. The assumptions (AN3-5) are checked using similar techniques, and we refer the reader to the supplementary paper [25, Section 1.3] for all details. 5.10. Proof of Corollary 6. For ℓ ∈ N, combining the identity However, since, by [25, Section 2], it holds that Finally, the proof is concluded by summing up the terms.
5.11. Proof of Corollary 9. Since the ℓth terms of the asymptotic variances (4.4) and (4.5) differ only by the multiplicative constant n − ℓ, the proof follows straightforwardly by direct inspection of the proof of the analogous result for the standard bootstrap particle filter given in [15,Theorem 11] (which in turn is an adaptation of the proof of Theorem 10 in the same paper, providing the analogous result for the particle predictor). More specifically, the result is obtained by • embedding, using a trivial extension of Kolmogorov's extension theorem, the stationary sequence {Z p } p∈N into a stationary process {Z p } p∈Z with doubly infinite time.

APPENDIX A: TECHNICAL RESULTS
We first recall two results, obtained in [13], which are essential for the developments of the present paper. (B1) For some constant ς 2 > 0, as N → ∞, (B2) For all ε > 0, as N → ∞, Then, for all u ∈ R, as N → ∞, The following lemma is useful when verifying the tightness conditions (A2) and (B2).
Proof. We start with the case p = 1. First, note that for all υ > 0, where we used the condition (iii) in the second step. Thus, using the standard upper tail bound for Gaussian distributions. In any case, In addition, note that (ii) implies, for all ε ′ > 0 and all δ > 0, the existence of a constant λ δ < ∞ such that for all λ ≥ λ δ , Now, for any ε > 0 and λ > 0 the quantity of interest may be bounded as For α = 1, while the case α = 2 corresponds to the first case of (A.2). Consequently, as N → ∞, which completes the proof.
Lemma A.4. Let a ∈ R be nonzero, a = 0, and let {X N (i)} N 1 i=1 , N 1 ∈ N * , be random variables such that X N (i) = 0 for all i ∈ 1, N 1 . Assume that max i∈ 1,N 1 |X N (i)−a| P −→ 0 as N → ∞. Then max i∈ 1,N 1 Proof. Pick ε > 0; then we may write for all η > 0, where the first term tends to zero as N tends to infinity for any η by assumption. For all i ∈ 1, N 1 , there exists, by Taylor's formula, ζ N (i) ∈ (X N (i) ∧ a, X N (i) ∨ a) such that Thus, if a > 0 and 0 < η < a, P max i∈ 1,N 1 where the right hand side tends, by assumption, to zero as N tends to infinity. On the other hand, if a < 0 and 0 < η < −a, P max i∈ 1,N 1 where again the right hand side tends to zero. This concludes the proof.
Lemma A.5. Let {Z N } N ∈N be a sequence of random variables such that for some constant z ∞ ∈ R, as N → ∞, E [Z N ] → z ∞ and let {X N } be a sequence of random variables that (i) converges in probability to some constant x ∞ ∈ R and (ii) is uniformly bounded by some constant x + ∈ R. Then, as N → ∞,