Importance Sampling for Rainbow Option Pricing

Published Online:https://doi.org/10.1287/stsy.2025.0110

Abstract

This paper studies the applications of state-dependent importance sampling in pricing exotic rainbow options. We demonstrate that for many rainbow options, efficient dynamic importance sampling schemes can be constructed from subsolutions to appropriate partial differential equations. We also introduce an alternative large deviations scaling, which leads to universal importance sampling for rainbow options.

1. Introduction

Importance sampling is a general variance reduction technique in Monte Carlo simulation, and it has numerous applications in computational finance; see, for example, Glasserman et al. (1999, 2008), Glasserman (2004), Guasoni and Robertson (2008), and Wang (2012). In importance sampling, samples are simulated from alternative probability distributions, and an unbiased estimator is formed by multiplying each sample with an appropriate likelihood ratio. The literature on importance sampling is vast. Various methodologies and performance analyses can be found in this very partial list: Sigmund (1976), Heidelberger (1995), Sadowsky (1996), Dupuis and Wang (2004), Blanchet et al. (2007), Rubinstein and Kroese (2007), Bucklew (2010), and the references therein. Although most of the work on importance sampling is empirical, the goal of this paper is to build efficient importance sampling algorithms for the estimation of the price of high-dimensional rainbow options with rigorously provable asymptotically optimal schemes in the context of rare event simulation.

Almost all of the applications of importance sampling to computational finance are concerned with state-independent changes of measure. Within our context, state independence means that the change of measure can only depend on time. In other words, the alternative sampling distribution, which is usually obtained from solving an appropriate variational problem, remains fixed at any given time, regardless of the state of the system (Glasserman et al. 1999, Glasserman 2004, Guasoni and Robertson 2008). For such importance sampling schemes to attain efficiency, certain minimax conditions need to be imposed (see, e.g., Glasserman et al. 1999, condition 2.14 and Guasoni and Robertson 2008, condition 3.7). However, such minimax conditions will not hold in general, especially in the setting of rainbow options because many payoff functions are not convex. In Wang and Zhou (2015), a modified crossentropy scheme to select alternative sampling distributions from a class of mixtures for rainbow option simulation has been introduced. However, the scheme lacks theoretical performance analysis, and it requires pilot samples that may incur nontrivial computational overhead in higher dimensions.

In a series of papers, Dupuis and Wang (2004, 2007, 2009) have established that the change of measure needs to be state dependent (i.e., the change of measure can depend on both time and system state) in order to achieve efficiency in general. Such schemes, named dynamic importance sampling (dynamic IS) schemes, are built from suitable subsolutions to relevant Hamilton–Jacobi–Bellman (HJB) equations, and they have found much success in the setting of queueing networks. The goal of this paper is to systematically construct efficient state-dependent importance sampling schemes for rainbow options. We will demonstrate that the framework of dynamic importance sampling through subsolutions can be adapted to the simulation of rainbow options as well. However, this approach has limited success because it requires the option payoff function to satisfy certain conditions (Section 4.3). When applicable, the resulting dynamic importance sampling schemes are explicitly calculable and are not difficult to implement. Moreover, rigorous performance analysis can be carried out to establish the efficiency or asymptotic optimality of such schemes.

The difficulty of applying this dynamic importance sampling to general rainbow options lies in the construction of appropriate subsolutions to the associated HJB equations. This is largely because of the complicated and high-dimensional nature of a rainbow option’s payoff function. In order to circumvent this difficulty, we observe that all of these HJB equations are derived under the classical large deviation embedding, which has played an important role in the design of importance sampling schemes (Glasserman et al. 1999, Dupuis and Wang 2007, Guasoni and Robertson 2008). In this paper, we introduce a novel alternative large deviations embedding, which will reduce the complexity of the problem considerably. In short, it becomes sufficient to only investigate the estimation of probabilities instead of general expected values. Consequently, the construction of the important sampling schemes becomes much easier. The resulting asymptotically optimal (under this alternative embedding) importance sampling schemes are still state dependent, but the dependency is much simpler.

The paper is organized as follows. Section 2 establishes the model for the financial market. In Section 3, we give an overview of importance sampling and asymptotic optimality as well as a literature review. Section 4 considers the traditional large deviations embedding and establishes the connection between subsolutions and asymptotic optimality. We also discuss the construction of subsolutions when the terminal conditions take a certain form. The alternative large deviations embedding and the resulting importance sampling scheme are introduced in Section 5. Examples of numerical simulations are given in Section 6. We conclude the paper with further remarks and heuristic discussions in Sections 7 and 8.

2. Market Model and Rainbow Options

Because we are only interested in option pricing, we assume that the probability space (Ω,F) is equipped with the risk-neutral probability measure P. Let r be the constant risk-free interest rate. Consider a financial market with d risky assets whose prices at time t are denoted by a column vector S(t)[S1(t),,Sd(t)]. We will assume that the financial market is complete and that stock prices are geometric Brownian motions. More precisely, let B{B(t)=[B1(t),,Bd(t)]:t0} be a d-dimensional Brownian motion with strictly positive definite covariance matrix Σ[Σij] of size d×d and Σii=1 for every i=1,2,,d. Let σ[σ1,,σd] be a column vector of strictly positive constants that denotes the volatilities of stock prices. For i=1,2,,d, the stock price Si is modeled by the geometric Brownian motion

Si(t)=Si(0)exp{(r12σi2)t+σiBi(t)}.

We are interested in estimating the price of a rainbow option with expiration date T and payoff h(S(T))h(S1(T),,Sd(T)), where h:R+d[0,) is a nonnegative function defined on the open positive orthant

R+d{x[x1,,xd]Rd:xi>0, i=1,,d}.

The option price is simply the expected value vE[erTh(S(T))]. Prices of most rainbow options do not admit explicit formulae. The goal of our paper is to construct efficient importance sampling schemes for simulating such option prices.

For future analysis, it is convenient to introduce the log stock price process X(t)[X1(t),,Xd(t)], where for each i=1,2,,d,

Xi(t)logSi(t)=logSi(0)+(r12σi2)t+σiBi(t).

In particular, letting Γ be a matrix such that ΓΓΣ (see Remark 1), we can rewrite X(T)=A+CΓZ, where

A[A1,,Ad],  AilogSi(0)+(r12σi2)T,(1)

C is a d×d diagonal matrix with Cii=σi for i=1,,d, and Z[Z1,,Zd] is a d-dimensional normal random vector with distribution N(0,TId). Therefore, the discounted payoff can be expressed as

erTh(S(T))=erTh(exp{A+CΓZ})G(Z),(2)
where we have adopted the notation that exp{x}[exp{x1},,exp{xd}] for any vector x[x1,,xd]Rd.

Finally, we would like to impose some mild technical conditions on the payoff function h, which are satisfied by nearly all rainbow options. Throughout the paper, we will assume that h is of linear growth; that is, there exist some nonnegative constants a and b such that

0h(s)a+bs(3)
for every sR+d. Under this assumption, the option price v is finite. We will assume that the set D{h>0} is either open or closed and that there is some continuous nonnegative function h¯:R+d[0,) such that h¯=h on {h>0} or equivalently, h=h¯1{h>0}. The choice of h¯ is not essential, and we can choose h¯ so that it also satisfies the growth Condition (3); indeed, one can otherwise redefine h¯ to be the minimum of itself with a+bs.

Remark 1.

When Σ is strictly positive definite, there is a particularly convenient choice of Γ based on Cholesky factorization (Glasserman 2004). In this case, Γ is lower triangular, and its components are explicitly available.

Remark 2.

Even though our analysis is largely based on market models where the stock prices are geometric Brownian motions, this restriction is not essential. The idea is applicable to more general jump diffusion models. We will discuss this extension with more details in Section 7.

3. Overview of Importance Sampling

Importance sampling is a general variance reduction technique for Monte Carlo simulation. It is particularly effective in the context of rare event simulation. The idea is to generate samples from a different probability distribution and multiply each outcome with an appropriate likelihood ratio so that the resulting estimator is unbiased.

Hereon, we will equip the sample space (Ω,F) with probability measures other than the original risk-neutral probability measure P. The expected value with respect to P will simply be denoted by E[·], whereas the expected value with respect to some other probability measure, say Q, will be denoted by EQ[·]. If those alternative probability measures are indexed by some parameter ε>0, say Q=Pε, then the expected value with respect to those probabilities will be denoted by Eε[·] for simplicity when no confusion is incurred.

Suppose one is interested in estimating the expected value vE[X] of some random variable X. Let Q be an alternative probability measure such that PQ. Then,

v=EQ[XdPdQ].

In other words, importance sampling produces an unbiased estimator by generating samples from an alternative probability measure Q and multiplying each sample of X by the likelihood ratio dP/dQ. The key in the design of importance sampling algorithms is the choice of the alternative sampling distribution Q. When appropriately designed, importance sampling can be very efficient, reducing the variance of the estimator by orders of magnitude. It is particularly useful in the estimation of probabilities or expected values related to rare events.

A commonly used criterion for selecting the alternative sampling distribution is the so-called “asymptotic optimality” (Sigmund 1976, Heidelberger 1995). Its definition requires that the present estimation problem be embedded into a sequence of estimation problems, where certain types of large deviations convergence results hold. To fix ideas, suppose that we wish to estimate the expected values of nonnegative random variables {Xε} indexed by ε>0 (the original problem corresponds to some ε; i.e., X=Xε for some ε). Denote vεE[Xε]. We assume that a large deviation type of asymptotics exists; that is, there exists a constant γR such that

γlimε0εlogvε.

Let Pε be a new probability measure such that PPε. The importance sampling estimator denoted by v^ε is given by

v^ε=XεdPdPε,
where dP/dPε denotes the Radon–Nikodým derivative or the likelihood ratio. Because the estimator is unbiased under Pε, minimizing its variance is equivalent to minimizing its second moment. However, by the Jensen inequality and the unbiasedness of v^ε, we have the lower bound
lim¯ε0εlogEε[v^ε2]lim¯ε02εlogEε[v^ε]=lim¯ε02εlogvε=2γ.

Definition 1

(Asymptotic Optimality). An importance sampling scheme or change of measure {Pε} is said to be asymptotically optimal if this lower bound is achieved: that is, if

lim¯ε0εlogEε[v^ε2]2γ.

Remark 3.

To put everything into the framework of asymptotic optimality, we need to embed the original estimation problem into a sequence of estimation problems. Such embedding is not unique, which is a useful observation that we will exploit later on to deal with certain rainbow options.

4. Efficient Importance Sampling

It has been established in Glasserman and Wang (1997) and Dupuis and Wang (2004, 2007) that state-independent changes of measure will not lead to efficient importance sampling schemes in general, whereas asymptotically optimal state-dependent schemes can be constructed from classical subsolutions to related Hamilton–Jacobi–Bellman equations. For the financial market model under consideration, the stock prices are driven by Brownian motion B. Under an alternative sampling distribution that we consider, the Brownian motion B will become a Brownian motion with drift. Therefore, we expect that asymptotically optimal changes of measure correspond to state-dependent drifts in general.

4.1. The Classical Large Deviation Embedding

Recall the definition of G in (2). The goal is to estimate the option price vE[G(Z)], where Z[Z1,,Zd] is a d-dimensional normal random vector with distribution N(0,TId). Following the setup in Glasserman et al. (1999), define F(z)logG(z) for any zRd. Note that F takes values in R{} because G can be zero. Now, fix any ε>0. Define

vεE[eF(εZ)/ε].(4)

The option price of interest corresponds to ε=1. Under very mild conditions, one can establish the large deviations or exponential growth/decay rate of vε. The following result is a slight generalization of the Varadhan integral lemma, and its proof can be found in Glasserman et al. (1999, lemma 2.2).

Lemma 1.

Suppose that f:RdR{} is continuous and satisfies the growth condition f(z)c1+c2z2 for some c2<1/2 and for all zD, where DRd is either open or closed. Let Z be a d-dimensional normal random vector with distribution N(0,TId). Then,

limε0εlogE[ef(εZ)/ε1D(εZ)]=supzD[f(z)z2/2T].

Under our assumptions, h=h¯1{h>0} for some nonnegative, continuous function h¯:R+d[0,) that satisfies the linear growth Condition (3). Now, define G¯ according to (2) similarly with h replaced by h¯. Finally let F¯logG¯. It is not difficult to check that F¯ is continuous and satisfies the growth condition of Lemma 1; indeed, F¯(z)c1+c2z for some c1,c2 and every zRd. Because h and h¯ agree on the set {h>0}, it follows that G¯=G and thus, F¯=F on the set D{F>}. Furthermore, because {h>0} is either open or closed and zexp{A+CΓz} is a continuous bijection from Rd to R+d, D is either open or closed. Thus, we may apply Lemma 1 to obtain that

γlimε0εlogvε=limε0εlogE[eF(εZ)/ε]=limε0εlogE[eF(εZ)/ε1D(εZ)]=limε0εlogE[eF¯(εZ)/ε1D(εZ)]=supzD[F¯(z)z2/2T]=supzD[F(z)z2/2T]=supzRd[F(z)z2/2T].(5)

Finally, we observe that the supremum is always attained at some z*D¯ when F is continuous on D¯ because of the linear growth condition of F.

4.2. HJB Equation and Subsolution

Even though Dupuis and Wang (2004, 2007) are mostly concerned with discrete time dynamics, the technique therein can be naturally adapted to the diffusion setting. We should omit these repetitive heuristic details of deriving the HJB equation and directly state the definition of a subsolution. To this end, define WΓ1B. Then, W is a d-dimensional standard Brownian motion. Define the scaled state process indexed by ε:

Wε(t)εW(t)=εΓ1B(t),   t0.

Then, the quantity of interest can be written as

vε=E[eF(εZ)/ε]=E[eF(Wε(T))/ε].

We say a continuously differentiable function V:[0,T]×RdR is a classical subsolution with bounded gradient if V is uniformly bounded and if V satisfies the partial differential inequality (see Remark 5)

Vt+infuRd[12u2+V,u]=Vt12V20(6)
for any (t,x)[0,T)×Rd with terminal condition
V(T,x)F(x),   xRd.(7)

Note that the term u2/2 is the large deviation rate function for the standard d-dimensional normal distribution N(0,Id) and is closely associated with the large deviations of d-dimensional standard Brownian motion {Wε} (Dembo and Zeitouni 1998, theorem 5.2.3). The change of measure corresponding to a classical subsolution V is given by Dupuis and Wang (2007):

u*(t,x)V(t,x).(8)

More precisely, the alternative sampling distribution is defined by the probability measure Pε, where

dPεdPexp{1ε0Tu*(t,Wε(t))dW(t)12ε0Tu*(t,Wε(t))2dt}.(9)

Because u* is assumed to be bounded, the right-hand side indeed defines a probability measure. By the Girsanov theorem (Karatzas and Shreve 1991), the process W¯ defined by

dW¯(t)dW(t)1εu*(t,Wε(t))dt
is a standard Brownian motion under Pε. The corresponding importance sampling estimate for vε is simply
v^ε=eF(Wε(T))/εdPdPε.

The performance of this estimate is characterized by the following theorem, which is essentially a continuous-time version of Dupuis and Wang (2007, theorem 8.1). This is the key result in the subsolution approach to importance sampling.

Theorem 1.

Let V:[0,T]×RdR be a classical subsolution to the HJB Equations (6) and (7) with bounded gradient. If V is also twice continuously differentiable with respect to x with uniformly bounded Hessian matrix, then

lim¯ε0εlogEε[v^ε2]2V(0,0).

Proof.

Let 2V[ij2V] denote the d×d Hessian matrix of V and C be a constant such that |2Vij(t,x)|C for every i,j=1,,d and (t,x)[0,T]×Rd. Define the process M{Mt:t[0,T]} with

Mtexp{2εV(t,Wε(t))1ε0tu*(s,Wε(s))dW(s)+12ε0tu*(s,Wε(s))2dsCdt}.(10)

Because of the boundedness of all first-order derivatives of V, V is of linear growth in both t and x. Therefore, it is not difficult to see that E[Mt2]K for some constant K and all t[0,T]. Applying the Itô formula and using (8), we have

dMtMt=2ε[Vt+V2+12u*2+V,u*]dt(Cd+tr2V)dt1ε[2V+u*]dW(t)=2ε[Vt+12V2]dt(Cd+tr2V)dt1ε[2V+u*]dW(t),
with the understanding that all of the coefficients on the right-hand side are taking values at (t,Wε(t)). It follows that M is indeed a local supermartingale because the coefficients of dt are negative by the subsolution property of V and that iiVC for all i=1,,d. Because M is nonnegative, it is thus a true supermartingale. In particular, E[M0]E[MT]. Combined with the terminal Condition (7), it leads to
e2V(0,0)/εeCdTE[e2V(T,Wε(T))/εdPdPε]eCdTE[e2F(Wε(T))/εdPdPε]=eCdTEε[e2F(Wε(T))/ε(dPdPε)2]=eCdTEε[v^ε2].

Now, taking logarithm on both sides, multiplying with ε, and then, sending ε to zero, we complete the proof. □

Remark 4.

Theorem 1 asserts that if V(0,0)=γ, then the importance sampling estimate is asymptotically optimal. In many practical applications, V(0,0) can be chosen to be arbitrarily close to γ for a classical subsolution V.

Remark 5.

The equation should be viewed as an Isaacs equation for interpretation. The interesting point of this Isaacs equation is that it is equivalent to the HJB equation by a scaling of two, which is the intrinsic reason why state-dependent importance sampling schemes can attain asymptotic optimality. In this paper, we simply use the form of the equivalent HJB equation. Details of the Isaacs equation may be found in Dupuis and Wang (2004, 2007).

4.3. Construction of Classical Subsolutions

Because of Theorem 1, building an efficient importance sampling scheme amounts to constructing a classical subsolution V of (6) and (7) with V(0,0) as close to γ as possible. In many applications, such subsolutions can be obtained as a mollification of piecewise affine weak subsolutions (Dupuis and Wang 2007).

Throughout this section, we assume that F is continuous on D¯, where D{F>}. Recall the definition of γ in (5), and denote by z*D¯ the maximizing point. We start with the simplest case where F is concave. Define V:[0,T]×RdR by

V(t,x)1Tz*,x12T2(Tt)z*2[F(z*)1Tz*2].(11)

We claim that V is a classical subsolution. Indeed, it satisfies (6) with equality. As for the terminal Inequality (7), we observe that z*/TF because z* attains the supremum of (5). Therefore, by concavity,

F(x)F(z*)1Tz*,xz*,
which is exactly (7). Finally, V(0,0)=F(z*)+z*2/2T=γ, and thus, the corresponding state-independent importance sampling scheme is asymptotically optimal.

A more common situation is that F is the maximum of a concave function. For many options, the corresponding F can be written in this form, such as multistrike call options. Assume that FF1Fm, where Fi is a concave function for each i=1,,m. Define for each i

Vi(t,x)1Tzi,x12T2(Tt)zi2[Fi(zi)1Tzi2],(12)
where ziRd is the maximizer for (5) with Fi in place of F. Note that each Vi satisfies the HJB Equation (6) with equality and Vi(T,x)Fi(x). Define VV1Vm. Then, V satisfies (6) wherever it is differentiable with V(T,x)F(x). Moreover,
V(0,0)=maxi=1,,m[Fi(zi)12Tzi2]=maxi=1,,mmaxzRd[Fi(z)12Tz2]=maxzRdmaxi=1,,m[Fi(z)12Tz2]=maxzRd[F(z)12Tz2]=γ.

However, we cannot apply Theorem 1 directly because V is not differentiable everywhere. To do so, one can mollify the (weak) subsolution V to obtain a classical subsolution. To fix ideas, let δ be a small positive number, and define

Vδ(t,x)δlog(i=1meVi(t,x)/δ).(13)

Then, Vδ is a continuously differentiable function, and

Vδ(t,x)=i=1mρiδ(t,x)Vi(t,x), Vδt(t,x)=i=1mρiδ(t,x)Vit(t,x)(14)
with
ρiδ(t,x)eVi(t,x)/δj=1meVj(t,x)/δ.(15)

It is easy to verify because of the convexity of the mapping xx2 for xRd that Vδ satisfies the subsolution property (6). Furthermore, by the definitions of Vδ and V,

Vδ(T,x)δlog(eV(T,x)/δ)=V(T,x)F(x)
and
Vδ(0,0)δlog(meV(0,0)/δ)=V(0,0)δlogm=(γ+δlogm).

In other words, Vδ is a classical subsolution, where V(0,0) is only 2δlogm away from asymptotic optimality (see Remark 6). In practice, δ is chosen to be small to guarantee good numerical performance.

Remark 6.

To achieve optimality, one can let δ depend on ε, and a result analogous to Theorem 1 can be obtained. More precisely, if δ0 and δ/ε, then the corresponding importance sampling scheme is asymptotically optimal. The proof is verbatim to that of Theorem 1 except that the trace of Vδ is now bounded by C/δ for some constant δ, which is the reason behind condition δ/ε.

5. An Alternative Embedding and Importance Sampling

In this section, we investigate an alternative large deviations embedding and the resulting importance sampling scheme.

5.1. Motivation and Overview

The classical large deviation embedding introduced by (4) and the subsequent subsolution approach yield a general framework for constructing asymptotically optimal importance sampling schemes. We have shown that if the payoff of a rainbow option can be expressed as the maximum of concave functions, the construction of subsolutions can be quite straightforward. However, many rainbow options (even simple ones, such as basket call) do not belong to this category. For those options, it is unclear, particularly in high dimensions, how to find systematic ways to construct subsolutions that lead to asymptotically optimal importance sampling schemes.

This prompts us to introduce a large deviation embedding other than (4). A distinct advantage of this alternative embedding is that the resulting importance sampling scheme is very easy to construct because it does not depend on the specific form of the payoff function. Rather, the construction will only rely on the distance from the origin to the region where the payoff function is strictly positive. For this reason, such importance sampling schemes are said to be “universal” (Dupuis and Wang 2004). They are still state-dependent importance sampling schemes, but the dependence is minimal; see (20).

5.2. An Alternative Large Deviation Embedding

To describe this new approach, we recall the definition of G in (2) and that of D{G>0}. Consider the following large deviations scaling. For any ε>0, define

vεE[G(εZ)],(16)
where Z is a d-dimensional normal random vector with distribution N(0,TId). The quantity of interest corresponds to ε=1. Here, we have abused the notation as vε is also used in the classical embedding. But, there should be no ambiguity given the context. We also define the set (which actually corresponds to a binary call)
Aε{zRd:G(εz)>0}=D/ε
for every ε>0. The heuristic is that a good importance sampling change of measure for estimating P(Aε) should also be good for estimating vε. Even though Aε may not be convex itself or the union of convex sets, one can design universal importance sampling (universal IS) schemes for it (Dupuis and Wang 2007). We adopt the following mild assumptions throughout this section.

Assumption 1

. D is nonempty, and D°¯=D¯, where D° denotes the interior of D.

Assumption 2.

G is continuous on D¯.

Assumption 3.

For some positive constants a and b, |G(z)|aebz for all zRd.

Assumption 4.

D¯ does not contain the origin.

We would like to remark that Assumption 4 is imposed only to avoid the trivial case where importance sampling is not really necessary. It will not affect the theoretical results in any way.

Lemma 2.

Let Z be a d-dimensional normal random vector with distribution N(0,TId). Then, we have

γlimε0εlogvε=limε0εlogP(ZAε)=12TinfzD¯z2,(17)
where the infimum is achieved at some z*D.

Proof.

Note that the last equality in (17) follows from classic large deviations and Assumption 1. It is trivial that the infimum is attained at some z*D because D is nonempty.

To analyze vε, we fix an arbitrarily small δ>0 and define the set Dδ{zRd:G(z)>δ}. Note that Dδ° is nonempty for δ small enough because D° is nonempty. It follows that vεδP(εZDδ), and classical large deviations theory yields

lim¯ε0εlogvεlim¯ε0[εlogδ+εlogP(εZDδ)]12TinfzDδ°z2.

Now, letting δ0, it is straightforward to show by Assumption 1 that Dδ°D°, and thus,

lim¯ε0εlogvε12TinfzD°z2=12TinfzD¯z2.

As for the other direction, take an arbitrarily large constant M>0, and observe that

G(εZ)G(εZ)1{G(εZ)>M}+M1Aε(Z).(18)

It follows that

lim¯ε0εlogvεlim¯ε0εlog{E[G(εZ)1{G(εZ)>M}]+MP(ZAε)}=max{lim¯ε0εlogE[G(εZ)1{G(εZ)>M}],lim¯ε0εlog[MP(ZAε)]}.

Because the second term on the right-hand side equals lim¯ε0εlogP(ZAε), it remains to show that

limMlim¯ε0εlogE[G(εZ)1{G(εZ)>M}]=.

Because of the growth condition of G, it is sufficient to show that

limMlim¯ε0εlogE[aebεZ1{aebεZ>M}]=.

Observe that by the Hölder inequality,

E[aebεZ1{aebεZ>M}]{E[a2e2bεZ]P(aebεZ>M)}1/2.

By the dominated convergence theorem, E[a2e2bεZ]a2e2b as ε0. Furthermore, classical large deviations theory implies that

limε0εlogP(aebεZM)=12Tinf{z2:aebzM}=12Tb2(logMa)2
as M. This completes the proof. □

5.3. Asymptotic Optimality

Below is the main result of the paper. It establishes the connection between the estimation of vε and that of {P(ZAε)}, and it confirms the heuristic that a good importance sampling scheme for the latter is also good for the former. In particular, if an importance sampling scheme is asymptotically optimal for estimating {P(ZAε)}, then it is also asymptotically optimal for estimating {vε} under some mild conditions.

Theorem 2.

Assume that {Pε} is asymptotically optimal for the estimation of {P(ZAε)}. If for some q>1,

lim¯ε0εlogE[1Aε(Z)(dP/dPε)q]<,
then {Pε} is asymptotically optimal for the estimation of {vε} as well.

Proof.

Denote the likelihood LεdP/dPε. The importance sampling estimator is just

v^ε=G(εZ)Lε,
where Z is sampled from Pε. Because of Lemma 2, it is sufficient to show that
lim¯ε0εlogEε[v^ε2]2γ,(19)
where γ is defined to be the right-hand side of (17). Plugging in the form of v^ε, the left-hand side (LHS) of (19) equals
LHS=limε0εlogE[v^ε2Lε1]=limε0εlogE[G2(εZ)Lε].

Take any large number M. It follows from (18) and (a+b)22a2+2b2 that

G2(εZ)2G2(εZ)1{G(εZ)>M}+2M21Aε(Z).

By assumption, {Pε} is asymptotically optimal for estimating {P(ZAε)}. It follows that

lim¯ε0εlogE[2M21Aε(Z)Lε]=lim¯ε0εlogE[1Aε(Z)Lε]=lim¯ε0εlogEε[1Aε(Z)Lε2] =2γ.

Now, for the other term, we can again use the Hölder inequality. Define α such that 1/α+1/q=1. Then,

lim¯ε0εlogE[2G2(εZ)1{G(εZ)>M}Lε]=lim¯ε0εlogE[G2(εZ)1{G(εZ)>M}1Aε(Z)Lε]lim¯ε0εlog{E[G2α(εZ)1{G(εZ)>M}]}1/α{E[1Aε(Z)Lεq]}1/qlim¯ε0ε/αlogE[G2α(εZ)1{G(εZ)>M}]+lim¯ε0ε/qlogE[1Aε(Z)Lεq].

Because of the assumption, the second term on the right-hand side cannot take a value of . Thus, it remains to show that

limMlim¯ε0εlogE[G2α(εZ)1{G(εZ)>M}]=.

This is almost verbatim to the second half of the proof to Lemma 2, and we omit the details. Therefore, Inequality (19) follows readily. This completes the proof. □

5.4. Universal Importance Sampling

To apply Theorem 2, we need efficient importance sampling schemes for estimating {P(ZAε)}. Here, we investigate the universal schemes brought up in Dupuis and Wang (2007, section 10.7) and argue that it is indeed asymptotically optimal for the estimation of {P(ZAε)} as well as {vε}.

Recall D{G>0} and γ in (17). Fix any vector θRd such that θ1, and define u*:[0,T)×RdRd by

u*(t,x)2γT(xx1{x0}+θ1{x=0})(20)

The importance sampling change of measure {Pε} is again given by the Girsanov transform

dPεdPexp{1ε0Tu*(t,Wε(t))dW(t)12ε0Tu*(t,Wε(t))2dt}.

Theorem 3.

{Pε} is asymptotically optimal for the estimation of both {P(ZAε)} and {vε}.

Proof.

We first show that {Pε} is asymptotically optimal for estimating {P(ZAε)}. This part is very similar to the proof of Theorem 1. Fix any δ>0, and define V:[0,T]×RdR by (see Dupuis and Wang 2007, section 10.7)

V(t,x)2γT(x2+δ)γ(1+tT).

It is not difficult to show that the gradient and Hessian of V are uniformly bounded by some constant, say C. Analogous to the proof of Theorem 1, we define the process M{Mt:t[0,T]} exactly as in (10). Because u* is bounded, we easily have E[Mt2]K for some constant K and all t[0,T]. Applying the Itô formula, we have

dMtMt=4γεT(εWε(t)2εWε(t)2+δεWε(t)2εWε(t)2+δ)1{Wε(t)0}dt+2γεT(1θ2)1{Wε(t)=0}dt(Cd+tr2V)dt1ε[2V+u*]dW(t)
with the understanding that all of the coefficients on the right-hand side are taking values at (t,Wε(t)). Clearly, the coefficients of dt are negative (note that γ0). Define F(x)log1D(x). We observe that for xD¯, x22γT by the definition of γ in (17). Thus, for all xD¯,
V(T,x)=2γT(x2+δ)2γ2γTx2γ0.

In particular, we have V(T,x)F(x) for all xRd. The rest of the proof is almost verbatim to that of Theorem 1, and we arrive at

lim¯ε0εlogEε[1Aε(Z)(dP/dPε)2]2V(0,0)=2γ+22γδT.

Letting δ0, we conclude that {Pε} is asymptotically optimal for estimating {P(ZAε)}. To complete the proof, it is sufficient to show that for some q>1,

lim¯ε0εlogE[1Aε(Z)(dP/dPε)q]lim¯ε0εlogE[(dP/dPε)q]<.(21)

We then invoke Theorem 2. This is trivial because u* is uniformly bounded. Thus,

E[(dP/dPε)q]=E[exp{qε0Tu*(t,Wε(t))dW(t)+q2ε0Tu*(t,Wε(t))2dt}]E[exp{qε0Tu*(t,Wε(t))dW(t)q22ε0Tu*(t,Wε(t))2dt}]×exp{(q2+q)Tu*22ε}=exp{(q2+q)Tu*22ε},
and Inequality (21) follows readily. This completes the proof. □

Remark 7.

Both the dynamic importance sampling scheme outlined in Section 4 and the universal importance sampling schemes (20) are state dependent. However, the latter is much easier to construct because it only requires information on γ or equivalently, the distance between the origin and the region D where the payoff function is strictly positive. Moreover, it is more broadly applicable because it does not require the payoff function to take any specific form. However, the price that we pay for this simplicity and generality is the efficiency. Even though the universal schemes are asymptotically optimal under the alternative large deviation scaling, they will not be as efficient as those schemes in Section 4 when the latter is applicable. This is quite obvious as the universal schemes do not utilize the details of the payoff functions. It is also demonstrated by the empirical evidence in the subsequent numerical experiments.

6. Numerical Experiments

In this section, we will present numerical examples on the pricing of rainbow options. For some options, we will use the dynamic importance sampling scheme outlined in Section 4, and for all options, we will also perform the universal importance sampling outlined in Section 5. In Tables 17, which show the numerical results, “R.E.” denotes the relative error of the importance sampling estimate (standard error divided by the estimate), and “var ratio” denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate. In all of the simulations, the time interval [0, T] is divided into k=50 equal-length subintervals (see Remark 8). Each estimate is based on n = 100,000 samples.

Table

Table 1. Spread Call Option with Two Underlying Assets

Table 1. Spread Call Option with Two Underlying Assets

K=20K=40K=60
DynamicUniversalDynamicUniversalDynamicUniversal
Estimate1.20381.18280.08970.09030.00600.0060
R.E., %0.240.700.331.250.391.83
Var ratio18.42.4138.19.61,69976.1
z*, z^(1.4, −0.9)(0.8, −0.6)(2.4, −1.1)(1.9, −1.0)(3.2, −1.2)(2.9, −1.1)


Notes. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.

All of the rainbow options that we consider satisfy the conditions in Section 2, including the linear growth Condition (3). Also, recall the definitions of vector A and square matrices C and Γ in (1) and thereafter. Note that F(z)logG(z), where G is defined in (2). When we apply universal IS, the set of interest is D{G>0}.

Example 1.

In this example, we consider a spread call option with d=2 underlying assets and payoff

h(S)[S1(T)S2(T)K]+,
where K>0 is a constant. In this case, for any zR2, the corresponding F(z) and D are given by
F(z)=log(exp{x1}exp{x2}K)+,D={z:exp{x1}exp{x2}>K},
where x[x1,x2]=A+CΓz for zR2 (Table 1).
  1. Dynamic IS. One can show that F is a continuous concave function (we omit the technical details). Therefore, V in (11) is a classical subsolution with z* as the unique maximizer:

    z*=arg maxzRd[F(z)12Tz2].

    The corresponding change of measure is given by (8) and (9), where u*=V(t,x)=z*/T. Note that z* must satisfy F(z*)=z*.

  2. Universal IS. The corresponding γ is given by Lemma 2, where the minimizing z must lie on D={z:exp{x1}exp{x2}=K}.

    We use the MATLAB function fminunc supplied with gradient (one can also use the iterative method in Glasserman et al. 1999) to solve for z*. The equation of γ can be converted into an equation of a single variable and solved by the bisection method. We omit the technical details and simply present the simulation results. For future reference, we also record z* and the maximizing z^ for Lemma 2. The parameters are set at r=0.05, T=1, and

    S(0)=[3530], [σ1σ2]=[0.30.4], Σ=[10.20.21].

Example 2.

In this example, we consider an outperformance digital call option with d underlying assets and payoff

h(S)=1{S1(T)Sd(T)K},
where K>0 is a constant (Table 2). In this case, for any zRd, the corresponding F(z) and D are given by
F(z)=log1{x1xdlogK},D={z:x1xdlogK},

Table

Table 2. Outperformance Digital Call Option with Three Underlying Assets

Table 2. Outperformance Digital Call Option with Three Underlying Assets

K=80K=100K=120
DynamicUniversalDynamicUniversalDynamicUniversal
Estimate3.41E-33.45E-32.41E-42.35E-42.05E-52.09E-5
R.E., %0.642.970.653.400.685.44
Var ratio71.23.21,01238.39,256138.9


Notes. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.

where x[x1,,xd]=A+CΓz.

  1. Dynamic IS. FF1Fd, where Fi(z)log1{xilogK} is a concave function for each i. Therefore, VV1Vd in (11) is a (weak) subsolution, where for each i, Vi is given by (12) with zi as the unique maximizer:

    zi=arg maxzRd[Fi(z)12Tz|2].

    Note that Fi takes values of either zero or . Thus, the maximizer zi necessarily satisfies Fi(zi)=0. From this observation, it is very straightforward to show that

    zi=logKAiθi2·θi,
    where θi denotes the ith column vector of the matrix (CΓ). A classical subsolution Vδ can be obtained by mollification (13) with derivatives (14) and (15). The corresponding change of measure is given by (8) and (9), where u*=Vδ(t,x).

  2. Universal IS. In this case, the corresponding γ is given by Lemma 2, where obviously, the minimizing z must coincide with one of the zi’s above, and it is easily identified as

    γ=12Tmini=1,,dzi2=12Tmini=1,,d(logKAi)2θi2.

For simulation, we consider a digital call with d=3 underlying assets. The mollification parameter δ is set to be 0.1 in dynamic IS, and the other parameters are r=0.05, T=1, and

S(0)=[403540], [σ1σ2σ3]=[0.20.30.1], Σ=[10.20.30.210.50.30.51].

Example 3.

In this example, we consider a multistrike call option with d underlying assets and payoff

h(S)max{S1(T)K1,,Sd(T)Kd,0},
where K>0 is a constant (Table 3). In this case, for any zRd, the corresponding F(z) and D are given by
F(z)=max{log[exp{x1}K1]+,,log[exp{xd}Kd]+},D={z:max{x1logK1,,xdlogKd}>0},

Table

Table 3. Multistrike Call Option with Four Underlying Assets

Table 3. Multistrike Call Option with Four Underlying Assets

Δ=20Δ=30Δ=40
DynamicUniversalDynamicUniversalDynamicUniversal
Estimate5.38E-25.45E-22.95E-33.01E-31.36E-41.38E-4
R.E., %0.471.590.592.680.723.90
Var ratio58.04.9481.921.42,11370.5


Notes. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.

where x[x1,,xd]=A+CΓz.

  1. Dynamic IS. FF1Fd, where Fi(z)log[exp{xi}Ki]+ is a concave function for each i. Therefore, VV1Vd of (11) is a (weak) subsolution, where for each i, Vi is given by (12) with zi as the unique maximizer:

    zi=arg maxzRd[Fi(z)12Tz2].

    It can be derived (we omit the technical details) that for each i, zi=xiθi, where θi denotes the ith column vector of the matrix (CΓ) and xi is the unique solution to the equation

    1+KieAi+xθi2KixT=0,    xlogKiAiθi2.

    Each xi can be obtained by the bisection method. A classical subsolution Vδ can be obtained by mollification (13) with derivatives (14) and (15). The corresponding change of measure is given by (8) and (9), where u*=Vδ(t,x).

  2. Universal IS. The calculation of γ is almost verbatim to that of Example 2 with K replaced by Ki in each instance. Indeed,

    γ=mini=1,,d(logKiAi)2θi2.

    The mollification parameter δ is set to be 0.1 in dynamic IS. We consider d=4 underlying assets with parameters r=0.05, T=1, and

    S(0)=[40353030], [σ1σ2σ3σ4]=[0.10.10.20.2], Σ=[10.20.300.210.40.20.30.410.300.20.31],[K1K2K3K4]=S(0)+Δ[1111].

    In order to investigate how the dimension of the problem affects the performance of these schemes, we also consider a group of N=12 underlying assets and multistrike call options associated with the first d assets, with d=2,4,6,8,10,12. The initial stock price S(0), the volatility vector σ=[σ1,σ2,,σN], and the covariance matrix Σ are randomly generated. Below is a realization of these parameters:

    S(0)=[181222241818122130251521],      σ=[0.130.150.180.130.190.170.120.210.110.140.200.26],      [K1K2K3K4K5K6K7K8K9K10K11K12]=S(0)+30[111111111111],Σ=[1.000.080.230.150.330.320.190.310.050.120.270.130.081.000.160.110.160.240.260.160.170.310.210.120.230.161.000.350.350.180.390.030.140.360.180.270.150.110.351.000.330.200.120.070.300.220.170.350.330.160.350.331.000.300.080.280.270.090.060.140.320.240.180.200.301.000.210.380.120.010.410.170.190.260.390.120.080.211.000.060.040.050.030.130.310.160.030.070.280.380.061.000.000.400.330.010.050.170.140.300.270.120.040.001.000.010.060.020.120.310.360.220.090.010.050.400.011.000.140.370.270.210.180.170.060.410.030.330.060.141.000.010.130.120.270.350.140.170.130.010.020.370.011.000].

    The other parameters in the simulation remain the same except that the sample size is increased to n = 400,000 (Table 4).

    Table

    Table 4. Multistrike Call Option with d Underlying Assets

    Table 4. Multistrike Call Option with d Underlying Assets

    d=2d=4d=6d=8d=10d=12
    DynamicUniversalDynamicUniversalDynamicUniversalDynamicUniversalDynamicUniversalDynamicUniversal
    Estimate2.01E-131.96E-134.10E-63.73E-64.24E-64.60E-65.52E-55.21E-55.51E-55.42E-51.5E-31.6E-3
    R.E., %0.312.210.243.440.356.380.297.700.289.950.288.07


    Note. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate).

    The variance ratios to the plain Monte Carlo estimates are not presented because it is often the case that the plain Monte Carlo cannot produce meaningful estimates/variances for d=2,4,6,8,10. For example, a typical plain Monte Carlo estimate for d=2 is zero, and a typical estimate for d=8 is 1.16E-5 with a 40% relative error (in these cases, the true values of the expectation and variance are both vastly underestimated, a common occurrence in rare event simulation) (Glasserman and Wang 1997, Dupuis and Wang 2004). It is only for d=12 that the plain Monte Carlo produces meaning results, in which case the variance ratios are 2,300 and 4, respectively. In all of these simulations, one can see that both the dynamic IS and the universal IS outperform the plain Monte Carlo simulation. The dynamic IS, when feasible, is more efficient than the universal IS.

Example 4.

In this example, we show the advantage of universal IS by considering three different types of rainbow options. Let K, {ci:i=1,,d}, and {Ki:i=1,,d} be strictly positive constants:

  1. basket option: h(S)=[c1S1(T)++cdSd(T)K]+;

  2. pyramid option: h(S)=[|S1(T)K1|++|Sd(T)Kd|K]+; and

  3. Madonna option: h(S)=[|S1(T)K1|2++|Sd(T)Kd|2K]+.

For all of these options, the corresponding F cannot be expressed as the maximum of concave functions. Therefore, it is not clear how one can construct appropriate subsolutions and consequently, dynamic IS algorithms.

However, the universal IS algorithm is still easily applicable. For all simulations, we consider d=3 underlying assets with parameters r=0.05, T=1, and

S(0)=[403530], [σ1σ2σ3]=[0.20.20.1], Σ=[10.20.30.210.40.30.41],[c1c2c3]=[0.30.30.4], [K1K2K3]=[353535].

We use the MATLAB function fmincon to solve the minimization problem in Lemma 2. There is no guarantee that the local minimum found by the function is the global minimum because of the lack of concavity (even though we have used random initial search conditions for fmincov to alleviate the problem to some degree). However, the simulation shows great improvement in performance, which is our ultimate goal. In Tables 57, “plain MC” stands for the plain Monte Carlo scheme.

Table

Table 5. Basket Call Option with Three Underlying Assets

Table 5. Basket Call Option with Three Underlying Assets

K=50K=55K=60
UniversalPlain MCUniversalPlain MCUniversalPlain MC
Estimate7.47E-37.09E-35.22E-44.73E-43.30E-51.20E-4
R.E., %2.516.962.5424.74.1661.6
Var ratio7.077.52,891


Notes. Plain MC stands for the plain Monte Carlo scheme. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.

Table

Table 6. Pyramid Call Option with Three Underlying Assets

Table 6. Pyramid Call Option with Three Underlying Assets

K=50K=60K=70
UniversalPlain MCUniversalPlain MCUniversalPlain MC
Estimate2.66E-22.74E-24.67E-35.03E-38.22E-47.75E-4
R.E., %1.616.722.0014.52.3830.1
Var ratio18.360.6142.1


Notes. Plain MC stands for the plain Monte Carlo scheme. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.

Table

Table 7. Madonna Call Option with Three Underlying Assets

Table 7. Madonna Call Option with Three Underlying Assets

K=40K=50K=60
UniversalPlain MCUniversalPlain MCUniversalPlain MC
Estimate1.07E-20.93E-21.06E-31.52E-31.09E-41.86E-4
R.E., %1.679.532.3627.24.2859.4
Var ratio24.7282.8554.4


Notes. Plain MC stands for the plain Monte Carlo scheme. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.

6.1. Summary

The numerical results have shown that both dynamic IS and universal IS greatly improve the efficiency of the Monte Carlo simulation, especially when the options are deep out of the money. When applicable, the dynamic IS outperforms the universal IS, which is not surprising considering that the universal scheme only uses very limited knowledge of the model. On the other hand, universal IS (still provably asymptotically optimal in the alternative scaling) leads to great variance reduction compared with the plain Monte Carlo algorithm, and it is much more broadly applicable. Further, it also is much easier to construct for various rainbow options with complicated payoffs.

Remark 8.

The implementation of the dynamic importance sampling or the universal importance sampling requires the partition of the time interval. Even though we have picked a fairly large partition with k=50 here, the results remain nearly identical even if k is much smaller: for example, k=10. Note that in this paper, the plain Monte Carlo simulation does not require the partition of the time interval, whereas the state-dependent importance sampling does. Such computational overhead is insignificant compared with the reduction in variance, especially for the simulation of deep out-of-money options.

7. Further Remarks

This paper discusses two state-dependent importance sampling schemes: the “dynamic IS” based on the classical large deviations embedding and the “universal IS” based on an alternative large deviations embedding. Both of them greatly improve the efficiency of Monte Carlo simulation of rainbow options. Even though the theory is based on market models with lognormal stock prices and path-independent options, in principle one may be able to extended the algorithms to more general market models and path-dependent options. For the dynamic IS, the challenge lies in the overly complicated structure of the HJB equations, which is inevitable when one deviates from the simple lognormal stock price model and/or introduces path dependency, as well as the construction of appropriate subsolutions. On the other hand, the universal IS will reduce the complexity of the problem considerably by only dealing with probabilities. Even so, it still leads to a number of challenging yet interesting future research directions.

For example, some commonly used stock price models are jump diffusion models. The building blocks in those models, for instance, can take the form

W(t)+i=1N(t)Yi,
where W is a Brownian motion, N is a Poisson process, and {Y1,Y2,} determine jump sizes. It would be interesting to construct an explicit and asymptotically efficient algorithm similar to (20) by analyzing the large deviation rate functions associated with those building blocks. Another example is path-dependent options, such as barrier options and Asian options. Regardless of whether such options are discretely or continuously monitored, one has to deal with a very high-dimensional or infinite-dimensional system. The construction and analysis of universal schemes within this context are not immediately clear. The last example that we would like to mention is with respect to American options. To the best of our knowledge, there has been no work on importance sampling for deep-out-of-the-money American options with provable asymptotic optimality. It would be interesting to see if any rigorous analysis can be done under the alternative large deviations embedding.

8. Heuristic Connection Between Two Embeddings

Finally, we would like to discuss the connection between the two large deviations embedding heuristically. It is difficult to give any rigorous justification, and hence, we should only proceed heuristically. The first large deviation scaling (4) leads to the rate function given by (5), whereas the second scaling (16) leads to (17). Intuitively speaking, the most important point for importance sampling under each large deviations scaling is the optimizing point in (5) and (17). Our goal is to heuristically justify that these two optimizing points are indeed close to each other in practice. In the special case where G is an indicator function (i.e., binary options), these two points are identical. More generally, let us assume that G(z)=[f(z)K]+ for some continuous function f (i.e., various call options). We also let T=1 for notational convenience. In this case, (5) is just

supzRd(log[f(z)K]+12z2)=supzD(log[f(z)K]12z2),(22)
where D{f(z)>K}={G(z)>0}. Assume that f is continuously differentiable and c1f(z)f(z)c2f(z) for some constants c1,c2>0 on the set D¯. Given the form of the stock prices, this assumption is very natural. The maximizing z*D¯ should satisfy the equation
f(z*)f(z*)K=z*,
and thus,
c1f(z*)f(z*)Kz*c2f(z*)f(z*)K.

Because we are interested in the case where K is large and z as K if zD¯ (because of the continuity of f), we can assume that z*>c2, and thus, the preceding inequality amounts to

Kz*z*c1f(z*)Kz*z*c2.

Thus, the maximization problem for the first large deviations scaling or Equation (22) can be taken over all z such that

zDD{z: Kzzc1f(z)Kzzc2}.(23)

For all such z (note that ǁzǁis large when K is large), we have

log[f(z)K]12z2=logK+(12z2+ϕ(z)),
where ϕ(z)=o(z2) because
logc1zc1ϕ(z)logc2zc2.

Therefore, intuitively, the optimization problem is more or less equivalent to the following:

supzD12z2.

However, recalling the definition of D in (23) and that for each i=1,2,

zzci1,
we conclude that D is very close to D. This justifies that the optimizer for (5) is very close to that of (17).

Acknowledgments

The authors express their appreciation to the anonymous referee for careful suggestions.

References

  • Blanchet JH, Glynn P, Liu JC (2007) Fluid heuristics, Lyapunov bounds and efficient importance sampling for a heavy-tailed G/G/1 queue. Queueing Systems 57(2):99–113. Google Scholar
  • Bucklew JA (2010) Introduction to Rare Event Simulation (Springer-Verlag, New York).Google Scholar
  • Dembo A, Zeitouni O (1998) Large Deviations Techniques and Applications (Springer-Verlag, New York).Google Scholar
  • Dupuis P, Wang H (2004) Importance sampling, large deviations, and differential games. Stochastics Stochastic Rep. 76(6):481–508.Google Scholar
  • Dupuis P, Wang H (2007) Subsolutions of an Isaacs equation and efficient schemes for importance sampling. Math. Oper. Res. 32(3):723–757.LinkGoogle Scholar
  • Dupuis P, Wang H (2009) Importance sampling for Jackson networks. Queueing Systems 62(1):113–157.Google Scholar
  • Glasserman P (2004) Monte Carlo Methods in Financial Engineering (Springer-Verlag, New York).Google Scholar
  • Glasserman P, Wang Y (1997) Counterexamples in importance sampling for large deviations probabilities. Ann. Appl. Probab. 7(3):731–746.Google Scholar
  • Glasserman P, Heidelberger P, Shahabuddin P (1999) Asymptotically optimal importance sampling and stratification for pricing path-dependent options. Math. Finance 9(2):117–152.Google Scholar
  • Glasserman P, Kang W, Shahabuddin P (2008) Fast simulation of multifactor portfolio credit risk. Oper. Res. 56(5):1200–1217.LinkGoogle Scholar
  • Guasoni P, Robertson S (2008) Optimal importance sampling with explicit formulas in continuous time. Finance Stochastics 12(1):1–19.Google Scholar
  • Heidelberger P (1995) Fast simulation of rare events in queueing and reliability models. ACM Trans. Model. Comput. Simulation 5(1):43–85.Google Scholar
  • Karatzas I, Shreve SE (1991) Brownian Motion and Stochastic Calculus (Springer-Verlag, New York).Google Scholar
  • Rubinstein RY, Kroese DP (2007) Simulation and the Monte Carlo Method (John Wiley & Sons, Hoboken, NJ).Google Scholar
  • Sadowsky JS (1996) On Monte Carlo estimation of large deviations probabilities. Ann. Appl. Probab. 6(2):399–422.Google Scholar
  • Sigmund JS (1976) Importance sampling in the Monte Carlo study of sequential tests. Ann. Statist. 4(4):673–684.Google Scholar
  • Wang H (2012) Monte Carlo Simulation with Applications to Finance (CRC Press, Boca Raton, FL).Google Scholar
  • Wang H, Zhou X (2015) A cross-entropy scheme for mixtures. ACM Trans. Model. Comput. Simulation 25(1):6.Google Scholar