Importance Sampling for Rainbow Option Pricing
Abstract
This paper studies the applications of state-dependent importance sampling in pricing exotic rainbow options. We demonstrate that for many rainbow options, efficient dynamic importance sampling schemes can be constructed from subsolutions to appropriate partial differential equations. We also introduce an alternative large deviations scaling, which leads to universal importance sampling for rainbow options.
1. Introduction
Importance sampling is a general variance reduction technique in Monte Carlo simulation, and it has numerous applications in computational finance; see, for example, Glasserman et al. (1999, 2008), Glasserman (2004), Guasoni and Robertson (2008), and Wang (2012). In importance sampling, samples are simulated from alternative probability distributions, and an unbiased estimator is formed by multiplying each sample with an appropriate likelihood ratio. The literature on importance sampling is vast. Various methodologies and performance analyses can be found in this very partial list: Sigmund (1976), Heidelberger (1995), Sadowsky (1996), Dupuis and Wang (2004), Blanchet et al. (2007), Rubinstein and Kroese (2007), Bucklew (2010), and the references therein. Although most of the work on importance sampling is empirical, the goal of this paper is to build efficient importance sampling algorithms for the estimation of the price of high-dimensional rainbow options with rigorously provable asymptotically optimal schemes in the context of rare event simulation.
Almost all of the applications of importance sampling to computational finance are concerned with state-independent changes of measure. Within our context, state independence means that the change of measure can only depend on time. In other words, the alternative sampling distribution, which is usually obtained from solving an appropriate variational problem, remains fixed at any given time, regardless of the state of the system (Glasserman et al. 1999, Glasserman 2004, Guasoni and Robertson 2008). For such importance sampling schemes to attain efficiency, certain minimax conditions need to be imposed (see, e.g., Glasserman et al. 1999, condition 2.14 and Guasoni and Robertson 2008, condition 3.7). However, such minimax conditions will not hold in general, especially in the setting of rainbow options because many payoff functions are not convex. In Wang and Zhou (2015), a modified crossentropy scheme to select alternative sampling distributions from a class of mixtures for rainbow option simulation has been introduced. However, the scheme lacks theoretical performance analysis, and it requires pilot samples that may incur nontrivial computational overhead in higher dimensions.
In a series of papers, Dupuis and Wang (2004, 2007, 2009) have established that the change of measure needs to be state dependent (i.e., the change of measure can depend on both time and system state) in order to achieve efficiency in general. Such schemes, named dynamic importance sampling (dynamic IS) schemes, are built from suitable subsolutions to relevant Hamilton–Jacobi–Bellman (HJB) equations, and they have found much success in the setting of queueing networks. The goal of this paper is to systematically construct efficient state-dependent importance sampling schemes for rainbow options. We will demonstrate that the framework of dynamic importance sampling through subsolutions can be adapted to the simulation of rainbow options as well. However, this approach has limited success because it requires the option payoff function to satisfy certain conditions (Section 4.3). When applicable, the resulting dynamic importance sampling schemes are explicitly calculable and are not difficult to implement. Moreover, rigorous performance analysis can be carried out to establish the efficiency or asymptotic optimality of such schemes.
The difficulty of applying this dynamic importance sampling to general rainbow options lies in the construction of appropriate subsolutions to the associated HJB equations. This is largely because of the complicated and high-dimensional nature of a rainbow option’s payoff function. In order to circumvent this difficulty, we observe that all of these HJB equations are derived under the classical large deviation embedding, which has played an important role in the design of importance sampling schemes (Glasserman et al. 1999, Dupuis and Wang 2007, Guasoni and Robertson 2008). In this paper, we introduce a novel alternative large deviations embedding, which will reduce the complexity of the problem considerably. In short, it becomes sufficient to only investigate the estimation of probabilities instead of general expected values. Consequently, the construction of the important sampling schemes becomes much easier. The resulting asymptotically optimal (under this alternative embedding) importance sampling schemes are still state dependent, but the dependency is much simpler.
The paper is organized as follows. Section 2 establishes the model for the financial market. In Section 3, we give an overview of importance sampling and asymptotic optimality as well as a literature review. Section 4 considers the traditional large deviations embedding and establishes the connection between subsolutions and asymptotic optimality. We also discuss the construction of subsolutions when the terminal conditions take a certain form. The alternative large deviations embedding and the resulting importance sampling scheme are introduced in Section 5. Examples of numerical simulations are given in Section 6. We conclude the paper with further remarks and heuristic discussions in Sections 7 and 8.
2. Market Model and Rainbow Options
Because we are only interested in option pricing, we assume that the probability space is equipped with the risk-neutral probability measure P. Let r be the constant risk-free interest rate. Consider a financial market with d risky assets whose prices at time t are denoted by a column vector . We will assume that the financial market is complete and that stock prices are geometric Brownian motions. More precisely, let be a d-dimensional Brownian motion with strictly positive definite covariance matrix of size and for every . Let be a column vector of strictly positive constants that denotes the volatilities of stock prices. For , the stock price is modeled by the geometric Brownian motion
We are interested in estimating the price of a rainbow option with expiration date T and payoff , where is a nonnegative function defined on the open positive orthant
The option price is simply the expected value . Prices of most rainbow options do not admit explicit formulae. The goal of our paper is to construct efficient importance sampling schemes for simulating such option prices.
For future analysis, it is convenient to introduce the log stock price process , where for each ,
In particular, letting be a matrix such that (see Remark 1), we can rewrite , where
C is a diagonal matrix with for , and is a d-dimensional normal random vector with distribution . Therefore, the discounted payoff can be expressed as
Finally, we would like to impose some mild technical conditions on the payoff function h, which are satisfied by nearly all rainbow options. Throughout the paper, we will assume that h is of linear growth; that is, there exist some nonnegative constants a and b such that
When is strictly positive definite, there is a particularly convenient choice of based on Cholesky factorization (Glasserman 2004). In this case, is lower triangular, and its components are explicitly available.
Even though our analysis is largely based on market models where the stock prices are geometric Brownian motions, this restriction is not essential. The idea is applicable to more general jump diffusion models. We will discuss this extension with more details in Section 7.
3. Overview of Importance Sampling
Importance sampling is a general variance reduction technique for Monte Carlo simulation. It is particularly effective in the context of rare event simulation. The idea is to generate samples from a different probability distribution and multiply each outcome with an appropriate likelihood ratio so that the resulting estimator is unbiased.
Hereon, we will equip the sample space with probability measures other than the original risk-neutral probability measure P. The expected value with respect to P will simply be denoted by , whereas the expected value with respect to some other probability measure, say Q, will be denoted by . If those alternative probability measures are indexed by some parameter , say , then the expected value with respect to those probabilities will be denoted by for simplicity when no confusion is incurred.
Suppose one is interested in estimating the expected value of some random variable X. Let Q be an alternative probability measure such that . Then,
In other words, importance sampling produces an unbiased estimator by generating samples from an alternative probability measure Q and multiplying each sample of X by the likelihood ratio . The key in the design of importance sampling algorithms is the choice of the alternative sampling distribution Q. When appropriately designed, importance sampling can be very efficient, reducing the variance of the estimator by orders of magnitude. It is particularly useful in the estimation of probabilities or expected values related to rare events.
A commonly used criterion for selecting the alternative sampling distribution is the so-called “asymptotic optimality” (Sigmund 1976, Heidelberger 1995). Its definition requires that the present estimation problem be embedded into a sequence of estimation problems, where certain types of large deviations convergence results hold. To fix ideas, suppose that we wish to estimate the expected values of nonnegative random variables indexed by (the original problem corresponds to some ; i.e., for some ). Denote . We assume that a large deviation type of asymptotics exists; that is, there exists a constant such that
Let be a new probability measure such that . The importance sampling estimator denoted by is given by
(
To put everything into the framework of asymptotic optimality, we need to embed the original estimation problem into a sequence of estimation problems. Such embedding is not unique, which is a useful observation that we will exploit later on to deal with certain rainbow options.
4. Efficient Importance Sampling
It has been established in Glasserman and Wang (1997) and Dupuis and Wang (2004, 2007) that state-independent changes of measure will not lead to efficient importance sampling schemes in general, whereas asymptotically optimal state-dependent schemes can be constructed from classical subsolutions to related Hamilton–Jacobi–Bellman equations. For the financial market model under consideration, the stock prices are driven by Brownian motion B. Under an alternative sampling distribution that we consider, the Brownian motion B will become a Brownian motion with drift. Therefore, we expect that asymptotically optimal changes of measure correspond to state-dependent drifts in general.
4.1. The Classical Large Deviation Embedding
Recall the definition of G in (2). The goal is to estimate the option price , where is a d-dimensional normal random vector with distribution . Following the setup in Glasserman et al. (1999), define for any . Note that F takes values in because G can be zero. Now, fix any . Define
The option price of interest corresponds to . Under very mild conditions, one can establish the large deviations or exponential growth/decay rate of . The following result is a slight generalization of the Varadhan integral lemma, and its proof can be found in Glasserman et al. (1999, lemma 2.2).
Suppose that is continuous and satisfies the growth condition for some and for all , where is either open or closed. Let Z be a d-dimensional normal random vector with distribution . Then,
Under our assumptions, for some nonnegative, continuous function that satisfies the linear growth Condition (3). Now, define according to (2) similarly with h replaced by . Finally let . It is not difficult to check that is continuous and satisfies the growth condition of Lemma 1; indeed, for some and every . Because h and agree on the set , it follows that and thus, on the set . Furthermore, because is either open or closed and is a continuous bijection from to , D is either open or closed. Thus, we may apply Lemma 1 to obtain that
Finally, we observe that the supremum is always attained at some when F is continuous on because of the linear growth condition of F.
4.2. HJB Equation and Subsolution
Even though Dupuis and Wang (2004, 2007) are mostly concerned with discrete time dynamics, the technique therein can be naturally adapted to the diffusion setting. We should omit these repetitive heuristic details of deriving the HJB equation and directly state the definition of a subsolution. To this end, define . Then, W is a d-dimensional standard Brownian motion. Define the scaled state process indexed by :
Then, the quantity of interest can be written as
We say a continuously differentiable function is a classical subsolution with bounded gradient if is uniformly bounded and if V satisfies the partial differential inequality (see Remark 5)
Note that the term is the large deviation rate function for the standard d-dimensional normal distribution and is closely associated with the large deviations of d-dimensional standard Brownian motion (Dembo and Zeitouni 1998, theorem 5.2.3). The change of measure corresponding to a classical subsolution V is given by Dupuis and Wang (2007):
More precisely, the alternative sampling distribution is defined by the probability measure , where
Because is assumed to be bounded, the right-hand side indeed defines a probability measure. By the Girsanov theorem (Karatzas and Shreve 1991), the process defined by
The performance of this estimate is characterized by the following theorem, which is essentially a continuous-time version of Dupuis and Wang (2007, theorem 8.1). This is the key result in the subsolution approach to importance sampling.
Let be a classical subsolution to the HJB Equations (6) and (7) with bounded gradient. If V is also twice continuously differentiable with respect to x with uniformly bounded Hessian matrix, then
Let denote the Hessian matrix of V and C be a constant such that for every and . Define the process with
Because of the boundedness of all first-order derivatives of V, V is of linear growth in both t and x. Therefore, it is not difficult to see that for some constant K and all . Applying the Itô formula and using (8), we have
Now, taking logarithm on both sides, multiplying with , and then, sending to zero, we complete the proof. □
Theorem 1 asserts that if , then the importance sampling estimate is asymptotically optimal. In many practical applications, can be chosen to be arbitrarily close to for a classical subsolution V.
The equation should be viewed as an Isaacs equation for interpretation. The interesting point of this Isaacs equation is that it is equivalent to the HJB equation by a scaling of two, which is the intrinsic reason why state-dependent importance sampling schemes can attain asymptotic optimality. In this paper, we simply use the form of the equivalent HJB equation. Details of the Isaacs equation may be found in Dupuis and Wang (2004, 2007).
4.3. Construction of Classical Subsolutions
Because of Theorem 1, building an efficient importance sampling scheme amounts to constructing a classical subsolution V of (6) and (7) with as close to as possible. In many applications, such subsolutions can be obtained as a mollification of piecewise affine weak subsolutions (Dupuis and Wang 2007).
Throughout this section, we assume that F is continuous on , where . Recall the definition of in (5), and denote by the maximizing point. We start with the simplest case where F is concave. Define by
We claim that V is a classical subsolution. Indeed, it satisfies (6) with equality. As for the terminal Inequality (7), we observe that because attains the supremum of (5). Therefore, by concavity,
A more common situation is that F is the maximum of a concave function. For many options, the corresponding F can be written in this form, such as multistrike call options. Assume that , where is a concave function for each . Define for each i
However, we cannot apply Theorem 1 directly because V is not differentiable everywhere. To do so, one can mollify the (weak) subsolution V to obtain a classical subsolution. To fix ideas, let be a small positive number, and define
Then, is a continuously differentiable function, and
It is easy to verify because of the convexity of the mapping for that satisfies the subsolution property (6). Furthermore, by the definitions of and V,
In other words, is a classical subsolution, where is only away from asymptotic optimality (see Remark 6). In practice, is chosen to be small to guarantee good numerical performance.
To achieve optimality, one can let depend on , and a result analogous to Theorem 1 can be obtained. More precisely, if and , then the corresponding importance sampling scheme is asymptotically optimal. The proof is verbatim to that of Theorem 1 except that the trace of is now bounded by for some constant , which is the reason behind condition .
5. An Alternative Embedding and Importance Sampling
In this section, we investigate an alternative large deviations embedding and the resulting importance sampling scheme.
5.1. Motivation and Overview
The classical large deviation embedding introduced by (4) and the subsequent subsolution approach yield a general framework for constructing asymptotically optimal importance sampling schemes. We have shown that if the payoff of a rainbow option can be expressed as the maximum of concave functions, the construction of subsolutions can be quite straightforward. However, many rainbow options (even simple ones, such as basket call) do not belong to this category. For those options, it is unclear, particularly in high dimensions, how to find systematic ways to construct subsolutions that lead to asymptotically optimal importance sampling schemes.
This prompts us to introduce a large deviation embedding other than (4). A distinct advantage of this alternative embedding is that the resulting importance sampling scheme is very easy to construct because it does not depend on the specific form of the payoff function. Rather, the construction will only rely on the distance from the origin to the region where the payoff function is strictly positive. For this reason, such importance sampling schemes are said to be “universal” (Dupuis and Wang 2004). They are still state-dependent importance sampling schemes, but the dependence is minimal; see (20).
5.2. An Alternative Large Deviation Embedding
To describe this new approach, we recall the definition of G in (2) and that of . Consider the following large deviations scaling. For any , define
. D is nonempty, and , where denotes the interior of D.
G is continuous on .
For some positive constants a and b, for all .
does not contain the origin.
We would like to remark that Assumption 4 is imposed only to avoid the trivial case where importance sampling is not really necessary. It will not affect the theoretical results in any way.
Let Z be a d-dimensional normal random vector with distribution . Then, we have
Note that the last equality in (17) follows from classic large deviations and Assumption 1. It is trivial that the infimum is attained at some because D is nonempty.
To analyze , we fix an arbitrarily small and define the set . Note that is nonempty for small enough because is nonempty. It follows that , and classical large deviations theory yields
Now, letting , it is straightforward to show by Assumption 1 that , and thus,
As for the other direction, take an arbitrarily large constant , and observe that
It follows that
Because the second term on the right-hand side equals , it remains to show that
Because of the growth condition of G, it is sufficient to show that
Observe that by the Hölder inequality,
By the dominated convergence theorem, as . Furthermore, classical large deviations theory implies that
5.3. Asymptotic Optimality
Below is the main result of the paper. It establishes the connection between the estimation of and that of , and it confirms the heuristic that a good importance sampling scheme for the latter is also good for the former. In particular, if an importance sampling scheme is asymptotically optimal for estimating , then it is also asymptotically optimal for estimating under some mild conditions.
Assume that is asymptotically optimal for the estimation of . If for some ,
Denote the likelihood . The importance sampling estimator is just
Take any large number M. It follows from (18) and that
By assumption, is asymptotically optimal for estimating . It follows that
Now, for the other term, we can again use the Hölder inequality. Define such that . Then,
Because of the assumption, the second term on the right-hand side cannot take a value of . Thus, it remains to show that
This is almost verbatim to the second half of the proof to Lemma 2, and we omit the details. Therefore, Inequality (19) follows readily. This completes the proof. □
5.4. Universal Importance Sampling
To apply Theorem 2, we need efficient importance sampling schemes for estimating . Here, we investigate the universal schemes brought up in Dupuis and Wang (2007, section 10.7) and argue that it is indeed asymptotically optimal for the estimation of as well as .
Recall and in (17). Fix any vector such that , and define by
The importance sampling change of measure is again given by the Girsanov transform
is asymptotically optimal for the estimation of both and .
We first show that is asymptotically optimal for estimating . This part is very similar to the proof of Theorem 1. Fix any , and define by (see Dupuis and Wang 2007, section 10.7)
It is not difficult to show that the gradient and Hessian of V are uniformly bounded by some constant, say C. Analogous to the proof of Theorem 1, we define the process exactly as in (10). Because is bounded, we easily have for some constant K and all . Applying the Itô formula, we have
In particular, we have for all . The rest of the proof is almost verbatim to that of Theorem 1, and we arrive at
Letting , we conclude that is asymptotically optimal for estimating . To complete the proof, it is sufficient to show that for some ,
We then invoke Theorem 2. This is trivial because is uniformly bounded. Thus,
Both the dynamic importance sampling scheme outlined in Section 4 and the universal importance sampling schemes (20) are state dependent. However, the latter is much easier to construct because it only requires information on or equivalently, the distance between the origin and the region D where the payoff function is strictly positive. Moreover, it is more broadly applicable because it does not require the payoff function to take any specific form. However, the price that we pay for this simplicity and generality is the efficiency. Even though the universal schemes are asymptotically optimal under the alternative large deviation scaling, they will not be as efficient as those schemes in Section 4 when the latter is applicable. This is quite obvious as the universal schemes do not utilize the details of the payoff functions. It is also demonstrated by the empirical evidence in the subsequent numerical experiments.
6. Numerical Experiments
In this section, we will present numerical examples on the pricing of rainbow options. For some options, we will use the dynamic importance sampling scheme outlined in Section 4, and for all options, we will also perform the universal importance sampling outlined in Section 5. In Tables 1–7, which show the numerical results, “R.E.” denotes the relative error of the importance sampling estimate (standard error divided by the estimate), and “var ratio” denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate. In all of the simulations, the time interval [0, T] is divided into equal-length subintervals (see Remark 8). Each estimate is based on n = 100,000 samples.
|
Table 1. Spread Call Option with Two Underlying Assets
| Dynamic | Universal | Dynamic | Universal | Dynamic | Universal | |
|---|---|---|---|---|---|---|
| Estimate | 1.2038 | 1.1828 | 0.0897 | 0.0903 | 0.0060 | 0.0060 |
| R.E., % | 0.24 | 0.70 | 0.33 | 1.25 | 0.39 | 1.83 |
| Var ratio | 18.4 | 2.4 | 138.1 | 9.6 | 1,699 | 76.1 |
| , | (1.4, −0.9) | (0.8, −0.6) | (2.4, −1.1) | (1.9, −1.0) | (3.2, −1.2) | (2.9, −1.1) |
Notes. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.
All of the rainbow options that we consider satisfy the conditions in Section 2, including the linear growth Condition (3). Also, recall the definitions of vector A and square matrices C and in (1) and thereafter. Note that , where G is defined in (2). When we apply universal IS, the set of interest is .
In this example, we consider a spread call option with underlying assets and payoff
Dynamic IS. One can show that F is a continuous concave function (we omit the technical details). Therefore, V in (11) is a classical subsolution with as the unique maximizer:
The corresponding change of measure is given by (8) and (9), where . Note that must satisfy .
Universal IS. The corresponding is given by Lemma 2, where the minimizing z must lie on .
We use the MATLAB function fminunc supplied with gradient (one can also use the iterative method in Glasserman et al. 1999) to solve for . The equation of can be converted into an equation of a single variable and solved by the bisection method. We omit the technical details and simply present the simulation results. For future reference, we also record and the maximizing for Lemma 2. The parameters are set at , , and
In this example, we consider an outperformance digital call option with d underlying assets and payoff
|
Table 2. Outperformance Digital Call Option with Three Underlying Assets
| Dynamic | Universal | Dynamic | Universal | Dynamic | Universal | |
|---|---|---|---|---|---|---|
| Estimate | 3.41E-3 | 3.45E-3 | 2.41E-4 | 2.35E-4 | 2.05E-5 | 2.09E-5 |
| R.E., % | 0.64 | 2.97 | 0.65 | 3.40 | 0.68 | 5.44 |
| Var ratio | 71.2 | 3.2 | 1,012 | 38.3 | 9,256 | 138.9 |
Notes. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.
where .
Dynamic IS. , where is a concave function for each i. Therefore, in (11) is a (weak) subsolution, where for each i, is given by (12) with as the unique maximizer:
Note that takes values of either zero or . Thus, the maximizer necessarily satisfies . From this observation, it is very straightforward to show that
where denotes the ith column vector of the matrix . A classical subsolution can be obtained by mollification (13) with derivatives (14) and (15). The corresponding change of measure is given by (8) and (9), where .Universal IS. In this case, the corresponding is given by Lemma 2, where obviously, the minimizing z must coincide with one of the ’s above, and it is easily identified as
For simulation, we consider a digital call with underlying assets. The mollification parameter is set to be 0.1 in dynamic IS, and the other parameters are , , and
In this example, we consider a multistrike call option with d underlying assets and payoff
|
Table 3. Multistrike Call Option with Four Underlying Assets
| Dynamic | Universal | Dynamic | Universal | Dynamic | Universal | |
|---|---|---|---|---|---|---|
| Estimate | 5.38E-2 | 5.45E-2 | 2.95E-3 | 3.01E-3 | 1.36E-4 | 1.38E-4 |
| R.E., % | 0.47 | 1.59 | 0.59 | 2.68 | 0.72 | 3.90 |
| Var ratio | 58.0 | 4.9 | 481.9 | 21.4 | 2,113 | 70.5 |
Notes. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.
where .
Dynamic IS. , where is a concave function for each i. Therefore, of (11) is a (weak) subsolution, where for each i, is given by (12) with as the unique maximizer:
It can be derived (we omit the technical details) that for each i, , where denotes the ith column vector of the matrix and is the unique solution to the equation
Each can be obtained by the bisection method. A classical subsolution can be obtained by mollification (13) with derivatives (14) and (15). The corresponding change of measure is given by (8) and (9), where .
Universal IS. The calculation of is almost verbatim to that of Example 2 with K replaced by in each instance. Indeed,
The mollification parameter is set to be 0.1 in dynamic IS. We consider underlying assets with parameters , , and
In order to investigate how the dimension of the problem affects the performance of these schemes, we also consider a group of underlying assets and multistrike call options associated with the first d assets, with . The initial stock price , the volatility vector , and the covariance matrix are randomly generated. Below is a realization of these parameters:
The other parameters in the simulation remain the same except that the sample size is increased to n = 400,000 (Table 4).
Table 4. Multistrike Call Option with d Underlying Assets
Table 4. Multistrike Call Option with d Underlying Assets
Dynamic Universal Dynamic Universal Dynamic Universal Dynamic Universal Dynamic Universal Dynamic Universal Estimate 2.01E-13 1.96E-13 4.10E-6 3.73E-6 4.24E-6 4.60E-6 5.52E-5 5.21E-5 5.51E-5 5.42E-5 1.5E-3 1.6E-3 R.E., % 0.31 2.21 0.24 3.44 0.35 6.38 0.29 7.70 0.28 9.95 0.28 8.07 Note. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate).
The variance ratios to the plain Monte Carlo estimates are not presented because it is often the case that the plain Monte Carlo cannot produce meaningful estimates/variances for . For example, a typical plain Monte Carlo estimate for is zero, and a typical estimate for is 1.16E-5 with a relative error (in these cases, the true values of the expectation and variance are both vastly underestimated, a common occurrence in rare event simulation) (Glasserman and Wang 1997, Dupuis and Wang 2004). It is only for that the plain Monte Carlo produces meaning results, in which case the variance ratios are 2,300 and 4, respectively. In all of these simulations, one can see that both the dynamic IS and the universal IS outperform the plain Monte Carlo simulation. The dynamic IS, when feasible, is more efficient than the universal IS.
In this example, we show the advantage of universal IS by considering three different types of rainbow options. Let K, , and be strictly positive constants:
basket option: ;
pyramid option: ; and
Madonna option: .
For all of these options, the corresponding F cannot be expressed as the maximum of concave functions. Therefore, it is not clear how one can construct appropriate subsolutions and consequently, dynamic IS algorithms.
However, the universal IS algorithm is still easily applicable. For all simulations, we consider underlying assets with parameters , , and
We use the MATLAB function fmincon to solve the minimization problem in Lemma 2. There is no guarantee that the local minimum found by the function is the global minimum because of the lack of concavity (even though we have used random initial search conditions for fmincov to alleviate the problem to some degree). However, the simulation shows great improvement in performance, which is our ultimate goal. In Tables 5–7, “plain MC” stands for the plain Monte Carlo scheme.
|
Table 5. Basket Call Option with Three Underlying Assets
| Universal | Plain MC | Universal | Plain MC | Universal | Plain MC | |
|---|---|---|---|---|---|---|
| Estimate | 7.47E-3 | 7.09E-3 | 5.22E-4 | 4.73E-4 | 3.30E-5 | 1.20E-4 |
| R.E., % | 2.51 | 6.96 | 2.54 | 24.7 | 4.16 | 61.6 |
| Var ratio | 7.0 | — | 77.5 | — | 2,891 | — |
Notes. Plain MC stands for the plain Monte Carlo scheme. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.
|
Table 6. Pyramid Call Option with Three Underlying Assets
| Universal | Plain MC | Universal | Plain MC | Universal | Plain MC | |
|---|---|---|---|---|---|---|
| Estimate | 2.66E-2 | 2.74E-2 | 4.67E-3 | 5.03E-3 | 8.22E-4 | 7.75E-4 |
| R.E., % | 1.61 | 6.72 | 2.00 | 14.5 | 2.38 | 30.1 |
| Var ratio | 18.3 | — | 60.6 | — | 142.1 | — |
Notes. Plain MC stands for the plain Monte Carlo scheme. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.
|
Table 7. Madonna Call Option with Three Underlying Assets
| Universal | Plain MC | Universal | Plain MC | Universal | Plain MC | |
|---|---|---|---|---|---|---|
| Estimate | 1.07E-2 | 0.93E-2 | 1.06E-3 | 1.52E-3 | 1.09E-4 | 1.86E-4 |
| R.E., % | 1.67 | 9.53 | 2.36 | 27.2 | 4.28 | 59.4 |
| Var ratio | 24.7 | — | 282.8 | — | 554.4 | — |
Notes. Plain MC stands for the plain Monte Carlo scheme. R.E. denotes the relative error of the importance sampling estimate (standard error divided by the estimate). Var ratio denotes the ratio of the variance of the plain Monte Carlo estimate to that of the importance sampling estimate.
6.1. Summary
The numerical results have shown that both dynamic IS and universal IS greatly improve the efficiency of the Monte Carlo simulation, especially when the options are deep out of the money. When applicable, the dynamic IS outperforms the universal IS, which is not surprising considering that the universal scheme only uses very limited knowledge of the model. On the other hand, universal IS (still provably asymptotically optimal in the alternative scaling) leads to great variance reduction compared with the plain Monte Carlo algorithm, and it is much more broadly applicable. Further, it also is much easier to construct for various rainbow options with complicated payoffs.
The implementation of the dynamic importance sampling or the universal importance sampling requires the partition of the time interval. Even though we have picked a fairly large partition with here, the results remain nearly identical even if k is much smaller: for example, . Note that in this paper, the plain Monte Carlo simulation does not require the partition of the time interval, whereas the state-dependent importance sampling does. Such computational overhead is insignificant compared with the reduction in variance, especially for the simulation of deep out-of-money options.
7. Further Remarks
This paper discusses two state-dependent importance sampling schemes: the “dynamic IS” based on the classical large deviations embedding and the “universal IS” based on an alternative large deviations embedding. Both of them greatly improve the efficiency of Monte Carlo simulation of rainbow options. Even though the theory is based on market models with lognormal stock prices and path-independent options, in principle one may be able to extended the algorithms to more general market models and path-dependent options. For the dynamic IS, the challenge lies in the overly complicated structure of the HJB equations, which is inevitable when one deviates from the simple lognormal stock price model and/or introduces path dependency, as well as the construction of appropriate subsolutions. On the other hand, the universal IS will reduce the complexity of the problem considerably by only dealing with probabilities. Even so, it still leads to a number of challenging yet interesting future research directions.
For example, some commonly used stock price models are jump diffusion models. The building blocks in those models, for instance, can take the form
8. Heuristic Connection Between Two Embeddings
Finally, we would like to discuss the connection between the two large deviations embedding heuristically. It is difficult to give any rigorous justification, and hence, we should only proceed heuristically. The first large deviation scaling (4) leads to the rate function given by (5), whereas the second scaling (16) leads to (17). Intuitively speaking, the most important point for importance sampling under each large deviations scaling is the optimizing point in (5) and (17). Our goal is to heuristically justify that these two optimizing points are indeed close to each other in practice. In the special case where G is an indicator function (i.e., binary options), these two points are identical. More generally, let us assume that for some continuous function f (i.e., various call options). We also let for notational convenience. In this case, (5) is just
Because we are interested in the case where K is large and as if (because of the continuity of f), we can assume that , and thus, the preceding inequality amounts to
Thus, the maximization problem for the first large deviations scaling or Equation (22) can be taken over all z such that
For all such z (note that ǁzǁis large when K is large), we have
Therefore, intuitively, the optimization problem is more or less equivalent to the following:
However, recalling the definition of in (23) and that for each ,
The authors express their appreciation to the anonymous referee for careful suggestions.
References
- (2007) Fluid heuristics, Lyapunov bounds and efficient importance sampling for a heavy-tailed G/G/1 queue. Queueing Systems 57(2):99–113. Google Scholar
- (2010) Introduction to Rare Event Simulation (Springer-Verlag, New York).Google Scholar
- (1998) Large Deviations Techniques and Applications (Springer-Verlag, New York).Google Scholar
- (2004) Importance sampling, large deviations, and differential games. Stochastics Stochastic Rep. 76(6):481–508.Google Scholar
- (2007) Subsolutions of an Isaacs equation and efficient schemes for importance sampling. Math. Oper. Res. 32(3):723–757.Link, Google Scholar
- (2009) Importance sampling for Jackson networks. Queueing Systems 62(1):113–157.Google Scholar
- (2004) Monte Carlo Methods in Financial Engineering (Springer-Verlag, New York).Google Scholar
- (1997) Counterexamples in importance sampling for large deviations probabilities. Ann. Appl. Probab. 7(3):731–746.Google Scholar
- (1999) Asymptotically optimal importance sampling and stratification for pricing path-dependent options. Math. Finance 9(2):117–152.Google Scholar
- (2008) Fast simulation of multifactor portfolio credit risk. Oper. Res. 56(5):1200–1217.Link, Google Scholar
- (2008) Optimal importance sampling with explicit formulas in continuous time. Finance Stochastics 12(1):1–19.Google Scholar
- (1995) Fast simulation of rare events in queueing and reliability models. ACM Trans. Model. Comput. Simulation 5(1):43–85.Google Scholar
- (1991) Brownian Motion and Stochastic Calculus (Springer-Verlag, New York).Google Scholar
- (2007) Simulation and the Monte Carlo Method (John Wiley & Sons, Hoboken, NJ).Google Scholar
- (1996) On Monte Carlo estimation of large deviations probabilities. Ann. Appl. Probab. 6(2):399–422.Google Scholar
- (1976) Importance sampling in the Monte Carlo study of sequential tests. Ann. Statist. 4(4):673–684.Google Scholar
- (2012) Monte Carlo Simulation with Applications to Finance (CRC Press, Boca Raton, FL).Google Scholar
- (2015) A cross-entropy scheme for mixtures. ACM Trans. Model. Comput. Simulation 25(1):6.Google Scholar

