Topics:
$\renewcommand{bt}[1]{\tilde{\boldsymbol #1}}$
Monte Carlo: a town in the principality of Monaco, bordering southeastern France; a gambling resort.
$\renewcommand{hs}{\hat{\bs s}}$ Monte Carlo simulation:
Typical applications are:
Compute the value of $\pi$ using Monte Carlo:
import numpy as np

def sim_pi(ns) :
    # draw ns[-1] points uniformly in the unit square
    es = np.random.uniform(size=[ns[-1], 2])
    # indicator of each point falling inside the quarter circle
    d = np.array([x*x + y*y < 1. for x, y in es])
    # running estimate of pi for each sample count in ns
    return np.array([4.*np.sum(d[:n])/n for n in ns])
ns = 4**(np.arange(8))*100
ps = sim_pi(ns)
from pylab import *   # figure/subplot/plot etc. assume a pylab (matplotlib) environment

figure(figsize=[10, 4])
subplot(1, 2, 1)
es = np.random.uniform(size=[500, 2])
d = np.array([x*x + y*y < 1. for x, y in es])
dn = np.logical_not(d)
plot(es[d,0], es[d,1], 'r.');
plot(es[dn,0], es[dn,1], 'b.');
x = np.arange(0, 1.01, .01)
plot(x, np.sqrt(1-x*x), 'k');
subplot(1, 2, 2)
semilogx(ns, ps, '.-');
axhline(np.pi, color='r')
legend(['Simulated', 'True Value'], loc='best')
xlabel('Number of dots')
title('Simulation of $\pi$');
Given a random variable $\renewcommand{bs}{\boldsymbol} \renewcommand{var}{\text{var}} \renewcommand{std}{\text{std}} \renewcommand{cov}{\text{cov}} \tilde{r}$, its true statistics are referred to as the population statistics:
We use $\bs r = [r_1, r_2, ..., r_n]^T$ to represent samples of $\tilde r$:
The samples themselves are random as well, so we can define their statistics:
$\hat{\mu}$ is the mean of the samples:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^n r_i = \frac{1}{n} \bs 1^T \bs r$$
$\hat \mu$ is an unbiased estimator of the population mean $\mu$:
$$\mathbb{E}[\hat{\mu}] = \frac{1}{n} \bs 1^T \bs 1 \mu = \mu $$
the variance of the sample mean (assuming i.i.d. samples, so that $V = \cov[\bs r] = \sigma^2 I$):
$$\var[\hat{\mu}] = \cov(\frac{1}{n} \bs 1^T \bs r, \frac{1}{n} \bs 1^T \bs r) = \frac{1}{n^2} \bs 1^T V \bs 1 = \frac{1}{n^2} \bs 1^T \sigma^2 I \bs 1 = \frac{\sigma^2}{n}$$
$\hat \sigma^2$ is the variance of the sample:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (r_i - \hat{\mu})^2 = \frac{1}{n} (\bs r - \bs 1 \hat{\mu})^T (\bs r - \bs 1 \hat{\mu})$$
$\hat{\sigma}^2$ is a biased estimator of the population variance $\sigma^2$:
$$ \begin{array}{rl} \mathbb{E}[\hat{\sigma}^2] &= \mathbb{E}\left[\frac{1}{n}\left(\bs r^T \bs r - 2 \bs r^T \bs 1 \hat \mu + \bs 1^T \bs 1 \hat \mu^2\right)\right] = \mathbb{E}\left[\frac{1}{n}\bs r^T \bs r - \hat \mu^2\right] \\ &= \mathbb{E}[\tilde r^2] - \mathbb{E} [\hat \mu^2] = (\sigma^2 + \mu^2) - (\frac{\sigma^2}{n} + \mu^2) = \frac{n-1}{n} \sigma^2 \end{array} $$
$\hat{s}^2 = \frac{n}{n-1}\hat{\sigma}^2$ is therefore an unbiased estimator of $\sigma^2$.
$$\hat s = \sqrt{\hat s^2} = \sqrt {\frac{1}{n-1}\sum_{i=1}^n (r_i - \hat{\mu})^2} $$
Is the sample standard deviation $\hat s$ an unbiased estimator of $\sigma$?
$$ \mathbb{E}[\hat s] = \mathbb{E}[\sqrt{\hat s^2}] \le \sqrt{\mathbb{E}[\hat s^2]} = \sqrt{\sigma^2} = \sigma$$
$$\var[\hat s^2] = \frac{1}{n}\left(\mu_4 - \sigma^4 \frac{n-3}{n-1}\right) \approx \frac{\beta - 1}{n} \sigma^4$$
where $\mu_4$ is the 4th central moment and $\beta = \frac{\mu_4}{\sigma^4}$ is the kurtosis.
The MC error of $\hat \mu$ can be estimated as $\frac{\hat s}{\sqrt{n}}$:
The error of MC error is therefore $\std(\frac{\hat s}{\sqrt n}) = \frac{\std(\hat s)}{\sqrt n}$:
Use normality: $\hat s = \mu_s + \sigma_s \tilde \epsilon$, where $\tilde \epsilon \sim N(0, 1)$:
$$ \begin{array}{rl} \mathbb{E}[\hat s^2] &= \mu_s^2 + \sigma_s^2 = \sigma^2 \\ \var[\hat s^2] &= \var[ 2\mu_s\sigma_s\tilde\epsilon + \sigma_s^2 \tilde \epsilon^2] = 4\mu_s^2\sigma_s^2 + 2\sigma_s^4 \approx \frac{(\beta-1)\sigma^4}{n} \end{array}$$
Note that $\var[\tilde \epsilon^2] = \mathbb{E}[\tilde \epsilon^4] - \mathbb{E}^2[\tilde \epsilon^2] = 2$ and $\cov[\tilde \epsilon, \tilde \epsilon^2] = 0$, then:
$$\sigma_s^4 - 2 \sigma^2 \sigma_s^2 + \frac{(\beta-1)\sigma^4}{2n} = 0$$
thus $\sigma_s = \std(\hat s) \approx \sigma\sqrt{\frac{\beta-1}{4n}}$, not even close to $\sqrt[4]{\var(\hat s^2)} \approx \sigma\sqrt[4]{\frac{\beta-1}{n}}$
 | Definition | Mean | Variance |
---|---|---|---|
Population | $\tilde r$ | $\mu$ | $\sigma^2$ |
Sample mean | $\hat \mu = \frac{1}{n}\bs 1^T \bs r$ | $\mu$ | $\frac{\sigma^2}{n}$ |
(adjusted) Sample variance | $\hat s^2 = \frac{1}{n-1} \sum_i (r_i - \hat \mu)^2 $ | $\sigma^2$ | $\sim\frac{\beta - 1}{n} \sigma^4$ |
Sample stddev | $\hat s$ | $\sim\sigma \sqrt{1-\frac{\beta-1}{4n}} $ | $\sim\frac{\beta-1}{4n}\sigma^2$ |
$\sigma^2$ can be well approximated by the unadjusted sample variance:
$$\hat\sigma^2 = \frac{1}{n} \sum_i (r_i - \hat \mu)^2 = \frac{1}{n}\sum_i r_i^2 - \hat \mu^2 $$
Consider the example of computing $\pi$ using MC: $$\renewcommand{ind}{1{\hskip -2.5 pt}\hbox{l}}$$
$$\frac{\pi}{4} = \mathbb{E}\left[\ind\left(\sqrt{\tilde x^2 + \tilde y^2} < 1\right)\right]$$
where $\ind(\cdot)$ is the indicator function; the population mean, variance and 4th central moment are:
$$\begin{array}{rl} \mu &= \mathbb{E}[\tilde \ind] = \frac{\pi}{4} \\ \sigma^2 &= \var[\tilde \ind] = \mathbb{E}[\tilde \ind^2] - \mu^2 = \frac{4\pi - \pi^2}{16} \\ \mu_4 &= \mathbb{E}\left[\left(\tilde \ind - \mu\right)^4\right] = \frac{\pi}{4}(1-\frac{\pi}{4})^4 + (1-\frac{\pi}{4})(\frac{\pi}{4})^4 \\ \beta &= \frac{\mu_4}{\sigma^4} = 2.933 \end{array} $$
The variance and error of sample statistics:
u = np.pi/4
v = u*(1-u)
u4 = np.pi/4*(1-np.pi/4)**4 + (1.-np.pi/4)*(np.pi/4)**4
ns = 2**(np.arange(1, 14))
var_u = v/ns
var_var = (u4 - v*v*(ns-3)/(ns-1))/ns
beta = u4/v**2
fig = figure(figsize=[12, 4])
subplot(1, 2, 1)
loglog(ns, np.transpose([var_u, var_var]), '.-')
legend(['var of $\hat \mu$', 'var of $\hat s^2$']);
xlabel('Number of random samples')
title('Sample Variance');
ax = fig.add_subplot(122)
loglog(ns, np.transpose([np.sqrt(var_u), np.sqrt((beta-1.)/4./ns)*np.sqrt(v), np.sqrt((beta-1.))/2./ns*np.sqrt(v)]), '.-');
title('Error Estimate')
legend(['std of mean', 'std of $\hat s$', 'err of MC err'], loc='best');
xlabel('Number of random samples');
Let's revisit the estimation of a stock return's mean and volatility. Suppose:
The historical mean estimator $\hat \mu = \frac{1}{n}\sum_i r_i $ has an error of:
The vol is estimated as the standard deviation $\hat s$ of the daily return $r_i$:
Advanced MC techniques often involve dependent samples
Batching is a generic method to estimate MC errors with dependent samples
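A minimal sketch of batching (the helper name `batch_error` and the batch count are illustrative, not from the course code): split the samples into contiguous batches, whose means are approximately independent, and estimate the error of the overall mean from their spread:
import numpy as np

def batch_error(x, n_batch=20):
    # samples per batch; drop any remainder so the reshape is exact
    m = len(x)//n_batch
    batch_means = np.reshape(x[:m*n_batch], [n_batch, m]).mean(axis=1)
    # standard error of the overall mean, estimated from the batch means
    return np.std(batch_means, ddof=1)/np.sqrt(n_batch)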
The MC example of $\pi$ is really solving the integral:
$$\int_0^1\sqrt{1-x^2}dx = \frac{\pi}{4}$$
However, MC shines because its convergence is dimension independent:
Dimension | Simpson Integration Points | Equivalent MC samples | Cost Ratio (MC/Simpson) |
---|---|---|---|
1 | $10$ | $10^{8} $ | $10^{7}$ |
5 | $10^5$ | $10^{8}$ | $1000$ |
10 | $10^{10}$ | $10^{8}$ | $0.01$ |
20 | $10^{20}$ | $10^{8}$ | $10^{-12}$ |
In finance, we routinely run into very high dimensionality; every source of randomness is a separate dimension:
Most numerical methods fail at high dimensionality
x = np.arange(0, 10)*.1
y = np.arange(0, 10)*.1
xm, ym = np.meshgrid(x, y)
figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(xm, ym, '.');
xlabel('x')
ylabel('y')
title('100 Grid Samples');
subplot(1, 2, 2)
xs = np.random.uniform(size=[2, 100])
plot(xs[0,:], xs[1,:], '.');
title('100 Random Samples');
xlabel('x')
ylabel('y');
Monte Carlo is effective in combating high dimensions:
$$ \frac{v(t_0)}{m(t_0)} = \mathbb{E}^\mathbb{Q}\left[\frac{v(t)}{m(t)} \bigg| \mathcal{F}_{t_0}\right] $$
There are good reasons why Monte Carlo is widely used in quant finance:
Robert Coveyou: the generation of random numbers is too important to be left to chance.
True random number generators (T-RNG) observe random physical processes:
Pseudo random number generators (P-RNG) are computer software that:
Building good P-RNG requires highly specialized skills:
random() in Excel is implemented incorrectly, don't use it for anything serious
The most popular P-RNG:
numpy.random.uniform
P-RNGs produce uniform random numbers in $[0, 1]$
Given the cumulative distribution function $F(x)$, $\tilde x = F^{-1}(\tilde u)$ transforms a uniform $\tilde u$ to the desired distribution:
$$ \mathbb{P}[\tilde x < x]= \mathbb{P}[F^{-1}(\tilde u) < x] = \mathbb{P}[\tilde u < F(x)] = F(x) $$
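A minimal sketch of this inverse transform, using the exponential distribution where $F^{-1}(u) = -\log(1-u)/\lambda$ (the rate $\lambda$ below is an arbitrary illustrative value):
import numpy as np

lam = 1.5                                # illustrative exponential rate
us = np.random.uniform(size=100000)      # uniform samples in [0, 1)
xs = -np.log(1. - us)/lam                # x = F^{-1}(u) for F(x) = 1 - exp(-lam*x)
print(np.mean(xs), 1./lam)               # the sample mean should be close to 1/lam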
Normal random numbers are the most important of all:
Box-Muller is a classic method to transform independent uniform RNs to independent normal RNs (a minimal sketch follows below):
numpy.random.normal
Numerical inversion of normal CDF is an effective alternative to generate normal random numbers:
Normal inversion is often preferred over Box-Muller.
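For reference, a minimal sketch of the Box-Muller transform (a generic illustration; in practice use numpy.random.normal): each pair of independent uniforms maps to a pair of independent standard normals:
import numpy as np

def box_muller(n):
    # 2n independent standard normals from 2n independent uniforms
    u1 = np.random.uniform(size=n)
    u2 = np.random.uniform(size=n)
    r = np.sqrt(-2.*np.log(1. - u1))     # radial part (Rayleigh distributed)
    z1 = r*np.cos(2.*np.pi*u2)
    z2 = r*np.sin(2.*np.pi*u2)
    return np.concatenate([z1, z2])

zs = box_muller(50000)
print(np.mean(zs), np.std(zs))           # should be close to 0 and 1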
How to create correlated normal RNs from independent normals RNs?
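One standard answer is to multiply independent normals by the Cholesky factor of the target correlation (or covariance) matrix; a minimal sketch with an illustrative 3x3 correlation matrix:
import numpy as np

corr = np.array([[1., .5, .2],
                 [.5, 1., .3],
                 [.2, .3, 1.]])          # illustrative target correlation matrix
L = np.linalg.cholesky(corr)             # corr = L L^T
z = np.random.normal(size=[3, 100000])   # independent standard normals
x = L.dot(z)                             # correlated normals with correlation matrix corr
print(np.round(np.corrcoef(x), 2))       # should be close to corr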
Mark Twain: if you must be indiscrete, be discrete in your indiscretion
To simulate a stochastic differential equation (SDE) numerically, we have to discretize it first.
$s(t)$ is the true 1-D stochastic process as defined by an SDE
$\hat s(t; \delta)$ is a discretization of $s(t)$
The discretization error is application specific:
European style of derivatives: $\mathbb{E}[f(s(t))]$
Path dependent derivatives: $\mathbb{E}[f(s(\bs \tau))]$
$$\mathbb{E}[s(t) - \hat s(t; \delta)] = O(\delta^n)$$
$$\mathbb{E}\left[\left\Vert s(\bs \tau) - \hat s(\bs \tau; \delta) \right\Vert\right] = O(\delta^n)$$
Generic multi-dimensional diffusion SDE:
$$ d \bs s(t) = \bs \mu(\bs s, t) dt + \Sigma(\bs s, t) d \bs w(t) $$
The main workhorse for MC applications in quant finance:
$$\renewcommand{hs}{\hat{\bs s}} \hs(t + \delta) = \hs(t) + \bs \mu(\hs, t) \delta + \Sigma(\hs, t) L\bs \epsilon\sqrt{\delta} $$
where $\bs \epsilon$ is a vector of independent standard normals and $L$ is the Cholesky factor of the driving Brownian motions' correlation matrix.
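A minimal sketch of the Euler scheme in one dimension, assuming generic drift and diffusion callables mu(s, t) and sig(s, t) (these names and the GBM example parameters are illustrative, not the course's proc module):
import numpy as np

def euler_paths(mu, sig, s0, dt, n_steps, n_paths):
    # Euler discretization of ds = mu(s, t) dt + sig(s, t) dw
    s = np.full(n_paths, s0, dtype=float)
    path = [s.copy()]
    for i in range(n_steps):
        eps = np.random.normal(size=n_paths)
        s = s + mu(s, i*dt)*dt + sig(s, i*dt)*np.sqrt(dt)*eps
        path.append(s.copy())
    return np.array(path)                # shape [n_steps+1, n_paths]

# example: GBM with 5% drift and 20% vol, weekly steps for one year
paths = euler_paths(lambda s, t: .05*s, lambda s, t: .2*s, 100., 1./52, 52, 1000)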
But Euler discretization is slow in convergence:
For processes with known analytical solutions, more efficient sampling methods can be devised
The most common process to model asset prices:
$$\frac{dx(t)}{x(t)} = \mu dt + \sigma dw(t)$$
its analytical solution is:
$$x(t) = x(t_0) e^{(\mu - \frac{1}{2}\sigma^2)(t-t_0) + \sigma (w(t) - w(t_0))}$$
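Because the increments of $\log x(t)$ are normal with known mean and variance, GBM can be sampled exactly over arbitrary step sizes with no discretization error; a minimal sketch (not the course's proc.GBM):
import numpy as np

def gbm_exact(x0, mu, sigma, dt, n_steps, n_paths):
    # exact sampling: log-increments are N((mu - sigma^2/2) dt, sigma^2 dt)
    dlogx = (mu - .5*sigma**2)*dt + sigma*np.sqrt(dt)*np.random.normal(size=[n_paths, n_steps])
    return x0*np.exp(np.cumsum(dlogx, axis=1))     # shape [n_paths, n_steps]

xs = gbm_exact(100., .05, .25, 1./12, 12, 100000)
print(np.mean(xs[:, -1]), 100.*np.exp(.05))        # E[x(1)] = x0 exp(mu t)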
The Ornstein-Uhlenbeck (OU) process:
$$dx(t) = \kappa(\bar x - x(t)) dt + \sigma dw(t)$$
The analytical solution is $$\small x(t) = x(t_0)e^{-\kappa (t-t_0)} + \bar x (1 - e^{-\kappa (t-t_0)}) + \int_{t_0}^t\sigma e^{\kappa(s-t)}dw(s)$$
the Ito integral $e(t) = \int_{t_0}^t\sigma e^{\kappa(s-t)}dw(s)$ is normally distributed:
$$e(t) \sim N\left(0, \frac{\sigma^2}{2\kappa}(1-e^{-2\kappa (t-t_0)})\right)$$
the trajectory of $x(t)$ in the OU can be sampled exactly with arbitrary time steps by drawing $e(t)$ instead of $w(t)$:
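A minimal sketch of exact OU sampling using the distribution of $e(t)$ above (parameters are illustrative; this is not the course's proc.OU):
import numpy as np

def ou_exact(x0, k, xbar, sigma, dt, n_steps, n_paths):
    a = np.exp(-k*dt)                              # mean-reversion decay over one step
    std_e = sigma*np.sqrt((1. - a*a)/(2.*k))       # stddev of the Ito integral e(t) over dt
    x = np.full(n_paths, x0, dtype=float)
    path = [x.copy()]
    for i in range(n_steps):
        x = x*a + xbar*(1. - a) + std_e*np.random.normal(size=n_paths)
        path.append(x.copy())
    return np.array(path)

# exact sampling remains stable even with steps larger than 2/kappa
paths = ou_exact(.05, 10., .05, .02, .5, 40, 1000)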
Cox-Ingersoll-Ross (CIR) process:
$$dx(t) = \kappa(\bar x - x(t)) dt + \sigma \sqrt{x(t)} dw(t)$$
Euler discretization could lead to negative values for $x(t)$
A better solution to prevent CIR process from becoming negative is to apply the transformation $y(t) = \log(x(t))$:
$$\small dy(t) = \frac{1}{x(t)}\left(\kappa(\bar x - x(t)) - \frac{\sigma^2}{2} \right) dt + \frac{\sigma}{\sqrt{x(t)}}dw(t) $$
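A minimal sketch of an Euler scheme on the log-transformed process, which keeps $x(t) = e^{y(t)}$ strictly positive (the CIR parameters are illustrative):
import numpy as np

def cir_log_euler(x0, k, xbar, sigma, dt, n_steps, n_paths):
    # Euler steps on y = log(x); x = exp(y) can never go negative
    y = np.full(n_paths, np.log(x0))
    for i in range(n_steps):
        x = np.exp(y)
        drift = (k*(xbar - x) - .5*sigma**2)/x
        y = y + drift*dt + sigma/np.sqrt(x)*np.sqrt(dt)*np.random.normal(size=n_paths)
    return np.exp(y)

xs = cir_log_euler(.04, 1.5, .04, .2, 1./52, 52, 10000)
print(xs.min() > 0)                                # all samples stay positive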
For a diffusion SDE of the form:
$$ dx_t = u(x_t)dt + \sigma(x_t) dw_t $$
Observe that both $u(x_t)$ and $\sigma(x_t)$ are stochastic through $x_t$:
$$ d u_s = (\cdot) ds + \dot u_s \sigma_s dw_s, \;\; d\sigma_s = (\cdot) ds + \dot \sigma_s \sigma_s dw_s $$
where $\dot u_s = \frac{du(x_s)}{dx_s}, \dot \sigma_s = \frac{d\sigma(x_s)}{dx_s}$.
All corrections are of higher order than $dt$ except the $dw_u dw_s$ term (note $t < u < s < t+\delta t$):
$$ \delta x_t = u_t \delta t + \sigma_t \delta w_t + \int_t^{t+\delta t}\int_t^{s}\dot\sigma_u \sigma_s dw_u dw_s $$
The last correction term can be approximated as: $$ \small \begin{eqnarray} \int_t^{t+\delta t}\int_t^{s}\dot\sigma_u \sigma_s dw_u dw_s &\approx& \sigma_t\dot\sigma_t \int_t^{t+\delta t}\int_t^{s} dw_u dw_s = \sigma_t\dot\sigma_t \int_t^{t+\delta t} (w_{s}-w_t)dw_s \\ &=& \sigma_t\dot\sigma_t \left(\int_t^{t+\delta t} w_{s}dw_s - \int_t^{t+\delta t}w_t dw_s\right) \\ &=& \sigma_t\dot\sigma_t \left(\int_t^{t+\delta t} w_{s}dw_s - w_t(w_{t+\delta t} - w_t)\right) \end{eqnarray}$$
Note that $\int w_t dw_t = \frac{1}{2}(w_t^2 - t)$, so the term in the bracket becomes: $$ \small \frac{1}{2} (w_{t+\delta t}^2 - w_t^2) - \frac{1}{2}\delta t - w_t(w_{t+\delta t} - w_t) = \frac{1}{2}(w_{t+\delta t} - w_t)^2 - \frac{\delta t}{2} = \frac{1}{2}\left((\delta w_t)^2 - \delta t\right) $$
This term vanishes for infinitesimal $dt$ but not for a finite step size $\delta t$.
Therefore, if we sample $\delta w_t = \tilde z \sqrt{\delta t}$, we add the following correction term:
$$\delta x = u(t) \delta t + \sigma(t) \sqrt{\delta t} \tilde z + \frac{1}{2}\sigma \dot \sigma (\tilde z^2 - 1) \delta t $$
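This correction is commonly known as the Milstein scheme. A minimal sketch specialized to GBM, where $\sigma(x) = \sigma x$ so that $\sigma\dot\sigma = \sigma^2 x$ (parameters are illustrative):
import numpy as np

def milstein_gbm(x0, mu, sigma, dt, n_steps, n_paths):
    # dx = mu x dt + sigma x dw, with correction .5 sigma^2 x (z^2 - 1) dt
    x = np.full(n_paths, x0, dtype=float)
    for i in range(n_steps):
        z = np.random.normal(size=n_paths)
        x = x + mu*x*dt + sigma*x*np.sqrt(dt)*z + .5*sigma**2*x*(z*z - 1.)*dt
    return x

xs = milstein_gbm(100., .05, .25, 1./12, 12, 100000)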
Consider a one-dimensional OU process, and its Euler discretization:
$$\small \begin{array}{rl} dx(t) &= \kappa(\bar x - x(t)) dt + \sigma dw(t) \\ x(t_{i+1}) &= x(t_i) + \kappa(\bar x - x(t_i)) \delta + \sqrt{\delta} \sigma \epsilon_i \\ &= (1 - \kappa \delta) x(t_i) + \kappa \bar x \delta + \sqrt{\delta} \sigma \epsilon_i \\ \mathbb{E}[x(t_{i+1})] &= (1 - \kappa \delta) \mathbb{E}[x(t_i)] + \kappa \bar x \delta \end{array}$$
the expectation explodes if $| 1 - \kappa \delta | > 1$, i.e., $\delta > \frac{2}{\kappa}$
import proc
u = .05
k = 10.
dt = 2/k
n = int(20./dt)
sigma = .02
ou = proc.OU(k=k, u=u, sig=sigma)
es = np.random.normal(size=[1, n])
et = .01
figure(figsize=[12, 4]);
subplot(1, 2, 1)
dt = 2/k - et
paths = ou.draw(es, u, dt)
plot(np.arange(n)*dt, paths.T)
xlabel('Time(Y)', fontsize=14);
title('Euler sample path with $\delta = %g, \kappa=%g$' % (dt, k), fontsize=16);
subplot(1, 2, 2)
dt = 2/k + et
paths = ou.draw(es, u, dt)
plot(np.arange(n)*dt, paths.T)
xlabel('Time(Y)', fontsize=14);
title('Euler sample path with $\delta = %g, \kappa=%g$' % (dt, k), fontsize=16);
Suppose we draw $n$ independent paths of $\hat s(\bs \tau; \delta)$, where $\bs \tau$ consists of $m$ discrete time samples.
There are two types of errors in Monte Carlo simulation of SDEs:
The two types of errors are independent, thus the total variance is:
$$ \epsilon^2 = c_1 \epsilon_d^2 + c_2\epsilon_{s}^2 = \frac{c_1}{m^{2\gamma}} + \frac{c_2}{n}$$
Given a fixed computation budget of $mn = c$, the best allocation between $m$ and $n$ can be found by solving:
$$\begin{array}{c} \min_{m, n} \left(\frac{c_1}{m^{2\gamma}} + \frac{c_2}{n}\right) \\ \text{subject to: } mn = c \end{array}$$
apply the Lagrange multiplier:
$$\small \begin{array}{rl} l &= \left(\frac{c_1}{m^{2\gamma}} + \frac{c_2}{n}\right) + \lambda(mn - c) \\ \frac{\partial l}{\partial m} &= -c_1\frac{2\gamma}{m^{2\gamma+1}} + \lambda n = 0 \\ \frac{\partial l}{\partial n} &= -c_2\frac{1}{n^2} + \lambda m = 0 \\ \lambda mn &= \frac{2c_1\gamma}{m^{2\gamma}} = \frac{c_2}{n} \end{array}$$
therefore the optimal strategy is to maintain $\frac{m^{2\gamma}}{n} = \frac{2c_1\gamma}{c_2}$
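A minimal numeric illustration of this allocation rule, with illustrative constants $c_1$, $c_2$, $\gamma$ and budget $c$ (the values below are made up for the example):
import numpy as np

c1, c2, gamma, c = 1., 4., 1., 1e6       # illustrative error constants and budget
# optimality condition m^{2 gamma}/n = 2 c1 gamma/c2 with n = c/m gives
# m = (2 c1 gamma c / c2)^(1/(2 gamma + 1))
m = (2.*c1*gamma*c/c2)**(1./(2.*gamma + 1.))
n = c/m
err2 = c1/m**(2.*gamma) + c2/n           # total variance at the optimum
print(m, n, err2)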
Recall Euler discretization is of 1st order weak convergence:
$$\mathbb{E}[f(\hs(t; \delta))] = \mathbb{E}[f(\bs s(t))] + c \delta + O(\delta^2)$$
Richardson extrapolation can be applied to get 2nd order weak convergence:
$$\begin{array}{rl} \mathbb{E}[f(\hs(t; 2\delta))] &= \mathbb{E}[f(\bs s(t))] + 2c \delta + O(\delta^2) \\ 2\mathbb{E}[f(\hs(t; \delta))] - \mathbb{E}[f(\hs(t; 2\delta))] &= \mathbb{E}[f(\bs s(t))] + O(\delta^2) \\ \mathbb{E}[2f(\hs(t; \delta)) - f(\hs(t; 2\delta))] &= \mathbb{E}[f(\bs s(t))] + O(\delta^2) \end{array}$$
Therefore the estimator $2f(\hs(t; \delta)) - f(\hs(t; 2\delta))$ has 2nd order weak convergence.
The variance of the estimator with Richardson extrapolation is:
$$\small \begin{array}{l} \var[2f(\hs(t; \delta)) - f(\hs(t; 2\delta))] \\ = \var\left[2f(\hs(t; \delta))\right] + \var\left[f(\hs(t; 2\delta))\right] - 2 \cov\left[2f(\hs(t; \delta)), f(\hs(t; 2\delta))\right] \\ = 4\sigma_\delta^2 + \sigma_{2\delta}^2 - 4\rho\sigma_\delta\sigma_{2\delta} \gtrsim 3\sigma^2 \text{ when } \rho \lesssim \tfrac{1}{2} \end{array}$$
The following implementation maximizes the correlation between $f(\hs(t; \delta))$ and $f(\hs(t; 2\delta))$:
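A minimal sketch of one way to do this (not necessarily the notebook's implementation): drive the coarse $2\delta$ path with the pairwise sums of the fine path's Brownian increments, so both estimators share the same randomness. The GBM call payoff below is an illustrative example:
import numpy as np

def euler_gbm_terminal(x0, mu, sigma, dt, dws):
    # Euler-discretized GBM terminal values driven by given Brownian increments dws
    x = np.full(dws.shape[0], x0, dtype=float)
    for i in range(dws.shape[1]):
        x = x + mu*x*dt + sigma*x*dws[:, i]
    return x

x0, mu, sigma, t, m_steps, n = 100., .05, .25, 1., 64, 100000
dt = t/m_steps
dw = np.random.normal(scale=np.sqrt(dt), size=[n, m_steps])   # fine-grid increments
dw2 = dw[:, ::2] + dw[:, 1::2]                                # coarse-grid increments
f_fine = np.maximum(euler_gbm_terminal(x0, mu, sigma, dt, dw) - 100., 0)
f_coarse = np.maximum(euler_gbm_terminal(x0, mu, sigma, 2.*dt, dw2) - 100., 0)
v = np.mean(2.*f_fine - f_coarse)        # Richardson-extrapolated estimator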
Lewis Carroll: it's a poor sort of memory that only works backwards.
Early exercise (American or Bermudan option):
The value of American put at time $t_i$ is: $$ v(t_i) = \max\left(\left(k-s\left(t_i\right)\right)^+, \mathbb{E}\left[v(t_{i+1})e^{-r\delta}\biggr|\mathcal{F}_{t_i}\right]\right)$$
Similarly, the value of American call at time $t_i$ is: $$ v(t_i) = \max\left(\left(s(t_i)-k\right)^+, \mathbb{E}\left[v(t_{i+1})e^{-r\delta}\biggr|\mathcal{F}_{t_i}\right]\right)$$
these equations are commonly known as the Bellman equation.
Is it optimal to early exercise?
The optimal exercise boundary of American put is a function of stock price and time: $b(s(t), t)$
The American option's value can be expressed as an optimization of:
$$ v(b^*(t)) = \max_{b(t)} [v\left(b\left(t\right)\right)] $$
This type of free boundary problem poses a serious challenge to MC
Before 2000, it was generally accepted that MC cannot solve these free boundary problems.
We draw 400,000 stock paths to 1Y with weekly time steps ($\delta = 1/52$) from the GBM:
t, m, r, s0, k, vol = 1., 52, .1, 100, 100, .25
dt = t/m
n = 400000
ts = np.arange(1, m+1)*dt
gbm = proc.GBM(r, vol)
spath = gbm.draw(np.random.normal(size=[n, m]), s0, dt)
spath2 = gbm.draw(np.random.normal(size=[n, m]), s0, dt)
eputs = np.maximum(k - spath[:,-1], 0)*np.exp(-r*t)
u = np.mean(eputs)
e = np.std(eputs)/sqrt(n)
fig = figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(ts, spath[:100].T);
xlabel('Time(Y)')
title('First 100 simulated stock path');
ax = fig.add_subplot(122)
ax.set_axis_off()
ax.text(0, .5, "Europan put price is %.2f, MC err %.2g" % (u, e), size="xx-large");
ax.text(0, .6, "True value is %.2f" % (5.46), size="xx-large");
Given the simulated path, one may propose the following simple method to price American option:
$$v(0) = \frac{1}{n} \sum_i e^{-rt^*_i}\left(k-s(t^*_i)\right)^+$$
where $t^*_i$ is the time that maximizes the discounted exercise value along the $i$-th path.
This algorithm is easy to implement, but ...
import fmt
truv = 6.523
disc = np.exp(-r*ts)
to = [np.argmax(np.maximum(k-p, 0)*disc) for p in spath]
v = np.mean([np.max(np.maximum(k-p, 0)*disc) for p in spath])
ne = 1000
te, se = map(np.array, zip(*[((o+1)*dt, spath[i, o]) for i, o in enumerate(to[:ne])]))
me = np.argmin(se)
df = pd.DataFrame(np.array([v, truv]), index=['Pathwise Optimal', 'True Value'], columns=['American Put'])
fmt.displayDF(df.T, "3f", 4)
 | Pathwise Optimal | True Value |
---|---|---|
American Put | 12.253 | 6.523 |
figure(figsize=[12, 4])
subplot(1, 2, 1)
nidx = np.array([0, 1, 2, me])
plot(ts, spath[nidx].T)
plot(te[nidx], se[nidx], 'o')
xlabel('Time (Y)')
title('Pathwise optimal exercise')
subplot(1, 2, 2)
plot(te, se, "o");
xlabel('Time (Y)')
title('Exercise points');
The reason is that it uses future information, i.e., it exercises with supernatural foresight:
Least squares Monte Carlo (LSMC) is an important breakthrough in MC; it is the method of choice for early exercise.
The key insight is that the conditional expectation in Bellman equation can be solved by regression and backward-induction.
$$ v(t_i) = \max\left(\left(k-s\left(t_i\right)\right)^+, \mathbb{E}\left[v(t_{i+1})e^{-r\delta}\biggr|\mathcal{F}_{t_i}\right]\right)$$
We present it using a numerical example of American put:
import mc
put_ev = lambda x : np.maximum(k - x, 0)
v_b, e_b, b_b, cvs = mc.amer_opt(spath, exp(-r*dt), put_ev, mc.fit_ex)
v_u, e_u, b_u, _ = mc.amer_opt(spath, exp(-r*dt), put_ev, mc.opt_ex)
The key idea of LSMC is to work backwards using the stored paths:
First, initialize the value at maturity $t_m$ as $\bs v(t_m) = \left(k - \hs(t_m)\right)^+$
idx = -1
fig = figure(figsize=[12, 4])
ax1 = fig.add_subplot(121)
plot(ts, spath[:20, :].T, label=None)
axvline(ts[idx], lw=5, label=None)
plot(ts[idx], k - b_b[idx], 'r>', ms=15)
xlabel('Time', fontsize=16)
title('Stock Paths', fontsize= 18)
subplot(1, 2, 2)
plot(spath[:500,idx], cvs[:500, idx], '.')
xlim(40, 120)
ylim(-10, 60)
plot(k, 0, 'r>', ms=15)
xlabel('Stock price $s(t = %g)$' % ts[idx], fontsize=16)
title('Payoff $v(t = %g)$' % (ts[idx]), fontsize=18);
def show_lsmc_step(idx, ne=500) :
    # left panel: simulated stock paths and the time slice being fitted
    fig = figure(figsize=[12, 4])
    ax1 = fig.add_subplot(121)
    plot(ts, spath[:20, :].T, label=None)
    plot(ts[idx], k - b_b[idx], 'r>', ms=15, label=None)
    axvline(ts[idx], lw=2)
    xlabel('Time', fontsize=16)
    ylabel('Price', fontsize=16)
    title('Stock Paths', fontsize=18)
    legend(['LSMC fit time'], loc='best')
    # right panel: regression of the continuation value on the stock price
    subplot(1, 2, 2)
    s = spath[:, idx]
    cv = cvs[:, idx+1]
    u = mc.fit_u(s, cv, put_ev(s))
    sd = np.arange(10., 120., 1)
    itm = np.greater(k, s[:ne])
    otm = np.greater(s[:ne], k)
    plot(s[:ne][itm], cv[:ne][itm], 'b.');
    plot(s[:ne][otm], cv[:ne][otm], 'g.');
    plot(sd, u(sd), 'r')
    plot(sd, np.maximum(k-sd, 0), 'k')
    plot(k - b_b[idx], b_b[idx], 'r>', ms=15)
    legend(['in the money', 'out of the money', 'LS fit', 'exer value']);
    xlim(40, 120)
    ylim(-10, 60)
    xlabel('Current stock price $s(t=%.2f)$' % (ts[idx]), fontsize=16);
    ylabel('Continuation value', fontsize=16)
    title('Least Square Fit @ t=%.2f' % (ts[idx]), fontsize=18);
show_lsmc_step(-2)
show_lsmc_step(-3)
show_lsmc_step(-20)
show_lsmc_step(-40)
show_lsmc_step(0)
We reached the first time step $t_1 = \delta$, where the option value can be computed as the average of the discounted $\bs v(t_1)$, i.e., $v(0) =\frac{1}{n} e^{-r\delta}\bs 1^T \bs v(t_1)$.
import fmt
df = pd.DataFrame(np.array([v_b, truv]), index=['LSMC MC', 'True Value'], columns=['American Put'])
fmt.displayDF(df.T, "3f", 3)
 | LSMC MC | True Value |
---|---|---|
American Put | 6.493 | 6.523 |
at time $t_{m-1}$: $\hs(t_{m-1})$ and the corresponding continuation values $\bs v(t_m)e^{-r\delta}$ are known:
construct a smooth function $u(s; t_{m-1})$ that minimizes
$$\left\Vert \ind\left(\hs(t_{m-1}) < k\right) \odot \left( u\left(\hs(t_{m-1}); t_{m-1}\right) - \bs v(t_m)e^{-r\delta} \right)\right\Vert_2 $$
where $\odot$ is element-wise vector multiplication.
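A minimal sketch of this regression step with a quadratic polynomial basis (not the course's mc.fit_ex; here s holds the simulated prices at $t_{m-1}$ and v_next the values at $t_m$):
import numpy as np

def ls_continuation(s, cv, strike):
    # fit u(s) = c0 + c1 s + c2 s^2 by least squares, using in-the-money paths only
    itm = s < strike
    c = np.polyfit(s[itm], cv[itm], 2)
    return np.polyval(c, s)

def lsmc_step(s, v_next, strike, disc):
    # one backward-induction update of the Bellman equation
    cv = v_next*disc                     # discounted continuation value per path
    u = ls_continuation(s, cv, strike)   # smooth fit of the continuation value
    exer = np.maximum(strike - s, 0)     # exercise value of the put
    # exercise where the intrinsic value beats the fitted continuation value
    return np.where((exer > 0) & (exer > u), exer, cv)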
plot(ts, k-b_b, '.-')
xlabel('Time', fontsize=16)
title('Exercise boundary', fontsize=18);
Exercise boundary is defined by:
$$ u(s; t_{m-1}) = c_0 + c_1 s + c_2 s^2 < k - s $$
One would think LSMC is biased low because of the inaccurate exercise boundary; is it really?
Consider the following two ways of pricing American option:
we can conclude the following (barring the discretization and MC errors):
Therefore, 1B is not necessarily a lower bound, due to the offsetting effects.
Given that 1A is a reliable lower bound, it is useful to find an upper bound:
We present a simple upper bound that is the most useful when the exercise boundary has simple topology.
A more precise upper bound can be found by solving for the optimal exercise boundary $b^*(t)$ by going backwards from the simulated paths:
This results in an upper bound because we did use the future information when solving for $b^*(t)$.
fig = figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(ts, k-np.transpose([b_a, b_u]), '.-')
xlabel('Time(Y)')
title('Exercise Boundary')
legend(['LSMC', 'Optimize $b^*$'], loc='best');
ylim(75, 100);
ax = fig.add_subplot(122)
ax.set_axis_off()
ax.text(0, .6, "Option value is bounded by [%.3f, %.3f]" % (v_a, v_u), size="xx-large")
ax.text(0, .5, "LSMC (1B) value is %.3f" % v_b, size="xx-large");
ax.text(0, .4, "True value is %.3f" % truv, size="xx-large");
The path-wise LSMC values are dependent
To estimate the exact LSMC error, we have to use batching
def lsmc_batch(b, n) :
    bvs = []
    errs = []
    spath2 = gbm.draw(np.random.normal(size=[n, m]), s0, dt)
    disc1p = exp(-r*dt)
    for i in range(b) :
        spath = gbm.draw(np.random.normal(size=[n, m]), s0, dt)
        # LSMC value (regression boundary) and upper bound (optimized boundary)
        v_b, e_b, b_b, _ = mc.amer_opt(spath, disc1p, put_ev, mc.fit_ex)
        v_u, e_u, b_u, _ = mc.amer_opt(spath, disc1p, put_ev, mc.opt_ex)
        # lower bound: apply the fitted boundary to an independent set of paths
        fixed_b.b = b_b
        fixed_b.idx = len(b_b)
        v_a, e_a, b_a, _ = mc.amer_opt(spath2, disc1p, put_ev, fixed_b)
        bvs.append((v_a, v_b, v_u))
        errs.append((e_a, e_b, e_u))
        spath2 = spath
    return np.array(bvs), np.array(errs)
bvs, errs = lsmc_batch(10, n)
import fmt
import pandas as pd
df = pd.DataFrame(np.concatenate((bvs.T, errs.T)),
columns = ['Batch %d' % (i+1) for i in range(10)],
index=['LSMC - LB', 'LSMC', 'LSMC - UB',
'err - LB', 'LSMC err ', 'err - UB'])
sd = np.std(df.T)
u = np.mean(df.T)
df['Batch Mean'] = u
df['Batch Std'] = sd
idx = [0, 1, 2, 4]
fmt.displayDF(df.iloc[idx, :], "3f", 2)
 | Batch 1 | Batch 2 | Batch 3 | Batch 4 | Batch 5 | Batch 6 | Batch 7 | Batch 8 | Batch 9 | Batch 10 | Batch Mean | Batch Std |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LSMC - LB | 6.507 | 6.492 | 6.486 | 6.498 | 6.519 | 6.511 | 6.497 | 6.517 | 6.486 | 6.517 | 6.503 | 0.012 |
LSMC | 6.506 | 6.499 | 6.514 | 6.530 | 6.526 | 6.507 | 6.528 | 6.496 | 6.528 | 6.537 | 6.517 | 0.014 |
LSMC - UB | 6.548 | 6.540 | 6.554 | 6.573 | 6.563 | 6.550 | 6.568 | 6.533 | 6.571 | 6.582 | 6.558 | 0.015 |
LSMC err | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.000 |
The main application is exotic options whose exercise decisions depend on multiple factors.
Recommended reading:
Homework: