%pylab inline
import pandas as pd
import sys
import fmt
Populating the interactive namespace from numpy and matplotlib
Topics:
$\renewcommand{bt}[1]{\tilde{\boldsymbol #1}}$
Monte Carlo: a town in Monaco principality in SE France, gambling resort.
$\renewcommand{hs}{\hat{\bs s}}$ Monte carlo simulation:
Typical applications are:
Compute the value of $\pi$ using Monte Carlo:
def sim_pi(ns) :
es = np.random.uniform(size=[ns[-1], 2])
d = np.array([x*x + y*y < 1. for x, y in es])
return np.array([4.*np.sum(d[:n])/n for n in ns])
ns = 4**(np.arange(8))*100
ps = sim_pi(ns)
figure(figsize=[10, 4])
subplot(1, 2, 1)
es = np.random.uniform(size=[500, 2])
d = np.array([x*x + y*y < 1. for x, y in es])
dn = np.logical_not(d)
plot(es[d,0], es[d,1], 'r.');
plot(es[dn,0], es[dn,1], 'b.');
x = np.arange(0, 1.01, .01)
plot(x, np.sqrt(1-x*x), 'k');
subplot(1, 2, 2)
semilogx(ns, ps, '.-');
axhline(np.pi, color='r')
legend(['Simulated', 'True Value'], loc='best')
xlabel('Number of dots')
title('Simulation of $\pi$');
Given a random variable $\renewcommand{bs}{\boldsymbol} \renewcommand{var}{\text{var}} \renewcommand{std}{\text{std}} \renewcommand{cov}{\text{cov}} \tilde{r}$, its true statistics are referred as the population statistics:
We use $\bs r = [r_1, r_2, ..., r_n]^T$ to represent samples of $\tilde r$:
The samples themselves are random as well, we can define their statistics:
$\hat{u}$ is the mean of the samples:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^n r_i = \frac{1}{n} \bs 1^T \bs r$$
$\hat u$ is an unbiased estimator of the population mean $\mu$:
$$\mathbb{E}[\hat{\mu}] = \frac{1}{n} \bs 1^T \bs 1 \mu = \mu $$
the variance of sample mean:
$$\var[\hat{\mu}] = \cov(\frac{1}{n} \bs 1^T \bs r, \frac{1}{n} \bs 1^T \bs r) = \frac{1}{n^2} \bs 1^T V \bs 1 = \frac{1}{n^2} \bs 1^T \sigma^2 I \bs 1 = \frac{\sigma^2}{n}$$
$\hat \sigma^2$ is the variance of the sample:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (r_i - \hat{\mu})^2 = \frac{1}{n} (\bs r - \bs 1 \hat{\mu})^T (\bs r - \bs 1 \hat{\mu})$$
$\hat{\sigma}^2$ is an biased estimator of the population variance $\sigma^2$:
$$ \begin{array} \\ \mathbb{E}[\hat{\sigma}^2] &= \mathbb{E}\left[\frac{1}{n}\left(\bs r^T \bs r - 2 \bs r^T \bs 1 \hat \mu + \bs 1^T \bs 1 \hat \mu^2\right)\right] = \mathbb{E}\left[\frac{1}{n}\bs r^T \bs r - \hat \mu^2\right] \\ &= \mathbb{E}[\tilde r^2] - \mathbb{E} [\hat u^2] = (\sigma^2 + \mu^2) - (\frac{\sigma^2}{n} + \mu^2) = \frac{n-1}{n} \sigma^2 \end{array} $$
$\hat{s}^2 = \frac{n}{n-1}\hat{\sigma}^2$ is therefore an unbiased estimator of $\sigma^2$.
$$\hat s = \sqrt{\hat s^2} = \sqrt {\frac{1}{n-1}\sum_{i=1}^n (r_i - \hat{\mu})^2} $$
Is standard deviation is an unbiased estimator of $\sigma$?
$$ \mathbb{E}[\hat s] = \mathbb{E}[\sqrt{\hat s^2}] \le \sqrt{\mathbb{E}[\hat s^2]} = \sqrt{\sigma^2} = \sigma$$
$$\var[\hat s^2] = \frac{1}{n}(\mu_4 - \sigma^4 \frac{n-3}{n-1}) \approx \frac{\beta - 1}{n} \sigma^4$$
The MC error of $\hat \mu$ can be estimated as $\frac{\hat s}{\sqrt{n}}$:
The error of MC erorr is therefore $\std(\frac{\hat s}{\sqrt n}) = \frac{\std(\hat s)}{\sqrt n}$:
Use normality: $\hat s = \mu_s + \sigma_s \tilde \epsilon$, where $\tilde \epsilon \sim N(0, 1)$:
$$ \begin{array}{l} \mathbb{E}[\hat s^2] &= \mu_s^2 + \sigma_s^2 = \sigma^2 \\ \var[\hat s^2] &= \var[ 2\mu_s\sigma_s\tilde\epsilon + \sigma_s^2 \tilde \epsilon^2] = 4\mu_s^2\sigma_s^2 + 2\sigma_s^4 \approx \frac{(\beta-1)\sigma^4}{n} \end{array}$$
Note that $\var[\tilde \epsilon^2] = \mathbb{E}[\tilde \epsilon^4] - \mathbb{E}^2[\tilde \epsilon^2] = 2$ and $\cov[\tilde \epsilon, \tilde \epsilon^2] = 0$, then:
$$\sigma_s^4 - 2 \sigma^2 \sigma_s^2 + \frac{(\beta-1)\sigma^4}{2n} = 0$$
thus $\sigma_s = \std(\hat s) \approx \sigma\sqrt{\frac{\beta-1}{4n}}$, not even close to $\sqrt[4]{\var(\hat s^2)} \approx \sigma\sqrt[4]{\frac{\beta-1}{n}}$
Definition | Mean | Variance | |
---|---|---|---|
Population | $\tilde r$ | $\mu$ | $\sigma^2$ |
Sample mean | $\hat u = \frac{1}{n}\bs 1^T \bs r$ | $\mu$ | $\frac{\sigma^2}{n}$ |
(adjusted) Sample variance |
$\hat s^2 = \frac{1}{n-1} \sum_i (r_i - \hat \mu)^2 $ | $\sigma^2$ | ~$\frac{\beta - 1}{n} \sigma^4$ |
Sample stddev | $\hat s$ | ~$\sigma \sqrt{1-\frac{\beta-1}{4n}} $ | ~$\frac{\beta-1}{4n}\sigma^2$ |
$\sigma^2$ can be well approximated by the unadjusted sample variance:
$$\hat\sigma^2 = \frac{1}{n} \sum_i (r_i - \hat \mu)^2 = \frac{1}{n}\sum_i r_i^2 - \hat \mu^2 $$
Consider the example of computing $\pi$ using MC: $$\renewcommand{ind}{1{\hskip -2.5 pt}\hbox{l}}$$
$$\frac{\pi}{4} = \mathbb{E}\left[\ind\left(\sqrt{\tilde x^2 + \tilde y^2} < 1\right)\right]$$
where $\mathbb{\ind}(\cdot)$ is an indicator function, the population mean, variance and 4th moment are:
$$\begin{array}{l} \mu &= \mathbb{E}[\tilde \ind] = \frac{\pi}{4} \\ \sigma^2 &= \var[\tilde \ind] = \mathbb{E}[\tilde \ind^2] - \mu^2 = \frac{4\pi - \pi^2}{16} \\ \mu_4 &= \mathbb{E}\left[\left(\tilde \ind - \mu\right)^4\right] = \frac{\pi}{4}(1-\frac{\pi}{4})^4 + (1-\frac{\pi}{4})(\frac{\pi}{4})^4 \\ \beta &= \frac{\mu_4}{\sigma^4} = 2.933 \end{array} $$
The variance and error of sample statistics:
u = np.pi/4
v = u*(1-u)
u4 = np.pi/4*(1-np.pi/4)**4 + (1.-np.pi/4)*(np.pi/4)**4
ns = 2**(np.arange(1, 14))
var_u = v/ns
var_var = (u4 - v*v*(ns-3)/(ns-1))/ns
beta = u4/v**2
fig = figure(figsize=[12, 4])
subplot(1, 2, 1)
loglog(ns, np.transpose([var_u, var_var]), '.-')
legend(['var of $\hat \mu$', 'var of $\hat s^2$']);
xlabel('Number of random samples')
title('Sample Variance');
ax = fig.add_subplot(122)
loglog(ns, np.transpose([np.sqrt(var_u), np.sqrt((beta-1.)/4./ns)*np.sqrt(v)]), '.-');
title('Error Estimate')
legend(['std of mean', 'std of $\hat s$'], loc='best');
xlabel('Number of random samples');
Let's revisit the estimation of stock return's mean and volatility, suppose:
The historical mean estimator $\hat u = \frac{1}{n}\sum_i r_i $ has an error of:
The vol is estimated as the standard deviation $\hat s$ of the daily return $r_i$:
print .25/sqrt(2500*2)
print .25/sqrt(10)
0.00353553390593 0.0790569415042
Advanced MC techniques often involves dependent samples
Batching is a generic method to estimate MC errors with dependent samples
The MC example of $\pi$ is really to solve the integral of:
$$\int_0^1\sqrt{1-x^2}dx = \frac{\pi}{4}$$
However, MC shines because its convergence is dimension independent:
Dimension | Simpson Integration Points | Equivalent MC samples | Cost Ratio (MC/Simpson) |
---|---|---|---|
1 | $10$ | $10^{8} $ | $10^{7}$ |
5 | $10^5$ | $10^{8}$ | $1000$ |
10 | $10^{10}$ | $10^{8}$ | $0.01$ |
20 | $10^{20}$ | $10^{8}$ | $10^{-12}$ |
In Finance, we routinely run into very high dimensionalities, every source of randomness is a separate dimension:
Most numerical methods fail at high dimensionality
x = np.arange(0, 10)*.1
y = np.arange(0, 10)*.1
xm, ym = np.meshgrid(x, y)
figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(xm, ym, '.');
xlabel('x')
ylabel('y')
title('100 Grid Samples');
subplot(1, 2, 2)
xs = np.random.uniform(size=[2, 100])
plot(xs[0,:], xs[1,:], '.');
title('100 Random Samples');
xlabel('x')
ylabel('y');
Monte Carlo is effective in combating high dimensions:
$$ \frac{v(t_0)}{m(t_0)} = \mathbb{E}^\mathbb{Q}\left[\frac{v(t)}{m(t)} \bigg| \mathcal{F}_{t_0}\right] $$
There are good reasons why Monte carlo is widely used in quant finance:
Robert Coveyou: the generation of random numbers is too important to be left to chance.
True random number generators (T-RNG) observe random physical processess:
Pseudo random number generators (P-RNG) are computer software that
Building good P-RNG requires highly specialized skills:
random()
in Excel is implemented incorrectly, don't use it for anything seriousThe most popular P-RNG:
numpy.random.uniform
P-RNGs produces uniform random numbers bewteen [0, 1]
Given the cumulative distribution function $F(x)$, $\tilde x = F^{-1}(\tilde u)$ transforms a uniform $\tilde u$ to the desired distribution:
$$ \mathbb{P}[\tilde x < x]= \mathbb{P}[F^{-1}(\tilde u) < x] = \mathbb{P}[\tilde u < F(x)] = F(x) $$
Normal random number is the most important of all:
Box-muller is a classic method to transform independent uniform RNs to independent normal RNs:
numpy.random.normal
Numerical inversion of normal CDF is an effective alternative to genrate normal random numbers:
Normal inversion is often preferred over the Box-muller.
How to create correlated normal RNs from independent normals RNs?
Mark Twain: if you must be indiscrete, be discrete in your indiscretion
To simulate a stochastic differential equation (SDE) numerically, we have to discretize it first.
$s(t)$ is the true 1-D stochastic processes as defined by a SDE
$\hat s(t; \delta)$ is a discretization of $s(t)$
The discretization error are application specific:
European style of derivatives: $\mathbb{E}[f(s(t))]$
Path dependent derivatives: $\mathbb{E}[f(s(\bs \tau))]$
$$\mathbb{E}[s(t) - \hat s(t; \delta)] = O(\delta^n)$$
$$\mathbb{E}\left[\left\Vert s(\bs \tau) - \hat s(\bs \tau; \delta) \right\Vert\right] = O(\delta^n)$$
Generic multi-dimensional diffusion SDE:
$$ d \bs s(t) = \bs \mu(\bs s, t) dt + \Sigma(\bs s, t) d \bs w(t) $$
The main workhorse for MC applications in quant finance:
$$\renewcommand{hs}{\hat{\bs s}} \hs(t + \delta) = \hs(t) + \bs \mu(\hs, t) \delta + \Sigma(\hs, t) L\bs \epsilon\sqrt{\delta} $$
But Euler discretization is slow in convergence:
For processess with known analytical solutions, more efficient sampling method can be devised
The most common process to model asset prices:
$$\frac{dx(t)}{x(t)} = \mu dt + \sigma dw(t)$$
its analytical solution is:
$$x(t) = x(t_0) e^{(\mu - \frac{1}{2}\sigma^2)(t-t_0) + \sigma w(t)}$$
$$dx(t) = \kappa(\bar x - x(t)) dt + \sigma dw(t)$$
The analytical solution is $$\small x(t) = x(t_0)e^{-\kappa (t-t_0)} + \bar x (1 - e^{-\kappa (t-t_0)}) + \int_{t_0}^t\sigma e^{\kappa(s-t)}dw(s)$$
the Ito integral $e(t) = \int_{t_0}^t\sigma e^{\kappa(s-t)}dw(s)$ is normally distributed:
$$e(t) \sim N\left(0, \frac{\sigma^2}{2\kappa}(1-e^{-2\kappa (t-t_0)})\right)$$
the trajectory of $x(t)$ in the OU can be sampled exactly with arbitrary time steps by drawing $e(t)$ instead of $w(t)$:
Cox-Ingersoll-Ross (CIR) process:
$$dx(t) = \kappa(x(t) - \bar x) dt + \sigma \sqrt{x(t)} dw(t)$$
Euler discretization could lead to negative values for $x(t)$
A better solution to prevent CIR process from becoming negative is to apply the transformation $y(t) = \log(x(t))$:
$$\small dy(t) = \frac{1}{x(t)}\left(\kappa(x(t) - \bar x) - \frac{\sigma^2}{2} \right) dt + \frac{\sigma}{\sqrt{x(t)}}dw(t) $$
For a diffusion SDE of the form:
$$ dx_t = u(x_t)dt + \sigma(x_t) dw_t $$
Observe that both $u(x_t)$ and $\sigma(x_t)$ are stochastic through $x_t$:
$$ d u_s = (\cdot) ds + \dot u_s \sigma_s dw_s, \;\; d\sigma_s = (\cdot) ds + \dot \sigma_s \sigma_s dw_s $$
where $\dot u_s = \frac{du(x_s)}{dx_s}, \dot \sigma_s = \frac{d\sigma(x_s)}{dx_s}$.
All corrections are higher order than $d t$ except the $dw_s dw_t$ term (note $t < u < s < t+\delta t$):
$$ \delta x_t = u_t \delta t + \sigma_t \delta w_t + \int_t^{t+\delta t}\int_t^{s}\dot\sigma_u \sigma_s dw_u dw_s $$
The last correction term can be approximated as: $$ \small \begin{eqnarray} \sigma_t\int_t^{t+\delta t}\int_t^{s}\dot\sigma_u dw_u dw_s &\approx& \sigma_t\dot\sigma_u \int_t^{t+\delta t}\int_t^{s} dw_u dw_s = \sigma_t\dot\sigma_t \int_t^{t+\delta t} (w_{s}-w_t)dw_s \\ &=& \sigma_t\dot\sigma_t \left(\int_t^{t+\delta t} w_{s}dw_s - \int_t^{t+\delta t}w_t dw_s\right) \\ &=& \sigma_t\dot\sigma_t \left(\int_t^{t+\delta t} w_{s}dw_s - w_t(w_{t+\delta t} - w_t)\right) \end{eqnarray}$$
Note that $\int w_t dw_t = \frac{1}{2}(w_t^2 - t)$, the term in the bracket becomes: $$ \small \frac{1}{2} (w_{t+\delta t}^2 - w_t^2) - \frac{1}{2}\delta t - w_t(w_{t+\delta t} - w_t) = \frac{1}{2}(w_{t+\delta t} - w_t)^2 - \frac{\delta t}{2} \approx \frac{1}{2}(\delta^2w - \delta t) $$
This term vanishes for infinitesimal $dt$ but not for finited sample size $\delta t$.
Therefore, if we sample $\delta w_t = \tilde z \sqrt{\delta t}$, we add the following correction term:
$$\delta x = u(t) \delta t + \sigma(t) \sqrt{\delta t} \tilde z + \frac{1}{2}\sigma \dot \sigma (\tilde z^2 - 1) \delta t $$
Consider a one-dimensional OU process, and its Euler discretization:
$$\small \begin{array} \\ dx(t) &= \kappa(\bar x - x(t)) dt + \sigma dw(t) \\ x(t_{i+1}) &= x(t_i) + \kappa(\bar x - x(t_i)) \delta + \sqrt{\delta} \sigma \epsilon_i \\ &= (1 - \kappa \delta) x(t_i) + \kappa \bar x \delta + \sqrt{\delta} \sigma \epsilon_i \\ \mathbb{E}[x(t_{i+1})] &= (1 - \kappa \delta) \mathbb{E}[x(t_i)] + \kappa \bar x \delta \end{array}$$
the expectation explodes if the $| 1 - \kappa \delta | > 1$, i.e., $\delta > \frac{2}{\kappa}$
import proc
u = .05
k = 10.
dt = 2/k
n = int(20./dt)
sigma = .02
ou = proc.OU(k=k, u=u, sig=sigma)
es = np.random.normal(size=[1, n])
et = .01
figure(figsize=[12, 4]);
subplot(1, 2, 1)
dt = 2/k - et
paths = ou.draw(es, u, dt)
plot(np.arange(n)*dt, paths.T)
xlabel('Time(Y)', fontsize=14);
title('Euler sample path with $\delta = %g, \kappa=%g$' % (dt, k), fontsize=16);
subplot(1, 2, 2)
dt = 2/k + et
paths = ou.draw(es, u, dt)
plot(np.arange(n)*dt, paths.T)
xlabel('Time(Y)', fontsize=14);
title('Euler sample path with $\delta = %g, \kappa=%g$' % (dt, k), fontsize=16);
Suppose we draw $n$ independent paths of $\hat s(\bs \tau; \delta)$, where $\bs \tau$ consists of $m$ discrete time samples.
There are two types of errors in Monte Carlo simulation of SDEs:
The two types of errors are independent, thus the total variance is:
$$ \epsilon^2 = c_1 \epsilon_d^2 + c_2\epsilon_{s}^2 = \frac{c_1}{m^{2\gamma}} + \frac{c_2}{n}$$
Given a fixed computation budget of $mn = c$, the best allocation between $m$ and $n$ can be found by solving:
$$\begin{array}{c} \min_{m, n} \left(\frac{c_1}{m^{2\gamma}} + \frac{c_2}{n}\right) \\ \text{subject to: } mn = c \end{array}$$
apply the Lagrange multiplier:
$$\small \begin{array}{l} l &= \left(\frac{c_1}{m^{2\gamma}} + \frac{c_2}{n}\right) + \lambda(mn - c) \\ \frac{\partial l}{\partial m} &= -c_1\frac{2\gamma}{m^{2\gamma+1}} + \lambda n = 0 \\ \frac{\partial l}{\partial n} &= -c_2\frac{1}{n^2} + \lambda m = 0 \\ \lambda mn &= \frac{2c_1\gamma}{m^{2\gamma}} = \frac{c_2}{n} \end{array}$$
therefore the optimal strategy is to maintain $\frac{m^{2\gamma}}{n} = \frac{2c_1\gamma}{c_2}$
Recall Euler discretization is of 1st order weak convergence:
$$\mathbb{E}[f(\hs(t; \delta))] = \mathbb{E}[f(\bs s(t)] + c \delta + O(\delta^2)$$
Richard extraploation can be applied to get 2nd order weak convergence:
$$\begin{array}{l} \mathbb{E}[f(\hs(t; 2\delta))] &= \mathbb{E}[f(\bs s(t)] + 2c \delta + O(\delta^2) \\ 2\mathbb{E}[f(\hs(t; \delta))] - \mathbb{E}[f(\hs(t; 2\delta))] &= \mathbb{E}[f(\bs s(t)] + O(\delta^2) \\ \mathbb{E}[2f(\hs(t; \delta)) - f(\hs(t; 2\delta))] &= \mathbb{E}[f(\bs s(t)] + O(\delta^2) \end{array}$$
Therefore the estimator $2f(\hs(t; \delta)) - f(\hs(t; 2\delta))$ is 2nd order weak convergence.
The variance of the estimator with Richardson extrapolation is:
$$\small \begin{array}{l} \var[2f(\hs(t; \delta)) - f(\hs(t; 2\delta))] \\ = \var\left[2f(\hs(t; \delta))\right] + \var\left[f(\hs(t; 2\delta))\right] - 2 \cov\left[2f(\hs(t; \delta)), f(\hs(t; 2\delta))\right] \\ = 4\sigma_\delta^2 + \sigma_{2\delta}^2 - 2\rho\sigma_\delta\sigma_{2\delta} \gtrsim 3\sigma^2 \end{array}$$
The following implementation maximizes the correlation between $f(\hs(t; \delta))$ and $f(\hs(t; 2\delta))$:
Lewis Carroll: it's a poor sort of memory that only works backwards.
Early exercise (American or Bermudan option):
The value of American put at time $t_i$ is: $$ v(t_i) = \max\left(\left(k-s\left(t_i\right)\right)^+, \mathbb{E}\left[v(t_{i+1})e^{-r\delta}\biggr|\mathcal{F}_{t_i}\right]\right)$$
Similarly, the value of American call at time $t_i$ is: $$ v(t_i) = \max\left(\left(s(t_i)-k\right)^+, \mathbb{E}\left[v(t_{i+1})e^{-r\delta}\biggr|\mathcal{F}_{t_i}\right]\right)$$
these equations are commonly known as the Bellman equation.
Is it optimal to early exercise?
The optimal exercise boundary of American put is a function of stock price and time: $b(s(t), t)$
The american option's value can be expressed as an optimization of:
$$ v(b^*(t)) = \max_{b(t)} [v\left(b\left(t\right)\right)] $$
This types of free boundary problem poses a serious challenge to MC
Before 2000, it was generally accepted that MC cannot solve these free boundary problems.
We draw 100,000 stock paths to 1Y with time grid $\delta = 0.01$ from the GBM:
print 1e5*100*8/1024/1024
76.2939453125
t, m, r, s0, k, vol = 1., 52, .1, 100, 100, .25
dt = t/m
n = 400000
ts = np.arange(1, m+1)*dt
gbm = proc.GBM(r, vol)
spath = gbm.draw(np.random.normal(size=[n, m]), s0, dt)
spath2 = gbm.draw(np.random.normal(size=[n, m]), s0, dt)
eputs = np.maximum(k - spath[:,-1], 0)*np.exp(-r*t)
u = np.mean(eputs)
e = np.std(eputs)/sqrt(n)
fig = figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(ts, spath[:100].T);
xlabel('Time(Y)')
title('First 100 simulated stock path');
ax = fig.add_subplot(122)
ax.set_axis_off()
ax.text(0, .5, "Europan put price is %.2f, MC err %.2g" % (u, e), size="xx-large");
ax.text(0, .6, "True value is %.2f" % (5.46), size="xx-large");
Given the simulated path, one may propose the following simple method to price American option:
$$v(0) = \frac{1}{n} \sum_i e^{-rt^*_i}\left(k-s(t^*_i)\right)^+$$
This algorithm is easy to implement, but ...
import fmt
disc = np.exp(-r*ts)
to = [np.argmax(np.maximum(k-p, 0)*disc) for p in spath]
v = np.mean([np.max(np.maximum(k-p, 0)*disc) for p in spath])
ne = 1000
te, se = map(np.array, zip(*[((o+1)*dt, spath[i, o]) for i, o in enumerate(to[:ne])]))
me = np.argmin(se)
df = pd.DataFrame(np.array([v, truv]), index=['Pathwise Optimal', 'True Value'], columns=['American Put'])
fmt.displayDF(df.T, "3f", 4)
Pathwise Optimal | True Value | |
---|---|---|
American Put | 12.304 | 6.523 |
figure(figsize=[12, 4])
subplot(1, 2, 1)
nidx = np.array([0, 1, 2, me])
plot(ts, spath[nidx].T)
plot(te[nidx], se[nidx], 'o')
xlabel('Time (Y)')
title('Pathwise optimal exercise')
subplot(1, 2, 2)
plot(te, se, "o");
xlabel('Time (Y)')
title('Exercise points');
the reason is that it uses the future information, i.e., with supernatural ability:
Least square is an important breakthrough in MC, it is the method of choice for early exercise.
We present it using a numerical example of American put:
import mc
put_ev = lambda x : np.maximum(k - x, 0)
v_b, e_b, b_b, cvs = mc.amer_opt(spath, exp(-r*dt), put_ev, mc.fit_ex)
v_u, e_u, b_u, _ = mc.amer_opt(spath, exp(-r*dt), put_ev, mc.opt_ex)
The key idea of LSMC is to work backwards using the stored paths,
First, initialize the value at maturity $t_m$ as $\bs v(t_m) = \left(k - \hs(t_m)\right)^+$
idx = -1
fig = figure(figsize=[12, 4])
ax1 = fig.add_subplot(121)
plot(ts, spath[:20, :].T, label=None)
axvline(ts[idx], lw=5, label=None)
plot(ts[idx], k - b_b[idx], 'r>', ms=15)
xlabel('Time', fontsize=16)
title('Stock Paths', fontsize= 18)
subplot(1, 2, 2)
plot(spath[:500,idx], cvs[:500, idx], '.')
xlim(40, 120)
ylim(-10, 60)
plot(k, 0, 'r>', ms=15)
xlabel('Stock price $s(t = %g)$' % ts[idx], fontsize=16)
title('Payoff $v(t = %g)$' % (ts[idx]), fontsize=18);
truv = 6.523
def show_lsmc_step(idx, ne=500) :
fig = figure(figsize=[12, 4])
ax1 = fig.add_subplot(121)
plot(ts, spath[:20, :].T, label=None)
plot(ts[idx], k - b_b[idx], 'r>', ms=15, label=None)
axvline(ts[idx], lw=2)
xlabel('Time', fontsize=16)
ylabel('Price', fontsize=16)
title('Stock Paths', fontsize=18)
legend(['LSMC fit time'], loc='best')
subplot(1, 2, 2)
s = spath[:, idx]
cv = cvs[:, idx+1]
u = mc.fit_u(s, cv, put_ev(s))
sd = np.arange(10., 120., 1)
itm = np.greater(k, s[:ne])
otm = np.greater(s[:ne], k)
plot(s[:ne][itm], cv[:ne][itm], 'b.');
plot(s[:ne][otm], cv[:ne][otm], 'g.');
plot(sd, u(sd), 'r')
plot(sd, np.maximum(k-sd, 0), 'k')
plot(k - b_b[idx], b_b[idx], 'r>', ms=15)
legend(['in the money', 'out of the money', 'LS fit', 'exer value']);
xlim(40, 120)
ylim(-10, 60)
xlabel('Current stock price $s(t=%.2f)$' % (ts[idx]), fontsize=16);
ylabel('Continuation value', fontsize=16)
title('Least Square Fit @ t=%.2f' % (ts[idx]), fontsize=18);
show_lsmc_step(-2)
show_lsmc_step(-3)
show_lsmc_step(-20)
show_lsmc_step(-40)
show_lsmc_step(0)
We reached the first time step $t=\delta$, where the option value can be computed as the average of discounted $\bs v(t=\delta)$, i.e., $v =\frac{1}{n} e^{-r\delta}\bs 1^T \bs v(t_0)$.
import fmt
df = pd.DataFrame(np.array([v_b, truv]), index=['LSMC MC', 'True Value'], columns=['American Put'])
fmt.displayDF(df.T, "3f", 3)
LSMC MC | True Value | |
---|---|---|
American Put | 6.529 | 6.523 |
at time $t_{m-1}$: $\hs(t_{m-1})$ and the corresponding c.v. $\bs v(t_m)e^{-r\delta}$ are known:
construct a smooth function $u(s; t_{m-1})$ that minimizes
$$\left\Vert \ind\left(\hs(t_{m-1}) < k\right) \odot \left( u\left(\hs(t_{m-1}); t_{m-1}\right) - \bs v(t_m)e^{-r\delta} \right)\right\Vert_2 $$
$\odot$ is element wise vector multiplication.
plot(ts, k-b_b, '.-')
xlabel('Time', fontsize=16)
title('Exercise boundary', fontsize=18);
Exercise boundary is defined by:
$$ u(s; t_{m-1}) = c_0 + c_1 s + c_2 s^2 < k - s $$
One would think LSMC is biased low because of the inaccurate exercise boundary, is it really?
Consider the following two ways of pricing American option:
we can conclude the folllowing (barring the discretization and MC errors):
Therefore, 1B is not necessarily a lower bound, due to the offsetting effects.
def fixed_b(s, cv, ev) :
fixed_b.idx -= 1
return np.greater(ev, fixed_b.b[fixed_b.idx])
fixed_b.b = b_b
fixed_b.idx = -1
v_a, e_a, b_a, _ = mc.amer_opt(spath2, exp(-r*dt), put_ev, fixed_b)
Given that 1A is a reliable lower bound, it is useful to find an upper bound:
We present a simple upper bound that is the most useful when the exercise boundary has simple topology.
A more precise upper bound can be found by solving for the optimal exercise boundary $b^*(t)$ by going backwards from the simulated paths:
This results in an upper bound because we did use the future information when solving for $b^*(t)$.
fig = figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(ts, k-np.transpose([b_a, b_u]), '.-')
xlabel('Time(Y)')
title('Exercise Boundary')
legend(['LSMC', 'Optimize $b^*$'], loc='best');
ylim(75, 100);
ax = fig.add_subplot(122)
ax.set_axis_off()
ax.text(0, .6, "Option value is bounded by [%.3f, %.3f]" % (v_a, v_u), size="xx-large")
ax.text(0, .5, "LSMC (1B) value is %.3f" % v_b, size="xx-large");
ax.text(0, .4, "True value is %.3f" % truv, size="xx-large");
The path-wise LSMC values are dependent
To estimate the exact LSMC error, we have to use batching
def lsmc_batch(b, n) :
bvs = []
errs = []
spath2 = gbm.draw(np.random.normal(size=[n, m]), s0, dt)
disc1p = exp(-r*dt)
for i in range(b) :
spath = gbm.draw(np.random.normal(size=[n, m]), s0, dt)
v_b, e_b, b_b, _ = mc.amer_opt(spath, disc1p, put_ev, mc.fit_ex)
v_u, e_u, b_u, _ = mc.amer_opt(spath, disc1p, put_ev, mc.opt_ex)
fixed_b.b = b_b
fixed_b.idx = len(b_b)
v_a, e_a, b_a, _ = mc.amer_opt(spath2, disc1p, put_ev, fixed_b)
bvs.append((v_a, v_b, v_u))
errs.append((e_a, e_b, e_u))
spath2 = spath
return np.array(bvs), np.array(errs)
bvs, errs = lsmc_batch(10, n)
import fmt
import pandas as pd
df = pd.DataFrame(np.concatenate((bvs.T, errs.T)),
columns = ['Batch %d' % (i+1) for i in range(10)],
index=['LSMC - LB', 'LSMC', 'LSMC - UB',
'err - LB', 'LSMC err ', 'err - UB'])
sd = np.std(df.T)
u = np.mean(df.T)
df['Batch Mean'] = u
df['Batch Std'] = sd
idx = [0, 1, 2, 4]
fmt.displayDF(df.iloc[idx, :], "3f", 2)
Batch 1 | Batch 2 | Batch 3 | Batch 4 | Batch 5 | Batch 6 | Batch 7 | Batch 8 | Batch 9 | Batch 10 | Batch Mean | Batch Std | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LSMC - LB | 6.486 | 6.514 | 6.473 | 6.516 | 6.507 | 6.543 | 6.493 | 6.499 | 6.496 | 6.503 | 6.503 | 0.018 |
LSMC | 6.526 | 6.490 | 6.528 | 6.519 | 6.552 | 6.504 | 6.509 | 6.509 | 6.518 | 6.522 | 6.518 | 0.016 |
LSMC - UB | 6.569 | 6.531 | 6.568 | 6.561 | 6.600 | 6.545 | 6.554 | 6.552 | 6.561 | 6.566 | 6.561 | 0.017 |
LSMC err | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012 | 0.000 |
The main application is exotic options whose exercise decisions depend on multiple factors.
Recommented reading:
Homework: