Lecture 10: Entropy and Allocation

Topics:

  • Maximum entropy method
    • Applications
  • Risk capital allocation
    • [optional] Constrained Aumann-Shapley
$$\renewcommand{\ind}{1{\hskip -2.5 pt}\hbox{l}} \renewcommand{\v}{\text{VaR}} \renewcommand{\rc}{\text{rc}} $$

Maximum Entropy

Entropy: noun, lack of order or predictability

Albert Einstein: information is not knowledge

A model for information

What is the information $i(p)$ when observing an event with probability $p$?

  • $i(p) \ge 0$: information never decreases
  • $i(1) = 0$: observing a certain event adds no information
  • $i(p_1p_2) = i(p_1) + i(p_2)$: information from observing independent events is additive

The only functional form that satisfies these conditions is $i(p) = - \log(p)$

  • only meaningful in a relative sense
  • any base of logarithm would work, as long as it is consistent

Shannon's information entropy

Expected information from observing a discrete random variable with $\mathbb{P}[\tilde{x} = x_k] = p_k$:

$$ h(\tilde x) = - \sum_k p_k \log(p_k) $$

For a continuous random variable $\tilde x$ with PDF $p(x)$:

$$ h(\tilde x) = - \int_{-\infty}^\infty p(x) \log\left(p\left(x\right)\right) dx $$

Information entropy $h(\tilde x)$ can also be interpreted as:

  • the disorder (lack-of-information) in the distribution

Why is this not a good model for knowledge?

Entropy in coin toss

Consider the toss of fair and unfair coins:

| Coin | $\mathbb{P}[H]$ | $\mathbb{P}[T]$ | Entropy | Information in distribution | Information gain from observation |
|---|---|---|---|---|---|
| Fair | $\frac{1}{2}$ | $\frac{1}{2}$ | 1 | none | maximum |
| Unfair | $\frac{1}{4}$ | $\frac{3}{4}$ | 0.81 | more | less |
| Robbery | 0.01 | 0.99 | 0.08 | a lot more | a lot less |
| Two Heads | 1 | 0 | 0 | maximum | none |
  • here the entropy is computed using $\log_2(\cdot)$, as the quick check below confirms
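
A quick check of the table; `entropy` is a local helper, not a library call:

```python
import numpy as np

def entropy(ps, base=2.0):
    """Shannon entropy -sum p log(p); terms with p = 0 contribute nothing."""
    ps = np.asarray(ps, dtype=float)
    ps = ps[ps > 0]
    return -np.sum(ps * np.log(ps)) / np.log(base)

for name, p in [("Fair", .5), ("Unfair", .25), ("Robbery", .01), ("Two Heads", 1.)]:
    print("%-10s %.2f" % (name, entropy([p, 1. - p])))
# Fair 1.00, Unfair 0.81, Robbery 0.08, Two Heads 0.00
```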

Underdetermined problem

An optimization problem with more variables than constraints:

  • has an infinite number of solutions
  • requires an objective function to pin down a unique solution
  • a common problem in quant finance
    • finding an implied distribution from a few liquid market prices
    • curve building

In the (early) literature, various ad hoc objective functions were used:

  • e.g.: sum of squares of first and second order derivatives

Ignorance is strength

Disorder (or the lack of information) in a distribution is highly desirable:

  • more uncertainty in the outcome leaves open more possibilities
  • free from the contamination of irrelevant and artificial restrictions
  • the distribution is smooth and well behaved
  • more difficult to arbitrage against

Max entropy is the ideal objective for finding implied distributions:

  • much better than ad hoc smoothness constraints
  • invokes the higher principles of information theory

Maximum Entropy Optimization

For a discrete distribution $\mathbb{P}[\tilde{x} = x_k] = p_k$, the maximum entropy optimization in vector form is: $$\renewcommand{\bs}{\boldsymbol}$$ $$ \max \left(-\bs p^T \log (\bs p) \right) $$

subj. to $$ \begin{aligned} \bs 1^T \bs p &= 1 \\ A \bs p &= \bs b \end{aligned}$$

  • all analytical functions are applied element-wise

    • e.g. $\exp(\bs p), \log(\bs q)$ are column vectors
    • $\small \frac{\partial}{\partial \bs p} \left(\bs p^T \log(\bs p)\right) = \log(\bs p^T) + \bs 1^T \iff \frac{\partial}{\partial p_k} \left(\sum_i p_i \log(p_i)\right) = \log(p_k) + 1$
  • we only consider linear constraints, tractable and adequate in practice

The continuous version can be expressed similarly with integrals.

Uniform distribution

The uniform distribution has the maximum entropy among all distributions over a fixed finite set of outcomes.

  • without additional information, we have to assume the coin is fair

Apply the Lagrange multiplier:

$$ \begin{aligned} l &= - \bs p^T \log(\bs p) - \lambda (\bs 1^T \bs p - 1) \\ \frac{\partial l}{\partial \bs p} &= -\log(\bs p^T) - \bs 1^T - \lambda \bs 1^T = \bs 0^T \\ \log(\bs p) &= -(1+\lambda) \bs 1 \end{aligned}$$

therefore, $\bs p$ is a uniform distribution

Normal distribution

Normal distribution has the maximum entropy with given mean and variance

  • This explains the ubiquity of the normal distribution
  • Knowing only mean and variance, we have to assume the distribution is normal

Apply the Lagrange multiplier: $$\renewcommand{\intr}{\int_{-\infty}^{\infty}}$$ $$\small \begin{aligned} l &= - \intr p(x) \log(p(x))\, dx + \lambda_1 \left(\intr p(x)\, dx - 1\right) \\ &\quad + \lambda_2 \left(\intr x p(x)\, dx - \mu\right) + \lambda_3 \left(\intr (x - \mu)^2 p(x)\, dx - \sigma^2\right) \\ &= \intr \left(-p\log(p) + \lambda_1 p + \lambda_2 xp + \lambda_3(x-\mu)^2 p\right) dx - \lambda_1 - \mu\lambda_2 - \sigma^2 \lambda_3 \\ &= \intr g\, dx - \lambda_1 - \mu\lambda_2 - \sigma^2\lambda_3 \\ \frac{\partial g}{\partial p} &= -\log(p) - 1 + \lambda_1 + \lambda_2 x + \lambda_3(x-\mu)^2 = 0 \\ p(x) &= \exp\left(\lambda_3(x-\mu)^2 + (\lambda_1 -1) + \lambda_2 x\right) \end{aligned}$$

Integrability requires $\lambda_3 < 0$ (otherwise $p(x)$ explodes), and the mean constraint then forces $\lambda_2 = 0$, leaving a normal density.

Exponential distribution

Exponential distribution has the maximum entropy for a positive random variable with a given expectation

  • Without additional information, we have to assume all survival times are exponentially distributed.

Apply the Lagrange multiplier:

$$\renewcommand{\intp}{\int_0^{\infty}} \small \begin{aligned} l &= - \intp p(x) \log(p(x))\, dx + \lambda_1 \left(\intp p(x)\, dx - 1\right) + \lambda_2 \left(\intp x p(x)\, dx - \mu\right) \\ &= \intp \left(-p\log(p) + \lambda_1 p + \lambda_2 xp \right) dx - \lambda_1 - \mu\lambda_2 = \intp g\, dx - \lambda_1 - \mu\lambda_2 \\ \frac{\partial g}{\partial p} &= -\log(p) - 1 + \lambda_1 + \lambda_2 x = 0 \\ p(x) &= \exp\left((\lambda_1 -1) + \lambda_2 x\right) \end{aligned}$$

Therefore $\lambda_2 < 0$, otherwise $p(x)$ explodes

Numerical examples

In [31]:
# assumed context from earlier cells: numpy as np, scipy.optimize.minimize,
# pylab-style plotting, and the course's `me` module providing MaxEntDual
# (a sketch of MaxEntDual follows the dual derivation below)
x = np.arange(0, 10, .05)
a = np.array([x])            # one constraint row: E[x]
u = np.array([1.])           # target: E[x] = 1
e = np.array([0.])           # zero penalty weight: exact fit
q = np.ones(np.size(x))      # flat prior
dual = me.MaxEntDual(q, a, u, e)

res = minimize(dual.dual, np.zeros(len(u)), jac=dual.grad, method="BFGS")

figure(figsize=[12, 8])
subplot(2, 2, 1)

plot(x, dual.dist(res.x));
title('$x>0, \mathbb{E}[x]=1$');

subplot(2, 2, 3)
x = np.arange(-13., 13., .01)
a = np.array([x, x*x])           # constraints: E[x], E[x^2]
u = np.array([0., 1.])
e = np.array([0., 0.])
q = np.ones(np.size(x))/len(x)   # flat prior on the new grid
dual = me.MaxEntDual(q, a, u, e)

res = minimize(dual.dual, np.zeros(len(u)), jac=dual.grad, method="BFGS")
semilogy(x, dual.dist(res.x))
xlim(-8, 8)
ylim(1e-16, 1e-2)
title('$\mathbb{E}[x] = 0, \mathbb{E}[x^2] = 1$');

subplot(2, 2, 4)
a = np.vstack([a, x*x*x*x])
u = np.append(u, 20)
e = np.append(e, 0)
dual = me.MaxEntDual(q, a, u, e)

res = minimize(dual.dual, np.zeros(len(u)), jac=dual.grad, method="BFGS")
semilogy(x, dual.dist(res.x))
xlim(-8, 8)
title('$\mathbb{E}[x] = 0, \mathbb{E}[x^2] = 1, \mathbb{E}[x^4] = 20$');

subplot(2, 2, 2)
a = np.vstack([a, x*x*x])
u = np.append(u, -4)
e = np.append(e, 0)
dual = me.MaxEntDual(q, a, u, e)

res = minimize(dual.dual, np.zeros(len(u)), jac=dual.grad, method="BFGS")
plot(x, dual.dist(res.x))
xlim(-8, 4)
title('$\mathbb{E}[x] = 0, \mathbb{E}[x^2] = 1, \mathbb{E}[x^3] = -4, \mathbb{E}[x^4] = 20$');
  • we will discuss the details of numerical implementation later

Cross entropy

Cross entropy is a measure of the incremental information in a distribution $\bs p$ relative to a prior distribution $\bs q$:

  • Also known as the Kullback–Leibler distance (here defined with the opposite sign, so that $h \le 0$)

With a discrete prior $\mathbb{P}[\tilde{x} = x_k] = q_k$ and posterior $p_k$:

$$ h(\bs p | \bs q) = -\sum_k p_k \log(p_k) + \sum_k p_k \log(q_k) = - \sum_k p_k \log\left(\frac{p_k}{q_k}\right) $$

Similarly, with a continuous prior $q(x)$ and posterior $p(x)$:

$$ h\left(p(x) | q(x)\right) = - \intr p(x) \log \left(\frac{p(x)}{q(x)}\right) dx $$

The smaller the cross entropy is, the more incremental information in $\bs p$ from $\bs q$

Properties of cross entropy

Cross entropy $h(\bs p| \bs q) = - \bs p^T(\log(\bs p) - \log(\bs q))$:

  • a measure of the lack of incremental information in $\bs p$ from $\bs q$:
  • reduces to regular entropy when the prior $\bs q$ is uniform
  • maximized with value 0 when $\bs p = \bs q$.
    • no additional information
    • Proof: apply the Lagrange multiplier
  • $h(\bs p| \bs q) = -\infty$ if $p_k > 0$ for some $q_k = 0$
    • new discovery adds infinite amount of new information
  • finite if $p_k = 0$ for some $q_k > 0$
    • not much value in disproving an existing theory
  • Prior and posterior distributions are asymmetric: $h(\bs p | \bs q) \ne h(\bs q | \bs p)$, as the sketch below illustrates
    • calling it the Kullback–Leibler "distance" is therefore a misnomer
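
A small sketch of these properties under the sign convention above; `cross_ent` is a local helper, not a library function:

```python
import numpy as np

def cross_ent(p, q):
    """h(p|q) = -sum_k p_k log(p_k / q_k); terms with p_k = 0 are dropped."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return -np.sum(p[m] * np.log(p[m] / q[m]))

p, q = [.25, .75], [.5, .5]
print(cross_ent(p, q), cross_ent(q, p))  # about -0.131 vs -0.144: asymmetric
print(cross_ent(p, p))                   # 0.0: no incremental information
```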

Prior beliefs in the market

Prior beliefs are common in the market:

  • e.g., stock returns are normally distributed

Maximizing cross entropy is an ideal objective for capturing prior beliefs:

  • it introduces only a minimal perturbation to the prior belief $\bs q$
  • while incorporating additional constraints on the distribution $\bs p$

Cross entropy optimization with bid/ask

Take the discrete form of cross entropy:

$$ \text{argmax}_{\bs p} \left(- \bs p^T \left(\log(\bs p) - \log(\bs q)\right) - \frac{1}{2} \bs e^T W \bs e \right) $$

subj. to: $$\begin{aligned} \bs 1^T \bs p &= 1 \\ A \bs p &= \bs b + \bs e \end{aligned}$$

  • $A$ is a matrix that computes benchmark prices from the distribution
  • $\bs b$ is the observed mid price of benchmark instruments
  • $\bs e$ is the pricing error to the mid price
  • $W$ is a diagonal penalty matrix, we usually choose $W^{-1} = \alpha E$
    • where $E$ is a diagonal matrix of bid/ask spreads,
    • $\alpha$ controls the trade-off between fit quality and entropy

This problem is difficult to solve due to the dimensionality and constraints.

If you have a hammer ...

The Lagrange multiplier has served us well so far, let's nail it:

$$\small \begin{aligned} l &= - \bs p^T \left(\log(\bs p) - \log(\bs q)\right) - \frac{1}{2} \bs e^T W \bs e - \bs u^T(A\bs p - \bs b - \bs e) - v (\bs 1^T \bs p - 1) \\ \frac{\partial l}{\partial \bs p} &= \log(\bs q^T) -\log(\bs p^T) - \bs 1^T - \bs u^T A - v \bs 1^T = \bs 0^T \\ \log(\bs p^*) &= \log(\bs q) - A^T \bs u - \bs 1 - v \bs 1 \end{aligned}$$
  • $\bs p^*$ is the optimal solution.
  • we haven't solved the problem, as $\bs e, \bs u, v$ remain unknown

but the dimensionality of the problem is reduced by plugging $\bs p^*$ back into $l$:

$$\begin{aligned} l &= - \bs p^{*T} \left( - A^T \bs u - \bs 1 - v \bs 1 \right) - \frac{1}{2} \bs e^T W \bs e - \bs u^T(A\bs p^* - \bs b - \bs e) - v (\bs 1^T \bs p^* - 1) \\ &= \bs p^{*T} \bs 1 - \frac{1}{2} \bs e^T W \bs e + \bs u^T \bs b + \bs u^T \bs e + v \end{aligned} $$

What to do with $\bs e$? note $\bs p^*$ does not depend on $\bs e$:

$$ \frac{\partial l}{\partial \bs e} = - \bs e^T W + \bs u^T = \bs 0^T \iff \bs e = W^{-1} \bs u $$

plug $\bs e$ back to $l$:

$$ l = \bs q^T \exp(-A^T \bs u - 1 - v) + \frac{1}{2} \bs u^T W^{-1} \bs u + \bs u^T \bs b + v $$

do the same to $v$:

$$\begin{aligned} \frac{\partial l}{\partial v} &= - \bs q^T \exp(-A^T \bs u - 1 - v) + 1 \\ &= - \bs q^T \exp(-A^T \bs u) \exp(- 1 - v) + 1 = 0 \\ v &= \log \left(\bs q^T \exp(-A^T \bs u)\right) - 1 \end{aligned}$$

plug $v$ in to $l$:

$$\begin{aligned} l &= 1 + \frac{1}{2} \bs u^T W^{-1} \bs u + \bs u^T \bs b + \log \left(\bs q^T \exp(-A^T \bs u)\right) - 1 \\ &= \log \left(\bs q^T \exp(-A^T \bs u)\right) + \frac{1}{2} \bs u^T W^{-1} \bs u + \bs u^T \bs b \end{aligned}$$

Now $\bs u$ is the only unknown; this $l$ is much easier to minimize.

Dual problem

| | Original Problem | Dual Problem |
|---|---|---|
| Objective | $\scriptsize - \bs p^T \left(\log(\bs p) - \log(\bs q)\right) - \frac{1}{2} \bs e^T W \bs e$ | $\scriptsize \log \left(\bs q^T \exp(-A^T \bs u)\right) + \frac{1}{2} \bs u^T W^{-1} \bs u + \bs u^T \bs b$ |
| Extreme type | maximum | minimum |
| Solve for | $\bs p$: high dimension | $\bs u$: low dimension |
| Constraints | $\bs 1^T \bs p = 1,\ A \bs p = \bs b + \bs e$ | none |
| Convex | yes | yes |
  • The dual problem is much easier to solve
    • the gradient is analytical, suitable for gradient descent
  • A solution is guaranteed to exist if $W$ is diagonal and positive

Once we find the optimal $\bs u^*$ that minimizes the dual objective:

$$ \bs p^* = \frac{\bs q \odot \exp(- A^T \bs u^*)}{\bs q^T \exp(- A^T \bs u^*)} $$
  • $\odot$ is element-wise multiplication; a sketch of this dual solver follows
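
The code cells above call `me.MaxEntDual`, whose internals are not shown in these notes. A minimal sketch consistent with the dual derived above might look as follows; the constructor signature `(q, a, b, e)`, with `e` holding the diagonal of $W^{-1}$, is inferred from the calls above and is an assumption:

```python
import numpy as np

class MaxEntDual:
    """Dual of the cross-entropy problem:
    minimize log(q' exp(-A'u)) + u' W^{-1} u / 2 + u'b over u."""
    def __init__(self, q, a, b, e):
        self.q = np.asarray(q, float)  # prior (need not be normalized)
        self.a = np.asarray(a, float)  # constraint matrix A, one row per constraint
        self.b = np.asarray(b, float)  # constraint targets b
        self.e = np.asarray(e, float)  # diagonal of W^{-1} (penalty weights)

    def dist(self, u):
        z = -self.a.T @ u
        w = self.q * np.exp(z - z.max())   # log-sum-exp shift for stability
        return w / w.sum()                 # p* = q . exp(-A'u) / (q' exp(-A'u))

    def dual(self, u):
        z = -self.a.T @ u
        m = z.max()
        return (np.log(self.q @ np.exp(z - m)) + m
                + 0.5 * np.sum(self.e * u * u) + u @ self.b)

    def grad(self, u):                     # analytical gradient: b - A p* + W^{-1} u
        return self.b - self.a @ self.dist(u) + self.e * u
```

Minimizing `dual` with BFGS and recovering the distribution with `dist`, as the cells above do, then reproduces the exponential and normal examples.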

Maximum Entropy Applications

Volatility skew

In the equity index option market, options with lower strikes command higher implied volatilities than ATM and higher-strike options; this is the volatility skew.

Suppose we only have the following three liquid 5M S&P500 index options:

In [32]:
from inst import BlackScholes    # course-provided pricing helpers
from scipy.stats import norm
import fmt                       # course-provided table display (pandas as pd assumed imported earlier)

def optionCons(strikes, s) :
    a = [np.maximum(s-k, 0) for k in strikes]
    return np.array(a)


iv = np.array([.1594, .1481, .1383])

s = 1652.32
r = 0.
t = 5./12

ks = np.array([1611, s, 1694])

cs = np.array([BlackScholes.callPrice(0, s, k, t, v) for k, v in zip(ks, iv)])
d = np.array([BlackScholes.callDelta(0, s, k, t, v) for k, v in zip(ks, iv)])

df_opt = pd.DataFrame(np.array([cs, iv*100, d]).T, columns = ["Call Price", "Implied Vol(%)", "Delta"], index=ks)
fmt.displayDF(df_opt, "4g")

x = s*np.exp(np.arange(-1.5, 3, .001))
q = np.ones(len(x))
qn = norm.pdf((np.log(x/s)+.5*iv[1]*iv[1]*t)/(iv[1]*np.sqrt(t)))
qn = qn/sum(qn)

a = optionCons(ks, x)
e = np.ones(len(ks))*.0

mks = np.array([1322,  1487,  1570,  1611,  1652,  1694,  1735, 1818, 1982])
mvs = np.array([.2319, .1895, .1687, .1594, .1481, .1383, .1289, .1176, .1125])
| Strike | Call Price | Implied Vol(%) | Delta |
|---|---|---|---|
| 1,611 | 89.62 | 15.94 | 0.617 |
| 1,652 | 62.99 | 14.81 | 0.5191 |
| 1,694 | 41.03 | 13.83 | 0.4073 |

How to price options of other strikes?

Entropy optimization setup

We sample the future stock price distribution discretely as: $p_i = \mathbb{P}[s(t = 5M) = s_i]$, where $s_i$ is a discretization of the stock prices.

Each observed option price $v(k)$ becomes a linear constraint on $p_i$:

$$ b(t) \mathbb{E}[(s(t)-k)^+] = b(t) \sum_i p_i (s_i - k)^+ = v(k) $$

  • The machinery of the dual maximum entropy setup is then applied to find the implied distribution of $\bs p^*$
  • Option prices of all strikes are then known from $\bs p^*$
  • The discounting $b(t)$ is ignored as the time horizon is very short.

Maximum entropy solution

The ME optimization result without a prior does not make much sense, even though it exactly reprices the observed prices.

In [33]:
dual = me.MaxEntDual(q, a, cs, e)
res = minimize(dual.dual, np.zeros(len(cs)), jac=dual.grad, method="BFGS")
po = dual.dist(res.x)

kd = arange(80., 130., 1.)/100*s
cd = np.array([po.dot(np.maximum(x-k, 0)) for k in kd])
vd = [BlackScholes.compImpliedVolFromCall(0., s, k, t, c) for k, c in zip(kd, cd)]
#dd = [BlackScholes.callDelta(0, s, k, 1, v) for k, v in zip(kd, vd)]

figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(x, dual.dist(res.x))
xlabel('Strike')
title('ME Distribution of Prices')
xlim(1200,2400)

subplot(1, 2, 2)
plot(kd, vd, 'g-')
plot(mks, mvs, 'k>')
plot(ks, iv, 'ro')
xlim(1200, 2200)
xlabel('Strike')
title('Implied Vol');
legend(['Max Entropy', 'Market']);

Use the prior belief

We can take advantage of the prior belief that the stock return is normal, and use the observed ATM vol in the prior.

  • The resulting distribution and volatility skew are reasonable
  • The whole vol skew curve is inferred from three option prices
  • The resulting option prices are very close to the actual market
In [34]:
dual = me.MaxEntDual(qn, a, cs, e)
res = minimize(dual.dual, np.zeros(len(cs)), jac=dual.grad, method="BFGS")
po = dual.dist(res.x)

cd = np.array([po.dot(np.maximum(x-k, 0)) for k in kd])
vd = [BlackScholes.compImpliedVolFromCall(0, s, k, t, c) for k, c in zip(kd, cd)]
#dd = [BlackScholes.callDelta(0, s, k, 1, v) for k, v in zip(kd, vd)]

figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(x, qn)
plot(x, dual.dist(res.x))
xlabel('Strike')
title('Max Entropy Distribution')
legend(['Prior', 'Max Entropy'])
xlim(1200,2600)

subplot(1, 2, 2)
axhline(iv[1])
plot(kd, vd, 'g-')
plot(mks, mvs, 'k>')
plot(ks, iv, 'ro')
xlim(1200, 2200)
xlabel('Strike')
legend(['Prior', 'Max Entropy', 'Market Quotes'])
title('Implied Vol');

Curve building revisited

We already know how to build good CDS/IR curves using:

  • Bootstrap and iteration
  • Tension spline interpolation

However, there are some unanswered questions:

  • which is the better state variable: the zero rate $r(t) = -\frac{1}{t}\log(b(t))$ or the cumulative yield $y(t) = -\log(b(t))$?
  • what tension parameter makes more sense?
  • what if market prices are arbitrageable?

Maximum entropy method can help answer these questions.

IR curve building

Typical instruments to build USD IR curves:

  • deposits: 1D, 1M, 2M, 3M
  • 3M IR Futures (or FRA): 3M, 6M, 12M, ..., 33M, 36M
  • Swaps: 4Y, 5Y, ..., 19Y, 20Y, 25Y, 30Y, 35Y, 40Y, 50Y

These instruments have different liquidity, and their prices are not necessarily compatible with each other:

  • it is not necessary to exactly match all mid market prices, usually it is adequate to be within the bid/ask
  • it is not uncommon for market prices to be arbitrageable, thus impossible to exactly fit all market mids

Instruments for building IR curves

$b(t)$ is the market price of zero coupon bond maturing at quarterly date $t$, then the market observable prices can be expressed as a linear system of $C\bs b = \bs v$:

$$\tiny \begin{array}\\ \text{Deposits} \begin{cases} \begin{array} \\ \\ \\ \\ \\ \end{array}\end{cases}\\ \text{Futures}\begin{cases}\begin{array} \\ \\ \\ \\ \\ \end{array}\end{cases}\\ \text{Swaps}\begin{cases}\begin{array} \\ \\ \\ \\ \\ \end{array}\end{cases}\end{array} \begin{pmatrix} & * & \\ & & * & \\ & & & * & \\ & & & & * & \\ & & & * & * \\ & & & & * & * \\ & & & & & * & * \\ & & & & & & * & * \\ & * & * & * & * & * & * & * & * & * \\ & * & * & * & * & * & * & * & * & * & * & * \\ & * & * & * & * & * & * & * & * & * & * & * & * & * \\ & * & * & * & * & * & * & * & * & * & * & * & * & * & * & * \end{pmatrix} \begin{pmatrix} * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \end{pmatrix} = \begin{pmatrix} * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \\ * \end{pmatrix} $$
  • * represents non zero elements

Pricing constraints

$$ C \bs b = \bs v $$
  • $C$ represents the cashflow of the benchmark instruments
  • $\bs b$ is the zero coupon bond prices at quarterly maturities
  • $\bs v$ is the benchmark instruments' prices
  • Often in practice, we sacrifice exact fit for better curve properties.
  • Maximum entropy is a powerful technique to find the right trade off

Stylized example:

We use the following swap data set to illustrate the IR Curve building.

  • The bid/ask spreads increase with swap maturity
  • The 100 year swap is added as a terminal condition
In [35]:
import fmt

mats = np.array([1, 2, 3, 5, 7, 10, 12, 15, 20, 25, 100])
par = np.array([.042, .043, .047, .054, .057, .06, .061, .059, .056, .0555, .0555])
ba = np.array([.0002, .0002, .0003, .0004, .0005, .0006, .0007, .001, .0015, .002, .01])

df_swap = pd.DataFrame(np.array([par, ba]).T*100, columns=["Par Spread (%)", "Bid/Ask Spread (%)"], 
                       index=mats)
fmt.displayDF(df_swap.T, "2f")
| Maturity (Y) | 1 | 2 | 3 | 5 | 7 | 10 | 12 | 15 | 20 | 25 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Par Spread (%) | 4.20 | 4.30 | 4.70 | 5.40 | 5.70 | 6.00 | 6.10 | 5.90 | 5.60 | 5.55 | 5.55 |
| Bid/Ask Spread (%) | 0.02 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.10 | 0.15 | 0.20 | 1.00 |

We can apply the ME optimization to build curves, as opposed to bootstrapping:

  • this approach is more flexible and robust
  • it can handle overlapping instruments with arbitrageable prices

What distribution?

We need a random variable to apply the maximum entropy method.

The discount factor can be rewritten in terms of the probability of loss when teleporting a dollar from the future to today:

$$ 1 - b(t_i) = \sum_{j=1}^i p_j $$
  • $b(t_i)$ is the discount factor, i.e. the price of a risk-free zero coupon bond maturing at $t_i$
  • $p_j$ is the probability of the dollar evaporating between time $t_{j-1}$ and $t_j$
  • $\bs p$ is the distribution of the evaporation time of a dollar teleported from $t=\infty$

Swap price in $\bs p$

Assuming a semi-annual time grid $t_i$, a swap's pricing equation can be expressed as a linear function of $p_i$ (see the sanity check below):

$$\begin{aligned} \sum_{i=1}^n \frac{s}{2} b(t_i) + b(t_n) &= 1 \\ \sum_{i=1}^n \frac{s}{2} \left(1-\sum_{j=1}^i p_j\right) + \left(1-\sum_{i=1}^n p_i\right) &= 1 \\ \sum_{i=1}^n \left(\frac{s}{2} (n-i+1) + 1\right) p_i &= \frac{sn}{2} \end{aligned}$$
  • the factor $\frac{s}{2}$ appears because the swap's coupons are paid semi-annually
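
A quick sanity check of the last identity on a hypothetical flat 5% curve; the grid, discount factors, and par coupon below are all made up for illustration:

```python
import numpy as np

n = 4                                     # a 2Y swap on a semi-annual grid
t = 0.5 * np.arange(1, n + 1)
b = np.exp(-0.05 * t)                     # hypothetical flat 5% discount curve
s = 2. * (1. - b[-1]) / b.sum()           # par coupon: sum(s/2 b_i) + b_n = 1
p = -np.diff(np.concatenate([[1.], b]))   # p_i = b(t_{i-1}) - b(t_i)

i = np.arange(1, n + 1)
lhs = ((0.5 * s * (n - i + 1) + 1.) * p).sum()
print(lhs, 0.5 * s * n)                   # both sides agree
```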

Exact fit

We can force an exact fit to the market prices by setting $W^{-1} = 0E$:

  • $E$ is a diagonal matrix of the bid/ask term structure of the swaps
  • recall $\bs e = W^{-1}\bs u$
In [36]:
def swapCons(coupons, mats, freq) :
    # constraint matrix in p: one row per swap, coefficient (s/2)(n-i+1) + 1 per period
    am = []
    nt = int(mats[-1]*freq + freq + .01)
    b = coupons*mats                 # rhs s*n/2, with n = m*freq (valid for freq = 2)
    for m, c in zip(mats, coupons) :
        a = np.zeros(nt)
        ts = np.arange(0, m*freq)    # 0-based period indices
        a[ts] = .5*c*(m*freq - ts) + 1
        am.append(a)
        
    return np.array(am), b, np.arange(1, nt+1)*.5

def priceSwap(disc, mats, freq) :
    par = []
    pv01s = []
    for m in mats :
        pv01 = np.sum(disc[:m*freq])/freq
        par.append((1-disc[m*freq-1])/pv01)
        pv01s.append(pv01)
        
    return np.array(par), np.array(pv01s)

def fitSwap(a, b, ba, efs) :
    n = len(a.T)
    q = np.ones(n)/n
    pv01 = np.ones(len(b))
    
    outs = []
    for ef in efs :
        for ii in range(3) :
            e = ba*pv01*ef
    
            dual = me.MaxEntDual(q, a, b, e)
            res = minimize(dual.dual, np.zeros(len(e)), jac=dual.grad, method="BFGS")

            op = dual.dist(res.x)
            d = 1. - np.cumsum(op)
            fwd = np.diff(-(np.log(d)))
            fit, pv01 = priceSwap(d, mats, freq) 
    
        outs.append((op, fit, d, fwd))
        
    return zip(*outs)
        
freq = 2
nt = mats[-1]*freq
a, b, t = swapCons(par, mats, freq)
In [37]:
ef = [0, 2, 20]
op, fit, d, fwd = fitSwap(a, b, ba, ef)

for e, f in zip(ef, fit) : 
    df_swap["Fit Error %% ($W^{-1} = %.1fE$)" % (e*2)] = (f-par)*100
    
fmt.displayDF(df_swap.T[:3], "2f")
| Maturity (Y) | 1 | 2 | 3 | 5 | 7 | 10 | 12 | 15 | 20 | 25 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Par Spread (%) | 4.20 | 4.30 | 4.70 | 5.40 | 5.70 | 6.00 | 6.10 | 5.90 | 5.60 | 5.55 | 5.55 |
| Bid/Ask Spread (%) | 0.02 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.10 | 0.15 | 0.20 | 1.00 |
| Fit Error % ($W^{-1} = 0.0E$) | -0.00 | 0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 |
In [38]:
figure(figsize=[12, 4])
subplot(1, 2, 1)

plot(t, op[0], '.-');
xlabel('time (Y)')
title('Loss Time Distribution')
xlim(0, 26)

subplot(1, 2, 2)
plot(t[1:], np.diff(-(np.log(d[0]))), '.-')
title('Forward Rate');
xlabel('time (Y)')
xlim(0, 26)
ylim(0, .05);
  • The resulting forward rate is almost piece-wise flat
  • This validates the piece-wise flat forward rate interpolation
    • or equivalently the linear interpolation in cumulative yield $y(t) = - \log(b(t))$
  • The ME implied tension parameter is large

Approximate fit

Now we construct the curve using $W^{-1} = 4E$ and $40E$.

In [39]:
fmt.displayDF(df_swap.T, "2f")
| Maturity (Y) | 1 | 2 | 3 | 5 | 7 | 10 | 12 | 15 | 20 | 25 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Par Spread (%) | 4.20 | 4.30 | 4.70 | 5.40 | 5.70 | 6.00 | 6.10 | 5.90 | 5.60 | 5.55 | 5.55 |
| Bid/Ask Spread (%) | 0.02 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.10 | 0.15 | 0.20 | 1.00 |
| Fit Error % ($W^{-1} = 0.0E$) | -0.00 | 0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 |
| Fit Error % ($W^{-1} = 4.0E$) | 0.01 | 0.02 | 0.02 | 0.00 | 0.01 | -0.02 | -0.08 | -0.06 | -0.08 | -0.30 | -1.10 |
| Fit Error % ($W^{-1} = 40.0E$) | 0.08 | 0.17 | 0.15 | -0.02 | -0.17 | -0.54 | -0.82 | -0.98 | -1.19 | -1.52 | -2.17 |
In [40]:
figure(figsize=[12, 4])
subplot(1, 2, 1)

tags = ['$W^{-1}=%d E$' % d for d in np.array(ef)*2]

plot(t, np.array(op).T, '.-');
xlabel('time(Y)')
title('Loss Time Distribution')
xlim(0, 26)
legend(tags)

subplot(1, 2, 2)
plot(t[:-1], np.array(fwd).T, '.-')
xlabel('time(Y)')
title('Forward Rate');
legend(tags)
xlim(0, 26)
ylim(0, .05);
  • the resulting forward rate is still very close to piecewise flat
  • The maximum entropy principle implies large tension parameter $\lambda$
  • Smoothness in forward rate is likely a false belief

Arbitrage removal

The market mid prices could be inconsistent and arbitrageable:

  • direct bootstrapping would then lead to negative forward rates

How can we adjust the market prices to remove arbitrage?

  • it is not easy, as there are (infinitely) many ways to do so

The ME method can remove arbitrage with minimal distortion to the market:

  • with a non-zero $W^{-1}$, the ME method is guaranteed to find an arbitrage-free solution
  • the reliability of the market prices can be incorporated into $W^{-1}$

An arbitrageable market

In [41]:
par[4] = .03
ba[4] = .001
a, b, t = swapCons(par, mats, freq)
ef = [0.1, 2., 20.]

df_swap_arb = pd.DataFrame(np.array([par, ba]).T*100, columns=["Par Spread (%)", "Bid/Ask Spread (%)"], 
                       index=mats)

op, fit, d, fwd = fitSwap(a, b, ba, ef)

for e, f in zip(ef, fit) : 
    df_swap_arb["Fit Error %% ($W^{-1} = %.1fE$)" % (e*2)] = (f-par)*100

fmt.displayDF(df_swap_arb.T, "3g", 1)
| Maturity (Y) | 1 | 2 | 3 | 5 | 7 | 10 | 12 | 15 | 20 | 25 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Par Spread (%) | 4.2 | 4.3 | 4.7 | 5.4 | 3 | 6 | 6.1 | 5.9 | 5.6 | 5.55 | 5.55 |
| Bid/Ask Spread (%) | 0.02 | 0.02 | 0.03 | 0.04 | 0.1 | 0.06 | 0.07 | 0.1 | 0.15 | 0.2 | 1 |
| Fit Error % ($W^{-1} = 0.2E$) | 0.00352 | 0.00439 | 0.00772 | -0.303 | 0.759 | -0.00768 | -0.00566 | -0.00268 | 0.00653 | -0.00438 | -0.155 |
| Fit Error % ($W^{-1} = 4.0E$) | 0.00902 | 0.0226 | 0.0165 | -0.282 | 1.02 | -0.122 | -0.102 | -0.0708 | -0.0816 | -0.292 | -1.08 |
| Fit Error % ($W^{-1} = 40.0E$) | 0.0787 | 0.147 | 0.0644 | -0.344 | 1.85 | -0.761 | -0.963 | -1.08 | -1.25 | -1.57 | -2.19 |
  • 7Y swap is clearly out of line with the rest.
  • The ME method automatically adjusts the 7Y price with minimal distortion to the other tenors.

Identify bad prices

In [42]:
figure(figsize=[12, 4])
subplot(1, 2, 1)

tags = ['$W^{-1}=%.1fE$' % d for d in np.array(ef)*2]

plot(t, np.array(op).T, '.-');
xlabel('time(Y)')
title('Loss Time Distribution')
xlim(0, 25)
legend(tags)

subplot(1, 2, 2)
plot(t[:-1], np.array(fwd).T, '.-')
xlabel('time(Y)')
title('Forward Rate');
legend(tags)
xlim(0, 25)
ylim(0, .08);
  • The resulting ME distribution also identifies the bad 7Y price
  • We can adjust the $W^{-1}$ to reduce the impact of bad prices

Summary of maximum entropy

The maximum entropy method is elegant and powerful:

  • effective in dealing with partial information and incomplete markets
  • brings many types of problems into a consistent framework

Main limitations:

  • all possible outcomes have to be enumerated, so it does not scale to high dimensions
  • it is static, and does not give useful insight into the dynamics

The ME problem and solution are measure dependent

  • e.g., results are different between spot measure and forward measure

Risk Capital Allocation

Brit Hume: Fairness is not an attitude. It's a professional skill that must be developed and exercised.

Bank's trading book

  • The trading book hierarchy is well defined and stable over time
  • Leaf nodes are the lowest-level books, containing only individual trades, not sub-books

Diversification benefits

$\bs w = \sum_i \bs w_i$ is the company's whole portfolio

  • $\bs w_i$ is the notional vector of the i-th business unit
  • $\bs w_i$ could represent a single trade in the ultimate granularity

$c(\cdot)$ is a cost function of a portfolio:

  • $c(\bs w) = c\left(\sum_i \bs w_i\right)$ is the cost of the whole company
  • $c(\bs w_i)$ is the standalone cost of the i-th business unit of the company
  • risk capital measures, such as VaR and IRC, are the most important cost functions

For most risk capital metrics: $c\left(\sum_i \bs w_i\right) < \sum_i c(\bs w_i)$

  • because of the diversification/hedging benefits
  • the total diversification benefit is therefore $\sum_i c(\bs w_i) - c\left(\sum_i \bs w_i\right)$

Allocation problem

How to divide the firm's total $c(\bs w)$ to individual business in an additive manner?

  • allocated cost $\xi_i$: $c(\sum_i \bs w_i) = \sum_i \xi_i$ by definition.
  • The core of the allocation problem is the fair distribution of diversification benefits
    • $c(\bs w_i) - \xi_i$ is the diversification benefit allocated to the i-th business unit
  • In practice, allocation runs all the way down to individual trades

Importance of allocation

Business performance is measured by return on capital (ROC)

  • ROC is computed using allocated capital $\xi_i$, not the standalone capital $c(\bs w_i)$
  • certain businesses are only viable as part of a bank, because $\xi_i < c(\bs w_i)$

Business incentive is a critical consideration.

  • allocated capital is actively managed by business heads
  • allocation method directly affects trading desks' behaviour
  • allocation method should incentivize risk reduction
    • a business unit's action to reduce the firm's overall risk should also reduce its own allocation

Replacement cost

We use replacement cost as an example to illustrate allocation methods:

$$ \rc(\bs w) = \max(\bs w^T \bs v, 0)$$
  • $\bs v$ is the PV vector of all tradable instruments (per unit notional)
  • $\bs w^T \bs v$ is the portfolio's total PV

Replacement cost is the aggregated PV of the portfolio against a particular counterparty, floored at 0

  • part of the Basel 3 leveraged balance sheet capital
  • it is a measure of MtM loss if the counterparty defaults

A simple example

A bank has only three trades against a counterparty; their PVs are: $ a = 12, b = 24, c = -24$

  • The bank's total replacement cost is therefore: $$\rc(a + b + c) = \max(12 + 24 - 24, 0) = 12$$
  • Diversification benefits: $$ \rc(a) + \rc(b) + \rc(c) - \rc(a+b+c) = 12 + 24 + 0 - 12 = 24$$

Standalone allocation

The simplest allocation strategy, allocation proportional to the standalone cost:

$$ \xi_i = \frac{c\left(\sum_i \bs w_i\right)}{\sum_i c(\bs w_i)} c(\bs w_i) $$

Standalone allocation results:

| | A | B | C | Total |
|---|---|---|---|---|
| PV | 12 | 24 | -24 | 12 |
| Standalone RC | 12 | 24 | 0 | 36 |
| Standalone allocation | 4 | 8 | 0 | 12 |
  • commonly used in practice
  • not intrinsically additive; an ad hoc scaling factor is needed
  • gives wrong business incentives (more on this later)

A model for fairness

In cooperative game theory, fairness is modeled by the following axioms:

  • efficiency: allocations sum up to total
  • symmetry: if two units' contributions are identical to any arbitrary subset of units, their allocations are the same
  • linearity: allocation of sums equals to the sum of allocations
  • null: if a unit has no contribution to any subsets, it has 0 allocation

Shapley allocation is the only allocation method satisfying all four axioms:

  • A unit's allocation equals the average of its marginal contributions over all possible permutations

L. Shapley developed the method in the 1950s and received the 2012 Nobel Memorial Prize in Economics.

Shapley allocation example

| Permutation | Cumulative RC | A | B | C |
|---|---|---|---|---|
| A=12, B=24, C=-24 | 12, 36, 12 | 12 | 24 | -24 |
| A=12, C=-24, B=24 | 12, 0, 12 | 12 | 12 | -12 |
| B=24, C=-24, A=12 | 24, 0, 12 | 12 | 24 | -24 |
| C=-24, B=24, A=12 | 0, 0, 12 | 12 | 0 | 0 |
| B=24, A=12, C=-24 | 24, 36, 12 | 12 | 24 | -24 |
| C=-24, A=12, B=24 | 0, 0, 12 | 0 | 12 | 0 |
| Averages | | 10 | 16 | -14 |

Shapley allocation is additive by construction

  • no need for any ad hoc scaling factors

Shapley allocation is computationally expensive:

  • requires Monte Carlo simulation for any non-trivial allocation problem
  • e.g. the total number of permutations for 10 units is $10! = 3,628,800$
  • a brute-force sketch follows below
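
A brute-force sketch for the replacement cost example; `shapley` and `rc` are local helpers, and enumeration is only feasible for a handful of units:

```python
from itertools import permutations

def shapley(pvs, cost):
    """Average each unit's marginal contribution over all permutations."""
    perms = list(permutations(pvs))
    alloc = dict.fromkeys(pvs, 0.0)
    for perm in perms:
        total, prev = 0.0, 0.0
        for name in perm:
            total += pvs[name]         # add the unit to the coalition
            cur = cost(total)
            alloc[name] += cur - prev  # its marginal contribution
            prev = cur
    return {n: a / len(perms) for n, a in alloc.items()}

rc = lambda pv: max(pv, 0.0)           # replacement cost of a total PV
print(shapley({'A': 12., 'B': 24., 'C': -24.}, rc))
# {'A': 10.0, 'B': 16.0, 'C': -14.0} -- matches the averages above
```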

Associativity

Associativity means a top-down application of the allocation yields identical results to a bottom-up application:

Neither standalone nor Shapley allocation is associative:

| | allocation to [A] | allocation to [BC] |
|---|---|---|
| Top-down | 12 | 0 |
| Standalone bottom-up | 4 | 8 |
| Shapley bottom-up | 10 | 2 |

Associativity is critical in practice:

  • Without it, the allocation algorithm is vulnerable to manipulations via trade rebooking.

The direction of attack:

  • Tweak the Shapley allocation to make it associative

Aumann-Shapley allocation

A continuous limit of the Shapley allocation with allocation units being trades with infinitesimal notionals.

$$\bs u^T = \int_0^1 \frac{\partial c(q\bs w)}{\partial (q \bs w)} dq \iff u_k = \int_0^1 \frac{\partial c(q\bs w)}{\partial (q w_k)} dq $$
  • $\bs u$ is the allocation per unit notional of instruments
  • the allocation to individual trades in the i-th unit are $\bs u \odot \bs w_i$
  • the allocation to a portfolio is therefore $\xi_i = \bs u^T \bs w_i$.
  • Associative by construction: inherited from vector summation

A-S allocation is much more efficient to compute than Shapley allocation.

The big blender

Conceptually the A-S allocation, $\bs u^T = \int_0^1 \frac{\partial c(q\bs w)}{\partial (q \bs w)} dq $, is a big blender:

  • all organizational information are lost
  • the allocation is computed from a homogenous soup of trades

Additivity of Aumann-Shapley

The A-S allocation is additive by construction:

$$\begin{aligned} \sum_i \xi_i &= \sum_i \bs u^T \bs w_i = \bs u^T \bs w \\ &= \int_0^1 \frac{\partial c(q\bs w)}{\partial (q \bs w)} \bs w \; dq = \int_0^1 \frac{\partial c(q\bs w)}{\partial (q \bs w)} \frac{\partial (q \bs w)}{\partial q} \; dq \\ &= \int_0^1 d c(q\bs w) = c(\bs w) - c(\bs 0) \end{aligned}$$
  • it is customary to define $c(\bs 0) = 0$, i.e., an empty portfolio has zero risk or capital cost.

Euler allocation

If the cost function is homogeneous, i.e., $c(q \bs w) = q c(\bs w)$, the A-S allocation reduces to the Euler allocation:

$$ \bs u^T = \int_0^1 \frac{\partial c(q\bs w)}{\partial (q \bs w)} dq = \int_0^1 \frac{\partial c(\bs w)} { \partial \bs w} dq = \frac{ \partial c(\bs w)} {\partial \bs w} $$
  • In practice, most risk capital metrics are homogeneous
  • Euler allocation is by trades' marginal contribution
  • A-S/Euler allocations are computationally efficient (a small sketch follows)
    • no need for Monte Carlo
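
A minimal sketch of the Euler allocation for a homogeneous cost; $c(\bs w) = \sqrt{\bs w^T C \bs w}$ below is a VaR-like stand-in, and the covariance matrix is made up for illustration:

```python
import numpy as np

C = np.array([[1., .65, -.9],        # hypothetical covariance of unit PnLs
              [.65, 1., -.9],
              [-.9, -.9, 1.]])
w = np.array([1., 1., 1.])           # notionals of three businesses

cost = lambda w: np.sqrt(w @ C @ w)  # homogeneous: cost(q w) = q cost(w)
u = C @ w / cost(w)                  # analytic gradient dc/dw
xi = u * w                           # Euler allocation per business
print(xi, xi.sum(), cost(w))         # allocations add up to the total cost
```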

Euler allocation of replacement cost

$$ \bs u^T = \frac{\partial \rc(\bs w)}{\partial \bs w} = \frac{\partial \max(\bs w^T \bs v, 0)}{\partial \bs w} = \ind(\bs w^T \bs v > 0) \bs v^T $$
  • $\frac{d}{dx} \max(x, 0) = \ind(x > 0)$, where $\ind(\cdot)$ is the indicator function
  • the allocation is trade's MtM if the total PV is positive, otherwise 0
  • it is indeed associative, the sum of allocation to B and C is 0
| | A | B | C | Total |
|---|---|---|---|---|
| A-S/Euler allocation | 12 | 24 | -24 | 12 |

Perverse incentives

  • A-S allocation could produce perverse incentives in practice
  • A consequence of using infinitesimally sized allocation units
  • Fairness depends on organization

Consider replacement cost of a bank with two trading desks X and Y:

| Day | X PV | Y PV | Bank's RC | A-S/Euler Allocation |
|---|---|---|---|---|
| 1 | 12 | -10 | 2 | X=12, Y=-10 |
| 2 | 12 | -14 | 0 | X=0, Y=0 |
  • Y gets penalized for reducing the bank’s overall RC to 0
  • X gets a big reduction in capital allocation without doing anything

What's wrong with Shapley?

  • The associativity is broken by the two permutations that violate the bank's organizational constraint.
  • If we remove the two offending permutations from the averaging, the Shapley allocation becomes associative.
| Permutation | Cumulative RC | A | B | C |
|---|---|---|---|---|
| A=12, B=24, C=-24 | 12, 36, 12 | 12 | 24 | -24 |
| A=12, C=-24, B=24 | 12, 0, 12 | 12 | 12 | -12 |
| B=24, C=-24, A=12 | 24, 0, 12 | 12 | 24 | -24 |
| C=-24, B=24, A=12 | 0, 0, 12 | 12 | 0 | 0 |
| B=24, A=12, C=-24 (inadmissible) | 24, 36, 12 | 12 | 24 | -24 |
| C=-24, A=12, B=24 (inadmissible) | 0, 0, 12 | 0 | 12 | 0 |
| Averages of all | | 10 | 16 | -14 |
| Averages of admissibles | | 12 | 15 | -15 |

Constrained Shapley

A unit’s allocation equals its average incremental contribution, where the average is taken over all the permutations that are admissible under the organizational constraints.

Admissible permutations:

  • All permutations are admissible for nodes with the same parent
  • A branch (i.e. a node and all of its descendants) has to be permuted as a whole.
[Figure: examples of admissible vs. inadmissible permutations of the book hierarchy]
  • mixing trades from different leaf nodes also leads to inadmissible permutations (a sketch of C-Shapley by enumeration follows)
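
A sketch of C-Shapley by enumeration; `admissible_perms` permutes siblings freely but keeps every branch contiguous (tuples denote branches, strings denote leaf books). Applied to the earlier example with [BC] as one desk, it reproduces the "averages of admissibles" row:

```python
from itertools import permutations

def admissible_perms(tree):
    """All admissible orderings of the leaves of a book hierarchy."""
    if isinstance(tree, str):            # a leaf book
        return [[tree]]
    out = []
    for order in permutations(tree):     # permute the children
        seqs = [[]]
        for child in order:              # expand each child's own orderings
            seqs = [s + t for s in seqs for t in admissible_perms(child)]
        out.extend(seqs)
    return out

def c_shapley(pvs, cost, tree):
    perms = admissible_perms(tree)
    alloc = dict.fromkeys(pvs, 0.0)
    for perm in perms:
        total, prev = 0.0, 0.0
        for name in perm:
            total += pvs[name]
            cur = cost(total)
            alloc[name] += cur - prev    # marginal contribution
            prev = cur
    return {n: a / len(perms) for n, a in alloc.items()}

rc = lambda pv: max(pv, 0.0)
print(c_shapley({'A': 12., 'B': 24., 'C': -24.}, rc, ('A', ('B', 'C'))))
# {'A': 12.0, 'B': 15.0, 'C': -15.0}
```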

Correct incentives

The constrained Shapley allocation is additive and associative by construction, and it gives the right incentives:

| Day | X PV | Y PV | Bank's RC | C-Shapley Allocation |
|---|---|---|---|---|
| 1 | 12 | -10 | 2 | X=7, Y=-5 |
| 2 | 12 | -14 | 0 | X=6, Y=-6 |
  • C-Shapley allocation gives the correct incentive for Y to reduce the overall bank’s RC.

C-Shapley features

Advantages of C-Shapley:

  • additive and associative by construction
  • gives the right business incentive

However, it is still computationally expensive:

  • Impossible to exhaust all admissible permutations
  • Monte Carlo permutation of large number of trades is challenging

Constrained Aumann-Shapley (CAS) is a continuous extension of C-Shapley:

  • computationally efficient
  • equivalent to C-Shapley on all nodes, different only in the allocation to individual trades

Business incentives

We take the following simple bank as an example:

The covariance matrix between the returns of the three businesses is:

| | A | B | C |
|---|---|---|---|
| A | 1 | 0.65 | -0.9 |
| B | 0.65 | 1 | -0.9 |
| C | -0.9 | -0.9 | 1 |

We then scale the size of business C and see what happens to the firm's total VaR and its VaR allocations.

Allocation comparison

  • Yellow region: increasing C reduces the firm's overall VaR
  • Standalone allocation fails to recognize and reward hedges
  • Euler allocation fails to distribute diversification benefits
  • Euler allocation is unstable
  • Only C-S/CAS recognizes the main driver of the risk

Allocation summary

| Criteria | C-S/CAS | A-S/Euler | Shapley | Standalone |
|---|---|---|---|---|
| Associativity | yes | yes | no | no |
| Negative allocations for hedges | yes | yes | yes | no |
| Stability | good | poor | good | best |
| Predict marginal changes | good | best | good | poor |
| Distribute diversification benefits | yes | no | yes | yes |
| Recognize main risk driver | best | good | best | poor |
| Computational speed | good | good | poor | best |

C-Shapley/CAS allocation:

  • stands out with the best set of features
  • a candidate for a universal risk capital allocation method
    • applicable to a wide variety of risk capital metrics in banks
    • applicable to the risk allocations of multiple trading strategies

Achieve the impossible

Changing the allocation methodology is almost mission impossible:

  • zero-sum: a desk's gain is another desk's loss
  • it affects the interests and livelihood of every business unit
  • difficult to build consensus amongst business units with conflicting interests

The only way to implement the change is to:

  • establish a theoretically sound methodology that is impossible to argue against (C-Shapley/CAS)
  • explain the methodology to every business unit, and ask business heads to document their objections
    • "I don't like my allocation" is not a good objection
  • the process takes time and effort

[optional]

Constrained Aumann-Shapley

  • Constrained Aumann-Shapley (CAS) is a continuous generalization of the C-Shapley allocation
  • Conceptually it is a small blender that works within the organizational boundaries

CAS allocation

The allocation per unit notional for trades in the portfolio B, conditioned on an admissible permutation is:

$$ \bs u^T(\bs w_B |\bs w_A) = \int_0^1 \frac{\partial c(\bs w_A + q\bs w_B)}{\partial (q \bs w_B)} dq $$

The unconditional CAS allocation per unit notional for leaf node B is:

$$\bs u^T(\bs w_B) = \mathbb{E}\left[\bs u^T(\bs w_B | \bs w_A)\right]$$

where:

  • the expectation is taken over all admissible permutations (i.e., over all $\bs w_A$)
  • the allocation for each trade in B is therefore $\bs u(\bs w_B) \odot \bs w_B$
  • the allocation for the portfolio B is $\bs u^T(\bs w_B) \bs w_B$

CAS features

  • Identical to C-Shapley for all nodes; differs only in the allocation to individual trades
  • Additive, for each admissible permutation:
$$\begin{aligned} \bs u^T(\bs w_B | \bs w_A) \bs w_B &= \int_0^1 \frac{\partial c(\bs w_A + q\bs w_B)}{\partial (q \bs w_B)} \bs w_B dq = \int_0^1 \frac{\partial c(\bs w_A + q\bs w_B)}{\partial (q \bs w_B)} \frac{\partial (q \bs w_B)}{\partial q} dq \\ &= \int_0^1 d c(\bs w_A + q \bs w_B) = c(\bs w_A + \bs w_B) - c (\bs w_A) \end{aligned}$$
  • Associative: follows from the associativity of C-Shapley

We will show later that CAS is:

  • Computationally efficient
  • Gives the right business incentives

Separability

The cost function is separable if

$$ \frac{\partial c(\bs w_A + q\bs w_B)}{\partial (q \bs w_B)} = \bs a^T(\bs w_A + q\bs w_B) S $$
  • $\bs a$ is a vector that depends only on portfolio-level quantities
  • $S$ is a matrix that depends only on individual trades' characteristics

The vast majority of risk capital metrics are separable:

  • VaR and VaR variants like IRC, CRM
  • Expected shortfall
  • Replacement cost

Separate replacement cost

$$\begin{aligned} \frac{\partial }{\partial (q \bs w_B)} \rc(\bs w_A + q \bs w_B) &= \frac{\partial }{\partial (q \bs w_B)} \max\left((\bs w_A + q \bs w_B)^T \bs v, 0 \right) \\ &= \ind \left((\bs w_A + q \bs w_B)^T \bs v > 0\right) \bs v^T \end{aligned}$$
  • $\ind \left((\bs w_A + q \bs w_B)^T \bs v > 0\right)$ is only a function of the portfolio PV
  • $\bs v$ is a vector of individual instrument's PV (per unit notional)

Efficiency and separability

For separable metrics, CAS allocation reduces to:

$$\mathbb{E}\left[\bs u^T(\bs w_B | \bs w_A)\right] = \mathbb{E}\left[\int_0^1 \frac{\partial c(\bs w_A + q\bs w_B)}{\partial (q \bs w_B )} dq \right] = \mathbb{E}\left[\int_0^1 \bs a^T(\bs w_A + q\bs w_B) dq \right] S $$
  • Being independent of $\bs w_A$ and $q$, $S$ can be pulled out of the expectation.

With separability, CAS allocation is extremely efficient:

  1. Simulate $\bs a(\bs w_B) = \mathbb{E}\left[\int_0^1 \bs a(\bs w_A + q\bs w_B) dq\right]$
    • only sample leaf nodes' permutations
    • no need to permute and track individual trades
    • many orders of magnitude faster, millions of trades vs. thousands of leaf nodes
  2. The product $\bs a^T(\bs w_B) S$ can be done cheaply as a second step

Value at Risk

Value at risk (VaR) is a quantile measure of the portfolio's risk.

  • If a portfolio's 10-day 99% VaR is \$10M, then the probability of the portfolio losing more than \$10M over 10 days is 1%.
  • It is the most important and widely quoted risk management metric

Mathematically VaR is a quantile measure: $$\renewcommand{\rv}{\tilde{\bs v}}$$

$$ \mathbb{P}[\bs w^T \rv > \text{VaR}_\alpha] = \alpha $$
  • VaR is communicated as a positive number despite being a loss
  • We write $\text{VaR} < 0$ and use $|\text{VaR}|$ explicitly when needed
  • $\alpha$ is the quantile, like 99%
  • $\bs w$ is the portfolio's notional vector of instruments
  • $\rv$ is the PV change of a unit notional of each instrument over the 10-day period; it is a random vector
  • The equation above implicitly defines the function $\text{VaR}_\alpha(\bs w)$

Computing VaR:

  • Historical simulation: replay historical 10-day returns of all risk factors on today's portfolio
  • Model simulation: build risk factor models using historical data and simulate many scenarios from the model

Marginal contribution to VaR:

A useful relationship (checked numerically in the sketch below): $$ \frac{\partial \v}{\partial \bs w} = \mathbb{E}[\rv^T | \bs w^T \rv = \v] $$

which can be derived as:

$$\small \begin{aligned} 0 &= \frac{\partial}{\partial \bs w} \mathbb{P}\left[\bs w^T \rv > \v \right] = \frac{\partial}{\partial \bs w} \mathbb{E}\left[\ind(\bs w^T \rv - \v > 0)\right] \\ &= \mathbb{E}\left[\frac{\partial}{\partial \bs w} \ind (\bs w^T \rv - \v > 0)\right] = \mathbb{E}\left[ \delta (\bs w^T \rv - \v) \left(\rv^T - \frac{\partial \v}{\partial \bs w}\right)\right] \\ &\propto \mathbb{E}\left[\rv^T - \frac{\partial \v}{\partial \bs w} \vert \bs w^T \rv = \v\right] = \mathbb{E}\left[\rv^T \vert \bs w^T \rv = \v\right] - \frac{\partial \v}{\partial \bs w} \end{aligned}$$
  • $\ind(\cdot)$ is the indicator function, with $\frac{d}{dx}\ind(x>0) = \delta(x)$
  • $\delta(x)$ is Dirac's delta function, with $\intr f(x)\, \delta(x - c)\, dx = f(c)$
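
A Monte Carlo check of this identity on a hypothetical two-instrument Gaussian portfolio; the conditional expectation is estimated from scenarios in a narrow band around the VaR level, and all numbers below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 500_000, 0.99
cov = np.array([[1., .5], [.5, 2.]])         # hypothetical covariance of v~
v = rng.multivariate_normal([0., 0.], cov, size=n)
w = np.array([3., 1.])

pnl = v @ w
var = np.quantile(pnl, 1. - alpha)           # P[w'v > VaR] = alpha, so VaR < 0
band = np.abs(pnl - var) < 0.02 * pnl.std()  # scenarios with w'v near VaR
mc = v[band].mean(axis=0)                    # kernel estimate of E[v | w'v = VaR]

exact = cov @ w * var / (w @ cov @ w)        # Gaussian case: dVaR/dw = C w VaR/(w'C w)
print(mc, exact)                             # the two gradients agree closely
```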

Separability of VaR

The VaR can be written in a separable form using its marginal contribution:

$$\begin{aligned} \frac{\partial \v (\bs w_A + q \bs w_B)}{\partial (q \bs w_B)} &= \mathbb{E}[\rv^T | (\bs w_A + q\bs w_B)^T \rv = \v] \\ &= \bs k^T(\bs w_A + q\bs w_B) V \end{aligned}$$

The first equality can be proved similarly using the steps in the previous slide.

  • $\bs k^T$ is a Gaussian Kernel
  • $V$ is a matrix of the historical 10-day PnL for all the individual trades

Assignments

Homework