$$ \renewcommand{\bs}{\boldsymbol} \begin{matrix} \bs u = \left( \begin{matrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{matrix} \right) \hspace{2cm} \bs v = \left( \begin{matrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{matrix} \right) \end{matrix} $$
$$ \overset{\text{Outputs}}{\longleftarrow}\overset{\downarrow \text{Inputs}} {\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}} $$
$$\begin{array}{rl} \bs u = A \bs v &\iff u_i = \sum_{j=1}^{n} a_{ij}v_j \\ C = AB &\iff c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = \bs a_i \cdot \bs b_j \end{array}$$
Linear function on vectors:
Any linear transformation between finite dimensional vector spaces can be represented by a matrix multiplication; therefore we can write $L \bs u$ instead of $L(\bs u)$.
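As a quick sanity check, here is a minimal numpy sketch (the names `A_demo` and `v_demo` are only for this illustration) verifying that the component formula $u_i = \sum_j a_{ij} v_j$ agrees with the built-in matrix product:

import numpy as np
# illustrative 3x3 matrix and vector; names are not used elsewhere in the notebook
A_demo = np.array([[2., 1., -1.], [-6., -2., 4.], [-2., 1., 2.]])
v_demo = np.array([1., 2., 3.])
u_builtin = A_demo.dot(v_demo)                       # u = A v
u_manual = np.array([sum(A_demo[i, j]*v_demo[j] for j in range(3)) for i in range(3)])
assert np.allclose(u_builtin, u_manual)              # component formula matches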
%pylab inline
lecture = 2
import fmt
import sympy as sp
from IPython.display import display
sp.init_printing(use_latex = True)
a = sp.Matrix([[2, 1, -1], [-6, -2, 4], [-2, 1, 2]])
y = sp.Matrix([8, -22, -3])
X = sp.MatrixSymbol('x', 3, 1)
x1, x2, x3 = sp.symbols('x_1, x_2, x_3')
x = sp.Matrix([x1, x2, x3])
fmt.displayMath(a, fmt.joinMath('=', x, y), sep="", pre="\\small ")
Eliminate the $x_1$ terms using the first row; this operation is a linear transformation:
A = sp.MatrixSymbol('A', 3, 3)
L1 = sp.MatrixSymbol('L_1', 3, 3)
L2 = sp.MatrixSymbol('L_2', 3, 3)
l1 = sp.eye(3)
l1[1, 0] = -a[1, 0]/a[0, 0]
l1[2, 0] = -a[2, 0]/a[0, 0]
fmt.displayMath(L1*a, fmt.joinMath('=', x, L1*y), "\;,\;\;", l1*a, fmt.joinMath('=', x, l1*y), sep="", pre="\\small ")
Use the 2nd equation (row) to eliminate the $x_2$ terms:
l2 = sp.eye(3)
a2 = l1*a
y2 = l1*y
l2[2, 1] = -a2[2, 1]/a2[1, 1]
u = l2*a2
fmt.displayMath(L2*a2, fmt.joinMath('=', x, L2*y2), "\;,\;", u, fmt.joinMath('=', x, l2*y2), sep="", pre="\\small ")
Both $L_1$ and $L_2$ are lower triangular matrices
Ui = sp.MatrixSymbol('U^{-1}', 3, 3)
U = sp.MatrixSymbol('U', 3, 3)
L = sp.MatrixSymbol('L', 3, 3)
fmt.displayMath(fmt.joinMath('=', L1, l1), fmt.joinMath('=', L2, l2), fmt.joinMath('=', U, u), pre="\\small ")
The resulting matrix $U = L_2L_1A$ is upper triangular
A triangular matrix is easy to invert by successive variable substitution (back substitution):
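For instance, a minimal back-substitution sketch for an upper triangular system $U \bs x = \bs y$ (the helper name `back_substitute` is only for this illustration):

import numpy as np

def back_substitute(U, y):
    # solve U x = y for upper triangular U, working from the last row upwards
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:].dot(x[i+1:])) / U[i, i]
    return x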
y3 = l2*y2
a3 = l2*a2
ui = a3.inv()
fmt.displayMath(fmt.joinMath('=', Ui, ui), "\;,\;", fmt.joinMath('=', x, Ui), fmt.joinMath('=', l2*y2, ui*y3), sep="\;", pre="\\small ")
Now we can group $L = L_1^{-1}L_2^{-1}$ and obtain the LU factorization
$$L_2 L_1 A = U \iff A = L_1^{-1} L_2^{-1} U \iff A = LU $$
l = l1.inv()*l2.inv()
fmt.displayMath(fmt.joinMath('=', L, l), fmt.joinMath('=', U, a3), fmt.joinMath('=', L*U, l*a3), pre="\\small ")
$$ A = P \cdot L \cdot U $$
where the $P$ matrix represents the row permutation.
from scipy.linalg import lu
def displayMultiple(fs):
    # render a list of sympy expressions side by side in a single HTML table row
    tl = map(lambda tc: '$' + sp.latex(tc) + '$', fs)
    r = '''
<table border="0"><tr>'''
    for v in tl:
        r += "<td>" + v + "</td>"
    r += "</tr></table>"
    return r
a = sp.Matrix([[0, 3, 1, 2], [4, 0, -3, 1], [-3, 1, 0, 2], [9, 2, 5, 0]])
p, l, u = [sp.Matrix(m) for m in lu(np.array(a, dtype=float))]
Pi = sp.MatrixSymbol('P^{-1}', 4, 4)
A = sp.MatrixSymbol('A', 4, 4)
P = sp.MatrixSymbol('P', 4, 4)
fmt.displayMath(fmt.joinMath('=', A, a), fmt.joinMath('=', P, p), pre="\\small ")
fmt.displayMath(sp.Eq (Pi*A, p.inv()*a), pre="\\small ")
The complexity of a numerical algorithm is stated as an order of magnitude, often in big-O notation:
Most numerical linear algebra algorithms have complexity $O(n^3)$
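A rough, machine-dependent illustration of the cubic scaling (only a sketch): doubling the size of a dense linear solve should increase the run time by roughly a factor of eight.

import time
import numpy as np

for n in (500, 1000, 2000):
    M = np.random.rand(n, n)
    b = np.random.rand(n)
    t0 = time.perf_counter()
    np.linalg.solve(M, b)                              # O(n^3) dense solve
    print(n, round(time.perf_counter() - t0, 3))       # time grows roughly 8x per doubling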
The most important and ubiquitous matrices in quantitative finance are the covariance and correlation matrices of asset returns:
$$\begin{array}{rl}V &= \mathbb{E}[(\bs r - \bar{\bs r})(\bs r - \bar{\bs r})^T] = \mathbb{E}[\bs r \bs r^T] - \bar{\bs r}\bar{\bs r}^T \\ &= (\text{cov}(r_i, r_j)) = (\rho_{ij} \sigma_i \sigma_j)\end{array}$$
Covariance of linear combinations:
$$\begin{array}{rl} \text{cov}(\bs x^T \bs r, \bs y^T \bs r) &= \mathbb{E}[(\bs x^T \bs r)(\bs r^T \bs y)] - \mathbb{E}[\bs x^T \bs r]\mathbb{E}[\bs r^T \bs y]\\ &= \bs x^T \mathbb{E}[\bs r \bs r^T] \bs y - \bs x^T \bar{\bs r}\bar{\bs r}^T \bs y = \bs x^T V \bs y \end{array}$$
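A quick numerical check of the identity $\text{cov}(\bs x^T \bs r, \bs y^T \bs r) = \bs x^T V \bs y$ on simulated returns (the sample size and the vectors below are arbitrary choices for this illustration):

import numpy as np

r_sim = np.random.normal(size=[10000, 3])      # simulated returns: 10000 draws of 3 assets
V_sim = np.cov(r_sim, rowvar=False)            # 3x3 sample covariance matrix
x_vec = np.array([1., 2., -1.])                # arbitrary portfolio weight vectors
y_vec = np.array([0.5, 0., 1.])
lhs = np.cov(r_sim.dot(x_vec), r_sim.dot(y_vec))[0, 1]
rhs = x_vec.dot(V_sim).dot(y_vec)
assert np.allclose(lhs, rhs)                   # cov(x^T r, y^T r) == x^T V y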
Both covariance and correlation matrices are symmetric positive semi-definite (SPD):
import pandas as pd
f3 = pd.read_csv('data/f3.csv', parse_dates=[0]).set_index('Date').sort_index()
fig = figure(figsize=[12, 4])
ax1 = fig.add_subplot(121)
f3.plot(title='Historical Prices', ax=ax1);
ax2 = fig.add_subplot(122)
r = np.log(f3).diff()
r.plot(title='Historical Returns', ax=ax2);
Correlation and covariance matrix of weekly returns:
cm = r.corr()
cv = r.cov()
fmt.displayDFs(cm, cv*1e4, headers=['Correlation', 'Covariance'], fmt="4f")
[Output: the correlation matrix and the covariance matrix (×$10^{-4}$) of the weekly returns]
If $A$ is a symmetric positive semi-definite (SPD) matrix, it can be factored as $A = LL^T$ with $L$ lower triangular
The Cholesky decomposition is unique:
The Cholesky decomposition of the previous correlation matrix:
lcm = np.linalg.cholesky(cm)
lcv = np.linalg.cholesky(cv)
fmt.displayMath("\\small ", sp.Matrix(lcm).evalf(4), fmt.joinMath('=', sp.Matrix(lcm.T).evalf(4), sp.Matrix(lcm.dot(lcm.T)).evalf(4)),
sep="")
An SPD matrix $A$ and its Cholesky decomposition $L$ can be partitioned as:
s_A, a11, A12, A21, A22 = sp.symbols("A, a_{11} A_{21}^T A_{21} A_{22}")
s_L, s_LT, l11, L12, L12T, L22, L22T = sp.symbols("L L^T l_{11} L_{21} L_{21}^T L_{22} L_{22}^T")
A = sp.Matrix([[a11, A12], [A21, A22]])
L = sp.Matrix([[l11, 0], [L12, L22]])
LT = sp.Matrix([[l11, L12T], [0, L22T]])
fmt.displayMath(fmt.joinMath('=', s_A, A), fmt.joinMath('=', s_L, L), fmt.joinMath('=', s_LT, LT))
From $A = LL^T$:
fmt.displayMath(fmt.joinMath('=', A, L*LT))
We immediately have:
fmt.displayMath(fmt.joinMath('=', l11, sp.sqrt(a11)), fmt.joinMath('=', L12, A21/l11),
fmt.joinMath('=', A22 - L12*L12T, L22*L22T))
Note that $L_{22}$ is the Cholesky decomposition of the smaller matrix $A_{22} - \frac{1}{a_{11}}A_{21}A_{21}^T$.
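This partition translates directly into a recursive algorithm; a minimal sketch, assuming a positive definite input (the name `chol_recursive` is only for this illustration):

import numpy as np

def chol_recursive(A):
    # follow the partition: l11 = sqrt(a11), L21 = A21/l11,
    # then recurse on the Schur complement A22 - A21 A21^T / a11
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    L[0, 0] = np.sqrt(A[0, 0])
    if n > 1:
        L[1:, 0] = A[1:, 0] / L[0, 0]
        L[1:, 1:] = chol_recursive(A[1:, 1:] - np.outer(L[1:, 0], L[1:, 0]))
    return L

# e.g. np.allclose(chol_recursive(cm.values), np.linalg.cholesky(cm)) should hold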
The Cholesky decomposition is ubiquitous in quantitative finance
Steps in drawing correlated $n$-dimensional geometric Brownian motion paths:
$$ \frac{dx^k_t}{x^k_t} = \mu^k dt + \sigma^k dw^k_t, \;\;\;\;dw_t^j\cdot dw_t^k = \rho_{jk} dt $$
e = np.random.normal(size=[3, 20000])
figure(figsize=[12, 4])
subplot(1, 2, 1)
dw = (lcm.dot(e)).T
dws = dw*np.diag(cv)
wcm = np.exp(np.cumsum(dws, 0))
plot(wcm)
legend(cm.index, loc='best');
title('$L \epsilon$')
subplot(1, 2, 2)
dw2 = (lcm.T.dot(e)).T
dws2 = dw2*np.diag(cv)
wcm = np.exp(np.cumsum(dws2, 0))
plot(wcm)
legend(cm.index, loc='best');
title(('$L^T \epsilon$'));
df1 = pd.DataFrame(np.corrcoef(dws.T), columns=cm.index, index=cm.index)
df2 = pd.DataFrame(np.corrcoef(dws2.T), columns=cm.index, index=cm.index)
fmt.displayDFs(df1, df2, headers=['$L \epsilon$', '$L^T \epsilon$'])
[Output: sample correlation matrices of the returns simulated with $L \epsilon$ and with $L^T \epsilon$]
The resulting covariance (correlation) is: $$\mathbb{E}[L\bs \epsilon (L\bs \epsilon)^T] = \mathbb{E}[L\bs{\epsilon \epsilon}^T L^T] = LL^T = C$$
The covariance/correlation matrices are extremely useful, but they are difficult to deal with when their size grows large:
$$f(\bs x) = f(x_1, ..., x_n)$$
$$\bs{ y(x) } = [y_1(\bs x), ..., y_n(\bs x)]^T$$
Expression | Results | Special Cases
---|---|---
$\frac{\partial(a \bs u + b \bs v)}{\partial \bs x}$ | $a \frac{\partial{\bs u}}{\partial \bs x} + b\frac{\partial{\bs v}}{\partial \bs x}$ | $\frac{\partial{\bs c}}{\partial \bs x} = 0, \frac{\partial{\bs x}}{\partial \bs x} = I$
$\frac{\partial{A \bs u}}{\partial \bs x}$ | $A\frac{\partial{\bs u}}{\partial \bs x}$ | $\frac{\partial{A \bs x}}{\partial \bs x} = A, \frac{\partial{\bs x^T A}}{\partial \bs x} = A^T$
$\frac{\partial{y \bs u}}{\partial \bs x}$ | $y \frac{\partial{\bs u}}{\partial \bs x} + \bs u \frac{\partial{y}}{\partial \bs x}$ |
$\frac{\partial \bs{u}^T A \bs v}{\partial \bs x} $ | $\bs u^T A \frac{\partial{\bs v}}{\partial \bs x} + \bs v^T A^T \frac{\partial{\bs u}}{\partial \bs x} $ | $\frac{\partial \bs{x}^T A \bs x}{\partial \bs x} = (A + A^T)\bs x, \frac{\partial \bs{u}^T\bs{v}}{\partial \bs x} =\bs{u}^T\frac{\partial \bs{v}}{\partial \bs x} + \bs{v}^T\frac{\partial \bs{u}}{\partial \bs x} $
$\frac{\partial{\bs g(\bs u)}}{\partial \bs x}$ | $\frac{\partial{\bs g}}{\partial \bs u} \frac{\partial{\bs u}}{\partial \bs x} $ | chain rule over multiple steps, $\frac{\partial \bs y}{\partial \bs x}\frac{\partial \bs x}{\partial \bs y} = I$
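As a small check of one entry of the table, the identity $\frac{\partial \bs x^T A \bs x}{\partial \bs x} = (A + A^T)\bs x$ can be verified symbolically with sympy (the symbols $z_i$, $m_{ij}$ are only for this check):

import sympy as sp

z1, z2 = sp.symbols('z_1 z_2')
zv = sp.Matrix([z1, z2])
M = sp.Matrix(2, 2, sp.symbols('m_11 m_12 m_21 m_22'))
quad = (zv.T * M * zv)[0, 0]                          # scalar x^T A x
grad = sp.Matrix([sp.diff(quad, v) for v in (z1, z2)])
assert (grad - (M + M.T) * zv).expand() == sp.zeros(2, 1)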
Suppose there are $n$ risky assets on the market, with random return vector $\tilde{\bs r}$ whose covariance matrix is $V$.
Given the view $\bs f$, the optimal portfolio $\bs w$ to express the view is:
$$\min_{\bs w}\; \bs w^T V \bs w \quad \text{subject to} \quad \bs f^T \bs w = 1$$
which solves for the portfolio with the largest Sharpe ratio under the view $\bs f$.
Why do we take the covariance matrix $V$ as a constant, but treat the expected return $\bs f$ as a variable?
The optimal portfolio can be solved analytically using a Lagrange multiplier and matrix calculus:
$$\begin{array}{rl} l &= \bs w^TV \bs w - 2 \lambda (\bs f^T \bs w - 1) \\ \frac{\partial l}{\partial \bs w} &= 2 \bs w^T V - 2 \lambda \bs f^T = \bs 0^T \end{array}$$ $$ V \bs w - \lambda \bs f = 0 \iff \bs w = \lambda V^{-1} \bs f $$
Plugging it into $\bs f^T \bs w = 1$:
$$ \bs f^T(\lambda V^{-1} \bs f) = 1 \iff \lambda = \frac{1}{\bs f^T V^{-1} \bs f}$$ $$\bs w(\bs f) = \frac{V^{-1}\bs f}{\bs f^T V^{-1} \bs f}$$
$\bs w (\bs f)$ is also known as the characteristic portfolio for the forecast $\bs f$.
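A minimal helper implementing the characteristic portfolio formula above (the function name `characteristic_portfolio` is only for this sketch; the later examples compute the same quantity inline):

import numpy as np

def characteristic_portfolio(V, f):
    # w(f) = V^{-1} f / (f^T V^{-1} f); solve the system instead of forming the inverse
    Vi_f = np.linalg.solve(V, f)
    return Vi_f / f.dot(Vi_f)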
The optimal portfolio is unchanged, up to an overall scaling (leverage), if $\bs f$ is scaled by a scalar $a$:
$$\bs w(a\bs f) = \frac{1}{a} \bs w(\bs f)$$
Therefore, the return forecast $\bs f$ is only meaningful in a relative sense:
The return forecast $\bs f$ should only be interpreted in a relative sense; we use $\propto$ to make this explicit.
Identical returns: $\bs f \propto \bs e = [1, 1, \dots, 1]^T$:
Beta to a benchmark portfolio: $\bs f \propto \bs \beta_b = \frac{V\bs w_b}{\sigma_b^2}$:
From any portfolio, we can back out its implied excess return forecast $\bs f$:
Consider the market portfolio: the market collectively believes that $\bs f \propto \bs \beta_m$:
$$\mathbb{E}[\tilde{\bs r}] - r_0 = \bs \beta_m \left(\mathbb{E}[\tilde r_m] - r_0\right)$$
The implied excess return is a much better return estimate than historical data.
~500 years of data
The covariance matrix (×$10^{-4}$) estimated from historical weekly returns is:
fmt.displayDF(cv*1e4, "4g")
| | SPY | GLD | OIL |
|---|---|---|---|
| SPY | 8.043 | 0.2461 | 5.611 |
| GLD | 0.2461 | 7.288 | 3.838 |
| OIL | 5.611 | 3.838 | 22.28 |
We consider three risky assets, with the following excess return forecast:
er = np.array([.05, .02, .01]).T
df_er = pd.DataFrame(np.array([er]).T*100, columns=["Excess Return Forecast (%)"], index = f3.columns).T
fmt.displayDF(df_er, "2f")
| | SPY | GLD | OIL |
|---|---|---|---|
| Excess Return Forecast (%) | 5.00 | 2.00 | 1.00 |
The optimal portfolio for the given forecast is:
cvi = np.linalg.inv(cv)
w = cvi.dot(er.T)/er.T.dot(cvi).dot(er)
df_er.loc['Optimal Portfolio', :] = w
fmt.displayDF(df_er[-1:], "3g")
| | SPY | GLD | OIL |
|---|---|---|---|
| Optimal Portfolio | 17.6 | 8.35 | -4.83 |
Suppose we are given the following portfolio:
w = np.array([10, 5, 5])
vb = w.dot(cv).dot(w)
ir = cv.dot(w)/vb
df = pd.DataFrame(np.array([w, ir])*100, index=["$ Position", "Implied Return %"],
columns = ["SPY", "GLD", "OIL"])
fmt.displayDF(df[:1], "4g")
| | SPY | GLD | OIL |
|---|---|---|---|
| $ Position | 1,000 | 500 | 500 |
We can compute its implied return forecast as:
fmt.displayDF(df[1:], "2f")
SPY | GLD | OIL | |
---|---|---|---|
Implied Return % | 4.73 | 2.50 | 8.04 |
Does the investor really have so much confidence in OIL?
Mean-variance optimization is very powerful, but there are potential pitfalls in practice:
nt = 1000
es = np.random.normal(size=[2, nt])
rho = .999999
e4 = rho/np.sqrt(2)*es[0, :] + rho/np.sqrt(2)*es[1,:] + np.sqrt(1-rho*rho)*np.random.normal(size=[1, nt])
es = np.vstack([es, e4])
cor = np.corrcoef(es)
cor1 = np.copy(cor)
cor1[0, 1] = cor1[1, 0] = cor[0, 1] + .00002
sd = np.eye(3)
np.fill_diagonal(sd, np.std(r))
cov = sd.dot(cor).dot(sd.T)
cov1 = sd.dot(cor1).dot(sd.T)
e, v = np.linalg.eig(np.linalg.inv(cov))
er = v[:, 2]/10
df_cov = pd.DataFrame(cov*1e4, index=f3.columns, columns=f3.columns)
df_cov1 = pd.DataFrame(cov1*1e4, index=f3.columns, columns=f3.columns)
pf = pd.DataFrame(np.array([er]), columns=f3.columns, index=['Expected Return'])
covi = np.linalg.inv(cov)
pf.loc['Optimal Portfolio', :] = covi.dot(er.T)/er.T.dot(covi).dot(er)
fmt.displayDFs(df_cov, pf, headers=["Covariance", "Optimized Portfolio"], fmt="4f")
[Output: the near-singular covariance matrix and the corresponding optimized portfolio]
pf1 = pd.DataFrame(np.array([er]), columns=f3.columns, index=['Expected Return'])
covi1 = np.linalg.inv(cov1)
pf1.loc['Optimal Portfolio', :] = covi1.dot(er.T)/er.T.dot(covi1).dot(er)
fmt.displayDFs(df_cov1, pf1, headers=["Covariance", "Optimized Portfolio"], fmt="4f")
[Output: the slightly perturbed covariance matrix and the corresponding optimized portfolio, which differs substantially from the previous one]
Consider the following linear system $A\bs x = \bs y$, and its solution:
a = np.array([[1, 2], [2, 3.999]])
x = sp.MatrixSymbol('x', 2, 1)
y = np.array([4, 7.999])
fmt.displayMath(fmt.joinMath('=', sp.Matrix(a)*x, sp.Matrix(y)),
fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(a, y), 4))))
A small perturbation on vector $\bs y$:
z = np.copy(y)
z[1] += .002
fmt.displayMath(fmt.joinMath('=', sp.Matrix(a)*x, sp.Matrix(z)),
fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(a, z), 4))))
A small perturbation on matrix $A$:
b = np.copy(a)
b[1, 1] += .003
fmt.displayMath(fmt.joinMath('=', sp.Matrix(b)*x, sp.Matrix(y)),
fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(b, y), 4))))
A vector norm $\Vert \bs u \Vert$ is a measure of the magnitude of the vector:
Vectors with unit norms:
The matrix norm is defined to be the largest amount by which the linear transformation can stretch a vector:
$$\Vert A\Vert = \max_{\bs u \ne 0}\frac{\Vert A\bs u\Vert }{\Vert \bs u\Vert }$$
The matrix norm definition depends on the underlying vector norm. Only the L1 and L$\infty$ matrix norms have simple analytical formulas:
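For example, the L1 matrix norm is the maximum absolute column sum and the L$\infty$ norm is the maximum absolute row sum; a quick numpy check on a random matrix:

import numpy as np

M = np.random.normal(size=[4, 4])
assert np.isclose(np.linalg.norm(M, 1), np.abs(M).sum(axis=0).max())        # max column sum
assert np.isclose(np.linalg.norm(M, np.inf), np.abs(M).sum(axis=1).max())   # max row sum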
The propagation of errors in a linear system $\bs y = A\bs x$ with invertible $A$:
$$\begin{array}{rl} \Vert d \bs y\Vert &= \Vert A \, d \bs x\Vert \le \Vert A\Vert\Vert d \bs x\Vert = \Vert A\Vert \Vert \bs x\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \\ &= \Vert A\Vert \Vert A^{-1} \bs y\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \le \Vert A\Vert \Vert A^{-1} \Vert \Vert \bs y\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \end{array}$$
$$\frac{\Vert d \bs y\Vert }{\Vert \bs y\Vert } \le \Vert A\Vert\Vert A^{-1}\Vert\frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert}$$
What if we change the matrix itself? i.e. given $AB = C$, how would $B$ change under a small change in $A$ while holding $C$ constant?
We can write any $\delta A = \dot{A} \epsilon$ and the resulting $\delta B = \dot{B} \epsilon$:
$$\begin{array}{rl} (A + \delta A) (B + \delta B) &= (A + \dot{A} \epsilon ) (B + \dot{B} \epsilon) = C \\ AB + (\dot{A}B + A\dot{B})\epsilon + \dot{A}\dot{B}\epsilon^2 &= C \\ (\dot{A}B + A\dot{B})\epsilon + \dot{A}\dot{B}\epsilon^2 &= 0 \end{array}$$
Now we collect the first order terms of $\epsilon$, $\dot{A}B + A\dot{B} = 0$:
$$\begin{array}{rl} \dot{B} &= -A^{-1}\dot{A}B \\ \delta B &= -A^{-1}\delta A \, B \\ \Vert \delta B \Vert &= \Vert A^{-1}\delta A \, B \Vert \le \Vert A^{-1} \Vert \Vert \delta A \Vert \Vert B \Vert \\ \frac{\Vert \delta B \Vert}{\Vert B \Vert} &\le \Vert A^{-1} \Vert \Vert \delta A \Vert = \Vert A^{-1} \Vert \Vert A \Vert \frac{\Vert \delta A \Vert}{\Vert A \Vert} \end{array}$$
We reach the same conclusion: the condition number $k(A) = \Vert A^{-1} \Vert \Vert A \Vert$ bounds the error propagation for a small change in $A$ under the linear system $AB = C$.
Consider the ill-conditioned matrices from previous examples:
V = sp.MatrixSymbol('V', 3, 3)
Vi = sp.MatrixSymbol('V^{-1}', 3, 3)
fmt.displayMath(fmt.joinMath('=', V, sp.Matrix(cov*1e4).evalf(4)), fmt.joinMath('=', Vi, sp.Matrix(cov*1e4).inv().evalf(5)))
A = sp.MatrixSymbol('A', 2, 2)
Ai = sp.MatrixSymbol('A^{-1}', 2, 2)
fmt.displayMath(fmt.joinMath('=', A, sp.Matrix(a)), fmt.joinMath('=', Ai, sp.Matrix(a).inv().evalf(4)))
Their condition numbers are large because of the large elements in their inverses:
fmt.displayDF(pd.DataFrame([[np.linalg.cond(x, n) for n in (1, 2, inf)] for x in [a, cov*1e4]],
columns = ["L-1", "L-2", "L-$\infty$"], index=['Condition number $A$', 'Condition number $V$']), "4g")
| | L-1 | L-2 | L-$\infty$ |
|---|---|---|---|
| Condition number $A$ | 3.599e+04 | 2.499e+04 | 3.599e+04 |
| Condition number $V$ | 3.667e+06 | 2.52e+06 | 3.667e+06 |
An orthogonal transformation is unconditionally stable:
$$\Vert Q \bs u\Vert_2^2 = (Q \bs u)^T(Q \bs u) = \bs u^T Q^TQ \bs u = \bs u^T \bs u = \Vert \bs u \Vert_2^2$$
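A quick numerical illustration, using the orthogonal factor $Q$ from a QR factorization of a random matrix (only a sketch):

import numpy as np

Q, _ = np.linalg.qr(np.random.normal(size=[5, 5]))    # Q is orthogonal
u = np.random.normal(size=5)
assert np.isclose(np.linalg.norm(Q.dot(u)), np.linalg.norm(u))    # 2-norm is preserved
assert np.isclose(np.linalg.cond(Q), 1)                           # condition number is 1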
Required reading:
Homework: