Lecture 2: Linear Algebra

Topics

  • Review of Linear Algebra
  • Gaussian Elimination and LU Factorization
  • Cholesky decomposition
  • Matrix calculus
  • Norm and condition

Review of Linear Algebra

Richard Feynman: In fact, mathematics is, to a large extent, invention of better notations.

Vector

  • An ordered list of elements, which can be real or complex:

$\renewcommand{bs}{\boldsymbol}$

$$ \begin{matrix} \bs u = \left( \begin{matrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{matrix} \right) \hspace{2cm} \bs v = \left( \begin{matrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{matrix} \right) \end{matrix} $$

Basic vector operations

  • Vector addition: $\bs{w = u + v}$
  • Scalar multiplication: $\bs w = a \bs u$, where $a$ is a scalar

Notation

We use the following notation throughout the class:

  • column vector: $\bs {u, v, x, \beta}$
  • row vector: $\bs u^T, \bs v^T, \bs x^T, \bs \beta^T$
  • scalar: $a, b, \alpha, \beta$
  • matrix: $A, B, P$
  • random variables: $\tilde a, \tilde{\bs u}, \tilde{\bs v}^T$

Vector addition

  • Associativity: $\bs{u + (v + w) = (u + v) + w}$
  • Commutativity: $\bs{u + v = v + u}$
  • Identity: $\bs{v + 0 = v}$ for all $\bs{v}$
  • Inverse: for every $\bs{v}$, there exists a $-\bs{v}$ such that $\bs{v + (-v) = 0}$
  • Distributivity: $a \bs{(u + v)} = a \bs{u} + a \bs v, (a+b) \bs v = a \bs v + b \bs v$

Scalar multiplication

  • Associativity: $a(b \bs v) = (ab)\bs v$
  • Identity: $1 \bs v = \bs v$ for all $\bs v$

Vector space $\Omega$

  • a collection of vectors that can be added and multiplied by scalars.

Vector subspace

  • a subset of the vector space $\Omega' \subset \Omega$ that is closed under vector addition and scalar multiplication

Linear combination

  • $\bs v = a_1 \bs v_1 + a_2 \bs v_2 + ... + a_n \bs v_n$
  • Linear combinations of vectors form a subspace $\Omega_v = \text{span}(\bs{v_1, v_2, ... v_n}) \subset \Omega$
  • Linear independence: $\bs{v_1, v_2, ..., v_n}$ are linearly independent if $\bs {v = 0} \iff a_k = 0$ for all $k$
  • Basis: any set of linearly independent $\bs v_i$ that spans $\Omega_v$
  • Dimension of $\Omega_v$ is the number of vectors in any of its bases

Inner product

  • $\langle \bs u, a \bs v_1 + b \bs v_2 \rangle = a \langle \bs{u, v_1} \rangle + b \langle \bs{u, v_2} \rangle$
  • $\langle \bs{u, v} \rangle = \langle \bs{v, u} \rangle^c$
  • $\langle \bs{u, u} \rangle \ge 0$
  • $\langle \bs{u, u} \rangle = 0 \iff \bs u = \bs 0$
  • $\bs u, \bs v$ are orthogonal if $\langle \bs{u, v} \rangle = 0$

Dot product

A special case of inner product:

  • the standard inner product: $\bs{u \cdot v} = \sum_{k=1}^n u_k^c v_k$
  • the magnitude of a vector $\bs b$ is $\vert \bs b \vert = \sqrt{\bs b \cdot \bs b}$
  • the projection of vector $\bs a$ onto the direction of vector $\bs b$ is: $ a_1 = \frac{\bs a \cdot \bs b}{\vert \bs b \vert} = \vert \bs a \vert \cos(\theta) $, as illustrated in the short numerical example below
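
A minimal numpy illustration of these formulas (the vectors are arbitrary examples, not lecture data):

import numpy as np

a = np.array([3.0, 4.0, 0.0])
b = np.array([1.0, 0.0, 0.0])

dot = a.dot(b)                     # standard inner product
mag_b = np.sqrt(b.dot(b))          # |b|
proj = dot / mag_b                 # projection of a onto the direction of b
cos_theta = dot / (np.linalg.norm(a) * mag_b)
print(proj, np.linalg.norm(a) * cos_theta)   # both equal |a| cos(theta)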

Matrix

  • Represents a linear response to multiple input factors:

$$ \overset{\text{Outputs}}{\longleftarrow}\overset{\downarrow \text{Inputs}} {\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}} $$

  • Matrix addition and scalar multiplication are element-wise, similar to those for vectors

Matrix multiplication

$$\begin{aligned} \bs u = A \bs v &\iff u_i = \sum_{j=1}^{n} a_{ij}v_j \\ C = AB &\iff c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = \bs a_i \cdot \bs b_j \end{aligned}$$

where $\bs a_i$ is the $i$-th row of $A$ and $\bs b_j$ is the $j$-th column of $B$.

Matrix represents linear transformation

Linear function on vectors:

  • $L(\bs{u + v}) = L(\bs u) + L(\bs v)$
  • $L(a \bs v) = a L(\bs v)$

Any linear transformation between finite-dimensional vector spaces can be represented by a matrix multiplication; therefore we can write $L \bs u$ instead of $L(\bs u)$.

Properties of linear transformation

  • Associativity: $A(BC) = (AB)C$
  • Distributivity:
    • $A(B+C) = AB + AC$
    • $(B+C)A = BA + CA$
    • $\alpha (A+B) = \alpha A + \alpha B$
  • But not commutative in general: $AB \ne BA$

$\renewcommand{id}{I}$

Matrix definitions

  • Identity matrix $\id$: $\id A = A \id = A$
  • $A^T$ is the transpose of $A$: $a^T_{ij} = a_{ji}$
  • Symmetric matrix: $A = A^T$
  • $A^*$ is the adjoint of $A$: $a^*_{ij} = a_{ji}^c$
    • real matrix: $A^T = A^*$
    • self-adjoint (Hermitian) matrix: $A = A^*$
  • Inverse matrix: $AA^{-1} = A^{-1}A = \id$
  • Orthogonal matrix: $A^T = A^{-1} \iff AA^T = \id$

LU Factorization

Factorize: to resolve into factors.

Linear system

  • In matrix form, a linear system is $A \bs {x = y}$
  • It has a unique solution if $A$ is a full rank square matrix
In [3]:
a = sp.Matrix([[2, 1, -1], [-6, -2, 4], [-2, 1, 2]])
y = sp.Matrix([8, -22, -3])
X = sp.MatrixSymbol('x', 3, 1)
x1, x2, x3 = sp.symbols('x_1, x_2, x_3')
x = sp.Matrix([x1, x2, x3])

fmt.displayMath(a, fmt.joinMath('=', x, y), sep="", pre="\\scriptsize ")
$$\scriptsize \left(\begin{matrix}2 & 1 & -1\\-6 & -2 & 4\\-2 & 1 & 2\end{matrix}\right)\left(\begin{matrix}x_{1}\\x_{2}\\x_{3}\end{matrix}\right)=\left(\begin{matrix}8\\-22\\-3\end{matrix}\right)$$

Gaussian elimination

Eliminate the $x_1$ terms using the first row; this operation is a linear transformation:

In [4]:
A = sp.MatrixSymbol('A', 3, 3)
L1 = sp.MatrixSymbol('L_1', 3, 3)
L2 = sp.MatrixSymbol('L_2', 3, 3)
l1 = sp.eye(3)
l1[1, 0] = -a[1, 0]/a[0, 0]
l1[2, 0] = -a[2, 0]/a[0, 0]

fmt.displayMath(L1*a, fmt.joinMath('=', x, L1*y), "\;,\;\;", l1*a, fmt.joinMath('=', x, l1*y), sep="", 
                pre="\\scriptsize ")
$$\scriptsize L_{1} \left(\begin{matrix}2 & 1 & -1\\-6 & -2 & 4\\-2 & 1 & 2\end{matrix}\right)\left(\begin{matrix}x_{1}\\x_{2}\\x_{3}\end{matrix}\right)=L_{1} \left(\begin{matrix}8\\-22\\-3\end{matrix}\right)\;,\;\;\left(\begin{matrix}2 & 1 & -1\\0 & 1 & 1\\0 & 2 & 1\end{matrix}\right)\left(\begin{matrix}x_{1}\\x_{2}\\x_{3}\end{matrix}\right)=\left(\begin{matrix}8\\2\\5\end{matrix}\right)$$

Use the 2nd equation (row) to eliminate the $x_2$ terms:

In [5]:
l2 = sp.eye(3)
a2 = l1*a
y2 = l1*y
l2[2, 1] = -a2[2, 1]/a2[1, 1]
u = l2*a2
fmt.displayMath(L2*a2, fmt.joinMath('=', x, L2*y2), "\;,\;", u, fmt.joinMath('=', x, l2*y2), 
                sep="", pre="\\scriptsize ")
$$\scriptsize L_{2} \left(\begin{matrix}2 & 1 & -1\\0 & 1 & 1\\0 & 2 & 1\end{matrix}\right)\left(\begin{matrix}x_{1}\\x_{2}\\x_{3}\end{matrix}\right)=L_{2} \left(\begin{matrix}8\\2\\5\end{matrix}\right)\;,\;\left(\begin{matrix}2 & 1 & -1\\0 & 1 & 1\\0 & 0 & -1\end{matrix}\right)\left(\begin{matrix}x_{1}\\x_{2}\\x_{3}\end{matrix}\right)=\left(\begin{matrix}8\\2\\1\end{matrix}\right)$$

Both $L_1$ and $L_2$ are lower triangular matrices:

In [6]:
Ui = sp.MatrixSymbol('U^{-1}', 3, 3)
U = sp.MatrixSymbol('U', 3, 3)
L = sp.MatrixSymbol('L', 3, 3)
fmt.displayMath(fmt.joinMath('=', L1, l1), fmt.joinMath('=', L2, l2), fmt.joinMath('=', U, u), 
                pre="\\scriptsize ")
$$\scriptsize L_{1}=\left(\begin{matrix}1 & 0 & 0\\3 & 1 & 0\\1 & 0 & 1\end{matrix}\right)\;,\;\;\;L_{2}=\left(\begin{matrix}1 & 0 & 0\\0 & 1 & 0\\0 & -2 & 1\end{matrix}\right)\;,\;\;\;U=\left(\begin{matrix}2 & 1 & -1\\0 & 1 & 1\\0 & 0 & -1\end{matrix}\right)$$

The resulting matrix $U = L_2L_1A$ is upper triangular

LU factorization

A triangular matrix is easy to invert by simple substitution:

In [7]:
y3 = l2*y2
a3 = l2*a2
ui = a3.inv()
fmt.displayMath(fmt.joinMath('=', Ui, ui), "\;,\;", fmt.joinMath('=', x, Ui), 
                fmt.joinMath('=', l2*y2, ui*y3), sep="\;", pre="\\scriptsize ")
$$\scriptsize U^{-1}=\left(\begin{matrix}\frac{1}{2} & - \frac{1}{2} & -1\\0 & 1 & 1\\0 & 0 & -1\end{matrix}\right)\;\;,\;\;\left(\begin{matrix}x_{1}\\x_{2}\\x_{3}\end{matrix}\right)=U^{-1}\;\left(\begin{matrix}8\\2\\1\end{matrix}\right)=\left(\begin{matrix}2\\3\\-1\end{matrix}\right)$$

Now we can group $L = L_1^{-1}L_2^{-1}$ and obtain the LU factorization

$$L_2 L_1 A = U \iff A = L_1^{-1} L_2^{-1} U \iff A = LU $$

  • $U$ is an upper triangular matrix.
  • There are infinitely many possible LU pairs; the convention is to set the diagonal elements of $L$ to 1.
In [8]:
l =  l1.inv()*l2.inv()
fmt.displayMath(fmt.joinMath('=', L, l), fmt.joinMath('=', U, a3), fmt.joinMath('=', L*U, l*a3), 
                pre="\\scriptsize ")
$$\scriptsize L=\left(\begin{matrix}1 & 0 & 0\\-3 & 1 & 0\\-1 & 2 & 1\end{matrix}\right)\;,\;\;\;U=\left(\begin{matrix}2 & 1 & -1\\0 & 1 & 1\\0 & 0 & -1\end{matrix}\right)\;,\;\;\;L U=\left(\begin{matrix}2 & 1 & -1\\-6 & -2 & 4\\-2 & 1 & 2\end{matrix}\right)$$
  • The LU factorization is the matrix representation of Gaussian elimination
  • LU factorization can be used to compute matrix inversion
    • a triangular matrix can be inverted by simple substitution, as sketched in the example below
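
A sketch of how the factorization is used in practice: scipy.linalg.lu_factor computes the pivoted LU factorization once, and lu_solve reuses it for any right-hand side; the matrix below is the same 3x3 system solved above.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

a = np.array([[2., 1., -1.], [-6., -2., 4.], [-2., 1., 2.]])
y = np.array([8., -22., -3.])

lu_piv = lu_factor(a)       # one O(n^3) factorization
x = lu_solve(lu_piv, y)     # each subsequent solve is only O(n^2)
print(x)                    # [ 2.  3. -1.], matching the solution above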

Pivoting

Gaussian elimination breaks down if a zero appears in the pivot (diagonal) position during elimination.

  • The rows of the matrix can be permuted first, so that each pivot has the largest magnitude in its column.
  • Permuting rows is a linear operation, so it can be expressed as a matrix $P$

$$ A = P \cdot L \cdot U $$

where the $P$ matrix represents the row permutation. The permutation (pivoting) also improves the numerical stability:

In [9]:
from scipy.linalg import lu

def displayMultiple(fs) :
    tl=map(lambda tc: '$' + sp.latex(tc) + '$',fs)
    r = '''
  <table border="0"><tr>'''
    for v in tl :
        r += "<td>" + v + "</td>"
    r += "</tr></table>"
    return r

a = sp.Matrix([[0, 3, 1, 2], [4, 0, -3, 1], [-3, 1, 0, 2], [9, 2, 5, 0]])
p, l, u = map(lambda x: sp.Matrix(x), lu(a))
Pi = sp.MatrixSymbol('P^{-1}', 4, 4)
A = sp.MatrixSymbol('A', 4, 4)
P = sp.MatrixSymbol('P', 4, 4)
fmt.displayMath(fmt.joinMath('=', A, a), fmt.joinMath('=', P, p), pre="\\scriptsize ")
$$\scriptsize A=\left(\begin{matrix}0 & 3 & 1 & 2\\4 & 0 & -3 & 1\\-3 & 1 & 0 & 2\\9 & 2 & 5 & 0\end{matrix}\right)\;,\;\;\;P=\left(\begin{matrix}0.0 & 1.0 & 0.0 & 0.0\\0.0 & 0.0 & 1.0 & 0.0\\0.0 & 0.0 & 0.0 & 1.0\\1.0 & 0.0 & 0.0 & 0.0\end{matrix}\right)$$
In [10]:
fmt.displayMath(sp.Eq (Pi*A, p.inv()*a), pre="\\scriptsize ")
$$\scriptsize P^{-1} A = \left(\begin{matrix}9.0 & 2.0 & 5.0 & 0\\0 & 3.0 & 1.0 & 2.0\\4.0 & 0 & -3.0 & 1.0\\-3.0 & 1.0 & 0 & 2.0\end{matrix}\right)$$

Cholesky Decomposition

a.k.a. Cholesky Factorization

Covariance matrix

The most important and ubiquitous matrix in quantitative finance:

  • given random factors $\bs{\tilde r} = [\tilde r_1, ..., \tilde r_n]^T$ and their expectation $\bar{\bs r} = \mathbb{E}[\bs {\tilde r}]$

The covariance matrix is:

$$V = \mathbb{E}[(\bs {\tilde r} - \bar{\bs r})(\bs {\tilde r} - \bar{\bs r})^T] = \mathbb{E}[\bs{\tilde r} \bs{\tilde r}^T] - \bar{\bs r}\bar{\bs r}^T $$

  • The element $(i, j)$ in $V$ is: $\text{cov}(r_i, r_j) = \rho_{ij} \sigma_i \sigma_j$.

Covariance of linear combinations of factors:

$$\begin{aligned} \text{cov}(\bs x^T \bs{\tilde r}, \bs y^T \bs{\tilde r}) &= \mathbb{E}[(\bs x^T \bs{\tilde r})(\bs{\tilde r}^T \bs y)] - \mathbb{E}[\bs x^T \bs{\tilde r}]\,\mathbb{E}[\bs{\tilde r}^T \bs y]\\ &= \bs x^T \mathbb{E}[\bs{\tilde r} \bs{\tilde r}^T] \bs y - \bs x^T \bar{\bs r}\bar{\bs r}^T \bs y = \bs x^T V \bs y \end{aligned}$$
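
A quick Monte Carlo sanity check of $\text{cov}(\bs x^T \bs{\tilde r}, \bs y^T \bs{\tilde r}) = \bs x^T V \bs y$, using an arbitrary SPD matrix and random weights (not the lecture data):

import numpy as np

rng = np.random.default_rng(0)
V = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])   # an arbitrary SPD matrix
r = rng.multivariate_normal(mean=[1.0, 2.0, 3.0], cov=V, size=500000)

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.3, 1.0, -1.0])

sample_cov = np.cov(r.dot(x), r.dot(y))[0, 1]
print(sample_cov, x.dot(V).dot(y))    # the two numbers should agree closely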

Correlation matrix

$\renewcommand{Sigma}{\mathcal{S}}$

  • $C = (\rho_{ij})$ is the covariance matrix of the normalized factors $\bs {\tilde s} = [\frac{\tilde r_1}{\sigma_1}, ..., \frac{\tilde r_n}{\sigma_n}]^T$
  • $V = \Sigma C \Sigma $, where $\Sigma$ is a diagonal matrix of $\sigma_i$
  • all elements in a correlation matrix are within [-1, 1]

Symmetric positive definite (SPD)

Positive definite:

  • Matrix $A$ is positive definite if $\bs x^T A \bs x > 0$ for all $\bs{x \ne 0}$
  • Matrix $A$ is semi positive definite if $\bs x^T A \bs x \ge 0$ for all $\bs{x \ne 0}$
  • Positive definite does not imply every element in the matrix is positive

Both covariance and correlation matrices are symmetric (semi) positive definite (SPD):

  • $\bs x^T V \bs x = \text{cov}[\bs x^T \bs {\tilde r},\bs x^T \bs {\tilde r}] = \text{var}[\bs x^T \bs {\tilde r}] \ge 0$
  • $\bs x^T C \bs x = \text{cov}[\bs x^T \bs {\tilde s},\bs x^T \bs {\tilde s}] = \text{var}[\bs x^T \bs {\tilde s}] \ge 0$
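
In practice, a quick way to check that an estimated covariance matrix is (semi) positive definite is to inspect its eigenvalues; a small sketch with simulated returns (not the lecture data):

import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(size=(260, 3))    # e.g. 5 years of weekly returns for 3 assets
V = np.cov(returns, rowvar=False)

eigvals = np.linalg.eigvalsh(V)        # eigvalsh is designed for symmetric matrices
print(eigvals)                         # all eigenvalues should be >= 0 (up to round-off)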

Example: weekly price and returns

In [11]:
import pandas as pd
f3 = pd.read_csv('data/f3.csv', parse_dates=[0]).set_index('Date').sort_index()

fig = figure(figsize=[12, 4])
ax1 = fig.add_subplot(121)
f3.plot(title='Historical Prices', ax=ax1);

weeks_in_year = 52. #business weeks per year

ax2 = fig.add_subplot(122)
r = np.log(f3).diff()
r.plot(title='Historical Returns', ax=ax2);

Correlation and covariance matrix of weekly returns:

In [12]:
cm = r.corr()
cv = r.cov()
    
fmt.displayDFs(cm, cv*weeks_in_year, headers=['Correlation', 'Covariance'], fontsize=4, fmt="4f")
Correlation:
          SPY     GLD     OIL
  SPY  1.0000  0.0321  0.4191
  GLD  0.0321  1.0000  0.3011
  OIL  0.4191  0.3011  1.0000

Covariance (annualized):
          SPY     GLD     OIL
  SPY  0.0418  0.0013  0.0292
  GLD  0.0013  0.0379  0.0200
  OIL  0.0292  0.0200  0.1159
  • Does it make sense to compute the covariance and correlation matrix of the price levels?

Cholesky decomposition

If $A$ is a symmetric (semi) positive definite (SPD) matrix:

  • $A$ can be decomposed as $A = LL^T$, where $L$ is lower triangular
  • In other words, $U = L^T$ in $A$'s LU decomposition
  • $L$ can be viewed as the "square root" of $A$

Cholesky decomposition is not unique if the matrix is semi positive definite, but unique when it is positive definite:

  • $L$ and $A$ have the same rank
  • Beware that $A = LL^T \neq L^TL$ in general

Examples of Cholesky decomposition:

Previous correlation matrix:

In [13]:
lcm = np.linalg.cholesky(cm)
lcv = np.linalg.cholesky(cv)
fmt.displayMath(sp.Matrix(lcm).evalf(4), 
                fmt.joinMath('=', sp.Matrix(lcm.T).evalf(4), sp.Matrix(lcm.dot(lcm.T)).evalf(4)), 
                sep="", pre="\\tiny")
$$\tiny \left(\begin{matrix}1.0 & 0 & 0\\0.03215 & 0.9995 & 0\\0.4191 & 0.2878 & 0.8611\end{matrix}\right)\left(\begin{matrix}1.0 & 0.03215 & 0.4191\\0 & 0.9995 & 0.2878\\0 & 0 & 0.8611\end{matrix}\right)=\left(\begin{matrix}1.0 & 0.03215 & 0.4191\\0.03215 & 1.0 & 0.3011\\0.4191 & 0.3011 & 1.0\end{matrix}\right)$$

Previous covariance matrix:

In [14]:
fmt.displayMath(sp.Matrix(lcv).evalf(3), 
                fmt.joinMath('=', sp.Matrix(lcv.T).evalf(3), sp.Matrix(lcv.dot(lcv.T)).evalf(3)), 
                sep="", pre="\\tiny")
$$\tiny \left(\begin{matrix}0.0284 & 0 & 0\\0.000868 & 0.027 & 0\\0.0198 & 0.0136 & 0.0406\end{matrix}\right)\left(\begin{matrix}0.0284 & 0.000868 & 0.0198\\0 & 0.027 & 0.0136\\0 & 0 & 0.0406\end{matrix}\right)=\left(\begin{matrix}0.000804 & 2.46 \cdot 10^{-5} & 0.000561\\2.46 \cdot 10^{-5} & 0.000729 & 0.000384\\0.000561 & 0.000384 & 0.00223\end{matrix}\right)$$

Recursive algorithm

An SPD matrix $A$ and its Cholesky decomposition $L$ can be partitioned as:

In [15]:
s_A, a11, A12, A21, A22 = sp.symbols("A, a_{11} A_{21}^T A_{21} A_{22}")
s_L, s_LT, l11, L12, L12T, L22, L22T = sp.symbols("L L^T l_{11} L_{21} L_{21}^T L_{22} L_{22}^T")

A = sp.Matrix([[a11, A12], [A21, A22]])
L = sp.Matrix([[l11, 0], [L12, L22]])
LT = sp.Matrix([[l11, L12T], [0, L22T]])

fmt.displayMath(fmt.joinMath('=', s_A, A), fmt.joinMath('=', s_L, L), fmt.joinMath('=', s_LT, LT)
               , pre='\\scriptsize')
$$\scriptsize A=\left(\begin{matrix}a_{11} & A_{21}^T\\A_{21} & A_{22}\end{matrix}\right)\;,\;\;\;L=\left(\begin{matrix}l_{11} & 0\\L_{21} & L_{22}\end{matrix}\right)\;,\;\;\;L^{T}=\left(\begin{matrix}l_{11} & L_{21}^T\\0 & L_{22}^T\end{matrix}\right)$$

From $A = LL^T$:

In [16]:
fmt.displayMath(fmt.joinMath('=', A,  L*LT), pre='\\scriptsize')
$$\scriptsize \left(\begin{matrix}a_{11} & A_{21}^T\\A_{21} & A_{22}\end{matrix}\right)=\left(\begin{matrix}l_{11}^{2} & L_{21}^T l_{11}\\L_{21} l_{11} & L_{21} L_{21}^T + L_{22} L_{22}^T\end{matrix}\right)$$

We immediately have:

In [17]:
fmt.displayMath(fmt.joinMath('=', l11, sp.sqrt(a11)), fmt.joinMath('=', L12,  A21/l11), 
                fmt.joinMath('=', A22 - L12*L12T, L22*L22T), pre='\\scriptsize')
$$\scriptsize l_{11}=\sqrt{a_{11}}\;,\;\;\;L_{21}=\frac{A_{21}}{l_{11}}\;,\;\;\;A_{22} - L_{21} L_{21}^T=L_{22} L_{22}^T$$

Note that $L_{22}$ is the Cholesky decomposition of the smaller matrix $A_{22} - \frac{1}{a_{11}}A_{21}A_{21}^T$ (the Schur complement), so the decomposition can be computed recursively; a direct implementation is sketched below.
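
A direct, unoptimized implementation of this recursion (a sketch; production code would call np.linalg.cholesky, which wraps LAPACK):

import numpy as np

def chol_recursive(a):
    """Cholesky factor L (A = L L^T) of an SPD matrix, following the block recursion above."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    L = np.zeros_like(a)
    L[0, 0] = np.sqrt(a[0, 0])                        # l11 = sqrt(a11)
    if n == 1:
        return L
    L[1:, 0] = a[1:, 0] / L[0, 0]                     # L21 = A21 / l11
    schur = a[1:, 1:] - np.outer(L[1:, 0], L[1:, 0])  # A22 - A21 A21^T / a11
    L[1:, 1:] = chol_recursive(schur)                 # recurse on the smaller matrix
    return L

a = np.array([[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]])
print(np.allclose(chol_recursive(a), np.linalg.cholesky(a)))    # True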

Correlated Brownian motion

Ubiquitous in quantitative finance:

  • The most common processes for asset prices and risk factors

Example: correlated n-dimensional Geometric Brownian motion:

$$ \frac{dx^k_t}{x^k_t} = u^k dt + \sigma^k dw^k_t, \;\;\;\;dw_t^j\cdot dw_t^k = \rho_{jk} dt $$

In vector form: $$ d \bs x = X \bs u dt + X \Sigma d \bs w, \;\;\; d\bs w d\bs w^T = C dt$$

where $X$ is a diagonal matrix of $x_i$, $\Sigma$ is a diagonal matrix of $\sigma_i$, and $C$ is the correlation matrix of $\rho_{ij}$.

Draw correlated Brownians

Draw the discretized increments $\delta \bs w = L \bs z \sqrt{\delta t}$

  • where $\bs z$ is a vector of independent standard normal random variables
    • $\mathbb{E}[\bs z] = \bs 0$, $\mathbb{E}[\bs z \bs z^T] = I$
  • $L$ is the Cholesky decomposition of the correlation matrix $C = LL^T$

$\delta \bs w$ then has the desired correlation: $$\mathbb{E}[\delta \bs w \delta \bs w^T] = \mathbb{E}[L \bs z \bs z^T L^T] \delta t = L \mathbb{E}[\bs z \bs z^T] L^T \delta t = LL^T \delta t = C \delta t$$

Equivalently, we can draw $\Sigma \delta \bs w = (\Sigma L) \bs z\sqrt{\delta t}$, where

  • $\Sigma L$ is the Cholesky decomposition of the covariance matrix $$ C = LL^T \iff V = \Sigma C \Sigma = \Sigma L (\Sigma L)^T $$

Simulated paths

In [18]:
# the code below ignores the drifts and their corrections

nweeks = 1000
e = np.random.normal(size=[3, nweeks])
ts = np.arange(nweeks)/weeks_in_year
stdev = np.sqrt(np.diag(cv))         # weekly volatilities of the three assets

figure(figsize=[12, 4])

subplot(1, 2, 1)
dw = (lcm.dot(e)).T                  # correlate the independent draws with L
dws = dw*stdev                       # scale by the volatilities
wcm = np.exp(np.cumsum(dws, 0))
plot(ts, wcm)
xlabel("Year")
legend(cm.index, loc='best');
title('$L \\epsilon$')

subplot(1, 2, 2)
dw2 = (lcm.T.dot(e)).T               # (incorrectly) use L^T instead of L
dws2 = dw2*stdev
wcm2 = np.exp(np.cumsum(dws2, 0))
plot(ts, wcm2)
xlabel("Year")
legend(cm.index, loc='best');
title('$L^T \\epsilon$');

$L \bs \epsilon$ and $L^T \bs \epsilon$ are different and give different correlations:

In [19]:
df1 = pd.DataFrame(np.corrcoef(dws.T), columns=cm.index, index=cm.index)
df2 = pd.DataFrame(np.corrcoef(dws2.T), columns=cm.index, index=cm.index)

fmt.displayDFs(df1, df2, headers=['$L \epsilon$', '$L^T \epsilon$'], fontsize=4)
$L \epsilon$:
           SPY       GLD     OIL
  SPY   1        -0.00805  0.4677
  GLD  -0.00805   1        0.2535
  OIL   0.4677    0.2535   1

$L^T \epsilon$:
           SPY      GLD     OIL
  SPY   1        0.09084  0.4258
  GLD   0.09084  1        0.2219
  OIL   0.4258   0.2219   1

Big correlation/covariance matrices

In practice, we work with thousands of risk factors

  • Correlation matrix is easier to maintain than the covariance matrix, because it is "normalized"

It is very difficult to keep large correlation matrices semi positive definite (SPD):

  • A size of a few thousand is the practical limit
  • Small changes in a few values can invalidate the whole matrix
  • Adding new entries can be extremely difficult

Dimensionality reduction is required when dealing with a very large number of factors.

Complexity

The complexity of a numerical algorithm is stated as an order of magnitude, usually in big-O notation:

  • binary search: $O(\log(n))$
  • best comparison-based sorting algorithms: $O(n \log(n))$

Most common numerical linear algebra algorithms have complexity $O(n^3)$ (a rough timing sketch follows this list):

  • matrix multiplication: $n^2$ elements in output, each element takes $O(n)$
  • LU decomposition: $n$ diagonal elements, to zero-out each column costs $O(n^2)$
  • Cholesky decomposition: $n$ recursive steps, each takes $O(n^2)$
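
A rough timing sketch of the cubic scaling of matrix multiplication (actual numbers depend on the BLAS library and hardware): doubling $n$ should multiply the time by roughly 8.

import time
import numpy as np

rng = np.random.default_rng(0)
for n in (500, 1000, 2000):
    a = rng.normal(size=(n, n))
    b = rng.normal(size=(n, n))
    t0 = time.perf_counter()
    c = a @ b                   # O(n^3) work
    print(n, round(time.perf_counter() - t0, 3), "seconds")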

Matrix Calculus

Morpheus: The Matrix is a system, Neo.

Scalar function

$$f(\bs x) = f(x_1, ..., x_n)$$

  • Derivative to vector (Gradient): $\frac{\partial f}{\partial \bs x} = \nabla f = [\frac{\partial f}{\partial x_1}, ..., \frac{\partial f}{\partial x_n}]$
  • Note that $\frac{\partial f}{\partial \bs x}$ is always a row vector: whenever a vector or matrix appears in the denominator of a differentiation, the result is transposed.
  • Sometimes we use the notation $\frac{\partial f}{\partial \bs x^T} = \left(\frac{\partial f}{\partial \bs x}\right)^T $ to denote a column vector.

Vector function

$$\renewcommand{p}{\partial}\bs y(\bs x) = [y_1(\bs x), ..., y_n(\bs x)]^T$$

  • Derivative to vector (Jacobian matrix): $\frac{\partial \bs y}{\partial \bs x} = \left[\frac{\partial{y_1}}{\partial \bs x}, \frac{\partial{y_2}}{\partial \bs x}, \cdots, \frac{\partial{y_n}}{\partial \bs x} \right]^T$
    • $\frac{\partial \bs y}{\partial \bs x}$ is a matrix with one row per element of $\bs y$ and one column per element of $\bs x$
    • $\frac{\partial \bs y}{\partial \bs x}\frac{\partial \bs x}{\partial \bs y} = \id$, even when $\bs x$ and $\bs y$ are of different dimension.
    • sometimes we use the following notation: $\frac{\partial \bs y}{\partial \bs x^T} = \left(\frac{\partial \bs y}{\partial \bs x}\right)^T$
  • Derivative to scalar: $\frac{\partial \bs y}{\partial z} = [\frac{\partial y_1}{\partial z}, \cdots, \frac{\partial y_n}{\partial z}]^T$
    • remains a column vector

Vector differentiation cheatsheet

  • $A, a, b, \bs c$ are constants (i.e., not functions of $\bs x$)
  • $\bs {u=u(x), v=v(x)}, y=y(\bs x)$ are functions of $\bs x$
  • $O$ is the zero matrix, $I$ is the identity matrix
Expression | Result | Special cases
$\frac{\p(a \bs u + b \bs v)}{\p \bs x}$ | $a \frac{\p{\bs u}}{\p \bs x} + b\frac{\p{\bs v}}{\p \bs x}$ | $\frac{\p{\bs c}}{\p \bs x} = O$, $\frac{\p{\bs x}}{\p \bs x} = \id$
$\frac{\p{(A \bs u)}}{\p \bs x}$ | $A\frac{\p{\bs u}}{\p \bs x}$ | $\frac{\p{(A \bs x)}}{\p \bs x} = A$, $\frac{\p{(\bs x^T A)}}{\p \bs x} = A^T$
$\frac{\p{(y \bs u)}}{\p \bs x}$ | $y \frac{\p{\bs u}}{\p \bs x} + \bs u \frac{\p{y}}{\p \bs x}$ | -
$\frac{\p (\bs{u}^T A \bs v)}{\p \bs x}$ | $\bs u^T A \frac{\p{\bs v}}{\p \bs x} + \bs v^T A^T \frac{\p{\bs u}}{\p \bs x}$ | $\frac{\p (\bs{x}^T A \bs x)}{\p \bs x} = \bs x^T (A + A^T)$, $\frac{\p (\bs{u}^T\bs{v})}{\p \bs x} =\bs{u}^T\frac{\p \bs{v}}{\p \bs x} + \bs{v}^T\frac{\p \bs{u}}{\p \bs x}$
$\frac{\p{\bs g(\bs u)}}{\p \bs x}$ | $\frac{\p{\bs g}}{\p \bs u} \frac{\p{\bs u}}{\p \bs x}$ | $\frac{\p \bs y}{\p \bs x}\frac{\p \bs x}{\p \bs y} = \id$, $\frac{\p \bs z}{\p \bs y}\frac{\p \bs y}{\p \bs x} \frac{\p \bs x}{\p \bs z}= \id$ (multi-step chain rules)
  • This gives a compact notation for multivariate calculus, similar to univariate calculus; any row can be verified numerically, as in the sketch below
  • Replace $A^T$ by $A^*$ for complex matrices
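
For example, the identity $\frac{\p (\bs x^T A \bs x)}{\p \bs x} = \bs x^T(A + A^T)$ can be checked by central finite differences, with an arbitrary $A$ and $\bs x$:

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
x = rng.normal(size=4)

f = lambda v: v.dot(A).dot(v)               # f(x) = x^T A x
h = 1e-6
grad_fd = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(4)])

print(np.allclose(grad_fd, x.dot(A + A.T), atol=1e-5))    # True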

Portfolio optimization

Powerful mean/variance portfolio theory can be expressed succinctly using linear algebra and matrix calculus.

Suppose there are $n$ risky assets on the market, with random return vector $\tilde{\bs r}$ whose covariance matrix is $V$,

  • $\bs w$: a portfolio, its elements are values (dollar) invested in each asset
    • $\bs w^T \tilde{\bs r}$ is the portfolio's P&L
    • $\sigma^2 = \bs w^TV\bs w$: the variance of the portfolio P&L

Excess return forecast

  • $\bs f = \mathbb{E}[\tilde{\bs r}] - r_0$ is a vector of excess return forecast of all risky assets
    • $r_0$ is the risk free rate
  • $\bs f$ is a view, which can be from:
    • Fundamental research: earning forecasts, revenue growth etc
    • Technical and quantitative analysis
    • Your secret trading signal

Sharpe ratio

Sharpe ratio of a portfolio $\bs w$: $$s(\bs w) = \frac{\bs w^T \bs f}{\sqrt{\bs w^T V \bs w}}$$

Sharpe ratio is invariant under:

  • portfolio scaling: $s(a \bs w) = s(\bs w)$
  • leverage or deleverage: borrowing or lending money using the risk free asset
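
A minimal numerical check of the scale invariance $s(a \bs w) = s(\bs w)$ for $a > 0$, with an arbitrary two-asset covariance matrix and forecast (not the lecture estimates):

import numpy as np

V = np.array([[0.04, 0.01], [0.01, 0.09]])    # arbitrary 2-asset covariance
f = np.array([0.05, 0.03])                    # arbitrary excess return forecast
w = np.array([0.7, 0.3])

sharpe = lambda w: w.dot(f) / np.sqrt(w.dot(V).dot(w))
print(sharpe(w), sharpe(5.0 * w))             # identical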

Portfolio optimization

Given the view $\bs f$, the optimal portfolio $\bs w$ to express the view is:

  • minimize the variance of portfolio P&L (risk): $\bs w^TV\bs w$
  • while preserving a unit dollar of excess P&L: $\bs w^T \bs f = 1$

This yields the portfolio $\bs w^*$ with the maximum Sharpe ratio under the view $\bs f$:

  • $\bs w^*$ is a unique solution
  • all portfolios with optimal Sharpe ratio have the same risky asset mix
  • No need to constrain the total value, because the Sharpe ratio is invariant under scaling of the portfolio.

Why do we take the covariance matrix $V$ as a constant, but treat the expected return $\bs f$ as a variable?

Characteristic portfolio

The optimal portfolio can be solved analytically using Lagrange multiplier and matrix calculus:

$$\begin{eqnarray} l &=& \bs w^TV \bs w - 2 \lambda (\bs f^T \bs w - 1) \\ \frac{\partial l}{\partial \bs w^T} &=& 2 V \bs w - 2 \lambda \bs f = \bs 0 \\ \bs w &=& \lambda V^{-1} \bs f \end{eqnarray}$$

plug it into $\bs f^T \bs w = 1$:

$$ \bs f^T(\lambda V^{-1} \bs f) = 1 \iff \lambda = \frac{1}{\bs f^T V^{-1} \bs f}$$ $$\bs w^*(\bs f) = \frac{V^{-1}\bs f}{\bs f^T V^{-1} \bs f} \propto V^{-1}\bs f$$

$\bs w^*(\bs f)$ is also known as the characteristic portfolio for the forecast $\bs f$.

Relativity of return forecast

The optimal portfolio is essentially unchanged if $\bs f$ is scaled by a scalar $a$:

$$\bs w^*(a\bs f) = \frac{1}{a} \bs w^*(\bs f)$$

  • $\bs w^*$ and $\frac{1}{a} \bs w^*$ define the same risky asset mix, with identical Sharpe ratio

Therefore, only the relative sizes of excess returns are important

  • e.g. excess return forecasts of [1%, 2%] and [10%, 20%] for two assets result in the same optimal (characteristic) risky asset mix

Benchmark portfolio and $\bs \beta$

The benchmark portfolio $\bs w_b$ is usually an index portfolio used to measure the performance of active portfolio management.

  • The beta of a portfolio $\bs w$ to the benchmark is $$\frac{\text{cov}(\bs w^T \tilde{\bs r}, \bs w_b^T \tilde{\bs r})}{\sigma_b^2} = \frac{\bs w^T V \bs w_b}{\sigma_b^2} = \bs w^T \bs \beta_b$$
  • Define $\bs \beta_b = \frac{V\bs w_b}{\sigma^2_b} $, the vector of individual assets' betas
  • $\bs w_b^T \bs \beta_b = \frac{\bs w_b^T V\bs w_b}{\sigma^2_b} = {1}$, the benchmark portfolio itself has a unit beta

$\bs w_m$ is the market portfolio, as defined in CAPM

  • $\bs \beta_m = \frac{V\bs w_m}{\sigma^2_m}$: the betas vector to the market portfolio $\bs w_m$

Important characteristic portfolios

Identical returns: $\bs f \propto \bs e = [1, 1, ...., 1]^T$:

  • A naive view that all assets have the same excess returns (zero information)
  • $\bs e^T\bs w = 1$ means it is a portfolio of \$1 fully invested in risky assets
  • $\bs w_e = \frac{V^{-1}\bs e}{\bs e^TV^{-1}\bs e}$ has the minimum variance among all fully-invested portfolios

Beta to a benchmark portfolio: $\bs f \propto \bs \beta_b = \frac{V\bs w_b}{\sigma_b^2}$:

  • $\bs \beta_b^T \bs w_b = 1$ means the portfolio has a beta of 1 to the benchmark portfolio $\bs w_b$
  • $\bs w_{\beta} = \frac{V^{-1}\bs \beta_b}{\bs \beta_b^T V^{-1}\bs \beta_b} = \frac{\bs w_b}{\bs{\beta_b^T w_b}} = \bs w_b$, i.e., the benchmark portfolio itself
  • the benchmark portfolio itself is optimal amongst those with unit beta.

Portfolio optimization example

Given the following covariance matrix estimates and excess return forecasts:

In [20]:
er = np.array([.05, .02, .01]).T
flat_r = np.array([.01, .01, .01]).T
df_er = pd.DataFrame(np.array([er, flat_r]).T*100, columns=["Forecast (%)", "Naive (%)"], index = f3.columns).T


fmt.displayDFs(cv*1e4, df_er, fontsize=4, headers=['Covariance', 'Expected Excess Return'])
Covariance:
          SPY     GLD     OIL
  SPY  8.043   0.2461   5.611
  GLD  0.2461  7.288    3.838
  OIL  5.611   3.838   22.28

Expected Excess Return:
                  SPY  GLD  OIL
  Forecast (%)      5    2    1
  Naive (%)         1    1    1

The (normalized) optimal portfolio for the given forecast is:

In [20]:
cvi = np.linalg.inv(cv)
w = cvi.dot(er.T)/er.T.dot(cvi).dot(er)
w = w/np.sum(w)
df_er.loc['Optimal Portfolio (O)', :] = w

w2 = cvi.dot(flat_r.T)/flat_r.dot(cvi).dot(flat_r)
w2 = w2/np.sum(w2)
df_er.loc['Min Vol Portfolio (C)', :] = w2

fmt.displayDF(df_er[-2:], "4g", fontsize=4)
                           SPY     GLD      OIL
  Optimal Portfolio (O)   0.8333  0.395   -0.2283
  Min Vol Portfolio (C)   0.5007  0.5433  -0.044

Efficient Frontier

The simulated Sharpe ratios of randomly sampled $1 portfolios fully invested in risky assets:

  • the optimal portfolio O has the optimal Sharpe ratio
  • the minimum variance portfolio C has the smallest variance
  • the line starts from the origin because we use excess returns everywhere
In [21]:
rnd_w = np.random.uniform(size=[3, 10000]) - 0.5
rnd_w = np.divide(rnd_w, np.sum(rnd_w, 0))

rnd_r = er.dot(rnd_w)
rnd_vol = np.sqrt([p.dot(cv).dot(p.T) for p in rnd_w.T])

vol_o = sqrt(w.dot(cv).dot(w))
r_o = er.dot(w)
plot(rnd_vol, rnd_r, 'g.')
plot([0, 10*vol_o], [0, 10*r_o], 'k');
xlim(0, .06)
ylim(-.01, .08)

plot(vol_o, r_o, 'ro')
text(vol_o-.001, r_o+.003, 'O', size=20)
r_c = er.dot(w2)
vol_c = sqrt(w2.dot(cv).dot(w2))
plot(vol_c, r_c, 'r>')
text(vol_c-.002, r_c-.008, 'C', size=20);

legend(['Portfolios', 'Optimal Sharpe Ratio'], loc='best')

ylabel('Portfolio Expected Excess Return')
xlabel('Portfolio Return Volatility')
title('Sharpe Ratios of $1 portfolios');

Implied view of a portfolio

From any portfolio, we can back out its implied excess return forecast $\bs f$:

  • Assuming the portfolio is mean variance optimal
  • It is nothing but the betas to the portfolio
  • Only meaningful in a relative sense as well

Consider the market portfolio: the market collectively believes that $\bs f \propto \bs \beta_m$:

$$\mathbb{E}[\tilde{\bs r}] - r_0 = \bs \beta_m \left(r_m - r_0\right)$$

  • $r_m = \frac{\mathbb{E}[\bs w_m^T \bs {\tilde r}]}{\bs w_m^T \bs 1} $ is the market portfolio's expected return
  • this is exactly the CAPM.

Estimate expected returns

Estimating expected returns from historical data is difficult:

  • the historical return is not a good indicator of future performance
  • even if we assume it is, it requires a very long history, ~500 years

The market implied return is a much better return estimate than historical data.

  • This is the key insight of Black-Litterman

Implied views example

Suppose we are given the following portfolio:

In [22]:
w = np.array([10, 5, 5])
vb = w.dot(cv).dot(w)
ir = cv.dot(w)/vb

df = pd.DataFrame(np.array([w, ir])*100, index=["$ Position", "Implied Return %"], 
                  columns = ["SPY", "GLD", "OIL"])
fmt.displayDF(df[:1], "4g", fontsize=4)
SPY GLD OIL
$ Position 1,000 500 500

We can compute its implied return forecast as:

In [23]:
fmt.displayDF(df[1:], "2f", fontsize=4)
SPY GLD OIL
Implied Return % 4.73 2.50 8.04
  • note that these forecasts are only meaningful in a relative sense

Does the investor really have so much confidence in OIL?

Norm and Condition

Robert Heinlein: Throughout history, poverty is the normal condition of man.

Ill-conditioned covariance matrix

Mean-variance optimization is very powerful, but there are potential pitfalls in practice:

  • Suppose we have the following covariance matrix and excess return forecast; then we can compute the optimal portfolio.
In [24]:
nt = 1000
es = np.random.normal(size=[2, nt])

rho = .999999
e4 = rho/np.sqrt(2)*es[0, :] + rho/np.sqrt(2)*es[1,:] + np.sqrt(1-rho*rho)*np.random.normal(size=[1, nt])
es = np.vstack([es, e4])

cor = np.corrcoef(es)
cor1 = np.copy(cor)
cor1[0, 1] = cor1[1, 0] = cor[0, 1] + .00002
 
sd = np.eye(3)
np.fill_diagonal(sd, np.std(r))
cov = sd.dot(cor).dot(sd.T)
cov1 = sd.dot(cor1).dot(sd.T)

e, v = np.linalg.eig(np.linalg.inv(cov))
In [25]:
er = v[:, 2]/10

df_cov = pd.DataFrame(cov*1e4, index=f3.columns, columns=f3.columns)
df_cov1 = pd.DataFrame(cov1*1e4, index=f3.columns, columns=f3.columns)

pf = pd.DataFrame(np.array([er]), columns=f3.columns, index=['Expected Return'])
covi = np.linalg.inv(cov)
pf.loc['Optimal Portfolio', :] = covi.dot(er.T)/np.sum(covi.dot(er.T))

fmt.displayDFs(df_cov, pf, headers=["Covariance", "Optimized Portfolio"], fontsize=4, fmt="4f")
Covariance:
           SPY     GLD      OIL
  SPY    8.0234  0.5060  10.0011
  GLD    0.5060  7.2697   9.0361
  OIL   10.0011  9.0361  22.2275

Optimized Portfolio:
                       SPY     GLD     OIL
  Expected Return     0.0390  0.0342  0.0855
  Optimal Portfolio   0.2456  0.2158  0.5387

A few days later, there is a tiny change in the covariance matrix, but:

In [26]:
pf1 = pd.DataFrame(np.array([er]), columns=f3.columns, index=['Expected Return'])
covi1 = np.linalg.inv(cov1)
pf1.loc['Optimal Portfolio', :] = covi1.dot(er.T)/np.sum(covi1.dot(er.T))

fmt.displayDFs(df_cov1, pf1, headers=["Covariance", "Optimized Portfolio"], fontsize=4, fmt="4f")
Covariance:
           SPY     GLD      OIL
  SPY    8.0234  0.5062  10.0011
  GLD    0.5062  7.2697   9.0361
  OIL   10.0011  9.0361  22.2275

Optimized Portfolio:
                       SPY     GLD     OIL
  Expected Return     0.0390  0.0342  0.0855
  Optimal Portfolio   0.0462  0.0098  0.9441

The optimal portfolio is totally different. How is that possible?

Ill-conditioned linear system

Consider the following linear system $A\bs x = \bs y$, and its solution:

In [27]:
a = np.array([[1, 2], [2, 3.999]])
x = sp.MatrixSymbol('x', 2, 1)
y = np.array([4, 7.999])
fmt.displayMath(fmt.joinMath('=', sp.Matrix(a)*x, sp.Matrix(y)), 
                fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(a, y), 4))))
$$ \left(\begin{matrix}1.0 & 2.0\\2.0 & 3.999\end{matrix}\right) x=\left(\begin{matrix}4.0\\7.999\end{matrix}\right)\;,\;\;\;x=\left(\begin{matrix}2.0\\1.0\end{matrix}\right)$$

A small perturbation on vector $\bs y$:

In [28]:
z = np.copy(y)
z[1] += .002

fmt.displayMath(fmt.joinMath('=', sp.Matrix(a)*x, sp.Matrix(z)), 
                fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(a, z), 4))))
$$ \left(\begin{matrix}1.0 & 2.0\\2.0 & 3.999\end{matrix}\right) x=\left(\begin{matrix}4.0\\8.001\end{matrix}\right)\;,\;\;\;x=\left(\begin{matrix}6.0\\-1.0\end{matrix}\right)$$

A small perturbation on matrix $A$:

In [29]:
b = np.copy(a)
b[1, 1] += .003

fmt.displayMath(fmt.joinMath('=', sp.Matrix(b)*x, sp.Matrix(y)), 
                fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(b, y), 4))))
$$ \left(\begin{matrix}1.0 & 2.0\\2.0 & 4.002\end{matrix}\right) x=\left(\begin{matrix}4.0\\7.999\end{matrix}\right)\;,\;\;\;x=\left(\begin{matrix}5.0\\-0.5\end{matrix}\right)$$
  • How do we identify ill-conditioned linear systems in practice?

Vector norms

A vector norm is a measure of the magnitude of a vector:

  • Positive: $\Vert \bs u\Vert \ge 0$, $\Vert \bs u\Vert = 0 \iff \bs{u = 0}$
  • Homogeneous: $\Vert a \bs u \Vert = |a| \Vert \bs u\Vert $
  • Triangle inequality: $\Vert \bs u + \bs v\Vert \le \Vert \bs u\Vert + \Vert \bs v\Vert $

Common vector norms

  • L1: $\Vert \bs u\Vert _1 = \sum_i | u_i |$
  • L2 (Euclidean): $\Vert \bs u\Vert _2 = (\sum u_i^2)^{\frac{1}{2}} = (\bs u^T \bs u)^\frac{1}{2}$
  • Lp: $\Vert \bs u\Vert _p = (\sum | u_i |^p)^{\frac{1}{p}}$
  • L${\infty}$: $\Vert \bs u\Vert _\infty = \max(|u_1|, |u_2|, ..., |u_n|)$
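
All of these are available through np.linalg.norm via the ord argument; a quick check on a small vector:

import numpy as np

u = np.array([3.0, -4.0, 12.0])
print(np.linalg.norm(u, 1))         # L1:  3 + 4 + 12 = 19
print(np.linalg.norm(u, 2))         # L2:  sqrt(9 + 16 + 144) = 13
print(np.linalg.norm(u, np.inf))    # L-infinity: max = 12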

Vector norms comparison

Vectors with unit norms:

  • Unit L2 norm forms a perfect circle (sphere in high dimension)
  • Unit L1 and L${\infty}$ norms are square boxes
  • The differences between the norms are not significant

Matrix norms

  • Element-wise: based on treating the matrix as an elongated vector
  • Schatten norms: based on singular values
  • Vector-norm induced: the most common, introduced below

Defined to be the largest amount the linear transformation can stretch a vector:

$$\Vert A\Vert = \max_{\bs u \ne 0}\frac{\Vert A\bs u\Vert }{\Vert \bs u\Vert }$$

The induced matrix norm depends on the choice of vector norm. Only the L1 and L$\infty$ matrix norms have simple element-wise formulas:

  • L1: $\Vert A\Vert _1 = \max_{j} \sum_i |a_{ij}|$
  • L2: $\Vert A \Vert_2 =$ the largest singular value of $A$
  • L${\infty}$: $\Vert A\Vert _\infty = \max_i \sum_j |a_{ij}|$
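
np.linalg.norm also computes the induced matrix norms; the L1 and L$\infty$ formulas above can be checked against the explicit column and row sums:

import numpy as np

A = np.array([[1.0, -2.0], [3.0, 4.0]])

print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())        # max column sum: 6
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())   # max row sum: 7
print(np.linalg.norm(A, 2))                                     # largest singular value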

Norm inequalities

  • $\Vert A \bs u \Vert \le \Vert A\Vert \Vert \bs u\Vert $
  • $\Vert b A\Vert = |b| \Vert A\Vert $
  • $\Vert A + B\Vert \le \Vert A\Vert + \Vert B\Vert $
  • $\Vert AB\Vert \le \Vert A\Vert \Vert B\Vert$

Matrix condition

The propagation of errors in a linear system $\bs y = A\bs x$ with invertible $A$:

  • Consider a perturbation $\bs{ x' = x} + d\bs x$ and the corresponding $d \bs y = A\, d\bs x$:

$$\begin{aligned} \Vert d \bs y\Vert &= \Vert A\, d \bs x\Vert \le \Vert A\Vert\Vert d \bs x\Vert = \Vert A\Vert \Vert \bs x\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \\ &= \Vert A\Vert \Vert A^{-1} \bs y\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \le \Vert A\Vert \Vert A^{-1} \Vert \Vert \bs y\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \end{aligned}$$

$$\frac{\Vert d \bs y\Vert }{\Vert \bs y\Vert } \le \Vert A\Vert\Vert A^{-1}\Vert\frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert}$$

  • $k(A) = \Vert A\Vert\Vert A^{-1}\Vert$ is the condition number for the linear system $\bs y = A\bs x$, which defines the maximum possible magnification of the relative error.

Matrix perturbation

What if we change the matrix itself? i.e. given $AB = C$, how would $B$ change under a small change in $A$ while holding $C$ constant?

  • we can no longer directly compute it via matrix calculus
  • perturbation is a powerful technique for solving this type of problem

We can write any $\delta A = \dot{A} \epsilon$ and the resulting $\delta B = \dot{B} \epsilon$:

  • $\dot{A}, \dot{B}$ are matrices representing the direction of the perturbation
  • $\epsilon$ is a first order small scalar

$$\begin{aligned} (A + \delta A) (B + \delta B) &= (A + \dot{A} \epsilon ) (B + \dot{B} \epsilon) = C \\ AB + (\dot{A}B + A\dot{B})\epsilon + \dot{A}\dot{B}\epsilon^2 &= C \\ (\dot{A}B + A\dot{B})\epsilon + \dot{A}\dot{B}\epsilon^2 &= 0 \end{aligned}$$

Collecting the first order terms in $\epsilon$ gives $\dot{A}B + A\dot{B} = 0$, therefore:

$$\begin{aligned} \dot{B} &= -A^{-1}\dot{A}B \\ \delta B &= -A^{-1}\delta A\, B \\ \Vert \delta B \Vert &= \Vert A^{-1}\delta A\, B \Vert \le \Vert A^{-1} \Vert \Vert \delta A \Vert \Vert B \Vert \\ \frac{\Vert \delta B \Vert}{\Vert B \Vert} &\le \Vert A^{-1} \Vert \Vert \delta A \Vert = \Vert A^{-1} \Vert \Vert A \Vert \frac{\Vert \delta A \Vert}{\Vert A \Vert} \end{aligned}$$

We reach the same conclusion of $k(A) = \Vert A^{-1} \Vert \Vert A \Vert$ for a small change in $A$ under the linear system $AB = C$.

  • we will cover the condition number for non-square matrices in the next class.
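
A numerical check of the first order result $\delta B \approx -A^{-1}\delta A B$ on a small random system (arbitrary matrices, not the lecture data):

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = A.dot(B)

dA = 1e-6 * rng.normal(size=(3, 3))               # a small perturbation of A
B_new = np.linalg.solve(A + dA, C)                # exact B for the perturbed system
dB_first_order = -np.linalg.solve(A, dA).dot(B)   # first order prediction

err = np.abs((B_new - B) - dB_first_order).max()
print(err, np.abs(dB_first_order).max())          # the residual error is second order, much smaller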

Numerical example

Consider the ill-conditioned matrices from previous examples:

In [30]:
V = sp.MatrixSymbol('V', 3, 3)
Vi = sp.MatrixSymbol('V^{-1}', 3, 3)
fmt.displayMath(fmt.joinMath('=', V, sp.Matrix(cov*1e4).evalf(4)), 
                fmt.joinMath('=', Vi, sp.Matrix(cov*1e4).inv().evalf(5)), pre="\\scriptsize")
$$\scriptsize V=\left(\begin{matrix}8.023 & 0.506 & 10.0\\0.506 & 7.27 & 9.036\\10.0 & 9.036 & 22.23\end{matrix}\right)\;,\;\;\;V^{-1}=\left(\begin{matrix}32668.0 & 32335.0 & -27844.0\\32335.0 & 32006.0 & -27560.0\\-27844.0 & -27560.0 & 23732.0\end{matrix}\right)$$
In [31]:
A = sp.MatrixSymbol('A', 2, 2)
Ai = sp.MatrixSymbol('A^{-1}', 2, 2)
fmt.displayMath(fmt.joinMath('=', A, sp.Matrix(a)), 
                fmt.joinMath('=', Ai, sp.Matrix(a).inv().evalf(4)), pre="\\scriptsize")
$$\scriptsize A=\left(\begin{matrix}1.0 & 2.0\\2.0 & 3.999\end{matrix}\right)\;,\;\;\;A^{-1}=\left(\begin{matrix}-3999.0 & 2000.0\\2000.0 & -1000.0\end{matrix}\right)$$

Their condition numbers are large because of the large elements in the inverses:

In [32]:
fmt.displayDF(pd.DataFrame([[np.linalg.cond(x, n) for n in (1, 2, inf)] for x in [a, cov*1e4]],
             columns = ["L-1", "L-2", "L-$\infty$"], index=['Condition number $A$', 'Condition number $V$']), 
              "4g", fontsize=4)
L-1 L-2 L-$\infty$
Condition number $A$ 3.599e+04 2.499e+04 3.599e+04
Condition number $V$ 3.831e+06 2.688e+06 3.831e+06

Orthogonal transformation

Orthogonal transformation is unconditionally stable:

$$\Vert Q \bs u\Vert_2^2 = (Q \bs u)^T(Q \bs u) = \bs u^T Q^TQ \bs u = \bs u^T \bs u = \Vert \bs u \Vert_2^2$$

  • therefore by definition: $\Vert Q \Vert_2 = \Vert Q^{-1} \Vert_2 = 1$
  • $k(Q) = \Vert Q \Vert_2 \Vert Q^{-1} \Vert_2 = 1$
  • the relative error does not grow under orthogonal transformation.
  • Orthogonal transformation is extremely important in numerical linear algebra.
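
A quick check that an orthogonal matrix has condition number 1 in the L2 norm, using the Q factor from the QR decomposition of a random matrix:

import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))    # Q is orthogonal

print(np.allclose(Q.T.dot(Q), np.eye(5)))       # Q^T Q = I
print(np.linalg.cond(Q, 2))                     # 1.0 up to round-off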

Assignments

Required reading:

  • Bindel and Goodman: Chapter 4, 5.1-5.4

Highly recommended reading:

Deflating Sharpe Ratio: http://www.davidhbailey.com/dhbpapers/deflated-sharpe.pdf

Homework: