Richard Feynman: In fact, mathematics is, to a large extent, invention of better notations.
$\renewcommand{\bs}{\boldsymbol}$
$$ \begin{matrix} \bs u = \left( \begin{matrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{matrix} \right) \hspace{2cm} \bs v = \left( \begin{matrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{matrix} \right) \end{matrix} $$
We use the following notation throughout the class:
A special case of inner product:
$$ \overset{\text{Outputs}}{\longleftarrow}\overset{\downarrow \text{Inputs}} {\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}} $$
$$\begin{aligned} \bs u = A \bs v &\iff u_i = \sum_{j=1}^{n} a_{ij}v_j \\ C = AB &\iff c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = \bs a_i \cdot \bs b_j \end{aligned}$$
where $\bs a_i$ is the $i$-th row of $A$ and $\bs b_j$ is the $j$-th column of $B$.
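These index formulas are easy to verify numerically. A minimal sketch with made-up matrices (the names `A`, `v`, `B` below are illustrative only, not part of the lecture code):
import numpy as np
# made-up example data: A is 2x3, v has length 3, B is 3x2
A = np.array([[1., 2., 0.], [3., -1., 4.]])
v = np.array([2., 1., -1.])
B = np.array([[1., 0.], [2., 1.], [0., 3.]])
u = A.dot(v)
print(np.allclose(u, [sum(A[i, j]*v[j] for j in range(3)) for i in range(2)]))  # u_i = sum_j a_ij v_j
C = A.dot(B)
print(np.allclose(C[0, 1], A[0, :].dot(B[:, 1])))   # c_ij = (row i of A) . (column j of B)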
Linear function on vectors:
Any linear transformation between finite-dimensional vector spaces can be represented by a matrix multiplication, so we can write $L \bs u$ instead of $L(\bs u)$.
$\renewcommand{\id}{I}$
%pylab inline
lecture = 2
import fmt
import sympy as sp
from IPython.display import display, HTML
sp.init_printing(use_latex = True)
Populating the interactive namespace from numpy and matplotlib
a = sp.Matrix([[2, 1, -1], [-6, -2, 4], [-2, 1, 2]])
y = sp.Matrix([8, -22, -3])
X = sp.MatrixSymbol('x', 3, 1)
x1, x2, x3 = sp.symbols('x_1, x_2, x_3')
x = sp.Matrix([x1, x2, x3])
fmt.displayMath(a, fmt.joinMath('=', x, y), sep="", pre="\\scriptsize ")
Eliminate the $x_1$ terms using the first row; this operation is a linear transformation:
A = sp.MatrixSymbol('A', 3, 3)
L1 = sp.MatrixSymbol('L_1', 3, 3)
L2 = sp.MatrixSymbol('L_2', 3, 3)
l1 = sp.eye(3)
l1[1, 0] = -a[1, 0]/a[0, 0]
l1[2, 0] = -a[2, 0]/a[0, 0]
fmt.displayMath(L1*a, fmt.joinMath('=', x, L1*y), "\;,\;\;", l1*a, fmt.joinMath('=', x, l1*y), sep="",
pre="\\scriptsize ")
Use the 2nd equation (row) to eliminate the $x_2$ terms:
l2 = sp.eye(3)
a2 = l1*a
y2 = l1*y
l2[2, 1] = -a2[2, 1]/a2[1, 1]
u = l2*a2
fmt.displayMath(L2*a2, fmt.joinMath('=', x, L2*y2), "\;,\;", u, fmt.joinMath('=', x, l2*y2),
sep="", pre="\\scriptsize ")
$L_1$ and $L_2$ are both lower triangular matrices
Ui = sp.MatrixSymbol('U^{-1}', 3, 3)
U = sp.MatrixSymbol('U', 3, 3)
L = sp.MatrixSymbol('L', 3, 3)
fmt.displayMath(fmt.joinMath('=', L1, l1), fmt.joinMath('=', L2, l2), fmt.joinMath('=', U, u),
pre="\\scriptsize ")
The resulting matrix $U = L_2L_1A$ is upper triangular
A triangular matrix is easy to invert by back substitution (variable replacement)
y3 = l2*y2
a3 = l2*a2
ui = a3.inv()
fmt.displayMath(fmt.joinMath('=', Ui, ui), "\;,\;", fmt.joinMath('=', x, Ui),
fmt.joinMath('=', l2*y2, ui*y3), sep="\;", pre="\\scriptsize ")
Now we can group $L = L_1^{-1}L_2^{-1}$ and obtain the LU factorization
$$L_2 L_1 A = U \iff A = L_1^{-1} L_2^{-1} U \iff A = LU $$
l = l1.inv()*l2.inv()
fmt.displayMath(fmt.joinMath('=', L, l), fmt.joinMath('=', U, a3), fmt.joinMath('=', L*U, l*a3),
pre="\\scriptsize ")
Gaussian elimination does not work if zeros (pivots) appear on the diagonal of the matrix; the rows must then be permuted:
$$ A = P \cdot L \cdot U $$
where the matrix $P$ represents the row permutation. The permutation (pivoting) also improves numerical stability:
from scipy.linalg import lu
def displayMultiple(fs):
    # render a list of sympy objects side by side in an HTML table
    tl = ['$' + sp.latex(tc) + '$' for tc in fs]
    r = '''
<table border="0"><tr>'''
    for v in tl:
        r += "<td>" + v + "</td>"
    r += "</tr></table>"
    return r
a = sp.Matrix([[0, 3, 1, 2], [4, 0, -3, 1], [-3, 1, 0, 2], [9, 2, 5, 0]])
p, l, u = map(lambda x: sp.Matrix(x), lu(a))
Pi = sp.MatrixSymbol('P^{-1}', 4, 4)
A = sp.MatrixSymbol('A', 4, 4)
P = sp.MatrixSymbol('P', 4, 4)
fmt.displayMath(fmt.joinMath('=', A, a), fmt.joinMath('=', P, p), pre="\\scriptsize ")
fmt.displayMath(sp.Eq (Pi*A, p.inv()*a), pre="\\scriptsize ")
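In practice one rarely forms $P$, $L$ and $U$ explicitly; `scipy.linalg.lu_factor` and `lu_solve` wrap the pivoted factorization and the triangular solves. A small sketch reusing the $4\times 4$ matrix above (the right-hand side `b` is made up for illustration):
from scipy.linalg import lu_factor, lu_solve
an = np.array(a.tolist(), dtype=float)   # the 4x4 matrix a from the cell above
b = np.array([1., 2., 3., 4.])           # hypothetical right-hand side
lu_piv = lu_factor(an)                   # pivoted LU factorization
xsol = lu_solve(lu_piv, b)               # forward/back substitution
print(np.allclose(an.dot(xsol), b))      # True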
a.k.a. Cholesky factorization
The most important and ubiquitous matrix in quantitative finance:
The covariance matrix is:
$$V = \mathbb{E}[(\bs {\tilde r} - \bar{\bs r})(\bs {\tilde r} - \bar{\bs r})^T] = \mathbb{E}[\bs{\tilde r} \bs{\tilde r}^T] - \bar{\bs r}\bar{\bs r}^T $$
Covariance of linear combinations of factors:
$$\begin{array}{l} \text{cov}(\bs x^T \bs r, \bs y^T \bs r) &= \mathbb{E}[(\bs x^T \bs r)(\bs r^T \bs y)] - \mathbb{E}[\bs x^T \bs r]\mathbb{E}[\bs r^T \bs y]\\ &= \bs x^T \mathbb{E}[\bs r \bs r^T] \bs y - \bs x^T \bar{\bs r}\bar{\bs r}^T \bs y = \bs x^T V \bs y \end{array}$$
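This bilinearity is easy to check numerically, since the sample covariance is itself bilinear. A small sketch with made-up factor returns (the names `rs`, `xw`, `yw` below are illustrative only):
import numpy as np
np.random.seed(0)
# made-up covariance and 1,000 simulated factor return samples
V0 = np.array([[.04, .01, .00], [.01, .09, .02], [.00, .02, .16]])
rs = np.random.multivariate_normal([0.01, 0.02, 0.0], V0, size=1000)
V = np.cov(rs, rowvar=False)                       # sample covariance matrix
xw, yw = np.array([1., -2., 0.5]), np.array([0.3, 0.3, 0.4])
lhs = np.cov(rs.dot(xw), rs.dot(yw))[0, 1]         # cov of the two linear combinations
print(np.allclose(lhs, xw.dot(V).dot(yw)))         # True: equals x^T V y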
$\renewcommand{\Sigma}{\mathcal{S}}$
Positive definite:
Both covariance and correlation matrices are symmetric (semi) positive definite (SPD):
import pandas as pd
f3 = pd.read_csv('data/f3.csv', parse_dates=[0]).set_index('Date').sort_index()
fig = figure(figsize=[12, 4])
ax1 = fig.add_subplot(121)
f3.plot(title='Historical Prices', ax=ax1);
ax2 = fig.add_subplot(122)
r = np.log(f3).diff()
r.plot(title='Historical Returns', ax=ax2);
Correlation and covariance matrix of weekly returns:
cm = r.corr()
cv = r.cov()
fmt.displayDFs(cm, cv*1e4, headers=['Correlation', 'Covariance'], fmt="4f")
If $A$ is a symmetric (semi) positive definite (SPD) matrix
Cholesky decomposition is unique:
Previous correlation matrix:
lcm = np.linalg.cholesky(cm)
lcv = np.linalg.cholesky(cv)
fmt.displayMath(sp.Matrix(lcm).evalf(4),
fmt.joinMath('=', sp.Matrix(lcm.T).evalf(4), sp.Matrix(lcm.dot(lcm.T)).evalf(4)),
sep="", pre="\scriptsize")
Previous covariance matrix:
fmt.displayMath(sp.Matrix(lcv).evalf(4),
fmt.joinMath('=', sp.Matrix(lcv.T).evalf(4), sp.Matrix(lcv.dot(lcv.T)).evalf(4)),
sep="", pre="\scriptsize")
An SPD matrix $A$ and its Cholesky decomposition $L$ can be partitioned as:
s_A, a11, A12, A21, A22 = sp.symbols("A, a_{11} A_{21}^T A_{21} A_{22}")
s_L, s_LT, l11, L12, L12T, L22, L22T = sp.symbols("L L^T l_{11} L_{21} L_{21}^T L_{22} L_{22}^T")
A = sp.Matrix([[a11, A12], [A21, A22]])
L = sp.Matrix([[l11, 0], [L12, L22]])
LT = sp.Matrix([[l11, L12T], [0, L22T]])
fmt.displayMath(fmt.joinMath('=', s_A, A), fmt.joinMath('=', s_L, L), fmt.joinMath('=', s_LT, LT)
, pre='\\scriptsize')
From $A = LL^T$:
fmt.displayMath(fmt.joinMath('=', A, L*LT), pre='\\scriptsize')
We immediately have:
fmt.displayMath(fmt.joinMath('=', l11, sp.sqrt(a11)), fmt.joinMath('=', L12, A21/l11),
fmt.joinMath('=', A22 - L12*L12T, L22*L22T), pre='\\scriptsize')
Note that $L_{22}$ is the Cholesky decomposition of the smaller matrix of $A_{22} - \frac{1}{a_{11}}A_{21}A_{21}^T$.
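This recursion translates directly into code. A minimal educational sketch (checked against the correlation matrix `cm` computed earlier; in practice one would just call `np.linalg.cholesky`):
def chol_recursive(A):
    # Cholesky via the block partition: peel off the first row/column,
    # then recurse on the Schur complement A_22 - A_21 A_21^T / a_11
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    L[0, 0] = np.sqrt(A[0, 0])
    if n == 1:
        return L
    L[1:, 0] = A[1:, 0] / L[0, 0]                   # L_21 = A_21 / l_11
    S = A[1:, 1:] - np.outer(L[1:, 0], L[1:, 0])    # Schur complement
    L[1:, 1:] = chol_recursive(S)                   # L_22
    return L
print(np.allclose(chol_recursive(cm.values), np.linalg.cholesky(cm.values)))  # True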
Ubiquitous in quantitative Finance
Example: correlated n-dimensional Geometric Brownian motion:
$$ \frac{dx^k_t}{x^k_t} = u^k dt + \sigma^k dw^k_t, \;\;\;\;dw_t^j\cdot dw_t^k = \rho_{jk} dt $$
In vector form: $$ d \bs x = X \bs u dt + X \Sigma d \bs w, \;\;\; d\bs w d\bs w^T = C dt$$
where $X$ is a diagonal matrix of $x_i$, and $\Sigma$ is a diagonal matrix of $\sigma_i$ and $C$ is the correlation matrix of $\rho_{ij}$
Draw the discretized increments as $\delta \bs w = L \bs z \sqrt{\delta t}$, where $\bs z$ is a vector of independent standard normal draws and $L$ is the Cholesky factor of $C$
$\delta \bs w$ have the desired correlation: $$\mathbb{E}[\delta \bs w \delta \bs w^T] = \mathbb{E}[L \bs z \bs z^T L^T] \delta t = L \mathbb{E}[\bs z \bs z^T] L^T \delta t = LL^T \delta t = C \delta t$$
Equivalently, we can draw $\Sigma \delta \bs w = (\Sigma L) \bs z\sqrt{\delta t}$, where $\Sigma L$ is the Cholesky factor of the covariance matrix $\Sigma C \Sigma$
# the code below ignores the drifts and its correction
e = np.random.normal(size=[3, 20000])
dw = (lcm.dot(e)).T
dws = dw*np.diag(cv)
wcm = np.exp(np.cumsum(dws, 0))
figure(figsize=[12, 4])
subplot(1, 2, 1)
dw = (lcm.dot(e)).T
dws = dw*np.diag(cv)
wcm = np.exp(np.cumsum(dws, 0))
plot(wcm)
legend(cm.index, loc='best');
title('$L \epsilon$')
subplot(1, 2, 2)
dw2 = (lcm.T.dot(e)).T
dws2 = dw2*np.diag(cv)
wcm = np.exp(np.cumsum(dws2, 0))
plot(wcm)
legend(cm.index, loc='best');
title(('$L^T \epsilon$'));
$L \bs \epsilon$ and $L^T \bs \epsilon$ are different and give different correlations:
df1 = pd.DataFrame(np.corrcoef(dws.T), columns=cm.index, index=cm.index)
df2 = pd.DataFrame(np.corrcoef(dws2.T), columns=cm.index, index=cm.index)
fmt.displayDFs(df1, df2, headers=['$L \epsilon$', '$L^T \epsilon$'])
In practice, we work with thousands of risk factors
Very difficult to keep large correlation matrices semi positive definite (SPD)
Dimensionality reduction is required when dealing with a very large number of factors.
The complexity of a numerical algorithm is stated as an order of magnitude, often in big-O notation:
Most common numerical linear algebra algorithms are of complexity of $O(n^3)$
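A rough timing sketch of the $O(n^3)$ scaling (the sizes below are arbitrary; actual ratios depend on BLAS threading and cache effects, but doubling $n$ should multiply the time by roughly 8):
import time
for n in (500, 1000, 2000):
    M = np.random.normal(size=[n, n])
    b = np.random.normal(size=n)
    t0 = time.time()
    np.linalg.solve(M, b)            # LU factorization + triangular solves
    print(n, round(time.time() - t0, 3), 'seconds')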
Morpheus: The Matrix is a system, Neo.
$$f(\bs x) = f(x_1, ..., x_n)$$
$$\renewcommand{\p}{\partial}\bs y(\bs x) = [y_1(\bs x), ..., y_n(\bs x)]^T$$
Expression | Results | Special Cases |
---|---|---|
$\frac{\p(a \bs u + b \bs v)}{\p \bs x}$ | $a \frac{\p{\bs u}}{\p \bs x} + b\frac{\p{\bs v}}{\p \bs x}$ | $\frac{\p{\bs c}}{\p \bs x} = O, \frac{\p{\bs x}}{\p \bs x} = \id$ |
$\frac{\p{A \bs u}}{\p \bs x}$ | $A\frac{\p{\bs u}}{\p \bs x}$ | $\frac{\p{A \bs x}}{\p \bs x} = A, \frac{\p{\bs x^T A}}{\p \bs x} = A^T$ |
$\frac{\p{y \bs u}}{\p \bs x}$ | $y \frac{\p{\bs u}}{\p \bs x} + \bs u \frac{\p{y}}{\p \bs x}$ | - |
$\frac{\p \bs{u}^T A \bs v}{\p \bs x} $ | $\bs u^T A \frac{\p{\bs v}}{\p \bs x} + \bs v^T A^T \frac{\p{\bs u}}{\p \bs x} $ | $\frac{\p \bs{x}^T A \bs x}{\p \bs x} = \bs x^T (A + A^T) $, $\frac{\p (\bs u^T \bs v)}{\p \bs x} =\bs{u}^T\frac{\p \bs{v}}{\p \bs x} + \bs{v}^T\frac{\p \bs{u}}{\p \bs x}$ |
$\frac{\p{\bs g(\bs u})}{\p \bs x}$ | $\frac{\p{\bs g}}{\p \bs u} \frac{\p{\bs u}}{\p \bs x} $ | $\frac{\p \bs y}{\p \bs x}\frac{\p \bs x}{\p \bs y} = \id$ , $\frac{\p \bs z}{\p \bs y}\frac{\p \bs y}{\p \bs x} \frac{\p \bs x} {\p \bs z}= \id$, multi-step chain rules. |
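These identities can be spot-checked with finite differences. A small sketch verifying $\frac{\p (\bs x^T A \bs x)}{\p \bs x} = \bs x^T(A + A^T)$ (the matrix `Am` and point `xv` below are made up for illustration):
Am = np.array([[1., 2., 0.], [0., 3., 1.], [4., 0., 2.]])   # made-up matrix
xv = np.array([0.5, -1.0, 2.0])                             # made-up point
h = 1e-6
f = lambda xx: xx.dot(Am).dot(xx)                           # f(x) = x^T A x
grad_fd = np.array([(f(xv + h*np.eye(3)[i]) - f(xv)) / h for i in range(3)])
print(np.allclose(grad_fd, xv.dot(Am + Am.T), atol=1e-4))   # True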
Powerful mean/variance portfolio theory can be expressed succinctly using linear algebra and matrix calculus.
Suppose there are $n$ risky assets on the market, with random return vector $\tilde{\bs r}$ whose covariance matrix is $V$,
Sharpe ratio of a portfolio $\bs w$: $$s(\bs w) = \frac{\bs w^T \bs f}{\sqrt{\bs w^T V \bs w}}$$
Sharpe ratio is invariant under:
Given the view $\bs f$, the optimal portfolio $\bs w$ to express the view is:
which solves the portfolio $\bs w^*$ with the maximum Sharpe ratio under the view $\bs f$:
Why do we take the covariance matrix $V$ as a constant, but treat the expected return $\bs f$ as a variable?
The optimal portfolio can be solved analytically using Lagrange multiplier and matrix calculus:
$$\begin{eqnarray} l &=& \bs w^TV \bs w - 2 \lambda (\bs f^T \bs w - 1) \\ \frac{\partial l}{\partial \bs w^T} &=& 2 V \bs w - 2 \lambda \bs f = \bs 0 \\ \bs w &=& \lambda V^{-1} \bs f \end{eqnarray}$$
plug it into $\bs f^T \bs w = 1$:
$$ \bs f^T(\lambda V^{-1} \bs f) = 1 \iff \lambda = \frac{1}{\bs f^T V^{-1} \bs f}$$ $$\bs w^*(\bs f) = \frac{V^{-1}\bs f}{\bs f^T V^{-1} \bs f} \propto V^{-1}\bs f$$
$\bs w^*(\bs f)$ is also known as the characteristic portfolio for the forecast $\bs f$.
The optimal portfolio is unchanged (up to overall scaling) if $\bs f$ is scaled by a scalar $a$:
$$\bs w^*(a\bs f) = \frac{1}{a} \bs w^*(\bs f)$$
Therefore, only the relative sizes of excess returns are important
A benchmark portfolio $\bs w_b$ is usually an index portfolio used to measure the performance of active portfolio management.
$\bs w_m$ is the market portfolio, as defined in CAPM
Identical returns: $\bs f \propto \bs e = [1, 1, \dots, 1]^T$:
Beta to a benchmark portfolio: $\bs f \propto \bs \beta_b = \frac{V\bs w_b}{\sigma_b^2}$:
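A quick numerical sanity check of the last case (using the covariance matrix `cv` estimated above and a made-up benchmark weight vector `wb`): the characteristic portfolio of the benchmark betas recovers the benchmark itself.
Vn = cv.values
wb = np.array([0.6, 0.3, 0.1])                     # hypothetical benchmark weights
beta_b = Vn.dot(wb) / wb.dot(Vn).dot(wb)           # betas of the assets to wb
w_star = np.linalg.solve(Vn, beta_b)               # proportional to V^{-1} f
print(np.allclose(w_star / w_star.sum(), wb / wb.sum()))   # True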
Given the following covariance matrix estimates and excess return forecasts:
er = np.array([.05, .02, .01]).T
flat_r = np.array([.01, .01, .01]).T
df_er = pd.DataFrame(np.array([er, flat_r]).T*100, columns=["Forecast (%)", "Naive (%)"], index = f3.columns).T
fmt.displayDFs(cv*1e4, df_er, headers=['Covariance', 'Expected Excess Return'])
The (normalized) optimal portfolio for the given forecast is:
cvi = np.linalg.inv(cv)
w = cvi.dot(er.T)/er.T.dot(cvi).dot(er)
w = w/np.sum(w)
df_er.loc['Optimal Portfolio (O)', :] = w
w2 = cvi.dot(flat_r.T)/er.T.dot(cvi).dot(flat_r)
w2 = w2/np.sum(w2)
df_er.loc['Min Vol Portfolio (C)', :] = w2
fmt.displayDF(df_er[-2:], "4g")
SPY | GLD | OIL | |
---|---|---|---|
Optimal Portfolio (O) | 0.8333 | 0.395 | -0.2283 |
Min Vol Portfolio (C) | 0.5007 | 0.5433 | -0.044 |
The simulated Sharpe ratios of all $1 portfolios fully invested in risky assets:
rnd_w = np.random.uniform(size=[3, 10000]) - 0.5
rnd_w = np.divide(rnd_w, np.sum(rnd_w, 0))
rnd_r = er.dot(rnd_w)
rnd_vol = np.sqrt([p.dot(cv).dot(p.T) for p in rnd_w.T])
vol_o = sqrt(w.dot(cv).dot(w))
r_o = er.dot(w)
plot(rnd_vol, rnd_r, 'g.')
plot([0, 10*vol_o], [0, 10*r_o], 'k');
xlim(0, .06)
ylim(-.01, .08)
plot(vol_o, r_o, 'ro')
text(vol_o-.001, r_o+.003, 'O', size=20)
r_c = er.dot(w2)
vol_c = sqrt(w2.dot(cv).dot(w2))
plot(vol_c, r_c, 'r>')
text(vol_c-.002, r_c-.008, 'C', size=20);
legend(['Portfolios', 'Optimal Sharpe Ratio'], loc='best')
ylabel('Portfolio Expected Excess Return')
xlabel('Portfolio Return Volatility')
title('Sharpe Ratios of $1 portfolios');
From any portfolio, we can back out its implied excess return forecast $\bs f$:
Consider the market portfolio: the market collectively believes that $\bs f \propto \bs \beta_m$:
$$\mathbb{E}[\tilde{\bs r}] - r_0 = \bs \beta_m \left(r_m - r_0\right)$$
Estimating expected returns from historical data is difficult
The market implied return is a much better return estimate than historical data.
Suppose we are given the following portfolio:
w = np.array([10, 5, 5])
vb = w.dot(cv).dot(w)
ir = cv.dot(w)/vb
df = pd.DataFrame(np.array([w, ir])*100, index=["$ Position", "Implied Return %"],
columns = ["SPY", "GLD", "OIL"])
fmt.displayDF(df[:1], "4g")
SPY | GLD | OIL | |
---|---|---|---|
$ Position | 1,000 | 500 | 500 |
We can compute its implied return forecast as:
fmt.displayDF(df[1:], "2f")
SPY | GLD | OIL | |
---|---|---|---|
Implied Return % | 4.73 | 2.50 | 8.04 |
Does the investor really have so much confidence in OIL?
Robert Heinlein: Throughout history, poverty is the normal condition of man.
Mean-variance optimization is very powerful, but there are potential pitfalls in practice:
nt = 1000
es = np.random.normal(size=[2, nt])
rho = .999999
e4 = rho/np.sqrt(2)*es[0, :] + rho/np.sqrt(2)*es[1,:] + np.sqrt(1-rho*rho)*np.random.normal(size=[1, nt])
es = np.vstack([es, e4])
cor = np.corrcoef(es)
cor1 = np.copy(cor)
cor1[0, 1] = cor1[1, 0] = cor[0, 1] + .00002
sd = np.eye(3)
np.fill_diagonal(sd, np.std(r))
cov = sd.dot(cor).dot(sd.T)
cov1 = sd.dot(cor1).dot(sd.T)
e, v = np.linalg.eig(np.linalg.inv(cov))
er = v[:, 2]/10
df_cov = pd.DataFrame(cov*1e4, index=f3.columns, columns=f3.columns)
df_cov1 = pd.DataFrame(cov1*1e4, index=f3.columns, columns=f3.columns)
pf = pd.DataFrame(np.array([er]), columns=f3.columns, index=['Expected Return'])
covi = np.linalg.inv(cov)
pf.loc['Optimal Portfolio', :] = covi.dot(er.T)/np.sum(covi.dot(er.T))
fmt.displayDFs(df_cov, pf, headers=["Covariance", "Optimized Portfolio"], fmt="4f")
A few days later, there is a tiny change in the covariance matrix, but:
pf1 = pd.DataFrame(np.array([er]), columns=f3.columns, index=['Expected Return'])
covi1 = np.linalg.inv(cov1)
pf1.loc['Optimal Portfolio', :] = covi1.dot(er.T)/np.sum(covi1.dot(er.T))
fmt.displayDFs(df_cov1, pf1, headers=["Covariance", "Optimized Portfolio"], fmt="4f")
the optimal portfolio is totally different. How is this possible?
Consider the following linear system $A\bs x = \bs y$, and its solution:
a = np.array([[1, 2], [2, 3.999]])
x = sp.MatrixSymbol('x', 2, 1)
y = np.array([4, 7.999])
fmt.displayMath(fmt.joinMath('=', sp.Matrix(a)*x, sp.Matrix(y)),
fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(a, y), 4))))
A small perturbation on vector $\bs y$:
z = np.copy(y)
z[1] += .002
fmt.displayMath(fmt.joinMath('=', sp.Matrix(a)*x, sp.Matrix(z)),
fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(a, z), 4))))
A small perturbation on matrix $A$:
b = np.copy(a)
b[1, 1] += .003
fmt.displayMath(fmt.joinMath('=', sp.Matrix(b)*x, sp.Matrix(y)),
fmt.joinMath('=', x, sp.Matrix(np.round(np.linalg.solve(b, y), 4))))
A vector norm is a measure of the magnitude of a vector:
Vectors with unit norms:
The matrix norm is defined to be the largest factor by which the linear transformation can stretch a vector:
$$\Vert A\Vert = \max_{\bs u \ne 0}\frac{\Vert A\bs u\Vert }{\Vert \bs u\Vert }$$
The matrix norm definition depends on the choice of vector norm. Only the L1 and L$\infty$ matrix norms have analytical formulas:
The propagation of errors in a linear system $\bs y = A\bs x$ with invertible $A$:
$$\begin{array}\\ \Vert d \bs y\Vert &= \Vert A d \bs x\Vert \le \Vert A\Vert\Vert d \bs x\Vert = \Vert A\Vert \Vert \bs x\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \\ &= \Vert A\Vert \Vert A^{-1} \bs y\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \le \Vert A\Vert \Vert A^{-1} \Vert \Vert \bs y\Vert \frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert } \end{array}$$
$$\frac{\Vert d \bs y\Vert }{\Vert \bs y\Vert } \le \Vert A\Vert\Vert A^{-1}\Vert\frac{\Vert d \bs x\Vert }{\Vert \bs x\Vert}$$
What if we change the matrix itself? i.e. given $AB = C$, how would $B$ change under a small change in $A$ while holding $C$ constant?
We can write any $\delta A = \dot{A} \epsilon$ and the resulting $\delta B = \dot{B} \epsilon$:
$$\begin{array} \\ (A + \delta A) (B + \delta B) &= (A + \dot{A} \epsilon ) (B + \dot{B} \epsilon) = C \\ AB + (\dot{A}B + A\dot{B})\epsilon + \dot{A}\dot{B}\epsilon^2 &= C \\ (\dot{A}B + A\dot{B})\epsilon + \dot{A}\dot{B}\epsilon^2 &= 0 \end{array}$$
Now we collect the first order terms of $\epsilon$, $\dot{A}B + A\dot{B} = 0$:
$$\begin{array} \\ \dot{B} &= -A^{-1}\dot{A}B \\ \delta B &= -A^{-1}\delta A B \\ \Vert \delta B \Vert &= \Vert A^{-1}\delta A B \Vert \le \Vert A^{-1} \Vert \Vert \delta A \Vert \Vert B \Vert \\ \frac{\Vert \delta B \Vert}{\Vert B \Vert} &\le \Vert A^{-1} \Vert \Vert \delta A \Vert = \Vert A^{-1} \Vert \Vert A \Vert \frac{\Vert \delta A \Vert}{\Vert A \Vert} \end{array}$$
We reach the same conclusion of $k(A) = \Vert A^{-1} \Vert \Vert A \Vert$ for a small change in $A$ under the linear system $AB = C$.
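The bound can be illustrated numerically with the near-singular $2\times 2$ system from the perturbation example above (a sketch; the matrix and perturbation are the same values used earlier):
an = np.array([[1, 2], [2, 3.999]])
yv = np.array([4, 7.999])
k2 = np.linalg.norm(an, 2) * np.linalg.norm(np.linalg.inv(an), 2)
print(np.isclose(k2, np.linalg.cond(an, 2)))          # True: k(A) = ||A|| ||A^-1||
dy = np.array([0, .002])                              # the small change in y used earlier
x0 = np.linalg.solve(an, yv)
x1 = np.linalg.solve(an, yv + dy)
rel_out = np.linalg.norm(x1 - x0) / np.linalg.norm(x0)
rel_in = np.linalg.norm(dy) / np.linalg.norm(yv)
print(rel_out <= k2 * rel_in)                         # True: amplification within the bound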
Consider the ill-conditioned matrices from previous examples:
V = sp.MatrixSymbol('V', 3, 3)
Vi = sp.MatrixSymbol('V^{-1}', 3, 3)
fmt.displayMath(fmt.joinMath('=', V, sp.Matrix(cov*1e4).evalf(4)),
fmt.joinMath('=', Vi, sp.Matrix(cov*1e4).inv().evalf(5)), pre="\\scriptsize")
A = sp.MatrixSymbol('A', 2, 2)
Ai = sp.MatrixSymbol('A^{-1}', 2, 2)
fmt.displayMath(fmt.joinMath('=', A, sp.Matrix(a)),
fmt.joinMath('=', Ai, sp.Matrix(a).inv().evalf(4)), pre="\\scriptsize")
Their condition numbers are large because of the large elements in the inverses:
fmt.displayDF(pd.DataFrame([[np.linalg.cond(x, n) for n in (1, 2, inf)] for x in [a, cov*1e4]],
columns = ["L-1", "L-2", "L-$\infty$"], index=['Condition number $A$', 'Condition number $V$']),
"4g")
L-1 | L-2 | L-$\infty$ | |
---|---|---|---|
Condition number $A$ | 3.599e+04 | 2.499e+04 | 3.599e+04 |
Condition number $V$ | 4.198e+06 | 2.853e+06 | 4.198e+06 |
Orthogonal transformation is unconditionally stable:
$$\Vert Q \bs u\Vert_2^2 = (Q \bs u)^T(Q \bs u) = \bs u^T Q^TQ \bs u = \bs u^T \bs u = \Vert \bs u \Vert_2^2$$
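A quick numerical check of this property (a sketch; `q` below is a random orthogonal matrix obtained from a QR factorization of made-up data):
q, _ = np.linalg.qr(np.random.normal(size=[5, 5]))    # q has orthonormal columns
uv = np.random.normal(size=5)
print(np.isclose(np.linalg.norm(q.dot(uv)), np.linalg.norm(uv)))   # norm is preserved
print(np.isclose(np.linalg.cond(q, 2), 1.0))                       # condition number is 1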
Required reading:
Highly recommended reading:
Deflating Sharpe Ratio: http://www.davidhbailey.com/dhbpapers/deflated-sharpe.pdf
Homework: