# initialize the environment
%pylab inline
import sympy as sp
import fmt
import pandas as pd
sp.init_printing(use_latex = True)
from IPython.display import display
lecture = 3
regression: noun, a return to a former or less developed state
Given an unknown scalar function $\renewcommand{bs}{\boldsymbol} y = y(\boldsymbol x)$, where the argument $\bs x$ is an $n$-dimensional vector.
We can express this problem as a least square minimization:
$$ \min_{\bs {\beta}} \Vert X \bs{\beta - y} \Vert_2$$ which can be easily solved using matrix calculus:
$$\begin{array}{l} \Vert X \bs{\beta - y} \Vert_2^2 &= (X \bs{\beta - y})^T (X \bs{\beta - y}) \\ &= \bs{\beta}^T X^T X \bs{\beta} - 2 \bs{\beta}^TX^T\bs y + \bs y^T\bs y\\ \frac{\partial}{\partial \bs \beta} \Vert X \bs{\beta - y} \Vert_2^2 &= 2 \bs \beta^T X^TX - 2\bs y^T X = \bs 0^T \\ X^TX \bs \beta &= X^T \bs y \\ \bs \beta &= (X^TX)^{-1}X^T\bs y \end{array}$$
We already learned that an invertible matrix $A$'s condition number is
$$k(A) = \Vert A \Vert \Vert A^{-1} \Vert$$
We can extend this to non-square matrices using the pseudo-inverse, following the same proof.
$$k(A) = \Vert A \Vert \Vert A^{+} \Vert$$
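As a quick numerical check (a minimal sketch, not part of the original notes; it assumes the %pylab environment above and uses an arbitrary non-square matrix), numpy's built-in 2-norm condition number agrees with the product of the norms of the matrix and its pseudo-inverse:
m_rect = np.array([[1., 2.], [3., 4.], [5., 6.]])    # arbitrary non-square example
k_builtin = np.linalg.cond(m_rect, 2)                # 2-norm condition number
k_pinv = np.linalg.norm(m_rect, 2)*np.linalg.norm(np.linalg.pinv(m_rect), 2)
print(k_builtin, k_pinv)                             # the two values agree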
The $\bs \beta$ that solves the least square problem may have large magnitude:
Ridge regression adds a magnitude penalty to the objective function of least square:
$$\begin{array}{l} l(\bs \beta) &= \Vert X \bs{\beta - y} \Vert_2^2 + \lambda \Vert W \bs \beta \Vert_2^2 \\ &= \bs{\beta}^T X^T X \bs{\beta} - 2 \bs{\beta}^TX^T\bs y + \bs y^T \bs y + \lambda \bs \beta^T W^TW \bs \beta \\ \frac{\partial l}{\partial \bs \beta} &= 2\bs \beta^T X^TX - 2\bs y^T X + 2 \lambda \bs \beta^T W^T W = \bs 0^T \\ \bs \beta &= (X^TX + \lambda W^TW)^{-1}X^T\bs y \end{array}$$
We draw many pairs of $x, y, z$ from the following linear model:
$$ y = 2x + 0.1z + 5 + \epsilon $$
We regress the vector $\bs y$ against $X = [\bs x, \bs x + .0001 \bs z, \bs 1]$:
import pandas as pd
n = 5000
# simulate from the model y = 2x + 0.1z + 5 + noise
x = np.random.normal(size=n)
z = np.random.normal(size=n)
y = 2*x + 5 + 0.1*z + np.random.normal(size=n)
# nearly collinear design matrix X = [x, x + .0001 z, 1]
xs = np.array([x, x + 0.0001*z, np.ones(len(x))]).T
q = np.eye(len(xs.T))   # penalty matrix W = I
lbd = .1                # ridge penalty lambda
# ordinary least square: beta = (X^T X)^{-1} X^T y
b_ols = np.linalg.inv(xs.T.dot(xs)).dot(xs.T).dot(y)
e_ols = np.linalg.norm(y - xs.dot(b_ols), 2)
# ridge regression: beta = (X^T X + lambda W^T W)^{-1} X^T y
b_rr = np.linalg.inv(xs.T.dot(xs) + lbd*q.T.dot(q)).dot(xs.T).dot(y)
e_rr = np.linalg.norm(y - xs.dot(b_rr), 2)
df = pd.DataFrame(np.array([b_ols, b_rr]), index=['Least square', 'Ridge regression $\\lambda=%2g$' % lbd],
columns=['$\\bs x$', '$\\bs x+.0001\\bs z$', '$\\bs 1$'])
df['$\Vert X\\bs \\beta - \\bs y \\Vert_2$'] = [e_ols, e_rr]
fmt.displayDF(df, "4g")
| | $\bs x$ | $\bs x+.0001\bs z$ | $\bs 1$ | $\Vert X\bs \beta - \bs y \Vert_2$ |
|---|---|---|---|---|
| Least square | -996 | 998 | 4.988 | 71.02 |
| Ridge regression $\lambda=0.1$ | 0.7599 | 1.257 | 4.991 | 71.37 |
eigen-: characteristic, origin: German
For a square matrix $A$ of size $n \times n$, if there exists a vector $\bs {v \ne 0}$ and a scalar $\lambda$ so that:
$$ A \bs v = \lambda \bs v $$
Eigen vector/value naming conventions:
Rewrite the equation as:
$$ (A - \lambda I) \bs {v = 0}$$
It has a non-zero solution if and only if:
$$ \det(A - \lambda I) = 0$$
The characteristic equation is a polynomial of degree $n$:
It is difficult to solve the characteristic equation directly for $\lambda_i$, especially for large $n$.
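For a small matrix the characteristic equation can still be handled symbolically; a minimal sketch with sympy (the $2 \times 2$ matrix here is just an illustration, not from the notes):
lam = sp.symbols('lambda')
m2 = sp.Matrix([[2, 1], [1, 2]])           # small illustrative symmetric matrix
char_poly = (m2 - lam*sp.eye(2)).det()     # det(A - lambda I)
display(sp.factor(char_poly))              # (lambda - 1)*(lambda - 3)
display(sp.solve(char_poly, lam))          # eigenvalues [1, 3]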
Eigen vectors of distinct eigen values are linearly independent. Suppose not, and let $\bs{v_k}$ be the first eigen vector that is a linear combination of the previous (linearly independent) ones:
$$\begin{array} \\ \bs{v_k} &= \sum_{i=1}^{k-1}c_i\bs{v_i} \\ \lambda_k \bs{v_k} &= \sum_{i=1}^{k-1}c_i\lambda_i\bs{v_i} = \lambda_k \sum_{i=1}^{k-1}c_i\bs{v_i} \\ \bs 0 &= \sum_{i=1}^{k-1}c_i(\lambda_i - \lambda_k) \bs{v_i} \end{array} $$
Since the $\lambda_i$ are distinct and $\bs{v_1}, \dots, \bs{v_{k-1}}$ are linearly independent, all $c_i$ must be zero, which makes $\bs{v_k} = \bs 0$, a contradiction.
If $A$ has $n$ distinct eigen values, we can write:
$$ A R = R \Lambda$$
$R$ is invertible because eigen vectors are all independent:
$$\begin{array} \\ R^{-1} A &= \Lambda R^{-1} \\ A^*(R^{-1})^* &= (R^{-1})^* \Lambda^* \\ \end{array}$$
If $A$ is real and symmetric: $A^* = A$:
The conclusion holds even when there are duplicated eigen values, with some care.
If $A$ is real and symmetric:
Apply Lagrange multiplier:
$$\begin{array} \\ l &= \bs u^T A \bs u - \gamma (\bs u^T \bs u - 1) \\ \frac{\partial l}{\partial \bs u} &= 2 \bs u^T A^T - 2\gamma \bs u^T = \bs 0^T \iff A \bs u = \gamma \bs u \end{array}$$
This process can be repeated to find all eigenvalues and vectors:
Square matrix $B$ and $A$ are similar if $B = P^{-1}AP$:
$$ AR = PBP^{-1} R = R \Lambda \iff B (P^{-1}R) = (P^{-1} R) \Lambda $$
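A quick numerical check of this fact (a minimal sketch, not in the original notes; $P$ is a random matrix, invertible with probability one, and the symmetric matrix is the same $4 \times 4$ example used in the QR section below):
a_sim = np.array([[5, 4, -3, 2], [4, 4, 2, -1], [-3, 2, 3, 0], [2, -1, 0, -2]])
p = np.random.normal(size=(4, 4))                 # random P
b_sim = np.linalg.inv(p).dot(a_sim).dot(p)        # B = P^{-1} A P
print(np.sort(np.linalg.eigvals(a_sim).real))
print(np.sort(np.linalg.eigvals(b_sim).real))     # same eigenvalues up to round-off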
The basic idea of computing eigen values for $A$:
orthogonal: adjective, at right angles
Any real matrix $A$ can be decomposed into $A = QR$:
QR decomposition is numerically stable
What if $A$ is not full rank?
The QR algorithm is one of the most important numerical methods of the 20th century link
Start with $A_0 = A$, then iterate:
This algorithm is unconditionally stable because only orthogonal transformations are used.
Find the eigenvalues of the following matrix:
def qr_next(a) :
    # one QR iteration: factor A_k = Q_k R_k, return A_{k+1} = R_k Q_k
    q, r = np.linalg.qr(a)
    return r.dot(q)
a = np.array([[5, 4, -3, 2], [4, 4, 2, -1], [-3, 2, 3, 0], [2, -1, 0, -2]])
A = sp.MatrixSymbol('A', 4, 4)
A1 = sp.MatrixSymbol('A_1', 4, 4)
Q = sp.MatrixSymbol('Q_0', 4, 4)
R = sp.MatrixSymbol('R_0', 4, 4)
fmt.displayMath(fmt.joinMath('=', A, sp.Matrix(a)), pre="\\scriptsize ")
QR decomposition of $A_0 = A$:
q, r = np.linalg.qr(a)
fmt.displayMath(fmt.joinMath('=', Q, sp.Matrix(q).evalf(3)), fmt.joinMath('=', R, sp.Matrix(r).evalf(3)), pre="\\scriptsize ")
$A$ at the start of the next iteration, and after 20 iterations:
ii = 20
d = np.copy(a)
for i in range(ii) :
    d = qr_next(d)
An = sp.MatrixSymbol('A_%d' % ii, 4, 4)
fmt.displayMath(fmt.joinMath('=', A1, sp.Matrix(r.dot(q)).evalf(3)), fmt.joinMath('=', An, sp.Matrix(np.round(d, 3))), pre="\\scriptsize ")
$$ H = I - \frac{2\bs u \bs u^T}{\bs u^T \bs u}$$
We show how to use the Householder transformation to perform the QR decomposition of the matrix $A$ from the previous example.
def householder(x0, e=0) :
    # Householder reflection that zeros out all but the first element of x0[e:],
    # leaving the first e rows/columns untouched
    n = len(x0)
    e1 = np.zeros(n-e)
    x = x0[e:]
    e1[0] = np.linalg.norm(x, 2)               # e1 = ||x||_2 (1, 0, ..., 0)^T
    u = x - e1
    v = np.matrix(u/np.linalg.norm(u, 2))      # unit vector u/||u||_2
    hs = np.eye(n-e) - 2*v.T*v                 # H = I - 2 u u^T / (u^T u)
    h = np.eye(n)
    h[e:,e:] = hs
    return h
x, u, e1, Q, R = sp.symbols("x, u, e_1, Q, R")
xn = sp.symbols("\Vert{x}\Vert_2")
b = a[:, 0]
c = np.zeros(len(b))
c[0] = np.linalg.norm(b, 2)
fmt.displayMath(fmt.joinMath('=', A, sp.Matrix(np.round(a))), fmt.joinMath('=', x, sp.Matrix(a[:,0])),
fmt.joinMath('=', xn, np.round(np.linalg.norm(b, 2), 3)),
fmt.joinMath('=', e1, sp.Matrix([1, 0, 0, 0])),
fmt.joinMath('=', u, sp.Matrix(a[:,0]-c).evalf(3)), pre="\\scriptsize ")
fmt.displayMath("\;")
h1 = householder(a[:, 0], 0)
H1 = sp.MatrixSymbol('H_1', 4, 4)
a1 = h1.dot(a)
fmt.displayMath(fmt.joinMath('=', H1, sp.Matrix(np.round(h1, 3))),
fmt.joinMath('=', H1*A, sp.Matrix(np.round(a1, 3))), pre="\\scriptsize ")
Continue to zero out the lower triangle:
h2 = householder(a1[:, 1], 1)
H2 = sp.MatrixSymbol('H_2', 4, 4)
a2 = h2.dot(a1)
fmt.displayMath(fmt.joinMath('=', H2, sp.Matrix(np.round(h2, 3))),
fmt.joinMath('=', H2*H1*A, sp.Matrix(np.round(a2, 3))), pre="\\scriptsize ")
fmt.displayMath("\;")
h3 = householder(a2[:, 2], 2)
H3 = sp.MatrixSymbol('H_3', 4, 4)
a3 = h3.dot(a2)
fmt.displayMath(fmt.joinMath('=', H3, sp.Matrix(np.round(h3, 3))),
fmt.joinMath('=', H3*H2*H1*A, sp.Matrix(np.round(a3, 3))), pre="\\scriptsize ")
The final results are therefore $Q = (H_3H_2H_1)^T$ and $R = Q^T A$:
q = (h3.dot(h2).dot(h1)).T
r = q.T.dot(a)
np.round(q.dot(r), 4)
fmt.displayMath(fmt.joinMath('=', Q, sp.Matrix(np.round(q, 3))), fmt.joinMath('=', R, sp.Matrix(np.round(r, 3))), pre="\\scriptsize ")
$$ \min_{\bs {\beta}} \Vert X \bs{\beta - y} \Vert_2$$
given the QR decomposition of $X = QR$:
$$ \min_{\bs {\beta}} \Vert X \bs{\beta - y} \Vert_2 = \min_{\bs {\beta}} \Vert Q^T X \bs \beta - Q^T \bs y \Vert_2 = \min_{\bs {\beta}} \Vert R \bs \beta - \bs y'\Vert_2$$
Note that $R$ is upper triangular; the vector whose norm is to be minimized looks like:
$$\scriptsize \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots\ & 0 & r_{nn} \\ \hline 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix} - \begin{pmatrix} y'_1 \\ y'_2 \\ \vdots \\ y'_n \\ \hline y'_{n+1} \\ \vdots \\ y'_m \end{pmatrix} $$
Therefore, the solution is the $\bs \beta$ that zeros out the first $n$ elements of the vector.
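A minimal sketch of this QR-based least square solve (not in the original notes), reusing the xs and y arrays from the ridge regression example above, which are assumed to still be in scope:
q_x, r_x = np.linalg.qr(xs)                 # reduced QR: r_x is n x n upper triangular
y_prime = q_x.T.dot(y)                      # the first n elements of Q^T y
beta_qr = np.linalg.solve(r_x, y_prime)     # back substitution on R beta = y'
print(np.round(beta_qr, 4))                 # matches b_ols from the normal equations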
By SIAM:
Why single out the 20th century?
Slide rule:
Abacus:
Mahatma Gandhi: action expresses priorities.
The principal component (PC):
Suppose $\bs r$ is a random vector, with expectation $\bar{\bs r} = \mathbb{E}[\bs r]$ and covariance matrix $ V = \mathbb{E}[(\bs r - \bar{\bs r})(\bs r - \bar{\bs r})^T] $
PC is defined to be the direction $\hat{\bs u}$ onto which the projection $\bs r^T \hat{\bs u}$ has the maximum variance,
es = np.random.normal(size=[3, 300])
x = (1.5*es[0,:] + .25*es[1,:])*.3 + 1
y = es[0,:]*.4 + es[2,:]*.2
figure(figsize=[6, 4])
plot(x, y, '.')
xlim(-2, 4)
ylim(-2, 2);
xlabel('x')
ylabel('y');
cov = np.cov([x, y])
ev, evec = np.linalg.eig(cov)
ux = mean(x)
uy = mean(y)
arrow(ux, uy, -3*sqrt(ev[1])*evec[0, 1], -3*sqrt(ev[1])*evec[1, 1], width=.01, color='r')
arrow(ux, uy, 3*sqrt(ev[0])*evec[0, 0], 3*sqrt(ev[0])*evec[1, 0], width=.01, color='r');
title('Principal Components of 2-D Data');
The projection of the random vector in the direction of $\hat{\bs u}$ is a random scalar $\hat{\bs u}^T \bs r$
If we limit ourselves to all $\bs u$ that are perpendicular to $\bs v_1$, then:
The total variance is the sum of the variances of all $n$ elements of the random vector, which equals:
Therefore, the percentage of variance explained by the first $k$ eigenvectors is $\frac{\sum_i^k \lambda_i}{\sum_i^n \lambda_i}$.
PCA has some similarity to the least square problem: both can be viewed as minimizing the L2 norm of residual errors.
xs = np.array([x, np.ones(len(x))]).T
beta = np.linalg.inv(xs.T.dot(xs)).dot(xs.T).dot(y)
s_x, s_y = sp.symbols('x, y')
br = np.round(beta, 4)
figure(figsize=[13, 4])
subplot(1, 2, 1)
plot(x, y, '.')
xlim(-2, 4)
ylim(-2, 2);
xlabel('x')
ylabel('y');
dx = 4*sqrt(ev[0])*evec[0, 0]
dy = 4*sqrt(ev[0])*evec[1, 0]
arrow(ux-dx, uy-dy, 2*dx, 2*dy, width=.01, color='k');
ex = 3*sqrt(ev[1])*evec[0, 1]
ey = 3*sqrt(ev[1])*evec[1, 1]
plot([ux, ux+ex], [uy, uy+ey], 'r-o', lw=3)
legend(['data', 'residual error'], loc='best')
title('PCA');
subplot(1, 2, 2)
plot(x, y, '.')
xlim(-2, 4)
ylim(-2, 2);
xlabel('x')
ylabel('y');
arrow(-1, -1*beta[0]+beta[1], 4, 4*beta[0], width=.01, color='k');
y0 = (ux+ex)*beta[0] + beta[1]
plot([ux+ex, ux+ex], [y0, uy+ey], 'r-o', lw=3)
legend(['data', 'residual error'], loc='best')
title('Least Square');
The main differences between PCA and least square (regression):
| | PCA | Least Square/Regression |
|---|---|---|
| Scale Invariant | No | Yes |
| Symmetry in Dimension | Yes | No |
To choose between the two:
CMT rates are constant maturity Treasury yields, published daily by the U.S. Treasury.
cmt_rates = pd.read_csv("data/cmt.csv", parse_dates=[0], index_col=[0])
cmt_rates.plot(legend=False, title='Historical CMT yield');
tenors = cmt_rates.columns.map(float)
c = cmt_rates.cov()
fmt.displayDF(cmt_rates.corr(), "4g")
| | 0.25 | 0.5 | 1 | 2 | 3 | 5 | 7 | 10 | 20 |
|---|---|---|---|---|---|---|---|---|---|
0.25 | 1 | 0.9981 | 0.9923 | 0.9765 | 0.9612 | 0.9284 | 0.9017 | 0.8696 | 0.8146 |
0.5 | 0.9981 | 1 | 0.9971 | 0.9841 | 0.97 | 0.9388 | 0.9126 | 0.8822 | 0.8272 |
1 | 0.9923 | 0.9971 | 1 | 0.9938 | 0.9838 | 0.9582 | 0.9351 | 0.9086 | 0.8577 |
2 | 0.9765 | 0.9841 | 0.9938 | 1 | 0.9972 | 0.9816 | 0.9648 | 0.943 | 0.8995 |
3 | 0.9612 | 0.97 | 0.9838 | 0.9972 | 1 | 0.9927 | 0.9807 | 0.963 | 0.9254 |
5 | 0.9284 | 0.9388 | 0.9582 | 0.9816 | 0.9927 | 1 | 0.9966 | 0.9871 | 0.9604 |
7 | 0.9017 | 0.9126 | 0.9351 | 0.9648 | 0.9807 | 0.9966 | 1 | 0.9957 | 0.9779 |
10 | 0.8696 | 0.8822 | 0.9086 | 0.943 | 0.963 | 0.9871 | 0.9957 | 1 | 0.9903 |
20 | 0.8146 | 0.8272 | 0.8577 | 0.8995 | 0.9254 | 0.9604 | 0.9779 | 0.9903 | 1 |
The eigenvalues, percentage of variance explained, and leading eigenvectors:
x, v = np.linalg.eig(c)
pct_v = np.cumsum(x)/sum(x)*100
import pandas as pd
pd.set_option('display.precision', 3)
fmt.displayDF(pd.DataFrame({'P/C':range(1, len(x)+1), 'Eigenvalues':x, 'Cumulative Var(%)': pct_v}).set_index(['P/C']).T, "2f")
P/C | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Cumulative Var(%) | 96.27 | 99.74 | 99.92 | 99.96 | 99.99 | 99.99 | 99.99 | 100.00 | 100.00 |
Eigenvalues | 34.33 | 1.24 | 0.06 | 0.02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 |
flab = ['factor %d' % i for i in range(1, 4)]
fig = figure(figsize=[12, 4])
ax1 = fig.add_subplot(121)
plot(tenors, v[:, :3], '.-');
xlabel('Tenors(Y)')
legend(flab, loc='best')
title('First 3 principal components');
fs = cmt_rates.dot(v).iloc[:, :3]
ax2 = fig.add_subplot(122)
fs.plot(ax=ax2, title='Projections to the first 3 P/Cs')
legend(flab, loc='best');
PCA is a powerful technique, however, beware of its limitations:
PCA is usually applied to different data points of the same type
It is important to choose the right variable to apply PCA. The main considerations are:
Values or changes
Spot or forward
Since PCA is commonly applied to a correlation/covariance matrix, the question is equivalent to asking which variables' correlation/covariance matrix we should model.
Spurious correlation: variables can appear to be significantly correlated, but it is just an illusion:
f3 = pd.read_csv('data/f3.csv', parse_dates=[0]).set_index('Date').sort_index()
r = np.log(f3).diff()
fmt.displayDFs(f3.corr(), r.corr(), headers=['Price Correlation', 'Return Correlation'], fmt="4g")
(Output: two correlation tables, "Price Correlation" and "Return Correlation"; the table contents are not preserved here.)
The correlation between 9Y and 10Y spot rates and forward rates:
es = np.random.normal(size=[2, 500])
r0 = es[0,:]*.015 + .05
f1 = es[1,:]*.015 + .05
r1 = -np.log(np.exp(-9*r0-f1))/10
rc = np.corrcoef(np.array([r0, r1]))[0, 1]
fc = np.corrcoef(np.array([r0, f1]))[0, 1]
figure(figsize=[12, 4])
subplot(1, 2, 1)
plot(r0, r1, '.')
xlabel('9Y rate')
ylabel('10Y spot')
title('Corr(9Y Spot, 10Y Spot) = %.4f'% rc)
subplot(1, 2, 2)
plot(r0, f1, '.')
xlabel('9Y rate')
ylabel('9-10Y forward')
title('Corr(9Y Spot, 9-10Y Forward) = %.4f'% fc);
$$\exp(-9 r_9)\exp(-(10-9) f_9^{10} ) = \exp(-10 r_{10}) \iff r_{10} = 0.9r_9 + 0.1f_9^{10} $$
rotate, stretch and rotate, then you get back to square one, that's pretty much the life in a bank.
For any real matrix $M$, $\sigma > 0$ is a singular value if there exist two vectors $\bs {u, v}$ such that:
$\bs {u, v}$ are called the left and right singular vectors of $M$.
By convention,
Consider $\bs u^T M \bs v$ for real matrix $M$:
Apply the Lagrange multiplier, with constraints $\bs u^T \bs u = \bs v^T \bs v = 1$:
$$\small \begin{array} \\l &= \bs u^T M \bs v - \lambda_1 (\bs u^T \bs u - 1) - \lambda_2 (\bs v^T \bs v - 1) \\ \frac{\partial l}{\partial{\bs u}} &= \bs v^T M^T - 2 \lambda_1 \bs u^T = \bs 0^T \iff \bs u^T M \bs v - 2 \lambda_1 = 0\\ \frac{\partial l}{\partial{\bs v}} &= \bs u^T M - 2\lambda_2 \bs v^T = \bs 0^T \iff \bs u^T M \bs v- 2\lambda_2 = 0 \end{array}$$
Therefore, $$\small \bs u^T M \bs v = 2\lambda_1 = 2\lambda_2 \iff M \bs v = \sigma \bs u \;,\; M^T\bs u = \sigma \bs v $$
Given $\bs{u_1, v_1}$ are the first singular vectors of $M$:
because:
$$ (M \bs x)^T \bs u_1 = \bs x^T M^T \bs u_1 = \bs x^T \sigma_1 \bs v_1 = 0 $$
Therefore, among those unit $\bs u, \bs v$ that are orthogonal to $\bs{u_1, v_1}$ :
We can write all singular vectors and singular values in matrix format, which is the singular value decomposition (SVD):
$$ U^T M V = \Sigma \iff M = U \Sigma V^T $$
| | $M$ | $U$ | $\Sigma$ | $V$ |
|---|---|---|---|---|
Name | Original Matrix | Left singular vector | Singular value | Right singular vector |
Type | Real | Orthogonal, real | Diagonal, positive | Orthogonal, real |
Size | $m \times n$ | $m\times m$ | $m\times n$ | $n \times n$ |
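A minimal numerical illustration (not in the original notes), applying numpy's SVD to the $4 \times 4$ matrix `a` from the QR example, assumed to still be in scope:
u_m, s_m, vt_m = np.linalg.svd(a)                        # singular values in descending order
print(np.round(s_m, 3))
print(np.allclose(u_m.dot(np.diag(s_m)).dot(vt_m), a))   # U Sigma V^T reconstructs A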
Illustration from Wikipedia:
$$ M = U \Sigma V^T $$
SVD and eigenvalue decomposition (EVD) are closely related to each other:
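One way to see the connection numerically (a sketch, not in the original notes; it again reuses the matrix `a` from the QR example): the eigenvalues of $A^TA$ are the squared singular values of $A$:
s_a = np.linalg.svd(a, compute_uv=False)
w_ata = np.sort(np.linalg.eigvalsh(a.T.dot(a)))[::-1]    # eigenvalues of A^T A, descending
print(np.round(s_a**2, 3))
print(np.round(w_ata, 3))                                # equal: sigma_i^2 = lambda_i(A^T A)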
Given the SVD decomposition $X = U\Sigma V^T$, the SVD of $X$'s pseudo inverse is:
$$X^+ = (X^TX)^{-1}X^T = V \Sigma^+ U^T$$
where $\Sigma^+$ is the pseudo-inverse of $\Sigma$, obtained by taking the reciprocals of its non-zero diagonal elements (and transposing its shape).
Proof: the last equation is obviously true, and every step is reversible:
$$\begin{array}&(X^TX)^{-1}X^T &= V\Sigma^+ U^T \\ (V \Sigma^T \Sigma V^T)^{-1} V\Sigma^TU^T &= V\Sigma^+ U^T \\ (V \Sigma^T \Sigma V^T)^{-1} V\Sigma^T\Sigma &= V \Sigma^+\Sigma \\ (V \Sigma^T \Sigma V^T)^{-1} V\Sigma^T\Sigma &= V \\ (V \Sigma^T \Sigma V^T)^{-1} (V\Sigma^T\Sigma V^T) &= I \end{array}$$
SVD is another way to solve the least square problem.
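For example (a minimal sketch, not in the original notes, reusing the xs, y, and beta arrays from the PCA vs. least square comparison above, assumed to still be in scope), the pseudo-inverse computed via SVD reproduces the regression coefficients:
beta_svd = np.linalg.pinv(xs).dot(y)               # X^+ y = V Sigma^+ U^T y, via SVD
print(np.round(beta_svd, 4), np.round(beta, 4))    # matches the normal-equation solution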
Consider a real matrix $A$; its L2 norm is defined as the maximum of $\frac{\Vert A \bs x \Vert_2}{\Vert \bs x \Vert_2}$:
$$ \Vert A \Vert_2^2 = \max_{\bs x} \frac{\Vert A \bs x \Vert_2^2}{\Vert \bs x \Vert_2^2} = \max_{\bs x} \frac{\bs x^T A^T A \bs x}{\bs x^T \bs x} = \max_{\hat{\bs x}} \hat{\bs x}^T A^T A \hat{\bs x} = \sigma_1(A)^2$$
where $\hat{\bs x} = \frac{\bs x}{\Vert \bs x \Vert_2}$ is a unit vector, and $\sigma_1(A)$ is the largest singular value of $A$.
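A one-line numerical check (not in the original notes; it reuses the matrix `a` from the QR example, assumed to still be in scope):
print(np.linalg.norm(a, 2), np.linalg.svd(a, compute_uv=False)[0])   # both give sigma_1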
For a generic non-square matrix $A$:
SVD offers clear intuition for understanding the L2 condition number of a matrix $A$.
For a linear system $\bs y = A \bs x$, consider a perturbation $\delta \bs x$ to the input $\bs x$:
When working with a near-singular matrix, SVD can identify the ill-conditioned directions:
What is the change in $\lambda_i$ if we perturb the symmetric matrix $A$?
$$ A \bs v_i = \lambda_i \bs v_i$$
Apply perturbation analysis: write $\delta A = \dot{A} \epsilon$, $\delta \lambda_i = \dot{\lambda_i} \epsilon$, $\delta \bs v_i = \dot{\bs v_i} \epsilon$, and collect the first order terms in $\epsilon$:
$$\begin{array} \\ (A + \dot{A}\epsilon )(\bs v_i + \dot{\bs v_i}\epsilon) &= (\lambda_i + \dot{\lambda_i}\epsilon) (\bs v_i + \dot{\bs v_i} \epsilon) \\ \dot{A}\bs v_i + A \dot{\bs v_i} &= \lambda_i \dot{\bs v_i} + \dot{\lambda_i}\bs v_i \\ \bs v_i^T\dot{A}\bs v_i + \bs v_i^T A \dot{\bs v_i} &= \bs v_i^T \lambda_i \dot{\bs v_i} + \bs v_i^T \dot{\lambda_i}\bs v_i \\ \bs v_i^T\dot{A}\bs v_i &= \dot{\lambda_i} \end{array}$$
Therefore: $$\begin{array} \\ \frac{ | \delta \lambda_i |}{| \lambda_i |} &= \frac{\Vert \bs v_i^T\delta A \bs v_i\Vert_2}{|\lambda_i|} \le \frac{\Vert \delta A \Vert_2}{|\lambda_i|} = \frac{\Vert A\Vert_2}{|\lambda_i|} \frac{\Vert \delta A \Vert_2}{\Vert A \Vert_2} = \left\vert \frac{\lambda_1}{\lambda_i}\right\vert \frac{\Vert \delta A \Vert_2}{\Vert A \Vert_2} \end{array}$$
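A numerical check of this bound (a minimal sketch, not in the original notes; it perturbs the symmetric $4 \times 4$ matrix `a` from the QR example, assumed to still be in scope, by a small random symmetric matrix):
da = np.random.normal(size=a.shape, scale=1e-6)
da = (da + da.T)/2                                   # keep the perturbation symmetric
w0 = np.sort(np.linalg.eigvalsh(a))
w1 = np.sort(np.linalg.eigvalsh(a + da))
rel_change = np.abs(w1 - w0)/np.abs(w0)              # |delta lambda_i| / |lambda_i|
bound = np.abs(w0).max()/np.abs(w0)*np.linalg.norm(da, 2)/np.linalg.norm(a, 2)
print(np.all(rel_change <= bound))                   # True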
For real matrix $A$:
Decomposition | Applicability | Formula | Results | Key applications |
---|---|---|---|---|
LU | square, full rank | $A = LU$ | lower, upper triangular | matrix inversion |
Cholesky | symmetric positive definite | $A = LL^T$ | lower triangular | simulate correlated factors |
QR | any | $A = Q R$ | orthogonal, upper triangular | solve eigen values, least square |
EVD | symmetric | $A = R\Lambda R^T$ | orthogonal, real diagonal | PCA |
SVD | any | $A = U\Sigma V^T$ | orthogonal, positive diagonal, orthogonal | error analysis, rank reduction |
Required reading:
Recommended reading:
Homework: