A rough schedule of the class:
Week | Main Topics | Practical Problems |
---|---|---|
1 | Introduction and Error Analysis | Software practice |
2-3 | Linear Algebra | Portfolio optimization, PCA, least squares |
4 | Rootfinding and interpolation | Curve building |
5 | Derivatives and integration | Hedging, Risk Transformation |
7-8 | Optimization | Model calibration |
9-11 | Monte Carlo simulation | Exotic pricing, Max entropy, Allocation |
12-13 | ODE/PDE | Pricing |
Principles of Scientific Computing (online book)
D. Bindel and J. Goodman, 2009
Interest Rate Modeling, Volume II: Vanilla Models
L. Andersen and V. Piterbarg, 2010
Monte Carlo Methods in Financial Engineering
P. Glasserman, 2010
Available online at: http://yadongli.github.io/nyumath2048
Memory hierarchy
Storage | Bandwidth | Latency | Capacity |
---|---|---|---|
L1 cache | 98 GB/s | 4 cycles | 32 KB/core |
L2 cache | 50 GB/s | 10 cycles | 256 KB/core |
L3 cache | 30 GB/s | 40-75 cycles | 8 MB shared |
RAM | 15 GB/s | 60-100 ns | 2-128 GB |
Disk | 50 MB/s | ~5 ms | $\infty$ |
Network | varies | much longer | $\infty$ |
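As a rough illustration (an added sketch, not part of the original table): the same reduction over the same number of doubles, but with different memory access patterns. On most machines the strided pass is noticeably slower, because every element requires a separate cache line to be fetched from RAM.

    import time
    import numpy as np

    n = 4*10**6
    dense = np.random.rand(n)           # contiguous: 8 doubles share each 64-byte cache line
    sparse = np.random.rand(8*n)[::8]   # stride of 64 bytes: one double per cache line

    t0 = time.time(); s1 = dense.sum(); t1 = time.time()
    s2 = sparse.sum(); t2 = time.time()
    print "contiguous sum: %.3fs, strided sum: %.3fs" % (t1 - t0, t2 - t1)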
In practice, what often trumps everything else is:
The typical advice includes:
This is good advice, but quite superficial.
Paul Phillips (quoting George Orwell), "We're doing it all wrong":
Sun Tzu: "the supreme art of modelling is to solve problems without coding"
There is no single right answer
When you have the chance, choose the more expressive language for the job.
Eric Raymond, "How to Become a Hacker":
"Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot."
Python
Python ecosystem
Python resources
Python version:
Python setup instructions:
James Gosling: "95% of the folks out there are completely clueless about floating-point"
If the true value $a$ is approximated by $\hat{a}$, the absolute error is $|\hat{a} - a|$ and the relative error is $\left|\frac{\hat{a} - a}{a}\right|$.
Both measures are useful in practice.
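A small illustrative snippet (not from the original notes), taking $\pi$ as the true value:

    import math

    a = math.pi      # true value
    a_hat = 3.1416   # an approximation
    print "absolute error = %.2e" % abs(a_hat - a)
    print "relative error = %.2e" % abs((a_hat - a)/a)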
Imprecision in the theory or model itself
Errors in the numerical algorithm
Errors due to machine representation
We cover the last category first.
First published in 1985, the standard includes:
The IEEE 754 is a tremendously successful standard:
The IEEE 754 standard is widely implemented in modern hardware and software,
Java's strictfp flag (don't use it!)

$$\underbrace{*}_{\text{sign bit}}\;\underbrace{********}_{\text{exponent}}\overbrace{\;1.\;}^{\text{implicit}}\;\underbrace{***********************}_{\text{fraction}}$$
$$(-1)^{\text{sign}}(\text{1 + fraction})2^{\text{exponent} - p}$$
Format | Sign | Fraction (incl. the implicit bit) | Exponent | $p$ |
---|---|---|---|---|
IEEE 754 - 32 bit | 1 | 24 | 8 | 127 |
IEEE 754 - 64 bit | 1 | 53 | 11 | 1023 |
%pylab inline
import bitstring
import pandas as pd
import numpy as np
import fmt
import math, sys
pd.set_option("display.max_colwidth", 80)
def parts2f(sign, e, sig) :
    # reassemble a float from its sign, exponent and significand
    return (-1)**sign*sig*2.**e

def f2parts(v, flen) :
    # decompose a 32 or 64 bit float into its sign, exponent and fraction parts
    ps = {32: 9, 64: 12}
    ed = ps[flen]               # index of the first fraction bit
    eoff = 2**(ed - 2) - 1      # exponent bias: 127 for 32 bit, 1023 for 64 bit
    f1 = bitstring.BitArray(float=v, length=flen)
    signb = f1[:1]              # sign bit
    eb = f1[1:ed]               # exponent bits
    sigb = f1[ed:]              # fraction bits
    # normal numbers carry an implicit leading 1; subnormals (exponent field 0) do not
    sig = (1. + 1.*sigb.uint/(2**(flen-ed))) if eb.uint > 0 else 2.*sigb.uint/(2**(flen-ed))
    e = eb.uint - eoff
    bins = np.array([signb.bin, eb.bin, sigb.bin])
    vals = np.array([1-signb.uint*2, e, sig])
    fmt.displayDF(pd.DataFrame(np.array([bins, vals]).T, columns=['Bits', 'Decimal'],
                               index=["Sign", "Exp", "Fraction"]).T)
Populating the interactive namespace from numpy and matplotlib
f2parts takes a floating point number and its bit length (32 or 64) and displays the three parts.
f2parts(10.7, 32);
Sign | Exp | Fraction | |
---|---|---|---|
Bits | 0 | 10000010 | 01010110011001100110011 |
Decimal | 1.0 | 3.0 | 1.33749997616 |
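As a quick sanity check (added here, not part of the original notebook output), reassembling the parts with the formula above gives
$$(-1)^{0}\times 1.33749997616 \times 2^{130 - 127} \approx 10.7,$$
where $130$ is the unsigned value of the exponent bits and $127$ is the bias; the tiny discrepancy reflects the fact that 10.7 has no exact binary representation.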
f2parts(-23445.25, 64)
Sign | Exp | Fraction | |
---|---|---|---|
Bits | 1 | 10000001101 | 0110111001010101000000000000000000000000000000000000 |
Decimal | -1.0 | 14.0 | 1.43098449707 |
prec32 = 2**(-24)   # unit roundoff (half ulp at 1) of 32 bit floats
prec64 = 2**(-53)   # unit roundoff (half ulp at 1) of 64 bit floats
# smallest normal, largest finite, precision and number of significant digits per format
f32 = [parts2f(0, -126, 1), parts2f(0, 254-127, (2.-prec32*2)), prec32, -np.log10(prec32)]
f64 = [parts2f(0, -1022, 1), parts2f(0, 2046-1023, (2.-prec64*2)), prec64, -np.log10(prec64)]
fmt.displayDF(pd.DataFrame(np.array([f32, f64]), index=['32 bit', '64 bit'],
                           columns=["Min", "Max", "Machine Precision", "# of Significant Digits"]), "5g")
Min | Max | Machine Precision | # of Significant Digits | |
---|---|---|---|---|
32 bit | 1.1755e-38 | 3.4028e+38 | 5.9605e-08 | 7.2247 |
64 bit | 2.2251e-308 | 1.7977e+308 | 1.1102e-16 | 15.955 |
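These figures can be cross-checked against numpy's float metadata (a supplementary snippet; note that np.finfo(...).eps is one ulp at 1, i.e. twice the half-ulp precision quoted in the table above):

    for t in [np.float32, np.float64] :
        fi = np.finfo(t)
        # tiny: smallest normal, max: largest finite, eps: spacing between 1 and the next float
        print "%s: min = %g, max = %g, eps = %g" % (t.__name__, fi.tiny, fi.max, fi.eps)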
f2parts(-0., 32);
Sign | Exp | Fraction | |
---|---|---|---|
Bits | 1 | 00000000 | 00000000000000000000000 |
Decimal | -1.0 | -127.0 | 0.0 |
f2parts(np.NaN, 32);
Sign | Exp | Fraction | |
---|---|---|---|
Bits | 0 | 11111111 | 10000000000000000000000 |
Decimal | 1.0 | 128.0 | 1.5 |
f2parts(-np.Inf, 32);
Sign | Exp | Fraction | |
---|---|---|---|
Bits | 1 | 11111111 | 00000000000000000000000 |
Decimal | -1.0 | 128.0 | 1.0 |
f2parts(-1e-43, 32);
Sign | Exp | Fraction | |
---|---|---|---|
Bits | 1 | 00000000 | 00000000000000001000111 |
Decimal | -1.0 | -127.0 | 1.69277191162e-05 |
a = 1./3
b = a + a + 1. - 1.
c = 2*a
print "b == c? %s !" % (b == c)
print "b - c = %e" % (b - c)
b == c? False !
b - c = -1.110223e-16
f2parts(b, 64);
Sign | Exp | Fraction | |
---|---|---|---|
Bits | 0 | 01111111110 | 0101010101010101010101010101010101010101010101010100 |
Decimal | 1.0 | -1.0 | 1.33333333333 |
f2parts(c, 64);
Sign | Exp | Fraction | |
---|---|---|---|
Bits | 0 | 01111111110 | 0101010101010101010101010101010101010101010101010101 |
Decimal | 1.0 | -1.0 | 1.33333333333 |
x = 1e-308
print "inverse of %e is %e" % (x, 1/x)
inverse of 1.000000e-308 is 1.000000e+308
x = 1e-309
print "inverse of %e is %e" % (x, 1/x)
inverse of 1.000000e-309 is inf
print 1. - 1e-16 == 1.
False
print 1. + 1e-16 == 1.
True
a = math.factorial(30) # in Python's bignum
print "a = ", a
print "maxint = ", sys.maxint
print "a - maxint == a ?\n", (a - sys.maxint) == a
b = float(math.factorial(30))
print "b = ", b
print "b - maxint == b ?", (b - sys.maxint) == b
a =  265252859812191058636308480000000
maxint =  2147483647
a - maxint == a ?
False
b =  2.65252859812e+32
b - maxint == b ? True
The limited precision can cause subtle errors that are puzzling even for experienced programmers.
a = np.arange(0, 3. - 1., 1.)
print "a = ", a
print "a/3 = ", a/3.
a =  [ 0.  1.]
a/3 =  [ 0.          0.33333333]
b = np.arange(0, 1 - 1./3., 1./3.)
print "b = ", b
b = [ 0. 0.33333333 0.66666667]
It should become your instinct to always use an error bound when comparing floating point numbers, or to use integer comparisons as the condition for ending loops.
b = arange(0, 1 - 1./3. - 1e-10, 1./3)
print "b = ", b
b = [ 0. 0.33333333]
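For example (an illustrative sketch, not from the original notes), np.isclose compares within an error bound, and an integer loop index sidesteps the accumulation problem entirely:

    a = 1./3
    print a + a + 1. - 1. == 2*a              # exact comparison: False, as seen above
    print np.isclose(a + a + 1. - 1., 2*a)    # comparison within an error bound: True

    # loop over exact integers and derive the floating point values from them
    print [i/3. for i in range(3)]            # always exactly three elements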
Other types of errors may be mistaken for floating point rounding errors. For example, compute $e^x$ via its Taylor expansion: $$e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!} = 1 + x + \frac{1}{2}x^2 + ...$$
import math
def exp_taylor(x, n) :
    # sum the first n terms of the Taylor series of exp(x)
    return sum([1.*x**i/math.factorial(i) for i in np.arange(1, n)]) + 1.
x = 5
print "rel err for exp(%.f) is %.2f%%" % (x, (np.exp(x) - exp_taylor(x, 100))/np.exp(x)*100)
x = -5
print "rel err for exp(%.f) is %.2f%%" % (x, (np.exp(x) - exp_taylor(x, 100))/np.exp(x)*100)
print "exp_taylor(%.f) = %f, exp(%.f) = %f" % (x, exp_taylor(-5, 150), x, np.exp(-5))
rel err for exp(5) is 0.06%
rel err for exp(-5) is 474.72%
exp_taylor(-5) = -0.025249, exp(-5) = 0.006738
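The huge errors above are likely not rounding at all: the exponent i comes from np.arange, and on platforms where it is a 32-bit integer, x**i silently overflows long before the factorial tames the terms. A hedged sketch of a more careful implementation, which builds each term in floating point and avoids both the integer powers and the huge factorials:

    # added sketch (not from the original notes): accumulate each Taylor term
    # via the recurrence term_i = term_{i-1} * x / i
    def exp_taylor2(x, n) :
        term, total = 1., 1.
        for i in range(1, n) :
            term *= float(x)/i
            total += term
        return total

    for x in [5, -5] :
        print "rel err for exp(%.f) is %.2e" % (x, (np.exp(x) - exp_taylor2(x, 100))/np.exp(x))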
Dramatic loss of precision can happen when very similar numbers are subtracted
$$f(x) = \frac{1}{x^2}(1-\cos(x))$$
The straightforward implementation fails spectacularly for small $x$.
def f(x) :
    # naive implementation: 1 - cos(x) suffers catastrophic cancellation for small x
    return (1.-np.cos(x))/x**2

def g(x) :
    # mathematically identical form via the half-angle identity, stable for small x
    return (np.sin(.5*x)**2*2)/x**2

x = np.arange(-4e-8, 4e-8, 1e-9)
df = pd.DataFrame(np.array([f(x), g(x)]).T, index=x, columns=['Numerical f(x)', 'True Value']);
df.plot();
x = 1.1e-8
print "cos(%.16e) = %.16f" % (x, cos(x))
print "1-cos(%.e) = %.16e" % (x, 1.-cos(x))
print "true value of 1-cos(%.e) = %.16e" % (x, 2*sin(.5*x)**2)
cos(1.0999999999999999e-08) = 0.9999999999999999
1-cos(1e-08) = 1.1102230246251565e-16
true value of 1-cos(1e-08) = 6.0499999999999997e-17
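A short added explanation of the numbers above: for $x \approx 1.1\times10^{-8}$, the true value is
$$1 - \cos(x) \approx \frac{x^2}{2} \approx 6.05\times10^{-17},$$
but doubles just below 1 are spaced $2^{-53} \approx 1.11\times10^{-16}$ apart, so $\cos(x)$ rounds to $1 - 2^{-53}$ and the subtraction returns $1.11\times10^{-16}$, wrong by almost a factor of two. The form $2\sin^2(x/2)$ never subtracts nearby numbers and keeps full precision.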
The CIR process is widely used in quantitative finance, e.g., in short rate and Heston models:
$$ dr_t = \kappa(\mu - r_t)\, dt + \sigma \sqrt{r_t}\, dW_t $$
The variance of $r_t$ given $r_0$ for $t > 0$ is:
$$\text{Var}[r_t | r_0] = \frac{r_0\sigma^2}{\kappa}(e^{-\kappa t} - e^{-2\kappa t}) + \frac{\mu\sigma^2}{2\kappa}(1-e^{-\kappa t})^2$$
How does the catastrophic cancellation arise? For small $\kappa$, factors like $\frac{1}{\kappa}(1-e^{-\kappa t})$ subtract two nearly equal numbers and then divide by a tiny $\kappa$:
k = arange(-3e-8, 3e-8, 1e-9)*1e-7   # kappa values very close to zero
t = 1
v = (1-exp(-k*t))/k                  # naive evaluation suffers catastrophic cancellation
plot(k, v, '.-')
title(r'$\frac{1}{\kappa}(1-\exp(-\kappa t))$')
xlabel(r'$\kappa$');
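One common remedy (an added sketch, not from the original notebook) is numpy's expm1, which computes $e^y - 1$ accurately for small $y$, so that $\frac{1}{\kappa}(1-e^{-\kappa t}) = -\frac{1}{\kappa}\,\text{expm1}(-\kappa t)$ avoids the cancellation:

    v_stable = -expm1(-k*t)/k    # same quantity, evaluated without subtracting nearby numbers
    plot(k, v_stable, '.-')
    title(r'$\frac{1}{\kappa}(1-\exp(-\kappa t))$ via expm1')
    xlabel(r'$\kappa$');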
import sympy as sp
sp.init_printing(use_latex = True)
k = sp.symbols("k", positive = True)
e = (1-sp.exp(-k))/k
sp.limit(e, k, 0)   # the true limit as kappa -> 0 is 1, despite the numerical blow-up above
Consider the relative error of a function with multiple arguments: $f = f(x_1, ..., x_n)$:
\begin{eqnarray} \small df &=& \sum_i \frac{\partial f}{\partial x_i} dx_i \\ \frac{df}{f} &=& \sum_i \frac{x_i}{f}\frac{\partial f}{\partial x_i} \frac{dx_i}{x_i} \end{eqnarray}
Assuming the input arguments' relative errors are $\epsilon_i$ (which could be negative):
$$ \small \left| \frac{\Delta f}{f} \right| = \left| \sum_i \frac{x_i}{f}\frac{\partial f}{\partial x_i} \epsilon_i \right| \le \sum_i \left| \frac{x_i}{f}\frac{\partial f}{\partial x_i} \right| \left|\epsilon_i\right| \le \sum_i \left| \frac{x_i}{f}\frac{\partial f}{\partial x_i} \right| \epsilon \equiv k(f) \epsilon $$
where $\epsilon$ is the maximum relative error of all inputs.
The condition number $k(f)$ is a systematic way to detect potential numerical problems:
$f(x_1, x_2) = x_1 - x_2$
$k(f) = \frac{|x_1| + |x_2|}{|x_1 - x_2|}$, ill conditioned when $x_1 \approx x_2$: catastrophic cancellation
$f(x_1, x_2) = x_1 x_2$
$k(f) = 2$, the multiplication is well conditioned
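These two results can also be checked mechanically with sympy (an added verification sketch, in the same spirit as the symbolic check of $f(x) = \frac{1-\cos(x)}{x^2}$ below):

    import sympy as sp
    x1, x2 = sp.symbols('x1, x2', positive = True)
    for f in [x1 - x2, x1*x2] :
        # k(f) = sum_i |x_i/f * df/dx_i|, the bound derived above
        kf = sum(sp.Abs(xi/f*sp.diff(f, xi)) for xi in (x1, x2))
        print f, ":", sp.simplify(kf)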
In practice, it is impossible to run this analysis over complicated calculations:
Consider the previous example of $f(x) = \frac{1}{x^2}(1-\cos(x))$:
$$ k(f) = \left| 2 + \frac{x \sin{\left (x \right )}}{\cos{\left (x \right )} - 1} \right| $$
The numerical algorithm can be unstable even if the problem itself is well-posed.
A better algorithm is:
$$f(x) = \frac{2\sin^2(\frac{x}{2})}{x^2}$$
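This rewrite relies on the half-angle identity (a one-line reminder, added for completeness):
$$1 - \cos(x) = 2\sin^2\left(\frac{x}{2}\right)$$
The two forms are mathematically identical, but the right-hand side involves no subtraction of nearly equal numbers, so it stays accurate for small $x$.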
But an ill-posed problem is always numerically unstable.
x, k = sp.symbols('x, k', real = True)
f = (1-sp.cos(x))/x**2
k = sp.simplify(f.diff(x)*x/f)   # condition number of f, up to the absolute value
sp.limit(k, x, 0)                # the limit is 0: the problem itself is well conditioned near x = 0
sp.simplify(1-sp.cos(2*x))       # verify the identity 1 - cos(2x) = 2 sin(x)^2
Ill-posed problems are generally not suitable for numerical solutions.
The Mandelbrot set is a famous example:
Required Reading:
Recommended Reading:
Homework