Solving linear systems

Introductory notes on use and trust

Note: Though it's written for MATLAB, a good reference for this material is Cleve Moler's text Numerical Computing with MATLAB, particularly its Linear Equations chapter.

Another look at the class example

In class, we examined the system

$$\begin{align} 10^{-n}x_1 + x_2 &= 1 \\ x_1 + x_2 &= 2 \end{align}$$

Applying direct Gaussian elimination, we found

$$x_2 = \frac{2-10^n}{1-10^n} \; \text{ and } \; x_1 = (1-x_2)10^n.$$

The expression for $x_1$ looks pretty bad from a numerical perspective: $1-x_2$ is a difference between two nearly equal numbers, so it suffers from catastrophic cancellation, and the multiplication by $10^n$ then magnifies the resulting error.

If we pivot by simply changing the order of the equations and then apply Gaussian elimination, we find

$$x_2 = \frac{1-2\times10^{-n}}{1-10^{-n}} \; \text{ and } x_1 = 2-x_2.$$

While this is algebraically equivalent to the first solution, its numerical properties are much better. In particular, $2-x_2$ involves no cancellation and yields a numerically stable result.

Here's what the computed solutions produced by these two formulas look like for $n=9$.

In [1]:
# First suspicious solution
n = 9
x2 = (2-10**(n))/(1-10**(n))
x1 = (1-x2)*10**n
x1,x2
Out[1]:
(0.9999999717180685, 0.999999999)
In [2]:
# Numerically stable solution
x2 = (1-2*10**(-n))/(1-10**(-n))
x1 = 2-x2
x1,x2
Out[2]:
(1.000000001, 0.9999999989999999)

Of course, the second one is better, as the exact solution must have $x_1$ slightly larger than $1$.
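We can confirm that with exact rational arithmetic. Here's a quick sketch using Python's built-in fractions module, for the same $n=9$ as above:

from fractions import Fraction

# Subtracting the two equations gives (1 - 10**(-n)) * x1 = 1 exactly.
n = 9
x1 = 1/(1 - Fraction(1, 10**n))  # exactly 10**n/(10**n - 1), just over 1
x2 = 2 - x1                      # just under 1
float(x1), float(x2)

The exact $x_1 = 10^n/(10^n-1)$ matches the stable computation above.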

Matrix inversion

Matrix inversion is generally a bad idea - it's more work and less accurate. This is true even in plain old scalar arithmetic, where inverting amounts to taking a reciprocal.

In [3]:
from decimal import Decimal, getcontext
context = getcontext()
context.prec = 5

six = Decimal('6')
eighteen = Decimal('18')

print((1/six)*eighteen)
print(eighteen/six)
3.0001
3

This problem is greatly exacerbated in the matrix context, as matrix inversion is a computationally intensive process. Naive matrix inversion has time complexity $O(n^3)$, and even the best known algorithms are still roughly $O(n^{2.37})$. Furthermore, most computations involving matrix inversion can be recast using faster and more stable techniques. One common example is the solution of linear systems: a matrix equation $Ax=b$ can be solved via $x=A^{-1}b$, but a faster and more reliable solution can be obtained via Gaussian elimination. Let's compare the two. In SciPy, these are implemented via inv and solve in the scipy.linalg module. We'll also import norm to measure the residuals of the solutions, as well as a tool to generate a random matrix and a function to time our results.

In [4]:
from scipy.linalg import solve, inv, norm
from numpy.random import randn, seed
import time
import numpy as np
In [5]:
# Generate a 5000 by 5000 system
seed(1)
A = randn(5000,5000)
b = randn(5000)
In [6]:
# Solve with inv and time the solution
t = time.time()
x1 = inv(A).dot(b)
time.time()-t
Out[6]:
4.363846063613892
In [7]:
# Measure the residual
norm(A.dot(x1)-b)
Out[7]:
1.4137433010885349e-09
In [8]:
# Do the same with solve
t = time.time()
x2 = solve(A,b)
time.time()-t
Out[8]:
1.6812100410461426
In [9]:
norm(A.dot(x2)-b)
Out[9]:
7.631727480284211e-10
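So solve is more than twice as fast here, with a slightly smaller residual. A related point in favor of factorization over inversion: if you need to solve the same system for several right-hand sides, SciPy exposes the LU factorization underlying solve, so you can factor once and reuse it. A minimal sketch, reusing the A and b above (x3 is just an illustrative name):

from scipy.linalg import lu_factor, lu_solve

# Factor A once (the O(n^3) step); each subsequent right-hand
# side then costs only an O(n^2) triangular solve.
lu, piv = lu_factor(A)
x3 = lu_solve((lu, piv), b)
norm(A.dot(x3) - b)  # residual should be comparable to solve's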

Residual vs error

If we solve $Ax=b$ on the computer, we obtain an estimate that we'll denote $x_*$. There are two commonly used measures of how far off $x_*$ is from the true solution $x$:

  • Error: $x-x_*$
  • Residual: $b-Ax_*$

Properly implemented Gaussian elimination with pivoting is generally guaranteed to produce a small residual, but not necessarily a small error. The two are related by $x-x_* = A^{-1}(b-Ax_*)$, so a small residual forces a small error only when $A^{-1}$ doesn't magnify it. Consider, for example, the system

$$\begin{align} (1+10^{-n})x_1 + x_2 &= 1 \\ x_1 + x_2 &= 2. \end{align}$$

Subtracting the second equation from the first gives $10^{-n}x_1 = -1$, so $x_1 = -10^{n}$ and $x_2 = 10^n+2$ form the exact solution of the system. For large $n$, though, we might expect rounding errors.

In [10]:
n = 10
A = np.matrix([[1+10**(-n), 1],[1,1]])
b = [1,2]
x = solve(A,b)
x
Out[10]:
array([ -9.99999917e+09,   9.99999917e+09])

The residual is still effectively zero:

In [11]:
A.dot(x)-b
Out[11]:
matrix([[ 0.,  0.]])

But the error is fairly large.

In [12]:
xx = np.array([-10**n, 2+10**n])
x-xx
Out[12]:
array([ 828.40364075, -828.40364075])

Condition numbers

Multiplication by $A$ produces a new vector $Ax$ whose norm can be very different from that of $x$. This variability is exactly what produces numerical instability. The range of possible changes in magnitude is captured by two quantities:

$$\begin{align} M &= \max_{x\neq 0}\frac{\|Ax\|}{\|x\|} \\ m &= \min_{x\neq 0}\frac{\|Ax\|}{\|x\|} \end{align}$$

The ratio of these is called the condition number:

$$\kappa(A) = \frac{M}{m} = \frac{\max\|Ax\|/\|x\|}{\min\|Ax\|/\|x\|}.$$

In some sense, $\kappa(A)$ measures how close the matrix is to being singular: if the matrix is singular, then $m=0$ and $\kappa(A)=\infty$. A large condition number indicates that the error might be much larger than the residual suggests; as a rule of thumb, we can expect to lose roughly $\log_{10}\kappa(A)$ digits of accuracy. Here's the condition number of the matrix we were just working with.

In [13]:
from numpy.linalg import cond
cond(A)
Out[13]:
40000198916.164642
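For the 2-norm, which cond uses by default, $M$ and $m$ are just the largest and smallest singular values of $A$, so we can check this directly. A quick sketch:

from numpy.linalg import svd

# For the 2-norm, M and m are the extreme singular values of A.
s = svd(A, compute_uv=False)
s[0]/s[-1]  # should agree with cond(A) above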

No wonder it presented computational challenges.
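A standard rule of thumb makes the connection to our example quantitative: the relative error should be at most around $\kappa(A)$ times machine epsilon. Here's a sketch of the comparison, reusing x and xx from the cells above:

# Rule of thumb: relative error <~ cond(A) * machine epsilon
print(norm(x - xx)/norm(xx))        # observed relative error, about 1e-7
print(cond(A)*np.finfo(float).eps)  # rule-of-thumb bound, about 1e-5

The observed error sits comfortably within the bound; the rule of thumb is a worst-case estimate.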