CM0368 Scientific Computing
Spring 2009 Professor David Walker [email protected]
Schedule
• Weeks 1 & 2 (6): Numerical linear algebra (DWW)
– Solving Ax = b by Gaussian elimination and Gauss-Seidel.
– Algebraic eigenvalue problem. Power method and QR method.
• Weeks 3 & 4 (6): Numerical solution of differential equations (BMB)
– Ordinary differential equations (finite difference and Runge-Kutta methods).
– Partial differential equations (finite difference method).
• Weeks 5 & 6 (5): Applications in Physics (BMB)
– Laplace and Helmholtz equations, the Schrödinger problem for the hydrogen atom, etc.
• Weeks 6 & 7 (4): Numerical optimisation (DWW)
• Weeks 2–11: Tutorials on Wednesdays at 1pm in T2.07. These will be given by Kieran Robert.
Text Book
• “Numerical Computing with MATLAB” by Cleve B. Moler, SIAM Press, 2004. ISBN 0898715601.
• http://www.readinglists.co.uk/rsl/student/sviewlist.dfp?id=20248
• Web edition at http://www.mathworks.com/moler/
Web Site
• Lecture notes and other module material can be found at: http://users.cs.cf.ac.uk/David.W.Walker/CM0368
Numerical Linear Algebra
• A system of n simultaneous linear equations can be represented in matrix notation as Ax = b, where A is an n×n matrix, and x and b are vectors of length n.
• We can write the solution as x = A^-1 b, where A^-1 is the inverse of A.
• A square matrix is said to be non-singular if its inverse exists. If the inverse does not exist then the matrix is singular.
Geometrical Interpretation
• If A is a 2×2 matrix, for example

[ 2  -1 ] [x_1]   [  3 ]
[ 3   4 ] [x_2] = [ -1 ]

then 2x_1 - x_2 = 3 and 3x_1 + 4x_2 = -1. Each equation represents a straight line, and the solution of the system is given by their intersection.
• If A is a 3×3 matrix each of the three equations represents a plane, and the solution is the point lying at the intersection of the three planes.
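The 2×2 case is easy to visualise in MATLAB. The fragment below is an illustrative sketch, not part of the original slides:

% Plot the lines 2*x1 - x2 = 3 and 3*x1 + 4*x2 = -1 and mark their
% intersection, which is the solution of the 2x2 system above.
A = [2 -1; 3 4];
b = [3; -1];
x = A\b;                      % intersection point, here (1, -1)
t = linspace(-2, 3, 100);     % sample values of x1 for plotting
plot(t, 2*t - 3, t, (-1 - 3*t)/4, x(1), x(2), 'ko');
xlabel('x_1'); ylabel('x_2');
legend('2x_1 - x_2 = 3', '3x_1 + 4x_2 = -1', 'solution');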
MATLAB Solution
• In MATLAB we can find the solution to Ax = b by writing: x = A\b
• Write: A = [10 -7 0; -3 2 6; 5 -1 5]
• Then: b = [7; 4; 6]
• Then: x = A\b
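• For this A and b the solution is x = (0, -1, 1)^T; this system is used as the running example in the slides that follow.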
Gaussian Elimination
• Eliminate x_1 from all the equations after the first.
• Then eliminate x_2 from all the equations after the second.
• Then eliminate x_3 from all the equations after the third.
• And so on, until after n-1 steps we have eliminated x_j from all the equations after the jth, for j = 1, 2, …, n-1.
• These steps are referred to as the forward elimination stage of Gaussian elimination.
Example
[ 10  -7   0 ] [x_1]   [ 7 ]
[ -3   2   6 ] [x_2] = [ 4 ]
[  5  -1   5 ] [x_3]   [ 6 ]

Subtract -3/10 times equation 1 from equation 2, and 5/10 times equation 1 from equation 3.

[ 10  -7    0 ] [x_1]   [ 7   ]
[  0  -0.1  6 ] [x_2] = [ 6.1 ]
[  0   2.5  5 ] [x_3]   [ 2.5 ]

Next we swap equations 2 and 3. This is called partial pivoting. It is done to get the largest absolute value on or below the diagonal in column 2 onto the diagonal. This makes the algorithm more stable with respect to round-off errors (see later).
Example (continued)
[ 10  -7    0 ] [x_1]   [ 7   ]
[  0   2.5  5 ] [x_2] = [ 2.5 ]
[  0  -0.1  6 ] [x_3]   [ 6.1 ]

Now subtract -0.1/2.5 times equation 2 from equation 3.

[ 10  -7    0   ] [x_1]   [ 7   ]
[  0   2.5  5   ] [x_2] = [ 2.5 ]
[  0   0    6.2 ] [x_3]   [ 6.2 ]

This completes the forward elimination stage of the Gaussian elimination algorithm.
Pseudocode for Forward Elimination
make b column n+1 of A
for k=1 to n-1
    find the pivot A(m,k): the entry of largest absolute value in column k with row m ≥ k
    swap rows k and m
    for i=k+1 to n
        mult = A(i,k)/A(k,k)
        for j=k+1 to n+1
            A(i,j) = A(i,j) - mult*A(k,j)
        end
    end
end
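The pseudocode translates almost line for line into MATLAB. The function below is a minimal sketch (the name forward_elim is ours, not from the slides):

function A = forward_elim(A, b)
% Forward elimination with partial pivoting on the augmented matrix [A b].
% Returns the augmented matrix reduced to upper triangular form.
n = size(A, 1);
A = [A b];                           % make b column n+1 of A
for k = 1:n-1
    [pmax, m] = max(abs(A(k:n, k))); % pivot: largest |A(i,k)| for i >= k
    m = m + k - 1;
    A([k m], :) = A([m k], :);       % swap rows k and m
    for i = k+1:n
        mult = A(i,k)/A(k,k);
        A(i, k:n+1) = A(i, k:n+1) - mult*A(k, k:n+1);
    end
end
end

For the running example, forward_elim([10 -7 0; -3 2 6; 5 -1 5], [7; 4; 6]) reproduces the upper triangular system above, with pivots 10, 2.5, and 6.2.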
Back Substitution
• After the forward elimination phase, the matrix has been transformed into upper triangular form.
• Equation n just involves x_n and so can be solved immediately.
• Equation n-1 just involves x_{n-1} and x_n, and since we already know x_n we can find x_{n-1}.
• Working our way backwards through the equations we can find x_n, x_{n-1}, …, x_1.
• This is called the back substitution phase of the Gaussian elimination algorithm.
The Example Again
[ 10  -7    0   ] [x_1]   [ 7   ]
[  0   2.5  5   ] [x_2] = [ 2.5 ]
[  0   0    6.2 ] [x_3]   [ 6.2 ]

Equation 3 is 6.2 x_3 = 6.2, so x_3 = 1. This value is substituted into equation 2:

2.5 x_2 + (5)(1) = 2.5

so x_2 = -1. Substituting for x_2 and x_3 in equation 1: 10 x_1 + (-7)(-1) = 7, so x_1 = 0.
Pseudocode for Back Substitution
This solves Ux = b:

x(n) = b(n)/U(n,n)
for k=n-1 down to 1
    sum = 0
    for j=k+1 to n
        sum = sum + U(k,j)*x(j)
    end
    x(k) = (b(k) - sum)/U(k,k)
end
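A matching MATLAB sketch of this phase (again illustrative; back_sub is our name, not from the slides):

function x = back_sub(U, b)
% Solve the upper triangular system Ux = b by back substitution.
n = length(b);
x = zeros(n, 1);
x(n) = b(n)/U(n,n);
for k = n-1:-1:1
    s = U(k, k+1:n)*x(k+1:n);   % sum of terms with already-known x(j)
    x(k) = (b(k) - s)/U(k,k);
end
end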
LU Factorisation
• The Gaussian elimination process can be expressed in terms of three matrices.
• The first matrix has 1's on the main diagonal and the multipliers used in the forward elimination below the diagonal. This is a lower triangular matrix with unit diagonal, and is usually denoted by L.
• The second matrix, denoted by U, is the upper triangular matrix obtained at the end of the forward elimination.
• The third matrix, denoted by P, is a permutation matrix representing the row interchanges performed in pivoting.
L, U, and P
For the running example:

L = [  1     0     0 ]    U = [ 10  -7   0   ]    P = [ 1  0  0 ]
    [  0.5   1     0 ]        [  0   2.5 5   ]        [ 0  0  1 ]
    [ -0.3  -0.04  1 ]        [  0   0   6.2 ]        [ 0  1  0 ]

The original matrix can be expressed as: LU = PA. The permutation matrix is the identity matrix with its rows reordered. If P_ij = 1 then row i of the permuted matrix is row j of the original matrix. The same information can be represented in a vector p, the ith entry of which gives the number of the column containing the 1 in row i. Here p = (1, 3, 2).
Some MATLAB Code
• L, U, and P can be found in MATLAB as follows: [L,U,P] = lu(A)
• Solution of the system Ax = b is equivalent to solving the two triangular systems Ly = Pb and Ux = y.
• Once you have L, U, and P it is simple to solve the original system of equations: y = L\(P*b) and x = U\y, or just x = U\(L\(P*b))
• This should give the same answer as: x = A\b
LDU Factorisation
• It is sometimes useful to explicitly separate out the diagonal of U, which contains the pivots.
• We write U = DU', where U' is the same as U except that it has 1's on the diagonal, and D is a diagonal matrix containing the diagonal elements of U.
• With this form of the factorisation we have LDU' = PA. For the running example:

L = [  1     0     0 ]    D = [ 10  0    0   ]    U' = [ 1  -0.7  0 ]
    [  0.5   1     0 ]        [  0  2.5  0   ]         [ 0   1    2 ]
    [ -0.3  -0.04  1 ]        [  0  0    6.2 ]         [ 0   0    1 ]
Pseudocode for LU Factorisation
make b column n+1 of A
for k=1 to n-1
    find the pivot A(m,k): the entry of largest absolute value in column k with row m ≥ k
    swap rows k and m
    for i=k+1 to n
        A(i,k) = A(i,k)/A(k,k)      (store the multiplier in place)
        for j=k+1 to n+1
            A(i,j) = A(i,j) - A(i,k)*A(k,j)
        end
    end
end
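The same algorithm in MATLAB, returning the factors explicitly (a sketch; the name lu_pivot is ours, and p is returned as a permutation vector so that L*U equals A(p,:), i.e. PA):

function [L, U, p] = lu_pivot(A)
% LU factorisation with partial pivoting; each multiplier is stored in
% place of the entry it eliminates.
n = size(A, 1);
p = (1:n)';
for k = 1:n-1
    [pmax, m] = max(abs(A(k:n, k)));  % find the pivot row
    m = m + k - 1;
    A([k m], :) = A([m k], :);        % swap rows k and m (multipliers too)
    p([k m]) = p([m k]);
    for i = k+1:n
        A(i,k) = A(i,k)/A(k,k);       % store the multiplier
        A(i, k+1:n) = A(i, k+1:n) - A(i,k)*A(k, k+1:n);
    end
end
L = tril(A, -1) + eye(n);             % unit lower triangular factor
U = triu(A);                          % upper triangular factor
end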
Explanation of LU
• At stage i of forward elimination we do pivoting to find the largest absolute value in column i on or below the diagonal, and then exchange rows to bring it onto the diagonal.
• Then we subtract multiples of row i from rows i+1,…,n.
• Each of these operations can be represented by a matrix multiplication.
Elementary Matrices
• An elementary matrix, M, is one that has 1’s along the main diagonal and 0’s everywhere else, except for one non-zero value (-m, say) in row i and column j.
• Multiplying A by M has the effect of subtracting m times row j of matrix A from row i.
• Ignoring pivoting, the GE algorithm applies a series of elementary matrices to A to get U: U = M_{n-1} … M_2 M_1 A
• If L_i = M_i^-1 then L_1 L_2 … L_{n-1} U = A, so taking L = L_1 L_2 … L_{n-1} we have LU = A.
• L_i is the same as M_i except that the sign of the non-zero value is changed.
• With pivoting, U = M_{n-1} P_{n-1} … M_2 P_2 M_1 P_1 A, and it can be shown in a similar way that LU = PA, where P = P_{n-1} … P_2 P_1.
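A small numerical check of this, using the first elimination step of the running example (illustrative code, not from the slides):

% The first elimination step subtracts m = -3/10 times row 1 from row 2,
% so M1 holds -m = 0.3 in position (2,1) and L1 holds m = -0.3.
A = [10 -7 0; -3 2 6; 5 -1 5];
M1 = eye(3);  M1(2,1) = 0.3;
L1 = eye(3);  L1(2,1) = -0.3;
disp(M1*A)     % row 2 now starts with 0, as after the elimination step
disp(M1*L1)    % the 3x3 identity: L1 is the inverse of M1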
The Need for Pivoting
• Suppose we change our problem slightly to:

[ 10  -7      0 ] [x_1]   [ 7     ]
[ -3   2.099  6 ] [x_2] = [ 3.901 ]
[  5  -1      5 ] [x_3]   [ 6     ]

where the solution is still (0, -1, 1)^T.
• Now suppose we solve the problem on a computer that does arithmetic to five significant digits.
Pivoting (continued)
• The first step of the elimination gives:

[ 10  -7      0 ] [x_1]   [ 7     ]
[  0  -0.001  6 ] [x_2] = [ 6.001 ]
[  0   2.5    5 ] [x_3]   [ 2.5   ]

• The (2,2) element is quite small. Now we continue without pivoting.
• The next step is to subtract -2.5×10^3 times equation 2 from equation 3:

(5 - (-2.5×10^3)(6)) x_3 = 2.5 - (-2.5×10^3)(6.001)

• The right-hand side is 2.5 + 1.50025×10^4. The second term is rounded to 1.5002×10^4. When we add the 2.5 the result is rounded to 1.5004×10^4.
Pivoting (continued)
• So the equation for x_3 becomes:

1.5005×10^4 x_3 = 1.5004×10^4

which gives x_3 = 0.99993 (instead of 1).
• Then x_2 is found from:

-0.001 x_2 + (6)(0.99993) = 6.001

which gives -0.001 x_2 = 1.5×10^-3, so x_2 = -1.5 (instead of -1).
• Finally, x_1 is found from:

10 x_1 + (-7)(-1.5) = 7

which gives x_1 = -0.35 (instead of 0).
• The problem arises from using a small pivot, which leads to a large multiplier.
• Partial pivoting ensures that the multipliers are always less than or equal to 1 in magnitude, and results in a satisfactory solution.
Measuring Errors
• The discrepancy due to rounding error in a solution can be measured by the error:

e = x - x*

and by the residual:

r = b - Ax*

where x is the exact solution and x* is the computed solution.
• e = 0 if and only if r = 0; however, e and r are not necessarily both small.
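In MATLAB both quantities are one-liners once x* is known. For the running example (an illustrative fragment, not from the slides):

A = [10 -7 0; -3 2 6; 5 -1 5];
b = [7; 4; 6];
xstar = A\b;          % computed solution
x = [0; -1; 1];       % exact solution of this system
e = x - xstar         % error
r = b - A*xstar       % residual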
Error Example
• Consider an example in which the matrix is almost singular.
• If GE is used to compute the solution with low precision we get a large error but small residual.
• Geometrically, the lines represented by the equations are almost parallel.
• GE with partial pivoting will always produce small residuals, but not necessarily small errors.
[ 0.780  0.563 ] [x_1]   [ 0.217 ]
[ 0.913  0.659 ] [x_2] = [ 0.254 ]

[Figure: the two nearly parallel lines, with the computed and exact solutions marked.]
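This effect can be reproduced in MATLAB by solving in reduced precision (a sketch, not from the slides; single precision stands in for "low precision" here):

% Nearly singular system: large error but small residual.
A = [0.780 0.563; 0.913 0.659];
b = [0.217; 0.254];
xstar = double(single(A)\single(b));   % solve in single precision
x = [1; -1];                           % exact solution
err = norm(x - xstar)                  % relatively large
res = norm(b - A*xstar)                % tiny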
Sensitivity
• We want to know how sensitive the solution to Ax = b is to perturbations in A and b.
• To do this we first have to come up with some measure of how close to singular a matrix is.
• If A is singular, a solution will not exist for some b's (inconsistency), while for other b's it is not unique.
• So if A is nearly singular we would expect small changes in A and b to cause large changes in x.
• If A is the identity matrix then x = b, so if A is close to the identity matrix we expect small changes in A and b to result in small changes in x.
Singularity and Pivots
• In GE all the pivots are non-zero if and only if A is non-singular, provided exact arithmetic is used.
• So if the pivots are small we expect the matrix to be close to singular.
• However, with finite precision arithmetic, a matrix might still be close to singular even if none of the pivots are small.
Norms
• The size of the pivots is not adequate to decide how close a matrix is to being singular.
• Define the l_p family of norms (1 ≤ p ≤ ∞):

||x||_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p}

• The most common norms use p = 1, 2, and ∞:

||x||_1 = Σ_{i=1}^{n} |x_i|
||x||_2 = ( Σ_{i=1}^{n} |x_i|^2 )^{1/2}
||x||_∞ = max_i |x_i|
Properties of Norms
• All these norms have the following properties:

||x|| ≥ 0, with ||x|| = 0 if and only if x = 0
||cx|| = |c| ||x|| for all scalars c
||x + y|| ≤ ||x|| + ||y||   (the triangle inequality)

• In MATLAB use norm(x,p) to find a norm:
norm1 = norm(x,1)
norm2 = norm(x)
norminf = norm(x,inf)
Condition Number
• Ax may have a very different norm from x, and this change in norm is related to the sensitivity of the solution to changes in A. Denote:

M = max ||Ax||/||x||   and   m = min ||Ax||/||x||

where the max and min are taken over all non-zero vectors x. Note that if A is singular, m = 0.
• The condition number, κ(A), of A is defined as the ratio M/m. Usually we are interested in the order of magnitude of κ(A), so it doesn't matter which norm is used.
Relative Errors
• Suppose an error δb in b results in an error δx in x. Then Ax = b becomes:

A(x + δx) = b + δb

so Aδx = δb. From the definitions of M and m we have:

||b|| ≤ M ||x||   and   ||δb|| ≥ m ||δx||

Then if m ≠ 0 we have the following relationship between the relative error in b and in x:

||δx||/||x|| ≤ κ(A) ||δb||/||b||

κ(A) measures the amplification of the relative error!
Uses of Condition Number
1. As a measure of the amplification of relative error due to changes in the right-hand side b.
2. As a measure of the amplification of relative error due to changes in the matrix A.
3. As a measure of how close a matrix is to being singular (hard maths omitted). If κ(A) is large then A is close to singular.
Some Properties of κ(A)
• κ(A) ≥ 1 since M ≥ m.
• κ(P) = 1 if P is a permutation matrix.
• κ(cA) = κ(A) for any non-zero scalar c.
• For a diagonal matrix D, κ(D) is the ratio of the largest diagonal value to the smallest (in absolute value).
Relative Error Example
• In this example we use the l_1-norm. Let:

A = [ 4.1  2.8 ]    b = [ 4.1 ]
    [ 9.7  6.6 ]        [ 9.7 ]

• The solution is x = (1, 0)^T, with ||b|| = 13.8 and ||x|| = 1.
• Change b slightly to b~ = (4.11, 9.70)^T; then x becomes x~ = (0.34, 0.97)^T. A small change in b gives a big change in x.
• The errors are ||δb|| = 0.01 and ||δx|| = 1.63, so the relative errors are:

||δb||/||b|| = 0.0007246   and   ||δx||/||x|| = 1.63

so κ(A) ≥ 1.63/0.0007246 = 2249.4: κ(A) is large.
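MATLAB confirms the value directly (a check that is not in the slides):

A = [4.1 2.8; 9.7 6.6];
cond(A, 1)      % approximately 2249.4, the l_1 condition number of A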
Relative Error and Residual
• The condition number is also important in the round-off error of GE. It can be shown that:

||b - Ax*|| / (||A|| ||x*||) ≤ ρε
||x - x*|| / ||x*|| ≤ ρ κ(A) ε

where x* is the numerical solution of Ax = b, ρ is a constant smaller than about 10, and ε is the machine precision.
• The first inequality says that the relative residual will be about the same size as the round-off error no matter how ill-conditioned A is.
• The second inequality says that the relative error is small if κ(A) is small but might be large if κ(A) is large.
Matrix Norms
• The norm of the matrix A is ||A|| = M, i.e.,

||A|| = max ||Ax||/||x||

• Since ||A^-1|| = 1/m, it follows that the condition number can also be defined as: κ(A) = ||A|| ||A^-1||
• In MATLAB, cond(A,p) computes the condition number of A relative to the l_p-norm.
Iterative methods for Ax = b
• Gaussian elimination is a direct method that computes the solution of Ax = b in about n^3/3 operations.
• If n is large we might want a faster, less accurate, method.
• With an iterative method we can stop the iteration whenever we think we have a sufficiently accurate solution.
Iterative methods
• Suppose we split the matrix as A = S - T; then Sx = Tx + b.
• We can turn this into an iteration:

S x_{k+1} = T x_k + b    or    x_{k+1} = S^-1 T x_k + S^-1 b

• So if this sequence converges, we can start the iteration with a guess x_0 at the solution and obtain an approximate solution.
• We need S to be easily invertible.
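The generic splitting iteration is only a few lines of MATLAB. A minimal sketch (splitting_solve is our name; it assumes the splitting converges):

function x = splitting_solve(S, T, b, tol, maxit)
% Iterate S*x_{k+1} = T*x_k + b for the system (S - T)x = b,
% stopping when the residual is below tol.
x = zeros(size(b));               % initial guess x_0 = 0
for k = 1:maxit
    x = S \ (T*x + b);            % one step of the iteration
    if norm(b - (S - T)*x) < tol  % sufficiently accurate?
        return
    end
end
end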
Common Iterative Methods
1. S = diagonal part of A (Jacobi's method)
2. S = lower triangular part of A, including the diagonal (Gauss-Seidel method)
3. S = a combination of 1 and 2 (successive over-relaxation, or SOR)
S is called a preconditioner. The choice of S affects the convergence properties of the iteration.
Jacobi Method
• S = diag(A), so the formula for iteration k+1 becomes:

a_ii (x_i)_{k+1} = - Σ_{j=1}^{i-1} a_ij (x_j)_k - Σ_{j=i+1}^{n} a_ij (x_j)_k + b_i

• Expressed in matrices:

D x_{k+1} = -(L + U) x_k + b

where D, L, and U are the diagonal, strictly lower triangular, and strictly upper triangular parts of A, respectively.
Note: these are not the D, L, and U of the LDU factorisation.
Jacobi Example
• Example:

A = [  2  -1 ]    S = [ 2  0 ]    T = [ 0  1 ]    S^-1 T = [  0   1/2 ]
    [ -1   2 ]        [ 0  2 ]        [ 1  0 ]             [ 1/2   0  ]

Then:

[ x_1 ]          [  0   1/2 ] [ x_1 ]     [ b_1/2 ]
[ x_2 ]_{k+1}  = [ 1/2   0  ] [ x_2 ]_k + [ b_2/2 ]
Pseudocode for Jacobi Method
choose an initial guess x_0
for k=0,1,2,…
    for i=1 to n
        sum = 0.0
        for j=1 to i-1 and j=i+1 to n
            sum = sum + A(i,j)*x_k(j)
        end
        x_{k+1}(i) = (b(i) - sum)/A(i,i)
    end
    check convergence and continue if needed
end
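Because S is diagonal, the whole sweep can be vectorised in MATLAB (a sketch with our own function name; the convergence test is omitted for brevity):

function x = jacobi(A, b, nsweeps)
% Jacobi iteration in matrix form: D*x_{k+1} = -(L+U)*x_k + b.
d = diag(A);             % the diagonal part D, stored as a vector
R = A - diag(d);         % L + U, the off-diagonal part of A
x = zeros(size(b));      % initial guess x_0 = 0
for k = 1:nsweeps
    x = (b - R*x) ./ d;  % solve the diagonal system for the next iterate
end
end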
Gauss-Seidel Method
• A problem with the Jacobi method is that we have to store all of x_k until we have finished computing x_{k+1}.
• In the Gauss-Seidel method the elements of x_{k+1} are used as soon as they are computed, replacing the corresponding elements of x_k on the right-hand side:

a_ii (x_i)_{k+1} = - Σ_{j=1}^{i-1} a_ij (x_j)_{k+1} - Σ_{j=i+1}^{n} a_ij (x_j)_k + b_i

• This uses half the storage of the Jacobi method.
Gauss-Seidel in Matrix Notation
• Expressed in matrices:

(D + L) x_{k+1} = -U x_k + b

where D + L is the lower triangular part of A, and U is the strictly upper triangular part of A.
Gauss-Seidel Example
• Example:

A = [  2  -1 ]    S = [  2  0 ]    T = [ 0  1 ]    S^-1 T = [ 0  1/2 ]
    [ -1   2 ]        [ -1  2 ]        [ 0  0 ]             [ 0  1/4 ]

Then:

[  2  0 ] [ x_1 ]          [ 0  1 ] [ x_1 ]     [ b_1 ]
[ -1  2 ] [ x_2 ]_{k+1}  = [ 0  0 ] [ x_2 ]_k + [ b_2 ]

• Gauss-Seidel is better than Jacobi because it uses half the storage and often converges faster.
Pseudocode for Gauss-Seidel
choose an initial guess x_0
for k=0,1,2,…
    for i=1 to n
        sum = 0.0
        for j=1 to i-1
            sum = sum + A(i,j)*x_{k+1}(j)
        end
        for j=i+1 to n
            sum = sum + A(i,j)*x_k(j)
        end
        x_{k+1}(i) = (b(i) - sum)/A(i,i)
    end
    check convergence and continue if needed
end
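For comparison with the Jacobi code, a MATLAB sketch of Gauss-Seidel (gauss_seidel is our name; the convergence test is again omitted). Note that only one copy of x is stored, and the new components are used as soon as they are written:

function x = gauss_seidel(A, b, nsweeps)
% Gauss-Seidel iteration: x(i) is overwritten as soon as it is computed,
% so updated values are used immediately within the same sweep.
n = length(b);
x = zeros(n, 1);                  % initial guess x_0 = 0
for k = 1:nsweeps
    for i = 1:n
        s = A(i, [1:i-1, i+1:n]) * x([1:i-1, i+1:n]);
        x(i) = (b(i) - s) / A(i,i);
    end
end
end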