Solving Large Scale Linear Systems (in parallel)


Solving Scalar Linear Systems
Iterative approach
Lecture 15
MA/CS 471
Fall 2003
Some Matlab scripts to construct various types of random circuit loop matrices
are available at the class website:
The Sparsity Pattern of a Loop Circuit Matrix for
a Random Circuit (with 1000 closed loops)
gridcircuit.m
An array of current loops with random resistors
(resistors on circuit boundary not shown)
Matrix Due To Random Grid Circuit
Note the large amount of structure in the loop circuit matrix
The Limits of Factorization
• In the last class/lab we began to see that there are
limits to the size of linear system solvable with
matrix factorization based methods.
• The storage cost for the loop current matrix built on
a Cartesian circuit, stored as a sparse NxN matrix, is
~ 5*N entries.
• However, using LU (or Cholesky) with symmetric
RCM reordering, the storage requirement is ~ b*N
(b being the matrix bandwidth), which is typically at
least an order of magnitude larger than the storage
required for the loop matrix itself.
• Also – memory spent on storing the matrices is
memory we could have used for extra cells…
Alternative Approach
• We are going to pursue iterative methods
which will satisfy the equation Ax = b
in an approximate way without an
excessive amount of extra storage.
• There are a number of different classes of
iterative methods; today we will discuss an
example from the class of stationary
methods.
http://www.netlib.org/linalg/html_templates/node9.html
Jacobi Iteration
• Example system:
• Initial guess:
• Algorithm:
5 x  1y  3z  2
3x  6 y  2 z  3
2x  3y  6z  1
x0  y 0  z 0  0
5 x n1  1y n  3z n  2
3x n  6 y n1  2 z n  3
2 x n  3 y n  6 z n1  1
• i.e. for the i-th equation, compute the i-th degree
of freedom using the values computed in the
previous iteration.
Cleaning The Scheme Up
  x^(n+1) = (1/5) (2 - 1y^n - 3z^n)
  y^(n+1) = (1/6) (3 - 3x^n - 2z^n)
  z^(n+1) = (1/6) (1 - 2x^n - 3y^n)

which is a rearrangement of:

  5x^(n+1) + 1y^n + 3z^n = 2
  3x^n + 6y^(n+1) + 2z^n = 3
  2x^n + 3y^n + 6z^(n+1) = 1
Couple of Iterations
1st iteration:

  x^1 = (1/5) (2 - 1y^0 - 3z^0) = 2/5
  y^1 = (1/6) (3 - 3x^0 - 2z^0) = 1/2
  z^1 = (1/6) (1 - 2x^0 - 3y^0) = 1/6

2nd iteration:

  x^2 = (1/5) (2 - 1y^1 - 3z^1) = (1/5) (2 - 1*(1/2) - 3*(1/6)) = 1/5
  y^2 = (1/6) (3 - 3x^1 - 2z^1) = (1/6) (3 - 3*(2/5) - 2*(1/6)) = 11/45
  z^2 = (1/6) (1 - 2x^1 - 3y^1) = (1/6) (1 - 2*(2/5) - 3*(1/2)) = -13/60
Pseudo-Code For Jacobi Method
1) Build A, b
2) Build modified A with its diagonal zeroed -> Q
3) Set initial guess x = 0
4) do {
   a) compute:       xnew_i = (1/A_ii) (b_i - sum_{j=1..N} Q_ij x_j)
   b) compute error: err = max_{i=1..N} |xnew_i - x_i| / max_{i=1..N} |b_i|
   c) update x:      x_i = xnew_i
   } while err > tol
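The pseudo-code can be sketched in C for a dense row-major matrix (the dense storage and the function name are illustrative; the class code uses a sparse matrix and Matlab):

```c
#include <math.h>
#include <stdlib.h>

/* Jacobi iteration following the pseudo-code: x must hold the initial
   guess (e.g. zeros) on entry and the approximate solution on exit.
   Returns the number of iterations taken. */
static int jacobi(int N, const double *A, const double *b,
                  double *x, double tol, int maxit) {
    double *xn = malloc(N * sizeof *xn);
    double bmax = 0.0;
    for (int i = 0; i < N; ++i)
        if (fabs(b[i]) > bmax) bmax = fabs(b[i]);
    int it = 0;
    double err;
    do {
        err = 0.0;
        for (int i = 0; i < N; ++i) {
            double s = b[i];
            for (int j = 0; j < N; ++j)
                if (j != i) s -= A[i*N + j] * x[j];  /* off-diagonal part (Q) */
            xn[i] = s / A[i*N + i];
            if (fabs(xn[i] - x[i]) > err) err = fabs(xn[i] - x[i]);
        }
        err /= bmax;                                 /* relative stopping test */
        for (int i = 0; i < N; ++i) x[i] = xn[i];    /* update x */
        ++it;
    } while (err > tol && it < maxit);
    free(xn);
    return it;
}
```

For the 3x3 example system this converges to the exact solution x = 53/127, y = 43/127, z = -18/127 in roughly a hundred iterations.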
Matlab To The Rescue

Running The Jacobi Iteration Script
• I ran the script with the
stopping tolerance (tol)
set to 1e-8:
• Note that the error is of
the order of 1e-8 !!!
• i.e. the Jacobi iterative
method computes an
approximate solution to
the system of equations!
Try The Larger Systems
• First I made the script
into a function which
takes the rhs vector,
system matrix and
stopping tolerance as
input.
• It returns the
approximate solution
and the residual history.
Run Time!
• First I built a circuit matrix using the
gridcircuit.m script (with N=100)
• Then I built a random source vector for the
right hand side.
• Then I called the jacobisolve routine with:
• [x,residuals] = jacobisolve(mat,b,1e-4);
Convergence History
(stopping criterion marked on the plot)
Increasing System Size
Notice how the number of iterations required grew with N
Ok – so I kind of broke the rules
• We set up the Jacobi iteration but did not ask the
question “when will the Jacobi iteration converge and
how fast?”
Definition:
A matrix A is diagonally dominant if

  sum_{j=1, j != i}^{N} |A_ij| < |A_ii|   for each i

Theorem:
If the matrix A is diagonally dominant then Ax = b has a
unique solution x, and the Jacobi iteration produces a
sequence {x^n} which converges to x for any initial guess x^0.
Informally:
The “more diagonally dominant” a matrix is the faster it
will converge… this holds some of the time.
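The definition translates directly into a row-by-row check. A minimal C sketch, assuming dense row-major storage (the function name is my own):

```c
#include <math.h>

/* Strict diagonal dominance test from the definition:
   sum_{j != i} |A_ij| < |A_ii| must hold for every row i. */
static int is_strictly_diagonally_dominant(int N, const double *A) {
    for (int i = 0; i < N; ++i) {
        double offdiag = 0.0;
        for (int j = 0; j < N; ++j)
            if (j != i) offdiag += fabs(A[i*N + j]);
        if (offdiag >= fabs(A[i*N + i])) return 0;  /* row i fails (ties fail too) */
    }
    return 1;
}
```

Note that a row whose off-diagonal sum exactly equals the diagonal (the weakly dominant case that comes up below) fails this strict test.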
Going Back To The Circuit System
• For the gridcircuit code I set up the resistors so
that all the internal circuit loops shared all their
resistors with other loops.
• the current balance law => for internal cells the
total cell resistance = sum of resistances
shared with neighboring cells…
• i.e. the row sums of the internal cell loop-current
rows are zero:

  sum_{j=1, j != i}^{N} |A_ij| = |A_ii|
• i.e. the matrix is weakly diagonally dominant – and
does not exactly satisfy the convergence criterion
for the Jacobi iterative scheme.
Slight Modification of the Circuit
• I added an additional random resistor to
each cell (i.e. increased the diagonal entry
and did not change the off-diagonal
entries).
• This modification ensures that the matrix is
now strictly diagonally dominant.
Convergence history for diagonally dominant system – notice dramatic
reduction in iteration count
Gauss-Seidel
• Example system:
• Initial guess:
• Algorithm:
5 x  1y  3z  2
3x  6 y  2 z  3
2x  3y  6z  1
x0  y 0  z 0  0
5 x n1  1y n  3z n  2
3x n1  6 y n1  2 z n  3
2 x n1  3 y n1  6 z n1  1
• i.e. for the i-th equation, compute the i-th degree of
freedom using the values computed in the previous
iteration together with the new values just computed.
Cleaning The Scheme Up
  x^(n+1) = (1/5) (2 - 1y^n - 3z^n)
  y^(n+1) = (1/6) (3 - 3x^(n+1) - 2z^n)
  z^(n+1) = (1/6) (1 - 2x^(n+1) - 3y^(n+1))

which is a rearrangement of:

  5x^(n+1) + 1y^n + 3z^n = 2
  3x^(n+1) + 6y^(n+1) + 2z^n = 3
  2x^(n+1) + 3y^(n+1) + 6z^(n+1) = 1
First Iteration
1st iteration:

  x^1 = (1/5) (2 - 1y^0 - 3z^0)
  y^1 = (1/6) (3 - 3x^1 - 2z^0)
  z^1 = (1/6) (1 - 2x^1 - 3y^1)

As soon as the level-1 values are computed, we use them
in the next equations.
Theorem First This Time!
• So we should first ask the questions:
1) When will the Gauss-Seidel iteration converge?
2) How fast will it converge?

Definition:
A matrix A is said to be positive definite if

  x^T A x > 0   for all nonzero x in R^N

Theorem:
If A is symmetric and positive definite, then
the Gauss-Seidel iteration converges for any
initial guess x^0.

Unofficially:
In some cases Gauss-Seidel will converge twice as
fast as Jacobi.
Gauss-Seidel Algorithm
• We iterate:
  x_i^0 = 0

  x_i^(n+1) = (1/A_ii) (b_i - sum_{j=1}^{i-1} A_ij x_j^(n+1)
                            - sum_{j=i+1}^{N} A_ij x_j^n)
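In code, the update above amounts to overwriting x in place: entries j < i already hold the new (n+1) values when row i is processed. A C sketch of one sweep (dense row-major storage and the function name are illustrative):

```c
/* One Gauss-Seidel sweep: x is overwritten in place, so the first inner
   loop reads values already updated this sweep (the n+1 values) and the
   second reads values from the previous sweep (the n values). */
static void gauss_seidel_sweep(int N, const double *A, const double *b,
                               double *x) {
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int j = 0; j < i; ++j)      /* new values, j < i  */
            s -= A[i*N + j] * x[j];
        for (int j = i + 1; j < N; ++j)  /* old values, j > i  */
            s -= A[i*N + j] * x[j];
        x[i] = s / A[i*N + i];
    }
}
```

On the 3x3 example, one sweep from zero gives x^1 = 2/5, y^1 = 3/10, z^1 = -7/60, already different from the Jacobi first iteration because the new x^1 feeds into y^1 and z^1.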
Comparing Jacobi and Gauss-Seidel
Same problem – Gauss-Seidel takes almost half the work
The Catch
• Ok – so it looks like one would always use
Gauss-Seidel rather than Jacobi iteration.
• However, let us consider the parallel
implementation.
n 1
i
J: x
n 1
i
GS : x
j i 1
jN
1 
n
n

 bi   A ij x j   A ij x j 
A ii 
j 1
j i 1

j i 1
jN
1 
n 1
n

 bi   A ij x j   A ij x j 
A ii 
j 1
j i 1

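The difference matters for parallelism: the Jacobi update reads only the old vector, so every row can be computed independently, while Gauss-Seidel reads entries j < i updated in the same sweep. A C sketch of the parallel-friendly Jacobi sweep (the OpenMP pragma is an assumption about the toolchain; without -fopenmp it is simply ignored):

```c
#include <math.h>

/* Jacobi parallelizes naturally: each new entry depends only on the old
   vector xold, so rows can be updated in any order or on any process. */
static void jacobi_sweep_par(int N, const double *A, const double *b,
                             const double *xold, double *xnew) {
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int j = 0; j < N; ++j)
            if (j != i) s -= A[i*N + j] * xold[j];  /* old values only */
        xnew[i] = s / A[i*N + i];
    }
    /* Gauss-Seidel, by contrast, needs x[j] for j < i from the SAME sweep,
       so row i cannot start until rows 0..i-1 finish: a serial chain. */
}
```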
Volunteer To Design
A Parallel Version
1) Decide which cells go where
2) Decide how much information each
process needs to keep locally
3) Decide what information needs to be
communicated among processes.
4) Are there any intrinsic bottlenecks in
Gauss-Seidel or Jacobi?
5) Can we devise a hybrid version of GS
which avoids the bottlenecks?
Project 3 (serial part)
In C:
1) Build a sparse matrix based on one of
the random circuits (design the storage
for it yourself or use someone else’s
sparse storage structure or class)
2) Write a sparse matrix times vector
routine.
3) Implement the Jacobi iterative scheme
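One possible shape for steps 1 and 2 is compressed sparse row (CSR) storage; the struct layout and names below are my own illustration, not part of the project handout:

```c
/* Compressed sparse row (CSR) storage: row_ptr has N+1 entries, and the
   nonzeros of row i live at positions row_ptr[i] .. row_ptr[i+1]-1 of
   val[] and col[]. */
typedef struct {
    int N;          /* number of rows                 */
    int *row_ptr;   /* size N+1, start of each row    */
    int *col;       /* column index of each nonzero   */
    double *val;    /* value of each nonzero          */
} csr_matrix;

/* Step 2: sparse matrix times vector, y = A*x.
   Cost is proportional to the number of nonzeros (~5N for the grid
   circuit), not N^2. */
static void csr_matvec(const csr_matrix *A, const double *x, double *y) {
    for (int i = 0; i < A->N; ++i) {
        double s = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            s += A->val[k] * x[A->col[k]];
        y[i] = s;
    }
}
```

With the diagonal entries easy to locate per row, the Jacobi scheme of step 3 can be written on top of this same matvec loop.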