Conjugate Gradient Method
Department of Computer Science

Monica Garika
Chandana Guduru
METHODS TO SOLVE LINEAR SYSTEMS
 Direct methods
Gaussian elimination method
LU factorization method
Simplex method (linear programming)
 Iterative methods
Jacobi method
Gauss-Seidel method
Multigrid method
Conjugate gradient method
Conjugate Gradient Method
 The conjugate gradient (CG) method is an algorithm for the numerical solution of particular systems of linear equations Ax = b: those where A is symmetric, i.e. A = A^T, and positive definite, i.e. x^T A x > 0 for all nonzero vectors x.
 If A is symmetric and positive definite, then the function
Q(x) = ½ x^T A x − x^T b + c
is minimized at the solution of Ax = b, since its gradient is ∇Q(x) = Ax − b.
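As a quick numeric check of this fact (using a made-up 2×2 SPD system, not from the slides): the gradient of Q, which is A x − b, vanishes at the solution of Ax = b.

import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # made-up SPD example
b = np.array([1.0, 2.0])
x_star = np.linalg.solve(A, b)           # exact solution of Ax = b
print(np.allclose(A @ x_star - b, 0))    # gradient of Q vanishes at x_star: True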
Conjugate Gradient Method
 The conjugate gradient method builds up a solution x* ∈ R^n in at most n steps in the absence of round-off errors.
 Taking round-off errors into account, more than n steps may be needed to get a good approximation of the exact solution x*.
 For sparse matrices, a good approximation of the exact solution can often be achieved in fewer than n steps, even in the presence of round-off errors.
Practical Example
In oil reservoir simulation:
 The number of linear equations corresponds to the number of grid cells of the reservoir.
 The unknown vector x is the oil pressure of the reservoir.
 Each element of the vector x is the oil pressure in a specific grid cell of the reservoir.
Linear System
A x = b
where A is a square matrix, x is the unknown vector (what we want to find), and b is the known vector; forming A x is a matrix multiplication.
Positive Definite Matrix
x^T A x = [x_1 x_2 … x_n] A [x_1 x_2 … x_n]^T > 0 for every nonzero vector x.
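Both properties can be tested numerically. A minimal sketch (the example matrices are hypothetical): symmetry is checked directly, and positive definiteness via a Cholesky factorization, which succeeds exactly for symmetric positive definite matrices.

import numpy as np

def is_spd(A, tol=1e-10):
    """Return True if A is symmetric positive definite."""
    if not np.allclose(A, A.T, atol=tol):      # symmetry: A == A^T
        return False
    try:
        np.linalg.cholesky(A)                  # succeeds iff A is positive definite
        return True
    except np.linalg.LinAlgError:
        return False

print(is_spd(np.array([[4.0, 1.0], [1.0, 3.0]])))  # True: made-up SPD example
print(is_spd(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False: eigenvalues 3 and -1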
Procedure
 Find an initial guess x_0 for the solution.
 Generate successive approximations x_k.
 Generate residuals r_k.
 Generate search directions p_k.
Conjugate gradient iteration
 x_0 = 0, r_0 = b, p_0 = r_0
 for k = 1, 2, 3, . . .
α_k = (r_{k−1}^T r_{k−1}) / (p_{k−1}^T A p_{k−1})   (step length)
x_k = x_{k−1} + α_k p_{k−1}   (approximate solution)
r_k = r_{k−1} − α_k A p_{k−1}   (residual)
β_k = (r_k^T r_k) / (r_{k−1}^T r_{k−1})   (improvement)
p_k = r_k + β_k p_{k−1}   (search direction)
 Each iteration of the conjugate gradient method is of the form
x(t) = x(t−1) + s(t) d(t)
where
x(t) is the new value of the vector x, computed from its old value x(t−1),
s(t) is the scalar step size, and
d(t) is the direction vector.
 Before the first iteration, the values of x(0), d(0), and g(0) must be set.
Steps of the conjugate gradient method
 Every iteration t computes x(t) in four steps:
Step 1: Compute the gradient
g(t) = A x(t−1) − b
Step 2: Compute the direction vector
d(t) = −g(t) + [g(t)^T g(t) / g(t−1)^T g(t−1)] d(t−1)
Step 3: Compute the step size
s(t) = [−d(t)^T g(t)] / [d(t)^T A d(t)]
Step 4: Compute the new approximation of x
x(t) = x(t−1) + s(t) d(t)
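A minimal NumPy sketch of these four steps (the function name cg_step and the calling convention are my own, not from the slides); it performs a single iteration t given the previous iterate, direction, and gradient:

import numpy as np

def cg_step(A, b, x_prev, d_prev, g_prev):
    """One iteration of the four-step scheme above."""
    g = A @ x_prev - b                      # Step 1: gradient g(t)
    beta = (g @ g) / (g_prev @ g_prev)      # coefficient from Step 2
    d = -g + beta * d_prev                  # Step 2: direction d(t)
    s = -(d @ g) / (d @ (A @ d))            # Step 3: step size s(t)
    x = x_prev + s * d                      # Step 4: new approximation x(t)
    return x, d, g

Before the first call, x(0) can be any initial guess, with g(0) = A x(0) − b and d(0) = −g(0), matching the initialization noted on the previous slide.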
Sequential Algorithm
1) x_0 := 0
2) r_0 := b − A x_0
3) p_0 := r_0
4) k := 0
5) k_max := maximum number of iterations to be done
6) if k < k_max then perform steps 8 to 15
7) if k = k_max then exit
8) calculate v := A p_k
9) α_k := (r_k^T r_k) / (p_k^T v)
10) x_{k+1} := x_k + α_k p_k
11) r_{k+1} := r_k − α_k v
12) if r_{k+1} is sufficiently small then go to 16
13) β_k := (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
14) p_{k+1} := r_{k+1} + β_k p_k
15) k := k + 1; go to 6
16) result := x_{k+1}
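These numbered steps translate almost line by line into NumPy. A hedged sketch (the stopping tolerance, function name, and the 4×4 test system are my own choices, not from the slides); it also illustrates the earlier claim that, in exact arithmetic, CG reaches the solution in at most n steps:

import numpy as np

def conjugate_gradient(A, b, x0=None, k_max=1000, tol=1e-10):
    """Sequential CG following the numbered steps above."""
    x = np.zeros_like(b) if x0 is None else x0.copy()   # step 1
    r = b - A @ x                                       # step 2
    p = r.copy()                                        # step 3
    for k in range(k_max):                              # steps 4-7: loop control
        v = A @ p                                       # step 8
        alpha = (r @ r) / (p @ v)                       # step 9
        x = x + alpha * p                               # step 10
        r_new = r - alpha * v                           # step 11
        if np.linalg.norm(r_new) < tol:                 # step 12
            break
        beta = (r_new @ r_new) / (r @ r)                # step 13
        p = r_new + beta * p                            # step 14
        r = r_new                                       # step 15: next iteration
    return x, k + 1                                     # step 16: result

# Made-up 4x4 SPD test system; in exact arithmetic CG needs at most n = 4 steps.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)        # symmetric positive definite by construction
b = rng.normal(size=4)
x, iters = conjugate_gradient(A, b)
print(iters, np.allclose(A @ x, b))   # e.g. 4 True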
Complexity Analysis
 Identify the data dependencies.
 Identify the communications they imply.
 The method requires a large number of operations.
 As the number of equations increases, the complexity also increases.
Why Parallelize?
 Parallelizing the conjugate gradient method is a way to increase its performance.
 It saves memory, because each processor stores only the portions of the rows of the matrix A that contain nonzero elements.
 It executes faster, because the matrix is divided into portions that are processed concurrently.
How to Parallelize?
 For example, choose a row-wise block striped decomposition of A and replicate all vectors.
 The multiplication of A by a vector may then be performed without any communication.
 However, an all-gather communication is needed to replicate the result vector.
 The overall time complexity of the parallel algorithm is Θ(n² · w / p + n log p).
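A minimal sketch of this decomposition using mpi4py (assuming mpi4py is available and n is divisible by the number of processes p; the matrix and all names are illustrative): each process multiplies its own block of rows by the replicated vector with no communication, then one all-gather replicates the result vector.

# Run with e.g.: mpiexec -n 4 python matvec.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
p, rank = comm.Get_size(), comm.Get_rank()

n = 8                                   # assume n % p == 0 for simplicity
rows = n // p                           # rows owned by this process

# Every process builds the same made-up matrix and vector, then keeps
# only its own block of rows (row-wise block striped decomposition).
A = np.arange(n * n, dtype=float).reshape(n, n) + n * np.eye(n)
x = np.ones(n)
A_local = A[rank * rows:(rank + 1) * rows, :]

y_local = A_local @ x                   # local multiply: no communication
y = np.empty(n)
comm.Allgather(y_local, y)              # all-gather replicates the result vector

if rank == 0:
    print(np.allclose(y, A @ x))        # True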
Row-wise Block Striped Decomposition of a Symmetrically Banded Matrix
[Figure: panels (a) and (b) of the decomposition]
Dependency Graph in CG
[Figure: dependency graph of the CG algorithm]
Algorithm of a Parallel CG on each Computing Worker (cw)
1) Receive cw_r_0, cw_A, cw_x_0
2) cw_p_0 := cw_r_0
3) k := 0
4) k_max := maximum number of iterations to be done
5) if k < k_max then perform steps 7 to 22
6) if k = k_max then exit
7) v := cw_A · cw_p_k
8) compute cw_Nα_k (the local part of the numerator of α_k)
9) compute cw_Dα_k (the local part of the denominator of α_k)
10) Send Nα_k, Dα_k
11) Receive α_k
12) x_{k+1} := x_k + α_k p_k
13) compute the partial result of r_{k+1}: r_{k+1} := r_k − α_k v
14) Send x_{k+1}, r_{k+1}
15) Receive signal
16) if signal = solution_reached then go to 23
17) compute cw_Nβ_k (the local part of the numerator of β_k)
18) compute cw_Dβ_k (the local part of the denominator of β_k)
19) Send cw_Nβ_k, cw_Dβ_k
20) Receive β_k
21) cw_p_{k+1} := cw_r_{k+1} + β_k · cw_p_k
22) k := k + 1; go to 5
23) Result reached
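One possible realization of this worker loop in mpi4py, offered as a hedged sketch rather than the authors' implementation: the explicit send/receive pairs of the listing are collapsed into allreduce and all-gather collectives, which is a common way to combine the local numerators and denominators of α_k and β_k (the test system and all names are made up; n is assumed divisible by the number of processes).

# Run with e.g.: mpiexec -n 2 python parallel_cg.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
p, rank = comm.Get_size(), comm.Get_rank()

n = 8
rows = n // p                              # assume n % p == 0
rng = np.random.default_rng(42)            # same seed: every worker builds the same system
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)                # made-up SPD system
b = rng.normal(size=n)

A_cw = A[rank * rows:(rank + 1) * rows]    # this worker's rows of A (cw_A)
x, r = np.zeros(n), b.copy()               # vectors are replicated on every worker
p_vec = r.copy()
s = slice(rank * rows, (rank + 1) * rows)  # this worker's index range

for k in range(n):
    v_cw = A_cw @ p_vec                    # local piece of v = A p_k
    v = np.empty(n)
    comm.Allgather(v_cw, v)                # replicate v on every worker
    N = comm.allreduce(r[s] @ r[s], op=MPI.SUM)      # global numerator of alpha_k
    D = comm.allreduce(p_vec[s] @ v[s], op=MPI.SUM)  # global denominator of alpha_k
    alpha = N / D
    x = x + alpha * p_vec
    r = r - alpha * v
    N_new = comm.allreduce(r[s] @ r[s], op=MPI.SUM)  # numerator of beta_k
    if np.sqrt(N_new) < 1e-10:             # the "solution reached" signal
        break
    p_vec = r + (N_new / N) * p_vec        # beta_k = N_new / N
    # k := k + 1 is handled by the for loop

if rank == 0:
    print(np.allclose(A @ x, b))           # True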
[Figure: speedup of the parallel CG on the grid versus the sequential CG on Intel]
[Figure: communication and waiting time of the parallel CG on the grid]
We consider the difference between f at the solution x of Ax = b and at any other vector p:
i) f(p) = ½ p^T A p − b^T p + c
ii) f(x) = ½ x^T A x − b^T x + c
ii − i) f(x) − f(p) = (½ x^T A x − b^T x + c) − (½ p^T A p − b^T p + c)
simplify) f(x) − f(p) = ½ x^T A x − b^T x − ½ p^T A p + b^T p
replace b) substituting b = Ax (so b^T x = x^T A x and b^T p = x^T A p):
f(x) − f(p) = ½ x^T A x − x^T A x − ½ p^T A p + x^T A p = −½ (x − p)^T A (x − p)
positive-definiteness) f(x) − f(p) ≤ 0
Hence f(x) ≤ f(p) for every p, with equality only at p = x: the solution of Ax = b minimizes f.
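A quick numeric check of this derivation (with a made-up 3×3 SPD system; all names are illustrative): for random vectors p, the computed f(x) − f(p) matches −½ (x − p)^T A (x − p) and is never positive.

import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # made-up SPD matrix
b = np.array([1.0, -2.0, 0.5])

def f(v, c=0.0):
    return 0.5 * v @ A @ v - b @ v + c

x = np.linalg.solve(A, b)                  # the solution of Ax = b

rng = np.random.default_rng(0)
for _ in range(5):
    p = x + rng.normal(size=3)             # an arbitrary other vector
    lhs = f(x) - f(p)
    rhs = -0.5 * (x - p) @ A @ (x - p)
    assert np.isclose(lhs, rhs) and lhs <= 0
print("f(x) <= f(p) for every tested p")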
Parallel Computation Design
 Iterations of the conjugate gradient method can be executed only in sequence, so the most advisable approach is to parallelize the computations that are carried out at each iteration.
 The most time-consuming computations are the multiplications of the matrix A by the vectors x and d.
 Additional operations of lower computational complexity order are the various vector processing procedures (inner product, addition and subtraction, multiplication by a scalar).
 When implementing the parallel conjugate gradient method, parallel algorithms of matrix-vector multiplication can be used.
“Pure” Conjugate Gradient Method (Quadratic Case)
0 - Starting at any x_0, define d_0 = −g_0 = b − Q x_0, where g_k is the gradient of the objective function at the point x_k.
1 - Using d_k, calculate the new point x_{k+1} = x_k + a_k d_k, where
a_k = − (g_k^T d_k) / (d_k^T Q d_k)
2 - Calculate the new conjugate gradient direction d_{k+1}, according to:
d_{k+1} = −g_{k+1} + b_k d_k
where
b_k = (g_{k+1}^T Q d_k) / (d_k^T Q d_k)
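A minimal NumPy sketch of these steps (here Q denotes the SPD matrix of the quadratic, following the slide's notation; the function name and the 2×2 test problem are made up):

import numpy as np

def pure_cg(Q, b, x0, n_steps):
    """'Pure' CG for the quadratic f(x) = 1/2 x^T Q x - b^T x."""
    x = x0.copy()
    g = Q @ x - b                          # gradient at x_0
    d = -g                                 # step 0: d_0 = -g_0 = b - Q x_0
    for _ in range(n_steps):
        a = -(g @ d) / (d @ (Q @ d))       # step 1: step length a_k
        x = x + a * d                      # step 1: new point x_{k+1}
        g_new = Q @ x - b                  # gradient at the new point
        beta = (g_new @ (Q @ d)) / (d @ (Q @ d))   # step 2: coefficient b_k
        d = -g_new + beta * d              # step 2: new conjugate direction
        g = g_new
    return x

Q = np.array([[3.0, 1.0], [1.0, 2.0]])     # made-up SPD matrix
b = np.array([1.0, 1.0])
x = pure_cg(Q, b, np.zeros(2), 2)          # n = 2 steps suffice for a 2x2 system
print(np.allclose(Q @ x, b))               # True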
Advantages
 1) The gradient is always nonzero and linearly independent of all previous direction vectors.
 2) There is a simple formula to determine the new direction, only slightly more complicated than steepest descent.
 3) The process makes good progress because it is based on gradients.
Advantages
 The simple formulas for updating the direction vector are attractive.
 The method is slightly more complicated than steepest descent, but converges faster.
 The conjugate gradient method is an iterative (indirect) solver.
 It is used to solve large systems.
 It requires a small amount of memory.
 It does not require explicit factorization of the matrix; only matrix-vector products are needed.
Conclusion
 The conjugate gradient method is a linear solver used in a wide range of engineering and science applications.
 However, conjugate gradient has a complexity drawback due to the high number of arithmetic operations involved in matrix-vector and vector-vector multiplications.
 Our implementation reveals that, despite the communication cost involved in a parallel CG, a performance improvement over a sequential algorithm is still possible.