conjugate gradient method - Department of Computer Science
Monica Garika
Chandana Guduru
METHODS TO SOLVE LINEAR SYSTEMS
Direct methods
Gaussian elimination method
LU factorization method
Simplex method of linear programming
Iterative methods
Jacobi method
Gauss-Seidel method
Multi-grid method
Conjugate gradient method
Conjugate Gradient method
The conjugate gradient (CG) method is an algorithm for the numerical
solution of a particular system of linear equations
Ax = b,
where A is symmetric, i.e., A = Aᵀ, and positive
definite, i.e.,
xᵀAx > 0 for all nonzero vectors x.
If A is symmetric and positive definite, then solving Ax = b is
equivalent to minimizing the quadratic function
Q(x) = ½ xᵀAx − xᵀb + c
Conjugate gradient method
The conjugate gradient method builds up a solution
x* ∈ Rⁿ in at most n steps in the absence of round-off
errors.
With round-off errors, more than n steps may be
needed to obtain a good approximation of the exact solution
x*.
For sparse matrices, a good approximation of the exact
solution can often be achieved in fewer than n steps, even
in the presence of round-off errors.
Practical Example
In oil reservoir simulation,
the number of linear equations corresponds to the
number of grid cells of the reservoir.
The unknown vector x is the oil pressure of the reservoir:
each element of x is the oil pressure of a
specific grid cell.
Linear System
A: square matrix; x: unknown vector (what we want to find); b: known vector.
Matrix Multiplication
Positive Definite Matrix
xᵀAx = [x1 x2 … xn] A x > 0 for all nonzero x
Procedure
Find an initial guess x0 for the solution
Generate successive approximations to the solution
Generate residuals
Generate search directions
Conjugate gradient iteration
x0 = 0, r0 = b, p0 = r0
for k = 1, 2, 3, . . .
  αk = (rk−1ᵀ rk−1) / (pk−1ᵀ A pk−1)   step length
  xk = xk−1 + αk pk−1                   approximate solution
  rk = rk−1 − αk A pk−1                 residual
  βk = (rkᵀ rk) / (rk−1ᵀ rk−1)          improvement
  pk = rk + βk pk−1                     search direction
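The iteration above can be sketched in Python with NumPy (a minimal illustration, not the authors' implementation; the test matrix and right-hand side are made-up values):

```python
import numpy as np

def conjugate_gradient(A, b, n_steps):
    """Plain CG iteration as on the slide: x0 = 0, r0 = b, p0 = r0."""
    x = np.zeros_like(b)
    r = b.copy()                          # residual r0 = b - A x0 = b
    p = r.copy()                          # first search direction
    for _ in range(n_steps):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)        # step length
        x = x + alpha * p                 # approximate solution
        r_new = r - alpha * Ap            # updated residual
        beta = (r_new @ r_new) / (r @ r)  # improvement
        p = r_new + beta * p              # next search direction
        r = r_new
    return x

# Symmetric positive definite test matrix (illustrative values)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, n_steps=2)   # n = 2 steps suffice in exact arithmetic
```

For this 2×2 system the residual after two steps is already at round-off level, matching the claim that CG terminates in at most n steps in exact arithmetic.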
Each iteration of the conjugate gradient method has the form
x(t) = x(t−1) + s(t) d(t)
where
x(t) is a function of the old value of the vector x,
s(t) is a scalar step size, and
d(t) is the direction vector.
Before the first iteration, the values of x(0), d(0), and
g(0) must be set.
Steps of the conjugate gradient method
Every iteration t calculates x(t) in four steps:
Step 1: Compute the gradient
g(t) = A x(t−1) − b
Step 2: Compute the direction vector
d(t) = −g(t) + [g(t)ᵀ g(t) / g(t−1)ᵀ g(t−1)] d(t−1)
Step 3: Compute the step size
s(t) = −d(t)ᵀ g(t) / d(t)ᵀ A d(t)
Step 4: Compute the new approximation of x
x(t) = x(t−1) + s(t) d(t)
Sequential Algorithm
1) x0 := 0
2) r0 := b − A x0
3) p0 := r0
4) k := 0
5) kmax := maximum number of iterations to be done
6) if k < kmax then perform steps 8 to 15
7) if k = kmax then exit
8) calculate v := A pk
9) αk := (rkᵀ rk) / (pkᵀ v)
10) xk+1 := xk + αk pk
11) rk+1 := rk − αk v
12) if rk+1 is sufficiently small then go to 16
13) βk := (rk+1ᵀ rk+1) / (rkᵀ rk)
14) pk+1 := rk+1 + βk pk
15) k := k + 1; go to 6
16) result := xk+1
Complexity analysis
Identify data dependencies
Identify eventual communications
The method requires a large number of operations:
as the number of equations increases, the complexity also
increases.
Why Parallelize?
Parallelizing the conjugate gradient method is a way to
increase its performance.
It saves memory because each processor stores only the
portions of the rows of the matrix A that contain nonzero elements.
It executes faster because the matrix is divided into
portions across processors.
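The memory saving can be illustrated with a compressed sparse row (CSR) layout, in which only the nonzero entries of each row are kept (a simplified single-process sketch; the matrix values are made up):

```python
import numpy as np

# Dense matrix with many zeros (illustrative values)
A = np.array([[4.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# CSR-style storage: keep only nonzero values, their column indices,
# and the offsets where each row starts.
values, col_idx, row_ptr = [], [], [0]
for row in A:
    for j, a in enumerate(row):
        if a != 0.0:
            values.append(a)
            col_idx.append(j)
    row_ptr.append(len(values))

def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply the sparsely stored matrix by a vector."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

x = np.array([1.0, 2.0, 3.0])
y = csr_matvec(values, col_idx, row_ptr, x)   # equals A @ x
```

Since CG only ever needs A through matrix-vector products, this storage is all each processor has to keep for its rows.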
How to parallelize?
For example, choose a row-wise block-striped
decomposition of A and replicate all vectors.
The multiplication of A by a vector can then be performed
without any communication,
but an all-gather communication is needed to replicate
the result vector.
The overall time complexity of the parallel algorithm is
Θ(n²w/p + n log p)
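The row-wise decomposition can be sketched as follows (a serial simulation of p workers, not actual message passing; block sizes and the matrix are illustrative):

```python
import numpy as np

def rowwise_parallel_matvec(A, x, p):
    """Simulate p workers: each owns a block of rows of A and the full
    (replicated) vector x, computes its slice of A @ x locally, and the
    slices are concatenated -- the step an all-gather would perform."""
    n = A.shape[0]
    blocks = np.array_split(np.arange(n), p)             # row indices per worker
    local_results = [A[rows, :] @ x for rows in blocks]  # no communication needed
    return np.concatenate(local_results)                 # "all-gather" of the result

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
x = np.array([1.0, 2.0, 3.0])
y = rowwise_parallel_matvec(A, x, p=2)   # equals A @ x
```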
Row wise Block Striped Decomposition of a
Symmetrically Banded Matrix
Dependency Graph in CG Algorithm
Algorithm of a Parallel CG on each Computing Worker (cw)
1) Receive cw_r0, cw_A, cw_x0
2) cw_p0 := cw_r0
3) k := 0
4) kmax := maximum number of iterations to be done
5) if k < kmax then perform steps 7 to 22
6) if k = kmax then exit
7) v := cw_A · cw_pk
8) compute the local numerator of αk: cw_Nαk
9) compute the local denominator of αk: cw_Dαk
10) Send cw_Nαk, cw_Dαk
11) Receive αk
12) cw_xk+1 := cw_xk + αk cw_pk
13) compute the partial result of rk+1: cw_rk+1 := cw_rk − αk v
14) Send cw_xk+1, cw_rk+1
15) Receive signal
16) if signal = solution_reached go to 23
17) compute the local numerator of βk: cw_Nβk
18) compute the local denominator of βk: cw_Dβk
19) Send cw_Nβk, cw_Dβk
20) Receive βk
21) cw_pk+1 := cw_rk+1 + βk cw_pk
22) k := k + 1
23) Result reached
Speedup of Parallel CG on Grid versus Sequential CG on Intel
Communication and Waiting Time of the Parallel CG on Grid
We consider the difference between f at the solution x and any other vector p:

i)  f(p) = ½ pᵀAp − bᵀp + c
ii) f(x) = ½ xᵀAx − bᵀx + c

ii − i)  f(x) − f(p) = ½ xᵀAx − bᵀx − ½ pᵀAp + bᵀp

replace b = Ax)  f(x) − f(p) = ½ xᵀAx − xᵀAx − ½ pᵀAp + xᵀAp
                             = −½ xᵀAx − ½ pᵀAp + xᵀAp
simplify)  f(x) − f(p) = −½ (x − p)ᵀ A (x − p)

positive-definiteness)  f(x) − f(p) = −½ (x − p)ᵀ A (x − p) ≤ 0

So f(x) ≤ f(p) for every p: the solution x of Ax = b minimizes f.
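A quick numerical check of this identity (illustrative SPD matrix; f as defined above, taking c = 0):

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)                # the exact solution of Ax = b

def f(v):
    """Quadratic form f(v) = 1/2 v'Av - b'v (with c = 0)."""
    return 0.5 * v @ A @ v - b @ v

rng = np.random.default_rng(0)
p = rng.normal(size=2)                   # an arbitrary other vector

lhs = f(x) - f(p)
rhs = -0.5 * (x - p) @ A @ (x - p)       # the identity derived above
# lhs equals rhs up to round-off, and both are <= 0
```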
Parallel Computation Design
– Iterations of the conjugate gradient method can be executed
only in sequence, so the most advisable approach is to
parallelize the computations that are carried out at each
iteration.
– The most time-consuming computations are the multiplication
of matrix A by the vectors x and d.
– Additional operations of lower computational complexity are the
various vector processing procedures
(inner product, addition and subtraction, multiplication by a scalar).
– When implementing the parallel conjugate gradient method,
parallel algorithms for matrix-vector multiplication can be used.
“Pure” Conjugate Gradient Method (Quadratic Case)
0 - Starting at any x0, define d0 = −g0 = b − Q x0, where gk is the
gradient of the objective function at xk.
1 - Using dk, calculate the new point xk+1 = xk + αk dk, where
αk = −(gkᵀ dk) / (dkᵀ Q dk)
2 - Calculate the new conjugate gradient direction dk+1, according to:
dk+1 = −gk+1 + βk dk
where
βk = (gk+1ᵀ Q dk) / (dkᵀ Q dk)
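The defining property of these directions, Q-conjugacy (dᵢᵀ Q dⱼ = 0 for i ≠ j), can be checked numerically with the formulas above (illustrative 3×3 SPD matrix):

```python
import numpy as np

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
g = Q @ x - b                     # gradient at x0
d = -g                            # d0 = -g0 = b - Q x0
directions = [d]
for _ in range(2):                # two more steps give d1, d2
    alpha = -(g @ d) / (d @ Q @ d)
    x = x + alpha * d
    g = Q @ x - b
    beta = (g @ Q @ d) / (d @ Q @ d)
    d = -g + beta * d
    directions.append(d)

# Every pair of distinct directions is Q-conjugate: d_i' Q d_j ≈ 0
pairs = [(0, 1), (0, 2), (1, 2)]
offdiag = [directions[i] @ Q @ directions[j] for i, j in pairs]
```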
Advantages
1) The gradient is always nonzero and linearly
independent of all previous direction vectors.
2) There is a simple formula to determine the new direction,
only slightly more complicated than steepest descent.
3) The process makes good progress because it is based
on gradients.
Advantages (continued)
The simple formulae for updating the
direction vector are attractive.
The method is slightly more complicated than steepest
descent, but converges faster.
The conjugate gradient method is an iterative (indirect) solver:
it is used to solve large systems,
requires only a small amount of memory,
and needs only matrix-vector products, not explicit matrix factorizations.
Conclusion
The conjugate gradient method is a linear solver
used in a wide range of engineering and
science applications.
However, the conjugate gradient method has a complexity
drawback due to the high number of arithmetic
operations involved in matrix-vector and vector-vector
multiplications.
Our implementation reveals that despite the
communication cost involved in a parallel CG, a
performance improvement compared to a sequential
algorithm is still possible.
References
Dimitri P. Bertsekas and John N. Tsitsiklis, Parallel and Distributed
Computation: Numerical Methods.
Thomas Rauber and Gudula Rünger, Parallel Programming for Multicore
and Cluster Systems.
Gene Golub and James M. Ortega, Scientific Computing: An Introduction
with Parallel Computing.
Michael J. Quinn, Parallel Programming in C with MPI and OpenMP.
Jonathan Richard Shewchuk, “An Introduction to the
Conjugate Gradient Method Without the Agonizing
Pain”, Edition 1¼, School of Computer Science, Carnegie Mellon
University.