Transcript ConjGradx

Luonan Wang
07/14/2015
$Ax = b$
Solving large systems of linear equations
Such systems are hard to solve using Gaussian elimination.

A: square
symmetric: $A^T = A$
positive-definite: $x^T A x > 0$ for every nonzero vector $x$

$f(x) = \frac{1}{2} x^T A x - b^T x + c$
A: matrix
x, b: vectors
c: scalar constant

$f(x)$ is minimized by the solution to $Ax = b$
$A = \begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix}, \quad b = \begin{pmatrix} 2 \\ -8 \end{pmatrix}, \quad c = 0$
Figure 1. Graph of $f(x) = \frac{1}{2} x^T A x - b^T x + c$
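As a quick numeric check (a sketch assuming NumPy; variable names mirror the example above), the solution of $Ax = b$ is also the minimizer of $f$:

```python
import numpy as np

# Example from Figure 1: A is symmetric positive-definite.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c

x_star = np.linalg.solve(A, b)   # exact solution of Ax = b: [2, -2]

# f at the solution is no larger than at randomly perturbed nearby points.
rng = np.random.default_rng(0)
assert all(f(x_star) <= f(x_star + 0.1 * rng.standard_normal(2))
           for _ in range(100))
print(x_star, f(x_star))         # [ 2. -2.] -10.0
```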

$f(x) = \frac{1}{2} x^T A x - b^T x + c$
$f'(x) = \frac{1}{2} A^T x + \frac{1}{2} A x - b$
Since A is symmetric, we have
$f'(x) = A x - b$
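As a sanity check on this formula, a finite-difference sketch (reusing A, b, and f from the snippet above) compares the claimed gradient to central differences:

```python
import numpy as np

x0 = np.array([1.0, 1.0])
grad = A @ x0 - b                # claimed gradient f'(x) = Ax - b

# Central differences along each coordinate axis.
eps = 1e-6
fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
               for e in np.eye(2)])
assert np.allclose(grad, fd, atol=1e-5)
```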

Starting at an initial guess $x_{(0)}$, go through a series of steps $x_{(1)}, x_{(2)}, x_{(3)}, \dots$ until we are close enough to the solution $x$.

Definitions
error: $e_{(i)} = x_{(i)} - x$, how far we are from the solution
residual: $r_{(i)} = b - A x_{(i)}$, how far we are from the correct value of $b$
$r_{(i)} = -A e_{(i)} = -f'(x_{(i)})$, so at each step we choose the direction in which $f$ decreases most quickly

$x_{(1)} = x_{(0)} + \alpha r_{(0)}$; how do we choose $\alpha$?
Figure 2. The gradient $f'$ is shown at several locations along the search line. On the search line, $f$ is minimized where the gradient is orthogonal to the search line.


Set the directional derivative along the search line to zero:
$\frac{d}{d\alpha} f(x_{(1)}) = 0$
$\frac{d}{d\alpha} f(x_{(1)}) = f'(x_{(1)})^T \frac{d}{d\alpha} x_{(1)} = f'(x_{(1)})^T r_{(0)} = -r_{(1)}^T r_{(0)} = 0$
$(b - A x_{(1)})^T r_{(0)} = 0$
$(b - A(x_{(0)} + \alpha r_{(0)}))^T r_{(0)} = 0$
$(b - A x_{(0)})^T r_{(0)} - \alpha (A r_{(0)})^T r_{(0)} = 0$
$(b - A x_{(0)})^T r_{(0)} = \alpha (A r_{(0)})^T r_{(0)}$
$r_{(0)}^T r_{(0)} = \alpha\, r_{(0)}^T (A r_{(0)})$
$\alpha = \frac{r_{(0)}^T r_{(0)}}{r_{(0)}^T A r_{(0)}}$
The Steepest Descent iteration:
$r_{(i)} = b - A x_{(i)}$
$\alpha_{(i)} = \frac{r_{(i)}^T r_{(i)}}{r_{(i)}^T A r_{(i)}}$
$x_{(i+1)} = x_{(i)} + \alpha_{(i)} r_{(i)}$
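A minimal Steepest Descent sketch of this iteration (the tolerance and iteration cap are assumptions, not part of the transcript):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Iterate x_(i+1) = x_(i) + alpha_(i) r_(i) with exact line search."""
    x = x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                     # residual r_(i) = b - A x_(i)
        if np.linalg.norm(r) < tol:       # close enough to the solution
            break
        alpha = (r @ r) / (r @ (A @ r))   # alpha_(i) = r^T r / r^T A r
        x = x + alpha * r
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.zeros(2)))   # approx [ 2. -2.]
```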
energy norm: $\|e\|_A = (e^T A e)^{1/2}$

condition number: $\kappa = \lambda_{max}/\lambda_{min}$, where $\lambda_{max}$ and $\lambda_{min}$ are the largest and smallest eigenvalues of matrix $A$

Convergence result for Steepest Descent:
$\|e_{(i)}\|_A \le \left(\frac{\kappa - 1}{\kappa + 1}\right)^i \|e_{(0)}\|_A$
Figure 3. Steepest Descent starts at $x_{(0)}$ and converges to $x$.
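For the example matrix, a short sketch makes the bound concrete (np.linalg.eigvalsh applies because A is symmetric):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
lams = np.linalg.eigvalsh(A)          # eigenvalues of A: [2, 7]
kappa = lams.max() / lams.min()       # condition number kappa = 3.5
factor = (kappa - 1) / (kappa + 1)    # per-step error factor ~ 0.556
print(kappa, factor)
```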

Steepest Descent often takes steps in the same direction as earlier steps.

We want to take exactly one step in each search direction.
Figure 4. The method of orthogonal directions
Choose orthogonal search directions $d_{(0)}, d_{(1)}, \dots, d_{(n-1)}$

$x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)}$
$d_{(i)}^T e_{(i+1)} = 0$
$d_{(i)}^T (e_{(i)} + \alpha_{(i)} d_{(i)}) = 0$
$\alpha_{(i)} = -\frac{d_{(i)}^T e_{(i)}}{d_{(i)}^T d_{(i)}}$
But $e_{(i)}$ is unknown; otherwise we would already have solved the problem.

Instead, choose the search directions to be A-orthogonal.
Two vectors $d_{(i)}$ and $d_{(j)}$ are A-orthogonal, or conjugate, if
$d_{(i)}^T A d_{(j)} = 0$

With this choice, $e_{(i+1)}$ will be A-orthogonal to $d_{(i)}$:
$d_{(i)}^T A e_{(i+1)} = 0$
$\alpha_{(i)} = -\frac{d_{(i)}^T A e_{(i)}}{d_{(i)}^T A d_{(i)}} = \frac{d_{(i)}^T r_{(i)}}{d_{(i)}^T A d_{(i)}}$
Unlike the previous formula, this expression can be computed, because $r_{(i)}$ is known.

We need a set of A-orthogonal search directions $d_{(i)}$.
Let $u_0, u_1, \dots, u_{n-1}$ be a set of $n$ linearly independent vectors. Gram-Schmidt conjugation builds the directions as
$d_{(0)} = u_{(0)}$
$d_{(i)} = u_{(i)} + \sum_{k=0}^{i-1} \beta_{ik} d_{(k)}$ for $i > 0$, with
$\beta_{ij} = -\frac{u_i^T A d_{(j)}}{d_{(j)}^T A d_{(j)}}$ for $i > j$
All old search vectors must be kept in memory to construct each new one.
Figure 5. Gram-Schmidt conjugation of two vectors
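A small sketch of Gram-Schmidt conjugation (the function name and the choice of starting vectors are illustrative assumptions):

```python
import numpy as np

def conjugate_directions(A, U):
    """Turn linearly independent vectors u_0..u_{n-1} into A-orthogonal
    directions d_0..d_{n-1}; note that every old d must stay in memory."""
    D = []
    for u in U:
        d = u.astype(float)
        for dk in D:
            beta = -(u @ A @ dk) / (dk @ A @ dk)   # beta_ik from above
            d = d + beta * dk
        D.append(d)
    return D

A = np.array([[3.0, 2.0], [2.0, 6.0]])
D = conjugate_directions(A, list(np.eye(2)))       # start from unit vectors
assert abs(D[0] @ A @ D[1]) < 1e-12                # d_(0)^T A d_(1) = 0
```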
CG takes $u_i = r_{(i)}$, conjugation of the residuals. Using $r_{(j+1)} = r_{(j)} - \alpha_{(j)} A d_{(j)}$:
$r_{(i)}^T r_{(j+1)} = r_{(i)}^T r_{(j)} - \alpha_{(j)} r_{(i)}^T A d_{(j)}$
$\alpha_{(j)} r_{(i)}^T A d_{(j)} = r_{(i)}^T r_{(j)} - r_{(i)}^T r_{(j+1)}$
Because the residuals are mutually orthogonal, most of these terms vanish:
$r_{(i)}^T A d_{(j)} = \begin{cases} \frac{1}{\alpha_{(i)}} r_{(i)}^T r_{(i)}, & i = j \\ -\frac{1}{\alpha_{(i-1)}} r_{(i)}^T r_{(i)}, & i = j + 1 \\ 0, & \text{otherwise} \end{cases}$
$\beta_{ij} = \begin{cases} \frac{1}{\alpha_{(i-1)}} \cdot \frac{r_{(i)}^T r_{(i)}}{d_{(i-1)}^T A d_{(i-1)}}, & i = j + 1 \\ 0, & i > j + 1 \end{cases}$
$\beta_i = \frac{r_{(i)}^T r_{(i)}}{d_{(i-1)}^T r_{(i-1)}} = \frac{r_{(i)}^T r_{(i)}}{r_{(i-1)}^T r_{(i-1)}}$
Only the previous search direction is needed, so the memory problem disappears.
The Conjugate Gradient method:
$d_{(0)} = r_{(0)} = b - A x_{(0)}$ (1)
$\alpha_{(i)} = \frac{r_{(i)}^T r_{(i)}}{d_{(i)}^T A d_{(i)}}$ (2)
$x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)}$ (3)
$r_{(i+1)} = r_{(i)} - \alpha_{(i)} A d_{(i)}$ (4)
$\beta_{(i+1)} = \frac{r_{(i+1)}^T r_{(i+1)}}{r_{(i)}^T r_{(i)}}$ (5)
$d_{(i+1)} = r_{(i+1)} + \beta_{(i+1)} d_{(i)}$ (6)
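Putting steps (1)-(6) together in a sketch (the tolerance is an assumption; in exact arithmetic CG converges in at most n iterations):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    x = x0.astype(float)
    r = b - A @ x                    # (1) d_(0) = r_(0) = b - A x_(0)
    d = r.copy()
    rr = r @ r
    for _ in range(len(b)):          # at most n steps in exact arithmetic
        if np.sqrt(rr) < tol:
            break
        Ad = A @ d
        alpha = rr / (d @ Ad)        # (2)
        x = x + alpha * d            # (3)
        r = r - alpha * Ad           # (4)
        rr_new = r @ r
        beta = rr_new / rr           # (5)
        d = r + beta * d             # (6)
        rr = rr_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # approx [ 2. -2.]
```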
Convergence result for CG:
$\|e_{(i)}\|_A \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^i \|e_{(0)}\|_A$
The factor depends on $\sqrt{\kappa}$ in CG compared to $\kappa$ in Steepest Descent, so CG has a much better convergence rate.
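As a worked comparison on the example matrix ($\kappa = 7/2 = 3.5$): the Steepest Descent factor is $(\kappa - 1)/(\kappa + 1) \approx 0.56$ per step, while the CG factor is $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1) \approx 0.30$ per step.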