Transcript ConjGradx
Luonan Wang
07/14/2015
Ax = b
Solving large systems of linear equations
Hard to solve using Gaussian elimination.
A: square
symmetric: A^T = A
positive-definite: x^T A x > 0 for every nonzero x
f(x) = (1/2) x^T A x − b^T x + c
A: matrix
x, b: vectors
c: scalar constant
f(x) is minimized by the solution to Ax = b
A = [3 2; 2 6],   b = [2; −8],   c = 0
Figure 1. Graph of f(x) = (1/2) x^T A x − b^T x + c
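A minimal NumPy sketch of this example, assuming the matrix and vector above; the perturbation check is just an illustration that the solution of Ax = b minimizes f:

```python
import numpy as np

# Example system from above; A is symmetric positive-definite.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c

x_star = np.linalg.solve(A, b)    # exact solution of Ax = b
print("x* =", x_star)             # [ 2. -2.]

# f is strictly larger at nearby points, so x* is the minimizer.
rng = np.random.default_rng(0)
for _ in range(3):
    assert f(x_star + 0.1 * rng.standard_normal(2)) > f(x_star)
```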
f(x) = (1/2) x^T A x − b^T x + c
f'(x) = (1/2) A^T x + (1/2) A x − b
Since A is symmetric, we have
f'(x) = A x − b
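A small sanity check of this gradient on the same 2×2 example (a sketch; the test point and step size h are arbitrary choices):

```python
import numpy as np

# Same example matrix and vector as above.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_analytic(x):
    return A @ x - b              # f'(x) = Ax - b for symmetric A

def grad_fd(x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    g = np.zeros_like(x)
    for k in range(len(x)):
        step = np.zeros_like(x)
        step[k] = h
        g[k] = (f(x + step) - f(x - step)) / (2 * h)
    return g

x = np.array([1.0, 3.0])          # arbitrary test point
assert np.allclose(grad_analytic(x), grad_fd(x), atol=1e-4)
```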
Starting at an initial guess x_(0), go through a series of steps
x_(1), x_(2), x_(3), ... until we are close enough to the solution x
Definitions
Error e_(i) = x_(i) − x, how far we are from the solution
Residual r_(i) = b − A x_(i), how far we are from the correct value of b
r_(i) = −A e_(i) = −f'(x_(i)); at each step, choose the direction in which f decreases most quickly
x_(1) = x_(0) + α r_(0); how do we choose α?
Figure 2. Gradient 𝑓 ′ is shown at several locations along the search line. On the search line, 𝑓 is minimized
where the gradient is orthogonal to the search line
d/dα f(x_(1)) = f'(x_(1))^T (d x_(1)/dα) = f'(x_(1))^T r_(0) = −r_(1)^T r_(0)
Setting d/dα f(x_(1)) = 0 gives r_(1)^T r_(0) = 0:
(b − A x_(1))^T r_(0) = 0
(b − A(x_(0) + α r_(0)))^T r_(0) = 0
(b − A x_(0))^T r_(0) − α (A r_(0))^T r_(0) = 0
(b − A x_(0))^T r_(0) = α (A r_(0))^T r_(0)
r_(0)^T r_(0) = α r_(0)^T (A r_(0))
α = (r_(0)^T r_(0)) / (r_(0)^T A r_(0))
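The same derivation can be checked numerically; a minimal sketch on the 2×2 example (the starting point x0 is an arbitrary choice):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])

x0 = np.array([-2.0, -2.0])           # arbitrary starting guess
r0 = b - A @ x0                       # residual, the direction of steepest descent
alpha = (r0 @ r0) / (r0 @ A @ r0)     # step length from the derivation above
x1 = x0 + alpha * r0
r1 = b - A @ x1

# The new residual is orthogonal to the previous search direction.
assert abs(r1 @ r0) < 1e-10
```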
r_(i) = b − A x_(i)
α_(i) = (r_(i)^T r_(i)) / (r_(i)^T A r_(i))
𝑥(𝑖+1) = 𝑥(𝑖) + 𝛼(𝑖) 𝑟(𝑖)
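Putting these three update formulas together gives the Steepest Descent iteration; a minimal sketch (the tolerance, iteration cap, and starting point are arbitrary choices):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Steepest Descent for Ax = b with A symmetric positive-definite."""
    x = x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                   # r_(i) = b - A x_(i)
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ A @ r)   # optimal step length along r_(i)
        x = x + alpha * r               # x_(i+1) = x_(i) + alpha_(i) r_(i)
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.array([-2.0, -2.0])))   # approximately [ 2. -2.]
```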
Energy norm: ‖e‖_A = (e^T A e)^(1/2)
Condition number of A: κ = λ_max / λ_min, where λ_max and λ_min are the largest and smallest eigenvalues of A
Convergence results
‖e_(i)‖_A ≤ ((κ − 1)/(κ + 1))^i ‖e_(0)‖_A
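These quantities are easy to compute for the example matrix; a sketch (energy_norm is just a helper name used here):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])

def energy_norm(e, A):
    """||e||_A = (e^T A e)^(1/2)."""
    return np.sqrt(e @ A @ e)

eigvals = np.linalg.eigvalsh(A)                          # eigenvalues of the symmetric matrix A
kappa = eigvals.max() / eigvals.min()                    # condition number
print("kappa =", kappa)                                  # 3.5 for this example
print("per-step factor <=", (kappa - 1) / (kappa + 1))   # ~0.556
print(energy_norm(np.array([1.0, 1.0]), A))              # example energy norm
```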
Figure 3. Steepest Descent starts at x_(0) and converges to x
Steepest Descent often takes steps in the same direction as earlier steps.
We want to take exactly one step in each search direction.
Figure 4. The method of orthogonal directions
Choose orthogonal search directions
𝑑(0) , 𝑑(1) , … , 𝑑(𝑛−1)
𝑥(𝑖+1) = 𝑥(𝑖) + 𝛼(𝑖) 𝑑(𝑖)
d_(i)^T e_(i+1) = 0
d_(i)^T (e_(i) + α_(i) d_(i)) = 0
α_(i) = − (d_(i)^T e_(i)) / (d_(i)^T d_(i))
𝑒(𝑖) is unknown, otherwise we would have solved the problem
Choose search directions to be A-orthogonal
Two vectors d_(i) and d_(j) are A-orthogonal, or conjugate, if
d_(i)^T A d_(j) = 0
𝑒(𝑖+1) will be A-orthogonal to 𝑑(𝑖)
d_(i)^T A e_(i+1) = 0
α_(i) = − (d_(i)^T A e_(i)) / (d_(i)^T A d_(i)) = (d_(i)^T r_(i)) / (d_(i)^T A d_(i))
We need a set of A-orthogonal search directions 𝑑(𝑖)
𝑢0 , 𝑢1 , … , 𝑢𝑛−1 are a set of n linearly independent vectors
𝑑(0) = 𝑢(0)
d_(i) = u_(i) + Σ_{k=0}^{i−1} β_(ik) d_(k),   for i > 0
β_(ij) = − (u_(i)^T A d_(j)) / (d_(j)^T A d_(j)),   i > j
All old search vectors must be kept in memory to construct each new one
Figure 5. Gram-Schmidt conjugation of two vectors
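A sketch of this conjugation procedure (the starting vectors u_i are the coordinate axes here, an arbitrary choice), checking that the resulting directions are A-orthogonal:

```python
import numpy as np

def conjugate_gram_schmidt(U, A):
    """Turn linearly independent columns of U into A-orthogonal directions."""
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for k in range(i):
            # beta_(ik) = -(u_(i)^T A d_(k)) / (d_(k)^T A d_(k))
            beta = -(U[:, i] @ A @ D[:, k]) / (D[:, k] @ A @ D[:, k])
            d = d + beta * D[:, k]
        D[:, i] = d
    return D

A = np.array([[3.0, 2.0], [2.0, 6.0]])
U = np.eye(2)                  # u_0, u_1: the coordinate axes
D = conjugate_gram_schmidt(U, A)
# Every pair of distinct directions is A-orthogonal: d_i^T A d_j = 0.
assert abs(D[:, 0] @ A @ D[:, 1]) < 1e-12
```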
Taking the inner product of r_(i) with r_(j+1) = r_(j) − α_(j) A d_(j):
r_(i)^T r_(j+1) = r_(i)^T r_(j) − α_(j) r_(i)^T A d_(j)
α_(j) r_(i)^T A d_(j) = r_(i)^T r_(j) − r_(i)^T r_(j+1)
Since the residuals are mutually orthogonal (r_(i)^T r_(j) = 0 for i ≠ j):
r_(i)^T A d_(j) =  (1/α_(i)) r_(i)^T r_(i),       i = j
                  −(1/α_(i−1)) r_(i)^T r_(i),     i = j + 1
                   0,                             otherwise
β_(ij) =  (1/α_(i−1)) (r_(i)^T r_(i)) / (d_(i−1)^T A d_(i−1)),   i = j + 1
          0,                                                     i > j + 1
So only one term survives; writing β_(i) for β_(i, i−1):
β_(i) = (r_(i)^T r_(i)) / (d_(i−1)^T r_(i−1)) = (r_(i)^T r_(i)) / (r_(i−1)^T r_(i−1))
d_(0) = r_(0) = b − A x_(0)                          (1)
α_(i) = (r_(i)^T r_(i)) / (d_(i)^T A d_(i))          (2)
x_(i+1) = x_(i) + α_(i) d_(i)                        (3)
r_(i+1) = r_(i) − α_(i) A d_(i)                      (4)
β_(i+1) = (r_(i+1)^T r_(i+1)) / (r_(i)^T r_(i))      (5)
d_(i+1) = r_(i+1) + β_(i+1) d_(i)                    (6)
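A minimal sketch of these six steps as a routine (the tolerance and starting point are arbitrary choices; in exact arithmetic CG converges in at most n steps):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Conjugate Gradient for Ax = b with A symmetric positive-definite."""
    x = x0.astype(float)
    r = b - A @ x                              # (1) d_(0) = r_(0) = b - A x_(0)
    d = r.copy()
    for _ in range(len(b)):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)             # (2)
        x = x + alpha * d                      # (3)
        r_new = r - alpha * Ad                 # (4)
        beta = (r_new @ r_new) / (r @ r)       # (5)
        d = r_new + beta * d                   # (6)
        r = r_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # converges to [ 2. -2.] in 2 steps
```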
‖e_(i)‖_A ≤ 2 ((√κ − 1)/(√κ + 1))^i ‖e_(0)‖_A
√κ appears in the CG bound, compared to κ for Steepest Descent,
so CG has a much better convergence rate.
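For the example matrix the two per-step factors can be compared directly (a sketch; only the bound factors are computed):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
eigvals = np.linalg.eigvalsh(A)
kappa = eigvals.max() / eigvals.min()

sd_factor = (kappa - 1) / (kappa + 1)                     # Steepest Descent bound factor
cg_factor = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # Conjugate Gradient bound factor
print(f"kappa = {kappa:.3f}  SD: {sd_factor:.3f}  CG: {cg_factor:.3f}")
# kappa = 3.500  SD: 0.556  CG: 0.303
```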