Transcript ConjGradx
Luonan Wang
07/14/2015

Solving large systems of linear equations $Ax = b$. Such systems are hard to solve by Gaussian elimination.

Assumptions on A:
Square
Symmetric: $A^T = A$
Positive-definite: $x^T A x > 0$ for every nonzero $x$

Define the quadratic form
$f(x) = \frac{1}{2} x^T A x - b^T x + c$
where $A$ is a matrix, $x$ and $b$ are vectors, and $c$ is a scalar constant. $f(x)$ is minimized by the solution to $Ax = b$.

Worked example: $A = \begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix}$, $b = \begin{bmatrix} 2 \\ -8 \end{bmatrix}$, $c = 0$.

Figure 1. Graph of $f(x) = \frac{1}{2} x^T A x - b^T x + c$.

Differentiating the quadratic form,
$f'(x) = \frac{1}{2} A^T x + \frac{1}{2} A x - b.$
Since $A$ is symmetric, $f'(x) = A x - b$.

Method of Steepest Descent: starting at an initial guess $x_{(0)}$, go through a series of steps $x_{(1)}, x_{(2)}, x_{(3)}, \dots$ until we are close enough to the solution $x$.

Definitions:
error $e_{(i)} = x_{(i)} - x$: how far we are from the solution;
residual $r_{(i)} = b - A x_{(i)}$: how far we are from the correct value of $b$.

Since $r_{(i)} = -A e_{(i)} = -f'(x_{(i)})$, each step moves in the direction in which $f$ decreases most quickly:
$x_{(1)} = x_{(0)} + \alpha r_{(0)}$. How do we choose $\alpha$?

Figure 2. The gradient $f'$ is shown at several locations along the search line. On the search line, $f$ is minimized where the gradient is orthogonal to the search line.

Set the directional derivative to zero:
$\frac{d}{d\alpha} f(x_{(1)}) = f'(x_{(1)})^T \frac{d x_{(1)}}{d\alpha} = f'(x_{(1)})^T r_{(0)} = -r_{(1)}^T r_{(0)} = 0$
so $r_{(1)}^T r_{(0)} = 0$. Expanding:
$(b - A x_{(1)})^T r_{(0)} = 0$
$(b - A(x_{(0)} + \alpha r_{(0)}))^T r_{(0)} = 0$
$(b - A x_{(0)})^T r_{(0)} - \alpha (A r_{(0)})^T r_{(0)} = 0$
$(b - A x_{(0)})^T r_{(0)} = \alpha (A r_{(0)})^T r_{(0)}$
$r_{(0)}^T r_{(0)} = \alpha\, r_{(0)}^T (A r_{(0)})$
$\alpha = \frac{r_{(0)}^T r_{(0)}}{r_{(0)}^T A r_{(0)}}$

One iteration of Steepest Descent is therefore:
$r_{(i)} = b - A x_{(i)}$
$\alpha_{(i)} = \frac{r_{(i)}^T r_{(i)}}{r_{(i)}^T A r_{(i)}}$
$x_{(i+1)} = x_{(i)} + \alpha_{(i)} r_{(i)}$

Energy norm: $\|e\|_A = (e^T A e)^{1/2}$.
Condition number: $\kappa = \frac{\lambda_{max}}{\lambda_{min}}$, where $\lambda_{max}$ and $\lambda_{min}$ are the largest and smallest eigenvalues of the matrix $A$.

Convergence result for Steepest Descent:
$\|e_{(i)}\|_A \le \left(\frac{\kappa - 1}{\kappa + 1}\right)^i \|e_{(0)}\|_A$

Figure 3. Steepest Descent starts at $x_{(0)}$ and converges at $x$.

Steepest Descent often steps in the same direction as earlier steps. We would rather take exactly one step in each search direction.

Figure 4. The Method of Orthogonal Directions.

Choose mutually orthogonal search directions $d_{(0)}, d_{(1)}, \dots, d_{(n-1)}$ and take
$x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)}$
Requiring $d_{(i)}^T e_{(i+1)} = 0$, i.e. $d_{(i)}^T (e_{(i)} + \alpha_{(i)} d_{(i)}) = 0$, gives
$\alpha_{(i)} = -\frac{d_{(i)}^T e_{(i)}}{d_{(i)}^T d_{(i)}}$
But $e_{(i)}$ is unknown; if we knew it, we would already have solved the problem.

Instead, choose the search directions to be A-orthogonal. Two vectors $d_{(i)}$ and $d_{(j)}$ are A-orthogonal, or conjugate, if
$d_{(i)}^T A d_{(j)} = 0$
Require that $e_{(i+1)}$ be A-orthogonal to $d_{(i)}$, i.e. $d_{(i)}^T A e_{(i+1)} = 0$. Then
$\alpha_{(i)} = -\frac{d_{(i)}^T A e_{(i)}}{d_{(i)}^T A d_{(i)}} = \frac{d_{(i)}^T r_{(i)}}{d_{(i)}^T A d_{(i)}}$
which, unlike the previous expression, can actually be computed.

We still need a set of A-orthogonal search directions $d_{(i)}$. Gram-Schmidt conjugation: let $u_0, u_1, \dots, u_{n-1}$ be a set of $n$ linearly independent vectors and set
$d_{(0)} = u_{(0)}$
$d_{(i)} = u_{(i)} + \sum_{k=0}^{i-1} \beta_{ik} d_{(k)}$ for $i > 0$, where $\beta_{ij} = -\frac{u_{(i)}^T A d_{(j)}}{d_{(j)}^T A d_{(j)}}$, $i > j$.
The drawback is that all old search vectors must be kept in memory to construct each new one.

Figure 5. Gram-Schmidt conjugation of two vectors.

Conjugate Gradients conjugates the residuals, $u_{(i)} = r_{(i)}$. Using $r_{(j+1)} = r_{(j)} - \alpha_{(j)} A d_{(j)}$:
$r_{(i)}^T r_{(j+1)} = r_{(i)}^T r_{(j)} - \alpha_{(j)} r_{(i)}^T A d_{(j)}$
$\alpha_{(j)} r_{(i)}^T A d_{(j)} = r_{(i)}^T r_{(j)} - r_{(i)}^T r_{(j+1)}$
Because the residuals are mutually orthogonal ($r_{(i)}^T r_{(j)} = 0$ for $i \neq j$), this collapses to
$r_{(i)}^T A d_{(j)} = \begin{cases} \frac{1}{\alpha_{(i)}} r_{(i)}^T r_{(i)}, & i = j \\ -\frac{1}{\alpha_{(i-1)}} r_{(i)}^T r_{(i)}, & i = j + 1 \\ 0, & \text{otherwise} \end{cases}$
and therefore
$\beta_{ij} = \begin{cases} \frac{1}{\alpha_{(i-1)}} \frac{r_{(i)}^T r_{(i)}}{d_{(i-1)}^T A d_{(i-1)}}, & i = j + 1 \\ 0, & i > j + 1 \end{cases}$
Only one term survives, so write $\beta_{(i)} = \beta_{i,i-1}$. Substituting $\alpha_{(i-1)} = \frac{d_{(i-1)}^T r_{(i-1)}}{d_{(i-1)}^T A d_{(i-1)}}$:
$\beta_{(i)} = \frac{r_{(i)}^T r_{(i)}}{d_{(i-1)}^T r_{(i-1)}} = \frac{r_{(i)}^T r_{(i)}}{r_{(i-1)}^T r_{(i-1)}}$

Putting it all together, the method of Conjugate Gradients is:
$d_{(0)} = r_{(0)} = b - A x_{(0)}$   (1)
$\alpha_{(i)} = \frac{r_{(i)}^T r_{(i)}}{d_{(i)}^T A d_{(i)}}$   (2)
$x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)}$   (3)
$r_{(i+1)} = r_{(i)} - \alpha_{(i)} A d_{(i)}$   (4)
$\beta_{(i+1)} = \frac{r_{(i+1)}^T r_{(i+1)}}{r_{(i)}^T r_{(i)}}$   (5)
$d_{(i+1)} = r_{(i+1)} + \beta_{(i+1)} d_{(i)}$   (6)

Convergence result for Conjugate Gradients:
$\|e_{(i)}\|_A \le 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^i \|e_{(0)}\|_A$

The bound depends on $\sqrt{\kappa}$ for CG compared to $\kappa$ for Steepest Descent, so CG has a much better convergence rate.
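The quadratic form and its gradient are easy to check numerically on the worked example above. The following is a minimal sketch, assuming NumPy (not part of the notes); the names f and grad_f are illustrative. It verifies that the gradient vanishes at the solution of $Ax = b$ and that nearby points give a larger value of $f$.

```python
import numpy as np

# Worked example from the notes: A is square, symmetric, and positive-definite.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    # Quadratic form f(x) = 1/2 x^T A x - b^T x + c
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    # Gradient f'(x) = Ax - b (valid because A is symmetric)
    return A @ x - b

x_star = np.linalg.solve(A, b)   # exact solution of Ax = b, here [2, -2]
print(x_star)                    # [ 2. -2.]
print(grad_f(x_star))            # ~[0, 0]: the gradient vanishes at the minimizer
print(f(x_star) < f(x_star + np.array([0.5, 0.5])))  # True: nearby points have larger f
```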
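A minimal sketch of the Steepest Descent iteration derived above, again assuming NumPy; the function name steepest_descent and the stopping parameters tol and max_iter are illustrative choices, not part of the notes.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    # Method of Steepest Descent for symmetric positive-definite A.
    # Each step moves along the residual r = b - Ax with the exact
    # line-search step length alpha = (r^T r) / (r^T A r).
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                # residual, equal to -f'(x)
        rr = r @ r
        if np.sqrt(rr) < tol:        # close enough to the solution
            break
        alpha = rr / (r @ A @ r)     # exact minimizer along the search line
        x = x + alpha * r
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.zeros(2)))   # converges to [2, -2]
```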
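A minimal sketch of equations (1)-(6), assuming NumPy; the name conjugate_gradient and its stopping parameters are illustrative. In exact arithmetic CG takes exactly one step per A-orthogonal direction and so finishes in at most n steps for an n-by-n system.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    # Conjugate Gradient method, following equations (1)-(6) in the notes.
    n = b.size
    if max_iter is None:
        max_iter = n                 # at most n steps in exact arithmetic
    x = np.array(x0, dtype=float)
    r = b - A @ x                    # (1) initial residual
    d = r.copy()                     # (1) first search direction
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        Ad = A @ d
        alpha = rr / (d @ Ad)        # (2) step length
        x = x + alpha * d            # (3) update the iterate
        r = r - alpha * Ad           # (4) update the residual
        rr_new = r @ r
        beta = rr_new / rr           # (5) Gram-Schmidt coefficient
        d = r + beta * d             # (6) next A-orthogonal search direction
        rr = rr_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b, np.zeros(2)))  # [2, -2] after at most 2 steps
```

Note that each iteration needs only one matrix-vector product (A @ d) plus a few vector operations, and no old search directions are stored, in contrast to general Gram-Schmidt conjugation.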