Nonlinear Optimization - The Systems Realization Laboratory


What you can do for one variable,
you can do for many (in principle)
Method of Steepest Descent
The method of steepest descent (also known as the gradient method) is the
simplest example of a gradient-based method for minimizing a function of
several variables.
Its core is the following recursion formula:
xk+1 = xk ± αk ∇F(xk)

Remember: Direction = dk = S(k) = -∇F(x(k))  (the minus sign for minimization)

where
  xk, xk+1 = the values of the variables in the k and k+1 iteration
  F(x)     = the objective function to be minimized (or maximized)
  ∇F       = the gradient of the objective function, constituting the direction of travel
  αk       = the size of the step in the direction of travel
Refer to Section 3.5 for Algorithm and Stopping Criteria
Advantage: Simple
Disadvantage: Seldom converges reliably.
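To make the recursion concrete, here is a minimal Python sketch of steepest descent (not part of the original slides). The quadratic test function, the fixed step size alpha, and the gradient-norm stopping test are illustrative assumptions; see Section 3.5 for the actual algorithm and stopping criteria.

```python
import numpy as np

def steepest_descent(grad_F, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Minimal sketch of x_{k+1} = x_k - alpha_k * grad F(x_k).
    Here alpha is held fixed; in practice it is chosen by a line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_F(x)                 # gradient at the current point
        if np.linalg.norm(g) < tol:   # stop when the gradient is (nearly) zero
            break
        x = x - alpha * g             # step in the negative gradient direction
    return x

# Example: minimize F(x) = x1^2 + 4*x2^2, whose gradient is (2*x1, 8*x2)
x_star = steepest_descent(lambda x: np.array([2*x[0], 8*x[1]]), x0=[3.0, -2.0])
print(x_star)  # approaches the minimizer (0, 0)
```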
Newton's Method (multi-variable case)
How to extend Newton's method to the multivariable case?

xk+1 = xk - y'(xk) / y''(xk)

Is this correct? No. Why?

Start again with the Taylor expansion:

y(x) = y(xk) + ∇y(xk)T (x - xk) + ½ (x - xk)T H(xk) (x - xk)

(The remainder is dropped. Significance? See Sec. 1.4.)

Note that H is the Hessian, containing the second-order derivatives.

xk+1 = xk - ∇y(xk) / H(xk)

Is this correct? Not yet. Why?

Newton's method for finding an extreme point is

xk+1 = xk - H-1(xk) ∇y(xk)

Don't confuse H-1 with α.
Like the Steepest Descent Method, Newton's method searches in the negative
gradient direction (here scaled by the inverse Hessian).
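As an illustration, here is a minimal Python sketch of the multivariable Newton iteration xk+1 = xk - H-1(xk) ∇y(xk); the quadratic test function and its hand-coded gradient and Hessian are assumptions chosen for the example. Note that the code solves the Hessian system rather than forming the inverse explicitly.

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=50):
    """Sketch of Newton's method: x_{k+1} = x_k - H^{-1}(x_k) * grad_y(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve H(x) * step = g instead of explicitly inverting the Hessian
        step = np.linalg.solve(hess(x), g)
        x = x - step
    return x

# Example: y(x) = x1^2 + 4*x2^2 + x1*x2 (a quadratic, so Newton converges in one step)
grad = lambda x: np.array([2*x[0] + x[1], 8*x[1] + x[0]])
hess = lambda x: np.array([[2.0, 1.0], [1.0, 8.0]])
print(newton_method(grad, hess, x0=[5.0, -3.0]))  # ~ (0, 0)
```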
Properties of Newton's Method
Good properties (fast convergence) if started near the solution.
However, it needs modifications if started far away from the solution.
Also, the (inverse) Hessian is expensive to calculate.
To overcome this, several modifications are often made.

• One of them is to add a search parameter α in front of the Hessian
  (similar to steepest descent). This is often referred to as the modified
  Newton's method.
• Other modifications focus on enhancing the properties of the combination of
  first- and second-order gradient information.
• Quasi-Newton methods build up curvature information by observing the behavior
  of the objective function and its first-order gradient. This information is
  used to generate an approximation of the Hessian (a sketch of one such update
  follows below).
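As a sketch of how a quasi-Newton method builds up curvature information, the snippet below applies one BFGS correction to a Hessian approximation B. The BFGS formula is used here purely as an example; the slide does not name a specific quasi-Newton update, and the test function and points are made up.

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS correction of the Hessian approximation B, built only from the
    observed step s = x_{k+1} - x_k and gradient change y = g_{k+1} - g_k."""
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

# Example: one update starting from B = I on y(x) = x1^2 + 4*x2^2.
# The updated B reproduces the observed gradient change (secant condition: B_new @ s == y).
grad = lambda x: np.array([2*x[0], 8*x[1]])
x_old, x_new = np.array([3.0, -2.0]), np.array([2.0, -1.0])
s, y = x_new - x_old, grad(x_new) - grad(x_old)
B = bfgs_update(np.eye(2), s, y)
print(B, B @ s, y)
```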
Conjugate Directions Method
Conjugate direction methods can be regarded as somewhat in
between steepest descent and Newton's method, having
the positive features of both of them.
Motivation: Desire to accelerate slow convergence of steepest
descent, but avoid expensive evaluation, storage, and
inversion of Hessian.
Application: Conjugate direction methods are invariably developed and analyzed
for the quadratic problem:

Minimize: (½) xTQx - bTx

Note: The condition for optimality is ∇y = Qx - b = 0, or Qx = b (a linear equation)
Note: Textbook uses “A” instead of “Q”.
Basic Principle
Definition: Given a symmetric matrix Q, two vectors d1 and d2 are said to be
Q-orthogonal, or Q-conjugate (with respect to Q), if d1TQd2 = 0.
Note that orthogonal vectors (d1Td2 = 0) are a special case of conjugate
vectors.
So, since the vectors di are independent, the solution to the n×n quadratic
problem can be written as

x* = α0 d0 + ... + αn-1 dn-1

Multiplying by Q and taking the scalar product with di, you can express αi in
terms of di, Q, and either x* or b:

αi = diT Q x* / (diT Q di) = diT b / (diT Q di)
Note that A is used instead of Q in your textbook
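A small numerical sketch of this expansion (the 2×2 Q, b, and the Q-conjugate directions below are made-up illustrative values): each αi is obtained from a scalar product with di, and x* is assembled without ever inverting Q.

```python
import numpy as np

# Illustrative 2x2 quadratic problem: minimize (1/2) x^T Q x - b^T x
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Two directions chosen to be Q-conjugate: d0^T Q d1 = 0
d0 = np.array([1.0, 0.0])
d1 = np.array([-1.0, 4.0])
assert abs(d0 @ Q @ d1) < 1e-12

# alpha_i = (d_i^T b) / (d_i^T Q d_i), from taking scalar products with Q x* = b
alphas = [(d @ b) / (d @ Q @ d) for d in (d0, d1)]
x_star = alphas[0] * d0 + alphas[1] * d1

print(x_star)                 # [0.0909..., 0.6363...]
print(np.linalg.solve(Q, b))  # same answer, confirming x* solves Q x = b
```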
Conjugate Gradient Method
The conjugate gradient method is the conjugate direction
method that is obtained by selecting the successive direction
vectors as a conjugate version of the successive gradients
obtained as the method progresses.
Search direction at iteration k:

dk = -gk + Σ(i=0 to k-1) βi di     or     dk+1 = -gk+1 + βk dk
You generate the conjugate directions as you go along.
Three advantages:
1) Gradient is always nonzero and linearly independent of all previous
direction vectors.
2) Simple formula to determine the new direction. Only slightly more
complicated than steepest descent.
3) Process makes good progress because it is based on gradients.
“Pure” Conjugate Gradient Method (Quadratic Case)
0 - Starting at any x0 define d0 = -g0 = b - Qx0, where gk is the gradient
    (column vector) of the objective function at the point xk
1 - Using dk, calculate the new point xk+1 = xk + αk dk, where

        αk = - (gkT dk) / (dkT Q dk)

    Note that αk is calculated directly (rather than found by a line search).

2 - Calculate the new conjugate gradient direction dk+1, according to:

        dk+1 = -gk+1 + βk dk

    where

        βk = (gk+1T Q dk) / (dkT Q dk)
This is slightly different from the formulation in your current textbook.
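The steps above translate directly into code. Below is a minimal Python sketch of the "pure" conjugate gradient iteration for the quadratic case; the 2×2 test problem is an assumption chosen for illustration. In exact arithmetic the loop finishes in at most n iterations for an n×n problem.

```python
import numpy as np

def conjugate_gradient(Q, b, x0, tol=1e-10):
    """'Pure' CG sketch for minimizing (1/2) x^T Q x - b^T x (i.e. solving Q x = b)."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b                      # gradient g_k at x_k
    d = -g                             # step 0: d_0 = -g_0 = b - Q x_0
    for _ in range(len(b)):            # at most n iterations in exact arithmetic
        if np.linalg.norm(g) < tol:
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)    # alpha_k = -(g_k^T d_k) / (d_k^T Q d_k)
        x = x + alpha * d              # step 1: the new point
        g = Q @ x - b                  # gradient at the new point, g_{k+1}
        beta = (g @ Qd) / (d @ Qd)     # beta_k = (g_{k+1}^T Q d_k) / (d_k^T Q d_k)
        d = -g + beta * d              # step 2: the new conjugate direction
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(Q, b, x0=[2.0, 1.0]))  # ~ [0.0909, 0.6364]
print(np.linalg.solve(Q, b))                    # same answer
```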
Non-Quadratic Conjugate Gradient Methods
• For non-quadratic cases, you have the problem that you do not know Q, and you
  would have to make an approximation.
• One approach is to substitute the Hessian H(xk) for Q.
  – The problem is that the Hessian has to be evaluated at each point.
• Other approaches avoid Q completely by using line searches.
  – Examples: the Fletcher-Reeves and Polak-Ribière methods
• Differences between the methods:
  – αk is found through a line search
  – different formulas for calculating βk than the "pure" Conjugate Gradient
    algorithm
Polak-Ribière & Fletcher-Reeves Methods for Minimizing f(x)
0 - Starting at any x0 define d0 = -g0, where g is the gradient (column vector)
    of the objective function at the point x

1 - Using dk, find the new point xk+1 = xk + αk dk, where αk is found using a
    line search that minimizes f(xk + αk dk)

2 - Calculate the new conjugate gradient direction dk+1, according to:

        dk+1 = -gk+1 + βk dk

    where βk can vary depending on which (update) formula you use.
Fletcher-Reeves:    βk = (gk+1T gk+1) / (gkT gk)

Polak-Ribière:      βk = (gk+1T (gk+1 - gk)) / (gkT gk)
Note: gk+1 is the gradient of the objective function at point xk+1
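Here is a Python sketch of this recipe for a general (non-quadratic) f, showing both βk update formulas. The test function, the crude backtracking rule standing in for the line search of step 1, and the steepest-descent restart safeguard are all illustrative assumptions, not part of the slides.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, variant="FR", tol=1e-5, max_iter=500):
    """Fletcher-Reeves / Polak-Ribiere sketch with a crude backtracking line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # step 0: d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                       # safeguard: restart if d is not a descent direction
            d = -g
        # Step 1: backtracking stand-in for the line search minimizing f(x + alpha*d)
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        # Step 2: beta_k from the chosen update formula
        if variant == "FR":                  # Fletcher-Reeves
            beta = (g_new @ g_new) / (g @ g)
        else:                                # Polak-Ribiere
            beta = (g_new @ (g_new - g)) / (g @ g)
        d = -g_new + beta * d                # new conjugate gradient direction
        x, g = x_new, g_new
    return x

# Illustrative smooth, non-quadratic test function
f    = lambda x: x[0]**2 + 4*x[1]**2 + 0.1*x[0]**4
grad = lambda x: np.array([2*x[0] + 0.4*x[0]**3, 8*x[1]])
print(nonlinear_cg(f, grad, x0=[2.0, -1.5], variant="FR"))  # ~ (0, 0)
print(nonlinear_cg(f, grad, x0=[2.0, -1.5], variant="PR"))  # ~ (0, 0)
```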
Fletcher-Reeves Method for Minimizing f(x)
0 - Starting at any x0 define d0 = -g0, where g is the gradient (column vector)
    of the objective function at the point x

1 - Using dk, find the new point xk+1 = xk + αk dk, where αk is found using a
    line search that minimizes f(xk + αk dk)

2 - Calculate the new conjugate gradient direction dk+1, according to:

        dk+1 = -gk+1 + βk dk
    where

        βk = (gk+1T gk+1) / (gkT gk)
See also Example 3.9 (page 73) in your textbook.
Conjugate Gradient Method Advantages
The simple formulae for updating the direction vector are attractive.
The method is only slightly more complicated than steepest descent, but it
converges faster.
See 'em in action!
For animations of all of the preceding search techniques, check out:
http://www.esm.vt.edu/~zgurdal/COURSES/4084/4084-Docs/Animation.html