Computational Optimization
Mathematical Programming
Fundamentals
Line Segment
Let xRn and yRn, the points on the
line segment joining x and y are
{ z | z = x+(1- )y, 0   1 }.
y
x
Convex Sets
A set S is convex if the line segment joining any two points in the set is also in the set, i.e., for any x, y ∈ S,
λx + (1 - λ)y ∈ S for all 0 ≤ λ ≤ 1.
[Figure: examples of convex and nonconvex sets.]
Convex Functions
A function f is (strictly) convex on a convex set S if and only if for any x, y ∈ S,
f(λx + (1 - λ)y) ≤ (<) λf(x) + (1 - λ)f(y)
for all 0 ≤ λ ≤ 1 (with 0 < λ < 1 and x ≠ y for the strict inequality).
[Figure: for a convex f, the value f(λx + (1 - λ)y) lies below the chord value λf(x) + (1 - λ)f(y).]
Concave Functions
A function f is (strictly) concave on a convex set S if and only if -f is (strictly) convex on S.
[Figure: a concave f and its reflection -f.]
(Strictly) Convex, Concave, or None of the Above?
[Figure: five example functions, labeled in order: none of the above, concave, concave, convex, strictly convex.]
Convexity of the objective function affects the optimization algorithm.
Convexity of the constraint set affects the optimization algorithm.
min f(x) subject to x ∈ S
[Figure: the steepest-descent direction when S is convex versus when S is not convex.]
Convex Program
min f(x) subject to x ∈ S, where f and S are convex.
Convexity makes optimization nice.
Many practical problems are convex programs.
Convex programs are also used as subproblems when solving nonconvex programs.
Theorem 2.1: Global Solution of a Convex Program
If x* is a local minimizer of a convex programming problem, then x* is also a global minimizer. Furthermore, if the objective is strictly convex, then x* is the unique global minimizer.
Proof: In text.
Problems with a nonconvex objective
min f(x) subject to x ∈ [a, b]
If f is strictly convex, the problem has a unique global minimum.
If f is not convex, the problem may have several local minima.
[Figure: a strictly convex f with a single minimizer x* in [a, b], versus a nonconvex f with local minima x* and x'.]
Problems with nonconvex set
min f(x) subject to x ∈ [a, b] ∪ [c, d]
[Figure: a feasible region made of two disjoint intervals, with local minima x' and x* in different intervals.]
Multivariate Calculus
For x Rn, f(x)=f(x1, x2 , x3 , x4 ,…, xn)
The gradient of f:
 f ( x) f ( x)
f ( x) 
f ( x )  
,
,...,


x

x

x
1
2
n 

The Hessian of f:
  2 f ( x)

 x1x1
  2 f ( x)

 2 f ( x)   x2 x1


  2 f ( x)

 xn x1
 2 f ( x)
 2 f ( x) 
...

x1x2
x1xn 

 2 f ( x)
...

x2 x2



2
2
 f ( x)
 f ( x) 
...

xn x2
xn xn 
For example:
f(x) = x₁² + 3x₂⁴ + e^(3x₁) + 4x₁x₂
∇f(x) = [ 2x₁ + 3e^(3x₁) + 4x₂ ;  12x₂³ + 4x₁ ]
∇²f(x) =
[ 2 + 9e^(3x₁)    4
  4               36x₂² ]
At x = [0, 1]':
∇f(x) = [ 7 ;  12 ]
∇²f(x) =
[ 11   4
   4  36 ]
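A quick way to sanity-check hand-computed derivatives like these is a finite-difference comparison. The sketch below is only an illustrative check (assuming NumPy is available): it evaluates the gradient and Hessian formulas above at x = [0, 1]' and compares them with central-difference estimates.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**2 + 3*x2**4 + np.exp(3*x1) + 4*x1*x2

def grad(x):
    x1, x2 = x
    return np.array([2*x1 + 3*np.exp(3*x1) + 4*x2,
                     12*x2**3 + 4*x1])

def hess(x):
    x1, x2 = x
    return np.array([[2 + 9*np.exp(3*x1), 4.0],
                     [4.0, 36*x2**2]])

x = np.array([0.0, 1.0])
h = 1e-5
# central-difference gradient of f
fd_grad = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(2)])
# central-difference Hessian (differentiate the analytic gradient)
fd_hess = np.array([(grad(x + h*e) - grad(x - h*e)) / (2*h) for e in np.eye(2)]).T

print(grad(x))                                    # [ 7. 12.]
print(hess(x))                                    # [[11.  4.] [ 4. 36.]]
print(np.allclose(fd_grad, grad(x), atol=1e-4))   # True
print(np.allclose(fd_hess, hess(x), atol=1e-4))   # True
```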
Quadratic Functions
Form:
f(x) = ½ x'Qx + b'x,   where x ∈ Rⁿ, Q ∈ Rⁿˣⁿ, b ∈ Rⁿ
Gradient:
∂f(x)/∂xₖ = ∂/∂xₖ [ ½ Σᵢ Σⱼ Qᵢⱼ xᵢ xⱼ + Σⱼ bⱼ xⱼ ]
= Qₖₖ xₖ + ½ Σ_{i≠k} Qᵢₖ xᵢ + ½ Σ_{j≠k} Qₖⱼ xⱼ + bₖ
= Σⱼ Qₖⱼ xⱼ + bₖ   (assuming Q is symmetric)
so ∇f(x) = Qx + b
Hessian:
∇²f(x) = Q
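To see the ∇f(x) = Qx + b formula in action, here is a small numerical sketch (assuming NumPy, with an arbitrary symmetric Q and vector b chosen only for illustration) that compares the analytic gradient of the quadratic with a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A + A.T                        # symmetric Q for the example
b = rng.standard_normal(3)

f = lambda x: 0.5 * x @ Q @ x + b @ x
grad = lambda x: Q @ x + b         # analytic gradient of the quadratic

x = rng.standard_normal(3)
h = 1e-6
fd = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(3)])
print(np.allclose(fd, grad(x), atol=1e-5))   # True
```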
Taylor Series Expansion about
x* - 1D Case
Let x = x* + p. Then
f(x) = f(x* + p) = f(x*) + p f'(x*) + (1/2!) p² f''(x*) + (1/3!) p³ f'''(x*) + … + (1/n!) pⁿ f⁽ⁿ⁾(x*) + …
Equivalently,
f(x) = f(x*) + (x - x*) f'(x*) + (1/2!) (x - x*)² f''(x*) + (1/3!) (x - x*)³ f'''(x*) + … + (1/n!) (x - x*)ⁿ f⁽ⁿ⁾(x*) + …
Taylor Series Example
Let f(x) = exp(-x); compute the Taylor series expansion about x* = 0.
f(x) = f(x*) + (x - x*) f'(x*) + (1/2!) (x - x*)² f''(x*) + (1/3!) (x - x*)³ f'''(x*) + … + (1/n!) (x - x*)ⁿ f⁽ⁿ⁾(x*) + …
= e^(-x*) - x e^(-x*) + (x²/2!) e^(-x*) - (x³/3!) e^(-x*) + … + (-1)ⁿ (xⁿ/n!) e^(-x*) + …
= 1 - x + x²/2! - x³/3! + … + (-1)ⁿ xⁿ/n! + …
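The truncated series can be checked numerically. The sketch below (plain Python, no extra libraries) sums the first n + 1 terms of the expansion about x* = 0 and compares the result with exp(-x).

```python
import math

def taylor_exp_neg(x, n):
    """Sum_{k=0}^{n} (-1)^k x^k / k!, the degree-n Taylor polynomial of exp(-x) about 0."""
    return sum((-1)**k * x**k / math.factorial(k) for k in range(n + 1))

x = 0.5
for n in (1, 2, 4, 8):
    print(n, taylor_exp_neg(x, n), math.exp(-x))
# the partial sums approach exp(-0.5) = 0.6065... as n grows
```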
First Order Taylor Series Approximation
Let x = x* + p. Then
f(x) = f(x* + p) = f(x*) + p'∇f(x*) + ‖p‖ α(x*, p),   where lim_{p→0} α(x*, p) = 0.
This says that a linear approximation of a function works well locally:
f(x) = f(x* + p) ≈ f(x*) + p'∇f(x*)
[Figure: f(x) and its linear (tangent) approximation at x*.]
Second Order Taylor Series Approximation
Let x = x* + p. Then
f(x) = f(x* + p) = f(x*) + p'∇f(x*) + ½ p'∇²f(x*)p + ‖p‖² α(x*, p),   where lim_{p→0} α(x*, p) = 0.
This says that a quadratic approximation of a function works even better locally:
f(x) = f(x* + p) ≈ f(x*) + p'∇f(x*) + ½ p'∇²f(x*)p
[Figure: f(x) and its quadratic approximation at x*.]
Taylor Series Approximation
Exercise
Consider the function f(x₁, x₂) = x₁³ + 5x₁²x₂ + 7x₁x₂² + 2x₂² and x* = [-2, 3]'.
Compute the gradient and Hessian.
What is the first order TSA about x*?
What is the second order TSA about x*?
Evaluate both TSAs at y = [-1.9, 3.2]' and compare with f(y).
Exercise
f(x₁, x₂) = x₁³ + 5x₁²x₂ + 7x₁x₂² + 2x₂²
function:  f(x*) = ___
gradient:  ∇f(x) = [ ___ ;  ___ ],   ∇f(x*) = [ ___, ___ ]'
Hessian:   ∇²f(x) = [ ___  ___ ;  ___  ___ ],   ∇²f(x*) = [ ___  ___ ;  ___  ___ ]
First order TSA:   g(x) = f(x*) + (x - x*)'∇f(x*)
Second order TSA:  h(x) = f(x*) + (x - x*)'∇f(x*) + ½ (x - x*)'∇²f(x*)(x - x*)
|f(y) - g(y)| = ___     |f(y) - h(y)| = ___
Exercise
f(x₁, x₂) = x₁³ + 5x₁²x₂ + 7x₁x₂² + 2x₂²
function:  f(x*) = -56
gradient:  ∇f(x) = [ 3x₁² + 10x₁x₂ + 7x₂² ;  5x₁² + 14x₁x₂ + 4x₂ ],   ∇f(x*) = [15, -52]'
Hessian:   ∇²f(x) = [ 6x₁ + 10x₂   10x₁ + 14x₂ ;  10x₁ + 14x₂   14x₁ + 4 ],   ∇²f(x*) = [ 18  22 ;  22  -24 ]
Exercise
First order TSA:   g(x) = f(x*) + (x - x*)'∇f(x*);   g(y) = -64.9
Second order TSA:  h(x) = f(x*) + (x - x*)'∇f(x*) + ½ (x - x*)'∇²f(x*)(x - x*);   h(y) = -64.85
|f(y) - g(y)| = |-64.811 - (-64.9)| = 0.089
|f(y) - h(y)| = |-64.811 - (-64.85)| = 0.039
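These numbers are easy to reproduce numerically. The sketch below (assuming NumPy) evaluates f, the gradient, and the Hessian at x* = [-2, 3]' and forms the first- and second-order approximations at y = [-1.9, 3.2]'.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**3 + 5*x1**2*x2 + 7*x1*x2**2 + 2*x2**2

def grad(x):
    x1, x2 = x
    return np.array([3*x1**2 + 10*x1*x2 + 7*x2**2,
                     5*x1**2 + 14*x1*x2 + 4*x2])

def hess(x):
    x1, x2 = x
    return np.array([[6*x1 + 10*x2, 10*x1 + 14*x2],
                     [10*x1 + 14*x2, 14*x1 + 4]])

xs = np.array([-2.0, 3.0])
y  = np.array([-1.9, 3.2])
d  = y - xs

g = f(xs) + d @ grad(xs)              # first-order TSA evaluated at y
h = g + 0.5 * d @ hess(xs) @ d        # second-order TSA evaluated at y
print(f(xs), grad(xs), hess(xs))      # -56, [15 -52], [[18 22],[22 -24]]
print(f(y), g, h)                     # -64.811, -64.9, -64.85
print(abs(f(y) - g), abs(f(y) - h))   # 0.089, 0.039
```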
General Optimization Algorithm
Specify some initial guess x₀.
For k = 0, 1, …
- If xₖ is optimal, then stop.
- Determine a descent direction pₖ.
- Determine an improved estimate of the solution: xₖ₊₁ = xₖ + λₖpₖ.
The last step is a one-dimensional search problem called a line search; a concrete instance of this template is sketched below.
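Here is one possible instantiation of that template, a minimal sketch assuming NumPy: steepest descent (pₖ = -∇f(xₖ)) with a simple backtracking line search for λₖ. The stopping test, step rule, and example function are choices made for illustration only.

```python
import numpy as np

def minimize(f, grad, x0, tol=1e-6, max_iter=1000):
    """Steepest descent with a backtracking line search (one instance of the template)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:    # "if x_k is optimal then stop"
            break
        p = -g                         # descent direction p_k
        lam = 1.0                      # line search for the step lambda_k
        while f(x + lam * p) > f(x) - 1e-4 * lam * (g @ g):
            lam *= 0.5
        x = x + lam * p                # x_{k+1} = x_k + lambda_k p_k
    return x

# example: a simple convex quadratic with minimizer [1, -2]
f = lambda x: (x[0] - 1)**2 + 2*(x[1] + 2)**2
grad = lambda x: np.array([2*(x[0] - 1), 4*(x[1] + 2)])
print(minimize(f, grad, [0.0, 0.0]))   # approximately [1, -2]
```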
Negative Gradient
An important fact to know is that the negative gradient always points downhill.
Let d = -∇f(x) ≠ 0. Then f(x + λd) < f(x) for all λ > 0 sufficiently small.
Proof:
f(x + λd) = f(x) + λ d'∇f(x) + λ ‖d‖ α(x, λd)
⇒ [f(x + λd) - f(x)] / λ = d'∇f(x) + ‖d‖ α(x, λd)
⇒ f(x + λd) - f(x) < 0 for λ > 0 sufficiently small,
since d'∇f(x) < 0 and α(x, λd) → 0.
Directional Derivative
f'(x; d) = lim_{λ→0} [f(x + λd) - f(x)] / λ = ∇f(x)'d
The directional derivative always exists when the function is convex.
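The limit definition can be checked numerically. In the sketch below (assuming NumPy, and reusing the example function from the Hessian slide above), the difference quotient approaches ∇f(x)'d as λ shrinks.

```python
import numpy as np

f = lambda x: x[0]**2 + 3*x[1]**4 + np.exp(3*x[0]) + 4*x[0]*x[1]
grad = lambda x: np.array([2*x[0] + 3*np.exp(3*x[0]) + 4*x[1],
                           12*x[1]**3 + 4*x[0]])

x = np.array([0.0, 1.0])
d = np.array([1.0, -2.0])
for lam in (1e-1, 1e-3, 1e-5):
    print(lam, (f(x + lam*d) - f(x)) / lam)   # tends to grad(x) @ d
print(grad(x) @ d)                            # 7*1 + 12*(-2) = -17
```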
Line Search
Assume f maps a vector to a scalar: f : Rⁿ → R.
The current point is x ∈ Rⁿ and d is a search direction.
Given an interval of step lengths λ ∈ [a, b], we want to find
min_{λ ∈ [a, b]} g(λ),   where g(λ) = f(x + λd).
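One standard way to solve this one-dimensional subproblem over [a, b] is golden-section search; it is shown here only as an illustration (line search methods are covered later). The sketch below is a minimal implementation, assuming g is unimodal on the interval; the example quadratic g(λ) = λ² - 14.5λ + 6 has its minimizer at λ = 7.25.

```python
import math

def golden_section(g, a, b, tol=1e-6):
    """Minimize a unimodal g on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2          # about 0.618
    c = b - inv_phi * (b - a)                 # left interior point
    d = a + inv_phi * (b - a)                 # right interior point
    while b - a > tol:
        if g(c) < g(d):                       # minimizer lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                                 # minimizer lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

print(golden_section(lambda lam: lam**2 - 14.5*lam + 6, 0.0, 20.0))   # ~7.25
```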
Review of 1D Optimality
Conditions
First Order Necessary Condition
If x* is a local min then f’(x*)=0.
If f’(x*)=0 then ??????????
2nd Derivatives - 1D Case
Sufficient conditions:
- If f'(x*) = 0 and f''(x*) > 0, then x* is a local min.
- If f'(x*) = 0 and f''(x*) < 0, then x* is a local max.
Necessary conditions:
- If x* is a local min, then f'(x*) = 0 and f''(x*) ≥ 0.
- If x* is a local max, then f'(x*) = 0 and f''(x*) ≤ 0.
Example
Say we are minimizing
f(x₁, x₂) = x₁² - ½x₁x₂ + 2x₂² - 15x₁ - 4x₂
          = ½ [x₁ x₂] [ 2  -½ ;  -½  4 ] [x₁ ; x₂] - [15  4] [x₁ ; x₂]
The solution is [8, 2]'.
Example continued
The exact stepsize can be found. Starting from x = [0, -1]' with direction d = [1, 0]':
x + λd = [0 ; -1] + λ [1 ; 0] = [λ ; -1]
g(λ) = f(x + λd) = λ² + ½λ + 2 - 15λ + 4
g'(λ) = ∇f(x + λd)'d = 2λ + ½ - 15 = 0
⇒ λ = 29/4
Example continued
So the new point is
x + λd = [0 ; -1] + (29/4) [1 ; 0] = [29/4 ; -1]
f([0, -1]') = 6
f([29/4, -1]') = -46.5625
f([8, 2]') = -64
Differentiability and Convexity
For a convex function, the linear approximation underestimates the function:
g(x) = f(x*) + (x - x*)'∇f(x*)
[Figure: a convex f(x) lying above its tangent line g(x) at the point (x*, f(x*)).]
Theorem
Let f be continuously differentiable on a convex set S. Then f is convex on S if and only if
f(y) ≥ f(x) + (y - x)'∇f(x)   for all x, y ∈ S.
Theorem
Consider the unconstrained problem min f(x).
If ∇f(x̄) = 0 and f is convex, then x̄ is a global minimum.
Proof: For any y,
f(y) ≥ f(x̄) + (y - x̄)'∇f(x̄)   by convexity of f
     = f(x̄)   since ∇f(x̄) = 0.
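The underestimation property is easy to spot-check numerically. The sketch below (assuming NumPy, and using a positive definite quadratic as an illustrative convex f) verifies f(y) ≥ f(x) + (y - x)'∇f(x) at many random pairs of points.

```python
import numpy as np

Q = np.array([[2.0, -0.5], [-0.5, 4.0]])        # positive definite, so f is convex
b = np.array([15.0, 4.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b

rng = np.random.default_rng(1)
ok = all(f(y) >= f(x) + (y - x) @ grad(x) - 1e-12     # small tolerance for rounding
         for x, y in (rng.standard_normal((2, 2)) for _ in range(1000)))
print(ok)   # True: the linear approximation never overestimates a convex f
```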
What's next
The last theorem is an example of a sufficient optimality condition.
Next time:
- 1-d optimality conditions and line search
- How to tell if a function is convex
- General optimality conditions
Homework handed out.
Read Chapter 10.
Theorem
Let f be twice continuously differentiable. Then f(x) is convex on S if and only if for all x ∈ S, the Hessian at x, ∇²f(x), is positive semi-definite.
Theorem
Let f be twice continuously differentiable. If for all x ∈ S the Hessian ∇²f(x) is positive definite, then f(x) is strictly convex on S. (The converse does not hold in general: f(x) = x⁴ is strictly convex even though f''(0) = 0.)
Definition
The matrix H is positive semi-definite (p.s.d.) if and only if y'Hy ≥ 0 for every vector y.
The matrix H is positive definite (p.d.) if and only if y'Hy > 0 for every nonzero vector y.
Similarly for negative (semi-)definite.
Convexity and Curvature
Convex functions have nonnegative curvature everywhere.
Curvature can be measured by the second derivative or Hessian.
The definiteness of the Hessian therefore indicates whether or not a function is convex.
Checking whether a Matrix H is p.s.d./p.d.
Manually:
[x₁ x₂] [ 4  1 ;  1  3 ] [x₁ ; x₂] = 4x₁² + 2x₁x₂ + 3x₂²
= (x₁ + x₂)² + 3x₁² + 2x₂² > 0 for all [x₁, x₂] ≠ 0,
so the matrix is positive definite.
Or use eigenvalues:
- If all eigenvalues are positive, then the matrix is positive definite (p.d.).
- If all eigenvalues are nonnegative, then the matrix is positive semi-definite (p.s.d.).
- If all eigenvalues are negative, then the matrix is negative definite (n.d.).
- If all eigenvalues are nonpositive, then the matrix is negative semi-definite (n.s.d.).
- Otherwise the matrix is indefinite.
Via eigenvalues
The eigenvalues of [ 4  1 ;  1  3 ] are 4.618 and 2.382, so the matrix is positive definite.
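This eigenvalue test is easy to automate. The sketch below (assuming NumPy; the classify helper is written just for this example) reports the definiteness of a symmetric matrix from the signs of its eigenvalues, using the 2×2 matrix above.

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(H)
    if np.all(w > tol):
        return "positive definite"
    if np.all(w >= -tol):
        return "positive semi-definite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w <= tol):
        return "negative semi-definite"
    return "indefinite"

H = np.array([[4.0, 1.0], [1.0, 3.0]])
print(np.linalg.eigvalsh(H))    # [2.382 4.618]
print(classify(H))              # positive definite
```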
Unconstrained Optimality Conditions
Basic Problem (1):
min_{x ∈ S} f(x)
where S is an open set, e.g., Rⁿ.
Notes
Note that the condition ∇f(x*) = 0 is not sufficient: it also holds at local maxima and at saddle points.
First Order Necessary Conditions
Theorem: Let f be continuously differentiable. If x* is a local minimizer of (1), then ∇f(x*) = 0.
Proof
Assume false, i.e., ∇f(x*) ≠ 0. Let d = -∇f(x*). Then
f(x* + λd) = f(x*) + λ d'∇f(x*) + λ ‖d‖ α(x*, λd)
⇒ [f(x* + λd) - f(x*)] / λ = d'∇f(x*) + ‖d‖ α(x*, λd)
⇒ f(x* + λd) - f(x*) < 0 for λ > 0 sufficiently small,
since d'∇f(x*) < 0 and α(x*, λd) → 0.
CONTRADICTION!! x* is a local min.
Second Order Sufficient Conditions
Theorem: Let f be twice continuously differentiable.
If ∇f(x*) = 0 and ∇²f(x*) is positive definite,
then x* is a local minimizer of (1).
Proof
Any point x in a neighborhood of x* can be written as x* + λd for some vector d. Then
f(x* + λd) = f(x*) + λ d'∇f(x*) + ½ λ² d'∇²f(x*)d + λ² ‖d‖² α(x*, λd)
Since ∇f(x*) = 0:
[f(x* + λd) - f(x*)] / λ² = ½ d'∇²f(x*)d + ‖d‖² α(x*, λd)
⇒ f(x* + λd) - f(x*) > 0 for λ sufficiently small,
since d'∇²f(x*)d > 0 for all d ≠ 0 and α(x*, λd) → 0.
Therefore x* is a local min.
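These two conditions can be checked numerically at a candidate point. The sketch below (assuming NumPy, and reusing the quadratic from the line-search example, where f(x) = ½x'Qx - b'x) confirms that the gradient vanishes and the Hessian is positive definite at x* = [8, 2]'.

```python
import numpy as np

Q = np.array([[2.0, -0.5], [-0.5, 4.0]])
b = np.array([15.0, 4.0])
grad = lambda x: Q @ x - b        # gradient of f(x) = 1/2 x'Qx - b'x
hess = lambda x: Q                # Hessian is constant for a quadratic

x_star = np.array([8.0, 2.0])
print(np.allclose(grad(x_star), 0))                    # True: first-order condition holds
print(np.all(np.linalg.eigvalsh(hess(x_star)) > 0))    # True: Hessian is positive definite
# both conditions hold, so x* is a local minimizer
```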
Second Order Necessary Conditions
Theorem: Let f be twice continuously differentiable.
If x* is a local minimizer of (1), then
∇f(x*) = 0 and ∇²f(x*) is positive semi-definite.
Proof
For any vector d, x* + λd lies in a neighborhood of x* for λ sufficiently small, and
f(x* + λd) = f(x*) + λ d'∇f(x*) + ½ λ² d'∇²f(y)d   for some point y between x* and x* + λd.
Since ∇f(x*) = 0:
[f(x* + λd) - f(x*)] / λ² = ½ d'∇²f(y)d ≥ 0,
since f(x* + λd) ≥ f(x*) for λ sufficiently small.
Letting λ → 0 (so that y → x*) gives d'∇²f(x*)d ≥ 0 for all d.