Computational Optimization
Mathematical Programming
Fundamentals
Line Segment
Let x ∈ Rⁿ and y ∈ Rⁿ. The points on the line segment joining x and y are
{ z | z = λx + (1-λ)y, 0 ≤ λ ≤ 1 }.
[Figure: the segment between points x and y]
Convex Sets
A set S is convex if the line segment joining any two points in the set is also in the set, i.e., for any x, y ∈ S,
λx + (1-λ)y ∈ S for all 0 ≤ λ ≤ 1.
[Figure: examples of convex and non-convex sets]
Convex Functions
A function f is (strictly) convex on a convex set S if and only if for any x, y ∈ S,
f(λx + (1-λ)y) ≤ (<) λf(x) + (1-λ)f(y)
for all 0 ≤ λ ≤ 1 (with 0 < λ < 1 and x ≠ y for the strict case).
[Figure: the chord λf(x) + (1-λ)f(y) lies above the graph of f between x and y]
Concave Functions
A function f is (strictly) concave on a convex set S if and only if -f is (strictly) convex on S.
[Figure: graphs of f and -f]
(Strictly) Convex, Concave, or None of the Above?
[Figure: five example functions]
None of the above
Concave
Concave
Convex
Strictly convex
Convexity of the Function Affects the Optimization Algorithm
Convexity of the Constraints Affects the Optimization Algorithm
min f(x) subject to x ∈ S
[Figure: direction of steepest descent when S is convex vs. when S is not convex]
Convex Program
min f(x) subject to x ∈ S
where f and S are convex.
Convexity makes optimization nice:
Many practical problems are convex problems
Convex programs are used as subproblems for nonconvex programs
Theorem 2.1: Global Solution of a Convex Program
If x* is a local minimizer of a convex programming problem, then x* is also a global minimizer. Furthermore, if the objective is strictly convex, then x* is the unique global minimizer.
Proof. In text.
Problems with Nonconvex Objective
min f(x) subject to x ∈ [a, b]
If f is strictly convex, the problem has a unique global minimum.
If f is not convex, the problem can have two (or more) local minima.
[Figure: a strictly convex f with unique minimizer x* on [a, b]; a nonconvex f with local minima x* and x']
Problems with Nonconvex Set
min f(x) subject to x ∈ [a, b] ∪ [c, d]
[Figure: local minima x' in [a, b] and x* in [c, d]]
Multivariate Calculus
For x ∈ Rⁿ, f(x) = f(x1, x2, x3, x4, …, xn)
The gradient of f:
∇f(x) = [ ∂f(x)/∂x1, ∂f(x)/∂x2, …, ∂f(x)/∂xn ]'
The Hessian of f:
∇²f(x) =
[ ∂²f(x)/∂x1∂x1   ∂²f(x)/∂x1∂x2   …   ∂²f(x)/∂x1∂xn ]
[ ∂²f(x)/∂x2∂x1   ∂²f(x)/∂x2∂x2   …   ∂²f(x)/∂x2∂xn ]
[       ⋮                ⋮                   ⋮        ]
[ ∂²f(x)/∂xn∂x1   ∂²f(x)/∂xn∂x2   …   ∂²f(x)/∂xn∂xn ]
For example
f(x) = x1² + 3x2⁴ + e^(3x1) + 4x1x2
∇f(x) = [ 2x1 + 3e^(3x1) + 4x2,  12x2³ + 4x1 ]'
∇²f(x) =
[ 2 + 9e^(3x1)   4     ]
[ 4              36x2² ]
At x = [0, 1]':
∇f(x) = [7, 12]'
∇²f(x) =
[ 11   4  ]
[ 4    36 ]
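The gradient and Hessian above can be checked numerically. A minimal sketch (plain Python, finite differences as an independent check):

```python
import math

# The example function: f(x) = x1^2 + 3*x2^4 + exp(3*x1) + 4*x1*x2
def f(x1, x2):
    return x1**2 + 3*x2**4 + math.exp(3*x1) + 4*x1*x2

# Gradient and Hessian derived above
def grad(x1, x2):
    return [2*x1 + 3*math.exp(3*x1) + 4*x2, 12*x2**3 + 4*x1]

def hess(x1, x2):
    return [[2 + 9*math.exp(3*x1), 4], [4, 36*x2**2]]

print(grad(0, 1))   # [7.0, 12]
print(hess(0, 1))   # [[11.0, 4], [4, 36]]

# Central-difference check of the first gradient component at [0, 1]
h = 1e-6
fd = (f(h, 1) - f(-h, 1)) / (2 * h)
print(abs(fd - grad(0, 1)[0]) < 1e-4)  # True
```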
Quadratic Functions
Form:
f(x) = (1/2) x'Qx + b'x,  where x ∈ Rⁿ, Q ∈ Rⁿˣⁿ, b ∈ Rⁿ
Gradient:
f(x) = (1/2) Σᵢ Σⱼ Qij xi xj + Σⱼ bj xj
∂f(x)/∂xk = Qkk xk + (1/2) Σ_{i≠k} Qik xi + (1/2) Σ_{j≠k} Qkj xj + bk
          = Σⱼ Qkj xj + bk   (assuming Q symmetric)
∇f(x) = Qx + b
∇²f(x) = Q
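The formula ∇f(x) = Qx + b can be verified against a finite-difference gradient. A small sketch; the particular Q (symmetric) and b below are illustrative values, not from the slides:

```python
import numpy as np

# Illustrative symmetric Q and vector b for f(x) = 0.5 x'Qx + b'x
Q = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ Q @ x + b @ x

x = np.array([0.5, -0.25])
analytic = Q @ x + b  # gradient of (1/2)x'Qx + b'x for symmetric Q

# Independent finite-difference check
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(2)])
print(np.allclose(analytic, fd, atol=1e-5))  # True
```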
Taylor Series Expansion about x* – 1D Case
Let x = x* + p:
f(x) = f(x*+p) = f(x*) + p f'(x*) + (1/2) p² f''(x*) + (1/3!) p³ f'''(x*) + … + (1/n!) pⁿ f⁽ⁿ⁾(x*) + …
Equivalently:
f(x) = f(x*) + (x-x*) f'(x*) + (1/2)(x-x*)² f''(x*) + (1/3!)(x-x*)³ f'''(x*) + … + (1/n!)(x-x*)ⁿ f⁽ⁿ⁾(x*) + …
Taylor Series Example
Let f(x) = exp(-x); compute the Taylor series expansion about x* = 0:
f(x) = f(x*) + (x-x*) f'(x*) + (1/2)(x-x*)² f''(x*) + (1/3!)(x-x*)³ f'''(x*) + … + (1/n!)(x-x*)ⁿ f⁽ⁿ⁾(x*) + …
     = e^(-x*) - x e^(-x*) + (x²/2) e^(-x*) - (x³/3!) e^(-x*) + … + (-1)ⁿ (xⁿ/n!) e^(-x*) + …
     = 1 - x + x²/2 - x³/3! + … + (-1)ⁿ xⁿ/n! + …
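The partial sums of this series converge quickly to exp(-x). A quick numeric sketch:

```python
import math

# Partial sums of the series above: exp(-x) = 1 - x + x^2/2 - x^3/3! + ...
def taylor_exp_neg(x, n_terms):
    return sum((-x) ** k / math.factorial(k) for k in range(n_terms))

x = 0.5
for n in (2, 4, 8):
    print(n, abs(taylor_exp_neg(x, n) - math.exp(-x)))  # error shrinks with n
```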
First Order Taylor Series Approximation
Let x = x* + p:
f(x) = f(x*+p) = f(x*) + p f'(x*) + p α(x*, p)
where lim_{p→0} α(x*, p) = 0.
Says that a linear approximation of a function works well locally:
f(x) = f(x*+p) ≈ f(x*) + p f'(x*)
[Figure: f(x) and its linear approximation at x*]
Second Order Taylor Series Approximation
Let x = x* + p:
f(x) = f(x*+p) = f(x*) + p f'(x*) + (1/2) p² f''(x*) + p² α(x*, p)
where lim_{p→0} α(x*, p) = 0.
Says that a quadratic approximation of a function works even better locally:
f(x) = f(x*+p) ≈ f(x*) + p f'(x*) + (1/2) p² f''(x*)
[Figure: f(x) and its quadratic approximation at x*]
Taylor Series Approximation Exercise
Consider the function f(x1, x2) = x1³ + 5x1²x2 + 7x1x2² + 2x2² and x* = [-2, 3]'.
Compute the gradient and Hessian.
What is the first order TSA about x*?
What is the second order TSA about x*?
Evaluate both TSAs at y = [-1.9, 3.2]' and compare with f(y).
Exercise
f(x1, x2) = x1³ + 5x1²x2 + 7x1x2² + 2x2²   (function)
f(x*) = ___
∇f(x) = [ ___ , ___ ]'   (gradient)
∇f(x*) = [ ___ , ___ ]'
∇²f(x) = ___   (Hessian)
∇²f(x*) = ___
First order TSA:  g(x) = f(x*) + (x - x*)'∇f(x*)
Second order TSA: h(x) = f(x*) + (x - x*)'∇f(x*) + (1/2)(x - x*)'∇²f(x*)(x - x*)
|f(y) - g(y)| = ___
|f(y) - h(y)| = ___
Exercise
f(x1, x2) = x1³ + 5x1²x2 + 7x1x2² + 2x2²   (function)
f(x*) = -56
∇f(x) = [ 3x1² + 10x1x2 + 7x2²,  5x1² + 14x1x2 + 4x2 ]'   (gradient)
∇f(x*) = [15, -52]'
∇²f(x) =
[ 6x1 + 10x2     10x1 + 14x2 ]
[ 10x1 + 14x2    14x1 + 4    ]   (Hessian)
∇²f(x*) =
[ 18   22  ]
[ 22   -24 ]
Exercise
First order TSA:
g(y) = f(x*) + (y - x*)'∇f(x*) = -64.9
Second order TSA:
h(y) = f(x*) + (y - x*)'∇f(x*) + (1/2)(y - x*)'∇²f(x*)(y - x*) = -64.85
|f(y) - g(y)| = |-64.811 - (-64.9)| = 0.089
|f(y) - h(y)| = |-64.811 - (-64.85)| = 0.039
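The exercise numbers can be reproduced in a few lines. A sketch using the gradient and Hessian computed above:

```python
import numpy as np

# Check of the exercise at x* = [-2, 3]' and y = [-1.9, 3.2]'
def f(x):
    x1, x2 = x
    return x1**3 + 5*x1**2*x2 + 7*x1*x2**2 + 2*x2**2

xs = np.array([-2.0, 3.0])
y = np.array([-1.9, 3.2])

grad = np.array([3*xs[0]**2 + 10*xs[0]*xs[1] + 7*xs[1]**2,
                 5*xs[0]**2 + 14*xs[0]*xs[1] + 4*xs[1]])   # [15, -52]
hess = np.array([[6*xs[0] + 10*xs[1], 10*xs[0] + 14*xs[1]],
                 [10*xs[0] + 14*xs[1], 14*xs[0] + 4]])      # [[18, 22], [22, -24]]

p = y - xs
g = f(xs) + p @ grad          # first order TSA at y: -64.9
h = g + 0.5 * p @ hess @ p    # second order TSA at y: -64.85
print(round(abs(f(y) - g), 3))   # 0.089
print(round(abs(f(y) - h), 3))   # 0.039
```

The second-order approximation roughly halves the error, as the slide reports.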
General Optimization Algorithm
Specify some initial guess x0
For k = 0, 1, …
  If xk is optimal then stop
  Determine descent direction pk
  Determine improved estimate of the solution: xk+1 = xk + λk pk
The last step is a one-dimensional search problem called a line search.
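The loop above can be sketched in a few lines. This is a minimal illustration, not the course's algorithm: it uses the negative gradient as the descent direction and a fixed step length as a stand-in for a real line search, and the test problem is invented:

```python
import numpy as np

# Generic descent loop: x_{k+1} = x_k + step * p_k, stopping when the
# gradient is (numerically) zero.
def minimize(f, grad_f, x0, step=0.1, tol=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # "if xk is optimal then stop"
            break
        p = -g                        # descent direction pk
        x = x + step * p              # improved estimate
    return x

# Illustrative problem: min (x1 - 1)^2 + (x2 + 2)^2, minimizer [1, -2]
f = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2
grad_f = lambda x: np.array([2*(x[0] - 1), 2*(x[1] + 2)])
print(minimize(f, grad_f, [0.0, 0.0]))  # approximately [1, -2]
```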
Negative Gradient
An important fact to know is that the negative gradient always points downhill.
Let d = -∇f(x) (assuming ∇f(x) ≠ 0); then there exists λ̄ > 0 such that f(x + λd) < f(x) for all 0 < λ < λ̄.
Proof:
f(x + λd) = f(x) + λ d'∇f(x) + λ‖d‖ α(x, λd)
[f(x + λd) - f(x)] / λ = d'∇f(x) + ‖d‖ α(x, λd)
f(x + λd) - f(x) < 0 for λ sufficiently small,
since d'∇f(x) < 0 and α(x, λd) → 0.
Directional Derivative
f'(x; d) = lim_{λ→0⁺} [f(x + λd) - f(x)] / λ  (= ∇f(x)'d when f is differentiable)
Always exists when the function is convex.
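For a differentiable function, the difference quotient in the limit approaches ∇f(x)'d. A quick numeric illustration (the function here is invented for the check):

```python
import numpy as np

# Illustrative smooth function and its gradient
f = lambda x: x[0]**2 + 3*x[0]*x[1]
grad = lambda x: np.array([2*x[0] + 3*x[1], 3*x[0]])

x = np.array([1.0, 2.0])
d = np.array([0.5, -1.0])
exact = grad(x) @ d  # directional derivative grad(x)'d = 1.0
for lam in (1e-2, 1e-4, 1e-6):
    print(lam, (f(x + lam*d) - f(x)) / lam - exact)  # gap shrinks with lam
```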
Line Search
Assume f maps a vector to a scalar: f: Rⁿ → R
Current point is x ∈ Rⁿ, with search direction d
Have interval [a, b]
Want to find:
min_{λ ∈ [a, b]} g(λ), where g(λ) = f(x + λd)
Review of 1D Optimality Conditions
First Order Necessary Condition:
If x* is a local min then f'(x*) = 0.
If f'(x*) = 0 then ??????????
2nd Derivatives - 1D Case
Sufficient conditions
If f’(x*)=0 and f’’(x*) >0, then x* is a local min.
If f’(x*)=0 and f’’(x*) <0, then x* is a local max.
Necessary conditions
If x* is a local min, then f’(x*)=0 and f’’(x*) >=0.
If x* is a local max, then f’(x*)=0 and f’’(x*) <=0.
Example
Say we are minimizing
f(x1, x2) = x1² - (1/2)x1x2 + 2x2² - 15x1 - 4x2
  = (1/2) [x1 x2] [ 2     -1/2 ] [x1]  -  [15  4] [x1]
                  [ -1/2   4   ] [x2]            [x2]
Solution is [8, 2]'.
Example continued
The exact stepsize can be found. With x = [0, -1]' and d = [1, 0]':
g(λ) = f(x + λd) = f(λ, -1) = λ² - (29/2)λ + 6
g'(λ) = ∇f(x + λd)'d = 2λ - 29/2 = 0  →  λ = 29/4
Example continued
So the new point is
x + λd = [0, -1]' + (29/4)[1, 0]' = [29/4, -1]'
f([0, -1]') = 6
f([29/4, -1]') = -46.5625
f([8, 2]') = -64
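The worked line search can be checked numerically. This sketch assumes the quadratic reconstructed above, f(x) = x1² - ½x1x2 + 2x2² - 15x1 - 4x2, i.e. Q = [[2, -1/2], [-1/2, 4]] and b = [15, 4] in the form (1/2)x'Qx - b'x:

```python
import numpy as np

# Quadratic from the example: f(x) = 0.5 x'Qx - b'x
Q = np.array([[2.0, -0.5], [-0.5, 4.0]])
b = np.array([15.0, 4.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x

x = np.array([0.0, -1.0])   # current point
d = np.array([1.0, 0.0])    # search direction

# For a quadratic, g'(lam) = d'(Q(x + lam*d) - b) = 0 gives the exact step
lam = -(d @ (Q @ x - b)) / (d @ Q @ d)
print(lam)                      # 7.25, i.e. 29/4
print(f(x))                     # 6.0
print(f(x + lam * d))           # -46.5625
print(f(np.array([8.0, 2.0])))  # -64.0 at the solution [8, 2]'
```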
Differentiability and Convexity
For a convex function, the linear approximation underestimates the function:
g(x) = f(x*) + (x - x*)'∇f(x*)
[Figure: f(x) lies above its tangent at (x*, f(x*))]
Theorem
Let f be continuously differentiable on a convex set S. Then f is convex on S if and only if
f(y) ≥ f(x) + (y - x)'∇f(x)  for all x, y ∈ S.
Theorem
Consider the unconstrained problem min f(x).
If ∇f(x̄) = 0 and f is convex, then x̄ is a global minimum.
Proof: for any y,
f(y) ≥ f(x̄) + (y - x̄)'∇f(x̄)   by convexity of f
     = f(x̄)                     since ∇f(x̄) = 0.
What’s Next
The last theorem is an example of a sufficient optimality condition.
Next time:
1-D optimality conditions and line search
How to tell if a function is convex
General optimality conditions
Homework handed out
Read Chapter 10
Theorem
Let f be twice continuously differentiable.
f(x) is convex on S if and only if for all x ∈ S, the Hessian at x, ∇²f(x), is positive semi-definite.
Theorem
Let f be twice continuously differentiable.
If for all x ∈ S the Hessian at x, ∇²f(x), is positive definite, then f(x) is strictly convex on S.
(The converse can fail: f(x) = x⁴ is strictly convex but f''(0) = 0.)
Definition
The matrix H is positive semi-definite (p.s.d.) if and only if for any vector y,
y'Hy ≥ 0.
The matrix H is positive definite (p.d.) if and only if for any nonzero vector y,
y'Hy > 0.
Similarly for negative (semi-)definite.
Convexity and Curvature
Convex functions have nonnegative curvature everywhere.
Curvature can be measured by the second derivative or Hessian.
Properties of the Hessian indicate whether a function is convex or not.
Checking Matrix H is p.s.d./p.d.
Manually:
[x1 x2] [ 4  1 ] [x1]  =  4x1² + 2x1x2 + 3x2²
        [ 1  3 ] [x2]
  = (x1 + x2)² + 3x1² + 2x2²  >  0  for all [x1, x2] ≠ 0,
so the matrix is positive definite.
Or use eigenvalues:
If all eigenvalues are positive, then the matrix is positive definite, p.d.
If all eigenvalues are nonnegative, then the matrix is positive semi-definite, p.s.d.
If all eigenvalues are negative, then the matrix is negative definite, n.d.
If all eigenvalues are nonpositive, then the matrix is negative semi-definite, n.s.d.
Otherwise the matrix is indefinite.
Via Eigenvalues
The eigenvalues of
[ 4  1 ]
[ 1  3 ]
are 4.618 and 2.382, so the matrix is positive definite.
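The eigenvalue test is one line with NumPy. A quick check of the matrix above:

```python
import numpy as np

# Eigenvalue test for definiteness of the matrix from the slide
H = np.array([[4.0, 1.0], [1.0, 3.0]])
eigvals = np.linalg.eigvalsh(H)   # eigvalsh: for symmetric matrices, ascending order
print(np.round(eigvals, 3))       # [2.382 4.618]
print(bool(np.all(eigvals > 0)))  # True -> positive definite
```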
Unconstrained Optimality Conditions
Basic Problem:
(1)  min f(x) subject to x ∈ S
where S is an open set, e.g. Rⁿ.
Notes
Note that the condition ∇f(x*) = 0 is not sufficient:
it is also true at local maxima and saddle points.
First Order Necessary Conditions
Theorem: Let f be continuously differentiable.
If x* is a local minimizer of (1), then ∇f(x*) = 0.
Proof
Assume false, e.g., ∇f(x*) ≠ 0.
Let d = -∇f(x*); then
f(x* + λd) = f(x*) + λ d'∇f(x*) + λ‖d‖ α(x*, λd)
[f(x* + λd) - f(x*)] / λ = d'∇f(x*) + ‖d‖ α(x*, λd)
f(x* + λd) - f(x*) < 0 for λ sufficiently small,
since d'∇f(x*) < 0 and α(x*, λd) → 0.
CONTRADICTION!! x* is a local min.
Second Order Sufficient Conditions
Theorem: Let f be twice continuously differentiable.
If ∇f(x*) = 0 and ∇²f(x*) is positive definite,
then x* is a local minimizer of (1).
Proof
Any point x in a neighborhood of x* can be written as x* + d for some vector d.
f(x* + d) = f(x*) + d'∇f(x*) + (1/2) d'∇²f(x*)d + ‖d‖² α(x*, d)
Since ∇f(x*) = 0:
f(x* + d) - f(x*) = (1/2) d'∇²f(x*)d + ‖d‖² α(x*, d)
f(x* + d) - f(x*) > 0 for ‖d‖ sufficiently small,
since d'∇²f(x*)d > 0 for all d ≠ 0 and α(x*, d) → 0.
Therefore x* is a local min.
Second Order Necessary Conditions
Theorem: Let f be twice continuously differentiable.
If x* is a local minimizer of (1), then ∇f(x*) = 0 and ∇²f(x*) is positive semi-definite.
Proof
For any vector d, x* + λd is in a neighborhood of x* for λ sufficiently small:
f(x* + λd) = f(x*) + λ d'∇f(x*) + (1/2) λ² d'∇²f(y)d   for some point y between x* and x* + λd
Since ∇f(x*) = 0:
f(x* + λd) - f(x*) = (1/2) λ² d'∇²f(y)d
Since f(x* + λd) ≥ f(x*) for λ sufficiently small, d'∇²f(y)d ≥ 0;
letting λ → 0 gives y → x*, so d'∇²f(x*)d ≥ 0 for all d.