Computational Optimization - Rensselaer Polytechnic Institute



Review + Announcements 2/22/08

Presentation schedule

Friday 4/25 (5 max): 1. Miguel Jaller 8:03, 2. Adrienne Peltz 8:20, 3. Olga Grisin 8:37, 4. Dan Erceg 8:54, 5. Nick Suhr 9:11, 6. Christos Boutsidis 9:28.

Tuesday 4/29 (5 max): 1. Jayanth 8:03, 2. Raghav 8:20, 3. Rhyss 8:37, 4. Tim 8:54, 5-6. Lindsey Garret and Mark Yuhas 9:11.

Monday 4/28, 4:00 to 7:00, pizza included: Lisa Pak, Christos Boutsidis, David Doria, Zhi Zeng, Carlos, Varun, Samrat, Matt, Adarsh Ramsuhramonian.

Be on time.

Plan your presentation for 15 minutes.

Strict schedule.

Suggest putting your presentation in your public_html directory on RCS so you can click and go.

Monday night class is in Amos Eaton 214, 4 to 7.

Other dates: Project papers due Friday (or in class Monday if you have a Friday presentation). Final: Tuesday 5/6, 3 p.m., Eaton 214. Open book/notes (no computers). Comprehensive; labs are fair game too.

Office hours Monday 5/5 10 to 12 (or email)

What did we learn?

Theme 1: “There is nothing more practical than a good theory” - Kurt Lewin. Algorithms arise out of the optimality conditions.

What did we learn?

Theme 2: To solve a harder problem, reduce it to an easier problem that you already know how to solve.

Fundamental Theoretical Ideas Convex functions and sets. Convex programs. Differentiability → Taylor series approximations → descent directions. Combining these with the idea of feasible directions provides the basis for optimality conditions.

Convex Functions A function f is (strictly) convex on a convex set S if and only if for any x, y ∈ S, f(λx + (1−λ)y) ≤ (<) λf(x) + (1−λ)f(y) for all 0 ≤ λ ≤ 1.
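The convexity inequality can be spot-checked numerically along a segment; a minimal sketch (the test function f(t) = t², the endpoints, and the tolerance are my own illustrative choices):

```python
import numpy as np

# Spot-check f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
# at many sampled lambdas along the segment from x to y.
def is_convex_on_samples(f, x, y, n_lambdas=101):
    """Return True if the convexity inequality holds at every sample."""
    for lam in np.linspace(0.0, 1.0, n_lambdas):
        lhs = f(lam * x + (1 - lam) * y)
        rhs = lam * f(x) + (1 - lam) * f(y)
        if lhs > rhs + 1e-12:      # tiny slack for floating point
            return False
    return True
```

For example, f(t) = t² passes the check while the concave f(t) = −t² fails it.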

Convex Sets A set S is convex if the line segment joining any two points in the set is also in the set, i.e., for any x, y ∈ S, λx + (1−λ)y ∈ S for all 0 ≤ λ ≤ 1.

[Figure: examples of convex and non-convex sets.]

Convex Program min f(x) subject to x ∈ S, where f and S are convex. Convexity makes optimization nice. Many practical problems are convex programs. Convex programs are also used as subproblems for nonconvex programs.

Theorem (Global Solution of a Convex Program): If x* is a local minimizer of a convex programming problem, then x* is also a global minimizer. Furthermore, if the objective is strictly convex, then x* is the unique global minimizer.

Proof: by contradiction. (If some feasible y had f(y) < f(x*), then by convexity the points λx* + (1−λ)y with λ near 1 would be feasible with value below f(x*), contradicting local optimality of x*.)

First Order Taylor Series Approximation Let x = x* + p. Then f(x) = f(x* + p) = f(x*) + ∇f(x*)′p + o(‖p‖), where o(‖p‖)/‖p‖ → 0 as ‖p‖ → 0. Says that a linear approximation of a function works well locally: f(x) ≈ f(x*) + ∇f(x*)′(x − x*).

Second Order Taylor Series Approximation Let x = x* + p. Then f(x) = f(x* + p) = f(x*) + ∇f(x*)′p + ½ p′∇²f(x*)p + o(‖p‖²), where o(‖p‖²)/‖p‖² → 0 as ‖p‖ → 0. Says that a quadratic approximation of a function works even better locally: f(x) ≈ f(x*) + ∇f(x*)′(x − x*) + ½ (x − x*)′∇²f(x*)(x − x*).
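The "works even better" claim is easy to see numerically; a minimal sketch (the function f(x) = eˣ, the expansion point x* = 0, and the step p = 0.1 are my own choices):

```python
import numpy as np

# Compare first- and second-order Taylor approximations of f(x) = exp(x)
# around x* = 0 at a nearby point; the quadratic model should be closer.
xstar, p = 0.0, 0.1
f_true = np.exp(xstar + p)                  # true value
t1 = np.exp(xstar) + np.exp(xstar) * p      # f(x*) + f'(x*) p
t2 = t1 + 0.5 * np.exp(xstar) * p ** 2      # ... + 1/2 f''(x*) p^2
err1 = abs(f_true - t1)                     # linear-model error
err2 = abs(f_true - t2)                     # quadratic-model error
```

Here the linear model is off by about 5e-3 while the quadratic model is off by about 2e-4.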

Descent Directions If the directional derivative is negative, ∇f(x)′d < 0, then a linesearch along d will lead to a decrease in the function. (Slide example: gradient [8, 2] with direction [0, −1] gives directional derivative −2 < 0.)

First Order Necessary Conditions Theorem: Let f be continuously differentiable. If x* is a local minimizer of (1), then ∇f(x*) = 0.

Second Order Sufficient Conditions Theorem: Let f be twice continuously differentiable. If ∇f(x*) = 0 and ∇²f(x*) is positive definite, then x* is a strict local minimizer of (1).

Second Order Necessary Conditions Theorem: Let f be twice continuously differentiable. If x* is a local minimizer of (1), then ∇f(x*) = 0 and ∇²f(x*) is positive semidefinite.

Optimality Conditions First Order Necessary Second Order Necessary Second Order Sufficient With convexity the necessary conditions become sufficient.

Easiest Problem: Line Search = 1-D Optimization (1) min f(x), f: R → R, over x ∈ S ⊆ R. Optimality conditions based on first and second derivatives. Golden section search.
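A minimal sketch of golden section search for a unimodal function on [a, b] (the interface and tolerance are my own choices):

```python
import math

# Golden section search: shrink the bracket [a, b] by the golden ratio
# each iteration, keeping the minimizer of a unimodal f inside it.
def golden_section(f, a, b, tol=1e-8):
    invphi = (math.sqrt(5) - 1) / 2              # 1/phi ~ 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c                          # keep [a, d]
            c = b - invphi * (b - a)
        else:
            a, c = c, d                          # keep [c, b]
            d = a + invphi * (b - a)
    return (a + b) / 2
```

For example, golden_section(lambda t: (t - 2)**2, 0, 5) returns a point very close to 2.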

Sometimes can solve the linesearch exactly. The exact stepsize λ minimizes g(λ) = f(x + λd); it can be found by solving g′(λ) = ∇f(x + λd)′d = 0.

General Optimization Algorithm Specify some initial guess x₀. For k = 0, 1, …: If x_k is optimal then stop. Determine descent direction p_k. Determine improved estimate of the solution: x_{k+1} = x_k + α_k p_k. The last step is a one-dimensional search problem called line search.

Newton’s Method Minimizing a quadratic q(x) = ½ x′Qx − b′x has a closed-form solution. The minimum must satisfy ∇q(x*) = Qx* − b = 0, so x* = Q⁻¹b (unique if Q is invertible).
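In practice one solves Qx* = b rather than forming Q⁻¹; a minimal sketch (Q and b are made-up symmetric positive definite test data):

```python
import numpy as np

# Closed-form minimizer of q(x) = 1/2 x'Qx - b'x: solve Qx = b directly.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])              # SPD matrix (assumed example)
b = np.array([1.0, 2.0])

x_star = np.linalg.solve(Q, b)          # preferred over inv(Q) @ b
grad = Q @ x_star - b                   # gradient of q at x_star
```

The gradient at the computed point is zero to machine precision, confirming the FONC.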

General nonlinear functions For non-quadratic f (twice continuously differentiable): approximate f by its second-order Taylor series and solve the FONC for the quadratic approximation. f(y) ≈ f(x) + ∇f(x)′(y − x) + ½ (y − x)′∇²f(x)(y − x). Calculate FONC: ∇f(x) + ∇²f(x)(y − x) = 0. Solve for y: y = x − [∇²f(x)]⁻¹∇f(x). The step p = −[∇²f(x)]⁻¹∇f(x) is the Pure Newton Direction.

Basic Newton’s Algorithm Start with x₀. For k = 1, …, K: If x_k is optimal then stop. Solve ∇²f(x_k) p_k = −∇f(x_k). Set x_{k+1} = x_k + p_k.

Final Newton’s Algorithm Start with x₀. For k = 1, …, K: If x_k is optimal then stop. Factor ∇²f(x_k) = LDL′ using a modified Cholesky factorization and solve for p_k. Perform a linesearch to determine x_{k+1} = x_k + α_k p_k. What are the pros and cons?
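A minimal 1-D sketch of the pure Newton iteration (the example function f(x) = eˣ − 2x, with minimizer x* = ln 2, is my own choice; a production version would add the modified Cholesky safeguard and linesearch described above):

```python
import math

# Pure Newton iteration: x <- x - f'(x)/f''(x).
# On f(x) = exp(x) - 2x we have f'(x) = exp(x) - 2 and f''(x) = exp(x) > 0,
# so the Hessian needs no modification here.
def newton_1d(fprime, fsecond, x0, iters=20):
    x = x0
    for _ in range(iters):
        x -= fprime(x) / fsecond(x)
    return x

x_min = newton_1d(lambda x: math.exp(x) - 2.0, math.exp, 0.0)
```

Starting from 0, the iterates converge quadratically to ln 2 ≈ 0.6931.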

Steepest Descent Algorithm Start with x₀. For k = 1, …, K: If x_k is optimal then stop. Set p_k = −∇f(x_k). Perform an exact or backtracking linesearch to determine x_{k+1} = x_k + α_k p_k.
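A minimal sketch of steepest descent with a backtracking (Armijo) linesearch, run on the conditioning example 50(x−10)² + y² that appears later in these slides (iteration counts and constants are my own choices):

```python
import numpy as np

def f(z):
    return 50.0 * (z[0] - 10.0) ** 2 + z[1] ** 2

def grad(z):
    return np.array([100.0 * (z[0] - 10.0), 2.0 * z[1]])

def steepest_descent(x0, iters=1000, c1=1e-4):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:
            break
        p = -g                               # steepest descent direction
        alpha = 1.0
        # Backtrack until the sufficient-decrease (Armijo) condition holds.
        while f(x + alpha * p) > f(x) + c1 * alpha * (g @ p):
            alpha *= 0.5
        x = x + alpha * p
    return x

x_final = steepest_descent([0.0, 1.0])
```

Convergence to (10, 0) is slow because of the zigzagging the conditioning slide warns about.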

Inexact linesearch can work quite well too! For 0 < c₁ < c₂ < 1, choose α_k > 0 satisfying f(x_k + α_k p_k) ≤ f(x_k) + c₁ α_k ∇f(x_k)′p_k (sufficient decrease) and ∇f(x_k + α_k p_k)′p_k ≥ c₂ ∇f(x_k)′p_k (curvature). A solution exists for any descent direction if f is bounded below along the linesearch. (Lemma 3.1)

Conditioning Important for gradient methods! Example: 50(x−10)² + y², condition number = 50/1 = 50. Steepest descent ZIGZAGS!!!

Know Pros and Cons of each approach

Conjugate Gradient (CG) Method for minimizing a quadratic function. Low-storage method: CG only stores vector information. CG has superlinear convergence for nice problems or when properly scaled. Great for solving QP subproblems.
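A minimal sketch of linear CG for min ½x′Qx − b′x, i.e. solving Qx = b, keeping only a few vectors as the slide notes (Q and b are made-up SPD test data):

```python
import numpy as np

# Linear conjugate gradient: for SPD Q it reaches the exact solution
# in at most n iterations, storing only x, r, and p.
def conjugate_gradient(Q, b, tol=1e-10):
    x = np.zeros_like(b)
    r = b - Q @ x                 # residual = -gradient of the quadratic
    p = r.copy()
    for _ in range(len(b)):
        if np.linalg.norm(r) < tol:
            break
        Qp = Q @ p
        alpha = (r @ r) / (p @ Qp)          # exact linesearch stepsize
        x = x + alpha * p
        r_new = r - alpha * Qp
        beta = (r_new @ r_new) / (r @ r)    # Fletcher-Reeves update
        p = r_new + beta * p
        r = r_new
    return x

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])             # SPD (assumed example)
b = np.array([1.0, 2.0, 3.0])
x_cg = conjugate_gradient(Q, b)
```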

Quasi-Newton Methods Pros and Cons Globally converges to a local min (always finds a descent direction). Superlinear convergence. Requires only first-order information – approximates the Hessian. More complicated than steepest descent. Requires sophisticated linear algebra. Have to watch out for numerical error.

Quasi-Newton Methods Pros and Cons Globally converges to a local min. Superlinear convergence without computing the Hessian. Works great in practice; widely used. More complicated than steepest descent. Best implementations require sophisticated linear algebra, linesearch, and handling of curvature conditions. Have to watch out for numerical error.

Trust Region Methods Alternative to line search methods. Optimize the quadratic model within the “trust region”: p_k = arg min_p f(x_k) + ∇f(x_k)′p + ½ p′B_k p subject to ‖p‖ ≤ Δ_k, then set x_{k+1} = x_k + p_k.

Easiest Problem with constraints: Linear equality constraints min f(x) subject to Ax = b, where f: Rⁿ → R, A ∈ R^{m×n}, b ∈ R^m.

Lemma 14.1 Necessary Conditions (Nash + Sofer) If x* is a local min of f over {x | Ax = b}, and Z is a null-space matrix for A, then Z′∇f(x*) = 0 and Z′∇²f(x*)Z is positive semidefinite.

Or equivalently use the KKT Conditions: Ax* = b, the system ∇f(x*) − A′λ* = 0 has a solution λ*, and Z′∇²f(x*)Z is positive semidefinite.

Other conditions generalize similarly.

Handy ways to compute the Null Space: Variable Reduction Method; Orthogonal Projection Matrix; QR factorization (best numerically); Z = null(A) in MATLAB.
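In Python, a null-space basis analogous to MATLAB's null(A) can be computed from the SVD; a sketch (the 2×3 matrix A below is a made-up example):

```python
import numpy as np

# Null-space basis from the SVD: the right singular vectors belonging
# to (numerically) zero singular values span Null(A).
def null_space(A, tol=1e-12):
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T            # columns form a basis for Null(A)

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0]])   # rank 2, so Null(A) is 1-dimensional
Z = null_space(A)
```

The columns of Z satisfy AZ = 0, which is exactly what the reduced (null-space) methods above require.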

Next Easiest Problem: Linear inequality constraints min f(x) subject to Ax ≥ b, where f: Rⁿ → R, A ∈ R^{m×n}, b ∈ R^m. The constraints form a polyhedron.

Inequality Case The feasible set of Ax ≥ b is a polyhedron with faces aᵢ′x = bᵢ, i = 1, …, m. [Figure: polyhedron with constraint faces a₁x = b₁ through a₅x = b₅, the minimizer x* on the boundary, and the gradient ∇f(x*).] Inequality FONC: ∇f(x*) = A′λ*, λ* ≥ 0, Ax* ≥ b, and λᵢ*(aᵢ′x* − bᵢ) = 0 for all i. Nonnegative multipliers imply the gradient points to the greater-than side of the constraint.
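The inequality FONC can be verified numerically on a toy problem (the problem min x² + y² s.t. x + y ≥ 1, its solution (0.5, 0.5), and λ* = 1 are my own illustrative choices):

```python
import numpy as np

# FONC check for min x^2 + y^2 s.t. x + y >= 1.
# At x* = (0.5, 0.5) the single constraint is active with lambda* = 1.
x_star = np.array([0.5, 0.5])
A = np.array([[1.0, 1.0]])                 # constraint row: a'x >= b
b = np.array([1.0])
lam = np.array([1.0])

grad_f = 2.0 * x_star                      # gradient of x^2 + y^2
stationarity = np.allclose(grad_f, A.T @ lam)          # grad f = A' lambda
feasible = bool(np.all(A @ x_star >= b - 1e-12))       # Ax* >= b
complementarity = np.allclose(lam * (A @ x_star - b), 0.0)
```

All three conditions hold, and λ* ≥ 0 confirms the gradient points to the greater-than side of the active constraint.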

Second Order Sufficient Conditions for Linear Inequalities If (x*, λ*) satisfies Ax* ≥ b (primal feasibility), ∇f(x*) − A′λ* = 0 with λ* ≥ 0 (dual feasibility), λ*′(Ax* − b) = 0 (complementarity), and the SOSC Z₊′∇²f(x*)Z₊ positive definite, then x* is a strict local minimizer

Sufficient Conditions for Linear Inequalities where Z₊ is a basis matrix for Null(A₊), and A₊ corresponds to the nondegenerate active constraints, i.e., A₊ = {aⱼ ∈ A : aⱼ′x* = bⱼ, λⱼ* > 0}.

General Constraints min f(x) subject to gᵢ(x) = 0, i ∈ E, and gᵢ(x) ≥ 0, i ∈ I. Careful: the sufficient conditions are the same as before, but the necessary conditions have an extra constraint qualification to make sure the Lagrange multipliers exist!

Necessary Conditions (General) If x* satisfies LICQ and is a local min, then there exist multipliers λᵢ* ≥ 0 for i ∈ I and λᵢ* for i ∈ E such that ∇ₓL(x*, λ*) = ∇f(x*) − Σᵢ λᵢ*∇gᵢ(x*) = 0, λᵢ* gᵢ(x*) = 0 for i ∈ I, and Z′∇²ₓₓL(x*, λ*)Z is positive semidefinite, where Z is a null-space basis for the active constraints.

Algorithms build on prior approaches Linear equality constrained (min f(x) s.t. Ax = b): convert to an unconstrained problem over the null space and solve. Different ways to represent the null space produce the different algorithms used in practice.

Prior Approaches (cont.) Linear inequality constrained: identify the active constraints and solve equality-constrained subproblems (with the active rows of Ax ≥ b held at equality). Nonlinear inequality constrained: linearize the constraints and solve subproblems.

Active Set Methods (NW 16.5): change one item of the working set at a time. Interior point algorithms (NW 16.6): traverse the interior of the set (a little more later). Gradient Projection (NW 16.7): change many elements of the working set at once.

Generic inexact penalty problem From (NLP) min f(x) s.t. gᵢ(x) = 0, i ∈ E, gᵢ(x) ≥ 0, i ∈ I, to the unconstrained problem min f(x) + μ/2 Σ_{i∈E} gᵢ(x)² + μ/2 Σ_{i∈I} (min(gᵢ(x), 0))². What are penalty problems and why do we use them? Know the difference between exact and inexact penalties.
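A sketch of why the quadratic penalty is inexact, on a toy problem of my own choosing (min x² + y² s.t. x + y = 1, whose solution is (0.5, 0.5)): the penalized minimizer is feasible only in the limit μ → ∞.

```python
import numpy as np

# Minimize x^2 + y^2 + mu/2 (x + y - 1)^2 in closed form: by symmetry
# x = y, and setting the derivative 2x + mu*(2x - 1) = 0 gives
# x = mu / (2 + 2*mu), which approaches 0.5 only as mu -> infinity.
def penalty_min(mu):
    x = mu / (2.0 + 2.0 * mu)
    return np.array([x, x])

x_small = penalty_min(1.0)     # clearly infeasible: x + y = 0.5
x_big   = penalty_min(1e6)     # nearly feasible, close to (0.5, 0.5)
```

No finite μ makes the constraint hold exactly, which is the "inexact" in inexact penalty.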

Augmented Lagrangian Consider min f(x) s.t. h(x) = 0. Start with L(x, λ) = f(x) − λ′h(x). Add a penalty: L_A(x, λ, μ) = f(x) − λ′h(x) + μ/2 ‖h(x)‖². The penalty helps ensure that the point is feasible. Why do we like these? How do they work in practice?
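A sketch of the augmented Lagrangian iteration on a toy problem of my own choosing (min x² + y² s.t. h(x) = x + y − 1 = 0, with μ = 10 and the standard multiplier update λ ← λ − μh): unlike the pure penalty, a fixed finite μ suffices once λ converges to λ* = 1.

```python
import numpy as np

mu, lam = 10.0, 0.0
for _ in range(30):
    # Minimize L_A = x^2 + y^2 - lam*(x+y-1) + mu/2*(x+y-1)^2 exactly:
    # by symmetry x = y, and 2x - lam + mu*(2x - 1) = 0 gives the formula.
    x = (lam + mu) / (2.0 + 2.0 * mu)
    h = 2.0 * x - 1.0                  # constraint violation h(x)
    lam = lam - mu * h                 # multiplier update
x_star = np.array([x, x])
```

The multiplier estimate converges to 1 and the iterates to the true solution (0.5, 0.5), with μ held fixed throughout.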

Sequential Quadratic Programming (SQP) Basic Idea: QPs with constraints are easy – for any guess of the active constraints, you just have to solve a system of equations. So why not solve the general problem as a series of constrained QPs? Which QP should be used?

Trust Region Works Great We only trust the approximation locally, so limit the step to this region by adding a constraint to the QP: min_p ∇ₓL(x_k, λ_k)′p + ½ p′∇²ₓₓL(x_k, λ_k)p subject to the linearized constraints ∇g(x_k)′p + g(x_k) = 0 and the trust region ‖p‖ ≤ Δ_k. No stepsize needed!

Advanced topics Duality Theory – can choose to solve the primal or the dual problem. The dual is always nice, but there may be a “duality gap” if the overall problem is not nice. Nonsmooth optimization – can do the whole thing again on the basis of subgradients instead of gradients.

Subgradient Generalization of the gradient. Definition: Let f: Rⁿ → R be a convex function. A vector g ∈ Rⁿ such that f(y) ≥ f(x) + g′(y − x) for all y is a subgradient of f at x. Example: the hinge loss has subgradients at its kink. [Figure: f(y) and the supporting line f(x) + g′(y − x).]
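The subgradient inequality is easy to check numerically for the hinge loss f(t) = max(0, 1 − t) at its kink t = 1, where any g ∈ [−1, 0] is a subgradient (the sampling grid and tolerance are my own choices):

```python
import numpy as np

def hinge(t):
    return max(0.0, 1.0 - t)

# Check f(y) >= f(x) + g*(y - x) at sampled points y.
def is_subgradient(g, x, ys):
    return all(hinge(y) >= hinge(x) + g * (y - x) - 1e-12 for y in ys)

ys = np.linspace(-3.0, 3.0, 61)
```

Slopes between −1 and 0 pass the check at x = 1, while a slope outside that interval, such as −2, fails it.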