Curved Trajectories towards Local Minimum of a Function
…or How to get all you can from the Taylor Series
Al Jimenez
Mathematics Department
California Polytechnic State University
San Luis Obispo, CA 93407
Summer, 2007
Introduction and Notation
• The Problem: minimize f(x) over x ∈ ℝⁿ
• Derivatives of f: ℝⁿ → ℝ: ∇f(x), ∇²f(x), ∇³f(x), f⁽⁴⁾(x), etc.
• A local min x* is a critical point: ∇f(x*) = 0
• Necessary condition: ∇²f(x*) ≥ 0 (positive semidefinite)
Typical Iterative Methods
• A sequence x_1, x_2, ..., x_k, x_{k+1} is generated from x_0
• Such that f(x_{k+1}) = f(x_k + p_k v_k) < f(x_k)
• With v_k a vector with the property ∇f(x_k)ᵀ v_k < 0, a descent direction
• And p_k > 0 typically approximates a solution of
  minimize over p:  f(x_k + p v_k)
  called the line search or the scalar search
• Proven to converge for smooth functions
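The generic scheme above can be sketched as follows. This is a minimal illustration, not the talk's algorithm: steepest descent stands in for the choice of v_k, and a simple Armijo backtracking rule stands in for the scalar search; all names and tolerances are mine.

```python
import numpy as np

def minimize_descent(f, grad, x0, max_iter=500, tol=1e-8):
    """Generic scheme from the slide: x_{k+1} = x_k + p_k v_k, where v_k is a
    descent direction (here steepest descent) and p_k > 0 comes from an
    approximate scalar search (here Armijo backtracking)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        v = -g                      # descent direction: grad f(x)^T v < 0
        p = 1.0
        # Backtrack until f(x_k + p v_k) < f(x_k) with sufficient decrease.
        while f(x + p * v) > f(x) + 1e-4 * p * g.dot(v):
            p *= 0.5
        x = x + p * v
    return x

# Example: a simple convex quadratic with minimizer (1, 0).
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * x[1]])
x_min = minimize_descent(f, grad, [5.0, 3.0])
```

Each accepted step satisfies the descent property f(x_{k+1}) < f(x_k) of the slide; the curved-trajectory method described later replaces the straight ray x_k + p v_k with a polynomial curve in p.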
Current Methods
• Selecting v_k has a huge effect on the convergence rate:
  – Steepest descent: v_k = −∇f(x_k), 1st order
  – Newton's direction: v_k = −∇²f(x_k)⁻¹ ∇f(x_k), 2nd order, but may not be a descent direction when far from a min
  – Conjugate directions use v_{k−1}, v_{k−2}, ...
  – Quasi-Newton / variable metric also uses v_{k−1}, v_{k−2}, ...
  – High-order tensor models fit prior iteration values
  – The number of derivatives available affects the choice of method
• The scalar search
  – Accuracy of the scalar minimization
  – Quadratic models: “Trust Region”
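The two basic direction choices can be compared directly. A small sketch (my code, assuming the standard Rosenbrock function used later in the talk), checking the descent test ∇f(x_k)ᵀ v_k < 0 for both directions at the standard start point:

```python
import numpy as np

# Rosenbrock's banana function from the results slides (my implementation).
def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

def hess(x):
    return np.array([[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

x0 = np.array([-1.2, 1.0])
g = grad(x0)

v_sd = -g                                 # steepest descent, 1st order
v_newton = -np.linalg.solve(hess(x0), g)  # Newton's direction, 2nd order

# The descent test grad f(x_k)^T v_k < 0 holds for both at this point;
# Newton's direction can fail it where the Hessian is indefinite.
is_descent = (g.dot(v_sd) < 0, g.dot(v_newton) < 0)
```

Here v_newton is exactly the d_2 of the later slides ([0.02472, 0.3807]ᵀ at this point), so this also previews the first term of the curved trajectory.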
The Basic Idea
• We seek a solution to ∇f(x) = 0, calling it x = x*
• By a change of variable x = h(z) (we never really find h(z) directly)
• That results in a new function: ∇f(h(z)) = g(z), the composition function
• Such that g(z) = 0 is easy to solve for a z = z*
Easy to Solve Functions g(z) = 0
• Used for this talk: g(z) = z^p, z* = 0
• Other possibilities: g(z) = e^z − 1, z* = 0
• PhD work showed potential by selecting an appropriate one, based on the function being minimized, from limited explorations
Infinite Series of Solution
x* = h(z_k) − h′(z_k) z_k + ½ h″(z_k) z_k² − (1/6) h‴(z_k) z_k³ + ...

where, differentiating ∇f(h(z)) = z^p repeatedly:

h′(z_k) = ∇²f(x_k)⁻¹ [ p z_k^(p−1) ]
h″(z_k) = ∇²f(x_k)⁻¹ [ p(p−1) z_k^(p−2) − ∇³f(x_k) h′(z_k) h′(z_k) ]
h‴(z_k) = ∇²f(x_k)⁻¹ [ p(p−1)(p−2) z_k^(p−3) − 3 ∇³f(x_k) h′(z_k) h″(z_k) − f⁽⁴⁾(x_k) h′(z_k) h′(z_k) h′(z_k) ]

• These are matrix–vector products, but shown with exponents for the connection with the scalar Taylor series.
Infinite Series of Solution…
• Define:

∇²f(x_k) d_2 = −∇f(x_k)
∇²f(x_k) d_3 = −½ ∇³f(x_k) d_2 d_2
∇²f(x_k) d_4 = −[ ∇³f(x_k) d_2 d_3 + (1/6) f⁽⁴⁾(x_k) d_2 d_2 d_2 ]

• Then:

x* = x_k + p d_2 + [ −½ p(p−1) d_2 + p² d_3 ] + [ (1/6) p(p−1)(p−2) d_2 − p²(p−1) d_3 + p³ d_4 ] + ...

• For p = 1:

x* = x_k + d_2 + d_3 + d_4 + ...
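A scalar sanity check of the p = 1 series (my example, not from the talk): minimize f(x) = eˣ − 2x, whose minimizer is x* = ln 2. With g(z) = z the change of variable is explicit, h(z) = ln(z + 2), and starting at x_k = 0 the corrections d_2, d_3, d_4 reproduce exactly the first terms of the Mercator series ln 2 = 1 − 1/2 + 1/3 − ...

```python
import math

# Minimize f(x) = exp(x) - 2x (minimizer x* = ln 2), starting at x_k = 0.
# Scalar versions of the defining equations:
#   f''(x) d2 = -f'(x)
#   f''(x) d3 = -(1/2) f'''(x) d2^2
#   f''(x) d4 = -( f'''(x) d2 d3 + (1/6) f''''(x) d2^3 )
x = 0.0
f1 = math.exp(x) - 2.0        # f'
f2 = f3 = f4 = math.exp(x)    # f'', f''', f''''

d2 = -f1 / f2
d3 = -(0.5 * f3 * d2 ** 2) / f2
d4 = -(f3 * d2 * d3 + f4 * d2 ** 3 / 6.0) / f2

# p = 1 series: x* ≈ x + d2 + d3 + d4 = 1 - 1/2 + 1/3, the first three
# terms of the series for ln 2 = 0.6931...
approx = x + d2 + d3 + d4
```

Newton's step alone (d_2 = 1) overshoots; each extra term pulls the estimate further toward ln 2, illustrating how the higher-order d_i extract more from the same Taylor data.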
Curved Trajectories Algorithm
• At the kth iteration, estimate ε, then calculate:

∇²f(x_k) d_2 = −∇f(x_k)
∇²f(x_k) d_3 = −(1/ε²) [ ∇f(x_k + ε d_2) − (1 − ε) ∇f(x_k) ]
∇²f(x_k) d_4 = −(1/ε³) [ ∇f(x_k + ε d_2 + ε² d_3) − (1 − ε) ∇f(x_k) ]

• Select the order, modify the d_i, and select p_k
2nd order:  x_{k+1} = x_k + p d_2
3rd order:  x_{k+1} = x_k + (3/2) d_2 p − ½ (d_2 − 2 d_3) p²
4th order:  x_{k+1} = x_k + (11/6) d_2 p − (d_2 − 2 d_3) p² + (1/6)(d_2 − 6 d_3 + 6 d_4) p³
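One iteration of this recipe at the Rosenbrock start point of the next slide can be sketched as follows. This is my NumPy sketch, not the talk's code: d_3 uses the gradient-difference formula with ε = 10⁻⁴ (my choice), while d_4 uses Rosenbrock's analytic third/fourth derivatives instead of the talk's difference formula, so the final point matches the slide's x_1 = [0.1156, 0.1479]ᵀ only to about two digits.

```python
import numpy as np

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0] ** 2)])

def hess(x):
    return np.array([[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

x0 = np.array([-1.2, 1.0])
g, H = grad(x0), hess(x0)

# d2 from the Newton system; d3 from one extra gradient evaluation.
d2 = np.linalg.solve(H, -g)
eps = 1e-4
rhs3 = -(grad(x0 + eps * d2) - (1.0 - eps) * g) / eps ** 2
d3 = np.linalg.solve(H, rhs3)

# d4 from the analytic tensors of Rosenbrock (f_xxx = 2400 x, f_xxy = -400,
# f_xxxx = 2400, all other third/fourth partials zero).
T3_d2d3 = np.array([2400.0 * x0[0] * d2[0] * d3[0]
                    - 400.0 * (d2[0] * d3[1] + d3[0] * d2[1]),
                    -400.0 * d2[0] * d3[0]])
T4_d2d2d2 = np.array([2400.0 * d2[0] ** 3, 0.0])
d4 = np.linalg.solve(H, -(T3_d2d3 + T4_d2d2d2 / 6.0))

# 4th-order curved trajectory, evaluated at the step the algorithm selects.
def x_next(p):
    return (x0 + (11.0 / 6.0) * d2 * p - (d2 - 2.0 * d3) * p ** 2
            + (d2 - 6.0 * d3 + 6.0 * d4) * p ** 3 / 6.0)

x1 = x_next(5.0)
```

Note the cost profile that drives the counters in the later tables: one Hessian factorization is reused for every d_i, and each extra order costs only one more gradient evaluation.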
Results on Rosenbrock's Banana-Shaped Function

f(x, y) = 100(y − x²)² + (1 − x)²

x_0 = [−1.2, 1.0]ᵀ, x* = [1, 1]ᵀ, f(x_0) = 24.2

Gradient: ∇f(x_0) = [−215.6, −88.00]ᵀ

Hessian: ∇²f(x_0) = [ 1330.0  480.0 ; 480.0  200.0 ]

d_2 = [0.02472, 0.3807]ᵀ, d_3 = [0.02444, −0.05805]ᵀ, d_4 = [0.02420, −0.05687]ᵀ

4th-order trajectory:

x_{k+1}(p) = [ −1.2 + p(0.04532 + p(0.02416 + 0.003879 p)) ;
               1.0 + p(0.6979 + p(−0.4968 + 0.06462 p)) ]

• The algorithm selects p = 5:

x_1 = [0.1156, 0.1479]ᵀ, f = 2.59

[Contour plot: the curved trajectory from x_0 (f = 24.2) to x_1, with later iterates x_2, x_3 and contour levels f = 24.2, f = 4, f = 0.5.]
[3D view of "Rosenbrock's banana-shaped valley".]
Raw algorithm output for this run (max iterations 25):

iteration 0: Nfuns/Ngrads/Nhess = 1/1/0; x = [−1.2, 1.0]ᵀ; f = 24.20; ‖x − x*‖ = 2.20; Gradient = [−216, −88.0]ᵀ
iteration 1: Nfuns/Ngrads/Nhess = 7/5/1; order 4, p = 5, h2D = 0; gnorm = 233; k2 = 0.000312; d3normerr = 0.152×10⁻⁸ → x = [0.11558, 0.14786]ᵀ; f = 2.591; ‖x − x*‖ = 1.23
iteration 2: Nfuns/Ngrads/Nhess = 13/8/2; order 3, p = 6, h2D = 1; Gradient = [−7.99, 26.9]ᵀ; gnorm = 28.1; k2 = 0.000364; d3normerr = 0.871×10⁻⁸ → x = [1.08504, 1.17624]ᵀ; f = 0.007344; ‖x − x*‖ = 0.196
iteration 3: Nfuns/Ngrads/Nhess = 17/11/3; order 4, p = 1, h2D = 0; Gradient = [0.631, −0.213]ᵀ; gnorm = 0.666; k2 = 0.000306; d3normerr = 0.0000115 → x = [1.000525, 1.000552]ᵀ; f = 0.00002502; ‖x − x*‖ = 0.000762
iteration 4: Nfuns/Ngrads/Nhess = 20/14/4; order 4, p = 1, h2D = 0; Gradient = [0.200, −0.0995]ᵀ; gnorm = 0.223; k2 = 0.00125; d3normerr = 0.239×10⁻¹⁰ → x = [1.0000004, 1.0000008]ᵀ; f = 0.1545×10⁻¹²; ‖x − x*‖ = 0.864×10⁻⁶
iteration 5: Nfuns/Ngrads/Nhess = 22/16/5; order 3, p = 1, h2D = 0; Gradient = [0.306×10⁻⁵, −0.114×10⁻⁵]ᵀ; gnorm = 0.327×10⁻⁵; k2 = 0.9; d3normerr = 0.637×10⁻¹⁷ → x = [1, 1]ᵀ; f = 0.1351×10⁻³³; ‖x − x*‖ = 0; final Gradient = [0.160×10⁻¹⁵, −0.691×10⁻¹⁶]ᵀ, gnorm = 0.175×10⁻¹⁵
Rosenbrock’s Function

f(x) = 100(x_2 − x_1²)² + (1 − x_1)²
x_0 = [−1.2, 1]ᵀ, x* = [1, 1]ᵀ

| k | #f | #G | #H | Order | p | ‖x_k − x*‖ | f(x_k) | ‖∇f(x_k)‖ |
|---|----|----|----|-------|---|------------|--------|-----------|
| 0 | 1 | 1 | 0 | — | — | 2.2 | 24.2 | 233 |
| 1 | 7 | 5 | 1 | 4 | 5 | 1.23 | 2.59 | 28.1 |
| 2 | 13 | 8 | 2 | 3 | 6 | 0.196 | 0.007344 | 0.666 |
| 3 | 17 | 11 | 3 | 4 | 1 | 0.00076 | 2.5×10⁻⁵ | 0.223 |
| 4 | 20 | 14 | 4 | 4 | 1 | 8.6×10⁻⁷ | 1.5×10⁻¹³ | 3.3×10⁻⁶ |
| 5 | 22 | 16 | 5 | 3 | 1 | 10⁻¹⁶ | 1.4×10⁻³⁴ | 1.8×10⁻¹⁶ |

Table 2. Summary of Curved Trajectories Algorithm performance on the banana-shaped valley function. #f is the number of function evaluations, #G the number of gradient evaluations, and #H the number of Hessian evaluations.
Fletcher and Powell's Helical Valley Function

f(x) = 100[(x_3 − 10θ)² + (r − 1)²] + x_3²

r = √(x_1² + x_2²)
θ = (1/2π) tan⁻¹(x_2/x_1) for x_1 > 0; θ = 0.5 + (1/2π) tan⁻¹(x_2/x_1) for x_1 < 0

x_0 = [−1, 0, 0]ᵀ, x* = [1, 0, 0]ᵀ

| k | #f | #G | #H | Order | p | ‖x_k − x*‖ | f(x_k) | ‖∇f(x_k)‖ |
|---|----|----|----|-------|---|------------|--------|-----------|
| 0 | 1 | 1 | 0 | — | — | 2 | 2500 | 1880 |
| 1 | 3 | 3 | 1 | 3 | 1 | 5.34 | 24.85 | 16.6 |
| 2 | 15 | 6 | 2 | 2 | 0.009 | 4.22 | 21.77 | 73 |
| 3 | 19 | 9 | 3 | 3 | 1 | 2.75 | 10.04 | 43.8 |
| 4 | 23 | 12 | 4 | 4 | 1 | 1.85 | 3.01 | 17 |
| 5 | 27 | 15 | 5 | 4 | 1 | 0.643 | 2.748 | 35.1 |
| 6 | 34 | 19 | 6 | 4 | 1 | 0.133 | 0.2624 | 15.1 |
| 7 | 38 | 22 | 7 | 4 | 1 | 0.00165 | 0.000028 | 0.143 |
| 8 | 42 | 25 | 8 | 4 | 1 | 1.2×10⁻⁸ | 1.3×10⁻¹⁶ | 1.3×10⁻⁷ |
| 9 | 44 | 27 | 9 | 3 | 1 | 1.5×10⁻²³ | 2.4×10⁻⁴⁶ | 2×10⁻²² |

Table 3. Curved Trajectories Algorithm performance on the helical valley function, which is not defined at any point [0 0 s]ᵀ, s ∈ ℝ. This function is quite a challenge given the continuity requirements.
Wood’s Saddle Point Function

f(x) = 100(x_2 − x_1²)² + (1 − x_1)² + 90(x_4 − x_3²)² + (1 − x_3)²
       + 10.1[(x_2 − 1)² + (x_4 − 1)²] + 19.8(x_2 − 1)(x_4 − 1)

x_0 = [−3, −1, −3, −1]ᵀ, x* = [1, 1, 1, 1]ᵀ

| k | #f | #G | #H | Order | p | ‖x_k − x*‖ | f(x_k) | ‖∇f(x_k)‖ |
|---|----|----|----|-------|---|------------|--------|-----------|
| 0 | 1 | 1 | 0 | — | — | 6.32 | 19192 | 16400 |
| 1 | 7 | 5 | 1 | 4 | 4 | 0.386 | 52.25 | 342 |
| 2 | 11 | 8 | 2 | 3 | 2 | 0.286 | 38.26 | 330 |
| 3 | 18 | 12 | 3 | 4 | 2 | 0.784 | 0.6307 | 9.57 |
| 4 | 24 | 16 | 4 | 3 | 0.68 | 0.0566 | 0.4228 | 27.4 |
| 5 | 30 | 20 | 5 | 4 | 2 | 0.00164 | 4×10⁻⁶ | 0.0336 |
| 6 | 34 | 23 | 6 | 4 | 1 | 7×10⁻¹⁰ | 3.5×10⁻¹⁹ | 1.8×10⁻⁸ |
| 7 | 35 | 24 | 7 | 2 | 1 | 10⁻¹⁶ | 2.3×10⁻³⁶ | 3.3×10⁻¹⁷ |

Table 4. Curved Trajectories Algorithm performance on Wood's function, whose saddle point traps many algorithms.
Powell’s Singular Hessian at the Solution

f(x) = (x_1 + 10x_2)² + 5(x_3 − x_4)² + (x_2 − 2x_3)⁴ + 10(x_1 − x_4)⁴

x_0 = [3, −1, 0, 1]ᵀ, x* = [0, 0, 0, 0]ᵀ

| k | #f | #G | #H | Order | p | ‖x_k − x*‖ | f(x_k) | ‖∇f(x_k)‖ |
|---|----|----|----|-------|---|------------|--------|-----------|
| 0 | 1 | 1 | 0 | — | — | 3.32 | 215 | 459 |
| 1 | 7 | 5 | 1 | 4 | 3 | 0.82 | 1.99 | 16.8 |
| 2 | 19 | 10 | 2 | 4 | 2.06 | 0.0421 | 0.000014 | 0.00229 |
| 3 | 26 | 14 | 3 | 4 | 3 | 0.000327 | 5.1×10⁻¹⁴ | 1×10⁻⁹ |
| 4 | 29 | 17 | 4 | 4 | 3 | 4.15×10⁻⁵ | 1.3×10⁻¹⁷ | 2.2×10⁻¹² |
| 5 | 32 | 20 | 5 | 4 | 3 | 5.26×10⁻⁶ | 3.4×10⁻²¹ | 4.5×10⁻¹⁵ |

Table 5. Curved Trajectories Algorithm performance on Powell's function with a singular Hessian at the solution, which means the solution has multiplicity greater than one. The p = 3 selection suggests a multiplicity of 3. Hessian at x*:

∇²f(x*) = [ 2 20 0 0 ; 20 200 0 0 ; 0 0 10 −10 ; 0 0 −10 10 ]
Cragg and Levy’s Function with Exponential, Tangent, Large Exponents, and Singular Hessians

f(x) = (e^{x_1} − x_2)⁴ + 100(x_2 − x_3)⁶ + tan⁴(x_3 − x_4) + x_1⁸ + (x_4 − 1)²

x_0 = [1, 2, 2, 2]ᵀ, x* = [0, 1, 1, 1]ᵀ

| k | #f | #G | #H | Order | p | ‖x_k − x*‖ | f(x_k) | ‖∇f(x_k)‖ |
|---|----|----|----|-------|---|------------|--------|-----------|
| 0 | 1 | 1 | 0 | — | — | 2 | 2.266 | 12.3 |
| 1 | 6 | 4 | 1 | 2 | 0.735 | 1.64 | 1.254 | 9.07 |
| 2 | 13 | 7 | 2 | 4 | 1 | 1.14 | 0.3391 | 4.12 |
| 3 | 19 | 11 | 3 | 4 | 2 | 0.396 | 0.002114 | 0.056 |
| 4 | 27 | 16 | 4 | 3 | 2.15 | 0.245 | 5.7×10⁻⁵ | 0.00344 |
| 5 | 33 | 20 | 5 | 4 | 3 | 0.0176 | 3.4×10⁻⁹ | 1.1×10⁻⁵ |
| 6 | 36 | 23 | 6 | 4 | 3 | 0.00549 | 3.9×10⁻¹³ | 2.09×10⁻⁹ |
| 7 | 39 | 26 | 7 | 4 | 3 | 0.000755 | 7.9×10⁻¹⁷ | 4.5×10⁻¹² |

Table 6. Curved Trajectories Algorithm performance on Cragg and Levy's steep-walls function with singular Hessians:

at x_0:  ∇²f(x_0) = [ 103.2 −16.81 0 0 ; −16.81 6.186 0 0 ; 0 0 0 0 ; 0 0 0 2 ]
at x*:  ∇²f(x*) = [ 0 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ; 0 0 0 2 ]
Initial Comparisons
Iterations, #f, and #G for each algorithm (FR, DFP, B, F, CTAn, CTA) on the five test functions:

| Function | Counter | FR | DFP | B | F | CTAn | CTA |
|----------|---------|----|-----|---|---|------|-----|
| Rosenbrock banana valley | Iterations | 27 | 19 | 35 | 39 | 4 | 4 |
| | #f | 155 | 96 | 51 | 47 | 47 | 26 |
| | #G | 28 | 20 | 36 | 47 | 27 | 14 |
| Fletcher helical valley | Iterations | 36 | 20 | 21 | 35 | 9 | 8 |
| | #f | 202 | 141 | 140 | 42 | 108 | 41 |
| | #G | 37 | 21 | 22 | 42 | 69 | 28 |
| Wood saddle | Iterations | 189 | 57 | 42 | 60 | 7 | 6 |
| | #f | 3288 | 475 | 310 | 61 | 64 | 38 |
| | #G | 190 | 58 | 43 | 61 | 92 | 23 |
| Powell singular Hessian | Iterations | 104 | 36 | 38 | 60 | 4 | 3 |
| | #f | 624 | 434 | 374 | 68 | 38 | 25 |
| | #G | 105 | 37 | 39 | 68 | 71 | 13 |
| Cragg–Levy singular Hessians | Iterations | 39 | 96 | 84 | 82 | 6 | 5 |
| | #f | 221 | 424 | 350 | 91 | 207 | 34 |
| | #G | 40 | 97 | 85 | 91 | 62 | 25 |
CUTEr Performance Profiles
CPU-time Profile (127 problems < 500 variables)

[Performance-profile chart: cumulative distribution (30%–100%) of normalized CPU-time per problem (1–12) for CTA, CTAn, CTAnn, CG Descent, Lancelot, Tenmin, L-BFGS, and L-BFGS-B.]
CUTEr Performance Profiles
CPU-time Profile (51 problems >= 500 variables)

[Performance-profile chart: cumulative distribution (0%–100%) of normalized CPU-time per problem (1–13) for CTA, CTAn, CG Descent, Lancelot, and L-BFGS-B.]
Partial List of Research Pursuits
• Handle several functions to minimize: Pareto optimal point.
• Combine with the Trust-Region method, or other strategies.
• Explore the family of infinite series for combinations of composition functions.
• Handle constraint functions.
• Hessian < 0 changes: Rotations, Rotations 3D.
Conclusions, what’s new:
• Infinite series families for the solution of a nonlinear vector function equation
• High-order terms accurately approximated from the gradient and the Hessian
• Scalar searches that may be along polynomial curved trajectories
• Testing shows considerable promise, even for problems as large as 10,000 variables