Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

References

Hansen, N. "The CMA Evolution Strategy: A Tutorial." November 24, 2010. http://www.lri.fr/~hansen/cmatutorial.pdf

Auger, A. and Hansen, N. "CMA-ES Tutorial Slides for GECCO 2011." http://www.lri.fr/~hansen/gecco2011-CMA-ES-tutorial.pdf

Ros, R. and Hansen, N. "A Simple Modification in CMA-ES Achieving Linear Time and Space Complexity." INRIA, 2008.

Jastrebski, G. and Arnold, D. "Improving Evolution Strategies through Active Covariance Matrix Adaptation." IEEE Congress on Evolutionary Computation, July 2006.

All can be found at: http://www.lri.fr/~hansen/index.html

Search for ‘cma-es’ on Google.
Disclaimer

Many of these slides are simply transcriptions of parts of Hansen's written PDF tutorial or images copied from his GECCO 2011 tutorial slides. During the seminar I tried to explain in more detail what the various copied sections were saying. I don't want it to seem like I wrote them or designed the graphics.
Black Box Optimization


- Find a search point x with as small (or large) a function value as possible.
- Function values of evaluated search points are the only accessible information.
- Gradients are not available.
- Search points can be evaluated freely.
- Search cost is the number of function evaluations.
[Figure: a black box maps search points (parameter vectors) x1, ..., x4 to function values f(x1), ..., f(x4)]
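To make the setting concrete, here is a minimal black-box search loop (plain random search; the sphere objective and all names are illustrative choices of mine, not from the slides):

```python
import numpy as np

def sphere(x):
    """Example objective: only its returned value is visible to the search."""
    return float(np.sum(x ** 2))

def random_search(f, n, budget=1000, seed=0):
    """Black-box search: the search cost is the number of f-evaluations."""
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(budget):
        x = rng.standard_normal(n)   # propose a search point (parameter vector)
        fx = f(x)                    # function value: the only feedback available
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

best_x, best_f = random_search(sphere, n=5)
```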
CMA-ES

Goal is to cope with objective functions that are:

- Non-linear
- Non-separable
- Non-convex
- Multimodal
- Non-smooth
- Ill-conditioned
- Noisy
- Of moderate dimensionality
Separability
Ill-Conditioned
Second-order methods will not help on problems that are separable and contain only first-order terms, since in that case all second-order partial derivatives are zero.
Quasi-Newton Optimization Review

- Approximate the local region around f(x) as a quadratic function and take a step to (or toward) the minimum.
- Uses a second-order Taylor expansion with an approximation of the Hessian matrix.
- If f(x) happens to be quadratic, then we go straight to the minimum.
- BFGS and Levenberg-Marquardt are examples.
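To make the quadratic-model idea concrete, here is a minimal sketch of a single Newton-style step (assuming exact gradient and Hessian oracles on a made-up 2-D quadratic; BFGS and Levenberg-Marquardt instead build Hessian approximations):

```python
import numpy as np

def newton_step(grad, hess, x):
    """Minimize the local quadratic model f(x+d) ~ f(x) + g.d + 0.5 d'Hd:
    the minimizing step d solves H d = -g."""
    return x + np.linalg.solve(hess(x), -grad(x))

# For an exactly quadratic f(x) = 0.5 x'Ax - b'x, one step hits the minimum.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x1 = newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2))
assert np.allclose(x1, np.linalg.solve(A, b))  # straight to the minimizer
```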
CMA-ES Rationale
CMA-ES Overview
How to update m, C, and σ?
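For context (this is the standard setup from Hansen's tutorial): each generation g samples λ candidates x_i = m + σ y_i with y_i ~ N(0, C), and the updates below adapt m, C, and σ from the ranked samples. A minimal sampling sketch (all names are mine):

```python
import numpy as np

def sample_population(m, sigma, C, lam, rng):
    """Draw lam candidates x_i = m + sigma * y_i with y_i ~ N(0, C),
    using the symmetric square root of C from its eigendecomposition."""
    eigvals, B = np.linalg.eigh(C)                   # C = B diag(eigvals) B^T
    sqrt_C = B @ np.diag(np.sqrt(eigvals)) @ B.T     # C^(1/2), symmetric
    Y = rng.standard_normal((lam, len(m))) @ sqrt_C  # rows y_i ~ N(0, C)
    return m + sigma * Y, Y

rng = np.random.default_rng(0)
X, Y = sample_population(np.zeros(3), 0.5, np.eye(3), lam=10, rng=rng)
```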
Mean Update

The new mean is a weighted average of the μ best of the λ sampled points (x_{i:λ} denotes the i-th best):

$$m^{(g+1)} = \sum_{i=1}^{\mu} w_i\, x_{i:\lambda}^{(g+1)}, \qquad \sum_{i=1}^{\mu} w_i = 1, \quad w_1 \ge w_2 \ge \cdots \ge w_\mu > 0$$

Covariance Matrix Update

The empirical covariance matrix uses the sample mean and Bessel's correction; it estimates the variance within the sampled points:

$$C_{\mathrm{emp}}^{(g+1)} = \frac{1}{\lambda-1} \sum_{i=1}^{\lambda} \left( x_i^{(g+1)} - \frac{1}{\lambda}\sum_{j=1}^{\lambda} x_j^{(g+1)} \right) \left( x_i^{(g+1)} - \frac{1}{\lambda}\sum_{j=1}^{\lambda} x_j^{(g+1)} \right)^{\!\top}$$

Using instead the true mean m^{(g)} of the sampling distribution yields an estimator of the variance of the sampled steps:

$$C_\lambda^{(g+1)} = \frac{1}{\lambda} \sum_{i=1}^{\lambda} \left( x_i^{(g+1)} - m^{(g)} \right) \left( x_i^{(g+1)} - m^{(g)} \right)^{\!\top}$$

But we want to estimate a 'better' covariance matrix (one more likely to reproduce successful steps), not just re-estimate the distribution we sampled from.
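A small numerical sketch of the distinction (the distribution and sample size are arbitrary illustrations): `np.cov` uses the sample mean with Bessel's correction, while the second estimator measures deviations from the known mean m^{(g)} of the sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
m_g = np.array([1.0, -2.0])              # true mean of the sampling distribution
X = m_g + rng.standard_normal((10, 2))   # lambda = 10 sampled points

# Variance within the sampled points: sample mean, factor 1/(lambda - 1).
C_emp = np.cov(X, rowvar=False)

# Variance of the sampled steps: known mean m_g, factor 1/lambda.
D = X - m_g
C_lam = D.T @ D / len(X)
```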
Rank-μ-Update

Re-estimating the distribution of only the selected (best μ) steps gives

$$C_\mu^{(g+1)} = \sum_{i=1}^{\mu} w_i \left( x_{i:\lambda}^{(g+1)} - m^{(g)} \right) \left( x_{i:\lambda}^{(g+1)} - m^{(g)} \right)^{\!\top}$$

- To achieve fast search performance the population size must be small, but this estimator is not reliable when the population size is small.
- However, after a sufficient number of generations, the mean of the estimated covariance matrices from all generations becomes a reliable estimate. Each generation's contribution must be adjusted for its global step-size:

$$C^{(g+1)} = \frac{1}{g+1} \sum_{i=0}^{g} \frac{1}{\left(\sigma^{(i)}\right)^{2}}\, C_\mu^{(i+1)}$$
Rank-μ-Update

Assign recent generations larger weight through an exponentially fading record with learning rate c_μ (eq. (14) in Hansen's tutorial):

$$C^{(g+1)} = (1 - c_\mu)\, C^{(g)} + c_\mu \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}^{(g+1)} \left( y_{i:\lambda}^{(g+1)} \right)^{\!\top}, \qquad y_{i:\lambda}^{(g+1)} = \frac{x_{i:\lambda}^{(g+1)} - m^{(g)}}{\sigma^{(g)}}$$

- 1/c_μ is the backward time horizon: roughly 63% of the information in C^{(g+1)} stems from the last 1/c_μ generations.
- Small c_μ → slow learning.
- Large c_μ → the covariance matrix can degenerate.
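A sketch of this update in NumPy (function and argument names are mine; `Y_sel` holds the μ best normalized steps y_{i:λ} = (x_{i:λ} − m)/σ):

```python
import numpy as np

def rank_mu_update(C, Y_sel, weights, c_mu):
    """Blend the old covariance with a weighted sum of outer products
    of the selected normalized steps (rank-mu update)."""
    Z = sum(w * np.outer(y, y) for w, y in zip(weights, Y_sel))
    return (1.0 - c_mu) * C + c_mu * Z
```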
Rank-One-Update

The rank-one update blends a single step's outer product into the covariance matrix (eqs. (19)-(20) in Hansen's tutorial):

$$C^{(g+1)} = (1 - c_1)\, C^{(g)} + c_1\, y^{(g+1)} \left( y^{(g+1)} \right)^{\!\top}, \qquad y^{(g+1)} = \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}}$$

This increases the variance of the search distribution along ±y^{(g+1)}.
The Evolution Path

Because y yᵀ = (−y)(−y)ᵀ, the sign of a step is lost in the rank-one update. The evolution path keeps the sign information by accumulating an exponentially fading sum of successive mean steps (eq. (22)):

$$p_c^{(g+1)} = (1 - c_c)\, p_c^{(g)} + \sqrt{c_c (2 - c_c)\, \mu_{\mathrm{eff}}}\; \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}}$$

Rank-One-Update Using Evolution Path

Using the evolution path in place of the single step (eq. (26)):

$$C^{(g+1)} = (1 - c_1)\, C^{(g)} + c_1\, p_c^{(g+1)} \left( p_c^{(g+1)} \right)^{\!\top}$$
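A sketch of both steps together (names mine; `mean_step` is (m^{(g+1)} − m^{(g)})/σ^{(g)}):

```python
import numpy as np

def evolution_path_rank_one(C, p_c, mean_step, c_c, c_1, mu_eff):
    """Accumulate the evolution path, then apply the rank-one update."""
    p_c = (1 - c_c) * p_c + np.sqrt(c_c * (2 - c_c) * mu_eff) * mean_step
    C = (1 - c_1) * C + c_1 * np.outer(p_c, p_c)
    return C, p_c
```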
Combining Rank-One and Rank-μ-Update

Both terms enter the final covariance matrix update:

$$C^{(g+1)} = (1 - c_1 - c_\mu)\, C^{(g)} + c_1\, p_c^{(g+1)} \left( p_c^{(g+1)} \right)^{\!\top} + c_\mu \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}^{(g+1)} \left( y_{i:\lambda}^{(g+1)} \right)^{\!\top}$$
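The same equation as a sketch, reusing the pieces above:

```python
import numpy as np

def combined_update(C, p_c, Y_sel, weights, c_1, c_mu):
    """Rank-one term from the evolution path plus the rank-mu term
    from the current generation's selected steps."""
    rank_one = np.outer(p_c, p_c)
    rank_mu = sum(w * np.outer(y, y) for w, y in zip(weights, Y_sel))
    return (1.0 - c_1 - c_mu) * C + c_1 * rank_one + c_mu * rank_mu
```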
Why have a global step-size?

The covariance matrix update can change the scale of the distribution only slowly, because the learning rates c_1 and c_μ must stay small for the update to remain stable. A separate global step-size σ lets the overall scale of the search adapt much faster.
The Evolution Path Again?

For step-size control a second evolution path, p_σ, is tracked. It is rescaled by C^{-1/2} so that its expected length under purely random selection equals the expected length of an N(0, I) vector.

Step-Size Control

Cumulative step-size adaptation (CSA) compares the length of this path against its expectation under random selection:

$$p_\sigma^{(g+1)} = (1 - c_\sigma)\, p_\sigma^{(g)} + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_{\mathrm{eff}}}\; \left( C^{(g)} \right)^{-1/2} \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}}$$

$$\sigma^{(g+1)} = \sigma^{(g)} \exp\!\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\left\| p_\sigma^{(g+1)} \right\|}{E\left\| \mathcal{N}(0, I) \right\|} - 1 \right) \right)$$

If the path is longer than expected, consecutive steps are positively correlated and σ is increased; if shorter, steps partly cancel each other and σ is decreased.
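A sketch of CSA (names mine; `C_inv_sqrt` is C^{−1/2}, and the closed-form approximation of E||N(0, I)|| is the one commonly used in CMA-ES implementations):

```python
import numpy as np

def csa_update(sigma, p_sigma, mean_step, C_inv_sqrt, c_sigma, d_sigma, mu_eff, n):
    """Lengthen sigma when the conjugate path is longer than expected
    under random selection; shorten it when the path is shorter."""
    p_sigma = ((1 - c_sigma) * p_sigma
               + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * (C_inv_sqrt @ mean_step))
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # ~ E||N(0, I)||
    sigma *= np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_n - 1))
    return sigma, p_sigma
```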
Defaults

Standard default settings (from Hansen's tutorial): λ = 4 + ⌊3 ln n⌋, μ = ⌊λ/2⌋, with positive, log-decreasing recombination weights.
Termination
Selecting population size

Larger population:

- Better global search performance.
- Slower convergence.

Recommended: a restart strategy with increasing population size (sketched in code below):

- Run until termination.
- Increase the population size by a factor of 2.
- Repeat until the maximum number of fitness evaluations or the runtime limit is met.
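A sketch of such a restart loop, assuming the third-party pycma package (`pip install cma`); I am relying on its `CMAEvolutionStrategy` class and `result` fields as I recall them, so treat this as an outline rather than vetted usage:

```python
import numpy as np
import cma  # third-party pycma package (assumed available)

def sphere(x):
    return float(np.sum(np.asarray(x) ** 2))

def restarts(f, x0, sigma0, lam0=8, max_evals=50_000):
    """Rerun CMA-ES to termination, doubling the population each restart."""
    best_x, best_f, evals, lam = None, float("inf"), 0, lam0
    while evals < max_evals:
        es = cma.CMAEvolutionStrategy(x0, sigma0, {"popsize": lam, "verbose": -9})
        es.optimize(f)                       # run until internal termination
        evals += es.result.evaluations
        if es.result.fbest < best_f:
            best_x, best_f = es.result.xbest, es.result.fbest
        lam *= 2                             # increase population size by 2x
    return best_x, best_f

best_x, best_f = restarts(sphere, x0=np.zeros(5), sigma0=0.5)
```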
Computational and Memory Complexity

Eigendecomposition complexity:

- O(n³); specifically ~9n³ for the Intel Math Kernel Library:
  - sytrd: (4/3)n³
  - orgtr: (4/3)n³
  - pteqr: 6n³

Memory requirement (8 bytes per double; at minimum 2n² values must be stored, so the figures below are per n×n matrix):

- n = 100 → 8 · 100² bytes ≈ 78.13 KB
- n = 1,000 → 8 · 1000² bytes ≈ 7.63 MB
- n = 10,000 → 8 · 10000² bytes ≈ 762.94 MB
Separable CMA-ES

- Update only the diagonal elements of the covariance matrix (sketched below).
- Eigendecomposition reduces to taking the square root of the diagonal elements.
- Time and space complexity reduced to O(n).
- Dependencies between parameters are no longer learned, which hurts performance on non-separable functions.
- For very large n, however, the performance gap between CMA-ES and sep-CMA-ES on non-separable functions diminishes: the number of iterations needed to learn all parameters of the full covariance matrix becomes infeasible.
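A sketch of the diagonal-only rank-μ update (names mine; `c_diag` is the length-n diagonal of C and `Y_sel` holds the selected normalized steps):

```python
import numpy as np

def sep_rank_mu_update(c_diag, Y_sel, weights, c_mu):
    """sep-CMA-ES: only the diagonal of C is kept, so the rank-mu update
    needs only squared coordinates -- O(n) time and O(n) space.
    The 'eigendecomposition' is then just np.sqrt(c_diag)."""
    z = sum(w * y ** 2 for w, y in zip(weights, Y_sel))  # elementwise squares
    return (1.0 - c_mu) * c_diag + c_mu * z
```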
Active CMA-ES

Can the poor solutions from each generation be used?

- Actively reduce variances in unpromising directions instead of waiting for them to decay (sketched below).
- Faster convergence on the discus, ellipsoid, and cigar functions; up to 2×.
- The covariance matrix is no longer guaranteed to be positive definite.
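A loose sketch of the idea (Jastrebski and Arnold's actual update and constants differ; `Y_best`/`Y_worst` hold normalized steps of the best and worst candidates, and `beta` is a hypothetical negative-update rate):

```python
import numpy as np

def active_update(C, Y_best, Y_worst, weights, c_mu, beta):
    """Add variance along successful steps and actively subtract it along
    unsuccessful ones. The subtraction can break positive definiteness,
    so real implementations must guard against that."""
    pos = sum(w * np.outer(y, y) for w, y in zip(weights, Y_best))
    neg = sum(w * np.outer(y, y) for w, y in zip(weights, Y_worst))
    return (1.0 - c_mu) * C + c_mu * pos - beta * neg
```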
Software

http://www.lri.fr/~hansen/cmaes_inmatlab.html

- MATLAB
  - Boundary handling
  - Optional Active/Sep-CMA-ES modifications
- C
  - Includes a sep-CMA-ES option, but doesn't actually reduce the memory requirement
- Java/Python/C++/Fortran