No Free Lunch (NFL) Theorem Presentation by Kristian Nolde

No Free Lunch (NFL) Theorem
Presentation by Kristian Nolde
Many slides are based on a presentation by Y.C. Ho
General notes
Goal:
• Give an intuitive feeling for the NFL theorem
• Present some mathematical background
To keep in mind:
• NFL is an impossibility theorem, such as
  – Gödel's proof in mathematics (roughly: some facts can neither be proved nor disproved within a mathematical system)
  – Arrow's theorem in economics (in principle, perfect democracy is not realizable)
• Thus, practical use is limited?!?
The No Free Lunch Theorem
• Without specific structural assumptions, no optimization scheme can perform better than blind search on average
• But blind search is very inefficient!
• Prob(at least one out of N samples is in the top-n of a search space of size |Q|) ≈ nN/|Q|
  e.g. Prob ≈ 0.001 for |Q| = 10^9, n = 1000, N = 1000
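A minimal Python sketch of this estimate, using the numbers from the slide (the exact expression is 1 - (1 - n/|Q|)^N, which reduces to roughly nN/|Q| when nN << |Q|):

    def prob_top_n(Q, n, N):
        # Probability that at least one of N independent uniform samples
        # falls in the top-n of a search space of size Q.
        return 1.0 - (1.0 - n / Q) ** N

    print(prob_top_n(Q=10**9, n=1000, N=1000))   # ~0.001, i.e. about n*N/|Q|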
Assume a finite World
A finite number of input symbols (x's) and a finite number of output symbols (y's) implies a finite number of possible mappings from input to output (f's): |F| = |Y|^|X|.
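This counting argument is easy to see in a short Python sketch (the symbol sets below are toy examples):

    from itertools import product

    X = ["x1", "x2", "x3"]   # finite set of input symbols
    Y = [0, 1]               # finite set of output symbols

    # Each choice of one output per input is one mapping f: X -> Y.
    mappings = [dict(zip(X, outs)) for outs in product(Y, repeat=len(X))]
    print(len(mappings))     # 8 = |Y| ** |X|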
The Fundamental Matrix F

            f1   f2   ...  f|F|
    x1       0    0   ...     1
    x2       0    0   ...     1
    ...     ...  ...  ...   ...
    x|X|     0    1   ...     1

In each row, each value of Y appears |Y|^(|X|-1) times!
FACT: equal number of 0's and 1's in each row!
Averaged over all f, the value is independent of x!
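These two facts can be checked numerically; a minimal Python sketch, assuming binary outputs and a toy |X| = 3:

    from itertools import product

    num_x = 3
    columns = list(product([0, 1], repeat=num_x))  # one tuple per column f of F
    for i in range(num_x):                         # i indexes the row x_i
        row = [f[i] for f in columns]
        # Each value appears |Y|**(|X|-1) = 4 times, so the row average
        # is 0.5 regardless of which x the row belongs to.
        print(row.count(0), row.count(1), sum(row) / len(row))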
Compare Algorithms
• Think of two algorithms, a1 and a2:
  e.g. a1 always selects from x_1 to x_{0.5|X|},
       a2 always selects from x_{0.5|X|} to x_{|X|}
• For a specific f, a1 or a2 may be better. However, if f is not known, the average performance of both is equal:

    Σ_f P(d^y | f, a1) = Σ_f P(d^y | f, a2)

where d is a sample and d^y is the cost value associated with d.
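The equality can be verified by brute force for a tiny search space; a sketch, where "performance" is simplified to the best (lowest) cost value an algorithm sees among its samples:

    from itertools import product

    num_x = 4
    half = num_x // 2
    a1_points = range(0, half)       # a1 only ever samples x_1 .. x_{0.5|X|}
    a2_points = range(half, num_x)   # a2 only ever samples the other half

    total1 = total2 = 0
    for f in product([0, 1], repeat=num_x):     # every column of F
        total1 += min(f[i] for i in a1_points)  # best cost a1 finds
        total2 += min(f[i] for i in a2_points)  # best cost a2 finds

    print(total1, total2)   # identical: averaged over all f, a1 and a2 tie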
Comparing Algorithms Continued
• Case 1: Algorithms can be more specific, e.g. a1 assumes a certain realization fk
• Case 2: Or they can be more general, e.g. a2 assumes a more uniform distribution over the possible f
• Then the performance of a1 will be excellent for fk but catastrophic in all other cases (great performance, no robustness)
• In contrast, a2 performs mediocrely in all cases but never fails (poor performance, high robustness)
Common sense says:
  Robustness × Efficiency = Constant
or
  Generality × Depth = Constant
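A toy numeric illustration of the two cases (the payoff vectors are invented for the example):

    # One payoff per possible f; higher is better.
    specialist = [10, 0, 0, 0, 0]   # a1: excellent on f_k, catastrophic elsewhere
    generalist = [2, 2, 2, 2, 2]    # a2: mediocre everywhere, never fails

    print(sum(specialist) / len(specialist))  # 2.0
    print(sum(generalist) / len(generalist))  # 2.0: same average performance,
                                              # very different robustness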
Implication 1
• Let x be the optimization variable, f the performance function, and y the performance, i.e., y = f(x)
• Then, averaged over all possible optimization problems, the result is independent of the choice of x
• If you don't know the structure of f (which column of F you are dealing with), blind choice is as good as any!
Implications 2
• Let X be the space of all possible representations (as in genetic algorithms), or the space of all possible algorithms to apply to a class of problems
• Without an understanding of the problem, blind choice is as good as any.
• "Understanding" means you know which column of the F matrix you are dealing with
Implications 3
• If you know which column or group of columns you are dealing with, you can specialize the choice of rows
• But you must accept that you will suffer LOSSES should other columns occur due to uncertainties or disturbances
The Fundamental Matrix F

            f1   f2   ...  f|F|
    x1       0    0   ...     1
    x2       0    0   ...     1
    ...     ...  ...  ...   ...
    x|X|     0    1   ...     1

Assume a distribution over the columns, then pick the row that results in minimal expected loss or maximal expected performance. This is stochastic optimization.
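A minimal sketch of this row-picking step, with a made-up matrix size and a made-up column distribution:

    from itertools import product

    num_x = 3
    F_columns = list(product([0, 1], repeat=num_x))   # all 8 mappings f

    # Assumed (made-up) non-uniform distribution over the columns of F:
    weights = [1, 1, 4, 1, 1, 4, 1, 1]
    p = [w / sum(weights) for w in weights]

    # Expected performance y = f(x) for each row x; pick the best row.
    expected = [sum(p[j] * F_columns[j][i] for j in range(len(F_columns)))
                for i in range(num_x)]
    best_row = max(range(num_x), key=lambda i: expected[i])
    print(expected, "-> pick row", best_row)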
Implications 5
• Worse, if you estimate the probabilities incorrectly, then your stochastically optimized solution may suffer catastrophically bad outcomes more frequently than you would like.
• Reason: you have already used up more of the good outcomes in your "optimal" choice. What is left are bad ones that are not supposed to occur! (cf. HOT design and power laws, Doyle)
Implications 6
• Generality for generality's sake is not very fruitful
• Working on a specific problem can be rewarding
• Because:
  – the insight can be generalized
  – the problem is practically important
  – the 80-20 effect