Local Search and Optimisation
Introduction: global versus local
Study of the key local search techniques
Concluding Remarks
Introduction: Global versus Local search
Global versus Local search
Global search:
interest: find a path to a goal
properties:
search through partial paths
in a systematic way (consider all paths – completeness)
opportunity to detect loops
Global versus Local search (2)
Local search:
interest: find a goal,
or a state that maximizes/minimizes some objective function.
4-queens example:
interest in the solutions
not in the way we find them
Global versus Local search (3)
Local search:
interest: find a goal,
or a state that maximizes/minimizes some objective function.
rostering:
objective function: estimates quality of roster
optimize the objective function
Is the path relevant or not?
The 8-puzzle: path relevant
Chess: path relevant
Water jugs: path relevant
Traveling sales person: could be both *
Symbolic integrals: path relevant
Blocks planning: path relevant
n-queens puzzle: not relevant
Rostering: not relevant
Traveling sales person:
Representation is a partial sequence:
(New York, Boston)
Global search !
Representation is a potential solution:
(New York, Boston, Miami, SanFran, Dallas, New York)
Local search!
- the path is encoded in every state
- just find a good/optimal state
General observations on
Local Search:
Applicable if path to solution is not important
but see comment on TSP
Keeps only 1 (or a fixed k) state(s).
k for local beam search and genetic algorithms
Most often, does not systematically investigate all
possibilities
and as a result, may be incomplete or suboptimal
Does not avoid loops
unless explicitly designed to (e.g. Tabu search) or
included in the state representation
Is often for optimization of an objective function
Local Search Algorithms:
Hill Climbing (3) (local version)
Simulated Annealing
Local k-Beam Search
Genetic Algorithms
Tabu Search
Heuristics and Metaheuristics
Hill-Climbing (3)
or Greedy local search
The really “local-search” variant of
Hill Climbing
Hill Climbing (3) algorithm:
Let h be the objective function

State := S;
STOP := False;
WHILE not STOP DO
    Neighbors := successors(State);
    IF max(h(Neighbors)) > h(State)
        THEN State := maximal_h_neighbor;
        ELSE STOP := True;
Return State
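As a sketch, the loop above can be written as runnable Python (the toy `h` and `successors` below are illustrative, not from the slides):

```python
def hill_climb(state, successors, h):
    """Greedy local search: move to the best neighbor as long as h improves."""
    while True:
        neighbors = successors(state)
        best = max(neighbors, key=h, default=state)
        if not neighbors or h(best) <= h(state):
            return state          # local maximum (or plateau edge): STOP
        state = best

# Toy objective: maximize h(x) = -(x - 7)**2 over integer states.
h = lambda x: -(x - 7) ** 2
successors = lambda x: [x - 1, x + 1]
print(hill_climb(0, successors, h))   # climbs to 7
```

Note that the loop stops at the first state no neighbor improves on, which is exactly why local maxima and plateaus are a problem.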
This is Hill Climbing 2, but without keeping paths; it can equally be used for minimization (see 8-queens).
The problems:
Foothills (local maxima)
Plateaus
Ridges
More properties:
Termination? Yes, if h is bounded (in the relevant direction) and there is a minimal step size in the h-function.
Completeness? No!
Case study: 8-queens
State: (n1, n2, …, n8)
h = the number of pairs of queens attacking each other in the state
In the example state shown: h = 17
Minimization!
8-queens (cont.)
Neighbors of (n1, n2,…., n8):
obtained by changing only 1 ni
56 neighbors
a local minimum: h = 1
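The state representation, objective h, and 56-move neighborhood from these slides can be sketched in Python as follows (ni is the row of the queen in column i):

```python
from itertools import combinations

def h(state):
    """Number of pairs of queens attacking each other (to be minimized)."""
    return sum(1 for (i, a), (j, b) in combinations(enumerate(state), 2)
               if a == b or abs(a - b) == abs(i - j))

def neighbors(state):
    """All states reachable by moving one queen within its column: 8*7 = 56."""
    out = []
    for col in range(8):
        for row in range(1, 9):
            if row != state[col]:
                out.append(state[:col] + (row,) + state[col + 1:])
    return out

s = (1, 2, 3, 4, 5, 6, 7, 8)      # all queens on one diagonal
print(h(s))                        # 28: every pair attacks
print(len(neighbors(s)))           # 56
```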
How well does it work?
For 8-queens: 14% success, 86% stuck in a local minimum.
But how to improve the success rate?
Plateaus: sideway moves
At a plateau: allow a move to an equal-h neighbor.
Danger: non-termination!
So allow only a maximum number of consecutive sideway moves (say: 100).
Result:
Success rate:
94%: success
6%: local minimum
Variants on HC (3)
Stochastic Hill Climbing:
Move to a random neighbor with a better h
Gets more solutions / but is slower
First-choice Hill Climbing:
Move to the first-found neighbor with a better h
Useful if there are VERY many neighbors
Guaranteed completeness:
Random-Restart Hill Climbing
If HC terminates without producing a solution:
Then restart HC with a random new initial state
If there are only finitely many states, and
If each HC terminates, then:
RRHC is complete (with probability 1)
Analysis of RRHC:
Pure RRHC:
If HC has a probability p of reaching success, then we need on average 1/p restarts in RRHC.
For 8-queens:
- p = 0.14, so 1/p ≈ 7 iterations
- Average cost: (6 failures * 3 steps) + (1 success * 4 steps) = 22 steps
With sideway moves added:
For 8-queens:
- p = 0.94, so 1/p ≈ 1.06 iterations, of which 1/p − 1 = 0.06/0.94 fail
- Average cost: (0.06/0.94 failures * 64 steps) + (1 success * 21 steps) ≈ 25 steps
Conclusion?
Local Search is steadily replacing other solvers in many domains, including optimization problems in ML.
Simulated Annealing
Kirkpatrick et al. 1983
Simulate the process of annealing
from metallurgy
Motivations:
1) HC (3): best moves
fast, but stuck in local optima
Stochastic HC: random moves
slow, but complete
Combine !
2) RRHC: restart after failure
why wait until failure?
include ‘jumps’ during the process !
At high ‘temperature’ : frequent big jumps
At low ‘temperature’ : few smaller ones
3) Get ping-pong ball to deepest hole, by rolling ball
and shaking surface
The algorithm:

State := S;
FOR Time = 1 to ∞ DO
    Temp := DecreaseFunction(Time);
    IF Temp = 0 THEN Return State;
    ELSE Next := random_neighbor(State);
        Δh := h(Next) – h(State);
        IF Δh > 0 THEN State := Next;
        ELSE State := Next with probability e^(Δh/Temp)
End_FOR

For a slowly decreasing temperature, this reaches the global optimum (with probability 1).
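A minimal Python sketch of this algorithm; the toy objective and the linear cooling schedule are assumptions for illustration:

```python
import math, random

def simulated_annealing(state, neighbor, h, schedule):
    """Maximize h: always accept uphill moves, downhill with prob e^(dh/T)."""
    for t in range(1, 10_000):
        T = schedule(t)
        if T <= 0:
            return state
        nxt = neighbor(state)
        dh = h(nxt) - h(state)
        if dh > 0 or random.random() < math.exp(dh / T):
            state = nxt
    return state

# Toy run: maximize h(x) = -(x - 7)**2; frequent big jumps while T is high,
# nearly pure hill climbing once T is low.
random.seed(0)
best = simulated_annealing(
    0,
    neighbor=lambda x: x + random.choice([-1, 1]),
    h=lambda x: -(x - 7) ** 2,
    schedule=lambda t: max(0.0, 5.0 - 0.01 * t),
)
print(best)
```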
Local k-Beam Search
Beam search, but without keeping partial paths.

Local k-Beam Search ≠ k parallel HC(3) searches:
the k new states are the k best of ALL the neighbors.

Stochastic Beam Search: pick the k successors at random, biased towards higher h.
Genetic Algorithms
Holland 1975
Search inspired by evolution theory
General Context
Similar to stochastic k-beam search
keeps track of k States
Different:
generation of new states is “sexual”
Crossover
In addition has: selection and mutation
States must be represented as strings over some
alphabet
e.g. 0/1 bits or decimal numbers
Objective function is called fitness function
8-queens example:
State representation: a string of 8 numbers in [1,8]
Population: set of k states -- here: k=4
8-queens (cont.)
Step 1: Selection:
Fitness function applied to population
Probability of being selected:
proportional to fitness
Select: k/2 pairs of states
8-queens (cont.)
Step 2: Crossover:
Select random crossover point
here: 3 for pair one, 5 for pair two
Crossover applied to the strings
8-queens (cont.)
Step 3: Mutation:
With a small probability:
change a string member to a random value
The algorithm:
Given: Fit (a fitness function)

Pop := the set of k initial states;
REPEAT
    New_Pop := {};
    FOR i = 1, k DO
        x := RandomSelect(Pop, Fit);
        y := RandomSelect(Pop, Fit);
        child := crossover(x, y);
        IF (small_random_probability)
            THEN child := mutate(child);
        New_Pop := New_Pop U {child};
    End_For
    Pop := New_Pop;
UNTIL a member of Pop is fit enough, or time is up

(Different from the example, where both crossovers were used.)
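A Python sketch of this GA for 8-queens; the population size, mutation rate, and the fitness function counting non-attacking pairs are illustrative choices, not prescribed by the slides:

```python
import random

def fit(state):
    """Number of NON-attacking pairs (maximize; 28 means a solution)."""
    attacks = sum(1 for i in range(8) for j in range(i + 1, 8)
                  if state[i] == state[j] or abs(state[i] - state[j]) == j - i)
    return 28 - attacks

def genetic_algorithm(pop, fit, mutate_rate=0.05, generations=200):
    """One child per pair: fitness-proportional selection, 1-point crossover."""
    k = len(pop)
    for _ in range(generations):
        weights = [fit(s) for s in pop]
        new_pop = []
        for _ in range(k):
            x, y = random.choices(pop, weights=weights, k=2)  # selection
            cut = random.randrange(1, len(x))                 # crossover point
            child = x[:cut] + y[cut:]
            if random.random() < mutate_rate:                 # mutation
                pos = random.randrange(len(child))
                child = child[:pos] + (random.randint(1, 8),) + child[pos + 1:]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fit)
        if fit(best) == 28:           # a solution: stop early
            return best
    return max(pop, key=fit)

random.seed(1)
pop = [tuple(random.randint(1, 8) for _ in range(8)) for _ in range(20)]
result = genetic_algorithm(pop, fit)
print(result, fit(result))
```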
Comments on GA
Very many variants – this is only one instance!
keep part of Pop, different types of crossover, …
What is added value?
If the encoding is well-constructed: substrings may
represent useful building blocks
Ex: (246*****) is a useful pattern for 8-queens
Ex Circuit design: some substring may represent a
useful subcircuit
Then crossover may produce more useful states !
In general: advantages of GA are not well understood
Interpretation of crossover:
If we change our representation from 8 decimal numbers
to 24 binary digits, how does the interpretation change?
Tabu Search
Glover 1986
Another way to get HC out of local minima
Tabu = forbidden
In order to get HC out of a local maximum:
Naïve idea: allow one/some moves downhill.
Problem: when switching back to HC, it will just move back!
TabuList: where you are forbidden to go next.
The Tabu search idea:
Keep a list TabuList with information on which new states are not allowed.
Example: queen n3 was just moved from row 2 to row 6. Add to TabuList:
(n3, 6, 2): don't make the opposite move, or
(n3, 2): don't place n3 back on 2, or
(n3): don't move n3
Hoped-for effect (visualized): the TabuList determines an area where NOT to move back.
The TabuList is kept short: only recent history determines it.
The algorithm:
Given: Fit (a fitness function)

State := S;
Best := S;
TabuList := {};
WHILE not(StopCondition) DO
    Candidates := {};
    FOR every Child in Neighbors(State) DO
        IF not_forbidden(Child, TabuList)
            THEN Candidates := Candidates U {Child};
    Succ := Maximal_Fit(Candidates);
    TabuList := TabuList U {ExcludeCondition(Succ, State)};
    State := Succ;
    IF Fit(Succ) > Fit(Best) THEN Best := Succ;
    Eliminate_old(TabuList);
Return Best;
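A minimal Python sketch of this scheme, using the visited states themselves as the tabu attribute (one of the options above); the toy landscape is an assumption for illustration:

```python
def tabu_search(state, neighbors, fit, tenure=5, max_iters=200):
    """Move to the best non-tabu neighbor (possibly downhill); remember the
    incumbent Best and forbid recently visited states for `tenure` moves."""
    best = state
    tabu = []                              # short-term memory: recent states
    for _ in range(max_iters):
        candidates = [c for c in neighbors(state) if c not in tabu]
        if not candidates:
            break
        succ = max(candidates, key=fit)    # may be downhill: escapes local maxima
        tabu.append(succ)
        if len(tabu) > tenure:             # only recent history determines it
            tabu.pop(0)
        state = succ
        if fit(succ) > fit(best):
            best = succ
    return best

# Toy landscape: local maximum at x=3 (fit 5), global maximum at x=10 (fit 9).
landscape = {x: v for x, v in enumerate([0, 2, 4, 5, 4, 3, 4, 6, 7, 8, 9])}
result = tabu_search(0, lambda x: [y for y in (x - 1, x + 1) if y in landscape],
                     landscape.get)
print(result)   # 10 -- plain hill climbing would stop at 3
```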
Example: Personnel Rostering
Initialise to satisfy the
required amounts
Allow only vertical swaps
(neighbors)
If a swap has influenced
a certain region of the
timetable, do not allow
any other swap to
influence this region for
a specified number of
moves (Tabu list, Tabu
attributes)
[Roster table: employees Pjotr, Ludwig, Clara, Hildegard, Wolfgang, Guiseppe, Antonio and Johann assigned shift types (A/C/T/R/F) over shifts 1–7, with the required capacities per shift for each qualification: Arranger, Tonesetter, Composer, Reader.]
Heuristics and Meta-heuristics
More differences with Global search
Examples of problems and heuristics
Meta-heuristics
Heuristics
In Global search: h: States → ℕ
In Local search:
- How to represent a State?
- How to define the Neighbors?
- How to define the objective or Fit function?
All of these are heuristic choices that influence the search VERY much.
Finding routes:
• Given a weighted graph, (V,E), and two vertices,
‘source’ and ‘destination’, find the path from
source to destination with the smallest
accumulated weight.
• Dijkstra: O(|V|² + |E|) (for sparse graphs: O(|E| log |V|))
The objective function:
In general: many different functions are possible.

Stock cutting:
Cut a given set of shapes (× 10000) out of rectangular sheets with fixed dimensions.
• NP-complete, even in one dimension (pipe-cutting)
Objective function? Minimize waste!
Personnel rostering
Consists of assignments of employees to working shifts while satisfying all constraints.
Constraints:
• Shifts have start times and end times
• Employees have a qualification
• A required capacity per shift is given per qualification
• Employees can work subject to specific regulations
• …
Objective function?
Just solve the problem (CP)
Number of constraints violated
Weighted number of constraints violated
Amount under assignment
Amount over assignment
This may lead to the definition of a goal function (representing a lot of domain information).
Neighbors for Rostering
One can easily think of:
Swaps
Removals
Insertions
‘Large Swaps’
These ‘easy’ options do depend on the domain.
They define ‘steps’ in a ‘solution space’ with an associated change in the goal function.
One obvious heuristic is a hill-climber based on a selection of these possible steps.
Traveling Sales Person:
Neighbors:
n cities → n! routes
The 2-change move connects them all!
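The 2-change (segment reversal) neighborhood can be sketched in Python; the five-city tour reuses the city names from the earlier slide and, for simplicity, ignores the closing leg:

```python
def two_change_neighbors(tour):
    """All tours obtained by reversing one contiguous segment (2-change)."""
    n = len(tour)
    out = []
    for i in range(n - 1):
        for j in range(i + 2, n):
            out.append(tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:])
    return out

tour = ("NewYork", "Boston", "Miami", "SanFran", "Dallas")
nbrs = two_change_neighbors(tour)
print(len(nbrs))   # 6 segment reversals for 5 cities
```

Since any permutation can be sorted by repeated segment reversals, this small neighborhood connects all n! routes.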
Meta-heuristics
All the methods, HC, RRHC, Sim.Ann., GA, Tabu
Search, … are meta-heuristics
They provide frameworks in which the user can plug
in heuristics.
At a higher level:
Meta-heuristics can be combined:
use algorithm 1 until condition 1 holds
then, use algorithm 2 until condition 2 holds, ….
Specific combinations are known to work well for
certain types of problems.
ML is used to ‘learn’ which algorithms work better in
which problems.
Concluding remarks
Local Search in Continuous Spaces
Variable Neighborhoods Search
Relation to BDA: some pointers
Continuous Search Spaces
Some basic ideas
In 1-dimension: the derivative
All problems studied so far were for discrete search spaces!
How is HC different for continuous spaces?

At a point x1 with dh/dx(x1) > 0, the derivative points in the ascending direction.
So let HC move in the direction of dh/dx; for instance: x := x + a · dh/dx.
At a point x2 with dh/dx(x2) < 0, the vector still points in the ascending direction!
Eventually we get to a (local) maximum x3, where dh/dx(x3) = 0.
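The update rule x := x + a · dh/dx can be sketched as runnable Python; the step size, iteration cap, and toy objective are illustrative assumptions:

```python
def gradient_ascent(x, dh_dx, a=0.1, steps=1000, tol=1e-8):
    """Repeat x := x + a * dh/dx(x) until the derivative (almost) vanishes."""
    for _ in range(steps):
        g = dh_dx(x)
        if abs(g) < tol:       # dh/dx ~ 0: a (local) maximum
            break
        x = x + a * g
    return x

# h(x) = -(x - 3)**2 has dh/dx = -2(x - 3); the maximum is at x = 3.
x_star = gradient_ascent(0.0, lambda x: -2 * (x - 3))
print(round(x_star, 6))   # 3.0
```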
In n-dimensions: the gradient
The direction of the strongest ascent is the gradient:
∇h = (∂h/∂x1, ∂h/∂x2, …, ∂h/∂xn)
This gives gradient ascent / gradient descent approaches.
Example: airport placement
Place an airport nearest to n given cities C1, C2, …, Cn (minimize the sum of squared distances):
h(x, y) = Σi=1..n (x − xCi)² + Σi=1..n (y − yCi)²
dh/dx = 2 Σ (x − xCi)
dh/dy = 2 Σ (y − yCi)
Solve: Σ (x − xCi) = 0, Σ (y − yCi) = 0
Iterative method: Newton-Raphson converges to the roots.
Solution: x = Σ (xCi) / n, y = Σ (yCi) / n
Obviously correct: the center of the x- and y-coordinates.
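The closed-form solution can be checked against a plain gradient-descent run; the city coordinates below are hypothetical, chosen only to illustrate:

```python
# Hypothetical city coordinates, just to check the closed-form answer.
cities = [(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)]
n = len(cities)

# Closed form from setting the partial derivatives to zero: the centroid.
x = sum(cx for cx, _ in cities) / n
y = sum(cy for _, cy in cities) / n

# Gradient descent on h(x, y) = sum (x - xCi)^2 + sum (y - yCi)^2:
gx, gy = 10.0, -10.0                      # arbitrary start
for _ in range(500):
    dh_dx = 2 * sum(gx - cx for cx, _ in cities)
    dh_dy = 2 * sum(gy - cy for _, cy in cities)
    gx, gy = gx - 0.05 * dh_dx, gy - 0.05 * dh_dy

print((x, y))                              # (2.0, 2.0)
print((round(gx, 6), round(gy, 6)))        # converges to the same point
```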
Variable Neighborhood Search
Exploit various different ways of defining
neighborhoods to move out of local optima
Variable neighborhood search
(Mladenović and Hansen 1997)
Facts:
A local minimum with respect to one neighborhood
structure is not necessarily so for another.
A global minimum is a local minimum with respect to
all neighborhood structures
For many problems local minima with respect to one
or several neighbourhoods are relatively close to
each other
Idea: use different neighborhoods.
By moving to a different neighborhood you may get out of the local optimum!
Define a number of different neighborhoods (different ways to compute successors).
If you cannot get out of a local optimum in one, try the next neighborhood.
Algorithm
Select a set of neighbourhood structures Nl (l = 1 to lmax)

State := S;
l := 1;
REPEAT until termination condition met:
    Exploration: find the best neighbour Succ of State in Nl(State);
    Acceptance:
        IF h(Succ) > h(State) THEN
            State := Succ;
            l := 1;
        ELSE
            l := l + 1;
            IF l > lmax THEN l := 1;
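A minimal Python sketch of this loop (best-improvement exploration); the toy landscape and the two neighborhood structures are illustrative assumptions:

```python
def variable_neighborhood_search(state, neighborhoods, h, max_iters=100):
    """Cycle through neighborhood structures; restart from N1 on improvement."""
    l = 0
    for _ in range(max_iters):
        candidates = neighborhoods[l](state)
        succ = max(candidates, key=h, default=state)
        if h(succ) > h(state):
            state, l = succ, 0            # improvement: back to first structure
        else:
            l = (l + 1) % len(neighborhoods)
    return state

# Toy landscape: small steps get stuck at x=4; the big-step neighborhood
# jumps out of that local optimum to the global one at x=10.
h = {x: v for x, v in enumerate([0, 1, 2, 3, 4, 1, 1, 1, 1, 1, 9])}.get
small = lambda x: [y for y in (x - 1, x + 1) if h(y) is not None]
large = lambda x: [y for y in (x - 6, x + 6) if h(y) is not None]
result = variable_neighborhood_search(0, [small, large], lambda x: h(x))
print(result)   # 10
```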
Broad subdomain
Many variants exist!
Possible topic for your presentation: find a paper on a variable neighborhood search technique or application.
Relation to BDA: some pointers
Examples
Sub-modularity

Discrete example:
Given A, a set of 30 possible features to diagnose the flu:
diarrhea, temp > 38, wife had the flu, coughs, …
Given h, a function from 2^A → ℕ, giving how well this subset allows to discriminate flu versus not-flu (e.g. 42% precision).
Find the best discriminating subset with 5 elements.
Discrete example (continued):
Start with the empty subset,
add the one element that increases h the most,
then add the next element that increases h the most,
etc.
This is Hill Climbing!!
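This greedy forward selection can be sketched in Python; the coverage-style objective and the feature-to-patient sets below are hypothetical stand-ins for the flu features:

```python
def greedy_select(items, h, k):
    """Forward greedy: repeatedly add the element with the largest gain in h."""
    chosen = set()
    for _ in range(k):
        gains = {s: h(chosen | {s}) - h(chosen) for s in items - chosen}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        chosen.add(best)
    return chosen

# Hypothetical objective: h(S) = number of patients flagged by at least one
# feature in S (a coverage function, which is submodular).
covers = {"temp>38": {1, 2, 3}, "coughs": {3, 4}, "diarrhea": {4, 5}, "contact": {1}}
h = lambda S: len(set().union(*(covers[f] for f in S)) if S else set())
print(greedy_select(set(covers), h, 2))
```

Greedy picks "temp>38" first (gain 3), then "diarrhea" (gain 2), covering all five patients.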
Continuous example:
Find the vector/line that best discriminates the + examples from the − examples.
E.g.: maximize the minimal distance to the points.
A continuous optimization problem, solvable e.g. by continuous local search.
In the discrete case: Submodularity
The property of Diminishing Returns:
If A ⊆ B and s ∈ S \ B, then
h(A ∪ {s}) − h(A) ≥ h(B ∪ {s}) − h(B)
Submodularity holds for MANY objectives in ML!!
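The diminishing-returns inequality can be checked exhaustively for a small coverage function (coverage objectives are a classic submodular family; the sets here are illustrative):

```python
from itertools import combinations

# Coverage function on a tiny universe: h(S) = |union of the sets named in S|.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
h = lambda S: len(set().union(*(sets[x] for x in S)) if S else set())

# Check: for every A subset of B and s outside B,
#   h(A | {s}) - h(A)  >=  h(B | {s}) - h(B)
names = set(sets)
ok = all(
    h(A | {s}) - h(A) >= h(B | {s}) - h(B)
    for r_b in range(len(names) + 1)
    for B_t in combinations(sorted(names), r_b)
    for B in [set(B_t)]
    for r_a in range(len(B) + 1)
    for A_t in combinations(sorted(B), r_a)
    for A in [set(A_t)]
    for s in names - B
)
print(ok)   # True
```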
Relevance of HC for ML:
In ML, if h is a (monotone) submodular function:
Theorem: if Greedy Local search returns a size-k subset A_greedy, then
h(A_greedy) ≥ (1 − 1/e) · max { h(A) : A ⊆ S, |A| = k }   (1 − 1/e ≈ 63%)
IF P ≠ NP: this is the very best one can hope for (in polynomial time).
VERY many problems in ML are submodular!!
Local search is very relevant for ML.
Reading assignment and presentations:
Applications of Local Search
Other Local Search or variants of the studied methods
Applications of Local Search in ML
Further aspects of Submodularity
For the coming SAT-solving:
MAX-SAT solving
Mini-Sat
Further relations between SAT and Local Search
Start with Google and Wiki
Study at least one “real”/scientific source
Provide references for your sources.