Review for Finals 2011
159.302
SEARCH
CONSTRAINT SATISFACTION PROBLEMS
GAMES
FUZZY LOGIC
NEURAL NETWORKS
GENETIC ALGORITHM (not included in the exam)
LOGIC (not included in the exam)
Allotment of marks
SEARCH – 15 marks
FUNDAMENTALS (true or false) – 8 marks
CONSTRAINT SATISFACTION PROBLEMS – 10 marks
GAMES – 8 marks
FUZZY LOGIC – 7 marks
NEURAL NETWORKS – 12 marks
Total = 60 marks
Search
Robot Navigation: Obstacle Avoidance, Target Pursuit, Opponent Evasion
Input: multiple obstacles (x, y, angle); the target's x, y, angle
Output: robot angle, speed
Cascade of Fuzzy Systems
Multiple fuzzy systems employ the various robot behaviours; a sketch of the switching logic follows below.
[Diagram: cascade of fuzzy systems]
• Path Planning Layer: the A* algorithm produces the next waypoint.
• Fuzzy System 1 (Target Pursuit): adjusted angle.
• Fuzzy System 2 (Speed Control for Target Pursuit): adjusted speed.
• If ObstacleDistance < MaxDistanceTolerance and the obstacle is closer than the target, switch to obstacle avoidance; otherwise keep pursuing the target.
• Fuzzy System 3 (Obstacle Avoidance): adjusted angle.
• Fuzzy System 4 (Speed Control for Obstacle Avoidance): adjusted speed.
• Central Control passes the adjusted angle and speed to the actuators.
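The cascade's switching step can be written in a few lines of Python. This is only an illustrative sketch: the function names, the stub behaviours and MAX_DISTANCE_TOLERANCE are placeholders, not part of the course material.

# Placeholder behaviours standing in for the four fuzzy systems (illustrative only).
def fuzzy_target_angle(dist, angle):  return 0.5 * angle      # Fuzzy System 1
def fuzzy_target_speed(dist, angle):  return min(dist, 1.0)   # Fuzzy System 2
def fuzzy_avoid_angle(dist, angle):   return -angle           # Fuzzy System 3
def fuzzy_avoid_speed(dist, angle):   return 0.2 * dist       # Fuzzy System 4

MAX_DISTANCE_TOLERANCE = 1.5   # assumed threshold; tune for the actual robot

def cascade_step(obs_dist, obs_angle, tgt_dist, tgt_angle):
    """One control cycle: choose a behaviour, return (adjusted_angle, adjusted_speed)."""
    if obs_dist < MAX_DISTANCE_TOLERANCE and obs_dist < tgt_dist:
        # Obstacle avoidance branch (Fuzzy Systems 3 and 4)
        return fuzzy_avoid_angle(obs_dist, obs_angle), fuzzy_avoid_speed(obs_dist, obs_angle)
    # Target pursuit branch (Fuzzy Systems 1 and 2)
    return fuzzy_target_angle(tgt_dist, tgt_angle), fuzzy_target_speed(tgt_dist, tgt_angle)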
SEARCH: Background and Motivation
General idea: search allows exploring alternatives.
These algorithms provide the conceptual backbone of almost every approach to the systematic exploration of alternatives.
Topics for Discussion:
• Background
• Uninformed vs. Informed Searches
• Any Path vs. Optimal Path
• Implementation and Performance
SEARCH: Graph Search as Tree Search
• We can turn graph search problems (from S to G) into tree search problems by:
1. Replacing undirected links by 2 directed links.
2. Avoiding loops in a path (or keeping track of visited nodes globally).
[Figure: a small graph from S to G and the corresponding search tree rooted at S]
SEARCH: More Abstract Example of Graphs
Planning actions (a graph of possible states of the world).
Here, the nodes denote descriptions of the state of the world, and a path corresponds to a "plan of actions".
[Figure: blocks-world states connected by actions such as "Put C on A", "Put B on C", "Put C on B" and "Put A on C"]
SEARCH: Classes of Search

Class                | Name          | Operation
Any Path, Uninformed | Depth-First   | Systematic exploration of the whole tree until a goal is found.
Any Path, Uninformed | Breadth-First | Systematic exploration of the whole tree until a goal is found.
Any Path, Informed   | Best-First    | Uses a heuristic measure of the goodness of a state (e.g. estimated distance to the goal).
Optimal, Uninformed  | Uniform-Cost  | Uses a path-length measure. Finds the "shortest" path.
Optimal, Informed    | A*            | Uses a path-"length" measure and a heuristic. Finds the "shortest" path.
SEARCH: Simple Search Algorithm
A search node is a path from some state X back to the start state, e.g. (X B A S).
The state of a search node is the most recent state of the path, e.g. X.
Let Q be a list of search nodes, e.g. ((X B A S) (C B A S) ...), and let S be the start state.
1. Initialise Q with search node (S) as the only entry; set Visited = (S).
2. If Q is empty, fail. Else, pick some search node N from Q.
3. If state(N) is a goal, return N (we've reached the goal).
4. (Otherwise) Remove N from Q.
5. Find all the descendants of state(N) not in Visited and create all the one-step extensions of N to each descendant.
6. Add the extended paths to Q; add the children of state(N) to Visited.
7. Go to Step 2.
Critical decisions: Step 2, picking N from Q; Step 6, adding the extensions of N to Q.
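The seven steps map directly onto a short program. Below is a minimal Python sketch (not the course's reference implementation); goal_test and successors are caller-supplied functions, and the strategy flag anticipates the next slide by choosing where extensions are added to Q.

from collections import deque

def simple_search(start, goal_test, successors, strategy="dfs"):
    """Any-path search following the 7-step scheme above (a sketch).
    A search node is a path stored most-recent-state first, e.g. (X, B, A, S).
    strategy: 'dfs' adds extensions to the front of Q, 'bfs' adds them to the end."""
    Q = deque([(start,)])          # Step 1: Q holds only the path (S)
    visited = {start}              #         Visited = (S)
    while Q:                       # Step 2: fail when Q is empty
        node = Q.popleft()         #         pick (and, Step 4, remove) the first node N
        state = node[0]            # the state of a node is its most recent state
        if goal_test(state):       # Step 3
            return node
        extensions = [(child,) + node                      # Step 5: one-step extensions
                      for child in successors(state) if child not in visited]
        visited.update(ext[0] for ext in extensions)       # Step 6: children into Visited
        if strategy == "dfs":
            Q.extendleft(reversed(extensions))             #         front of Q (depth-first)
        else:
            Q.extend(extensions)                           #         end of Q (breadth-first)
    return None                    # Step 2: Q empty, so fail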
SEARCH: Implementing the Search Strategies
Depth-First (backtracking search)
1. Pick the first element of Q.
2. Add path extensions to the front of Q.
Breadth-First
1. Pick the first element of Q.
2. Add path extensions to the end of Q.
Best-First (greedy search)
1. Pick the best element of Q (measured by the heuristic value of its state).
2. Add path extensions anywhere in Q (it may be efficient to keep Q ordered in some way so as to make it easier to find the "best" element).
Note: heuristic functions are applied in the hope of completing the search more quickly or of finding a relatively good goal state. They do not guarantee finding the "best" path, though.
SEARCH: Depth-First with Visited List

Step | Q                       | Visited
1    | (S)                     | S
2    | (A S) (B S)             | A, B, S
3    | (C A S) (D A S) (B S)   | C, D, B, A, S
4    | (D A S) (B S)           | C, D, B, A, S
5    | (G D A S) (B S)         | G, C, D, B, A, S

[Figure: the example graph on nodes S, A, B, C, D, G]
Sequence of State Expansions: S - A - C - D - G
Pick the first element of Q; add path extensions to the front of Q.
In DFS, nodes are pulled off the front of Q and new extensions are pushed onto the front, so Q behaves as a stack.
SEARCH: Visited States
Keeping track of visited states generally improves time efficiency when searching graphs, without affecting correctness. Note, however, that substantial additional space may be required to keep track of visited states.
If all we want to do is find a path from the start to the goal, there is no advantage to adding a search node whose state is already the state of another search node.
Any state reachable from the node the second time would have been reachable from that node the first time.
Note that, when using Visited, each state will only ever have at most one path to it (search node) in Q.
We will have to revisit this issue when we look at optimal searching.
SEARCH: Worst Case Running Time
In the worst case, all the searches, with or without a visited list, may have to visit each state at least once.
So, all searches will have worst-case running times that are at least proportional to the total number of states, and therefore exponential in the "depth" parameter d.
Maximum time is proportional to the maximum number of nodes visited.
[Figure: a tree with branching factor b = 2 and levels d = 0, 1, 2, 3; d is the depth, b is the branching factor]
Number of states in the tree: b^d < 1 + b + b^2 + ... + b^d = (b^(d+1) - 1)/(b - 1) < b^(d+1)
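As a quick check with the figure's values (b = 2, d = 3):
2^3 = 8 < 1 + 2 + 4 + 8 = 15 = (2^4 - 1)/(2 - 1) < 2^4 = 16,
so the tree shown has 15 nodes.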
SEARCH: Worst Case Space
Max Q size = max (#visited - #expanded).
[Figure: a tree with b = 2 and d = 0, 1, 2, 3, marking expanded versus visited nodes]
Depth-first maximum Q size: (b - 1)·d ≈ b·d
Breadth-first maximum Q size: b^d
SEARCH: Cost and Performance of Any-Path Methods
Searching a tree with branching factor b and depth d (without using a Visited List):

Search Method | Worst Time | Worst Space | Fewest states? | Guaranteed to find a path?
Depth-First   | b^(d+1)    | b·d         | No             | Yes*
Breadth-First | b^(d+1)    | b^d         | Yes            | Yes
Best-First    | b^(d+1) ** | b^d         | No             | Yes*

* If there are no indefinitely long paths in the search space.
** Best-First needs more time to locate the best node in Q.
Worst-case time is proportional to the number of nodes added to Q.
Worst-case space is proportional to the maximal length of Q.
SEARCH: Cost and Performance of Any-Path Methods
Searching a tree with branching factor b and depth d (with a Visited List):

Search Method | Worst Time | Worst Space (Q + Visited) | Fewest states? | Guaranteed to find a path?
Depth-First   | b^(d+1)    | b·d + b^(d+1)             | No             | Yes*
Breadth-First | b^(d+1)    | b^d + b^(d+1)             | Yes            | Yes
Best-First    | b^(d+1) ** | b^d + b^(d+1)             | No             | Yes*

* If there are no indefinitely long paths in the search space.
** Best-First needs more time to locate the best node in Q.
Worst-case time is proportional to the number of nodes added to Q.
Worst-case space is proportional to the maximal length of Q plus the Visited list.
SEARCH: States vs. Paths
[Figure: a tree with b = 2 and d = 0, 1, 2, 3]
Using a Visited list, the worst-case time performance is limited by the number of states in the search space rather than the number of paths through the nodes in the space (which may be exponentially larger than the number of states).
Using a Visited list helps prevent loops; that is, no path visits a state more than once.
However, using the Visited list for very large spaces may not be appropriate, as the space requirements would be prohibitive.
SEARCH: Space (the final frontier)
In large search problems, memory is often the limiting factor.
• Imagine searching a tree with branching factor 8 and depth 10. Assume a node requires just 8 bytes of storage. Then Breadth-First search might require up to:
(2^3)^10 * 2^3 = 2^33 bytes ≈ 8,000 Mbytes = 8 Gbytes
One strategy is to trade time for memory. For example, we can emulate Breadth-First search by repeated applications of Depth-First search, each up to a preset depth limit. This is called Progressive Deepening Search (PDS); a code sketch follows below.
1. C = 1.
2. Do DFS to maximum depth C. If a path is found, return it.
3. Otherwise, increment C and go to Step 2.
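A minimal Python sketch of PDS (illustrative only; max_depth is an assumed safety cap, and goal_test and successors are supplied by the caller):

def depth_limited_dfs(path, goal_test, successors, limit):
    """Depth-first search from the end of `path`, cutting off below `limit` edges."""
    state = path[0]
    if goal_test(state):
        return path
    if limit == 0:
        return None
    for child in successors(state):
        if child not in path:                  # avoid loops within the current path
            result = depth_limited_dfs((child,) + path, goal_test, successors, limit - 1)
            if result is not None:
                return result
    return None

def progressive_deepening(start, goal_test, successors, max_depth=50):
    """PDS: repeated depth-limited DFS with C = 1, 2, 3, ..."""
    for c in range(1, max_depth + 1):          # Step 1: C = 1, then increment (Step 3)
        result = depth_limited_dfs((start,), goal_test, successors, c)   # Step 2
        if result is not None:
            return result
    return None                                # no path found within max_depth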
See Tutorial on Search
SEARCH: Simple Search Algorithm
A search node is a path from some state X back to the start state, e.g. (X B A S).
The state of a search node is the most recent state of the path, e.g. X.
Let Q be a list of search nodes, e.g. ((X B A S) (C B A S) ...), and let S be the start state.
1. Initialise Q with search node (S) as the only entry; set Visited = (S).
2. If Q is empty, fail. Else, pick some search node N from Q.
3. If state(N) is a goal, return N (we've reached the goal).
4. (Otherwise) Remove N from Q.
5. Find all the descendants of state(N) not in Visited and create all the one-step extensions of N to each descendant.
6. Add the extended paths to Q; add the children of state(N) to Visited.
7. Go to Step 2.
Critical decisions: Step 2, picking N from Q; Step 6, adding the extensions of N to Q.
Important: do NOT use the Visited List for optimal searching!
SEARCH: Uniform Cost
[Figure: weighted graph with edges S-A = 2, S-B = 5, A-C = 2, A-D = 4, B-D = 1, B-G = 5, C-D = 3, D-G = 2]

Step | Q
1    | (0 S)
2    | (2 A S) (5 B S)
3    | (4 C A S) (6 D A S) (5 B S)
4    | (6 D A S) (5 B S)
5    | (6 D B S) (10 G B S) (6 D A S)
6    | (8 G D B S) (9 C D B S) (10 G B S) (6 D A S)
7    | (8 G D A S) (9 C D A S) (8 G D B S) (9 C D B S) (10 G B S)

The sequence of path extensions corresponds precisely to path-length order, so it is not surprising that we find the shortest path.
Sequence of State Expansions: S - A - C - B - D - D - G (expanded path lengths 0, 2, 4, 5, 6, 6, 8).
Uniform Cost enumerates paths in order of total path cost!
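A minimal Python sketch of Uniform Cost using a priority queue (no Expanded List here, so states may be re-expanded; the strict version is described on the next slide). successors(state) yields (child, edge_cost) pairs.

import heapq

def uniform_cost_search(start, goal_test, successors):
    """Always extend the partial path with the smallest total cost so far."""
    Q = [(0, (start,))]                      # entries are (path_cost, path)
    while Q:
        cost, path = heapq.heappop(Q)        # cheapest partial path first
        state = path[0]
        if goal_test(state):
            return cost, path
        for child, edge_cost in successors(state):
            if child not in path:            # avoid looping back along this path
                heapq.heappush(Q, (cost + edge_cost, (child,) + path))
    return None

# The weighted graph from the trace above, written as directed successor lists:
graph = {'S': [('A', 2), ('B', 5)], 'A': [('C', 2), ('D', 4)],
         'B': [('D', 1), ('G', 5)], 'C': [('D', 3)], 'D': [('G', 2), ('C', 3)]}
print(uniform_cost_search('S', lambda s: s == 'G', lambda s: graph.get(s, [])))
# -> (8, ('G', 'D', 'A', 'S')): a shortest path of cost 8 (S-B-D-G also costs 8).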
SEARCH: Simple Optimal Search Algorithm (Uniform Cost + Strict Expanded List)
A search node is a path from some state X back to the start state, e.g. (X B A S).
The state of a search node is the most recent state of the path, e.g. X.
Let Q be a list of search nodes, e.g. ((X B A S) (C B A S) ...), and let S be the start state.
1. Initialise Q with search node (S) as the only entry; set Expanded = {}.
2. If Q is empty, fail. Else, pick some search node N from Q.
3. If state(N) is a goal, return N (we've reached the goal).
4. (Otherwise) Remove N from Q.
5. If state(N) is in Expanded, go to Step 2 (in effect, discard that node); otherwise, add state(N) to the Expanded list.
6. Find all the children of state(N) not in Expanded and create all the one-step extensions of N to each descendant.
7. Add all the extended paths to Q. If a descendant state is already in Q, keep only the shorter path to that state in Q.
8. Go to Step 2.
Take note that we need to add some precautionary measures when adding the extended paths to Q (Step 7).
SEARCH: Uniform Cost (with Strict Expanded List)
[Figure: the same weighted graph as before]

Step | Q                                     | Expanded
1    | (0 S)                                 |
2    | (2 A S) (5 B S)                       | S
3    | (4 C A S) (6 D A S) (5 B S)           | S, A
4    | (6 D A S) (5 B S)                     | S, A, C
5    | (6 D B S) (10 G B S) (6 D A S)        | S, A, C, B
6    | (8 G D A S) (9 C D A S) (10 G B S)    | S, A, C, B, D

Remarks from the trace:
• One path is removed because D is already in Q (our convention: keep the element at the front of Q if a path leading to the same state is already in Q).
• One path is removed because there is a new, shorter path to G.
• One path is removed because C was expanded already.
Sequence of State Expansions: S - A - C - B - D - G
SEARCH: Why use an estimate of goal distance?
[Figure: states as points in the plane, with start S, goal G, and states A and B equidistant from S]
• Order in which UC looks at states: A and B are the same distance from the start S, so they will be looked at before any longer paths. There is no "bias" towards the goal.
• Assume states are points in Euclidean space.
• Order of examination using distance from S + an estimate of the distance to G.
• Note the "bias" toward the goal; the points away from G look worse.
SEARCH: Goal Direction
• UC is really trying to identify the shortest path to every state in the graph in order. It has no particular bias towards finding a path to a goal early in the search.
• We can introduce such a bias by means of a heuristic function h(N), which is an estimate (h) of the distance from a state to the goal.
• Instead of enumerating paths in order of just length (g), enumerate paths in terms of f = estimated total path length = g + h.
• An estimate that always underestimates the real path length to the goal is called admissible. For example, an estimate of 0 is admissible (but useless). Straight-line distance is an admissible estimate for path length in Euclidean space.
• Use of an admissible estimate guarantees that UC will still find the shortest path.
• UC with an admissible estimate is known as A* Search.
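The change from Uniform Cost to A* is small: order Q by f = g + h instead of by g alone. A minimal Python sketch (no Expanded List; successors(state) yields (child, edge_cost) pairs and h is the heuristic function):

import heapq

def a_star(start, goal_test, successors, h):
    """A*: Uniform Cost ordered by f = g + h, with h an admissible heuristic."""
    Q = [(h(start), 0, (start,))]            # entries are (f, g, path)
    while Q:
        f, g, path = heapq.heappop(Q)        # smallest f = g + h first
        state = path[0]
        if goal_test(state):
            return g, path
        for child, edge_cost in successors(state):
            if child not in path:
                g2 = g + edge_cost
                heapq.heappush(Q, (g2 + h(child), g2, (child,) + path))
    return None

# With h(n) = 0 for every n, this reduces to Uniform Cost.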
SEARCH: A*
[Figure: the same weighted graph as before]
Heuristic values: A = 2, B = 3, C = 1, D = 1, S = 0, G = 0

Step | Q
1    | (0 S)
2    | (4 A S) (8 B S)
3    | (5 C A S) (7 D A S) (8 B S)
4    | (7 D A S) (8 B S)
5    | (8 G D A S) (10 C D A S) (8 B S)

• Pick the best element of Q (by path length + heuristic); add path extensions anywhere in Q.
Sequence of State Expansions: S - A - C - D - G
SEARCH: States vs. Paths
[Figure: a tree with b = 2 and d = 0, 1, 2, 3]
We have ignored the issue of revisiting states in our discussion of Uniform-Cost Search and A*. We indicated that we could not use the Visited List and still preserve optimality, but can we use something else that will keep the worst-case cost of a search proportional to the number of states in a graph rather than to the number of non-looping paths?
SEARCH: Consistency
To enable implementing A* using the strict Expanded List, h needs to satisfy the following consistency (also known as monotonicity) conditions:
1. h(Si) = 0, if Si is a goal state.
2. h(Si) - h(Sj) <= c(Si, Sj), if Sj is a child of Si.
That is, the heuristic cost in moving from one entry to the next cannot decrease by more than the arc cost between the states. This is a kind of triangle inequality. This condition is a highly desirable property of a heuristic function and is often simply assumed (more on this later).
[Figure: triangle formed by Si, its child Sj, and the goal, with sides h(Si), h(Sj) and c(Si, Sj)]
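A quick way to sanity-check a heuristic table is to test both conditions over every edge. A small illustrative helper (not from the course material):

def is_consistent(h, edges, goals):
    """Check the consistency (monotonicity) conditions quoted above.
    `edges` maps a state to (child, cost) pairs; `h` maps states to heuristic values."""
    if any(h[g] != 0 for g in goals):               # condition 1: h = 0 at every goal
        return False
    for s, children in edges.items():
        for child, cost in children:
            if h[s] - h[child] > cost:              # condition 2: h(Si) - h(Sj) <= c(Si, Sj)
                return False
    return True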
SEARCH: A* (with Strict Expanded List)
Note that the heuristic here is admissible and consistent.
[Figure: graph with edges S-A = 1, S-B = 2, A-C = 1, B-C = 2, C-G = 100]
Heuristic values: A = 100, B = 88, C = 100, S = 90, G = 0

Step | Q                        | Expanded List
1    | (90 S)                   |
2    | (90 B S) (101 A S)       | S
3    | (101 A S) (104 C B S)    | B, S
4    | (102 C A S) (104 C B S)  | A, B, S
5    | (102 G C A S)            | C, A, B, S

At each step, the path with the smallest f value is chosen for extension.
If we modify the heuristic in the example we have been considering so that it is consistent, as we have done here by increasing the value of h(B), then A* (with the Expanded List) will work.
SEARCH: Dealing with an inconsistent heuristic
What can we do if we have an inconsistent heuristic but we still want optimal paths? Modify A* so that it detects and corrects when inconsistency has led us astray.
Assume we are adding node1 to Q and node2 is present in the Expanded List, with node1.state = node2.state.
Strict Expanded List:
• Do NOT add node1 to Q.
Non-Strict Expanded List:
• If node1.path_length < node2.path_length, then:
  1. Delete node2 from the Expanded List.
  2. Add node1 to Q.
SEARCH: Optimality and Worst Case Complexity

Algorithm    | Heuristic  | Expanded List | Optimality Guaranteed? | Worst Case # Expansions
Uniform Cost | None       | Strict        | Yes                    | N
A*           | Admissible | None          | Yes                    | > N
A*           | Consistent | Strict        | Yes                    | N
A*           | Admissible | Strict        | No                     | N
A*           | Admissible | Non-Strict    | Yes                    | > N

(N denotes the number of states in the graph.)
Fuzzy Logic
Fuzzy Inference Process
Input (e.g. theta) -> Fuzzification -> Rule Evaluation -> Defuzzification -> Output (e.g. force)
Fuzzification: translate inputs into truth values.
Rule Evaluation: compute output truth values.
Defuzzification: transfer truth values into outputs.
Obstacle Avoidance Problem
Robot Navigation: the robot is at (x, y) with heading angle theta; an obstacle lies at (obsx, obsy).
Obstacle Avoidance & Target Pursuit Demonstration
Can you describe how the robot should turn based on the position and angle of the obstacle?
Another example: Fuzzy Sets for Robot Navigation
Angle: SMALL, MEDIUM, LARGE
Distance: NEAR, FAR, VERY FAR
The sub-ranges for angles and distances overlap.
Fuzzy Systems for Obstacle Avoidance
The vision system supplies the distance and angle of the nearest obstacle.

Fuzzy System 3 (Steering)
Angle \ Distance | NEAR       | FAR        | VERY FAR
SMALL            | Very Sharp | Sharp Turn | Med Turn
MEDIUM           | Sharp Turn | Med Turn   | Mild Turn
LARGE            | Med Turn   | Mild Turn  | Zero Turn
e.g. If the Distance from the Obstacle is NEAR and the Angle from the Obstacle is SMALL, then turn Very Sharply.

Fuzzy System 4 (Speed Adjustment)
Angle \ Distance | NEAR       | FAR        | VERY FAR
SMALL            | Very Slow  | Slow Speed | Fast Speed
MEDIUM           | Slow Speed | Fast Speed | Very Fast
LARGE            | Fast Speed | Very Fast  | Top Speed
e.g. If the Distance from the Obstacle is NEAR and the Angle from the Obstacle is SMALL, then move Very Slowly.
Fuzzification
1. Fuzzification Example
Fuzzy Sets = { NEGATIVE, ZERO, POSITIVE }
[Figure: three overlapping trapezoidal membership functions over the range -3.0 to 3.0, with breakpoints at -3.0, -2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.5 and 3.0]
We assume trapezoidal membership functions.
Crisp input: x = 0.25. What is the degree of membership of x in each of the fuzzy sets?
Fuzzification Example: Sample Calculations
For a trapezoid with corners (a, b, c, d), the degree of membership of a crisp input x is
F(x) = max( min( (x - a)/(b - a), 1, (d - x)/(d - c) ), 0 )
Crisp input: x = 0.25

Fzero(0.25)     = max( min( (0.25 - (-1))/(-0.25 - (-1)), 1, (1 - 0.25)/(1 - 0.25) ), 0 )
                = max( min(1.67, 1, 1), 0 ) = 1
Fpositive(0.25) = max( min( (0.25 - (-0.5))/(0.5 - (-0.5)), 1, (3 - 0.25)/(3 - 2.5) ), 0 )
                = max( min(0.75, 1, 5.5), 0 ) = 0.75
Fnegative(0.25) = max( min( (0.25 - (-3))/(-2.5 - (-3)), 1, (0.5 - 0.25)/(0.5 - (-0.5)) ), 0 )
                = max( min(6.5, 1, 0.25), 0 ) = 0.25
Sample Calculations (second input)
Crisp input: y = -0.25

Fzero(-0.25)     = max( min( (-0.25 - (-1))/(-0.25 - (-1)), 1, (1 - (-0.25))/(1 - 0.25) ), 0 )
                 = max( min(1, 1, 1.67), 0 ) = 1
Fpositive(-0.25) = max( min( (-0.25 - (-0.5))/(0.5 - (-0.5)), 1, (3 - (-0.25))/(3 - 2.5) ), 0 )
                 = max( min(0.25, 1, 6.5), 0 ) = 0.25
Fnegative(-0.25) = max( min( (-0.25 - (-3))/(-2.5 - (-3)), 1, (0.5 - (-0.25))/(0.5 - (-0.5)) ), 0 )
                 = max( min(5.5, 1, 0.75), 0 ) = 0.75
Trapezoidal Membership Functions
Left Trapezoid (boundaries a, b):
Left_Slope = 0; Right_Slope = 1 / (a - b)
CASE 1: x <= a: membership value = 1
CASE 2: x >= b: membership value = 0
CASE 3: a < x < b: membership value = Right_Slope * (x - b)

Right Trapezoid (boundaries a, b):
Left_Slope = 1 / (b - a); Right_Slope = 0
CASE 1: x <= a: membership value = 0
CASE 2: x >= b: membership value = 1
CASE 3: a < x < b: membership value = Left_Slope * (x - a)

Regular Trapezoid (boundaries a, b, c, d):
Left_Slope = 1 / (b - a); Right_Slope = 1 / (c - d)
CASE 1: x <= a or x >= d: membership value = 0
CASE 2: b <= x <= c: membership value = 1
CASE 3: a < x < b: membership value = Left_Slope * (x - a)
CASE 4: c < x < d: membership value = Right_Slope * (x - d)
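The three cases translate directly into Python. A short sketch (the function names are mine, not from the slides):

def left_trapezoid(x, a, b):
    """Shoulder on the left: full membership below a, none above b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (x - b) / (a - b)          # Right_Slope * (x - b), Right_Slope = 1/(a - b)

def right_trapezoid(x, a, b):
    """Shoulder on the right: no membership below a, full membership above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)          # Left_Slope * (x - a), Left_Slope = 1/(b - a)

def regular_trapezoid(x, a, b, c, d):
    """Zero outside [a, d], one on [b, c], linear slopes in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)      # rising edge
    return (x - d) / (c - d)          # falling edge: Right_Slope * (x - d)

# e.g. the ZERO set from the fuzzification example (corners -1, -0.25, 0.25, 1):
print(regular_trapezoid(0.25, -1.0, -0.25, 0.25, 1.0))   # -> 1.0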
Fuzzy Control
Different stages of fuzzy control.
2. Rule Evaluation
Inputs are applied to a set of if/then control rules, e.g. IF temperature is very hot, THEN set fan speed very high.
The results of the various rules are summed together to generate a set of "fuzzy outputs".
Outputs: NL = -5, NS = -2.5, ZE = 0, PS = 2.5, PL = 5.0

FAMM (columns are the fuzzy sets of x, rows are the fuzzy sets of y; each cell shows the output set and its rule weight):

y \ x | N       | ZE      | P
N     | NL (W1) | NS (W4) | NS (W7)
ZE    | NS (W2) | ZE (W5) | PS (W8)
P     | PS (W3) | PS (W6) | PL (W9)
Fuzzy Control
2. Rule Evaluation
Assuming that we are using the conjunction operator (AND) in the antecedents of the rules, each rule weight is the minimum of the two input memberships:
W1 = min( FN(0.25), FN(-0.25) )   = min(0.25, 0.75) = 0.25
W2 = min( FN(0.25), FZE(-0.25) )  = min(0.25, 1)    = 0.25
W3 = min( FN(0.25), FP(-0.25) )   = min(0.25, 0.25) = 0.25
W4 = min( FZE(0.25), FN(-0.25) )  = min(1, 0.75)    = 0.75
W5 = min( FZE(0.25), FZE(-0.25) ) = min(1, 1)       = 1
W6 = min( FZE(0.25), FP(-0.25) )  = min(1, 0.25)    = 0.25
W7 = min( FP(0.25), FN(-0.25) )   = min(0.75, 0.75) = 0.75
W8 = min( FP(0.25), FZE(-0.25) )  = min(0.75, 1)    = 0.75
W9 = min( FP(0.25), FP(-0.25) )   = min(0.75, 0.25) = 0.25
Fuzzy Control
3. Defuzzification Example
Assuming that we are using the centre-of-mass defuzzification method:
OUTPUT = ( W1*NL + W2*NS + W3*PS + W4*NS + W5*ZE + W6*PS + W7*NS + W8*PS + W9*PL ) / ( W1 + W2 + ... + W9 )
       = ( 0.25*(-5) + 0.25*(-2.5) + 0.25*2.5 + 0.75*(-2.5) + 1*0 + 0.25*2.5 + 0.75*(-2.5) + 0.75*2.5 + 0.25*5 )
         / ( 0.25 + 0.25 + 0.25 + 0.75 + 1 + 0.25 + 0.75 + 0.75 + 0.25 )
       = -1.25 / 4.5 = -0.278
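The whole pipeline (fuzzify, evaluate the FAMM rules with min, centre-of-mass defuzzify) fits in a short Python sketch. The trapezoid corners below are the ones inferred from the sample calculations above; everything else follows the worked example.

def trap(x, a, b, c, d):
    """Trapezoidal membership (the Regular Trapezoid above)."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

SETS = {                       # (a, b, c, d) corners inferred from the sample calculations
    'N':  (-3.0, -2.5, -0.5, 0.5),
    'ZE': (-1.0, -0.25, 0.25, 1.0),
    'P':  (-0.5, 0.5, 2.5, 3.0),
}
OUT = {'NL': -5.0, 'NS': -2.5, 'ZE': 0.0, 'PS': 2.5, 'PL': 5.0}     # output values
FAMM = {                       # (x_set, y_set) -> output set, as in the matrix above
    ('N', 'N'): 'NL',  ('ZE', 'N'): 'NS',  ('P', 'N'): 'NS',
    ('N', 'ZE'): 'NS', ('ZE', 'ZE'): 'ZE', ('P', 'ZE'): 'PS',
    ('N', 'P'): 'PS',  ('ZE', 'P'): 'PS',  ('P', 'P'): 'PL',
}

def fuzzy_inference(x, y):
    fx = {name: trap(x, *corners) for name, corners in SETS.items()}   # 1. fuzzification
    fy = {name: trap(y, *corners) for name, corners in SETS.items()}
    num = den = 0.0
    for (sx, sy), out in FAMM.items():                                 # 2. rule evaluation
        w = min(fx[sx], fy[sy])                                        #    (AND = min)
        num += w * OUT[out]
        den += w
    return num / den                                                   # 3. centre of mass

print(round(fuzzy_inference(0.25, -0.25), 3))   # -> -0.278, as in the worked example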
Neural Networks
Training a Network
BACKPROPAGATION TRAINING
[Figure: a small feed-forward network with inputs x and y, bias units (weights bh = -2.76 and bz = -3.29), one hidden unit (output 0.98) and one output unit z (output 0.91); example weights Wij, Wjk and Wik include 7.1, -4.95 and 10.9]
tk = target output, Ok = actual output.
BACKPROPAGATION TRAINING
For an output unit k:
  δk = (tk - Ok) Ok (1 - Ok)
  Wjk <- Wjk + η δk Oj
  Wik <- Wik + η δk Oi
For a hidden unit j:
  δj = Oj (1 - Oj) Σk δk Wjk
  Wij <- Wij + η δj Oi
where η is the learning rate, δ is the error signal, Oj is the output of unit j, Oi is the output of unit i, and the subscript k ranges over all output nodes that connect to node j.
Training a Network
BACKPROPAGATION TRAINING
We will now look at the formulas for adjusting the weights that lead into the output units of a backpropagation network. The actual activation value of an output unit k will be Ok, and the target for unit k will be tk. First of all there is a term in the formula for δk, the error signal:
  δk = (tk - Ok) f'(netk)
where f' is the derivative of the activation function f and netk is the weighted sum of unit k's inputs. If we use the usual sigmoid activation function, f(net) = 1 / (1 + e^(-net)), the derivative term is:
  f'(netk) = Ok (1 - Ok)
so δk = (tk - Ok) Ok (1 - Ok), as given above.
[Figure: the example network with weights -3.29, 10.9, -4.95, -2.76 and 7.1, as before]
Training a Network
BACKPROPAGATION TRAINING
The formula to change the weight wjk between the output unit k and unit j is:
  wjk <- wjk + η δk Oj
where η is some relatively small positive constant called the learning rate. With the network given, assuming that all weights start with zero values and η = 0.1, the hidden and output activations are both 0.5 (the sigmoid of 0).
[Figure: the same network with every weight initialised to 0.0 and unit outputs of 0.5]
Training a Network
BACKPROPAGATION TRAINING
The formula for computing the error δj for a hidden unit j is:
  δj = Oj (1 - Oj) Σk δk wjk
The k subscript ranges over all the units in the output layer; in this example, however, there is only one output unit, so the sum reduces to a single term.
[Figure: the same zero-weight network as above]
Training a Network
BACKPROPAGATION TRAINING
The weight-change formula for a weight wij that goes between the hidden unit j and the input unit i is essentially the same as before:
  wij <- wij + η δj Oi
The new weights are then obtained by applying these updates to the zero-valued starting weights.
[Figure: the same network, showing the weights after the update]
Backpropagation Training
Iterative minimization of error over the training set (a code sketch of one training step follows below):
1. Put one of the training patterns to be learned on the input units.
2. Find the values for the hidden unit and output unit.
3. Find out how large the error is on the output unit.
4. Use one of the backpropagation formulas to adjust the weights leading into the output unit.
5. Use another formula to find out the errors for the hidden layer unit.
6. Adjust the weights leading into the hidden layer unit via another formula.
7. Repeat steps 1 through 6 for the second, third patterns, and so on.
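A minimal Python sketch of one such step for a two-input, one-hidden-unit, one-output network. This is illustrative only: the weight names are mine, and the direct input-to-output weights (Wik) from the slides are omitted for brevity.

import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def backprop_step(x, y, t, w, eta=0.1):
    """One backpropagation step. w is a dict of weights {'bh', 'xh', 'yh', 'bz', 'hz'}."""
    # Forward pass (steps 1-2)
    o_j = sigmoid(w['bh'] + w['xh'] * x + w['yh'] * y)   # hidden unit output Oj
    o_k = sigmoid(w['bz'] + w['hz'] * o_j)               # output unit Ok
    # Error signals (steps 3 and 5, using the delta formulas above)
    delta_k = (t - o_k) * o_k * (1 - o_k)
    delta_j = o_j * (1 - o_j) * delta_k * w['hz']        # single output unit, so no sum
    # Weight updates (steps 4 and 6); bias units always output 1
    w['hz'] += eta * delta_k * o_j
    w['bz'] += eta * delta_k
    w['xh'] += eta * delta_j * x
    w['yh'] += eta * delta_j * y
    w['bh'] += eta * delta_j
    return o_k

# With all weights starting at zero (as in the example), both activations are 0.5:
weights = dict(bh=0.0, xh=0.0, yh=0.0, bz=0.0, hz=0.0)
print(backprop_step(1.0, 0.0, t=1.0, w=weights, eta=0.1))   # -> 0.5 on the first pass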
Sum of Squared Errors
• This error is minimised during training:
  E = Σ (i = 1 to n) Σ (k = 1 to m) (Tk - Ok)^2
where Tk is the target output, Ok is the actual output of the network, m is the total number of output units, and n is the number of training exemplars.
Training Neural Nets
• Given: a data set, desired outputs, and a neural net with m weights.
• Find a setting for the weights that will give good predictive performance on new data.
• Estimate the expected performance on new data.
1. Split the data set (randomly) into three subsets:
   - Training set: used for picking weights.
   - Validation set: used to stop training.
   - Test set: used to evaluate performance.
2. Pick random, small weights as initial values.
3. Perform iterative minimization of error over the training set.
4. Stop when the error on the validation set reaches a minimum (to avoid over-fitting).
5. Repeat training (from Step 2) several times (to avoid local minima).
6. Use the best weights to compute the error on the test set; this is the estimate of performance on new data. Do not repeat training to improve this.
A sketch of the early-stopping loop (steps 3-4) follows below.
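A schematic Python version of the early-stopping loop. The helpers train_epoch and val_error are assumed, caller-supplied functions (not a specific library API).

def train_with_early_stopping(weights, train_epoch, val_error, max_epochs=1000, patience=10):
    """Iterate training (step 3) and stop when the validation error stops improving (step 4).
    train_epoch(weights) returns updated weights; val_error(weights) returns the error
    on the validation set."""
    best = (float('inf'), weights)
    bad = 0
    for _ in range(max_epochs):
        weights = train_epoch(weights)
        err = val_error(weights)
        if err < best[0]:
            best, bad = (err, weights), 0
        else:
            bad += 1
            if bad >= patience:      # validation error has stopped improving
                break
    return best[1]                   # keep the weights with the lowest validation error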
Multi-Layer Feed-Forward Neural Network
Why do we need BIAS UNITS (or threshold nodes)?
Apart from improving the speed of learning for some problems (e.g. the XOR problem), bias units or threshold nodes are required for universal approximation. Without them, the feedforward network always assigns 0 output to 0 input. Without thresholds, it would be impossible to approximate functions which assign nonzero output to zero input. Threshold nodes are needed in much the same way that the constant polynomial '1' is required for approximation by polynomials.
[Figure: the example network with bias units bh and bz and weights -3.29, 10.9, -4.95, -2.76 and 7.1]
Data Sets
• Split the data set (randomly) into three subsets:
1. Training set – used for picking weights
2. Validation set – used to stop training
3. Test set – used to evaluate performance
Input Representation
• All the signals in a neural net are in the range [0, 1]. Input values should also be scaled to this range (or approximately so), so as to speed up training.
• If the input values are discrete, e.g. {A, B, C, D} or {1, 2, 3, 4}, they need to be coded in unary form.
Output Representation
• A neural net with a single sigmoid output is aimed at binary classification: the class is 0 if y < 0.5 and 1 otherwise.
• For multi-class problems:
  - You can use one output per class (unary encoding).
  - There may be confusing outputs (e.g. two outputs > 0.5 in unary encoding).
  - A more sophisticated method is to use special softmax units, which force the outputs to sum to 1; see the sketch below.
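A generic softmax sketch in Python (not tied to any particular library):

import math

def softmax(outputs):
    """Rescale raw outputs into positive values that sum to 1, so the largest one
    can be read directly as the predicted class."""
    m = max(outputs)                          # subtract the max for numerical stability
    exps = [math.exp(o - m) for o in outputs]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))   # -> roughly [0.66, 0.24, 0.10]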
Neural Networks
Regression Problems?
Classification Problems?
Backpropagation Learning
Overfitting problem in network training
Practice solving using the tutorials.
CSP
Constraint Propagation
Simple Backtracking (BT)
Simple Backtracking with Forward Checking (BT-FC)
Simple Backtracking with Forward Checking with Dynamic Variable and Value Ordering
Practice solving using the tutorials.
Games
Min-Max
Alpha-Beta
Practice solving using the tutorials.