Transcript Chapter 4 - Montana State University
Introduction to Neural Networks
John Paxton Montana State University Summer 2003
Chapter 4: Competition
• Force a decision (yes, no, maybe) to be made.
• Winner take all is a common approach.
• Kohonen learning: w_j(new) = w_j(old) + α(x – w_j(old))
• w_j is the closest weight vector to x, determined by Euclidean distance.
MaxNet
• Lippman, 1987
• Fixed-weight competitive net.
• Activation function: f(x) = x if x > 0, else 0.
• Architecture: units a_1 and a_2, each with a self-connection of weight 1 and mutual inhibitory connections of weight –ε.
Algorithm
1. w_ij = 1 if i = j, otherwise –ε
2. a_j(0) = s_j, t = 0
3. a_j(t+1) = f[ a_j(t) – ε Σ_{k≠j} a_k(t) ]
4. go to step 3 if more than one node has a non-zero activation

Special case: more than one node has the same maximum activation.
Example
• s_1 = .5, s_2 = .1, ε = .1
• a_1(0) = .5, a_2(0) = .1
• a_1(1) = .49, a_2(1) = .05
• a_1(2) = .485, a_2(2) = .001
• a_1(3) = .4849, a_2(3) = 0
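The iterations above can be reproduced with a short sketch (Python here, since the slides give only pseudocode). The update and stopping rule follow the algorithm; `max_iters` is a safeguard added for this sketch.

```python
# MaxNet competition: every unit inhibits the others by epsilon until
# at most one unit keeps a positive activation.

def maxnet(activations, epsilon=0.1, max_iters=100):
    def f(x):                      # f(x) = x if x > 0, else 0
        return x if x > 0 else 0.0
    a = list(activations)
    for _ in range(max_iters):
        total = sum(a)
        # a_j(t+1) = f[a_j(t) - epsilon * sum of the other activations]
        a = [f(a[j] - epsilon * (total - a[j])) for j in range(len(a))]
        if sum(1 for v in a if v > 0) <= 1:
            break
    return a

print(maxnet([0.5, 0.1]))          # unit 1 survives with activation ~.4849
```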
Mexican Hat
• Kohonen, 1989
• Contrast enhancement
• Architecture: symmetric weights w_0, w_1, w_2, w_3, indexed by radius
• w_0 connects x_i to itself; w_1 connects x_{i+1} and x_{i–1} to x_i; and so on. Weights at small radii are excitatory (+), weights at larger radii are inhibitory (–), and weights beyond the largest radius are 0.
Algorithm
1. initialize weights
2. x_i(0) = s_i
3. for some number of steps do
4.   x_i(t+1) = f[ Σ_k w_k x_{i+k}(t) ]
5.   x_i(t+1) = max(0, x_i(t+1))
Example
• x_1, x_2, x_3, x_4, x_5
• radius 0 weight = 1
• radius 1 weight = 1
• radius 2 weight = –.5
• all other radii weights = 0
• s = (0 .5 1 .5 0)
• f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise
Example
• x(0) = (0 .5 1 .5 0)
• x_1(1) = 1(0) + 1(.5) – .5(1) = 0
• x_2(1) = 1(0) + 1(.5) + 1(1) – .5(.5) = 1.25
• x_3(1) = –.5(0) + 1(.5) + 1(1) + 1(.5) – .5(0) = 2.0
• x_4(1) = 1.25
• x_5(1) = 0
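One update step of this example can be sketched directly; the radius weights and the clipping function f come from the example above.

```python
# One Mexican Hat step: radius-0 and radius-1 weights are +1,
# the radius-2 weight is -0.5, and f clips activations to [0, 2].

def mexican_hat_step(x):
    weights = {0: 1.0, 1: 1.0, 2: -0.5}   # weight by radius |k|
    new_x = []
    for i in range(len(x)):
        total = 0.0
        for k in range(-2, 3):            # neighbors within radius 2
            if 0 <= i + k < len(x):
                total += weights[abs(k)] * x[i + k]
        # f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise
        new_x.append(min(2.0, max(0.0, total)))
    return new_x

print(mexican_hat_step([0, 0.5, 1, 0.5, 0]))   # -> [0.0, 1.25, 2.0, 1.25, 0.0]
```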
Why the name?
• Plot x(0) vs. x(1): the enhanced signal x(1) peaks at 2 in the middle (x_3) and falls to 0 at the edges, giving the sombrero-like profile.
Hamming Net
• Lippman, 1987
• Maximum likelihood classifier
• The similarity of 2 vectors is taken to be n – H(v_1, v_2), where H is the Hamming distance
• Uses MaxNet with the similarity metric
Architecture
• Concrete example: 3 input units x_1, x_2, x_3 feed 2 units y_1, y_2, whose outputs go into a MaxNet.
Algorithm
1. w_ij = s_i(j)/2, where n is the dimensionality of a vector
2. y_in.j = Σ_i x_i w_ij + (n/2)
3. select max(y_in.j) using MaxNet
Example
• Training examples: (1 1 1), (-1 -1 -1)
• n = 3
• Present input (1 1 1):
• y_in.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3
• y_in.2 = 1(–.5) + 1(–.5) + 1(–.5) + 1.5 = 0
• These last 2 quantities represent the similarity n – H to each stored example
• They are then fed into MaxNet.
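The input-layer computation can be sketched as follows; the weight and bias choices follow the algorithm above, and the function name is mine.

```python
# Hamming net input layer: weights are half of each stored exemplar,
# plus a bias of n/2, so y_in_j counts the components of x that
# agree with exemplar j (the similarity n - H).

def hamming_similarities(x, exemplars):
    n = len(x)
    sims = []
    for e in exemplars:
        y_in = sum(0.5 * e_i * x_i for e_i, x_i in zip(e, x)) + n / 2
        sims.append(y_in)
    return sims   # these values would then be fed into MaxNet

print(hamming_similarities([1, 1, 1], [[1, 1, 1], [-1, -1, -1]]))  # -> [3.0, 0.0]
```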
Kohonen Self-Organizing Maps
• Kohonen, 1989
• Maps inputs onto one of m clusters
• Human brains seem to be able to self-organize.
Architecture
• n input units x_1 … x_n, fully connected to m cluster units y_1 … y_m
Neighborhoods
• Linear (distances from the winner #):
  3 2 1 # 1 2 3
• Rectangular (distances from the winner #):
  2 2 2 2 2
  2 1 1 1 2
  2 1 # 1 2
  2 1 1 1 2
  2 2 2 2 2
Algorithm
1. initialize w_ij
2. select topology of the y units
3. select learning rate parameters
4. while stopping criteria not reached
5.   for each input vector do
6.     compute D(j) = Σ_i (w_ij – x_i)^2 for each j
7.     select minimum D(j)
8.     update neighborhood units: w_ij(new) = w_ij(old) + α[x_i – w_ij(old)]
9.   update α
10.  reduce radius of neighborhood at specified times
Example
• Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters
• α(0) = .6, α(t+1) = .5 * α(t)
• random initial weights (rows are inputs i, columns are clusters j):
  .2 .8
  .6 .4
  .5 .7
  .9 .3
Example
• Present (1 1 0 0)
• D(1) = (.2 – 1)^2 + (.6 – 1)^2 + (.5 – 0)^2 + (.9 – 0)^2 = 1.86
• D(2) = .98
• D(2) wins!
Example
• w_i2(new) = w_i2(old) + .6[x_i – w_i2(old)]
  .2 .92 (bigger)
  .6 .76 (bigger)
  .5 .28 (smaller)
  .9 .12 (smaller)
• This example assumes no neighborhood
Example
• After many epochs:
  0  1
  0  .5
  .5 0
  1  0
• (1 1 0 0) -> category 2
• (0 0 0 1) -> category 1
• (1 0 0 0) -> category 2
• (0 0 1 1) -> category 1
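The whole example can be run as a short sketch. The inputs, initial weights, and learning-rate schedule come from the slides; the 20-epoch count is an arbitrary stand-in for "many epochs".

```python
# Winner-take-all Kohonen learning with no neighborhood.

def closest_cluster(x, weights):
    # D(j) = sum_i (w_ij - x_i)^2; the smallest D(j) wins
    d = [sum((weights[i][j] - x[i]) ** 2 for i in range(len(x)))
         for j in range(len(weights[0]))]
    return d.index(min(d))

def train_epoch(inputs, weights, alpha):
    for x in inputs:
        j = closest_cluster(x, weights)
        for i in range(len(x)):   # move the winner toward the input
            weights[i][j] += alpha * (x[i] - weights[i][j])

inputs = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
weights = [[0.2, 0.8], [0.6, 0.4], [0.5, 0.7], [0.9, 0.3]]  # rows i, columns j
alpha = 0.6
for epoch in range(20):
    train_epoch(inputs, weights, alpha)
    alpha *= 0.5              # alpha(t+1) = .5 * alpha(t)

# (1 1 0 0) and (1 0 0 0) end up in one cluster,
# (0 0 0 1) and (0 0 1 1) in the other
print([closest_cluster(x, weights) for x in inputs])
```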
Applications
• Grouping characters • Travelling Salesperson Problem – Cluster units can be represented graphically by weight vectors – Linear neighborhoods can be used with the first and last cluster units connected
Learning Vector Quantization
• Kohonen, 1989 • Supervised learning • There can be several output units per class
Architecture
• Like Kohonen nets, but no topology is assumed for the output units
• Each y_i represents a known class
• n input units x_1 … x_n, fully connected to output units y_1 … y_m
Algorithm
1. initialize the weights (first m training examples, or random)
2. choose α
3. while stopping criteria not reached do (e.g. number of iterations, or α is very small)
4.   for each training vector do
5.     find the w_j that minimizes || x – w_j ||
6.     if the minimum is the target class: w_j(new) = w_j(old) + α[x – w_j(old)]
7.     else: w_j(new) = w_j(old) – α[x – w_j(old)]
8.   reduce α
Example
• (1 1 -1 -1) belongs to category 1
• (-1 -1 -1 1) belongs to category 2
• (-1 -1 1 1) belongs to category 2
• (1 -1 -1 -1) belongs to category 1
• (-1 1 1 -1) belongs to category 2
• 2 output units: y_1 represents category 1, y_2 represents category 2
Example
• Initial weights (where did these come from? the first two training examples):
  w_1 = (1 1 -1 -1)
  w_2 = (-1 -1 -1 1)
• α = .1
Example
• Present training example 3, (-1 -1 1 1). It belongs to category 2.
• D(1) = (1 + 1)^2 + (1 + 1)^2 + (-1 – 1)^2 + (-1 – 1)^2 = 16
• D(2) = 4
• Category 2 wins. That is correct!
Example
• w_2(new) = (-1 -1 -1 1) + .1[(-1 -1 1 1) – (-1 -1 -1 1)] = (-1 -1 -.8 1)
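This step can be sketched as one LVQ update; the function name and the 0-based category indices are my choices.

```python
# One LVQ step: find the closest output unit, then move its weight
# vector toward the input if the class is correct, away otherwise.

def lvq_step(x, target, weights, alpha=0.1):
    # squared distance to each output unit's weight vector
    d = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    winner = d.index(min(d))
    sign = 1.0 if winner == target else -1.0
    weights[winner] = [wi + sign * alpha * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner

weights = [[1, 1, -1, -1], [-1, -1, -1, 1]]   # y_1 = category 1, y_2 = category 2
# present training example 3, (-1 -1 1 1), target category 2 (index 1)
winner = lvq_step([-1, -1, 1, 1], target=1, weights=weights)
print(winner, weights[1])   # third component moves from -1 toward -0.8
```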
Issues
• How many y i should be used?
• How should we choose the class that each y i should represent?
• LVQ2 and LVQ3 are enhancements to LVQ that sometimes also modify the runner-up unit
Counterpropagation
• Hecht-Nielsen, 1987
• There are input, output, and clustering layers
• Can be used to compress data
• Can be used to approximate functions
• Can be used to associate patterns
Stages
• Stage 1: Cluster input vectors • Stage 2: Adapt weights from cluster units to output units
Stage 1 Architecture
• Input units x_1 … x_n connect through weights w (e.g. w_11) to cluster units z_1 … z_p, which connect through weights v (e.g. v_11) to output units y_1 … y_m.
Stage 2 Architecture
• Cluster unit z_j connects through weights t (e.g. t_j1) to output units x*_1 … x*_n, and through weights v (e.g. v_j1) to output units y*_1 … y*_m.
Full Counterpropagation
• Stage 1 Algorithm
1. initialize weights, α, β
2. while stopping criteria is false do
3.   for each training vector pair do
4.     find the z_j that minimizes ||x – w_j|| + ||y – v_j||
       w_j(new) = w_j(old) + α[x – w_j(old)]
       v_j(new) = v_j(old) + β[y – v_j(old)]
5.   reduce α, β
Stage 2 Algorithm
1. while stopping criteria is false
2.   for each training vector pair do
3.     perform step 4 above
4.     t_j(new) = t_j(old) + α[x – t_j(old)]
       v_j(new) = v_j(old) + β[y – v_j(old)]
Partial Example
• Approximate y = 1/x over [0.1, 10.0]
• 1 x unit
• 1 y unit
• 10 z units
• 1 x* unit
• 1 y* unit
Partial Example
• v_11 = .11, w_11 = 9.0
• v_12 = .14, w_12 = 7.0
• …
• v_10,1 = 9.0, w_10,1 = .11
• Test x = .12: the net predicts 9.0.
• In this example, the output weights will converge to the cluster weights.
Forward Only Counterpropagation
• Sometimes the function y = f(x) is not invertible.
• Architecture (only 1 z unit active): x_1 … x_n → z_1 … z_p → y_1 … y_m
Stage 1 Algorithm
1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3.   for each input vector do
4.     find the w that minimizes || x – w ||
       w(new) = w(old) + α[x – w(old)]
5.   reduce α
Stage 2 Algorithm
1. while stopping criteria is false do
2.   for each training vector pair do
3.     find the w that minimizes || x – w ||
       w(new) = w(old) + α[x – w(old)]
       v(new) = v(old) + β[y – v(old)]
4.   reduce β

Note: interpolation is possible.
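The two stages can be sketched for a 1-D function. The cluster count and learning rates follow the slides (10 z units, α = .1, β = .6); the epoch counts, the 0.9 decay factor, and the evenly spread initial cluster weights are illustrative choices of mine.

```python
# Forward-only counterpropagation sketch for y = f(x):
# stage 1 clusters the x values, stage 2 learns one output value
# per (now frozen) cluster.

def train_forward_only_cp(samples, n_clusters=10, alpha=0.1, beta=0.6, epochs=50):
    xs = sorted(x for x, _ in samples)
    step = max(1, len(xs) // n_clusters)
    w = [xs[j * step] for j in range(n_clusters)]   # cluster (x-side) weights
    v = [0.0] * n_clusters                          # output (y-side) weights
    for _ in range(epochs):                         # Stage 1: cluster the inputs
        for x, _ in samples:
            j = min(range(n_clusters), key=lambda j: (x - w[j]) ** 2)
            w[j] += alpha * (x - w[j])
        alpha *= 0.9                                # reduce alpha
    for _ in range(epochs):                         # Stage 2: learn the outputs
        for x, y in samples:
            j = min(range(n_clusters), key=lambda j: (x - w[j]) ** 2)
            v[j] += beta * (y - v[j])
        beta *= 0.9                                 # reduce beta
    return w, v

def predict(x, w, v):
    j = min(range(len(w)), key=lambda j: (x - w[j]) ** 2)
    return v[j]

# y = 1/x sampled over [0.1, 10.0]
samples = [(k / 10, 10 / k) for k in range(1, 101)]
w, v = train_forward_only_cp(samples)
print(predict(0.12, w, v), predict(9.5, w, v))
```

Each prediction is the stored output of the single active z unit; interpolating between the winner and the runner-up would smooth the staircase output.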
Example
• y = f(x) over [0.1, 10.0]
• 10 z_i units
• After phase 1, the z_i cluster weights are 0.5, 1.5, …, 9.5.
• After phase 2, the corresponding output weights are 5.5, 0.75, …, 0.1.