Introduction to Neural Networks

John Paxton, Montana State University, Summer 2003

Chapter 4: Competition

• Force a decision (yes, no, maybe) to be made.

• Winner-take-all is a common approach.

• Kohonen learning: w_j(new) = w_j(old) + α (x – w_j(old))

• w_j is the closest weight vector, determined by Euclidean distance (sketched below).
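
A minimal Python sketch of this winner-take-all update (the weight matrix, input vector, and learning rate below are illustrative choices, not values from the slides):

```python
import numpy as np

def winner_take_all_update(weights, x, alpha):
    """Kohonen learning: move only the closest weight vector toward x."""
    # w_j is the weight vector closest to x in Euclidean distance.
    j = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # w_j(new) = w_j(old) + alpha * (x - w_j(old))
    weights[j] += alpha * (x - weights[j])
    return j

# Illustrative values.
w = np.array([[0.2, 0.6], [0.8, 0.4]])
j = winner_take_all_update(w, np.array([1.0, 0.0]), alpha=0.5)
print(j, w[j])
```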

MaxNet

• Lippman, 1987 • Fixed-weight competitive net.

• Activation function f(x) = x if x > 0, else 0.

• Architecture: two units a_1 and a_2, each with a self-connection of weight 1 and a connection of weight –ε to the other unit.

Algorithm

1. w_ij = 1 if i = j, otherwise –ε
2. a_j(0) = s_j, t = 0
3. a_j(t+1) = f[a_j(t) – ε · Σ_{k≠j} a_k(t)]
4. go to step 3 if more than one node has a non-zero activation

Special Case: more than one node has the same maximum activation.

Example

• s_1 = .5, s_2 = .1, ε = .1

• a_1(0) = .5, a_2(0) = .1

• a_1(1) = .49, a_2(1) = .05

• a_1(2) = .485, a_2(2) = .001

• a_1(3) = .4849, a_2(3) = 0 (reproduced by the sketch below)
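
A short Python sketch of MaxNet that reproduces these numbers (the function name and the max_steps safeguard are my own choices):

```python
import numpy as np

def maxnet(s, eps, max_steps=100):
    """Run MaxNet until at most one activation is non-zero."""
    a = np.array(s, dtype=float)              # a_j(0) = s_j
    for _ in range(max_steps):
        if np.count_nonzero(a) <= 1:
            break
        total = a.sum()
        # a_j(t+1) = f[a_j(t) - eps * sum_{k != j} a_k(t)], with f(x) = max(x, 0)
        a = np.maximum(0.0, a - eps * (total - a))
    return a

print(maxnet([0.5, 0.1], eps=0.1))            # converges to about [0.4849, 0.0]
```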

Mexican Hat

• Kohonen, 1989

• Contrast enhancement

• Architecture: weights (w_0, w_1, w_2, w_3) indexed by radius

• w_0 connects x_i to itself; w_1 connects x_{i+1} and x_{i-1} to x_i; and so on

• Architecture diagram: unit x_i receives excitatory (+) connections from its near neighbors, weaker or inhibitory connections at larger radii, and 0 from distant units (the diagram spans x_{i-3} … x_{i+3}).

Algorithm

1. initialize weights
2. x_i(0) = s_i
3. for some number of steps do
4. x_i(t+1) = f[ Σ_k w_k · x_{i+k}(t) ]
5. x_i(t+1) = max(0, x_i(t+1))

Example

• x_1, x_2, x_3, x_4, x_5

• radius 0 weight = 1

• radius 1 weight = 1

• radius 2 weight = –.5

• all other radii weights = 0

• s = (0 .5 1 .5 0)

• f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise

Example

• x(0) = (0 .5 1 .5 0)

• x_1(1) = 1(0) + 1(.5) – .5(1) = 0

• x_2(1) = 1(0) + 1(.5) + 1(1) – .5(.5) = 1.25

• x_3(1) = –.5(0) + 1(.5) + 1(1) + 1(.5) – .5(0) = 2.0

• x_4(1) = 1.25

• x_5(1) = 0 (see the sketch below)
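
A small Python sketch of one Mexican Hat step, using the radius weights, the signal s, and the clamp to [0, 2] from this example (the function and variable names are mine):

```python
import numpy as np

def mexican_hat_step(x, radius_weights, x_max=2.0):
    """One contrast-enhancement step; radius_weights maps radius -> weight."""
    n = len(x)
    new = np.zeros(n)
    for i in range(n):
        total = radius_weights.get(0, 0.0) * x[i]
        for r, w in radius_weights.items():
            if r == 0:
                continue
            for k in (i - r, i + r):          # both neighbors at radius r
                if 0 <= k < n:
                    total += w * x[k]
        # f(t) = 0 if t < 0, t if 0 <= t <= x_max, x_max otherwise
        new[i] = min(max(total, 0.0), x_max)
    return new

s = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
weights = {0: 1.0, 1: 1.0, 2: -0.5}           # radii 0, 1, 2 from the example
print(mexican_hat_step(s, weights))           # [0.   1.25 2.   1.25 0.  ]
```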

Why the name?

• Plot x(0) vs. x(1): the center activations are enhanced relative to the edges, so the activation profile takes on the shape of a Mexican hat.

Hamming Net

• Lippman, 1987

• Maximum likelihood classifier

• The similarity of two vectors is taken to be n – H(v_1, v_2), where H is the Hamming distance

• Uses MaxNet with this similarity metric

Architecture

• Concrete example: three input units x_1, x_2, x_3 feed two similarity units y_1, y_2, whose outputs go into a MaxNet.

Algorithm

1. w_ij = s_i(j) / 2
2. n is the dimensionality of a vector
3. y_in.j = Σ_i x_i w_ij + (n/2)
4. select max(y_in.j) using MaxNet

Example

• Training examples: (1 1 1), (-1 -1 -1)

• n = 3

• For the input (1 1 1):

• y_in.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3

• y_in.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0

• These last two quantities are the similarities n – H, i.e., the number of components in which the input agrees with each exemplar

• They are then fed into MaxNet.
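
A short Python sketch of this computation (the function name is mine; the winner is shown with argmax, which selects the same unit that MaxNet would):

```python
import numpy as np

def hamming_net(exemplars, x):
    """Similarity scores y_in.j = sum_i x_i * w_ij + n/2, with w_ij = e_j[i] / 2.

    Each score equals n - H(x, e_j): the number of components where x agrees
    with exemplar e_j.  The slides feed these scores into MaxNet.
    """
    e = np.array(exemplars, dtype=float)      # one exemplar per row
    n = e.shape[1]
    w = e.T / 2.0                             # w_ij = s_i(j) / 2
    y_in = np.array(x, dtype=float) @ w + n / 2.0
    return y_in, int(np.argmax(y_in))

scores, winner = hamming_net([(1, 1, 1), (-1, -1, -1)], (1, 1, 1))
print(scores, winner)                         # [3. 0.] 0 -> matches (1 1 1)
```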

Kohonen Self-Organizing Maps

• Kohonen, 1989

• Maps inputs onto one of m clusters

• Human brains seem to be able to self-organize.

Architecture

Input units x_1 … x_n are fully connected to cluster units y_1 … y_m.

Neighborhoods

• Linear: 3 2 1 # 1 2 3 (the winning unit # with neighbors at radii 1, 2, 3 on either side)

• Rectangular (radii around the winning unit #):

  2 2 2 2 2
  2 1 1 1 2
  2 1 # 1 2
  2 1 1 1 2
  2 2 2 2 2

Algorithm

1. initialize w_ij
2. select topology of y_i
3. select learning rate parameters
4. while stopping criteria not reached
5. for each input vector do
6. compute D(j) = Σ_i (w_ij – x_i)² for each j
7. select minimum D(j)
8. update neighborhood units: w_ij(new) = w_ij(old) + α [x_i – w_ij(old)]
9. update α
10. reduce radius of neighborhood at specified times

Example

• Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters

• α(0) = .6

• α(t+1) = .5 · α(t)

• random initial weights (rows = input components, columns = clusters):

  .2  .8
  .6  .4
  .5  .7
  .9  .3

Example

• Present (1 1 0 0)

• D(1) = (.2 – 1)² + (.6 – 1)² + (.5 – 0)² + (.9 – 0)² = 1.86

• D(2) = .98

• D(2) wins!

Example

• w_i2(new) = w_i2(old) + .6 [x_i – w_i2(old)]

• Updated weights (cluster 1's column is unchanged; cluster 2's column moves toward the input):

  .2  .92 (bigger)
  .6  .76 (bigger)
  .5  .28 (smaller)
  .9  .12 (smaller)

• This example assumes no neighborhood

Example

• After many epochs the weights approach

  0   1
  0   .5
  .5  0
  1   0

• (1 1 0 0) -> category 2
• (0 0 0 1) -> category 1
• (1 0 0 0) -> category 2
• (0 0 1 1) -> category 1

(see the sketch below)
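
A Python sketch of this clustering, using the example's initial weights and halving learning rate, with no neighborhood; the function name and the epoch count are my own choices:

```python
import numpy as np

def som_train(data, weights, alpha=0.6, epochs=100):
    """Winner-take-all SOM (no neighborhood); alpha is halved each epoch."""
    w = weights.astype(float).copy()
    for _ in range(epochs):
        for x in data:
            d = ((w - x) ** 2).sum(axis=1)      # D(j) = sum_i (w_ij - x_i)^2
            j = int(np.argmin(d))               # winning cluster
            w[j] += alpha * (x - w[j])          # move winner toward the input
        alpha *= 0.5                            # alpha(t+1) = .5 * alpha(t)
    return w

data = np.array([[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]], float)
w0 = np.array([[0.2, 0.6, 0.5, 0.9],           # cluster 1 (first column above)
               [0.8, 0.4, 0.7, 0.3]])          # cluster 2 (second column above)
w = som_train(data, w0)
print(w.round(2))   # row 0 drifts toward (0, 0, .5, 1), row 1 toward (1, .5, 0, 0)
print([int(np.argmin(((w - x) ** 2).sum(axis=1))) + 1 for x in data])  # [2, 1, 2, 1]
```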

Applications

• Grouping characters

• Travelling Salesperson Problem
  – Cluster units can be represented graphically by weight vectors
  – Linear neighborhoods can be used with the first and last cluster units connected

Learning Vector Quantization

• Kohonen, 1989 • Supervised learning • There can be several output units per class

Architecture

• Like Kohonen nets, but no topology for the output units

• Each y_i represents a known class

• Inputs x_1 … x_n are fully connected to output units y_1 … y_m

Algorithm

1. initialize the weights (e.g., the first m training examples, or random)
2. choose α
3. while stopping criteria not reached do (e.g., a set number of iterations, or α very small)
4. for each training vector do
5. find the minimum ||x – w_j||
6. if the winner is the target class: w_j(new) = w_j(old) + α [x – w_j(old)]
   else: w_j(new) = w_j(old) – α [x – w_j(old)]
7. reduce α

Example

• (1 1 -1 -1) belongs to category 1

• (-1 -1 -1 1) belongs to category 2

• (-1 -1 1 1) belongs to category 2

• (1 -1 -1 -1) belongs to category 1

• (-1 1 1 -1) belongs to category 2

• 2 output units: y_1 represents category 1, y_2 represents category 2

Example

• Initial weights (where did these come from? they are the first two training examples):

  w_1 = (1 1 -1 -1)
  w_2 = (-1 -1 -1 1)

• α = .1

Example

• Present training example 3, (-1 -1 1 1). It belongs to category 2.

• D(1) = (1 + 1)² + (1 + 1)² + (-1 - 1)² + (-1 - 1)² = 16

• D(2) = 4

• Category 2 wins. That is correct!

Example

• w2(new) = (-1 -1 -1 1) + .1[(-1 -1 1 1) - (-1 -1 -1 1)] = (-1 -1 -.8 1)
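
A one-step LVQ sketch in Python that reproduces this update (function and variable names are mine; the class of each output unit is passed in explicitly):

```python
import numpy as np

def lvq_step(weights, classes, x, target, alpha=0.1):
    """One LVQ update: move the winner toward x if its class matches the
    target class, away from x otherwise."""
    j = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # closest w_j
    sign = 1.0 if classes[j] == target else -1.0
    weights[j] += sign * alpha * (x - weights[j])
    return j

w = np.array([[1.0, 1.0, -1.0, -1.0],      # y1, category 1
              [-1.0, -1.0, -1.0, 1.0]])    # y2, category 2
classes = [1, 2]
x = np.array([-1.0, -1.0, 1.0, 1.0])       # training example 3, category 2
winner = lvq_step(w, classes, x, target=2)
print(winner, w[1])                        # 1  [-1.  -1.  -0.8  1. ]
```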

Issues

• How many y i should be used?

• How should we choose the class that each y i should represent?

• LVQ2, LVQ3 are enhancements to LVQ that modify the runner-up sometimes

Counterpropagation

• Hecht-Nielsen, 1987 • There are input, output, and clustering layers • Can be used to compress data • Can be used to approximate functions • Can be used to associate patterns

Stages

• Stage 1: Cluster input vectors • Stage 2: Adapt weights from cluster units to output units

Stage 1 Architecture

Input units x_1 … x_n connect to cluster units z_1 … z_p through weights w (e.g., w_11), and output units y_1 … y_m connect to the cluster units through weights v (e.g., v_11).

Stage 2 Architecture

The winning cluster unit z_j connects to output units x*_1 … x*_n through weights t (e.g., t_j1) and to y*_1 … y*_m through weights v (e.g., v_j1).

Full Counterpropagation

• Stage 1 Algorithm

1. initialize weights, α, β
2. while stopping criteria is false do
3. for each training vector pair do
4. find the z_j that minimizes ||x – w_j|| + ||y – v_j||
   w_j(new) = w_j(old) + α [x – w_j(old)]
   v_j(new) = v_j(old) + β [y – v_j(old)]
5. reduce α, β

Stage 2 Algorithm

1. while stopping criteria is false do
2. for each training vector pair do
3. perform step 4 above
4. t_j(new) = t_j(old) + α [x – t_j(old)]
   v_j(new) = v_j(old) + β [y – v_j(old)]
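A rough Python sketch of both training stages (all names, the initialization from random training pairs, the epoch counts, and the 0.95 decay factor are my own choices; I call the cluster-to-y* weights u to avoid reusing v):

```python
import numpy as np

def train_full_cpn(xs, ys, n_clusters, alpha=0.1, beta=0.1, epochs=50, seed=0):
    """Sketch of full counterpropagation training (stage 1, then stage 2)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(xs), size=n_clusters, replace=False)
    w = xs[idx].astype(float)                  # x-side cluster weights
    v = ys[idx].astype(float)                  # y-side cluster weights

    # Stage 1: move the winning cluster's w and v toward each training pair.
    a, b = alpha, beta
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            j = int(np.argmin(np.linalg.norm(w - x, axis=1) +
                              np.linalg.norm(v - y, axis=1)))
            w[j] += a * (x - w[j])
            v[j] += b * (y - v[j])
        a *= 0.95                              # reduce alpha, beta
        b *= 0.95

    # Stage 2: learn the outgoing weights (t: cluster -> x*, u: cluster -> y*);
    # they end up tracking the cluster weights, as noted on the slides.
    t, u = np.zeros_like(w), np.zeros_like(v)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            j = int(np.argmin(np.linalg.norm(w - x, axis=1) +
                              np.linalg.norm(v - y, axis=1)))
            t[j] += alpha * (x - t[j])
            u[j] += beta * (y - u[j])
    return w, v, t, u

# e.g. the y = 1/x example that follows: one x unit, one y unit, 10 clusters
xs = np.linspace(0.1, 10.0, 100).reshape(-1, 1)
w, v, t, u = train_full_cpn(xs, 1.0 / xs, n_clusters=10)
```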

Partial Example

• Approximate y = 1/x on [0.1, 10.0]

• 1 x unit

• 1 y unit

• 10 z units

• 1 x* unit

• 1 y* unit

Partial Example

• v_11 = .11, w_11 = 9.0

• v_12 = .14, w_12 = 7.0

• …

• v_10,1 = 9.0, w_10,1 = .11

• Test x = .12: predict 9.0.

• In this example, the output weights will converge to the cluster weights.

Forward Only Counterpropagation

• Sometimes the function y = f(x) is not invertible.

• Architecture (only 1 z unit active): inputs x_1 … x_n connect to cluster units z_1 … z_p, which connect to outputs y_1 … y_m.

Stage 1 Algorithm

1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3. for each input vector do
4. find minimum ||x – w||; w(new) = w(old) + α [x – w(old)]
5. reduce α

Stage 2 Algorithm

1. while stopping criteria is false do
2. for each training vector pair do
3. find minimum ||x – w||; w(new) = w(old) + α [x – w(old)]; v(new) = v(old) + β [y – v(old)]
4. reduce β

Note: interpolation is possible.
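
A rough Python sketch of forward-only counterpropagation (function names, initialization, epoch counts, and decay factors are my own choices; the y = 1/x usage at the end mirrors the example that follows):

```python
import numpy as np

def train_forward_only_cpn(xs, ys, n_clusters, alpha=0.1, beta=0.6,
                           epochs=50, seed=0):
    """Sketch of forward-only CPN (w: x -> cluster, v: cluster -> y)."""
    rng = np.random.default_rng(seed)
    w = xs[rng.choice(len(xs), size=n_clusters, replace=False)].astype(float)
    v = np.zeros((n_clusters, ys.shape[1]))

    # Stage 1: cluster the inputs alone.
    for _ in range(epochs):
        for x in xs:
            j = int(np.argmin(np.linalg.norm(w - x, axis=1)))
            w[j] += alpha * (x - w[j])
        alpha *= 0.95                          # reduce alpha

    # Stage 2: keep refining w, and learn the cluster-to-output weights v.
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            j = int(np.argmin(np.linalg.norm(w - x, axis=1)))
            w[j] += alpha * (x - w[j])
            v[j] += beta * (y - v[j])
        beta *= 0.95                           # reduce beta
    return w, v

def forward_only_predict(w, v, x):
    """Only one z unit is active: return the winning cluster's output weights."""
    return v[int(np.argmin(np.linalg.norm(w - x, axis=1)))]

# e.g. y = 1/x on [0.1, 10.0] with 10 cluster units, as in the example below
xs = np.linspace(0.1, 10.0, 200).reshape(-1, 1)
w, v = train_forward_only_cpn(xs, 1.0 / xs, n_clusters=10)
print(forward_only_predict(w, v, np.array([2.0])))   # somewhere near 1/2
```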

Example

• y = f(x) over [0.1, 10.0]

• 10 z_i units

• After phase 1, the z_i cluster weights ≈ 0.5, 1.5, …, 9.5

• After phase 2, the z_i output weights ≈ 5.5, 0.75, …, 0.1