Transcript Document

Before we start ADALINE

Test the response of your Hebb net and your Perceptron on the
following noisy version of the input

Exercise: p. 98, 2.6(d)
ADALINE



ADAPTIVE LINEAR NEURON
Typically uses bipolar (1, -1) activations for its input
signals and its target output
The weights are adjustable, and there is a bias whose activation is
always 1
Figure: Architecture of an ADALINE: input units X1, …, Xn feed the single
output unit Y through adjustable weights w1, …, wn, and a bias unit with
constant activation 1 feeds Y through the bias b.
ADALINE

In general, ADALINE can be trained using the delta rule,
also known as the least mean squares (LMS) or Widrow-Hoff
rule

The delta rule can also be used for single-layer nets with
several output units

ADALINE is the special case with only one output unit
ADALINE

 The activation of the unit is its net input (the identity function is used
as the activation function)
 The learning rule minimizes the mean squared error between the activation
and the target value
 This allows the net to continue learning on all training patterns, even
after the correct output value is generated
ADALINE

After training, if the net is being used for pattern
classification in which the desired output is either a +1 or
a -1, a threshold function is applied to the net input to
obtain the activation
If net_input ≥ 0 then activation = 1
Else activation = -1
The Algorithm
Step 0: Initialize all weights and bias
        (small random values are usually used)
        Set learning rate α (0 < α ≤ 1)
Step 1: While stopping condition is false, do Steps 2-6.
  Step 2: For each bipolar training pair s:t, do Steps 3-5
    Step 3. Set activations for input units, i = 1, …, n:
            xi = si
    Step 4. Compute net input to the output unit:
            y_in = b + Σi xi wi
The Algorithm
    Step 5. Update weights and bias, i = 1, …, n:
            wi(new) = wi(old) + α (t – y_in) xi
            b(new) = b(old) + α (t – y_in)
  Step 6. Test stopping condition:
          If the largest weight change that occurred in Step 2 is smaller
          than a specified tolerance, then stop; otherwise continue.
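The training loop above translates directly into a short program. The following
is a minimal Python sketch of Steps 0-6 under the delta rule; the function name
adaline_train, the tolerance value, and the initialization range are
illustrative choices, not part of the original algorithm statement.

    import random

    def adaline_train(patterns, alpha=0.1, tol=1e-4, max_epochs=1000):
        """Train a single ADALINE with the delta (LMS / Widrow-Hoff) rule.

        patterns: list of (x, t) pairs, where x is a list of bipolar inputs
                  and t is the bipolar target (+1 or -1).
        Returns the learned weights and bias.
        """
        n = len(patterns[0][0])
        # Step 0: small random weights and bias, learning rate alpha.
        w = [random.uniform(-0.5, 0.5) for _ in range(n)]
        b = random.uniform(-0.5, 0.5)

        for _ in range(max_epochs):                    # Step 1
            largest_change = 0.0
            for x, t in patterns:                      # Step 2
                y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Steps 3-4
                err = t - y_in
                for i in range(n):                     # Step 5: update weights
                    dw = alpha * err * x[i]
                    w[i] += dw
                    largest_change = max(largest_change, abs(dw))
                b += alpha * err                       # ... and the bias
                largest_change = max(largest_change, abs(alpha * err))
            if largest_change < tol:                   # Step 6: stopping condition
                break
        return w, b

For example, adaline_train([([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)])
would train the net on the bipolar AND patterns used later.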
Setting the learning rate α

 It is common to take a small value, α = 0.1, initially
 If α is too large, the learning process will not converge
 If α is too small, learning will be extremely slow
 For a single neuron, a practical range is
   0.1 ≤ n α ≤ 1.0, where n is the number of input units
Application
After training, an ADALINE unit can be used to classify input patterns. If
the target values are bivalent (binary or bipolar), a step function can be
applied as the activation function for the output unit
Step 0: Initialize all weights (use the weights obtained from training)
Step 1: For each bipolar input vector x, do Steps 2-4
  Step 2. Set activations of the input units to x
  Step 3. Compute net input to the output unit:
          y_in = b + Σi xi wi
  Step 4. Apply the activation function:
          f(y_in) =  1  if y_in ≥ 0;
                    -1  if y_in < 0.
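As a companion to the training sketch above, the application procedure reduces
to a single thresholded dot product. The helper below, adaline_classify, is a
hypothetical name used here and in the example checks that follow.

    def adaline_classify(x, w, b):
        """Classify one input vector with a trained ADALINE (Steps 2-4).

        Computes the net input y_in = b + sum_i x_i w_i and applies the
        bipolar step function.
        """
        y_in = b + sum(xi * wi for xi, wi in zip(x, w))
        return 1 if y_in >= 0 else -1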
Example 1

ADALINE for AND function: binary input, bipolar targets
(x1 x2 t)
(1 1 1)
(1 0 -1)
(0 1 -1)
(0 0 -1)

 The delta rule in ADALINE is designed to find weights that minimize
the total error
          4
   E  =   Σ  (x1(p) w1 + x2(p) w2 + w0 – t(p))²
         p=1
where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for
pattern p and t(p) is the associated target for pattern p
Example 1

ADALINE for AND function: binary input, bipolar targets

Delta rule in ADALINE is designed to find weights that minimize
the total error

Weights that minimize this error are w1 = 1, w2 = 1, w0 = -3/2

Separating line: x1 + x2 – 3/2 = 0
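These weights can be checked quickly by thresholding the net input for all four
binary patterns; the snippet below reuses the hypothetical adaline_classify
helper sketched earlier.

    # Least-squares weights for AND: binary inputs, bipolar targets.
    w, b = [1.0, 1.0], -1.5
    for x, t in [([1, 1], 1), ([1, 0], -1), ([0, 1], -1), ([0, 0], -1)]:
        assert adaline_classify(x, w, b) == t   # every pattern classified correctly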
Example 2

ADALINE for AND function: bipolar input, bipolar targets
(x1 x2 t)
(1 1 1)
(1 -1 -1)
(-1 1 -1)
(-1 -1 -1)

 The delta rule in ADALINE is designed to find weights that minimize
the total error
          4
   E  =   Σ  (x1(p) w1 + x2(p) w2 + w0 – t(p))²
         p=1
where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for
pattern p and t(p) is the associated target for pattern p
Example 2

ADALINE for AND function: bipolar input, bipolar targets

Weights that minimize this error are w1 = 1/2, w2 = 1/2, w0 = -1/2

Separating line: (1/2) x1 + (1/2) x2 – 1/2 = 0
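The same check works for the bipolar version, again using the hypothetical
adaline_classify helper.

    # Least-squares weights for AND: bipolar inputs, bipolar targets.
    w, b = [0.5, 0.5], -0.5
    for x, t in [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]:
        assert adaline_classify(x, w, b) == t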
Example

Example 3: ADALINE for AND NOT function: bipolar input,
bipolar targets

Example 4: ADALINE for OR function: bipolar input, bipolar
targets
Derivations

Delta rule for single output unit



 The delta rule changes the weights of the connections so as to minimize
the difference between the net input to the output unit and the target value
 It does this by reducing the error for each pattern, one at a time
 The delta rule for the I-th weight (for each pattern) is
   ΔwI = α (t – y_in) xI
Derivations
 The squared error for a particular training pattern is
   E = (t – y_in)²
   E is a function of all of the weights wi, i = 1, …, n

 The gradient of E is the vector consisting of the partial derivatives
of E with respect to each of the weights

 The gradient gives the direction of most rapid increase in E

 The opposite direction gives the most rapid decrease in the error

 The error can be reduced by adjusting the weight wI in the direction of
   -∂E/∂wI
Derivations

 Since
   y_in = Σi xi wi ,

   ∂E/∂wI = -2 (t – y_in) ∂y_in/∂wI = -2 (t – y_in) xI

 The local error will be reduced most rapidly
by adjusting the weights according to the delta rule
   ΔwI = α (t – y_in) xI
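The result can also be verified numerically: a finite-difference estimate of
∂E/∂wI should agree with -2 (t – y_in) xI. The values below (inputs, weights,
bias, target, step size) are arbitrary illustration choices.

    # Numerical check that dE/dw_I = -2 (t - y_in) x_I for E = (t - y_in)^2.
    x, w, b, t = [1.0, -1.0], [0.3, 0.7], 0.1, 1.0
    eps, I = 1e-6, 0

    def error(weights):
        y_in = b + sum(xi * wi for xi, wi in zip(x, weights))
        return (t - y_in) ** 2

    w_plus = list(w)
    w_plus[I] += eps
    numeric = (error(w_plus) - error(w)) / eps         # finite-difference slope
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    analytic = -2 * (t - y_in) * x[I]                  # value from the derivation
    assert abs(numeric - analytic) < 1e-4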
Derivations

 Delta rule for multiple output units

 The delta rule for the weight wIJ (from the I-th input unit to the J-th
output unit), for each pattern, is
   ΔwIJ = α (tJ – y_inJ) xI
Derivations

 The squared error for a particular training pattern is
          m
   E  =   Σ  (tj – y_inj)²
         j=1
   E is a function of all of the weights

 The error can be reduced by adjusting the weight wIJ in the direction of
-∂E/∂wIJ, where
   ∂E/∂wIJ = ∂/∂wIJ [ Σj (tj – y_inj)² ] = ∂/∂wIJ (tJ – y_inJ)²
since only the J-th term of the sum depends on wIJ
Continued pp 88
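Read this way, the delta rule for several output units simply applies the
single-unit rule to each output independently. The function below is a minimal
sketch of one such update step; its name and argument layout are illustrative,
not from the text.

    def delta_rule_step(x, targets, W, b, alpha=0.1):
        """One delta-rule update for a single-layer net with m output units.

        x: list of n inputs; targets: list of m targets;
        W: n-by-m weight matrix (list of rows); b: list of m biases.
        """
        n, m = len(x), len(targets)
        for J in range(m):
            y_in_J = b[J] + sum(x[I] * W[I][J] for I in range(n))
            err = targets[J] - y_in_J
            for I in range(n):
                W[I][J] += alpha * err * x[I]   # Δw_IJ = α (t_J – y_in_J) x_I
            b[J] += alpha * err
        return W, b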
Exercise


http://www.neural-networks-at-yourfingertips.com/adaline.html
Adaline Network Simulator
MADALINE

MANY ADAPTIVE LINEAR NEURONS
Figure: Architecture of a MADALINE with two hidden ADALINEs and one output
ADALINE: input units X1 and X2 feed hidden units Z1 and Z2 through weights
w11, w21, w12, w22 (with biases b1 and b2), and Z1 and Z2 feed the output
unit Y through weights v1 and v2 (with bias b3).
MADALINE

 The derivation of the delta rule for several output units shows no change
in the training process when several ADALINEs are combined

 The outputs of the two hidden ADALINEs, z1 and z2, are
determined by signals from the input units X1 and X2

 Each output signal is the result of applying a threshold function
to the unit's net input

 y is a non-linear function of the input vector (x1, x2)
MADALINE

Why do we need hidden units?
 The use of the hidden units Z1 and Z2 gives the net computational
capabilities not found in single-layer nets
 But it also complicates the training process

Two training algorithms
 MRI – only the weights for the hidden ADALINEs are adjusted;
the weights for the output unit are fixed
 MRII – provides methods for adjusting all weights in the net
ALGORITHM: MRI
(Figure: the MADALINE architecture shown above, repeated for reference.)
The weights v1 and v2 and bias b3
that feed into the output unit Y are
determined so that the response of
unit Y is 1 if the signal it receives
from either Z1 or Z2 (or both) is 1
and is -1 if both Z1 and Z2 send a
signal of -1. The unit Y performs
the logic function OR on the signals
it receives from Z1 and Z2
Set v1 = ½, v2 = ½ and b3 = ½
(see Example 2.19, the OR function)
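With these fixed values, y_in = ½ + ½ z1 + ½ z2, and the bipolar step of that
quantity is the OR of z1 and z2, as the short check below illustrates.

    # Check that the fixed output weights make Y compute OR of z1 and z2.
    for z1 in (1, -1):
        for z2 in (1, -1):
            y = 1 if (0.5 + 0.5 * z1 + 0.5 * z2) >= 0 else -1
            assert y == (1 if (z1 == 1 or z2 == 1) else -1)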
ALGORITHM: MRI (worked example)

Training patterns (x1, x2, t), the XOR function:
  ( 1,  1, -1)
  ( 1, -1,  1)
  (-1,  1,  1)
  (-1, -1, -1)

Set α = 0.5

Initial weights (small values) into the hidden units, and the fixed weights
into the output unit:
  Into Z1 (w11, w21, b1):  .05, .2, .3
  Into Z2 (w12, w22, b2):  .1, .2, .15
  Into Y  (v1, v2, b3):    .5, .5, .5
    (v1 = ½, v2 = ½ and b3 = ½ are fixed so that Y computes OR;
     see Example 2.19, the OR function)
Step 0: Initialize weights:
        the weights v1, v2 and the bias b3 into the output unit Y are set
        as described above; small random values are usually used for the
        weights into the hidden units
        Set learning rate α (0 < α ≤ 1)
Step 1: While stopping condition is false, do Steps 2-8.
        (The activation function used in Steps 5 and 6 is
           f(x) =  1  if x ≥ 0;
                  -1  if x < 0.)
  Step 2: For each bipolar training pair s:t, do Steps 3-7
    Step 3. Set activations for input units:
            xi = si
    Step 4. Compute net input to each hidden ADALINE unit:
            z_in1 = b1 + x1 w11 + x2 w21
            z_in2 = b2 + x1 w12 + x2 w22
    Step 5. Determine output of each hidden ADALINE:
            z1 = f(z_in1)
            z2 = f(z_in2)
    Step 6. Determine output of net:
            y_in = b3 + z1 v1 + z2 v2
            y = f(y_in)
The Algorithm
    Step 7. Update weights and bias if an error occurred for this pattern:
            If t = y, no weight updates are performed;
            otherwise:
            If t = 1, then update the weights on ZJ, the unit whose net input
            is closest to 0:
              wiJ(new) = wiJ(old) + α (1 – z_inJ) xi
              bJ(new) = bJ(old) + α (1 – z_inJ)
            If t = -1, then update the weights on all units ZK that have
            positive net input:
              wiK(new) = wiK(old) + α (-1 – z_inK) xi
              bK(new) = bK(old) + α (-1 – z_inK)
  Step 8. Test stopping condition:
          If weight changes have stopped (or reached an acceptable level),
          or if a specified maximum number of weight-update iterations (Step 2)
          have been performed, then stop; otherwise continue.
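Putting Steps 0-8 together, the following is one possible Python rendering of
MRI for the two-hidden-unit MADALINE above; the function name, the
initialization range, and the epoch limit are illustrative assumptions rather
than part of the algorithm as stated. Calling madaline_mri on the four XOR
pairs from the worked example would be the intended use.

    import random

    def step(x):
        """Bipolar step function f(x)."""
        return 1 if x >= 0 else -1

    def madaline_mri(patterns, alpha=0.5, max_epochs=100):
        """MRI training for a MADALINE with two hidden ADALINEs and one output.

        patterns: list of ((x1, x2), t) bipolar training pairs.
        Only the weights into Z1 and Z2 are adjusted; v1, v2, b3 stay fixed
        so that Y computes OR of z1 and z2.
        """
        # Step 0: small random weights into the hidden units; fixed output weights.
        w = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]  # w[i][j]: X(i+1) -> Z(j+1)
        b = [random.uniform(-0.5, 0.5) for _ in range(2)]
        v1 = v2 = b3 = 0.5

        for _ in range(max_epochs):                        # Step 1
            changed = False
            for (x1, x2), t in patterns:                   # Steps 2-3
                # Step 4: net inputs to hidden units; Step 5: their outputs.
                z_in = [b[0] + x1 * w[0][0] + x2 * w[1][0],
                        b[1] + x1 * w[0][1] + x2 * w[1][1]]
                z = [step(z_in[0]), step(z_in[1])]
                # Step 6: output of the net.
                y = step(b3 + z[0] * v1 + z[1] * v2)
                # Step 7: update only when an error occurred.
                if t == y:
                    continue
                changed = True
                x = (x1, x2)
                if t == 1:
                    # Update the hidden unit whose net input is closest to 0.
                    J = 0 if abs(z_in[0]) < abs(z_in[1]) else 1
                    for i in range(2):
                        w[i][J] += alpha * (1 - z_in[J]) * x[i]
                    b[J] += alpha * (1 - z_in[J])
                else:
                    # Update every hidden unit with positive net input.
                    for K in range(2):
                        if z_in[K] > 0:
                            for i in range(2):
                                w[i][K] += alpha * (-1 - z_in[K]) * x[i]
                            b[K] += alpha * (-1 - z_in[K])
            if not changed:                                # Step 8
                break
        return w, b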