Supervised learning network
G.Anuradha
Architecture
• Earlier attempts to build intelligent and self-learning systems used simple components
• Used to solve simple classification problems
• Used by Rosenblatt to explain the pattern-recognition abilities of biological visual systems
[Diagram: Sensory Unit → Associator Unit (binary activation function) → Response Unit (activations +1, 0, -1)]
Quiz
• Which of these features would probably not be useful for classifying handwritten digits from binary images?
– Raw pixels from the images
– A set of strokes that can be combined to form various digits
– The day of the year on which the digits were drawn
– The number of pixels set to one
Perceptron networks-Theory
• A single-layer feed-forward network
1. It has 3 units: input (sensory), hidden (associator unit), and output (response unit)
2. The input-to-hidden weights are fixed values of -1, 0, or 1, assigned at random
3. The associator unit uses a binary activation function
4. The output unit has a (1, 0, -1) activation, a binary step function with threshold θ
The output of the perceptron is y = f(yin), where
f(yin) =  1 if yin > θ
          0 if -θ ≤ yin ≤ θ
         -1 if yin < -θ
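A minimal sketch of this tri-valued step activation in Python (the default threshold θ = 0.2 is an illustrative assumption):

    def step_activation(y_in, theta=0.2):
        """Tri-valued perceptron step function with threshold theta."""
        if y_in > theta:
            return 1
        elif y_in < -theta:
            return -1
        return 0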
Perceptron theory
5. Weights between the hidden and output unit are updated
6. The error between the hidden and output layer is checked
7. Error = target - calculated output
8. Weights are adjusted in case of error:
wi(new) = wi(old) + α t xi
b(new) = b(old) + α t
α is the learning rate, ‘t’ is the target, which is -1 or 1.
If there is no error there is no weight change, and training is stopped
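As a quick worked step (assuming α = 1, zero initial weights and bias, input x = (1, 1) with target t = 1, and a computed output y = 0): since y ≠ t, the update gives w1(new) = 0 + 1·1·1 = 1, w2(new) = 0 + 1·1·1 = 1, and b(new) = 0 + 1·1 = 1.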
Single classification perceptron
network
[Diagram: single-output perceptron - a bias input 1 with weight b and inputs x1 … xi … xn (units X1 … Xi … Xn) connect through weights w1 … wi … wn to the single output unit Y, which produces y]
Perceptron training algo for single
output classes
• Step 0: Initialize the weights, bias, and learning rate (between 0 and 1)
• Step 1: Perform steps 2-6 while the final stopping condition is false
• Step 2: Perform steps 3-5 for each training pair indicated by s:t
• Step 3: The input layer is applied with the identity activation function: xi = si
• Step 4: Calculate the net input yin = b + Σ xi wi and the output y = f(yin), where
f(yin) =  1 if yin > θ
          0 if -θ ≤ yin ≤ θ
         -1 if yin < -θ
Perceptron training algo for single
output classes
• Step 5: Weight and bias adjustment: Compare the actual output with the desired (target) value.
If y ≠ t:
wi(new) = wi(old) + α t xi
b(new) = b(old) + α t
else:
wi(new) = wi(old)
b(new) = b(old)
• Step 6: Train the network until there is no weight change. This is the stopping condition for the network. If it is not met, start again from Step 2.
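A minimal runnable sketch of Steps 0-6 in Python; the function name train_perceptron and the defaults α = 1.0, θ = 0.2, and max_epochs = 100 are illustrative assumptions, not part of the original algorithm statement:

    def train_perceptron(samples, alpha=1.0, theta=0.2, max_epochs=100):
        """Single-output perceptron training (Steps 0-6).
        samples: list of (inputs, target) pairs with targets in {-1, 1}."""
        n = len(samples[0][0])
        w = [0.0] * n                          # Step 0: initialize weights
        b = 0.0                                # ... and bias
        for _ in range(max_epochs):            # Step 1: repeat until no change
            changed = False
            for x, t in samples:               # Step 2: for each pair s:t
                y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4: net input
                y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
                if y != t:                     # Step 5: adjust on error
                    w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                    b = b + alpha * t
                    changed = True
            if not changed:                    # Step 6: stopping condition
                break
        return w, b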
EXAMPLE
[Flowchart: Start → initialize weights and bias, set α (0 to 1) → for each training pair s:t → activate input units xi = si → calculate the net input → apply the activation function y = f(yin) → if y ≠ t, set wi(new) = wi(old) + α t xi and b(new) = b(old) + α t, else w(new) = w(old) and b(new) = b(old) → if no weights changed, Stop]
Perceptron training algo for multiple
output classes
• Step 0: Initialize the weights, biases, and learning rate suitably
• Step 1: Check for the stopping condition; if it is false, perform steps 2-6
• Step 2: Perform steps 3 to 5 for each bipolar or binary training vector pair s:t
• Step 3: Set the activation (identity) of each input unit, i = 1 to n: xi = si
Perceptron training algo for multiple
output classes
• Step 4: Calculate the output response of each output unit j = 1 to m:
yinj = bj + Σ (i=1 to n) xi wij
Activations are applied over the net input to calculate the output response:
f(yinj) =  1 if yinj > θ
           0 if -θ ≤ yinj ≤ θ
          -1 if yinj < -θ
Perceptron training algo for multiple
output classes
• Step 5: Make adjustments in weights and bias for j = 1 to m and i = 1 to n:
If tj ≠ yj then
wij(new) = wij(old) + α tj xi
bj(new) = bj(old) + α tj
else
wij(new) = wij(old)
bj(new) = bj(old)
• Step 6: Check for the stopping condition; if there is no change in weights, stop the training process
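A sketch of Step 5 in Python for m output units, extending the single-output sketch above (the name update_multi and its calling convention are assumptions):

    def update_multi(w, b, x, t, y, alpha=1.0):
        """Step 5 for multiple output classes.
        w: n x m weight matrix, b: length-m biases,
        t, y: length-m target and computed output vectors."""
        for j in range(len(b)):
            if t[j] != y[j]:                 # adjust only the erring output unit
                for i in range(len(x)):
                    w[i][j] += alpha * t[j] * x[i]
                b[j] += alpha * t[j]
        return w, b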
Example of AND
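The worked values are not recoverable from this transcript; a typical run using the train_perceptron sketch above on AND with bipolar inputs and targets (an assumption about the intended example):

    # AND with bipolar inputs/targets: output is 1 only for (1, 1)
    and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
    w, b = train_perceptron(and_samples)
    print(w, b)   # with the assumed defaults, converges in two epochs to w = [1.0, 1.0], b = -1.0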
Linear separability
• The perceptron network is used to realize the concept of linear separability.
• The separating line is based on the threshold θ.
• The condition for separating the region of positive response from the region of zero response is
w1x1 + w2x2 + b > θ
• The condition for separating the region of zero response from the region of negative response is
w1x1 + w2x2 + b < -θ
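On the positive side the decision boundary is the line w1x1 + w2x2 + b = θ; solving for x2 (assuming w2 ≠ 0) gives x2 = -(w1/w2)x1 - (b - θ)/w2, which is the separating line that the weights and bias define.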
What binary threshold neurons cannot do
• A binary threshold output unit cannot even tell if two single-bit features are the same!
Positive cases (same): (1,1) → 1; (0,0) → 1
Negative cases (different): (1,0) → 0; (0,1) → 0
• The four input-output pairs give four inequalities that are impossible to satisfy:
w1 + w2 ≥ θ,  0 ≥ θ
w1 < θ,  w2 < θ
Adding the two strict inequalities gives w1 + w2 < 2θ, which together with w1 + w2 ≥ θ forces θ > 0, contradicting 0 ≥ θ.
[Diagram: threshold unit with inputs x1, x2, weights w1, w2, and bias -θ from a constant input 1]
A geometric view of what binary threshold neurons cannot do
Imagine a “data-space” in which the axes correspond to the components of an input vector.
– Each input vector is a point in this space.
– A weight vector defines a plane in data-space.
– The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold (more precisely, θ/‖w‖, which equals θ when w is a unit vector).
[Diagram: the four points (0,0), (0,1), (1,0), (1,1) in data-space; the positive cases (0,0) and (1,1) and the negative cases (0,1) and (1,0) cannot be separated by a plane]
Discriminating simple patterns
under translation with wrap-around
• Suppose we just use pixels as the features.
• Can a binary threshold unit discriminate between different patterns that have the same number of on pixels?
– Not if the patterns can translate with wrap-around!
[Diagram: three translated copies of pattern A and three of pattern B, each copy with the same number of on pixels]
Learning with hidden units
• For such linearly non-separable problems we require an additional layer, called the hidden layer.
• Networks without hidden units are very limited in the input-output mappings they can learn to model.
• We need multiple layers of adaptive, non-linear hidden units.
Solution to EXOR problem
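The slide's figure is not recoverable here, but one standard construction (an assumption, not necessarily the slide's exact network) realizes XOR with two hidden threshold units - z1 active only for input (1, 0), z2 only for (0, 1) - and an output unit that ORs them:

    def step(v):
        """Binary step: 1 if v > 0 else 0."""
        return 1 if v > 0 else 0

    def xor(x1, x2):
        z1 = step(x1 - x2 - 0.5)        # hidden unit: fires only for (1, 0)
        z2 = step(x2 - x1 - 0.5)        # hidden unit: fires only for (0, 1)
        return step(z1 + z2 - 0.5)      # output unit: OR of z1 and z2

    print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]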
ADALINE
• A network with a single linear unit is called an ADALINE (ADAptive LINear Neuron)
• The input-output relationship is linear
• Uses bipolar activation for its input signals and its target output
• The weights between the input and the output are adjustable, and there is only one output unit
• Trained using the delta rule, also called the least mean squares (LMS) or Widrow-Hoff rule
Architecture
• Delta rule for a single output unit
– Minimize the error over all training patterns.
– This is done by reducing the error for each pattern, one at a time.
• The delta rule for adjusting the weight of the ith pattern (i = 1 to n) is
Δwi = α (t - yin) xi
• The delta rule in the case of several output units, for adjusting the weight from the ith input unit to the jth output unit, is
Δwij = α (tj - yinj) xi
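This update is gradient descent on the squared error of a single pattern: with E = ½(t - yin)² and yin = b + Σ xi wi, the derivative is ∂E/∂wi = -(t - yin) xi, so Δwi = -α ∂E/∂wi = α (t - yin) xi.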
Difference between Perceptron and Delta Rule
Perceptron rule:
• Originates from the Hebbian assumption
• Stops after a finite number of learning steps
Delta rule:
• Derived from the gradient-descent method
• Continues forever, converging asymptotically to the solution
• Minimizes the error over all training patterns
Architecture
[Diagram: ADALINE architecture - a bias input x0 = 1 with weight b and inputs x1, x2, …, xn (units X1, X2, …, Xn) with weights w1, w2, …, wn feed the net input yin = b + Σ xi wi; an output error generator compares yin with the target t to form e = t - yin, which drives the adaptive algorithm; the output is f(yin)]
[Flowchart: Start → initialize weights, bias, and α → input the specified tolerance error Es → for each training pair s:t → activate input units xi = si → calculate the net input yin = b + Σ xi wi → update wi(new) = wi(old) + α (t - yin) xi and b(new) = b(old) + α (t - yin) → calculate the error Ei = Σ(t - yin)² → if Ei = Es, Stop]
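A minimal ADALINE training loop in Python following this flowchart; the name train_adaline and the stopping test (epoch error at or below the tolerance Es, rather than exactly equal to it) are assumptions:

    def train_adaline(samples, alpha=0.1, es=0.05, max_epochs=1000):
        """Delta-rule (LMS) training of a single linear unit.
        samples: list of (inputs, target) pairs with bipolar values.
        Stops when the summed squared error over an epoch reaches es."""
        n = len(samples[0][0])
        w = [0.0] * n
        b = 0.0
        for _ in range(max_epochs):
            e_i = 0.0
            for x, t in samples:
                y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # net input
                delta = alpha * (t - y_in)                        # delta rule
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                e_i += (t - y_in) ** 2                            # Ei = Σ(t - yin)²
            if e_i <= es:                                         # tolerance Es
                break
        return w, b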
Madaline
• Two or more ADALINEs are integrated to develop the MADALINE model
• Used for nonlinearly separable logic functions such as the EX-OR function
• Used for adaptive noise cancellation and adaptive inverse control
• In noise cancellation, the objective is to filter out an interference component by identifying a linear model of a measurable noise source and the corresponding immeasurable interference.
• Applications include ECG processing and echo elimination from long-distance telephone transmission lines