Neural Networks I
Karel Berkovec karel.berkovec (at) seznam.cz
Artificial Intelligence
• Symbolic approach – expert systems, mathematical logic, production systems, Bayesian networks, …
• Connectionist approach – neural networks
• Adaptive approach – stochastic methods
• Analytic approach – regression, interpolation, frequency analysis, …
Is it really working?
• Is it a standard mechanism?
• What is it good for?
• Does anyone use it for real applications?
• Can I grasp how it works?
• Can I use it?
This presentation
• Basic introduction
• Small history window
• Model of the neuron and of the neural network
• Supervised learning (backpropagation)

Not covered: biology, mathematical foundations, unsupervised learning, stochastic models, neurocomputers, etc.
History I
• 40s – von Neumann computer model
• 1943 – Warren McCulloch and Walter Pitts – mathematical model of the neuron
• 1946 – ENIAC
• 1949 – Donald Hebb – The Organization of Behavior
• 1951 – 1st Czechoslovak computer SAPO
• 1951 – 1st neurocomputer Snark
• 1957 – Frank Rosenblatt – perceptron algorithm + learning
• 1958 – Rosenblatt and Charles Wightman – 1st really used neurocomputer Mark I Perceptron
History II
• 60s – ADALINE
• 1st company oriented on neurocomputing
• Exhaustion of the field's potential
• 1969 – Marvin Minsky & Seymour Papert – Perceptrons: the XOR problem cannot be solved by a single perceptron
History III
• 1983 – DARPA
• 1982, 1984 – John Hopfield – physical models & NNs
• 1986 – David Rumelhart, Geoffrey Hinton, Ronald Williams – backpropagation
  – 1969 – Arthur Bryson & Yu-Chi Ho
  – 1974 – Paul Werbos
  – 1985 – David Parker
• 1987 – IEEE International Conference on Neural Networks
• Since the 90s – boom of NNs – ART, BAM, RBF, spiking neurons
Present
• Many models of the neuron – perceptron, RBF, spiking neuron, …
• Many approaches – backpropagation, Hopfield learning, correlations, competitive learning, stochastic learning, …
• Many libraries and modules – for Matlab, Statistica, Excel, …
• Many applications – forecasting, smoothing, recognition, classification, data mining, compression, …
Pros and cons
+ Simple to use
+ Very good results
+ Fast results
+ Robust against incomplete or corrupted inputs
+ Generalization
+/- Mathematical background
- Not transparent and traceable
- Hard to tune parameters (sometimes hair-trigger sensitive)
- Learning sometimes takes a long time
- Some tasks are hard to formulate for NNs
Formal neuron - perceptron
[Figure: a perceptron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, and binary output]

$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i \ge \theta \\ 0 & \text{otherwise} \end{cases}$$

• $\xi = \sum_i w_i x_i$ – potential
• $\theta$ – threshold
• $w_i$ – weights
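A minimal sketch of this formal neuron in Python; the function name and the AND example are illustrative, not from the slides:

```python
import numpy as np

def perceptron(x, w, theta):
    """Formal neuron: fires 1 if the potential sum(w_i * x_i)
    reaches the threshold theta, otherwise 0."""
    xi = np.dot(w, x)          # potential: sum of weighted inputs
    return 1 if xi >= theta else 0

# Example: a neuron computing logical AND of two binary inputs.
w = np.array([1.0, 1.0])
print(perceptron(np.array([1, 1]), w, theta=1.5))  # -> 1
print(perceptron(np.array([0, 1]), w, theta=1.5))  # -> 0
```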
AB problem
XOR problem
XOR problem
[Figure: neuron 1 draws the first separating line in the $(x_1, x_2)$ plane]
XOR problem
[Figure: neuron 2 draws the second separating line in the $(x_1, x_2)$ plane]
XOR problem
[Figure: the two separating lines combined; $XOR(x_1, x_2)$ is computed as the AND of the outputs of neurons 1 and 2]
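A hedged sketch of this construction in Python: two hidden threshold neurons each cut the plane with one line, and an AND-style output neuron combines them. The exact weights on the original figure are not recoverable, so the values below are merely one working choice:

```python
def step(xi, theta):
    """Heaviside threshold unit."""
    return 1 if xi >= theta else 0

def xor_net(x1, x2):
    """XOR(x1, x2) built from three threshold neurons."""
    h1 = step(x1 + x2, theta=0.5)      # fires for x1 OR x2
    h2 = step(-x1 - x2, theta=-1.5)    # fires unless both inputs are 1
    return step(h1 + h2, theta=1.5)    # AND of the two half-planes

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))         # 0, 1, 1, 0
```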
Feed-forward layered network
$$NN : X \to Y$$

[Figure: a layered network with an input layer ($x_1, \dots, x_5$), a 1st and a 2nd hidden layer, and an output layer ($y_1, y_2, y_3$)]
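A minimal forward-pass sketch of such a layered network in Python, assuming sigmoidal units; the 5-inputs/3-outputs shape follows the figure, while the hidden-layer sizes and random weights are placeholder assumptions:

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

def forward(x, weights):
    """One pass through a feed-forward layered network: each layer
    multiplies its input by a weight matrix and applies the
    activation function."""
    y = x
    for W in weights:
        y = sigmoid(W @ y)
    return y

rng = np.random.default_rng(0)
sizes = [5, 4, 4, 3]   # input, two hidden layers, output
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes, sizes[1:])]
print(forward(rng.normal(size=5), weights))   # three outputs y1..y3
```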
Activation function

[Figure: a neuron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, potential $\xi = \sum_i w_i x_i$, and output $y(\xi)$]

• Heaviside function: $y(\xi) = 1$ for $\xi \ge 0$, else $0$
• Saturated linear function: $y(\xi) = 0$ for $\xi \le 0$, $\xi$ for $0 < \xi < 1$, $1$ for $\xi \ge 1$
• Standard sigmoidal function: $y(\xi) = \dfrac{1}{1 + e^{-\xi}}$
• Hyperbolic tangent: $y(\xi) = \dfrac{1 - e^{-\xi}}{1 + e^{-\xi}}$
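The four functions, sketched in Python with NumPy (function names are illustrative; slope parameters are omitted, matching the formulas above):

```python
import numpy as np

def heaviside(xi):
    """Step function: 0 below zero, 1 at or above it."""
    return np.where(xi >= 0, 1.0, 0.0)

def saturated_linear(xi):
    """Linear between 0 and 1, clipped outside that band."""
    return np.clip(xi, 0.0, 1.0)

def sigmoid(xi):
    """Standard sigmoidal function 1 / (1 + e^-xi)."""
    return 1.0 / (1.0 + np.exp(-xi))

def tanh_like(xi):
    """Hyperbolic-tangent-shaped function (1 - e^-xi) / (1 + e^-xi)."""
    return (1.0 - np.exp(-xi)) / (1.0 + np.exp(-xi))

xi = np.array([-2.0, 0.0, 2.0])
for f in (heaviside, saturated_linear, sigmoid, tanh_like):
    print(f.__name__, f(xi))
```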
NN function
• A NN maps inputs onto outputs: $NN : X \to Y$
• A feed-forward NN with one hidden layer and a sigmoidal activation function can approximate any continuous function arbitrarily closely

The question is how to set up the parameters of the network.
NN learning
• Error function:

$$E = \frac{1}{2} \sum_{k=1}^{p} \sum_{o \in O} (y_{ko} - d_{ko})^2$$

• Perceptron adaptation rule:

$$w_{ji}(t+1) = w_{ji}(t) - x_{ki} \left( y_j(\mathbf{w}(t), \mathbf{x}_k) - d_{kj} \right)$$

i.e.

$$w_{ji}(t+1) = w_{ji}(t) + x_{ki} \quad \text{if } y = 0,\ d = 1$$
$$w_{ji}(t+1) = w_{ji}(t) - x_{ki} \quad \text{if } y = 1,\ d = 0$$

• An algorithm with this learning rule converges in finite time (if A and B are linearly separable)
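A minimal sketch of this adaptation rule as a training loop in Python; the bias-column trick and the AND data set are illustrative additions, not from the slides:

```python
import numpy as np

def train_perceptron(X, d, epochs=100):
    """Perceptron adaptation rule: w(t+1) = w(t) - x_k * (y - d_k),
    i.e. add x_k when y=0, d=1 and subtract x_k when y=1, d=0.
    Converges in finitely many steps when the two classes are
    linearly separable. A constant-1 input absorbs the threshold."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for x_k, d_k in zip(Xb, d):
            y = 1 if np.dot(w, x_k) >= 0 else 0
            if y != d_k:
                w -= x_k * (y - d_k)
                errors += 1
        if errors == 0:                        # all patterns correct
            break
    return w

# Linearly separable toy data: AND is separable, unlike XOR.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d = np.array([0, 0, 0, 1])
print(train_perceptron(X, d))
```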
AB problem
Backpropagation
• The most frequently used learning algorithm for NNs – roughly 80% of applications
• Fast convergence
• Good results
• Many modifications
Energy function

• How do we adapt the weights of neurons in the hidden layers?
• We would like to find a minimum of the error function – so why not use the derivative?

$$E = \sum_{k=1}^{p} E_k = \frac{1}{2} \sum_{k=1}^{p} \sum_{o \in O} (y_{ko} - d_{ko})^2$$
Error gradient
Adaptation rule:

$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$$

$$\Delta w_{ij}(t) = -\varepsilon \frac{\partial E}{\partial w_{ij}(t)} = -\varepsilon \sum_{k} \frac{\partial E_k}{\partial w_{ij}(t)}$$

where $\varepsilon > 0$ is the learning rate.
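As a sanity check on the gradient idea, here is a small sketch that estimates $\partial E / \partial w_i$ for a single sigmoid neuron by finite differences and takes one descent step; all names and values are illustrative:

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

def E(w, x, d):
    """Error of a single sigmoid neuron, E = 1/2 (y - d)^2."""
    y = sigmoid(np.dot(w, x))
    return 0.5 * (y - d) ** 2

# Finite-difference estimate of dE/dw_i, the quantity the
# adaptation rule steps against.
w = np.array([0.3, -0.2])
x = np.array([1.0, 0.5])
d = 1.0
h = 1e-6
grad = np.array([(E(w + h * e, x, d) - E(w - h * e, x, d)) / (2 * h)
                 for e in np.eye(len(w))])
print(grad)                 # numerical gradient
print(w - 0.5 * grad)       # one descent update with eps = 0.5
```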
Output layer
$$E = \frac{1}{2} \sum_{j \in O} (y_j - d_j)^2 \qquad y_j(\xi_j) = \frac{1}{1 + e^{-\xi_j}} \qquad \xi_j = \sum_i w_{ij} y_i$$

Chain rule:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial \xi_j} \cdot \frac{\partial \xi_j}{\partial w_{ij}}$$

with

$$\frac{\partial E}{\partial y_j} = (y_j - d_j), \qquad \frac{\partial y_j}{\partial \xi_j} = \frac{e^{-\xi_j}}{(1 + e^{-\xi_j})^2} = y_j (1 - y_j), \qquad \frac{\partial \xi_j}{\partial w_{ij}} = y_i$$

so

$$\frac{\partial E}{\partial w_{ij}} = (y_j - d_j) \, \frac{e^{-\xi_j}}{(1 + e^{-\xi_j})^2} \, y_i$$
Hidden layer

$$E = \frac{1}{2} \sum_{o \in O} (y_o - d_o)^2 \qquad y_j(\xi_j) = \frac{1}{1 + e^{-\xi_j}} \qquad \xi_j = \sum_i w_{ij} y_i$$

For a weight $w_{ij}$ into hidden neuron $j$, the error reaches $w_{ij}$ through every output neuron $o$:

$$\frac{\partial E}{\partial w_{ij}} = \sum_{o \in O} \frac{\partial E}{\partial y_o} \cdot \frac{\partial y_o}{\partial \xi_o} \cdot \frac{\partial \xi_o}{\partial y_j} \cdot \frac{\partial y_j}{\partial \xi_j} \cdot \frac{\partial \xi_j}{\partial w_{ij}}$$

With $\delta_o = (y_o - d_o) \, y_o (1 - y_o)$ and $\partial \xi_o / \partial y_j = w_{jo}$:

$$\frac{\partial E}{\partial w_{ij}} = \left( \sum_{o \in O} \delta_o \, w_{jo} \right) y_j (1 - y_j) \, y_i$$
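The same formula for a hidden weight, sketched in Python with made-up values; $\delta_o$ are the output deltas from the previous slide:

```python
import numpy as np

# Hidden-layer gradient: the output deltas flow back through the
# weights w_jo, giving
# dE/dw_ij = (sum_o delta_o * w_jo) * y_j * (1 - y_j) * y_i.
y_i = np.array([1.0, 0.5])     # inputs to hidden neuron j
y_j = 0.7                      # hidden neuron output
y_o = np.array([0.6, 0.3])     # network outputs
d_o = np.array([1.0, 0.0])     # desired outputs
w_jo = np.array([0.4, -0.2])   # weights from neuron j to the outputs

delta_o = (y_o - d_o) * y_o * (1 - y_o)
delta_j = np.dot(delta_o, w_jo) * y_j * (1 - y_j)
print(delta_j * y_i)           # dE/dw_ij for each input weight
```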
Implementation of BP

• initialize the network, $\Delta nw_{ij} := 0$
• repeat
  – update the weights $w_{ij}$
  – for all patterns:
    • compute the output
    • compute the error
    • compute $\Delta w_{ij}(t) = -\varepsilon \dfrac{\partial E_k}{\partial w_{ij}(t)}$
    • $\Delta nw_{ij} \mathrel{+}= \Delta w_{ij}(t)$
• until the error is small enough
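A compact batch implementation of this pseudocode in Python, trained on XOR; the hidden-layer size, learning rate $\varepsilon$, and stopping threshold are illustrative choices, and convergence depends on the random initialization:

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

def train_bp(X, D, n_hidden=3, eps=0.5, max_iter=20000):
    """Batch backpropagation following the pseudocode: accumulate the
    weight changes over all patterns, then apply them. One hidden
    layer; thresholds are folded in as extra constant-1 inputs."""
    rng = np.random.default_rng(1)
    Xb = np.hstack([X, np.ones((len(X), 1))])         # bias column
    W1 = rng.normal(scale=0.5, size=(n_hidden, Xb.shape[1]))
    W2 = rng.normal(scale=0.5, size=(D.shape[1], n_hidden + 1))
    for _ in range(max_iter):
        dW1, dW2, err = np.zeros_like(W1), np.zeros_like(W2), 0.0
        for x, d in zip(Xb, D):                       # for all patterns
            h = sigmoid(W1 @ x)                       # compute the output
            hb = np.append(h, 1.0)
            y = sigmoid(W2 @ hb)
            err += 0.5 * np.sum((y - d) ** 2)         # compute the error
            delta_o = (y - d) * y * (1 - y)           # output deltas
            delta_h = (W2[:, :-1].T @ delta_o) * h * (1 - h)
            dW2 -= eps * np.outer(delta_o, hb)        # accumulate -eps * dE_k/dw
            dW1 -= eps * np.outer(delta_h, x)
        W1 += dW1                                     # update the weights
        W2 += dW2
        if err < 1e-3:                                # until the error is small
            break
    return W1, W2

# XOR: the task a single perceptron cannot learn.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
D = np.array([[0.], [1.], [1.], [0.]])
W1, W2 = train_bp(X, D)
```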
Improvements of BP
• Momentum:

$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t) + \alpha \, \Delta w_{ij}(t-1)$$

• Adaptive learning parameters:

$$w_{ij}(t+1) = w_{ij}(t) + \varepsilon(t) \, \Delta w_{ij}(t)$$

Other variants of BP: SuperSAB, QuickProp, Levenberg-Marquardt algorithm.
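A literal sketch of the momentum rule in Python, keeping $\Delta w_{ij}(t-1)$ between updates; the $\alpha$ and $\varepsilon$ values are illustrative:

```python
import numpy as np

alpha = 0.9       # momentum coefficient
prev_dw = None    # dw_ij(t-1) from the previous update

def momentum_update(W, grad, eps=0.5):
    """w(t+1) = w(t) + dw(t) + alpha * dw(t-1), with dw(t) = -eps * grad;
    the previous step smooths the descent direction."""
    global prev_dw
    dw = -eps * grad
    if prev_dw is None:
        prev_dw = np.zeros_like(dw)
    W_new = W + dw + alpha * prev_dw
    prev_dw = dw
    return W_new
```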
Overfitting