
Neural Networks I

Karel Berkovec karel.berkovec (at) seznam.cz


Artificial Intelligence

• Symbolic approach – expert systems, mathematical logic, production systems, Bayesian networks, …
• Connectionist approach – neural networks
• Adaptive approach – stochastic methods
• Analytic approach – regression, interpolation, frequency analysis, …


Is it really working?

• Is it a standard mechanism?

• What is it good for?

• Does anyone use it for real applications?

• Can I grasp how it works?

• Can I use it?


This presentation

• Basic introduction
• A short history
• Model of a neuron and of a neural network
• Supervised learning (backpropagation)

Not covered: biology, mathematical foundations, unsupervised learning, stochastic models, neurocomputers, etc.


History I

• 40s – von Neumann computer model
• 1943 – Warren McCulloch and Walter Pitts – mathematical model of the neuron
• 1946 – ENIAC
• 1949 – Donald Hebb – The Organization of Behavior
• 1951 – 1st neurocomputer Snark
• 1951 – 1st Czechoslovak computer SAPO
• 1957 – Frank Rosenblatt – perceptron algorithm + learning
• 1958 – Rosenblatt and Charles Wightman – 1st neurocomputer used for real applications, the Mark I Perceptron

History II

• 60s – ADALINE
• 1st company oriented on neurocomputing
• Exhaustion of the potential
• 1967 – Marvin Minsky & Seymour Papert – Perceptrons: the XOR problem cannot be solved by a single perceptron

History III

• 1983 – DARPA
• 1982, 1984 – John Hopfield – physical models & NNs
• 1986 – David Rumelhart, Geoffrey Hinton, Ronald Williams – backpropagation
  – 1969 Arthur Bryson & Yu-Chi Ho
  – 1974 Paul Werbos
  – 1985 David Parker
• 1987 – IEEE International Conference on Neural Networks
• Since the 90s – boom of NNs – ART, BAM, RBF, spiking neurons

Present

• Many models of the neuron – perceptron, RBF, spiking neuron, …
• Many approaches – backpropagation, Hopfield learning, correlations, competitive learning, stochastic learning, …
• Many libraries and modules – for Matlab, Statistica, Excel, …
• Many applications – forecasting, smoothing, recognition, classification, data mining, compression, …


Pros and cons

+ Simple to use
+ Very good results
+ Fast results
+ Robust against incomplete or corrupted inputs
+ Generalization
+/- Mathematical background
- Not transparent or traceable
- Hard to tune parameters (sometimes hair-triggered)
- Learning sometimes takes a long time
- Some tasks are hard to formulate for NNs


Formal neuron - perceptron

[Figure: perceptron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, and a threshold output unit]

$$\xi = \sum_{i} w_i x_i - \theta, \qquad y = \begin{cases} 1 & \text{if } \xi \ge 0 \\ 0 & \text{if } \xi < 0 \end{cases}$$

where $\xi$ is the potential, $\theta$ the threshold, and $w_i$ the weights.
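A minimal Python sketch of this formal neuron; the weight and threshold values in the example are illustrative (chosen so that the unit computes AND), not taken from the slides:

```python
import numpy as np

def perceptron(x, w, theta):
    """Formal neuron: outputs 1 when the potential
    xi = sum_i w_i * x_i - theta is non-negative, else 0."""
    xi = np.dot(w, x) - theta          # inner potential
    return 1 if xi >= 0 else 0

# Example: with weights (1, 1) and threshold 1.5 the neuron computes AND
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(np.array(x, dtype=float), np.array([1.0, 1.0]), 1.5))
```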


AB problem


XOR problem


XOR problem

[Figure: separating line 1 in the $(x_1, x_2)$ plane]

XOR problem

[Figure: separating line 2 in the $(x_1, x_2)$ plane]

XOR problem

[Figure: XOR$(x_1, x_2)$ obtained by combining the two separating lines 1 and 2 with an AND unit]
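A small Python sketch of this two-layer solution; the particular weights and thresholds below are one hand-picked realization of the two separating lines and the AND unit, not values from the slides:

```python
import numpy as np

def neuron(x, w, theta):
    # threshold unit: fires when sum_i w_i x_i - theta >= 0
    return 1 if np.dot(w, x) - theta >= 0 else 0

def xor_net(x1, x2):
    """Two-layer solution of the XOR problem:
    h1 fires for x1 OR x2 (line 1), h2 fires unless both are 1 (line 2),
    and the output neuron is the AND of h1 and h2."""
    x = np.array([x1, x2], dtype=float)
    h1 = neuron(x, np.array([1.0, 1.0]), 0.5)     # line 1: x1 + x2 >= 0.5
    h2 = neuron(x, np.array([-1.0, -1.0]), -1.5)  # line 2: x1 + x2 <= 1.5
    return neuron(np.array([h1, h2], dtype=float), np.array([1.0, 1.0]), 1.5)  # AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))    # prints the XOR truth table
```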

Feed-forward layered network

$$NN: X \to Y$$

[Figure: feed-forward layered network with input layer $x_1, \dots, x_5$, two hidden layers, and output layer $y_1, y_2, y_3$]
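A sketch of a forward pass through such a layered network in Python with NumPy; the layer sizes and the random weights are arbitrary illustrations, not from the slides:

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

def layer(y_prev, W, theta):
    """One layer: each neuron j computes y_j = sigmoid(sum_i W[i, j] * y_prev[i] - theta[j])."""
    return sigmoid(y_prev @ W - theta)

rng = np.random.default_rng(0)
x = rng.random(5)                                        # input layer x_1 … x_5
W1, t1 = rng.normal(size=(5, 4)), rng.normal(size=4)     # 1st hidden layer
W2, t2 = rng.normal(size=(4, 4)), rng.normal(size=4)     # 2nd hidden layer
W3, t3 = rng.normal(size=(4, 3)), rng.normal(size=3)     # output layer y_1 … y_3

y = layer(layer(layer(x, W1, t1), W2, t2), W3, t3)
print(y)                                                 # network output, NN: X -> Y
```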

Activation function

[Figure: neuron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, potential $\xi = \sum_i w_i x_i - \theta$ and output $y = f(\xi)$]

Heaviside (step) function:
$$y(\xi) = \begin{cases} 1 & \xi \ge 0 \\ 0 & \xi < 0 \end{cases}$$

Saturated linear function:
$$y(\xi) = \begin{cases} 0 & \xi < 0 \\ \xi & 0 \le \xi \le 1 \\ 1 & \xi > 1 \end{cases}$$

Standard sigmoidal function:
$$y(\xi) = \frac{1}{1 + e^{-\xi}}$$

Hyperbolic tangent:
$$y(\xi) = \frac{1 - e^{-\xi}}{1 + e^{-\xi}}$$
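The four activation functions as a small Python sketch; the sample inputs are arbitrary:

```python
import numpy as np

def heaviside(xi):
    return np.where(xi >= 0, 1.0, 0.0)

def saturated_linear(xi):
    return np.clip(xi, 0.0, 1.0)

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

def tanh_like(xi):
    # (1 - e^-xi) / (1 + e^-xi), a sigmoid rescaled to (-1, 1)
    return (1.0 - np.exp(-xi)) / (1.0 + np.exp(-xi))

xi = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (heaviside, saturated_linear, sigmoid, tanh_like):
    print(f.__name__, np.round(f(xi), 3))
```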

NN function

• A NN maps inputs to outputs: $NN: X \to Y$

• A feed-forward NN with one hidden layer and a sigmoidal activation function can approximate any continuous function (on a compact domain) arbitrarily closely. The question is how to set up the parameters of the network.


NN learning

• Error function:
$$E = \frac{1}{2} \sum_{k=1}^{p} \sum_{o \in O} (y_{ko} - d_{ko})^2$$

• Perceptron adaptation rule:
$$w_{ji}(t+1) = w_{ji}(t) - x_{ki}\,\bigl(y_j(\mathbf{w}(t), \mathbf{x}_k) - d_{kj}\bigr)$$
i.e.
$$w_{ji}(t+1) = w_{ji}(t) + x_{ki} \quad (y = 0,\ d = 1), \qquad w_{ji}(t+1) = w_{ji}(t) - x_{ki} \quad (y = 1,\ d = 0)$$

• An algorithm with this learning rule converges in a finite number of steps (if A and B are linearly separable).
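A minimal Python sketch of this adaptation rule, trained on the linearly separable AND function; the data set, the threshold update, and the iteration cap are illustrative choices, not from the slides:

```python
import numpy as np

# Training patterns x_k and desired outputs d_k for the AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)
theta = 0.0

def output(x):
    return 1.0 if np.dot(w, x) - theta >= 0 else 0.0

for epoch in range(100):                     # cap the number of passes
    errors = 0
    for x_k, d_k in zip(X, d):
        y = output(x_k)
        if y != d_k:
            # w_ji(t+1) = w_ji(t) - x_ki (y - d); the threshold adapts in the opposite direction
            w -= x_k * (y - d_k)
            theta += (y - d_k)
            errors += 1
    if errors == 0:                          # converged: all patterns classified correctly
        break

print("weights:", w, "threshold:", theta)
print([output(x_k) for x_k in X])
```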


AB problem


Backpropagation

• The most frequently used learning algorithm for NNs – roughly 80% of applications
• Fast convergence
• Good results
• Many modifications


Energy function

• How do we adapt the weights of neurons in the hidden layers?

• We would like to find a minimum of the error function - so why not use its derivative?

$$E = \sum_{k=1}^{p} E_k, \qquad E_k = \frac{1}{2} \sum_{o \in O} (y_{ko} - d_{ko})^2$$
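A tiny Python illustration of the idea: follow the negative derivative downhill on a one-variable error function (both the function and the step size are made up for the example):

```python
# Gradient descent on E(w) = (w - 3)^2, whose minimum is at w = 3.
def E(w):
    return (w - 3.0) ** 2

def dE_dw(w):
    return 2.0 * (w - 3.0)

w, eta = 0.0, 0.1          # initial weight and learning rate
for step in range(50):
    w -= eta * dE_dw(w)    # move against the gradient

print(w, E(w))             # w is close to 3, E(w) close to 0
```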


Error gradient

Adaptation rule:

$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$$

$$\Delta w_{ij}(t) = -\eta\, \frac{\partial E}{\partial w_{ij}(t)} = -\eta \sum_{k} \frac{\partial E_k}{\partial w_{ij}(t)}$$


Output layer

$$E = \frac{1}{2} \sum_{j \in O} (y_j - d_j)^2, \qquad y_j(\xi_j) = \frac{1}{1 + e^{-\xi_j}}, \qquad \xi_j = \sum_i w_{ij} y_i - \theta$$

Chain rule:
$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_j}\, \frac{\partial y_j}{\partial \xi_j}\, \frac{\partial \xi_j}{\partial w_{ij}}$$

with
$$\frac{\partial E}{\partial y_j} = (y_j - d_j), \qquad \frac{\partial y_j}{\partial \xi_j} = \frac{e^{-\xi_j}}{(1 + e^{-\xi_j})^2} = y_j (1 - y_j), \qquad \frac{\partial \xi_j}{\partial w_{ij}} = y_i$$

so that
$$\frac{\partial E}{\partial w_{ij}} = (y_j - d_j)\, \frac{e^{-\xi_j}}{(1 + e^{-\xi_j})^2}\, y_i$$
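A short Python check of this formula: the analytic gradient $(y_j - d_j)\, y_j (1 - y_j)\, y_i$ is compared against finite differences for one output neuron with arbitrary example values (all numbers below are illustrative):

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

# One output neuron j with two incoming activations y_i (values chosen arbitrarily)
y_in = np.array([0.2, 0.7])        # y_i from the previous layer
w = np.array([0.5, -0.3])          # weights w_ij
theta, d = 0.1, 1.0                # threshold and desired output

def forward(w):
    return sigmoid(np.dot(w, y_in) - theta)

y = forward(w)
grad = (y - d) * y * (1.0 - y) * y_in     # dE/dw_ij from the derivation above

# Numerical check: central finite differences of E(w) = 0.5 * (y - d)^2
eps = 1e-6
num = np.array([(0.5 * (forward(w + eps * e) - d) ** 2 -
                 0.5 * (forward(w - eps * e) - d) ** 2) / (2 * eps)
                for e in np.eye(2)])
print(grad, num)                   # the two gradients agree
```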

Hidden layer

$$E = \frac{1}{2} \sum_{o \in O} (y_o - d_o)^2, \qquad y_j(\xi_j) = \frac{1}{1 + e^{-\xi_j}}, \qquad \xi_j = \sum_i w_{ij} y_i - \theta$$

For a weight $w_{ij}$ into hidden neuron $j$, the error depends on $w_{ij}$ through all output neurons $o \in O$:
$$\frac{\partial E}{\partial w_{ij}} = \sum_{o \in O} \frac{\partial E}{\partial y_o}\, \frac{\partial y_o}{\partial \xi_o}\, \frac{\partial \xi_o}{\partial w_{ij}}, \qquad \frac{\partial \xi_o}{\partial w_{ij}} = \frac{\partial \xi_o}{\partial y_j}\, \frac{\partial y_j}{\partial \xi_j}\, \frac{\partial \xi_j}{\partial w_{ij}}$$

With $\delta_o = \dfrac{\partial E}{\partial y_o}\, \dfrac{\partial y_o}{\partial \xi_o}$, $\dfrac{\partial \xi_o}{\partial y_j} = w_{jo}$, and $e_{ij} = \dfrac{\partial y_j}{\partial \xi_j}\, \dfrac{\partial \xi_j}{\partial w_{ij}} = y_j (1 - y_j)\, y_i$ this gives

$$\frac{\partial E}{\partial w_{ij}} = \sum_{o \in O} \delta_o\, w_{jo}\, e_{ij} = \Bigl( \sum_{o \in O} \delta_o\, w_{jo} \Bigr)\, y_j (1 - y_j)\, y_i$$

Implementation of BP

• initialize the network
• $nw_{ij} := 0$
• repeat
  – update the weights: $w_{ij} := w_{ij} + nw_{ij}$, then reset $nw_{ij} := 0$
  – for all patterns $k$:
    • compute the network output
    • compute the error $E_k$
    • compute $\Delta w_{ij}(t) = -\eta\, \dfrac{\partial E_k}{\partial w_{ij}(t)}$
    • accumulate $nw_{ij} := nw_{ij} + \Delta w_{ij}(t)$
• until the error is small enough
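A compact Python sketch of the same batch scheme for a small 2-4-1 network trained on XOR; the hidden-layer size, learning rate, stopping threshold, and random seed are illustrative choices, and whether training reaches the threshold depends on the random initialization:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)          # desired outputs d_ko (XOR)

# 2 inputs -> 4 hidden -> 1 output; thresholds kept as bias vectors
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
eta = 0.5

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

for epoch in range(20000):
    # forward pass for all patterns (the "for all patterns" loop, vectorized)
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    E = 0.5 * np.sum((y - D) ** 2)                 # error function
    if E < 1e-3:                                   # "until the error is small enough"
        break
    # backward pass: output-layer and hidden-layer deltas
    delta_out = (y - D) * y * (1 - y)              # (y_o - d_o) y_o (1 - y_o)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)   # (sum_o delta_o w_jo) y_j (1 - y_j)
    # accumulated weight changes over the batch, then the update
    W2 -= eta * h.T @ delta_out
    b2 -= eta * delta_out.sum(axis=0)
    W1 -= eta * X.T @ delta_hid
    b1 -= eta * delta_hid.sum(axis=0)

print("epochs:", epoch, "error:", E)
print(np.round(y.ravel(), 2))                      # ideally close to 0, 1, 1, 0
```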


Improvements of BP

• Momentum:
$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t) + \alpha\, \Delta w_{ij}(t-1)$$

• Adaptive learning parameters:
$$w_{ij}(t+1) = w_{ij}(t) + \eta(t)\, \Delta w_{ij}(t)$$

Other variants of BP: SuperSAB, QuickProp, Levenberg-Marquardt algorithm.
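A minimal Python sketch of the momentum rule, reusing the one-variable error function from the gradient-descent example; the values of eta and alpha are illustrative:

```python
# Gradient descent with momentum on E(w) = (w - 3)^2.
def dE_dw(w):
    return 2.0 * (w - 3.0)

w, eta, alpha = 0.0, 0.1, 0.7
prev_dw = 0.0                      # Delta w(t - 1)
for step in range(50):
    dw = -eta * dE_dw(w)           # Delta w(t)
    w += dw + alpha * prev_dw      # momentum term alpha * Delta w(t - 1)
    prev_dw = dw

print(w)                           # converges towards the minimum at 3
```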


Overfitting
