Transcript Document

Different classes of abstract models:
- Supervised learning (EX: Perceptron)
- Reinforcement learning
- Unsupervised learning (EX: Hebb rule)
- Associative memory (EX: Matrix memory)
Abstraction – so what is a neuron?
• Threshold unit (McCulloch-Pitts):
$O = \theta\left(\sum_i w_i x_i + w_0\right)$, where $\theta(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}$
• Linear:
$O = \sum_i w_i x_i + w_0$
• Sigmoid:
$O = \mathrm{sig}\left(\sum_i w_i x_i + w_0\right)$
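To make the three abstractions concrete, here is a minimal MATLAB sketch (not from the original slides; the example input, weights, and the logistic choice of sigmoid are illustrative assumptions):

```matlab
% Minimal sketch of the three abstract neuron models (illustrative values).
x  = [1; 0; 1];                  % example input vector
w  = [0.5; -0.3; 0.8];           % synaptic weights
w0 = -0.2;                       % bias term

a = w' * x + w0;                 % weighted sum: sum_i w_i*x_i + w0

O_threshold = double(a >= 0);    % McCulloch-Pitts threshold unit
O_linear    = a;                 % linear unit
O_sigmoid   = 1 / (1 + exp(-a)); % sigmoid unit (logistic function as one choice of "sig")
```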
THE PERCEPTRON (Classification)
Threshold unit: $o = \theta\left(\sum_i w_i x_i + w_0\right)$, where $\theta(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}$
Here $o$ is the output for input pattern $x$, $w_i$ are the synaptic weights, and $y$ is the desired output.
AND
[Figure: a perceptron with inputs x1–x5, weights w1–w5, and output o.]

Truth table for AND:
x1  x2  y
 1   1  1
 1   0  0
 0   1  0
 0   0  0

A solution: w1 = w2 = 1, w0 = -1.5, so o = 1 when x1 + x2 - 1.5 > 0.
[Figure: the line x1 + x2 - 1.5 = 0 separates the (1,1) pattern from the others in the (x1, x2) plane.]
AND is linearly separable.
OR

Truth table for OR:
x1  x2  y
 1   1  1
 1   0  1
 0   1  1
 0   0  0

A solution: w1 = w2 = 1, w0 = -0.5, so o = 1 when x1 + x2 - 0.5 > 0.
[Figure: the line x1 + x2 - 0.5 = 0 separates the (0,0) pattern from the others in the (x1, x2) plane.]
OR is linearly separable.
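As a quick check of the two threshold solutions above, a small MATLAB sketch (illustrative, not part of the slides):

```matlab
% Check the AND and OR threshold solutions on all four input patterns.
X = [1 1; 1 0; 0 1; 0 0];                   % rows are the input patterns (x1, x2)
o_and = double(X(:,1) + X(:,2) - 1.5 > 0);  % AND: w1 = w2 = 1, w0 = -1.5
o_or  = double(X(:,1) + X(:,2) - 0.5 > 0);  % OR:  w1 = w2 = 1, w0 = -0.5
disp([X o_and o_or])                        % columns: x1, x2, AND output, OR output
```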
Perceptron learning rule:
$\Delta w_i = \eta\,(y - o)\,x_i$
A convergence proof exists.
[Figure: a perceptron with inputs x1–x5, weights w1–w5, and output o.]
1. Show examples of Perceptron learning with demo program (a sketch follows below)
2. Show the program itself
3. Talk about linear separability, define dot product, show on computer.
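A minimal sketch of what such a demo program could look like, here learning the AND function (the learning rate, initialization, and number of epochs are arbitrary choices, not taken from the slides):

```matlab
% Perceptron learning of the AND function (illustrative sketch).
X   = [1 1; 1 0; 0 1; 0 0];   % input patterns (rows)
y   = [1; 0; 0; 0];           % desired outputs
w   = zeros(2,1);  w0 = 0;    % initial weights and bias
eta = 0.5;                    % learning rate (arbitrary)

for epoch = 1:20
    for k = 1:size(X,1)
        o  = double(X(k,:)*w + w0 > 0);     % threshold unit output
        w  = w  + eta*(y(k) - o)*X(k,:)';   % perceptron rule: dW_i = eta*(y - o)*x_i
        w0 = w0 + eta*(y(k) - o);           % bias updated as a weight on a constant input of 1
    end
end
disp([w; w0])   % final weights and bias implement a separating line for AND
```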
Unsupervised learning – the “Hebb” rule.
$\frac{dW_i}{dt} = \eta\, x_i\, y$, where $x_i$ are the inputs
and the output $y$ is assumed linear: $y = \sum_j W_j x_j$
Results in 2D
[Figure: example of the Hebb rule in 2D; scatter of input points in the (x1, x2) plane (axes from -2 to 2), with the weight vector w growing along the principal axis of the tilted cloud.]
(Note: here inputs have a mean of zero)
• Show program, tilt axis, look at divergence
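A minimal sketch of such a Hebb-rule program in 2D (the input distribution, tilt angle, and learning rate are arbitrary illustrative choices):

```matlab
% Hebb rule on zero-mean 2D inputs (illustrative sketch).
rng(1);
npts  = 1000;
phi   = pi/3;                                     % tilt angle of the input cloud (arbitrary)
R     = [cos(phi) -sin(phi); sin(phi) cos(phi)];  % rotation matrix
X     = (R * diag([1, 0.3]) * randn(2, npts))';   % zero-mean, elongated input cloud

w   = 0.1*randn(2,1);    % small random initial weights
eta = 0.001;             % learning rate (arbitrary)

for k = 1:npts
    x = X(k,:)';
    y = w' * x;          % linear neuron: y = sum_j W_j x_j
    w = w + eta * x * y; % Hebb rule: dW_i/dt = eta * x_i * y
end
disp(w)   % w grows along the principal axis of the inputs
```

Running it longer, or with a larger learning rate, makes the divergence of the weights obvious, which motivates the saturation and normalization discussion below.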
Why do we get these results?
On the board:
• Solve a simple linear first-order ODE
• Fixed points and their stability for nonlinear ODEs
• Eigenvalues, eigenvectors
In the simplest case, the change in synaptic weight $w_i$ is:
$\Delta w_i = \eta\, x_i\, y$
where $x$ are the input vectors and $y$ is the neural response.
Assume for simplicity a linear neuron: $y = \sum_j w_j x_j$.
So we get: $\Delta w_i = \eta \sum_j x_i x_j w_j$
Now take an average with respect to the distribution of inputs, to get:
$E[\Delta w_i] = \eta \sum_j E[x_i x_j]\, w_j = \eta \sum_j Q_{ij}\, w_j$
If a small change $\Delta w$ occurs over a short time $\Delta t$, then (in matrix notation):
$\frac{\Delta w}{\Delta t} \rightarrow \frac{dw}{dt} = \eta\, Q\, w$
If $\langle x\rangle = 0$, $Q$ is the covariance matrix.
What then is the solution of this simple first-order linear ODE?
(Show on board)
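A sketch of the solution worked on the board (standard linear-ODE reasoning): expand $w$ in the eigenvectors of $Q$, with $Q e_a = \lambda_a e_a$ and $w(t) = \sum_a c_a(t)\, e_a$. Then

$\frac{dc_a}{dt} = \eta\,\lambda_a\, c_a \quad\Rightarrow\quad c_a(t) = c_a(0)\, e^{\eta \lambda_a t} \quad\Rightarrow\quad w(t) = \sum_a c_a(0)\, e^{\eta \lambda_a t}\, e_a$

The component along the eigenvector with the largest eigenvalue grows fastest, so the weight vector aligns with the principal eigenvector of $Q$ while its norm diverges, which is what the saturation and normalization items below address.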
• Show program of Hebb rule again
• Show effect of saturation limits
• Possible solution – normalization
Oja (PCA) rule:
$\frac{dW_i}{dt} = \eta\,\left(x_i\, y - W_i\, y^2\right)$
Show PCA program:
[Figure: output of the PCA (Oja rule) program; trajectory of the weight components, plotted as W1 vs W2 over the range [-1, 1].]
OK, some more programming: convert the Hebb program to an Oja-rule program (a sketch follows below).
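A minimal sketch of that conversion (same kind of 2D input cloud as the Hebb sketch above; the parameters are arbitrary):

```matlab
% Oja (PCA) rule on zero-mean 2D inputs (illustrative sketch).
rng(1);
npts = 5000;
R    = [cos(pi/3) -sin(pi/3); sin(pi/3) cos(pi/3)];
X    = (R * diag([1, 0.3]) * randn(2, npts))';   % zero-mean, elongated input cloud

w   = 0.1*randn(2,1);
eta = 0.01;

for k = 1:npts
    x = X(k,:)';
    y = w' * x;
    w = w + eta * (x*y - w*y^2);   % Oja rule: dW_i/dt = eta*(x_i*y - W_i*y^2)
end
disp(w)        % aligns with the principal eigenvector of the input covariance
disp(norm(w))  % close to 1: the decay term keeps the weights normalized
```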
So OK: simulations, MATLAB, math, etc.
What does this have to do with Biology, with the
brain?
Another unsupervised learning model:
The BCM theory of synaptic plasticity.
The BCM theory of cortical plasticity
BCM stands for Bienenstock, Cooper, and Munro; it dates back to 1982. It was designed to account for experiments which demonstrated that the development of orientation-selective cells depends on rearing in a patterned environment.
BCM Theory
(Bienenstock, Cooper, Munro 1982; Intrator, Cooper 1992)
Requires:
• Bidirectional synaptic modification (LTP/LTD)
• Sliding modification threshold
• The fixed points depend on the environment, and in a patterned environment only selective fixed points are stable.
[Figure: BCM synaptic modification curve, with LTD below the modification threshold and LTP above it.]
The integral form of the average:
$\theta_M(t) = \overline{c^2}(t) = \frac{1}{\tau}\int_{-\infty}^{t} c^2(t')\, e^{-(t-t')/\tau}\, dt'$
is equivalent to this differential form:
$\tau\,\frac{d\theta_M}{dt} = c^2 - \theta_M$
Note, it is essential that $\theta_M$ is a superlinear function of the history of $c$, that is, it must grow faster than linearly with the time-averaged activity ($\theta_M \propto \bar{c}^{\,1+p}$ with $p > 0$).
Note also that in the original BCM formulation (1982) the threshold was the square of the average, $(\bar{c})^2$, rather than the average of the square, $\overline{c^2}$.
What is the outcome of the BCM theory?
Assume a neuron with N inputs (N synapses),
and an environment composed of N different
input vectors.
An N = 2 example:
[Figure: two input vectors x1 and x2 in the two-dimensional input space.]
What are the stable fixed points of m in this case?
(Notation: m denotes the synaptic weight vector; the response is $y = m \cdot x$.)
Note: every time a new input is presented, m changes, and so does θm.
What are the fixed points? What are the
stable fixed points?
The integral form of the average is again equivalent to the differential form $\tau\,\frac{d\theta_M}{dt} = c^2 - \theta_M$ (as above).
Alternative form:
Show MATLAB example: two examples with N = 5.
Note: the stable fixed point is such that for one pattern $y^j = \sum_i w_i x_i^j = \theta_M$, while for the other patterns $y^{k \ne j} = 0$.
(note: here c = y)
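A minimal sketch of such a BCM simulation, here with two patterns instead of five (the patterns, probabilities, learning rate, and averaging time constant are arbitrary illustrative choices):

```matlab
% BCM neuron with two input patterns and a sliding threshold (illustrative sketch).
x1 = [1; 0.2];   x2 = [0.2; 1];    % two linearly independent input patterns
p1 = 0.5;                          % probability of presenting pattern 1

w     = 0.3 + 0.1*rand(2,1);       % initial weights
theta = 0.1;                       % initial modification threshold
eta   = 0.005;                     % learning rate
tau   = 50;                        % threshold averaging time constant

for t = 1:50000
    if rand < p1, x = x1; else x = x2; end
    y     = w' * x;                          % linear response
    w     = w + eta * y * (y - theta) * x;   % BCM rule: dw = eta * y * (y - theta_M) * x
    theta = theta + (y^2 - theta) / tau;     % sliding threshold: tau * dtheta/dt = y^2 - theta
end
disp([w'*x1, w'*x2, theta])   % one response ends near theta_M (~ 1/p), the other near 0
```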
BCM Theory: Stability
• One dimension: $y = w \cdot x^T$
• Quadratic form: $\frac{dw}{dt} = y\,(y - \theta_M)\,x$, with $\theta_M = \overline{y^2}$
• Instantaneous limit ($\theta_M \to y^2$):
$\frac{dw}{dt} = y\,(y - y^2)\,x = y^2\,(1 - y)\,x$
[Figure: the modification function φ(c) plotted against y, with zero crossings at 0 and 1.]
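A brief note on the fixed-point structure implied by the last equation (standard reasoning, filling in what is discussed in class): since $y = w \cdot x$, along a fixed input $x$ the response obeys

$\frac{dy}{dt} = \frac{dw}{dt}\cdot x = |x|^2\, y^2\,(1 - y)$

so $y = 0$ and $y = 1$ (i.e. $y = \theta_M$) are the fixed points; $y^2(1-y)$ is positive for $0 < y < 1$ and negative for $y > 1$, so $y = 1$ is stable while $y = 0$ is unstable to positive perturbations.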
BCM Theory: Selectivity
• Two dimensions, two patterns
• Quadratic form: $y = w_1 x_1 + w_2 x_2 = w \cdot x^T$, with $y^1 = w \cdot x^1$, $y^2 = w \cdot x^2$
• Learning rule: $\frac{dw}{dt} = y^k\,(y^k - \theta_M)\,x^k$
• Averaged threshold: $\theta_M = E[y^2]_{\text{patterns}} = \sum_{k=1}^{2} p_k\,(y^k)^2$
• Fixed points: $\frac{dw}{dt} = 0$
[Figure: the two input patterns x1 and x2 in the plane.]
BCM Theory: Selectivity
• Learning equation: $\frac{dw}{dt} = y^k\,(y^k - \theta_M)\,x^k$
• Four possible fixed points:
  $y^1 = 0,\; y^2 = 0$ (unselective)
  $y^1 = \theta_M,\; y^2 = 0$ (selective)
  $y^1 = 0,\; y^2 = \theta_M$ (selective)
  $y^1 = \theta_M,\; y^2 = \theta_M$ (unselective)
• Threshold at the selective fixed point with $y^2 = 0$:
  $\theta_M = p_1 (y^1)^2 + p_2 (y^2)^2 = p_1 (y^1)^2$, and since $y^1 = \theta_M$ there, $y^1 = 1/p_1$.
[Figure: two-input neuron with inputs x1, x2 and weights w1, w2.]
Summary
• The BCM rule is based on two differential equations; what are they?
• When there are two linearly independent inputs, what will be the BCM stable fixed points? What will θ be?
• When there are K independent inputs, what are the stable fixed points? What will θ be?
Bonus project – 10 extra points for section
Write MATLAB code for a BCM neuron trained with 2 inputs in 2D. Include a 1-page write-up; you will also meet with me for about 15 minutes to explain your code and results.
Associative memory:
[Figure: famous faces (Albert, Marilyn, ..., Harel) presented as inputs, with their names as the desired outputs.]
Input patterns and desired output patterns (columns are patterns, rows are components):
$X = \begin{pmatrix} x_1^1 & x_1^2 & x_1^3 & x_1^4 \\ x_2^1 & x_2^2 & x_2^3 & x_2^4 \\ x_3^1 & x_3^2 & x_3^3 & x_3^4 \\ x_4^1 & x_4^2 & x_4^3 & x_4^4 \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1^1 & y_1^2 & y_1^3 & y_1^4 \\ y_2^1 & y_2^2 & y_2^3 & y_2^4 \\ y_3^1 & y_3^2 & y_3^3 & y_3^4 \\ y_4^1 & y_4^2 & y_4^3 & y_4^4 \end{pmatrix}$
1. Feed-forward matrix networks
2. Attractor networks (autoassociative)
Linear matrix memory:
N input neurons, M output neurons, P input-output pairs.
[Figure: feed-forward network with inputs x1, ..., xN, weights w_{iμ}, and outputs o1, ..., oM.]
1. Set the synaptic weights by the Hebb rule:
$W_{ij} = \sum_{k=1}^{P} x_i^k\, y_j^k$
2. Present an input; the output is a linear operation:
$O_j^r = \sum_{i=1}^{N} x_i^r\, W_{ij}$
Here you are on your own – write a MATLAB program to do this (a sketch follows below).
Tip – use large N, small P, and start with orthogonal patterns.
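A minimal sketch of such a program (N, M, P and the random ±1 patterns are arbitrary illustrative choices; random patterns are nearly orthogonal for large N):

```matlab
% Linear matrix memory trained with the Hebb rule (illustrative sketch).
N = 100;  M = 20;  P = 5;         % large N, small P, per the tip above
X = sign(randn(N, P));            % P input patterns (columns of +/-1)
Y = sign(randn(M, P));            % P desired output patterns

W = zeros(N, M);
for k = 1:P
    W = W + X(:,k) * Y(:,k)';     % Hebb rule: W_ij = sum_k x_i^k * y_j^k
end

r = 3;                            % recall stored pattern r
O = W' * X(:,r);                  % linear output: O_j^r = sum_i x_i^r * W_ij
disp(mean(sign(O) == Y(:,r)))     % fraction of output components recalled correctly (close to 1)
```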
A low-dimensional example of a linear matrix memory, done on the board.
Use the simple Hebb rule between the input and the desired output.
- Orthogonal inputs
- Non-orthogonal inputs
Give examples. Non-orthogonal inputs might require other rules (covariance, Perceptron).
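A short calculation behind the board example: substituting the Hebbian weights into the linear output gives

$O_j^r = \sum_i x_i^r W_{ij} = \sum_i x_i^r \sum_k x_i^k y_j^k = \sum_k \left(x^r \cdot x^k\right) y_j^k$

If the input patterns are orthogonal, only the $k = r$ term survives and $O_j^r = \|x^r\|^2\, y_j^r$: each stored output is recalled exactly, up to the factor $\|x^r\|^2$. For non-orthogonal inputs the cross terms $(x^r \cdot x^k)\, y_j^k$ with $k \ne r$ do not vanish and corrupt the recall, which is why the other rules mentioned above may be needed.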
Formal neural networks can accomplish many tasks, for example:
• Perform complex classification
• Learn arbitrary functions
• Account for associative memory
Some applications: robotics, character recognition, speech recognition, medical diagnostics.
This is not neuroscience, but it is motivated loosely by neuroscience and carries important information for neuroscience as well.
For example: memory, learning, and some aspects of development are assumed to be based on synaptic plasticity.