Integrative Vision - Frankfurt Institute for Advanced Studies


Self-organizing Maps
(Kohonen Networks)
and related models
Outline:
• Cortical Maps
• Competitive Learning
• Self-organizing Maps (Kohonen Network)
• Applications to V1
• Contextual Maps
• Some Extensions: Conscience, Topology Learning
Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch
Vesalius, 1543
Brodmann, 1909
van Essen, 1990s
Optical imaging of primary visual cortex:
orientation selectivity, retinal position, and
ocular dominance are all mapped onto a 2-dim. map.
Note: looks at upper layers of cortex
Somatosensory System:
strong magnification of certain
body parts (“homunculus”)
General Themes
Potentially high-dimensional inputs are mapped onto the two-dimensional cortical surface: dimensionality reduction
Within one column (from pia to white matter):
similar properties of neurons
Cortical Neighbors:
smooth changes of properties between neighboring cortical sites
In visual system:
Retinotopy dominates: neighboring cortical sites process information
from neighboring parts of the retina (the lower in the visual system,
the stronger the effect)
Two ideas for learning mappings:
• Willshaw & von der Malsburg:
• discrete input and output space
• all-to-all connectivity
• Kohonen:
• more abstract formulation
• continuous input space
• “weight vector” for each output
Applications of Hebbian Learning:
Retinotopic Connections
(Figure: retina projecting onto tectum)
Retinotopic: neighboring retina cells project to neighboring tectum cells,
i.e. topology is preserved, hence “retinotopic”
Question: How do retina units know where to project to?
Answer: Retina axons find tectum through chemical markers, but fine
structure of mapping is activity dependent
Willshaw & von der Malsburg Model
Principal assumptions:
- local excitation of neighbors in retina and tectum
- global inhibition in both layers
- Hebbian learning of feed-forward weights, with a constraint on the sum of pre-synaptic weights
- spontaneous activity (blobs) in retina layer
Why should we see retinotopy emerging?
(Figure: spontaneous activity blobs in retina and tectum; cooperation in the intersection area; competition for the ability to innervate a tectum area; instability: a tectum blob can form only at X or Y)
Model Equations
Weight update:
Weight Normalization:
H_j* = activity of post-synaptic cell j
M = number of pre-synaptic cells
N = number of post-synaptic cells
A_i* = activity of pre-synaptic cell i, either 0 or 1
s_ij = connection weight from pre-synaptic cell i to post-synaptic cell j
e_kj = excitatory connection from post-synaptic cell k to post-synaptic cell j
i_kj = inhibitory connection from post-synaptic cell k to post-synaptic cell j
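The update and normalization equations themselves are not reproduced in the transcript; a plausible reconstruction from the variable definitions above and the assumptions on the previous slide (Hebbian growth gated by pre- and post-synaptic activity, lateral excitation and inhibition, and a fixed total synaptic strength per pre-synaptic cell) would be:

\[ H_j^* = \sum_{i=1}^{M} s_{ij} A_i^* + \sum_{k=1}^{N} e_{kj} H_k^* - \sum_{k=1}^{N} i_{kj} H_k^* \]

\[ \Delta s_{ij} \propto A_i^* H_j^*, \qquad s_{ij} \leftarrow s_{ij}\,\frac{S}{\sum_{j'=1}^{N} s_{ij'}} \]

where the normalization keeps each pre-synaptic cell's total outgoing weight at a constant value S; the exact constants and thresholds of the original model are assumptions here.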
Results: Systems Matching
(Figure: “half retina” experiment: mapping from retina onto tectum)
Formation of retinotopic mappings: models of this type account for a range of experiments: “half retina”, “half tectum”, “graft rotation”, …
Simple Competitive Learning
Simple linear units, input is ξ:
\[ h_i = \sum_j w_{ij}\,\xi_j = \mathbf{w}_i \cdot \boldsymbol{\xi} \]
Assume weights > 0 and possibly normalized. If the weights are normalized, h_i is maximal if w_i has the smallest Euclidean distance to ξ.
Define winner: i* = arg max_i h_i
Output of unit:
\[ o_i = \begin{cases} 1 & : i = i^* \\ 0 & : i \neq i^* \end{cases} \]
(“ultra-sparse code”)
Figure taken from Hertz et al.
Winner-Take-All Mechanism:
may be implemented through
lateral inhibition network (see
part on Neurodynamics)
\[ \Delta w_{ij} = \eta\, o_i\, (\xi_j - w_{ij}) \]
The learning rule is Hebb-like plus a decay term.
The weight of the winning unit is moved towards the current input ξ.
Similar to Oja’s and Sanger’s rules.
Geometric interpretation: units (their weights) move in input space.
Winning unit moves towards current input.
2
normalized weights
input
vector
weight
vector
1
Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch
13
Figure taken from Hertz et.al.
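A minimal sketch of this rule in Python/NumPy; the toy data set, number of units, learning rate, and epoch count are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def competitive_learning(X, n_units=3, eta=0.1, n_epochs=50, seed=0):
    """Simple competitive learning: only the winner moves towards each input."""
    rng = np.random.default_rng(seed)
    # Initialize the weight vectors on randomly chosen inputs (a common simple choice).
    W = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    for _ in range(n_epochs):
        for xi in rng.permutation(X):
            # Winner = unit whose weight vector is closest to the input
            # (equivalent to the largest h_i = w_i . xi for normalized weights).
            i_star = np.argmin(np.linalg.norm(W - xi, axis=1))
            # Delta w_{i*} = eta * (xi - w_{i*}); all other units stay unchanged.
            W[i_star] += eta * (xi - W[i_star])
    return W

# Toy usage: three Gaussian clusters in 2-D; each unit should settle on one cluster.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.1, size=(100, 2)) for c in [(0, 0), (1, 0), (0, 1)]])
print(competitive_learning(X))
```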
Cost function for Competitive Learning
\[ \Delta w_{ij} = \eta\, o_i\, (\xi_j - w_{ij}) \]
Claim: the learning rule is related to this cost function:
\[ E\{w_{ij}\} = \frac{1}{2} \sum_{\mu} \big\lVert \mathbf{w}_{i^*} - \boldsymbol{\xi}^{\mu} \big\rVert^2 = \frac{1}{2} \sum_{i,j,\mu} M_i^{\mu} \big( \xi_j^{\mu} - w_{ij} \big)^2, \quad \text{where } M_i^{\mu} = \begin{cases} 1 & \text{if } i = i^*(\mu) \\ 0 & \text{otherwise} \end{cases} \]
To see this, treat M as constant:
\[ \Delta w_{ij} = -\eta\, \frac{\partial E}{\partial w_{ij}} = \eta\, M_i^{\mu} \big( \xi_j^{\mu} - w_{ij} \big) \]
minimization of a sum of squared errors!
Note 1: may be seen as online version of k-means clustering
Note 2: can modify learning rule to make all units fire equally often
Vector Quantization
Idea: represent input by weight vector of the winner
(can be used for data compression: just store/transmit
label representing the winner)
Question: what is the set of inputs that “belong” to a unit i, i.e. for which unit i is the winner?
Answer: the Voronoi tessellation (note: MATLAB has a command for this)
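As an illustration (the codebook and data below are assumed, not from the lecture), compression then amounts to storing only the winner indices and looking the weight vectors back up:

```python
import numpy as np

def vq_encode(X, codebook):
    """Assign each input to its Voronoi cell: the index of the nearest codebook vector."""
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)  # (n_inputs, n_codes)
    return d.argmin(axis=1)

def vq_decode(indices, codebook):
    """Reconstruct each input by the weight vector of its winner."""
    return codebook[indices]

# Toy usage: 4 codebook vectors in 2-D; only the labels need to be stored/transmitted.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X = np.array([[0.1, 0.2], [0.9, 0.1], [0.4, 0.8]])
labels = vq_encode(X, codebook)
X_hat = vq_decode(labels, codebook)   # lossy reconstruction of the inputs
print(labels, X_hat)
```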
Figure taken from Hertz et al.
Self-Organizing Maps (Kohonen)
Idea: Output units have a priori topology (1-dim. or 2-dim.),
e.g. are arranged on a chain or regular grid.
Not only the winner gets to learn, but also its neighbors.
SOMs transform incoming signal patterns of arbitrary dimension onto a 1-dim. or 2-dim. map. Neighbors in the map respond to similar input patterns.
Competition: again with winner-take-all mechanism
Cooperation: through neighborhood relations that are exploited
during learning
Self-Organizing Maps (Kohonen)
example of 1-dim. topology of output nodes
2
input
vector
weight
vector
1
Neighboring output units learn together! Update winner’s weights
but also those of neighbors.
SOM algorithm
Again, simple linear units, input is ξ:
\[ h_i = \sum_j w_{ij}\,\xi_j = \mathbf{w}_i \cdot \boldsymbol{\xi} \]
Define winner: i* = arg max_i h_i
Output of unit j when the winner is i*: o_{j,i*}. Same learning rule as earlier:
\[ o_{j,i^*} = \exp\!\left( -\frac{d_{j,i^*}^2}{2\sigma^2} \right), \qquad d_{j,i^*} = \lVert \mathbf{r}_j - \mathbf{r}_{i^*} \rVert \]
\[ \Delta \mathbf{w}_j = \eta\, o_{j,i^*} \big( \boldsymbol{\xi} - \mathbf{w}_j \big) \]
Usually, the neighborhood shrinks with time (for σ → 0: competitive learning), and the learning rate decays, too:
\[ \sigma(t) = \sigma_0 \exp\!\left( -\frac{t}{\tau_1} \right), \qquad \eta(t) = \eta_0 \exp\!\left( -\frac{t}{\tau_2} \right) \]
2 Learning Phases
Time dependence of parameters:
\[ \sigma(t) = \sigma_0 \exp\!\left( -\frac{t}{\tau_1} \right), \qquad \eta(t) = \eta_0 \exp\!\left( -\frac{t}{\tau_2} \right) \]
1. Self-organizing or ordering phase
- topological ordering of weight vectors
- use: σ_0 = radius of the layer, τ_1 = 1000 / log σ_0, η_0 = 0.1, τ_2 = 1000
2. Convergence phase
- fine tuning of weights
- use: very small neighborhood, learning rate around 0.01
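Putting competition, cooperation, and adaptation together, a minimal 1-dim. SOM sketch in Python/NumPy; the toy data, grid size, and the floor values standing in for the convergence phase are illustrative assumptions, while the schedules follow the parameter suggestions above:

```python
import numpy as np

def train_som(X, n_units=20, n_steps=5000, seed=0):
    """1-dim. SOM: units sit on a chain, the winner and its neighbors learn."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_units, X.shape[1]))
    r = np.arange(n_units)                   # positions of the units on the chain
    sigma0 = n_units / 2                     # ~ radius of the layer
    tau1 = 1000 / np.log(sigma0)             # ordering-phase time constant
    eta0, tau2 = 0.1, 1000                   # initial learning rate and its decay
    for t in range(n_steps):
        xi = X[rng.integers(len(X))]
        i_star = np.argmin(np.linalg.norm(W - xi, axis=1))   # competition
        # Shrinking neighborhood and decaying learning rate; the floors stand in
        # for the convergence phase (very small neighborhood, learning rate ~0.01).
        sigma = max(sigma0 * np.exp(-t / tau1), 0.5)
        eta = max(eta0 * np.exp(-t / tau2), 0.01)
        o = np.exp(-(r - i_star) ** 2 / (2 * sigma ** 2))    # cooperation
        W += eta * o[:, None] * (xi - W)                     # adaptation
    return W

# Toy usage: map noisy 2-D points on a ring onto the 1-dim. chain of units.
rng = np.random.default_rng(1)
phi = rng.uniform(0, 2 * np.pi, 500)
X = np.c_[np.cos(phi), np.sin(phi)] + rng.normal(0, 0.05, (500, 2))
print(train_som(X)[:5])
```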
Examples
Figure taken from Haykin
Figure taken from Haykin
Figure taken from Hertz et al.
Feature Map Properties
Property 1: feature map approximates the input space
Property 2: topologically ordered, i.e. neighboring units correspond
to similar input patterns
Property 3: density matching: density of output units corresponds
qualitatively to input probability density function.
The SOM tends to under-sample high-density areas and
over-sample low-density areas.
Property 4: can be thought of (loosely) as non-linear
generalization of PCA.
Ocular Dominance and Orientation
Tuning in V1
1. singularities (pinwheels and saddle points) tend to align with centers
of ocular dominance bands
2. iso-orientation contours intersect the borders of ocular dominance bands at approximately 90 deg. angles (“local orthogonality”)
3. global disorder (Mexican-hat-like autocorrelation functions)
SOM based Model
Obermayer et al.:
\[ \boldsymbol{\Phi} = \big( x,\; y,\; q \sin(2\phi),\; q \cos(2\phi),\; z \big) \]
\[ \boldsymbol{\Phi}_{t+1}(\mathbf{r}) = \boldsymbol{\Phi}_{t}(\mathbf{r}) + \varepsilon\, h_{\mathrm{SOM}}(\mathbf{r}, \mathbf{r}')\,\big[ \mathbf{v}_{t+1} - \boldsymbol{\Phi}_{t}(\mathbf{r}) \big], \qquad 0 < \varepsilon < 1 \]
\[ h_{\mathrm{SOM}}(\mathbf{r}, \mathbf{r}') = \exp\!\big( -\lVert \mathbf{r} - \mathbf{r}' \rVert^2 / 2\sigma^2 \big) \]
\[ \mathbf{r}' \text{ is the winner: } \; d\big( \mathbf{v}, \boldsymbol{\Phi}(\mathbf{r}') \big) = \min_{\mathbf{r}} d\big( \mathbf{v}, \boldsymbol{\Phi}(\mathbf{r}) \big) \]
The SOM model and a close relative (the elastic net) account for the observed structure of pinwheels and their locations.
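A brief sketch of how such a 5-dimensional feature vector can be assembled and the resulting maps read back out; the parameter values and array layout are illustrative assumptions (training of Φ(r) itself proceeds as in the SOM sketch above, with a 2-dim. grid of cortical sites r):

```python
import numpy as np

def random_stimulus(q=0.2, z_max=1.0, rng=np.random.default_rng(0)):
    """One stimulus v = (x, y, q sin 2phi, q cos 2phi, z)."""
    x, y = rng.uniform(0, 1, 2)           # retinal position
    phi = rng.uniform(0, np.pi)           # stimulus orientation in [0, pi)
    z = rng.uniform(-z_max, z_max)        # ocular dominance component
    return np.array([x, y, q * np.sin(2 * phi), q * np.cos(2 * phi), z])

def read_out_maps(Phi):
    """Recover orientation preference, selectivity and ocular dominance from Phi(r)."""
    s, c = Phi[..., 2], Phi[..., 3]
    orientation = 0.5 * np.arctan2(s, c)  # preferred orientation at each cortical site
    selectivity = np.hypot(s, c)          # length of the orientation component (~ q)
    ocular_dominance = Phi[..., 4]
    return orientation, selectivity, ocular_dominance
```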
Elastic Net Model
Very similar idea (Durbin and Mitchison):
\[ \boldsymbol{\Phi} = \big( x,\; y,\; q \sin(2\phi),\; q \cos(2\phi),\; z \big) \]
\[ \boldsymbol{\Phi}_{t+1}(\mathbf{r}) = \boldsymbol{\Phi}_{t}(\mathbf{r}) + \varepsilon\, h_{\mathrm{EN}}(\mathbf{r}, \mathbf{v}_{t+1})\,\big[ \mathbf{v}_{t+1} - \boldsymbol{\Phi}_{t}(\mathbf{r}) \big] + \beta \sum_{\mathbf{r}': \lVert \mathbf{r}' - \mathbf{r} \rVert = 1} \big[ \boldsymbol{\Phi}_{t}(\mathbf{r}') - \boldsymbol{\Phi}_{t}(\mathbf{r}) \big] \]
\[ h_{\mathrm{EN}}(\mathbf{r}, \mathbf{v}_{t+1}) = \exp\!\big( -d(\mathbf{v}_{t+1}, \boldsymbol{\Phi}(\mathbf{r}))^2 / 2\sigma^2 \big) \Big/ \sum_{\mathbf{r}'} \exp\!\big( -d(\mathbf{v}_{t+1}, \boldsymbol{\Phi}(\mathbf{r}'))^2 / 2\sigma^2 \big) \]
Note: the elasticity term explicitly forces a unit’s weight vector to be similar to those of its neighbors
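A minimal sketch of one elastic-net update in Python/NumPy; the step sizes eps and beta, the width sigma, and the periodic boundary handling are illustrative assumptions:

```python
import numpy as np

def elastic_net_step(Phi, v, eps=0.05, beta=0.1, sigma=0.2):
    """One elastic-net update for a 2-D sheet Phi of feature vectors, shape (H, W, D)."""
    # Soft assignment: every cortical site is attracted to the stimulus v,
    # weighted by how similar its feature vector already is to v.
    d2 = np.sum((Phi - v) ** 2, axis=2)
    h = np.exp(-d2 / (2 * sigma ** 2))
    h /= h.sum()                                    # normalization over all sites r'
    Phi = Phi + eps * h[..., None] * (v - Phi)      # data term
    # Elasticity term: pull each site towards its four grid neighbors
    # (np.roll gives periodic boundaries, an assumption of this sketch).
    lap = (np.roll(Phi, 1, axis=0) + np.roll(Phi, -1, axis=0)
           + np.roll(Phi, 1, axis=1) + np.roll(Phi, -1, axis=1) - 4 * Phi)
    return Phi + beta * lap
```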
RF-LISSOM model
Receptive Field Laterally Interconnected Synergetically Self-Organizing Map
Idea: learn RF properties and map simultaneously
Figures taken from http://www.cs.texas.edu/users/nn/web-pubs/htmlbook96/sirosh/
LISSOM equations
input activities are elongated Gaussian blobs:
initial map activity is nonlinear function of input activity, weights μ:
time evolution of activity (μ, E, I are all weights):
Hebbian style learning with weight normalization: all weights learn
but with different parameters:
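The equations themselves are not reproduced in the transcript; a hedged reconstruction of the standard RF-LISSOM form that these descriptions appear to match (symbols and constants are assumptions):

\[ \eta_{ij}(0) = g\Big( \sum_{a,b} \mu_{ij,ab}\, \chi_{ab} \Big) \]

\[ \eta_{ij}(t+1) = g\Big( \sum_{a,b} \mu_{ij,ab}\, \chi_{ab} + \gamma_E \sum_{k,l} E_{ij,kl}\, \eta_{kl}(t) - \gamma_I \sum_{k,l} I_{ij,kl}\, \eta_{kl}(t) \Big) \]

\[ w_{ij,kl} \leftarrow \frac{ w_{ij,kl} + \alpha\, \eta_{ij}\, X_{kl} }{ \sum_{k',l'} \big( w_{ij,k'l'} + \alpha\, \eta_{ij}\, X_{k'l'} \big) } \]

where χ is the input blob, μ the afferent weights, E and I the lateral excitatory and inhibitory weights, g a sigmoid-like nonlinearity, and w stands for any of μ, E, I, with X the corresponding pre-synaptic activity and α its own learning rate.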
Demo (needs supercomputer power)
Contextual Maps
A different way of displaying a SOM: label output nodes with a class label describing what each output node represents. Can be used to display data from high-dimensional input spaces in two dimensions.
Figure taken from Haykin
Input vector is the concatenation of a symbol code x_s and an attribute vector x_a (the symbol code is kept “small” and free of correlations):
\[ \mathbf{x} = \big[\, \mathbf{x}_s^{T} \;\; \mathbf{x}_a^{T} \,\big]^{T} \]
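For concreteness, a tiny illustrative construction of such an input vector; the dimensions and the scaling factor are assumptions in the spirit of Haykin's animal example, not values from the lecture:

```python
import numpy as np

def contextual_input(symbol_index, attributes, n_symbols, symbol_scale=0.2):
    """Concatenate a small, decorrelated symbol code with the attribute vector."""
    xs = np.zeros(n_symbols)
    xs[symbol_index] = symbol_scale          # one-hot symbol code, scaled down ("small")
    xa = np.asarray(attributes, dtype=float)
    return np.concatenate([xs, xa])          # x = [xs^T  xa^T]^T

# Toy usage: item 3 out of 16 symbols, with a 5-dimensional attribute vector.
x = contextual_input(3, [1, 0, 0, 1, 0], n_symbols=16)
print(x.shape)   # (21,)
```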
Contextual Maps cont’d.
The contextual map is trained just like a standard SOM.
Which unit is the winner when only the symbol part of the input is presented?
Figure taken from Haykin
Contextual Maps cont’d.
For which symbol code does each unit fire the most?
Figure taken from Haykin
Extension with “Conscience”
Problem: standard SOM does not faithfully represent input density
Magnification factor: m(x) is the number of units in a small volume dx of input space.
Ideally: m(x) ∝ p(x)
But this is not what the SOM does!
Idea (DeSieno):
If a neuron wins too often / not often enough,
it decreases / increases its chance of winning.
(intrinsic plasticity)
Algorithm: SOM with conscience
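The algorithm listing itself is not reproduced in the transcript; a minimal sketch of DeSieno's conscience mechanism as it is usually stated (the constants B and C are typical choices, assumed here):

```python
import numpy as np

def conscience_winner(xi, W, p, B=1e-4, C=10.0):
    """Biased competition (DeSieno): units that win too often are handicapped.

    p is the running estimate of each unit's win frequency, initialized to 1/N.
    """
    N = len(W)
    d2 = np.sum((W - xi) ** 2, axis=1)     # squared distances to all weight vectors
    bias = C * (1.0 / N - p)               # positive for under-used units, negative otherwise
    i_star = np.argmin(d2 - bias)          # winner of the biased competition
    won = np.zeros(N)
    won[i_star] = 1.0
    p += B * (won - p)                     # update the win-frequency estimates in place
    return i_star
```

The weight update for the winner (and its neighbors) is unchanged; only the competition is biased, which drives all units towards winning equally often, i.e. towards m(x) ∝ p(x).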
Topology Learning: neural gas
Idea: no fixed underlying topology; learn it on-line
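A minimal sketch of the topology-learning step (competitive Hebbian learning: connect the winner and the runner-up for every input); the edge-aging and unit-insertion parts of the full neural gas / growing neural gas algorithms are omitted here:

```python
import numpy as np

def learn_topology(X, W):
    """Connect the nearest and the second-nearest unit for every input sample."""
    edges = set()
    for xi in X:
        order = np.argsort(np.linalg.norm(W - xi, axis=1))
        i, j = int(order[0]), int(order[1])        # winner and runner-up
        edges.add((min(i, j), max(i, j)))          # undirected edge between them
    return edges   # the induced graph approximates the local topology of the data
```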
Figure taken from Ballard
Figure taken from Ballard
SOM Summary
• “simple algorithms” with complex behavior.
• Can be used to understand organization of response preferences
for neurons in cortex, i.e. formation of cortical maps.
• Used for visualization of high dimensional data.
• Can be extended for supervised learning.
• Some notable extensions, e.g. neural gas and growing neural gas, generalize beyond the fixed 2-dim. topology: the proper (local) topology is discovered during the learning process.
Learning Vector Quantization
Idea: supervised add-on for the case that class labels are available
for the input vectors
How it works: first do unsupervised learning, then label output nodes
Learning rule: C_x = desired class label for input x
C_w = class label of the winning unit with weight vector w
In case C_w = C_x: Δw = a(t)·[x − w] (move weight towards input)
In case C_w ≠ C_x: Δw = −a(t)·[x − w] (move weight away from input)
a(t): decaying learning rate
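A minimal LVQ1-style sketch in Python/NumPy; the learning-rate schedule and the epoch count are illustrative assumptions:

```python
import numpy as np

def train_lvq(X, y, W, unit_labels, a0=0.1, n_epochs=20, seed=0):
    """LVQ: pull the winner towards same-class inputs, push it away otherwise."""
    rng = np.random.default_rng(seed)
    W = W.astype(float).copy()
    for epoch in range(n_epochs):
        a = a0 * (1 - epoch / n_epochs)                     # decaying learning rate a(t)
        for k in rng.permutation(len(X)):
            x, c_x = X[k], y[k]
            i = np.argmin(np.linalg.norm(W - x, axis=1))    # winning unit
            sign = 1.0 if unit_labels[i] == c_x else -1.0   # same class? attract : repel
            W[i] += sign * a * (x - W[i])
    return W
```

Here W and unit_labels would typically come from an unsupervised SOM run followed by labelling the output nodes, as described above.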
LVQ data:
Figure taken from Haykin
LVQ result:
Figure taken from Haykin