Transcript Document

The Dynamics of Learning Vector Quantization

Michael Biehl, Anarta Ghosh
Rijksuniversiteit Groningen, Mathematics and Computing Science

Barbara Hammer
TU Clausthal-Zellerfeld, Institute of Computing Science

The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
Introduction
prototype-based learning from example data:
representation, classification
Vector Quantization (VQ)
Learning Vector Quantization (LVQ)
The dynamics of learning
a model situation: randomized data
learning algorithms for VQ and LVQ
analysis and comparison: dynamics, success of learning
Summary
Outlook
Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping in clusters of similar data
assignment of a feature vector ξ to the closest prototype w
(similarity or distance measure, e.g. the Euclidean distance)
unsupervised competitive learning
• initialize K prototype vectors
• present a single example
• identify the closest prototype,
i.e. the so-called winner
• move the winner even
closer towards the example
intuitively clear, plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to
the cost function ...
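The winner-takes-all procedure above can be sketched in a few lines of Python (a minimal illustration; the array layout, the function name, and the fixed learning rate eta are our choices, not from the talk):

```python
import numpy as np

def vq_online_step(prototypes, xi, eta):
    """One winner-takes-all step: move the closest prototype towards xi."""
    dists = np.sum((prototypes - xi) ** 2, axis=1)  # squared Euclidean distances
    winner = np.argmin(dists)                       # index of the closest prototype
    prototypes[winner] += eta * (xi - prototypes[winner])  # move winner closer
    return winner
```

Repeating this step over a stream of examples places the prototypes in regions of high data density, as described above.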
quantization error

H_VQ = Σ_{μ=1}^{P} Σ_{j=1}^{K} d_j^μ Π_{k≠j} Θ( d_k^μ − d_j^μ )

( sum over the data ξ^μ, μ = 1 ... P, and the prototypes w_j, j = 1 ... K )

here: squared Euclidean distance d_j^μ = ( ξ^μ − w_j )² ;
the prototype with minimal distance is the winner !
aim: faithful representation (in general: ≠ clustering )
Result depends on
- the number of prototype vectors
- the distance measure / metric used
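The quantization error can be evaluated directly from its definition, summing each example's distance to its winner; a small vectorized sketch (names are illustrative):

```python
import numpy as np

def quantization_error(prototypes, data):
    """H_VQ: sum of squared Euclidean distances from each example to its winner."""
    # d[mu, j] = squared distance of example mu to prototype j
    d = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    # the Theta-product in the definition selects the winner, i.e. the row minimum
    return np.sum(np.min(d, axis=1))
```

The minimum over prototypes implements the winner-selection expressed by the product of step functions in the cost function.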
Learning Vector Quantization (LVQ)
aim: classification of data, learning from examples
example situation: 3 classes, 3 prototypes
classification: assignment of a vector ξ to the class of the closest prototype w
Learning: choice of the prototypes according to the example data
aim: generalization ability, i.e. correct classification
of novel data after training
mostly: heuristically motivated variations of competitive learning
prominent example [Kohonen]: “ LVQ 2.1. ”
• initialize prototype vectors
(for different classes)
• present a single example
• identify the closest correct
and the closest wrong prototype
• move the corresponding winner
towards / away from the example
known convergence / stability problems,
e.g. for infrequent classes
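One LVQ 2.1 step as described above can be sketched as follows (assuming labeled prototypes; the function name and arguments are illustrative, not from the talk):

```python
import numpy as np

def lvq21_step(prototypes, labels, xi, y, eta):
    """LVQ 2.1: attract the closest correct and repel the closest wrong prototype."""
    d = np.sum((prototypes - xi) ** 2, axis=1)
    correct = np.where(labels == y)[0]
    wrong = np.where(labels != y)[0]
    j = correct[np.argmin(d[correct])]  # closest prototype with the right label
    k = wrong[np.argmin(d[wrong])]      # closest prototype with a wrong label
    prototypes[j] += eta * (xi - prototypes[j])  # move towards the example
    prototypes[k] -= eta * (xi - prototypes[k])  # move away from the example
```

The repulsive (minus) term is exactly the source of the stability problems mentioned above: a prototype of an infrequent class is pushed away more often than it is attracted.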
LVQ algorithms ...
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving
the classification of structured data, a few examples:
- real time speech recognition
- medical diagnosis, e.g. from histological data
- gene expression data analysis
- texture recognition and classification
-...
illustration: microscopic images of (pig) semen cells after freezing
and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
[figure: damaged cells vs. healthy cells, with the prototypes obtained by LVQ1]
The Dynamics of Learning Vector Quantization, RUG, 10.01.2005
LVQ algorithms ...
- are often based on purely heuristic arguments,
or derived from a cost function with unclear
relation to the generalization ability
- almost exclusively use the Euclidean distance measure,
inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of
dynamics, convergence properties,
performance w.r.t. generalization, etc.
In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
model situation: two clusters of N-dimensional data

random vectors ξ ∈ ℝ^N generated according to a mixture of two Gaussians:

P( ξ ) = Σ_{σ=±1} p_σ P( ξ | σ ) ,   P( ξ | σ ) = (2π)^{−N/2} exp[ −( ξ − ℓ B_σ )² / 2 ]

orthonormal center vectors: B_+, B_- ∈ ℝ^N , ( B_± )² = 1 , B_+ · B_- = 0
prior weights of the classes: p_+ , p_- with p_+ + p_- = 1
separation of the cluster centers: ℓ
independent components: ⟨ ξ_j ⟩ = ℓ B_σ,j , ⟨ ξ_j² ⟩ − ⟨ ξ_j ⟩² = 1 , hence ⟨ ξ² ⟩ = N + ℓ²

high-dimensional data (formally: N → ∞)

example: 400 examples ξ^μ ∈ ℝ^N , N = 200, ℓ = 1, p_+ = 0.6
(240 examples from class +1, 160 from class −1)

[figure: projections y_± = B_± · ξ^μ into the plane of the center vectors B_+, B_- vs. projections x_{1,2} = w_{1,2} · ξ^μ onto two independent random directions w_{1,2}]

Note:
model for studying the typical behavior of LVQ algorithms,
not: density-estimation based classification
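Data from this model can be generated directly from the description above; a sketch (the concrete choice of B_+ and B_- as the first two basis vectors is ours, any orthonormal pair works):

```python
import numpy as np

def draw_cluster_data(P, N, ell, p_plus, rng):
    """Draw P examples from the two-Gaussian mixture with unit-variance
    components and cluster centers at ell * B_plus and ell * B_minus."""
    B_plus = np.zeros(N); B_plus[0] = 1.0    # illustrative orthonormal pair
    B_minus = np.zeros(N); B_minus[1] = 1.0
    sigma = rng.choice([1, -1], size=P, p=[p_plus, 1.0 - p_plus])
    centers = np.where(sigma[:, None] == 1, ell * B_plus, ell * B_minus)
    xi = centers + rng.standard_normal((P, N))  # independent unit-variance components
    return xi, sigma
```

As stated above, the mean squared norm of the examples is ⟨ ξ² ⟩ = N + ℓ², which is easy to check on a sample.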
dynamics of on-line training

sequence of independent random data ξ^μ , μ = 1, 2, 3, ... , drawn according to P( ξ^μ )

update of the prototype vectors ( S, σ = ±1 ):

w_s^μ = w_s^{μ-1} + (η/N) f_s[ d_s^μ , d_{-s}^μ , S, σ, ... ] ( ξ^μ − w_s^{μ-1} )

with d_s^μ = ( ξ^μ − w_s^{μ-1} )²

η : learning rate, step size
f_s : modulation function; controls the competition and the direction of the update
(change of the prototype towards or away from the current data), etc.

the above examples:

unsupervised Vector Quantization ( “The Winner Takes It All”, classes irrelevant/unknown ):
f_s[ ... ] = Θ( d_{-s}^μ − d_s^μ )

Learning Vector Quantization “2.1” ( here: two prototypes, no explicit competition ):
f_s[ ... ] = s σ = +1 ( correct class ) , −1 ( wrong class )
mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities ( s, t, σ = ±1 ):

R_sσ^μ = w_s^μ · B_σ   ( projections into the (B_+, B_-)-plane )
Q_st^μ = w_s^μ · w_t^μ   ( lengths and relative positions of the prototypes )

( here: ℝ^{2N} → ℝ^7 )

inserting the update w_s^μ = w_s^{μ-1} + (η/N) f_s[ ... ] ( ξ^μ − w_s^{μ-1} ) yields the recursions

R_sσ^μ = R_sσ^{μ-1} + (η/N) f_s[ ... ] ( y_σ^μ − R_sσ^{μ-1} )
Q_st^μ = Q_st^{μ-1} + (η/N) [ f_s[ ... ] ( x_t^μ − Q_st^{μ-1} ) + f_t[ ... ] ( x_s^μ − Q_st^{μ-1} ) ]
         + (η²/N²) f_s[ ... ] f_t[ ... ] ( ξ^μ )²

where ( ξ^μ )² = N + O(√N), so all correction terms are of order 1/N

the random vector ξ^μ enters only in the form of the projections

x_s^μ = w_s^{μ-1} · ξ^μ ,   y_σ^μ = B_σ · ξ^μ

and the distances d_s^μ = ( ξ^μ − w_s^{μ-1} )² = ( ξ^μ )² − 2 x_s^μ + Q_ss^{μ-1}

2. average over the current example

in the thermodynamic limit N → ∞, the projections x_s^μ , y_σ^μ of a random vector drawn
according to P( ξ^μ | σ ) become correlated Gaussian random quantities, completely specified
in terms of their first and second moments (indices μ omitted):

⟨ x_s ⟩_σ = Σ_j w_s,j ⟨ ξ_j ⟩_σ = ℓ Σ_j w_s,j B_σ,j = ℓ R_sσ
⟨ y_τ ⟩_σ = ℓ δ_τσ   ( = ℓ if τ = σ, 0 else )
⟨ x_s x_t ⟩_σ − ⟨ x_s ⟩_σ ⟨ x_t ⟩_σ = Q_st
⟨ x_s y_τ ⟩_σ − ⟨ x_s ⟩_σ ⟨ y_τ ⟩_σ = R_sτ
⟨ y_τ y_ρ ⟩_σ − ⟨ y_τ ⟩_σ ⟨ y_ρ ⟩_σ = δ_τρ

→ averaged recursions ⟨ ... ⟩ = Σ_{σ=±1} p_σ ⟨ ... ⟩_σ , closed in { R_sσ , Q_st }
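The characteristic quantities are plain dot products, seven independent numbers in total (four R, three Q, since Q is symmetric); a minimal helper (row 0 stands for s = +, row 1 for s = −; names are ours):

```python
import numpy as np

def order_parameters(w, B):
    """R[s, sigma] = w_s . B_sigma and Q[s, t] = w_s . w_t
    for a prototype matrix w (rows s = +,-) and center matrix B (rows sigma = +,-)."""
    R = w @ B.T  # projections into the (B+, B-)-plane
    Q = w @ w.T  # lengths and relative positions of the prototypes
    return R, Q
```

Tracking these seven numbers instead of the 2N prototype components is exactly the dimensional reduction ℝ^{2N} → ℝ^7 used in the analysis.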
3. self-averaging properties
the characteristic quantities Q_st^μ , R_sσ^μ
- depend on the random sequence of example data
- their variance vanishes with N   (here: ∝ N-1)
learning dynamics is completely described in terms of averages
4. continuous learning time

α = μ / N :   # of examples = # of learning steps per degree of freedom

the averaged recursions become coupled, ordinary differential equations
→ evolution of the projections Q_st( α ), R_sσ( α )
5. learning curve

probability for misclassification of a novel example:

ε_g = p_+ ⟨ Θ( d_+ − d_- ) ⟩_+ + p_- ⟨ Θ( d_- − d_+ ) ⟩_-

    = p_+ Φ( [ Q_{++} − Q_{--} − 2 ℓ ( R_{++} − R_{-+} ) ] / [ 2 √( Q_{++} − 2 Q_{+-} + Q_{--} ) ] )
    + p_- Φ( [ Q_{--} − Q_{++} − 2 ℓ ( R_{--} − R_{+-} ) ] / [ 2 √( Q_{++} − 2 Q_{+-} + Q_{--} ) ] )

with the cumulative distribution Φ of a standard Gaussian

→ generalization error ε_g( α ) after training with α N examples
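A sketch of this learning-curve formula, based on our reading of the expression on the slide (index 0 stands for '+', index 1 for '−'; Φ is the standard normal CDF):

```python
from math import erf, sqrt

def Phi(z):
    """Cumulative distribution of a standard normal variable."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g from the order parameters R[s][sigma], Q[s][t]; 0 = '+', 1 = '-'."""
    denom = 2.0 * sqrt(Q[0][0] - 2.0 * Q[0][1] + Q[1][1])
    arg_plus = (Q[0][0] - Q[1][1] - 2.0 * ell * (R[0][0] - R[1][0])) / denom
    arg_minus = (Q[1][1] - Q[0][0] - 2.0 * ell * (R[1][1] - R[0][1])) / denom
    return p_plus * Phi(arg_plus) + (1.0 - p_plus) * Phi(arg_minus)
```

A quick consistency check: for ideal prototypes w_± = ℓ B_± (so R = ℓ·I, Q = ℓ²·I) with ℓ = 1 and equal priors, the expression reduces to Φ(−1/√2) ≈ 0.24, the minimal error of the symmetric decision plane.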
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α → ∞
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate η(α)
- variational optimization w.r.t. fs[...]
- ...
( e.g. maximize the decrease of the generalization error, −d ε_g / dα, in each learning step )
optimal classification with minimal generalization error
in the model situation (equal variances of the clusters):

separation of the classes by the plane on which p_+ P( ξ | σ = +1 ) = p_- P( ξ | σ = −1 )

[figure: geometry of the optimal decision plane between the centers ℓ B_+ ( p_+ ) and ℓ B_- ( p_- ); minimal ε_g (excess error, 0 ... 0.50) as a function of the prior weight p_+ ∈ [0, 1], shown for separations ℓ = 0, 1, 2]
“ LVQ 2.1 ”
update the correct and the wrong winner

[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)

w_s^μ = w_s^{μ-1} + (η/N) s σ ( ξ^μ − w_s^{μ-1} ) ,   s = ±1

(analytical) integration for w_s(0) = 0, with priors p_± = (1 ± m)/2 ( m > 0 ):

R_{++}( α ) =  ℓ (1+m) / (2m) ( 1 − e^{−η m α} )
R_{+-}( α ) = −ℓ (1−m) / (2m) ( 1 − e^{−η m α} )
R_{-+}( α ) =  ℓ (1+m) / (2m) ( 1 − e^{+η m α} )
R_{--}( α ) = −ℓ (1−m) / (2m) ( 1 − e^{+η m α} )

theory and simulation (N = 100), p_+ = 0.8, ℓ = 1, η = 0.5, averages over 100 independent runs:

[figure: R_{++}, R_{+-}, Q_{++} remain finite; R_{-+}, R_{--}, Q_{+-}, Q_{--} diverge ( → ±∞ ) with α → ∞]
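The analytical solutions on this slide can be evaluated directly; a sketch based on our reading of the four R_{sσ}( α ) expressions (the signs of the exponents are the essential feature):

```python
from math import exp

def lvq21_R(alpha, eta, ell, m):
    """Closed-form projections R_{s,sigma}(alpha) for LVQ 2.1 with w_s(0) = 0
    and priors p_pm = (1 pm m)/2, as reconstructed from the slide."""
    a = ell * (1 + m) / (2 * m)
    b = ell * (1 - m) / (2 * m)
    return {
        ("+", "+"):  a * (1 - exp(-eta * m * alpha)),  # saturates
        ("+", "-"): -b * (1 - exp(-eta * m * alpha)),  # saturates
        ("-", "+"):  a * (1 - exp(+eta * m * alpha)),  # diverges to -infinity
        ("-", "-"): -b * (1 - exp(+eta * m * alpha)),  # diverges to +infinity
    }
```

With the slide's parameters p_+ = 0.8 (m = 0.6), ℓ = 1, η = 0.5, the correct-class projections saturate at ±ℓ(1±m)/(2m) while the wrong prototype runs away exponentially, which is the instability discussed next.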
problem: instability of the algorithm
due to the repulsion of wrong prototypes

trivial classification for α → ∞ ( p_+ > p_- ):  ε_g = min { p_+, p_- }

strategies:
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, but the system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function;
  limiting case “learning from mistakes”, i.e. an LVQ 2.1 step only
  if the example is currently misclassified:
  slow learning, poor generalization
“ The winner takes it all ”

I) LVQ 1 [Kohonen]
only the winner w_S is updated, according to the class membership:

w_S^μ = w_S^{μ-1} + (η/N) Θ( d_{-S}^μ − d_S^μ ) S σ^μ ( ξ^μ − w_S^{μ-1} )

numerical integration for w_s(0) = 0;
theory and simulation (N = 200), p_+ = 0.2, ℓ = 1.2, η = 1.2,
averaged over 100 independent runs

[figure: order parameters R_{++}, R_{+-}, R_{-+}, R_{--}, Q_{++}, Q_{+-}, Q_{--} as functions of α]
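A Monte-Carlo realization of these LVQ1 dynamics in the model is straightforward; a sketch (the default parameters follow the slide, while the simulation structure, the choice of basis vectors for B_±, the seed, and the reduced sizes used for quick runs are ours):

```python
import numpy as np

def simulate_lvq1(N=200, p_plus=0.2, ell=1.2, eta=1.2, alpha_max=50, seed=0):
    """One on-line LVQ1 run in the two-cluster model; returns the final
    projections R[s, sigma] = w_s . B_sigma after alpha_max * N examples."""
    rng = np.random.default_rng(seed)
    B = np.zeros((2, N)); B[0, 0] = 1.0; B[1, 1] = 1.0   # centers B_+, B_-
    w = np.zeros((2, N))                                 # prototypes w_+, w_-
    labels = np.array([1, -1])                           # prototype labels S
    for _ in range(alpha_max * N):
        sigma = 1 if rng.random() < p_plus else -1       # class of the example
        center = B[0] if sigma == 1 else B[1]
        xi = ell * center + rng.standard_normal(N)
        d = np.sum((w - xi) ** 2, axis=1)
        S = int(np.argmin(d))                            # winner
        w[S] += (eta / N) * labels[S] * sigma * (xi - w[S])
    return w @ B.T
```

Averaging such runs over many seeds reproduces the self-averaging curves R_sσ( α ) that the theory predicts.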
[figure: trajectories of the prototypes w_+ , w_- in the (B_+, B_-)-plane
( • : α = 20, 40, ..., 140 ), with the cluster centers ℓ B_+ , ℓ B_- ;
dotted: optimal decision boundary; solid: asymptotic position]
learning curve ( p_+ = 0.2, ℓ = 1.2 )

[figure: ε_g( α ) for learning rates η = 2.0, 1.2, 0.4, and ε_g as a function of ( η α ) for η → 0, approaching the minimal ε_g]

- role of the learning rate:
  in the stationary state, ε_g( α → ∞ ) grows linearly with η
  → variable rate η( α ) !?
- well-defined asymptotics for η → 0, α → ∞ with ( η α ) → ∞
  (the ODEs are linear in η); the resulting asymptotic ε_g is suboptimal
“ The winner takes it all ”

II) LVQ+ ( only positive steps, without repulsion )
the winner w_S is updated only if it is correct, i.e. only from examples of class S:

w_S^μ = w_S^{μ-1} + (η/N) Θ( d_{-S}^μ − d_S^μ ) δ_{σ^μ, S} ( ξ^μ − w_S^{μ-1} )

α → ∞ : the asymptotic configuration is symmetric about ℓ ( B_+ + B_- ) / 2

[figure: asymptotic prototype positions relative to ℓ B_+ , ℓ B_- for p_+ = 0.2, ℓ = 1.2, η = 1.2]

the classification scheme and the achieved generalization error are
independent of the prior weights p_± ( and optimal for p_± = 1/2 )

LVQ+ ≈ VQ within the classes
learning curves ( p_+ = 0.2, ℓ = 1.0, η = 1.0 ):
[figure: ε_g( α ) for LVQ+ and LVQ1]

asymptotics η → 0, ( η α ) → ∞ :
[figure: asymptotic ε_g as a function of p_+ , compared with the optimal classification and with min { p_+, p_- }]

- LVQ 2.1 : trivial assignment to the more frequent class, ε_g = min { p_+, p_- }
- LVQ 1 : here close to optimal classification
- LVQ+ : min-max solution, p_± -independent classification
Vector Quantization
competitive learning: class membership is unknown, or identical for all data

w_S^μ = w_S^{μ-1} + (η/N) Θ( d_{-S}^μ − d_S^μ ) ( ξ^μ − w_S^{μ-1} ) ,   w_S : the winner

numerical integration for w_s(0) ≈ 0 ( p_+ = 0.2, ℓ = 1.0, η = 1.2 )

[figure: R_{++}, R_{--}, R_{+-}, R_{-+} as functions of α ; ε_g( α ) for VQ, LVQ1, and LVQ+]

the system is invariant under exchange of the prototypes
→ weakly repulsive fixed points

interpretations:
- VQ, unsupervised learning from unlabelled data
- LVQ, two prototypes of the same class, identical labels
- LVQ, different classes, but labels are not used in training

asymptotics ( η → 0, η α → ∞ ):
[figure: asymptotic ε_g as a function of p_+]
for strongly unbalanced priors ( p_+ ≈ 0 or p_+ ≈ 1 ):
low quantization error, but high generalization error ε_g
Summary
•prototype-based learning
Vector Quantization and Learning Vector Quantization
•a model scenario: two clusters, two prototypes
dynamics of online training
•comparison of algorithms:
LVQ 2.1.: instability, trivial (stationary) classification
LVQ 1 : close to optimal asymptotic generalization
LVQ + : min-max solution w.r.t. asymptotic generalization
VQ : symmetry breaking, representation
work in progress, outlook
regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
model: different cluster variances, more clusters/prototypes
optimized procedures: learning rate schedules,
variational approach / density estimation / Bayes optimal on-line
several classes and prototypes
Perspectives

• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. the distance measure d_λ( w, ξ ) = Σ_{i=1}^N λ_i ( ξ_i − w_i )²

• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  training: neighborhood preserving (SOM) vs. distance based (Neural Gas)

• applications