Extending metric multidimensional scaling
with Bregman divergences
Mr. Jigang Sun
Supervisor: Prof. Colin Fyfe
Nov 2009
Multidimensional Scaling (MDS)
• A group of information visualisation methods
that project data from a high dimensional
space to a low dimensional space, often of two
or three dimensions, keeping the inter-point
dissimilarities (e.g. distances) in the low
dimensional space as close as possible to the
original dissimilarities in the high dimensional
space. When Euclidean distances are used, the
method is called metric MDS.
An example
[Figure: points in the high dimensional space (data space / input space) mapped to the low dimensional space (latent space / output space) by basic MDS.]
Basic MDS
• We minimise the stress function
$$E_{BasicMDS} = \sum_{i=1}^{N} \sum_{j=i+1}^{N} (L_{ij} - D_{ij})^2 = \sum_{i=1}^{N} \sum_{j=i+1}^{N} E_{ij}^2$$
where the error is $E_{ij} = |L_{ij} - D_{ij}|$,
$D_{ij} = \|X_i - X_j\|$ is the distance between points $X_i$ and $X_j$ in data space, and
$L_{ij} = \|Y_i - Y_j\|$ is the mapped distance between points $Y_i$ and $Y_j$ in latent space.
[Figure: points $X_i$ and $X_j$ at distance $D_{ij}$ in data space, mapped to points $Y_i$ and $Y_j$ at distance $L_{ij}$ in latent space.]
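As a rough illustration of the basic MDS stress, the following sketch evaluates the sum of squared differences between latent and data-space distances in plain Python. The helper names `pairwise_distances` and `basic_mds_stress` and the toy points are assumptions made for this example; they are not from the talk.

```python
import math

def pairwise_distances(points):
    """Euclidean distances between all pairs i < j, keyed by (i, j)."""
    n = len(points)
    return {
        (i, j): math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

def basic_mds_stress(data_points, latent_points):
    """Sum over i < j of (L_ij - D_ij)^2."""
    d = pairwise_distances(data_points)    # D_ij in data space
    l = pairwise_distances(latent_points)  # L_ij in latent space
    return sum((l[key] - d[key]) ** 2 for key in d)

# Toy example (made-up points): three 3-D points and a 2-D layout
# that happens to reproduce all three pairwise distances.
X = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
Y = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
print(basic_mds_stress(X, Y))
```

An MDS method would search over the latent points to make this value as small as possible; the sketch only evaluates the stress for a given layout.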
Sammon Mapping (1969)
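$$E_{Sammon} = \frac{1}{C} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{(L_{ij} - D_{ij})^2}{D_{ij}} = \frac{1}{C} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{E_{ij}^2}{D_{ij}}$$
where the error is $E_{ij} = |L_{ij} - D_{ij}|$ and the
normalisation scalar is $C = \sum_{i=1}^{N} \sum_{j=i+1}^{N} D_{ij}$.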
Focuses on small distances: for the same error,
the smaller distance is given bigger stress, thus
on average the small distances are mapped
more accurately than long distances. Small
neighbourhoods are well preserved.
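To make the weighting concrete, here is a minimal sketch of the Sammon stress on two made-up distances. The function name `sammon_stress` and the numbers are assumptions for illustration only. The same absolute error of 1 is placed first on the short distance and then on the long one.

```python
def sammon_stress(data_dists, latent_dists):
    """Sammon stress for matching lists of pairwise distances D_ij and L_ij.

    Assumes every D_ij is strictly positive (no duplicate data points).
    """
    c = sum(data_dists)  # normalisation scalar C
    weighted = sum((l - d) ** 2 / d for d, l in zip(data_dists, latent_dists))
    return weighted / c

D = [1.0, 10.0]              # one short and one long data-space distance
err_on_short = [2.0, 10.0]   # error of 1 on the short distance only
err_on_long = [1.0, 11.0]    # error of 1 on the long distance only

print(sammon_stress(D, err_on_short))  # (1 / 1) / 11
print(sammon_stress(D, err_on_long))   # (1 / 10) / 11
```

The error on the short distance contributes ten times more stress here, which is the sense in which small neighbourhoods are favoured.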
Bregman divergence
$$d_F(p, q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle$$
is the Bregman divergence between p and q based on a strictly convex
function F. Intuitively, it is the difference between the value of F at point p
and the value of the first-order Taylor expansion of F around point q
evaluated at point p:
$$F(p) - F(q) - \frac{\partial F}{\partial q}(p - q)$$
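For a function of one variable the definition is short enough to check numerically. The sketch below is illustrative only: the name `bregman` and the two base functions (a square and an exponential) are choices made for this example, with the derivative supplied by hand.

```python
import math

def bregman(F, dF, p, q):
    """Scalar Bregman divergence d_F(p, q) = F(p) - F(q) - (p - q) * F'(q)."""
    return F(p) - F(q) - (p - q) * dF(q)

# Illustrative strictly convex base functions, each paired with its derivative.
square = (lambda x: x * x, lambda x: 2.0 * x)
exponential = (math.exp, math.exp)

print(bregman(*square, 3.0, 1.0))        # 9 - 1 - 2 * 2
print(bregman(*exponential, 0.0, 1.0))   # 1 - e + e
print(bregman(*exponential, 1.0, 1.0))   # identical arguments
```

Note that the divergence is generally not symmetric: swapping p and q in the exponential case gives a different value.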
Bregman divergence
• When F is a function of one variable, the Bregman divergence is a
truncated Taylor series: the Taylor expansion of F around q with the
zeroth- and first-order terms removed.
• A useful property for MDS is non-negativity:
$d_F(p, q) \ge 0$, and $d_F(p, q) = 0 \iff p = q$
• Viewed as a function of p, $d_F(p, q)$ is minimised as p approaches q.
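The Taylor series reading can be written out explicitly. This is a sketch that assumes F is a smooth function of one variable whose Taylor series around q converges at p, a condition not stated on the slide:

```latex
\begin{align*}
F(p) &= F(q) + F'(q)\,(p - q) + \sum_{k=2}^{\infty} \frac{F^{(k)}(q)}{k!}\,(p - q)^k \\
d_F(p, q) &= F(p) - F(q) - F'(q)\,(p - q)
           = \sum_{k=2}^{\infty} \frac{F^{(k)}(q)}{k!}\,(p - q)^k
\end{align*}
```

The leading term is $\tfrac{1}{2} F''(q)(p - q)^2$, which is positive for $p \ne q$ when $F''(q) > 0$.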
MDS using Bregman divergence
• Bregmanised MDS
• Equivalent Expression: residual Taylor series
Basic MDS is a special case of BMMDS
• Base convex function is chosen as
• And higher order derivatives are
• So
• Is derived as
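One way to fill in the steps listed above is sketched here. It assumes that the base convex function is $F(x) = x^2$ and that the BMMDS stress sums $d_F(L_{ij}, D_{ij})$ over all pairs; both are assumptions consistent with the basic MDS stress defined earlier, not formulas taken from the slide.

```latex
\begin{align*}
F(x) &= x^2, \qquad F'(x) = 2x, \qquad F''(x) = 2, \qquad F^{(k)}(x) = 0 \ \text{for } k \ge 3 \\
d_F(L_{ij}, D_{ij}) &= L_{ij}^2 - D_{ij}^2 - 2 D_{ij}\,(L_{ij} - D_{ij}) = (L_{ij} - D_{ij})^2 \\
E_{BMMDS} &= \sum_{i=1}^{N} \sum_{j=i+1}^{N} d_F(L_{ij}, D_{ij})
           = \sum_{i=1}^{N} \sum_{j=i+1}^{N} (L_{ij} - D_{ij})^2 = E_{BasicMDS}
\end{align*}
```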
Example 2: Extended Sammon
• Base convex function $F(x) = x \log x$, $x > 0$
• This is equivalent to
• The Sammon mapping is rewritten as
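The expressions these bullets refer to can plausibly be reconstructed as follows. The sketch assumes the divergence is taken as $d_F(L_{ij}, D_{ij})$, with the latent distance as first argument, and that the series is used where it converges ($0 < L_{ij} < 2 D_{ij}$); the argument order is an assumption, chosen because it puts $D_{ij}$ in the denominator as in the Sammon stress.

```latex
\begin{align*}
F(x) &= x \log x, \qquad F'(x) = \log x + 1, \qquad
F^{(k)}(x) = \frac{(-1)^k (k-2)!}{x^{k-1}} \ \text{for } k \ge 2 \\
d_F(L_{ij}, D_{ij}) &= L_{ij} \log \frac{L_{ij}}{D_{ij}} - L_{ij} + D_{ij} \\
&= \frac{(L_{ij} - D_{ij})^2}{2 D_{ij}} - \frac{(L_{ij} - D_{ij})^3}{6 D_{ij}^2}
 + \frac{(L_{ij} - D_{ij})^4}{12 D_{ij}^3} - \cdots
\end{align*}
```

Under these assumptions the first term equals the Sammon summand $(L_{ij} - D_{ij})^2 / D_{ij}$ up to the constant factor $\tfrac{1}{2}$ and the normalisation $1/C$.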
Sammon and Extended Sammon
• The common term
• The Sammon mapping is considered to be an
approximation to the Extended Sammon
mapping using the common term.
• The Extended Sammon mapping will do more
adjustments on the basis of the higher order
terms.
An Experiment on Swiss roll data set
At a glance
• Basic MDS captures the global curve, but
poorly differentiates local points with the same X
and Y coordinates but different Z coordinates.
• The Sammon mapping does better than
BasicMDS.
• The Extended Sammon mapping is the best.
Distance preservation
• Horizontal axis: mean distances in data space,
40 sets.
• Vertical axis: relative mean distances in latent
space.
• Sammon is better than BasicMDS, Extended
Sammon is better than Sammon:
• Small distances are mapped closer to their
original value in data space; long distances are
mapped longer.
Relative standard deviation
• On short distances, Sammon has smaller
variance than BasicMDS, Extended Sammon
has smaller variance than Sammon, i.e.
control of small distances is enhanced.
• Large distances are given more and more
freedom in the same order as above.
LCMC: local continuity meta-criterion
(L. Chen 2006)
• A common measure for assessing the projection
quality of different MDS methods.
• It is defined in terms of neighbourhood preservation.
• Value between 0 and 1, the higher the better.
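A minimal sketch of how such a neighbourhood-preservation score can be computed is given below. It uses one commonly cited form of LCMC: the overlap between K-nearest-neighbour sets in data space and latent space, averaged over points, minus the chance term K/(N-1). This form, the function names `knn` and `lcmc`, and the toy points are assumptions for illustration; with the chance term subtracted, a perfect projection scores 1 - K/(N-1) rather than exactly 1.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of point i (excluding i itself)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])

def lcmc(data_points, latent_points, k):
    """Average K-neighbourhood overlap, adjusted by the chance term K/(N-1)."""
    n = len(data_points)
    overlap = sum(
        len(knn(data_points, i, k) & knn(latent_points, i, k))
        for i in range(n)
    )
    return overlap / (n * k) - k / (n - 1)

# Made-up 1-D points; the second layout swaps some positions.
P = [(0.0,), (1.0,), (3.0,), (6.0,), (10.0,)]
Q = [(0.0,), (10.0,), (1.0,), (6.0,), (3.0,)]
print(lcmc(P, P, 2))  # identical layouts: every neighbourhood is preserved
print(lcmc(P, Q, 2))  # shuffled layout: some neighbourhoods are broken
```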
Quality assessed by LCMC
Stress comparison between Sammon
and Extended Sammon
• For the Extended Sammon mapping, a latent distance
that is too short (e.g. $D_{ij} - L_{ij} = 2$) is
penalised more than a latent distance that is too long
by the same amount (e.g. $D_{ij} - L_{ij} = -2$).
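The asymmetry can be checked with a few lines of Python. The sketch assumes the Extended Sammon summand is the Bregman divergence $d_F(L, D)$ for $F(x) = x \log x$, which works out to $L \log(L/D) - L + D$. The argument order, the function names and the data-space distance of 4 are assumptions for this example.

```python
import math

def extended_sammon_term(latent_dist, data_dist):
    """d_F(L, D) for F(x) = x log x, i.e. L*log(L/D) - L + D (L, D > 0)."""
    return latent_dist * math.log(latent_dist / data_dist) - latent_dist + data_dist

def sammon_term(latent_dist, data_dist):
    """Unnormalised Sammon summand (L - D)^2 / D, symmetric in the sign of L - D."""
    return (latent_dist - data_dist) ** 2 / data_dist

D = 4.0
too_short, too_long = D - 2.0, D + 2.0   # D - L = 2 and D - L = -2

print(sammon_term(too_short, D), sammon_term(too_long, D))
print(extended_sammon_term(too_short, D), extended_sammon_term(too_long, D))
```

The Sammon summand gives both errors the same stress, while the divergence-based term is larger for the latent distance that is too short.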
Stress formation by terms
• Stress coming from the term of the Sammon
mapping is the largest. It is the main part of
stress.
• However, for small distances, the contribution
from other terms is not negligible.
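A small numerical sketch of this decomposition follows. It assumes the stress summand is the Taylor series of $d_F(L, D)$ for $F(x) = x \log x$, whose k-th term is $(-1)^k (L - D)^k / (k (k - 1) D^{k-1})$, so that the first term is the Sammon-like one. The function names and the example distances are illustrative assumptions.

```python
def taylor_terms(latent_dist, data_dist, orders=(2, 3, 4, 5)):
    """Terms (-1)^k (L - D)^k / (k (k - 1) D^(k - 1)) for the given orders k."""
    diff = latent_dist - data_dist
    return [
        (-1) ** k * diff ** k / (k * (k - 1) * data_dist ** (k - 1))
        for k in orders
    ]

def higher_order_share(latent_dist, data_dist):
    """Fraction of the truncated sum contributed by terms beyond the first."""
    terms = taylor_terms(latent_dist, data_dist)
    return sum(terms[1:]) / sum(terms)

# Same error (latent distance 0.5 too short) on a small and a large distance.
print(higher_order_share(0.5, 1.0))    # small data-space distance
print(higher_order_share(9.5, 10.0))   # large data-space distance
```

For the same absolute error, the higher-order share is noticeably larger on the small distance, because each successive term carries an extra factor of $(L - D)/D$.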
OpenBox, Sammon and FirstGroup
SecondGroup on OpenBox
Future work
• Combining two opposite strategies for
choosing base convex functions.
• Right Bregman divergences are one kind of CCA.
Conclusion
• Applied Bregman divergences to
multidimensional scaling.
• Shown that basic MMDS is a special case and
Sammon mapping approximates a BMMDS.
• Improved upon both with 2 families of
divergences.
• Shown results on two artificial data sets.