Clustering Methods

Isomap Algorithm
Yuri Barseghyan
Yasser Essiarab
Linear Methods for Dimensionality Reduction
– PCA (Principal Component Analysis): rotate data so that
principal axes lie in direction of maximum variance
– MDS (Multi-Dimensional Scaling): find coordinates that best
preserve pairwise distances
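The MDS bullet above can be sketched in a few lines. This is a minimal illustration of classical (metric) MDS, assuming NumPy is available; the function name `classical_mds` and the toy data are made up for this example and are not from the slides. When the input distances are Euclidean, the recovered coordinates match PCA scores up to rotation and reflection.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points given a matrix D of pairwise distances (classical MDS)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances to obtain an inner-product matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy data (hypothetical): the four corners of a unit square.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, n_components=2)

# The embedding should reproduce the input distances.
D_embedded = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

Only the distance matrix enters the computation, which is why Isomap can later swap Euclidean distances for geodesic ones and reuse the same step.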
Limitations of Linear methods
• What if the data does not lie within a linear subspace?
• Do all convex combinations of the measurements generate
plausible data?
• Low-dimensional non-linear Manifold embedded in a higher
dimensional space
Non-linear Dimensionality Reduction
• What about data that cannot be described by linear combination of
latent variables?
– Ex: swiss roll, s-curve
• In the end, linear methods do nothing more than “globally
transform” (rotate/translate/scale) data.
• Sometimes we need to “unwrap” the data first
Non-linear Dimensionality Reduction
• Unwrapping the data = “manifold learning”
• Assume data can be embedded on a lower-dimensional manifold
• Given data set X = {x_i}, i = 1…n, find representation Y = {y_i}, i = 1…n,
where Y lies on lower-dimensional manifold
• Instead of preserving global pairwise distances, non-linear
dimensionality reduction tries to preserve only the geometric
properties of local neighborhoods
• From Mathworld: two Riemannian manifolds M and N are
isometric if there is a diffeomorphism such that the Riemannian
metric from one pulls back to the metric on the other.
For a complete Riemannian manifold:
d(x, y) = geodesic distance between x and y
• Informally, an isometry is a smooth invertible mapping that looks
locally like a rotation plus translation
• Intuitively, for 2-dimensional case, isometries include whatever
physical transformations one can perform on a sheet of paper
without introducing tears, holes, or self-intersections
Trustworthiness [2]
The trustworthiness quantifies how trustworthy a projection of a
high-dimensional data set onto a low-dimensional space is.
Specifically, a projection is trustworthy if the t nearest
neighbors of each data point in the low-dimensional space are also
close by in the original space.
M(t) = 1 - \frac{2}{v t (2v - 3t - 1)} \sum_{i=1}^{v} \sum_{j \in U_t(i)} \bigl( r(i, j) - t \bigr),
where v is the number of data points,
r(i, j) is the rank of the data point j in the ordering according to the
distance from i in the original data space
U_t(i) denotes the set of those data points that are among the t nearest
neighbors of the data point i in the low-dimensional space
but not in the original space.
The maximal value that trustworthiness can take is equal to one.
The closer M(t) is to one, the better the low-dimensional space
describes the original data.
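The definition above can be turned into a short computation. The following is a minimal pure-Python sketch, assuming Euclidean distances in both spaces and t smaller than half the number of points; the helper names `rank_matrix` and `trustworthiness` and the toy data are hypothetical and chosen for illustration only.

```python
import math

def rank_matrix(points):
    """ranks[i][j] = rank of j among the neighbors of i (1 = nearest), i excluded."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for rank, j in enumerate(order, start=1):
            ranks[i][j] = rank
    return ranks

def trustworthiness(original, embedded, t):
    """M(t) as defined above; v is the number of data points."""
    v = len(original)
    r_orig = rank_matrix(original)
    r_emb = rank_matrix(embedded)
    penalty = 0
    for i in range(v):
        for j in range(v):
            if j == i:
                continue
            # j is in U_t(i): a t-nearest neighbor in the embedding
            # that is not a t-nearest neighbor in the original space.
            if r_emb[i][j] <= t and r_orig[i][j] > t:
                penalty += r_orig[i][j] - t
    return 1.0 - 2.0 / (v * t * (2 * v - 3 * t - 1)) * penalty

# Toy data (hypothetical): six points on a line in 2D.
original = [(float(x), 0.0) for x in range(6)]
good = [(float(x),) for x in range(6)]               # keeps the ordering
bad = [(0.0,), (5.0,), (1.0,), (4.0,), (2.0,), (3.0,)]  # scrambles neighbors

m_good = trustworthiness(original, good, t=2)
m_bad = trustworthiness(original, bad, t=2)
```

The embedding that preserves every neighborhood scores exactly one, while the scrambled one is penalized for each false neighbor in proportion to how far away that point really is.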
Several methods to learn a manifold
• Two to start:
– Isomap [Tenenbaum 2000]
– Locally Linear Embeddings (LLE) [Roweis and Saul, 2000]
• Recently:
– Semidefinite Embeddings (SDE) [Weinberger and Saul, 2005]
An important observation
Small patches on a non-linear manifold look linear
These locally linear neighborhoods can be defined in two ways
– k-nearest neighbors: find the k nearest points to a given point, under some
metric. Guarantees all items are similarly represented, limits dimension to K-1
– ε-ball: find all points that lie within ε of a given point, under some metric. Best
if density of items is high and every point has a sufficient number of neighbors
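The two neighborhood definitions can be compared side by side. This is a small pure-Python sketch assuming the Euclidean metric; the function names and the toy points are invented for illustration.

```python
import math

def knn_neighbors(points, k):
    """For each point, the indices of its k nearest other points."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result

def epsilon_ball_neighbors(points, eps):
    """For each point, the indices of all other points within distance eps."""
    result = []
    for i, p in enumerate(points):
        result.append([j for j in range(len(points))
                       if j != i and math.dist(p, points[j]) <= eps])
    return result

# Toy data (hypothetical): three nearby points and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
knn = knn_neighbors(points, k=2)
ball = epsilon_ball_neighbors(points, eps=1.5)
```

With k-nearest neighbors every point, including the isolated one, gets exactly k neighbors; with the ε-ball the isolated point gets none, which is why that rule works best when the sampling density is high.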
• Find coordinates on lower-dimensional manifold that preserve
geodesic distances instead of Euclidean distances
• Key Observation: if the goal is to discover the underlying manifold,
geodesic distance makes more sense than Euclidean distance
Calculating geodesic distance
• We know how to calculate Euclidean distance
• Locally linear neighborhoods mean that we can approximate
geodesic distance within a neighborhood using Euclidean distance
• A graph is constructed by connecting each point to its K nearest
neighbors
• Approximate geodesic distances are calculated by finding the length
of the shortest path in the graph between points
• Use Dijkstra’s algorithm to fill in remaining distances
Dijkstra’s Algorithm
• Greedy best-first algorithm (breadth-first search generalized to
weighted edges) to compute shortest paths from one point to all
other points
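A minimal sketch of Dijkstra's algorithm, using only the Python standard library. The adjacency-list format and the small example graph are assumptions made for this illustration; edge lengths must be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path lengths from source to every node.

    graph maps each node to a list of (neighbor, edge_length) pairs.
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical undirected graph with four nodes.
graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("a", 1.0), ("c", 2.0), ("d", 5.0)],
    "c": [("a", 4.0), ("b", 2.0), ("d", 1.0)],
    "d": [("b", 5.0), ("c", 1.0)],
}
dist = dijkstra(graph, "a")
```

The greedy step is the heap pop: the closest unsettled node is finalized first, so the direct edge a to c (length 4) loses to the path through b (length 3).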
Isomap Algorithm
– Compute fully-connected neighborhood of points for each item
• Can be k nearest neighbors or ε-ball
– Calculate pairwise Euclidean distances within each neighborhood
– Use Dijkstra’s Algorithm to compute shortest path from each point
to non-neighboring points
– Run MDS on resulting distance matrix
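The steps above can be put together in one short sketch. This is an illustrative implementation under stated assumptions, not the reference code of [Tenenbaum 2000]: it assumes NumPy, uses k nearest neighbors (not the ε-ball), symmetrizes the graph, assumes no duplicate points, and uses classical MDS for the last step. The function name `isomap` and the semicircle test data are made up for this example.

```python
import heapq
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    """Return (embedding, geodesic distance matrix) for the rows of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Step 1: Euclidean distances and a symmetric k-nearest-neighbor graph.
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        # Position 0 of the sorted order is the point itself (distance 0).
        for j in np.argsort(E[i])[1:n_neighbors + 1]:
            j = int(j)
            adj[i][j] = E[i, j]
            adj[j][i] = E[i, j]

    # Step 2: approximate geodesic distances with Dijkstra from every point.
    G = np.full((n, n), np.inf)
    for s in range(n):
        G[s, s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > G[s, u]:
                continue  # stale heap entry
            for v, w in adj[u].items():
                if d + w < G[s, v]:
                    G[s, v] = d + w
                    heapq.heappush(heap, (d + w, v))
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Step 3: classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    Y = vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))
    return Y, G

# Hypothetical test data: 50 points on a unit semicircle (a 1D manifold in 2D).
theta = np.linspace(0.0, np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Y, G = isomap(X, n_neighbors=2, n_components=1)

euclidean_end_to_end = np.linalg.norm(X[0] - X[-1])  # straight line through the arc
geodesic_end_to_end = G[0, -1]                       # path along the arc
embedded_end_to_end = abs(Y[0, 0] - Y[-1, 0])
```

On this arc the straight-line distance between the endpoints is 2, while the graph path follows the curve and comes out slightly below π, since chords are shorter than the arcs they span. The one-dimensional embedding keeps that unrolled length.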
Isomap Algorithm [3]
Time Complexity of Algorithm
Isomap Results
Find a 2D embedding of the 3D S-curve
Residual Fitting Error
Plotting eigenvalues from MDS will tell you dimensionality of your data
Neighborhood Graph
More Isomap Results
Results on projecting the face dataset to two
dimensions (Trustworthiness−Continuity) [1]
More Isomap Results
Isomap Failures
• Isomap has problems on closed manifolds of arbitrary topology
Isomap: Advantages
• Nonlinear
• Globally optimal
– Still produces globally optimal low-dimensional Euclidean
representation even though input space is highly folded,
twisted, or curved.
• Guaranteed asymptotically to recover the true dimensionality.
Isomap: Disadvantages
• Guaranteed asymptotically to recover geometric structure of
nonlinear manifolds
– As N increases, pairwise distances provide better
approximations to geodesics by “hugging surface” more closely
– Graph discreteness overestimates d_M(i, j)
• K must be small enough to avoid “linear shortcuts” near regions of
high surface curvature, yet large enough to keep the neighborhood
graph connected
• Mapping novel test images to manifold space
[1] Jarkko Venna and Samuel Kaski, Nonlinear dimensionality
reduction viewed as information retrieval, NIPS 2006 workshop
on Novel Applications of Dimensionality Reduction, 9 Dec 2006
[2] Claudio Varini, Visual Exploration of Multivariate Data in Breast
Cancer by Dimensional Reduction, March 2006
[3] Yiming Wu and Kap Luk Chan, An Extended Isomap Algorithm for
Learning Multi-Class Manifold, Proceedings of the 2004 International
Conference on Machine Learning and Cybernetics, Aug. 2004
