Manifold Learning in the Wild
A New Manifold Modeling and Learning Framework for
Image Ensembles
Aswin C. Sankaranarayanan, Richard G. Baraniuk, Chinmay Hegde
Rice University
Sensor Data Deluge
Internet Scale Databases
• Tremendous size of the corpus of available data
– Google Image Search for “Notre Dame Cathedral”
yields 3M results → 3TB of data
Concise Models
• Efficient processing / compression requires a
concise representation
• Our interest in this talk:
collections of images
parameterized by q ∈ Q
– translations of an object
q: x-offset and y-offset
– wedgelets
q: orientation and offset
– rotations of a 3D object
q: pitch, roll, yaw
• Image articulation manifold
Image Articulation Manifold
• N-pixel images: x_q ∈ ℝ^N
• K-dimensional articulation space: q ∈ Q ⊂ ℝ^K
• Then M = { x_q : q ∈ Q }
is a K-dimensional manifold
in the ambient space ℝ^N
• Very concise model
– can be learned using
non-linear dimensionality reduction
Ex: Manifold Learning
• Algorithms: LLE, ISOMAP, LE, HE, Diff. Geo., …
• K = 1: rotation
Ex: Manifold Learning
• K = 2: rotation and scale
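The learning step above can be checked on a toy ensemble. The sketch below is a minimal illustration (my own construction, not the authors' code; it assumes scikit-learn is available): build N-pixel "images" of a translating Gaussian bump and recover the K = 1 articulation parameter with ISOMAP.

```python
import numpy as np
from sklearn.manifold import Isomap

# Toy IAM: a Gaussian bump translating across a 1-D "image" of N pixels.
N = 64                                # pixels per image
offsets = np.linspace(10, 54, 60)     # K = 1 articulation parameter (x-offset)
grid = np.arange(N)
X = np.exp(-0.5 * ((grid[None, :] - offsets[:, None]) / 3.0) ** 2)

# Learn the 1-D manifold; the embedding should be monotone in the offset.
emb = Isomap(n_neighbors=6, n_components=1).fit_transform(X).ravel()

# Rank correlation with the true articulation parameter (up to sign).
rank_corr = np.corrcoef(np.argsort(np.argsort(emb)),
                        np.argsort(np.argsort(offsets)))[0, 1]
print(abs(rank_corr))   # close to 1: parameter recovered up to sign/scale
```

Dense sampling matters here: with too few images per unit of translation, the neighborhood graph no longer follows the manifold (the disconnect discussed next).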
Smooth IAMs
• N-pixel images: x_q ∈ ℝ^N
• Local isometry:
image distance ≈ parameter space distance,
i.e., ‖x_q1 − x_q2‖ ∝ ‖q1 − q2‖ for nearby q1, q2
• Linear tangent spaces
are close approximations
locally
• Low-dimensional
articulation space
Theory/Practice Disconnect
Isometry
• Ex: translation manifold:
all blue images are equidistant
from the red image
• Local isometry
– satisfied only when sampling is dense
[Plot: Euclidean distance from the reference image vs. translation offset (0 to 100)]
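The saturation in that plot is easy to reproduce numerically: once two translated copies of an object no longer overlap, their Euclidean distance stops growing. A minimal sketch with toy 1-D images (my own construction, not the talk's experiment):

```python
import numpy as np

N = 128

def image(offset, width=10):
    """1-D 'image': a box of the given width at the given offset."""
    x = np.zeros(N)
    x[offset:offset + width] = 1.0
    return x

ref = image(0)
offsets = np.arange(0, 100, 5)
dists = np.array([np.linalg.norm(image(t) - ref) for t in offsets])

# Distance grows only while the boxes overlap (offset < width = 10 px),
# then saturates: all far-away images are equidistant from the reference.
print(dists)
```

Past the support width, every translated image is equidistant from the reference, so pairwise Euclidean distances carry no articulation information unless sampling is dense.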
Theory/Practice Disconnect
Nuisance articulations
• Unsupervised data invariably has additional
undesired articulations
– Illumination
– Background clutter, occlusions, …
• Image ensemble is no longer low-dimensional
Image representations
• Conventional representation for an image:
– a vector of pixels
– inadequate!
• Replace the vector of pixels with an abstract
bag of features
– Ex: SIFT (Scale-Invariant Feature Transform) selects
keypoint locations in an image and computes a
keypoint descriptor for each keypoint
– very popular in many vision problems
– keypoint descriptors are local; it is easy to make them
robust to nuisance imaging parameters
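The bag-of-features idea can be sketched with a toy stand-in for SIFT (a real system would use e.g. OpenCV's `cv2.SIFT_create()`): detect keypoints as local maxima of a response map, attach a local descriptor to each, and keep only the unordered set of (location, descriptor) pairs. Everything below is illustrative, not the SIFT algorithm itself.

```python
import numpy as np

def toy_keypoints(img, patch=3, thresh=0.5):
    """Toy 'bag of features': keypoints = local maxima of the gradient
    magnitude; descriptor = the surrounding patch, flattened.
    (A crude stand-in for SIFT keypoints/descriptors.)"""
    gy, gx = np.gradient(img.astype(float))
    resp = np.hypot(gx, gy)                      # response map
    feats = []
    for i in range(patch, img.shape[0] - patch):
        for j in range(patch, img.shape[1] - patch):
            win = resp[i - 1:i + 2, j - 1:j + 2]
            if resp[i, j] >= thresh and resp[i, j] == win.max():
                desc = img[i - patch:i + patch + 1,
                           j - patch:j + patch + 1].ravel()
                feats.append(((i, j), desc))
    return feats   # unordered set of (location, descriptor) pairs

img = np.zeros((32, 32))
img[10:20, 12:22] = 1.0          # one bright square => strong edges
bag = toy_keypoints(img)
print(len(bag))                  # several keypoints along the square's edges
```

Because each descriptor depends only on a small patch, nuisance changes elsewhere in the image (background clutter, partial occlusion) leave most of the bag intact.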
Loss of Geometrical Info
• Bag-of-features representations hide
potentially useful image geometry
[Figure: image space vs. keypoint space]
• Goal: make salient
image geometric info
more explicit for
exploitation
Key idea
• The keypoint space can be endowed with a rich
low-dimensional structure in many situations
• Mechanism: define kernels k_x(x_i, x_j), k_f(f_i, f_j)
between keypoint locations and keypoint descriptors
Keypoint Kernel
• Joint keypoint kernel between two images I_1, I_2
is given by
K(I_1, I_2) = Σ_i Σ_j k_x(x_i, x_j) · k_f(f_i, f_j)
Many Possible Kernels
• Euclidean kernel
• Gaussian kernel
• Polynomial kernel
• Pyramid match kernel
• Many others
[Grauman et al. ’07]
Keypoint Kernel
• Using a Euclidean/Gaussian (E/G) combination of the
location and descriptor kernels yields the E/G keypoint kernel
From Kernel to Metric
Lemma: The E/G keypoint kernel is a Mercer kernel
– enables kernel algorithms such as the SVM
Lemma: The E/G keypoint kernel induces a metric
on the space of images
– an alternative to the conventional L2 distance between images
– the keypoint metric is robust to nuisance imaging parameters,
occlusion, clutter, etc.
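A keypoint kernel and its induced metric can be sketched directly from the definitions. The choices below are my assumptions, not spelled out in the slides: a Gaussian kernel on keypoint locations, a linear (inner-product) kernel on descriptors as the "Euclidean" part, and the standard kernel-distance identity d² = K(I1,I1) − 2·K(I1,I2) + K(I2,I2), which is valid for any Mercer kernel.

```python
import numpy as np

def joint_kernel(im1, im2, sigma=10.0):
    """Keypoint kernel between two images, each given as a bag of
    (location, descriptor) pairs: sum over all keypoint pairs of
    (Gaussian kernel on locations) x (linear kernel on descriptors)."""
    total = 0.0
    for x1, f1 in im1:
        for x2, f2 in im2:
            k_loc = np.exp(-np.sum((np.asarray(x1) - np.asarray(x2)) ** 2)
                           / (2 * sigma ** 2))
            k_desc = float(np.dot(f1, f2))
            total += k_loc * k_desc
    return total

def keypoint_metric(im1, im2):
    """Distance induced by the Mercer kernel (the usual kernel trick)."""
    d2 = (joint_kernel(im1, im1) - 2 * joint_kernel(im1, im2)
          + joint_kernel(im2, im2))
    return np.sqrt(max(d2, 0.0))

# Two toy 'images', each a bag of (location, descriptor) keypoints.
rng = np.random.default_rng(0)
A = [((5, 5), rng.random(8)), ((20, 7), rng.random(8))]
B = [((6, 5), rng.random(8)), ((21, 8), rng.random(8))]

print(keypoint_metric(A, A))   # 0.0: the distance of an image to itself
print(keypoint_metric(A, B))   # > 0 for distinct images
```

Note that the metric never aligns the two images pixel-wise, which is why clutter and occlusion only perturb it through the affected keypoints.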
Keypoint Geometry
Theorem: Under the metric induced by the E/G keypoint kernel,
certain ensembles of articulating images form
smooth, isometric manifolds
• Keypoint representation compact, efficient, and …
• Robust to illumination variations, non-stationary backgrounds,
clutter, occlusions
• In contrast: the conventional approach to image fusion
via image articulation manifolds (IAMs) is fraught
with non-differentiability (due to sharp image edges)
– not smooth
– not isometric
Application: Manifold Learning
• 2D translation
[Learned embeddings: IAM vs. keypoint articulation manifold (KAM)]
Manifold Learning in the Wild
• Rice University’s Duncan Hall Lobby
– 158 images
– 360° panorama captured with a handheld camera
– Varying brightness, clutter
Manifold Learning in the Wild
• Duncan Hall Lobby
• Ground truth obtained using state-of-the-art
structure-from-motion software
[Embeddings: ground truth vs. IAM vs. KAM]
Manifold Learning in the Wild
• Rice University’s Brochstein Pavilion
– 400 outdoor images of a building
– occlusions, movement in foreground, varying background
[Embeddings: IAM vs. KAM]
Internet Scale Imagery
• Notre Dame Cathedral
– 738 images
– collected from Flickr
– large variations in illumination (night/day/saturation),
clutter (people, decorations), and camera parameters
(focal length, field of view, …)
– non-uniform sampling of the space
Organization
• k-nearest neighbors
• “geodesics”:
“zoom out”, “walk closer”, 3D rotation
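The organization step needs only neighborhood properties: build a k-nearest-neighbor graph under the image metric and read "geodesics" off as shortest paths through it. A sketch with SciPy, using points on a circle as stand-ins for an ensemble of images of a camera orbiting an object (in the talk the distances would come from the keypoint metric):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

# Toy ensemble: 40 'images' lying on a circle (a 1-D manifold).
n = 40
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
pts = np.c_[np.cos(theta), np.sin(theta)]
D = squareform(pdist(pts))            # pairwise distances (any metric works)

# k-nearest-neighbor graph: keep only each point's k closest distances.
k = 2
knn = np.full_like(D, np.inf)         # inf = no edge
for i in range(n):
    nbrs = np.argsort(D[i])[1:k + 1]
    knn[i, nbrs] = D[i, nbrs]
knn = np.minimum(knn, knn.T)          # symmetrize the graph

# "Geodesics" = shortest paths through the neighborhood graph only.
geo = shortest_path(knn, method="D")  # Dijkstra

print(geo[0, n // 2])   # ~pi: walks around the circle (manifold distance)
print(D[0, n // 2])     # 2.0: straight-line ambient distance (the chord)
```

No global optimization is involved: sorting each row of distances and running Dijkstra on the sparse graph is all the machinery the organization step requires.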
Summary
• Challenges for manifold learning in the wild are both
theoretical and practical
• Need for novel image representations
– Sparse features
→ robustness to outliers, nuisance articulations, etc.
→ learning in the wild: unsupervised imagery
• Promise lies in fast methods that exploit only
neighborhood properties
– No complex optimization required