Face Recognition: Component-based versus Global Approaches


Evaluation of Distance Metrics for Recognition Based on Non-Negative Matrix Factorization
David Guillamet, Jordi Vitrià
Pattern Recognition Letters 24:1599-1605, June 2003
John Galeotti
Advanced Perception
March 23, 2004
Actually, Two ICPR’02 Papers
 "Analyzing Non-Negative Matrix Factorization for Image Classification"
   David Guillamet, Bernt Schiele, Jordi Vitrià
 "Determining a Suitable Metric When Using Non-negative Matrix Factorization"
   David Guillamet, Jordi Vitrià
Non-Negative Matrix Factorization

 TLA: NMF
 Used for dimensionality reduction
 V (n×m) ≈ W (n×r) · H (r×m), with r < nm/(n+m)
 V has the non-negative training samples as its columns
 W contains the non-negative basis vectors
 H contains the non-negative coefficients that approximate each column of V using W
 Results are similar in concept to PCA, but with non-negative “basis vectors”
NMF Distinguishing Properties
 Requires positive data
 Computationally expensive
 Part-based decomposition, because only additive combinations of the original data are allowed
 Not an orthonormal basis
Different Decomposition Types
[Figure: PCA vs. NMF basis images of numeric digits, at 20 and at 50 dimensions]
Why not just use PCA?
 PCA is optimal for reconstruction
 PCA is not optimal for separation and recognition of classes
NMF Issues Addressed
 If/when is NMF better at dimensionality reduction than PCA for classification?
 Can combining PCA and NMF lead to better performance?
 What is the best distance metric to use with the non-orthonormal basis of NMF?
How NMF Works
 V (n×m) ≈ W (n×r) · H (r×m), with r < nm/(n+m)
 Begin with an n×m matrix of training data V
   Each column is a vectorized data point
 Randomly initialize W and H with positive values
 Iterate according to the update rules (Lee & Seung's multiplicative updates for the divergence objective on the next slide):

   W_ia ← W_ia · Σ_μ [V_iμ / (WH)_iμ] · H_aμ
   W_ia ← W_ia / Σ_j W_ja
   H_aμ ← H_aμ · Σ_i W_ia · [V_iμ / (WH)_iμ]
How NMF Works
 In general, NMF requires non-linear optimization of an objective function
 The update rules just given correspond to a popular objective function, and they are guaranteed to converge
 That objective function relates to the probability of generating the images in V from the bases W and encodings H:

   F = Σ_iμ [ V_iμ · log (WH)_iμ − (WH)_iμ ]   (to be maximized)
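A minimal NumPy sketch of this loop, assuming the Lee & Seung divergence formulation named above; the function names, iteration count, and epsilon smoothing are illustrative rather than from the papers:

import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) into W (n x r) and H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps          # random positive initialization
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T               # W update
        W /= W.sum(axis=0) + eps          # normalize each basis to sum to 1
        WH = W @ H + eps
        H *= W.T @ (V / WH)               # H update (W columns already sum to 1)
    return W, H

def objective(V, W, H, eps=1e-9):
    """F = sum over i,mu of [ V log(WH) - WH ]; larger is better."""
    WH = W @ H + eps
    return np.sum(V * np.log(WH) - WH)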
NMF vs. PCA Experiments

 Dataset: 10 classes of natural textures
   Clouds, grass, ice, trees, sand, sky, etc.
   932 color images total
   Each image tessellated into 10×10 patches
   1000 patches for training, 1000 for testing
   Each patch classified as a single texture
 Raw feature vectors: color histograms
   Each region histogrammed into 8 bins per color, 16 colors → 512-dimensional vectors (a hedged sketch follows below)
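The slides' binning is ambiguous (8 bins per color times 16 colors is not obviously 512), so here is a hedged sketch under one reading that does give 512 dimensions: a joint RGB histogram with 8 bins per channel (8^3 = 512). Patch handling is illustrative:

import numpy as np

def patch_histogram(patch, bins=8):
    """patch: (10, 10, 3) uint8 RGB array. Returns a 512-dim feature vector."""
    pixels = patch.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()      # normalize to sum to 1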
NMF vs. PCA Experiments
 Learn both NMF and PCA subspaces for each class of histogram
 For both NMF and PCA:
   Project queries onto the learned subspaces of each class
   Label each query by the subspace that best reconstructs the query (sketched below)
 This seems like a poor scheme for NMF
   (Other experiments allow better schemes)
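A sketch of that labeling scheme, assuming non-negative least squares (scipy.optimize.nnls) as the NMF projection; the papers may project differently, and the model containers are illustrative:

import numpy as np
from scipy.optimize import nnls

def pca_error(x, mean, U):
    """U: n x r orthonormal PCA basis (columns). Reconstruction error of x."""
    c = x - mean
    return np.linalg.norm(c - U @ (U.T @ c))

def nmf_error(x, W):
    """W: n x r non-negative NMF basis (columns)."""
    h, residual = nnls(W, x)              # min ||W h - x|| subject to h >= 0
    return residual

def classify(x, models):
    """models maps each class label to ('pca', mean, U) or ('nmf', W)."""
    errors = {}
    for label, model in models.items():
        if model[0] == 'pca':
            errors[label] = pca_error(x, model[1], model[2])
        else:
            errors[label] = nmf_error(x, model[1])
    return min(errors, key=errors.get)

The same sketch covers the NMF+PCA combination two slides later: store whichever model type worked best for each class in models.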
NMF vs. PCA Results
 NMF works best for dispersed classes
 PCA works best for compact classes
 Both seem useful… try combining them
 But why are fewer than half of the sky vectors best reconstructed by PCA, when for sky PCA's mean reconstruction error is less than 1/4 that of NMF? Mistakes?
NMF+PCA Experiments
 During training, we learned whether NMF or PCA worked best for each class
 Project a query onto a class using only the method that works best for that class
 Result: a 2.3% improvement in the recognition rate over NMF alone (PCA: 5.8%), but is this significant when the rates are around 60%?
Hierarchy Experiments

 At level k of the hierarchy, project the query onto each original class's NMF or PCA subspace
 But, to choose the direction to descend the hierarchy, we only care about the level-k super-class containing the matching class
 Furthermore, for each class the choice of PCA vs. NMF can be set independently at each level of the hierarchy
Hierarchy Results
 2% improvement in recognition rate
 I really suspect that this is insignificant, resulting only from the additional degrees of freedom
 They employ various additional neighborhood-based hacks to increase their accuracy further, but I don't see any relevance to NMF specifically
Need for a better metric
 Want to classify based on nearest neighbors rather than reprojection error
 Unfortunately, NMF generates a non-orthonormal basis, so the relative distance to a basis vector depends on the uniqueness of that basis vector
   Bases will share a lot of pixels (areas) in common
Earth Mover's Distance (EMD)
 Defined as the minimal amount of “work” that must be performed to transform one feature distribution into the other
 A special case of the “transportation problem” from linear optimization
   Let I = the set of suppliers, J = the set of consumers, c_ij = the cost to ship from i ∈ I to j ∈ J, and f_ij = the amount shipped from i to j
   Distance = the cost to make the datasets equal
Earth Mover's Distance (EMD)
 Based on finding a measure of correlation between bases to define the cost matrix
 The cost matrix weights the transition from one basis (b_i) to another (b_j):
   c_ij = distangle(b_i, b_j) = −(b_i · b_j) / (‖b_i‖ ‖b_j‖)
   (a sketch of this cost matrix follows below)
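A minimal sketch of that cost matrix, assuming the NMF bases are the columns of W; the function name is illustrative:

import numpy as np

def distangle_cost_matrix(W):
    """c_ij = -(b_i . b_j) / (||b_i|| ||b_j||) over all pairs of basis columns."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)   # unit-length columns
    return -(Wn.T @ Wn)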

EMD: Transportation Problem

fij = quant. shipped from ij

Consumers don’t ship

Don’t exceed demand

Don’t exceed supply

Demand must equal supply for EMD to be a metric
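A sketch of this LP with scipy.optimize.linprog, assuming equal total supply and demand (the metric case above), which turns the inequality constraints into equalities; the variable names are illustrative:

import numpy as np
from scipy.optimize import linprog

def emd(x, y, C):
    """EMD between supplies x (length nI) and demands y (length nJ), costs C (nI x nJ)."""
    nI, nJ = len(x), len(y)
    c = C.ravel()                           # cost of each flattened flow f_ij
    A_eq = np.zeros((nI + nJ, nI * nJ))
    for i in range(nI):                     # supply rows: sum_j f_ij = x_i
        A_eq[i, i * nJ:(i + 1) * nJ] = 1
    for j in range(nJ):                     # demand rows: sum_i f_ij = y_j
        A_eq[nI + j, j::nJ] = 1
    b_eq = np.concatenate([x, y])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun / np.sum(x)              # total cost normalized by total flow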
EMD vs. “Other” Experiments
 Digit recognition from the MNIST digit database
 60,000 training images + 10,000 for testing
 Classify by NN and 5-NN in the subspace (a pluggable-metric sketch follows below)
 Result: EMD works best in low-dimensional subspaces, but it does not work well in high-dimensional subspaces
 More specifically, EMD works well when the bases contain some intersecting pixels
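A sketch of the nearest-neighbor step with a pluggable distance, so Euclidean, distangle-based, or EMD comparisons all slot in; every name here is illustrative:

import numpy as np

def knn_classify(h_query, H_train, labels, dist, k=5):
    """Majority vote among the k nearest training encodings (columns of H_train)."""
    d = np.array([dist(h_query, h) for h in H_train.T])
    nearest = np.asarray(labels)[np.argsort(d)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

For example, knn_classify(h, H, labels, lambda a, b: np.linalg.norm(a - b), k=1) gives plain Euclidean NN.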
Occlusion Experiments

 Randomly occlude either 1 or 2 of the 4 quadrants of an image (25% or 50% occlusion)
 Best subspace & distance metric with occlusions:
                   Low dim.                High dim.
   25% occlusion   NMF+distangle           PCA sometimes better
   50% occlusion   NMF+distangle or EMD    NMF+distangle
Why does distangle do so well?
Demo
 NMF difficulties → EMD experiments instead
 Demonstrate using existing code within the desired framework of a cost matrix
 Their code: http://robotics.stanford.edu/~rubner/emd/default.htm
 My code: http://www.vialab.org/john/Pres9-code/
Conclusion
 NMF is a parts-based alternative to PCA
 NMF and PCA should be combined for minimum-reprojection-error classification
 For nearest-neighbor classification, NMF needs a better metric
 When the subspace dimensionality is chosen appropriately for good bases, NMF+EMD or NMF+distangle have the highest recognition rates