Face Recognition: Component-based versus Global Approaches
Evaluation of Distance Metrics for Recognition Based on Non-Negative Matrix Factorization
David Guillamet, Jordi Vitrià
Pattern Recognition Letters
24:1599-1605, June, 2003
John Galeotti
Advanced Perception
March 23, 2004
Actually, Two ICPR’02 Papers
Analyzing Non-Negative Matrix
Factorization for Image Classification
David Guillamet, Bernt Schiele, Jordi Vitrià
Determining a Suitable Metric When using
Non-negative Matrix Factorization
David Guillamet, Jordi Vitrià
Non-Negative Matrix Factorization
TLA: NMF
Used for dimensionality reduction
V (n×m) ≈ W (n×r) · H (r×m), with r < nm/(n+m)
V has non-negative training samples as its columns
W contains the non-negative basis vectors
H contains the non-negative coefficients to
approximate each column of V using W
Results similar in concept to PCA, but with
non-negative “basis vectors”
NMF Distinguishing Properties
Requires positive data
Computationally expensive
Part-based decomposition
Because only additive combinations of original data are allowed
Not an orthonormal basis
Different Decomposition Types
20 Dimensions of Numeric Digits
PCA
NMF
50 Dimensions of Numeric Digits
PCA
NMF
Why not just use PCA?
PCA is optimal for reconstruction
PCA is not optimal for separation and
recognition of classes
NMF Issues Addressed
If/when is NMF better at dimensionality
reduction than PCA for classification?
Can combining PCA and NMF lead to
better performance?
What is the best distance metric to use with the non-orthonormal basis of NMF?
How NMF Works
V (n×m) ≈ W (n×r) · H (r×m), with r < nm/(n+m)
Begin with an n×m matrix of training data V
Each column is a vectorized data point
Randomly initialize W and H with positive
values
Iterate according to the multiplicative update rules (for the divergence objective):
H_aμ ← H_aμ · Σ_i [ W_ia V_iμ / (WH)_iμ ] / Σ_k W_ka
W_ia ← W_ia · Σ_μ [ V_iμ / (WH)_iμ ] H_aμ / Σ_ν H_aν
How NMF Works
In general, NMF requires the non-linear
optimization of an objective function
The update rules just given correspond
to a popular objective function, and are
guaranteed to converge.
That objective function relates to the probability of generating the images in V from the bases W and encodings H:
F = Σ_{i,μ} [ V_iμ log (WH)_iμ − (WH)_iμ ]
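The update rules above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code; variable and function names are mine, and a small epsilon guards the divisions:

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) into W (n x r) @ H (r x m)
    using the multiplicative updates for the divergence objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps   # random positive initialization
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # H_au <- H_au * sum_i W_ia V_iu/(WH)_iu / sum_k W_ka
        H *= (W.T @ (V / WH)) / W.sum(axis=0)[:, None]
        WH = W @ H + eps
        # W_ia <- W_ia * sum_u V_iu/(WH)_iu H_au / sum_v H_av
        W *= ((V / WH) @ H.T) / H.sum(axis=1)[None, :]
    return W, H
```

Because the updates are multiplicative, entries initialized positive stay non-negative, and each iteration is guaranteed not to increase the divergence objective.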
NMF vs. PCA Experiments
Dataset: 10 classes of natural textures
Clouds, grass, ice, trees, sand, sky, etc.
932 color images total
Each image tessellated into 10x10 patches
1000 patches for training, 1000 for testing
Each patch classified as a single texture
Raw feature vectors: Color histograms
Each region histogrammed into 8 bins per color, 16 colors → 512-dimensional vectors
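As a sketch of this feature extraction: one plausible reading of the 512-dimensional figure is a joint RGB histogram with 8 bins per channel (8³ = 512). That reading is my assumption, not stated in the paper; the helper name is also mine:

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Joint RGB histogram of an image patch: `bins` bins per channel,
    flattened into a bins**3-dimensional feature vector (8**3 = 512).
    NOTE: the 8-bins-per-channel reading of the slide is an assumption."""
    pixels = patch.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()   # normalize to unit mass

# Example: a random 10x10 RGB patch -> 512-d non-negative vector
patch = np.random.default_rng(0).integers(0, 256, (10, 10, 3))
v = color_histogram(patch)
```

Histograms are non-negative by construction, which is exactly the kind of raw feature NMF requires.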
NMF vs. PCA Experiments
Learn both NMF and PCA subspaces
for each class of histogram
For both NMF and PCA:
Project queries onto the learned subspaces of each class
Label each query by the subspace that
best reconstructs the query
This seems like a poor scheme for NMF
(Other experiments allow better schemes)
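The classification scheme above can be sketched as follows. This is a simplified illustration with my own names: for a non-orthonormal NMF basis I use an unconstrained least-squares projection, whereas a true NMF encoding would also constrain the coefficients to be non-negative:

```python
import numpy as np

def recon_error(query, W):
    """Error of the least-squares projection of `query` onto span(W).
    (For NMF, W is non-orthonormal, so we solve W h ~= query;
    a faithful NMF encoding would additionally require h >= 0.)"""
    h, *_ = np.linalg.lstsq(W, query, rcond=None)
    return float(np.linalg.norm(query - W @ h))

def classify(query, bases):
    """Label the query by the class whose subspace reconstructs it best."""
    errors = [recon_error(query, W) for W in bases]
    return int(np.argmin(errors))
```

Usage: with one basis matrix W per class, `classify(q, [W_0, W_1, ...])` returns the index of the class with the smallest reconstruction error.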
NMF vs. PCA Results
NMF works best for dispersed classes
PCA works best for compact classes
Both seem useful…try combining them
But why are fewer than half of the sky vectors best reconstructed by PCA, when for sky PCA's mean reconstruction error is less than 1/4 that of NMF? Mistakes?
NMF+PCA Experiments
During training, we learned whether
NMF or PCA worked best for each class
Project a query to a class using only the
method that works best for that class
Result: 2.3% improvement in the
recognition rate over NMF alone (PCA:
5.8%), but is this significant at 60%?
Hierarchy Experiments
At level k of the hierarchy, project the query
onto each original class’ NMF or PCA
subspace
But, to choose the direction to descend the
hierarchy, we only care about the level k
super-class containing the matching class
Furthermore, for each class the choice of
PCA vs. NMF can be independently set at
each level of the hierarchy
Hierarchy Results
2% improvement in recognition rate
I really suspect that this is insignificant,
and resulting only from the additional
degrees of freedom
They employ various additional
neighborhood-based hacks to increase
their accuracy further, but I don’t see
any relevance to NMF specifically
Need for a better metric
Want to classify based on nearest
neighbor, rather than reprojection error
Unfortunately, NMF generates a non-orthonormal basis, and so the relative distance to a basis vector depends on the uniqueness of that basis vector
Basis areas will share many pixels in common
Earth Mover's Distance (EMD)
Defined as the minimal amount of
“work” that must be performed to
transform one feature distribution into
the other
A special case of the “transportation
problem” from linear optimization
Let I = set of suppliers, J = set of consumers, c_ij = cost to ship from i to j, f_ij = amount shipped from i to j
Distance = cost to make datasets equal
Earth Mover's Distance (EMD)
Based on finding a measure of
correlation between bases to define its
cost matrix
The cost matrix weights the transition of one basis (b_i) to another (b_j):
c_ij = distangle(b_i, b_j) = −( b_i · b_j )/( ‖b_i‖ ‖b_j‖ )
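Computing this cost matrix is a one-liner once the bases are unit-normalized. A minimal sketch (function name mine), with the bases as columns of W:

```python
import numpy as np

def distangle_cost(W):
    """Cost matrix c_ij = -(b_i . b_j)/(|b_i| |b_j|), i.e. the negated
    cosine similarity between bases, where bases are the columns of W."""
    B = W / np.linalg.norm(W, axis=0)   # unit-normalize each basis vector
    return -(B.T @ B)                   # c_ij = -cos(angle(b_i, b_j))
```

Identical bases get cost −1 (cheap to move mass between), orthogonal bases get cost 0 (relatively expensive), which is how the metric accounts for bases that share pixels.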
EMD: Transportation Problem
f_ij = quantity shipped from i to j
Consumers don't ship: f_ij ≥ 0
Don't exceed demand: Σ_i f_ij ≤ y_j
Don't exceed supply: Σ_j f_ij ≤ x_i
Total demand must equal total supply for EMD to be a metric
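The transportation problem above can be solved directly with a generic LP solver. A minimal sketch using `scipy.optimize.linprog` (function name mine), restricted to the metric case where total supply equals total demand, so the inequality constraints tighten to equalities:

```python
import numpy as np
from scipy.optimize import linprog

def emd(supply, demand, cost):
    """Earth Mover's Distance via the transportation LP.
    supply, demand: non-negative weight vectors with equal total mass.
    cost[i, j]: cost to ship one unit from supplier i to consumer j."""
    n, m = len(supply), len(demand)
    c = np.asarray(cost, dtype=float).reshape(-1)  # objective: sum c_ij f_ij
    A_eq, b_eq = [], []
    for i in range(n):                  # row sums: supplier i ships supply[i]
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(supply[i])
    for j in range(m):                  # column sums: consumer j gets demand[j]
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_eq.append(col); b_eq.append(demand[j])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun / np.sum(supply)     # total work, normalized by total flow
```

For example, moving one unit of mass across a ground distance of 2 costs exactly 2. (Rubner's C implementation linked later in these slides solves the same LP, with the unequal-mass generalization.)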
EMD vs. “Other” Experiments
Digit recognition from MNIST digit database
60,000 training images + 10,000 for test
Classify by NN and 5NN in the subspace
Result: EMD works best in low-dimensional
subspaces, but in high-dimensional subspaces
EMD does not work well
More specifically, EMD works well when the bases contain some intersecting pixels
Occlusion Experiments
Randomly occlude either 1 or 2 of the 4
quadrants of an image (25% and 50%
occlusion)
Best subspace & distance with occlusions
                Low dim.               High dim.
25% Occlusion   NMF+distangle          PCA sometimes better
50% Occlusion   NMF+distangle OR EMD   NMF+distangle
Why does distangle do so well?
Demo
NMF difficulties
EMD experiments instead
Demonstrate using existing code within the desired framework of a cost matrix
Their code: http://robotics.stanford.edu/~rubner/emd/default.htm
My code:
http://www.vialab.org/john/Pres9-code/
Conclusion
NMF is a parts-based alternative to PCA
NMF and PCA should be combined for
minimum-reprojection-error classification
For nearest-neighbor classification, NMF
needs a better metric
When the subspace dimensionality is chosen appropriately for good bases, NMF+EMD or NMF+distangle have the highest recognition rates