Transcript Slide 1

Eigen Value Analysis in Pattern
Recognition
By
Dr. M. Asmat Ullah Khan
COMSATS Institute of Information Technology,
Abbottabad
MULTI SPECTRAL IMAGE COMPRESSION
Principal Component Analysis
(PCA)

Pattern recognition in high-dimensional spaces
– Problems arise when performing recognition in a high-dimensional space (curse of dimensionality).
– Significant improvements can be achieved by first mapping
the data into a lower-dimensional sub-space.
– The goal of PCA is to reduce the dimensionality of the data
while retaining as much as possible of the variation present in
the original dataset.
Principal Component Analysis
(PCA)

Dimensionality reduction
– PCA allows us to compute a linear transformation that maps
data from a high dimensional space to a lower dimensional
sub-space.
Principal Component Analysis
(PCA)

Lower dimensionality basis
– Approximate vectors by finding a basis in an appropriate
lower dimensional space.
(1) Higher-dimensional space representation: x = a1 v1 + a2 v2 + ... + aN vN, where v1, v2, ..., vN is a basis of the original N-dimensional space.
(2) Lower-dimensional space representation: x̂ = b1 u1 + b2 u2 + ... + bK uK, where u1, u2, ..., uK is a basis of a K-dimensional sub-space (K < N).
Principal Component Analysis
(PCA)

Example
Principal Component Analysis
(PCA)

Information loss
– Dimensionality reduction implies information loss!
– We want to preserve as much information as possible, that is, minimize the reconstruction error ||x − x̂||.
How to determine the best lower-dimensional sub-space?
Principal Component Analysis
(PCA)

Methodology
– Suppose x1, x2, ..., xM are N x 1 vectors.
– Step 1: compute the sample mean x̄ = (1/M) (x1 + x2 + ... + xM).
– Step 2: subtract the mean from every vector: Φi = xi − x̄.
– Step 3: form the N x M matrix A = [Φ1 Φ2 ... ΦM] and compute the sample covariance matrix C = (1/M) A Aᵀ.
Principal Component Analysis
(PCA)

Methodology – cont.
– Step 4: compute the eigenvalues λ1 ≥ λ2 ≥ ... ≥ λN and the corresponding eigenvectors u1, u2, ..., uN of C.
– Step 5 (dimensionality reduction): keep only the K eigenvectors corresponding to the K largest eigenvalues.
– Step 6: each mean-subtracted vector is then approximated as x − x̄ ≈ b1 u1 + b2 u2 + ... + bK uK, where bi = ui · (x − x̄).
Principal Component Analysis
(PCA)

Linear transformation implied by PCA
– The linear transformation R^N → R^K that performs the
dimensionality reduction maps x to its coefficient vector (b1, b2, ..., bK), where bi = ui · (x − x̄); in matrix form, y = Uᵀ(x − x̄) with U = [u1 u2 ... uK] (see the sketch below).
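A minimal NumPy sketch of this transformation (not from the slides; the function name pca_transform and the toy data are illustrative):

```python
import numpy as np

def pca_transform(X, K):
    """Project M samples of dimension N (rows of X) onto the top-K principal components."""
    mean = X.mean(axis=0)                  # sample mean
    Xc = X - mean                          # mean-subtracted data
    C = np.cov(Xc, rowvar=False)           # N x N sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
    U = eigvecs[:, order[:K]]              # N x K basis of the sub-space
    Y = Xc @ U                             # M x K low-dimensional coordinates
    return Y, U, mean

# Example: 100 random 50-dimensional vectors reduced to K = 5 dimensions
X = np.random.rand(100, 50)
Y, U, mean = pca_transform(X, K=5)
```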
Principal Component Analysis
(PCA)

Geometric interpretation
– PCA projects the data along the directions where the data
varies the most.
– These directions are determined by the eigenvectors of the
covariance matrix corresponding to the largest eigenvalues.
– The magnitude of the eigenvalues corresponds to the variance
of the data along the eigenvector directions.
Principal Component Analysis
(PCA)

How to choose the principal components?
– To choose K, use the following criterion: choose the smallest K such that (λ1 + λ2 + ... + λK) / (λ1 + λ2 + ... + λN) > T for a chosen threshold T (e.g., 0.9 or 0.95), i.e., the retained components account for at least T of the total variance (a small sketch follows).
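A small sketch of this criterion, assuming the eigenvalues are already sorted in decreasing order (the helper name choose_k and the threshold value are illustrative):

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """Smallest K whose leading eigenvalues account for at least
    `threshold` of the total variance (eigvals sorted in decreasing order)."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, threshold) + 1)

# Example: eigvals = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2] gives K = 4 for threshold 0.9,
# since the first four eigenvalues account for about 93.8% of the total variance.
```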
Principal Component Analysis
(PCA)

What is the error due to dimensionality reduction?
– We saw above that an original vector x can be reconstructed using its principal components: x̂ = x̄ + b1 u1 + b2 u2 + ... + bK uK, with bi = ui · (x − x̄).
– It can be shown that the low-dimensional basis based on principal components minimizes the reconstruction error e = ||x − x̂||.
– It can be shown that the (mean squared) error is equal to the sum of the eigenvalues corresponding to the discarded components: λK+1 + λK+2 + ... + λN.
Principal Component Analysis
(PCA)

Standardization
– The principal components are dependent on the units used to
measure the original variables as well as on the range of
values they assume.
– We should always standardize the data prior to using PCA.
– A common standardization method is to transform all the data to have zero mean and unit standard deviation: x′j = (xj − μj) / σj for each variable xj, where μj and σj are the sample mean and standard deviation of that variable (a small sketch follows).
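A one-line NumPy sketch of this standardization, applied column-wise to a data matrix X (illustrative, assuming no column has zero variance):

```python
import numpy as np

def standardize(X):
    """Give every variable (column of X) zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```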
Principal Component Analysis
(PCA)

PCA and classification
– PCA is not always an optimal dimensionality-reduction
procedure for classification purposes:
Principal Component Analysis (PCA)

Case Study: Eigenfaces for Face Detection/Recognition
– M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal
of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

Face Recognition
– The simplest approach is to
think of it as a template
matching problem
– Problems arise when performing
recognition in a high-dimensional
space.
– Significant improvements can be
achieved by first mapping the data
into a lower dimensionality space.
– How to find this lower-dimensional space?
Principal Component Analysis
(PCA)

Main idea behind eigenfaces
Principal Component Analysis (PCA)

Computation of the eigenfaces
Principal Component Analysis
(PCA)

Computation of the eigenfaces – cont.
Principal Component Analysis (PCA)

Computation of the eigenfaces – cont.
Principal Component Analysis
(PCA)

Computation of the eigenfaces – cont.
Principal Component Analysis (PCA)

Representing faces onto this basis
Principal Component Analysis
(PCA)

Representing faces onto this basis – cont.
Principal Component Analysis (PCA)

Face Recognition Using Eigenfaces
Principal Component Analysis
(PCA)

Face Recognition Using Eigenfaces – cont.
– The distance e_r is called the distance within the face space (difs).
– Comment: we can use the common Euclidean distance to compute e_r; however, it has been reported that the Mahalanobis distance performs better, i.e., each squared coefficient difference is weighted by 1/λi: d(Ωa, Ωb) = (1/λ1)(a1 − b1)² + ... + (1/λK)(aK − bK)² (see the sketch below).
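A small sketch of the two distances between coefficient vectors in face space (the function names and the assumption that eigvals holds the K retained eigenvalues are illustrative):

```python
import numpy as np

def euclidean_difs(omega_a, omega_b):
    """Euclidean distance between two coefficient vectors in face space."""
    return np.linalg.norm(omega_a - omega_b)

def mahalanobis_difs(omega_a, omega_b, eigvals):
    """Mahalanobis distance in face space: each squared coefficient
    difference is weighted by the inverse of the corresponding eigenvalue."""
    diff = omega_a - omega_b
    return np.sqrt(np.sum(diff ** 2 / eigvals))
```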
Principal Component Analysis
(PCA)

Face Detection Using Eigenfaces
Principal Component Analysis
(PCA)

Face Detection Using Eigenfaces – cont.
Principal Component Analysis (PCA)

Problems
– Background (de-emphasize the outside of the face, e.g., by multiplying the input image by a 2D Gaussian window centered on the face)
– Lighting conditions (performance degrades with light changes)
– Scale (performance decreases quickly with changes to head size)
• multi-scale eigenspaces
• scale the input image to multiple sizes
– Orientation (performance decreases but not as fast as with scale changes)
• plane rotations can be handled
• out-of-plane rotations are more difficult to handle
Linear Discriminant Analysis (LDA)


Multiple classes and PCA
– Suppose there are C classes in the training data.
– PCA is based on the sample covariance which characterizes the
scatter of the entire data set, irrespective of class-membership.
– The projection axes chosen by PCA might not provide good
discrimination power.
What is the goal of LDA?
– Perform dimensionality reduction while preserving as much of the
class discriminatory information as possible.
– Seeks to find directions along which the classes are best separated.
– Takes into consideration the scatter within-classes but also the
scatter between-classes.
– More capable of distinguishing image variation due to identity
from variation due to other sources such as illumination and
expression.
Linear Discriminant Analysis (LDA)

Methodology
– Let μi denote the mean of class i, Mi the number of samples in class i, and μ the overall mean of the data.
– Within-class scatter matrix: Sw = sum over classes i of the sum over samples x in class i of (x − μi)(x − μi)ᵀ
– Between-class scatter matrix: Sb = sum over classes i of Mi (μi − μ)(μi − μ)ᵀ
Linear Discriminant Analysis
(LDA)

Methodology – cont.
– LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter, e.g., by maximizing the ratio |Uᵀ Sb U| / |Uᵀ Sw U|.
– Such a transformation should retain class separability while
reducing the variation due to sources other than identity (e.g.,
illumination).
Linear Discriminant Analysis
(LDA)

Linear transformation implied by LDA
– The linear transformation is given by a matrix U whose columns are the eigenvectors of Sw⁻¹ Sb (called Fisherfaces).
– The eigenvectors are solutions of the generalized eigenvector problem: Sb uk = λk Sw uk
Linear Discriminant Analysis
(LDA)

Does Sw⁻¹ always exist?
– If Sw is non-singular, we can obtain a conventional eigenvalue problem by writing: Sw⁻¹ Sb uk = λk uk
– In practice, Sw is often singular, since the data are image vectors of large dimensionality N while the size of the data set M is much smaller (M << N)
Linear Discriminant Analysis
(LDA)

Does Sw⁻¹ always exist? – cont.
– To alleviate this problem, we can perform two projections (see the sketch below):
1) PCA is first applied to the data set to reduce its dimensionality.
2) LDA is then applied to further reduce the dimensionality.
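A NumPy/SciPy sketch of this two-step projection (illustrative only; n_pca must be small enough that Sw becomes non-singular, and n_lda is at most C-1):

```python
import numpy as np
from scipy.linalg import eigh

def pca_then_lda(X, labels, n_pca, n_lda):
    """Two-step projection: PCA first (so Sw becomes non-singular),
    then LDA on the PCA coefficients.  X is M x N, one image per row."""
    labels = np.asarray(labels)
    # --- Step 1: PCA ---
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[:n_pca].T                    # N x n_pca basis
    Z = Xc @ W_pca                          # M x n_pca PCA coefficients

    # --- Step 2: LDA on the PCA coefficients ---
    d = Z.shape[1]
    Sw = np.zeros((d, d))                   # within-class scatter
    Sb = np.zeros((d, d))                   # between-class scatter
    overall_mean = Z.mean(axis=0)
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mu_c = Zc.mean(axis=0)
        Sw += (Zc - mu_c).T @ (Zc - mu_c)
        diff = (mu_c - overall_mean).reshape(-1, 1)
        Sb += Zc.shape[0] * (diff @ diff.T)
    # Generalized eigenproblem  Sb u = lambda Sw u  (Sw assumed positive definite)
    eigvals, eigvecs = eigh(Sb, Sw)
    order = np.argsort(eigvals)[::-1]
    W_lda = eigvecs[:, order[:n_lda]]       # n_pca x n_lda
    return W_pca @ W_lda, mean              # overall N x n_lda transformation
```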
Linear Discriminant Analysis
(LDA)

Case Study: Using Discriminant Eigenfeatures for
Image Retrieval
– D. Swets, J. Weng, "Using Discriminant Eigenfeatures for
Image Retrieval", IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.

Content-based image retrieval
– The application being studied here is query-by-example image
retrieval.
– The paper deals with the problem of selecting a good set of
image features for content-based image retrieval.
Linear Discriminant Analysis
(LDA)

Assumptions
– "Well-framed" images are required as input for training and
query-by-example test probes.
– Only a small variation in the size, position, and orientation of
the objects in the images is allowed.
Linear Discriminant Analysis (LDA)

Some terminology
– Most Expressive Features (MEF): the features (projections)
obtained using PCA.
– Most Discriminating Features (MDF): the features
(projections) obtained using LDA.

Computational considerations
– When computing the eigenvalues/eigenvectors of Sw⁻¹ Sb uk = λk uk numerically, the computations can be unstable since Sw⁻¹ Sb is not always symmetric.
– See paper for a way to find the eigenvalues/eigenvectors in a stable way.
– Important: the dimensionality of LDA is bounded by C-1; this is the rank of Sw⁻¹ Sb
Linear Discriminant Analysis (LDA)

Case Study: PCA versus LDA
– A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 23, no. 2,
pp. 228-233, 2001.

Is LDA always better than PCA?
– There has been a tendency in the computer vision community
to prefer LDA over PCA.
– This is mainly because LDA deals directly with discrimination
between classes while PCA does not pay attention to the
underlying class structure.
– This paper shows that when the training set is small, PCA can
outperform LDA.
– When the number of samples is large and representative for
each class, LDA outperforms PCA.
Linear Discriminant Analysis (LDA)

Is LDA always better than PCA? – cont.
Linear Discriminant Analysis (LDA)

Is LDA always better than PCA? – cont.
Linear Discriminant Analysis
(LDA)

Is LDA always better than PCA? – cont.
Linear Discriminant Analysis
(LDA)

Critique of LDA
– Only linearly separable classes will remain separable after
applying LDA.
– It does not seem to be superior to PCA when the training data
set is small.
Appearance-based Recognition
• Directly represent appearance (image brightness), not geometry.
• Why?
Avoids modeling geometry, complex interactions
between geometry, lighting and reflectance.
• Why not?
Too many possible appearances!
m “visual degrees of freedom” (e.g., pose, lighting, etc.)
R discrete samples for each DOF
How to discretely sample the DOFs?
How to PREDICT/SYNTHESIZE/MATCH with novel views?
Appearance-based Recognition
• Example:
• Visual DOFs: Object type P, Lighting Direction L, Pose R
• Set of R * P * L possible images: { x̂_RL^P }
• x̂ is an image of N pixels, i.e., a point in N-dimensional space
• Image as a point in high-dimensional space:
[Figure: an image x̂ shown as a single point in a space whose axes are “Pixel 1 gray value” and “Pixel 2 gray value”]
The Space of Faces
[Figure: a face image expressed as a combination (sum) of other face images]
An image is a point in a high dimensional space
– An N x M image is a point in R^(NM)
– We can define vectors in this space as we did in the 2D case
[Thanks to Chuck Dyer, Steve Seitz, Nishino]
Key Idea
• Images in the possible set { x̂_RL^P } are highly correlated.
• So, compress them to a low-dimensional subspace that
captures key appearance characteristics of the visual DOFs.
• EIGENFACES: [Turk and Pentland]
USE PCA!
Eigenfaces
Eigenfaces look somewhat like generic faces.
Linear Subspaces
convert x into v1, v2 coordinates
What does the v2 coordinate measure?
- distance to line
- use it for classification—near 0 for orange pts
What does the v1 coordinate measure?
- position along line
- use it to specify which orange point it is

Classification can be expensive
– Must either search (e.g., nearest neighbors) or store large
probability density functions.

Suppose the data points are arranged as above
– Idea—fit a line, classifier measures distance to line
Dimensionality Reduction

Dimensionality reduction
– We can represent the orange points with only their v1 coordinates
 since v2 coordinates are all essentially 0
– This makes it much cheaper to store and compare points
– A bigger deal for higher dimensional problems
Linear Subspaces
Consider the variation along direction v
among all of the orange points: var(v) = sum over points i of ((xi − x̄) · v)² = vᵀ A v, where A is the scatter (covariance) matrix of the points.
What unit vector v minimizes var(v)?
What unit vector v maximizes var(v)?
Solution: v1 is the eigenvector of A with the largest eigenvalue (see the check below);
v2 is the eigenvector of A with the smallest eigenvalue.
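A short numerical check of this claim on synthetic 2-D data (illustrative; np.var with ddof=1 matches the normalization used by np.cov):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])  # correlated points
Xc = X - X.mean(axis=0)

A = np.cov(Xc, rowvar=False)            # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(A)    # eigenvalues in ascending order
v1 = eigvecs[:, -1]                     # largest eigenvalue: max-variance direction
v2 = eigvecs[:, 0]                      # smallest eigenvalue: min-variance direction

def var_along(v):
    """Variance of the data projected onto the unit vector v."""
    return np.var(Xc @ v, ddof=1)

print(var_along(v1), eigvals[-1])       # equal (up to floating point)
print(var_along(v2), eigvals[0])        # equal (up to floating point)
```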
Higher Dimensions

Suppose each data point is N-dimensional
– Same procedure applies:
– The eigenvectors of A define a new coordinate system
• eigenvector with largest eigenvalue captures the most variation among training vectors x
• eigenvector with smallest eigenvalue has least variation
– We can compress the data by only using the top few eigenvectors
• this corresponds to choosing a “linear subspace”: represent points on a line, plane, or “hyper-plane”
• these eigenvectors are known as the principal components
Problem: Size of Covariance Matrix A

Suppose each data point is N-dimensional (N pixels)
– The size of the covariance matrix A is N x N
– The number of eigenfaces is N
– Example: for a 256 x 256 image, N = 256² = 65,536, so
the size of A will be 65,536 x 65,536 and
the number of eigenvectors will be 65,536!
Typically, only 20-30 eigenvectors suffice. So, this
method is very inefficient!
Efficient Computation of Eigenvectors
If B is M x N and M << N, then A = BᵀB is N x N, which is much larger than the M x M matrix BBᵀ
– M = number of images, N = number of pixels
– use BBᵀ instead; an eigenvector of BBᵀ is easily converted to one of BᵀB (see the sketch below):
(BBᵀ) y = e y
=> Bᵀ(BBᵀ) y = e (Bᵀy)
=> (BᵀB)(Bᵀy) = e (Bᵀy)
=> Bᵀy is an eigenvector of BᵀB
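A NumPy sketch of this trick (illustrative; B is assumed to hold one mean-subtracted image per row):

```python
import numpy as np

def top_eigenvectors_via_small_matrix(B, K):
    """B is M x N with M << N.  Compute the top-K eigenvectors of B^T B
    (N x N) by diagonalizing the much smaller M x M matrix B B^T."""
    small = B @ B.T                        # M x M instead of N x N
    eigvals, Y = np.linalg.eigh(small)     # (B B^T) y = e y
    order = np.argsort(eigvals)[::-1][:K]
    Y = Y[:, order]
    V = B.T @ Y                            # B^T y is an eigenvector of B^T B
    V /= np.linalg.norm(V, axis=0)         # normalize each eigenvector (eigenface)
    return V, eigvals[order]
```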
Eigenfaces – summary in words

Eigenfaces are
the eigenvectors of
the covariance matrix of
the probability distribution of
the vector space of
human faces

Eigenfaces are the ‘standardized face ingredients’ derived from
the statistical analysis of many pictures of human faces

A human face may be considered to be a combination of these
standardized faces
Generating Eigenfaces – in words
1. A large set of images of human faces is taken.
2. The images are normalized to line up the eyes, mouths and other features.
3. The eigenvectors of the covariance matrix of the face image vectors are then extracted.
4. These eigenvectors are called eigenfaces.
Eigenfaces for Face Recognition

When properly weighted, eigenfaces can be summed
together to create an approximate gray-scale
rendering of a human face.

Remarkably few eigenvector terms are needed to
give a fair likeness of most people's faces.

Hence eigenfaces provide a means of applying data
compression to faces for identification purposes.
Dimensionality Reduction
The set of faces is a “subspace” of the set
of images
– Suppose it is K dimensional
– We can find the best subspace using PCA
– This is like fitting a “hyper-plane” to the
set of faces

Any face:
spanned by vectors v1, v2, ..., vK
Eigenfaces

PCA extracts the eigenvectors of A
– Gives a set of vectors v1, v2, v3, ...
– Each one of these vectors is a direction in face space

what do these look like?
Projecting onto the Eigenfaces

The eigenfaces v1, ..., vK span the space of faces
– A face x is converted to eigenface coordinates by ai = vi · (x − x̄), for i = 1, ..., K (see the sketch below)
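A minimal sketch of projection and reconstruction (illustrative; V is the N x K matrix of eigenfaces and mean is the average face):

```python
import numpy as np

def project_to_face_space(x, mean, V):
    """Eigenface coordinates a_i = v_i . (x - mean) of a face image x."""
    return V.T @ (x - mean)

def reconstruct_from_face_space(a, mean, V):
    """Approximate image rebuilt from its K eigenface coefficients."""
    return mean + V @ a
```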
Is this a face or not?
Recognition with Eigenfaces

Algorithm
1. Process the image database (set of images with labels)
• Run PCA—compute eigenfaces
• Calculate the K coefficients for each image
2. Given a new image (to be recognized) x, calculate K coefficients
3. Detect if x is a face
4. If it is a face, who is it?
• Find closest labeled face in database
• nearest-neighbor in K-dimensional space (a sketch of steps 2-4 follows below)
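A sketch of steps 2-4 (illustrative; the two thresholds and the gallery arrays are assumed inputs, not values from the slides):

```python
import numpy as np

def recognize(x, mean, V, gallery_coeffs, gallery_labels, face_thresh, id_thresh):
    """Nearest-neighbour recognition in the K-dimensional eigenface space.
    gallery_coeffs is an L x K array of coefficients of the labeled images."""
    a = V.T @ (x - mean)                          # step 2: K coefficients
    dffs = np.linalg.norm((x - mean) - V @ a)     # step 3: distance from face space
    if dffs > face_thresh:
        return "not a face"
    dists = np.linalg.norm(gallery_coeffs - a, axis=1)
    nearest = int(np.argmin(dists))               # step 4: closest labeled face
    if dists[nearest] > id_thresh:
        return "unknown face"
    return gallery_labels[nearest]
```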
Key Property of Eigenspace Representation
Given
• two images x̂1, x̂2 that are used to construct the Eigenspace
• ĝ1 is the eigenspace projection of image x̂1
• ĝ2 is the eigenspace projection of image x̂2
Then,
|| ĝ2 − ĝ1 || ≈ || x̂2 − x̂1 ||
That is, distance in Eigenspace is approximately equal to the
correlation between two images.
Choosing the Dimension K
[Plot: eigenvalues λi vs. index i, with the cutoff K marked on an axis running up to NM]

How many eigenfaces to use?

Look at the decay of the eigenvalues
– the eigenvalue tells you the amount of variance “in the
direction” of that eigenface
– ignore eigenfaces with low variance
Papers
More Problems: Outliers
Sample Outliers
Intra-sample outliers
Need to explicitly reject outliers before or during computing PCA.
[De la Torre and Black]
Robustness to Intra-sample outliers
PCA
RPCA
RPCA: Robust PCA, [De la Torre and Black]
Robustness to Sample Outliers
Original
PCA
RPCA
Outliers
Finding outliers = Tracking moving objects
Research Questions

Does PCA encode information related to gender, ethnicity, age, and identity
efficiently?

What information does PCA encode?

Are there components (features) of PCA that encode multiple properties?
PCA


The aim of PCA is a linear reduction of D-dimensional data to d-dimensional data (d < D), while preserving as much information in the data as possible.
Linear functions:
y1 = w1 X
y2 = w2 X
...
yd = wd X
i.e., Y = W X
X – inputs; Y – outputs (components); W – eigenvectors (eigenfaces, basis vectors)
[Figure: principal directions w1 and w2 overlaid on data in the (x1, x2) plane]
How many components?
• The usual choice is to consider the first d PCs which account for some percentage, usually above 90%, of the cumulative variance of the data.
• This is disadvantageous if the last components are interesting.
[Figure: principal directions W1 and W2 overlaid on data in the (x1, x2) plane]
Dataset
• A subset of the FERET dataset
• 2670 grey-scale frontal face images
• Rich in variety: face images vary in pose, background lighting, presence or absence of glasses, and slight changes in expression

Property    No. Categories   Categories                            No. Faces
Gender      2                Male                                  1603
                             Female                                1067
Ethnicity   3                Caucasian                             1758
                             African                               320
                             East Asian                            363
Age         5                20-29                                 665
                             30-39                                 1264
                             40-49                                 429
                             50-59                                 206
                             60+                                   106
Identity    358              Individuals with 3 or more examples   1161
Dataset




• Each image is pre-processed to a 65 x 75 resolution
• Aligned based on eye locations
• Cropped such that little or no hair information is available
• Histogram equalisation is applied to reduce lighting effects (a pre-processing sketch follows below)
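A rough OpenCV sketch of such a pre-processing step, assuming the eye coordinates are already known (the crop window and function name are illustrative, not the procedure used in the study):

```python
import cv2
import numpy as np

def preprocess_face(img, left_eye, right_eye, out_size=(65, 75)):
    """img: 8-bit grey-scale image; left_eye/right_eye: (x, y) pixel coordinates.
    Rotate so the eyes are horizontal, crop around the face, resize to
    65 x 75 and apply histogram equalisation."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    centre = ((left_eye[0] + right_eye[0]) / 2.0, (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(centre, angle, 1.0)
    aligned = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    # Heuristic crop around the eye midpoint so that little hair is included
    eye_dist = np.hypot(dx, dy)
    x0, x1 = int(centre[0] - eye_dist), int(centre[0] + eye_dist)
    y0, y1 = int(centre[1] - 0.8 * eye_dist), int(centre[1] + 1.8 * eye_dist)
    face = aligned[max(y0, 0):y1, max(x0, 0):x1]
    face = cv2.resize(face, out_size)        # out_size is (width, height)
    return cv2.equalizeHist(face)            # reduce lighting effects
```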
Does PCA efficiently represent information in face images?







• Images of 65 × 75 resolution lead to a dimensionality of 4875.
• The first 350 components accounted for 90% of the variance of the data.
• Each face is thus represented using 350 components instead of 4875 dimensions.
• Classification employs 5-fold cross validation, with 80% of faces in each category for training and 20% of faces in each category for testing.
• For identity recognition, the leave-one-out method is used.
• LDA is performed on the PCA data.
• The Euclidean measure is used for classification (a pipeline sketch follows the table below).
Property     Classification accuracy
Gender       86.4%
Ethnicity    81.6%
Age          91.5%
Identity     90%
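A scikit-learn sketch of this kind of pipeline (illustrative; the study's exact protocol, e.g. the 80/20 per-category split and leave-one-out for identity, is not reproduced here):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

def evaluate_property(X, y):
    """X: (n_images, 4875) flattened 65 x 75 faces; y: labels for one property.
    PCA to 350 components, LDA on the PCA coefficients, then 1-nearest-neighbour
    with Euclidean distance, scored by stratified 5-fold cross-validation."""
    pipeline = make_pipeline(
        PCA(n_components=350),
        LinearDiscriminantAnalysis(),
        KNeighborsClassifier(n_neighbors=1, metric="euclidean"),
    )
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(pipeline, X, y, cv=folds).mean()
```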
What information does PCA encode? – Gender


• Gender encoding power is estimated using LDA
• The 3rd component carries the highest gender encoding power, followed by the 4th component
• All important components are among the first 50 components
[Plot: gender encoding power vs. component index, for the first 50 components]
What information does PCA encode? – Gender
[Figure: reconstructed images from the altered (a) third and (b) fourth components. The components are progressively added by quantities of -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D., with the mean in the centre.]
• The third component encodes information related to the complexion, length of the nose, presence or absence of hair on the forehead, and texture around the mouth region.
• The fourth component encodes information related to eyebrow thickness and the presence or absence of a smiling expression.
Gender
[Figure: (a) Face examples, with the first two being female and the next two being male faces. (b) Reconstructed faces of (a) using the top 20 gender-important components. (c) Reconstructed faces of (a) using all components except the top 20 gender-important components.]
What information does PCA encode? – Ethnicity
[Plot: ethnicity encoding power vs. component index, for the first 50 components]
• The 6th component carries the highest ethnicity encoding power, followed by the 15th component
• All ethnicity-important components are among the first 50 components
Ethnicity
[Figure: reconstructed images from the altered (a) 6th and (b) 15th components. The components are progressively added by quantities of -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D., with the mean in the centre.]
• The 6th component encodes information related to complexion and the broadness and length of the nose
• The 15th component encodes information related to the length of the nose, complexion, and the presence or absence of a smiling expression
What information does PCA encode? – Age
• Age: the 20-39 and 50-60+ age groups are termed young and old, respectively
• The 10th component is found to be the most important for age
[Plot: age encoding power vs. component index, for the first 50 components]
[Figure: reconstructed images from the altered tenth component. The component is progressively added by quantities of -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D., with the mean in the centre.]
What information does PCA encode? – Identity
[Plot: identity encoding power vs. component index, for the first 100 components]
• Many components are found to be important for identity; however, their importance magnitude is small.
• These components are widely distributed and not restricted to the first 50 components.
Can a single component encode multiple properties?

• A grey beard informs us that the person is a male and also, most probably, old.
• As all important components of gender, ethnicity, and age are among the first 50 components, there are overlapping components.
• One example is the 3rd component, which is found to be the most important for gender and the second most important for age.
[Plots: gender encoding power and age encoding power vs. component index, for the first 50 components]
Can a single component encode multiple properties?
[Figure: normal distribution plots of the (a) third and (b) fourth components for the male and female classes of the young and old age groups (curves: young male, young female, old male, old female).]
Summary



• PCA encodes face image properties such as gender, ethnicity, age, and identity efficiently.
• Very few components are required to encode properties such as gender, ethnicity, and age, and these components are amongst the first few components, which capture a large part of the variance of the data.
• A large number of components is required to encode identity, and these components are widely distributed.
• There may be components which encode multiple properties.
Principal Component Analysis (PCA)

PCA and classification
– PCA is not always an optimal dimensionality-reduction
procedure for classification purposes.

Multiple classes and PCA
– Suppose there are C classes in the training data.
– PCA is based on the sample covariance which characterizes the
scatter of the entire data set, irrespective of class-membership.
– The projection axes chosen by PCA might not provide good
discrimination power.
Linear Discriminant Analysis
(LDA)

What is the goal of LDA?
– Perform dimensionality reduction “while preserving as much
of the class discriminatory information as possible”.
– Seeks to find directions along which the classes are best
separated.
– Takes into consideration the scatter within-classes but also the
scatter between-classes.
– More capable of distinguishing image variation due to identity
from variation due to other sources such as illumination and
expression.
Linear Discriminant Analysis
(LDA)
Angiograph Image Enhancement
Webcamera Calibration
THANKS
QUESTIONS