Transcript: EE462 MLCV
Lecture 13-14: Face Recognition, Subspace/Manifold Learning (CCA, LDA, Kernel PCA, ICA, MDS/LLE)
Tae-Kyun Kim
Face Recognition Applications

• Applications include:
  – Automatic face tagging on commercial weblogs
  – Face image retrieval in MPEG-7 (our solution is part of the MPEG-7 standard)
  – Automatic passport control
  – Feature-length film character summarisation
• A key issue is the efficient representation of face images.
Face Recognition vs Object Categorisation

[Figure: face image data sets (Class 1, Class 2) and object categorisation data sets, illustrating intraclass and interclass variation in each.]
In both, we seek representations/features that minimise intraclass variation and maximise interclass variation. Face image variations are more subtle than those of generic object categories.

Subspace/manifold techniques (cf. Bag of Words) are the dominant methods for face image analysis.
Principal Component Analysis (PCA)
• Maximum variance formulation
• Minimum-error formulation
• Probabilistic PCA
Maximum Variance Formulation of PCA
• PCA (also known as the Karhunen-Loève transform) is a technique for dimensionality reduction, lossy data compression, feature extraction, and data visualisation.
• PCA can be defined as the orthogonal projection of the data onto a lower-dimensional linear space such that the variance of the projected data is maximised.
• Given a data set $\{\mathbf{x}_n\}$, $n = 1, \ldots, N$, with $\mathbf{x}_n \in \mathbb{R}^D$, our goal is to project the data onto a space of dimension $M \ll D$ while maximising the projected data variance.

For simplicity, take $M = 1$. The direction of this space is defined by a vector $\mathbf{u}_1 \in \mathbb{R}^D$ s.t. $\mathbf{u}_1^T \mathbf{u}_1 = 1$. Each data point $\mathbf{x}_n$ is then projected onto a scalar value $\mathbf{u}_1^T \mathbf{x}_n$.

The mean of the projected data is $\mathbf{u}_1^T \bar{\mathbf{x}}$, where $\bar{\mathbf{x}} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$. The variance is given by

$\frac{1}{N} \sum_{n=1}^{N} \{\mathbf{u}_1^T \mathbf{x}_n - \mathbf{u}_1^T \bar{\mathbf{x}}\}^2 = \mathbf{u}_1^T S \mathbf{u}_1$,

where $S$ is the data covariance matrix defined as

$S = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - \bar{\mathbf{x}})(\mathbf{x}_n - \bar{\mathbf{x}})^T$.
We maximise the projected variance $\mathbf{u}_1^T S \mathbf{u}_1$ with respect to $\mathbf{u}_1$, subject to the normalisation condition $\mathbf{u}_1^T \mathbf{u}_1 = 1$. The Lagrange multiplier formulation is

$\mathbf{u}_1^T S \mathbf{u}_1 + \lambda_1 (1 - \mathbf{u}_1^T \mathbf{u}_1)$.

Setting the derivative with respect to $\mathbf{u}_1$ to zero, we obtain

$S \mathbf{u}_1 = \lambda_1 \mathbf{u}_1$,

i.e. $\mathbf{u}_1$ is an eigenvector of $S$. By left-multiplying by $\mathbf{u}_1^T$ and using $\mathbf{u}_1^T \mathbf{u}_1 = 1$, the variance is obtained as

$\mathbf{u}_1^T S \mathbf{u}_1 = \lambda_1$.
The variance is a maximum when $\mathbf{u}_1$ is the eigenvector with the largest eigenvalue $\lambda_1$. This eigenvector is called the principal component.

For the general case of an $M$-dimensional subspace, the solution is given by the $M$ eigenvectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_M$ of the data covariance matrix $S$ corresponding to the $M$ largest eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_M$, with $\mathbf{u}_i^T \mathbf{u}_j = \delta_{ij}$, where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.
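The maximum-variance formulation can be checked numerically. The following is an illustrative NumPy sketch (synthetic 2D data, not from the lecture) showing that the top eigenvector of the covariance matrix $S$ achieves the largest projected variance $\lambda_1$:

```python
import numpy as np

# Illustrative sketch: the first principal component u_1 is the top
# eigenvector of the covariance matrix S, and u_1^T S u_1 = lambda_1.
rng = np.random.default_rng(0)
# Correlated 2D data: most variance lies along the direction (1, 1)/sqrt(2).
t = rng.normal(size=500)
X = np.stack([t + 0.1 * rng.normal(size=500),
              t - 0.1 * rng.normal(size=500)], axis=1)   # N x D, D = 2

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / len(X)                 # covariance matrix S

eigvals, eigvecs = np.linalg.eigh(S)                     # ascending order
u1 = eigvecs[:, -1]                                      # largest eigenvalue
lam1 = eigvals[-1]

# Projected variance u1^T S u1 equals lambda_1, and no other unit
# vector achieves a larger projected variance (Rayleigh quotient bound).
proj_var = u1 @ S @ u1
assert np.isclose(proj_var, lam1)
for _ in range(100):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    assert v @ S @ v <= lam1 + 1e-12
```

The random unit vectors in the loop stand in for "any other direction"; the assertion mirrors the Lagrange-multiplier result above.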
Minimum-error formulation of PCA

• An alternative (equivalent) formulation of PCA minimises the projection error. We consider a complete orthonormal set of $D$-dimensional basis vectors $\{\mathbf{u}_i\}$, $i = 1, \ldots, D$, s.t. $\mathbf{u}_i^T \mathbf{u}_j = \delta_{ij}$ ($\delta_{ij} = 1$ if $i = j$, $0$ otherwise).
• Each data point can be represented exactly by a linear combination of the basis vectors: $\mathbf{x}_n = \sum_{i=1}^{D} \alpha_{ni} \mathbf{u}_i$.
• The coefficients are $\alpha_{ni} = \mathbf{x}_n^T \mathbf{u}_i$, so without loss of generality we can write $\mathbf{x}_n = \sum_{i=1}^{D} (\mathbf{x}_n^T \mathbf{u}_i)\, \mathbf{u}_i$.

Our goal is to approximate each data point using an $M$-dimensional linear subspace, $M \ll D$. We write each approximated data point as

$\tilde{\mathbf{x}}_n = \sum_{i=1}^{M} z_{ni} \mathbf{u}_i + \sum_{i=M+1}^{D} b_i \mathbf{u}_i$,

where the $b_i$ are constants shared by all data points.
• We minimise the distortion measure

$J = \frac{1}{N} \sum_{n=1}^{N} \|\mathbf{x}_n - \tilde{\mathbf{x}}_n\|^2$

with respect to $\mathbf{u}_i$, $z_{ni}$ and $b_i$.

Setting the derivative with respect to $z_{nj}$ to zero and using the orthonormality conditions, we have

$z_{nj} = \mathbf{x}_n^T \mathbf{u}_j$, where $j = 1, \ldots, M$.

Setting the derivative of $J$ w.r.t. $b_j$ to zero gives

$b_j = \bar{\mathbf{x}}^T \mathbf{u}_j$, where $j = M+1, \ldots, D$.
If we substitute for $z_{ni}$ and $b_i$, we have

$\mathbf{x}_n - \tilde{\mathbf{x}}_n = \sum_{i=M+1}^{D} \{(\mathbf{x}_n - \bar{\mathbf{x}})^T \mathbf{u}_i\} \mathbf{u}_i$.

We see that the displacement vectors lie in the space orthogonal to the principal subspace, since they are linear combinations of the $\mathbf{u}_i$ for $i = M+1, \ldots, D$. We further get

$J = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=M+1}^{D} (\mathbf{x}_n^T \mathbf{u}_i - \bar{\mathbf{x}}^T \mathbf{u}_i)^2 = \sum_{i=M+1}^{D} \mathbf{u}_i^T S \mathbf{u}_i$.
• Consider a two-dimensional data space, $D = 2$, and a one-dimensional principal subspace, $M = 1$. Then we choose the $\mathbf{u}_2$ that minimises $J = \mathbf{u}_2^T S \mathbf{u}_2$, subject to $\mathbf{u}_2^T \mathbf{u}_2 = 1$. Setting the derivative w.r.t. $\mathbf{u}_2$ to zero yields

$S \mathbf{u}_2 = \lambda_2 \mathbf{u}_2$.

We therefore obtain the minimum value of $J$ by choosing $\mathbf{u}_2$ as the eigenvector corresponding to the smaller eigenvalue; equivalently, we choose the principal subspace to be spanned by the eigenvector with the larger eigenvalue.
• The general solution is to choose the eigenvectors of the covariance matrix with the $M$ largest eigenvalues, $S \mathbf{u}_i = \lambda_i \mathbf{u}_i$, where $i = 1, \ldots, M$. The distortion measure becomes

$J = \sum_{i=M+1}^{D} \lambda_i$.
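This result is easy to verify numerically. A small NumPy sketch (my own synthetic example): keep the top $M$ eigenvectors, reconstruct, and check that the distortion $J$ equals the sum of the discarded eigenvalues:

```python
import numpy as np

# Sketch: the minimum distortion J equals the sum of the discarded
# eigenvalues, J = sum_{i=M+1}^{D} lambda_i.
rng = np.random.default_rng(1)
N, D, M = 400, 5, 2
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))    # correlated data

x_bar = X.mean(axis=0)
Xc = X - x_bar
S = Xc.T @ Xc / N                                        # covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)                     # ascending order
U = eigvecs[:, ::-1]                                     # descending order
lam = eigvals[::-1]

U_M = U[:, :M]                                           # kept eigenvectors
# Project onto the principal subspace and reconstruct about the mean.
X_tilde = x_bar + Xc @ U_M @ U_M.T
J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))          # distortion measure

assert np.isclose(J, lam[M:].sum())                      # J = sum of discarded
```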
Applications of PCA to Face Recognition
(Recap) Geometrical interpretation of PCA

• Principal components are the vectors in the directions of maximum variance of the projected data.
• For given 2D data points, $\mathbf{u}_1$ and $\mathbf{u}_2$ are found as the PCs.
• For dimensionality reduction, each 2D data point is transformed to a single variable $z_1$, representing the projection of the data point onto the eigenvector $\mathbf{u}_1$.
• The data points projected onto $\mathbf{u}_1$ have the maximum variance.
• PCA infers the inherent structure of high-dimensional data.
• The intrinsic dimensionality of the data is often much smaller.
Eigenfaces

• Collect a set of face images.
• Normalise for scale, orientation and location (using eye locations), and vectorise them: each $w \times h$ image becomes a vector in $\mathbb{R}^D$, with $D = wh$.
• Stack the mean-subtracted image vectors into $X = [\ldots, \mathbf{x}_i - \bar{\mathbf{x}}, \ldots] \in \mathbb{R}^{D \times N}$, where $N$ is the number of images.
• Construct the covariance matrix $S$ and obtain its eigenvectors $U$:

$S = \frac{1}{N} X X^T, \quad SU = U\Lambda, \quad U \in \mathbb{R}^{D \times M}$,

where $M$ is the number of eigenvectors kept.
Eigenfaces

• Project the data onto the subspace: $Z = U^T X$, $Z \in \mathbb{R}^{M \times N}$, $M \ll D$.
• Reconstruction is obtained as $\tilde{\mathbf{x}} = \sum_{i=1}^{M} z_i \mathbf{u}_i = U\mathbf{z}$, i.e. $\tilde{X} = UZ$.
• Use the distance to the subspace, $\|\mathbf{x} - \tilde{\mathbf{x}}\|$, for face recognition.
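The pipeline above can be sketched end to end. In this illustration the image loading and eye alignment are assumed already done; `faces` is synthetic low-rank data standing in for vectorised face images:

```python
import numpy as np

# Minimal eigenfaces sketch on synthetic data (stand-in for real images).
rng = np.random.default_rng(2)
D, N, M = 100, 40, 10                                    # D = w*h pixels
basis = rng.normal(size=(D, M))
faces = basis @ rng.normal(size=(M, N)) + 0.01 * rng.normal(size=(D, N))

x_bar = faces.mean(axis=1, keepdims=True)
Xc = faces - x_bar                                       # D x N, centred

S = Xc @ Xc.T / N                                        # covariance, D x D
eigvals, eigvecs = np.linalg.eigh(S)
U = eigvecs[:, ::-1][:, :M]                              # top-M eigenfaces

Z = U.T @ Xc                                             # M x N projections
X_rec = x_bar + U @ Z                                    # reconstructions

# Distance to the subspace: small for face-like inputs, large otherwise.
err_face = np.linalg.norm(faces[:, 0] - X_rec[:, 0])
rand_vec = rng.normal(size=D)
z = U.T @ (rand_vec - x_bar[:, 0])
err_rand = np.linalg.norm(rand_vec - (x_bar[:, 0] + U @ z))
assert err_face < err_rand
```

Note that for real face data $D \gg N$, so in practice the eigenvectors are usually obtained via the smaller $N \times N$ Gram matrix or an SVD of $X$ rather than the $D \times D$ covariance directly.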
Eigenfaces: Method 1

• Given face images of different classes (i.e. identities) $c_i$, compute the principal (eigen-)subspace per class.
• A query (test) image $\mathbf{x}$ is projected onto each class's eigen-subspace and its reconstruction error is measured.
• The class with the minimum error is assigned: $\arg\min_c \|\mathbf{x} - \tilde{\mathbf{x}}^c\|$, where $\tilde{\mathbf{x}}^c$ is the reconstruction by the $c$-th class subspace.
Eigenfaces: Method 2

• Given face images of different classes (i.e. identities) $c_i$, compute the principal (eigen-)subspace over all data.
• A query (test) image $\mathbf{x}$ is projected onto the eigen-subspace and its projection $\mathbf{z}$ is compared with the projections of the class means.
• The class with the minimum distance is assigned: $\arg\min_c \|\mathbf{z} - \mathbf{z}_c\|$, where $\mathbf{z}_c$ is the projection of the $c$-th class data mean.
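The two recognition schemes can be contrasted in a short sketch (synthetic data; each class is generated around its own low-dimensional affine subspace, so class labels and data are all illustrative assumptions):

```python
import numpy as np

# Sketch of the two schemes: per-class subspaces (Method 1) vs one global
# subspace with projected class means (Method 2).
rng = np.random.default_rng(3)
D, M, n_per_class, C = 50, 3, 20, 4
data = {}                                                # class -> D x n
for c in range(C):
    B = rng.normal(size=(D, M))
    mu = 5.0 * rng.normal(size=(D, 1))
    data[c] = mu + B @ rng.normal(size=(M, n_per_class))

def pca(X, M):
    x_bar = X.mean(axis=1, keepdims=True)
    S = (X - x_bar) @ (X - x_bar).T / X.shape[1]
    U = np.linalg.eigh(S)[1][:, ::-1][:, :M]             # top-M eigenvectors
    return x_bar, U

# Method 1: one eigen-subspace per class; assign by reconstruction error.
models = {c: pca(X, M) for c, X in data.items()}
def classify1(x):
    errs = {}
    for c, (x_bar, U) in models.items():
        z = U.T @ (x - x_bar[:, 0])
        errs[c] = np.linalg.norm(x - (x_bar[:, 0] + U @ z))
    return min(errs, key=errs.get)

# Method 2: one global subspace; assign by distance to projected class means.
X_all = np.concatenate(list(data.values()), axis=1)
x_bar_g, U_g = pca(X_all, M * C)
z_means = {c: U_g.T @ (X.mean(axis=1) - x_bar_g[:, 0]) for c, X in data.items()}
def classify2(x):
    z = U_g.T @ (x - x_bar_g[:, 0])
    return min(z_means, key=lambda c: np.linalg.norm(z - z_means[c]))

query = data[2].mean(axis=1)                             # a class-2 exemplar
assert classify1(query) == 2 and classify2(query) == 2
```

Method 1 scales in cost with the number of classes at test time; Method 2 needs only one projection per query.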
Matlab Demos: Face Recognition by PCA

• Face images
• Eigenvectors and eigenvalue plot
• Face image reconstruction
• Projection coefficients (visualisation of high-dimensional data)
• Face recognition
Probabilistic PCA (PPCA)

• In PCA, the subspace is spanned by an orthonormal basis (the eigenvectors computed from the covariance matrix).
• PPCA interprets each observation with a generative model, estimating the probability of generating each observation with a Gaussian distribution:
  PCA: uniform prior on the subspace
  PPCA: Gaussian distribution on the subspace
Continuous Latent Variable Model

• PPCA has a continuous latent variable; GMM (mixture of Gaussians, Lecture 3-4) is a model with a discrete latent variable.
• PPCA assumes that the original data points lie close to a manifold of much lower dimensionality.
• In practice, the data points will not be confined precisely to a smooth low-dimensional manifold. We interpret the departures of the data points from the manifold as noise.
Continuous Latent Variable Model

• Consider an example of digit images that undergo a random displacement and rotation.
• The images have 100 x 100 pixel values, but the degrees of freedom of variability across images number only three: vertical translation, horizontal translation, and rotation.
• The data points live on a subspace whose intrinsic dimensionality is three.
• The translation and rotation parameters are continuous latent (hidden) variables; we only observe the image vectors.
Probabilistic PCA

• PPCA is an example of the linear-Gaussian framework (Lecture 15-16), in which all marginal and conditional distributions are Gaussian.
• We define a Gaussian prior distribution over the latent variable $\mathbf{z}$ as $p(\mathbf{z}) = \mathcal{N}(\mathbf{z} \mid \mathbf{0}, I)$.
• The observed $D$-dimensional variable $\mathbf{x}$ is defined as

$\mathbf{x} = W\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon}$,

where $\mathbf{z}$ is an $M$-dimensional Gaussian latent variable, $W$ is a $D \times M$ matrix, and $\boldsymbol{\epsilon}$ is a $D$-dimensional zero-mean Gaussian-distributed noise variable with covariance $\sigma^2 I$.
• The conditional distribution takes the Gaussian form

$p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}(\mathbf{x} \mid W\mathbf{z} + \boldsymbol{\mu}, \sigma^2 I)$.

This is a generative process, a mapping from latent space to data space, in contrast to the conventional view of PCA.
• The marginal distribution is written in the form $p(\mathbf{x}) = \int p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}$. From the linear-Gaussian model, the marginal distribution is again Gaussian,

$p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, C)$, where $C = WW^T + \sigma^2 I$.

The above can be seen from $\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu}$ and $\mathrm{cov}[\mathbf{x}] = \mathbb{E}[(W\mathbf{z} + \boldsymbol{\epsilon})(W\mathbf{z} + \boldsymbol{\epsilon})^T] = WW^T + \sigma^2 I$.
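The generative process can be simulated directly. The following sketch (parameter values are my own) samples from the PPCA model and checks that the sample mean and covariance approach $\boldsymbol{\mu}$ and $C = WW^T + \sigma^2 I$:

```python
import numpy as np

# Sketch: sampling x = W z + mu + eps and checking the marginal moments.
rng = np.random.default_rng(4)
D, M, N = 4, 2, 200_000
W = rng.normal(size=(D, M))
mu = rng.normal(size=D)
sigma2 = 0.5

Z = rng.normal(size=(N, M))                              # z ~ N(0, I)
eps = np.sqrt(sigma2) * rng.normal(size=(N, D))          # eps ~ N(0, s^2 I)
X = Z @ W.T + mu + eps                                   # generative process

C = W @ W.T + sigma2 * np.eye(D)                         # model covariance
C_hat = np.cov(X, rowvar=False)                          # sample covariance

assert np.allclose(X.mean(axis=0), mu, atol=0.05)        # E[x] = mu
assert np.allclose(C_hat, C, atol=0.2)                   # cov[x] = WW^T + s^2 I
```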
Maximum Likelihood Estimation for PPCA

• We need to determine the parameters $\boldsymbol{\mu}$, $W$ and $\sigma^2$ which maximise the log-likelihood.
• Given a data set $X = \{\mathbf{x}_n\}$ of observed data points, PPCA can be expressed as a directed graph.
The log-likelihood is

$\ln p(X \mid \boldsymbol{\mu}, W, \sigma^2) = \sum_{n=1}^{N} \ln p(\mathbf{x}_n \mid \boldsymbol{\mu}, W, \sigma^2)$.

The maximum-likelihood solution for $W$ is

$W_{ML} = U_M (L_M - \sigma^2 I)^{1/2} R$,

where $U_M$ is the $D \times M$ eigenvector matrix of $S$, $L_M$ is the $M \times M$ diagonal eigenvalue matrix, and $R$ is an orthogonal rotation matrix s.t. $RR^T = I$; the noise variance is estimated as $\sigma^2_{ML} = \frac{1}{D-M}\sum_{i=M+1}^{D} \lambda_i$. For detailed optimisations, see Tipping and Bishop, PPCA (1999).

Redundancy arises up to rotations $R$ of the latent-space coordinates. Consider a matrix $\tilde{W} = WR$, where $R$ is an orthogonal rotation matrix s.t. $RR^T = I$. We see $\tilde{W}\tilde{W}^T = WRR^TW^T = WW^T$; hence the model is independent of $R$.
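The closed-form solution and its rotation invariance are easy to check. A sketch (my own synthetic data) following the Tipping and Bishop estimates:

```python
import numpy as np

# Sketch: W_ML = U_M (L_M - sigma^2 I)^{1/2} R, sigma^2_ML = mean of the
# discarded eigenvalues; any rotation R leaves W W^T unchanged.
rng = np.random.default_rng(5)
N, D, M = 2000, 6, 2
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / N
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]                           # descending order

sigma2_ml = lam[M:].mean()                               # discarded eigenvalues
W_ml = U[:, :M] @ np.diag(np.sqrt(lam[:M] - sigma2_ml))  # with R = I

# Rotate the latent coordinates: W~ = W R with R R^T = I.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W_rot = W_ml @ R

C = W_ml @ W_ml.T + sigma2_ml * np.eye(D)
C_rot = W_rot @ W_rot.T + sigma2_ml * np.eye(D)
assert np.allclose(C, C_rot)                             # independent of R
```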
• Conventional PCA is generally formulated as a projection of points from the $D$-dimensional data space onto an $M$-dimensional linear subspace.
• PPCA is most naturally expressed as a mapping from the latent space to the data space.
• We can reverse this mapping using Bayes' theorem to get the posterior distribution $p(\mathbf{z} \mid \mathbf{x})$ as

$p(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}(\mathbf{z} \mid \mathbf{M}^{-1} W^T (\mathbf{x} - \boldsymbol{\mu}), \sigma^2 \mathbf{M}^{-1})$,

where the $M \times M$ matrix $\mathbf{M}$ is defined by $\mathbf{M} = W^T W + \sigma^2 I$.
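The posterior can be computed in a few lines and cross-checked against the general Gaussian conditioning formula (all parameter values here are illustrative assumptions):

```python
import numpy as np

# Sketch: posterior p(z|x) = N(z | M^{-1} W^T (x - mu), sigma^2 M^{-1}),
# with M = W^T W + sigma^2 I (an M x M matrix; note the symbol clash with
# the subspace dimension, so it is named M_mat below).
rng = np.random.default_rng(6)
D, M_dim = 5, 2
W = rng.normal(size=(D, M_dim))
mu = rng.normal(size=D)
sigma2 = 0.3

x = rng.normal(size=D)                                   # an observation
M_mat = W.T @ W + sigma2 * np.eye(M_dim)
M_inv = np.linalg.inv(M_mat)

post_mean = M_inv @ W.T @ (x - mu)
post_cov = sigma2 * M_inv

# Cross-check with joint-Gaussian conditioning: E[z|x] = W^T C^{-1} (x - mu),
# where C = W W^T + sigma^2 I (the push-through identity makes these equal).
C = W @ W.T + sigma2 * np.eye(D)
post_mean_ref = W.T @ np.linalg.inv(C) @ (x - mu)
assert np.allclose(post_mean, post_mean_ref)
```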
Limitations of PCA

Unsupervised learning (PCA vs LDA)

PCA finds the direction of maximum variance of the data (unsupervised), while LDA (Linear Discriminant Analysis) finds the direction that optimally separates data of different classes (supervised).
Linear model (PCA vs Kernel PCA)

PCA is a linear projection method. When data lie on a nonlinear manifold (rather than a linear manifold, i.e. a subspace), PCA is extended to Kernel PCA by the kernel trick (Lecture 9-10), using a nonlinear feature mapping $\phi(\mathbf{x})$.
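A minimal Kernel PCA sketch (my own illustration, RBF kernel and toy circular data): instead of the covariance matrix, eigendecompose the centred kernel matrix, which implicitly works with $\phi(\mathbf{x})$:

```python
import numpy as np

# Sketch: kernel PCA via the centred kernel (Gram) matrix.
rng = np.random.default_rng(7)
N = 100
angles = rng.uniform(0, 2 * np.pi, N)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # circle: nonlinear
X += 0.05 * rng.normal(size=X.shape)

# RBF kernel k(x, y) = exp(-||x - y||^2 / width): an implicit phi(x).
sq = np.sum(X ** 2, axis=1)
d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
K = np.exp(-d2 / 0.5)

# Centre the kernel matrix in feature space: K' = H K H, H = I - (1/N) 11^T.
H = np.eye(N) - np.ones((N, N)) / N
Kc = H @ K @ H

eigvals, eigvecs = np.linalg.eigh(Kc)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]       # descending order
alphas = eigvecs[:, :2] / np.sqrt(np.maximum(eigvals[:2], 1e-12))

Z = Kc @ alphas                                          # nonlinear projections
assert Z.shape == (N, 2)
assert eigvals[0] > 0
```

The normalisation of `alphas` makes each kernel principal component have unit norm in feature space, mirroring $\mathbf{u}_i^T \mathbf{u}_i = 1$ in linear PCA.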
Gaussian assumption (PCA vs ICA)

PCA models data as Gaussian distributions (2nd-order statistics), whereas ICA (Independent Component Analysis) captures higher-order statistics.
Holistic bases

PCA bases are holistic (cf. part-based) and less intuitive. ICA or NMF (Non-negative Matrix Factorisation) yields bases which capture local facial components.

Daniel D. Lee and H. Sebastian Seung (1999). "Learning the parts of objects by non-negative matrix factorization". Nature 401(6755): 788-791.