Transcript: EE462 MLCV
Lecture 13-14: Face Recognition, Subspace/Manifold Learning (CCA, LDA, Kernel PCA, ICA, MDS/LLE)
Tae-Kyun Kim
Face Recognition Applications

• Applications include:
  – Automatic face tagging on commercial weblogs
  – Face image retrieval in MPEG-7 (our solution is part of the MPEG-7 standard)
  – Automatic passport control
  – Feature-length film character summarisation
• A key issue is the efficient representation of face images.
Face Recognition vs Object Categorisation

[Figure: face image data sets (Class 1, Class 2) and object categorisation data sets, illustrating intraclass and interclass variation in each.]
In both, we seek representations/features that minimise intraclass variation and maximise interclass variation. Face image variations are more subtle than those of generic object categories.

Subspace/manifold techniques (cf. Bag of Words) are the dominant methods for face image analysis.
Principal Component Analysis (PCA)
• Maximum variance formulation
• Minimum-error formulation
• Probabilistic PCA
Maximum Variance Formulation of PCA
• PCA (also known as the Karhunen-Loève transform) is a technique for dimensionality reduction, lossy data compression, feature extraction, and data visualisation.
• PCA can be defined as the orthogonal projection of the data onto a lower-dimensional linear space such that the variance of the projected data is maximised.
• Given a data set $\{\mathbf{x}_n\}$, $n = 1, \ldots, N$, with $\mathbf{x}_n \in \mathbb{R}^D$, our goal is to project the data onto a space of dimension $M \ll D$ while maximising the projected data variance.

For simplicity, take $M = 1$. The direction of this space is defined by a vector $\mathbf{u}_1 \in \mathbb{R}^D$ s.t. $\mathbf{u}_1^T \mathbf{u}_1 = 1$. Each data point $\mathbf{x}_n$ is then projected onto a scalar value $\mathbf{u}_1^T \mathbf{x}_n$.

The mean of the projected data is $\mathbf{u}_1^T \bar{\mathbf{x}}$, where $\bar{\mathbf{x}} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$. The variance is given by

$\frac{1}{N} \sum_{n=1}^{N} \{\mathbf{u}_1^T \mathbf{x}_n - \mathbf{u}_1^T \bar{\mathbf{x}}\}^2 = \mathbf{u}_1^T S \mathbf{u}_1$,

where $S$ is the data covariance matrix defined as

$S = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - \bar{\mathbf{x}})(\mathbf{x}_n - \bar{\mathbf{x}})^T$.
We maximise the projected variance $\mathbf{u}_1^T S \mathbf{u}_1$ with respect to $\mathbf{u}_1$, subject to the normalisation condition $\mathbf{u}_1^T \mathbf{u}_1 = 1$. The Lagrange multiplier formulation is

$\mathbf{u}_1^T S \mathbf{u}_1 + \lambda_1 (1 - \mathbf{u}_1^T \mathbf{u}_1)$.

Setting the derivative with respect to $\mathbf{u}_1$ to zero, we obtain

$S \mathbf{u}_1 = \lambda_1 \mathbf{u}_1$,

i.e. $\mathbf{u}_1$ is an eigenvector of $S$. By left-multiplying by $\mathbf{u}_1^T$ and using $\mathbf{u}_1^T \mathbf{u}_1 = 1$, the variance is obtained as

$\mathbf{u}_1^T S \mathbf{u}_1 = \lambda_1$.
The variance is a maximum when $\mathbf{u}_1$ is the eigenvector with the largest eigenvalue $\lambda_1$. This eigenvector is called the principal component.

For the general case of an $M$-dimensional subspace, the solution is given by the $M$ eigenvectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_M$ of the data covariance matrix $S$ corresponding to the $M$ largest eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_M$, with $\mathbf{u}_i^T \mathbf{u}_j = \delta_{ij}$, where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.
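The maximum-variance formulation can be checked numerically. The following is an illustrative NumPy sketch (synthetic 2D data, not from the lecture) showing that the top eigenvector of the covariance matrix $S$ achieves the largest projected variance $\lambda_1$:

```python
import numpy as np

# Illustrative sketch: the first principal component u_1 is the top
# eigenvector of the covariance matrix S, and u_1^T S u_1 = lambda_1.
rng = np.random.default_rng(0)
# Correlated 2D data: most variance lies along the direction (1, 1)/sqrt(2).
t = rng.normal(size=500)
X = np.stack([t + 0.1 * rng.normal(size=500),
              t - 0.1 * rng.normal(size=500)], axis=1)   # N x D, D = 2

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / len(X)                 # covariance matrix S

eigvals, eigvecs = np.linalg.eigh(S)                     # ascending order
u1 = eigvecs[:, -1]                                      # largest eigenvalue
lam1 = eigvals[-1]

# Projected variance u1^T S u1 equals lambda_1, and no other unit
# vector achieves a larger projected variance (Rayleigh quotient bound).
proj_var = u1 @ S @ u1
assert np.isclose(proj_var, lam1)
for _ in range(100):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    assert v @ S @ v <= lam1 + 1e-12
```

The random unit vectors in the loop stand in for "any other direction"; the assertion mirrors the Lagrange-multiplier result above.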
Minimum-error formulation of PCA

• An alternative (equivalent) formulation of PCA minimises the projection error. We consider a complete orthonormal set of $D$-dimensional basis vectors $\{\mathbf{u}_i\}$, $i = 1, \ldots, D$, s.t. $\mathbf{u}_i^T \mathbf{u}_j = \delta_{ij}$ ($\delta_{ij} = 1$ if $i = j$, $0$ otherwise).
• Each data point can be represented exactly by a linear combination of the basis vectors: $\mathbf{x}_n = \sum_{i=1}^{D} \alpha_{ni} \mathbf{u}_i$.
• The coefficients are $\alpha_{ni} = \mathbf{x}_n^T \mathbf{u}_i$, so without loss of generality we can write $\mathbf{x}_n = \sum_{i=1}^{D} (\mathbf{x}_n^T \mathbf{u}_i)\, \mathbf{u}_i$.

Our goal is to approximate each data point using an $M$-dimensional linear subspace, $M \ll D$. We write each approximated data point as

$\tilde{\mathbf{x}}_n = \sum_{i=1}^{M} z_{ni} \mathbf{u}_i + \sum_{i=M+1}^{D} b_i \mathbf{u}_i$,

where the $b_i$ are constants shared by all data points.
• We minimise the distortion measure

$J = \frac{1}{N} \sum_{n=1}^{N} \|\mathbf{x}_n - \tilde{\mathbf{x}}_n\|^2$

with respect to $\mathbf{u}_i$, $z_{ni}$ and $b_i$.

Setting the derivative with respect to $z_{nj}$ to zero and using the orthonormality conditions, we have

$z_{nj} = \mathbf{x}_n^T \mathbf{u}_j$, where $j = 1, \ldots, M$.

Setting the derivative of $J$ w.r.t. $b_j$ to zero gives

$b_j = \bar{\mathbf{x}}^T \mathbf{u}_j$, where $j = M+1, \ldots, D$.
If we substitute for $z_{ni}$ and $b_i$, we have

$\mathbf{x}_n - \tilde{\mathbf{x}}_n = \sum_{i=M+1}^{D} \{(\mathbf{x}_n - \bar{\mathbf{x}})^T \mathbf{u}_i\} \mathbf{u}_i$.

We see that the displacement vectors lie in the space orthogonal to the principal subspace, since they are linear combinations of the $\mathbf{u}_i$ for $i = M+1, \ldots, D$. We further get

$J = \frac{1}{N} \sum_{n=1}^{N} \sum_{i=M+1}^{D} (\mathbf{x}_n^T \mathbf{u}_i - \bar{\mathbf{x}}^T \mathbf{u}_i)^2 = \sum_{i=M+1}^{D} \mathbf{u}_i^T S \mathbf{u}_i$.
• Consider a two-dimensional data space, $D = 2$, and a one-dimensional principal subspace, $M = 1$. Then we choose the $\mathbf{u}_2$ that minimises $J = \mathbf{u}_2^T S \mathbf{u}_2$, subject to $\mathbf{u}_2^T \mathbf{u}_2 = 1$. Setting the derivative w.r.t. $\mathbf{u}_2$ to zero yields

$S \mathbf{u}_2 = \lambda_2 \mathbf{u}_2$.

We therefore obtain the minimum value of $J$ by choosing $\mathbf{u}_2$ as the eigenvector corresponding to the smaller eigenvalue; equivalently, we choose the principal subspace to be spanned by the eigenvector with the larger eigenvalue.
• The general solution is to choose the eigenvectors of the covariance matrix with the $M$ largest eigenvalues, $S \mathbf{u}_i = \lambda_i \mathbf{u}_i$, where $i = 1, \ldots, M$. The distortion measure becomes

$J = \sum_{i=M+1}^{D} \lambda_i$.
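This result is easy to verify numerically. A small NumPy sketch (my own synthetic example): keep the top $M$ eigenvectors, reconstruct, and check that the distortion $J$ equals the sum of the discarded eigenvalues:

```python
import numpy as np

# Sketch: the minimum distortion J equals the sum of the discarded
# eigenvalues, J = sum_{i=M+1}^{D} lambda_i.
rng = np.random.default_rng(1)
N, D, M = 400, 5, 2
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))    # correlated data

x_bar = X.mean(axis=0)
Xc = X - x_bar
S = Xc.T @ Xc / N                                        # covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)                     # ascending order
U = eigvecs[:, ::-1]                                     # descending order
lam = eigvals[::-1]

U_M = U[:, :M]                                           # kept eigenvectors
# Project onto the principal subspace and reconstruct about the mean.
X_tilde = x_bar + Xc @ U_M @ U_M.T
J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))          # distortion measure

assert np.isclose(J, lam[M:].sum())                      # J = sum of discarded
```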
Applications of PCA to Face Recognition
(Recap) Geometrical interpretation of PCA

• Principal components are the vectors in the directions of maximum variance of the projected data.
• For given 2D data points, $\mathbf{u}_1$ and $\mathbf{u}_2$ are found as the PCs.
• For dimensionality reduction, each 2D data point is transformed to a single variable $z_1$, representing the projection of the data point onto the eigenvector $\mathbf{u}_1$.
• The data points projected onto $\mathbf{u}_1$ have the maximum variance.
• PCA infers the inherent structure of high-dimensional data.
• The intrinsic dimensionality of the data is often much smaller.
Eigenfaces

• Collect a set of face images.
• Normalise for scale, orientation and location (using eye locations), and vectorise them: each $w \times h$ image becomes a vector in $\mathbb{R}^D$, with $D = wh$.
• Stack the mean-subtracted image vectors into $X = [\ldots, \mathbf{x}_i - \bar{\mathbf{x}}, \ldots] \in \mathbb{R}^{D \times N}$, where $N$ is the number of images.
• Construct the covariance matrix $S$ and obtain its eigenvectors $U$:

$S = \frac{1}{N} X X^T, \quad SU = U\Lambda, \quad U \in \mathbb{R}^{D \times M}$,

where $M$ is the number of eigenvectors kept.
Eigenfaces

• Project the data onto the subspace: $Z = U^T X$, $Z \in \mathbb{R}^{M \times N}$, $M \ll D$.
• Reconstruction is obtained as $\tilde{\mathbf{x}} = \sum_{i=1}^{M} z_i \mathbf{u}_i = U\mathbf{z}$, i.e. $\tilde{X} = UZ$.
• Use the distance to the subspace, $\|\mathbf{x} - \tilde{\mathbf{x}}\|$, for face recognition.
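The pipeline above can be sketched end to end. In this illustration the image loading and eye alignment are assumed already done; `faces` is synthetic low-rank data standing in for vectorised face images:

```python
import numpy as np

# Minimal eigenfaces sketch on synthetic data (stand-in for real images).
rng = np.random.default_rng(2)
D, N, M = 100, 40, 10                                    # D = w*h pixels
basis = rng.normal(size=(D, M))
faces = basis @ rng.normal(size=(M, N)) + 0.01 * rng.normal(size=(D, N))

x_bar = faces.mean(axis=1, keepdims=True)
Xc = faces - x_bar                                       # D x N, centred

S = Xc @ Xc.T / N                                        # covariance, D x D
eigvals, eigvecs = np.linalg.eigh(S)
U = eigvecs[:, ::-1][:, :M]                              # top-M eigenfaces

Z = U.T @ Xc                                             # M x N projections
X_rec = x_bar + U @ Z                                    # reconstructions

# Distance to the subspace: small for face-like inputs, large otherwise.
err_face = np.linalg.norm(faces[:, 0] - X_rec[:, 0])
rand_vec = rng.normal(size=D)
z = U.T @ (rand_vec - x_bar[:, 0])
err_rand = np.linalg.norm(rand_vec - (x_bar[:, 0] + U @ z))
assert err_face < err_rand
```

Note that for real face data $D \gg N$, so in practice the eigenvectors are usually obtained via the smaller $N \times N$ Gram matrix or an SVD of $X$ rather than the $D \times D$ covariance directly.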
Eigenfaces: Method 1

• Given face images of different classes (i.e. identities) $c_i$, compute the principal (eigen-)subspace per class.
• A query (test) image $\mathbf{x}$ is projected onto each class's eigen-subspace and its reconstruction error is measured.
• The class with the minimum error is assigned: $\arg\min_c \|\mathbf{x} - \tilde{\mathbf{x}}^c\|$, where $\tilde{\mathbf{x}}^c$ is the reconstruction by the $c$-th class subspace.
Eigenfaces: Method 2

• Given face images of different classes (i.e. identities) $c_i$, compute the principal (eigen-)subspace over all data.
• A query (test) image $\mathbf{x}$ is projected onto the eigen-subspace and its projection $\mathbf{z}$ is compared with the projections of the class means.
• The class with the minimum distance is assigned: $\arg\min_c \|\mathbf{z} - \mathbf{z}_c\|$, where $\mathbf{z}_c$ is the projection of the $c$-th class data mean.
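The two recognition schemes can be contrasted in a short sketch (synthetic data; each class is generated around its own low-dimensional affine subspace, so class labels and data are all illustrative assumptions):

```python
import numpy as np

# Sketch of the two schemes: per-class subspaces (Method 1) vs one global
# subspace with projected class means (Method 2).
rng = np.random.default_rng(3)
D, M, n_per_class, C = 50, 3, 20, 4
data = {}                                                # class -> D x n
for c in range(C):
    B = rng.normal(size=(D, M))
    mu = 5.0 * rng.normal(size=(D, 1))
    data[c] = mu + B @ rng.normal(size=(M, n_per_class))

def pca(X, M):
    x_bar = X.mean(axis=1, keepdims=True)
    S = (X - x_bar) @ (X - x_bar).T / X.shape[1]
    U = np.linalg.eigh(S)[1][:, ::-1][:, :M]             # top-M eigenvectors
    return x_bar, U

# Method 1: one eigen-subspace per class; assign by reconstruction error.
models = {c: pca(X, M) for c, X in data.items()}
def classify1(x):
    errs = {}
    for c, (x_bar, U) in models.items():
        z = U.T @ (x - x_bar[:, 0])
        errs[c] = np.linalg.norm(x - (x_bar[:, 0] + U @ z))
    return min(errs, key=errs.get)

# Method 2: one global subspace; assign by distance to projected class means.
X_all = np.concatenate(list(data.values()), axis=1)
x_bar_g, U_g = pca(X_all, M * C)
z_means = {c: U_g.T @ (X.mean(axis=1) - x_bar_g[:, 0]) for c, X in data.items()}
def classify2(x):
    z = U_g.T @ (x - x_bar_g[:, 0])
    return min(z_means, key=lambda c: np.linalg.norm(z - z_means[c]))

query = data[2].mean(axis=1)                             # a class-2 exemplar
assert classify1(query) == 2 and classify2(query) == 2
```

Method 1 scales in cost with the number of classes at test time; Method 2 needs only one projection per query.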
Matlab Demos: Face Recognition by PCA

• Face images
• Eigenvectors and eigenvalue plot
• Face image reconstruction
• Projection coefficients (visualisation of high-dimensional data)
• Face recognition
Probabilistic PCA (PPCA)

• In PCA, the subspace is spanned by an orthonormal basis (the eigenvectors computed from the covariance matrix).
• PPCA interprets each observation with a generative model, estimating the probability of generating each observation with a Gaussian distribution:
  PCA: uniform prior on the subspace
  PPCA: Gaussian distribution on the subspace
Continuous Latent Variable Model

• PPCA has a continuous latent variable; GMM (mixture of Gaussians, Lecture 3-4) is a model with a discrete latent variable.
• PPCA assumes that the original data points lie close to a manifold of much lower dimensionality.
• In practice, the data points will not be confined precisely to a smooth low-dimensional manifold. We interpret the departures of the data points from the manifold as noise.
Continuous Latent Variable Model

• Consider an example of digit images that undergo a random displacement and rotation.
• The images have 100 x 100 pixel values, but the degrees of freedom of variability across images number only three: vertical translation, horizontal translation, and rotation.
• The data points live on a subspace whose intrinsic dimensionality is three.
• The translation and rotation parameters are continuous latent (hidden) variables; we only observe the image vectors.
Probabilistic PCA

• PPCA is an example of the linear-Gaussian framework (Lecture 15-16), in which all marginal and conditional distributions are Gaussian.
• We define a Gaussian prior distribution over the latent variable $\mathbf{z}$ as $p(\mathbf{z}) = \mathcal{N}(\mathbf{z} \mid \mathbf{0}, I)$.
• The observed $D$-dimensional variable $\mathbf{x}$ is defined as

$\mathbf{x} = W\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon}$,

where $\mathbf{z}$ is an $M$-dimensional Gaussian latent variable, $W$ is a $D \times M$ matrix, and $\boldsymbol{\epsilon}$ is a $D$-dimensional zero-mean Gaussian-distributed noise variable with covariance $\sigma^2 I$.
• The conditional distribution takes the Gaussian form

$p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}(\mathbf{x} \mid W\mathbf{z} + \boldsymbol{\mu}, \sigma^2 I)$.

This is a generative process, a mapping from latent space to data space, in contrast to the conventional view of PCA.
• The marginal distribution is written in the form $p(\mathbf{x}) = \int p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}$. From the linear-Gaussian model, the marginal distribution is again Gaussian,

$p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, C)$, where $C = WW^T + \sigma^2 I$.

The above can be seen from $\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu}$ and $\mathrm{cov}[\mathbf{x}] = \mathbb{E}[(W\mathbf{z} + \boldsymbol{\epsilon})(W\mathbf{z} + \boldsymbol{\epsilon})^T] = WW^T + \sigma^2 I$.
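The generative process can be simulated directly. The following sketch (parameter values are my own) samples from the PPCA model and checks that the sample mean and covariance approach $\boldsymbol{\mu}$ and $C = WW^T + \sigma^2 I$:

```python
import numpy as np

# Sketch: sampling x = W z + mu + eps and checking the marginal moments.
rng = np.random.default_rng(4)
D, M, N = 4, 2, 200_000
W = rng.normal(size=(D, M))
mu = rng.normal(size=D)
sigma2 = 0.5

Z = rng.normal(size=(N, M))                              # z ~ N(0, I)
eps = np.sqrt(sigma2) * rng.normal(size=(N, D))          # eps ~ N(0, s^2 I)
X = Z @ W.T + mu + eps                                   # generative process

C = W @ W.T + sigma2 * np.eye(D)                         # model covariance
C_hat = np.cov(X, rowvar=False)                          # sample covariance

assert np.allclose(X.mean(axis=0), mu, atol=0.05)        # E[x] = mu
assert np.allclose(C_hat, C, atol=0.2)                   # cov[x] = WW^T + s^2 I
```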
Maximum Likelihood Estimation for PPCA

• We need to determine the parameters $\boldsymbol{\mu}$, $W$ and $\sigma^2$ which maximise the log-likelihood.
• Given a data set $X = \{\mathbf{x}_n\}$ of observed data points, PPCA can be expressed as a directed graph.
The log-likelihood is

$\ln p(X \mid \boldsymbol{\mu}, W, \sigma^2) = \sum_{n=1}^{N} \ln p(\mathbf{x}_n \mid \boldsymbol{\mu}, W, \sigma^2)$.

The maximum-likelihood solution for $W$ is

$W_{ML} = U_M (L_M - \sigma^2 I)^{1/2} R$,

where $U_M$ is the $D \times M$ eigenvector matrix of $S$, $L_M$ is the $M \times M$ diagonal eigenvalue matrix, and $R$ is an orthogonal rotation matrix s.t. $RR^T = I$; the noise variance is estimated as $\sigma^2_{ML} = \frac{1}{D-M}\sum_{i=M+1}^{D} \lambda_i$. For detailed optimisations, see Tipping and Bishop, PPCA (1999).

Redundancy arises up to rotations $R$ of the latent-space coordinates. Consider a matrix $\tilde{W} = WR$, where $R$ is an orthogonal rotation matrix s.t. $RR^T = I$. We see $\tilde{W}\tilde{W}^T = WRR^TW^T = WW^T$; hence the model is independent of $R$.
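The closed-form solution and its rotation invariance are easy to check. A sketch (my own synthetic data) following the Tipping and Bishop estimates:

```python
import numpy as np

# Sketch: W_ML = U_M (L_M - sigma^2 I)^{1/2} R, sigma^2_ML = mean of the
# discarded eigenvalues; any rotation R leaves W W^T unchanged.
rng = np.random.default_rng(5)
N, D, M = 2000, 6, 2
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))

x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / N
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]                           # descending order

sigma2_ml = lam[M:].mean()                               # discarded eigenvalues
W_ml = U[:, :M] @ np.diag(np.sqrt(lam[:M] - sigma2_ml))  # with R = I

# Rotate the latent coordinates: W~ = W R with R R^T = I.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W_rot = W_ml @ R

C = W_ml @ W_ml.T + sigma2_ml * np.eye(D)
C_rot = W_rot @ W_rot.T + sigma2_ml * np.eye(D)
assert np.allclose(C, C_rot)                             # independent of R
```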
• Conventional PCA is generally formulated as a projection of points from the $D$-dimensional data space onto an $M$-dimensional linear subspace.
• PPCA is most naturally expressed as a mapping from the latent space to the data space.
• We can reverse this mapping using Bayes' theorem to get the posterior distribution $p(\mathbf{z} \mid \mathbf{x})$ as

$p(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}(\mathbf{z} \mid \mathbf{M}^{-1} W^T (\mathbf{x} - \boldsymbol{\mu}), \sigma^2 \mathbf{M}^{-1})$,

where the $M \times M$ matrix $\mathbf{M}$ is defined by $\mathbf{M} = W^T W + \sigma^2 I$.
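The posterior can be computed in a few lines and cross-checked against the general Gaussian conditioning formula (all parameter values here are illustrative assumptions):

```python
import numpy as np

# Sketch: posterior p(z|x) = N(z | M^{-1} W^T (x - mu), sigma^2 M^{-1}),
# with M = W^T W + sigma^2 I (an M x M matrix; note the symbol clash with
# the subspace dimension, so it is named M_mat below).
rng = np.random.default_rng(6)
D, M_dim = 5, 2
W = rng.normal(size=(D, M_dim))
mu = rng.normal(size=D)
sigma2 = 0.3

x = rng.normal(size=D)                                   # an observation
M_mat = W.T @ W + sigma2 * np.eye(M_dim)
M_inv = np.linalg.inv(M_mat)

post_mean = M_inv @ W.T @ (x - mu)
post_cov = sigma2 * M_inv

# Cross-check with joint-Gaussian conditioning: E[z|x] = W^T C^{-1} (x - mu),
# where C = W W^T + sigma^2 I (the push-through identity makes these equal).
C = W @ W.T + sigma2 * np.eye(D)
post_mean_ref = W.T @ np.linalg.inv(C) @ (x - mu)
assert np.allclose(post_mean, post_mean_ref)
```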
Limitations of PCA

Unsupervised learning (PCA vs LDA)

PCA finds the direction of maximum variance of the data (unsupervised), while LDA (Linear Discriminant Analysis) finds the direction that optimally separates data of different classes (supervised).
Linear model (PCA vs Kernel PCA)

PCA is a linear projection method. When data lie on a nonlinear manifold (rather than a linear manifold, i.e. a subspace), PCA is extended to Kernel PCA by the kernel trick (Lecture 9-10), using a nonlinear feature mapping $\phi(\mathbf{x})$.
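A minimal Kernel PCA sketch (my own illustration, RBF kernel and toy circular data): instead of the covariance matrix, eigendecompose the centred kernel matrix, which implicitly works with $\phi(\mathbf{x})$:

```python
import numpy as np

# Sketch: kernel PCA via the centred kernel (Gram) matrix.
rng = np.random.default_rng(7)
N = 100
angles = rng.uniform(0, 2 * np.pi, N)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # circle: nonlinear
X += 0.05 * rng.normal(size=X.shape)

# RBF kernel k(x, y) = exp(-||x - y||^2 / width): an implicit phi(x).
sq = np.sum(X ** 2, axis=1)
d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
K = np.exp(-d2 / 0.5)

# Centre the kernel matrix in feature space: K' = H K H, H = I - (1/N) 11^T.
H = np.eye(N) - np.ones((N, N)) / N
Kc = H @ K @ H

eigvals, eigvecs = np.linalg.eigh(Kc)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]       # descending order
alphas = eigvecs[:, :2] / np.sqrt(np.maximum(eigvals[:2], 1e-12))

Z = Kc @ alphas                                          # nonlinear projections
assert Z.shape == (N, 2)
assert eigvals[0] > 0
```

The normalisation of `alphas` makes each kernel principal component have unit norm in feature space, mirroring $\mathbf{u}_i^T \mathbf{u}_i = 1$ in linear PCA.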
Gaussian assumption (PCA vs ICA)

PCA models data as Gaussian distributions (2nd-order statistics), whereas ICA (Independent Component Analysis) captures higher-order statistics.
Holistic bases

PCA bases are holistic (cf. part-based) and less intuitive. ICA or NMF (Non-negative Matrix Factorisation) yields bases which capture local facial components.

Daniel D. Lee and H. Sebastian Seung (1999). "Learning the parts of objects by non-negative matrix factorization". Nature 401(6755): 788-791.