Transcript Document

Feature Generation: Linear Transforms
By Zhang Hongxin
State Key Lab of CAD&CG
2004-03-24
Outline

- Introduction
- PCA and SVD
- ICA
- Other transforms
Introduction

- Goal: choose suitable transforms so as to obtain high "information packing".
- Raw data -> meaningful features.
- Unsupervised / automatic methods.
- The idea is to exploit and remove information redundancies via the transform.
Basis Vectors and Images

- Input samples: x^T = [x(0), x(1), ..., x(N-1)]
- For a unitary N x N matrix A, the transformed vector is y = A^H x
- Basis-vector representation:
  x = A y = \sum_{i=0}^{N-1} y(i) a_i
  \langle a_j, x \rangle = a_j^H x = \sum_{i=0}^{N-1} y(i) \langle a_j, a_i \rangle = y(j)
Basis Vectors and Images (cont.)

- When X is an N x N image, treating it as a single vector makes A a huge piece of bread to eat (N^2 x N^2).
- An alternative possibility: let U and V be two unitary matrices and define
  Y = U^H X V
- Then X expands on the outer products of their columns:
  X = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} Y(i,j) u_i v_j^H
- If Y is diagonal, this reduces to
  X = \sum_{i=0}^{N-1} Y(i,i) u_i v_i^H
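A small sketch of this separable form, under the assumption that U and V are taken from the SVD of X, which makes Y = U^H X V come out diagonal:

import numpy as np

N = 5
rng = np.random.default_rng(1)
X = rng.standard_normal((N, N))          # an N x N "image"

U, s, Vh = np.linalg.svd(X)              # X = U diag(s) V^H
V = Vh.conj().T

Y = U.conj().T @ X @ V                   # Y = U^H X V, diagonal here
print(np.allclose(Y, np.diag(s)))        # True

# Outer-product (basis-image) expansion: X = sum_i Y(i,i) u_i v_i^H
X_rec = sum(Y[i, i] * np.outer(U[:, i], V[:, i].conj()) for i in range(N))
print(np.allclose(X, X_rec))             # True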
The Karhunen-Loeve Transform

- Goal: generate features that are optimally uncorrelated, that is,
  E[y(i) y(j)] = 0,  i \neq j
- Correlation matrix:
  R_y = E[y y^T] = E[A^T x x^T A] = A^T R_x A
- Since R_x is symmetric, A is chosen so that its columns are the orthonormal eigenvectors a_i of R_x; then
  R_y = A^T R_x A = \Lambda
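A sketch of the KL transform on sampled data (an assumption here: the data are zero mean, so the correlation matrix E[x x^T] can be estimated directly from the samples):

import numpy as np

N, n_samples = 3, 100000
rng = np.random.default_rng(2)

# Generate correlated zero-mean data; columns of X are the samples x.
M = rng.standard_normal((N, N))
X = M @ rng.standard_normal((N, n_samples))

R_x = X @ X.T / n_samples                 # correlation matrix E[x x^T]
eigvals, A = np.linalg.eigh(R_x)          # columns of A: orthonormal eigenvectors a_i

Y = A.T @ X                               # y = A^T x for every sample
R_y = Y @ Y.T / n_samples                 # E[y y^T] = A^T R_x A

# R_y is diagonal (the eigenvalues of R_x): the features are uncorrelated.
print(np.allclose(R_y, np.diag(eigvals)))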
Properties of KL transform

- Mean-square-error approximation: with
  x = \sum_{i=0}^{N-1} y(i) a_i,   y(i) = a_i^T x,
  keep only the first m components (the approximation):
  \hat{x} = \sum_{i=0}^{m-1} y(i) a_i
- Error estimation:
  E[ \| x - \hat{x} \|^2 ] = E[ \| \sum_{i=m}^{N-1} y(i) a_i \|^2 ]
    = E[ \sum_{i=m}^{N-1} \sum_{j=m}^{N-1} (y(i) a_i)^T (y(j) a_j) ]
    = \sum_{i=m}^{N-1} E[ y^2(i) ] = \sum_{i=m}^{N-1} a_i^T E[x x^T] a_i
  (using the orthonormality of the a_i).
Principal Component Analysis

- Choosing the eigenvectors corresponding to the m largest eigenvalues of the correlation matrix minimises the error:
  E[ \| x - \hat{x} \|^2 ] = \sum_{i=m}^{N-1} a_i^T \lambda_i a_i = \sum_{i=m}^{N-1} \lambda_i
- This is also the minimum MSE, compared with any other approximation of x by an m-dimensional vector.
- A different form: compute A from the eigen-decomposition of the covariance matrix instead of the correlation matrix.
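A sketch checking this numerically: projecting onto the m leading eigenvectors and measuring E[||x - x_hat||^2] reproduces the sum of the discarded eigenvalues (synthetic zero-mean data and a sample correlation matrix are assumptions of the example):

import numpy as np

N, m, n_samples = 6, 2, 200000
rng = np.random.default_rng(3)
X = rng.standard_normal((N, N)) @ rng.standard_normal((N, n_samples))

R_x = X @ X.T / n_samples
eigvals, A = np.linalg.eigh(R_x)              # ascending eigenvalues
order = np.argsort(eigvals)[::-1]             # sort descending
eigvals, A = eigvals[order], A[:, order]

# Project on the m leading eigenvectors and reconstruct.
Y = A[:, :m].T @ X
X_hat = A[:, :m] @ Y

mse = np.mean(np.sum((X - X_hat) ** 2, axis=0))   # E[||x - x_hat||^2]
print(mse, eigvals[m:].sum())                      # the two numbers agree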
Remarks on PCA

- Total variance: among all possible sets of m features obtained via any orthogonal linear transformation on x, the KL features have the largest sum of variances.
- Entropy:
  H_y = -E[ \ln p_y(y) ]
- For zero-mean Gaussian y,
  H_y = \frac{1}{2} E[ y^T R_y^{-1} y ] + \frac{1}{2} \ln |R_y| + \frac{m}{2} \ln(2\pi)
  with  \ln |R_y| = \ln( \lambda_0 \lambda_1 \cdots \lambda_{m-1} )  and
  E[ y^T R_y^{-1} y ] = E[ \mathrm{trace}( y^T R_y^{-1} y ) ] = E[ \mathrm{trace}( R_y^{-1} y y^T ) ] = \mathrm{trace}(I) = m
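The quadratic-form identity above can be verified by sampling; a small sketch with an arbitrary covariance matrix (random, for illustration only):

import numpy as np

m, n_samples = 4, 500000
rng = np.random.default_rng(4)

L = rng.standard_normal((m, m))
R_y = L @ L.T                                       # a valid covariance matrix
y = rng.multivariate_normal(np.zeros(m), R_y, size=n_samples)

# Per-sample quadratic form y^T R_y^{-1} y; its mean is close to m (= 4).
q = np.einsum('ni,ij,nj->n', y, np.linalg.inv(R_y), y)
print(q.mean())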
Geometry interpretation

- If the data points x_1, x_2, ..., x_N form an ellipsoid-shaped cloud:
  - the eigenvectors are the principal axes of this hyper-ellipsoid;
  - the first principal axis is the line that passes through its greatest dimension.
Singular value decomposition

- SVD of the N x M data matrix X:
  X = [ x_{0,0}     x_{0,1}     ...  x_{0,M-1}
        x_{1,0}     x_{1,1}     ...  x_{1,M-1}
        ...
        x_{N-1,0}   x_{N-1,1}   ...  x_{N-1,M-1} ]  =  U S V^T
  where U and V are unitary matrices and S carries the singular values on its diagonal.
- Then
  R_x = X X^T = (U S V^T)(U S V^T)^T = U S^2 U^T
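A short numpy sketch of these SVD relations, assuming a random real-valued N x M matrix X (so conjugate transposes reduce to ordinary transposes):

import numpy as np

N, M = 4, 6
rng = np.random.default_rng(5)
X = rng.standard_normal((N, M))

U, s, Vh = np.linalg.svd(X, full_matrices=False)        # X = U diag(s) V^T
print(np.allclose(X, U @ np.diag(s) @ Vh))              # True

# The left singular vectors diagonalise X X^T with eigenvalues s^2.
print(np.allclose(X @ X.T, U @ np.diag(s**2) @ U.T))    # True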
An example: Eigenfaces

- G. D. Finlayson, B. Schiele & J. Crowley, "Comprehensive colour image normalisation," ECCV '98, pp. 475-490.
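A hedged sketch of how eigenfaces are computed with the tools above; the "faces" here are random arrays standing in for a real training set, and the names (mean_face, eigenfaces, k) are illustrative only:

import numpy as np

n_faces, h, w = 50, 32, 32
rng = np.random.default_rng(6)
faces = rng.random((n_faces, h * w))              # each row: one flattened image

mean_face = faces.mean(axis=0)
centered = faces - mean_face

# Economy SVD of the centered data: the right singular vectors are the eigenfaces.
U, s, Vh = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vh.reshape(-1, h, w)                 # each slice: one eigenface image

# Project a face onto the first k eigenfaces and reconstruct it.
k = 10
coeffs = (faces[0] - mean_face) @ Vh[:k].T
reconstruction = mean_face + coeffs @ Vh[:k]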
Problem of PCA
1
2
Independent component analysis

- Goal: find independence, rather than mere uncorrelatedness, of the data.
- Given the set of input samples x, determine an N x N invertible matrix W such that the entries y(i) of the transformed vector
  y = W x
  are mutually independent.
- ICA is meaningful only if the involved random variables are non-Gaussian.
ICA based on Second- and Fourth-order Cumulants

- Hint: force the second- and fourth-order cross-cumulants to be zero.
- Step 1. Perform a PCA on the input data:
  \hat{y} = A^T x
- Step 2. Compute another unitary matrix \hat{A} so that the fourth-order cross-cumulants of the components of
  y = \hat{A}^T \hat{y}
  are zero. This is equivalent to the matrix-diagonalization problem
  \max_{\hat{A} \hat{A}^T = I} \Psi(\hat{A}) = \sum_{i=0}^{N-1} \kappa_4( y(i) )^2
- Finally, the independent components are given by the combined transform (see the sketch below)
  y = (A \hat{A})^T x
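A rough sketch of the two-step procedure for a 2-D mixture of uniform (hence non-Gaussian) sources; the exhaustive search over rotation angles is a simplified stand-in for the actual cumulant-based joint diagonalization, used only to keep the example short:

import numpy as np

rng = np.random.default_rng(7)
n = 100000

# Two independent, non-Gaussian (uniform) zero-mean sources, linearly mixed.
S = rng.uniform(-1, 1, size=(2, n))
X = np.array([[2.0, 1.0], [1.0, 1.0]]) @ S

# Step 1: PCA / whitening, so that the second-order cross-cumulants vanish.
R_x = X @ X.T / n
eigvals, A = np.linalg.eigh(R_x)
Y_hat = np.diag(eigvals ** -0.5) @ A.T @ X

def kurtosis(z):
    # Fourth-order auto-cumulant of a zero-mean, unit-variance signal.
    return np.mean(z ** 4) - 3.0

# Step 2: rotate the whitened data; the angle maximising the sum of squared
# fourth-order auto-cumulants (hence near-zero cross-cumulants) gives the
# unmixing rotation A_hat.
angles = np.linspace(0, np.pi / 2, 500)
scores = []
for t in angles:
    Ahat = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    Y = Ahat.T @ Y_hat
    scores.append(kurtosis(Y[0]) ** 2 + kurtosis(Y[1]) ** 2)

best = angles[int(np.argmax(scores))]
Ahat = np.array([[np.cos(best), -np.sin(best)], [np.sin(best), np.cos(best)]])
Y = Ahat.T @ Y_hat            # recovered sources (up to scaling, sign, permutation)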
ICA based on mutual information

- An iterative method.
Other transforms

- Discrete Fourier Transform (see the sketch below)
- Discrete Wavelet Transform
- Please think about the relationship among these linear transforms.
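As a small illustration of how the DFT fits the same framework, a sketch showing that the normalised DFT matrix is one more unitary transform y = A^H x, only with a fixed, data-independent basis (unlike the data-driven KLT/PCA basis):

import numpy as np

N = 8
n = np.arange(N)
# Columns of A are the (normalised) DFT basis vectors; A is unitary.
A = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

print(np.allclose(A.conj().T @ A, np.eye(N)))            # A^H A = I

x = np.random.default_rng(8).standard_normal(N)
y = A.conj().T @ x                                       # y = A^H x
print(np.allclose(y, np.fft.fft(x) / np.sqrt(N)))        # matches numpy's FFT (unitary scaling)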