Transcript Document
Feature Generation: Linear
Transforms
By Zhang Hongxin
State Key Lab of CAD&CG
2004-03-24
Outline
Introduction
PCA and SVD
ICA
Other transforms
Introduction
Goal: choose suitable transforms so as to obtain high “information packing”.
Raw data -> Meaningful features.
Unsupervised/Automatic methods.
Exploit and remove information redundancies via a transform.
Basis Vectors and Images
Input samples
$\mathbf{x}^T = [x(0), x(1), \ldots, x(N-1)]$
Unitary $N \times N$ matrix $A$ and the transformed vector $\mathbf{y} = A^H \mathbf{x}$
Basis vector representation
$$\mathbf{x} = A\mathbf{y} = \sum_{i=0}^{N-1} y(i)\,\mathbf{a}_i$$
$$\langle \mathbf{a}_j, \mathbf{x} \rangle = \mathbf{a}_j^H \mathbf{x} = \sum_{i=0}^{N-1} y(i)\,\langle \mathbf{a}_j, \mathbf{a}_i \rangle = y(j)$$
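A minimal numerical sketch of this analysis/synthesis pair (the particular choice of the unitary matrix A below is illustrative, not taken from the slides):

```python
import numpy as np

# Sketch of the pair y = A^H x and x = A y for an arbitrary unitary A.
# Here A is built from the eigenvectors of a random symmetric matrix,
# which are orthonormal and therefore form a valid basis.
N = 4
rng = np.random.default_rng(0)
M = rng.standard_normal((N, N))
_, A = np.linalg.eigh(M + M.T)                # columns a_i: orthonormal basis vectors

x = rng.standard_normal(N)                    # input sample
y = A.conj().T @ x                            # forward transform  y = A^H x
x_rec = A @ y                                 # synthesis          x = sum_i y(i) a_i

print(np.allclose(x, x_rec))                  # True: the representation is exact
print(np.allclose(y[2], A[:, 2].conj() @ x))  # y(j) = <a_j, x>
```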
Basis Vectors and Images (cont’)
When X is an $N \times N$ image, the transform matrix A becomes prohibitively large ($N^2 \times N^2$).
An alternative possibility:
Let U and V be two unitary $N \times N$ matrices, and $Y = U^H X V$.
Then
$$X = U Y V^H = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} Y(i,j)\,\mathbf{u}_i \mathbf{v}_j^H$$
If Y is diagonal,
$$X = \sum_{i=0}^{N-1} Y(i,i)\,\mathbf{u}_i \mathbf{v}_i^H$$
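As a sketch of how such a pair of unitary matrices can be obtained (the random “image” X below is only for illustration), the SVD of X yields U and V for which Y is exactly diagonal:

```python
import numpy as np

# Sketch: for a square image X, take U and V from the SVD of X.
# Then Y = U^H X V is diagonal and X = sum_i Y(i,i) u_i v_i^H.
rng = np.random.default_rng(1)
N = 8
X = rng.standard_normal((N, N))

U, s, Vh = np.linalg.svd(X)                 # X = U diag(s) V^H
V = Vh.conj().T
Y = U.conj().T @ X @ V                      # Y = U^H X V

print(np.allclose(Y, np.diag(s)))           # True: Y is diagonal
X_rec = sum(Y[i, i] * np.outer(U[:, i], V[:, i].conj()) for i in range(N))
print(np.allclose(X, X_rec))                # True: the outer-product sum recovers X
```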
The Karhunen-Loeve Transform
Goal: to generate features that are
optimally uncorrelated, that is,
$$E[y(i)y(j)] = 0, \quad i \neq j$$
Correlation matrix
$$R_y = E[\mathbf{y}\mathbf{y}^T] = E[A^T\mathbf{x}\mathbf{x}^T A] = A^T R_x A$$
$R_x$ is symmetric, so A is chosen such that its columns are the orthonormal eigenvectors $\mathbf{a}_i$ of $R_x$. Then
$$R_y = A^T R_x A = \Lambda$$
where $\Lambda$ is the diagonal matrix of the eigenvalues of $R_x$.
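A brief numerical sketch of this diagonalization (the synthetic data and the use of a sample average in place of $E[\mathbf{x}\mathbf{x}^T]$ are assumptions for illustration):

```python
import numpy as np

# Sketch of the KL transform: estimate R_x from samples, take its orthonormal
# eigenvectors as the columns of A, and check that R_y = A^T R_x A is diagonal.
rng = np.random.default_rng(2)
n_samples, N = 5000, 3
C = np.array([[4.0, 1.5, 0.5],
              [1.5, 2.0, 0.3],
              [0.5, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(N), C, size=n_samples)   # rows are samples x

R_x = X.T @ X / n_samples               # sample estimate of E[x x^T]
eigvals, A = np.linalg.eigh(R_x)        # columns of A: eigenvectors a_i

Y = X @ A                               # y = A^T x for every sample
R_y = Y.T @ Y / n_samples
print(np.round(R_y, 2))                 # approximately diagonal (Lambda)
```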
Properties of KL transform
Mean square error approximation:
$$\mathbf{x} = \sum_{i=0}^{N-1} y(i)\,\mathbf{a}_i \quad\text{and}\quad y(i) = \mathbf{a}_i^T\mathbf{x}$$
Approximation using only m basis vectors:
$$\hat{\mathbf{x}} = \sum_{i=0}^{m-1} y(i)\,\mathbf{a}_i$$
Error estimation:
$$E\big[\|\mathbf{x}-\hat{\mathbf{x}}\|^2\big] = E\Big[\Big\|\sum_{i=m}^{N-1} y(i)\,\mathbf{a}_i\Big\|^2\Big] = E\Big[\sum_{i=m}^{N-1}\sum_{j=m}^{N-1} \big(y(i)\mathbf{a}_i\big)^T\big(y(j)\mathbf{a}_j\big)\Big] = \sum_{i=m}^{N-1} E\big[y^2(i)\big] = \sum_{i=m}^{N-1}\mathbf{a}_i^T E[\mathbf{x}\mathbf{x}^T]\,\mathbf{a}_i$$
Principal Component Analysis
Choose the eigenvectors corresponding to the m largest eigenvalues of the correlation matrix to obtain the minimal error:
$$E\big[\|\mathbf{x}-\hat{\mathbf{x}}\|^2\big] = \sum_{i=m}^{N-1}\mathbf{a}_i^T\lambda_i\mathbf{a}_i = \sum_{i=m}^{N-1}\lambda_i$$
This is also the minimum MSE, compared with any other approximation of x by an m-dimensional vector.
A different form: compute A in terms of the eigenvectors of the covariance matrix (i.e., after subtracting the mean from x).
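A worked numerical check of the truncation-error formula (the data, the dimension N and the choice m = 2 are illustrative assumptions):

```python
import numpy as np

# Sketch: keep the m eigenvectors with the largest eigenvalues and verify that
# the mean-square reconstruction error equals the sum of discarded eigenvalues.
rng = np.random.default_rng(3)
n_samples, N, m = 20000, 4, 2
B = rng.standard_normal((N, N))
C = B @ np.diag([5.0, 3.0, 1.0, 0.5]) @ B.T   # some correlation structure
X = rng.multivariate_normal(np.zeros(N), C, size=n_samples)

R_x = X.T @ X / n_samples
eigvals, A = np.linalg.eigh(R_x)
order = np.argsort(eigvals)[::-1]             # sort eigenpairs, largest first
eigvals, A = eigvals[order], A[:, order]

A_m = A[:, :m]                                # m principal directions
X_hat = (X @ A_m) @ A_m.T                     # project onto them and reconstruct
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, eigvals[m:].sum())                 # the two agree up to sampling noise
```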
Remarks on PCA
Total variance
Among all possible sets of m features obtained via any orthogonal linear transformation of x, the KL features have the largest total variance.
Entropy
$$H_y = -E[\ln p_y(\mathbf{y})]$$
When $\mathbf{y}$ is zero-mean Gaussian,
$$H_y = \tfrac{1}{2}E[\mathbf{y}^T R_y^{-1}\mathbf{y}] + \tfrac{1}{2}\ln|R_y| + \tfrac{m}{2}\ln(2\pi)$$
where
$$E[\mathbf{y}^T R_y^{-1}\mathbf{y}] = E[\operatorname{trace}(\mathbf{y}^T R_y^{-1}\mathbf{y})] = E[\operatorname{trace}(R_y^{-1}\mathbf{y}\mathbf{y}^T)] = \operatorname{trace}(I) = m$$
and
$$\ln|R_y| = \ln(\lambda_0\lambda_1\cdots\lambda_{m-1})$$
Geometry interpretation
If the data points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ form an ellipsoidally shaped cloud,
the eigenvectors are the principal axes of this hyper-ellipsoid;
the first principal axis is the line that passes through its greatest dimension.
Singular value decomposition
SVD of X:
$$X = \begin{bmatrix} x_{0,0} & x_{0,1} & \cdots & x_{0,M-1} \\ x_{1,0} & x_{1,1} & \cdots & x_{1,M-1} \\ x_{2,0} & x_{2,1} & \cdots & x_{2,M-1} \\ \vdots & \vdots & & \vdots \\ x_{N-1,0} & x_{N-1,1} & \cdots & x_{N-1,M-1} \end{bmatrix} = U S V^T \qquad (N \times M)$$
U and V are unitary matrices; the diagonal entries of S are the singular values of X.
$$R_x = X X^T = (U S V^T)(U S V^T)^T = U S^2 U^T$$
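A quick numerical sketch of this relation (the matrix X is arbitrary): the left singular vectors of X are the eigenvectors of $R_x = XX^T$, and the squared singular values are its eigenvalues.

```python
import numpy as np

# Sketch: verify R_x = X X^T = U S^2 U^T for an arbitrary N x M matrix X.
rng = np.random.default_rng(4)
N, M = 6, 10
X = rng.standard_normal((N, M))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T
R_x = X @ X.T
print(np.allclose(R_x, U @ np.diag(s**2) @ U.T))   # True
```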
An example: Eigenfaces
G. D. Finlayson, B. Schiele & J. Crowley, “Comprehensive colour image normalisation,” ECCV '98, pp. 475-490.
Problem of PCA
Independent component analysis
Goal: find independence, rather than mere uncorrelatedness, of the data.
Given the set of input samples X, determine an $N \times N$ invertible matrix W such that the entries y(i) of the transformed vector
$$\mathbf{y} = W\mathbf{x}$$
are mutually independent.
ICA is meaningful only if the involved random variables are non-Gaussian.
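A small blind-source-separation sketch of this idea (scikit-learn's FastICA uses a negentropy-based contrast rather than the cumulant method on the next slide, and the sources and mixing matrix below are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Sketch: two non-Gaussian sources are linearly mixed; ICA estimates W such
# that the components of y = W x are (approximately) independent again.
rng = np.random.default_rng(5)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(3 * t),
          np.sign(np.sin(7 * t))]             # two independent non-Gaussian sources
Mix = np.array([[1.0, 0.5],
                [0.4, 1.2]])                  # unknown mixing matrix
X = S @ Mix.T                                 # observed mixtures

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)                      # recovered components y = W x
# Y matches S only up to permutation and scaling, as is inherent to ICA.
```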
ICA based on Second and Fourth-order
Cumulants
Hint: force the second- and fourth-order cross-cumulants to be zero.
Step 1. Perform a PCA on the input data:
$$\hat{\mathbf{y}} = A^T\mathbf{x}$$
Step 2. Compute another unitary matrix $\hat{A}$, so that the fourth-order cross-cumulants of the components of
$$\mathbf{y} = \hat{A}^T\hat{\mathbf{y}}$$
are zero. Equivalent to finding
$$\max_{\hat{A}:\,\hat{A}^T\hat{A}=I} \Psi(\hat{A}) = \sum_{i=0}^{N-1}\kappa_4^2\big(y(i)\big)$$
which amounts to a matrix diagonalization problem.
Finally, the independent components are given by the combined transform
$$\mathbf{y} = (A\hat{A})^T\mathbf{x}$$
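A toy 2-D sketch of this two-step procedure (the data are made up, and a plain grid search over rotation angles stands in for the matrix-diagonalization step):

```python
import numpy as np

# Step 1: PCA / whitening; Step 2: search the rotation (unitary matrix) that
# maximizes the sum of squared fourth-order cumulants of the components.
rng = np.random.default_rng(6)
S = np.c_[rng.uniform(-1, 1, 5000),           # two independent non-Gaussian sources
          rng.laplace(size=5000)]
X = S @ np.array([[1.0, 0.6], [0.3, 1.0]]).T  # observed mixtures

X = X - X.mean(axis=0)
eigvals, A = np.linalg.eigh(X.T @ X / len(X))
Y_hat = (X @ A) / np.sqrt(eigvals)            # whitened data: y_hat = A^T x, rescaled

def kurt(v):                                  # fourth-order cumulant (kurtosis)
    return np.mean(v**4) - 3 * np.mean(v**2)**2

def rot(th):                                  # candidate unitary matrix A_hat
    return np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]])

best = max(np.linspace(0, np.pi / 2, 360),
           key=lambda th: sum(kurt((Y_hat @ rot(th))[:, i]) ** 2 for i in range(2)))
Y = Y_hat @ rot(best)                         # independent components (up to scale)
```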
ICA based on mutual
information
An iterative method.
Other transforms
Discrete Fourier Transform
Discrete Wavelet Transform
Please think about the relationship among these linear transforms.
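One way to see the connection (a small sketch; the check against numpy's FFT is only for illustration): the DFT is itself a fixed unitary matrix, so it fits the same $\mathbf{y} = A^H\mathbf{x}$ framework as the KL transform, except that its basis does not depend on the data.

```python
import numpy as np

# Sketch: the normalized DFT matrix is unitary, i.e., a data-independent
# instance of the unitary transforms discussed above.
N = 8
n = np.arange(N)
A = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # unitary DFT matrix

print(np.allclose(A.conj().T @ A, np.eye(N)))               # A is unitary
x = np.random.default_rng(7).standard_normal(N)
print(np.allclose(A @ x, np.fft.fft(x) / np.sqrt(N)))       # matches numpy's FFT
```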