Transcript Document
Feature Generation: Linear Transforms
By Zhang Hongxin, State Key Lab of CAD&CG, 2004-03-24

Outline
- Introduction
- PCA and SVD
- ICA
- Other transforms

Introduction
Goal: choose suitable transforms so as to obtain high "information packing": raw data -> meaningful features.
These are unsupervised/automatic methods that exploit and remove information redundancies via a transform.

Basis Vectors and Images
Input samples: $x^T = [x(0), x(1), \dots, x(N-1)]$.
Given a unitary $N \times N$ matrix $A$, the transformed vector is $y = A^H x$.
Basis vector representation:
$$x = A y = \sum_{i=0}^{N-1} y(i)\, a_i, \qquad \langle a_j, x \rangle = a_j^H x = \sum_{i=0}^{N-1} y(i)\, \langle a_j, a_i \rangle = y(j).$$

Basis Vectors and Images (cont'd)
When $X$ is an $N \times N$ image, the transform matrix $A$ becomes huge ($N^2 \times N^2$) and is impractical to handle.
An alternative possibility: let $U$ and $V$ be two unitary matrices and $Y = U^H X V$, so that
$$X = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} Y(i,j)\, u_i v_j^H.$$
If $U$ and $V$ are chosen so that $Y$ is diagonal, this reduces to
$$X = \sum_{i=0}^{N-1} Y(i,i)\, u_i v_i^H.$$

The Karhunen-Loeve Transform
Goal: generate features that are optimally uncorrelated, that is, $E[y(i)\, y(j)] = 0$ for $i \neq j$.
Correlation matrix: $R_y = E[y y^T] = E[A^T x x^T A] = A^T R_x A$.
Since $R_x$ is symmetric, $A$ is chosen so that its columns are the orthonormal eigenvectors $a_i$ of $R_x$, giving
$$R_y = A^T R_x A = \Lambda.$$

Properties of the KL Transform
Mean-square-error approximation: with $x = \sum_{i=0}^{N-1} y(i) a_i$ and $y(i) = a_i^T x$, keep only the first $m$ terms:
$$\hat{x} = \sum_{i=0}^{m-1} y(i) a_i \quad \text{(an approximation!)}$$
Error estimation:
$$E\big[\|x - \hat{x}\|^2\big] = E\Big[\Big\|\sum_{i=m}^{N-1} y(i) a_i\Big\|^2\Big] = E\Big[\sum_{i \ge m}\sum_{j \ge m} \big(y(i) a_i\big)^T \big(y(j) a_j\big)\Big] = \sum_{i=m}^{N-1} E[y^2(i)] = \sum_{i=m}^{N-1} a_i^T E[x x^T]\, a_i.$$

Principal Component Analysis
Choosing the eigenvectors corresponding to the $m$ largest eigenvalues of the correlation matrix yields the minimal error:
$$E\big[\|x - \hat{x}\|^2\big] = \sum_{i=m}^{N-1} a_i^T \lambda_i a_i = \sum_{i=m}^{N-1} \lambda_i.$$
This is also the minimum MSE compared with any other approximation of $x$ by an $m$-dimensional vector.
A different form: compute $A$ from the eigenvectors of the covariance matrix instead of the correlation matrix.
(A numerical sketch of the KLT/PCA approximation appears below, after the ICA slides.)

Remarks on PCA
Total variance: among all possible sets of $m$ features obtained via any orthogonal linear transformation of $x$, the KL features have the largest sum of variances.
Entropy: $H_y = -E[\ln p_y(y)]$. For zero-mean Gaussian $y$,
$$H_y = \tfrac{1}{2} E[y^T R_y^{-1} y] + \tfrac{1}{2} \ln|R_y| + \tfrac{m}{2} \ln(2\pi),$$
where $\ln|R_y| = \ln(\lambda_0 \lambda_1 \cdots \lambda_{m-1})$ and
$$E[y^T R_y^{-1} y] = E[\operatorname{trace}(y^T R_y^{-1} y)] = E[\operatorname{trace}(R_y^{-1} y y^T)] = \operatorname{trace}(I) = m.$$
Geometric interpretation: if the data points $x_1, x_2, \dots, x_N$ form an ellipsoid-shaped cloud, the eigenvectors are the principal axes of this hyper-ellipsoid, and the first principal axis is the line that passes through its greatest dimension.

Singular Value Decomposition
SVD of the $N \times M$ data matrix $X$:
$$X = \begin{bmatrix} x_{0,0} & x_{0,1} & \cdots & x_{0,M-1} \\ x_{1,0} & x_{1,1} & \cdots & x_{1,M-1} \\ x_{2,0} & x_{2,1} & \cdots & x_{2,M-1} \\ \vdots & \vdots & & \vdots \\ x_{N-1,0} & x_{N-1,1} & \cdots & x_{N-1,M-1} \end{bmatrix} = U S V,$$
where $U$ and $V$ are unitary matrices and $S$ holds the singular values. Then
$$R_x = X X^T = (U S V)(U S V)^T = U S^2 U^T.$$
(A low-rank SVD approximation sketch likewise appears below.)

An example: Eigenfaces.
G. D. Finlayson, B. Schiele & J. Crowley, "Comprehensive Colour Image Normalisation," ECCV '98, pp. 475-490.

Problems of PCA
PCA only decorrelates the features; uncorrelated features are not necessarily independent. This motivates ICA.

Independent Component Analysis
Goal: find independence rather than mere un-correlation of the data.
Given the set of input samples $x$, determine an $N \times N$ invertible matrix $W$ such that the entries $y(i)$ of the transformed vector $y = W x$ are mutually independent.
ICA is meaningful only when the involved random variables are non-Gaussian.

ICA Based on Second- and Fourth-order Cumulants
Hint: force the second- and fourth-order cross-cumulants to zero.
Step 1. Perform a PCA on the input data: $\hat{y} = A^T x$.
Step 2. Compute another unitary matrix $\hat{A}$ so that the fourth-order cross-cumulants of the components of $y = \hat{A}^T \hat{y}$ are zero. This is equivalent to
$$\max_{\hat{A}:\ \hat{A} \hat{A}^T = I}\ \Psi(\hat{A}) = \sum_{i=0}^{N-1} \kappa_4^2\big(y(i)\big),$$
which is solved via matrix diagonalization.
Finally, the independent components are given by the combined transform $y = (A \hat{A})^T x$.
(A toy two-dimensional sketch of this two-step procedure appears below.)

ICA Based on Mutual Information
An iterative method.
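The following sketch is not from the original slides; it is a minimal numpy illustration of the KLT/PCA approximation, with made-up dimensions and synthetic data. It forms the sample correlation matrix $R_x$, keeps the $m$ eigenvectors with the largest eigenvalues, and checks that the reconstruction MSE equals the sum of the discarded eigenvalues.

```python
# Minimal KLT/PCA sketch (synthetic data; dimensions chosen arbitrarily).
import numpy as np

rng = np.random.default_rng(0)
N, n_samples, m = 8, 20000, 3

# Correlated zero-mean samples: x = B s with a random mixing matrix B.
B = rng.normal(size=(N, N))
X = rng.normal(size=(n_samples, N)) @ B.T          # rows are the samples x^T

Rx = X.T @ X / n_samples                           # sample correlation matrix R_x
eigvals, A = np.linalg.eigh(Rx)                    # columns of A are eigenvectors a_i
order = np.argsort(eigvals)[::-1]                  # sort by decreasing eigenvalue
eigvals, A = eigvals[order], A[:, order]

Y = X @ A                                          # y = A^T x for every sample
X_hat = Y[:, :m] @ A[:, :m].T                      # keep only the m largest components

mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print("empirical MSE           :", mse)
print("sum of discarded lambdas:", eigvals[m:].sum())   # the two numbers should agree
```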
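In the same spirit, a small SVD sketch on a synthetic matrix (again an illustration, not from the slides). Note that numpy returns the factorization as U, s, V^T; the rank-k truncation mirrors the diagonal-Y expansion $X = \sum_i Y(i,i)\, u_i v_i^H$ above, and its squared Frobenius error equals the sum of the squared discarded singular values (the Eckart-Young result).

```python
# Minimal SVD / low-rank approximation sketch (synthetic N x M matrix).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 48))                      # N x M data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(s) @ Vt
k = 10
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # keep the k largest singular values

err = np.linalg.norm(X - X_k, "fro") ** 2
print("squared Frobenius error:", err)
print("sum of s_i^2 for i >= k:", (s[k:] ** 2).sum())   # these agree (Eckart-Young)
```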
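To make the two-step cumulant-based ICA concrete, here is a toy two-dimensional sketch under stated assumptions: two independent uniform (hence non-Gaussian) sources are mixed synthetically, step 1 whitens the mixtures via PCA, and step 2 replaces the matrix-diagonalization step with a brute-force search over rotation angles for the unitary matrix maximizing the sum of squared fourth-order cumulants. This illustrates the criterion only; it is not the lecture's exact algorithm.

```python
# Toy 2-D sketch of cumulant-based ICA: PCA whitening + rotation search.
import numpy as np

rng = np.random.default_rng(2)
n = 50000
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))   # independent non-Gaussian sources
M = np.array([[2.0, 1.0], [1.0, 1.5]])                   # mixing matrix
X = S @ M.T                                              # observed mixtures x = M s

# Step 1: PCA / whitening, so the second-order cross-cumulants vanish.
Rx = X.T @ X / n
eigvals, A = np.linalg.eigh(Rx)
Y_hat = (X @ A) / np.sqrt(eigvals)                       # whitened components

def kurt(u):                                             # 4th-order cumulant of a
    return np.mean(u ** 4) - 3.0                         # zero-mean, unit-variance signal

# Step 2: search for the rotation (a unitary A_hat) maximizing sum_i kappa_4(y_i)^2.
best_theta, best_psi = 0.0, -np.inf
for theta in np.linspace(0.0, np.pi / 2, 500):
    c, s = np.cos(theta), np.sin(theta)
    A_hat = np.array([[c, -s], [s, c]])
    psi = sum(kurt((Y_hat @ A_hat)[:, i]) ** 2 for i in range(2))
    if psi > best_psi:
        best_theta, best_psi = theta, psi

c, s = np.cos(best_theta), np.sin(best_theta)
Y = Y_hat @ np.array([[c, -s], [s, c]])                  # recovered components
# Up to sign/permutation/scale, Y should match the sources S:
print(np.corrcoef(Y.T, S.T)[:2, 2:].round(2))
```

In practice a library routine such as sklearn.decomposition.FastICA would be used instead of this brute-force search.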
Other Transforms
- Discrete Fourier Transform
- Discrete Wavelet Transform
Please think about the relationship among these linear transforms.
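As one concrete way to think about that relationship, the sketch below (an added illustration, not part of the original slides) builds the normalized DFT matrix and checks that it is unitary, so the DFT fits the same $y = A^H x$ framework used throughout; the difference is that its basis is fixed and data-independent, whereas the KLT basis is computed from $R_x$ for each data set.

```python
# The normalized DFT as a fixed unitary basis in the y = A^H x framework.
import numpy as np

N = 8
k = np.arange(N)
A = np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)  # columns: Fourier basis vectors

print(np.allclose(A.conj().T @ A, np.eye(N)))             # True: A is unitary

x = np.random.default_rng(3).normal(size=N)
y = A.conj().T @ x                                         # y = A^H x
print(np.allclose(y, np.fft.fft(x) / np.sqrt(N)))          # matches the FFT up to scaling
```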