KERNEL INDEPENDENT COMPONENT ANALYSIS
BY FRANCIS BACH & MICHAEL JORDAN
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2003
Presented by Nagesh Adluru
Goal of the Paper
To perform Independent Component Analysis (ICA) in a novel way that is more flexible and more robust than existing techniques.
Concepts Involved
ICA – Independent Component Analysis
Mutual Information
F – Correlation
RKHS – Reproducing Kernel Hilbert Spaces
CCA – Canonical Correlation Analysis
KICA – Kernel ICA
KGV – Kernel Generalized Variance
ICA – Independent Component Analysis
ICA is unsupervised learning.
We have to estimate the sources x given a set of observations of the mixtures y
(assumption: the components of x are independent).
So we have to estimate a de-mixing matrix W such that
x = Wy
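As a rough illustration of the setup, here is a minimal numpy sketch (the sources, mixing matrix, and sizes are invented for the example):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000

    # Two independent non-Gaussian sources: the components of x.
    x = np.vstack([rng.uniform(-1, 1, N),    # sub-Gaussian source
                   rng.laplace(0, 1, N)])    # super-Gaussian source

    A = np.array([[1.0, 0.5],                # unknown mixing matrix
                  [0.3, 1.0]])
    y = A @ x                                # observed mixtures

    # ICA estimates W with x_hat = W @ y; here the oracle answer is inv(A).
    W = np.linalg.inv(A)
    print(np.allclose(W @ y, x))             # True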
ICA – Independent Component Analysis
ICA is semi-parametric.
Because we know nothing about the distribution of x, that part of the problem is non-parametric.
But we do know that y is a 'linear combination' of the components of x, so the mixing model itself is parametric.
So the problem is semi-parametric, and kernels do well in such situations.
ICA – Independent Component Analysis
If we knew the distribution of x, then we could assume a parametric 'x-space' and find W using a gradient or fixed-point algorithm.
But not in practice!!! So how??
Since we are looking for independent components, we need to maximize independence, i.e. minimize mutual information.
Mutual Information
Mutual information is a quantity that describes the dependence among variables.
It is smallest (zero) exactly when the variables are independent.
So it looks promising to be explored!!!
Prior work has focused on approximations to this term because of the difficulty of estimating it for real-valued variables from finite samples.
Kernels offer better ways.
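For intuition, here is a tiny worked example: the mutual information of two binary variables, computed directly from a made-up joint distribution (the numbers are illustrative only):

    import numpy as np

    # I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )
    p = np.array([[0.4, 0.1],                # joint distribution p(x, y)
                  [0.1, 0.4]])
    px = p.sum(axis=1, keepdims=True)        # marginal p(x)
    py = p.sum(axis=0, keepdims=True)        # marginal p(y)
    print(np.sum(p * np.log(p / (px * py)))) # ~0.19 nats: dependent

    p_ind = px @ py                          # product of the marginals
    print(np.sum(p_ind * np.log(p_ind / (px * py))))  # 0.0: independent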
F – Correlation
The F – Correlation is defined as

    ρ_F = max over f1, f2 in F of corr(f1(x1), f2(x2))

If x1 and x2 are independent then this value is zero, but the converse is what matters here.
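A minimal sketch of estimating ρ_F from two samples, assuming the regularized kernel CCA formulation with a Gaussian kernel (the bandwidth sigma and regularizer kappa are illustrative choices, not values from the paper):

    import numpy as np

    def centered_gram(z, sigma=1.0):
        # Centered Gaussian Gram matrix of a one-dimensional sample z.
        K = np.exp(-(z[:, None] - z[None, :]) ** 2 / (2 * sigma ** 2))
        H = np.eye(len(z)) - 1.0 / len(z)    # centering matrix I - 11'/N
        return H @ K @ H

    def f_correlation(z1, z2, sigma=1.0, kappa=1e-2):
        # First kernel canonical correlation between samples z1 and z2,
        # with the ridge-style regularization of kernel CCA.
        N = len(z1)
        K1, K2 = centered_gram(z1, sigma), centered_gram(z2, sigma)
        R1 = K1 + (N * kappa / 2) * np.eye(N)
        R2 = K2 + (N * kappa / 2) * np.eye(N)
        M = np.linalg.solve(R1, K1 @ K2)     # R1^{-1} K1 K2
        M = np.linalg.solve(R2, M.T).T       # ... times R2^{-1} (R2 symmetric)
        return np.linalg.svd(M, compute_uv=False)[0]

    rng = np.random.default_rng(0)
    a, b = rng.standard_normal(200), rng.standard_normal(200)
    print(f_correlation(a, b))       # small: independent samples
    print(f_correlation(a, a ** 2))  # near 1: dependent, though uncorrelated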
F – Correlation
Converse: if ρ_F is zero, then x1 and x2 are independent.
Is that true?
It is true if the function space F is large enough.
But it is also true if F is restricted to a reproducing kernel Hilbert space based on a Gaussian kernel.
F – Correlation
Since the converse holds even when F is restricted to an RKHS, a mutual information contrast can be defined that is zero only when the two variables are independent.
RKHS – Reproducing Kernel Hilbert Spaces
Operations using kernels can be treated as operations in a Hilbert space of functions.
The reproducing property, f(x) = <f, Φ(x)>, lets computations in the (possibly infinite-dimensional) feature space be carried out with kernel evaluations alone, which is exploitable for computational purposes.
So the maximal correlation between the functions f can be computed as the canonical correlation between the feature maps Φ(x1) and Φ(x2).
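The 'kernel trick' in one toy case: for the polynomial kernel k(u, v) = (u·v)² on R², an explicit feature map Φ exists, and the kernel evaluation equals the inner product of the Φs (the vectors are arbitrary):

    import numpy as np

    # Feature map for k(u, v) = (u . v)^2 on R^2:
    # phi(u) = (u1^2, sqrt(2) u1 u2, u2^2), so k(u, v) = <phi(u), phi(v)>.
    def phi(u):
        return np.array([u[0] ** 2, np.sqrt(2) * u[0] * u[1], u[1] ** 2])

    u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print((u @ v) ** 2)     # kernel evaluation: 1.0
    print(phi(u) @ phi(v))  # same number via the feature space: 1.0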
CCA – Canonical Correlation Analysis
CCA vs PCA:
PCA maximizes the variance of the projection of the distribution of a single random vector.
CCA maximizes the correlation between projections of the distributions of two or more random vectors, built from the covariance blocks C_ij = cov(x_i, x_j).
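A small numpy/scipy sketch of CCA as a generalized eigenproblem (the toy data with one shared latent signal are invented for the example):

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    N = 500
    s = rng.standard_normal(N)                    # shared latent signal
    x1 = np.vstack([s + 0.1 * rng.standard_normal(N),
                    rng.standard_normal(N)])
    x2 = np.vstack([rng.standard_normal(N),
                    s + 0.1 * rng.standard_normal(N)])

    C11, C22 = np.cov(x1), np.cov(x2)
    C12 = ((x1 - x1.mean(1, keepdims=True))
           @ (x2 - x2.mean(1, keepdims=True)).T) / (N - 1)

    # CCA as A w = rho B w with A = [[0, C12], [C21, 0]] and
    # B = block-diag(C11, C22); the top generalized eigenvalue
    # is the first canonical correlation.
    Z = np.zeros_like(C12)
    A = np.block([[Z, C12], [C12.T, Z]])
    B = np.block([[C11, Z], [Z, C22]])
    print(eigh(A, B, eigvals_only=True)[-1])      # close to 1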
CCA – Canonical Correlation Analysis
While PCA leads to an eigenvector problem, CCA leads to a generalized eigenvector problem.
(Eigenvector problem: Av = λv. Generalized eigenvector problem: Av = λBv.)
CCA can easily be kernelized, and also generalized to more than two random vectors.
So the maximal correlation between variables can be found efficiently, which is very nice.
CCA – Canonical Correlation Analysis
Though this kernelization of CCA can help us, the generalization to more than two variables is not an exact measure of mutual independence in terms of the F – Correlation.
But that is not a limitation in practice, both because of good empirical results and because, for ICA, mutual independence can be achieved through pairwise independence.
Kernel ICA
We saw that ρ_F(x1, x2) = 0 if and only if x1 and x2 are independent,
and also that ρ_F can be calculated using kernelized CCA.
So we now have Kernel ICA, not in the sense that the basic ICA is kernelized, but because the contrast function is computed using kernelized CCA.
KICA – Kernel ICA Algorithm
Input: the observations y (and an initial W).
Procedure:
Estimate the set of sources {x = Wy}.
Minimize C(W) = -1/2 log λ_F(K1, ..., Km), where K1, ..., Km are the [N×N] Gram matrices for each component of the estimated source vector.
(Equivalent to generalized CCA, where each of the m vectors is a single-element vector.)
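A sketch of evaluating this contrast, assuming the regularized generalized eigenproblem of kernel CCA with a Gaussian kernel (sigma and kappa are illustrative; the brute-force eigensolve ignores the low-rank speed-ups discussed on the next slide):

    import numpy as np
    from scipy.linalg import eigh

    def centered_gram(z, sigma=1.0):
        K = np.exp(-(z[:, None] - z[None, :]) ** 2 / (2 * sigma ** 2))
        H = np.eye(len(z)) - 1.0 / len(z)         # centering matrix
        return H @ K @ H

    def kica_contrast(W, y, sigma=1.0, kappa=1e-2):
        # C(W) = -1/2 log lambda_F, with lambda_F the smallest generalized
        # eigenvalue of the pencil (K_kappa, D_kappa) built from the Gram
        # matrices of the estimated sources x = W y.
        x = W @ y
        m, N = x.shape
        Ks = [centered_gram(x[i], sigma) for i in range(m)]
        Rs = [K + (N * kappa / 2) * np.eye(N) for K in Ks]
        # Off-diagonal blocks Ki Kj; diagonal blocks (Ki + eps I)^2.
        Kk = np.block([[Rs[i] @ Rs[i] if i == j else Ks[i] @ Ks[j]
                        for j in range(m)] for i in range(m)])
        Dk = np.block([[Rs[i] @ Rs[i] if i == j else np.zeros((N, N))
                        for j in range(m)] for i in range(m)])
        lam = eigh(Kk, Dk, eigvals_only=True)[0]  # smallest eigenvalue
        return -0.5 * np.log(lam)

    rng = np.random.default_rng(0)
    x = np.vstack([rng.uniform(-1, 1, 100), rng.laplace(0, 1, 100)])
    A = np.array([[1.0, 0.5], [0.3, 1.0]])
    y = A @ x
    print(kica_contrast(np.linalg.inv(A), y))     # small: independent sources
    print(kica_contrast(np.eye(2), y))            # larger: mixtures dependent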
KICA – Kernel ICA
The computational complexity of calculating the 'smallest' generalized eigenvalue of matrices of size mN is O(N³). (Note: the eigenvalues are not directly related to the entries of W.)
But we can reduce this to O(M²N), where M ≪ N, because of special properties of the Gram matrix spectrum: its eigenvalues decay very fast, so the matrix is well approximated by a low-rank factorization.
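One standard low-rank route is incomplete (pivoted) Cholesky of the Gram matrix; a minimal sketch (the tolerance is an illustrative choice):

    import numpy as np

    def incomplete_cholesky(K, tol=1e-6):
        # Pivoted low-rank Cholesky, K ~ G @ G.T. The fast spectral decay
        # of Gaussian Gram matrices keeps the rank M far below N.
        N = K.shape[0]
        d = np.diag(K).astype(float).copy()       # residual diagonal
        G = np.zeros((N, 0))
        while d.sum() > tol and G.shape[1] < N:
            j = int(np.argmax(d))                 # pivot on largest residual
            g = (K[:, j] - G @ G[j]) / np.sqrt(d[j])
            G = np.column_stack([G, g])
            d = np.maximum(d - g ** 2, 0.0)
        return G

    z = np.random.default_rng(0).standard_normal(500)
    K = np.exp(-(z[:, None] - z[None, :]) ** 2 / 2)
    G = incomplete_cholesky(K)
    print(G.shape[1], np.abs(K - G @ G.T).max())  # rank M << 500, tiny error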
KICA – Kernel ICA
The next crucial job is to find the W minimizing C(W); that W is called the de-mixing matrix.
Preferably the data is whitened (PCA) first, and W is restricted to be orthogonal: independent components are necessarily uncorrelated, so after whitening the search can be confined to orthogonal matrices.
The search for W in this restricted space (called a Stiefel manifold) can be done with a Riemannian metric, suggesting gradient-type algorithms.
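A minimal sketch of one such gradient step on the orthogonal group (this projection/QR-retraction scheme is one common choice, not necessarily the exact update of the paper):

    import numpy as np

    def riemannian_step(W, egrad, lr=0.1):
        # Project the Euclidean gradient onto the tangent space at W
        # (skew-symmetric part), take a step, then retract back onto
        # the orthogonal matrices with a QR decomposition.
        A = W.T @ egrad
        rgrad = W @ (A - A.T) / 2
        Q, R = np.linalg.qr(W - lr * rgrad)
        return Q * np.sign(np.diag(R))            # fix column signs

    # Toy check: the step preserves orthogonality.
    rng = np.random.default_rng(0)
    W = np.linalg.qr(rng.standard_normal((3, 3)))[0]
    W = riemannian_step(W, rng.standard_normal((3, 3)))
    print(np.allclose(W.T @ W, np.eye(3)))        # True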
KICA – Kernel ICA
The problem of local minima can be mitigated by using heuristics (instead of random initialization) for selecting the initial W.
It has also been shown empirically that a modest number of restarts solves this problem when a large number of samples is available.
KGV – Kernel Generalized Variance
The F – Correlation corresponds to the 'smallest' generalized eigenvalue of KCCA.
The idea of KGV is to make use of the other eigenvalues as well.
The mutual information contrast function is defined as

    δ_F = -1/2 log( det K_κ / det D_κ )

where K_κ is the mN × mN matrix of Gram-matrix blocks from the KCCA problem and D_κ is its block-diagonal part.
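A sketch of this contrast, reusing the block construction from the KCCA sketch above (kappa is again an illustrative regularizer; slogdet keeps the log-determinants numerically stable):

    import numpy as np

    def kgv_contrast(grams, kappa=1e-2):
        # -1/2 log(det K_kappa / det D_kappa): uses the whole generalized
        # spectrum rather than only the smallest eigenvalue. `grams` holds
        # the centered N x N Gram matrices K1..Km of the estimated sources.
        m, N = len(grams), grams[0].shape[0]
        Rs = [K + (N * kappa / 2) * np.eye(N) for K in grams]
        Kk = np.block([[Rs[i] @ Rs[i] if i == j else grams[i] @ grams[j]
                        for j in range(m)] for i in range(m)])
        Dk = np.block([[Rs[i] @ Rs[i] if i == j else np.zeros((N, N))
                        for j in range(m)] for i in range(m)])
        return -0.5 * (np.linalg.slogdet(Kk)[1] - np.linalg.slogdet(Dk)[1])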
Simulation Results
The results on simulated data showed that KICA does better than other ICA algorithms such as FastICA, Jade, and Imax for larger numbers of components.
The simulated data were mixtures of a variety of source distributions: sub-Gaussian, super-Gaussian, and nearly Gaussian.
KICA is also robust to outliers.
Conclusions
This paper proposed novel kernel-based measures of independence.
The approach is flexible but computationally demanding (because of the additional eigenvalue computations in the search).
Questions!!