Newton Method for the ICA Mixture Model

Jason A. Palmer (1), Scott Makeig (1), Ken Kreutz-Delgado (2), Bhaskar D. Rao (2)
(1) Swartz Center for Computational Neuroscience
(2) Dept. of Electrical and Computer Engineering
University of California San Diego, La Jolla, CA
Introduction
• Want to model sensor array data with multiple
independent sources — ICA
• Non-stationary source activity — mixture model
• Want the adaptation to be computationally
efficient — Newton method
Outline
• ICA mixture model
• Basic Newton method
• Positive definiteness of Hessian when model
source densities are true source densities
• Newton for ICA mixture model
• Example applications to analysis of EEG
ICA Mixture Model—toy example
• 3 models in two dimensions, 500 points per
model
• Newton method converges in fewer than 200 iterations;
natural gradient fails to converge and has difficulty
on poorly conditioned models
[Figure: two panels of the two-dimensional toy data; both axes span -10 to 10]
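A minimal sketch of how toy data of this kind could be generated: three models in two dimensions, 500 points per model. The Laplacian sources, the random mixing matrices, the uniformly drawn centers, and the name `make_toy_data` are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def make_toy_data(n_models=3, dim=2, n_per_model=500, seed=0):
    """Draw points from several linear ICA models (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    xs, labels, mixing, centers = [], [], [], []
    for h in range(n_models):
        A = rng.standard_normal((dim, dim))        # assumed random mixing matrix A_h
        c = rng.uniform(-5.0, 5.0, size=dim)       # assumed model center c_h
        s = rng.laplace(size=(dim, n_per_model))   # assumed super-Gaussian sources
        xs.append(A @ s + c[:, None])              # x = A_h s + c_h
        labels.append(np.full(n_per_model, h))     # which model generated each point
        mixing.append(A)
        centers.append(c)
    return np.hstack(xs), np.concatenate(labels), mixing, centers

X, labels, mixing, centers = make_toy_data()
```

Each column of `X` is one observation, and `labels` records the generating model so that a fitted segmentation could be compared against it.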
ICA Mixture Model
• Want to model observations x(t), t = 1, ..., N,
with different models “active” at different times
• Bayesian linear mixture model, h = 1, ..., M
• Conditionally linear given the model
• Samples are modeled as independent in time
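The three modeling statements on this slide can be written out as follows. This is a sketch in assumed notation (model probabilities \gamma_h, mixing matrices A_h, centers c_h, and per-model parameters \theta_h are names chosen here), not necessarily the authors' exact formulation:

```latex
\begin{align*}
p(x(t); \Theta) &= \sum_{h=1}^{M} \gamma_h \, p(x(t) \mid h; \theta_h),
  \qquad \gamma_h \ge 0, \quad \sum_{h=1}^{M} \gamma_h = 1
  && \text{(mixture over models)} \\
x(t) &= A_h \, s(t) + c_h
  && \text{(linear given model } h\text{)} \\
p(x(1), \ldots, x(N); \Theta) &= \prod_{t=1}^{N} p(x(t); \Theta)
  && \text{(independence in time)}
\end{align*}
```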
Source Density Mixture Model
• Each source density mixture component has
unknown location, scale, and shape:
• Generalizes Gaussian
mixture model, more
peaked, heavier tails
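A minimal sketch of a source density built as a mixture of generalized Gaussians, each with its own location, scale, and shape. The parameter names (`alpha` for weights, `mu` for locations, `beta` for inverse squared scales, `rho` for shapes) and the square-root scale parametrization are assumptions for illustration. With `rho = 2` a component is Gaussian; smaller `rho` gives a more peaked, heavier-tailed component.

```python
import math
import numpy as np

def gen_gauss_pdf(s, rho):
    """Unit-scale generalized Gaussian: rho / (2 Gamma(1/rho)) * exp(-|s|^rho)."""
    return rho / (2.0 * math.gamma(1.0 / rho)) * np.exp(-np.abs(s) ** rho)

def source_mixture_pdf(s, alpha, mu, beta, rho):
    """Mixture sum_j alpha_j sqrt(beta_j) p(sqrt(beta_j) (s - mu_j); rho_j)."""
    s = np.asarray(s, dtype=float)
    total = np.zeros_like(s)
    for a, m, b, r in zip(alpha, mu, beta, rho):
        scale = math.sqrt(b)                       # assumed scale parametrization
        total += a * scale * gen_gauss_pdf(scale * (s - m), r)
    return total

# Two components: a Laplacian (rho = 1) and a Gaussian (rho = 2).
grid = np.linspace(-40.0, 40.0, 400001)
pdf = source_mixture_pdf(grid, alpha=[0.6, 0.4], mu=[-1.0, 2.0],
                         beta=[1.0, 0.25], rho=[1.0, 2.0])
area = float(np.sum(pdf) * (grid[1] - grid[0]))    # numerical check of normalization
```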
ICA Mixture Model—Invariances
• The complete set of parameters to be
estimated is:
h = 1, ..., M, i = 1, ..., n, j = 1, ..., m
• Invariances: W row norm/source density scale
and model centers/source density locations:
Basic ICA Newton Method
• Transform gradient (1st derivative) of cost
function using inverse Hessian (2nd derivative)
• Cost function is data log likelihood:
• Gradient:
• Natural gradient (positive definite transform):
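A minimal sketch of the gradient and the natural gradient for a single ICA model. It assumes a sign convention (minimizing the negative mean log likelihood), source estimates y = W x, and the source density p(s) = 1/(pi cosh(s)), whose score function is f(y) = tanh(y). These choices and the function names are illustrative and are not taken from the slides.

```python
import numpy as np

def neg_log_lik(W, X):
    """Negative mean log likelihood for y = W x with p(s) = 1 / (pi cosh(s))."""
    Y = W @ X
    log_p = -np.log(np.pi) - np.log(np.cosh(Y))
    return -np.log(abs(np.linalg.det(W))) - log_p.sum(axis=0).mean()

def gradient(W, X):
    """Ordinary gradient: -W^{-T} + E[f(y) x^T], with f(y) = tanh(y)."""
    Y = W @ X
    return -np.linalg.inv(W).T + np.tanh(Y) @ X.T / X.shape[1]

def natural_gradient(W, X):
    """Gradient post-multiplied by W^T W: (-I + E[f(y) y^T]) W."""
    Y = W @ X
    n = W.shape[0]
    return (-np.eye(n) + np.tanh(Y) @ Y.T / X.shape[1]) @ W

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 2)) @ rng.laplace(size=(2, 1000))
W = np.eye(2) + 0.1 * rng.standard_normal((2, 2))
G = gradient(W, X)
G_nat = natural_gradient(W, X)
```

The natural gradient avoids the matrix inverse, which is one reason it is a common baseline for ICA.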
Newton Method – Hessian
• Take derivative of (i,j)th element of gradient
with respect to (k,l)th element of W :
• This defines a linear transform
• In matrix form, this is:
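For reference, the derivative described on this slide can be written out under one common convention, assumed here: the cost is the negative mean log likelihood, y = W x, and f_i = -(\log p_i)', so that the gradient is G = -W^{-T} + E\{f(y) x^T\}. The symbols G, f, and H are notation chosen for this sketch.

```latex
\begin{align*}
G_{ij} &= -[W^{-1}]_{ji} + E\{ f_i(y_i)\, x_j \} \\
\frac{\partial G_{ij}}{\partial W_{kl}}
  &= [W^{-1}]_{jk}\,[W^{-1}]_{li} + \delta_{ik}\, E\{ f_i'(y_i)\, x_j x_l \} \\
[H(B)]_{ij} &= \sum_{k,l} \frac{\partial G_{ij}}{\partial W_{kl}}\, B_{kl} \\
H(B) &= W^{-T} B^{T} W^{-T} + E\{ \operatorname{diag}(f'(y))\, B\, x x^{T} \}
\end{align*}
```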
Newton Method – Hessian
• To invert: rewrite the Hessian transformation
in terms of the source estimates:
• Define
• Want to solve linear equation
Newton Method – Hessian
• The Hessian transformation can be simplified
using source independence and zero mean:
• This leads to 2x2 block diagonal form:
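A sketch of how this simplification can go, in assumed notation. Take H(B) = W^{-T} B^T W^{-T} + E\{\operatorname{diag}(f'(y)) B x x^T\} as the Hessian transform of the negative mean log likelihood, with y = W x and f_i = -(\log p_i)'. Substituting B = \tilde{B} W and post-multiplying by W^T expresses everything in terms of the source estimates. Independence and zero mean of the y_i then remove all cross terms except those coupling the (i,j) and (j,i) entries:

```latex
\begin{align*}
\tilde{C} &= H(\tilde{B} W)\, W^{T}
  = \tilde{B}^{T} + E\{ \operatorname{diag}(f'(y))\, \tilde{B}\, y y^{T} \} \\
\tilde{C}_{ii} &= (1 + \eta_i)\, \tilde{B}_{ii},
  \qquad \eta_i = E\{ f_i'(y_i)\, y_i^2 \} \\
\begin{bmatrix} \tilde{C}_{ij} \\ \tilde{C}_{ji} \end{bmatrix}
  &= \begin{bmatrix} \kappa_i \sigma_j^2 & 1 \\ 1 & \kappa_j \sigma_i^2 \end{bmatrix}
     \begin{bmatrix} \tilde{B}_{ij} \\ \tilde{B}_{ji} \end{bmatrix},
  \qquad i \ne j, \quad \kappa_i = E\{ f_i'(y_i) \}, \quad \sigma_i^2 = E\{ y_i^2 \}
\end{align*}
```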
Newton Direction
• Invert Hessian transformation, evaluate at
gradient:
• Leads to the following equations:
• Calculate the Newton direction:
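A minimal sketch of computing a Newton direction by inverting the 2x2 block-diagonal form entry by entry. It assumes the negative mean log likelihood as the cost, the source density p(s) = 1/(pi cosh(s)) with f(y) = tanh(y), and the quantities kappa_i = E[f'(y_i)], sigma_i^2 = E[y_i^2], eta_i = E[f'(y_i) y_i^2]. The function name and return values are hypothetical.

```python
import numpy as np

def newton_direction(W, X):
    """Return (dW, B, G, a, eta) for the negative mean log likelihood.

    Assumes the source density p(s) = 1 / (pi cosh(s)), so f(y) = tanh(y).
    """
    Y = W @ X
    n, N = W.shape[0], X.shape[1]
    f = np.tanh(Y)
    fp = 1.0 - f ** 2                      # f'(y)
    G = -np.eye(n) + f @ Y.T / N           # gradient post-multiplied by W^T
    sigma2 = (Y ** 2).mean(axis=1)         # sigma_i^2
    kappa = fp.mean(axis=1)                # kappa_i
    eta = (fp * Y ** 2).mean(axis=1)       # eta_i
    a = np.outer(kappa, sigma2)            # a[i, j] = kappa_i * sigma_j^2
    det = a * a.T - 1.0                    # determinants of the 2x2 blocks
    np.fill_diagonal(det, 1.0)             # diagonal entries are overwritten below
    B = (a.T * G - G.T) / det              # off-diagonal solution of each 2x2 system
    B[np.diag_indices(n)] = np.diag(G) / (1.0 + eta)
    return -B @ W, B, G, a, eta

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
X = A @ rng.laplace(size=(3, 5000))
W = np.linalg.inv(A) + 0.05 * rng.standard_normal((3, 3))
dW, B, G, a, eta = newton_direction(W, X)
W_new = W + dW                             # Newton update with step size 1
```

The cost per iteration is dominated by the same sample averages the natural gradient needs, since each 2x2 system is solved in closed form.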
Positive Definiteness of Hessian
• Conditions for positive
definiteness:
• Always true when the model source
densities match the true source densities:
1)
2)
3)
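One standard form of such conditions, stated here as an assumption rather than recovered from the slide, uses eta_i = E[f_i'(y_i) y_i^2], kappa_i = E[f_i'(y_i)], and sigma_i^2 = E[y_i^2] with f_i = -(log p_i)': 1 + eta_i > 0, kappa_i > 0, and sigma_i^2 sigma_j^2 kappa_i kappa_j - 1 > 0. The sketch below checks them by numerical integration for one matched case, where the model density and the true density are both taken to be p(s) = 1/(pi cosh(s)), an illustrative choice.

```python
import numpy as np

# Grid for numerical integration; the density decays exponentially in the tails.
s = np.linspace(-60.0, 60.0, 1200001)
ds = s[1] - s[0]
p = 1.0 / (np.pi * np.cosh(s))   # assumed true (and model) source density
f = np.tanh(s)                   # score function f = -(log p)'
fp = 1.0 - f ** 2                # f'

def expect(g):
    """Expectation of g(s) under p, by a Riemann sum."""
    return float(np.sum(g * p) * ds)

kappa = expect(fp)
sigma2 = expect(s ** 2)
eta = expect(fp * s ** 2)

cond1 = 1.0 + eta > 0.0
cond2 = kappa > 0.0
cond3 = (sigma2 * kappa) ** 2 - 1.0 > 0.0   # i and j share the same density here
```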
Newton for ICA Mixture Model
• Similar derivation applies to ICA mixture model:
Convergence Rates
• Convergence is much faster than with the natural
gradient, and works with step size 1!
• Need correct source density model
[Figure: log likelihood versus iteration; y-axis from -2.03 to -1.97, x-axis ticks from 20 to 180]
Segmentation of EEG experiment trials
[Figure: segmentations with 3 models and with 4 models; panels show trial versus time and log likelihood versus iteration]
Applications to EEG—Epilepsy
[Figure: 1 model and 5 models; panels show log likelihood versus time and the log likelihood difference from the single model versus time]
Conclusion
• We applied the method of Amari, Cardoso, and
Laheld to formulate a Newton method for the
ICA mixture model
• Arbitrary source densities modeled with a non-Gaussian source mixture model
• Non-stationarity modeled with ICA mixture
model (multiple mixing matrices learned)
• It works! The Newton method is substantially
faster (superlinear convergence) and can
converge when the natural gradient fails
Code
• Matlab code is available:
– Generate toy mixture model data for testing
– Full method implemented: mixture sources,
mixture ICA, Newton
• Extended version of paper in preparation, with
derivation of mixture model Newton updates
• Download from:
http://sccn.ucsd.edu/~jason
Acknowledgements
• Thanks to Scott Makeig, Howard Poizner, Julie
Onton, Ruey-Song Hwang, Rey Ramirez, Diane
Whitmer, and Allen Gruber for collecting and
consulting on EEG data
• Thanks to Jerry Swartz for founding and
providing ongoing support for the Swartz Center
for Computational Neuroscience
• Thanks for your attention!