Human brain dynamics and synchrony measures, applications

Download Report

Transcript Human brain dynamics and synchrony measures, applications

Quantifying
statistical
Analyzing
Brain
Signals
interdependence of point processes
by Combinatorial Optimization
Application to spike data and EEG
Justin Dauwels
LIDS, MIT
Amari Research Unit, Brain Science Institute, RIKEN
December 1, 2008
Topics
• Mathematical problem
Similarity of Multiple Point Processes
• Motivation/Application
Early diagnosis of Alzheimer’s disease from EEG signals
• Along the way…
Spike synchrony
Collaborators
François Vialatte*, Theophane Weber+, and Andrzej Cichocki*
Financial Support
(*RIKEN, +MIT)
Alzheimer's disease
One disease,
many symptoms
memory, language, executive functions,
apraxia, apathy, agnosia, etc…
Memory
(forgetting
relatives)
Apathy
Video sources: Alzheimer society
• 2% to 5% of people over 65 years old
• up to 20% of people over 80
Evolution of the disease (stages)
•
-
2 to 5 years before
mild cognitive impairment (MCI)
6 to 25 % progress to Alzheimer‘s
•
-
Mild (early stage)
becomes less energetic or spontaneous
noticeable cognitive deficits
still independent (able to compensate)
•
-
Moderate (middle stage)
Mental abilities decline
personality changes
become dependent on caregivers
•
-
Severe (late stage)
complete deterioration of the personality
loss of control over bodily functions
total dependence on caregivers
EEG data
Jeong 2004 (Nature)
GOAL: Diagnosis of MCI based on EEG
• EEG is relatively simple and inexpensive technology
• Early diagnosis: medication more effective, more time to prepare future care of patient, etc.
Overview

Alzheimer’s Disease (AD)
decrease in EEG synchrony

Similarity of Point Processes
 Two
1-D point processes
 Two multi-D point processes
 Multiple multi-D point processes
Numerical Results
 Conclusion

Alzheimer's disease
Inside glimpse: abnormal EEG
EEG system: inexpensive, mobile, useful for screening
Brain “slow-down”
slow rhythms (0.5-8 Hz)
fast rhythms (8-30 Hz)
(Babiloni et al., 2004; Besthorn et al., 1997; Jelic et al. 1996, Jeong 2004; Dierks et al., 1993).
Decrease of synchrony
•
•
•
focus of this project
AD vs. MCI
(Hogan et al. 203; Jiang et al., 2005)
AD vs. Control (Hermann, Demilrap, 2005, Yagyu et al. 1997; Stam et al., 2002; Babiloni et al. 2006)
MCI vs. mildAD (Babiloni et al., 2006).
Images: www.cerebromente.org.br
Spontaneous (scalp) EEG
Time-frequency |X(t,f)|2
(wavelet transform)
f (Hz)
Time-frequency patterns
(“bumps”)
Fourier |X(f)|2
Fourier power
amplitude
t (sec)
EEG x(t)
Sparse representation: bump model
f(Hz)
f(Hz)
Bumps
Sparse
representation
104-
105
coefficients
t (sec)
t (sec)
f(Hz)
t (sec)
Assumptions:
about 102 parameters
1. time-frequency map is suitable representation
2. oscillatory bursts (“bumps”) convey key information
F. Vialatte et al. “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).
Similarity of bump models
How “similar” are n ≥ 2 bump models?
Similarity of multiple multi-dimensional point processes
with
and
“point” / ”event”
Overview

Alzheimer’s Disease (AD)
decrease in EEG synchrony

Similarity of Point Processes
 Two
1-dim point processes
 Two multi-dim point processes
 Multiple multi-dim point processes
Numerical Results
 Conclusion

Two one-dimensional point processes
x
0
t
0
t
x’
How synchronous/similar?
Classical methods for continuous time series fail
e.g., cross-correlation
Two aspects of synchrony
Analogy: waiting for a train
• Train may not arrive (e.g., mechanical problem)
= Event reliability
• Train may or may not be on time
= Timing precision
Two 1-dim point processes
Review of Spike Synchrony Measures
 Surrogate Spike Data
 Spike Trains from Morris-Lecar Neuron
 Conclusion

Spike Synchrony Measures






Von Rossum distance (mixed)
Schreiber et al similarity measure (mixed)
Hunter-Milton similarity measure (mixed)
Victor-Purpura distance metric (event reliability)
Event synchronization (mixed)
Stochastic event synchrony
(timing precision and event reliability)
Van Rossum distance measure
• Spikes convolved with exponential or Gaussian function
→ spike trains converted into time series s(t) and s’(t)
x
x’
0
0
τR
• Squared distance between s(t) and s’(t)
• If x = x’, we have DR = 0
• Time constant τR
van Rossum M.C.W., 2001. A novel spike distance. Neural Computation 13, 751–63.
Schreiber et al. similarity measure
• Spikes convolved with exponential or Gaussian function
→ spike trains converted into time series s(t) and s’(t)
• Correlation between s(t) and s’(t)
• If x = x’, we have SS = 1
• Time constant τS
Schreiber S., Fellous J.M., Whitmer J.H., Tiesinga P.H.E., and Sejnowski T.J., 2003. A new correlation-based measure of spike timing reliability.
Neurocomputing 52, 925–931.
Victor-Purpura distance measure
• Minimal cost DV of transforming x into x'
• Basic operations
• event insertion/deletion: cost = 1
• event movement: cost proportional to distance (constant CV)
DELETION
x
x’
0
0
INSERTION
• If x = x’, we have DV = 0
• Time constant τV = 1/CV
Victor J. D. and Purpura K. P., 1997. Metric-space analysis of spike trains: theory, algorithms, and application. Network: Comput. Neural Systems 8(17), 127–164.
Stochastic Event Synchrony
• x and x’ synchronous if identical apart from
• delay
• little timing jitter
• few deletions/insertions
• based on generative statistical model
x
v
0
0
x’
0
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
Stochastic Event Synchrony
x
non-coincident
x
0
T0
0
T0
v
0
0
T0
-δt /2
T0
δt /2
x
x’
0
non-coincident
T0
Stochastic event synchrony (SES): delay δt , jitter st , non-coincidence ρ
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
Stochastic Event Synchrony
x
non-coincident
x
0
i.i.d. deletions with prob pd
T0
Gaussian offsets with
mean -δt /2 and variance st /2
0
T0
geometric prior for lenght
v
0
T0
-δt /2
events i.u.d. in [0,T0]
Gaussian offsets with
mean δt /2 and variance st /2
0
T0
δt /2
x
x’
0
non-coincident
i.i.d. deletions with prob pd
T0
Marginalizing over v:
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
Probabilistic inference
PROBLEM: Given 2 point processes x and x’, compute ρ and θ = δt , st
APPROACH:
(j*, j’*,θ*) = argmaxj,j’,θ log p(x, x’, j, j’,θ)
SOLUTION: Coordinate descent
(j(i+1) , j’(i+1) ) = argmaxj,j’ log p(x, x’, j , j’ , θ(i))
θ(i+1) = argmaxx log p(x, x’, j(i+1) , j’(i+1) , θ)
DYNAMIC PROGRAMMING
PARAMETER ESTIMATION
x’6
x’5
x’4
x’3
x’2
x’1
xk non-coincident
0
0 x1 x2 x3 x4 x5 x6
x’k’ non-coincident
(xk x’k’ ) coincident pair
Spike Synchrony Measures






Von Rossum distance (mixed)
Schreiber et al similarity measure (mixed)
Hunter-Milton similarity measure (mixed)
Victor-Purpura distance metric (event reliability)
Event synchronization (mixed)
Stochastic event synchrony
(timing precision and event reliability)
Two 1-dim point processes
Review of Spike Synchrony Measures
 Surrogate Spike Data
 Spike Trains from Morris-Lecar Neuron
 Conclusion

Surrogate Data
• pd = 0, 0.1, …, 0.4 (deletion probability)
• δt = 0, 25, and 50 ms (delay)
• σt = 10, 30, and 50 ms (timing jitter)
• length of hidden sequence = 40/(1-pd)
• expected length of x and x’ = 40
E{S} computed over 10’000 pairs
Surrogate Data: Results
δt =0
Van Rossum measure DR
similar for SS ,SH ,SQ
Victor Purpura measure DV
• E{DR} increases with pd and σt
→ DR cannot distinguish timing dispersion from event reliability
(likewise all measures except SES and DV)
• E{DV} increases with pd, practically independent of σt
→ DV measure for event reliability
• ONLY curves for δt = 0ms, measures strongly depend on lag
Surrogate Data: Results for SES
• E{σt} increases with σt, practically independent of pd
→ σt measure for timing dispersion
•E{ρ} increases with pd, practically independent of σt
→ ρ measure for event reliability
• Curves for δt = 0, 25, and 50 ms practically coincident
Two 1-dim point processes
Review of Spike Synchrony Measures
 Surrogate Spike Data
 Spike Trains from Morris-Lecar Neuron
 Conclusion

Morris-Lecar Neurons
• Simple neuron model
• Exhibits behavior of Type I & II neurons (saddle-node/Hopf bifurc.)
• Input current: baseline + sinusoid + Gaussian noise
Spiking threshold
• Membrane potential
Type II
Type I
5 trials
Morris-Lecar Neurons (2)
Type II
Type I
50 trials
High reliability
Large timing dispersion
jitter st = (15ms)2, non-coincidence ρ = 3%
Low reliability
Small timing dispersion
jitter st = (3ms)2, non-coincidence ρ = 27%
Morris-Lecar Neurons: Results
• Small τ: Type II has larger similarity than type I (dispersion in Type I)
• Large τ: Type I has larger similarity than type II (drop-outs in Type II)
• Observation:
Similarity depends on time constant τ → similarity FUNCTION S(τ)
SES AUTOMATICALLY selects st
Two 1-dim point processes
Review of Spike Synchrony Measures
 Surrogate Spike Data
 Spike Trains from Morris-Lecar Neuron
 Conclusion

Conclusion
• Similarity of pairs of spike trains: timing precision and reliability
• Comparison of various spike synchrony measures
• Most measures not able to separate the two aspect of synchrony
• Exception: Victor-Purpura and Stochastic Event Synchrony
• Victor-Purpura: event reliability
• SES: both timing precision and event reliability
• Most measures depend on time constant, to be chosen by user
• Exception: Event Synchronization and SES
• Most measures sensitive to lags between the two spike trains
• Exception: SES
• Future work: application to neurophysiological recordings
Overview

Alzheimer’s Disease (AD)
decrease in EEG synchrony

Similarity of Point Processes
 Two
1-dim point processes
 Two multi-dim point processes
 Multiple multi-dim point processes
Numerical Results
 Conclusion

Similarity of two bump models...
... by matching bumps
• Bumps in one model, but NOT in other
→ fraction of “non-coincident” bumps ρ
• Bumps in both models, but with offset
→ Average time offset δt (delay)
→ Timing jitter with variance st
→ Average frequency offset δf
→ Frequency jitter with variance sf
Stochastic Event Synchrony (SES)
= (ρ, δt, st, δf, sf )
PROBLEM: Given two bump models, compute (ρ, δt, st, δf, sf )
Generative model
yhidden
Generate bump model (hidden)
• geometric prior for number of bumps
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
Generate two “noisy” observations
y y’
( -δt /2, -δf /2)
( δt /2, δf /2)
• offset between hidden and observed bump
= Gaussian random vector with
mean ( ±δt /2, ±δf /2)
covariance diag(st/2, sf /2)
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability pd
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
Summary
PROBLEM: Given two bump models, compute (ρ, δt, st, δf, sf )
θ
APPROACH:
(c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
SOLUTION: Coordinate descent
c(i+1) = argmaxc log p(y, y’, c, θ(i) )
θ(i+1) = argmaxx log p(y, y’, c(i+1) ,θ )
MATCHING → max-product
ESTIMATION → closed-form
Dauwels J., Vialatte F., Rutkowski T., and Cichocki A., 2007. Measuring neural synchrony by message passing, NIPS 20, in press.
Average synchrony
1. Group electrodes in regions
2. Bump model for each region
3. SES for each pair of models
4. Average the SES parameters
Overview

Alzheimer’s Disease (AD)
decrease in EEG synchrony

Similarity of Point Processes
 Two
1-dim point processes
 Two multi-dim point processes
 Multiple multi-dim point processes
Numerical Results
 Conclusion

Beyond pairwise interactions
Pairwise similarity
Multi-variate similarity
Similarity of multiple bump models
y1 y2 y3 y4 y5
Models similar if
• few deletions/large clusters
• little jitter
y1 y2 y3 y4 y5
Constraint: in each cluster at most
one bump from each signal
Dauwels J., Vialatte F., Weber T. and Cichocki. Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.
Generative model
yhidden
Generate bump model (hidden)
• geometric prior for number n of bumps
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
y1 y2 y3 y4 y5
Generate M “noisy” observations
• offset between hidden and observed bump
= Gaussian random vector with
mean ( δt,m /2, δf,m /2)
covariance diag(st,m/2, sf,m /2)
• amplitude, width (in t and f) all i.i.d.
pc (i) = p(cluster size = i |y)
(i = 1,2,…,M)
Parameters: θ = δt,m , δf,m , st,m , sf,m, pc
• “deletion” with probability pd
Dauwels J., Vialatte F., Weber T. and Cichocki. Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.
Probabilistic inference
PROBLEM: Given M bump models, compute θ = δt,m , δf,m , st,m , sf,m, pc
APPROACH:
(b*,θ*) = argmaxb,θ log p(y, y’, b, θ)
SOLUTION: Coordinate descent
b(i+1) = argmaxc log p(y, y’, b, θ(i) )
θ(i+1) = argmaxx log p(y, y’, b(i+1) ,θ )
CLUSTERING (Integer Program)
ESTIMATION OF PARAMETERS
Integer programming methods (e.g., LP relaxation)
• IP with 10.000 variables solved in about 1s
• CPLEX: commercial toolbox for solving IPs (combines several algorithms)
Dauwels J., Vialatte F., Weber T. and Cichocki. Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008.
Overview

Alzheimer’s Disease (AD)
decrease in EEG synchrony

Similarity of Point Processes
 Two
1-dim point processes
 Two multi-dim point processes
 Multiple multi-dim point processes
Numerical Results
 Conclusion

EEG Data
• EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched
control subjects (CTR) recorded while in rest with closed eyes
→ spontaneous EEG
• All 22 MCI patients suffered from Alzheimer’s disease (AD) later on
• Electrodes located on 21 sites according to 10-20 international system
• Electrodes grouped into 5 zones (reduces number of pairs)
1 bump model per zone
• Band pass filtered between 4 and 30 Hz
EEG data provided by Prof. T. Musha
Similarity measures
•
•
Correlation and coherence
Granger causality (linear system): DTF, ffDTF, dDTF, PDC, PC, ...
TIME
•
Phase Synchrony: compare instantaneous phases (wavelet/Hilbert transform)
No Phase Locking
•
State space based measures
sync likelihood, S-estimator, S-H-N-indices, ...
•
FREQUENCY
Information-theoretic measures
KL divergence, Jensen-Shannon divergence, ...
Phase Locking
Sensitivity (average synchrony)
Corr/Coh
Granger
Info. Theor.
State Space
Phase
SES
Significant differences for ffDTF and SES (more unmatched bumps, but same amount of jitter)
Mann-Whitney test: small p value suggests large difference in statistics of both groups
Classification (bi-SES)
± 85% correctly classified
ffDTF
•
•
•
Clear separation, but not yet useful as diagnostic tool
Additional indicators needed (fMRI, MEG, DTI, ...)
Can be used for screening population (inexpensive, simple, fast)
Correlations
Strong (anti-) correlations
„families“ of sync measures
Overview

Alzheimer’s Disease (AD)
decrease in EEG synchrony

Similarity of Point Processes
 Two
1-dim point processes
 Two multi-dim point processes
 Multiple multi-dim point processes
Numerical Results
 Conclusion

Conclusions

Measure for similarity of point processes

Key idea: matching of events

Applications
 Spiking synchrony (surrogate data/Morris Lecar neuron)
 EEG synchrony of MCI patients

SES allows to distinguish event reliability from timing precision


About 85-90% correctly classified MCI vs. healthy subjects
perhaps useful for screening a large population
Future work:
 Combination with other modalities (MEG, fMRI, ...)
 Integration of biophysical models
 Alternative inference techniques (variations on max-product, Monte-Carlo)
Quantifying
statistical
Analyzing
Brain
Signals
interdependence of point processes
by Combinatorial Optimization
Application to spike data and EEG
Justin Dauwels
LIDS, MIT
Amari Research Unit, Brain Science Institute, RIKEN
December 1, 2008
References + software
References
Quantifying Statistical Interdependence by Message Passing on Graphs
PART I: One-Dimensional Point Processes, Neural Computation (under revision)
Quantifying Statistical Interdependence by Message Passing on Graphs
PART II: Multi-Dimensional Point Processes, Neural Computation (under revision)
Quantifying Statistical Interdependence by Message Passing on Graphs
PART III: Multivariate Approach, Neural Computation (in preparation)
A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease
Based on EEG, NeuroImage (under revision)
On the Early Diagnosis of Alzheimer's Disease Based on EEG,
Current Alzheimer’s Research (in preparation, invited review)
Measuring Neural Synchrony by Message Passing, NIPS 2007
Analyzing Brain Signals by Combinatorial Optimization, Allerton 2008
Software
MATLAB implementation of the synchrony measures
MATLAB Toolbox for bump modelling
.
Summary
Similarity of multiple multi-dimensional point processes
Step 1: TWO ONE-dimensional point processes
Dynamic programming
Step 2: TWO MULTI-dimensional point processes
Max-product/LP relaxation/Edmund-Karp
Step 3: MULTIPLE MULTI-dimensional point processes
Integer Programming
Estimation
Simple closed form expressions
Deltas: average offset
...where
Sigmas: var of offset
artificial observations (conjugate prior)
Large-scale synchrony
Apparently, all brain regions affected...
Alzheimer's disease
Million of sufferers
Million of sufferers
Outside glimpse: the future (prevalence)
USA (Hebert et al. 2003)
• 2% to 5% of people
over 65 years old
• Up to 20% of people
over 80
Jeong 2004 (Nature)
World (Wimo et al. 2003)
Ongoing and future work
Applications

Fluctuations of EEG synchrony









Caused by auditory stimuli and music (T. Rutkowski)
Caused by visual stimuli (F. Vialatte)
Yoga professionals (F. Vialatte)
Professional shogi players (RIKEN & Fujitsu)
Brain-Computer Interfaces (T. Rutkowski)
Spike data from interacting monkeys (N. Fujii)
Calcium propagation in gliacells (N. Nakata)
Neural growth (Y. Tsukada & Y. Sakumura)
...
Algorithms
alternative inference techniques (e.g., MCMC, linear programming)
 time dependent (Gaussian processes)
 multivariate (T.Weber)

Fitting bump models

Initialisation
Adaptation
After adaptation
Signal
gradient method
Bump
F. Vialatte et al. “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).
Boxplots
SURPRISE!
No increase in jitter, but significantly less matched activity!
Physiological interpretation
• neural assemblies more localized?
• harder to establish large-scale synchrony?
Generative model
yhidden
Generate bump model (hidden)
• geometric prior for number n of bumps
p(n) = (1- λ S) (λ S)-n
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
Generate two “noisy” observations
y y’
( -δt /2, -δf /2)
( δt /2, δf /2)
• offset between hidden and observed bump
= Gaussian random vector with
mean ( ±δt /2, ±δf /2)
covariance diag(st/2, sf /2)
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability pd
Easily extendable to more than 2 observations…
Probabilistic inference
PROBLEM: Given two bump models, compute (ρspur, δt, st, δf, sf )
θ
APPROACH:
(c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
SOLUTION: Coordinate descent
c(i+1) = argmaxc log p(y, y’, c, θ(i) )
θ(i+1) = argmaxx log p(y, y’, c(i+1) ,θ )
MATCHING
POINT ESTIMATION
Alzheimer's disease
Inside glimpse: abnormal EEG
EEG system: inexpensive, mobile, useful for screening
Brain “slow-down”
slow rhythms (0.5-8 Hz)
fast rhythms (8-30 Hz)
(Babiloni et al., 2004; Besthorn et al., 1997; Jelic et al. 1996, Jeong 2004; Dierks et al., 1993).
Decrease of synchrony
•
•
•
focus of this project
AD vs. MCI
(Hogan et al. 203; Jiang et al., 2005)
AD vs. Control (Hermann, Demilrap, 2005, Yagyu et al. 1997; Stam et al., 2002; Babiloni et al. 2006)
MCI vs. mildAD (Babiloni et al., 2006).
Images: www.cerebromente.org.br
Comparing EEG signal rhythms ?
2 signals
PROBLEM I:
Signals of 3 seconds sampled at 100 Hz ( 300 samples)
Time-frequency representation of one signal = about 25 000 coefficients
Comparing EEG signal rhythms ?(2)
One pixel
Numerous
neighboring pixels
PROBLEM II:
Shifts in time-frequency!
Generative model
yhidden
Generate bump model (hidden)
• geometric prior for number n of bumps
p(n) = (1- λ S) (λ S)-n
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
y1 y2 y3 y4 y5
Generate M “noisy” observations
• offset between hidden and observed bump
= Gaussian random vector with
mean ( δt,m /2, δf,m /2)
covariance diag(st,m/2, sf,m /2)
• amplitude, width (in t and f) all i.i.d.
pc (i) = p(cluster size = i |y)
(i = 1,2,…,M)
Parameters: θ = δt,m , δf,m , st,m , sf,m, pc
• “deletion” with probability pd
Classification (multi-SES)
Average bump freq
± 85% correctly classified
Average cluster size
Average bump width
Average cluster size
± 90% correctly classified
ffDTF
Similarity of bump models...
How “similar” or “synchronous” are two bump models?
Signatures of local synchrony
f (Hz)
Time-frequency patterns
(“bumps”)
EEG stems from thousands of neurons
bump if neurons are phase-locked
= local synchrony
t (sec)
Alzheimer's disease
Inside glimpse: brain atrophy
amyloid plaques and
neurofibrillary tangles
Video source:
Alzheimer society
Images: Jannis Productions.
(R. Fredenburg; S. Jannis)
Video source: P. Thompson, J.Neuroscience, 2003
Probabilistic inference
POINT ESTIMATION: θ(i+1) = argmaxx log p(y, y’, c(i+1) ,θ )
Uniform prior p(θ): δt, δf = average offset, st, sf = variance of offset
Conjugate prior p(θ): still closed-form expression
Other kind of prior p(θ): numerical optimization (gradient method)
Probabilistic inference
MATCHING: c(i+1) = argmaxc log p(y, y’, c, θ(i) )
EQUIVALENT to (imperfect) bipartite max-weight matching problem
c(i+1) = argmaxc log p(y, y’, c, θ(i) ) = argmaxc Σkk’ wkk’(i) ckk’
s.t. Σk’ ckk’ ≤ 1 and Σk ckk’ ≤ 1 and ckk’ 2 {0,1}
find heaviest set of disjoint edges
not necessarily perfect
ALGORITHMS
• Polynomial-time algorithms gives optimal solution(s) (Edmond-Karp and Auction algorithm)
• Linear programming relaxation: extreme points of LP polytope are integral
• Max-product algorithm gives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]
Max-product algorithm
MATCHING: c(i+1) = argmaxc log p(y, y’, c, θ(i) )
Generative model
p(y, y’, c, θ) / I(c) pθ(θ) Πkk’ (N(t k’ – tk ; δt ,st,kk’) N(f k’ – fk ; δf ,sf, kk’) β-2)ckk’
Max-product algorithm
MATCHING: c(i+1) = argmaxc log p(y, y’, c, θ(i) )
Conditioning on θ
μ↓ μ↑ μ↓
μ↑
Max-product algorithm (2)
• Iteratively compute messages
• At convergence, compute marginals p(ckk’) = μ↓(ckk’) μ↓(ckk’) μ↑(ckk’)
• Decisions: c*kk’ = argmaxckk p(ckk’)
’
Algorithm
PROBLEM: Given two bump models, compute (ρspur, δt, st, δf, sf )
θ
APPROACH:
(c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
SOLUTION: Coordinate descent
c(i+1) = argmaxc log p(y, y’, c, θ(i) )
θ(i+1) = argmaxx log p(y, y’, c(i+1) ,θ )
MATCHING → max-product
ESTIMATION → closed-form
Generative model
yhidden
Generate bump model (hidden)
• geometric prior for number n of bumps
p(n) = (1- λ S) (λ S)-n
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
Generate two “noisy” observations
y y’
( -δt /2, -δf /2)
( δt /2, δf /2)
• offset between hidden and observed bump
= Gaussian random vector with
mean ( ±δt /2, ±δf /2)
covariance diag(st/2, sf /2)
• amplitude, width (in t and f) all i.i.d.
• “deletion” with probability pd
Easily extendable to more than 2 observations…
Generative model (2)
y y’
i
i’
( -δt /2, -δf /2)
j’
( δt /2, δf /2)
• Binary variables ckk’
ckk’ = 1 if k and k’ are observations of same hidden bump, else ckk’ = 0 (e.g., cii’ = 1 cij’ = 0)
• Constraints: bk = Σk’ ckk’ and bk’ = Σk ckk’ are binary (“matching constraints”)
• Generative Model p(y, y’, yhidden , c, δt , δf , st , sf )
θ
(symmetric in y and y’)
• Eliminate yhidden → offset is Gaussian RV with mean = ( δt , δf ) and covariance diag (st , sf)
p(y, y’, c, θ) = ∫ p(y, y’, yhidden , c, θ) dyhidden
• Probabilistic Inference: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
Summary
• Bumps in one model, but NOT in other
→ fraction of “spurious” bumps ρspur
• Bumps in both models, but with offset
→ Average time offset δt (delay)
→ Timing jitter with variance st
→ Average frequency offset δf
→ Frequency jitter with variance sf
PROBLEM: Given two bump models, compute (ρspur, δt, st, δf, sf )
θ
APPROACH:
(c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
Objective function
y y’
i
( -δt /2, -δf /2)
i’
j’
( δt /2, δf /2)
• Logarithm of model: log p(y, y’, c, θ) = Σkk’ wkk’ ckk’ + log I(c) + log pθ(θ) + γ
wkk’ = -(1/st (t k’ – tk – δt)2 + 1/sf (f k’ – fk– δf)2 )
- 2 log β
Euclidean distance between bump centers
β = pd (λ/V)1/2
• Large wkk’ if :
a) bumps are close
b) small pd
c) few bumps per volume element
• No need to specify pd , λ, and V, they only appear through β = knob to control # matches
Distance measures
Scaling
wkk’ = 1/st,kk’ (t k’ – tk – δt)2 + 1/sf,kk’ (f k’ – fk– δf)2 + 2 log β
st,kk’ = (Δtk + Δt’k) st
Non-Euclidean
sf,kk’ = (Δfk + Δf’k) sf
Generative model
p(y, y’, c, θ) / I(c) pθ(θ) Πkk’ (N(t k’ – tk ; δt ,st,kk’) N(f k’ – fk ; δf ,sf, kk’) β-2)ckk’
Prior for parameters

Expect bumps to appear at about same frequency, but delayed
Frequency shift requires non-linear transformation, less likely than delay

Conjugate priors for st and sf (scaled inverse chi-squared):

Improper prior for δt and δt : p(δt) = 1 = p(δf)
Preliminary results for multi-variate model
linear comb of pc
CTR
MCI
Probabilistic inference
PROBLEM: Given two bump models, compute (ρspur, δt, st, δf, sf )
θ
(c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
APPROACH:
SOLUTION: Coordinate descent
c(i+1) = argmaxc log p(y, y’, c, θ(i) )
θ(i+1) = argmaxx log p(y, y’, c(i+1) ,θ )
MATCHING
POINT ESTIMATION
Minx2 X, y2Y d(x,y)
X
Y
Generative model
yhidden
Generate bump model (hidden)
• geometric prior for number n of bumps
p(n) = (1- λ S) (λ S)-n
• bumps are uniformly distributed in rectangle
• amplitude, width (in t and f) all i.i.d.
y1 y2 y3 y4 y5
Generate M “noisy” observations
• offset between hidden and observed bump
= Gaussian random vector with
mean ( δt,m /2, δf,m /2)
covariance diag(st,m/2, sf,m /2)
• amplitude, width (in t and f) all i.i.d.
pc (i) = p(cluster size = i |y)
(i = 1,2,…,M)
Parameters: θ = δt,m , δf,m , st,m , sf,m, pc
• “deletion” with probability pd
(other prior pc0 for cluster size)
Role of local synchrony
Stimuli
Consolidation
Assembly
activation
Assembly
recall
Voice Face
Stimulus
Hebbian consolidation
Voice
(Hebb 1949, Fuster 1997)
Probabilistic inference
PROBLEM: Given M bump models, compute θ = δt,m , δf,m , st,m , sf,m, pc
APPROACH:
(c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
SOLUTION: Coordinate descent
c(i+1) = argmaxc log p(y, y’, c, θ(i) )
θ(i+1) = argmaxx log p(y, y’, c(i+1) ,θ )
CLUSTERING (IP or MP)
POINT ESTIMATION
Integer program
• Max-product algorithm (MP) on sparse graph
• Integer programming methods (e.g., LP relaxation)
Fourier transform
2
1
2
3
3
1
Frequency
High frequency
Low frequency
Windowed Fourier transform
Fourier basis functions
*
=
Window
function
windowed basis
functions
f
Windowed
Fourier
Transform
t
Overview
Alzheimer’s Disease (AD):
decrease in EEG synchrony
 Synchrony measure in time-frequency domain

 Pairs
of EEG signals
 Collections of EEG signals
Numerical Results
 Conclusion

Average synchrony
1. Group electrodes in regions
2. Bump model for each region
3. SES for each pair of models
4. Average the SES parameters
Beyond pairwise interactions...
Pairwise similarity
Multi-variate similarity
Similarity measures
•
•
Correlation and coherence
Granger causality (linear system): DTF, ffDTF, dDTF, PDC, PC, ...
TIME
•
Phase Synchrony: compare instantaneous phases (wavelet/Hilbert transform)
No Phase Locking
•
State space based measures
sync likelihood, S-estimator, S-H-N-indices, ...
•
FREQUENCY
Information-theoretic measures
KL divergence, Jensen-Shannon divergence, ...
Phase Locking
Generative model (2)
Model
Cost function
unit cost
of non-coincident event
unit cost
of coincident pair
Surrogate Data: Results (2)
• SS depends on δt
• likewise other S except SES
Probabilistic inference
PROBLEM: Given two bump models, compute (ρ, δt, st, δf, sf )
θ
APPROACH:
(c*,θ*) = argmaxc,θ log p(y, y’, c, θ)
SOLUTION: Coordinate descent
c(i+1) = argmaxc log p(y, y’, c, θ(i) )
θ(i+1) = argmaxx log p(y, y’, c(i+1) ,θ )
MATCHING
POINT ESTIMATION
Probabilistic inference (2)
MATCHING: c(i+1) = argmaxc log p(y, y’, c, θ(i) )
EQUIVALENT to (imperfect) bipartite max-weight matching problem
c(i+1) = argmaxc log p(y, y’, c, θ(i) ) = argmaxc Σkk’ wkk’(i) ckk’
s.t. Σk’ ckk’ ≤ 1 and Σk ckk’ ≤ 1 and ckk’ 2 {0,1}
find heaviest set of disjoint edges
not necessarily perfect
ALGORITHMS
• Polynomial-time algorithms gives optimal solution(s) (Edmond-Karp and Auction algorithm)
• Linear programming relaxation: gives optimal solution if unique [Sanghavi (2007)]
• Max-product algorithm gives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]
Max-product algorithm
MATCHING: c(i+1) = argmaxc log p(y, y’, c, θ(i) )
μ↓ μ↑ μ↓
μ↑
• At convergence, compute marginals p(ckk’) = μ↓(ckk’) μ↓(ckk’) μ↑(ckk’)
• Decisions: c*kk’ = argmaxckk p(ckk’) (optimal if solution unique)
’
Exemplar-based formulation
yhidden
y1 y2 y3 y4 y5
• Exemplars = identical copies of hidden bumps = cluster “center”
• Other bumps in cluster = non-identical copies of exemplars
• Is event an exemplar?
• If not, which exemplar is it associated with?
• Several constraints
Integer program
Exemplar-based formulation: IP
Binary Variables
Integer Program: LINEAR objective function/constraints
Equivalent to k-dim matching: for k = 2: in P but for k > 2: NP-hard!