Estimating models of neural
response from spike trains
Jonathan Pillow
Center for Neural Science
New York University
Okinawa Computational Neuroscience Course
November 11, 2004
Goal: characterize the relationship between
environmental stimuli and neural activity
[diagram: stimulus → spike response]
"Neural Coding Problem"
A 300 x 200 pixel stimulus = 60,000 dimensions.
General problem is too hard!
• stimulus space too large
• response space too large
• relationship is probabilistic
Solution:
- use a model to describe the transformation from
stimuli to spikes
[diagram: stimulus → model → spike response]
Model criteria:
• flexibility – represent arbitrary transformations
• tractability – fit with extracellular data
• biological plausibility – mechanistic interpretation
• probabilistic – account for response variability
Basic goals
• develop intuition about the neural coding problem
• introduce several models and statistical techniques for fitting them
• discuss insights about neural coding obtained from statistical modeling efforts
Goals of the current approach
Estimate quantitative models ("encoding models") which allow us
to compute P(r|s) for arbitrary stimuli.
[diagram: stimuli → ? → response]
1. Classical approach: measure responses to a
small set of parametrically varying stimuli.
Map response as a function of
contrast, spatial frequency,
temporal frequency, size,
direction of motion, etc.
[figure: tuning curve – response (sp/s) vs. stimulus (orientation)]
But it is not obvious how to extract a full quantitative model of the response
to an arbitrary stimulus.
Current approach: rich stimulus sets
(e.g. white noise, naturalistic stimuli)
Advantages:
1) (Relatively) free of assumptions about "what drives the cell"
2) Possible to explore a large range of stimulus space quickly and
efficiently
3) Can characterize many neurons in parallel
Disadvantages:
1) May be hard to drive neurons with white noise
2) Characterizations may not apply to other stimulus regimes
2. “spike statistics” approach:
Analyze the renewal statistics of spike trains:
Poisson? Gamma?
Canonical example: Mainen & Sejnowski (1995),
“Reliability of spike timing in neocortical neurons”
• not a complete model of the input-output relationship;
doesn't capture the stimulus-dependence of the response
• non-mechanistic
3. dynamical systems approach
Study neurons as dynamical systems
Canonical example: Hodgkin & Huxley
[diagram: Hodgkin-Huxley gating variables – Na+ activation (fast),
Na+ inactivation (slow), K+ activation (slow)]
• clear biophysical interpretation; captures a rich set of
dynamical behaviors
• tends to ignore statistical questions; models not tractable
for fitting with extracellular data; deterministic
4. Information theory approach
Quantify information between stimulus and spikes
• often addressed to the question of the timescale on which spike
trains convey information
• canonical debate: "rate code" vs. "spike time code"
• mutual information is a functional of P(r|s): it quantifies
the encoding relationship as a number of bits
• important validation tool for quantifying the
performance of modeling efforts
Outline
1. Review: Volterra/Wiener kernels
a. Linear model
b. Higher-order (polynomial) models
2. LNP cascade models
a. spike-triggered average
b. spike-triggered covariance
c. other methods
3. Non-Poisson models
4. Applications (what good are these models?)
1st idea: linear model
Let x = the stimulus (a vector).
Model the response y as linear in x:
y = k · x
[diagram: stimulus trace x and binary spike response y]
1st idea: linear model
Regression solution: writing the model as Y = Xk (rows of X are
stimulus vectors, Y is the response),
k̂ = (XᵀX)⁻¹ XᵀY
Derivation: k̂ = argmin_k ||Y − Xk||²
[diagram: stimulus matrix, spike-triggered average, covariance]
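As a concrete illustration, here is a minimal numpy sketch of this regression solution; the stimulus, filter, and response are all synthetic and the names are illustrative:

    import numpy as np

    # Minimal sketch of the regression solution k_hat = (X'X)^{-1} X'Y.
    # X: T x n design matrix (rows = stimulus vectors); Y: length-T response.
    rng = np.random.default_rng(0)
    T, n = 5000, 20
    X = rng.standard_normal((T, n))            # white-noise stimulus
    k_true = np.exp(-np.arange(n) / 4.0)       # a made-up "true" filter
    Y = X @ k_true + rng.standard_normal(T)    # linear response plus noise

    # lstsq solves argmin_k ||Y - Xk||^2 in a numerically stable way
    k_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)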
Test of Linear Model
[figure: stimulus intensity and estimated filter k̂; rate (sp/s) over 1 sec,
true response vs. linear model]
Problems:
1. Rates are negative!
2. Model output is continuous and deterministic.
Must assume Poisson to obtain P(r|s).
2nd idea: polynomial model
(Volterra/Wiener kernels)
Taylor series expansion of a function f(x) in n dimensions:
f(x) ≈ a + Σᵢ bᵢxᵢ + Σᵢⱼ cᵢⱼxᵢxⱼ + Σᵢⱼₖ dᵢⱼₖxᵢxⱼxₖ + …
# params: constant: 1; vector: n (20); matrix: n² (400); 3-tensor: n³ (8000)
• estimate kernels using moments of spike-triggered stimuli
• in practice, never have enough data to go beyond 2nd order
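To make the parameter blow-up concrete, here is a hedged sketch (synthetic data, illustrative names) that fits a second-order model by ordinary regression over the expanded feature set {1, xᵢ, xᵢxⱼ}; even at n = 20 the design matrix already has 231 columns:

    import numpy as np
    from itertools import combinations_with_replacement

    rng = np.random.default_rng(1)
    T, n = 20000, 20
    X = rng.standard_normal((T, n))
    Y = np.maximum(X @ rng.standard_normal(n), 0)  # toy rectified response

    # all quadratic cross-terms x_i * x_j (n(n+1)/2 = 210 of them)
    pairs = list(combinations_with_replacement(range(n), 2))
    quad = np.stack([X[:, i] * X[:, j] for i, j in pairs], axis=1)
    design = np.hstack([np.ones((T, 1)), X, quad])  # 1 + n + 210 columns
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)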
Test of Quadratic Model
[figure: stimulus intensity; rate (sp/s) over 1 sec – true response vs.
linear model, quadratic model, and LN cascade]
Why are Volterra/Wiener models so crappy?
Polynomials do a poor job of representing the
nonlinearities found in neurons.
[figure: rate (sp/s) vs. stimulus projection – true response, linear model,
quadratic model]
Summary so far:
• Goal: find an encoding model to compute P(r|s)
• Linear models, Volterra/Wiener kernels:
fit a polynomial to P(r|s)
Next up:
• LNP cascade models: STA, STC
Break!
Goal:
Provide a geometric intuition for
• spike-triggered averaging (STA)
• spike-triggered covariance (STC)
• extensions
First concept:
spike-triggered stimulus ensemble
Spike-triggered ensemble
• 9-sample stimulus block
Spike-triggered ensemble (3D stimulus)
• 8 x 8 x 10 stimulus block
2D stimulus (flickering bars)
• 8 x 6 stimulus block = 48-dimensional vector
Geometric picture
[animated scatter plot: each 8 x 6 stimulus block is a point in the
48-dimensional stimulus space, shown projected onto axes s1 and s2;
blue dots = raw stimuli, red dots = spiking stimuli]
General Idea:
look for ways to capture
the difference between
the distribution of
red dots, P(s,r),
and blue dots, P(s):
P(spike|stim) = P(spike, stim) / P(stim)
Computing the STA
[figure: the STA plotted as a vector in the (s1, s2) stimulus plane,
with raw and spiking stimuli]
The STA corresponds to a "direction" in stimulus space.
Projecting onto the STA
[figure: distributions of raw and spiking stimuli projected onto the
STA axis; the "STA response"]
Projecting onto an axis orthogonal to the STA
[figure: the same projections for an axis orthogonal to the STA]
LNP (Linear-Nonlinear-Poisson)
cascade model
[diagram: stimuli → linear filter k → nonlinearity N → Poisson spiking → spike responses]
Characterization procedure (see the sketch below):
1. Identify a single dimension of stimulus space
that drives the neuron's response. (STA)
2. Project stimuli into this 1-dimensional space
and estimate the density of spiking stimuli.
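A minimal sketch of this two-step procedure (names are illustrative; X holds one stimulus vector per time bin, spikes the binned spike counts):

    import numpy as np

    def fit_lnp(X, spikes, n_bins=20):
        # 1. spike-triggered average: mean stimulus, weighted by spike count
        sta = spikes @ X / spikes.sum()
        sta /= np.linalg.norm(sta)
        # 2. project all stimuli onto the STA axis...
        proj = X @ sta
        bins = np.linspace(proj.min(), proj.max(), n_bins + 1)
        idx = np.clip(np.digitize(proj, bins) - 1, 0, n_bins - 1)
        # ...and estimate the nonlinearity as the mean spike count per bin,
        # i.e. the density of spiking stimuli relative to all stimuli
        rate = np.array([spikes[idx == b].mean() if np.any(idx == b) else 0.0
                         for b in range(n_bins)])
        return sta, bins, rate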
Caveat: stimulus choice is important
Example 1:
"sparse" noise
[figure: true k vs. STA]
Caveat: stimulus choice is important
• STA requires spherical stimulus distribution
Example 2:
uniform noise
[figure: true k vs. STA]
Caveat: stimulus choice is important
• STA provides an unbiased estimate of k only
when the stimulus distribution is spherically symmetric.
• the "whitened" STA, (XᵀX)⁻¹XᵀY, is unbiased only
for elliptically symmetric stimuli
(e.g. correlated Gaussian noise); XᵀX is the stimulus covariance matrix.
How else can the STA fail?
[diagram: an example LN model]
idea: look for axes with a change in variance!
Looking for changes in variance
Two facts from linear algebra:
1) variance always traces out an ellipse
2) there is a ready-made tool for solving this problem
(PCA, eigenvector decomposition, Hotelling transform)
Spike-triggered covariance (STC)
[model diagram: linear filters k and nonlinearity N]
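A hedged sketch of the STC computation (assumes binary spike counts and a spherical, e.g. white Gaussian, stimulus distribution; all names are illustrative):

    import numpy as np

    def stc_axes(X, spikes, n_axes=2):
        sta = spikes @ X / spikes.sum()
        Xs = X[spikes > 0] - sta                 # spike-triggered stimuli, STA removed
        C_spike = Xs.T @ Xs / (len(Xs) - 1)      # spike-triggered covariance
        C_raw = np.cov(X, rowvar=False)          # raw stimulus covariance
        # eigenvectors of the difference pick out axes along which the
        # variance of the spiking ensemble differs from the raw ensemble
        evals, evecs = np.linalg.eigh(C_spike - C_raw)
        order = np.argsort(np.abs(evals))[::-1]  # largest |variance change| first
        return evals[order[:n_axes]], evecs[:, order[:n_axes]]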
Multiple linear filters
Adelson & Bergen 1985
Multiple linear filters: STC
Constructing the 2D nonlinearity
Suppressive interactions
divisive normalization
STC: suppressive axes
Examples:
STC applied to neural data
V1 simple cell
• flickering bars (16 bars x 16 time bins)
Rust, Schwartz, Movshon, Simoncelli CNS*03
V1 complex cell, standard model
V1 complex cell
Rust, Schwartz, Movshon, Simoncelli CNS*03
Retinal Ganglion Cell
• flickering bars (8 bars x 22 time bins)
excitatory axes
suppressive axes
Pillow, Simoncelli & Chichilnisky SFN 03
Velocity sensitivity in fly visual system:
Brenner et al 2000
Conclusion:
Standard models of neural response may
underestimate the number of dimensions in
which neurons compute their responses.
Beyond mean and variance: other techniques for
finding stimulus features that affect response
Information-theoretic approach: Search for stimulus axis which
maximizes difference between the two distributions.
• computationally intensive, but doesn't require spherical symmetry
• Paninski '03; Sharpee, Rust & Bialek '03
Regularization (if we have time)
lots of params + too little data = overfitting
(i.e. we explain features of the data that result from undersampling)
Solution: add a penalty for ||k||² (shrinkage)
or ||Dk||² (smoothing):
Err = ||Y − Xk||² + λ₁||k||² + λ₂||Dk||²
(Smyth et al 2003)
Bayesian interpretation: adding a Gaussian prior on k
- results in a biased estimator, but gives improved performance
for appropriate choice of λ₁ and λ₂ (James & Stein 1960)
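A minimal sketch of this penalized regression; the λ values and the choice of D as a first-difference operator are illustrative:

    import numpy as np

    def ridge_smooth_fit(X, Y, lam1=1.0, lam2=10.0):
        n = X.shape[1]
        D = np.eye(n) - np.eye(n, k=1)           # first-difference operator
        # closed-form minimizer of ||Y - Xk||^2 + lam1 ||k||^2 + lam2 ||Dk||^2
        A = X.T @ X + lam1 * np.eye(n) + lam2 * (D.T @ D)
        return np.linalg.solve(A, X.T @ Y)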
What have we achieved so far?
• techniques for identifying the subspace in which a neuron
computes its response: STA, STC, info-max, regularization
• However, this is still short of a full model of P(r|s).
• Cannot estimate nonlinearity in high-dimensional spaces.
• Rather, look at STC as providing a subspace in which to fit a
model. (“dimensionality reduction”)
Next Up:
• Beyond LNP
Break 2
Beyond LNP
So far we have considered models of the form:
r = N(k · x)
P(spike|r) = r exp(−rΔ)
(i.e. Poisson)
One thing we know to be inaccurate about this approach
is that it ignores the history of spiking.
idea: models which look high-dimensional when analyzed
with STC might in fact be low-dimensional when we
condition on spiking history, e.g.:
r = N(k · x, spike history)
Motivating Example: LIF (leaky integrate-and-fire)
Model dynamics are 1-dimensional and
purely linear except for spike reset:
when V reaches threshold (V = 1), V → V_reset
[figure: first 6 modes of the IF model]
However, the model appears
infinite-dimensional if
analyzed with STC!
Aguera y Arcas & Fairhall '03
GLMs (Generalized Linear Models)
Alternatively: "recurrent LNP"
r = N(k · x + h · y)
r → spikes via a Poisson RV
• process no longer Poisson due to history dependence
• h – vector reflecting spike train history y
• N – arbitrary nonlinear mapping, chosen in advance
Problem: how to fit this model?
Can't use STA or STC unless the "stimulus" is spherically symmetric.
Aside: log-concavity
log-concavity is a powerful tool for proving that a function has no local maxima
(i.e. only a global maximum)
• concave – function has no points of positive (upward) curvature
• concavity precludes local maxima for smooth functions
• log-concavity – log of function is concave. This is a weaker condition, but
just as powerful for establishing lack of local maxima, since log is monotonic
• example: Gaussian is log-concave, but not concave
Aside: log-concavity
• Moreover, there exist powerful, fast optimization algorithms for finding the
maxima of concave functions, even in high dimensions.
• Solution to the fitting problem: find a loss function which is (negative) log-concave.
Theorem: the likelihood of the GLM model is log-concave
for suitable f (Paninski '04).
The proof is simple!
Therefore: use maximum likelihood to fit the GLM (a minimal sketch follows):
r = N(k · x + h · y)
r → spikes via a Poisson RV
Important question: how to choose N and h?
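Here is a minimal maximum-likelihood sketch, assuming an exponential nonlinearity (one "suitable f" for which the log-likelihood is concave); the design matrix, bin size, and optimizer choice are illustrative:

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_lik(w, Xd, y, dt):
        # Poisson log-likelihood (up to a constant) with rate = exp(Xd @ w);
        # Xd stacks stimulus and spike-history regressors, y is spike counts
        u = Xd @ w
        return np.sum(np.exp(u) * dt) - np.sum(y * u)

    def fit_glm(Xd, y, dt=0.001):
        w0 = np.zeros(Xd.shape[1])
        # concavity of the log-likelihood => a single global optimum
        res = minimize(neg_log_lik, w0, args=(Xd, y, dt), method="L-BFGS-B")
        return res.x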
Stochastic Integrate-and-Fire Model
[diagram: stimuli → receptive field → spiking mechanism (noise N_t) → spike responses]
related models: Gerstner & Kistler 02, Keat et al 01
+ biophysically more plausible
+ powerful, flexible
+ fitting is tractable
Model behaviors:
[simulated traces: adaptation, bursting, bistability]
IF model simulation
[figure: stimulus → filter K → injected current I_inj plus noise → voltage V
with threshold and reset, over time (ms)]
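A minimal simulation sketch of this diagram (the Euler discretization and all parameter values are illustrative):

    import numpy as np

    def simulate_if(stim, K, tau=10.0, sigma=0.5, v_th=1.0, v_reset=0.0, dt=1.0):
        rng = np.random.default_rng(2)
        i_inj = np.convolve(stim, K)[:len(stim)]      # I_inj = filtered stimulus
        v, spikes = 0.0, []
        for t, i_t in enumerate(i_inj):
            # leaky integration plus additive Gaussian noise
            v += (-v / tau + i_t) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            if v >= v_th:                             # threshold crossing:
                spikes.append(t)                      # record a spike...
                v = v_reset                           # ...and reset the voltage
        return np.array(spikes)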
The Estimation Problem
Learn the model parameters:
K = stimulus filter
τ = leak time constant
σ² = noise variance
from the stimulus x(t) and the observed spike times tᵢ.
Solution: use maximum likelihood.
Likelihood function
hidden variable: the subthreshold voltage path V(t)
P(spike at tᵢ) = fraction of noisy voltage paths crossing threshold at tᵢ
[figure: sample voltage paths, with threshold crossing at tᵢ]
Computing Likelihood
Diffusion equation:
• linear dynamics
• additive Gaussian noise
propagate the density P(V,t) → P(V,t+Δt), with reset after each spike
fast methods for solving the linear PDE
⇒ efficient procedure for computing the likelihood
ISIs are conditionally independent
⇒ the likelihood is a product over ISIs
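A rough numerical sketch of the density-propagation idea for a single ISI (grid bounds, resolution, and parameters are illustrative, not tuned):

    import numpy as np

    def isi_likelihood(i_inj, tau=10.0, sigma=0.5, v_th=1.0, dt=1.0, nv=240):
        v = np.linspace(-3.0, 2.0, nv)                # voltage grid past threshold
        p = np.zeros(nv)
        p[np.argmin(np.abs(v))] = 1.0                 # density starts at V = 0 (reset)
        crossed = np.zeros(len(i_inj))
        for t, i_t in enumerate(i_inj):
            mean = v + (-v / tau + i_t) * dt          # linear drift of each bin
            # Gaussian transition kernel: columns = from-bin, rows = to-bin
            K = np.exp(-(v[:, None] - mean[None, :]) ** 2 / (2 * sigma ** 2 * dt))
            K /= K.sum(axis=0, keepdims=True)
            p = K @ p                                 # diffuse the density forward
            above = v >= v_th
            crossed[t] = p[above].sum()               # mass crossing threshold at t
            p[above] = 0.0                            # absorbing boundary
        return crossed                                # ~ P(first spike at time t)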
Maximizing the likelihood
• parameter space is large (~20 to 100 dimensions)
• parameters interact nonlinearly
Main Theorem:
The log-likelihood is concave in the parameters
{K, τ, σ, …}, for any data {x(t), tᵢ}
⇒ gradient ascent is guaranteed to
converge to the global maximum!
Applications
Now that we have fit the model params:
1) simulate spike trains and assess the fit
with real spike trains
2) decode spike responses using computed
values of P(r|s)
[figure: ON-cell RGC rasters – LNP captures 74% of variance, IF 92%]
Accounting for spike timing precision
[figure: P(spike) vs. time (ms), 0–200 ms; r = .89]
Decoding the neural response
Task: given two stimuli (Stim 1, Stim 2) and two responses (Resp 1, Resp 2),
which response goes with which stimulus?
Solution: use P(resp|stim).
Discriminate each repeat by comparing
P(R1|S1) P(R2|S2)  vs.  P(R1|S2) P(R2|S1)
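In code, the discrimination rule is just a likelihood comparison; log_lik here stands for any fitted model's log P(resp|stim) (e.g. from the GLM or IF fits sketched above) and is an illustrative placeholder:

    def discriminate(log_lik, r1, r2, s1, s2):
        # compare log P(R1|S1) + log P(R2|S2) against the swapped pairing
        paired = log_lik(r1, s1) + log_lik(r2, s2)
        swapped = log_lik(r1, s2) + log_lik(r2, s1)
        return paired > swapped                   # True => correct assignment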
Discriminate each repeat using P(Resp|Stim),
and compare to the LNP model's P(Resp|Stim):
IF model: 94% correct; LNP: 68% correct
[figure: IF model % correct vs. LNP model % correct]
Summary:
1. Neural coding problem: P(r|s)
2. Classic approach: Volterra Kernels
3. LNP cascade models fit with dimensionality-reduction techniques (STA, STC).
4. Models which account for spike-history
(recurrent LNP, stochastic IF).
5. Applications to decoding.
Open problems:
1. Better quantitative models of V1.
2. Characterizing neurons deeper in the sensory
processing pathway.
3. Multi-neuron dependencies
4. Adaptation