Transcript Document

Modelling and Control Issues Arising in the Quest for a Neural Decoder
Computation, Control, and Biological Systems Conference VIII, July 30, 2003
Albert E. Parker
Complex Biological Systems
Department of Mathematical Sciences
Center for Computational Biology
Montana State University
Collaborators: Tomas Gedeon, Alex Dimitrov, John Miller, and Zane Aldworth
Talk Outline
The Neural Coding Problem
A Clustering Problem
The Dynamical System
The Role of Bifurcation Theory
A new algorithm to solve the Neural Coding Problem
The Neural Coding Problem
GOAL: To understand the neural code.
EASIER GOAL: We seek an answer to the question,
How does neural activity represent information about environmental stimuli?
“The little fly sitting in the fly’s brain trying to fly the fly”
Looking for the dictionary to the neural code …
encoding: inputs (stimuli) X → outputs (neural responses) Y
decoding: neural responses Y → stimuli X
… but the dictionary is not deterministic!
Given a stimulus $X$, an experimenter observes many different neural responses $Y_i | X$, $i = 1, 2, 3, 4$.
Neural coding is stochastic!!
Similarly, neural decoding is stochastic: given a response $Y$, one observes many different stimuli $X_i | Y$, $i = 1, 2, \ldots, 9$.
Probability Framework
encoder: $P(Y|X)$, from environmental stimuli $X$ to neural responses $Y$
decoder: $P(X|Y)$, from neural responses $Y$ to environmental stimuli $X$
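The encoder and decoder are linked by Bayes' rule once the stimulus distribution $P(X)$ is known. The following Python sketch, assuming discrete stimuli and responses stored as NumPy arrays, computes a decoder from a hypothetical toy encoder; the function name `decoder_from_encoder` and the numbers are illustrative only.

```python
import numpy as np


def decoder_from_encoder(encoder: np.ndarray, prior: np.ndarray) -> np.ndarray:
    """Bayes' rule P(x|y) = P(y|x) P(x) / P(y) for discrete X and Y.

    encoder[x, y] = P(Y=y | X=x), each row sums to 1
    prior[x]      = P(X=x)
    Returns decoder[y, x] = P(X=x | Y=y); assumes every response has P(y) > 0.
    """
    joint = encoder * prior[:, None]      # p(x, y)
    p_y = joint.sum(axis=0)               # marginal P(y)
    return (joint / p_y[None, :]).T       # rows indexed by y


# Hypothetical toy encoder: 2 stimuli, 3 responses.
encoder = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6]])
prior = np.array([0.5, 0.5])
decoder = decoder_from_encoder(encoder, prior)
print(decoder.round(3))
```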
The Neural Coding Problem: How do we determine the encoder $P(Y|X)$ or the decoder $P(X|Y)$?
Common Approaches: parametric estimations, linear methods
Difficulty: There is never enough data.
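One way to see why limited data hurts: the naive plug-in (histogram) estimate of mutual information is biased upward when there are few samples per histogram cell. This Python sketch draws independent $X$ and $Y$ (so the true $I(X;Y)$ is 0) on hypothetical 8-by-8 alphabets and compares a small and a large sample; the alphabet sizes, sample sizes, seed and the helper name are arbitrary choices for illustration.

```python
import numpy as np


def plugin_mutual_information(xs, ys, n_x, n_y):
    """Plug-in (histogram) estimate of I(X;Y) in nats from paired integer samples."""
    counts = np.zeros((n_x, n_y))
    np.add.at(counts, (xs, ys), 1)             # 2-D histogram of the samples
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0                              # convention: 0 log 0 = 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz])))


rng = np.random.default_rng(0)
n_x = n_y = 8
estimates = {}
for n_samples in (50, 100_000):
    # X and Y are drawn independently, so the true I(X;Y) is exactly 0.
    xs = rng.integers(0, n_x, size=n_samples)
    ys = rng.integers(0, n_y, size=n_samples)
    estimates[n_samples] = plugin_mutual_information(xs, ys, n_x, n_y)
print(estimates)
```

With 50 samples spread over 64 cells the estimate is far from 0, while with 100,000 samples it is close to 0.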
One Approach: Cluster the responses
Stimuli $X$: $L$ objects $\{x_i\}$. Responses $Y$: $K$ objects $\{y_i\}$. The two are linked by the joint distribution $p(X,Y)$, i.e. by the encoder $P(Y|X)$ and the decoder $P(X|Y)$.
Clustered responses $Y_N$: $N$ objects $\{y_{Ni}\}$, obtained from $Y$ by the clustering $q(Y_N|Y)$. These are linked to the stimuli by $P(Y_N|X)$ and $P(X|Y_N)$.
• $q(Y_N|Y)$ is a stochastic clustering of the responses.
• To address the insufficient data problem, one clusters the outputs $Y$ into clusters $Y_N$ so that the information one can learn about $X$ by observing $Y_N$, $I(X;Y_N)$, is as close as possible to the mutual information $I(X;Y)$.
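To make this goal concrete, here is a small Python sketch, assuming a discrete joint distribution $p(X,Y)$ stored as an $L \times K$ array `p_xy` and a clustering stored as an $N \times K$ array with `q[z, y]` $= q(z|y)$. It computes $I(X;Y_N)$ from $p(x, y_N) = \sum_y p(x,y)\,q(y_N|y)$. Because $Y_N$ depends on $X$ only through $Y$, the data processing inequality gives $I(X;Y_N) \le I(X;Y)$; in the toy example equality holds for the hard clustering that merges responses with identical conditionals $p(x|y)$, and the uniform clustering keeps no information. The helper names are hypothetical.

```python
import numpy as np


def mutual_information(p_joint):
    """Mutual information in nats of a 2-D joint distribution."""
    p_row = p_joint.sum(axis=1, keepdims=True)
    p_col = p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0
    return float(np.sum(p_joint[nz] * np.log(p_joint[nz] / (p_row * p_col)[nz])))


def clustered_joint(p_xy, q):
    """p(x, y_N) = sum_y p(x, y) q(y_N | y); q[z, y] = q(z|y), columns sum to 1."""
    return p_xy @ q.T


# Hypothetical toy p(X,Y): responses 0,1 share p(x|y), and so do responses 2,3.
p_xy = np.array([[0.2, 0.2, 0.05, 0.05],
                 [0.05, 0.05, 0.2, 0.2]])
q_merge = np.array([[1.0, 1.0, 0.0, 0.0],     # hard clustering {0,1} -> class 0
                    [0.0, 0.0, 1.0, 1.0]])    #                 {2,3} -> class 1
q_uniform = np.full((2, 4), 0.5)              # every response split evenly

i_full = mutual_information(p_xy)
i_merge = mutual_information(clustered_joint(p_xy, q_merge))
i_uniform = mutual_information(clustered_joint(p_xy, q_uniform))
print(i_full, i_merge, i_uniform)
```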
Two optimization problems that use this approach
• Information Bottleneck Method (Tishby, Pereira, Bialek 1999)
$\min_q I(Y;Y_N)$ constrained by $I(X;Y_N) \ge I_0$
$\max_q \left(-I(Y;Y_N) + \beta I(X;Y_N)\right)$
• Information Distortion Method (Dimitrov and Miller 2001)
$\max_q H(Y_N|Y)$ constrained by $I(X;Y_N) \ge I_0$
$\max_q \left(H(Y_N|Y) + \beta I(X;Y_N)\right)$
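Both Lagrangian objectives can be evaluated directly from $p(X,Y)$ and $q(Y_N|Y)$. A Python sketch under the same storage assumptions (arrays `p_xy[x, y]` and `q[z, y]`, natural logarithms); the function names are hypothetical. At the uniform clustering $q(y_N|y) = 1/N$ we have $I(X;Y_N) = I(Y;Y_N) = 0$ and $H(Y_N|Y) = \log N$, so the Information Distortion objective equals $\log N$ and the Information Bottleneck objective equals 0 for every $\beta$.

```python
import numpy as np


def _mi(p_joint):
    """Mutual information in nats of a 2-D joint distribution."""
    p_row = p_joint.sum(axis=1, keepdims=True)
    p_col = p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0
    return float(np.sum(p_joint[nz] * np.log(p_joint[nz] / (p_row * p_col)[nz])))


def information_distortion(q, p_xy, beta):
    """H(Y_N|Y) + beta * I(X;Y_N)."""
    p_y = p_xy.sum(axis=0)
    nz = q > 0
    h_cond = -float(np.sum((q * p_y[None, :])[nz] * np.log(q[nz])))
    return h_cond + beta * _mi(p_xy @ q.T)


def information_bottleneck(q, p_xy, beta):
    """-I(Y;Y_N) + beta * I(X;Y_N)."""
    p_y = p_xy.sum(axis=0)
    p_y_yn = (q * p_y[None, :]).T          # joint p(y, y_N), shape (K, N)
    return -_mi(p_y_yn) + beta * _mi(p_xy @ q.T)


# Hypothetical random joint distribution with L = 5 stimuli and K = 6 responses.
rng = np.random.default_rng(0)
p_xy = rng.random((5, 6))
p_xy /= p_xy.sum()
q_uniform = np.full((3, 6), 1.0 / 3)
print(information_distortion(q_uniform, p_xy, 2.0), np.log(3))
print(information_bottleneck(q_uniform, p_xy, 2.0))
```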
In General:
We have developed an approach to solve optimization problems of the form
$\max_{q \in \Delta} G(q)$ constrained by $D(q) \ge D_0$
or (using the method of Lagrange multipliers)
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} \left(G(q) + \beta D(q)\right)$
where
• $\beta \in [0, \infty)$.
• $\Delta$ is the set of valid stochastic clusterings, a subset of $\mathbb{R}^{NK}$.
• $G$ and $D$ are sufficiently smooth in $\Delta$.
• $G$ and $D$ have symmetry: they are invariant to relabelling of the classes of $Y_N$.
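The symmetry assumption can be checked numerically. Permuting the rows of `q` (that is, relabelling the classes of $Y_N$) leaves both $H(Y_N|Y)$ and $I(X;Y_N)$ unchanged, so the Information Distortion objective takes the same value under all $N!$ relabellings. This Python sketch verifies this for a hypothetical random $p(X,Y)$ and a random clustering into $N = 4$ classes; the helper name `objective` is illustrative.

```python
import itertools

import numpy as np


def objective(q, p_xy, beta):
    """Information Distortion objective H(Y_N|Y) + beta * I(X;Y_N), q[z, y] = q(z|y)."""
    p_y = p_xy.sum(axis=0)
    h_cond = -np.sum(q * p_y[None, :] * np.log(q))
    p_xz = p_xy @ q.T                                   # joint p(x, z)
    p_x = p_xz.sum(axis=1, keepdims=True)
    p_z = p_xz.sum(axis=0, keepdims=True)
    info = np.sum(p_xz * np.log(p_xz / (p_x * p_z)))
    return float(h_cond + beta * info)


rng = np.random.default_rng(0)
p_xy = rng.random((5, 6))
p_xy /= p_xy.sum()
q = rng.random((4, 6))
q /= q.sum(axis=0, keepdims=True)    # each column is a distribution over the 4 classes

base = objective(q, p_xy, beta=1.5)
values = [objective(q[list(perm)], p_xy, beta=1.5)   # relabel classes by perm
          for perm in itertools.permutations(range(4))]
print(len(values), max(abs(v - base) for v in values))
```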
Symmetry: invariance to relabelling of the clusters of $Y_N$
Relabelling a clustering $q(Y_N|Y)$ of the $K$ objects $\{y_i\}$ into the $N$ objects $\{y_{Ni}\}$, for example swapping class 1 and class 2, gives a clustering with the same values of $G$ and $D$.
An annealing algorithm to solve $\max_{q \in \Delta} \left(G(q) + \beta D(q)\right)$
Let $q_0$ be the maximizer of $\max_q G(q)$, and let $\beta_0 = 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta_k D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some $K$.
1. Perform $\beta$-step: Let $\beta_{k+1} = \beta_k + d_k$ where $d_k > 0$.
2. The initial guess for $q_{k+1}$ at $\beta_{k+1}$ is $q_{k+1}^{(0)} = q_k + \eta$ for some small perturbation $\eta$.
3. Optimization: solve $\max_q \left(G(q) + \beta_{k+1} D(q)\right)$ to get the maximizer $q_{k+1}$, using initial guess $q_{k+1}^{(0)}$.
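The annealing loop can be sketched in Python for the Information Distortion problem, $G = H(Y_N|Y)$ and $D = I(X;Y_N)$. The inner optimizer is an assumption, not necessarily the one used in the talk: setting the gradient of the Lagrangian to zero gives the stationarity condition $q(y_N|y) \propto \exp\left(\beta \sum_x p(x|y) \log p(x|y_N)\right)$, which is iterated here as a fixed-point map. That iteration is not guaranteed to reach a maximizer, and the step size `d_beta`, perturbation size `eta` and sweep count are arbitrary. The uniform clustering $q_0 = 1/N$ maximizes $H(Y_N|Y)$, so it is used at $\beta_0 = 0$. The sketch assumes every stimulus and every response has positive probability; all names are hypothetical.

```python
import numpy as np


def fixed_point_sweep(q, p_xy, beta, floor=1e-12):
    """One sweep of q(z|y) <- softmax_z(beta * sum_x p(x|y) log p(x|z)).

    Assumed inner optimizer for H(Y_N|Y) + beta I(X;Y_N); q[z, y] = q(z|y).
    """
    p_x_given_y = p_xy / p_xy.sum(axis=0)                   # (L, K)
    p_xz = p_xy @ q.T                                       # p(x, z), (L, N)
    p_x_given_z = p_xz / p_xz.sum(axis=0)                   # (L, N)
    scores = beta * (p_x_given_y.T @ np.log(p_x_given_z))   # (K, N)
    scores -= scores.max(axis=1, keepdims=True)             # stabilise the softmax
    w = np.exp(scores)
    q_new = (w / w.sum(axis=1, keepdims=True)).T            # back to (N, K)
    q_new = np.maximum(q_new, floor)                        # keep logs finite
    return q_new / q_new.sum(axis=0, keepdims=True)


def anneal(p_xy, n_clusters, beta_max, d_beta=0.05, eta=1e-3, sweeps=100, seed=0):
    rng = np.random.default_rng(seed)
    q = np.full((n_clusters, p_xy.shape[1]), 1.0 / n_clusters)  # q_0 at beta_0 = 0
    beta = 0.0
    while beta < beta_max:
        beta = min(beta + d_beta, beta_max)                 # 1. beta-step
        q = np.clip(q + eta * rng.standard_normal(q.shape), 1e-12, None)
        q /= q.sum(axis=0, keepdims=True)                   # 2. perturbed initial guess
        for _ in range(sweeps):                             # 3. optimization at beta_{k+1}
            q = fixed_point_sweep(q, p_xy, beta)
    return q


# Hypothetical toy problem: 2 stimuli, 4 responses, 2 clusters.
p_xy = np.array([[0.2, 0.2, 0.05, 0.05],
                 [0.05, 0.05, 0.2, 0.2]])
q = anneal(p_xy, n_clusters=2, beta_max=3.0, d_beta=0.1)
print(q.round(3))
```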
Application of the annealing method to the Information Distortion problem
$\max_q \left(H(Y_N|Y) + \beta I(X;Y_N)\right)$
when $p(X,Y)$ is defined by four Gaussian blobs.
The joint distribution $p(X,Y)$ is over 52 stimuli $X$ and 52 responses $Y$; the clustering $q(Y_N|Y)$ maps the 52 responses into 4 clusters $Y_N$.
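The talk does not give the blob parameters, so the following Python sketch uses hypothetical centres and a common width to build a joint distribution of the same shape: four isotropic Gaussian bumps on a 52-by-52 grid of (stimulus, response) pairs, normalized to sum to 1. The function name and all parameter values are assumptions for illustration; such an array can serve as the joint distribution $p(X,Y)$ in the annealing algorithm.

```python
import numpy as np


def four_blob_joint(n=52, centers=((8, 10), (20, 24), (32, 36), (44, 42)), width=3.0):
    """Hypothetical p(x, y) on an n-by-n grid: four Gaussian bumps, normalized."""
    x = np.arange(n)[:, None]          # stimulus index (rows)
    y = np.arange(n)[None, :]          # response index (columns)
    p = np.zeros((n, n))
    for cx, cy in centers:
        p += np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * width ** 2))
    return p / p.sum()


p_xy = four_blob_joint()
print(p_xy.shape, p_xy.sum())
```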
Evolution of the optimal clustering:
Observed Bifurcations for the Four Blob Problem:
We just saw the optimal clustering $q^*$ at $\beta^* = \beta_{\max}$. What do the clusterings look like for $\beta < \beta_{\max}$?
Figures: Observed Bifurcations for the 4 Blob Problem; Conceptual Bifurcation Structure ($q^*$ plotted against $\beta$)
Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations?
What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
How many bifurcating solutions are there?
What do the bifurcating branches look like? Are they subcritical or supercritical?
What is the stability of the bifurcating branches? Is there always a bifurcating branch that contains solutions of the optimization problem?
Are there bifurcations after all of the classes have resolved?
Bifurcation theory in the presence of symmetries enables us to answer the questions previously posed …
Recall the Symmetries:
To better understand the bifurcation structure, we capitalize on the symmetries of the function $G(q) + \beta D(q)$: relabelling the classes of a clustering $q(Y_N|Y)$ of the $K$ objects $\{y_i\}$ into the $N$ objects $\{y_{Ni}\}$, for example swapping class 1 and class 3, leaves the function unchanged.
The symmetry group of all permutations on $N$ symbols is $S_N$.
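The subgroup of $S_N$ that leaves a particular clustering unchanged (its isotropy subgroup) is what labels the branches in the bifurcation diagrams that follow. A brute-force Python sketch, assuming `q[z, y]` $= q(z|y)$ with the classes as rows: the uniform clustering $q_{1/4}$ is fixed by all 24 elements of $S_4$, a clustering in which one class has split off is fixed by a copy of $S_3$ (6 elements), and one with two pairs of identical classes by a subgroup of order 4. The function name and the example clusterings are hypothetical, and enumerating all $N!$ permutations is only practical for small $N$.

```python
import itertools

import numpy as np


def isotropy_subgroup(q, tol=1e-12):
    """Permutations of the class labels (rows of q) that leave q unchanged."""
    n_classes = q.shape[0]
    return [perm for perm in itertools.permutations(range(n_classes))
            if np.allclose(q[list(perm)], q, rtol=0.0, atol=tol)]


n_responses = 6
q_uniform = np.full((4, n_responses), 0.25)                              # q_{1/4}
q_one_split = np.array([[0.4] * n_responses] + [[0.2] * n_responses] * 3)  # class 0 split off
q_two_pairs = np.array([[0.3] * n_responses] * 2 + [[0.2] * n_responses] * 2)
for q in (q_uniform, q_one_split, q_two_pairs):
    print(len(isotropy_subgroup(q)))
```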
Formulate a Dynamical System
Goal: To solve $\max_{q \in \Delta} \left(G(q) + \beta D(q)\right)$ for each $\beta$, incremented in sufficiently small steps, as $\beta \to \infty$.
Method: Study the equilibria of the gradient flow
$$\begin{pmatrix} \dot{q} \\ \dot{\lambda} \end{pmatrix} = \nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta) := \nabla_{q,\lambda} \left[ G(q) + \beta D(q) + \sum_{y \in Y} \lambda_y \left( \sum_z q(z|y) - 1 \right) \right]$$
• Equilibria of this system are possible solutions of the maximization problem (they satisfy the necessary conditions of constrained optimality).
• The Jacobian $\nabla^2_{q,\lambda} \mathcal{L}(q^*,\lambda^*,\beta^*)$ is symmetric, so its eigenvalues are real and only bifurcations of equilibria can occur (no Hopf bifurcations to periodic orbits).
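For the Information Distortion problem the right-hand side of the flow can be written explicitly. Treating $p(x)$ as fixed, $\partial H(Y_N|Y)/\partial q(z|y) = -p(y)\left(\log q(z|y) + 1\right)$ and $\partial I(X;Y_N)/\partial q(z|y) = p(y) \sum_x p(x|y) \log\left(p(x|z)/p(x)\right)$. The Python sketch below evaluates $\nabla_{q,\lambda}\mathcal{L}$ and checks that the uniform clustering $q_{1/N}$ with $\lambda_y = p(y)(1 - \log N)$ is an equilibrium for every $\beta$, so $(q_{1/N}, \beta)$ is a branch of equilibria. These derivative formulas are derived here for illustration rather than taken from the talk; the names are hypothetical and `p_xy` is assumed strictly positive.

```python
import numpy as np


def lagrangian_gradient(q, lam, beta, p_xy):
    """Gradient of L = H(Y_N|Y) + beta I(X;Y_N) + sum_y lam_y (sum_z q(z|y) - 1).

    q[z, y] = q(z|y), lam[y] = lambda_y; returns (dL/dq, dL/dlam).
    """
    p_y = p_xy.sum(axis=0)                                   # (K,)
    p_x = p_xy.sum(axis=1)                                   # (L,)
    p_x_given_y = p_xy / p_y                                 # (L, K)
    p_xz = p_xy @ q.T                                        # (L, N)
    p_x_given_z = p_xz / p_xz.sum(axis=0)                    # (L, N)
    log_ratio = np.log(p_x_given_z / p_x[:, None])           # log(p(x|z) / p(x))
    d_info = p_y[None, :] * (log_ratio.T @ p_x_given_y)      # (N, K)
    d_entropy = -p_y[None, :] * (np.log(q) + 1.0)            # (N, K)
    grad_q = d_entropy + beta * d_info + lam[None, :]
    grad_lam = q.sum(axis=0) - 1.0                           # constraint residuals
    return grad_q, grad_lam


# Hypothetical random joint distribution: L = 5 stimuli, K = 6 responses, N = 3.
rng = np.random.default_rng(0)
p_xy = rng.random((5, 6))
p_xy /= p_xy.sum()
n_clusters = 3
q_uniform = np.full((n_clusters, 6), 1.0 / n_clusters)
lam = p_xy.sum(axis=0) * (1.0 - np.log(n_clusters))
for beta in (0.0, 1.0, 10.0):
    g_q, g_lam = lagrangian_gradient(q_uniform, lam, beta, p_xy)
    print(beta, np.abs(g_q).max(), np.abs(g_lam).max())
```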
Observed Bifurcation Structure (figure), with the corresponding Group Structure: a hierarchy of subgroups descending from $S_4$ through $S_3$ and $S_2$ to the trivial group $1$
The Equivariant Branching Lemma shows that the bifurcation structure contains the branches …
Observed Bifurcation Structure (figure: $q^*$ plotted against $\beta$, with the Group Structure descending from $S_4$ through $S_3$ and $S_2$ to $1$)
The Smoller-Wasserman Theorem shows additional structure …
Observed Bifurcation Structure (figure: $q^*$ plotted against $\beta$)
Group Structure: $S_4$ and the subgroups $\langle (12),(34) \rangle$, $\langle (13),(24) \rangle$, $\langle (14),(23) \rangle$
Theorem: There are exactly K/N bifurcations on the branch $(q_{1/N}, \beta)$ for the Information Distortion problem
Observed Bifurcation Structure ($q^*$ plotted against $\beta$): there are 13 bifurcations on the first branch, matching $K/N = 52/4$ for the 4 blob problem.
Figures: Conceptual Bifurcation Structure; Observed Bifurcations for the 4 Blob Problem ($q^*$ plotted against $\beta$)
Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations?
There are N-1 symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ for $M \le N$.
What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
How many bifurcating solutions are there? There are at least N from the first bifurcation, at least N-1 from the next one, etc.
What do the bifurcating branches look like? They are subcritical or supercritical depending on the sign of the bifurcation discriminator $\zeta(q^*,\beta^*,u_k)$.
What is the stability of the bifurcating branches? Is there always a bifurcating branch that contains solutions of the optimization problem? No.
Are there bifurcations after all of the classes have resolved? In general, no.
Continuation techniques provide numerical confirmation of the theory
A closer look … (figure: $q^*$ plotted against $\beta$)
Bifurcation from $S_4$ to $S_3$ … (figure: $q^*$ plotted against $\beta$)
The bifurcation from $S_4$ to $S_3$ is subcritical …
(the theory predicted this, since the bifurcation discriminator $\zeta(q_{1/4},\beta^*,u) < 0$)
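The difference between subcritical and supercritical pitchforks can be seen in the one-dimensional normal form $\dot{x} = \mu x + c x^3$. This Python sketch is a generic textbook illustration, not the discriminator computation from the talk: the sign of the cubic coefficient $c$ plays the role that the sign of the bifurcation discriminator plays here, although the sign conventions need not coincide. For $c < 0$ the nonzero branches $x = \pm\sqrt{-\mu/c}$ exist for $\mu > 0$ and are stable (supercritical); for $c > 0$ they exist for $\mu < 0$ and are unstable (subcritical). The function names are hypothetical.

```python
import math


def pitchfork_branches(mu, c):
    """Nonzero equilibria of dx/dt = mu*x + c*x**3 (requires c != 0)."""
    r = -mu / c
    if r <= 0:
        return []                  # only the trivial equilibrium x = 0
    root = math.sqrt(r)
    return [-root, root]


def is_stable(mu, c, x):
    """Linear stability: the equilibrium is stable when d/dx (mu*x + c*x**3) < 0."""
    return mu + 3.0 * c * x * x < 0


for c in (-1.0, 1.0):              # c < 0: supercritical, c > 0: subcritical
    for mu in (-0.25, 0.25):
        branches = pitchfork_branches(mu, c)
        print(c, mu, [(x, is_stable(mu, c, x)) for x in branches])
```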
Additional structure!!
Conclusions …
• We have a complete theoretical picture of how the clusterings evolve for any problem of the form $\max_{q \in \Delta} \left(G(q) + \beta D(q)\right)$ subject to the assumptions stated earlier.
o When clustering to N classes, there are N-1 symmetry breaking bifurcations.
o In general, there are only pitchfork and saddle-node bifurcations.
o We can determine whether pitchfork bifurcations are subcritical or supercritical (1st or 2nd order phase transitions).
o We know the explicit bifurcating directions.
• SO WHAT??
• There are theoretical consequences …
• This yields a new and improved algorithm for solving the neural coding problem …
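A numerical algorithm to solve $\max_{q \in \Delta} \left(G(q) + \beta D(q)\right)$
Let $q_0$ be the maximizer of $\max_q G(q)$, $\beta_0 = 1$ and $s > 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta_k D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some $K$.
1. Perform $\beta$-step: solve
$$\nabla^2_{q,\lambda} \mathcal{L}(q_k,\lambda_k,\beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} \mathcal{L}(q_k,\lambda_k,\beta_k)$$
for $(\partial_\beta q_k, \partial_\beta \lambda_k)$ and select $\beta_{k+1} = \beta_k + d_k$, where
$$d_k = \frac{s \, \operatorname{sgn}(\cos\theta)}{\left(\|\partial_\beta q_k\|^2 + \|\partial_\beta \lambda_k\|^2 + 1\right)^{1/2}}.$$
2. The initial guess for $(q_{k+1}, \lambda_{k+1})$ at $\beta_{k+1}$ is $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)}) = (q_k, \lambda_k) + d_k (\partial_\beta q_k, \partial_\beta \lambda_k)$.
3. Optimization: solve $\max_q \left(G(q) + \beta_{k+1} D(q)\right)$ using pseudoarclength continuation to get the maximizer $q_{k+1}$ and the vector of Lagrange multipliers $\lambda_{k+1}$, using initial guess $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)})$.
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of $\nabla^2_q \left[G(q_k) + \beta_k D(q_k)\right]$ and $\nabla^2_q \left[G(q_{k+1}) + \beta_{k+1} D(q_{k+1})\right]$. If a bifurcation is detected, then set $q_{k+1}^{(0)} = q_k + d_k u$, where $u$ is the bifurcating direction, and repeat step 3.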
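Steps 1, 2 and 4 reduce to dense linear algebra once the Hessian of the Lagrangian is available. The following Python sketch assumes that Hessian and $\partial_\beta \nabla_{q,\lambda}\mathcal{L}$ (which equals $(\nabla D(q_k), 0)$) are supplied as NumPy arrays, and it assumes that $\theta$ is the angle between the current and previous tangent vectors $(\partial_\beta q, \partial_\beta \lambda, 1)$, the usual way to keep a continuation moving in a consistent direction. The Hessian is singular exactly at a bifurcation, which is why step 4 monitors the sign of a determinant. The function names and toy matrices are hypothetical.

```python
import numpy as np


def beta_step(hessian, d_grad_d_beta, s, prev_tangent=None):
    """Step 1: tangent v = (dq/dbeta, dlambda/dbeta) and step length d_k.

    Differentiating grad L(q(beta), lambda(beta), beta) = 0 in beta gives
    hessian @ v = -d_grad_d_beta.
    """
    v = np.linalg.solve(hessian, -d_grad_d_beta)
    tangent = np.append(v, 1.0)                     # (dq, dlambda, dbeta = 1)
    cos_sign = 1.0                                  # sgn(cos theta); cos = 0 treated as +1
    if prev_tangent is not None and np.dot(tangent, prev_tangent) < 0:
        cos_sign = -1.0
    d_k = s * cos_sign / np.sqrt(v @ v + 1.0)
    return v, d_k, tangent


def bifurcation_detected(block_before, block_after):
    """Step 4: sign change of det of an identical Hessian block between steps."""
    return bool(np.sign(np.linalg.det(block_before)) != np.sign(np.linalg.det(block_after)))


# Hypothetical 2x2 example.
v, d_k, tangent = beta_step(np.diag([-2.0, -1.0]), np.array([1.0, 1.0]), s=0.3)
guess_shift = d_k * v                               # step 2: (q, lambda) += d_k * v
print(v, d_k, guess_shift)
print(bifurcation_detected(np.diag([-1.0, -0.5]), np.diag([-1.0, 0.5])))
```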
Application to cricket sensory data
Figure panels: $E(X|Y_N)$, the stimulus means conditioned on each of the classes; typical spike patterns; the optimal quantizer.