Transcript Document

THE FOUR C's OF
NEUROINFORMATION THEORY:
CODING, COMPUTING,
CONTROL AND COGNITION
IBM Almaden: Institute on Cognitive
Computing, May 10-11, 2006
Toby Berger
University of Virginia
Charlottesville, VA 22903
[FIG. 1: a loop of four blocks. The Sensory System; Brain (Coding, Computation) generates v(k) according to p1(v(k) | s(k-1), v(k-1), m(k-1)); the Motor System (Control) generates m(k) according to p2(m(k) | v(k), m(k-1)); the Environment generates e(k) according to p3(e(k) | m(k), e(k-1)); the Selector (Cognition) extracts the stimulus s from the environmental output e and feeds it back to the Sensory System.]
FIG. 1 BLOCK DIAGRAM OF MARKOV-MARKO BRAIN MODEL
MY BIO-IT COLLABORATORS
• Prof. William B. “Chip” Levy, UVA Med - Neuroscientist and
my prime bio-collaborator.
• Former Grad Students: Zhen Zhang, Yuzheng Ying, Jun Chen
• PhD Candidate: Prapun Suksompong
FIGURE 1 OF EVERY INFORMATION
THEORY TEXTBOOK
[Figure: Source → Source Encoder → Channel Encoder → Channel → Channel Decoder → Source Decoder → User]
Channel is “fixed” and “given.” Future source data does
not depend on past outputs to user (open loop).
Channel behavior is independent of source statistics.
Good performance usually requires computationally
intense, long-delay source and channel codes.
Source and user must exchange coding rules a priori
and must share a common “language.”
BUT, IN THE INTRA-ORGANISM
COMMUNICATION THAT NEUROSCIENTISTS
STUDY,
• Channels are not fixed. They adapt their transition
probabilities over eons, or over milliseconds, in response to
the empirical distribution of the source.
• Future source data depends on past outputs to user.
• Time-varying joint source-channel coding often can be
efficiently performed by biochemical subsystems of
appropriate topology via simple probabilistic
transformations. No coding occurs in the classical
sense of information theory.
WAIT! What about DNA?
Long block code, discrete alphabet, extensive
redundancy, perhaps to control against the
infiltration of errors.
But DNA enables two organisms to communicate;
it’s designed for inter-organism communication.
DNA also controls gene expression, an intra-organism process, so a comprehensive theory of
intra-organism communication needs to address it
eventually.
ROBUST SHANNON-OPTIMAL PERFORMANCE WITHOUT CODING
Ex. 1: IID N(0, σ²) source, MSE distortion: R(D) = (1/2) log(σ²/D)
AWGN channel, input power S, noise power N: C = (1/2) log(1 + S/N)
Equating R(D) to C yields the Shannon-optimum mean distortion: D = σ²(1 + S/N)⁻¹
But this minimum possible MSE per unit variance can be achieved
simply by scaling the signal to the available channel input power level
and then scaling the channel output to produce the MMSE estimate!
[Figure: Source X ~ N(0, σ²) → gain √(S/σ²) → + noise N(0, N) → gain √(Sσ²)/(S+N) → Y → User, achieving E(X − Y)² = σ²(1 + S/N)⁻¹]
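As a quick numerical check of Ex. 1, here is a minimal sketch (the values σ² = 4, S = 2, N = 1 are purely illustrative, not from the talk) that applies the two scalings and compares the empirical MSE with σ²(1 + S/N)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, S, N = 4.0, 2.0, 1.0                     # source variance, channel input power, noise power (illustrative)
n = 1_000_000

X = rng.normal(0.0, np.sqrt(sigma2), n)          # IID N(0, sigma^2) source
channel_in = np.sqrt(S / sigma2) * X             # scale the signal to the available input power S
Z = channel_in + rng.normal(0.0, np.sqrt(N), n)  # AWGN channel
Y = np.sqrt(S * sigma2) / (S + N) * Z            # scale the channel output to form the MMSE estimate of X

print("empirical MSE:", np.mean((X - Y) ** 2))
print("Shannon limit:", sigma2 / (1.0 + S / N))  # the two agree to within sampling error
```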
ROBUST SHANNON-OPTIMAL PERFORMANCE WITHOUT CODING
Ex. 2: Bern-1/2 source, Hamming distortion: R(D) = 1 − h(D)
BSC(p) channel: C = 1 − h(p)
Equating R(D) to C yields the Shannon-optimum Hamming distortion: D = p
This minimum possible Hamming distortion obviously can be achieved
simply by feeding the source output directly into the channel and
sending the channel output directly to the user – no delay, no coding!!
[Figure: Bern-1/2 Source → X → BSC(p) → Y → User, with P(X ≠ Y) = p]
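Ex. 2 can be checked the same way; in the sketch below p = 0.1 is just an illustrative value, and the Bernoulli-1/2 source is piped straight through the BSC.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.1, 1_000_000                        # crossover probability (illustrative), number of bits

X = rng.integers(0, 2, n)                    # Bernoulli-1/2 source
flips = (rng.random(n) < p).astype(np.int64) # BSC(p) crossovers
Y = X ^ flips                                # uncoded transmission: source straight into the channel

print("Hamming distortion:", np.mean(X != Y), "vs. Shannon optimum p =", p)
```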
SHANNON OPTIMALITY IS ACHIEVED WITHOUT CODING OR
DELAY IN THESE TWO EXAMPLES BECAUSE:
Source is matched to the channel. Source outputs are
distributed over channel input space in a way that maximizes
the mutual info rate between the channel input and output
subject to operative constraint(s), thereby achieving capacity.
Channel is matched to the source. The channel transition
probability structure is optimum for the source and distortion
measure; i.e., it achieves the point on their rate-distortion
function at which the rate equals the channel’s capacity.
[INSPIRED BY MY ABOVE EXAMPLES 1 AND 2, B. RIMOLDI, M. GASTPAR AND
M. VETTERLI HAVE DETERMINED A BROAD CLASS OF EXAMPLES THAT
EXHIBIT SUCH DOUBLE MATCHING, FIRST WITHOUT AND LATER WITH
NOISELESS FEEDBACK OF THE CHANNEL OUTPUTS TO THE ENCODER.]
I CONTEND THAT MOST BIOLOGICAL
SYSTEMS HAVE EVOLVED TO BE
NEARLY DOUBLY MATCHED LIKE
THIS. THUS, THEY HANDLE DATA
OPTIMALLY WITH MINIMAL IF ANY
CODING AND NEGLIGIBLE DELAY.
Information theorists recently have come to appreciate
that near-optimum performance can be obtained in
many situations via relatively simple probabilistic
methods that employ feedback in the source encoder
and/or around the channel, and/or in the channel
decoder. Biology has known this for eons.
BUT THERE’S MORE! LIVING ORGANISMS
ARE INGENIOUSLY ENERGY-AWARE*.
THEY’RE OPTIMALLY DOUBLY MATCHED OVER A
WIDE RANGE OF POWER CONSUMPTION LEVELS.
THEY HAVE EVOLVED THE ABILITY TO CHANGE
THEIR INTERNAL CHANNEL TRANSITION
FUNCTIONS, OVER BOTH THE LONG RUN AND
THE SHORT RUN, TO MEET THE INFORMATION
RATE NEEDS OF THE APPLICATION AT HAND.
*The brain consumes 25-50% of the total metabolic energy budget of a
sedentary human. (L. Sokoloff (1989), “Circulation and energy
metabolism of the brain,” in Basic Neurochemistry: Molecular, Cellular
and Medical Aspects, 4th ed., G. Siegel et al., Eds.)
[Figure: Capacity C = (1/2) log(1 + S/N), in bits/s, plotted against average power S, in joules/s; the slope of a chord from the origin is (bits/s)/(joules/s) = bits/joule.
N.B. Increasing joules/s to get more bits/s requires expending more joules/bit!!]
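The N.B. on the capacity curve can be made concrete with a few numbers; the noise and power values below are purely illustrative.

```python
import numpy as np

N = 1.0                                     # noise power (illustrative)
for S in [1.0, 2.0, 4.0, 8.0, 16.0]:        # average power, joules/s
    C = 0.5 * np.log2(1.0 + S / N)          # capacity, bits/s
    print(f"S = {S:5.1f}  C = {C:5.2f} bits/s  C/S = {C / S:5.3f} bits/joule")
```

Bits/s grows with power, but bits/joule falls, which is the sense in which buying more bits/s costs ever more joules/bit.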
NEURON CARDINALITY
There are approximately 10^11 neurons in the human brain.
Each neuron forms synapses with between 10 and 10^5 others, resulting in a total of circa 10^15 synapses.
From age -1/2 to age +2, the number of synapses increases at a net rate of a million per second, day and night; many are abandoned, too.
It had long been believed that neuron and synapse formation effectively cease after age 1 and age 2, respectively, but recent studies have shown that they continue until at least age 6.
MULTICASTING:
• Viewed as a network, the human brain simultaneously multicasts 10^11 messages that have an average of 10^4 recipients each. Each of these 10^11 x 10^4 = 10^15 destinations receives a new binary digit – spike or no spike – once every 2.5 ms, which is the effective spike width.
• Moreover, 2.5 ms later another petabit that depends on the outcome of processing the previous one has been multicast. (The Internet pales by comparison!)
• The brain does not simply use store-and-forward routing. Rather, it uses an intensive form of network coding, the exciting new information-theoretic discipline recently introduced by Raymond Yeung and Bob Li. (See, e.g., the latest IT Outstanding Paper Award winning article by Yeung, Li, Ahlswede, and Cai.)
Time permitting, we shall see
below that the fact that neurons
actually fire asynchronously in
continuous time may enable
them to send considerably more
bps than their relatively low firing
rates would suggest.
DEFINITION OF A “TEAM” OF SENSORY NEURONS
THE AXONS IN A TEAM OF SENSORY NEURONS
FORM MANY OF THEIR SYNAPSES WITH OTHER
NEURONS IN THE TEAM (HORIZONTAL,
FEEDBACK). SOMETIMES THE LOCAL
CONNECTIVITY IS CLOSE TO 50%, AS OPPOSED TO
ONLY 10^-7 BRAINWIDE.
THE REMAINDER OF THE SYNAPSES TO WHICH
A TEAM’S AXONS ARE EFFERENT ARE SPLIT
BETWEEN “LOWER” NEURONS (TOP-DOWN
FEEDBACK) AND “HIGHER” NEURONS (BOTTOM-UP FEEDFORWARD).
TIME-DISCRETE
MODEL OF A “TEAM”
OF NEURONS
[Figure: time-discrete model of a team of neurons, showing PSPs.]
MAXIMUM INFORMATION RATE HYPOTHESIS
The process {X(k)} afferent to a team of neurons has the property that it maximizes the directed mutual information rate from {X(k)} to the efferent process {Y(k)} that it generates, where the maximization is over all processes that lead to the same or smaller energy expenditure in the Y-neurons.
Remarks: 1) Energy is expended in the synapses both in receiving and in responding to afferent excitation, and in the axons both to restore chemical concentrations during refractory periods following action potential generation and, to a lesser extent, to drive spikes down the axonal ‘transmission lines’.
2) Time permitting, directed information will be defined in a subsequent slide.
The Brain as a Markov Chain
MAIN THEOREM:
IF THE MAXIMUM INFORMATION RATE HYPOTHESIS IS TRUE, THEN:
• {(Xk, Yk)} is a first-order (non-homog) Markov chain
• {Yk} is a first-order (non-homog) Markov chain
• {Xk} is not necessarily Markovian
PROOF: Via the Berger-Ying lemmas. Joint work with Yuzheng Ying, to appear in IEEE IT Trans.
REMARKS:
• The max info rate hypothesis says the source {X(k)} is robustly “matched” to the channel’s transition matrix, P(y|x).
• If double matching prevails, as we suspect it does, then the QSF rate parameterizes the rate-distortion function, and distortion is measured by a Weber-Fechner fidelity criterion of the form
(1/n) Σ_{k=1}^{n} d(x(k), y(k), y(k-1)).
• The Markovianness of the Main Theorem is essential to
the brain’s low-latency processing of sensory information.
Without it, bottom-up delay would accumulate too fast to
allow for the number of hierarchy levels needed to achieve
the sophisticated distinctions of which the brain is capable.
NEURAL CODING AND SYNAPTIC CLOCKS
It is widely held that the principal, if not
the only, information transmission task a
neuron is called upon to perform is to
convey continually to its efferent cohort
the value of the afferent excitation
intensity (a.k.a. the “bombardment”) it
has recently been experiencing.
FIXED THRESHOLD NEURAL MODELS ARE PLAGUED
BY LARGE COEFFICIENTS OF VARIATION
Several investigators have studied the statistics
of the durations of interspike intervals (ISI’s) for
mathematical models of leaky, fixed-threshold
neurons. Both with and without a refractory
period included in the model, the ISI’s coefficient
of variation (i.e., the ratio σ/m of its standard
deviation to its mean) is greater than 1 over
almost the entire range of afferent excitation
levels of practical interest; the only exception is
at the highest excitation levels that result in the
neuron firing about as fast as it can (saturation).
This renders timing codes virtually useless,
leaving rate codes as the only means by which a
neuron can reliably communicate information to its
efferent neighbors about the bombardment intensity it
is currently experiencing.
However, that is in direct conflict with numerous
recent experiments which convincingly demonstrate
that many neurons in cortex and elsewhere exhibit
reliable ISI’s in response to repetitions of investigator-controlled stimuli. Also, animals can respond
intelligently at latencies which are substantially lower
than the time it would take for a hierarchy of rate
codes to achieve a useful level of statistical reliability.
A compelling (?) case for this has been made by Berger and Levy, “Encoding of Excitation via Dynamic Thresholding,” NEUROSCIENCE 2004, San Diego, CA, 10/23-28/2004.
[Figure: Mean PSP vs. time (ms) for various bombardment intensities (annotated “Increasing λ”).]
[Figure: Filtered Poisson PSPs vs. time (ms).]
[Figure: Fixed threshold and descending threshold overlaid on red and blue PSP traces; spiking times of the red and blue PSPs are marked for the descending threshold and for the fixed threshold.]
DYNAMICALLY DESCENDING
THRESHOLDS ENABLE TIMING CODES
A descending threshold can serve as
a simple mechanism by means of
which a neuron can accurately
convert (i.e., encode) - into the
duration of the ISI between any two of
its successive AP’s - the value of the
excitation intensity it has experienced
during said ISI. This statement is true
regardless of whether the intensity in
question is strong, moderate, or weak.
A neuron that possesses a fixed
threshold cannot accomplish this.
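A minimal simulation can illustrate the claim. The sketch below drives a leaky, filtered-Poisson PSP toward an exponentially decaying threshold; all parameter values (time constants, threshold height, weights) are assumptions for illustration, not values from Berger and Levy.

```python
import numpy as np

rng = np.random.default_rng(1)

def isi_with_descending_threshold(lam, dt=0.1, tau=20.0, theta0=15.0, tau_thr=80.0, w=1.0):
    """Return one ISI (ms) under Poisson bombardment of intensity lam (spikes/ms).
    The PSP is a leaky sum of unit synaptic inputs; the threshold decays from theta0."""
    psp, t = 0.0, 0.0
    while True:
        t += dt
        psp += -psp * dt / tau + w * rng.poisson(lam * dt)   # leaky, filtered-Poisson PSP
        if psp >= theta0 * np.exp(-t / tau_thr):             # dynamically descending threshold
            return t

for lam in [0.2, 0.5, 1.0, 2.0]:
    isis = [isi_with_descending_threshold(lam) for _ in range(200)]
    print(f"lam = {lam:3.1f}  mean ISI = {np.mean(isis):6.1f} ms  CV = {np.std(isis)/np.mean(isis):4.2f}")
```

In this toy model the mean ISI falls monotonically as the bombardment intensity rises, so the ISI duration itself carries the value of λ, weak intensities included; a fixed threshold set above the weak-λ steady-state PSP would simply never be crossed.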
It is also known that synapses possess chemical “clocks” that enable them to “remember,” even for hundreds of milliseconds, how long ago their most recent and next-to-most recent afferent spikes arrived.
ALL THIS LEADS ME TO BELIEVE THAT
NEURONS DO INDEED IMPLEMENT
ACCURATE, LOW-LATENCY TIMING CODES
BY MEANS OF DYNAMIC POST-SYNAPTIC
POTENTIAL THRESHOLDS THAT DECAY
WITH TIME.
Alternatively, a neuron also can achieve much the same result by having a post-synaptic leakage conductance that varies inversely with PSP. (See, e.g., Brette and Gerstner, 2005.)
It may well be that neurons employ a combination of threshold decay and variable leakage conductance. However, in what follows we use only threshold decay terminology.
1. The precise shape of the threshold decay curve is
not important; the neurons in the efferent cohort
can readily adapt to the shape of T(t).
2. The resulting variance in estimating λ has the form Var(λ̂ − λ) = c1.
3. If instead you are interested in estimating log λ, Var[(log λ)^ − log λ] = c2/λ.
4. To estimate the accuracy of ISI encoding of
bombardment intensity, one must take into account
at least the following three sources of imprecision:
i) Imprecision in the instant of generation of an AP
ii) Imprecision in the rates of axonal propagation
along the axon for two successive action potentials
iii) Imprecision in the estimate of the AP’s time of arrival
at the synapse. (See Berger and Suksompong, IEEE
ISIT, Seattle, July 9-15, 2006.) Doing so shows that
neural encoding bit rates can be meaningfully higher
than previously had been thought!
5. If the excitation is a time-varying Poisson process,
then its intensity λ(t) is a sufficient statistic for
stochastically describing it, so it is the only thing that
needs to be communicated.
6. The excitation of a (cortical) neuron is indeed robustly
a time-varying Poisson process, despite the individual
spike trains of which it is composed not being Poisson
and possibly being highly correlated. (This is a
consequence of Stein-Chen Poisson approximation
theory; cf. C. Stein, IMS Lecture Notes, vol. 78,
Lecture VIII, IMS, Hayward, CA, 1986, and
subsequent work of Barbour et al., among others.)
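Item 6 is easy to see numerically. The sketch below pools many sparse, decidedly non-Poisson spike trains (each with a hard 20 ms refractory period) and computes the Fano factor of the pooled counts; all rates and window sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def refractory_train(exp_rate, T, refractory=20.0):
    """One non-Poisson spike train on [0, T] ms: a hard refractory period plus an exponential wait."""
    t, spikes = 0.0, []
    while True:
        t += refractory + rng.exponential(1.0 / exp_rate)
        if t > T:
            return spikes
        spikes.append(t)

T, n_trains = 2000.0, 10_000                       # ms of simulation, number of afferent trains
pooled = np.concatenate([refractory_train(0.001, T) for _ in range(n_trains)])

# Count pooled spikes in 10-ms windows (skipping an initial transient) and form the Fano factor,
# which equals 1 for a Poisson process.
counts, _ = np.histogram(pooled, bins=np.arange(200.0, T + 10.0, 10.0))
print("Fano factor of the pooled process:", counts.var() / counts.mean())   # close to 1
```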
A CHALLENGING, IMPORTANT
QUESTION ABOUT RNN’s
Consider a sparsely connected, feedback-heavy
network of hundreds of millions of neurons most
of which have an in-degree and out-degree of
circa 10,000. When galvanized by sensory inputs
and exchanging their excitation histories in the
manner described above, what kinds of decisions,
computations, and responses can such a network
generate? (N.B. The excitation history that a
neuron communicates does not directly
propagate beyond its first-tier neighbors.)
[FIG. 1, repeated for reference: BLOCK DIAGRAM OF MARKOV-MARKO BRAIN MODEL – the same loop of Sensory System (Coding, Computation), Motor System (Control), Environment, and Selector (Cognition) shown earlier.]
ACTIVITY DURING TIME SLOT k
AT THE START OF TIME SLOT k , e(k-1), s(k-1),
v(k-1) and m(k-1) ALL EXIST ALREADY.
AS SLOT k PROGRESSES, FIRST v(k), NEXT
m(k), NEXT e(k), AND FINALLY s(k) GET
PRODUCED IN THAT ORDER.
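To make the update order concrete, here is a minimal sketch of one pass around the Fig. 1 loop; the particular functional forms of p1, p2, p3 and the Selector are placeholders I have assumed, and only the conditioning variables and the order v(k) → m(k) → e(k) → s(k) come from the model.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_p1(s_prev, v_prev, m_prev):    # sensory system / brain: v(k) ~ p1(. | s(k-1), v(k-1), m(k-1))
    return np.tanh(0.5 * s_prev + 0.3 * v_prev + 0.2 * m_prev) + 0.1 * rng.normal()

def sample_p2(v, m_prev):                 # motor system: m(k) ~ p2(. | v(k), m(k-1))
    return 0.7 * v + 0.3 * m_prev + 0.1 * rng.normal()

def sample_p3(m, e_prev):                 # environment: e(k) ~ p3(. | m(k), e(k-1))
    return 0.5 * m + 0.5 * e_prev + 0.1 * rng.normal()

def selector(e):                          # selector: which aspect of e(k) becomes the next stimulus s(k)
    return e + 0.1 * rng.normal()

s, v, m, e = 0.0, 0.0, 0.0, 1.0           # s(k-1), v(k-1), m(k-1), e(k-1) all exist at the start of slot k
for k in range(5):
    v = sample_p1(s, v, m)                # first v(k)
    m = sample_p2(v, m)                   # next m(k)
    e = sample_p3(m, e)                   # next e(k)
    s = selector(e)                       # finally s(k)
    print(k, round(v, 3), round(m, 3), round(e, 3), round(s, 3))
```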
MARKOV REVISITED
THE NOTATION USED IN THE BOXES IN FIG. 1, e.g.,
p1(v(k) | s(k-1), v(k-1), m(k-1)),
IMPLIES THAT THE CONDITIONAL PROBABILITY OF THE RANDOM VECTOR APPEARING BEFORE THE CONDITIONING BAR WOULD NOT CHANGE IF ONE WERE TO INCLUDE AFTER THE CONDITIONING BAR TIME-PREDECESSORS OF ONE OR MORE OF THE VECTORS THAT CURRENTLY APPEAR THERE. THAT IS, THE MODEL TREATS THE (SENSORY, MOTOR, ENVIRONMENT) DYNAMIC SYSTEM AS JOINTLY FIRST-ORDER MARKOV.
SURELY, THIS IS ONLY AN APPROXIMATION TO REALITY. HOWEVER, THE NEXT TWO SLIDES DISCUSS HOW TO BUILD THE SENSORY PORTION OF THE MODEL SO THAT IT ACCURATELY RESPECTS THE NEUROBIOLOGY WHILE AT THE SAME TIME BEING FIRST-ORDER MARKOV.
BRAIN STATE AS A FIRST-ORDER MARKOV PROCESS
IT DOES NOT SUFFICE TO USE AS THE STATE OF THE BRAIN AT TIME k
A BINARY VECTOR WHOSE jth COMPONENT EQUALS 1 IF NEURON j
HAS FIRED DURING SLOT k-1 AND 0 IF IT HAS NOT. THAT’S BECAUSE
THE NEURONS THAT HAVE NOT FIRED DURING THE LAST SLOT
CARRY OVER INTO THE NEXT SLOT INFORMATION ABOUT THE SIZE
OF THEIR SUB-THRESHOLD PSP’s AND THE STATUS OF CERTAIN OF
THEIR SYNAPTIC CLOCKS.
INSTEAD, WE INTRODUCE A STATE VECTOR L(k) WHOSE jth
COMPONENT IS THE NUMBER OF TIME SLOTS THAT HAVE
TRANSPIRED SINCE THE LAST SLOT IN WHICH NEURON j GENERATED
A SPIKE. THE COMPONENTS OF L(k) THAT ARE ZERO INDEX THE SET
OF NEURONS THAT HAVE JUST FIRED IN THE PREVIOUS SLOT, SO
THIS SUBSUMES THE USUAL STATE VECTOR. MOREOVER, IT
ALLOWS US TO TAKE DYNAMIC THRESHOLDS INTO ACCOUNT, WITH
ABSOLUTE REFRACTORINESS CORRESPONDING TO A THRESHOLD
THAT IS INFINITELY HIGH DURING THE SLOT IMMEDIATELY
FOLLOWING ONE IN WHICH A NEURON HAS FIRED. L(k) CAPTURES
EVERYTHING THAT MATTERS EXCEPT QUANTAL SYNAPTIC FAILURE
(QSF), WHICH WE ADDRESS ON THE NEXT SLIDE.
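The bookkeeping for L(k) is simple; the sketch below tracks it for a toy population. The firing rule is a placeholder (I assume a threshold that relaxes with time since the last spike), but the reset-to-zero / age-by-one update and the absolute refractoriness in the slot right after a spike follow the slide.

```python
import numpy as np

rng = np.random.default_rng(4)
n_neurons = 8
L = np.zeros(n_neurons, dtype=int)               # slots since last spike; pretend all fired in slot -1

def fires_this_slot(L):
    # Placeholder firing probability: 0 in the slot right after a spike (absolute refractoriness,
    # i.e., an infinitely high threshold), then increasingly likely as the neuron's threshold relaxes.
    p = np.where(L == 0, 0.0, 1.0 - np.exp(-0.3 * L))
    return rng.random(L.size) < p

for k in range(6):
    spikes = fires_this_slot(L)
    L = np.where(spikes, 0, L + 1)               # component j of L(k): 0 if neuron j just fired, else age by one
    print(f"slot {k}: L(k) = {L}")
```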
BRAIN STATE AUGMENTED BY QSF DATA
QSF’s PROVIDE A POTENT MECHANISM FOR MAKING THE CONDITIONAL DISTRIBUTIONS IN THE THREE MAIN BOXES OF OUR MODEL GENUINELY PROBABILISTIC. THIS IS CRUCIAL TO MANY PHENOMENA OF NEUROSCIENTIFIC INTEREST, INCLUDING THE BUILDING OF AN INTERNAL STOCHASTIC MODEL OF THE ENVIRONMENT, THE RANDOM NATURE OF WHICH CAN BE VARIED RAPIDLY OVER A LARGE DYNAMIC RANGE.
INCORPORATING QSF’s NECESSITATES INCREASING THE SIZE OF THE STATE VECTOR FROM THE NUMBER OF NEURONS TO THE NUMBER OF SYNAPSES, A FACTOR OF ABOUT 10^4 IN THE CASE OF THE HUMAN BRAIN. THE COMPONENTS THUS ADDED ARE BINARY, EQUALING 1 IF THE LAST SPIKE AFFERENT TO NEURON j FROM NEURON i WAS FAILED AND 0 IF IT WASN’T. THIS IS BECAUSE THE CONDITIONAL PROBABILITY THAT THE NEXT SPIKE TO ARRIVE AT SYNAPSE (i,j) WILL BE FAILED DEPENDS BOTH ON HOW LONG IT HAS BEEN SINCE A SPIKE LAST ARRIVED THERE AND ON WHETHER OR NOT THAT SPIKE WAS FAILED. WITH THIS AUGMENTATION WE GET A HIGHLY ACCURATE FIRST-ORDER MARKOV MODEL OF THE BRAIN.
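A corresponding sketch of the QSF part of the augmented state: for each synapse (i, j) we keep how long ago the last afferent spike arrived and whether it was failed, and let the next failure probability depend on both, as the slide describes. The particular numbers in qsf_prob are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)

def qsf_prob(slots_since_spike, last_failed):
    # Placeholder conditional failure probability: it depends both on how long ago the
    # last afferent spike arrived and on whether that spike was itself failed.
    base = 0.6 if last_failed else 0.3
    return base * np.exp(-0.05 * slots_since_spike)

# Per-synapse augmentation of the state: (slots since last afferent spike, was that spike failed?).
synapse = {"slots_since_spike": 7, "last_failed": False}

def spike_arrives(state):
    failed = rng.random() < qsf_prob(state["slots_since_spike"], state["last_failed"])
    return {"slots_since_spike": 0, "last_failed": failed}, failed

def slot_passes(state):
    return {**state, "slots_since_spike": state["slots_since_spike"] + 1}

synapse, failed = spike_arrives(synapse)   # a spike arrives at synapse (i, j); was it failed?
print("failed:", failed, "state:", synapse)
synapse = slot_passes(synapse)             # a slot with no afferent spike just ages the clock
print("state:", synapse)
```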
MARKO REVISITED
NEURONS IN A CORTICAL REGION, SAY V2, RECEIVE SOME OF THEIR
INPUTS DIRECTLY FROM OTHERS IN V2 (HORIZONTAL), SOME FROM
OTHERS IN V3 AND ABOVE (TOP DOWN), AND SOME FROM OTHERS IN
V1 AND BELOW (BOTTOM UP).
AS A CONSEQUENCE THE INFORMATION THESE NEURONS TRANSMIT
TO OTHERS VIA THEIR AXONAL SPIKES IS DYNAMICALLY DETERMINED
IN REAL TIME BY THE INPUTS THEY ARE STEADILY RECEIVING. THE
NEURONS THAT CONSTITUTE V2 THEREFORE ARE NOT
INFORMATION SOURCES IN THE SHANNON SENSE. THAT IS, THEY DO
NOT GENERATE DATA A PRIORI AND INDEPENDENTLY OF WHAT THEY
HEAR FROM THOSE WITH WHOM THEY ARE CONVERSING. THEIR
OUTPUTS ARE INSTEAD HEAVILY INFLUENCED BY INPUTS THEY HAVE
RECEIVED FROM OTHERS IN BOTH THE RECENT AND THE DISTANT
PAST. SUCH SOURCES THUS SUBSCRIBE TO THE COMMUNICATION
MODEL INTRODUCED BY MARKO. (H. Marko, The bidirectional communication theory: A generalization of information theory, IEEE Trans. Comm., vol. COM-21, pp. 1345-1351, December 1973.)
[Figure: a Control Link and several Comm Links connecting NASA Houston to the remotely controlled system.]
CANONICAL REMOTE CONTROL PROBLEM
REPRESENTATION OF THE ENVIRONMENT
We subscribe to the view that, within its brain, a healthy organism
steadily builds, refines, extends and modifies a model of its
environment. We view this model not as some mystical or
metaphysical construct but rather as being instantiated as a collection
of interacting neurons. The model may be located in a particular
region or regions of the brain, but its crucial importance militates for it
being widely distributed over much if not all of the brain. Most of the
basic infrastructure of the model is forged during gestation according
to genetic prescriptions, including the design of the fundamental
mechanisms by means of which the model subsequently will be
extended and modified based on acquired experience.
The posited model constitutes an internal representation of the
external environment. As such, it is the mechanism by which the
organism persistently seeks to solve the “representation problem” of
neuropsychology with ever-increasing sophistication.
THE REASON FOR MODEL BUILDING
An organism’s principal reason for constructing
and continually updating its internal model of the
environment is to learn how to better control that
environment. If no physical actions are taken, the
organism effectively defaults on any attempt at
environmental control. The sine qua non, then, is
to learn to generate the most effective motor
responses possible based on the environmental
stimuli acquired by the sensory organs.
ESTIMATING ENVIRONMENTAL RESPONSE
An organism can use its internal model of the environment to
generate estimates of how the environment will react to
prospective motor controls. Depending upon the amounts of
time, computational ability, and energy consumption that are
permissible in a given situation, the organism may be able to
input many prospective motor controls to the environmental
model. In this connection, since the actual environment
contains sources of randomness due both to stochastic
natural phenomena and to the usually unpredictable actions
of other denizens of the environment, an organism’s model of
it should be similarly stochastic. (QSF’s may play a major
role in producing this stochasticity.) Therefore, better
estimates may result if a given prospective control is put into
the model more than once and statistics are gathered about
the set of resulting responses of the model.
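In engineering terms this is Monte Carlo evaluation of candidate controls against a stochastic internal model. The sketch below scores a handful of prospective motor controls by running each through an assumed (placeholder) stochastic model several times and gathering statistics of the predicted response.

```python
import numpy as np

rng = np.random.default_rng(6)

def internal_model(e_prev, m):
    # Placeholder stochastic internal model of the environment: predicts e(k) from e(k-1) and
    # a prospective motor control m(k); the noise stands in for QSF-driven stochasticity.
    return 0.6 * e_prev + 0.4 * m + 0.2 * rng.normal()

def score(m, e_prev, desired, n_rollouts=50):
    predictions = np.array([internal_model(e_prev, m) for _ in range(n_rollouts)])
    return np.mean((predictions - desired) ** 2)       # expected squared miss of the desired outcome

e_prev, desired = 1.0, 0.0
candidates = np.linspace(-2.0, 2.0, 9)                 # prospective motor controls to try in the model
best = min(candidates, key=lambda m: score(m, e_prev, desired))
print("best prospective control:", best)
```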
THE PERFORMANCE CRITERION
Adopting the block diagram of Figure 1, and also
subscribing to the view that an organism is always
engaged in building and exercising a model of its
environment in the manner described in the preceding
slides, leads to the following conclusion:
The purpose of processing sensory stimuli
is less to convey to the top brain what
stimuli have been sensed in the past than it
is to enable the brain to better predict what
stimuli will be sensed in the future.
THE PERFORMANCE CRITERION (Cont.)
IN SYMBOLS, THE SENTIMENT EXPRESSED IN THE PREVIOUS SLIDE IS THAT THE DISTORTION MEASURE TO BE APPLIED IN TIME SLOT k IS NOT OF THE FORM
d(s(k-1), v(k))
BUT INSTEAD IS OF THE FORM
d(ŝ(k), s(k)),
WHERE ŝ(k) IS THE BRAIN’S ESTIMATE OF WHAT s(k) WILL BE BASED ON THE m(k) IT INPUTS TO THE ENVIRONMENT, AS CALCULATED DURING SLOT k ON THE BASIS OF THE v(k) DERIVED FROM PROCESSING s(k-1).
MASSEY REVISITED
DIRECTED INFORMATION WAS INTRODUCED IN A PAIR OF CHARACTERISTICALLY BEAUTIFUL PAPERS BY JIM MASSEY.* AMONG OTHER THINGS, MASSEY SHOWED THAT THE CAPACITY OF A CHANNEL WITH MEMORY AND FEEDBACK IS GIVEN BY THE SUPREMUM OF THE DIRECTED INFORMATION RATE FROM THE CHANNEL’S INPUT TO ITS OUTPUT THAT HE INTRODUCED THEREIN, AS OPPOSED TO THE SUPREMUM OF SHANNON’S MUTUAL INFORMATION RATE, WHICH HE SHOWED IS IN GENERAL STRICTLY GREATER. (S. TATIKONDA HAS SINCE PROVED THE CORRESPONDING CONVERSE THEOREM.)
*1. J. L. Massey, Causality, feedback and directed information, Proceedings of the International Symposium on Information Theory and its Applications, Honolulu, HI, Nov. 27-30, 1990.
2. J. L. Massey, Network information theory – some tentative definitions, DIMACS Workshop on Network Information Theory, March 17, 2003.
MASSEY REVISITED (Cont.)
BUT THE MASSEY-TATIKONDA THEOREM ASSUMES A SHANNON-STYLE SOURCE – ONE OF 2^TR PRE-GENERATED MESSAGES TO BE SENT DURING AN INTERVAL OF DURATION T. SINCE OUR (S,M,E) MODEL USES MARKO-STYLE SOURCES, THE M-T THEOREM IS NOT APPLICABLE TO IT. PERHAPS IT WILL TURN OUT THAT DIRECTED INFORMATION IS RELEVANT TO THE PROBLEM OF NEURAL CODING AND LEARNING, BUT AT PRESENT I SEE NO COMPELLING REASON TO BELIEVE THAT IS THE CASE.
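For reference, Massey’s directed information (the quantity the two slides above turn on) is, in standard notation, the following; the inequality is the chain-rule comparison with ordinary mutual information alluded to above.

\[
I(X^n \to Y^n) \;=\; \sum_{k=1}^{n} I\left(X^k;\, Y_k \mid Y^{k-1}\right)
\;\le\; \sum_{k=1}^{n} I\left(X^n;\, Y_k \mid Y^{k-1}\right) \;=\; I(X^n; Y^n).
\]

With feedback the left-hand sum can be strictly smaller than the mutual information on the right, which is why feedback capacity is characterized by directed rather than mutual information.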
BERGER-YING LEMMAS
REGARDLESS OF WHETHER MUTUAL INFORMATION OR
DIRECTED INFORMATION IS USED, THE BERGER-YING
LEMMAS WILL APPLY. THE B-Y LEMMAS SAY THAT, IF IT IS
DESIRED TO MAXIMIZE THE RATE AT WHICH EITHER
INFORMATION OR DIRECTED INFORMATION IS SENT PART
WAY OR ALL THE WAY AROUND THE LOOP FROM {s(k)} TO
{v(k)} TO {m(k)} TO {e(k)}, THEN THE PROCESSES INVOLVED IN
THAT PORTION OF THE LOOP WILL BE JOINTLY FIRST-ORDER
MARKOV. MOREOVER, EACH OF THEM, EXCEPT PERHAPS
{s(k)}, WILL BE INDIVIDUALLY FIRST-ORDER MARKOV. THESE
FACTS REMAIN TRUE EVEN IF CONSTRAINTS ARE IMPOSED
ON THE EXPECTED VALUES OF ONE OR MORE FUNCTIONS
OF {s(k-1), v(k), m(k), e(k), v(k-1), m(k-1), e(k-1)}; THIS INCLUDES
CONSTRAINTS ON ENERGY USAGE.
“We have knowledge of the past, but we
can’t control it. We can control the
future, but we have no knowledge of it.”
CLAUDE E. SHANNON, 1960
THE BRAIN IS A WONDERFUL ORGAN.
IT STARTS WORKING THE MOMENT YOU
GET UP IN THE MORNING AND DOES NOT
STOP UNTIL YOU GET TO THE OFFICE.
Robert Frost (1874-1963)
THE END
Temporal Dynamics of a V1 Neuron’s Response to Real
and Illusory Contours
From T. S. Lee and M. Nguyen, Dynamics of subjective contour formation in the early visual cortex. PNAS 98(4):1907-1911, 2001.