Learning, Volatility and the ACC

Tim Behrens
FMRIB + Psychology, University of Oxford
FIL - UCL.
[Figure, panel B: Reward History Weight (β) plotted against Trials Into Past (i-1 to i-8); CON group]
Kennerley et al., Nature Neuroscience, 2006
[Figure, panel B: Reward History Weight (β) plotted against Trials Into Past (i-1 to i-8); CON and ACCs groups]
Kennerley et al., Nature Neuroscience, 2006
Monkeys will sacrifice food
opportunities to look at other monkeys
ACCG
Rudebeck et al., Science, 2006
Interest in other individuals
is reduced after ACC gyrus lesion
ACCG
Rudebeck et al., Science, 2006
Anatomy - Differences in connections
between ACCs and ACCg.
• Connections unique to the sulcus are mainly with motor regions:
• Primary motor cortex
• Premotor cortex
• Parietal motor areas
• Spinal Cord
ACCs has information about our own
actions
Anatomy - Differences in connections
between ACCs and ACCg.
• Connections unique to the gyrus are mainly with regions that process emotional and biological stimuli:
• Periaqueductal grey
• Hypothalamus
• STS/STG
• Insula/Temporal pole connections are stronger to the gyrus
ACCg has access to information about other
agents.
Anatomy - shared connections
between ACCs and ACCg.
• Some shared connections
• Orbitofrontal cortex
• Amygdala
• Ventral striatum
ACCg and ACCs are strongly interconnected
Both regions have access to and
influence over reward and value
processing.
ACC Sulcus and learning
about your actions.
[Figure, panel B: Reward History Weight (β) plotted against Trials Into Past (i-1 to i-8); CON and ACCs groups]
Kennerley et al., Nature Neuroscience, 2006
What determines the integration length?
[Figure: Reward History Weight (β) plotted against Trials Into Past (i-1 to i-8); CON group]
Kennerley et al. Nat Neurosci 2006
Sugrue et al. Science 2005
VOLATILE: Reward probabilities change approximately every 25 trials
STABLE: Reward probabilities change only after hundreds of trials
[Figure: Reward History Weight (β) plotted against Trials Into Past (i-1 to i-8); CON group]
Kennerley et al. Nat Neurosci 2006
Sugrue et al. Science 2005
Reinforcement learning
• We need to continually re-appraise the value of an action based on each new experience.
[Diagram: outcome, prediction ($V_t$), new prediction ($V_{t+1}$), weighted prediction error ($\alpha \times \delta$)]
Updating beliefs on the basis of new
information
$V_{t+1} = V_t + \alpha \times \delta$ (α: learning rate; δ: prediction error)
The prediction error
is the information
available from this event
The learning rate is the
weight given to the
current information
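The update described on this slide (new prediction = old prediction + learning rate × prediction error) can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `update_value`, the starting prediction of 0.5, the learning rate of 0.5 and the outcome sequence are all assumptions chosen to make the arithmetic easy to follow.

```python
def update_value(prediction, outcome, learning_rate):
    """One delta-rule step: V_{t+1} = V_t + alpha * delta."""
    prediction_error = outcome - prediction  # delta: information in this event
    return prediction + learning_rate * prediction_error  # alpha weights it


# Hypothetical example: start undecided, then observe four outcomes
# (1 = rewarded, 0 = not rewarded).
v = 0.5
trajectory = [v]
for outcome in [1, 1, 0, 1]:
    v = update_value(v, outcome, learning_rate=0.5)
    trajectory.append(v)

print(trajectory)
```

With a learning rate of 0.5 each new outcome moves the prediction halfway towards what was just observed; a smaller learning rate would move it less.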
The learning rate and the value of
information.
$V_{t+1} = V_t + \alpha \times \delta$
The learning rate should represent the
value of the current information
for guiding future beliefs.
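A small simulation can illustrate why the useful learning rate depends on how quickly the environment changes. This is a sketch under stated assumptions, not the analysis from the cited studies: the reward probabilities (0.9 and 0.1), the two learning rates (0.02 and 0.2), the 2000-trial window, the fixed random seed and the function names `reward_schedule` and `tracking_error` are illustrative choices. The volatile schedule flips every 25 trials, as on the VOLATILE slide; the stable schedule is simplified to no change inside the simulated window.

```python
import random


def reward_schedule(n_trials, block_length, p_high=0.9):
    """True reward probability on each trial; flips between p_high and
    1 - p_high at the start of every new block."""
    return [p_high if (t // block_length) % 2 == 0 else 1.0 - p_high
            for t in range(n_trials)]


def tracking_error(schedule, alpha, rng, v0=0.5):
    """Mean squared gap between the delta-rule estimate and the true
    reward probability, measured before each update."""
    v, total = v0, 0.0
    for p in schedule:
        total += (v - p) ** 2
        outcome = 1.0 if rng.random() < p else 0.0  # sampled reward
        v += alpha * (outcome - v)                  # delta-rule update
    return total / len(schedule)


n = 2000
volatile = reward_schedule(n, block_length=25)
stable = reward_schedule(n, block_length=n)  # assumption: no change in window

errors = {
    (name, alpha): tracking_error(schedule, alpha, random.Random(0))
    for name, schedule in [("volatile", volatile), ("stable", stable)]
    for alpha in (0.02, 0.2)
}
for key, value in errors.items():
    print(key, round(value, 4))
```

Under these assumptions the higher learning rate tracks the volatile schedule more closely, while the lower learning rate gives the steadier estimate in the stable schedule: recent outcomes are worth more when the world changes often.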
Relationship with integration length
[Figure: curves shown for α = 0.01, α = 0.1 and α = 0.4]
[Figure: task display showing the numbers 37 and 63, labelled "stable"]
Behrens et al., Nature Neuroscience, 2007
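The link between learning rate and integration length can be made explicit. Unrolling the delta rule gives V_{t+1} = (1-α)^{t+1} V_0 + α Σ_k (1-α)^k r_{t-k}, so the outcome k trials back carries weight α(1-α)^k. The sketch below computes those weights for the three learning rates shown on the slide. The function names, the 8-trial look-back (chosen to match the i-1 to i-8 axis of the earlier figures) and the "decay length" measure are assumptions made for illustration.

```python
import math


def outcome_weights(alpha, n_back):
    """Weight the delta rule gives the outcome k trials back
    (k = 0 is the most recent outcome, i.e. trial i-1)."""
    return [alpha * (1 - alpha) ** k for k in range(n_back)]


def decay_length(alpha):
    """Number of trials over which the weights fall by a factor of e
    (a hypothetical summary of integration length)."""
    return -1.0 / math.log(1.0 - alpha)


def run_delta_rule(outcomes, alpha, v0=0.0):
    """Apply V <- V + alpha * (outcome - V) across a list of outcomes."""
    v = v0
    for r in outcomes:
        v += alpha * (r - v)
    return v


for alpha in (0.01, 0.1, 0.4):
    weights = outcome_weights(alpha, n_back=8)
    print(f"alpha={alpha}: weights i-1..i-8 = "
          f"{[round(w, 3) for w in weights]}, "
          f"decay length = {decay_length(alpha):.1f} trials")
```

A small learning rate spreads nearly equal weight over many past trials (long integration), whereas a large learning rate concentrates the weight on the last few outcomes (short integration).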
$V_{t+1} = V_t + \alpha \times \delta$
Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007
Changes in reward estimates occur throughout the task…
…as do changes in volatility estimates
Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007
[Figure labels: Monitor x Volatility; Decide; Monitor]
Behrens et al., Nature Neuroscience, 2007
ACC effect size predicts learning rate
across subjects
Behrens, Woolrich, Walton & Rushworth Nat Neurosci 2007
ACC Gyrus and learning
about your social partners.
Interest in other individuals
is reduced after ACC gyrus lesion
ACCG
Rudebeck et al., Science, 2006
Learning about other agents
[Figure: task display showing the numbers 37, 63 and 25]
Behrens, Hunt, Woolrich, Rushworth Nature 2008
Sources of information
Probability that correct colour is blue
Probability that confederate advice is good
Value of action information
Value of social information
Behrens, Hunt, Woolrich, Rushworth Nature 2008
Social information is integrated over
time - behaviour
$V_{t+1} = V_t + \alpha \times \delta$
Reward Prediction Error
Reward - Expectation
[Figure: effect size over time, relative to outcome]
Behrens, Hunt, Woolrich, Rushworth Nature 2008
$V_{t+1} = V_t + \alpha \times \delta$
Prediction error on a social partner.
Lie event - Lie prediction
[Figure: effect size over time, relative to outcome]
Behrens, Hunt, Woolrich, Rushworth Nature 2008
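The two prediction errors above, one on reward and one on the social partner, can be pictured as two learners updated side by side from the same trial. The sketch below is a deliberate simplification that uses a plain delta rule for both quantities; it is not the model from the cited study. The function names, the learning rates, the starting values of 0.5 and the use of "green" as the second colour are assumptions for illustration.

```python
def delta_update(value, outcome, alpha):
    """Delta rule: move the estimate towards the observed outcome."""
    return value + alpha * (outcome - value)


def learn_in_parallel(trials, alpha_reward=0.5, alpha_social=0.5, start=0.5):
    """Track two probabilities from the same sequence of trials.

    Each trial is (correct_colour, advised_colour).
    p_blue:        probability that the correct colour is blue
    p_advice_good: probability that the confederate's advice is good
    """
    p_blue, p_advice_good = start, start
    history = []
    for correct_colour, advised_colour in trials:
        blue_outcome = 1.0 if correct_colour == "blue" else 0.0
        advice_outcome = 1.0 if advised_colour == correct_colour else 0.0
        # Reward prediction error: reward - expectation
        p_blue = delta_update(p_blue, blue_outcome, alpha_reward)
        # Social prediction error: advice outcome - advice prediction
        p_advice_good = delta_update(p_advice_good, advice_outcome,
                                     alpha_social)
        history.append((p_blue, p_advice_good))
    return history


# Hypothetical trials: good advice, misleading advice, good advice.
example = [("blue", "blue"), ("blue", "green"), ("green", "green")]
for step in learn_in_parallel(example):
    print(step)
```

Keeping separate learning rates for the two streams allows each source of information to be weighted by how informative it currently is.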
$V_{t+1} = V_t + \alpha \times \delta$
The value of information and the ACC
Value of reward information
Value of social information
$V_{t+1} = V_t + \alpha \times \delta$
Combining Information to drive
behaviour
Conclusions
• ACC codes a learning signal when information
is observed.
• This signal predicts the speed of learning.
• Learning from our own actions and learning from others’ actions are processed in parallel in ACCs and ACCg.
• The outputs of these parallel learning
Acknowledgments
• Matthew Rushworth
• Mark Woolrich
• Laurence Hunt
• Mark Walton