Transcript Document
Value-based decision making: behavior and theory
Institute for Theoretical Physics and Mathematics, Tehran, January 2006
Leo Sugrue, Greg Corrado

The standard picture: SENSORY INPUT → low-level sensory analyzers → DECISION MECHANISMS → motor output structures → ADAPTIVE BEHAVIOR. To study valuation we add a stage: the decision mechanisms maintain a representation of stimulus/action value that is updated by REWARD HISTORY.

How do we measure value? The Matching Law (Herrnstein RJ, 1961): the fraction of choices allocated to an option matches the fraction of rewards obtained from it. (Figure: choice fraction plotted against reward fraction.)

Three questions frame the talk:
• Behavior: What computation does the monkey use to 'match'?
• Theory: Can we build a model that replicates the monkeys' behavior on the matching task? How can we validate the model's performance? Why is a model useful?
• Physiology: What are the neural circuits and signal transformations within the brain that implement the computation?

An eye movement matching task. (Figure: baiting fractions across blocks: 6:1, 6:1, 2:1, 1:1, 1:2, 1:2, 1:6.)

Dynamic matching behavior. (Figures: cumulative rewards and responses over the session.) The relation between reward and choice is local: the choice fraction tracks recently earned rewards rather than the session as a whole.

How do they do this? What local mechanism underlies the monkey's choices in this game? To estimate this mechanism we need a modeling framework.

Linear-Nonlinear-Poisson (LNP) models of choice behavior: strategy estimation is straightforward in this framework.

Estimating the form of the linear stage: how do animals weigh past rewards in determining their current choice?

Estimating the form of the nonlinear stage: how is differential value mapped onto the animal's instantaneous probability of choice? (Figure: probability of choosing red versus differential value, in rewards, for Monkey F and Monkey G.)

Our LNP model of choice behavior (a code sketch of this pipeline appears at the end of this passage).

Model validation:
• Can the model predict the monkey's next choice?
• Can the model generate behavior on its own?

Predicting the next choice: single experiment, then all experiments.
Model-generated behavior: single experiment, then all experiments. The distribution of stay durations (in trials) summarizes behavior across all experiments.

OK, now that you have a reasonable model, what can you do with it?
1. Explore second-order behavioral questions
2. Explore neural correlates of valuation

Choice of model input:
reward history: 0000010100001
choice history: 0000111110011
Surely 'not getting a reward' also has some influence on the monkey's behavior? We can define a hybrid history: the reward history in which each unrewarded choice is assigned the value d (d = the value of an unrewarded choice). Can we build a better model by taking unrewarded choices into account?
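A minimal Python sketch of the pipeline just described: a linear stage that weights past rewards, a nonlinear stage that maps differential value onto choice probability, and the hybrid model input with its free parameter d. The double-exponential kernel, the logistic nonlinearity, and all parameter values are illustrative assumptions, not the fitted quantities from the talk.

```python
import numpy as np

def linear_stage(history, tau1=2.0, tau2=20.0, w=0.5, n_taps=50):
    """L stage: weight past rewards for one target with a sum of a short (tau1)
    and a long (tau2) exponential; the most recent trial gets the largest weight."""
    lags = np.arange(n_taps)
    kernel = w * np.exp(-lags / tau1) + (1 - w) * np.exp(-lags / tau2)
    kernel /= kernel.sum()
    return np.convolve(history, kernel)[: len(history)]  # value estimate per trial

def nonlinear_stage(v1, v2, sigma=0.5):
    """N stage: map differential value onto instantaneous choice probability,
    p_c = g(v1 - v2); here g is a logistic whose steepness is set by sigma."""
    return 1.0 / (1.0 + np.exp(-(v1 - v2) / sigma))

def hybrid_history(rewards, choices, d):
    """Model input: the reward history in which every unrewarded choice is
    assigned the value d (d = 0 recovers the pure reward history)."""
    rewards = np.asarray(rewards, dtype=float)
    choices = np.asarray(choices, dtype=float)
    return np.where((choices == 1) & (rewards == 0), d, rewards)

# The example strings from the slides, for one target:
rewards = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
choices = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1]
h = hybrid_history(rewards, choices, d=-0.2)      # d is the free parameter varied below
p_c = nonlinear_stage(linear_stage(h)[-1], 0.0)   # vs. a second target with value 0
rng = np.random.default_rng(0)
print(p_c, "choose red" if rng.random() < p_c else "choose green")
```

The final stochastic stage of the model simply samples the next choice from p_c, as in the last two lines.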
Using the hybrid history defined above:
• Systematically vary the value of d
• Estimate new L and N stages for the model
• Test each new model's ability to (a) predict choice and (b) generate behavior

Unrewarded choices: the value of nothin'. (Figure: predictive and generative performance plotted against the value of unrewarded choices, d.)

Choice of model input, revisited: contrary to our intuition, including information about unrewarded choices does not improve model performance.

Optimality of parameters. Weighting of past rewards: is there an 'optimal' weighting function that maximizes the rewards a player can harvest in this game?
• The tuning of the τ2 (long) component of the L-stage affects foraging efficiency, and the monkeys have found this optimum.
• The tuning of σ, the nonlinear function relating value to p(choice), also affects foraging efficiency, and the monkeys have found this optimum as well.
• The τ1 (short) component of the L-stage does not affect foraging efficiency. Why, then, do monkeys overweight recent rewards?

The differential model is a better predictor of monkey choice.

Summary so far:
• Monkeys match; the best LNP model uses differential value as its candidate decision variable, p_c = g(v1 − v2)
• The model both predicts and generates choices
• Unrewarded choices have no effect
• Monkeys find the optimal τ2 and σ; τ1 is not critical
• Differential value predicts choices better than fractional value

Aside: what would Bayes do?
1) Maintain beliefs over the baiting probabilities
2) Be greedy, or use dynamic programming

Firing rates in LIP are related to target value on a trial-by-trial basis. (Figure: LIP responses for targets into vs. out of the response field, example recording gm020b, plotted against target value; cortical map from http://brainmap.wustl.edu/vanessen.html.) The differential model also accounts for more variance in LIP firing rates.

What I've told you:
• How we control/measure value: the matching law
• An experimental task based on that principle: a dynamic foraging task
• A simple model of value-based choice: our Linear-Nonlinear-Poisson model
• How we validate that model: predictive and generative validation
• How we use the model to explore behavior: hybrid models, optimality of reward weights
• How we use the model to explore value-related signals in the brain: neural firing in area LIP correlates with 'differential value' on a trial-by-trial basis

(Figures: foraging efficiency varies as a function of τ2; foraging efficiency does not vary as a function of τ1.)

What do animals do? Animals match. Matching is a probabilistic policy, p_choose(A) = f(p_bait(A), p_bait(B)), and it is almost optimal within the set of probabilistic policies, + the change-over delay (see the sketch below).

Greg Corrado

How do we implement the change-over delay? Only one 'live' target at a time.
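A minimal simulation sketch of a probabilistic matching policy on a baited two-target task, to make the p_choose(A) = f(p_bait(A), p_bait(B)) claim concrete. The baiting mechanics (a reward, once assigned, stays available until that target is next chosen), the specific choice f = the baiting fraction, and all rates are assumptions for illustration; the change-over delay is not modeled here.

```python
import numpy as np

def simulate_matching(p_bait_a=0.30, p_bait_b=0.05, n_trials=10_000, seed=0):
    """Simulate a probability-matching policy on a two-target baited schedule
    and return rewards harvested per trial (a rough proxy for foraging efficiency)."""
    rng = np.random.default_rng(seed)
    baited = [False, False]                            # does each target currently hold a reward?
    p_choose_a = p_bait_a / (p_bait_a + p_bait_b)      # matching: f = the baiting fraction
    harvested = 0
    for _ in range(n_trials):
        # Baiting: an unarmed target becomes baited with its own probability each trial
        for i, p in enumerate((p_bait_a, p_bait_b)):
            baited[i] = baited[i] or (rng.random() < p)
        choice = 0 if rng.random() < p_choose_a else 1
        if baited[choice]:
            harvested += 1
            baited[choice] = False                     # collected rewards are removed
    return harvested / n_trials

print(simulate_matching())
```

Sweeping p_choose_a in this sketch shows why a probabilistic policy near the matching point harvests close to the maximum available reward, which is the sense in which matching is "almost optimal" here.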