Bayesian models of human learning and reasoning Josh Tenenbaum MIT


Bayesian models of human
learning and reasoning
Josh Tenenbaum
MIT
Department of Brain and Cognitive Sciences
Computer Science and AI Lab (CSAIL)
Collaborators
Chris Baker
Vikash Mansinghka
Noah Goodman
Tom Griffiths
Charles Kemp
Amy Perfors
Lauren Schmidt
Pat Shafto
The probabilistic revolution in AI
• Principled and effective solutions for
inductive inference from ambiguous data:
– Vision
– Robotics
– Machine learning
– Expert systems / reasoning
– Natural language processing
• Standard view: no necessary connection to
how the human brain solves these problems.
Bayesian models of cognition
Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman,
Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, ...]
Language acquisition and processing [Brent, de Marcken, Niyogi, Klein,
Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …]
Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording,
Kawato, Doya, Todorov, Shadmehr, …]
Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …]
Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …]
Attention [Mozer, Huber, Torralba, Oliva, Geisler, Movellan, Yu, Itti, Baldi, …]
Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro,
Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …]
Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …]
Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …]
Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker,
Goodman, Tenenbaum, …]
Everyday inductive leaps
How can people learn so much about the
world from such limited evidence?
– Learning concepts from examples
“horse”
“horse”
“horse”
Learning concepts from examples
“tufa”
“tufa”
“tufa”
Everyday inductive leaps
How can people learn so much about the
world from such limited evidence?
– Kinds of objects and their properties
– The meanings of words, phrases, and sentences
– Cause-effect relations
– The beliefs, goals and plans of other people
– Social structures, conventions, and rules
Modeling Goals
• Principled quantitative models of human behavior,
with broad coverage and a minimum of free
parameters and ad hoc assumptions.
• Explain how and why human learning and
reasoning works, in terms of (approximations to)
optimal statistical inference in natural
environments.
• A framework for studying people’s implicit
knowledge about the structure of the world: how it
is structured, used, and acquired.
• A two-way bridge to state-of-the-art AI and
machine learning.
The approach: from statistics to intelligence
1.
How does background knowledge guide learning
from sparsely observed data?
Bayesian inference:
P(h | d) = P(d | h) P(h) / Σ_{hi ∈ H} P(d | hi) P(hi)
2. What form does background knowledge take, across
different domains and tasks?
Probabilities defined over structured representations: graphs,
grammars, predicate logic, schemas, theories.
3. How is background knowledge itself acquired?
Hierarchical probabilistic models, with inference at multiple
levels of abstraction. Flexible nonparametric models in
which complexity grows with the data.
Outline
• Predicting everyday events
• Learning concepts from examples
• The big picture
Basics of Bayesian inference
• Bayes’ rule:
P(h | d) = P(d | h) P(h) / Σ_{hi ∈ H} P(d | hi) P(hi)
• An example
– Data: John is coughing
– Some hypotheses:
1. John has a cold
2. John has lung cancer
3. John has a stomach flu
– Likelihood P(d|h) favors 1 and 2 over 3
– Prior probability P(h) favors 1 and 3 over 2
– Posterior probability P(h|d) favors 1 over 2 and 3
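As a minimal sketch, the coughing example can be computed directly from Bayes’ rule. The prior and likelihood numbers below are illustrative assumptions chosen only to respect the qualitative orderings on the slide; they are not values from the talk.

```python
# Bayes' rule on the coughing example; all numbers are illustrative assumptions.
prior = {"cold": 0.50, "lung cancer": 0.01, "stomach flu": 0.30}   # P(h): favors cold, flu
likelihood = {"cold": 0.9, "lung cancer": 0.9, "stomach flu": 0.1} # P(d|h): favors cold, cancer

# P(h|d) = P(d|h) P(h) / sum_i P(d|h_i) P(h_i)
evidence = sum(likelihood[h] * prior[h] for h in prior)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(max(posterior, key=posterior.get))  # prints "cold"
```

With these numbers the posterior ranks cold above stomach flu above lung cancer, matching the slide’s ordering.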
Bayesian inference in perception and
sensorimotor integration
(Weiss, Simoncelli & Adelson 2002)
(Kording & Wolpert 2004)
Everyday prediction problems
(Griffiths & Tenenbaum, 2006)
• You read about a movie that has made $60 million to date.
How much money will it make in total?
• You see that something has been baking in the oven for 34
minutes. How long until it’s ready?
• You meet someone who is 78 years old. How long will they
live?
• Your friend quotes to you from line 17 of his favorite poem.
How long is the poem?
• You meet a US congressman who has served for 11 years.
How long will he serve in total?
• You encounter a phenomenon or event with an unknown
extent or duration, ttotal, at a random time or value of t <ttotal.
What is the total extent or duration ttotal?
Bayesian analysis
P(ttotal | t) ∝ P(t | ttotal) P(ttotal)
             ∝ (1/ttotal) P(ttotal)
Likelihood (assume a random sample): P(t | ttotal) = 1/ttotal for 0 < t < ttotal, else 0.
Form of P(ttotal)? e.g., uninformative (Jeffreys) prior ∝ 1/ttotal
Bayesian analysis
P(ttotal | t) ∝ 1/ttotal × 1/ttotal
(random-sampling likelihood × “uninformative” prior)
[Figure: the posterior probability P(ttotal | t) as a function of ttotal.]
Best guess for ttotal: t* such that P(ttotal > t* | t) = 0.5
Yields Gott’s Rule: guess t* = 2t
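The t* = 2t result can be checked numerically: with likelihood 1/ttotal (for ttotal > t) and prior 1/ttotal, the posterior is proportional to 1/ttotal², and its median sits at 2t. A small sketch (the observed t and the grid bounds are arbitrary choices):

```python
import numpy as np

t = 30.0                                   # observed value (arbitrary)
grid = np.linspace(t, 10_000 * t, 1_000_000)
post = grid ** -2.0                        # posterior ∝ likelihood × prior = 1/ttotal²
cdf = np.cumsum(post) / post.sum()
t_star = grid[np.searchsorted(cdf, 0.5)]   # posterior median

print(t_star / t)                          # ≈ 2, i.e. Gott's Rule t* = 2t
```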
Evaluating Gott’s Rule
• You read about a movie that has made $78 million
to date. How much money will it make in total?
– “$156 million” seems reasonable.
• You meet someone who is 35 years old. How long
will they live?
– “70 years” seems reasonable.
• Not so simple:
– You meet someone who is 78 years old. How long will
they live?
– You meet someone who is 6 years old. How long will
they live?
Priors P(ttotal) based on empirically measured durations or magnitudes
for many real-world events in each class:
Median human judgments of the total duration or magnitude ttotal of
events in each class, given that they are first observed at a duration or
magnitude t, versus Bayesian predictions (median of P(ttotal|t)).
You learn that in ancient
Egypt, there was a great
flood in the 11th year of
a pharaoh’s reign. How
long did he reign?
How long did the typical pharaoh reign in ancient Egypt?
Summary: prediction
• Predictions about the extent or magnitude of
everyday events follow Bayesian principles.
• Contrast with Bayesian inference in perception,
motor control, memory: no “universal priors” here.
• Predictions depend rationally on priors that are
appropriately calibrated for different domains.
– Form of the prior (e.g., power-law or exponential)
– Specific distribution given that form (parameters)
– Non-parametric distribution when necessary.
• In the absence of concrete experience, priors may
be generated by qualitative background knowledge.
Learning concepts from examples
• Word learning
“tufa”  “tufa”  “tufa”
• Property induction
Cows have T9 hormones.
Seals have T9 hormones.
Squirrels have T9 hormones.
→ All mammals have T9 hormones.

Cows have T9 hormones.
Sheep have T9 hormones.
Goats have T9 hormones.
→ All mammals have T9 hormones.
The computational problem
(c.f., semi-supervised learning)
[Figure: a matrix of ten species (Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant) × features, plus a column for a new property whose values are unknown (?).]
Features: 85 features from Osherson et al. E.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘quadrapedal’, …
New property
Similarity-based models
[Figure: human judgments of argument strength vs. model predictions, for arguments with premises such as “Cows / Elephants / Horses / Gorillas / Mice / Seals have property P” and conclusion “All mammals have property P.”]
Beyond similarity-based induction
• Reasoning based on dimensional thresholds (Smith et al., 1993):
Poodles can bite through wire.
→ German shepherds can bite through wire.
Dobermans can bite through wire.
→ German shepherds can bite through wire.
• Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003):
Salmon carry E. Spirus bacteria.
→ Grizzly bears carry E. Spirus bacteria.
Grizzly bears carry E. Spirus bacteria.
→ Salmon carry E. Spirus bacteria.
X: Horses have T9 hormones; Rhinos have T9 hormones; Cows have T9 hormones.

P(Y | X) = Σ_{h consistent with X, Y} P(h) / Σ_{h consistent with X} P(h)

Hypotheses h: candidate extensions of the property over the species (Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant), each with prior P(h).
[Figure: the hypothesis space with its prior P(h), and the resulting prediction P(Y | X) for the unobserved species (?).]
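The prediction rule above can be written directly as sums over hypotheses. The sketch below uses a deliberately tiny species set and a uniform prior, both purely illustrative; in the talk the prior comes from structured domain knowledge, not uniformity.

```python
from itertools import combinations

species = ["horse", "rhino", "cow", "chimp"]            # toy set (an assumption)
# hypotheses: every subset of species that could have the property
hypotheses = [frozenset(c) for r in range(len(species) + 1)
              for c in combinations(species, r)]
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}  # uniform prior (illustrative)

def p_y_given_x(Y, X):
    """P(Y|X): sum of P(h) over h consistent with X and Y,
    divided by the sum over h consistent with X alone."""
    num = sum(prior[h] for h in hypotheses if X <= h and Y <= h)
    den = sum(prior[h] for h in hypotheses if X <= h)
    return num / den

print(p_y_given_x({"chimp"}, {"horse", "rhino", "cow"}))  # prints 0.5
```

Under the uniform prior the answer is just the fraction of X-consistent hypotheses that also contain the conclusion species; a structured prior changes this fraction into a weighted one.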
Where does the prior come from?
[Figure: the space of candidate hypotheses over the ten species, each assigned a prior P(h).]
Why not just enumerate all logically possible hypotheses
along with their relative prior probabilities?
Different sources for priors
Chimps have T9 hormones.
Taxonomic similarity
Gorillas have T9 hormones.
Poodles can bite through wire.
Jaw strength
Dobermans can bite through wire.
Salmon carry E. Spirus bacteria.
Food web relations
Grizzly bears carry E. Spirus bacteria.
Hierarchical Bayesian Framework
F: form — P(form), from background knowledge
S: structure — P(structure | form); e.g., a tree with the species (mouse, squirrel, chimp, gorilla) at leaf nodes
D: data — P(data | structure); observed features F1–F4 and a new property (“has T9 hormones”) with unknown (?) values
The value of structural form knowledge:
inductive bias
Hierarchical Bayesian Framework, applied to property induction
[Figure: the same form → structure → data hierarchy; given features F1–F4 over mouse, squirrel, chimp, and gorilla, predict which species have T9 hormones (?).]
P(D|S): How the structure constrains the
data of experience
• Define a stochastic process over structure S that
generates hypotheses h.
– Intuitively, properties should vary smoothly over structure.
Smooth: P(h) high
Not smooth: P(h) low
P(D|S): How the structure constrains the
data of experience
A Gaussian process defined over the structure S (~ random walk, diffusion) generates continuous values y; thresholding y yields a binary hypothesis h. [Zhu, Ghahramani & Lafferty 2003]
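A minimal sketch of this generative step, assuming a chain-structured graph and a regularized graph-Laplacian kernel (both illustrative choices, not the talk’s exact model): sample a smooth function y over the structure from a Gaussian process, then threshold it to get a binary property h.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # ten species on a chain (an assumption)
A = np.zeros((n, n))
for i in range(n - 1):                   # chain adjacency
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A           # graph Laplacian of the structure S

# GP covariance: smoothness over the graph comes from the Laplacian;
# the 0.1 ridge is an arbitrary regularization constant
K = np.linalg.inv(L + 0.1 * np.eye(n))

y = rng.multivariate_normal(np.zeros(n), K)  # smooth latent property values
h = (y > 0).astype(int)                      # threshold -> binary hypothesis h
print(h)
```

Because K couples neighboring nodes strongly, properties that are smooth over the structure get high probability, exactly the bias the slide describes.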
Structure S → Data D
[Figure: a tree over Species 1–10 generating the observed feature matrix.]
Features:
85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’,
‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’,
‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…
[c.f., Lawrence,
2004; Smola &
Kondor 2003]
Structure S → Data D: predicting a new property
[Figure: the same tree over Species 1–10; the new property is observed for some species and unknown (?) for the rest. Features: the 85 features for 50 animals (Osherson et al.), as above.]
[Figure: model predictions under 2D spatial vs. tree-structured priors, compared with human judgments for arguments with premises such as “Cows / Elephants / Horses / Gorillas / Mice / Seals have property P” and conclusion “All mammals have property P.”]
Testing different priors
[Figure: inductive bias comparison — correct bias, wrong bias, no bias, and too strong a bias.]
A connectionist alternative
(Rogers and McClelland, 2004)
Species
Features
Emergent structure:
clustering on hidden
unit activation vectors
Reasoning about spatially
varying properties
“Native American artifacts” task
Property type → theory structure:
“has T9 hormones” — taxonomic tree + diffusion process
“can bite through wire” — directed chain + drift process
“carry E. Spirus bacteria” — directed network + noisy transmission
[Figure: example structures over classes A–G under each theory (tree, chain, network), and the hypothesis sets each structure generates.]
Reasoning with two property types
(Shafto, Kemp, Bonawitz, Coley & Tenenbaum)
“Given that X has property P, how likely is it that Y does?”
[Figure: the same species — herring, tuna, mako shark, sand shark, dolphin, human, kelp — arranged in a taxonomic tree for a biological property and in a food web for a disease property.]
Summary so far
• A framework for modeling human inductive
reasoning as rational statistical inference over
structured knowledge representations
– Qualitatively different priors are appropriate for different
domains of property induction.
– In each domain, a prior that matches the world’s structure
fits people’s judgments well, and better than alternative
priors.
– A language for representing different theories: graph
structure defined over objects + probabilistic model for the
distribution of properties over that graph.
• Remaining question: How can we learn
appropriate theories for different domains?
Hierarchical Bayesian Framework
F: form — chain, tree, space, …
S: structure — an arrangement of the species (mouse, squirrel, chimp, gorilla) under the chosen form
D: data — observed features F1–F4
Discovering structural forms
[Figure: the same data over Ostrich, Robin, Crocodile, Snake, Turtle, Bat, and Orangutan organized as a linear “great chain of being” and as a Linnaean tree.]
People can discover structural forms
• Scientific discoveries
– “Great chain of being” (1579)
– Tree structure for biological species: Linnaeus, Systema Naturae (1735) — Kingdom Animalia, Phylum Chordata, Class Mammalia, Order Primates, Family Hominidae, Genus Homo, Species Homo sapiens
– Periodic structure for chemical elements (1837)
• Children’s cognitive development
– Hierarchical structure of category labels
– Clique structure of social groups
– Cyclical structure of seasons or days of the week
– Transitive structure for value
Typical structure learning algorithms assume a fixed structural form:
Flat clusters — K-means, mixture models, competitive learning
Line — Guttman scaling, ideal point models
Circle — circumplex models
Tree — hierarchical clustering, Bayesian phylogenetics
Grid — Self-Organizing Map, generative topographic mapping
Euclidean space — MDS, PCA, factor analysis
The ultimate goal
A “Universal Structure Learner” that maps Data → Representation, subsuming the special-purpose methods above: K-means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, ···
A “universal grammar” for structural forms
[Figure: each structural form paired with the generative process that grows it.]
Hierarchical Bayesian Framework
F: form — P(F) favors simplicity
S: structure — P(S | F) favors smoothness [Zhu et al., 2003]
D: data — features F1–F4 over mouse, squirrel, chimp, gorilla
Model fitting
• Evaluate each form in parallel
• For each form, heuristic search over structures
based on greedy growth from a one-node seed:
Structural forms from relational data
Dominance hierarchy — primate troop, “x beats y”
Tree — Bush administration, “x told y”
Cliques — prison inmates, “x likes y”
Ring — Kula islands, “x trades with y”
Development of structural forms
as more data are observed
Beyond “Nativism” versus “Empiricism”
• “Nativism”: Explicit knowledge of structural
forms for core domains is innate.
– Atran (1998): The tendency to group living kinds into
hierarchies reflects an “innately determined cognitive
structure”.
– Chomsky (1980): “The belief that various systems of mind are
organized along quite different principles leads to the natural
conclusion that these systems are intrinsically determined, not
simply the result of common mechanisms of learning or
growth.”
• “Empiricism”: General-purpose learning systems
without explicit knowledge of structural form.
– Connectionist networks (e.g., Rogers and McClelland, 2004).
– Traditional structure learning in probabilistic graphical models.
Summary: learning from examples
Bayesian inference over hierarchies of structured representations provides a framework to understand core questions of human learning:
– What is the content and form of human knowledge, at multiple levels of abstraction?
– How does abstract domain knowledge guide learning of new concepts?
– How is abstract domain knowledge learned? What must be built in?
– How can domain-general learning mechanisms acquire domain-specific representations? How can probabilistic inference work together with symbolic, flexibly structured representations?
[Figure: the form → structure → data hierarchy over mouse, squirrel, chimp, gorilla, with features F1–F4.]
Learning word meanings
Bayesian inference over tree-structured hypothesis space
(Xu & Tenenbaum; Schmidt & Tenenbaum)
“tufa”  “tufa”  “tufa”
Learning word meanings
Principles
Structure
Data
Shape bias
Taxonomic principle
Contrast principle
Basic-level bias
Representative examples
Learning causal relations
Abstract
Principles
Structure
Data
(Griffiths, Tenenbaum, Kemp et al.)
Both objects activate
the detector
Object A does not
activate the detector
by itself
Causal learning with prior knowledge
(Griffiths, Sobel, Tenenbaum & Gopnik)
“Backwards blocking” paradigm — procedure used in Sobel et al. (2002), Experiment 2. Trials: initial, AB trial, A trial.
• One-Cause condition: Both objects activate the detector; then object A does not activate the detector by itself. Children are asked if each object is a blicket, then they are asked to make the machine go.
• Backward Blocking condition: Both objects activate the detector; then object A activates the detector by itself. Children are asked if each object is a blicket, then they are asked to make the machine go.
First-order probabilistic theories for
causal learning
Learning causal relations
Structure
Data
conditions
patients
has(patient,condition)
Abstract causal theories:
Classes = {C}; Laws = {C → C}
Classes = {R, D, S}; Laws = {R → D, D → S}
Classes = {R, D, S}; Laws = {S → D}
Learning causal relations
Abstract principles: Classes = {R, D, S}; Laws = {R → D, D → S}
R: working in factory, smoking, stress, high fat diet, …
D: flu, bronchitis, lung cancer, heart disease, …
S: headache, fever, coughing, chest pain, …
Structure → Data: patients × conditions, has(patient, condition)
True structure of graphical model G: 16 variables.
[Figure: posteriors over edges, edge(G), after 20, 80, and 1000 samples, when the graph is learned jointly with an abstract theory — a partition of the variables into classes z (c1, c2, …) and class-to-class edge probabilities (e.g., c1 → c2 = 0.4).]
(Mansinghka, Kemp, Tenenbaum, Griffiths UAI 06)
“Universal Grammar”
Hierarchical phrase structure
grammars (e.g., CFG, HPSG, TAG)
P(grammar | UG)
Grammar
P(phrase structure | grammar)
S → NP VP
NP → Det [ Adj ] Noun [ RelClause ]
RelClause → [ Rel ] NP V
VP → VP NP
VP → Verb
Phrase structure
P(utterance | phrase structure)
Utterance
P(speech | utterance)
Speech signal
(Jurafsky; Levy & Jaeger; Klein & Manning; Perfors et al., ….)
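To make the generative story concrete, here is a toy sampler for a probabilistic version of the phrase-structure grammar on the slide. The terminal words and all rule probabilities are invented for illustration, and the optional Rel is fixed to “that” for simplicity.

```python
import random

random.seed(3)

def NP():
    # NP -> Det [ Adj ] Noun [ RelClause ]
    out = ["the"]
    if random.random() < 0.3:                       # optional Adj
        out.append(random.choice(["big", "small"]))
    out.append(random.choice(["dog", "ball", "girl"]))
    if random.random() < 0.2:                       # optional RelClause -> Rel NP V
        out += ["that"] + NP() + [random.choice(["saw", "chased"])]
    return out

def VP():
    # VP -> Verb | VP NP  (recursion kept rare so sampling terminates)
    if random.random() < 0.8:
        return [random.choice(["runs", "sleeps", "likes"])]
    return VP() + NP()

def S():
    # S -> NP VP
    return NP() + VP()

print(" ".join(S()))
```

Running the sampler repeatedly draws utterances from P(phrase structure | grammar); a learner faces the inverse problem of recovering the rules from such samples.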
Vision as probabilistic parsing
“Analysis by
Synthesis”
(Han & Zhu, 2006)
Goal-directed action
(production and comprehension)
(Wolpert et al., 2003)
Understanding goal-directed actions
• Heider and Simmel
• Csibra & Gergely
Constraints
Goals
Principle of rationality: An intentional
agent plans actions to achieve its
goals most efficiently given its
environmental constraints.
Actions
Goal inference as inverse
probabilistic planning
human
judgments
(Baker, Tenenbaum & Saxe)
Constraints
Goals
Rational planning
(PO)MDP
Actions
model predictions
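A minimal sketch of the inverse-planning idea, with all specifics invented: a drastically simplified stand-in for the (PO)MDP planner in which an agent on a line moves soft-rationally toward one of two goals, and Bayes’ rule inverts that policy to infer the goal from observed moves.

```python
import math

goals = {"left": 0, "right": 10}   # two candidate goal locations (assumed)
beta = 2.0                         # soft-optimality: higher = more rational (assumed)

def p_action(action, pos, goal):
    """Boltzmann policy over actions {-1, +1}: prefer moves that reduce
    distance to the goal (the 'principle of rationality')."""
    def score(a):
        return -abs((pos + a) - goals[goal])
    z = sum(math.exp(beta * score(a)) for a in (-1, 1))
    return math.exp(beta * score(action)) / z

# observed trajectory: start at 5, three steps to the right
pos, actions = 5, [1, 1, 1]
posterior = {g: 0.5 for g in goals}          # uniform prior over goals
for a in actions:
    for g in goals:
        posterior[g] *= p_action(a, pos, g)  # Bayes: multiply in step likelihoods
    pos += a
total = sum(posterior.values())
posterior = {g: v / total for g, v in posterior.items()}
print(posterior["right"])   # close to 1: the agent is inferred to be heading right
```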
The big picture
• What we need to understand: the mind’s ability to build
rich models of the world from sparse data.
–
–
–
–
–
Learning about objects, categories, and their properties.
Causal inference
Language comprehension and production
Scene understanding
Understanding other people’s actions, plans, thoughts, goals
• What do we need to understand these abilities?
–
–
–
–
Bayesian inference in probabilistic generative models
Hierarchical models, with inference at all levels of abstraction
Structured representations: graphs, grammars, logic
Flexible representations, growing in response to observed data
Open directions and challenges
• Effective methods for learning structured knowledge
– How to balance expressiveness and learnability?
… flexibility and constraint?
• More precise relation to psychological processes
– To what extent do mental processes implement boundedly
rational methods of approximate inference?
• Relation to neural computation
– How to implement structured representations in brains?
• Understanding failure cases
– Are these simply “not Bayesian”, or are people using a
different model? How do we avoid circularity?
The “standard model” of learning in
neuroscience

Supervised:
E = (1/2) Σ_{t=1..n} (y_t − w·x_t)²
Δw = −η ∂E/∂w = η Σ_{t=1..n} (y_t − w·x_t) x_t

Unsupervised (Hebbian, with unit output y_t = w·x_t):
E = (1/2) Σ_{t=1..n} y_t² − (1/2) Σ_{t=1..n} (w·x_t)²
Δw = −η ∂E/∂w = η Σ_{t=1..n} y_t x_t
Learning grounded causal models
(Goodman, Mansinghka & Tenenbaum)
A child learns that petting the cat leads to purring, while pounding
leads to growling. But how to learn these symbolic event concepts
over which causal links are defined?
[Figure: a raw event stream segmented into candidate symbolic events (a, b, c) over which causal links can be defined.]
The chicken-and-egg problem of
structure learning and feature selection
A raw data matrix:
The chicken-and-egg problem of
structure learning and feature selection
Conventional clustering (CRP mixture):
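The partition prior behind a CRP mixture can be sketched in a few lines as the Chinese restaurant process; n and α below are arbitrary illustrative choices.

```python
import random

random.seed(0)

def crp_partition(n, alpha):
    """Sample a partition of n items from the Chinese restaurant process:
    item i joins an existing cluster with probability proportional to its
    size, or starts a new cluster with probability proportional to alpha."""
    clusters = []
    for i in range(n):
        r = random.uniform(0, i + alpha)   # total weight so far is i + alpha
        acc = 0.0
        for c in clusters:
            acc += len(c)
            if r < acc:
                c.append(i)
                break
        else:
            clusters.append([i])           # open a new table
    return clusters

print(crp_partition(10, alpha=1.0))
```

Because the number of clusters is unbounded and grows with the data, this is the “flexible nonparametric” ingredient the talk keeps returning to.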
Learning multiple structures to explain
different feature subsets
(Shafto, Kemp, Mansinghka, Gordon & Tenenbaum, 2006)
CrossCat:
[Figure: CrossCat partitions the features into multiple systems (System 1, 2, 3), each with its own clustering of the objects.]
The “nonparametric safety-net”
True structure of graphical model G: 12 variables.
[Figure: posteriors over edges after 40, 100, and 1000 samples, learned directly (Graph G, edge(G)) and jointly with an abstract theory Z (classes z plus edge(G)).]
Bayesian prediction
P(ttotal | tpast) ∝ 1/ttotal × P(ttotal)
(posterior ∝ random-sampling likelihood × domain-dependent prior)
What is the best guess for ttotal?
Compute t* such that P(ttotal > t* | tpast) = 0.5.
[Figure: the posterior P(ttotal | tpast) as a function of ttotal.]
We compared the median of the Bayesian posterior with the median of subjects’ judgments… but what about the distribution of subjects’ judgments?
Sources of individual differences
• Individuals’ judgments could be noisy.
• Individuals’ judgments could be optimal,
but with different priors.
– e.g., each individual has seen only a sparse
sample of the relevant population of events.
• Individuals’ inferences about the posterior
could be optimal, but their judgments could
be based on probability (or utility) matching
rather than maximizing.
Individual differences in prediction
[Figure: proportion of judgments below the predicted value vs. quantile of the Bayesian posterior distribution P(ttotal | tpast), averaged over all prediction tasks: movie run times, movie grosses, poem lengths, life spans, terms in congress, cake baking times.]