Transcript 9.94.Tenenbaum

9.94 The cognitive science of
intuitive theories
J. Tenenbaum, T. Lombrozo,
L. Schulz, R. Saxe
Plan for the class
• Goal:
– Introduce you to an exciting field of research
unfolding at MIT and in the broader world.
Plan for the class
• An integrated series of talks and discussions
– Today: Josh Tenenbaum (MIT), “Computational
models of theory-based inductive inference”
– Tuesday: Tania Lombrozo (Harvard/Berkeley)
“Explanation in intuitive theories”
– Wednesday: Rebecca Saxe (Harvard/MIT),
“Understanding other minds”
– Thursday: Laura Schulz (MIT), “Theories and
evidence”
– Friday: Special Mystery Guest, “When theories fail”
Plan for the class
• Requirements for credit (pass/fail, 3 units)
– Attend the classes
– Participate in discussions
– Take-home quiz:
• Emailed to you this weekend (after the class)
• Due back to me by email on Wednesday, Feb. 1
If you are not registered on the list below, make sure
to register and send me an email message at:
[email protected]
Class list
Azeez,Zainab O
Ferranti,Darlene E
Liu,Yicong
Taub,Daniel M
Belova,Nadezhda
Frazier,Jonathan J
McGuire,Salafa'anius
Tung,Roland
Brenman,Stephanie V
Gordon,Matthew A
Ovadya,Aviv
Voelbel,Kathleen
Cherian,Tharian S
Green,Delbert A
Poon,Samuel H
Vosoughi,Soroush
Clark,Abigail M
Huhn,Anika M
Pradhan,Nikhil T
Willmer,Anjuli J
Clark,Samuel D
Hunt,Beatrice P.
Ren,Danan T
Ye,Diana F
Curry,Justin M
Kamrowski,Kaitlin M
Rotter,Juliana C
Yuen,Grace J
Dai,Jizi
Kanaga,Noelle J
Slagle,Amy M
Zhao,Bo
Dean,Clare
Kwon,Jennifer
Tang,Di
Scheduling
• Today:
– Go to 4:30 (with a break in the middle)?
• Friday:
– An hour earlier: 1:00 – 3:00?
The big problem of intelligence
How does the mind get so much out of so little?
The big problem of intelligence
How does the mind get so much out of so little?
(Figure: a three-dimensional structure and the two-dimensional image it projects to.)
The big problem of intelligence
How can we generalize new concepts reliably
from just one or a few examples?
– Learning word meanings
(Images: three different horses, each labeled “horse”.)
The objects of planet Gazoob
(Images: objects from planet Gazoob, three of them labeled “tufa”.)
The big problem of intelligence
How can we generalize new concepts reliably
from just one or a few examples?
– Learning about new properties of categories
Cows have T4 hormones.
Bees have T4 hormones.
Salmon have T4 hormones.
Humans have T4 hormones.

Cows have T4 hormones.
Goats have T4 hormones.
Sheep have T4 hormones.
Humans have T4 hormones.
The big problem of intelligence
How do we use concepts in ways that go
beyond our experience?
• “dog”
• Is it still a dog if you…
– Put a penguin costume on it?
– Surgically alter it until it looks just like a penguin?
– Pre-natally inject a substance that causes it to look just like a penguin?
… and it can mate with penguins and produce penguin offspring?
The big problem of intelligence
How do we use concepts in ways that go
beyond our experience?
• Two cars were reported stolen by the Groveton police
yesterday.
• The judge sentenced the killer to die in the electric chair
for the second time.
• No one was injured in the blast, which was attributed to a
buildup of gas by one town official.
• One witness told the commissioners that she had seen
sexual intercourse taking place between two parked cars in
front of her house.
The big problem of intelligence
How do we use concepts in ways that go
beyond our experience?
Consider a man named Boris.
– Is the mother of Boris’s father his grandmother?
– Is the mother of Boris’s sister his mother?
– Is the son of Boris’s sister his son?
(Note: Boris and his family were
stranded on a desert island when he
was a young boy.)
What makes us so smart?
• Memory?
• Logical inference?
What makes us so smart?
• Memory? No.
– The difference between a test that you can pass on
rote memory and a test that shows whether you
“actually learned something”.
• Logical inference? No.
– The difference between deductive inference and
inductive inference.
Modes of inference
• Deductive inference:
All mammals have biotinic acid in their blood.
Horses are mammals.
Horses have biotinic acid in their blood.
• Inductive inference:
Horses have biotinic acid in their blood.
Horses are mammals.
All mammals have biotinic acid in their blood.
What makes us so smart?
• Intuitive theories
– Systems of concepts that are in some important
respects like scientific theories.
– Abstract knowledge that supports prediction,
explanation, exploration, and decision-making for
an infinite range of situations that we have not
previously encountered.
Some questions about intuitive theories
• What is their content?
• How are they represented in the mind or brain?
• How are they used to generalize to new situations?
• How are they acquired?
Some questions about intuitive theories
• What is their content?
• How are they represented in the mind or brain?
• How are they used to generalize to new situations?
• How are they acquired?
• Can they be described in computational terms?
• In what essential ways are they similar to or different
from scientific theories?
• How good (accurate, comprehensive, rich) are they,
under what circumstances? What can we learn from
their failures?
What can we learn from
perceptual or cognitive illusions?
• Goal of visual
perception is to recover
world structure from
visual images.
• Why the problem is
hard: many world
structures can produce
the same visual input.
(Diagram: many scene hypotheses consistent with the same image data.)
What can we learn from
perceptual or cognitive illusions?
• Goal of visual
perception is to recover
world structure from
visual images.
• Why the problem is
hard: many world
structures can produce
the same visual input.
• Illusions reveal the
visual system’s implicit
theories of the physical
world and the process of
image formation.
Computational models of theory-based inductive inference
Josh Tenenbaum
Department of Brain and Cognitive Sciences
Computer Science and Artificial Intelligence Laboratory
MIT
Plan for today
• A general framework for solving underconstrained inference problems
– Bayesian inference
• Applications in perception and cognition
– lightness perception
– predicting the future (with Tom Griffiths)
– learning about properties of natural species (with
Charles Kemp)
Modes of inference
• Deductive inference: Logic
All mammals have biotinic acid in their blood.
Horses are mammals.
Horses have biotinic acid in their blood.
• Inductive inference: Probability
Horses have biotinic acid in their blood.
Horses are mammals.
All mammals have biotinic acid in their blood.
Bayesian inference
• Definition of conditional probability:
  P(A, B) = P(A) P(B | A) = P(B) P(A | B)
• Bayes’ rule:
  P(h | d) = P(h) P(d | h) / Σ_i P(h_i) P(d | h_i)
• “Posterior probability”: P(h | d)
• “Prior probability”: P(h)
• “Likelihood”: P(d | h)
Bayesian inference
• Bayes’ rule:
  P(h | d) = P(h) P(d | h) / Σ_i P(h_i) P(d | h_i)
• An example
  – Data: John is coughing
  – Some hypotheses:
    1. John has a cold
    2. John has emphysema
    3. John has a stomach flu
  – Prior P(h) favors 1 and 3 over 2
  – Likelihood P(d|h) favors 1 and 2 over 3
  – Posterior P(h|d) favors 1 over 2 and 3
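A minimal sketch of this calculation in Python; the prior and likelihood numbers are invented, chosen only to respect the orderings on the slide:

```python
# Toy Bayesian diagnosis for the datum d = "John is coughing".
# Priors and likelihoods are made-up values that respect the orderings above.
priors = {"cold": 0.45, "emphysema": 0.05, "stomach flu": 0.50}       # favors 1 and 3 over 2
likelihoods = {"cold": 0.90, "emphysema": 0.80, "stomach flu": 0.10}  # favors 1 and 2 over 3

# Bayes' rule: P(h|d) = P(h) P(d|h) / sum_i P(h_i) P(d|h_i)
joint = {h: priors[h] * likelihoods[h] for h in priors}
evidence = sum(joint.values())
posterior = {h: joint[h] / evidence for h in joint}

for h, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P({h} | coughing) = {p:.2f}")
# The posterior favors "cold" over both alternatives, as on the slide.
```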
Bayesian inference
• Bayes’ rule:
  P(h | d) = P(h) P(d | h) / Σ_i P(h_i) P(d | h_i)
• What makes a good scientific argument?
  P(h|d) is high if:
  – Hypothesis is plausible: P(h) is high
  – Hypothesis strongly predicts the observed data: P(d|h) is high
  – Data are surprising: P(d) = Σ_i P(h_i) P(d | h_i) is low
Coin flipping
HHTHT
HHHHH
What process produced these sequences?
Comparing two simple hypotheses
• Contrast simple hypotheses:
– H1: “fair coin”, P(H) = 0.5
– H2: “always heads”, P(H) = 1.0
• Bayes’ rule:
  P(H | D) = P(H) P(D | H) / P(D)
• With two hypotheses, use odds form
Comparing two simple hypotheses
  P(H1 | D) / P(H2 | D) = [P(D | H1) / P(D | H2)] × [P(H1) / P(H2)]
D: HHTHT
H1, H2: “fair coin”, “always heads”
  P(D | H1) = 1/2^5     P(H1) = ?
  P(D | H2) = 0         P(H2) = 1 − ?
Comparing two simple hypotheses
  P(H1 | D) / P(H2 | D) = [P(D | H1) / P(D | H2)] × [P(H1) / P(H2)]
D: HHTHT
H1, H2: “fair coin”, “always heads”
  P(D | H1) = 1/2^5     P(H1) = 999/1000
  P(D | H2) = 0         P(H2) = 1/1000

  P(H1 | D) / P(H2 | D) = [(1/32) / 0] × [999 / 1] = infinity
Comparing two simple hypotheses
  P(H1 | D) / P(H2 | D) = [P(D | H1) / P(D | H2)] × [P(H1) / P(H2)]
D: HHHHH
H1, H2: “fair coin”, “always heads”
  P(D | H1) = 1/2^5     P(H1) = 999/1000
  P(D | H2) = 1         P(H2) = 1/1000

  P(H1 | D) / P(H2 | D) = [(1/32) / 1] × [999 / 1] ≈ 30
Comparing two simple hypotheses
  P(H1 | D) / P(H2 | D) = [P(D | H1) / P(D | H2)] × [P(H1) / P(H2)]
D: HHHHHHHHHH
H1, H2: “fair coin”, “always heads”
  P(D | H1) = 1/2^10    P(H1) = 999/1000
  P(D | H2) = 1         P(H2) = 1/1000

  P(H1 | D) / P(H2 | D) = [(1/1024) / 1] × [999 / 1] ≈ 1
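The three cases above can be reproduced in a few lines of Python; the 999:1 prior odds against the trick coin are the values from the slides:

```python
# Posterior odds for H1 = "fair coin" vs. H2 = "always heads",
# with priors P(H1) = 999/1000 and P(H2) = 1/1000 as on the slides.
def posterior_odds(sequence, p_h1=999/1000, p_h2=1/1000):
    lik_h1 = (1 / 2) ** len(sequence)                 # fair coin: each flip has prob 1/2
    lik_h2 = 1.0 if set(sequence) == {"H"} else 0.0   # trick coin: prob 1 iff all heads
    if lik_h2 == 0:
        return float("inf")                           # a single tail rules out H2
    return (lik_h1 / lik_h2) * (p_h1 / p_h2)

for seq in ["HHTHT", "HHHHH", "HHHHHHHHHH"]:
    print(seq, posterior_odds(seq))
# HHTHT      -> inf    (H2 is impossible)
# HHHHH      -> ~31    (the data favor H2, but the prior still favors H1)
# HHHHHHHHHH -> ~0.98  (roughly even odds)
```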
The role of intuitive theories
The fact that HHTHT looks representative of
a fair coin, while HHHHH does not, reflects our
implicit theories of how the world works.
– Easy to imagine how a trick all-heads coin
could work: high prior probability.
– Hard to imagine how a trick “HHTHT” coin
could work: low prior probability.
Plan for today
• A general framework for solving underconstrained inference problems
– Bayesian inference
• Applications in perception and cognition
– lightness perception
– predicting the future (with Tom Griffiths)
– learning about properties of natural species (with
Charles Kemp)
Gelb / Gilchrist demo
Explaining the illusion
• The problem of lightness constancy
– Separating the intrinsic reflectance (“color”) of
a surface from the intensity of the illumination.
• Anchoring heuristic:
– Assume that the brightest patch in each scene is
white.
• Questions:
– Is this really right?
– Why (and when) is it a good solution to the
problem of lightness constancy?
Why is lightness constancy hard?
• The physics of light reflection:
L=IxR
L: luminance (light emitted from surface)
I: intensity of illumination in the world
R: reflectance of surface in the world
• The problem: Given L, solve for I and R.
Why is lightness constancy hard?
• The physics of light reflection:
L1 = I x R1
L2 = I x R2
...
Ln = I x Rn
• The problem: Given L1, …, Ln, solve for I
and R1, …, Rn.
Why is lightness constancy hard?
Scene hypotheses (each consistent with L = I x R):
  I = 10,  R = {0.2, 0.4, 0.5, 0.9}
  I = 15,  R = {0.13, 0.26, 0.33, 0.60}
  I = 100, R = {0.02, 0.04, 0.05, 0.09}
Image data:
  L = {2, 4, 5, 9}
A simplified theory of the visual world
• Really bright illuminants are rare.
  (Graph: prior P(I) decreasing with illumination intensity I.)
A simplified theory of the visual world
• Really bright illuminants are rare.
  (Graph: prior P(I) decreasing with I.)
• Any surface color is equally likely.
  (Graph: uniform P(Ri) over Ri from 0 (black) to 1 (white).)
A simplified theory of the visual world
• Really bright illuminants are rare.
• Observed luminances, Li = I x Ri, are a random sample from 0 to I.
  (Graph: uniform likelihood P(Li | I) over Li from 0 to I.)
A simplified theory of the visual world
• Really bright illuminants are rare.
• Observed luminances, Li = I x Ri, are a random sample from 0 to I.
  (Graph: likelihoods P(Li | I) for two illuminant hypotheses, I and I′.)
Scene hypotheses h:
  h1: I = 10,  P(h1): high
  h2: I = 15,  P(h2): med
  h3: I = 100, P(h3): low
Image data d:
  L = {9}
P(h | d) = P(h) P(d | h) / Σ_i P(h_i) P(d | h_i)
Scene hypotheses h:
  h1: I = 10,  P(h1): high
  h2: I = 15,  P(h2): med
  h3: I = 100, P(h3): low
Image data d:
  L = {2, 4, 5, 9}
Prior probability alone can’t explain how inference changes with more data.
P(h | d) = P(h) P(d | h) / Σ_i P(h_i) P(d | h_i)
Scene hypotheses h:
  h1: I = 10,  P(h1): high,  P(d | h1) = 1/10
  h2: I = 15,  P(h2): med,   P(d | h2) = 1/15
  h3: I = 100, P(h3): low,   P(d | h3) = 1/100
Image data d:
  L = {9}
Likelihood: P(d | h) = 1/I^n for n observed luminances.
P(h | d) = P(h) P(d | h) / Σ_i P(h_i) P(d | h_i)
Scene hypotheses h:
  h1: I = 10,  P(h1): high,  P(d | h1) = 1/10^4
  h2: I = 15,  P(h2): med,   P(d | h2) = 1/15^4
  h3: I = 100, P(h3): low,   P(d | h3) = 1/100^4
Image data d:
  L = {2, 4, 5, 9}
Likelihood: P(d | h) = 1/I^n for n observed luminances.
P(h | d) = P(h) P(d | h) / Σ_i P(h_i) P(d | h_i)
Graphing the likelihood
(Graph: p(L = l | I) for I = 10 and I = 15, with the single observed luminance l1 = 9 marked.)
p({l1} | I = 10) ~ p({l1} | I = 15)
Graphing the likelihood
(Graph: p(L = l | I) for I = 10 and I = 15, with observed luminances 2, 4, 5, 9 marked.)
p({l1, l2, l3, l4} | I = 10) >> p({l1, l2, l3, l4} | I = 15)
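A small Python sketch of the same inference, using the three illuminant hypotheses above and the uniform-sampling likelihood P(d | h) = 1/I^n; the numerical prior values are invented stand-ins for “high”, “med”, and “low”:

```python
# Candidate scene interpretations; prior values are invented, chosen so that
# brighter illuminants are rarer.
hypotheses = [
    {"I": 10,  "prior": 0.60},   # P(h1): high
    {"I": 15,  "prior": 0.35},   # P(h2): med
    {"I": 100, "prior": 0.05},   # P(h3): low
]

def posterior(luminances):
    # Likelihood: each luminance is a uniform draw from [0, I], so P(d|h) = (1/I)^n,
    # and 0 if any observed luminance exceeds I.
    joint = []
    for h in hypotheses:
        feasible = all(l <= h["I"] for l in luminances)
        lik = (1 / h["I"]) ** len(luminances) if feasible else 0.0
        joint.append(h["prior"] * lik)
    z = sum(joint)
    return [j / z for j in joint]

print(posterior([9]))            # one luminance: I = 10 preferred, mostly via the prior
print(posterior([2, 4, 5, 9]))   # four luminances: the likelihood sharpens the preference for I = 10
```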
Explaining lightness constancy
• Anchoring heuristic: Assume that the
brightest patch in each scene is white.
– Is this really right?
– Why (and when) is it a good solution to the
problem?
• Bayesian analysis
– Explains the computational basis for inference.
– Explains why confidence in “brightest = white”
increases as more samples are observed.
Applications to cognition
• Predicting the future (with Tom Griffiths)
• Learning about properties of natural species
(with Charles Kemp)
Everyday prediction problems
• You read about a movie that has made $60 million
to date. How much money will it make in total?
• You see that something has been baking in the
oven for 34 minutes. How long until it’s ready?
• You meet someone who is 78 years old. How long
will they live?
• Your friend quotes to you from line 17 of his
favorite poem. How long is the poem?
• You see taxicab #107 pull up to the curb in front of
the train station. How many cabs in this city?
Making predictions
• You encounter a phenomenon that has
existed for tpast units of time. How long will
it continue into the future? (i.e. what’s ttotal?)
• We could replace “time” with any other
variable that ranges from 0 to some
unknown upper limit (c.f. lightness).
Bayesian inference
P(ttotal | tpast) ∝ P(tpast | ttotal) P(ttotal)
  posterior probability ∝ likelihood × prior
Bayesian inference
P(ttotal | tpast) ∝ P(tpast | ttotal) P(ttotal)
P(ttotal | tpast) ∝ 1/ttotal × P(ttotal)
  Likelihood 1/ttotal: assume tpast is a random sample (0 < tpast < ttotal).
Bayesian inference
P(ttotal | tpast) ∝ 1/ttotal × 1/ttotal
  Random sampling likelihood; “uninformative” prior.
Bayesian inference
P(ttotal | tpast) ∝ 1/ttotal × 1/ttotal
  (random sampling likelihood, “uninformative” prior)
What is the best guess for ttotal?
How about the maximal value of P(ttotal | tpast)?
  (Graph: the posterior P(ttotal | tpast) peaks at ttotal = tpast.)
Bayesian inference
P(ttotal | tpast) ∝ 1/ttotal × 1/ttotal
  (random sampling likelihood, “uninformative” prior)
What is the best guess for ttotal?
Instead, compute t such that P(ttotal > t | tpast) = 0.5.
  (Graph: the posterior P(ttotal | tpast), with the median marked.)
Bayesian inference
P(ttotal | tpast) ∝ 1/ttotal × 1/ttotal
  (random sampling likelihood, “uninformative” prior)
What is the best guess for ttotal?
Instead, compute t such that P(ttotal > t | tpast) = 0.5.
Yields Gott’s Rule: P(ttotal > t | tpast) = 0.5 when t = 2·tpast,
i.e., best guess for ttotal = 2·tpast.
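A quick numerical check of Gott’s Rule (a sketch, not from the slides): with the 1/ttotal likelihood and the 1/ttotal prior, the posterior median lands at roughly 2·tpast.

```python
import numpy as np

# Posterior P(ttotal | tpast) is proportional to 1/ttotal^2 for ttotal >= tpast, else 0.
t_past = 34.0                                    # e.g., minutes a cake has been baking
t = np.linspace(t_past, 100 * t_past, 200_000)   # grid over possible ttotal values
post = 1.0 / t**2
post /= np.trapz(post, t)                        # normalize numerically

cdf = np.cumsum(post) * (t[1] - t[0])            # approximate cumulative probability
median = t[np.searchsorted(cdf, 0.5)]
print(median)                                    # about 67-68, i.e. roughly 2 * t_past
```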
Evaluating Gott’s Rule
• You read about a movie that has made $78 million
to date. How much money will it make in total?
– “$156 million” seems reasonable.
• You meet someone who is 35 years old. How long
will they live?
– “70 years” seems reasonable.
• Not so simple:
– You meet someone who is 78 years old. How long will
they live?
– You meet someone who is 6 years old. How long will
they live?
The effects of priors
• Different kinds of priors P(ttotal) are
appropriate in different domains.
Gott: P(ttotal) ∝ ttotal^-1
The effects of priors
• Different kinds of priors P(ttotal) are
appropriate in different domains.
(Graphs: two prior shapes, e.g., wealth and contacts vs. height and lifespan.)
The effects of priors
Evaluating human predictions
• Different domains with different priors:
  – A movie has made $60 million
  – Your friend quotes from line 17 of a poem
  – You meet a 78 year old man
  – A movie has been running for 55 minutes
  – A U.S. congressman has served for 11 years
  – A cake has been in the oven for 34 minutes
• Use 5 values of tpast for each.
• People predict ttotal.
You learn that in ancient
Egypt, there was a great
flood in the 11th year of
a pharaoh’s reign. How
long did he reign?
You learn that in ancient
Egypt, there was a great
flood in the 11th year of
a pharaoh’s reign. How
long did he reign?
How long did the typical pharaoh reign in ancient Egypt?
Assumptions guiding inference
• Random sampling
• Strong prior knowledge
– Form of the prior (power-law or exponential)
– Specific distribution given that form (parameters)
– Non-parametric distribution when necessary.
• With these assumptions, strong predictions
can be made from a single observation
Applications to cognition
• Predicting the future (with Tom Griffiths)
• Learning about properties of natural species
(with Charles Kemp)
Which argument is stronger?
Cows have biotinic acid in their blood
Horses have biotinic acid in their blood
Rhinos have biotinic acid in their blood
All mammals have biotinic acid in their blood

Cows have biotinic acid in their blood
Dolphins have biotinic acid in their blood
Squirrels have biotinic acid in their blood
All mammals have biotinic acid in their blood
“Diversity phenomenon”
Osherson, Smith, Wilkie, Lopez, Shafir (1990):
• 20 subjects rated the strength of 45 arguments:
X1 have property P.
X2 have property P.
X3 have property P.
All mammals have property P.
• 40 different subjects rated the similarity of all
pairs of 10 mammals.
Traditional psychological models
Osherson et al. consider two similarity-based models:
• Sum-Similarity:
  P(all mammals | X) ∝ Σ_{i ∈ mammals} Σ_{j ∈ X} sim(i, j)
• Max-Similarity:
  P(all mammals | X) ∝ Σ_{i ∈ mammals} max_{j ∈ X} sim(i, j)
Data vs. models
(Scatter plot: human argument-strength ratings vs. model predictions.)
Each point represents one argument:
  X1 have property P.
  X2 have property P.
  X3 have property P.
  All mammals have property P.
Open questions
• Explaining similarity:
– Why does Max-sim fit so well? When worse?
– Why does Sum-sim fit so poorly? When better?
• Explaining Max-sim:
– Is there some rational computation that Max-sim
implements or approximates?
– What theory about this task and domain is
implicit in Max-sim?
(c.f., analysis of lightness constancy)
A simplified theory of biology
• Species generated by an evolutionary
branching process.
– A tree-structured taxonomy of species.
• Taxonomy also central in folkbiology (Atran).
Theory-based Bayesian model
Begin by reconstructing intuitive taxonomy
from similarity judgments:
clustering
Theory-based Bayesian model
Hypothesis space H: each taxonomic cluster is a possible hypothesis for the extension of the novel property.
  (Tree diagram: clusters h0 = “all mammals”, h1, h3, h6, h17, ...)
h0: “all mammals”
p(all mammals | X) = p(X | h0) p(h0) / Σ_{h ∈ H} p(X | h) p(h)
p(h): uniform
p(X | h) = (1/size(h))^n  if x1, ..., xn ∈ h
         = 0              if any xi ∉ h
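A sketch of this model in Python, with a uniform prior and the size-principle likelihood above; the nested clusters are invented stand-ins for those recovered by clustering similarity judgments:

```python
# Toy hypothesis space: each hypothesis is a taxonomic cluster (a set of species).
hypotheses = {
    "all mammals":      {"cow", "horse", "rhino", "dolphin", "seal", "squirrel"},
    "land mammals":     {"cow", "horse", "rhino", "squirrel"},
    "large herbivores": {"cow", "horse", "rhino"},
    "sea mammals":      {"dolphin", "seal"},
}

def p_all_mammals(X):
    # p(h): uniform; p(X|h) = (1/size(h))^n if every example falls in h, else 0.
    prior = {h: 1 / len(hypotheses) for h in hypotheses}
    def lik(h):
        members = hypotheses[h]
        return (1 / len(members)) ** len(X) if all(x in members for x in X) else 0.0
    joint = {h: prior[h] * lik(h) for h in hypotheses}
    return joint["all mammals"] / sum(joint.values())

print(p_all_mammals({"cow", "dolphin", "squirrel"}))  # 1.0: only "all mammals" covers these
print(p_all_mammals({"cow", "horse", "rhino"}))       # low: smaller clusters also cover these
```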
How taxonomy constrains induction
• Atran (1998): “Fundamental principle of
systematic induction” (Warburton 1967,
Bock 1973)
– Given a property found among members of any
two species, the best initial hypothesis is that
the property is also present among all species
that are included in the smallest higher-order
taxon containing the original pair of species.
“all mammals”
  Cows have property P.
  Dolphins have property P.
  Squirrels have property P.
  All mammals have property P.
  Strong: 0.76 [max = 0.82]
Smallest taxon spanning the examples: “all mammals”
  Cows have property P.
  Dolphins have property P.
  Squirrels have property P.
  All mammals have property P.
  Strong: 0.76 [max = 0.82]

Smallest taxon spanning the examples: “large herbivores”
  Cows have property P.
  Horses have property P.
  Rhinos have property P.
  All mammals have property P.
  Weak: 0.17 [min = 0.14]
Smallest taxon spanning the examples: “all mammals” (both arguments)
  Cows have property P.
  Dolphins have property P.
  Squirrels have property P.
  All mammals have property P.
  Strong: 0.76 [max = 0.82]

  Seals have property P.
  Dolphins have property P.
  Squirrels have property P.
  All mammals have property P.
  Weak: 0.30 [min = 0.14]
(Figure: human argument-strength ratings vs. predictions of Bayes (taxonomic), Max-sim, and Sum-sim, for conclusion kinds “all mammals” and “horses” with 3, 2, or 1-3 examples.)
Cows have property P.
Dolphins have property P.
Squirrels have property P.
All mammals have property P.

Seals have property P.
Dolphins have property P.
Squirrels have property P.
All mammals have property P.

(Figure: ratings of these “all mammals”, 3-example arguments vs. predictions of Bayes (taxonomic), Max-sim, and Sum-sim.)
A simplified theory of biology
• Species generated by an evolutionary
branching process.
– A tree-structured taxonomy of species.
• Features generated by stochastic mutation
process and passed on to descendants.
– Novel features can appear anywhere in tree, but
some distributions are more likely than others.
Theory-based Bayesian model
Hypothesis space H: each taxonomic cluster is a possible hypothesis for the extension of a novel feature.
  (Tree diagram: clusters h0 = “all mammals”, h1, h3, h6, h17, ...)
Theory-based Bayesian model
Generate hypotheses for a novel feature F via a (Poisson arrival) mutation process over branches b:
  p(F develops along branch b) = 1 − e^(−λ·|b|)
(Figures, repeated over several slides: sampled mutation histories on the taxonomic tree.)
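A minimal sketch of sampling feature extensions from such a mutation process on a toy tree; the topology, branch lengths, and rate lam are all invented for illustration:

```python
import math
import random

# Toy taxonomy: node -> (parent, length of the branch above the node).
tree = {
    "mammals": (None, 0.0),
    "land":    ("mammals", 1.0), "sea": ("mammals", 2.0),
    "cow":     ("land", 0.5), "horse": ("land", 0.5), "squirrel": ("land", 1.5),
    "dolphin": ("sea", 0.5),  "seal":  ("sea", 0.5),
}
leaves = ["cow", "horse", "squirrel", "dolphin", "seal"]
lam = 0.3   # mutation rate (invented)

def ancestors(node):
    while tree[node][0] is not None:
        node = tree[node][0]
        yield node

def leaves_below(node):
    return [l for l in leaves if l == node or node in ancestors(l)]

def sample_extension():
    # The feature arises on each branch b with probability 1 - exp(-lam * |b|)
    # and is then inherited by every species below that branch.
    present = set()
    for node, (parent, length) in tree.items():
        if parent is not None and random.random() < 1 - math.exp(-lam * length):
            present.update(leaves_below(node))
    return present

for _ in range(5):
    print(sorted(sample_extension()))
# Longer branches mutate more often, so sampled extensions tend to respect the tree's clusters.
```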
Samples from the prior
• Labelings that cut the data along longer branches are more probable.
  (Figure: two tree labelings compared.)
Samples from the prior
• Labelings that cut the data along fewer branches are more probable.
  (Figure: a “monophyletic” labeling is more probable than a “polyphyletic” one.)
h0: “all mammals”
p(all mammals | X) = p(X | h0) p(h0) / Σ_{h ∈ H} p(X | h) p(h)
p(h): “evolutionary” process (mutation + inheritance)
p(X | h) = (1/size(h))^n  if x1, ..., xn ∈ h
         = 0              if any xi ∉ h
(Figure: human ratings vs. predictions of Bayes (taxonomic), Max-sim, and Sum-sim, for conclusion kinds “all mammals” and “horses” with 3, 2, or 1-3 examples.)
(Figure: human ratings vs. predictions of Bayes (taxonomy + mutation), Max-sim, and Sum-sim, for conclusion kinds “all mammals” and “horses” with 3, 2, or 1-3 examples.)
Explaining similarity
• Why does Max-sim fit so well?
  – An efficient and accurate approximation to the Bayesian (evolutionary) model.
  – Correlation with Bayes on three-premise general arguments, over 100 simulated tree structures: mean r = 0.94.
    (Histogram of correlations r.)
  – There’s also a theorem.
Biology: Summary
• Theory-based statistical inference explains
inductive reasoning in folk biology.
• Mathematical modeling reveals people’s
implicit theories about the world.
– Category structure: taxonomic tree.
– Feature distribution: stochastic mutation process +
inheritance.
• Clarifies traditional psychological models.
– Why Max-sim over Sum-sim?
Beyond taxonomic similarity
• Generalization
based on
known
dimensions:
(Smith et al., 1993;
Blok et al., 2002)
Poodles can bite through wire.
German shepherds can bite through wire.

Dobermans can bite through wire.
German shepherds can bite through wire.
Beyond taxonomic similarity
• Generalization
based on
known
dimensions:
(Smith et al., 1993;
Blok et al., 2002)
• Generalization
based on causal
relations:
(Medin et al.,
2004; Shafto &
Coley, 2003)
Poodles can bite through wire.
German shepherds can bite through wire.

Dobermans can bite through wire.
German shepherds can bite through wire.

Salmon carry E. Spirus bacteria.
Grizzly bears carry E. Spirus bacteria.

Grizzly bears carry E. Spirus bacteria.
Salmon carry E. Spirus bacteria.
Predicate type                    Generative theory
“has T4 hormones”                 taxonomic tree + mutation
“can bite through wire”           directed chain + unknown threshold
“carries E. Spirus bacteria”      directed network + noisy transmission
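One way to read the “directed chain + unknown threshold” entry, sketched in Python; the ordering dimension and strength values are invented, and this is only an illustration of the idea, not the authors’ exact model:

```python
# Species ordered along a single known dimension (say, jaw strength); values invented.
strength = {"poodle": 1, "spaniel": 2, "german shepherd": 3, "doberman": 4}

# Each hypothesis is an unknown threshold: the predicate (e.g., "can bite through
# wire") holds for every species at or above the threshold. Uniform prior here.
thresholds = sorted(set(strength.values()))

def p_conclusion(premise, conclusion):
    # Keep thresholds consistent with the premise species having the property,
    # then ask how many of those also give the property to the conclusion species.
    consistent = [t for t in thresholds if strength[premise] >= t]
    predictive = [t for t in consistent if strength[conclusion] >= t]
    return len(predictive) / len(consistent)

print(p_conclusion("poodle", "german shepherd"))    # 1.00: a weak biter implies stronger biters
print(p_conclusion("doberman", "german shepherd"))  # 0.75: a strong biter says less about weaker ones
```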
(Diagrams: classes A-G arranged in a tree, a chain, and a network, with the hypotheses (subsets of classes) that each structure generates.)
Island ecosystem (Shafto, Kemp, Baraff, Coley, Tenenbaum)
(Diagrams: a taxonomy and a food web over kelp, herring, tuna, sand shark, mako shark, dolphin, and human.)
Models vs. datasets
(Table: correlations between human judgments and the predictions of Bayes (food web), Bayes (tree), and Max-Sim on four datasets: island ecosystem and mammal ecosystem, each with a disease property and a genetic property. Reported correlations range from -0.15 to 0.92.)
Assumptions guiding inferences
• Qualitatively different priors are appropriate for
different domains of inductive generalization.
• In each domain, a prior that matches the world’s
structure fits people’s inductive judgments better
than alternative priors.
• A common framework for representing people’s
domain models: a graph structure defined over
entities or classes, and a probability distribution for
predicates over that graph.
Conclusion
• The hard problem of intelligence: how do
we “go beyond the information given”?
Cows have property P.
Dolphins have property P.
Squirrels have property P.
All mammals have property P.
• The solution:
  – Bayesian statistical inference:
    P(h | d) = P(h) P(d | h) / Σ_i P(h_i) P(d | h_i)
  – Implicit theories about the structure of the
    world, generating P(h) and P(d | h).
Discussion
• How is this intuitive theory of biology like
or not like a scientific theory?
• In what sense does the visual system have a
theory of the world? How is it like or not
like a cognitive theory of biology, or a
scientific theory?