Bootstrapping

Tom Griffiths
Bootstrapping
• How to learn words without knowing words
• Various proposals:
– “semantic bootstrapping” (Pinker, 1984)
– “syntactic bootstrapping” (Gleitman, 1990)
• Characterized by accelerated learning (e.g. Regier, 2004)
• Question:
– when is bootstrapping possible?
Word learning
[Figure: twelve objects, three of them labeled “blicket”]
Bayes’ theorem
$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$$
• p(h | d): posterior probability
• p(d | h): likelihood
• p(h): prior probability
• denominator: sum over the space of hypotheses H
• h: hypothesis
• d: data
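A minimal sketch of Bayes' theorem over a finite hypothesis space, added for illustration. The function name `posterior`, the hypothesis names `h1` and `h2`, and all numbers are assumptions, not values from the talk.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Return p(h | d) for every hypothesis h in a finite space H.

    prior:      dict mapping hypothesis name -> p(h)
    likelihood: dict mapping hypothesis name -> p(d | h)
    """
    # Numerator of Bayes' theorem for each hypothesis.
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Denominator: sum over the whole space of hypotheses.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("data have zero probability under every hypothesis")
    return {h: p / evidence for h, p in joint.items()}

# Hypothetical prior and likelihood values, chosen only for illustration.
prior = {"h1": Fraction(1, 2), "h2": Fraction(1, 2)}
likelihood = {"h1": Fraction(1, 3), "h2": Fraction(1, 12)}
post = posterior(prior, likelihood)
print(post)
```

With equal priors, the posterior depends only on the likelihoods: here h1 gets 4/5 and h2 gets 1/5.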
Bayesian word learning
(Tenenbaum, 1999; Tenenbaum & Xu, 2002)
• Data
– scene-word pairs (x, w)
• Hypotheses
– functions h labeling scenes
• Likelihood
– weak sampling
$$p(d \mid h) = p(w \mid x, h) = \begin{cases} 1 & x \in h \\ 0 & x \notin h \end{cases}$$
– strong sampling
$$p(d \mid h) = p(x \mid w, h) = \begin{cases} \frac{1}{|h|} & x \in h \\ 0 & x \notin h \end{cases}$$
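The two likelihoods can be sketched as follows, assuming a hypothesis is represented as the set of scenes it labels with the word. The integer scene identifiers and the sets `narrow` and `broad` are hypothetical.

```python
from fractions import Fraction

def weak_likelihood(x, h):
    """p(w | x, h): 1 if scene x falls under hypothesis h, else 0."""
    return Fraction(1) if x in h else Fraction(0)

def strong_likelihood(x, h):
    """p(x | w, h): x is drawn uniformly from the |h| scenes that h labels."""
    return Fraction(1, len(h)) if x in h else Fraction(0)

# Hypothetical scenes, identified by integers for illustration only.
narrow = frozenset({0, 1, 2})   # a hypothesis covering 3 scenes
broad = frozenset(range(12))    # a hypothesis covering 12 scenes

for name, h in [("narrow", narrow), ("broad", broad)]:
    print(name, weak_likelihood(0, h), strong_likelihood(0, h))
```

Weak sampling cannot tell the two hypotheses apart when both contain the scene, whereas strong sampling favors the smaller one.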
[Figure: twelve objects, one labeled “blicket”]
p(d|h) = 0
[Figure: twelve objects, one labeled “blicket”]
p(d|h) = 1/3
[Figure: twelve objects, three of them labeled “blicket”]
p(d|h) = (1/3)^3
[Figure: twelve objects, one labeled “blicket”]
p(d|h) = 1/12
[Figure: twelve objects, three of them labeled “blicket”]
p(d|h) = (1/12)^3
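The likelihood values on these slides can be reproduced with a short sketch, assuming strong sampling with independently drawn examples. Which objects belong to each hypothesis is not recoverable from the transcript, so the sets `excluding`, `narrow`, and `broad` are hypothetical.

```python
from fractions import Fraction

def strong_likelihood(examples, h):
    """p(d | h) for independently drawn examples under strong sampling."""
    p = Fraction(1)
    for x in examples:
        # Each example contributes 1/|h| if h contains it, and 0 otherwise.
        p *= Fraction(1, len(h)) if x in h else Fraction(0)
    return p

# Hypothetical hypotheses over twelve objects numbered 0..11.
excluding = frozenset({3, 4, 5})   # does not contain the labeled object 0
narrow = frozenset({0, 1, 2})      # three objects
broad = frozenset(range(12))       # all twelve objects

one = [0]            # a single "blicket"
three = [0, 1, 2]    # three "blickets"

print(strong_likelihood(one, excluding))   # 0
print(strong_likelihood(one, narrow))      # 1/3
print(strong_likelihood(three, narrow))    # 1/27
print(strong_likelihood(one, broad))       # 1/12
print(strong_likelihood(three, broad))     # 1/1728

# How strongly the data favor the narrow hypothesis over the broad one.
ratio_one = strong_likelihood(one, narrow) / strong_likelihood(one, broad)
ratio_three = strong_likelihood(three, narrow) / strong_likelihood(three, broad)
print(ratio_one, ratio_three)              # 4 64
```

The preference for the smaller hypothesis grows exponentially with the number of examples: a factor of 4 after one example, 64 after three.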
Bootstrapping
• Bayesian word learning is a form of semantic bootstrapping (Niyogi, 2002)
• What about accelerated learning?
– non-linear increase in the probability of a correct answer for a random scene and word
• When can it occur?
– not when hypotheses are independent and all equally likely, under weak sampling
– speculation: when hypotheses are dependent
Forms of dependency
• Hierarchical priors
– unknowns across learning events
• Compositional priors
– unknowns within learning events
Hierarchical priors

[Figure: graphical model with four scene-word learning events, each with nodes x, h, and w, one per word: “blicket”, “toma”, “dax”, “wug”]
[Figure: twelve objects, with objects labeled “dax”, “blicket”, and “toma”; which one is the “wug”?]
Hierarchical priors
• What is contained in a hierarchical prior?
• Any learned information that constrains
scene-word mappings
– typical referents (whole object)
– dimensions of stimuli (shape/substance)
– pragmatic dependencies (mutual exclusivity)
– sound and meaning (morphology)
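One very simple way a shared unknown could link learning events is a Beta-Bernoulli model, sketched below. This is an illustrative assumption and not the model from the talk: it supposes that each word generalizes either by shape or by something else, that an unknown proportion of shape-based words is shared across all words, and that earlier words have already been classified. The function name and counts are hypothetical.

```python
from fractions import Fraction

def prior_shape_based(n_shape, n_other, a=1, b=1):
    """Prior probability that a new word generalizes by shape.

    Posterior predictive of a Beta(a, b)-Bernoulli model: the shared
    proportion of shape-based words is unknown, and n_shape / n_other
    count the previously learned words of each kind.
    """
    return Fraction(n_shape + a, n_shape + n_other + a + b)

# Before any words are learned, shape is no more likely than not.
print(prior_shape_based(0, 0))   # 1/2

# Hypothetical case: three earlier words all turned out to be shape-based.
print(prior_shape_based(3, 0))   # 4/5
```

Each learned word sharpens the prior for the next one, which is the kind of dependency across learning events that the slides speculate could produce accelerated learning.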
Compositional hypotheses
“blicket toma”
[Figure: three graphical models relating scene x to words w1 and w2, labeled holistic, independent, and compositional; node labels include h, h1, h2, and G]
Compositional hypotheses
• Good news:
– express syntactic bootstrapping
– model referential uncertainty
• Bad news:
– requires complete linguistic theory
Bootstrapping
• When do we see accelerated learning?
– speculation: dependent hypotheses
• Sources of dependency in language
– hierarchical priors
– compositional hypotheses
• Bootstrapping goes beyond language
– learning causal theories aids learning causal relationships, learning concepts…