The Harmonic Mind - Cognitive Science at Johns Hopkins


Jakobson's Grand Unified Theory
of Linguistic Cognition
Paul Smolensky
Cognitive Science Department
Johns Hopkins University
with:
Elliott Moreton
Karen Arnold
Donald Mathis
Melanie Soderstrom
Géraldine Legendre
Alan Prince
Peter Jusczyk
Suzanne Stevenson
Grammar and Cognition
1. What is the system of knowledge?
2. How does this system of knowledge
arise in the mind/brain?
3. How is this knowledge put to use?
4. What are the physical mechanisms
that serve as the material basis for
this system of knowledge and for the
use of this knowledge?
(Chomsky ‘88; p. 3)
Advertisement
The complete story, forthcoming (2003), Blackwell:
The harmonic mind: From neural computation to optimality-theoretic grammar
Smolensky & Legendre
Jakobson’s Program
A Grand Unified Theory for the cognitive science
of language is enabled by Markedness:
Avoid α
① Structure
• Alternations eliminate α
• Typology: Inventories lack α
② Acquisition
• α is acquired late
③ Processing
• α is processed poorly
④ Neural
• Brain damage most easily disrupts α
Formalize through OT?
OT
Structure | Acquisition | Use | Neural Realization
• Theoretical. OT (Prince & Smolensky '91, '93):
– Constructs formal grammars directly from markedness principles
– General formalism/framework for grammars: phonology, syntax, semantics; GB/LFG/…
– Strongly universalist: inherent typology
• Empirical. OT:
– Allows completely formal markedness-based explanation of highly complex data
Structure | Acquisition | Use | Neural Realization
• Theoretical
Formal structure enables OT-general:
– Learning algorithms
• Constraint Demotion: provably correct and efficient (when part of a general decomposition of the grammar learning problem)
– Tesar 1995 et seq.; Tesar & Smolensky 1993, …, 2000
• Gradual Learning Algorithm
– Boersma 1998 et seq.
– Initial state
• Empirical
– Initial-state predictions explored through behavioral experiments with infants
Structure | Acquisition | Use | Neural Realization
• Theoretical
– Theorems regarding the computational complexity of algorithms for processing with OT grammars
• Tesar '94 et seq.
• Ellison '94
• Eisner '97 et seq.
• Frank & Satta '98
• Karttunen '98
• Empirical (with Suzanne Stevenson)
– Typical sentence-processing theory: heuristic constraints
– OT: output for every input; enables incremental (word-by-word) processing
– Empirical results concerning human sentence-processing difficulties can be explained with OT grammars employing independently motivated syntactic constraints
– The competence theory [OT grammar] is the performance theory [human parsing heuristics]
Structure | Acquisition | Use | Neural Realization
• Theoretical
– OT derives from the theory of abstract neural (connectionist) networks
– via Harmonic Grammar (Legendre, Miyata, Smolensky '90)
– For moderate complexity, we now have general formalisms for realizing:
• complex symbol structures as distributed patterns of activity over abstract neurons
• structure-sensitive constraints/rules as distributed patterns of strengths of abstract synaptic connections
• optimization of Harmony
• Empirical
– Construction of a miniature, concrete LAD
Program
Structure
• OT
– Constructs formal grammars directly from markedness principles
– Strongly universalist: inherent typology
• OT allows completely formal markedness-based explanation of highly complex data
Acquisition
• Initial-state predictions explored through behavioral experiments with infants
Neural Realization
• Construction of a miniature, concrete LAD
The Great Dialectic
Phonological representations serve two masters:
• MARKEDNESS: Phonetic interface [surface form]
Often: 'minimize effort (motoric & cognitive)'; 'maximize discriminability'
• FAITHFULNESS: Lexical interface /underlying form/
Recoverability: 'match this invariant form'
The Phonological Representation is locked in conflict between the Phonetics and the Lexicon.
OT from Markedness Theory
• MARKEDNESS constraints: *α: No α
• FAITHFULNESS constraints
– Fα demands that /input/ → [output] leave α unchanged (McCarthy & Prince '95)
– Fα controls when α is avoided (and how)
• Interaction of violable constraints: Ranking
– α is avoided when *α ≫ Fα
– α is tolerated when Fα ≫ *α
– M1 ≫ M2: combines multiple markedness dimensions
OT from Markedness Theory
• MARKEDNESS constraints: *α
• FAITHFULNESS constraints: Fα
• Interaction of violable constraints: Ranking
– α is avoided when *α ≫ Fα
– α is tolerated when Fα ≫ *α
– M1 ≫ M2: combines multiple markedness dimensions
• Typology: All cross-linguistic variation results
from differences in ranking – in how the
dialectic is resolved (and in how multiple
markedness dimensions are combined)
OT from Markedness Theory
• MARKEDNESS constraints
• FAITHFULNESS constraints
• Interaction of violable constraints: Ranking
• Typology: All cross-linguistic variation results from differences in ranking – in resolution of the dialectic
• Harmony = MARKEDNESS + FAITHFULNESS
– A formally viable successor to Minimize Markedness is
OT’s Maximize Harmony (among competitors)
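As a concrete illustration (not from the talk), here is a minimal Python sketch of Maximize Harmony under strict domination: the winning candidate is the one whose violation profile is lexicographically best in ranking order. The constraints *α and Fα and the candidate strings are illustrative stand-ins.

```python
# Minimal sketch (illustrative): OT evaluation under strict domination.
# A grammar is an ordered list of constraints; each constraint maps an
# (input, candidate) pair to a violation count.  The Harmony-maximal
# candidate has the lexicographically smallest violation vector.

def mark_alpha(inp, cand):
    """*α: one violation per occurrence of the marked structure 'np'."""
    return cand.count("np")

def faith_alpha(inp, cand):
    """Fα: one violation per position where the candidate departs from the input."""
    return sum(1 for a, b in zip(inp, cand) if a != b)

def optimal(inp, candidates, ranking):
    """Return the most harmonic candidate under the given ranking."""
    return min(candidates, key=lambda c: tuple(con(inp, c) for con in ranking))

candidates = ["inpossible", "impossible"]

# α avoided when *α ≫ Fα: /in+possible/ surfaces as 'impossible'
print(optimal("inpossible", candidates, [mark_alpha, faith_alpha]))
# α tolerated when Fα ≫ *α: the faithful 'inpossible' wins
print(optimal("inpossible", candidates, [faith_alpha, mark_alpha]))
```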
Structure
Explanatory goals achieved by OT
• Individual grammars are literally and
formally constructed directly from
universal markedness principles
• Inherent Typology :
Within the analysis of phenomenon Φ in
language L is inherent a typology of Φ
across all languages
Program
Structure
• OT
– Constructs formal grammars directly from markedness principles
– Strongly universalist: inherent typology
• OT allows completely formal markedness-based explanation of highly complex data (Friday)
Acquisition
• Initial-state predictions explored through behavioral experiments with infants
Neural Realization
• Construction of a miniature, concrete LAD
Structure: Summary
• OT builds formal grammars directly from
markedness: MARK, with FAITH
Friday:
• Inventories consistent with markedness
relations are formally the result of OT with
local conjunction
• Even highly complex patterns can be
explained purely with simple markedness
constraints: all complexity is in constraints’
interaction through ranking and conjunction:
Lango ATR vowel harmony
Program
Structure
• OT
– Constructs formal grammars directly from markedness principles
– Strongly universalist: inherent typology
• OT allows completely formal markedness-based explanation of highly complex data
Acquisition
• Initial-state predictions explored through behavioral experiments with infants
Neural Realization
• Construction of a miniature, concrete LAD
Nativism I: Learnability
• Learning algorithm
– Provably correct and efficient (under strong assumptions)
– Sources:
• Tesar 1995 et seq.
• Tesar & Smolensky 1993, …, 2000
– If you hear A when you expected to hear E,
increase the Harmony of A above that of E by
minimally demoting each constraint violated
by A below a constraint violated by E
Constraint Demotion Learning
If you hear A when you expected to hear E,
increase the Harmony of A above that of E by
minimally demoting each constraint violated by A
below a constraint violated by E
Correctly handles difficult case: multiple violations in E
[Tableau: input /in + possible/; constraints Mark (NPA), Faith. ☹☞ E = inpossible violates Mark (NPA); ☺ A = impossible violates Faith.]
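A minimal, illustrative sketch of the demotion step just stated, applied to the /in + possible/ case (the full Constraint Demotion algorithm of Tesar & Smolensky works over stratified hierarchies; this ranked-list version only approximates it, and the function and variable names are mine):

```python
# Illustrative sketch of one Constraint Demotion step.  Constraints are names;
# violation profiles are dicts mapping constraint name -> number of violations.

def demote(ranking, viol_heard, viol_expected):
    """If you hear A (viol_heard) when you expected E (viol_expected), demote
    each constraint preferring E (violated more by A) to just below the
    highest-ranked constraint preferring A (violated more by E)."""
    prefers_A = [c for c in ranking if viol_expected[c] > viol_heard[c]]
    prefers_E = [c for c in ranking if viol_heard[c] > viol_expected[c]]
    if not prefers_A:
        return ranking                                  # nothing to demote below
    pivot = min(prefers_A, key=ranking.index)           # highest-ranked A-preferrer
    to_demote = [c for c in prefers_E if ranking.index(c) <= ranking.index(pivot)]
    kept = [c for c in ranking if c not in to_demote]
    i = kept.index(pivot) + 1                           # slot just below the pivot
    return kept[:i] + to_demote + kept[i:]

# /in + possible/: heard A = impossible, expected E = inpossible
ranking = ["Faith", "Mark(NPA)"]                # learner's current (wrong) ranking
viol_A = {"Faith": 1, "Mark(NPA)": 0}           # A violates Faith
viol_E = {"Faith": 0, "Mark(NPA)": 1}           # E violates Mark (NPA)
print(demote(ranking, viol_A, viol_E))          # -> ['Mark(NPA)', 'Faith']
```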
Nativism I: Learnability
• M ≫ F is learnable with /in+possible/ → impossible
– 'not' = in- except when followed by …
– "exception that proves the rule", M = NPA
• M ≫ F is not learnable from data if there are no 'exceptions' (alternations) of this sort, e.g., if the lexicon produces only inputs with mp, never np: then M and F never conflict, and there is no evidence for their ranking
• Thus M ≫ F must hold in the initial state, ℌ0
The Initial State
OT-general: MARKEDNESS ≫ FAITHFULNESS
• Learnability demands (Richness of the Base) (Alan Prince, p.c., '93; Smolensky '96a)
• Child production: restricted to the unmarked
• Child comprehension: not so restricted (Smolensky '96b)
Nativism II: Experimental Test
• Collaborators: Peter Jusczyk, Theresa Allocco
• Language Acquisition (2002)
Nativism II: Experimental Test
• Linking hypothesis:
More harmonic phonological stimuli ⇒
Longer listening time
• More harmonic:
– M ≻ *M, when equal on F
– F ≻ *F, when equal on M
– When one must choose one or the other, it is more harmonic to satisfy M: M ≫ F
• M = Nasal Place Assimilation (NPA)
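To make these predictions concrete, here is a small, purely illustrative sketch (not part of the study): the violation counts per stimulus type are stipulated by hand, following the conditions described on the following slides, and 'ingu' is an ASCII stand-in for iŋgu.

```python
# Predictions behind the linking hypothesis: under M >> F, more harmonic
# stimuli should draw longer listening times.  Violation counts are stipulated.

RANKING = ("M", "F")                       # M >> F: the claimed initial state

def harmony_key(violations):
    """Lower tuples are more harmonic under strict domination M >> F."""
    return tuple(violations[c] for c in RANKING)

stimuli = {
    "um...ber...umber": {"M": 0, "F": 0},  # satisfies both
    "um...ber...ingu":  {"M": 0, "F": 1},  # violates Faithfulness
    "un...ber...unber": {"M": 1, "F": 0},  # violates Markedness (nb cluster)
    "un...ber...umber": {"M": 0, "F": 1},  # repairs nb at the cost of Faithfulness
}

for a in stimuli:
    for b in stimuli:
        if a != b and harmony_key(stimuli[a]) < harmony_key(stimuli[b]):
            print(f"predict longer listening to {a} than to {b}")
```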
Experimental Paradigm
• Headturn Preference Procedure
(Kemler Nelson et al. ‘95; Jusczyk ‘97)
• X/Y/XY paradigm (P. Jusczyk)
un...b...umb
un...b...umb
*FNP
ℜ
um...b...umb
iŋ…..gu...iŋgu
…
um...b...iŋgu
vs.
iŋ…..gu…umb
…
• Highly general paradigm: Main result
4.5 Months (NPA): Faithfulness condition (∃FAITH)
Higher Harmony: um…ber…umber, 15.36 sec listening time
Lower Harmony: um…ber…iŋgu, 12.31 sec
p = .006 (11/16)
[Bar chart: listening time (sec), Higher H vs. Lower H, across the Faithfulness, Markedness, and M ≫ F conditions]
4.5 Months (NPA): Markedness condition
Higher Harmony: um…ber…umber, 15.23 sec listening time
Lower Harmony: un…ber…unber, 12.73 sec
p = .044 (11/16)
4.5 Months (NPA): M ≫ F condition
un…ber…umber: satisfies Markedness, violates Faithfulness
un…ber…unber: satisfies Faithfulness, violates Markedness
Which is higher Harmony? ???
4.5 Months (NPA): M ≫ F condition, result
Higher Harmony: un…ber…umber, 16.75 sec listening time
Lower Harmony: un…ber…unber, 14.01 sec
p = .001 (12/16)
[Bar chart summary: Faithfulness 15.36 vs. 12.31; Markedness 15.23 vs. 12.73; M ≫ F 16.75 vs. 14.01 sec]
Program
Structure
• OT
– Constructs formal grammars directly from markedness principles
– Strongly universalist: inherent typology
• OT allows completely formal markedness-based explanation of highly complex data
Acquisition
• Initial-state predictions explored through behavioral experiments with infants
Neural Realization
• Construction of a miniature, concrete LAD
The question
• The nativist hypothesis, central to
generative linguistic theory:
Grammatical principles respected by all human
languages are encoded in the genome.
• Questions:
– Evolutionary theory: How could this happen?
– Empirical question: Did this happen?
– Today: What — concretely — could it mean for a
genome to encode innate knowledge of universal
grammar?
UGenomics
• The game: Take a first shot at a concrete
example of a genetic encoding of UG in a
Language Acquisition Device
¿ Proteins ⇝ Universal grammatical principles ?
Time to willingly suspend disbelief …
UGenomics
• The game: Take a first shot at a concrete
example of a genetic encoding of UG in a
Language Acquisition Device
¿ Proteins ⇝ Universal grammatical principles ?
• Case study: Basic CV Syllable Theory (Prince
& Smolensky ’93)
• Innovation: Introduce a new level, an
‘abstract genome’ notion parallel to [and
encoding] ‘abstract neural network’
Approach: Multiple Levels of Encoding
[Diagram: the Abstract Neural Network instantiates the Grammar / Innate Constraints and is encoded by the Abstract Genome; the Biological Neural Network instantiates the Abstract Neural Network and is encoded by the Biological Genome.]
UGenome for CV Theory
• Three levels
– Abstract symbolic: Basic CV Theory
– Abstract neural:
CVNet
– Abstract genomic: CVGenome
UGenomics: Symbolic Level
• Three levels
– Abstract symbolic: Basic CV Theory
– Abstract neural:
CVNet
– Abstract genomic: CVGenome
Approach: Multiple Levels of Encoding
[Levels-of-encoding diagram repeated from above]
Basic syllabification: Function
• Basic CV Syllable Structure Theory
– ‘Basic’ — No more than one segment per
syllable position: .(C)V(C).
• ƒ: /underlying form/ → [surface form]
• /CVCC/ → [.CV.CVC.] (the second V is epenthetic), e.g. /pæd+d/ → [pædəd]
• Correspondence Theory
– McCarthy & Prince 1995 ('M&P')
• /C₁V₂C₃C₄/ → [.C₁V₂.C₃VC₄.]
Why basic CV syllabification?
• ƒ: underlying → surface linguistic forms
• Forms simple but combinatorially
productive
• Well-known universals; typical typology
• Mini-component of real natural language
grammars
• A (perhaps the) canonical model of
universal grammar in OT
Syllabification: Constraints (Con)
• PARSE: Every element in the input
corresponds to an element in the output
• ONSET: No V without a preceding C
• etc.
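A toy sketch of how such constraints pick out Basic CV parses (illustrative only: Gen is just a hand-listed candidate set for /CVCC/, 'v' marks an epenthetic vowel, and unparsed input segments are simply omitted; FILL and NOCODA are included alongside PARSE and ONSET so that reranking shows the epenthesis/deletion typology):

```python
# Toy OT evaluation for Basic CV syllabification over a hand-built candidate set.

def syllables(cand):
    return [s for s in cand.strip(".").split(".") if s]

def PARSE(inp, cand):
    """Input segments with no output correspondent (epenthetic 'v' doesn't count)."""
    return len(inp) - len(cand.replace(".", "").replace("v", ""))

def FILL(inp, cand):
    """Epenthetic (unfilled) output positions."""
    return cand.count("v")

def ONSET(inp, cand):
    """Syllables lacking an initial C."""
    return sum(1 for s in syllables(cand) if not s.startswith("C"))

def NOCODA(inp, cand):
    """Syllables ending in C."""
    return sum(1 for s in syllables(cand) if s.endswith("C"))

def optimal(inp, candidates, ranking):
    return min(candidates, key=lambda c: tuple(con(inp, c) for con in ranking))

cands = [".CV.CvC.", ".CV.CvCv.", ".CVC.", ".CV."]
print(optimal("CVCC", cands, [ONSET, PARSE, FILL, NOCODA]))   # epenthesis: .CV.CvC.
print(optimal("CVCC", cands, [ONSET, NOCODA, FILL, PARSE]))   # deletion:   .CV.
```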
UGenomics: Neural Level
• Three levels
– Abstract symbolic: Basic CV Theory
– Abstract neural:
CVNet
– Abstract genomic: CVGenome
Approach: Multiple Levels of Encoding
[Levels-of-encoding diagram repeated from above]
CVNet Architecture
/C₁ C₂/ → [C₁ V C₂]
[Network diagram: a grid with the input /C₁ C₂/, the output [C₁ V C₂], and correspondence units '1', '2' linking input segments to their output correspondents]
Connection substructure
[Diagram: each Φ–Ψ connection is composed of one substructure per constraint i]
• Local, fixed, genetically determined: the content of constraint i (its coefficient c_i)
• Global, variable during learning: the strength of constraint i (s_i)
• Network weight: W_ΦΨ = Σ_{i=1}^{N_con} s_i · c_i^{ΦΨ}
• Network input: ι_Ψ = Σ_Φ W_ΨΦ · a_Φ
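A minimal numpy sketch of this decomposition (the coefficient matrices, strengths, and Harmony convention below are made-up placeholders, not the CVNet's actual values): each constraint contributes a fixed coefficient matrix scaled by its learnable strength, and unit inputs follow from the summed weight matrix.

```python
# W_{ΦΨ} = Σ_i s_i · c_i^{ΦΨ}: fixed constraint 'contents' C[i], variable strengths s[i].
import numpy as np

C = {  # illustrative coefficient matrices, one per constraint
    "PARSE": np.array([[0, 2, 0, 0],
                       [2, 0, 0, 0],
                       [0, 0, 0, 2],
                       [0, 0, 2, 0]], dtype=float),
    "ONSET": np.array([[ 0,  0, -1,  0],
                       [ 0,  0,  0, -1],
                       [-1,  0,  0,  0],
                       [ 0, -1,  0,  0]], dtype=float),
}
s = {"PARSE": 3.0, "ONSET": 1.5}            # strengths, tuned during learning

W = sum(s_i * C[name] for name, s_i in s.items())   # summed weight matrix
a = np.array([1.0, 0.0, 1.0, 0.0])                  # an activation pattern
iota = W @ a                                        # input to each unit: ι_Ψ = Σ_Φ W_ΨΦ a_Φ
harmony = 0.5 * a @ W @ a                           # one common Harmony convention
print(W, iota, harmony, sep="\n")
```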
PARSE
• All connection coefficients are +2
[Diagram: PARSE subnetwork over input (C, V), output, and correspondence units, with its bias terms (values 1 and 3 in the figure)]
ONSET
• All connection coefficients are 1
[Diagram: ONSET subnetwork over C and V output units]
Crucial Open Question
(Truth in Advertising)
• Relation between strict domination and
neural networks?
CVNet Dynamics
• Boltzmann machine/Harmony network
– Hinton & Sejnowski ’83 et seq. ; Smolensky ‘83 et seq.
– stochastic activation-spreading algorithm: higher Harmony → more probable
– CVNet innovation: connections realize fixed
symbol-level constraints with variable strengths
– learning: modification of Boltzmann machine
algorithm to new architecture
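As an illustration of 'higher Harmony → more probable', here is a generic Boltzmann-machine-style Gibbs update over binary units (the weights are invented and none of the CVNet's specifics are reproduced):

```python
# Stochastic activation spreading: resample one unit at a time so that, at
# equilibrium, P(a) ∝ exp(H(a)/T) with H(a) = ½ aᵀWa.
import numpy as np

rng = np.random.default_rng(0)

def harmony(W, a):
    return 0.5 * a @ W @ a

def gibbs_step(W, a, T=1.0):
    k = rng.integers(len(a))
    gain = W[k] @ a - W[k, k] * a[k]          # Harmony gain from turning unit k on
    p_on = 1.0 / (1.0 + np.exp(-gain / T))    # logistic: higher Harmony, more probable
    a[k] = 1.0 if rng.random() < p_on else 0.0
    return a

W = np.array([[ 0.0,  2.0, -1.0],
              [ 2.0,  0.0, -1.0],
              [-1.0, -1.0,  0.0]])            # illustrative symmetric weights
a = rng.integers(0, 2, size=3).astype(float)
for _ in range(200):
    a = gibbs_step(W, a, T=0.5)
print(a, harmony(W, a))
```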
Learning Behavior
• A simplified system can be solved analytically
• The learning algorithm turns out to be approximately Δs_i ≈ ε · [# of violations of constraint_i]_P (applied during phase P⁺ and reversed during P⁻)
UGenomics: Genome Level
• Three levels
– Abstract symbolic: Basic CV Theory
– Abstract neural:
CVNet
– Abstract genomic: CVGenome
Approach: Multiple Levels of Encoding
[Levels-of-encoding diagram repeated from above]
Connectivity geometry
• Assume 3-d grid geometry
[Diagram: C and V units on the grid, with axes labelled 'N', 'E', 'back']
ONSET
• V-O segment: N&S S V-O | N S x0
• x0 segment: S S V-O
[Diagram: ONSET connectivity between C and V output units]
Connectivity: PARSE
• Input units grow south and connect
• Output units grow east and connect
• Correspondence units grow north & west
and connect with input & output units.
1
C
1
1
V
1
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
1
3
3
To be encoded
• How many different kinds of units are
there?
• What information is necessary (from the
source unit’s point of view) to identify
the location of a target unit, and the
strength of the connection with it?
• How are constraints initially specified?
• How are they maintained through the
learning process?
Unit types
• Input units: C, V
• Output units: C, V, x
• Correspondence units: C, V
• 7 distinct unit types
• Each represented in a distinct sub-region of the abstract genome
• 'Help ourselves' to implicit machinery to spell out these sub-regions as distinct cell types, located in the grid as illustrated
Direction of projection growth
• Topographic organizations widely
attested throughout neural structures
– Activity-dependent growth a possible
alternative
• Orientation information (axes)
– Chemical gradients during development
– Cell age a possible alternative
Projection parameters
• Direction
• Extent
– Local
– Non-local
• Target unit type
• Strength of connections encoded
separately
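A small, purely illustrative sketch of how one projection could be recorded as an 'abstract gene' (a direction, an extent, and a target unit type, following the Key on the Connectivity Genome slide below), together with a toy decoder that grows a projection along one grid axis; the class and function names are mine, not the talk's:

```python
# One 'projection gene' = (direction, extent, target type); a decoder walks the
# 3-d grid in that direction and connects to every target-type unit it meets.
from dataclasses import dataclass

DIRS = {"N": (0, 1, 0), "S": (0, -1, 0), "E": (1, 0, 0),
        "W": (-1, 0, 0), "F": (0, 0, 1), "B": (0, 0, -1)}

@dataclass
class ProjectionGene:
    direction: str   # 'N', 'S', 'E', 'W', 'F', 'B'
    extent: str      # 'L' (long) or 'S' (short)
    target: str      # target unit type, e.g. 'C-C', 'V-O', 'x'

def grow(gene, source_pos, units, long_range=10):
    """Return the positions of target-type units reached from source_pos."""
    dx, dy, dz = DIRS[gene.direction]
    steps = 1 if gene.extent == "S" else long_range
    x, y, z = source_pos
    reached = []
    for _ in range(steps):
        x, y, z = x + dx, y + dy, z + dz
        reached += [pos for utype, pos in units if utype == gene.target and pos == (x, y, z)]
    return reached

# e.g. the C-I entry 'S L C-C': project south, long range, onto C-C units
units = [("C-C", (0, -2, 0)), ("V-C", (0, -3, 0)), ("C-C", (0, -5, 0))]
print(grow(ProjectionGene("S", "L", "C-C"), (0, 0, 0), units))  # -> [(0, -2, 0), (0, -5, 0)]
```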
Connectivity Genome
• Contributions from ONSET and PARSE (source unit: its projections):
– C-I: S L C-C
– V-I: S L V-C
– C-O: E L C-C
– V-O: E L V-C; N&S S V-O; N S x0
– C-C: N L C-I; W L C-O
– V-C: N L V-I; W L V-O
– x0: S S V-O
• Key:
– Direction: N(orth), S(outh), E(ast), W(est), F(ront), B(ack)
– Extent: L(ong), S(hort)
– Target: Input C-I, V-I; Output C-O, V-O, x(0); Corr C-C, V-C
CVGenome: Connectivity
[Table: for each constraint (CORRESPOND, RESPOND, PARSE, FILL-V, FILL-C, ONSET, NOCODA, IDENTITY, LINEARITY, INTEGRITY, UNIFORMITY, OUTPUTID, NOOUTGAPS), the projection instructions (D(irection), E(xtent), T(arget)) from each source unit type (C-I, V-I, C-O, V-O, x, C-C, V-C); e.g. CORRESPOND: S L C-C, S L V-C, …]
Encoding connection strength
• Network-level specification:
W_ΦΨ = Σ_{i=1}^{N_con} s_i · c_i^{ΦΨ}
• For each constraint i, need to 'embody'
– constraint strength s_i
– connection coefficients c_i^{ΦΨ} (Φ → Ψ cell types)
• The product of these is the contribution of i to the Φ → Ψ connection weight
Processing
• [P₁] ∝ s₁ (the concentration of protein 1 encodes the strength of constraint 1)
• R₁ ∝ c₁ (the receptor density at the Φ–Ψ connection encodes its coefficient)
• w₁ = [P₁] · R₁ ∝ s₁ c₁
• Total weight: W = Σ_i w_i
Development
• At cell Φ: R₁ ∝ G₁ ∝ c₁ and L₁ ∝ G₁ ∝ c₁
• At cell Ψ: L₂ ∝ G₂ ∝ c₂ and R₂ ∝ G₂ ∝ c₂
(receptor densities R and ligand densities L are set by gene expression levels G, which encode the coefficients c)
Learning
(during phase P⁺; reverse during P⁻)
• [P₁] ∝ K₁;  ΔK₁ ∝ L₁ ∝ G₁ ∝ c₁
• Δ[P₂] = ΔK₂ ∝ L₂ ∝ G₂ ∝ c₂
• When Φ and Ψ are simultaneously active:
Δs_i = Δ[P₁] = ΔK₁ ∝ L₁ ∝ G₁ ∝ c₁
CVGenome: Connection Coefficients

Constraint    From            To              Strength
IDENTITY      C-C             V-C             1
LINEARITY     C-C & V-C       C-C & V-C       1
INTEGRITY     C-C & V-C       C-C & V-C       1
UNIFORMITY    C-C             C-C             1
OUTPUTID      C-O & V-O & x   C-O & V-O & x   2
NOOUTGAPS     x               C-O & V-O       1
RESPOND       C-O & V-O & x   bias            1
CORRESPOND    C-C & V-C       bias            2
              C-C             C-I & C-O       1
              V-C             V-I & V-O       1
NOCODA        C-O             C-O & x         1
PARSE         C-C & V-C       bias            3
              C-I & V-I       bias            1
              C-I & C-O       C-C             2
              V-I & V-O       V-C             2
FILL-V        V-C             bias            3
              V-O             bias            1
              V-I & V-O       V-C             2
FILL-C        C-C             bias            3
              C-O             bias            1
              C-I & C-O       C-C             2
ONSET         V-O             V-O & x         1
Abstract Gene Map
[Diagram of the abstract genome: a region for the General Developmental Machinery; a Connectivity region of genes specifying direction, extent, and target (e.g. C-I: S L C-C; V-I: S L V-C; C-C: F S V-C, N/E L C-C&V-C, S/W L C-C&V-C); and a Constraint Coefficients region (e.g. RESPOND: C-O&V-O&x, bias 1; CORRESPOND: C-C&V-C, bias 2; C-C → C-I&C-O, 1; V-C → V-I&V-O, 1)]
UGenomics
• Realization of processing and learning
algorithms in ‘abstract molecular
biology’, using the types of interactions
known to be biologically possible and
genetically encodable
UGenomics
• Host of questions to address
– Will this really work?
– Can it be generalized to distributed nets?
– Is the number of genes [77 = 0.26%] plausible?
– Are the mechanisms truly biologically plausible?
– Is it evolvable?
– How is strict domination to be handled?
Hopeful Conclusion
• Progress is possible toward a Grand Unified
Theory of the cognitive science of language
– addressing the structure, acquisition, use, and
neural realization of knowledge of language
– strongly governed by universal grammar
– with markedness as the unifying principle
– as formalized in Optimality Theory at the
symbolic level
– and realized via Harmony Theory in abstract
neural nets which are potentially encodable
genetically
Hopeful Conclusion
• Progress is possible toward a Grand Unified
Theory of the cognitive science of language
€ Still lots of promissory notes, but all in a common currency — Harmony ≈ unmarkedness; hopefully this will promote further progress by facilitating integration of the sub-disciplines of cognitive science
Thank you for your attention (and
indulgence)