Introduction to Cognitive Science


Computational Cognitive Modelling
COGS 511 - Lecture 6
Computational Cognitive Modelling in Studying Inflectional Morphology
17.07.2015
COGS 511
1
Related Readings
Readings: Nakisa et al. Single and Dual-Route Models of Inflectional Morphology.
In Broeder, P. and J. Murre (eds). Models of Language Acquisition: Inductive and
Deductive Approaches, OUP, 2002.
Optional and Further Readings

Rumelhart and McClelland (1986) On Learning the Past Tenses of English Verbs. In
McClelland et al. (eds) Parallel Distributed Processing, vol. 2, MIT Press.

The Past Tense Debate (articles and replies by Pinker and Ullman vs. McClelland and
Patterson). Trends in Cognitive Sciences, 6(11), 2002.

Taatgen and Anderson (2002). Why do Children learn to say “Broke”? A model of
learning the past tense without feedback. Cognition 86, pp. 123-155

Almor, A. (2003). Past Tense Learning. In Arbib, M. (ed). Handbook of Brain Theory
and Neural Networks, MIT Press.

Pinker, S. (1999). Words and Rules: The Ingredients of Language. Phoenix.

Marcus, G. (2000). The Algebraic Mind: Integrating Connectionism and Cognitive
Science, MIT Press.

Marcus, G. Children’s Overregularization and Cognition. In Broeder and Murre (2002).
All adopted figures are referenced in the notes section of the relevant slide.
Units of speech

Phones: unitary segments of the stream
of speech

Are made up of phonological features according to
place of articulation, voicing etc.: [+glottal],
[+voiced], [-voiced]; see also consonants and
vowels
Phonemes: abstract units characterizing a
phone and its allophones (variants of the
same sound): [p] in spin and [pʰ] in pin
are allophones of the same phoneme /p/
Syllables: combinations of phones
Morphology


The study of word structure, of words and
how they are formed.
Morphemes: the smallest meaningful
linguistic units. A morpheme may have
more than one phonemic form, each of
which is an allomorph of the morpheme;
a meaningful form is a morph.

a/an in English; -ler/-lar in Turkish (-lAr);
will/’ll (contractions) in English
Derivational vs Inflectional
Affixes

Derivational:

Function change: may change the part of speech
or derive a new word, e.g. energy (noun) + -ize
gives energize (verb); but the part of speech can
also stay the same, as in happy/unhappy or
pig/piglet
Inflectional


Bound forms of grammatical morphemes
No change of function or part of speech; rather
markings for tense, gender, case, number,
e.g. plural morphemes, past tense formation
Other terminology





Lexicon: our mental dictionary; the average adult knows 45,000 to
60,000 words
Root: a lexical morpheme that is the base for
morphological processes
Stem: used either as a synonym for root or as the base for
inflectional morphology
Word class, category, part of speech: a linguistically
relevant group that share particular linguistic properties:
nouns, verbs, adjectives, adverbs, prepositions, pronouns,
determiners etc
Suppletive forms: Irregular related forms, ex: be and
were.

Partial suppletion: sub-regularity ex: sing-sang, ring-rang
Morphological Rules

Morphological rules express



When a morpheme has allomorphs, the
choice among these, ex: kitaplar
Necessary and possible combinations and
order of morphemes which make up words
(morphotactics), ex: *kitabımlar
Morphosyntactic constraints e.g. Subject-verb
agreement: I eat but she eats
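
The first kind of rule above (choosing among allomorphs) can be illustrated with the Turkish plural -lAr. The following is only a minimal sketch: it assumes lowercase stems written in Turkish orthography, applies plain front/back vowel harmony, and ignores loanword exceptions. The function names are made up for this illustration.

```python
# Illustrative sketch: choosing between the Turkish plural allomorphs -lar / -ler.
# Assumption: stems are lowercase and follow regular front/back vowel harmony.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def plural_suffix(stem: str) -> str:
    """Return 'lar' or 'ler' according to the last vowel of the stem."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return "lar"
        if ch in FRONT_VOWELS:
            return "ler"
    raise ValueError(f"no vowel found in stem: {stem!r}")


def pluralize(stem: str) -> str:
    """Attach the harmonizing plural allomorph to the stem."""
    return stem + plural_suffix(stem)


print(pluralize("kitap"))  # kitaplar
print(pluralize("ev"))     # evler
```

The rule looks only at the last vowel of the stem, which is why a single abstract morpheme (-lAr) can be stored with its two surface forms selected by context.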
Language Impairments




Aphasias (impairments in language and speech);
developmental disorders (autism, Williams syndrome).
Broca’s aphasia (aka cortical motor aphasia): slow, halting,
telegraphic speech. Comprehension shows finer distinctions
(basic word order vs structures involving movement)
Wernicke’s aphasia (aka cortical sensory aphasia):
difficulties in understanding language; grammatical but
meaningless utterances.
Common types of symptoms: paraphasias (production
errors like chair for table; tame for lame); anomia
(difficulty in finding the right word); echolalia
(compulsive repetition).

Agrammatism: impairment of comprehension often
associated with agrammatic production (absence of
grammatical morphemes) in nonfluent aphasics
The Past Tense Debate and
Inflectional Processes

Is regular inflection (e.g. the English past tense
suffix –ed) evidence for rules in mental
computation?

Wug test (Berko, 1958): “one wug, two ?” English
speakers (age 3 upwards) apply the regular rule to new
words they haven’t heard before
Overregularization errors: At around age 3, children
who may have previously used irregular forms
correctly suddenly start to inappropriately regularize
many irregular forms: went → goed/wented. Plotting
children’s performance against age gives what is known as the
“U-shaped learning curve”.
• Acquisition of a rule?
• A qualitative change in the learning mechanism?
Dual vs single route
mechanisms


Dual Route (Pinker and others; Pinker’s post-1999
version is aka the Words and Rules theory)
Proposal: Inflectional morphology in all human
languages is computed by a dual route
mechanism consisting of a pattern-associator type
of memory module (for irregulars and frequently
encountered, possibly irregular-sounding
regulars) and a rule (for defaults) which is
unblocked only when the pattern associator fails.
Single route (McClelland and others) Proposal:
A single mechanism handles both regular and
exceptional forms; mainly put forward through
connectionist modelling.
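
The dual route proposal (memory first, default rule only when memory fails) can be sketched in a few lines. This is a toy illustration, not Pinker's model: the memory route is reduced to a dictionary lookup rather than a pattern associator, the rule is an orthographic approximation of –ed, and the lexicon, function names and the retrieval_ok flag are all assumptions made for the example.

```python
# Toy dual-route sketch: stored irregulars block the default rule.
# Assumption: a plain dictionary stands in for the associative memory.
IRREGULARS = {"go": "went", "sing": "sang", "break": "broke", "hit": "hit"}


def regular_rule(stem):
    """Default route: orthographic approximation of the -ed suffix."""
    return stem + "d" if stem.endswith("e") else stem + "ed"


def past_tense(stem, retrieval_ok=True):
    """Consult memory first; apply the rule only when memory yields nothing."""
    if retrieval_ok and stem in IRREGULARS:
        return IRREGULARS[stem]   # stored form blocks the rule
    return regular_rule(stem)     # rule is unblocked


print(past_tense("go"))                      # went
print(past_tense("go", retrieval_ok=False))  # goed
print(past_tense("blick"))                   # blicked
```

Setting retrieval_ok to False mimics a memory failure, which is how the dual route account derives overregularizations such as goed. Novel stems such as blick fall through to the rule in the same way.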
Rumelhart and McClelland
(1986)






Landmark connectionist model in past tense debate
Input: phonological representations of stem forms;
output: phonological representations of past tense forms
Fixed encoding and decoding networks: word forms are
represented by units designating each phoneme together
with its predecessor and successor. Encoding will map
these into so called “Wickelfeatures” that represent
features (voiced, stop etc) of phonemes.
Learning by perceptron convergence (PDP version) and
then backpropagation (Nature version)
Pattern associator with modifiable connections
No explicit rules, but during training the model produced regular
past tense forms for novel verbs and the U-shaped learning curve
characteristic of children.
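
The context-sensitive encoding described above (each phoneme together with its predecessor and successor) can be sketched as follows. This is a simplified illustration: single characters stand in for phonemes, '#' is an assumed word-boundary marker, and the further mapping from these triples to Wickelfeatures is omitted.

```python
# Sketch of context-sensitive triples (predecessor, phoneme, successor).
# Assumptions: one character per phoneme; '#' marks the word boundary.
def wickelphones(phonemes):
    """Return one (predecessor, phoneme, successor) triple per phoneme."""
    padded = ["#"] + list(phonemes) + ["#"]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


print(wickelphones("kam"))
# [('#', 'k', 'a'), ('k', 'a', 'm'), ('a', 'm', '#')]
```

Because each triple carries its own context, a word can be represented as an unordered set of units, which is what lets a fixed-size network handle words of different lengths.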
Criticisms


Divergence from human behaviour, e.g. the
model did not generalize well to novel
forms that have an unusual sound (e.g.
the model mapped the stem tour (not in
the training set) to toureder).
U-shaped learning occurs as a result of an
implausible and carefully engineered
training regime, e.g. a sudden jump in
vocabulary from 10 to 420 verbs (Pinker
and Prince, 1988)
Later Developments






Better connectionist models: MacWhinney and Leinbach (1991), Plunkett
and Marchman (1991, 1993, 1996) obtained U-shaped learning with a
gradual increase in vocabulary, but performance on regular verbs also
decreases along with irregular verb performance, in contradiction
with Marcus’ data.
More criticisms by dual route theorists of specific assumptions of specific
models
But there are very few computationally comparable models of dual route theory, so
is the theory underspecified?
Should the simplifying assumptions of connectionist models be critical to the
points they make?
And what about the assumptions that dual route theorists make? Ex:
about the innate nature of the blocking mechanism
Led to new empirical studies of the frequency distribution of inputs and
outputs in morphological acquisition as well as models of inflectional
processes in other languages (German, Arabic, Hebrew), which have
different morphological properties and frequencies than English.
A theoretical assessment of Words and
Rules (Dual Route) theory
According to Pinker (2002):
• Contrasts with generative phonology: applying
rules to irregular forms by categorizing them into
phonological patterns will lead to too many
exceptions.
• More similar to lexicalist theories (e.g.
Jackendoff) that posit that morphological phenomena
are neither arbitrary lists nor fully productive
phenomena.
• It is not a connectionist system glued onto a rule
system (cf. Nakisa et al.), as lexical entries have
structured morphological, semantic etc.
properties that current connectionist models lack.
Dual Route theory does not
say (Pinker, 2002)


That there is literally a rule “to form the past tense
add –ed to the verb.” (Thus compatible with
constraint or construction based theories of
language)
That regular forms are never
stored; only that they do not have to be.
Such storage depends on word-, task- and
speaker-specific factors.


Regular forms that constitute doublets with irregulars
(dived/dove; dreamed/dreamt) must be stored to
escape blocking by the irregular.
Regular forms that resemble irregulars (blinked, glided)
must be stored to escape a partial blocking effect by
similar irregulars.
Support for Dual Route
Theory
Marcus et al. collected past tense forms of English from the CHILDES
database from 83 children 1-6 years of age. Findings: Children
overregularize rarely (4%). Concl: Overregularizations are performance errors
rather than signs of a qualitative grammatical reorganization.

Low frequency verbs tend to be overregularized more often than high
frequency verbs. Concl: Overregularization is a result of memory failure.

Verbs with a greater number of similar sounding irregular neighbours were
less likely to be overregularized.

Overregularization disappears gradually over time.

Onset of overregularization coincides with development of reliable regular
past tense marking.

Presence of similar sounding regular verbs does not make
overregularizations of irregulars more likely.

Cross linguistic study: On German plurals, both children and adults use –s
for novel words that sound unusual and for names.
Concl: Regular inflection can be generalized independently of frequency.

Empirical Evidence from Dual Route
Theorists’ Point of View

Generalization to Unusual Novel Words: People tend to
apply regular inflections to novel unusual words

Even connectionist models that can do so either implement
or presuppose a rule, e.g. not generating the full form but
activating local output units for past tense inflection only;
having extra mechanisms corresponding to an innate
mechanism
Onset and rate of overregularization errors in children do not
correlate with changes in the number and proportion of
regular verbs used by parents.
Regular inflections may form a minority class but be
generalized like English regulars in other languages.
Connectionist claims that the distribution of regulars over
phonological space is crucial (esp. not in specific clusters) do
not hold in languages like Hebrew, where speakers apply them
to unusual sounding and exocentric nouns.
Systematic Regularization




Some irregular words systematically take
regular forms.
Words and Rules theory says this is because
they lack a root in head position that can be
marked for the inflectional feature (tense or
number), and thus the regular suffix applies since
memory access is disabled.
Dinged; “I found three man’s on page 1”; a
couple of wolfs (wolfing down the food)
If an irregular sounding word changes in meaning
but retains a root in head position, it stays
irregular no matter how radical the change is:
straw men, beewolves, superwomen etc.
Dual Route Reply to Single
Route
Key issue is not gradedness in
behavioural data but whether
human language mechanisms are
combinatorial and sensitive to
grammatical structure and
categories. Rules can be acquired
gradually and apply probabilistically,
and thus can deal with gradedness.
Connectionist View of Two Approaches
Connectionist Reply to
“Words and Rules”

Connectionist models exploit quasi-regularity
(the tendency for an exception to exhibit aspects
of the regular pattern), since regulars and exceptions
are processed by the same mechanism; dual route theory
does not.




Cut, hit etc past tense identical
Bleed, breed ; past tense bled, bred
59% of 181 irregulars fall into one of the eight classes
defined in McClelland and Patterson (2002). Rest also
exhibit quasi-regularity except be and go.
Quasiregularity occurs in other domains such as
spelling-sound mapping; derivational
morphology.
Sudden Acquisition of Past
Tense


Marcus’ (dual route) claim: First
overregularization in each child’s corpus
indicates a moment of acquisition of the
past tense rule, and this is followed shortly by
a rapid increase in the inflection of regulars to
high levels.
Connectionists’ reply: Hoeffner’s
reevaluation of the same data gives a
more gradual and graded picture.
Uniformity with respect to
Phonology:


Dual route theorists’ claim: Rules apply
on categorical conditions
Connectionists’ reply: Prasada and
Pinker’s conclusion that there was no
effect of similarity of novel words to
known regulars was ill-founded, as their
stems were not of high phonological
acceptability. Regular past tense is
sensitive to phonological attributes of the
stem.
Uniformity with respect to
Semantics
Dual route theorists’ claim: word meaning
does not affect past tense tendencies for novel
(aka nonce) words.
Connectionists’ claim: It does.
Ramscar placed novel words
like frink into semantic contexts that
primed words like either drink
or blink, which elicited different past tense
formations, namely frank or frinked.

Frequency Effects

The use of irregularly inflected forms is strongly affected
by their frequency; and to the extent that regularly
inflected forms show frequency effects, these effects are
quite small.



Both dual and single route theories can explain this.
Distinguishing type frequency and token frequency: irregular
verbs are few in type but common as tokens.
Irregularization errors (incorrectly producing an irregular
form for a regular verb) are more likely for low
frequency regular verbs than for high frequency regular
verbs; also, the latency of correct responses for low frequency
regulars is longer if there is interference from similar sounding
irregulars.

Almor’s claim: this is not compatible with dual route theory, as
the theory predicts that the only regulars stored in the memory
system should be high frequency regulars.
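
The type/token distinction above can be made concrete with a small count. This is only a sketch on an invented mini-corpus; the verb list, the set of irregular stems and the function name are assumptions for illustration and do not reflect real corpus figures.

```python
from collections import Counter


def type_and_token_share(verb_tokens, irregular_stems):
    """Return (share of verb types, share of verb tokens) that are irregular."""
    counts = Counter(verb_tokens)
    irregular_types = sum(1 for verb in counts if verb in irregular_stems)
    irregular_tokens = sum(n for verb, n in counts.items() if verb in irregular_stems)
    return irregular_types / len(counts), irregular_tokens / sum(counts.values())


# Hypothetical mini-corpus: two very frequent irregulars, seven rare regulars.
corpus = ["go"] * 5 + ["be"] * 8 + ["walk", "jump", "play", "talk", "look", "call", "open"]
type_share, token_share = type_and_token_share(corpus, {"go", "be", "have"})
print(round(type_share, 2), round(token_share, 2))  # 0.22 0.65
```

In this toy corpus the irregulars make up under a quarter of the verb types but almost two thirds of the tokens, which is the pattern the slide describes.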
The Case for Minority
Defaults



Regular past tense in English applies to 86% of
the 1000 most common verbs.
The regular German past participle +t, the Arabic
sound plural, and the German –s plural have
been claimed by dual route theorists to be minority
defaults, thus strengthening the case for a dual
mechanism.
Connectionist claim: Empirical data show
otherwise for all three cases. For the –s plural,
although it is a minority, it does not apply
uniformly across contexts, hence it is not the
default.
Neurological Impairments
and Imaging


Double dissociations between having trouble with regulars vs
irregulars
Temporal and functional differences between processing of
regulars and irregulars



Dual interpretation: Grammar areas handle regular processing;
lexical semantics areas handle irregular processing (agrammatism vs
anomia)
Alternative dual interpretation (Ullman, 2001) Regular processing on
procedural memory, irregular on declarative memory
Connectionist models can also show selective impairments to regulars
and irregulars
• Irregulars depend more on semantics than phonology, whereas regulars
depend more on phonology; so damage to phonological
representations will affect regulars more (Joanisse and Seidenberg,
1999). Pinker claims the semantic representation in that model is effectively a
lexicon, with one unit dedicated to each word. More evidence against
connectionist modelling: anomic patients with no difficulty in accessing
word meanings still have difficulty with irregulars; and the models predict that
patient groups should have parallel tendencies to generalize regular and
irregular inflection to novel words, but there is a dissociation.
Double Dissociations

Connectionist Claim: Data reported by dual route
theorists on selective impairment are either
misinterpreted or experimentally biased, e.g.
Ullman’s study had word-final consonants twice
as long in regulars as in exceptions. This
increases phonological complexity; thus
impairment to phonological representation will
entail impairment to regular inflection (similar
prediction for developmental language
disorders). When phonological complexity is
matched, an advantage for irregulars no longer
remains (Bird et al.)
Against the Predictions of
Connectionist Models





It is not necessary or empirically correct to assume
overregularization is triggered by a sudden increase in regular
forms in the input.
No polysemous irregular roots tie regular forms to specific
meanings, e.g. *throwed up. Ramscar’s experiment is ill-founded.
Experimental evidence about –t participles and –s plurals in
German: e.g. controversies on counting for determining majority
Currently SLI (Specific Language Impairment) patients show no
difference in impairment for regulars vs irregulars. Language
impaired people are impaired with rules (hence unable to inflect
nonsense words) but can memorize common regular forms (hence the lack
of a deficit compared with irregulars). SLI is found to have no
relation with auditory perception.
Replication of aphasia studies showing that non-fluent aphasics have
more trouble with regular than irregular forms gave mixed
results; nor did it show that the difficulty is a side effect of phonological
complexity.
Comparative evaluation of Dual and
Single Route Strategies (Nakisa et al.)

For three different paradigms




German plurals
Arabic plurals
English past tense
Three different pattern associators





A nearest neighbour classifier: for a novel word, find the most
similar neighbour and adopt its inflection type.
Simplified Nosofsky Generalized Context Model: based on
probabilistic reasoning on classification.
Three layer feedforward network with backpropagation;
outputs corresponding to local units for different inflections.
Dual route models are implemented with definition of
“memory failure” in each model and an additional rule
mechanism: e.g. memory fails if the greatest output unit
activity is less than a threshold value in the neural network.
A phonology based representation was used in all simulations
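
The contrast between a single route nearest neighbour classifier and its dual route variant can be sketched as below. This is not the implementation of Nakisa et al.: spelling stands in for the phonological representation, difflib's SequenceMatcher ratio is an assumed similarity measure, and the threshold value, the tiny lexicon and the inflection-class labels are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Assumed stand-in for phonological similarity (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()


def nearest_neighbour(word, lexicon):
    """lexicon maps stem -> inflection class; return (class, similarity) of the closest stem."""
    best = max(lexicon, key=lambda stem: similarity(word, stem))
    return lexicon[best], similarity(word, best)


def single_route(word, lexicon):
    """One associative memory holds regulars and irregulars alike."""
    return nearest_neighbour(word, lexicon)[0]


def dual_route(word, irregular_lexicon, threshold=0.7, default="regular"):
    """Memory holds only irregulars; a weak best match counts as memory failure."""
    label, sim = nearest_neighbour(word, irregular_lexicon)
    return label if sim >= threshold else default


full_lexicon = {"sing": "i->a", "ring": "i->a", "walk": "regular", "talk": "regular"}
irregulars = {stem: cls for stem, cls in full_lexicon.items() if cls != "regular"}

print(single_route("spling", full_lexicon), dual_route("spling", irregulars))
print(single_route("balk", full_lexicon), dual_route("balk", irregulars))
```

The only difference between the two routes here is where the default comes from: the single route classifier must find a regular neighbour in memory, whereas the dual route classifier falls back on the rule whenever the best irregular match is below the threshold.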
Some Constraints
The associative memory of the dual
route classifier is trained with only
irregular forms.
Nearest neighbour algorithms cannot
deal with token frequencies, so these
are not accounted for in any of the
pattern associators.

Major findings



In nearly all simulations, single route classifiers
generalized more accurately than dual
route classifiers.
The sound of a word stem is a good predictor of the
inflection type the stem undergoes.
The failure to deal with Arabic depended on the
distribution of irregulars with respect to regulars.
Broken plurals (73% of type frequencies of the
data) were distant from other irregulars, and thus were
mistakenly regularized by the dual route system.
An ACT-R model of Past
Tense Learning


(Taatgen and Anderson, 2002) Showing U-shaped learning without direct feedback (internal
feedback is provided by execution times of
different strategies), with a realistic training
regime, i.e. gradual changes in vocabulary and
no unrealistically high rates of regular verbs; and
can deal with minority default rules. Uses rules
both for regular and irregular cases.
Interpreted as characterizing an underlying
connectionist system at a higher level of
analysis; with rules providing descriptive
summaries of the regularities captured in the
network’s connections.
Various Strategies Used





Retrieval Strategy: Produce a past tense by recalling an
example of inflecting the word from memory
Analogy: Recall an arbitrary example of past tense from
memory, and use it as a basis for analogy. Leads to
learning regular rule (takes some time to learn, and
overregularization occurs whenever retrieval fails in low
frequency verbs)
Zero strategy: Do no inflection at all.
The strategy with highest expected utility is applied with
highest probability.
Perception and generation alternate over the period of
simulation; 478 words based on the Marcus (1992) study.
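
The statement that the strategy with the highest expected utility is applied with the highest probability can be illustrated with a softmax choice rule. This is an assumed stand-in, not the ACT-R mechanism itself (ACT-R adds noise to utilities), and the utility values and temperature parameter are invented for the example.

```python
import math


def choice_probabilities(utilities, temperature=1.0):
    """Softmax over expected utilities: higher utility gives higher probability."""
    top = max(utilities.values())  # subtracted for numerical stability
    weights = {s: math.exp((u - top) / temperature) for s, u in utilities.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Hypothetical expected utilities for the three strategies on the slide.
utilities = {"retrieval": 2.0, "analogy": 1.0, "zero": 0.0}
probs = choice_probabilities(utilities)
for strategy, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{strategy}: {p:.2f}")
```

Lower-utility strategies are still chosen occasionally, so as the utilities change with experience the mix of retrieval, analogy and zero inflection shifts gradually rather than switching all at once.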
Comparison w. Dual Route
Account

It is not the case that the cognitive
system discovers that the regular
rule is an overgeneralization; rather,
it has not yet properly memorized
the exceptions. Dominance of
the irregular is a result of its greater
efficiency, not of an assumed
blocking mechanism.
Conclusion





Hot debate, with major implications for cognitive
architecture
Close scrutiny of methodologies and
interpretations of both experiments and corpus
based studies.
Computational and noncomputational models are
hard to compare. Dual route theorists have an
unfair advantage there.
Which level of description is one offering?
Generally a good example of what
computational cognitive models can lead to.
Lecture 7

Next Week: Sample Models in Cognitive
Neuropsychology
• Readings: Cohen and Servan-Schreiber, Context, Cortex and Dopamine;
Farah, Locality