Looking for a ‘gold standard’ to
measure language complexity:
What psycholinguistics and neurolinguistics
can (and can’t) offer to formal linguistics
Lise Menn & Jill Duffield
Linguistics Dept/Institute of Cognitive Science
University of Colorado, Boulder
Background for our stance
• Cross-linguistic work on basic morphosyntax in
aphasia, and on the earliest stages of child
phonology shows that these areas are loaded with
individual and with language-specific differences.
• ‘Markedness’ keeps vanishing into the mist of
unverifiability. It’s no guide to complexity.
• So, the issue of what’s simple and why, especially
in those domains, has been a constant undercurrent
and a frequent topic of discussion.
3/23/12
Grammatical Complexity Workshop
2
What we’re going to say – 1 to 4
1. What our brains find difficult is not always what grammars
consider complex, partly because what’s hard for our brains is not
constant; it depends on many factors. (It’s complicated.)
2. Proposed metrics for language or grammar complexity should
correspond in some way to the ‘gold standard’ of what’s hard for
our brains to process.
3. Language complexity measures will have to go beyond a single
measure of grammar complexity, because complexity for
speakers/writers is not the same as for hearers/readers.
4. Psycho- and neurolinguistics can’t provide a royal road to
measuring the complexity of a grammar or a language, but they do
provide tools to measure processing complexity of individual
sentences/utterances for speakers vs. listeners or readers, and
learners vs. skilled language users.
What we’re going to say – 5 to 8
5. Psycho- and neurolinguistic studies indicate that a valid measure
of complexity will have to integrate across many linguistic levels
(including semantics) and take frequency into account.
6. This implies that construction-based and usage-based
approaches to grammar can provide insights into how grammars
can come closer to reflecting what our brains do.
7. But: Complexity measures must handle competition and how it
gets resolved in both comprehension and production: the
paradigmatic axis also plays a role in complexity.
8. Pragmatics/real-world knowledge are involved in resolving this
competition. (Implications for practical applications are clear; for
comparison of languages, much less so.)
Problematic examples to start from
• Pourquoi l’aphasique peut-il dire: "Je ne peux pas le dire" et pas
"Elle ne peut pas la chanter''? (Nespoulous & Lecours 1989)
[Why can the aphasic person say I can’t say it but not She can’t
sing it ?]
– Possible culprits: lexical frequency, collocation frequency (formula status?),
emotional weight…
• Dressler’s (1991) work on Breton: A speaker with fluent aphasia
tends to name pictures or examples of a single object using the
plural form if the object itself is most frequently found in quantity
(leaves, potatoes); using the dual if the object is usually found in
pairs (eyes, hands).
– Goes against all notions of markedness. Relative frequency of particular
inflected form of particular word must be the explanation.
 What brains find difficult is not always what grammar and
linguists consider complex.
Why is there a problem?
1. What our brains find difficult is not always what
grammars consider complex, partly because
what’s hard for our brains is not constant; it
depends on many factors.
We cannot equate ‘simpler’ with what is
learned earlier, and the reason we
cannot do it is that the neural networks
change over the course of development.
Another example from aphasia
• Aphasic verb tense production errors are
often, as one might expect, substitutions of
present tense for past tense.
• But the reverse seems to be true for at least
some agrammatic aphasic speakers of Arabic
(Mimouni & Jarema, 1997), Polish (Jarema &
Kądzieława, 1990), and Korean (Halliwell,
2000)
What brains find difficult is not always what
grammars consider complex
• This is not just about speakers with brain damage. ‘Difficulty’ is
tied to particular circumstances.
• Production/comprehension asymmetry:
– most obvious case: ambiguity. Speakers, who know what
they intend to say, often produce utterances that are difficult
for hearers because of ambiguity in their referring
expressions (He did it!) and elsewhere
– Long history of studies of ambiguity resolution in
psycholinguistics that demonstrates & relies on the
processing difficulty caused by hearer’s or reader’s need to
resolve ambiguity on-line.
– Other studies showing speakers have to put effort into being
simple for their hearers (any teacher knows this!)
What brains find difficult is not always what
grammars consider complex
• Learning changes the brain (how could it fail to?),
creating a learner/skilled user asymmetry
• So, relative ‘simple-to-complex’ rankings must shift
as we learn our first languages (OT calls this
constraint (re-)ranking).
– Phonotactics provides many uncontroversial examples
• Blevins’ (1995) illustrations of syllable types: Spanish and Sedang
permit CCVC but not CVCC, while the reverse is true for Klamath and
Finnish.
• Japanese speakers struggle with English /tr/ but routinely produce
[tstʃ] (e.g., in the place name ‘Tsuchiura’)
Before we go on:
An essential distinction
• The ‘grammar of a language’ as an abstraction
across speakers – the ‘patterns out there’ to be
learned (E-language)
– The ‘grammar of a language’ as an abstraction across speakers isn’t
directly testable by psycholinguistics/neurolinguistics. If that grammar is
your main concern, what we have to say has to be mediated by your idea of
the relationship between the grammar of a language and the grammar of
each speaker.
• the grammar internal to a given speaker, which
should be that speaker’s internal approximation to the
‘patterns out there’ (I-language).
– This is what we’re concerned with in this presentation.
But focusing on speaker-internal
grammar is only a start
• Making this distinction can handle (some) differences
between learners (who have a cruder approximation
to that abstract grammar) and skilled users (who
have a better one).
• But there are more problems to deal with. One that
we’ll keep coming back to: If there’s only one internal
grammar, a single measure of its complexity can’t
handle the fact that what’s difficult for
comprehension (e.g. ambiguity, unclear reference)
is not necessarily difficult for production.
Back to our first three points, slightly elaborated:
1. What our brains find difficult is not always what grammars
consider complex, partly because what’s hard for our brains
is not constant and depends on many factors.
2. Proposed metrics for language or grammar complexity – at
least for speaker-internal language or grammar – should
correspond in some way to the empirical ‘gold standard’ of
what’s hard for our brains to process.
3. So language complexity measures that claim to be valid
metrics for what’s in human minds will also have to go
beyond a single measure of grammar or language
complexity.
It’s complicated.
And we still have the problem cases
we started with:
• Some aphasic people can say
I can’t say it.
but not
She can’t sing it.
• A Breton speaker with fluent aphasia tends to name pictures or
examples of a single object
using the plural form
if the object itself is most frequently found in quantity (leaves, potatoes)
using the dual form
if the object is usually found in pairs (eyes, hands).
What can (or can’t) psycholinguistics
& neurolinguistics offer?
What’s a ‘gold standard’?
Why do linguists need one for complexity?
• A rigorous standpoint, outside of particular
formalisms and levels of language, to inform
proposed measures of complexity
• Needed in order to test whether a proposed metric
corresponds to measures of what our brains find
effortful to process
…just as a proposed metric of color must correspond to
some psychophysical measure of human responses to color
if it’s going to be useful in accounting for perception. A metric
that is useful for calibrating printers may not do well at
accounting for what colors people find similar.
Computational measures of utterance complexity
need to be validated against processing measures –
i.e., measures of performance
• Validating a particular formal analysis of processing
(e.g., an analysis that can take the number of
competing antecedents for a relative pronoun into
account, an analysis that can take various aspects of
frequency into account) puts us into the domains of
psycholinguistics and neurolinguistics, as other
speakers have already made clear.
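A minimal sketch of what such a validation could look like, assuming we have a proposed complexity score and a mean reading time for each of a handful of sentences. The scores, the reading times, and the helper names `ranks` and `spearman` are hypothetical illustrations, not data or code from any study cited here. A rank correlation is used because it only asks whether the metric orders the sentences the same way the processing measure does.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return covariance / (var_x * var_y) ** 0.5


# Hypothetical values: one proposed complexity score and one mean
# reading time (ms) per sentence. Invented for illustration only.
metric_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
reading_times_ms = [310.0, 350.0, 340.0, 420.0, 480.0]

rho = spearman(metric_scores, reading_times_ms)
print(f"rank agreement between metric and reading times: {rho:.2f}")
```

A high correlation on one measure would not settle the matter: the same metric would need to be checked separately against comprehension and production measures, which may rank the same sentences differently.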
Quotes on taking a psycholinguistic approach
to grammatical complexity
…the emerging correlation between performance and grammars exists because
grammars have conventionalized the preferences of performance, in proportion to
their strength and in proportion to their number, as they apply to the relevant
structures in the relevant language types. (Hawkins 2004)
In order to test the hypothesis that typological distributions reflect processing
complexity, an independently motivated, well-defined, and empirically assessable
notion of processing difficulty is essential. (Jaeger & Tily 2010)
…not only [should] grammatical theorists …be interested in performance modeling,
but also …empirical facts about various aspects of performance can and should
inform the development of the theory of linguistic competence. (Sag & Wasow 2011)
…the competence-performance distinction acknowledges the value of the sort of
work linguists do in their day-to-day research, while recognizing that this work
eventually must be placed in a broader psychological context. (Jackendoff 2002)
Complexity measures must predict
processing effort for different levels, and
their interactions
Psycholinguistic experiments with normal speakers
Study of cognitive constraints and island effects (Hofmeister & Sag,
2010)
• Results: Island constraints interact with other features to affect
processing effort, correlating with grammaticality judgments
– WH-island violations are processed more easily when the extracted element
is complex (a WH-phrase)
• Which employee did Albert learn whether they dismissed after the
annual performance review?
processed more easily than
• Who did Albert learn whether they dismissed after the annual
performance review?
Measuring complexity for speakers:
Error rate as an example
Studies of formation of grammatical dependencies: even
subject-verb agreement can be complex!
• Producing subject-verb agreement: a “local noun”
embedded in a subject noun phrase may interfere
with the production of agreement - but structure
constrains that interference. (Bock & Cutting 1992)
– Attraction effect:
• The key [PP to the wooden cabinet]… IS / ARE…
• The key [PP to the wooden cabinets]… IS / ARE…
– No attraction effect:
• The key [RC that ___ opened the cabinet]… IS / ARE…
• The key [RC that ___ opened the cabinets]… IS / ARE…
And there’s a glitch: Complexity measures
must be compatible with different
performance metrics
• Speakers and listeners show different sensitivities to certain
structures in processing tasks
– While error data from Bock & Cutting (1992) showed that relative clauses
isolate information interfering with agreement for speakers while
prepositional phrase modifiers do not (i.e., a clause boundary effect),
Tanner (2012) uses ERP and reading times to show no interaction
between structure x local noun number x grammaticality (no clause
boundary effect)
– in production (the effect of number of the embedded noun differs across structure):
• The key [PP to the wooden cabinet(s)]…
• The key [RC that ___ opened the cabinet(s)]…
– in comprehension (the effect of number of the embedded noun does not differ across structure):
• The key [PP to the wooden cabinet(s)]…
• The key [RC that ___ opened the cabinet(s)]…
Glitch: Complexity measures must be compatible
with different performance metrics
• Speakers and listeners show different sensitivities to certain
structures in processing tasks
• Speaker-hearer asymmetries aren’t ‘just’ matters of discourse
ambiguity….
– which means that there can’t be just one measure of
complexity of a sentence – comprehension and production
may give different complexity rankings
Glitch: Different sensitivities to structures in
different processing tasks
Study of discourse and weight-based factors on relative clause
extraposition: Francis & Michaelis (2012) used two different tasks to
measure the combined effects of:
• Definiteness: e.g., (Some/The) research…
• VP length: e.g. (…was conducted/…has been conducted fairly
recently)
• RC length: e.g., (….that refutes the existing theories/…that
refutes the existing theories with very clear and convincing
evidence)
1. Judgment task: readers saw two versions of a relative clause sentence
(e.g., "Further research that indicates...." vs. "Further research has
been conducted...")
2. Elicited production task: speakers were given three constituents - a
subject NP, a relative clause, and a verb phrase, and asked to order
those constituents in a full sentence.
Glitch: Speakers and listeners may show different
sensitivities to structures in particular processing
tasks
• In both experiments, indefinite subjects (e.g., "Some research"
vs. "The research"), as opposed to definite ones, were more
likely to be used with extraposed relative clauses. BUT:
• In the judgment (comprehension) task, readers preferred the
extraposed version for longer relative clauses
• VP length, however, didn't matter
• In the production task, VP length did matter: shorter VPs were
more likely to predict extraposed relative clauses.
• Relative clause length, however, didn't matter in production.
No explanation for this particular pair of comprehension/production
discrepancies yet…
So: We don’t have a ‘royal road’ to offer;
how about a set of road-building tools?
What tools do we have to measure utterance
complexity?
• Complexity for the listener (comprehension)
– Neural correlates of relative effort: one current measure is
Event-Related Potential (ERP);
– Errors, reaction time and eye-tracking can also be used.
(Imaging tools, though glamorous, don’t have good enough
spatiotemporal resolution yet.)
• Complexity for the speaker (production)
– Error-based measures of relative effort comprise the
majority of production research.
– Reaction time measures and eye-tracking measures also
are useful.
Things you probably know about measuring
performance
• Available performance measures for both comprehension
and production can only look at one word, utterance, or short
passage at a time.
• Performance always has a large random element – minds
differ, and they are simultaneously busy with many things
besides the task set by the experimenter (wishing for coffee,
worrying about politics or the weather…)
• So most measures have to be averaged over fairly large
sets of similar items, and sometimes over speakers as well.
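A minimal sketch of this kind of averaging, assuming each trial records a speaker (subject), an item, a condition, and a reaction time. All values and the helper name `condition_means` are hypothetical; the point is only that noisy single trials are first averaged within each subject (or each item) and then across them.

```python
from collections import defaultdict
from statistics import mean

# (subject, item, condition, reaction_time_ms): hypothetical trials,
# invented for illustration only.
trials = [
    ("s1", "i1", "simple", 400), ("s1", "i2", "simple", 420),
    ("s1", "i3", "complex", 500), ("s1", "i4", "complex", 540),
    ("s2", "i1", "simple", 450), ("s2", "i2", "simple", 470),
    ("s2", "i3", "complex", 520), ("s2", "i4", "complex", 560),
]


def condition_means(trials, unit_index):
    """Average within each unit (0 = subject, 1 = item), then across units."""
    per_unit = defaultdict(list)
    for trial in trials:
        condition, unit, rt = trial[2], trial[unit_index], trial[3]
        per_unit[(condition, unit)].append(rt)
    per_condition = defaultdict(list)
    for (condition, _unit), rts in per_unit.items():
        per_condition[condition].append(mean(rts))
    return {condition: mean(values) for condition, values in per_condition.items()}


by_subject = condition_means(trials, unit_index=0)
by_item = condition_means(trials, unit_index=1)
print("by subject:", by_subject)
print("by item:   ", by_item)
```

With a balanced toy design like this one the two averages coincide; with missing trials or unequal numbers of items per condition they can diverge, which is one reason both are usually reported.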
Measuring complexity for listeners:
Event-Related Potential as an example
An ERP study of metaphor comprehension (Lai, Curran, & Menn 2009)
• Comprehending conventional and novel metaphors: Lexical
semantics affects comprehension effort when structure is
held constant.
• Sense/nonsense judgment task, comparing listeners’ ERPs for
these four semantic groups:
– Literal: Every soldier in the frontline was ATTACKED.
– Conventional: Every point in my argument was ATTACKED.
– Novel: Every second of our time was ATTACKED.
– Anomalous: Every drop of rain was ATTACKED.
(assignment of sentences to groups checked by naïve subject ratings)
Measuring complexity for listeners:
Event-Related Potential as an example
• Result: Conventional metaphors required a short burst of
additional processing effort when compared with literal
sentences. Novel metaphors required a more sustained effort,
similar to the effort observed in anomalous sentences.
– Literal: Every soldier in the frontline was ATTACKED.
– Conventional: Every point in my argument was ATTACKED.
– Novel: Every second of our time was ATTACKED.
– Anomalous: Every drop of rain was ATTACKED.
• Comprehension of metaphors involves an initial stage of
mapping from one concept to another; such mappings are
cognitively taxing, implying that complexity (as processing
effort) involves more than structure.
• ERP matches our intuitions about complexity – and elaborates
them.
Measuring complexity for speakers:
more about agreement errors
Studies of formation of grammatical dependencies: even
subject-verb agreement can be complex!
• Producing subject-verb agreement: a “local noun”
embedded in a subject noun phrase may interfere
with the production of agreement - but structure
constrains that interference. (Bock & Cutting 1992)
• The key [PP to the cabinet]… IS / ARE…
• The key [PP to the cabinets]… IS / ARE…
• The key [RC that ___ locks the cabinets]… IS / ARE…
Measuring complexity for speakers:
more about agreement errors
• Producing subject-verb agreement: “Local nouns”
that are embedded more deeply are less likely to
interfere with agreement production (Franck, Vigliocco &
Nicol, 2002)
• The threat [PP to the president [PP of the companies]]… IS / ARE…
• The threat [PP to the presidents [PP of the company]]… IS / ARE…
– so syntactic structure can directly affect production
complexity – and here, the more complex
structure has made one component of producing
this sentence easier!
Measuring complexity for speakers:
Other measures:
• Onset latencies – how long before speaker starts to
respond
– Show difficulty in processing (e.g., competition between verb
forms in subject-verb agreement: Haskell & MacDonald,
2003; Staub, 2009)
• Eye tracking
– Shows the interface between high-level message formulation
and sentence planning (Brown-Schmidt & Tanenhaus, 2006)
• Directed elicitation of alternate forms
– Provides a measure of accessibility (e.g., production of
optional complementizers allows speakers time to access
upcoming constituents: Ferreira & Firato, 2002)
So that was the fourth point:
• Psycho- and neurolinguistics can’t provide a royal
road to measuring the complexity of a grammar or a
language, but they do provide tools to measure
processing complexity of individual
sentences/utterances for speakers vs. listeners or
readers, and learners vs. skilled language users.
5. A valid measure of complexity will have to
integrate across many linguistic levels (including
semantics), and take frequency into account.
• What does that mean, and what kinds of data
support it? Let’s break it up into sub-claims
that we can examine one at a time.
What valid complexity measures would have to do:
5. Complexity measures must
predict processing effort for
multiple levels, and for
interactions among levels, taking
frequency into account
O’Grady: “… the interaction of simple elements and
phenomena can yield systems and effects of a
qualitatively different and more complex nature.”
First, let’s talk about processing effort for
different levels and their interactions
• A testable general complexity measure would have to
be able to make predictions about the effort for
processing short individual utterances or passages,
and correctly predict the relative effort needed.
• We’ve already seen that this effort depends partly on
the choice of individual lexical items within those
utterances or passages
– She can’t sing it; Every second of our time was attacked.
5. Complexity measures must predict
processing effort for different levels and
their interactions
• While linguists may analyze a particular utterance into several
components, the mind may store it and process it as a whole, or
as both whole and analyzed.
– Work on idiom blends suggests that idioms are both analyzable and stored
as lexical items (Cutting & Bock, 1997)
Observed speech error: “Help all you want”, blended from idioms/formulas
‘Help yourself!’ and ‘Take all you want!’
- idiom blends like this respect the internal structure of each component,
because their surface syntax remains well-formed, so the speaker must
have access to those internal structures.
• The ‘whole vs. analyzed’ opposition is much too crude, as many
linguists have argued on theoretical grounds (Culicover 1999,
among many others).
5. Complexity measures must predict
processing effort for different levels, and
their interactions
Start with some intuitive evidence for constructions
• Particular verbs may be specified or preferred by a particular
construction, such as mind if in the politeness formula Do/will [person
A] mind if [event X]? (Do you mind if I sit here?). Easy to process,
and not completely fixed lexically. (This construction is nested in more
general patterns, cf. Would your mother have a fit if I…)
• Conversely, when an unexpected word is used in a familiar
construction – especially if it evokes a different construction - it can
make the construction relatively harder to process (Would you care if I
sit here?).
– so ‘effort for different levels’ needs to include ‘mixed levels’, e.g.
structures with some lexical items specified
5. Complexity measures must predict
processing effort for different levels and
their interactions
Psycho- and neurolinguistics also tell us that the ‘whole vs. analyzed’
opposition is too crude. We even have to go beyond constructions and
talk about collocations.
• Collocations with high transitional probabilities, even when the
collocations don’t form constructions (e.g., subject + aux
collocations, he has or I am), are easier for speakers to produce.
(That’s why we can have contractions across the NP-VP boundary!)
• Smooth flow across this boundary is well exemplified in fluent aphasias,
e.g. French jargon aphasia (Lecours et al. 1981)
• Learning probabilities is a subconscious (procedural) and gradual
process. Expectations that A will be followed by B, or that A will occur
in structure α, become stronger over time, rather than clicking from
‘nothing’ to ‘all’.
‘Agrammatic’ aphasic speakers show
the effect of high sequential
probabilities
• … forgot the wash the dishes ‘forgot that she was washing the
dishes’
• … I like the go home. ‘I’d like to go home.’
Once they have chosen the definite article to follow forget or like,
these speakers are in trouble; both plug in familiar phrases (wash
the dishes, go home) with appropriate semantic content, but in
forms that cannot follow the. These utterances are difficult to
explain in grammatical terms, because they show the article being
substituted for the infinitive marker, and, even more strikingly,
because the collocation ‘V+the’ goes across the major syntactic
boundary between the verb and what should be the start of its NP
object.
5. Complexity measures must predict processing
effort...taking frequency into account
• We can’t escape dealing with usage and frequency.
• In the limit, there may be no empirically testable difference
between ‘being stored as a unit’ and having very strong,
predictable links from one sub-unit to another. So if your theory
doesn’t permit multi-level or other kinds of complex ‘units’
and/or doesn’t recognize collocations that aren’t constructions,
that’s not necessarily a problem.
• What you do need is – at least - a way to incorporate item
frequencies and transition probabilities into the
representation of a structure after the structure has its words
filled in (or during the process of getting them filled in).
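A minimal sketch of what “item frequencies and transition probabilities” could mean computationally, assuming a tiny toy corpus. The corpus and the function name `transition_probabilities` are hypothetical; real estimates would come from large corpora and might use longer contexts than a single preceding word.

```python
from collections import Counter


def transition_probabilities(sentences):
    """Estimate P(next word | current word) from raw bigram counts."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # Only words that are followed by another word count as contexts.
        context_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return {
        (first, second): count / context_counts[first]
        for (first, second), count in bigram_counts.items()
    }


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    "I can't say it",
    "I can't do it",
    "she can't sing it",
    "I am here",
]

probs = transition_probabilities(toy_corpus)
for pair in [("i", "can't"), ("i", "am"), ("can't", "say"), ("can't", "sing")]:
    print(pair, round(probs[pair], 2))
```

In this toy corpus “I” is followed by “can't” more often than by anything else, so the transition from “I” to “can't” gets a higher probability than the one from “I” to “am”, even though the pair straddles the subject-predicate boundary.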
5. Complexity measures must predict
processing effort for different levels, and
their interactions
We’ll shortly consider some more evidence for constructions, as structures
lying between the extremes of lexical items and full clauses.
• But: the complexity of deploying a construction is not determined by
construction frequency alone (e.g., Dutch word order in aphasia,
Bastiaanse, Bouma & Post 2009)
• For example, the interpretation or deployment of a construction,
such as the English subject or object cleft, may be made better or
worse by the existence of similar constructions (Dick et al.
2001:772): the ‘paradigmatic axis’ is relevant to processing complexity
– more on that soon.
5. Complexity measures must predict processing
effort for different levels, and their interactions,
taking frequency into account
• Purely structural complexity does affect normal and aphasic
language processing (e.g., Thompson & Shapiro 2007 showed that
practice on structurally more complex clauses generalizes to
improvement on less complex clauses, but practice on simple clauses
doesn’t generalize to more complex ones).
• But in general: processing effort is a function of the interaction
of structure and frequency at multiple levels.
• Let’s look at a psycholinguistic study of normal speakers and a
related one of aphasic speakers, which together demonstrate this.
5. Complexity measures must predict processing
effort for different levels, and their interactions –
taking frequency into account
• An example of frequency/structure interaction: Relative
verb-(subcategorization) frame frequencies create a bias
(reader’s expectation) that affects readers’ processing patterns
and comprehension.
‘Shrink’, for example, has a syntactic bias towards the undergoer-subject
argument structure – it is more frequently used in the
‘unaccusative’ frame than in any other.
The sweater shrank two sizes
They shrank the sweater two sizes
5. Complexity measures must predict processing
effort for different levels, and their interactions –
taking frequency into account
The sweater shrank two sizes
They shrank the sweater two sizes
• Eye movements during reading: verbs with similar syntactic
biases pattern together, whereas verbs that are similar only
in meaning do not (Garnsey et al. 1997).
• Comprehension: Clauses that conform to a verb’s bias
(They proposed X, They suggested that Y) are comprehended
faster and more accurately than bias-violating sentences
with the same structure
(They proposed that X, They suggested Y) (ibid.)
• Production studies supporting this: Gahl & Garnsey 2004,
2006.
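A minimal sketch of how a verb’s frame bias could be represented, assuming we have counts of how often each verb occurs in each frame. The counts, the frame labels, and the helper names `frame_bias` and `conforms_to_bias` are hypothetical and chosen only to mirror the shrink / propose / suggest examples; they are not figures from the studies cited.

```python
# Hypothetical frame counts per verb, invented for illustration only.
frame_counts = {
    "shrink": {"unaccusative": 60, "transitive": 40},
    "propose": {"direct_object": 75, "that_clause": 25},
    "suggest": {"direct_object": 20, "that_clause": 80},
}


def frame_bias(verb):
    """Return the verb's most frequent frame and its relative frequency."""
    counts = frame_counts[verb]
    total = sum(counts.values())
    preferred = max(counts, key=counts.get)
    return preferred, counts[preferred] / total


def conforms_to_bias(verb, frame):
    """True if the clause uses the verb in its most frequent frame."""
    return frame_bias(verb)[0] == frame


for verb, frame in [("propose", "direct_object"), ("propose", "that_clause"),
                    ("suggest", "that_clause"), ("suggest", "direct_object")]:
    label = "bias-conforming" if conforms_to_bias(verb, frame) else "bias-violating"
    print(f"{verb} + {frame}: {label}")
```

On this representation the same structure (say, a that-clause complement) comes out as bias-conforming with one verb and bias-violating with another, which is the structure-by-frequency interaction described above.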
5. …taking both structure and frequency into
account
• Gahl et al. 2003: People with aphasia comprehend
sentences better when the verb is in its preferred
frame.
• Clauses with unaccusatives (undergoer-subjects) can be
hard or easy, depending on how typical it is for the verb to be
used in the unaccusative construction.
• This supports a processing model that has relatively
direct semantic construal of the verb frames of simple
clauses, rather than indirect construal involving, e.g.,
traces.
The interaction of levels and the effect of
frequency on complexity also support our
sixth point:
6. Construction-based and usage-based approaches to
grammar can provide insights into how grammars can
come closer to reflecting what our brains do.
Finally, let’s look at evidence for our last two points
A valid measure of complexity will have to integrate across many linguistic levels,
and furthermore…
Complexity measures must be
compatible with different performance
metrics
There can be no single performance metric.
Complexity measures must be compatible with
different performance metrics
There can be no single performance metric. Why not?
• What is effortful for speakers does not always match what is
effortful for hearers
• Speakers and hearers have been shown to be sensitive to
different structural features of the same utterance types
• Therefore: The goal of an overall complexity measure needs to
be split into sub-goals; several roughly commensurable but
sometimes incompatible measures are required.
• Let’s look at some experimental evidence
Complexity measures must be compatible with
different performance metrics
Identifying referents--Production
Cognitive demands can swamp information that would improve referential
success (Wardlow Lane & Ferreira, 2008)
• Speakers are asked to name target
objects (in the common ground) so that
listeners can identify them, while being
faced with privileged objects of varying
saliency.
• Speakers could name the target object
by only using information that is
common ground (e.g., “the heart”) or by
also using privileged information (e.g.,
“the small heart”) that is useless and
possibly confusing to the hearer.
Complexity measures must be compatible with
different performance metrics
Identifying referents—Production experiment
Cognitive demands swamp information that would improve referential
success (Wardlow Lane & Ferreira, 2008)
• Results: When the saliency of
privileged information was increased,
speakers made more reference to it,
(e.g., identifying the target object as “the
small heart”) than when such
information was less salient, despite the
risk of confusing the listener.
• In this task, cognitive demands result in
a lower processing load for the speaker
when producing more (complex)
descriptions.
Complexity measures must be compatible with
different performance metrics
Identifying referents—Comprehension experiment using eye-tracking
(following the listener’s successive visual fixations)
Visual context affects ambiguity resolution (Spivey, Tanenhaus,
Eberhard & Sedivy, 2002)
Four conditions: Listeners heard either
Put the apple on the towel in the box
or
Put the apple that’s on the towel in the box
Sometimes what they saw was this:
A. ‘on the towel’ is redundant: there’s only one apple
And sometimes what they saw was this:
B. ‘on the towel’ is relevant: there are two apples
Complexity measures must be compatible with
different performance metrics
Put the apple on the towel in the box.
• Listeners’ eye movements were recorded to see whether they treated the
PP modifier (e.g., “…on the towel”) in the instruction as a goal, in which
case they would look to the empty towel, or as a modifier, in which case
they would look to the apple on the towel, and not to the empty towel.
A. redundant info: one apple
B. relevant info: two apples
Complexity measures must be compatible with
different performance metrics
Participants heard instructions in one of these two forms:
- temporarily ambiguous: Put the apple on the towel in the box or
- unambiguous (control): Put the apple that’s on the towel in the box
• Results: In the one-referent condition (A), listeners were more likely to
look towards the empty towel – that is, to treat the PP on the towel in
this temporarily ambiguous instruction as a goal – than in the two-referent
condition (B).
A. redundant source info: one apple
B. useful source info: two apples
Complexity measures must be compatible with
different performance metrics
• Results - 2: In the two-referent condition (B), there was no difference
between the responses to the temporarily ambiguous Put the apple
on the towel in the box and the unambiguous Put the apple that’s on
the towel in the box. Participants looked to the correct apple, and then to
the correct goal (the box), but not to the false goal (the towel).
A. redundant source info: one apple
B. useful source info: two apples
Complexity measures must be compatible with
different performance metrics
• So the visual context – the second apple – in display (B) resulted in a
lower processing load for the listeners who heard the temporarily
ambiguous (garden-path) Put the apple on the towel in the box
instruction.
• The structural ambiguity (does ‘on the towel’ modify the apple or
describe what is to be done with it?) apparently added no
complexity/processing load for them.
Complexity measures must be compatible with
different performance metrics
• So we’ve seen that both speaker and hearer are
influenced by their environments.
• Speakers may produce something that’s more
complex because it’s easier to say in the context,
even though that context can make the utterance
more difficult for the listener. (Wardlow Lane &
Ferreira, 2008)
• Listeners can easily comprehend something that’s
more complex by making use of the visual context.
(Spivey et al., 2002)
What valid complexity measures would have to do:
7. Complexity measures must handle
competition and how it gets resolved
in both comprehension and production
7. Complexity measures must handle
competition and how it gets resolved in both
comprehension and production
Processing is strongly influenced by competition – normal and aphasic
speakers have both substitution errors and blends.
• Competing lexical items (substitutions):
– addressing a child using the name of their sibling
– tip-of-the-tongue (Bhatnagar – Baharav – Bhuvana!)
– aphasic word search: my right side was blunt- lumptrip- slip
• Competing members of an inflectional paradigm
• Competing interpretations (visiting relatives…, garden paths)
• Competing possible continuations of a partially-produced structure
7. Complexity measures must handle
competition and how it gets resolved in both
comprehension and production
• English-speaking aphasic speaker struggles: ‘And the boy give to a
cookie—the boy give to girl a cookie,’ apparently torn between
double object and prepositional constructions.
• More errors of article choice among people with agrammatic
aphasia in German than in Italian. A German speaker struggles to
find the right form: “die...der...das...die...den..den Hund”
(Bates, Wulfeck, & MacWhinney 1991)
• More verb inflection errors among people with agrammatic aphasia
in languages with richer verb paradigms.
• So: the ‘paradigmatic axis’, as well as the structural ‘syntagmatic
axis’, affects the complexity of deploying a given structure.
What valid complexity measures would have to accommodate:
8. Pragmatics/real-world knowledge
are involved in resolving this
competition
8. Pragmatics/real-world knowledge are
involved in resolving this competition
Pragmatics has an enormous effect on processing
• We already saw this in the metaphor processing study
• Other speakers here have also made this point
Let’s look at a clear, reasonably controlled example…
What’s this?
[Three pictures, shown one per slide: a table with a lamp; a bed with a pillow at the wrong end; an armchair with a footstool behind it]
8. Pragmatics/real-world knowledge are
involved in resolving this competition
Competing perspectives add to the difficulty of choosing among
truth-value-equivalent constructions
• There’s a table with a lamp or A lamp on a table – we’re
rarely tempted to say There’s a table under the lamp!
• There’s a bed with a pillow at the wrong end or
A pillow at the foot of a bed – we don’t hesitate
to choose the larger object as the location.
• In the third picture, we’re slowed up by the four-way competition
among possible orders of mention and choice of location:
– The footstool is behind the armchair
– The armchair has a footstool behind it
– There’s a footstool with an armchair in front of it
– There’s an armchair in front of a footstool.
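The four-way competition in the third picture can be read as a 2 × 2 crossing: which object is mentioned first, and which object is chosen as the location. A minimal sketch of that crossing follows; the variable names and the way the four sentences are indexed are illustrative labels, not terms from the slides.

```python
from itertools import product

objects = ("footstool", "armchair")

# The four descriptions from the slide, indexed by
# (object mentioned first, object chosen as the location).
# This indexing is an assumption made for illustration.
sentences = {
    ("footstool", "armchair"): "The footstool is behind the armchair",
    ("armchair", "armchair"): "The armchair has a footstool behind it",
    ("footstool", "footstool"): "There's a footstool with an armchair in front of it",
    ("armchair", "footstool"): "There's an armchair in front of a footstool",
}

# Crossing the two binary choices yields exactly the four competitors.
options = list(product(objects, repeat=2))

for first_mention, location in options:
    print(f"first={first_mention:9s} location={location:9s} -> "
          f"{sentences[(first_mention, location)]}")
```

For the lamp and pillow pictures, one cell of this grid is strongly preferred (the larger object is the location), so there is little competition; for the footstool and armchair, neither choice is settled in advance.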
8. Pragmatics/real-world knowledge are
involved in resolving this competition
• Performance data like this lead us to conceptualize
complexity as requiring an integration of fluctuating
levels of competition among different types of
structures, varying from moment to moment as well
as from speaker to speaker.
Wrapping up
To evaluate complexity measures for speaker-internal grammar,
we need:
• Studies with systematic variation of structures and
lexical items that would let us zero in on how
construction frequency, lexical frequency, and other
sources of processing difficulty interact.
• These studies should be carried out with normal speakers using
ERP or other sensitive measures of processing
load, because the number of linguistic variables is so
high that a good design would impose serious
burdens on a speaker with aphasia.
Implications for a
valid complexity measure
• Point 1 – the need to work across unit sizes/levels and to predict the
interaction of frequency and structure (including transition probabilities) –
is an argument for framing a processing account in terms of a
formalization of grammar that takes surface structure (constructions)
and usage into account.
• Point 2 – compatibility with different performance metrics – implies that
the goal of creating a formal processing complexity metric needs
to be split into sub-goals for comprehension, production, and
learning.
• Point 3 – that competition adds to complexity – implies that the
‘paradigmatic axis’, as well as the structural ‘syntagmatic axis’, affects
the complexity of a given structure.
• Point 4 – the need to deal with the impact of real-world situations on
processing – suggests that any formal metric, even one that satisfies
the above requirements, will be incomplete. But that doesn’t mean it
won’t be interesting and useful!
How might a formal system handle
this?
• The construction grammar approach treats language as an
inventory of constructions, both lexical and combinatoric. Constructions
are abstract entities that are the loci of constraints on the interface
of form & meaning (Sag, Boas, & Kay 2012).
• Each construction has specific information about its properties;
we don’t assume generalizations across utterances simply
because of structural similarity. For example, there are unique
properties of various filler-gap constructions (Sag, 2010). By
using a type hierarchy, we can represent different grains of
constructions.
How might a formal system handle
this?
• Furthermore, constructions contain more than just syntactic
information: they contain semantic and pragmatic information as well,
thus formalizing these elements.
• These features may allow us to deal with the interaction of different
levels (morphological, lexical, syntactic, semantic & pragmatic),
although the approach does not directly address the issue of frequency and
transition probabilities.
• The unique specification of construction properties organized in a type
hierarchy could allow us to predict certain types of competition. For
example, if two constructions are specified for the same lexical item,
argument structure, or co-occurrence restrictions, we would expect
there to be competition between those two constructions, although the
competition itself is not formally represented.
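As a rough illustration of how shared specifications in a type hierarchy might be used to predict competition, the sketch below stores a few constructions with inherited features and flags pairs that share a value. The class `Construction`, the feature names, and the toy inventory (two 'give' constructions and one unrelated one) are assumptions made for this example only; they are not taken from Sag, Boas, & Kay (2012) or from any existing construction grammar implementation.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Construction:
    """Toy construction: a name, an optional supertype, and two feature specs."""
    name: str
    parent: Optional["Construction"] = None
    lexical_item: Optional[str] = None           # e.g. the verb it is specified for
    argument_structure: Optional[tuple] = None   # e.g. ("agent", "recipient", "theme")

    def inherited(self, feature: str):
        """Look a feature up locally, then upward through the type hierarchy."""
        node = self
        while node is not None:
            value = getattr(node, feature)
            if value is not None:
                return value
            node = node.parent
        return None


def shared_specs(a, b):
    """Return the features on which two constructions carry the same value."""
    features = ("lexical_item", "argument_structure")
    return [f for f in features
            if a.inherited(f) is not None and a.inherited(f) == b.inherited(f)]


def predicted_competitors(constructions):
    """Map each pair sharing at least one specification to the shared features."""
    return {(a.name, b.name): shared_specs(a, b)
            for a, b in combinations(constructions, 2)
            if shared_specs(a, b)}


# Hypothetical mini-inventory: two ways of expressing transfer with 'give'.
transfer = Construction("transfer",
                        argument_structure=("agent", "recipient", "theme"))
double_object = Construction("give-double-object", parent=transfer,
                             lexical_item="give")
prepositional = Construction("give-prepositional", parent=transfer,
                             lexical_item="give")
intransitive = Construction("sleep-intransitive", lexical_item="sleep",
                            argument_structure=("agent",))

competitors = predicted_competitors([double_object, prepositional, intransitive])
print(competitors)
```

In this toy inventory only the double-object and prepositional 'give' constructions are flagged, which mirrors the aphasic speaker's struggle between those two frames. The overlap check only predicts where competition could arise; the competition itself, along with frequency and transition probabilities, would still have to be modeled separately.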
Thank you!
Special thanks to Laura Michaelis-Cummings for steering us to
good readings and giving feedback.