Convention, Metaphors, and Similes


Elliptical Arguments
Patrick Hanks
Institute of Formal and Applied Linguistics,
Charles University in Prague, Czech Republic
***
1
Outline of the talk
• A task for e-lexicographers: identifying
syntagmatic patterns (or constructions) in corpora,
and establishing what they mean.
– Patterns can include quite a lot of variation.
• We also need a lexically driven theory of
language: accounting for rules governing the
normal and abnormal uses of words
• Abnormal uses cause problems for lexical analysts
• This presentation will discuss two such problems.
• Conclude with an on-line demo.
2
Corpus Pattern Analysis (CPA)
• The lexicographical task is to establish how words are
used, not just what they mean
– Such an investigation must be based on corpus analysis,
not guesswork and imagination
– Invented examples have a tendency to distort.
• BUT authenticity alone is not enough
– Bizarre authentic examples also distort, e.g.:
– “I hazarded various Stuartesque destinations like
Florida, Bali, Crete and Western Turkey.” – J. Barnes
– “Always vacuum your moose from the snout up.” –
Massachusetts Journal of Taxidermy, 1986
3
The need for patterns
• We need to establish, through painstaking corpus analysis,
the patterns of usage that are associated with each word.
• And we need a reliable theoretical base:
• Some mixture of components such as
– Herbst et al. 2004. Valency Dictionary of English.
– Fillmore et al.: FrameNet
– Miller, Fellbaum: WordNet
– Pustejovsky. 1995: The Generative Lexicon.
• Different patterns of usage around a lexeme activate
different meanings.
• We need to distinguish patterns from abnormal, innovative
linguistic behaviour.
4
Empirical recognition of
patterns
• When you first open a concordance, patterns leap out at
you.
– Collocations make patterns: one word goes with another
– To see how words make meanings, we need to analyse collocations
• The more you look, the more patterns you see.
BUT
• When you try to formalize the patterns, you start to see
more and more exceptions.
• The boundaries are fuzzy and there are many outlying
cases.
• Speakers and writers exploit the norms of language.
5
The linguistic ‘double-helix’
hypothesis
• A language is a system of rule-governed behaviour.
• Not one, but TWO (interlinked) sets of rules:
1. Rules governing the normal uses of words to make
meanings
2. Rules governing the exploitation of norms
6
What is a pattern?
• The verb is the pivot of the clause.
• A pattern is a statement of the clause structure
(valency) associated with a meaning of a verb,
– together with typical semantic values of each
argument, realized by salient collocates
• Different semantic values of arguments activate
different meanings of each verb.
7
Patterns are contrastive
fire, verb
1. [[Human]] fire [[Firearm]] (at [[Phys Obj = Target]])
2. [[Human]] fire [[Projectile]] (from [[Firearm]]) (at [[Phys
Obj = Target]])
3. [[Human 1]] fire [[Human 2]]
4. [[Anything]] fire [[Human]] {with enthusiasm}
5. [[Human]] fire [NO OBJ] ....
• Etc.
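One way to picture how contrasting patterns select different meanings is the minimal Python sketch below. It is not the CPA tooling: the `Pattern` class, the tiny `TYPE_OF` lexicon, and the sense glosses are all hypothetical, invented here for illustration. Only the subject and object types of patterns 1 to 3 are encoded, and optional adverbials are ignored.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon mapping nouns to semantic types (an assumption,
# not data from the CPA project).
TYPE_OF = {
    "soldier": "Human",
    "manager": "Human",
    "clerk": "Human",
    "rifle": "Firearm",
    "bullet": "Projectile",
}

@dataclass(frozen=True)
class Pattern:
    number: int
    subject: str
    obj: str
    gloss: str  # illustrative gloss, not taken from the talk

# Patterns 1-3 for "fire" from the slide, reduced to subject and object types.
FIRE_PATTERNS = [
    Pattern(1, "Human", "Firearm", "discharge a weapon"),
    Pattern(2, "Human", "Projectile", "shoot a projectile"),
    Pattern(3, "Human", "Human", "dismiss from employment"),
]

def match_fire(subject: str, obj: str) -> Optional[Pattern]:
    """Return the first pattern whose semantic types fit both arguments."""
    s_type = TYPE_OF.get(subject)
    o_type = TYPE_OF.get(obj)
    for pattern in FIRE_PATTERNS:
        if pattern.subject == s_type and pattern.obj == o_type:
            return pattern
    return None  # no norm matched: a possible exploitation, or a lexicon gap

if __name__ == "__main__":
    for subj, obj in [("soldier", "rifle"), ("soldier", "bullet"), ("manager", "clerk")]:
        hit = match_fire(subj, obj)
        print(subj, "fire", obj, "->", hit.number if hit else None)
```

The same verb with the same subject type is assigned to a different pattern purely on the semantic type of its object, which is the contrast the slide illustrates.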
8
Semantic Types and Ontology
• Items in double square brackets are semantic
types.
• Semantic types are being gathered together into a
shallow ontology.
– (This is work in progress in the current CPA project)
– Preliminary outline in Pustejovsky, Rumshisky, and
Hanks 2004
• Each type in the ontology will (eventually) be
populated with a set of lexical items on the basis
of what’s in the corpus under each relevant
pattern.
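The idea of populating each type from the corpus can be sketched as follows. This is a hedged Python illustration, not the project's actual procedure: the `ANNOTATED_HITS` records and the `populate` function are hypothetical, and the example hits are invented stand-ins for concordance lines that an analyst has already assigned to a pattern.

```python
from collections import defaultdict

# Hypothetical, hand-made records:
# (semantic type of the pattern slot, head noun observed in that slot).
ANNOTATED_HITS = [
    ("Firearm", "gun"),
    ("Firearm", "rifle"),
    ("Projectile", "bullet"),
    ("Firearm", "gun"),        # repeated collocate; stored only once
    ("Projectile", "missile"),
]

def populate(hits):
    """Group the lexical items observed in each slot under that slot's semantic type."""
    ontology = defaultdict(set)
    for semantic_type, noun in hits:
        ontology[semantic_type].add(noun)
    # Sort for stable, readable output.
    return {t: sorted(nouns) for t, nouns in ontology.items()}

if __name__ == "__main__":
    for semantic_type, nouns in populate(ANNOTATED_HITS).items():
        print(semantic_type, nouns)
```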
9
Exploitations
• People exploit the rules of normal usage for
various purposes:
• For economy and speed:
– Conversation is quick
– Listeners (and readers) get bored easily
– Words that are ‘obvious’ can sometimes be omitted
• To say new things (reporting discoveries,
registering patents, ...)
• To say old things in new ways
– For rhetoric, humour, poetry, politics …
10
Anomalous collocates
exploit norms
• “… a brick arrived through my living room window.” —
(BNC) M. Grist, 1993. Life at the tip.
– Normally, people (travellers) and vehicles arrive – not bricks.
• Whatever the intention, rehabilitation does punish
people; in particular, it allows people to be put into
institutions where they would rather not be. —(BNC) Bob
Roshier, 1989. Controlling Crime.
– Normally, people punish people – not procedures such as
rehabilitation.
11
The null object alternation
• Earlier in this talk, I said:
– “Invented examples have a tendency to distort;
Bizarre authentic examples also distort.”
• Someone might ask, “distort what?”
• But when I said this, I assumed you know what
such examples distort – common knowledge
between us – so I don’t need to say it.
• Omitting – eliding – ‘unnecessary’ words is a very
common pattern of linguistic behavior.
12
Ellipsis
• Absence of an expected collocate is a type of
exploitation.
– The police fired [[]] into the crowd.
– The police fired rubber bullets [[]].
– He gave the order and they fired [[]] [[]].
• The valency pattern of this sense of fire, v.,
requires SUBJECT, OBJECT, and ADVERBIAL:
– [[Human]] fire [[Projectile]] [Adv[Direction]]
• Correct description of valency requires syntactic
analysis and semantic typing of arguments.
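The three elliptical examples can be checked against the pattern mechanically. The sketch below is a hypothetical Python illustration: the slot names and the `elided_slots` helper are assumptions introduced here, and the argument analysis of each sentence is supplied by hand rather than by a parser.

```python
# Slots required by this sense of "fire", following the slide:
# [[Human]] fire [[Projectile]] [Adv[Direction]]
REQUIRED_SLOTS = ("subject", "object", "adverbial")

def elided_slots(realized):
    """Return the required slots that the clause leaves unexpressed."""
    return [slot for slot in REQUIRED_SLOTS if realized.get(slot) is None]

# Hand-analysed versions of the three examples (the analysis is an assumption).
EXAMPLES = {
    "The police fired into the crowd.":
        {"subject": "the police", "object": None, "adverbial": "into the crowd"},
    "The police fired rubber bullets.":
        {"subject": "the police", "object": "rubber bullets", "adverbial": None},
    "He gave the order and they fired.":
        {"subject": "they", "object": None, "adverbial": None},
}

if __name__ == "__main__":
    for sentence, args in EXAMPLES.items():
        print(sentence, "-> elided:", elided_slots(args))
```

Treating a missing slot as an elided argument of the full pattern, rather than as evidence for a separate intransitive sense, keeps all three sentences under one meaning of the verb.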
13
Ellipsis and ambiguity
Corpus example: Later that morning he changed.
– What is the meaning of change here?
a? At breakfast he was still wearing a black tie and crumpled
dinner jacket from the night before. Later that morning he
changed.
b? At breakfast he greeted us with a cheerful grin and seemed
not to have a care in the world. Later that morning he
changed.
c? He got on at Köln thinking that it was a through train to
Berlin, but the ticket inspector told him that it would
terminate at Hannover. Later that morning he changed.
14
Only primary norms are exploited
by elision (?)
• Many small farmers, unable to cultivate successfully,
turned to the sale or renting of land.
– BUT NOT: *He had many friends in America but in
England he was unable to cultivate successfully.
• We punish too much—and … we imprison too much.
– BUT NOT:
• He offered one to the Englishman, who declined.
– “Whatever is reported as having been declined has already been
named, mentioned, or indicated with sufficient clarity; so that the
reader, arriving at the word declined, need be in no doubt about
what would be a suitable object or infinitive clause.”
–Sinclair (1991)
15
Types and Qualia in CPA
• The apparatus needed for analysing nouns is
different from that needed for verbs
– Plug and socket
• Verbs need event typing and argument structure
• Nouns need analysis of their qualia structure
[Pustejovsky’s term]:
– What sort of thing is it?
– What’s it for?
– What properties does it have?
AND their semantic prosody: is it good or bad? (and if so, for
whom?)
AND their verb preferences
16
Each argument of each verb
is a complex lcp
• [[Event | Human]] calm [[Animate]]
– calm a hysterical patient
– calm the horses
– But can you *calm a cockroach?
• Not part of the lcp for “calm [[Animate]]” – not a norm
– Calm [POSDET] {nerves | anxiety} [= properties of
[[Animate]]]
– Calm a riot [= behaviour of [[Animate]] ]
– Calm the market [[= Location = Activity in Location =
Human Group Acting in Location]]
17
Semantic types and
semantic roles
• sentence, v.
• PATTERN: [[Human 1 = Judge]] sentence [[Human 2 =
Convicted Criminal]] to [[{Time Period | Event} =
Punishment]]
• IMPLICATURE: [[Human 1]]
• SECONDARY IMPLICATURE: [[Time Period]] is a jail
sentence
• EXAMPLE: Mr Woods sentenced Bailey to 7 years.
Note that the implicature is “anchored” to the pattern.
18
ON-LINE DEMO (?)
• http://nlp.fi.muni.cz/projects/cpa
• Choose Web Access
• Log-in: guest
• Password: guest
19
Shimmering lexical sets
• Lexical sets are not stable – not “all and only”.
• Example from Hanks and Jezek (2008):
– [[Human]] attend [[Event]]
– [[Event]] = meeting, wedding, funeral, etc.
– But not all events: not thunderstorm, suicide.
– and not only events: attend school, attend a clinic
• Contrast with another pattern for attend:
– [[Human 1]] attend [[Human 2 = High Status]]
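A small Python sketch can make the "not all and only" point concrete. Everything named here is hypothetical: the `TYPE_OF` table stands in for an ontology (the label "Institution" is an assumption, not a type given in the talk), and `ATTEND_OBJECTS` stands in for the collocates an analyst would record under the pattern. The nouns themselves are the ones listed on the slide.

```python
# Hypothetical ontology fragment: the semantic type assigned to each noun.
TYPE_OF = {
    "meeting": "Event", "wedding": "Event", "funeral": "Event",
    "thunderstorm": "Event", "suicide": "Event",
    "school": "Institution", "clinic": "Institution",
}

# Collocates recorded for the object slot of "[[Human]] attend [[Event]]".
ATTEND_OBJECTS = {"meeting", "wedding", "funeral", "school", "clinic"}

def events_not_attended():
    """Nouns typed Event that are not normal objects of attend ('not all')."""
    return sorted(n for n, t in TYPE_OF.items()
                  if t == "Event" and n not in ATTEND_OBJECTS)

def attended_non_events():
    """Normal objects of attend that are not typed Event ('not only')."""
    return sorted(n for n in ATTEND_OBJECTS if TYPE_OF.get(n) != "Event")

if __name__ == "__main__":
    print("Events that are not attended:", events_not_attended())
    print("Attended things that are not events:", attended_non_events())
```

Because neither list is empty, the slot filler cannot be defined by the ontology type alone; the lexical set has to be recorded per pattern.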
20
Meanings and boundaries
• Boundaries of all linguistic and lexical categories
are fuzzy.
– There are many borderline cases.
• Instead of fussing about boundaries, we should
focus on identifying prototypes
• Then we can decide what goes with what
– Many decisions will be obvious.
– Some decisions – especially about boundary cases –
will be arbitrary.
21
The Idiom Principle (Sinclair)
• In word use, there is tension between the
“terminological tendency” and the
“phraseological tendency”:
– The terminological tendency: the tendency for words
to have meaning in isolation
– The phraseological tendency: the tendency for the
meaning of a word to be activated by the context in
which it is used.
22
Current work in progress
• Hanks (forthcoming): Lexical Analysis: Norms and
Exploitations. MIT Press
– A corpus-driven, lexically based theory of meaning in
language
• Linked to PDEV (A Pattern Dictionary of English Verbs)
by CPA (Corpus Pattern Analysis)
– A basic infrastructure resource
– 468 verbs analyzed and released, freely available
– http://nlp.fi.muni.cz/projects/cpa
– Experiments with automating the analytical procedure
and applying the results for NLP (IR, MT, …) and
language teaching (lexical syllabus design)
– Building a shallow ontology is in progress
23
Semantic Frames: FrameNet
• “Word Meanings must be described in relation to
semantic frames—schematic representations of the
conceptual structures and patterns of beliefs,
practices, institutions, images, etc., that provide a
foundation for meaningful interaction in a given
speech community.”
—Fillmore et al. in International Journal of
Lexicography 16 (3): p. 235
24
FrameNet and Valency
• “Syntactic valence information is usually specified in
terms of the phrase type of the possible complements, and
in terms of the grammatical functions … expressed in
terms of subcategorization frames.” – ibid, p. 236
• SOME PROBLEMS WITH THIS:
– Aiming at all possible complementation frames of a
verb may be too ambitious
– Better to aim at all normal complementation frames
– In a slot-and-filler grammatical model (Halliday), not a
generative model
– “Subcategorization” carries theoretical assumptions that
may be incompatible with empirical data analysis
25
A methodological problem?
“a. look at examples of one particular word, [How many? How
chosen?]
b. for each frame element that occurs with that word, look for
other words with similar meanings that also take that kind of
complement,
c. notice which complement types cluster together with groups of
meaning-sharing words,
d. given two types of complement that both occur with the target
word, if one complement regularly occurs with one group of
related words, and the other with a different group …, this is
strong evidence for a sense distinction (based on a frame
distinction).”
—Atkins et al. in IJL 16 (3): p. 255
QUESTION: Does (should?) FrameNet proceed frame by frame?
Or verb by verb? Or both at the same time?
26
Thanks
• The late John Sinclair & colleagues (Cobuild project)
• Bob Taylor, Marie-Claire van Leunen & the late Digital
Equipment Corporation Systems Research Center in Palo
Alto (Hector project)
• James Pustejovsky, Anna Rumshisky, & Brandeis U.
• Masaryk U., Brno & Karel Pala, Pavel Rychly, and Adam
Rambousek
• Institute of Formal and Applied Linguistics, Charles U.,
Prague, & Jan Hajic, Martin Holub
• Various Czech agencies for funding
• You, for listening
27