Discourse Analysis
David M. Cassel
Natural Language Processing
Villanova University
April 21st, 2005
Discourse Analysis
- Discourse: collocated, related groups of sentences (from the book)
April 2005
Discourse Analysis
David M. Cassel
Discourse Analysis
- Discourse Model -- a model to represent the entities mentioned in the discourse
- Coreference or Anaphora Resolution -- determining which entity a referring expression refers to
- Coherence -- modeling the logical flow of the discourse
The book also discusses psycholinguistic studies of reference and coherence.

Anaphora Resolution
Before the game, manager Charlie Manuel said Gavin Floyd's performance
would not affect whether he remains with the team when Vicente Padilla comes
off the disabled list Tuesday.
Then Floyd went out and had a nightmarish first inning: four walks, one wild
pitch, one hit, four runs.
After the game, Manuel said Floyd's disastrous outing had not changed his
mind. The righthander will remain with the club and be used in relief.
"The pitcher we saw in St. Louis is a pitcher who has the ability to be a very
good major-league pitcher," he said. "He didn't have command of his fastball
and couldn't get his breaking ball over tonight... . Maybe the cold was affecting
his breaking ball, because he was bouncing a lot of them."
-- Sam Carchidi, Philadelphia Inquirer, 4/16/05
Discourse Model
[Figure: discourse model for the passage above. Entities: Charlie Manuel, Vicente Padilla, Gavin Floyd. Relations: evoke (introduce), refer, corefer. Referring expressions for Gavin Floyd: Floyd, he, his, The righthander, The pitcher we saw in St. Louis. Adapted from Figure 18.1, Speech & Language Processing.]
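The evoke/refer/corefer relations in the figure can be sketched as a small data structure. This is a minimal illustration that assumes each referring expression has already been linked to its entity (which is exactly the job of anaphora resolution); the class and method names are hypothetical, not taken from the book, and mentions are stored as plain strings only to keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A discourse entity and the referring expressions that point to it."""
    name: str
    mentions: list[str] = field(default_factory=list)


class DiscourseModel:
    """Minimal discourse model: entities keyed by a canonical name (hypothetical)."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def evoke(self, name: str, expression: str) -> None:
        # First mention: introduce a new entity into the model.
        self.entities[name] = Entity(name, [expression])

    def refer(self, name: str, expression: str) -> None:
        # Later mention: attach the expression to an entity already in the model.
        self.entities[name].mentions.append(expression)

    def corefer(self, a: str, b: str) -> bool:
        # Two expressions corefer when they are mentions of the same entity.
        return any(a in e.mentions and b in e.mentions for e in self.entities.values())


model = DiscourseModel()
for name in ("Charlie Manuel", "Gavin Floyd", "Vicente Padilla"):
    model.evoke(name, name)
for expression in ("Floyd", "The righthander", "The pitcher we saw in St. Louis"):
    model.refer("Gavin Floyd", expression)

print(model.corefer("Floyd", "The righthander"))  # True
print(model.corefer("Floyd", "Charlie Manuel"))   # False
```

A real system would store mention positions rather than strings, since the same string (for example "he") can refer to different entities in one passage.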
Types of Anaphoric References
- Indefinite noun phrases: A baseball player like that should do well.
- Definite noun phrases: The righthander will remain with the club.
- Demonstratives: This player has a bright future.
- Pronouns: He had a bad game.
- One-anaphora: I saw no less than 6 Acura Integras today. Now I want one. (from the book)
Reference Constraints
- Number Agreement: Floyd pitched 6 innings. They went well. (They = the innings, not Floyd)
- Person and Case: He didn't have command of his fastball.
- Syntactic Constraints: Floyd threw him the ball. (him cannot be Floyd)
- Gender Agreement: Floyd took his glove with him. It fit well. (It = the glove, not Floyd)
- Selectional Restrictions: Floyd stepped onto the mound with the ball. He threw it really fast. (it = the ball; a mound cannot be thrown)
Preferences
- Recency: Floyd threw the ball. Lieberthal picked it up. He put the ball in his pocket. (He = Lieberthal, the most recent candidate)
- Grammatical Role: Floyd threw the ball to Lieberthal. His arm was getting tired. (His = Floyd, the subject)
- Parallelism: Floyd threw a ball to Lieberthal. Wagner threw a ball to him, too. (him = Lieberthal, who fills the same role in both sentences)
- Repeated Mention: see the article above, in which Floyd is mentioned repeatedly
- Verb Semantics:
  John telephoned Bill. He lost the pamphlet on Acuras. (He = John)
  John criticized Bill. He lost the pamphlet on Acuras. (He = Bill)
Pronoun Resolution Algorithms
Traditional:
- Carter: shallow parsing
- Rich, LuperFoy: distributed architecture
- Carbonell, Brown: multi-strategy
- Rico Pérez: scalar product
- Mitkov: combination of linguistic, statistical (high 80s)
- Lappin, Leass: syntax-based (86%)
- Hobbs: Tree Search Algorithm (91.7%)
- Grosz, Joshi, Weinstein: Centering Algorithm (77.6%)
- Hobbs: Coherence
Alternative:
- Nasukawa: knowledge-independent (93.8%)
- Dagan, Itai: statistical, corpus processing (87% for “genuine” it)
- Connolly, Burger, Day: machine learning
- Aone, Bennett: machine learning (“close to 90%”)
- Mitkov: uncertainty reasoning
- Mitkov: 2-engine (~90%)
- Tin, Akman: situational semantics
- Say, Vakman
Lappin & Leass
The book presents a slightly modified version of the algorithm for nonreflexive, third-person pronouns. It has two parts:
- Update the discourse model with a salience value for each entity
- Resolve pronouns
Let's apply this to some text:
In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.
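The first part, updating the discourse model, can be sketched as a running table of salience values. The sketch assumes the behavior visible in the worked example: each mention contributes a weight computed from the salience factors, a mention resolved to an existing entity is merged into it, and every value is halved when a new sentence begins. SalienceTracker and its methods are hypothetical names, and the mention weights (310 for each Floyd mention here) are supplied by hand rather than computed.

```python
from __future__ import annotations


class SalienceTracker:
    """Per-entity salience values for the discourse model (hypothetical helper)."""

    def __init__(self) -> None:
        self.salience: dict[str, int] = {}

    def new_sentence(self) -> None:
        # Older mentions fade: every value is cut in half at each new sentence.
        # Integer division rounds down, like the 125 -> 62 step in the worked example.
        for entity in self.salience:
            self.salience[entity] //= 2

    def add_mention(self, entity: str, weight: int) -> None:
        # A new entity starts at the mention's weight; a mention resolved to an
        # existing entity adds its weight to that entity's running total.
        self.salience[entity] = self.salience.get(entity, 0) + weight


tracker = SalienceTracker()
tracker.add_mention("Gavin Floyd", 310)  # sentence 1: "Gavin Floyd"
tracker.new_sentence()                   # sentence 2 begins
print(tracker.salience["Gavin Floyd"])   # 155
tracker.add_mention("Gavin Floyd", 310)  # "he" resolved to Gavin Floyd
print(tracker.salience["Gavin Floyd"])   # 465
tracker.new_sentence()                   # sentence 3 begins
print(tracker.salience["Gavin Floyd"])   # 232
```

Halving rather than deleting old entities lets a frequently mentioned entity stay salient across sentences, which is how the repeated-mention preference shows up in this algorithm.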
Salience Factors
Factor                                         Weight
Sentence recency                               100
Subject emphasis                               80
Existential emphasis                           70
Accusative (direct object) emphasis            50
Indirect object, oblique complement emphasis   40
Non-adverbial emphasis                         50
Head noun emphasis                             80

Pronoun Salience
Factor                                         Weight
Role parallelism                               35
Cataphora                                      -175
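The two tables translate directly into a lookup of weights. This sketch assumes a parser has already decided which factors apply to each mention (that step is not shown); the factor key names are illustrative labels, not identifiers from the book.

```python
from __future__ import annotations

# Weights from the Salience Factors table; the key names are illustrative labels.
SALIENCE_WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "existential": 70,
    "accusative": 50,
    "indirect_object_or_oblique": 40,
    "non_adverbial": 50,
    "head_noun": 80,
}

# Pronoun-specific adjustments, applied only while resolving a particular pronoun.
ROLE_PARALLELISM = 35
CATAPHORA = -175


def mention_weight(factors: set[str]) -> int:
    """Sum the weights of the salience factors that apply to one mention."""
    unknown = factors - set(SALIENCE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown salience factors: {sorted(unknown)}")
    return sum(SALIENCE_WEIGHTS[f] for f in factors)


# "Gavin Floyd" in sentence 1: recent, subject, not in an adverbial PP, a head noun.
print(mention_weight({"recency", "subject", "non_adverbial", "head_noun"}))     # 310
# "a beer" in sentence 3: recent, direct object, non-adverbial, head noun.
print(mention_weight({"recency", "accusative", "non_adverbial", "head_noun"}))  # 280
```

The pronoun-specific values are kept separate because they depend on the pronoun being resolved, not on the mention itself; they are added only at resolution time.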
L&L Algorithm
1. Collect the potential referents (up to four sentences back).
2. Remove potential referents that do not agree in number or gender with the pronoun.
3. Remove potential referents that do not pass intrasentential syntactic coreference constraints.
4. Compute the total salience value of each remaining referent by adding any applicable pronoun-specific values (role parallelism, cataphora) to its existing salience value.
5. Select the referent with the highest salience value. In case of ties, select the closest referent in terms of string position.
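Steps 2 through 5 can be sketched as a filter followed by a maximum. This is a minimal illustration under several assumptions: step 1 (collecting candidates) and the syntactic checks of step 3 are done elsewhere and passed in as a flag, number/gender/role labels are supplied by hand, and the Candidate, Pronoun, and resolve names are hypothetical. The usage reuses the carried values from the worked example, with 232 for Floyd (465 halved and rounded down); treating Lieberthal's role as "oblique" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

ROLE_PARALLELISM = 35
CATAPHORA = -175


@dataclass
class Candidate:
    name: str
    salience: int           # current value in the discourse model
    number: str             # "sg" or "pl"
    gender: str             # "masc", "fem", or "neut"
    role: str               # grammatical role of the candidate's latest mention
    position: int           # character offset of that mention
    syntax_ok: bool = True  # outcome of step 3's syntactic checks (computed elsewhere)


@dataclass
class Pronoun:
    number: str
    gender: str
    role: str
    position: int


def resolve(pronoun: Pronoun, candidates: list[Candidate]) -> Candidate | None:
    """Steps 2-5 of the procedure; step 1 (collecting candidates) is the caller's job."""
    # Steps 2 and 3: filter on agreement and syntactic constraints.
    viable = [
        c for c in candidates
        if c.number == pronoun.number and c.gender == pronoun.gender and c.syntax_ok
    ]
    if not viable:
        return None

    def total(c: Candidate) -> int:
        # Step 4: add the pronoun-specific adjustments to the existing value.
        score = c.salience
        if c.role == pronoun.role:
            score += ROLE_PARALLELISM
        if c.position > pronoun.position:  # candidate comes after the pronoun
            score += CATAPHORA
        return score

    # Step 5: highest total wins; ties go to the candidate nearest the pronoun.
    return max(viable, key=lambda c: (total(c), -abs(c.position - pronoun.position)))


text = ("In the afternoon, Gavin Floyd played baseball at the park. "
        "Then he went to a bar with Mike Lieberthal. He enjoyed a beer.")
he = Pronoun("sg", "masc", "subject", text.index("He enjoyed"))
candidates = [
    Candidate("Gavin Floyd", 232, "sg", "masc", "subject", text.index("he went")),
    Candidate("Mike Lieberthal", 75, "sg", "masc", "oblique", text.index("Mike")),
    Candidate("a bar", 115, "sg", "neut", "oblique", text.index("a bar")),
]
print(resolve(he, candidates).name)  # Gavin Floyd
```

Encoding the tie-break as a second key element (negative distance) lets a single max() call handle both the salience comparison and the closest-referent rule.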
Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went
to a bar with Mike Lieberthal. He enjoyed a beer.
Referent           Rec  Subj  Exist  Obj  Ind-Obj  Non-Adv  Head Noun  Total
the afternoon      100                                      80         180
Gavin Floyd        100  80                         50       80         310
baseball           100               50            50       50         250
the park           100                             50                  150
Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went
to a bar with Mike Lieberthal. He enjoyed a beer.
Referent           Carry  Rec  Subj  Exist  Obj  Ind-Obj  Non-Adv  Head Noun  Total
the afternoon      90
Gavin Floyd        155
baseball           125
the park           75
a bar                     100                             50       80         230
Mike Lieberthal           100                             50                  150
Carry: the salience value carried over from the previous sentence, cut in half when the new sentence begins (310 / 2 = 155 for Gavin Floyd).
Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went
to a bar with Mike Lieberthal. He enjoyed a beer.
Referent           Carry  Rec  Subj  Exist  Obj  Ind-Obj  Non-Adv  Head Noun  Total
the afternoon      90
{Gavin Floyd, he}  155    100  80                         50       80         465
baseball           125
the park           75
a bar                     100                             50       80         230
Mike Lieberthal           100                             50                  150
he is resolved to Gavin Floyd, so its weights (100 + 80 + 50 + 80 = 310) are added to Floyd's carried value: 155 + 310 = 465.
Example
In the afternoon, Gavin Floyd played baseball at the park. Then he went
to a bar with Mike Lieberthal. He enjoyed a beer.
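Referent           Carry  Total
the afternoon      45
{Gavin Floyd, he}  232
baseball           62
the park           37
a bar              115
Mike Lieberthal    75
a beer                    280
Only Gavin Floyd and Mike Lieberthal agree with He in number and gender, so the other referents are dropped.
Gavin Floyd gets 35 points for role parallelism (he and He are both subjects). Mike Lieberthal does not.
Floyd => 232 + 35 = 267 points
Lieberthal => 75 points
We pick Floyd as the antecedent of He.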
Summary
- Discourse analysis requires processing more text than POS tagging or finding entities.
- Part of tracing the flow of discourse is resolving anaphora.
- That resolution lets us capture more relationships and other information than we could otherwise.