Ph.D. Thesis Proposal
Discourse Parsing in the Penn Discourse Treebank:
Using Discourse Structures to Model Coherence and
Improve User Tasks
Ziheng Lin
Advisors: Prof. Min-Yen Kan and Prof. Hwee Tou Ng
Introduction
A text is usually understood by its discourse
structure
Discourse parsing: a process of
Identifying discourse relations, and
Constructing the internal discourse structure
A number of discourse frameworks have been
proposed:
Mann & Thompson (1988)
Lascarides & Asher (1993)
Webber (2004)
…
Introduction
The Penn Discourse Treebank (PDTB):
Is a large-scale discourse-level annotation
Follows Webber’s framework
Understanding a text’s discourse structure is
useful:
Discourse structure and textual coherence have a
strong connection
Discourse parsing is useful in modeling coherence
Discourse parsing also helps downstream NLP
applications
Contrast, Restatement → summarization
Cause → QA
Introduction
Research goals:
1. Design an end-to-end PDTB-styled discourse parser
2. Propose a coherence model based on discourse structures
3. Show discourse parsing improves a downstream NLP application
Outline
1. Introduction
2. Literature review
   1. Discourse parsing
   2. Coherence modeling
3. Recognizing implicit discourse relations
4. A PDTB-styled end-to-end discourse parser
5. Modeling coherence using discourse relations
6. Proposed work and timeline
7. Conclusion
Discourse parsing
Recognize the discourse relations between two
text spans, and
Organize these relations into a discourse
structure
Two main classes of relations in PDTB:
Explicit relations: explicit discourse connective such
as however and because
Implicit relations: no discourse connective, harder
to recognize
parsing implicit relations is a hard task
Discourse parsing
Marcu & Echihabi (2002):
Word pairs extracted from two text spans
Collect implicit relations by removing connectives
Wellner et al. (2006):
Connectives, distance between text spans, and event-based features
Discourse Graphbank: explicit and implicit
Soricut & Marcu (2003):
Probabilistic models on sentence-level segmentation and parsing
RST Discourse Treebank (RST-DT)
duVerle & Prendinger (2009):
SVM to identify discourse structure and label relation types
RST-DT
Other related work: Wellner & Pustejovsky (2007), Elwell & Baldridge (2008), Wellner (2009)
Coherence modeling
Barzilay & Lapata (2008): local coherence
The distribution of discourse entities exhibits certain
regularities in sentence-to-sentence transitions
Model coherence using an entity grid
Barzilay & Lee (2004): global coherence
Newswire reports follow certain patterns of topic
shift
Used a domain-specific HMM to capture
topic shifts in a text
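The entity grid idea can be illustrated with a minimal Python sketch: each sentence maps its entities to syntactic roles (S, O, X, or "-" for absent), and coherence features are the normalized counts of role transitions down each entity column. The data structures and toy document here are simplified stand-ins, not Barzilay & Lapata's implementation.

```python
from collections import Counter
from itertools import product

def entity_grid_transitions(grid, roles=("S", "O", "X", "-")):
    """grid: list of per-sentence dicts mapping entity -> role.
    Returns normalized counts of role bigrams over entity columns."""
    entities = sorted({e for sent in grid for e in sent})
    # Build one role sequence (column) per entity; "-" marks absence.
    columns = {e: [sent.get(e, "-") for sent in grid] for e in entities}
    counts = Counter()
    for seq in columns.values():
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    total = sum(counts.values()) or 1
    return {t: counts[t] / total for t in product(roles, repeat=2)}

# Toy document: "Obama" is subject twice; "bill" is object, then absent.
grid = [{"Obama": "S", "bill": "O"}, {"Obama": "S"}]
feats = entity_grid_transitions(grid)
```

Each of the 16 possible role transitions becomes one feature dimension for a coherence classifier.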
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
   1. Methodology
   2. Experiments
4. A PDTB-styled end-to-end discourse parser
5. Modeling coherence using discourse relations
6. Proposed work and timeline
7. Conclusion
Methodology
Supervised learning on a maximum entropy
classifier
Four feature classes
Contextual features
Constituent parse features
Dependency parse features
Lexical features
Methodology:
Contextual features
Dependencies between two adjacent discourse
relations r1 and r2
independent
fully embedded argument
shared argument
properly contained argument
pure crossing
partially overlapping argument
Fully embedded argument and shared argument are
the most common ones in the PDTB
Methodology:
Contextual features
For an implicit relation curr that we want to
classify, look at the surrounding two relations
prev and next
Six binary features encode curr's dependency patterns
with prev and next
Methodology:
Constituent parse features
Collect all production rules
S → NP VP
NP → PRP
PRP → “We”
……
Three binary features to check whether a rule
appears in Arg1, Arg2, and both
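The production-rule features can be sketched as follows; the nested-tuple trees and feature naming scheme are toy stand-ins for real constituent parses:

```python
def production_rules(tree):
    """tree: (label, child, ...) nested tuples; leaves are strings.
    Returns the set of production rules, e.g. 'S -> NP VP'."""
    rules = set()
    label, *children = tree
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    rules.add(f"{label} -> {' '.join(rhs)}")
    for c in children:
        if not isinstance(c, str):
            rules |= production_rules(c)
    return rules

def rule_features(arg1_tree, arg2_tree):
    """Three binary features per rule: present in Arg1, in Arg2, in both."""
    r1, r2 = production_rules(arg1_tree), production_rules(arg2_tree)
    feats = {}
    for rule in r1 | r2:
        feats[f"{rule}:arg1"] = rule in r1
        feats[f"{rule}:arg2"] = rule in r2
        feats[f"{rule}:both"] = rule in r1 and rule in r2
    return feats

tree = ("S", ("NP", ("PRP", "We")), ("VP", ("VBD", "had")))
rules = production_rules(tree)
feats = rule_features(("NP", ("PRP", "We")), ("NP", ("PRP", "They")))
```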
Methodology:
Dependency parse features
Encode additional information at the word level
Collect all words with the dependency types from their
dependents:
“had” ← nsubj, dobj
“problems” ← det, nn, advmod
“at” ← dep
Three binary features to check whether a rule appears in Arg1,
Arg2, and both
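A corresponding sketch for the dependency-rule features, assuming the parse is given as simple (head word, dependency type) pairs; the input encoding is an assumption for illustration:

```python
from collections import defaultdict

def dependency_rules(deps):
    """deps: list of (head_word, dep_type) pairs from a dependency parse.
    Returns each head word paired with the sorted dependency types of
    its dependents, e.g. 'had <- dobj nsubj'."""
    by_head = defaultdict(list)
    for head, dep_type in deps:
        by_head[head].append(dep_type)
    return {f"{head} <- {' '.join(sorted(types))}"
            for head, types in by_head.items()}

deps = [("had", "nsubj"), ("had", "dobj"), ("problems", "det"),
        ("problems", "nn"), ("problems", "advmod"), ("at", "dep")]
rules = dependency_rules(deps)
```

As with production rules, each extracted rule would then yield three binary features (in Arg1, in Arg2, in both).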
Methodology:
Lexical features
Marcu & Echihabi (2002) show that word pairs are
a good signal for classifying discourse relations
Arg1: John is good in math and sciences.
Arg2: Paul fails almost every class he takes.
(good, fails) is a good indicator for a contrast
relation
Stem and collect all word pairs from Arg1 and
Arg2 as features
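A minimal sketch of the word-pair features; the suffix-stripping stem function is a crude stand-in for a real stemmer such as Porter's:

```python
from itertools import product

def stem(word):
    # Crude suffix stripping; a stand-in for a real stemmer (e.g. Porter).
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def word_pair_features(arg1, arg2):
    """Cross product of stemmed tokens from Arg1 x Arg2."""
    t1 = [stem(w.lower()) for w in arg1.split()]
    t2 = [stem(w.lower()) for w in arg2.split()]
    return {f"{a}|{b}" for a, b in product(t1, t2)}

pairs = word_pair_features("John is good in math", "Paul fails every class")
```

Here the pair good|fail would fire for this Arg1/Arg2 pair, signaling a likely Contrast relation.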
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
   1. Methodology
   2. Experiments
4. A PDTB-styled end-to-end discourse parser
5. Modeling coherence using discourse relations
6. Proposed work and timeline
7. Conclusion
Experiments
Feature class     | w/o selection: count, acc. | w/ selection: count, acc.
Production Rules  | 11,113, 36.7%              | 100, 38.4%
Dependency Rules  | 5,031, 26.0%               | 100, 32.4%
Word Pairs        | 105,783, 30.3%             | 500, 32.9%
Context           | Yes, 28.5%                 | Yes, 28.5%
All               | 35.0%                      | 40.2%
Baseline          | 26.1%                      |
w/ feature selection
Employed MI to select the top 100 rules, and top 500 word pairs (as word
pairs are more sparse)
Production rules, dependency rules, and word pairs all gave significant
improvement with p < 0.01
Applying all feature classes yields the highest accuracy of 40.2%
Results show predictiveness of feature classes:
production rules > word pairs > dependency rules > context features
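The MI-based selection can be sketched as follows, scoring each binary feature by its mutual information with the relation label and keeping the top k; the toy instances and candidate features are illustrative only:

```python
import math
from collections import Counter

def mutual_information(instances, feature):
    """instances: list of (feature_set, label) pairs. MI between the
    binary 'feature present' variable and the relation label."""
    n = len(instances)
    joint = Counter((feature in feats, label) for feats, label in instances)
    f_marg = Counter(feature in feats for feats, _ in instances)
    l_marg = Counter(label for _, label in instances)
    mi = 0.0
    for (f, l), c in joint.items():
        p_joint = c / n
        p_indep = (f_marg[f] / n) * (l_marg[l] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi

def select_top_k(instances, candidate_features, k):
    scored = sorted(candidate_features,
                    key=lambda f: mutual_information(instances, f),
                    reverse=True)
    return scored[:k]

data = [({"good|fail"}, "Contrast"), ({"so"}, "Cause"),
        ({"good|fail"}, "Contrast"), ({"so"}, "Cause")]
best = select_top_k(data, ["good|fail", "so", "rare"], 1)
```

MI is computed separately per feature class, so the top 100 rules and top 500 word pairs can be selected independently.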
Experiments
Question: can any of these feature classes be omitted to achieve the same
level of performance?
Add in feature classes in the order of their predictiveness
production rules > word pairs > dependency rules > context features
Production Rules | Dependency Rules | Word Pairs | Context | Acc.
100              | 100              | 500        | Yes     | 40.2%
100              | 100              | 500        | –       | 39.0%
100              | –                | 500        | –       | 38.9%
100              | –                | –          | –       | 38.4%
The results confirm that
each additional feature class contributes a marginal performance
improvement, and
all feature classes are needed for the optimal performance
Conclusion
Implemented an implicit discourse relation classifier
Features include:
Modeling of the context of the relations
Features extracted from constituent and dependency trees
Word pairs
Achieved an accuracy of 40.2%, a 14.1% absolute improvement
over the baseline
With a component that handles implicit
relations in place, we continue to design the full parser
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
4. A PDTB-styled end-to-end discourse parser
   1. System overview
   2. Components
   3. Experiments
5. Modeling coherence using discourse relations
6. Proposed work and timeline
7. Conclusion
System overview
The parsing algorithm mimics the PDTB annotation
procedure
Input – a free text T
Output – discourse structure of T in the PDTB
style
Three steps:
Step 1: label Explicit relations
Step 2: label Non-Explicit relations (Implicit, AltLex,
EntRel and NoRel)
Step 3: label attribution spans
System overview
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
4. A PDTB-styled end-to-end discourse parser
   1. System overview
   2. Components
   3. Experiments
5. Modeling coherence using discourse relations
6. Proposed work and timeline
7. Conclusion
Components:
Connective classifier
Use syntactic features from Pitler & Nenkova
(2009)
A connective’s context and POS give indication of
its discourse usage
E.g., after is a discourse connective when it is followed
by a present participle, such as “after rising 3.9%”
New contextual features for connective C:
C POS
prev + C, prev POS, prev POS + C POS
C + next, next POS, C POS + next POS
The path from C to the root
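The contextual features listed above can be sketched as follows; the path-to-root feature is omitted since it needs the parse tree, and the sentence-boundary placeholders are assumptions:

```python
def connective_context_features(tokens, pos, i):
    """Contextual features for a candidate connective at index i.
    tokens/pos: parallel token and POS-tag lists for the sentence."""
    c, c_pos = tokens[i], pos[i]
    prev = tokens[i - 1] if i > 0 else "<s>"
    prev_pos = pos[i - 1] if i > 0 else "<s>"
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    nxt_pos = pos[i + 1] if i + 1 < len(tokens) else "</s>"
    return {
        "C_POS": c_pos,
        "prev+C": f"{prev} {c}", "prev_POS": prev_pos,
        "prev_POS+C_POS": f"{prev_pos} {c_pos}",
        "C+next": f"{c} {nxt}", "next_POS": nxt_pos,
        "C_POS+next_POS": f"{c_pos} {nxt_pos}",
    }

feats = connective_context_features(
    ["prices", "fell", "after", "rising", "3.9%"],
    ["NNS", "VBD", "IN", "VBG", "CD"], 2)
```

For the "after rising 3.9%" example, the C_POS+next_POS feature IN VBG captures exactly the connective-followed-by-participle pattern.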
Components:
Argument labeler
Label the Arg1 and Arg2 spans in two steps:
Step 1: identify the locations of Arg1 and Arg2
Step 2: label their spans
Step 1 – argument position classifier:
Arg2 is always associated with the connective
Use contextual and lexical info to locate Arg1
Step 2 – argument extractor:
Case 1 – Arg1 and Arg2 in the same sentence
Case 2 – Arg1 in some previous sentence: assume the
immediately previous one
Components:
Explicit classifier
Human agreement:
94% on Level-1
84% on Level-2
We train and test on Level-2 types
Features:
Connective C
C POS
C + prev
Components:
Non-Explicit classifier
Non-Explicit: Implicit, AltLex, EntRel, NoRel
Modify the implicit classifier to include
AltLex, EntRel and NoRel
AltLex is signaled by non-connective
expressions such as “That compared with”,
which usually appear at the beginning of Arg2
Add another three features to check the first
three words of Arg2
Components:
Attribution span labeler
Label the attribution spans for Explicit, Implicit, and AltLex
Consists of two steps:
Step 1: split the text into clauses
Step 2: decide which clauses are attribution spans
Features from curr, prev and next clauses:
Unigrams of curr
Lowercased and lemmatized verbs in curr
First term of curr, Last term of curr, Last term of prev, First term of next
Last term of prev + first term of curr, Last term of curr + first term of
next
Position of curr in the sentence
Production rules extracted from curr
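A sketch of the clause features above; the lemmatized-verb and production-rule features are omitted for brevity, and the clause encoding, boundary placeholders, and position encoding are simplified assumptions:

```python
def clause_features(clauses, i):
    """Features for deciding whether clause i is an attribution span.
    clauses: list of token lists, one per clause."""
    curr = clauses[i]
    prev = clauses[i - 1] if i > 0 else ["<s>"]
    nxt = clauses[i + 1] if i + 1 < len(clauses) else ["</s>"]
    feats = {f"uni={w.lower()}" for w in curr}           # unigrams of curr
    feats.add(f"first_curr={curr[0]}")
    feats.add(f"last_curr={curr[-1]}")
    feats.add(f"last_prev={prev[-1]}")
    feats.add(f"first_next={nxt[0]}")
    feats.add(f"last_prev+first_curr={prev[-1]} {curr[0]}")
    feats.add(f"last_curr+first_next={curr[-1]} {nxt[0]}")
    feats.add(f"position={i}")                           # position in sentence
    return feats

clauses = [["He", "will", "leave"], ["analysts", "said"]]
feats = clause_features(clauses, 1)
```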
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
4. A PDTB-styled end-to-end discourse parser
   1. System overview
   2. Components
   3. Experiments
5. Modeling coherence using discourse relations
6. Proposed work and timeline
7. Conclusion
Experiments
Each component in the pipeline can be tested
with two dimensions:
Whether there is error propagation from previous
component (EP vs no EP), and
Whether gold standard parse trees and sentence
boundaries or automatic parsing and sentence splitting
are used (GS vs Auto)
Three settings:
GS + no EP: per component evaluation
GS + EP
Auto + EP: fully automated end-to-end evaluation
Experiments
Connective classifier
Argument extractor
Experiments
Explicit classifier
Non-explicit classifier
Experiments
Attribution span labeler
Evaluate the whole pipeline:
GS + EP gives F1 of 46.8% under partial match and 33% under
exact match
Auto + EP gives F1 of 38.18% under partial match and 20.64%
under exact match
Conclusion
Designed and implemented an end-to-end
PDTB-styled parser
Incorporated the implicit classifier into the pipeline
Evaluated the system both component-wise as
well as with error propagation
Reported overall system F1 for partial match
of 46.8% with gold standard parses and
38.18% with full automation
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
4. A PDTB-styled end-to-end discourse parser
5. Modeling coherence using discourse relations
   1. A relation transition model
   2. A refined approach: discourse role matrix
   3. Conclusion
6. Proposed work and timeline
7. Conclusion
A relation transition model
Recall: Barzilay & Lapata (2008)'s coherence
representation models sentence-to-sentence
transitions of entities
Well-written texts follow certain patterns of
argumentative moves
Reflected by relation transition patterns
A text T can be represented as a relation
transition sequence
A relation transition model
Method and preliminary results:
Extract the relation bigrams from the relation transition
sequence
[Cause → Cause], [Cause → Contrast], [Contrast →
Restatement], [Restatement → Expansion]
A training/test instance is a pair of relation sequences:
Sgs = gold standard sequence
Sp = permuted sequence
Task: rank the pair (Sgs, Sp)
Ideally, Sgs should be ranked higher, i.e., more coherent
Baseline: 50%
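The ranking setup can be sketched as follows; the bigram weights are hypothetical toy values, standing in for whatever trained ranking model is used:

```python
from collections import Counter

def relation_bigrams(seq):
    """seq: sentence-to-sentence relation transition sequence."""
    return Counter(zip(seq, seq[1:]))

def rank_pair(gold_seq, permuted_seq, weights):
    """Rank the gold vs. permuted sequence with a simple linear score
    over bigram features; 'weights' would come from a trained ranker."""
    def score(seq):
        return sum(weights.get(bg, 0.0) * c
                   for bg, c in relation_bigrams(seq).items())
    return "gold" if score(gold_seq) > score(permuted_seq) else "permuted"

# Toy weights preferring a Cause -> Contrast move over Contrast -> Cause.
weights = {("Cause", "Contrast"): 1.0, ("Contrast", "Cause"): -1.0}
result = rank_pair(["Cause", "Contrast", "Restatement"],
                   ["Contrast", "Cause", "Restatement"], weights)
```

Accuracy is then the fraction of (Sgs, Sp) pairs where the gold sequence scores higher, against the 50% random baseline.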
A relation transition model
The relation transition sequence is sparse
Expect longer articles to give more predictable
sequences
Perform experiments with different sentence thresholds
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
4. A PDTB-styled end-to-end discourse parser
5. Modeling coherence using discourse relations
   1. A relation transition model
   2. A refined approach: discourse role matrix
   3. Conclusion
6. Proposed work and timeline
7. Conclusion
A refined approach: discourse role matrix
Instead of looking at the discourse roles of
sentences, we look at the discourse roles of terms
Use sub-sequences of discourse roles as features
Comp.Arg2 → Exp.Arg2, Comp.Arg1 → nil, …
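A sketch of extracting role sub-sequence features from a discourse role matrix; the term-to-roles encoding and the toy cells are simplified assumptions for illustration:

```python
def role_subsequences(matrix, n=2):
    """matrix: term -> list of per-sentence role sets (empty set = nil).
    Returns n-gram sub-sequences of discourse roles down each term
    column, e.g. 'Comp.Arg2 -> Exp.Arg2'."""
    feats = set()
    for roles_per_sent in matrix.values():
        cells = [sorted(r) or ["nil"] for r in roles_per_sent]
        for i in range(len(cells) - n + 1):
            window = cells[i:i + n]
            # One sub-sequence per combination of roles in the window.
            def expand(prefix, rest):
                if not rest:
                    feats.add(" -> ".join(prefix))
                    return
                for role in rest[0]:
                    expand(prefix + [role], rest[1:])
            expand([], window)
    return feats

# Toy matrix: each term's discourse roles across two sentences.
matrix = {"yields": [{"Comp.Arg2"}, {"Exp.Arg2"}],
          "copper": [{"Comp.Arg1"}, set()]}
feats = role_subsequences(matrix)
```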
A refined approach: discourse role matrix
Experiments:
Compared with Barzilay & Lapata (2008)'s entity
grid model
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
4. A PDTB-styled end-to-end discourse parser
5. Modeling coherence using discourse relations
6. Proposed work and timeline
   1. Literature review on several NLP applications
   2. Proposed work
   3. Timeline
7. Conclusion
Literature review on several NLP applications
Text summarization:
Discourse plays an important role in text
summarization
Marcu (1997) showed that an RST tree is a good
indicator of salience in a text
PDTB relations are helpful in summarization:
Generic summarization: utilize Instantiation and
Restatement relations to recognize redundancy
Update summarization: use Contrast relations to
locate updates
Literature review on several NLP applications
Argumentative zoning (AZ):
Proposed by Teufel (1999) to automatically
construct the rhetorical moves of argumentation of
academic writing
Label sentences with 7 tags:
aim, textual, own, background, contrast, basis,
and other
Has been shown that AZ can help in:
Summarization (Teufel & Moens, 2002)
Citation indexing (Teufel et al., 2006)
Literature review on several NLP applications
Why-QA:
Aims to answer generic questions of the form “Why X?”
Verberne et al. (2007) showed that discourse
structure in RST framework is helpful in a why-QA
system
Prasad and Joshi (2008) generate why-questions
with the use of causal relations in the PDTB
We believe that the PDTB hierarchical relation
typing will help in designing a why-QA system
Proposed work
Work done:
A system to automatically recognize implicit relations
Sec 3, EMNLP 2009
An end-to-end discourse parser
Sec 4, a journal in preparation
Coherence model based on discourse structures
Sec 5, ACL 2011
As the next step, I propose to work on one of these NLP
applications
Aim: show that discourse parsing can improve the
performance of this NLP app
Timeline
2010 Sep – Dec
Continue working on the coherence model
Done
2010 Nov – Dec
Write an ACL submission on the coherence model
Done
2011 Jan – May
Work on NLP application
In progress
2011 May – Jul
Thesis write-up
2011 Aug
Thesis defense
Outline
1. Introduction
2. Literature review
3. Recognizing implicit discourse relations
4. A PDTB-styled end-to-end discourse parser
5. Modeling coherence using discourse relations
6. Proposed work and timeline
7. Conclusion
Conclusion
Designed and implemented an implicit discourse
classifier in the PDTB
Designed and implemented an end-to-end discourse
parser in the PDTB representation
Proposed a coherence model based on discourse
relations
Proposed work: apply discourse parsing in one
downstream NLP application
Summarization, argumentative zoning, or why-QA
Parser Demo
Thank you!
Backup slides
The Penn Discourse Treebank
A discourse level annotation over the WSJ corpus
Adopts a binary predicate-argument view on discourse
relations
Explicit relations: signaled by discourse connectives
Arg2: When he sent letters offering 1,250 retired major
leaguers the chance of another season,
Arg1: 730 responded.
Implicit relations:
Arg1: “I believe in the law of averages,”
declared San Francisco batting coach Dusty Baker after
game two.
Arg2: [accordingly] “I’d rather see a so-so hitter who’s hot
come up for the other side than a good hitter who’s cold.”
The Penn Discourse Treebank
AltLex relations:
Arg1: For the nine months ended July 29, SFE
Technologies reported a net loss of $889,000 on sales
of $23.4 million.
Arg2: AltLex [That compared with] an operating loss of
$1.9 million on sales of $27.4 million in the year-earlier period.
EntRel:
Arg1: Pierre Vinken, 61 years old, will join the board as
a nonexecutive director Nov. 29.
Arg2: Mr. Vinken is chairman of Elsevier N.V., the Dutch
publishing group.
The Penn Discourse Treebank
Experiments
Classifier: OpenNLP MaxEnt
Training data: Sections 2 – 21 of the PDTB
Test data: Section 23 of the PDTB
Feature selection: use Mutual Information (MI)
to select features for production rules,
dependency rules, and word pairs separately
Majority baseline: 26.1%, where all instances
are classified into Cause
Components:
Argument labeler
A relation transition model
(figures: two alternative representations of the relation transition sequence)
Experiments
The classifier labels no instances of Synchrony, Pragmatic Cause,
Concession, and Alternative
The percentages of these four types are too small: only 4.76% in
total in the training data
As Cause is the most predominant type, it has high recall but low precision
Methodology:
Constituent parse features
Syntactic structure within one argument may constrain the relation type
and the syntactic structure of the other argument
(a) Arg1: But the RTC also requires “working” capital to maintain the bad
assets of thrifts that are sold
Arg2: [subsequently] That debt would be paid off as the assets are sold
(b) Arg1: It would have been too late to think about on Friday.
Arg2: [so] We had to think about it ahead of time.
Components:
Connective classifier
PDTB defines 100 discourse connectives
Features from Pitler and Nenkova (2009):
Connective: because
Self category: IN
Parent category: SBAR
Left sibling category: none
Right sibling category: S
Right sibling contains a VP: yes
Right sibling contains a trace: no
Experiments
Connective classifier:
Adding the lexico-syntactic and path features
significantly (p < 0.001) improves accuracy and F1 for
both GS and Auto
The connective with the highest number of incorrect
labels is and
and is always regarded as an ambiguous connective
Experiments
Argument position classifier:
Performance drops when EP and Auto are added in
The degradation is mostly due to the SS class
False positives propagated from connective classifier
For GS + EP: 30/36 classified as SS
For Auto + EP: 46/52 classified as SS
the difference between SS and PS is largely due to error
propagation
Experiments
Argument extractor - argument node identifier:
F1 for Arg1, Arg2, and Rel (Arg1+Arg2)
Arg1/Arg2 nodes for subordinating connectives are the
easiest ones to locate
97.93% F1 for Arg2, 86.98% F1 for Rel
Performance for discourse adverbials is the lowest
Their Arg1 and Arg2 nodes are not strongly bound
Experiments
Argument extractor:
Report both partial and exact match
GS + no EP gives a satisfactory Rel F1 of 86.24% for partial match
The performance for exact match is much lower than human
agreement (90.2%)
Most misses are due to small portions of text being deleted
from / added to the spans by the annotators
Experiments
Explicit classifier:
Human agreement = 84%
A baseline that uses only connective as features
yields an F1 of 86% under GS + no EP
Adding new features improves to 86.77%
Experiments
Non-explicit classifier:
A majority baseline (all classified as EntRel) gives F1
in the low 20s
GS + no EP shows an F1 of 39.63%
Performance for GS + EP and Auto + EP are much
lower
Still outperforms baseline by ~6%
Experiments
Attribution span labeler:
GS + no EP achieves F1 of 79.68% and 65.95% for partial and
exact match
With EP: degradation is mostly due to the drop in precision
With Auto: degradation is mostly due to the drop in recall
Experiments
Evaluate the whole pipeline:
Look at the Explicit and Non-Explicit relations that are
correctly identified
Define a relation as correct if its relation type is
classified correctly, and both Arg1 and Arg2 are labeled
correctly (partial or exact)
GS + EP gives F1 of 46.8% under partial match and 33%
under exact match
Auto + EP gives F1 of 38.18% under partial match and
20.64% under exact match
A large portion of misses come from the Non-Explicit
relations
A lexical model
Lapata (2003) proposed a sentence ordering model
Assume the coherence of adjacent sentences is based on lexical
word pairs, roughly:
P(S_i | S_{i-1}) ≈ ∏ over word pairs (w_{i,k}, w_{i-1,j}) of P(w_{i,k}, w_{i-1,j})
The coherence of the text is thus:
P(T) = ∏_i P(S_i | S_{i-1})
RST enforces two possible canonical orders of text spans:
Satellite before nucleus (e.g., conditional)
Nucleus before satellite (e.g., restatement)
A word pair-based model can be used to check whether
these orderings are enforced
A lexical model
Method and preliminary results:
Extract triples (w_{i-1,j}, C, w_{i,k}) as features:
Use mutual information to select top n features, n =
5000
Accuracy = 70%, baseline = 50%
Experiments
Feature class     | count   | accuracy
Production Rules  | 11,113  | 36.7%
Dependency Rules  | 5,031   | 26.0%
Word Pairs        | 105,783 | 30.3%
Context           | Yes     | 28.5%
All               |         | 35.0%
Baseline          |         | 26.1%
w/o feature selection
Production rules and word pairs yield significantly better performance
Contextual features perform slightly better than the baseline
Dependency rules perform slightly below the baseline, and applying all
feature classes does not yield the highest accuracy, likely due to noise
Components:
Argument labeler: Argument position classifier
Relative positions of Arg1:
SS: in the same sentence as the connective (60.9%)
PS: in some previous sentence of the connective
(39.1%)
FS: in some sentence following the sentence of the
connective (0%, only 8 instances, thus ignored)
Classify the relative position of Arg1 as SS or PS
Features:
Connective C, C POS
Position of C in the sentence (start, middle, end)
prev1, prev1 POS, prev1 + C, prev1 POS + C POS
prev2, prev2 POS, prev2 + C, prev2 POS + C POS
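The feature extraction above can be sketched as follows; the start/middle/end binning by thirds and the sentence-boundary placeholders are assumptions for illustration:

```python
def arg_position_features(tokens, pos, i):
    """Features for classifying Arg1's position as SS or PS.
    tokens/pos: parallel lists for the sentence; i: connective index."""
    def tok(j):
        return (tokens[j], pos[j]) if j >= 0 else ("<s>", "<s>")
    c, c_pos = tokens[i], pos[i]
    p1, p1_pos = tok(i - 1)
    p2, p2_pos = tok(i - 2)
    third = len(tokens) / 3.0  # crude start/middle/end binning
    position = "start" if i < third else ("end" if i >= 2 * third else "middle")
    return {
        "C": c, "C_POS": c_pos, "position": position,
        "prev1": p1, "prev1_POS": p1_pos,
        "prev1+C": f"{p1} {c}", "prev1_POS+C_POS": f"{p1_pos} {c_pos}",
        "prev2": p2, "prev2_POS": p2_pos,
        "prev2+C": f"{p2} {c}", "prev2_POS+C_POS": f"{p2_pos} {c_pos}",
    }

feats = arg_position_features(["Because", "it", "rained", ",", "we", "stayed"],
                              ["IN", "PRP", "VBD", ",", "PRP", "VBD"], 0)
```

A sentence-initial connective such as "Because" usually signals SS; the position and prev-word features let the classifier pick this up.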
Components:
Argument labeler: Argument extractor
When Arg1 is classified as in the same sentence
(SS) as Arg2, it can be one of:
Arg1 before Arg2
Arg2 before Arg1
Arg1 embedded within Arg2
Arg2 embedded within Arg1
Arg1 and Arg2 nodes in the parse tree can be
syntactically related in one of three ways:
Components:
Argument labeler: Argument extractor
Design an argument node identifier to
identify the Arg1 and Arg2 subtree nodes within the
sentence parse tree
Features:
Connective C
C’s syntactic category (subordinate, coordinate,
adverbial)
Numbers of left and right siblings of C
Path P of C to the node under consideration
Path P and whether the size of C’s left sibling is greater
than one
The relative position of the node to C
Components:
Argument labeler: Argument extractor
When Arg1 is classified as in some previous
sentence (PS), we use the majority classifier
Label the immediately previous sentence as Arg1
(76.9%)