Transcript Slides
Project
LOGO
End-to-End Discourse Parser Evaluation
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science
University of Trento, Italy
Content
Introduction
Discourse Parser: what + why + how
Discourse Parser & Penn Discourse TreeBank (PDTB)
Our contribution
Architecture
Features
Results
Conclusion
End2End Disc Pars Eval
Introduction
What: we refer to a coherent, structured group of sentences or expressions as a discourse
Why: discourse structure represents the meaning of the document
How:
Process flow: data -> (discourse) segmentation -> discourse parsing -> discourse structure
Discourse structure includes relations (a connective and its arguments) lexically anchored in the document text
Common data sources: Rhetorical Structure Theory (RST) Discourse Treebank and Penn Discourse TreeBank (PDTB); we used the PDTB
Examples from PDTB(1)
Arg1 -> I never gamble too far.
Explicit Connective -> In particular
Arg2 -> I quit after one try, whether I win or lose.
[EXPANSION]
Each annotated relation includes a connective, two arguments and a sense label for the connective
The connective can occur between the two arguments, at the beginning of a sentence, or inside an argument
The top-level senses of the three-layered sense hierarchy:
TEMPORAL, CONTINGENCY, COMPARISON, EXPANSION
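An explicit relation as described above (connective, two arguments, sense label) can be held in a small record. The sketch below is a hypothetical representation, not the PDTB file format: the class name `ExplicitRelation` and the dotted sense-label convention are assumptions. It is filled in with the "In particular" example.

```python
from dataclasses import dataclass

# The four top-level senses of the PDTB sense hierarchy.
TOP_LEVEL_SENSES = {"TEMPORAL", "CONTINGENCY", "COMPARISON", "EXPANSION"}


@dataclass(frozen=True)
class ExplicitRelation:
    """Hypothetical container for one explicit relation (not the official PDTB format)."""
    connective: str
    arg1: str
    arg2: str
    sense: str  # assumed dotted path, e.g. "EXPANSION" or "EXPANSION.<lower-level sense>"

    @property
    def top_level_sense(self) -> str:
        # The first component of the dotted path is the top-level class.
        top = self.sense.split(".")[0].upper()
        if top not in TOP_LEVEL_SENSES:
            raise ValueError(f"unknown top-level sense: {top!r}")
        return top


example = ExplicitRelation(
    connective="In particular",
    arg1="I never gamble too far.",
    arg2="I quit after one try, whether I win or lose.",
    sense="EXPANSION",
)
print(example.top_level_sense)  # EXPANSION
```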
Examples from PDTB(2)
When Mr. Green won a $240,000 verdict in a land condemnation
case against the State in June 1983, he says, Judge O’Kicki
unexpectedly awarded him an additional $100,000. [TEMPORAL]
As an indicator of the tight grain supply situation in the U.S., market
analysts said that late Tuesday the Chinese government, which often
buys U.S. grains in quantity, turned instead to Britain to buy 500,000
metric tons of wheat. [COMPARISON]
Since McDonald’s menu prices rose this year, the actual decline
may have been more. [CONTINGENCY]
(Arg1 italicized, connectives underlined, Arg2 boldfaced)
PDTB Corpus Statistics
Arg2 is always in the same sentence as the connective
60.9% of the annotated Arg1 spans are in the same sentence as the connective; 39.1% are in previous sentences (30.1% in the adjacent sentence, 9.0% in non-adjacent ones)
We used these statistics to establish the baseline
Our Contribution
Developed an end-to-end discourse parser that retrieves discourse structure (the explicit connective and its 2 argument spans) starting from a text paragraph
Evaluation
Built the system with gold-standard data (PTB+PDTB)
Evaluated it against a baseline
Implemented the same method in an automatic system
Improved the applicability of the automatic system
Applied an overlapping discourse segmentation technique (+2/-2 window) to the complete text
Followed a chunking strategy for classification
The discourse model is a cascaded CRF
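One way to read the +2/-2 overlapping segmentation: for each sentence of the text, take a window of up to two sentences before it and two after it, so consecutive windows share sentences. This is a minimal sketch that assumes the window is measured in sentences; the function name and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def overlapping_windows(
    sentences: Sequence[str],
    centers: Optional[Sequence[int]] = None,
    left: int = 2,
    right: int = 2,
) -> List[Tuple[int, List[str]]]:
    """Return (center, window) pairs; each window covers sentences
    center-left .. center+right, clipped at the edges of the text."""
    if centers is None:
        centers = range(len(sentences))  # apply the window over the complete text
    windows = []
    for c in centers:
        lo = max(0, c - left)
        hi = min(len(sentences), c + right + 1)
        windows.append((c, list(sentences[lo:hi])))
    return windows


for center, window in overlapping_windows(["s0", "s1", "s2", "s3", "s4"]):
    print(center, window)
```

Passing `centers` restricts the windows to selected sentences (for example, those containing a detected connective) while keeping the same overlap.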
End-to-End Architecture
Doc -> Parser -> Parse_Tree -> Chunklink -> AddDiscourse -> RootExtract+Morpha -> Pruner -> Arg2, Arg1
• Chunklink: by Sabine Buchholz (CoNLL’00 task)
• AddDiscourse: Pitler & Nenkova ’09 (connective sense detection)
• RootExtract+Morpha: morphology and all other features (Johansson + Minnen et al.)
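The pipeline ends with the Arg2 and Arg1 classifiers, which form the cascaded CRF: the Arg2 model runs first, and its output labels become an extra input column (feature F9) for the Arg1 model. The sketch below shows only that wiring. The two taggers are stand-ins for trained CRF models, and the names `cascade`, `arg2_tagger` and `arg1_tagger` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[str]  # feature columns for one token
# Hypothetical tagger interface: one label per input row (e.g. a wrapped CRF model).
Tagger = Callable[[Sequence[Row]], List[str]]


def cascade(rows: Sequence[Row], arg2_tagger: Tagger,
            arg1_tagger: Tagger) -> Tuple[List[str], List[str]]:
    """Label Arg2 first, then pass its labels to the Arg1 tagger as an extra column."""
    arg2_labels = arg2_tagger(rows)
    if len(arg2_labels) != len(rows):
        raise ValueError("the Arg2 tagger must return one label per token")
    # Build new rows so the caller's feature rows are not modified.
    arg1_rows = [list(row) + [label] for row, label in zip(rows, arg2_labels)]
    arg1_labels = arg1_tagger(arg1_rows)
    if len(arg1_labels) != len(rows):
        raise ValueError("the Arg1 tagger must return one label per token")
    return arg2_labels, arg1_labels
```

Keeping the taggers as plain callables makes it easy to swap a gold-standard input for an automatic one at either stage.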
Features
Features used for Arg1 and Arg2 segmentation and labeling.
F1. Token (T)
F2. Sense of Connective (CONN)
F3. IOB chain (IOB)
F4. PoS tag
F5. Lemma (L)
F6. Inflection (INFL)
F7. Main verb of main clause (MV)
F8. Boolean feature for MV (BMV)
Additional feature used only for Arg1
F9. Arg2 Labels
For more details: Ghosh et al., IJCNLP 2011
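CRF++ reads training data as one token per line with whitespace-separated columns, the answer tag in the last column and an empty line between sequences. A minimal sketch of writing one sequence in that format, assuming the feature columns follow the order F1 to F8 (with the F9 Arg2 label appended as an extra column for the Arg1 model). The function name and the B-/I- argument tags in the usage line are assumptions.

```python
from typing import Sequence


def to_crfpp_block(rows: Sequence[Sequence[str]], labels: Sequence[str]) -> str:
    """Format one token sequence in CRF++'s column format: one token per line,
    whitespace-separated feature columns, gold label in the last column,
    and an empty line closing the sequence."""
    if len(rows) != len(labels):
        raise ValueError("need exactly one label per token")
    width = len(rows[0]) if rows else 0
    lines = []
    for feats, label in zip(rows, labels):
        if len(feats) != width:
            raise ValueError("every token must have the same number of columns")
        columns = list(feats) + [label]
        for value in columns:
            # Columns are split on whitespace, so values must not contain any.
            if not value or any(ch.isspace() for ch in value):
                raise ValueError(f"invalid column value: {value!r}")
        lines.append(" ".join(columns))
    return "\n".join(lines) + "\n\n"


print(to_crfpp_block([["I", "PRP"], ["quit", "VBD"]], ["B-ARG2", "I-ARG2"]), end="")
```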
Evaluation & Baseline
Metrics: Precision, Recall and F1 measure
Scoring schemes:
Exact Match: a span is correct only if it exactly coincides with the gold-standard span
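Exact-match scoring reduces to set intersection once each argument span is encoded as a tuple. In this sketch the encoding (relation id, first token, last token) and the function name are assumptions for illustration.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, int]  # (relation id, first token, last token): hypothetical encoding


def exact_match_prf(predicted: Iterable[Span],
                    gold: Iterable[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where a predicted span counts only if it equals a gold span."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A span that is off by a single token (e.g. starting at token 9 instead of 10) earns no credit under this scheme.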
Baseline (based on the statistics given in the annotation manual):
Arg2: label all tokens of the text span between the connective and the beginning of the next sentence
Arg1: label all tokens in the text span from the end of the previous sentence to the connective position; if the connective occurs at the beginning of a sentence, label the previous sentence
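The baseline rules can be written down directly. This sketch assumes tokenized sentences and a connective given by its sentence index and half-open token offsets; spans are returned as (sentence, start, end) with the end exclusive. The function name and span encoding are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Span = Tuple[int, int, int]  # (sentence index, start token, end token), end exclusive


def baseline_args(sentences: Sequence[Sequence[str]], sent: int,
                  conn_start: int, conn_end: int) -> Tuple[Optional[Span], Optional[Span]]:
    """Return (arg1, arg2) spans predicted by the positional baseline."""
    n = len(sentences[sent])
    # Arg2: every token after the connective up to the end of its sentence.
    arg2 = (sent, conn_end, n) if conn_end < n else None
    if conn_start > 0:
        # Arg1: from the start of the sentence up to the connective.
        arg1 = (sent, 0, conn_start)
    elif sent > 0:
        # Sentence-initial connective: Arg1 is the whole previous sentence.
        arg1 = (sent - 1, 0, len(sentences[sent - 1]))
    else:
        arg1 = None
    return arg1, arg2


text = [["I", "never", "gamble", "too", "far", "."],
        ["In", "particular", ",", "I", "quit", "."]]
print(baseline_args(text, 1, 0, 2))  # ((0, 0, 6), (1, 2, 6))
```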
Exact Arg2 Results: Comparison Viewgraph
System              P     R     F1
Baseline            0.53  0.46  0.49
Gold-Standard       0.84  0.74  0.79
Automatic           0.80  0.74  0.77
AutoConn+GoldSPT    0.82  0.70  0.76
GoldConn+AutoSPT    0.76  0.61  0.68
Lightweight(Auto)   0.72  0.56  0.63
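For reference, F1 is the harmonic mean of precision and recall. Worked for the Automatic row of the Arg2 table:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.80 \times 0.74}{0.80 + 0.74} = \frac{1.184}{1.54} \approx 0.77
```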
Exact Arg1 Results: Comparison Viewgraph
System              P     R     F1
Baseline            0.19  0.19  0.19
Gold-Standard       0.68  0.39  0.49
Automatic           0.63  0.28  0.39
AutoConn+GoldSPT    0.67  0.31  0.43
GoldConn+AutoSPT    0.62  0.31  0.41
Lightweight(Auto)   0.60  0.27  0.37
Features
The IOB (Inside-Outside-Begin) chain lists the syntactic categories of all constituents on the path
between the root node and the current leaf node of the tree.
For example, the IOB chain feature for "flashed" is I-S/E-VP/E-SBAR/E-S/C-VP,
where B-, I-, E- and C- indicate whether the given token is respectively
at the beginning, inside, at the end of the constituent, or a single token chunk.
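The chain can be computed from a parse tree in one pass. This sketch assumes a tree given as nested (label, children) tuples with words as plain strings. It skips part-of-speech (preterminal) nodes, consistent with the example ending in C-VP, and uses only the B-, I-, E- and C- prefixes defined above. It is an illustration, not the Chunklink.pl implementation.

```python
from typing import List, Tuple, Union

# A tree node is either a word (str) or a (label, children) pair.
Node = Union[str, Tuple[str, list]]


def iob_chains(tree: Node) -> List[Tuple[str, str]]:
    """Return (word, chain) for each leaf; the chain lists phrase nodes root-to-leaf."""
    words: List[str] = []
    paths: List[list] = []  # per leaf: the [label, start, end] records of its ancestors

    def visit(node: Node, path: list) -> Tuple[int, int]:
        # Returns the half-open range of leaf indices covered by `node`.
        if isinstance(node, str):
            words.append(node)
            paths.append(list(path))  # shallow copy: the shared records are filled in later
            return len(words) - 1, len(words)
        label, children = node
        if len(children) == 1 and isinstance(children[0], str):
            return visit(children[0], path)  # preterminal (POS tag): not part of the chain
        record = [label, 0, 0]
        path.append(record)
        spans = [visit(child, path) for child in children]
        path.pop()
        record[1], record[2] = spans[0][0], spans[-1][1]
        return record[1], record[2]

    visit(tree, [])
    result = []
    for i, (word, path) in enumerate(zip(words, paths)):
        tags = []
        for label, start, end in path:
            if end - start == 1:
                prefix = "C"  # the constituent covers only this token
            elif i == start:
                prefix = "B"
            elif i == end - 1:
                prefix = "E"
            else:
                prefix = "I"
            tags.append(f"{prefix}-{label}")
        result.append((word, "/".join(tags)))
    return result


sentence = ("S", [("NP", [("DT", ["the"]), ("NN", ["light"])]),
                  ("VP", [("VBD", ["flashed"])])])
for word, chain in iob_chains(sentence):
    print(word, chain)  # the B-S/B-NP, light I-S/E-NP, flashed E-S/C-VP
```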
Conclusion
The automatic end-to-end system achieves results nearly the same as the gold-standard system
We are moving towards a "lightweight" version of the pipeline: shallower, with less dependence on SPTs
We wish to explore more features
Using a previous-sentence feature, we improved Arg1 classification by 5 points (Ghosh et al., IJCNLP 2011)
Thank you
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science
University of Trento, Italy
{ghosh, riccardi}@disi.unitn.it
Previous Work
Previous work limited the task to retrieving the argument heads (Wellner et al. 2007; Elwell et al. 2008)
Dinesh et al. (2005) extracted complete arguments with boundaries, but only for a restricted class of connectives
The identification of Arg1 has been only partially addressed in previous work (Prasad 2010)
Automatic surface-sense classification (at class level) has already reached the upper bound of inter-annotator agreement (Pitler and Nenkova, 2009)
Data & Tools
Corpus used: Penn Discourse TreeBank (PDTB)
For the gold-standard system: Penn TreeBank (PTB) corpus
Third-party software/scripts used:
Stanford Syntactic Tree Parser (Klein & Manning 2003)
AddDiscourse (explicit connective classification) (Pitler and Nenkova 2009)
ChunkLink.pl to extract IOB chains (by Sabine Buchholz: CoNLL Shared Task 2000)
RootExtractor: Syntactic Parse Tree (SPT) processor (by Richard Johansson)
Morpha (Minnen et al. 2001)
Conditional Random Field: CRF++ by Taku Kudo
Overall Architecture
The syntactic tree parser is used in the automatic systems
The connective detection and classification tool is used in the automatic systems
PDTB and PTB are not used during the end-to-end automatic testing phase
End2End Testing Phase
Conditional Random Field
We use the CRF++ tool (http://crfpp.sourceforge.net/) for sequence labeling (Lafferty et al., 2001), with second-order Markov dependency between tags.
Besides specifying individual features in the feature template, we also represent combinations of features.
We chose this tool because the CRF++ output is compatible with the CoNLL 2000 chunking shared task format, and we view our task as a discourse chunking task.
Moreover, linear-chain CRFs for sequence labeling offer advantages over both generative models such as HMMs and classifiers applied at each sequence position. Sha and Pereira (2003) also claim that, as a single model, CRFs outperform other models for shallow parsing.
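A template with individual and combined features can be generated programmatically. CRF++ templates use `%x[row,col]` macros, where `row` is an offset relative to the current token and `col` is a column index. Lines starting with `U` define observation features, and a `B` line adds features over adjacent output tags. The window size and column pairs below are illustrative assumptions, not the configuration used in this work.

```python
from typing import List, Sequence, Tuple


def crfpp_template(n_columns: int, window: int = 2,
                   pairs: Sequence[Tuple[int, int]] = ()) -> str:
    """Build CRF++ template text: unigram features for every feature column over a
    +/-window of tokens, optional column pairs combined at the current token,
    and a final 'B' line for output-tag bigram features."""
    lines: List[str] = []
    k = 0  # running feature id used as the template prefix (U00, U01, ...)
    for col in range(n_columns):
        for offset in range(-window, window + 1):
            lines.append(f"U{k:02d}:%x[{offset},{col}]")
            k += 1
    for a, b in pairs:
        # Combination feature, e.g. token and POS of the current token together.
        lines.append(f"U{k:02d}:%x[0,{a}]/%x[0,{b}]")
        k += 1
    lines.append("B")
    return "\n".join(lines) + "\n"


print(crfpp_template(2, window=1, pairs=[(0, 1)]), end="")
```

`n_columns` counts only feature columns; the answer column is never referenced. The resulting file is used for training as `crf_learn template train.data model`, and the model is applied with `crf_test -m model test.data`.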
Hill Climbing Algorithm
function HILL-CLIMBING(problem) returns a state that is a local maximum
  current ← MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor ← a highest-valued successor of current
    if neighbor.VALUE ≤ current.VALUE then return current.STATE
    current ← neighbor
[Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig]
Hill climbing is the most basic local search technique: at each step the current node is replaced by its best neighbor.
Here the best neighbor is the one with the highest VALUE; if a heuristic cost estimate h is used instead, we would pick the neighbor with the lowest h.
Hill climbing is a greedy, fast local search
We then optimized the selected feature set with a feature ablation technique, leaving out 1 feature at a time
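Hill climbing and ablation can be applied to feature selection as follows: a state is a feature set, its neighbors add one unused feature, and VALUE is a user-supplied score (for instance, dev-set F1 of a model trained on that subset). The search stops when no neighbor is strictly better, as in the pseudocode. The ablation step then leaves out one feature at a time. This is a minimal sketch: the function names, the forward-selection neighborhood and the rule of dropping a feature when the score does not decrease are assumptions.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical scoring hook: e.g. train on a feature subset and return dev-set F1.
Score = Callable[[FrozenSet[str]], float]


def hill_climb(features: Iterable[str], score: Score) -> FrozenSet[str]:
    """Greedy forward selection: move to the best neighbor until none is better."""
    remaining = set(features)
    current: FrozenSet[str] = frozenset()
    current_value = score(current)
    while remaining:
        # Neighbors: the current set plus one unused feature (sorted for determinism).
        best = max(sorted(remaining), key=lambda f: score(current | {f}))
        best_value = score(current | {best})
        if best_value <= current_value:  # no strictly better neighbor: local maximum
            break
        current, current_value = current | {best}, best_value
        remaining.discard(best)
    return current


def ablate(selected: FrozenSet[str], score: Score) -> FrozenSet[str]:
    """Leave one feature out at a time; drop it when the score does not decrease."""
    current, current_value = selected, score(selected)
    improved = True
    while improved and current:
        improved = False
        for f in sorted(current):
            candidate = current - {f}
            value = score(candidate)
            if value >= current_value:
                current, current_value = candidate, value
                improved = True
                break
    return current


if __name__ == "__main__":
    toy_weights = {"T": 0.3, "CONN": 0.2, "IOB": 0.25, "NOISE": -0.05}  # made-up gains

    def toy_score(subset: FrozenSet[str]) -> float:
        return sum(toy_weights[f] for f in subset)

    print(sorted(ablate(hill_climb(toy_weights, toy_score), toy_score)))  # ['CONN', 'IOB', 'T']
```

Each call to `score` may mean training a CRF, so in practice caching scores per subset would be worthwhile.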
Features
The IOB (Inside-Outside-Begin) chain corresponds to the syntactic categories
of all the constituents on the path between the root node and the current leaf node of the tree.
For example, the IOB chain feature for "flashed" is I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate
whether the given token is respectively at the beginning, inside, at the end of the constituent,
or a single token chunk. In this case, "flashed" is inside the root S and at the end of every other constituent in the chain,
except for the last VP, which dominates one single leaf.
Results: Gold-labeled & Automatic

Gold-labeled Sys Output (baseline results in parentheses)
                P            R            F1
Arg2 Exact      0.84 (0.53)  0.74 (0.46)  0.79 (0.49)
Arg2 Partial    0.93 (0.80)  0.82 (0.85)  0.88 (0.82)
Arg2 Overlap    0.97 (0.98)  0.88 (0.85)  0.92 (0.91)
Arg1 Exact      0.68 (0.19)  0.39 (0.19)  0.49 (0.19)
Arg1 Partial    0.81 (0.50)  0.51 (0.68)  0.62 (0.58)
Arg1 Overlap    0.91 (0.70)  0.52 (0.68)  0.66 (0.69)

Automatic Sys Output
                          P     R     F1
Arg2 Exact                0.80  0.74  0.77
Arg2 Partial              0.91  0.85  0.88
Arg2 Overlap              0.97  0.88  0.92
Arg1 (semi-auto) Exact    0.64  0.31  0.42
Arg1 (semi-auto) Partial  0.76  0.39  0.52
Arg1 (semi-auto) Overlap  0.84  0.40  0.54
Arg1 (full auto) Exact    0.63  0.28  0.39
Arg1 (full auto) Partial  0.74  0.36  0.48
Arg1 (full auto) Overlap  0.83  0.37  0.51
Combo Results

Auto Conn + Gold SPT
               P     R     F1
Arg2 Exact     0.82  0.70  0.76
Arg2 Partial   0.93  0.79  0.85
Arg2 Overlap   0.96  0.83  0.89
Arg1 Exact     0.67  0.31  0.43
Arg1 Partial   0.81  0.44  0.57
Arg1 Overlap   0.94  0.44  0.60

Gold Conn + Auto SPT
               P     R     F1
Arg2 Exact     0.76  0.61  0.68
Arg2 Partial   0.91  0.73  0.81
Arg2 Overlap   0.96  0.77  0.85
Arg1 Exact     0.62  0.31  0.41
Arg1 Partial   0.76  0.42  0.54
Arg1 Overlap   0.87  0.43  0.58
Results: replaced IOB chain
               P     R     F1
Arg2 Exact     0.80  0.74  0.77
Arg2 Partial   0.91  0.85  0.88
Arg2 Overlap   0.97  0.88  0.92
Arg1 Exact     0.65  0.29  0.40
Arg1 Partial   0.80  0.43  0.56
Arg1 Overlap   0.97  0.43  0.60