Proposition Bank: a resource of predicate-argument relations
Martha Palmer, University of Pennsylvania
October 9, 2001, Columbia University
Outline

• Overview (ACE consensus: BBN, NYU, MITRE, Penn)
• Motivation
• Approach
  - Guidelines, lexical resources, frame sets
  - Tagging process, hand correction of automatic tagging
• Status: accuracy, progress
• Colleagues: Joseph Rosenzweig, Paul Kingsbury, Hoa Dang, Karin Kipper, Scott Cotton, Lauren Delfs, Christiane Fellbaum

Proposition Bank: Generalizing from Sentences to Propositions

Many surface forms express the same underlying proposition:

  Powell met Zhu Rongji
  Powell and Zhu Rongji met
  Powell met with Zhu Rongji
  Powell and Zhu Rongji had a meeting
    => Proposition: meet(Powell, Zhu Rongji), an instance of meet(Somebody1, Somebody2)

(The slide contrasts "meet" with nearby verbs that would yield different propositions: battle, wrestle, join, debate, consult.)

A sentence can also yield several, possibly nested, propositions:

  When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
    => meet(Powell, Zhu)
    => discuss([Powell, Zhu], return(X, plane))

Penn English Treebank

• 1.3 million words, Wall Street Journal and other sources
• Tagged with part-of-speech
• Syntactically parsed
• Widely used in the NLP community
• Available from the Linguistic Data Consortium

A TreeBanked Sentence

Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

  (S (NP-SBJ Analysts)
     (VP have
         (VP been
             (VP expecting
                 (NP (NP a GM-Jaguar pact)
                     (SBAR (WHNP-1 that)
                           (S (NP-SBJ *T*-1)
                              (VP would
                                  (VP give
                                      (NP the U.S. car maker)
                                      (NP (NP an eventual (ADJP 30 %) stake)
                                          (PP-LOC in (NP the British company))))))))))))

(The slide also shows the same parse drawn as a tree diagram.)

The same sentence, PropBanked

The same bracketing, with argument labels attached to constituents:

  (S Arg0:(NP-SBJ Analysts)
     (VP have
         (VP been
             (VP expecting
                 Arg1:(NP (NP a GM-Jaguar pact)
                     (SBAR (WHNP-1 that)
                           (S Arg0:(NP-SBJ *T*-1)
                              (VP would
                                  (VP give
                                      Arg2:(NP the U.S. car maker)
                                      Arg1:(NP (NP an eventual (ADJP 30 %) stake)
                                          (PP-LOC in (NP the British company))))))))))))

    => expect(Analysts, GM-J pact)
    => give(GM-J pact, US car maker, 30% stake)
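The functional notation above boils down to a predicate lemma plus numbered arguments. Here is a minimal sketch of that idea in Python; the names and layout are illustrative, not PropBank's actual file format:

    # A proposition: a predicate lemma plus numbered arguments.
    # Illustrative only; PropBank's real storage format differs.
    from dataclasses import dataclass

    @dataclass
    class Proposition:
        predicate: str   # verb lemma, e.g. "meet"
        args: dict       # numbered arguments, e.g. {"Arg0": "Powell"}

        def __str__(self):
            return f"{self.predicate}({', '.join(self.args.values())})"

    meet = Proposition("meet", {"Arg0": "Powell", "Arg1": "Zhu Rongji"})
    give = Proposition("give", {
        "Arg0": "GM-Jaguar pact",
        "Arg2": "the U.S. car maker",
        "Arg1": "an eventual 30% stake in the British company",
    })
    print(meet)  # meet(Powell, Zhu Rongji)
    print(give)  # give(GM-Jaguar pact, the U.S. car maker, an eventual 30% stake ...)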
Motivation

Why do we need accurate predicate-argument relations? They have a major impact on information processing.

Example: Korean/English machine translation (ARL/SBIR)
• CoGenTex, Penn, Systran (K/E bilingual lexicon, 20K)
• 4K words (< 500 words from Systran, military messages)
• Plug-and-play architecture based on DSyntS (rich dependency structure)
• A converter bug led to random relabeling of predicate arguments
• Correcting the predicate argument labels alone tripled the acceptable sentence output

Focusing on Parser comparisons

200 sentences were hand-selected to represent "good" translations given a correct parse. Used to compare:
• Corrected DSyntS output
• Juntae's parser output (off-the-shelf)
• Anoop's parser output (Treebank-trained, 95% F)

Evaluating translation quality

Compare the DLI human translation to system output (200 sentences). Criteria used by the human judges (2 or more, not blind):
• [g] = good, exactly right
• [f1] = fairly good, but small grammatical mistakes
• [f2] = needs fixing, but the vocabulary is basically there
• [f3] = needs quite a bit of fixing, usually some untranslated vocabulary, but most vocabulary is right
• [m] = seems grammatical, but semantically wrong, actually misleading
• [i] = irredeemable, really wrong, major problems

Results Comparison

200 sentences (Correct = corrected DSyntS output):

             Anoop   Juntae   Correct
  Bad          5       9        3
  Fixable     85      67       11
  Good        10      24       85

Plug and play?

• A converter was used to map the parser outputs into the MT DSyntS format
• A bug in the converter affected both systems: predicate argument structure labels were being lost in the conversion process and relabeled randomly
• The converter was also still tuned to Juntae's parse output and needed to be customized to Anoop's

Anoop's parse -> MTW DSyntS

0010 Target: Unit designations are normally transmitted in code.
0010 Corrected: Normally unit designations are notified in the code.
0010 Anoop: Normally it is notified unit designations in code.
(Dependency tree rooted at "notified", with dependents "designations" (modified by "unit"), "normally", and "code"; the slide flags a label mismatch on "designations": P = Arg0, C = Arg1.)

Anoop's parse -> MTW DSyntS

0022 Target: Under what circumstances does radio interference occur?
0022 Corrected: In what circumstances does the interference happen in the radio?
0022 Anoop: Do in what circumstance happen interference in radio?
(Dependency tree rooted at "happen", with dependents "circumstances" (modified by "what"), "interference", and "radio"; the slide flags two label mismatches: P = ArgM vs. C = Arg0, and P = Arg0 vs. C = Arg1.)

New and Old Results Comparison

             A2      A1      J2      J1    Correct
  Bad         4.5     5       4       9      3
  Fixable    60.5    85      64.5    67     11
  Good       37      10      31      24     85

(A = Anoop, J = Juntae; 1 = old results, 2 = new results with the repaired converter.)

English PropBank

• 1M words of Treebank over 2 years, May '01-'03
• New semantic augmentations
  - Predicate-argument relations for verbs
  - Label arguments: Arg0, Arg1, Arg2, ...
  - First subtask: a 300K-word financial subcorpus (12K sentences, 35K+ predicates)
• Spin-off: guidelines (necessary for the annotators)
  - An English lexical resource
  - 6000+ verbs with labeled examples, rich semantics

Task: not just undoing passives

  The earthquake shook the building.
  <arg0>         <WN3>  <arg1>

  The walls shook; the building rocked.
  <arg1>    <WN3>  <arg1>        <WN1>

The guidelines = a lexicon with examples: the Frames Files.

Guidelines: Frames Files

• Created manually by Paul Kingsbury
  - Working on semi-automatic expansion
• Refer to VerbNet, WordNet and FrameNet
• Currently in place for 230 verbs
  - Can expand to 2000+ using VerbNet
  - Will need hand correction
• Use "semantic role glosses" unique to each verb (mapped to the Arg0, Arg1 labels appropriate to the class)

Frames Example: expect

Roles:
  Arg0: expecter
  Arg1: thing expected
Example (transitive, active):
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL:  expect
  Arg1: further declines in interest rates

Frames File example: give

Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example (double object):
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL:  gave
  Arg2: the chefs
  Arg1: a standing ovation
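The role inventories and tagged examples in these frames entries are structured enough to mirror directly as data. A sketch assuming a hypothetical dictionary layout (the actual Frames Files are a separate, richer format):

    # Hypothetical mirror of the "expect" and "give" frames entries above.
    FRAMES = {
        "expect": {
            "roles": {"Arg0": "expecter", "Arg1": "thing expected"},
            "example": {
                "text": "Portfolio managers expect further declines in interest rates.",
                "Arg0": "Portfolio managers",
                "REL": "expect",
                "Arg1": "further declines in interest rates",
            },
        },
        "give": {
            "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
            "example": {
                "text": "The executives gave the chefs a standing ovation.",
                "Arg0": "The executives",
                "REL": "gave",
                "Arg2": "the chefs",
                "Arg1": "a standing ovation",
            },
        },
    }

    def describe(verb):
        """Print a verb's role inventory, as an annotator might look it up."""
        for arg, gloss in FRAMES[verb]["roles"].items():
            print(f"{verb}: {arg} = {gloss}")

    describe("give")  # give: Arg0 = giver, Arg1 = thing given, Arg2 = entity given to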
The same sentence, PropBanked

(This slide repeats the GM-Jaguar example from above, now read against frames entries like those for expect and give.)

Complete Sentence

Analysts have been expecting a GM-Jaguar pact that *T*-1 would give the U.S. car maker an eventual 30% stake in the British company and create a joint venture that *T*-2 would produce an executive-model range of cars.

How are arguments numbered?

• Examination of example sentences
• Determination of required / highly preferred elements
• Sequential numbering; Arg0 is the typical first argument, except for ergative/unaccusative verbs (the shake example)
• Arguments are mapped across "synonymous" verbs

Additional tags (arguments or adjuncts?)

A variety of ArgMs (Arg# > 4):
• TMP - when?
• LOC - where at?
• DIR - where to?
• MNR - how?
• PRP - why?
• REC - himself, themselves, each other
• PRD - this argument refers to or modifies another
• ADV - others

Tense/aspect

Verbs are also marked for tense/aspect:
• Passive
• Perfect
• Progressive
• Infinitival
Modals and negation are marked as ArgMs.

Ergative/Unaccusative Verbs: rise

Roles:
  Arg1 = logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3* = start point
  Arg4 = end point

  Sales rose 4% to $3.28 billion from $3.16 billion.

*Note: the preposition has to be mentioned explicitly (Arg3-from, Arg4-to); ArgM-Source and ArgM-Goal could have been used instead. An arbitrary distinction.

Synonymous Verbs: add in the sense of rise

Roles:
  Arg1 = logical subject, patient, thing rising/gaining/being added to
  Arg2 = EXT, amount risen
  Arg4 = end point

  The Nasdaq composite index added 1.01 to 456.6 on paltry volume.

Phrasal Verbs

• put together
• put in
• put off
• put on
• put out
• put up
• ...

Frames: Multiple Rolesets

• Rolesets are not necessarily consistent between different senses of the same verb
  - A verb with multiple senses can have multiple frames, but does not necessarily
• Roles and mappings onto argument labels are consistent between different verbs that share similar argument structures (similar to FrameNet)
  - Levin / VerbNet classes: http://www.cis.upenn.edu/~dgildea/VerbNet/
• Of the 179 most frequent verbs:
  - 1 roleset: 92
  - 2 rolesets: 45
  - 3+ rolesets: 42 (includes light verbs)

Annotation procedure

• Extraction of all sentences with a given verb
• First pass: automatic tagging
• Second pass: double-blind hand correction
  - Annotators from a variety of backgrounds
  - Less syntactic training than for treebanking
• A script discovers discrepancies between the two corrections
• Third pass: Solomonization (adjudication)

Inter-annotator agreement

[Figure: per-verb inter-annotator agreement, in percent. Agreement varies widely by verb: quote 100, comment 92, compare 91, earn 90, announce 87, change 84, end 84, seem 83, result 82-83, approve 81, close 80, fall 76, want 75, elect 75, return 73, begin 70, bid 70, cost 67, work 63, climb 62, know 61, find 61, call 59, hit 57, cause 55, decline 53, keep 52, sell 52, add 51, leave 50, buy 48, base 46, offer 43, name 41, bring 39, see 34, gain 29, tell 18, believe 11, among others. A sketch of how such per-verb figures could be computed follows.]
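One way to picture how per-verb figures like these arise: treat each annotator's labeling of a predicate instance as a record and count exact matches between the two double-blind passes. This is a hedged sketch with a made-up record layout, not the project's actual discrepancy script:

    # Exact-match agreement between two annotators over the same instances.
    # The record layout ({arg_label: text span}) is an assumption for illustration.
    def agreement(instances_a, instances_b):
        """Percentage of predicate instances labeled identically by both annotators."""
        matches = sum(a == b for a, b in zip(instances_a, instances_b))
        return 100.0 * matches / len(instances_a)

    ann_a = [{"Arg0": "Analysts", "Arg1": "a GM-Jaguar pact"},
             {"Arg1": "Sales", "Arg2": "4%", "Arg4": "to $3.28 billion"}]
    ann_b = [{"Arg0": "Analysts", "Arg1": "a GM-Jaguar pact"},
             {"Arg1": "Sales", "Arg2": "4%", "Arg4": "$3.28 billion"}]
    print(f"{agreement(ann_a, ann_b):.0f}%")  # 50% -- one of two instances matches exactly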
Annotator Accuracy vs. Gold Standard

One version of the annotation is chosen (the senior annotator's); Solomon modifies it => the gold standard.

[Table: per-verb annotator accuracy against the gold standard for Darren, Erwin, Kate, and Katherine on nine verbs (acquire, add, announce, bid, cost, decline, hit, keep, know). The cell alignment did not survive extraction; the reported scores run from roughly 50% (Erwin's lowest) to 99%, with Darren at 85-90% on his verbs.]

Status

• 179 verbs framed (+ Senseval-2 verbs)
• 97 verbs first-passed
  - 12,300+ predicates
  - Does not include the ~3,000 predicates tagged for Senseval
• 54 verbs second-passed
  - 6,600+ predicates
• 9 verbs solomonized
  - 885 predicates

Throughput

• Framing: approximately 2 verbs per hour
• Annotation: approximately 50 sentences per hour
• Solomonization: approximately 1 hour per verb

Automatic Predicate Argument Tagger

• Assigns predicate argument labels
  - Uses TreeBank "cues"
  - Consults a lexical semantic KB: hierarchically organized verb subcategorization frames and alternations associated with tree templates; an ontology of noun-phrase referents; multi-word lexical items
  - Matches annotated tree templates against the parse, Tree-Adjoining Grammar style
  - Writes standoff annotation in an external file referencing tree nodes (see the sketch at the end of this transcript)
• Preliminary accuracy rate of 83.7% (800+ predicates)

Summary

• Predicate-argument structure labels are arbitrary to a certain degree, but still consistent, and generic enough to be mappable to particular theoretical frameworks
• Automatic tagging as a first pass makes the task feasible
• The agreement and accuracy figures are reassuring

Solomonization

Source tree: Intel told analysts that the company will resume shipments of the chips within two to three weeks.

kate said:
  arg0: Intel
  arg1: the company will resume shipments of the chips within two to three weeks
  arg2: analysts
erwin said:
  arg0: Intel
  arg1: that the company will resume shipments of the chips within two to three weeks
  arg2: analysts

Solomonization

Source tree: Such loans to Argentina also remain classified as non-accruing, *TRACE*-1 costing the bank $ 10 million *TRACE*-*U* of interest income in the third period. (*TRACE*-1 = "Such loans to Argentina")

kate said:
  argM-TMP: in the third period
  arg3: the bank
  arg2: $ 10 million *TRACE*-*U* of interest income
  arg1: *TRACE*-1
erwin said:
  argM-TMP: in the third period
  arg3: the bank
  arg2: $ 10 million *TRACE*-*U* of interest income
  arg1: *TRACE*-1

Solomonization

Source tree: Also, substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth.

kate said:
  argM-MNR: relative to earnings growth
  arg3-PRD: flat
  arg1: its tax outlay
  arg0: the company
katherine said:
  argM-ADV: relative to earnings growth
  arg3-PRD: flat
  arg1: its tax outlay
  arg0: the company
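The tagger slide above notes that labels are stored as standoff annotation in an external file referencing tree nodes. The sketch below is purely illustrative; the field names and the node-addressing scheme are assumptions, not the project's actual format:

    # Hypothetical standoff record: the label points back at a Treebank
    # constituent instead of copying the text it covers.
    from dataclasses import dataclass

    @dataclass
    class StandoffLabel:
        file_id: str   # source Treebank file (invented name below)
        tree_no: int   # index of the tree within that file
        node_id: int   # index of the constituent node within that tree
        label: str     # "REL", "Arg0", "Arg1", ..., "ArgM-TMP", ...

    annotation = [
        StandoffLabel("wsj_0231.mrg", 12, 1, "Arg0"),  # (NP-SBJ Analysts)
        StandoffLabel("wsj_0231.mrg", 12, 7, "REL"),   # expecting
        StandoffLabel("wsj_0231.mrg", 12, 8, "Arg1"),  # (NP a GM-Jaguar pact ...)
    ]
    for record in annotation:
        print(record)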