Automatic Labeling of Semantic Roles
By Daniel Gildea and Daniel Jurafsky
Presented by Kino Coursey
Outline
• Their Goals
• Semantic Roles
• Related Work
• Methodology
• Results
• Their Conclusions
Their Goals
To create a system that can identify the semantic relationships, or semantic roles, filled by the syntactic constituents of a sentence and place them into a semantic frame.
Lexical and syntactic features are derived from parse trees and used to train statistical classifiers on hand-annotated training data.
Potential users
Shallow semantic analysis would be useful in a number of NLP tasks:
• Domain-independent starting point for information extraction
• Word sense disambiguation based on the current semantic role
• Intermediate representation for translation and summarization
• Adding semantic roles could improve parser and speech recognition accuracy
Their Approach
• Treat the role assignment problem like other tagging problems
• Use recent successful methods in probabilistic parsing and statistical classification
• Use the hand-labeled FrameNet database to provide training information: over 50,000 sentences from the BNC
• FrameNet roles define the tag set
Semantic Roles
Historically, two types of roles:
• Very abstract, like AGENT and PATIENT
• Verb-specific, like EATER and EATEN for “eat”
FrameNet defines an intermediate, schematic representation of situations, with participants, props, and conceptual roles.
A frame, being a situation description, can be activated by multiple verbs or other constituents.
Frame Advantages
• Avoids the difficulty of trying to find a small set of universal, abstract, or thematic roles
• Has as many roles as necessary to describe the situation with minimal loss of information and discrimination
• Abstract roles can be defined as high-level roles of abstract frames such as “action” or “motion” at the top of the hierarchy
Example Domains and Frames
Examples of Semantic Roles
Example FrameNet Markup
Related Work
Traditional parsing and understanding systems rely on hand-developed grammars
• Must anticipate the way semantic roles are realized through syntax
• Time-consuming to develop
• Limited coverage (human proscriptive recall problem)
Related Work
Others have used data-driven approaches for template-based semantic analysis in “shallow” systems:
• Miller (1996), Air Travel Information System: probability of a constituent filling slots in frames. Each node could have both semantic and syntactic elements
• Data-driven information extraction by Riloff: automatically derived case frames for words in a domain
Related Work
Blaheta and Charniak used a statistical algorithm for assigning Penn Treebank function tags, with an F-measure of 87%, rising to 99% when ‘no tag’ is a valid choice
Methodology
Two-part strategy
• Identify the boundaries of the frame elements in the sentence
• Given the boundaries, label each with the correct role
Statistics-based: train a classifier on a labeled training set, then test on an unlabeled test set
Methodology
Training
• Trained using the Collins parser on 37,000 sentences
• Match annotated frame elements to parse constituents
• Extract various features from the string of words and the parse tree
Testing
• Run the parser on test sentences and extract the same features
• Probability for each semantic role r is computed from the features
Features used
Phrase Type: standard syntactic type (NP, VP, S)
Grammatical Function
• Relation to the rest of the sentence (subject of verb, object of verb…)
• Limited to NPs
Position
• Before or after the predicate defining the frame
• Correlated with grammatical function
• Redundant backup information
Voice: used 10 passive-identifying patterns for active/passive classification
Head Word: head word of each constituent
Parsed Sentence with FrameNet role assignments
Testing
FrameNet corpus test set
• 10% of the sentences for each target word -> test set
• 10% of the sentences for each target word -> tuning set
• Target words with fewer than 10 examples ignored
• Average number of sentences per target word = 34 [Too SPARSE !!!]
• Average number of sentences per frame = 732
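The split described above can be sketched in a few lines. This is a minimal sketch, not the authors' procedure: the slide does not say how the 10% was chosen, so random selection, the fixed seed, and the function name `split_by_target` are all assumptions.

```python
import random
from collections import defaultdict


def split_by_target(examples, min_count=10, seed=0):
    """Split (target_word, sentence) pairs into train, tuning and test sets.

    Assumption: the held-out 10% portions are drawn at random per target word.
    """
    by_target = defaultdict(list)
    for target, sentence in examples:
        by_target[target].append(sentence)

    rng = random.Random(seed)
    train, tune, test = [], [], []
    for target, sentences in sorted(by_target.items()):
        if len(sentences) < min_count:
            continue  # target words with fewer than 10 examples are ignored
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        n_held = max(1, len(shuffled) // 10)  # roughly 10% per held-out set
        test += [(target, s) for s in shuffled[:n_held]]
        tune += [(target, s) for s in shuffled[n_held:2 * n_held]]
        train += [(target, s) for s in shuffled[2 * n_held:]]
    return train, tune, test
```

With 34 sentences for a target word (the average quoted above), this leaves 3 for testing, 3 for tuning and 28 for training, which is why per-target-word statistics are so sparse.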
Sparseness Problem
Problem: Data is too sparse to directly calculate probabilities on the full set of features
Approach: Build classifiers by combining probabilities from distributions conditioned on combinations of features
Additional problem: FrameNet data was selected to show prototypical examples of semantic frames, not as a random sample for each frame
Approach: Collect more data in the future
Results: Probability Distributions
Coverage = % of test data seen in training
Accuracy = % of test data correctly predicted (similar to precision)
Performance = overall % of test data for which correct role is predicted (similar to recall)
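A small sketch of how the three measures relate. It assumes that accuracy is computed only over the covered examples (which is what makes it similar to precision) and that an uncovered example is represented by a prediction of `None`. Both are my reading of the definitions, and the function name `evaluate` is hypothetical.

```python
def evaluate(predictions, gold):
    """Return (coverage, accuracy, performance) for one distribution.

    predictions: predicted role per test example, or None when the
                 conditioning context was never seen in training.
    gold:        the correct role per test example.
    """
    total = len(gold)
    covered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(1 for p, g in covered if p == g)

    coverage = len(covered) / total                       # seen in training
    accuracy = correct / len(covered) if covered else 0.0  # correct among covered
    performance = correct / total                         # correct among all
    return coverage, accuracy, performance
```

Under this reading, performance is coverage times accuracy. A distribution covering 56.0% of the data at 86.7% accuracy would therefore score just under 49% overall.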
Results: Simple Probabilities
Used simple empirical distributions
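As a rough illustration of a simple empirical distribution, the sketch below estimates P(role | features) by relative frequency, that is, count(role, features) / count(features). The dictionary layout, the feature names `pt` and `target`, and the role labels in the usage note are illustrative assumptions, not FrameNet data.

```python
from collections import Counter, defaultdict


def empirical_distribution(examples, feature_names):
    """Estimate P(role | features) by relative frequency.

    examples:      list of dicts, each with a "role" key plus feature keys.
    feature_names: the features to condition on, e.g. ["pt", "target"].
    Returns a dict mapping a feature-value tuple to {role: probability}.
    """
    counts = defaultdict(Counter)
    for ex in examples:
        context = tuple(ex[name] for name in feature_names)
        counts[context][ex["role"]] += 1

    return {
        context: {role: n / sum(role_counts.values())
                  for role, n in role_counts.items()}
        for context, role_counts in counts.items()
    }
```

For example, if three NP constituents of "give" are labeled DONOR, DONOR and THEME in training, the estimate for the context ("NP", "give") is DONOR 2/3 and THEME 1/3. A context that never occurs in training has no entry at all, which is the coverage problem.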
Results: Linear Interpolation
Results: Geometric mean in the log domain
Results: Combining data
• Schemes that gave more weight to distributions with more data did not have a significant effect
• Role assignments depend only on the relative ranking of roles, so fine-tuning the weights makes little difference
• Backoff combination: use less specific data only if more specific data is missing
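The three combination schemes named on these slides (linear interpolation, geometric mean in the log domain, and backoff) can be sketched as follows. This is a simplified illustration under stated assumptions: unseen contexts are passed as `None` and the weights are renormalized over the available distributions, zero probabilities are floored at a small constant before taking logs, and backoff is shown as a single chain from most to least specific. None of these details are given on the slides.

```python
import math


def _available(dists, weights):
    # Keep only distributions whose conditioning context was seen in training.
    return [(d, w) for d, w in zip(dists, weights) if d is not None]


def linear_interpolation(dists, weights):
    """Weighted average of the available distributions."""
    avail = _available(dists, weights)
    total_w = sum(w for _, w in avail)
    roles = set().union(*(d.keys() for d, _ in avail))
    return {r: sum(w * d.get(r, 0.0) for d, w in avail) / total_w for r in roles}


def geometric_mean(dists, weights, floor=1e-6):
    """Weighted average of log probabilities, renormalized to sum to 1."""
    avail = _available(dists, weights)
    total_w = sum(w for _, w in avail)
    roles = set().union(*(d.keys() for d, _ in avail))
    scores = {
        r: math.exp(sum(w * math.log(max(d.get(r, 0.0), floor))
                        for d, w in avail) / total_w)
        for r in roles
    }
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}


def backoff(dists):
    """Use the most specific available distribution (list is ordered
    from most specific to least specific)."""
    for d in dists:
        if d is not None:
            return d
    return {}
```

Because the predicted role is just the highest-scoring one, any two schemes that rank the roles in the same order make the same prediction, which fits the observation that fine-tuning the weights makes little difference.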
Results: Linear Backoff was the best
Final system performance: 80.4%, up from the 40.9% baseline
• Linear backoff: 80.4% on the development set and 76.9% on the test set
• Baseline: 40.9% on the development set and 40.6% on the test set
Results: Their Discussions
Constituent position relative to the target word plus active/passive information (78.8%) performed as well as reading grammatical functions off the parse tree (79.2%)
Using active/passive information can improve performance from 78.8% to 80.5%; 5% of examples were passives
Lexicalization via head words, when available, is good
• P(role|head,target) is available for only 56.0% of the data
• P(role|head,target) is 86.7% correct without using any syntactic features
Results: Lexical Clustering
Since head words performed so well but are so sparse, try to use clustering to improve coverage
Compute soft clusters for nouns, using only frame elements with noun head words from the BNC
$$P(r \mid h, nt, t) = \sum_{c} P(r \mid c, nt, t)\, P(c \mid h)$$
where the sum is over the clusters c that head word h belongs to
Unclustered data is 87.6% correct but covers only 43.7%
Clustered head words are 79.9% correct for the 97.9% of nominal head words in the vocabulary.
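A minimal sketch of the cluster-based estimate, P(r|h,nt,t) as a sum over clusters of P(r|c,nt,t) times P(c|h). The table layouts, the cluster names and the probabilities in the usage note are invented for illustration. The nominal-type condition nt is left implicit by assuming both tables were built from noun-headed frame elements only.

```python
def clustered_role_prob(role, head, target,
                        p_cluster_given_head, p_role_given_cluster):
    """Estimate P(role | head, nt, target) through soft noun clusters.

    p_cluster_given_head: {head: {cluster: P(cluster | head)}}
    p_role_given_cluster: {(cluster, target): {role: P(role | cluster, nt, target)}}
    """
    memberships = p_cluster_given_head.get(head, {})
    return sum(
        p_c * p_role_given_cluster.get((cluster, target), {}).get(role, 0.0)
        for cluster, p_c in memberships.items()
    )
```

For instance, if "teacher" belongs to a person-like cluster with probability 0.8 and an institution-like cluster with probability 0.2, and those clusters give DONOR probabilities of 0.6 and 0.3 for the target "give", the estimate is 0.8 × 0.6 + 0.2 × 0.3 = 0.54. A head word never seen with "give" can still get an estimate as long as other members of its clusters were, which is where the extra coverage comes from.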
Adding clustering of NP constituents improved performance from 80.4% to 81.2% (Question: Would other lexical semantic resources help?)
Automatic Identification of Frame Element Boundaries
• Original experiments used hand-annotated frame element boundaries
• Used features to identify which constituents in a sentence's parse tree are likely to be frame elements
• System is given the human-annotated target word and frame
• Main feature used: path from the target word through the parse tree to the constituent, using upward and downward links
• Used P(fe|path), P(fe|path,target) and P(fe|head,target)
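The path feature can be illustrated with a small tree walk: climb from the target word to the lowest node that also dominates the constituent, then descend to the constituent. The `Node` class, the function names and the arrow notation below are assumptions made for this sketch, not the authors' code.

```python
class Node:
    """A parse tree node with a parent pointer (hypothetical helper class)."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the node followed by its ancestors, up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def tree_path(target, constituent):
    """Labels on the path from target up to the lowest common ancestor,
    then down to the constituent."""
    up = ancestors(target)
    down = ancestors(constituent)
    down_ids = {id(n) for n in down}
    # First ancestor of the target that also dominates the constituent.
    i_up = next(i for i, n in enumerate(up) if id(n) in down_ids)
    i_down = next(i for i, n in enumerate(down) if n is up[i_up])
    path = "↑".join(n.label for n in up[:i_up + 1])
    down_labels = [n.label for n in reversed(down[:i_down])]
    if down_labels:
        path += "↓" + "↓".join(down_labels)
    return path
```

For a tree S(NP(PRP), VP(VB, NP(DT, NN))) with the verb as target, the subject NP gets the path VB↑VP↑S↓NP and the object NP gets VB↑VP↓NP. P(fe|path) is then the fraction of training constituents with a given path that were annotated as frame elements.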
Automatic Identification of Frame Element Boundaries
• P(fe|path,target) performs relatively poorly, since there are only about 30 sentences for each target word
• P(fe|head,target) alone is not a useful classifier, but helps with linear interpolation
• Can only identify frame elements that have a matching constituent in the parse tree, but partial matching helps
• With relaxed matching, 86% agreement with hand annotations
• When correctly identified frame elements are fed into the previous role labeler, 79.6% are labeled correctly, in the same range as with human-annotated boundaries
(Question: If it is correctly ID’ed, shouldn’t this be the case?)
Their Conclusions
• Their system can label roles with some accuracy
• Lexical statistics on constituent head words were the most important feature used
• The problem is that, while very accurate, they are very sparse
• The key to high overall performance was combining features
• The combined system was more accurate than any feature alone; the specific combination method was less important