Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier
Article Summary by Mark Vickers
Presentation Layout
Introduction to Research
Methods used
• Overview of GermaNet
• Overview of SPPC
• Details of their Approach
Results
Conclusion
Goal
Automatic acquisition of domain-relevant terms and their relations
How?
• Single-word terms: TFIDF classification
• Domain-relevant relations: use lexico-syntactic patterns:
  • Existing ontologies
  • Collocation methods
Input
No seed words
No syntactic patterns
Just a collection of classified documents
Methods Used
Builds on other systems:
• GermaNet (they built an Ontology Inference Machine to search GermaNet)
  For: accessing semantic relations
• SPPC (Shallow Processing Production Center)
  For: linguistic annotation
Accessing Semantic Relations: GermaNet
• Developed within the LSD Project at the Division of Computational
Linguistics of the Linguistics Department at the University of Tübingen,
Germany
• A lexical-semantic net
• German nouns, verbs, and adjectives are semantically grouped by an
underlying lexical concept (like a thesaurus) – called synsets
• Synsets are connected by semantic relations
• Lexical relationships include synonyms, antonyms, and “pertains to”
• Conceptual relations include hyponyms (‘is-a’), meronyms (‘has-a’),
entailment, and cause
• Based on the technology of WordNet (Princeton)
Accessing Semantic Relations: WordNet (figure slides)
Accessing Semantic Relations: Inference Machine
• Allows GermaNet’s relations to be searched by other applications
• Provides 3 different functions:
  • Retrieval of relations assigned to words
    Example: “Find all synonyms for the word bar” → rod, saloon, …
  • Retrieval of relations between words
    Example: “Find relations between Internet-Service-Provider and Company” → hyponym (so an ISP is a company)
  • Navigation in the GermaNet graph
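The three functions above can be pictured with a toy relation store. This is only a sketch: the triples, the function names, and the "Organization" node are made up for illustration and are not GermaNet data or the authors' Inference Machine interface.

```python
from collections import deque

# Toy (source, relation, target) triples. Illustrative only, not GermaNet data.
# ("A", "hyponym", "B") is read here as "A is a hyponym of B" (A is-a B).
TRIPLES = [
    ("bar", "synonym", "rod"),
    ("bar", "synonym", "saloon"),
    ("Internet-Service-Provider", "hyponym", "Company"),
    ("Company", "hyponym", "Organization"),  # hypothetical extra node
]

def related(word, relation):
    """Function 1: words linked to `word` by the given relation."""
    return [t for s, r, t in TRIPLES if s == word and r == relation]

def relations_between(a, b):
    """Function 2: relation labels on direct edges from `a` to `b`."""
    return [r for s, r, t in TRIPLES if s == a and t == b]

def path(start, goal):
    """Function 3: breadth-first navigation; returns a node list or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        current = queue.popleft()
        if current[-1] == goal:
            return current
        for s, _, t in TRIPLES:
            if s == current[-1] and t not in seen:
                seen.add(t)
                queue.append(current + [t])
    return None

print(related("bar", "synonym"))
print(relations_between("Internet-Service-Provider", "Company"))
print(path("Internet-Service-Provider", "Organization"))
```

A real lexical-semantic net would index the edges rather than scan a list, but the three query shapes stay the same.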
Linguistic Annotation: SPPC
SPPC (Shallow Processing Production Center)
• Robust German NLP that uses cascaded optimized weighted finite state devices
SPPC parts:
• Tokenizer
• Lexical Processor
• Part-of-Speech Filtering
• Named-entity Finder
• Chunk recognizer
Their Extraction Engine
Three main components:
1. TFIDF-based single-word term classifier
2. Lexico-syntactic pattern finder
   1. Learns patterns based on known relations
   2. Learns patterns based on term collocation methods
3. Relation extractor
Their Extraction Engine
1. Extract single-word terms (single-word term extraction with KFIDF)
2. Learn multi-word terms and identify syntactic patterns
3. Learn patterns from known relations
4. Extract related terms using the lexico-syntactic patterns found
Discovering Domain-Relevant Terms
• Apply a TFIDF measure: KFIDF
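The slides do not give the KFIDF formula. The sketch below assumes one category-based TFIDF variant: score(w, cat) = docs(w, cat) × log(n × |cats| / cats(w) + 1), where docs(w, cat) is the number of documents in a category that contain the word and cats(w) is the number of categories the word occurs in. The formula, the smoothing factor n, and the function name are assumptions and should be checked against the paper.

```python
import math
from collections import defaultdict

def kfidf(docs_by_category, n=1.0):
    """Score words per category with an assumed KFIDF-style measure.

    docs_by_category: {category: [iterable_of_words_per_document, ...]}
    Returns {category: {word: score}}.
    """
    num_cats = len(docs_by_category)
    doc_count = {cat: defaultdict(int) for cat in docs_by_category}
    cats_with_word = defaultdict(set)
    for cat, docs in docs_by_category.items():
        for doc in docs:
            for word in set(doc):  # count each word once per document
                doc_count[cat][word] += 1
                cats_with_word[word].add(cat)
    return {
        cat: {
            word: count * math.log(n * num_cats / len(cats_with_word[word]) + 1)
            for word, count in counts.items()
        }
        for cat, counts in doc_count.items()
    }

# Toy classified corpus (illustrative only).
corpus = {
    "drugs": [{"cocaine", "police"}, {"cocaine", "dealer"}],
    "sports": [{"goal", "police"}],
}
scores = kfidf(corpus)
print(sorted(scores["drugs"].items(), key=lambda kv: -kv[1]))
```

A word that appears in many documents of one category but in few categories overall scores highest, which is the intuition behind treating it as domain relevant.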
Their Extraction Engine: Collocation learner
Learning Term Collocations
Examples: man-eating shark, dead serious, depend on, blue-collar
Measures:
• Mutual Information (probabilities)
  - Occurrence of one word predicts the occurrence of another
  - Not practical for sparse data
• Log-Likelihood Measures (contingency tables)
  - Tells how much more likely the occurrence of one pair is than another
• T-test
  - Accept or reject the null hypothesis (terms are independent)
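The three measures can be sketched from raw counts. The versions below are the standard textbook formulations (pointwise mutual information, the log-likelihood ratio over a 2×2 contingency table, and a t statistic against independence); whether the paper uses exactly these variants is an assumption, and the counts are made up.

```python
import math

def pmi(c12, c1, c2, n):
    """Pointwise mutual information (base 2) of a word pair.

    c12: pair count, c1 / c2: single-word counts, n: corpus size.
    """
    return math.log2((c12 / n) / ((c1 / n) * (c2 / n)))

def log_likelihood(c12, c1, c2, n):
    """Log-likelihood ratio (G^2) over the 2x2 contingency table."""
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    expected = [
        c1 * c2 / n,
        c1 * (n - c2) / n,
        (n - c1) * c2 / n,
        (n - c1) * (n - c2) / n,
    ]
    # Cells with a zero observed count contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def t_score(c12, c1, c2, n):
    """t statistic for the null hypothesis that the two words are independent."""
    observed = c12 / n
    expected = (c1 / n) * (c2 / n)
    # Sample variance approximated by the observed mean (valid for small p).
    return (observed - expected) / math.sqrt(observed / n)

# Made-up counts: the pair occurs 30 times, the words 100 and 200 times,
# in a corpus of 10,000 tokens.
for measure in (pmi, log_likelihood, t_score):
    print(measure.__name__, measure(30, 100, 200, 10_000))
```

With sparse data a single co-occurrence of two rare words gives a very high mutual information score, which is the weakness the slide points to.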
Their Extraction Engine: Relation Extractor
Learning Relations with Lexico-syntactic Patterns
Example of a lexico-syntactic pattern finding relations
Pattern: “or other”
Sentence: Bruises, wounds, or other injuries are common.
Hyponym Relations: (Bruises, Injuries), (Wounds, Injuries)
------------------------------------------------------------------------------
Pattern: “as well as”
Sentence: Cocaine as well as Hashish, and LSD…
Near synonyms? -- Now we can match LSD to Drug domain
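The “or other” example can be sketched with a regular expression. This is a simplification for illustration: it matches single English words only, and the pattern and function name are assumptions; the system described in the slides works on German text annotated by SPPC, not on raw strings.

```python
import re

# Hearst-style pattern "X, Y, or other Z" (single-word terms only).
PATTERN = re.compile(r"((?:\w+, )*\w+),? or other (\w+)")

def hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the "or other" pattern."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        hypernym = match.group(2)
        for hyponym in match.group(1).split(", "):
            pairs.append((hyponym, hypernym))
    return pairs

print(hyponym_pairs("Bruises, wounds, or other injuries are common."))
# [('Bruises', 'injuries'), ('wounds', 'injuries')]
```

Every term listed before “or other” is paired with the term after it, which yields the two hyponym relations shown in the example.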
Learning Relations with Lexico-syntactic Patterns
• Inputs: extracted terms and GermaNet (semantic relationships)
• Terms with semantic relations (synonymy, hyponymy, meronymy)
• Put semantically similar fragments into Finkelstein-Landau and Morin’s algorithm to cluster patterns
• Domain-independent patterns and domain-specific patterns
• Term relation extractor applies newly extracted lexico-syntactic patterns
• List of related terms with possible hyponymous relations
• With near synonyms: search GermaNet to find common hyponyms, then assign the newly found hyponymous relation to the term not encoded in GermaNet
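The near-synonym step can be sketched as a vote over the concepts already known for the other terms in the group. The sketch assumes that the shared concept is a parent (such as Drug for Cocaine and Hashish); the function name and the dictionary standing in for GermaNet are hypothetical.

```python
from collections import Counter

def assign_hypernyms(near_synonyms, known_hypernyms):
    """Propose a parent concept for each term missing from `known_hypernyms`.

    near_synonyms: terms grouped by a pattern such as "as well as".
    known_hypernyms: {term: parent concept}, a stand-in for GermaNet lookups.
    """
    votes = Counter(
        known_hypernyms[term] for term in near_synonyms if term in known_hypernyms
    )
    if not votes:
        return {}  # nothing known about the group, so nothing to propose
    best, _ = votes.most_common(1)[0]
    return {term: best for term in near_synonyms if term not in known_hypernyms}

known = {"Cocaine": "Drug", "Hashish": "Drug"}  # toy data
print(assign_hypernyms(["Cocaine", "Hashish", "LSD"], known))
# {'LSD': 'Drug'}
```

This mirrors the earlier slide example, where LSD is matched to the Drug domain because it appears alongside Cocaine and Hashish.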
Results
• There’s a correlation between corpus size and precision
• LogLike delivers the best results compared with Mutual Information and the T-test
• Noun-Verb collocations were most prominent and had best results
• In Drug domain, N-V = 56% precision and N-N = 41% precision
Conclusion
• KFIDF proves promising for single-word term extraction
• Statistical measures are suitable for free-word-order languages like German
• Extracting term relations is useful for real-world IE
My Evaluation
+ Uses well known existing systems
+ Seemingly no human interaction
+ Domain Adaptive (robust)
- Precision does not seem to be too impressive, and recall?
I’d like to see more results
We see from the past few papers that automatic ontology generation approaches tend to:
• Combine multiple strategies (statistics, existing ontologies)
• Have a cyclic, machine-learning nature