Transcript Slide 1

Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier
Article Summary by Mark Vickers
Presentation Layout




Introduction to Research
Methods used
• Overview of GermaNet
• Overview of SPPC
• Details of their Approach
Results
Conclusion
Introduction
Methods Used
GermaNet
SPPC
Approach
Results
Conclusion
Goal
Automatic Acquisition of Domain Relevant Terms and Their Relations
How?
• Single-word Terms: TFIDF classification
• Domain Relevant Relations:
  • Use lexico-syntactic patterns:
    • Existing ontologies
    • Collocation methods
Input
No seed words
No syntactic patterns
Just a collection of classified documents
Methods Used
Builds on Other Systems:

GermaNet
(They built an Ontology Inference Machine to search GermaNet)
• For: Accessing semantic relations
SPPC (Shallow Processing Production Center)
• For: Linguistic annotation
Accessing Semantic Relations
GermaNet
• Developed within the LSD Project at the Division of Computational Linguistics of the Linguistics Department at the University of Tübingen, Germany
• A lexical-semantic net
• German nouns, verbs, and adjectives are semantically grouped by an underlying lexical concept (like a thesaurus); these groups are called synsets
• Synsets are connected by semantic relations
• Lexical relations include synonymy, antonymy, and “pertains to”
• Conceptual relations include hyponymy (‘is-a’), meronymy (‘has-a’), entailment, and cause
• Based on the technology of WordNet (Princeton)
Accessing Semantic Relations
WordNet
Accessing Semantic Relations
Inference Machine
• Allows GermaNet’s relations to be searched by other applications
• Provides 3 different functions:
  • Retrieval of relations assigned to words
    Example: “Find all synonyms for the word bar” → rod, saloon, …
  • Retrieval of relations between words
    Example: “Find relations between Internet-Service-Provider and Company” → hyponym (so an ISP is a company)
  • Navigation in the GermaNet graph
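The three functions listed above can be sketched over a toy graph. This is a minimal illustration, not the authors' Inference Machine: the triples, the relation names, the extra "Organization" node, and the function names `related`, `relations_between`, and `path` are all assumptions made for the example.

```python
from collections import deque

# Toy lexical-semantic graph: (source, relation, target) triples.
# The entries are illustrative only, not taken from GermaNet.
TRIPLES = [
    ("bar", "synonym", "rod"),
    ("bar", "synonym", "saloon"),
    ("Internet-Service-Provider", "hyponym_of", "Company"),
    ("Company", "hyponym_of", "Organization"),
]


def related(word, relation):
    """Function 1: retrieve the words linked to `word` by `relation`."""
    return [t for s, r, t in TRIPLES if s == word and r == relation]


def relations_between(a, b):
    """Function 2: retrieve the relations that directly link `a` to `b`."""
    return [r for s, r, t in TRIPLES if s == a and t == b]


def path(start, goal):
    """Function 3: navigate the graph breadth-first from `start` to `goal`."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        nodes = queue.popleft()
        if nodes[-1] == goal:
            return nodes
        for s, _, t in TRIPLES:
            if s == nodes[-1] and t not in seen:
                seen.add(t)
                queue.append(nodes + [t])
    return None  # no directed path exists
```

For instance, `related("bar", "synonym")` returns the two synonyms from the slide, and `relations_between("Internet-Service-Provider", "Company")` returns the single `hyponym_of` link.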
Linguistic Annotation
SPPC
SPPC (Shallow Processing Production Center)
• Robust German NLP system that uses cascaded, optimized weighted finite-state devices
SPPC parts:
• Tokenizer
• Lexical Processor
• Part-of-Speech Filtering
• Named-entity Finder
• Chunk recognizer
Their Extraction Engine
Three main components:
1. TFIDF-based single-word term classifier
2. Lexico-syntactic pattern finder
   1. Learns patterns based on known relations
   2. Learns patterns based on term collocation methods
3. Relation extractor
Their Extraction Engine
1. Extract single-word terms (single-word term extraction with KFIDF)
2. Learn multi-word terms and identify syntactic patterns
3. Learn patterns from known relations
4. Extract related terms using the lexico-syntactic patterns found
Discovering Domain Relevant Terms
• Apply a TFIDF measure: KFIDF
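The slides name KFIDF without giving its formula. The sketch below assumes a category-level TFIDF variant of the form docs(w, cat) × log(n × |cats| / cats(w) + 1), where docs(w, cat) is the number of documents in category cat that contain w, cats(w) is the number of categories in which w occurs, and n is a smoothing factor. That form, the function name `kfidf`, and the toy corpus are assumptions for illustration and are not confirmed by the slides.

```python
import math
from collections import defaultdict


def kfidf(docs_by_category, n=1.0):
    """Score each (word, category) pair with a KFIDF-style measure.

    docs_by_category maps a category name to a list of documents,
    each document being a list of tokens. Assumed form of the score:
        docs(w, cat) * log(n * |cats| / cats(w) + 1)
    """
    doc_count = defaultdict(lambda: defaultdict(int))  # cat -> word -> #docs
    cats_with_word = defaultdict(set)                  # word -> categories
    for cat, docs in docs_by_category.items():
        for doc in docs:
            for word in set(doc):  # count each word once per document
                doc_count[cat][word] += 1
                cats_with_word[word].add(cat)

    num_cats = len(docs_by_category)
    scores = {}
    for cat, counts in doc_count.items():
        for word, docs_with_word in counts.items():
            idf = math.log(n * num_cats / len(cats_with_word[word]) + 1)
            scores[(word, cat)] = docs_with_word * idf
    return scores


# Hypothetical two-category corpus of already classified documents.
corpus = {
    "drugs": [["cocaine", "police", "seized"], ["cocaine", "dealer"]],
    "sports": [["police", "stadium"], ["match", "stadium"]],
}
scores = kfidf(corpus)
```

In this toy corpus "cocaine" scores higher for the drugs category than "police" does, because "cocaine" appears in more drug documents and in no other category.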
Their Extraction Engine
Collocation learner
Learning Term Collocations
Examples: man-eating shark, dead serious, depend on, blue-collar
Measures:
• Mutual Information (probabilities)
  - Occurrence of one word predicts the occurrence of another
  - Not practical for sparse data
• Log-Likelihood Measures (contingency tables)
  - Tells how much more likely the occurrence of one pair is than another
• T-test
  - Accept or reject the null hypothesis (terms are independent)
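The three measures above can be computed from four counts: the pair frequency `c12`, the single-word frequencies `c1` and `c2`, and the corpus size `n`. The sketch below uses the standard textbook formulations (pointwise mutual information, the log-likelihood ratio over a 2x2 contingency table, and the t statistic with the variance approximated by the observed probability). Whether the authors used exactly these variants is an assumption, and the function names are made up for the example.

```python
import math


def pmi(c12, c1, c2, n):
    """Pointwise mutual information of a word pair, in bits."""
    return math.log2((c12 / n) / ((c1 / n) * (c2 / n)))


def t_score(c12, c1, c2, n):
    """t statistic against the null hypothesis that the words are independent.

    Requires c12 > 0, since the variance is approximated by c12 / n.
    """
    observed = c12 / n
    expected = (c1 / n) * (c2 / n)
    return (observed - expected) / math.sqrt(observed / n)


def log_likelihood(c12, c1, c2, n):
    """Log-likelihood ratio (G^2) over the 2x2 contingency table of the pair."""
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    expected = [
        c1 * c2 / n,
        c1 * (n - c2) / n,
        (n - c1) * c2 / n,
        (n - c1) * (n - c2) / n,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
```

With counts that match independence exactly, for example `c12=1, c1=10, c2=10, n=100`, all three scores come out at zero up to rounding, which is the baseline the measures are compared against.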
Their Extraction Engine
Relation Extractor
Learning Relations with
Lexico-syntactic patterns
Example of a lexico-syntactic pattern finding relations
Pattern: “or other”
Sentence: Bruises, wounds, or other injuries are common.
Hyponym Relations: (Bruises, Injuries), (Wounds, Injuries)
------------------------------------------------------------------------------
Pattern: “as well as”
Sentence: Cocaine as well as Hashish, and LSD…
Near synonyms? If so, we can now match LSD to the Drug domain
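The two example patterns can be tried out with plain regular expressions. This is a minimal sketch that works on raw English words. The system described in the slides operates on linguistically annotated German text, so the regexes and the function names `hyponym_pairs` and `coordinated_terms` are assumptions for illustration only.

```python
import re

# "X, Y, or other Z": every item listed before "or other" is a hyponym of Z.
OR_OTHER = re.compile(r"((?:\w+, )*\w+),? or other (\w+)")


def hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the "or other" pattern."""
    pairs = []
    for match in OR_OTHER.finditer(sentence):
        hypernym = match.group(2)
        for item in match.group(1).split(", "):
            pairs.append((item, hypernym))
    return pairs


def coordinated_terms(phrase):
    """Split a coordination built with "as well as", "and", or commas."""
    parts = re.split(r",?\s+as well as\s+|,?\s+and\s+|,\s+", phrase)
    return [p for p in parts if p]
```

Applied to the first example sentence, `hyponym_pairs` yields the pairs (Bruises, injuries) and (wounds, injuries). `coordinated_terms("Cocaine as well as Hashish, and LSD")` yields the three candidate near synonyms.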
Learning Relations with
Lexico-syntactic patterns
[Diagram: learning lexico-syntactic patterns. Labels:]
• Extracted terms
• GermaNet (semantic relationships)
• Terms with semantic relations (synonymy, hyponymy, meronymy)
• Put semantically similar fragments into Finkelstein-Landau and Morin’s algorithm to cluster patterns
• Domain-independent patterns
• Domain-specific patterns
• Term relation extractor applies newly extracted lexico-syntactic patterns
• List of related terms with possible hyponymous relations
• With near synonyms: search GermaNet to find common hyponyms, then assign the newly found hyponymous relation to the term not encoded in GermaNet
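The near-synonym step can be illustrated with the LSD example from the earlier slide. The sketch below reads the step as: look up the parent concept shared by the terms that are already known, then propose that concept for the unknown term. That reading, the toy `PARENT` table standing in for GermaNet, and the rule that all known terms must agree are assumptions made for the example.

```python
# Toy stand-in for GermaNet: term -> its parent concept.
# The entries are illustrative, not taken from GermaNet.
PARENT = {"Cocaine": "Drug", "Hashish": "Drug", "Aspirin": "Medicine"}


def assign_parent(near_synonyms, parent=PARENT):
    """Propose a parent concept for the terms missing from `parent`.

    A proposal is made only when all known terms in the group agree
    on a single parent concept (an assumption of this sketch).
    """
    known = {parent[t] for t in near_synonyms if t in parent}
    if len(known) != 1:
        return {}
    concept = next(iter(known))
    return {t: concept for t in near_synonyms if t not in parent}
```

Given the group Cocaine, Hashish, LSD, the sketch proposes Drug as the parent of LSD. A group whose known members disagree produces no proposal.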
Results
• There’s a correlation between corpus size and precision
• LogLike delivers the best results compared with Mutual Information and T-test
• Noun-Verb collocations were the most prominent and had the best results
• In the Drug domain, N-V = 56% precision and N-N = 41% precision
Conclusion
• KFIDF proves promising for single-word term extraction
• Statistical measures are suitable for free-word-order languages like German
• Extracting term relations is useful for real-world IE
My Evaluation
+ Uses well-known existing systems
+ Seemingly no human interaction
+ Domain adaptive (robust)
- Precision does not seem too impressive, and what about recall? I’d like to see more results
We see from the past few papers that automatic ontology generation approaches:
• Combine multiple strategies (statistics, existing ontologies)
• Have a cyclic, machine-learning nature