Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier
Article Summary by: Mark Vickers

Presentation Layout
Introduction to Research
Methods Used
• Overview of GermaNet
• Overview of SPPC
• Details of their Approach
Results
Conclusion

Goal
Automatic Acquisition of Domain Relevant Terms and their Relations
How?
• Single-word terms: TFIDF classification
• Domain-relevant relations:
  • Use lexico-syntactic patterns:
    - Existing ontologies
    - Collocation methods

Input
• No seed words
• No syntactic patterns
• Just a collection of classified documents

Methods Used
Builds on other systems:
GermaNet (they built an Ontology Inference Machine to search GermaNet)
• For: accessing semantic relations
SPPC (Shallow Processing Production Center)
• For: linguistic annotation

Accessing Semantic Relations: GermaNet
• Developed within the LSD Project at the Division of Computational Linguistics of the Linguistics Department at the University of Tübingen, Germany
• A lexical-semantic net
• German nouns, verbs, and adjectives are semantically grouped by an underlying lexical concept (like a thesaurus); these groups are called synsets
• Synsets are connected by semantic relations
• Lexical relations include synonyms, antonyms, and “pertains to”
• Conceptual relations include hyponyms (‘is-a’), meronyms (‘has-a’), entailment, and cause
• Based on the technology of WordNet (Princeton)

Accessing Semantic Relations: WordNet
[Three slides with this title; no text content in the transcript]

Accessing Semantic Relations: Inference Machine
• Allows GermaNet’s relations to be searched by other applications
• Provides 3 different functions:
  • Retrieval of relations assigned to words
    Example: “Find all synonyms for the word bar”: rod, saloon, …
  • Retrieval of relations between words
    Example: “Find relations between Internet-Service-Provider and Company”: hyponym (so an ISP is a company)
  • Navigation in the GermaNet graph

Linguistic Annotation: SPPC
SPPC (Shallow Processing Production Center)
• Robust German NLP system that uses cascaded, optimized, weighted finite-state devices
SPPC parts:
• Tokenizer
• Lexical Processor
• Part-of-Speech Filtering
• Named-entity Finder
• Chunk Recognizer

Their Extraction Engine
Three main components:
1. TFIDF-based single-word term classifier
2. Lexico-syntactic pattern finder
   • Learns patterns based on known relations
   • Learns patterns based on term collocation methods
3. Relation extractor

Their Extraction Engine
1. Extract single-word terms: single-word term extraction (KFIDF)
2. Learn multi-word terms and identify syntactic patterns
3. Learn patterns from known relations
4. Extract related terms using the lexico-syntactic patterns found

Discovering Domain Relevant Terms
Apply a TFIDF measure: KFIDF

Their Extraction Engine: Collocation Learner

Learning Term Collocations
Examples: man-eating shark, dead serious, depend on, blue-collar
Measures:
• Mutual Information (probabilities)
  - Occurrence of one word predicts the occurrence of another
  - Not practical for sparse data
• Log-Likelihood Measures (contingency tables)
  - Tells how much more likely the occurrence of one pair is than another
• T-test
  - Accept or reject the null hypothesis (terms are independent)

Their Extraction Engine: Relation Extractor

Learning Relations with Lexico-syntactic Patterns
Example of a lexico-syntactic pattern finding relations
Pattern: “or other”
Sentence: Bruises, wounds, or other injuries are common.
Hyponym relations: (Bruises, Injuries), (Wounds, Injuries)

Pattern: “as well as”
Sentence: Cocaine as well as Hashish, and LSD…
Near synonyms? Now we can match LSD to the Drug domain.

Learning Relations with Lexico-syntactic Patterns
The term relation extractor applies the newly extracted lexico-syntactic patterns.
• Extracted terms
• GermaNet (semantic relationships)
• Domain-independent patterns
• Domain-specific patterns
• Terms with semantic relations (synonymy, hyponymy, meronymy)
• Put semantically similar fragments into Finkelstein-Landau and Morin’s algorithm to cluster patterns
• List of related terms with possible hyponymous relations
• With near synonyms: search GermaNet to find a common hypernym, then assign the newly found hyponymous relation to the term not encoded in GermaNet

Results
• There’s a correlation between corpus size and precision
• Log-likelihood delivers the best results compared to Mutual Information and T-test
• Noun-verb collocations were most prominent and had the best results
• In the Drug domain, N-V = 56% precision and N-N = 41% precision

Conclusion
• KFIDF proves promising for single-word term extraction
• Statistical measures are suitable for free-word-order languages like German
• Extracting term relations is useful for real-world IE

My Evaluation
+ Uses well-known existing systems
+ Seemingly no human interaction
+ Domain adaptive (robust)
- Precision does not seem too impressive, and what about recall? I’d like to see more results.

We see from the past few papers that automatic ontology generation approaches:
• Combine multiple strategies (statistics, existing ontologies)
• Have a cyclic, machine-learning nature
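
Sketch: collocation measures
The three measures named on the "Learning Term Collocations" slide can be illustrated with textbook formulas computed from raw counts. This is a minimal sketch, not the authors' implementation: the function names, the count-based interface (c12 = pair count, c1 and c2 = word counts, n = total number of pairs), and the choice of pointwise mutual information, the standard t statistic, and Dunning's G² for the log-likelihood measure are all assumptions made for illustration.

```python
import math

def contingency(c12, c1, c2, n):
    """Observed 2x2 table for a pair (w1, w2).
    c12: count of the pair, c1 / c2: counts of w1 / w2, n: total pairs."""
    return [[c12, c1 - c12],
            [c2 - c12, n - c1 - c2 + c12]]

def mutual_information(c12, c1, c2, n):
    """Pointwise mutual information in bits: log2 P(w1,w2) / (P(w1) P(w2))."""
    return math.log2((c12 * n) / (c1 * c2))

def t_score(c12, c1, c2, n):
    """t statistic against the null hypothesis that w1 and w2 are independent.
    Uses the usual approximation variance ~= observed probability."""
    observed = c12 / n
    expected = (c1 / n) * (c2 / n)
    return (observed - expected) / math.sqrt(observed / n)

def log_likelihood(c12, c1, c2, n):
    """Log-likelihood ratio G^2 = 2 * sum(O * ln(O / E)) over the 2x2 table."""
    table = contingency(c12, c1, c2, n)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_sums[i] * col_sums[j] / n
            if observed > 0:  # a zero cell contributes nothing to the sum
                g2 += observed * math.log(observed / expected)
    return 2 * g2

if __name__ == "__main__":
    # Hypothetical counts: the pair occurs 50 times, each word 100 times,
    # in a corpus of 1000 pairs.
    for measure in (mutual_information, t_score, log_likelihood):
        print(measure.__name__, measure(50, 100, 100, 1000))
```

All three scores are zero (up to rounding) when the pair occurs exactly as often as independence predicts, and grow as the pair co-occurs more often than chance. Mutual information divides by the product of the individual counts, so it can become very large for rare words, which is in line with the slide's remark about sparse data.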
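
Sketch: applying the “or other” pattern
The “or other” example from the "Learning Relations with Lexico-syntactic patterns" slide can be reproduced with a hand-written regular expression. This is only an illustration under strong simplifying assumptions: the system described in the slides learns its patterns and works on SPPC's linguistic annotation of German text, whereas the hypothetical `extract_hyponym_pairs` below treats every term as a single word and matches plain English text.

```python
import re

# Comma-separated single-word terms, then "or other", then one hypernym word.
OR_OTHER = re.compile(r"((?:\w+, )*\w+),? or other (\w+)")

def extract_hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'or other' pattern."""
    pairs = []
    for match in OR_OTHER.finditer(sentence):
        hyponyms = [term.strip() for term in match.group(1).split(",")]
        hypernym = match.group(2)
        pairs.extend((term, hypernym) for term in hyponyms if term)
    return pairs

if __name__ == "__main__":
    print(extract_hyponym_pairs("Bruises, wounds, or other injuries are common."))
```

For the slide's sentence this yields the two pairs (Bruises, injuries) and (wounds, injuries). Multi-word terms would need chunk boundaries from a shallow parser rather than `\w+`.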
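
Sketch: placing a term that the ontology does not know
The near-synonym step (Cocaine, Hashish, and LSD appear together, so LSD can be matched to the Drug domain) can be sketched with a toy lookup table. Everything here is hypothetical: `TOY_PARENTS` is a stand-in for GermaNet with invented entries, `assign_missing_parents` is not part of any GermaNet API, and the rule "all known terms must share one parent" is an assumption about how the step could work.

```python
# Toy stand-in for GermaNet: term -> parent concept (invented example data).
TOY_PARENTS = {"Cocaine": "Drug", "Hashish": "Drug", "Aspirin": "Medicine"}

def assign_missing_parents(near_synonyms, parents):
    """Give each unknown term the parent shared by all known terms.

    Returns an empty dict when no term is known or the known terms disagree.
    """
    known = [parents[term] for term in near_synonyms if term in parents]
    if not known or len(set(known)) != 1:
        return {}
    shared_parent = known[0]
    return {term: shared_parent for term in near_synonyms if term not in parents}

if __name__ == "__main__":
    # Terms coordinated by the "as well as" pattern in the slide's example.
    print(assign_missing_parents(["Cocaine", "Hashish", "LSD"], TOY_PARENTS))
```

Refusing to assign anything when the known terms disagree keeps the sketch conservative; a real system would more likely walk up the graph to the nearest shared ancestor.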