Extended Named Entity Ontology with Attribute Information Satoshi Sekine New York University
LREC 2008, May 28, 2008

Named Entity
• Named Entities are the most important information units in many Information Access applications (such as IE, Q&A, Summarization, IR, MT)
• History
  – MUC-6 first defined Named Entity
    • Person, Location, Organization, Date, Time, Money, Percent
  – IREX
    • MUC-6 + Artifact
  – ACE (20 kinds), TIMEX (Standardized Time Expression)
• Problem: Are 7 to 20 categories enough? What is the meaning of names?

Extended Named Entities
• Extended to 200 categories (LREC 02, 04)
  – Finer categories
    • Location
      → GPE (Country, Province, City, …)
      → Geographical region (Landform, Water form, …)
      → Region (Domestic region, Continental region, …)
      → Astral body (Star, Planet, …)
  – New categories
    • Line (Railroad, Road, Waterway, Tunnel, Bridge, …)
    • Product (Vehicle, Food, Clothing, Weapon, Award, …)
    • Event (Games, Conference, Natural Phenomena, War, …)
    • Disease, Currency, God, …
    • Era, Age, Color, Unit

Development of ENE
• Long, steady development over the years
  – Capitalized words in English newspapers (~2000)
  – Q&A and IE examples
  – Referred to encyclopedias, WordNet, …
  – Referred to related work and related systems
  – 100 → 140 → 200 → 210
• Used in IE and Q&A systems, with the definitions refined along the way
• http://nlp.cs.nyu.edu/ene

What is a Named Entity?
• A name is only a label
• Properties and attributes are the essential meaning
• “Hudson River” is still “Hudson River” even if people call it “Muh-he-kun-ne-tuk”
• The meaning of the entity can be discerned from
  – “the river is in New York State”
  – “It is 507 km in length”
  – “It runs from the Adirondack Mountains to Upper New York Bay”
• The name is only a label which can be used to refer to the river

Attributes
• “River” has attributes such as “source location”, “outflow”, “length” and so on
• “People” has attributes such as “occupation”, “birth date”, “nationality” and so on
• Designing those attributes and constructing the knowledge will be very useful for applications of NLP technologies
  – Q&A, IE, IR, Dialogue, co-reference, …

Design of the attributes
• We use an encyclopedia
  – An encyclopedia is the knowledge archive of named entities (as a dictionary is for common words)
  – Descriptions must contain many attributes
• We extract attributes from descriptions of named entities (samples) and compile general attributes for each category

Procedure
1. Extract (up to 50) sample named entity instances for each category. We use a famous Japanese encyclopedia, “Nippon Daihyakka (Nipponica)”, published by Shogakukan Inc.
2. Annotators extract possible attribute values from the descriptions of the samples, and name the attribute label (attribute values must be a noun phrase or equivalent)
3. Unify the attribute labels and identify the important (essential and mandatory) attributes for each category
4. Redesign the ENE categories
5. Construct a set of attributes

Attributes for Person (20)
Attribute | Example of value | Freq. (%) | ENE
Vocation | Professional baseball player | 46 (100) | Vocation
Nationality | American, Chinese, Japanese | 29 (63) | Country
Career | Professor at Yale University | 26 (57) | Vocation
Masterpiece | Guernica, Mona Lisa | 25 (54) | Product, Facility
Graduate | M.A. in German at Cambridge | 20 (44) | School
Hometown | Paris, Manchester, Shanghai | 19 (41) | City
Native Province | State of Illinois, Sichuan | 18 (39) | Province
Previous stay | England, New York | 12 (26) | Location
Mentor | Andrea del Verrocchio | 10 (22) | Person
Death date | 04/23/1704, unknown | 10 (22) | Date
Era | The 11th Century | 8 (17) | Era
Award | Academy Award, MVP, Nobel Prize | 8 (17) | Award
Real name | Saint Nicholas | 8 (17) | Person
Another name | Santa, Father Christmas | 8 (17) | Person
Title | Knight, an honorary degree at Yale | 6 (13) | Title
Competition | World Series, 1955 piano competition in Paris | 6 (13) | Game
Place of death | New York, Birmingham | 5 (11) | Location
Father | John B. Kelly, Sr. | 5 (11) | Person
Cause of death | Car accident, Guillotine | 5 (11) |

Attributes for International Organization (17)
Attribute | Example of value | Freq. (%) | ENE
Another name | CARICOM, EMU, CCDN | 30 (75) | Inter. Org.
Year founded | 1/10/1920, 2004 | 26 (65) | Date
Purpose of foundation | Encouragement of the African economy | 23 (58) |
Number of signatories | 170 countries, 190 | 20 (50) | N_Country
Type | League of Nations, International Labor Organization | 16 (40) |
Headquarters | New York, Prague | 13 (33) | City
Agreement, Proposal | Covenant of the League of Nations | 12 (30) | Rule
Top Organization | EU (the European Union) | 11 (28) | Inter. Org.
Member | China, Senegal, Norway | 10 (25) | Country
Predecessor | African Union (OAU), Caribbean Free Trade Association | 9 (23) | Inter. Org.
Subsidiary Organization | International Amateur Athletics Federation | 8 (20) | Organization
Rank | Board of directors, Special UN Organization | 7 (18) |
Headquarters (country) | Japan, Czech, Ethiopia | 7 (18) | Country
Year of dissolution | 1974, 06/20/1977 | 6 (15) | Date
Proposer Country | USA, England, Luxemburg | 5 (13) | Country
Successor Organization | United Nations Economic and Social Commission for Asia and the Pacific | 5 (13) | Inter. Org.
Proposer (Person) | Eisenhower, Colonel Qadhafi, Pierre Wellner | 4 (10) | Person

Problems we encountered and/or we haven’t solved yet
1. Entity-dependent attributes
   ex) Song/Poem of a river: “Loreley” for the “Rhine River”
2. Fineness of attributes
   ex) A bird’s “color of head” or “color of body”
3. Span of value expression
   Longer than a noun phrase, ex) definition
4. Structure in values
   ex) A museum’s exhibit has its own attributes (author, year)
5. ENE category definition
   Attributes are useful for defining categories, but not always
6. Distinction between mandatory and optional
   Distinction between property and attribute

Inter-annotator Agreement
• 2 annotators worked on Person, Landform, International Organization and Academy
• They agree more often on attributes which have values very often
• They disagree on the span of values

Percentage of having values | ~60% | ~40% | ~10%
Agree | 13 | 37 | 61
Disagree | 2 | 3 | 16

Summary
• Design of attributes for Extended Named Entities
  – Attributes are important in applications
  – Created based on encyclopedia descriptions
  – Document available (in Japanese; English in progress)
  – Dictionary / Tagger in development
• http://nlp.cs.nyu.edu/ene

Application
• Q&A/IR
  – What is the 15th highest mountain in the world?
  – How many mountains are higher than 6000 m?
  – Tell me the major league players from New York
  – I met Satoshi Sekine from New York
• Document understanding
  – “Yankees came back home!!”
  – “I visited Marrakech’s main sightseeing places”
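
Questions like the Q&A/IR examples above become answerable once an entity carries attributes rather than just a name. The following is a minimal Python sketch of that idea, not the ENE dictionary or tagger itself. The `Entity` and `EntityStore` classes, the `select` helper, and attribute keys such as `length_km` are hypothetical names introduced for illustration. The only data used is the Hudson River example from the earlier slide.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Entity:
    """One named entity: names are labels, attributes carry the meaning."""
    canonical_name: str
    category: str  # an ENE category such as "River"
    aliases: List[str] = field(default_factory=list)
    attributes: Dict[str, Any] = field(default_factory=dict)


class EntityStore:
    """Hypothetical in-memory store, indexed by every known name."""

    def __init__(self) -> None:
        self._entities: List[Entity] = []
        self._by_name: Dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self._entities.append(entity)
        # Every label (canonical name or alias) points at the same entity.
        for name in [entity.canonical_name, *entity.aliases]:
            self._by_name[name.lower()] = entity

    def lookup(self, name: str) -> Optional[Entity]:
        return self._by_name.get(name.lower())

    def select(self, category: str, attribute: str,
               predicate: Callable[[Any], bool]) -> List[Entity]:
        """Return entities of a category whose attribute value passes the test."""
        return [e for e in self._entities
                if e.category == category
                and attribute in e.attributes
                and predicate(e.attributes[attribute])]


store = EntityStore()
store.add(Entity(
    canonical_name="Hudson River",
    category="River",
    aliases=["Muh-he-kun-ne-tuk"],
    attributes={  # key names are assumptions; values come from the slides
        "location": "New York State",
        "length_km": 507,
        "source_location": "Adirondack Mountains",
        "outflow": "Upper New York Bay",
    },
))

# Two different labels resolve to one and the same entity.
same = store.lookup("Muh-he-kun-ne-tuk") is store.lookup("Hudson River")

# "Which rivers are longer than 500 km?" becomes an attribute filter.
long_rivers = [e.canonical_name
               for e in store.select("River", "length_km", lambda v: v > 500)]

print(same, long_rivers)  # prints: True ['Hudson River']
```

With attributes such as height for Mountain or team and hometown for Person filled in the same way, "How many mountains are higher than 6000 m?" would reduce to counting the result of a `select` call. A real system would also need to handle the open problems listed earlier, such as structured values and entity-dependent attributes.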