Extended Named Entity Ontology with Attribute Information Satoshi Sekine New York University

Extended Named Entity Ontology
with Attribute Information
LREC 2008
May 28, 2008
Satoshi Sekine
New York University
Named Entity
• Named Entity is the most important information
unit in many Information Access applications
(such as IE, Q&A, Summarization, IR, MT)
• History
– MUC6 first defined Named Entity
• Person, Location, Organization, Date, Time, Money, Percent
– IREX
• MUC6 + Artifact
– ACE (20 kinds), TIMEX (Standardized Time Expression)
• Problem: Are 7 to 20 categories enough?
What is the meaning of names?
Extended Named Entities
• Extended to 200 categories (LREC 02,04)
– Finer categories
• Location → GPE (Country, Province, City …)
→ Geographical region (Landform, Water form …)
→ Region (Domestic region, Continental region …)
→ Astral body (Star, Planet …)
– New categories
• Line (Railroad, Road, Waterway, Tunnel, Bridge …)
• Product (Vehicle, Food, Clothing, Weapon, Award …)
• Event (Games, Conference, Natural Phenomena, War …)
• Disease, Currency, God …
• Era, Age, Color, Unit
Development of ENE
• Long, steady development over several years
– Capitalized words in English newspapers (~2000)
– Q&A, IE examples
– Referred to encyclopedias, WordNet, …
– Referred to related work and related systems
– 100 → 140 → 200 → 210 categories
• Used in IE and Q&A systems, which helped refine the definitions
• http://nlp.cs.nyu.edu/ene
What is Named Entity?
• Name is only a label
• Properties and Attributes are the essential meaning
• “Hudson River” is still “Hudson River” even if people call
it “Muh-he-kun-ne-tuk”
• The meaning of the entity can be discerned from
– “The river is in New York State”
– “It is 507 km in length”
– “It runs from the Adirondack Mountains to Upper New York Bay”
• Name is only a label which can be used to refer to the river
Attributes
• “River” has attributes such as “source location”,
“outflow”, “length” and so on
• “People” has attributes such as “occupation”, ”birth
date”, “nationality” and so on
• Designing those attributes and constructing the
knowledge will be very useful for applications
of NLP technologies
– Q&A, IE, IR, Dialogue, co-reference …
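To make the "name is only a label" idea concrete, here is a minimal sketch of an entity stored as a category plus attribute values. The names `Entity`, `ATTRIBUTE_SCHEMA`, and `missing_attributes` are hypothetical and not part of the ENE definition. The values come from the Hudson River example, and treating the Adirondack Mountains as the "source location" and Upper New York Bay as the "outflow" is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical schema: category -> attribute labels mentioned on the slides.
ATTRIBUTE_SCHEMA = {
    "River": ["source location", "outflow", "length"],
    "Person": ["occupation", "birth date", "nationality"],
}


@dataclass
class Entity:
    category: str                                   # ENE category, e.g. "River"
    names: list                                     # labels that refer to the entity
    attributes: dict = field(default_factory=dict)  # attribute label -> value

    def missing_attributes(self):
        """Return schema attributes that have no recorded value yet."""
        return [a for a in ATTRIBUTE_SCHEMA[self.category]
                if a not in self.attributes]


# The same entity can carry several names; its attributes do not change.
hudson = Entity(
    category="River",
    names=["Hudson River", "Muh-he-kun-ne-tuk"],
    attributes={
        "source location": "Adirondack Mountains",  # assumed mapping
        "outflow": "Upper New York Bay",            # assumed mapping
        "length": "507 km",
    },
)
```

Looking an entity up by either name returns the same attribute set, which is the point of separating the label from the properties.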
Design of the attributes
• We use an encyclopedia
– An encyclopedia is a knowledge archive of
named entities (as a dictionary is for common words)
– Its descriptions should contain many attributes
• We extract attributes from the descriptions
of sample named entities and compile
general attributes for each category
Procedure
1. Extract up to 50 sample named entity instances for each
category. We use a well-known Japanese encyclopedia,
“Nippon Daihyakka (Nipponica)”, published by
Shogakukan Inc.
2. Annotators extract possible attribute values from the
descriptions of the samples and name the attribute labels
(attribute values must be a noun phrase or equivalent)
3. Unify the attribute labels and identify the important
(essential and mandatory) attributes for each category
4. Redesign the ENE categories
5. Construct a set of attributes
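Steps 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure the annotators actually followed: the synonym map `LABEL_SYNONYMS`, the helper names, the toy samples, and the 50% threshold are all hypothetical. On the slides the important attributes are identified by people, not by a fixed cutoff.

```python
from collections import Counter

# Hypothetical synonym map used to unify raw annotator labels (step 3).
LABEL_SYNONYMS = {
    "job": "Vocation",
    "occupation": "Vocation",
    "citizenship": "Nationality",
}


def unify(label):
    """Map a raw annotator label to its unified form, if one is known."""
    return LABEL_SYNONYMS.get(label.lower(), label)


def attribute_frequencies(samples):
    """Count how many samples carry a value for each unified label.

    `samples` is a list of dicts mapping raw labels to extracted values.
    Returns {label: (count, percent of samples)}.
    """
    counts = Counter()
    for sample in samples:
        # Use a set so each sample counts at most once per unified label.
        counts.update({unify(label) for label in sample})
    total = len(samples)
    return {label: (n, round(100 * n / total)) for label, n in counts.items()}


def important_attributes(freqs, min_percent=50):
    """Labels whose coverage reaches an assumed threshold."""
    return sorted(label for label, (_, pct) in freqs.items()
                  if pct >= min_percent)


# Toy, made-up samples purely for illustration.
samples = [
    {"job": "Painter", "Nationality": "Italian"},
    {"Vocation": "Professional baseball player"},
    {"occupation": "Professor", "citizenship": "Japanese", "Mentor": "unknown"},
    {"Vocation": "Composer"},
]
freqs = attribute_frequencies(samples)
```

The `(count, percent)` pairs follow the same shape as the "Freq." column in the attribute tables, where a value such as 29(63) reads as 29 samples, or 63% of them.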
Attributes for Person (20)
Attribute | Example of value | Freq. | ENE
Vocation | Professional baseball player | 46(100) | Vocation
Nationality | American, Chinese, Japanese | 29(63) | Country
Career | Professor at Yale University | 26(57) | Vocation
Masterpiece | Guernica, Mona Lisa | 25(54) | Product, Facility
Graduate | M.A. in German at Cambridge | 20(44) | School
Hometown | Paris, Manchester, Shanghai | 19(41) | City
Native province | State of Illinois, Sichuan | 18(39) | Province
Previous stay | England, New York | 12(26) | Location
Mentor | Andrea del Verrocchio | 10(22) | Person
Death date | 04/23/1704, unknown | 10(22) | Date
Era | The 11th Century | 8(17) | Era
Award | Academy Award, MVP, Nobel Prize | 8(17) | Award
Real name | Saint Nicholas | 8(17) | Person
Another name | Santa, Father Christmas | 8(17) | Person
Title | Knight, an honorary degree at Yale | 6(13) | Title
Competition | World Series, 1955 piano competition in Paris | 6(13) | Game
Place of death | New York, Birmingham | 5(11) | Location
Father | John B. Kelly, Sr. | 5(11) | Person
Cause of death | Car accident, Guillotine | 5(11) |
Attributes for International Organization (17)
Attribute | Example of value | Freq. | ENE
Another name | CARICOM, EMU, CCDN | 30(75) | Inter. Org.
Year founded | 1/10/1920, 2004 | 26(65) | Date
Purpose of foundation | Encouragement of the African economy | 23(58) |
Number of signatories | 170 countries, 190 | 20(50) | N_Country
Type | League of Nations, International Labor Organization | 16(40) |
Headquarters | New York, Prague | 13(33) | City
Agreement, Proposal | Covenant of the League of Nations | 12(30) | Rule
Top Organization | EU (the European Union) | 11(28) | Inter. Org.
Member | China, Senegal, Norway | 10(25) | Country
Predecessor | African Union (OAU), Caribbean Free Trade Association | 9(23) | Inter. Org.
Subsidiary Organization | International Amateur Athletics Federation | 8(20) | Organization
Rank | Board of directors, Special UN Organization | 7(18) |
Headquarters (country) | Japan, Czech, Ethiopia | 7(18) | Country
Year of dissolution | 1974, 06/20/1977 | 6(15) | Date
Proposer Country | USA, England, Luxembourg | 5(13) | Country
Successor Organization | United Nations Economic and Social Commission for Asia and the Pacific | 5(13) | Inter. Org.
Proposer (Person) | Eisenhower, Colonel Qadhafi, Pierre Wellner | 4(10) | Person
Problems
we encountered and/or we haven’t solved yet
1. Entity dependent attributes
ex) Song/Poem of river, “Loreley” on “Rhine River”
2. Fineness of attribute
ex) Bird’s “color of head” or “color of body”
3. Span of value expression
Longer than a noun phrase, ex) definition
4. Structure in value
ex) Museum’s exhibit has own attributes (author, year)
5. ENE category definition
Attributes are useful to define categories, but not always
6. Distinction of mandatory and optional
Distinction of Property and attribute
Inter-annotator Agreement
• Two annotators worked on Person, Landform, International
Organization and Academy
• They agree more often on attributes that frequently
have values
• They disagree on the span of values
Percentage of having values | ~60% | ~40% | ~10%
Agree | 13 | 37 | 61
Disagree | 2 | 3 | 16
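As a worked check of the agreement figures, the sketch below turns the agree/disagree counts into simple observed agreement rates. It assumes the three counts line up with the three "percentage of having values" bands in the order given. It also assumes plain agreement, agree / (agree + disagree), is the quantity of interest; the slides do not name a specific agreement measure. The names `AGREEMENT_COUNTS` and `agreement_rate` are hypothetical.

```python
# Counts from the slide: (agree, disagree) per "percentage of having values" band.
AGREEMENT_COUNTS = {
    "~60%": (13, 2),
    "~40%": (37, 3),
    "~10%": (61, 16),
}


def agreement_rate(agree, disagree):
    """Simple observed agreement: the share of decisions that match."""
    return agree / (agree + disagree)


rates = {band: agreement_rate(a, d)
         for band, (a, d) in AGREEMENT_COUNTS.items()}

for band, rate in rates.items():
    print(f"{band}: {rate:.1%}")
```

Under these assumptions the rates come out to about 87% (13/15), 92.5% (37/40), and 79% (61/77), so agreement is lowest for attributes that have values in only about 10% of samples.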
Summary
• Designed attributes for Extended Named Entities
– Attributes are important in applications
– Created based on encyclopedia descriptions
– Document available
(in Japanese; English in progress)
– Dictionary / Tagger in development
• http://nlp.cs.nyu.edu/ene
Application
• Q&A/IR
– What is the 15th highest mountain in the world?
– How many mountains are higher than 6000 m?
– Tell me the major league player from New York
– I met Satoshi Sekine from New York
• Document understanding
– “Yankees came back home!!”
– “I visited Marrakech’s main sightseeing places”
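The two mountain questions show why attribute values matter for Q&A: both reduce to sorting or filtering on a "height" attribute. A minimal sketch follows, assuming a small in-memory list. `MOUNTAINS`, `nth_highest`, and `count_higher_than` are hypothetical names, and the five records carry commonly cited heights purely to make the example runnable. They are not the ENE dictionary, and five records are too few to answer the "15th highest" question itself.

```python
# Illustrative records only: (name, height in metres).
MOUNTAINS = [
    ("Mount Everest", 8849),
    ("K2", 8611),
    ("Kangchenjunga", 8586),
    ("Lhotse", 8516),
    ("Makalu", 8485),
]


def nth_highest(mountains, n):
    """Return the name of the n-th highest mountain (1-based)."""
    ranked = sorted(mountains, key=lambda m: m[1], reverse=True)
    return ranked[n - 1][0]


def count_higher_than(mountains, threshold_m):
    """Count mountains strictly higher than the threshold in metres."""
    return sum(1 for _, height in mountains if height > threshold_m)
```

With a full attribute dictionary, `nth_highest(MOUNTAINS, 15)` and `count_higher_than(MOUNTAINS, 6000)` would correspond to the first two example questions.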