Transcript [Slides]

CLASSIFYING ENTITIES INTO
AN INCOMPLETE ONTOLOGY
Bhavana Dalvi, William W. Cohen, Jamie Callan
School of Computer Science,
Carnegie Mellon University
Motivation

Existing techniques:
 Semi-supervised hierarchical classification: Carlson et al., WSDM’10
 Extending knowledge bases: finding new relations or attributes of existing concepts: Mohamed et al., EMNLP’11
 Unsupervised ontology discovery: Adams et al., NIPS’10; Blei et al., JACM’10; Reisinger et al., ACL’09

Evolving Web-scale datasets:
 Billions of entities and hundreds of thousands of concepts
 Difficult to create a complete ontology
 Hierarchical classification of entities into an incomplete ontology is needed
Contributions

Hierarchical Exploratory EM:
 Adds new instances to the existing classes
 Discovers new classes and adds them at appropriate places in the ontology

Class constraints (see the sketch below):
 Inclusion: every entity that is a “Mammal” is also an “Animal”
 Mutual exclusion: if an entity is an “Electronic Device” then it is not a “Mammal”
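
To make the two constraint types concrete, here is a minimal Python sketch (illustrative names, not the authors' code) that checks whether one entity's bit vector of class labels satisfies a set of inclusion and mutual-exclusion constraints:

```python
# Minimal sketch of the two constraint types (illustrative names, not the
# authors' code). `labels` holds one entity's 0/1 bit per class.

def is_consistent(labels, inclusions, mutex_pairs):
    # Inclusion: e.g. ("Mammal", "Animal") -- Mammal = 1 forces Animal = 1.
    for child, parent in inclusions:
        if labels.get(child, 0) and not labels.get(parent, 0):
            return False
    # Mutual exclusion: e.g. ("Electronic Device", "Mammal") cannot both be 1.
    for a, b in mutex_pairs:
        if labels.get(a, 0) and labels.get(b, 0):
            return False
    return True

print(is_consistent({"Electronic Device": 1, "Mammal": 0, "Animal": 0},
                    [("Mammal", "Animal")],
                    [("Electronic Device", "Mammal")]))  # True
```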
Problem Definition

Input:
 A large set of data points: X1 … Xn
 Some known classes: C1 … Ck
 Class constraints Zk among the k classes
 A small number of seeds per known class: |seeds| ≪ n

Output:
 Labels for all data points Xi
 New classes discovered from the data: Ck+1 … Ck+m, where k + m ≪ n
 Updated class constraints: Zk ⊆ Zk+m
Review: Exploratory EM
[Dalvi et al. ECML 2013]
 Classification/clustering: initialize the model with a few seeds per class (KMeans, NBayes, VMF, …)
 Iterate till convergence (data likelihood and # classes):
   E step: predict labels for unlabeled points
     If P(Cj | Xi), j = 1 … k, is nearly uniform for a data point Xi (MinMax ratio or JS divergence), create a new class Ck+1 and assign Xi to it
   M step: recompute model parameters using seeds + predicted labels for unlabeled points
     The number of classes might increase in each iteration
   Check if a model selection criterion (AIC, BIC, AICc, …) is satisfied
     If not, revert to the model from iteration t-1
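
As a concrete illustration of the loop above, here is a self-contained sketch built around a KMeans-style model (one of the seed models named on the slide). The near-uniformity test uses the MinMax-ratio criterion from the extra slides; the AICc check is noted in a comment. All names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def nearly_uniform(p, ratio=2.0):
    # Class-creation criterion (extra slides): max/min posterior ratio < 2.
    return p.max() / p.min() < ratio

def exploratory_kmeans_em(X, seed_labels, n_iter=20):
    # X: (n, d) array of data points; seed_labels: {row index: class id}
    # for the few labeled seeds (each class has at least one seed).
    k = max(seed_labels.values()) + 1
    centroids = np.stack([X[[i for i, c in seed_labels.items() if c == j]].mean(0)
                          for j in range(k)])
    for _ in range(n_iter):
        # E step: softmax of negative distances stands in for P(Cj | Xi).
        dist = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        p = np.exp(-dist)
        p /= p.sum(1, keepdims=True)
        labels = p.argmax(1)
        for i in range(len(X)):
            if i not in seed_labels and nearly_uniform(p[i]):
                centroids = np.vstack([centroids, X[i]])  # new class C(k+1)
                labels[i] = len(centroids) - 1
        labels[list(seed_labels)] = list(seed_labels.values())
        # M step: recompute parameters from seeds + predicted labels.
        centroids = np.stack([X[labels == j].mean(0) if np.any(labels == j)
                              else centroids[j] for j in range(len(centroids))])
        # (The full algorithm also checks AICc here and reverts to the
        #  previous iteration's model if the criterion is not satisfied.)
    return labels, centroids
```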
Hierarchical Exploratory EM
 Initialize the model with a few seeds per class
 Iterate till convergence (data likelihood and # classes):
   E step: predict labels for unlabeled points
     Assign a consistent bit vector of labels to each unlabeled data point
     If P(Ccandidate | Xi) is nearly uniform for a data point Xi, create a new class Cnew and assign Xi to it
     Update the class constraints accordingly
   M step: recompute model parameters using seeds + predicted labels for unlabeled points
     The number of classes might increase in each iteration
     Since the E step follows the class constraints, this step need not be modified
   Check if the model selection criterion is satisfied
     If not, revert to the model from iteration t-1
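
One way to realize the "consistent bit vector" E step is a top-down descent that turns on exactly one child at each level, so the inclusion and mutual-exclusion constraints hold by construction. A minimal sketch, where `predict_proba` is an illustrative placeholder for the model's posterior over a node's children:

```python
def assign_bit_vector(x, tree, root, predict_proba):
    # tree: {class: [children]}; predict_proba(x, classes) -> {class: prob}.
    # Turning on the winning child at each level makes the bit vector
    # consistent: every "on" class has its parent on (inclusion), and only
    # one sibling per level is on (mutual exclusion).
    bits, node = {root: 1}, root
    while tree.get(node):                       # descend until a leaf
        probs = predict_proba(x, tree[node])
        best = max(probs, key=probs.get)
        for child in tree[node]:
            bits[child] = 1 if child == best else 0
        node = best
    return bits

# E.g. with tree = {"Root": ["Location", "Food"],
#                   "Location": ["State", "Country"],
#                   "Food": ["Vegetable", "Condiment"]}
# and the posteriors from the next slides, "California" gets
# Root = 1, Location = 1, State = 1, and all other classes 0.
```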
Divide-And-Conquer Exploratory EM
Assumptions:
 Classes are arranged in a tree-structured hierarchy.
 Classes at any level of the hierarchy are mutually exclusive.

[Diagram: example ontology. Level 1: Root. Level 2: Location, Food. Level 3: State and Country under Location; Vegetable (e.g. Spinach, Potato, Pepper, …) and Condiment under Food. Parent-child edges denote inclusion; siblings at each level are mutually exclusive.]
Divide-And-Conquer Exploratory EM

[Diagrams: classifying “California” top-down. It enters at Root with probability 1.0; at level 2, P(Location | California) = 0.9 vs. P(Food | California) = 0.1, so it descends into Location; at level 3, P(State) = 0.8 vs. P(Country) = 0.2, so it is assigned to State. The resulting consistent bit vector over (Root, Location, Food, State, Country, Vegetable, Condiment) is (1, 1, 0, 1, 0, 0, 0).]
Divide-And-Conquer Exploratory EM

[Diagrams: classifying “Coke” top-down. It enters at Root with probability 1.0; at level 2, P(Location | Coke) = 0.1 vs. P(Food | Coke) = 0.9, so it descends into Food. At level 3, P(Vegetable) = 0.55 vs. P(Condiment) = 0.45 is nearly uniform, so a new class C8 is created under Food and Coke is assigned to it, giving the bit vector (Root, Location, Food, State, Country, Vegetable, Condiment, C8) = (1, 0, 1, 0, 0, 0, 0, 1). This adds to the class constraints: C8 ⊆ Food, C8 ∩ Condiment = ∅, …]
Divide-And-Conquer Exploratory EM

[Diagram: classifying “Cat”. At level 2, P(Location | Cat) = 0.45 vs. P(Food | Cat) = 0.55 is already nearly uniform at the Root, so a new class C9 is created directly under Root and Cat is assigned to it, giving the bit vector (Root, Location, Food, State, Country, Vegetable, Condiment, C8, C9) = (1, 0, 0, 0, 0, 0, 0, 0, 1). This adds to the class constraints: C9 ⊆ Root, C9 ∩ Food = ∅, …]
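
Putting the three walkthroughs together (“California”, “Coke”, “Cat”), here is a sketch of the divide-and-conquer E step with class creation and the accompanying constraint updates; `predict_proba`, `mutex`, and `new_name` are illustrative placeholders, and near-uniformity is tested with the MinMax-ratio criterion from the extra slides:

```python
def classify_or_create(x, tree, root, predict_proba, mutex, new_name):
    # Descend as before, but when the children's posteriors are nearly
    # uniform (e.g. Vegetable 0.55 vs. Condiment 0.45 for "Coke"), create
    # a new sibling class and record the implied constraints: new_name is
    # a subclass of the current node (inclusion) and mutually exclusive
    # with its siblings, as in C8 ⊆ Food and C8 ∩ Condiment = ∅.
    bits, node = {root: 1}, root
    while tree.get(node):
        children = list(tree[node])
        probs = predict_proba(x, children)
        for c in children:
            bits[c] = 0
        scores = sorted(probs.values())
        if scores[-1] / scores[0] < 2:          # nearly uniform: no child fits
            tree[node].append(new_name)         # inclusion: new_name ⊆ node
            mutex.extend((new_name, c) for c in children)  # mutex with siblings
            bits[new_name] = 1                  # assign x to the new class
            return bits
        best = max(probs, key=probs.get)
        bits[best] = 1
        node = best
    return bits
```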
What are we trying to optimize?
Objective function:
  maximize over m and params {C1 … Cm}: { log data likelihood − model penalty }
  subject to class constraints Zm
  (m: number of clusters)
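
In code form, the quantity being maximized per candidate model might look like the following sketch; with the AICc penalty from the extra slides, maximizing it is equivalent to minimizing AICc:

```python
def penalized_objective(loglik, v, n):
    # Log data likelihood minus a model penalty. Using half the AICc
    # penalty, so maximizing this is equivalent to minimizing
    # AICc(g) = -2*L(g) + 2*v + 2*v*(v+1)/(n - v - 1).
    return loglik - (v + v * (v + 1) / (n - v - 1))
```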
Datasets
[Diagrams: Ontology 1 and Ontology 2]

Dataset  #Classes  #Levels  #NELL entities  #Contexts
DS-1     11        3        2.5K            3.4M
DS-2     39        4        12.9K           6.7M

Data: Clueweb09 corpus + subsets of NELL
Results

Macro-averaged Seed Class F1:

Dataset  #Train/#Test  Level  #Seed/#Ideal        FLAT                        DAC
         Points               Classes       SemisupEM  ExploratoryEM  SemisupEM  ExploratoryEM
DS-1     335/2.2K      2      2/3           43.2       78.7 *         69.5       77.2 *
                       3      4/7           34.4       42.6 *         31.3       44.4 *
DS-2     1.5K/11.4K    2      3.9/4         64.3       53.4           65.4       68.9 *
                       3      9.4/24        31.3       33.7 *         34.9       41.7 *
                       4      2.4/10        27.5       38.9 *         43.2       42.4
Conclusions
 Hierarchical Exploratory EM works with an incomplete class hierarchy and a few seed instances to extend the existing knowledge base.
 Encouraging preliminary results:
   Hierarchical classification ≥ flat classification
   Exploratory learning ≥ semi-supervised learning
 Future work:
   Incorporate arbitrary class constraints
   Evaluate the newly added clusters
Thank You
Questions?
Extra Slides
Class Creation Criterion
Given P(Cj | Xi), j = 1 … k, and Puniform = [1/k … 1/k]:
 MinMax ratio: max_j P(Cj | Xi) / min_j P(Cj | Xi) < 2
 Jensen-Shannon divergence: JSDiv(P(Cj | Xi), Puniform) < 1/k
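
Both tests are easy to compute; the slide presents them as alternative criteria. A sketch follows; it assumes strictly positive posteriors, and base-2 logarithms are an assumption since the slide does not fix the base:

```python
import numpy as np

def js_divergence(p, q):
    # Jensen-Shannon divergence between two distributions (base-2 logs).
    m = (p + q) / 2
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return (kl(p, m) + kl(q, m)) / 2

def near_uniform(posteriors):
    # posteriors: P(Cj | Xi) for j = 1..k, assumed strictly positive.
    p = np.asarray(posteriors, dtype=float)
    k = len(p)
    minmax = p.max() / p.min() < 2                    # MinMax-ratio criterion
    js = js_divergence(p, np.full(k, 1 / k)) < 1 / k  # JS-divergence criterion
    return minmax, js

print(near_uniform([0.55, 0.45]))  # both criteria fire: nearly uniform
print(near_uniform([0.90, 0.10]))  # MinMax-ratio criterion does not fire
```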
Model Selection
Extended Akaike Information Criterion:
  AICc(g) = -2*L(g) + 2*v + 2*v*(v+1) / (n - v - 1)
where
  g: model being evaluated,
  L(g): log-likelihood of the data given g,
  v: number of free parameters of the model,
  n: number of data points.
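
A direct transcription of the formula, with the revert check it drives in the main loop sketched in a comment (illustrative names):

```python
def aicc(loglik, v, n):
    # Extended Akaike Information Criterion, exactly as on the slide.
    return -2 * loglik + 2 * v + 2 * v * (v + 1) / (n - v - 1)

# In the EM loop: keep the model from iteration t only if AICc improved,
# otherwise revert to the model from iteration t-1, e.g.
#   if aicc(L_t, v_t, n) >= aicc(L_prev, v_prev, n):
#       model = previous_model
```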
