Transcript [Slides]
CLASSIFYING ENTITIES INTO
AN INCOMPLETE ONTOLOGY
Bhavana Dalvi, William W. Cohen, Jamie Callan
School of Computer Science,
Carnegie Mellon University
Motivation
Existing Techniques
Semi-supervised hierarchical classification: Carlson et al., WSDM’10
Extending knowledge bases: finding new relations or attributes of existing concepts: Mohamed et al., EMNLP’11
Unsupervised ontology discovery: Adams et al., NIPS’10; Blei et al., JACM’10; Reisinger et al., ACL’09
Evolving Web-scale datasets
Billions of entities and hundreds of thousands of concepts
Difficult to create a complete ontology
Hierarchical classification of entities into incomplete ontologies is needed
Contributions
Hierarchical Exploratory EM
Adds new instances to the existing classes
Discovers new classes and adds them at appropriate places in the ontology
Class constraints:
Inclusion: every entity that is a “Mammal” is also an “Animal”
Mutual exclusion: if an entity is an “Electronic Device”, then it is not a “Mammal”
Problem Definition
Input
Large set of data-points: X_1 … X_n
Some known classes: C_1 … C_k
Class constraints Z_k among the k classes
Small number of seeds per known class: |seeds| ≪ n
Output
Labels for all data-points X_i
New classes discovered from the data: C_{k+1} … C_{k+m}, with k + m ≪ n
Updated class constraints: Z_k ⊆ Z_{k+m}
Review: Exploratory EM
[Dalvi et al. ECML 2013]
Classification/clustering
Initialize model with a few seeds per class (KMeans, NBayes, VMF, …)
Iterate till convergence (data likelihood and # classes):
E step: predict labels for unlabeled points
If P(C_j | X_i), j = 1 … k, is nearly uniform for a data-point X_i (measured by the max/min ratio or by Jensen-Shannon divergence), create a new class C_{k+1} and assign X_i to it
M step: recompute model parameters using seeds + predicted labels for unlabeled points
The number of classes might increase in each iteration
Check whether the model selection criterion (AIC, BIC, AICc, …) is satisfied; if not, revert to the model from iteration t−1
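To make the E step concrete, here is a minimal Python sketch of the exploratory assignment rule. It assumes the posteriors P(C_j | X_i) have already been computed by the base model (KMeans, NBayes, VMF, …); the function name, the max/min threshold of 2, and the policy of opening one fresh class per near-uniform point are illustrative, not the authors' exact implementation.

```python
import numpy as np

def exploratory_e_step(posteriors, minmax_ratio=2.0):
    """One exploratory E step over precomputed class posteriors.

    posteriors: (n, k) array; row i holds P(C_j | X_i) for the k current
    classes (assumed strictly positive).  Points whose posterior is nearly
    uniform (max/min ratio below the threshold) are assigned to a brand-new
    class instead of an existing one.
    """
    n, k = posteriors.shape
    labels = np.empty(n, dtype=int)
    next_class = k                               # 0..k-1 are existing classes
    for i, p in enumerate(posteriors):
        if p.max() / p.min() < minmax_ratio:     # nearly uniform: explore
            labels[i] = next_class               # open a new class for X_i
            next_class += 1
        else:
            labels[i] = int(p.argmax())          # exploit the best class
    return labels
```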
Hierarchical Exploratory EM
Initialize model with few seeds per class
Iterate till convergence (Data likelihood and # classes)
E step: Predict labels for unlabeled points
Assign a consistent bit vector of labels to each unlabeled data-point
If P(C_candidate | X_i) is nearly uniform for a data-point X_i, create a new class C_new and assign X_i to it
Update class constraints accordingly
M step: recompute model parameters using seeds + predicted labels for unlabeled points
The number of classes might increase in each iteration; since the E step already follows the class constraints, the M step need not be modified
Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t−1
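The consistent bit vector amounts to the indicator of one root-to-leaf path through the ontology, which satisfies both constraint types by construction. A minimal sketch, using the example hierarchy from the following slides (the dictionary representation and helper name are illustrative):

```python
def path_to_bit_vector(class_index, parent, leaf):
    """Encode one consistent labeling as a bit vector.

    class_index: dict mapping class name -> position in the bit vector.
    parent:      dict mapping class name -> its parent (None for the root).
    leaf:        the most specific class chosen for the data-point.

    Inclusion holds because every ancestor of the chosen class is set to 1;
    mutual exclusion holds because only one class per level is on the path.
    """
    bits = [0] * len(class_index)
    node = leaf
    while node is not None:            # walk up from the leaf to the root
        bits[class_index[node]] = 1
        node = parent[node]
    return bits

# Example: the hierarchy from the slides, with "California" labeled State.
parent = {"Root": None, "Location": "Root", "Food": "Root",
          "State": "Location", "Country": "Location",
          "Vegetable": "Food", "Condiment": "Food"}
class_index = {c: i for i, c in enumerate(parent)}
print(path_to_bit_vector(class_index, parent, "State"))
# -> [1, 1, 0, 1, 0, 0, 0]: Root, Location, State on; all others off
```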
Divide-And-Conquer Exploratory EM
Assumptions:
Classes are arranged in a tree-structured hierarchy.
Classes at any level of the hierarchy are mutually exclusive.
[Figure: example ontology. Level 1: Root. Level 2: Location, Food. Level 3: State and Country under Location; Vegetable and Condiment under Food, with example instances Spinach, Potato, Pepper, … Parent-child edges encode inclusion; sibling classes are mutually exclusive.]
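Under these assumptions the E step decomposes into a top-down routing problem, one multinomial choice per level. A minimal sketch, assuming a `posterior` callback that supplies a positive score P(child | x) from the current model parameters; the max/min test and the new-class placeholder string are illustrative:

```python
import numpy as np

def dac_classify(x, node, children, posterior, minmax_ratio=2.0):
    """Route data-point x down the class tree, one level at a time.

    children : dict mapping a class to its list of child classes.
    posterior: callback returning a positive score P(child | x) for each
               child of `node`.
    Returns the root-to-leaf path of classes assigned to x.  When the
    posterior over a node's children is nearly uniform, a placeholder for
    a newly created child class ends the path instead.
    """
    kids = children.get(node, [])
    if not kids:                                   # reached a leaf class
        return [node]
    probs = np.array([posterior(x, c) for c in kids], dtype=float)
    probs /= probs.sum()
    if len(kids) > 1 and probs.max() / probs.min() < minmax_ratio:
        return [node, f"new class under {node}"]   # explore: open a new class
    best = kids[int(probs.argmax())]               # exploit the best child
    return [node] + dac_classify(x, best, children, posterior, minmax_ratio)
```

In the walkthrough that follows, a callback returning 0.1 / 0.9 for Location / Food routes “Coke” into Food, and the near-uniform 0.55 / 0.45 split over Vegetable and Condiment then triggers a new class.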
Divide-And-Conquer Exploratory EM
[Figure: classifying “California”. The entity enters at Root with probability 1.0; the candidate children are Location and Food.]
Divide-And-Conquer Exploratory EM
[Figure: at level 2, “California” scores Location 0.9 vs. Food 0.1, so it is routed into the Location subtree.]
Divide-And-Conquer Exploratory EM
[Figure: within Location, “California” scores State 0.8 vs. Country 0.2, yielding the consistent label bit vector (Root, Location, Food, State, Country, Vegetable, Condiment) = (1, 1, 0, 1, 0, 0, 0).]
Divide-And-Conquer Exploratory EM
[Figure: classifying “Coke”. The entity enters at Root with probability 1.0.]
Divide-And-Conquer Exploratory EM
[Figure: at level 2, “Coke” scores Food 0.9 vs. Location 0.1, so it is routed into the Food subtree.]
Divide-And-Conquer Exploratory EM
[Figure: within Food, “Coke” scores Vegetable 0.55 vs. Condiment 0.45, a nearly uniform posterior.]
Divide-And-Conquer Exploratory EM
[Figure: since the posterior over Food’s children is nearly uniform, a new class C8 is created under Food and “Coke” is assigned to it; the resulting bit vector sets Root, Food, and C8 to 1 and all other classes to 0.]
Divide-And-Conquer Exploratory EM
Creating C8 adds to the class constraints:
C8 ⊆ Food
C8 ∩ Condiment = ∅ …
[Figure: the updated tree, with C8 as a new child of Food holding “Coke”.]
Divide-And-Conquer Exploratory EM
[Figure: classifying “Cat”. At level 2 the posterior is nearly uniform (Location 0.45 vs. Food 0.55), so a new class C9 is created directly under Root and “Cat” is assigned to it.]
Creating C9 adds to the class constraints:
C9 ⊆ Root
C9 ∩ Food = ∅ …
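A minimal sketch of the constraint bookkeeping when a new class such as C8 or C9 is created. Representing the inclusion and mutual-exclusion constraints as Python sets is my assumption; the slides leave the data structure unspecified.

```python
def add_new_class(new, parent, parent_of, children, subset_of, mutex):
    """Insert `new` under `parent` and extend the class constraints Z.

    subset_of: set of (class, ancestor) inclusion pairs, e.g. (C8, Food).
    mutex:     set of unordered pairs of mutually exclusive classes.
    """
    # Mutual exclusion with every existing sibling, e.g. C8 vs. Condiment.
    for sib in children.get(parent, []):
        mutex.add(frozenset((new, sib)))
    # Inclusion up the tree: the new class is a subset of all its ancestors.
    anc = parent
    while anc is not None:
        subset_of.add((new, anc))
        anc = parent_of.get(anc)
    # Hook the new class into the tree.
    children.setdefault(parent, []).append(new)
    parent_of[new] = parent

# Example from the slides: creating C8 under Food.
parent_of = {"Root": None, "Location": "Root", "Food": "Root"}
children = {"Root": ["Location", "Food"], "Food": ["Vegetable", "Condiment"]}
subset_of, mutex = set(), set()
add_new_class("C8", "Food", parent_of, children, subset_of, mutex)
# subset_of now contains (C8, Food) and (C8, Root);
# mutex now contains {C8, Vegetable} and {C8, Condiment}.
```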
What are we trying to optimize?
Objective function:
Maximize { log data likelihood − model penalty }
over m (the number of clusters) and the parameters of C_1 … C_m,
subject to the class constraints Z_m
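Written out (in LaTeX, with notation assembled from the problem-definition slide; the exact form of the penalty term is whichever model-selection criterion is in use, e.g. AICc):

```latex
\max_{m,\;\theta_{C_1},\dots,\theta_{C_m}}
  \Big[\, \log P(X_1,\dots,X_n \mid \theta, m)
          \;-\; \mathrm{Penalty}(m,\theta) \,\Big]
\qquad \text{subject to the class constraints } Z_m .
```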
Datasets
[Figure: Ontology 1 and Ontology 2, the class hierarchies used for the two datasets.]

Dataset   #Classes   #Levels   #NELL entities   #Contexts
DS-1      11         3         2.5K             3.4M
DS-2      39         4         12.9K            6.7M

Source: Clueweb09 corpus + subsets of NELL
Results
Macro-averaged Seed Class F1:

Dataset   #Train/#Test   Level   #Seed/#Ideal   FLAT                        DAC
          Points                 Classes        SemisupEM   ExploratoryEM   SemisupEM   ExploratoryEM
DS-1      335/2.2K       2       2/3            43.2        78.7 *          69.5        77.2 *
                         3       4/7            34.4        42.6 *          31.3        44.4 *
DS-2      1.5K/11.4K     2       3.9/4          64.3        53.4            65.4        68.9 *
                         3       9.4/24         31.3        33.7 *          34.9        41.7 *
                         4       2.4/10         27.5        38.9 *          43.2        42.4
Conclusions
Hierarchical Exploratory EM works with an incomplete class hierarchy and a few seed instances to extend the existing knowledge base.
Encouraging preliminary results:
Hierarchical classification ≥ flat classification
Exploratory learning ≥ semi-supervised learning
Future work:
Incorporate arbitrary class constraints
Evaluate the newly added clusters
Thank You
Questions?
Extra Slides
Class Creation Criterion
Given P(C_j | X_i), j = 1 … k, and P_uniform = [1/k, …, 1/k]:
MinMax ratio: max_j P(C_j | X_i) / min_j P(C_j | X_i) < 2
Jensen-Shannon divergence: JS-Div( P(C_j | X_i), P_uniform ) < 1/k
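Both tests as a short Python sketch. Treating the two criteria as interchangeable alternatives and assuming strictly positive posteriors (so the ratio and logarithms are defined) is my reading of the slide:

```python
import numpy as np

def minmax_nearly_uniform(p, ratio=2.0):
    """MinMax criterion: fires when max/min of the posterior is below `ratio`."""
    p = np.asarray(p, dtype=float)        # P(C_j | X_i), assumed > 0
    return p.max() / p.min() < ratio

def js_nearly_uniform(p):
    """Jensen-Shannon criterion: fires when JS(p, uniform) < 1/k."""
    p = np.asarray(p, dtype=float)        # P(C_j | X_i), assumed > 0
    k = len(p)
    u = np.full(k, 1.0 / k)
    m = 0.5 * (p + u)                     # the JS mixture distribution
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))  # KL divergence (bits)
    return 0.5 * kl(p, m) + 0.5 * kl(u, m) < 1.0 / k
```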
Model Selection
Extended Akaike Information Criterion
AICc(g) = −2·L(g) + 2·v + 2·v·(v+1) / (n − v − 1)
where
g: the model being evaluated,
L(g): log-likelihood of the data given g,
v: number of free parameters of the model,
n: number of data-points.
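As a direct Python transcription of the formula above (lower AICc indicates a better likelihood/complexity trade-off):

```python
def aicc(L, v, n):
    """Extended Akaike Information Criterion, as on the slide.

    L: log-likelihood of the data given the model g
    v: number of free parameters of the model
    n: number of data-points (the correction term requires n > v + 1)
    """
    return -2.0 * L + 2.0 * v + 2.0 * v * (v + 1) / (n - v - 1)
```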