Transcript [Poster]
Exploratory Learning
Semi-supervised Learning in the presence of unanticipated classes
Bhavana Dalvi, William W. Cohen, Jamie Callan
School of Computer Science, Carnegie Mellon University
Motivation
Multi-class semi-supervised learning faces two problems: the number of natural classes present in the data might not be known, and there may be no labeled data for some of the classes. Exploratory Learning extends the semi-supervised EM algorithm by dynamically adding new classes when appropriate. It thus uses existing knowledge in the form of seeds, and discovers clusters belonging to unknown classes.

Example of Semantic Drift
Seeds: (Country: USA, Japan, India…), (State: CA, PA, MN etc.). The unlabeled data contains instances of Country, State, City, Museums etc. With no classes to hold the unanticipated types, ``States'' might end up collecting ``Cities'' or even other kinds of locations: this is semantic drift.

The Exploratory EM Algorithm
1. Initialize the model with a few seeds per class.
2. Iterate till convergence:
   E Step: Predict labels for unlabeled points.
     For i = 1 : n
       If P(Cj | Xi) is nearly uniform for data-point Xi, j = 1 to k:
         Create a new class Ck+1, assign Xi to it.
       Else:
         Assign Xi to Argmax_{Cj} P(Cj | Xi).
   M Step: Re-compute model parameters using seeds and predicted labels for unlabeled data-points.
   Check if the model selection criterion (data likelihood and number of classes) is satisfied; if not, revert to the model in iteration `t-1'.
Note that the number of classes might increase in each iteration.
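To make the E step concrete, here is a minimal Python sketch (ours, not the authors' code). It assumes hard assignments, a precomputed (n, k) matrix of posteriors P(Cj | Xi), and, for simplicity, routes all nearly-uniform points of one E step into a single new class Ck+1; the function names are our own.

import numpy as np

def is_nearly_uniform(posterior, ratio=2.0):
    # MinMax criterion from the poster: "nearly uniform" when the
    # max/min posterior ratio falls below 2 (assumes posteriors > 0).
    return posterior.max() / posterior.min() < ratio

def exploratory_e_step(posteriors):
    # posteriors: (n, k) array of P(Cj | Xi) under the current k classes.
    # Returns hard labels in 0..k, where label k denotes the new class Ck+1.
    n, k = posteriors.shape
    labels = np.empty(n, dtype=int)
    for i in range(n):
        if is_nearly_uniform(posteriors[i]):
            labels[i] = k                      # open / reuse new class Ck+1
        else:
            labels[i] = int(posteriors[i].argmax())
    return labels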
Extending existing SSL methods

Seeded Naïve Bayes
  Model: Multinomial model {P(Cj | Xi)}
  Semi-supervised version:
    label(Xi) = Argmax_{Cj=1..k} P(Cj | Xi)
  Exploratory version:
    if (P(Cj | Xi) is nearly uniform)
      label(Xi) = Ck+1
    else
      label(Xi) = Argmax_{Cj=1..k} P(Cj | Xi)

Seeded K-Means
  Model: Features: L1-normalized TFIDF vectors; Similarity: dot product (centroid, data-point)
  Semi-supervised version:
    Assign Xi to closest centroid Cj
  Exploratory version:
    if (Xi is nearly equidistant from all centroids)
      Create new cluster Ck+1 and put Xi in it
    else
      Assign Xi to closest centroid

Seeded von Mises-Fisher
  Model: Distribution of data on the unit hypersphere
  Semi-supervised version:
    label(Xi) = Argmax_{Cj=1..k} P(Cj | Xi)
  Exploratory version:
    Extension similar to Naïve Bayes, based on near-uniformity of P(Cj | Xi)
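As an illustration of the K-Means row above, a sketch of the exploratory assignment step. The `ratio` threshold for "nearly equidistant" is our assumption (the poster does not state one), as is initializing the new centroid at the data-point itself.

import numpy as np

def exploratory_assign(x, centroids, ratio=1.1):
    # x: L1-normalized TFIDF vector; centroids: (k, d) matrix.
    # Returns (cluster index, possibly grown centroid matrix).
    sims = centroids @ x          # dot-product similarity to each centroid
    if sims.max() < ratio * sims.min():
        # Nearly equidistant from all centroids: create cluster Ck+1 at x.
        centroids = np.vstack([centroids, x])
        return centroids.shape[0] - 1, centroids
    return int(sims.argmax()), centroids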
Model Selection Criterion
We tried the BIC, AIC, and AICc criteria; AICc worked best.
  BIC(g)  = -2 * L(g) + v * ln(n)
  AIC(g)  = -2 * L(g) + 2 * v
  AICc(g) = AIC(g) + 2 * v * (v+1) / (n - v - 1)
Here g is the model being evaluated, L(g) the log-likelihood of the data given g, v the number of free parameters of the model, and n the number of data-points.
The objective checked across iterations of Exploratory EM uses AICc.
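The three criteria transcribe directly into code; a small sketch using the poster's notation (L is the log-likelihood, v the free-parameter count, n the number of data-points):

import math

def bic(L, v, n):
    return -2.0 * L + v * math.log(n)

def aic(L, v):
    return -2.0 * L + 2.0 * v

def aicc(L, v, n):
    # Small-sample correction on top of AIC; requires n > v + 1.
    return aic(L, v) + (2.0 * v * (v + 1)) / (n - v - 1)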
Datasets

Symbol  Description             20-Newsgroups  Delicious_Sports  Reuters
|X|     # Entities/Documents    18.8K          282               8.3K
|F|     # features              61.2K          721               18.9K
|C|     # classes               20             26                65
When New Classes Are Created
Hypothesis: If P(Cj | Xi) is nearly uniform, then Xi does not belong to any of the existing classes, and hence a new class/cluster needs to be created.
For each data-point Xi, we compute the posterior distribution P(Cj | Xi) of Xi belonging to any of the existing classes C1 … Ck.
Criterion 1: MinMax
  maxP = max(P(Cj | Xi)), minP = min(P(Cj | Xi))
  if (maxP / minP < 2) create a new class/cluster
Criterion 2: JS
  uniP = uniform distribution over k classes = {1/k, 1/k, …, 1/k}
  jsDiv = JS-Divergence(uniP, P(Cj | Xi))
  if (jsDiv < 1/k) create a new class/cluster
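Both triggers as small functions; a sketch assuming strictly positive posterior vectors (the thresholds are the ones stated above):

import numpy as np

def minmax_creates_new_class(p):
    # Criterion 1: nearly uniform when max/min posterior ratio < 2.
    return p.max() / p.min() < 2.0

def js_divergence(p, q):
    # Jensen-Shannon divergence; assumes p, q > 0 elementwise.
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_creates_new_class(p):
    # Criterion 2: JS divergence to the uniform distribution below 1/k.
    k = len(p)
    uni = np.full(k, 1.0 / k)
    return js_divergence(uni, p) < 1.0 / k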
Experimental Results
Hypothesis: Dynamically inducing clusters of data-points that do not belong to any of the seeded classes will reduce the semantic drift on seeded classes.
Comparison of macro-averaged seeded-class F1 (Baseline Semi-supervised EM vs. best-case performance of the improved baseline vs. the Proposed Method): Exploratory EM improves seed-class F1 over Semi-supervised EM on all three publicly available datasets (20-Newsgroups, Delicious_Sports, Reuters); it discovers unseeded clusters while improving seed-class F1.
Varying #seed classes and #seeds per class: as the number of seed classes or the number of seeds per class increases, both methods improve. Exploratory EM is beneficial especially when the amount of supervision is small.
Comparison to Chinese Restaurant Process
Gibbs sampling baseline with CRP:
  Initialize the model using seed data
  for (epoch in 1 to numEpochs) {
    for (item in unlabeled data) {
      Decrement data counts for item and label[epoch-1, item]
      Sample a label from P(label | item), creating a new class via the CRP when needed
      Increment data counts for item, register label[epoch, item]
    }
  }
On 20-Newsgroups, Exploratory EM is better than Gibbs+CRP in terms of seed class F1, run-time, and the #classes produced, and it needs no parameter tuning.
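For contrast with the baseline's inner loop, a sketch of one collapsed CRP label draw (our illustration of the general technique, not the baseline's exact code; `alpha` and the per-class likelihoods are assumptions, and the base-measure likelihood of a brand-new class is folded into `alpha` for brevity):

import numpy as np

def sample_crp_label(likelihoods, counts, alpha=1.0, rng=None):
    # likelihoods: P(item | Cj) for the k existing classes (array of length k);
    # counts: #items currently assigned to each class (array of length k).
    # Returns a label in 0..k, where label k opens a new class.
    rng = rng or np.random.default_rng()
    weights = np.append(counts * likelihoods, alpha)  # CRP: n_c * P(item|c), new ~ alpha
    return int(rng.choice(len(weights), p=weights / weights.sum()))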
Conclusions
We investigate and improve the robustness of SSL methods in a setting in which seeds are available for only a subset of the classes.
Our proposed approach, called Exploratory EM, introduces new classes on-the-fly during learning, based on the intuition that hard-to-classify examples, specifically examples with a nearly-uniform posterior class distribution, belong to new classes.
We showed that this approach outperforms standard Semi-supervised EM approaches on three different publicly available datasets.
We also showed performance improvements over a Gibbs sampling baseline that uses the Chinese Restaurant Process (CRP) to induce new clusters.
In the future, we plan to extend this technique to multi-label, hierarchical, and multi-view classification problems.
Acknowledgements: This work is supported by Google and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-7058.