Transcript [Poster]

Exploratory Learning
Semi-supervised Learning in the presence of unanticipated classes
Bhavana Dalvi, William W. Cohen, Jamie Callan
School of Computer Science, Carnegie Mellon University
Motivation
• Multi-class semi-supervised learning: the number of natural classes present in the data might not be known.
• There may be no labeled data for some of the classes.
• Exploratory Learning extends the semi-supervised EM algorithm by dynamically adding new classes when appropriate.
• It thus uses existing knowledge in the form of seeds, and discovers clusters belonging to unknown classes.

Example of Semantic Drift

• Seeds: (Country: USA, Japan, India…), (State: CA, PA, MN etc.)
• Unlabeled data contains instances of Country, State, City, Museums etc.
• Semantic drift: "States" might end up collecting "Cities" or even other kinds of locations.
The Exploratory EM Algorithm

• Initialize the model with a few seeds per class.
• E Step: predict labels for unlabeled points.
  For i = 1 : n
    If P(Cj | Xi) is nearly uniform for data-point Xi, j = 1 to k
      Create a new class Ck+1 and assign Xi to it
    Else
      Assign Xi to Argmax_{Cj=1..k} P(Cj | Xi)
  The number of classes might increase in each iteration.
• M Step: re-compute model parameters using the seeds and the predicted labels for unlabeled data-points.
• Check if the model selection criterion (data likelihood and number of classes) is satisfied; if not, revert to the model from iteration t-1.
• Iterate till convergence.
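To make the loop concrete, here is a minimal runnable sketch, instantiated with the seeded K-Means variant described under "Extending existing SSL methods" below. The function name, the similarity-ratio stand-in for "nearly equidistant", the default threshold, and the toy data are our illustrative assumptions, not the authors' code.

    import numpy as np

    def exploratory_em_kmeans(X, seed_labels, n_iters=10, ratio=2.0):
        """Sketch of Exploratory EM with seeded K-Means.

        X           : (n, d) array of non-negative, normalized vectors
                      (e.g., L1-normalized TFIDF, as in the K-Means row).
        seed_labels : {point index: class id} for the seeds, ids 0..k-1.
        ratio       : threshold for the 'nearly equidistant' test.
        """
        labels = dict(seed_labels)
        k = max(seed_labels.values()) + 1
        # Initialize one centroid per seeded class.
        centroids = [X[[i for i, c in seed_labels.items() if c == j]].mean(axis=0)
                     for j in range(k)]
        for _ in range(n_iters):
            # E step: label unlabeled points, opening new clusters on demand.
            for i in range(len(X)):
                if i in seed_labels:
                    continue  # seeds keep their labels
                sims = np.array([c @ X[i] for c in centroids])
                if sims.max() / max(sims.min(), 1e-12) < ratio:
                    # Nearly equidistant from all centroids: open C_{k+1}.
                    centroids.append(X[i].copy())
                    labels[i] = len(centroids) - 1
                else:
                    labels[i] = int(sims.argmax())
            # M step: recompute centroids from seeds plus predicted labels.
            for j in range(len(centroids)):
                members = [i for i, l in labels.items() if l == j]
                if members:
                    centroids[j] = X[members].mean(axis=0)
        return labels, len(centroids)

    # Tiny usage example on random non-negative data, two seeded classes.
    X = np.abs(np.random.randn(60, 12))
    X /= X.sum(axis=1, keepdims=True)
    labels, k = exploratory_em_kmeans(X, {0: 0, 1: 0, 2: 1, 3: 1})
    print(k, "classes after exploratory EM")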
When New Classes Are Created

• For each data-point Xi, we compute the posterior distribution P(Cj | Xi) of Xi belonging to each of the existing classes C1 … Ck.
• Hypothesis: if P(Cj | Xi) is nearly uniform, then Xi does not belong to any of the existing classes, hence a new class/cluster needs to be created.
• Criterion 1: MinMax
  maxP = max(P(Cj | Xi)), minP = min(P(Cj | Xi))
  if (maxP / minP < 2) → create a new class/cluster
• Criterion 2: JS
  uniP = uniform distribution over k classes = {1/k, 1/k, …, 1/k}
  jsDiv = JS-Divergence(uniP, P(Cj | Xi))
  if (jsDiv < 1/k) → create a new class/cluster
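A small numpy sketch of the two tests follows; the function names and the eps guard are our additions, while the thresholds (ratio 2 and 1/k) come from the poster.

    import numpy as np

    def new_class_minmax(posterior, ratio=2.0):
        """Criterion 1 (MinMax): the posterior is 'nearly uniform' when its
        largest entry is less than `ratio` times its smallest entry."""
        return posterior.max() / posterior.min() < ratio

    def js_divergence(p, q, eps=1e-12):
        """Jensen-Shannon divergence between discrete distributions p, q.
        The eps guard against log(0) is our addition."""
        p, q = p + eps, q + eps
        m = 0.5 * (p + q)
        kl = lambda a, b: float(np.sum(a * np.log(a / b)))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def new_class_js(posterior):
        """Criterion 2 (JS): create a new class when the divergence from
        the uniform distribution over k classes is below 1/k."""
        k = len(posterior)
        uni = np.full(k, 1.0 / k)
        return js_divergence(uni, posterior) < 1.0 / k

    posterior = np.array([0.26, 0.24, 0.26, 0.24])  # nearly uniform, k=4
    print(new_class_minmax(posterior), new_class_js(posterior))  # True True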
Extending existing SSL methods

• Seeded Naïve Bayes (model: multinomial, posterior {P(Cj | Xi)})
  Semi-supervised version: label(Xi) = Argmax_{Cj=1..k} P(Cj | Xi)
  Exploratory version: if (P(Cj | Xi) is nearly uniform) then label(Xi) = Ck+1, else label(Xi) = Argmax_{Cj=1..k} P(Cj | Xi)
• Seeded K-Means (model: L1-normalized TFIDF vectors; similarity: dot product of centroid and data-point)
  Semi-supervised version: assign Xi to the closest centroid Cj
  Exploratory version: if (Xi is nearly equidistant from all centroids) then create a new cluster Ck+1 and put Xi in it, else assign Xi to the closest centroid
• Seeded von Mises-Fisher (model: distribution of data on the unit hypersphere)
  Semi-supervised version: label(Xi) = Argmax_{Cj=1..k} P(Cj | Xi)
  Exploratory version: extension similar to Naïve Bayes, based on near-uniformity of P(Cj | Xi)
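The Naïve Bayes and von Mises-Fisher rows share one labeling rule; a one-function sketch, reusing either criterion from the earlier block as the `nearly_uniform` test (the helper name is ours):

    def exploratory_label(posterior, k, nearly_uniform):
        """Return an existing class index, or k to open the new class
        C_{k+1}, per the Naive Bayes / von Mises-Fisher rows above."""
        return k if nearly_uniform(posterior) else int(posterior.argmax())

    # e.g. exploratory_label(posterior, k=4, nearly_uniform=new_class_minmax)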
Model Selection Criterion

• We tried the BIC, AIC, and AICc criteria; AICc worked best, so the model selection objective uses AICc.
BIC(g) = -2 * L(g) + v * ln(n)
AIC(g) = -2 * L(g) + 2 * v
AICc(g) = AIC(g) + 2 * v * (v+1) / (n - v - 1)
Here g: model being evaluated, L(g): log-likelihood of the data given g, v: number of free parameters of the model, n: number of data-points.
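The formulas transcribe directly to code (L, v, n as defined above); `accept_model` is our hedged reading of the "check and revert to iteration t-1" step of the algorithm, not a stated implementation.

    import math

    def bic(L, v, n):
        return -2.0 * L + v * math.log(n)

    def aic(L, v):
        return -2.0 * L + 2.0 * v

    def aicc(L, v, n):
        # Assumes n > v + 1, otherwise the correction term blows up.
        return aic(L, v) + 2.0 * v * (v + 1) / (n - v - 1)

    def accept_model(L_new, v_new, L_old, v_old, n):
        """Keep iteration t's model only if its AICc improves on
        iteration t-1's; otherwise the algorithm reverts."""
        return aicc(L_new, v_new, n) < aicc(L_old, v_old, n)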
Datasets

Dataset             # Entities/Documents |X|    # Features |F|    # Classes |C|
20-Newsgroups       18.8K                       61.2K             20
Delicious_Sports    282                         721               26
Reuters             8.3K                        18.9K             65
Experimental Results

Hypothesis: dynamically inducing clusters of data-points that do not belong to any of the seeded classes will reduce the semantic drift on seeded classes.

Comparison: macro-averaged seeded-class F1
• Methods compared: the baseline (Semi-supervised EM), the best-case performance of the improved baseline, and the proposed method (Exploratory EM).
• Exploratory EM improves seeded-class F1 (over Semi-supervised EM) on all three publicly available datasets.
• Exploratory EM discovers unseeded clusters and improves seeded-class F1. [Plot: Delicious_Sports]

Varying #seed classes and #seeds per class [Plot: 20-Newsgroups]
• As the number of seed classes or the number of seeds per class increases, both methods improve.
• Exploratory EM is beneficial especially when the amount of supervision is small.
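The poster does not spell out the metric's implementation; one plausible way to restrict macro-averaged F1 to the seeded classes with scikit-learn, so that newly induced clusters do not enter the average (the function name is ours):

    from sklearn.metrics import f1_score

    def seeded_class_macro_f1(y_true, y_pred, seeded_classes):
        """Macro-averaged F1 computed only over the seeded classes."""
        return f1_score(y_true, y_pred, labels=sorted(seeded_classes),
                        average="macro")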
Comparison to Chinese Restaurant Process

Gibbs sampling baseline (Gibbs+CRP):
• Initialize the model using seed data
• for (epoch in 1 to numEpochs) {
      for (item in unlabeled data) {
          Decrement data counts for item and label[epoch-1, item]
          Sample a label from P(label | item); a new class may be created via the CRP
          Increment data counts for item, register label[epoch, item]
      }
  }
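The poster leaves the sampler's internals implicit; below is a hedged numpy sketch of one inner-loop draw, where the count-times-likelihood weighting and all names are our assumptions about a standard Gibbs+CRP baseline, not the authors' exact implementation.

    import numpy as np

    def gibbs_crp_step(likelihoods, class_counts, alpha=1.0, rng=None):
        """One Gibbs label draw: an existing class j is chosen with
        probability proportional to count_j * P(item | j), and a brand-new
        class with probability proportional to the CRP concentration
        parameter alpha (the parameter Exploratory EM avoids tuning)."""
        rng = rng or np.random.default_rng()
        w = np.append(np.asarray(class_counts, float)
                      * np.asarray(likelihoods, float), alpha)
        w /= w.sum()
        j = int(rng.choice(len(w), p=w))
        return j  # j == len(class_counts) means "create a new class"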
On 20-Newsgroups, Exploratory EM is better than Gibbs+CRP in terms of:
• Seed class F1
• Run-time
• #classes produced
• No parameter tuning required
Conclusions
• We investigate and improve the robustness of SSL methods in a setting in which seeds are available for only a subset of the classes.
• Our proposed approach, called Exploratory EM, introduces new classes on-the-fly during learning, based on the intuition that hard-to-classify examples, specifically examples with a nearly-uniform posterior class distribution, belong to new classes.
• We showed that this approach outperforms standard Semi-supervised EM approaches on three different publicly available datasets.
• We also showed performance improvements over a Gibbs sampling baseline that uses the Chinese Restaurant Process (CRP) to induce new clusters.
• In the future, we plan to extend this technique to multi-label, hierarchical, and multi-view classification problems.
Acknowledgements: This work is supported by Google and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-7058.