Transcript Slides - IJCAI-11
Positive Unlabeled Learning for Time Series Classification
Nguyen Minh Nhut
Xiao-Li Li See-Kiong Ng Institute for Infocomm Research, Singapore
Outline
Introduction and Related Work
The Proposed Technique: Learning from Common Local Clusters (LCLC)
Evaluation Experiments
Conclusions
Discussion and Questions
1. Introduction
Traditional Supervised Learning – Given a set of labeled training examples of n classes, the system uses this set to build a classifier.
– The classifier is then used to classify new examples into the n classes.
Supervised learning typically requires a large number of labeled examples: – Human labeling can be expensive, time-consuming, and sometimes even impossible.
Unlabeled Data
Unlabeled data are usually plentiful.
Can we label only a small number of examples and make use of a large number of unlabeled examples to learn?
Unlabeled data contain information which can be used to improve the classification accuracy.
Positive-Unlabeled (PU) Learning
Positive examples: a set P of labeled examples of the class of interest.
Unlabeled set: a set U of unlabeled (mixed) examples, containing instances both from P's class and not from it (i.e., hidden positive and negative examples).
Build a classifier: use the data in P and U to classify the examples in U, as well as future test data.
PU Learning for Time Series Classification
PU learning is applicable in a wide range of application domains, such as text classification, bio-medical informatics, pattern recognition, and recommender systems. However, the application of PU learning to time series data has been relatively less explored, due to: – High feature correlation.
– Lack of a joint probability distribution over words and collocations (as in texts).
Existing PU Learning for Time Series Classification
Wei, L. and E. Keogh (2006). "Semi-supervised time series classification." ACM SIGKDD.
The idea: let the classifier teach itself using its own predictions.
Unfortunately, without a good stopping criterion, the method often stops too early, resulting in high precision but low recall.
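The self-teaching idea above can be sketched as follows. This is a minimal illustration, not Wei and Keogh's exact procedure: it assumes a nearest-neighbor notion of confidence under Euclidean distance, and it uses a fixed round budget as a stand-in for the missing stopping criterion; the function name `self_train` is illustrative.

```python
import numpy as np

def self_train(P, U, n_rounds):
    """Self-training sketch: repeatedly move the unlabeled series
    nearest to the current positive set into P.
    P, U: lists of 1-D numpy arrays (time series of equal length)."""
    P, U = list(P), list(U)
    for _ in range(n_rounds):        # fixed budget stands in for a
        if not U:                    # proper stopping criterion
            break
        # distance from each unlabeled series to its nearest positive
        dists = [min(np.linalg.norm(u - p) for p in P) for u in U]
        i = int(np.argmin(dists))    # most confident prediction
        P.append(U.pop(i))           # the classifier "teaches itself"
    return P, U
```

Stopping too early here leaves true positives stranded in U, which is exactly the high-precision, low-recall failure mode described above.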
The Proposed Technique: LCLC ( Learning from Common Local Clusters)
The proposed LCLC algorithm addresses two specific issues in PU learning for time series classification:
– Select independent and relevant features from the time series data using a cluster-based approach.
– Accurately extract reliable positive and negative examples from the given unlabeled data.
Local Clustering and Feature Selection
Algorithm 1. Local clustering and feature selection
Input: one initial positive seed s, unlabelled dataset U, number of clusters K
1. Use Wei's method to get an initial positive set P;
2. Partition U into K unlabeled local clusters (K-ULCs) using K-means;
3. Select the K common principal features from the raw feature set using Clever-Cluster(P, K-ULCs);

Yoon, H., K. Yang, et al. (2005). "Feature subset selection and feature ranking for multivariate time series." IEEE Transactions on Knowledge and Data Engineering.
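Step 2 of Algorithm 1 can be sketched with plain K-means over the raw time series, treated as fixed-length vectors. This is a generic sketch rather than the authors' implementation; the function name and the simple Euclidean K-means are assumptions (the CLeVer feature-selection step of line 3 is not shown).

```python
import numpy as np

def kmeans_partition(U, K, n_iter=50, seed=0):
    """Partition the unlabeled set U (n x d array of equal-length
    time series) into K local clusters (the K-ULCs of Algorithm 1).
    Returns per-cluster index arrays and the final centroids."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), K, replace=False)]
    for _ in range(n_iter):
        # assign each series to its nearest centroid (Euclidean)
        d = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster empties
        new = np.array([U[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return [np.flatnonzero(labels == k) for k in range(K)], centers
```

The cluster granularity K matters later: each local cluster is labeled as a whole in Algorithms 2 and 3, so clusters must be small enough to be class-pure.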
Local Clustering and Feature Selection
The cluster-based approach of the proposed LCLC method offers the following advantages:
– It is much more robust than instance-based methods for extracting the likely positives and negatives from U.
– The similarity between two time series can be effectively measured using a well-selected subset of common principal features that capture the underlying characteristics of both the positive and unlabeled clusters.
Extracting the Reliable Negative Set

Algorithm 2. Extracting Reliable Negative Examples
Input: positive data P, K unlabeled local clusters ULC_i
1. RN = ∅; AMBI = ∅;
2. For i = 1 to K
3.   Compute the distance d(ULC_i, P) between local cluster ULC_i and P;
4. Sort d(ULC_i, P) (i = 1, 2, ..., K) in decreasing order;
5. d_Median = the median distance of d(ULC_i, P) (i = 1, 2, ..., K);
6. For i = 1 to K
7.   If d(ULC_i, P) > d_Median
8.     RN = RN ∪ ULC_i;
9.   Else
10.    AMBI = AMBI ∪ ULC_i;
Boundary decision using Cluster chaining approach

Algorithm 3. Identifying likely positive clusters LP and likely negative clusters LN
Input: ambiguous clusters AMBI, positive cluster P, reliable negative cluster set RN
1. LP = ∅; LN = ∅;
2. While (AMBI ≠ ∅)
3.   Find the nearest AMBI cluster C_AMBI,A to P and add C_AMBI,A to cluster-chain_i;
4.   While C_AMBI,A is not in RN
5.     Find the nearest cluster C_AMBI,B (from AMBI ∪ RN) to C_AMBI,A and add C_AMBI,B to cluster-chain_i;
6.     C_AMBI,A = C_AMBI,B;
7.   Loop
8. For all the cluster-chains:
9.   breaking link (C_m, C_m+1) = the link with maximal distance between adjacent clusters in the cluster-chain;
10.  LP = LP ∪ {AMBI clusters from the sub-chain containing P};
     LN = LN ∪ {AMBI clusters from the sub-chain not containing P};
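The chaining-and-breaking idea can be sketched on cluster centroids. This simplified sketch builds a single chain (Algorithm 3 may build several) and assumes at least one reliable-negative cluster exists; the function name and centroid representation are illustrative.

```python
import numpy as np

def chain_and_split(P_cent, ambi, rn):
    """Algorithm 3 sketch: starting from P, greedily chain nearest
    centroids until a reliable-negative (RN) centroid is reached,
    then break the chain at its longest link. Ambiguous clusters on
    P's side become likely positives (LP), the rest likely
    negatives (LN)."""
    chain = [np.asarray(P_cent, float)]
    cur = chain[0]
    pool = ([(np.asarray(a, float), 'AMBI') for a in ambi] +
            [(np.asarray(r, float), 'RN') for r in rn])
    used = [False] * len(pool)
    while True:
        # nearest unused cluster (ambiguous or RN) to the chain end
        i = min((j for j in range(len(pool)) if not used[j]),
                key=lambda j: np.linalg.norm(pool[j][0] - cur))
        used[i] = True
        chain.append(pool[i][0])
        cur = pool[i][0]
        if pool[i][1] == 'RN':        # chain terminates at an RN cluster
            break
    # break at the link with maximal inter-cluster distance
    gaps = [np.linalg.norm(chain[m + 1] - chain[m])
            for m in range(len(chain) - 1)]
    cut = int(np.argmax(gaps))
    LP = chain[1:cut + 1]             # ambiguous clusters on P's side
    LN = chain[cut + 1:-1]            # ambiguous clusters on RN's side
    return LP, LN
```

Because the split depends on a whole chain of inter-cluster gaps rather than a single threshold, one noisy example shifts the boundary far less than it would in an instance-based decision.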
Boundary decision using Cluster chaining approach
Minimize the effect of possible noisy examples.
Offer a robust solution for the cases of severely unbalanced positive and negative examples in the unlabeled dataset U .
EMPIRICAL EVALUATION
Datasets:

Name           Training set      Testing set       Num of
               Pos      Neg      Pos      Neg      Features
ECG            208      602      312      904      86
Word Spotting  109      796      109      796      272
Wafer          381      3201     381      3201     152
Yoga           156      150      156      150      428
CBF            155      310      155      310      128

Wei, L. (2007). "Self Training dataset." http://alumni.cs.ucr.edu/~wli/selfTraining/ .
Keogh, E. (2008). "The UCR Time Series Classification/ Clustering Homepage" http://www.cs.ucr.edu/~eamonn/time_series_data/ .
Experiment setting
We randomly select just one seed instance from the positive class for the learning phase; the rest of the training data are treated as unlabeled data.
We build a 1-NN classifier using P together with LP as a positive training set, and RN together with LN as a negative training set.
We repeat our experiments 30 times and report the average values of the 30 results.
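The final classifier described above is plain 1-NN over the assembled training sets. A minimal sketch, assuming Euclidean distance and equal-length series (function and parameter names are illustrative):

```python
import numpy as np

def one_nn_predict(X_test, pos_train, neg_train):
    """1-NN classifier sketch: pos_train plays the role of P with LP,
    neg_train the role of RN with LN. Each test series gets the label
    (1 = positive, 0 = negative) of its nearest training series."""
    train = np.vstack([pos_train, neg_train])
    y = np.array([1] * len(pos_train) + [0] * len(neg_train))
    preds = []
    for x in X_test:
        preds.append(y[np.argmin(np.linalg.norm(train - x, axis=1))])
    return np.array(preds)
```

With 1-NN, classification quality rests entirely on how cleanly LCLC separates LP from LN, which is what the ablations below (without feature selection, without cluster chaining) measure.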
Overall performance
Dataset        Wei's    Ratana's  LCLC     LCLC     LCLC
               method   method    wo FS    wo CC
ECG            0.405    0.840     0.631    0.781    0.867
Word Spotting  0.279    0.637     0.608    0.52     0.727
Wafer          0.433    0.080     0.637    0.808    0.724
Yoga           0.466    0.626     0.599    0.32     0.854
CBF            0.201    0.309     0.699    0.586    0.701
(All values are F-measures.)

Wei, L. and E. Keogh (2006); Ratanamahatana, C. and D. Wanichsan (2008).
Sensitivity of the size of local clusters
We set the number of clusters K = Size(U) / ULC_size, where ULC_size is the size of the unlabeled local clusters.
Conclusions
There are three key approaches that underlie LCLC’s improved classification performance over existing methods.
First, LCLC adopts a cluster-based method that is much more robust than instance-based PU learning methods.
Secondly, we have adopted a feature selection strategy that takes into account the characteristics of both the positive and unlabeled clusters.
Finally, we have devised a novel cluster chaining approach to extract the boundary positive and negative clusters.