
Positive Unlabeled Learning for Time Series Classification

Nguyen Minh Nhut, Xiao-Li Li, See-Kiong Ng
Institute for Infocomm Research, Singapore

Outline

• Introduction and Related Work
• The Proposed Technique: Learning from Common Local Clusters (LCLC)
• Evaluation Experiments
• Conclusions
• Discussion and Questions

1. Introduction

• Traditional supervised learning:
  – Given a set of labeled training examples from n classes, the system uses this set to build a classifier.
  – The classifier is then used to classify new examples into the n classes.
• It typically requires a large number of labeled examples:
  – Human labeling can be expensive, time consuming, and sometimes even impossible.

Unlabeled Data

• Unlabeled data are usually plentiful.
• Can we label only a small number of examples and make use of a large number of unlabeled examples to learn?
• Unlabeled data contain information that can be used to improve classification accuracy.

Positive-Unlabeled (PU) Learning

• Positive set P: a set of labeled examples from the class of interest.
• Unlabeled set U: a (mixed) set of unlabeled examples, containing instances both from the class of interest and from outside it (negative examples).
• Goal: build a classifier using the data in P and U to classify U as well as future test data.

PU Learning for Time Series Classification

• PU learning is applicable in a wide range of application domains, such as text classification, bio-medical informatics, pattern recognition, and recommendation systems.
• However, the application of PU learning to time series data has been relatively less explored due to:
  – High feature correlation.
  – The lack of a joint probability distribution over words and collocations (as in text).

Existing PU Learning for Time Series Classification

• Wei, L. and Keogh, E. (2006). "Semi-supervised time series classification." ACM SIGKDD.
• The idea: let the classifier teach itself using its own predictions (self-training).
• Unfortunately, without a good stopping criterion, the method often stops too early, resulting in high precision but low recall.

The Proposed Technique: LCLC (Learning from Common Local Clusters)

• The proposed LCLC algorithm addresses two specific issues in PU learning for time series classification:
  – Selecting independent and relevant features from the time series data using a cluster-based approach.
  – Accurately extracting reliable positives and negatives from the given unlabeled data.

Local Clustering and Feature Selection

Algorithm 1. Local clustering and feature selection
Input: one initial seed positive example s, unlabelled dataset U, number of clusters K
1. Use Wei's method to get an initial positive set P;
2. K-ULCs ← partition U into K local clusters using K-means;
3. Select K common principal features from the raw feature set: Clever-Cluster(P, K-ULCs);

Yoon, H., Yang, K., et al. (2005). "Feature subset selection and feature ranking for multivariate time series." IEEE Transactions on Knowledge and Data Engineering.
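The local clustering step (Step 2) can be illustrated with a short sketch. This is a minimal illustration rather than the authors' implementation: it assumes the unlabeled series are stored as rows of a NumPy array and uses scikit-learn's KMeans; the CLeVer-style common-principal-feature selection of Step 3 is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def partition_into_local_clusters(U, K, random_state=0):
    """Partition the unlabeled set U (n_series x n_timepoints) into K local clusters (the K-ULCs)."""
    km = KMeans(n_clusters=K, n_init=10, random_state=random_state)
    labels = km.fit_predict(U)
    # Each local cluster is represented by the row indices of U assigned to it.
    return [np.where(labels == k)[0] for k in range(K)]

# Example with synthetic data standing in for the unlabeled time series.
U = np.random.randn(200, 86)                  # 200 unlabeled series of length 86
ulcs = partition_into_local_clusters(U, K=20)
```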

Local Clustering and Feature Selection

The cluster-based approach of the proposed LCLC method offers the following advantages:
• It is much more robust than instance-based methods for extracting the likely positives and negatives from U.
• The similarity between two time series can be effectively measured using a well-selected subset of common principal features that capture the underlying characteristics of both the positive and unlabeled clusters.

Extracting the Reliable Negative Set

Algorithm 2. Extracting reliable negative examples
Input: positive data P, K unlabeled local clusters ULC_i
1. RN = ∅; AMBI = ∅;
2. For i = 1 to K
3.   Compute the distance d(ULC_i, P) between local cluster ULC_i and P;
4. Sort d(ULC_i, P) (i = 1, 2, …, K) in decreasing order;
5. d_Median = the median of d(ULC_i, P) (i = 1, 2, …, K);
6. For i = 1 to K
7.   If (d(ULC_i, P) > d_Median)
8.     RN = RN ∪ ULC_i;
9.   Else
10.    AMBI = AMBI ∪ ULC_i;
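A minimal sketch of Algorithm 2 follows. The slides do not specify how the cluster-to-P distance d(ULC_i, P) is computed, so the Euclidean distance between the cluster centroid and the centroid of P is used here purely as an illustrative assumption.

```python
import numpy as np

def extract_reliable_negatives(P, ulcs, U):
    """Algorithm 2 sketch: split the unlabeled local clusters into RN (reliable negatives) and AMBI.

    P    : 2-D array of positive series (over the selected features)
    ulcs : list of index arrays, one per unlabeled local cluster
    U    : 2-D array of unlabeled series (over the same features)
    """
    p_centroid = P.mean(axis=0)
    # Distance from each local cluster to P; here the centroid-to-centroid Euclidean distance.
    dists = np.array([np.linalg.norm(U[idx].mean(axis=0) - p_centroid) for idx in ulcs])
    d_median = np.median(dists)              # steps 4-5: the median cluster-to-P distance
    RN   = [idx for idx, d in zip(ulcs, dists) if d > d_median]    # far from P: reliable negatives
    AMBI = [idx for idx, d in zip(ulcs, dists) if d <= d_median]   # close to P: ambiguous clusters
    return RN, AMBI
```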

Boundary decision using Cluster chaining approach

Algorithm 3. Identifying likely positive clusters LP and likely negative clusters LN
Input: ambiguous clusters AMBI, positive cluster P, reliable negative cluster set RN
1. LP = ∅; LN = ∅;
2. While (AMBI != ∅)
3.   Find the nearest AMBI cluster C_AMBI,A to P and add C_AMBI,A to cluster-chain_i;
4.   While (C_AMBI,A ∉ RN)
5.     Find the nearest cluster C_AMBI,B (from AMBI ∪ RN) to C_AMBI,A and add C_AMBI,B to cluster-chain_i;
6.     C_AMBI,A = C_AMBI,B;
7.   Loop
8. For all the cluster-chains:
9.   Breaking link (C_m, C_m+1) = the link with the maximal distance between consecutive clusters in the cluster-chain;
10.  LP ← the AMBI clusters from the chain segment connected with P; LN ← the AMBI clusters from the chain segment not connected with P.

(Figure: cluster chains running from P through AMBI clusters to RN clusters, with the decision boundary placed at the breaking link.)

Boundary decision using Cluster chaining approach

• Minimizes the effect of possible noisy examples.
• Offers a robust solution for cases of severely unbalanced positive and negative examples in the unlabeled dataset U.
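A sketch of the cluster chaining idea, again under illustrative assumptions: clusters are referred to by ids with precomputed centroids, the distance between clusters is taken as the Euclidean distance between centroids, cluster ids are assumed distinct across AMBI and RN, and RN is assumed non-empty so every chain eventually reaches a reliable negative cluster.

```python
import numpy as np

def cluster_chaining(p_centroid, ambi_centroids, rn_centroids):
    """Algorithm 3 sketch: split the ambiguous clusters into likely positives (LP) and likely negatives (LN).

    p_centroid     : centroid vector of the positive set P
    ambi_centroids : dict {cluster_id: centroid} for the ambiguous clusters
    rn_centroids   : dict {cluster_id: centroid} for the reliable negative clusters
    """
    dist = lambda a, b: float(np.linalg.norm(a - b))
    LP, LN = [], []
    ambi = dict(ambi_centroids)
    while ambi:
        # Start a chain at the ambiguous cluster nearest to P.
        start = min(ambi, key=lambda c: dist(ambi[c], p_centroid))
        chain = [(start, ambi.pop(start))]
        # Extend the chain with the nearest remaining AMBI or RN cluster until an RN cluster is reached.
        while chain[-1][0] not in rn_centroids:
            candidates = {**ambi, **rn_centroids}
            nxt = min(candidates, key=lambda c: dist(candidates[c], chain[-1][1]))
            chain.append((nxt, candidates[nxt]))
            ambi.pop(nxt, None)
        # Break the chain at the link with the maximal distance; the side connected to P is LP, the rest LN.
        gaps = [dist(chain[i][1], chain[i + 1][1]) for i in range(len(chain) - 1)]
        cut = int(np.argmax(gaps))
        LP.extend(cid for cid, _ in chain[:cut + 1] if cid not in rn_centroids)
        LN.extend(cid for cid, _ in chain[cut + 1:] if cid not in rn_centroids)
    return LP, LN
```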

EMPIRICAL EVALUATION

Datasets

Name            Training set           Testing set            Num of
                Positive   Negative    Positive   Negative    Features
ECG               208        602         312        904          86
Word Spotting     109        796         109        796         272
Wafer             381       3201         381       3201         152
Yoga              156        150         156        150         428
CBF               155        310         155        310         128

Wei, L. (2007). "Self Training dataset." http://alumni.cs.ucr.edu/~wli/selfTraining/
Keogh, E. (2008). "The UCR Time Series Classification/Clustering Homepage." http://www.cs.ucr.edu/~eamonn/time_series_data/

Experiment setting

• We randomly select just one seed instance from the positive class for the learning phase; the rest of the training data are treated as unlabeled data.
• We build a 1-NN classifier using P together with LP as the positive training set, and RN together with LN as the negative training set.
• We repeat our experiments 30 times and report the average of the 30 results.
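A sketch of the final classifier construction, assuming P, LP, RN, and LN are stacked as 2-D arrays over the selected features; scikit-learn's KNeighborsClassifier with its default Euclidean distance stands in for the 1-NN classifier (the slides do not state which distance measure is used at this stage).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def build_final_classifier(P, LP, RN, LN):
    """Build the 1-NN classifier: P and LP form the positive class, RN and LN the negative class."""
    X = np.vstack([P, LP, RN, LN])
    y = np.concatenate([np.ones(len(P) + len(LP)), np.zeros(len(RN) + len(LN))])
    clf = KNeighborsClassifier(n_neighbors=1)    # 1-NN, Euclidean distance by default
    clf.fit(X, y)
    return clf

# clf = build_final_classifier(P, LP, RN, LN)
# y_pred = clf.predict(X_test)
```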

Overall performance

All values are F-measures.

Dataset         Wei's method   Ratana's method   LCLC w/o FS   LCLC w/o CC   LCLC
ECG                0.405            0.840            0.631         0.808      0.867
Word Spotting      0.279            0.637            0.781         0.599      0.727
Wafer              0.433            0.080            0.608         0.32       0.724
Yoga               0.466            0.626            0.52          0.699      0.854
CBF                0.201            0.309            0.637         0.586      0.701

Wei, L. and Keogh, E. (2006); Ratanamahatana, C. and Wanichsan, D. (2008).
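For reference, the reported score is the F-measure, the harmonic mean of precision and recall; a one-line helper:

```python
def f_measure(precision, recall):
    """F-measure: the harmonic mean of precision and recall, as reported in the table above."""
    return 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
```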

Sensitivity of the size of local clusters

We set the number of clusters K = Size(U) / ULC_size, where ULC_size is the size of the unlabeled local clusters; the experiments vary ULC_size to test the sensitivity of LCLC to this parameter.

Conclusions

There are three key approaches that underlie LCLC’s improved classification performance over existing methods.

• First, LCLC adopts a cluster-based method that is much more robust than instance-based PU learning methods.
• Secondly, we have adopted a feature selection strategy that takes into account the characteristics of both the positive and unlabeled clusters.
• Finally, we have devised a novel cluster chaining approach to extract the boundary positive and negative clusters.

Discussion and Questions