
Learning Dissimilarities for Categorical Symbols

Jierui Xie, Boleslaw Szymanski, Mohammed J. Zaki

Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
{xiej2, szymansk, zaki}@cs.rpi.edu

Presentation Outline

• Introduction
• Related Work
• Learning Dissimilarity (LD) Algorithm
• Experimental Results
• Conclusion

Introduction

• Distance plays an important role in many data mining tasks.
• Distance is rarely defined precisely for categorical data.
  – Categorical attributes are nominal or ordinal.
  – e.g., the rating of a movie: {very bad, bad, fair, good, very good}
• Goal: derive dissimilarities between categorical symbols.
  – To enable the full power of distance-based methods.
  – Hopefully easier to interpret as well.

Notation

• A dataset $X = \{x_1, x_2, \dots, x_t\}$ of $t$ data points. Each point $x_i$ has $m$ discrete attribute values: $x_i = (x_i^1, \dots, x_i^m)$.
• Each attribute $A_i$ draws its values from a set of $n_i$ symbols $\{a_1^i, \dots, a_{n_i}^i\}$; each value $a_j^i$ is also called a symbol.
• The similarity between symbols $a_k^i$ and $a_l^i$ is denoted $S(a_k^i, a_l^i)$; their dissimilarity, or distance, is denoted $d(a_k^i, a_l^i)$.
• The distance between two data points $x_i$ and $x_j$ is defined in terms of the distances between their symbols, e.g. $D(x_i, x_j) = \sum_{k=1}^{m} d(x_i^k, x_j^k)$ (see the sketch below).
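To make the notation concrete, here is a minimal Python sketch of this setup. The helper names (`symbol_distance`, `point_distance`), the 0/1 fallback for unlearned symbol pairs, and the additive combination over attributes are illustrative assumptions, not definitions from the paper.

```python
# Minimal sketch: categorical points as tuples of symbols, with the
# point-level distance built from a symbol-level distance.
# `sym_dist` maps (attribute index, symbol, symbol) -> float.

def symbol_distance(sym_dist, attr, a, b):
    """Look up the symbol distance; fall back to 0/1 mismatch."""
    if a == b:
        return 0.0
    return sym_dist.get((attr, a, b), 1.0)

def point_distance(sym_dist, x_i, x_j):
    """Point distance as the sum of per-attribute symbol distances."""
    return sum(symbol_distance(sym_dist, k, a, b)
               for k, (a, b) in enumerate(zip(x_i, x_j)))

# Example: two movie ratings on a single ordinal attribute.
X = [("good",), ("very bad",)]
learned = {(0, "good", "very bad"): 0.9, (0, "very bad", "good"): 0.9}
print(point_distance(learned, X[0], X[1]))  # 0.9
```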

Notation (cont.)

• Let the frequency of symbol $a^i$ in the dataset be $f(a^i)$; its probability is then $p(a^i) = f(a^i)/t$.
• Class label of point $x_i$: $y_i$.
• Output of the classifier on point $x_i$: $\hat{y}_i$.
• The error of misclassifying point $x_i$: $e_i$.
• Total classification error: $E = \sum_{i=1}^{t} e_i$.

Related Work

• Unsupervised methods:
  – Assign similarity for matches or mismatches based on symbol frequencies, from a probabilistic or information-theoretic point of view; they emphasize either frequent or rare symbols.
  – Examples: Lin, Burnaby, Smirnov, Goodall, Gambaryan, Eskin, Occurrence Frequency (OF), Inverse Occurrence Frequency (IOF).
• Supervised methods:
  – Take the class information into account.
  – Examples: Value Difference Metric (VDM), Cheng et al.

Unsupervised Method Examples

• Goodall: less frequent attribute values make a greater contribution to the overall similarity than frequent attribute values on a match. That is,
  $S(a_i, a_j) = 1 - p^2(a_i)$ if $a_i = a_j$; otherwise, $S(a_i, a_j) = 0$.
• Inverse Occurrence Frequency (IOF): assigns higher weight to mismatches on less frequent symbols. That is,
  $S(a_i, a_j) = \frac{1}{1 + \ln f(a_i)\,\ln f(a_j)}$ if $a_i \neq a_j$; otherwise, $S(a_i, a_j) = 1$ (see the sketch below).
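A small Python sketch of these two measures as written above, assuming the Goodall variant that scores a match as $1 - p^2$ and natural-log IOF; the helper names and the Counter-based frequency estimates are illustrative.

```python
import math
from collections import Counter

def goodall_similarity(a, b, prob):
    """A match on a rare symbol counts more: 1 - p(a)^2 on match, else 0."""
    return 1.0 - prob[a] ** 2 if a == b else 0.0

def iof_similarity(a, b, freq):
    """Mismatches on rare symbols are penalized less; matches score 1."""
    if a == b:
        return 1.0
    return 1.0 / (1.0 + math.log(freq[a]) * math.log(freq[b]))

# Example on one categorical column.
column = ["good", "good", "good", "fair", "fair", "very bad", "very bad"]
freq = Counter(column)
prob = {s: c / len(column) for s, c in freq.items()}
print(goodall_similarity("fair", "fair", prob))   # rare match scores high
print(iof_similarity("good", "very bad", freq))   # ~0.57 for this mismatch
```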

Supervised Method Examples

• VDM:
  – Symbols are similar if they occur with a similar relative frequency across all the classes:
    $d(a_i, a_j) = \sum_{c} \left| \frac{C_{a_i,c}}{C_{a_i}} - \frac{C_{a_j,c}}{C_{a_j}} \right|^h$
    where $C_{a_i,c}$ is the number of times symbol $a_i$ occurs in class $c$, $C_{a_i}$ is the total number of times $a_i$ occurs in the whole dataset, and $h$ is a constant (see the sketch below).
• Cheng et al.:
  – Based on an RBF classifier.
  – They evaluate all the pairwise distances between symbols and optimize the error function using a gradient descent method.
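A minimal Python sketch of the VDM distance as defined above; the counting from (value, class) pairs and the default $h = 1$ are illustrative choices.

```python
from collections import Counter

def vdm_distance(column, labels, a, b, h=1.0):
    """VDM: compare the per-class relative frequencies of two symbols."""
    total = Counter(column)               # C_a: occurrences of each symbol
    joint = Counter(zip(column, labels))  # C_{a,c}: occurrences per class
    return sum(abs(joint[(a, c)] / total[a] - joint[(b, c)] / total[b]) ** h
               for c in set(labels))

# Example: "x" is concentrated in class 1; "y" and "z" are evenly split.
column = ["x", "x", "x", "y", "y", "z", "z"]
labels = [ 1,   1,   1,   0,   1,   0,   1 ]
print(vdm_distance(column, labels, "x", "y"))  # |1 - 0.5| + |0 - 0.5| = 1.0
print(vdm_distance(column, labels, "y", "z"))  # 0.0: identical class profiles
```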

Learning Dissimilarity Algorithm

Motivation:

– Learning a mapping function from each categorical attribute $A_i$ onto a real-number interval, based on the class information, is possible and may facilitate the classification task.

Learning Dissimilarity Algorithm (cont.)

• Based on the nearest neighbor classifier and the difference between a point's distances to the two classes.
• Iterative learning.
• Guided by a gradient descent method to minimize the total classification error.

Learning Dissimilarity Algorithm (cont.)

Objective Function and Update Equation
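The equations on this slide did not survive extraction. A plausible reconstruction, consistent with the nearest-neighbor margin and gradient descent described above, is sketched below; the sigmoid form, the nearest-hit/nearest-miss notation, and the learning rate $\eta$ are assumptions rather than the slide's exact formulas.

```latex
% Per-point margin: distance to the nearest neighbor of the same class
% (nearest hit, NH) minus distance to the nearest neighbor of the other
% class (nearest miss, NM).
\Delta d_i = D(x_i, \mathrm{NH}_i) - D(x_i, \mathrm{NM}_i)

% Smooth per-point error via a sigmoid, and the total error to minimize:
e_i = \sigma(\Delta d_i) = \frac{1}{1 + e^{-\beta\,\Delta d_i}},
\qquad E = \sum_{i=1}^{t} e_i

% Gradient descent update for the learned quantities (symbol mappings v):
v \leftarrow v - \eta\,\frac{\partial E}{\partial v}
```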

Learning Dissimilarity Algorithm (cont.)

The Derivative of ∆d

The full update equation

Learning Dissimilarity Algorithm (cont.)

Intuitive meaning of assignment update
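To tie the pieces together, here is a heavily simplified Python sketch of one LD-style training loop: each symbol is mapped to a real value, per-attribute distances are absolute differences of those values, the error is a sigmoid of the 1-NN hit/miss margin, and the values are updated by gradient descent. All names and hyperparameters are illustrative, and the numerical gradient stands in for the paper's analytic update.

```python
import math

def point_dist(v, x_i, x_j):
    """Sum over attributes of |v(a) - v(b)|, where v maps each
    (attribute, symbol) pair to a learned real value."""
    return sum(abs(v[(k, a)] - v[(k, b)])
               for k, (a, b) in enumerate(zip(x_i, x_j)))

def margin(v, X, y, i):
    """Nearest-hit distance minus nearest-miss distance for point i."""
    hit = min(point_dist(v, X[i], X[j])
              for j in range(len(X)) if j != i and y[j] == y[i])
    miss = min(point_dist(v, X[i], X[j])
               for j in range(len(X)) if y[j] != y[i])
    return hit - miss

def total_error(v, X, y, beta=5.0):
    """Smooth count of 1-NN mistakes: sigmoid of each point's margin."""
    return sum(1.0 / (1.0 + math.exp(-beta * margin(v, X, y, i)))
               for i in range(len(X)))

def train_step(v, X, y, lr=0.05, eps=1e-4):
    """One gradient-descent step on the symbol values (numerical gradient)."""
    base = total_error(v, X, y)
    grads = {}
    for key in v:
        v[key] += eps
        grads[key] = (total_error(v, X, y) - base) / eps
        v[key] -= eps
    for key, g in grads.items():
        v[key] -= lr * g
    return base

# Toy run: one attribute, three symbols, two classes.
X = [("a",), ("a",), ("b",), ("c",), ("c",)]
y = [0, 0, 0, 1, 1]
v = {(0, s): i * 0.1 for i, s in enumerate("abc")}  # initial real-value map
for _ in range(50):
    train_step(v, X, y)
print({k: round(val, 3) for k, val in v.items()})   # symbols mapped to reals
```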

Datasets

Experimental Results

Experimental Results (cont.)

• Redundancy among symbols

Experimental Results (cont.)

• Comparison with Various Data-Driven Methods
  – On average, LD and VDM achieve the best accuracy, indicating that supervised dissimilarities attain better results than unsupervised ones. Among the unsupervised measures, IOF and Lin are slightly superior to the others.

Experimental Results (cont.)

• Analysis with confidence intervals (accuracy +/- standard deviation)
  – LD performed statistically worse than Lin on the Splice and Tic-tac-toe datasets, but better than Lin on Connect-4, Hayes, and Balance Scale.
  – LD performed statistically worse than VDM on only one dataset (Splice) but better on two datasets (Connect-4 and Tic-tac-toe).
  – Finally, LD performed statistically at least as well as (and on some datasets, e.g. Connect-4, better than) the remaining methods.

Experimental Results (cont.)

• Comparison with Various Classifiers
  – LD performed statistically worse than the other methods on only one dataset (Splice), but performed better than each of the other methods on at least three datasets.

Conclusion

• A task-oriented (supervised) iterative learning approach to learn a distance function for categorical data.
  – Explores the relationships between categorical symbols by using the classification error as guidance.
  – The real-value mappings found by our algorithm provide discriminative information, which can be used to refine features and improve classification accuracy.

Thank you!