
Fuzzy Support Vector Machines
(FSVMs)
Weijia Wang, Huanren Zhang,
Vijendra Purohit, Aditi Gupta
Outline
• Review of SVMs
• Formalization of FSVMs
• Training algorithm for FSVMs
• Noisy distribution model
• Determination of heuristic function
• Experiment results
SVM – brief review
• Classification technique
• Method:
  • Maps points into a high-dimensional feature space
  • Finds a separating hyperplane that maximizes the margin
Set S of labeled training points: $S = \{(x_1, y_1), \ldots, (x_l, y_l)\}$
Each point $x_i \in \mathbb{R}^N$ belongs to one of the two classes, $y_i \in \{-1, +1\}$.
Let $z = \varphi(x)$ be the feature space vector, with mapping $\varphi$ from $\mathbb{R}^N$ to feature space $Z$.
Then the equation of the hyperplane is $w \cdot z + b = 0$.
For linearly separable data, the optimization problem is:
minimize $\frac{1}{2}\|w\|^2$
subject to $y_i(w \cdot z_i + b) \ge 1$, $i = 1, \ldots, l$
For non-linearly separable data (soft margin), introduce slack variables $\xi_i \ge 0$ -> some measure of the amount of misclassification.
Optimization problem:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i$
subject to $y_i(w \cdot z_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \ldots, l$
Limitation: all training points are treated equally.
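The slides contain no code, so the following is a minimal sketch (assuming scikit-learn and synthetic data, neither of which comes from the slides) of the standard soft-margin SVM just described; note that the single constant C penalizes slack on every training point equally, which is exactly the limitation noted above.

```python
# Minimal sketch of a standard soft-margin SVM (assumption: scikit-learn).
# One regularization constant C applies to all training points equally.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic two-class data: two Gaussian blobs (illustration only).
X = np.vstack([rng.normal(-1.0, 0.6, (50, 2)), rng.normal(1.0, 0.6, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(C=10.0, kernel="rbf", gamma=0.5)  # same C for every point
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```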
FSVM – Fuzzy SVM
• Each training point no longer belongs exactly to one of the two classes.
• Some training points are more important than others: these meaningful data points must be classified correctly, even if some noisy, less important points are misclassified.
Fuzzy membership $s_i$ (with $\sigma \le s_i \le 1$):
• $s_i$: how much point $x_i$ belongs to one class (amount of meaningful information in the data point)
• $1 - s_i$: amount of noise in the data point
Set S of labeled training points: $S = \{(x_1, y_1, s_1), \ldots, (x_l, y_l, s_l)\}$
Optimization problem:
minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} s_i \xi_i$
subject to $y_i(w \cdot z_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \ldots, l$
C – regularization constant; large C -> narrower margin, fewer misclassifications. A smaller $s_i$ shrinks the effective penalty $s_i C$, so slack on that point matters less.
Lagrange function:
$L = \frac{1}{2}\|w\|^2 + C \sum_i s_i \xi_i - \sum_i \alpha_i \big( y_i (w \cdot z_i + b) - 1 + \xi_i \big) - \sum_i \beta_i \xi_i$
Taking derivatives:
$\partial L/\partial w = 0 \Rightarrow w = \sum_i \alpha_i y_i z_i$; $\partial L/\partial b = 0 \Rightarrow \sum_i \alpha_i y_i = 0$; $\partial L/\partial \xi_i = 0 \Rightarrow s_i C - \alpha_i - \beta_i = 0$
Dual optimization problem:
maximize $\sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
subject to $\sum_i y_i \alpha_i = 0$, $0 \le \alpha_i \le s_i C$
Kuhn-Tucker conditions (λ – Lagrange multiplier, g(x) – inequality constraint): at the optimum $\lambda\, g(x) = 0$, which here gives
$\alpha_i \big( y_i (w \cdot z_i + b) - 1 + \xi_i \big) = 0$ and $(s_i C - \alpha_i)\, \xi_i = 0$

Points with $\alpha_i > 0$ are support vectors.
Two types of support vectors:
• $0 < \alpha_i < s_i C$: lies on the margin of the hyperplane ($\xi_i = 0$)
• $\alpha_i = s_i C$: $\xi_i > 0$; misclassified if $\xi_i > 1$
=> Points with the same $\alpha_i$ could be different types of support vectors in FSVM due to different $s_i$.
=> SVM – one free parameter (C); FSVM – number of free parameters = C plus the $s_i$ (~ number of training points)
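The only change from the standard soft-margin SVM is the error term $C \sum_i s_i \xi_i$, so the box constraint in the dual becomes $0 \le \alpha_i \le s_i C$. As a hedged sketch (not the authors' implementation), the same per-point scaling of C can be obtained in scikit-learn by passing the memberships as sample_weight:

```python
# Sketch of an FSVM fit via per-point weights on C (assumption: scikit-learn).
# sample_weight rescales C per training point, so effectively 0 <= alpha_i <= s_i * C.
import numpy as np
from sklearn.svm import SVC

def fit_fsvm(X, y, s, C=10.0, gamma=0.5):
    """X: (l, N) inputs; y: labels in {-1, +1}; s: fuzzy memberships in (0, 1]."""
    clf = SVC(C=C, kernel="rbf", gamma=gamma)
    clf.fit(X, y, sample_weight=s)  # slack of low-membership points is penalized less
    return clf

# Usage: suspected noisy points get a small membership.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.6, (50, 2)), rng.normal(1.0, 0.6, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
s = np.ones(len(y))
s[:5] = 0.1  # hypothetical: first five points flagged as likely noise
print("training accuracy:", fit_fsvm(X, y, s).score(X, y))
```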
Training algorithm for FSVMs
Objective function for optimization:
• Minimization of the error function
• Maximization of the margin
• The balance is controlled by tuning C
Selection of error function
• Least absolute value in SVMs: error term $\sum_i \xi_i$
• Least squares value in LS-SVMs: error term $\sum_i \xi_i^2$ (Suykens and Vandewalle, 1999)
  • the QP is transformed into solving a linear system
  • the support values are mostly nonzero
Selection of error function
• Maximum likelihood method
  • applicable when the underlying error probability distribution can be estimated
  • the optimization problem then minimizes the maximum likelihood error in place of the slack sum
Maximum likelihood error
• Limitation:
  • the precision of the hyperplane estimate depends on the estimate of the error function
  • the estimate of the error is reliable only when the underlying hyperplane is well estimated
Selection of error function
• Weighted least absolute value
  • each data point is associated with a cost or importance factor
  • when the noise distribution model of the data is given, $p_x(x)$ is the probability that point x is not noise
  • the optimization becomes: minimize $\frac{1}{2}\|w\|^2 + C \sum_i p_x(x_i)\, \xi_i$
Weighted least absolute value
• Relation with FSVMs: take $p_x(x)$ as the fuzzy membership, i.e. $p_x(x_i) = s_i$
Selection of max margin term
• Generalized optimal plane (GOP)
• Robust linear programming (RLP)
Implementation of NDM (noisy distribution model)
• Goal: build a probability distribution model for the data
• Ingredients:
  • a heuristic function h(x): highly relevant to $p_x(x)$
  • confident factor: $h_C$
  • trashy factor: $h_T$
Density function for data
Heuristic function
• Kernel-target alignment
• K-nearest neighbors
Basic idea: outliers have a higher probability of being noise.
Kernel-target alignment
• A measure of how likely the point $x_i$ is to be noise.
• The Gaussian kernel $K(x_i, x_j)$ can be written as the cosine of the angle between the two vectors in the feature space.
• An outlier data point $x_i$ will have a smaller value of $f_K(x_i, y_i)$.
• Use $f_K(x, y)$ as the heuristic function h(x).
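The transcript does not preserve the exact formula for $f_K$, so the sketch below uses one plausible per-point alignment, $h(x_i) = \frac{1}{l} \sum_j y_i y_j K(x_i, x_j)$, an assumed form that matches the description above: with a Gaussian kernel each $K(x_i, x_j)$ is the cosine of the angle between the mapped points, so an outlier surrounded by opposite-class points gets a small value.

```python
# Sketch of a per-point kernel-target alignment heuristic (assumed form;
# the exact formula is not in the transcript).
import numpy as np

def gaussian_kernel(X, gamma=0.5):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    return np.exp(-gamma * d2)  # K(xi, xi) = 1, so entries are cosines in feature space

def kta_heuristic(X, y, gamma=0.5):
    """h(xi) = (1/l) * sum_j y_i * y_j * K(xi, xj); small for likely outliers."""
    K = gaussian_kernel(X, gamma)
    y = np.asarray(y, dtype=float)
    return y * (K @ y) / len(y)
```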
K-nearest neighbors (k-NN)
• For each $x_i$, the set $S_i^k$ consists of the k nearest neighbors of $x_i$.
• $n_i$ is the number of data points in the set $S_i^k$ whose class label is the same as that of data point $x_i$.
• Heuristic function: $h(x_i) = n_i$
K-nearest neighbors: example (figure)
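This heuristic is simple enough to state directly in code; the sketch below (assuming scikit-learn's NearestNeighbors, which is not part of the slides) computes $n_i$ exactly as defined above.

```python
# Sketch of the k-NN heuristic: h(xi) = ni, the number of xi's k nearest
# neighbors that carry the same class label (assumption: scikit-learn).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_heuristic(X, y, k=5):
    y = np.asarray(y)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)    # +1 because each xi is its own nearest neighbor
    _, idx = nn.kneighbors(X)
    neighbors = idx[:, 1:]                             # S_i^k: the k nearest neighbors of xi, excluding xi
    return np.sum(y[neighbors] == y[:, None], axis=1)  # ni for every point
```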
Comparison of the two heuristic functions
• Kernel-target alignment: operates in the feature space; uses the information of all data points to determine the heuristic for one point.
• k-NN: operates in the original space; uses the information of k data points to determine the heuristic for one point.
• How about combining the two?
Overall Procedure for FSVMs
1. Use the SVM algorithm to get the optimal kernel parameters and the regularization parameter C.
2. Fix the kernel parameters and the regularization parameter C, determine the heuristic function h(x), and use exhaustive search to choose the confident factor $h_C$, the trashy factor $h_T$, the mapping degree d, and the fuzzy membership lower bound σ (a sketch of this mapping follows below).
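Step 2 maps the heuristic h(x) to a membership $s_i \in [\sigma, 1]$ using $h_C$, $h_T$, d, and σ. The exact mapping is not reproduced in the transcript, so the sketch below uses an assumed piecewise form: confident points ($h \ge h_C$) get $s = 1$, trashy points ($h \le h_T$) get $s = \sigma$, and points in between are interpolated with degree d.

```python
# Sketch of the heuristic-to-membership mapping (assumed form; the exact
# density/mapping function is not preserved in the transcript).
import numpy as np

def membership_from_heuristic(h, hC, hT, d=1.0, sigma=0.1):
    h = np.asarray(h, dtype=float)
    s = np.empty_like(h)
    s[h >= hC] = 1.0       # confident points: full membership
    s[h <= hT] = sigma     # trashy points: membership at the lower bound sigma
    mid = (h > hT) & (h < hC)
    s[mid] = sigma + (1.0 - sigma) * ((h[mid] - hT) / (hC - hT)) ** d
    return s
```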
Experiments
• Data with time property
  • FSVM results for data with time property (figure)
  • SVM results for data with time property (figure)
Experiments
• Two classes with different weighting
  • Results from FSVM (figure)
  • Results from SVM (figure)
Experiments
• Using the class center to reduce the effect of outliers
  • Results from FSVM (figure)
  • Results from SVM (figure)
Experiments (setting fuzzy membership)
• Kernel-target alignment: two-step strategy
  1. Fix $f_{UB}^k = \max_i f_K(x_i, y_i)$ and $f_{LB}^k = \min_i f_K(x_i, y_i)$, then find σ and d using a two-dimensional search.
  2. Now, with σ and d fixed, find $f_{UB}^k$ and $f_{LB}^k$.
Experiments (setting fuzzy membership)
• k-Nearest Neighbor
  • Perform a two-dimensional search for the parameters σ and k.
  • $k_{UB} = k/2$ and d = 1 are fixed.
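Putting the pieces together, a two-dimensional search over σ and k might look like the sketch below. It reuses the hypothetical helpers knn_heuristic, membership_from_heuristic, and fit_fsvm from the earlier sketches, fixes $h_C = k/2$ and d = 1 as stated on the slide, and assumes selection by accuracy on a held-out split (the slides do not state the selection criterion); $h_T = 0$ is also an assumption.

```python
# Sketch of the two-dimensional (sigma, k) search for the k-NN membership.
# Assumes the helper functions defined in the earlier sketches.
import numpy as np

def search_sigma_k(X_tr, y_tr, X_val, y_val, sigmas, ks, C=10.0):
    best = (None, None, -np.inf)                # (sigma, k, validation accuracy)
    for k in ks:
        h = knn_heuristic(X_tr, y_tr, k=k)      # ni in {0, ..., k}
        for sigma in sigmas:
            s = membership_from_heuristic(h, hC=k / 2, hT=0.0, d=1.0, sigma=sigma)
            acc = fit_fsvm(X_tr, y_tr, s, C=C).score(X_val, y_val)
            if acc > best[2]:
                best = (sigma, k, acc)
    return best
```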
Experiments
• Comparison of results from KTA and k-NN with other classifiers (test errors) (table)
Conclusion
• FSVMs work well when the average training error is high, which means they can improve the performance of SVMs on noisy data.
• The number of free parameters for FSVMs is very high: C plus an $s_i$ for each data point.
• Results using KTA and k-NN are similar, but KTA is more complicated and takes more time to find optimal parameter values.
• This paper studies FSVMs only for two classes; multi-class scenarios are not explored.