
SVM Classification
2005. 01. 10.
Dae-Won Park
SVM Classification
 SVM
 Introduced in COLT-92 by Boser, Guyon, and Vapnik
 Original idea of SVM Classification
 Use a linear separating hyperplane to create a classifier
Linear Separating Hyperplane (1/3)
 Notation
 Training vectors xi, i =1, … , l of length n
 A vector y defined as follows
yi = +1 if xi is in class 1
yi = -1 if xi is in class 2
 Try to find the separating hyperplane
 With the largest margin between the two classes
=> Find parameters w and b such that
the distance between the hyperplanes wTx + b = +1 and wTx + b = -1 is maximized
[Figure: two classes separated by the hyperplane wTx + b = 0, with margin boundaries wTx + b = ±1]
Linear Separating Hyperplane (2/3)
 The distance between wTx + b = +1 and wTx + b = -1
 Consider a point x on the hyperplane wTx + b = -1
 Assume x + tw touches the hyperplane wTx + b = +1
 Subtracting the two hyperplane equations gives twTw = 2, so t = 2/(wTw)
 The distance is ||tw|| = t||w|| = 2||w||/(wTw) = 2/||w||
Maximizing 2/||w|| is equivalent to minimizing wTw/2
* subject to the constraint yi(wTxi + b) ≥ 1, i = 1, … , l, i.e.
- wTxi + b ≥ +1 if yi = +1
- wTxi + b ≤ -1 if yi = -1
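A quick numeric check of this derivation (a minimal numpy sketch; the values of w, b, and x below are made up for illustration): stepping from a point on wTx + b = -1 by tw with t = 2/(wTw) lands exactly on wTx + b = +1, and the step has length 2/||w||.

```python
import numpy as np

# Made-up parameters for illustration (not from the slides).
w = np.array([3.0, 4.0])                # ||w|| = 5
b = 1.0

x = np.array([0.0, -0.5])               # on w^T x + b = -1, since w.x = -2
t = 2.0 / np.dot(w, w)                  # from t * w^T w = 2

print(w @ (x + t * w) + b)              # 1.0: x + t*w lies on w^T x + b = +1
print(np.linalg.norm(t * w))            # 0.4: length of the step t*w
print(2.0 / np.linalg.norm(w))          # 0.4 = 2 / ||w||, the margin width
```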
Linear Separating Hyperplane (3/3)
 Example
 Training data { x1 = 0, x2 = 1 } ⊂ R1, y = [-1, 1]T
 What is the separating hyperplane?
- minw,b (1/2)w2
- Subject to
1·(w·1 + b) ≥ 1,
-1·(w·0 + b) ≥ 1
=> (w, b) = (2, -1)
☺ The separating hyperplane is 2x - 1 = 0, i.e. x = 1/2
[Figure: number line showing x1 = 0, x2 = 1 and the separating point x = 1/2]
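The same answer can be checked numerically. A minimal sketch using scikit-learn's SVC (my choice of tool, not part of the slides), with a very large C to approximate the hard-margin problem:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0], [1.0]])            # x1 = 0, x2 = 1
y = np.array([-1, 1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)        # ~[[2.]] and ~[-1.]: w = 2, b = -1
print(-clf.intercept_[0] / clf.coef_[0, 0])   # decision boundary at x = 1/2
```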
Mapping Data to Higher Dimensional Spaces (1/3)
 Practical problems
 Non-linearly separable data
 Noisy data
[Figure: examples of non-separable and noisy data]
 Cortes and Vapnik [1995] introduced slack variables
 ξi, i = 1, … , l
Mapping Data to Higher Dimensional Spaces (2/3)
 Allow some training data to fall on the wrong side of the hyperplane
 This happens when ξi > 1
 (w, b) -> (w, b, ξ)
- ξi = max( 0, 1 - yi(wTxi + b) ), i = 1, … , l
 Most data except some noisy ones are separable by a linear function
=> in the objective function, we add a penalty term
- C ∑i=1,…,l ξi , where C > 0
( most ξi should be zero, so the constraint goes back to its original form)
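As a concrete illustration (a minimal numpy sketch with made-up data), the slack variables and the penalty term follow directly from the definition ξi = max(0, 1 - yi(wTxi + b)):

```python
import numpy as np

# Made-up 1-D data; x = 0.6 is a noisy point on the wrong side.
X = np.array([[0.0], [1.0], [0.6]])
y = np.array([-1, 1, -1])
w, b, C = np.array([2.0]), -1.0, 1.0

margins = y * (X @ w + b)               # y_i (w^T x_i + b)
xi = np.maximum(0.0, 1.0 - margins)     # slack variables
print(xi)                               # [0. 0. 1.2]; xi > 1 => wrong side
objective = 0.5 * w @ w + C * xi.sum()  # soft-margin objective
print(objective)                        # 3.2
```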
Mapping Data to Higher Dimensional Spaces (3/3)
 If data are distributed in a highly nonlinear way
 A linear function causes many training instances to be on the wrong side of the hyperplane => underfitting occurs
 To fit the training data better
 Use a nonlinear curve
- Example) elliptic curve, polynomial curve
 Map data into a higher dimensional space
- Example) height, weight -> height, weight, height - weight, weight/height²
 Transform the original input space into a higher dimensional feature space
 φ(x) = (φ1(x), φ2(x), φ3(x), … )
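A minimal numpy sketch of such a mapping (the feature choices follow the slide's example; the sample values are made up):

```python
import numpy as np

def phi(height, weight):
    # Map (height, weight) into a 4-D feature space, as in the slide's example.
    # A hyperplane in the new space is a nonlinear surface in the original one.
    return np.array([height, weight, height - weight, weight / height**2])

print(phi(1.80, 75.0))                  # made-up sample: metres and kilograms
```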
Dual Representation (1/2)
 The decision function can be re-written as follows
 f(x) = ‹ w, x › + b = ∑i αi yi ‹ xi, x › + b
- w = ∑i αi yi xi
 Data appear only within dot products
- f(x) = ∑i αi yi ‹ φ(xi), φ(x) › + b
Dual Representation (2/2)
 Problems with feature space
 Working in high dimensional feature spaces solves the problem of expressing complex functions
 But
- There is a computational problem (working with very large vectors)
- There is a generalization theory problem (curse of dimensionality)
 Kernels
 Solve the computational problem of working with many dimensions
 Can make it possible to use infinite dimensions
- Efficiently in time / space
Kernels
 Kernel
 A function that returns the value of the dot product between the images of its two arguments
K( x1, x2 ) = ‹ φ(x1), φ(x2) ›
 For example
- Gaussian kernel or Radial basis function (RBF) kernel: K(x1, x2) = exp( -γ ||x1 - x2||² )
- Polynomial kernel: K(x1, x2) = ( x1Tx2 + c )^d
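A minimal sketch showing that a kernel returns the feature-space dot product without ever forming φ(x). The degree-2 homogeneous polynomial kernel K(x, z) = (xTz)² with explicit map φ(x) = (x1², √2·x1x2, x2²) is a standard textbook identity, not taken from the slides:

```python
import numpy as np

def poly2_kernel(x, z):
    return (x @ z) ** 2                         # K(x, z) = (x^T z)^2

def phi(x):
    # Explicit feature map whose dot product equals poly2_kernel.
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional.
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(poly2_kernel(x, z), phi(x) @ phi(z))      # both 16.0
print(rbf_kernel(x, z))
```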
Multi-class SVM
 One-against-all Multi-class SVM
 One-against-the-rest
 Expectation, for x in class i
- f i(x) ≥ 1 and f j(x) ≤ -1 if j ≠ i
 Decision rule
- Predicted class = argmax i=1,…,4 f i(x)
yi = 1  | yi = -1         | Decision Function
Class 1 | Classes 2, 3, 4 | f 1(x) = w(1)Tx + b1
Class 2 | Classes 1, 3, 4 | f 2(x) = w(2)Tx + b2
Class 3 | Classes 1, 2, 4 | f 3(x) = w(3)Tx + b3
Class 4 | Classes 1, 2, 3 | f 4(x) = w(4)Tx + b4
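A minimal sketch of the one-against-all decision rule with scikit-learn binary SVMs (the toy data are made up; scikit-learn also ships this scheme as sklearn.multiclass.OneVsRestClassifier):

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2-D points for 4 classes clustered at the corners.
X = np.array([[0, 0], [0.2, 0.1], [0, 5], [0.1, 5.2],
              [5, 0], [5.1, 0.2], [5, 5], [5.2, 5.1]], dtype=float)
labels = np.array([1, 1, 2, 2, 3, 3, 4, 4])

# Train one binary SVM per class: class i vs. the rest.
fs = []
for i in [1, 2, 3, 4]:
    yi = np.where(labels == i, 1, -1)
    fs.append(SVC(kernel="linear").fit(X, yi))

x = np.array([[4.8, 0.3]])
scores = [f.decision_function(x)[0] for f in fs]    # f_i(x) = w(i)^T x + b_i
print(1 + int(np.argmax(scores)))                   # predicted class: 3
```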
Multi-class SVM
 One-against-one Multi-class SVM
 Pairwise approach
yi = 1  | yi = -1 | Decision Function
Class 1 | Class 2 | f 12(x) = w(12)Tx + b12
Class 1 | Class 3 | f 13(x) = w(13)Tx + b13
Class 1 | Class 4 | f 14(x) = w(14)Tx + b14
Class 2 | Class 3 | f 23(x) = w(23)Tx + b23
Class 2 | Class 4 | f 24(x) = w(24)Tx + b24
Class 3 | Class 4 | f 34(x) = w(34)Tx + b34
 Choose the class that obtains the highest number of votes
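A minimal sketch of pairwise training and majority voting (made-up data reusing the layout above; scikit-learn's SVC in fact uses this one-against-one scheme internally for multi-class problems):

```python
import numpy as np
from itertools import combinations
from collections import Counter
from sklearn.svm import SVC

X = np.array([[0, 0], [0.2, 0.1], [0, 5], [0.1, 5.2],
              [5, 0], [5.1, 0.2], [5, 5], [5.2, 5.1]], dtype=float)
labels = np.array([1, 1, 2, 2, 3, 3, 4, 4])

# Train one binary SVM per pair of classes (i vs. j).
pair_clfs = {}
for i, j in combinations([1, 2, 3, 4], 2):
    mask = (labels == i) | (labels == j)
    pair_clfs[(i, j)] = SVC(kernel="linear").fit(X[mask], labels[mask])

# Each pairwise classifier casts one vote; the most-voted class wins.
x = np.array([[4.8, 4.9]])
votes = Counter(int(clf.predict(x)[0]) for clf in pair_clfs.values())
print(votes.most_common(1)[0][0])       # class with the most votes: 4
```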