Support Vector Machine


Support Vector Machines: Optimization objective
Machine Learning
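Alternative view of logistic regression
$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$
If $y = 1$, we want $h_\theta(x) \approx 1$, $\theta^T x \gg 0$
If $y = 0$, we want $h_\theta(x) \approx 0$, $\theta^T x \ll 0$
Andrew Ng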
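Alternative view of logistic regression
Cost of example: $-\left(y \log h_\theta(x) + (1 - y) \log(1 - h_\theta(x))\right)$
If $y = 1$ (want $\theta^T x \gg 0$): cost is $-\log h_\theta(x)$
If $y = 0$ (want $\theta^T x \ll 0$): cost is $-\log(1 - h_\theta(x))$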
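Support vector machine
Logistic regression:
$\min_\theta \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \left(-\log h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \left(-\log(1 - h_\theta(x^{(i)}))\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$
Support vector machine:
$\min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$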
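The SVM objective can be sketched in a few lines of Python. This is an illustration, not the course's implementation: the per-example costs are not given in text on the slide, so the hinge-style forms max(0, 1 - z) and max(0, 1 + z) below are an assumption, and the names cost1, cost0 and svm_objective are hypothetical.

```python
def cost1(z):
    """Cost for a positive example (y = 1); assumed hinge form, zero once z >= 1."""
    return max(0.0, 1.0 - z)


def cost0(z):
    """Cost for a negative example (y = 0); assumed hinge form, zero once z <= -1."""
    return max(0.0, 1.0 + z)


def svm_objective(theta, X, y, C):
    """C * (sum of per-example costs) + 0.5 * (sum of theta_j^2 for j >= 1).

    Each row of X starts with the constant feature x_0 = 1, so theta[0]
    is the intercept and is left out of the regularization term.
    """
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * v for t, v in zip(theta, x_i))  # theta^T x
        data_term += y_i * cost1(z) + (1 - y_i) * cost0(z)
    reg_term = 0.5 * sum(t * t for t in theta[1:])
    return C * data_term + reg_term
```

With this parameterization a large C makes the data term dominate, which plays the role that a small regularization weight plays in logistic regression.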
SVM hypothesis
Hypothesis: $h_\theta(x) = 1$ if $\theta^T x \ge 0$, and $h_\theta(x) = 0$ otherwise
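As a small sketch of the hypothesis, assuming the usual rule that the SVM outputs 1 when theta^T x is non-negative and 0 otherwise (the function name svm_predict is hypothetical):

```python
def svm_predict(theta, x):
    """Return 1 if theta^T x >= 0, else 0. x includes the constant feature x_0 = 1."""
    z = sum(t * v for t, v in zip(theta, x))
    return 1 if z >= 0 else 0
```

Unlike logistic regression, this returns a class label directly rather than a probability.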
Support Vector Machines: Large Margin Intuition
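Support Vector Machine
[Figure: two plots of the per-example cost against $z = \theta^T x$, one for $y = 1$ and one for $y = 0$, with ticks at $-1$ and $1$]
If $y = 1$, we want $\theta^T x \ge 1$ (not just $\ge 0$)
If $y = 0$, we want $\theta^T x \le -1$ (not just $< 0$)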
SVM Decision Boundary
Whenever $y^{(i)} = 1$: $\theta^T x^{(i)} \ge 1$
Whenever $y^{(i)} = 0$: $\theta^T x^{(i)} \le -1$
SVM Decision Boundary: Linearly separable case
[Figure: plot with axes x1 and x2]
Large margin classifier
Large margin classifier in presence of outliers
[Figure: plot with axes x1 and x2]
Support Vector Machines: The mathematics behind large margin classification (optional)
Vector Inner Product
SVM Decision Boundary
Support Vector Machines: Kernels I
Non-linear decision boundary
[Figure: plot with axes x1 and x2]
Is there a different / better choice of the features $f_1, f_2, f_3, \dots$?
Kernel
Given $x$, compute new features depending on proximity to landmarks $l^{(1)}, l^{(2)}, l^{(3)}$.
[Figure: plot with axes x1 and x2]
Kernels and Similarity
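A minimal sketch of a similarity function, assuming the Gaussian kernel that appears later in the deck, exp(-||x - l||^2 / (2 sigma^2)). The function name and argument names are hypothetical.

```python
import math


def gaussian_similarity(x, landmark, sigma):
    """Gaussian similarity between a point x and a landmark, both given as lists."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

When x coincides with the landmark the squared distance is zero and the similarity is exactly 1; as x moves far away the similarity decays toward 0.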
Example:
[Figure: plot with axes x1 and x2]
Support Vector Machines: Kernels II
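Choosing the landmarks
Given $x$: $f_i = \text{similarity}(x, l^{(i)}) = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$
[Figure: plot with axes x1 and x2]
Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$
Where to get $l^{(1)}, l^{(2)}, l^{(3)}, \dots$?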
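SVM with Kernels
Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$,
choose $l^{(1)} = x^{(1)}, l^{(2)} = x^{(2)}, \dots, l^{(m)} = x^{(m)}$.
Given example $x$: $f_1 = \text{similarity}(x, l^{(1)})$, $f_2 = \text{similarity}(x, l^{(2)})$, ...
For training example $(x^{(i)}, y^{(i)})$: $f_j^{(i)} = \text{similarity}(x^{(i)}, l^{(j)})$ for $j = 1, \dots, m$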
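The landmark choice can be sketched as follows, assuming every training example serves as a landmark and similarity is the Gaussian kernel. The helper names, the toy data and the leading intercept feature f_0 = 1 are assumptions made for illustration.

```python
import math


def gaussian_similarity(x, landmark, sigma):
    """exp(-||x - l||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_features(x, landmarks, sigma):
    """Return [f_0, f_1, ..., f_m] with f_0 = 1 and f_j = similarity(x, l^(j))."""
    return [1.0] + [gaussian_similarity(x, l, sigma) for l in landmarks]


# Hypothetical toy training set; the landmarks are the training examples themselves.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
landmarks = X_train
F = [kernel_features(x_i, landmarks, sigma=1.0) for x_i in X_train]
```

Each training example is mapped to m + 1 features, and the feature that compares an example with itself is exactly 1.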
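SVM with Kernels
Hypothesis: Given $x$, compute features $f \in \mathbb{R}^{m+1}$
Predict “y=1” if $\theta^T f \ge 0$
Training: $\min_\theta C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T f^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T f^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2$,
where $\text{cost}_1$ and $\text{cost}_0$ are the SVM costs for $y = 1$ and $y = 0$.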
SVM parameters:
$C$ ($= \frac{1}{\lambda}$). Large $C$: Lower bias, high variance.
Small $C$: Higher bias, low variance.
$\sigma^2$. Large $\sigma^2$: Features $f_i$ vary more smoothly.
Higher bias, lower variance.
Small $\sigma^2$: Features $f_i$ vary less smoothly.
Lower bias, higher variance.
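To see the smoothness effect numerically, the sketch below evaluates a Gaussian feature at a few distances from a landmark for several bandwidths. It assumes the Gaussian kernel form exp(-d^2 / (2 sigma^2)); the sigma values and distances are arbitrary illustrations.

```python
import math


def gaussian_feature(distance, sigma):
    """Value of a Gaussian kernel feature at a given distance from its landmark."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


distances = [0.0, 1.0, 2.0, 3.0]
for sigma in (0.5, 1.0, 3.0):
    values = [round(gaussian_feature(d, sigma), 3) for d in distances]
    print(f"sigma={sigma}: {values}")
```

For a larger sigma the feature falls off slowly with distance (smoother, higher bias); for a smaller sigma it drops to near zero quickly (sharper, higher variance).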
Support Vector Machines: Using an SVM
Use SVM software package (e.g. liblinear, libsvm, …) to solve for
parameters $\theta$.
Need to specify:
Choice of parameter C.
Choice of kernel (similarity function):
E.g. No kernel (“linear kernel”)
Predict “y = 1” if $\theta^T x \ge 0$
Gaussian kernel:
$f_i = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$, where $l^{(i)} = x^{(i)}$.
Need to choose $\sigma^2$.
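Kernel (similarity) functions:
function f = kernel(x1,x2)
  % Gaussian kernel; sigma is the bandwidth, assumed to be defined in scope
  f = exp(-norm(x1 - x2)^2 / (2 * sigma^2));
return
Note: Do perform feature scaling before using the Gaussian kernel.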
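The need for feature scaling can be sketched with a short example: without scaling, the squared distance inside the Gaussian kernel is dominated by whichever feature has the largest range. The data values and helper names below are hypothetical, and standardization (zero mean, unit standard deviation) is one common scaling choice, not the only one.

```python
import statistics


def standardize(X):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*X))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def squared_distance(a, b):
    """Squared Euclidean distance, the quantity inside the Gaussian kernel."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Hypothetical data: [house size in square feet, number of bedrooms]
X = [[1000.0, 1.0], [2000.0, 5.0], [3000.0, 3.0]]
X_scaled = standardize(X)

print(squared_distance(X[0], X[1]))                # dominated by the size feature
print(squared_distance(X_scaled[0], X_scaled[1]))  # both features contribute
```

After scaling, both features contribute on a comparable scale to the distance, so the kernel no longer ignores the small-range feature.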
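Other choices of kernel
Note: Not all similarity functions $\text{similarity}(x, l)$ make valid kernels.
(They need to satisfy a technical condition called “Mercer’s Theorem” to make sure SVM packages’ optimizations run correctly and do not diverge.)
Many off-the-shelf kernels are available:
- Polynomial kernel: $k(x, l) = (x^T l + \text{constant})^{\text{degree}}$
- More esoteric: String kernel, chi-square kernel, histogram intersection kernel, …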
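As an illustration of the polynomial kernel, the sketch below assumes the common form (x^T l + constant)^degree. The function name and default parameter values are hypothetical choices.

```python
def polynomial_kernel(x, l, constant=1.0, degree=2):
    """Polynomial kernel (x^T l + constant) ** degree for two equal-length lists."""
    dot = sum(a * b for a, b in zip(x, l))
    return (dot + constant) ** degree
```

The constant and the degree are the two parameters to choose for this kernel, in the same way that the bandwidth is chosen for the Gaussian kernel.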
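Multi-class classification
Many SVM packages already have built-in multi-class classification functionality.
Otherwise, use the one-vs.-all method.
(Train $K$ SVMs, one to distinguish $y = i$ from the rest, for $i = 1, 2, \dots, K$), get $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(K)}$
Pick class $i$ with largest $(\theta^{(i)})^T x$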
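The one-vs.-all prediction step can be sketched as below, assuming K linear SVMs have already been trained and each yields a parameter vector. The function name is hypothetical, and classes are indexed from 0 here rather than from 1.

```python
def one_vs_all_predict(thetas, x):
    """Return the index of the class whose classifier gives the largest theta^T x.

    thetas[i] is the parameter vector of the SVM that separates class i
    from the rest; x includes the constant feature x_0 = 1.
    """
    scores = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
    return max(range(len(scores)), key=scores.__getitem__)
```

Training the K classifiers is left to the SVM package; only the final comparison of scores is shown.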
Logistic regression vs. SVMs
$n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples
If $n$ is large (relative to $m$):
Use logistic regression, or SVM without a kernel (“linear kernel”)
If $n$ is small, $m$ is intermediate:
Use SVM with Gaussian kernel
If $n$ is small, $m$ is large:
Create/add more features, then use logistic regression or SVM without a kernel
A neural network is likely to work well for most of these settings, but may be slower to train.