PEGASOS
Primal Estimated sub-GrAdient Solver for SVM
Ming TIAN
04-20-2012
References
[1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. ICML, 807-814. Extended version in Mathematical Programming, Series B, 127(1):3-30, 2011.
[2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-class Pegasos on a budget. ICML.
[3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
[4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further works
Review of SVM optimization
Q1: How do we efficiently solve the SVM training problem
  min_w  (λ/2)‖w‖² + (1/m) Σ_{(x,y)∈S} ℓ(w; (x, y)),
  where ℓ(w; (x, y)) = max{0, 1 − y⟨w, x⟩}
The first term is the regularization term; the second is the empirical (hinge) loss.
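As a concrete sketch, the objective above can be evaluated in a few lines of NumPy (labels assumed in {−1, +1}; the name `svm_objective` is illustrative, not from the paper):

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Regularized hinge-loss objective: (lam/2)*||w||^2 + mean hinge loss.

    X: (m, n) example matrix, y: (m,) labels in {-1, +1}, lam: lambda > 0.
    """
    margins = y * (X @ w)                   # y_i * <w, x_i> for every example
    hinge = np.maximum(0.0, 1.0 - margins)  # per-example hinge loss
    return 0.5 * lam * np.dot(w, w) + hinge.mean()
```

At w = 0 the regularizer vanishes and every example pays the full margin loss of 1, so the objective equals 1; this is the baseline Pegasos improves on.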
Review of SVM optimization
Dual-based methods
  Interior Point methods
    Memory: m², Time: m³·log(log(1/ε))
  Decomposition methods
    Memory: m, Time: super-linear in m
Online learning & Stochastic Gradient
  Memory: O(1), Time: 1/ε² (linear kernel)
  Memory: 1/ε², Time: 1/ε⁴ (non-linear kernel)
  Better rates for finite-dimensional instances (Murata, Bottou)
(m = number of examples, ε = optimization accuracy)
Typically, online learning algorithms do not converge to the optimal solution of SVM.
PEGASOS
Initialize w₁ with ‖w₁‖ ≤ 1/√λ. For t = 1, ..., T:
  Choose A_t ⊆ S
    A_t = S  gives the subgradient method
    |A_t| = 1  gives stochastic gradient
  Subgradient step: w_{t+½} = w_t − η_t·∇_t, with η_t = 1/(λt) and
    ∇_t = λ·w_t − (1/|A_t|) Σ_{(x,y)∈A_t : y⟨w_t,x⟩<1} y·x
  Projection: w_{t+1} = min{1, (1/√λ)/‖w_{t+½}‖} · w_{t+½}
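A minimal NumPy sketch of the iteration with |A_t| = 1 (pure stochastic gradient); the function name, random seed, and hyperparameter defaults are illustrative choices, not values from the paper:

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=1000, seed=0):
    """Pegasos with |A_t| = 1: subgradient step plus projection.

    X: (m, n) examples, y: (m,) labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)                      # ||w_1|| <= 1/sqrt(lam) trivially
    for t in range(1, T + 1):
        i = rng.integers(m)              # A_t: one example chosen at random
        eta = 1.0 / (lam * t)            # step size eta_t = 1/(lambda*t)
        if y[i] * (X[i] @ w) < 1:        # margin violated: loss subgradient active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                            # only the regularization part remains
            w = (1 - eta * lam) * w
        radius = 1.0 / np.sqrt(lam)      # project onto the ball of radius 1/sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```

Note that (1 − ηₜλ) = (1 − 1/t), matching the scaling used in the proof sketch later in the talk.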
Run-Time of Pegasos
Choosing |A_t| = 1 and a linear kernel over Rⁿ, the run-time required for Pegasos to find an ε-accurate solution with probability 1−δ is Õ(n/(λε)).
Run-time does not depend on #examples.
Depends on "difficulty" of problem (λ and ε).
Formal Properties
Definition: w is ε-accurate if f(w) ≤ min_{w'} f(w') + ε.
Theorem 1: Pegasos finds an ε-accurate solution w.p. ≥ 1−δ after at most Õ(1/(δλε)) iterations.
Theorem 2: Pegasos finds log(1/δ) solutions s.t. w.p. ≥ 1−δ, at least one of them is ε-accurate after Õ(log(1/δ)/(λε)) iterations.
Proof Sketch
A second look at the update step (with |A_t| = 1, ignoring the projection):
  w_{t+1} = (1 − 1/t)·w_t + (1/(λt))·1[y_t⟨w_t, x_t⟩ < 1]·y_t·x_t
Unrolling the recursion shows w_{t+1} = (1/(λt)) Σ_{t'≤t} 1[y_{t'}⟨w_{t'}, x_{t'}⟩ < 1]·y_{t'}·x_{t'}, a weighted sum of the margin-violating examples seen so far.
Proof Sketch
Denote f_t(w) = (λ/2)‖w‖² + ℓ(w; (x_t, y_t)).
Logarithmic regret for OCP (online convex programming):
  (1/T) Σ_t f_t(w_t) − (1/T) Σ_t f_t(w*) ≤ O(log T / (λT))
Take expectation over the random choice of examples, with r uniform on {1, ..., T}:
  E[f(w_r)] − f(w*) ≤ O(log T / (λT))
Since f(w_r) − f(w*) ≥ 0, Markov gives that w.p. ≥ 1−δ:
  f(w_r) − f(w*) ≤ O(log T / (δλT))
Amplify the confidence by running log(1/δ) copies and keeping the best solution.
Proof Sketch
A function f is called λ-strongly convex if f(w) − (λ/2)‖w‖² is a convex function.
The SVM objective is λ-strongly convex, since the hinge-loss term is convex; strong convexity is what yields the logarithmic regret for OCP.
Experiments
3 datasets (provided by Joachims)
Reuters CCAT (800K examples, 47k features)
Physics ArXiv (62k examples, 100k features)
Covertype (581k examples, 54 features)
4 competing algorithms
SVM-light (Joachims)
SVM-Perf (Joachims’06)
Norma (Kivinen, Smola, Williamson ’02)
Zhang’04 (stochastic gradient descent)
Training Time (in seconds)

  Dataset        Pegasos   SVM-Perf   SVM-Light
  Reuters              2         77      20,075
  Covertype            6         85      25,514
  AstroPhysics         2          5          80
Compare to Norma (on Physics)
(Plots: objective value and test error as a function of runtime, Pegasos vs. Norma.)
Compare to Zhang (on Physics)
(Plot: objective value as a function of runtime, for several parameter settings of Zhang's method.)
But, tuning the parameter is more expensive than learning ...
Effect of k = |A_t| when T is fixed
(Plot: objective value vs. k.)
Effect of k = |A_t| when kT is fixed
(Plot: objective value vs. k.)
Bias term
Popular approach: increase the dimension of x by appending a constant feature
  Cons: "pay" for b in the regularization term
Calculate subgradients w.r.t. w and w.r.t. b
  Cons: convergence rate degrades to 1/ε²
Define the loss over the mini-batch A_t, with b chosen for each A_t
  Cons: |A_t| needs to be large
Search for b in an outer loop
  Cons: evaluating the objective costs 1/ε²
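The first approach (folding b into w by appending a constant feature, so b = w[-1]) is the easiest to implement; a one-line sketch, where `augment` is an illustrative helper name, not from the paper:

```python
import numpy as np

def augment(X):
    """Append a constant-1 feature: <w', x'> = <w, x> + b with b = w'[-1].

    The bias is then regularized together with w, which is the stated drawback.
    """
    return np.hstack([X, np.ones((X.shape[0], 1))])
```

After augmentation, the unmodified Pegasos update can be used as-is on the (n+1)-dimensional data.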
multi-class SVM (Crammer & Singer, 2001)
multi-class model:
  f(x) = argmax_{y ∈ Y} ⟨w_y, x⟩, with one weight vector w_y per class
multi-class SVM objective function:
  P(w) = (λ/2)‖w‖² + (1/m) Σ_i ℓ(w; (x_i, y_i))
where w is the concatenation of the per-class weight vectors, and the multi-class hinge-loss function is defined as:
  ℓ(w; (x, y)) = max{0, 1 + ⟨w_r, x⟩ − ⟨w_y, x⟩}
where r = argmax_{r' ≠ y} ⟨w_{r'}, x⟩ is the highest-scoring wrong class.
multi-class Pegasos
use the instantaneous objective function:
  P_t(w) = (λ/2)‖w‖² + ℓ(w; (x_t, y_t))
multi-class Pegasos works by iteratively executing the two-step updates:
Step 1 (subgradient step), with η_t = 1/(λt):
If the loss on (x_t, y_t) is equal to zero then:
  every class vector is only scaled: w_y ← (1 − 1/t)·w_y
Else, additionally push w_{y_t} toward x_t and the highest-scoring wrong class w_{r_t} away from it:
  w_{y_t} ← (1 − 1/t)·w_{y_t} + η_t·x_t,   w_{r_t} ← (1 − 1/t)·w_{r_t} − η_t·x_t
Step 2:
project the weight w_{t+1} into the closed convex set B = {w : ‖w‖ ≤ 1/√λ}:
  w_{t+1} ← min{1, (1/√λ)/‖w_{t+1}‖} · w_{t+1}
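The two-step update can be sketched in NumPy; `multiclass_pegasos_step` and its signature are my own illustrative choices (W stacks the per-class vectors as rows):

```python
import numpy as np

def multiclass_pegasos_step(W, x, y, lam, t):
    """One update of multi-class Pegasos: scale/push, then project.

    W: (k, n) float matrix, one weight vector per class; x: (n,); y: true class.
    """
    scores = W @ x
    scores_wrong = scores.copy()
    scores_wrong[y] = -np.inf
    r = int(np.argmax(scores_wrong))         # highest-scoring wrong class
    loss = max(0.0, 1.0 + scores[r] - scores[y])
    eta = 1.0 / (lam * t)
    W = (1.0 - eta * lam) * W                # scale every class vector by (1 - 1/t)
    if loss > 0.0:                           # push w_y toward x, w_r away from x
        W = W.copy()
        W[y] += eta * x
        W[r] -= eta * x
    radius = 1.0 / np.sqrt(lam)              # Step 2: project onto ||W|| <= 1/sqrt(lam)
    norm = np.linalg.norm(W)                 # Frobenius norm of the stacked weights
    if norm > radius:
        W = W * (radius / norm)
    return W
```

Starting from W = 0 with λ = 1 and t = 1, the loss is 1, so one step plants +x in the true class row and −x in a wrong class row, then rescales the whole matrix onto the unit ball.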
Budgeted Multi-Class Pegasos
Identical to multi-class Pegasos, except that the weights are kept in kernel-expansion (support-vector) form and, whenever the number of support vectors would exceed a fixed budget B, a budget maintenance step reduces it before the update proceeds.
Budget Maintenance Strategies
Budget maintenance through removal
  the optimal removal always selects the oldest SV, whose coefficient has decayed the most under the (1 − 1/t) scaling
Budget maintenance through projection
  project the removed SV onto all the remaining SVs, which results in smaller weight degradation
Budget maintenance through merging
  merge two SVs into a newly created one
  the total cost of finding the optimal merging for the m-th and n-th SV is O(1)
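The removal strategy can be sketched with a simple deque-backed SV store; the helper `budget_add` and its arguments are hypothetical, not from [2]:

```python
from collections import deque

def budget_add(sv_set, new_sv, budget):
    """Removal-based budget maintenance: evict the oldest SV when over budget.

    sv_set: deque of support vectors, oldest first; new_sv: the incoming SV.
    """
    sv_set.append(new_sv)
    if len(sv_set) > budget:
        sv_set.popleft()      # the oldest SV has the smallest decayed coefficient
    return sv_set
```

Projection and merging would replace the `popleft()` with a step that redistributes (or combines) the evicted coefficient, trading extra computation for smaller weight degradation.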
Experiments
(Result figures from [2]: accuracy of budgeted multi-class Pegasos under the removal, projection, and merging strategies.)
Further works
Distribution-aware Pegasos?
Online structural regularized SVM?
Thanks! Q&A