Review of SVM optimization


PEGASOS
Primal Estimated sub-GrAdient Solver for SVM
Ming TIAN
04-20-2012
References
[1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. ICML, 807-814. Extended version: Mathematical Programming, Series B, 127(1):3-30, 2011.
[2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-Class Pegasos on a Budget. ICML.
[3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
[4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further works
Review of SVM optimization
The SVM primal objective:
$\min_{w}\; \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{(x,y)\in S} \max\{0,\, 1 - y\langle w, x\rangle\}$
The first term is the regularization term; the second is the empirical (hinge) loss.
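To make the objective concrete, a minimal NumPy sketch (function and variable names are mine, not from the paper):

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Primal SVM objective: regularization term + empirical hinge loss.
    X: (m, n) examples; y: (m,) labels in {-1, +1}; lam: lambda."""
    margins = y * X.dot(w)                          # y <w, x> for each example
    hinge = np.maximum(0.0, 1.0 - margins).mean()   # empirical loss: (1/m) sum of hinge terms
    reg = 0.5 * lam * w.dot(w)                      # (lambda/2) ||w||^2
    return reg + hinge
```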
Review of SVM optimization
- Dual-based methods
  - Interior Point methods
    - Memory: m², time: m³ log(log(1/ε))
  - Decomposition methods
    - Memory: m, time: super-linear in m
- Online learning & stochastic gradient
  - Memory: O(1), time: 1/ε² (linear kernel)
  - Memory: 1/ε², time: 1/ε⁴ (non-linear kernel)
  - Better rates for finite-dimensional instances (Murata, Bottou)
- Typically, online learning algorithms do not converge to the optimal solution of SVM
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further works
PEGASOS
For t = 1, 2, ..., T:
- Choose $A_t \subseteq S$ (taking $A_t = S$ recovers the subgradient method; $|A_t| = 1$ gives stochastic gradient)
- Set the step size $\eta_t = 1/(\lambda t)$
- Subgradient step: $w_{t+\frac{1}{2}} = (1 - \eta_t \lambda)\, w_t + \frac{\eta_t}{|A_t|} \sum_{(x,y)\in A_t^+} y\, x$, where $A_t^+ = \{(x,y)\in A_t : y\langle w_t, x\rangle < 1\}$
- Projection: $w_{t+1} = \min\left\{1,\; \frac{1/\sqrt{\lambda}}{\|w_{t+\frac{1}{2}}\|}\right\} w_{t+\frac{1}{2}}$
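A minimal Python sketch of these updates, assuming $|A_t| = 1$ and a linear kernel (names are mine):

```python
import numpy as np

def pegasos(X, y, lam, T, seed=0):
    """Minimal Pegasos sketch with |A_t| = 1 and a linear kernel.
    X: (m, n) examples; y: (m,) labels in {-1, +1}; lam: lambda; T: iterations."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        i = rng.integers(m)            # draw one example uniformly: A_t, |A_t| = 1
        margin = y[i] * X[i].dot(w)    # margin under the current w_t
        eta = 1.0 / (lam * t)          # step size eta_t = 1/(lambda * t)
        w = (1.0 - eta * lam) * w      # regularization part of the subgradient step
        if margin < 1:                 # hinge loss active at w_t
            w += eta * y[i] * X[i]
        # projection onto the ball of radius 1/sqrt(lambda)
        radius = 1.0 / np.sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```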
Run-Time of Pegasos
- Choosing |A_t| = 1 and a linear kernel over ℝⁿ
- Run-time required for Pegasos to find an ε-accurate solution with probability 1 − δ: Õ(n/(λε)) (the Õ hides logarithmic factors)
- Run-time does not depend on the number of examples
- Depends on the "difficulty" of the problem (λ and ε)
Formal Properties
- Definition: w is ε-accurate if $f(w) \le \min_{w'} f(w') + \varepsilon$
- Theorem 1: Pegasos finds an ε-accurate solution w.p. ≥ 1 − δ after at most Õ(1/(δλε)) iterations.
- Theorem 2: Pegasos finds log(1/δ) solutions s.t., w.p. ≥ 1 − δ, at least one of them is ε-accurate after Õ(log(1/δ)/(λε)) iterations.
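Connecting the iteration bound to the run-time claim above, a sketch of the arithmetic (log factors suppressed; this reading is mine):

```latex
% Each iteration with |A_t| = 1 and a linear kernel over R^n costs O(n):
% one inner product to test the margin and one scaled vector update.
% With T = \tilde{O}(1/(\lambda\varepsilon)) iterations, the total run-time is
\underbrace{O(n)}_{\text{per iteration}} \times
\underbrace{\tilde{O}\!\left(\tfrac{1}{\lambda\varepsilon}\right)}_{\text{iterations}}
= \tilde{O}\!\left(\frac{n}{\lambda\varepsilon}\right),
% independent of the number of examples m.
```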
Proof Sketch
A second look at the update step: ignoring the projection, each iteration is a subgradient step on the instantaneous objective,
$w_{t+1} = w_t - \eta_t \nabla_t$, where $\nabla_t = \lambda w_t - \mathbb{1}[y\langle w_t, x\rangle < 1]\, y\, x \in \partial f(w_t; A_t)$ (for $|A_t| = 1$).
Proof Sketch
- Denote the instantaneous objective $f(w; A_t) = \frac{\lambda}{2}\|w\|^2 + \ell(w; A_t)$
- Logarithmic regret for online convex programming (OCP): $\frac{1}{T}\sum_t f(w_t; A_t) - \min_w \frac{1}{T}\sum_t f(w; A_t) \le O\!\left(\frac{\log T}{\lambda T}\right)$
- Take expectation over the random draws of $A_t$: for $r$ chosen uniformly from $\{1,\dots,T\}$, $\mathbb{E}[f(w_r)] - f(w^*) \le O\!\left(\frac{\log T}{\lambda T}\right)$
- Since $f(w_r) - f(w^*) \ge 0$, Markov's inequality gives that w.p. ≥ 1 − δ: $f(w_r) - f(w^*) \le O\!\left(\frac{\log T}{\delta \lambda T}\right)$
- Amplify the confidence: run log(1/δ) independent copies and keep the best solution
Proof Sketch
A function f is called λ-strongly convex if $f(w) - \frac{\lambda}{2}\|w\|^2$ is a convex function. The SVM objective is λ-strongly convex, since subtracting $\frac{\lambda}{2}\|w\|^2$ leaves the empirical hinge loss, which is convex; this is what yields the logarithmic-regret rate above.
Experiments
- 3 datasets (provided by Joachims)
  - Reuters CCAT (800k examples, 47k features)
  - Physics ArXiv (62k examples, 100k features)
  - Covertype (581k examples, 54 features)
- 4 competing algorithms
  - SVM-Light (Joachims)
  - SVM-Perf (Joachims '06)
  - Norma (Kivinen, Smola, Williamson '02)
  - Zhang '04 (stochastic gradient descent)
Training Time (in seconds)

Dataset         Pegasos   SVM-Perf   SVM-Light
Reuters               2         77      20,075
Covertype             6         85      25,514
AstroPhysics          2          5          80
Compare to Norma (on Physics)
(Figures: objective value and test error as a function of run-time.)
Compare to Zhang '04 (on Physics)
(Figure: objective value as a function of run-time.)
But tuning the parameter is more expensive than learning itself…
Effect of k = |A_t| when T is fixed
(Figure: objective value as a function of k.)
Effect of k = |A_t| when kT is fixed
(Figure: objective value as a function of k.)
The bias term
- Popular approach: increase the dimension of x (append a constant feature; see the sketch after this list)
  - Cons: we "pay" for b in the regularization term
- Calculate subgradients w.r.t. w and w.r.t. b
  - Cons: convergence rate degrades to 1/ε²
- Define the loss with b chosen optimally for each batch A_t
  - Cons: |A_t| needs to be large
- Search for b in an outer loop
  - Cons: each evaluation of the objective takes 1/ε²
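A minimal sketch of the first approach, appending a constant feature so that ⟨w′, x′⟩ = ⟨w, x⟩ + b (function name is mine):

```python
import numpy as np

def augment(X):
    """Fold the bias into w by appending a constant 1-feature to every example.
    After this, <w', x'> = <w, x> + b with b = w'[-1]; the drawback is that the
    regularizer (lambda/2)||w'||^2 now also penalizes the bias coordinate."""
    m = X.shape[0]
    return np.hstack([X, np.ones((m, 1))])
```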
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further works
multi-class SVM (Crammer & Singer, 2001)
multi-class model: keep one weight vector $w_i$ per class $i \in \mathcal{Y}$ and predict
$\hat{y} = \arg\max_{i \in \mathcal{Y}} \langle w_i, x \rangle$
multi-class SVM (Crammer & Singer, 2001)
multi-class SVM objective function:
$P(W) = \frac{\lambda}{2}\|W\|^2 + \frac{1}{m}\sum_{(x,y)\in S} \ell(W; (x, y))$
where $\|W\|^2 = \sum_{i\in\mathcal{Y}} \|w_i\|^2$, and the multi-class hinge-loss function is defined as:
$\ell(W; (x, y)) = \max\{0,\; 1 + \max_{i \ne y} \langle w_i, x\rangle - \langle w_y, x\rangle\}$
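As a concrete reading of this loss, a minimal NumPy sketch (function and variable names are mine):

```python
import numpy as np

def multiclass_hinge_loss(W, x, y):
    """Crammer-Singer multi-class hinge loss for one example.
    W: (k, n) matrix with one row per class; x: (n,); y: true class index."""
    scores = W.dot(x)                      # <w_i, x> for every class i
    scores_wo_true = np.delete(scores, y)  # scores of all classes i != y
    return max(0.0, 1.0 + scores_wo_true.max() - scores[y])
```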
multi-class Pegasos
use the instantaneous objective function:
$f(W; (x_t, y_t)) = \frac{\lambda}{2}\|W\|^2 + \ell(W; (x_t, y_t))$
multi-class Pegasos works by iteratively executing the two-step updates:
Step 1 (subgradient step):
$W_{t+1} = (1 - \eta_t \lambda)\, W_t + \eta_t \Delta_t$
where $\eta_t = 1/(\lambda t)$ and $-\Delta_t$ is a subgradient of the loss at $W_t$.
multi-class Pegasos
If the loss is equal to zero then: $\Delta_t = 0$ (the update only shrinks $W_t$).
Else: $\Delta_t$ adds $x_t$ to the row of the true class $y_t$ and subtracts $x_t$ from the row of the most violating class $\hat{y}_t = \arg\max_{i \ne y_t} \langle w_i, x_t\rangle$.
Step 2:
project the weights $W_{t+1}$ onto the closed convex set $B = \{W : \|W\| \le 1/\sqrt{\lambda}\}$:
$W_{t+1} \leftarrow \min\{1,\; (1/\sqrt{\lambda})/\|W_{t+1}\|\}\, W_{t+1}$
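A minimal Python sketch of one such two-step update under the definitions above (names are mine):

```python
import numpy as np

def multiclass_pegasos_step(W, x, y, lam, t):
    """One multi-class Pegasos update (sketch). W: (k, n) weight matrix with
    one row per class; x: (n,) example; y: true class index; t: iteration >= 1."""
    eta = 1.0 / (lam * t)                  # step size eta_t = 1/(lambda * t)
    scores = W.dot(x)                      # class scores under the current W_t
    scores_other = scores.copy()
    scores_other[y] = -np.inf
    r = int(np.argmax(scores_other))       # most violating class y_hat
    loss_active = 1.0 + scores[r] - scores[y] > 0
    W = (1.0 - eta * lam) * W              # shrink: regularization part of the step
    if loss_active:                        # multi-class hinge loss is active
        W[y] += eta * x                    # push the true class score up
        W[r] -= eta * x                    # push the violator's score down
    # project onto the ball of radius 1/sqrt(lambda) (Frobenius norm)
    norm = np.linalg.norm(W)
    radius = 1.0 / np.sqrt(lam)
    if norm > radius:
        W *= radius / norm
    return W
```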
Budgeted Multi-Class Pegasos
The budgeted variant caps the number of support vectors at a budget B: whenever an update would exceed the budget, a budget maintenance step (next slide) reduces the number of SVs by one.
Budget Maintenance Strategies
- Budget maintenance through removal
  - the optimal removal always selects the oldest SV (its coefficient is the smallest, since Pegasos shrinks the weights at every iteration); a minimal sketch follows this list
- Budget maintenance through projection
  - project the removed SV onto all the remaining SVs, which results in smaller weight degradation
- Budget maintenance through merging
  - merge two SVs into a newly created one
  - the total cost of finding the optimal merging of the m-th and n-th SVs is O(1)
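A minimal sketch of the removal strategy, assuming the SV set is kept in insertion order (data layout and names are mine):

```python
def maintain_budget_by_removal(support_vectors, budget):
    """Keep at most `budget` SVs by discarding the oldest one.
    `support_vectors` is a list of (x, alpha) pairs in insertion order; under
    Pegasos' per-iteration shrinking, the oldest SV carries the smallest
    coefficient, which is why removing it is the optimal removal."""
    while len(support_vectors) > budget:
        support_vectors.pop(0)  # drop the oldest support vector
    return support_vectors
```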
Experiments
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further works
Further Works
- Distribution-aware Pegasos?
- Online structural regularized SVM?
Thanks! Q&A