PEGASOS

Shai Shalev-Shwartz, Yoram Singer, Nati Srebro

The Hebrew University, Jerusalem, Israel

Support Vector Machines

QP form:

$\min_{\mathbf{w},\boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i \langle \mathbf{w}, \mathbf{x}_i \rangle \ge 1 - \xi_i, \ \xi_i \ge 0$

More "natural" form:

$\min_{\mathbf{w}} \ \underbrace{\frac{\lambda}{2}\|\mathbf{w}\|^2}_{\text{regularization term}} + \underbrace{\frac{1}{m} \sum_{(\mathbf{x},y) \in S} \ell(\mathbf{w}; (\mathbf{x},y))}_{\text{empirical loss}}$

where $\ell(\mathbf{w}; (\mathbf{x},y)) = \max\{0,\, 1 - y \langle \mathbf{w}, \mathbf{x} \rangle\}$ is the hinge loss.

Outline

• Previous work
• The Pegasos algorithm
• Analysis – faster convergence rates
• Experiments – outperforms the state of the art
• Extensions
  • kernels
  • complex prediction problems
  • bias term

Previous Work

• Dual-based methods
  • Interior point methods: memory $m^2$, time $m^3 \log(\log(1/\varepsilon))$
  • Decomposition methods: memory $m$, time super-linear in $m$
• Online learning & stochastic gradient
  • Linear kernel: memory $O(1)$, time $1/\varepsilon^2$
  • Non-linear kernel: memory $1/\varepsilon^2$, time $1/\varepsilon^4$

Typically, online learning algorithms do not converge to the optimal solution of the SVM.

PEGASOS

At each iteration $t$, Pegasos picks a subsample $A_t \subseteq S$, takes a subgradient step on the approximation $f(\mathbf{w}; A_t) = \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{|A_t|} \sum_{(\mathbf{x},y) \in A_t} \ell(\mathbf{w}; (\mathbf{x},y))$ with step size $\eta_t = 1/(\lambda t)$, and then projects onto the ball $B = \{\mathbf{w} : \|\mathbf{w}\| \le 1/\sqrt{\lambda}\}$. Choosing $A_t = S$ recovers the subgradient method; choosing $|A_t| = 1$ gives stochastic gradient descent (sketched below).
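To make the update concrete, here is a minimal Python sketch of the $|A_t| = 1$ case; the function name, parameter defaults, and random seed are our illustrative choices, not part of the original slides:

```python
import numpy as np

def pegasos(X, y, lam=1e-4, T=100_000, rng=None):
    """Sketch of Pegasos with |A_t| = 1.

    X: (m, n) array of examples, y: (m,) labels in {-1, +1}.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        i = rng.integers(m)                 # A_t: one random example
        eta = 1.0 / (lam * t)               # step size eta_t = 1/(lambda t)
        if y[i] * (X[i] @ w) < 1:           # hinge loss active at example i
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                               # only the regularizer contributes
            w = (1.0 - eta * lam) * w
        # project onto the ball B = {w : ||w|| <= 1/sqrt(lambda)}
        norm = np.linalg.norm(w)
        if norm > 1.0 / np.sqrt(lam):
            w *= (1.0 / np.sqrt(lam)) / norm
    return w
```

Note that $1 - \eta_t \lambda = 1 - 1/t$, so each step shrinks $\mathbf{w}$ multiplicatively before (possibly) adding the misclassified example.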

Run-Time of Pegasos

• Choosing $|A_t| = 1$ and a linear kernel over $\mathbb{R}^n$, the run-time required for Pegasos to find an $\varepsilon$-accurate solution w.p. $\ge 1 - \delta$ is $\tilde{O}\left(\frac{n}{\lambda \delta \varepsilon}\right)$
• The run-time does not depend on the number of examples
• It depends on the "difficulty" of the problem ($\lambda$ and $\varepsilon$)

Formal Properties

• Definition: $\mathbf{w}$ is $\varepsilon$-accurate if $f(\mathbf{w}) - \min_{\mathbf{w}'} f(\mathbf{w}') \le \varepsilon$
• Theorem 1: Pegasos finds an $\varepsilon$-accurate solution w.p. $\ge 1 - \delta$ after at most $\tilde{O}\left(\frac{1}{\lambda \delta \varepsilon}\right)$ iterations.
• Theorem 2: Pegasos finds $\log(1/\delta)$ solutions s.t. w.p. $\ge 1 - \delta$, at least one of them is $\varepsilon$-accurate after $\tilde{O}\left(\frac{\log(1/\delta)}{\lambda \varepsilon}\right)$ iterations.

Proof Sketch

A second look at the update step: the Pegasos update can be written as $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta_t \nabla_t$, where $\nabla_t$ is a subgradient of $f(\mathbf{w}; A_t)$ at $\mathbf{w}_t$. In other words, Pegasos is an online convex programming (OCP) procedure applied to the sequence of $\lambda$-strongly convex functions $f_t(\mathbf{w}) = f(\mathbf{w}; A_t)$.

Proof Sketch

• Lemma (free projection): since $\mathbf{w}^\star \in B$, projecting onto $B$ can only decrease the distance to the optimum: $\|\Pi_B(\mathbf{w}) - \mathbf{w}^\star\| \le \|\mathbf{w} - \mathbf{w}^\star\|$
• Logarithmic regret for OCP (Hazan et al. '06): $\sum_{t=1}^{T} f_t(\mathbf{w}_t) - \min_{\mathbf{w} \in B} \sum_{t=1}^{T} f_t(\mathbf{w}) \le O\left(\frac{\log T}{\lambda}\right)$
• Take expectation: for $r$ drawn uniformly from $\{1, \dots, T\}$, $\mathbb{E}[f(\mathbf{w}_r)] - f(\mathbf{w}^\star) \le O\left(\frac{\log T}{\lambda T}\right)$
• $f(\mathbf{w}_r) - f(\mathbf{w}^\star) \ge 0$, so Markov's inequality gives that w.p. $\ge 1 - \delta$, $f(\mathbf{w}_r) - f(\mathbf{w}^\star) \le O\left(\frac{\log T}{\delta \lambda T}\right)$ (Theorem 1)
• Amplify the confidence: repeat $\log(1/\delta)$ times, each run with constant confidence (Theorem 2)

Experiments

3 datasets (provided by Joachims):

• Reuters CCAT (800k examples, 47k features)
• Physics ArXiv (62k examples, 100k features)
• Covertype (581k examples, 54 features)

4 competing algorithms:

• SVM-Light (Joachims)
• SVM-Perf (Joachims '06)
• Norma (Kivinen, Smola, Williamson '02)
• Zhang '04 (stochastic gradient descent)

Source code available online

Training Time (in seconds)

                 Pegasos   SVM-Perf   SVM-Light
Reuters              2         77      20,075
Covertype            6         85      25,514
Astro Physics        2          5          80

Compare to Norma (on Physics)

[Figure: objective value and test error vs. run-time for Pegasos and Norma on the Physics dataset]

Compare to Zhang (on Physics)

[Figure: objective value vs. run-time for Pegasos and Zhang '04 on the Physics dataset]

But tuning the parameter is more expensive than learning…

Effect of k = |A_t| when T is fixed

[Figure: objective value as a function of k with the number of iterations T held fixed]

Effect of k = |A_t| when kT is fixed

[Figure: objective value as a function of k with the total work kT held fixed]

I want my kernels!

• Pegasos can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function
• No need to switch to the dual problem
• The number of support vectors is bounded by the number of iterations, since each iteration adds at most one new support vector (sketched below)
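A minimal sketch of how this can look in Python, representing $\mathbf{w}$ implicitly through per-example counts so that only kernel evaluations are needed; the function name and defaults are our illustrative choices:

```python
import numpy as np

def kernel_pegasos(X, y, kernel, lam=1e-4, T=10_000, rng=None):
    """Sketch of kernelized Pegasos with |A_t| = 1.

    The weight vector is kept implicitly as
      w_t = (1 / (lam * t)) * sum_j alpha[j] * y[j] * phi(x_j),
    so only the counts alpha are stored.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    m = X.shape[0]
    alpha = np.zeros(m)
    for t in range(1, T + 1):
        i = rng.integers(m)
        # margin of example i under the implicit w_t,
        # summing only over the current support vectors
        s = sum(alpha[j] * y[j] * kernel(X[j], X[i])
                for j in np.flatnonzero(alpha))
        if y[i] * s / (lam * t) < 1:
            alpha[i] += 1              # at most one new SV per iteration
    return alpha
```

The nonzero entries of `alpha` are exactly the support vectors, which makes the bound above immediate.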

Complex Decision Problems

• Pegasos works whenever we know how to calculate subgradients of the loss function $\ell(\mathbf{w}; (\mathbf{x}, y))$
• Example: structured output prediction, with $\ell(\mathbf{w}; (\mathbf{x}, y)) = \max_{y'} \left[ \Delta(y, y') + \langle \mathbf{w}, \phi(\mathbf{x}, y') - \phi(\mathbf{x}, y) \rangle \right]$
• A subgradient is $\phi(\mathbf{x}, y') - \phi(\mathbf{x}, y)$, where $y'$ is the maximizer in the definition of $\ell$ (see the sketch below)
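As an illustration, a sketch of computing this subgradient by loss-augmented decoding over an explicit label set; `phi` (the joint feature map), `margin` (the label cost $\Delta$), and `labels` are assumed user-supplied and are not part of the slides:

```python
import numpy as np

def structured_subgradient(w, x, y, labels, phi, margin):
    """Subgradient of the structured hinge loss
    l(w; (x, y)) = max_{y'} [ margin(y, y') + <w, phi(x, y') - phi(x, y)> ].
    """
    # loss-augmented decoding: find the maximizing y'
    scores = [margin(y, yp) + w @ (phi(x, yp) - phi(x, y)) for yp in labels]
    y_star = labels[int(np.argmax(scores))]
    # a subgradient of a max of affine functions is the
    # gradient of (any) maximizer
    return phi(x, y_star) - phi(x, y)
```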

Bias term

• Popular approach: increase the dimension of x (see the sketch below)
  Con: we "pay" for b in the regularization term
• Calculate subgradients w.r.t. w and w.r.t. b
  Con: the convergence rate becomes $1/\varepsilon^2$
• Define the loss on $A_t$ with the bias chosen optimally for that subsample
  Con: $|A_t|$ needs to be large
• Search for b in an outer loop
  Con: each evaluation of the objective costs $1/\varepsilon^2$
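A small sketch of the first option; the `scale` knob is our illustrative addition (a larger constant feature weakens the effective penalty on b, but b is still regularized):

```python
import numpy as np

def augment_with_bias(X, scale=1.0):
    """Append a constant feature so the bias b is learned as the last
    coordinate of w; the regularizer then penalizes b as well."""
    ones = np.full((X.shape[0], 1), scale)
    return np.hstack([X, ones])
```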

Discussion

• Pegasos: a simple & efficient solver for SVM
• Sample vs. computational complexity:
  • Sample complexity: how many examples do we need, as a function of the VC-dim ($\lambda$), accuracy ($\varepsilon$), and confidence ($\delta$)
  • In Pegasos, we aim to analyze the computational complexity in terms of $\lambda$, $\varepsilon$, and $\delta$ (as also in Bottou & Bousquet)
• Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it can calculate the min value