
Multiple Kernel Learning
Marius Kloft
Technische Universität Berlin
Korea University
Marius Kloft (TU Berlin)
1/12
Machine Learning
• Aim
▫ Learning the relation of two random quantities from observations
• Kernel-based learning
• Example
▫ Object detection in images
Multiple Views / Kernels
(Lanckriet, 2004)
• Multiple views of the data: Space, Shape, Color
• How to combine the views? Weightings.
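The weighted combination of views can be written out explicitly; assuming the standard MKL convention with nonnegative weights θ (the symbol is not legible in the transcript):

```latex
% Combined kernel as a weighted sum of the per-view kernels:
k(x, x') \;=\; \sum_{m=1}^{M} \theta_m \, k_m(x, x'),
\qquad \theta_m \ge 0 .
```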
Computation of Weights?
• State of the art (Bach, 2008)
▫ Sparse weights
 Kernels / views are completely discarded
▫ But why discard information?
From Vision to Reality?
• State of the art: the sparse method
▫ empirically ineffective (Gehler et al., Noble et al., Shawe-Taylor et al., NIPS 2008)
• My dissertation: new methodology
▫ established as a standard
▫ effective in applications
▫ more efficient and effective in practice
▫ breakthrough at learning bounds: O(M/n)
Non-sparse Multiple Kernel Learning
(Kloft et al., ECML 2010, JMLR 2011)
New Methodology
• Computation of weights?
▫ Model
 kernel: a weighted combination of the given kernels
▫ Mathematical program
• Generalized formulation
▫ arbitrary loss
▫ arbitrary norms
 e.g., lp-norms: ||θ||_p = (Σ_m θ_m^p)^(1/p)
 1-norm leads to sparsity
• Optimization over the weights: a convex problem.
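The mathematical program referred to above can be sketched in the notation of Kloft et al. (JMLR 2011); this is a rough rendering of their lp-norm formulation (the slide's exact variant, e.g. Tikhonov vs. Ivanov regularization, is not visible in the transcript):

```latex
\min_{\theta \ge 0,\; \|\theta\|_p \le 1}\;\; \min_{w,\, b}\quad
  \frac{1}{2}\sum_{m=1}^{M}\frac{\|w_m\|_2^2}{\theta_m}
  \;+\; C \sum_{i=1}^{n}
  \ell\!\Big( y_i \Big( \textstyle\sum_{m=1}^{M} \langle w_m, \phi_m(x_i) \rangle + b \Big) \Big)
```

For p = 1 this recovers the sparse (1-norm) MKL of the state of the art; p > 1 keeps all kernel weights strictly positive.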
Theoretical Analysis
(Kloft & Blanchard, NIPS 2011, JMLR 2012)
• Theoretical foundations
▫ Active research topic
 NIPS workshop 2010
• Corollaries (learning bounds)
▫ Upper bound on the rate
 previously best known rate: (Cortes et al., ICML 2010)
▫ We show:
 Theorem (Kloft & Blanchard). The local Rademacher complexity of MKL is bounded in terms of the truncated spectra of the kernels.
 Generally an improvement; for suitable p, an improvement of two orders of magnitude
Proof (Sketch)
1. Relating the original class to the centered class
2. Bounding the complexity of the centered class
3. Applying the Khintchine-Kahane and Rosenthal inequalities
4. Bounding the complexity of the original class
5. Relating the bound to the truncation of the spectra of the kernels
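For readers unfamiliar with the quantity in the theorem: the local Rademacher complexity of a function class F at radius r is (standard definition, not taken from the slides):

```latex
R_r(F) \;=\; \mathbb{E}\,\sup_{\substack{f \in F \\ P f^2 \le r}}\;
\frac{1}{n}\sum_{i=1}^{n}\sigma_i\, f(x_i),
\qquad \sigma_i \;\text{i.i.d. Rademacher signs.}
```

Restricting the supremum to functions of small variance (Pf² ≤ r) is what allows faster rates than the global Rademacher complexity.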
Optimization
(Kloft et al., JMLR 2011)
• Algorithms
1. Newton method
2. sequential, quadratically constrained programming with level-set projections
3. block-coordinate descent algorithm
 Alternate (sketch):
  solve (P) w.r.t. w
  solve (P) w.r.t. θ: analytical
 until convergence (proved)
• Implementation
▫ in C++ ("SHOGUN Toolbox")
 Matlab/Octave/Python/R support
▫ Runtime: ~1-2 orders of magnitude faster
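The block-coordinate descent can be sketched in code. The analytic θ-update below follows the closed form given in Kloft et al. (JMLR 2011); for brevity this sketch uses kernel ridge regression in place of the SVM as the sub-problem (P), so the function name and defaults are illustrative, not the SHOGUN implementation:

```python
import numpy as np

def lp_mkl_krr(Ks, y, p=2.0, lam=1.0, n_iter=50):
    """lp-norm MKL by block-coordinate descent, with kernel ridge
    regression standing in for the SVM sub-problem (P).
    Alternates until convergence:
      1) solve (P) w.r.t. the predictor, kernel weights theta fixed
      2) update theta analytically, renormalized to ||theta||_p = 1
    """
    M, n = len(Ks), len(y)
    theta = np.full(M, M ** (-1.0 / p))      # uniform start on the lp sphere
    for _ in range(n_iter):
        K = sum(t * Km for t, Km in zip(theta, Ks))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)  # KRR dual solution
        # block norms: ||w_m||^2 = theta_m^2 * alpha' K_m alpha
        norms = np.sqrt(np.maximum(
            [t ** 2 * alpha @ Km @ alpha for t, Km in zip(theta, Ks)], 1e-12))
        theta = norms ** (2.0 / (p + 1))      # analytic lp update
        theta /= np.linalg.norm(theta, ord=p)  # project to ||theta||_p = 1
    return theta, alpha
```

The inner solve is closed-form here; with an SVM as (P), step 1 becomes a standard SVM training call on the combined kernel, while step 2 stays unchanged.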
Toy Experiment
• Design
▫ Two 50-dimensional Gaussians
 mean μ1 on the simplex, zero-mean features irrelevant
 variance chosen so that the Bayes error is constant
▫ 50 training examples
▫ one linear kernel per feature
▫ six scenarios: vary the % of irrelevant features (0%, 44%, 64%, 82%, 92%, 98% sparsity)
• Results
▫ Choice of p is crucial
▫ Optimality depends on the true sparsity
▫ SVM fails in the sparse scenarios
▫ 1-norm best in the sparsest scenario only
▫ p-norm MKL proves robust in all scenarios
• Bounds
▫ Can minimize the bounds w.r.t. p
▫ Bounds well reflect the empirical results
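The design above might be generated along these lines (a hypothetical sketch; `n_relevant`, `sep`, and the exact mean scaling are illustrative assumptions, not the dissertation's actual protocol):

```python
import numpy as np

def toy_mkl_data(n=50, d=50, n_relevant=4, sep=1.0, seed=0):
    """Two d-dimensional Gaussian classes: only the first `n_relevant`
    features carry the class mean; the remaining features are
    zero-mean noise and hence irrelevant."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n)
    mu = np.zeros(d)
    mu[:n_relevant] = sep / n_relevant        # informative part of the mean
    X = rng.normal(size=(n, d)) + np.outer(y, mu)
    return X, y
```

Sweeping `n_relevant` from d down to a handful of features reproduces the six sparsity scenarios; one linear kernel per feature then gives the MKL input.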
Applications
• Lesson learned:
▫ Optimality of a method depends on the true underlying sparsity of the problem
• Applications studied: characterized by non-sparsity
Application Domain: Computer Vision
• Visual object recognition
▫ Aim: annotation of visual media (e.g., images) with concepts such as aeroplane, bicycle, bird
▫ Motivation: content-based image retrieval
Application Domain: Computer Vision
• Multiple kernels based on
▫ color histograms
▫ shapes (gradients)
▫ local features (SIFT words)
▫ spatial features
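Kernels on color histograms and bag-of-words features are commonly χ²-based in this literature; a minimal sketch (the function name and `gamma` default are assumptions, not the thesis' exact kernels):

```python
import numpy as np

def chi2_kernel(H1, H2, gamma=1.0):
    """Chi-squared kernel between rows of two histogram matrices:
    k(h, h') = exp(-gamma * sum_i (h_i - h'_i)^2 / (h_i + h'_i + eps)).
    Suited to nonnegative features such as color histograms or
    bag-of-words counts."""
    d2 = np.array([((h - H2) ** 2 / (h + H2 + 1e-12)).sum(axis=1)
                   for h in H1])
    return np.exp(-gamma * d2)
```

Computing one such matrix per feature type (and per color channel or pyramid level) yields the bank of base kernels that MKL then weights.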
Application Domain: Computer Vision
• 32 kernels
▫ 4 types, crossing feature (gradients vs. color) with representation (bag of words vs. global histogram):
 gradients: BoW-SIFT, Ho(O)G
 color histograms: BoW-C, HoC
▫ varied over color-channel combinations and spatial tilings (levels of a spatial pyramid)
▫ one-vs.-rest classifier for each visual concept
• Datasets
▫ 1. VOC 2009 challenge
 7054 train / 6925 test images
 20 object categories (aeroplane, bicycle, …)
▫ 2. ImageCLEF 2010 challenge
 8000 train / 10000 test images, taken from Flickr
 93 concept categories (partylife, architecture, skateboard, …)
Application Domain: Computer Vision
• Preliminary results:

             SVM     MKL
  VOC2009    55.85   56.76
  CLEF2010   36.45   37.02

• Challenge results (Binder et al., 2010, 2011)
▫ Employed our approach for the ImageCLEF 2011 Photo Annotation challenge
 achieved the winning entries in 3 categories!
▫ using BoW-S only gives worse results → BoW-S alone is not sufficient
Application Domain: Computer Vision
• Why can MKL help?
▫ some images are better captured by certain kernels
• Experiment: disagreement of single-kernel classifiers
▫ different images may have different kernels that capture them well
▫ BoW-S kernels induce more or less the same predictions
Application Domain: Genetics
(Kloft et al., NIPS 2009, JMLR 2011)
• Detection of
▫ transcription start sites
• by means of kernels based on:
▫ sequence alignments
▫ distribution of nucleotides (downstream, upstream)
▫ folding properties (binding energies and angles)
• Empirical analysis
▫ detection accuracy (AUC)
▫ higher accuracies than sparse MKL and ARTS
 ARTS: winner of an international comparison of 19 models (Abeel et al., 2009)
Application Domain: Genetics
(Kloft et al., NIPS 2009, JMLR 2011)
• Theoretical analysis
▫ impact of the lp-norm on the bound
▫ confirms the experimental results:
 stronger theoretical guarantees for the proposed approach (p > 1)
 empirical and theoretical results approximately agree
Application Domain: Pharmacology
• Protein fold prediction
▫ prediction of the fold class of a protein
 fold class is related to the protein's function
▫ e.g., important for drug design
▫ data set and kernels from Y. Ying
 27 fold classes
 fixed train and test sets
 12 biologically inspired kernels
▫ e.g., hydrophobicity, polarity, van der Waals volume
• Results
▫ 1-norm MKL and SVM on par
▫ p-norm MKL performs best
 6% higher accuracy than the baselines
Further Applications
• characterized by non-sparsity
Conclusion:
Non-sparse Multiple Kernel Learning
• Visual object recognition
▫ established standard: winner of the ImageCLEF 2011 challenge
• Computational biology
▫ more accurate gene detector than the winner of an international comparison
• Applications
▫ training with > 100,000 data points and > 1,000 kernels
• Sharp learning bounds
Thank you for your attention.
I will be pleased to answer any additional questions.
References
▫ Abeel, Van de Peer, Saeys (2009). Toward a gold standard for promoter prediction evaluation. Bioinformatics.
▫ Bach (2008). Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research (JMLR).
▫ Kloft, Brefeld, Laskov, Sonnenburg (2008). Non-sparse Multiple Kernel Learning. NIPS Workshop on Kernel Learning.
▫ Kloft, Brefeld, Sonnenburg, Laskov, Müller, Zien (2009). Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2009).
▫ Kloft, Rückert, Bartlett (2010). A Unifying View of Multiple Kernel Learning. ECML.
▫ Kloft, Blanchard (2011). The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2011).
▫ Kloft, Brefeld, Sonnenburg, Zien (2011). Lp-Norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 12(Mar):953-997.
▫ Kloft, Blanchard (2012). On the Convergence Rate of Lp-norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 13(Aug):2465-2502.
▫ Lanckriet, Cristianini, Bartlett, El Ghaoui, Jordan (2004). Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research (JMLR).