Semi-supervised learning based on kernel methods and graph cut algorithms
Tijl De Bie
Promotor: Prof. Bart De Moor

Overview
• Learning??
• Class learning
• Semi-supervised learning methods
• Other general learning settings
23 May 2005
PhD defense - Tijl De Bie

Learning??
• Learning =
– Observing information (data)…
– …detecting regularities…
– …with the goal of making reliable predictions.
• Prerequisites for learning:
– Statistical assumptions
– Enough information, as compared to the ‘complexity’ of the type of regularity to be learned

Learning to do what??
• Divide pixels of an image into foreground and background… (image segmentation)
• Divide genes into cell-cycle related and cell-cycle unrelated genes… (bioinformatics)
• Divide websites into related and unrelated to some query… (information retrieval)
• Divide pictures of faces into faces belonging to particular persons… (machine vision)
→ Learn classes or clusters in data!

Overview
• Learning??
• Class learning
– Supervised class learning: classification
– Unsupervised class learning: clustering
– Semi-supervised class learning: transduction and side-information learning
– Examples of semi-supervised learning problems
• Semi-supervised learning methods
• Other general learning settings

Supervised class learning: Classification
• Given: a set of data points, and their class labels.
• How to tell the label from the data points? Learn this from the training set (induction).
• I.e., come up with a good classifier.
• Make predictions about the labels of a test set (deduction).

Unsupervised class learning: Clustering
• Given: a set of data points, but no class labels.
• How to divide the data points into coherent groups?
• (Additional data points available only later can be labeled according to their location close to one of the cluster centers.)

Semi-supervised class learning: Transduction
• Given: a training set of data points with their class labels, and a test set of unlabeled data points.
• How to divide the data points into coherent groups while respecting the labels?

Semi-supervised class learning: Side-information learning
• Given: a set of data points and constraints on their labels (side-information).
• How to divide the data points into coherent groups while respecting the side-information?

Why semi-supervised learning…?
• Unlabeled data is often easy to obtain
• Labeled data is often expensive / scarce
• Use all information available!
• The statistical assumptions are weaker and more realistic

Very often the ‘right’ approach
• Examples:
– Image segmentation: find coherent sets of pixels in a figure… what is ‘coherent’? Cluster the pixels… → transductive approach!
– Bioinformatics: many genes, wet lab may label some of them, but at a high cost (DNA sequence… microarrays…)
– Information retrieval on the web: user labels a few websites, machine learning system uses additional unlabeled websites from the web (e.g.: likes and dislikes)
– Face recognition based on different frames of a video sequence
– …

But…
Semi-supervised learning = an intrinsically hard combinatorial problem!

Overview
• Learning??
• Class learning
• Semi-supervised learning methods
– Learning a distance metric using side-information
– Label constrained graph cut clustering
– Transductive SVM
• Other general learning settings

Semi-supervised learning methods
• 3 important approaches
– Learning a distance metric using side-information
– Label constrained graph cut clustering
– Transductive SVM
[Diagram relating the three approaches; box labels: “Data points & metric”, “Side-information”, “Dim. reduction”, “Data points & new metric”, “Clustering”, “Data points & affinity measure”, “Label constrained clustering”, “Data points & metric (kernel)”, “Classification with unlabeled data”, and three times “Resulting classes”]

Learning a distance metric using side-information
• Consider the data points (in the real 2-D space). Can we distinguish classes/clusters?
• Clustering?
• What if we know pairwise constraints on the labels?

Learning a distance metric using side-information
• Can be achieved using semi-supervised dimensionality reduction, followed by clustering: project data points on the lower dimensional hyperplane…
• We developed a technique based on canonical correlation analysis to perform such dimensionality reduction. (Can be kernelized → nonlinear version)
(Own contribution)

Label constrained graph cut clustering
• Graph cut clustering = ?
• Define a measure of affinity (similarity) between data points: [formula not preserved in transcript]
• Graph clustering = divide the graph into coherent parts…
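
The affinity formula itself did not survive in this transcript. Purely as an illustration, a common choice in graph cut clustering is a Gaussian (RBF) affinity. The function name `gaussian_affinity` and the bandwidth `sigma` below are assumptions and are not taken from the slides.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Return the n x n matrix A with A[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of rows.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Two tight groups of points on a line: affinity is high within a group, low across.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
A = gaussian_affinity(X, sigma=1.0)
```

Each data point becomes a node of a weighted graph, and `A[i, j]` is the weight of the edge between nodes `i` and `j`.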

Label constrained graph cut clustering
• Notation:
– Affinity matrix: [formula not preserved in transcript]
– Label vector: [formula not preserved in transcript]
– (Graph Laplacian: [formula not preserved in transcript])
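
The definitions on this slide are lost, so the sketch below assumes the standard ones: a symmetric affinity matrix `A`, a label vector `y` with entries in {-1, +1}, and the unnormalized graph Laplacian L = D - A with D the diagonal matrix of row sums of `A`. Under these assumptions the quadratic form y'Ly equals four times the cut cost, which is what makes the Laplacian useful for graph cut clustering.

```python
import numpy as np

def graph_laplacian(A):
    """Unnormalized graph Laplacian L = D - A, with D the diagonal degree matrix."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

# Small symmetric affinity matrix: points 0, 1 form one group, points 2, 3 the other.
A = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.2],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.2, 1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])   # label vector in {-1, +1}^n

L = graph_laplacian(A)
cut = A[np.ix_(y > 0, y < 0)].sum()    # total affinity crossing the partition
quad = y @ L @ y                       # equals 4 * cut for +/-1 labels
```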

Label constrained graph cut clustering
• Normalized cut cost clustering: [formula not preserved in transcript; its annotated parts are the cut cost, the labels, and the balance]
Very hard combinatorial problem!
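
The cost function on this slide is not preserved. A minimal sketch, assuming the usual normalized cut of Shi & Malik (cut cost divided by the volume of each side, where volume is the sum of degrees), could look as follows. The function name `normalized_cut` is hypothetical.

```python
import numpy as np

def normalized_cut(A, y):
    """Normalized cut cost of the two-way partition encoded by labels y in {-1, +1}."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y)
    pos, neg = y > 0, y < 0
    degrees = A.sum(axis=1)
    cut = A[np.ix_(pos, neg)].sum()    # affinity across the partition
    # Dividing by the volume of each side penalizes unbalanced partitions.
    return cut / degrees[pos].sum() + cut / degrees[neg].sum()

A = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.2],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.2, 1.0, 0.0]])
good = normalized_cut(A, np.array([1, 1, -1, -1]))   # cuts only the weak edges
bad = normalized_cut(A, np.array([1, -1, 1, -1]))    # cuts the strong edges
```

Finding the labeling with the lowest cost requires searching over all 2^n label vectors, which is the combinatorial difficulty the slide points to.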

Label constrained graph cut clustering (eigenvalue)
• Relaxation to an eigenvalue problem: rewrite in terms of [formula not preserved in transcript]
• Observation: [formula not preserved in transcript] → relax combinatorial constraint to this norm constraint

Label constrained graph cut clustering (eigenvalue)
• Solved by eigenvalue problem: [formula not preserved in transcript]
• Eigenvector with smallest nonzero eigenvalue = approximation for the unrelaxed label vector
(Own contribution; also: Shi & Malik)
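
The eigenvalue problem on this slide is not preserved. The sketch below assumes the standard spectral relaxation of the normalized cut, the generalized eigenproblem (D - A) v = λ D v, and assumes a connected graph with positive degrees so that exactly one eigenvalue is zero. The function name `spectral_bipartition` is hypothetical.

```python
import numpy as np

def spectral_bipartition(A):
    """Sign pattern of the generalized eigenvector of (D - A) v = lambda D v
    that belongs to the smallest nonzero eigenvalue."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric form D^{-1/2} (D - A) D^{-1/2} has the same eigenvalues.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    v = d_inv_sqrt * eigvecs[:, 1]             # index 0 is the trivial constant solution
    return np.where(v >= 0, 1, -1), eigvals

A = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.2],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.2, 1.0, 0.0]])
labels, eigvals = spectral_bipartition(A)
```

Thresholding the relaxed eigenvector at zero is only one way to recover discrete labels. The overall sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which group is called +1.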

Label constrained graph cut clustering (eigenvalue)
• Transductive setting: parameterize the label vector as [formula not preserved in transcript; a brace marks the entries belonging to the training set]
(Own contribution)
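
The parameterization shown on this slide is not preserved. One plausible way to build label constraints into the eigenvalue relaxation, sketched here as an assumption rather than as the method of the thesis, is to write the relaxed label vector as y = C z, where all training points with the same label share one coordinate of z and every unlabeled point keeps its own coordinate. The names `constrained_spectral_labels`, `C`, `pos_idx` and `neg_idx` are hypothetical.

```python
import numpy as np

def constrained_spectral_labels(A, pos_idx, neg_idx):
    """Spectral relaxation restricted to label vectors of the form y = C z."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    labeled = set(pos_idx) | set(neg_idx)
    free = [i for i in range(n) if i not in labeled]
    # One shared column per labeled class, one column per unlabeled point.
    C = np.zeros((n, 2 + len(free)))
    C[list(pos_idx), 0] = 1.0
    C[list(neg_idx), 1] = 1.0
    for col, i in enumerate(free, start=2):
        C[i, col] = 1.0
    D = np.diag(A.sum(axis=1))
    M = C.T @ (D - A) @ C              # reduced Laplacian
    N = C.T @ D @ C                    # reduced degree matrix (diagonal by construction)
    # Solve M z = lambda N z through the symmetric form N^{-1/2} M N^{-1/2}.
    n_inv_sqrt = 1.0 / np.sqrt(np.diag(N))
    S = n_inv_sqrt[:, None] * M * n_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)
    y_relaxed = C @ (n_inv_sqrt * vecs[:, 1])
    # Orient the solution so the positively labeled points come out positive.
    if y_relaxed[list(pos_idx)[0]] < 0:
        y_relaxed = -y_relaxed
    return np.where(y_relaxed >= 0, 1, -1)

# Two groups of three points; points 0 and 1 are labeled +1, point 3 is labeled -1.
A = np.full((6, 6), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
labels = constrained_spectral_labels(A, pos_idx=[0, 1], neg_idx=[3])
```

This sketch only forces equally labeled points to share a value. It does not by itself guarantee that the two labeled classes end up on opposite sides, although they do in this well separated example. The reduced problem has one dimension per unlabeled point plus two, so it is never larger than the unconstrained one.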

Label constrained graph cut clustering (SDP)
• Relaxation to a semi-definite programming (SDP) problem
• Trick: introduce (rank one) label matrix [formula not preserved in transcript]
• Relax to positive semi-definiteness constraint: [primal and dual formulations not preserved in transcript]
• Much tighter relaxation
• Quite a bit slower… doable up to 1000 data points
(Own contribution)
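
The primal and dual problems from this slide are not preserved, so the sketch below does not reproduce them. It only illustrates the lifting trick, under the assumption that the label matrix is Γ = y yᵀ for a label vector y in {-1, +1}^n: a cost that is quadratic in y becomes linear in Γ, and Γ has unit diagonal, is positive semi-definite, and has rank one. Dropping the rank-one requirement is what would turn the problem into an SDP.

```python
import numpy as np

y = np.array([1.0, 1.0, -1.0, -1.0])    # a candidate label vector in {-1, +1}^n
Gamma = np.outer(y, y)                   # rank-one label matrix Gamma = y y^T

A = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.2],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.2, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A           # unnormalized graph Laplacian (assumed definition)

quad_vector = y @ L @ y                  # cost written with the label vector
quad_matrix = np.trace(L @ Gamma)        # same cost, now linear in Gamma

# Properties kept by the relaxation: unit diagonal and positive semi-definiteness.
unit_diagonal = np.allclose(np.diag(Gamma), 1.0)
min_eigenvalue = np.linalg.eigvalsh(Gamma).min()
rank = np.linalg.matrix_rank(Gamma)      # the rank-one requirement is the part that is dropped
```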

Label constrained graph cut clustering (SDP)
• Transductive setting: parameterize the label matrix as [formula not preserved in transcript; a brace marks the block belonging to the training set]
(Own contribution)

Label constrained graph cut clustering (SDP)
• Can we speed up, and retain accuracy of the relaxation?
• Yes! Combine the eigenvalue relaxation and the SDP relaxation…
• Then often feasible up to 5000 data points
The subspace trick, useful to speed up relaxations of many hard combinatorial problems!
(Own contribution)

Transductive SVM
• Support Vector Machine (SVM) = classification problem
[Figure: positive (+) and negative (-) data points separated by a hyperplane, with the margin indicated]
• SVMs search for the maximum margin hyperplane, parameterized by a weight vector

Transductive SVM
• The weight vector is the solution of: [formula not preserved in transcript]
• Dual: [formula not preserved in transcript], where [symbols not preserved] are inner products
• Inner product can be any symmetric, bilinear, positive definite function, i.e. a kernel function: [formula not preserved in transcript]
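
The kernel formula on this slide is lost. As an illustration of the idea that a kernel function acts as an inner product, the sketch below uses a degree-2 polynomial kernel, chosen here as an example and not taken from the slides. It checks numerically that the kernel equals an ordinary inner product after an explicit feature map. The names `poly2_kernel` and `poly2_features` are hypothetical.

```python
import numpy as np

def poly2_kernel(X, Z):
    """Degree-2 polynomial kernel k(x, z) = (1 + <x, z>)^2 for all pairs of rows."""
    return (1.0 + X @ Z.T) ** 2

def poly2_features(X):
    """Explicit feature map phi for 2-D inputs such that k(x, z) = <phi(x), phi(z)>."""
    x1, x2 = X[:, 0], X[:, 1]
    s = np.sqrt(2.0)
    return np.column_stack([np.ones(len(X)), s * x1, s * x2,
                            x1 ** 2, x2 ** 2, s * x1 * x2])

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
K = poly2_kernel(X, X)                       # kernel matrix, no feature map needed
Phi = poly2_features(X)                      # the same inner products, computed explicitly
smallest_eig = np.linalg.eigvalsh(K).min()   # non-negative up to round-off
```

Because the dual problem only touches the data through inner products, replacing them by kernel evaluations gives a nonlinear classifier without ever forming the feature map.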

Transductive SVM
• Transductive SVM: also uses the set of unlabeled data points
• (Dual) optimization problem (due to Vapnik): [formula not preserved in transcript]
Very hard combinatorial problem!

Transductive SVM
• Relaxation to an SDP problem, after several simplifications, and with [formula not preserved in transcript]
• Feasible up to 100 unlabeled and 1000 labeled data points
• Subspace trick also works here → up to 1000 total
(Own contribution)

Transductive SVM
• Performance on an artificial dataset: with only 2 training points…
[Figure not preserved in transcript]

Overview
• Learning??
• Class learning
• Semi-supervised learning
• Other general learning settings
– Classification using heterogeneous information sources: a bioinformatics case study
– Inferring transcriptional modules using heterogeneous information sources
– Canonical correlation analysis: study of regularization

Classification using heterogeneous information
• Bioinformatics case study
– Classify genes: coding for transmembrane proteins or not
– Classify genes: coding for ribosomal proteins or not
– Information available
  • DNA sequence of the genes
  • Upstream DNA sequences
  • Gene expression profiles
  • Protein sequence
  • Protein-protein interaction data
– Positive results using a data fusion approach based on SVMs, in a transductive setting
(Own contribution)

Inferring transcriptional modules using heterogeneous information
Find modules as sets of genes, satisfying:
• All share the same set of regulators
• All share the same set of motifs in upstream DNA
• Their expression profiles are strongly correlated
(Own contribution)

Inferring transcriptional modules using heterogeneous information
• Discovery step…
• Validation step…
Thanks to Karen Lemmens!

Regularization of canonical correlation analysis
• Regularization
– To prevent overfitting, improve generalization
– To improve numerical stability
– To deal with noise
• We derived regularized CCA using the last approach
• Showed connections between CCA and independent component analysis (ICA)
(Own contribution)
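
The slides do not show the regularized CCA formulation that was derived. A minimal sketch, assuming the common ridge-style regularization in which a multiple of the identity is added to each within-view covariance matrix, is given below. Whether this coincides with the derivation in the thesis is not stated in the transcript. The function name `regularized_cca` and the parameter `reg` are hypothetical.

```python
import numpy as np

def regularized_cca(X, Y, reg=1e-3):
    """First pair of canonical directions, with ridge terms on both covariances."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whitening transforms: Wx.T @ Cxx @ Wx = I, and likewise for Wy.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    # The leading singular pair of the whitened cross-covariance gives the directions.
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, 0], Wy @ Vt[0], s[0]

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))                  # hidden signal common to both views
X = np.hstack([shared, rng.normal(size=(200, 2))])
Y = np.hstack([rng.normal(size=(200, 2)), shared]) + 0.1 * rng.normal(size=(200, 3))
wx, wy, rho = regularized_cca(X, Y, reg=1e-3)
```

Larger values of `reg` shrink the estimated correlation `rho` and make the solution less sensitive to noise and to nearly singular covariance matrices.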

Conclusions
• Semi-supervised learning is useful!
• Semi-supervised learning is hard algorithmically, inherently a combinatorial problem
• We proposed 3 approaches:
– Using dimensionality reduction
– By adapting a graph cut cluster algorithm to take label constraints into account
– By relaxing the transductive SVM classifier
• Other contributions in data fusion, and multivariate statistics… (not shown in presentation)

Outlook
• Convex relaxations to approximately solve combinatorial problems? New optimization results needed…
• The subspace trick as a means of speeding up SDP relaxations!
• Statistical study of semi-supervised learning
• Approaches to integrate information coming from heterogeneous data sources

No PhD without…
…my promotor Bart De Moor,
Johan Suykens, Lieven De Lathauwer, Kathleen Marchal, Yves Moreau, Joos
Vandewalle, Jan Willems, André Barbé, Bart Preneel, Sabine Van Huffel,
Adhemar Bultheel,
Nello Cristianini!
Michael Jordan, Roman Rosipal, Laurent El Ghaoui, Peter Bartlett, Wolfgang
Polonik, Dan Gusfield, Michinari Momma, Bill Noble, John Shawe-Taylor, Edgard
Nyssen,
Gert Lanckriet, Pieter Abbeel,
SCD: Andy, Bart, Bert, Carlos, Diana, Dries, Evelyne, Frank, Geert, Ivan, Ivan,
Jeroen, Jeroen, Johan, Jos, Luc, Lukas, Katrien, Koenraad, Kristiaan, Maarten,
Marcelo, Mieke, Oscar, Pieter, Simon, Sven, Tom, Tony,
and… Bioi@SCD! Bert, Cynthia, Frank, Frizo, Geert, Gert, Janick, Joke, Karen,
Kristof, Mik, Nathalie, Olivier, Patrick, Peter, Peter, Raf, Ruth, Pieter, Steffen,
Stein, Steven, Tim, Tom, Wouter,
the FWO for sponsoring my PhD and visits abroad!
Thanks!