
Advanced Machine Learning
Lecture 5
Gaussian Processes 2
31.10.2012
Bastian Leibe
RWTH Aachen
http://www.vision.rwth-aachen.de/
[email protected]
This Lecture: Advanced Machine Learning
• Regression Approaches
Linear Regression
Regularization (Ridge, Lasso)
Kernels (Kernel Ridge Regression)
Gaussian Processes
• Bayesian Estimation & Bayesian Non-Parametrics




Mixture Models & EM
Dirichlet Processes
Latent Factor Models
Beta Processes
• SVMs and Structured Output Learning


SV Regression, SVDD
Large-margin Learning
Topics of This Lecture
• Kernels
Recap: Kernel trick
Constructing kernels
• Gaussian Processes


Recap: Definition, Prediction, GP Regression
Influence of hyperparameters
• Learning Gaussian Processes


Bayesian Model Selection
Model selection for Gaussian Processes
• Gaussian Processes for Classification


Linear models for classification
Gaussian Process classification
• Applications
Recap: Kernel Ridge Regression
• Dual definition
Instead of working with w, substitute w = Φᵀa into J(w) and write the result using the kernel matrix K = ΦΦᵀ.
Solving for a, we obtain
a = (K + λI_N)⁻¹ t
• Prediction for a new input x:
Writing k(x) for the vector with elements kₙ(x) = k(xₙ, x),
y(x) = k(x)ᵀ (K + λI_N)⁻¹ t
⇒ The dual formulation allows the solution to be expressed entirely in terms of the kernel function k(x, x’).
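As an illustration of the dual formulation, here is a minimal NumPy sketch (not part of the original lecture); the RBF kernel, the regularization value λ, and the toy data are arbitrary choices for demonstration.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def kernel_ridge_fit(X, t, lam=1e-2, sigma=1.0):
    """Dual solution a = (K + lam*I)^-1 t."""
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def kernel_ridge_predict(X_train, a, X_new, sigma=1.0):
    """Prediction y(x) = k(x)^T a with k_n(x) = k(x_n, x)."""
    return rbf_kernel(X_new, X_train, sigma) @ a

# Toy usage on 1D data
X = np.linspace(0, 1, 20)[:, None]
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(20)
a = kernel_ridge_fit(X, t)
print(kernel_ridge_predict(X, a, np.array([[0.5]])))
```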
Recap: Properties of Kernels
• Theorem
Let k : X × X → R be a positive definite kernel function. Then there exists a Hilbert space H and a mapping φ : X → H such that
k(x, x’) = ⟨φ(x), φ(x’)⟩_H
where ⟨·, ·⟩_H is the inner product in H.
• Translation


Take any set X and any function k : X × X → R.
If k is a positive definite kernel, then we can use k to learn a
classifier for the elements in X!
• Note

X can be any set, e.g. X = "all videos on YouTube" or X = "all
permutations of {1, . . . , k}", or X = "the internet".
Slide credit: Christoph Lampert
Recap: The “Kernel Trick”
Any algorithm that uses data only in the form
of inner products can be kernelized.
• How to kernelize an algorithm


Write the algorithm only in terms of inner products.
Replace all inner products by kernel function evaluations.
⇒ The resulting algorithm will do the same as the linear version, but in the (hidden) feature space H.

Caveat: working in H is not a guarantee for better performance.
A good choice of k and model selection are important!
Slide credit: Christoph Lampert
How to Check if a Function is a Kernel
• Problem:
Checking if a given k : X × X → R fulfills the conditions for a kernel is difficult: we need to prove or disprove
∑ᵢ ∑ⱼ tᵢ tⱼ k(xᵢ, xⱼ) ≥ 0
for any set x₁, …, xₙ ∈ X and any t ∈ Rⁿ, for any n ∈ N.
• Workaround:

It is easy to construct functions k that are positive definite
kernels.
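A common practical complement is a numerical sanity check (a sketch only, and not a proof): evaluate the Gram matrix of a candidate k on sample points and inspect its eigenvalues. A clearly negative eigenvalue disproves positive definiteness; non-negative eigenvalues on a finite sample merely fail to disprove it.

```python
import numpy as np

def gram_matrix(k, X):
    """Gram matrix K_ij = k(x_i, x_j) for a candidate kernel function k."""
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

def psd_sanity_check(k, X, tol=1e-10):
    """True if no clearly negative eigenvalue is found on this sample of points."""
    K = gram_matrix(k, X)
    eigvals = np.linalg.eigvalsh(K)   # K is symmetric if k is symmetric
    return eigvals.min() >= -tol

# Example: the RBF kernel on random points should pass the check
rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2) / 2.0)
X = np.random.randn(50, 3)
print(psd_sanity_check(rbf, X))
```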
Slide credit: Christoph Lampert
Constructing Kernels
1. We can construct kernels from scratch:
For any φ : X → R^m, k(x, x’) = ⟨φ(x), φ(x’)⟩_{R^m} is a kernel.
Example: φ(x) = (# of red pixels in image x, # of green pixels, # of blue pixels).
Any norm ‖·‖ : V → R that fulfills the parallelogram equation induces a kernel by polarization:
k(x, x’) = ½ ( ‖x‖² + ‖x’‖² − ‖x − x’‖² )
Example: X = time series with bounded values.
Slide credit: Christoph Lampert
Constructing Kernels (2)
1. We can construct kernels from scratch:
If d : X × X → R is conditionally positive definite, i.e.
∑ᵢ ∑ⱼ tᵢ tⱼ d(xᵢ, xⱼ) ≤ 0 for any t ∈ Rⁿ with ∑ᵢ tᵢ = 0,
for x₁, …, xₙ ∈ X, for any n ∈ N, then
k(x, x’) := exp(−d(x, x’)) is a positive definite kernel.
Example: d(x, x’) = ‖x − x’‖².
Slide credit: Christoph Lampert
Constructing Kernels (3)
2. We can construct kernels from other kernels:

If k is a kernel and α > 0, then αk and k + α are kernels.
If k₁, k₂ are kernels, then k₁ + k₂ and k₁ · k₂ are kernels.
If k is a kernel, then exp(k) is a kernel.
• Examples of kernels for X = R^d:
Any linear combination ∑ⱼ αⱼ kⱼ with αⱼ ≥ 0.
Polynomial kernels k(x, x’) = (1 + ⟨x, x’⟩)^m, m > 0.
Gaussian (a.k.a. RBF) kernel k(x, x’) = exp(−‖x − x’‖² / (2σ²)) with σ > 0.
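The closure rules above can be exercised directly in code. The following sketch (illustrative only, with arbitrary parameter choices) combines an RBF and a polynomial kernel via sums, products, positive scaling, and exponentiation.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def poly(x, y, m=3):
    return (1.0 + np.dot(x, y)) ** m

# Closure rules: sums, products, positive scalings/offsets, and exp of kernels
k_sum   = lambda x, y: rbf(x, y) + poly(x, y)        # k1 + k2
k_prod  = lambda x, y: rbf(x, y) * poly(x, y)        # k1 * k2
k_scale = lambda x, y: 0.5 * rbf(x, y) + 2.0         # alpha*k and k + alpha (alpha > 0)
k_exp   = lambda x, y: np.exp(rbf(x, y))             # exp(k)

x, y = np.random.randn(4), np.random.randn(4)
print(k_sum(x, y), k_prod(x, y), k_scale(x, y), k_exp(x, y))
```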
Slide credit: Christoph Lampert
Constructing Kernels (4)
2. We can construct kernels from other kernels:

If k is a kernel and α > 0, then αk and k + α are kernels.
If k₁, k₂ are kernels, then k₁ + k₂ and k₁ · k₂ are kernels.
If k is a kernel, then exp(k) is a kernel.
• Examples of kernels for other X:
k(h, h’) = ∑ᵢ min(hᵢ, h’ᵢ) for n-bin histograms h, h’ (the histogram intersection kernel).
k(p, p’) = exp(−KL(p, p’)) with KL the symmetrized KL divergence between positive probability distributions.
k(s, s’) = exp(−D(s, s’)) for strings s, s’ and D = edit distance.
• Not an example: tanh(a⟨x, x’⟩ + b) is not positive definite!
Slide credit: Christoph Lampert
Topics of This Lecture
• Kernels
Recap: Kernel trick
Constructing kernels
• Gaussian Processes


Recap: Definition, Prediction, GP Regression
Influence of hyperparameters
• Learning Gaussian Processes


Bayesian Model Selection
Model selection for Gaussian Processes
• Gaussian Processes for Classification


Linear models for classification
Gaussian Process classification
• Applications
Recap: Gaussian Process
• Gaussian distribution
Probability distribution over scalars / vectors.
• Gaussian process (generalization of Gaussian distrib.)

Describes properties of functions.
Function: Think of a function as a long vector where each entry
specifies the function value f(xi) at a particular point xi.

Issue: How do we deal with an infinite number of points?
– If you ask only for the properties of the function at a finite number of points…
– …then inference in the Gaussian process gives you the same answer if you ignore the infinitely many other points.
• Definition

A Gaussian process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution.
Slide credit: Bernt Schiele
Recap: Gaussian Process
• A Gaussian process is completely defined by
Mean function m(x):
m(x) = E[f(x)]
Covariance function k(x, x’):
k(x, x’) = E[(f(x) − m(x))(f(x’) − m(x’))]
We write the Gaussian process (GP) as
f(x) ∼ GP(m(x), k(x, x’))
Slide adapted from Bernt Schiele
Recap: GPs Define Prior over Functions
• Distribution over functions:
Specification of covariance function implies distribution over
functions.
I.e. we can draw samples from the distribution of functions
evaluated at a (finite) number of points.
Procedure
– We choose a number of input points X*.
– We write the corresponding covariance matrix (e.g. using the SE covariance) element-wise: K(X*, X*).
– Then we generate a random Gaussian vector with this covariance matrix:
f* ∼ N(0, K(X*, X*))
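A minimal sketch of this sampling procedure, assuming a squared-exponential (SE) covariance; the small diagonal jitter is a numerical stabilization, not part of the model.

```python
import numpy as np

def se_cov(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance K(X1, X2) for 1D inputs."""
    sq = (X1[:, None] - X2[None, :]) ** 2
    return signal_var * np.exp(-sq / (2 * length_scale ** 2))

# 1) Choose input points X*
X_star = np.linspace(-5, 5, 100)
# 2) Build the covariance matrix K(X*, X*)
K = se_cov(X_star, X_star)
# 3) Draw f* ~ N(0, K(X*, X*)); jitter keeps K numerically positive semi-definite
jitter = 1e-8 * np.eye(len(X_star))
samples = np.random.multivariate_normal(np.zeros(len(X_star)), K + jitter, size=3)
# 'samples' now holds 3 functions drawn from the GP prior, evaluated at X_star
```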
Slide credit: Bernt Schiele
Example of 3 sampled functions.
Image source: Rasmussen & Williams, 2006
Recap: Prediction with Noise-free Observations
• Assume our observations are noise-free:

Joint distribution of the training outputs f and test outputs f*
according to the prior:
[ f ; f* ] ∼ N( 0, [ K(X, X)  K(X, X*) ; K(X*, X)  K(X*, X*) ] )
Calculation of the posterior corresponds to conditioning the joint Gaussian prior distribution on the observations:
f̄* = E[f* | X, X*, f] = K(X*, X) K(X, X)⁻¹ f
with:
cov(f*) = K(X*, X*) − K(X*, X) K(X, X)⁻¹ K(X, X*)
Slide adapted from Bernt Schiele
Recap: Prediction with Noisy Observations
• Joint distribution of the observed values and the test locations under the prior:
[ t ; f* ] ∼ N( 0, [ K(X, X) + σ_n²I  K(X, X*) ; K(X*, X)  K(X*, X*) ] )
Calculation of the posterior corresponds to conditioning the joint Gaussian prior distribution on the observations:
f̄* = E[f* | X, X*, t] = K(X*, X) [K(X, X) + σ_n²I]⁻¹ t
with:
cov(f*) = K(X*, X*) − K(X*, X) [K(X, X) + σ_n²I]⁻¹ K(X, X*)
⇒ This is the key result that defines Gaussian process regression!
– The predictive distribution is a Gaussian whose mean and variance depend on the test points X* and on the kernel k(x, x’) evaluated on the training points X.
Slide adapted from Bernt Schiele
GP Regression Algorithm
• Very simple algorithm
Based on the following equations (matrix inversion replaced by a Cholesky factorization); see the sketch below.
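The algorithm itself (Algorithm 2.1 in Rasmussen & Williams, 2006) appears as a figure in the original slide; below is a hedged NumPy/SciPy sketch of the same computation, in which a Cholesky factorization of K + σ_n²I replaces the explicit matrix inverse and also yields the log marginal likelihood.

```python
import numpy as np
from scipy.linalg import cholesky, cho_solve

def gp_regression(K, k_star, k_star_star, y, noise_var):
    """GP prediction for one test point via Cholesky factorization.

    K           : (n, n) covariance of the training inputs
    k_star      : (n,)   covariances between training inputs and the test input
    k_star_star : scalar prior variance at the test input
    """
    n = len(y)
    L = cholesky(K + noise_var * np.eye(n), lower=True)   # K + sigma_n^2 I = L L^T
    alpha = cho_solve((L, True), y)                        # alpha = (K + sigma_n^2 I)^-1 y
    mean = k_star @ alpha                                  # predictive mean
    v = np.linalg.solve(L, k_star)
    var = k_star_star - v @ v                              # predictive variance
    log_marg_lik = (-0.5 * y @ alpha
                    - np.log(np.diag(L)).sum()
                    - 0.5 * n * np.log(2 * np.pi))
    return mean, var, log_marg_lik
```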
Image source: Rasmussen & Williams, 2006
Recap: Computational Complexity
• Complexity of GP model

Training effort: O(N³) through matrix inversion
Test effort: O(N²) through vector–matrix multiplication
• Complexity of basis function model
Training effort: O(M³)
Test effort: O(M²)
• Discussion



If the number of basis functions M is smaller than the number of
data points N, then the basis function model is more efficient.
However, advantage of GP viewpoint is that we can consider
covariance functions that can only be expressed by an infinite
number of basis functions.
Still, exact GP methods become infeasible for large training sets.
Influence of Hyperparameters
• Most covariance functions have some free parameters.

Example (squared exponential covariance with added observation noise):
k(x_p, x_q) = σ_f² exp( −(x_p − x_q)² / (2ℓ²) ) + σ_n² δ_pq
Parameters:
– Signal variance: σ_f²
– Range of neighbor influence (called “length scale”): ℓ
– Observation noise: σ_n²
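As a small illustration (a sketch, not the lecture's code), the covariance above with its hyperparameters made explicit, so that settings such as those on the next slide can be plugged in directly.

```python
import numpy as np

def covariance(xp, xq, length_scale, signal_var, noise_var):
    """SE covariance with additive observation noise on the diagonal (delta term)."""
    k = signal_var * np.exp(-(xp - xq) ** 2 / (2 * length_scale ** 2))
    return k + (noise_var if xp == xq else 0.0)

# e.g. the three settings from the following slide: (l, sigma_f, sigma_n)
for l, sf, sn in [(0.3, 1.08, 0.00005), (1.0, 1.0, 0.1), (3.0, 1.16, 0.89)]:
    print(l, covariance(0.0, 0.5, l, sf ** 2, sn ** 2))
```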
Slide credit: Bernt Schiele
Influence of Hyperparameters
• Examples for different settings of the length scale (the σ parameters are set by optimizing the marginal likelihood):
(ℓ, σ_f, σ_n) = (0.3, 1.08, 0.00005),  (1, 1, 0.1),  (3.0, 1.16, 0.89)
Slide credit: Bernt Schiele
Image source: Rasmussen & Williams, 2006
Topics of This Lecture
• Kernels
Recap: Kernel trick
Constructing kernels
• Gaussian Processes


Recap: Definition, Prediction, GP Regression
Influence of hyperparameters
• Learning Gaussian Processes


Bayesian Model Selection
Model selection for Gaussian Processes
• Gaussian Processes for Classification


Linear models for classification
Gaussian Process classification
• Applications
Learning Kernel Parameters
• Can we determine the length scale and noise levels from training data?
Slide credit: Bernt Schiele
Bayesian Model Selection
• Goal
Determine/learn different parameters of Gaussian Processes
• Hierarchy of parameters



Lowest level
– w, e.g. the parameters of a linear model.
Mid-level (hyperparameters)
– θ, e.g. controlling the prior distribution of w.
Top level
– Typically a discrete set of model structures Hᵢ.
• Approach

Inference takes place one level at a time.
Slide credit: Bernt Schiele
Model Selection at Lowest Level
• The posterior of the parameters w is given by Bayes’ rule:
p(w | t, X, θ, Hᵢ) = p(t | X, w, Hᵢ) p(w | θ, Hᵢ) / p(t | X, θ, Hᵢ)
• with
– p(t | X, w, Hᵢ) the likelihood and
– p(w | θ, Hᵢ) the prior over the parameters w.
– The denominator (normalizing constant) p(t | X, θ, Hᵢ) is independent of the parameters and is called the marginal likelihood.
Slide credit: Bernt Schiele
Model Selection at Mid Level
• The posterior of the hyperparameters θ is again given by Bayes’ rule:
p(θ | t, X, Hᵢ) = p(t | X, θ, Hᵢ) p(θ | Hᵢ) / p(t | X, Hᵢ)
• where
– the marginal likelihood of the previous level, p(t | X, θ, Hᵢ), plays the role of the likelihood of this level,
– p(θ | Hᵢ) is the hyperprior (the prior over the hyperparameters),
– and the denominator (normalizing constant) is given by
p(t | X, Hᵢ) = ∫ p(t | X, θ, Hᵢ) p(θ | Hᵢ) dθ
Slide credit: Bernt Schiele
Model Selection at Top Level
• At the top level, we calculate the posterior of the model:
p(Hᵢ | t, X) = p(t | X, Hᵢ) p(Hᵢ) / p(t | X)
• where
– again, the denominator of the previous level, p(t | X, Hᵢ), plays the role of the likelihood,
– p(Hᵢ) is the prior over the model structure,
– and the denominator (normalizing constant) is given by
p(t | X) = ∑ᵢ p(t | X, Hᵢ) p(Hᵢ)
Slide credit: Bernt Schiele
Bayesian Model Selection
• Discussion
The marginal likelihood is the main difference to non-Bayesian methods.
It automatically incorporates a trade-off between the model fit and the model complexity:
– A simple model can only account for a limited range of possible sets of target values; if a simple model fits the data well, it obtains a high posterior.
– A complex model can account for a large range of possible sets of target values; therefore, it can never attain a very high posterior.
Slide credit: Bernt Schiele
Bayesian Model Selection
• Computational issues
Requires the evaluation of several integrals, which may or may
not be analytically tractable, depending on details of the
models.
In general, one may have to resort to analytic approximations or
MCMC methods. (Lecture 7)
• Model selection for GP regression

GP regression models with Gaussian noise are an (important)
exception:
– Integrals over the parameters are analytically tractable and
– At the same time, the models are flexible.
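To make this tractability concrete, here is a hedged sketch (illustrative only, not the lecture's code) of the closed-form log marginal likelihood of a GP regression model with an SE kernel and Gaussian noise, optimized over the hyperparameters with a generic optimizer; the toy data and starting values are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    """-log p(y | X, theta) for an SE kernel; params are log(l), log(sigma_f), log(sigma_n)."""
    l, sf, sn = np.exp(log_params)
    sq = (X[:, None] - X[None, :]) ** 2
    K = sf ** 2 * np.exp(-sq / (2 * l ** 2)) + sn ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # (K)^-1 y via Cholesky
    return (0.5 * y @ alpha
            + np.log(np.diag(L)).sum()
            + 0.5 * len(X) * np.log(2 * np.pi))

# Toy data and hyperparameter optimization
X = np.linspace(-3, 3, 30)
y = np.sin(X) + 0.1 * np.random.randn(30)
res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0, 0.1]), args=(X, y))
print("optimized (l, sigma_f, sigma_n):", np.exp(res.x))
```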
Slide credit: Bernt Schiele
Example (a series of figure-only slides)
Slide credit: Bernt Schiele
Topics of This Lecture
• Kernels
Recap: Kernel trick
Constructing kernels
• Gaussian Processes


Recap: Definition, Prediction, GP Regression
Influence of hyperparameters
• Learning Gaussian Processes


Bayesian Model Selection
Model selection for Gaussian Processes
• Gaussian Processes for Classification


Linear models for classification
Gaussian Process classification
• Applications
Classification
• Classic view of classification
Prediction: we want to assign an input pattern x to one of C
classes.
• Probabilistic classification


Predictions take the form of class probabilities.
More general than class assignments
– A class assignment is obtained by solving a decision problem that involves the predictive probabilities as well as the costs of making correct/wrong decisions.
• Relation to regression


Both regression and classification can be seen as function
approximation.
Solution of classification problems using Gaussian processes is
(unfortunately) more demanding…
Slide credit: Bernt Schiele
Classification Problem
• Setting

Given input patterns x

And corresponding class labels y = C₁, …, C_C

We are interested in p(y|x)
• Goal

Calculate the posterior probabilities for each class using
p(y|x) = p(x|y) p(y) / ∑_{c=1}^{C} p(x|C_c) p(C_c)
Generative approach
– Learn a model for p(x|y) (learning a model for p(y) is often simple).
Discriminative approach
– Directly learn a model for p(y|x).
Slide credit: Bernt Schiele
Classification Problem
• Problem when applying Gaussian processes
The goal is to model the posterior probabilities of the target variable, which must lie in the interval (0, 1).
The GP, however, makes predictions that lie on the whole real axis (−∞, ∞).
• Solution

Adapt Gaussian processes by transforming the output using an appropriate nonlinear activation function, e.g. a sigmoid σ(f(x)).
Classification Problem
• Discriminative approaches for the binary case (2 classes)
Linear logistic regression model
– Combines the linear model with a logistic response function:
p(C₁|x) = λ(xᵀw),   λ(z) = 1 / (1 + exp(−z))
Linear probit regression model
– Combines the linear model with the probit response function (the cumulative distribution function of the standard normal distribution):
p(C₁|x) = Φ(xᵀw),   Φ(z) = ∫_{−∞}^{z} N(x | 0, 1) dx
• Note:

The Gaussian process classifiers developed in the following are
discriminative.
Slide credit: Bernt Schiele
Linear Models for Classification
• Setting

Binary classification: y = +1 and y = −1.
The likelihood is
p(y = +1 | x, w) = σ(xᵀw),   p(y = −1 | x, w) = 1 − σ(xᵀw)
– For linear logistic regression: σ(z) = λ(z)
– For linear probit regression: σ(z) = Φ(z)
Given a data point (xᵢ, yᵢ) and noting that σ(−z) = 1 − σ(z), the likelihood can be written in compact form:
p(yᵢ | xᵢ, w) = σ(yᵢ xᵢᵀ w)
Slide credit: Bernt Schiele
Linear Models for Classification
• Learning
Given a data set D = {(xᵢ, yᵢ) | i = 1, …, n} and assuming the data is i.i.d.
• Maximum likelihood (ML):
Maximize the data log-likelihood given by
log p(y | X, w) = ∑ᵢ₌₁ⁿ log σ(yᵢ xᵢᵀ w)
• Maximum a posteriori (MAP):
Assume a Gaussian prior over the weights, w ∼ N(0, Σₚ).
Maximize p(w | X, y) ∝ p(y | X, w) p(w), i.e.
log p(w | X, y) ∝ ∑ᵢ₌₁ⁿ log σ(yᵢ xᵢᵀ w) − ½ wᵀ Σₚ⁻¹ w
Slide credit: Bernt Schiele
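A minimal gradient-ascent sketch of this MAP objective (illustrative only; the learning rate, iteration count, prior covariance, and toy data are arbitrary choices). It uses the fact that d/dw log σ(y xᵀw) = (1 − σ(y xᵀw)) y x.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_logistic(X, y, Sigma_p, lr=0.1, n_iter=500):
    """Gradient ascent on sum_i log sigmoid(y_i x_i^T w) - 0.5 w^T Sigma_p^-1 w."""
    n, d = X.shape
    w = np.zeros(d)
    Sigma_inv = np.linalg.inv(Sigma_p)
    for _ in range(n_iter):
        s = sigmoid(y * (X @ w))                 # sigma(y_i x_i^T w)
        grad = ((1.0 - s) * y) @ X - Sigma_inv @ w
        w += lr * grad
    return w

# Toy usage: 2D inputs, labels in {-1, +1}, unit Gaussian prior over weights
X = np.random.randn(100, 2)
y = np.sign(X @ np.array([1.5, -2.0]) + 0.1 * np.random.randn(100))
w_map = map_logistic(X, y, Sigma_p=np.eye(2))
print(w_map)
```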
Linear Models for Classification
• Example of MAP-solution
2D input space, 2D weight space (no offset).
Prior p(w) = N(0, I) and posterior p(w | X, y).
Note: the posterior is uni-modal but non-Gaussian.
Slide credit: Bernt Schiele
Image source: Rasmussen & Williams, 2006
Linear Models for Classification
• Predictions
The prediction based on the training set D for a test point x* is given by
p(y* | x*, X, y) = ∫ p(y* = +1 | w, x*) p(w | X, y) dw
This leads to contours of the predictive distribution.
Slide credit: Bernt Schiele
Image source: Rasmussen & Williams, 2006
Gaussian Process Classification
• Basic idea (for the binary case)
Place a GP prior over the latent function f(x).
“Squash” this function through the logistic function to obtain a prior on
π(x) := p(y = +1 | x) = σ(f(x))
Note
– π is a deterministic function of f. Since f is stochastic, so is π.
Example
– f(x) drawn from a GP prior and the resulting π(x) = λ(f(x)).
Slide credit: Bernt Schiele
Image source: Rasmussen & Williams, 2006
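A small sketch of the squashing idea (assuming an SE covariance and the logistic function; purely illustrative): draw f from the GP prior at a grid of inputs and map it through λ to obtain one prior sample of π(x).

```python
import numpy as np

def se_cov(X1, X2, l=1.0):
    return np.exp(-(X1[:, None] - X2[None, :]) ** 2 / (2 * l ** 2))

logistic = lambda f: 1.0 / (1.0 + np.exp(-f))

x = np.linspace(-5, 5, 200)
K = se_cov(x, x) + 1e-8 * np.eye(len(x))                  # jitter for numerical stability
f = np.random.multivariate_normal(np.zeros(len(x)), K)    # f(x) drawn from the GP prior
pi = logistic(f)                                           # pi(x) = lambda(f(x)) lies in (0, 1)
```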
Latent Function f
• Function f plays the role of a nuisance function

We do not observe values of f itself.

And we are not particularly interested in the values of f.

Rather, we are interested in the values of π(x), and
specifically for test cases: π(x*).
• Purpose of f:


To allow convenient formulation of the model.
Our computational goal is to eliminate f (by means of integration
over f).
Slide credit: Bernt Schiele
Inference and Prediction
• Natural division into 2 steps
1. Computing the distribution of the latent variable corresponding to the test case:
p(f* | X, y, x*) = ∫ p(f* | X, x*, f) p(f | X, y) df
– where the posterior over the latent variables is given by
p(f | X, y) = p(y | f) p(f | X) / p(y | X)
2. Using the distribution over the latent f* to produce the probabilistic prediction:
π̄* := p(y* = +1 | X, y, x*) = ∫ σ(f*) p(f* | X, y, x*) df*
Slide credit: Bernt Schiele
Inference and Prediction
• For regression
Computation of predictions was easy, as the relevant integrals
were Gaussian and could be computed analytically.
• For classification


Non-Gaussian likelihood makes the integral analytically
intractable.
Approximations of the integrals
– e.g. based on Monte Carlo sampling (see the sketch below)
– e.g. Laplace approximation method
(see Rasmussen & Williams, Chapter 3.4)
– …
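As an illustration of the Monte Carlo route mentioned above (with hypothetical inputs: the Gaussian used for the f* samples stands in for whatever approximate posterior is available), the predictive probability is simply the average of σ(f*) over samples of f*.

```python
import numpy as np

def predictive_prob_mc(f_star_samples):
    """Monte Carlo estimate of pi_* = E[sigma(f_*)] from samples of f_*."""
    sigma = 1.0 / (1.0 + np.exp(-f_star_samples))
    return sigma.mean()

# Hypothetical example: samples from a Gaussian approximation to p(f_* | X, y, x_*)
f_star_samples = np.random.normal(loc=0.8, scale=0.5, size=10000)
print(predictive_prob_mc(f_star_samples))
```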
Slide credit: Bernt Schiele
Topics of This Lecture
• Kernels
Recap: Kernel trick
Constructing kernels
• Gaussian Processes


Recap: Definition, Prediction, GP Regression
Influence of hyperparameters
• Learning Gaussian Processes


Bayesian Model Selection
Model selection for Gaussian Processes
• Gaussian Processes for Classification


Linear models for classification
Gaussian Process classification
• Applications
Application: Non-Linear Dimensionality Reduction
Slide credit: Andreas Geiger
Gaussian Process Latent Variable Model
• At each time step t, we express our observations y as a combination of basis functions ψ of latent variables x:
y_t = ∑ⱼ bⱼ ψⱼ(x_t) + δ_t
• This is modeled as a Gaussian process…
Slide credit: Andreas Geiger
Example: Style-based Inverse Kinematics
Learned GPLVMs using a walk, a jump shot and a baseball pitch
Slide credit: Andreas Geiger
Application: Modeling Body Dynamics
• Task: estimate full body pose in m video frames.
High-dimensional Y*.
Model body dynamics using a hierarchical Gaussian process latent variable model (hGPLVM) [Lawrence & Moore, ICML 2007]:
Time (frame #): T = [tᵢ ∈ R],  with p(Z | T, θ̂) = ∏ᵢ₌₁^q N(Z_{:,i} | 0, K_T)
Latent space: Z = [zᵢ ∈ R^q],  with p(Y | Z, θ) = ∏ᵢ₌₁^D N(Y_{:,i} | 0, K_Z)
Configuration: Y = [yᵢ ∈ R^D]
Slide credit: Bernt Schiele
[Andriluka, Roth, Schiele, CVPR’08]
Application: Mapping b/w Pose and Appearance
• Appearance prediction
Regression problem
⇒ High-dimensional data on both sides
⇒ Low-dim. representation needed for learning!
3D joint locations (60-dim.) ↔ segmented image (~2500-dim.)
• Training with Motion-capture data possible


Synthesized silhouettes for training
Background subtraction for test
[Jaeggli, Koller-Meier, Van Gool, ACCV’07]
Learning a Generative Mapping
Body pose: X (high dim.) → [learn LLE dim. red. / reconstruct pose] → x: Body Pose (low dim.), with a dynamic prior.
Appearance: Y: Image (high dim.) → [PCA projection] → y: Appearance Descriptor (low dim.).
Generative mapping from x to y (likelihood).
[Jaeggli, Koller-Meier, Van Gool, ACCV’07]
Experimental Results
• Difficulties
Changing viewpoints
Low resolution (50 px)
Compression artifacts
Disturbing objects
Original video
[Jaeggli, Koller-Meier, Van Gool, ACCV’07]
Articulated Motion in Latent Space
(different work)
• Gaussian process regression from the latent space to
– Pose [p(Pose | z), to recover the original pose from the latent space]
– Silhouette [p(Silhouette | z), to do inference on silhouettes]
Walking cycles have one
main (periodic) DOF
Additional DOF encodes
„walking style“
[Gammeter, Ess, Leibe, Schindler, Van Gool, ECCV’08]
Results
454 frames (~35 sec)
23 Pedestrians
20 detected by multi-body tracker
[Gammeter, Ess, Leibe, Schindler, Van Gool, ECCV’08]
References and Further Reading
• Kernels and Gaussian Processes are (briefly) described in Chapters 6.1 and 6.4 of Bishop’s book.
Christopher M. Bishop
Pattern Recognition and Machine Learning
Springer, 2006
Carl E. Rasmussen, Christopher K.I. Williams
Gaussian Processes for Machine Learning
MIT Press, 2006
• A better introduction can be found in Chapters 3 and 5
of the book by Rasmussen & Williams (also available
online: http://www.gaussianprocess.org/gpml/)