
Improved Initialisation and Gaussian
Mixture Pairwise Terms for Dense
Random Fields with Mean-field Inference
Vibhav Vineet, Jonathan Warrell,
Paul Sturgess, Philip H.S. Torr
http://cms.brookes.ac.uk/research/visiongroup/
Labelling Problem
Assign a label to each image pixel
Object segmentation
Stereo
Object detection
Problem Formulation
Find a labelling that maximizes the conditional
probability or minimizes the energy function
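In the standard notation (a sketch; the slide's own equation is not preserved in the transcript), this reads:

x^* = \arg\max_x P(x \mid D) = \arg\min_x E(x), \qquad E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)

where \psi_u are the unary potentials and \psi_p the pairwise potentials.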
Problem Formulation
Grid CRF leads to over-smoothing around boundaries
Dense CRF is able to recover fine boundaries
[Figure: inference results for a grid CRF vs a dense CRF]
Inference in Dense CRF
Very high time complexity: graph-cut based methods are not feasible
Alpha-expansion takes almost 1200 seconds per image with a neighbourhood size of 15 on the PascalVOC segmentation dataset
Inference in Dense CRF
The filter-based mean-field inference method takes 0.2 seconds per image*
Efficient inference under two assumptions:
Mean-field approximation to the CRF
Pairwise terms take the form of Gaussian kernels
*Krähenbühl et al., Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011
Efficient inference in dense CRF
• Mean-field methods (Jordan et al., 1999)
• Exact inference with the distribution P is intractable
• Approximate P with a distribution Q from a tractable family by minimising the KL divergence KL(Q || P)
Naïve mean field
Assume all variables are independent
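That is, the approximating distribution factorises over the variables:

Q(x) = \prod_i Q_i(x_i)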
Efficient inference in dense CRF
Assume Gaussian pairwise weights
Mixture of Gaussian kernels: bilateral (appearance and position) and spatial (position only)
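For reference, the kernel form from the cited NIPS 2011 paper is a weighted sum of a bilateral and a spatial Gaussian:

k(f_i, f_j) = w^{(1)} \exp\left(-\frac{|p_i - p_j|^2}{2\theta_\alpha^2} - \frac{|I_i - I_j|^2}{2\theta_\beta^2}\right) + w^{(2)} \exp\left(-\frac{|p_i - p_j|^2}{2\theta_\gamma^2}\right)

where p_i are pixel positions and I_i colour vectors.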
Marginal update
• The marginal update involves an expectation of the cost over the distribution Q, given that x_i takes label l
• The expensive message-passing step is solved using a highly efficient permutohedral-lattice based filtering approach
• Maximum posterior marginal (MPM) with the approximate distribution:
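The update being described, following Krähenbühl et al., is

Q_i(x_i = l) = \frac{1}{Z_i} \exp\left(-\psi_u(x_i = l) - \sum_{l'} \mu(l, l') \sum_{j \neq i} k(f_i, f_j)\, Q_j(x_j = l')\right)

with the MPM labelling x_i^* = \arg\max_l Q_i(x_i = l). As a rough illustration of the loop, here is a minimal Python sketch; the spatial Gaussian blur stands in for the permutohedral-lattice bilateral filtering, and a Potts compatibility \mu is assumed:

import numpy as np
from scipy.ndimage import gaussian_filter

def mean_field(unary, n_iters=10, sigma=3.0, w=1.0):
    """Illustrative dense-CRF mean-field loop. unary: (H, W, L) costs.
    The spatial blur below only approximates the real bilateral filtering."""
    q = np.exp(-unary)
    q /= q.sum(axis=2, keepdims=True)      # initialise Q from the unaries
    L = unary.shape[2]
    for _ in range(n_iters):
        # Message passing: filter each label's marginals.
        msg = np.stack([gaussian_filter(q[..., l], sigma) for l in range(L)], axis=2)
        msg -= q                           # remove self-contribution (approximate)
        # Potts compatibility transform: penalty from all disagreeing labels.
        pairwise = w * (msg.sum(axis=2, keepdims=True) - msg)
        q = np.exp(-unary - pairwise)      # local update
        q /= q.sum(axis=2, keepdims=True)  # normalise
    return q.argmax(axis=2)                # MPM labelling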
Q distribution
Q distributions for different classes across different iterations (0, 1, 2, and 10) on the CamVid dataset
[Figure: per-class Q marginals at iterations 0, 1, 2, and 10; colour scale 0 to 1]
Two issues associated with the method
• Sensitivity to initialisation
• Restrictive Gaussian pairwise weights
Our Contributions
Resolve two issues associated with the method
• Sensitivity to initialisation: propose a SIFT-flow based initialisation method
• Restrictive Gaussian pairwise weights: an expectation-maximisation (EM) based strategy to learn a more general Gaussian mixture model
Sensitivity to initialisation
Experiment on the PascalVOC-10 segmentation dataset

Initialisation           Mean-field   Alpha-expansion
Unary potential          28.52%       27.88%
Ground-truth labelling   41%          27.88%

We observe an improvement of almost 13% in I/U score when initialising the mean-field inference with the ground-truth labelling
• Good initialisation can lead to a better solution
Propose a better SIFT-flow based initialisation method
SIFT-flow based correspondence
Given a test image, we first retrieve a set of nearest neighbours from the training set using GIST features
[Figure: test image and the nearest neighbours retrieved from the training set]
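A minimal sketch of this retrieval step, assuming the GIST descriptors have already been extracted and stacked row-wise into a NumPy array (descriptor extraction itself is not shown):

import numpy as np

def retrieve_neighbours(test_gist, train_gists, k=5):
    # L2 distances between the test descriptor and all training descriptors.
    d = np.linalg.norm(train_gists - test_gist[None, :], axis=1)
    # Indices of the k nearest training images.
    return np.argsort(d)[:k]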
SIFT-flow based correspondence
K nearest neighbours warped to the test image
[Figure: test image, warped nearest neighbours, and their corresponding flow energies: 13.31, 14.31, 18.38, 23.31, 27.2, 30.87]
SIFT-flow based correspondence
Pick the best nearest neighbour based on the flow value
[Figure: test image, best nearest neighbour, and warped image; flow: 13.31]
Label transfer
Warp the ground truth according to the correspondence
Transfer labels from the top-1 neighbour using the flow
[Figure: ground truth of the test image, ground truth of the best nearest neighbour, the flow, and the ground truth warped according to the flow]
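A minimal NumPy sketch of this warping step; the (dy, dx) flow convention, nearest-neighbour rounding, and border clipping are assumptions of the sketch:

import numpy as np

def warp_labels(neighbour_gt, flow):
    # neighbour_gt: (H, W) label map of the best nearest neighbour.
    # flow: (H, W, 2) dense correspondence offsets from SIFT flow.
    H, W = neighbour_gt.shape
    ys, xs = np.mgrid[0:H, 0:W]
    yt = np.clip(ys + np.rint(flow[..., 0]).astype(int), 0, H - 1)
    xt = np.clip(xs + np.rint(flow[..., 1]).astype(int), 0, W - 1)
    return neighbour_gt[yt, xt]   # transferred label map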
SIFT-flow based initialisation
Rescore the unary potential
The parameter s rescores the unary potential of a variable based on the label observed after the label-transfer stage; s is set through cross-validation
[Figure: test image, ground truth, output without rescoring, output after rescoring]
Qualitative improvement in accuracy when using the rescored unary potentials
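The exact rescoring rule is not preserved in the transcript; the sketch below assumes one plausible form, where the cost of the transferred label at each pixel is scaled by a factor s < 1 (set through cross-validation), strengthening that label's unary support:

import numpy as np

def rescore_unary(unary, transferred, s=0.5):
    # unary: (H, W, L) unary costs; transferred: (H, W) labels from label transfer.
    H, W, _ = unary.shape
    out = unary.copy()
    ys, xs = np.mgrid[0:H, 0:W]
    out[ys, xs, transferred] *= s   # assumed rule: scale the transferred label's cost
    return out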
SIFT-flow based initialisation
Initialise the mean-field solution
[Figure: test image, ground truth, output without initialisation, output with initialisation]
Qualitative improvement in accuracy after initialising the mean-field inference
Gaussian pairwise weights
Experiment on the PascalVOC-10 segmentation dataset
We plotted the distribution of class-class interactions by selecting random pairs of points (i, j)
[Figure: scatter plots of pairwise point offsets for Aeroplane-Aeroplane, Horse-Person, and Car-Person; axes span -500 to 500]
Gaussian pairwise weights
Experiment on the PascalVOC-10 segmentation dataset
Such complex structure in the data cannot be captured by a zero-mean Gaussian
[Figure: the offsets are distributed horizontally for some class pairs, vertically for others, and are not centred around a zero mean]
Propose an EM-based learning strategy to incorporate a more general class of Gaussian mixture models
Our model
Our energy function takes the following form:
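The equation itself is not preserved in the transcript; a plausible reconstruction, assuming label-pair-specific weights w^{(m)}_{x_i, x_j} over M shared Gaussian components with means \mu_m and covariances \Sigma_m, is:

E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \sum_{m=1}^{M} w^{(m)}_{x_i, x_j} \exp\left(-\tfrac{1}{2}\,(f_i - f_j - \mu_m)^\top \Sigma_m^{-1} (f_i - f_j - \mu_m)\right)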
We use separate weights for each label pair, but the Gaussian components are shared across pairs
We follow a piecewise learning strategy to learn the parameters of our energy function
Learning mixture model
• Learn the parameters in a manner similar to this model*
• Learn the parameters of the Gaussian mixture: means, standard deviations, and mixing coefficients
• Lambda is set through cross-validation
*Krähenbühl et al., Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011
Our model
• We follow a generative training approach
Maximise the joint likelihood of label pairs and features, with a latent variable giving the cluster (component) assignment
We use an expectation-maximisation (EM) based method to maximise the likelihood function
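As a rough illustration of this EM step, the sketch below fits a full-covariance Gaussian mixture to 2-D point offsets for a single class pair using scikit-learn, whose EM-based GaussianMixture stands in for the paper's procedure; note that in the actual model the components are shared across label pairs, with only the mixing coefficients being pair-specific:

import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in data: offsets (p_i - p_j) from random pixel pairs of one class pair;
# in practice these come from the training set.
rng = np.random.default_rng(0)
offsets = rng.normal(loc=(80.0, 0.0), scale=150.0, size=(5000, 2))

# EM fit of an M-component Gaussian mixture (here M = 3).
gmm = GaussianMixture(n_components=3, covariance_type="full").fit(offsets)
print(gmm.weights_)   # mixing coefficients
print(gmm.means_)     # component means -- need not be zero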
Learning mixture model
Our model is able to capture the true distribution of the class-class interactions
[Figure: learned Gaussian mixtures overlaid on the pairwise offset distributions for Aeroplane-Aeroplane, Horse-Person, and Car-Person]
Inference with mixture model
• Involves evaluating M extra Gaussian terms
• Blurring is performed on mean-shifted points
• This increases the time complexity
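Concretely, for each component m the message-passing step takes the form (a sketch of what the slide implies):

\tilde{Q}^{(m)}_i(l) = \sum_{j \neq i} \mathcal{N}(f_i - f_j;\, \mu_m, \Sigma_m)\, Q_j(l)

which is evaluated by shifting the features by \mu_m and running one Gaussian blurring pass per component, hence M filtering passes per iteration instead of one.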
Experiments on CamVid
Q distribution for the building class on the CamVid dataset at iterations 0, 1, 2, and 10
[Figure: ground truth, output without initialisation, output with initialisation; colour scale 0 to 1]
The confidence of building pixels increases with initialisation
Experiments on CamVid
[Figure: image, ground truth, output without initialisation, output with initialisation]
The building is properly recovered with our initialisation strategy
Experiments on CamVid
Quantitative results on the CamVid dataset

Algorithm        Time (s)   Overall (% corr)   Av. Recall   Av. I/U
Alpha-exp        0.96       78.84              58.64        43.89
APST (U+P+H)     1.6        85.18              60.06        50.62
Dense CRF        0.2        79.96              59.29        45.18
Ours (U+P+I)     0.35       85.31              59.75        50.56
• Our model with only unary and pairwise terms achieves better accuracy than other, more complex models
• It is generally much more efficient than the other methods
Experiments on CamVid
Qualitative results on the CamVid dataset
[Figure: image, ground truth, alpha-expansion output, our output]
Able to recover the building and trees properly
Experiments on PascalVOC-10
Qualitative results of the SIFT-flow method
[Figure: image, ground truth, warped nearest-neighbour ground truth, output without SIFT-flow, output with SIFT-flow]
Able to recover missing body parts
Experiments on PascalVOC-10
Quantitative results on the PascalVOC-10 segmentation dataset

Algorithm          Time (s)   Overall (% corr)   Av. Recall   Av. I/U
Alpha-exp          3.0        79.52              36.08        27.88
AHCRF+Cooc         36         81.43              38.01        30.9
Dense CRF          0.67       71.63              34.53        28.4
Ours1 (U+P+GM)     26.7       80.23              36.41        28.73
Ours2 (U+P+I)      0.90       79.65              41.84        30.95
Ours3 (U+P+I+GM)   26.7       78.96              44.05        31.48
• Our model with only unary and pairwise terms achieves better accuracy than other, more complex models
• It generally achieves very high efficiency compared to the other methods
Experiments on PascalVOC-10
Qualitative results on the PascalVOC-10 segmentation dataset
[Figure: image, ground truth, alpha-expansion output, Dense CRF output, our output]
Able to recover missing objects and body parts
Conclusion
• Filter-based mean-field inference promises high
efficiency and accuracy
• Proposed methods to robustify the basic mean-field method
• SIFT-flow based method for better initialisation
• EM-based algorithm for learning a general Gaussian mixture model
• More complex higher-order models can be incorporated into the pairwise model
Thank you 