mf_eccv12_vwt


Transcript mf_eccv12_vwt

Filter-based Mean-Field Inference for
Random Fields with Higher-Order
Terms and Product Label-Spaces
Vibhav Vineet*, Jonathan Warrell*,
Philip H.S. Torr
http://cms.brookes.ac.uk/research/visiongroup/
*Joint first authors
1
Labelling problems
Many vision problems can be expressed
as dense image labelling problems
Object segmentation
Stereo
Optical flow
2
Overview
• Graph cuts have so far proved the method of choice for CRFs
• Recently, message passing methods have started to achieve equal performance with much faster run times
• But only for pairwise CRFs
• Some problems require higher-order information
• Co-occurrence terms
• Product label spaces
• Our contribution is to develop fast message-passing based methods for certain classes of higher-order information
3-6
Importance of co-occurrence terms
Context is an important cue for global scene understanding
Can you identify this object?
Slide courtesy A Torralba
7
Importance of co-occurrence terms
We can identify it as keyboard through scene context
Slide courtesy A Torralba
8
Importance of co-occurrence terms
The keyboard, table and monitor often co-occur together
Shown to improve accuracy recently in Ladický et al (ECCV ’10)
Slide courtesy A Torralba
9
Importance of PN Potts terms
• PN Potts potentials enforce region-consistent labellings
• Detector-based PN potentials are formed by applying GrabCut to a bounding box to create a clique
• Improves over pairwise terms alone
[Figures: set of detections, result without detections, final result]
Slide courtesy L Ladicky
10
Importance of higher order terms
We use higher order information to improve object class
segmentation …
Image
Object labels
11
Importance of higher order terms
… and also to improve joint object and stereo labelling using
product label spaces
Image
Object labels
Disparity labels
12
CRF formulation
Standard CRF energy formulation: a data term plus a smoothness term (pairwise CRF), extended with higher order terms and a co-occurrence term (higher order CRF)
13-14
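The energy expression itself did not survive transcription. As a guide, the higher-order CRF energy the talk builds on (pairwise terms, PN Potts clique terms, and a global co-occurrence cost over the set of used labels Λ(x)) can be written as:

```latex
E(\mathbf{x}) \;=\; \underbrace{\sum_i \psi_i(x_i)}_{\text{data term}}
\;+\; \underbrace{\sum_{i<j} \psi_{ij}(x_i, x_j)}_{\text{smoothness term}}
\;+\; \underbrace{\sum_{c} \psi_c(\mathbf{x}_c)}_{\text{higher order terms}}
\;+\; \underbrace{C\big(\Lambda(\mathbf{x})\big)}_{\text{co-occurrence term}}
```

The symbol names here are conventional choices, not taken from the slides.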
Inference
Standard CRF energy with data, smoothness, higher order and co-occurrence terms
Can be solved using graph-cut based methods
But with the co-occurrence term, inference is ~10 times slower than with pairwise terms only
Relatively fast, but still computationally expensive!
15
Our inference
Standard CRF energy with data, smoothness, higher order and co-occurrence terms
We use a filter-based mean-field inference approach
Much faster due to efficient filtering: our method achieves an almost 10-40x speed-up compared to graph-cut based methods
16
Efficient inference in pairwise CRF
• Krähenbühl et al (NIPS ’11) propose an efficient method for inference in pairwise CRFs under two assumptions:
• Mean-field approximation to the CRF
• Pairwise weights take a linear combination of Gaussian kernels
• They achieve an almost 5x speed-up over graph cuts and also allow dense connectivity
Fully connected (dense) pairwise CRF
Slide courtesy P Krähenbühl
17-18
Mean-field based inference
• Mean-field approximation:
approximate the intractable distribution P with a Q from a tractable family
• Minimize the KL-divergence between Q and P
Slide courtesy S Nowozin
19
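The equations on this slide were lost; in the standard (naive) mean-field setup they take the form of a fully factorized approximating family and a KL objective:

```latex
Q(\mathbf{x}) = \prod_i Q_i(x_i), \qquad
Q^{*} = \operatorname*{arg\,min}_{Q}\; \mathrm{KL}(Q \,\|\, P)
      = \operatorname*{arg\,min}_{Q}\; \sum_{\mathbf{x}} Q(\mathbf{x}) \log \frac{Q(\mathbf{x})}{P(\mathbf{x})}
```

This is the textbook form of the approximation; the slide's own notation may have differed.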
Mean-field based inference
• Mean-field update for pairwise terms
• This can be evaluated using Gaussian convolutions
• We evaluate two approaches for Gaussian convolution
• **Permutohedral lattice based filtering
• ***Domain transform based filtering
**Adams et al. Fast high-dimensional filtering using the permutohedral lattice. Computer Graphics Forum 2010
***Gastal et al. Domain transform for edge-aware image and video processing. ACM TOG 2011
20-22
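As a concrete sketch of this update: the following assumes a Potts label compatibility and uses SciPy's `gaussian_filter` as a simple spatial stand-in for the permutohedral-lattice filtering used in the talk; `meanfield_step` and its parameter names are illustrative, not from the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def meanfield_step(unary, Q, sigma=3.0, w=1.0):
    """One mean-field update for a dense CRF with a Potts compatibility.

    unary, Q: arrays of shape (H, W, L) holding negative log-unaries
    and the current per-pixel marginals. The Gaussian filter below
    stands in for the permutohedral-lattice / domain-transform
    filtering; a real implementation would filter in a joint
    position-colour feature space.
    """
    H, W, L = Q.shape
    # Message passing: filter each label plane with a Gaussian kernel.
    filtered = np.stack(
        [gaussian_filter(Q[:, :, l], sigma) for l in range(L)], axis=-1)
    # Potts compatibility: penalise mass on all *other* labels.
    pairwise = w * (filtered.sum(axis=-1, keepdims=True) - filtered)
    # Local update and normalisation of the marginals.
    Q_new = np.exp(-unary - pairwise)
    return Q_new / Q_new.sum(axis=-1, keepdims=True)
```

Each iteration costs one filtering pass per label, which is what makes the dense model tractable.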
Q distribution
Q distribution for different classes across different iterations
[Bar charts of the Q values (0 to 0.9) shown for iterations 0, 1, 2 and 10]
23-26
Higher order mean-field update
• Marginal update in mean-field
27
Higher order mean-field update
• Marginal update in mean-field
[Illustration: a clique of variables over a label set {1, 2, 3}]
28
Higher order mean-field update
• Marginal update in mean-field
• High time complexity for general higher order terms: O(L^|c|)
We show how these can be solved for PN Potts
and co-occurrence terms efficiently
29
PN Potts example
PN Potts enforces region consistent labellings
Label set consists of 3 labels
Potts
patterns
Clique of 6
variables
Example: Detector potentials
30
Expectation update
Sum across possible states of the clique
Clique takes label l
Clique does not take label l
By rearranging the expectation as above, we reduce
the time complexity from O(L^N) to O(NL)
Can be extended to pattern-based potentials (Komodakis et al CVPR ’09)
31
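The O(NL) rearrangement can be sketched in a few lines. `pn_potts_messages` is a hypothetical helper; `gamma`/`gamma_max` follow the usual P^n Potts parameterisation (cost gamma_l if the whole clique takes label l, gamma_max otherwise), which the slide does not spell out.

```python
import numpy as np

def pn_potts_messages(Q, gamma, gamma_max):
    """Mean-field messages for one P^n Potts clique in O(N*L).

    Q: (N, L) current marginals of the N clique variables.
    gamma: (L,) cost when the whole clique consistently takes label l.
    gamma_max: cost when the clique is inconsistent.

    Returns an (N, L) array: the expected clique cost seen by variable i
    if it takes label l, with the other clique variables marginalised out.
    """
    # prod_except[i, l] = prod_{j != i} Q[j, l], computed as the total
    # product divided by Q[i, l]. (A robust implementation would use
    # prefix/suffix products to avoid dividing by near-zero marginals.)
    total = Q.prod(axis=0, keepdims=True)           # (1, L)
    prod_except = total / np.clip(Q, 1e-20, None)   # (N, L)
    # If the rest of the clique takes l and so does i: cost gamma[l];
    # otherwise the clique is inconsistent and pays gamma_max.
    return gamma * prod_except + gamma_max * (1.0 - prod_except)
```

The key point is that the product over the clique is shared across variables, giving O(NL) instead of enumerating all L^N clique states.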
Global co-occurrence terms
Co-occurrence models which objects belong together
Λ(x) = { aeroplane, tree, flower, building, boat, grass, sky }
Λ(x) = { building, tree, grass, sky }
32
Global co-occurrence terms
Associates a cost with each possible label subset
[Illustration: an example label subset]
We use a second-order assumption for the cost function
33-34
Our model
We define a cost over a set of latent variables Y over the labels {1…L}
Each latent variable represents a label, with binary states: on / off
Costs include a unary and a pairwise cost
Each latent variable node is connected to each image variable node
35
Global co-occurrence constraints
Constraint on the model: if a latent variable is off, no image variable should take that label
Pay cost K for each violation
Overall complexity: O(NL + L²)
36
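A minimal sketch of the latent-variable update under this model. All names are illustrative, and the pairwise term is deliberately simplified: here it assumes the other latent variables are on, whereas a full mean-field treatment would weight it by their current marginals. The O(NL + L²) cost structure matches the slide.

```python
import numpy as np

def cooc_latent_update(Q, costs_unary, costs_pair, K):
    """Mean-field update of the latent label-presence variables.

    Q: (N, L) pixel marginals.
    costs_unary: (L,) cost of switching a label on.
    costs_pair: (L, L) second-order co-occurrence costs.
    K: penalty per pixel expected to use a label whose latent
       variable is off (the constraint-violation cost).

    Returns (L,) probabilities that each latent variable is 'on'.
    Runs in O(N*L + L^2).
    """
    # Expected number of pixels taking each label: O(N*L).
    expected_use = Q.sum(axis=0)                   # (L,)
    # Cost of being ON: unary + pairwise co-occurrence (simplified:
    # treats the other latent variables as on). O(L^2).
    on_cost = costs_unary + costs_pair.sum(axis=1)
    # Cost of being OFF: pay K for every pixel expected to use the label.
    off_cost = K * expected_use
    # Two-state mean-field marginal per latent variable.
    return np.exp(-on_cost) / (np.exp(-on_cost) + np.exp(-off_cost))
```

A label that many pixels want to use gets a large off-cost and is driven on; a rarely used label with a high co-occurrence cost is driven off.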
Product label space
Assign an object and disparity label to each pixel
Joint energy function defined over product label space:
data term
Left Camera Image
Right Camera Image
smoothness term
Inference in
product label
space
higher order term
Object Class Segmentation
Dense Stereo Reconstruction
37
PascalVOC-10 dataset - qualitative
Image
Ground truth
Fully connected
pairwise CRF*
alpha-expansion**
Ours
Observe an improvement over alternative methods
*Krähenbühl et al. Efficient inference in fully connected CRFs with Gaussian edge potentials. NIPS 2011
**Ladický et al. Graph cut based inference with co-occurrence statistics. ECCV 2010
38
PascalVOC - quantitative

Algorithm                        Time (s)   Overall   Av. Recall   Av. I/U
AHCRF + Cooc                     36         81.43     38.01        30.9
Dense pairwise                   0.67       71.43     34.53        28.40
Dense pairwise + Potts           4.35       79.87     40.71        30.18
Dense pairwise + Potts + Cooc    4.4        80.44     43.08        33.2

Observe an improvement of 2.3% in I/U score over Ladický et al.*
Achieve an 8-9x speed-up compared to the alpha-expansion based method of Ladický et al.*
*Ladický et al. Graph cut based inference with co-occurrence statistics. ECCV 2010
39-40
Leuven dataset - qualitative
Left image
Ground truth
Ours
Right image
Ground truth
Ours
41
Leuven dataset - quantitative

Algorithm                      Time (s)   Object (% correct)   Stereo (% correct)
GC + Range (1)                 24.6       95.94                76.97
GC + Range (2)                 49.9       95.94                77.31
GC + Range (3)*                74.4       95.94                77.46
Extended CostVol               4.2        95.20                77.18
Dense + HO (PLBF)              3.1        95.24                78.89
Dense + HO (DTBF)              2.1        95.06                78.21
Dense + HO + CostVol + DTBF    6.3        94.98                79.00

Achieve a 12-35x speed-up compared to the alpha-expansion based method of Ladický et al.*
*Ladický et al. Joint optimisation for object class segmentation and dense stereo reconstruction. BMVC 2010
42
42
Conclusion
• We provide efficient ways of incorporating higher-order terms
into fully connected pairwise CRF models
• Demonstrate improved efficiency compared to previous models
with higher-order terms
• Also demonstrate improved accuracy over previous approaches
• Similar methods applicable to a broad range of vision problems
• Code is available for download:
http://cms.brookes.ac.uk/staff/VibhavVineet/
43
EXTRA …
44
Joint object-stereo model
Introduce two different sets of variables:
Xi: object variable
Yi: disparity variable
Zi = [Xi, Yi]
Messages are exchanged between object and stereo variables
Joint energy function with unary, pairwise and higher order terms
45
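The joint energy on this slide was lost in transcription. A form consistent with the talk (per-task unary and pairwise terms, a per-pixel term coupling the object label x_i and disparity y_i, plus higher-order terms) would be:

```latex
E(\mathbf{x}, \mathbf{y}) \;=\; \sum_i \psi_i(x_i) + \sum_i \phi_i(y_i)
\;+\; \sum_{i<j} \psi_{ij}(x_i, x_j) + \sum_{i<j} \phi_{ij}(y_i, y_j)
\;+\; \sum_i \chi_i(x_i, y_i)
\;+\; \sum_c \psi_c(\mathbf{x}_c)
```

The symbols (psi for object terms, phi for disparity terms, chi for the coupling term) are my notation, not the slide's.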
Marginal update for object variables
Message from disparity variables to object variables
Filtering is done using a permutohedral lattice based filtering* strategy
*Adams et al. Fast high-dimensional filtering using the permutohedral lattice.
Computer Graphics Forum 2010
46
Marginal update for disparity variables
Message from object variables to disparity variables
Filtering is done using domain transform based filtering* strategy
*Gastal E.S.L. et al. Domain transform for edge-aware image and video processing.
ACM Trans. Graph. 2011
47
Mean-field vs. Graph cuts
• Measure I/U score on PascalVOC-10 segmentation
• Increase the standard deviation for mean-field
• Increase the window size for the graph-cut method
• Both achieve almost similar accuracy
• For graph cuts, the time complexity becomes very high, making it infeasible to work with large neighbourhood systems
48-49
Window sizes
• Comparison on matched energy

Algorithm          Model                      Time (s)   Av. I/U
Alpha-exp (n=10)   Pairwise                   326.17     28.59
Mean-field         Pairwise                   0.67       28.64
Alpha-exp (n=3)    Pairwise + Potts           56.8       29.6
Mean-field         Pairwise + Potts           4.35       30.11
Alpha-exp (n=1)    Pairwise + Potts + Cooc    103.94     30.45
Mean-field         Pairwise + Potts + Cooc    4.4        32.17

Impact of adding more complex costs and increasing window size
50