pptx - Home pages of ESAT

A coarse-to-fine approach for fast deformable object detection
Marco Pedersoli Andrea Vedaldi Jordi Gonzàlez
Object detection
[VOC 2010]
[Fischler Elschlager 1973]
[Vedaldi Zisserman 2009]
[Felzenszwalb et al 08]
[Zhu et al 10]
• Addressing the computational bottleneck
- branch-and-bound [Blaschko Lampert 08, Lehmann et al. 09]
- cascades [Viola Jones 01, Vedaldi et al. 09, Felzenszwalb et al 10, Weiss Taskar 10]
- jumping windows [Chum 07]
- sampling windows [Gualdi et al. 10]
- coarse-to-fine [Fleuret Geman 01, Zhang et al 07, Pedersoli et al. 10]
Analysis of the cost of pictorial structures
The cost of pictorial structures
• L = number of part locations ~ number of pixels ~ millions
• cost of inference
- one part: L
- two parts: L^2
- …
- P parts: L^P
• with a tree, using dynamic programming: PL^2
- Polynomial, but still too slow in practice
• with a tree and quadratic springs, using the distance transform [Felzenszwalb and Huttenlocher 05]: PL
- In principle, millions of times faster than dynamic programming!
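To make the gap between PL^2 and PL concrete, the sketch below compares, for a single parent-child pair on a 1D grid, the brute-force quadratic minimization with the linear-time lower-envelope algorithm of [Felzenszwalb and Huttenlocher 05]. It assumes a quadratic spring with weight `w` and finite costs; the function names `dt_brute` and `dt_linear` are illustrative and do not come from the slides.

```python
import numpy as np

def dt_brute(f, w=1.0):
    """O(L^2): d[x] = min over y of f[y] + w * (x - y)^2."""
    x = np.arange(len(f))
    return np.min(f[None, :] + w * (x[:, None] - x[None, :]) ** 2, axis=1)

def dt_linear(f, w=1.0):
    """O(L): same result, computed from the lower envelope of parabolas."""
    n = len(f)
    v = np.zeros(n, dtype=int)   # grid positions of the parabolas in the envelope
    z = np.empty(n + 1)          # boundaries between consecutive parabolas
    k = 0
    z[0], z[1] = -np.inf, np.inf
    for q in range(1, n):
        while True:
            p = v[k]
            # abscissa where the parabolas rooted at p and q intersect
            s = ((f[q] + w * q * q) - (f[p] + w * p * p)) / (2.0 * w * (q - p))
            if s <= z[k]:
                k -= 1           # parabola p is hidden by q: drop it
            else:
                break
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, np.inf
    d = np.empty(n)
    k = 0
    for x in range(n):
        while z[k + 1] < x:      # advance to the parabola covering x
            k += 1
        d[x] = f[v[k]] + w * (x - v[k]) ** 2
    return d
```

For part scores that are maximized rather than costs that are minimized, the same routine applies to the negated score: `-dt_linear(-score, w)`.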
A notable case: deformable part models
• Deformable part model [Felzenszwalb et al. 08]
- locations are discrete (grid step δ)
- number of possible part locations: L → L / δ^2
- deformations are bounded
- cost of placing two parts: L^2 → LC, C << L
- C = max. deformation size
- total geometric cost: C PL / δ^2
A notable case: deformable part models
• With deformable part models
- finding the optimal parts configuration is cheap
- distance transform speed-up is limited
- geometric cost: C PL / δ^2
• Standard analysis does not account for filtering:
- filtering cost: F PL / δ^2
- F = size of filter
- total cost: (F + C) PL / δ^2
• Typical example
- filter size: F = 6 × 6 × 32
- deformation size: C = 6 × 6
• Filtering dominates the cost of finding the optimal part configuration!
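As a quick check of the typical example, the share of the per-location cost (F + C) that goes to filtering can be computed directly from the two sizes quoted on the slide. The variable names are illustrative.

```python
F = 6 * 6 * 32   # filter size, as on the slide
C = 6 * 6        # deformation size, as on the slide

filter_share = F / (F + C)
print(F, C, round(100 * filter_share, 1))  # 1152 36 97.0
```

With these numbers, filtering accounts for roughly 97% of the cost, so speeding up the geometric part alone can recover only the remaining 3% or so.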
Accelerating deformable part models
• deformable part model cost: (F + C) PL / δ^2
- the key is reducing the filter evaluations
• Cascade of deformable parts [Felzenszwalb et al. 2010]
- detect parts sequentially
- stop when confidence below a threshold
• Coarse-to-fine localization [Pedersoli et al. 2010]
- multi-resolution search
- we extend this idea to deformable part models
Our contribution:
Coarse-to-fine for deformable models
Our model
• Multi-resolution deformable parts
- each part is a HOG filter
- recursive arrangement
- resolution doubles
- bounded deformation
• Score of a configuration S(y)
- HOG filter score
- parent-child deformation score
Coarse-to-Fine search
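The following is a minimal 1D sketch of the coarse-to-fine idea, not the authors' implementation. It assumes precomputed score arrays whose length doubles at each level, a zero deformation cost inside a bounded range `c`, and a greedy refinement that keeps only the best child at each level; all names are hypothetical. In a real detector the saving comes from evaluating the filters only at the visited locations.

```python
import numpy as np

def children(pos, n, c):
    """Candidate positions at the next (2x) resolution for a parent at `pos`."""
    lo, hi = max(0, 2 * pos - c), min(n, 2 * pos + 2 + c)
    return np.arange(lo, hi)

def ctf_search(scores, c=1):
    """Greedy coarse-to-fine score for every root location."""
    out = np.empty(len(scores[0]))
    for x in range(len(scores[0])):
        pos, total = x, scores[0][x]
        for s in scores[1:]:
            cand = children(pos, len(s), c)
            pos = cand[np.argmax(s[cand])]   # keep only the best refinement
            total += s[pos]
        out[x] = total
    return out

def exact_search(scores, c=1):
    """Exact maximum over the same configurations, by dynamic programming."""
    best = scores[-1].copy()
    for r in range(len(scores) - 2, -1, -1):
        n_child = len(scores[r + 1])
        best = np.array([scores[r][p] + best[children(p, n_child, c)].max()
                         for p in range(len(scores[r]))])
    return best
```

Because every greedy path is also a valid configuration of the exact search, `ctf_search` can never score higher than `exact_search`, while it visits only a constant number of locations per level for each root.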
Quantify the saving
• # filter evaluations
- exact: L, 4L, 16L
- CTF: L, L, L
- overall speedup: 4^R
• 1D view (circle = part location)
• 2D view
- exponentially larger saving
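The per-level counts can be tallied with a few lines of code. The sketch assumes levels indexed r = 0..R, a 2D image in which each doubling of resolution multiplies the number of locations by 4, and a CTF search that evaluates a constant L locations per level; the function name is illustrative.

```python
def filter_evaluations(L, R):
    """Count filter evaluations for levels r = 0..R (2D, resolution doubles)."""
    exact = sum(4 ** r * L for r in range(R + 1))  # every location at every level
    ctf = (R + 1) * L                              # a constant L locations per level
    return exact, ctf

exact, ctf = filter_evaluations(L=1000, R=2)
print(exact, ctf, exact / ctf)  # 21000 3000 7.0
```

Under these assumptions the saving at the finest level alone is 4^R (16 for R = 2), and the ratio of the totals grows exponentially with the number of levels.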
Lateral constraints
• Geometry in deformable part models is cheap
- can afford additional constraints
• Lateral constraints
- connect sibling parts
• Inference
- use dynamic programming within each level
- open the cycle by conditioning one node
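Opening a cycle by conditioning can be illustrated as follows. The slides do not spell out the procedure, so this is a sketch under stated assumptions: the sibling parts form a single cycle of N nodes, each node has K candidate placements, and the score tables `unary` and `pairwise` are hypothetical inputs. Fixing the state of node 0 turns the cycle into a chain, which is solved by dynamic programming; the brute-force routine is included only as a check.

```python
import itertools
import numpy as np

def cycle_map_score(unary, pairwise):
    """Best total score on a cycle of N nodes with K states each.

    unary[i, a]       : score of node i in state a
    pairwise[i, a, b] : score of edge (i, i+1 mod N) with states (a, b)
    """
    N, K = unary.shape
    best = -np.inf
    for a0 in range(K):                      # condition on the state of node 0
        # msg[b] = best score of nodes 0..i with node i in state b
        msg = unary[0, a0] + pairwise[0, a0, :] + unary[1, :]
        for i in range(1, N - 1):            # chain DP over the opened cycle
            msg = np.max(msg[:, None] + pairwise[i], axis=0) + unary[i + 1, :]
        # close the cycle: edge from the last node back to node 0
        best = max(best, np.max(msg + pairwise[N - 1, :, a0]))
    return best

def brute_force_score(unary, pairwise):
    """Exhaustive check over all K^N configurations (for small N and K only)."""
    N, K = unary.shape
    return max(
        sum(unary[i, s[i]] + pairwise[i, s[i], s[(i + 1) % N]] for i in range(N))
        for s in itertools.product(range(K), repeat=N)
    )
```

The conditioned dynamic program costs on the order of N K^3 operations instead of K^N, which stays small when the deformation range is bounded.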
Lateral constraints
• Why are lateral constraints useful?
• Encourage consistent local deformations
- without lateral constraints siblings move independently
- no way to make their motion coherent
- without lateral constraints y and y’ have the same geometric cost
- with lateral constraints y can be encouraged
Experiments
Effect of deformation size
• INRIA pedestrian dataset
- C = deformation size (HOG cells)
- AP = average precision (%)
• Coarse-to-fine (CTF) inference
C      3×3     5×5    7×7
AP     83.5    83.2   83.6
time   0.33s   2.0s   9.3s
• Remarks
- large C slows down inference but does not improve precision
- even a small C already allows substantial part deformation, thanks to the multiple resolutions
Effect of the lateral constraints
• Exact vs Coarse-to-fine (CTF) inference
inference              exact inference   CTF inference
tree                   83.0 AP           80.7 AP
tree + lateral conn.   83.4 AP           83.5 AP
• CTF ~ exact inference scores
- CTF ≤ exact
- bound is tighter with lateral constraints
[Plots of CTF score vs exact score: tree; tree + lat.]
• Effect is significant on training as well
- additional coherence
- avoids spurious solutions
• Example
- learning the head model
[Figure panels: CTF learning and tree; CTF learning and tree + lat.]
Training speed
• Structured latent SVM [Felzenszwalb et al. 08, Vedaldi et al. 09]
- deformations of training objects are unknown
- estimated as latent variables
• Algorithm
- Initialization: no negative examples, no deformations
- Outer loop
▪ Inner loop: collect hard negative examples (CTF inference), learn the model parameters (SGD)
▪ Estimate the deformations (CTF inference)
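The control flow of the training algorithm above can be sketched as a pair of nested loops. This is only a skeleton: `infer` and `sgd_fit` are hypothetical callables supplied by the user (standing in for CTF inference and the SGD solver), and the iteration counts are arbitrary assumptions, not values from the slides.

```python
def train_latent_svm(positives, negatives, init_model, infer, sgd_fit,
                     outer_iters=3, inner_iters=2):
    """Control flow only; `infer` and `sgd_fit` are user-supplied callables."""
    model = init_model
    hard_negatives = []                  # initialization: no negative examples
    latents = [None] * len(positives)    # initialization: no deformations
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            # collect hard negative examples by running inference on negatives
            hard_negatives += [d for img in negatives for d in infer(model, img)]
            # learn the model parameters with the current latent estimates
            model = sgd_fit(model, positives, latents, hard_negatives)
        # re-estimate the latent deformations of the training objects
        latents = [infer(model, img) for img in positives]
    return model, latents
```

Inference is called inside both loops, which is why its cost dominates training time and why a faster inference routine speeds up training as well as testing.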
• The training speed is dominated by the cost of inference!
time       exact inference       CTF inference
training   ≈20h                  ≈2h
testing    2h (10s per image)    4m (0.33s per image)
• > 10× speedup!
PASCAL VOC 2007
• Evaluate on the detection of 20 different object categories
- ~5,000 images for training, ~5,000 images for testing
           plane  bike   bird   boat   bottle  bus    car    cat    chair  cow    table
MKL BOW    37.6   47.8   15.3   15.3   21.9    50.7   50.6   30.0   17.3   33.0   22.5
PS         29.0   54.6    0.6   13.4   26.2    39.4   46.4   16.1   16.3   16.5   24.5
Hierarc.   29.4   55.8    9.4   14.3   28.6    44.0   51.3   21.3   20.0   19.3   25.2
Cascade    22.8   49.4   10.6   12.9   27.1    47.4   50.2   18.8   15.7   23.6   10.3
OUR        27.7   54.0    6.6   15.1   14.8    44.2   47.3   14.6   12.5   22.0   24.2

           dog    horse  mbike  person  plant  sheep  sofa   train  tv     mean   Time (s)
MKL BOW    21.5   51.2   45.5   23.3    12.4   23.9   28.5   45.3   48.5   32.1   ~ 70
PS          5.0   43.6   37.8   35.0     8.8   17.3   21.6   34.0   39.0   26.8   ~ 10
Hierarc.   12.5   50.4   38.4   36.6    15.1   19.7   25.1   36.8   39.3   29.6   ~ 8
Cascade    12.1   36.4   37.1   37.2    13.2   22.6   22.9   34.7   40.0   27.3   < 1
OUR        12.0   52.0   42.0   31.2    10.6   22.9   18.8   35.3   31.1   26.9   < 1
• Remarks
- very good for aeroplane, bicycle, boat, table, horse, motorbike, sheep
- less good for bottle, sofa, tv
• Speed-accuracy trade-off
- time is drastically reduced
- hit on AP is small
Comparison to the cascade of parts
• Cascade of parts [Felzenszwalb et al. 10]
- test parts sequentially, reject when score falls below threshold
- saving at unpromising locations (content dependent)
- difficult to use in training (thresholds must be learned)
• Coarse-to-fine inference
- saving is uniform (content independent)
- can be used during training
Coarse-to-fine cascade of parts
• Cascade and CTF use orthogonal principles
- easily combined
- speed-up multiplies!
• Example
- apply a threshold at the root
- plot AP vs speed-up
- In some cases 100× speed-up can be achieved
[Diagram: CTF → cascade (score > τ1? otherwise reject) → CTF → cascade (score > τ2? otherwise reject) → CTF]
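A combination of the two principles can be sketched by adding a rejection test between the levels of a coarse-to-fine search. This is an illustrative 1D simplification under assumed inputs: `scores[r]` is a precomputed score array of length L * 2**r, the deformation cost is zero inside a bounded range `c`, and `thresholds` holds the hypothetical values τ1, τ2, … applied after levels 0, 1, ….

```python
import numpy as np

def ctf_cascade(scores, thresholds, c=1):
    """Coarse-to-fine search with a rejection test after the early levels.

    Returns {root location: final score} for the hypotheses that survive.
    """
    survivors = {}
    for x in range(len(scores[0])):
        pos, total, alive = x, scores[0][x], True
        for r, s in enumerate(scores):
            if r > 0:
                lo, hi = max(0, 2 * pos - c), min(len(s), 2 * pos + 2 + c)
                pos = lo + int(np.argmax(s[lo:hi]))    # refine around the parent
                total += s[pos]
            if r < len(thresholds) and total <= thresholds[r]:
                alive = False                          # cascade test failed: reject
                break
        if alive:
            survivors[x] = total
    return survivors
```

Passing a single threshold corresponds to the slide's example of thresholding at the root only: rejected roots skip all finer levels, so the content-dependent saving of the cascade multiplies the uniform saving of the coarse-to-fine search.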
Summary
• Analysis of deformable part models
- filtering dominates the geometric configuration cost
- speed-up requires reducing filtering
• Coarse-to-fine search for deformable models
- lower resolutions can drive the search at higher resolutions
- lateral constraints add coherence to the search
- exponential saving independent of the image content
- can be used for training too
• Practical results
- 10× speed-up on VOC and INRIA with minimum AP loss
- can be combined with cascade of parts for multiplied speedup
• Future
- More complex models with rotation, foreshortening, …
Thank you!