Interactive Systems Laboratories

Computer Vision:
Chamfer System
Dr. Edgar Seemann
Interactive Systems Laboratories, Universität Karlsruhe (TH)
cv:hci
Research Group, Universität Karlsruhe (TH)
Computer Vision for Human-Computer Interaction
Silhouette Matching
Chamfer Matching [Gavrila & Philomin ICCV’99]
Goal
• Align known object shapes with an image
[Figure: object shapes and a real-world image of the object]
Requirements for an alignment algorithm
• High detection rate
• Few false positives
• Robustness
• Computationally inexpensive
Distance Transform
• Used to compare/align two (typically binary) shapes
[Figure: Shape 1, Shape 2; Distance = ?]
1. Compute for each pixel the distance to the nearest edge pixel
  - Here the Euclidean distances are approximated by the 2-3 distance
Distance transform (zeros mark the edge pixels):
  8 6 5 3 2 2 3 5
  6 5 3 2 0 0 2 4
  5 3 2 0 2 0 2 4
  4 2 0 2 2 0 2 4
  4 2 0 2 2 0 2 4
  5 3 2 0 2 0 2 4
  6 5 3 2 0 0 2 4
  8 6 5 3 2 2 3 5
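The 2-3 distance charges 2 for every horizontal or vertical step and 3 for every diagonal step, so for two pixels that are Δy rows and Δx columns apart it equals 2·max(|Δy|, |Δx|) + min(|Δy|, |Δx|). A minimal brute-force sketch of step 1, assuming that the edge pixels are exactly the zeros of the grid above (the names `dist23`, `brute_force_dt` and `EDGES` are illustrative, not from the slides), reproduces that grid:

```python
def dist23(p, q):
    """2-3 distance: 2 per straight step, 3 per diagonal step."""
    dy, dx = abs(p[0] - q[0]), abs(p[1] - q[1])
    return 2 * max(dy, dx) + min(dy, dx)


def brute_force_dt(edges):
    """For every pixel, the 2-3 distance to the nearest edge pixel ('X').

    Costs O(pixels * edge pixels); only meant to illustrate the definition.
    """
    edge_pts = [(r, c) for r, row in enumerate(edges)
                for c, ch in enumerate(row) if ch == "X"]
    return [[min(dist23((r, c), e) for e in edge_pts)
             for c in range(len(edges[0]))]
            for r in range(len(edges))]


# Edge pixels = positions of the zeros in the example grid
EDGES = [
    "........",
    "....XX..",
    "...X.X..",
    "..X..X..",
    "..X..X..",
    "...X.X..",
    "....XX..",
    "........",
]

for row in brute_force_dt(EDGES):
    print(" ".join(str(v) for v in row))
```

The brute-force version is far too slow for real images; it only pins down what the distance transform contains.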
Distance Transform
2. Overlay the second shape over the distance transform
[Figure: Shape 2 overlaid on the distance transform above]
Distance = 14
3. Accumulate the distances along shape 2
4. Find the best matching position by an exhaustive search
• The distance is not symmetric
• The distance has to be normalized w.r.t. the length of the shapes
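Step 3 and both remarks can be illustrated with a small sketch. It assumes Euclidean point distances (instead of the 2-3 approximation) and two made-up straight contours; `directed_chamfer` is a hypothetical name. Averaging instead of summing is the normalization by shape length, and swapping the arguments changes the result:

```python
import math


def directed_chamfer(a, b):
    """Average distance from each point of a to its nearest point of b.

    Dividing by len(a) normalizes the sum w.r.t. the length of shape a.
    """
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


a = [(0, x) for x in range(6)]  # hypothetical contour with 6 pixels
b = [(0, x) for x in range(3)]  # shorter contour lying on the first half of a

print(directed_chamfer(b, a))  # 0.0: every pixel of b lies on a
print(directed_chamfer(a, b))  # 1.0: pixels 3, 4, 5 of a are 1, 2, 3 away, 6 / 6
```

If a symmetric measure is needed, one option (not from the slides) is to average both directions.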
Chamfer Matching
[Figure: binary image and its distance transform; distance transform of a real-world image]
• Compute the distance transform (DT)
• For each possible object location:
  - Position the known object shape over the DT
  - Accumulate the distances along the contour (distance measure)
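The chamfer-matching loop can be sketched as follows. Assumptions: the template is a list of (row, col) contour offsets, the DT uses the 2-3 distance computed by brute force for brevity, and all function names and the toy image are hypothetical. The DT is computed once; each placement then only sums a few looked-up values:

```python
def dist23(p, q):
    """2-3 distance between two pixels."""
    dy, dx = abs(p[0] - q[0]), abs(p[1] - q[1])
    return 2 * max(dy, dx) + min(dy, dx)


def distance_transform(image):
    """Brute-force DT of an edge image given as strings ('X' = edge pixel)."""
    edges = [(r, c) for r, row in enumerate(image)
             for c, ch in enumerate(row) if ch == "X"]
    return [[min(dist23((r, c), e) for e in edges)
             for c in range(len(image[0]))] for r in range(len(image))]


def chamfer_score(dt, template, top, left):
    """Average DT value along the template contour placed at (top, left)."""
    return sum(dt[top + r][left + c] for r, c in template) / len(template)


def exhaustive_search(dt, template):
    """Try every placement that keeps the template inside the image; return (score, top, left)."""
    h = max(r for r, _ in template) + 1
    w = max(c for _, c in template) + 1
    best = None
    for top in range(len(dt) - h + 1):
        for left in range(len(dt[0]) - w + 1):
            score = chamfer_score(dt, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


IMAGE = [
    ".......",
    ".......",
    "...X...",
    "...X...",
    "...XXX.",
    ".......",
]
TEMPLATE = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # an "L" shape

dt = distance_transform(IMAGE)  # computed only once
print(exhaustive_search(dt, TEMPLATE))  # (0.0, 2, 3)
```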
Efficient implementation
• The distance transform can be computed efficiently with two scans over the complete image
• Forward scan
  - Starts in the upper-left corner and moves from left to right, top to bottom
  - Uses the following mask:
      3 2 3
      2 0
• Backward scan
  - Starts in the lower-right corner and moves from right to left, bottom to top
  - Uses the following mask:
        0 2
      3 2 3
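The two scans can be sketched as below, with the straight and diagonal weights s = 2 and d = 3 of the masks above; the names `two_pass_dt`, `BIG` and `EDGES` are illustrative. Every pixel is visited twice, so the cost is linear in the image size. For this mask the two scans should give the same values as a brute-force 2-3 distance transform, including the example grid on the Distance Transform slide:

```python
BIG = 10 ** 9  # stands in for "infinity" before any distance is known


def two_pass_dt(edges, s=2, d=3):
    """Chamfer distance transform with one forward and one backward raster scan.

    edges: list of equal-length strings, 'X' marks an edge pixel.
    s, d: weights for straight and diagonal steps (2-3 mask by default).
    """
    h, w = len(edges), len(edges[0])
    dt = [[0 if edges[r][c] == "X" else BIG for c in range(w)] for r in range(h)]

    # Forward scan (top-left to bottom-right), mask:  d s d / s 0
    for r in range(h):
        for c in range(w):
            v = dt[r][c]
            if r > 0:
                if c > 0:
                    v = min(v, dt[r - 1][c - 1] + d)
                v = min(v, dt[r - 1][c] + s)
                if c < w - 1:
                    v = min(v, dt[r - 1][c + 1] + d)
            if c > 0:
                v = min(v, dt[r][c - 1] + s)
            dt[r][c] = v

    # Backward scan (bottom-right to top-left), mask:  0 s / d s d
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            v = dt[r][c]
            if r < h - 1:
                if c < w - 1:
                    v = min(v, dt[r + 1][c + 1] + d)
                v = min(v, dt[r + 1][c] + s)
                if c > 0:
                    v = min(v, dt[r + 1][c - 1] + d)
            if c < w - 1:
                v = min(v, dt[r][c + 1] + s)
            dt[r][c] = v
    return dt


EDGES = [
    "........",
    "....XX..",
    "...X.X..",
    "..X..X..",
    "..X..X..",
    "...X.X..",
    "....XX..",
    "........",
]

for row in two_pass_dt(EDGES):
    print(" ".join(str(v) for v in row))
```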
Forward scan
• Different values can be chosen for the filter mask
• The local distances d (diagonal), s (straight) and c (center) in the mask are added to the corresponding values of the distance map; the new value of the pixel under the mask center (the 0 entry) is the minimum of the five sums
  Mask layout (for the 2-3 mask: d = 3, s = 2, c = 0):
      d s d
      s c
• Example:
[Figure: the forward mask applied to a partially computed distance map; "?" marks values not computed yet]
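A single forward-scan update, written out for one pixel. The neighbor values are made up, `forward_update` is a hypothetical name, and the alternative weights d = 4, s = 3 (a 3-4 mask) only illustrate that the mask values can be chosen differently:

```python
def forward_update(nw, n, ne, w, center, d=3, s=2, c=0):
    """Minimum of the five sums: mask weights added to the four visited neighbors and the pixel itself."""
    return min(nw + d, n + s, ne + d, w + s, center + c)


# Hypothetical values of the already-visited neighbors and of the current pixel
print(forward_update(5, 3, 2, 4, 99))            # 2-3 mask: min(8, 5, 5, 6, 99) = 5
print(forward_update(5, 3, 2, 4, 99, d=4, s=3))  # 3-4 mask: min(9, 6, 6, 7, 99) = 6
```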
Advantages and Disadvantages
• Fast
  - The distance transform has to be computed only once
  - The comparison at each shape location is cheap
• Good performance on uncluttered images (with few background structures)
• Bad performance on cluttered images
• Needs a very large number of silhouettes of people
  - but the computational effort increases with the number of silhouettes
Template Hierarchy
• To reduce the number of silhouettes that have to be considered, the silhouettes can be organized in a template hierarchy
• For this, the shapes are clustered by similarity
Search in the hierarchy
• Matching the shapes then corresponds to a traversal of the template hierarchy
• How can search branches be pruned to speed up matching?
  - Thresholds depend on:
      - the edge detector (likelihood of gaps)
      - the silhouette sizes
      - the hierarchy level
      - the allowed shape variation
  - Thresholds are set statistically during training
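Traversing the hierarchy with threshold-based pruning might look like the sketch below. Everything in it is hypothetical: the node names, thresholds and distance values stand in for real prototype templates and chamfer scores. A subtree is skipped as soon as its prototype scores above the node's threshold:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A node of a (hypothetical) template hierarchy."""
    name: str                 # identifies the prototype template of this cluster
    threshold: float          # pruning threshold for this node
    children: List["Node"] = field(default_factory=list)


def search(node: Node, distance: Callable[[str], float], matches=None) -> List[str]:
    """Depth-first traversal; leaves that pass every threshold on their path are matches."""
    if matches is None:
        matches = []
    if distance(node.name) > node.threshold:
        return matches        # prune: the whole subtree is skipped
    if not node.children:
        matches.append(node.name)
    for child in node.children:
        search(child, distance, matches)
    return matches


tree = Node("all", 10.0, [
    Node("walking-left", 5.0, [Node("wl1", 3.0), Node("wl2", 3.0)]),
    Node("walking-right", 5.0, [Node("wr1", 3.0), Node("wr2", 3.0)]),
])
# Made-up chamfer distances at one image location
scores = {"all": 4.0, "walking-left": 2.0, "wl1": 1.0, "wl2": 4.0,
          "walking-right": 7.0, "wr1": 0.5, "wr2": 0.5}

print(search(tree, scores.__getitem__))  # ['wl1']
```

Note that wr1 and wr2 are never evaluated even though their made-up scores are low; this is why the thresholds must be chosen so that true matches are rarely pruned.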
Example Detections
Video
Coarse-To-Fine Search
• Goal: Reduce the search effort by discarding unlikely regions with minimal computation
• Idea:
  - Subsample the image and search first at a coarse scale
  - Only consider regions with a low distance when searching for a match at finer scales
• Again, reasonable thresholds have to be found
[Figure: search levels 1, 2 and 3]
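A coarse-to-fine sketch over candidate positions: positions are first tested on a grid with spacing 4, then only the neighborhoods of the survivors are tested with spacing 2 and finally 1. This sketch coarsens the grid of positions rather than resampling the image, and the score function, spacings and thresholds are assumptions chosen for illustration:

```python
def coarse_to_fine(score, height, width, steps=(4, 2, 1), thresholds=(6.0, 3.0, 1.0)):
    """Return (positions passing the finest threshold, number of score evaluations)."""
    candidates = {(y, x) for y in range(0, height, steps[0])
                  for x in range(0, width, steps[0])}
    evaluated = 0
    for level, (step, thr) in enumerate(zip(steps, thresholds)):
        evaluated += len(candidates)
        kept = {p for p in candidates if score(*p) <= thr}
        if level == len(steps) - 1:
            return sorted(kept), evaluated
        fine = steps[level + 1]
        # Only the neighborhood of surviving positions is searched at the next level
        candidates = {
            (y + dy, x + dx)
            for y, x in kept
            for dy in range(-step, step + 1, fine)
            for dx in range(-step, step + 1, fine)
            if 0 <= y + dy < height and 0 <= x + dx < width
        }


def toy_score(y, x):
    """Hypothetical matching distance with its minimum at position (9, 13)."""
    return abs(y - 9) + abs(x - 13)


positions, n = coarse_to_fine(toy_score, 32, 32)
print(positions)  # [(8, 13), (9, 12), (9, 13), (9, 14), (10, 13)]
print(n, "of", 32 * 32, "positions evaluated")
```

The coarse thresholds have to be loose: the true optimum usually lies between coarse grid points, so the nearest coarse position scores worse than the optimum itself.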
Protector System (Daimler)
Adding edge orientation
• So far, edge orientation has been completely ignored
[Figure: example match; Distance = small]
• Idea: Consider the edge orientation at each pixel
Edge orientation - The math
• Given two shapes S and C, the chamfer distance can be expressed in the following manner
• The orientation correspondence between two points is then measured by
• The combined distance measure:
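The formulas themselves are not part of this transcript. One common way to write them, assumed here rather than taken from the slides, is d_cham(S, C) = (1/|S|) Σ_{s∈S} min_{c∈C} ‖s − c‖ for the chamfer distance, an orientation term d_orient(s, c) = |φ(s) − φ(c)| / (π/2) with the angle difference folded modulo π (edges are undirected), and a combined measure that adds λ·d_orient for each template point and its nearest edge pixel. The sketch implements that assumed form; `lam` is a hypothetical weight and the point sets are made up:

```python
import math


def orientation_cost(phi_s, phi_c):
    """Orientation mismatch in [0, 1]; edges are undirected, so angles are compared modulo pi."""
    delta = abs(phi_s - phi_c) % math.pi
    delta = min(delta, math.pi - delta)  # fold into [0, pi/2]
    return delta / (math.pi / 2)


def oriented_chamfer(template, edges, lam=1.0):
    """Average of spatial distance plus lam * orientation cost to the nearest edge pixel.

    template, edges: lists of (row, col, angle) with angles in radians.
    """
    total = 0.0
    for r, c, phi in template:
        er, ec, ephi = min(edges, key=lambda e: math.hypot(e[0] - r, e[1] - c))
        total += math.hypot(er - r, ec - c) + lam * orientation_cost(phi, ephi)
    return total / len(template)


# A vertical template contour (orientation pi/2) ...
template = [(r, 0, math.pi / 2) for r in range(4)]
# ... against edge pixels at the same positions, with matching and with perpendicular orientation
same = [(r, 0, math.pi / 2) for r in range(4)]
rotated = [(r, 0, 0.0) for r in range(4)]

print(oriented_chamfer(template, same))     # 0.0
print(oriented_chamfer(template, rotated))  # 1.0
```

A plain chamfer distance would give 0.0 for both edge sets; only the orientation term separates them.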
Statistical Relevance
• Adding the statistical relevance of silhouette regions further improves the results [Dimitrijevic06]
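One simple way to include per-region relevance, assumed here because the slide gives no formula, is to weight the DT value under each template point: score = Σ w_i·DT(p_i) / Σ w_i. The weights below are made up, `weighted_chamfer` is a hypothetical name, and how the relevance values are obtained is not covered by this sketch:

```python
def weighted_chamfer(dt, template, weights, top=0, left=0):
    """Relevance-weighted chamfer score; uniform weights give the plain average."""
    num = sum(w * dt[top + r][left + c] for (r, c), w in zip(template, weights))
    return num / sum(weights)


dt = [[0, 2, 3],
      [2, 0, 2],
      [3, 2, 0]]                     # toy 2-3 distance transform
template = [(0, 0), (1, 1), (0, 2)]  # DT values under the template: 0, 0, 3

print(weighted_chamfer(dt, template, [1.0, 1.0, 1.0]))  # plain average: 1.0
print(weighted_chamfer(dt, template, [1.0, 1.0, 0.5]))  # less relevant third point: 1.5 / 2.5
```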
Spatio-Temporal templates
• Use multiple successive frames to build a spatio-temporal template T = {T1, …, TN}
• Allow spatial variations dx, dy (due to motion or camera movements)
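A sketch of scoring a spatio-temporal template T = {T1, …, TN} against N consecutive frames. It assumes the per-frame score is the average DT value and that each Tk may be shifted independently by up to ±max_shift pixels (the dx, dy slack); the frames, templates and function names are hypothetical:

```python
def dt_from_points(points, h, w):
    """Brute-force 2-3 distance transform of an h x w image with the given edge pixels."""
    return [[min(2 * max(abs(r - pr), abs(c - pc)) + min(abs(r - pr), abs(c - pc))
                 for pr, pc in points)
             for c in range(w)] for r in range(h)]


def frame_score(dt, template, top, left):
    """Average DT value under one frame's template."""
    return sum(dt[top + r][left + c] for r, c in template) / len(template)


def spatiotemporal_score(dts, templates, top, left, max_shift=1):
    """Average over frames of the best score of T_k within +/- max_shift pixels of (top, left)."""
    total = 0.0
    for dt, tmpl in zip(dts, templates):
        best = float("inf")
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                y, x = top + dy, left + dx
                inside = all(0 <= y + r < len(dt) and 0 <= x + c < len(dt[0])
                             for r, c in tmpl)
                if inside:
                    best = min(best, frame_score(dt, tmpl, y, x))
        total += best
    return total / len(templates)


# Three frames of one edge pixel moving right; the third frame is jittered one row down
frames = [dt_from_points(p, 5, 7) for p in ([(2, 2)], [(2, 3)], [(3, 4)])]
templates = [[(0, 0)], [(0, 1)], [(0, 2)]]  # T1, T2, T3 encode the expected motion

print(spatiotemporal_score(frames, templates, 2, 2, max_shift=0))  # 2/3: jitter is penalized
print(spatiotemporal_score(frames, templates, 2, 2, max_shift=1))  # 0.0: jitter is absorbed
```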
Example: single-frame vs. 3 frames
Quantitative Results
• Red: spatio-temporal templates + statistical relevance
Video
• Detection is restricted to a single articulation (legs in a V-shaped position)
• Spatio-temporal templates:
  - allow a more reliable detection of the motion direction
  - avoid confusions and some false-positive detections
Alternatives: Earth Mover’s Distance
• Originally developed to compare histograms
• Idea: Find the minimal ‘flow’ that transforms one histogram into another
• Example:
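For 1-D histograms with equal total mass and ground distance |i − j| between bins i and j, the optimal flow has a closed form: the total work equals the sum of the absolute differences of the cumulative histograms. This is a special case, not the general EMD; the sketch (`emd_1d` is a hypothetical name) assumes non-empty histograms of equal mass and divides the work by the total flow:

```python
def emd_1d(p, q):
    """EMD between two 1-D histograms of equal mass, ground distance |i - j|."""
    total = sum(p)
    if abs(total - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    work, carried = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i  # surplus (or deficit) that must flow past this bin
        work += abs(carried)  # moving it one bin further costs |carried|
    return work / total       # normalize by the total flow


print(emd_1d([1, 0, 0], [0, 0, 1]))          # 2.0: all mass moves two bins
print(emd_1d([0.5, 0.5, 0], [0, 0.5, 0.5]))  # 1.0: shifting the histogram by one bin
```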
Earth mover’s distance (EMD)
[Figure: basis of the distance measure, Chamfer vs. EMD; example detection]
EMD Matching
• Detect edges in the image
• For each possible object location:
  - Optimize the correspondences between the known shape and the edge image
EMD – The math
• Variant of the transportation problem (possible solution methods: stepping-stone algorithm, transportation simplex method)
• Constraints
• EMD distance
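The constraints and the distance itself are not preserved in this transcript. In the usual formulation (Rubner, Tomasi and Guibas), which is assumed here, two signatures P = {(p_i, w_{p_i})}, i = 1..m, and Q = {(q_j, w_{q_j})}, j = 1..n, with ground distances d_{ij} = d(p_i, q_j) are compared through the flow F = [f_{ij}] that solves the transportation problem below; the EMD is the optimal work normalized by the optimal total flow f*:

```latex
\begin{aligned}
\min_{F} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{subject to} \quad & f_{ij} \ge 0, && 1 \le i \le m,\ 1 \le j \le n \\
& \textstyle\sum_{j=1}^{n} f_{ij} \le w_{p_i}, && 1 \le i \le m \\
& \textstyle\sum_{i=1}^{m} f_{ij} \le w_{q_j}, && 1 \le j \le n \\
& \textstyle\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}
  = \min\Big(\textstyle\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j}\Big) \\[4pt]
\mathrm{EMD}(P, Q) \quad & = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f^{*}_{ij}\, d_{ij}}
                                  {\sum_{i=1}^{m} \sum_{j=1}^{n} f^{*}_{ij}}
\end{aligned}
```

The last constraint moves as much mass as possible; when the total weights differ, only part of the heavier signature is matched, which is what makes partial matches possible, and the normalization keeps the distance from favoring signatures with small total weight.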
Advantages and Disadvantages
• Optimizes the matching between the silhouette and the edge structure in the image
• Enforces one-to-one matchings (unlike chamfer)
• Allows partial matches
• Can deal with arbitrary features
• High computational complexity
  - Approximation is possible [Grauman & Darrell CVPR’04]
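The one-to-one property shows up on a toy example. For two point sets of equal size with unit weights, the EMD reduces to the cheapest one-to-one assignment. The sketch finds it by brute force over permutations, which is only viable for a handful of points (a real implementation would use a transportation or assignment solver such as `scipy.optimize.linear_sum_assignment`); the point sets and function names are made up:

```python
import itertools
import math


def chamfer(a, b):
    """Directed chamfer distance: each point of a uses its nearest point of b (many-to-one)."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def emd_unit_weights(a, b):
    """EMD for equal-size point sets with unit weights: the cheapest one-to-one assignment.

    Brute force over all permutations, so only usable for a handful of points.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles point sets of equal size")
    best = min(
        sum(math.dist(p, b[j]) for p, j in zip(a, perm))
        for perm in itertools.permutations(range(len(b)))
    )
    return best / len(a)


template = [(0, 0), (0, 1), (0, 2)]  # three neighboring contour points
edges = [(0, 1), (4, 0), (4, 2)]     # one edge pixel nearby, two far away

print(chamfer(template, edges))           # 2/3: all points share the nearby edge pixel
print(emd_unit_weights(template, edges))  # 8/3: two points must use the far pixels
```

Chamfer lets all three template points use the single nearby edge pixel and reports a good fit; EMD has to explain each template point by a different edge pixel and reports the poor fit.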