
3D Object Recognition
Pipeline
Kurt Konolige, Radu Rusu, Victor Eruhimov, Suat Gedikli
Willow Garage
Stefan Holzer, Stefan Hinterstoisser
TUM
Morgan Quigley, Stephen Gould
Stanford
Marius Muja
UBC
3D and Object Recognition
• Provides more info than just visual texture
• Good for scale and segmentation
• Verification
• Need a good device for 3D info
3D Cameras
• Stereo (e.g. Newcombe, Davison CVPR 2010; registration + regularization). Pro: real-time, good resolution. Con: not dense, smearing.
• Stereo + texture (e.g. WG device). Pro: dense, real-time, good resolution. Con: short range.
• Laser line scan (e.g. STAIR Borg scanner). Pro: dense, most accurate. Con: short range, not real time.
• Structured light (e.g. PrimeSense). Pro: dense, real-time, good resolution. Con: short range, sensitive to ambient light/scene texture.
• Phase shift (e.g. SR4, PMD, Canesta). Pro: dense, real-time, medium range. Con: low resolution, low accuracy, gross errors.
• Gated reflectance (e.g. 3DV, Canesta). Pro: dense, real-time. Con: low resolution, low accuracy.
Tabletop manipulation:
• Short range
• High resolution
• High range accuracy
• Real-time
WG Projected Texture Stereo Device
• Paint the scene with texture from a projector
• vs. single camera with structured light
• Advantages:
• Simple projector
• Standard algorithms
• Full frame rates (640x480)
• Dynamic scenes
WG projected texture device
Projector
• Red LED
• Eye safe
• Synchronized to
cameras
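The projected texture gives even blank surfaces enough detail for the "standard algorithms" the previous slide mentions, i.e. ordinary dense stereo matching. As a rough illustration only, here is a toy sum-of-absolute-differences (SAD) block matcher in NumPy; it is a sketch of the principle, not the WG device's actual stereo code:

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, win=5):
    """Brute-force SAD block matching: for each left-image pixel, find
    the horizontal shift into the right image that minimizes the sum
    of absolute differences over a small window."""
    h, w = left.shape
    half = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right[y - half:y + half + 1,
                                          x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic pair: the right image is the left image shifted 4 px,
# standing in for a well-textured (projected) scene.
rng = np.random.default_rng(0)
left = rng.random((40, 60))
right = np.roll(left, -4, axis=1)
d = sad_disparity(left, right, max_disp=8)
```

Because the projector paints dense texture everywhere, the SAD minimum is well defined at every pixel; on untextured scenes this cost curve would be flat and matching would fail.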
3D Fly-thru
Object Recognition Pipeline
Pre-filter
Detect
Verify
• Textured objects via keypoints [Victor
Eruhimov, Suat Gedikli]
• Untextured objects via DOT [Stefan Holzer,
Stefan Hinterstoisser]
• Simple 3D model matching [Marius Muja]
• STAIR 2D/3D features [Stephen Gould]
MOPED – Textured object
recognition with pose
• Model: Stereo view of an object at a known
pose
• Extract keypoints and features
• For a new scene, match keypoints to each
model
• Run SfM geometric check to verify and
recover pose
Torres, Romea, Srinivasa ICRA 2010
Limitations:
- Need texture
- Need a high-resolution camera
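MOPED's keypoint-matching step can be sketched as nearest-neighbor descriptor matching with Lowe's ratio test; this is an illustrative reconstruction only (the published pipeline uses SIFT features and a full SfM-style geometric check on top of the raw matches):

```python
import numpy as np

def ratio_match(scene_desc, model_desc, ratio=0.8):
    """Match each scene descriptor to its nearest model descriptor,
    keeping a match only when the nearest neighbor is clearly better
    than the second nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(scene_desc):
        dists = np.linalg.norm(model_desc - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

rng = np.random.default_rng(1)
model = rng.random((20, 8))                        # model keypoint descriptors
scene = model[[3, 7]] + 0.01 * rng.random((2, 8))  # noisy re-observations
m = ratio_match(scene, model)
```

The surviving matches would then feed the geometric check, which verifies the detection and recovers the pose.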
Dominant Orientation Templates (DOT)
Stefan Hinterstoisser, Stefan Holzer (TUM; CVPR 2010, ECCV 2010)
● DOT is a template-matching-based approach
[figure: template matched against the current scene]
- The template is slid over the image to compute a response at each image position
- If the response is above a threshold, it is considered a detection of the template
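The sliding-template idea can be sketched in a few lines; a toy correlation score stands in here for DOT's actual gradient-based response:

```python
import numpy as np

def slide_template(image, template, thresh):
    """Slide the template over the image and report every position
    whose similarity score exceeds the threshold."""
    th, tw = template.shape
    ih, iw = image.shape
    hits = []
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            window = image[y:y + th, x:x + tw]
            score = float((window * template).sum())  # toy correlation score
            if score > thresh:
                hits.append((y, x, score))
    return hits

img = np.zeros((10, 10))
tmpl = np.eye(3)          # a small diagonal pattern
img[4:7, 5:8] = tmpl      # embed it at position (4, 5)
hits = slide_template(img, tmpl, thresh=2.5)
```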
DOT – Basic Principle
● DOT uses gradients instead of color or gray values
[figure: gradient template matched against the current scene]
- Gradients are less sensitive to illumination changes
- Gradients have orientation and magnitude
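The gradient step can be sketched as follows; the bin count and magnitude threshold below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def dominant_orientations(gray, n_bins=8, mag_thresh=0.1):
    """Compute image gradients, quantize their orientation into n_bins
    ignoring sign (opposite gradient directions share a bin), and keep
    only pixels with a strong gradient magnitude."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi             # orientation modulo pi
    bins = (ang / np.pi * n_bins).astype(int) % n_bins
    bins[mag < mag_thresh] = -1                  # -1 marks "no dominant gradient"
    return bins

img = np.zeros((8, 8))
img[:, 4:] = 1.0          # a vertical edge -> horizontal gradient, bin 0
b = dominant_orientations(img)
```

Quantizing orientation this way is what makes the template response both cheap to compute and robust to illumination changes.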
Offline Learning
● Good learning is necessary to reduce the false-positive rate
● We try to use all available information to segment the object:
  ● The point cloud from narrow stereo is used to detect the table and segment the object's point cloud
  ● The object point cloud is used to create an initial mask
  ● The mask is refined using GrabCut (see OpenCV)
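A common way to realize the table-detection step above is a RANSAC plane fit: points on the dominant plane are the table, and the remaining points above it are the object. A minimal illustration (not the actual pipeline code):

```python
import numpy as np

def ransac_plane(points, n_iter=200, tol=0.01, seed=0):
    """Fit a plane to a point cloud with RANSAC and return the inlier
    mask; points off the plane (e.g. an object on the table) remain
    outliers and can be segmented out."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                  # degenerate (collinear) sample
            continue
        normal /= norm
        mask = np.abs((points - p0) @ normal) < tol
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask

rng = np.random.default_rng(2)
table = np.c_[rng.random((200, 2)), np.zeros(200)]             # z = 0 plane
obj = np.c_[rng.random((30, 2)) * 0.2, 0.05 + 0.1 * rng.random(30)]
cloud = np.vstack([table, obj])
inliers = ransac_plane(cloud)
object_points = cloud[~inliers]          # segmented object cloud
```

The segmented object points would then be projected into the image to form the initial mask that GrabCut refines.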
False-Positive Rejection
● Two more precise templates for validation:
  ● a more precise, non-discretized gradient template
  ● a disparity template to compare expected with real disparities
False-Positive Rejection
● Compute the error between the reference point cloud and the point cloud at the detected position
● Optimize the initial 3D pose of the point cloud given by the detection
● This directly gives the object pose if the model is associated with the learned point clouds
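The point-cloud error above can be sketched as a brute-force nearest-neighbor distance; a real system would use a KD-tree and ICP-style pose refinement, so treat this as an illustration of the idea only:

```python
import numpy as np

def cloud_error(ref, obs):
    """Mean distance from each observed point to its nearest reference
    point (brute-force nearest neighbor)."""
    d = np.linalg.norm(obs[:, None, :] - ref[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

rng = np.random.default_rng(3)
ref = rng.random((100, 3))                           # learned model cloud
good = ref + 0.001 * rng.standard_normal((100, 3))   # correct detection
bad = ref + 0.2                                      # badly posed detection
err_good = cloud_error(ref, good)
err_bad = cloud_error(ref, bad)
```

A low error verifies the detection; a high error rejects it as a false positive.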
STAIR Vision Library (SVL)
Stanford STAIR project [Andrew Ng, Stephen Gould]
• Initially developed to support the Stanford AI Robot (STAIR) project
• Builds on top of the OpenCV computer vision library and the Eigen matrix library
• Provides a range of software infrastructure for:
  • computer vision
  • machine learning
  • probabilistic graphical models
• Hosted on SourceForge
Object Detection in SVL
• Sliding-window object detector
  • Features are extracted from a local window
  • A learned boosted decision-tree classifier scores each window
  • The image is scanned at multiple resolutions to detect objects at different scales
[Quigley et al., ICRA 2009]
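The multi-resolution scan can be sketched with an image pyramid; the striding-based downsampling and the `score_fn` callback below are illustrative simplifications of the SVL detector, not its actual API:

```python
import numpy as np

def pyramid_scan(image, window, score_fn, scales=(1.0, 0.5), step=4):
    """Scan a fixed-size window over the image at several scales; a hit
    in a downscaled image corresponds to a larger object. Returns
    (y, x, scale, score) tuples in original-image coordinates."""
    wh, ww = window
    detections = []
    for s in scales:
        stride = int(round(1 / s))
        img = image[::stride, ::stride]   # crude downsample (no filtering)
        h, w = img.shape
        for y in range(0, h - wh + 1, step):
            for x in range(0, w - ww + 1, step):
                score = score_fn(img[y:y + wh, x:x + ww])
                detections.append((y * stride, x * stride, s, score))
    return detections

img = np.zeros((32, 32))
img[8:16, 8:16] = 1.0                     # bright 8x8 "object"
dets = pyramid_scan(img, (8, 8), lambda w: w.mean())
best = max(dets, key=lambda d: d[3])
```

In the real detector, `score_fn` would be the boosted decision-tree classifier over the patch features described below.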
Image Channels
• Image decomposed into multiple channels
• Depth at each pixel, obtained from a laser scanner, can be thought of as an additional channel
[figure: intensity image, edge map, depth map]
[Quigley et al., ICRA 2009]
Object Detection Features
• Learn a “patch” dictionary over the intensity, edge, and depth channels
  • Patches encode localized templates for matching
  • Depth patches capture shape; intensity and edge patches capture appearance
  • Patch responses (over the entire dictionary) are combined to form the feature vector
[Quigley et al., ICRA 2009]
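The patch-response step can be sketched as normalized cross-correlation of each dictionary patch over the window, with the per-patch maxima concatenated into the feature vector. This is an illustrative reconstruction, not SVL's actual implementation:

```python
import numpy as np

def patch_response(window, patch):
    """Maximum normalized cross-correlation of one patch over a window."""
    ph, pw = patch.shape
    p = patch - patch.mean()
    pn = np.linalg.norm(p) + 1e-9
    best = -1.0
    for y in range(window.shape[0] - ph + 1):
        for x in range(window.shape[1] - pw + 1):
            w = window[y:y + ph, x:x + pw] 
            w = w - w.mean()
            best = max(best, float((w * p).sum() /
                                   (np.linalg.norm(w) * pn + 1e-9)))
    return best

def feature_vector(channels, dictionary):
    """Concatenate each patch's best response on each channel
    (e.g. intensity, edge, depth) into one feature vector."""
    return np.array([patch_response(ch, p)
                     for ch in channels for p in dictionary])

win = np.zeros((8, 8))
win[2:5, 2:5] = np.eye(3)                  # embed a diagonal patch
resp = patch_response(win, np.eye(3))      # close to 1.0: perfect match
fv = feature_vector([win, win], [np.eye(3), np.ones((3, 3))])
```

Running the same dictionary over the depth channel is what lets the detector encode shape alongside appearance.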
Results
• 150 images of cluttered indoor scenes
• 5-fold cross-validation
• Depth information provides significant improvement in area under the precision-recall curve (improvements of 8%, 3%, and 38%)
Conclusions
• Real-time, accurate 3D devices are becoming available
• 3D can help in object detection for untextured objects
  - A combination of visual and 3D features works best
• 3D is useful for verification
• Check out the PR2 Grasping Demo!