RGB-D Perception

Practical Modeling and Recognition
using RGB-D Cameras
Xiaofeng Ren, Dieter Fox
Intel Labs, University of Washington
Joint work with Liefeng Bo, Kevin Lai, Peter Henry, Evan Herbst,
Mike Krainin, Hao Du and others @ University of Washington
June 27, 2011
RGB-D Camera: Color+Depth
640x480, 30Hz, color + dense depth
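As a rough illustration of what "color + dense depth" gives you, the sketch below back-projects a 640x480 depth image into 3D points with a pinhole camera model. The intrinsics (fx, fy, cx, cy) are placeholder values assumed for illustration, not calibration data from the talk, and the function name `depth_to_points` is hypothetical.

```python
import numpy as np

def depth_to_points(depth_m, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a depth image (in meters) to an (N, 3) array of camera-frame points.

    The intrinsics are assumed placeholder values; a real sensor needs calibration.
    Pixels with depth 0 are treated as missing and dropped.
    """
    h, w = depth_m.shape
    # us holds the column index of each pixel, vs the row index
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]

# Tiny synthetic example: an otherwise empty 640x480 frame with two valid pixels.
depth = np.zeros((480, 640))
depth[0, 0] = 1.0       # top-left pixel, 1 m away
depth[240, 320] = 2.0   # near the image center, 2 m away
pts = depth_to_points(depth)
print(pts)
```

Each surviving row of `pts` can be paired with the color of the same pixel, which is what makes the later registration and recognition steps possible on a single synchronized frame.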
7/6/2015
At RGB-D 2010 Workshop:
• 3D modeling of indoor environments
RGBD-ICP matching + loop closure; flythrough visualization
• 3D modeling of everyday objects
Robot in-hand modeling through real-time registration and modeling
• Robust recognition of everyday objects
Preliminary object dataset captured with RGB-D
Preliminary results on sparse distance learning
RGB-D Perception @ UW and Intel
• 3D modeling of objects & environments
Indoor Modeling: [Henry, Krainin, Herbst, Ren, Fox; ISER ’10]
Interactive Modeling: [Du, Henry, Ren, Fox, Seitz; Ubicomp ’11]
Dynamic Scene Modeling: [Herbst, Ren, Fox; ICRA ’11, IROS ’11]
Object Manipulation: [Krainin, Henry, Ren, Fox; IJRR ’11]
Interactive 3D Visualization: [Cheng, Ren; ’11]
• Robust recognition of everyday objects
Egocentric recognition: [Ren, Gu; CVPR ’10]
Joint object-pose recognition: [Gu, Ren; ECCV ’10]
Kernel Descriptors: [Bo, Ren, Fox; NIPS ’10, IROS ’11]
Hierarchical Kernel Descriptors: [Bo, Lai, Ren, Fox; CVPR ’11]
RGB-D Benchmark: [Lai, Bo, Ren, Fox; ICRA ’11]
Sparse distance learning: [Lai, Bo, Ren, Fox; ICRA ’11] (best vision paper)
Scalable and hierarchical recognition: [Lai, Bo, Ren, Fox; AAAI ’11]
RGB-D Perception @ UW and Intel
• 3D modeling of objects & environments
Indoor Modeling: [Henry, Krainin, Herbst, Ren, Fox; ISER ’10]
Interactive Modeling: [Du, Henry, Ren, Fox, Seitz; Ubicomp ’11]
Dynamic Scene Modeling: [Herbst, Ren, Fox; ICRA ’11, IROS ’11]
• Robust recognition of everyday objects
Kernel Descriptors: [Bo, Ren, Fox; NIPS ’10]
RGB-D Benchmark: [Lai, Bo, Ren, Fox; ICRA ’11]
Scalable and hierarchical recognition: [Lai, Bo, Ren, Fox; AAAI ’11]
RGB-D Mapping: Pipeline
[Henry-Krainin-Herbst-Ren-Fox]
Comparing to Laser-based Mapping
From RGB-D to Interactive Modeling
[Du-Henry-Ren-Fox-Goldman-Seitz; Ubicomp 2011]
Discovering and Learning Objects
[Herbst-Henry-Ren-Fox; ICRA 2011]
Discovering and Learning Objects
• (Robot) capturing scenes in RGB-D over extended period of time
• 3D scene reconstruction for efficient representation
• Proper sensor models for both color and depth
• Pairwise scene differencing with sensor models and MRF clean-up
[Herbst-Henry-Ren-Fox; ICRA 2011]
Discovering and Learning Objects
• Handling changed detections in multiple visits with multi-label MRF
• Matching potential objects by movements and appearance
  • ICP for shape matching
  • Color image recognition with kernel descriptors
• Spectral clustering for object discovery
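ICP alternates between pairing up nearby points and solving for the rigid motion that best aligns the pairs. The sketch below shows only that second step, the closed-form least-squares alignment via SVD, assuming correspondences are already known. It is a generic textbook formulation, not the authors' implementation, and the name `rigid_align` is hypothetical.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rotation R and translation t with R @ src[i] + t ~= dst[i].

    src and dst are (N, 3) arrays of already-paired points.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: rotate a random cloud by 30 degrees about z and shift it.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.3])
dst = src @ R_true.T + t_true
R_est, t_est = rigid_align(src, dst)
print(np.round(R_est, 3), np.round(t_est, 3))
```

A full ICP loop would re-estimate correspondences (for example by nearest-neighbor search) after each alignment and repeat until the transform stops changing.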
[Herbst-Ren-Fox; IROS 2011]
Discovering and Learning Objects
[Herbst-Ren-Fox; IROS 2011]
Object Learning through Manipulation
[Krainin-Henry-Ren-Fox; IJRR 2011]
Next-Best-View Planning
[Krainin-Curless-Fox; ICRA 2011]
RGB-D Perception @ UW and Intel
• 3D modeling of objects & environments
Indoor Modeling: [Henry, Krainin, Herbst, Ren, Fox; ISER ’10]
Interactive Modeling: [Du, Henry, Ren, Fox, Seitz; Ubicomp ’11]
Dynamic Scene Modeling: [Herbst, Ren, Fox; ICRA ’11, IROS ’11]
• Robust recognition of everyday objects
Kernel Descriptors: [Bo, Ren, Fox; NIPS ’10]
RGB-D Benchmark: [Lai, Bo, Ren, Fox; ICRA ’11]
Scalable and hierarchical recognition: [Lai, Bo, Ren, Fox; AAAI ’11]
RGB-D Object Dataset
300 objects from 51 categories, 250,000 RGB-D views
http://www.cs.washington.edu/rgbd-dataset/
(search “rgbd” + “dataset”)
[Lai-Bo-Ren-Fox; ICRA 2011]
Cluttered scenes
Benchmarking RGB-D Recognition
Category-Level Recognition (51 categories)
Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  51.7 ± 1.8      72.7 ± 3.2     80.5 ± 2.9
Kernel SVM                  63.5 ± 2.3      72.9 ± 3.2     83.0 ± 3.7
RandomForest                65.5 ± 2.4      73.1 ± 3.7     78.5 ± 4.1
Kernel Desc. + Linear SVM   75.7 ± 2.2      76.1 ± 2.6     84.1 ± 2.2

Instance-Level Recognition (303 instances)
Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  29.4 ± 0.5      90.4 ± 0.5     89.6 ± 0.5
Kernel SVM                  50.1 ± 0.9      90.8 ± 0.5     90.4 ± 0.6
RandomForest                51.6 ± 1.1      89.6 ± 0.7     90.2 ± 0.3
[Lai-Bo-Ren-Fox; ICRA 2011]
RGB-D Object Recognition
Image → ? → Recognition
Candidate models for the “?” stage:
SIFT (or HOG)
Bag-of-Words
Sparse Coding (LLC, LCC)
Spatial Pyramid Matching (SPM)
Efficient Match Kernel (EMK)
Feed-forward Networks
Your favorite model
Patch features
Image features
Kernel Descriptors: Generalizing SIFT
Linear kernel on SIFT descriptors
= a product of two histograms
= a product summed over all pairs of pixels
Gradient Match Kernel:
$$K_{\mathrm{grad}}(P, Q) = \sum_{u \in P} \sum_{v \in Q} \tilde{m}_u \, \tilde{m}_v \, k_o(\theta_u, \theta_v) \, k_p(u, v)$$
where $P$, $Q$ are image patches; $u$, $v$ are pixel coordinates; $\tilde{m}$ is the normalized gradient magnitude; $\theta$ is the gradient orientation; $k_o$ and $k_p$ are kernels.
Includes SIFT as a special case
Avoids any “binning” issues in histogram features
[Bo-Ren-Fox; NIPS 2010]
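The gradient match kernel compares two patches by summing, over every pair of pixels (one from each patch), the product of the two normalized gradient magnitudes, an orientation kernel, and a position kernel. The sketch below evaluates that sum directly. The slide only says "kernels", so the Gaussian form of both kernels, the (sin, cos) orientation encoding, the per-patch normalization, and the `gamma_o` / `gamma_p` values are all assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def patch_attributes(patch):
    """Per-pixel gradient magnitude, orientation, and normalized (x, y) position."""
    gy, gx = np.gradient(patch.astype(float))   # derivatives along rows, then columns
    mag = np.hypot(gx, gy).ravel()
    ori = np.arctan2(gy, gx).ravel()
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([xs.ravel() / (w - 1), ys.ravel() / (h - 1)], axis=1)
    return mag, ori, pos

def sq_dists(a, b):
    """Matrix of squared Euclidean distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def gradient_match_kernel(patch_p, patch_q, gamma_o=5.0, gamma_p=3.0, eps=1e-8):
    """Brute-force sum over all pixel pairs; Gaussian kernels are an assumed choice."""
    mag_p, ori_p, pos_p = patch_attributes(patch_p)
    mag_q, ori_q, pos_q = patch_attributes(patch_q)
    # Normalize magnitudes within each patch (assumed L2 normalization)
    m_p = mag_p / np.sqrt(np.sum(mag_p ** 2) + eps)
    m_q = mag_q / np.sqrt(np.sum(mag_q ** 2) + eps)
    # Encode orientation as a unit vector so that angles wrap around correctly
    vec_p = np.stack([np.sin(ori_p), np.cos(ori_p)], axis=1)
    vec_q = np.stack([np.sin(ori_q), np.cos(ori_q)], axis=1)
    k_o = np.exp(-gamma_o * sq_dists(vec_p, vec_q))   # orientation kernel
    k_p = np.exp(-gamma_p * sq_dists(pos_p, pos_q))   # position kernel
    return float(m_p @ (k_o * k_p) @ m_q)

rng = np.random.default_rng(0)
P = rng.random((8, 8))
Q = rng.random((8, 8))
k_pq = gradient_match_kernel(P, Q)
k_qp = gradient_match_kernel(Q, P)
k_pp = gradient_match_kernel(P, P)
k_qq = gradient_match_kernel(Q, Q)
print(k_pq, k_pp, k_qq)
```

Because every pixel pair contributes through smooth kernels rather than through shared histogram bins, nearby orientations and positions still match partially, which is the sense in which the slide says binning issues are avoided. The brute-force sum is quadratic in patch size; the talk's kernel descriptors rely on low-dimensional approximations instead, which this sketch does not attempt.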
Kernel Descriptors: Image Recognition
• Low-dimensional approximations of match kernels
• Explicitly compute descriptors/features from patches
• Easily generalize gradient features to color, binary shape, etc.
• Outperform SIFT and sophisticated feature learning techniques

Scene-15
KDES: 86.7%
SIFT: 82.2%

Caltech-101
KDES: 76.4%
SPM[1]: 64.4%
CDBN[2]: 65.5%
LCC[4]: 73.4%

CIFAR10
KDES: 76.0%
mcRBM-DBN[3]: 71.0%
LCC[4]: 74.5%
TCNN[5]: 73.1%

[1] Lazebnik, Schmid, Ponce, CVPR ’06.
[2] Lee, Grosse, Ranganath, Ng, ICML ’09.
[3] Ranzato & Hinton, CVPR ’10.
[4] Yu & Zhang, ICML ’10.
[5] Le, Ngiam, Chen, Chia, Koh & Ng, NIPS ’10.
[Bo-Ren-Fox; NIPS 2010]
Kernel Descriptors: RGB-D Recognition
Category-Level Recognition (51 categories)
Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  51.7 ± 1.8      72.7 ± 3.2     80.5 ± 2.9
Kernel SVM                  63.5 ± 2.3      72.9 ± 3.2     83.0 ± 3.7
RandomForest                65.5 ± 2.4      73.1 ± 3.7     78.5 ± 4.1
Kernel Desc. + Linear SVM   75.7 ± 2.2      76.1 ± 2.6     84.1 ± 2.2

Instance-Level Recognition (303 instances)
Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  29.4 ± 0.5      90.4 ± 0.5     89.6 ± 0.5
Kernel SVM                  50.1 ± 0.9      90.8 ± 0.5     90.4 ± 0.6
RandomForest                51.6 ± 1.1      89.6 ± 0.7     90.2 ± 0.3
[Bo-Lai-Ren-Fox; CVPR 2011; IROS 2011]
Toward Practical Recognition
• A mug?
• Kevin’s mug?
• A mug facing right?
• A mug with orientation (90,15,0)
……
Scalable and Hierarchical Recognition
8 discrete views
continuous angles
[Lai-Bo-Ren-Fox; AAAI 2011]
Joint Recognition with Object-Pose Tree
• Tree structure enables efficient joint recognition
• Object-Pose tree outperforms nearest neighbor and 1vsA baselines
• Joint tree-based learning outperforms separate learning
• Promising pose estimation results on generic objects
• Natural tree structure of category-instance-pose works really well
RGB-D Dataset: 300 objects, 51 categories, 250,000 color-depth pairs
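To make the category-instance-pose structure concrete, the toy sketch below walks such a tree from the root, picking the highest-scoring child at each level. It illustrates only inference. The linear scorers, the hand-picked weights, the labels, and the names `Node` and `descend` are all hypothetical, and the sketch says nothing about how the talk's tree is actually learned.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Node:
    label: str
    weights: np.ndarray                 # hypothetical linear scorer for this node
    children: list = field(default_factory=list)

def descend(root, x):
    """Follow the best-scoring child at each level and return the labels visited."""
    path = []
    node = root
    while node.children:
        node = max(node.children, key=lambda c: float(c.weights @ x))
        path.append(node.label)
    return path

def pose_leaves():
    # Two coarse poses per instance, scored on the third feature dimension
    return [Node("pose_0", np.array([0.0, 0.0, 1.0])),
            Node("pose_180", np.array([0.0, 0.0, -1.0]))]

# Toy tree: category level uses feature 0, instance level uses feature 1.
mug = Node("mug", np.array([1.0, 0.0, 0.0]), [
    Node("mug_1", np.array([0.0, 1.0, 0.0]), pose_leaves()),
    Node("mug_2", np.array([0.0, -1.0, 0.0]), pose_leaves()),
])
bowl = Node("bowl", np.array([-1.0, 0.0, 0.0]), [
    Node("bowl_1", np.array([0.0, 1.0, 0.0]), pose_leaves()),
])
root = Node("root", np.zeros(3), [mug, bowl])

x = np.array([1.0, 0.2, -0.5])          # made-up feature vector
path = descend(root, x)
print(path)                             # ['mug', 'mug_1', 'pose_180']
```

Only the children along one root-to-leaf path are scored, so the cost grows with tree depth and branching factor rather than with the total number of instance-pose combinations.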
[Lai-Bo-Ren-Fox; AAAI 2011]
Application: Interactive LEGO
RGB-D used for object recognition and hand tracking
[Ziola-Harrison-Powledge-Lai-Bo-Ren-Fox]
Application: Chess Playing Robot
[Matuszek-Mayton-Aimi-Bo-Deisenroth-Chu-Kung-LeGrand-Smith-Fox]
RGB-D Perception: Summary
• RGB-D cameras provide synchronized color and depth, making visual perception both robust and efficient.
• RGB-D mapping generates detailed 3D maps in near real time and enables on-the-fly user interaction and feedback.
• Kernel descriptors provide a principled way to extract rich features from pixel attributes, outperforming SIFT and leading to robust RGB-D recognition.
• Robust RGB-D recognition and modeling enable interesting scenarios for object-aware interactions and applications.
RGB-D Perception: The Future?
• Will RGB-D have a deep impact on vision applications?
YES! It’s already happening, faster than we can track.
• Will RGB-D start a revolution in vision applications?
NO. We still need to solve recognition, segmentation, tracking, scene understanding, etc.
YES! RGB-D helps address two BIG issues in computer vision: loss of 3D from projection; lighting conditions.
RGB-D helps “abstract away” many low-level problems.
• Is RGB-D the future for smart vision-based systems?
Why not? At $50 today and $10 tomorrow.
THANK YOU