Transcript pptx - Smart Geometry Processing Group
Annotating RGBD Images of Indoor Scenes
Yu-Shiang Wong and Hung-Kuo Chu National Tsing Hua University CGV LAB SA2014.SIGGRAPH.ORG
SPONSORED BY
Outline
• Motivation
• Related Works
• Annotation Procedure
• User Study
Motivation
Scene understanding is a popular topic.
RGBD datasets with high-quality semantic annotations are valuable for:
• Learning
• Evaluation
Two fundamental problems: data acquisition and annotation.
RGBD Indoor Datasets
• Cornell-RGBD (2011-12): 24 labeled office scenes
• NYU2 (2011-12): 1449 labeled indoor scenes
  – 408,000+ RGBD video frames (unlabeled)
• SUN3D (2013): 415+ fully captured rooms
  – 10+ rooms are fully labeled; annotations are propagated through the video.
• UZH & ETH 3D Scanned Point Datasets (2014): 42 fully captured rooms
  – high-quality point clouds (unlabeled)
  – Object Detection and Classification from Large-Scale Cluttered Indoor Scans (EG 2014)
• …
Motivation
Data annotation is a painstaking and time-consuming task.
OMG! So much data needs to be annotated.
Motivation
Data annotation is a painstaking and time-consuming task.
Interactive tool for annotating RGBD indoor scenes
We need a good tool!
Motivation
Data annotation is a tedious and time-consuming task.
Interactive tool for annotating RGBD indoor scenes
Leverage both the cognitive ability of humans and the computational power of machines.
RELATED WORKS
Image Annotation
• LabelMe: a database and web-based tool for image annotation. Russell et al., IJCV 2007
• SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels. Xiao et al., ICCV 2013
• Cheaper by the Dozen: Group Annotation of 3D Data. Boyko et al., UIST 2014
Scene Understanding using RGBD Data
Image-based
• Indoor segmentation and support inference from RGBD images. Silberman et al., ECCV 2012
• RGB-(D) scene labeling: Features and algorithms. Ren et al., CVPR 2012
Proxy-based
• Imagining the unseen: Stability-based cuboid arrangements for understanding cluttered indoor scenes. Shao et al., SIGGRAPH Asia 2014
• PanoContext: A whole-room 3D context model for panoramic scene understanding. Zhang et al., ECCV 2014
• Holistic scene understanding for 3D object detection with RGBD cameras. Lin et al., ICCV 2013
• 3D-based reasoning with blocks, support, and stability. Jia et al., CVPR 2013
Annotation Procedure: Overview
Input: RGB-D image
Output: segmentation, label, box proxy, support structure
[Figure: Input, Machine, User, Output]
Annotation Procedure: Overview
Input RGB-D Image → Extract Room → Draw Scribbles → Estimate Boxes → Annotate Label and Structure → Output Annotated 3D Structure
[Figure: the steps are grouped into a Machine Session and a User Session]
Annotation Procedure:
Preprocessing
• Estimate normals.
• Perform over-segmentation using both the color and normal maps.
  – Efficient graph-based image segmentation [Felzenszwalb et al. 2004]
  – The coarser segmentation is used for room estimation.
  – The finer segmentation is used for user-assisted object segmentation.
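The slide does not say how the normals are estimated. A minimal sketch of one common approach, assuming the depth image has already been back-projected into an organized grid of 3D points (the function name `estimate_normals` and the grid layout are illustrative assumptions, not the authors' implementation):

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate neighborhood: no usable normal
    return (v[0] / length, v[1] / length, v[2] / length)

def estimate_normals(points):
    """points: H x W grid (list of rows) of (x, y, z) tuples.

    Hypothetical helper: the normal at each pixel is the cross product of
    the horizontal and vertical point differences, clamped at the borders.
    """
    h, w = len(points), len(points[0])
    normals = []
    for r in range(h):
        row = []
        for c in range(w):
            dx = sub(points[r][min(c + 1, w - 1)], points[r][max(c - 1, 0)])
            dy = sub(points[min(r + 1, h - 1)][c], points[max(r - 1, 0)][c])
            row.append(normalize(cross(dx, dy)))
        normals.append(row)
    return normals

# A flat patch at constant depth: every normal points along the z axis.
grid = [[(float(c), float(r), 2.0) for c in range(3)] for r in range(3)]
normals = estimate_normals(grid)
```

The sketch does not orient the normals toward the camera or smooth them, both of which a real pipeline would likely need on noisy depth data.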
Annotation Procedure:
Extracting Room Layout
Perform RANSAC plane fitting on each segment.
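A minimal sketch of RANSAC plane fitting on the points of one segment. The threshold, the iteration count, and the function names are illustrative assumptions; the slides do not give the parameters actually used.

```python
import random

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_through(p, q, r):
    """Return (unit normal n, offset d) with n.x + d = 0, or None if collinear."""
    n = cross(sub(q, p), sub(r, p))
    length = dot(n, n) ** 0.5
    if length < 1e-12:
        return None
    n = (n[0] / length, n[1] / length, n[2] / length)
    return n, -dot(n, p)

def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Keep the plane through 3 random points that has the most inliers."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_through(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [x for x in points if abs(dot(n, x) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# 25 points on the plane y = 0 plus three outliers above it.
segment = [(float(x), 0.0, float(z)) for x in range(5) for z in range(5)]
segment += [(0.5, 1.0, 0.5), (2.5, 2.0, 1.5), (3.5, 3.0, 3.5)]
plane, inliers = ransac_plane(segment)
```

A least-squares refit on the final inlier set is a common follow-up step and is omitted here for brevity.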
Roughly align the point cloud using gravity information $g$.
Find the floor segment by:
$E_i = (1 - \langle n_i, y_e \rangle) + \text{inverse ratio of segment size} + \text{normalized Y coordinate}$
Estimate wall candidates by:
$E_i = \langle n_i, n_{\mathrm{floor}} \rangle$
* If gravity information is not available:
$E_i = \sum_{j \neq i} \langle n_i, n_j \rangle$
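The floor and wall energies can be sketched as follows. Several details are assumptions, since the slide only names the terms: the energy is minimized, $y_e$ is read as the up axis after gravity alignment, the "inverse ratio of segment size" is taken as one minus the segment's share of all points, the Y coordinate is normalized to [0, 1] over the segments, and a wall candidate is a segment whose normal is nearly perpendicular to the floor normal. The dictionary keys and function names are hypothetical.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def floor_energy(seg, up, total_size, y_min, y_max):
    """Lower is more floor-like (assumed interpretation of the three terms)."""
    alignment = 1.0 - dot(seg["normal"], up)      # normal should point up
    size_term = 1.0 - seg["size"] / total_size    # large segments preferred
    if y_max > y_min:
        height_term = (seg["mean_y"] - y_min) / (y_max - y_min)  # low preferred
    else:
        height_term = 0.0
    return alignment + size_term + height_term

def pick_floor(segments, up=(0.0, 1.0, 0.0)):
    total = sum(s["size"] for s in segments)
    ys = [s["mean_y"] for s in segments]
    return min(segments,
               key=lambda s: floor_energy(s, up, total, min(ys), max(ys)))

def wall_candidates(segments, floor_normal, tol=0.1):
    """Segments whose plane normal is roughly perpendicular to the floor normal."""
    return [s for s in segments if abs(dot(s["normal"], floor_normal)) <= tol]

segments = [
    {"name": "floor", "normal": (0.0, 1.0, 0.0), "size": 500, "mean_y": 0.0},
    {"name": "wall", "normal": (1.0, 0.0, 0.0), "size": 400, "mean_y": 1.2},
    {"name": "table top", "normal": (0.0, 1.0, 0.0), "size": 100, "mean_y": 0.7},
]
floor = pick_floor(segments)
walls = wall_candidates(segments, floor["normal"])
```

In this toy scene the table top shares the floor's normal, so the size and height terms are what separate the two.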
Annotation Procedure:
User Scribbles
Check the floor and wall hypotheses.
• If the hypotheses fail, the user clicks the segments to identify the floor and walls.
The user draws scribbles to extract the object segments.
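The slides do not describe how a scribble is turned into an object segment. One simple possibility, shown here purely as an assumption, is to select every segment of the finer over-segmentation that the scribble touches. The function names and the label-map layout are hypothetical.

```python
def segments_under_scribble(label_map, scribble):
    """label_map: H x W grid of fine-segment ids; scribble: list of (row, col)."""
    h, w = len(label_map), len(label_map[0])
    return {label_map[r][c] for r, c in scribble if 0 <= r < h and 0 <= c < w}

def object_mask(label_map, scribble):
    """Boolean mask covering all fine segments touched by the scribble."""
    selected = segments_under_scribble(label_map, scribble)
    return [[label in selected for label in row] for row in label_map]

label_map = [
    [0, 0, 1],
    [2, 2, 1],
    [2, 3, 3],
]
scribble = [(0, 2), (1, 1)]  # touches segments 1 and 2
mask = object_mask(label_map, scribble)
```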
Annotation Procedure:
Estimating Boxes
• Box orientation = find an orthogonal basis in the 3D domain (3 unknown directions).
• We assume one direction of the box is parallel to the floor normal (1 unknown direction; the last one follows from a cross product).
Box Fitting Method:
1. Filter the point cloud by KNN.
2. Project the point cloud of a box onto the floor plane.
3. Fit a line in the 2D domain to extract a major direction.
4.
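A minimal sketch of the projection and line-fitting steps, assuming the scene is gravity-aligned so that the floor normal is the +Y axis. The line fit uses the principal direction of the projected points, which is one way to fit a 2D line and not necessarily the authors' choice. The KNN filtering step and the step left blank on the slide are not covered, and `fit_box` is a hypothetical name.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def fit_box(points):
    """Return (axes, extents) of an upright box around the points.

    axes = (major, up, minor); extents holds the (min, max) projection
    of the points on each axis. Assumes the floor normal is +Y.
    """
    up = (0.0, 1.0, 0.0)
    n = len(points)
    # Project onto the floor plane by dropping Y, then centre the 2D points.
    mx = sum(p[0] for p in points) / n
    mz = sum(p[2] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    szz = sum((p[2] - mz) ** 2 for p in points)
    sxz = sum((p[0] - mx) * (p[2] - mz) for p in points)
    # Angle of the principal direction of the 2D scatter (closed form).
    theta = 0.5 * math.atan2(2.0 * sxz, sxx - szz)
    major = (math.cos(theta), 0.0, math.sin(theta))
    # Third axis is the cross product up x major.
    minor = (math.sin(theta), 0.0, -math.cos(theta))
    axes = (major, up, minor)
    extents = []
    for axis in axes:
        proj = [dot(p, axis) for p in points]
        extents.append((min(proj), max(proj)))
    return axes, extents

# An elongated object rotated 45 degrees about the vertical axis.
cloud = [(t + w, h, t - w)
         for t in (0.0, 1.0, 2.0, 3.0, 4.0)
         for w in (-0.5, 0.5)
         for h in (0.0, 1.0)]
axes, extents = fit_box(cloud)
```

Because one axis is fixed to the floor normal, only a single angle has to be estimated, which matches the "1 unknown direction" remark on the slide.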
Annotation Procedure:
Annotate Label and 3D Structure
User Tasks:
1. Type in the object label.
2. Drag an arrow to specify the support relationships.
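One way to store the result of these two user tasks is a set of labels plus a list of directed support pairs. The class below is a hypothetical sketch: the slides do not specify the tool's output format, the direction the arrow is drawn in, or whether the floor is stored as an object.

```python
from dataclasses import dataclass, field

@dataclass
class SceneAnnotation:
    labels: dict = field(default_factory=dict)    # object id -> label text
    supports: list = field(default_factory=list)  # (supported id, supporter id)

    def add_object(self, obj_id, label):
        self.labels[obj_id] = label

    def add_support(self, supported, supporter):
        """Record that `supported` rests on `supporter`."""
        if supported == supporter:
            raise ValueError("an object cannot support itself")
        for obj_id in (supported, supporter):
            if obj_id not in self.labels:
                raise KeyError(f"unknown object id: {obj_id}")
        self.supports.append((supported, supporter))

    def supporters_of(self, obj_id):
        return [s for o, s in self.supports if o == obj_id]

scene = SceneAnnotation()
scene.add_object(0, "floor")
scene.add_object(1, "table")
scene.add_object(2, "television")
scene.add_support(1, 0)  # the table rests on the floor
scene.add_support(2, 1)  # the television rests on the table
```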
Annotation Procedure:
Box Quality Refinement (Optional)
User Tasks:
1. Adjust the orientation of boxes.
2. Adjust the size of boxes.
USER STUDY
User Study : Settings
• Select 50 scenes from NYU2 across 7 scene classes
• Recruit 2 users
• Each user is asked to annotate the 50 scenes
• Target classes: 24 merged object classes
  – List: bed, chair, cabinet, dresser, television, night stand, table, sofa, picture, pillow, …
• Each scene contains 3-6 objects
User Study : Results
• System process time (calculate normals, fit planes and boxes): < 3 sec [in C++]
• Annotation time (50 scenes):

Task Type            Check Room  Draw Scribbles  Type Labels  Drag Supports  Boxes Adjustment
Mean time per box    -           16 sec          4 sec        2 sec          11 sec
Mean time per scene  1.6 sec     1 min           17 sec       9 sec          35 sec
Total time           1.3 min     51 min          13 min       7.5 min        29 min

(Accuracy = 64%)
TOTAL = 101 min
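As a quick arithmetic check on the reported numbers, the rounded per-task totals can be summed and divided by the 50 scenes. The variable names are illustrative. The sum of the rounded values is about 101.8 min, in line with the reported total of 101 min up to rounding.

```python
# Per-task totals in minutes, as reported on the slide.
totals_min = {
    "Check Room": 1.3,
    "Draw Scribbles": 51,
    "Type Labels": 13,
    "Drag Supports": 7.5,
    "Boxes Adjustment": 29,
}

total_min = sum(totals_min.values())  # sum of the rounded per-task totals
per_scene_min = total_min / 50        # average annotation time per scene
share = {task: t / total_min for task, t in totals_min.items()}

print(f"sum of task totals: {total_min:.1f} min")
print(f"average per scene:  {per_scene_min:.2f} min")
print(f"scribble share:     {share['Draw Scribbles']:.0%}")
```

Drawing scribbles accounts for roughly half of the annotation time, which agrees with the bottleneck named in the ongoing work.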
Demo
Conclusion
An interactive system to facilitate annotating RGBD indoor scenes.
Generating high-quality ground truth data with rich annotations:
• Object segments
• Object labels
• 3D geometry
• 3D structure
On Going Work
The major bottleneck lies in the manual operations:
• Drawing scribbles
• Refining box proxies
• Typing labels
• Specifying structure
Incorporate inference algorithms and 3D structure analysis to reduce the manual burden on the user.
THANK YOU!