Transcript Slides

Yunhai Wang1, Minglun Gong1,2, Tianhua Wang1,3, Hao (Richard) Zhang4, Daniel Cohen-Or5, Baoquan Chen1,6
1 Shenzhen Institutes of Advanced Technology
2 Memorial University of Newfoundland
3 Jilin University
4 Simon Fraser University
5 Tel-Aviv University
6 Shandong University
 One of the most fundamental tasks in shape
analysis
 Low-level cues (minima rule; convexity) alone
are insufficient
2/40
Learning segmentation [Kalogerakis et al. 10]
Unsupervised co-analysis [Sidi et al. 2011]
Joint segmentation [Huang et al. 2011]
Active co-analysis [Wang et al. 2012]
Keys to success: amount & quality of
labelled or unlabelled 3D data
3/40
 How many 3D models of strollers, golf carts, gazebos, …?
 Not enough 3D models = insufficient knowledge
 Labeling 3D shapes is also a non-trivial task
380 labeled meshes over 19 object
categories
4/40
About 14 million images across almost 22,000 object
categories
Labeling images is quite a bit easier than labeling 3D
shapes
5/40
Real-world 3D models (e.g., those from Trimble
Warehouse) are often imperfect
Self-intersecting; non-manifold
Incomplete
6/40
 Treat a 3D shape as a set of projected binary
images
 Label these images by
learning from vast amount of
image data
 Then propagate the image
labels to the 3D shape
 Alleviate various data
artifacts in 3D, e.g., self-intersections
7/40
 Joint image-shape analysis via projective
analysis for semantic 3D segmentation
 Utilize vast amount of available image data
 Allowing us to analyze imperfect 3D shapes
8/40
 Bi-class Symmetric Hausdorff distance = BiSH
 Designed for matching 1D binary images
 More sensitive to topology changes (holes)
 Caters to our needs: part-aware label transfer
9/40
Many works on 2D-3D fusion, e.g., for reconstruction [Li et al. 11]
Image-guided 3D modeling [Xu et al. 11]
10/40
Image-space simplification error [Lindstrom and Turk 10]
Light field descriptor for 3D shape retrieval [Chen et al. 03]
We deal with the higher-level and more
delicate task of semantic 3D segmentation
11/40
 PSA for 3D shape segmentation
 Region-based binary shape matching
 Results and conclusion
12/40
Labeling involves GrabCut and some user assistance
13/40
 Assume all objects are upright oriented; they mostly are!
 Project an input 3D shape from multiple pre-set
viewpoints
14/40
 For each projection of the input 3D shape, retrieve top
matches from the set of labelled images
15/40
 Select top (non-adjacent) projections with the smallest
average matching costs for label transfer
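A minimal sketch of this selection step, under assumptions the slide does not state: the pre-set viewpoints are indexed around a ring, "adjacent" means neighbouring indices on that ring, and selection is greedy by average matching cost. The name `select_projections` and its parameters are hypothetical.

```python
def select_projections(avg_costs, k):
    """Greedily pick up to k views with the lowest average matching cost,
    skipping any view that neighbours an already chosen one.

    avg_costs[i] is the average matching cost of view i; views are assumed
    to lie on a ring, so view 0 and the last view are neighbours.
    """
    n = len(avg_costs)
    order = sorted(range(n), key=lambda i: avg_costs[i])
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        # Ring distance between view i and every chosen view must exceed 1.
        if all(min((i - j) % n, (j - i) % n) > 1 for j in chosen):
            chosen.append(i)
    return chosen


costs = [0.1, 0.2, 0.9, 0.3, 0.8, 0.15]
print(select_projections(costs, 3))
```

With only six views on a ring, fewer than `k` views may survive the non-adjacency test, which is why the function returns "up to" `k` indices.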
16/40
 Label transfer is done per pair of
corresponding horizontal slabs (details later …)
 Pixel correspondence is
straightforward
17/40
 Label transfer is weighted by a confidence value per pixel
 Three terms based on image-level, slab-level,
and pixel-level similarity: more similar = higher
confidence
18/40
 Probabilistic map over input 3D shape: computed by
integrating per-pixel confidence values over each shape
primitive
 One primitive projects to multiple pixels in multiple
images
 Per-pixel confidence gathered over multiple retrieved
images
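The accumulation described above could be sketched as follows. This is an illustration only: the flat per-pixel arrays, the function name `probabilistic_map`, the row normalisation, and the uniform fallback for unseen primitives are all assumptions, not details given on the slide.

```python
import numpy as np

def probabilistic_map(prim_ids, labels, confidences, n_prims, n_labels):
    """Accumulate per-pixel label confidences onto shape primitives.

    prim_ids[i]    : index of the primitive (e.g. a triangle) visible at pixel i
    labels[i]      : label transferred to pixel i
    confidences[i] : confidence of that transfer
    Pixels from all projections and all retrieved images are concatenated.
    """
    prim_ids = np.asarray(prim_ids, dtype=int)
    labels = np.asarray(labels, dtype=int)
    acc = np.zeros((n_prims, n_labels))
    # Unbuffered add, so repeated (primitive, label) pairs are all summed.
    np.add.at(acc, (prim_ids, labels), np.asarray(confidences, dtype=float))
    totals = acc.sum(axis=1, keepdims=True)
    seen = totals > 0
    safe = np.where(seen, totals, 1.0)
    # Assumed: normalise each row; unseen primitives get a uniform distribution.
    return np.where(seen, acc / safe, 1.0 / n_labels)


pmap = probabilistic_map([0, 0, 1], [0, 1, 1], [0.75, 0.25, 0.5], 3, 2)
print(pmap)
```

Each row of the result is a label distribution for one primitive, which is the kind of per-primitive cost a multi-label graph cut can consume.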
19/40
 Final labeling of 3D shape: multi-label alpha expansion
graph cuts based on the probabilistic map
20/40
 PSA for 3D shape segmentation
 Region-based binary shape matching
 Results and conclusion
21/40
…
Projections of input 3D shape
…
Database of (labeled) images
 Goal: find shapes most suitable for label transfer and
FAST!
 Not a global visual similarity based retrieval
 Want part-aware label transfer but cannot reliably
segment
 Characteristics of the data to be matched
 Possibly complex topology (lots of holes), not just a contour
Classical descriptors, e.g., shape context, interior distance shape context (IDSC), GIST, Zernike moments, Fourier descriptors, etc., do not quite fulfill our needs
22/40
 All upright oriented: to be exploited
Takes advantage of upright orientation
23/40
 Cluster scan-lines into a smaller number of slabs: efficiency!
 Hierarchical clustering by a distance between adjacent
slabs
Classical choice for distance:
symmetric Hausdorff (SH)
But not sensitive to topology
changes; not part-aware
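A rough sketch of bottom-up slab clustering, with several assumptions: each scan-line starts as its own slab, the closest pair of adjacent slabs is merged until a target count is reached, and the distance between adjacent slabs is measured between the two rows that meet at their boundary. The names `cluster_scanlines` and `xor_dist` are hypothetical, and `xor_dist` is only a placeholder row distance, not SH.

```python
def xor_dist(r, s):
    """Placeholder row distance: number of differing pixels."""
    return sum(x != y for x, y in zip(r, s))

def cluster_scanlines(rows, n_slabs, row_dist=xor_dist):
    """Merge adjacent scan-lines into n_slabs slabs.

    rows is a top-to-bottom list of 1D binary sequences.
    Returns (start, end) row ranges with end exclusive.
    """
    slabs = [(i, i + 1) for i in range(len(rows))]
    while len(slabs) > n_slabs:
        # Assumed slab distance: last row of the upper slab vs. first row of the lower.
        gaps = [row_dist(rows[slabs[k][1] - 1], rows[slabs[k + 1][0]])
                for k in range(len(slabs) - 1)]
        k = min(range(len(gaps)), key=gaps.__getitem__)
        # Replace the closest adjacent pair by their union.
        slabs[k:k + 2] = [(slabs[k][0], slabs[k + 1][1])]
    return slabs


rows = [[0, 1, 1, 0]] * 3 + [[1, 1, 1, 1]] * 2
print(cluster_scanlines(rows, 2))
```

Passing a different `row_dist` changes where slab boundaries fall, which is the lever the choice of distance controls.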
24/40
C vs. B: SH(C,B)=2, SH(Cc,Bc)=2
A vs. B: SH(A,B)=2, SH(Ac,Bc)=10
SH for only one class may not be topology-sensitive
A bi-class SH distance is!
25/40
C vs. B: SH(C,B)=2, SH(Cc,Bc)=2, BiSH(C,B) = 2
A vs. B: SH(A,B)=2, SH(Ac,Bc)=10, BiSH(A,B) = 10
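The two examples suggest that BiSH takes the larger of the SH distance on the foreground and the SH distance on the background (complement). A minimal sketch on 1D binary rows, assuming that max-based definition; the function names are hypothetical and the test rows are made up, not taken from the slide figure.

```python
import numpy as np

def _directed(p, q):
    """Largest distance from a point of p to its nearest point of q."""
    return max(np.min(np.abs(q - x)) for x in p)

def sh(a, b):
    """Symmetric Hausdorff distance between the nonzero pixels of two 1D rows."""
    p, q = np.flatnonzero(a), np.flatnonzero(b)
    if len(p) == 0 and len(q) == 0:
        return 0.0
    if len(p) == 0 or len(q) == 0:
        return float("inf")  # assumed convention for an empty class
    return float(max(_directed(p, q), _directed(q, p)))

def bish(a, b):
    """Assumed bi-class SH: max of SH on foreground and SH on background."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return max(sh(a, b), sh(~a, ~b))


solid = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
holed = [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
print(sh(holed, solid), bish(holed, solid))
```

In this toy case the hole barely moves the foreground distance, while the background distance jumps because the hole's pixels are far from any background pixel of the solid row.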
26/40
BiSH is more part-aware: new slabs near part boundaries
[Figure panels: BiSH, SH]
27/40
 Slabs are scaled/warped vertically for better alignment
 Another measure to encourage part-aware label transfer
Warp
Slabs of labeled image warped to
better align with slabs in projected
image
Recolor
Slabs recolored: many-to-one slab
matching possible
28/40
 Dissimilarity between slabs: BiSH scaled by slab
height
 Slab matching allows linear warp: optimized by a
dynamic time warping (DTW) algorithm
 Dissimilarity between images: sum over slab
dissimilarity after warped slab matching
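A sketch of how the image-level dissimilarity could be accumulated with DTW. It assumes a precomputed matrix `cost[i][j]` holding the dissimilarity between slab i of one image and slab j of the other (for instance BiSH scaled by slab height), and uses the textbook DTW recurrence; any extra warping constraints in the actual method are not given on the slide. The name `dtw_cost` is hypothetical.

```python
import numpy as np

def dtw_cost(cost):
    """Minimum total cost of a monotone slab alignment (standard DTW).

    cost[i][j] is the dissimilarity between slab i of the projected image
    and slab j of the labeled image, both ordered top to bottom.
    """
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A step may advance either image or both, so one slab
            # can be matched to several slabs of the other image.
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


print(dtw_cost([[0, 5], [0, 5], [5, 0]]))
```

In the printed example the first two slabs of the three-slab image both align with the first slab of the two-slab image, a many-to-one match.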
29/40
 PSA for 3D shape segmentation
 Region-based binary shape matching
 Results and conclusion
30/40
 Same inputs, training data (we project), and experimental
setting
 Models in [Kalogerakis et al. 2010]: manifold, complete, no self-intersections
 PSA allows us to handle any category and imperfect shapes
 11 object categories; about 2600 labeled images
 All input 3D shapes tested have self-intersections
as well as other data artifacts
32/40
Pavilion (465 pieces)
Bicycle (704 pieces)
33/40
34/40
 Matching two images (512 x 512) takes 0.06
seconds
 Label transfer (2D-to-2D then to 3D): about 1
minute for a 20K-triangle mesh
 Number of selected projections: 5 – 10
 Number of retrieved images per projection: 2
35/40
 Projective shape analysis (PSA): semantic
3D segmentation by learning from labeled
2D images
 Demonstrated potential in labeling 3D
models: imperfect, complex topology, over
any category
36/40
 Utilize the rich availability and ease of
processing of photos for 3D shape analysis
 No strong requirements on quality of 3D model
37/40
 Inherent limitation of 2D
projections: they do not fully
capture 3D info
 Inherent to data-driven: knowledge has to be in
data
 Relying on spatial and not feature-space
analysis
 Assuming upright; not designed for articulated
shapes
38/40
 Labeling 2D images is still tedious:
unsupervised projective analysis
 Additional cues from images and projections,
e.g., color, depth, etc.
 Apply PSA for other knowledge-driven
analyses
39/40
More results and data can be found
at http://web.siat.ac.cn/~yunhai/psa.html
40/40