Automatic scene inference for 3D object compositing

Automatic scene inference for 3D object compositing
Kevin Karsch (UIUC), Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Hailin Jin, Rafael Fonte, Michael Sittig, David Forsyth
What is this system
Image editing system
Drag-and-drop object insertion
Place objects in 3D and relight
Fully automatic recovery of a comprehensive 3D scene model: geometry, illumination, diffuse albedo, and camera parameters
• From a single low dynamic range (LDR) image
Existing problems
• It’s the artist’s job to create photorealistic
effects by recognizing the physical space
• Lighting, shadow, perspective
• Need: camera parameters, scene geometry,
surface materials, and sources of illumination
What this system cannot handle
• Works best when scene lighting is diffuse; therefore it generally works better indoors than outdoors
• Errors in geometry, illumination, or materials may be visually prominent
• Does not handle object insertion behind existing scene elements
Key components
• Illumination inference: recovers a full lighting model, including light sources not directly visible in the photograph
• Depth estimation: combines data-driven depth transfer with geometric reasoning about the scene layout
How to do this
• Need: geometry, illumination, surface materials
• Even though the estimates are coarse, the composites still look realistic, because even large changes in lighting are often not noticeable to human observers
Indoor/outdoor scene classification
K-nearest-neighbor matching of GIST features
Indoor dataset: NYUv2
Outdoor dataset: Make3D
Different training images and classifiers are chosen depending on the indoor/outdoor classification
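
A minimal sketch of this step in Python, assuming a GIST-style descriptor over an RGB image; the `gist_like` function below is a crude gradient-energy stand-in, not the paper's GIST implementation:

```python
import numpy as np
import cv2
from sklearn.neighbors import KNeighborsClassifier

def gist_like(img, grid=4, bins=8):
    """Crude stand-in for a GIST descriptor: oriented gradient energy
    pooled on a grid. A real GIST implementation would use Gabor
    filters at several scales and orientations."""
    g = cv2.cvtColor(cv2.resize(img, (128, 128)), cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    step = 128 // grid
    cells = []
    for i in range(grid):
        for j in range(grid):
            m = mag[i*step:(i+1)*step, j*step:(j+1)*step]
            a = ang[i*step:(i+1)*step, j*step:(j+1)*step]
            hist, _ = np.histogram(a, bins=bins, range=(0, 2*np.pi), weights=m)
            cells.append(hist)
    v = np.concatenate(cells).astype(np.float32)
    return v / (np.linalg.norm(v) + 1e-8)

# Labels: 1 = indoor (NYUv2 images), 0 = outdoor (Make3D images).
# k = 5 is an assumption; the paper does not specify it here.
# knn = KNeighborsClassifier(n_neighbors=5).fit(
#     np.stack([gist_like(im) for im in train_images]), train_labels)
# is_indoor = bool(knn.predict([gist_like(query)])[0])
```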
Single image reconstruction
• Camera parameters, geometry
– Focal length f, camera center (cx, cy) and extrinsic
parameters are computed from three orthogonal
vanishing points detected in the scene
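
A sketch of the standard construction for this step: with square pixels and zero skew, the principal point is the orthocenter of the triangle formed by the three orthogonal vanishing points, and the focal length follows from the orthogonality constraint. The paper's exact estimator may differ in its numerical details:

```python
import numpy as np

def camera_from_vps(v1, v2, v3):
    """Three finite, mutually orthogonal vanishing points (pixel coords)
    give the constraint (vi - c).(vj - c) + f^2 = 0 for each pair, so
    the principal point c is the orthocenter of the VP triangle and
    f^2 = -(v1 - c).(v2 - c)."""
    v1, v2, v3 = (np.asarray(v, float) for v in (v1, v2, v3))
    # Orthocenter: intersect two altitudes,
    # (v3 - v2).(c - v1) = 0 and (v1 - v3).(c - v2) = 0.
    A = np.stack([v3 - v2, v1 - v3])
    b = np.array([(v3 - v2) @ v1, (v1 - v3) @ v2])
    c = np.linalg.solve(A, b)
    f = np.sqrt(max(-((v1 - c) @ (v2 - c)), 0.0))
    return c, f

# Usage with hypothetical vanishing-point coordinates:
# c, f = camera_from_vps((900.0, 420.0), (-350.0, 400.0), (310.0, 2400.0))
```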
Surface materials
• Per-pixel diffuse material albedo and shading via the Color Retinex method
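
A minimal Retinex-style decomposition sketch. The thresholds, the grayscale-shading assumption, and the periodic-boundary Poisson solver are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def poisson_solve(gx, gy):
    """Integrate a gradient field by solving a Poisson equation in the
    Fourier domain (periodic boundary; adequate for a sketch)."""
    h, w = gx.shape
    div = np.zeros((h, w))
    div[:, 1:] += gx[:, 1:] - gx[:, :-1]
    div[1:, :] += gy[1:, :] - gy[:-1, :]
    fx = np.fft.fftfreq(w)[None, :]
    fy = np.fft.fftfreq(h)[:, None]
    denom = (2*np.cos(2*np.pi*fx) - 2) + (2*np.cos(2*np.pi*fy) - 2)
    denom[0, 0] = 1.0  # avoid divide-by-zero at the DC term
    u = np.real(np.fft.ifft2(np.fft.fft2(div) / denom))
    return u - u.mean()

def color_retinex(rgb, grad_t=0.10, chrom_t=0.02):
    """rgb: float image in [0, 1], shape (h, w, 3). Log-intensity
    gradients that are small and chromaticity-preserving are treated
    as shading; the rest as albedo edges. Integrating only the shading
    gradients yields the shading image."""
    log_i = np.log(rgb.mean(axis=2) + 1e-4)
    chrom = rgb / (rgb.sum(axis=2, keepdims=True) + 1e-4)
    gx = np.zeros_like(log_i); gy = np.zeros_like(log_i)
    gx[:, 1:] = np.diff(log_i, axis=1)
    gy[1:, :] = np.diff(log_i, axis=0)
    cgx = np.zeros_like(log_i); cgy = np.zeros_like(log_i)
    cgx[:, 1:] = np.linalg.norm(np.diff(chrom, axis=1), axis=2)
    cgy[1:, :] = np.linalg.norm(np.diff(chrom, axis=0), axis=2)
    shade_x = (np.abs(gx) < grad_t) & (cgx < chrom_t)
    shade_y = (np.abs(gy) < grad_t) & (cgy < chrom_t)
    shading = np.exp(poisson_solve(gx * shade_x, gy * shade_y))
    albedo = rgb / (shading[..., None] + 1e-4)
    return albedo, shading
```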
Data-driven depth estimation
• Database: RGB-D images
• Appearance cues for correspondences: multiscale SIFT features
• Incorporate geometric information
Data-driven depth estimation
E_t: depth transfer
E_m: Manhattan world
E_o: orientation
E_3s: spatial smoothness in 3D
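
These terms are combined into a single objective over the depth map D and minimized jointly; a plausible form (the weights λ are assumptions, and the paper's exact formulation may differ) is:

```latex
\min_{D}\; E(D) \;=\; E_t(D) \;+\; \lambda_m E_m(D) \;+\; \lambda_o E_o(D) \;+\; \lambda_{3s} E_{3s}(D)
```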
Scene illumination
Visible sources
• Segment the image into superpixels;
• Then compute features for each superpixel;
– Location in the image
– The 340 features used in Make3D
• Train a binary classifier on annotated data to predict whether or not a superpixel is emitting/reflecting a significant amount of light
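
A sketch of this pipeline with stand-in features: SLIC superpixels and simple position/intensity statistics replace the paper's Make3D feature set, and an SVM stands in for whatever classifier the paper trains:

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.svm import SVC

def superpixel_features(img):
    """img: RGB image. Per-superpixel features: normalized centroid
    plus mean/max intensity (illustrative stand-ins for the paper's
    image-location and Make3D features)."""
    segs = slic(img, n_segments=200, compactness=10)
    h, w = segs.shape
    feats = []
    for s in np.unique(segs):
        mask = segs == s
        ys, xs = np.nonzero(mask)
        pix = img[mask].astype(np.float32)
        feats.append([ys.mean() / h, xs.mean() / w, pix.mean(), pix.max()])
    return segs, np.array(feats)

# Train on superpixels annotated as light sources (1) or not (0):
# clf = SVC(probability=True).fit(train_feats, train_labels)
# prob_light = clf.predict_proba(query_feats)[:, 1]
```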
Out-of-view sources
• Data-driven: annotated SUN360 panorama dataset
• Assumption: if photographs are similar, then
the illumination environment beyond the
photographed region will be similar as well.
Out-of-view sources
• Features: geometric context, orientation maps, spatial pyramids, HSV histograms, and the output of the light classifier
• Matching measures: histogram intersection score, per-pixel inner product
• Similarity metric for IBLs: how similar the rendered canonical objects are
• Ranking function: 1-slack, linear SVM ranking optimization (trained)
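
A sketch of the matching score: histogram intersection for the histogram features, combined by a weighted sum. In the paper the weights come from the trained SVM ranker; here `w` is simply assumed given:

```python
import numpy as np

def hist_intersection(h1, h2):
    """Histogram intersection: sum of elementwise minima of two
    normalized histograms (1.0 = identical)."""
    return np.minimum(h1, h2).sum()

def rank_ibls(query_feats, candidate_feats, w):
    """Rank candidate panoramas/IBLs by a weighted sum of per-feature
    similarity scores. query_feats / each candidate: a list of
    normalized feature histograms; w: weight per feature (assumed)."""
    scores = [w @ np.array([hist_intersection(q, c)
                            for q, c in zip(query_feats, cand)])
              for cand in candidate_feats]
    return np.argsort(scores)[::-1]  # indices, best match first
```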
Relative intensities of the light sources
• Intensity estimation through rendering: adjust source intensities until a rendered version of the scene matches the original image
• Humans cannot distinguish between a range of
illumination configurations, suggesting that there is a
family of lighting conditions that produce the same
perceptual response.
• Among these, simply choose the lighting configuration that renders fastest
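
Because rendered radiance is linear in each source's intensity, the render-and-match step can be cast as nonnegative least squares over per-source unit-intensity renders. This linear recasting is an assumption here, not necessarily the paper's exact optimizer:

```python
import numpy as np
from scipy.optimize import nnls

def solve_intensities(unit_renders, target):
    """Render the scene once per light source at unit intensity, then
    solve  min_w ||A w - target||^2  s.t.  w >= 0  for the per-source
    intensities w, where column i of A is the flattened render of
    source i and target is the flattened original photograph."""
    A = np.stack([r.ravel() for r in unit_renders], axis=1)
    w, _ = nnls(A, target.ravel())
    return w
```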
Physically grounded image editing
• Drag-and-drop insertion
• Lighting adjustment
• Synthetic depth-of-field
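
For the synthetic depth-of-field effect, a layered sketch using the recovered depth map; the layer count, blur strength, and circle-of-confusion model are illustrative assumptions:

```python
import numpy as np
import cv2

def synthetic_dof(img, depth, focal_depth, strength=8.0, n_layers=6):
    """Blur radius grows with the circle of confusion |1/z - 1/z_f|;
    the image is blurred at a few discrete radii and each pixel takes
    the layer matching its depth. img: (h, w, 3); depth: (h, w) in the
    same units as focal_depth."""
    coc = np.abs(1.0 / np.maximum(depth, 1e-3) - 1.0 / focal_depth) * strength
    coc = np.clip(coc, 0.0, strength)
    out = np.zeros_like(img, dtype=np.float32)
    edges = np.linspace(0.0, strength, n_layers + 1)
    for k in range(n_layers):
        radius = 0.5 * (edges[k] + edges[k + 1])
        ksize = 2 * int(radius) + 1  # odd kernel size for GaussianBlur
        blurred = cv2.GaussianBlur(img.astype(np.float32), (ksize, ksize), 0)
        mask = (coc >= edges[k]) & (coc <= edges[k + 1])
        out[mask] = blurred[mask]
    return out.astype(img.dtype)
```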
User study
• Real object, real scene vs. inserted object, real scene
• Synthetic object, synthetic scene vs. inserted object, synthetic scene
• Produces perceptually convincing results