Joint Depth Map and Color Consistency
Estimation for Stereo Images with
Different Illuminations and Cameras
Yong Seok Heo, Kyoung Mu Lee and Sang Uk Lee
IEEE Transactions on Pattern Analysis and Machine
Intelligence 2012
Overview
• Introduction
• Related Work
• Algorithm
• Experimental Results
Introduction (1/3)
• Image color values can easily be affected by
radiometric variations, including global intensity
changes, local intensity changes (caused by
varying light, vignetting, and non-Lambertian
surfaces), and noise.
• For stereo matching, most algorithms assume
radiometrically calibrated images
• However, there exist many real and practical
situations or challenging applications in which
radiometric variations between stereo images are
inevitable.
Introduction (2/3)
• For example: 3D reconstruction of aerial
images [1], general multiview stereo [4], 3D
modeling with internet photos (e.g. Photo
Tourism [5] and Photosynth [6]), and
PhotoModeler [7], etc.
Introduction (3/3)
• In general, color consistency and stereo matching
are a chicken-and-egg problem.
• Color consistency can enhance the performance
of stereo matching, while accurate disparity maps
can improve the color consistency or constancy.
• In this paper, a new iterative framework is
proposed that infers both accurate disparity maps
and color-consistent images for radiometrically
varying stereo images.
Related Work
• Stereo Matching:
– Census transform (7 × 7) [41]
– Mutual Information (MI) [22]
– Adaptive Normalized Cross Correlation (ANCC)
[15]
• Color Consistency:
– Color Histogram Equalization (CHE) [39]
Non-parametric Local Transforms for
Computing Visual Correspondence [41]
R. Zabih and J. Woodfill, in Proc. European
Conference on Computer Vision, 1994.
• 7 × 7 windows.
• Rank Transform:
– Non-parametric measure of local intensity.
– Using the number of pixels in the local region whose
intensity is less than the center pixel.
• Census Transform:
– Non-parametric summary of local image structure.
– Encodes, as a bit string, the set of neighboring pixels
whose intensity is less than the center pixel; costs are
compared via the Hamming distance.
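As an illustration of the census cost, a minimal NumPy sketch of the 7 × 7 census transform and its Hamming-distance matching cost (function names are mine, not from [41]):

```python
import numpy as np

def census_transform(gray, win=7):
    """7x7 census transform: for each pixel, encode which of the 48
    neighbors are darker than the center as a 48-bit code
    (illustrative sketch, not the authors' implementation)."""
    h, w = gray.shape
    r = win // 2
    padded = np.pad(gray, r, mode='edge')
    census = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue                                  # skip the center pixel
            neighbor = padded[r + dy:r + dy + h, r + dx:r + dx + w]
            census = census * np.uint64(2) + (neighbor < gray).astype(np.uint64)
    return census

def hamming_cost(census_left, census_right):
    """Matching cost between census codes = Hamming distance."""
    x = np.bitwise_xor(census_left, census_right)
    cost = np.zeros(x.shape, dtype=np.uint8)
    for _ in range(48):                                   # count the set bits
        cost += (x & np.uint64(1)).astype(np.uint8)
        x >>= np.uint64(1)
    return cost
```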
Stereo Processing by Semiglobal
Matching and Mutual Information [22]
H. Hirschmuller, IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008.
• $P_{I_1,I_2}$ : joint probability distribution of corresponding intensities in images $I_1$ and $I_2$
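For illustration, a rough NumPy sketch of computing mutual information from the joint histogram $P_{I_1,I_2}$ of two registered images; the actual SGM method additionally smooths the distribution and converts MI into a per-pixel cost via a Taylor expansion:

```python
import numpy as np

def mutual_information(I1, I2, bins=256):
    """Mutual information of two registered 8-bit images from their
    joint histogram P_{I1,I2} (rough sketch only)."""
    joint, _, _ = np.histogram2d(I1.ravel(), I2.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    P = joint / joint.sum()                # joint pdf P_{I1,I2}
    P1 = P.sum(axis=1, keepdims=True)      # marginal of I1
    P2 = P.sum(axis=0, keepdims=True)      # marginal of I2
    nz = P > 0                             # avoid log(0)
    return np.sum(P[nz] * np.log(P[nz] / (P1 @ P2)[nz]))
```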
Robust Stereo Matching Using Adaptive
Normalized Cross-Correlation [15]
Y. S. Heo, K. M. Lee, and S. U. Lee, IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 33, no. 4, pp. 807–822, 2011.
• NCC uses the whole window information around
the matching pixels to compute the mean and
standard deviation.
• ANCC additionally weights the pixels around the
matching points using the bilateral filter.
Log-chromaticity Color Space [15](1/2)
• 𝜌(𝑝) : brightness factor
• 𝑎𝑘 : illuminant color factor of channel 𝑘
• 𝛾 : gamma correction factor
[15] Y. S. Heo, K. M. Lee, and S. U. Lee, “Robust stereo matching using adaptive
normalized cross-correlation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 33, no. 4, pp. 807–822, 2011.
Log-chromaticity Color Space (2/2)
• 𝛾 and 𝐶𝑘 are constant for each channel k
• 𝐿𝑘 (𝑝) is an invariant color value for pixel p
under radiometric variations.
• M : multiplication factor which is set to 500.
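A minimal sketch of the log-chromaticity transform, assuming each channel is divided by the geometric mean of the three channels (which cancels the brightness factor $\rho(p)$) before taking the log; the exact offsets and scaling used in [15] may differ:

```python
import numpy as np

def log_chromaticity(img, M=500.0, eps=1e-6):
    """Log-chromaticity values: divide each channel by the geometric mean
    of the three channels, then take the log so that gamma and illuminant
    changes act as per-channel linear transforms (sketch following the
    model in [15])."""
    rgb = img.astype(np.float64) + eps          # avoid log(0)
    geo_mean = np.cbrt(rgb[..., 0] * rgb[..., 1] * rgb[..., 2])
    L = np.log(rgb / geo_mean[..., None])       # invariant color value L_k(p)
    return M * L                                # multiplication factor M = 500
```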
SIFT
• The SIFT descriptor for each pixel is computed
with this color value 𝐼′𝑘 for each channel 𝑘
and with the intensity (gray) value 𝐼𝑔 of the
original image $I$. We denote the resulting SIFT
descriptors by $v_k$ and $v_g$, respectively.
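A hedged sketch of computing per-pixel SIFT descriptors with OpenCV on one 8-bit channel; the authors' dense-SIFT implementation and descriptor parameters may differ:

```python
import cv2

def dense_sift_descriptors(channel_8u, patch_size=8):
    """SIFT descriptor at every pixel of one 8-bit single-channel image
    (e.g. a scaled log-chromaticity channel I'_k or the gray image I_g).
    Sketch using OpenCV SIFT on a dense keypoint grid; very slow for
    full-size images."""
    h, w = channel_8u.shape
    keypoints = [cv2.KeyPoint(float(x), float(y), float(patch_size))
                 for y in range(h) for x in range(w)]
    sift = cv2.SIFT_create()
    _, desc = sift.compute(channel_8u, keypoints)
    return desc   # one 128-D row per keypoint, in grid (row-major) order
```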
Joint Probability Density Function (1/2)
• Joint probability density function (pdf)
represents the statistical relationship between
the left and right image color values.
• We compute the joint pdf by means of the SIFT
descriptors rather than pixel values in order to
encode the spatial gradient information.
Joint Probability Density Function (2/2)
• $j_k^R \in J_k^R$, $j_k^L \in J_k^L$ : log-chromaticity values of the right and left images
• $Z$ : normalization constant
• SIFT-weighting factor $u_k(p, q)$:
• $\|\cdot\|$ : Euclidean distance
• $v_k^L(p)$, $v_k^R(q)$ : SIFT descriptors
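A sketch of accumulating the SIFT-weighted joint pdf over corresponding pixels, assuming the usual rectified correspondence $q = (y, x - f_p)$ and a Gaussian-type weight on the descriptor distance; the exact form of $u_k(p,q)$ and the binning follow the paper's equations:

```python
import numpy as np

def joint_pdf_sift(JL, JR, VL, VR, disparity, bins=256, sigma=1.0):
    """Joint pdf of left/right log-chromaticity values, accumulated over
    corresponding pixels and weighted by SIFT similarity u_k(p,q).
    JL, JR: log-chromaticity images already quantized to ints in [0, bins);
    VL, VR: (h, w, 128) arrays of per-pixel SIFT descriptors."""
    h, w = JL.shape
    hist = np.zeros((bins, bins))
    for y in range(h):
        for x in range(w):
            xr = x - int(disparity[y, x])       # corresponding right pixel
            if xr < 0 or xr >= w:
                continue
            u = np.exp(-np.linalg.norm(VL[y, x] - VR[y, xr]) / sigma)
            hist[int(JL[y, x]), int(JR[y, xr])] += u
    return hist / hist.sum()                    # normalization constant Z
```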
Linear Function Estimation
• Since the log-chromaticity color space is linear, a
linear function can be fitted to the joint pdf.
• To find the linear functions, we use the Huber
distance function $\rho(r)$:
• $r$ : the distance between the line and the point
• $C$ : the constant threshold parameter ($C = 1.345$)
• Using the OpenCV function ‘cvFitLine()’.
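Below is the standard Huber function with C = 1.345 and the modern Python equivalent of cvFitLine with the Huber distance; the 'points' array is a toy example, not data from the paper:

```python
import cv2
import numpy as np

C = 1.345  # Huber threshold parameter

def huber(r, c=C):
    """Huber distance: quadratic near zero, linear for large residuals,
    which limits the influence of outliers."""
    r = np.abs(r)
    return np.where(r <= c, 0.5 * r ** 2, c * (r - 0.5 * c))

# Robust line fit to (j_L, j_R) samples drawn from the joint pdf
# (toy points for illustration only).
points = np.array([[10, 12], [20, 24], [30, 35], [40, 47]], dtype=np.float32)
vx, vy, x0, y0 = cv2.fitLine(points, cv2.DIST_HUBER, 0, 0.01, 0.01).ravel()
slope, intercept = vy / vx, y0 - (vy / vx) * x0   # j_R ≈ slope * j_L + intercept
```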
Disparity Map Estimation (1/9)
• 𝐸𝑑 𝑓 : data energy
• 𝐸𝑠 𝑓 : smoothness energy
• N : local four-neighborhood system
• 𝐷𝑝 𝑓𝑝 : the data cost that encodes the penalty for the
dissimilarity of corresponding pixels
• 𝑉𝑝𝑞 𝑓𝑝 , 𝑓𝑞 : the smoothness cost that penalizes the
discontinuity of disparities between neighboring pixels.
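• Consistent with these definitions, the energy takes the usual pairwise-MRF form (a sketch): $E(f) = E_d(f) + E_s(f) = \sum_p D_p(f_p) + \sum_{(p,q)\in N} V_{pq}(f_p, f_q)$, which is minimized by graph cuts [34].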
[34] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,”
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
Disparity Map Estimation (2/9)
• We combine the mutual information and the
SIFT descriptor in our data cost [11], [12].
• More weight is given to $v_k$ than to $v_g$ to
further emphasize features in the log-chromaticity color values.
• The SIFT term plays an important role in the first
iteration.
Disparity Map Estimation (3/9)
• $m_k(\cdot,\cdot)$ and $m_g(\cdot,\cdot)$ are the mutual information
(MI) terms for the log-chromaticity color $J_k$ and the
original gray images $I_g$, computed using $P_k^{SIFT}(\cdot,\cdot)$.
Disparity Map Estimation (4/9)
• In (10), $\lambda_p$ is the adaptive weighting factor of a
pixel p between $\alpha_p(f_p)$ and $\beta_p(f_p)$:
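• A form consistent with this description is the convex combination $D_p(f_p) = \lambda_p\,\alpha_p(f_p) + (1-\lambda_p)\,\beta_p(f_p)$, where $\alpha_p$ denotes the MI-based term and $\beta_p$ the SIFT-based term (a sketch; the exact definition is Eq. (10) of the paper).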
[35] X. Hu and P. Mordohai, “Evaluation of stereo
confidence indoors and outdoors,” in Proc. IEEE Conference on
Computer Vision and Pattern Recognition, 2010.
Disparity Map Estimation (5/9)
Disparity Map Estimation (6/9)
• MI tends to make some regions over-smooth, while SIFT can
also blur some boundaries and is weak at textureless regions.
• We incorporate segment-based plane-fitting constraints [36],
[37] to produce a sharper and more accurate disparity map.
• In this framework, the disparity f is parameterized as
$f = ax + by + c$.
• Before extracting the 3-D plane parameters for each segment,
the mean-shift segmentation method [38] is applied to the
left and right original color images independently.
[36] L. Hong and G. Chen, “Segment-based stereo matching using graph cuts,” in Proc.
IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[37] J. Sun, Y. Li, S. B. Kang, and H.-Y. Shum, “Symmetric stereo matching for occlusion
handling,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[38] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,”
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.
Disparity Map Estimation (7/9)
• In the first step, initial plane-fitting is
performed using only reliable pixels for
each segment.
• This solution is sensitive to outliers. Hence,
using the estimated plane parameters, we re-compute the
disparity value $f$ in the range $[f - \delta, f + \delta]$ for each pixel:
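A minimal least-squares sketch of fitting $f = ax + by + c$ to the reliable pixels of one segment; illustrative only, since the paper's reliable-pixel selection and the re-estimation within $[f - \delta, f + \delta]$ follow its own equations. Here xs, ys are the pixel coordinates of the segment's reliable pixels and fs their current disparities:

```python
import numpy as np

def fit_plane(xs, ys, fs):
    """Least-squares plane f = a*x + b*y + c over the reliable pixels of
    one mean-shift segment (sketch; the refined plane-fitting of the next
    slide further improves the parameters)."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    (a, b, c), *_ = np.linalg.lstsq(A, fs, rcond=None)
    return a, b, c
```

The fitted plane then predicts a disparity a*x + b*y + c for every pixel of the segment, and each pixel's disparity is re-estimated only within the range $[f - \delta, f + \delta]$ around its current value.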
Disparity Map Estimation (8/9)
• After the initial plane-fitting, we perform a
refined plane-fitting scheme to find a more
accurate plane parameter $\pi_s^{opt}$ for each
segment s.
• 𝑁𝑟 : the number of reliable pixels
• 𝑁𝑠 : the number of pixels that have the same
disparities as the disparity map obtained from
(8).
Disparity Map Estimation (9/9)
• $\omega$ : weighting factor
• $C_O$ : constant occlusion cost
• Using (17) and (18), the total energy $E(f)$ in (8)
is minimized by the graph-cuts expansion
algorithm [34].
[34] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,”
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
Fast approximate energy
minimization via graph cuts [34]
Y. Boykov, O. Veksler, and R. Zabih, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
• The minimum cut problem is to find the
cut with smallest cost. There are
numerous algorithms for this problem
with low-order polynomial complexity.
• Two types of moves are used to refine the labeling:
swap and expansion.
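For intuition, a sketch of the binary subproblem that a single expansion move solves, using the third-party PyMaxflow library (my choice of tool, not the authors'): every pixel either keeps its current label or switches to $\alpha$, and the optimal joint decision is found exactly with one min-cut.

```python
import numpy as np
import maxflow   # PyMaxflow (third-party wrapper)

h, w = 64, 64
cost_keep   = np.random.rand(h, w)   # hypothetical unary cost of keeping the current label
cost_switch = np.random.rand(h, w)   # hypothetical unary cost of switching to alpha

g = maxflow.Graph[float]()
nodes = g.add_grid_nodes((h, w))
g.add_grid_edges(nodes, weights=1.0, symmetric=True)   # 4-neighborhood smoothness
g.add_grid_tedges(nodes, cost_keep, cost_switch)       # terminal (unary) edges
g.maxflow()                                            # minimum cut = optimal binary move
labels = g.get_grid_segments(nodes)   # boolean partition: one side keeps the
                                      # current label, the other switches to alpha
```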
Occlusion Map Estimation [37]
• We define $B^L(p; f^R) \in \{0, 1\}$, which is a
binary map for the left image.
• $B^L(\cdot\,; f^R)$ is computed by warping $f^R$ to the
left image, and assigning ‘0’ to visited pixels
and ‘1’ to non-visited pixels.
• $\beta_d$ and $\beta_s$ : the weighting factors, set to 4.0
and 1.4, respectively.
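A sketch of computing $B^L$ by forward-warping the right disparity map, assuming the rectified convention $x_L = x_R + f^R(x_R)$:

```python
import numpy as np

def occlusion_map_left(f_R):
    """Binary map B_L(p; f_R): warp the right disparity map to the left
    view and mark pixels that no right pixel lands on as '1' (occluded).
    Illustrative sketch only."""
    h, w = f_R.shape
    B_L = np.ones((h, w), dtype=np.uint8)        # '1' = not visited
    for y in range(h):
        for xr in range(w):
            xl = xr + int(round(f_R[y, xr]))     # where this right pixel lands
            if 0 <= xl < w:
                B_L[y, xl] = 0                   # '0' = visited
    return B_L
```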
[37] J. Sun, Y. Li, S. B. Kang, and H.-Y. Shum, “Symmetric stereo matching for occlusion
handling,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2005.
Color Histogram Equalization (CHE)
• The color histogram equalization (CHE) method
was proposed in [39].
• 𝑁𝑇 : the total number of pixels in the image
• 𝐼𝑚𝑎𝑥 = 255 : the maximum value of color
• If $P(I_k \le I_k(p))$ is invariant under any illumination
change, then $I_k^{inv}(p)$ is also invariant.
• This method is stable only under global radiometric
changes such as exposure or gamma changes.
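A minimal sketch of the CHE formula $I_k^{inv}(p) = I_{max}\,P(I_k \le I_k(p))$, applied per channel to an 8-bit color image:

```python
import numpy as np

def color_histogram_equalization(img, I_max=255):
    """CHE [39]: replace each value by I_max * P(I_k <= I_k(p)), i.e.
    per-channel histogram equalization via the empirical CDF.
    img: 8-bit color image of shape (h, w, 3)."""
    out = np.empty_like(img, dtype=np.float64)
    N_T = img.shape[0] * img.shape[1]            # total number of pixels
    for k in range(img.shape[2]):                # each channel independently
        hist, _ = np.histogram(img[..., k], bins=256, range=(0, 256))
        cdf = np.cumsum(hist) / N_T              # P(I_k <= value)
        out[..., k] = I_max * cdf[img[..., k]]
    return out
```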
Stereo Color Histogram
Equalization (SCHE)
• Where $W(J_k^L(p))$ is the transformed value of
$J_k^L(p)$ using the estimated linear function from
the joint pdf.
Boosting the Disparity Map Estimation
Using SCHE Images
• The color consistency of the SCHE images is
based on the accurate disparity map estimation.
• Conversely, the estimation of the disparity map
can benefit from the color-consistent SCHE
stereo images.
Experimental Results
• We use various test images with ground-truth
disparity maps, such as the Aloe, Dolls, Moebius, Art,
Laundry, Reindeer, Rocks1, and Cloth4 datasets in
[40], which have different radiometric variations.
• Each data set has three different camera exposures
(0~2) and three different configurations of the
light source (1~3).
• The total running time of our method for Aloe
images (size : 427 × 370, disparity range : 0-70),
for example, is about 4 minutes on a PC with
Intel(R) Core(TM) i7-2600K 4.5GHz CPU.
[40] “http://vision.middlebury.edu/stereo/,” 2012.
Experimental Results for SCHE
Color consistency performance
• Computed the RMSE [39] for images where each
method is performed individually on the left and
right original images.
• ‘CHE1’ means RMSE for the
stereo images after individually
performing CHE using original
input stereo images.
• ‘CHE2’ means RMSE for the
stereo images after individually
performing CHE using stereo
images in the log-chromaticity
color space.
[39] G. Finlayson, S. Hordley, G. Schaefer, and G. Y. Tian, “Illuminant and device invariant
colour using histogram equalisation,” Pattern Recognition, vol. 38, no. 2, pp. 179–190, 2005.
Effects of SIFT Weight in MI Computation
• We removed both the SIFT term ($\beta_p(\cdot)$) in
(10) and the plane-fitting constraint in (17)
from our data cost.
Effects of Adaptive Weight
• We compared the results using the preliminary data cost in
(10) by varying $\lambda_p$, which is fixed to the same value for
all pixels.
Stereo Matching Performance Comparison
• Census transform [41]
• Mutual Information (MI) [22]
• Normalized Cross Correlation (NCC) and
Adaptive Normalized Cross Correlation
(ANCC) [15].
• To evaluate the effects of exposure changes,
we only changed the index of exposure.
Stereo Matching Performance Comparison
Different Configurations of the Light Source
• We changed only the index of the light
configuration while fixing the exposure.
Tests for Scenes With Different Cameras
• The left images were taken by Canon IXUS
870 IS, and the right images were taken by
Sony Cyber-shot DSC-W570.
• The left images were taken with flash, while
the right images were taken without flash.
• The exposure times of the left and right images
were set to 1/60 sec and 1/100 sec, respectively.