Transcript: PowerPoint presentation
On Building an Accurate Stereo Matching System on Graphics Hardware
Xing Mei; Xun Sun; Mingcai Zhou; Shaohui Jiao; Haitao Wang; Xiaopeng Zhang
Samsung Advanced Institute of Technology, China Lab
2011 IEEE International Conference on Computer Vision Workshops
Outline
• Introduction
• Related Works
• Algorithm
• CUDA Implementation
• Experimental Results
• Conclusion
Introduction
Introduction
• Dense two-frame stereo matching
  • Compute a disparity map from stereo images.
  • Broad applications: 3D reconstruction, view interpolation
Related Works
Related Works
• Local methods
  • Compute each pixel’s disparity independently over a local support region.
  • Fast but inaccurate.
• Global methods
  • Solve the stereo problem as an energy minimization process.
  • Accurate but slow, due to time-consuming global optimizers (GC, BP).
Related Works
• Propagation-based methods
  • Produce quasi-dense or dense disparity results from a set of seed pixels.
  • Relatively fast, but sensitive to early wrong matches.
  • Using segmented regions as the guided propagation unit is computationally expensive.
Related Works • Introduce a simple guided unit for propagation : pixel-wise 1D line segments. • No image segmentation required here.
• Simple, fast and accurate
Algorithm
Algorithm
• Framework
  • Input: stereo images
  • Steps: cost initialization, cost aggregation, scanline optimization, disparity refinement
  • Output: disparity map
Disparity Cost Computing
• Cost measures: AD, BT, gradient-based measures, non-parametric transforms (rank/census [3]), ...
• Combinations: SAD + gradient [6], AD + Census
• AD (absolute difference)
  • Constant color assumption
  • Repetitive structures
• Census
  • Encodes local image structures
  • Textureless regions
[3] H. Hirschmuller and D. Scharstein. “Evaluation of stereo matching costs on images with radiometric differences.” IEEE TPAMI, 31(9), 2009.
[6] A. Klaus, M. Sormann, and K. Karner. “Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure.” ICPR, 2006.
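AD-Census Cost Initialization
• $C(\mathbf{p}, d) = \rho(C_{census}(\mathbf{p}, d), \lambda_{census}) + \rho(C_{AD}(\mathbf{p}, d), \lambda_{AD})$, where $\rho$ is a robust function on variable $c$
• $\mathbf{p}$: pixel
• $d$: disparity level
• $\mathbf{pd} = (x - d, y)$: the corresponding pixel in the right image
• $C_{AD}$: absolute difference between $I^{Left}(\mathbf{p})$ and $I^{Right}(\mathbf{pd})$
• $C_{census}$: Hamming distance [22]
[22] R. Zabih and J. Woodfill. “Non-parametric local transforms for computing visual correspondence.” In Proc. ECCV, 1994.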
Census Transform
• Census transform window (the center pixel, 78, is marked X in the bit pattern):
    121 130  26  31  39
    109 115  33  40  30
     98 102  78  67  45
     47  67  32 170 198
     39  86  99 159 210
• Resulting bit string: 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1
Census Hamming Distance
• Left image:  1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1
• Right image: 1 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1
• XOR:         0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
• Hamming distance = 3
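The two census slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the comparison rule (a neighbor maps to 1 when it is brighter than the center) is an assumption, the 3×3 windows are hypothetical, and only the two 24-bit strings are taken from the Hamming distance slide.

```python
def census_bits(window):
    """Return the census bit list of a square window with odd side length.

    Assumption of this sketch: a neighbor maps to 1 when it is brighter
    than the center pixel; the center itself contributes no bit.
    """
    size = len(window)
    mid = size // 2
    center = window[mid][mid]
    bits = []
    for y, row in enumerate(window):
        for x, value in enumerate(row):
            if y == mid and x == mid:
                continue  # skip the center pixel
            bits.append(1 if value > center else 0)
    return bits


def hamming_distance(bits_a, bits_b):
    """Count the positions where two equal-length bit lists differ (XOR)."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit strings must have the same length")
    return sum(a ^ b for a, b in zip(bits_a, bits_b))


# Hypothetical 3x3 windows, chosen only to illustrate the transform.
left_window = [[90, 20, 30],
               [80, 50, 10],
               [70, 60, 40]]
right_window = [[90, 55, 30],
                [80, 50, 10],
                [45, 60, 40]]
window_distance = hamming_distance(census_bits(left_window),
                                   census_bits(right_window))

# The two bit strings shown on the Hamming distance slide.
slide_left = [int(c) for c in "110001100011000001111111"]
slide_right = [int(c) for c in "111001100101000001111111"]
slide_distance = hamming_distance(slide_left, slide_right)  # 3, as on the slide
```

In a real system the bit string would normally be packed into an integer so that the XOR and bit count run as single machine operations.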
AD-Census Cost Initialization
• $C(\mathbf{p}, d) = \rho(C_{census}(\mathbf{p}, d), \lambda_{census}) + \rho(C_{AD}(\mathbf{p}, d), \lambda_{AD})$
• $\rho(c, \lambda) = 1 - \exp(-c / \lambda)$: a robust function on variable $c$
AD-Census Cost Initialization • AD-Census measure produces proper disparity results for both repetitive structures and textureless regions.
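A small sketch of how the two measures might be combined, assuming the exponential robust function $\rho(c, \lambda) = 1 - \exp(-c / \lambda)$. The default values of `lam_census` and `lam_ad` are illustrative placeholders, not settings taken from the slides.

```python
import math


def robust(cost, lam):
    """Robust function rho(c, lambda) = 1 - exp(-c / lambda), with range [0, 1)."""
    return 1.0 - math.exp(-cost / lam)


def ad_census_cost(c_census, c_ad, lam_census=30.0, lam_ad=10.0):
    """Combine a census Hamming distance and an AD value into one cost.

    lam_census and lam_ad are illustrative defaults (assumptions of this
    sketch); each term is mapped to [0, 1) so neither measure dominates.
    """
    return robust(c_census, lam_census) + robust(c_ad, lam_ad)


cost_identical = ad_census_cost(0, 0.0)    # 0.0 for a perfect match
cost_typical = ad_census_cost(3, 12.0)     # both measures contribute
cost_outlier = ad_census_cost(24, 255.0)   # stays below the upper bound of 2
```

Mapping both terms through the same bounded function is what lets a Hamming distance (in bits) and a color difference (in intensity levels) be added directly.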
Cross-based Cost Aggregation [23]
• Cross construction
  • The arm endpoints P1, P2 of pixel P are located where rule R1 or R2 is first violated:
  • R1: Color self-similarity in the line region (smooth depth assumption)
  • R2: Arm length limitation (avoids over-smoothing)
[23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT, 2009.
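The two rules can be sketched for a single arm as follows. This is a simplified illustration: it works on one row of grayscale values rather than color pixels, and the names `tau` (color threshold) and `max_len` (arm length limit) are assumptions of the sketch.

```python
def arm_length(row, x, tau, max_len, step):
    """Length of the arm of pixel x along one row of grayscale values.

    The arm grows in direction `step` (-1 for left, +1 for right) while
      R1: |row[q] - row[x]| < tau   (color self-similarity) and
      R2: the arm is shorter than max_len (arm length limitation)
    both hold and q stays inside the image.
    """
    length = 0
    q = x + step
    while (0 <= q < len(row)
           and length < max_len
           and abs(row[q] - row[x]) < tau):
        length += 1
        q += step
    return length


row = [100, 102, 101, 99, 180, 181, 100]
left = arm_length(row, 3, tau=20, max_len=17, step=-1)     # stops at the image border
right = arm_length(row, 3, tau=20, max_len=17, step=+1)    # stops at the color edge
capped = arm_length(row, 3, tau=20, max_len=2, step=-1)    # stops at the length limit
```

Running the same function on a column gives the vertical arms, and the four arms together define the cross of a pixel.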
Cross-based Cost Aggregation
Cross-based Cost Aggregation
• Enhanced cross construction (using pixel $\mathbf{p}$’s left arm and its endpoint pixel $\mathbf{p}_l$ as an example)
Cross-based Cost Aggregation
• Cost aggregation
  • Run this step for 4 iterations to get stable cost values.
  • In iterations 1 and 3, costs are aggregated horizontally and then vertically.
  • In iterations 2 and 4, costs are aggregated vertically and then horizontally.
  • Alternating the order reduces the errors at depth discontinuities.
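The alternating schedule can be sketched as two one-dimensional passes over a single disparity slice. This is an illustration under stated assumptions: the `arms` dictionary layout is hypothetical, arms are assumed to stay inside the image, and each pass averages (rather than sums) over the arm span so values stay in the same range.

```python
def mean_over_span(values, start, stop):
    """Average values[start..stop], both ends inclusive."""
    span = values[start:stop + 1]
    return sum(span) / len(span)


def horizontal_pass(cost, arms):
    """Aggregate each pixel over its own left/right arm span."""
    height, width = len(cost), len(cost[0])
    return [[mean_over_span(cost[y],
                            x - arms["left"][y][x],
                            x + arms["right"][y][x])
             for x in range(width)] for y in range(height)]


def vertical_pass(cost, arms):
    """Aggregate each pixel over its own up/down arm span."""
    height, width = len(cost), len(cost[0])
    columns = [[cost[y][x] for y in range(height)] for x in range(width)]
    return [[mean_over_span(columns[x],
                            y - arms["up"][y][x],
                            y + arms["down"][y][x])
             for x in range(width)] for y in range(height)]


def aggregate(cost, arms, iterations=4):
    """Alternate the pass order: H then V on odd iterations, V then H on even."""
    for i in range(1, iterations + 1):
        if i % 2 == 1:   # iterations 1 and 3
            cost = vertical_pass(horizontal_pass(cost, arms), arms)
        else:            # iterations 2 and 4
            cost = horizontal_pass(vertical_pass(cost, arms), arms)
    return cost


# One row of three pixels; arm lengths are chosen to stay inside the image.
arms = {"left": [[0, 1, 1]], "right": [[1, 1, 0]],
        "up": [[0, 0, 0]], "down": [[0, 0, 0]]}
once = aggregate([[0.0, 3.0, 6.0]], arms, iterations=1)
```

The vertical pass reads the output of the horizontal pass, so each pixel effectively collects costs from the whole cross-shaped region without visiting every pixel in it.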
Cross-based Cost Aggregation • Our aggregation method can better handle large textureless regions and depth discontinuities.
Cross-based Cost Aggregation
[21] K.-J. Yoon and I.-S. Kweon. “Adaptive support-weight approach for correspondence search.” IEEE TPAMI, 2006.
[23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT, 2009.
Scanline Optimization [2]
• 4 scanline optimization processes are performed independently:
  • 2 horizontal directions
  • 2 vertical directions
• The four path costs $C_r$ are combined into the final cost $C_2$.
[2] H. Hirschmuller. “Stereo processing by semiglobal matching and mutual information.” IEEE TPAMI, 2008.
Scanline Optimization
• $\mathbf{r}$: direction
• $\mathbf{p} - \mathbf{r}$: the previous pixel along the same direction
• $P_1$, $P_2$: penalize the disparity changes between neighboring pixels ($P_1 \le P_2$) [8]
[8] S. Mattoccia, F. Tombari, and L. D. Stefano. “Stereo vision enabling precise border localization within a scanline optimization framework.” In Proc. ACCV, pages 517–527, 2007.
Scanline Optimization
• The final cost $C_2(\mathbf{p}, d)$ combines the four path costs $C_r(\mathbf{p}, d)$.
• The disparity with the minimum $C_2$ value is selected as pixel $\mathbf{p}$’s intermediate result.
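One scanline pass can be sketched with the semi-global recurrence of [2]. This is a simplified illustration: it covers a single left-to-right direction, and `p1` and `p2` are fixed constants here, whereas the slides only state that $P_1 \le P_2$ penalize disparity changes between neighboring pixels.

```python
def path_cost(cost_row, p1, p2):
    """Path costs along one scanline, processed left to right.

    cost_row[x][d] is the aggregated cost of pixel x at disparity d.
    Each pixel adds the cheapest transition from the previous pixel:
    same disparity (no penalty), a change of one level (p1), or any
    larger jump (p2). Subtracting the previous minimum keeps values bounded.
    """
    num_disp = len(cost_row[0])
    prev = list(cost_row[0])
    result = [prev]
    for x in range(1, len(cost_row)):
        prev_min = min(prev)
        current = []
        for d in range(num_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d < num_disp - 1:
                candidates.append(prev[d + 1] + p1)
            current.append(cost_row[x][d] + min(candidates) - prev_min)
        result.append(current)
        prev = current
    return result


def winner_take_all(costs):
    """Index of the disparity with the minimum cost."""
    return min(range(len(costs)), key=costs.__getitem__)


# The middle pixel weakly prefers disparity 1, its neighbors strongly prefer 0.
row = [[0, 5], [3, 2], [0, 5]]
smoothed = path_cost(row, p1=10, p2=10)
raw_choice = winner_take_all(row[1])            # 1 without smoothing
smoothed_choice = winner_take_all(smoothed[1])  # 0 after the pass
```

Because each pixel depends on its predecessor, this loop is sequential along the scanline, while different scanlines and disparity levels are independent of each other.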
Multi-step Disparity Refinement
• Outlier Handling
  • Outlier Detection
  • Iterative Region Voting
  • Proper Interpolation
• Depth Discontinuity Adjustment
• Sub-pixel Enhancement
Outlier Handling--Detection
• The outliers: $D_L(\mathbf{p}) \ne D_R(\mathbf{p} - (D_L(\mathbf{p}), 0))$
• Outliers are further classified into occlusion and mismatch points:
  • The intersection of $\mathbf{p}$’s epipolar line and $D_R$ is checked.
  • If there is no intersection, $\mathbf{p}$ is labelled as “occlusion”, otherwise “mismatch”.
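The detection rule is a left-right consistency check, sketched below on integer disparity maps. Two points are assumptions of this sketch: pixels whose match falls outside the right image are treated as outliers, and the epipolar intersection test is interpreted as searching for any disparity $d$ with $D_R(x - d, y) = d$.

```python
def detect_outliers(disp_left, disp_right):
    """Mask that is True where D_L(p) != D_R(p - (D_L(p), 0))."""
    mask = []
    for y, row in enumerate(disp_left):
        mask_row = []
        for x, d in enumerate(row):
            xr = x - d  # column of the matching pixel in the right image
            consistent = 0 <= xr < len(row) and disp_right[y][xr] == d
            mask_row.append(not consistent)
        mask.append(mask_row)
    return mask


def classify_outlier(right_row, x, max_disp):
    """Label an outlier at column x as 'mismatch' or 'occlusion'.

    If some disparity d places the pixel on the right disparity map,
    that is D_R(x - d) == d, the epipolar line intersects D_R and the
    pixel is a mismatch; otherwise it is treated as occluded.
    """
    for d in range(max_disp + 1):
        xr = x - d
        if 0 <= xr < len(right_row) and right_row[xr] == d:
            return "mismatch"
    return "occlusion"


disp_left = [[0, 1, 1, 2]]
disp_right = [[1, 1, 0, 0]]
mask = detect_outliers(disp_left, disp_right)
labels = [classify_outlier(disp_right[0], x, max_disp=2)
          for x, is_outlier in enumerate(mask[0]) if is_outlier]
```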
Outlier Handling--Iterative Region Voting
• Construct cross-based regions and a robust voting scheme
• $S_p$: the number of reliable pixels in $\mathbf{p}$’s region
• $\tau_S$, $\tau_H$: threshold values
• 5 iterations
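A single vote might look like the sketch below. The acceptance rule is an assumption consistent with the two thresholds named on the slide: the region must contain more than `tau_s` reliable pixels, and the winning disparity must hold more than a fraction `tau_h` of the votes. The threshold values in the example are placeholders.

```python
from collections import Counter


def vote(region_disparities, tau_s, tau_h):
    """Return the disparity voted for an outlier, or None if the vote fails.

    region_disparities: disparities of the reliable pixels collected from
    the outlier's cross-based region (outliers are excluded by the caller).
    """
    s_p = len(region_disparities)           # number of reliable pixels
    if s_p == 0:
        return None
    best, votes = Counter(region_disparities).most_common(1)[0]
    if s_p > tau_s and votes / s_p > tau_h:
        return best
    return None


accepted = vote([3, 3, 3, 4, 3], tau_s=4, tau_h=0.4)        # clear majority
too_few = vote([3, 3], tau_s=4, tau_h=0.4)                  # region too small
no_majority = vote([1, 2, 3, 4, 5, 6], tau_s=4, tau_h=0.4)  # votes too scattered
```

Pixels that receive a disparity become reliable themselves, so repeating the vote for several iterations lets filled values spread into the remaining outlier regions.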
Outlier Handling--Proper Interpolation
• Occlusion points
  • The pixel with the lowest disparity value is selected for interpolation.
  • It most likely comes from the background.
• Mismatch points
  • The pixel with the most similar color is selected for interpolation.
Depth Discontinuity Adjustment
• For each pixel $\mathbf{p}$ on the disparity edge, two pixels $\mathbf{p}_1$, $\mathbf{p}_2$ from both sides of the edge are collected.
• $D_L(\mathbf{p})$ is replaced by $D_L(\mathbf{p}_1)$ or $D_L(\mathbf{p}_2)$ if one of the two pixels has a smaller matching cost than $C_2(\mathbf{p}, D_L(\mathbf{p}))$.
Sub-pixel Enhancement [20]
• Quadratic polynomial interpolation
• With a 3×3 median filter
[20] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister. “Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling.” IEEE TPAMI, 2009.
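Quadratic interpolation fits a parabola through the costs at $d - 1$, $d$ and $d + 1$ and takes its minimum. The sketch below shows that standard formula; the function name and the fallback for a flat or non-convex cost curve are choices of this sketch, and the median filter step is not shown.

```python
def subpixel_disparity(d, cost_prev, cost_at, cost_next):
    """Refine integer disparity d using the costs at d - 1, d and d + 1.

    The vertex of the parabola through the three points lies at
    d - (c+ - c-) / (2 * (c+ + c- - 2 * c0)).
    """
    denom = 2.0 * (cost_next + cost_prev - 2.0 * cost_at)
    if denom <= 0.0:
        return float(d)  # flat or non-convex neighborhood: keep d
    return d - (cost_next - cost_prev) / denom


# Costs sampled from (x - 4.25)^2 at x = 3, 4, 5 recover the true minimum.
refined = subpixel_disparity(4, 1.5625, 0.0625, 0.5625)
unchanged = subpixel_disparity(7, 2.0, 1.0, 2.0)  # symmetric costs keep d
```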
Multi-step Disparity Refinement • The average error percentages after performing each refinement step.
CUDA Implementation
CUDA Implementation
• Compute Unified Device Architecture (CUDA) is a programming interface for parallel computation tasks on NVIDIA graphics hardware.
• The computation task is coded into a kernel function.
• The allocation of the threads is controlled with two hierarchical concepts: grid and block.
• A kernel creates a grid with multiple blocks, and each block consists of multiple threads.
• Hierarchy: Kernel, Grid, Block, Thread
CUDA Implementation
• Cost Initialization:
  • Parallelize with $W \times H$ threads, organized into a 2D grid with the block size set to 32 × 32.
  • Each thread computes a cost value for a pixel at a given disparity.
  • For the census transform, a square window is required for each pixel, which requires loading more data into the shared memory for fast access.
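The thread layout can be modeled in plain Python to show how many 32 × 32 blocks a $W \times H$ image needs and which pixel each thread handles. This is a host-side model of the usual CUDA indexing arithmetic, not kernel code; the function names are hypothetical and the 384 × 288 image size is only an example.

```python
def grid_dimensions(width, height, block=(32, 32)):
    """Number of blocks per axis needed to cover a width x height image."""
    bx, by = block
    return ((width + bx - 1) // bx, (height + by - 1) // by)


def pixel_of_thread(block_idx, thread_idx, block=(32, 32)):
    """Pixel (x, y) handled by a thread, mirroring
    x = blockIdx.x * blockDim.x + threadIdx.x (and likewise for y)."""
    x = block_idx[0] * block[0] + thread_idx[0]
    y = block_idx[1] * block[1] + thread_idx[1]
    return x, y


def is_active(pixel, width, height):
    """Threads in the last row or column of blocks may fall outside the image."""
    x, y = pixel
    return x < width and y < height


grid = grid_dimensions(384, 288)             # divides evenly by 32
padded_grid = grid_dimensions(450, 375)      # rounded up on both axes
pixel = pixel_of_thread((1, 2), (3, 4))
```

When the image size is not a multiple of the block size, the extra threads simply skip their work after a bounds check such as `is_active`.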
CUDA Implementation
• Cross-based Cost Aggregation:
  • A grid with $W \times H$ threads.
  • Cross construction: the block size is $W$ or $H$, to efficiently handle a scan line.
  • Cost aggregation: the block size is 32 × 32.
  • Data reuse with shared memory is considered in both steps.
CUDA Implementation
• Scanline Optimization:
  • This step is different, because the process is sequential in the scanline direction and parallel in the orthogonal direction.
  • $W \times D$ or $H \times D$ threads
• Disparity Refinement:
  • $W \times H$ threads
Experimental Results
Experimental Results
• Device: a PC with a Core 2 Duo 2.20 GHz CPU and an NVIDIA GeForce GTX 480 graphics card
• Parameter settings:
• Sources: Middlebury (http://vision.middlebury.edu/stereo/), HHI database (book arrival), Microsoft i2i database (Ilkay)
Experimental Results
Running time (s)
Data set    CPU     GPU
Tsukuba     2.5     0.015
Venus       4.5     0.032
Teddy       15      0.095
Cones       15      0.094
• The GPU-friendly system brings a speedup of at least 140× on all four data sets.
• The average proportions of the GPU running time for the four computation steps (cost initialization, cost aggregation, scanline optimization, disparity refinement) are 1%, 70%, 28% and 1% respectively.
• The iterative cost aggregation step and the scanline optimization process dominate the running time.
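The speedup figure follows directly from the CPU and GPU timings reported for the four data sets, as this short calculation shows. Only the timing values come from the slides; the variable names are arbitrary.

```python
# (CPU time, GPU time) per data set, as reported in the running time table.
timings = {
    "Tsukuba": (2.5, 0.015),
    "Venus": (4.5, 0.032),
    "Teddy": (15.0, 0.095),
    "Cones": (15.0, 0.094),
}

# Speedup is the ratio of CPU time to GPU time for each data set.
speedups = {name: cpu / gpu for name, (cpu, gpu) in timings.items()}
lowest = min(speedups.values())    # Venus, just above 140x
highest = max(speedups.values())   # Tsukuba

for name, factor in speedups.items():
    print(f"{name}: {factor:.0f}x")
```

The lowest ratio, for Venus, is just above 140, which matches the speedup quoted on the slide.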
Experimental Results
• First row: disparity maps generated with our system.
• Second row: disparity error maps with threshold 1.
• Errors in unoccluded and occluded regions are marked in black and gray respectively.
Experimental Results
Experimental Results • video
Experimental Results Snapshots on ’book arrival’ stereo video
Experimental Results Snapshots on ’Ilkay’ stereo video
Conclusion
Conclusion
• Contributions
  • Present a near real-time stereo system with accurate disparity results.
  • Combine known techniques, without sacrificing performance or parallelism, to obtain high-quality disparity maps.
• Future work
  • Improve the system so that it can be applied in real-world applications.
  • Robust parameter setting methods.