On Building an Accurate Stereo Matching System on Graphics Hardware

Xing Mei; Xun Sun; Mingcai Zhou; Shaohui Jiao; Haitao Wang; Xiaopeng Zhang
Samsung Advanced Institute of Technology, China Lab
Computer Vision Workshops, 2011 IEEE

Outline • Introduction • Related Works • Algorithm • CUDA Implementation • Experimental Results • Conclusion

Introduction

Introduction Dense two-frame stereo matching • Compute a disparity map from stereo images.

• Broad applications: 3D reconstruction, view interpolation

Related Works

Related Works • Local methods • Compute each pixel’s disparity independently over a local support region. • Fast but inaccurate. • Global methods • Solve the stereo problem as an energy minimization process. • Accurate but slow, due to time-consuming global optimizers such as graph cuts (GC) and belief propagation (BP).

Related Works • Propagation-based methods • Produce quasi-dense or dense disparity results from a set of seed pixels.

• Relatively fast but sensitive to early wrong matches • Some methods use segmented regions as the guided propagation unit, at an expensive computational cost

Related Works • Introduce a simple guided unit for propagation : pixel-wise 1D line segments. • No image segmentation required here.

• Simple, fast and accurate

Algorithm

Algorithm • Framework

Input: Stereo images Output: Disparity map

Disparity Cost Computing • Cost measures: AD, BT, gradient-based measures, non-parametric transforms (rank/census [3]), ... • Combinations: SAD + gradient [6], AD + Census • AD (absolute difference) • Constant color assumption • Repetitive structures • Census • Encodes local image structures • Textureless regions

[3] H. Hirschmuller and D. Scharstein. “Evaluation of stereo matching costs on images with radiometric differences.” IEEE TPAMI, 31(9), 2009.

[6] A. Klaus, M. Sormann, and K. Karner. “Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure.” ICPR, 2006.

AD-Census Cost Initialization • $C(p, d) = \rho(C_{census}(p, d), \lambda_{census}) + \rho(C_{AD}(p, d), \lambda_{AD})$ • $p$: pixel • $d$: disparity level • $\rho(c, \lambda)$: a robust function on variable $c$ • $pd = (x - d, y)$: the pixel in the right image that corresponds to $p = (x, y)$ at disparity $d$ • $C_{AD}$: absolute difference between $I^{Left}(p)$ and $I^{Right}(pd)$ • $C_{census}$: Hamming distance [22]

[22] R. Zabih and J. Woodfill. “Non-parametric local transforms for computing visual correspondence.” In Proc. ECCV, 1994.
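As a minimal sketch of how the two cost terms could be combined, the Python below assumes the robust function $\rho(c, \lambda) = 1 - \exp(-c / \lambda)$ from the published AD-Census paper (its definition is not preserved on this slide), an AD term averaged over the color channels, and purely illustrative $\lambda$ values. All function and parameter names are hypothetical.

```python
import math

def rho(c, lam):
    """Robust function, ASSUMED to be 1 - exp(-c / lam); maps any cost into [0, 1)."""
    return 1.0 - math.exp(-c / lam)

def ad_cost(left_rgb, right_rgb):
    """Absolute difference averaged over the color channels (assumed form of C_AD)."""
    return sum(abs(l - r) for l, r in zip(left_rgb, right_rgb)) / len(left_rgb)

def ad_census_cost(c_census, c_ad, lam_census=30.0, lam_ad=10.0):
    """C(p, d) = rho(C_census, lam_census) + rho(C_AD, lam_ad).

    The default lambda values are illustrative assumptions, not slide content.
    """
    return rho(c_census, lam_census) + rho(c_ad, lam_ad)

# Pixel p in the left image and its candidate match pd = (x - d, y) in the right image.
c_ad = ad_cost((120, 64, 30), (118, 70, 31))   # (2 + 6 + 1) / 3 = 3.0
cost = ad_census_cost(c_census=3, c_ad=c_ad)   # c_census: a Hamming distance
print(round(cost, 4))
```

Because each term is squashed into [0, 1), neither measure can dominate the sum, whatever its raw range.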

Census Transform • Census transform window (center pixel X = 78):

121 130  26  31  39
109 115  33  40  30
 98 102  78  67  45
 47  67  32 170 198
 39  86  99 159 210

• Census bit string: 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1

Census Hamming Distance
• Left image: 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1
• Right image: 1 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1
• XOR: 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
• Hamming distance = 3
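The census cost can be illustrated with a short Python sketch. The comparison direction (a bit is 1 where the neighbor is brighter than the center) is an assumption, and the 3 × 3 windows are made-up values; the two 24-bit strings at the end are the ones shown on the Hamming distance slide. Function names are hypothetical.

```python
def census_bits(window):
    """Bit string for a square window: '1' where a neighbor is brighter than the center.

    The comparison direction is an ASSUMPTION; the opposite convention works equally
    well as long as both images use the same one.
    """
    n = len(window)
    center = window[n // 2][n // 2]
    bits = []
    for r, row in enumerate(window):
        for c, value in enumerate(row):
            if (r, c) != (n // 2, n // 2):      # skip the center pixel itself
                bits.append("1" if value > center else "0")
    return "".join(bits)

def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 3 x 3 windows around a left-image pixel and its candidate match.
left_window = [[90, 60, 20], [80, 50, 10], [70, 40, 55]]
right_window = [[90, 60, 20], [80, 50, 65], [30, 40, 55]]
left_bits = census_bits(left_window)
right_bits = census_bits(right_window)
print(left_bits, right_bits, hamming(left_bits, right_bits))

# The two 24-bit strings from the slide differ in three positions.
slide_left = "110001100011000001111111"
slide_right = "111001100101000001111111"
print(hamming(slide_left, slide_right))
```

On hardware, the same distance is typically obtained by XOR-ing the two bit strings packed into integers and counting the set bits.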

AD-Census Cost Initialization • $C(p, d) = \rho(C_{census}(p, d), \lambda_{census}) + \rho(C_{AD}(p, d), \lambda_{AD})$ • $\rho(c, \lambda)$: a robust function on variable $c$

AD-Census Cost Initialization • AD-Census measure produces proper disparity results for both repetitive structures and textureless regions.

Cross-based Cost Aggregation [23] • Cross construction • The line ending points P1, P2 for P are located where rule 1 or rule 2 is violated: • R1: Color self-similarity in the line region (smooth depth assumption) • R2: Arm length limitation (avoid over-smoothing)

[23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT, 2009.

Cross-based Cost Aggregation

Cross-based Cost Aggregation • Enhanced cross construction (using pixel $p$’s left arm and its endpoint pixel $p_l$ as an example)
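The rules for the enhanced construction are not preserved in this transcript. The sketch below assumes the three rules described in the published paper: the endpoint must be similar in color to both $p$ and its predecessor on the arm (threshold `tau1`), the arm must stay shorter than `L1`, and beyond `L2` a stricter color threshold `tau2` applies. The names and default values are illustrative assumptions.

```python
def color_diff(a, b):
    """Largest per-channel absolute difference between two RGB pixels."""
    return max(abs(x - y) for x, y in zip(a, b))

def left_arm_length(row, x, tau1=20, tau2=6, L1=34, L2=17):
    """Grow pixel x's left arm along `row` until one of the ASSUMED rules is violated."""
    length = 0
    while x - (length + 1) >= 0:
        span = length + 1                 # spatial distance of the candidate endpoint
        cand = row[x - span]
        prev = row[x - span + 1]          # the candidate's predecessor on the arm
        if span >= L1:                    # assumed rule 2: maximum arm length
            break
        if color_diff(cand, row[x]) >= tau1 or color_diff(cand, prev) >= tau1:
            break                         # assumed rule 1: similar to p and predecessor
        if span > L2 and color_diff(cand, row[x]) >= tau2:
            break                         # assumed rule 3: stricter for distant pixels
        length = span
    return length

# The arm stops at the color edge between the bright and the dark pixels.
edge_row = [(200, 200, 200)] * 3 + [(100, 100, 100)] * 5
print(left_arm_length(edge_row, 7))

# In a uniform region only the length limit stops the arm.
flat_row = [(50, 50, 50)] * 100
print(left_arm_length(flat_row, 99))
```

The two-level thresholds let arms grow long in textureless regions while staying short near color edges; the other three arms follow by symmetry.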

Cross-based Cost Aggregation • Cost aggregation • Run this step for 4 iterations to get stable cost values. • In iterations 1 and 3, costs are aggregated horizontally and then vertically. • In iterations 2 and 4, costs are aggregated vertically and then horizontally. • Alternating the order reduces errors at depth discontinuities.
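The alternating two-pass scheme can be sketched as follows for the cost slice of a single disparity level. This is an illustration only: the arm lengths are assumed to be precomputed and already clipped to the image, normalization by the support region size is omitted, and all names are hypothetical.

```python
def aggregate_pass(cost, arms, horizontal):
    """Sum each pixel's cost over its horizontal or vertical arm span.

    arms[y][x] = (left, right, up, down), assumed to be clipped to the image.
    """
    h, w = len(cost), len(cost[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right, up, down = arms[y][x]
            if horizontal:
                out[y][x] = sum(cost[y][x - left:x + right + 1])
            else:
                out[y][x] = sum(cost[yy][x] for yy in range(y - up, y + down + 1))
    return out

def aggregate(cost, arms, iterations=4):
    """Iterations 1 and 3: horizontal then vertical; iterations 2 and 4: the reverse."""
    for it in range(1, iterations + 1):
        first_horizontal = (it % 2 == 1)
        cost = aggregate_pass(cost, arms, first_horizontal)
        cost = aggregate_pass(cost, arms, not first_horizontal)
    return cost

# Hypothetical 3 x 3 slice in which every arm reaches the image border.
h, w = 3, 3
cost = [[1.0] * w for _ in range(h)]
arms = [[(x, w - 1 - x, y, h - 1 - y) for x in range(w)] for y in range(h)]
once = aggregate(cost, arms, iterations=1)
print(once[1][1])
```

A production version would use running sums (integral images) along each scanline instead of re-summing every span.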

Cross-based Cost Aggregation • Our aggregation method can better handle large textureless regions and depth discontinuities.

Cross-based Cost Aggregation

[21] K.-J. Yoon and I.-S. Kweon. “Adaptive support-weight approach for correspondence search.” IEEE TPAMI, 2006.

[23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT, 2009.

Scanline Optimization [2] • 4 scanline optimization processes are performed independently: • 2 horizontal directions • 2 vertical directions • The path costs $C_r$ of the four directions are combined into the final cost $C_2$.

[2] H. Hirschmuller. “Stereo processing by semiglobal matching and mutual information.” IEEE TPAMI, 2008.

Scanline Optimization • $r$: direction • $p - r$: the previous pixel along the same direction • $P_1$, $P_2$: penalize the disparity changes between neighboring pixels ($P_1 \le P_2$) [8]

[8] S. Mattoccia, F. Tombari, and L. D. Stefano. “Stereo vision enabling precise border localization within a scanline optimization framework.” In Proc. ACCV, pages 517–527, 2007.

Scanline Optimization • The final cost: $C_2$, obtained from the four path costs $C_r$ • The disparity with the minimum $C_2$ value is selected as pixel $p$’s intermediate result.
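The path-cost formula itself is missing from this transcript. The sketch below assumes the standard semiglobal recurrence from [2], in which a pixel's path cost adds the cheapest transition from the previous pixel (no change, a change of one level with penalty `p1`, or a larger change with penalty `p2`), and it assumes that $C_2$ is the average of the path costs. Penalties are constants here, whereas [8] adapts them to color differences. Only two opposite directions on one scanline are shown, and all names and values are hypothetical.

```python
def scanline_pass(costs, p1=1.0, p2=3.0):
    """Path costs C_r along one scanline; costs[i][d] is the aggregated cost of pixel i.

    ASSUMED recurrence (semiglobal matching style), constant penalties p1 <= p2.
    """
    path = [list(costs[0])]                       # first pixel has no predecessor
    for i in range(1, len(costs)):
        prev = path[-1]
        prev_min = min(prev)
        row = []
        for d, c in enumerate(costs[i]):
            best = min(
                prev[d],                                              # same disparity
                prev[d - 1] + p1 if d > 0 else float("inf"),          # one level down
                prev[d + 1] + p1 if d + 1 < len(prev) else float("inf"),
                prev_min + p2,                                        # larger change
            )
            row.append(c + best - prev_min)       # subtract to keep values bounded
        path.append(row)
    return path

def combine(paths):
    """ASSUMED final cost C_2: the average of the path costs from all directions."""
    n, levels = len(paths[0]), len(paths[0][0])
    return [[sum(p[i][d] for p in paths) / len(paths) for d in range(levels)]
            for i in range(n)]

def winner_take_all(c2):
    """Pick the disparity with the minimum cost for each pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in c2]

# Hypothetical scanline: pixel 2 has a noisy cost that slightly prefers disparity 1.
costs = [[0, 5, 5], [0, 5, 5], [4, 3.5, 5], [0, 5, 5]]
forward = scanline_pass(costs)
backward = scanline_pass(costs[::-1])[::-1]
disparities = winner_take_all(combine([forward, backward]))
print(winner_take_all(costs), disparities)
```

In this example the smoothness penalties override the isolated noisy minimum, so the optimized result is constant along the scanline.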

Multi-step Disparity Refinement • Outlier Handling • Outlier Detection • Iterative Region Voting • Proper Interpolation • Depth Discontinuity Adjustment • Sub-pixel Enhancement

Outlier Handling--Detection • The outliers: pixels $p$ with $D_L(p) \ne D_R(p - (D_L(p), 0))$ • Outliers are further classified into occlusion and mismatch points: • For an outlier $p$, the intersection of its epipolar line and $D_R$ is checked. • If there is no intersection, $p$ is labelled as “occlusion”, otherwise “mismatch”.
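A minimal sketch of the left-right consistency check and the occlusion/mismatch labelling is given below. It assumes integer disparity maps stored as nested lists, treats a match that falls outside the right image as an outlier, and interprets "intersection" as the existence of some disparity $d$ with $D_R(x - d, y) = d$. These choices and all names are assumptions.

```python
def find_outliers(d_left, d_right):
    """Flag pixels where D_L(p) != D_R(p - (D_L(p), 0))."""
    outliers = []
    for y, row in enumerate(d_left):
        for x, d in enumerate(row):
            xr = x - d
            # ASSUMPTION: a match outside the right image counts as an outlier.
            if xr < 0 or d_right[y][xr] != d:
                outliers.append((x, y))
    return outliers

def classify(x, y, d_right, max_disp):
    """Label an outlier by searching its epipolar line for a consistent right disparity."""
    for d in range(max_disp + 1):
        xr = x - d
        if xr >= 0 and d_right[y][xr] == d:
            return "mismatch"     # some disparity is consistent with D_R
    return "occlusion"            # no intersection: p is not visible in the right view

# Hypothetical one-row disparity maps.
d_left = [[1, 1, 1, 1]]
d_right = [[1, 1, 0, 0]]
outliers = find_outliers(d_left, d_right)
labels = [classify(x, y, d_right, max_disp=2) for x, y in outliers]
print(outliers, labels)
```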

Outlier Handling--Iterative Region Voting • Construct cross-based regions and apply a robust voting scheme • Voting quantities: $S_p$, and the threshold values $\tau_S$, $\tau_H$ • 5 iterations
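The voting rule is not preserved in this transcript. The sketch below assumes the rule from the published paper: build a histogram of the reliable disparities in the outlier's cross-based region, let $S_p$ be the number of reliable votes, and accept the most frequent disparity only if $S_p > \tau_S$ and its share of the votes exceeds $\tau_H$. The default threshold values are illustrative, and the names are hypothetical.

```python
from collections import Counter

def vote(region_disparities, tau_s=20, tau_h=0.4):
    """Return the winning disparity for an outlier, or None if the vote is not decisive.

    region_disparities: disparities of the reliable pixels in the outlier's
    cross-based region (outliers already excluded). The acceptance rule and the
    thresholds are ASSUMPTIONS, not slide content.
    """
    s_p = len(region_disparities)              # S_p: number of reliable votes
    if s_p <= tau_s:
        return None                            # too few reliable pixels
    d_star, votes = Counter(region_disparities).most_common(1)[0]
    if votes / s_p <= tau_h:
        return None                            # no sufficiently dominant disparity
    return d_star

print(vote([7] * 15 + [3] * 10))               # 25 votes, 60% for disparity 7
print(vote([7] * 5))                           # too few votes
print(vote([1] * 8 + [2] * 8 + [3] * 8))       # no dominant disparity
```

Pixels filled in one iteration become reliable voters in the next, which is why the scheme is repeated.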

Outlier Handling--Proper Interpolation • Occlusion points • The pixel with the lowest disparity value is selected for interpolation, since an occluded pixel most likely comes from the background. • Mismatch points • The pixel with the most similar color is selected for interpolation.

Depth Discontinuity Adjustment • For each pixel $p$ on the disparity edge, two pixels $p_1$, $p_2$ from both sides of the edge are collected. • $D_L(p)$ is replaced by $D_L(p_1)$ or $D_L(p_2)$ if one of the two pixels has a smaller matching cost than $C_2(p, D_L(p))$.

Sub-pixel Enhancement [20] • Quadratic polynomial interpolation • With a 3 × 3 median filter

[20] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister. “Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling.” IEEE TPAMI, 2009.
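The interpolation formula is missing from this transcript. A common form, assumed here, fits a parabola through the costs at $d - 1$, $d$ and $d + 1$ and takes its vertex as the sub-pixel disparity. The median filter helper and all names are likewise illustrative.

```python
def subpixel_disparity(d, c_minus, c, c_plus):
    """Vertex of the parabola through the costs at d - 1, d and d + 1 (ASSUMED form)."""
    denom = 2.0 * (c_plus + c_minus - 2.0 * c)
    if denom <= 0:            # flat or non-convex neighborhood: keep the integer value
        return float(d)
    return d - (c_plus - c_minus) / denom

def median3x3(disp, x, y):
    """Median of the 3 x 3 neighborhood around an interior pixel (x, y)."""
    values = sorted(disp[yy][xx] for yy in (y - 1, y, y + 1) for xx in (x - 1, x, x + 1))
    return values[4]

# Costs 4, 1, 2 around d = 10: the minimum is pulled toward the cheaper side (d + 1).
refined = subpixel_disparity(10, c_minus=4.0, c=1.0, c_plus=2.0)
print(refined)

# A single spike in a hypothetical disparity patch is removed by the median.
patch = [[1, 1, 1], [1, 9, 1], [2, 2, 2]]
print(median3x3(patch, 1, 1))
```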

Multi-step Disparity Refinement • The average error percentages after performing each refinement step.

CUDA Implementation

CUDA Implementation • Compute Unified Device Architecture (CUDA) is a programming interface for parallel computation tasks on NVIDIA graphics hardware. • The computation task is coded into a kernel function. • The allocation of the threads is controlled with two hierarchical concepts: grid and block. • A kernel creates a grid with multiple blocks, and each block consists of multiple threads.

CUDA Implementation • Cost Initialization: • Parallelize with $W \times H$ threads, organized into a 2D grid; the block size is set to 32 × 32. • Each thread computes a cost value for a pixel at a given disparity. • For the census transform, a square window is required for each pixel, which requires loading more data into shared memory for fast access.

CUDA Implementation • Cross-based Cost Aggregation: • A grid with $W \times H$ threads. • Cross construction: the block size is $W$ or $H$, to efficiently handle a scan line. • Cost aggregation: the block size is 32 × 32. • Data reuse with shared memory is considered in both steps.

CUDA Implementation • Scanline Optimization: • This step is different, because the process is sequential in the scanline direction and parallel in the orthogonal direction. • $W \times D$ or $H \times D$ threads, where $D$ is the number of disparity levels • Disparity Refinement: • $W \times H$ threads
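The thread layouts above come down to simple index arithmetic. The Python sketch below mirrors what a launch configuration for the $W \times H$ steps would compute; it is not CUDA code, and the image size, disparity count and function names are illustrative assumptions.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b

def launch_config(width, height, block=(32, 32)):
    """Grid dimensions that cover a width x height image with 2D blocks."""
    grid = (ceil_div(width, block[0]), ceil_div(height, block[1]))
    return grid, block

def thread_to_pixel(block_idx, thread_idx, block=(32, 32)):
    """Mirror of the usual CUDA index computation blockIdx * blockDim + threadIdx."""
    return (block_idx[0] * block[0] + thread_idx[0],
            block_idx[1] * block[1] + thread_idx[1])

def scanline_thread_count(width, height, disparities, horizontal):
    """Horizontal scans need H x D threads, vertical scans W x D (one per line and level)."""
    return (height if horizontal else width) * disparities

# Hypothetical 450 x 375 image with 60 disparity levels.
grid, block = launch_config(450, 375)
print(grid, block)
print(thread_to_pixel((14, 11), (1, 22)))
print(scanline_thread_count(450, 375, 60, horizontal=True))
```

Because the grid is rounded up, threads whose pixel coordinates fall outside the image have to return early inside the kernel.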

Experimental Results

Experimental Results • Device: a PC with a Core 2 Duo 2.20 GHz CPU and an NVIDIA GeForce GTX 480 graphics card • Parameter settings: • Sources: Middlebury (http://vision.middlebury.edu/stereo/), HHI database (book arrival), Microsoft i2i database (Ilkay)

Experimental Results • Running time in seconds:

Dataset   CPU   GPU
Tsukuba   2.5   0.015
Venus     4.5   0.032
Teddy     15    0.095
Cones     15    0.094

• The GPU-friendly system brings an impressive 140× speedup.

• The average proportions of the GPU running time for the four computation steps are 1%, 70%, 28% and 1%, respectively. • The iterative cost aggregation step and the scanline optimization process dominate the running time.
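As a quick check of the quoted speedup, the per-dataset ratios can be recomputed from the CPU and GPU timings listed above. The dictionary layout and names are illustrative.

```python
# (CPU time, GPU time) per dataset, copied from the timing table.
timings = {
    "Tsukuba": (2.5, 0.015),
    "Venus": (4.5, 0.032),
    "Teddy": (15, 0.095),
    "Cones": (15, 0.094),
}

speedups = {name: cpu / gpu for name, (cpu, gpu) in timings.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.0f}x")
```

The ratios range from roughly 141× (Venus) to 167× (Tsukuba), so the quoted 140× is close to the smallest per-dataset speedup.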

Experimental Results • First row: disparity maps generated with our system. • Second row: disparity error maps with threshold 1. • Errors in unoccluded and occluded regions are marked in black and gray, respectively.

Experimental Results

Experimental Results • video

Experimental Results Snapshots on ’book arrival’ stereo video

Experimental Results Snapshots on ’Ilkay’ stereo video

Conclusion

Conclusion • Contributions • Present a near real-time stereo system with accurate disparity results.

• Combine known techniques, without sacrificing performance or parallelism, to obtain high-quality disparity maps.

• Future work • Improve the system so it can be applied in real-world applications • Develop robust parameter setting methods