CS 4495/7495 Slides


Stereo
Dan Kong
Stereo vision
Triangulate on two images
of the same scene point
to recover depth.
[Figure: left and right cameras separated by a baseline, with the depth of the scene point marked]
Camera calibration
Finding all correspondences
Computing depth or
surfaces
Outline
Basic stereo equations
Constraints and assumptions
Window-based matching
Cooperative Stereo
Dynamic programming
Graph cut and Belief Propagation
Segmentation-based method
Pinhole Camera Model
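[Figure: pinhole camera with optical center O, axes x, y, z, focal length f, the image plane, and the virtual image plane; scene point P = (X, Y, Z)]
$P = (X, Y, Z)$
Projection onto the virtual image plane at $Z = f$:
$x = f \frac{X}{Z}$, $y = f \frac{Y}{Z}$
$(X, Y, Z) \to (x, y, 1) = (f \frac{X}{Z}, f \frac{Y}{Z}, 1)$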
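The projection can be sketched in a few lines of NumPy. This is a minimal illustration, assuming points are already expressed in the camera frame with Z > 0; the function name `project` is hypothetical.

```python
import numpy as np

def project(points, f):
    """Project N x 3 camera-frame points (X, Y, Z) to image coordinates (x, y).

    Implements x = f * X / Z and y = f * Y / Z. Assumes every Z is positive.
    """
    P = np.asarray(points, dtype=float)
    return f * P[:, :2] / P[:, 2:3]

# Two points on the same ray through the optical center project to the same pixel.
xy = project([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]], f=2.0)
```

Points that differ only by a scale factor along the viewing ray land on the same image location, which is why a single image cannot recover depth.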
Basic Stereo Derivations
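[Figure: two cameras with optical centers O1 and O2 separated by baseline B, each with axes x, y, z and focal length f; scene point P1 = (X, Y, Z) projects to p1 and p2]
Derive an expression for $Z$ as a function of $x_1$, $x_2$, $f$, $B$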
Basic Stereo Derivations
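[Figure: the two-camera geometry with optical centers O1 and O2, baseline B, focal length f, and scene point P1]
$x_1 = f \frac{X_1}{Z_1}$, $x_2 = f \frac{X_1 - B}{Z_1} = x_1 - f \frac{B}{Z_1}$
$\Rightarrow Z_1 = \frac{fB}{x_1 - x_2} = \frac{fB}{d}$
where $d = x_1 - x_2$ is the disparity.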
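As a small worked sketch of the depth relation, the function below converts disparities to depths. The focal length and baseline values are hypothetical, and mapping non-positive disparities to infinity is a convention chosen for this illustration.

```python
import numpy as np

def depth_from_disparity(disparity, f, B):
    """Return Z = f * B / d for each disparity d.

    f is the focal length in pixels and B the baseline in scene units.
    Disparities that are zero or negative are mapped to infinity.
    """
    d = np.asarray(disparity, dtype=float)
    Z = np.full(d.shape, np.inf)
    valid = d > 0
    Z[valid] = f * B / d[valid]
    return Z

# Hypothetical rig: focal length 700 pixels, baseline 0.12 m.
Z = depth_from_disparity([42.0, 21.0, 0.0], f=700.0, B=0.12)
```

Halving the disparity doubles the depth, so depth resolution degrades quickly for distant points.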
Stereo Constraint
Color constancy
The color of any world point remains
constant from image to image.
This assumption holds under the
Lambertian model.
In practice, given photometric camera
calibration and typical scenes, color
constancy holds well enough for most
stereo algorithms.
Stereo Constraint
Epipolar geometry
The epipolar geometry is the fundamental constraint in
stereo.
Rectification aligns epipolar lines with scanlines
Epipolar plane
Epipolar line for p’
Epipolar line for p
Stereo Constraint
Uniqueness and Continuity
Proposed by Marr and Poggio:
"Each item from each image may be
assigned at most one disparity value,"
and "disparity varies smoothly
almost everywhere."
Correspondence Using
Window-based matching
[Figure: left and right images with a scanline marked; plot of SSD error against disparity]
Sum of Squared (Pixel)
Differences
[Figure: m by m windows wL and wR in the left image IL and right image IR, centered at $(x_L, y_L)$ and $(x_L - d, y_L)$]
wL and wR are corresponding m by m windows of pixels.
We define the window function:
$W_m(x, y) = \{(u, v) \mid x - \frac{m}{2} \le u \le x + \frac{m}{2},\ y - \frac{m}{2} \le v \le y + \frac{m}{2}\}$
The SSD cost measures the intensity difference as a function of disparity:
$C_r(x, y, d) = \sum_{(u,v) \in W_m(x,y)} [I_L(u, v) - I_R(u - d, v)]^2$
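A minimal sketch of window-based SSD matching for a single pixel follows. It assumes odd window size m, images stored as 2D arrays indexed `[row, column]`, and a right-image window located at column x - d (consistent with d = x1 - x2); the function names are illustrative, and the caller is assumed to keep all windows inside the image.

```python
import numpy as np

def ssd_cost(left, right, x, y, d, m):
    """SSD between the m-by-m window at (x, y) in left and at (x - d, y) in right."""
    h = m // 2
    wl = left[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    wr = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(float)
    return float(np.sum((wl - wr) ** 2))

def best_disparity(left, right, x, y, m, max_d):
    """Return the disparity in 0..max_d with the smallest SSD cost."""
    costs = [ssd_cost(left, right, x, y, d, m) for d in range(max_d + 1)]
    return int(np.argmin(costs))

# Synthetic pair: the left image is the right image shifted 3 pixels to the right.
rng = np.random.default_rng(0)
right_img = rng.uniform(0.0, 255.0, size=(20, 40))
left_img = np.roll(right_img, 3, axis=1)
d_hat = best_disparity(left_img, right_img, x=20, y=10, m=5, max_d=8)
```

Repeating the search at every pixel of a scanline gives one row of the disparity map.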
Image Normalization
• Even when the cameras are identical models, there can be
differences in gain and sensitivity.
• The cameras do not see exactly the same surfaces, so their
overall light levels can differ.
• For these reasons and more, it is a good idea to normalize
the pixels in each window:
Average pixel: $\bar{I} = \frac{1}{|W_m(x, y)|} \sum_{(u,v) \in W_m(x,y)} I(u, v)$
Window magnitude: $\|I\|_{W_m(x,y)} = \sqrt{\sum_{(u,v) \in W_m(x,y)} [I(u, v)]^2}$
Normalized pixel: $\hat{I}(x, y) = \frac{I(x, y) - \bar{I}}{\|I - \bar{I}\|_{W_m(x,y)}}$
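The effect of this normalization can be checked numerically. The sketch below subtracts the window mean and divides by the magnitude of the centered window; returning zeros for a constant (textureless) window is a choice made only for this illustration.

```python
import numpy as np

def normalize_window(w):
    """Subtract the window mean and scale the result to unit length."""
    w = np.asarray(w, dtype=float)
    centered = w - w.mean()
    norm = np.sqrt(np.sum(centered ** 2))
    if norm == 0:  # constant window: nothing to normalize
        return np.zeros_like(centered)
    return centered / norm

rng = np.random.default_rng(0)
window = rng.uniform(0.0, 255.0, size=(5, 5))
a = normalize_window(window)
b = normalize_window(2.5 * window + 10.0)  # same window under a gain and offset change
```

Here `a` and `b` agree, which is the point of the normalization: a gain or offset difference between the cameras no longer affects the comparison.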
Images as Vectors
[Figure: m by m windows wL and wR from the left and right images, unwrapped row by row into vectors]
"Unwrap" image to form vector, using raster scan order (row 1, row 2, row 3, ...).
Each window is a vector in an $m^2$-dimensional vector space.
Normalization makes them unit length.
Normalized Correlation
$C_{NC}(d) = \sum_{(u,v) \in W_m(x,y)} \hat{I}_L(u, v)\, \hat{I}_R(u - d, v) = w_L \cdot w_R(d) = \cos\theta$
where $\theta$ is the angle between the unit vectors $w_L$ and $w_R(d)$.
Normalized Correlation
$d^* = \arg\max_d\ w_L \cdot w_R(d)$
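A sketch of the normalized-correlation search for one pixel, assuming odd window size m, windows that stay inside the image, and a right-image window at column x - d. The helper names are hypothetical.

```python
import numpy as np

def unit_window(img, x, y, m):
    """Unwrap the m-by-m window at (x, y) in raster order, center it, scale to unit length."""
    h = m // 2
    w = img[y - h:y + h + 1, x - h:x + h + 1].astype(float).ravel()
    w = w - w.mean()
    n = np.linalg.norm(w)
    return w / n if n > 0 else w

def ncc_disparity(left, right, x, y, m, max_d):
    """Return (d*, scores), where scores[d] is the dot product w_L . w_R(d)."""
    wl = unit_window(left, x, y, m)
    scores = [float(wl @ unit_window(right, x - d, y, m)) for d in range(max_d + 1)]
    return int(np.argmax(scores)), scores

# The left image is the right image shifted by 4 pixels, with a different gain and offset.
rng = np.random.default_rng(1)
right_img = rng.uniform(0.0, 255.0, size=(20, 40))
left_img = 1.8 * np.roll(right_img, 4, axis=1) + 25.0
d_star, scores = ncc_disparity(left_img, right_img, x=20, y=10, m=5, max_d=8)
```

Unlike raw SSD, the score at the true disparity stays at the maximum value of the cosine even though the two images differ in brightness and contrast.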
Results Using window-based
Method
Left
Images courtesy of Point Grey Research
Disparity Map
Stereo Results
Left
Disparity map
Problems with Window-based
matching
Disparity within the window must be
constant.
Biases the results towards fronto-parallel
surfaces.
Blurs across depth discontinuities.
Performs poorly in textureless regions.
Gives erroneous results in occluded regions.
Cooperative Stereo Algorithm
Based on two basic assumptions by Marr
and Poggio:
Uniqueness: at most a single unique match
exists for each pixel.
Continuity: disparity values are generally
continuous, i.e., smooth within a local
neighborhood.
Disparity Space Image (DSI)
The 3D disparity space has dimensions
row r, column c, and disparity d. Each
element (r, c, d) of the disparity space
projects to the pixel (r, c) in the left
image and to the pixel (r, c + d) in the right
image.
• The DSI represents the confidence or
likelihood of a particular match.
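One way to picture the DSI is as a 3D array filled from per-pixel match scores. The sketch below uses the negated squared intensity difference as the match value, which is an assumption made for illustration (any SSD- or correlation-based score could be used), and marks elements whose right-image pixel (r, c + d) falls outside the image with negative infinity.

```python
import numpy as np

def build_dsi(left, right, max_d):
    """Return dsi[r, c, d] = -(left[r, c] - right[r, c + d])**2.

    Elements whose right-image column c + d is off the image are set to -inf.
    """
    rows, cols = left.shape
    dsi = np.full((rows, cols, max_d + 1), -np.inf)
    for d in range(max_d + 1):
        diff = left[:, :cols - d].astype(float) - right[:, d:].astype(float)
        dsi[:, :cols - d, d] = -diff ** 2
    return dsi

# Synthetic pair in which left[r, c] equals right[r, c + 2].
rng = np.random.default_rng(2)
right_img = rng.uniform(0.0, 255.0, size=(6, 20))
left_img = np.roll(right_img, -2, axis=1)
dsi = build_dsi(left_img, right_img, max_d=4)
```

A slice `dsi[:, :, d]` corresponds to an (r, c) slice for one disparity, and `dsi[r, :, :]` to a (c, d) slice for one row.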
Illustration of DSI
(r, c) slices for different d
(c, d) slice for r = 151
Definition
$L_n(r, c, d)$: Match value assigned to element (r, c, d) at iteration n
$L_0(r, c, d)$: Initial values computed from SSD or NCC
$\Psi(r, c, d)$: Inhibition area for element (r, c, d)
$\Phi(r, c, d)$: Local support area for element (r, c, d)
Illustration of Inhibitory and
Support Regions
Iterative Updating DSI
[Equations (1) to (4): iterative update of the match values $L_n$]
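The update equations themselves are not reproduced in this transcript. As a rough sketch only, assuming the update has the form used in Zitnick and Kanade's cooperative algorithm (sum the match values over the local support area, divide by the sum over the inhibition area, raise to a power, and multiply by the initial values), one iteration might look like the code below. The cubic support region, the exponent `alpha`, the inclusion of the element itself in the inhibition sum, and all function names are assumptions for illustration.

```python
import numpy as np

def box_sum(L, radius=1):
    """Sum L over a cubic neighbourhood (zero padded): a stand-in for the support area."""
    R, C, D = L.shape
    P = np.pad(L, radius)
    S = np.zeros(L.shape)
    k = 2 * radius + 1
    for dr in range(k):
        for dc in range(k):
            for dd in range(k):
                S += P[dr:dr + R, dc:dc + C, dd:dd + D]
    return S

def inhibition_sum(S):
    """Sum S over elements that share a line of sight with (r, c, d) in either image."""
    R, C, D = S.shape
    left = S.sum(axis=2, keepdims=True)   # same left pixel (r, c), any disparity
    diag = np.zeros((R, C + D - 1))       # same right pixel (r, c + d)
    for d in range(D):
        diag[:, d:d + C] += S[:, :, d]
    right = np.stack([diag[:, d:d + C] for d in range(D)], axis=2)
    return left + right - S               # the element itself is counted once

def cooperative_step(L0, Ln, alpha=2.0, radius=1, eps=1e-12):
    """One assumed update: L_{n+1} = L0 * (support / inhibition) ** alpha."""
    S = box_sum(Ln, radius)
    return L0 * (S / (inhibition_sum(S) + eps)) ** alpha

# Toy DSI: every pixel has a clear initial peak at disparity 2.
L0 = np.full((6, 12, 5), 0.1)
L0[:, :, 2] = 0.9
L1 = cooperative_step(L0, L0)
```

Because competing elements along each line of sight share the same denominator, strong matches suppress weaker ones at the same pixel, which is how the uniqueness assumption enters the iteration.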
Explicit Detection of Occlusion
Identify occlusions by examining the magnitude of the
converged values in conjunction with the uniqueness
constraint.
Summary of Cooperative Stereo
• Prepare a 3D array (r, c, d): (r, c) for each pixel in
the reference image and d for the range of disparity.
• Set initial match values $L_0$ using a function of image
intensities, such as normalized correlation or SSD.
• Iteratively update match values $L_n$ using (4) until
the match values converge.
• For each pixel (r, c), find the element (r, c, d) with
the maximum match value.
• If the maximum match value is higher than a
threshold, output the disparity d; otherwise, declare
an occlusion.
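The last two steps of the summary amount to a thresholded winner-take-all over the disparity axis. A minimal sketch, assuming the converged match values are stored in an array indexed `[r, c, d]` and using -1 as a hypothetical marker for occlusion:

```python
import numpy as np

def select_disparities(L, threshold):
    """Pick the disparity with the maximum match value; mark weak maxima as occluded (-1)."""
    best = L.argmax(axis=2)
    best_value = L.max(axis=2)
    return np.where(best_value > threshold, best, -1)

# Two pixels: one confident match at d = 2, one whose best value is below the threshold.
L = np.zeros((1, 2, 3))
L[0, 0, 2] = 0.9
L[0, 1, 1] = 0.05
disparity = select_disparities(L, threshold=0.1)
```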
MRF Stereo Model
Local evidence function $\phi(x_p, y_p)$: L x 1 vector
Compatibility function $\psi(x_p, x_n)$: L x L matrix
Disparity Optimization
Joint probability of MRF:
$P(x_1, x_2, ..., x_N, y_1, y_2, ..., y_N) = \prod_{(i,j)} \psi(x_i, x_j) \prod_p \phi(x_p, y_p)$   (1)
The disparity optimization step requires
choosing an estimator for $x_1, ..., x_N$:
MMSE: estimate of the mean of the marginal
distribution of $x_i$
MAP: the labeling of $x_1, ..., x_N$ that maximizes the
above joint probability
Equivalence to Energy
Minimization
Taking the negative log of equation 1:
$E(x_1, x_2, ..., x_N, y_1, y_2, ..., y_N) = -\sum_{(i,j)} \log \psi(x_i, x_j) - \sum_p \log \phi(x_p, y_p)$   (2)
In graph cut, equation 2 is expressed as:
$E(x_1, x_2, ..., x_N, y_1, y_2, ..., y_N) = \sum_{(i,j)} V(x_i, x_j) + \sum_p D(x_p, y_p)$   (3)
Maximizing the probability in equation 1 is
equivalent to minimizing the energy in
equation 3.
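The equivalence can be checked by brute force on a tiny model. The sketch below assumes a 4-node chain with 3 labels, random positive potentials, and a single compatibility table shared by all neighboring pairs; these choices and the variable names are illustrative, and the joint is left unnormalized.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 3
phi = rng.uniform(0.1, 1.0, size=(N, L))  # local evidence for each node and label
psi = rng.uniform(0.1, 1.0, size=(L, L))  # compatibility between neighboring labels

def joint(x):
    """Unnormalized joint probability: product of compatibilities and local evidence."""
    p = np.prod([phi[i, x[i]] for i in range(N)])
    return p * np.prod([psi[x[i], x[i + 1]] for i in range(N - 1)])

def energy(x):
    """Energy with D = -log(phi) and V = -log(psi)."""
    D, V = -np.log(phi), -np.log(psi)
    return (sum(D[i, x[i]] for i in range(N))
            + sum(V[x[i], x[i + 1]] for i in range(N - 1)))

labelings = list(itertools.product(range(L), repeat=N))
x_map = max(labelings, key=joint)   # MAP labeling
x_min = min(labelings, key=energy)  # minimum-energy labeling
```

Both searches return the same labeling, since the negative log is monotonically decreasing.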
Stereo Matching Using Belief
Propagation
Belief propagation is an iterative
inference algorithm that propagates
messages in the Markov network
$m_{st}(x_s, x_t)$: Message node $x_s$ sends to $x_t$
$m_s(x_s, y_s)$: Message observed node $y_s$ sends to $x_s$
$b_s(x_s)$: Belief at node $x_s$
We simplify $m_{st}(x_s, x_t)$ as $m_{st}(x_t)$, and $m_s(x_s, y_s)$ as $m_s(x_s)$
Belief Propagation Algorithm
Initialize messages as uniform distributions
Iteratively update messages for i = 1:T
Compute belief at each node and output disparity
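The three steps can be sketched on a one-dimensional chain, where belief propagation is exact. The code works in the negative-log domain (min-sum, which corresponds to max-product on probabilities), so a uniform initial message becomes a vector of zeros. The chain topology, the shared pairwise cost table `V`, and the function name are assumptions for illustration; stereo uses a 2D grid, where the same updates are applied to four neighbors.

```python
import itertools
import numpy as np

def min_sum_bp_chain(D, V, T):
    """Min-sum belief propagation on a chain.

    D[s, x] is the data cost of label x at node s; V[a, b] is the cost of
    label a at a node and label b at its right-hand neighbor.
    """
    N, L = D.shape
    fwd = np.zeros((N, L))  # fwd[t]: message arriving at node t from node t - 1
    bwd = np.zeros((N, L))  # bwd[t]: message arriving at node t from node t + 1
    for _ in range(T):      # synchronous message updates
        new_fwd, new_bwd = np.zeros((N, L)), np.zeros((N, L))
        for s in range(N - 1):
            h = D[s] + fwd[s]
            new_fwd[s + 1] = np.min(h[:, None] + V, axis=0)
        for s in range(1, N):
            h = D[s] + bwd[s]
            new_bwd[s - 1] = np.min(h[:, None] + V.T, axis=0)
        fwd, bwd = new_fwd, new_bwd
    belief = D + fwd + bwd  # negative-log belief at each node
    return belief.argmin(axis=1)

rng = np.random.default_rng(1)
D = rng.uniform(0.0, 1.0, size=(6, 3))
V = 0.3 * (1 - np.eye(3))  # Potts-style pairwise cost
labels = min_sum_bp_chain(D, V, T=6)

def total_cost(x):
    return sum(D[i, x[i]] for i in range(6)) + sum(V[x[i], x[i + 1]] for i in range(5))

brute = min(itertools.product(range(3), repeat=6), key=total_cost)
```

On a chain, running at least N - 1 iterations lets every message reach the far end, so the result matches the exhaustive search; on a grid with loops the result is only approximate.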
Illustration of BP
BP Results
Stereo As a Pixel-Labeling
Problem
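Let P be a set of pixels and L be a label set. The
goal is to find a labeling f that minimizes some
energy. For stereo, the labels are disparities.
The classic form of the energy function is:
$E(f) = \sum_{p \in P} D_p(f_p) + \sum_{(p,q) \in N} V_{p,q}(f_p, f_q)$
where N is the set of neighboring pixel pairs.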
Energy Function:
The energy term $D_p(f_p)$ measures how
appropriate a label is for the pixel p
given the observed data. In stereo, this
term corresponds to the match cost or
likelihood.
The energy term $V_{p,q}(f_p, f_q)$ encodes the
prior or smoothness constraint. In
stereo, the so-called Potts model is
used:
$V_{p,q}(f_p, f_q) = \begin{cases} 0 & f_p = f_q \\ \rho_I(\Delta I) & \text{otherwise} \end{cases}$
where the penalty depends on the local intensity difference $\Delta I$.
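To make the two terms concrete, the sketch below evaluates the energy of a given labeling on a small grid. It assumes a 4-connected neighborhood and a constant Potts penalty `lam` in place of an intensity-dependent one; both are simplifications for illustration, and the function name is hypothetical.

```python
import numpy as np

def potts_energy(labels, data_cost, lam):
    """Data term plus Potts smoothness term for a labeling on a 4-connected grid.

    labels[r, c] is the label (disparity) of each pixel and
    data_cost[r, c, l] is the cost of assigning label l to pixel (r, c).
    """
    r, c = np.indices(labels.shape)
    data = data_cost[r, c, labels].sum()
    changes = (np.count_nonzero(labels[:, 1:] != labels[:, :-1])     # horizontal pairs
               + np.count_nonzero(labels[1:, :] != labels[:-1, :]))  # vertical pairs
    return float(data + lam * changes)

labels = np.array([[0, 0],
                   [0, 1]])
data_cost = np.zeros((2, 2, 2))
data_cost[1, 1, 1] = 0.5
E = potts_energy(labels, data_cost, lam=1.5)
```

In this example two neighboring pairs disagree, so the smoothness term contributes twice the penalty and the single nonzero data cost is added on top.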
Two Energy Minimization
Algorithms via Graph Cuts
α-β swap algorithm
Two Energy Minimization
Algorithms via Graph Cuts
α-expansion algorithm
Moves
Graph Cuts Results
Graph Cuts
Belief Propagation
Ordering Constraint
• If an object a is to the left of an object b in the left
image, then object a will also appear to the left of
object b in the right image.
Ordering constraint…
…and its failure
Stereo Correspondences
Left scanline
…
Right scanline
…
Match intensities sequentially between two
scanlines
Stereo Correspondences
Left scanline
Right scanline
…
…
Match
Match
Left occlusion
Match
Right occlusion
Search Over
Correspondences
Left Occluded Pixels
Left scanline
Right scanline
Right occluded Pixels
Three cases:
Sequential – cost of match
Left occluded – cost of no match
Right occluded – cost of no match
Standard 3-move Dynamic
Programming for Stereo
[Figure: search grid with the left scanline and right scanline as axes; regions of left occluded pixels and right occluded pixels are marked, and the path runs from Start to End]
Dynamic programming yields the optimal path through the grid.
This is the best set of matches that satisfy the ordering constraint.
Dynamic Programming
• Efficient algorithm for solving sequential decision
(optimal path) problems.
[Figure: trellis with states i = 1, 2, 3 at stages t = 1, t = 2, t = 3, ..., t = T]
How many paths through this trellis? $3^T$
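Dynamic Programming
[Figure: trellis with states i = 1, 2, 3 and stage costs $C_{t-1}$, $C_t$, $C_{t+1}$; the transitions into state 2 are labeled $\pi_{12}$, $\pi_{22}$, $\pi_{32}$]
Suppose cost can be decomposed into stages:
$\pi_{ij}$ = cost of going from state i to state j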
Dynamic Programming
[Figure: trellis highlighting state j = 2 at stage t and its incoming transitions $\pi_{12}$, $\pi_{22}$, $\pi_{32}$]
Principle of Optimality for an n-stage assignment problem:
$C_t(j) = \min_i (\pi_{ij} + C_{t-1}(i))$
Dynamic Programming
[Figure: trellis with the back-pointer $b_t(2) = 2$ marked for state j = 2]
$C_t(j) = \min_i (\pi_{ij} + C_{t-1}(i))$
$b_t(j) = \arg\min_i (\pi_{ij} + C_{t-1}(i))$
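The recursion and its back-pointers can be sketched directly. The code assumes the same transition cost table at every stage and an initial cost vector for stage 1; the array `pi`, the vector `start`, and the function names are illustrative. A brute-force search over all paths is included only to check the result on a small trellis.

```python
import itertools
import numpy as np

def optimal_path(pi, start, T):
    """Minimum-cost path through a T-stage trellis.

    pi[i, j] is the cost of going from state i to state j; start[j] is C_1(j).
    Returns (cost, path) with one state index per stage.
    """
    pi = np.asarray(pi, dtype=float)
    C = np.asarray(start, dtype=float)
    back = []
    for _ in range(T - 1):
        total = C[:, None] + pi            # total[i, j] = C_{t-1}(i) + pi_ij
        back.append(total.argmin(axis=0))  # b_t(j)
        C = total.min(axis=0)              # C_t(j)
    j = int(C.argmin())
    cost = float(C[j])
    path = [j]
    for b in reversed(back):               # follow the back-pointers from stage T to stage 1
        j = int(b[j])
        path.append(j)
    return cost, path[::-1]

def brute_force_cost(pi, start, T):
    """Cheapest path found by enumerating every path (feasible only for tiny trellises)."""
    n = len(start)
    return min(start[p[0]] + sum(pi[p[t - 1]][p[t]] for t in range(1, T))
               for p in itertools.product(range(n), repeat=T))

rng = np.random.default_rng(3)
pi = rng.uniform(0.0, 1.0, size=(3, 3))
start = rng.uniform(0.0, 1.0, size=3)
cost, path = optimal_path(pi, start, T=5)
```

The recursion touches each pair of states once per stage, so the work grows linearly with T rather than exponentially.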
Stereo Matching with
Dynamic Programming
Pseudo-code describing how to calculate the
optimal match
Stereo Matching with
Dynamic Programming
Pseudo-code describing how
to reconstruct the optimal
path
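The slide pseudo-code is not part of this transcript. The following is a sketch of a three-move scanline matcher under stated assumptions: the match cost is the squared intensity difference, each unmatched pixel costs a constant `occ`, and "left occluded" is taken to mean a left-scanline pixel with no match. All names are hypothetical.

```python
import numpy as np

def dp_scanline(left, right, occ=1.0):
    """Match two scanlines with three moves: match, skip a left pixel, skip a right pixel.

    Returns (total cost, list of matched index pairs (i, j)).
    """
    n, m = len(left), len(right)
    C = np.zeros((n + 1, m + 1))             # C[i, j]: best cost for the first i and j pixels
    move = np.zeros((n + 1, m + 1), dtype=int)
    C[:, 0] = occ * np.arange(n + 1)
    C[0, :] = occ * np.arange(m + 1)
    move[1:, 0] = 2                          # only left pixels remain: skip them
    move[0, 1:] = 3                          # only right pixels remain: skip them
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            costs = (C[i - 1, j - 1] + (left[i - 1] - right[j - 1]) ** 2,  # 1: match
                     C[i - 1, j] + occ,                                    # 2: left pixel unmatched
                     C[i, j - 1] + occ)                                    # 3: right pixel unmatched
            move[i, j] = int(np.argmin(costs)) + 1
            C[i, j] = costs[move[i, j] - 1]
    # Reconstruct the optimal path by following the stored moves back to the start.
    matches, i, j = [], n, m
    while i > 0 or j > 0:
        if move[i, j] == 1:
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i, j] == 2:
            i -= 1
        else:
            j -= 1
    return float(C[n, m]), matches[::-1]

left_line = np.array([1.0, 5.0, 9.0, 3.0])
right_line = np.array([5.0, 9.0, 3.0, 7.0])
cost, matches = dp_scanline(left_line, right_line, occ=1.0)
```

In this toy example the first left pixel and the last right pixel stay unmatched, and every match has disparity i - j = 1. Because only the three moves are allowed, the recovered matches always respect the ordering constraint.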
Results
Local errors may be propagated along a scan-line and
no inter scan-line consistency is enforced.
Assumption Behind
Segmentation-based Stereo
Depth discontinuities tend to correlate
well with color edges
Disparity variation within a segment
is small
Approximate the scene with piecewise planar surfaces
Segmentation-based stereo
A plane equation is fitted in each segment
based on an initial disparity estimate
obtained from SSD or correlation
Global matching criterion: if a depth map is
good, warping the reference image to the
other view according to this depth will
render an image that matches the real
view
Optimization by iterative neighborhood
depth hypothesizing
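The plane-fitting step can be sketched as a least-squares problem. This assumes the plane is parameterized as d = a x + b y + c over the pixels of one segment and uses ordinary least squares; a robust fit would be a natural substitute when the initial disparities contain outliers. The function name is hypothetical.

```python
import numpy as np

def fit_disparity_plane(xs, ys, ds):
    """Fit d = a * x + b * y + c to the initial disparities of one segment."""
    A = np.column_stack([xs, ys, np.ones(len(xs))])
    coeffs, *_ = np.linalg.lstsq(A, ds, rcond=None)
    return coeffs  # (a, b, c)

# Synthetic segment whose disparities lie exactly on a plane.
ys, xs = np.mgrid[0:5, 0:6]
xs, ys = xs.ravel().astype(float), ys.ravel().astype(float)
ds = 0.5 * xs - 0.25 * ys + 3.0
a, b, c = fit_disparity_plane(xs, ys, ds)
```

Replacing the per-pixel disparities of a segment with the fitted plane gives the piecewise planar approximation of the scene.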
Hypothesizing neighborhood
depth
Correct depth is propagated to reduce
fattening effect:
Hypothesizing neighborhood
depth
Background depth is hypothesized for
unmatched region:
Result
Another Result