CS 4495/7495 Slides
Stereo
Dan Kong
Stereo vision
Triangulate on two images of the same scene point to recover depth.
[Figure: left and right cameras separated by a baseline, viewing a scene point at some depth]
Camera calibration
Finding all correspondences
Computing depth or surfaces
Outline
Basic stereo equations
Constraints and assumptions
Window-based matching
Cooperative Stereo
Dynamic programming
Graph cut and Belief Propagation
Segmentation-based methods
Pinhole Camera Model
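[Figure: pinhole camera with center O, axes x, y, z, and focal length f; the scene point P(X, Y, Z) projects onto the image plane and onto the virtual image plane]
By similar triangles:
$$\frac{x}{f} = \frac{X}{Z}, \qquad \frac{y}{f} = \frac{Y}{Z} \quad\Longrightarrow\quad x = f\frac{X}{Z}, \qquad y = f\frac{Y}{Z}$$
$$(X, Y, Z) \;\mapsto\; (x, y, 1) = \left(f\frac{X}{Z},\; f\frac{Y}{Z},\; 1\right)$$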
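A minimal Python sketch of the projection above, assuming camera coordinates with the optical axis along z and a scene point in front of the camera (Z > 0). The function name `project` is illustrative, not taken from the slides.

```python
def project(X: float, Y: float, Z: float, f: float) -> tuple[float, float]:
    """Project a camera-frame point (X, Y, Z) onto the image plane at distance f."""
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    # Similar triangles: x / f = X / Z and y / f = Y / Z
    return f * X / Z, f * Y / Z


# A point 2 units right, 1 unit up and 10 units deep, with f = 0.05
x, y = project(2.0, 1.0, 10.0, 0.05)
print(x, y)
```

Scaling (X, Y, Z) by any positive factor leaves (x, y) unchanged, which is why a single image cannot recover depth and a second view is needed.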
Basic Stereo Derivations
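[Figure: two cameras with centers O1 and O2, each with axes x, y, z and focal length f, separated by the baseline B; the scene point P1(X, Y, Z) projects to p1 and p2]
Derive an expression for Z as a function of x1, x2, f, and B.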
Basic Stereo Derivations
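[Figure: the same two-camera setup, with the x coordinate of P1 measured as X1 from O1 and X1 − B from O2]
With camera 2 displaced by B along the x axis, similar triangles give
$$x_1 = f\frac{X_1}{Z_1}, \qquad x_2 = f\frac{X_1 - B}{Z_1}$$
Subtracting,
$$x_1 - x_2 = \frac{fB}{Z_1} \quad\Longrightarrow\quad Z_1 = \frac{fB}{x_1 - x_2} = \frac{fB}{d}$$
where d = x1 − x2 is the disparity.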
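The relation Z = fB/d can be checked numerically with a round trip: project a known point into both cameras, then recover its depth from the disparity. This sketch assumes the parallel-camera setup above (camera 2 offset by B along x), with f and image coordinates in pixels and B in metres; the numbers and the name `depth_from_disparity` are illustrative.

```python
def depth_from_disparity(x1: float, x2: float, f: float, B: float) -> float:
    """Z = f * B / d for a parallel stereo pair, where d = x1 - x2."""
    d = x1 - x2
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return f * B / d


f, B = 700.0, 0.12          # focal length in pixels, baseline in metres (illustrative)
X1, Z1 = 0.5, 4.0           # scene point in camera-1 coordinates
x1 = f * X1 / Z1            # image coordinate in camera 1
x2 = f * (X1 - B) / Z1      # camera 2 is offset by B along x
print(depth_from_disparity(x1, x2, f, B))  # approximately 4.0
```

Here d = 87.5 − 66.5 = 21 pixels. Because Z is inversely proportional to d, a one-pixel disparity error costs more depth accuracy for distant points than for near ones.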
Stereo Constraint
Color constancy
The color of any world point remains constant from image to image.
This assumption holds for surfaces that follow the Lambertian model.
In practice, given photometric camera
calibration and typical scenes, color
constancy holds well enough for most
stereo algorithms.
Stereo Constraint
Epipolar geometry
The epipolar geometry is the fundamental constraint in
stereo.
Rectification aligns epipolar lines with scanlines
[Figure: the epipolar plane, with the epipolar line for p and the epipolar line for p’]
Stereo Constraint
Uniqueness and Continuity
Proposed by Marr and Poggio: each item from each image may be assigned at most one disparity value, and the disparity varies smoothly almost everywhere.
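Correspondence Using Window-based Matching
[Figure: a window on a left-image scanline is compared with windows along the same scanline in the right image; the SSD error is plotted against disparity]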
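Sum of Squared (Pixel) Differences
[Figure: an m × m window wL around (xL, yL) in the left image IL and the corresponding window wR around (xL − d, yL) in the right image IR]
wL and wR are corresponding m × m windows of pixels.
We define the window function:
$$W_m(x, y) = \left\{ (u, v) \;\middle|\; x - \tfrac{m}{2} \le u \le x + \tfrac{m}{2},\; y - \tfrac{m}{2} \le v \le y + \tfrac{m}{2} \right\}$$
The SSD cost measures the intensity difference as a function of disparity:
$$C_r(x, y, d) = \sum_{(u, v) \in W_m(x, y)} \left[ I_L(u, v) - I_R(u - d, v) \right]^2$$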
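The SSD cost and a winner-take-all choice of disparity can be sketched with NumPy. Assumptions: grayscale float images stored as arrays indexed `[v, u]` (row, column), an odd window size m so the window is centred, and a search over d = 0, ..., max_disp with the right window centred at (x − d, y) as in the correspondence above. `ssd_cost` and `best_disparity_ssd` are hypothetical helper names.

```python
import numpy as np


def ssd_cost(IL: np.ndarray, IR: np.ndarray, x: int, y: int, d: int, m: int) -> float:
    """C(x, y, d): sum over the m x m window W_m(x, y) of [IL(u, v) - IR(u - d, v)]^2."""
    h = m // 2
    wL = IL[y - h:y + h + 1, x - h:x + h + 1]          # rows are v, columns are u
    wR = IR[y - h:y + h + 1, x - d - h:x - d + h + 1]  # same window shifted left by d
    return float(np.sum((wL - wR) ** 2))


def best_disparity_ssd(IL, IR, x, y, m, max_disp):
    """Winner-take-all: the disparity with the smallest SSD (window kept inside IR)."""
    h = m // 2
    candidates = [d for d in range(max_disp + 1) if x - d - h >= 0]
    costs = [ssd_cost(IL, IR, x, y, d, m) for d in candidates]
    return candidates[int(np.argmin(costs))]


# Synthetic pair: the right image is the left image shifted 3 pixels to the left.
rng = np.random.default_rng(0)
IL = rng.random((20, 40))
IR = np.roll(IL, -3, axis=1)  # IR(u, v) = IL(u + 3, v), so the true disparity is 3
print(best_disparity_ssd(IL, IR, x=20, y=10, m=5, max_disp=8))  # 3
```

Choosing the minimum-cost disparity independently at each pixel is the simplest local strategy; it is also the source of the window-based problems listed later in this section.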
Image Normalization
Even when the cameras are identical models, there can be
differences in gain and sensitivity.
The cameras do not see exactly the same surfaces, so their
overall light levels can differ.
For these reasons and more, it is a good idea to normalize
the pixels in each window:
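$$\bar{I} = \frac{1}{\lvert W_m(x, y) \rvert} \sum_{(u, v) \in W_m(x, y)} I(u, v) \qquad \text{(average pixel)}$$
$$\lVert I - \bar{I} \rVert_{W_m(x, y)} = \sqrt{\sum_{(u, v) \in W_m(x, y)} \left[ I(u, v) - \bar{I} \right]^2} \qquad \text{(window magnitude)}$$
$$\hat{I}(x, y) = \frac{I(x, y) - \bar{I}}{\lVert I - \bar{I} \rVert_{W_m(x, y)}} \qquad \text{(normalized pixel)}$$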
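A short NumPy sketch of this normalization, showing that it cancels a gain and offset difference between the two cameras. The helper name `normalize_window` is hypothetical, and the sketch assumes the window is not constant (a constant window has zero magnitude, so the normalization is undefined there).

```python
import numpy as np


def normalize_window(w: np.ndarray) -> np.ndarray:
    """Subtract the window's average pixel, then divide by the window magnitude."""
    centered = w - w.mean()                      # I(u, v) - I_bar
    magnitude = np.sqrt(np.sum(centered ** 2))   # ||I - I_bar|| over W_m(x, y)
    if magnitude == 0:
        raise ValueError("constant window: normalization is undefined")
    return centered / magnitude


w = np.array([[10.0, 20.0], [30.0, 40.0]])
brighter = 2.0 * w + 50.0   # same pattern seen with a different gain and offset
print(np.allclose(normalize_window(w), normalize_window(brighter)))  # True
print(np.linalg.norm(normalize_window(w)))                           # approximately 1.0
```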
Images as Vectors
[Figure: the m × m windows wL (left) and wR (right) are unwrapped row by row (row 1, row 2, row 3, ...) into vectors]
“Unwrap” each window to form a vector, using raster-scan order.
Each window is a vector in an m²-dimensional vector space. Normalization makes them unit length.
Normalized Correlation
[Figure: the unit vectors wL and wR(d), separated by the angle θ]
$$C_{NC}(d) = \sum_{(u, v) \in W_m(x, y)} \hat{I}_L(u, v)\, \hat{I}_R(u - d, v) = w_L \cdot w_R(d) = \cos\theta$$
Normalized Correlation
$$d^* = \arg\max_d \; w_L \cdot w_R(d)$$
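A self-contained sketch of the normalized-correlation search: each window is mean-centred, unwrapped in raster-scan order with `ravel()`, scaled to unit length, and the disparity with the largest dot product wins. The helper names are hypothetical, the same (x − d, y) window placement as the SSD example is assumed, and windows are assumed to have non-zero variance.

```python
import numpy as np


def unit_vector(w: np.ndarray) -> np.ndarray:
    """Normalize a window and unwrap it (row by row) into a unit vector."""
    v = (w - w.mean()).ravel()   # ravel() reads the rows one after another
    return v / np.linalg.norm(v)


def best_disparity_ncc(IL, IR, x, y, m, max_disp):
    """d* = argmax_d wL . wR(d), where wR(d) is the right window centred at (x - d, y)."""
    h = m // 2
    wL = unit_vector(IL[y - h:y + h + 1, x - h:x + h + 1])
    best_d, best_score = None, -np.inf
    for d in range(max_disp + 1):
        if x - d - h < 0:
            break  # the window would leave the right image
        wR = unit_vector(IR[y - h:y + h + 1, x - d - h:x - d + h + 1])
        score = float(wL @ wR)   # cosine of the angle between the two vectors
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


rng = np.random.default_rng(1)
IL = rng.random((20, 40))
IR = 0.5 * np.roll(IL, -4, axis=1) + 0.2   # true disparity 4, plus a gain/offset change
d, score = best_disparity_ncc(IL, IR, x=20, y=10, m=5, max_disp=8)
print(d, round(score, 6))
```

Unlike plain SSD, the score at the true disparity stays at (numerically) 1 despite the brightness change, because normalization removes the gain and offset.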
Results Using Window-based Method
[Figure: left image and the resulting disparity map. Images courtesy of Point Grey Research]
Stereo Results
[Figure: left image and disparity map]
Problems with Window-based Matching
Assumes that disparity is constant within the window.
Biases the results toward fronto-parallel surfaces.
Blurs across depth discontinuities.
Performs poorly in textureless regions.
Produces erroneous results in occluded regions.
Cooperative Stereo Algorithm
Based on two basic assumptions by Marr and Poggio:
Uniqueness: at most a single unique match exists for each pixel.
Continuity: disparity values are generally continuous, i.e., smooth within a local neighborhood.
Disparity Space Image (DSI)
The 3D disparity space has dimensions row r, column c, and disparity d. Each element (r, c, d) of the disparity space projects to the pixel (r, c) in the left image and to the pixel (r, c + d) in the right image.
The value stored at each DSI element represents the confidence or likelihood of that particular match.
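A NumPy sketch of building a DSI from windowed SSD costs, following this slide's convention that element (r, c, d) pairs left pixel (r, c) with right pixel (r, c + d) (the SSD slide uses xL − d instead; only the sign of d differs). Assumptions: grayscale float images, an odd window size m, zero padding at the image borders, and infinite cost where (r, c + d) falls outside the right image. Lower values mean better matches here; a likelihood-style DSI would need a decreasing transform such as exp(−cost), which is a choice this sketch does not make. Function names are hypothetical.

```python
import numpy as np


def box_sum(a: np.ndarray, m: int) -> np.ndarray:
    """Sum of a over the m x m window centred on each pixel (zero padding at the borders)."""
    h = m // 2
    p = np.pad(a, h)
    s = np.zeros(a.shape)
    for dr in range(m):
        for dc in range(m):
            s += p[dr:dr + a.shape[0], dc:dc + a.shape[1]]
    return s


def build_dsi(IL: np.ndarray, IR: np.ndarray, num_disp: int, m: int) -> np.ndarray:
    """dsi[r, c, d] = windowed SSD between left pixel (r, c) and right pixel (r, c + d)."""
    rows, cols = IL.shape
    dsi = np.empty((rows, cols, num_disp))
    for d in range(num_disp):
        diff2 = np.zeros((rows, cols))
        diff2[:, :cols - d] = (IL[:, :cols - d] - IR[:, d:]) ** 2
        cost = box_sum(diff2, m)
        cost[:, cols - d:] = np.inf   # (r, c + d) lies outside the right image
        dsi[:, :, d] = cost
    return dsi


rng = np.random.default_rng(2)
IL = rng.random((16, 30))
IR = np.roll(IL, 2, axis=1)          # IR(r, c + 2) = IL(r, c): true disparity 2
dsi = build_dsi(IL, IR, num_disp=5, m=3)
print(dsi.shape, dsi[8, 10].argmin())  # (16, 30, 5) 2
```

Each (c, d) slice at a fixed row, like the one illustrated for r = 151, is simply `dsi[r, :, :]`.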
Illustration of DSI
[Figure: (r, c) slices of the DSI for different d, and the (c, d) slice for r = 151]
Definition
L_n(r, c, d): match value assigned to element (r, c, d) at iteration n
L_0(r, c, d): initial values computed from SSD or NCC
Ψ(r, c, d): inhibition area for element (r, c, d)
Φ(r, c, d): local support area for element (r, c, d)
Illustration of Inhibitory and Support Regions
Iterative Updating of the DSI
[Figure: update equations (1) to (4)]
Explicit Detection of Occlusion
Identify occlusions by examining the magnitude of the converged values in conjunction with the uniqueness constraint.
Summary of Cooperative Stereo
Prepare a 3D array, (r, c, d): (r, c) for each pixel in
the reference image and d for the range of disparity.
Set initial match values L0 using a function of image
intensities, such as normalized correlation or SSD.
Iteratively update the match values L_n using (4) until they converge.
For each pixel (r, c), find the element (r, c, d) with the maximum match value.
If the maximum match value is higher than a threshold, output the disparity d; otherwise, declare an occlusion.
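The summary can be sketched in NumPy. The exact update rule is an assumption here, modeled on Zitnick and Kanade's cooperative algorithm: the support S_n sums L_n over a local box Φ, each element is inhibited by the total support of the elements that share its left pixel (r, c) or its right pixel (r, c + d) (the set Ψ), and L_{n+1} = L_0 · (S_n / Σ_Ψ S_n)^α. The box size (here spanning only r and c), α, the iteration count, and the threshold are illustrative; L_0 must be strictly positive. All function names are hypothetical.

```python
import numpy as np


def box_sum_3d(a: np.ndarray, rr: int, rc: int, rd: int) -> np.ndarray:
    """Local support: sum of a over a (2rr+1) x (2rc+1) x (2rd+1) box (zero padded)."""
    p = np.pad(a, ((rr, rr), (rc, rc), (rd, rd)))
    R, C, D = a.shape
    out = np.zeros(a.shape)
    for i in range(2 * rr + 1):
        for j in range(2 * rc + 1):
            for k in range(2 * rd + 1):
                out += p[i:i + R, j:j + C, k:k + D]
    return out


def inhibition_sum(S: np.ndarray) -> np.ndarray:
    """Sum of S over elements sharing left pixel (r, c) or right pixel (r, c + d)."""
    R, C, D = S.shape
    left = S.sum(axis=2, keepdims=True)            # every d for the same (r, c)
    by_right_col = np.zeros((R, C + D))            # totals indexed by right column c + d
    for d in range(D):
        by_right_col[:, d:d + C] += S[:, :, d]
    right = np.stack([by_right_col[:, d:d + C] for d in range(D)], axis=2)
    return left + right - S                        # (r, c, d) itself is counted once


def cooperative_stereo(L0, iters=10, alpha=2.0, support=(1, 1, 0), threshold=0.5):
    """Assumed update L_{n+1} = L0 * (S_n / sum over Psi of S_n)^alpha; needs L0 > 0."""
    L = L0.copy()
    for _ in range(iters):
        S = box_sum_3d(L, *support)                # support S_n over the box Phi
        L = L0 * (S / inhibition_sum(S)) ** alpha  # inhibition over Psi
    best = L.argmax(axis=2)
    occluded = L.max(axis=2) < threshold           # weak maximum: declare an occlusion
    return np.where(occluded, -1, best)            # -1 marks an occluded pixel


rng = np.random.default_rng(3)
L0 = 0.05 + 0.1 * rng.random((12, 15, 6))          # weak, ambiguous evidence everywhere
L0[:, :12, 2] = 0.9 + 0.05 * rng.random((12, 12))  # strong evidence for d = 2 in columns 0 to 11
print(cooperative_stereo(L0))
```

Because the inhibition ratio never exceeds 1, L_{n+1} ≤ L_0: a pixel whose initial match values are all below the threshold (columns 12 to 14 above) is always declared occluded, while an unambiguous match converges toward its initial value as its competitors are suppressed.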
MRF Stereo Model
Local evidence function φ(x_p, y_p): L × 1 vector
Compatibility function ψ(x_p, x_n): L × L matrix
where L is the number of disparity labels.
Disparity Optimization
Joint probability of the MRF:
$$P(x_1, x_2, \ldots, x_N, y_1, y_2, \ldots, y_N) \propto \prod_{(i, j)} \psi(x_i, x_j) \prod_{p} \phi(x_p, y_p) \qquad (1)$$
The disparity optimization step requires choosing an estimator for x_1, ..., x_N:
MMSE: estimate each x_i as the mean of its marginal distribution.
MAP: choose the labeling x_1, ..., x_N that maximizes the joint probability above.
Equivalence to Energy Minimization
Taking the negative log of equation 1 (and dropping the normalizing constant):
$$E(x_1, x_2, \ldots, x_N, y_1, y_2, \ldots, y_N) = -\sum_{(i, j)} \log \psi(x_i, x_j) - \sum_{p} \log \phi(x_p, y_p) \qquad (2)$$
In graph cuts, equation 2 is expressed as:
$$E(x_1, x_2, \ldots, x_N, y_1, y_2, \ldots, y_N) = \sum_{(i, j)} V(x_i, x_j) + \sum_{p} D(x_p, y_p) \qquad (3)$$
Maximizing the probability in equation 1 is equivalent to minimizing the energy in equation 3.
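A brute-force check of this equivalence on a toy MRF, assuming three pixels on a chain, four labels, random local evidence φ, and a compatibility ψ that favours equal neighbouring labels (all numbers are illustrative). Enumerating every labeling shows that the labeling with the largest unnormalized joint probability is the one with the smallest energy when D = −log φ and V = −log ψ.

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(4)
N, L = 3, 4                                       # three pixels on a chain, four labels
phi = rng.random((N, L)) + 0.1                    # local evidence phi(x_p, y_p), one row per pixel
psi = np.where(np.eye(L, dtype=bool), 1.0, 0.3)   # compatibility psi(x_i, x_j): favours equal labels
edges = [(0, 1), (1, 2)]


def joint(x):
    """Unnormalized joint probability: prod of psi over edges times prod of phi over pixels."""
    evidence = math.prod(phi[p, x[p]] for p in range(N))
    return evidence * math.prod(psi[x[i], x[j]] for i, j in edges)


def energy(x):
    """E = sum of V(x_i, x_j) over edges + sum of D(x_p) over pixels, V = -log psi, D = -log phi."""
    V = sum(-math.log(psi[x[i], x[j]]) for i, j in edges)
    D = sum(-math.log(phi[p, x[p]]) for p in range(N))
    return V + D


labelings = list(itertools.product(range(L), repeat=N))
x_map = max(labelings, key=joint)    # MAP: largest joint probability
x_min = min(labelings, key=energy)   # smallest energy
print(x_map, x_min, x_map == x_min)
```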
Stereo Matching Using Belief Propagation
Belief propagation is an iterative inference algorithm that propagates messages in the Markov network.
m_{st}(x_s, x_t): message that node x_s sends to x_t
m_s(x_s, y_s): message that observed node y_s sends to x_s
b_s(x_s): belief at node x_s
We simplify m_{st}(x_s, x_t) as m_{st}(x_t), and m_s(x_s, y_s) as m_s(x_s).
Belief Propagation Algorithm
Initialize all messages to the uniform distribution.
Iteratively update the messages for t = 1, ..., T.
Compute the belief at each node and output the disparity.
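The message updates are not written out above. The sketch below assumes max-product belief propagation carried out in the negative-log domain (where it becomes min-sum), with data costs D = −log φ and smoothness costs V = −log ψ. To stay short it runs on a single scanline, i.e. a chain, where BP is exact; on the 2D pixel grid used for stereo the same updates run on a loopy graph and are approximate. Messages start uniform (all zeros in this domain) and are updated synchronously for T iterations. `bp_chain` is a hypothetical name.

```python
import numpy as np


def bp_chain(D: np.ndarray, V: np.ndarray, T: int) -> np.ndarray:
    """Max-product BP in -log form (min-sum) on a chain of N nodes with L labels.

    D[p, l] : data cost -log phi for label l at node p
    V[k, l] : smoothness cost -log psi for label k at a node and label l at its right neighbour
    Returns, for each node, the label that minimizes its -log belief after T iterations.
    """
    N, L = D.shape
    from_left = np.zeros((N, L))    # from_left[p]: message from node p-1 to node p
    from_right = np.zeros((N, L))   # from_right[p]: message from node p+1 to node p
    for _ in range(T):
        new_left = np.zeros((N, L))
        new_right = np.zeros((N, L))
        for p in range(1, N):
            # Node p-1's data cost plus its other incoming message, minimized over x_{p-1}
            h = D[p - 1] + from_left[p - 1]
            new_left[p] = (h[:, None] + V).min(axis=0)
        for p in range(N - 1):
            h = D[p + 1] + from_right[p + 1]
            new_right[p] = (h[:, None] + V.T).min(axis=0)
        # Normalize (subtract the minimum in -log form) so messages stay bounded
        from_left = new_left - new_left.min(axis=1, keepdims=True)
        from_right = new_right - new_right.min(axis=1, keepdims=True)
    belief = D + from_left + from_right   # -log b_s(x_s), up to a constant
    return belief.argmin(axis=1)


rng = np.random.default_rng(5)
N, L = 6, 4
D = rng.random((N, L))
V = 0.5 * (1.0 - np.eye(L))   # Potts smoothness: 0 for equal labels, 0.5 otherwise
print(bp_chain(D, V, T=N))
```

On a chain of N nodes, N − 1 synchronous iterations are enough for every message to reach its final value, so T = N recovers the exact MAP labeling.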
Illustration of BP
BP Results
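Stereo as a Pixel-Labeling Problem
Let P be a set of pixels and L a set of labels. The goal is to find a labeling f that minimizes some energy. For stereo, the labels are disparities.
The classic form of the energy function is:
$$E(f) = \sum_{p \in P} D_p(f_p) + \sum_{\{p, q\} \in \mathcal{N}} V_{p,q}(f_p, f_q)$$
where $\mathcal{N}$ is the set of neighboring pixel pairs.
The data term D_p(f_p) measures how appropriate label f_p is for pixel p given the observed data. In stereo, this term corresponds to the match cost or likelihood.
The smoothness term V_{p,q}(f_p, f_q) encodes the prior, or smoothness constraint. In stereo, the so-called Potts model is used:
$$V_{p,q}(f_p, f_q) = \begin{cases} 0 & \text{if } f_p = f_q \\ \rho_I(\Delta I) & \text{otherwise} \end{cases}$$
where ρ_I(ΔI) is a penalty that depends on the intensity difference ΔI between pixels p and q.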
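A sketch of evaluating this energy on a 4-connected pixel grid. The data term is read from a cost volume indexed `[r, c, d]`; the penalty for differing labels is a hypothetical choice ρ_I = lam / (1 + |I_p − I_q| / K), which makes label changes cheaper across strong intensity edges. The slides do not fix the form of ρ_I, so `lam`, `K`, and the function name are assumptions.

```python
import numpy as np


def stereo_energy(f: np.ndarray, cost: np.ndarray, I: np.ndarray,
                  lam: float = 1.0, K: float = 10.0) -> float:
    """E(f) = sum_p D_p(f_p) + sum over 4-connected pairs {p, q} of V_pq(f_p, f_q).

    f[r, c]       : integer disparity label of each pixel
    cost[r, c, d] : data term D_p(d), e.g. an SSD matching cost
    I[r, c]       : intensity image used by the smoothness penalty
    """
    r, c = np.indices(f.shape)
    data = cost[r, c, f].sum()
    smooth = 0.0
    pairs = [(f[:, :-1], f[:, 1:], I[:, :-1], I[:, 1:]),   # horizontal neighbours
             (f[:-1, :], f[1:, :], I[:-1, :], I[1:, :])]   # vertical neighbours
    for fp, fq, Ip, Iq in pairs:
        # Hypothetical rho_I: full cost lam in flat regions, smaller across strong edges
        rho = lam / (1.0 + np.abs(Ip - Iq) / K)
        smooth += np.sum(np.where(fp != fq, rho, 0.0))   # Potts: zero when labels agree
    return float(data + smooth)


f = np.array([[0, 1], [0, 0]])
cost = np.zeros((2, 2, 2))
I = np.zeros((2, 2))
print(stereo_energy(f, cost, I))  # 2.0: two neighbouring pairs disagree, each costing lam = 1
```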
Two Energy Minimization Algorithms via Graph Cuts
α-β swap algorithm
Two Energy Minimization Algorithms via Graph Cuts
α-expansion algorithm
[Figure: example moves]
Graph Cuts Results
[Figure: results from graph cuts and from belief propagation]
Ordering Constraint
If an object a is to the left of an object b in the left image, then object a will also appear to the left of object b in the right image.
[Figure: the ordering constraint… and its failure]
Stereo Correspondences
[Figure: corresponding left and right scanlines]
Match intensities sequentially between the two scanlines.
Stereo Correspondences
[Figure: left and right scanlines with matched segments, a left occlusion, and a right occlusion]
Search Over Correspondences
[Figure: left and right scanlines, with left occluded pixels and right occluded pixels marked]
Three cases:
Sequential – cost of match
Left occluded – cost of no match
Right occluded – cost of no match
Standard 3-move Dynamic Programming for Stereo
[Figure: grid with the left scanline along one axis and the right scanline along the other; a path runs from Start to End, and the steps that skip pixels mark left occluded and right occluded pixels]
Dynamic programming yields the optimal path through the grid. This is the best set of matches that satisfies the ordering constraint.
Dynamic Programming
Efficient algorithm for solving sequential decision
(optimal path) problems.
[Figure: trellis with states i = 1, 2, 3 at each stage t = 1, 2, 3, ..., T]
How many paths through this trellis? $3^T$
Dynamic Programming
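[Figure: trellis with states i = 1, 2, 3 at stages t − 1, t, t + 1 (accumulated costs C_{t−1}, C_t, C_{t+1}); the transitions into state 2 carry the costs π12, π22, π32]
Suppose the cost can be decomposed into stages:
$\pi_{ij}$ = cost of going from state i to state j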
Dynamic Programming
[Figure: the same trellis, showing the transitions from each state i at stage t − 1 into state j = 2 at stage t]
Principle of optimality for an n-stage assignment problem:
$$C_t(j) = \min_i \left( \pi_{ij} + C_{t-1}(i) \right)$$
Dynamic Programming
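[Figure: the same trellis, with the backpointer b_t(2) marking the best predecessor of state 2 at stage t]
$$C_t(j) = \min_i \left( \pi_{ij} + C_{t-1}(i) \right)$$
$$b_t(j) = \arg\min_i \left( \pi_{ij} + C_{t-1}(i) \right)$$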
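A NumPy sketch of this recursion with backpointers, assuming 0-based state indices, a stage-independent transition cost matrix `pi[i, j]`, and a start cost `C1[i]` for each state (the slides number states from 1 and do not show start costs). It does O(T·S²) work for S states instead of enumerating all S^T paths. `trellis_dp` is a hypothetical name.

```python
import numpy as np


def trellis_dp(C1: np.ndarray, pi: np.ndarray, T: int):
    """Minimum-cost path through a trellis with S states per stage and T stages.

    C1[i]    : cost of being in state i at the first stage
    pi[i, j] : cost of going from state i to state j (the same at every stage here)
    """
    S = len(C1)
    C = C1.copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        total = pi + C[:, None]          # total[i, j] = pi_ij + C_{t-1}(i)
        back[t] = total.argmin(axis=0)   # b_t(j) = argmin_i (pi_ij + C_{t-1}(i))
        C = total.min(axis=0)            # C_t(j) = min_i (pi_ij + C_{t-1}(i))
    # Backtrack from the cheapest final state using the stored b_t(j)
    path = [int(C.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return float(C.min()), path[::-1]


rng = np.random.default_rng(6)
C1 = rng.random(3)
pi = rng.random((3, 3))
print(trellis_dp(C1, pi, T=5))
```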
Stereo Matching with Dynamic Programming
Pseudo-code describing how to calculate the
optimal match
Stereo Matching with Dynamic Programming
Pseudo-code describing how
to reconstruct the optimal
path
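Both passes (filling the cost table and reconstructing the optimal path) can be sketched in Python for the standard 3-move formulation. Assumptions: a squared-intensity-difference match cost and a constant occlusion cost `occ` (both illustrative choices); each match is converted into a disparity d = xL − xR, and unmatched left pixels get −1. `dp_scanline` is a hypothetical name.

```python
import numpy as np


def dp_scanline(left: np.ndarray, right: np.ndarray, occ: float) -> np.ndarray:
    """Match two scanlines under the ordering constraint; return a disparity per left pixel.

    Moves: match (i, j), leave left pixel i unmatched, or leave right pixel j unmatched.
    """
    n, m = len(left), len(right)
    C = np.zeros((n + 1, m + 1))                 # C[i, j]: best cost for left[:i] vs right[:j]
    move = np.zeros((n + 1, m + 1), dtype=int)   # 0 = match, 1 = skip left, 2 = skip right
    C[1:, 0] = occ * np.arange(1, n + 1)
    move[1:, 0] = 1
    C[0, 1:] = occ * np.arange(1, m + 1)
    move[0, 1:] = 2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = (C[i - 1, j - 1] + (left[i - 1] - right[j - 1]) ** 2,  # match
                       C[i - 1, j] + occ,                                  # left pixel unmatched
                       C[i, j - 1] + occ)                                  # right pixel unmatched
            move[i, j] = int(np.argmin(options))
            C[i, j] = options[move[i, j]]
    # Backtrack from the end of both scanlines to the start
    disp = np.full(n, -1)
    i, j = n, m
    while i > 0 or j > 0:
        if move[i, j] == 0:
            disp[i - 1] = (i - 1) - (j - 1)      # d = xL - xR
            i, j = i - 1, j - 1
        elif move[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return disp


left = np.array([12, 47, 3, 88, 25, 61, 34, 79, 50, 16], dtype=float)
right = np.array([3, 88, 25, 61, 34, 79, 50, 16, 95, 70], dtype=float)  # left shifted by 2
print(dp_scanline(left, right, occ=1.0))  # left pixels 0 and 1 unmatched (-1), the rest disparity 2
```

Each scanline is solved independently, which is why errors can propagate along a scanline and nothing enforces consistency between neighbouring scanlines.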
Results
Local errors may propagate along a scanline, and no inter-scanline consistency is enforced.
Assumptions Behind Segmentation-based Stereo
Depth discontinuities tend to correlate well with color edges.
Disparity variation within a segment is small.
Approximate the scene with piecewise planar surfaces.
Segmentation-based stereo
A plane equation is fitted in each segment based on an initial disparity estimate obtained with SSD or correlation.
Global matching criterion: if a depth map is good, warping the reference image to the other view according to this depth will render an image that matches the real view.
Optimization by iterative neighborhood
depth hypothesizing
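The per-segment plane fit can be sketched as a least-squares problem, modeling disparity as d = a·x + b·y + c over a segment's pixels. For a planar scene surface seen by a rectified pair, 1/Z and hence disparity are affine in image position, which motivates this model. The sketch assumes the segment's pixel coordinates and initial (e.g. SSD-based) disparities are already available as arrays; plain least squares is sensitive to outliers in those initial estimates, so a robust fit may be preferable in practice. Function names and data are hypothetical.

```python
import numpy as np


def fit_plane(xs: np.ndarray, ys: np.ndarray, ds: np.ndarray) -> np.ndarray:
    """Least-squares fit of d = a*x + b*y + c to the initial disparities of one segment."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])   # one row [x, y, 1] per pixel
    coeffs, *_ = np.linalg.lstsq(A, ds, rcond=None)
    return coeffs                                     # array [a, b, c]


def plane_disparities(coeffs: np.ndarray, xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Replace the segment's noisy disparities with values on the fitted plane."""
    a, b, c = coeffs
    return a * xs + b * ys + c


rng = np.random.default_rng(8)
xs = rng.uniform(0, 50, 200)                                # pixel columns inside a segment
ys = rng.uniform(0, 30, 200)                                # pixel rows inside the segment
ds = 0.1 * xs - 0.05 * ys + 12 + rng.normal(0, 0.2, 200)    # noisy initial estimates
coeffs = fit_plane(xs, ys, ds)
print(coeffs)   # close to [0.1, -0.05, 12]
```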
Hypothesizing Neighborhood Depth
Correct depth is propagated to reduce the fattening effect:
Hypothesizing Neighborhood Depth
Background depth is hypothesized for unmatched regions:
Result
Another Result