#### Transcript Lecture 23 - Stereo and Projective Structure from Motion

```
04/13/10
Stereo and Projective Structure from Motion
Computer Vision
CS 543 / ECE 549
University of Illinois
Derek Hoiem
Many slides adapted from Lana Lazebnik, Silvio Savarese, Steve Seitz
This class
• Recap of epipolar geometry
• Recovering structure
– Generally, how can we estimate 3D positions for
matched points in two images? (triangulation)
– If we have a moving camera, how can we recover
3D points? (projective structure from motion)
– If we have a calibrated stereo pair, how can we get
dense depth estimates? (stereo fusion)
Basic Questions
• Why can’t we get depth if the camera doesn’t
translate?
• Why can’t we get a nice panorama if the
camera does translate?
Recap: Epipoles
• Point x in left image corresponds to epipolar line l’ in right
image
• Epipolar line passes through the epipole (the intersection of
the cameras’ baseline with the image plane)
Recap: Fundamental Matrix
• Fundamental matrix maps from a point in one
image to a line in the other
• If x and x’ correspond to the same 3D point X: $x'^\top F x = 0$
Recap: Automatically Relating Projections
Assume we have matched points x ↔ x’ with outliers
Homography (No Translation)
• Correspondence Relation: $x' \cong Hx \;\Leftrightarrow\; x' \times Hx = 0$
1. Normalize image coordinates: $\tilde{x} = Tx$, $\tilde{x}' = T'x'$
2. RANSAC with 4 points
3. De-normalize: $H = T'^{-1} \tilde{H} T$
Fundamental Matrix (Translation)
• Correspondence Relation: $x'^\top F x = 0$
1. Normalize image coordinates: $\tilde{x} = Tx$, $\tilde{x}' = T'x'$
2. RANSAC with 8 points
3. Enforce $\det \tilde{F} = 0$ by SVD
4. De-normalize: $F = T'^\top \tilde{F} T$
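Recap
• We can get projection matrices P and P’ up to a projective ambiguity
$P = [I \mid 0], \quad P' = \big[[e']_\times F \mid e'\big], \quad e'^\top F = 0$
See HZ p. 255-256
• Code:
function P = vgg_P_from_F(F)
% Second canonical camera P' = [[e']_x F | e'] (the first camera is [I | 0]).
[U, ~, ~] = svd(F);
e = U(:,3);                  % left null vector of F, so e'^T F = 0
P = [-vgg_contreps(e)*F e];  % -vgg_contreps(e) is the cross-product matrix [e']_x (VGG toolbox)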
Recap
• Fundamental matrix song
Triangulation: Linear Solution
• Generally, rays Cx and C’x’ will not exactly intersect
• Can solve via SVD, finding a least squares solution to a system of equations
[Figure: rays through x and x’ passing near the 3D point X]
$x \times PX = 0, \quad x' \times P'X = 0$
$AX = 0, \quad A = \begin{bmatrix} u\,p_3^\top - p_1^\top \\ v\,p_3^\top - p_2^\top \\ u'\,p_3'^\top - p_1'^\top \\ v'\,p_3'^\top - p_2'^\top \end{bmatrix}$
Triangulation: Linear Solution
Given P, P’, x, x’
1. Precondition points and projection matrices
2. Create matrix A
3. [U, S, V] = svd(A)
4. X = V(:, end)
where $x = (u, v, 1)^\top$, $x' = (u', v', 1)^\top$, $P = \begin{bmatrix} p_1^\top \\ p_2^\top \\ p_3^\top \end{bmatrix}$ (likewise $P'$), and
$A = \begin{bmatrix} u\,p_3^\top - p_1^\top \\ v\,p_3^\top - p_2^\top \\ u'\,p_3'^\top - p_1'^\top \\ v'\,p_3'^\top - p_2'^\top \end{bmatrix}$
Pros and Cons
• Works for any number of corresponding images
• Not projectively invariant
Code: http://www.robots.ox.ac.uk/~vgg/hzbook/code/vgg_multiview/vgg_X_from_xP_lin.m
Triangulation: Non-linear Solution
• Minimize reprojection error while satisfying $\hat{x}'^\top F \hat{x} = 0$
• Solution is a degree-6 polynomial in t, minimizing
$d(x, \hat{x})^2 + d(x', \hat{x}')^2$
[Figure: 3D point Xj projected by cameras P1, P2, P3 to image points x1j, x2j, x3j]
Slides from Lana Lazebnik
Projective structure from motion
• Given: m images of n fixed 3D points
• $x_{ij} = P_i X_j$, $i = 1, \dots, m$, $j = 1, \dots, n$
• Problem: estimate m projection matrices $P_i$ and n 3D points $X_j$ from the mn corresponding points $x_{ij}$
• With no calibration info, cameras and points can only be recovered up to a 4×4 projective transformation Q:
• $X \to QX$, $P \to PQ^{-1}$
• We can solve for structure and motion when
• $2mn \ge 11m + 3n - 15$ (each of the mn image points gives 2 equations; each camera has 11 and each point 3 degrees of freedom, minus 15 for Q)
• For two cameras, at least 7 points are needed
Sequential structure from motion
• Initialize motion from two images using fundamental matrix
• Initialize structure by triangulation
• For each additional view:
– Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration
– Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation
• Refine structure and motion: bundle adjustment
[Figure: cameras and reconstructed points]
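Bundle adjustment
• Non-linear method for refining structure and motion
• Minimizing reprojection error
$E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D\big(x_{ij}, P_i X_j\big)^2$
[Figure: point Xj reprojected as P1Xj, P2Xj, P3Xj near the observed x1j, x2j, x3j in cameras P1, P2, P3]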
Self-calibration
• Self-calibration (auto-calibration) is the process of
determining intrinsic camera parameters directly
from uncalibrated images
• For example, when the images are acquired by a
single moving camera, we can use the constraint
that the intrinsic parameter matrix remains fixed
for all the images
– Compute initial projective reconstruction and find 3D
projective transformation matrix Q such that all
camera matrices are in the form Pi = K [Ri | ti]
• Can use constraints on the form of the calibration
matrix: zero skew
Summary so far
• From two images, we can:
– Recover fundamental matrix F
– Recover canonical cameras P and P’ from F
– Estimate 3D positions X for corresponding points x and x’
• For a moving camera, we can:
– Initialize by computing F, P, X for two images
– Sequentially add new images, computing new P and new or refined X
– Auto-calibrate assuming a fixed calibration matrix, to find Q such that every Pi = K [Ri | ti]
Photosynth
Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring
photo collections in 3D," SIGGRAPH 2006
http://photosynth.net/
3D from multiple images
Building Rome in a Day: Agarwal et al. 2009
Plug: Steve Seitz Talk
• Steve Seitz will talk about “Reconstructing the
World from Photos on the Internet”
– Monday, April 26th, 4pm in Siebel Center
Special case: Dense binocular stereo
• Fuse a calibrated binocular stereo pair to
produce a depth image
[Figure: image 1 and image 2 fused into a dense depth map]
Many of these slides adapted from
Steve Seitz and Lana Lazebnik
Basic stereo matching algorithm
• For each pixel in the first image
– Find corresponding epipolar line in the right image
– Examine all pixels on the epipolar line and pick the best match
– Triangulate the matches to get depth information
• Simplest case: epipolar lines are scanlines
– When does this happen?
Simplest Case: Parallel images
• Image planes of cameras are
parallel to each other and to
the baseline
• Camera centers are at same
height
• Focal lengths are the same
• Then, epipolar lines fall along
the horizontal scan lines of the
images
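Special case of fundamental matrix
Epipolar constraint: $x'^\top E x = 0$, $E = [t]_\times R$
With $R = I$ and $t = (T, 0, 0)$:
$E = [t]_\times R = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -T \\ 0 & T & 0 \end{bmatrix}$
$\begin{pmatrix} u' & v' & 1 \end{pmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -T \\ 0 & T & 0 \end{bmatrix} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = 0 \;\Rightarrow\; \begin{pmatrix} u' & v' & 1 \end{pmatrix} \begin{pmatrix} 0 \\ -T \\ Tv \end{pmatrix} = 0 \;\Rightarrow\; Tv' = Tv$
The y-coordinates of corresponding points are the same!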
Depth from disparity
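[Figure: point X at depth z seen from camera centers O and O’ (baseline B, focal length f), projecting to x and x’]
$\text{disparity} = x - x' = \dfrac{B \cdot f}{z}$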
Disparity is inversely proportional to depth!
Stereo image rectification
• Reproject image planes
onto a common plane
parallel to the line
between optical centers
• Pixel motion is horizontal
after this transformation
• Two homographies (3x3
transform), one for each
input image reprojection
C. Loop and Z. Zhang. Computing
Rectifying Homographies for Stereo
Vision. IEEE Conf. Computer Vision
and Pattern Recognition, 1999.
Rectification example
Basic stereo matching algorithm
• If necessary, rectify the two stereo images to transform
epipolar lines into scanlines
• For each pixel x in the first image
– Find corresponding epipolar scanline in the right image
– Examine all pixels on the scanline and pick the best match x’
– Compute disparity x − x’ and set depth(x) ∝ 1/(x − x’) (more precisely B·f/(x − x’))
Correspondence search
[Figure: left and right scanlines; matching cost as a function of disparity]
• Slide a window along the right scanline and
compare contents of that window with the
reference window in the left image
• Matching cost: SSD (sum of squared differences) or normalized correlation
Correspondence search
[Figure: left and right scanlines; SSD cost along the scanline]
Correspondence search
[Figure: left and right scanlines; normalized correlation along the scanline]
Effect of window size
[Figure: disparity maps for window sizes W = 3 and W = 20]
– Smaller window: + more detail, – more noise
– Larger window: + smoother disparity maps, – less detail
Failures of correspondence search
Textureless surfaces
Occlusions, repetition
Non-Lambertian surfaces, specularities
Results with window search
[Figure: data, window-based matching result, and ground truth]
How can we improve window-based
matching?
• So far, matches are independent for each
point
• What constraints or priors can we add?
[Figure: example scene where the ordering constraint doesn’t hold]
Non-local constraints
• Uniqueness
– For any point in one image, there should be at most one
matching point in the other image
• Ordering
– Corresponding points should be in the same order in both
views
• Smoothness
– We expect disparity values to change slowly (for the most
part)
Stereo matching as energy minimization
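[Figure: window W1(i) in I1 matched to window W2(i + D(i)) in I2 under disparity map D]
$E = \alpha E_{data}(I_1, I_2, D) + \beta E_{smooth}(D)$
$E_{data} = \sum_i \big(W_1(i) - W_2(i + D(i))\big)^2$
$E_{smooth} = \sum_{\text{neighbors } i, j} \rho\big(D(i) - D(j)\big)$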
• Energy functions of this form can be minimized
using graph cuts
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization
via Graph Cuts, PAMI 2001
Many of these constraints can be encoded in an energy
function and solved using graph cuts
Graph cuts
[Figure: graph cuts result compared with ground truth]
For the latest and greatest: http://www.middlebury.edu/stereo/
Summary
• Recap of epipolar geometry
– Epipoles are the intersections of the baseline with the image planes
– Matching point in second image is on a line passing through its epipole
– Fundamental matrix maps from a point in one image to an epipolar line in the other
– Can recover canonical camera matrices from F (with projective ambiguity)
• Recovering structure
– Triangulation to recover the 3D position of a pair of matched points in images with known projection matrices
– Sequential algorithm to recover structure from a moving camera, followed by auto-calibration by assuming fixed K
– Get depth from a stereo pair by rectifying via homographies and searching along scanlines to match; depth is inversely proportional to disparity
Next class
• KLT tracking
• Elegant SFM method using tracked points,
assuming orthographic projection
• Optical flow
```
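For the "Recap: Fundamental Matrix" slide, here is a small NumPy check. It assumes a hypothetical camera pair P = [I | 0] and P′ = [R | t] and builds F with the standard relation F = [e′]× P′ P⁺, where P⁺ is the pseudo-inverse of P and e′ = P′C is the image of the first camera center C. It then confirms that x′ᵀFx vanishes for projected points. The helper names `skew` and `fundamental_from_cameras` are illustrative and do not come from the lecture.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F = [e']_x P' P^+ for two 3x4 cameras P = P1 and P' = P2."""
    _, _, Vt = np.linalg.svd(P1)
    C = Vt[-1]                       # camera center of P1 (right null vector, homogeneous)
    e2 = P2 @ C                      # epipole e' = image of C in the second view
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

# Hypothetical camera pair: P = [I | 0], P' = [R | t] (small rotation about y, sideways shift).
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([[1.0], [0.2], [0.0]])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, t])
F = fundamental_from_cameras(P1, P2)

# Project random 3D points in front of both cameras and evaluate x'^T F x per correspondence.
rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(-1, 1, (2, 5)), rng.uniform(4, 8, (1, 5)), np.ones((1, 5))])
x1 = P1 @ X
x2 = P2 @ X
x1 = x1 / x1[2]
x2 = x2 / x2[2]
residuals = np.einsum('ij,ik,kj->j', x2, F, x1)   # x2_j^T F x1_j for each column j
print(np.abs(residuals).max())                     # should be at floating-point round-off level
```

F built this way also has rank 2 (det F ≈ 0), because [e′]× has rank 2.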
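The Fundamental Matrix column of "Recap: Automatically Relating Projections" can be sketched as a normalized 8-point estimator. This minimal sketch assumes noise-free correspondences and omits the RANSAC loop of step 2; in practice the estimator would run on random 8-point samples, each scored by its inlier count. The normalization (centroid at the origin, mean distance √2) follows Hartley's usual choice. The function names and the synthetic camera setup are hypothetical.

```python
import numpy as np

def normalize_points(pts):
    """Return homogeneous points moved so their centroid is at the origin and their mean
    distance from it is sqrt(2), plus the 3x3 similarity T that does this (pts: (N, 2))."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - centroid, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T

def eight_point(x1, x2):
    """Normalized 8-point estimate of F such that x2^T F x1 = 0 (x1, x2: (N, 2), N >= 8)."""
    # 1. Normalize image coordinates: x~ = T x, x~' = T' x'.
    p1, T1 = normalize_points(x1)
    p2, T2 = normalize_points(x2)
    # Each correspondence gives one linear equation in the 9 entries of F~ (row-major);
    # the least-squares solution is the right singular vector of the smallest singular value.
    # (Step 2, RANSAC, would call this on random 8-point samples; it is omitted here.)
    A = np.column_stack([p2[:, [0]] * p1, p2[:, [1]] * p1, p2[:, [2]] * p1])
    _, _, Vt = np.linalg.svd(A)
    F_tilde = Vt[-1].reshape(3, 3)
    # 3. Enforce det(F~) = 0 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F_tilde)
    S[2] = 0.0
    F_tilde = U @ np.diag(S) @ Vt
    # 4. De-normalize: F = T'^T F~ T.
    F = T2.T @ F_tilde @ T1
    return F / np.linalg.norm(F)

# Usage on noise-free synthetic correspondences (hypothetical intrinsics K and pose R, t).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.0])
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 8, 20)])

def project(P, X):
    """Project (N, 3) points with a 3x4 camera and return (N, 2) pixel coordinates."""
    xh = np.column_stack([X, np.ones(len(X))]) @ P.T
    return xh[:, :2] / xh[:, 2:]

x1 = project(K @ np.hstack([np.eye(3), np.zeros((3, 1))]), X)
x2 = project(K @ np.column_stack([R, t]), X)
F = eight_point(x1, x2)
```

On exact data the estimate matches K⁻ᵀ[t]×RK⁻¹ up to scale and sign. With real, noisy matches, the normalization is what keeps the linear system well conditioned.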
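The `vgg_P_from_F` code on the recap slide needs the VGG MATLAB toolbox for `vgg_contreps`. Below is a self-contained NumPy sketch of the same construction, P = [I | 0] and P′ = [[e′]× F | e′]. The names `skew` and `cameras_from_F` are hypothetical, and the usage lines build an arbitrary rank-2 F for illustration.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    """Canonical cameras P = [I | 0] and P' = [[e']_x F | e'], where e'^T F = 0."""
    U, _, _ = np.linalg.svd(F)
    e2 = U[:, 2]                                   # left null vector of F (unit norm)
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.column_stack([skew(e2) @ F, e2])
    return P1, P2

# Usage with an arbitrary rank-2 F built from random orthogonal factors.
rng = np.random.default_rng(2)
Ua, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Va, _ = np.linalg.qr(rng.normal(size=(3, 3)))
F = Ua @ np.diag([1.0, 0.5, 0.0]) @ Va.T
P1, P2 = cameras_from_F(F)
```

As a check, F can be rebuilt from the returned pair with [e′]× P′ P⁺. It comes out proportional to the input, in fact equal to −F, since [e′]×[e′]× F = (e′e′ᵀ − I)F = −F for a unit e′ with e′ᵀF = 0. This is only one of many camera pairs consistent with F, which is the projective ambiguity the slide mentions.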
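The steps on the "Triangulation: Linear Solution" slide translate to NumPy roughly as follows. This is a sketch, not a port of `vgg_X_from_xP_lin.m`: it handles two views, skips the preconditioning step, and uses a hypothetical calibrated pair with a 1-unit baseline for the usage example.

```python
import numpy as np

def triangulate_linear(P1, P2, x1, x2):
    """Linear triangulation of one correspondence (u, v) <-> (u', v') seen by 3x4 cameras P1, P2.
    Builds A with rows u p3^T - p1^T and v p3^T - p2^T for each view and takes the SVD null vector.
    Step 1 of the slide (preconditioning) is skipped to keep the sketch short."""
    (u1, v1), (u2, v2) = x1, x2
    A = np.array([u1 * P1[2] - P1[0],
                  v1 * P1[2] - P1[1],
                  u2 * P2[2] - P2[0],
                  v2 * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                       # same as X = V(:, end) in MATLAB
    return X[:3] / X[3]

def project(P, X):
    """Pixel coordinates of the inhomogeneous 3D point X under the 3x4 camera P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Usage: hypothetical calibrated pair (second camera shifted by 1 unit along x), one 3D point.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 5.0])
X_est = triangulate_linear(P1, P2, project(P1, X_true), project(P2, X_true))
```

More views are handled by stacking two more rows of A per camera. That is what "works for any number of corresponding images" refers to.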
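The "Triangulation: Non-linear Solution" slide refers to the exact two-view method that reduces to a degree-6 polynomial in t. That method is longer than a short example allows. The sketch below therefore swaps in a different, generic technique: Gauss–Newton refinement of the same reprojection error, parameterized directly by the 3D point X and started from a guess such as the linear solution. Unlike the polynomial method, it only finds a local minimum. Names, noise values, and the camera setup are hypothetical.

```python
import numpy as np

def project(P, X):
    """Pixel coordinates of the inhomogeneous 3D point X under the 3x4 camera P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def residuals(Ps, xs, X):
    """Stacked reprojection residuals x_i - proj(P_i, X) over all views."""
    return np.concatenate([x - project(P, X) for P, x in zip(Ps, xs)])

def reprojection_cost(Ps, xs, X):
    """Sum of squared reprojection distances, the quantity being minimized."""
    return float(np.sum(residuals(Ps, xs, X) ** 2))

def refine_point(Ps, xs, X0, iters=10, h=1e-6):
    """Gauss-Newton on the reprojection cost, starting from X0.
    The Jacobian of the residuals is approximated by forward differences."""
    X = np.asarray(X0, dtype=float).copy()
    for _ in range(iters):
        r = residuals(Ps, xs, X)
        J = np.empty((len(r), 3))
        for k in range(3):
            dX = np.zeros(3)
            dX[k] = h
            J[:, k] = (residuals(Ps, xs, X + dX) - r) / h
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)   # minimizes ||J step + r||
        X += step
    return X

# Usage: noisy observations of a hypothetical point, starting from a perturbed guess.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
Ps = [K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
      K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])]
X_true = np.array([0.3, -0.2, 5.0])
noise = [np.array([0.3, -0.4]), np.array([-0.2, 0.5])]
xs = [project(P, X_true) + n for P, n in zip(Ps, noise)]
X0 = X_true + np.array([0.1, -0.1, 0.2])
X_ref = refine_point(Ps, xs, X0)
```

With the cameras fixed, choosing X and setting x̂ = PX, x̂′ = P′X automatically satisfies the epipolar constraint. Minimizing over X is therefore the same objective as on the slide, only parameterized differently.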
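Two claims on the "Projective structure from motion" slide can be checked numerically:

- the point count from 2mn ≥ 11m + 3n − 15;
- the fact that X → QX, P → PQ⁻¹ leaves every image point unchanged.

Here is a minimal sketch with a hypothetical helper `min_points` and random matrices.

```python
import numpy as np

def min_points(m):
    """Smallest n satisfying 2mn >= 11m + 3n - 15 for m >= 2 cameras."""
    n = 1
    while 2 * m * n < 11 * m + 3 * n - 15:
        n += 1
    return n

print(min_points(2), min_points(3))   # 7 6

# Projective ambiguity: X -> QX, P -> P Q^-1 leaves every projection unchanged.
rng = np.random.default_rng(3)
P = rng.normal(size=(3, 4))
X = np.append(rng.normal(size=3), 1.0)
Q = rng.normal(size=(4, 4))          # a generic 4x4 matrix is invertible with probability 1
x_original = P @ X
x_transformed = (P @ np.linalg.inv(Q)) @ (Q @ X)
print(np.allclose(x_original, x_transformed))
```

For two cameras the inequality becomes 4n ≥ 7 + 3n, so n ≥ 7, which matches the slide.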
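The "Determine projection matrix of new camera" step of sequential structure from motion is camera resectioning. This minimal sketch assumes the linear DLT on exact 2D–3D correspondences. It applies no coordinate normalization and no RANSAC, both of which a real pipeline would add. The function name and test camera are hypothetical.

```python
import numpy as np

def resection_dlt(X, x):
    """Estimate a 3x4 camera P (up to scale) from n >= 6 known 3D points X (n, 3)
    and their images x (n, 2). Each point gives two linear equations in the 12 entries
    of P: p1.Xh - u p3.Xh = 0 and p2.Xh - v p3.Xh = 0."""
    rows = []
    for Xw, (u, v) in zip(X, x):
        Xh = np.append(Xw, 1.0)
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)      # unit-norm null vector, reshaped row by row

# Usage: recover a hypothetical camera K [R | t] from 12 exact correspondences.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.5, -0.1, 1.0])
P_true = K @ np.column_stack([R, t])
rng = np.random.default_rng(4)
X = np.column_stack([rng.uniform(-2, 2, (12, 2)), rng.uniform(4, 8, 12)])
xh = np.column_stack([X, np.ones(len(X))]) @ P_true.T
x = xh[:, :2] / xh[:, 2:]
P_est = resection_dlt(X, x)
```

The estimate equals P_true only up to scale and sign. The new camera's points would then be triangulated and refined as the slide describes.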
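The bundle adjustment objective E(P, X) can be written as a short function. This sketch (hypothetical names) sums only over observed (i, j) pairs, since in practice not every point is visible in every camera. It uses Euclidean distance for D.

```python
import numpy as np

def reprojection_error(Ps, Xs, obs):
    """E(P, X) = sum of squared distances between observed x_ij and reprojected P_i X_j.
    Ps: list of 3x4 cameras; Xs: (n, 4) homogeneous points;
    obs: dict mapping (i, j) -> observed (u, v) of point j in image i (visible pairs only)."""
    E = 0.0
    for (i, j), x_ij in obs.items():
        x = Ps[i] @ Xs[j]
        E += float(np.sum((np.asarray(x_ij) - x[:2] / x[2]) ** 2))
    return E

# Usage: two hypothetical cameras, two points, exact observations.
Ps = [np.hstack([np.eye(3), np.zeros((3, 1))]),
      np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])]
Xs = np.array([[0.0, 0.0, 5.0, 1.0], [1.0, 1.0, 4.0, 1.0]])
obs = {}
for i, P in enumerate(Ps):
    for j, X in enumerate(Xs):
        x = P @ X
        obs[(i, j)] = x[:2] / x[2]
print(reprojection_error(Ps, Xs, obs))       # 0.0 for exact observations
noisy = dict(obs)
noisy[(0, 0)] = obs[(0, 0)] + np.array([1.0, 0.0])
print(reprojection_error(Ps, Xs, noisy))     # 1.0: one observation moved by one unit
```

Bundle adjustment would minimize this quantity jointly over all cameras and points with a non-linear least-squares solver. Levenberg–Marquardt is the common choice, for example via `scipy.optimize.least_squares`. That optimization loop is not shown here.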
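The self-calibration slide looks for cameras of the form Pi = K [Ri | ti]. The sketch below illustrates that form only; it is not a self-calibration algorithm. It splits a finite 3×4 camera into K, R, t with an RQ decomposition, built here from NumPy's QR so that no extra dependency is needed. The skew entry K[0, 1] is where a zero-skew constraint would apply. The helper names and camera values are hypothetical.

```python
import numpy as np

def rq3(M):
    """RQ decomposition M = Kmat @ Rmat of a 3x3 matrix via NumPy's QR on a flipped matrix."""
    J = np.fliplr(np.eye(3))             # exchange matrix, J @ J = I
    Q, U = np.linalg.qr((J @ M).T)       # (J M)^T = Q U with U upper triangular
    return J @ U.T @ J, J @ Q.T          # upper-triangular factor, orthogonal factor

def decompose_camera(P):
    """Split a finite camera P ~ K [R | t] into K (K[2, 2] = 1, positive diagonal), R, t."""
    if np.linalg.det(P[:, :3]) < 0:      # P is defined up to scale; fix the sign so det > 0
        P = -P
    K, R = rq3(P[:, :3])
    D = np.diag(np.sign(np.diag(K)))     # flip signs so that K has a positive diagonal
    K, R = K @ D, D @ R                  # D @ D = I, so K @ R is unchanged
    t = np.linalg.solve(K, P[:, 3])      # P[:, 3] = K t, with the same overall scale as K
    return K / K[2, 2], R, t

# Usage: a hypothetical zero-skew camera, multiplied by an arbitrary negative scale.
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
a, b = 0.3, -0.2
Ry = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true = Rx @ Ry
t_true = np.array([0.5, -1.0, 3.0])
P = -2.5 * K_true @ np.column_stack([R_true, t_true])
K, R, t = decompose_camera(P)            # K[0, 1] is the skew, expected to be ~0 here
```

In a projective reconstruction, the Pi generally do not decompose into a shared K with zero skew. Self-calibration searches for the Q that makes them do so.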
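The relation on the "Depth from disparity" slide, z = B·f/(x − x′), fits in a small helper. It assumes B is given in scene units (e.g. meters) and f and disparity in pixels, so z comes out in the units of B. Zero disparity maps to infinite depth. The function name is hypothetical.

```python
import numpy as np

def depth_from_disparity(disparity, baseline, focal_px):
    """z = B f / (x - x'); returns inf where the disparity is zero (point at infinity)."""
    d = np.asarray(disparity, dtype=float)
    with np.errstate(divide='ignore'):   # allow division by zero disparity without a warning
        return baseline * focal_px / d

z = depth_from_disparity([35, 70, 0], baseline=0.1, focal_px=700)
```

Here the depths are 2.0, 1.0 and inf. Doubling the disparity halves the depth, which is the inverse proportionality stated on the slide.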
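The "Basic stereo matching algorithm" loop, restricted to rectified images, can be sketched as winner-take-all block matching with an SSD cost. The sketch rests on three assumptions:

- the images are grayscale float arrays;
- the right-image match of left pixel x lies at x′ = x − d with d ≥ 0;
- border pixels without a full window keep disparity 0.

It is deliberately unoptimized (nested loops) to mirror the slide, and the function name is hypothetical.

```python
import numpy as np

def block_match(left, right, max_disp, half_win=2):
    """Winner-take-all disparity for rectified grayscale images (2D float arrays).
    For each left pixel, compare its (2*half_win+1)^2 window with windows on the same
    scanline of the right image shifted by d = 0..max_disp, keeping the SSD minimizer."""
    h, w = left.shape
    r = half_win
    disp = np.zeros((h, w), dtype=int)
    for y in range(r, h - r):
        for x in range(r, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1]
            best_cost, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - r) + 1):   # keep x' = x - d inside the image
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = np.sum((ref - cand) ** 2)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Usage: a random texture shifted by 3 pixels, so the true disparity is 3 everywhere.
rng = np.random.default_rng(0)
right = rng.random((20, 30))
left = np.roll(right, 3, axis=1)     # left[:, x] = right[:, x - 3]
D = block_match(left, right, max_disp=6, half_win=2)
```

The resulting disparities convert to depth with z = B·f/d. A practical implementation would vectorize the cost, for example by box-filtering per-disparity squared differences.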
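Here are the two matching costs named on the "Correspondence search" slide, sketched on equal-size windows (names hypothetical). SSD is lower for better matches. Normalized correlation is higher for better matches, and because it subtracts the mean and divides by the norm, a positive gain and an offset in brightness do not affect it.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equal-size windows (lower is better)."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b, eps=1e-12):
    """Normalized cross-correlation in [-1, 1] (higher is better); unchanged by
    b -> g * b + o with gain g > 0. eps guards against constant (zero-variance) windows."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    return float(np.sum(a0 * b0) / (np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2)) + eps))

rng = np.random.default_rng(5)
a = rng.random((5, 5))
brighter = 2 * a + 10
```

For the window `a` and its brightened copy, SSD reports a large difference while `ncc` returns approximately 1. This is one reason normalized correlation can be preferable when exposure differs between the two cameras.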
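The "Stereo matching as energy minimization" slide minimizes its energy with graph cuts over the whole image. Graph cuts need more code than fits here, so the sketch below swaps in a simpler, different technique. On a single scanline the energy forms a chain, so E = Edata + λ·Σ|D(i) − D(i+1)| can be minimized exactly by dynamic programming. The table `cost[i, d]` stands for the data term, for example (W1(i) − W2(i + d))². Names and the toy cost are hypothetical.

```python
import numpy as np

def energy(cost, D, lam):
    """E(D) = sum_i cost[i, D(i)] + lam * sum_i |D(i) - D(i+1)| on one scanline."""
    D = np.asarray(D)
    return float(cost[np.arange(len(D)), D].sum() + lam * np.abs(np.diff(D)).sum())

def scanline_dp(cost, lam):
    """Exact minimizer of energy(cost, D, lam) by dynamic programming (Viterbi).
    cost: (n, L) data term for each pixel i and disparity label d; returns D as (n,) ints."""
    n, L = cost.shape
    labels = np.arange(L)
    pair = lam * np.abs(labels[:, None] - labels[None, :])   # pair[d_prev, d]
    acc = cost[0].astype(float)                              # best energy ending in each label
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        total = acc[:, None] + pair                          # total[d_prev, d]
        back[i] = np.argmin(total, axis=0)
        acc = total[back[i], labels] + cost[i]
    D = np.zeros(n, dtype=int)
    D[-1] = int(np.argmin(acc))
    for i in range(n - 1, 0, -1):                            # follow back-pointers
        D[i - 1] = back[i, D[i]]
    return D

# Toy scanline: pixel 2 alone prefers disparity 2; its neighbors prefer 0.
toy = np.array([[0, 1, 1], [0, 1, 1], [1, 1, 0], [0, 1, 1], [0, 1, 1]], dtype=float)
print(scanline_dp(toy, 0.0))   # [0 0 2 0 0]: per-pixel winner-take-all
print(scanline_dp(toy, 1.0))   # [0 0 0 0 0]: the smoothness term removes the outlier
```

With λ = 0 the result is the per-pixel winner-take-all labeling. A larger λ trades data fidelity for smoothness, which is the role of the smoothness prior. Graph cuts are needed because the 2D grid, unlike a chain, has no exact polynomial-time DP.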