Assignment - University of Delaware


CREDITS
Rasmussen, UBC (Jim Little), Seitz (U. of Wash.), Camps
(Penn. State), UC, UMD (Jacobs), UNC, CUNY
Computer Vision : CISC 4/689
Multi-View Geometry
Relates
• 3D World Points
• Camera Centers
• Camera Orientations
• Camera Intrinsic Parameters
• Image Points
Stereo
[Figure: a scene point projects through each camera’s optical center onto its image plane]
Stereo
• Basic Principle: Triangulation
– Gives reconstruction as intersection of two rays
– Requires
• calibration
• point correspondence
Stereo Constraints
Given p in the left image, where can the corresponding point p’ in the right image be?
Stereo Constraints
[Figure: epipolar geometry. Scene point M projects to p and p’; the camera frames (X1, Y1, Z1) and (X2, Y2, Z2) have centers O1 and O2 on the focal planes; an epipolar line and epipole are marked in the image plane]
Stereo
• The geometric information that relates two different viewpoints of the same scene is entirely contained in a mathematical construct known as the fundamental matrix.
• The geometry of two different images of the same scene is called the epipolar geometry.
Stereo/Two-View Geometry
• The relationship of two views of a scene taken from different camera positions to one another
• Interpretations
– “Stereo vision” generally means two synchronized cameras or eyes capturing images
– Could also be two sequential views from the same camera in motion
• Assuming a static scene
http://www-sop.inria.fr/robotvis/personnel/sbougnou/Meta3DViewer/EpipolarGeo
3D from two-views
There are two ways of extracting 3D from a pair of images.
• The classical method, called the calibrated route: we calibrate both cameras (or viewpoints) w.r.t. some world coordinate system, i.e., calculate the so-called epipolar geometry by extracting the essential matrix of the system.
• The second method, called the uncalibrated route: a quantity known as the fundamental matrix is calculated from image correspondences, and this is then used to determine the 3D.
Either way, the principle of binocular vision is triangulation. Given a single image, the 3D location of any visible object point must lie on the straight line that passes through the COP and the image point (see fig.). The intersection of two such lines from two views is triangulation.
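The triangulation principle above can be sketched numerically. This is a minimal illustration with made-up camera positions and scene point, not the lecture’s code; with noisy rays, the midpoint of the shortest segment between them is a common choice:

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Closest point to the two rays c1 + s*d1 and c2 + t*d2.

    With perfect calibration and correspondence the rays intersect at
    the scene point; with noise we return the midpoint of the shortest
    segment joining them.
    """
    # Solve [d1 -d2] [s t]^T = c2 - c1 in the least-squares sense.
    A = np.stack([d1, -d2], axis=1)          # 3x2
    st, *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    s, t = st
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))

# Two cameras looking at the scene point X = (1, 2, 5).
X = np.array([1.0, 2.0, 5.0])
c1 = np.array([0.0, 0.0, 0.0])               # COP of camera 1
c2 = np.array([1.0, 0.0, 0.0])               # COP of camera 2
d1 = (X - c1) / np.linalg.norm(X - c1)       # ray through COP 1
d2 = (X - c2) / np.linalg.norm(X - c2)       # ray through COP 2
print(triangulate_midpoint(c1, d1, c2, d2))  # close to [1. 2. 5.]
```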
Mapping Points between Images
• What is the relationship between the images x, x’ of the scene point X in two views?
• Intuitively, it depends on:
– The rigid transformation between cameras (derivable from the camera matrices P, P’)
– The scene structure (i.e., the depth of X)
• Parallax: Closer points appear to move more
Example: Two-View Geometry
[Figure: points x1, x2, x3 in one view and x’1, x’2, x’3 in the other (courtesy of F. Dellaert)]
Is there a transformation relating the points xi to x’i ?
Epipolar Geometry
• Baseline: Line joining camera centers C, C’
• Epipolar plane π: Defined by the baseline and scene point X
from Hartley & Zisserman
Epipolar Lines
• Epipolar lines l, l’: Intersection of the epipolar plane π with the image planes
• Epipoles e, e’: Where the baseline intersects the image planes
– Equivalently, the image in one view of the other camera center
from Hartley & Zisserman
Epipolar Pencil
• As the position of X varies, the epipolar planes “rotate” about the baseline (like a book with pages)
– This set of planes is called the epipolar pencil
• Epipolar lines “radiate” from the epipole: this is the pencil of epipolar lines
from Hartley & Zisserman
Epipolar Constraint
• Camera center C and image point define a ray in 3-D space that projects to epipolar line l’ in the other view (since it is on the epipolar plane)
• The 3-D point X is on this ray, so the image of X in the other view, x’, must be on l’
• In other words, the epipolar geometry defines a mapping x → l’, of points in one image to lines in the other
from Hartley & Zisserman
Example: Epipolar Lines for Converging
Cameras
Left view
Right view
Intersection of epipolar lines = epipole!
Indicates the direction of the other camera
from Hartley & Zisserman
Special Case:
Translation Parallel to Image Plane
Note that epipolar lines are parallel and corresponding points lie on corresponding epipolar lines (the latter is true for all kinds of camera motions)
From Geometry to Algebra
[Figure: scene point P projects to p and p’ through optical centers O and O’]
Rotation arrow should be at the other end, to get p’ in p coordinates
Linear Constraint:
Coplanarity of p’, T, and Rp gives p’ · (T × Rp) = 0.
Should be able to express this as a matrix multiplication.
Review: Matrix Form of Cross Product
The cross product can be written as a matrix-vector product:
  a × b = [a]× b,  where  [a]× = [  0  −a3   a2 ;  a3   0  −a1 ;  −a2   a1   0 ]
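The identity a × b = [a]× b is easy to check numerically; a quick sketch with illustrative vectors:

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix [a]x such that [a]x @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([-2.0, 0.5, 4.0])
print(skew(a) @ b)      # same vector as np.cross(a, b)
print(np.cross(a, b))
```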
Matrix Form
  p’ᵀ [T]× (R p) = 0
The Essential Matrix
  p’ᵀ E p = 0,  with  E = [T]× R
If un-calibrated, p gets multiplied by the intrinsic matrix K.
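A small sketch of the essential matrix under the convention X’ = R X + T (so E = [T]× R); the rotation, translation, and scene point here are made-up values for illustration:

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# A small rotation about Y and a translation between the two cameras,
# using the convention X' = R @ X + T for camera coordinates.
th = 0.1
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
T = np.array([1.0, 0.2, 0.1])
E = skew(T) @ R                       # essential matrix

X = np.array([0.5, -0.3, 4.0])        # scene point, first camera frame
Xp = R @ X + T                        # same point, second camera frame
p = X / X[2]                          # normalized image coordinates
pp = Xp / Xp[2]
print(pp @ E @ p)                     # ~0: the epipolar constraint holds
```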
The Fundamental Matrix F
• Mapping of a point in one image to an epipolar line in the other image, x → l’, is expressed algebraically by the fundamental matrix F
• Write this as l’ = Fx (point x maps to line l’)
• Since x’ is on l’, by the point-on-line definition we know that x’ᵀ l’ = 0
• Substituting l’ = Fx, we can relate corresponding points in the camera pair (P, P’) to each other with the following:
  x’ᵀ F x = 0
The Fundamental Matrix F
• F is 3 x 3, rank 2 (not invertible, in contrast to homographies)
– 7 DOF (homogeneity and the rank constraint take away 2 DOF)
• The fundamental matrix of (P’, P) is the transpose Fᵀ
• Now, given any x, we get an implicit equation for its epipolar line, l’ = Fx
from Hartley & Zisserman
Computing Fundamental Matrix
  u’ᵀ F u = 0
(u’ is the same as x’ on the previous slide, u is the same as x)
The fundamental matrix is singular, with rank 2.
In principle F has 7 parameters up to scale and can be estimated from 7 point correspondences.
A direct, simpler method requires 8 correspondences.
Estimating Fundamental Matrix
The 8-point algorithm
u Fu  0
T
Each point correspondence can be expressed as a linear equation
 F11
u v 1 F21
 F31
F12
F22
F32
F13  u 
F23   v   0
F33   1 
 F11 
F 
 12 
 F13 
 
 F21 
uu uv u uv vv v u v 1 F22   0
 
 F23 
F 
 31 
 F32 
F 
 33 
Computer Vision : CISC 4/689
The 8-point Algorithm
Lots of squared terms, so the entries have a wide range, from say 1000 down to 1. So pre-normalize the coordinates.
And use RANSAC!
Computing F: The Eight-Point Algorithm
• Input: n point correspondences (n >= 8)
– Construct the homogeneous system Ax = 0 from p_rᵀ F p_l = 0
• x = (F11, F12, F13, F21, F22, F23, F31, F32, F33): the entries of F
• Each correspondence gives one equation
• A is an n x 9 matrix (in homogeneous form)
– Obtain the estimate F^ by SVD of A:  A = U D Vᵀ
• x (up to a scale) is the column of V corresponding to the least singular value
– Enforce the singularity constraint, since rank(F) = 2:
• Compute the SVD of F^:  F^ = U D Vᵀ
• Set the smallest singular value to 0: D -> D’
• Corrected estimate of F:  F’ = U D’ Vᵀ
• Output: the estimate of the fundamental matrix, F’
• Similarly we can compute E given the intrinsic parameters
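The steps above can be sketched in a few lines. This is a noise-free toy version in normalized coordinates (no coordinate pre-normalization or RANSAC); the scene points and camera motion are illustrative values:

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from n >= 8 correspondences (3xN homogeneous arrays)."""
    n = x1.shape[1]
    A = np.zeros((n, 9))
    for i in range(n):
        u, v, _ = x1[:, i]
        up, vp, _ = x2[:, i]
        # One row of the homogeneous system A f = 0 per correspondence.
        A[i] = [up * u, up * v, up, vp * u, vp * v, vp, u, v, 1.0]
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # least-singular-value solution
    U, D, Vt = np.linalg.svd(F)       # enforce the rank-2 constraint
    D[2] = 0.0
    return U @ np.diag(D) @ Vt

# Synthetic scene: two cameras related by (R, T), normalized coordinates.
rng = np.random.default_rng(0)
th = 0.15
R = np.array([[np.cos(th), 0, np.sin(th)],
              [0, 1, 0],
              [-np.sin(th), 0, np.cos(th)]])
T = np.array([1.0, 0.1, 0.0])
X = rng.uniform([-1, -1, 3], [1, 1, 8], size=(12, 3))   # 12 scene points
x1 = np.array([P / P[2] for P in X]).T                  # view 1
x2 = np.array([(R @ P + T) / (R @ P + T)[2] for P in X]).T  # view 2
F = eight_point(x1, x2)
# Every correspondence should satisfy x2^T F x1 = 0 (tiny residual).
print(max(abs(x2[:, i] @ F @ x1[:, i]) for i in range(12)))
```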
Locating the Epipoles from F
[Figure: the epipolar plane through P intersects the images in epipolar lines through the epipoles el and er]
el lies on all the epipolar lines of the left image:
  p_rᵀ F p_l = 0 (true for every p_r), so
  p_rᵀ F el = 0  =>  F el = 0   (since F is not identically zero)
• Input: Fundamental matrix F
– Find the SVD of F:  F = U D Vᵀ
– The epipole el is the column of V corresponding to the null singular value (as shown above)
– The epipole er is the column of U corresponding to the null singular value
• Output: Epipoles el and er
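Extracting the epipoles from the SVD, as described above, works on any F of rank 2. Here F = [T]× for a pure sideways translation T = (1, 0, 0) is hand-built for illustration; both epipoles should come out at (1, 0, 0), i.e., at infinity along x:

```python
import numpy as np

# F for pure translation T = (1, 0, 0) between the cameras: F = [T]x.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
U, D, Vt = np.linalg.svd(F)
el = Vt[-1]        # right null vector: F @ el = 0
er = U[:, -1]      # left null vector:  er @ F = 0
print(el, er)      # both proportional to (1, 0, 0)
```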
Special Case:
Translation along Optical Axis
• Epipoles coincide at focus of expansion
• Not the same (in general) as vanishing point of scene lines
from Hartley & Zisserman
Finding Correspondences
• Epipolar geometry limits where feature in one image can
be in the other image
– Only have to search along a line
Simplest Case
• Image planes of the cameras are parallel.
• Focal points are at the same height.
• Focal lengths are the same.
• Then, epipolar lines are horizontal scan lines.
We can always achieve this geometry
with image rectification
• Image Reprojection
– reproject image planes onto a common plane parallel to the line between optical centers
• Notice, only the focal point of the camera really matters
(Seitz)
Stereo Rectification
[Figure: stereo system with parallel optical axes; scene point P projects to p’l and p’r]
Stereo system with parallel optical axes:
• Epipoles are at infinity
• Horizontal epipolar lines
• Rectification
– Given a stereo pair, the intrinsic and extrinsic parameters, find the image transformation to achieve a stereo system of horizontal epipolar lines
– A simple algorithm: assuming calibrated stereo cameras
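The rectifying rotation for the calibrated case can be sketched from the baseline alone, following the construction used on the next slides (Xl’ = T, Yl’ = Xl’ × Zl, Zl’ = Xl’ × Yl’, taking Zl = (0, 0, 1)); the baseline value here is illustrative:

```python
import numpy as np

# Build the rectifying rotation Rrect from the baseline T.
T = np.array([0.9, 0.1, 0.05])            # example baseline, left frame
e1 = T / np.linalg.norm(T)                # new X axis: along the baseline
e2 = np.cross(e1, [0.0, 0.0, 1.0])
e2 = e2 / np.linalg.norm(e2)              # new Y axis: Xl' x Zl, normalized
e3 = np.cross(e1, e2)                     # new Z axis completes the frame
R_rect = np.stack([e1, e2, e3])           # rows are the new axes

# After rotation the baseline lies along the X axis, so the epipolar
# lines of the rotated pair become horizontal.
print(R_rect @ T)                         # ~ (|T|, 0, 0)
```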
Stereo Rectification
[Figure: the original pair, with scene point P, image points pl and pr, camera frames (Xl, Yl, Zl) and (Xr, Yr, Zr), and centers Ol, Or related by R, T]
• Algorithm
– Rotate both the left and right cameras so that they share the same X axis: Or − Ol = T
– Define a rotation matrix Rrect for the left camera
– The rotation matrix for the right camera is Rrect Rᵀ
– The rotation can be implemented by an image transformation
The new left-camera axes are: Xl’ = T (the baseline axis), Yl’ = Xl’ × Zl, Zl’ = Xl’ × Yl’
Stereo Rectification
[Figure: the rectified pair; after rectification the baseline becomes T’ = (B, 0, 0), so both images have horizontal epipolar lines]
Public Library, Stereoscopic Looking Room, Chicago, by Phillips, 1923
Teesta suspension bridge, Darjeeling, India
"Mark Twain at Pool Table", no date, UCR Museum of Photography
Woman getting eye exam during immigration procedure at Ellis Island, c. 1905 - 1920, UCR Museum of Photography
Stereo matching
• attempt to match every pixel
• use additional constraints
A Simple Stereo System
[Figure: left (reference) and right (target) cameras separated by a baseline; the disparity between a point’s two image positions determines its depth Z (elevation Zw, with Zw = 0 at the ground plane)]
Let’s discuss reconstruction with this geometry before correspondence, because it’s much easier.

With focal length f and stereo baseline T (refer to the previous slide’s figure), a scene point P = (X, Y, Z) projects to:
  (xl, yl) = (f X/Z, f Y/Z)
  (xr, yr) = (f (X − T)/Z, f Y/Z)
Disparity:
  d = xl − xr = f X/Z − f (X − T)/Z = f T/Z
so Z = f T/d. Then given Z, we can compute X and Y.
T is the stereo baseline; d measures the difference in retinal position between corresponding points.
(Camps)
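The disparity relation d = f T/Z and its inversion can be checked with a toy example; the focal length, baseline, and point coordinates below are illustrative values:

```python
# Depth from disparity in the parallel-camera geometry: d = f*T/Z.
f = 500.0                  # focal length in pixels (illustrative)
T = 0.25                   # stereo baseline in meters (illustrative)
X, Y, Z = 0.5, 0.25, 5.0   # scene point

xl = f * X / Z             # projection in the left image
xr = f * (X - T) / Z       # projection in the right image
d = xl - xr                # disparity = f*T/Z
Z_rec = f * T / d          # recover depth from disparity
X_rec = Z_rec * xl / f     # then X (and similarly Y) follow
print(d, Z_rec, X_rec)     # 25.0 5.0 0.5
```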
Correspondence: What should we match?
• Objects?
• Edges?
• Pixels?
• Collections of pixels?
Extracting Structure
• The key aspect of epipolar geometry is its linear constraint on where a point in one image can be in the other
• By correlation-matching pixels (or features) along epipolar lines and measuring the disparity between them, we can construct a depth map (scene point depth is inversely proportional to disparity)
[Images: View 1, View 2, and the computed depth map (courtesy of P. Debevec)]
Correspondence: Photometric constraint
• Same world point has same intensity in both images.
– Lambertian fronto-parallel
– Issues:
• Noise
• Specularity
• Foreshortening
Using these constraints, we can use matching for stereo:

For each epipolar line
  For each pixel in the left image
    • compare with every pixel on the same epipolar line in the right image
    • pick the pixel with the minimum match cost

This will never work on single pixels, so:
Improvement: match windows
(Seitz)
Aggregation
• Use more than one pixel
• Assume neighbors have similar disparities
– Use a correlation window containing the pixel
– Allows the use of SSD, ZNCC, etc.
Comparing Windows:
[Figure: window f in one image compared against candidate windows g along the epipolar line; the most popular approach]
For each window, match to the closest window on the epipolar line in the other image.
(Camps)
Comparing image regions
Compare intensities pixel-by-pixel between regions I(x, y) and I´(x, y).
Dissimilarity measure, Sum of Squared Differences:
  SSD = Σ(x,y) [ I(x, y) − I´(x, y) ]²
Comparing image regions
Compare intensities pixel-by-pixel between regions I(x, y) and I´(x, y).
Similarity measure, Zero-mean Normalized Cross Correlation:
  ZNCC = Σ(x,y) (I − Ī)(I´ − Ī´) / sqrt( Σ(x,y) (I − Ī)² · Σ(x,y) (I´ − Ī´)² )
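A minimal sketch of zero-mean normalized cross correlation between two windows (the window contents here are random, for illustration); by construction it is invariant to gain and offset changes between the images:

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross correlation of two equal-size windows."""
    az = a - a.mean()
    bz = b - b.mean()
    return np.sum(az * bz) / np.sqrt(np.sum(az**2) * np.sum(bz**2))

rng = np.random.default_rng(1)
w = rng.random((7, 7))
print(zncc(w, 2.0 * w + 5.0))   # ~1.0: invariant to gain and offset
print(zncc(w, -w))              # ~-1.0: perfectly anti-correlated
```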
Aggregation window sizes
Small windows
• disparities more likely similar within the window
• more ambiguities
• accurate when correct
Large windows
• larger disparity variation within the window
• more discriminant
• often more robust
• use shiftable windows to deal with discontinuities
(Illustration from Pascal Fua)
Window size
• Effect of window size: W = 3 vs. W = 20
• Better results with an adaptive window
(Seitz)
T. Kanade and M. Okutomi, A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment, Proc. International Conference on Robotics and Automation, 1991.
D. Scharstein and R. Szeliski, Stereo matching with nonlinear diffusion, International Journal of Computer Vision, 28(2):155-174, July 1998.
Correspondence Using Window-Based Matching
[Figure: a scanline from the left and right images, and the SSD error plotted as a function of disparity]
Sum of Squared (Pixel) Differences

wL and wR are corresponding m by m windows of pixels in the left and right images IL and IR, centered at (xL, yL) and (xL − d, yL).
We define the window function:
  Wm(x, y) = { (u, v) | x − m/2 <= u <= x + m/2, y − m/2 <= v <= y + m/2 }
The SSD cost measures the intensity difference as a function of disparity:
  Cr(x, y, d) = Σ(u,v)∈Wm(x,y) [ IL(u, v) − IR(u − d, v) ]²
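The SSD cost Cr(x, y, d) can be sketched directly. This toy example builds a right image as the left shifted by a known disparity (image contents and sizes are illustrative), and checks that the cost is minimized at the true disparity:

```python
import numpy as np

def ssd_cost(IL, IR, x, y, d, m):
    """C_r(x, y, d): SSD between the m x m window at (x, y) in the left
    image and the window at (x - d, y) in the right image."""
    h = m // 2
    wl = IL[y - h:y + h + 1, x - h:x + h + 1]
    wr = IR[y - h:y + h + 1, x - d - h:x - d + h + 1]
    return float(np.sum((wl - wr) ** 2))

# Synthetic pair: the right image is the left shifted by a disparity of 3
# pixels (IR(u, v) = IL(u + 3, v), away from the wrap-around border).
rng = np.random.default_rng(2)
IL = rng.random((20, 20))
IR = np.roll(IL, -3, axis=1)
costs = [ssd_cost(IL, IR, 10, 10, d, 5) for d in range(6)]
print(int(np.argmin(costs)))             # 3: the true disparity
```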
Image Normalization
• Even when the cameras are identical models, there can be differences in gain and sensitivity.
• The cameras do not see exactly the same surfaces, so their overall light levels can differ.
• For these reasons and more, it is a good idea to normalize the pixels in each window:

  Average pixel:     Ī = (1 / |Wm(x, y)|) Σ(u,v)∈Wm(x,y) I(u, v)
  Window magnitude:  || I ||Wm(x,y) = sqrt( Σ(u,v)∈Wm(x,y) [ I(u, v) ]² )
  Normalized pixel:  Î(x, y) = ( I(x, y) − Ī ) / || I − Ī ||Wm(x,y)
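A sketch of the per-window normalization (subtract the window mean, divide by the magnitude of the zero-mean window); as a sanity check, a gain and offset change normalizes to the same window:

```python
import numpy as np

def normalize_window(w):
    """Subtract the window mean and divide by the magnitude of (w - mean)."""
    wz = w - w.mean()
    return wz / np.sqrt(np.sum(wz ** 2))

rng = np.random.default_rng(3)
w = rng.random((5, 5))
wn = normalize_window(w)
# The normalized window has zero mean and unit magnitude, so a
# gain/offset change such as 2*w + 1 normalizes to the same values.
print(np.allclose(wn, normalize_window(2.0 * w + 1.0)))  # True
```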
Stereo results
– Data from University of Tsukuba
Scene
(Seitz)
Ground truth
Results with window correlation
Window-based matching
(best window size)
(Seitz)
Ground truth
Results with better method
State of the art method
Boykov et al., Fast Approximate Energy Minimization via Graph Cuts,
International Conference on Computer Vision, September 1999.
(Seitz)
Ground truth