EECS 274 Computer Vision
Motion Estimation
Motion estimation
• Aligning images
• Estimate motion parameters
• Optical flow
– Lucas-Kanade algorithm
– Horn-Schunck algorithm
• Slides based on Szeliski’s lecture notes
• Reading: FP Chapter 15, S Chapter 8
Why estimate visual motion?
• Visual Motion can be annoying
– Camera instabilities, jitter
– Measure it; remove it (stabilize)
• Visual Motion indicates dynamics in the scene
– Moving objects, behavior
– Track objects and analyze trajectories
• Visual Motion reveals spatial layout
– Motion parallax
Classes of techniques
• Feature-based methods
– Extract visual features (corners, textured areas) and track them
– Sparse motion fields, but possibly robust tracking
– Especially suitable when image motion is large (tens of pixels)
• Direct methods
– Directly recover image motion from spatio-temporal image brightness variations
– Global motion parameters recovered directly, without an intermediate feature-motion calculation
– Dense motion fields, but more sensitive to appearance variations
– Suitable for video and when image motion is small (< 10 pixels)
Optical flow vs. motion field
• Optical flow does not always correspond to motion field
• Optical flow is an approximation of the motion field
• The error is small at points with high spatial gradient, under some simplifying assumptions
Patch matching
• How do we determine correspondences?
– block matching or SSD (sum of squared differences)
Correlation and SSD
• For larger displacements, do template matching
– Define a small area around a pixel as the template
– Match the template against each pixel within a search area in the next image
– Use a match measure such as correlation, normalized correlation, or sum-of-squared differences
– Choose the maximum (or minimum) as the match
– Refine to a sub-pixel estimate (Lucas-Kanade); a code sketch follows
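As a concrete illustration, here is a minimal NumPy sketch of exhaustive SSD block matching. The function name `ssd_match` and the grayscale-array inputs are assumptions for illustration, not code from the lecture.

```python
import numpy as np

def ssd_match(template, image, top_left, search_radius):
    """Exhaustive block matching: slide `template` over a search window
    in `image` and return the displacement (u, v) with the minimum
    sum-of-squared-differences score."""
    h, w = template.shape
    ty, tx = top_left                       # template position in the first image
    best_score, best_uv = np.inf, (0, 0)
    for v in range(-search_radius, search_radius + 1):
        for u in range(-search_radius, search_radius + 1):
            y0, x0 = ty + v, tx + u
            if y0 < 0 or x0 < 0 or y0 + h > image.shape[0] or x0 + w > image.shape[1]:
                continue                    # candidate window falls off the image
            patch = image[y0:y0 + h, x0:x0 + w].astype(float)
            score = np.sum((patch - template.astype(float)) ** 2)
            if score < best_score:
                best_score, best_uv = score, (u, v)
    return best_uv, best_score
```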
Discrete search vs. gradient-based methods
• Consider image I translated by (u_0, v_0):

$$I_0(x, y) = I(x, y)$$
$$I_1(x + u_0, y + v_0) = I(x, y) + \eta_1(x, y)$$
$$E(u, v) = \sum_{x,y} \big( I(x, y) - I_1(x + u, y + v) \big)^2 = \sum_{x,y} \big( I(x, y) - I(x - u_0 + u,\ y - v_0 + v) - \eta_1(x, y) \big)^2$$

• The discrete search method simply searches for the best estimate
• The gradient method linearizes the intensity function and solves for the estimate
Brightness constancy
• Brightness constancy equation:

$$I_0(x, y) = I_1(x + u(x, y),\ y + v(x, y))$$

where $I_0$ is the template image
• Minimize:

$$E(u, v) = \big( I_1(x + u, y + v) - I_0(x, y) \big)^2$$

• Linearize (assuming small (u, v)) using a Taylor series expansion:

$$I_1(x + u, y + v) \approx I_1(x, y) + I_{1x}(x, y)\, u(x, y) + I_{1y}(x, y)\, v(x, y)$$
Estimating optical flow
• Assume the image intensity I is constant between frames
[Figure: a point at (x, y) at time t moves to (x + dx, y + dy) at time t + dt]

$$I_0(x, y, t) = I_1(x + \delta x,\ y + \delta y,\ t + \delta t)$$
Optical flow constraint
$$I(x, y, t) = I(x + \delta x,\ y + \delta y,\ t + \delta t)$$

Expanding the right-hand side in a Taylor series:

$$I(x + \delta x,\ y + \delta y,\ t + \delta t) \approx I(x, y, t) + \frac{\partial I}{\partial x}\,\delta x + \frac{\partial I}{\partial y}\,\delta y + \frac{\partial I}{\partial t}\,\delta t$$

Dividing by $\delta t$ and letting $\delta t \to 0$:

$$\frac{\partial I}{\partial x}\frac{\delta x}{\delta t} + \frac{\partial I}{\partial y}\frac{\delta y}{\delta t} + \frac{\partial I}{\partial t} = 0$$
$$I_x u + I_y v + I_t = 0$$
$$I_x u + I_y v = -I_t, \qquad \begin{bmatrix} I_x & I_y \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} = -I_t, \qquad \nabla I^\top \mathbf{u} = b;\quad A\mathbf{u} = b$$

[Figure: the constraint defines a line in (u, v) velocity space with normal (I_x, I_y); only the normal component of the flow is determined]

$$E(u, v) = (I_x u + I_y v + I_t)^2$$
Lucas-Kanade algorithm
Assume a single velocity for all pixels within an image patch:

$$E(u, v) = \sum_{x, y} \big( I_x(x, y)\, u + I_y(x, y)\, v + I_t \big)^2$$

Minimizing $\|A\mathbf{u} - b\|^2$ gives the least-squares solution

$$\mathbf{u} = (A^\top A)^{-1} A^\top b$$

In normal-equation (Hessian) form:

$$\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$
$$\Big( \sum \nabla I\, \nabla I^\top \Big)\, \mathbf{u} = -\sum \nabla I\, I_t$$

LHS: sum of the 2x2 outer product of the gradient vector
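A minimal sketch of this single-patch solve, assuming two grayscale NumPy arrays covering the same patch; `np.linalg.solve` will fail where A is singular (the aperture problem, discussed below), so a robust implementation would check the eigenvalues of A first.

```python
import numpy as np

def lucas_kanade_patch(I0, I1):
    """Single-velocity Lucas-Kanade for one patch: solve
    (sum of grad grad^T) u = -(sum of grad * It)."""
    Iy, Ix = np.gradient(I0.astype(float))   # np.gradient returns (d/dy, d/dx)
    It = I1.astype(float) - I0.astype(float)
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)             # raises if A is singular (aperture problem)
```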
Matrix form
$$E(\mathbf{u} + \Delta\mathbf{u}) = \sum_i \big[ I_1(\mathbf{x}_i + \mathbf{u} + \Delta\mathbf{u}) - I_0(\mathbf{x}_i) \big]^2$$
$$\approx \sum_i \big[ I_1(\mathbf{x}_i + \mathbf{u}) + J_1(\mathbf{x}_i + \mathbf{u})\, \Delta\mathbf{u} - I_0(\mathbf{x}_i) \big]^2$$
$$= \sum_i \big[ J_1(\mathbf{x}_i + \mathbf{u})\, \Delta\mathbf{u} + e_i \big]^2$$

where

$$J_1(\mathbf{x}_i + \mathbf{u}) = \nabla I_1(\mathbf{x}_i + \mathbf{u}) = \left( \frac{\partial I_1}{\partial x}, \frac{\partial I_1}{\partial y} \right)(\mathbf{x}_i + \mathbf{u}), \qquad e_i = I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i)$$

Matrix form
The update solves $A\, \Delta\mathbf{u} = -b$, with

$$A = \sum_i J_1^\top(\mathbf{x}_i + \mathbf{u})\, J_1(\mathbf{x}_i + \mathbf{u}), \qquad b = \sum_i e_i\, J_1(\mathbf{x}_i + \mathbf{u})$$
$$A = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}, \qquad b = \begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$

In practice, $J_1(\mathbf{x}_i + \mathbf{u}) \approx J_0(\mathbf{x}_i)$ is used for computational efficiency.
Computing gradients in X-Y-T
[Figure: a 2x2x2 cube of pixels spanning positions (i, i+1) in x, (j, j+1) in y, and frames (k, k+1) in time]

$$I_x \approx \frac{1}{4\,\delta x} \big[ (I_{i+1,j,k} + I_{i+1,j,k+1} + I_{i+1,j+1,k} + I_{i+1,j+1,k+1}) - (I_{i,j,k} + I_{i,j,k+1} + I_{i,j+1,k} + I_{i,j+1,k+1}) \big]$$

and likewise for $I_y$ and $I_t$.
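The same finite-difference scheme can be written directly in NumPy; this sketch assumes two full frames `I0` and `I1`, indexed [y, x], with unit grid spacings as defaults.

```python
import numpy as np

def xyt_gradients(I0, I1, dx=1.0, dy=1.0, dt=1.0):
    """Ix, Iy, It as averages of first differences over the 2x2x2
    cube spanned by two consecutive frames (arrays indexed [y, x])."""
    I0 = I0.astype(float); I1 = I1.astype(float)
    Ix = ((I0[:-1, 1:] + I0[1:, 1:] + I1[:-1, 1:] + I1[1:, 1:])
        - (I0[:-1, :-1] + I0[1:, :-1] + I1[:-1, :-1] + I1[1:, :-1])) / (4 * dx)
    Iy = ((I0[1:, :-1] + I0[1:, 1:] + I1[1:, :-1] + I1[1:, 1:])
        - (I0[:-1, :-1] + I0[:-1, 1:] + I1[:-1, :-1] + I1[:-1, 1:])) / (4 * dy)
    It = ((I1[:-1, :-1] + I1[:-1, 1:] + I1[1:, :-1] + I1[1:, 1:])
        - (I0[:-1, :-1] + I0[:-1, 1:] + I0[1:, :-1] + I0[1:, 1:])) / (4 * dt)
    return Ix, Iy, It
```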
Local patch analysis
• How certain are the motion estimates?
The aperture problem
Let

$$A = \sum \nabla I\, \nabla I^\top, \qquad b = -\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$

• Algorithm: at each pixel, compute $\mathbf{u}$ by solving $A\mathbf{u} = b$
• A is singular if all gradient vectors point in the same direction
– e.g., along an edge
– trivially singular if the summation is over a single pixel or there is no texture
– i.e., only the normal flow is available (aperture problem)
• Corners and textured areas are OK
SSD surface – textured area
SSD surface – edge
SSD – homogeneous area
Iterative refinement
• Estimate velocity at each pixel using one iteration of Lucas-Kanade estimation
• Warp one image toward the other using the estimated flow field (easier said than done)
• Refine the estimate by repeating the process
Optical flow: iterative estimation
[Figure sequence: 1D illustration of iterative estimation; starting from an initial guess x_0, each estimate/update step moves the estimate toward the true displacement x (using d for displacement here instead of u)]
Optical flow: iterative estimation
• Some implementation issues (a code sketch follows this list):
– Warping is not easy (ensure that errors in warping are smaller than the estimate refinement)
– Warp one image and take derivatives of the other, so you don't need to re-compute the gradient after each iteration
– It is often useful to low-pass filter the images before motion estimation (for better derivative estimation, and better linear approximations to image intensity)
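A sketch of the warp-and-refine loop, simplified to a single global translation so the warp is easy; it assumes SciPy's `map_coordinates` for bilinear warping and keeps the gradients of the template image fixed, as suggested above.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def refine_flow(I0, I1, u, v, n_iters=5):
    """Iterative refinement of a single global translation (u, v):
    warp I1 toward I0 with the current estimate, re-estimate on the
    residual, and repeat."""
    I0 = I0.astype(float); I1 = I1.astype(float)
    ys, xs = np.mgrid[0:I0.shape[0], 0:I0.shape[1]]
    Iy, Ix = np.gradient(I0)        # fixed gradients of the template image
    for _ in range(n_iters):
        # warp I1 back by the current estimate (bilinear interpolation)
        I1w = map_coordinates(I1, [ys + v, xs + u], order=1, mode='nearest')
        It = I1w - I0
        A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                      [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
        b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
        du, dv = np.linalg.solve(A, b)
        u, v = u + du, v + dv       # accumulate the residual motion
    return u, v
```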
Error functions
• Robust error function:

$$E(\mathbf{u}) = \sum_i \rho\big( I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \big), \qquad \rho(x) = \frac{x^2}{1 + x^2 / a^2}$$

• Spatially varying weights:

$$E(\mathbf{u}) = \sum_i w_0(\mathbf{x}_i)\, w_1(\mathbf{x}_i + \mathbf{u}) \big[ I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \big]^2$$

• Bias and gain: images taken with different exposures

$$I_1(\mathbf{x} + \mathbf{u}) = (1 + \alpha)\, I_0(\mathbf{x}) + \beta, \quad \text{where } \alpha \text{ is the gain and } \beta \text{ is the bias}$$
$$E(\mathbf{u}) = \sum_i \big[ I_1(\mathbf{x}_i + \mathbf{u}) - (1 + \alpha) I_0(\mathbf{x}_i) - \beta \big]^2$$

• Correlation (and normalized cross-correlation):

$$E(\mathbf{u}) = \sum_i I_0(\mathbf{x}_i)\, I_1(\mathbf{x}_i + \mathbf{u})$$
Horn-Schunck algorithm
• Global method with a smoothness constraint to solve the aperture problem
• Minimize a global energy functional using the calculus of variations:

$$E = \iint \Big[ (I_x u + I_y v + I_t)^2 + \alpha^2 \big( \|\nabla u\|^2 + \|\nabla v\|^2 \big) \Big]\, dx\, dy$$

The Euler-Lagrange equations are

$$\frac{\partial L}{\partial u} - \frac{\partial}{\partial x}\frac{\partial L}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial L}{\partial u_y} = 0$$
$$\frac{\partial L}{\partial v} - \frac{\partial}{\partial x}\frac{\partial L}{\partial v_x} - \frac{\partial}{\partial y}\frac{\partial L}{\partial v_y} = 0$$

where L is the integrand of the energy functional.

Horn-Schunck algorithm

$$I_x (I_x u + I_y v + I_t) - \alpha^2 \Delta u = 0$$
$$I_y (I_x u + I_y v + I_t) - \alpha^2 \Delta v = 0$$

where $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the Laplace operator. Approximating $\Delta u(x, y) \approx \bar{u}(x, y) - u(x, y)$, where $\bar{u}$ is a local average, gives

$$(I_x^2 + \alpha^2)\, u + I_x I_y\, v = \alpha^2 \bar{u} - I_x I_t$$
$$I_x I_y\, u + (I_y^2 + \alpha^2)\, v = \alpha^2 \bar{v} - I_y I_t$$

See Robot Vision by Horn for details
Horn-Schunck algorithm
• Iterative scheme:

$$u^{k+1} = \bar{u}^k - \frac{I_x (I_x \bar{u}^k + I_y \bar{v}^k + I_t)}{\alpha^2 + I_x^2 + I_y^2}$$
$$v^{k+1} = \bar{v}^k - \frac{I_y (I_x \bar{u}^k + I_y \bar{v}^k + I_t)}{\alpha^2 + I_x^2 + I_y^2}$$

[Figure: the update projects the local average $(\bar{u}, \bar{v})$ onto the constraint line with normal $(I_x, I_y)$ in (u, v) velocity space]

• Yields a high-density flow field
• Fills in missing information in homogeneous regions
• More sensitive to noise than local methods
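A minimal sketch of this iteration, assuming precomputed gradients `Ix, Iy, It` (e.g., from the X-Y-T scheme earlier) and the standard 4-neighbor averaging kernel for the local averages.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(Ix, Iy, It, alpha=1.0, n_iters=100):
    """Jacobi-style Horn-Schunck iteration; u_bar and v_bar are local
    averages computed with the standard 4-neighbor kernel."""
    kernel = np.array([[0.0, 0.25, 0.0],
                       [0.25, 0.0, 0.25],
                       [0.0, 0.25, 0.0]])
    u = np.zeros_like(Ix, dtype=float)
    v = np.zeros_like(Ix, dtype=float)
    for _ in range(n_iters):
        u_bar = convolve(u, kernel, mode='nearest')
        v_bar = convolve(v, kernel, mode='nearest')
        # shared term of both update equations
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```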
Optical flow: aliasing
Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity, i.e., how do we know which 'correspondence' is correct?
[Figure: for a small actual shift the nearest match is correct (no aliasing); for a large shift the nearest match is incorrect (aliasing)]
To overcome aliasing: coarse-to-fine estimation.
Coarse-to-fine estimation
[Figure: Gaussian pyramids of images I and J; a motion of u = 10 pixels at full resolution becomes u = 5, 2.5, and 1.25 pixels at successively coarser levels, where it can be estimated directly and then propagated to finer levels via warp + refine steps]
Coarse-to-fine estimation
[Figure: flow chart of coarse-to-fine estimation; pyramids are constructed for I and J, the current estimate a_in is used to warp J toward I at the coarsest level, the residual drives a refine step, and the refined estimate is upsampled and the warp/refine cycle repeated at each finer level to produce a_out]
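A sketch of the coarse-to-fine loop for a single global translation, assuming OpenCV's `pyrDown` for pyramid construction and reusing the `refine_flow` sketch from the iterative-refinement section above.

```python
import numpy as np
import cv2

def coarse_to_fine_translation(I0, I1, n_levels=4, iters_per_level=3):
    """Coarse-to-fine estimation of a single global translation:
    estimate at the coarsest pyramid level (where the motion is small),
    then double and refine the estimate at each finer level."""
    pyr0, pyr1 = [I0.astype(np.float32)], [I1.astype(np.float32)]
    for _ in range(n_levels - 1):
        pyr0.append(cv2.pyrDown(pyr0[-1]))
        pyr1.append(cv2.pyrDown(pyr1[-1]))
    u = v = 0.0
    for J0, J1 in zip(reversed(pyr0), reversed(pyr1)):   # coarsest level first
        u, v = 2 * u, 2 * v          # motion doubles at each finer level
        u, v = refine_flow(J0, J1, u, v, n_iters=iters_per_level)
    return u, v
```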
Global (parametric) motion models
• 2D models:
– Affine
– Quadratic
– Planar projective transform (homography)
• 3D models:
– Instantaneous camera motion models
– Homography + epipole
– Plane + parallax
Motion models
• Translation: 2 unknowns
• Affine: 6 unknowns
• Perspective: 8 unknowns
• 3D rotation: 3 unknowns
Example: affine motion
$$u(x, y) = a_1 + a_2 x + a_3 y$$
$$v(x, y) = a_4 + a_5 x + a_6 y$$

• Substituting into the brightness constancy equation $I_x u + I_y v + I_t = 0$:

$$I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t \approx 0$$

Each pixel provides one linear constraint on the six global unknowns.
Least-squares minimization (over all pixels):

$$Err(\mathbf{a}) = \sum \big[ I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t \big]^2$$
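Since each pixel contributes one linear constraint, the six affine parameters can be found with a single least-squares solve; a sketch, assuming dense gradient images as inputs.

```python
import numpy as np

def affine_flow(Ix, Iy, It):
    """Stack one brightness-constancy constraint per pixel in the six
    affine parameters and solve the least-squares system."""
    ys, xs = np.mgrid[0:Ix.shape[0], 0:Ix.shape[1]]
    x, y = xs.ravel(), ys.ravel()
    ix, iy, it = Ix.ravel(), Iy.ravel(), It.ravel()
    # columns multiply (a1, a2, a3, a4, a5, a6) respectively
    M = np.stack([ix, ix * x, ix * y, iy, iy * x, iy * y], axis=1)
    a, *_ = np.linalg.lstsq(M, -it, rcond=None)
    return a   # u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y
```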
Parametric motion
$\mathbf{x}'(\mathbf{x}; \mathbf{p})$: motion field or correspondence map

$$E(\mathbf{p} + \Delta\mathbf{p}) = \sum_i \big[ I_1(\mathbf{x}'(\mathbf{x}_i; \mathbf{p} + \Delta\mathbf{p})) - I_0(\mathbf{x}_i) \big]^2$$
$$\approx \sum_i \big[ I_1(\mathbf{x}_i') + J_1(\mathbf{x}_i')\, \Delta\mathbf{p} - I_0(\mathbf{x}_i) \big]^2$$
$$= \sum_i \big[ J_1(\mathbf{x}_i')\, \Delta\mathbf{p} + e_i \big]^2$$

$$J_1(\mathbf{x}_i') = \nabla I_1(\mathbf{x}_i')\, \frac{\partial \mathbf{x}'}{\partial \mathbf{p}}(\mathbf{x}_i)$$

i.e., the product of the image gradient with the Jacobian of the correspondence field
Parametric motion
$$A = \sum_i J_{\mathbf{x}'}^\top(\mathbf{x}_i) \big[ \nabla I_1^\top(\mathbf{x}_i')\, \nabla I_1(\mathbf{x}_i') \big]\, J_{\mathbf{x}'}(\mathbf{x}_i)$$
$$b = -\sum_i J_{\mathbf{x}'}^\top(\mathbf{x}_i) \big[ e_i\, \nabla I_1^\top(\mathbf{x}_i') \big]$$

The expressions inside the brackets are the same as in the simpler translational motion case.
Other 2D Motion Models
Quadratic – instantaneous approximation to planar motion:

$$u = q_1 + q_2 x + q_3 y + q_7 x^2 + q_8 xy$$
$$v = q_4 + q_5 x + q_6 y + q_7 xy + q_8 y^2$$

Projective – exact planar motion (homography):

$$x' = \frac{h_1 + h_2 x + h_3 y}{h_7 + h_8 x + h_9 y}, \qquad y' = \frac{h_4 + h_5 x + h_6 y}{h_7 + h_8 x + h_9 y}$$

and $u = x' - x$, $v = y' - y$
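For instance, the projective model can be applied to a pixel grid to produce its induced motion field; this sketch uses the conventional 3x3 matrix H with homogeneous (x, y, 1) ordering rather than the slide's coefficient numbering.

```python
import numpy as np

def projective_motion_field(H, xs, ys):
    """Induced motion field of a homography: map (x, y) through the 3x3
    matrix H (homogeneous (x, y, 1) convention) and return
    u = x' - x, v = y' - y."""
    denom = H[2, 0] * xs + H[2, 1] * ys + H[2, 2]
    xp = (H[0, 0] * xs + H[0, 1] * ys + H[0, 2]) / denom
    yp = (H[1, 0] * xs + H[1, 1] * ys + H[1, 2]) / denom
    return xp - xs, yp - ys
```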
3D Motion Models
Instantaneous camera motion:

$$u = -xy\, \Omega_X + (1 + x^2)\, \Omega_Y - y\, \Omega_Z + (T_X - T_Z x)/Z$$
$$v = -(1 + y^2)\, \Omega_X + xy\, \Omega_Y + x\, \Omega_Z + (T_Y - T_Z y)/Z$$

Global parameters: $\Omega_X, \Omega_Y, \Omega_Z, T_X, T_Y, T_Z$
Local parameter: $Z(x, y)$
Homography+Epipole
$$x' = \frac{h_1 x + h_2 y + h_3 + \gamma t_1}{h_7 x + h_8 y + h_9 + \gamma t_3}, \qquad y' = \frac{h_4 x + h_5 y + h_6 + \gamma t_2}{h_7 x + h_8 y + h_9 + \gamma t_3}$$

Global parameters: $h_1, \ldots, h_9, t_1, t_2, t_3$
Local parameter: $\gamma(x, y)$
and $u = x' - x$, $v = y' - y$

Residual Planar Parallax Motion

$$u = x_w - x = \frac{\gamma}{1 + \gamma t_3}\,(t_3 x - t_1), \qquad v = y_w - y = \frac{\gamma}{1 + \gamma t_3}\,(t_3 y - t_2)$$

Global parameters: $t_1, t_2, t_3$
Local parameter: $\gamma(x, y)$
Shi-Tomasi feature tracker
1. Find good features (min eigenvalue of the 2x2 Hessian)
2. Use Lucas-Kanade to track with pure translation
3. Use affine registration against the first feature patch
4. Terminate tracks whose dissimilarity gets too large
5. Start new tracks when needed
(a code sketch of steps 1-2 follows)
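Steps 1-2 map directly onto standard OpenCV calls; a sketch, assuming consecutive 8-bit grayscale frames `prev_gray` and `next_gray` (steps 3-5, affine verification and track management, would need additional logic).

```python
import cv2
import numpy as np

# prev_gray, next_gray: consecutive 8-bit grayscale frames (assumed given)

# Step 1: find good features (Shi-Tomasi min-eigenvalue criterion)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                              qualityLevel=0.01, minDistance=7)

# Step 2: track with pyramidal Lucas-Kanade (pure translation per patch)
nxt, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                            pts, None,
                                            winSize=(21, 21), maxLevel=3)

# keep only successfully tracked points
good_old = pts[status.ravel() == 1]
good_new = nxt[status.ravel() == 1]
```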
Tracking results
Tracking - dissimilarity
Tracking results
Correlation window size
• Small windows lead to more false matches
• Large windows are better in this respect, but...
– neighboring flow vectors will be more correlated (since the template windows have more in common)
– flow resolution is also lower (same reason)
– they are more expensive to compute
• Small windows are good for local search: more detailed and less smooth (noisy?)
• Large windows are good for global search: less detailed and smoother
Robust estimation
Noise distributions are often non-Gaussian, having much heavier tails. Noise samples from the tails are called outliers.
• Sources of outliers (multiple motions):
– specularities / highlights
– JPEG artifacts / interlacing / motion blur
– multiple motions (occlusion boundaries, transparency)
[Figure: two clusters in (u_1, u_2) velocity space, one per motion]
How do we handle background and foreground motion?
Robust estimation
Standard least-squares estimation allows outlying points too much influence:

$$E(m) = \sum_i \rho(x_i), \qquad \rho(x_i) = (x_i - m)^2$$
$$\text{Influence:}\quad \psi(x) = \frac{\partial \rho}{\partial x} = 2\,(x_i - m)$$
Robust estimation
Robust gradient constraint:

$$E_d(u_s, v_s) = \sum \rho\big( I_x u_s + I_y v_s + I_t \big)$$

Robust SSD:

$$E_d(u_s, v_s) = \sum \rho\big( I(x, y) - I(x + u_s, y + v_s) \big)$$
Robust estimation
Problem: least-squares estimators penalize deviations between data and model with a quadratic error function, which is extremely sensitive to outliers:

error penalty function: $\rho(\epsilon) = \epsilon^2$
influence function: $\psi(\epsilon) = \dfrac{\partial \rho}{\partial \epsilon} = 2\epsilon$

Redescending error functions (e.g., Geman-McClure) help to reduce the influence of outlying measurements:

error penalty function: $\rho(\epsilon; s) = \dfrac{\epsilon^2}{s + \epsilon^2}$
influence function: $\psi(\epsilon; s) = \dfrac{2 s \epsilon}{(s + \epsilon^2)^2}$
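The two Geman-McClure functions are easy to write down directly; a small sketch (function names are illustrative).

```python
import numpy as np

def rho_gm(eps, s):
    """Geman-McClure penalty: rho(eps; s) = eps^2 / (s + eps^2)."""
    return eps ** 2 / (s + eps ** 2)

def psi_gm(eps, s):
    """Its influence function: psi(eps; s) = 2*s*eps / (s + eps^2)^2.
    Unlike the quadratic's psi = 2*eps, it redescends toward zero,
    so large residuals (outliers) barely affect the estimate."""
    return 2 * s * eps / (s + eps ** 2) ** 2
```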
Performance evaluation
• See Baker et al., "A Database and Evaluation Methodology for Optical Flow", ICCV 2007
• Algorithms:
– Pyramid LK: OpenCV-based implementation of Lucas-Kanade on a Gaussian pyramid
– Black and Anandan: authors' implementation
– Bruhn et al.: our implementation
– MediaPlayer™: code used for video frame-rate upsampling in Microsoft MediaPlayer
– Zitnick et al.: authors' implementation
Performance evaluation
• Difficulty: the data is substantially more challenging than Yosemite
• Diversity: there is substantial variation in difficulty across the various datasets
• Motion ground truth vs. interpolation: the best algorithms for one are not the best for the other
• Comparison with stereo: the performance of existing flow algorithms appears weak
Motion representations
• How can we describe this scene?
Block-based motion prediction
• Break the image up into square blocks
• Estimate a translation for each block
• Use this to predict the next frame, and code the difference (MPEG-2); a sketch follows
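A sketch of the whole block-based prediction step, assuming float grayscale frames; real MPEG-2 encoders use much faster search strategies than this exhaustive loop.

```python
import numpy as np

def predict_next_frame(prev, curr, block=16, search=8):
    """Estimate one integer translation per block of `curr` by
    exhaustive SSD search in `prev`, then assemble the
    motion-compensated prediction and the residual to be coded."""
    prev = prev.astype(float); curr = curr.astype(float)
    H, W = curr.shape
    pred = np.zeros_like(curr)
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            tmpl = curr[by:by + block, bx:bx + block]
            best_score, best_uv = np.inf, (0, 0)
            for v in range(-search, search + 1):
                for u in range(-search, search + 1):
                    y0, x0 = by + v, bx + u
                    if y0 < 0 or x0 < 0 or y0 + block > H or x0 + block > W:
                        continue
                    score = np.sum((prev[y0:y0 + block, x0:x0 + block] - tmpl) ** 2)
                    if score < best_score:
                        best_score, best_uv = score, (u, v)
            u, v = best_uv
            pred[by:by + block, bx:bx + block] = prev[by + v:by + v + block,
                                                      bx + u:bx + u + block]
    return pred, curr - pred   # the residual is what gets coded
```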
Layered motion
• Break the image sequence up into "layers"
[Figure: a frame decomposed as the composition of several layers]
• Describe each layer's motion
Layered motion
• Advantages:
– can represent occlusions / disocclusions
– each layer's motion can be smooth
– video segmentation for semantic processing
• Difficulties:
– how do we determine the correct number of layers?
– how do we assign pixels?
– how do we model the motion?
Layers for video summarization
Background modeling (MPEG-4)
• Convert masked images into a background sprite for layered video coding
[Figure: several masked frames composited into a single background sprite]
What are layers?
• Intensities
• Velocities
• Layers
• Alpha matting
• Sprites
Wang and Adelson, "Representing moving images with layers", IEEE Transactions on Image Processing, 1994
How do we form them?
How do we estimate the layers?
1. compute coarse-to-fine flow
2. estimate affine motion in blocks (regression)
3. cluster with k-means (sketched below)
4. assign pixels to the best-fitting affine region
5. re-estimate affine motions in each region…
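A sketch of the clustering step (step 3), assuming scikit-learn's KMeans and an N x 6 array of per-block affine parameters produced by step 2; the function name is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_affine_motions(affine_params, n_layers=3):
    """Cluster per-block affine parameter vectors (shape N x 6) into
    candidate motion layers; returns a layer label per block and the
    representative affine motion of each layer."""
    km = KMeans(n_clusters=n_layers, n_init=10).fit(affine_params)
    return km.labels_, km.cluster_centers_
```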
Layer synthesis
• For each layer:
– stabilize the sequence with the affine motion
– compute the median value at each pixel
• Determine occlusion relationships
Results
Applications
• Motion analysis
• Video coding
• Morphing
• Video denoising
• Video stabilization
• Medical image registration
• Frame interpolation