Invariant features

EECS 274 Computer Vision
Local Invariant Features
Local features
• Matching with Harris Detector
• Scale-invariant Feature Detection
• Scale Invariant Image Descriptors
• Affine-invariant Feature Detection
• Object Recognition
• SIFT Features
• Reading: S Chapter 4
Examples
Features
What local features to use?
Aperture problem
Ambiguity of 1-dimensional motion perception
Stripes moved left 5 pixels
Stripes moved upward 6 pixels
Introduction
Local invariant photometric descriptors
[Figure: image region → local descriptor]
Local : robust to occlusion/clutter + no segmentation
Photometric : distinctive
Invariant : to image transformations + illumination changes
History - Recognition
Color histogram [Swain 91]
Each pixel is described by a color vector (r, g, b)
Distribution of color vectors is described by a histogram
=> not robust to occlusion, not invariant, not distinctive
History - Recognition
Eigenimages [Turk 91]
• Each face vector is represented in the eigenimage space
– eigenvectors with the highest eigenvalues = eigenimages
[Figure: face vectors in the eigenimage space with axes v1, v2, v3]
• The new image is projected into the eigenimage space
– determine the closest face
=> not robust to occlusion, requires segmentation, not invariant, discriminant
History - Recognition
Geometric invariants [Rothwell 92]
• Function with a value independent of the transformation
$$f(x', y') = f(x, y) \quad \text{where} \quad (x', y')^T = T\,(x, y)^T$$
• Invariant for image rotation: distance of two points
• Invariant for planar homography: cross-ratio
=> local and invariant, not discriminant, requires sub-pixel extraction of primitives
History - Recognition
Problems : occlusion, clutter, image
transformations, distinctiveness
Solution : recognition with local
photometric invariants
[ Local greyvalue invariants for image retrieval, C. Schmid and R. Mohr, PAMI 1997 ]
Approach
[Figure: image region → local descriptor]
1) Extraction of interest points (characteristic locations)
2) Computation of local descriptors
3) Determining correspondences
4) Selection of similar images
Matching with interest points
• Extraction of interest points with the Harris detector
• Comparison of points with cross-correlation
• Verification with the fundamental matrix
Moravec corner detector
• Developed for Stanford Cart in 1977
Moravec corner detector
Change of intensity for the shift [u,v]:
$$E(u,v) = \sum_{x,y} w(x,y)\,\big[I(x+u,\,y+v) - I(x,y)\big]^2$$
where w(x,y) is the window function, I(x+u, y+v) is the shifted intensity, and I(x,y) is the intensity
Four shifts: (u,v) = (1,0), (1,1), (0,1), (-1, 1)
Look for local maxima in min(E)
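A minimal sketch of the Moravec response described above, assuming a binary (box) window of 3x3 pixels and a small synthetic image; the function name `moravec_response` and the test image are illustrative and not part of the original slides.

```python
import numpy as np

def moravec_response(image, window=3):
    """Return, per pixel, the minimum of E(u, v) over the four slide shifts."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    r = window // 2
    shifts = [(1, 0), (1, 1), (0, 1), (-1, 1)]  # (u, v) as listed on the slide
    response = np.zeros((h, w))
    m = r + 1  # margin so that every shifted window stays inside the image
    for y in range(m, h - m):
        for x in range(m, w - m):
            patch = img[y - r:y + r + 1, x - r:x + r + 1]
            e_min = np.inf
            for u, v in shifts:
                shifted = img[y + v - r:y + v + r + 1, x + u - r:x + u + r + 1]
                # Binary window: w(x, y) = 1 inside the patch, 0 outside.
                e_min = min(e_min, np.sum((shifted - patch) ** 2))
            response[y, x] = e_min
    return response

# Synthetic image (assumption): bright quadrant with one corner at row 8, column 8.
image = np.zeros((20, 20))
image[8:, 8:] = 1.0
response = moravec_response(image)
corner = np.unravel_index(np.argmax(response), response.shape)
```

On this image the response is zero in flat regions and along straight edges, because at least one of the four shifts slides along the edge, and it peaks at the corner.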
Problems of Moravec detector
• Noisy response due to a binary window function
• Only a set of shifts at every 45 degrees is considered
• Only the minimum of E is taken into account
The Harris corner detector (1988) solves these problems.
Harris detector
Based on the idea of auto-correlation
Important difference in all directions => interest point
Interest points
Geometric features
repeatable under transformations
2D characteristics of the signal
high informational content
Comparison of different detectors [Schmid98]
Harris detector
Auto-correlation function for a point (x, y) and a shift (u, v) = (Δx, Δy):
$$E(u,v) = \sum_{x,y} w(x,y)\,\big(I(x,y) - I(x+u,\,y+v)\big)^2$$
Discrete shifts can be avoided with the auto-correlation matrix, using the first-order approximation
$$I(x+u,\,y+v) \approx I(x,y) + \begin{pmatrix} I_x(x,y) & I_y(x,y) \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}$$
$$E(u,v) \approx \sum_{x,y} w(x,y) \left( \begin{pmatrix} I_x(x,y) & I_y(x,y) \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} \right)^2$$
$$= \begin{pmatrix} u & v \end{pmatrix} \begin{pmatrix} \sum_{x,y} w(x,y)\,(I_x(x,y))^2 & \sum_{x,y} w(x,y)\,I_x(x,y)\,I_y(x,y) \\ \sum_{x,y} w(x,y)\,I_x(x,y)\,I_y(x,y) & \sum_{x,y} w(x,y)\,(I_y(x,y))^2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \mathbf{u}^T A\,\mathbf{u}$$
$$A = w * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$
Comparison of detectors: Rotation
repeatability = #good matches/mean(#points)
[Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98]
Comparison of detectors: Perspective
repeatability = #good matches/mean(#points)
[Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98]
Harris corner detector
Intensity change in shifting window: eigenvalue analysis
$$E(\mathbf{u}) = E(u,v) = \mathbf{u}^T A\,\mathbf{u}, \qquad A = w * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$
λ1, λ2 – eigenvalues of A
[Figure: uncertainty ellipse E(u,v) = const; the axis of length (λmax)^(-1/2) lies along the direction of the fastest change, the axis of length (λmin)^(-1/2) along the direction of the slowest change]
Shi and Tomasi use min(λ1, λ2) to locate good features to track
Harris corner detector
Classification of image points using eigenvalues of A:
• Corner: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
• Edge: λ1 >> λ2 or λ2 >> λ1
• Flat: λ1 and λ2 are small; E is almost constant in all directions
[Figure: regions of the (λ1, λ2) plane labeled corner, edge, and flat]
Harris detection
• Auto-correlation matrix
– captures the structure of the local neighborhood
– measure based on eigenvalues of this matrix
• 2 strong eigenvalues => interest point
• 1 strong eigenvalue => contour
• 0 strong eigenvalues => uniform region
• Interest point detection
– threshold on the eigenvalues
– local maximum for localization
Harris corner detector
Measure of corner response:
$$R = \det(A) - k\,(\operatorname{trace}(A))^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2$$
(k – empirical constant, k = 0.04-0.06)
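The corner response can be sketched as follows, assuming central-difference gradients, a uniform 3x3 window for w, and k = 0.05; the helper names `box_filter` and `harris_response` and the synthetic image are assumptions for illustration, and a practical detector would typically use Gaussian weighting.

```python
import numpy as np

def box_filter(a, radius=1):
    """Sum values over a (2*radius+1)^2 window (uniform w), zero-padded at the border."""
    padded = np.pad(a, radius, mode="constant")
    out = np.zeros_like(a, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(image, k=0.05, radius=1):
    """R = det(A) - k * trace(A)^2 with A = w * [[Ix^2, IxIy], [IxIy, Iy^2]]."""
    img = np.asarray(image, dtype=float)
    iy, ix = np.gradient(img)           # derivatives along rows (y) and columns (x)
    a11 = box_filter(ix * ix, radius)   # w * Ix^2
    a12 = box_filter(ix * iy, radius)   # w * Ix Iy
    a22 = box_filter(iy * iy, radius)   # w * Iy^2
    det = a11 * a22 - a12 ** 2
    trace = a11 + a22
    return det - k * trace ** 2

# Synthetic image (assumption): bright quadrant with one corner at row 8, column 8.
image = np.zeros((20, 20))
image[8:, 8:] = 1.0
R = harris_response(image)
```

With this image R is positive at the corner, negative along the straight edges (one strong eigenvalue), and zero in flat regions, matching the eigenvalue classification.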
Example
Good features
Using auto-correlation
or Hessian matrix
Local descriptors
[Figure: image region → local descriptor]
Descriptors characterize the local neighborhood of a point
Local jet
Convolution of image I with Gaussian derivatives
$$\mathbf{v}(x,y) = \begin{pmatrix} I(x,y) * G(\sigma) \\ I(x,y) * G_x(\sigma) \\ I(x,y) * G_y(\sigma) \\ I(x,y) * G_{xx}(\sigma) \\ I(x,y) * G_{xy}(\sigma) \\ I(x,y) * G_{yy}(\sigma) \\ \vdots \end{pmatrix}$$
$$I(x,y) * G(\sigma) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(x', y', \sigma)\, I(x - x',\, y - y')\, dx'\, dy'$$
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$
N-Jet, local jet
• Invariance to image rotation: differential invariants [Koen87]
$$\begin{pmatrix}
L \\
L_i L_i \\
L_i L_{ij} L_j \\
L_{ii} \\
L_{ij} L_{ij} \\
\varepsilon_{ij}\,(L_{jkl} L_i L_k L_l - L_{jkk} L_i L_l L_l) \\
L_{iij} L_j L_k L_k - L_{ijk} L_i L_j L_k \\
-\varepsilon_{ij}\, L_{jkl} L_i L_k L_l \\
L_{ijk} L_i L_j L_k
\end{pmatrix}$$
with the first five components written out as
$$\begin{aligned}
L_i L_i &= L_x L_x + L_y L_y \\
L_i L_{ij} L_j &= L_{xx} L_x L_x + 2 L_{xy} L_x L_y + L_{yy} L_y L_y \\
L_{ii} &= L_{xx} + L_{yy} \\
L_{ij} L_{ij} &= L_{xx} L_{xx} + 2 L_{xy} L_{xy} + L_{yy} L_{yy}
\end{aligned}$$
where $\varepsilon_{ij}$ is the antisymmetric epsilon tensor
Local descriptors
Robustness to illumination changes
In case of an affine transformation
$$I_1(\mathbf{x}) = a\,I_2(\mathbf{x}) + b$$
use the normalized invariants
$$\begin{pmatrix}
\dfrac{L_i L_{ij} L_j}{(L_i L_i)^{3/2}} \\[2ex]
\dfrac{L_{ii}}{(L_i L_i)^{1/2}} \\[2ex]
\dfrac{L_{ij} L_{ji}}{L_i L_i} \\[2ex]
\dfrac{\varepsilon_{ij}\,(L_{jkl} L_i L_k L_l - L_{jkk} L_i L_l L_l)}{(L_i L_i)^2} \\[2ex]
\dfrac{L_{iij} L_j L_k L_k - L_{ijk} L_i L_j L_k}{(L_i L_i)^2} \\[2ex]
\dfrac{-\varepsilon_{ij}\, L_{jkl} L_i L_k L_l}{(L_i L_i)^2} \\[2ex]
\dfrac{L_{ijk} L_i L_j L_k}{(L_i L_i)^2}
\end{pmatrix}$$
or normalization of the image patch with mean and variance
Determining correspondences
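[Figure: local descriptor ( ) =? local descriptor ( )]
Vector comparison using the Mahalanobis distance
$$\mathrm{dist}_M(\mathbf{p}, \mathbf{q}) = \sqrt{(\mathbf{p} - \mathbf{q})^T\, \Lambda^{-1}\, (\mathbf{p} - \mathbf{q})}$$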
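A small sketch of the Mahalanobis comparison, assuming Λ is the covariance matrix of the descriptor components; the function name `mahalanobis` and the example covariance are hypothetical.

```python
import numpy as np

def mahalanobis(p, q, cov):
    """Mahalanobis distance between descriptors p and q for covariance matrix cov."""
    d = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    # Solve cov @ z = d instead of forming the explicit inverse.
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# Hypothetical covariance: the first component varies more than the second.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
d_first = mahalanobis([2.0, 0.0], [0.0, 0.0], cov)   # difference in the noisy component
d_second = mahalanobis([0.0, 2.0], [0.0, 0.0], cov)  # same difference in the stable component
```

The same raw difference counts for less in the component with larger variance, which is the reason to prefer this distance over the plain Euclidean one when descriptor components have different ranges.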
Selection of similar images
• In a large database
– voting algorithm
– additional constraints
• Rapid access with an indexing mechanism
Voting algorithm
[Figure: vector of local characteristics]
[Figure: votes per model image: 2, 1, 1, 1, 0]
I1 is the corresponding model image
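One possible reading of the voting step, as a sketch: each query descriptor casts one vote for every database image that contains a descriptor within a distance threshold, and the image with the most votes is selected. The Euclidean distance here is a stand-in for the Mahalanobis distance, and the function name, threshold, and toy data are all hypothetical.

```python
from collections import Counter

def vote(query_descriptors, database, threshold):
    """database: list of (image_id, descriptor); returns a Counter of votes per image."""
    votes = Counter()
    for q in query_descriptors:
        matched_images = set()
        for image_id, d in database:
            dist_sq = sum((a - b) ** 2 for a, b in zip(q, d))
            if dist_sq <= threshold ** 2:
                matched_images.add(image_id)
        # At most one vote per image for each query descriptor.
        votes.update(matched_images)
    return votes

# Toy database (assumption): two descriptors from I1, one each from I2 and I3.
database = [("I1", (0.0, 0.0)), ("I1", (5.0, 5.0)),
            ("I2", (0.1, 0.0)), ("I3", (9.0, 9.0))]
query = [(0.0, 0.05), (5.0, 5.1)]
votes = vote(query, database, threshold=0.5)
best_image = votes.most_common(1)[0][0]
```

The linear scan over the database is only for clarity; the indexing mechanism mentioned in the slides exists to avoid it.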
Additional constraints
• Semi-local constraints
– neighboring points should match
– angles, length ratios should be similar
[Figure: neighboring matched points labeled 1, 2, 3 in both images]
• Global constraints
– robust estimation of the image transformation (homography, epipolar geometry)
Results
database with ~1000 images
Results
Harris detector
Interest points extracted with Harris (~ 500 points)
Cross-correlation matching
Initial matches (188 pairs)
Global constraints
Robust estimation of the fundamental matrix
99 inliers
89 outliers
Summary of Harris detector
• Very good results in the presence of occlusion and clutter
– local information
– discriminant greyvalue information
– invariance to image rotation and illumination
• Not invariant to scale and affine changes
• Solution for more general viewpoint changes
– local descriptors invariant to scale and rotation
– extraction of invariant points and regions
Scale Invariant Feature Detection
• Consider two images of the same scene, related by
a scale change (i.e. zooming)
• How are their scale space representations related?
Scale Space Theory (Lindeberg ’98):
γ-normalized derivatives: $\partial_\xi = t^{\gamma/2}\,\partial_x$, $\partial_\eta = t^{\gamma/2}\,\partial_y$
$$(x', y') = (sx, sy), \qquad I(x, y) = I'(x', y'), \qquad t' = s^2 t$$
$$L'_{\xi'^m}(x', y', t') = s^{m(\gamma - 1)}\, L_{\xi^m}(x, y, t)$$
γ is a free parameter for the task at hand
For γ = 1 the factor $s^{m(\gamma-1)}$ equals 1, so normalized derivatives have the same value at corresponding relative scales
Harris detector + scale changes
Scale Adapted Harris Detector
Many corresponding points at which the scale factor corresponds
to scale change between images
Scale invariant Harris points
• Multi-scale extraction of Harris interest points
• Selection of points at characteristic scale in scale space
Laplacian
Characteristic scale:
- maximum in scale space
- scale invariant
Scale invariant interest points
multi-scale Harris points
selection of points at the characteristic scale with Laplacian
invariant points + associated regions [Mikolajczyk & Schmid’01]
Harris-Laplacian Feature
Harris detector – adaptation to scale
Evaluation of scale invariant detectors
repeatability – scale changes
SIFT: Overview
• 1999
• Generates image features, “keypoints”
– invariant to image scaling and rotation
– partially invariant to change in illumination and
3D camera viewpoint
– many can be extracted from typical images
– highly distinctive
Algorithm overview
1. Scale-space extrema detection
– Uses difference-of-Gaussian function
2. Keypoint localization
– Sub-pixel location and scale fit to a model
3. Orientation assignment
– 1 or more for each keypoint
4. Keypoint descriptor
– Created from local image gradients
Scale space
• Definition:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
$$\text{where} \quad G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2}$$
Scale space
• Keypoints are detected using scale-space
extrema in difference-of-Gaussian function D
• D definition:
$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$
• Efficient to compute
Relationship of D to $\sigma^2 \nabla^2 G$
• Close approximation to scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$
• Diffusion equation: $\dfrac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$
• Approximate ∂G/∂σ: $\dfrac{\partial G}{\partial \sigma} \approx \dfrac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$
– giving,
$$\sigma \nabla^2 G \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$
$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G$$
• When D has scales differing by a constant factor it already incorporates the σ² scale normalization required for scale-invariance
• e.g., $k = \sqrt{2}$
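The approximation G(kσ) − G(σ) ≈ (k − 1)σ²∇²G can be checked numerically. The sketch below uses the closed-form Laplacian of the 2D Gaussian and a value of k close to 1, where the finite-difference step is accurate; the grid, σ = 1.6, and k = 1.01 are illustrative choices, not values from the slides.

```python
import numpy as np

def gaussian(x, y, sigma):
    """2D isotropic Gaussian G(x, y, sigma)."""
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    """Closed form of G_xx + G_yy for the 2D Gaussian."""
    r2 = x ** 2 + y ** 2
    return (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

sigma, k = 1.6, 1.01
coords = np.linspace(-8.0, 8.0, 161)
x, y = np.meshgrid(coords, coords)

dog = gaussian(x, y, k * sigma) - gaussian(x, y, sigma)
approx = (k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)

# Largest deviation relative to the peak magnitude of the approximation.
rel_err = np.max(np.abs(dog - approx)) / np.max(np.abs(approx))
```

For larger k the amplitude of the difference of Gaussians departs further from (k − 1)σ²∇²G, but the factor is the same at every scale when k is constant, so it does not move the extrema.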
Scale space construction
[Figure: stacks of Gaussian-smoothed images labeled σ, kσ, 2σ, 2kσ and σ, kσ, 2σ, 2kσ, 2k²σ]
Scale space
• A collection of images obtained by progressively smoothing the
input image
• Analogous to gradually reducing image resolution
• See Vedaldi’s implementation (http://www.vlfeat.org/ ) for details
• Discretized scales
$$\sigma(s, o) = \sigma_0\, 2^{(s/S + o)}, \qquad \mathrm{DoG}(\sigma(o,s)) = I_o(\sigma(o, s+1)) - I_o(\sigma(o, s))$$
$$\sigma_0 = 1.6, \quad s = 0, \ldots, S - 1, \quad o = o_{\min}, \ldots, o_{\min} + O - 1$$
S : # of scales per octave, O : # of octaves
$$S = 3, \quad O = \log_2(\min(I_w, I_h))$$
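The discretized scales can be tabulated directly from the formula. In this sketch the function names are hypothetical, and rounding O down to an integer is an assumption, since the slide gives only O = log2(min(Iw, Ih)).

```python
import math

def sift_scales(sigma0=1.6, S=3, O=4, o_min=0):
    """Return {(o, s): sigma(s, o)} with sigma(s, o) = sigma0 * 2^(o + s/S)."""
    return {(o, s): sigma0 * 2 ** (o + s / S)
            for o in range(o_min, o_min + O)
            for s in range(S)}

def num_octaves(width, height):
    """O = log2(min(width, height)), rounded down (assumption)."""
    return int(math.log2(min(width, height)))

scales = sift_scales()
octaves_vga = num_octaves(640, 480)
```

Each octave doubles σ, so `scales[(1, 0)]` is twice `scales[(0, 0)]`, and consecutive scales inside an octave differ by the constant factor 2^(1/S).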
Scale space images
[Figure: Gaussian-smoothed images for the first, second, third, and fourth octaves]
Difference-of-Gaussian images
[Figure: difference-of-Gaussian images for the first, second, third, and fourth octaves]
Frequency of sampling
• There is no minimum
• Best frequency determined experimentally
Prior smoothing for each octave
• Increasing σ increases robustness, but at a cost in efficiency
• σ = 1.6 is a good tradeoff
• Doubling the image size initially increases the number of keypoints
Finding extrema
• Sample point D(x,y,σ) is selected only if it is a minimum or a maximum among its 26 neighbors: 8 in the same DoG image and 9 in each of the two adjacent scales
[Figure: extrema in this image; DoG scale space]
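A sketch of the extremum test, assuming the DoG images of one octave are stacked in an array indexed as (scale, row, column) and that the tested sample is not on the border of the stack; the function name and the strict inequality are illustrative choices.

```python
import numpy as np

def is_extremum(dog, k, i, j):
    """True if dog[k, i, j] is strictly above or below all 26 neighbors."""
    cube = dog[k - 1:k + 2, i - 1:i + 2, j - 1:j + 2]
    center = dog[k, i, j]
    # Index 13 is the center of the flattened 3x3x3 cube.
    neighbors = np.delete(cube.ravel(), 13)
    return bool(center > neighbors.max() or center < neighbors.min())

# Toy DoG stack (assumption): flat everywhere except one peak.
dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0
peak = is_extremum(dog, 1, 2, 2)

flat = is_extremum(np.zeros((3, 5, 5)), 1, 2, 2)
```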
Localization
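• 3D quadratic function is fit to the local sample points
• Start with Taylor expansion with sample point as the origin
$$D(X) = D + \frac{\partial D}{\partial X}^T X + \frac{1}{2}\, X^T \frac{\partial^2 D}{\partial X^2}\, X$$
– where $X = (x, y, \sigma)^T$
• Take the derivative with respect to X, and set it to 0, giving
$$0 = \frac{\partial D}{\partial X} + \frac{\partial^2 D}{\partial X^2}\, \hat{X}$$
• $\hat{X} = -\left(\dfrac{\partial^2 D}{\partial X^2}\right)^{-1} \dfrac{\partial D}{\partial X}$ is the location of the keypoint
• This is a 3x3 linear system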
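Localization
$$0 = \frac{\partial D}{\partial X} + \frac{\partial^2 D}{\partial X^2}\, \hat{X}$$
$$\begin{pmatrix}
\dfrac{\partial^2 D}{\partial \sigma^2} & \dfrac{\partial^2 D}{\partial \sigma\, \partial y} & \dfrac{\partial^2 D}{\partial \sigma\, \partial x} \\[2ex]
\dfrac{\partial^2 D}{\partial \sigma\, \partial y} & \dfrac{\partial^2 D}{\partial y^2} & \dfrac{\partial^2 D}{\partial y\, \partial x} \\[2ex]
\dfrac{\partial^2 D}{\partial \sigma\, \partial x} & \dfrac{\partial^2 D}{\partial y\, \partial x} & \dfrac{\partial^2 D}{\partial x^2}
\end{pmatrix}
\begin{pmatrix} \sigma \\ y \\ x \end{pmatrix}
= -\begin{pmatrix} \dfrac{\partial D}{\partial \sigma} \\[2ex] \dfrac{\partial D}{\partial y} \\[2ex] \dfrac{\partial D}{\partial x} \end{pmatrix}$$
• Hessian and derivative approximated by finite differences,
– example:
$$\frac{\partial D}{\partial \sigma} = \frac{D_{k+1}^{i,j} - D_{k-1}^{i,j}}{2}$$
$$\frac{\partial^2 D}{\partial \sigma^2} = \frac{D_{k-1}^{i,j} - 2 D_{k}^{i,j} + D_{k+1}^{i,j}}{1}$$
$$\frac{\partial^2 D}{\partial \sigma\, \partial y} = \frac{(D_{k+1}^{i+1,j} - D_{k-1}^{i+1,j}) - (D_{k+1}^{i-1,j} - D_{k-1}^{i-1,j})}{4}$$
• If the offset $\hat{X}$ is > 0.5 in any dimension, the extremum lies closer to a neighboring sample point and the process is repeated there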
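The 3x3 system can be assembled from the finite differences above and solved directly. This is a sketch under the assumption that the DoG stack is indexed as (scale k, row i, column j) and that the sample is an interior point; the function name `localize` and the quadratic test function are illustrative.

```python
import numpy as np

def localize(D, k, i, j):
    """Return (offset, value): offset = (d_sigma, d_y, d_x), value = D at the refined extremum."""
    c = D[k, i, j]
    # First derivatives by central differences.
    grad = 0.5 * np.array([
        D[k + 1, i, j] - D[k - 1, i, j],
        D[k, i + 1, j] - D[k, i - 1, j],
        D[k, i, j + 1] - D[k, i, j - 1],
    ])
    # Second derivatives.
    dss = D[k + 1, i, j] - 2 * c + D[k - 1, i, j]
    dyy = D[k, i + 1, j] - 2 * c + D[k, i - 1, j]
    dxx = D[k, i, j + 1] - 2 * c + D[k, i, j - 1]
    dsy = 0.25 * ((D[k + 1, i + 1, j] - D[k - 1, i + 1, j])
                  - (D[k + 1, i - 1, j] - D[k - 1, i - 1, j]))
    dsx = 0.25 * ((D[k + 1, i, j + 1] - D[k - 1, i, j + 1])
                  - (D[k + 1, i, j - 1] - D[k - 1, i, j - 1]))
    dyx = 0.25 * ((D[k, i + 1, j + 1] - D[k, i - 1, j + 1])
                  - (D[k, i + 1, j - 1] - D[k, i - 1, j - 1]))
    hessian = np.array([[dss, dsy, dsx],
                        [dsy, dyy, dyx],
                        [dsx, dyx, dxx]])
    offset = -np.linalg.solve(hessian, grad)
    value = c + 0.5 * grad @ offset   # D(X_hat) = D + 1/2 * grad^T X_hat
    return offset, value

# Quadratic test function (assumption) with its maximum of 1 at (1.2, 2.3, 1.9).
kk, ii, jj = np.meshgrid(np.arange(3), np.arange(5), np.arange(5), indexing="ij")
dog = (1.0 - (kk - 1.2) ** 2 - 2 * (ii - 2.3) ** 2 - 3 * (jj - 1.9) ** 2
       + 0.5 * (ii - 2.3) * (jj - 1.9))
offset, value = localize(dog, 1, 2, 2)
```

Because the test function is exactly quadratic, the finite differences are exact and the recovered offset from the sample (1, 2, 2) is (0.2, 0.3, -0.1).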
Filtering
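• Contrast (use prev. equation):
$$D(\hat{X}) = D + \frac{1}{2}\, \frac{\partial D}{\partial X}^T \hat{X}$$
– If $|D(\hat{X})| < 0.03$, throw it out
• Edge-iness:
– Use ratio of principal curvatures to throw out poorly defined peaks
– Curvatures come from Hessian:
$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$$
– Ratio of Trace(H)² and Determinant(H), with α and β the eigenvalues of H:
$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta$$
$$\mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta$$
$$\text{ratio} = \frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta}$$
– If ratio > (r+1)²/r, throw it out (SIFT uses r = 10)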
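Both filters reduce to a few arithmetic tests. In this sketch the function names are hypothetical, and rejecting points with a non-positive determinant (curvatures of opposite sign) is an added assumption that the slides do not state.

```python
def passes_contrast_test(value, threshold=0.03):
    """Keep the keypoint only if |D(X_hat)| is at least the threshold."""
    return abs(value) >= threshold

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy ** 2
    if det <= 0:
        # Assumption: curvatures of opposite sign are rejected outright.
        return False
    return trace ** 2 / det < (r + 1) ** 2 / r

blob = passes_edge_test(1.0, 1.0, 0.0)    # equal curvatures: ratio 4
edge = passes_edge_test(20.0, 1.0, 0.0)   # curvature ratio 20: ratio 22.05
```

For r = 10 the limit (r+1)²/r is 12.1, so a point with equal curvatures passes while one whose curvatures differ by a factor of 20 is discarded.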
Orientation assignment
• Descriptor computed relative to keypoint’s orientation achieves rotation invariance
• Gradient orientation is precomputed along with magnitude for all levels (useful in descriptor computation)
$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$
$$\theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)$$
• Multiple orientations assigned to keypoints from an orientation histogram
– Significantly improves stability of matching
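A sketch of the gradient magnitude and orientation computation with pixel differences, assuming the smoothed image L is stored as an array indexed [y, x] and leaving border pixels at zero; the function name and the ramp images are illustrative.

```python
import numpy as np

def gradient_mag_ori(L):
    """Return (m, theta) from pixel differences of the smoothed image L[y, x]."""
    L = np.asarray(L, dtype=float)
    dx = np.zeros_like(L)
    dy = np.zeros_like(L)
    dx[:, 1:-1] = L[:, 2:] - L[:, :-2]   # L(x+1, y) - L(x-1, y)
    dy[1:-1, :] = L[2:, :] - L[:-2, :]   # L(x, y+1) - L(x, y-1)
    m = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.arctan2(dy, dx)           # two-argument form keeps the quadrant
    return m, theta

# Ramp images (assumption): intensity grows along x, then along y.
ramp_x = np.tile(3.0 * np.arange(5), (5, 1))
ramp_y = np.tile(2.0 * np.arange(5)[:, None], (1, 5))
m_x, theta_x = gradient_mag_ori(ramp_x)
m_y, theta_y = gradient_mag_ori(ramp_y)
```

Since the row index grows downward in an image array, an orientation of π/2 here points toward increasing row index.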
Keypoint images
Keypoint Selection
Finding extrema
with DoG
Removing
|D(X)| < 0.03
Select canonical orientation
• Create histogram of local
gradient directions computed
at selected scale
• Assign canonical orientation
at peak of smoothed
histogram
• Each key specifies stable 2D
coordinates (x, y, scale,
orientation)
Descriptor
• Descriptor has 3 dimensions (x,y,θ)
• Orientation histogram of gradient magnitudes
• Position and orientation of each gradient sample rotated
relative to keypoint orientation
Descriptor
• Weight magnitude of each sample point by Gaussian
weighting function
• Distribute each sample to adjacent bins by trilinear
interpolation (avoids boundary effects)
Descriptor
• Best results achieved with 4x4x8 = 128
descriptor size
• Normalize to unit length
– Reduces effect of illumination change
• Cap each element to 0.2, normalize again
– Reduces non-linear illumination changes
– 0.2 determined experimentally
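The normalization steps can be sketched as follows, assuming a non-zero descriptor vector; the function name and the example vector are hypothetical.

```python
import numpy as np

def normalize_descriptor(v, cap=0.2):
    """Normalize to unit length, cap each element at `cap`, then normalize again."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)     # removes a multiplicative change in contrast
    v = np.minimum(v, cap)        # limits the influence of a few large gradients
    return v / np.linalg.norm(v)

# Example (assumption): 128 elements with one dominant gradient bin.
raw = np.ones(128)
raw[0] = 100.0
desc = normalize_descriptor(raw)
desc_scaled = normalize_descriptor(2.0 * raw)   # same patch at doubled contrast
```

Doubling every element leaves the result unchanged, and the dominant bin ends up with less weight than plain unit normalization would give it.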
Object detection
• Create a database
of keypoints from
training images
• Match keypoints to
a database
– Nearest neighbor
search
PCA-SIFT
• Different descriptor (same keypoints)
• Apply PCA to the gradient patch
• Descriptor size is 20 (instead of 128)
• More robust, faster
SIFT: Summary
• Scale space
• Difference-of-Gaussian
• Localization
• Filtering
• Orientation assignment
• Descriptor, 128 elements
Object recognition
• Definition: Identify an object and determine its
pose and model parameters
• Commercial object recognition
– $4 billion/year industry for inspection and assembly
– Almost entirely based on template matching
• Upcoming applications
– Mobile robots, toys, user interfaces
– Location recognition
– Digital camera panoramas, 3D scene modeling
Invariant local features
• Image content => local feature coordinates invariant to translation,
rotation, scale
• Technical details regarding
– Keypoint matching (using 1st and 2nd nearest neighbors)
– Efficient nearest neighbor indexing
– Clustering with Hough transform (model location, orientation, scale)
– Account for affine distortion
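Keypoint matching with the 1st and 2nd nearest neighbors is commonly implemented as a distance-ratio test. The sketch below uses a brute-force search and a ratio limit of 0.8; the limit, the function name, and the toy descriptors are assumptions and do not come from the slides.

```python
import numpy as np

def match_ratio_test(query, database, max_ratio=0.8):
    """Return (query_index, database_index) pairs that pass the ratio test."""
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    matches = []
    for qi, q in enumerate(query):
        dists = np.linalg.norm(database - q, axis=1)
        first, second = np.argsort(dists)[:2]
        # Accept only if the best match is clearly closer than the runner-up.
        if dists[first] < max_ratio * dists[second]:
            matches.append((qi, int(first)))
    return matches

# Toy descriptors (assumption): the second query is ambiguous between two entries.
database = [[0.0, 0.0], [10.0, 0.0], [10.0, 0.1]]
query = [[0.1, 0.0], [10.0, 0.05]]
matches = match_ratio_test(query, database)
```

The first query has one clear nearest neighbor and is matched; the second lies midway between two database descriptors and is rejected as ambiguous. The brute-force search stands in for the efficient nearest neighbor indexing listed above.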
Features
Advantages of invariant local features
• Locality: features are local, so robust to
occlusion and clutter (no prior segmentation)
• Distinctiveness: individual features can be
matched to a large database of objects
• Quantity: many features can be generated for
even small objects
• Efficiency: close to real-time performance
Experimental evaluation
Scale change (factor 2.5)
Harris-Laplace
DoG
Viewpoint change (60 degrees)
Harris-Affine (Harris-Laplace)
Descriptors - conclusion
• SIFT + steerable perform best
• Performance of the descriptor independent of the
detector
• Errors due to imprecision in region estimation,
localization
Image retrieval
[Figure: database of > 5000 images]
change in viewing angle
Matches
22 correct matches
Image retrieval
[Figure: database of > 5000 images]
change in viewing angle
+ scale change
Matches
33 correct matches
Multiple panoramas from an unordered image set
Image registration and blending
Location recognition
Robot localization
• Joint work with Stephen Se, Jim Little
Recognizing panoramas
• Matthew Brown and David Lowe
• Recognize overlap from an unordered set of images and
automatically stitch together
• SIFT features provide initial feature matching
• Image blending at multiple scales hides the seams
Panorama automatically assembled from 143 images
Map continuously built over time
Planar recognition
• Planar surfaces can be
reliably recognized at a
rotation of 60° away from
the camera
• Affine fit approximates
perspective projection
• Only 3 points are needed
for recognition
3D object recognition
• Extract
outlines with
background
subtraction
3D object recognition
• Only 3 keys are needed
for recognition, so extra
keys provide robustness
• Affine model is no longer
as accurate
Recognition under occlusion
Comparison to template matching
• Costs of template matching
– 250,000 locations x 30 orientations x 4 scales = 30,000,000
evaluations
– Does not easily handle partial occlusion and other variation
without large increase in template numbers
– Viola & Jones cascade must start again for each qualitatively
different template
• Costs of local feature approach
– 3000 evaluations (reduction by factor of 10,000)
– Features are more invariant to illumination, 3D rotation, and
object variation
– Use of many small subtemplates increases robustness to partial
occlusion and other variations
More local features
• Harris Laplace
• Harris Affine
• Hessian detector
• Hessian Laplace
• Hessian Affine
• MSER (Maximally Stable Extremal Regions)
• SURF (Speeded-Up Robust Features)