Transcript SIFT

2011.4.14
Reporter: Fei-Fei Chen
 Wide-baseline matching
 Object recognition
 Texture recognition
 Scene classification
 Robot wandering
 Motion tracking
 Change in illumination
 3D camera viewpoint
 etc.
[Figure: matching among > 5000 images under a change in viewing angle; 22 correct matches]
[Figure: matching among > 5000 images under a change in viewing angle + scale change]
 Find corresponding features across two or more views.
 Elements to be matched are image patches of fixed size
 Task: Find the best (most similar) patch in a second image.
 Intuition: This would be a GOOD patch for matching,
since it is very distinctive.
 Intuition: This would be a BAD patch for matching,
since it is not very distinctive.
 Intuitively, junctions of contours.
 Generally more stable features over change of viewpoint.
 Intuitively, large variations in the neighborhood of the point
in all directions.
 They are good features to match!
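The intuition above (a good feature shows large variation in every direction) can be sketched by comparing a patch with shifted copies of itself. This is only an illustration in the spirit of a Moravec-style corner measure, not part of SIFT; the function name `distinctiveness` and the patch size are assumptions made for the example.

```python
import numpy as np

def distinctiveness(image, y, x, half=2):
    """Smallest SSD between the patch at (y, x) and its 8 one-pixel shifts.

    Hypothetical helper for illustration. A value near zero means the patch
    looks the same after some shift (flat region or edge), so it is a poor
    feature. The patch and its shifts must lie inside the image.
    """
    patch = image[y - half:y + half + 1, x - half:x + half + 1]
    scores = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = image[y + dy - half:y + dy + half + 1,
                            x + dx - half:x + dx + half + 1]
            scores.append(float(np.sum((patch - shifted) ** 2)))
    return min(scores)

flat = np.zeros((20, 20))
edge = np.zeros((20, 20))
edge[:, 10:] = 1.0           # vertical step edge
corner = np.zeros((20, 20))
corner[10:, 10:] = 1.0       # one bright quadrant, corner at (10, 10)

print(distinctiveness(flat, 10, 10),
      distinctiveness(edge, 10, 10),
      distinctiveness(corner, 10, 10))
```

Only the corner keeps a non-zero score for every shift direction; the edge patch is unchanged when shifted along the edge.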
 Detection of Scale-Space Extrema
 Accurate keypoint localization
 Orientation assignment
 Keypoint descriptor
detector
descriptor
 For scale invariance, search for stable features across
all possible scales using a continuous function of scale,
scale space.
 SIFT uses DoG filter for scale space because it is
efficient and as stable as scale-normalized Laplacian of
Gaussian.
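Convolution with a variable-scale Gaussian:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
Difference-of-Gaussian (DoG) filter:
$$G(x, y, k\sigma) - G(x, y, \sigma)$$
Convolution with the DoG filter:
$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$
σ doubles for the next octave, with k = 2^(1/s) for s intervals per octave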
Dividing the scale space into octaves is for efficiency only.
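A minimal sketch of one octave of the DoG scale space described above, using only NumPy. The base scale `sigma0 = 1.6` and the choice of `s + 3` Gaussian images per octave are assumptions that do not appear on the slides; the helper names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def octave_sigmas(sigma0=1.6, s=3):
    """Scales sigma0 * k**i with k = 2**(1/s); sigma doubles after s steps."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]

def dog_stack(image, sigma0=1.6, s=3):
    """Differences of adjacent Gaussian images, shape (s + 2, H, W)."""
    gaussians = [gaussian_blur(image, sig) for sig in octave_sigmas(sigma0, s)]
    return np.stack([b - a for a, b in zip(gaussians[:-1], gaussians[1:])])
```

Subtracting adjacent Gaussian images is what makes DoG cheap: the smoothed images are needed anyway, so each DoG level costs one image subtraction.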
X is selected if it is larger or smaller than all 26 neighbors
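The 26-neighbor test can be sketched as follows, assuming the DoG images of one octave are stacked in an array indexed as `dog[scale, y, x]` (the layout and the function name are assumptions for illustration).

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is larger or smaller than all 26 neighbors.

    The neighbors are the 8 pixels around it in the same DoG image plus
    the 9 pixels in each of the two adjacent scales.
    """
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[s, y, x]
    neighbors = np.delete(cube.ravel(), 13)  # index 13 is the cube center
    return bool(center > neighbors.max() or center < neighbors.min())
```

The point must have a full 3x3x3 neighborhood, so the first and last DoG images of an octave and the image border are not tested.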
 Reject (1) points with low contrast (flat)
(2) poorly localized along an edge (edge)
 Fit a 3D quadratic function for sub-pixel maxima
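Example in 1D with samples f(-1) = 1, f(0) = 6, f(+1) = 5:
$$f(x) = 6 + 2x + \frac{-6}{2}x^2 = 6 + 2x - 3x^2$$
$$f'(x) = 2 - 6x = 0 \;\Rightarrow\; \hat{x} = \frac{1}{3}$$
$$f(\hat{x}) = 6 + 2\cdot\frac{1}{3} - 3\left(\frac{1}{3}\right)^2 = 6\tfrac{1}{3}$$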
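The 1D example above can be checked numerically. This sketch estimates the first and second derivatives at the middle sample by central differences, which reproduces the coefficients 2 and -6 for the samples 1, 6, 5 at x = -1, 0, +1; the function name is hypothetical.

```python
def quadratic_peak(f_minus, f_zero, f_plus):
    """Fit a parabola through samples at x = -1, 0, +1 and locate its extremum.

    Returns (offset, value): the sub-sample position of the extremum
    relative to the middle sample, and the interpolated value there.
    """
    d1 = (f_plus - f_minus) / 2.0          # first derivative at x = 0
    d2 = f_plus - 2.0 * f_zero + f_minus   # second derivative at x = 0
    offset = -d1 / d2
    value = f_zero + 0.5 * d1 * offset
    return offset, value

offset, value = quadratic_peak(1.0, 6.0, 5.0)
print(offset, value)  # offset is 1/3 and value is 6 1/3, up to rounding
```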
 Taylor series of several variables
 Two variables
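$$f(x, y) \approx f(0, 0) + \left(\frac{\partial f}{\partial x}x + \frac{\partial f}{\partial y}y\right) + \frac{1}{2}\left(\frac{\partial^2 f}{\partial x\,\partial x}x^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}xy + \frac{\partial^2 f}{\partial y\,\partial y}y^2\right)$$
The same expansion in matrix form:
$$f\!\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) \approx f\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}\right) + \begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \frac{1}{2}\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} \dfrac{\partial^2 f}{\partial x\,\partial x} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\ \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y\,\partial y} \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$
$$f(\mathbf{x}) \approx f(\mathbf{0}) + \frac{\partial f}{\partial \mathbf{x}}^{T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{T}\frac{\partial^2 f}{\partial \mathbf{x}^2}\mathbf{x}$$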
 Taylor expansion in a matrix form, x is a vector, f maps
x to a scalar
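gradient:
$$\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}$$
Hessian matrix (often symmetric):
$$\frac{\partial^2 f}{\partial \mathbf{x}^2} = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
Expansion about the sample point, with f and its derivatives evaluated there:
$$f(\mathbf{x}) \approx f + \frac{\partial f}{\partial \mathbf{x}}^{T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{T}\frac{\partial^2 f}{\partial \mathbf{x}^2}\mathbf{x}$$
Setting the derivative with respect to x to zero gives the offset of the extremum and the value there:
$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 f}{\partial \mathbf{x}^2}\right)^{-1}\frac{\partial f}{\partial \mathbf{x}}$$
$$f(\hat{\mathbf{x}}) = f + \frac{1}{2}\frac{\partial f}{\partial \mathbf{x}}^{T}\hat{\mathbf{x}}$$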
 x is a 3-vector
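 If the offset is larger than 0.5 in any dimension, the extremum lies closer to a neighboring sample point: move to that sample point and refit
 Throw out low-contrast points, |f(x̂)| < 0.03 (pixel values scaled to [0, 1])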
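A sketch of the 3D quadratic fit, assuming a 3x3x3 block of DoG values indexed `[scale, y, x]` and centered on the detected sample. The gradient and Hessian are estimated with finite differences, which is one common choice and an assumption here; the function name is hypothetical.

```python
import numpy as np

def refine_extremum(cube):
    """Return (offset, value) for a 3x3x3 DoG block indexed [scale, y, x].

    offset is the sub-sample shift in (x, y, scale) order; value is the
    interpolated DoG response at the refined position.
    """
    c = cube[1, 1, 1]
    # Gradient by central differences, ordered (x, y, scale).
    g = 0.5 * np.array([cube[1, 1, 2] - cube[1, 1, 0],
                        cube[1, 2, 1] - cube[1, 0, 1],
                        cube[2, 1, 1] - cube[0, 1, 1]])
    # Second derivatives by finite differences.
    dxx = cube[1, 1, 2] - 2 * c + cube[1, 1, 0]
    dyy = cube[1, 2, 1] - 2 * c + cube[1, 0, 1]
    dss = cube[2, 1, 1] - 2 * c + cube[0, 1, 1]
    dxy = 0.25 * (cube[1, 2, 2] - cube[1, 2, 0] - cube[1, 0, 2] + cube[1, 0, 0])
    dxs = 0.25 * (cube[2, 1, 2] - cube[2, 1, 0] - cube[0, 1, 2] + cube[0, 1, 0])
    dys = 0.25 * (cube[2, 2, 1] - cube[2, 0, 1] - cube[0, 2, 1] + cube[0, 0, 1])
    hessian = np.array([[dxx, dxy, dxs],
                        [dxy, dyy, dys],
                        [dxs, dys, dss]])
    offset = -np.linalg.solve(hessian, g)   # x_hat = -H^{-1} grad
    value = c + 0.5 * g @ offset            # f(x_hat) = f + 0.5 grad^T x_hat
    return offset, value

def has_enough_contrast(value, threshold=0.03):
    """Contrast test from the slide; assumes pixel values in [0, 1]."""
    return abs(value) >= threshold
```

`np.linalg.solve` raises `LinAlgError` for a singular Hessian, so a caller would normally treat that case as an unstable point.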
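Hessian matrix at keypoint location (second derivatives of the DoG response in x and y):
$$H = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{bmatrix}$$
Let α and β be the eigenvalues of H, with α = rβ:
$$\mathrm{Tr}(H) = f_{xx} + f_{yy} = \alpha + \beta, \qquad \mathrm{Det}(H) = f_{xx}f_{yy} - f_{xy}^2 = \alpha\beta$$
Keep the points with
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r+1)^2}{r}, \qquad r = 10$$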
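The edge test needs only the trace and determinant of the 2x2 Hessian, so no eigenvalues are computed. A minimal sketch, with a hypothetical function name and the second derivatives passed in directly:

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r.

    A non-positive determinant means the principal curvatures have
    different signs (or one is zero), so the point is rejected.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r

print(passes_edge_test(-1.0, -1.0, 0.0))   # blob-like point: kept
print(passes_edge_test(-20.0, -1.0, 0.0))  # edge-like point: rejected
```

With r = 10 the threshold is 12.1, so a point is rejected once one principal curvature is more than about ten times the other.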
 By assigning a consistent orientation, the keypoint
descriptor can be orientation invariant.
 For a keypoint, L is the Gaussian-smoothed image with
the closest scale,
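The gradient (Lx, Ly) is computed from pixel differences, giving magnitude m and orientation θ:
$$m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^2 + \left(L(x, y+1) - L(x, y-1)\right)^2}$$
$$\theta(x, y) = \tan^{-1}\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$$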
orientation histogram (36 bins)
σ=1.5*scale of the keypoint
Accurate peak position is determined by fitting a parabola
to the three histogram values closest to the peak
36-bin orientation histogram over 360°,
weighted by m and 1.5*scale falloff
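The highest peak gives the keypoint orientation
Any other local peak within 80% of the highest peak creates an additional keypoint with that orientation
About 15% of keypoints have multiple orientations, and they
contribute significantly to matching stability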
[Figure: orientation histogram, horizontal axis from 0 to 2π]
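The orientation assignment above can be sketched as follows. Several details are assumptions made for the example and are not stated on the slides: the window radius of 3σ, bins centered on multiples of 10°, and all function names. `L` is assumed to be a float array holding the Gaussian-smoothed image.

```python
import numpy as np

def gradient_mag_ori(L):
    """Pixel-difference gradient magnitude and orientation in [0, 2*pi)."""
    dx = np.zeros_like(L)
    dy = np.zeros_like(L)
    dx[:, 1:-1] = L[:, 2:] - L[:, :-2]
    dy[1:-1, :] = L[2:, :] - L[:-2, :]
    return np.hypot(dx, dy), np.arctan2(dy, dx) % (2 * np.pi)

def orientation_histogram(L, y, x, scale, bins=36):
    """36-bin histogram weighted by magnitude and a Gaussian of 1.5 * scale."""
    m, theta = gradient_mag_ori(L)
    sigma = 1.5 * scale
    radius = int(round(3 * sigma))  # assumed window size
    hist = np.zeros(bins)
    for j in range(max(1, y - radius), min(L.shape[0] - 1, y + radius + 1)):
        for i in range(max(1, x - radius), min(L.shape[1] - 1, x + radius + 1)):
            w = np.exp(-((i - x) ** 2 + (j - y) ** 2) / (2 * sigma ** 2))
            b = int(np.round(theta[j, i] / (2 * np.pi) * bins)) % bins
            hist[b] += w * m[j, i]
    return hist

def dominant_orientations(hist, ratio=0.8):
    """Angles (radians) of local peaks within `ratio` of the highest peak."""
    bins = len(hist)
    peaks = []
    for b in range(bins):
        left, right = hist[(b - 1) % bins], hist[(b + 1) % bins]
        if hist[b] > left and hist[b] > right and hist[b] >= ratio * hist.max():
            # Parabola through the three bins around the peak.
            shift = 0.5 * (left - right) / (left - 2 * hist[b] + right)
            peaks.append((b + shift) % bins * (2 * np.pi / bins))
    return peaks
```

For an image whose intensity increases only along y, every gradient points at 90°, so a single orientation of π/2 is returned.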
• Thresholded image gradients are sampled over 16x16 array of locations in
scale space
• Create array of orientation histograms (w.r.t. key orientation)
• 8 orientations x 4x4 histogram array = 128 dimensions
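• The vector is normalized to unit length to cancel changes in image contrast; values larger than 0.2 are then clipped and the vector is renormalized
• Gradient magnitudes are weighted by a Gaussian window with σ=0.5*width of the descriptor window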
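The normalize, clip, renormalize step can be sketched on a 128-dimensional vector as follows (the function name is hypothetical):

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    """Normalize to unit length, clip large entries, then renormalize."""
    v = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    v = v / norm               # removes a global contrast scale factor
    v = np.minimum(v, clip)    # limits the influence of very large gradients
    return v / np.linalg.norm(v)

raw = np.ones(128)
raw[0] = 100.0                 # one dominant gradient bin
a = normalize_descriptor(raw)
b = normalize_descriptor(2.0 * raw)  # same patch with doubled contrast
print(np.allclose(a, b))
```

Scaling all gradients by a constant leaves the result unchanged, which is the sense in which the descriptor is invariant to a contrast change.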
 Detection of scale-space extrema: for scale invariance
 Accurate keypoint localization: remove unstable feature points
 Orientation assignment: for rotation invariance
 Keypoint descriptor: for illumination invariance
 Image scale invariance.
 Image rotation invariance.
 Robust matching across a substantial range of
(1) affine distortion,
(2) change in 3D viewpoint,
(3) addition of noise,
(4) change in illumination.
 For a feature x, find the closest feature x1 and the
second-closest feature x2. If the distance ratio
d(x, x1) / d(x, x2) is smaller than 0.8, the match is
accepted.
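A brute-force sketch of the distance-ratio test, assuming descriptors are rows of NumPy arrays and distances are Euclidean; the function name is hypothetical and the second set needs at least two descriptors.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_b[j] is a confident match for desc_a[i]."""
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dist)
        nearest, second = order[0], order[1]
        # Accept only if the best match is clearly better than the runner-up.
        if dist[nearest] < ratio * dist[second]:
            matches.append((i, int(nearest)))
    return matches

desc_b = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 1.0]])
desc_a = np.array([[0.1, 0.0],    # clearly closest to desc_b[0]
                   [10.0, 0.5]])  # equally close to desc_b[1] and desc_b[2]
print(ratio_test_matches(desc_a, desc_b))
```

The second query is rejected because its two nearest neighbors are equally far away, which is exactly the ambiguous case the ratio test is meant to filter out.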
Thanks for your attention!
Q&A