
Scale-Invariant Feature
Transform (SIFT)
Jinxiang Chai
Review
Image Processing
- Median filtering
- Bilateral filtering
- Edge detection
- Corner detection
Review: Corner Detection
1. Compute the image gradients
2. Construct the matrix C(i, j) from the gradients over the pixel and its neighborhood:
$C(i, j) = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}$
3. Determine its 2 eigenvalues λ(i, j) = [λ1, λ2]
4. If both λ1 and λ2 are big, we have a corner (a code sketch follows below)
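As a concrete illustration, here is a minimal sketch of these four steps in Python (assuming NumPy and SciPy are available; the function name and window size are illustrative, not from the slides):

```python
import numpy as np
from scipy import ndimage

def corner_response(image, window=2.0):
    """Return the smaller structure-tensor eigenvalue at each pixel.

    A corner is declared where both eigenvalues are large, so
    thresholding min(lambda1, lambda2) implements step 4 above.
    """
    image = np.asarray(image, dtype=float)
    # Step 1: image gradients
    Ix = ndimage.sobel(image, axis=1)
    Iy = ndimage.sobel(image, axis=0)

    # Step 2: accumulate the matrix C over a neighborhood
    # (Gaussian weighting plays the role of the local sum)
    Ixx = ndimage.gaussian_filter(Ix * Ix, window)
    Ixy = ndimage.gaussian_filter(Ix * Iy, window)
    Iyy = ndimage.gaussian_filter(Iy * Iy, window)

    # Step 3: closed-form eigenvalues of [[Ixx, Ixy], [Ixy, Iyy]]
    trace = Ixx + Iyy
    disc = np.sqrt(((Ixx - Iyy) / 2.0) ** 2 + Ixy ** 2)
    lam1 = trace / 2.0 + disc
    lam2 = trace / 2.0 - disc
    return np.minimum(lam1, lam2)
```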
The Orientation Field
Corners are detected where both λ1 and λ2 are big
Good Image Features
• What are we looking for?
– Strong features
– Invariant to changes (affine and
perspective/occlusion)
– Solve the problem of correspondence
• Locate an object in multiple images (i.e. in video)
• Track the path of the object, infer 3D structure, and recover object and camera motion
Scale Invariant Feature Transform
(SIFT)
• Choosing features that are invariant to image
scaling and rotation
• Also, partially invariant to changes in illumination
and 3D camera viewpoint
Invariance
• Illumination
• Scale
• Rotation
• Affine
Required Readings
• Object recognition from local scale-invariant features [pdf link], ICCV 1999
• David G. Lowe, "Distinctive image
features from scale-invariant
keypoints," International Journal of
Computer Vision, 60, 2 (2004), pp. 91-110
Motivation for SIFT
• Earlier Methods
– Harris corner detector
• Sensitive to changes in image scale
• Finds locations in image with large gradients in two
directions
– No method was fully affine invariant
• Although the SIFT approach is not fully invariant, it
allows for considerable affine change
• SIFT also allows for changes in 3D viewpoint
SIFT Algorithm Overview
1. Scale-space extrema detection
2. Keypoint localization
3. Orientation assignment
4. Generation of keypoint descriptors
Scale Space
• Different scales are appropriate for
describing different objects in the image,
and we may not know the correct
scale/size ahead of time.
Scale space (Cont.)
• Looking for features (locations) that are
stable (invariant) across all possible scale
changes
– use a continuous function of scale (scale space)
• Which scale-space kernel will we use?
– The Gaussian Function
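For reference (this is the standard definition, matching Lowe's paper), the 2D Gaussian kernel is:

$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-(x^2 + y^2)/(2\sigma^2)}$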
Scale-Space of Image
• $L(x, y, k\sigma) = G(x, y, k\sigma) * I(x, y)$
$G(x, y, k\sigma)$ - variable-scale Gaussian
$I(x, y)$ - input image
• To detect stable keypoint locations, find the
scale-space extrema of the difference-of-Gaussian function:
$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y)$
Look familiar?
- a bandpass filter!
Difference of Gaussian
1. A = Convolve image with vertical and
horizontal 1D Gaussians, σ = sqrt(2)
2. B = Convolve A with vertical and
horizontal 1D Gaussians, σ = sqrt(2)
3. DOG (Difference of Gaussian) = A - B
4. To deal with different scales, downsample B with
bilinear interpolation at a pixel spacing of 1.5 (a linear
combination of 4 adjacent pixels), and repeat (see the sketch below)
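A minimal sketch of these four steps in Python (assuming NumPy and SciPy; the function name and grid construction are illustrative):

```python
import numpy as np
from scipy import ndimage

def dog_level(image, sigma=np.sqrt(2)):
    """One DoG level: blur twice with Gaussians, subtract, then
    resample B at a pixel spacing of 1.5 for the next level."""
    image = np.asarray(image, dtype=float)
    A = ndimage.gaussian_filter(image, sigma)   # step 1
    B = ndimage.gaussian_filter(A, sigma)       # step 2
    dog = A - B                                 # step 3

    # Step 4: downsample B with bilinear interpolation (order=1),
    # sampling on a grid with spacing 1.5
    h, w = B.shape
    ys = np.arange(0, h - 1, 1.5)
    xs = np.arange(0, w - 1, 1.5)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    next_input = ndimage.map_coordinates(B, [grid_y, grid_x], order=1)
    return dog, next_input
```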
Difference of Gaussian Pyramid
[Figure: DOG pyramid - at each level the input image is blurred to give A, A is blurred again to give B, DOG = A - B, and B is downsampled to form the next level's input (A1/B1/DOG1 through A3/B3/DOG3).]
Other issues
• Initial smoothing ignores the highest spatial
frequencies of images
- expand the input image by a factor of 2, using bilinear
interpolation, prior to building the pyramid
• How to do downsampling with bilinear
interpolation?
Bilinear Filter
• Weighted sum of the four neighboring pixels
• Sampling at S(x, y), where (i, j), (i, j+1), (i+1, j), (i+1, j+1) are the four surrounding grid points and a, b are the interpolation weights toward column j and row i (one minus the fractional offsets):
S(x,y) = a*b*S(i,j) + a*(1-b)*S(i+1,j) + (1-a)*b*S(i,j+1) + (1-a)*(1-b)*S(i+1,j+1)
• To optimize the above, interpolate along one axis and then the other:
Si = S(i,j) + (1-a)*(S(i,j+1) - S(i,j))
Sj = S(i+1,j) + (1-a)*(S(i+1,j+1) - S(i+1,j))
S(x,y) = Si + (1-b)*(Sj - Si)
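A direct transcription of the two formulas into Python (purely illustrative; the coordinate convention follows the notation above, and the sample point is assumed to lie inside the grid):

```python
def bilinear_sample(S, x, y):
    """Sample image S at real-valued coordinates (x, y), where x
    indexes rows (i) and y indexes columns (j)."""
    i, j = int(x), int(y)
    a = 1.0 - (y - j)   # weight toward column j
    b = 1.0 - (x - i)   # weight toward row i

    # Factored form: interpolate along j within each row, then along i
    Si = S[i, j]     + (1 - a) * (S[i, j + 1]     - S[i, j])
    Sj = S[i + 1, j] + (1 - a) * (S[i + 1, j + 1] - S[i + 1, j])
    return Si + (1 - b) * (Sj - Si)
```

Expanding the factored form recovers exactly the four-term weighted sum, but it uses three multiplications instead of eight.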
Pyramid Example
[Figure: pyramid example showing images A1-A3 and B1-B3 with the corresponding difference images DOG1-DOG3.]
Feature Detection
• Find the maxima and minima of the DOG scale space
• For each point on a DOG level:
– Compare to 8 neighbors at same level
– If max/min, identify corresponding point at pyramid
level below
– Determine if the corresponding point is max/min of its 8
neighbors
– If so, repeat at pyramid level above
• Repeat for each DOG level
• Those that remain are key points
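A sketch of this cascaded max/min test in Python (illustrative; for simplicity it assumes the three DOG levels are sampled at the same resolution, whereas the pyramid requires mapping coordinates between levels):

```python
import numpy as np

def is_extremum(dog_prev, dog_cur, dog_next, i, j):
    """Cascaded test for pixel (i, j) of dog_cur, which must be an
    interior pixel (1 <= i, j <= size - 2)."""
    v = dog_cur[i, j]
    patch = dog_cur[i - 1:i + 2, j - 1:j + 2]
    # Maximum test: beat the 8 neighbors, then 9 below, then 9 above
    if v == patch.max():
        if v > dog_prev[i - 1:i + 2, j - 1:j + 2].max():
            if v > dog_next[i - 1:i + 2, j - 1:j + 2].max():
                return True
    # Symmetric minimum test
    if v == patch.min():
        if v < dog_prev[i - 1:i + 2, j - 1:j + 2].min():
            if v < dog_next[i - 1:i + 2, j - 1:j + 2].min():
                return True
    return False
```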
Identifying Max/Min
[Figure: a candidate pixel on DOG level L compared against its neighbors on levels L+1, L, and L-1.]
Refining Key List: Illumination
• For all levels, use the “A” smoothed image
to compute
– Gradient Magnitude
• Threshold gradient magnitudes:
– Remove all key points whose magnitude M(i,j) is less than 0.1
times the max gradient value (see the snippet below)
• Motivation: low contrast is generally less
reliable than high contrast for feature points
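In code, this threshold is a one-liner (the names `magnitude` and `keypoints` are illustrative):

```python
def threshold_keypoints(keypoints, magnitude):
    """Drop keypoints whose gradient magnitude is below 10% of the max."""
    t = 0.1 * magnitude.max()
    return [(i, j) for (i, j) in keypoints if magnitude[i, j] >= t]
```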
Results: Eliminating Features
• Removing features in low-contrast regions
Assigning Canonical Orientation
• For each remaining key point:
– Choose the surrounding N x N window at the DOG
level where it was detected
Assigning Canonical Orientation
• For all levels, use the “A” smoothed image
to compute
– Gradient Orientation
[Figure: Gaussian Smoothed Image → Gradient Orientation and Gradient Magnitude maps.]
Assigning Canonical Orientation
• Gradient magnitude weighted by 2D
Gaussian with σ of 3 times that of the
current smoothing scale
[Figure: Weighted Magnitude = Gradient Magnitude * 2D Gaussian.]
Assigning Canonical Orientation
• Accumulate the weighted magnitudes in a histogram
based on gradient orientation
• The histogram has 36 bins with 10° increments
[Figure: weighted magnitude and gradient orientation maps feeding a 36-bin orientation histogram.]
Assigning Canonical Orientation
• Identify the peak of the histogram and assign its
orientation and sum of magnitude to the key point
(see the sketch below)
[Figure: orientation histogram with its peak marked.]
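A sketch of the histogram accumulation and peak selection in Python (array names are illustrative; inputs are the per-pixel gradient magnitude, gradient orientation, and 2D Gaussian weight over the keypoint's window):

```python
import numpy as np

def canonical_orientation(mag, ori, weight):
    """36-bin (10 degree) orientation histogram of Gaussian-weighted
    magnitudes; returns the peak orientation and its summed magnitude.

    mag, ori, weight: same-shape arrays (orientation in degrees).
    """
    hist = np.zeros(36)
    bins = (ori // 10).astype(int) % 36
    np.add.at(hist, bins.ravel(), (mag * weight).ravel())
    peak = int(hist.argmax())
    return peak * 10 + 5, hist[peak]   # bin center, summed magnitude
```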
Eliminating edges
• Difference-of-Gaussian function will be strong
along edges
– So how can we get rid of these edges?
Eliminating edges
• Difference-of-Gaussian function will be strong
along edges
– Similar to the Harris corner detector, form the 2x2 Hessian of the DOG image:
$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
(compare the Harris matrix $H(i,j) = \sum \begin{bmatrix} I_x \\ I_y \end{bmatrix} \begin{bmatrix} I_x & I_y \end{bmatrix} = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}$)
– We are not concerned about the actual values of the
eigenvalues, just the ratio of the two. Writing the eigenvalues as $\alpha = r\beta$ and $\beta$:
$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r+1)^2}{r}$
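A sketch of the resulting edge test in Python, with the Hessian entries taken as finite differences of the DOG image D. The threshold r = 10 is the value suggested in Lowe's 2004 paper, not stated on the slide:

```python
def passes_edge_test(D, i, j, r=10.0):
    """Reject keypoints on edges via the trace/determinant ratio.

    D is the DOG image; r bounds the ratio of principal curvatures.
    Requires (i, j) to be an interior pixel.
    """
    # Hessian entries from finite differences of D
    Dxx = D[i, j + 1] + D[i, j - 1] - 2 * D[i, j]
    Dyy = D[i + 1, j] + D[i - 1, j] - 2 * D[i, j]
    Dxy = (D[i + 1, j + 1] - D[i + 1, j - 1]
           - D[i - 1, j + 1] + D[i - 1, j - 1]) / 4.0
    tr = Dxx + Dyy
    det = Dxx * Dyy - Dxy * Dxy
    if det <= 0:                     # curvatures of opposite sign
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```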
Local Image Description
• SIFT keys each assigned:
– Location
– Scale (analogous to the level at which it was detected)
– Orientation (assigned in previous canonical
orientation steps)
• Now: Describe local image region invariant
to the above transformations
SIFT: Local Image Description
• Needs to be invariant to changes in
location, scale and rotation
SIFT Key Example
Local Image Description
For each key point:
1. Identify the 8x8 neighborhood (from the
DOG level at which it was detected)
2. Align the orientation to the x-axis
Local Image Description
3. Calculate gradient magnitude and
orientation map
4. Weight by Gaussian
Local Image Description
5. Calculate histogram of each 4x4 region.
8 bins for gradient orientation. Tally
weighted gradient magnitude.
Local Image Description
6. This histogram array is the image
descriptor. (The example here is a vector of
length 8*4 = 32: 8 bins for each of the four 4x4
regions. Lowe's suggestion: a 128-vector from a
16x16 neighborhood, i.e., 16 regions x 8 bins;
see the sketch below.)
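A sketch of steps 5-6 in Python (names are illustrative; the final normalization follows Lowe's paper rather than the slide):

```python
import numpy as np

def sift_descriptor(mag, ori, n_regions=2, region=4, n_bins=8):
    """Histogram-array descriptor from an 8x8 window
    (2x2 regions of 4x4 pixels, 8 orientation bins -> 32 values).
    For a 16x16 window use n_regions=4 to get the usual 128 values.

    mag and ori: Gaussian-weighted gradient magnitude and
    rotation-normalized orientation (degrees) over the window.
    """
    desc = []
    for ry in range(n_regions):
        for rx in range(n_regions):
            sl = (slice(ry * region, (ry + 1) * region),
                  slice(rx * region, (rx + 1) * region))
            hist = np.zeros(n_bins)
            bins = (ori[sl] // (360 // n_bins)).astype(int) % n_bins
            np.add.at(hist, bins.ravel(), mag[sl].ravel())
            desc.append(hist)
    v = np.concatenate(desc)
    return v / max(np.linalg.norm(v), 1e-12)  # illumination invariance
```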
Applications: Image Matching
• Find all key points identified in source and target
image
– Each key point will have a 2D location, scale and
orientation, as well as an invariant descriptor vector
• For each key point in the source image, search for
corresponding SIFT features in the target image.
• Find the transformation between two images
using epipolar geometry constraints or affine
transformation.
Image matching via SIFT features
Feature detection
Image matching via SIFT features
• Image matching via nearest neighbor search
- if the ratio of the closest distance to the 2nd closest distance is greater
than 0.8, reject the match as a false match (see the sketch below)
• Remove outliers using epipolar line constraints.
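A sketch of nearest-neighbor matching with this distance-ratio test in Python (descriptor arrays and names are illustrative):

```python
import numpy as np

def match_keypoints(desc_src, desc_tgt, ratio=0.8):
    """Match each source descriptor to its nearest target descriptor,
    keeping it only if the best match is clearly better than the
    second best (the ratio test above)."""
    matches = []
    for i, d in enumerate(desc_src):
        dists = np.linalg.norm(desc_tgt - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```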
Image matching via SIFT features
Summary
• SIFT features are reasonably invariant to
rotation, scaling, and illumination changes.
• We can use them for image matching and
object recognition among other things.
• Efficient on-line matching and recognition
can be performed in real time.