Scale Invariant Feature Transform


Object Recognition from Local
Scale-Invariant Features
David G. Lowe
Presented by Ashley L. Kapron
Introduction
• Object Recognition
– Recognize known objects in unknown
configurations
Previous Work
• Zhang et al.
– Harris Corner Detection
– Detect peaks in local image
variation
• Schmid and Mohr
– Harris Corner Detection
– Local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements
Motivation
• Limitations of previous work:
– Examine image only on a single scale
• Current paper addresses this concern by
identifying stable key locations in scale
space
• Identify features that are invariant
Invariance
• Illumination
• Scale
• Rotation
• Affine
Scale Space
• Different scales are appropriate for
describing different objects in the image,
and we may not know the correct
scale/size ahead of time.
Difference of Gaussian
1. A = Convolve image with vertical and
horizontal 1D Gaussians, σ=sqrt(2)
2. B = Convolve A with vertical and
horizontal 1D Gaussians, σ=sqrt(2)
3. DOG (Difference of Gaussian) = A – B
4. Downsample B with bilinear interpolation
with pixel spacing of 1.5 (linear
combination of 4 adjacent pixels)
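The four steps above can be sketched in Python. This is a minimal numpy sketch, not Lowe's implementation: the kernel truncation radius (about 3σ) and the zero-padded boundary handling are assumptions.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1D Gaussian kernel, truncated at about 3 sigma (an assumption)
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    # Separable smoothing: horizontal then vertical 1D convolution
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(np.convolve, 1, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 0, out, k, mode='same')

def downsample(img, spacing=1.5):
    # Bilinear resampling at 1.5-pixel spacing: each output pixel is a
    # linear combination of 4 adjacent input pixels
    h, w = img.shape
    ys = np.arange(0, h - 1, spacing)
    xs = np.arange(0, w - 1, spacing)
    y0, x0 = ys.astype(int), xs.astype(int)
    fy, fx = (ys - y0)[:, None], (xs - x0)[None, :]
    tl, tr = img[np.ix_(y0, x0)], img[np.ix_(y0, x0 + 1)]
    bl, br = img[np.ix_(y0 + 1, x0)], img[np.ix_(y0 + 1, x0 + 1)]
    return (1 - fy) * ((1 - fx) * tl + fx * tr) + fy * ((1 - fx) * bl + fx * br)

def dog_level(img, sigma=np.sqrt(2)):
    a = blur(img, sigma)      # step 1
    b = blur(a, sigma)        # step 2
    dog = a - b               # step 3: Difference of Gaussian
    return a, b, dog, downsample(b)  # step 4: next level's input
```

The returned downsampled B becomes the input image for the next pyramid level.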
Difference of Gaussian Pyramid
[Diagram: three-level DoG pyramid. At each level, the input image is blurred to give A, A is blurred again to give B, DOG = A − B, and B is downsampled to form the next level's input.]
Pyramid Example
[Figure: pyramid levels A1–A3 and B1–B3 with the corresponding DOG1–DOG3 images.]
Feature detection
• Find maxima and minima of scale space
• For each point on a DOG level:
– Compare to 8 neighbors at same level
– If max/min, identify corresponding point at pyramid
level below
– Determine if the corresponding point is max/min of its 8
neighbors
– If so, repeat at pyramid level above
• Repeat for each DOG level
• Those that remain are key points
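The in-plane part of this test (compare each pixel to its 8 neighbors on the same DOG level) can be sketched as follows; the cross-level check on the corresponding pixel below and above would follow the same pattern.

```python
import numpy as np

def local_extrema(dog):
    # Return (row, col) of pixels that are a strict max or min of
    # their 8 in-plane neighbors on one DOG level. Border pixels
    # are skipped since they lack a full neighborhood.
    h, w = dog.shape
    keys = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = dog[y - 1:y + 2, x - 1:x + 2]
            c = dog[y, x]
            nb = np.delete(patch.flatten(), 4)  # the 8 neighbors
            if c > nb.max() or c < nb.min():
                keys.append((y, x))
    return keys
```

Surviving candidates would then be checked against the corresponding point on the pyramid level below, and, if they pass, the level above.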
Identifying Max/Min
[Diagram: a candidate pixel on DOG level L compared against neighboring pixels on levels L−1, L, and L+1.]
Refining Key List: Illumination
• For all levels, use the “A” smoothed image
to compute
– Gradient Magnitude
• Threshold gradient magnitudes:
– Remove all key points whose gradient magnitude M_ij is less than 0.1 times the max gradient value
• Motivation: Low contrast is generally less
reliable than high for feature points
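This filter is a simple threshold. A sketch, assuming numpy's central-difference gradient stands in for whatever gradient scheme the paper uses:

```python
import numpy as np

def filter_low_contrast(keys, a_img, frac=0.1):
    # Gradient magnitude from the "A" smoothed image (central
    # differences via np.gradient; the exact scheme is an assumption)
    gy, gx = np.gradient(a_img)
    mag = np.hypot(gx, gy)
    thresh = frac * mag.max()
    # Keep only key points at pixels with sufficient contrast
    return [(y, x) for (y, x) in keys if mag[y, x] >= thresh]
```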
Assigning Canonical Orientation
• For each remaining key point:
– Choose surrounding N x N window at DOG
level it was detected
Assigning Canonical Orientation
• For all levels, use the “A” smoothed image
to compute
– Gradient Orientation
[Figure: the Gaussian smoothed image yields gradient orientation and gradient magnitude maps.]
Assigning Canonical Orientation
• Gradient magnitude weighted by 2D
gaussian
[Figure: gradient magnitude × 2D Gaussian = weighted magnitude.]
Assigning Canonical Orientation
• Accumulate the weighted magnitudes in a histogram based on gradient orientation
• Histogram has 36 bins with 10° increments
[Plot: sum of weighted magnitudes vs. gradient orientation.]
Assigning Canonical Orientation
• Identify the peak and assign its orientation and sum of magnitudes to the key point
[Plot: orientation histogram with the peak bin marked.]
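The histogram accumulation and peak assignment above can be sketched as follows (numpy assumed; the window extraction and Gaussian weighting are taken as given, and assigning the bin center as the orientation is an assumption):

```python
import numpy as np

def canonical_orientation(mag, ori):
    # mag: Gaussian-weighted gradient magnitudes in the key point's
    # window; ori: gradient orientations in degrees [0, 360).
    # Accumulate a 36-bin (10 degree) histogram of weighted magnitude.
    bins = (ori // 10).astype(int) % 36
    hist = np.zeros(36)
    np.add.at(hist, bins.ravel(), mag.ravel())
    peak = hist.argmax()
    # Assign the peak bin's center as the canonical orientation,
    # along with the summed magnitude in that bin
    return peak * 10 + 5, hist[peak]
```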
Refining Key List: Rotation
• The user may choose a threshold to
exclude key points based on their
assigned sum of magnitudes.
Example of Refinement
[Figure, three panels: (1) max/mins from DOG pyramid, (2) filtered for illumination, (3) filtered for edge orientation.]
Local Image Description
• SIFT keys each assigned:
– Location
– Scale (analogous to level it was detected)
– Orientation (assigned in previous canonical
orientation steps)
• Now: Describe local image region invariant
to the above transformations
SIFT key example
Local Image Description
For each key point:
• Identify 8x8
neighborhood (from
DOG level it was
detected)
• Align orientation to x-axis
Local Image Description
3. Calculate gradient magnitude and
orientation map
4. Weight by Gaussian
Local Image Description
5. Calculate histogram of each 4x4 region.
8 bins for gradient orientation. Tally
weighted gradient magnitude.
Local Image Description
6. This histogram array is the image
descriptor. (The example here is a vector
of length 4*8 = 32: four 4x4 regions with 8
bins each. Lowe's suggested best form is a
128-length vector from a 16x16
neighborhood.)
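Steps 5–6 on the slides above can be sketched as follows, for the 8x8-neighborhood / 32-length example (numpy assumed; the magnitude and orientation maps are taken as already rotated and Gaussian-weighted):

```python
import numpy as np

def sift_descriptor(mag, ori):
    # mag, ori: 8x8 weighted gradient magnitude / orientation (deg)
    # maps around the key point, aligned to its canonical orientation.
    # Four 4x4 sub-regions x 8 orientation bins -> 32-length vector.
    desc = []
    for by in range(0, 8, 4):
        for bx in range(0, 8, 4):
            hist = np.zeros(8)
            sub_o = ori[by:by + 4, bx:bx + 4]
            sub_m = mag[by:by + 4, bx:bx + 4]
            bins = (sub_o // 45).astype(int) % 8  # 8 bins of 45 deg
            np.add.at(hist, bins.ravel(), sub_m.ravel())
            desc.append(hist)
    return np.concatenate(desc)  # length 32
```

The 128-length form would use a 16x16 neighborhood split into sixteen 4x4 regions.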
Database Creation
• Index all key points of
reference model
image(s)
– Store key point
descriptor vectors in
database
Image Matching
• Find all key points identified in target image
– Each key point will have 2d location, scale and
orientation, as well as invariant descriptor vector
• For each key point, find similar descriptor
vectors in reference image database.
– A descriptor vector may match entries from more than one reference image in the database
– The key point “votes” for the matching image(s)
• Use best-bin-first algorithm
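Best-bin-first is an approximate nearest-neighbor search for speed; the voting logic it feeds can be sketched with a brute-force search instead (the distance threshold value here is purely illustrative):

```python
import numpy as np

def match_keys(target_descs, db_descs, db_labels, max_dist=0.7):
    # Brute-force nearest neighbor over the reference database;
    # Lowe uses best-bin-first for approximate search, but the
    # voting is the same. db_labels[i] names the reference image
    # that contributed db_descs[i].
    votes = {}
    for d in target_descs:
        dists = np.linalg.norm(db_descs - d, axis=1)
        i = dists.argmin()
        if dists[i] < max_dist:
            votes[db_labels[i]] = votes.get(db_labels[i], 0) + 1
    return votes
```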
Hough Transform Clustering
• Create a 4D Hough Transform (HT) space for each reference image:
1. Orientation bin = 30°
2. Scale bin = factor of 2
3. X location bin = 0.25 * ref image width
4. Y location bin = 0.25 * ref image height
• If a key point “votes” for a reference image, tally its vote in the 4D HT space.
– This gives an estimate of location and pose
– Keep a list of which key points vote for each bin
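The binning can be sketched as follows; the tuple layout of a match and the use of floor for each axis are assumptions.

```python
import math
from collections import defaultdict

def hough_vote(matches, ref_w, ref_h):
    # matches: list of (x, y, scale, orientation_deg, key_id) pose
    # predictions for one reference image. Bin sizes follow the
    # slide: 30 degrees, factor of 2 in scale, 0.25 * image size in x/y.
    bins = defaultdict(list)
    for x, y, scale, ori, key_id in matches:
        b = (int(ori // 30) % 12,                 # orientation bin
             int(math.floor(math.log2(scale))),   # scale bin (factor 2)
             int(x // (0.25 * ref_w)),            # x location bin
             int(y // (0.25 * ref_h)))            # y location bin
        bins[b].append(key_id)  # remember which keys voted where
    return bins
```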
Verification
• Identify bins with the most votes (must have at least 3).
• Using the list of key points that voted for a bin, compute affine transformation parameters (m, t)
– Use corresponding coordinates of the reference model (x, y) and target image (u, v).
Verification
• If more than three points, solve in the least-squares sense
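The least-squares solve stacks two equations per correspondence, u = m1·x + m2·y + tx and v = m3·x + m4·y + ty. A sketch with numpy:

```python
import numpy as np

def solve_affine(ref_pts, tgt_pts):
    # ref_pts: (x, y) model coordinates; tgt_pts: (u, v) image
    # coordinates. Solves A p = b in the least-squares sense for
    # p = [m1, m2, m3, m4, tx, ty].
    A, b = [], []
    for (x, y), (u, v) in zip(ref_pts, tgt_pts):
        A.append([x, y, 0, 0, 1, 0]); b.append(u)
        A.append([0, 0, x, y, 0, 1]); b.append(v)
    p, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float),
                            rcond=None)
    return p
```

With exactly three correspondences the system is square and the fit is exact; more points overdetermine it.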
Verification: Remove Outliers
• After applying affine transformation to key
points, determine difference between
calculated location and actual target image
location
• Throw out a match if:
– Orientation differs by more than 15°
– Scale differs by more than a factor of sqrt(2)
– X, Y location differs by more than 0.2 * model size
• Repeat least-squares solution until no
points are thrown out
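The re-fit loop can be sketched as below. For brevity this version checks only the location residual; the orientation and scale tests on the slide are omitted, and the matrix form of the affine solve is an assumption.

```python
import numpy as np

def refine_fit(ref_pts, tgt_pts, max_xy_err):
    # Iteratively re-solve the affine least-squares fit, dropping
    # correspondences whose predicted target location is off by
    # more than max_xy_err, until none are thrown out.
    ref = np.asarray(ref_pts, float)
    tgt = np.asarray(tgt_pts, float)
    while True:
        A = np.hstack([ref, np.ones((len(ref), 1))])
        M, *_ = np.linalg.lstsq(A, tgt, rcond=None)  # 3x2 affine
        err = np.linalg.norm(A @ M - tgt, axis=1)
        keep = err <= max_xy_err
        if keep.all() or keep.sum() < 3:  # need >= 3 points to fit
            return M, ref, tgt
        ref, tgt = ref[keep], tgt[keep]
```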
SIFT Example
Advantages of SIFT
• Numerous keys can be generated for even small objects
• Partial occlusion/image clutter ok because dozens of
SIFT keys may be associated with an object, but only
need to find 3
• Object models can undergo limited affine projection.
– Planar shapes can be recognized at 60 degree rotation away
from camera.
• Individual features can be matched to a large database
of objects
Limitations of SIFT
• Fully affine transformations require additional steps
• Many parameters are “engineered” for a specific application and may need to be evaluated on a case-by-case basis
Thank you!