Transcript Slides

Distinctive Image Features from
Scale-Invariant Keypoints
David G. Lowe, 2004
Presentation Content
• Introduction
• Related Research
• Algorithm
– Keypoint localization
– Orientation assignment
– Keypoint descriptor
• Recognizing images using keypoint descriptors
• Achievements and Results
• Conclusion
Introduction
• Image matching is a fundamental aspect of
many problems in computer vision.
Scale Invariant Feature Transform
(SIFT)
• Object or scene recognition.
• Using local invariant image features
(keypoints), invariant to:
– Scaling
– Rotation
– Illumination
– 3D camera viewpoint (affine)
– Clutter / noise
– Occlusion
• Real-time
Related Research
– Corner detectors
• Moravec 1981
• Harris and Stephens 1988
• Harris 1992
• Zhang 1995
• Torr 1995
• Schmid and Mohr 1997
– Scale invariant
• Crowley and Parker 1984
• Shokoufandeh 1999
• Lindeberg 1993, 1994
• Lowe 1999 (this author)
– Invariant to full affine transformations
• Baumberg 2000
• Tuytelaars and Van Gool 2000
• Mikolajczyk and Schmid 2002
• Schaffalitzky and Zisserman 2002
• Brown and Lowe 2002
Keypoint Detection
• Goal: Identify locations and scales that can be
repeatably assigned under differing views of
the same object.
• Keypoint detection is done at a specific scale
and location
• Difference-of-Gaussian function
• Search for stable features across all possible scales:
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y)
= L(x, y, kσ) − L(x, y, σ)
σ = amount of smoothing
k = constant factor between adjacent scales: 2^(1/s),
where s is the number of scale samples per octave
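A minimal sketch of the difference-of-Gaussian above for a single pair of
scales, assuming OpenCV for the Gaussian blurring (the library choice and
the function name `dog` are illustrative, not from the slides):

```python
import cv2
import numpy as np

def dog(image, sigma, k=2 ** (1.0 / 3)):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    img = image.astype(np.float32) / 255.0
    # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)
    blurred_k = cv2.GaussianBlur(img, (0, 0), k * sigma)
    return blurred_k - blurred
```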
Keypoint Detection
• Reasonably low cost
• Scale sensitive
• How many scale samples per octave?
• 3 scale samples per octave were used
(although more would be better)
• Determine the amount of smoothing (σ)
• To avoid losing high-frequency information,
the input image is first doubled in size
(see the sketch below)
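A sketch of the scale-space construction these bullets describe: double the
input first, build s + 3 blurred images per octave (yielding s + 2 DoG
images and s layers in which extrema can be found), then halve the
resolution for the next octave. The base σ of 1.6 follows the paper; the
code layout is an assumption.

```python
import cv2
import numpy as np

def gaussian_octaves(image, n_octaves=4, s=3, base_sigma=1.6):
    """Adjacent images within an octave differ by k = 2**(1/s) in scale."""
    k = 2 ** (1.0 / s)
    img = image.astype(np.float32) / 255.0
    # Double the image so the highest frequencies are not thrown away.
    img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
    octaves = []
    for _ in range(n_octaves):
        sigma, levels = base_sigma, []
        for _ in range(s + 3):   # s+3 blurred images -> s+2 DoG layers
            levels.append(cv2.GaussianBlur(img, (0, 0), sigma))
            sigma *= k
        octaves.append(levels)
        img = img[::2, ::2]      # next octave at half resolution
    return octaves
```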
Accurate Keypoint Localization
(1/2)
• Use a Taylor expansion to determine the
interpolated location of each extremum (local
maximum or minimum).
• Calculate the extremum value at this exact
location and discard extrema whose value is
below 3% of the pixel range (|D| < 0.03),
i.e. low-contrast points.
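A sketch of that interpolation for one candidate in a DoG stack D indexed
[scale, y, x]: finite-difference gradient and Hessian, offset
x̂ = −H⁻¹·∇D, interpolated value D(x̂) = D + ½·∇Dᵀ·x̂, and the 0.03
contrast threshold. The indexing convention is an assumption.

```python
import numpy as np

def refine_extremum(D, s, y, x, contrast_thresh=0.03):
    """Taylor-fit an extremum in a DoG stack D[scale, y, x]; returns
    (offset, value) or None when the interpolated |D| is too low."""
    # Gradient by central differences in (scale, y, x).
    g = 0.5 * np.array([D[s+1, y, x] - D[s-1, y, x],
                        D[s, y+1, x] - D[s, y-1, x],
                        D[s, y, x+1] - D[s, y, x-1]])
    # 3x3 Hessian of second derivatives.
    dss = D[s+1, y, x] + D[s-1, y, x] - 2 * D[s, y, x]
    dyy = D[s, y+1, x] + D[s, y-1, x] - 2 * D[s, y, x]
    dxx = D[s, y, x+1] + D[s, y, x-1] - 2 * D[s, y, x]
    dsy = 0.25 * (D[s+1, y+1, x] - D[s+1, y-1, x]
                  - D[s-1, y+1, x] + D[s-1, y-1, x])
    dsx = 0.25 * (D[s+1, y, x+1] - D[s+1, y, x-1]
                  - D[s-1, y, x+1] + D[s-1, y, x-1])
    dyx = 0.25 * (D[s, y+1, x+1] - D[s, y+1, x-1]
                  - D[s, y-1, x+1] + D[s, y-1, x-1])
    H = np.array([[dss, dsy, dsx], [dsy, dyy, dyx], [dsx, dyx, dxx]])
    offset = -np.linalg.solve(H, g)           # x_hat = -H^{-1} grad
    value = D[s, y, x] + 0.5 * g.dot(offset)  # D(x_hat)
    return (offset, value) if abs(value) >= contrast_thresh else None
```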
Accurate Keypoint Localization
(2/2)
• Eliminating edge responses
• Define a 2×2 Hessian matrix from the second
derivatives of the DoG values
• Determine the ratio of the maximum eigenvalue
to the smaller one and discard keypoints where
this ratio is too large
[Figure: keypoints remaining after each filtering stage —
832 initial extrema, 729 after the contrast threshold,
536 after eliminating edge responses]
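The edge test itself avoids computing eigenvalues explicitly: with H the
2×2 spatial Hessian, keypoints are kept only when Tr(H)²/Det(H) < (r+1)²/r,
with r = 10 in the paper. A sketch (finite-difference layout assumed):

```python
import numpy as np

def passes_edge_test(D, y, x, r=10.0):
    """Reject keypoints on edges: a large ratio between the two
    eigenvalues of the 2x2 Hessian means a strong edge response."""
    dxx = D[y, x+1] + D[y, x-1] - 2 * D[y, x]
    dyy = D[y+1, x] + D[y-1, x] - 2 * D[y, x]
    dxy = 0.25 * (D[y+1, x+1] - D[y+1, x-1] - D[y-1, x+1] + D[y-1, x-1])
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:      # eigenvalues of opposite sign: reject
        return False
    # Tr^2/Det grows with the eigenvalue ratio, so threshold it directly.
    return tr * tr / det < (r + 1) ** 2 / r
```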
Orientation Assignment
• Calculate the orientation and magnitude of the
gradient at each pixel
• Histogram of orientations of sample points near
keypoint.
• Weighted by its gradient magnitude and by a
Gaussian-weighted circular window with a σ that is
1.5 times that of the scale of the keypoint.
• Stable orientation results:
– Create additional keypoints for multiple
histogram peaks (within 80% of the highest)
– Interpolate the peak position
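A sketch of that histogram, assuming gradients from finite differences on
the blurred image L: each sample is weighted by its magnitude and by a
Gaussian of σ = 1.5 × the keypoint scale, and every peak within 80% of the
maximum spawns a keypoint. Parabolic peak interpolation is omitted for
brevity.

```python
import numpy as np

def dominant_orientations(L, y, x, scale, radius=8, n_bins=36):
    """36-bin gradient-orientation histogram around (y, x); returns
    one angle per peak within 80% of the highest bin."""
    sigma = 1.5 * scale
    hist = np.zeros(n_bins)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if not (0 < yy < L.shape[0] - 1 and 0 < xx < L.shape[1] - 1):
                continue
            gx = L[yy, xx + 1] - L[yy, xx - 1]   # finite differences
            gy = L[yy + 1, xx] - L[yy - 1, xx]
            mag = np.hypot(gx, gy)
            theta = np.arctan2(gy, gx) % (2 * np.pi)
            # Weight by magnitude and a Gaussian window of 1.5 * scale.
            w = np.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            hist[int(theta / (2 * np.pi) * n_bins) % n_bins] += w * mag
    return [2 * np.pi * (b + 0.5) / n_bins
            for b in range(n_bins) if hist[b] >= 0.8 * hist.max()]
```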
The Local Image Descriptor
• We can now find keypoints invariant to
location, scale and orientation.
• Now compute a descriptor for each keypoint.
• Highly distinctive yet invariant to illumination
and 3D viewpoint changes.
• Biologically inspired approach.
• Divide the sample points around the keypoint
into 16 regions (4 regions shown in the figure)
• Create a histogram of orientations for each
region (8 bins)
• Trilinear interpolation
• Vector normalization
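A sketch of the 4×4×8 = 128-dimensional descriptor assembly with the
normalization steps (normalize, clamp at 0.2 against non-linear
illumination changes, renormalize). Trilinear interpolation and rotation
to the keypoint orientation are left out here; a full implementation
needs both.

```python
import numpy as np

def build_descriptor(orients, mags):
    """orients, mags: 16x16 arrays of gradient orientation (radians,
    already rotated to the keypoint orientation) and magnitude.
    Returns the 128-D vector (4x4 regions x 8 orientation bins)."""
    desc = np.zeros((4, 4, 8))
    for y in range(16):
        for x in range(16):
            b = int(orients[y, x] % (2 * np.pi) / (2 * np.pi) * 8) % 8
            desc[y // 4, x // 4, b] += mags[y, x]  # hard binning here
    v = desc.ravel()
    v /= max(np.linalg.norm(v), 1e-7)  # unit length: contrast invariance
    v = np.minimum(v, 0.2)             # clamp large values (illumination)
    v /= max(np.linalg.norm(v), 1e-7)  # renormalize
    return v
```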
Descriptor Testing
This graph shows the percentage of keypoints giving the correct match
against a database of 40,000 keypoints, as a function of the width n of
the n×n keypoint descriptor array and the number of orientations in each
histogram. The graph is computed for images with an affine viewpoint
change of 50 degrees and addition of 4% noise.
Keypoint Matching
• Look for the nearest neighbor in the database
(Euclidean distance)
• Compare the distance of the closest neighbor
to that of the second-closest neighbor
• If distance(closest) / distance(second-closest)
> 0.8, discard the match (see the sketch below)
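A sketch of that ratio test using brute-force Euclidean distances (the
next slide covers faster indexing):

```python
import numpy as np

def ratio_test_matches(queries, database, ratio=0.8):
    """Return (query_index, database_index) pairs that pass the test."""
    matches = []
    for i, q in enumerate(queries):
        d = np.linalg.norm(database - q, axis=1)   # Euclidean distances
        nearest, second = np.argsort(d)[:2]
        if d[nearest] / d[second] <= ratio:        # distinctive match
            matches.append((i, int(nearest)))
    return matches
```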
Efficient Nearest Neighbor Indexing
• 128-dimensional feature vector
• Best-Bin-First (BBF)
• Modified k-d tree algorithm
• Finds only an approximate answer
• Works well because of the 0.8 distance-ratio rule
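A compact sketch of the Best-Bin-First idea: a plain k-d tree whose search
pops bins from a priority queue in order of their lower-bound distance to
the query and stops after a fixed number of node checks, so it returns an
approximate nearest neighbor. This is a simplified rendering of BBF, not
the paper's exact implementation.

```python
import heapq
import numpy as np

class Node:
    def __init__(self, point, index, axis, left, right):
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right

def build(points, idx, depth=0):
    """Standard k-d tree over points[idx] (points: N x 128 array)."""
    if not idx:
        return None
    axis = depth % points.shape[1]
    idx.sort(key=lambda i: points[i][axis])
    m = len(idx) // 2
    return Node(points[idx[m]], idx[m], axis,
                build(points, idx[:m], depth + 1),
                build(points, idx[m + 1:], depth + 1))

def bbf_nearest(root, q, max_checks=200):
    """Visit bins closest to the query first; stop after max_checks."""
    best, best_d2 = -1, float("inf")
    heap, tie, checks = [(0.0, 0, root)], 1, 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if node is None or bound >= best_d2:
            continue
        checks += 1
        d2 = float(np.sum((node.point - q) ** 2))
        if d2 < best_d2:
            best, best_d2 = node.index, d2
        diff = q[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        heapq.heappush(heap, (bound, tie, near))           # near side keeps the bound
        heapq.heappush(heap, (diff * diff, tie + 1, far))  # plane distance bounds far side
        tie += 2
    return best, float(np.sqrt(best_d2))

# Usage: root = build(descriptors, list(range(len(descriptors))))
```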
Clustering with the Hough
Transform
• Must pick out the roughly 1% of correct matches
(inliers) from among 99% outliers
• Find clusters of features that vote for the
same object pose:
– 2D location
– Scale
– Orientation
– Location relative to original training image.
• Use broad bin sizes.
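A sketch of that voting with the paper's broad bins (30° for orientation,
a factor of 2 for scale, 0.25 of the projected training-image dimension
for location) and its threshold of at least 3 agreeing votes per bin. The
match-record layout is an assumption, and the paper additionally votes
into the two closest bins per dimension, which is omitted here.

```python
import numpy as np
from collections import defaultdict

def hough_clusters(matches, min_votes=3):
    """matches: list of dicts with the pose each match predicts:
    {'x', 'y', 'scale', 'theta', 'model_dim'} (model_dim = max
    projected training-image dimension). Returns index clusters."""
    votes = defaultdict(list)
    for i, m in enumerate(matches):
        loc_bin = 0.25 * m['scale'] * m['model_dim']   # broad location bins
        key = (int(np.floor(m['x'] / loc_bin)),
               int(np.floor(m['y'] / loc_bin)),
               int(round(np.log2(m['scale']))),        # factor-of-2 scale bins
               int(np.floor(m['theta'] / np.radians(30))))  # 30-degree bins
        votes[key].append(i)
    # Bins with enough agreeing votes are candidate object poses.
    return [idxs for idxs in votes.values() if len(idxs) >= min_votes]
```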
Solution for Affine Parameters
• An affine transformation correctly accounts
for 3D rotation of a planar surface under
orthographic projection, but the
approximation can be poor for 3D rotation of
non-planar objects.
Basically: we do not create a 3D representation
of the object.
• The affine transformation of a model point
[x y] to an image point [u v] can be written as
u = m1·x + m2·y + t_x
v = m3·x + m4·y + t_y
where m1–m4 capture rotation, scale and stretch,
and (t_x, t_y) is the translation.
• Outliers are discarded
• New matches can be found by top-down matching
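The paper solves for the six parameters by least squares over all matches
in a cluster; a minimal sketch, stacking one pair of rows per match into
A·p = b:

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine fit: each match (x, y) -> (u, v) contributes
    two rows to A p = b with p = [m1, m2, m3, m4, t_x, t_y]."""
    rows, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        rows.append([x, y, 0, 0, 1, 0]); b.append(u)  # u = m1*x + m2*y + t_x
        rows.append([0, 0, x, y, 0, 1]); b.append(v)  # v = m3*x + m4*y + t_y
    p, *_ = np.linalg.lstsq(np.asarray(rows, float), np.asarray(b, float),
                            rcond=None)
    m1, m2, m3, m4, tx, ty = p
    return np.array([[m1, m2], [m3, m4]]), np.array([tx, ty])
```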
Results
Results
Conclusion
• Invariant to image rotation and scale and
robust across a substantial range of affine
distortion, addition of noise, and change in
illumination.
• Real-time
• Lots of applications
Further Research
• Color
• 3D representation of the world.