Transcript SIFT - RUG

Distinctive Image Features
from Scale-Invariant Keypoints
By David G. Lowe, University of British Columbia
Presented by:
Tim Havinga,
Joël van Neerbos
and
Robert van der Linden
Organization
• Introduction
• Keypoint extraction
• Applications
Introduction
Matching images across an affine transformation:
Change in lighting
and 3D viewpoint:
Introduction
• Motion tracking
• Object and scene recognition
• Stereo correspondence
Extracting features
• Extrema detection
• Keypoint localization
• Orientation assignment
• Local image descriptor
Extrema detection
Blur copies of the image with broadening Gaussian
filters.
Extrema detection
Subtract these (DoG) to find local extrema.
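A minimal sketch of this stage in Python, using SciPy's gaussian_filter; the sigma, the factor k and the number of scales are illustrative values, not the paper's exact schedule:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_images(image, sigma=1.6, k=2 ** 0.5, num_scales=5):
        """Blur with broadening Gaussians, then subtract adjacent blurs (DoG)."""
        blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
                   for i in range(num_scales)]
        return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]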
Extrema detection
Calculate the DoGs for different Gaussians; each new octave starts from the image downsampled by a factor of 2.
Extrema detection
(Figure: successive blur steps.)
Keypoint localization
Select keypoints that are higher or lower than all 26 of their neighbours in the 3x3x3 block spanning the current and the two adjacent DoG scales, as sketched below.
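A sketch of this check, assuming dogs is the list of DoG images from the previous stage (all the same size) and (s, y, x) indexes a candidate pixel away from the borders:

    import numpy as np

    def is_extremum(dogs, s, y, x):
        """True if dogs[s][y, x] is strictly higher or lower than its 26
        neighbours in the current and the two adjacent DoG scales."""
        cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])
        neighbours = np.delete(cube.ravel(), 13)  # index 13 is the centre pixel
        centre = dogs[s][y, x]
        return centre > neighbours.max() or centre < neighbours.min()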
Keypoint localization
Reject all points where the contrast is too low.
Keypoint localization
Reject all points that lie on an edge.
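The two rejection tests can be sketched as follows; the thresholds (a contrast cutoff of 0.03 for image values in [0, 1], and an edge ratio r = 10) are the values used in the paper:

    import numpy as np

    def keep_keypoint(dog, y, x, contrast_thresh=0.03, r=10.0):
        """Reject low-contrast points and points on edges (Hessian ratio test)."""
        if abs(dog[y, x]) < contrast_thresh:      # contrast too low
            return False
        # 2x2 Hessian of the DoG image from finite differences
        dxx = dog[y, x + 1] - 2 * dog[y, x] + dog[y, x - 1]
        dyy = dog[y + 1, x] - 2 * dog[y, x] + dog[y - 1, x]
        dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
               - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
        trace, det = dxx + dyy, dxx * dyy - dxy * dxy
        if det <= 0:                              # curvature signs differ: on an edge
            return False
        return trace * trace / det < (r + 1) ** 2 / r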
Effects of this elimination
Extrema detection
Effects of this elimination
Contrast check
Effects of this elimination
Edge check
Extracting features
• Extrema detection
• Keypoint localization
• Orientation assignment
• Local image descriptor
Orientation assignment
Assign an orientation to each keypoint to make its descriptor invariant to rotation.
Orientation assignment
The orientation of a keypoint is determined in four
steps:
1. Determine sample points
2. Determine the gradient magnitude and
orientation of each sample point
3. Create an orientation histogram of the sample
points
4. Extract the dominant directions from the
histogram
Step 1: Determine sample points
• The source image is the Gaussian-smoothed image with the closest scale
• Use all pixels within a certain radius of the keypoint
(Figure: the keypoint's actual scale vs. the Gaussian scale used.)
Step 2: Determine gradient magnitude and orientation of each sample point
• Gradient magnitude:
  m(x, y) = sqrt( (L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2 )
• Gradient orientation:
  θ(x, y) = atan( (L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)) )
where L is the Gaussian-smoothed image at the keypoint's scale.
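In code, these two pixel-difference formulas are, as a minimal sketch:

    import numpy as np

    def gradient(L, y, x):
        """Gradient magnitude and orientation at (x, y) in the
        Gaussian-smoothed image L, using pixel differences."""
        dx = L[y, x + 1] - L[y, x - 1]
        dy = L[y + 1, x] - L[y - 1, x]
        return np.hypot(dx, dy), np.arctan2(dy, dx)  # orientation in radians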
Step 3: Create an orientation histogram
• The histogram has 36 bins, each covering 10 degrees
• Each sample is weighted by its gradient magnitude and by a Gaussian-weighted circular window centred on the keypoint
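A sketch of the histogram construction; radius and sigma are left as parameters (the paper uses a Gaussian window with sigma equal to 1.5 times the keypoint's scale):

    import numpy as np

    def orientation_histogram(L, ky, kx, radius, sigma):
        """36-bin orientation histogram around keypoint (kx, ky); each sample
        is weighted by gradient magnitude times a Gaussian window."""
        hist = np.zeros(36)
        for y in range(ky - radius, ky + radius + 1):
            for x in range(kx - radius, kx + radius + 1):
                dx = L[y, x + 1] - L[y, x - 1]
                dy = L[y + 1, x] - L[y - 1, x]
                weight = np.exp(-((y - ky) ** 2 + (x - kx) ** 2) / (2 * sigma ** 2))
                angle = np.degrees(np.arctan2(dy, dx)) % 360  # 0..360 degrees
                hist[int(angle) // 10] += np.hypot(dx, dy) * weight
        return hist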
Step 4: Extract dominant directions
• Take the peak(s) from the orientation histogram
• Use all peaks greater than 80% of the highest peak
• Every such direction gets its own keypoint
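A sketch of the peak extraction (the paper additionally interpolates each peak position with a parabola fit for better accuracy):

    import numpy as np

    def dominant_directions(hist):
        """Centre angles (degrees) of all local peaks that reach at least
        80% of the highest peak; each direction yields its own keypoint."""
        threshold = 0.8 * hist.max()
        n = len(hist)
        return [i * 10 + 5                         # centre of a 10-degree bin
                for i, v in enumerate(hist)
                if v >= threshold and v > hist[i - 1] and v > hist[(i + 1) % n]]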
The Local image descriptor
• Every keypoint now has a location, scale and orientation, from which a repeatable 2D grid can be determined
• We want distinctive descriptor vectors, partially invariant to illumination and viewpoint changes
Computing the Local image descriptor
• Take the 16 x 16 sample array around the keypoint
• Compute 4 x 4 orientation histograms from this array
• Use 8 bins per histogram: 4 x 4 x 8 = 128 features
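A minimal sketch of this folding step, assuming mag and ori are 16x16 arrays of gradient magnitudes and orientations already sampled in the keypoint's rotated, scale-adjusted grid:

    import numpy as np

    def descriptor(mag, ori):
        """Fold a 16x16 grid into 4x4 histograms of 8 orientation bins each,
        giving the 4 x 4 x 8 = 128-dimensional feature vector."""
        desc = np.zeros((4, 4, 8))
        for y in range(16):
            for x in range(16):
                b = int((ori[y, x] % (2 * np.pi)) / (2 * np.pi) * 8) % 8
                desc[y // 4, x // 4, b] += mag[y, x]
        return desc.ravel()

(The paper also distributes each sample into adjacent bins by trilinear interpolation and applies a Gaussian weight over the window; both are omitted here.)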
Local image descriptor optimizations
• Normalize the obtained feature vector to enhance invariance to illumination changes
• Reduce the influence of large gradient magnitudes by capping the normalized features at 0.2
• Normalize again
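These three steps are compact enough to show directly (the 0.2 cap is the paper's value):

    import numpy as np

    def normalize_descriptor(v, cap=0.2):
        """Normalize, cap large values at 0.2, then normalize again."""
        v = v / (np.linalg.norm(v) + 1e-12)  # invariance to affine lighting changes
        v = np.minimum(v, cap)               # damp large gradient magnitudes
        return v / (np.linalg.norm(v) + 1e-12)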
Possible applications for SIFT
We have a feature extraction method which yields useful keypoints; what's next?
• Some applications:
• Object recognition in images
• Panorama stitching
• 3D scene modelling
• 3D human action tracking
(for example for security surveillance)
• Robot localisation and mapping
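In practice the whole pipeline above is available in libraries; a minimal usage sketch with OpenCV (the filename is a placeholder, and cv2.SIFT_create requires a reasonably recent OpenCV build):

    import cv2

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)
    print(len(keypoints), "keypoints; descriptor array shape:", descriptors.shape)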
Panorama stitching
(from Brown, ICCV 2003)
3D modelling
(from Sudderth et al., 2006)
Application: SIFT to object recognition
We can apply SIFT to recognize objects in images.
Say we have an image which contains an object. How do we recognize it?
Key idea: compare keypoints; if these are similar, it is likely to be the same object.
First problem: many features arise from background clutter. How do we remove these?
Possible approaches (the second is sketched below):
- Look for clusters of matching features
- Compare the distance of the closest match to that of the second-closest match
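A sketch of the distance-ratio test; the 0.8 threshold matches the paper's suggestion of rejecting matches whose closest neighbour is not clearly closer than the second-closest:

    import numpy as np

    def ratio_test_matches(desc_a, desc_b, ratio=0.8):
        """Match descriptors in desc_a to desc_b (rows are 128-d vectors),
        keeping only unambiguous matches."""
        matches = []
        for i, d in enumerate(desc_a):
            dists = np.linalg.norm(desc_b - d, axis=1)
            first, second = np.partition(dists, 1)[:2]  # two smallest distances
            if first < ratio * second:
                matches.append((i, int(np.argmin(dists))))
        return matches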
Efficiently locating the nearest neighbour
Each keypoint has a 128-dimensional feature vector: in such high-dimensional spaces no exact search structure beats exhaustive search, so the paper uses an approximate best-bin-first search to find nearest neighbours.
But: as few as 3 matched features are enough to locate an object, for example when it is partially occluded.
The Hough transform is used to let clusters of matching keypoints 'vote' for the pose of an object, described by location, orientation and scale.
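A coarse sketch of the voting idea, assuming each keypoint is an (x, y, scale, orientation) tuple; the bin sizes here are illustrative (the paper uses 30-degree orientation bins, a factor of 2 for scale, 0.25 times the projected model size for location, and votes into the two closest bins per dimension):

    from collections import defaultdict
    import numpy as np

    def hough_pose_votes(matches, loc_bin=32.0, ori_bin=30.0):
        """Each (model_kp, image_kp) match votes for a coarse pose bin;
        bins with at least 3 votes become pose hypotheses."""
        votes = defaultdict(list)
        for m_kp, i_kp in matches:  # keypoints: (x, y, scale, orientation in degrees)
            d_ori = (i_kp[3] - m_kp[3]) % 360
            key = (round(i_kp[0] / loc_bin), round(i_kp[1] / loc_bin),
                   round(np.log2(i_kp[2] / m_kp[2])), round(d_ori / ori_bin))
            votes[key].append((m_kp, i_kp))
        return {k: v for k, v in votes.items() if len(v) >= 3}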
Application: robot vision,
localization and mapping
Se, S., Lowe, D. G., and Little, J. Vision-based Mobile Robot Localization And Mapping using Scale-Invariant Features, 2001
• Application of SIFT to mobile robotics
• SIFT features combined with Simultaneous Localization And Map Building (SLAMB)
• Recognizing landmarks: estimation of the robot's position
• 10m by 10m lab, 3000 features collected
• Preliminary results: quite good
Conclusions from the paper
• The keypoints SIFT extracts are indeed invariant to image rotation and scale, and robust to affine distortion, noise and changes in illumination.
• SIFT can be optimized to run in real time.
• The proposed approach (SIFT combined with the Hough transform for object recognition) has been shown to work reliably.
Discussion
• Is the SIFT method for keypoint extraction the
best way to get distinctive features from
images?
• Is SIFT biologically plausible? Is it important
to have biologically inspired methods in object
recognition / localization?
References
Main article:
• Distinctive Image Features from Scale-Invariant Keypoints, D. G. Lowe. International Journal of Computer Vision 60, 91-110, 2004.
Other articles:
• Depth from Familiar Objects: A Hierarchical Model for 3D Scenes, Sudderth et al. Proceedings of the 2006 IEEE Conference on Computer Vision and Pattern Recognition, volume II, 2410-2417, 2006.
• Vision-based Mobile Robot Localization And Mapping using Scale-Invariant Features, Se, S., Lowe, D. G., and Little, J., 2001.