Invariant features
EECS 274 Computer Vision
Local Invariant Features
Local features
• Matching with Harris Detector
• Scale-invariant Feature Detection
• Scale Invariant Image Descriptors
• Affine-invariant Feature Detection
• Object Recognition
• SIFT Features
• Reading: S Chapter 4
Examples
Features
What local features to use?
Aperture problem
Ambiguity of 1-dimensional motion perception
Stripes moved left 5 pixels
Stripes moved upward 6 pixels
Introduction
Local invariant photometric descriptors
[Figure: local descriptor]
Local : robust to occlusion/clutter + no segmentation
Photometric : distinctive
Invariant : to image transformations + illumination changes
History - Recognition
Color histogram [Swain 91]
Each pixel is described by a color vector (r, g, b)
Distribution of color vectors is described by a histogram
not robust to occlusion, not invariant, not distinctive
History - Recognition
Eigenimages [Turk 91]
• Each face vector is represented in the eigenimage space
– eigenvectors with the highest eigenvalues = eigenimages
[Figure: eigenimage space with axes v1, v2, v3]
• The new image is projected into the eigenimage space
– determine the closest face
not robust to occlusion, requires segmentation, not invariant,
discriminant
History - Recognition
Geometric invariants [Rothwell 92]
• Function with a value independent of the transformation:
$f(x, y) = f(x', y')$ where $(x', y')^\top = T\,(x, y)^\top$
• Invariant for image rotation : distance of two
points
• Invariant for planar homography : cross-ratio
local and invariant, not discriminant, requires sub-pixel extraction of
primitives
History - Recognition
Problems : occlusion, clutter, image
transformations, distinctiveness
Solution : recognition with local
photometric invariants
[ Local greyvalue invariants for image retrieval, C. Schmid and R. Mohr, PAMI 1997 ]
Approach
[Figure: local descriptor]
1) Extraction of interest points (characteristic locations)
2) Computation of local descriptors
3) Determining correspondences
4) Selection of similar images
Matching with interest points
• Extraction of interest points with the Harris detector
• Comparison of points with cross-correlation
• Verification with the fundamental matrix
Moravec corner detector
• Developed for Stanford Cart in 1977
Moravec corner detector
Change of intensity for the shift [u,v]:
$E(u, v) = \sum_{x, y} w(x, y)\,\big[I(x + u, y + v) - I(x, y)\big]^2$
where w(x, y) is the window function, I(x + u, y + v) the shifted intensity, and I(x, y) the intensity
Four shifts: (u,v) = (1,0), (1,1), (0,1), (-1, 1)
Look for local maxima in min(E)
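A minimal sketch of the Moravec measure described above, assuming a binary (box) window and an image stored as a list of rows. The function name `moravec_response` and the toy image are illustrative only, not part of the lecture material.

```python
def moravec_response(img, x, y, half=1):
    """Return the minimum over the four shifts of E(u, v) at pixel (x, y).

    img is a list of rows (img[y][x]); the window w is a binary box of
    size (2*half + 1)^2. The caller must keep (x, y) at least half + 1
    pixels away from the image border.
    """
    shifts = [(1, 0), (1, 1), (0, 1), (-1, 1)]
    energies = []
    for u, v in shifts:
        e = 0.0
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                diff = img[y + dy + v][x + dx + u] - img[y + dy][x + dx]
                e += diff * diff
        energies.append(e)
    return min(energies)


# Toy 9x9 image: a bright square occupying x >= 4, y >= 4.
image = [[1.0 if (x >= 4 and y >= 4) else 0.0 for x in range(9)]
         for y in range(9)]
corner = moravec_response(image, 4, 4)  # at the corner of the square
edge = moravec_response(image, 4, 6)    # on the vertical edge
flat = moravec_response(image, 2, 2)    # in the uniform region
```

Only the corner gives a non-zero minimum: along an edge there is always one shift (parallel to the edge) for which E vanishes.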
Problems of Moravec detector
• Noisy response due to a binary window
function
• Only a set of shifts at every 45 degrees is considered
• Only minimum of E is taken into account
Harris corner detector (1988) solves these
problems.
Harris detector
Based on the idea of auto-correlation
Important difference in all directions => interest point
Interest points
Geometric features
repeatable under transformations
2D characteristics of the signal
high informational content
Comparison of different detectors [Schmid98]
Harris detector
Harris detector
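Auto-correlation function for a point (x, y) and a shift (u, v):
$E(u, v) = \sum_{x, y} w(x, y)\,\big(I(x, y) - I(x + u, y + v)\big)^2$
Discrete shifts can be avoided with the auto-correlation matrix, with
$I(x + u, y + v) \approx I(x, y) + \begin{pmatrix} I_x(x, y) & I_y(x, y) \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}$
$E(u, v) \approx \sum_{x, y} w(x, y) \left( \begin{pmatrix} I_x(x, y) & I_y(x, y) \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} \right)^2$
$= \begin{pmatrix} u & v \end{pmatrix} \begin{pmatrix} \sum_{x, y} w(x, y)\,(I_x(x, y))^2 & \sum_{x, y} w(x, y)\,I_x(x, y)\,I_y(x, y) \\ \sum_{x, y} w(x, y)\,I_x(x, y)\,I_y(x, y) & \sum_{x, y} w(x, y)\,(I_y(x, y))^2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \mathbf{u}^\top A\, \mathbf{u}$
$A = w * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$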
Comparison of detectors: Rotation
repeatability = #good matches/mean(#points)
[Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98]
Comparison of detectors: Perspective
repeatability = #good matches/mean(#points)
[Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98]
Harris corner detector
Intensity change in shifting window: eigenvalue analysis
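$E(\mathbf{u}) = E(u, v) = \mathbf{u}^\top A\, \mathbf{u}$, with $A = w * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$
λ1, λ2 – eigenvalues of A
Ellipse E(u,v) = const (uncertainty ellipse):
– axis of length (λmax)^(-1/2) along the direction of the fastest change
– axis of length (λmin)^(-1/2) along the direction of the slowest change
Shi and Tomasi use min(λ1, λ2) to locate good features to track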
Harris corner detector
Classification of image points using eigenvalues of A:
• Corner: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
• Edge: λ1 >> λ2 or λ2 >> λ1
• Flat: λ1 and λ2 are small; E is almost constant in all directions
[Figure: the (λ1, λ2) plane divided into flat, edge, and corner regions]
Harris detection
• Auto-correlation matrix
– captures the structure of the local neighborhood
– measure based on eigenvalues of this matrix
• 2 strong eigenvalues => interest point
• 1 strong eigenvalue => contour
• 0 strong eigenvalues => uniform region
• Interest point detection
– threshold on the eigenvalues
– local maximum for localization
Harris corner detector
Measure of corner response:
$R = \det(A) - k\,(\operatorname{trace}(A))^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2$
(k – empirical constant, k = 0.04-0.06)
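The corner response can be sketched as follows, assuming central-difference gradients and a uniform window w (the slides do not fix either choice; a Gaussian window is common). The names `structure_tensor` and `harris_response` are illustrative.

```python
def structure_tensor(img, x, y, half=1):
    """Sum Ix^2, Ix*Iy, Iy^2 over a (2*half + 1)^2 box window at (x, y).

    img is a list of rows (img[y][x]); gradients use central differences,
    so (x, y) must lie at least half + 1 pixels inside the image.
    """
    sxx = sxy = syy = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            px, py = x + dx, y + dy
            ix = (img[py][px + 1] - img[py][px - 1]) / 2.0
            iy = (img[py + 1][px] - img[py - 1][px]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    return sxx, sxy, syy


def harris_response(img, x, y, half=1, k=0.04):
    """R = det(A) - k * trace(A)^2 for the auto-correlation matrix A."""
    sxx, sxy, syy = structure_tensor(img, x, y, half)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Toy 9x9 image: a bright square occupying x >= 4, y >= 4.
image = [[1.0 if (x >= 4 and y >= 4) else 0.0 for x in range(9)]
         for y in range(9)]
r_corner = harris_response(image, 4, 4)  # positive: two strong eigenvalues
r_edge = harris_response(image, 4, 6)    # negative: one strong eigenvalue
r_flat = harris_response(image, 2, 2)    # zero: no gradient at all
```

Using det and trace avoids computing the eigenvalues explicitly while still separating corners (R > 0) from edges (R < 0) and flat regions (|R| small).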
Example
Good features
Using auto-correlation
or Hessian matrix
Local descriptors
[Figure: local descriptor]
Descriptors characterize the local neighborhood of a point
Local jet
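Convolution of image I with Gaussian derivatives
$\mathbf{v}(x, y) = \begin{pmatrix} I(x, y) * G(\sigma) \\ I(x, y) * G_x(\sigma) \\ I(x, y) * G_y(\sigma) \\ I(x, y) * G_{xx}(\sigma) \\ I(x, y) * G_{xy}(\sigma) \\ I(x, y) * G_{yy}(\sigma) \end{pmatrix}$
$I(x, y) * G(\sigma) = \int\!\!\int G(x', y', \sigma)\, I(x - x', y - y')\, dx'\, dy'$
$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$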
N-Jet, local jet
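• Invariance to image rotation : differential invariants [Koen87]
$\begin{pmatrix} L \\ L_i L_i \\ L_i L_{ij} L_j \\ L_{ii} \\ L_{ij} L_{ji} \\ \varepsilon_{ij}\,(L_{jkl} L_i L_k L_l - L_{jkk} L_i L_l L_l) \\ L_{iij} L_j L_k L_k - L_{ijk} L_i L_j L_k \\ -\varepsilon_{ij}\, L_{jkl} L_i L_k L_l \\ L_{ijk} L_i L_j L_k \end{pmatrix}$
with, for the low-order terms,
$L_i L_i = L_x L_x + L_y L_y$
$L_i L_{ij} L_j = L_{xx} L_x L_x + 2 L_{xy} L_x L_y + L_{yy} L_y L_y$
$L_{ii} = L_{xx} + L_{yy}$
$L_{ij} L_{ji} = L_{xx} L_{xx} + 2 L_{xy} L_{xy} + L_{yy} L_{yy}$
where $\varepsilon_{ij}$ is the antisymmetric epsilon tensor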
Local descriptors
Robustness to illumination changes
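In case of an affine transformation $I_1(\mathbf{x}) = a\, I_2(\mathbf{x}) + b$, use the normalized invariants
$\begin{pmatrix} \dfrac{L_i L_{ij} L_j}{(L_i L_i)^{3/2}} \\ \dfrac{L_{ii}}{(L_i L_i)^{1/2}} \\ \dfrac{L_{ij} L_{ji}}{L_i L_i} \\ \dfrac{\varepsilon_{ij}\,(L_{jkl} L_i L_k L_l - L_{jkk} L_i L_l L_l)}{(L_i L_i)^2} \\ \dfrac{L_{iij} L_j L_k L_k - L_{ijk} L_i L_j L_k}{(L_i L_i)^2} \\ \dfrac{-\varepsilon_{ij}\, L_{jkl} L_i L_k L_l}{(L_i L_i)^2} \\ \dfrac{L_{ijk} L_i L_j L_k}{(L_i L_i)^2} \end{pmatrix}$
or normalization of the image patch with mean and variance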
Determining correspondences
[Figure: comparing two local descriptors]
Vector comparison using the Mahalanobis distance
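$\mathrm{dist}_M(\mathbf{p}, \mathbf{q}) = \sqrt{(\mathbf{p} - \mathbf{q})^\top \Lambda^{-1} (\mathbf{p} - \mathbf{q})}$, where $\Lambda$ is the covariance matrix of the descriptor components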
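A small sketch of the Mahalanobis comparison, assuming the covariance matrix is given as a list of rows and is invertible. The helper `solve` (plain Gaussian elimination) is only there to keep the example dependency-free; it is not part of the lecture material.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis(p, q, cov):
    """sqrt((p - q)^T cov^{-1} (p - q)) without forming the inverse."""
    d = [pi - qi for pi, qi in zip(p, q)]
    z = solve(cov, d)  # z = cov^{-1} d
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


identity = [[1.0, 0.0], [0.0, 1.0]]
d_euclid = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)        # Euclidean case
d_scaled = mahalanobis([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

With the identity covariance the distance reduces to the Euclidean one; a component with larger variance contributes proportionally less.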
Selection of similar images
• In a large database
– voting algorithm
– additional constraints
• Rapid access with an indexing mechanism
Voting algorithm
[Figure: vector of local characteristics]
Voting algorithm
[Figure: votes accumulated by the model images: 2, 1, 1, 1, 0]
I1 is the corresponding model image
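The voting step can be sketched like this, assuming a database of (image id, descriptor) pairs and, for brevity, a Euclidean distance threshold in place of the Mahalanobis distance. The names `vote`, `database` and `max_dist` are hypothetical.

```python
from collections import Counter


def vote(query_descriptors, database, max_dist):
    """Count, per model image, how many query descriptors find a close match.

    database is a list of (image_id, descriptor) pairs. Each query
    descriptor gives at most one vote to any model image.
    """
    votes = Counter()
    for q in query_descriptors:
        matched = set()
        for image_id, d in database:
            dist = sum((a - b) ** 2 for a, b in zip(q, d)) ** 0.5
            if dist <= max_dist:
                matched.add(image_id)
        votes.update(matched)
    return votes


database = [("I1", [0.0, 0.0]), ("I1", [1.0, 1.0]), ("I2", [5.0, 5.0])]
query = [[0.1, 0.0], [1.0, 1.1], [5.0, 5.1]]
votes = vote(query, database, max_dist=0.5)
best_image = votes.most_common(1)[0][0]
```

The model image with the most votes is selected; the linear scan over the database is where an indexing mechanism would be substituted for large collections.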
Additional constraints
• Semi-local constraints
– neighboring points should match
– angles, length ratios should be similar
[Figure: neighboring points labeled 1, 2, 3 in both images]
• Global constraints
– robust estimation of the image transformation (homography, epipolar geometry)
Results
database with ~1000 images
Results
Harris detector
Interest points extracted with Harris (~ 500 points)
Cross-correlation matching
Initial matches (188 pairs)
Global constraints
Robust estimation of the fundamental matrix
99 inliers
89 outliers
Summary of Harris detector
• Very good results in the presence of occlusion and
clutter
– local information
– discriminant greyvalue information
– invariance to image rotation and illumination
• Not invariant to scale and affine changes
• Solution for more general view point changes
– local descriptors invariant to scale and rotation
– extraction of invariant points and regions
Scale Invariant Feature Detection
• Consider two images of the same scene, related by
a scale change (i.e. zooming)
• How are their scale space representations related?
Scale Space Theory (Lindeberg ’98):
normalized derivatives: $\partial_\xi = t^{\gamma/2}\,\partial_x$, $\partial_\eta = t^{\gamma/2}\,\partial_y$
$(x', y') = (sx, sy)$
$I(x, y) = I'(x', y')$
$t' = s^2 t$
$L'_{\xi'^m}(x', y', t') = s^{m(\gamma - 1)}\, L_{\xi^m}(x, y, t)$
γ is a free parameter for the task at hand
Normalized derivatives have the same value at
corresponding relative scales
Harris detector + scale changes
Scale Adapted Harris Detector
Many corresponding points at which the scale factor corresponds
to scale change between images
Scale invariant Harris points
• Multi-scale extraction of Harris interest points
• Selection of points at characteristic scale in scale space
Laplacian
Characteristic scale:
- maximum in scale space
- scale invariant
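As an illustration of characteristic scale selection, consider an idealized Gaussian blob of standard deviation s0. Smoothing it with G(σ) gives a Gaussian of variance s0² + σ², so the scale-normalized Laplacian at the blob center has the closed form −σ² / (π (s0² + σ²)²), whose magnitude peaks at σ = s0. The sketch below only evaluates this formula on a discrete set of scales; the blob model and function names are assumptions made for the example.

```python
import math


def normalized_laplacian_at_center(blob_sigma, sigma):
    """Scale-normalized Laplacian response at the center of a Gaussian blob."""
    v = blob_sigma ** 2 + sigma ** 2  # variance after smoothing with G(sigma)
    return sigma ** 2 * (-1.0 / (math.pi * v ** 2))


def characteristic_scale(blob_sigma, sigmas):
    """Scale at which the magnitude of the normalized Laplacian is largest."""
    return max(sigmas,
               key=lambda s: abs(normalized_laplacian_at_center(blob_sigma, s)))


sigmas = [2 ** (i / 4) for i in range(13)]  # 1, 2^(1/4), ..., 8
scale_small = characteristic_scale(2.0, sigmas)  # blob of size 2
scale_large = characteristic_scale(4.0, sigmas)  # same blob zoomed by 2
```

Zooming the blob by a factor of 2 moves the selected scale by the same factor, which is what makes the selected scale usable as a scale-invariant region size.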
Scale invariant interest points
multi-scale Harris points
selection of points
at the characteristic scale
with Laplacian
invariant points + associated regions [Mikolajczyk & Schmid’01]
Harris-Laplacian Feature
Harris detector – adaptation to scale
Evaluation of scale invariant detectors
repeatability – scale changes
SIFT: Overview
• 1999
• Generates image features, “keypoints”
– invariant to image scaling and rotation
– partially invariant to change in illumination and
3D camera viewpoint
– many can be extracted from typical images
– highly distinctive
Algorithm overview
1. Scale-space extrema detection
– Uses difference-of-Gaussian function
2. Keypoint localization
– Sub-pixel location and scale fit to a model
3. Orientation assignment
– 1 or more for each keypoint
4. Keypoint descriptor
– Created from local image gradients
Scale space
• Definition:
$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$
where $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2}$
Scale space
• Keypoints are detected using scale-space
extrema in difference-of-Gaussian function D
• D definition:
$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$
• Efficient to compute
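Relationship of D to $\sigma^2 \nabla^2 G$
• Close approximation to scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$
• Diffusion equation: $\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$
• Approximate ∂G/∂σ: $\frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$
– giving, $\frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma} \approx \sigma \nabla^2 G$, so that
$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G$
• When D has scales differing by a constant factor it already incorporates the σ² scale normalization required for scale-invariance
• e.g., $k = \sqrt{2}$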
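A quick numerical check of the approximation G(kσ) − G(σ) ≈ (k − 1) σ² ∇²G, evaluated pointwise on the Gaussian kernel itself. The sample points and the value k = 1.01 are arbitrary choices for the illustration; for larger k the two sides agree less closely in value.

```python
import math


def gaussian(x, y, sigma):
    """2D Gaussian G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def laplacian_of_gaussian(x, y, sigma):
    """Analytic Laplacian of G: ((x^2 + y^2 - 2 sigma^2) / sigma^4) * G."""
    r2 = x * x + y * y
    return (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)


def dog(x, y, sigma, k):
    """Difference of Gaussians G(k sigma) - G(sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)


def relative_error(x, y, sigma, k):
    """Relative gap between the DoG and (k - 1) sigma^2 Laplacian(G)."""
    approx = (k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)
    return abs(dog(x, y, sigma, k) - approx) / abs(approx)


err_center = relative_error(0.0, 0.0, 1.6, 1.01)
err_offset = relative_error(1.0, 0.5, 1.6, 1.01)
```

Both errors come out at roughly one percent for this k, consistent with the finite-difference step being a first-order approximation of ∂G/∂σ.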
Scale space construction
[Figure: stacks of Gaussian-smoothed images at scales σ, kσ, 2σ, 2kσ, 2k²σ and σ, kσ, 2σ, 2kσ]
Scale space
• A collection of images obtained by progressively smoothing the
input image
• Analogous to gradually reducing image resolution
• See Vedaldi’s implementation (http://www.vlfeat.org/ ) for details
• Discretized scales
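$\sigma(s, o) = \sigma_0\, 2^{(s/S + o)}$, $\mathrm{DoG}(\sigma(o, s)) = I_o(\sigma(o, s + 1)) - I_o(\sigma(o, s))$
$\sigma_0 = 1.6$, $s = 0, \ldots, S - 1$, $o = o_{\min}, \ldots, o_{\min} + O - 1$
S : # of scales per octave, O : # of octaves
$S = 3$, $O = \log_2(\min(I_w, I_h))$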
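The discretized scales can be tabulated directly from the formula above. This is a sketch with the slide's defaults (σ0 = 1.6, S = 3); rounding the octave count down to an integer is an assumption, since the slide gives the logarithm without rounding.

```python
import math


def sigma(s, o, sigma0=1.6, S=3):
    """Scale of level s in octave o: sigma0 * 2^(o + s/S)."""
    return sigma0 * 2.0 ** (o + s / S)


def num_octaves(width, height):
    """O = log2(min(width, height)), rounded down (assumption)."""
    return int(math.floor(math.log2(min(width, height))))


first_octave = [sigma(s, 0) for s in range(3)]  # sigma0 times 2^0, 2^(1/3), 2^(2/3)
octaves_vga = num_octaves(640, 480)
```

Level s = S of one octave has the same scale as level s = 0 of the next, so each octave doubles σ.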
Scale space images
[Figure: scale space images for the first, second, third, and fourth octaves]
Difference-of-Gaussian images
[Figure: difference-of-Gaussian images for the first, second, third, and fourth octaves]
Frequency of sampling
• There is no minimum
• Best frequency determined experimentally
Prior smoothing for each octave
• Increasing σ increases robustness, but at a cost in efficiency
• σ = 1.6 a good tradeoff
• Doubling the image initially increases number of
keypoints
Finding extrema
• Sample point D(x,y,σ) is selected only if it is a minimum or a maximum among its 26 neighbors: 8 in the same DoG image and 9 in each of the two adjacent scales
Extrema in this image
DoG scale space
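The extremum test can be sketched as a comparison against the 3x3x3 neighborhood in the DoG stack. The indexing convention `dog[k][i][j]` (scale, row, column) is an assumption of this example.

```python
def is_extremum(dog, k, i, j):
    """True if dog[k][i][j] is strictly above or below all 26 neighbors."""
    center = dog[k][i][j]
    neighbors = [dog[k + dk][i + di][j + dj]
                 for dk in (-1, 0, 1)
                 for di in (-1, 0, 1)
                 for dj in (-1, 0, 1)
                 if (dk, di, dj) != (0, 0, 0)]
    return center > max(neighbors) or center < min(neighbors)


def make_stack(center_value):
    """3x3x3 stack of zeros with the given value in the middle."""
    stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    stack[1][1][1] = center_value
    return stack


peak = is_extremum(make_stack(1.0), 1, 1, 1)
valley = is_extremum(make_stack(-1.0), 1, 1, 1)
plateau = is_extremum(make_stack(0.0), 1, 1, 1)
```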
Localization
• 3D quadratic function is fit to the local
sample points
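• Start with Taylor expansion with sample point as the origin:
$D(X) = D + \frac{\partial D}{\partial X}^{\!\top} X + \frac{1}{2}\, X^\top \frac{\partial^2 D}{\partial X^2}\, X$
– where $X = (x, y, \sigma)^\top$
• Take the derivative with respect to X, and set it to 0, giving
$\frac{\partial D}{\partial X} + \frac{\partial^2 D}{\partial X^2}\, \hat{X} = 0$, i.e. $\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X}$
• $\hat{X}$ is the location of the keypoint
• This is a 3x3 linear system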
Localization
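$\frac{\partial^2 D}{\partial X^2}\, \hat{X} = -\frac{\partial D}{\partial X}$
$\begin{pmatrix} \frac{\partial^2 D}{\partial \sigma^2} & \frac{\partial^2 D}{\partial \sigma\, \partial y} & \frac{\partial^2 D}{\partial \sigma\, \partial x} \\ \frac{\partial^2 D}{\partial \sigma\, \partial y} & \frac{\partial^2 D}{\partial y^2} & \frac{\partial^2 D}{\partial y\, \partial x} \\ \frac{\partial^2 D}{\partial \sigma\, \partial x} & \frac{\partial^2 D}{\partial y\, \partial x} & \frac{\partial^2 D}{\partial x^2} \end{pmatrix} \begin{pmatrix} \sigma \\ y \\ x \end{pmatrix} = -\begin{pmatrix} \frac{\partial D}{\partial \sigma} \\ \frac{\partial D}{\partial y} \\ \frac{\partial D}{\partial x} \end{pmatrix}$
• Hessian and derivative approximated by finite differences,
– example:
$\frac{\partial D}{\partial \sigma} = \frac{D_{k+1}^{i,j} - D_{k-1}^{i,j}}{2}$
$\frac{\partial^2 D}{\partial \sigma^2} = D_{k-1}^{i,j} - 2 D_k^{i,j} + D_{k+1}^{i,j}$
$\frac{\partial^2 D}{\partial \sigma\, \partial y} = \frac{(D_{k+1}^{i+1,j} - D_{k-1}^{i+1,j}) - (D_{k+1}^{i-1,j} - D_{k-1}^{i-1,j})}{4}$
• If the offset $\hat{X}$ is > 0.5 in any dimension, the process is repeated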
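The localization step can be sketched as follows: finite differences for the gradient and Hessian on a 3x3x3 neighborhood, then a 3x3 linear solve. The indexing `dog[k][i][j]` (scale, row = y, column = x), the ordering X = (x, y, σ), and the helper `solve` are choices made for this example only.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def refine(dog, k, i, j):
    """Return (offset, value): sub-sample offset (x, y, sigma) and D at it."""
    d = lambda dk, di, dj: dog[k + dk][i + di][j + dj]
    c = d(0, 0, 0)
    grad = [(d(0, 0, 1) - d(0, 0, -1)) / 2.0,   # dD/dx
            (d(0, 1, 0) - d(0, -1, 0)) / 2.0,   # dD/dy
            (d(1, 0, 0) - d(-1, 0, 0)) / 2.0]   # dD/dsigma
    hxx = d(0, 0, 1) - 2 * c + d(0, 0, -1)
    hyy = d(0, 1, 0) - 2 * c + d(0, -1, 0)
    hss = d(1, 0, 0) - 2 * c + d(-1, 0, 0)
    hxy = (d(0, 1, 1) - d(0, 1, -1) - d(0, -1, 1) + d(0, -1, -1)) / 4.0
    hxs = (d(1, 0, 1) - d(1, 0, -1) - d(-1, 0, 1) + d(-1, 0, -1)) / 4.0
    hys = (d(1, 1, 0) - d(1, -1, 0) - d(-1, 1, 0) + d(-1, -1, 0)) / 4.0
    hessian = [[hxx, hxy, hxs], [hxy, hyy, hys], [hxs, hys, hss]]
    offset = solve(hessian, [-g for g in grad])
    value = c + 0.5 * sum(g * o for g, o in zip(grad, offset))
    return offset, value


# Synthetic quadratic with its peak (value 1) at offset (0.2, -0.1, 0.3).
f = lambda x, y, s: 1 - (x - 0.2) ** 2 - 2 * (y + 0.1) ** 2 - 0.5 * (s - 0.3) ** 2
stack = [[[f(j - 1, i - 1, k - 1) for j in range(3)] for i in range(3)]
         for k in range(3)]
offset, value = refine(stack, 1, 1, 1)
```

On an exactly quadratic function the finite differences are exact, so the true peak is recovered; on real DoG data the result is an estimate, and an offset above 0.5 signals that a neighboring sample point is closer to the extremum.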
Filtering
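• Contrast (use prev. equation):
$D(\hat{X}) = D + \frac{1}{2}\, \frac{\partial D}{\partial X}^{\!\top} \hat{X}$
– If $|D(\hat{X})| < 0.03$, throw it out
• Edge-iness:
– Use ratio of principal curvatures to throw out poorly defined peaks
– Curvatures come from Hessian: $H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$
– Ratio of Trace(H)² and Determinant(H):
$\operatorname{Tr}(H) = D_{xx} + D_{yy}$, $\operatorname{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2$, $\text{ratio} = \frac{\operatorname{Tr}(H)^2}{\operatorname{Det}(H)}$
– If ratio > (r+1)²/r, throw it out (SIFT uses r=10)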
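Both filters reduce to a few arithmetic comparisons. A sketch, assuming the derivatives have already been estimated; rejecting points with a non-positive determinant (curvatures of opposite sign) is an extra rule not stated on the slide.

```python
def passes_contrast(value, threshold=0.03):
    """Keep the keypoint only if |D(X_hat)| reaches the threshold."""
    return abs(value) >= threshold


def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the keypoint only if Tr(H)^2 / Det(H) <= (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Assumption: opposite-sign curvatures are treated as "not an extremum".
        return False
    return trace * trace / det <= (r + 1) ** 2 / r


blob_ok = passes_edge_test(-1.0, -1.0, 0.0)    # ratio 4: well-defined peak
ridge_ok = passes_edge_test(-20.0, -1.0, 0.0)  # ratio 22.05: edge-like
```

With r = 10 the threshold is 12.1, so a peak whose principal curvatures differ by more than a factor of 10 is discarded.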
Orientation assignment
• Descriptor computed relative to keypoint’s
orientation achieves rotation invariance
• Gradient orientation is precomputed along with magnitude for all levels (useful in descriptor computation):
$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$
$\theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)$
• Multiple orientations assigned to keypoints from
an orientation histogram
– Significantly improve stability of matching
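The gradient formulas and the orientation histogram can be sketched as below. The 36-bin histogram is an assumption (the slides do not give a bin count), and the Gaussian weighting and histogram smoothing are left out to keep the example short.

```python
import math


def gradient(L, x, y):
    """Gradient magnitude and orientation of the smoothed image L at (x, y)."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)


def orientation_histogram(L, bins=36):
    """Magnitude-weighted histogram of orientations over the interior of L."""
    hist = [0.0] * bins
    for y in range(1, len(L) - 1):
        for x in range(1, len(L[0]) - 1):
            m, theta = gradient(L, x, y)
            b = int((theta % (2 * math.pi)) / (2 * math.pi) * bins) % bins
            hist[b] += m
    return hist


# Diagonal ramp: the gradient points at 45 degrees everywhere.
ramp = [[float(x + y) for x in range(5)] for y in range(5)]
m, theta = gradient(ramp, 2, 2)
hist = orientation_histogram(ramp)
dominant_bin = hist.index(max(hist))  # bin covering 40 to 50 degrees
```

The keypoint orientation is taken at the histogram peak; secondary peaks give the additional orientations mentioned above.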
Keypoint images
Keypoint Selection
Finding extrema
with DoG
Removing
|D(X)| < 0.03
Removing
|D(X)| < 0.03
Select canonical orientation
• Create histogram of local
gradient directions computed
at selected scale
• Assign canonical orientation
at peak of smoothed
histogram
• Each key specifies stable 2D
coordinates (x, y, scale,
orientation)
Descriptor
• Descriptor has 3 dimensions (x,y,θ)
• Orientation histogram of gradient magnitudes
• Position and orientation of each gradient sample rotated
relative to keypoint orientation
Descriptor
• Weight magnitude of each sample point by Gaussian
weighting function
• Distribute each sample to adjacent bins by trilinear
interpolation (avoids boundary effects)
Descriptor
• Best results achieved with 4x4x8 = 128
descriptor size
• Normalize to unit length
– Reduces effect of illumination change
• Cap each element to 0.2, normalize again
– Reduces non-linear illumination changes
– 0.2 determined experimentally
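The normalization steps above amount to: scale to unit length, cap each element at 0.2, scale to unit length again. A minimal sketch for a non-negative descriptor vector; the function name is illustrative.

```python
import math


def normalize_descriptor(vec, cap=0.2):
    """Unit-normalize, cap each element at `cap`, then unit-normalize again."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n > 0 else list(v)

    capped = [min(x, cap) for x in unit(vec)]
    return unit(capped)


raw = [10.0, 1.0, 1.0, 1.0]        # one dominant gradient bin
desc = normalize_descriptor(raw)
desc_brighter = normalize_descriptor([3.0 * x for x in raw])  # contrast change
```

The first normalization cancels a multiplicative change in contrast; the cap limits the influence of a few very large gradient magnitudes.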
Object detection
• Create a database
of keypoints from
training images
• Match keypoints to
a database
– Nearest neighbor
search
PCA-SIFT
• Different descriptor (same keypoints)
• Apply PCA to the gradient patch
• Descriptor size is 20 (instead of 128)
• More robust, faster
SIFT: Summary
• Scale space
• Difference-of-Gaussian
• Localization
• Filtering
• Orientation assignment
• Descriptor, 128 elements
Object recognition
• Definition: Identify an object and determine its
pose and model parameters
• Commercial object recognition
– $4 billion/year industry for inspection and assembly
– Almost entirely based on template matching
• Upcoming applications
– Mobile robots, toys, user interfaces
– Location recognition
– Digital camera panoramas, 3D scene modeling
Invariant local features
• Image content => local feature coordinates invariant to translation,
rotation, scale
• Technical details regarding
– Keypoint matching (using 1st and 2nd nearest neighbors)
– Efficient nearest neighbor indexing
– Clustering with Hough transform (model location, orientation, scale)
– Account for affine distortion
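Matching with the 1st and 2nd nearest neighbors is commonly done as a distance-ratio test: a match is accepted only if the closest database descriptor is clearly closer than the second closest. The sketch below uses a brute-force search and a ratio threshold of 0.8; the threshold value and all names are assumptions, not taken from the slides.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def match_keypoints(query, database, max_ratio=0.8):
    """Return (query_index, database_index) pairs passing the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, d), di) for di, d in enumerate(database))
        if len(dists) >= 2 and dists[0][0] < max_ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches


database = [[0.0, 0.0], [10.0, 0.0], [10.0, 1.0]]
query = [[0.5, 0.0],    # unambiguous: close to entry 0 only
         [10.0, 0.5]]   # ambiguous: equally close to entries 1 and 2
matches = match_keypoints(query, database)
```

The ambiguous query is dropped even though its nearest neighbor is close, which is how the test suppresses matches to repeated or non-distinctive structures. The sorted brute-force search stands in for the efficient nearest neighbor indexing listed above.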
Features
Advantages of invariant local features
• Locality: features are local, so robust to
occlusion and clutter (no prior segmentation)
• Distinctiveness: individual features can be
matched to a large database of objects
• Quantity: many features can be generated for
even small objects
• Efficiency: close to real-time performance
Experimental evaluation
Scale change (factor 2.5)
Harris-Laplace
DoG
Viewpoint change (60 degrees)
Harris-Affine (Harris-Laplace)
Descriptors - conclusion
• SIFT + steerable perform best
• Performance of the descriptor independent of the
detector
• Errors due to imprecision in region estimation,
localization
Image retrieval
> 5000 images
change in viewing angle
Matches
22 correct matches
Image retrieval
> 5000 images
change in viewing angle
+ scale change
Matches
33 correct matches
Multiple panoramas from an unordered image set
Image registration and blending
Location recognition
Robot localization
• Joint work with Stephen Se, Jim Little
Recognizing panoramas
• Matthew Brown and David Lowe
• Recognize overlap from an unordered set of images and
automatically stitch together
• SIFT features provide initial feature matching
• Image blending at multiple scales hides the seams
Panorama automatically assembled from 143 images
Map continuously built over time
Planar recognition
• Planar surfaces can be
reliably recognized at a
rotation of 60° away from
the camera
• Affine fit approximates
perspective projection
• Only 3 points are needed
for recognition
3D object recognition
• Extract
outlines with
background
subtraction
3D object recognition
• Only 3 keys are needed
for recognition, so extra
keys provide robustness
• Affine model is no longer
as accurate
Recognition under occlusion
Comparison to template matching
• Costs of template matching
– 250,000 locations x 30 orientations x 4 scales = 30,000,000
evaluations
– Does not easily handle partial occlusion and other variation
without large increase in template numbers
– Viola & Jones cascade must start again for each qualitatively
different template
• Costs of local feature approach
– 3000 evaluations (reduction by factor of 10,000)
– Features are more invariant to illumination, 3D rotation, and
object variation
– Use of many small subtemplates increases robustness to partial
occlusion and other variations
More local features
• Harris Laplace
• Harris Affine
• Hessian detector
• Hessian Laplace
• Hessian Affine
• MSER (Maximally Stable Extremal Regions)
• SURF (Speeded-Up Robust Features)