Transcript PPT

Paper Overviews
3 types of descriptors:
SIFT / PCA-SIFT (Ke, Sukthankar)
GLOH (Mikolajczyk, Schmid)
DAISY (Tola, et al, Winder, et al)
Comparison of descriptors (Mikolajczyk, Schmid)
Paper Overviews
PCA-SIFT: SIFT-based, but with a smaller descriptor
GLOH: modifies the SIFT descriptor for robustness and distinctiveness
DAISY: novel descriptor, used with graph cuts for dense matching
and depth map estimation
SIFT
• “Scale Invariant Feature Transform”
• 4 stages:
1. Peak selection
2. Keypoint localization
3. Keypoint orientation
4. Descriptors
SIFT
• 1. Peak Selection
• Make Gaussian pyramid
http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html
SIFT
• 1. Peak Selection
• Find local peaks using difference of
Gaussians
– Peaks are found at different scales
http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html
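The peak-selection stage can be sketched as follows. This is a minimal illustration, not Lowe's implementation: it blurs the image at a few hand-picked scales without downsampling (so it is a scale stack rather than a true pyramid), and the scale values, image size and function names are all assumptions made for the example.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    weights = [math.exp(-k * k / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def clamp(i, n):
    """Replicate border pixels by clamping the index into [0, n - 1]."""
    return min(max(i, 0), n - 1)

def blur(image, sigma):
    """Separable Gaussian blur: horizontal pass, then vertical pass."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    rows = [[sum(kernel[k + radius] * image[y][clamp(x + k, width)]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]
    return [[sum(kernel[k + radius] * rows[clamp(y + k, height)][x]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def dog_peaks(image, sigmas):
    """Return (level, y, x) for extrema over the 26 scale-space neighbours."""
    blurred = [blur(image, s) for s in sigmas]
    height, width = len(image), len(image[0])
    # Difference of Gaussians between each pair of adjacent scales.
    dog = [[[b[y][x] - a[y][x] for x in range(width)] for y in range(height)]
           for a, b in zip(blurred, blurred[1:])]
    peaks = []
    for level in range(1, len(dog) - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                value = dog[level][y][x]
                neighbours = [dog[l][yy][xx]
                              for l in (level - 1, level, level + 1)
                              for yy in (y - 1, y, y + 1)
                              for xx in (x - 1, x, x + 1)
                              if (l, yy, xx) != (level, y, x)]
                if value > max(neighbours) or value < min(neighbours):
                    peaks.append((level, y, x))
    return peaks

# Toy input (assumption): one Gaussian blob of sigma 2 centred in a 33x33 image.
size, centre = 33, 16
image = [[math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / 8.0)
          for x in range(size)] for y in range(size)]
peaks = dog_peaks(image, [1.0, 1.6, 2.56, 4.096])
```

With this toy blob, the centre pixel should be reported as a peak at the middle difference-of-Gaussians level, which is the level whose scales bracket the blob size.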
SIFT
• 2. Keypoint Localization
–Remove peaks that are “unstable”:
» Peaks in low-contrast areas
» Peaks along edges
» Features not distinguishable
SIFT
• 3. Keypoint Orientation
• Make histogram of gradients for a patch
of pixels
• Orient all patches so the dominant
gradient direction is vertical
http://www.inf.fu-berlin.de/lehre/SS09/CV/uebungen/uebung09/SIFT.pdf
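A rough sketch of the orientation step: build a histogram of gradient directions over a patch, weighted by gradient magnitude, and take the strongest bin as the dominant direction. The 36-bin histogram, the unweighted (no Gaussian window) accumulation, and the convention that "vertical" means 90 degrees in image coordinates are assumptions for illustration.

```python
import math

def dominant_orientation(patch, bins=36):
    """Return the centre (in degrees) of the strongest orientation bin."""
    height, width = len(patch), len(patch[0])
    histogram = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            index = int(angle / (360.0 / bins)) % bins
            histogram[index] += math.hypot(dx, dy)  # weight by magnitude
    best = max(range(bins), key=lambda i: histogram[i])
    return (best + 0.5) * (360.0 / bins)

def rotation_to_vertical(patch):
    """Angle (degrees) that would rotate the dominant direction to 90."""
    return (90.0 - dominant_orientation(patch)) % 360.0

# Toy patch (assumption): intensity increases along the diagonal,
# so every gradient points at 45 degrees.
ramp = [[x + y for x in range(9)] for y in range(9)]
print(dominant_orientation(ramp), rotation_to_vertical(ramp))
```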
SIFT
• 4. Descriptors
• Ideal descriptor:
• Compact
• Distinctive from other descriptors
• Robust against lighting / viewpoint changes
SIFT
• 4. Descriptors
• A SIFT descriptor is a 128-element
vector:
– 4x4 array of 8-bin histograms (4 x 4 x 8 = 128)
– Each histogram is a smoothed
representation of the gradient orientations in its
part of the patch
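To make the 4x4x8 layout concrete, here is a simplified sketch that builds a 128-element vector from a 16x16 patch. It deliberately leaves out the smoothing (Gaussian weighting and interpolation between bins) that the real SIFT descriptor applies; the patch size and function name are assumptions for the example.

```python
import math

def sift_like_descriptor(patch):
    """16x16 grey-value patch -> 128 values (4x4 cells, 8 orientation bins)."""
    size, cells, bins = 16, 4, 8
    cell = size // cells
    histogram = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            # Finite differences, with indices clamped at the patch border.
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            orientation_bin = int(angle / (2.0 * math.pi) * bins) % bins
            cell_index = (y // cell) * cells + (x // cell)
            histogram[cell_index * bins + orientation_bin] += math.hypot(dx, dy)
    norm = math.sqrt(sum(v * v for v in histogram)) or 1.0
    return [v / norm for v in histogram]

# Toy patch (assumption): a horizontal intensity ramp.
ramp = [[x for x in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp)
print(len(descriptor))
```

For the ramp, all gradient mass falls into the first orientation bin of each of the 16 cells.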
PCA-SIFT
• Changes step 4 of the SIFT process to
create different descriptors
• Rationale:
– Construction of SIFT descriptors is
complicated
– Reason for constructing them that way is
unclear
– Is there a simpler alternative?
PCA-SIFT
• “Principal Component Analysis” (PCA)
• A widely-used method of dimensionality
reduction
• Used with SIFT to make a smaller feature
descriptor
–By projecting the gradient patch into a
smaller space
PCA-SIFT
–Creating a descriptor for keypoints:
1. Create patch eigenspace
2. Create projection matrix
3. Create feature vector
PCA-SIFT
–1. Create patch eigenspace
–For each keypoint:
•Take a 41x41 patch around the keypoint
•Compute horizontal / vertical gradients
–Put all gradient vectors for all keypoints into
a matrix
PCA-SIFT
–1. Create patch eigenspace
–M = matrix of gradients for all keypoints
–Calculate covariance of M
–Calculate eigenvectors of covariance(M)
PCA-SIFT
–2. Create projection matrix
–Choose first n eigenvectors
–This paper uses n = 20
–This is the projection matrix
–Store for later use, no need to re-compute
PCA-SIFT
–3. Create feature vector
–For a single keypoint:
•Take its gradient vector, project it with the
projection matrix
•Feature vector is of size n
–This is called Grad PCA in the paper
–“Img PCA” - use image patch instead of
gradient
–Size difference: 128 elements (SIFT) vs. n =
20
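The three steps above can be sketched in plain Python. This is an illustration under assumptions, not the paper's code: the toy vectors have 3 elements instead of the gradients of a 41x41 patch, n = 2 instead of 20, and power iteration with deflation stands in for a full eigendecomposition (a numerical library would normally be used for real patch sizes).

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_projection(vectors, n, iterations=200, seed=0):
    """Steps 1 and 2: return (mean, basis) with the top-n eigenvectors
    of the covariance matrix of the gradient vectors."""
    count, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / count for i in range(dim)]
    centred = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    cov = [[sum(row[i] * row[j] for row in centred) / (count - 1)
            for j in range(dim)] for i in range(dim)]
    rng = random.Random(seed)
    basis = []
    for _component in range(n):
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        start_norm = math.sqrt(dot(vec, vec))
        vec = [v / start_norm for v in vec]
        for _step in range(iterations):
            product = [dot(row, vec) for row in cov]
            norm = math.sqrt(dot(product, product))
            if norm == 0.0:
                break
            vec = [p / norm for p in product]
        eigenvalue = dot(vec, [dot(row, vec) for row in cov])
        basis.append(vec)
        # Deflate: remove the component just found from the covariance.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
    return mean, basis

def project(vector, mean, basis):
    """Step 3: project one gradient vector into the n-dimensional space."""
    centred = [x - m for x, m in zip(vector, mean)]
    return [dot(centred, b) for b in basis]

# Toy "gradient vectors" (assumption): points spread mostly along (1, 1, 0).
gradient_vectors = [[t, t, 0.1 * ((i % 3) - 1)]
                    for i, t in enumerate(range(-5, 6))]
mean, basis = build_projection(gradient_vectors, n=2)
feature = project(gradient_vectors[0], mean, basis)
```

The projection (mean and basis) is computed once and reused for every keypoint, which matches the "store for later use" note in step 2.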
PCA-SIFT
–Results
–Tested SIFT vs. “Grad PCA” and “Img PCA” on
a series of image variations:
–Gaussian noise
–45° rotation followed by 50% scaling
–50% intensity scaling
–Projective warp
PCA-SIFT
–Results (Precision-recall curves)
–Grad PCA (black) generally outperforms Img
PCA (pink) and SIFT (purple) except when
brightness is reduced
–Both PCA methods outperform SIFT with
illumination changes
PCA-SIFT
–Results
–PCA-SIFT also gets more matches correct on
images taken at different viewpoints
A Performance Evaluation of Local
Descriptors
Krystian Mikolajczyk and Cordelia Schmid
Problem Setting for Comparison

Matching Problem
From a slide of David G. Lowe (IJCV 2004)
As we did in Project2: Panorama, we want to find correct
pairs of points in two images.
Overview of Compared Methods
Region Detector
detects interest points

Region Descriptor
describes the points

Matching Strategy
How to find pairs of points in two images?

Region Detector
Harris Points
• Blob Structure Detector
1. Harris-Laplace Regions (similar to DoG)
2. Hessian-Laplace Regions
3. Harris-Affine Regions
4. Hessian-Affine Regions
• Edge Detector
Canny Detector

Region Descriptors
Descriptor               Dimension   Category
SIFT                     128         SIFT-based descriptor
PCA-SIFT                 36          SIFT-based descriptor
GLOH                     128         SIFT-based descriptor
Shape Context            36          Similar to SIFT, but focuses on edge locations found with the Canny detector
Spin                     50          A sparse set of affine-invariant local patches is used
Steerable Filter         14          Differential descriptor: focuses on the properties of local derivatives (local jet)
Differential Invariants  14          Differential descriptor: focuses on the properties of local derivatives (local jet)
Complex Filters          1681        Consists of many filters
Gradient Moments         20          Moment-based descriptor
Cross Correlation        81          Uniformly sampled locations
Distance Measure: Euclidean or Mahalanobis, depending on the descriptor
Matching Strategy

Threshold-Based Matching

Nearest Neighbor Matching – Threshold
$\|D_A - D_B\| < \text{threshold}$
$D_B$: the first (nearest) neighbor of $D_A$
Nearest Neighbor Matching – Distance Ratio
$\|D_A - D_B\| \,/\, \|D_A - D_C\| < \text{threshold}$
$D_B$: the first neighbor
$D_C$: the second neighbor
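The three matching strategies can be compared side by side in a short sketch. Descriptors are plain tuples and the distance is Euclidean; the function names and the toy descriptors are assumptions for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_matches(query, candidates, threshold):
    """Threshold-based: every candidate closer than the threshold matches,
    so one descriptor can have several matches."""
    return [i for i, c in enumerate(candidates)
            if distance(query, c) < threshold]

def nn_threshold_match(query, candidates, threshold):
    """Nearest neighbor, accepted only if its distance is below the threshold."""
    best = min(range(len(candidates)),
               key=lambda i: distance(query, candidates[i]))
    return best if distance(query, candidates[best]) < threshold else None

def nn_ratio_match(query, candidates, threshold):
    """Nearest neighbor, accepted only if d(first) / d(second) < threshold."""
    order = sorted(range(len(candidates)),
                   key=lambda i: distance(query, candidates[i]))
    first, second = order[0], order[1]
    d1 = distance(query, candidates[first])
    d2 = distance(query, candidates[second])
    return first if d2 > 0 and d1 / d2 < threshold else None

d_a = (0.0, 0.0)
candidates = [(1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
print(threshold_matches(d_a, candidates, 4.0))
print(nn_threshold_match(d_a, candidates, 4.0))
print(nn_ratio_match(d_a, candidates, 0.8))
```

The distance-ratio test rejects a nearest neighbor when the second neighbor is almost as close, which is the case where the match is ambiguous.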
Performance Measurements

Repeatability rate, ROC

Recall-Precision
$\text{Recall} = \dfrac{\text{TP (true positives)}}{\text{actual positives}} = \dfrac{\#\text{ of correct matches}}{\text{total } \#\text{ of correct matches}}$
$\text{Precision} = \dfrac{\text{TP (true positives)}}{\text{predicted positives}} = \dfrac{\#\text{ of correct matches}}{\#\text{ of correct matches} + \#\text{ of false matches}}$
Example of Recall-Precision
Let's say that our method detected the following:
* 50 corresponding pairs were extracted (A: predicted positives)
* 40 of the detected pairs were correct (C: true positives)
* According to the ground truth, there are 200 correct pairs (B: actual positives)
Then,
Recall = C/B = 40/200 = 20%
Precision = C/A = 40/50 = 80%

The perfect descriptor gives 100% recall for any value of precision.
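The same arithmetic as a small helper, using the numbers from the example (the function and parameter names are made up for illustration):

```python
def recall_precision(num_correct, num_extracted, num_ground_truth):
    """Recall = C / B and precision = C / A for one matching run."""
    recall = num_correct / num_ground_truth
    precision = num_correct / num_extracted
    return recall, precision

recall, precision = recall_precision(num_correct=40,
                                     num_extracted=50,
                                     num_ground_truth=200)
print(f"recall = {recall:.0%}, precision = {precision:.0%}")
# prints: recall = 20%, precision = 80%
```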
DataSet

6 different transformed images
Rotation
Image Blur
JPEG Compression
Zoom + Rotation
Viewpoint Change
Light Change
Matching Strategies (Hessian-Affine regions)
[Plots: threshold-based matching; nearest neighbor matching – threshold; nearest neighbor matching – distance ratio]
Viewpoint Change
[Plots: with Hessian-Affine regions; with Harris-Affine regions]
Scale Change with Rotation
[Plots: Hessian-Laplace regions; Harris-Laplace regions]
Image Rotation of 30 to 45 degrees
[Plot: Harris points]
Image Blur
[Plot: Hessian-Affine regions]
JPEG Compression
[Plot: Hessian-Affine regions]
Illumination Changes
[Plot: Hessian-Affine regions]
Ranking of Descriptors (high performance to low performance)
1. SIFT-based descriptors, 128 dimensions: GLOH, SIFT
2. Shape Context, 36 dimensions
3. PCA-SIFT, 36 dimensions
4. Gradient moments (20 dimensions) and Steerable Filters (14 dimensions)
5. Other descriptors
Note: This ranking is for the matching problem. It is not a measure of
general performance.
Ranking of Difficult Image Transformations
Transformations (easy to difficult):
1. Scale & rotation & illumination
2. JPEG compression
3. Image blur
4. Viewpoint change
Scene types (easy to difficult):
1. Structured scene
2. Textured scene
[Figure: two textured scenes]
Other Results
• Hessian regions are better than Harris regions
• Nearest neighbor based matching is better than
simple threshold-based matching
• SIFT becomes better when the nearest neighbor
distance ratio is used
• Robust region descriptors perform better than
point-wise descriptors
• Image rotation does not have a big impact on the
accuracy of descriptors
A Fast Local Descriptor for Dense Matching
Engin Tola, Vincent Lepetit, Pascal Fua
Ecole Polytechnique Federale de Lausanne, Switzerland
Paper novelty
• Introduces DAISY local image descriptor
– much faster to compute than SIFT for dense point matching
– performs on par with or better than SIFT
• DAISY descriptors are fed into an expectation-maximization (EM) algorithm
which uses graph cuts to estimate the scene’s depth.
– works on low-quality images such as the ones captured by video streams
SIFT local image descriptor
• SIFT descriptor is a 3–D histogram in which two dimensions correspond to
image spatial dimensions and the additional dimension to the image
gradient direction (normally discretized into 8 bins)
SIFT local image descriptor
• Each bin contains a weighted sum of the norms of the image gradients
around its center, where the weights roughly depend on the distance to the
bin center
DAISY local image descriptor
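• Gaussian convolved orientation maps are calculated for every direction
$G_o^{\Sigma} = G_{\Sigma} * \left(\frac{\partial I}{\partial o}\right)^{+}$
$G_{\Sigma}$: Gaussian convolution filter with variance $\Sigma$
$\frac{\partial I}{\partial o}$: image gradient in direction $o$
$(\cdot)^{+}$: operator $(a)^{+} = \max(a, 0)$
$G_o^{\Sigma}$: orientation maps
• Every location in $G_o^{\Sigma}$ contains a value very similar to what a bin in SIFT
contains: a weighted sum of gradient norms computed over an area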
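DAISY local image descriptor
I. Histograms at every pixel location are computed
$h_{\Sigma}(u, v) = \left[G_1^{\Sigma}(u, v), \ldots, G_8^{\Sigma}(u, v)\right]^{T}$
$h_{\Sigma}(u, v)$: histogram at location (u, v)
$G_1^{\Sigma}, \ldots, G_8^{\Sigma}$: Gaussian convolved orientation maps
II. Histograms are normalized to unit norm, giving $\tilde{h}_{\Sigma}(u, v)$
III. Local image descriptor is computed as
$D(u_0, v_0) = \left[\tilde{h}_{\Sigma_1}^{T}(u_0, v_0),\ \tilde{h}_{\Sigma_1}^{T}(l_1(u_0, v_0, R_1)), \ldots, \tilde{h}_{\Sigma_1}^{T}(l_N(u_0, v_0, R_1)),\ \tilde{h}_{\Sigma_2}^{T}(l_1(u_0, v_0, R_2)), \ldots, \tilde{h}_{\Sigma_2}^{T}(l_N(u_0, v_0, R_2)),\ \tilde{h}_{\Sigma_3}^{T}(l_1(u_0, v_0, R_3)), \ldots, \tilde{h}_{\Sigma_3}^{T}(l_N(u_0, v_0, R_3))\right]^{T}$
$l_j(u, v, R)$: the location with distance R from (u,v)
in the direction given by j when the
directions are quantized into N values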
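A compact sketch of the DAISY construction described on these slides: rectified directional gradients are blurred into orientation maps, a histogram is read off at each pixel, and normalized histograms are concatenated for the centre and for points on a ring. To stay short it uses a single ring and a single blur scale; the 8 directions, the ring settings, the nearest-pixel sampling and all names are assumptions for illustration.

```python
import math

def clamp(i, n):
    return min(max(i, 0), n - 1)

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with replicated borders."""
    radius = int(3 * sigma + 0.5)
    kernel = [math.exp(-k * k / (2.0 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    height, width = len(image), len(image[0])
    rows = [[sum(kernel[k + radius] * image[y][clamp(x + k, width)]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]
    return [[sum(kernel[k + radius] * rows[clamp(y + k, height)][x]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def orientation_maps(image, sigma, directions=8):
    """One blurred map per direction o of max(dI/do, 0)."""
    height, width = len(image), len(image[0])
    maps = []
    for o in range(directions):
        angle = 2.0 * math.pi * o / directions
        rectified = [[max(
            math.cos(angle) * 0.5 * (image[y][clamp(x + 1, width)]
                                     - image[y][clamp(x - 1, width)])
            + math.sin(angle) * 0.5 * (image[clamp(y + 1, height)][x]
                                       - image[clamp(y - 1, height)][x]),
            0.0) for x in range(width)] for y in range(height)]
        maps.append(gaussian_blur(rectified, sigma))
    return maps

def histogram(maps, x, y):
    """Unit-norm vector of the orientation-map values at pixel (x, y)."""
    values = [m[y][x] for m in maps]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]

def daisy_descriptor(maps, x, y, radius, ring_points):
    """Centre histogram followed by histograms sampled on one ring."""
    descriptor = histogram(maps, x, y)
    for j in range(ring_points):
        angle = 2.0 * math.pi * j / ring_points
        px = int(round(x + radius * math.cos(angle)))
        py = int(round(y + radius * math.sin(angle)))
        descriptor.extend(histogram(maps, px, py))
    return descriptor

# Toy image (assumption): a horizontal intensity ramp.
image = [[float(x) for x in range(15)] for _ in range(15)]
maps = orientation_maps(image, sigma=1.0)
descriptor = daisy_descriptor(maps, 7, 7, radius=3, ring_points=4)
```

Because the orientation maps are computed once for the whole image, descriptors at neighbouring pixels share almost all of their work, which is where the speed advantage for dense matching comes from.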
From Descriptor to Depth Map
• The model uses EM to estimate depth map Z and occlusion map O by
maximizing
$p(Z, O \mid D_1, \ldots, D_N) \propto p(D_1, \ldots, D_N \mid Z, O)\, p(Z, O)$
$D_n$: descriptor of image n
Results
Picking the Best Daisy
Simon Winder, Gang Hua, Matthew Brown
Paper Contribution
• Utilize novel ground-truth training set
• Test multiple configurations of low-level filters and DAISY pooling and
optimize over their parameters
• Investigate the effects of robust normalization
• Apply PCA dimension reduction and dynamic range reduction to compress
the representation of descriptors
• Discuss computational efficiency and provide a list of recommendations
for descriptors that are useful in different scenarios
Descriptor Pipeline
• T-block takes the pixels from the image patch and transforms them to
produce a vector of k non-linear filter responses at each pixel.
– Block T1 involves computing gradients at each pixel and bilinearly quantizing the
gradient angle into k orientation bins as in SIFT
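– Block T2 rectifies the x and y components of the gradient to produce a vector of length
4: $\{|\nabla_x| - \nabla_x;\ |\nabla_x| + \nabla_x;\ |\nabla_y| - \nabla_y;\ |\nabla_y| + \nabla_y\}$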
– Block T3 uses steerable filters evaluated at a number of different orientations
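Two of the T-blocks are simple enough to sketch per pixel. The T1 sketch splits the gradient magnitude between the two nearest of k orientation bins; the T2 sketch assumes the rectification takes the form |g| - g and |g| + g for each gradient component. Function names and the default k are assumptions for illustration.

```python
import math

def t1_block(dx, dy, k=4):
    """Bilinearly quantise the gradient angle into k orientation bins."""
    magnitude = math.hypot(dx, dy)
    position = (math.atan2(dy, dx) % (2.0 * math.pi)) / (2.0 * math.pi / k)
    lower = int(position) % k
    upper = (lower + 1) % k
    fraction = position - int(position)
    response = [0.0] * k
    response[lower] += magnitude * (1.0 - fraction)
    response[upper] += magnitude * fraction
    return response

def t2_block(dx, dy):
    """Rectify the x and y gradient components into 4 non-negative values."""
    return [abs(dx) - dx, abs(dx) + dx, abs(dy) - dy, abs(dy) + dy]

print(t1_block(1.0, 0.0))
print(t2_block(2.0, -3.0))
```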
Descriptor Pipeline
• S-block spatially accumulates weighted filter vectors to give N linearly
summed vectors of length k and these are concatenated to form a
descriptor of kN dimensions.
Descriptor Pipeline
• N-block normalizes the complete descriptor to provide invariance to
lighting changes. It uses a form of threshold normalization with the following
stages:
– Normalize the descriptor to a unit vector
– Clip all the elements of the vector that are above a threshold $\kappa$ by computing $v_i' = \min(v_i, \kappa)$
– Scale the vector to a byte range.
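The three normalization stages can be sketched as below. The default threshold value and the choice to map the clipping threshold to 255 are assumptions made for the example, not values taken from the slides; the sketch also assumes non-negative descriptor elements.

```python
import math

def threshold_normalise(descriptor, kappa=0.2):
    """Unit-normalise, clip at kappa, then scale to the 0..255 byte range."""
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    unit = [v / norm for v in descriptor]
    clipped = [min(v, kappa) for v in unit]
    # Assumption: the clipping threshold is mapped to the top of the byte range.
    return [int(round(255.0 * v / kappa)) for v in clipped]

print(threshold_normalise([3.0, 4.0], kappa=0.7))
print(threshold_normalise([6.0, 8.0], kappa=0.7))
```

Both calls give the same result, since scaling every element by a constant (a simple model of a lighting change) is removed by the first stage.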
Descriptor Pipeline
• Dimension reduction. Apply principal components analysis to compress the
descriptor.
– First optimize the parameters of the descriptor and then compute the matrix of principal
components based on all descriptors computed on the training set.
– Next find the best dimensionality for reduction by computing the error rate on random
subsets of the training data.
– Progressively increase the dimensionality by adding PCA bases until the minimum error is
found.
Descriptor Pipeline
• Quantization further compresses the descriptor to reduce the memory requirement
for large databases of descriptors by quantizing descriptor elements into L
levels.
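A minimal sketch of quantizing descriptor elements into L levels. It uses plain uniform quantization over an assumed value range [0, max_value], which may differ from the scheme used in the paper; the names are illustrative.

```python
import math

def quantise(descriptor, levels, max_value=1.0):
    """Map each element in [0, max_value] to an integer level 0..levels-1."""
    step = max_value / levels
    return [min(levels - 1, max(0, int(v / step))) for v in descriptor]

def bits_per_element(levels):
    """Storage needed for one quantised element."""
    return max(1, math.ceil(math.log2(levels)))

print(quantise([0.0, 0.5, 1.0], levels=4), bits_per_element(4))
```

With L = 4, each element fits in 2 bits instead of a full byte or float.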
Training
• Use 3D reconstructions as a source of training data.
• Use machine learning approach to optimize parameters.
Results
• Gradient-based descriptor
Results
• Dimension Reduction
Results
• Descriptor Quantization