
Automatic Matching of Multi-View Images
Ed Bremer
University of Rochester
References


[1] Mikolajczyk, K., Schmid, C., 2004. A performance evaluation of local descriptors. Submitted to PAMI, October 2004. http://lear.inrialpes.fr/pubs/2004/MS04a

[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004. A comparison of affine region detectors. Submitted to International Journal of Computer Vision, August 2004. http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04

[3] Lowe, D., 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), pp. 91-118.

[4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002. Robust wide baseline stereo from maximally stable extremal regions. Proc. British Machine Vision Conference (BMVC 2002), pp. 384-393.

[5] Schaffalitzky, F., Zisserman, A., 2002. Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?". Proc. 7th European Conference on Computer Vision (ECCV 2002), Copenhagen, Denmark, vol. 1, pp. 414-431.

[6] Baumberg, A., 2000. Reliable feature matching across widely separated views. Proc. CVPR, pp. 774-781.

[7] Mikolajczyk, K., Schmid, C., 2001. Indexing based on scale invariant interest points. Proc. 8th ICCV, pp. 525-531.
Outline

- Motivation
- Applications
- Process Components
- Region Detectors
- Descriptors
- Matching Criteria
- Performance Evaluation
- Conclusion & Next Steps
Motivation

- Multi-view/multi-image matching: multiple images of a 3D scene taken by a single camera or by multiple cameras, differing in rotation, scale, viewpoint and illumination
Motivation

- Applications – detecting matching regions is used in all of the following:
  - Image registration
  - Super-resolution
  - Stereo vision
  - Object detection and recognition
  - Object and motion tracking
  - Indexing and retrieval of objects
  - 3D scene reconstruction
  - Scene recognition
Examples of Multi-view Images
[2]
Process Components

- Covariant region detection: detect image regions covariant to the class of transformation between the reference image and the transformed image
- Invariant descriptor: compute invariant descriptors from the covariant regions
- Descriptor matching: compute the distance between descriptors in the reference image and the transformed image (see the pipeline sketch below)
[1]
Region Detectors

- Support regions for computation of descriptors
  - Determined independently in each image
  - Scale invariant or affine invariant
  - Can be points (feature points) or regions (covariant)
  - Provide dense (local) coverage – robust to occlusion
  - Need to be stable and repeatable
- Five region detectors
  - Harris points -> invariant to rotation
  - Harris-Laplace -> invariant to rotation and scale
  - Hessian-Laplace -> invariant to rotation and scale
  - Harris-Affine -> invariant to affine image transformations
  - Hessian-Affine -> invariant to affine image transformations
[1]
Region Detectors

- Harris points
  - Maxima of the Harris function are used to locate interest points
  - Support region fixed in size: a 41x41 neighborhood centered at the interest point (see the sketch below)
- Harris-Laplace regions
  - Interest points located with a scale-adapted Harris function
  - Scale selected at the local minimum or maximum of the Laplacian-of-Gaussian across scale-space
[1]
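A rough sketch of the Harris-points step, assuming OpenCV's cornerHarris with typical parameter values (not necessarily those used in [1]); the 41x41 support regions follow the slide above.

```python
import cv2
import numpy as np

gray = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # placeholder

# Harris corner response; blockSize, ksize and k are common defaults, not values from [1].
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Interest points: local maxima of the response above a relative threshold.
dilated = cv2.dilate(response, None)
ys, xs = np.where((response == dilated) & (response > 0.01 * response.max()))

# Fixed-size 41x41 support regions centered on each interest point.
half = 20
patches = [gray[y - half:y + half + 1, x - half:x + half + 1]
           for y, x in zip(ys, xs)
           if half <= y < gray.shape[0] - half and half <= x < gray.shape[1] - half]
```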
Region Detectors

- Harris-Laplace performance
  - Approximately 10% better than the Laplacian, Lowe or gradient methods
  - The standard Harris detector performs very poorly under scale changes
[7]
Region Detectors

- Hessian-Laplace regions
  - Interest points located at local maxima of the Hessian determinant (see the sketch below)
  - Scale-space location selected at maxima of the Laplacian-of-Gaussian (a Difference-of-Gaussians can also be used)
[1] [3]
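An illustrative sketch (not taken from [1]) of the determinant-of-Hessian response at a single scale; a full Hessian-Laplace detector would repeat this over several scales and select the characteristic scale with the Laplacian-of-Gaussian.

```python
import cv2
import numpy as np

gray = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # placeholder

sigma = 2.0                                   # illustrative detection scale
L = cv2.GaussianBlur(gray, (0, 0), sigma)

# Second derivatives of the smoothed image.
Lxx = cv2.Sobel(L, cv2.CV_32F, 2, 0, ksize=3)
Lyy = cv2.Sobel(L, cv2.CV_32F, 0, 2, ksize=3)
Lxy = cv2.Sobel(L, cv2.CV_32F, 1, 1, ksize=3)

# Scale-normalized determinant of the Hessian; interest points are its local maxima.
det_hessian = (sigma ** 4) * (Lxx * Lyy - Lxy ** 2)
```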
Region Detectors

- Harris-Affine regions
  - Initial regions found with the Harris-Laplace detector
  - Region shape affine-adapted based on the 2nd moment matrix
- Hessian-Affine regions
  - Initial regions found with the Hessian-Laplace detector
  - Region shape affine-adapted based on the 2nd moment matrix
[2]
Region Detectors

- Regions produced by the Harris-Affine and Hessian-Affine detectors (figure from [2])
Region Detectors

- Affine normalization using the 2nd moment matrix for corresponding regions in the left (L) and right (R) images (figure from [2]; equations below)
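The equation itself did not survive in the transcript; the standard form of the second moment matrix and of the normalization, following [2], is:

```latex
\[
  \mu(\mathbf{x}, \sigma_I, \sigma_D) \;=\;
  \sigma_D^2 \, g(\sigma_I) *
  \begin{pmatrix}
    L_x^2(\mathbf{x}, \sigma_D) & L_x L_y(\mathbf{x}, \sigma_D) \\
    L_x L_y(\mathbf{x}, \sigma_D) & L_y^2(\mathbf{x}, \sigma_D)
  \end{pmatrix}
\]
\[
  \mathbf{x}'_L = M_L^{1/2}\,\mathbf{x}_L, \qquad
  \mathbf{x}'_R = M_R^{1/2}\,\mathbf{x}_R, \qquad
  \mathbf{x}'_L = R\,\mathbf{x}'_R
\]
```

Here M_L and M_R are the second moment matrices of the two corresponding regions, and R is a pure rotation relating the normalized frames.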
Region Detectors

- Region normalization
  - Detectors produce circular or elliptical regions
  - Region size is dependent on the detection scale
  - Map regions to a circular region of constant radius
  - Rotate regions to the direction of the dominant gradient orientation
- Illumination normalization
  - Model illumination change as an affine transformation -> aI(x) + b
  - Normalize using the mean and standard deviation of the pixel intensities (see the sketch below)
[1]
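A sketch of both normalizations, assuming the detector returns an elliptical region as a center, axis lengths and angle; the 41x41 patch size mirrors the earlier Harris support region, and rotation to the dominant gradient orientation is omitted.

```python
import cv2
import numpy as np

def normalize_region(gray, center, axes, angle_deg, patch_size=41):
    """Warp an elliptical region onto a fixed-size square patch and normalize
    its intensities (cancels an affine illumination change a*I(x) + b)."""
    c, s = np.cos(np.deg2rad(angle_deg)), np.sin(np.deg2rad(angle_deg))
    R = np.array([[c, -s], [s, c]])
    S = np.diag([patch_size / (2.0 * axes[0]), patch_size / (2.0 * axes[1])])
    A = S @ R.T                                   # ellipse -> axis-aligned patch frame
    t = np.array([patch_size / 2.0, patch_size / 2.0]) - A @ np.asarray(center, float)
    M = np.hstack([A, t[:, None]]).astype(np.float32)
    patch = cv2.warpAffine(gray.astype(np.float32), M, (patch_size, patch_size))

    # Photometric normalization: zero mean, unit standard deviation.
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```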
Descriptors

- Descriptors -> feature vectors
  - Invariant to changes in scale, rotation, affine transformation and affine illumination
  - Need to be distinct, stable and repeatable
  - Distribution (histogram) type or covariance type
- Ten descriptor types
  - Scale-Invariant Feature Transform (SIFT)
  - Gradient Location and Orientation Histogram (GLOH)
  - Shape context
  - Principal Component Analysis (PCA)-SIFT
  - Steerable filters
  - Differential invariants
  - Complex filters
  - Moment invariants
  - Cross-correlation
  - Spin image
[1]
Descriptors

- SIFT and GLOH 3D histogram descriptors
  - SIFT -> 4 x 4 location bins x 8 orientation bins = 128-dimensional descriptor
  - GLOH -> log-polar [(2 x 8) + 1] location bins x 16 orientation bins = 272-dimensional descriptor (reduced to 128 dimensions with PCA in [1])
[1]
Matching Criteria

- Distance measure
  - Used to find putative matches between images
  - Mahalanobis distance – used for covariance-type descriptors
  - Euclidean distance – used for distribution (histogram) descriptors
  - Direct distance comparison is not suitable for indexing or database searching
- Simple threshold
  - Descriptors match if the distance between them is below a threshold t
  - A descriptor in the reference image can have many matches to descriptors in the transformed image
- Nearest neighbor (NN)
  - Find the closest match between descriptors in the reference and transformed images
  - A descriptor in the reference image can have only one match to a descriptor in the transformed image (both strategies are sketched below)
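A NumPy sketch (variable names assumed) contrasting the two strategies on descriptor arrays, using Euclidean distance:

```python
import numpy as np

def match_descriptors(des_ref, des_tra, t=0.4):
    """des_ref: (N, D) reference descriptors; des_tra: (M, D) transformed-image
    descriptors. Returns threshold matches and nearest-neighbour matches."""
    # Pairwise Euclidean distances between every descriptor pair.
    dists = np.linalg.norm(des_ref[:, None, :] - des_tra[None, :, :], axis=2)

    # Simple threshold: one reference descriptor may match many descriptors.
    threshold_matches = np.argwhere(dists < t)

    # Nearest neighbour: each reference descriptor gets exactly one candidate.
    nn_matches = np.stack([np.arange(dists.shape[0]), dists.argmin(axis=1)], axis=1)
    return threshold_matches, nn_matches
```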
Performance Evaluation

- Criterion basis
  - Recall = #correct matches / #correspondences
  - 1-precision = #false matches / [#correct matches + #false matches]
  - An ideal descriptor -> recall = 1 for all values of 1-precision, given no overlap error (worked example below)
[1]
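A worked example with hypothetical counts (not taken from [1]): say an image pair has 1000 ground-truth correspondences and the descriptor returns 600 correct and 150 false matches.

```latex
\[
  \text{recall} = \frac{600}{1000} = 0.6,
  \qquad
  1 - \text{precision} = \frac{150}{600 + 150} = 0.2
\]
```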
SIFT - Scale Invariant Feature Transform

- Scale Invariant Feature Transform (SIFT), Lowe [3]
- Features
  - Invariant to image scale and rotation
  - Invariant to small changes in illumination and 3D camera viewpoint
  - Extracts a large number of highly distinctive features
    - Enables detection of small objects
    - Improved performance in cluttered scenes
  - Algorithms are efficient – complex operations are applied to local regions or features rather than the whole image
- Procedure (a minimal usage sketch follows this list)
  1. Scale-space extrema detection
  2. Keypoint localization
  3. Orientation assignment
  4. Keypoint descriptor (feature vector)
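A minimal usage sketch with OpenCV's built-in SIFT; the parameters shown are OpenCV's defaults, not necessarily the values used in [3].

```python
import cv2

img = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# contrastThreshold and edgeThreshold correspond to the low-contrast and
# edge-rejection steps described on the following slides.
sift = cv2.SIFT_create(nfeatures=0, contrastThreshold=0.04, edgeThreshold=10, sigma=1.6)

# Runs all four stages: scale-space extrema detection, keypoint localization,
# orientation assignment, and 128-dimensional descriptor computation.
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)
```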
SIFT - Scale Invariant Feature Transform

- Scale-space blob detector [3]
  - Search for stable features over all scales and image locations
  - Scale-space kernel -> the Gaussian function
  - Features located with the Difference-of-Gaussian (definitions below)
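The definitions themselves are missing from the transcript; the standard forms from [3] are:

```latex
\[
  L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),
  \qquad
  G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2}
\]
\[
  D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y)
                  = L(x, y, k\sigma) - L(x, y, \sigma)
\]
```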

SIFT - Scale Invariant Feature Transform

- Difference-of-Gaussian (DoG) [3]
  - Computed as a simple subtraction of adjacent blurred images L
  - Close approximation to the scale-normalized Laplacian-of-Gaussian (relation below)
  - Maxima and minima of the scale-normalized Laplacian produce the most stable image features compared with the gradient, Hessian, or Harris corner functions (Mikolajczyk 2002)
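The relation behind the approximation, as given in [3]:

```latex
\[
  \frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
  \quad\Longrightarrow\quad
  G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G
\]
```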
SIFT - Scale Invariant Feature Transform

- Scale-space image set [3]
  - Divide each octave of scale-space into s intervals
  - Compute s + 3 filtered (increasingly blurred) images, with k = 2^(1/s)
  - For s = 3, k ≈ 1.26, giving blur levels 1.00σ, 1.26σ, 1.59σ, 2.00σ, 2.52σ and 3.18σ
  - Subtract adjacent images to produce the DoG images
  - Repeat for the next octave, starting from the image with twice the initial σ (2.00σ, two images from the top of the stack) decimated by 2 (see the sketch below)
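A sketch of building one octave with these parameters; the blur schedule follows the slide, while the implementation details (blurring each level from the base image) are illustrative rather than Lowe's exact incremental scheme.

```python
import cv2
import numpy as np

def build_octave(img, sigma0=1.0, s=3):
    """One octave: s + 3 increasingly blurred images and their s + 2 DoG images."""
    k = 2 ** (1.0 / s)
    gaussians = [cv2.GaussianBlur(img, (0, 0), sigma0 * k ** i) for i in range(s + 3)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians[:-1], gaussians[1:])]
    return gaussians, dogs

img = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # placeholder
gaussians, dogs = build_octave(img)

# Next octave: start from the image with twice the initial sigma, decimated by 2.
next_base = gaussians[3][::2, ::2]
```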
SIFT - Scale Invariant Feature Transform

- Scale-space pyramid (figure from Lowe [3])
SIFT - Scale Invariant Feature Transform

- Locating scale-space extrema [3]
  - Detect local maxima and minima of D(x, y, σ)
  - Compare each sample point to its 8 neighbors in the same scale image and the 9 neighbors in the scale images above and below (26 neighbors in total)
  - Mark the sample if it is greater than or less than all of its neighbors (see the sketch below)
  - Extrema are searched in s DoG images per octave
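A sketch of the 26-neighbor test for one sample, assuming `dogs` is the DoG list from the previous sketch, 1 <= i <= len(dogs) - 2, and (y, x) away from the image border.

```python
import numpy as np

def is_extremum(dogs, i, y, x):
    """True if dogs[i][y, x] is larger (or smaller) than all 26 neighbors in its
    3x3x3 scale-space neighborhood (8 same-scale + 9 above + 9 below)."""
    value = dogs[i][y, x]
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[i - 1:i + 2]])
    others = np.delete(cube.ravel(), 13)        # drop the center sample itself
    return bool(np.all(value > others) or np.all(value < others))
```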
SIFT - Scale Invariant Feature Transform

- Improving localization [3]
  - Fit a 3D quadratic to the DoG function around each sample point using the Taylor expansion
    D(x) = D + (∂D/∂x)^T x + (1/2) x^T (∂²D/∂x²) x, where x = (x, y, σ)^T is the offset from the sample point
  - The offset of the interpolated extremum is x̂ = -(∂²D/∂x²)^{-1} (∂D/∂x)
  - Reject low-contrast points where |D(x̂)| = |D + (1/2)(∂D/∂x)^T x̂| falls below a threshold (0.03 in [3])
  - The Hessian and derivatives of D(x, y, σ) are approximated by differences of neighboring sample points
SIFT - Scale Invariant Feature Transform

- Edge rejection [3]
  - Eliminate poorly defined peaks (responses along edges) using the 2x2 Hessian matrix H = [Dxx Dxy; Dxy Dyy] at the keypoint's location and scale
  - Verify that the ratio of principal curvatures is below a threshold r by checking Tr(H)² / Det(H) < (r + 1)² / r, with r = 10
  - Efficient to compute -> fewer than 20 floating-point operations per keypoint
SIFT - Scale Invariant Feature Transform

- Results from Lowe [3]: 832 keypoints reduced to 536 after the low-contrast and edge rejection steps (233x189 image)
SIFT - Scale Invariant Feature Transform

- Results from Lowe [3] – performance measures
SIFT - Scale Invariant Feature Transform

- Orientation assignment – rotational invariance [3]
  - Use the scale of the keypoint to select the Gaussian image L(x, y, σ)
  - Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences (formulas below)
  - Build an orientation histogram of the sample points – entries weighted by gradient magnitude and by a Gaussian window around the keypoint, with bins covering the 360° range
  - Peaks in the histogram correspond to the dominant directions of the local gradients
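The pixel-difference formulas from [3]:

```latex
\[
  m(x, y) = \sqrt{\bigl(L(x+1, y) - L(x-1, y)\bigr)^2 + \bigl(L(x, y+1) - L(x, y-1)\bigr)^2}
\]
\[
  \theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)
\]
```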
SIFT - Scale Invariant Feature Transform

- Descriptor – the feature vector [3]
  - Gradient samples are pooled into sub-region orientation histograms, which allows for small shifts in gradient positions
  - 128-element feature vector -> 4x4 array of histograms with 8 orientation bins each
  - (Lowe's illustration uses an 8x8 sample array pooled into a 2x2 array of 8-bin histograms)
  - Feature vectors are matched by nearest neighbor using Euclidean distance (see the sketch below)
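A simplified NumPy sketch of pooling a 16x16 gradient patch (assumed already rotated to the keypoint orientation) into a 4x4x8 histogram; unlike [3], it omits Gaussian weighting and trilinear interpolation.

```python
import numpy as np

def sift_like_descriptor(patch):
    """patch: 16x16 intensity patch around the keypoint. Returns a 128-vector."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    ori = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)   # 8 orientation bins
    hist = np.zeros((4, 4, 8))
    for y in range(16):
        for x in range(16):                                    # 4x4 spatial bins
            hist[y // 4, x // 4, ori[y, x]] += mag[y, x]
    v = hist.ravel()
    v /= np.linalg.norm(v) + 1e-12                              # illumination normalization
    v = np.minimum(v, 0.2)                                      # clip large gradient magnitudes
    return v / (np.linalg.norm(v) + 1e-12)
```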
SIFT - Scale Invariant Feature Transform

- Results from Lowe [3]
  - Two training objects recognized in a cluttered image
  - Small squares show the point matches
  - Large rectangles show the border of the training image after affine transformation
Conclusions


- The Harris-Laplace region detector performs better than the Laplacian, DoG and gradient scale-space operators
- Scale-space detectors provide invariance to rotation, scale and small changes in illumination and viewpoint
- Affine adaptation provides invariance to affine transformations
- GLOH and SIFT descriptors provide the best performance
- Dense, localized descriptors perform well under occlusion

Next steps
- Coding and testing of the region detectors, descriptors and matching…