Automatic Matching of Multi-View Images
Ed Bremer
University of Rochester
References
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, October 2004, http://lear.inrialpes.fr/pubs/2004/MS04a
[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
[3] Lowe, D., 2004, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
[4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002, Robust Wide Baseline Stereo From Maximally Stable Extremal Regions, Proc. British Machine Vision Conference BMVC2002, pages 384-393.
[5] Zisserman, A., Schaffalitzky, F., 2002, Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?", Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages 414-431, vol 1.
[6] Baumberg, A., 2000, Reliable Feature Matching Across Widely Separated Views, In Proc. CVPR, pages 774-781.
[7] Mikolajczyk, K., Schmid, C., 2001, Indexing based on scale invariant interest points, In Proc. 8th ICCV, pages 525-531.
Outline
Motivation
Applications
Process Components
Region Detectors
Descriptors
Matching Criteria
Performance Evaluation
Conclusion & Next Steps
Motivation
Multi-view/Multi-image Matching
Multiple images of a scene taken by single or multiple cameras with different rotation, scale, viewpoint and illumination
3D scene
Motivation
Applications
Detecting matching regions is used in all of the following:
Image registration
Super-resolution
Stereo vision
Object detection and recognition
Object and motion tracking
Indexing and retrieval of objects
3D scene reconstruction
Scene recognition
Examples of Multi-view Images
[2]
Process Components
Covariant region detection
Detect image regions covariant to class of transformation between reference image and transformed image
Invariant descriptor
Compute invariant descriptors from covariant regions
Descriptor matching
Compute distance between descriptors in reference image and transformed image
[1]
Region Detectors
Support regions for computation of descriptors
Determined independently in each image
Scale invariant or Affine invariant
Can be points (feature points) or regions (covariant)
Provide dense (local) coverage – robust to occlusion
Need to be stable and repeatable
Five region detectors
Harris points -> invariant to rotation
Harris-Laplace -> invariant to rotation and scale
Hessian-Laplace -> invariant to rotation and scale
Harris-Affine -> invariant to affine image transformations
Hessian-Affine -> invariant to affine image transformations
[1]
Region Detectors
Harris points
Maxima of Harris function used to locate interest points
Support region fixed in size: 41x41 neighborhood centered at interest point
Harris-Laplace regions
Interest point located with scale-adapted Harris function
Characteristic scale selected at a local extremum (minimum or maximum) of the Laplacian-of-Gaussian across scale-space
[1]
Region Detectors
Harris-Laplace Performance
Approximately 10% better than Laplacian, Lowe or gradient methods
Standard Harris detector performs very poorly under scale changes
[7]
Region Detectors
Hessian-Laplace regions
Interest points are local maxima of the Hessian determinant
Location in scale-space found using maxima of Laplacian-of-Gaussian (can also use Difference-of-Gaussians)
[1], [3]
Region Detectors
Harris-Affine regions
Find regions using Harris-Laplace detector
Affine-adapted region based on 2nd moment matrix
Hessian-Affine regions
Find regions using Hessian-Laplace detector
Affine-adapted region based on 2nd moment matrix
[2]
Region Detectors
Regions produced by Harris-Affine and Hessian-Affine detectors
[2]
Region Detectors
Affine normalization using 2nd moment matrix for region L and R
[2]
Region Detectors
Region normalization
Detectors produce circular or elliptical regions
Size dependent on detection scale
Map regions to circular region with constant radius
Rotate regions in direction of dominant gradient orientation
Illumination normalization
Model illumination change as an affine transformation of intensity -> aI(x) + b
Normalize using mean and standard deviation of pixel intensities
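The illumination normalization above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in [1]: the function name `normalize_illumination` and the sample patch values are hypothetical, and the patch is treated as a flat list of intensities. Subtracting the mean and dividing by the standard deviation cancels an affine intensity change aI(x) + b with a > 0.

```python
from statistics import mean, pstdev

def normalize_illumination(pixels):
    """Return intensities shifted to zero mean and scaled to unit standard deviation."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        return [0.0 for _ in pixels]  # flat patch: nothing to normalize
    return [(p - mu) / sigma for p in pixels]

# Hypothetical patch and an affinely brightened copy, a*I(x) + b with a = 2, b = 5.
patch = [10.0, 20.0, 30.0, 40.0]
brighter = [2.0 * p + 5.0 for p in patch]

normalized_patch = normalize_illumination(patch)
normalized_brighter = normalize_illumination(brighter)
```

After normalization the two lists agree up to floating-point error, which is the invariance the descriptor relies on.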
[1]
Descriptors
Descriptors -> Feature vector
Invariant to changes in scale, rotation, affine transformation and affine illumination
Need to be distinct, stable and repeatable
Distribution (histogram) type or Covariance type
Ten Descriptor types
Scale-Invariant Feature Transform (SIFT)
Gradient Location and Orientation histogram (GLOH)
Shape Context
Principal Component Analysis (PCA)-SIFT
Steerable Filters
Differential Invariants
Complex Filters
Moment Invariants
Cross-Correlation
Spin Image
[1]
Descriptors
SIFT and GLOH 3D Descriptors
SIFT -> 4 x 4 x 8 = 128 dimension descriptor
GLOH -> Log-polar [(2 x 8) + 1] x 16 = 272 dimension descriptor
[1]
Matching Criteria
Find putative matches between images
Distance measure
Mahalanobis distance – used for covariance-type descriptors
Euclidean distance – used for distribution (histogram) descriptors
Direct distance comparison not suitable for indexing or database searching
Simple threshold
Descriptors match if the distance between them is below threshold t
Descriptor in reference image can have many matches to descriptors in transformed image
Nearest Neighbor (NN)
Find closest match between descriptors in reference and transformed image
Descriptor in reference image can have only 1 match to descriptor in transformed image
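The two matching strategies can be compared with a small Python sketch using Euclidean distance. The function names and the toy 2-D descriptors are hypothetical, and applying the threshold t to the nearest neighbor as well is an assumption made here for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def threshold_matches(ref, trans, t):
    """All (i, j) pairs with distance below t; one reference descriptor may match many."""
    return [(i, j) for i, u in enumerate(ref) for j, v in enumerate(trans)
            if euclidean(u, v) < t]

def nearest_neighbor_matches(ref, trans, t):
    """At most one match per reference descriptor: its closest neighbor, if below t."""
    matches = []
    for i, u in enumerate(ref):
        j = min(range(len(trans)), key=lambda k: euclidean(u, trans[k]))
        if euclidean(u, trans[j]) < t:
            matches.append((i, j))
    return matches

# Toy descriptors: reference descriptor 0 lies close to two transformed descriptors.
ref = [[0.0, 0.0], [5.0, 5.0]]
trans = [[0.1, 0.0], [0.0, 0.3], [5.0, 5.2]]

by_threshold = threshold_matches(ref, trans, 0.5)        # [(0, 0), (0, 1), (1, 2)]
by_nearest = nearest_neighbor_matches(ref, trans, 0.5)   # [(0, 0), (1, 2)]
```

Thresholding keeps both candidates for reference descriptor 0, while nearest-neighbor matching keeps only the closer one.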
Performance Evaluation
Criterion basis
Recall rate = #correct matches / #correspondences
1-precision = #false matches / [#correct matches + #false matches]
Ideal descriptor -> recall rate = 1 for any precision, given no overlap error
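The two criteria are simple ratios. The sketch below computes them from match counts; the function names and the example counts are hypothetical and only illustrate the arithmetic.

```python
def recall(correct_matches, correspondences):
    """Fraction of ground-truth correspondences that were correctly matched."""
    return correct_matches / correspondences

def one_minus_precision(correct_matches, false_matches):
    """Fraction of all returned matches that are false."""
    return false_matches / (correct_matches + false_matches)

# Hypothetical counts: 100 correspondences, 80 correct matches, 20 false matches.
example_recall = recall(80, 100)                          # 0.8
example_one_minus_precision = one_minus_precision(80, 20)  # 0.2
```

Sweeping the distance threshold t and plotting recall against 1-precision gives the curves used to compare descriptors.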
[1]
SIFT - Scale Invariant Feature Transform
Scale Invariant Feature Transform (SIFT), Lowe [3]
Features
Invariant to image scale and rotation
Invariant to small changes in illumination and 3D camera viewpoint
Extracts large number of highly distinctive features
Enables detection of small objects
Improved performance in cluttered scenes
Algorithms are efficient – complex operations applied to local regions or features rather than the whole image
Procedure
Scale-space extrema detection
Keypoint localization
Orientation assignment
Keypoint vector (descriptor)
SIFT - Scale Invariant Feature Transform
[3]
Scale-Space Blob Detector
Search for stable features over all scales and image locations
Scale-space kernel -> Gaussian function
Difference of Gaussian
SIFT - Scale Invariant Feature Transform
[3]
Difference of Gaussian (DoG)
Simple subtraction of adjacent blurred images L: D(x, y, σ) = L(x, y, kσ) - L(x, y, σ)
Approximation to scale-normalized Laplacian of Gaussian
Maxima or minima of scale-normalized Laplacian produce the most stable image features compared to gradient, Hessian, or Harris corner function (Mikolajczyk 2002)
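As a rough illustration of the subtraction, here is a one-dimensional Python sketch (real SIFT operates on 2-D images). The helper names, the kernel radius of 3σ, the border replication and the test signal are all assumptions chosen to keep the example small.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve the signal with a Gaussian, replicating the border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def difference_of_gaussian(signal, sigma, k):
    """D(sigma) = L(k * sigma) - L(sigma), computed sample by sample."""
    fine = blur(signal, sigma)
    coarse = blur(signal, k * sigma)
    return [c - f for c, f in zip(coarse, fine)]

# A single bright sample acts as a 1-D "blob" centered at index 10.
signal = [0.0] * 10 + [1.0] + [0.0] * 10
dog = difference_of_gaussian(signal, sigma=1.0, k=2 ** (1.0 / 3))
blob_index = min(range(len(dog)), key=dog.__getitem__)
```

The DoG response has its strongest (negative) extremum at the blob center, which is the kind of point the extrema search looks for.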
SIFT - Scale Invariant Feature Transform
[3]
Scale-Space Image Set
Divide each octave into s intervals
Compute s + 3 filtered (increasingly blurred) images, k = 2^(1/s)
s = 3, k = 1.26 ->
6th -> 3.17σ
5th -> 2.52σ
4th -> 2.00σ
3rd -> 1.59σ
2nd -> 1.26σ
1st -> 1.00σ
Subtract adjacent images to produce DoG images
Repeat for next octave starting from the image with twice the initial σ (here the 4th, at 2.00σ), downsampled by 2
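The blur levels within one octave follow directly from k = 2^(1/s). The short Python sketch below lists them; the function name `octave_sigmas` and the default base scale of 1.0 are illustrative assumptions.

```python
def octave_sigmas(s, sigma0=1.0):
    """Blur levels of the s + 3 Gaussian images in one octave, with k = 2 ** (1 / s)."""
    k = 2 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]

multiples = [round(x, 2) for x in octave_sigmas(3)]
# multiples == [1.0, 1.26, 1.59, 2.0, 2.52, 3.17]
```

The last value is 2^(5/3) ≈ 3.17; raising the rounded k = 1.26 to the fifth power gives 3.18 instead. The fourth image has exactly twice the base blur, which is why it can seed the next octave.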
SIFT - Scale Invariant Feature Transform
[3]
Scale-Space Pyramid (from Lowe)
SIFT - Scale Invariant Feature Transform
[3]
Locating Scale-Space Extrema
Detection of local maxima or minima of D(x, y, σ)
Compare each sample point to its 8 neighbors in the same scale image and 9 neighbors in each of the scale images above and below (26 in total)
Mark the sample if it is greater than or less than all of its neighbors
Extrema are searched in s DoG images per octave
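The 26-neighbor comparison can be written compactly. This Python sketch assumes the DoG stack is a nested list indexed as `dog[scale][y][x]` and that the tested point is not on a border; the function name and the toy values are hypothetical.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbors."""
    value = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return all(value > n for n in neighbors) or all(value < n for n in neighbors)

# Three 3x3 DoG layers (toy values) with a peak at the center of the middle layer.
flat = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
dog_stack = [flat, peak, flat]

center_is_extremum = is_extremum(dog_stack, 1, 1, 1)
```

Most samples fail after the first few comparisons, so the check is cheap in practice even though it nominally touches 26 values.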
SIFT - Scale Invariant Feature Transform
[3]
Improving Localization
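Reject points that have low contrast using:
$|D(\hat{\mathbf{x}})| = \left| D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}} \right| < \text{threshold}$
Where -> $D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2} \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \mathbf{x}$ (Taylor expansion about the sample point)
Gives offset extremum -> $\hat{\mathbf{x}} = -\left( \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$
Hessian and derivative of D(x, y, σ) are approximated using differences of neighboring sample points. $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the sample point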
SIFT - Scale Invariant Feature Transform
[3]
Edge Rejection
Eliminate poorly defined peaks (edges) using Hessian matrix
Verify ratio of principal curvatures is less than threshold r = 10
Efficient to compute -> less than 20 floating point operations
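Lowe [3] avoids computing the curvatures explicitly by testing Tr(H)^2 / Det(H) < (r + 1)^2 / r on the 2x2 Hessian of D. A minimal Python sketch of that test follows; the function name and the example second-derivative values are hypothetical.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r.

    Uses the trace/determinant form Tr(H)^2 / Det(H) < (r + 1)^2 / r,
    so no eigenvalues need to be computed.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign: not a well-defined peak
    return trace * trace / det < (r + 1) ** 2 / r

blob_like = passes_edge_test(1.0, 1.0, 0.0)    # curvature ratio 1  -> kept
edge_like = passes_edge_test(20.0, 1.0, 0.0)   # curvature ratio 20 -> rejected
```

With r = 10 the right-hand side is 12.1, so the blob-like point (ratio 4.0) passes and the edge-like point (ratio 22.05) is rejected.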
SIFT - Scale Invariant Feature Transform
Results from Lowe [3]: 832 keypoints reduced to 536 (233x189 image)
SIFT - Scale Invariant Feature Transform
Results from Lowe [3]: performance measures
SIFT - Scale Invariant Feature Transform
[3]
Orientation – rotational invariance
Use scale of point to select image L(x, y, σ)
Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences
Orientation histogram of sample points: entries weighted by gradient magnitude and a Gaussian window around the keypoint, bins cover 360° range
Peaks in histogram correspond to dominant directions of local gradients
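The orientation step can be sketched in Python as follows. The gradient uses central pixel differences as in Lowe [3]; the 36-bin histogram is the value Lowe reports but is an assumption here, and the Gaussian weighting window is omitted to keep the example short. The function names and the toy patch are hypothetical.

```python
import math

def gradient(L, x, y):
    """Magnitude and orientation (degrees in [0, 360)) from pixel differences of image L."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    m = math.hypot(dx, dy)
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    return m, theta

def orientation_histogram(L, bins=36):
    """Magnitude-weighted orientation histogram over interior pixels (no Gaussian window)."""
    hist = [0.0] * bins
    for y in range(1, len(L) - 1):
        for x in range(1, len(L[0]) - 1):
            m, theta = gradient(L, x, y)
            hist[int(theta / (360.0 / bins)) % bins] += m
    return hist

# Toy 5x5 patch whose intensity increases left to right: the gradient points along +x.
patch = [[float(x) for x in range(5)] for _ in range(5)]
hist = orientation_histogram(patch)
dominant_bin = max(range(len(hist)), key=hist.__getitem__)
```

For this patch every interior pixel has magnitude 2 and orientation 0°, so all weight falls in bin 0, the dominant direction that would be assigned to the keypoint.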
SIFT - Scale Invariant Feature Transform
[3]
Descriptor – the feature vector
8x8 sub-region histograms allow shift in gradient positions
128-element feature vector -> 4x4 array of histograms with 8 orientation bins each (Lowe's figure shows a 2x2x8 example)
Feature vectors matched by nearest neighbor (Euclidean distance)
SIFT - Scale Invariant Feature Transform
Results from Lowe [3]
Two training objects recognized in cluttered image
Small squares show point matches
Large rectangles show border of training image after affine transformation
Conclusions
Conclusions
Harris-Laplace region detector performs better than Laplacian, DoG and gradient scale-space operators
Scale-space detectors provide invariance to rotation, scale and small changes in illumination and viewpoint
Affine adaptation provides invariance to affine transformations
GLOH and SIFT descriptors provide the best performance
Dense, localized descriptors perform well under occlusions
Next steps
Coding and testing of region detectors, descriptors and matching…