Object retrieval with large
vocabularies and fast spatial
matching
James Philbin1, Ondrej Chum1, Michael Isard2, Josef Sivic1,
and Andrew Zisserman1
1Department of Engineering Science, University of Oxford
2Microsoft Research, Silicon Valley
CVPR 2007
Overview
• Problem
– Input: a user-selected region of a query image
– Return: a ranked list of images retrieved from a large
corpus.
• Containing the same object
• Objective
– A promising step towards retrieval from “web-scale” image corpora
• Improvement
– Improving the visual vocabulary
– Incorporating spatial information into the ranking
– Examples
Datasets
• Source
– Flickr
• Oxford 5K dataset
– Flickr queries: “Oxford Christ Church,” “Oxford Radcliffe Camera,” … with “Oxford”
– 5,062 images (1,024×768)
• 100K dataset
– 145 most popular tags
– 99,782 images (1,024×768)
• 1M dataset
– 450 most popular tags
– 1,040,801 images (500×333)
Indexing the dataset
• Image description
– Affine-invariant Hessian regions
• About 3,300 regions on a 1,024×768 image
– SIFT descriptor
• 128-D
– 4×4×8-direction gradient histogram
(figure: 2×2 8-direction gradient histogram)
• Model
– bag-of-visual-words
• Quantize the visual descriptors to index the image
• Search engine
– L2 distance as similarity
– tf-idf weighting scheme
• more commonly occurring = less discriminative = smaller weight
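The indexing pipeline above can be sketched as follows. This is a minimal sketch, not the authors' implementation: images are assumed to be already quantized into lists of visual-word ids, and the names `idf_weights`, `tfidf_vector`, `l2_distance`, and `rank` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter

def idf_weights(docs):
    """idf[w] = log(N / n_w), where n_w is the number of images containing word w."""
    n_docs = len(docs)
    doc_freq = Counter(w for words in docs for w in set(words))
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}

def tfidf_vector(words, idf):
    """Sparse, L2-normalised tf-idf vector of one image (assumed normalisation)."""
    counts = Counter(words)
    total = sum(counts.values())
    vec = {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else vec

def l2_distance(a, b):
    """L2 distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def rank(query_words, docs):
    """Return image indices sorted by increasing L2 distance to the query."""
    idf = idf_weights(docs)
    q = tfidf_vector(query_words, idf)
    dists = [(l2_distance(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(dists)]

# Toy corpus: word 7 occurs in every image, so its idf weight is zero.
corpus = [
    [1, 1, 2, 7],
    [3, 4, 7],
    [1, 2, 2, 7],
    [5, 6, 7],
]
ranking = rank([1, 2], corpus)
```

In this toy corpus the word shared by every image gets weight zero, which is the "more common = less discriminative" rule in its extreme form. A real system would use an inverted index instead of scanning every image.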
Train the Dictionary
k-means
Approximate k-means (AKM)
Hierarchical k-means (HKM)
2D k-d tree
AKM vs. HKM
• Traditional k-means
– Single iteration
• O(NK)
• Strategy
– Reduce the number of candidate nearest cluster centers
– AKM
• Approximate nearest neighbor
– Replace the exact nearest-neighbor computation with
» 8 randomized k-d trees built over the cluster centers
• Less than 1% of points are assigned differently from k-means for moderate values of K
– HKM
• “Vocabulary tree”
– A small number (K=10) of cluster centers at each level
– K^n clusters at the n-th level
• Quantization effect
– AKM
• Conjunction of trees
– Overlapping partitions
– HKM
• Points can additionally be assigned to some internal nodes
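The HKM idea (a few centers per level, up to K^n leaves at level n) can be sketched as below. This is a toy illustration on random 2-D points under stated assumptions: `kmeans`, `build_tree`, and `quantize` are hypothetical names, and the exhaustive `nearest` search is what AKM would replace with an approximate search over randomized k-d trees (not shown here).

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the closest center: len(centers) distance computations per point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def kmeans(points, k, iters=10, rng=None):
    """Plain Lloyd iterations; each iteration costs O(N*K) distance computations."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(c) / len(g) for c in zip(*g))
    return centers

def build_tree(points, k, depth, rng):
    """Vocabulary tree: k centers per node, at most k**depth leaves."""
    if depth == 0 or len(points) < k:
        return None
    centers = kmeans(points, k, rng=rng)
    groups = [[] for _ in range(k)]
    for p in points:
        groups[nearest(p, centers)].append(p)
    children = [build_tree(g, k, depth - 1, rng) for g in groups]
    return {"centers": centers, "children": children}

def quantize(point, tree):
    """Greedy descent; the path of branch indices identifies the visual word."""
    path = []
    node = tree
    while node is not None:
        j = nearest(point, node["centers"])
        path.append(j)
        node = node["children"][j]
    return tuple(path)

rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(300)]
tree = build_tree(data, k=3, depth=2, rng=rng)
words = {quantize(p, tree) for p in data}
```

Quantizing a point costs K comparisons per level instead of one comparison per leaf, which is where the speed-up comes from. The greedy descent commits to a branch at each level, so a point near a boundary may land in a leaf that is not its true nearest one.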
Comparing vocabularies
k-means vs. AKM
HKM vs. AKM
Scaling up with AKM
Ground Truth
• Dataset
– 5K dataset
• Searching
– Manual search over the entire dataset
– For 11 landmarks
• Labels
– Positive
• Good: a nice, clear picture of the object
• OK: more than 25% of the object is visible
– Null
• Junk: less than 25% of the object is visible
– Negative
• Absent: object not present
• Queries
– 5 queries for each landmark
Evaluation
• Precision
– # of retrieved positive images / # of total retrieved
images
• Recall
– # of retrieved positive images / # of total positive
images
• Average precision (AP)
– The area under the precision-recall curve for a query
• Mean average precision (mAP)
– Average the AP over the 5 queries for a landmark
– Final mAP = average of the mAP over all landmarks
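The evaluation measures above can be sketched as follows. This sketch assumes a step (rectangle) approximation of the area under the precision-recall curve, and it assumes that "null" (Junk) images are simply skipped in the ranking so they neither help nor hurt; the function names are hypothetical.

```python
def average_precision(ranked_ids, positives, junk=frozenset()):
    """Step approximation of the area under the precision-recall curve for one query."""
    hits = 0   # positives retrieved so far
    seen = 0   # non-junk images retrieved so far
    ap = 0.0
    for image_id in ranked_ids:
        if image_id in junk:
            continue  # null images are ignored entirely
        seen += 1
        if image_id in positives:
            hits += 1
            ap += hits / seen  # precision at the rank where recall increases
    return ap / len(positives) if positives else 0.0

def mean_average_precision(aps_by_landmark):
    """Average AP per landmark first, then average over landmarks."""
    per_landmark = [sum(aps) / len(aps) for aps in aps_by_landmark.values()]
    return sum(per_landmark) / len(per_landmark)

# One query: positives a, b, c; image j is junk; image x is a negative.
ap = average_precision(["a", "j", "b", "x", "c"], {"a", "b", "c"}, junk={"j"})
# Two hypothetical landmarks with two queries each.
m_ap = mean_average_precision({"landmark_1": [1.0, 0.5], "landmark_2": [0.0, 0.5]})
```

For the toy query, precision is 1/1, 2/2, and 3/4 at the three positives, so the AP is (1 + 1 + 0.75) / 3.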
k-means vs. AKM
HKM vs. AKM
Recognition Benchmark
D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages
2161-2168, June 2006.
Scaling up with AKM
Spatial re-ranking
Use Spatial Info.
• Usage
– Re-ranking the top ranked results
• Procedure
1. Estimate a transformation for each target image
2. Refine the estimations
– Reduce the errors due to outliers
– LO-RANSAC
» RANdom SAmple Consensus
» Additional Local Optimization step
3. Re-rank target images
– Score each target image by the sum of the idf values of its inlier words
– Verified images rank above unverified images
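Step 3 of the procedure can be sketched as below. The sketch assumes the spatial verification stage has already produced, for each top-ranked image, the visual words of its inlier correspondences. The `min_inliers` threshold that decides whether an image counts as verified is a hypothetical parameter, not a value taken from the slides.

```python
def rerank(initial_ranking, inlier_words, idf, min_inliers=4):
    """Re-rank first-stage results using spatially verified correspondences.

    initial_ranking: image ids, best first, from the first-stage search.
    inlier_words:    image id -> visual words of correspondences that were inliers.
    idf:             visual word -> idf weight.
    """
    verified, unverified = [], []
    for image_id in initial_ranking:
        words = inlier_words.get(image_id, [])
        if len(words) >= min_inliers:
            score = sum(idf.get(w, 0.0) for w in words)
            verified.append((score, image_id))
        else:
            unverified.append(image_id)  # keeps its first-stage order
    # Python's sort is stable, so ties keep their first-stage order.
    verified.sort(key=lambda pair: -pair[0])
    return [image_id for _, image_id in verified] + unverified

initial = ["a", "b", "c", "d"]
inliers = {"a": [1], "b": [1, 2, 3, 4], "d": [5, 5, 5, 5, 5]}
idf = {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5, 5: 1.0}
reranked = rerank(initial, inliers, idf)
```

Here "d" and "b" are verified and sorted by their idf sums (5.0 and 2.0), while "a" and "c" stay below them in their original order.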
Restricted transformation
• Degrees of freedom
– 3 dof
• Isotropic scale
• Covering the changes in zoom or distance
– 4 dof
• Anisotropic scale
• Covering foreshortening (perspective), either horizontal or vertical
– 5 dof
• Anisotropic scale and vertical shear
• NOT
– In-plane rotation
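One way to write the three restricted transformations is given below. The parametrization is an assumption chosen to match the dof counts above (a, b, c are scale and shear parameters, and t_x, t_y is a translation); the slides do not give the matrices themselves.

```latex
\begin{aligned}
\text{3 dof:}\quad &
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \end{bmatrix} \\[4pt]
\text{4 dof:}\quad &
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \end{bmatrix} \\[4pt]
\text{5 dof:}\quad &
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} a & 0 \\ b & c \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
\end{aligned}
```

In all three forms the vertical direction (0, 1) maps to a vertical direction, so none of them can represent an in-plane rotation.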
Comparing spatial rankings
Different transformation types
Large datasets
Examples
Examples of errors
Conclusion
• Conclusion
– Scalable visual object-retrieval system
• Future work
– More evaluation at larger scale
– Including spatial info. in the index
– Moving some of the burden of spatial matching to the first ranking stage
RANSAC
http://en.wikipedia.org/wiki/RANSAC
RANSAC example
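A minimal RANSAC sketch on the classic line-fitting problem is given below. It is an illustration of the general algorithm, not of the spatial matching used in the talk: the model here is a 2-D line rather than an image transformation, the tolerance and iteration count are arbitrary assumptions, and the local optimization step of LO-RANSAC is not included.

```python
import random

def fit_line(p, q):
    """Line through two points as (slope, intercept); None if the line is vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, n_iters=200, tol=0.1, seed=0):
    """Keep the two-point hypothesis that is supported by the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        model = fit_line(*rng.sample(points, 2))  # minimal sample: 2 points
        if model is None:
            continue
        slope, intercept = model
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# 20 points exactly on y = 2x + 1, plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (7.0, -12.0), (11.0, 90.0), (15.0, -30.0), (18.0, 5.0)]
model, inliers = ransac_line(points)
```

Because each hypothesis comes from a minimal sample, a single pair of good points is enough to recover the line, and the outliers only affect how many iterations are needed to draw such a pair.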