
Lost in Quantization: Improving
Particular Object Retrieval in Large
Scale Image Databases
CVPR 2008
James Philbin
Ondřej Chum
Michael Isard
Josef Sivic
Andrew Zisserman
[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query
expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.
Outline
• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion
Introduction
• Goal
– Specific object retrieval from an image database
• For large databases
– Achieved by systems inspired by text retrieval (visual words)
Flow
1. Get features
– SIFT
2. Cluster
– Approximate k-means
3. Feature quantization
– Visual words
– Soft-assignment (query)
4. Re-rank
– RANSAC
5. Query expansion
– Average query expansion
Outline
• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion
Feature
• SIFT
Quantization (visual word)
• Point List = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
• Sorted List = [(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)]
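The point list above hints at the nearest-neighbour lookup behind quantization. A toy sketch of hard assignment follows; the 2-D points and the three cluster centres are illustrative assumptions (real SIFT descriptors are 128-D, and the paper uses approximate k-means for the lookup):

```python
import math

# The slide's 2-D "descriptors" and an assumed toy vocabulary of centres
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
centers = [(3, 3), (8, 2), (5, 6)]  # hypothetical cluster centres (visual words)

def quantize(p, centers):
    """Hard-assign point p to the index of its nearest cluster centre."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

words = [quantize(p, centers) for p in points]  # one visual word per descriptor
```

Each descriptor collapses to a single word index; soft-assignment, introduced next, keeps more of the descriptor's position.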
Soft-assignment of visual words
• Matching two image features with bag-of-visual-words under hard-assignment:
– Yes if assigned to the same visual word
– No otherwise
• Soft-assignment:
– A weighted combination of visual words
Soft-assignment of visual words
A–E represent cluster centers (visual words); points 1–4 are features
Soft-assignment of visual words
• Weight assigned to a visual word: exp(−d² / 2σ²)
– d is the distance from the cluster center to the descriptor
• In practice σ is chosen so that a substantial weight is only assigned to a few cells
• The essential parameters
– σ, the spatial scale
– r, the number of nearest neighbors considered
Soft-assignment of visual words
• Assigning the weights to the r nearest neighbors, the descriptor is represented by an r-vector, which is then L1-normalized
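The weighting described above can be sketched in a few lines; the function name and the example distances are illustrative, not from the paper:

```python
import math

def soft_assign(dists, sigma=1.0):
    """Soft-assignment weights for the r nearest visual words:
    w_i ∝ exp(-d_i² / (2 σ²)), then L1-normalized so the weights sum to 1.
    `dists` are the distances from the descriptor to its r nearest centres."""
    w = [math.exp(-d * d / (2 * sigma * sigma)) for d in dists]
    s = sum(w)
    return [x / s for x in w]

# e.g. a descriptor whose 3 nearest centres lie at distances 1, 2, 3
weights = soft_assign([1.0, 2.0, 3.0], sigma=1.0)
```

The Gaussian falloff means the closest centre dominates; with a small σ the weights of all but a few cells become negligible, which is exactly the behaviour the slide describes.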
TF–IDF weighting
• Standard index architecture
• Each visual word is weighted by tf × idf
TF–IDF weighting
• tf
– a document of 100 words contains ‘a’ 3 times
– tf = 0.03 (3/100)
• idf
– 1,000 documents contain ‘a’, out of 10,000,000 documents total
– idf = 9.21 ( ln(10,000,000 / 1,000) )
• tf–idf = 0.28 (0.03 × 9.21)
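The worked example above maps directly to code; this is a generic tf–idf sketch using the slide's numbers, not the paper's indexing code:

```python
import math

def tf_idf(term_count, doc_len, docs_with_term, total_docs):
    """tf–idf as in the slide: tf = count/length, idf = ln(N / n_t)."""
    tf = term_count / doc_len
    idf = math.log(total_docs / docs_with_term)
    return tf * idf

# the slide's example: 'a' occurs 3 times in a 100-word document,
# and appears in 1,000 of 10,000,000 documents
score = tf_idf(3, 100, 1_000, 10_000_000)
```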
TF–IDF weighting
• In this paper
– For the term frequency (tf)
• we simply use the normalized weight value for each visual word
– For the inverse document frequency (idf)
• counting an occurrence of a visual word as one, no matter how small its weight, gave the best results
Re-ranking
• RANSAC
– Affine transform Θ : Y = AX + b
• Algorithm
– 1. Randomly choose n points
– 2. Use the n points to estimate Θ
– 3. Apply Θ to the remaining N − n points
– 4. Count the inliers
– Repeat steps 1–4 K times
– Keep the best Θ
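The loop above can be sketched for the 2-D affine case, where n = 3 correspondences determine Θ exactly. This is a toy illustration of the RANSAC idea, not the paper's spatial-verification code; all names and the sample data are hypothetical:

```python
import math
import random

def det3(m):
    """Determinant of a 3×3 matrix (list of rows)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_affine(src, dst):
    """Exact affine Y = AX + b from 3 correspondences via Cramer's rule.
    Returns two rows (a, b, c) with x' = a·x + b·y + c, or None if degenerate."""
    M = [[x, y, 1.0] for x, y in src]
    d = det3(M)
    if abs(d) < 1e-9:                      # collinear sample: no unique affine
        return None
    params = []
    for k in range(2):                     # solve separately for x' and y'
        col = [dst[i][k] for i in range(3)]
        row = []
        for j in range(3):
            Mj = [r[:] for r in M]
            for i in range(3):
                Mj[i][j] = col[i]
            row.append(det3(Mj) / d)
        params.append(tuple(row))
    return params

def apply_affine(params, p):
    x, y = p
    return tuple(a * x + b * y + c for a, b, c in params)

def ransac_affine(src, dst, iters=200, thresh=1.0, seed=0):
    """Toy RANSAC: sample 3 correspondences, fit Θ, count inliers, keep the best."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        idx = rng.sample(range(len(src)), 3)
        theta = fit_affine([src[i] for i in idx], [dst[i] for i in idx])
        if theta is None:
            continue
        inliers = sum(1 for p, q in zip(src, dst)
                      if math.dist(apply_affine(theta, p), q) < thresh)
        if inliers > best_inliers:
            best, best_inliers = theta, inliers
    return best, best_inliers

# toy correspondences under a known affine map, plus one gross outlier
true = [(1.1, 0.2, 1.0), (-0.1, 0.9, 2.0)]
src = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3), (5, 4)]
dst = [apply_affine(true, p) for p in src]
dst[5] = (100.0, 100.0)                    # outlier correspondence
theta, inliers = ransac_affine(src, dst)
```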
Re-ranking
• In this paper
– Not only counting the number of inlier correspondences, but also using a scoring function, e.g. cosine similarity: cos θ = (x · y) / (‖x‖ ‖y‖)
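For reference, the cosine similarity between two tf (or tf–idf) vectors is a one-liner; this generic sketch is not tied to the paper's scoring code:

```python
import math

def cosine(x, y):
    """Cosine similarity between two tf (or tf-idf) vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0
```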
Average query expansion
• Obtain the top (m < 50) verified results of the original query
• Construct a new query as the average of these results:
– d_avg = (1 / (m + 1)) · ( d0 + Σ_{i=1}^{m} d_i )
– where d0 is the normalized tf vector of the query region
– di is the normalized tf vector of the i-th result
• Requery once
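The averaging step above is a direct translation of the d_avg formula; the function name is illustrative:

```python
def average_query_expansion(d0, results):
    """New query vector d_avg = (1/(m+1)) * (d0 + sum of the top-m
    verified result vectors), per the formula on the slide."""
    m = len(results)
    return [(d0[j] + sum(d[j] for d in results)) / (m + 1)
            for j in range(len(d0))]

# e.g. a 2-word query vector averaged with two verified results
d_avg = average_query_expansion([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]])
```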
Outline
• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion
Dataset
• Crawled from Flickr, high resolution (1024×768)
• Oxford buildings
– 5,062 high-resolution (1024×768) images
– 11 landmarks used as queries
• Paris
– Used for quantization
– 6,300 images
• Flickr1
– Crawled from the 145 most popular tags
– 99,782 images
Dataset
• Query
– 55 queries: 5 queries for each of 11 landmarks
Baseline
• Follows the architecture of previous work [15]
• A visual vocabulary of 1M words is generated using approximate k-means
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
Evaluation
• Compute the Average Precision (AP) score for each of the 5 queries for a landmark
– Area under the precision–recall curve
• Precision = RPI / TNIR
• Recall = RPI / TNPC
– RPI = retrieved positive images
– TNIR = total number of images retrieved
– TNPC = total number of positives in the corpus
• Average these to obtain a Mean Average Precision (MAP)
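The AP/MAP computation above can be sketched as follows, using the common non-interpolated definition of AP (mean of the precision values at each rank where a positive is retrieved); the function names and example data are illustrative:

```python
def average_precision(ranked, positives):
    """Non-interpolated AP: average precision at each rank where a
    positive image is retrieved, divided by the number of positives.
    ranked: image ids in retrieval order; positives: set of relevant ids."""
    hits, total = 0, 0.0
    for rank, img in enumerate(ranked, start=1):
        if img in positives:
            hits += 1
            total += hits / rank        # precision at this recall point
    return total / len(positives) if positives else 0.0

def mean_average_precision(queries):
    """MAP over (ranked_list, positive_set) pairs, one per query."""
    return sum(average_precision(r, p) for r, p in queries) / len(queries)

# e.g. positives 'a' and 'c' retrieved at ranks 1 and 3
ap = average_precision(['a', 'b', 'c', 'd'], {'a', 'c'})
```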
Evaluation
• Datasets
– Oxford (D1) only: 5,062 images
– Oxford (D1) + Flickr1 (D2): 104,844 images
• Vector quantizers
– Built on Oxford or Paris
Result
Parameter variation
Comparison with other methods
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
[14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.
[18] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. In Proc. ICCV, 2007.
Result
Spatial verification
Effect of vocabulary size
Result
Query expansion
Scaling-up to 100K images
Result
• ashmolean_3 goes from 0.626 to 0.874 AP
• christ_church_5 increases from 0.333 to 0.813 AP
Outline
• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion
Conclusion
• A new method of visual word assignment was introduced:
– descriptor-space soft-assignment
• It recovers descriptor information that is lost in the quantization step of previously published methods