Towards efficient retrieval and regression with random

Towards efficient matching with random
hashing methods…
Kristen Grauman
Gregory Shakhnarovich
Trevor Darrell
MIT CSAIL
Vision interfaces
Motivation: Content-based image retrieval
Query
Data set of 30 scenes in Boston
• 1,079 database images
• 89 query images
Features:
• Harris-Affine detector
(max m=3,595)
• MSER detector
(max m=1,707)
• SIFT-PCA descriptors
Content-based image retrieval
Optimal match:
~2 hours / query
Pyramid match:
~1 second / query
Even this is far too slow for
any web-scale application!
[Plot: accuracy vs. number of top retrievals]
Sub-linear time image search
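Randomized hashing techniques are useful for
sub-linear query time on very large image
databases.
[Figure: linear scan over all N database items vs. a hash function h that maps items to binary codes (0110101, 0110111, 0111101), so that only << N items are searched]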
Pyramid match hashing
• For fixed-size sets, Locality-Sensitive
Hashing [Indyk & Motwani 1998] provides
bounded approximate similarity search over
bijective matching [Indyk & Thaper 2003];
[Grauman & Darrell CVPR 2004, 2005]
• For varying set sizes, embedding of
pyramid match (with product normalization)
makes random hyperplane hashing possible
under set intersection hash family of
[Charikar 2002]. [Grauman PhD 2006]
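A minimal sketch of random hyperplane hashing in the sense of [Charikar 2002], assuming the sets have already been embedded as fixed-length vectors. The pyramid match embedding itself is not reproduced here. The names `make_hyperplanes`, `hash_bits`, and `hamming`, and the toy 3-D vectors, are illustrative assumptions rather than anything taken from the slides.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane normal per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_bits(x, hyperplanes):
    """Each bit records which side of a hyperplane the vector x falls on."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, x)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(p != q for p, q in zip(a, b))

# Toy vectors (hypothetical): `near` makes a small angle with x, `far` a large one.
planes = make_hyperplanes(num_bits=64, dim=3, seed=0)
x = [1.0, 0.2, 0.0]
near = [1.0, 0.25, 0.05]
far = [-1.0, 0.1, 0.3]

print(hamming(hash_bits(x, planes), hash_bits(near, planes)))
print(hamming(hash_bits(x, planes), hash_bits(far, planes)))
```

For a single random hyperplane, two vectors at angle θ receive different bits with probability θ/π, so the Hamming distance between the codes tends to grow with the angle between the vectors.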
Single Frame Pose Estimation via
Approximate Nearest Neighbor regression
• Obtain large DB of pose-appearance mappings
• Exploit fast methods for approximate nearest
neighbor search in high-dimensional spaces (e.g., LSH
[Indyk and Motwani ’98-’00]).
Approximate nearest neighbor techniques
[Figure: the input is passed through hash functions and looked up in a rendered (and hashed) pose database; similar examples fall into the same bucket in one or more hash tables]
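The bucket lookup on this slide can be sketched as a small multi-table index. This is a hedged illustration, not the system from the talk: the class `LSHIndex`, its parameters (`num_tables`, `bits_per_table`), and the use of random hyperplane bits as the hash functions are all assumptions made for the example.

```python
import random
from collections import defaultdict

class LSHIndex:
    """Toy LSH index: several hash tables, each keyed by a few random-hyperplane bits."""

    def __init__(self, dim, num_tables=8, bits_per_table=6, seed=0):
        rng = random.Random(seed)
        # One independent set of hyperplanes per table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    @staticmethod
    def _key(x, planes):
        # Bucket key: the sign pattern of x against this table's hyperplanes.
        return tuple(sum(w * v for w, v in zip(p, x)) >= 0 for p in planes)

    def add(self, item_id, x):
        """Store item_id in one bucket of every table."""
        for planes, table in zip(self.planes, self.tables):
            table[self._key(x, planes)].append(item_id)

    def candidates(self, query):
        """Union of the buckets the query falls into, across all tables."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(query, planes), ()))
        return found

index = LSHIndex(dim=3)
index.add("a", [1.0, 0.2, 0.0])
index.add("b", [-1.0, 0.1, 0.3])
print(index.candidates([1.0, 0.25, 0.05]))
```

Only the candidates returned by the buckets need an exact comparison with the query, which is where the sub-linear query time comes from. Using more tables raises the chance that a true neighbor shares at least one bucket with the query, at the cost of more memory.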
Single Frame Pose Estimation via
Approximate Nearest Neighbor regression
• Render large DB of pose-appearance mappings
• Exploit fast methods for approximate nearest
neighbor search in high-dimensional spaces (e.g., LSH
[Indyk and Motwani ’98-’00]).
Problem: signal distance is dominated by nuisance
variables
Idea: find an embedding (i.e., hash functions for LSH)
most relevant to parameter (pose) similarity…
[Shakhnarovich et al. ’03, Shakhnarovich ’05]
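Once similar examples have been retrieved, the regression step can be as simple as combining their stored poses. The sketch below assumes a distance-weighted average over the k nearest retrieved examples; the function name `knn_regress`, the weighting scheme, and the toy data are illustrative assumptions, and the cited work may use a different local model.

```python
import math

def knn_regress(query, examples, k=3):
    """Estimate a pose as the distance-weighted mean of the k nearest examples.

    `examples` is a list of (feature_vector, pose_vector) pairs, e.g. the
    candidates returned by an approximate nearest neighbor search.
    """
    ranked = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    # Closer examples get larger weights; the small constant avoids division by zero.
    weights = [1.0 / (math.dist(query, feat) + 1e-9) for feat, _ in ranked]
    total = sum(weights)
    pose_dim = len(ranked[0][1])
    return [
        sum(w * pose[d] for w, (_, pose) in zip(weights, ranked)) / total
        for d in range(pose_dim)
    ]

# Hypothetical database: 2-D appearance features mapped to a 1-D pose (an angle).
examples = [([0.0, 0.0], [10.0]), ([1.0, 0.0], [20.0]), ([5.0, 5.0], [90.0])]
estimate = knn_regress([0.5, 0.0], examples, k=2)
print(estimate)
```

The quality of the estimate depends entirely on whether the feature distance reflects pose similarity, which is the problem this slide raises.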
Pose estimation and Similarity-sensitive
hashing
[Figure: the input is passed through pose-sensitive hash functions and looked up in the rendered (and hashed) pose database; the nearest neighbors are similar in pose, not in image]
[Shakhnarovich et al. ’03, Shakhnarovich ’05]
SSE / BoostPro
Similarity Sensitive Embedding
- Compute an embedding $H : I \to \{0, 1\}^N$ such that
  $|H(I(\theta_1)) - H(I(\theta_2))|$ is small if $\theta_1$ is close to $\theta_2$
  $|H(I(\theta_1)) - H(I(\theta_2))|$ is large otherwise
- Use the embedding with approximate nearest
  neighbor retrieval (LSH)
- Find H by training a boosted classifier to learn
  “same-pair” and concatenating the resulting weak
  learners …
[Shakhnarovich 2005]
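To make the idea of learning the embedding bits concrete, here is a deliberately simplified stand-in. It scores single-feature threshold functions by how well "same bit / different bit" agrees with the pair labels, and keeps the best ones. This is a greedy selection, not the boosting procedure named on the slide, and the names `bit`, `score`, `select_bits`, `embed`, and the toy pairs are assumptions for illustration.

```python
def bit(x, dim, threshold):
    """Candidate hash bit: threshold a single feature dimension."""
    return 1 if x[dim] > threshold else 0

def score(pairs, dim, threshold):
    """Fraction of pairs where bit agreement matches the similar/dissimilar label."""
    correct = 0
    for a, b, similar in pairs:
        agree = bit(a, dim, threshold) == bit(b, dim, threshold)
        correct += (agree == similar)
    return correct / len(pairs)

def select_bits(pairs, thresholds, num_bits):
    """Keep the num_bits (dimension, threshold) candidates with the best score."""
    num_dims = len(pairs[0][0])
    scored = [(score(pairs, d, t), d, t) for d in range(num_dims) for t in thresholds]
    scored.sort(key=lambda s: -s[0])  # stable sort: ties keep candidate order
    return [(d, t) for _, d, t in scored[:num_bits]]

def embed(x, bits):
    """Concatenate the selected bits into a binary code."""
    return tuple(bit(x, d, t) for d, t in bits)

# Toy pairs (hypothetical): feature 0 tracks pose, feature 1 is a nuisance variable.
pairs = [
    ([0.1, 0.9], [0.2, 0.1], True),   # similar pose, different nuisance value
    ([0.8, 0.2], [0.9, 0.8], True),
    ([0.1, 0.5], [0.9, 0.5], False),  # different pose, same nuisance value
    ([0.2, 0.9], [0.8, 0.9], False),
]
chosen = select_bits(pairs, thresholds=[0.5], num_bits=1)
print(chosen)
print(embed([0.1, 0.9], chosen), embed([0.2, 0.1], chosen))
```

In this toy data the selected bit ignores the nuisance feature, so examples with similar pose receive the same code even when their raw feature vectors are far apart.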
PSH results
~200,000 examples in DB; 2 sec
[Shakhnarovich et al. 2003, 2005]
Conclusions
• Random hashing techniques allow broad search;
they are well suited to very high-dimensional spaces
• Useful in domains where there is no prior knowledge
about how to cluster or model the data…
• Similarity (parameter) sensitive hashing can find a
distance related to the task… it effectively learns a
problem-dependent distance measure and an efficient
means to index.