We take pictures every day, everywhere


Recent Advances of Compact Hashing
for Large-Scale Visual Search
Shih-Fu Chang
Columbia University
October 2012
Joint work with Junfeng He (Facebook), Sanjiv Kumar (Google),
Wei Liu (IBM Research), and Jun Wang (IBM Research)
Outline

• Lessons learned in designing hashing functions
  – The importance of balancing hash bucket size
  – How to incorporate supervised information
• Prediction of NN search difficulty & hashing performance
• Demo: Bag of hash bits for Mobile Visual Search
Fast Nearest Neighbor Search
• Applications: image search, texture synthesis, denoising …
• Avoid exhaustive search (O(n) time complexity)
[Figures: image search; dense matching with coherence sensitive hashing (Korman & Avidan '11); photo tourism patch search]
Locality-Sensitive Hashing
[Indyk and Motwani 1998] [Datar et al. 2004]
[Figure: random hash functions; each data point is indexed by a compact binary code]
• hash code collision probability proportional to original similarity
l: # hash tables, K: hash bits per table
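To make the random-hash-function idea above concrete, here is a minimal random-hyperplane LSH sketch in Python (the Gaussian projections, parameter names, and toy data are illustrative assumptions, not from the slides):

```python
import numpy as np

def make_lsh_tables(dim, num_tables, bits_per_table, seed=0):
    """l tables, each with K random-hyperplane hash functions (cosine LSH)."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(dim, bits_per_table)) for _ in range(num_tables)]

def hash_code(x, projections):
    """K-bit code for one table: the sign pattern of random projections."""
    return tuple((x @ projections > 0).astype(np.uint8))

# similar vectors collide (share a code) with probability related to their similarity
tables = make_lsh_tables(dim=128, num_tables=4, bits_per_table=16)
x = np.random.randn(128)
y = x + 0.05 * np.random.randn(128)                              # a near-duplicate of x
print(sum(hash_code(x, P) == hash_code(y, P) for P in tables))   # tables where x and y collide
```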
Hash Table based Search
[Figure: hash table with bucket addresses 01100, 01101, 01110, 01111, ...; the query q hashes to address 01101 and retrieves the points xi stored in that bucket]
• O(1) search time by table lookup
• bucket size is important (affects accuracy & post-processing cost)
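A minimal dictionary-based bucket lookup illustrating the O(1) table search above (the 5-bit codes mirror the figure; the random codes are purely illustrative):

```python
import numpy as np
from collections import defaultdict

# index database codes (random 5-bit codes, as in the figure) into hash buckets
rng = np.random.default_rng(0)
db_codes = rng.integers(0, 2, size=(1000, 5), dtype=np.uint8)
table = defaultdict(list)
for i, code in enumerate(db_codes):
    table[code.tobytes()].append(i)      # bucket address -> list of point ids

# query: compute the query's code with the same hash functions, then O(1) bucket lookup
q_code = db_codes[42]
candidates = table[q_code.tobytes()]     # post-processing (exact re-ranking) runs only on these
print(len(candidates))                   # bucket size drives both accuracy and re-ranking cost
```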
Different Approaches
• Unsupervised Hashing: LSH '98, SH '08, KLSH '09, AGH '10, PCAH, ITQ '11
• Semi-Supervised Hashing: SSH '10, WeaklySH '10
• Supervised Hashing: RBM '09, BRE '10, MLH '11, LDAH '11, ITQ '11, KSH '12
PCA + Minimize Quantization Errors
ITQ method, Gong & Lazebnik, CVPR '11
• PCA to maximize variance in each hash dimension
• find an optimal rotation in the PCA subspace to minimize quantization error
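A compact sketch of the PCA-plus-rotation idea, following the alternating scheme described in the ITQ paper as I understand it (treat it as an illustration, not a reference implementation):

```python
import numpy as np

def itq_codes(X, num_bits, n_iters=50, seed=0):
    """PCA projection followed by a rotation learned to minimize quantization error."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)             # PCA directions of max variance
    V = Xc @ Vt[:num_bits].T                                      # n x k projected data
    R = np.linalg.qr(rng.normal(size=(num_bits, num_bits)))[0]    # random initial rotation
    for _ in range(n_iters):
        B = np.sign(V @ R)                 # fix R: binary codes are the sign of the rotated data
        U, _, Wt = np.linalg.svd(B.T @ V)  # fix B: orthogonal Procrustes update of R
        R = (U @ Wt).T
    return (V @ R > 0).astype(np.uint8)    # final binary codes
```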
Effects of Min Quantization Errors
• 580K tiny images
[Figure: PCA + random rotation vs. PCA-ITQ optimal alignment (Gong & Lazebnik, CVPR '11)]
Utilize Supervised Labels
• Semantic category supervision
• Metric supervision (pairwise similar / dissimilar constraints)
Design Hash Codes to Match Supervised Information
• Preferred hashing function: assign similar pairs the same hash bits and dissimilar pairs different bits
Adding Supervised Labels to PCA Hash
• objective: fit the pairwise labels (similar / dissimilar pairs); after relaxation, the PCA covariance matrix becomes an "adjusted" covariance matrix that incorporates the label matrix S
• solution W: eigenvectors of the adjusted covariance matrix
• if no supervision (S = 0), it is simply PCA hash
Wang, Kumar, Chang, CVPR ’10, ICML’10
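As a rough illustration of the adjusted-covariance idea above, a minimal sketch (the weighting eta, the function names, and the exact form of the objective are assumptions; see the CVPR '10 / ICML '10 papers for the actual SSH formulation):

```python
import numpy as np

def ssh_projections(X, X_l, S, num_bits, eta=1.0):
    """Eigenvectors of an 'adjusted' covariance mixing labels and variance.

    X   : d x n matrix of all (centered) data points
    X_l : d x l matrix of the labeled points
    S   : l x l pairwise label matrix (+1 similar, -1 dissimilar, 0 unknown)
    eta : weight of the unsupervised (PCA variance) term -- an assumption here
    """
    supervised = X_l @ S @ X_l.T         # rewards projections that respect the labels
    unsupervised = X @ X.T               # plain PCA covariance over all data
    adjusted_cov = supervised + eta * unsupervised
    eigvals, eigvecs = np.linalg.eigh(adjusted_cov)
    return eigvecs[:, np.argsort(eigvals)[::-1][:num_bits]]   # top eigenvectors = W

def ssh_hash(X, W):
    return (W.T @ X > 0).astype(np.uint8)   # binary codes: sign of the projections
```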
Semi-Supervised Hashing (SSH)
1 Million GIST Images
1% labels, 99% unlabeled
[Plot: precision @ top 1K for SSH, supervised RBM, unsupervised SH, and random LSH]
Problem of orthogonal projections
[Plot: precision @ Hamming radius 2]
• Many buckets become empty when # bits increases
• Need to search many neighbor buckets at query time
ICA Type Hashing
• Explicitly optimize two terms
SPICA Hash, He et al, CVPR 11
– Preserve similarity (accuracy)
– Balanced bucket size → max entropy → min mutual information I (search time)
min D(Y) = Σ_{p,q=1..N} W_pq ||Y_p − Y_q||²   (search accuracy)
min I(y_1, ..., y_k, ..., y_M)                (balanced bucket size)
s.t. E(y) = Σ_{p=1..N} Y_p = 0
Fast ICA to find non-orthogonal projections
The Importance of Balanced Bucket Size
• Simulation over 1M tiny image samples
[Plot: bucket size vs. bucket index for LSH and SPICA Hash; the largest LSH bucket contains 10% of all 1M samples, while SPICA Hash keeps bucket sizes balanced]
Different Approaches
• Unsupervised Hashing: LSH '98, SH '08, KLSH '09, AGH '10, PCAH, ITQ '11
• Semi-Supervised Hashing: SSH '10, WeaklySH '10
• Supervised Hashing: RBM '09, BRE '10, MLH '11, LDAH '11, ITQ '11, KSH '12
Better Ways to Handle Supervised Information?
• BRE [Kulis & Darrell, '10]: loss defined on the Hamming distance between H(xi) and H(xj)
• MLH [Norouzi & Fleet, '11]: hinge loss on the Hamming distance
• But optimizing the Hamming distance (D_H, XOR-based) directly is not easy!
A New Supervision Form: Code Inner Products
Liu, Wang, Ji, Jiang, Chang, CVPR '12
[Figure: supervised hashing maps labeled data x1, x2, x3 to an r-bit code matrix with ±1 entries; the product of the code matrix with its transpose (the code inner products) is fitted to the pairwise label matrix S, where x1 and x2 are similar and x3 is dissimilar to both]
• code inner product ≡ Hamming distance
• proof sketch: for r-bit codes in {−1, +1}^r, code_i · code_j = (# matching bits) − (# mismatching bits) = r − 2 D_H(code_i, code_j)
Code Inner Product Enables Efficient Optimization
Liu, Wang, Ji, Jiang, Chang, CVPR 2012
[Figure: the code matrix (hash bits × samples) is designed so that its inner products match the supervised information]
• Much easier/faster to optimize and extends to kernels
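A small Python sketch, not the KSH optimizer itself, that checks the inner-product / Hamming-distance identity and evaluates an inner-product fitting objective of the form ||(1/r)·B·Bᵀ − S||²_F (the 1/r scaling here is an assumption chosen to match the identity above):

```python
import numpy as np

def hamming_distance(b_i, b_j):
    """Hamming distance between two codes given as +/-1 vectors."""
    return int(np.sum(b_i != b_j))

def check_identity(B):
    """code_i . code_j == r - 2 * D_H(code_i, code_j) for every pair of rows."""
    n, r = B.shape
    for i in range(n):
        for j in range(n):
            assert int(B[i] @ B[j]) == r - 2 * hamming_distance(B[i], B[j])

def fitting_loss(B, S):
    """|| (1/r) B B^T - S ||_F^2 : how well code inner products match the labels."""
    r = B.shape[1]
    return float(np.linalg.norm(B @ B.T / r - S, ord="fro") ** 2)

# toy example mirroring the slide: x1 and x2 similar, x3 dissimilar to both
B = np.array([[ 1, -1,  1],
              [ 1, -1,  1],
              [-1,  1, -1]])      # 3 samples x r = 3 bits, entries in {-1, +1}
S = np.array([[ 1,  1, -1],
              [ 1,  1, -1],
              [-1, -1,  1]])      # pairwise label matrix
check_identity(B)
print(fitting_loss(B, S))          # 0.0 -- these codes match the labels exactly
```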
Extend Code Inner Product to Kernel
• Following KLSH, construct a hash function using a kernel function and m anchor samples:
  h(x) = sgn( Σ_{j=1..m} a_j k̄(x_j, x) ), with zero-mean normalization applied to k(x)
[Figure: the code matrix for l samples = sgn( kernel matrix (l samples × m anchors) × hash coefficients )]
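A sketch of the kernelized hash family above, assuming an RBF kernel and randomly chosen anchors; the coefficient matrix A is random here for illustration, whereas the supervised method learns it from the label matrix:

```python
import numpy as np

def rbf_kernel(X, anchors, gamma=1.0):
    """RBF kernel between rows of X (n x d) and anchors (m x d)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_hash(X, anchors, A, k_mean):
    """r-bit codes: sgn( (K(X, anchors) - mean) @ A ), as in the slide's figure."""
    K = rbf_kernel(X, anchors) - k_mean          # zero-mean normalization of k(x)
    return np.where(K @ A >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))                            # toy data
anchors = X[rng.choice(len(X), 32, replace=False)]         # m = 32 anchor samples
k_mean = rbf_kernel(X, anchors).mean(axis=0)               # per-anchor kernel mean
A = rng.normal(size=(32, 16))                              # m x r coefficients (learned in KSH)
codes = kernel_hash(X, anchors, A, k_mean)                 # 1000 x 16 codes in {-1, +1}
```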
Benefits of Code Inner Product
[Plot: comparison against other supervised methods]
• CIFAR-10: 60K object images from 10 classes, 1K query images
• 1K supervised labels
• KSH0: spectral relaxation; KSH: sigmoid hashing function
• Open issue: empty buckets and bucket balance not addressed
Speedup by Code Inner Product (CVPR 2012)
Train and test times with 48-bit codes:

Method   Train Time   Test Time
SSH      2.1          0.9×10⁻⁵
LDAH     0.7          0.9×10⁻⁵
BRE      494.7        2.9×10⁻⁵
MLH      3666.3       1.8×10⁻⁵
KSH0     7.0          3.3×10⁻⁵
KSH      156.1        4.3×10⁻⁵

KSH trains far faster than BRE and MLH: a significant speedup.
Tiny-1M: Visual Search Results
[Figure: example retrieval results; more visually relevant (CVPR 2012)]
Comparison of Hashing vs. KD-Tree
[Plot: KD-tree vs. supervised hashing vs. anchor graph hashing]
• Photo Tourism patch set (Notre Dame subset, 103K samples)
• 512-D GIST features
Understand the Difficulty of Approximate Nearest Neighbor Search
He, Kumar, Chang, ICML 2012
• How difficult is approximate nearest neighbor search in a dataset?
[Toy example: a query q for which search is not meaningful]
• x is an ε-approximate NN if D(q, x) ≤ (1 + ε) D(q, x_nn)
A concrete measure of difficulty of search in a dataset?
Relative Contrast
He, Kumar, Chang, ICML 2012
• A naïve search approach: randomly pick a point and compare it to the NN
• Relative contrast for one query q:
  C_r = D_random / D_nn = E_x[D(q, x)] / D(q, x_nn)
  averaged over queries:
  C_r = E_{q,x}[D(q, x)] / E_q[D(q, x_nn)]
• High relative contrast → easier search
• If C_r → 1, search not meaningful
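A small sketch of measuring relative contrast empirically on a dataset (the notation follows the definition above; the Monte Carlo averaging over a query set is an implementation choice, not from the slides):

```python
import numpy as np

def relative_contrast(data, queries, p=2):
    """C_r = E_{q,x}[D(q, x)] / E_q[D(q, x_nn)] under the L_p distance."""
    d_random, d_nn = [], []
    for q in queries:
        dists = np.linalg.norm(data - q, ord=p, axis=1)
        d_random.append(dists.mean())    # expected distance to a random point
        d_nn.append(dists.min())         # distance to the nearest neighbor
    return np.mean(d_random) / np.mean(d_nn)

rng = np.random.default_rng(0)
data = rng.random((100_000, 128))        # toy uniform data, d = 128
queries = rng.random((100, 128))
print(relative_contrast(data, queries))  # values near 1 mean search is barely meaningful
```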
Estimation of Relative Contrast
• With the CLT and a binomial approximation:
  C_r = D_random / D_nn ≈ 1 / [1 + φ⁻¹(φ(−1/σ′) + 1/n) σ′]^(1/p)
  φ: standard Gaussian cdf
  n: data size
  p: Lp distance
  σ′: a function of data properties (dimensionality and sparsity)
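A sketch of evaluating this closed-form estimate with SciPy (σ′ must be supplied; deriving it from dimensionality and sparsity follows the ICML 2012 paper and is not reproduced here):

```python
from scipy.stats import norm

def estimated_relative_contrast(n, p, sigma_prime):
    """C_r ~= 1 / [1 + phi^{-1}(phi(-1/sigma') + 1/n) * sigma']^(1/p)."""
    inner = norm.ppf(norm.cdf(-1.0 / sigma_prime) + 1.0 / n)   # phi^{-1}(phi(-1/s') + 1/n)
    return 1.0 / (1.0 + inner * sigma_prime) ** (1.0 / p)

# larger n raises the contrast; larger p lowers it (consistent with the synthetic plots below)
print(estimated_relative_contrast(n=1_000_000, p=1, sigma_prime=0.15))
```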
Synthetic Data
[Plots: relative contrast vs. d and vs. s]
• Data sampled randomly from U[0,1]
• d: feature dimension (higher dimensionality → bad)
• s: prob. of a non-zero element in each dim. (sparser vectors → good)
Synthetic Data
[Plots: relative contrast vs. database size and vs. p]
• Data sampled randomly from U[0,1]
• Larger database → good
• Lower p → good
Predict Hashing Performance of Real-World Data

Dataset        Dimensionality (d)   Sparsity (s)   Relative Contrast (Cr), p = 1
SIFT           128                  0.89           4.78
Gist           384                  1.00           1.83
Color Hist     1382                 0.027          3.19
Imagenet BoW   10000                0.024          1.90

[Plots: retrieval performance for 12-32 bit codes, including LSH baselines]
Mobile Search System by Hashing
• Light computing
• Low bit rate
• Big data indexing
He, Feng, Liu, Cheng, Lin, Chung, Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking, CVPR 2012.
Estimate the Complexity
• 500 local features per image
– Feature size ~128 Kbytes
– more than 10 seconds for transmission over 3G
• Database indexing
– 1 million images need 0.5 billion local features
– finding matched features becomes challenging
• Idea: directly compute compact hash codes on the mobile device
Approach: hashing
• Each local feature coded as hash bits
– locality sensitive, efficient for high dimensions
• Each image is represented as Bag of Hash Bits
011001100100111100…
110110011001100110…
Bit Reuse for Multi-Table Hashing
• To reduce transmission size
– Reuse a single hash bit pool by random subsampling
[Diagram: an optimized hash bit pool (e.g., 80 bits from PCA Hash or SPICA Hash: 1001110000101010...00110111) is randomly subsampled into 12 hash tables of 32 bits each; lookup results from all tables are unioned]
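A toy sketch of the bit-reuse scheme above; the 80-bit pool, the 12 tables, and the 32 bits per table follow the slide, while the random codes stand in for the optimized (PCA Hash / SPICA Hash) pool:

```python
import numpy as np

rng = np.random.default_rng(0)
POOL_BITS, NUM_TABLES, BITS_PER_TABLE = 80, 12, 32

# each table reuses a random 32-bit subset of the shared 80-bit pool
subsets = [rng.choice(POOL_BITS, BITS_PER_TABLE, replace=False) for _ in range(NUM_TABLES)]

def build_tables(codes):
    """codes: (n, 80) array of 0/1 bits, one pooled code per local feature."""
    tables = []
    for sel in subsets:
        table = {}
        for idx, code in enumerate(codes):
            key = code[sel].tobytes()               # 32-bit bucket address
            table.setdefault(key, []).append(idx)
        tables.append(table)
    return tables

def query(tables, q_code):
    """Union of the candidates found in the matching bucket of every table."""
    candidates = set()
    for sel, table in zip(subsets, tables):
        candidates.update(table.get(q_code[sel].tobytes(), []))
    return candidates

db_codes = rng.integers(0, 2, size=(10_000, POOL_BITS), dtype=np.uint8)
tables = build_tables(db_codes)
print(len(query(tables, db_codes[0])))              # at least 1: the feature itself
```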
Rerank Results with Boundary Features
• Use automatic salient object segmentation for every image in the DB [Cheng et al, CVPR 2011]
• Compute boundary features: normalized central distance, Fourier magnitude
• Invariance: translation, scaling, rotation
Boundary Feature – Central Distance
[Figure: the object boundary is converted to a distance-to-center signature D(n); its FFT gives F(n)]
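A rough sketch of the boundary feature named above, assuming the object boundary is already available as a sequence of (x, y) contour points (the exact normalization in the CVPR 2012 system may differ):

```python
import numpy as np

def boundary_feature(boundary, num_coeffs=32):
    """Normalized central-distance signature D(n) and its Fourier magnitude |F(n)|.

    boundary: (n, 2) array of (x, y) points sampled along a closed object contour.
    """
    center = boundary.mean(axis=0)                    # subtracting the centroid: translation invariance
    d = np.linalg.norm(boundary - center, axis=1)     # distance to center D(n)
    d = d / d.mean()                                  # normalizing by the mean: scale invariance
    f = np.abs(np.fft.fft(d))                         # magnitude |F(n)| discards phase,
    return f[1:num_coeffs + 1]                        # so the contour starting point does not matter

# toy usage: a circle gives a flat distance signature, hence near-zero magnitudes
theta = np.linspace(0, 2 * np.pi, 256, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
print(boundary_feature(circle)[:5])
```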
Reranking with boundary feature
Mobile Product Search System:
Bags of Hash Bits and Boundary features
He, Feng, Liu, Cheng, Lin, Chung, Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking, CVPR 2012.
Server:
• 1 million product images crawled from Amazon, eBay, and Zappos
• Hundreds of categories: shoes, clothes, electrical devices, groceries, kitchen supplies, movies, etc.
Speed:
• Feature extraction: ~1s
• Transmission: 80 bits/feature, 1KB/image
• Server search: ~0.4s
• Download/display: 1-2s
• Video demo (52")
Performance:
• Baseline [Chandrasekhar et al, CVPR '10]: client compresses local features with CHoG; server uses BoW with a vocabulary tree (1M codes)
• 30% higher recall and 6X-30X search speedup over the baseline
Summary
• Some Ideas Discussed
– bucket balancing is important
– code inner product: an efficient form of supervised hashing
– insights on search difficulty prediction
– Large mobile search – a good test case for hashing
• Open Issues
– supervised hashing vs. attribute discovery
– hashing beyond point-to-point search
– hashing to incorporate structured relations (spatiotemporal)
References
• (Supervised Kernel Hash) W. Liu, J. Wang, R. Ji, Y. Jiang, and S.-F. Chang. Supervised Hashing with Kernels. CVPR 2012.
• (Difficulty of Nearest Neighbor Search) J. He, S. Kumar, S.-F. Chang. On the Difficulty of Nearest Neighbor Search. ICML 2012.
• (Hash Based Mobile Product Search) J. He, T. Lin, J. Feng, X. Liu, S.-F. Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking. CVPR 2012.
• (Hashing with Graphs) W. Liu, J. Wang, S. Kumar, S.-F. Chang. Hashing with Graphs. ICML 2011.
• (Iterative Quantization) Y. Gong and S. Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes. CVPR 2011.
• (Semi-Supervised Hash) J. Wang, S. Kumar, S.-F. Chang. Semi-Supervised Hashing for Scalable Image Retrieval. CVPR 2010.
• (ICA Hashing) J. He, R. Radhakrishnan, S.-F. Chang, C. Bauer. Compact Hashing with Joint Optimization of Search Accuracy and Time. CVPR 2011.