Cartesian k-means
Mohammad Norouzi
David Fleet
We need many clusters.
But increasing the number of clusters drives up search time and storage cost.
[Figure: 2-D points partitioned into regions along two axes, labeled (subspace 1) and (subspace 2)]
Compositional representation
$m$ subspaces
$h$ regions per subspace
$k = h^m$ centers
$O(mh)$ parameters
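For a sense of scale (an illustrative setting of my own, not a number from the talk): with $m = 8$ subspaces and $h = 256$ regions per subspace,
$$k = h^m = 256^8 = 2^{64} \approx 1.8 \times 10^{19} \text{ centers},$$
while only $mh = 2048$ sub-centers are stored.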
Which subspaces?
Learning
k-means
$k$ cluster centers: $C = [c_1, \ldots, c_k]$
$\ell_{\mathrm{k\text{-}means}}(C) = \sum_{x \in X} \min_{b \in \mathcal{H}_{1/k}} \| x - Cb \|^2$
where $b$ is a one-of-$k$ encoding: $\mathcal{H}_{1/k} \equiv \{ b \in \{0,1\}^k \mid \sum_i b_i = 1 \}$
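Read concretely, the inner minimization just assigns each point to its nearest center. A minimal numpy sketch (mine, not the authors' code), with `X` an n×p data matrix and `C` a p×k matrix of centers:

```python
import numpy as np

def kmeans_objective(X, C):
    """Inner minimization of the k-means loss: nearest-center assignment."""
    # Squared distance from every point to every center: shape (n, k).
    d2 = ((X[:, :, None] - C[None, :, :]) ** 2).sum(axis=1)
    assign = d2.argmin(axis=1)            # position of the 1 in each one-of-k b
    loss = d2[np.arange(len(X)), assign].sum()
    return loss, assign
```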
Orthogonal k-means
$m$ center basis vectors: $C = [c_1, \ldots, c_m]$
$\ell_{\mathrm{ok\text{-}means}}(C) = \sum_{x \in X} \min_{b \in \mathcal{B}} \| x - Cb \|^2$
where $b$ is an arbitrary $m$-bit encoding: $\mathcal{B} \equiv \{-1, 1\}^m$
#centers: $k = 2^m$
Additional constraints: $\forall\, i \ne j,\; c_i \perp c_j$
LS estimate of $b$ given $x$: $b = \mathrm{sgn}(C^T x)$
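A minimal sketch (mine) of why the orthogonality constraint matters: with mutually orthogonal columns in `C`, each bit of the least-squares code can be chosen independently, so encoding is one matrix product and a sign:

```python
import numpy as np

def okmeans_encode(X, C):
    """m-bit codes b = sgn(C^T x) for each row of X; C is p x m with c_i ⊥ c_j."""
    B = np.sign(X @ C)
    B[B == 0] = 1                 # map sgn(0) to +1 so codes stay in {-1, +1}
    return B

def okmeans_decode(B, C):
    """Reconstruction Cb for each code (row of B)."""
    return B @ C.T
```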
$\min_{b \in \{-1,+1\}^m} \| x - Cb \|^2$
With $C$ = identity, this is plain sign binarization; with a learned $C$, it is Iterative Quantization [Gong & Lazebnik, CVPR'11].
Product Quantization [Jegou, Douze, Schmid, PAMI'11] is the other compositional baseline, quantizing disjoint sub-vectors independently.
Cartesian k-means
$x \approx [\,C^{(1)}\; C^{(2)}\,] \begin{bmatrix} b^{(1)} \\ b^{(2)} \end{bmatrix}$, with $C^{(1)} \perp C^{(2)}$
$b^{(1)}, b^{(2)} \in \mathcal{H}_{1/h}$: each a one-of-$h$ encoding
#centers: $k = h^2$
Storage cost: $O(\sqrt{k})$
Search time: $O(\sqrt{k})$
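A minimal sketch (mine) of why search time drops to $O(\sqrt{k})$: using the factorization $C^{(j)} = R^{(j)} D^{(j)}$ introduced on the learning slides that follow, each subspace picks its nearest of $h$ sub-centers independently, so quantizing against $k = h^2$ centers costs $2h$ distance tests rather than $h^2$. The names and shapes below are my assumptions:

```python
import numpy as np

def ckmeans_encode(x, R1, D1, R2, D2):
    """Quantize x against k = h^2 centers with only 2h distance tests.

    R1, R2: (p, p/2) orthonormal blocks of a p x p rotation.
    D1, D2: (p/2, h) sub-center matrices.
    """
    codes = []
    for R, D in ((R1, D1), (R2, D2)):
        y = R.T @ x                                # project into the subspace
        d2 = ((y[:, None] - D) ** 2).sum(axis=0)   # distance to h sub-centers
        codes.append(int(d2.argmin()))             # index of the 1 in b^(j)
    return codes                                   # two indices, each in [0, h)
```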
Learning Cartesian k-means
$\min_{C} \sum_{x \in X} \min_{b^{(1)},\, b^{(2)}} \left\| x - [\,C^{(1)}\; C^{(2)}\,] \begin{bmatrix} b^{(1)} \\ b^{(2)} \end{bmatrix} \right\|^2$
subject to $C^{(1)} \perp C^{(2)}$ and $b^{(1)}, b^{(2)} \in \mathcal{H}_{1/h}$, where each $C^{(j)}$ is $p \times h$.

Factor each block as $C^{(j)} = R^{(j)} D^{(j)}$, so the objective becomes
$\min \left\| x - [\,R^{(1)}\; R^{(2)}\,] \begin{bmatrix} D^{(1)} & 0 \\ 0 & D^{(2)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ b^{(2)} \end{bmatrix} \right\|^2$
with $R^{(1)} \perp R^{(2)}$.

Because $[\,R^{(1)}\; R^{(2)}\,]$ is orthogonal, this is equivalent to
$\min \left\| \begin{bmatrix} R^{(1)T} x \\ R^{(2)T} x \end{bmatrix} - \begin{bmatrix} D^{(1)} b^{(1)} \\ D^{(2)} b^{(2)} \end{bmatrix} \right\|^2$

Coordinate descent, one block of variables at a time:
o Update $D^{(1)}$ and $b^{(1)}$ by one step of k-means in $\{ R^{(1)T} x \}$.
o Update $D^{(2)}$ and $b^{(2)}$ by one step of k-means in $\{ R^{(2)T} x \}$.
o Update $R$ by SVD, solving an Orthogonal Procrustes problem.
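Putting the three updates together, here is a minimal numpy sketch (mine; shapes and variable names are assumptions) of one coordinate-descent iteration for the two-subspace case:

```python
import numpy as np

def ckmeans_iteration(X, R, D1, D2):
    """One coordinate-descent step: X is (n, p), R is (p, p) orthogonal,
    D1 and D2 are (p/2, h) sub-center matrices."""
    p = X.shape[1]
    Y = X @ R                                   # rows hold [R1^T x, R2^T x]
    halves = (Y[:, :p // 2], Y[:, p // 2:])
    codes = []
    for Yj, D in zip(halves, (D1, D2)):
        # One k-means step in the rotated subspace: assign, then re-center.
        d2 = ((Yj[:, :, None] - D[None, :, :]) ** 2).sum(axis=1)
        idx = d2.argmin(axis=1)                 # update b^(j)
        for c in range(D.shape[1]):             # update D^(j)
            if np.any(idx == c):
                D[:, c] = Yj[idx == c].mean(axis=0)
        codes.append(idx)
    # Update R via orthogonal Procrustes: min_R ||X R - Z||_F over orthogonal
    # R, where Z holds per-point reconstructions in rotated coordinates.
    Z = np.concatenate([D.T[idx] for D, idx in zip((D1, D2), codes)], axis=1)
    U, _, Vt = np.linalg.svd(X.T @ Z)
    R = U @ Vt
    return R, D1, D2, codes
```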
Cartesian k-means (general case)
$x \approx [\,C^{(1)} \cdots C^{(m)}\,] \begin{bmatrix} b^{(1)} \\ \vdots \\ b^{(m)} \end{bmatrix}$
$\forall\, i \ne j:\; C^{(i)} \perp C^{(j)}$
$b^{(1)}, \ldots, b^{(m)} \in \mathcal{H}_{1/h}$: one-of-$h$ encodings
#centers: $k = h^m$
Storage cost: $O(\sqrt[m]{k})$
Search time: $O(\sqrt[m]{k})$
Cartesian k-means spans a spectrum: $m$ subspaces, $h$ regions per subspace. With $m = 1$ it reduces to k-means; with $h = 2$ (so $k = 2^m$) it reduces to ok-means. Moving between the two trades cluster flexibility against compositionality.
[Figure: spectrum from k-means to ok-means; percentage labels unreadable in the transcript]
Codebook learning (CIFAR-10)

Codebook               Accuracy
k-means  (k = 1600)    77.9%
ck-means (k = 40^2)    78.2%
PQ       (k = 40^2)    75.9%
k-means  (k = 4000)    79.6%
ck-means (k = 64^2)    79.7%
PQ       (k = 64^2)    78.2%
[Figure: quantized 32×32 images (… bits) next to the original 32×32 images]
Run-time complexity
Inference (quantizing a point):
o A big rotation of size $p \times p$ can be expensive.
o Use PCA to reduce dimensionality to $s$ as pre-processing, and optimize a $p \times s$ projection within the model.
Learning:
o The most expensive part of each training iteration is the SVD used to estimate $R$, which is $O(p^3)$.
o This can be done faster with a $p \times s$ rotation.
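A minimal sketch (mine; `pca_reduce` is a hypothetical helper, not from the talk) of that pre-processing step: project onto the top $s$ principal directions once, so the rotations learned afterwards, and the per-iteration SVD, act in $s$ dimensions rather than $p$:

```python
import numpy as np

def pca_reduce(X, s):
    """One-time PCA pre-processing: keep the top-s principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:s].T                  # (p, s) projection, learned once
    return Xc @ W, W              # reduced data (n, s) and the projection
```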
Summary
[Table comparing ITQ, PQ, ok-means, and ck-means]
Thank you for your attention!
Encoding a point: $x \approx \sum_{j=1}^{m} R^{(j)} D^{(j)} b^{(j)}$, computed in $O(sp)$
Query-specific table (asymmetric distance):

      bit 1        bit 2      …   bit m
 +    (d_1^+)^2    (d_2^+)^2  …   (d_m^+)^2
 −    (d_1^-)^2    (d_2^-)^2  …   (d_m^-)^2
Cost: $O(sp + 2m) \le O(pk)$
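A minimal sketch (mine) of how the table is used for ok-means-style codes: the query is rotated once, then every database code is scored with $m$ table lookups. `Rm` (orthonormal directions) and `d` (per-bit center magnitudes) are assumed names; the constant query energy outside those $m$ directions is dropped, since it does not change rankings:

```python
import numpy as np

def build_query_table(q, Rm, d):
    """Rm: (p, m) orthonormal directions; d: (m,) magnitudes, so column i
    of C is d[i] * Rm[:, i]. Returns the (d_i^+)^2 and (d_i^-)^2 rows."""
    y = Rm.T @ q                  # rotate the query once: O(sp)-style cost
    return (y - d) ** 2, (y + d) ** 2

def asymmetric_dist(code, plus, minus):
    """code: (m,) in {-1, +1}; distance is a sum of m table lookups."""
    return np.where(code > 0, plus, minus).sum()
```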