Cartesian k-means
Mohammad Norouzi, David Fleet

We need many clusters
Increasing the number of clusters raises a problem: search time and storage cost.

[Figure: data points partitioned into cluster regions along (subspace 1) and (subspace 2)]

Compositional representation
$m$ subspaces, $h$ regions per subspace: $k = h^m$ centers from $O(ph)$ parameters.

Which subspaces? Learn them.

k-means
$k$ cluster centers: $C = [c_1, \dots, c_k]$
$$\ell_{\text{k-means}}(C) = \sum_{x \in X} \min_{b \in \mathcal{H}_{1/k}} \|x - Cb\|^2$$
where $b$ is a one-of-$k$ encoding $[\mathcal{H}_{1/k} \equiv \{\, b \in \{0,1\}^k : \|b\| = 1 \,\}]$.

Orthogonal k-means
$m$ center basis vectors: $C = [c_1, \dots, c_m]$
$$\ell_{\text{ok-means}}(C) = \sum_{x \in X} \min_{b \in \mathcal{B}^m} \|x - Cb\|^2$$
where $b$ is an arbitrary $m$-bit encoding $[\mathcal{B}^m \equiv \{-1,1\}^m]$; #centers: $k = 2^m$.
Additional constraints: $\forall\, i \ne j,\ c_i \perp c_j$. The least-squares estimate of $b$ given $x$ is then $b = \mathrm{sgn}(C^\top x)$, the minimizer of $\min_{b \in \{-1,+1\}^m} \|x - Cb\|^2$.
$C$ = identity vs. learned $C$: Iterative Quantization [Gong & Lazebnik, CVPR'11].
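A minimal NumPy sketch of this encoding step, directly from the closed form $b = \mathrm{sgn}(C^\top x)$ above (function names are illustrative, not from the talk):

```python
import numpy as np

def okmeans_encode(X, C):
    """m-bit codes for the rows of X: with mutually orthogonal
    columns c_i, minimizing ||x - C b||^2 over b in {-1,+1}^m
    decouples per bit, giving b = sgn(C^T x)."""
    B = np.sign(X @ C)
    B[B == 0] = 1          # break ties at exactly zero
    return B               # shape (n, m), entries in {-1, +1}

def okmeans_decode(B, C):
    """Reconstruct each point from its code: x_hat = C b."""
    return B @ C.T
```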
Product Quantization [Jegou, Douze, Schmid, PAMI'11]

Cartesian k-means
$$x \approx \begin{bmatrix} C^{(1)} & C^{(2)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ b^{(2)} \end{bmatrix}, \qquad C^{(1)} \perp C^{(2)}$$
where $b^{(1)}$ and $b^{(2)}$ are one-of-$h$ encodings $(b^{(1)}, b^{(2)} \in \mathcal{H}_{1/h})$; #centers: $k = h^2$. Storage cost: $O(\sqrt{k})$; search time: $O(\sqrt{k})$.

Learning Cartesian k-means
$$\min_{R,\, D^{(1)},\, D^{(2)},\, b^{(1)},\, b^{(2)}} \sum_{x \in X} \left\| x - R \begin{bmatrix} D^{(1)} & 0 \\ 0 & D^{(2)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ b^{(2)} \end{bmatrix} \right\|^2 \qquad \text{s.t.}\ R^\top R = I,\quad b^{(1)}, b^{(2)} \in \mathcal{H}_{1/h}$$
Writing $R = [R^{(1)}\ R^{(2)}]$, alternate three updates:
o Update $D^{(1)}$ and $b^{(1)}$ by one step of k-means on the projected data $\{R^{(1)\top} x\}$.
o Update $D^{(2)}$ and $b^{(2)}$ by one step of k-means on $\{R^{(2)\top} x\}$.
o Update $R$ by SVD, to solve the orthogonal Procrustes problem (sketched in code below).

Cartesian k-means, general form
$$x \approx \begin{bmatrix} C^{(1)} & \cdots & C^{(m)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ \vdots \\ b^{(m)} \end{bmatrix}, \qquad \forall\, i \ne j:\ C^{(i)} \perp C^{(j)}, \qquad b^{(1)}, \dots, b^{(m)} \in \mathcal{H}_{1/h}$$
#centers: $k = h^m$. Storage cost: $O(m\sqrt[m]{k})$; search time: $O(m\sqrt[m]{k})$.

Cartesian k-means thus spans a spectrum: $m$ subspaces with $h$ regions per subspace, with ok-means as the special case $h = 2$ ($k = 2^m$) and k-means as the special case $m = 1$. [Figure: models arranged along an axis of compositionality]
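A compact NumPy sketch of this alternating optimization for the two-subspace case, assuming equal subspace dimensions; `kmeans_step`, the random initialization, and the fixed iteration count are my simplifications, not the authors' exact procedure:

```python
import numpy as np

def kmeans_step(Y, D):
    """One Lloyd iteration: assign each row of Y to the nearest of the
    h centers (columns of D), then re-estimate non-empty centers."""
    d2 = ((Y[:, :, None] - D[None, :, :]) ** 2).sum(axis=1)  # (n, h) distances
    a = d2.argmin(axis=1)                                    # hard assignments
    for j in range(D.shape[1]):
        if np.any(a == j):
            D[:, j] = Y[a == j].mean(axis=0)
    return D, a

def ckmeans_train(X, h, iters=20, seed=0):
    """Two-subspace Cartesian k-means: x ~ R [D1 0; 0 D2] b,
    with R orthogonal and b a one-of-h encoding per block."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    s = p // 2                                           # subspace dimension
    R = np.linalg.qr(rng.standard_normal((p, p)))[0]     # random rotation
    D1 = rng.standard_normal((s, h))
    D2 = rng.standard_normal((p - s, h))
    for _ in range(iters):
        Y = X @ R                                        # project onto R's columns
        D1, a1 = kmeans_step(Y[:, :s], D1)               # k-means step, subspace 1
        D2, a2 = kmeans_step(Y[:, s:], D2)               # k-means step, subspace 2
        # Row i of Z stacks the centers chosen for point i: [D1 b1_i; D2 b2_i].
        Z = np.concatenate([D1[:, a1].T, D2[:, a2].T], axis=1)
        # Orthogonal Procrustes: R = argmin ||X R - Z||_F via SVD of X^T Z.
        U, _, Vt = np.linalg.svd(X.T @ Z)
        R = U @ Vt
    return R, D1, D2
```

Each sweep cannot increase the quantization error, since every step (assignments, centers, rotation) exactly minimizes the objective over its own block of variables.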
Codebook learning (CIFAR-10)

Codebook                Accuracy
k-means ($k = 1600$)    77.9%
ck-means ($k = 40^2$)   78.2%
PQ ($k = 40^2$)         75.9%
k-means ($k = 4000$)    79.6%
ck-means ($k = 64^2$)   79.7%
PQ ($k = 64^2$)         78.2%

[Figure: quantized 32 × 32 images (… bits) vs. original 32 × 32 images]

Run-time complexity
Inference (quantizing a point):
o A big rotation of size $p \times p$ can be expensive.
o Use PCA to reduce the dimensionality to $s$ as pre-processing, and optimize a $p \times s$ projection within the model.
Learning:
o The most expensive part of each training iteration is the SVD used to estimate $R$, which is $O(p^3)$.
o This can be done faster if we have a $p \times s$ rotation.

Summary
[Diagram relating ITQ, PQ, ok-means, and ck-means]

Thank you for your attention!

[Backup] Writing each subspace center matrix as $C^{(j)} = R^{(j)} D^{(j)}$:
$$x \approx \begin{bmatrix} R^{(1)} D^{(1)} & \cdots & R^{(m)} D^{(m)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ \vdots \\ b^{(m)} \end{bmatrix}, \qquad O(ps)$$

[Backup] Query-specific table: for each bit $i$, precompute the two possible per-bit contributions to the distance, $(v_i^+)^2$ and $(v_i^-)^2$:

        bit 1         bit 2         …    bit $m$
+       $(v_1^+)^2$   $(v_2^+)^2$   …    $(v_m^+)^2$
−       $(v_1^-)^2$   $(v_2^-)^2$   …    $(v_m^-)^2$

Building the table costs $O(ps + sm) \le O(pm)$; each database code is then scored with one table lookup per bit.
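A minimal sketch of such a query-specific table for ok-means codes, assuming unit-norm, mutually orthogonal columns in $C$ and taking $v = C^\top q$ (the talk's exact definition of $v_i^\pm$ is not recoverable from the transcript, so treat the signs below as an assumption):

```python
import numpy as np

def build_query_table(q, C):
    """Per-bit distance contributions for query q.

    With unit-norm, mutually orthogonal columns c_i,
    ||q - C b||^2 = const + sum_i (v_i - b_i)^2 where v = C^T q,
    so bit i contributes (v_i - 1)^2 if b_i = +1 and
    (v_i + 1)^2 if b_i = -1.  Build both rows once per query.
    """
    v = C.T @ q                              # (m,)
    return np.stack([(v + 1.0) ** 2,         # row 0: b_i = -1
                     (v - 1.0) ** 2])        # row 1: b_i = +1

def asym_distances(table, B):
    """Distances (up to a query-only constant) from q to all codes:
    rows of B lie in {-1,+1}^m; one table lookup per bit, O(m) per code."""
    idx = ((B + 1) // 2).astype(int)         # map -1 -> row 0, +1 -> row 1
    return table[idx, np.arange(B.shape[1])].sum(axis=1)
```

This is the asymmetric scheme: the query stays unquantized and only database points are coded, which typically tightens the distance estimates at no extra storage cost.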