Cartesian k-means Mohammad Norouzi David Fleet We need many clusters Increasing number of clusters Problem: Search time, storage cost.

Cartesian k-means
Mohammad Norouzi
David Fleet
We need many clusters
Increasing number of clusters
Problem: Search time, storage cost
[Figure: axes labeled (subspace 1) and (subspace 2)]
Compositional representation
$m$ subspaces
$h$ regions per subspace
$k = h^m$ centers
$O(mh)$ parameters
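To make the counting on this slide concrete, here is a small Python sketch. The function name `compositional_sizes` is a hypothetical helper, not something from the talk; it only evaluates $k = h^m$ and the $mh$ stored sub-centers.

```python
def compositional_sizes(m: int, h: int) -> tuple[int, int]:
    """Return (representable centers, stored sub-centers) for m subspaces, h regions each."""
    k = h ** m        # one region chosen per subspace, so h^m combinations
    stored = m * h    # only h sub-centers are kept per subspace: O(mh)
    return k, stored


print(compositional_sizes(m=2, h=40))   # (1600, 80)
```

With two subspaces and 40 regions each, 80 stored sub-centers already represent 1600 centers.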
Which subspaces?
Learning
k-means
$k$ cluster centers: $C = [\mathbf{c}_1, \ldots, \mathbf{c}_k]$
$\ell_{k\text{-means}}(C) = \sum_{\mathbf{x} \in \mathcal{D}} \min_{\mathbf{b} \in \mathcal{H}_1^k} \| \mathbf{x} - C\mathbf{b} \|_2^2$
$\mathbf{b}$ is a one-of-$k$ encoding
$[\mathcal{H}_1^k \equiv \{ \mathbf{b} \mid \mathbf{b} \in \{0,1\}^k,\ \|\mathbf{b}\| = 1 \}]$
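The k-means objective with one-of-$k$ codes can be evaluated directly. The following NumPy sketch assumes data points are stored as rows of `X` and centers as columns of `C`; the function name `kmeans_loss` is illustrative only.

```python
import numpy as np


def kmeans_loss(X, C):
    """X: (n, p) data rows; C: (p, k) centers as columns. Returns (loss, B)."""
    # Squared distance from every point to every center, shape (n, k).
    d2 = ((X[:, None, :] - C.T[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)               # index of the nearest center per point
    B = np.eye(C.shape[1])[idx]           # one-of-k encodings, one row per point
    loss = ((X - B @ C.T) ** 2).sum()     # sum over x of ||x - C b||^2
    return loss, B


X = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
C = np.array([[0.0, 10.0], [0.0, 10.0]])  # columns are the centers (0,0) and (10,10)
loss, B = kmeans_loss(X, C)
print(loss)   # 2.0
```

Each row of `B` has exactly one nonzero entry, which is the constraint $\mathbf{b} \in \mathcal{H}_1^k$.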
Orthogonal k-means
$m$ center basis vectors: $C = [\mathbf{c}_1, \ldots, \mathbf{c}_m]$
$\ell_{ok\text{-means}}(C) = \sum_{\mathbf{x} \in \mathcal{D}} \min_{\mathbf{b} \in \mathcal{B}^m} \| \mathbf{x} - C\mathbf{b} \|_2^2$
$\mathbf{b}$ is an arbitrary $m$-bit encoding $[\mathcal{B}^m \equiv \{-1, 1\}^m]$
#centers: $k = 2^m$
Additional constraints: $\forall\, i \neq j,\ \mathbf{c}_i \perp \mathbf{c}_j$
LS estimate of $\mathbf{b}$ given $\mathbf{x}$ is $\mathbf{b} = \operatorname{sgn}(C^T \mathbf{x})$
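The claim that the sign of $C^T \mathbf{x}$ gives the best $m$-bit code under the orthogonality constraint can be checked numerically. This is a minimal sketch on random data; `ok_encode` is a hypothetical name, and the brute-force search over all $2^m$ codes is only there for comparison.

```python
import itertools

import numpy as np


def ok_encode(x, C):
    """Encode x as b = sgn(C^T x), with zeros mapped to +1 so b stays in {-1, +1}^m."""
    b = np.sign(C.T @ x)
    b[b == 0] = 1.0
    return b


rng = np.random.default_rng(0)
p, m = 6, 3
Q, _ = np.linalg.qr(rng.normal(size=(p, m)))   # orthonormal columns
C = Q * np.array([2.0, 1.0, 0.5])              # mutually orthogonal columns of different lengths
x = rng.normal(size=p)

b = ok_encode(x, C)
# Exhaustive search over all 2^m codes for the true minimizer of ||x - C b||^2.
best = min(
    itertools.product([-1.0, 1.0], repeat=m),
    key=lambda c: np.sum((x - C @ np.array(c)) ** 2),
)
```

With orthogonal columns the cross terms vanish, so each bit can be chosen independently; without the constraint the two codes need not agree.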
$\min_{\mathbf{b} \in \{-1,+1\}^m} \| \mathbf{x} - \boldsymbol{\mu} - C\mathbf{b} \|_2^2$
$C$ = identity
Learned $C$
Iterative Quantization [Gong & Lazebnik, CVPR'11]
Product Quantization [Jegou, Douze, Schmid, PAMI'11]
Cartesian k-means
$\mathbf{x} \approx \begin{bmatrix} C^{(1)} & C^{(2)} \end{bmatrix} \begin{bmatrix} \mathbf{b}^{(1)} \\ \mathbf{b}^{(2)} \end{bmatrix}$
$C^{(1)} \perp C^{(2)}$
$\mathbf{b}^{(1)}, \mathbf{b}^{(2)} \in \mathcal{H}_1^h$ (each a one-of-$h$ encoding)
#centers: $k = h^2$
Storage cost: $O(\sqrt{k})$
Search time: $O(\sqrt{k})$
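The search-time claim follows because, with orthogonal subspaces, the nearest of the $h^2$ centers can be found by two independent searches over $h$ sub-centers each. The sketch below assumes the factorization $C^{(i)} = R^{(i)} D^{(i)}$ with a full orthogonal matrix `R`; the names `ck_encode` and `brute_force_encode` are hypothetical, and the brute-force version exists only to confirm that both give the same code.

```python
import itertools

import numpy as np


def ck_encode(x, R, D):
    """Rotate x, then pick the nearest sub-center independently in each subspace."""
    z = R.T @ x
    s = D[0].shape[0]                       # dimension of each subspace
    return tuple(
        int(((z[i * s:(i + 1) * s, None] - Di) ** 2).sum(axis=0).argmin())
        for i, Di in enumerate(D)
    )


def brute_force_encode(x, R, D):
    """Search all h^m composed centers explicitly (for comparison only)."""
    h = D[0].shape[1]

    def center(code):
        return R @ np.concatenate([Di[:, j] for Di, j in zip(D, code)])

    return min(
        itertools.product(range(h), repeat=len(D)),
        key=lambda c: np.sum((x - center(c)) ** 2),
    )


rng = np.random.default_rng(1)
p, h = 4, 5
R = np.linalg.qr(rng.normal(size=(p, p)))[0]           # random orthogonal matrix
D = [rng.normal(size=(p // 2, h)) for _ in range(2)]   # h sub-centers per subspace
x = rng.normal(size=p)
```

Here `ck_encode` compares against $2h = 10$ sub-centers, while `brute_force_encode` visits all $h^2 = 25$ centers.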
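Learning Cartesian k-means
$\sum_{\mathbf{x} \in \mathcal{D}} \min_{\mathbf{b}^{(1)}, \mathbf{b}^{(2)}} \left\| \mathbf{x} - \begin{bmatrix} C^{(1)} & C^{(2)} \end{bmatrix} \begin{bmatrix} \mathbf{b}^{(1)} \\ \mathbf{b}^{(2)} \end{bmatrix} \right\|_2^2$
each $C^{(i)}$ is $p \times h$
$C^{(1)} \perp C^{(2)}$
$\mathbf{b}^{(1)}, \mathbf{b}^{(2)} \in \mathcal{H}_1^h$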
Learning Cartesian k-means
$\sum_{\mathbf{x} \in \mathcal{D}} \min_{\mathbf{b}^{(1)}, \mathbf{b}^{(2)}} \left\| \mathbf{x} - \begin{bmatrix} R^{(1)} & R^{(2)} \end{bmatrix} \begin{bmatrix} D^{(1)} & 0 \\ 0 & D^{(2)} \end{bmatrix} \begin{bmatrix} \mathbf{b}^{(1)} \\ \mathbf{b}^{(2)} \end{bmatrix} \right\|_2^2$
each $R^{(i)}$ is $p \times \frac{p}{2}$, each $D^{(i)}$ is $\frac{p}{2} \times h$
$R^{(1)} \perp R^{(2)}$
$\mathbf{b}^{(1)}, \mathbf{b}^{(2)} \in \mathcal{H}_1^h$
Learning Cartesian k-means
$\sum_{\mathbf{x} \in \mathcal{D}} \min_{\mathbf{b}^{(1)}, \mathbf{b}^{(2)}} \left\| \begin{bmatrix} R^{(1)T} \\ R^{(2)T} \end{bmatrix} \mathbf{x} - \begin{bmatrix} D^{(1)} & 0 \\ 0 & D^{(2)} \end{bmatrix} \begin{bmatrix} \mathbf{b}^{(1)} \\ \mathbf{b}^{(2)} \end{bmatrix} \right\|_2^2$
$R^{(1)} \perp R^{(2)}$
$\mathbf{b}^{(1)}, \mathbf{b}^{(2)} \in \mathcal{H}_1^h$
Learning Cartesian k-means
$\sum_{\mathbf{x} \in \mathcal{D}} \min_{\mathbf{b}^{(1)}, \mathbf{b}^{(2)}} \left\| \begin{bmatrix} R^{(1)T} \mathbf{x} \\ R^{(2)T} \mathbf{x} \end{bmatrix} - \begin{bmatrix} D^{(1)} \mathbf{b}^{(1)} \\ D^{(2)} \mathbf{b}^{(2)} \end{bmatrix} \right\|_2^2$
$R^{(1)} \perp R^{(2)}$
$\mathbf{b}^{(1)}, \mathbf{b}^{(2)} \in \mathcal{H}_1^h$
Update $D^{(1)}$ and $\mathbf{b}^{(1)}$ by one step of k-means in $R^{(1)T} \mathbf{x}$
Update $D^{(2)}$ and $\mathbf{b}^{(2)}$ by one step of k-means in $R^{(2)T} \mathbf{x}$
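Learning Cartesian k-means
$\sum_{\mathbf{x} \in \mathcal{D}} \min_{\mathbf{b}^{(1)}, \mathbf{b}^{(2)}} \left\| \mathbf{x} - \begin{bmatrix} R^{(1)} & R^{(2)} \end{bmatrix} \begin{bmatrix} D^{(1)} \mathbf{b}^{(1)} \\ D^{(2)} \mathbf{b}^{(2)} \end{bmatrix} \right\|_2^2$
$R^{(1)} \perp R^{(2)}$
$\mathbf{b}^{(1)}, \mathbf{b}^{(2)} \in \mathcal{H}_1^h$
Update $R = [R^{(1)}\ R^{(2)}]$ by SVD to solve orthogonal Procrustes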
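The alternating updates described on these slides (one k-means step per rotated subspace, then an SVD-based orthogonal Procrustes update of the rotation) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names, the random orthogonal initialization, the choice of initial sub-centers, and the handling of empty clusters are all assumptions.

```python
import numpy as np


def ckmeans_loss(X, R, D, codes):
    """Sum over rows x of ||R^T x - [D1 b1; D2 b2; ...]||^2."""
    Y = np.hstack([Di[:, codes[:, i]].T for i, Di in enumerate(D)])
    return ((X @ R - Y) ** 2).sum()


def ckmeans_fit(X, h, m=2, iters=20, seed=0):
    """X: (n, p) with p divisible by m. Returns R (p, p), D (list of (p/m, h)), codes (n, m)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    s = p // m
    R = np.linalg.qr(rng.normal(size=(p, p)))[0]        # assumed init: random orthogonal matrix
    Z = X @ R                                           # row i is (R^T x_i)^T
    D = [Z[rng.choice(n, h, replace=False), i * s:(i + 1) * s].T.copy() for i in range(m)]
    codes = np.zeros((n, m), dtype=int)
    for _ in range(iters):
        Z = X @ R
        Y = np.empty_like(Z)
        for i in range(m):
            Zi = Z[:, i * s:(i + 1) * s]
            # Assignment step: nearest sub-center in subspace i.
            d2 = ((Zi[:, None, :] - D[i].T[None]) ** 2).sum(axis=-1)
            codes[:, i] = d2.argmin(axis=1)
            # Center step: mean of assigned points (empty clusters keep their center).
            for j in range(h):
                members = Zi[codes[:, i] == j]
                if len(members):
                    D[i][:, j] = members.mean(axis=0)
            Y[:, i * s:(i + 1) * s] = D[i][:, codes[:, i]].T
        # Orthogonal Procrustes: the orthogonal R minimizing ||X - Y R^T||_F is U V^T,
        # where U S V^T is the SVD of X^T Y.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        R = U @ Vt
    return R, D, codes


X = np.random.default_rng(42).normal(size=(200, 4))
R, D, codes = ckmeans_fit(X, h=4, iters=20)
print(ckmeans_loss(X, R, D, codes))
```

Each of the three steps can only lower or keep the objective, so running more iterations from the same seed should never give a larger loss than running fewer.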
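Cartesian k-means
$\mathbf{x} \approx \begin{bmatrix} C^{(1)} & \cdots & C^{(m)} \end{bmatrix} \begin{bmatrix} \mathbf{b}^{(1)} \\ \vdots \\ \mathbf{b}^{(m)} \end{bmatrix}$
$\forall\, i \neq j,\ C^{(i)} \perp C^{(j)}$
$\mathbf{b}^{(1)}, \ldots, \mathbf{b}^{(m)} \in \mathcal{H}_1^h$ (each a one-of-$h$ encoding)
#centers: $k = h^m$
Storage cost: $O(\sqrt[m]{k})$
Search time: $O(\sqrt[m]{k})$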
Cartesian k-means
$m$ subspaces, $h$ regions per subspace
ok-means: $h = 2$, $k = 2^m$
k-means: $m = 1$
compositionality
~๐Ÿ๐ŸŽ%
~๐Ÿ๐Ÿ“%
Codebook learning (CIFAR-10)
Codebook                 Accuracy
k-means ($k = 1600$)     77.9%
ck-means ($k = 40^2$)    78.2%
PQ ($k = 40^2$)          75.9%
k-means ($k = 4000$)     79.6%
ck-means ($k = 64^2$)    79.7%
PQ ($k = 64^2$)          78.2%
Quantized 32 × 32 images (1024 bits)
32 × 32 images
Run-time complexity
Inference to quantize a point
o A big rotation of size $p \times p$ can be expensive
o PCA to reduce dimensionality to $s$ as pre-processing
and optimize a $p \times s$ projection within the model
Learning
o The most expensive part in each training iteration is
to solve SVD to estimate $R$, which is $O(p^3)$
o Can be done faster if we have a $p \times s$ rotation
Summary
ITQ
PQ
ok-means
ck-means
Thank you for your attention!
$\mathbf{x} \approx \begin{bmatrix} R^{(1)} & \cdots & R^{(m)} \end{bmatrix} \begin{bmatrix} D^{(1)} & & \\ & \ddots & \\ & & D^{(m)} \end{bmatrix} \begin{bmatrix} \mathbf{b}^{(1)} \\ \vdots \\ \mathbf{b}^{(m)} \end{bmatrix}$
$O(np)$
Query-specific table
bit 1          bit 2          …    bit $m$
$(d_1^+)^2$    $(d_2^+)^2$    …    $(d_m^+)^2$
$(d_1^-)^2$    $(d_2^-)^2$    …    $(d_m^-)^2$
$O(nm + mp) \le O(np)$
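The query-specific table can be illustrated for ok-means codes. The sketch below reads $(d_i^{\pm})^2$ as the squared distance, along basis direction $i$, between the centered query and the $\pm$ center; that reading, the function names, and the constant residual term (the part of the query outside the span of $C$) are assumptions made for illustration. Building the table costs $O(mp)$, and scoring $n$ stored codes then costs $O(nm)$.

```python
import numpy as np


def query_table(q, mu, C):
    """Per-bit squared distances for query q; C has mutually orthogonal columns."""
    norms = np.linalg.norm(C, axis=0)            # length of each basis vector
    a = (C / norms).T @ (q - mu)                 # query coordinate along each direction
    # Row 0 holds (d_i^+)^2 (bit = +1), row 1 holds (d_i^-)^2 (bit = -1).
    table = np.stack([(a - norms) ** 2, (a + norms) ** 2])
    residual = np.sum((q - mu) ** 2) - np.sum(a ** 2)   # same for every code
    return table, residual


def table_distances(codes, table, residual):
    """codes: (n, m) array with entries in {-1, +1}. Returns squared distances, shape (n,)."""
    rows = (codes < 0).astype(int)               # 0 selects the + row, 1 the - row
    return residual + table[rows, np.arange(codes.shape[1])].sum(axis=1)


rng = np.random.default_rng(2)
p, m, n = 8, 4, 50
C = np.linalg.qr(rng.normal(size=(p, m)))[0] * rng.uniform(0.5, 2.0, size=m)
mu, q = rng.normal(size=p), rng.normal(size=p)
codes = rng.choice([-1, 1], size=(n, m))

table, residual = query_table(q, mu, C)
fast = table_distances(codes, table, residual)
direct = ((q - mu - codes @ C.T) ** 2).sum(axis=1)   # O(np) reference computation
```

The table-based distances match the direct computation, while each stored code is scored with only $m$ lookups and additions.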