Cartesian k-means
Mohammad Norouzi, David Fleet

We need many clusters
Increasing the number of clusters raises a problem: search time and storage cost.

[Figure: data points partitioned into cluster regions along (subspace 1) and (subspace 2)]

Compositional representation
$m$ subspaces, $h$ regions per subspace: $k = h^m$ centers from $O(ph)$ parameters.

Which subspaces? Learn them.

k-means
$k$ cluster centers: $C = [c_1, \dots, c_k]$
$$\ell_{\text{k-means}}(C) = \sum_{x \in X} \min_{b \in \mathcal{H}_{1/k}} \|x - Cb\|^2$$
where $b$ is a one-of-$k$ encoding $[\mathcal{H}_{1/k} \equiv \{\, b \in \{0,1\}^k : \|b\| = 1 \,\}]$.

Orthogonal k-means
$m$ center basis vectors: $C = [c_1, \dots, c_m]$
$$\ell_{\text{ok-means}}(C) = \sum_{x \in X} \min_{b \in \mathcal{B}^m} \|x - Cb\|^2$$
where $b$ is an arbitrary $m$-bit encoding $[\mathcal{B}^m \equiv \{-1,1\}^m]$; #centers: $k = 2^m$.
Additional constraints: $\forall\, i \ne j,\ c_i \perp c_j$. The least-squares estimate of $b$ given $x$ is then $b = \mathrm{sgn}(C^\top x)$, the minimizer of $\min_{b \in \{-1,+1\}^m} \|x - Cb\|^2$.
$C$ = identity vs. learned $C$: Iterative Quantization [Gong & Lazebnik, CVPR'11].
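A minimal NumPy sketch of this encoding step, directly from the closed form $b = \mathrm{sgn}(C^\top x)$ above (function names are illustrative, not from the talk):

```python
import numpy as np

def okmeans_encode(X, C):
    """m-bit codes for the rows of X: with mutually orthogonal
    columns c_i, minimizing ||x - C b||^2 over b in {-1,+1}^m
    decouples per bit, giving b = sgn(C^T x)."""
    B = np.sign(X @ C)
    B[B == 0] = 1          # break ties at exactly zero
    return B               # shape (n, m), entries in {-1, +1}

def okmeans_decode(B, C):
    """Reconstruct each point from its code: x_hat = C b."""
    return B @ C.T
```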
Product Quantization [Jegou, Douze, Schmid, PAMI'11]

Cartesian k-means
$$x \approx \begin{bmatrix} C^{(1)} & C^{(2)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ b^{(2)} \end{bmatrix}, \qquad C^{(1)} \perp C^{(2)}$$
where $b^{(1)}$ and $b^{(2)}$ are one-of-$h$ encodings $(b^{(1)}, b^{(2)} \in \mathcal{H}_{1/h})$; #centers: $k = h^2$. Storage cost: $O(\sqrt{k})$; search time: $O(\sqrt{k})$.

Learning Cartesian k-means
$$\min_{R,\, D^{(1)},\, D^{(2)},\, b^{(1)},\, b^{(2)}} \sum_{x \in X} \left\| x - R \begin{bmatrix} D^{(1)} & 0 \\ 0 & D^{(2)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ b^{(2)} \end{bmatrix} \right\|^2 \qquad \text{s.t.}\ R^\top R = I,\quad b^{(1)}, b^{(2)} \in \mathcal{H}_{1/h}$$
Writing $R = [R^{(1)}\ R^{(2)}]$, alternate three updates:
o Update $D^{(1)}$ and $b^{(1)}$ by one step of k-means on the projected data $\{R^{(1)\top} x\}$.
o Update $D^{(2)}$ and $b^{(2)}$ by one step of k-means on $\{R^{(2)\top} x\}$.
o Update $R$ by SVD, to solve the orthogonal Procrustes problem (sketched in code below).

Cartesian k-means, general form
$$x \approx \begin{bmatrix} C^{(1)} & \cdots & C^{(m)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ \vdots \\ b^{(m)} \end{bmatrix}, \qquad \forall\, i \ne j:\ C^{(i)} \perp C^{(j)}, \qquad b^{(1)}, \dots, b^{(m)} \in \mathcal{H}_{1/h}$$
#centers: $k = h^m$. Storage cost: $O(m\sqrt[m]{k})$; search time: $O(m\sqrt[m]{k})$.

Cartesian k-means thus spans a spectrum: $m$ subspaces with $h$ regions per subspace, with ok-means as the special case $h = 2$ ($k = 2^m$) and k-means as the special case $m = 1$. [Figure: models arranged along an axis of compositionality]
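A compact NumPy sketch of this alternating optimization for the two-subspace case, assuming equal subspace dimensions; `kmeans_step`, the random initialization, and the fixed iteration count are my simplifications, not the authors' exact procedure:

```python
import numpy as np

def kmeans_step(Y, D):
    """One Lloyd iteration: assign each row of Y to the nearest of the
    h centers (columns of D), then re-estimate non-empty centers."""
    d2 = ((Y[:, :, None] - D[None, :, :]) ** 2).sum(axis=1)  # (n, h) distances
    a = d2.argmin(axis=1)                                    # hard assignments
    for j in range(D.shape[1]):
        if np.any(a == j):
            D[:, j] = Y[a == j].mean(axis=0)
    return D, a

def ckmeans_train(X, h, iters=20, seed=0):
    """Two-subspace Cartesian k-means: x ~ R [D1 0; 0 D2] b,
    with R orthogonal and b a one-of-h encoding per block."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    s = p // 2                                           # subspace dimension
    R = np.linalg.qr(rng.standard_normal((p, p)))[0]     # random rotation
    D1 = rng.standard_normal((s, h))
    D2 = rng.standard_normal((p - s, h))
    for _ in range(iters):
        Y = X @ R                                        # project onto R's columns
        D1, a1 = kmeans_step(Y[:, :s], D1)               # k-means step, subspace 1
        D2, a2 = kmeans_step(Y[:, s:], D2)               # k-means step, subspace 2
        # Row i of Z stacks the centers chosen for point i: [D1 b1_i; D2 b2_i].
        Z = np.concatenate([D1[:, a1].T, D2[:, a2].T], axis=1)
        # Orthogonal Procrustes: R = argmin ||X R - Z||_F via SVD of X^T Z.
        U, _, Vt = np.linalg.svd(X.T @ Z)
        R = U @ Vt
    return R, D1, D2
```

Each sweep cannot increase the quantization error, since every step (assignments, centers, rotation) exactly minimizes the objective over its own block of variables.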
Codebook learning (CIFAR-10)

Codebook                Accuracy
k-means ($k = 1600$)    77.9%
ck-means ($k = 40^2$)   78.2%
PQ ($k = 40^2$)         75.9%
k-means ($k = 4000$)    79.6%
ck-means ($k = 64^2$)   79.7%
PQ ($k = 64^2$)         78.2%

[Figure: quantized 32 × 32 images (… bits) vs. original 32 × 32 images]

Run-time complexity
Inference (quantizing a point):
o A big rotation of size $p \times p$ can be expensive.
o Use PCA to reduce the dimensionality to $s$ as pre-processing, and optimize a $p \times s$ projection within the model.
Learning:
o The most expensive part of each training iteration is the SVD used to estimate $R$, which is $O(p^3)$.
o This can be done faster if we have a $p \times s$ rotation.

Summary
[Diagram relating ITQ, PQ, ok-means, and ck-means]

Thank you for your attention!

[Backup] Writing each subspace center matrix as $C^{(j)} = R^{(j)} D^{(j)}$:
$$x \approx \begin{bmatrix} R^{(1)} D^{(1)} & \cdots & R^{(m)} D^{(m)} \end{bmatrix} \begin{bmatrix} b^{(1)} \\ \vdots \\ b^{(m)} \end{bmatrix}, \qquad O(ps)$$

[Backup] Query-specific table: for each bit $i$, precompute the two possible per-bit contributions to the distance, $(v_i^+)^2$ and $(v_i^-)^2$:

        bit 1         bit 2         …    bit $m$
+       $(v_1^+)^2$   $(v_2^+)^2$   …    $(v_m^+)^2$
−       $(v_1^-)^2$   $(v_2^-)^2$   …    $(v_m^-)^2$

Building the table costs $O(ps + sm) \le O(pm)$; each database code is then scored with one table lookup per bit.
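A minimal sketch of such a query-specific table for ok-means codes, assuming unit-norm, mutually orthogonal columns in $C$ and taking $v = C^\top q$ (the talk's exact definition of $v_i^\pm$ is not recoverable from the transcript, so treat the signs below as an assumption):

```python
import numpy as np

def build_query_table(q, C):
    """Per-bit distance contributions for query q.

    With unit-norm, mutually orthogonal columns c_i,
    ||q - C b||^2 = const + sum_i (v_i - b_i)^2 where v = C^T q,
    so bit i contributes (v_i - 1)^2 if b_i = +1 and
    (v_i + 1)^2 if b_i = -1.  Build both rows once per query.
    """
    v = C.T @ q                              # (m,)
    return np.stack([(v + 1.0) ** 2,         # row 0: b_i = -1
                     (v - 1.0) ** 2])        # row 1: b_i = +1

def asym_distances(table, B):
    """Distances (up to a query-only constant) from q to all codes:
    rows of B lie in {-1,+1}^m; one table lookup per bit, O(m) per code."""
    idx = ((B + 1) // 2).astype(int)         # map -1 -> row 0, +1 -> row 1
    return table[idx, np.arange(B.shape[1])].sum(axis=1)
```

This is the asymmetric scheme: the query stays unquantized and only database points are coded, which typically tightens the distance estimates at no extra storage cost.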