Andoni Lecture - Computational Geometric Learning

Transcript

Nearest Neighbor Search
in high-dimensional spaces
Alexandr Andoni
(Microsoft Research)
Nearest Neighbor Search (NNS)
Preprocess: a set D of points in R^d
Query: given a new point q, report a point p ∈ D with the smallest distance to q
[Figure: a query point q and its nearest neighbor p in D]
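To fix ideas, here is a minimal sketch of the exact, no-indexing baseline (a brute-force linear scan; the data and names are illustrative, not from the talk):

```python
import numpy as np

def nearest_neighbor(D: np.ndarray, q: np.ndarray) -> int:
    """Exact NNS by linear scan: return the index of the point in D
    (an n x d array) closest to q in Euclidean distance."""
    dists = np.linalg.norm(D - q, axis=1)  # O(dn) work per query
    return int(np.argmin(dists))

# Example: 1000 points in R^128
rng = np.random.default_rng(0)
D = rng.standard_normal((1000, 128))
q = rng.standard_normal(128)
print(nearest_neighbor(D, q))
```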
Motivation
• Generic setup:
  - Points model objects (e.g., images)
  - Distance models a (dis)similarity measure
• Application areas:
  - machine learning, data mining, speech recognition, image/video/music clustering, bioinformatics, etc.
• Distance can be:
  - Euclidean, Hamming, ℓ∞, edit distance, Ulam, Earth-mover distance, etc.
• Primitive for other problems:
  - finding the closest pair in a set D, MST, clustering, …
Plan for today
1. NNS for “basic” distances: LSH
2. NNS for “advanced” distances: embeddings
2D case
Compute the Voronoi diagram of D.
Given a query q, perform point location.
Performance:
• Space: O(n)
• Query time: O(log n)
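In practice the low-dimensional case is handled by standard computational-geometry tools; a short sketch using SciPy's KD-tree as a practical stand-in for Voronoi point location (it gives the same logarithmic-type query behavior in fixed dimension; the data is made up):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((10_000, 2))  # n points in the unit square
tree = cKDTree(points)            # O(n log n) preprocessing

q = np.array([0.5, 0.5])
dist, idx = tree.query(q)         # fast (~log n) query in 2D
print(idx, dist)
```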
High-dimensional case
All exact algorithms degrade rapidly with the dimension d:

Algorithm                   Query time     Space
Full indexing               O(d·log n)     n^O(d) (Voronoi diagram size)
No indexing (linear scan)   O(dn)          O(dn)

When d is high, the state of the art is unsatisfactory: even in practice, query time tends to be linear in n.
Approximate NNS
c-approximate r-near neighbor: given a new point q, report a point p ∈ D s.t. ||p-q|| ≤ cr, as long as there exists a point at distance ≤ r.
[Figure: query q with radii r and cr; any point within distance cr may be reported]
Approximation Algorithms for NNS
A vast literature:
With exp(d) space or Ω(n) time:
[Arya-Mount et al.], [Kleinberg’97], [Har-Peled’02], …
With poly(n) space and o(n) time:
[Kushilevitz-Ostrovsky-Rabani’98], [Indyk-Motwani’98], [Indyk’98, ’01], [Gionis-Indyk-Motwani’99], [Charikar’02], [Datar-Immorlica-Indyk-Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A-Indyk’06], …
The landscape: algorithms

Space            Time         Comment           Reference
n^(4/ε²) + nd    O(d·log n)   c = 1+ε           [KOR’98, IM’98]
(poly(n) space; logarithmic query)
n^(1+ρ) + nd     d·n^ρ        ρ ≈ 1/c           [IM’98, Cha’02, DIIM’04]
                              ρ = 1/c² + o(1)   [AI’06]
(small poly(n), close-to-linear space; sublinear query)
nd·log n         d·n^ρ        ρ = 2.09/c        [Ind’01, Pan’06]
                              ρ = O(1/c²)       [AI’06]
(near-linear space; sublinear query)
Locality-Sensitive Hashing
[Indyk-Motwani’98]
Random hash function g: R^d → Z s.t. for any points p, q:
• If ||p-q|| ≤ r, then Pr[g(p)=g(q)] is “not-so-small” (call it P1)
• If ||p-q|| > cr, then Pr[g(p)=g(q)] is “small” (call it P2)
Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2)
[Figure: Pr[g(p)=g(q)] as a function of ||p-q||, dropping from P1 at r to P2 at cr]
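A minimal sketch (not from the talk) of how such a g is used: build L ≈ n^ρ tables, each keyed by a concatenation g = (h_1, …, h_k) of basic LSH functions to drive the far-collision probability down, and inspect only colliding points at query time. `make_hash` is a placeholder, instantiated after the next slide:

```python
import numpy as np
from collections import defaultdict

class LSHIndex:
    """Sketch of an LSH index: L hash tables (L ~ n^rho), each keyed by
    a k-wise concatenation of basic locality-sensitive hash functions."""
    def __init__(self, make_hash, k: int, L: int):
        # each table uses its own k independent hash functions
        self.gs = [[make_hash() for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, g, p):
        return tuple(h(p) for h in g)

    def build(self, points):
        self.points = points
        for g, table in zip(self.gs, self.tables):
            for i, p in enumerate(points):
                table[self._key(g, p)].append(i)

    def query(self, q):
        # candidates = points colliding with q in at least one table
        cand = set()
        for g, table in zip(self.gs, self.tables):
            cand.update(table.get(self._key(g, q), []))
        if not cand:
            return None
        return min(cand, key=lambda i: np.linalg.norm(self.points[i] - q))
```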
Example of hash functions: grids
[Datar-Immorlica-Indyk-Mirrokni’04]
• Pick a regular grid: shift and rotate it randomly
• Hash function: g(p) = index of the cell containing p
• Gives ρ ≈ 1/c
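A hedged sketch of the one-dimensional projection variant of [DIIM’04]: project onto a random Gaussian direction and bucket into a randomly shifted grid (the width w = 4.0 is an arbitrary illustrative choice). It plugs into the LSHIndex sketch above:

```python
import numpy as np

def make_grid_hash(d: int, w: float = 4.0, rng=np.random.default_rng()):
    """p-stable LSH of [DIIM'04]: h(p) = floor((a.p + b) / w), with a
    Gaussian direction a and a random shift b uniform in [0, w)."""
    a = rng.standard_normal(d)
    b = rng.uniform(0.0, w)
    return lambda p: int(np.floor((a @ p + b) / w))

# usage with the LSHIndex sketch above (k and L chosen for illustration):
# index = LSHIndex(lambda: make_grid_hash(d=128), k=10, L=20)
# index.build(D); print(index.query(q))
```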
Near-Optimal LSH
[A-Indyk’06]
• Regular grid → grid of balls
  - p can hit empty space, so take more such grids until p is in a ball
• Need (too) many grids of balls
  - Start by reducing the dimension to t
• Analysis gives ρ = 1/c² + o(1)
• Choice of the reduced dimension t?
  - Tradeoff between the number of hash tables, n^ρ, and the time to hash, t^O(t)
  - Total query time: d·n^(1/c² + o(1))
[Figure: p projected from R^d down to R^t, then hashed via a grid of balls]
Proof idea
• Claim: ρ = log(1/P(1)) / log(1/P(c)) → 1/c², where
  - P(r) = probability of collision when ||p-q|| = r
• Intuitive proof:
  - Ignore the effects of reducing the dimension
  - P(r) = intersection / union (of the two balls around p and q)
  - P(r) ≈ probability that a random point u lands beyond the dashed line
  - The x-coordinate of u has a nearly Gaussian distribution
    → P(r) ≈ exp(-A·r²)
[Figure: two balls around p and q at distance r; a random point u in their intersection]
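A one-line numeric check (the constants A = 0.7 and c = 2 are arbitrary) that the model P(r) = exp(-A·r²) forces the exponent ρ = log(1/P(1)) / log(1/P(c)) down to exactly 1/c²:

```python
import math

A, c = 0.7, 2.0
P = lambda r: math.exp(-A * r * r)               # modeled collision probability
rho = math.log(1 / P(1.0)) / math.log(1 / P(c))  # = A / (A * c^2)
print(rho, 1 / c**2)                             # both print 0.25
```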
The landscape: lower bounds

Space            Time         Comment           Reference
n^(4/ε²) + nd    O(d·log n)   c = 1+ε           [KOR’98, IM’98]
(poly(n) space; logarithmic query)
  Lower bound: n^(o(1/ε²)) space ⇒ ω(1) memory lookups   [AIP’06]
n^(1+ρ) + nd     d·n^ρ        ρ ≈ 1/c           [IM’98, Cha’02, DIIM’04]
                              ρ = 1/c² + o(1)   [AI’06]
(small poly(n), close-to-linear space; sublinear query)
  Lower bounds: ρ ≥ 1/c² for LSH                          [MNP’06, OWZ’10]
                n^(1+o(1/c²)) space ⇒ ω(1) memory lookups [PTW’08, PTW’10]
nd·log n         d·n^ρ        ρ = 2.09/c        [Ind’01, Pan’06]
                              ρ = O(1/c²)       [AI’06]
(near-linear space; sublinear query)
Open Question #1:
Design a space partitioning of R^t that is
• efficient: point location in poly(t) time
• qualitative: regions are “sphere-like”, i.e.
  [Prob. needle of length 1 is cut] ≥ [Prob. needle of length c is cut] / c²
LSH beyond NNS
[A-Indyk’07, Rahimi-Recht’07, A’09]
• Approximating kernel spaces (obliviously)
Problem:
• For x, y ∈ R^d, one can define an inner product K(x,y) = e^(-||x-y||)
• Implicitly, this means K(x,y) = ⟨ϕ(x), ϕ(y)⟩
• Can we obtain an explicit and efficient ϕ? Approximately? (can’t do it exactly)
Yes, for some kernels, via LSH (see the sketch below):
• E.g., map ϕ’(x) = (r1(g1(x)), r2(g2(x)), …)
  - the gi’s are LSH functions on R^d, and the ri’s map bucket ids into random ±1
• Get: ±ε approximation in O(ε⁻² · log n) dimensions
• Sketching (≈ dimensionality reduction in a computational space) [KOR’98, …]
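A hedged sketch of this construction (the function names are mine, not from the talk): the gi are p-stable grid hashes as earlier and the ri assign a pseudo-random ±1 sign to each bucket id, so ϕ’(x)·ϕ’(y) concentrates around the collision probability Pr[g(x)=g(y)], a kernel-like similarity decaying with ||x-y|| (the exact kernel shape depends on the LSH family, not necessarily e^(-||x-y||)):

```python
import numpy as np

def lsh_feature_map(d: int, m: int, w: float = 4.0, seed: int = 0):
    """phi'(x) = (r_1(g_1(x)), ..., r_m(g_m(x))) / sqrt(m), where g_i are
    p-stable LSH functions and r_i map bucket ids to +/-1 signs.
    Then phi'(x) . phi'(y) concentrates around Pr[g(x) = g(y)]."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, d))   # one Gaussian direction per g_i
    B = rng.uniform(0.0, w, size=m)   # one random shift per g_i

    def phi(x):
        buckets = np.floor((A @ x + B) / w).astype(int)
        # r_i: deterministic pseudo-random sign of the pair (i, bucket id)
        signs = np.array([1.0 if hash((i, int(b))) & 1 else -1.0
                          for i, b in enumerate(buckets)])
        return signs / np.sqrt(m)

    return phi

phi = lsh_feature_map(d=64, m=2000)
x, y = np.zeros(64), 0.1 * np.ones(64)
print(phi(x) @ phi(y))   # close to the collision probability of x and y
```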
Plan for today
1. NNS for basic distances
2. NNS for advanced distances: embeddings
NNS beyond LSH
Distances, so far
LSH is good for Hamming and Euclidean:

Metric          Space          Time    Comment    Reference
Hamming (ℓ1)    n^(1+ρ) + nd   d·n^ρ   ρ = 1/c    [IM’98, Cha’02, DIIM’04]
                                       ρ ≥ 1/c    [MNP’06, OWZ’10, PTW’08-’10]
Euclidean (ℓ2)  n^(1+ρ) + nd   d·n^ρ   ρ ≈ 1/c²   [AI’06]
                                       ρ ≥ 1/c²   [MNP’06, OWZ’10, PTW’08-’10]
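For the Hamming row, the basic family is bit sampling [IM’98]: h(p) = p_i for a uniformly random coordinate i, so Pr[h(p)=h(q)] = 1 - Ham(p,q)/d, which yields ρ = 1/c. A minimal sketch:

```python
import numpy as np

def make_bit_sample(d: int, rng=np.random.default_rng()):
    """Bit-sampling LSH for Hamming space [IM'98]: h(p) = p_i for a
    uniformly random coordinate i, so Pr[h(p)=h(q)] = 1 - Ham(p,q)/d."""
    i = rng.integers(d)
    return lambda p: int(p[i])

# Collision-probability check on two bit vectors at Hamming distance 250:
rng = np.random.default_rng(1)
d = 1000
p = rng.integers(0, 2, d)
q = p.copy(); q[:250] ^= 1
hits = sum(1 for _ in range(5000)
           if (h := make_bit_sample(d, rng))(p) == h(q))
print(hits / 5000)   # ~0.75 = 1 - 250/1000
```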
How about other distances (not ℓp’s) ?
Earth-Mover Distance (EMD)
• Given two sets A, B of points, EMD(A,B) = min-cost bipartite matching between A and B
• Points can be in the plane, in ℓ2^d, …
• Applications: image search
Images courtesy of Kristen Grauman (UT Austin)
Reductions via embeddings
• For each X ∈ M, associate a vector f(X), such that for all X, Y ∈ M:
  - ||f(X) - f(Y)||₂ approximates the original distance between X and Y, up to some distortion (approximation factor)
• Then we can use NNS for Euclidean space!
• Can also consider other “easy” distances between f(X), f(Y)
  - Most popular host: ℓ1 ≡ Hamming
  - ℓ1 = real space with distance ||x-y||₁ = ∑i |xi - yi|
Earth-Mover Distance over 2D into ℓ1
[Cha’02, IT’03]
• Sets of size s in a [1…s]×[1…s] box
• Embedding of a set A:
  - impose a randomly-shifted grid
  - each grid cell c gives a coordinate: f(A)_c = # points of A in cell c
  - subpartition the grid recursively, and assign new coordinates for each new cell (on all levels)
• Distortion: O(log s)
[Figure: a point set overlaid with recursively refined grids and per-cell counts]
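A hedged sketch of this embedding (following [Cha’02, IT’03]): impose a randomly shifted hierarchy of grids, record per-cell counts at every level, and weight each level by its cell side length; the ℓ1 distance between two embeddings then approximates EMD up to O(log s). The weighting is the standard choice in this construction, not stated verbatim on the slide:

```python
import numpy as np
from collections import Counter

def emd_embed(points, s: int, shift: np.ndarray):
    """Embed a point set in [0, s)^2 into l1: one coordinate per grid
    cell per level, valued (cell side) * (#points in cell). Both sets
    being compared must use the same random shift."""
    emb = {}
    side, level = float(s), 0
    while side >= 1:
        cells = Counter(
            (level,) + tuple(((p + shift) // side).astype(int))
            for p in points
        )
        for cell, cnt in cells.items():
            emb[cell] = side * cnt   # weight counts by the cell side length
        side /= 2
        level += 1
    return emb

def l1_dist(e1, e2):
    keys = set(e1) | set(e2)
    return sum(abs(e1.get(k, 0) - e2.get(k, 0)) for k in keys)

# usage: the same shift for both (equal-sized) sets
s = 16
rng = np.random.default_rng(0)
shift = rng.uniform(0, s, size=2)
A = rng.uniform(0, s, size=(8, 2))
B = rng.uniform(0, s, size=(8, 2))
print(l1_dist(emd_embed(A, s, shift), emd_embed(B, s, shift)))
```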
Embeddings of various metrics
Embeddings into ℓ1:

Metric                                           Upper bound        Lower bound
Earth-mover distance (s-sized sets in 2D plane)  O(log s)           Ω(log^(1/2) s)
                                                 [Cha’02, IT’03]    [NS’07]
Earth-mover distance (s-sized sets in {0,1}^d)   O(log s · log d)   Ω(log s)
                                                 [AIK’08]           [KN’05]
Edit distance over {0,1}^d                       2^(Õ(√log d))      Ω(log d)
(= # indels to transform x into y)               [OR’05]            [KN’05, KR’06]
Ulam (edit distance between                      O(log d)           Ω̃(log d)
non-repetitive strings)                          [CK’06]            [AK’07]
Block edit distance                              Õ(log d)           4/3
                                                 [MS’00, CM’07]     [Cor’03]

Open Question #3:
Improve the distortion of embedding EMD, W2, edit distance into ℓ1.
Really beyond LSH
• NNS for ℓ∞:

Space     Time         Comment                              Reference
n^(1+ρ)   O(d·log n)   c ≈ log_ρ log d, via decision trees  [I’98]

  - cannot do better via (deterministic) decision trees
• NNS for mixed norms, e.g. ℓ2(ℓ1) [I’04, AIK’09, A’09]
• Embedding into mixed norms [AIK’09]:
  - Ulam O(1)-embeds into ℓ2²(ℓ∞(ℓ1)) of small dimension
  - yields NNS with O(log log d) approximation
  - Ω̃(log d) if we were to embed into each separate norm!
• Open Question #4:
  Embed EMD, edit distance into mixed norms?
Summary: high-d NNS
• Locality-sensitive hashing:
  - for Hamming, Euclidean spaces
  - provably (near-)optimal NNS in some regimes
  - applications beyond NNS: kernels, sketches
• Beyond LSH:
  - non-normed distances: via embeddings into ℓ1
  - algorithms for ℓp and mixed norms (of ℓp’s)
• Some open questions:
  - design a qualitative, efficient LSH / space partitioning (in Euclidean space)
  - embed “harder” distances (like EMD, edit distance) into ℓ1, or mixed norms (of ℓp’s)?
  - is there an LSH for ℓ∞?
  - NNS for any norm, e.g. the trace norm?