A Theoretical Justification of
Link Prediction Heuristics
Deepayan Chakrabarti
Purnamrita Sarkar
Andrew Moore
Link Prediction
Which pair of nodes {i, j} should be connected?
[Figure: example graph with nodes labeled Alice, Bob, and Charlie]
Goal: Recommend a movie
Goal: Suggest friends
Link Prediction

Recommendation systems
Personalization
Ranking problems
Friend suggestion, song recommendation, …
Rank a set of results returned for some query node
Prefetching of requests
Uncovering hidden edges
…
Link Prediction Heuristics
Predict link between nodes
  Connected by the shortest path
  With the most common neighbors (length-2 paths)
  More weight to low-degree common neighbors (Adamic/Adar)
  With more short paths (e.g. length-3 paths)
  …
  Exponentially decaying weights to longer paths (Katz measure)
[Figure: nodes Alice, Bob, and Charlie. A prolific common friend (1000 followers) gives less evidence for a link; a less prolific common friend (3 followers) gives much more evidence.]
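The common-neighbors and Adamic/Adar heuristics can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the toy graph, the node names `hub` and `pal`, and the helper names are assumptions made for the example, and the 1/log(degree) weight is the usual form of the Adamic/Adar score.

```python
import math

def common_neighbors(adj, i, j):
    """Number of common neighbors, i.e. length-2 paths between i and j."""
    return len(adj[i] & adj[j])

def adamic_adar(adj, i, j):
    """Sum of 1/log(degree) over common neighbors; low-degree neighbors count more."""
    return sum(1.0 / math.log(len(adj[k])) for k in adj[i] & adj[j])

# Toy undirected graph (hypothetical data): "hub" is prolific, "pal" is not.
edges = [("alice", "hub"), ("bob", "hub"), ("charlie", "pal"), ("alice", "pal"),
         ("hub", "x1"), ("hub", "x2"), ("hub", "x3")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cn_ab = common_neighbors(adj, "alice", "bob")      # shared neighbor: hub
cn_ac = common_neighbors(adj, "alice", "charlie")  # shared neighbor: pal
aa_ab = adamic_adar(adj, "alice", "bob")
aa_ac = adamic_adar(adj, "alice", "charlie")
```

Both pairs have exactly one common neighbor, so plain common neighbors cannot separate them, while Adamic/Adar ranks the pair that shares the low-degree node higher.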
Previous Empirical Studies*
[Figure: link prediction accuracy* of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths, with the annotation "Especially if the graph is sparse"]
How do we justify these observations?
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
Link Prediction – Generative Model
Model (in a unit-volume universe):
1. Nodes are uniformly distributed points in a latent space
2. This space has a distance metric
3. Points close to each other are likely to be connected in the graph
   → Logistic distance function (Raftery+/2002)
[Figure: link probability versus distance, with ticks at 1 and ½ and the radius r marked; the probability of linking is higher at small distances, and α determines the steepness.]
Link prediction ≈ find the nearest neighbor that is not currently linked to the node.
→ Equivalent to inferring distances in the latent space
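A minimal sketch of this generative model, assuming a Euclidean latent space on the unit cube and the logistic link function written as 1 / (1 + exp(α(d − r))), which equals ½ at d = r. The function names, the unit-cube domain, and the parameter values are illustrative assumptions.

```python
import math
import random

def link_probability(d, r, alpha):
    """Logistic link function: 1/2 at d == r, sharper as alpha grows."""
    z = alpha * (d - r)
    if z >= 0:  # numerically stable form for large positive z
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def sample_latent_graph(n, r, alpha, dim=2, seed=0):
    """Place n nodes uniformly in the unit cube, then sample each edge independently."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if rng.random() < link_probability(d, r, alpha):
                edges.add((i, j))
    return points, edges

points, edges = sample_latent_graph(n=200, r=0.1, alpha=50.0)
```

Under this sketch, link prediction for a node amounts to guessing which unlinked node has the smallest latent distance to it, using only `edges` and not `points`.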
Common Neighbors
Pr_2(i,j) = Pr(common neighbor | d_ij)
$$\Pr\nolimits_2(i,j) = \int \Pr(i \sim k \mid d_{ik}) \, \Pr(j \sim k \mid d_{jk}) \, P(d_{ik}, d_{jk} \mid d_{ij}) \, d(d_{ik}) \, d(d_{jk})$$
Product of two logistic probabilities, integrated over a volume determined by d_ij
As α → ∞, logistic → step function
Much easier to analyze!
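As a numerical sanity check of this limit, the sketch below estimates Pr_2(i,j) by Monte Carlo in a two-dimensional unit square and compares it with the area of the lens where two radius-r disks overlap, which is what the integral reduces to under a step function. The choice D = 2, the closed-form lens area, the sample size, and all function names are assumptions for illustration.

```python
import math
import random

def logistic(d, r, alpha):
    """Logistic link probability, written to avoid overflow for large alpha."""
    z = alpha * (d - r)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def lens_area(r, d):
    """Area of the intersection of two disks of radius r whose centers are d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)

def estimate_pr2(r, d, alpha, samples=200_000, seed=0):
    """Average Pr(i~k) * Pr(j~k) over positions of k drawn uniformly from the unit square."""
    rng = random.Random(seed)
    pos_i, pos_j = (0.5 - d / 2, 0.5), (0.5 + d / 2, 0.5)
    total = 0.0
    for _ in range(samples):
        k = (rng.random(), rng.random())
        total += logistic(math.dist(pos_i, k), r, alpha) * logistic(math.dist(pos_j, k), r, alpha)
    return total / samples

r, d = 0.2, 0.2
soft = estimate_pr2(r, d, alpha=20.0)     # smooth logistic
sharp = estimate_pr2(r, d, alpha=1000.0)  # close to a step function
exact_step = lens_area(r, d)              # step-function value of Pr_2
```

Comparing `soft` and `sharp` against `exact_step` shows how the logistic estimate behaves as α grows; the sharp estimate should agree with the lens area up to Monte Carlo noise.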
Common Neighbors
Everyone has the same radius r, in a unit-volume universe
$$\Pr\nolimits_2(i,j) = A(r, r, d_{ij})$$
η = number of common neighbors
$$P\left[\frac{\eta}{N} - \epsilon \le A(r, r, d_{ij}) \le \frac{\eta}{N} + \epsilon\right] \ge 1 - 2\delta$$
The number of common neighbors gives a bound on the distance:
$$2r\left(1 - \left(\frac{\eta/N + \epsilon}{V(r)}\right)^{1/D}\right) \le d_{ij} \le 2r\sqrt{1 - \left(\frac{\eta/N - \epsilon}{V(r)}\right)^{2/D}}$$
V(r) = volume of a ball of radius r in D dimensions
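The distance bound can be evaluated directly once η, N, ε, r, and D are known. The sketch below assumes a Euclidean latent space, so that V(r) is the usual D-dimensional ball volume, and clips the volume ratios to [0, 1] so the roots stay real; the clipping and the function names are assumptions added for the example.

```python
import math

def ball_volume(r, dim):
    """Volume of a D-dimensional Euclidean ball of radius r."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * r ** dim

def distance_bounds(eta, n, eps, r, dim):
    """Lower and upper bounds on d_ij from eta common neighbors among n nodes."""
    v = ball_volume(r, dim)
    # Clip to [0, 1]: an assumption to keep the expressions below well defined.
    hi_frac = min(max((eta / n + eps) / v, 0.0), 1.0)
    lo_frac = min(max((eta / n - eps) / v, 0.0), 1.0)
    lower = 2 * r * (1 - hi_frac ** (1 / dim))
    upper = 2 * r * math.sqrt(1 - lo_frac ** (2 / dim))
    return lower, upper

# Hypothetical numbers: 49 common neighbors out of 1000 nodes, r = 0.2, D = 2.
lower, upper = distance_bounds(eta=49, n=1000, eps=0.01, r=0.2, dim=2)
```

These inputs correspond roughly to a true distance of 0.2 in two dimensions, and the returned interval brackets that value.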
Common Neighbors
OPT = node closest to i
MAX = node with max common neighbors with i
Theorem: w.h.p.
$$d_{OPT} \le d_{MAX} \le d_{OPT} + 2\left[\epsilon / V(1)\right]^{1/D}$$
Link prediction by common neighbors is asymptotically optimal
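A small simulation can illustrate the gap between OPT and MAX. It assumes step-function links with a common radius r in a two-dimensional unit square and picks i near the center to limit boundary effects; the parameter values and the function name are assumptions for this sketch.

```python
import math
import random

def closest_vs_max_common(n=400, r=0.15, seed=1):
    """Return (d_OPT, d_MAX) for a node i near the center of the unit square."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    i = min(range(n), key=lambda k: math.dist(pts[k], (0.5, 0.5)))
    # Step-function links: a and b are neighbors iff they are within distance r.
    nbrs = [{b for b in range(n) if b != a and math.dist(pts[a], pts[b]) <= r}
            for a in range(n)]
    others = [j for j in range(n) if j != i]
    opt = min(others, key=lambda j: math.dist(pts[i], pts[j]))   # closest node to i
    best = max(others, key=lambda j: len(nbrs[i] & nbrs[j]))     # most common neighbors
    return math.dist(pts[i], pts[opt]), math.dist(pts[i], pts[best])

d_opt, d_max = closest_vs_max_common()
```

The difference `d_max - d_opt` is the quantity that the theorem bounds by 2[ε/V(1)]^{1/D}.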
Common Neighbors: Distinct Radii
Node k has radius r_k.
i → k if d_ik ≤ r_k (directed graph)
r_k captures the popularity of node k
Type 1: i ← k → j, volume A(r_i, r_j, d_ij)
Type 2: i → k ← j, volume A(r_k, r_k, d_ij)
[Figure: nodes i, j, k, and m with their radii, illustrating the two types of common neighbor]
Type 2 common neighbors
Example graph:
  N1 nodes of radius r1 and N2 nodes of radius r2
  r1 << r2
η1 ~ Bin[N1, A(r1, r1, d_ij)]
η2 ~ Bin[N2, A(r2, r2, d_ij)]
Pick d* to maximize Pr[η1, η2 | d_ij]
→ w(r1) E[η1 | d*] + w(r2) E[η2 | d*] = w(r1) η1 + w(r2) η2
The left-hand side is inversely related to d*; the right-hand side is the weighted common neighbors count.
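One way to see where this weighted balance comes from is to write out the likelihood. The sketch below assumes η_1 and η_2 are independent given d_ij and abbreviates A_m(d) = A(r_m, r_m, d); the explicit form of w(r_m) is what falls out of the derivation under those assumptions, not a formula quoted from the slides.

```latex
\log \Pr[\eta_1, \eta_2 \mid d]
  = \sum_{m=1}^{2} \Big[ \eta_m \log A_m(d) + (N_m - \eta_m) \log\big(1 - A_m(d)\big) \Big] + \text{const}

\frac{\partial}{\partial d} \log \Pr[\eta_1, \eta_2 \mid d]
  = \sum_{m=1}^{2} A_m'(d) \, \frac{\eta_m - N_m A_m(d)}{A_m(d)\,\big(1 - A_m(d)\big)}

% Set the derivative to zero at d = d^*, and define
w(r_m) = \frac{-A_m'(d^*)}{A_m(d^*)\,\big(1 - A_m(d^*)\big)}, \qquad
E[\eta_m \mid d^*] = N_m A_m(d^*)

w(r_1)\, E[\eta_1 \mid d^*] + w(r_2)\, E[\eta_2 \mid d^*] = w(r_1)\, \eta_1 + w(r_2)\, \eta_2
```

Since the intersection volume shrinks as the distance grows, A_m'(d) is negative and the weights w(r_m) defined this way are positive.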
“Weighted” common neighbors:
Predict (i,j) pairs with highest Σ_r w(r) η(r), where
  η(r) = number of common neighbors of radius r
  w(r) = weight for nodes of radius r
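The weighted score can be sketched as follows. The radii, the helper name, and the 1/r weight are hypothetical choices for illustration only; the appropriate w(r) depends on the analysis of each radius.

```python
from collections import Counter

def weighted_common_neighbor_score(common_radii, weight):
    """Sum of w(r) * eta(r), where eta(r) counts common neighbors of radius r."""
    eta = Counter(common_radii)
    return sum(weight(r) * count for r, count in eta.items())

# Hypothetical radii of the common neighbors of two candidate pairs.
pair_a = [0.05, 0.05, 0.30]   # two low-radius neighbors and one high-radius neighbor
pair_b = [0.30, 0.30, 0.30]   # three high-radius (popular) neighbors

def inverse_radius(r):
    """Illustrative weight that favors low-radius common neighbors."""
    return 1.0 / r

score_a = weighted_common_neighbor_score(pair_a, inverse_radius)
score_b = weighted_common_neighbor_score(pair_b, inverse_radius)
```

Both pairs have three common neighbors, but under this weight the pair with low-radius common neighbors scores higher.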
Type 2 common neighbors
[Figure: weight w(r) as a function of radius r, with a 1/r curve. Annotations: "Presence of common neighbor is very informative" (marked Adamic/Adar, with w(r) ≈ const/r ≈ const/deg^{1/D}); "Absence is very informative" (r is close to max radius); "Real world graphs generally fall in this range".]
Type 2 common neighbors
Latent space dimensionality affects the plots
New Estimators
Weighted common neighbors aggregates all types of common neighbors
Alternative:
  Get a bound for the different types
  Combine bounds
New Estimators
Number of common neighbors of a given radius r
Q_r = fraction of nodes with radius ≤ r which are common neighbors
  Large Q_r → small d_ij
T_R = fraction of nodes with radius ≥ R which are common neighbors
  Small T_R → large d_ij
“Sweep” over the range of radii
  Each radius gives us bounds on d_ij
  Combine these
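The slides do not say how the per-radius bounds are combined. One simple possibility, shown here purely as an assumption, is to intersect the intervals: take the largest lower bound and the smallest upper bound. The interval values and the function name are hypothetical.

```python
def combine_bounds(intervals):
    """Intersect per-radius (lower, upper) bounds on d_ij into one interval."""
    lower = max(lo for lo, _ in intervals)
    upper = min(hi for _, hi in intervals)
    if lower > upper:
        raise ValueError("bounds are inconsistent")
    return lower, upper

# Hypothetical bounds on d_ij obtained from three different radii.
per_radius = [(0.05, 0.40), (0.10, 0.35), (0.08, 0.22)]
combined = combine_bounds(per_radius)
```

Because each individual bound holds only with high probability, the intersected interval holds with a correspondingly smaller probability (for example via a union bound over the radii used).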
ℓ-hop Paths
Common neighbors = 2-hop paths
Analysis of longer paths: two components
1. Bounding E(η_ℓ | d_ij). [η_ℓ = # ℓ-hop paths]
   Bounds Pr_ℓ(i,j) by using the triangle inequality on a series of common neighbor probabilities.
2. η_ℓ ≈ E(η_ℓ | d_ij)
[Figure: triangulation]
   Bounded dependence of η_ℓ on the position of each node
   → Can use McDiarmid’s inequality to bound |η_ℓ − E(η_ℓ | d_ij)|
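For reference, McDiarmid's inequality in its standard form is stated below. Here f stands for η_ℓ viewed as a function of the independent node positions X_1, …, X_N, and c_k bounds how much f can change when only X_k changes; the slides do not give the values of c_k, so they are left symbolic.

```latex
\sup_{x_1, \dots, x_N,\, x_k'}
  \big| f(x_1, \dots, x_k, \dots, x_N) - f(x_1, \dots, x_k', \dots, x_N) \big| \le c_k
  \quad \text{for all } k
\;\Longrightarrow\;
\Pr\big( \left| f(X) - E[f(X)] \right| \ge t \big)
  \le 2 \exp\!\left( \frac{-2 t^2}{\sum_{k=1}^{N} c_k^2} \right)
```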
For longer paths:
  Bounds are weaker
  For ℓ’ ≥ ℓ we need η_ℓ’ >> η_ℓ to obtain similar bounds
d ij  r  (  1)r1 - gη , N, δ 

→ Justifies the exponentially decaying weight given to longer paths by the Katz measure
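A minimal sketch of a truncated Katz score, which sums β^ℓ times the number of ℓ-hop connections for ℓ up to a cutoff. Powers of the adjacency matrix count walks rather than simple paths, which is the usual convention for Katz; the decay β, the cutoff, the toy graph, and the function names are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz_scores(adj_matrix, beta, max_len):
    """Truncated Katz: sum over l = 1..max_len of beta**l * (number of l-hop walks)."""
    n = len(adj_matrix)
    power = [row[:] for row in adj_matrix]   # holds A^l, starting at l = 1
    scores = [[0.0] * n for _ in range(n)]
    for length in range(1, max_len + 1):
        for i in range(n):
            for j in range(n):
                scores[i][j] += (beta ** length) * power[i][j]
        power = matmul(power, adj_matrix)
    return scores

# Path graph 0 - 1 - 2 - 3 (hypothetical example).
path_graph = [[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]]
scores = katz_scores(path_graph, beta=0.5, max_len=3)
```

Here nodes 0 and 2 are joined by one 2-hop walk and nodes 0 and 3 by one 3-hop walk, so the longer connection receives the smaller weight.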
Revisiting logistic distance model
$$\frac{1}{4} A + \frac{1}{2} e^{-\alpha d_{ij}} (V - A) \le \Pr\nolimits_2(i,j) \le A + c(r, \alpha, D)$$
The factor ¼ is a weak bound for the logistic.
→ Can be made tighter, as the logistic approaches the step function.
Summary
Three key ingredients
1. Closer points are likelier to be linked.
   Small World Model: Watts & Strogatz, 1998; Kleinberg, 2001
2. Triangle inequality holds
   → necessary to extend to ℓ-hop paths
3. Points are spread uniformly at random
   → Otherwise properties will depend on location as well as distance
Summary
[Figure: link prediction accuracy* of Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
Differentiating between different degrees is important
For large dense graphs, common neighbors are enough
In sparse graphs, paths of length 3 or more help in prediction.
The number of paths matters, not the length
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
Open questions
Weighted graphs
Alternatives to homophily
Dimensionality of the latent “interest” space