
Nonparametric Link Prediction in Dynamic Graphs
Purnamrita Sarkar (UC Berkeley)
Deepayan Chakrabarti (Facebook)
Michael Jordan (UC Berkeley)
1
Link Prediction
Who is most likely to interact with a given node?
[Figure: should Facebook suggest Alice as a friend for Bob?]
Friend suggestion in Facebook
2
Link Prediction
[Figure: should Netflix suggest this movie to Alice?]
Movie recommendation in Netflix
3
Link Prediction
Prediction using simple features (a short sketch follows this slide):
- degree of a node
- number of common neighbors
- last time a link appeared
What if the graph is dynamic?
4
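As a concrete illustration, a minimal Python sketch of these three simple features; the networkx dependency and the last_link_time bookkeeping are my own illustrative choices, not from the talk:

```python
# A minimal sketch of the three simple link-prediction features.
import networkx as nx

def simple_features(G, last_link_time, i, j, now):
    """Return (deg(j), cn(i,j), time since last link) for the pair (i, j)."""
    deg_j = G.degree(j)
    cn = len(set(G.neighbors(i)) & set(G.neighbors(j)))
    # last_link_time maps a node pair to the last timestep an edge appeared
    ll = now - last_link_time.get((i, j), 0)
    return deg_j, cn, ll
```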
Related Work
Generative models:
- Exponential-family random graph models [Hanneke+/06]
- Dynamics in latent space [Sarkar+/05]
- Extensions of mixed membership block models [Fu+/10]
Other approaches:
- Autoregressive models for links [Huang+/09]
- Extensions of static features [Tylenda+/09]
5
Goal
Link Prediction
- incorporating graph dynamics,
- requiring weak modeling assumptions,
- allowing fast predictions,
- and offering consistency guarantees.
6
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
7
The Link Prediction Problem in Dynamic Graphs
[Figure: graph sequence G_1, G_2, …, G_{T+1} with Y_1(i,j) = 1, Y_2(i,j) = 0, …, Y_{T+1}(i,j) = ?]

Y_{T+1}(i,j) | G_1, G_2, …, G_T ~ Bernoulli( g_{G_1,…,G_T}(i,j) )

Y_{T+1}(i,j) indicates an edge at time T+1; g depends on features of the previous graphs and of this pair of nodes.
8
Including graph-based features
Example set of features for pair (i,j):
- cn(i,j) (common neighbors)
- ℓℓ(i,j) (last time a link was formed)
- deg(j)
Represent dynamics using "datacubes" of these features
≈ a multi-dimensional histogram on binned feature values (a sketch follows this slide).
[Figure: one datacube cell along the axes cn, deg, ℓℓ, e.g. 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2.]
η_t = #pairs in G_t with these features
η_t+ = #pairs in G_t with these features which had an edge in G_{t+1}
A high η_t+/η_t ⇒ this feature combination is more likely to create a new edge at time t+1.
9
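A hedged sketch of how one such datacube could be assembled; the bin edges and input layout are illustrative assumptions, not the talk's exact choices:

```python
# Build a datacube: a histogram over binned (cn, deg, ll) values, counting
# eta_t (pairs in G_t per cell) and eta_t+ (those that gain an edge in G_{t+1}).
from collections import defaultdict

CN_BINS, DEG_BINS, LL_BINS = [0, 1, 4, 10], [0, 3, 7, 20], [0, 1, 3, 10]

def bin_of(x, edges):
    """Index of the half-open bin [edges[k], edges[k+1]) containing x."""
    for k in range(len(edges) - 1):
        if edges[k] <= x < edges[k + 1]:
            return k
    return len(edges) - 2            # clamp out-of-range values to the last bin

def build_datacube(pair_features, new_edges):
    """pair_features: {(i,j): (cn, deg, ll)} measured on G_t;
       new_edges: pairs that have an edge in G_{t+1}."""
    eta, eta_plus = defaultdict(int), defaultdict(int)
    for pair, (cn, deg, ll) in pair_features.items():
        cell = (bin_of(cn, CN_BINS), bin_of(deg, DEG_BINS), bin_of(ll, LL_BINS))
        eta[cell] += 1                         # eta_t for this cell
        eta_plus[cell] += pair in new_edges    # eta_t+ for this cell
    return eta, eta_plus
```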
Including graph-based features
[Figure: the cell 1 ≤ cn(i,j) ≤ 3, 3 ≤ deg(i,j) ≤ 6, 1 ≤ ℓℓ(i,j) ≤ 2 tracked across G_1, G_2, …, G_T, with Y_1(i,j) = 1, Y_2(i,j) = 0, Y_{T+1}(i,j) = ?]
How do we form these datacubes?
Vanilla idea: one datacube for G_t → G_{t+1}, aggregated over all pairs (i,j).
- Does not allow for differently evolving communities.
10
Our Model
[Figure: same setting as the previous slide.]
How do we form these datacubes?
Our model: one datacube for each neighborhood.
- Captures local evolution.
11
Our Model
Neighborhood N_t(i) = nodes within 2 hops of i.
Features extracted from (N_{t-p}, …, N_t).
[Figure: datacube cell 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2.]
η_t(i, s) = number of node pairs with feature s, in the neighborhood of i, at time t.
η_t+(i, s) = number of node pairs with feature s, in the neighborhood of i, at time t, which got connected at time t+1.
12
Our Model
Datacube d_t(i) captures graph evolution
- in the local neighborhood of a node,
- in the recent past.
Model:

Y_{T+1}(i,j) | G_1, G_2, …, G_T ~ Bernoulli( g( d_t(i), s_t(i,j) ) )

where d_t(i) encodes local evolution patterns and s_t(i,j) the features of the pair.
What is g(.)?
13
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
14
Kernel Estimator for g
[Figure: the query is the datacube at time T-1 together with the feature vector at time T; similarities are computed against (datacube, feature pair) combinations from each historical timestep t = 1, 2, 3, … across G_1, …, G_T.]
15
Kernel Estimator for g
Factorize the similarity function:

similarity( (datacube, feature), (query datacube, query feature) ) = K(datacube, query datacube) · I{ feature == query feature }

- Allows computation of g(.) via simple lookups.
16
Kernel Estimator for g
[Figure: across G_1, G_2, …, G_T, similarities are computed only between datacubes; each historical datacube at t = 1, 2, 3, … receives a weight w_t and contributes its counts (η_t, η_t+).]
17
Kernel Estimator for g
Factorize the similarity function:

similarity( (datacube, feature), (query datacube, query feature) ) = K(datacube, query datacube) · I{ feature == query feature }

- Allows computation of g(.) via simple lookups.
But what is K(datacube, query datacube)?
18
Similarity between two datacubes
Idea 1
- For each cell s, take (η_1+/η_1 − η_2+/η_2)² and sum over cells.
Problem:
- The magnitude of η is ignored: 5/10 and 50/100 are treated equally.
- Consider the distribution instead.
19
Similarity between two datacubes
Idea 2
- For each cell s, compute the posterior distribution of the edge-creation probability given (η, η+).
[Figure: posterior densities for two cells with counts (η_1, η_1+) and (η_2, η_2+).]
- dist = total variation distance between the distributions, summed over all cells.
- K(d_1, d_2) = b^dist(d_1, d_2), with 0 < b < 1.
- As b → 0, K(d_1, d_2) → 0 unless dist(d_1, d_2) = 0.
(A sketch of this kernel follows this slide.)
20
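A sketch of this kernel under one natural reading: with a uniform prior, the posterior for a cell with counts (η, η+) is Beta(η+ + 1, η − η+ + 1); the grid discretization and constants are my choices:

```python
# K(d1, d2) = b ** dist(d1, d2), where dist sums per-cell total variation
# distances between discretized Beta posteriors.
import numpy as np
from scipy.stats import beta

GRID = np.linspace(0.0, 1.0, 201)

def cell_posterior(eta, eta_plus):
    """Discretized posterior of a cell's edge-creation probability."""
    pdf = beta.pdf(GRID, eta_plus + 1, eta - eta_plus + 1)
    return pdf / pdf.sum()

def tv_distance(p, q):
    return 0.5 * np.abs(p - q).sum()

def kernel(cube1, cube2, b=0.3):
    """cube1, cube2: {cell: (eta, eta_plus)}. As b -> 0, K -> 0 unless dist = 0."""
    dist = sum(tv_distance(cell_posterior(*cube1.get(c, (0, 0))),
                           cell_posterior(*cube2.get(c, (0, 0))))
               for c in set(cube1) | set(cube2))
    return b ** dist
```

Note how 5/10 and 50/100 now differ: Beta(6, 6) is much flatter than Beta(51, 51), so their total variation distance is large even though the ratios match.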
Kernel Estimator for g
ĥ(d, s) = (1/#) Σ_{i,t} K(d_t(i), d) · η_t+(i, s)

f̂(d, s) = (1/#) Σ_{i,t} K(d_t(i), d) · η_t(i, s)

ĝ(d, s) = ĥ(d, s) / f̂(d, s)

Want to show: ĝ → g (a sketch of the estimator follows this slide).
21
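Putting the pieces together, a minimal sketch of the plug-in estimator; `history` as a list of (datacube, η, η+) records keyed by feature cell is my assumed layout, and `kernel` is the datacube kernel sketched above:

```python
# g_hat = h_hat / f_hat, with the factorized similarity
# K(datacube, query datacube) * I{cell == query cell}.

def g_hat(query_cube, query_cell, history, kernel, b=0.3):
    h = f = 0.0
    for cube, eta, eta_plus in history:
        w = kernel(cube, query_cube, b)
        # the indicator I{cell == query cell} reduces to a dict lookup
        h += w * eta_plus.get(query_cell, 0)
        f += w * eta.get(query_cell, 0)
    return h / f if f > 0 else 0.0
```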
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
22
Consistency of Estimator
ĝ(d, s) = ĥ(d, s) / f̂(d, s)

Lemma 1: As T → ∞, for some R > 0, …
Proof using: as T → ∞, …
23
Consistency of Estimator
ĝ(d, s) = ĥ(d, s) / f̂(d, s)

Lemma 2: As T → ∞, …
24
Consistency of Estimator
Assumption: finite graph.
Proof sketch:
- Dynamics are Markovian with a finite state space
- ⇒ the chain must eventually enter a closed, irreducible communicating class
- ⇒ geometric ergodicity if the class is aperiodic (if not, more complicated…)
- ⇒ strong mixing with exponential decay
- ⇒ variances decay as o(1/T)
25
Consistency of Estimator
Theorem: ĝ → g.
Proof sketch:
- Lemma 1 holds for some R > 0.
- Combined with Lemma 2, ĝ = ĥ / f̂ → g.
26
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
27
Scalability
Full solution:
- sum over all n datacubes for all T timesteps
- infeasible.
Approximate solution:
- sum over the nearest neighbors of the query datacube.
How do we find nearest neighbors?
- Locality Sensitive Hashing (LSH) [Indyk+/98, Broder+/98]
28
Using LSH
Devise a hashing function for datacubes such that
- "similar" datacubes tend to be hashed to the same bucket,
- where "similar" = small total variation distance between the cells of the datacubes.
29
Using LSH
Step 1: Map datacubes to bit vectors.
[Figure: a cell's posterior over [0,1], discretized into buckets.]
- Use B1 buckets to discretize [0,1].
- Use B2 bits for each bucket: for probability mass p, the first ⌈p·B2⌉ bits are set to 1.
- Total M·B1·B2 bits, where M = max number of occupied cells << total number of cells.
30
Using LSH
Step 1: Map datacubes to bit vectors.
- Total variation distance ∝ L1 distance between distributions ≈ Hamming distance between bit vectors.
Step 2: Hash function = k out of the M·B1·B2 bits (a sketch follows this slide).
31
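A sketch of Steps 1 and 2 under my reading of the scheme; the constants and the fixed cell ordering (needed so vectors align across datacubes) are illustrative assumptions:

```python
# Unary-code each cell's discretized posterior into bits, then hash by
# sampling k fixed bit positions; Hamming distance between the codes
# approximates B2 times the L1 distance between the distributions.
import math
import random

B1, B2, K_BITS = 10, 8, 16

def cell_bits(posterior):
    """Unary-code a length-B1 vector of probability masses into B1*B2 bits."""
    bits = []
    for p in posterior:
        ones = math.ceil(p * B2)
        bits += [1] * ones + [0] * (B2 - ones)
    return bits

def cube_bits(cube_posteriors, cell_order):
    """Concatenate codes over the M occupied cells: M*B1*B2 bits in total."""
    return [b for cell in cell_order for b in cell_bits(cube_posteriors[cell])]

def make_hash(n_bits, k=K_BITS, seed=0):
    positions = random.Random(seed).sample(range(n_bits), k)
    return lambda bits: tuple(bits[p] for p in positions)
```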
Fast Search Using LSH
[Figure: long bit vectors are hashed into buckets keyed by their k selected bits (0000, 0001, 0011, …, 1011, …, 1111); similar bit vectors land in the same bucket.]
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
33
Experiments
Baselines
- Static, on G_T:
  - CN: rank by number of common neighbors in G_T
  - AA: more weight to low-degree common neighbors
  - Katz: accounts for longer paths
- Static, on G_1 ∪ ⋯ ∪ G_T:
  - CN-all: apply CN to G_1 ∪ ⋯ ∪ G_T
  - AA-all, Katz-all: similar
- LL: last link (time of last occurrence of a pair)
34
Setup
[Figure: G_1, G_2, …, G_T are training data; G_{T+1} is test data.]
- Pick a random subset S from the nodes with degree > 0 in G_{T+1}.
- ∀s ∈ S, predict a ranked list of nodes likely to link to s.
- Report mean AUC (higher is better; a sketch follows this slide).
35
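A sketch of this evaluation protocol; the rank-sum AUC formula is standard, while the helper names are mine:

```python
# For each test node s, score candidates, compute the AUC of the ranking
# against the true new links in G_{T+1}, and average over s.

def auc(scores, positives):
    """scores: {node: score}; positives: the nodes that truly link to s."""
    pos = [v for n, v in scores.items() if n in positives]
    neg = [v for n, v in scores.items() if n not in positives]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# mean_auc = sum(auc(score_fn(s), true_links[s]) for s in S) / len(S)
```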
Simulations
Social network model of Hoff et al.:
- Each node has an independently drawn feature vector.
- Edge(i,j) depends on the features of i and j.
Seasonality effect:
- Feature importance varies with season ⇒ different communities in each season.
- Feature vectors evolve smoothly over time ⇒ evolving community structures.
(A toy simulator in this spirit is sketched after this slide.)
36
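A toy simulator in the spirit of the slide; the dimensions, logistic link, and seasonal weighting are my assumptions, not the talk's exact model:

```python
# Latent-feature graphs with seasonal feature importance and smooth drift.
import numpy as np

def simulate(n=100, d=4, T=12, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))               # per-node feature vectors
    graphs = []
    for t in range(T):
        w = np.full(d, 0.2)
        w[t % d] = 1.2                        # one feature dominates per season
        logits = (X * w) @ X.T - 3.0          # weighted feature affinity
        P = 1.0 / (1.0 + np.exp(-logits))
        A = np.triu(rng.random((n, n)) < P, 1)   # undirected, no self-loops
        graphs.append((A | A.T).astype(int))
        X = X + 0.1 * rng.normal(size=(n, d))    # smooth feature drift
    return graphs
```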
Simulations
- NonParam is much better than the others in the presence of seasonality.
- CN, AA, and Katz implicitly assume smooth evolution.
37
Sensor Network*
* www.select.cs.cmu.edu/data
38
Summary
Link formation is assumed to depend on
- the neighborhood's evolution
- over a time window.
Admits a kernel-based estimator with
- consistency,
- scalability via LSH.
Works particularly well for
- seasonal effects,
- differently evolving communities.
39