Transcript of slides
Nonparametric Link
Prediction in Dynamic Graphs
Purnamrita Sarkar (UC Berkeley)
Deepayan Chakrabarti (Facebook)
Michael Jordan (UC Berkeley)
Link Prediction
Who is most likely to interact with a given node?
[Figure: friend suggestion in Facebook. Should Facebook suggest Alice as a friend for Bob?]
Link Prediction
[Figure: movie recommendation in Netflix. Should Netflix suggest this movie to Alice? Nodes: Alice, Bob, Charlie.]
Link Prediction
Prediction using simple features
degree of a node
number of common neighbors
last time a link appeared
What if the graph is dynamic?
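As a minimal sketch of the three "simple features" above, computed on toy snapshots stored as adjacency sets (all names and data here are hypothetical, not from the talk):

```python
# Degree, common neighbors, and last-link time for a pair (i, j),
# given a sequence of graph snapshots as adjacency sets.
from collections import defaultdict

def build_adj(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

# Two toy snapshots G_1, G_2 of a dynamic graph
snapshots = [
    build_adj([("alice", "bob"), ("bob", "carol")]),
    build_adj([("alice", "bob"), ("bob", "carol"),
               ("alice", "carol"), ("carol", "dave")]),
]

def degree(adj, i):
    return len(adj[i])

def common_neighbors(adj, i, j):
    return len(adj[i] & adj[j])

def last_link(snapshots, i, j):
    """Most recent time t (1-indexed) at which edge (i, j) appeared, else None."""
    for t in range(len(snapshots), 0, -1):
        if j in snapshots[t - 1][i]:
            return t
    return None

g_t = snapshots[-1]
features = (common_neighbors(g_t, "alice", "dave"),  # 1 (carol)
            degree(g_t, "dave"),                     # 1
            last_link(snapshots, "alice", "bob"))    # 2
```

These per-pair features are what the datacubes later in the talk bin and count.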
Related Work
Generative models
Exponential family random graph models [Hanneke+/’06]
Dynamics in latent space [Sarkar+/’05]
Extension of mixed membership block models [Fu+/10]
Other approaches
Autoregressive models for links [Huang+/09]
Extensions of static features [Tylenda+/09]
Goal
Link Prediction
incorporating graph dynamics,
requiring weak modeling assumptions,
allowing fast predictions,
and offering consistency guarantees.
Outline
Model
Estimator
Consistency
Scalability
Experiments
The Link Prediction Problem in Dynamic Graphs
G1: Y1(i,j)=1    G2: Y2(i,j)=0    ……    GT+1: YT+1(i,j)=?

YT+1(i,j) | G1, G2, …, GT ~ Bernoulli( gG1,G2,…,GT(i,j) )

(left side: the edge indicator at time T+1; right side: a function of the previous graphs and of this pair of nodes)
Including graph-based features
Example set of features for pair (i,j):
cn(i,j) (common neighbors)
ℓℓ(i,j) (last time a link was formed)
deg(j)
Represent dynamics using “datacubes” of these features
≈ a multi-dimensional histogram on binned feature values, with axes cn, deg, ℓℓ

Example cell: 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2
ηt = #pairs in Gt with these features
ηt+ = #pairs in Gt with these features which had an edge in Gt+1
High ηt+/ηt ⇒ this feature combination is more likely to create a new edge at time t+1
Including graph-based features
Example cell: 1 ≤ cn(i,j) ≤ 3, 3 ≤ deg(i,j) ≤ 6, 1 ≤ ℓℓ(i,j) ≤ 2
G1: Y1(i,j)=1    G2: Y2(i,j)=0    ……    GT    YT+1(i,j)=?

How do we form these datacubes?
Vanilla idea: one datacube for Gt→Gt+1, aggregated over all pairs (i,j)
Does not allow for differently evolving communities
Our Model
How do we form these datacubes?
Our Model: one datacube for each neighborhood
Captures local evolution
Our Model
Neighborhood Nt(i) = nodes within 2 hops of i
Features extracted from (Nt-p, …, Nt)

Datacube dt(i): for each feature cell s (e.g. 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2),
ηt(i, s) = number of node pairs with feature s in the neighborhood of i at time t
ηt+(i, s) = number of node pairs with feature s in the neighborhood of i at time t which got connected at time t+1
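A sketch of building such a datacube, with toy two-feature binning and illustrative helper names (not the paper's exact choices):

```python
# Build a datacube d_t(i): for every binned feature cell s, count the
# pairs in i's neighborhood at time t (eta) and the subset of those
# pairs that are connected at time t+1 (eta_plus).
from collections import defaultdict
from itertools import combinations

def bin_features(cn, deg):
    # Toy 2-feature cell: (binned common neighbors, binned degree)
    return (min(cn, 3), min(deg // 3, 2))

def datacube(neighborhood, adj_t, adj_t1):
    cube = defaultdict(lambda: [0, 0])   # cell s -> [eta, eta_plus]
    for i, j in combinations(sorted(neighborhood), 2):
        cn = len(adj_t[i] & adj_t[j])
        s = bin_features(cn, len(adj_t[j]))
        cube[s][0] += 1                  # eta_t(i, s)
        if j in adj_t1[i]:
            cube[s][1] += 1              # eta_t_plus(i, s)
    return dict(cube)

adj_t  = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
adj_t1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
cube = datacube({"a", "b", "c"}, adj_t, adj_t1)
```

High `eta_plus / eta` in a cell then marks a feature combination that tends to create new edges in this neighborhood.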
Our Model
Datacube dt(i) captures graph evolution
in the local neighborhood of a node
in the recent past

Model:
YT+1(i,j) | G1, G2, …, GT ~ Bernoulli( g( dt(i), st(i,j) ) )

dt(i): local evolution patterns. st(i,j): features of the pair.
What is g(·)?
Kernel Estimator for g
[Figure: from each transition in history, extract (datacube, feature pair) examples for t = 1, 2, 3, …. Given the query datacube at T-1 and the feature vector at time T, compute similarities to all historical (datacube, feature) pairs.]
Kernel Estimator for g
Similarity between a query (d, s) and a historical (d′, s′): K(d, d′) · I{ s == s′ }
Factorize the similarity function
Allows computation of g(·) via simple lookups
Kernel Estimator for g
[Figure: G1, G2, …, GT-2, GT-1, GT. Compute similarities only between datacubes: each historical datacube (grouped by t = 1, 2, 3) gets a kernel weight w1, …, w4 and contributes its counts (η1, η1+), …, (η4, η4+).]
Kernel Estimator for g
Similarity between a query (d, s) and a historical (d′, s′): K(d, d′) · I{ s == s′ }
Factorize the similarity function
Allows computation of g(·) via simple lookups
What is K(d, d′)?
Similarity between two datacubes
Idea 1: for each cell s, take (η1+/η1 − η2+/η2)² and sum over all cells
(comparing two datacubes with cell counts (η1, η1+) and (η2, η2+))
Problem: the magnitude of η is ignored; 5/10 and 50/100 are treated equally
Instead, consider the distribution
Similarity between two datacubes
Idea 2: for each cell s, compute the posterior distribution of the edge-creation probability
dist = total variation distance between the distributions, summed over all cells
[Figure: posterior distributions for two cells with counts (η1, η1+) and (η2, η2+).]

K(d1, d2) = b^dist(d1, d2), with 0 < b < 1
As b → 0, K(d1, d2) → 0 unless dist(d1, d2) = 0
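A sketch of this distance for one cell, assuming a Beta(η+ + 1, η − η+ + 1) posterior (uniform prior; the talk does not specify the exact prior, so this is an illustrative choice), with the distributions discretized to compute total variation numerically:

```python
# Total-variation distance between per-cell posteriors, and the kernel
# K = b**dist. The Beta posterior and the discretization grid are
# illustrative assumptions, not the paper's exact construction.
import math

def beta_pmf(a, b, bins=1000):
    """Discretized Beta(a, b) distribution on a grid over (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    xs = [(k + 0.5) / bins for k in range(bins)]
    pdf = [math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm)
           for x in xs]
    total = sum(pdf)
    return [p / total for p in pdf]

def tv_distance(eta1, eta1p, eta2, eta2p):
    p = beta_pmf(eta1p + 1, eta1 - eta1p + 1)
    q = beta_pmf(eta2p + 1, eta2 - eta2p + 1)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# 5/10 vs 50/100: same ratio, but the 100-pair cell has a much more
# concentrated posterior, so the distance is clearly nonzero.
d = tv_distance(10, 5, 100, 50)
b = 0.5
K = b ** d   # single-cell kernel contribution
```

This is exactly what Idea 1 misses: identical ratios with different counts now yield a nonzero distance.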
Kernel Estimator for g
ĝ(d, s) = ĥ(d, s) / f̂(d, s), where
ĥ(d, s) = (1/#) Σ K(d, dt(i)) ηt+
f̂(d, s) = (1/#) Σ K(d, dt(i)) ηt
(sums over historical datacubes, restricted to the matching feature cell s)

Want to show: ĝ → g
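A sketch of the estimator as a kernel-weighted average over historical datacubes; the datacube layout and the crude kernel below are illustrative stand-ins, not the paper's exact definitions:

```python
# g_hat(d, s) = h_hat / f_hat: weight each historical datacube by its
# kernel similarity to the query, then average success counts over
# trial counts in the matching cell s.

def toy_kernel(c1, c2, b=0.5):
    """b**dist with a crude distance: summed |rate difference| per cell."""
    dist = 0.0
    for s in set(c1) | set(c2):
        e1, p1 = c1.get(s, (0, 0))
        e2, p2 = c2.get(s, (0, 0))
        r1 = p1 / e1 if e1 else 0.0
        r2 = p2 / e2 if e2 else 0.0
        dist += abs(r1 - r2)
    return b ** dist

def g_hat(query_cube, s, history):
    num = den = 0.0
    for cube in history:                  # historical datacubes d_t(i)
        if s not in cube:
            continue                      # indicator I{s == s_t(i,j)}
        w = toy_kernel(query_cube, cube)  # K(d, d_t(i))
        eta, eta_plus = cube[s]
        num += w * eta_plus               # contributes to h_hat
        den += w * eta                    # contributes to f_hat
    return num / den if den else 0.0

history = [{("cn1",): (10, 5)}, {("cn1",): (100, 10)}]
query = {("cn1",): (20, 10)}
prob = g_hat(query, ("cn1",), history)   # lands between the 0.5 and 0.1 rates
```

Datacubes similar to the query dominate the average, which is why only datacube-to-datacube similarities need to be computed.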
Consistency of Estimator
ĝ(d, s) = ĥ(d, s) / f̂(d, s)
Lemma 1: As T→∞, for some R>0,
Proof using:
As T→∞,
Consistency of Estimator
ĝ(d, s) = ĥ(d, s) / f̂(d, s)
Lemma 2: As T→∞,
Consistency of Estimator
Assumption: finite graph
Proof sketch:
Dynamics are Markovian with finite state space
⇒ the chain must eventually enter a closed, irreducible communicating class
⇒ geometric ergodicity if the class is aperiodic (if not, more complicated…)
⇒ strong mixing with exponential decay
⇒ variances decay as o(1/T)
Consistency of Estimator
Theorem:
Proof Sketch:
for some R>0
So
Scalability
Full solution: sum over all n datacubes for all T timesteps. Infeasible.
Approximate solution: sum over the nearest neighbors of the query datacube.
How do we find nearest neighbors? Locality Sensitive Hashing (LSH) [Indyk+/98, Broder+/98]
Using LSH
Devise a hashing function for datacubes such that “similar” datacubes tend to be hashed to the same bucket.
“Similar” = small total variation distance between cells of the datacubes.
Using LSH
Step 1: Map datacubes to bit vectors
Use B1 buckets to discretize [0,1]; use B2 bits for each bucket
For probability mass p, the first ≈ p·B2 bits are set to 1
Total M·B1·B2 bits, where M = max number of occupied cells << total number of cells
[Figure: a cell’s distribution over [0,1] discretized into buckets.]
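A sketch of this unary embedding for a single cell's discretized distribution (bucket counts are illustrative), showing that Hamming distance recovers L1 distance:

```python
# Embed a distribution over B1 buckets as a unary bit vector with B2
# bits per bucket, so that Hamming distance ~ B2 * L1 distance.

def unary_bits(dist, b2=8):
    """dist: probabilities over the B1 buckets (sums to ~1)."""
    bits = []
    for p in dist:
        n_ones = round(p * b2)            # first ~p*B2 bits set to 1
        bits.extend([1] * n_ones + [0] * (b2 - n_ones))
    return bits

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

d1 = [0.5, 0.5] + [0.0] * 8               # B1 = 10 buckets
d2 = [0.25, 0.75] + [0.0] * 8
v1, v2 = unary_bits(d1), unary_bits(d2)
# L1 distance = 0.5; hamming(v1, v2) = 4 and 4 / B2 = 0.5
```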
Using LSH
Step 1: Map datacubes to bit vectors
Total variation distance ∝ L1 distance between distributions ≈ Hamming distance between bit vectors
Step 2: Hash function = k out of the M·B1·B2 bits
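Step 2 can be sketched as sampling k fixed bit positions: vectors at small Hamming distance agree on those positions (and thus collide) with high probability. The table layout below is a toy illustration:

```python
# An LSH function that reads k fixed bit positions; nearby bit vectors
# likely hash to the same bucket, which is then the candidate set.
import random

def make_hash(n_bits, k, seed=0):
    rng = random.Random(seed)
    idx = rng.sample(range(n_bits), k)        # k fixed positions
    return lambda bits: tuple(bits[i] for i in idx)

n_bits = 80
vec_a = [1] * 40 + [0] * 40
vec_b = vec_a[:]
vec_b[3] ^= 1                                 # differs in a single bit

h = make_hash(n_bits, k=12)
table = {}
for cube_id, bits in enumerate([vec_a, vec_b]):
    table.setdefault(h(bits), []).append(cube_id)

# vec_b collides with vec_a unless position 3 was among the k sampled bits
candidates = table.get(h(vec_a), [])
```

In practice several such hash functions are used together so that near neighbors are found with high probability.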
Fast Search Using LSH
[Figure: hash table lookup. k-bit bucket keys (0000, 0001, 0011, …, 1011, …, 1111) index buckets of stored bit vectors; the query vector is hashed to one bucket, and only that bucket is searched.]
Experiments
Baselines
Static, on 𝐺𝑇:
CN: rank by number of common neighbors in 𝐺𝑇
AA: more weight to low-degree common neighbors
Katz: accounts for longer paths
Static, on 𝐺1 ∪ ⋯ ∪ 𝐺𝑇:
CN-all: apply CN to 𝐺1 ∪ ⋯ ∪ 𝐺𝑇
AA-all, Katz-all: similar
LL: last link (time of last occurrence of a pair)
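As a sketch, the CN and AA scores on a single snapshot (toy adjacency data; AA's log-degree weighting is the standard Adamic-Adar form, which the slide describes as downweighting high-degree common neighbors):

```python
# CN and AA baseline scores on one snapshot stored as adjacency sets.
import math

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def cn_score(i, j):
    return len(adj[i] & adj[j])

def aa_score(i, j):
    # 1/log(deg) gives low-degree common neighbors more weight
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[i] & adj[j] if len(adj[z]) > 1)

cn = cn_score("b", "d")   # common neighbors a and c
aa = aa_score("b", "d")
```

Ranking candidate partners of a node by these scores gives the CN and AA baselines; the "-all" variants apply the same scores to the union of all snapshots.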
Setup
[Figure: G1, G2, …, GT = training data; GT+1 = test data.]
Pick a random subset S of nodes with degree > 0 in GT+1
∀𝑠 ∈ 𝐒, predict a ranked list of nodes likely to link to s
Report mean AUC (higher is better)
Simulations
Social network model of Hoff et al.
Each node has an independently drawn feature vector
Edge (i,j) depends on the features of i and j
Seasonality effect: feature importance varies with season ⇒ different communities in each season
Feature vectors evolve smoothly over time ⇒ evolving community structures
Simulations
NonParam is much better than the others in the presence of seasonality.
CN, AA, and Katz implicitly assume smooth evolution.
Sensor Network*
* www.select.cs.cmu.edu/data
Summary
Link formation is assumed to depend on the neighborhood’s evolution over a time window
Admits a kernel-based estimator
Consistency guarantees
Scalability via LSH
Works particularly well for seasonal effects and differently evolving communities