Transcript Slide 1

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
[LibenNowell-Kleinberg ‘03]

The link prediction task:
 Given G[t0,t0’], a graph on the edges that appear
up to time t0’, output a ranked list L
of links (not in G[t0,t0’]) that are
predicted to appear in G[t1,t1’]

Evaluation:
 n = |Enew|: # new edges that appear during
the test period [t1,t1’]
 Take the top n elements of L and count the correct edges
7/20/2015
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
2
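The evaluation step above can be sketched in a few lines of Python. This assumes links are undirected pairs of node IDs; the function name `count_correct_top_n` is hypothetical. Dividing the returned count by n gives the precision of the top-n predictions.

```python
def count_correct_top_n(ranked, new_edges):
    """Count how many of the top-n predicted links appear in the test period.

    ranked: the ranked list L of predicted links (pairs), best first
    new_edges: the links E_new that appeared during [t1, t1']; n = |E_new|
    Links are treated as undirected, so (x, y) and (y, x) match.
    """
    truth = {frozenset(e) for e in new_edges}
    n = len(truth)
    return sum(1 for e in ranked[:n] if frozenset(e) in truth)
```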
[LibenNowell-Kleinberg ‘03]

Predict links in an evolving collaboration
network

Core: Since network data is very sparse
 Consider only nodes with in-degree and
out-degree of at least 3

Methodology:
 For every pair of nodes (x,y) compute a proximity score c(x,y),
e.g., the # of common neighbors of x and y
 Sort pairs by decreasing score
 Only consider/predict edges where both
endpoints are in the core (deg. ≥ 3)
 Predict the top n pairs as new links
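The methodology can be sketched as follows. This is an illustrative version, not the paper's code: the graph is a hypothetical adjacency dict, and the core test is simplified to "degree ≥ 3 in the training graph" rather than separate in- and out-degree conditions.

```python
from itertools import combinations

def predict_links(adj, n, min_deg=3):
    """Rank non-adjacent pairs of core nodes by # common neighbors; return the top n.

    adj: dict node -> set of neighbors (the undirected training graph G[t0, t0'])
    min_deg: core threshold, simplified here to plain degree >= min_deg
    """
    core = sorted(u for u, nbrs in adj.items() if len(nbrs) >= min_deg)
    scored = []
    for x, y in combinations(core, 2):
        if y in adj[x]:
            continue                                   # already an edge: nothing to predict
        scored.append((len(adj[x] & adj[y]), x, y))    # c(x, y) = # common neighbors
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))     # decreasing score, deterministic ties
    return [(x, y) for _, x, y in scored[:n]]
```

Only pairs with both endpoints in the core are ever scored, which also keeps the number of candidate pairs manageable on sparse data.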
[LibenNowell-Kleinberg ‘03]

For every pair of nodes (x,y) compute:
Γ(x) … set of neighbors of node x (|Γ(x)| is its degree)
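The slide's formulas did not survive extraction, so exactly which measures it lists is not recoverable here. As an assumption-labeled sketch, the code computes three standard neighbor-based scores with Γ(x) = `adj[x]`: common neighbors, Jaccard's coefficient, and Adamic-Adar (the baseline that appears in the Facebook results). The function name is hypothetical.

```python
import math

def proximity_scores(adj, x, y):
    """Neighbor-based proximity scores for the pair (x, y).

    adj: dict node -> set of neighbors, so Γ(x) = adj[x].
    """
    gx, gy = adj[x], adj[y]
    common, union = gx & gy, gx | gy
    return {
        "common_neighbors": len(common),                        # |Γ(x) ∩ Γ(y)|
        "jaccard": len(common) / len(union) if union else 0.0,  # |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|
        # Adamic-Adar: sum of 1 / log|Γ(z)| over common neighbors z; each z is
        # adjacent to both x and y, so |Γ(z)| >= 2 and the log is positive
        "adamic_adar": sum(1.0 / math.log(len(adj[z])) for z in common),
    }
```

Adamic-Adar differs from the plain count only in weighting: a shared neighbor with few friends is stronger evidence than a shared hub.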
[LibenNowell-Kleinberg ’ 03]

Improvement over #common neighbors
[WSDM ‘11]

How to learn to predict new friends?
 Facebook’s People You May Know
 Let’s look at the data:
 92% of new friendships on
FB are friend-of-a-friend
 More common friends helps


Recommend a list of possible friends
Supervised machine learning setting:
 Training example:
 For every node s, have a list of nodes {v1 … vk} she will
create links to
 Use the FB network from May 2011; {v1 … vk}
are the new friendships s created since then (“positive” nodes);
all other nodes are “negative” nodes
 Task:
 For a given node s, rank nodes {v1 … vk}
higher than other nodes in the network

Supervised Random Walks
[Figure: green nodes are the nodes to which s creates links in the future]

How to combine node/edge attributes and
the network structure?
 Learn a strength of each edge based on:
 Profile of user u, profile of user v
 Interaction history of u and v
 Do a PageRank-like random walk
from s to measure the “proximity”
between s and other nodes
 Rank nodes by their “proximity”
(i.e., visiting prob.)
[WSDM ’11]



Let s be the center node
Let fβ(u,v) be a function that assigns
a strength to each edge:
$a_{uv} = f_\beta(u,v) = \exp\left(-\sum_i \beta_i \cdot \Psi_{uv}[i]\right)$
 Ψuv is a feature vector
 Features of node u
 Features of node v
 Features of edge (u,v)
 (β is the parameter vector we want to learn!)

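A direct transcription of the edge-strength formula, assuming Ψuv is given as a NumPy array; `edge_strength` is a hypothetical name. Because of the exponential, every strength stays positive, and a positive βi means that a larger value of feature i weakens the edge.

```python
import numpy as np

def edge_strength(beta, psi_uv):
    """a_uv = f_beta(u, v) = exp(-sum_i beta_i * psi_uv[i]).

    psi_uv: feature vector of edge (u, v), e.g. features of u, of v and of
    the edge itself concatenated (the concrete features are up to the user).
    """
    return float(np.exp(-np.dot(beta, psi_uv)))
```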
Do Random Walk with Restarts from s where
transitions are according to edge strengths auv

[Figure: the pipeline]
 Network → set edge strengths auv = fβ(u,v)
 Personalized PageRank on the weighted graph: each node u gets a PageRank proximity pu
 Sort nodes by decreasing PageRank proximity pu
 Recommend the top k nodes with the highest proximity pu to node s
How to estimate edge strengths?
 How to set the parameters β of fβ(u,v)?
[WSDM ’11]


auv … strength of edge (u,v)
Random walk transition matrix:
$Q'_{uv} = \frac{a_{uv}}{\sum_w a_{uw}}$ if $(u,v) \in E$, else $0$
PageRank transition matrix:
$Q_{uv} = (1-\alpha)\, Q'_{uv} + \alpha \cdot \mathbb{1}(v = s)$
 with prob. α jump back to s
Compute the PageRank vector: $p^T = p^T Q$
Rank nodes u by pu
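A sketch of the random walk with restarts via power iteration, assuming a dense NumPy matrix of edge strengths and no dangling nodes. The name `personalized_pagerank`, the default α = 0.15, and the fixed iteration count are illustrative choices, not values from the slides.

```python
import numpy as np

def personalized_pagerank(a, s, alpha=0.15, iters=100):
    """Random walk with restarts from s, with transitions proportional to a_uv.

    a: (n, n) array of edge strengths (0 where there is no edge); every row
       is assumed to have at least one positive entry (no dangling nodes).
    """
    n = a.shape[0]
    q_walk = a / a.sum(axis=1, keepdims=True)   # Q'_uv = a_uv / sum_w a_uw
    restart = np.zeros(n)
    restart[s] = 1.0
    q = (1 - alpha) * q_walk + alpha * restart   # adds alpha to column s of every row
    p = np.full(n, 1.0 / n)
    for _ in range(iters):
        p = p @ q                                # iterate p^T <- p^T Q
    return p
```

For example, on a path 0-1-2 with unit strengths, s = 0 and α = 0.5, the walk converges to p = (7/12, 1/3, 1/12), so nodes closer to s rank higher.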
[WSDM ’11]




Each node u has a score pu
Positive nodes D ={d1,…, dk}
Negative nodes L = {the rest}
What do we want?
Want to find β such that pl < pd for all d ∈ D, l ∈ L
 An exact solution to the above problem may not exist
 So we make the constraints “soft” (i.e., violations are allowed but penalized)
[WSDM ’11]

Want to minimize:
$F(\beta) = \|\beta\|^2 + \lambda \sum_{d \in D,\, l \in L} h(p_l - p_d)$
 Loss: $h(x) = 0$ if $x < 0$, else $x^2$
[Plot: loss h(pl − pd) for pl − pd from −1 to 1; zero where pl < pd, positive where pl > pd]
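Once p is known, the objective is a direct computation. This sketch assumes F(β) = ||β||² + λ Σ h(pl − pd) with a hypothetical default λ = 1, and node IDs that index into the NumPy vector p; `srw_loss` is an illustrative name.

```python
import numpy as np

def srw_loss(beta, p, positives, negatives, lam=1.0):
    """F(beta) = ||beta||^2 + lam * sum over (d, l) of h(p_l - p_d).

    p: proximity vector from the random walk run with the current beta
    positives / negatives: indices of D and L into p
    lam: hypothetical regularization weight
    """
    diff = p[negatives][None, :] - p[positives][:, None]   # diff[d, l] = p_l - p_d
    h = np.where(diff < 0, 0.0, diff ** 2)                  # h(x) = 0 if x < 0, else x^2
    return float(beta @ beta + lam * h.sum())
```

Pairs that are already ordered correctly (pl < pd) contribute nothing, so only violated constraints push β.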
[WSDM ’11]


How to minimize F?
Both pl and pd depend on β
 Given β, assign edge weights auv = fβ(u,v)
 Using the transition matrix Q built from [auv],
compute PageRank scores pu
 Rank nodes by their PageRank score

Want to find β such that pl < pd
[WSDM ’11]

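How to minimize F?
 Take the derivative of $F(\beta) = \|\beta\|^2 + \lambda \sum_{d,l} h(p_l - p_d)$, with $\delta_{ld} = p_l - p_d$:
$\frac{\partial F(\beta)}{\partial \beta} = 2\beta + \lambda \sum_{d,l} \frac{\partial h(\delta_{ld})}{\partial \delta_{ld}} \left( \frac{\partial p_l}{\partial \beta} - \frac{\partial p_d}{\partial \beta} \right)$
Easy: $\partial h(\delta_{ld}) / \partial \delta_{ld}$ is $0$ if $\delta_{ld} < 0$, else $2\delta_{ld}$

We know: $p_u = \sum_j p_j Q_{ju}$
So: $\frac{\partial p_u}{\partial \beta} = \sum_j \left( Q_{ju} \frac{\partial p_j}{\partial \beta} + p_j \frac{\partial Q_{ju}}{\partial \beta} \right)$
i.e., the unknown $\partial p / \partial \beta$ appears on both sides, weighted by the same Q
 Looks like the PageRank equation!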

To optimize F, use a gradient-based method:
 Pick a random starting point β0
 Compute the personalized PageRank vector p
 Compute the gradient with respect
to the weight vector β
 Update β
 Optimize using
quasi-Newton
method
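A compact, hedged sketch of the whole training loop in NumPy. Assumptions: dense (n, n, m) edge features, every node has at least one out-edge, and restart probability α = 0.3. Two deliberate substitutions: p and ∂p/∂β are obtained by solving the linear systems pᵀM = α e_sᵀ and (∂p)ᵀM = (1 − α) pᵀ∂Q′ with M = I − (1 − α)Q′, instead of power iteration (same fixed points on a small graph), and plain gradient descent with step halving stands in for the quasi-Newton method. All function names are hypothetical.

```python
import numpy as np

def ppr_and_grad(beta, psi, mask, s, alpha):
    """Personalized PageRank p from s and its Jacobian dp/dbeta (shape n x m).

    psi: (n, n, m) edge features; mask: (n, n) bool adjacency. Every node is
    assumed to have at least one out-edge.
    """
    n, m = mask.shape[0], beta.size
    a = np.where(mask, np.exp(-psi @ beta), 0.0)     # a_uv = exp(-beta . psi_uv)
    rows = a.sum(axis=1, keepdims=True)
    q_walk = a / rows                                # Q'_uv = a_uv / sum_w a_uw
    # p^T = p^T Q with Q = (1 - alpha) Q' + alpha * 1 e_s^T and sum(p) = 1
    # is equivalent to p^T M = alpha e_s^T with M = I - (1 - alpha) Q'
    m_sys = np.eye(n) - (1 - alpha) * q_walk
    e_s = np.zeros(n)
    e_s[s] = 1.0
    p = np.linalg.solve(m_sys.T, alpha * e_s)
    grad_p = np.zeros((n, m))
    for k in range(m):
        da = -a * psi[:, :, k]                       # d a_uv / d beta_k
        dq = (da * rows - a * da.sum(axis=1, keepdims=True)) / rows ** 2   # quotient rule
        # differentiating p^T M = alpha e_s^T gives dp^T M = (1 - alpha) p^T dQ'
        grad_p[:, k] = np.linalg.solve(m_sys.T, (1 - alpha) * (p @ dq))
    return p, grad_p


def loss_and_grad(beta, psi, mask, s, pos, neg, alpha=0.3, lam=1.0):
    """F(beta) = ||beta||^2 + lam * sum_{d,l} h(p_l - p_d) and dF/dbeta."""
    p, grad_p = ppr_and_grad(beta, psi, mask, s, alpha)
    delta = p[neg][None, :] - p[pos][:, None]        # delta[d, l] = p_l - p_d
    active = delta > 0                               # h(x) = x^2 if x > 0 else 0
    loss = beta @ beta + lam * np.sum(np.where(active, delta ** 2, 0.0))
    dh = np.where(active, 2 * delta, 0.0)            # h'(delta)
    # sum_{d,l} h'(delta) (dp_l - dp_d), split into the l-part and the d-part
    grad = 2 * beta + lam * (dh.sum(axis=0) @ grad_p[neg] - dh.sum(axis=1) @ grad_p[pos])
    return float(loss), grad


def train(psi, mask, s, pos, neg, steps=50, **kw):
    """Gradient descent with step halving (a stand-in for quasi-Newton)."""
    beta = np.zeros(psi.shape[2])
    loss, grad = loss_and_grad(beta, psi, mask, s, pos, neg, **kw)
    for _ in range(steps):
        lr = 1.0
        while lr > 1e-8:
            cand = beta - lr * grad
            c_loss, c_grad = loss_and_grad(cand, psi, mask, s, pos, neg, **kw)
            if c_loss < loss:
                beta, loss, grad = cand, c_loss, c_grad
                break
            lr /= 2
        else:
            break                                    # no step size improved F: stop
    return beta, loss
```

A finite-difference check of `loss_and_grad` is a cheap way to catch sign errors in the derivative before training.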
[WSDM ’11]

Facebook Iceland network
 174,000 nodes (55% of population)
 Avg. degree 168
 Avg. person added 26 friends/month

s
For every node s:
 Positive examples:
 D={ new friendships of s created in Nov ‘09 }
 Negative examples:
 L={ other nodes s did not create new links to }
 Limit to friends of friends:
 on avg. there are 20k FoFs (max 2M)!
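Restricting the candidates of s to friends of friends is a simple set computation. A sketch, assuming a hypothetical adjacency dict for the network snapshot and a list of the new friends s made in the test period; new friends outside the FoF set are dropped, matching the restriction above.

```python
def split_fof_candidates(adj, s, new_friends):
    """Return (positives D, negatives L) among the friends of friends of s.

    adj: dict node -> set of friends (snapshot before the test period)
    new_friends: nodes s befriended during the test period (hypothetical input)
    """
    fof = set()
    for f in adj[s]:
        fof |= adj[f]              # friends of each friend
    fof -= adj[s] | {s}            # drop s itself and its existing friends
    positives = fof & set(new_friends)
    negatives = fof - positives
    return positives, negatives
```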

Node and Edge features for learning:
 Node: Age, Gender, Degree
 Edge: Age of an edge, Communication, Profile
visits, Co-tagged photos

Baselines:
 Decision trees and logistic regression:
 Above features + 10 network features (PageRank,
common friends)

Evaluation:
 AUC and Precision at Top20
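Both metrics can be computed from the proximity scores of one node s. Here AUC is taken as the fraction of (positive, negative) pairs ranked correctly, which directly measures how often pl < pd holds; the function name and the tie handling (ties count as 1/2) are assumptions of this sketch.

```python
def auc_and_precision_at_k(scores, positives, negatives, k=20):
    """AUC and Precision@k for one node s.

    scores: dict node -> proximity p_u; positives: D; negatives: L (both non-empty).
    AUC = share of (d, l) pairs with p_d > p_l, ties counted as 1/2.
    Precision@k = share of the k highest-scored candidates that are in D.
    """
    pos, neg = set(positives), set(negatives)
    wins = 0.0
    for d in pos:
        for l in neg:
            wins += 1.0 if scores[d] > scores[l] else 0.5 if scores[d] == scores[l] else 0.0
    auc = wins / (len(pos) * len(neg))
    ranked = sorted(pos | neg, key=lambda u: (-scores[u], u))   # deterministic ties
    precision = sum(u in pos for u in ranked[:k]) / k
    return auc, precision
```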

Facebook: predict future friends
 Adamic-Adar already works great
 Logistic regression also strong
 SRW gives slight improvement

Many networks are implicit or hard to observe:
 Hidden/hard-to-reach populations:
 Network of needle sharing between drug injection users
 Implicit connections:
 Network of information propagation in online news media

But we can observe results of the processes
taking place on such (invisible) networks:
 Virus propagation:
 Drug users get sick, and we observe when they see the doctor
 Information networks:
 We observe when media sites mention information

Question: Can we infer the hidden networks?
[KDD ’10, NIPS ‘10]

There is a hidden diffusion network:
[Figure: hidden network on nodes a, b, c, d, e]
We only see times when nodes get “infected”:
 Cascade c1: (a,1), (c,2), (b,3), (e,4)
 Cascade c2: (c,1), (a,4), (b,5), (d,6)

Want to infer who-infects-whom network!
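Whatever method infers the network, only an earlier-infected node can have infected a later one. This sketch lists the directed pairs that are ordered in time in at least one of the two cascades above; the dict layout and function name are illustrative.

```python
# The two example cascades as (node, infection time) lists
cascades = {
    "c1": [("a", 1), ("c", 2), ("b", 3), ("e", 4)],
    "c2": [("c", 1), ("a", 4), ("b", 5), ("d", 6)],
}

def candidate_edges(cascades):
    """Directed pairs (u, v) with u infected strictly before v in some cascade."""
    cands = set()
    for infections in cascades.values():
        for u, t_u in infections:
            for v, t_v in infections:
                if t_u < t_v:
                    cands.add((u, v))
    return cands
```

Note that both a→c and c→a survive: the two cascades disagree on the order, which is exactly the ambiguity the inference has to resolve.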
Virus propagation:
 Process: viruses propagate through the network
 We observe: only when people get sick
 It’s hidden: but NOT who infected whom
Word of mouth & viral marketing:
 Process: recommendations and influence propagate
 We observe: only when people buy products
 It’s hidden: but NOT who influenced whom
Can we infer the underlying network?

Goal: Find a graph G that best explains the
observed infection times
 Given a graph G, define the likelihood P(C|G):
 Define a model of information diffusion over a graph
 Pc(u,v) … prob. that u infects v in cascade c
 P(c|T) … prob. that c spread in particular pattern T
 P(c|G) … prob. that cascade c occurred in G
 P(C|G) … prob. that a set of cascades C occurred in G

Questions:
 How to efficiently compute P(C|G)? (given a single G)
 How to efficiently find G* that maximizes P(C|G)?
(over $O(2^{N \cdot N})$ graphs)
 Continuous time cascade diffusion model:
 Cascade c reaches node u at tu
and spreads to u’s neighbors:
 With probability β the cascade propagates along edge (u, v),
and we determine the infection time of node v:
tv = tu + Δ
e.g.: Δ ~ Exponential or Power-law
We assume each node v has only one parent!
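A small simulator for this model, assuming exponential delays (one of the two options named) and a hypothetical adjacency dict. A priority queue makes each node get infected by the earliest transmission that reaches it, so each node ends up with exactly one parent.

```python
import heapq
import random

def simulate_cascade(adj, source, beta, rate=1.0, seed=None):
    """Simulate one cascade of the continuous-time model (a sketch).

    adj: dict node -> iterable of out-neighbors
    beta: probability that the cascade propagates along an edge
    rate: parameter of the assumed exponential delay Delta ~ Exp(rate)
    Returns {node: infection time}.
    """
    rng = random.Random(seed)
    times = {}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u in times:
            continue                       # already infected earlier via another parent
        times[u] = t
        for v in adj.get(u, ()):
            if v not in times and rng.random() < beta:   # edge (u, v) transmits w.p. beta
                heapq.heappush(heap, (t + rng.expovariate(rate), v))
    return times
```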

The model for 1 cascade:
 Cascade reaches node u at time tu,
and spreads to u’s neighbors v:
With prob. β the cascade propagates
along edge (u,v) and tv = tu+Δ
Transmission probability:
$P_c(u,v) \propto P(t_v - t_u)$ if $t_v > t_u$, else $\varepsilon$
e.g.: $P_c(u,v) \propto e^{-\Delta t}$
 ε captures influence external to the network
 At any time a node can get infected from outside
with small probability ε
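A sketch of the transmission probability under the exponential example, up to the normalization implied by ∝. The `rate` and `eps` defaults are illustrative assumptions, not values from the slides.

```python
import math

def transmission_prob(t_u, t_v, rate=1.0, eps=1e-9):
    """P_c(u, v), unnormalized, for an exponential delay model.

    rate: hypothetical decay parameter of e^{-rate * (t_v - t_u)}
    eps: hypothetical small probability of infection from outside the network
    """
    if t_v > t_u:
        return math.exp(-rate * (t_v - t_u))
    return eps    # u cannot have infected v; only the external term remains
```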

Given node infection times and pattern T:
 c = { (a,1), (c,2), (b,3), (e,4) }
 T = { a→b, a→c, b→e }
[Figure: graph G on nodes a, b, c, d, e with the edges of pattern T highlighted]
Prob. that c propagates in pattern T: a product over
edges that “propagated” and edges that failed to “propagate”
Approximate it as:
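The approximation's formula did not survive extraction. As a hedged sketch only, the code checks that T is a valid pattern for c (each edge goes forward in time, and every infected node except the first has exactly one parent in T) and scores T by the sum of log Pc(u,v) over its edges under the exponential example. That is, only the propagated-edge terms are kept; failed-edge terms are left out. The function name and `rate` are hypothetical.

```python
def pattern_log_score(cascade, tree, rate=1.0):
    """Sum of log P_c(u, v) over the edges of pattern T, or None if T is invalid.

    cascade: list of (node, infection time); tree: set of directed edges (u, v)
    Assumes P_c(u, v) = exp(-rate * (t_v - t_u)), so log P_c = -rate * (t_v - t_u).
    """
    t = dict(cascade)
    root = min(t, key=t.get)                   # the first infected node has no parent
    parents = {}
    for u, v in tree:
        if u not in t or v not in t or t[v] <= t[u] or v in parents:
            return None                        # backward edge or a second parent
        parents[v] = u
    if set(parents) != set(t) - {root}:
        return None                            # some infected node is unexplained
    return sum(-rate * (t[v] - t[u]) for u, v in tree)
```

For the example above the score is −4, since the edges a→b, a→c and b→e have delays 2, 1 and 1.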

How likely is cascade c to spread in graph G?
 c = {(a,1), (c,2), (b,3), (e,4)}
Need to consider all possible ways for c to
spread over G (i.e., all spanning trees T):
$P(c|G) = \sum_{T} P(c|T)\, P(T|G)$
Consider only the most likely propagation tree:
$P(c|G) \approx \max_{T} P(c|T)$

Score of a graph G for a set of cascades C:
$F_C(G) = \sum_{c \in C} \log P(c|G)$
Want to find the “best” graph:
$G^* = \arg\max_{|G| \le k} F_C(G)$
The problem is NP-hard (reduction from MAX-k-COVER) [KDD ’10]

Given a cascade c, what is the most likely
propagation tree?
 A maximum directed spanning tree
 Edge (i,j) in G has weight $w_c(i,j) = \log P_c(i,j)$
 The maximum weight spanning tree on
infected nodes: each node i picks an in-edge of
max weight:
$\max_T \sum_i w_c(\mathrm{par}_T(i), i) = \sum_i \max_{j:\,(j,i) \in G,\ t_j < t_i} w_c(j, i)$, where $\mathrm{par}_T(i)$ is the parent of node i in tree T
Local greedy selection gives the optimal tree! Every candidate edge points from an earlier- to a later-infected node, so independent parent choices can never form a cycle.
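The greedy parent choice in code, assuming the exponential example so that wc(j,i) = −(ti − tj) for tj < ti. The function name, the edge-set representation of G, and the `rate` parameter are illustrative.

```python
def most_likely_tree(cascade, graph_edges, rate=1.0):
    """Each infected node picks its max-weight in-edge from an earlier-infected node.

    cascade: list of (node, time); graph_edges: set of directed edges (j, i) of G.
    Assumed weight: w_c(j, i) = log P_c(j, i) = -rate * (t_i - t_j) for t_j < t_i.
    Returns (tree edges, total weight); nodes with no such in-edge get no parent.
    """
    t = dict(cascade)
    tree, total = set(), 0.0
    for i in t:
        best = None
        for j in t:
            if (j, i) in graph_edges and t[j] < t[i]:
                w = -rate * (t[i] - t[j])
                if best is None or w > best[0]:
                    best = (w, j)
        if best is not None:
            tree.add((best[1], i))
            total += best[0]
    return tree, total
```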


Theorem:
$F_c(G)$ is monotonic and submodular
Proof:
 Single cascade c, some edge $e = (r,s)$ of weight $w_{rs}$
 For $G \subseteq G'$, show $F_c(G \cup \{e\}) - F_c(G) \ge F_c(G' \cup \{e\}) - F_c(G')$
 Let $w_{\cdot s}$ be the max weight in-edge of s in G
 Let $w'_{\cdot s}$ be the max weight in-edge of s in G'
 Since $G \subseteq G'$: $w_{\cdot s} \le w'_{\cdot s}$ and $w_{rs} = w'_{rs}$
 Since s picks the in-edge of max weight, only the term for s changes:
$F_c(G \cup \{(r,s)\}) - F_c(G) = \max(w_{\cdot s}, w_{rs}) - w_{\cdot s} \ge \max(w'_{\cdot s}, w_{rs}) - w'_{\cdot s} = F_c(G' \cup \{(r,s)\}) - F_c(G')$
 The inequality holds because $x \mapsto \max(x, w_{rs}) - x$ is non-increasing in x (submodularity); each difference is also $\ge 0$ (monotonicity)
[KDD ’10]

The algorithm:
Use greedy hill-climbing to maximize $F_C(G)$:
 Start with empty G0 (G with no edges)
 Add k edges (k is a parameter)
 At every step i add an edge to Gi that
maximizes the marginal improvement
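A sketch of the greedy algorithm under assumptions that go beyond the slides: exponential delays, an external-infection fallback of weight log ε for every infected node, and a score measured relative to the empty graph (so the score of G0 is 0). Each node's term is its best in-edge weight, so adding edge (u,v) gains max(0, wc(u,v) − current best) per cascade, which is the marginal improvement max(w·s, wrs) − w·s from the submodularity proof. The function name and defaults are hypothetical.

```python
import math

def netinf_greedy(cascades, k, rate=1.0, eps=1e-3):
    """Greedily add k edges, each maximizing the marginal improvement of F_C.

    cascades: list of cascades, each a list of (node, infection time).
    Assumed weights: w_c(u, v) = -rate * (t_v - t_u) for t_u < t_v.
    """
    times = [dict(c) for c in cascades]
    best_in = [{v: math.log(eps) for v in t} for t in times]   # current best in-weight per node
    cands = {(u, v) for t in times for u in t for v in t if t[u] < t[v]}

    def gain(edge):
        u, v = edge
        return sum(max(0.0, -rate * (t[v] - t[u]) - b[v])
                   for t, b in zip(times, best_in) if u in t and v in t and t[u] < t[v])

    chosen = []
    for _ in range(min(k, len(cands))):
        edge = max(sorted(cands), key=gain)          # sorted: deterministic tie-breaking
        chosen.append(edge)
        cands.discard(edge)
        u, v = edge
        for t, b in zip(times, best_in):             # update each cascade's best in-edge of v
            if u in t and v in t and t[u] < t[v]:
                b[v] = max(b[v], -rate * (t[v] - t[u]))
    return chosen
```

On the two example cascades, a→b is picked first because it explains b well in both of them.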

Synthetic data:
 Take a graph G on k edges
 Simulate info. diffusion
 Record node infection times
 Reconstruct G
Evaluation:
 How many edges of G can NetInf find?
 Break-even point: 0.95
 Performance is independent of the structure of G!
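The break-even point is where precision equals recall. For a ranked list of inferred edges, precision after m edges is hits/m and recall is hits/|E|, so (whenever some edge is correct) the two coincide at m = |E|. A sketch, assuming the inferred list has at least |E| edges; the function name is hypothetical.

```python
def break_even_point(inferred, true_edges):
    """Precision (= recall) after taking the first |E| inferred edges.

    inferred: edges in the order they were added (e.g., greedy order)
    true_edges: the edges E of the ground-truth graph G
    """
    truth = set(true_edges)
    top = inferred[:len(truth)]
    return sum(e in truth for e in top) / len(truth)
```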
 We achieve ≈ 90 % of the best possible
network!
 With 2x as many infections as edges,
the break-even point is already 0.8 - 0.9!
[KDD, ‘09]

Memetracker dataset:
 172m news articles
 Aug ‘08 – Sept ‘09
 343m textual phrases
 Times tc(w) when site w mentions phrase c
http://memetracker.org


Given times when sites mention phrases
Infer the network of information diffusion:
 Who tends to copy (repeat after) whom

5,000 news sites:
[Figure legend: blogs, mainstream media]