Transcript Slide 1
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Task: (HW3 is optional)
Find node correspondences
between two graphs
Incentives:
European chocolates!
Fame!
Up to 10% extra credit
Due:
Monday Nov 14
No late days!
7/18/2015
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Random network (Erdos-Renyi random graph): degree distribution is Binomial
Scale-free (power-law) network: degree distribution is power-law
Part 1-3
[Mitzenmacher, ‘03]
We will analyze the following model:
Nodes arrive in order 1, 2, 3, …, n
When node i is created, it makes a single link to an earlier node, chosen as follows:
1) With prob. p, i links to a node j chosen uniformly at random (from among all earlier nodes)
2) With prob. 1-p, node i chooses a node j uniformly at random and links to the node that j points to
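The two rules above are easy to simulate, which is a useful check on the analysis that follows. Below is a minimal Python sketch. It rests on assumptions that are not on the slides: nodes are 0-indexed, so node 0 plays the role of node 1; node 0 has no out-link, so if rule 2) picks j = 0 the new node links to j itself. The function names are hypothetical.

```python
import random


def copying_model(n, p, seed=None):
    """Simulate the model: out[i] is the earlier node that node i links to.

    Node 0 (the slides' node 1) has no out-link, so out[0] is None.
    """
    rng = random.Random(seed)
    out = [None] * n
    for i in range(1, n):
        j = rng.randrange(i)  # node j chosen uniformly among earlier nodes
        if rng.random() < p or out[j] is None:
            out[i] = j  # rule 1): link to j itself (also the fallback when j is node 0)
        else:
            out[i] = out[j]  # rule 2): link to the node that j points to
    return out


def in_degrees(out):
    """In-degree of every node, given each node's single out-link."""
    deg = [0] * len(out)
    for target in out:
        if target is not None:
            deg[target] += 1
    return deg


if __name__ == "__main__":
    deg = in_degrees(copying_model(100_000, p=0.2, seed=42))
    # Fraction of nodes with in-degree at least k
    for k in (1, 10, 100):
        print(k, sum(d >= k for d in deg) / len(deg))
```

Rule 2) never needs the current degrees. Copying a random node's link already picks targets in proportion to their in-degree, and that is the d_i(t)/t term used below.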
Claim: The described model generates networks where the fraction of nodes with degree k scales as:
$P(d_i = k) \propto k^{-\left(1 + \frac{1}{q}\right)}$
where $q = 1 - p$
Consider a deterministic and continuous approximation to the in-degree of node i as a function of time t
t is the number of nodes that have arrived so far
The in-degree $d_i(t)$ of node i (i = 1, 2, …, n) is a continuous quantity, and it grows deterministically with time t
Initial condition:
$d_i(t) = 0$ when $t = i$ (node i just arrived)
Expected change of $d_i(t)$ over time:
Node i gains an in-link at step t+1 only if the link from the newly created node t+1 points to it.
What's the probability of this event?
With prob. p node t+1 links randomly:
Links to our node i with prob. $1/t$
With prob. 1-p node t+1 links preferentially:
Links to our node i with prob. $d_i(t)/t$
So the probability that node t+1 links to i is:
$p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t}$
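Expected change of $d_i(t)$:
$d_i(t+1) - d_i(t) = p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t}$
Continuous approximation, with $q = 1 - p$:
$\frac{\mathrm{d}d_i(t)}{\mathrm{d}t} = p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t} = \frac{p + q\,d_i(t)}{t}$
Divide by $p + q\,d_i(t)$:
$\frac{1}{p + q\,d_i(t)}\,\mathrm{d}d_i(t) = \frac{1}{t}\,\mathrm{d}t$
Integrate:
$\int \frac{1}{p + q\,d_i(t)}\,\mathrm{d}d_i(t) = \int \frac{1}{t}\,\mathrm{d}t \;\Rightarrow\; \frac{1}{q}\ln\left(p + q\,d_i(t)\right) = \ln t + c$
Multiply by q, let $A = e^{qc}$, and exponentiate:
$p + q\,d_i(t) = A\,t^{q} \;\Rightarrow\; d_i(t) = \frac{1}{q}\left(A\,t^{q} - p\right)$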
What is the constant A?
$d_i(t) = \frac{1}{q}\left(A\,t^{q} - p\right)$
We know: $d_i(i) = 0$
So: $d_i(i) = \frac{1}{q}\left(A\,i^{q} - p\right) = 0 \;\Rightarrow\; A = \frac{p}{i^{q}}$
Therefore:
$d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right]$
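To sanity-check the continuous approximation, one can iterate the discrete expected-change equation and compare the result with the closed form. Below is a minimal sketch. The function names and the values of i, t, and p in the usage are arbitrary choices for illustration.

```python
def degree_by_recurrence(i, t_end, p):
    """Iterate d_i(t+1) = d_i(t) + (p + q*d_i(t)) / t from d_i(i) = 0 up to t_end."""
    q = 1.0 - p
    d = 0.0
    for t in range(i, t_end):
        d += (p + q * d) / t  # equals p/t + (1-p)*d/t
    return d


def degree_closed_form(i, t, p):
    """d_i(t) = (p/q) * ((t/i)**q - 1), the solution of the continuous approximation."""
    q = 1.0 - p
    return (p / q) * ((t / i) ** q - 1.0)


if __name__ == "__main__":
    i, t, p = 1000, 100_000, 0.3
    print(degree_by_recurrence(i, t, p), degree_closed_form(i, t, p))
```

For nodes that are not too early (large i), the two values agree to well under one percent.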
What is F(k), the fraction of nodes that have degree at least k at time t?
How many nodes i have degree > k?
$d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right] > k$
then: $i < t\left(\frac{q}{p}k + 1\right)^{-\frac{1}{q}}$
Motivate this better!
Why is F(d) called a CDF if it is really a CCDF?
There are t nodes total at time t, so:
$F(k) = \left(\frac{q}{p}k + 1\right)^{-\frac{1}{q}}$
What is the fraction of nodes with degree exactly k?
Take the derivative of F(k):
F(k) is a CCDF (the fraction of nodes with degree at least k), so −F'(k) is the PDF:
$-F'(k) = \frac{1}{p}\left(\frac{q}{p}k + 1\right)^{-1-\frac{1}{q}}$
which scales as $k^{-\left(1 + \frac{1}{q}\right)}$, as claimed
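The derivative step can be checked numerically. Below is a short sketch that evaluates F(k) and −F'(k) from the formulas above. It confirms the analytic derivative with a finite difference and shows that the log-log slope approaches −(1 + 1/q) for large k. The function names and the value p = 0.2 are illustrative choices.

```python
import math


def degree_ccdf(k, p):
    """F(k) = (q/p * k + 1) ** (-1/q): fraction of nodes with degree at least k."""
    q = 1.0 - p
    return (q / p * k + 1.0) ** (-1.0 / q)


def degree_pdf(k, p):
    """-F'(k) = (1/p) * (q/p * k + 1) ** (-1 - 1/q)."""
    q = 1.0 - p
    return (1.0 / p) * (q / p * k + 1.0) ** (-1.0 - 1.0 / q)


if __name__ == "__main__":
    p = 0.2
    # Central finite difference of -F should match the analytic derivative
    k, h = 10.0, 1e-6
    numeric = -(degree_ccdf(k + h, p) - degree_ccdf(k - h, p)) / (2 * h)
    print(numeric, degree_pdf(k, p))
    # Log-log slope at large k approaches -(1 + 1/q), i.e. -2.25 for p = 0.2
    slope = (math.log(degree_pdf(1e7, p)) - math.log(degree_pdf(1e6, p))) / math.log(10)
    print(slope)
```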
Show simulations
from Barabasi
rather than showing
calculations!
Two changes from $G_{np}$:
Growth + Preferential attachment
Do we need both? Yes!
Add growth to $G_{np}$
(assume 1 edge is added at each step: the arriving node u links to a uniformly random earlier node)
$X_j$ = degree of node j at the end
$X_j(u) = 1$ if node u links to j, else 0
$X_j = X_j(j+1) + X_j(j+2) + \dots + X_j(n)$
$E[X_j(u)] = P[u \text{ links to } j] = \frac{1}{u-1}$
$E[X_j] = \sum_{u=j+1}^{n} \frac{1}{u-1} = H_{n-1} - H_{j-1}$
$H_n$ … nth harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \log(n)$
$E[X_j] \approx \log(n-1) - \log j = \log\frac{n-1}{j} \neq \left(\frac{n}{j}\right)^{\alpha}$
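The harmonic-number result can be checked against a direct simulation of growth without preferential attachment. Below is a minimal sketch in which each arriving node u links to a uniformly random earlier node, as assumed above. The function names and the parameters n = 200, j = 10 are illustrative.

```python
import random


def harmonic(n):
    """H_n = sum_{k=1}^{n} 1/k, with H_0 = 0."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_final_degree(j, n):
    """E[X_j] = sum_{u=j+1}^{n} 1/(u-1) = H_{n-1} - H_{j-1} under uniform growth."""
    return harmonic(n - 1) - harmonic(j - 1)


def simulate_uniform_growth(n, rng):
    """Nodes 1..n arrive in order; node u links to a uniformly random earlier node."""
    deg = [0] * (n + 1)  # index 0 unused so nodes are 1-indexed as on the slide
    for u in range(2, n + 1):
        deg[rng.randint(1, u - 1)] += 1
    return deg


if __name__ == "__main__":
    rng = random.Random(0)
    n, j, trials = 200, 10, 2000
    avg = sum(simulate_uniform_growth(n, rng)[j] for _ in range(trials)) / trials
    print(avg, expected_final_degree(j, n))
```

The expected degree falls off only logarithmically in j, so growth alone does not produce a power law.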
Preferential attachment gives power-law
degrees
Intuitively reasonable process
Can tune p to get the observed exponent
On the web, P[node has degree d] ~ $d^{-2.1}$
$2.1 = 1 + \frac{1}{1-p} \;\Rightarrow\; p \approx 0.1$
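The tuning step is just the exponent formula solved for p. Below is a tiny sketch; the function names are hypothetical.

```python
def p_from_degree_exponent(gamma):
    """Invert gamma = 1 + 1/(1 - p), giving p = 1 - 1/(gamma - 1)."""
    return 1.0 - 1.0 / (gamma - 1.0)


def degree_exponent_from_p(p):
    """gamma = 1 + 1/q with q = 1 - p."""
    return 1.0 + 1.0 / (1.0 - p)


if __name__ == "__main__":
    # About 0.09 (exactly 1/11 in real arithmetic), i.e. p ~ 0.1 for the web
    print(p_from_degree_exponent(2.1))
```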
There are also other network formation
mechanisms that generate scale-free networks:
Random surfer model
Forest Fire model
Skip!
Copying mechanism (directed network)
select a node and an edge of this node
attach to the endpoint of this edge
Walking on a network (directed network)
the new node connects to a node, then to every
first, second, … neighbor of this node
Attaching to edges
select an edge
attach to both endpoints of this edge
Node duplication
duplicate a node with all its edges
randomly prune edges of new node
Preferential attachment is not so good at
predicting network structure
Age-degree correlation
Links among high degree nodes
On the web, nodes sometimes avoid linking to each other
Further questions:
What is a reasonable probabilistic model for how
people sample through web-pages and link to
them?
Short random walks
Effect of search engines – reaching pages based on
number of links to them
How does the
connectivity of the
network change as the
vertices get removed?
[Albert et al. 00; Palmer et al. 01]
Vertices can be
removed:
Uniformly at random
In order of decreasing degree
It is important for epidemiology
Removal of vertices corresponds to vaccination
Get Colorful plots
from Barabasi
Real-world networks are resilient to random attacks
You need to remove all web-pages of degree > 5
to disconnect the web
But this is a very small fraction of all web pages
Random network has better resilience to targeted attacks
[Figure: mean path length vs. fraction of removed nodes under random and preferential removal, for the Internet (Autonomous systems) and a random network]
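The two removal orders can be compared in a small experiment. Below is a minimal sketch with two substitutions that are not from the slides. First, it measures the size of the largest connected component instead of the mean path length plotted above. Second, it uses an undirected version of the copying model from earlier as the test graph. Degrees for the targeted attack are computed once, on the intact graph. All names are hypothetical.

```python
import random
from collections import deque


def copying_model_graph(n, p, rng):
    """Undirected adjacency sets of the copying model (0-indexed; node 0 has no out-link)."""
    out = [None] * n
    for i in range(1, n):
        j = rng.randrange(i)
        out[i] = j if (rng.random() < p or out[j] is None) else out[j]
    adj = [set() for _ in range(n)]
    for i in range(1, n):
        adj[i].add(out[i])
        adj[out[i]].add(i)
    return adj


def largest_component_fraction(adj, removed):
    """Largest connected component after deleting `removed`, divided by the original n."""
    n = len(adj)
    seen = set(removed)
    best = 0
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:  # breadth-first search over surviving nodes
            x = queue.popleft()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        best = max(best, size)
    return best / n


def attack(adj, fraction, targeted, rng):
    """Remove a fraction of nodes, highest-degree first or uniformly at random."""
    n = len(adj)
    k = int(fraction * n)
    if targeted:
        # Static ordering: degrees measured once on the intact graph
        order = sorted(range(n), key=lambda v: len(adj[v]), reverse=True)
    else:
        order = rng.sample(range(n), n)
    return largest_component_fraction(adj, order[:k])


if __name__ == "__main__":
    rng = random.Random(0)
    adj = copying_model_graph(20_000, 0.2, rng)
    for f in (0.01, 0.05, 0.1):
        print(f, attack(adj, f, False, rng), attack(adj, f, True, rng))
```

Removing hubs first typically fragments the graph much faster than removing the same number of random nodes.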
Preferential attachment is a model of a
growing network
What governs network growth and
evolution?
P1) Node arrival process:
When nodes enter the network
P2) Edge initiation process:
Each node decides when to initiate an edge
P3) Edge destination process:
The node determines the destination of the edge
[Leskovec et al., KDD ’08]
4 online social networks with
exact edge arrival sequence
For every edge (u,v) we know the exact time of its appearance, $t_{uv}$
Directly observe mechanisms leading to global network properties
[Figure: timestamped edge lists ("and so on for millions…") for the networks (F), (D), (A), (L)]
(F) Flickr: Exponential
(D) Delicious: Linear
(A) Answers: Sub-linear
(L) LinkedIn: Quadratic
How long do nodes live?
Node lifetime is the time between the 1st and the last edge of a node
[Figure: timeline of node i; its lifetime spans from its 1st edge to its last edge]
How do nodes “wake up” to create links?
[Figure: edge creation events of node i along the time axis, from its 1st edge to its last edge]
LinkedIn
Lifetime a: time between a node’s first and last edge
Node lifetime is exponentially distributed:
$p_l(a) = \lambda e^{-\lambda a}$
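Lifetimes come directly from each node's edge timestamps, and the rate λ of the exponential fit has a closed-form maximum-likelihood estimate: one over the mean lifetime. Below is a minimal sketch. The function names and the synthetic λ = 2 sample are illustrative, not LinkedIn data.

```python
import random


def node_lifetime(edge_times):
    """Lifetime a of a node: time between its first and last edge."""
    return max(edge_times) - min(edge_times)


def fit_exponential_rate(lifetimes):
    """Maximum-likelihood lambda for p_l(a) = lambda * exp(-lambda * a), i.e. 1 / mean(a)."""
    return len(lifetimes) / sum(lifetimes)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.expovariate(2.0) for _ in range(100_000)]  # synthetic data, lambda = 2
    print(fit_exponential_rate(sample))
```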
Invent better
notation for delta!
How do nodes “wake up” to create edges?
Edge gap $\delta_i(d)$: time between the dth and (d+1)st edge of node i:
Let $t_i(d)$ be the creation time of the d-th edge of node i
$\delta_i(d) = t_i(d+1) - t_i(d)$
[Figure: timeline of node i from its 1st to its last edge, with gaps $\delta_i(1)$, $\delta_i(2)$, $\delta_i(3)$]
$\delta(d)$ is a distribution (histogram) of $\delta_i(d)$ over all nodes i
[Figure: the first gaps $\delta_i(1)$, $\delta_{i+1}(1)$, $\delta_{i+2}(1)$ of nodes i, i+1, i+2 pooled into one histogram]
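The definition translates directly into code. Below is a minimal sketch that collects $\delta_i(d)$ for every node and groups the gaps by d, giving the per-d samples behind each histogram. The input format, a dict from node to its edge creation times, is an assumption made for illustration.

```python
from collections import defaultdict


def edge_gaps(edge_times):
    """edge_times: dict node -> list of creation times of that node's edges.

    Returns dict d -> list of delta_i(d) over all nodes i, where d is 1-based and
    delta_i(d) = t_i(d+1) - t_i(d).
    """
    gaps = defaultdict(list)
    for node, times in edge_times.items():
        t = sorted(times)  # t[d-1] is t_i(d), the time of the d-th edge
        for d in range(1, len(t)):
            gaps[d].append(t[d] - t[d - 1])
    return dict(gaps)


if __name__ == "__main__":
    print(edge_gaps({"a": [0, 5, 7], "b": [2, 3]}))
```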
LinkedIn
Edge gap δ(d): inter-arrival time between the dth and (d+1)st edge
For every d we get a separate histogram
$p_g(\delta(1)) \propto \delta(1)^{-\alpha}\, e^{-\beta\,\delta(1)}$
Invent better
notation for delta!
How do α and β change as a function of d?
Fit to each plot of δ(d):
$p_g(\delta(d)) \propto \delta(d)^{-\alpha(d)}\, e^{-\beta(d)\,\delta(d)}$
α is constant and β is linear in d: gaps get smaller with d
$p_g(\delta(d)) \propto \delta(d)^{-\alpha}\, e^{-\beta d\,\delta(d)}$
[Figure: log(probability) vs. log(edge gap) for d = 1, 2, 3]
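To see why "β linear in d" means "gaps get smaller with d", one can integrate the fitted form numerically and compare mean gaps for d = 1, 2, 3. Below is a sketch with several assumptions. The values α = 1 and β = 0.001 are hypothetical, not fitted values from the slides. Gaps are assumed to lie in [1, 10^5] time units, because the density is not normalizable at 0 when α ≥ 1.

```python
import math


def mean_gap(d, alpha, beta, lo=1.0, hi=1e5, steps=20_000):
    """Mean of the density proportional to x**(-alpha) * exp(-beta * d * x) on [lo, hi].

    Midpoint rule on a log-spaced grid (u = ln x, so dx = x du).
    """
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / steps
    norm = first_moment = 0.0
    for s in range(steps):
        x = math.exp(a + (s + 0.5) * h)
        w = x ** (-alpha) * math.exp(-beta * d * x) * x * h
        norm += w
        first_moment += x * w
    return first_moment / norm


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, mean_gap(d, alpha=1.0, beta=0.001))
```

Raising d strengthens the exponential cutoff, which shifts probability mass toward short gaps. Nodes with higher degree therefore create edges more frequently.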
Source node i wakes up and creates an edge
How does i select a target node j?
What is the degree of the target j?
Does preferential attachment really hold?
How many hops away is the target j?
Are edges attaching locally?
[Leskovec et al., KDD ’08]
Are edges more likely to connect to higher-degree nodes?
$p_e(k) \propto k^{\tau}$
[Figure: $p_e(k)$ vs. k for Gnp, PA, and Flickr]
Network | τ
Gnp | 0
PA | 1
Flickr | 1
Delicious | 1
Answers | 0.9
LinkedIn | 0.6
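One way to estimate τ from a timestamped edge stream is as follows. Record, just before each edge is placed, the degree of its target and how many nodes currently have each degree. Then $p_e(k)$ is the number of edges that landed on degree-k nodes divided by the accumulated number of degree-k nodes. τ is the slope of $\log p_e(k)$ against $\log k$. Below is a minimal sketch of that estimator under these assumptions; the event format and names are hypothetical, and degrees that never received an edge are skipped because log 0 is undefined.

```python
import math
from collections import Counter


def estimate_tau(events):
    """events: list of (k_target, degree_counts) recorded just before each edge is placed,
    where degree_counts maps degree k -> number of nodes with degree k at that moment.

    Returns the least-squares slope of log p_e(k) vs. log k (needs >= 2 distinct k).
    """
    hits = Counter()      # edges that attached to a degree-k node
    exposure = Counter()  # sum over events of the number of degree-k nodes
    for k_target, counts in events:
        hits[k_target] += 1
        for k, c in counts.items():
            exposure[k] += c
    pts = [(math.log(k), math.log(hits[k] / exposure[k]))
           for k in hits if k > 0 and exposure[k] > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


if __name__ == "__main__":
    counts = {1: 4, 2: 2, 4: 1}
    # Toy stream in which p_e(k) is exactly proportional to k, so tau = 1
    print(estimate_tau([(1, counts), (2, counts), (4, counts)]))
```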
[Leskovec et al., KDD ’08]
Just before the edge (u,w) is placed, how many hops are between u and w?
[Figure: fraction of edges vs. number of hops for Gnp, PA, and Flickr]
Fraction of triad-closing edges:
Network | %Δ
Flickr | 66%
Delicious | 28%
Answers | 23%
LinkedIn | 50%
Real edges are local!
Most of them close triangles!
[Figure: edge (u,w) closing a triangle through v]
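The measurement above can be replayed on any edge sequence. Before each edge (u,w) is added, a breadth-first search measures the current hop distance between u and w. An edge at distance 2 closes a triangle. Below is a minimal sketch; the names are hypothetical.

```python
from collections import defaultdict, deque


def hop_distance(adj, src, dst):
    """Shortest-path length from src to dst, or None if they are not yet connected."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        x, dist = frontier.popleft()
        for y in adj[x]:
            if y == dst:
                return dist + 1
            if y not in seen:
                seen.add(y)
                frontier.append((y, dist + 1))
    return None


def triad_closing_fraction(edges):
    """Fraction of edges whose endpoints were exactly 2 hops apart just before the edge appeared."""
    adj = defaultdict(set)
    closing = 0
    for u, w in edges:
        if hop_distance(adj, u, w) == 2:
            closing += 1
        adj[u].add(w)
        adj[w].add(u)
    return closing / len(edges)


if __name__ == "__main__":
    print(triad_closing_fraction([(1, 2), (2, 3), (1, 3), (3, 4)]))
```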
Explain the
strategies better
Remove the
example of
Random-Random.
Explain likelihood
better
Focus only on triad-closing edges
New triad-closing edge (u,w) appears next
Model this as 2 independent choices:
1. u chooses neighbor v
2. v chooses neighbor w
and connect u to w
E.g.: Under Random-Random:
$p(u,w) = \frac{1}{5}\cdot\frac{1}{2} + \frac{1}{5}\cdot 1 = \frac{3}{10}$
[Figure: example graph with nodes u, v, v', w]
Under a particular pair of “strategies”:
Likelihood of the graph $= \prod_{(u,w) \in E} p(u,w)$
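Below is a minimal sketch of the Random-Random probability and the resulting log-likelihood. It assumes something the slide does not state: in step 2, v picks uniformly among its neighbors other than u. The toy graph in the usage is hypothetical, chosen only so that it reproduces the 1/5 · 1/2 + 1/5 · 1 = 3/10 example.

```python
import math
from fractions import Fraction


def p_random_random(adj, u, w):
    """P(u closes a triad to w): u picks a neighbor v uniformly, then v picks
    uniformly among its neighbors other than u (assumption)."""
    total = Fraction(0)
    for v in adj[u]:
        options = adj[v] - {u}
        if w in options:
            total += Fraction(1, len(adj[u])) * Fraction(1, len(options))
    return total


def log_likelihood(adj, edges):
    """Sum of log p(u, w) over triad-closing edges, adding each edge after scoring it."""
    adj = {x: set(nbrs) for x, nbrs in adj.items()}  # work on a copy
    ll = 0.0
    for u, w in edges:
        ll += math.log(float(p_random_random(adj, u, w)))  # fails if p = 0 (not triad-closing)
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return ll


if __name__ == "__main__":
    adj = {"u": {"v", "v2", "a", "b", "c"}, "v": {"u", "w", "x"}, "v2": {"u", "w"},
           "w": {"v", "v2"}, "a": {"u"}, "b": {"u"}, "c": {"u"}, "x": {"v"}}
    print(p_random_random(adj, "u", "w"))  # 3/10
```

Exact fractions keep the per-edge probabilities readable. Converting to float before taking the log keeps the likelihood sum cheap.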
[Figure: improvement over the baseline for each pair of strategies: strategy to select v (1st node) vs. strategy to select w (2nd node)]
Baseline: Pick a random node 2 hops away
Strategies to pick a neighbor:
random: uniformly at random
deg: proportional to its degree
com: proportional to the number of common friends
last: proportional to time since last activity
comlast: proportional to com*last
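Each strategy is simply a weighting over the chooser's neighbors. Below is a minimal sketch that turns the five strategies into selection probabilities. Several details are assumptions not given on the slide: "com" counts common neighbors between the chooser and the candidate; "last" uses `now - last_active[x]`; and `exclude` drops one neighbor, for example u when v picks w. Names are hypothetical.

```python
def neighbor_probabilities(adj, node, strategy, exclude=None, last_active=None, now=None):
    """Probability that `node` picks each neighbor under one strategy.

    adj: dict node -> set of neighbors. last_active and now are only used by
    "last" and "comlast". Returns {} if every candidate has zero weight.
    """
    candidates = [x for x in adj[node] if x != exclude]

    def common(x):
        return len(adj[node] & adj[x])  # common friends of node and x (assumption)

    def since_last(x):
        return now - last_active[x]  # time since x was last active

    weight = {
        "random": lambda x: 1.0,
        "deg": lambda x: len(adj[x]),
        "com": common,
        "last": since_last,
        "comlast": lambda x: common(x) * since_last(x),
    }[strategy]
    weights = [weight(x) for x in candidates]
    total = sum(weights)
    if total == 0:
        return {}
    return {x: w / total for x, w in zip(candidates, weights)}


if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0}, 2: {0, 3, 4}, 3: {2}, 4: {2}}
    print(neighbor_probabilities(adj, 0, "deg"))
```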
[Leskovec et al., KDD ’08]
The model of network evolution
Process: Model
P1) Node arrival:
• Node arrival function is given
• Node lifetime is exponential
P2) Edge initiation:
• Edge gaps get smaller as the degree increases
P3) Edge destination:
• Pick edge destination using random-random
[Leskovec et al., KDD ’08]
Skip
Theorem: Exponential node lifetimes and
power-law with exponential cutoff edge gaps
lead to power-law degree distributions
Interesting, as temporal behavior predicts a structural network property
Skip!
Given the model, one can take an existing network and continue its evolution
Compare the true and predicted (based on the theorem) degree exponents:
How do networks evolve at the macro level?
What are global phenomena of network growth?
Questions:
What is the relation between the number of nodes
n(t) and number of edges e(t) over time t?
How does diameter change as the network grows?
How does degree distribution evolve as the
network grows?
[Leskovec et al., KDD 05]
N(t) … nodes at time t
E(t) … edges at time t
Suppose that
N(t+1) = 2 * N(t)
Q: What is E(t+1)? Is it 2 * E(t)?
A: Over-doubled!
But obeying the Densification Power Law
[Leskovec et al., KDD 05]
What is the relation between the number of nodes and the number of edges over time?
First guess: constant average degree over time
Networks are denser over time
Densification Power Law: $E(t) \propto N(t)^{a}$
a … densification exponent (1 ≤ a ≤ 2)
[Figure: E(t) vs. N(t) for the Internet (a = 1.2) and Citations (a = 1.6)]
[Leskovec et al. KDD 05]
Densification Power Law:
the number of edges grows faster than the number of nodes, so the average degree is increasing
$E(t) \propto N(t)^{a}$, or equivalently $\log E(t) = a \log N(t) + c$
a … densification exponent: 1 ≤ a ≤ 2:
a = 1: linear growth, constant out-degree (traditionally assumed)
a = 2: quadratic growth, clique
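Because the law is linear in log-log space, the densification exponent a can be estimated as the least-squares slope of log E(t) against log N(t). The same law explains the "over-doubled" answer: doubling N multiplies E by $2^a$, which exceeds 2 whenever a > 1. Below is a minimal sketch; the synthetic data with a = 1.2 and the function names are illustrative.

```python
import math


def fit_densification_exponent(node_counts, edge_counts):
    """Least-squares slope a of log E(t) against log N(t)."""
    xs = [math.log(n) for n in node_counts]
    ys = [math.log(e) for e in edge_counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def edge_factor_when_nodes_double(a):
    """Under E proportional to N**a, doubling N multiplies E by 2**a."""
    return 2.0 ** a


if __name__ == "__main__":
    nodes = [10, 100, 1000, 10_000]
    edges = [3 * n ** 1.2 for n in nodes]  # synthetic snapshots with a = 1.2
    print(fit_densification_exponent(nodes, edges), edge_factor_when_nodes_double(1.2))
```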
[Leskovec et al. KDD 05]
Prior models and intuition say that the network diameter slowly grows (like log N, log log N)
Diameter shrinks over time:
as the network grows, the distances between the nodes slowly decrease
[Figure: diameter vs. size of the graph (Internet) and diameter vs. time (Citations)]
Is shrinking diameter just a consequence of densification?
[Leskovec et al. TKDD 07]
[Figure: diameter vs. size of the graph for an Erdos-Renyi random graph with densification exponent a = 1.3]
A densifying random graph has increasing diameter
There is more to shrinking diameter than just densification
Make sure that Power-laws and PA fit into a
single lecture
Then the 2nd lecture (evolution) will be finished in time
Is it the degree sequence?
Compare the diameter of a:
True network (red)
Random network with the same degree distribution (blue)
[Figure: diameter vs. size of the graph (Citations)]
Densification + degree sequence give shrinking diameter
[Leskovec et al. TKDD 07]
How does the degree distribution evolve to allow for densification?
Option 1) Degree exponent γ is constant:
Fact 1: For degree exponent 1 < γ < 2: a = 2/γ
[Figure: Email network]
A consequence of what we learned in the last class:
■ Power laws with exponents < 2 have infinite expectations.
■ So, by maintaining a constant degree exponent, the average degree grows.
[Leskovec et al. TKDD 07]
How does the degree distribution evolve to allow for densification?
Option 2) Exponent $\gamma_n$ evolves with graph size n:
Fact 2:
[Figure: Citation network]
Remember, the expected degree is:
$E[x] = \frac{\gamma - 1}{\gamma - 2}\, x_{min}$
So γ has to decay as a function of graph size for the avg. degree to go up
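The two options can be made concrete with the formulas on these slides. Below is a small sketch. It encodes Fact 1 as stated (a = 2/γ for 1 < γ < 2) and the expected-degree formula, which is finite only for γ > 2. As γ decreases toward 2, the expected degree grows, which is what Option 2 relies on. The function names are hypothetical.

```python
import math


def densification_exponent_constant_gamma(gamma):
    """Fact 1: for a constant degree exponent 1 < gamma < 2, a = 2 / gamma."""
    if not 1.0 < gamma < 2.0:
        raise ValueError("Fact 1 applies only for 1 < gamma < 2")
    return 2.0 / gamma


def expected_degree(gamma, x_min):
    """E[x] = (gamma - 1) / (gamma - 2) * x_min for a power law; infinite when gamma <= 2."""
    if gamma <= 2.0:
        return math.inf
    return (gamma - 1.0) / (gamma - 2.0) * x_min


if __name__ == "__main__":
    for gamma in (3.0, 2.5, 2.1):
        print(gamma, expected_degree(gamma, x_min=1.0))
```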
[Leskovec et al. TKDD 07]
Want to model graphs that densify and have shrinking diameters
Intuition:
How do we meet friends at a party?
How do we identify references when writing papers?
[Leskovec et al. TKDD 07]
The Forest Fire model has 2 parameters:
p … forward burning probability
r … backward burning probability
The model:
Each turn a new node v arrives
Uniformly at random chooses an “ambassador” w
Flip 2 geometric coins to determine the number of
in- and out-links of w to follow
“Fire” spreads recursively until it dies
New node v links to all burned nodes
Geometric distribution:
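Below is a minimal Python sketch of the burning process described above, under stated assumptions. It uses the "number of successes before the first failure" form of the geometric distribution, giving means p/(1-p) forward and r/(1-r) backward. Links to follow are sampled among neighbors of the burning node that have not burned yet. Edges are directed from new nodes to older ones. This is an illustration of the steps listed on the slide, not necessarily the exact generator from the paper; names and parameter values are hypothetical.

```python
import random


def geometric(prob, rng):
    """Number of successes before the first failure; mean prob / (1 - prob)."""
    count = 0
    while rng.random() < prob:
        count += 1
    return count


def forest_fire(n, p, r, seed=None):
    """Return out_links, where out_links[v] is the set of nodes that node v links to."""
    rng = random.Random(seed)
    out_links = [set() for _ in range(n)]  # out_links[x]: nodes x points to
    in_links = [set() for _ in range(n)]   # in_links[x]: nodes pointing to x
    for v in range(1, n):
        ambassador = rng.randrange(v)      # uniform over existing nodes
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:                    # the fire spreads until it dies out
            w = frontier.pop()
            x = geometric(p, rng)          # number of out-links of w to follow
            y = geometric(r, rng)          # number of in-links of w to follow
            fwd = [z for z in out_links[w] if z not in burned]
            bwd = [z for z in in_links[w] if z not in burned]
            chosen = rng.sample(fwd, min(x, len(fwd))) + rng.sample(bwd, min(y, len(bwd)))
            for z in chosen:
                if z not in burned:
                    burned.add(z)
                    frontier.append(z)
        for z in burned:                   # new node links to all burned nodes
            out_links[v].add(z)
            in_links[z].add(v)
    return out_links


if __name__ == "__main__":
    out = forest_fire(2000, p=0.3, r=0.3, seed=0)
    print(sum(len(s) for s in out), "edges")
```

Tracking edge counts as n grows gives the E(t) vs. N(t) data for checking densification.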
Forest Fire generates graphs that densify
and have shrinking diameter
[Figure: E(t) vs. N(t) showing densification with exponent 1.32; diameter vs. N(t)]
Forest Fire also generates graphs with
power-law degree distribution
[Figure: log count vs. log in-degree; log count vs. log out-degree]
Fix backward
probability r and
vary forward
burning prob. p
Notice a sharp
transition
between sparse
and clique-like
graphs
[Figure: as p varies, regions labeled Sparse graph, Clique-like graph, Increasing diameter, Constant diameter, and Decreasing diameter]
Sweet spot is very narrow