Transcript Slide 1

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

Task: (HW3 is optional)
 Find node correspondences
between two graphs

Incentives:
 European chocolates!
 Fame!
 Up to 10% extra credit

Due:
 Monday Nov 14
 No late days!
7/18/2015
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Random network (Erdos-Renyi random graph): degree distribution is Binomial
Scale-free (power-law) network: degree distribution is Power-law
Part 1-3
[Mitzenmacher, ‘03]
We will analyze the following model:
 Nodes arrive in order 1,2,3,…,n
 When node i is created it makes a single link to an earlier node j chosen as follows:
 1) With prob. p, i links to j chosen uniformly at random (from among all earlier nodes)
 2) With prob. 1-p, node i chooses node j uniformly at random and links to the node that j points to.
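A minimal simulation sketch of the arrival process described above may help make the two linking rules concrete. The function name `simulate_copy_model` and the treatment of node 1 (it is given a self-loop so that copying its out-link is always defined) are assumptions made for illustration; the slides do not say how the first node is handled.

```python
import random


def simulate_copy_model(n, p, seed=0):
    """Grow a network of n nodes; return (in-degrees, out-link target) per node.

    Nodes are numbered 1..n. Each new node i makes exactly one out-link.
    """
    rng = random.Random(seed)
    target = [None] * (n + 1)   # target[i] = node that i links to
    indeg = [0] * (n + 1)
    target[1] = 1               # ASSUMPTION: node 1 points to itself (not counted as in-degree)
    for i in range(2, n + 1):
        j = rng.randint(1, i - 1)   # uniformly random earlier node
        if rng.random() >= p:       # with prob. 1-p: link to the node j points to
            j = target[j]
        target[i] = j
        indeg[j] += 1
    return indeg[1:], target[1:]


if __name__ == "__main__":
    indeg, _ = simulate_copy_model(n=100_000, p=0.5, seed=1)
    print("max in-degree:", max(indeg))
    print("fraction with in-degree 0:", indeg.count(0) / len(indeg))
```

Following the out-link of a uniformly chosen node reaches a node with probability proportional to its in-degree, which is why rule 2 acts as preferential attachment.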

Claim: The described model generates networks where the fraction of nodes with degree k scales as:
$P(d_i = k) \propto k^{-(1+\frac{1}{q})}$
where q=1-p
Consider deterministic and continuous
approximation to the in-degree of node i as a
function of time t
 t is the number of nodes that have arrived so far
 In-degree di(t) of node i (i=1,2,…,n) is a continuous
quantity and it grows deterministically with time t

Initial condition:
 di(t)=0, when t=i (node i just arrived)

Expected change of di(t) over time:
 Node i gains an in-link at step t+1 only if a link
from a newly created node t+1 points to it.
 What’s the probability of this event?
 With prob. p node t+1 links randomly:
 Links to our node i with prob. 1/t
 With prob. 1-p node t+1 links preferentially:
 Links to our node i with prob. di(t)/t
 So: Prob. node t+1 links to i is:
$p \cdot \frac{1}{t} + (1-p) \cdot \frac{d_i(t)}{t}$

Expected change of di(t):
 $d_i(t+1) - d_i(t) = p\frac{1}{t} + (1-p)\frac{d_i(t)}{t}$
 In the continuous approximation: $\frac{\mathrm{d}d_i(t)}{\mathrm{d}t} = p\frac{1}{t} + (1-p)\frac{d_i(t)}{t} = \frac{p + q\,d_i(t)}{t}$
 Divide by $p + q\,d_i(t)$: $\frac{1}{p + q\,d_i(t)}\frac{\mathrm{d}d_i(t)}{\mathrm{d}t} = \frac{1}{t}$
 Integrate: $\int \frac{1}{p + q\,d_i(t)}\,\mathrm{d}d_i(t) = \int \frac{1}{t}\,\mathrm{d}t$
 $\frac{1}{q}\ln\left(p + q\,d_i(t)\right) = \ln t + c$
 Multiply by q, let $A = e^{qc}$ and exponentiate: $p + q\,d_i(t) = A t^q$
 $d_i(t) = \frac{1}{q}\left(A t^q - p\right)$
What is the constant A?


$d_i(t) = \frac{1}{q}\left(A t^q - p\right)$
We know: $d_i(i) = 0$
So: $d_i(i) = \frac{1}{q}\left(A i^q - p\right) = 0 \Rightarrow A = \frac{p}{i^q}$
Therefore: $d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^q - 1\right]$
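As a sanity check on the continuous approximation, the sketch below compares the closed form $d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^q - 1\right]$ with a direct step-by-step iteration of the expected change $d_i(t+1) - d_i(t) = \frac{p + q\,d_i(t)}{t}$. The function names and the parameter values are illustrative choices, not part of the lecture.

```python
def closed_form_degree(i, t, p):
    """Mean-field in-degree of node i at time t: (p/q) * ((t/i)^q - 1)."""
    q = 1.0 - p
    return (p / q) * ((t / i) ** q - 1.0)


def iterated_degree(i, t, p):
    """Iterate d(s+1) = d(s) + (p + q*d(s)) / s from s = i (where d = 0) up to t."""
    q = 1.0 - p
    d = 0.0
    for s in range(i, t):
        d += (p + q * d) / s
    return d


if __name__ == "__main__":
    i, t, p = 100, 100_000, 0.5
    print("closed form:", closed_form_degree(i, t, p))
    print("iterated   :", iterated_degree(i, t, p))
```

The two values agree closely once i is not too small, which is the regime where replacing the difference equation by a differential equation is reasonable.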

What is F(k), the fraction of nodes that have degree at most k at time t?
 First: how many nodes i have degree > k?
 $d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^q - 1\right] > k$
 then: $i < t\left[\frac{q}{p}k + 1\right]^{-\frac{1}{q}}$
There are t nodes total at time t, so the fraction with degree > k is $\left[\frac{q}{p}k + 1\right]^{-\frac{1}{q}}$, and F(k):
$F(k) = 1 - \left[\frac{q}{p}k + 1\right]^{-\frac{1}{q}}$
Motivate this better! Why is F(k) treated as a CDF if the derivation starts from the CCDF?

What is the fraction of nodes with
degree exactly k?
 Take derivative of F(k):
 F(k) is CDF, so F’(k) is the PDF

$F(k) = 1 - \left[\frac{q}{p}k + 1\right]^{-\frac{1}{q}}$
$F'(k) = \frac{1}{p}\left[\frac{q}{p}k + 1\right]^{-1-\frac{1}{q}}$
Power-law with exponent $1 + \frac{1}{q}$
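To see that the density $F'(k) = \frac{1}{p}\left[\frac{q}{p}k + 1\right]^{-1-\frac{1}{q}}$ really behaves like a power law with exponent $1 + \frac{1}{q}$ for large k, one can measure its slope on log-log axes. This is a small numerical sketch; `pdf` and `local_exponent` are names chosen here for illustration.

```python
import math


def pdf(k, p):
    """Mean-field degree density F'(k) = (1/p) * ((q/p)*k + 1)^(-1 - 1/q)."""
    q = 1.0 - p
    return (1.0 / p) * ((q / p) * k + 1.0) ** (-1.0 - 1.0 / q)


def local_exponent(k, p):
    """Slope of log F'(k) against log k, measured between k and 2k."""
    return (math.log(pdf(2 * k, p)) - math.log(pdf(k, p))) / math.log(2.0)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        q = 1.0 - p
        print(p, local_exponent(10**6, p), -(1.0 + 1.0 / q))
```

For small k the "+1" inside the bracket matters and the slope is shallower; the power-law form only emerges in the tail.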
Show simulations
from Barabasi
rather than showing
calculations!

Two changes from the Gnp
 Growth + Preferential attachment

Do we need both? Yes!
 Add growth to Gnp (assume 1 edge is added at each step)
 $X_j$ = degree of node j at the end
 $X_j(u)$ = 1 if node u links to j, else 0
 $X_j = X_j(j+1) + X_j(j+2) + \dots + X_j(n)$
 $E[X_j(u)] = P[u \text{ links to } j] = \frac{1}{u-1}$
 $E[X_j] = \sum_{u=j+1}^{n} \frac{1}{u-1} = H_{n-1} - H_{j-1}$
 $E[X_j] \approx \log(n-1) - \log(j-1) \approx \log\frac{n-1}{j} \neq \left(\frac{n}{j}\right)^{\alpha}$
$H_n$ … nth harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \log(n)$
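The growth-only calculation can be checked numerically: summing $\frac{1}{u-1}$ over all later arrivals gives an expected degree that grows only logarithmically in n/j, far below the polynomial growth $(n/j)^q$ obtained with preferential attachment. The function name below is an illustrative choice.

```python
import math


def expected_degree_growth_only(j, n):
    """E[X_j] when each later node u = j+1..n links to a uniformly random earlier node."""
    return sum(1.0 / (u - 1) for u in range(j + 1, n + 1))


if __name__ == "__main__":
    j, n = 10, 100_000
    print("exact sum        :", expected_degree_growth_only(j, n))
    print("log(n/j)         :", math.log(n / j))
    print("(n/j)^0.5 for PA :", (n / j) ** 0.5)
```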



Preferential attachment gives power-law
degrees
Intuitively reasonable process
Can tune p to get the observed exponent
 On the web, P[node has degree d] ~ $d^{-2.1}$
 2.1 = 1+1/(1-p) ⇒ p ~ 0.1
There are also other network formation
mechanisms that generate scale-free networks:
 Random surfer model
 Forest Fire model
Skip!

Copying mechanism (directed network)
 select a node and an edge of this node
 attach to the endpoint of this edge

Walking on a network (directed network)
 the new node connects to a node, then to every
 first, second, … neighbor of this node

Attaching to edges
 select an edge
 attach to both endpoints of this edge

Node duplication
 duplicate a node with all its edges
 randomly prune edges of new node

Preferential attachment is not so good at
predicting network structure
 Age-degree correlation
 Links among high degree nodes
 On the web nodes sometimes avoid linking to each other

Further questions:
 What is a reasonable probabilistic model for how
people sample through web-pages and link to
them?
 Short random walks
 Effect of search engines – reaching pages based on
number of links to them

How does the
connectivity of the
network change as the
vertices get removed?
[Albert et al. 00; Palmer et al. 01]

Vertices can be
removed:
 Uniformly at random
 In order of decreasing degree

It is important for epidemiology
 Removal of vertices corresponds to vaccination
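The two removal orders can be compared in a small experiment. The sketch below is only an illustration under stated assumptions: it builds an undirected preferential-attachment graph (each new node attaches to m=2 existing nodes chosen proportionally to degree, which is a choice made here, not the lecture's dataset), removes 10% of the nodes either uniformly at random or in order of decreasing degree, and reports the fraction of all nodes left in the largest connected component.

```python
import random


def preferential_attachment_graph(n, m=2, seed=0):
    """Undirected graph: each new node links to m distinct nodes picked by degree."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    endpoints = []                      # every node appears once per incident edge
    for a in range(m + 1):              # seed graph: clique on m+1 nodes
        for b in range(a + 1, m + 1):
            adj[a].add(b)
            adj[b].add(a)
            endpoints += [a, b]
    for v in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for u in chosen:
            adj[v].add(u)
            adj[u].add(v)
            endpoints += [u, v]
    return adj


def largest_component_fraction(adj, removed):
    """Size of the largest connected component after removal, divided by original n."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for s in alive:
        if s in seen:
            continue
        seen.add(s)
        stack, size = [s], 0
        while stack:
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if y in alive and y not in seen:
                    seen.add(y)
                    stack.append(y)
        best = max(best, size)
    return best / len(adj)


def removal_experiment(n=2000, fraction=0.1, seed=0):
    """Return (largest component after random removal, after highest-degree removal)."""
    adj = preferential_attachment_graph(n, seed=seed)
    k = int(fraction * n)
    rng = random.Random(seed + 1)
    random_removed = rng.sample(sorted(adj), k)
    targeted_removed = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    return (largest_component_fraction(adj, random_removed),
            largest_component_fraction(adj, targeted_removed))


if __name__ == "__main__":
    print(removal_experiment())
```

In the vaccination reading, the targeted order corresponds to immunizing the best-connected individuals first.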
Get Colorful plots
from Barabasi

Real-world networks are resilient to random attacks
 You need to remove all web-pages of degree > 5
to disconnect the web
 But this is a very small fraction of all web pages
Random network has better resilience to targeted attacks
[Figure: mean path length vs. fraction of removed nodes, under preferential removal and random removal; panels: Internet (Autonomous systems) and Random network]


Preferential attachment is a model of a
growing network
What governs network growth and
evolution?
 P1) Node arrival process:

When nodes enter the network
 P2) Edge initiation process:

Each node decides when to initiate an edge
 P3) Edge destination process:

The node determines destination of the edge
[Leskovec et al., KDD ’08]

4 online social networks with
exact edge arrival sequence
 For every edge (u,v) we know exact
time of the appearance tuv

Directly observe mechanisms leading
to global network properties
[Figure: edge arrival sequence, and so on for millions of edges; networks labeled (F), (D), (A), (L)]
[Figure: node arrival over time, one panel per network]
(F) Flickr: Exponential
(D) Delicious: Linear
(A) Answers: Sub-linear
(L) LinkedIn: Quadratic

How long do nodes live?
 Node life-time is the time between the 1st
and the last edge of a node
[Figure: timeline of node i; the lifetime of a node spans from the 1st edge of node i to the last edge of node i]

How do nodes “wake up” to create links?
[Figure: timeline of node i with edge creation events between the 1st edge and the last edge of node i]
LinkedIn

Lifetime a: time between node’s first and last edge
Node lifetime is exponentially distributed:
$p_l(a) = \lambda e^{-\lambda a}$
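If lifetimes follow $p_l(a) = \lambda e^{-\lambda a}$, the maximum-likelihood estimate of $\lambda$ is simply one over the mean observed lifetime. The sketch below illustrates this on synthetic lifetimes; the data are generated here for demonstration and are not the LinkedIn measurements.

```python
import random


def fit_exponential_rate(lifetimes):
    """Maximum-likelihood estimate of lambda for an exponential distribution."""
    return len(lifetimes) / sum(lifetimes)


if __name__ == "__main__":
    rng = random.Random(0)
    true_rate = 0.5
    synthetic = [rng.expovariate(true_rate) for _ in range(20_000)]  # synthetic data
    print("estimated lambda:", fit_exponential_rate(synthetic))
```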
Invent better
notation for delta!

How do nodes “wake up” to create edges?
 Edge gap $\delta_i(d)$: time between dth and d+1st edge of node i:
 Let $t_i(d)$ be the creation time of d-th edge of node i
 $\delta_i(d) = t_i(d+1) - t_i(d)$
[Figure: timeline of node i from its 1st edge to its last edge, with the gaps $\delta_i(1)$, $\delta_i(2)$, $\delta_i(3)$ marked]
 $\delta(d)$ is a distribution (histogram) of $\delta_i(d)$ over all nodes i
[Figure: first gaps $\delta_i(1)$, $\delta_{i+1}(1)$, $\delta_{i+2}(1)$ of Node i, Node i+1, Node i+2]
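The definition $\delta_i(d) = t_i(d+1) - t_i(d)$ translates directly into code. This sketch assumes the input is a mapping from node to the creation times of its edges (an input format chosen here for illustration) and pools the gaps by d, which gives the raw samples behind each histogram $\delta(d)$.

```python
from collections import defaultdict


def edge_gaps(edge_times):
    """edge_times: dict node -> list of creation times of that node's edges.

    Returns dict d -> list of gaps delta_i(d) collected over all nodes i.
    """
    gaps_by_d = defaultdict(list)
    for times in edge_times.values():
        ts = sorted(times)
        for d in range(1, len(ts)):
            # ts[d - 1] is the d-th edge, ts[d] is the (d+1)-st edge
            gaps_by_d[d].append(ts[d] - ts[d - 1])
    return dict(gaps_by_d)


if __name__ == "__main__":
    print(edge_gaps({"a": [0, 2, 5], "b": [1, 4]}))
```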
LinkedIn
Edge gap δ(d): inter-arrival time between dth and d+1st edge
For every d we get a separate histogram

$p_g(\delta(1)) \propto \delta(1)^{-\alpha} e^{-\beta\,\delta(1)}$
Invent better
notation for delta!

How do α and β change as a function of d?
Fit to each plot of δ(d):
$p_g(\delta(d)) \propto \delta(d)^{-\alpha(d)} e^{-\beta(d)\,\delta(d)}$
α is const, β linear in d – gaps get smaller with d
$p_g(\delta(d)) \propto \delta(d)^{-\alpha} e^{-\beta d\,\delta(d)}$
[Figure: Log(probability) vs. Log(edge gap), one curve per degree d=1, d=2, d=3]
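A quick numerical illustration of why a cutoff that grows linearly in d shortens the gaps: the sketch below computes the mean of a density proportional to $\delta^{-\alpha} e^{-\beta d\,\delta}$ by midpoint integration. The integration range (`delta_min`, `delta_max`) and the values of α and β are assumptions picked for the demonstration, not fitted values from the paper.

```python
import math


def mean_gap(d, alpha, beta, delta_min=1.0, delta_max=10_000.0, steps=100_000):
    """Mean of the density ~ x^(-alpha) * exp(-beta*d*x) on [delta_min, delta_max]."""
    h = (delta_max - delta_min) / steps
    num = den = 0.0
    for s in range(steps):
        x = delta_min + (s + 0.5) * h            # midpoint rule
        w = x ** (-alpha) * math.exp(-beta * d * x)
        num += x * w
        den += w
    return num / den


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, mean_gap(d, alpha=0.8, beta=0.01))
```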


Source node i wakes up and creates an edge
How does i select a target node j?
 What is the degree of the target j?
 Does preferential attachment really hold?
 How many hops away is the target j?
 Are edges attaching locally?
[Leskovec et al., KDD ’08]

Are edges more likely to connect to higher
degree nodes?
$p_e(k) \propto k^{\tau}$
[Figure: $p_e(k)$ vs. k for PA, Gnp, and Flickr]

Network     τ
Gnp         0
PA          1
Flickr      1
Delicious   1
Answers     0.9
LinkedIn    0.6
[Leskovec et al., KDD ’08]

Just before the edge (u,w) is placed how
many hops are between u and w?
[Figure: number of hops between u and w just before the edge is placed, for Gnp, PA, and Flickr]

Fraction of triad closing edges:
Network     %Δ
Flickr      66%
Delicious   28%
Answers     23%
LinkedIn    50%

Real edges are local!
Most of them close triangles!
[Figure: triangle on nodes u, v, w]
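The hop-count measurement can be sketched with a breadth-first search that runs on the graph as it stood just before each edge arrived. The code below treats edges as undirected and counts the fraction of all edges in the sequence that span exactly 2 hops; both are simplifying assumptions made for this illustration and may differ from the exact counting used in the paper.

```python
from collections import deque


def hops_before_edge(adj, u, w):
    """Shortest-path length from u to w in adj, or None if w is unreachable."""
    if u == w:
        return 0
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == w:
                    return dist[y]
                queue.append(y)
    return None


def triad_closing_fraction(edge_sequence):
    """Fraction of edges (u, w) whose endpoints were exactly 2 hops apart on arrival."""
    adj = {}
    closing = 0
    for u, w in edge_sequence:
        if hops_before_edge(adj, u, w) == 2:
            closing += 1
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return closing / len(edge_sequence)


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
    print(triad_closing_fraction(edges))
```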



Explain the
strategies better
Remove the
example of
Random-Random.
Explain likelihood
better
Focus only on triad-closing edges
New triad-closing edge (u,w) appears next
Model this as 2 independent choices:
1. u chooses neighbor v
2. v chooses neighbor w
and connect u to w
 E.g.: Under Random-Random:
 $p(u,w) = \frac{1}{5}\cdot\frac{1}{2} + \frac{1}{5}\cdot 1 = \frac{3}{10}$
[Figure: node u with five neighbors; w is reachable through v and through v’]
Under a particular pair of “strategies”:
Likelihood of the graph = $\prod_{(u,w)\in E} p(u,w)$
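The Random-Random computation can be reproduced on a toy graph. In the sketch below, the graph `example_adj` is a hypothetical reconstruction of the example (u has five neighbors; v has two candidate neighbors besides u; v2, standing in for v’, has one), and the second step is assumed not to step back to u. The `log_likelihood` helper scores all edges against one fixed snapshot, which is a simplification: a faithful version would use the graph as it was just before each edge arrived.

```python
import math


def p_random_random(adj, u, w):
    """P[u picks a uniform neighbor v, then v picks a uniform neighbor (not u) that is w]."""
    total = 0.0
    for v in adj[u]:
        candidates = adj[v] - {u}       # ASSUMPTION: v does not pick u itself
        if w in candidates:
            total += (1.0 / len(adj[u])) * (1.0 / len(candidates))
    return total


def log_likelihood(adj, edges):
    """Sum of log p(u, w): the log of the product of p(u, w) over the given edges."""
    return sum(math.log(p_random_random(adj, u, w)) for u, w in edges)


# Hypothetical example graph: u has 5 neighbors, two of which lead to w.
example_adj = {
    "u": {"v", "v2", "a", "b", "c"},
    "v": {"u", "w", "x"},
    "v2": {"u", "w"},
    "a": {"u"}, "b": {"u"}, "c": {"u"},
    "w": {"v", "v2"},
    "x": {"v"},
}

if __name__ == "__main__":
    print(p_random_random(example_adj, "u", "w"))
```

Working with the log-likelihood avoids numerical underflow when the product runs over millions of edges.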

Improvement over the baseline:
 Baseline: Pick a random node 2 hops away
[Table: strategy to select v (1st node) vs. strategy to select w (2nd node)]
Strategies to pick a neighbor:
 random: uniformly at random
 deg: proportional to its degree
 com: prop. to the number of common friends
 last: prop. to time since last activity
 comlast: prop. to com*last
[Leskovec et al., KDD ’08]

The model of network evolution
Process                Model
P1) Node arrival       • Node arrival function is given
                       • Node lifetime is exponential
P2) Edge initiation    • Edge gaps get smaller as the degree increases
P3) Edge destination   • Pick edge destination using random-random
[Leskovec et al., KDD ’08]
Skip

Theorem: Exponential node lifetimes and
power-law with exponential cutoff edge gaps
lead to power-law degree distributions

Interesting as temporal behavior predicts
structural network property
Skip!

Given the model, one can take an existing network and continue its evolution

Compare true and predicted (based on the
theorem) degree exponent:

How do networks evolve at the macro level?
 What are global phenomena of network growth?

Questions:
 What is the relation between the number of nodes
n(t) and number of edges e(t) over time t?
 How does diameter change as the network grows?
 How does degree distribution evolve as the
network grows?
[Leskovec et al., KDD 05]





N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: what is E(t+1)? Is it 2 * E(t)?
A: over-doubled!
 But obeying the Densification Power Law
[Leskovec et al., KDD 05]



What is the relation between the number of nodes and the edges over time?
First guess: constant average degree over time
Networks become denser over time
Densification Power Law: $E(t) \propto N(t)^a$
a … densification exponent (1 ≤ a ≤ 2)
[Figure: E(t) vs. N(t); Internet: a=1.2; Citations: a=1.6]
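Given snapshots of N(t) and E(t), the densification exponent a can be estimated as the slope of a least-squares line through the points (log N, log E). The sketch below does this on synthetic snapshots generated with a known exponent; the data and the function name are illustrative only.

```python
import math


def fit_densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    nodes = [100, 200, 400, 800]
    edges = [3 * n ** 1.2 for n in nodes]     # synthetic snapshots with a = 1.2
    a = fit_densification_exponent(nodes, edges)
    print("a =", a, "| doubling N multiplies E by", 2 ** a)
```

With a > 1, doubling the number of nodes multiplies the number of edges by $2^a > 2$, which is the "over-doubled" behavior.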
[Leskovec et al. KDD 05]

Densification Power Law
 the number of edges grows faster than the
number of nodes – average degree is increasing
$E(t) \propto N(t)^a$ or equivalently $\log E(t) = a \log N(t) + c$ for a constant c
a … densification exponent: 1 ≤ a ≤ 2:
 a=1: linear growth – constant out-degree
(traditionally assumed)
 a=2: quadratic growth – clique
[Leskovec et al. KDD 05]
Prior models and intuition say that the network diameter slowly grows (like log N, log log N)

Diameter shrinks over time
 as the network grows the distances between the nodes slowly decrease
[Figure: diameter vs. size of the graph (Internet); diameter vs. time (Citations)]
Is shrinking diameter just a consequence of densification?
[Leskovec et al. TKDD 07]
[Figure: diameter vs. size of the graph for an Erdos-Renyi random graph with densification exponent a=1.3]
Densifying random graph has increasing diameter ⇒ There is more to shrinking diameter than just densification


Make sure that Power-laws and PA fit into a
single lecture
Then 2nd lecture (evolution) will be finished in
time)
Is it the degree sequence?
Compare diameter of a:
 True network (red)
 Random network with the same degree distribution (blue)
[Figure: diameter vs. size of the graph, Citations]
Densification + degree sequence give shrinking diameter
[Leskovec et al. TKDD 07]


How does degree distribution evolve to
allow for densification?
Option 1) Degree exponent $\gamma_n$ is constant:
 Fact 1: For degree exponent $1 < \gamma_n < 2$: $a = 2/\gamma_n$
[Figure: Email network]
A consequence of what we learned in last class:
■ Power-laws with exponents <2 have infinite expectations.
■ So, by maintaining constant degree exponent γ the average degree grows.
[Leskovec et al. TKDD 07]


How does degree distribution evolve to allow
for densification?
Option 2) Exponent $\gamma_n$ evolves with graph size n:
 Fact 2:
[Figure: Citation network]
Remember, expected degree is:
$E[x] = \frac{\gamma - 1}{\gamma - 2}\, x_{min}$
So γ has to decay as a function of graph size for the avg. degree to go up
[Leskovec et al. TKDD 07]


Want to model graphs that densify and have shrinking diameters
Intuition:
 How do we meet friends at a party?
 How do we identify references when writing
papers?
[Leskovec et al. TKDD 07]

The Forest Fire model has 2 parameters:
 p … forward burning probability
 r … backward burning probability

The model:
 Each turn a new node v arrives
 Uniformly at random chooses an “ambassador” w
 Flip 2 geometric coins to determine the number of
in- and out-links of w to follow
 “Fire” spreads recursively until it dies
 New node v links to all burned nodes
Geometric distribution:
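The burning process can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the reference implementation: the number of out-links to follow is drawn as the number of successes before the first failure with success probability p, the number of in-links likewise with probability r (treating r as a probability, as on the slide), and links are only followed to nodes that have not burned yet. The helper names are chosen here.

```python
import random


def geometric(rng, prob):
    """Number of consecutive successes before the first failure (success prob = prob)."""
    count = 0
    while rng.random() < prob:
        count += 1
    return count


def forest_fire(n, p, r, seed=0):
    """Return dict node -> set of nodes it links to, for a graph grown node by node."""
    rng = random.Random(seed)
    out_links = {0: set()}
    in_links = {0: set()}
    for v in range(1, n):
        ambassador = rng.randrange(v)           # uniformly random existing node
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:                          # fire spreads until it dies
            w = frontier.pop()
            fwd = sorted(out_links[w] - burned)
            rng.shuffle(fwd)
            fwd = fwd[:geometric(rng, p)]        # follow some out-links of w
            burned.update(fwd)
            bwd = sorted(in_links[w] - burned)
            rng.shuffle(bwd)
            bwd = bwd[:geometric(rng, r)]        # follow some in-links of w
            burned.update(bwd)
            frontier.extend(fwd)
            frontier.extend(bwd)
        out_links[v] = set(burned)               # v links to all burned nodes
        in_links[v] = set()
        for w in burned:
            in_links[w].add(v)
    return out_links


if __name__ == "__main__":
    g = forest_fire(n=2000, p=0.35, r=0.2, seed=1)
    print("nodes:", len(g), "edges:", sum(len(t) for t in g.values()))
```

Each node burns at most once per arrival, so the spread always terminates; larger p and r let the fire reach more of the graph and produce denser networks.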

Forest Fire generates graphs that densify
and have shrinking diameter
[Figure: E(t) vs. N(t), densification exponent 1.32; diameter vs. N(t)]

Forest Fire also generates graphs with
power-law degree distribution
[Figure: in-degree: log count vs. log in-degree; out-degree: log count vs. log out-degree]



Fix backward probability r and vary forward burning prob. p
Notice a sharp transition between sparse and clique-like graphs
Sweet spot is very narrow
[Figure: regions labeled Sparse graph, Increasing diameter, Constant diameter, Decreasing diameter, Clique-like graph]