Transcript (snap.stanford.edu)
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
4/26/2020
[Plot: cluster quality vs. cluster size k. Moving right, we first find better and better clusters, then clusters get worse and worse; the best cluster has ~100 nodes.]
The picture that emerges: a denser and denser network core, with small good communities at the edge. This is a nested core-periphery structure.
Intuition: Self-similarity
An object is self-similar when it is similar to a part of itself: the whole has the same shape as one or more of its parts. We mimic recursive graph/community growth: start from an initial graph and expand it recursively.
The Kronecker product is a way of generating self-similar matrices.
[PKDD '05]
Initiator graph (3x3) → intermediate stage (9x9) → after the growth phase.
Kronecker product
The Kronecker product of an N x M matrix A and a K x L matrix B is the (N*K) x (M*L) matrix

    C = A ⊗ B = [ a_11 B  a_12 B  ...  a_1M B
                  ...
                  a_N1 B  a_N2 B  ...  a_NM B ]

We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices.
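As a quick sanity check, numpy's `np.kron` implements exactly this block construction; the two small adjacency matrices below are arbitrary examples, not from the lecture:

```python
import numpy as np

# Adjacency matrices of two small example graphs
A = np.array([[0, 1],
              [1, 0]])          # 2x2: a single edge
B = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])       # 3x3

C = np.kron(A, B)               # (2*3) x (2*3) = 6 x 6 block matrix

# Entry C[i*3 + p, j*3 + q] equals A[i, j] * B[p, q]
assert C.shape == (6, 6)
assert C[0 * 3 + 1, 1 * 3 + 2] == A[0, 1] * B[1, 2]
```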
[PKDD '05]
Kronecker graph: a growing sequence of graphs obtained by iterating the Kronecker product:

    K_1,  K_2 = K_1 ⊗ K_1,  ...,  K_k = K_{k-1} ⊗ K_1

Note: one can easily use multiple initiator matrices (K_1', K_1'', K_1'''), even of different sizes.
[PKDD '05]
For K_k (the k-th Kronecker power of K_1): if K_1 has N_1 nodes and E_1 edges, then K_k has N_1^k nodes and E_1^k edges.

We get the densification power-law E(t) ∝ N(t)^a. What is a?

    a = log E(t) / log N(t) = log E_1 / log N_1
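The constant exponent is easy to verify numerically; the 3x3 initiator below is an arbitrary choice for illustration:

```python
import numpy as np

# Check the densification power law E = N^a, a = log(E1)/log(N1),
# on an (assumed) 3x3 initiator with self-loops.
K1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])
N1, E1 = K1.shape[0], int(K1.sum())   # directed edge count, incl. self-loops
a = np.log(E1) / np.log(N1)

K = K1.copy()
for _ in range(3):                    # build K2, K3, K4
    K = np.kron(K, K1)
    N, E = K.shape[0], int(K.sum())
    print(N, E, np.log(E) / np.log(N))  # exponent stays equal to a
```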
[PKDD '05]
Continuing to multiply with K_1 we obtain K_4 and so on: K_1 is 3 x 3, K_2 is 9 x 9, ..., up to the K_4 adjacency matrix.
[PKDD '05]
Kronecker graphs have many properties found in real networks:
Properties of static networks: power-law-like degree distribution; power-law eigenvalue and eigenvector distributions; small diameter.
Properties of dynamic networks: densification power law; shrinking/stabilizing diameter.
[PKDD '05]
Observation: edges in Kronecker graphs: ([x, y], [x', y']) is an edge of G ⊗ H iff (x, x') is an edge of G and (y, y') is an edge of H, where the pairs index the appropriate nodes of G and H.
Why? An entry in the matrix G ⊗ H is a multiplication of entries of G and H.
[PKDD '05]
Theorem (constant diameter): if G and H have diameter d, then G ⊗ H has diameter d.
What is the distance between nodes u, v in G ⊗ H? Let u = [a, b] and v = [a', b'] (using the notation from the last slide). Then (u, v) is an edge of G ⊗ H iff (a, a') is an edge of G and (b, b') is an edge of H.
So the path from a to a' in G takes at most d steps: a_1, a_2, a_3, ..., a_d. And the path from b to b' in H takes at most d steps: b_1, b_2, b_3, ..., b_d. Then each edge ([a_i, b_i], [a_{i+1}, b_{i+1}]) is in G ⊗ H, so it takes at most d steps to get from u to v in G ⊗ H.
Consequence: if K_1 has diameter d, then the graph K_k also has diameter d.
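The argument implicitly pads the shorter of the two paths with self-loops, so the sketch below uses an initiator in which every node has a self-loop (the particular 3x3 matrix is an assumption for illustration):

```python
import numpy as np

def diameter(A):
    """Smallest t such that every node pair is joined by a walk of
    length <= t (with self-loops, walks are equivalent to padded paths)."""
    reach = A > 0
    t = 1
    while not reach.all():
        reach = (reach.astype(int) @ (A > 0)) > 0
        t += 1
    return t

# A 3-node path graph with self-loops on every node; diameter 2.
K1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])
K2 = np.kron(K1, K1)          # 9 nodes
K3 = np.kron(K2, K1)          # 27 nodes
print(diameter(K1), diameter(K2), diameter(K3))   # 2 2 2
```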
[PKDD '05]
Stochastic Kronecker graphs:
Create an N_1 x N_1 probability matrix Θ_1. Compute its k-th Kronecker power Θ_k. For each entry p_uv of Θ_k, include the edge (u, v) in K_k with probability p_uv.

    Θ_1 = 0.5 0.2
          0.1 0.3

Kronecker multiplication gives the edge probabilities p_ij:

    Θ_2 = Θ_1 ⊗ Θ_1 = 0.25 0.10 0.10 0.04
                      0.05 0.15 0.02 0.06
                      0.05 0.02 0.15 0.06
                      0.01 0.03 0.03 0.09

Flipping biased coins on these probabilities then yields the instance matrix K_2.
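The procedure above, using the initiator from the slide, can be sketched directly (this is the naive per-entry coin-flipping version, not the fast generation discussed later):

```python
import numpy as np

theta1 = np.array([[0.5, 0.2],
                   [0.1, 0.3]])   # initiator probability matrix (from the slide)

def stochastic_kronecker(theta1, k, rng):
    """Naive O(N^2) generation: form the k-th Kronecker power, then
    flip one biased coin per entry."""
    theta_k = theta1
    for _ in range(k - 1):
        theta_k = np.kron(theta_k, theta1)
    return (rng.random(theta_k.shape) < theta_k).astype(int)

rng = np.random.default_rng(0)
K2 = stochastic_kronecker(theta1, 2, rng)
print(K2.shape)   # (4, 4)
```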
What is known about the Stochastic Kronecker graph model?
For an undirected Kronecker graph model with 2 x 2 initiator

    a b
    b c        (a > b > c)

the resulting graph is:
Connected, if b + c > 1
Has a connected component of size Θ(n), if (a + b)(b + c) > 1
Has constant diameter, if b + c > 1
Not searchable by a decentralized algorithm.
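These threshold conditions are simple enough to encode as predicates; the sketch below just restates the conditions quoted above (the numeric parameters are the ones used later in the lecture):

```python
def skg_properties(a, b, c):
    """Connectivity conditions for an undirected 2x2 stochastic
    Kronecker initiator [[a, b], [b, c]] with a > b > c, as quoted
    in the slide."""
    return {
        "connected": b + c > 1,
        "giant_component": (a + b) * (b + c) > 1,
        "constant_diameter": b + c > 1,
    }

props = skg_properties(0.99, 0.55, 0.15)
print(props)
# b + c = 0.70 <= 1, so not connected; (a+b)(b+c) = 1.078 > 1, so giant component
```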
Show that naïve coin flipping is too slow. Have a picture of the RMAT-like generation.
Double edges will be a problem.
Skip method of moments!
Have a nice slide with pictures motivating what we are trying to do.
Given a real network G, we want to estimate the initiator matrix

    Θ_1 = a b
          b c

Method 1: Method of moments [Gleich & Owen '09]. Compare counts of small subgraphs (edges, hairpins, tripins, triangles) and solve the resulting system of equations. For each of the 4 subgraphs we get one equation, e.g.

    2 E[# edges] = (a + 2b + c)^k - (a + c)^k
    2 E[# hairpins] = ...

where k = log_2(N). Then solve the system of equations by trying all possible values of (a, b, c).
Just explain the formulation and how to compute the likelihood estimation.
Method 2: Maximum Likelihood Estimation

    arg max_{Θ_1} P(G | Θ_1)

Naïve estimation takes O(N! N^2): N! for the different node labelings, N^2 for traversing the graph adjacency matrix. We do gradient descent on Θ_1 = [a b; c d] to find a good Θ_1. We will get this down to O(E)!
KronFit: maximum likelihood estimation
Given a real graph G, find the Kronecker initiator graph Θ which achieves

    arg max_Θ P(G | Θ)

We then need to (efficiently) calculate P(G | Θ) and maximize over Θ (e.g., using gradient descent).
[ICML '07]
Calculating the likelihood: edge probabilities and the outcome. Given a graph G and a Kronecker matrix Θ, we calculate the probability that Θ generated G:

    P(G | Θ) = ∏_{(u,v) ∈ G} Θ_k[u, v] · ∏_{(u,v) ∉ G} (1 - Θ_k[u, v])

Example, with Θ = [0.5 0.2; 0.1 0.3]:

    Θ_k = Θ ⊗ Θ = 0.25 0.10 0.10 0.04
                  0.05 0.15 0.02 0.06
                  0.05 0.02 0.15 0.06
                  0.01 0.03 0.03 0.09

Each edge present in G contributes its entry Θ_k[u, v]; each absent edge contributes 1 - Θ_k[u, v].
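The product formula translates directly into code; the toy graph below (self-loops only) is an arbitrary example, not the one on the slide:

```python
import numpy as np

def likelihood(G, theta_k):
    """P(G | Theta): product of theta_k[u, v] over present edges and
    (1 - theta_k[u, v]) over absent ones (naive O(N^2) version)."""
    G = np.asarray(G, dtype=bool)
    return float(np.where(G, theta_k, 1 - theta_k).prod())

theta1 = np.array([[0.5, 0.2],
                   [0.1, 0.3]])
theta2 = np.kron(theta1, theta1)

G = np.eye(4, dtype=int)          # toy graph: self-loops only
print(likelihood(G, theta2))
```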
[ICML '07]
Nodes are unlabeled, so isomorphic graphs G' and G'' should have the same probability: P(G' | Θ) = P(G'' | Θ).
One therefore needs to consider all node correspondences σ:

    P(G | Θ) = Σ_σ P(G | Θ, σ) P(σ)

All correspondences are a priori equally likely, and there are O(N!) of them.
[ICML '07]
Assume that we solved the node correspondence problem, i.e., we know the permutation σ. Calculating

    P(G | Θ, σ) = ∏_{(u,v) ∈ G} Θ_k[σ_u, σ_v] · ∏_{(u,v) ∉ G} (1 - Θ_k[σ_u, σ_v])

takes O(N^2) time, since we touch every entry of the adjacency matrix.
Node correspondence: the permutation σ defines the mapping of the nodes of G onto the rows and columns of Θ_k. We randomly search over σ, swapping pairs of node labels (e.g., labels 1 and 4) to find good mappings.
Calculating the likelihood P(G | Θ, σ): first calculate the likelihood of the empty graph (G with 0 edges), then correct it for the edges that we observe in the graph.
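A greedy sketch of the label-swap search (KronFit itself samples permutations with Metropolis moves rather than accepting only improvements; the toy graph and initiator are assumptions for illustration):

```python
import numpy as np

def log_likelihood(G, theta_k, sigma):
    # Permute rows/columns of theta_k by the node correspondence sigma.
    P = theta_k[np.ix_(sigma, sigma)]
    return float(np.where(G.astype(bool), np.log(P), np.log(1 - P)).sum())

def search_sigma(G, theta_k, steps, rng):
    """Randomly search over sigma: propose a swap of two node labels,
    keep it only if the log-likelihood does not decrease."""
    n = G.shape[0]
    sigma = rng.permutation(n)
    best = log_likelihood(G, theta_k, sigma)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        sigma[i], sigma[j] = sigma[j], sigma[i]
        ll = log_likelihood(G, theta_k, sigma)
        if ll >= best:
            best = ll
        else:
            sigma[i], sigma[j] = sigma[j], sigma[i]  # undo the swap
    return sigma, best

theta1 = np.array([[0.5, 0.2],
                   [0.1, 0.3]])
theta2 = np.kron(theta1, theta1)
rng = np.random.default_rng(0)
G = (rng.random((4, 4)) < theta2).astype(int)   # toy graph sampled from theta2
sigma, best = search_sigma(G, theta2, steps=50, rng=rng)
print(sigma, best)
```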
Experimental setup:
Given a real graph G, run gradient descent from a random initial point and obtain the estimated parameters Θ = [a b; c d]. Generate a synthetic graph K using Θ, then compare the properties of the graphs G and K.
Note: we do not fit the graph properties themselves; we fit the likelihood and then compare the properties.
[ICML '07]
Real and Kronecker are very close:

    Θ_1 = 0.99 0.54
          0.49 0.13
Explanation for the little matrix.
What do the estimated parameters tell us about the network structure? The entries of the initiator [a b; c d] govern, respectively, the "a edges", "b edges", "c edges", and "d edges": the four blocks of the recursive adjacency-matrix structure.
[JMLR '10]
What do the estimated parameters tell us about the network structure?

    Θ_1 = 0.9 0.5
          0.5 0.1

Core: 0.9 edges. Periphery: 0.1 edges. Core-periphery cross links: 0.5 edges. This is the nested core-periphery structure.
[JMLR '10]
Small and large networks are very different:

    Θ = 0.99 0.17        vs.        Θ = 0.99 0.54
        0.17 0.82                       0.49 0.13
Skip.
Large-scale network structure: large networks are different from small networks and manifolds. Nested core-periphery: a recursive, onion-like structure of the network where each layer decomposes into a core and a periphery.
Skip.
Remember the SKG theorems, applied to the estimated initiator

    Θ = 0.99 0.55
        0.55 0.15

Connected, if b + c > 1: is 0.55 + 0.15 > 1? No!
Giant component, if (a + b)(b + c) > 1: is (0.99 + 0.55)·(0.55 + 0.15) > 1? Yes!
Real graphs are in the parameter region analogous to the giant component of an extremely sparse G_np, between p = 1/n and p = log(n)/n.
Each node has a set of categorical attributes. Example: gender (male, female); home country (US, Canada, Russia, etc.).
How do node attributes influence link formation? The probability that u is friends with v, as a function of u's and v's gender:

    u \ v     FEMALE   MALE
    FEMALE    0.3      0.6
    MALE      0.6      0.2
Let the values of the i-th attribute for nodes u and v be a_i(u) and a_i(v); they can take values in {0, ..., d_i - 1}.
Question: how can we capture the influence of the attributes on link formation?
Attribute matrix Θ:

                 a_i(v) = 0   a_i(v) = 1
    a_i(u) = 0    Θ[0, 0]      Θ[0, 1]
    a_i(u) = 1    Θ[1, 0]      Θ[1, 1]

    P(u, v) = Θ[a_i(u), a_i(v)]

Each entry of the attribute matrix captures the probability of a link between two nodes with the corresponding attribute values.
Flexibility in the network structure:
Homophily: love of the same (e.g., political parties, hobbies):

    0.9 0.1
    0.1 0.8

Heterophily: love of the opposite (e.g., genders):

    0.2 0.9
    0.9 0.1

Core-periphery: love of the core (e.g., extrovert personalities):

    0.9 0.5
    0.5 0.2
How do we combine the effects of multiple attributes? Multiply the probabilities from all attributes. With attribute matrices Θ_i = [α_i β_i; β_i γ_i] and node attributes

    a(u) = [0 0 1 0]
    a(v) = [0 1 1 0]

the link probability is

    P(u, v) = α_1 × β_2 × γ_3 × α_4
Multiplicative Attribute Graph M(n, l, a, Θ):
A network contains n nodes. Each node has l categorical attributes; a_i(u) represents the i-th attribute of node u. Each attribute a_i(·) is linked to a d_i x d_i attribute link-affinity matrix Θ_i. The edge probability between nodes u and v is

    P(u, v) = ∏_{i=1}^{l} Θ_i[a_i(u), a_i(v)]
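The edge probability is a straightforward product over per-attribute lookups; the two affinity matrices below (one homophilous, one core-periphery) are toy values, not estimates from the lecture:

```python
import numpy as np

def mag_edge_prob(a_u, a_v, thetas):
    """MAG edge probability: P(u,v) = prod_i Theta_i[a_i(u), a_i(v)]."""
    p = 1.0
    for i, theta in enumerate(thetas):
        p *= theta[a_u[i], a_v[i]]
    return p

# Two binary attributes with toy affinity matrices
thetas = [np.array([[0.9, 0.1],
                    [0.1, 0.8]]),   # homophily
          np.array([[0.9, 0.5],
                    [0.5, 0.2]])]   # core-periphery

print(mag_edge_prob([0, 0], [0, 1], thetas))   # 0.9 * 0.5 = 0.45
```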
Make this statement more precise; explain it better.
The initiator matrix K_1 acts like an affinity matrix. Probability of a link between nodes u and v:

    P(u, v) = ∏_{i=1}^{k} K_1[A_u(i), A_v(i)]

where A_u(i) is the i-th bit of u's binary node id. Example, with K_1 = [a b; c d]:

    A_{v_2} = (0, 1),  A_{v_3} = (1, 0)  ⟹  P(v_2, v_3) = b · c
Each node in a Kronecker graph has a node id (e.g., 0, ..., 2^l - 1). The binary representation of the node id is its attribute vector in a MAG model. Then the (stochastic) adjacency matrices of the two models are equivalent.
Example, with K = [a b; c d] on nodes v_0, ..., v_3:

    K ⊗ K = a·K  b·K
            c·K  d·K

    a(v_1) = [0 1],  a(v_2) = [1 0]  ⟹  P(v_1, v_2) = b · c
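The equivalence is easy to check numerically: every entry of the Kronecker power equals the MAG product over the bits of the two node ids (the initiator values are the ones used earlier in the lecture):

```python
import numpy as np

K1 = np.array([[0.5, 0.2],
               [0.1, 0.3]])   # initiator, reused as the single affinity matrix
k = 3
theta_k = K1
for _ in range(k - 1):
    theta_k = np.kron(theta_k, K1)   # 8x8 Kronecker power

def mag_prob(u, v, K1, k):
    """MAG view: the binary node id is the attribute vector."""
    p = 1.0
    for i in range(k):
        p *= K1[(u >> i) & 1, (v >> i) & 1]
    return p

for u in range(2 ** k):
    for v in range(2 ** k):
        assert np.isclose(theta_k[u, v], mag_prob(u, v, K1, k))
print("Kronecker power and MAG probabilities agree")
```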
Two ingredients of the Kronecker model:
(1) Each of the 2^k nodes has a unique binary vector of length k (the node id expressed in binary is the vector).
(2) The initiator matrix K_1.
Question: what if ingredient (1) is dropped? I.e., do we need high variability of feature vectors?
Adjacency matrices: