Modeling networks using
Kronecker multiplication
Jure Leskovec
Machine Learning Department
Carnegie Mellon University
[email protected]
http://www.cs.cmu.edu/~jure/
Introduction
• Graphs are everywhere
• What can we do with
graphs?
– What patterns or
“laws” hold for most
real-world graphs?
– Can we build models
of graph generation
and evolution?
– Can we fit these
models to real
networks?
Example networks: Web & citations, Internet, needle exchange, yeast protein interactions
Traditional approach
• Sociologists were first to study networks:
– Study of patterns of connections between people to
understand functioning of the society
– People are nodes, interactions are edges
– Questionnaires are used to collect link data (hard to
obtain, inaccurate, subjective)
– Typical questions: Centrality and connectivity
• Limited to small graphs (~10 nodes) and
properties of individual nodes and edges
New approach (1)
• Large networks (e.g., web, internet, on-line
social networks) with millions of nodes
• Many traditional questions not useful anymore:
– Traditional: What happens if a node u is removed?
– Now: What percentage of nodes needs to be
removed to affect network connectivity?
• Focus moves from a single node to study of
statistical properties of the network as a whole
• Cannot draw (plot) the network and examine it
New approach (2)
• What does the network “look like”, even if I can’t
look at it?
• Need statistical methods and tools to quantify
large networks
• 3 parts/goals:
– Statistical properties of large networks
– Models that help understand these properties
– Predict behavior of networked systems based on
measured structural properties and local rules
governing individual nodes
Outline
• Introduction
• Properties of real-world networks
– Properties of static networks
– Properties of dynamic (evolving) networks
• Proposed graph generation model
– Kronecker Graphs
• Properties of Kronecker Graphs
• Fitting Kronecker Graphs
• Experiments
• Observations and Conclusion
Statistical properties of networks
• Features that are common to networks of
different types:
– Properties of static networks:
• Small-world effect
• Transitivity or clustering
• Degree distributions (scale-free networks)
• Network resilience
• Community structure
• Subgraphs or motifs
– Temporal properties:
• Densification
• Shrinking diameter
Small-world effect (1)
• Six degrees of separation (Milgram, 1960s)
– Random people in Nebraska were asked to send letters
to a stockbroker in Boston
– Letters could only be passed to first-name acquaintances
– Only about 25% of the letters reached the target
– But those that did arrived in about 6 steps
• Measuring path lengths:
– Diameter (longest shortest path): max_{ij} dij
– Effective diameter: distance at which 90% of all
connected pairs of nodes can be reached
– Mean geodesic (shortest) distance ℓ, averaged over
connected pairs (or a variant that handles disconnected pairs)
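As a rough illustration (not from the original slides), these path-length statistics could be estimated by sampling node pairs, e.g. with NetworkX; the 90th-percentile rule for the effective diameter is the one defined above:

```python
import random
import networkx as nx

def path_length_stats(G, n_samples=1000):
    """Estimate diameter, effective diameter and mean geodesic distance
    from sampled shortest-path lengths between connected node pairs."""
    nodes = list(G.nodes())
    dists = []
    for _ in range(n_samples):
        u, v = random.sample(nodes, 2)
        if nx.has_path(G, u, v):
            dists.append(nx.shortest_path_length(G, u, v))
    dists.sort()
    diameter = dists[-1]                          # longest sampled shortest path
    effective = dists[int(0.9 * len(dists)) - 1]  # 90% of connected pairs reached within this
    mean_geodesic = sum(dists) / len(dists)
    return diameter, effective, mean_geodesic

# Example on a synthetic small-world graph
G = nx.watts_strogatz_graph(n=1000, k=6, p=0.1)
print(path_length_stats(G))
```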
Small-world effect (2)
• Distribution of shortest path lengths
• Microsoft Messenger network:
– 180 million people
– 1.3 billion edges
– Edge if two people exchanged at least one message in a one-month period
[Figure: pick a random node and count how many nodes are at distance 1, 2, 3, … hops; number of nodes (log scale) vs. distance in hops]
Degree distributions (1)
• Let pk denote the fraction of nodes with degree k
• We can plot a histogram of pk vs. k
• In an (Erdős–Rényi) random graph the degree distribution
follows a Poisson distribution
• Degrees in real networks are heavily skewed to the right
• The distribution has a long tail of values that are far above
the mean
• Heavy (long) tails appear elsewhere too:
– Amazon sales
– word length distribution, …
Degree distributions (2)
[Figure: degree distribution on log-log axes, log(pk) vs. log(k): many low-degree nodes, few high-degree nodes]
Degree distributions (3)
• Many real-world networks contain hubs: highly connected nodes
• We can easily distinguish between an exponential and a
power-law tail by plotting pk on log-lin and log-log axes
• We usually work with the CDF instead of the PDF
(then the degree exponent is α = slope + 1)
• In scale-free networks the maximum degree scales as n^(1/(α-1))
[Figure: degree distribution pk of a blog network plotted on lin-lin, log-lin, and log-log axes]
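As a sketch (my own illustration, not from the slides), the degree distribution and its complementary CDF can be plotted on log-log axes like this; a Barabási–Albert graph stands in for a real scale-free network:

```python
import collections
import matplotlib.pyplot as plt
import networkx as nx

G = nx.barabasi_albert_graph(10_000, 3)       # stand-in for a real scale-free network
n = G.number_of_nodes()

counts = collections.Counter(d for _, d in G.degree())
ks = sorted(counts)
pk = [counts[k] / n for k in ks]                               # PDF: fraction of nodes with degree k
ccdf = [sum(counts[j] for j in ks if j >= k) / n for k in ks]  # P(degree >= k)

plt.loglog(ks, pk, 'o', label='p_k (PDF)')
plt.loglog(ks, ccdf, 's', label='CCDF')
plt.xlabel('k')
plt.ylabel('fraction of nodes')
plt.legend()
plt.show()
```

On these axes a power-law tail appears as a straight line, and, as noted above, the degree exponent α can be recovered from the CCDF slope as α = slope + 1.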
Poisson vs. Scale-free network
• Poisson network (Erdős–Rényi random graph): degree distribution is Poisson
• Scale-free (power-law) network: degree distribution is a power law
• A function is scale free if: f(ax) = b f(x)
Spectral properties
• Scree plot:
– Eigenvalues of the graph adjacency matrix, sorted by rank, follow a power law
– Network values (components of the principal eigenvector) also follow a power law
[Figure: scree plot, eigenvalue vs. rank]
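As a sketch (my illustration), the scree plot and network values can be computed from the adjacency matrix's eigenvalues and principal eigenvector:

```python
import numpy as np
import networkx as nx

G = nx.barabasi_albert_graph(2_000, 3)          # stand-in for a real network
A = nx.to_numpy_array(G)                        # symmetric adjacency matrix

eigvals, eigvecs = np.linalg.eigh(A)
order = np.argsort(-np.abs(eigvals))
scree = np.abs(eigvals[order])                  # eigenvalues sorted by rank (scree plot)
network_values = np.sort(np.abs(eigvecs[:, order[0]]))[::-1]  # principal eigenvector components

print(scree[:10])
print(network_values[:10])
```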
Temporal Graph Patterns
• Conventional Wisdom:
– Constant average degree: the number of edges
grows linearly with the number of nodes
– Slowly growing diameter: as the network grows the
distances between nodes grow
• We recently found:
– Densification Power Law: networks are becoming
denser over time
– Shrinking Diameter: diameter is decreasing as the
network grows
Temporal Patterns –
Densification
• A very basic question: What is the relation between the
number of nodes and the number of edges in a network?
• Densification Power Law: E(t) ∝ N(t)^a
• Suppose that
– N(t) … nodes at time t
– E(t) … edges at time t
– N(t+1) = 2 · N(t)
• Q: what is your guess for E(t+1)? 2 · E(t)?
• A: more than doubled!
– But obeying the Densification Power Law
[Figure: log-log plot of E(t) vs. N(t); the fitted slope is a = 1.69]
Networks over time: Densification
• Networks are becoming denser over time
• The number of edges grows faster than the number of
nodes – the average degree is increasing
• E(t) ∝ N(t)^a
• a … densification exponent, 1 ≤ a ≤ 2:
– a = 1: linear growth – constant out-degree (assumed in the literature so far)
– a = 2: quadratic growth – clique
[Figures: E(t) vs. N(t) on log-log axes for the Internet (a = 1.2) and Citations (a = 1.7)]
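A minimal sketch (with made-up snapshot values) of how the densification exponent a can be estimated: fit a straight line to log E(t) vs. log N(t) and read off the slope.

```python
import numpy as np

# Hypothetical snapshots of a growing network: (nodes, edges) at successive times
snapshots = [(1_000, 5_000), (2_000, 13_000), (4_000, 34_000), (8_000, 88_000)]

N = np.array([n for n, _ in snapshots], dtype=float)
E = np.array([e for _, e in snapshots], dtype=float)

# Densification Power Law: E(t) ~ N(t)^a, i.e. log E = a * log N + const
a, const = np.polyfit(np.log(N), np.log(E), 1)
print(f"densification exponent a ≈ {a:.2f}")   # roughly 1.4 for these made-up numbers
```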
Densification & degree distribution
• How does densification affect the degree distribution pk ∝ k^(–γ)?
• Given densification exponent a, the degree exponent behaves as follows:
– (a) For γ = const over time, we obtain densification only for 1 < γ < 2, then γ = a/2
– (b) For γ < 2 the degree distribution has to evolve over time
• Power law: y = b·x^(–γ); for γ < 2, E[y] = ∞
[Figures: degree exponent γ(t) over time for networks with a = 1.1 and a = 1.6]
Shrinking diameters
• Intuition and prior work say
that distances between the
nodes slowly grow as the
network grows (like log n):
– d ~ O(log N)
– d ~ O(log log N)
• Diameter Shrinks/Stabilizes
over time
– as the network grows the
distances between nodes
slowly decrease
[Figures: effective diameter over time for the Internet and Citations networks]
Patterns hold in many graphs
• All these patterns can be observed in many
real life graphs:
– World wide web [Barabasi]
– On-line communities [Holme, Edling, Liljeros]
– Who-calls-whom telephone networks [Cortes]
– Autonomous systems [Faloutsos, Faloutsos, Faloutsos]
– Internet backbone – routers [Faloutsos, Faloutsos, Faloutsos]
– Movie – actors [Barabasi]
– Science citations [Leskovec, Kleinberg, Faloutsos]
– Co-authorship [Leskovec, Kleinberg, Faloutsos]
– Sexual relationships [Liljeros]
– Click-streams [Chakrabarti]
Outline
• Introduction
• Properties of real-world networks
– Properties of static networks
– Properties of dynamic (evolving) networks
• Proposed graph generation model
– Kronecker Graphs
• Properties of Kronecker Graphs
• Fitting Kronecker Graphs
• Experiments
• Observations and Conclusion
Graph Generators
• Lots of work
– Random graph [Erdos and Renyi, 60s]
– Preferential Attachment [Albert and Barabasi, 1999]
– Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan
and Tomkins, 1999]
– Community Guided Attachment and Forest Fire Model
[Leskovec, Kleinberg and Faloutsos, 2005]
– Also work on Web graph and virus propagation [Ganesh et al,
Satorras and Vespignani]++
• But all of these
– Do not obey all the patterns
– Or we are not able to prove them
Kronecker graphs
• Want to have a model that can generate a
realistic graph:
– Static Patterns
• Power Law Degree Distribution
• Small Diameter
• Power Law Eigenvalue and Eigenvector Distribution
– Temporal Patterns
• Densification Power Law
• Shrinking/Constant Diameter
• For Kronecker graphs all these properties can
actually be proven
Recursive Graph Generation
• There are many obvious (but wrong) ways
[Figure: an initial graph and its recursive expansion]
– Does not obey the Densification Power Law
– Has an increasing diameter
• The Kronecker Product is exactly what we need
Kronecker Product – a Graph
[Figure: intermediate stage of the Kronecker product of a graph with itself, with the corresponding adjacency matrices]
Kronecker Product – a Graph
• Continuing to multiply with G1 we obtain G4 and so on …
[Figure: G4 adjacency matrix]
Kronecker Graphs – Formally:
• We create the self-similar graphs
recursively:
– Start with an initiator graph G1 on N1
nodes and E1 edges
– The recursion then produces larger
graphs G2, G3, … Gk on N1^k nodes
– Since we want to obey the Densification
Power Law, graph Gk has to have E1^k
edges
Kronecker Product – Definition
• The Kronecker product of an N×M matrix A and a K×L matrix B
is the (N·K)×(M·L) matrix obtained by replacing each entry aij of A
with the block aij·B
• We define the Kronecker product of two graphs as
the Kronecker product of their adjacency matrices
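Since the graph Kronecker product is just the Kronecker product of the adjacency matrices, a minimal sketch with NumPy (the 3-node initiator is an arbitrary example; self-loops on the diagonal keep the recursion connected):

```python
import numpy as np

# Initiator graph G1: 3-node adjacency matrix with self-loops on the diagonal
G1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])

G2 = np.kron(G1, G1)       # 9 x 9 adjacency matrix
G3 = np.kron(G2, G1)       # 27 x 27 adjacency matrix
print(G2.shape, G3.shape)  # (9, 9) (27, 27)
```

Each multiplication grows the node count from N to N·N1 and the edge count from E to E·E1, which is the exponential growth described on the next slide.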
Kronecker Graphs
• We propose a growing sequence of
graphs by iterating the Kronecker
product
• Each Kronecker multiplication
exponentially increases the size of the
graph
Kronecker Graphs – Intuition
• Intuition:
– Recursive growth of graph communities
– Nodes get expanded to micro communities
– Nodes in sub-community link among themselves and
to nodes from different communities
How to randomize a graph?
• We want a randomized version of
Kronecker Graphs
• Obvious solution:
– Randomly add/remove some edges
• Wrong! This is not biased:
– adding random edges destroys the degree
distribution, diameter, …
• We want to add/delete edges in a biased way
• How do we randomize properly and maintain all
the properties?
Stochastic Kronecker Graphs
• Create N1N1 probability matrix P1
• Compute the kth Kronecker power Pk
• For each entry puv of Pk include an
edge (u,v) with probability puv
0.5 0.2
0.1 0.3
P1
Kronecker 0.25 0.10 0.10 0.04
multiplication
0.05 0.15 0.02 0.06
0.05 0.02 0.15 0.06
0.01 0.03 0.03 0.09
P2
Instance
Matrix G2
flip biased
coins
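A naive sketch of this sampling procedure (one biased coin per entry of Pk, so O(N²) work; the seed values are the ones from the example above):

```python
import numpy as np

P1 = np.array([[0.5, 0.2],
               [0.1, 0.3]])             # 2x2 probability (parameter) matrix

def stochastic_kronecker(P1, k, seed=0):
    """Sample one adjacency matrix from the k-th Kronecker power of P1."""
    rng = np.random.default_rng(seed)
    Pk = P1.copy()
    for _ in range(k - 1):
        Pk = np.kron(Pk, P1)                         # k-th Kronecker power
    return (rng.random(Pk.shape) < Pk).astype(int)   # flip a biased coin per entry

G2 = stochastic_kronecker(P1, k=2)      # 4x4 instance matrix
print(G2)
```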
Outline
• Introduction
• Properties of real-world networks
– Properties of static networks
– Properties of dynamic (evolving) networks
• Proposed graph generation model
– Kronecker Graphs
• Properties of Kronecker Graphs
• Fitting Kronecker Graphs
• Experiments
• Observations and Conclusion
Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that
will obey all the patterns
– Static Patterns
✓ Power Law Degree Distribution
✓ Power Law eigenvalue and eigenvector distribution
✓ Small Diameter
– Dynamic Patterns
✓ Growth Power Law
✓ Shrinking/Stabilizing Diameters
– The first and only generator for which we can
prove all these properties
Properties of Kronecker Graphs
• Theorem: Kronecker Graphs have
Multinomial in- and out-degree distribution
(which can be made to behave like a Power Law)
• Proof:
– Let G1 have degrees d1, d2, …, dN1
– Kronecker multiplication with a node of degree d
gives degrees d·d1, d·d2, …, d·dN1
– After Kronecker powering, Gk has a multinomial
degree distribution
Eigen-value/-vector Distribution
• Theorem: The Kronecker Graph has
multinomial distribution of its eigenvalues
• Theorem: The components of each
eigenvector in Kronecker Graph follow a
multinomial distribution
• Proof: Trivial by properties of Kronecker
multiplication
Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that
will obey all the patterns
– Static Patterns
✓ Power Law Degree Distribution
✓ Power Law eigenvalue and eigenvector distribution
✓ Small Diameter
– Dynamic Patterns
• Growth Power Law
• Shrinking/Stabilizing Diameters
Temporal Patterns: Densification
• Theorem: Kronecker graphs follow the
Densification Power Law with densification
exponent a = log(E1) / log(N1)
• Proof:
– If G1 has N1 nodes and E1 edges, then Gk has
Nk = N1^k nodes and Ek = E1^k edges
– And then Ek = Nk^a
– Which is the Densification Power Law
Constant Diameter – Proof Sketch
• Theorem: If G1 has diameter d then graph
Gk also has diameter d
• Observation: edges in Kronecker graphs come in the form
((Xi, Xj), (Xk, Xl)) is an edge of the product iff (Xi, Xk) and (Xj, Xl) are edges of the factor graphs,
where X are appropriate nodes
• Example: [figure]
Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that
will obey all the patterns
– Static Patterns
✓ Power Law Degree Distribution
✓ Power Law eigenvalue and eigenvector distribution
✓ Small Diameter
– Dynamic Patterns
✓ Growth Power Law
✓ Shrinking/Stabilizing Diameters
– The first and only generator for which we can
prove all these properties
Outline
• Introduction
• Properties of real-world networks
– Properties of static networks
– Properties of dynamic (evolving) networks
• Proposed graph generation model
– Kronecker Graphs
• Properties of Kronecker Graphs
• Fitting Kronecker Graphs
• Experiments
• Observations and Conclusion
Why fit generative models?
• Parameters tell us about the structure of a graph
• Extrapolation: given a graph today, how will it
look in a year?
• Sampling: can I get a smaller graph with similar
properties?
• Anonymization: instead of releasing real graph
(e.g., email network), we can release a synthetic
version of it
Problem definition
• Find the parameter matrix Θ that
maximizes the likelihood P(G|Θ)
• We need to (efficiently) calculate
P(G|Θ)
• And maximize over Θ (by using
gradient descent)
• (so we also need the gradient of the likelihood)
Fitting Kronecker to Real Data
• Given a graph G and a Kronecker parameter matrix Θ we
can calculate the probability that Θ generated G,
P(G|Θ):
Θ:
0.5 0.2
0.1 0.3
Θk:
0.25 0.10 0.10 0.04
0.05 0.15 0.02 0.06
0.05 0.02 0.15 0.06
0.01 0.03 0.03 0.09
G (adjacency matrix):
1 1 0 0
1 1 1 0
0 1 1 1
0 0 1 1
P(G|Θ) = Π_{(u,v)∈G} Θk[u,v] · Π_{(u,v)∉G} (1 − Θk[u,v])
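As an illustration (my sketch, before any of the speed-ups discussed later), this likelihood can be evaluated naively by walking over all N² entries:

```python
import numpy as np

def log_likelihood_naive(G, theta, k):
    """log P(G|Theta) for a fixed node ordering: one Bernoulli factor per entry."""
    Pk = theta.copy()
    for _ in range(k - 1):
        Pk = np.kron(Pk, theta)                  # Theta_k
    # Edges contribute log Theta_k[u,v], non-edges log(1 - Theta_k[u,v])
    return np.sum(np.where(G == 1, np.log(Pk), np.log(1.0 - Pk)))

theta = np.array([[0.5, 0.2],
                  [0.1, 0.3]])
G = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
print(log_likelihood_naive(G, theta, k=2))
```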
Challenge 1: Node labeling
• Graphs G′ and G″ (the same graph under two
different node labelings) should have the same probability:
P(G′|Θ) = P(G″|Θ)
• So one needs to consider all node labelings σ:
P(G|Θ) = Σσ P(G|Θ,σ) P(σ)
• There are O(N!) such labelings
• All labelings are a priori equally likely
[Figure: Θ, its Kronecker power Θk, and the adjacency matrices of the same 4-node graph under two node orderings G′ and G″]
Challenge 2: calculating P(G|Θ,σ)
• Calculating P(G|Θ,σ), where σ is the node labeling and P = Θk:
P(G|Θ,σ) = Π_{(u,v)∈G} Θk[σu,σv] · Π_{(u,v)∉G} (1 − Θk[σu,σv])
• Takes O(N²) time. Infeasible for large graphs
Θk:
0.25 0.10 0.10 0.04
0.05 0.15 0.02 0.06
0.05 0.02 0.15 0.06
0.01 0.03 0.03 0.09
G:
1 1 0 0
1 1 1 0
0 1 1 1
0 0 1 1
Our solutions
• Naïvely calculating P(G|Θ) takes O(N!·N²) time
• We can do it in O(E)
• Solutions
– Challenge 1:
• We won't consider all labelings
• But use Markov Chain Monte Carlo (MCMC) sampling
techniques to sample permutations from P(σ|G,Θ)
– Challenge 2:
• Real graphs are sparse: E << N²
• Calculate P(Gempty|Θ) and then “add” the edges
• This takes O(E) (and not O(N²))
Sampling node labelings (1)
• Gradient over parameters:
∂/∂Θ log P(G|Θ) = Σσ P(σ|G,Θ) · ∂/∂Θ log P(G|Θ,σ)
• Sample the permutations σ from P(σ|G,Θ) and average the gradients
Sampling node labelings (2)
Metropolis permutation sampling algorithm:
• Propose swapping the labels of two nodes j and k
• Need to efficiently calculate the likelihood ratios
• But the permutations σ(i) and σ(i+1) only differ at the 2
swapped positions
• So we only need to traverse and update 2 rows (columns)
of Θk
• We can evaluate the likelihood ratio efficiently
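A rough sketch of such a Metropolis sampler over node labelings (my own simplification: the likelihood ratio is recomputed from scratch here, whereas the actual algorithm only touches the two swapped rows/columns of Θk):

```python
import numpy as np

def log_lik(G, Pk, sigma):
    """log P(G | Theta, sigma): relabel G's rows/columns by sigma, then compare to Pk."""
    Gp = G[np.ix_(sigma, sigma)]
    return np.sum(np.where(Gp == 1, np.log(Pk), np.log(1.0 - Pk)))

def metropolis_permutations(G, Pk, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    sigma = np.arange(n)                          # start from the identity labeling
    ll = log_lik(G, Pk, sigma)
    samples = []
    for _ in range(n_steps):
        j, k = rng.choice(n, size=2, replace=False)
        prop = sigma.copy()
        prop[j], prop[k] = prop[k], prop[j]       # propose swapping two node labels
        ll_prop = log_lik(G, Pk, prop)
        if np.log(rng.random()) < ll_prop - ll:   # Metropolis accept/reject
            sigma, ll = prop, ll_prop
        samples.append(sigma.copy())
    return samples                                # approximate draws from P(sigma|G,Theta)
```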
Calculating P(G|Θ,σ)
• Real graphs are sparse, so we first
calculate the likelihood of the empty graph
• The probability of edge (i,j) is in general
pij = θ1^a θ2^b θ3^c θ4^d (for a 2×2 Θ; the exponents count
how often each parameter occurs across the k recursion levels)
• By using a Taylor approximation to pij
and summing the multinomial series
we obtain the likelihood of the empty graph
• We then approximate the full likelihood by
correcting the terms for the edges that are present
• Taylor approximation:
log(1 − x) ≈ −x − 0.5x²
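A hedged sketch of the resulting O(E) evaluation: under log(1−x) ≈ −x − 0.5x², the empty-graph term collapses because the sums of pij and pij² over all entries of Θk equal (Σθ)^k and (Σθ²)^k; the E present edges are then corrected individually. Naming and structure are my own.

```python
import numpy as np

def edge_prob(theta, k, u, v):
    """p_uv: product over the k recursion levels of the corresponding theta entries."""
    n1 = theta.shape[0]
    p = 1.0
    for _ in range(k):
        p *= theta[u % n1, v % n1]
        u //= n1
        v //= n1
    return p

def log_likelihood_fast(edges, theta, k):
    """Approximate log P(G|Theta) in O(E) time for a fixed node labeling."""
    # Empty-graph term: sum over all entries of log(1 - p_ij)
    # ~ sum of (-p_ij - 0.5 p_ij^2) = -(sum theta)^k - 0.5 (sum theta^2)^k
    ll = -np.sum(theta) ** k - 0.5 * np.sum(theta ** 2) ** k
    for u, v in edges:
        p = edge_prob(theta, k, u, v)
        # Remove the approximate non-edge term for (u,v) and add the exact edge term
        ll += np.log(p) + p + 0.5 * p ** 2
    return ll

theta = np.array([[0.5, 0.2],
                  [0.1, 0.3]])
edges = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1),
         (2, 2), (2, 3), (3, 2), (3, 3)]   # edge list of the example graph G
print(log_likelihood_fast(edges, theta, k=2))
```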
Convergence of fitting
• Can gradient descent recover the true parameters?
• How nice (smooth, without local minima) is the optimization
space?
– Generate a graph from random parameters
– Start at a random point and use gradient descent
– We recover the true parameters 98% of the time
• How does the algorithm converge to the true parameters with
gradient descent iterations?
[Figures: log-likelihood, average absolute error, 1st eigenvalue, and diameter vs. gradient descent iterations]
AS graph (N=6500, E=26500)
[Figures: degree distribution, adjacency matrix eigenvalues, hop plot, and network values]
Epinions graph (N=76k, E=510k)
[Figures: degree distribution, hop plot, adjacency matrix eigenvalues, and network values]
Scalability
• Fitting scales linearly with the
number of edges
Model selection
• How big should the parameter matrix Θ be?
• We propose to use the Bayesian Information
Criterion (BIC)
• We trade off between
the model fit and the
model complexity
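As an illustrative sketch only, using the textbook form BIC = −2·log L + (#parameters)·log(#observations); the exact counts used for the penalty (e.g. whether observations are the E edges or all N² possible entries) are a modeling choice, and the numbers below are placeholders, not results from the talk.

```python
import numpy as np

def bic(log_likelihood, n1, n_obs):
    """Textbook BIC; an n1 x n1 Kronecker initiator has n1*n1 free parameters."""
    n_params = n1 * n1
    return -2.0 * log_likelihood + n_params * np.log(n_obs)

# Compare initiator sizes with placeholder log-likelihoods
for n1, ll in [(2, -150_000.0), (3, -149_500.0)]:
    print(n1, bic(ll, n1, n_obs=26_500))
```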
Conclusion
• We proposed Kronecker Graphs
– We can prove properties of the Kronecker Graph
model
• We presented scalable algorithms for fitting
Kronecker Graphs
– Use simulation (MCMC) techniques to overcome the super-exponential number of node labelings
– Use a Taylor approximation to quickly evaluate the
likelihood
• Kronecker Graphs fit real graphs well
References
– Graph Evolution: Densification and Shrinking Diameters, by Jure
Leskovec, Jon Kleinberg and Christos Faloutsos, ACM TKDD 2007
– Realistic, Mathematically Tractable Graph Generation and Evolution,
Using Kronecker Multiplication, by Jure Leskovec, Deepayan Chakrabarti,
Jon Kleinberg and Christos Faloutsos, PKDD 2005
– Scalable Modeling of Real Graphs using Kronecker Multiplication, by
Jure Leskovec and Christos Faloutsos, in submission