Modeling networks using Kronecker multiplication
Jure Leskovec
Machine Learning Department, Carnegie Mellon University
[email protected]
http://www.cs.cmu.edu/~jure/

Introduction
• Graphs are everywhere
• What can we do with graphs?
  – What patterns or “laws” hold for most real-world graphs?
  – Can we build models of graph generation and evolution?
  – Can we fit these models to real networks?
[Examples pictured: Web & citations, Internet, needle exchange, yeast protein interactions]

Traditional approach
• Sociologists were the first to study networks:
  – Study of the patterns of connections between people to understand the functioning of society
  – People are nodes, interactions are edges
  – Questionnaires are used to collect link data (hard to obtain, inaccurate, subjective)
  – Typical questions: centrality and connectivity
• Limited to small graphs (~10 nodes) and to properties of individual nodes and edges

New approach (1)
• Large networks (e.g., web, internet, on-line social networks) with millions of nodes
• Many traditional questions are no longer useful:
  – Traditional: What happens if a node u is removed?
  – Now: What percentage of nodes needs to be removed to affect network connectivity?
• The focus moves from a single node to the statistical properties of the network as a whole
• We cannot draw (plot) the network and examine it

New approach (2)
• What does the network “look like” even if I can’t look at it?
• Need statistical methods and tools to quantify large networks
• Three parts/goals:
  – Statistical properties of large networks
  – Models that help understand these properties
  – Predict the behavior of networked systems based on measured structural properties and the local rules governing individual nodes

Outline
• Introduction
• Properties of real-world networks
  – Properties of static networks
  – Properties of dynamic (evolving) networks
• Proposed graph generation model – Kronecker Graphs
• Properties of Kronecker Graphs
• Fitting Kronecker Graphs
• Experiments
• Observations and Conclusion

Statistical properties of networks
• Features that are common to networks of different types:
  – Properties of static networks:
    • Small-world effect
    • Transitivity or clustering
    • Degree distributions (scale-free networks)
    • Network resilience
    • Community structure
    • Subgraphs or motifs
  – Temporal properties:
    • Densification
    • Shrinking diameter

Small-world effect (1)
• Six degrees of separation (Milgram, 1960s)
  – Random people in Nebraska were asked to send letters to a stockbroker in Boston
  – Letters could only be passed to first-name acquaintances
  – Only 25% of the letters reached the goal
  – But those that did reached it in about 6 steps
• Measuring path lengths:
  – Diameter (longest shortest path): max d_ij
  – Effective diameter: distance at which 90% of all connected pairs of nodes can be reached
  – Mean geodesic (shortest) distance ℓ

Small-world effect (2)
• Distribution of shortest path lengths
• Microsoft Messenger network
  – 180 million people
  – 1.3 billion edges
  – Edge if two people exchanged at least one message in a one-month period
• Pick a random node, count how many nodes are at distance 1, 2, 3, … hops
[Figure: number of nodes vs. distance in hops]
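To make the hop-count measurement concrete, here is a minimal sketch (not from the original slides) of the BFS procedure it describes, assuming the graph is given as a plain adjacency dict; `hop_counts`, `effective_diameter`, and the sample-size parameter are illustrative names, and for a graph the size of the Messenger network one would of course only sample source nodes.

```python
import random
from collections import deque

def hop_counts(adj, source):
    """BFS from source; return {hop distance: number of nodes at that distance}."""
    dist = {source: 0}
    counts = {0: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                counts[dist[v]] = counts.get(dist[v], 0) + 1
                queue.append(v)
    return counts

def effective_diameter(adj, n_sources=100, q=0.9):
    """Distance at which a fraction q of sampled connected pairs is reached."""
    distances = []
    for s in random.sample(list(adj), min(n_sources, len(adj))):
        for d, c in hop_counts(adj, s).items():
            if d > 0:
                distances.extend([d] * c)
    distances.sort()
    return distances[int(q * (len(distances) - 1))]

# Toy undirected graph as an adjacency dict
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
print(hop_counts(adj, 0))        # {0: 1, 1: 2, 2: 1, 3: 1}
print(effective_diameter(adj))   # 90th-percentile hop distance over sampled pairs
```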
Degree distributions (1)
• Let p_k denote the fraction of nodes with degree k
• We can plot a histogram of p_k vs. k
• In an (Erdos-Renyi) random graph the degree distribution follows a Poisson distribution
• Degrees in real networks are heavily skewed to the right
• The distribution has a long tail of values that are far above the mean
• Heavy (long) tails appear elsewhere too:
  – Amazon sales
  – word length distribution, …

Degree distributions (2)
[Figure: log(p_k) vs. log(k) – many low-degree nodes, few high-degree nodes]

Degree distributions (3)
• Many real-world networks contain hubs: highly connected nodes
• We can easily distinguish between an exponential and a power-law tail by plotting on log-lin and log-log axes
• We usually work with the CDF instead of the PDF p_k (then the degree exponent is α = slope + 1)
• In scale-free networks the maximum degree scales as n^(1/(α−1))
[Figure: degree distribution of a blog network on lin-lin, log-lin, and log-log axes]

Poisson vs. scale-free network
• Poisson network (Erdos-Renyi random graph): degree distribution is Poisson
• Scale-free (power-law) network: degree distribution is a power law
• A function is scale-free if: f(ax) = b f(x)

Spectral properties
• Scree plot
  – Eigenvalues of the graph adjacency matrix follow a power law
  – Network values (components of the principal eigenvector) also follow a power law
[Figure: scree plot – eigenvalue vs. rank]

Temporal Graph Patterns
• Conventional wisdom:
  – Constant average degree: the number of edges grows linearly with the number of nodes
  – Slowly growing diameter: as the network grows, the distances between nodes grow
• We recently found:
  – Densification Power Law: networks are becoming denser over time
  – Shrinking diameter: the diameter is decreasing as the network grows

Temporal Patterns – Densification
• A very basic question: what is the relation between the number of nodes and the number of edges in a network?
• Suppose that
  – N(t) … nodes at time t
  – E(t) … edges at time t
• Densification Power Law: E(t) ∝ N(t)^a
• Q: if N(t+1) = 2 · N(t), what is your guess for E(t+1)? 2 · E(t)?
• A: more than doubled! – but obeying the Densification Power Law
[Figure: E(t) vs. N(t) on log-log axes, slope a = 1.69]

Networks over time: Densification
• Networks are becoming denser over time
• The number of edges grows faster than the number of nodes – the average degree is increasing
• a … densification exponent, 1 ≤ a ≤ 2:
  – a = 1: linear growth – constant out-degree (assumed in the literature so far)
  – a = 2: quadratic growth – clique
[Figure: E(t) vs. N(t) for the Internet (a = 1.2) and Citations (a = 1.7)]
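The densification exponent a is simply the slope of log E(t) vs. log N(t). A minimal sketch of that fit (not from the slides; the snapshot numbers below are hypothetical):

```python
import math

def densification_exponent(n_nodes, n_edges):
    """Least-squares slope of log E(t) vs. log N(t), i.e. the exponent a in E(t) ~ N(t)^a."""
    xs = [math.log(n) for n in n_nodes]
    ys = [math.log(e) for e in n_edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Hypothetical snapshots (N(t), E(t)) of a growing network
snapshots = [(1_000, 5_000), (2_000, 16_000), (4_000, 51_000), (8_000, 163_000)]
a = densification_exponent([n for n, _ in snapshots], [e for _, e in snapshots])
print(f"densification exponent a ~ {a:.2f}")   # between 1 (constant degree) and 2 (clique)
```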
Densification & degree distribution
• How does densification affect the degree distribution p_k ∝ k^(−γ)?
• Given densification exponent a, the degree exponent behaves in one of two ways:
  – (a) For γ constant over time, we obtain densification only for 1 < γ < 2; then γ = a/2
  – (b) For γ < 2 the degree distribution has to evolve over time, i.e., γ = γ(t)
• Recall that for a power-law distribution p(y) = b·y^(−γ) with γ < 2, the expectation E[y] diverges
[Figure: degree exponent γ(t) over time for a = 1.1 and a = 1.6]

Shrinking diameters
• Intuition and prior work say that distances between nodes slowly grow as the network grows (like log n):
  – d ~ O(log N)
  – d ~ O(log log N)
• Instead, the diameter shrinks or stabilizes over time – as the network grows, the distances between nodes slowly decrease
[Figure: effective diameter over time for the Internet and Citations graphs]

Patterns hold in many graphs
• All these patterns can be observed in many real-life graphs:
  – World Wide Web [Barabasi]
  – On-line communities [Holme, Edling, Liljeros]
  – Who-calls-whom telephone networks [Cortes]
  – Autonomous systems [Faloutsos, Faloutsos, Faloutsos]
  – Internet backbone – routers [Faloutsos, Faloutsos, Faloutsos]
  – Movie – actors [Barabasi]
  – Science citations [Leskovec, Kleinberg, Faloutsos]
  – Co-authorship [Leskovec, Kleinberg, Faloutsos]
  – Sexual relationships [Liljeros]
  – Click-streams [Chakrabarti]

Outline
• Introduction
• Properties of real-world networks
  – Properties of static networks
  – Properties of dynamic (evolving) networks
• Proposed graph generation model – Kronecker Graphs
• Properties of Kronecker Graphs
• Fitting Kronecker Graphs
• Experiments
• Observations and Conclusion

Graph Generators
• Lots of prior work:
  – Random graph [Erdos and Renyi, 1960s]
  – Preferential attachment [Albert and Barabasi, 1999]
  – Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 1999]
  – Community Guided Attachment and the Forest Fire model [Leskovec, Kleinberg and Faloutsos, 2005]
  – Also work on the Web graph and virus propagation [Ganesh et al.; Satorras and Vespignani], and more
• But all of these either
  – do not obey all the patterns, or
  – we are not able to prove that they do

Kronecker graphs
• We want a model that can generate a realistic graph:
  – Static patterns
    • Power-law degree distribution
    • Small diameter
    • Power-law eigenvalue and eigenvector distribution
  – Temporal patterns
    • Densification Power Law
    • Shrinking/constant diameter
• For Kronecker graphs all of these properties can actually be proven

Recursive Graph Generation
• There are many obvious (but wrong) ways to recursively expand an initial graph:
  – such recursive expansion does not obey the Densification Power Law
  – and it has increasing diameter
• The Kronecker product is exactly what we need

Kronecker Product – a Graph
[Figure: initiator graph, intermediate stage, and the adjacency matrix of the Kronecker product]

Kronecker Product – a Graph
• Continuing to multiply with G1 we obtain G4 and so on …
[Figure: G4 adjacency matrix]

Kronecker Graphs – Formally
• We create the self-similar graphs recursively:
  – Start with an initiator graph G1 on N1 nodes and E1 edges
  – The recursion then produces larger graphs G2, G3, …, Gk on N1^k nodes
  – Since we want to obey the Densification Power Law, graph Gk has to have E1^k edges

Kronecker Product – Definition
• The Kronecker product of an N×M matrix A and a K×L matrix B is the (N·K)×(M·L) matrix A ⊗ B obtained by replacing each entry a_ij of A with the block a_ij · B
• We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices

Kronecker Graphs
• We propose a growing sequence of graphs obtained by iterating the Kronecker product
• Each Kronecker multiplication exponentially increases the size of the graph

Kronecker Graphs – Intuition
• Intuition:
  – Recursive growth of graph communities
  – Nodes get expanded into micro-communities
  – Nodes in a sub-community link among themselves and to nodes from different communities
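A minimal sketch of the recursive construction described in the last few slides, assuming the initiator is given as a NumPy adjacency matrix (the 3-node initiator below is illustrative, not from the slides):

```python
import numpy as np

def kronecker_power(G1, k):
    """Adjacency matrix of G_k: the k-th Kronecker power of the initiator G1."""
    G = G1.copy()
    for _ in range(k - 1):
        G = np.kron(G, G1)   # every node is expanded into a copy of the initiator
    return G

# Illustrative 3-node initiator with self-loops (N1 = 3, E1 = 7 nonzero entries)
G1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])

G3 = kronecker_power(G1, 3)
print(G3.shape)        # (27, 27): N1^k nodes
print(int(G3.sum()))   # 343 = E1^k nonzero entries
```

The node and edge counts N1^k and E1^k are exactly what the Densification Power Law argument above relies on.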
How to randomize a graph?
• We want a randomized version of Kronecker graphs
• Obvious solution:
  – randomly add/remove some edges
• Wrong! – it is not biased, and adding random edges destroys the degree distribution, the diameter, …
• We want to add/delete edges in a biased way
• How do we randomize properly and maintain all the properties?

Stochastic Kronecker Graphs
• Create an N1×N1 probability matrix P1
• Compute the k-th Kronecker power Pk
• For each entry p_uv of Pk, include the edge (u, v) with probability p_uv
• Example:
  P1 =
    0.5 0.2
    0.1 0.3
  Kronecker multiplication gives
  P2 = P1 ⊗ P1 =
    0.25 0.10 0.10 0.04
    0.05 0.15 0.02 0.06
    0.05 0.02 0.15 0.06
    0.01 0.03 0.03 0.09
  Flipping biased coins on the entries of P2 yields an instance (adjacency) matrix G2
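A minimal sketch of this sampling step, assuming the initiator probabilities fit in a NumPy array; note that it materializes the full Pk, so it is only practical for small k, whereas a production implementation would sample edges without ever building Pk explicitly.

```python
import numpy as np

def stochastic_kronecker(P1, k, seed=None):
    """Sample a graph: take the k-th Kronecker power of P1, then flip a biased coin per entry."""
    rng = np.random.default_rng(seed)
    P = P1.copy()
    for _ in range(k - 1):
        P = np.kron(P, P1)                 # P_k: entry (u, v) is the edge probability p_uv
    return (rng.random(P.shape) < P).astype(int)

P1 = np.array([[0.5, 0.2],
               [0.1, 0.3]])
G3 = stochastic_kronecker(P1, 3, seed=0)   # one random 8 x 8 instance
print(G3)
```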
Outline
• Introduction
• Properties of real-world networks
  – Properties of static networks
  – Properties of dynamic (evolving) networks
• Proposed graph generation model – Kronecker Graphs
• Properties of Kronecker Graphs
• Fitting Kronecker Graphs
• Experiments
• Observations and Conclusion

Problem Definition
• Given a growing graph with node counts N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
  – Static patterns: power-law degree distribution, power-law eigenvalue and eigenvector distribution, small diameter
  – Dynamic patterns: growth power law, shrinking/stabilizing diameters
• Kronecker graphs are the first and only generator for which we can prove all of these properties

Properties of Kronecker Graphs
• Theorem: Kronecker graphs have a multinomial in- and out-degree distribution (which can be made to behave like a power law)
• Proof:
  – Let G1 have degrees d1, d2, …, dN1
  – Kronecker multiplication with a node of degree d gives degrees d·d1, d·d2, …, d·dN1
  – After Kronecker powering, Gk therefore has a multinomial degree distribution

Eigenvalue/eigenvector distribution
• Theorem: a Kronecker graph has a multinomial distribution of its eigenvalues
• Theorem: the components of each eigenvector of a Kronecker graph follow a multinomial distribution
• Proof: trivial by the properties of Kronecker multiplication

Temporal Patterns: Densification
• Theorem: Kronecker graphs follow a Densification Power Law with densification exponent a = log E1 / log N1
• Proof:
  – If G1 has N1 nodes and E1 edges, then Gk has Nk = N1^k nodes and Ek = E1^k edges
  – Hence Ek = Nk^a with a = log E1 / log N1
  – which is a Densification Power Law

Constant Diameter – Proof Sketch
• Theorem: if G1 has diameter d, then graph Gk also has diameter d
• Observation: edges in Kronecker graphs decompose coordinate-wise – ((u1, …, uk), (v1, …, vk)) is an edge of Gk exactly when each (ui, vi) is an edge of G1, where the ui, vi are the appropriate nodes of G1
• Example: [figure omitted]

Outline
• Introduction
• Properties of real-world networks
  – Properties of static networks
  – Properties of dynamic (evolving) networks
• Proposed graph generation model – Kronecker Graphs
• Properties of Kronecker Graphs
• Fitting Kronecker Graphs
• Experiments
• Observations and Conclusion

Why fit generative models?
• Parameters tell us about the structure of a graph
• Extrapolation: given a graph today, how will it look in a year?
• Sampling: can I get a smaller graph with similar properties?
• Anonymization: instead of releasing the real graph (e.g., an email network), we can release a synthetic version of it

Problem definition
• Find the parameter matrix Θ that maximizes the likelihood: Θ* = arg max_Θ P(G | Θ)
• We need to (efficiently) calculate P(G | Θ)
• And maximize over Θ (using gradient descent)
• (so we also need the gradient)

Fitting Kronecker to Real Data
• Given a graph G and a Kronecker parameter matrix Θ, we can calculate the probability that Θ generated G:
  P(G | Θ) = ∏_{(u,v) ∈ G} Θk[u, v] · ∏_{(u,v) ∉ G} (1 − Θk[u, v])
• Example:
  Θ =
    0.5 0.2
    0.1 0.3
  Θk = Θ ⊗ Θ =
    0.25 0.10 0.10 0.04
    0.05 0.15 0.02 0.06
    0.05 0.02 0.15 0.06
    0.01 0.03 0.03 0.09
  G =
    1 1 0 0
    1 1 1 0
    0 1 1 1
    0 0 1 1

Challenge 1: node labeling
• Two different labelings G′ and G″ of the same graph should have the same probability: P(G′ | Θ) = P(G″ | Θ)
• So one needs to consider all node labelings σ:
  P(G | Θ) = Σ_σ P(G | Θ, σ) P(σ)
• There are O(N!) such labelings
• All labelings are a priori equally likely
[Figure: the same 4-node graph under two labelings G′ and G″ and their adjacency matrices]

Challenge 2: calculating P(G | Θ, σ)
• Even for a single labeling σ, evaluating P(G | Θ, σ) entry by entry over the full matrix P = Θk takes O(N²) time – infeasible for large graphs

Our solutions
• Naively calculating P(G | Θ) takes O(N! · N²) time
• We can do it in O(E)
• Solutions:
  – Challenge 1:
    • We won’t consider all labelings
    • Instead we use Markov chain Monte Carlo (MCMC) to sample permutations from P(σ | G, Θ)
  – Challenge 2:
    • Real graphs are sparse: E << N²
    • Calculate P(G_empty | Θ) and then “add” the edges
    • This takes O(E) (and not O(N²)) time

Sampling node labelings (1)
• The gradient of the log-likelihood over the parameters is an expectation over node labelings σ
• Sample permutations from P(σ | G, Θ) and average them

Sampling node labelings (2)
• Metropolis permutation sampling: propose a new permutation by swapping two positions j and k
• We need to efficiently calculate the likelihood ratios
• But the permutations σ(i) and σ(i+1) differ in only 2 positions
• So we only need to traverse 2 rows (columns) of Θk to update the likelihood
• Hence we can evaluate the likelihood ratio efficiently

Calculating P(G | Θ, σ)
• Real graphs are sparse, so we first calculate the likelihood of the empty graph
• The probability of edge (i, j) is in general p_ij = θ1^a θ2^b θ3^c θ4^d
• Using the Taylor approximation log(1 − x) ≈ −x − x²/2 for the non-edges and summing the resulting multinomial series, we approximate the likelihood of the empty graph; the observed edges are then added back as corrections
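A minimal sketch of this O(E) likelihood evaluation (not the authors’ implementation), assuming a 2×2 initiator θ given as nested lists, a directed edge list, and a list `perm` standing in for the labeling σ; it uses the fact that the entries of the k-th Kronecker power sum to (Σθ)^k and their squares to (Σθ²)^k.

```python
import math

def edge_prob(theta, n1, k, i, j):
    """p_ij in the k-th Kronecker power: product of initiator entries over base-n1 digits of (i, j)."""
    p = 1.0
    for _ in range(k):
        p *= theta[i % n1][j % n1]
        i //= n1
        j //= n1
    return p

def log_likelihood(theta, k, edges, perm):
    """Approximate log P(G | Theta, sigma) in O(E): empty-graph term plus per-edge corrections."""
    n1 = len(theta)
    s1 = sum(x for row in theta for x in row)        # sum of initiator entries
    s2 = sum(x * x for row in theta for x in row)    # sum of squared entries
    # Taylor approximation log(1 - p) ~ -p - p^2/2 summed over all node pairs:
    # the p_ij over Theta^[k] sum to s1**k and their squares to s2**k.
    ll = -(s1 ** k) - 0.5 * (s2 ** k)                # ~ log-likelihood of the empty graph
    for u, v in edges:                               # "add" the observed edges back
        p = edge_prob(theta, n1, k, perm[u], perm[v])
        ll += math.log(p) - math.log(1.0 - p)
    return ll

# Toy usage: 2x2 initiator, k = 3 (8 nodes), identity labeling, a few directed edges
theta = [[0.9, 0.5],
         [0.5, 0.3]]
edges = [(0, 1), (1, 2), (2, 3), (0, 7)]
print(log_likelihood(theta, 3, edges, perm=list(range(8))))
```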
Convergence of fitting
• Can gradient descent recover the true parameters?
• How nice (smooth, without local minima) is the optimization space?
  – Generate a graph from random parameters
  – Start at a random point and use gradient descent
  – We recover the true parameters 98% of the time
• How does the algorithm converge to the true parameters over gradient descent iterations?
[Figure: log-likelihood, average absolute error, first eigenvalue, and diameter vs. gradient descent iteration]

AS graph (N = 6,500, E = 26,500)
[Figure: degree distribution, hop plot, adjacency matrix eigenvalues, and network values]

Epinions graph (N = 76k, E = 510k)
[Figure: degree distribution, hop plot, adjacency matrix eigenvalues, and network values]

Scalability
• Fitting scales linearly with the number of edges

Model selection
• How big should the parameter matrix Θ be?
• We propose to use the Bayesian Information Criterion (BIC)
• We trade off between model fit and model complexity

Conclusion
• We proposed Kronecker Graphs
  – We can prove properties of the Kronecker graph model
• We presented scalable algorithms for fitting Kronecker graphs
  – Use simulation (MCMC) techniques to overcome the super-exponential number of node labelings
  – Use a Taylor approximation to quickly evaluate the likelihood
• Kronecker graphs fit real graphs well

References
– Graph Evolution: Densification and Shrinking Diameters, by Jure Leskovec, Jon Kleinberg and Christos Faloutsos, ACM TKDD 2007
– Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg and Christos Faloutsos, PKDD 2005
– Scalable Modeling of Real Graphs using Kronecker Multiplication, by Jure Leskovec and Christos Faloutsos, in submission