Transcript Slide 1
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

Slide 2
(HW3 is optional)
Task: Find node correspondences between two graphs
Incentives: European chocolates! Fame! Up to 10% extra credit
Due: Monday Nov 14. No late days!
7/18/2015 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Slide 3
• Random network (Erdos-Renyi random graph): degree distribution is binomial
• Scale-free (power-law) network: degree distribution is power-law

Slide 4
[Mitzenmacher, ‘03]
We will analyze the following model:
• Nodes arrive in order 1, 2, 3, …, n
• When node i is created it makes a single link to an earlier node j, chosen as follows:
  1) With prob. p, i links to j chosen uniformly at random (from among all earlier nodes)
  2) With prob. 1-p, node i chooses node j uniformly at random and links to a node that j points to

Slide 5
Claim: The described model generates networks where the fraction of nodes with degree k scales as:
  $P(d_i = k) \propto k^{-(1 + 1/q)}$, where $q = 1 - p$
Consider a deterministic and continuous approximation to the in-degree of node i as a function of time t:
• t is the number of nodes that have arrived so far
• The in-degree $d_i(t)$ of node i (i = 1, 2, …, n) is a continuous quantity and it grows deterministically with time t

Slide 6
Initial condition: $d_i(t) = 0$ when $t = i$ (node i just arrived)
Expected change of $d_i(t)$ over time:
• Node i gains an in-link at step t+1 only if a link from the newly created node t+1 points to it
• What is the probability of this event?
  • With prob. p, node t+1 links randomly: it links to our node i with prob. $1/t$
  • With prob. 1-p, node t+1 links preferentially: it links to our node i with prob. $d_i(t)/t$
• So the prob. that node t+1 links to i is:
  $p \cdot \frac{1}{t} + (1 - p) \cdot \frac{d_i(t)}{t}$

Slide 7
Expected change of $d_i(t)$:
  $d_i(t+1) - d_i(t) = p \cdot \frac{1}{t} + (1 - p) \cdot \frac{d_i(t)}{t}$
In the continuous approximation:
  $\frac{\mathrm{d}d_i(t)}{\mathrm{d}t} = p \cdot \frac{1}{t} + (1 - p) \cdot \frac{d_i(t)}{t} = \frac{p + q\,d_i(t)}{t}$
Divide by $p + q\,d_i(t)$:
  $\frac{1}{p + q\,d_i(t)} \frac{\mathrm{d}d_i(t)}{\mathrm{d}t} = \frac{1}{t}$
Integrate:
  $\int \frac{1}{p + q\,d_i(t)} \,\mathrm{d}d_i(t) = \int \frac{1}{t} \,\mathrm{d}t$
  $\frac{1}{q} \ln\big(p + q\,d_i(t)\big) = \ln t + c$
Exponentiate and let $A = e^{qc}$:
  $p + q\,d_i(t) = A\,t^q$
  $d_i(t) = \frac{1}{q}\big(A\,t^q - p\big)$

Slide 8
What is the constant A?
  $d_i(t) = \frac{1}{q}\big(A\,t^q - p\big)$
We know: $d_i(i) = 0$
So: $d_i(i) = \frac{1}{q}\big(A\,i^q - p\big) = 0$, which gives $A = \frac{p}{i^q}$
Therefore:
  $d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^q - 1\right]$

Slide 9
What is F(k), the fraction of nodes that have degree greater than k at time t?
How many nodes i have degree > k?
  $d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^q - 1\right] > k$
  then: $i < t\left[\frac{q}{p}k + 1\right]^{-1/q}$
There are t nodes in total at time t, so:
  $F(k) = \left[\frac{q}{p}k + 1\right]^{-1/q}$
[Note: Motivate this better! Why is F(d) a CDF if it is really a CCDF?]

Slide 10
What is the fraction of nodes with degree exactly k?
F(k) is the fraction of nodes with degree greater than k (a complementary CDF), so the density is $-F'(k)$:
  $F(k) = \left[\frac{q}{p}k + 1\right]^{-1/q}$
  $-F'(k) = \frac{1}{p}\left[\frac{q}{p}k + 1\right]^{-(1 + 1/q)}$
For large k this decays as $k^{-(1 + 1/q)}$, which is the claimed exponent.

Slide 11
[Note: Show simulations from Barabasi rather than showing calculations!]
Two changes from Gnp: growth + preferential attachment
Do we need both? Yes!
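The model analyzed above can also be simulated directly, which gives a rough check on the claimed exponent $1 + 1/q$. The sketch below is illustrative only: the function names (`simulate_model`, `predicted_exponent`, `ccdf`) are made up for this example, and it assumes that node 1 points to itself so that copying from node 1 is well defined, a detail the slides do not specify.

```python
import random


def simulate_model(n, p, seed=0):
    """Grow a graph of n nodes in which every new node creates one link.

    Returns a list whose entry i-1 is the in-degree of node i.
    Assumption (not from the slides): node 1 points to itself.
    """
    rng = random.Random(seed)
    target = [0, 1]              # target[i] = node that i links to; index 0 unused
    in_degree = [0] * (n + 1)    # the assumed self-loop of node 1 is not counted
    for i in range(2, n + 1):
        j = rng.randint(1, i - 1)    # uniformly random earlier node
        if rng.random() >= p:        # with prob. 1 - p, link to the node j points to
            j = target[j]
        target.append(j)
        in_degree[j] += 1
    return in_degree[1:]


def predicted_exponent(p):
    """Exponent 1 + 1/q from the claim above, with q = 1 - p."""
    return 1.0 + 1.0 / (1.0 - p)


def ccdf(degrees, k):
    """Fraction of nodes whose in-degree is strictly greater than k."""
    return sum(1 for d in degrees if d > k) / len(degrees)


if __name__ == "__main__":
    degrees = simulate_model(n=100_000, p=0.5)
    print("predicted exponent:", predicted_exponent(0.5))
    for k in (1, 2, 4, 8, 16, 32):
        print(k, ccdf(degrees, k))
```

Plotting `ccdf` against k on log-log axes should give a roughly straight tail. Its slope approximates the exponent of F(k), which is one less in magnitude than the exponent of the density.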
Add growth to Gnp (assume 1 edge is added at each step):
• $X_j$ = degree of node j at the end
• $X_j(u)$ = 1 if node u links to j, else 0
• $X_j = X_j(j+1) + X_j(j+2) + \dots + X_j(n)$
• $E[X_j(u)] = P[u \text{ links to } j] = \frac{1}{u-1}$
• $E[X_j] = \sum_{u=j+1}^{n} \frac{1}{u-1} = H_{n-1} - H_{j-1}$
• $E[X_j] \approx \log(n-1) - \log(j) = \log\frac{n-1}{j} \neq \left(\frac{n}{j}\right)^{\alpha}$
$H_n$ is the n-th harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \log(n)$
With growth alone, the expected degree grows only logarithmically, not as a power of n/j.

Slide 12
Preferential attachment gives power-law degrees
Intuitively reasonable process
Can tune p to get the observed exponent:
• On the web, P[node has degree d] ~ $d^{-2.1}$
• $2.1 = 1 + 1/(1-p)$, so p ~ 0.1
There are also other network formation mechanisms that generate scale-free networks:
• Random surfer model
• Forest Fire model

Slide 13
[Note: Skip!]
• Copying mechanism (directed network): select a node and an edge of this node; attach to the endpoint of this edge
• Walking on a network (directed network): the new node connects to a node, then to every first, second, … neighbor of this node
• Attaching to edges: select an edge; attach to both endpoints of this edge
• Node duplication: duplicate a node with all its edges; randomly prune edges of the new node

Slide 14
Preferential attachment is not so good at predicting network structure:
• Age-degree correlation
• Links among high-degree nodes: on the web, nodes sometimes avoid linking to each other
Further questions:
• What is a reasonable probabilistic model for how people sample through web pages and link to them?
  • Short random walks
  • Effect of search engines: reaching pages based on the number of links to them

Slide 15
How does the connectivity of the network change as the vertices get removed? [Albert et al. 00; Palmer et al. 01]
Vertices can be removed:
• Uniformly at random
• In order of decreasing degree
It is important for epidemiology:
• Removal of vertices corresponds to vaccination

Slide 16
[Note: Get colorful plots from Barabasi]
Real-world networks are resilient to random attacks
• You need to remove all web pages of degree > 5 to disconnect the web
• But this is a very small fraction of all web pages
Random network has better resilience to targeted attacks
[Figure: mean path length vs. fraction of removed nodes, under preferential removal and random removal; panels: Internet (Autonomous systems), Random network]

Slide 18
Preferential attachment is a model of a growing network
What governs network growth and evolution?
• P1) Node arrival process: when nodes enter the network
• P2) Edge initiation process: each node decides when to initiate an edge
• P3) Edge destination process: the node determines the destination of the edge

Slide 19
[Leskovec et al., KDD ’08]
4 online social networks with exact edge arrival sequence
• For every edge (u,v) we know the exact time of its appearance $t_{uv}$
Directly observe mechanisms leading to global network properties
[Figure: sequence of edge arrivals, "and so on for millions…"; datasets (F), (D), (A), (L)]

Slide 20
Node arrivals over time:
• (F) Flickr: Exponential
• (D) Delicious: Linear
• (A) Answers: Sub-linear
• (L) LinkedIn: Quadratic

Slide 21
How long do nodes live?
• Node lifetime is the time between the 1st and the last edge of a node
[Figure: timeline showing the lifetime of node i, from the 1st edge of node i to the last edge of node i]
How do nodes “wake up” to create links?
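Given an edge list with arrival times $t_{uv}$, the node lifetime defined above can be computed in one pass. This is a minimal sketch: the function name `node_lifetimes` and the `(u, v, t)` tuple layout are assumptions made for this example, and it counts an edge toward the lifetime of both of its endpoints, which the slides do not spell out.

```python
def node_lifetimes(edges):
    """Return {node: time of last edge - time of first edge}.

    `edges` is an iterable of (u, v, t) tuples, where t is the arrival
    time of edge (u, v). Assumption: an edge counts for both endpoints.
    """
    first, last = {}, {}
    for u, v, t in edges:
        for node in (u, v):
            first[node] = min(t, first.get(node, t))
            last[node] = max(t, last.get(node, t))
    return {node: last[node] - first[node] for node in first}


if __name__ == "__main__":
    toy_edges = [("a", "b", 1), ("a", "c", 5), ("b", "c", 9)]
    print(node_lifetimes(toy_edges))
```

A node with a single edge gets lifetime 0 under this definition.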
[Figure: timeline of edge creation events for node i, from the 1st edge of node i to the last edge of node i]

Slide 22
LinkedIn
Lifetime a: time between a node’s first and last edge
Node lifetime is exponentially distributed:
  $p_l(a) = \lambda e^{-\lambda a}$

Slide 23
[Note: Invent better notation for delta!]
How do nodes “wake up” to create edges?
Edge gap $\delta_i(d)$: time between the d-th and (d+1)-st edge of node i:
• Let $t_i(d)$ be the creation time of the d-th edge of node i
• $\delta_i(d) = t_i(d+1) - t_i(d)$
[Figure: timeline of node i from its 1st edge to its last edge, with gaps $\delta_i(1)$, $\delta_i(2)$, $\delta_i(3)$]
$\delta(d)$ is a distribution (histogram) of $\delta_i(d)$ over all nodes i
[Figure: gaps $\delta_i(1)$, $\delta_{i+1}(1)$, $\delta_{i+2}(1)$ for nodes i, i+1, i+2]

Slide 24
LinkedIn
Edge gap $\delta(d)$: inter-arrival time between the d-th and (d+1)-st edge
For every d we get a separate histogram
  $p_g(\delta(1)) \propto \delta(1)^{-\alpha} e^{-\beta \delta(1)}$

Slide 25
[Note: Invent better notation for delta!]
How do α and β change as a function of d?
Fit to each plot of $\delta(d)$:
  $p_g(\delta(d)) \propto \delta(d)^{-\alpha(d)} e^{-\beta(d)\,\delta(d)}$

Slide 26
α is constant and β is linear in d, so gaps get smaller with d:
  $p_g(\delta(d)) \propto \delta(d)^{-\alpha} e^{-\beta\,d\,\delta(d)}$
[Figure: log(probability) vs. log(edge gap) for degree d = 1, 2, 3]

Slide 27
Source node i wakes up and creates an edge
How does i select a target node j?
• What is the degree of the target j? Does preferential attachment really hold?
• How many hops away is the target j? Are edges attaching locally?
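To see why a rate that is linear in d makes gaps shrink, the fitted form $p_g(\delta(d)) \propto \delta(d)^{-\alpha} e^{-\beta d \delta(d)}$ can be evaluated numerically. The sketch below is only an illustration: the helper names, the restriction to integer gaps 1..`max_gap`, and the values α = 0.8 and β = 0.01 are assumptions for this example, not fitted values from the slides.

```python
import math


def gap_log_density(delta, d, alpha, beta):
    """Unnormalized log of p_g: -alpha * log(delta) - beta * d * delta."""
    return -alpha * math.log(delta) - beta * d * delta


def mean_gap(d, alpha, beta, max_gap=10_000):
    """Mean edge gap at degree d, normalizing over integer gaps 1..max_gap.

    The integer grid is an assumed discretization of time.
    """
    gaps = range(1, max_gap + 1)
    weights = [math.exp(gap_log_density(x, d, alpha, beta)) for x in gaps]
    total = sum(weights)
    return sum(x * w for x, w in zip(gaps, weights)) / total


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, mean_gap(d, alpha=0.8, beta=0.01))
```

With α held fixed, a larger d strengthens the exponential cutoff, so the mean gap printed for each d decreases.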
[Figure: target nodes at 2, 3, and 4 hops from the source]

Slide 28
[Leskovec et al., KDD ’08]
Are edges more likely to connect to higher degree nodes?
  $p_e(k) \propto k^{\tau}$
[Figure: panels for PA, Gnp, Flickr]

  Network     τ
  Gnp         0
  PA          1
  Flickr      1
  Delicious   1
  Answers     0.9
  LinkedIn    0.6

Slide 29
[Leskovec et al., KDD ’08]
Just before the edge (u,w) is placed, how many hops are between u and w?
[Figure: panels for Gnp, PA, Flickr]
Fraction of triad-closing edges:

  Network     %Δ
  Flickr      66%
  Delicious   28%
  Answers     23%
  LinkedIn    50%

Real edges are local! Most of them close triangles!
[Figure: triangle on nodes u, v, w]

Slide 30
[Note: Explain the strategies better. Remove the example of Random-Random. Explain likelihood better.]
Focus only on triad-closing edges
New triad-closing edge (u,w) appears next
Model this as 2 independent choices:
  1. u chooses neighbor v
  2. v chooses neighbor w, and we connect u to w
E.g., under Random-Random:
  $p(u,w) = \frac{1}{5} \cdot \frac{1}{2} + \frac{1}{5} \cdot 1 = \frac{3}{10}$
[Figure: node u, its neighbors v and v’, and target w]
Under a particular pair of “strategies”:
  Likelihood of the graph $= \prod_{(u,w) \in E} p(u,w)$

Slide 31
Improvement over the baseline:
• Baseline: pick a random node 2 hops away
[Table: improvement for each pair of strategies; rows: strategy to select v (1st node); columns: strategy to select w (2nd node)]
Strategies to pick a neighbor:
• random: uniformly at random
• deg: proportional to its degree
• com: prop. to the number of common friends
• last: prop. to time since last activity
• comlast: prop. to com*last
[Figure: nodes u, v, w]

Slide 32
[Leskovec et al., KDD ’08]
The model of network evolution

  Process                 Model
  P1) Node arrival        • Node arrival function is given
                          • Node lifetime is exponential
  P2) Edge initiation     • Edge gaps get smaller as the degree increases
  P3) Edge destination    • Pick edge destination using random-random

Slide 33
[Leskovec et al., KDD ’08]
[Note: Skip]
Theorem: Exponential node lifetimes and power-law with exponential cutoff edge gaps lead to power-law degree distributions
Interesting, as temporal behavior predicts a structural network property

Slide 35
[Note: Skip!]
Given the model, one can take an existing network and continue its evolution
Compare the true and predicted (based on the theorem) degree exponent:
[Figure: true vs. predicted degree exponent]

Slide 36
How do networks evolve at the macro level?
• What are global phenomena of network growth?
Questions:
• What is the relation between the number of nodes n(t) and the number of edges e(t) over time t?
• How does the diameter change as the network grows?
• How does the degree distribution evolve as the network grows?

Slide 37
[Leskovec et al., KDD 05]
N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: What is E(t+1)? Is it 2 * E(t)?
A: It more than doubles, while obeying the Densification Power Law

Slide 38
[Leskovec et al., KDD 05]
What is the relation between the number of nodes and the edges over time?
First guess: constant average degree over time
Networks are denser over time
Densification Power Law:
  $E(t) \propto N(t)^a$
a … densification exponent (1 ≤ a ≤ 2)
[Figure: E(t) vs. N(t); panels: Internet (a=1.2), Citations (a=1.6)]

Slide 39
[Leskovec et al. KDD 05]
Densification Power Law: the number of edges grows faster than the number of nodes, so the average degree is increasing
  $E(t) \propto N(t)^a$
or equivalently
  $\log E(t) = a \log N(t) + \text{const}$
a … densification exponent, 1 ≤ a ≤ 2:
• a=1: linear growth, constant out-degree (traditionally assumed)
• a=2: quadratic growth, clique

Slide 40
[Leskovec et al. KDD 05]
Prior models and intuition say that the network diameter slowly grows (like log N, log log N)
Diameter shrinks over time
• As the network grows, the distances between the nodes slowly decrease
[Figure: Internet, diameter vs. size of the graph; Citations, diameter vs. time]

Slide 41
Is shrinking diameter just a consequence of densification? [Leskovec et al. TKDD 07]
[Figure: Erdos-Renyi random graph with densification exponent a = 1.3; diameter vs. size of the graph]
A densifying random graph has increasing diameter
There is more to shrinking diameter than just densification

Slide 42
[Note: Make sure that Power-laws and PA fit into a single lecture. Then the 2nd lecture (evolution) will be finished in time.]

Slide 43
Is it the degree sequence?
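The Densification Power Law stated above can be checked on a sequence of graph snapshots by fitting a line to log E(t) against log N(t). The following is a minimal sketch under that reading: the function name `densification_exponent` and the synthetic snapshot sizes are made up for this example and are not data from the slides.

```python
import math


def densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t).

    `nodes` and `edges` are equal-length sequences of snapshot sizes.
    """
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic snapshots that follow E = 3 * N^1.2 exactly.
    snapshot_nodes = [100, 200, 400, 800]
    snapshot_edges = [3 * n ** 1.2 for n in snapshot_nodes]
    print(densification_exponent(snapshot_nodes, snapshot_edges))
```

Under this law, doubling the number of nodes multiplies the number of edges by $2^a$, which exceeds 2 whenever a > 1.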
Compare the diameter of:
• True network (red)
• Random network with the same degree distribution (blue)
[Figure: Citations, diameter vs. size of the graph]
Densification + degree sequence give shrinking diameter

Slide 44
[Leskovec et al. TKDD 07]
How does the degree distribution evolve to allow for densification?
Option 1) Degree exponent $\gamma_n$ is constant:
• Fact 1: For degree exponent $1 < \gamma_n < 2$: $a = 2/\gamma$
[Figure: Email network]
A consequence of what we learned in the last class:
■ Power-laws with exponents < 2 have infinite expectations.
■ So, by maintaining a constant degree exponent, the average degree grows.

Slide 45
[Leskovec et al. TKDD 07]
How does the degree distribution evolve to allow for densification?
Option 2) Exponent $\gamma_n$ evolves with graph size n:
• Fact 2: [formula on slide]
[Figure: Citation network]
Remember, the expected degree is:
  $E[x] = \frac{\gamma - 1}{\gamma - 2} x_{min}$
So $\gamma_n$ has to decay as a function of graph size for the avg. degree to go up

Slide 46
[Leskovec et al. TKDD 07]
Want to model graphs that densify and have shrinking diameters
Intuition:
• How do we meet friends at a party?
• How do we identify references when writing papers?
[Figure: nodes v and w]

Slide 47
[Leskovec et al. TKDD 07]
The Forest Fire model has 2 parameters:
• p … forward burning probability
• r … backward burning probability
The model:
• Each turn a new node v arrives
• It chooses an “ambassador” w uniformly at random
• Flip 2 geometric coins to determine the number of in- and out-links of w to follow
• “Fire” spreads recursively until it dies
• New node v links to all burned nodes
Geometric distribution: [formula on slide]

Slide 48
Forest Fire generates graphs that densify and have shrinking diameter
[Figure: densification, E(t) vs. N(t), exponent 1.32; diameter vs. N(t)]

Slide 49
Forest Fire also generates graphs with power-law degree distribution
[Figure: in-degree, log count vs. log in-degree; out-degree, log count vs. log out-degree]

Slide 50
Fix the backward probability r and vary the forward burning prob. p
Notice a sharp transition between sparse and clique-like graphs
[Figure: regions labeled sparse graph (increasing diameter), constant diameter, decreasing diameter, clique-like graph]
The sweet spot is very narrow
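
The Forest Fire steps listed on slide 47 can be sketched in a few lines. This is an illustrative sketch, not the authors' reference implementation: the function names are made up for this example, and it assumes that the number of out-links to follow is geometric with mean p/(1-p) and the number of in-links is geometric with mean r/(1-r), since the slide's own formula for the geometric distribution is not preserved in this transcript. Both p and r must be below 1.

```python
import random


def geometric(rng, prob):
    """Number of successes before the first failure; mean prob / (1 - prob)."""
    count = 0
    while rng.random() < prob:
        count += 1
    return count


def forest_fire(n, p, r, seed=0):
    """Return {node: set of nodes it links to} for a graph grown on nodes 0..n-1."""
    rng = random.Random(seed)
    out_links = {0: set()}
    in_links = {0: set()}
    for v in range(1, n):
        ambassador = rng.randrange(v)      # uniformly random existing node
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:                    # fire spreads until no new node burns
            w = frontier.pop()
            for links, prob in ((out_links[w], p), (in_links[w], r)):
                candidates = sorted(x for x in links if x not in burned)
                rng.shuffle(candidates)
                for x in candidates[:geometric(rng, prob)]:
                    burned.add(x)
                    frontier.append(x)
        out_links[v] = set(burned)         # v links to every burned node
        in_links[v] = set()
        for x in burned:
            in_links[x].add(v)
    return out_links


def count_edges(out_links):
    """Total number of directed edges in the graph."""
    return sum(len(targets) for targets in out_links.values())


if __name__ == "__main__":
    graph = forest_fire(n=2000, p=0.35, r=0.3)
    print(len(graph), "nodes,", count_edges(graph), "edges")
```

Recording `count_edges` at several values of n and plotting it against n on log-log axes is one way to look for the densification behavior described on slide 48.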