School of Information University of Michigan SI 614 Search in random networks Lecture 16
Search in random networks
• Motivation: power-law (PL) networks, social and P2P
• Analysis of scaling of search strategies in PL networks
• Simulation: artificial power-law topologies, real Gnutella networks
• Comparison with existing P2P search strategies: Reflector, Morpheus
• Path finding
• Directed search: Freenet

How do we search?
[Figure: Mary, Bob and Jane; Mary asks, "Who could introduce me to Richard Gere?"]

AT&T call graph
[Figure: # of telephone numbers from which calls were made vs. # of telephone numbers called. Aiello et al., STOC '00.]

Gnutella network: power-law link distribution
[Figure: proportion of nodes vs. number of neighbors, log-log axes; data and power-law fit, t = 2.07. Summer 2000, data provided by Clip2.]

Preferential attachment model
• Nodes join at different times
• The more connections a node has, the more likely it is to acquire new connections
• The growth process produces a power-law network

Gnutella and the bandwidth barrier
[Figure labels: ping, host cache]
• File sharing without a central index
• Queries are broadcast to every node within radius ttl
• As the network grows, it encounters a bandwidth barrier (dial-up modems cannot keep up with query traffic, fragmenting the network)
Clip2 report, "Gnutella: To the Bandwidth Barrier and Beyond", http://www.clip2.com/gnutella.html#q17

Search with knowledge of 2nd neighbors
[Figure: number of nodes found on a power-law graph (94, 67, 63, 54, 2, 6, 1) and on a Poisson graph (93, 19, 15, 11, 7, 3, 1).]

Outline of search strategy
Pass the query on to only one neighbor at each step.
Options:
• Requires that nodes sign the query: avoid passing the message on to a node twice
• Requires knowledge of one's neighbors' degrees: pass to the highest degree node
• Requires knowledge of one's neighbors' neighbors: route to 2nd degree neighbors

Generating functions
M.E.J. Newman, S.H. Strogatz, and D.J. Watts, "Random graphs with arbitrary degree distributions and their applications", PRE, cond-mat/0007235

Generating functions for degree distributions
$G_0(x) = \sum_{k=0}^{\infty} p_k x^k$
Useful for computing moments of the degree distribution, component sizes, and average path lengths.

Fun with generating functions
• Normalization condition: probabilities sum to 1
$G_0(1) = \sum_{k=0}^{\infty} p_k = 1$
• Derivatives: the generating function contains all the information of the degree distribution
$p_k = \frac{1}{k!} \left. \frac{d^k G_0}{dx^k} \right|_{x=0}$

Fun with generating functions (cont'd)
• Expected degree of a randomly chosen vertex
$\langle k \rangle = \sum_k k p_k = G_0'(1)$
• Higher moments of the degree distribution
$\langle k^n \rangle = \sum_k k^n p_k = \left[ \left( x \frac{d}{dx} \right)^n G_0(x) \right]_{x=1}$

Example: Poisson distribution
Let $p = z/N$ be the probability of an edge existing between two vertices ($z$ is the average degree).
$G_0(x) = \sum_{k=0}^{N} \binom{N}{k} p^k (1-p)^{N-k} x^k = (1 - p + px)^N \sim e^{z(x-1)}$ for large $N$
$G_0'(x) = z e^{z(x-1)}$, so $G_0'(1) = z$
$p_k = \frac{1}{k!} \left. \frac{d^k G_0}{dx^k} \right|_{x=0} = \frac{z^k e^{-z}}{k!}$
This is just the regular Poisson distribution.

Introducing cutoffs
• $k_{max} \le N - 1$: a node cannot have more connections than there are other nodes
• This is important for exponents close to 2
$p_k = C k^{-t}$ with $\sum_{k=1}^{\infty} p_k = 1$, so $1/C = \zeta(t)$; for $t = 2$, $1/C = \pi^2/6$
$p(k \ge 1000, t = 2) = \sum_{k=1000}^{\infty} p_k \sim 0.001$
Probability that none of the nodes in a 1,000 node graph has 1000 or more neighbors:
$(1 - p(k \ge 1000, t = 2))^{1000} \sim 0.36$
• Without a cutoff, for t = 2 there is a > 50% chance of observing a node with more neighbors than there are nodes
• For t = 2.1, there is a 25% chance

Selecting from a variety of cutoffs
1. $k_{max} = N$
2. $p_k = C k^{-t} e^{-k/\kappa}$ (Newman et al.)
3. $p_k = C k^{-t}$ for $k \le (CN)^{1/t}$, and $p_k = 0$ otherwise (Aiello et al.)
Generating function for cutoff 3: $G_0(x) = C \sum_{k=1}^{(CN)^{1/t}} k^{-t} x^k$
[Figure: 1 million websites (~1997); proportion of sites with so many links vs. # of sites linking to the site.]

Aiello's 'conservative' vs. Havlin's 'natural' cutoff
• Aiello's 'conservative' cutoff: the degree at which the expected number of nodes of degree $k$ is 1
$n(k) = N p_k = 1 \Rightarrow C k_{max}^{-t} N = 1 \Rightarrow k_{max} \sim N^{1/t}$
• Havlin's 'natural' cutoff: the degree at which the expected number of nodes of degree greater than $k$ is 1
$N \sum_{k=k_{max}}^{\infty} p_k = 1$, with $\sum_{k=k_{max}}^{\infty} c k^{-t} \sim k_{max}^{1-t} \sim \frac{1}{N} \Rightarrow k_{max} \sim N^{1/(t-1)}$
The imposed cutoff can have a dramatic effect on the properties of the graph.
[Figure: degrees drawn at random, for t = 2 and N = 1000.]

Generating functions for degree distributions
"Random graphs with arbitrary degree distributions and their applications" by Newman, Strogatz & Watts
[Figure: example graph with nodes labeled 1 and 2.]
• $G_0(x) = \sum_{k=0}^{\infty} p_k x^k$ is a generating function
• $p_k \sim k^{-t}$ is the probability that a randomly chosen vertex has degree $k$
• $\langle k \rangle = \sum_k k p_k = G_0'(1)$ is the expected degree of a randomly chosen vertex
• $G_1(x) = G_0'(x) / G_0'(1)$ generates the distribution of remaining outgoing edges after following an edge
• $z_2 = G_0'(1) G_1'(1)$ is the expected number of second degree neighbors, assuming neighbors don't share edges

Search with knowledge of first neighbors
Generating function with cutoff $k_{max}$:
$G_0(x) = c \sum_{k=1}^{k_{max}} k^{-t} x^k$
$G_0'(x) = c \sum_{k=1}^{k_{max}} k^{1-t} x^{k-1}$
Average degree of a vertex:
$G_0'(1) = c \sum_{k=1}^{k_{max}} k^{1-t} \sim c \int_1^{k_{max}} k^{1-t} \, dk = \frac{c}{t-2} \left( 1 - k_{max}^{2-t} \right)$
The first term is constant in $N$; for $2 < t < 3$ and $k_{max} \sim N^a$, the term $k_{max}^{2-t}$ decreases with $N$.
$G_1(x) = \frac{G_0'(x)}{G_0'(1)} = \frac{c}{G_0'(1)} \sum_{k=1}^{k_{max}} k^{1-t} x^{k-1}$
$G_1'(x) = \frac{c}{G_0'(1)} \sum_{k=2}^{k_{max}} k^{1-t} (k-1) x^{k-2}$
Average number of neighbors following an edge:
$G_1'(1) \sim \frac{t-2}{1 - k_{max}^{2-t}} \cdot \frac{k_{max}^{3-t} (t-2) - 2^{2-t} (t-1) + k_{max}^{2-t} (3-t)}{(t-2)(3-t)}$

Search with knowledge of first neighbors (cont'd)
Keeping the leading term in $k_{max}$:
$z_{1B} = G_1'(1) \sim \frac{t-2}{1 - k_{max}^{2-t}} \cdot \frac{k_{max}^{3-t}}{3-t}$
In the limit $t \to 2$: $G_1'(1) \sim \frac{k_{max}}{\log(k_{max})}$
Let's for the moment ignore the fact that, as we do a random walk, we encounter neighbors that we've seen before. Then
$s$ = number of steps = $N / z_{1B}$

Search time with different cutoffs
If $k_{max} = N$:
$s(t) \sim \frac{N}{k_{max}^{3-t}} = \frac{N}{N^{3-t}} = N^{t-2}$ for $2 < t < 3$, so $s(2.1) \sim N^{0.1}$
$s \sim \frac{N \log(k_{max})}{k_{max}} = \log(N)$ for $t = 2$
If $k_{max} = N^{1/(t-1)}$:
$s(t) \sim \frac{N}{k_{max}^{3-t}} = \frac{N}{N^{(3-t)/(t-1)}} = N^{2(t-2)/(t-1)}$ for $2 < t < 3$, so $s(2.1) \sim N^{0.18}$
$s(2) \sim \frac{N \log(k_{max})}{k_{max}} = \log(N)$

Search with knowledge of first neighbors (cont'd)
If $k_{max} = N^{1/t}$:
$s \sim \frac{N}{k_{max}^{3-t}} = \frac{N}{(N^{1/t})^{3-t}} = N^{2 - 3/t}$ for $2 < t < 3$
So, for exponents close to 2, the best we can do is $s \sim N^{1/2}$ (the value of $N^{2-3/t}$ at $t = 2$).

2nd neighbor random walk, ignoring overlap:
$S \sim \frac{N}{z_{2B}}$
$z_{2B} = \left[ \frac{\partial}{\partial x} G_1(G_1(x)) \right]_{x=1} = \left[ G_1'(1) \right]^2 \sim \left[ \frac{t-2}{1 - k_{max}^{2-t}} \cdot \frac{k_{max}^{3-t}}{3-t} \right]^2$
With $k_{max} = N^{1/t}$: $S(N, t) \sim N^{3(1 - 2/t)}$, so $S(N, t = 2.1) \sim N^{0.15}$

Following the degree sequence
Go to the highest degree node, then the next highest, and so on.
$z_{1D} = \int_{k_{max} - a}^{k_{max}} N k^{1-t} \, dk \sim N a k_{max}^{1-t}$, where $a \sim s$ = # of steps taken
2nd neighbors, ignoring overlap:
$z_{2D} = z_{1D} G_1'(1) \sim N a k_{max}^{2(2-t)}$
Setting $z_{2D} \sim N$ and $k_{max} = N^{1/t}$: $s \sim k_{max}^{2(t-2)} \sim N^{2 - 4/t}$, so $S_{deg}(N, t = 2.1) \sim N^{0.1}$

Ratio of the degree of a node to the expected degree of its highest degree neighbor for 10,000 node power-law graphs of varying exponents
[Figure: (degree of neighbor - 1) / degree of node vs. degree of node, for t = 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75.]

Exponents t close to 2 required to search effectively
• Gnutella
• World Wide Web, social networks: t ~ 2.0-2.3; high degree nodes: directories, search engines
• AT&T call graph: t ~ 2.1
• Actor collaboration graph (IMDb database): t ~ 2.0-2.2
[Figure: number of actors/actresses vs. number of costars, log-log axes; actors, t = 2; actresses, t = 2.1.]

Following the degree sequence
[Figure: example graph with node labels 17, 18, 10, 5, 1, 6, 9, 8, 50.]

Complications
• Should not visit the same node more than once
• Many neighbors of the node currently being visited were also neighbors of previously visited nodes, and there is a bias toward high degree nodes being 'seen' over and over again

Status and degree of node visited
[Figure: degree of node vs. step; markers: not visited, visited, neighbors visited.]

Progress of exploration in a 10,000 node graph knowing 2nd degree neighbors
[Figure: proportion of nodes found at step vs. step (log-log axes), and cumulative nodes found at step vs. step; curves: random walk, degree sequence.]
• Seeking high degree nodes speeds up the search process
• About 50% of a 10,000 node graph is explored in the first 12 steps

Scaling of search time with size of graph
[Figure: cover time for half the nodes vs. size of graph, log-log axes; random walk, a = 0.37 fit; degree sequence, a = 0.24 fit.]

Comparison with a Poisson graph
$G_0(x) = e^{z(x-1)}$
$G_1(x) = \frac{1}{z} \frac{\partial}{\partial x} G_0(x) = G_0(x)$
• The expected degree and the expected degree following a link are equal
• Scaling is linear
[Figure: degree of current node vs. step, for Poisson and power-law graphs.]
[Figure: cover time for 1/2 of graph vs. number of nodes in graph; constant av. deg. = 3.4; g = 1.0 fit.]

Gnutella network
50% of the files in a 700 node network can be found in fewer than 8 steps.
[Figure: cumulative nodes found at step vs. step; curves: high degree seeking 1st neighbors, high degree seeking 2nd neighbors.]

Required modifications to nodes
• Maintain a list of files in their neighborhood
• Check each query against the list
• Periodically contact neighbors to maintain the list
• Append ID to each query processed
Tradeoff: storage/CPU (available) for bandwidth (limited)

Theory vs. reality
• Overloading high degree nodes, but no worse than the original scenario where all nodes handle all traffic; assume high degree -> high bandwidth, so these nodes can carry the traffic load
• Fewer nodes are used for routing, so the system is more susceptible to malicious attack

Partial implementation
• Localized indexing
• Traffic routed to high degree nodes

Clip2 Distributed Search Solutions, http://dss.clip2.com, © Clip2.com, Inc.
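The node modifications and the high-degree-seeking strategy from the preceding slides can be illustrated with a minimal sketch. It assumes each node can look up the files held by its direct neighbors (the "list of files in their neighborhood") and that the query carries the IDs of nodes it has already visited. The function name `high_degree_search`, the adjacency-dictionary format, and the random fallback at dead ends are illustrative assumptions, not taken from the lecture or from any Gnutella client.

```python
import random


def high_degree_search(adj, files, start, target, max_steps=1000, rng=None):
    """Pass a query to one neighbor per step, preferring high-degree nodes.

    adj:   dict mapping node -> list of neighbor nodes (hypothetical format)
    files: dict mapping node -> set of file names held by that node
    Returns (steps_taken, node_holding_target), or None if not found.
    """
    rng = rng or random.Random(0)
    visited = {start}  # IDs appended to the query so nodes are not revisited
    current = start
    for step in range(max_steps + 1):
        # Check the query against the current node's own files and the
        # file lists of its first neighbors (the local index).
        for node in [current, *adj[current]]:
            if target in files.get(node, ()):
                return step, node
        unvisited = [nb for nb in adj[current] if nb not in visited]
        if unvisited:
            # Forward to the highest-degree neighbor not yet visited.
            current = max(unvisited, key=lambda nb: len(adj[nb]))
        else:
            # Dead end: fall back to a random neighbor (an assumption here).
            current = rng.choice(adj[current])
        visited.add(current)
    return None


# Small hypothetical network: node 1 is a hub, and the file sits at node 6.
EXAMPLE_ADJ = {0: [1], 1: [0, 2, 3, 4], 2: [1, 5], 3: [1], 4: [1],
               5: [2, 6], 6: [5]}
EXAMPLE_FILES = {6: {"song"}}
```

Calling `high_degree_search(EXAMPLE_ADJ, EXAMPLE_FILES, 0, "song")` walks 0, 1, 2, 5 and finds the file through node 5's view of its neighbor 6. Indexing second neighbors as well would only change the lookup loop, not the forwarding rule.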
[Figure legend: broadband user running Reflector; broadband user running Gnutella; dial-up user running Gnutella.]

Connection-preferencing rules
• LimeWire, BearShare: drop connections to unresponsive hosts
• This drives slower hosts to have fewer connections and move to the edge of the network

Supernodes
• Kazaa, BearShare defender, Morpheus SuperNodes
From Clip2: "Morpheus Out of the Underworld", http://www.openp2p.com/pub/a/p2p/2001/07/02/morpheus.html

Conclusions
• Search is faster and scales better (sublinearly with N) in power-law networks than in Poisson networks
• Networks intended to be searched, such as Gnutella, have a favorable PL topology
• The high degree strategy has been partially implemented in existing P2P clients, such as BearShare, Kazaa and Morpheus

A PL link distribution shortens the average shortest path
$z_r = a^{r-1} z_1$, with $a = z_2 / z_1$
• Poisson: $a = z_1$
• PL: $a > z_1$
[Figure: neighbors at radius vs. radius; power-law a = 2.5, Poisson a = 1.0. Inset: a vs. N for PL and Poisson (PS).]

What about the shortest path discovered along the way?
B.J. Kim et al., "Path finding strategies in scale-free networks", PRE (65) 027103.
• Each node passes the message to the highest degree neighbor it hasn't passed the message to previously
• 'Cut off' loops
[Figure: path between nodes A and B.]

A high degree seeking strategy finds short paths whose average length scales logarithmically with the size of the graph
[Figure: av. path length found vs. N; PL high degree, fit 0.72 ln(N).]

Scaling of the path length found using a
• random strategy on a PL graph
• high-degree strategy on a Poisson graph
[Figure: av. path length found vs. N, log-log axes; PL and Poisson, with fits $N^{0.46}$ and $N^{0.48}$.]

But… search costs are prohibitive; might as well do a BFS
[Figure: median search cost vs. N, log-log axes; PL high degree, PL rand, Poisson high degree.]

Freenet
• Queries are passed to one peer at a time
• Queries are routed to high degree nodes
• Has a power-law topology
• Search time scales as $N^{0.275}$ with the size of the network, N
Theodore Hong, "Performance" chapter in O'Reilly's "Peer-to-Peer: Harnessing the Power of Disruptive Technologies"
[Figure: Theodore Hong, power-law link distribution of a simulated Freenet network.]
[Figure: Theodore Hong, scaling of mean search time on a simulated Freenet network.]

Node specialization key to Freenet's speed
• Each node forwards the query to the node with the "closest" hash key
• A node passing back a match remembers the address the data came from
• This results in nodes developing a bias towards a part of the keyspace
[Figure: routing of a query for key 356 among nodes with keys 112, 659, 340, 388, 396, 135, 214.]
• Queries are naturally routed to high degree nodes
• Use keys for orientation

Applications to peer-to-peer networks
• Adriana Iamnitchi, Matei Ripeanu, Ian Foster, "Small-World File-Sharing Communities", http://arxiv.org/abs/cs.DC/0307036: create localized indices for peers with similar download patterns
• Foreseer: proposed P2P architecture with friend and neighbor overlay (friend: has shared a file; neighbor: short ping time)
• Fletcher, George, Sheth, Hardik and Börner, Katy. (2004). Unstructured Peer-to-Peer Networks: Topological Properties and Search Performance. Third International Joint Conference on Autonomous Agents and MultiAgent Systems. W6: Agents and Peer-to-Peer Computing, Moro, Gianluca, Bergamaschi, Sonia and Aberer, Karl, Eds., New York, July 19-23, pp. 2-13. http://ella.slis.indiana.edu/~katy/paper/04-fletcher.pdf

How do networks become navigable?
Aaron Clauset and Cris Moore, arxiv.org/abs/cond-mat/0309415
In the limit $N \to \infty$, the long range link distribution becomes $1/r$, where $r$ is the lattice distance between nodes.
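
As a closing numerical check, the search-time scalings derived earlier in the lecture can be evaluated for a given power-law exponent. The sketch below assumes the forms $s \sim N^{t-2}$ (first neighbors, $k_{max} = N$), $N^{2(t-2)/(t-1)}$ (first neighbors, $k_{max} = N^{1/(t-1)}$), $N^{2-3/t}$ (first neighbors, $k_{max} = N^{1/t}$), $N^{3(1-2/t)}$ (second-neighbor random walk) and $N^{2-4/t}$ (degree sequence with second neighbors), all ignoring overlap and valid for $2 < t < 3$. The function name and dictionary keys are illustrative only.

```python
def search_time_exponents(t):
    """Return the exponent e in s ~ N**e for each strategy and cutoff.

    t is the power-law exponent of the degree distribution (2 < t < 3).
    Overlap between visited neighborhoods is ignored, as in the derivation.
    """
    if not 2 < t < 3:
        raise ValueError("formulas assume 2 < t < 3")
    return {
        # Random walk with knowledge of first neighbors, three cutoffs.
        "rw_1st_kmax_N": t - 2,
        "rw_1st_kmax_natural": 2 * (t - 2) / (t - 1),  # kmax = N**(1/(t-1))
        "rw_1st_kmax_conservative": 2 - 3 / t,         # kmax = N**(1/t)
        # Knowledge of second neighbors, kmax = N**(1/t).
        "rw_2nd": 3 * (1 - 2 / t),
        "degree_sequence_2nd": 2 - 4 / t,
    }


if __name__ == "__main__":
    for name, exponent in search_time_exponents(2.1).items():
        print(f"{name}: s ~ N^{exponent:.3f}")
```

For t = 2.1 the second-neighbor exponents come out near 0.14 for the random walk and 0.10 for the degree sequence, in line with the rounded values 0.15 and 0.1 quoted on the slides.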