Transcript Slide 1
School of Information, University of Michigan
Search in networks
Lada Adamic (U. Michigan)
NetSci Workshop, May 16th, 2006

Outline
Search in structured networks
• small world experiments
• geographical models
• hierarchical models
• studies:
  • HP labs email network (simulated)
  • Club Nexus online community (simulated)
  • Phone interview company (survey)
  • LiveJournal (simulated)
Search in unstructured networks
• power law networks
• Erdos-Renyi networks
• P2P networks (Gnutella example)

Search in structured networks

Small world experiments then
[Figure: map with labels MA and NE]
Milgram’s experiment (1960’s): Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target.

Milgram’s small world experiment
• Target person worked in Boston as a stockbroker.
• 296 senders from Boston and Omaha.
• 20% of senders reached target.
• Typical strategy: if far from target, choose someone geographically closer; if close to target geographically, choose someone professionally closer
• average chain length = 6.5
“Six degrees of separation”

Small world experiments now
email experiment: Dodds, Muhamad, Watts, Science 301, (2003)
• 18 targets
• 13 different countries
• 24,163 message chains
• 384 reached their targets
• average path length 4.0
image by Stephen G. Eick http://www.bell-labs.com/user/eick/index.html (unrelated to small world experiment…)

Small world experiment at Columbia
Successful chains disproportionately used
• weak ties (Granovetter)
• professional ties (34% vs. 13%)
• ties originating at work/college
• target's work (65% vs. 40%)
. . . and disproportionately avoided
• hubs (8% vs. 1%) (+ no evidence of funnels)
• family/friendship ties (60% vs. 83%)
Strategy: Geography -> Work

Why study small world phenomena?
Curiosity: Why is the world small? How are people able to route messages?
Social Networking as a Business: Friendster, Orkut, MySpace, FaceBook; LinkedIn, Spoke, VisiblePath

Six degrees of separation - to be expected
• Pool and Kochen (1978) - average person has 500-1500 acquaintances
• Ignoring clustering, other redundancy … ~ $10^3$ first neighbors, $10^6$ second neighbors, $10^9$ third neighbors
• But networks are clustered: my friends’ friends tend to be my friends
• Watts & Strogatz (1998) - a few random links in an otherwise clustered graph give an average shortest path close to that of a random graph

Is this the whole picture?
Why are small worlds navigable?
• How are people able to find short paths?
• How to choose among hundreds of acquaintances?
Strategy: Simple greedy algorithm - each participant chooses correspondent who is closest to target with respect to the given property
Models
• geography: Kleinberg (2000)
• hierarchical groups: Watts, Dodds, Newman (2001), Kleinberg (2001)
• high degree nodes: Adamic, Puniyani, Lukose, Huberman (2001), Newman (2003)

Reverse small world experiment
• Killworth & Bernard (1978): Given hypothetical targets (name, occupation, location, hobbies, religion…), participants choose an acquaintance for each target
• Acquaintance chosen based on (most often) occupation, geography
• only 7% because they “know a lot of people”
• Simple greedy algorithm: most similar acquaintance
• two-step strategy rare

Spatial search
Kleinberg, ‘The Small World Phenomenon, An Algorithmic Perspective’, Proc. 32nd ACM Symposium on Theory of Computing, 2000. (Nature 2000)
“The geographic movement of the [message] from Nebraska to Massachusetts is striking.
There is a progressive closing in on the target area as each new person is added to the chain”
S. Milgram, ‘The small world problem’, Psychology Today 1, 61, 1967
• nodes are placed on a lattice and connect to nearest neighbors
• additional links placed with $p_{uv} \sim d_{uv}^{-r}$

no locality
• When r = 0, links are randomly distributed, ASP ~ log(n), n size of grid
• When r = 0, the expected time of any decentralized algorithm is at least $a_0 n^{2/3}$
• $p \sim p_0$
• When r < 2, expected time at least $a_r n^{(2-r)/3}$

Overly localized links on a lattice
• When r > 2, expected search time ~ $N^{(r-2)/(r-1)}$
• $p \sim 1/d^4$

Links balanced between long and short range
• When r = 2, expected time of a decentralized algorithm (DA) is at most $C (\log N)^2$
• $p \sim 1/d^2$

Hierarchical social network models
Kleinberg, ‘Small-World Phenomena and the Dynamics of Information’, NIPS 14, 2001
Hierarchical network models:
• Individuals classified into a hierarchy, $h_{ij}$ = height of the least common ancestor.
• $p_{ij} \sim b^{-a h_{ij}}$
• e.g. state-county-city-neighborhood, industry-corporation-division-group
• Theorem: If a = 1 and outdegree is polylogarithmic, search can be done in s ~ O(log n) steps
[Figure: hierarchy of height h with b = 3]
Group structure models:
• Individuals belong to nested groups
• q = size of smallest group that v, w belong to
• $f(q) \sim q^{-a}$
• Theorem: If a = 1 and outdegree is polylogarithmic, search can be done in s ~ O(log n) steps

Sketch of proof
• $l^2 |R| < |R'| < l |R|$
• $k = c \log^2 n$
• calculate probability that s fails to have a link in R’
[Figure labels: R, R’, T, S]

Identity and search in social networks
Watts, Dodds, Newman (Science, 2001)
• individuals belong to hierarchically nested groups
• $p_{ij} \sim \exp(-a x)$
• multiple independent hierarchies h = 1, 2, ..., H coexist, corresponding to occupation, geography, hobbies, religion…

Identity and search in social networks
Watts, Dodds, Newman (2001)
• Message chains fail at each node with probability p
• Network is ‘searchable’ if a fraction r of messages reach the target: $q = \langle (1-p)^L \rangle_L \ge r$
[Figure: curves for N=102400, N=204800, N=409600]

Small World Model, Watts et al.
Fits Milgram’s data well
Model parameters: N = $10^8$, z = 300, g = 100, b = 10, a = 1, H = 2
$L_{model}$ = 6.7, $L_{data}$ = 6.5
more slides on this: http://www.aladdin.cs.cmu.edu/workshops/wsa/papers/dodds-2004-04-10search.pdf

High degree search
Adamic et al. Phys. Rev. E, 64 46135 (2001)
[Figure: Mary asks “Who could introduce me to Richard Gere?”; other nodes: Bob, Jane]

Small world experiments so far
Classic small world experiment:
• Given a target individual, forward to one of your acquaintances
• Observe chains but not the rest of the social network
Reverse small world experiment (Killworth & Bernard)
• Given a hypothetical individual, which of your acquaintances would you choose
• Observe individual’s social network and possible choices, but not resulting chains or complete social network

Testing search models on social networks
advantage: have access to entire communication network and to individual’s attributes
Use a well defined network:
• HP Labs email correspondence over 3.5 months
• Edges are between individuals who sent at least 6 email messages each way
• 450 users
• median degree = 10, mean degree = 13
• average shortest path = 3
Node properties specified:
• degree
• geographical location
• position in organizational hierarchy
Can greedy strategies work?

Strategy 1: High degree search
Power-law degree distribution of all senders of email passing through HP labs
[Figure: outdegree distribution with a = 2.0 fit (log-log); x-axis: outdegree (number of recipients sender has sent email to); y-axis: proportion of senders (frequency)]

Filtered network (at least 6 messages sent each way)
Degree distribution no longer power-law, but Poisson
[Figure: p(k) vs. number of email correspondents, k; inset on a log scale]
It would take 40 steps on average (median of 16) to reach a target!

Strategy 2: Geography
Communication across corporate geography
[Figure: floor plan with labels 1U, 1L, 2U, 2L, 3U, 3L, 4U]
87% of the 4000 links are between individuals on the same floor

Cubicle distance vs.
probability of being linked
[Figure: proportion of linked pairs vs. distance in feet (log-log); curves: measured, 1/r, $1/r^2$; annotation: optimum for search]

Strategy 3: Organizational hierarchy
Email correspondence superimposed on the organizational hierarchy

Example of search path
[Figure: path segments labeled distance 2, distance 1, distance 1, distance 1]
hierarchical distance = 5
search path distance = 4

Probability of linking vs. distance in hierarchy
[Figure: probability of linking vs. hierarchical distance h; observed, fit exp(-0.92*h)]
in the ‘searchable’ regime: 0 < a < 2 (Watts, Dodds, Newman 2001)

Results
distance   hierarchy   geography   geodesic   org   random
median     4           7           3          6     28
mean       5.7 (4.7)   12          3.1        6.1   57.4
[Figure: number of pairs vs. number of steps in search; panels: hierarchy, geography]

Expt 2
Searching a social networking website
Profiles:
• status (UG or G)
• year
• major or department
• residence
• gender
Personality (choose 3 exactly):
• you: funny, kind, weird, …
• friendship: honesty/trust, common interests, commitment, …
• romance: -“-
• freetime: socializing, getting outside, reading, …
• support: unconditional accepters, comic-relief givers, eternal optimists
Interests (choose as many as apply):
• books: mystery & thriller, science fiction, romance, …
• movies: western, biography, horror, …
• music: folk, jazz, techno, …
• social activities: ballroom dancing, barbecuing, bar-hopping, …
• land sports: soccer, tennis, golf, …
• water sports: sailing, kayaking, swimming, …
• other sports: ski diving, weightlifting, billiards, …

Differences between data sets
HP labs email network
• complete image of communication network
• affinity not reflected
Online community
• partial information of social network
• only friends listed

Degree Distribution for Nexus Net
2469 users, average degree 8.2
[Figure: number of users with so many links vs. number of links; inset on log-log axes]

Problem: how to construct hierarchies?
Probability of linking by separation in years
[Figure: two panels, prob. two grads are friends and prob. two undergrads are friends, vs. separation in years; data with $(x+1)^{-1.7}$ and $(x+1)^{-1.1}$ fits]

Hierarchies not useful for other attributes:
Geography
[Figure: probability of being friends vs. distance between residences]
Other attributes: major, sports, freetime activities, movie preferences…

Strategy using user profiles
prob. two undergrads are friends (consider simultaneously)
• both undergraduate, both graduate, or one of each
• same or different year
• both male, both female, or one of each
• same or different residences
• same or different major/department

Results
strategy      median   mean
random        133      390
high degree   39       137
profile       21       53
With an attrition rate of 25%, 5% of the messages get through at an average of 4.8 steps, hence the network is barely searchable

The accuracy of small world chains in social networks
Peter D. Killworth, Christopher McCarty, H. Russell Bernard, Mark House, Social Networks, 2006
First parallel study of individuals’ choices vs. actual shortest paths
Network
• 105 members of an interviewing bureau
• 10,920 shortest path connections
• who knows whom
• who a person would select as the next link in a chain to a particular person
[Figure legend: recent hire, worked a while, old timer]

Accuracy of small world chains
Shortest paths
• use the network of who-knows-whom to calculate actual shortest paths
• compare to paths formed by individuals’ choices
• 21.7% fail through reaching missing data
• 23.7% reach cycles: i chooses j, j chooses i
• 54.6% reach the target, with chains that are 40% longer on average than the shortest path

Next choice accuracy and a Markov model
48% of the time, a person chooses a contact who is closer to the target
over half of the choices are wrong!
Markov model:
• terminate chain with probability a (attrition)
• choose someone closer to the target with probability p, otherwise choose someone at same distance

LiveJournal
• LiveJournal provides an API to crawl the friendship network + profiles
• friendly to researchers
• great research opportunity
basic statistics
Users: How many users, and how many of those are active?
Total accounts: 9980558
... active in some way: 1979716
... that have ever updated: 6755023
... updating in last 30 days: 1300312
... updating in last 7 days: 751301
... updating in past 24 hours: 216581

Age distribution
Predominantly female & young demographic
Male: 1370813 (32.4%)
Female: 2856360 (67.6%)
Unspecified: 1575389
Age   Users
13    18483
14    87505
15    211445
16    343922
17    400947
18    414601
19    405472
20    371789
21    303076
22    239255
23    194379
24    152569
25    127121
26    98900
27    73392
28    59188
29    48666

Geographic Routing in Social Networks
David Liben-Nowell, Jasmine Novak, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins (PNAS 2005)
data used Feb.
2004
• 500,000 LiveJournal users with US locations
• giant component (77.6%) of the network
• clustering coefficient: 0.2

Degree distributions
The broad degree distributions we’ve learned to know and love
• but more probably lognormal than power law
• broader indegree than outdegree distribution

Results of a simple greedy geographical algorithm
• Choose source s and target t randomly
• Try to reach target’s city, not target itself
• At each step, the message is forwarded from the current message holder u to the friend v of u geographically closest to t
Variant 1: stop if d(v,t) > d(u,t)
• 13% of the chains are completed
Variant 2: if d(v,t) > d(u,t), pick a neighbor at random in the same city if possible, else stop
• 80% of the chains are completed

the geographic basis of friendship
• d = d(u,v), the distance between pairs of people
• The probability that two people are friends given their distance is equal to P(d) = e + f(d), where e is a constant independent of geography
• e is $5.0 \times 10^{-6}$ for LiveJournal users who are very far apart

the geographic basis of friendship
• The average user will have ~ 2.5 non-geographic friends
• The other friends (5.5 on average) are distributed according to an approximate 1/distance relationship
• But 1/d was proved not to be navigable by Kleinberg, so what gives?

Navigability in networks of variable geographical density
• Kleinberg assumed a uniformly populated 2D lattice
• But population is far from uniform
• population networks and rank-based friendship: probability of knowing a person depends not on absolute distance but on relative distance (i.e. how many people live closer)
• $\Pr[u \to v] \sim 1/\mathrm{rank}_u(v)$

Structured search
Conclusions
• Individuals associate on different levels into groups.
• Individuals tend to know others who are ‘close by’
• Group structure facilitates decentralized search using social ties.
• Hierarchy search faster than geographical search
• Simple strategies are not perfect, but short (rather than shortest) chains can be found

Weighted shortest paths
Routes
• shortest route from Chicago to Boston
• vertex: intersection
• edge weights: road distances
• alternative weights: expected time traveled, gas consumed…
• usually sum the weights from each segment
[Figure: route from start to finish with three segments]
• freeway, 70 mph: 30 miles / 70 mph ~ 26 minutes
• freeway, 65 mph: 40 miles / 65 mph ~ 37 minutes
• surface road, 25 mph: 50 miles, 2 hours

Reliable paths through social networks
• The probability of transmitting a message or infectious agent could be related to the strength of the tie
• e.g. rather than summing the weights, we might multiply the probabilities of getting through
[Figure: edges labeled p = 1, p = 0.001, p = 0.05, p = 0.5, p = 0.5]
Probability of getting an idea through to the head of labs
• via CEO (0.001*1 = 0.001)
• via direct manager (0.5*0.5 = 0.25)

Search in random networks
Motivation
• Power-law (PL) networks, social and P2P
Analysis
• of scaling of search strategies in PL networks
Simulation
• artificial power-law topologies, real Gnutella networks
Comparison with existing P2P search strategies
• Reflector, Morpheus
• Directed Search
• Freenet

How do we search?
[Figure: Mary asks “Who could introduce me to Richard Gere?”; other nodes: Bob, Jane]

AT&T Call Graph
[Figure: # of telephone numbers from which calls were made vs. # of telephone numbers called]
Aiello et al.
STOC ‘00

Gnutella network
power-law link distribution
[Figure: proportion of nodes vs. number of neighbors (log-log); data and power-law fit, t = 2.07]
summer 2000, data provided by Clip2

Preferential attachment model
• Nodes join at different times
• The more connections a node has, the more likely it is to acquire new connections
• Growth process produces power-law network
[Figure labels: ping, host cache, ping]

Gnutella and the bandwidth barrier
• file sharing w/o a central index
• queries broadcast to every node within radius ttl
• as network grows, encounter a bandwidth barrier (dial up modems cannot keep up with query traffic, fragmenting the network)
Clip 2 report, Gnutella: To the Bandwidth Barrier and Beyond http://www.clip2.com/gnutella.html#q17

Search with knowledge of 2nd neighbors
[Figure: power-law graph, number of nodes found: 94, 67, 63, 54, 2, 6, 1]
[Figure: Poisson graph, number of nodes found: 93, 19, 15, 11, 7, 3, 1]

Outline of search strategy
pass query onto only one neighbor at each step
OPTIONS
• requires that nodes sign query - avoid passing message onto a node twice
• requires knowledge of one’s neighbors’ degree - pass to the highest degree node
• requires knowledge of one’s neighbors’ neighbors - route to 2nd degree neighbors

Generating functions
M.E.J. Newman, S.H. Strogatz, and D.J. Watts, ‘Random graphs with arbitrary degree distributions and their applications’, PRE, cond-mat/0007235
Generating functions for degree distributions:
$G_0(x) = \sum_{k=0}^{\infty} p_k x^k$
Useful for computing moments of degree distribution, component sizes, and average path lengths

Fun with generating functions
normalization condition: probabilities sum to 1
$G_0(1) = \sum_{k=0}^{\infty} p_k = 1$
derivatives: the generating function contains all the information of the degree distribution
$p_k = \frac{1}{k!} \left. \frac{d^k G_0}{dx^k} \right|_{x=0}$

Fun with generating functions (cont’d)
Expected degree of a randomly chosen vertex
$\langle k \rangle = \sum_k k p_k = G_0'(1)$
Higher moments of degree distribution
$\langle k^n \rangle = \sum_k k^n p_k = \left[ \left( x \frac{d}{dx} \right)^n G_0(x) \right]_{x=1}$

Example: Poisson distribution
Let p = z/N be the probability of an edge existing between two vertices (z is the average degree)
$G_0(x) = \sum_{k=0}^{N} \binom{N}{k} p^k (1-p)^{N-k} x^k = (1 - p + px)^N \sim e^{z(x-1)}$ for large N
$G_0'(x) = z e^{z(x-1)}$, $G_0'(1) = z$
$p_k = \frac{1}{k!} \left. \frac{d^k G_0}{dx^k} \right|_{x=0} = \frac{1}{k!} z^k e^{-z}$
just the regular Poisson distribution

Introducing cutoffs
$k_{\max} \le N - 1$: a node cannot have more connections than there are other nodes
This is important for exponents close to 2
$\sum_{k=1}^{\infty} p_k = 1$ with $p_k = C k^{-t}$; for t = 2, $C = 6/\pi^2$
$p(k \ge 1000, t = 2) = \sum_{k=1000}^{\infty} p_k \sim 0.001$
Probability that none of the nodes in a 1,000 node graph has 1000 or more neighbors:
$(1 - p(k \ge 1000, t = 2))^{1000} \sim 0.36$
• without a cutoff, for t = 2 have > 50% chance of observing a node with more neighbors than there are nodes
• for t = 2.1, have a 25% chance

Selecting from a variety of cutoffs
1. $k_{\max} \sim N$
2. $p_k = C k^{-t} e^{-k/\kappa}$ (Newman et al.)
3. $p_k = C k^{-t}$ for $k \le (CN)^{1/t}$, 0 otherwise (Aiello et al.)
Generating function: $G_0(x) = C \sum_{k=1}^{(CN)^{1/t}} k^{-t} x^k$
[Figure: 1 million websites (~ 1997); proportion of sites w/ so many links vs. # of sites linking to the site]

Aiello’s ‘conservative’ vs.
Havlin’s ‘natural’ cutoff
• Aiello: cutoff where expected number of nodes of degree k is 1
  $n(k) = N p_k = 1$, i.e. $C k^{-t} N = 1$, so $k \sim N^{1/t}$
• Havlin: cutoff so that expected number of nodes of degree > k is 1
  $N \sum_{k=k_{\max}}^{\infty} p_k = 1$, i.e. $\sum_{k=k_{\max}}^{\infty} c k^{-t} \sim 1/N$, so $k_{\max}^{1-t} \sim N^{-1}$ and $k_{\max} \sim N^{1/(t-1)}$

The imposed cutoff can have a dramatic effect on the properties of the graph
[Figure: degrees drawn at random, for t = 2, and N = 1000]

Generating functions for degree distributions
Random graphs with arbitrary degree distributions and their applications, by Newman, Strogatz & Watts
• $p_k \sim k^{-t}$ is the probability that a randomly chosen vertex has degree k
• $G_0(x) = \sum_{k=0}^{\infty} p_k x^k$ is a generating function
• $\langle k \rangle = \sum_k k p_k = G_0'(1)$ is the expected degree of a randomly chosen vertex
• $G_1(x) = G_0'(x) / G_0'(1)$ is the distribution of remaining outgoing edges following an edge
• $z_2 = G_0'(1) G_1'(1)$ is the expected number of second degree neighbors, assuming neighbors don’t share edges
[Figure: example graph with node degrees labeled]

search with knowledge of first neighbors
Generating function with cutoff:
$G_0(x) = c \sum_{k=1}^{k_{\max}} k^{-t} x^k$
$\frac{\partial}{\partial x} G_0(x) = G_0'(x) = c \sum_{k=1}^{k_{\max}} k^{1-t} x^{k-1}$
Average degree of vertex:
$G_0'(1) = \langle k \rangle = c \sum_{k=1}^{k_{\max}} k^{1-t} \sim \int_1^{k_{\max}} k^{1-t}\,dk = \frac{1}{t-2} \left( 1 - k_{\max}^{2-t} \right)$
Average number of neighbors following an edge:
$G_1'(x) = \frac{1}{G_0'(1)} \frac{\partial}{\partial x} G_0'(x) = \frac{c}{G_0'(1)} \sum_{k=2}^{k_{\max}} k^{1-t} (k-1) x^{k-2}$
$G_1'(1) = \frac{1}{G_0'(1)} \cdot \frac{k_{\max}^{3-t}(t-2) - 2^{2-t}(t-1) + k_{\max}^{2-t}(3-t)}{(t-2)(3-t)}$
• the $2^{2-t}(t-1)$ term is constant in N
• for 2 < t < 3 and $k_{\max} \sim N^a$, the $k_{\max}^{2-t}(3-t)$ term decreases with N

search with knowledge of first neighbors (cont’d)
$z_{1B} = G_1'(1) \sim \frac{1}{G_0'(1)} \cdot \frac{k_{\max}^{3-t}}{3-t} = \frac{t-2}{1 - k_{\max}^{2-t}} \cdot \frac{k_{\max}^{3-t}}{3-t} \sim k_{\max}^{3-t}$
In the limit t -> 2, $G_1'(1) \sim k_{\max} / \log(k_{\max})$
Let’s for the moment ignore the fact that as we do a random walk, we encounter neighbors that we’ve seen before
s = number of steps = $N / z_{1B}$

Search time with different cutoffs
If $k_{\max} = N$:
• $s(t) \sim N / k_{\max}^{3-t} = N / N^{3-t} = N^{t-2}$, 2 < t < 3
• $s(2.1) \sim N^{0.1}$
• $s \sim N \log(k_{\max}) / k_{\max} \sim \log(N)$, t = 2
• grow from 1,000 to 1,000,000 nodes, search time increases by a factor of ~2
If $k_{\max} = N^{1/(t-1)}$:
• $s(t) \sim N / k_{\max}^{3-t} = N / N^{(3-t)/(t-1)} = N^{2(t-2)/(t-1)}$, 2 < t < 3
• $s(2.1) \sim N^{0.18}$
• $s(2) \sim N \log(k_{\max}) / k_{\max} \sim \log(N)$
• grow 1000x, search time increases 3x

search with knowledge of first neighbors (cont’d)
If $k_{\max} = N^{1/t}$:
• $s \sim N / k_{\max}^{3-t} = N / (N^{1/t})^{3-t} = N^{2 - 3/t}$, 2 < t < 3
So the best we can do is for exponents close to 2
2nd neighbor random walk, ignoring overlap:
• $z_{2B} = \left[ \frac{\partial}{\partial x} G_1(G_1(x)) \right]_{x=1} = [G_1'(1)]^2 = \left[ \frac{t-2}{1 - k_{\max}^{2-t}} \cdot \frac{k_{\max}^{3-t}}{3-t} \right]^2$
• $s\, z_{2B} \sim N$, so $S \sim N / z_{2B}$
• $S(N, t) \sim N^{3(1 - 2/t)}$
• $S(N, t = 2.1) \sim N^{0.15}$

Following the degree sequence
Go to highest degree node, then next highest, … etc.
• $z_{1D} = \int_{k_{\max}-a}^{k_{\max}} N k^{1-t}\,dk \sim N a\, k_{\max}^{1-t}$, where a ~ s = # of steps taken
2nd neighbors, ignoring overlap:
• $z_{1D} G_1'(1) \sim N a\, k_{\max}^{2(2-t)}$
• $s \sim k_{\max}^{2(t-2)} \sim N^{2 - 4/t}$
• $S_{deg}(N, t = 2.1) \sim N^{0.1}$

Ratio of the degree of a node to the expected degree of its highest degree neighbor for 10,000 node power-law graphs of varying exponents
[Figure: ratio of degree of neighbor to degree of node vs. degree of node; curves for t = 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75]

Exponents t close to 2 required to search effectively
• Gnutella
• World Wide Web, Social networks, t ~ 2.0-2.3, high degree nodes: directories, search engines
• AT&T call graph t ~ 2.1
• Actor collaboration graph (imdb database) t ~ 2.0-2.2
[Figure: number of actors/actresses vs. number of costars (log-log); actors, t = 2; actresses, t = 2.1]

Following the degree sequence
[Figure: example network with node labels 17, 18, 10, 5, 1, 6, 9, 8, 50]

Complications
• Should not visit same node more than once
• Many neighbors of current node being visited were also neighbors of previously visited nodes, and there is a bias toward high degree nodes being ‘seen’ over and over again
Status and degree of node visited
[Figure: degree of node vs. step; legend: not visited, visited, neighbors visited]

Progress of exploration in a 10,000 node graph knowing 2nd degree neighbors
• seeking high degree nodes speeds up the search process
• about 50% of a 10,000 node graph is explored in the first 12 steps
[Figure: proportion of nodes found at step vs. step (log-log); curves: random walk, degree sequence]
[Figure: cumulative nodes found at step vs. step; curves: random walk, degree sequence]

Scaling of search time with size of graph
[Figure: cover time for half the nodes vs. size of graph (log-log); random walk, a = 0.37 fit; degree sequence, a = 0.24 fit]

Comparison with a Poisson graph
$G_0(x) = e^{z(x-1)}$
$G_1(x) = \frac{1}{z} \frac{\partial}{\partial x} G_0(x) = G_0(x)$
• expected degree and expected degree following a link are equal
• scaling is linear
[Figure: degree of current node vs. step (log-log); curves: Poisson, power-law]
[Figure: cover time for 1/2 of graph vs. number of nodes in graph (log-log); constant av. deg. = 3.4; g = 1.0 fit]

Gnutella network
50% of the files in a 700 node network can be found in < 8 steps
[Figure: cumulative nodes found at step vs. step; curves: high degree seeking 1st neighbors, high degree seeking 2nd neighbors]

Required modifications to nodes
• Maintain a list of files in their neighborhood
• Check query against list.
• Periodically contact neighbors to maintain list
• Append ID to each query processed
Tradeoff storage/cpu (available) for bandwidth (limited)

Theory vs. reality:
• overloading high degree nodes, but no worse than original scenario where all nodes handle all traffic; assume high degree -> high bandwidth, so can carry the traffic load
• fewer nodes used for routing, system is more susceptible to malicious attack
Partial implementation:
• localized indexing
• traffic routed to high degree nodes

Clip2 Distributed Search Solutions http://dss.clip2.com © Clip2.com, Inc.
[Figure legend: Broadband user running Reflector; Broadband user running Gnutella; Dial-up user running Gnutella]

Connection-preferencing rules
LimeWire, BearShare: drop connections to unresponsive hosts
• drives slower hosts to have fewer connections & move to edge of network

Supernodes
Kazaa, BearShare defender, Morpheus SuperNodes
from Clip2: Morpheus out of the Underworld http://www.openp2p.com/pub/a/p2p/2001/07/02/morpheus.html

Freenet
• Queries are passed to one peer at a time.
• Queries routed to high degree nodes.
• Has a power-law topology
• Scales as $N^{0.275}$ with the size of the network, N.
Theodore Hong, ‘Performance’ chapter in O’Reilly’s “Peer-to-Peer, Harnessing the Power of Disruptive Technologies”
[Figure: Theodore Hong, power-law link distribution of a simulated Freenet network]
[Figure: Theodore Hong, scaling of mean search time on a simulated Freenet network]

Node specialization key to Freenet’s speed
• Each node forwards query to node with “closest” hash key
• Node passing back a match remembers the address the data came from
• Results in nodes developing a bias towards a part of the keyspace
• Queries are naturally routed to high degree nodes
• Use keys for orientation
[Figure: query ?356? routed among nodes with keys 112, 659, 356, 340, 388, 396, 135, 214]

Conclusions
• Search is faster and scales in power-law networks
• Networks intended to be searched, such as Gnutella, have a favorable P-L topology
• High degree strategy has partially been implemented in existing p2p clients, such as BearShare, Kazaa & Morpheus

Current research on search
• search in weighted networks
• expertise search
• P2P architectures with ‘friendship’ overlays
• weak ties vs. strong ties and online communication

A PL link distribution shortens the average shortest path
$z_r = \left( \frac{z_2}{z_1} \right)^{r-1} z_1 = a^{r-1} z_1$
Poisson: a = $z_1$
PL: a > $z_1$
[Figure: neighbors at radius vs. radius, and vs. N; curves: PL (power-law a = 2.5), PS (Poisson a = 1.0)]

What about the shortest path discovered along the way?
B.J. Kim et al.
‘Path finding strategies in scale-free networks’, PRE (65) 027103.
• each node passes message to highest degree neighbor it hasn’t passed the message to previously
• ‘cut off’ loops
[Figure: example path between nodes A and B]

A high degree seeking strategy finds shortest paths whose average scales logarithmically with the size of the graph
[Figure: av. path length found vs. N (log scale); PL high degree; fit 0.72*ln(N)]

Scaling of the path length found using a
• random strategy on a PL graph
• high-degree strategy on a Poisson graph
[Figure: av. path length found vs. N (log-log); curves: PL, Poisson; fits: $N^{0.46}$, $N^{0.48}$]

But…
Search costs are prohibitive, might as well do a BFS
[Figure: median search cost vs. N (log-log); curves: PL high degree, PL rand, Poisson high degree]
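
The high-degree rule described above (pass the message to the highest degree neighbor not yet used by this node, then cut off loops) could be sketched roughly as follows. This is only an illustration under stated assumptions: the graph is grown with a simple preferential attachment process, and all function names, the tie-breaking by node id, the shortcut of delivering directly when the target is a neighbor, and the choice to let a node start over once it has tried every neighbor are hypothetical details chosen here, not taken from the slides or from Kim et al.

```python
import random


def preferential_attachment_graph(n, m, seed=0):
    """Grow an undirected graph: each new node links to m existing nodes,
    chosen with probability proportional to their current degree.
    (Assumed generator, used only to get a broad degree distribution.)"""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    stubs = []  # one entry per edge endpoint, so choice() is degree-biased
    for u in range(m + 1):  # seed the growth with a small clique
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            stubs += [u, v]
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for u in targets:
            adj[v].add(u)
            adj[u].add(v)
            stubs += [u, v]
    return adj


def high_degree_search(adj, source, target, max_steps=1_000_000):
    """Walk from source to target; each node forwards to its highest degree
    neighbor that it has not forwarded to before. Returns the walk or None."""
    sent = {v: set() for v in adj}  # neighbors each node already forwarded to
    walk = [source]
    current = source
    while current != target:
        if len(walk) > max_steps or not adj[current]:
            return None
        if target in adj[current]:
            nxt = target  # assumption: a node recognizes the target among its neighbors
        else:
            untried = adj[current] - sent[current]
            if not untried:  # assumption: after trying everyone, start over
                sent[current].clear()
                untried = adj[current]
            nxt = max(untried, key=lambda v: (len(adj[v]), -v))
        sent[current].add(nxt)
        walk.append(nxt)
        current = nxt
    return walk


def cut_loops(walk):
    """Remove loops from a walk, leaving a simple path with the same endpoints."""
    path, pos = [], {}
    for v in walk:
        if v in pos:  # v seen before: drop everything walked since then
            for w in path[pos[v] + 1:]:
                del pos[w]
            del path[pos[v] + 1:]
        else:
            pos[v] = len(path)
            path.append(v)
    return path


if __name__ == "__main__":
    graph = preferential_attachment_graph(1000, 2, seed=42)
    walk = high_degree_search(graph, source=0, target=999)
    path = cut_loops(walk)
    print(len(walk) - 1, "steps walked;", len(path) - 1, "hops after cutting loops")
```

The number of steps walked corresponds to the search cost, while the length of the path left after cutting loops corresponds to the path length found; comparing the two on graphs of different sizes is one way to explore the contrast the last slides draw between short discovered paths and prohibitive search costs.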