School of Information University of Michigan SI 614 Search in random networks Lecture 16

Search in random networks
Motivation
Power-law (PL) networks, social and P2P
Analysis of scaling of search strategies in PL networks
Simulation
artificial power-law topologies, real Gnutella networks
Comparison with existing P2P search strategies
Reflector, Morpheus
Path finding
Directed Search
Freenet
How do we search?
"Who could introduce me to Richard Gere?"
[Figure: people labeled Mary, Bob, and Jane]
AT&T Call Graph
[Figure: # of telephone numbers from which calls were made vs. # of telephone numbers called. Aiello et al., STOC '00]
Gnutella network
power-law link distribution
[Figure: proportion of nodes vs. number of neighbors, log-log axes; data and power-law fit, t = 2.07. Summer 2000, data provided by Clip2]
Preferential attachment model
Nodes join at different times
The more connections a node has, the more likely it is to acquire
new connections
Growth process produces power-law network
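The growth process above can be sketched in a few lines of Python. This is a minimal illustration, not the model used to produce the lecture's figures: the function name `preferential_attachment`, the parameter `m` (links added per new node), and the seed clique are all assumptions made for the example.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a graph in which each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a clique of m + 1 nodes, so every node has degree >= m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges

edges = preferential_attachment(2000)
degree = Counter(v for e in edges for v in e)
print("mean degree:", 2 * len(edges) / len(degree))
print("max degree:", max(degree.values()))
```

Nodes that join early accumulate far more links than the average node, which is the heavy tail the slide refers to.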
[Figure labels: ping, host cache]
Gnutella and the bandwidth barrier
file sharing without a central index
queries broadcast to every node within radius ttl
-> as the network grows, it encounters a bandwidth barrier (dial-up modems cannot keep up with query traffic, fragmenting the network)
Clip 2 report
Gnutella: To the Bandwidth Barrier and Beyond
http://www.clip2.com/gnutella.html#q17
[Figure: nodes found while searching a power-law graph and a Poisson graph]
power-law graph, number of nodes found: 94, 67, 63, 54, 2, 6, 1
Poisson graph, number of nodes found: 93, 19, 15, 11, 7, 3, 1
Search with knowledge of 2nd neighbors
Outline of search strategy
pass the query on to only one neighbor at each step
OPTIONS
- avoid passing the message on to a node twice (requires that nodes sign the query)
- pass to the highest degree node (requires knowledge of one's neighbors' degrees)
- route to 2nd degree neighbors (requires knowledge of one's neighbors' neighbors)
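The options listed above can be combined into one short routine. The sketch below is an illustration under stated assumptions, not the simulation code behind the lecture: the function `search`, its parameters, and the toy adjacency dict `adj` are hypothetical, and the graph is assumed connected with no isolated nodes.

```python
import random

def search(adj, start, target, strategy="degree", seed=0, max_steps=10_000):
    """Pass a query to one neighbor per step until the target lies within
    two hops of the current node. Returns the number of steps taken,
    or None if max_steps is exhausted."""
    rng = random.Random(seed)
    visited = {start}   # the query is "signed" by every node it passes through
    current = start
    for step in range(max_steps + 1):
        # Knowledge of 2nd neighbors: the current node sees its
        # neighbors and its neighbors' neighbors.
        horizon = {current} | set(adj[current])
        for nb in adj[current]:
            horizon.update(adj[nb])
        if target in horizon:
            return step
        fresh = [nb for nb in adj[current] if nb not in visited]
        candidates = fresh or list(adj[current])   # dead end: allow a revisit
        if strategy == "degree":
            current = max(candidates, key=lambda v: len(adj[v]))
        else:
            current = rng.choice(candidates)
        visited.add(current)
    return None

# Toy graph: node 1 is a hub, and nodes 5, 6, 7 form a tail hanging off node 2.
adj = {0: [1], 1: [0, 2, 3, 4], 2: [1, 5], 3: [1], 4: [1],
       5: [2, 6], 6: [5, 7], 7: [6]}
print(search(adj, 0, 7, strategy="degree"))
print(search(adj, 0, 7, strategy="random"))
```

Setting `strategy="random"` gives the random-walk baseline, and dropping the two-hop `horizon` check to one hop would model first-neighbor knowledge only.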
Generating functions
• M.E.J. Newman, S.H. Strogatz, and D.J. Watts, 'Random graphs with arbitrary degree distributions and their applications', PRE, cond-mat/0007235
• Generating functions for degree distributions
  $G_0(x) = \sum_{k=0}^{\infty} p_k x^k$
• Useful for computing moments of the degree distribution, component sizes, and average path lengths
Fun with generating functions
• normalization condition: probabilities sum to 1
  $G_0(1) = \sum_{k=0}^{\infty} p_k = 1$
• derivatives: the generating function contains all the information of the degree distribution
  $p_k = \frac{1}{k!} \left. \frac{\partial^k G_0}{\partial x^k} \right|_{x=0}$
Fun with generating functions (cont'd)
• Expected degree of a randomly chosen vertex
  $\langle k \rangle = \sum_k k p_k = G_0'(1)$
• Higher moments of the degree distribution
  $\langle k^n \rangle = \sum_k k^n p_k = \left[ \left( x \frac{d}{dx} \right)^n G_0(x) \right]_{x=1}$
Example: Poisson distribution
• Let p = z/N be the probability of an edge existing between two vertices (z is the average degree)
  $G_0(x) = \sum_{k=0}^{N} \binom{N}{k} p^k (1-p)^{N-k} x^k = (1 - p + px)^N \approx e^{z(x-1)}$ for large N
  $G_0'(x) = z e^{z(x-1)}$
  $G_0'(1) = z$
  $p_k = \frac{1}{k!} \left. \frac{\partial^k G_0}{\partial x^k} \right|_{x=0} = \frac{z^k e^{-z}}{k!}$
  just the regular Poisson distribution
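The moment formulas can be checked numerically on the Poisson example. The sketch below represents a generating function by its coefficient list, so applying x d/dx multiplies the k-th coefficient by k, and evaluating at x = 1 sums the coefficients. The helper `moment`, the choice z = 3, and the truncation at k = 60 are assumptions made for the illustration.

```python
from math import exp, factorial

def moment(p, n):
    """<k^n> for a degree distribution p[k], computed as
    [(x d/dx)^n G0(x)] evaluated at x = 1."""
    coeffs = list(p)
    for _ in range(n):
        # x d/dx maps p_k x^k to k p_k x^k
        coeffs = [k * c for k, c in enumerate(coeffs)]
    return sum(coeffs)   # setting x = 1 sums the coefficients

z = 3.0
# Poisson p_k = z^k e^{-z} / k!, truncated where the tail is negligible.
p = [exp(-z) * z**k / factorial(k) for k in range(60)]

norm = moment(p, 0)     # G0(1), should be close to 1
mean = moment(p, 1)     # G0'(1), should be close to z
second = moment(p, 2)   # <k^2>, which for a Poisson distribution is z^2 + z
print(norm, mean, second)
```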
Introducing cutoffs
$k_{max} \le N - 1$: a node cannot have more connections than there are other nodes
This is important for exponents close to 2
Normalization: $\sum_{k=1}^{\infty} p_k = C \sum_{k=1}^{\infty} k^{-t} = 1$, so for t = 2, $C = 6/\pi^2$
$p(k \ge 1000, t = 2) = \sum_{k=1000}^{\infty} p_k \sim 0.001$
Probability that none of the nodes in a 1,000 node graph has 1000 or more neighbors:
$(1 - p(k \ge 1000, t = 2))^{1000} \sim 0.36$
without a cutoff, for t = 2, there is a > 50% chance of observing a node with more neighbors than there are nodes
for t = 2.1, there is a 25% chance
Selecting from a variety of cutoffs
1. $k_{max} = N$
2. $p_k = C k^{-t} e^{-k/\kappa}$ (Newman et al.)
3. $p_k = C k^{-t}$ for $k \le (CN)^{1/t}$, 0 otherwise (Aiello et al.)
Generating function: $G_0(x) = C \sum_{k=1}^{(CN)^{1/t}} k^{-t} x^k$
[Figure: 1 million websites (~1997); proportion of sites w/ so many links vs. # of sites linking to the site]
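Aiello’s ‘conservative’ vs. Havlin’s ‘natural’ cutoff
Conservative: cutoff where the expected number of nodes of degree k is 1
  $N p_k = 1$
  $C k^{-t} = N^{-1}$
  $k_{max} \sim N^{1/t}$
Natural: cutoff so that the expected number of nodes of degree > k is 1
  $N \sum_{k=k_{max}}^{\infty} p_k = 1$
  $c \int_{k_{max}}^{\infty} k^{-t} dk \sim N^{-1}$
  $k_{max}^{1-t} \sim N^{-1}$
  $k_{max} \sim N^{1/(t-1)}$
[Figure: sketches of n(k) vs. k marking each cutoff]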
The imposed cutoff can have a dramatic
effect on the properties of the graph
degrees drawn at random, for t = 2, and N = 1000
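To get a feel for how much the cutoff matters, degrees can be drawn at random under each cutoff and compared. This is a hedged sketch rather than the procedure used for the lecture's figure: `sample_degrees` is a hypothetical helper, and the three cutoff values are computed from the scaling forms on the preceding slides with all constants set to 1.

```python
import random

def sample_degrees(n, t, kmax, seed=0):
    """Draw n degrees from p_k proportional to k^(-t), 1 <= k <= kmax."""
    rng = random.Random(seed)
    ks = range(1, kmax + 1)
    weights = [k ** -t for k in ks]
    return rng.choices(ks, weights=weights, k=n)

N, t = 1000, 2.0
cutoffs = {
    "k_max = N": N,
    "k_max ~ N^(1/(t-1))": int(N ** (1 / (t - 1))),
    "k_max ~ N^(1/t)": int(N ** (1 / t)),
}
samples = {name: sample_degrees(N, t, kmax) for name, kmax in cutoffs.items()}
for name, degrees in samples.items():
    print(name, "max degree:", max(degrees), "mean degree:", sum(degrees) / N)
```

For t = 2 the first two cutoffs coincide at N, while the conservative cutoff caps degrees near the square root of N, which removes the hubs a high-degree search relies on.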
Generating functions for degree distributions
• Random graphs with arbitrary degree distributions and their applications, by Newman, Strogatz & Watts
$p_k \sim k^{-t}$ is the probability that a randomly chosen vertex has degree k
$G_0(x) = \sum_{k=0}^{\infty} p_k x^k$ is a generating function
$\langle k \rangle = \sum_k k p_k = G_0'(1)$ is the expected degree of a randomly chosen vertex
$G_1(x) = \frac{G_0'(x)}{G_0'(1)}$ is the distribution of remaining outgoing edges following an edge
$z_2 = G_0'(1) G_1'(1)$ is the expected number of second degree neighbors, assuming neighbors don’t share edges
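As a worked example of these definitions, the sketch below evaluates z1 = G0'(1) and z2 = G0'(1) G1'(1) directly from a list of degree probabilities, using G1'(1) = G0''(1) / G0'(1). The helper name `z1_z2` and the parameter choices (Poisson with z = 3; power law with t = 2.1 truncated at k = 1000) are assumptions for illustration only.

```python
from math import exp, factorial

def z1_z2(p):
    """p[k] is the probability of degree k.
    Returns (z1, z2) = (G0'(1), G0'(1) * G1'(1))."""
    z1 = sum(k * pk for k, pk in enumerate(p))
    # G1'(1) = G0''(1) / G0'(1) = sum_k k (k - 1) p_k / z1
    g1_prime = sum(k * (k - 1) * pk for k, pk in enumerate(p)) / z1
    return z1, z1 * g1_prime

z = 3.0
poisson = [exp(-z) * z**k / factorial(k) for k in range(60)]

t, kmax = 2.1, 1000
weights = [0.0] + [k ** -t for k in range(1, kmax + 1)]   # p_0 = 0
c = 1.0 / sum(weights)                                      # normalization constant
power_law = [c * w for w in weights]

z1_ps, z2_ps = z1_z2(poisson)
z1_pl, z2_pl = z1_z2(power_law)
print("Poisson:   z1 =", z1_ps, " z2/z1 =", z2_ps / z1_ps)
print("power law: z1 =", z1_pl, " z2/z1 =", z2_pl / z1_pl)
```

For the Poisson case z2/z1 equals z1, whereas for the power law the ratio is much larger than z1, because following an edge tends to lead to a high degree node.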
search with knowledge of first neighbors
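Generating function with cutoff:
  $G_0(x) = c \sum_{k=1}^{k_{max}} k^{-t} x^k$
  $G_0'(x) = \frac{\partial}{\partial x} G_0(x) = c \sum_{k=1}^{k_{max}} k^{1-t} x^{k-1}$
Average degree of vertex (constant in N):
  $G_0'(1) = \langle k \rangle = c \sum_{k=1}^{k_{max}} k^{1-t} \sim \int_1^{k_{max}} k^{1-t} dk = \frac{1}{t-2} \left( 1 - k_{max}^{2-t} \right)$
Distribution of remaining edges:
  $G_1(x) = \frac{G_0'(x)}{G_0'(1)} = \frac{c}{G_0'(1)} \sum_{k=1}^{k_{max}} k^{1-t} x^{k-1}$
  $\frac{\partial}{\partial x} G_1(x) = \frac{c}{G_0'(1)} \sum_{k=2}^{k_{max}} k^{1-t} (k-1) x^{k-2}$
Average number of neighbors following an edge:
  $G_1'(1) \sim \frac{1}{G_0'(1)} \frac{k_{max}^{3-t}(t-2) - 2^{2-t}(t-1) + k_{max}^{2-t}(3-t)}{(t-2)(3-t)}$
for 2 < t < 3 and $k_{max} \sim N^a$, the $k_{max}^{2-t}$ terms decrease with N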
search with knowledge of first neighbors
(cont’d)
$z_{1B} = G_1'(1)$
For 2 < t < 3:
  $G_1'(1) \sim \frac{1}{G_0'(1)} \frac{k_{max}^{3-t}}{(3-t)} = \frac{t-2}{1 - k_{max}^{2-t}} \frac{k_{max}^{3-t}}{(3-t)}$, so $G_1'(1) \sim k_{max}^{3-t}$
In the limit t -> 2:
  $G_1'(1) \sim \frac{k_{max}}{\log(k_{max})}$
Let’s for the moment ignore the fact that as we do a random walk, we encounter neighbors that we’ve seen before:
  s = number of steps = $\frac{N}{z_{1B}}$
Search time with different cutoffs
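If $k_{max} = N$:
  $s(t) \sim \frac{N}{k_{max}^{3-t}} = \frac{N}{N^{3-t}} = N^{t-2}$, 2 < t < 3
  $s(2.1) \sim N^{0.1}$
  $s(2) \sim \frac{N \log(k_{max})}{k_{max}} \sim \log(N)$, t = 2
If $k_{max} = N^{1/(t-1)}$:
  $s(t) \sim \frac{N}{k_{max}^{3-t}} = \frac{N}{N^{(3-t)/(t-1)}} = N^{2(t-2)/(t-1)}$, 2 < t < 3
  $s(2.1) \sim N^{0.18}$
  $s(2) \sim \frac{N \log(k_{max})}{k_{max}} \sim \log(N)$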
search with knowledge of first neighbors
(cont’d)
If $k_{max} = N^{1/t}$:
  $s \sim \frac{N}{k_{max}^{3-t}} = \frac{N}{(N^{1/t})^{3-t}} = N^{2-3/t}$, 2 < t < 3
So the best we can do is for exponents close to 2, where the exponent 2 - 3/t is smallest
2nd neighbor random walk, ignoring overlap:
  $z_{2B} = \left[ \frac{\partial}{\partial x} G_1(G_1(x)) \right]_{x=1} = \left[ G_1'(1) \right]^2 \sim \left[ \frac{t-2}{1 - k_{max}^{2-t}} \frac{k_{max}^{3-t}}{(3-t)} \right]^2$
  $S \sim \frac{N}{z_{2B}}$
  $S(N, t) \sim N^{3(1-2/t)}$
  $S(N, t = 2.1) \sim N^{0.15}$
Following the degree sequence
Go to the highest degree node, then the next highest, etc.
  $z_{1D} = \int_{k_{max}-a}^{k_{max}} N k^{1-t} dk \sim N a k_{max}^{1-t}$
  a ~ s = # of steps taken
2nd neighbors, ignoring overlap:
  $z_{1D} G_1'(1) \sim N a k_{max}^{2(2-t)}$
  $s \sim k_{max}^{2(t-2)} \sim N^{2-4/t}$
  $S_{deg}(N, t = 2.1) \sim N^{0.1}$
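The scaling exponents quoted on these slides follow from substituting each cutoff into s ~ N / k_max^(3-t). The sketch below does the arithmetic for 2 < t < 3. The function names and the cutoff labels ("N", "natural", "conservative") are hypothetical, introduced only for this example.

```python
def first_neighbor_exponent(t, cutoff):
    """Exponent b in s ~ N**b for a random walk with first-neighbor
    knowledge, from s ~ N / k_max**(3 - t) with k_max ~ N**e."""
    kmax_exponent = {
        "N": 1.0,                     # k_max = N
        "natural": 1.0 / (t - 1.0),   # k_max ~ N^(1/(t-1))
        "conservative": 1.0 / t,      # k_max ~ N^(1/t)
    }[cutoff]
    return 1.0 - (3.0 - t) * kmax_exponent

def second_neighbor_exponent(t):
    """Random walk with 2nd-neighbor knowledge, k_max ~ N^(1/t):
    S ~ N / k_max**(2 (3 - t)) = N**(3 (1 - 2/t))."""
    return 3.0 * (1.0 - 2.0 / t)

def degree_sequence_exponent(t):
    """Degree-sequence walk with 2nd-neighbor knowledge, k_max ~ N^(1/t):
    s ~ k_max**(2 (t - 2)) = N**(2 - 4/t)."""
    return 2.0 - 4.0 / t

t = 2.1
for cutoff in ("N", "natural", "conservative"):
    print(cutoff, round(first_neighbor_exponent(t, cutoff), 3))
print("2nd neighbors, random walk:", round(second_neighbor_exponent(t), 3))
print("2nd neighbors, degree sequence:", round(degree_sequence_exponent(t), 3))
```

At t = 2.1 the values land close to the exponents quoted above, and the degree-sequence strategy gives the smaller exponent of the two second-neighbor strategies.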
Ratio of the degree of a node to the expected degree of its highest degree neighbor for 10,000 node power-law graphs of varying exponents
[Figure: axis labels "degree of neighbor - 1", "degree of node"; x-axis: degree of node; one curve per exponent: t = 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75]
Exponents t close to 2 required to search effectively
Gnutella
World Wide Web, t ~ 2.0-2.3, high degree nodes: directories, search engines
Social networks:
  AT&T call graph, t ~ 2.1
  Actor collaboration graph (imdb database), t ~ 2.0-2.2
[Figure: number of actors/actresses vs. number of costars, log-log axes; actors, t = 2; actresses, t = 2.1]
Following the degree sequence
[Figure: example network with numbered nodes]
Complications
Should not visit same node more than once
Many neighbors of current node being visited
were also neighbors of previously visited
nodes, and there is a bias toward high degree
nodes being ‘seen’ over and over again
Status and degree of node visited
[Figure: degree of node vs. step; markers: not visited, visited, neighbors visited]
Progress of exploration in a 10,000 node graph knowing 2nd degree neighbors
[Figure: proportion of nodes found at step vs. step, log-log axes; random walk vs. degree sequence]
seeking high degree nodes speeds up the search process
[Figure: cumulative nodes found at step vs. step; random walk vs. degree sequence]
about 50% of a 10,000 node graph is explored in the first 12 steps
Scaling of search time with size of graph
[Figure: cover time for half the nodes vs. size of graph, log-log axes; random walk (a = 0.37 fit) and degree sequence (a = 0.24 fit)]
Comparison with a Poisson graph
$G_0(x) = e^{z(x-1)}$
$G_1(x) = \frac{G_0'(x)}{z} = G_0(x)$
expected degree and expected degree following a link are equal
scaling is linear
[Figure: degree of current node vs. step, log-log axes; Poisson vs. power-law]
[Figure: cover time for 1/2 of graph vs. number of nodes in graph, log-log axes; constant av. deg. = 3.4, g = 1.0 fit]
Gnutella network
50% of the files in a 700 node network can be found in < 8 steps
[Figure: cumulative nodes found at step vs. step; high degree seeking 1st neighbors vs. high degree seeking 2nd neighbors]
Required modifications to nodes
• Maintain a list of files in their neighborhood
• Check query against list
• Periodically contact neighbors to maintain list
• Append ID to each query processed
Tradeoff: storage/CPU (available) for bandwidth (limited)
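The four modifications listed above amount to a small amount of per-node state. The class below is a hypothetical sketch of that state, not Gnutella client code: `Node`, `refresh_index`, and `handle` are names invented for the example, and file names stand in for whatever a real client would index.

```python
class Node:
    def __init__(self, node_id, files):
        self.node_id = node_id
        self.files = set(files)
        self.neighbors = []
        self.neighborhood_index = {}   # file name -> id of a neighbor holding it

    def refresh_index(self):
        """Periodically contact neighbors to rebuild the list of nearby files."""
        self.neighborhood_index = {
            f: nb.node_id for nb in self.neighbors for f in nb.files
        }

    def handle(self, query, path):
        """Check a query against local files and the neighborhood list.
        `path` holds the IDs of nodes the query has passed through;
        this node appends its own ID before answering or forwarding."""
        path.append(self.node_id)
        if query in self.files:
            return self.node_id
        # None means no match nearby: the query should be forwarded.
        return self.neighborhood_index.get(query)

a = Node("a", ["notes.txt"])
b = Node("b", ["song.mp3"])
a.neighbors.append(b)
a.refresh_index()

path = []
print(a.handle("song.mp3", path), path)
```

The index costs storage and a periodic refresh, but a single message to a well-connected node then covers its whole neighborhood, which is the storage-for-bandwidth tradeoff named on the slide.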
Theory vs. reality:
• overloading high degree nodes, but no worse than the original scenario where all nodes handle all traffic; assume high degree -> high bandwidth, so they can carry the traffic load
• fewer nodes used for routing, so the system is more susceptible to malicious attack
Partial implementation:
• localized indexing
• traffic routed to high degree nodes
Clip2 Distributed Search Solutions
http://dss.clip2.com
[Figure (© Clip2.com, Inc.): broadband user running Reflector, broadband user running Gnutella, dial-up user running Gnutella]
Connection-preferencing rules
LimeWire, BearShare:
drop connections to unresponsive hosts
drives slower hosts to have fewer connections &
move to edge of network
Supernodes
Kazaa, BearShare defender, Morpheus SuperNodes
from Clip2: Morpheus out of the Underworld
http://www.openp2p.com/pub/a/p2p/2001/07/02/morpheus.html
Conclusions
Search is faster in power-law networks than in Poisson networks and scales sublinearly with network size
Networks intended to be searched, such as Gnutella,
have a favorable P-L topology
High degree strategy has partially been implemented in existing p2p
clients, such as BearShare, Kazaa & Morpheus
A PL link distribution shortens the average shortest path
$z_r = \left[ \frac{z_2}{z_1} \right]^{r-1} z_1 = a^{r-1} z_1$
Poisson: a = z_1
PL: a > z_1
[Figure: neighbors at radius vs. radius, log scale; legend: "power-law a = 2.5", "Poisson a = 1.0". Inset: a vs. N for PL and PS (Poisson)]
What about the shortest path discovered along the way?
B.J. Kim et al. ‘Path finding strategies in scale-free networks’, PRE (65) 027103.
each node passes the message to the highest degree neighbor it hasn’t passed the message to previously
‘cut off’ loops
[Figure: path found between nodes A and B]
A high degree seeking strategy finds shortest paths whose
average scales logarithmically with the size of the graph
[Figure: av. path length found vs. N (log scale); PL high degree, with 0.72*ln(N) fit]
Scaling of the path length found using a
• random strategy on a PL graph
• high-degree strategy on a Poisson graph
[Figure: av. path length found vs. N, log-log axes; PL and Poisson curves, with fit labels N^0.46 and N^0.48]
But…
Search costs are prohibitive, might as well do a BFS
[Figure: median search cost vs. N, log-log axes; PL high degree, PL rand, Poisson high degree]
Freenet
Queries are passed to one peer at a time.
Queries routed to high degree nodes.
Has a power-law topology
Theodore Hong, ‘Performance’ chapter in O’Reilly’s “Peer-to-Peer: Harnessing the Power of Disruptive Technologies”
Scales as N^0.275 with the size of the network, N.
[Figure: power-law link distribution of a simulated Freenet network (Theodore Hong)]
[Figure: scaling of mean search time on a simulated Freenet network (Theodore Hong)]
Node specialization key to Freenet’s speed
Each node forwards query to node with “closest” hash key
Node passing back a match remembers the address the
data came from
Results in nodes developing a bias towards a part of the
keyspace
[Figure: routing of a query "?356?" among nodes labeled 112, 659, 356, 340, 388, 396, 135, 214]
Queries are naturally routed to high degree nodes
Use keys for orientation
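The forwarding rule described on this slide can be illustrated with a toy routing table. This is a sketch under loud assumptions, not Freenet's implementation: keys are treated as plain integers, "closest" is taken to mean smallest absolute difference, and `next_hop`, `remember`, and the node addresses are hypothetical.

```python
def next_hop(routing_table, query_key):
    """routing_table maps a known key to the address of the node that
    data for that key came from. Returns the address stored under the
    key closest to query_key (assumed metric: absolute difference)."""
    closest = min(routing_table, key=lambda known: abs(known - query_key))
    return routing_table[closest]

def remember(routing_table, key, source_address):
    """When a match is passed back, record where the data came from."""
    routing_table[key] = source_address

table = {340: "node-1", 388: "node-2", 135: "node-3"}
print(next_hop(table, 356))   # 340 is the nearest known key
remember(table, 356, "node-4")
print(next_hop(table, 356))   # an exact entry now exists
```

Because every successful reply adds an entry near the requested key, a node that answers many queries in one region of the keyspace attracts more queries for that region, which is the specialization the slide describes.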
Applications to peer to peer networks
• Adriana Iamnitchi, Matei Ripeanu, Ian Foster, “Small-World File-Sharing Communities”, http://arxiv.org/abs/cs.DC/0307036
  create localized indices for peers with similar download patterns
• Foreseer: proposed P2P architecture with friend & neighbor overlay
  friend: has shared a file
  neighbor: short ping time
• Fletcher, George, Sheth, Hardik and Börner, Katy. (2004). Unstructured Peer-to-Peer Networks: Topological Properties and Search Performance. Third International Joint Conference on Autonomous Agents and Multi-Agent Systems. W6: Agents and Peer-to-Peer Computing, Moro, Gianluca, Bergamaschi, Sonia and Aberer, Karl, Eds., New York, July 19-23, pp. 2-13.
  http://ella.slis.indiana.edu/~katy/paper/04-fletcher.pdf
How do networks become navigable?
Aaron Clauset and Cris Moore
arxiv.org/abs/cond-mat/0309415
In the limit N -> ∞, the long range link distribution becomes 1/r, where r = lattice distance between nodes