School of Information University of Michigan SI 614 Small Worlds Lecture 5 Instructor: Lada Adamic.
School of Information
University of Michigan
SI 614
Small Worlds
Lecture 5
Instructor: Lada Adamic
Outline
Milgram’s small world experiment
Watts & Strogatz small world model
Kleinberg small world model
Watts, Dodds & Newman community model
Network models: a few examples
Things for Lada to remember:
Online survey for short profiles
Email in groups by Monday Feb. 6th
Cormen chapter now available as PDF on cTools
PS 1 graded
PS 3 available by tomorrow
Small world experiments then
[Figure: map with Nebraska (NE) and Massachusetts (MA) marked]
Milgram’s experiment (1960’s):
Given a target individual and a particular property, pass the message to a
person you correspond with who is “closest” to the target.
Milgram’s small world experiment
Target person worked in Boston as a stockbroker.
296 senders from Boston and Omaha.
20% of senders reached target.
average chain length = 6.5.
“Six degrees of separation”
Small world experiments now
email experiment
Dodds, Muhamad, Watts,
Science 301, (2003)
18 targets
13 different countries
60,000+ participants
24,163 message chains
384 reached their targets
average path length 4.0
image by Stephen G. Eick
http://www.bell-labs.com/user/eick/index.html
(unrelated to small world experiment…)
Targets for the small world experiment at Columbia
a professor at an Ivy League university,
an archival inspector in Estonia,
a technology consultant in India,
a policeman in Australia,
a veterinarian in the Norwegian army.
no chain reached the target in Croatia
Accounting for attrition
Participation rate of approximately 37%.
Probability of a chain of length 10 getting through:
$0.37^{10} \approx 5 \times 10^{-5}$
so only one out of 20,000 chains would make it
actual # of completed chains: 384 (1.6% of all chains).
Small changes in attrition rates lead to large changes in
completion rates
e.g., a 15% decrease in attrition rate would lead to an
800% increase in completion rate
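The attrition arithmetic above can be checked with a short sketch. It assumes every hop forwards the message independently with the same probability, that all chains have length 10, and that the "15% decrease" is relative (attrition 0.63 becomes 0.63 × 0.85). These are simplifying assumptions for illustration, so the resulting ratio need not match the slide's 800% figure exactly. The function name is hypothetical.

```python
def completion_probability(participation_rate: float, chain_length: int) -> float:
    """Probability that all `chain_length` hops forward the message,
    assuming each hop participates independently at the same rate."""
    return participation_rate ** chain_length


# Baseline from the slide: 37% participation, chain of length 10.
base = completion_probability(0.37, 10)

# Assumed reading of "15% decrease in attrition": 0.63 -> 0.63 * 0.85.
improved_rate = 1 - 0.63 * 0.85
improved = completion_probability(improved_rate, 10)
ratio = improved / base

print(f"baseline completion probability: {base:.2e}")
print(f"improved completion probability: {improved:.2e}")
print(f"ratio improved / baseline: {ratio:.1f}")
```

Because the per-hop rate is raised to the tenth power, a modest change in attrition is amplified roughly tenfold in relative terms, which is the point of the slide.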
Estimating ‘recovered’ chain lengths for uncompleted
chains
<L> = 4.05 for all completed chains
L* = estimated ‘true’ median chain length
Intra-country chains: L* = 5
Inter-country chains: L* = 7
All chains: L* = 7
Milgram: L* ~ 8-9 hops
Attrition rate stays approx. constant throughout
$r_L$: probability of not passing on the message at distance L from the source
[Figure: $r_L$ versus L, showing the average and its 95% confidence interval]
Estimated ‘recovered’ chain lengths
[Figure: histograms of observed chain lengths and ‘recovered’ path lengths, for inter-country and intra-country chains]
Small world experiment at Columbia
Successful chains disproportionately used
• weak ties (Granovetter)
• professional ties (34% vs. 13%)
• ties originating at work/college
• target's work (65% vs. 40%)
. . . and disproportionately avoided
• hubs (8% vs. 1%) (+ no evidence of funnels)
• family/friendship ties (60% vs. 83%)
Strategy: Geography -> Work
How many hops actually separate any two
individuals in the world?
Participants are not perfect in routing messages
They use only local information
“The accuracy of small world chains in social networks”
Peter D. Killworth, Chris McCarty, H. Russell Bernard & Mark House:
Analyze 10920 shortest path connections between 105 members of
an interviewing bureau,
together with the equivalent conceptual, or ‘small world’ routes,
which use individuals’ selections of intermediaries.
This permits the first study of the impact of accuracy within small
world chains.
The mean small world path length (3.23) is 40% longer than the
mean of the actual shortest paths (2.30)
Model suggests that people make a less than optimal small world
choice more than half the time.
Why study small world phenomena?
Curiosity:
Why is the world small?
How are people able to route messages?
Social Networking as a Business:
Friendster, Orkut, MySpace
LinkedIn, Spoke, VisiblePath
Six degrees of separation - to be expected
Pool and Kochen (1978) - average person has 500-1500
acquaintances
Ignoring clustering (the probability that my friend’s friend is not
someone unknown to me, but is actually my friend…)
~ $10^3$ first neighbors, $10^6$ second neighbors, $10^9$ third neighbors
Since the number of neighbors grows exponentially with distance
(measured in hops traversed in a breadth-first search)
Connected random networks have short average path lengths:
$\langle d_{AB} \rangle \sim \log(N)$
N = population size,
$d_{AB}$ = distance between nodes A and B.
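A rough numerical reading of this argument: if each person has k acquaintances and clustering is ignored, about k^d people are within d hops, so reaching a population N takes about ln(N)/ln(k) hops. The sketch below evaluates that estimate. The population of 6 × 10^9 is an assumed round number, not a value from the slides, and the function name is hypothetical.

```python
import math


def estimated_hops(population: float, acquaintances: float) -> float:
    """Hops d at which acquaintances**d reaches population, i.e. ln(N)/ln(k).
    Ignores clustering, so it is an optimistic lower estimate."""
    return math.log(population) / math.log(acquaintances)


WORLD_POPULATION = 6e9  # assumed round figure, for illustration only

for k in (500, 1000, 1500):  # acquaintance range quoted from Pool and Kochen
    print(f"k = {k}: about {estimated_hops(WORLD_POPULATION, k):.2f} hops")
```

The estimate changes only slightly across the whole range of k, since it depends on k only through its logarithm.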
But: social networks aren't random…
Reverse small world experiment
Killworth & Bernard (1978):
Given hypothetical targets (name, occupation, location, hobbies, religion…)
participants choose an acquaintance for each target
Acquaintance chosen based on
(most often) occupation, geography
only 7% because they “know a lot of people”
Simple greedy algorithm: most similar acquaintance
two-step strategy rare
The small world model
High clustering: my friends’ friends tend to be my friends
Watts & Strogatz (1998) - a few random links in an otherwise clustered
graph give an average shortest path close to that of a random graph
Networks in nature (empirical observations)
neural network of C. elegans,
semantic networks of languages,
actor collaboration graph,
food webs.
$l_{\text{network}} \approx \ln(N)$
$C_{\text{network}} \gg C_{\text{random graph}}$
Model proposed
Crossover from regular lattices to random graphs
Tunable
Small world network with (simultaneously):
Small average shortest path
Large clustering coefficient (not the case for random graphs)
Two ways of constructing a small world graph
Select a fraction p of edges; reposition one of their endpoints
Add a fraction p of additional edges, leaving the underlying lattice intact
As in many network generating algorithms
Disallow self-edges
Disallow multiple edges
Original model
Each node has K>=4 nearest neighbors (local)
Probability p of rewiring to randomly chosen nodes
p small: regular lattice
p large: classical random graph
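The rewiring construction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the far endpoint of each lattice edge is moved with probability p to a uniformly chosen node, self-edges and multiple edges are disallowed, and path lengths are averaged over reachable pairs only. All function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, k, p, rng):
    """With probability p, move the far endpoint of each lattice edge to a
    random node, disallowing self-edges and multiple edges."""
    n = len(adj)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                choices = [v for v in range(n) if v != i and v not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].remove(j)
                    adj[j].remove(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def clustering(adj):
    """Average over nodes of (links among neighbors) / (possible links)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean breadth-first-search distance over all reachable ordered pairs."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(200, 4), 4, p, rng)
    print(p, round(clustering(g), 3), round(average_path_length(g), 2))
```

Running it for a few values of p gives a feel for how clustering and path length respond to rewiring; the exact numbers depend on the random seed and on the small graph size chosen here.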
p=0 Ordered lattice
Compute the clustering coefficient as follows
each node is connected to K neighbors, who can have K*(K-1)/2
pairwise connections between them
some of the connections between them are present in the lattice
If K = 4 (connected to two closest neighbors on each side), 3 of the 4*3/2 = 6 possible connections are present:
C = 3/6 = 1/2
Caution: sometimes the lattice will be specified as
each node connects to K closest neighbors
each node connects to all neighbors within distance k (k = K/2)
Clustering coefficient for regular lattice
In general, can have any K
a neighbor K/2 hops away from i can connect to (K/2 – 1) of i’s neighbors
a neighbor K/2 – 1 hops away can connect to (1 + K/2 – 1) neighbors
a neighbor K/2 – 2 hops away can connect to (2 + K/2 – 1) neighbors
...
a neighbor 1 hop away can connect to 2*(K/2 – 1) neighbors
Sum this up
multiply by a factor of 2 because i has neighbors on both sides
divide by a factor of 2 because edges are undirected
Clustering coefficient for regular lattice
The number of connections between neighbors is given by
$\sum_{j=0}^{K/2-1} \left( \frac{K}{2} - 1 + j \right) = \frac{3}{8} K (K-2)$
The maximum number of connections is K*(K-1)/2
→ clustering coefficient is
$C = \frac{3(K-2)}{4(K-1)}$
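The closed form C = 3(K-2) / (4(K-1)) can be checked by brute-force counting. The sketch below places the K neighbors of a node at ring offsets -K/2..K/2 (excluding 0) and counts pairs that are themselves within K/2 of each other. It assumes the ring is large enough that there is no wrap-around among the neighbors; the function names are hypothetical.

```python
def lattice_neighbor_links(K):
    """Count links among the K ring-lattice neighbors of a node by enumeration.
    Two neighbors at offsets a < b are linked when b - a <= K/2."""
    half = K // 2
    offsets = [d for d in range(-half, half + 1) if d != 0]
    return sum(1 for a in offsets for b in offsets if a < b and b - a <= half)


def lattice_clustering_formula(K):
    """Closed form from the slide: C = 3(K-2) / (4(K-1))."""
    return 3 * (K - 2) / (4 * (K - 1))


for K in (4, 6, 8, 10):
    links = lattice_neighbor_links(K)
    counted = links / (K * (K - 1) / 2)
    print(K, links, round(counted, 4), round(lattice_clustering_formula(K), 4))
```

For each K the counted ratio and the formula agree, and the number of links equals 3K(K-2)/8.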
Average shortest path – regular lattice
Average node is N/4 hops away (a quarter of the way
around the ring), and you can hop over K/2 nodes at a
time
$l \approx \frac{N}{2K} \gg 1$
p=1 Random graph
We’ll talk more about this next week
$l \approx \frac{\ln N}{\ln K}$ (small)
$C \approx \frac{K}{N}$ (small)
There are an average of K links per node.
The probability that any two nodes are connected is p = K/N.
The probability that two nodes which share a neighbor in common
are connected themselves is the same as for any two random nodes: K/N
(actually (K-1)/N, because they have already expended one edge on their
common neighbor).
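To see how far apart the two extremes are, the sketch below evaluates the estimates for the regular lattice (l ≈ N/2K, C = 3(K-2)/(4(K-1))) and for the random graph (l ≈ ln N / ln K, C ≈ K/N) side by side. The values N = 1000 and K = 10 are chosen only for illustration, and the function names are hypothetical.

```python
import math


def lattice_estimates(N, K):
    """(average path length, clustering) estimates for the ring lattice, p = 0."""
    return N / (2 * K), 3 * (K - 2) / (4 * (K - 1))


def random_graph_estimates(N, K):
    """(average path length, clustering) estimates for the random graph, p = 1."""
    return math.log(N) / math.log(K), K / N


N, K = 1000, 10  # illustrative values, not from the slides
l_lat, c_lat = lattice_estimates(N, K)
l_rnd, c_rnd = random_graph_estimates(N, K)
print(f"lattice: l ~ {l_lat:.1f}, C ~ {c_lat:.3f}")
print(f"random:  l ~ {l_rnd:.1f}, C ~ {c_rnd:.3f}")
```

The lattice has both a long path length and high clustering, the random graph has both small; the small world regime combines the short paths of one with the high clustering of the other.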
What happens in between?
Small shortest path means small clustering?
Large shortest path means large clustering?
Through numerical simulation
As we increase p from 0 to 1
Fast decrease of mean distance
Slow decrease in clustering
Change in clustering coefficient and average path length
as a function of the proportion of rewired edges
C(p)/C(0): exact analytical solution
l(p)/l(0): no exact analytical solution
1% of links rewired
10% of links rewired
Clustering coefficient for SW model with rewiring
The probability that a connected triple stays connected after rewiring:
probability that none of the 3 edges were rewired: $(1-p)^3$
probability that edges were rewired back to each other: very small, can ignore
Clustering coefficient: $C(p) = C(p=0) \, (1-p)^3$
[Figure: C(p)/C(0) versus p, both axes from 0 to 1]
Clustering coefficient: addition of random
edges
How does C depend on p?
C’(p) = 3 × number of triangles / number of connected triples
C’(p) computed analytically for the small world model without rewiring (k = K/2)
$C'(p) = \frac{3(k-1)}{2(2k-1) + 4kp(p+2)}$
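The two clustering expressions, C(p) = C(0)(1-p)^3 for the rewiring variant and C’(p) = 3(k-1) / (2(2k-1) + 4kp(p+2)) for the variant that only adds edges, can be tabulated side by side. The sketch assumes k = K/2 (neighbors on each side), so that both reduce to 3(K-2)/(4(K-1)) at p = 0; the function names are hypothetical.

```python
def rewired_clustering(K, p):
    """Rewiring variant: C(p) = C(0) * (1 - p)**3, with C(0) = 3(K-2)/(4(K-1))."""
    c0 = 3 * (K - 2) / (4 * (K - 1))
    return c0 * (1 - p) ** 3


def shortcut_clustering(k, p):
    """Edge-addition variant: C'(p) = 3(k-1) / (2(2k-1) + 4kp(p+2)),
    where k is the number of lattice neighbors on each side (k = K/2)."""
    return 3 * (k - 1) / (2 * (2 * k - 1) + 4 * k * p * (p + 2))


K = 4  # so k = K/2 = 2
for p in (0.0, 0.01, 0.1, 0.5, 1.0):
    print(p, round(rewired_clustering(K, p), 4), round(shortcut_clustering(K // 2, p), 4))
```

Both start at 1/2 for K = 4 and fall slowly for small p, which is the "slow decrease in clustering" noted earlier.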
[Figure: C’(p) versus p, for p from 0 to 1]
Degree distribution
p=0 delta-function
p>0 broadens the distribution
Edges left in place with probability (1-p)
Edges rewired towards i with probability 1/N
Model: small world with probability p of rewiring
Why does each node keep at least K/2 links?
Even at p = 1, graph is not a purely random graph
[Figure: degree distributions for 1000 vertices, compared with a random network with average connectivity K]
visit nodes sequentially and rewire links
exponential decay, all nodes have similar number of links
Some examples for real networks (in averages)

Network | Size | Vertex degree | Shortest path | Shortest path in fitted random graph | Clustering (# triangles) | Clustering (averaged over vertices) | Clustering in random graph
Film actors | 225,226 | 61 | 3.65 | 2.99 | 0.20 | 0.79 | 0.00027
MEDLINE coauthorship | 1,520,251 | 18.1 | 4.6 | 4.91 | 0.45 | 0.56 | 1.8 x 10^-4
E. coli substrate graph | 282 | 7.35 | 2.9 | 3.04 | – | 0.32 | 0.026
C. elegans | 282 | 14 | 2.65 | 2.25 | – | 0.28 | 0.05
What if long range links depend on distance?
“The geographic movement of the [message] from Nebraska to
Massachusetts is striking. There is a progressive closing in on the
target area as each new person is added to the chain”
S. Milgram, ‘The small world problem’, Psychology Today 1, 61, 1967
[Figure: map with Nebraska (NE) and Massachusetts (MA) marked]
Kleinberg’s geographical small world model
nodes are placed on a lattice and
connect to nearest neighbors
additional links placed with $p_{uv} \sim d_{uv}^{-r}$
Kleinberg, ‘The Small World Phenomenon, An Algorithmic Perspective’
(Nature 2000)
No locality: when r = 0, links are randomly distributed, ASP ~ log(n), n size of grid; $p \sim p_0$
Links highly localized, as on a lattice: $p \sim 1/d^4$
Links balanced between long and short range: $p \sim 1/d^2$
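A small sketch of the geographical model with greedy routing follows. It assumes an n × n grid with Manhattan distance, exactly one long-range contact per node drawn with probability proportional to d^(-r), and a greedy rule that forwards to whichever known contact is closest to the target. The grid size, number of trials, and function names are illustrative assumptions, and a grid this small will not show the asymptotic scaling cleanly.

```python
import random


def manhattan(a, b):
    """Lattice distance between grid points a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def long_range_contacts(n, r, rng):
    """Give every node of an n x n grid one long-range contact v, chosen
    with probability proportional to d(u, v) ** -r."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    contacts = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        contacts[u] = rng.choices(others, weights=weights, k=1)[0]
    return contacts


def greedy_route(source, target, contacts, n):
    """Forward to the lattice neighbor or long-range contact closest to the
    target, using only local information; return the number of hops."""
    current, hops = source, 0
    while current != target:
        x, y = current
        options = [(x + dx, y + dy)
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                   if 0 <= x + dx < n and 0 <= y + dy < n]
        options.append(contacts[current])
        current = min(options, key=lambda v: manhattan(v, target))
        hops += 1
    return hops


n = 15
rng = random.Random(0)
for r in (0, 2, 4):
    contacts = long_range_contacts(n, r, rng)
    trials = []
    for _ in range(200):
        s = (rng.randrange(n), rng.randrange(n))
        t = (rng.randrange(n), rng.randrange(n))
        trials.append(greedy_route(s, t, contacts, n))
    print(f"r = {r}: mean greedy hops = {sum(trials) / len(trials):.2f}")
```

Greedy routing always terminates here because some lattice neighbor is strictly closer to the target at every step, so the hop count never exceeds the lattice distance.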
How the small world phenomenon arises
$\lambda^2 |R| < |R'| < \lambda |R|$
[Figure: groups R and R’, with source S and target T]
k = c log2n
calculate probability that s fails to have a link in R’
Hierarchical network models:
Individuals classified into a hierarchy (figure: tree of height h with branching ratio b = 3),
$h_{ij}$ = height of the least common ancestor.
$p_{ij} \sim b^{-\alpha h_{ij}}$
e.g. state-county-city-neighborhood
industry-corporation-division-group
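The hierarchical distance can be made concrete with a complete b-ary tree whose leaves are the individuals. The sketch below computes the height of the least common ancestor of two leaves and the corresponding unnormalized link weight b^(-α h). Numbering the leaves left to right from 0 and leaving the weight unnormalized are assumptions made for illustration; the function names are hypothetical.

```python
def lca_height(i, j, b):
    """Height of the least common ancestor of leaves i and j in a complete
    b-ary tree whose leaves are numbered left to right starting at 0."""
    h = 0
    while i != j:
        i //= b  # move both leaves up one level
        j //= b
        h += 1
    return h


def link_weight(i, j, b, alpha):
    """Unnormalized link probability b ** (-alpha * h_ij)."""
    return b ** (-alpha * lca_height(i, j, b))


b = 3
for i, j in ((0, 1), (0, 3), (0, 9)):
    print(i, j, lca_height(i, j, b), link_weight(i, j, b, alpha=1.0))
```

With b = 3, leaves 0 and 1 share a parent (height 1), leaves 0 and 3 meet one level higher, and each extra level divides the link weight by b^α.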
Group structure models:
Individuals belong to nested groups
q = size of smallest group that v,w belong to
$f(q) \sim q^{-\alpha}$
Kleinberg, ‘Small-World Phenomena and the Dynamics of Information’ NIPS 14, 2001
Identity and search in social networks
Watts, Dodds, Newman (Science,2001)
individuals belong to hierarchically nested groups
$p_{ij} \sim \exp(-\alpha x)$
multiple independent hierarchies h=1,2,..,H
coexist corresponding to occupation,
geography, hobbies, religion…
Other generative models
Assign properties to nodes (e.g. spatial location, group
membership)
Add or rewire links according to some rule
optimize for a particular property (simulated annealing)
add links with probability depending on property of existing
nodes, edges (preferential attachment, link copying)
simulate nodes as agents ‘deciding’ whether to rewire or add
links
Example: trade-off between wiring and connectivity
Small worlds: How and Why, Nisha Mathias and Venkatesh Gopal
E is the ‘energy’ cost we are trying to minimize
L is the average shortest path in ‘hops’
W is the total length of wire used
Network configuration: rewire using simulated annealing
sequence is shown in order of increasing λ
Small worlds: the how and the why
same networks, but the
vertices are allowed to
move using a spring
layout algorithm
wiring cost associated
with the physical
distance between nodes
Shape and efficiency in spatial distribution networks
Michael Gastner & Mark Newman
(a) Commuter rail network in the Boston area. The arrow
marks the assumed root of the network.
(b) Star graph.
(c) Minimum spanning tree.
(d) The model of Eq. (3) applied to the same set of
stations.
Assign an effective cost to each edge
physical distance
number of hops
λ incorporates a person’s preference for short distances
or a small number of hops
car travel: short distance
airplane travel: small number of hops, sometimes at the expense
of total distance
Construct network using simulated annealing
slide by Mark Newman
Roads
Air routes
How do networks become navigable?
Aaron Clauset and Christopher Moore
arxiv.org/abs/cond-mat/0309415
• start with a 1-D lattice (a ring)
• each node is connected to its 2 nearest neighbors and has one long range link (initially just a self-loop)
• we start going from x to y, but go no more than a certain threshold # of steps, which represents how many hops we think getting to y should take
• if we give up, we rewire x’s long range link to the last node we reached
• In the limit N -> ∞:
• long range link distribution becomes 1/r, r = lattice distance between nodes
• search time starts scaling as log(N)
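The rewiring process in the bullets above can be sketched as a small simulation. It makes several simplifying assumptions that are not stated on the slide: the threshold is a fixed number of steps, source and target are drawn uniformly at random, and routing is greedy over ring neighbors plus the current long-range link. The parameter values and function names are hypothetical, and a run this small only illustrates the mechanics, not the limiting behavior.

```python
import random


def ring_distance(a, b, n):
    """Lattice distance between nodes a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def greedy_walk(x, y, long_link, n, max_steps):
    """Greedy routing from x toward y using the two ring neighbors and the
    long-range link; gives up after max_steps. Returns the last node reached."""
    current = x
    for _ in range(max_steps):
        if current == y:
            break
        options = [(current - 1) % n, (current + 1) % n, long_link[current]]
        current = min(options, key=lambda v: ring_distance(v, y, n))
    return current


def rewire_for_navigability(n, threshold, rounds, rng):
    """Repeatedly route between random pairs; when a walk gives up, rewire the
    source's long-range link to the last node reached."""
    long_link = list(range(n))  # initially every long-range link is a self-loop
    for _ in range(rounds):
        x, y = rng.randrange(n), rng.randrange(n)
        last = greedy_walk(x, y, long_link, n, threshold)
        if last != y:  # gave up before reaching y
            long_link[x] = last
    return long_link


n = 200
links = rewire_for_navigability(n, threshold=10, rounds=20000, rng=random.Random(0))
lengths = [ring_distance(u, v, n) for u, v in enumerate(links)]
print("mean long-range link length:", sum(lengths) / n)
```

Inspecting the distribution of `lengths` after many rounds is one way to compare against the 1/r form quoted above.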
Summary
The world is small!
Watts & Strogatz came up with a simple model to explain
why
Later, more sophisticated models of social structure were
developed
There are many, many more models that can be thought
up and that give useful insights