School of Information University of Michigan SI 614 Random graphs & power law networks preferential attachment Lecture 7 Instructor: Lada Adamic.

School of Information
University of Michigan
SI 614
Random graphs & power law networks
preferential attachment
Lecture 7
Instructor: Lada Adamic
Outline
 Erdos-Renyi random graphs
 BA model
 scale free networks in Pajek
 modifications of preferential attachment
 other processes that lead to power law networks
 randomizing networks but preserving network properties
 assortative mixing
Simplest random network
 Erdos-Renyi model: randomly draw E edges between N nodes
 Conserves only the average number of neighbors (connectivity) of a node: $\langle k \rangle = 2E/N = p(N-1) \approx pN$, where p is the probability that a given pair of nodes is connected
 No hubs! Narrow distribution of connectivities (Poisson distribution)
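A minimal sketch of the "randomly draw E edges between N nodes" recipe, assuming a simple undirected graph (no self-loops, no repeated edges). The function names, the 0-based node labels and the sizes used below are illustrative choices, not from the lecture.

```python
import random

def erdos_renyi_edges(n, e, seed=None):
    """Draw e distinct undirected edges uniformly at random among n nodes."""
    if e > n * (n - 1) // 2:
        raise ValueError("too many edges for a simple graph on n nodes")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < e:
        u, v = rng.sample(range(n), 2)       # two distinct nodes
        edges.add((min(u, v), max(u, v)))    # store each pair once
    return sorted(edges)

def degrees(n, edges):
    """Degree of every node, as a list indexed by node label."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

edges = erdos_renyi_edges(n=1000, e=2500, seed=1)
deg = degrees(1000, edges)
mean_k = sum(deg) / len(deg)   # 2E/N by construction
max_k = max(deg)               # stays near the mean: no hubs
```

Only the mean degree is fixed here; the spread around it is narrow, which is the point of the "No hubs!" bullet.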
Real world networks are often power law though...
 Sexual networks
 Great variation in
contact numbers
Yule model
Basic BA-model
 Very simple algorithm to implement
 start with an initial set of m0 fully connected nodes
 e.g. m0 = 3 (vertices 1, 2, 3, each linked to the other two)
 now add new vertices one by one, each one with exactly m edges
 each new edge connects to an existing vertex in proportion to the
number of edges that vertex already has → preferential attachment
 easiest if you keep track of edge endpoints in one large array and select
an element from this array at random
 the probability of selecting any one vertex will be proportional to the number
of times it appears in the array – which corresponds to its degree
1 1 2 2 2 3 3 4 5 6 6 7 8 ….
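The endpoint-array trick can be sketched as follows. This is one possible reading of the slide, not a reference implementation: vertices are numbered from 0, and a duplicate draw is simply redrawn so that a new vertex never links twice to the same target (the slide does not say how duplicates are handled).

```python
import random

def ba_graph(n, m, m0=3, seed=None):
    """Grow a BA-style graph with n vertices; returns (edges, endpoint array)."""
    if not (m0 >= 2 and 1 <= m <= m0 <= n):
        raise ValueError("need m0 >= 2 and 1 <= m <= m0 <= n")
    rng = random.Random(seed)
    # fully connected core of m0 vertices
    edges = [(u, v) for u in range(m0) for v in range(u + 1, m0)]
    # every vertex appears in this array once per edge it has (its degree)
    endpoints = [x for edge in edges for x in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:                  # assumption: redraw duplicates
            targets.add(rng.choice(endpoints))   # uniform pick = degree-proportional
        for t in sorted(targets):
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges, endpoints

edges, endpoints = ba_graph(n=200, m=2, seed=0)
```

Picking a uniformly random element of `endpoints` selects a vertex with probability proportional to its degree, so no explicit probability table is needed.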
generating BA graphs – cont’d
 To start, each vertex has an equal number of edges (2)
 endpoint array: 1 1 2 2 3 3
 the probability of choosing any vertex is 1/3
 We add a new vertex (4), and it will have m edges, here take m=2
 draw 2 random elements from the array – suppose they are 2 and 3
 endpoint array: 1 1 2 2 2 3 3 3 4 4
 Now the probabilities of selecting 1, 2, 3, or 4 are 1/5, 3/10, 3/10, 1/5
 Add a new vertex (5), draw vertices for it to connect to from the array (here 3 and 4)
 endpoint array: 1 1 2 2 2 3 3 3 3 4 4 4 5 5
 etc.
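The selection probabilities in this worked example can be checked directly from the endpoint arrays. The sketch below assumes vertex 5 attaches to vertices 3 and 4, which is what the final array on the slide implies; the helper name is made up for illustration.

```python
from collections import Counter
from fractions import Fraction

def selection_probabilities(endpoints):
    """Probability of drawing each vertex from the endpoint array."""
    counts = Counter(endpoints)
    total = len(endpoints)
    return {v: Fraction(c, total) for v, c in sorted(counts.items())}

start = [1, 1, 2, 2, 3, 3]                     # triangle: every vertex has degree 2
after_vertex_4 = start + [2, 4, 3, 4]          # vertex 4 links to 2 and 3
after_vertex_5 = after_vertex_4 + [3, 5, 4, 5] # vertex 5 links to 3 and 4 (inferred)

p_start = selection_probabilities(start)
p_after_4 = selection_probabilities(after_vertex_4)
p_after_5 = selection_probabilities(after_vertex_5)
```

Exact fractions are used so the results can be compared with the slide values without rounding.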
Properties of the BA graph
 The distribution is scale free with exponent α = 3: $P(k) = 2m^2/k^3$
 The graph is connected
 Every new vertex is born with a link or several links (depending
on whether m = 1 or m > 1)
 It then connects to an ‘older’ vertex, which itself connected to
another vertex when it was introduced
 And we started from a connected core
 The older are richer
 Nodes accumulate links as time goes on, which gives older
nodes an advantage since newer nodes are going to attach
preferentially – and older nodes have a higher degree to tempt
them with than some new kid on the block
Time evolution of the connectivity of a vertex in the BA model
[Figure: degree versus time for a vertex introduced at t=5 and a vertex introduced at t=95]
Younger vertex does not stand a chance:
at t=95 older vertex has ~ 20 edges, and younger vertex is starting out with 5
at t ~ 10,000 older vertex has 200 edges and younger vertex has 50
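The numbers quoted here are consistent with the standard mean-field (continuum) estimate for the BA model, sketched below. It assumes one vertex with m edges is added per time step, and m = 5 is inferred from "starting out with 5" rather than stated on the slide.

```latex
\begin{align}
\frac{\partial k_i}{\partial t} &= m\,\frac{k_i}{\sum_j k_j} = \frac{k_i}{2t},
  \qquad \text{since } \sum_j k_j \approx 2mt \\
k_i(t) &= m\left(\frac{t}{t_i}\right)^{1/2},
  \qquad \text{using } k_i(t_i) = m \\
k_{5}(95) &= 5\sqrt{95/5} \approx 21.8, \qquad
k_{5}(10^4) = 5\sqrt{10^4/5} \approx 224, \qquad
k_{95}(10^4) = 5\sqrt{10^4/95} \approx 51
\end{align}
```

In this approximation the ratio of the two degrees stays fixed at $\sqrt{95/5} \approx 4.4$, so the younger vertex never catches up.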
Generating scale free networks with Pajek
 Two general options
 Scale free
 D.M. Pennock et al. (2002) Winners don’t take all, PNAS, 99/8,
5207-5211.
 Pajek command: Net > Random Network > Scale Free
 Differs from the BA model primarily in that:
 new vertices are not automatically assigned edges
 probability of attaching is partially independent of degree
 Extended model
 Albert R., Barabasi A.L.: Topology of evolving networks: local events
and universality http://xxx.lanl.gov/abs/cond-mat/0005085
 Pajek command: Net > Random Network > Extended Model
 Differs from the simple BA model in that:
 edges are added between existing nodes, not only the newcomer
 edges are rewired between existing nodes
‘Scale free’ network option in Pajek
 Network starts with m0 vertices, which link to each other
with probability p0 (as in an Erdos-Renyi random graph)
 At each time step t, one vertex and m edges are added
to the network
 Instead of attaching one end point of each edge to the
newly introduced vertex, choose each end point
according to the probability:
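$\Pr(v) = \alpha \frac{\deg_{in}(v)}{|E|} + \beta \frac{\deg_{out}(v)}{|E|} + \gamma \frac{1}{|V|}$
 $\deg_{in}(v)/|E|$: fraction of edges in the graph that end at v
 $\deg_{out}(v)/|E|$: fraction of edges in the graph that start at v
 $1/|V|$: the credit v gets just for being one of the vertices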
‘Scale free’ network generation in Pajek-cont’d
 Observations:
 α + β + γ = 1, so can vary the relative importance of indegree, outdegree, and independent probability
 in an undirected network α = β, since indegree and outdegree are the same
 Not all vertices will be connected, since they are not ‘born’ with an edge
 The larger γ is, the less scale-free the degree distribution
 in the limit γ = 1, edges are added without regard to degree
 Original BA paper showed that in that case the degree distribution P(k) ~ exp(-k), so an exponential distribution
$\Pr(v) = \alpha \frac{\deg_{in}(v)}{|E|} + \beta \frac{\deg_{out}(v)}{|E|} + \gamma \frac{1}{|V|}$
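A small sketch of the endpoint probability for a directed network, to show how the three weights trade off. It only illustrates the formula; it is not Pajek's code, and the function name and the toy degree lists are assumptions.

```python
def endpoint_probabilities(in_deg, out_deg, alpha, beta, gamma):
    """Pr(v) = alpha*in_deg(v)/|E| + beta*out_deg(v)/|E| + gamma/|V| for each vertex."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    n_edges = sum(in_deg)           # every directed edge ends at exactly one vertex
    if n_edges == 0 or n_edges != sum(out_deg):
        raise ValueError("need at least one edge and matching in/out totals")
    n_vertices = len(in_deg)
    return [alpha * i / n_edges + beta * o / n_edges + gamma / n_vertices
            for i, o in zip(in_deg, out_deg)]

# toy network: edges 2->0, 2->0 is avoided; here 1->0, 2->0, 2->1, vertex 3 isolated
in_deg = [2, 1, 0, 0]
out_deg = [0, 1, 2, 0]
mixed = endpoint_probabilities(in_deg, out_deg, alpha=0.4, beta=0.4, gamma=0.2)
degree_only = endpoint_probabilities(in_deg, out_deg, alpha=0.5, beta=0.5, gamma=0.0)
```

With gamma = 0 the isolated vertex gets probability 0, which is why edgeless vertices can never be picked once the independent term is switched off.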
Pennock model
 Example: It is reasonable to assume that some webpages will be
linked to in part because of what they are rather than the number of
links they already have
fits to various subsets of web data,
and web pages in general
‘Scale free’ in Pajek
 For the network you can specify
 ‘undirected’, ‘directed’, or ‘acyclic’
 an ‘adding > free’ option?
 # of vertices
 # of lines
 average degree of vertices
in theory you can leave either the # of
vertices or # of lines unconstrained,
but leaving the # of lines
unconstrained (enter in ‘0’) works for
me
 Initial Erdos-Renyi Graph (these are the first few vertices present)
 # of vertices (use something small, a couple of vertices)
 probability p of connecting – type 0.9999 to have them fully connected; any value between 0 and 1 works and the choice doesn’t matter much
 α – this is between 0 and 0.5 for an undirected graph
 the higher α the more scale-free your distribution will be
 but watch out, if you set α = 0.5, then β = 0.5 and γ = 0, and your new, edgeless vertices will never get new connections – you will only have the original Erdos-Renyi component connected
Extended BA model (undirected network)
 start with m0 isolated nodes
 at each timestep perform one of the following operations:
 w/ prob. p add m (m≤ m0) new links
 for each link
 select ‘from’ vertex at random
 select ‘to’ vertex in proportion to its degree (+1 so that isolated vertices have a chance of getting links): $P(k_i) = \frac{k_i + 1}{\sum_j (k_j + 1)}$
 w/ prob. q where 0 < q < 1 – p
 rewire m links
 select node i at random and one of i’s links
 rewire the endpoint of i’s link to another node j randomly chosen with probability
P(kj)
Extended BA model – cont’d
 w/ prob. 1 – p – q
 add a new node with m links
 connect endpoints of the m links to vertices in proportion to their degree (P(kj))
 In the p=q=0 limit, reduces to the simple BA model
 In the high q (q -> 1) limit, extended model produces a
network with an exponential tail because growth is very
slow (only rewiring is occurring)
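The three operations of the extended model can be sketched as below. Several details are assumptions made for this illustration and are not specified on the slides: nodes are numbered from 0, multiple edges between the same pair are allowed, a link never joins a node to itself, and a rewiring step is skipped while the graph has no links.

```python
import random

def extended_ba(n, m0, m, p, q, seed=None):
    """Run the add-links / rewire / add-node process until there are n nodes."""
    if not (2 <= m0 <= n and 1 <= m <= m0 and p >= 0 and q >= 0 and p + q < 1):
        raise ValueError("need 2 <= m0 <= n, 1 <= m <= m0, p, q >= 0, p + q < 1")
    rng = random.Random(seed)
    degree = [0] * m0      # start with m0 isolated nodes
    edges = []             # each edge is a mutable pair [u, v]

    def preferential(exclude):
        # choose a node with probability (k + 1) / sum_j (k_j + 1)
        nodes = [v for v in range(len(degree)) if v != exclude]
        return rng.choices(nodes, weights=[degree[v] + 1 for v in nodes])[0]

    def add_edge(u, v):
        edges.append([u, v])
        degree[u] += 1
        degree[v] += 1

    while len(degree) < n:
        r = rng.random()
        if r < p:                                   # add m links among existing nodes
            for _ in range(m):
                u = rng.randrange(len(degree))      # 'from' vertex: uniform
                add_edge(u, preferential(exclude=u))
        elif r < p + q:                             # rewire m links
            for _ in range(m):
                linked = [v for v in range(len(degree)) if degree[v] > 0]
                if not linked:
                    break                           # nothing to rewire yet
                i = rng.choice(linked)
                e = rng.choice([e for e in edges if i in e])
                far = 1 if e[0] == i else 0         # endpoint away from i
                j = preferential(exclude=i)
                degree[e[far]] -= 1
                e[far] = j
                degree[j] += 1
        else:                                       # add a new node with m links
            new = len(degree)
            degree.append(0)
            for _ in range(m):
                add_edge(new, preferential(exclude=new))
    return degree, edges

degree, edges = extended_ba(n=300, m0=3, m=2, p=0.3, q=0.2, seed=7)
```

Raising p adds links without adding nodes, and raising q reshuffles existing links; only the last branch makes the network grow.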
parameter space of the extended BA model
 In the high p (p > 0.5) limit, have a scale free distribution,
because adding new edges preferentially
 saturation effect for small k (degree)
 because edges keep being added, but vertices are not being added
that quickly, eventually even the low degree vertices get a few more
edges
 power-law exponent varies between 2 and ∞, depending on parameters
Extended BA model in Pajek
 Net > Random Network > Extended Model
 Specify
 n = # of vertices
 m0 = # of initial, disconnected nodes
 m ≤ m0, number of edges to add/rewire at a time
 p = probability to add new lines
 q = probability to rewire edges, 0 ≤ q ≤ 1-p
 can ask for network without multiple lines
How can we randomize a network while
preserving the degree distribution?
 Stub reconnection algorithm (M. E. Newman, et al, 2001, also known in
mathematical literature since 1960s)
 Break every edge in two “edge stubs”: A-B becomes A- and -B
 Randomly reconnect stubs
 Problems:
 Leads to multiple edges
 Cannot be modified to preserve additional topological
properties
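A minimal sketch of stub reconnection on an undirected edge list. The helper names and the small test network are illustrative. Self-loops and multiple edges are left in place, and a second helper counts them to make the first listed problem visible.

```python
import random

def stub_reconnect(edges, seed=None):
    """Break every edge into two stubs, shuffle the stubs, and pair them up again."""
    rng = random.Random(seed)
    stubs = [x for edge in edges for x in edge]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))

def count_defects(edges):
    """Return (number of self-loops, number of repeated edges)."""
    seen = set()
    loops = repeats = 0
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if u == v:
            loops += 1
        elif key in seen:
            repeats += 1
        seen.add(key)
    return loops, repeats

# toy network: a 20-node ring plus chords
original = [(i, (i + 1) % 20) for i in range(20)] + [(i, (i + 5) % 20) for i in range(20)]
shuffled = stub_reconnect(original, seed=3)
loops, repeats = count_defects(shuffled)
```

Each node keeps exactly as many stubs as it had, so the degree sequence is preserved whatever the pairing turns out to be.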
Local rewiring algorithm
 Randomly select and rewire two edges (Maslov, Sneppen, 2002, also
known in mathematical literature since 1960s)
 Repeat many times
 Preserves both the number of upstream and downstream
neighbors of each node
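A sketch of the local rewiring step for a directed edge list: two edges A→B and C→D become A→D and C→B. Rejecting swaps that would create a self-loop or a duplicate edge is an assumption of this sketch (it keeps the graph simple), and the attempt cap is an arbitrary safeguard.

```python
import random

def rewire(edges, n_swaps, seed=None):
    """Swap the targets of randomly chosen edge pairs, keeping the graph simple."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # skip swaps that would create a self-loop or an edge that already exists
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

# toy directed network: a 20-node ring plus chords
original = [(i, (i + 1) % 20) for i in range(20)] + [(i, (i + 5) % 20) for i in range(20)]
randomized = rewire(original, n_swaps=100, seed=1)
```

Every edge keeps its source and every target keeps one incoming edge, so both the out-degree and the in-degree of each node are unchanged.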
Conserving additional low-level topological
properties
 In addition to ki one may also conserve:
 The exact numbers of loops or other motifs
 The size and numbers of components: Internet – all nodes have
to be connected to each other
 Metropolis algorithm: two edges are rewired based on the energy $E = (N_{actual} - N_{desired})^2 / N_{desired}$
 If ΔE ≤ 0 rewiring step is always accepted
 If ΔE > 0 rewiring step is accepted with p = exp(-ΔE/T)
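The acceptance rule can be sketched as two small functions. Here `n_actual` stands for whatever is being conserved (for example a motif count) and `temperature` is the parameter T; the function names and the `rng` argument are choices made for this illustration.

```python
import math
import random

def energy(n_actual, n_desired):
    """E = (N_actual - N_desired)^2 / N_desired."""
    return (n_actual - n_desired) ** 2 / n_desired

def accept(delta_e, temperature, rng=random):
    """Always accept if the energy does not increase, else with prob. exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

# example: a proposed rewiring moves the count from 12 to 11 when 10 is desired
delta = energy(11, 10) - energy(12, 10)   # negative, so the step is accepted
step_ok = accept(delta, temperature=1.0)
```

A low T makes the randomization stick closely to the desired count, while a high T lets it wander.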
Assortativity
 Social networks are assortative:
 the gregarious people associate with other gregarious people
 the loners associate with other loners
 The Internet is disassortative
[Figure: three example networks]
 Assortative: hubs connect to hubs
 Random
 Disassortative: hubs are in the periphery
Correlation profile of a network
 Detects preferences in linking of nodes to each other
based on their connectivity
 Measure N(k0,k1) – the number of edges between nodes
with connectivities k0 and k1
 Compare it to Nr(k0,k1) – the same property in a properly
randomized network
 Very noise-tolerant with respect to both false positives
and negatives
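Counting N(k0,k1) from an undirected edge list might look like the sketch below. The ratio helper assumes the randomized counts come from some degree-preserving randomization supplied by the caller; both function names are illustrative.

```python
from collections import Counter

def degree_pair_counts(edges):
    """N(k0, k1): number of edges joining nodes of degree k0 and k1 (k0 <= k1)."""
    deg = Counter(x for edge in edges for x in edge)
    counts = Counter()
    for u, v in edges:
        k0, k1 = sorted((deg[u], deg[v]))
        counts[(k0, k1)] += 1
    return counts

def correlation_ratio(actual, randomized_runs):
    """N(k0,k1) divided by the mean of Nr(k0,k1) over the randomized networks."""
    ratios = {}
    for pair, n in actual.items():
        mean_r = sum(run[pair] for run in randomized_runs) / len(randomized_runs)
        if mean_r > 0:                      # skip pairs never seen after randomizing
            ratios[pair] = n / mean_r
    return ratios

path = [(0, 1), (1, 2), (2, 3)]             # degrees 1, 2, 2, 1
counts = degree_pair_counts(path)
```

A ratio above 1 for a pair (k0, k1) means such links are more common than chance; below 1 means they are suppressed.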
Correlation profiles give complex networks unique identities
 Sergei Maslov: 2D histogram
[Figures: correlation profiles of protein interactions and of the Internet; slide by Sergei Maslov]
Correlation profiles -cont’d
 Pastor-Satorras and Vespignani: 2D plot of the average degree of the node’s neighbors against the degree of the node
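The quantity on the vertical axis can be computed as in this sketch, which averages the mean neighbor degree over all nodes of the same degree. It assumes an undirected edge list without self-loops; the function name is made up here.

```python
from collections import defaultdict

def neighbor_degree_by_degree(edges):
    """Map each degree k to the average neighbor degree of nodes with degree k."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    deg = {v: len(ns) for v, ns in nbrs.items()}
    per_k = defaultdict(list)
    for v, ns in nbrs.items():
        per_k[deg[v]].append(sum(deg[w] for w in ns) / deg[v])
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}

star = [(0, 1), (0, 2), (0, 3)]     # hub 0 with three leaves
profile = neighbor_degree_by_degree(star)
```

A curve that falls with k, as in the star, indicates disassortative mixing; a rising curve indicates assortative mixing.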
Correlation profiles -cont’d
 Newman: single number
 the Pearson correlation coefficient of the degrees of the nodes on each side of an edge
 internet degree correlation coefficient: -0.189
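One way to compute this single number for an undirected edge list is sketched below. Each edge is counted in both directions so the measure is symmetric; this follows the usual definition but is written from scratch here, and the function name is an assumption.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees found at the two ends of each edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:                  # count every edge in both directions
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n                  # xs and ys share the same mean and variance
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("all nodes have the same degree; r is undefined")
    return cov / var

r_star = degree_assortativity([(0, 1), (0, 2), (0, 3)])   # hub linked only to leaves
r_path = degree_assortativity([(0, 1), (1, 2), (2, 3)])
```

Negative values mean high-degree nodes tend to link to low-degree ones, as reported for the Internet.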
Other examples of assortative mixing
 Assortativity is not limited to degree-degree correlations; nodes can also mix by other attributes
 social networks: race, income, gender, age
 food webs: herbivores, carnivores
 internet: high level connectivity providers, ISPs, consumers
 Tendency of like individuals to associate: ‘homophily’
 more about this later…