Graphs - Minas Gjoka

Construction of Simple Graphs with a Target
Joint Degree Matrix and Beyond
Minas Gjoka, Balint Tillman, Athina Markopoulou
University of California, Irvine
Graphs
• DNS
• Autonomous Systems
• Social Networks
• Protein interactions
• World Wide Web
Motivation
 Measurements/sampling OSNs
• http://odysseas.calit.uci.edu/osn/
• [INFOCOM 2010], [SIGMETRICS 2011], 3x[JSAC 2011], [WOSN 2012]…
• ~3500 researchers have requested our Facebook datasets
 Generate synthetic graphs that resemble real social networks
• to use in simulations
• for anonymization
 Q1: resemble in terms of what?
 Q2: generate how?
dK-Series
 dK-series framework [Mahadevan et al, Sigcomm ’06]
• “A set of graph properties that describe and constrain random
graphs, using degree correlations, in successively finer detail”
[Figure: example graph with nodes labeled by degree: 1, 2a, 2b, 3]
dK-Series
 dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree $\bar{k} = 2|E| / |V|$
[Figure: example graph with nodes 1, 2a, 2b, 3]
0K: $\bar{k} = (2 \cdot 4) / 4 = 2$
dK-Series
 dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree
• 1K specifies the node degree sequence: $D(k) = \sum_{a \in V_k} 1$
[Figure: example graph with nodes 1, 2a, 2b, 3]
1K:
k     1  2  3
D(k)  1  2  1
dK-Series
 dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree
• 1K specifies the node degree sequence
• 2K specifies the joint node degree matrix (JDM)
$JDM(k, l) = \sum_{a \in V_k} \sum_{b \in V_l} 1_{\{(a,b) \in E\}}$
[Figure: example graph with nodes 1, 2a, 2b, 3, and its 2K table of JDM(k,l) values]
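A minimal sketch of how the 0K, 1K, and 2K statistics could be computed for the four-node example graph. The edge list is an assumption inferred from the node labels (degrees 1, 2, 2, 3 leave only one simple graph), and `dk_summary` is a hypothetical helper name. The JDM follows the double-sum definition, so an edge between two nodes of equal degree contributes 2 to the diagonal entry.

```python
from collections import Counter

# Assumed edge list for the example graph; node names encode degrees.
edges = [("1", "3"), ("2a", "3"), ("2b", "3"), ("2a", "2b")]

def dk_summary(edges):
    """Return (0K, 1K, 2K) statistics of an undirected simple graph."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    avg_degree = 2 * len(edges) / len(deg)      # 0K: 2|E| / |V|
    degree_dist = Counter(deg.values())         # 1K: D(k) = number of nodes of degree k
    jdm = Counter()                             # 2K: joint degree matrix
    for a, b in edges:
        # Count the edge in both directions, as the double sum does.
        jdm[(deg[a], deg[b])] += 1
        jdm[(deg[b], deg[a])] += 1
    return avg_degree, dict(degree_dist), dict(jdm)

avg_degree, degree_dist, jdm = dk_summary(edges)
print(avg_degree, degree_dist, jdm)
```

Each level refines the previous one: the JDM row sums divided by k give D(k), and D(k) gives the average degree.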
dK-Series
 dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree
• 1K specifies the node degree sequence
• 2K specifies the joint node degree matrix (JDM)
• 3K specifies the number of induced subgraphs of 3 nodes
o nodes are labeled by their degree k
[Figure: example graph with nodes 1, 2a, 2b, 3]
3K:
(k,l,m)   #Wedges
(1,3,2)   2
(k,l,m)   #Triangles
(2,2,3)   1
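The 3K counts can be checked with a short sketch that enumerates induced 3-node subgraphs and labels them by degree. The edge list is again the assumed four-node example graph, and `three_k` is a hypothetical helper name; wedges are written as (end, center, end) to match the (1,3,2) label on the slide.

```python
from collections import Counter
from itertools import combinations

# Assumed edge list for the example graph; node names encode degrees.
edges = [("1", "3"), ("2a", "3"), ("2b", "3"), ("2a", "2b")]

def three_k(edges):
    """Count induced wedges and triangles, labeled by node degrees."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    wedges, closed = Counter(), Counter()
    for center, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            if w in adj[u]:
                # Closed triple: every triangle is seen once per corner.
                closed[tuple(sorted((deg[u], deg[center], deg[w])))] += 1
            else:
                lo, hi = sorted((deg[u], deg[w]))
                wedges[(lo, deg[center], hi)] += 1
    triangles = {key: n // 3 for key, n in closed.items()}
    return dict(wedges), triangles

wedges, triangles = three_k(edges)
print(wedges, triangles)
```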
dK-Series
 dK-series framework [Mahadevan et al, Sigcomm ’06]
• 0K specifies the average node degree
• 1K specifies the node degree sequence
• 2K specifies the joint node degree matrix (JDM)
• 3K specifies the number of induced subgraphs of 3 nodes
• …
• nK specifies the entire graph
[Side annotation next to the 2K and 3K items: OSNs, “2K+”]
 Nice properties
• Inclusion
• Convergence
• Tradeoff: accuracy vs. complexity
Related Work
 Graph Construction Approaches:
• Stochastic: reproduces dk-distribution in expectation.
• Configuration (“pseudograph”): reproduces dk-distribution exactly.
o Deterministic algorithms up to d=2. MCMC for d>=2.
 1K Construction
• Configuration: 1K multigraphs [Molloy’95]
• 1K+ [Bansal ’09, Newman’09, Serrano & Boguna’05, …]
 2K Construction
• Configuration model for 2K multigraphs [Mahadevan’06]
• Balanced Degree Invariant: simple graphs [Amanatidis’08], [Stanton’12]
 2K+ Construction
• 2K preserving, 3K targeting using edge rewiring: [Mahadevan’06]
• 2.5K heuristic: JDM + degree-dependent clustering coefficient: [Gjoka’13]
2K Construction
Configuration Model
[Figure: six nodes with degrees 2, 2, 2, 3, 3, 4; all stubs free]
target JDM:
k\l  2  3  4
2    2  2  2
3    2  2  2
4    2  2  -
current JDM: all entries 0
2K Construction
Configuration Model
[Figure: the same six nodes, with used and free stubs marked]
Edges added: (2a,3a) (3b,4a) (2b,3a) (2b,4a) (3a,3b) (2a,2b)
target JDM:
k\l  2  3  4
2    2  2  2
3    2  2  2
4    2  2  -
current JDM:
k\l  2  3  4
2    2  2  1
3    2  2  1
4    1  1  -
Construction stuck! 2/8 (25%) of the edges cannot be added
2K Construction
Balanced Degree Invariant
[Figure: three snapshots of nodes 3a, 3b (k=3) and 4a, 4b (l=4) with used and free stubs marked, added while JDM(3,4) < JDMtarget(3,4); panels labeled JDM(3,4) = 1 and JDM(3,4) = 2]
Construction constrained!
Our Contributions
 New 2K Construction Algorithm
• can produce any simple graph with the exact JDMtarget in O(|E|dmax)
• Main benefit: no constraints in constructed graphs
 2K+ Framework: JDMtarget + Additional Properties
• 2K + Node Attributes (exactly)
• 2K + Avg Clustering (approx)
• Main benefit: orders of magnitude faster than 2K+MCMC
2K Construction
 Input: Joint Degree Matrix JDMtarget
• JDMtarget must be graphical
 Goal:
• Construct a simple graph with exactly JDMtarget
JDMtarget:
k\l  1  2  3  4
1    -  -  1  1
2    -  -  1  1
3    1  1  -  4
4    1  1  4  2
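The requirement that JDMtarget be graphical can be tested before construction. The sketch below checks conditions commonly stated in the JDM literature: the node count of each degree class must be an integer, and no entry may ask for more edges than a simple graph allows. `is_graphical_jdm` is a hypothetical helper name, degrees are assumed to be at least 1, and the diagonal is assumed to count each edge twice, as in the double-sum definition.

```python
def is_graphical_jdm(jdm):
    """jdm: dict {(k, l): count}; JDM(k, k) counts each edge twice."""
    degrees = sorted({d for pair in jdm for d in pair})
    n = {}
    for k in degrees:
        stubs = sum(c for (a, _), c in jdm.items() if a == k)
        if stubs % k:                 # |V_k| = (row sum) / k must be an integer
            return False
        n[k] = stubs // k
    for (k, l), c in jdm.items():
        if jdm.get((l, k), 0) != c:   # the matrix must be symmetric
            return False
        if k == l:
            # At most |V_k| choose 2 edges inside a class, each counted twice.
            if c % 2 or c > n[k] * (n[k] - 1):
                return False
        elif c > n[k] * n[l]:         # at most |V_k| * |V_l| edges across classes
            return False
    return True

# Target matrix transcribed from the slide (an assumption of this sketch).
target = {(1, 3): 1, (3, 1): 1, (1, 4): 1, (4, 1): 1,
          (2, 3): 1, (3, 2): 1, (2, 4): 1, (4, 2): 1,
          (3, 4): 4, (4, 3): 4, (4, 4): 2}
print(is_graphical_jdm(target))
```

For this target the class sizes come out as two nodes of degree 1, one of degree 2, two of degree 3, and two of degree 4.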
2K Construction
Initialize:
  1K: create nodes and stubs
  JDM(k,l)=0 for all k,l
Pick (k, l) degree pair, in any order
  While JDM(k, l) < JDMtarget(k, l)
    Pick (x, y) any pair of disconnected nodes with degrees k and l
    …
    add edge between (x, y)
    …
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b with free stubs]
JDM/JDMtarget:
k\l  1    2    3    4
1    -    -    0/1  0/1
2    -    -    0/1  0/1
3    0/1  0/1  -    0/4
4    0/1  0/1  0/4  0/2
2K Construction
Initialize:
  1K: create nodes and stubs
  JDM(k,l)=0 for all k,l
Pick (k, l) degree pair, in any order
  While JDM(k, l) < JDMtarget(k, l)
    Pick (x, y) any pair of disconnected nodes with degrees k and l
    …
    add edge between (x, y)
    JDM(k, l)++
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b after one edge between a degree-1 and a degree-4 node]
JDM/JDMtarget:
k\l  1    2    3    4
1    -    -    0/1  1/1
2    -    -    0/1  0/1
3    0/1  0/1  -    0/4
4    1/1  0/1  0/4  0/2
2K Construction
Initialize:
  1K: create nodes and stubs
  JDM(k,l)=0 for all k,l
Pick (k, l) degree pair, in any order
  While JDM(k, l) < JDMtarget(k, l)
    Pick (x, y) any pair of disconnected nodes with degrees k and l
    if x does not have free stubs
      neighbor switch for x
    if y does not have free stubs
      neighbor switch for y
    add edge between (x, y)
    JDM(k, l)++
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b]
JDM/JDMtarget:
k\l  1    2    3    4
1    -    -    0/1  1/1
2    -    -    0/1  0/1
3    0/1  0/1  -    0/4
4    1/1  0/1  0/4  0/2
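The loop above can be sketched in Python as follows. This is an illustrative reading of the slides, not the authors' implementation. The neighbor switch is assumed to work as the case diagrams suggest: a full node v hands one neighbor t over to another node w of the same degree that still has a free stub, which leaves the JDM unchanged. The names `construct_2k`, `neighbor_switch`, and `jdm_of` are hypothetical, the target is assumed graphical and symmetric with the diagonal counting each edge twice, and the naive pair search does not achieve the O(|E|dmax) running time.

```python
def construct_2k(target):
    """target: symmetric dict {(k, l): count}; JDM(k, k) counts each edge twice."""
    degs = sorted({k for k, _ in target})
    # 1K: the number of degree-k nodes follows from the JDM row sums.
    nodes = {k: [(k, i) for i in range(
        sum(c for (a, _), c in target.items() if a == k) // k)] for k in degs}
    adj = {v: set() for k in degs for v in nodes[k]}

    def free(v):
        return v[0] - len(adj[v])       # v[0] is the target degree of v

    def neighbor_switch(v, other):
        # w: same degree as v, with a free stub to spare (one stub must
        # remain if w is the other endpoint of the edge about to be added).
        w = next(u for u in nodes[v[0]] if free(u) > (1 if u == other else 0))
        # t: a neighbor of v that is not w and not yet a neighbor of w.
        t = next(u for u in adj[v] if u != w and u not in adj[w])
        adj[v].remove(t)
        adj[t].remove(v)                # drop edge (v, t) ...
        adj[w].add(t)
        adj[t].add(w)                   # ... and attach t to w instead

    for (k, l), count in sorted(target.items()):
        if k > l:
            continue                    # covered when the loop visits (l, k)
        for _ in range(count // 2 if k == l else count):
            x, y = next((x, y) for x in nodes[k] for y in nodes[l]
                        if x != y and y not in adj[x])
            if free(x) == 0:
                neighbor_switch(x, other=y)
            if free(y) == 0:
                neighbor_switch(y, other=x)
            adj[x].add(y)
            adj[y].add(x)
    return adj

def jdm_of(adj):
    """Recompute the JDM of a graph given as an adjacency dict."""
    jdm = {}
    for v, nbrs in adj.items():
        for u in nbrs:
            key = (len(adj[v]), len(adj[u]))
            jdm[key] = jdm.get(key, 0) + 1
    return jdm

# Target matrix transcribed from the slide (an assumption of this sketch).
target = {(1, 3): 1, (3, 1): 1, (1, 4): 1, (4, 1): 1,
          (2, 3): 1, (3, 2): 1, (2, 4): 1, (4, 2): 1,
          (3, 4): 4, (4, 3): 4, (4, 4): 2}
graph = construct_2k(target)
print(jdm_of(graph) == target)
```

Because a switch only moves an edge between two nodes of the same degree, the current JDM never changes during a switch, so adding exactly the required number of edges per degree pair ends at the target.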
Case 1
x, y both have free stubs
JDM(k, l) < JDMtarget(k, l)
node x has degree k
node y has degree l
[Figure: node x with k=3 and node y with l=4, both with free stubs]
Add edge between x and y
Case 2
x has free stubs but y does not
JDM(k, l) < JDMtarget(k, l)
node x has degree k
node y has degree l
[Figure: node x with k=3, node y with l=4, and nodes b and t]
Neighbor switch between y and b using t, then add edge between x and y
Case 3
neither x nor y has free stubs
JDM(k, l) < JDMtarget(k, l)
node x has degree k
node y has degree l
[Figure: node x with k=3, node y with l=4, and nodes b1, t1, b2, t2]
Neighbor switch between x and b2 using t2
Neighbor switch between y and b1 using t1
Add edge between x and y
Properties of 2K Algorithm
 Terminates with exact JDMtarget in O(|E|dmax)
• It adds 1 edge at a time, while staying below JDMtarget
 It can produce ALL graphs with the JDMtarget
 Output graph depends on the order of adding edges
Flexibility of 2K Algorithm
 Family of algorithms: add one edge at a time, while staying below JDMtarget
• any order of degree pairs (k,l)
• any order of node pairs (x,y), even before completing a degree pair
• Can start with an empty or partially built graph
 2K+: can target additional properties fast
• Property 1: clustering
• Property 2: attribute correlation
 Previously known: space of graphs with JDMtarget is connected; but slow MCMC mixing
Extension 1: Target JDM + Clustering
[Figure: three graphs on nodes of degrees 2, 2, 2, 2, 3, 3 with the same JDM, containing 0 triangles, 1 triangle, and 2 triangles]
JDM:
k\l  2  3
2    4  4
3    4  2
Intuition: by controlling the order in which we add edges, we can control clustering.
Extension 1: Target JDM + Clustering
[Figure: two graphs on nodes 2a, 2b, 2c, 2d, 3a, 3b with the same JDM, one with 0 triangles and one with 2 triangles]
JDM:
k\l  2  3
2    4  4
3    4  2
[Figure: the six nodes placed randomly on a circle, with positions marked 0, 25, 50, 75]
Nodes placed randomly on a circle; consider node pairs’ distance
[INFOCOM 2013]: add edges in increasing distance → high clustering
“Sortedness” of node pairs’ list controls clustering
• Example: JDMtarget of Facebook Caltech Network
• Consider many orders of node pairs → create graphs with JDMtarget → compute avg clustering c.
[Figure: nodes 2a, 2b, 2c, 2d, 3a, 3b]
[INFOCOM 2015]: control order of node pairs → control clustering
Extension 1: Target JDM + Clustering
2K+ Avg Clustering
Input: target JDM, avg clustering coefficient c
Stage 1
  E’ = list of node pairs s.t. sortedness(E’) ≈ S(c)
  FOR each candidate node pair (v,w) in E’:
    IF both nodes v and w have free stubs and the corresponding JDM(k, l) < JDMtarget(k, l):
      add edge (v,w)
Stage 2
  If not all |E| edges have been added:
    Add remaining edges using 2K_Simple
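A rough sketch of Stage 1 for the special case of a fully sorted pair list: nodes get random positions on a circle and candidate pairs are tried in increasing circular distance, which the slides associate with high clustering. The mapping S(c) from a target clustering value to a sortedness level is not reproduced here, and `stage1_sorted_pairs` with its `seed` parameter are hypothetical names. Edges that Stage 1 cannot place would be left for Stage 2.

```python
import random
from itertools import combinations

def stage1_sorted_pairs(target, seed=0):
    """Greedy Stage 1 over node pairs sorted by distance on a circle."""
    rng = random.Random(seed)
    degs = sorted({k for k, _ in target})
    nodes = [(k, i) for k in degs for i in range(
        sum(c for (a, _), c in target.items() if a == k) // k)]
    pos = {v: rng.random() for v in nodes}      # position on a unit circle

    def dist(v, w):
        d = abs(pos[v] - pos[w])
        return min(d, 1 - d)                    # shorter way around the circle

    pairs = sorted(combinations(nodes, 2), key=lambda p: dist(*p))
    adj = {v: set() for v in nodes}
    jdm = {}
    for v, w in pairs:
        k, l = v[0], w[0]
        step = 2 if k == l else 1               # diagonal counts an edge twice
        has_stubs = len(adj[v]) < k and len(adj[w]) < l
        if has_stubs and jdm.get((k, l), 0) + step <= target.get((k, l), 0):
            adj[v].add(w)
            adj[w].add(v)
            jdm[(k, l)] = jdm.get((k, l), 0) + step
            if k != l:
                jdm[(l, k)] = jdm.get((l, k), 0) + 1
    return adj, jdm

# Target matrix transcribed from an earlier slide (an assumption of this sketch).
target = {(1, 3): 1, (3, 1): 1, (1, 4): 1, (4, 1): 1,
          (2, 3): 1, (3, 2): 1, (2, 4): 1, (4, 2): 1,
          (3, 4): 4, (4, 3): 4, (4, 4): 2}
adj, jdm = stage1_sorted_pairs(target)
missing = sum(target.values()) // 2 - sum(len(n) for n in adj.values()) // 2
print("edges left for Stage 2:", missing)
```

Stage 1 never uses neighbor switches, so it can stall below the target; that is why the algorithm falls back on the exact 2K construction for the remaining edges.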
Real world examples
target JDM+avg clustering
[Figure: plots of Average Node Closeness, Average Node Shortest Path Length, and Average Clustering Coefficient]
Real world examples
target JDM+avg clustering
2K+MCMC did not finish after several days
Extension 2: Node Attributes
[Figure: two graphs with the same JDM, with node attributes; the JAM is shown for one graph]
JDM:
k\l  1  2
1    -  2
2    2  6
JAM: Joint Attribute Matrix (or Attribute Mixing Matrix)
2  2
2  4
Extension 2: Node Attributes Mixing
[Figure: two graphs with the same JDM but different JAMs]
JDM (both graphs):
k\l  1  2
1    -  2
2    2  6
JAM (first graph):
2  2
2  4
JAM (second graph): entries 4 and 6
Joint Attribute Matrix (or Attribute Mixing Matrix)
Extension 2: Degree+Attribute Mixing
[Figure: the two example graphs with their JDM, JAM, and the combined matrix over (degree, attribute) classes]
Joint Degree and Attribute Matrix (JDAM)
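One way to see why a JDAM target fits the same construction: JDM, JAM, and JDAM are the same kind of mixing matrix, differing only in how nodes are labeled. The sketch below uses a made-up four-node path with invented attribute values, not the graph from the slides, and `mixing_matrix` is a hypothetical helper name. Edges inside one class are assumed to count twice, matching the JDM convention.

```python
from collections import Counter

def mixing_matrix(edges, label):
    """Count edges between label classes; same-class edges count twice."""
    m = Counter()
    for a, b in edges:
        m[(label[a], label[b])] += 1
        m[(label[b], label[a])] += 1
    return dict(m)

# Hypothetical example: path a-b-c-d with invented attributes.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
degree = {"a": 1, "b": 2, "c": 2, "d": 1}
attr = {"a": "red", "b": "red", "c": "blue", "d": "blue"}

jdm = mixing_matrix(edges, degree)                                   # label = degree
jam = mixing_matrix(edges, attr)                                     # label = attribute
jdam = mixing_matrix(edges, {v: (degree[v], attr[v]) for v in degree})  # label = both
print(jdm, jam, jdam)
```

Summing JDAM entries over attributes recovers the JDM, and summing over degrees recovers the JAM, so a graph that matches a JDAM target matches both.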
Extension 2: target JDAM
2K Algorithm also works for target JDAM
[Figure: example graph with node attributes and its JDAM]
Joint Degree and Attribute Matrix (JDAM)
Real world examples
graphs with node attributes
[Figure: plots of Average Node Closeness, Average Node Shortest Path Length, and Average Clustering Coefficient]
Real world examples
small graphs with node attributes
Simulation takes ~1 day to target 2K and c = 0.24
with MCMC (using double edge swaps)
Construction of 2K+ Graphs
 New 2K Construction Algorithm
• can produce any simple graph with exact JDMtarget in O(|E|dmax)
 2K+ Framework: JDMtarget + Additional Properties
• Extension 1: 2K (exactly) + Avg Clustering (approx)
• Extension 2: 2K (exactly) + Node Attributes (exactly)
 Future directions
• Construction: target attributes + structure (towards 3K)
http://odysseas.calit2.uci.edu/osn/
Questions?