Graphs - Minas Gjoka
Construction of Simple Graphs with a Target
Joint Degree Matrix and Beyond
Minas Gjoka, Balint Tillman, Athina Markopoulou
University of California, Irvine
Graphs
• DNS
• Autonomous Systems
• Social Networks
• Protein interactions
• World Wide Web
Motivation
Measurements/sampling of OSNs
• http://odysseas.calit.uci.edu/osn/
• [INFOCOM 2010], [SIGMETRICS 2011], 3x [JSAC 2011], [WOSN 2012]…
• ~3500 researchers have requested our Facebook datasets
Social Networks
Generate synthetic graphs that resemble real social networks
• to use in simulations
• for anonymization
Q1: resemble in terms of what?
Q2: generate how?
dK-Series
dK-series framework [Mahadevan et al, Sigcomm ’06]
• “A set of graph properties that describe and constrain random
graphs, using degree correlations, in successively finer detail”
[Figure: example graph with four nodes labeled by degree: 1, 2a, 2b, 3]
• 0K specifies the average node degree: $\bar{k} = 2|E|/|V|$
[Figure: example graph annotated with its 0K value, $\bar{k} = 2 \cdot 4 / 4 = 2$]
• 1K specifies the node degree sequence: $D(k) = \sum_{a \in V_k} 1$, where $V_k$ is the set of nodes with degree $k$
1K of the example graph:
k     1  2  3
D(k)  1  2  1
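The 0K and 1K definitions can be checked on the example graph with a few lines of Python. This is a minimal sketch: the edge list is an assumption, reconstructed from the node labels 1, 2a, 2b, 3 (it is the only simple graph with degrees 1, 2, 2, 3), and the function names are illustrative.

```python
from collections import Counter

# Assumed example graph, reconstructed from the slide's node labels.
EDGES = [("1", "3"), ("2a", "3"), ("2b", "3"), ("2a", "2b")]

def degrees(edges):
    """Map each node to its degree (isolated nodes are not represented)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def zero_k(edges):
    """0K: average node degree, 2|E| / |V|."""
    return 2 * len(edges) / len(degrees(edges))

def one_k(edges):
    """1K: D(k), the number of nodes with degree k."""
    return Counter(degrees(edges).values())

print(zero_k(EDGES))                 # 2.0
print(sorted(one_k(EDGES).items()))  # [(1, 1), (2, 2), (3, 1)]
```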
• 2K specifies the joint node degree matrix (JDM): $JDM(k, l) = \sum_{a \in V_k} \sum_{b \in V_l} 1_{\{\{a,b\} \in E\}}$
[Figure: example graph annotated with its 2K table of (k, l) edge counts]
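A short sketch of the JDM computation, following the double-sum definition literally. Because both sums run over nodes, an edge between two nodes of the same degree k is counted twice in JDM(k, k); that convention is inferred from the formula and is an assumption here. The edge list is the assumed four-node example graph.

```python
from collections import Counter

# Assumed example graph (nodes 1, 2a, 2b, 3 labeled by degree).
EDGES = [("1", "3"), ("2a", "3"), ("2b", "3"), ("2a", "2b")]

def joint_degree_matrix(edges):
    """Return JDM as {(k, l): count}; symmetric, diagonal counts edges twice."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    jdm = Counter()
    for a, b in edges:
        jdm[deg[a], deg[b]] += 1   # term with a in V_k, b in V_l
        jdm[deg[b], deg[a]] += 1   # symmetric term
    return jdm

print(sorted(joint_degree_matrix(EDGES).items()))
# [((1, 3), 1), ((2, 2), 2), ((2, 3), 2), ((3, 1), 1), ((3, 2), 2)]
```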
• 3K specifies the number of induced subgraphs of 3 nodes
  o nodes are labeled by their degree k
3K of the example graph:
(k,l,m)  #Wedges
1,3,2    2
(k,l,m)  #Triangles
2,2,3    1
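The wedge and triangle counts can be reproduced with a small sketch. It assumes a wedge is labeled (end degree, center degree, end degree) and a triangle by its sorted degrees, which matches the two table rows above; the edge list is the assumed four-node example graph.

```python
from collections import Counter
from itertools import combinations

# Assumed example graph (nodes 1, 2a, 2b, 3 labeled by degree).
EDGES = [("1", "3"), ("2a", "3"), ("2b", "3"), ("2a", "2b")]

def three_k(edges):
    """Count induced 3-node subgraphs (wedges, triangles) by node degrees."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    wedges, tri_hits = Counter(), Counter()
    for center, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            if w in adj[u]:
                # closed triple: seen once from each of its three corners
                tri_hits[tuple(sorted((deg[u], deg[center], deg[w])))] += 1
            else:
                lo, hi = sorted((deg[u], deg[w]))
                wedges[lo, deg[center], hi] += 1
    triangles = Counter({key: hits // 3 for key, hits in tri_hits.items()})
    return wedges, triangles

wedges, triangles = three_k(EDGES)
print(dict(wedges))     # {(1, 3, 2): 2}
print(dict(triangles))  # {(2, 2, 3): 1}
```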
• …
• nK specifies the entire graph
OSNs: “2K+”
Nice properties
• Inclusion
• Convergence
• Tradeoff: accuracy vs. complexity
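The inclusion property can be made concrete for 2K, 1K and 0K. Assuming the double-sum JDM definition (so an edge inside one degree class is counted twice on the diagonal), every row sum of the JDM counts the stubs of one degree class, which recovers 1K, and 1K in turn recovers 0K:

\[
D(k) = \frac{1}{k} \sum_{l} JDM(k, l),
\qquad
\bar{k} = \frac{\sum_{k} k \, D(k)}{\sum_{k} D(k)} = \frac{2|E|}{|V|}.
\]

For the 1K table of the example graph, D(1) = 1, D(2) = 2, D(3) = 1, this gives $\bar{k} = (1 + 4 + 3)/4 = 2$.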
Related Work
Graph Construction Approaches:
• Stochastic: reproduces dK-distribution in expectation.
• Configuration (“pseudograph”): reproduces dK-distribution exactly.
  o Deterministic algorithms up to d=2. MCMC for d>=2.
1K Construction
• Configuration: 1K multigraphs [Molloy’95]
• 1K+ [Bansal ’09, Newman’09, Serrano & Boguna’05, …]
2K Construction
• Configuration model for 2K multigraphs [Mahadevan’06]
• Balanced Degree Invariant: simple graphs [Amanatidis’08], [Stanton’12]
2K+ Construction
• 2K preserving, 3K targeting using edge rewiring: [Mahadevan’06]
• 2.5K heuristic: JDM + degree-dependent clustering coefficient: [Gjoka’13]
2K Construction
Configuration Model
[Figure: nodes labeled 2a, 2b, 4a, 3a, 2b, 3b, all stubs free; legend: free stub]
target JDM
k\l  2  3  4
2    2  2  2
3    2  2  2
4    2  2  -
current JDM
k\l  2  3  4
2    0  0  0
3    0  0  0
4    0  0  -
2K Construction
Configuration Model
[Figure: the same nodes after six edges have been added; legend: used stub, free stub]
Edges added: (2a,3a) (3b,4a) (2b,3a) (2b,4a) (3a,3b) (2a,2b)
target JDM
k\l  2  3  4
2    2  2  2
3    2  2  2
4    2  2  -
current JDM
k\l  2  3  4
2    2  2  1
3    2  2  1
4    1  1  -
Construction stuck! 2/8 (25%) of the edges cannot be added
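The way a stub-matching construction gets stuck on a simple graph can be explored with a small experiment. The sketch below is a simplified stand-in, not the exact procedure on the slide: it visits degree pairs in sorted order, picks a random pair of not-yet-connected nodes that both still have free stubs, and counts the edges it cannot place. The function name, node names and seed are assumptions.

```python
import random

def greedy_stub_matching(jdm_target, nodes, seed=0):
    """Add edges pair by pair without rewiring; return (adjacency, stuck count).

    jdm_target: {(k, l): count}, symmetric, diagonal counts edges twice.
    nodes: {degree: [node names]}.
    """
    rng = random.Random(seed)
    free = {v: k for k, names in nodes.items() for v in names}  # free stubs
    adj = {v: set() for v in free}
    stuck = 0
    for (k, l), target in sorted(jdm_target.items()):
        if k > l:
            continue  # symmetric entry, handled as (l, k)
        for _ in range(target // 2 if k == l else target):
            options = [(x, y) for x in nodes[k] for y in nodes[l]
                       if x != y and y not in adj[x]
                       and free[x] > 0 and free[y] > 0]
            if not options:
                stuck += 1  # no simple edge fits: construction is stuck
                continue
            x, y = rng.choice(options)
            adj[x].add(y)
            adj[y].add(x)
            free[x] -= 1
            free[y] -= 1
    return adj, stuck

TARGET = {(2, 2): 2, (2, 3): 2, (2, 4): 2, (3, 2): 2,
          (3, 3): 2, (3, 4): 2, (4, 2): 2, (4, 3): 2}
NODES = {2: ["2a", "2b", "2c"], 3: ["3a", "3b"], 4: ["4a"]}
adj, stuck = greedy_stub_matching(TARGET, NODES)
print(sum(len(n) for n in adj.values()) // 2, "edges placed,", stuck, "stuck")
```

Whether a given run gets stuck depends on the random choices; the point is only that nothing in this procedure can recover once the remaining free stubs belong to nodes that are already connected.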
2K Construction
Balanced Degree Invariant
[Figure: three panels connecting the degree-3 nodes 3a, 3b (k = 3) to the degree-4 nodes 4a, 4b (l = 4) while JDM(3, 4) < target JDM(3, 4); panel labels include JDM(3, 4) = 1 and JDM(3, 4) = 2; legend: used stub, free stub]
Construction constrained!
Our Contributions
New 2K Construction Algorithm
• can produce any simple graph with the exact JDMtarget in O(|E|dmax)
• Main benefit: no constraints in constructed graphs
2K+ Framework: JDMtarget + Additional Properties
• 2K + Node Attributes (exactly)
• 2K + Avg Clustering (approx)
• Main benefit: orders of magnitude faster than 2K+MCMC
2K Construction
Input: Joint Degree Matrix JDMtarget
• JDMtarget must be graphical
Goal:
• Construct a simple graph with exactly JDMtarget
JDMtarget
k\l  1  2  3  4
1    -  -  1  1
2    -  -  1  1
3    1  1  -  4
4    1  1  4  2
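A rough way to test whether a candidate JDMtarget can be graphical is to check a handful of counting conditions. The sketch below assumes the convention in which diagonal entries count each edge twice; the conditions are the ones commonly stated for joint degree matrices, and the function name is illustrative, so treat it as a sanity check rather than a reference implementation.

```python
def is_graphical_jdm(jdm):
    """Check counting conditions for a JDM given as {(k, l): count}."""
    count = {}  # degree k -> number of nodes |V_k|
    for k in {k for k, _ in jdm}:
        row_sum = sum(c for (a, _), c in jdm.items() if a == k)
        if k <= 0 or row_sum % k:
            return False  # |V_k| = row_sum / k must be an integer
        count[k] = row_sum // k
    for (k, l), c in jdm.items():
        if c < 0 or jdm.get((l, k), 0) != c:
            return False  # entries are non-negative and symmetric
        if k == l:
            # doubled diagonal: at most |V_k| (|V_k| - 1) / 2 edges
            if c % 2 or c > count[k] * (count[k] - 1):
                return False
        elif c > count[k] * count.get(l, 0):
            return False  # at most |V_k| |V_l| edges between two classes
    return True

JDM_TARGET = {(1, 3): 1, (1, 4): 1, (2, 3): 1, (2, 4): 1,
              (3, 1): 1, (3, 2): 1, (3, 4): 4,
              (4, 1): 1, (4, 2): 1, (4, 3): 4, (4, 4): 2}
print(is_graphical_jdm(JDM_TARGET))   # True
print(is_graphical_jdm({(3, 3): 6}))  # False: two degree-3 nodes cannot share 3 edges
```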
2K Construction
Initialize:
  1K: create nodes and stubs
  JDM(k, l) = 0 for all k, l
Pick (k, l) degree pair, in any order
  While JDM(k, l) < JDMtarget(k, l)
    Pick (x, y) any pair of disconnected nodes with degrees k and l
    …
    add edge between (x, y)
JDM/JDMtarget
k\l  1    2    3    4
1    -    -    0/1  0/1
2    -    -    0/1  0/1
3    0/1  0/1  -    0/4
4    0/1  0/1  0/4  0/2
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b with free stubs]
2K Construction
Initialize:
  1K: create nodes and stubs
  JDM(k, l) = 0 for all k, l
Pick (k, l) degree pair, in any order
  While JDM(k, l) < JDMtarget(k, l)
    Pick (x, y) any pair of disconnected nodes with degrees k and l
    …
    add edge between (x, y)
    JDM(k, l)++
JDM/JDMtarget
k\l  1    2    3    4
1    -    -    0/1  1/1
2    -    -    0/1  0/1
3    0/1  0/1  -    0/4
4    1/1  0/1  0/4  0/2
[Figure: the same nodes after the first edge is added]
2K Construction
Initialize:
  1K: create nodes and stubs
  JDM(k, l) = 0 for all k, l
Pick (k, l) degree pair, in any order
  While JDM(k, l) < JDMtarget(k, l)
    Pick (x, y) any pair of disconnected nodes with degrees k and l
    if x does not have free stubs
      neighbor switch for x
    if y does not have free stubs
      neighbor switch for y
    add edge between (x, y)
    JDM(k, l)++
JDM/JDMtarget
k\l  1    2    3    4
1    -    -    0/1  1/1
2    -    -    0/1  0/1
3    0/1  0/1  -    0/4
4    1/1  0/1  0/4  0/2
[Figure: nodes 1a, 1b, 2a, 3a, 3b, 4a, 4b after the first edge is added]
Case 1
x, y both have free stubs
JDM(k, l) < JDMtarget(k, l)
node x has degree k
node y has degree l
[Figure: x (k=3) and y (l=4), both with free stubs]
Add edge between x and y
Case 2
x has free stubs but y does not
JDM(k, l) < JDMtarget(k, l)
node x has degree k
node y has degree l
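[Figure: x (k=3) with a free stub; y (l=4) without free stubs; b, another node of degree l; t, a neighbor of y]
Neighbor switch between y and b using t
• the switch moves edge (y, t) to (b, t), where b has a free stub and is not yet connected to t; this frees a stub on y and leaves the JDM unchanged
Add edge between x and y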
Case 3
neither x nor y has free stubs
JDM(k, l) < JDMtarget(k, l)
node x has degree k
node y has degree l
[Figure: x (k=3) and y (l=4), both without free stubs; b2 and t2 next to x, b1 and t1 next to y]
Neighbor switch between x and b2 using t2
Neighbor switch between y and b1 using t1
Add edge between x and y
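The loop and the three cases above can be sketched in Python. This is an illustrative reading, not the authors' code: the node naming scheme, the deterministic choice of the first disconnected pair, and the rule that a neighbor switch never takes the last free stub of the partner node are all assumptions of this sketch. The JDM uses the convention in which diagonal entries count each edge twice, and the input is assumed to be graphical.

```python
def build_2k_graph(jdm_target):
    """Return {node: set of neighbors} for a simple graph with the given JDM."""
    nodes = {}  # degree k -> list of node names
    for k in sorted({k for k, _ in jdm_target}):
        row_sum = sum(c for (a, _), c in jdm_target.items() if a == k)
        nodes[k] = [f"{k}{chr(ord('a') + i)}" for i in range(row_sum // k)]
    free = {v: k for k, names in nodes.items() for v in names}  # free stubs
    adj = {v: set() for v in free}

    def neighbor_switch(w, partner):
        """Free one stub on saturated node w by moving a neighbor t to b."""
        k = len(adj[w])  # w is saturated, so this equals its target degree
        # b: same-degree node with a free stub (keep partner's last stub)
        b = next(b for b in nodes[k]
                 if b != w and free[b] > 0 and (b != partner or free[b] > 1))
        # t: neighbor of w that b can take without creating a multi-edge
        t = next(t for t in sorted(adj[w]) if t != b and t not in adj[b])
        adj[w].remove(t)
        adj[t].remove(w)
        adj[b].add(t)
        adj[t].add(b)
        free[w] += 1
        free[b] -= 1

    for (k, l), target in sorted(jdm_target.items()):
        if k > l:
            continue  # symmetric entry, handled as (l, k)
        for _ in range(target // 2 if k == l else target):
            # any disconnected pair with degrees k and l will do
            x, y = next((x, y) for x in nodes[k] for y in nodes[l]
                        if x != y and y not in adj[x])
            if free[x] == 0:
                neighbor_switch(x, y)   # Cases 2 and 3
            if free[y] == 0:
                neighbor_switch(y, x)
            adj[x].add(y)
            adj[y].add(x)
            free[x] -= 1
            free[y] -= 1
    return adj

JDM_TARGET = {(1, 3): 1, (1, 4): 1, (2, 3): 1, (2, 4): 1,
              (3, 1): 1, (3, 2): 1, (3, 4): 4,
              (4, 1): 1, (4, 2): 1, (4, 3): 4, (4, 4): 2}
graph = build_2k_graph(JDM_TARGET)
print({v: sorted(nbrs) for v, nbrs in sorted(graph.items())})
```

A neighbor switch replaces edge (w, t) with (b, t) where w and b have the same degree, so the current JDM is unchanged by it; only the final "add edge" step increments JDM(k, l).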
Properties of 2K Algorithm
Terminates with exact JDMtarget in O(|E|dmax)
• It adds 1 edge at a time, while staying below JDMtarget
It can produce ALL graphs with the JDMtarget
Output graph depends on the order of adding edges
Flexibility of 2K Algorithm
Family of algorithms: add one edge at a time, while staying below JDMtarget
• any order of degree pairs (k,l)
• any order of node pairs (x,y), even before completing a degree pair
• can start with an empty or partially built graph
2K+: can target additional properties fast
• Previously known: the space of graphs with JDMtarget is connected, but MCMC mixing is slow
• Property 1: clustering
• Property 2: attribute correlation
Extension 1: Target JDM + Clustering
[Figure: three graphs with the same JDM, containing 0, 1, and 2 triangles]
JDM
k\l  2  3
2    4  4
3    4  2
Intuition: by controlling the order in which we add edges, we can control clustering.
Extension 1: Target JDM + Clustering
[Figure: two graphs on nodes 2a, 2b, 2c, 2d, 3a, 3b with the same JDM, containing 0 and 2 triangles]
JDM
k\l  2  3
2    4  4
3    4  2
[Figure: the nodes placed on a circle with positions marked 0, 25, 50, 75]
• Place nodes randomly on a circle, consider node pairs’ distance
• [INFOCOM 2013]: add edges in increasing distance → high clustering
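The circle idea can be sketched as follows: give every node a random position on a unit circle and list all node pairs by increasing circular distance, so that nearby nodes are offered as edge candidates first. The function name, the unit-length circle and the seed are assumptions of this sketch.

```python
import random
from itertools import combinations

def pairs_by_circle_distance(node_names, seed=0):
    """Return (pairs sorted by circular distance, node positions)."""
    rng = random.Random(seed)
    pos = {v: rng.random() for v in node_names}  # position in [0, 1)

    def dist(v, w):
        d = abs(pos[v] - pos[w])
        return min(d, 1 - d)  # shorter way around the circle

    pairs = sorted(combinations(node_names, 2), key=lambda p: dist(*p))
    return pairs, pos

pairs, pos = pairs_by_circle_distance(["2a", "2b", "2c", "2d", "3a", "3b"])
print(pairs[:3])  # the three closest node pairs for this seed
```

Feeding such a list to an edge-adding loop tends to connect nodes that are close on the circle, which is what favors triangles.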
“Sortedness” of node pairs’ list controls clustering
• Example: JDMtarget of Facebook Caltech Network
• Consider many orders of node pairs → create graphs with JDMtarget → compute avg clustering c
[Figure: nodes 2a, 2b, 3a, 2c, 2d, 3b]
[INFOCOM 2015]: control order of node pairs → control clustering
Extension 1: Target JDM + Clustering
2K+ Avg Clustering
Input: target JDM, avg clustering coefficient c
Stage 1
  E’ = list of node pairs s.t. sortedness(E’) ≈ S(c)
  FOR each candidate node pair (v,w) in E’:
    IF both nodes v and w have free stubs and the corresponding JDM(k, l) < JDMtarget(k, l):
      add edge (v,w)
Stage 2
  IF not all |E| edges have been added:
    add remaining edges using 2K_Simple
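Stage 1 can be sketched as a single pass over the candidate list. The sketch takes the ordered list E’ as given (how to build a list with a prescribed sortedness is not shown here), uses the JDM convention in which diagonal entries count each edge twice, and reports how many edges are left for Stage 2. All names are illustrative.

```python
from collections import Counter
from itertools import combinations

def stage1(candidate_pairs, degree, jdm_target):
    """Add each candidate edge that fits; return (adjacency, JDM, edges left)."""
    free = dict(degree)                # free stubs per node
    adj = {v: set() for v in degree}
    jdm = Counter()
    for v, w in candidate_pairs:
        k, l = degree[v], degree[w]
        if v == w or w in adj[v]:
            continue                   # keep the graph simple
        if free[v] > 0 and free[w] > 0 and jdm[k, l] < jdm_target.get((k, l), 0):
            adj[v].add(w)
            adj[w].add(v)
            free[v] -= 1
            free[w] -= 1
            jdm[k, l] += 1
            jdm[l, k] += 1             # same key when k == l: counted twice
    remaining = (sum(jdm_target.values()) - sum(jdm.values())) // 2
    return adj, jdm, remaining

DEGREE = {"2a": 2, "2b": 2, "2c": 2, "2d": 2, "3a": 3, "3b": 3}
JDM_TARGET = {(2, 2): 4, (2, 3): 4, (3, 2): 4, (3, 3): 2}
adj, jdm, remaining = stage1(list(combinations(DEGREE, 2)), DEGREE, JDM_TARGET)
print(remaining, "edges left for Stage 2")
```

Stage 1 never rewires, so it can leave edges unplaced; Stage 2 then finishes with the construction algorithm that uses neighbor switches.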
Real world examples
target JDM+avg clustering
[Figure: three panels: Average Node Closeness, Average Node Shortest Path Length, Average Clustering Coefficient]
Real world examples
target JDM+avg clustering
2K+MCMC did not finish after several days
Extension 2: Node Attributes
[Figure: two example graphs with node attributes and the same JDM; the JAM of the first graph is shown]
JDM
k\l  1  2
1    -  2
2    2  6
JAM
2  2
2  4
Joint Attribute Matrix (or Attribute Mixing Matrix)
Extension 2: Node Attributes Mixing
[Figure: the same two graphs, each with its JDM and JAM]
JDM (both graphs)
k\l  1  2
1    -  2
2    2  6
JAM (first graph)
2  2
2  4
JAM (second graph)
4  -
-  6
Joint Attribute Matrix (or Attribute Mixing Matrix)
Extension 2: Degree+Attribute Mixing
[Figure: the two graphs with their JDM, JAM, and the combined matrix indexed by (degree, attribute) pairs]
Joint Degree and Attribute Matrix (JDAM)
Extension 2: target JDAM
2K Algorithm also works for target JDAM
[Figure: example graph with its Joint Degree and Attribute Matrix (JDAM)]
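One way to read the step from JDM to JAM and JDAM is that only the class label of a node changes: its degree, its attribute, or the (degree, attribute) pair. The sketch below counts edges between classes for any such labeling, with edges inside one class counted twice, as for the JDM. The path graph and the attribute values are hypothetical and are not taken from the slides.

```python
from collections import Counter

def joint_matrix(edges, label):
    """Count edges between node classes; label maps node -> class."""
    m = Counter()
    for a, b in edges:
        m[label[a], label[b]] += 1
        m[label[b], label[a]] += 1  # same key inside one class: counted twice
    return m

# Hypothetical path graph a - b - c - d with two attribute values.
EDGES = [("a", "b"), ("b", "c"), ("c", "d")]
DEGREE = {"a": 1, "b": 2, "c": 2, "d": 1}
ATTR = {"a": "red", "b": "red", "c": "blue", "d": "blue"}

jdm = joint_matrix(EDGES, DEGREE)
jam = joint_matrix(EDGES, ATTR)
jdam = joint_matrix(EDGES, {v: (DEGREE[v], ATTR[v]) for v in DEGREE})
print(dict(jam))
# {('red', 'red'): 2, ('red', 'blue'): 1, ('blue', 'red'): 1, ('blue', 'blue'): 2}
```

Under this reading, a construction that works per degree class can be reused per (degree, attribute) class, as long as neighbor switches stay within one class.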
Real world examples
graphs with node attributes
[Figure: three panels: Average Node Closeness, Average Node Shortest Path Length, Average Clustering Coefficient]
Real world examples
small graphs with node attributes
Simulation takes ~1 day to target 2K and c = 0.24
with MCMC (using double edge swaps)
Construction of 2K+ Graphs
New 2K Construction Algorithm
• can produce any simple graph with exact JDMtarget in O(|E|dmax)
2K+ Framework: JDMtarget+ Additional Properties
Extension 1: 2K (exactly) + Avg Clustering (approx)
Extension 2: 2K (exactly) + Node Attributes (exactly)
Future directions
Construction: target attributes + structure (towards 3K)
http://odysseas.calit2.uci.edu/osn/
Questions?