cikm_poster.ppt


Overlapping Community
Detection in Networks
Nan Du
Overlapping Community Detection
• An individual may belong to several communities simultaneously.
• Question: how can we design an algorithm that finds overlapping communities?
• Related work
– Palla’s CPM algorithm, 2005
– GN extensions:
• CONGA, P&W, 2007
• Fuzzy c-means clustering, 2007
• Palla’s CPM algorithm, 2005
– Well-defined k-clique communities
– Requires a user-specified parameter k
– Cannot cover all vertices of the given network
• CONGA, 2007
– Uses a defined split betweenness to decide when to split vertices, which vertex to split, and how to split it
– Inefficient on large graphs: O(m^3) time, where m is the number of edges
• P&W, 2007
– Uses both edge betweenness and vertex betweenness to decide whether to split a vertex or remove an edge; requires a user-specified parameter to assess the similarity between pairs of vertices
• Fuzzy clustering, 2007
– Requires a user-specified upper bound on the number of communities, which is often hard to choose for real networks
• We propose a novel algorithm, COCD (Clique-based Overlapping Community Detection), which
– Covers all vertices of the given network
– Requires no user input parameters
– Is efficient and scalable
• COCD consists of three basic steps
– Maximal clique enumeration
• Peamc on sparse graphs
– Core formation
• A core is a set of closely related maximal cliques
– Clustering
• Freeman centrality is used to assign the remaining vertices to the cores
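The poster uses Peamc for the first step. Purely as a stand-in, the sketch below enumerates maximal cliques with the classic Bron-Kerbosch algorithm with pivoting; it is not Peamc and makes no claim about how COCD is actually implemented. The adjacency-dict representation and the names `build_graph` and `maximal_cliques` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency dict from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj: Graph) -> Iterator[Set[int]]:
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[Set[int]]:
        if not p and not x:
            yield r  # r cannot be extended, so it is maximal
            return
        # Choose the pivot covering the most candidates; only its non-neighbours branch.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored ...
            x = x | {v}  # ... so later branches must exclude it

    yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    # Edges of the four cliques used in the poster's core-formation example.
    cliques = [{0, 1, 2, 3}, {0, 3, 4}, {4, 5, 6, 7}, {4, 5, 7, 8}]
    g = build_graph((u, v) for c in cliques for u in c for v in c if u < v)
    print(sorted(sorted(c) for c in maximal_cliques(g)))
    # [[0, 1, 2, 3], [0, 3, 4], [4, 5, 6, 7], [4, 5, 7, 8]]
```

Pivoting keeps the search from revisiting cliques through every member vertex; on the example network it recovers exactly the four maximal cliques listed in the core-formation example.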
• Core Formation
– A core is defined as a set of closely related
maximal cliques
– How do we decide whether to merge two cores that share some vertices?
– Solution: a closeness function
If $\mathrm{isClose}(G_i, G_j) \neq 0$, subgraphs $G_i$ and $G_j$ are merged; otherwise, they are kept separate.
• COCD algorithm
– Core formation (whether to merge two cores?)
• Closeness Function
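$C_L$ and $C_S$ are the sets of maximal cliques containing $v_L$ and $v_S$, respectively
$G_L$ and $G_S$ are the subgraphs induced by $C_L$ and $C_S$, with $|V(G_L)| \geq |V(G_S)|$
$E_{between}(G_L, G_S)$ is the set of edges between $G_L$ and $G_S$
$\alpha = |V(G_L)| \,/\, |V(G_S)|$
$I_{LS} = V(G_L) \cap V(G_S)$
$inf_{LS} = \alpha \cdot |E_{between}(G(V(G_L) - \{v_S\}),\, G(V(G_S) - I_{LS}))|$
$inf_{SL}$ is formed analogously from $|E_{between}(G(V(G_S) - \{v_L\}),\, G(V(G_L) - I_{LS}))|$
$\mathrm{isClose}(G_L, G_S) = (inf_{LS} \geq |E(G(V(G_S) - I_{LS}))|) \wedge (inf_{SL} \geq |E(G(V(G_L) - I_{LS}))|)$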
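A minimal Python sketch of the closeness function above. It rests on assumptions the poster does not spell out: graphs are adjacency dicts, cores are passed as the vertex sets V(G_L) and V(G_S), the two comparisons use ≥ and are combined with logical AND, and inf_SL reuses the same ratio α as inf_LS. The names `edges_within`, `edges_between`, and `is_close` are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def edges_within(adj: Graph, vs: Set[int]) -> int:
    """|E(G(vs))|: number of edges in the subgraph induced by vs."""
    return sum(len(adj[u] & vs) for u in vs) // 2


def edges_between(adj: Graph, a: Set[int], b: Set[int]) -> int:
    """|E_between(G(a), G(b))| for disjoint vertex sets a and b."""
    return sum(len(adj[u] & b) for u in a)


def is_close(adj: Graph, core_l: Set[int], core_s: Set[int], v_l: int, v_s: int) -> bool:
    """Closeness test between two cores centred on v_l and v_s (reconstructed form)."""
    if len(core_l) < len(core_s):  # enforce |V(G_L)| >= |V(G_S)|
        core_l, core_s, v_l, v_s = core_s, core_l, v_s, v_l
    alpha = len(core_l) / len(core_s)
    shared = core_l & core_s  # I_LS
    inf_ls = alpha * edges_between(adj, core_l - {v_s}, core_s - shared)
    # ASSUMPTION: inf_SL is weighted by the same alpha as inf_LS.
    inf_sl = alpha * edges_between(adj, core_s - {v_l}, core_l - shared)
    return (inf_ls >= edges_within(adj, core_s - shared)) and (
        inf_sl >= edges_within(adj, core_l - shared)
    )
```

Because V(G_S) − I_LS excludes every vertex of G_L, the two sets passed to `edges_between` are always disjoint, so each crossing edge is counted once. Applied to the cores G_4 (centre v_4) and G_0 (centre v_0) of the following example, `is_close` returns False, consistent with the poster's decision not to merge C_0 and C_4.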
• COCD algorithm
– Core formation
$C_0 = \{\{v_0, v_1, v_2, v_3\}, \{v_0, v_3, v_4\}\}$
$C_4 = \{\{v_4, v_5, v_6, v_7\}, \{v_4, v_5, v_7, v_8\}, \{v_0, v_3, v_4\}\}$
$I_{40} = \{v_0, v_3, v_4\}$
[Figure: example network on vertices v0 to v8]
$V(G_4) - \{v_0\} = \{v_3, v_4, v_5, v_6, v_7, v_8\}$
$V(G_0) - I_{40} = \{v_1, v_2\}$
$|E_{between}(G(V(G_4) - \{v_0\}),\, G(V(G_0) - I_{40}))| = 2$
$\alpha = |V(G_4)| \,/\, |V(G_0)| = 7/5 = 1.4$
$inf_{40} = \alpha \cdot |E_{between}(G(V(G_4) - \{v_0\}),\, G(V(G_0) - I_{40}))| = 1.4 \times 2 = 2.8$
$V(G_0) - \{v_4\} = \{v_0, v_1, v_2, v_3\}$
$V(G_4) - I_{40} = \{v_5, v_6, v_7, v_8\}$
$|E_{between}(G(V(G_0) - \{v_4\}),\, G(V(G_4) - I_{40}))| = 0$, so $inf_{04} = 0$
$\mathrm{isClose}(G_4, G_0) = (inf_{40} \geq |E(G(V(G_0) - I_{40}))|) \wedge (inf_{04} \geq |E(G(V(G_4) - I_{40}))|) = (2.8 \geq 1) \wedge (0 \geq 5) = 0$
Because $\mathrm{isClose}(G_4, G_0) = 0$, $C_0$ and $C_4$ are not merged.
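The arithmetic of this example can be replayed with plain Python sets. This assumes, as the clique lists imply, that the example network consists exactly of the edges of the listed maximal cliques; vertex i stands for v_i, and the helper names `n_between` and `n_within` are illustrative.

```python
# Clique sets from the example; vertex i stands for v_i.
C0 = [{0, 1, 2, 3}, {0, 3, 4}]
C4 = [{4, 5, 6, 7}, {4, 5, 7, 8}, {0, 3, 4}]
# ASSUMPTION: the network's edges are exactly the edges of these cliques.
EDGES = {frozenset((u, v)) for c in C0 + C4 for u in c for v in c if u < v}


def n_between(a: set, b: set) -> int:
    """Edges with one endpoint in a and the other in b (a, b disjoint)."""
    return sum(1 for e in EDGES if len(e & a) == 1 and len(e & b) == 1)


def n_within(vs: set) -> int:
    """Edges with both endpoints in vs."""
    return sum(1 for e in EDGES if e <= vs)


V4 = set().union(*C4)  # V(G_4): the larger core, so L = 4 and S = 0
V0 = set().union(*C0)  # V(G_0)
I40 = V4 & V0
alpha = len(V4) / len(V0)  # 7 / 5
inf40 = alpha * n_between(V4 - {0}, V0 - I40)
inf04 = alpha * n_between(V0 - {4}, V4 - I40)  # edge count is 0, so the weight does not matter here
close = inf40 >= n_within(V0 - I40) and inf04 >= n_within(V4 - I40)
print(sorted(I40), alpha, round(inf40, 2), inf04, close)
# [0, 3, 4] 1.4 2.8 0.0 False
```

The printed values match the hand computation: the first comparison holds (2.8 ≥ 1) but the second fails (0 ≥ 5), so the conjunction is false and the two cores stay separate.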
• Experimental Evaluation
– On networks with known community structures
• Precision: the fraction of vertex pairs in the same cluster that also belong to the same community
• Recall: the fraction of vertex pairs in the same community that are also in the same cluster
– On networks with unknown community structures
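• Overlap coefficient (overlap) and vertex average degree (vad), where $C$ is the set of detected communities and $E(P_i)$ is the edge set of community $P_i$:
$overlap = \sum_{P_i \in C} |P_i| \,/\, |V(G)|$
$vad = 2 \sum_{P_i \in C} |E(P_i)| \,/\, \sum_{P_i \in C} |P_i|$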
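The four evaluation measures can be computed as in the sketch below. It fills in details the poster leaves open, so treat them as assumptions: with overlapping groups, a vertex pair counts as "in the same cluster" (or community) if it co-occurs in at least one of them; E(P_i) is the edge set of the subgraph induced by P_i; and the graph is an adjacency dict that lists every vertex. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def co_member_pairs(groups: Iterable[Set[int]]) -> Set[FrozenSet[int]]:
    """Unordered vertex pairs that share at least one group."""
    pairs: Set[FrozenSet[int]] = set()
    for g in groups:
        pairs.update(frozenset(p) for p in combinations(g, 2))
    return pairs


def pair_precision_recall(
    clusters: List[Set[int]], communities: List[Set[int]]
) -> Tuple[float, float]:
    """Pairwise precision and recall of detected clusters against known communities."""
    found = co_member_pairs(clusters)
    truth = co_member_pairs(communities)
    hits = len(found & truth)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def overlap_and_vad(adj: Graph, communities: List[Set[int]]) -> Tuple[float, float]:
    """overlap = sum|P_i| / |V(G)|;  vad = 2 * sum|E(P_i)| / sum|P_i|."""
    members = sum(len(p) for p in communities)
    # Each internal edge is seen from both endpoints, hence the // 2.
    internal = sum(sum(len(adj[u] & p) for u in p) // 2 for p in communities)
    overlap = members / len(adj)
    vad = 2 * internal / members if members else 0.0
    return overlap, vad


if __name__ == "__main__":
    clusters = [{1, 2, 3}, {3, 4}]
    communities = [{1, 2}, {3, 4}]
    print(pair_precision_recall(clusters, communities))  # (0.5, 1.0)
```

In the usage example the clusters produce four co-member pairs, two of which are true community pairs, giving precision 0.5 and recall 1.0. An overlap value above 1 indicates that some vertices belong to more than one community.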
[Table: 16 real datasets from different domains]
[Figure: Precision and Recall of COCD, CONGA, P&W, CFinder (k=3) and CFinder (k=5) on the Karate Club, Dolphin Social Network and College Football networks]
[Figure: results on networks with unknown community structures]
[Figure: communities of the word association network]
[Figure: communities of the cell phone network]
References
• S. Gregory. An algorithm to find overlapping community structure in networks. In PKDD, pages 91-102, 2007.
• G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814-818, June 2005.
• J. Pinney and D. Westhead. Betweenness-based decomposition methods for social and biological networks. Leeds University Press.
• S. Zhang, R. S. Wang, and X. S. Zhang. Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Physica A, 374(1), 2007.
• N. Du, B. Wu, and B. Wang. A parallel algorithm for enumerating all maximal cliques in complex networks. In ICDM Mining Complex Data Workshop, pages 320-324, December 2006.