SCAN: A Structural Clustering Algorithm for Networks
Xiaowei Xu (徐晓伟)
Joint Work with Nurcan Yuruk (UALR) and Thomas A. J. Schweiger (Acxiom)
Network Clustering Problem


Networks made up of the mutual relationships
of data elements usually have an underlying
structure. Because relationships are complex, it
is difficult to discover these structures. How
can the structure be made clear?
Stated another way, given only information about
who associates with whom, could one identify
clusters of individuals with common interests or
special relationships (families, cliques, terrorist
cells)?
An Example of Networks
How many clusters?
What size should they be?
What is the best partitioning?
Should some points be segregated?

A Social Network Model



Individuals in a tight social group, or clique, know
many of the same people, regardless of the size
of the group.
Individuals who are hubs know many people in
different groups but belong to no single group.
Politicians, for example, bridge multiple groups.
Individuals who are outliers reside at the margins
of society. Hermits, for example, know few
people and belong to no group.
The Neighborhood of a Vertex
Define $\Gamma(v)$ as the immediate neighborhood
of a vertex $v$ (i.e. the set of people that an
individual knows).
Structure Similarity

The desired features tend to be captured
by a measure we call Structural Similarity:
$$\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$$

Structural similarity is large for members of a
clique and small for hubs and outliers.
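As a minimal sketch of the measure above, the following Python computes structural similarity on a small made-up graph. The function names, the toy graph, and the choice to count a vertex as a member of its own neighborhood are assumptions made for illustration, not details stated on the slide.

```python
from math import sqrt

def neighborhood(adj, v):
    """Gamma(v): the neighbors of v, plus v itself (an assumed convention)."""
    return set(adj[v]) | {v}

def structural_similarity(adj, v, w):
    """Shared neighbors, normalized by the geometric mean of the neighborhood sizes."""
    gv, gw = neighborhood(adj, v), neighborhood(adj, w)
    return len(gv & gw) / sqrt(len(gv) * len(gw))

# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

sim_ab = structural_similarity(adj, "a", "b")  # two members of the triangle
sim_cd = structural_similarity(adj, "c", "d")  # triangle member vs. pendant vertex
print(f"sigma(a, b) = {sim_ab:.2f}, sigma(c, d) = {sim_cd:.2f}")
```

In this toy graph the two triangle members share their whole neighborhood, so their similarity is higher than that of the pendant vertex and its single contact.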
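Structural Connectivity [1]

$\varepsilon$-Neighborhood: $N_\varepsilon(v) = \{ w \in \Gamma(v) \mid \sigma(v, w) \ge \varepsilon \}$
Core: $\mathrm{CORE}_{\varepsilon,\mu}(v) \Leftrightarrow |N_\varepsilon(v)| \ge \mu$
Direct structure reachable: $\mathrm{DirREACH}_{\varepsilon,\mu}(v, w) \Leftrightarrow \mathrm{CORE}_{\varepsilon,\mu}(v) \wedge w \in N_\varepsilon(v)$
Structure reachable ($\mathrm{REACH}_{\varepsilon,\mu}$): transitive closure of direct structure reachability
Structure connected: $\mathrm{CONNECT}_{\varepsilon,\mu}(v, w) \Leftrightarrow \exists u \in V : \mathrm{REACH}_{\varepsilon,\mu}(u, v) \wedge \mathrm{REACH}_{\varepsilon,\mu}(u, w)$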
[1] M. Ester, H. P. Kriegel, J. Sander, & X. Xu (KDD'97)
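The definitions above can be sketched in Python as follows. This is an illustration under assumptions: the function names, the toy graph, and the parameter values (0.7 and 3) are hypothetical, and each vertex is counted as part of its own neighborhood.

```python
from math import sqrt

def sigma(adj, v, w):
    """Structural similarity, with each vertex included in its own neighborhood (assumed)."""
    gv, gw = adj[v] | {v}, adj[w] | {w}
    return len(gv & gw) / sqrt(len(gv) * len(gw))

def eps_neighborhood(adj, v, eps):
    """Members of Gamma(v) whose similarity to v is at least eps."""
    return {w for w in adj[v] | {v} if sigma(adj, v, w) >= eps}

def is_core(adj, v, eps, mu):
    """v is a core if its eps-neighborhood has at least mu members."""
    return len(eps_neighborhood(adj, v, eps)) >= mu

def directly_reachable(adj, v, w, eps, mu):
    """w is directly structure reachable from v: v is a core and w is in its eps-neighborhood."""
    return is_core(adj, v, eps, mu) and w in eps_neighborhood(adj, v, eps)

# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

c_reaches_d = directly_reachable(adj, "c", "d", eps=0.7, mu=3)
d_reaches_c = directly_reachable(adj, "d", "c", eps=0.7, mu=3)
print(c_reaches_d, d_reaches_c)
```

With these illustrative parameters, d is directly reachable from the core c, but not the other way around, because d itself is not a core. Direct reachability is therefore not symmetric, which is why structure connectivity is defined through a common vertex u.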
Structure-Connected Clusters

Structure-connected cluster $C$
• Connectivity: $\forall v, w \in C : \mathrm{CONNECT}_{\varepsilon,\mu}(v, w)$
• Maximality: $\forall v, w \in V : v \in C \wedge \mathrm{REACH}_{\varepsilon,\mu}(v, w) \Rightarrow w \in C$

Hubs:
• Do not belong to any cluster
• Bridge many clusters

Outliers:
• Do not belong to any cluster
• Connect to fewer clusters
Algorithm
[Figure: animated walk-through of the algorithm on an example network with vertices 0 to 13, using $\mu = 2$ and $\varepsilon = 0.7$. Successive frames are annotated with structural similarity values (those shown include 0.63, 0.67, 0.82, 0.75, 0.73, 0.51, and 0.68).]
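The walk-through above can be approximated by the following Python sketch: grow a cluster from each unvisited core, then sort the leftover vertices into hubs and outliers. The function name `scan`, the toy graph, the parameter values, and the rule "a hub has neighbors in at least two clusters" are assumptions for illustration, not the authors' implementation.

```python
from collections import deque
from math import sqrt

def scan(adj, eps, mu):
    """Return (labels, hubs, outliers) for a graph given as {vertex: set_of_neighbors}."""
    gamma = {v: adj[v] | {v} for v in adj}  # neighborhoods including the vertex itself (assumed)

    def sigma(v, w):
        return len(gamma[v] & gamma[w]) / sqrt(len(gamma[v]) * len(gamma[w]))

    def eps_nbrs(v):
        return {w for w in gamma[v] if sigma(v, w) >= eps}

    labels = {}   # vertex -> cluster id; unlabeled vertices are non-members
    next_id = 0
    for v in adj:
        if v in labels:
            continue
        seeds = eps_nbrs(v)
        if len(seeds) < mu:
            continue  # not a core; a later cluster may still absorb it
        cid, next_id = next_id, next_id + 1
        queue = deque(seeds)
        while queue:  # breadth-first expansion from the core v
            y = queue.popleft()
            if y in labels:
                continue
            labels[y] = cid
            nbrs = eps_nbrs(y)
            if len(nbrs) >= mu:  # only cores extend the cluster further
                queue.extend(x for x in nbrs if x not in labels)

    hubs, outliers = set(), set()
    for v in adj:
        if v in labels:
            continue
        touched = {labels[w] for w in adj[v] if w in labels}
        (hubs if len(touched) >= 2 else outliers).add(v)
    return labels, hubs, outliers

# Hypothetical toy graph: two 4-cliques, vertex 8 linked to both, vertex 9 linked to one.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (8, 0), (8, 4), (9, 0)]
adj = {v: set() for v in range(10)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

labels, hubs, outliers = scan(adj, eps=0.7, mu=3)
print(labels, hubs, outliers)
```

On this toy graph the two cliques come out as separate clusters, vertex 8 is reported as a hub because it touches both, and vertex 9 is reported as an outlier because it touches only one.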
Running Time


Running time = O(|E|)
For sparse networks = O(|V|)
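One hedged way to read these bounds: assume the algorithm visits each vertex once and compares it only with its immediate neighbors, and treat the cost of each comparison as constant (an assumption, not something stated on the slide). The number of comparisons is then the sum of the vertex degrees:

```latex
\sum_{v \in V} \deg(v) = 2|E| = O(|E|),
\qquad\text{and if } |E| = O(|V|) \text{ (sparse network), then } O(|E|) = O(|V|).
```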
[2] A. Clauset, M. E. J. Newman, & C. Moore, Phys. Rev. E 70, 066111 (2004).
Are you ready for some football?

Given only the 2006 schedule of what
schools each NCAA Division 1A team
met on a football field, what underlying
structures could one discover?
789 Contests

119 Division 1A schools, which play:
• schools in their conference
• schools in other 1A conferences
• independent 1A schools (e.g. Army)
• schools in sub-1A conferences (e.g. Maine)
Consider Arkansas’ Schedule:
Opponent                    Conference
USC                         Pacific 10
Utah State                  Western Athletic
Vanderbilt                  SEC
Alabama                     SEC
Auburn                      SEC
Southeast Missouri State    Non 1A
Mississippi                 SEC
Louisiana Monroe            Sun Belt
South Carolina              SEC
Tennessee                   SEC
Mississippi State           SEC
LSU                         SEC
Florida                     SEC
Wisconsin                   Big 10
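As a rough illustration of how a schedule like the one above becomes a network, the sketch below turns "who played whom" pairs into adjacency sets. Only Arkansas's games are included, and the names `ARKANSAS_2006` and `build_graph` are hypothetical. The conference labels are kept only for checking results afterwards; the clustering input is the adjacency structure alone.

```python
from collections import defaultdict

# (opponent, conference) pairs copied from the Arkansas schedule above.
ARKANSAS_2006 = [
    ("USC", "Pacific 10"), ("Utah State", "Western Athletic"),
    ("Vanderbilt", "SEC"), ("Alabama", "SEC"), ("Auburn", "SEC"),
    ("Southeast Missouri State", "Non 1A"), ("Mississippi", "SEC"),
    ("Louisiana Monroe", "Sun Belt"), ("South Carolina", "SEC"),
    ("Tennessee", "SEC"), ("Mississippi State", "SEC"), ("LSU", "SEC"),
    ("Florida", "SEC"), ("Wisconsin", "Big 10"),
]

def build_graph(games):
    """Build undirected adjacency sets from (team_a, team_b) pairs."""
    adj = defaultdict(set)
    for a, b in games:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

games = [("Arkansas", opponent) for opponent, _ in ARKANSAS_2006]
adj = build_graph(games)

# Conference labels are used here only to inspect the schedule, not as clustering input.
sec_opponents = [opp for opp, conf in ARKANSAS_2006 if conf == "SEC"]
print(len(adj["Arkansas"]), "opponents,", len(sec_opponents), "of them in the SEC")
```

A full network would add the games of every other school in the same way, so that teams in the same conference end up sharing many neighbors.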
The Network:
The 1A Conference:
Result of Our Algorithm:
Result of FastModularity Alg. [2]:
Conclusion

We propose a novel network clustering algorithm:
• It is fast: O(|E|) in general, and O(|V|) for scale-free networks
• It can find clusters, as well as hubs and outliers

For more information:
• See you at the poster session this evening, at poster board #4
• Email: [email protected]
• URL: http://ifsc.ualr.edu/xwxu

Thank you!