K-MST-based clustering

Caiming Zhong
Pasi Franti
University of Joensuu
Dept. of Computer Science
P.O. Box 111
FIN- 80101 Joensuu
Tel. +358 13 251 7959
fax +358 13 251 7955
www.cs.joensuu.fi
Outline
• Minimum spanning tree (MST)
• MST-based clustering
• K-MST
• K-MST-based clustering
• Fast approximate MST
Minimum Spanning Tree
• Spanning tree
[Figure: a given graph, a spanning tree of it, and a subgraph that is not a spanning tree]
Minimum Spanning Tree
• Minimize the sum of edge weights (Kruskal's and Prim's algorithms):
$$w(T) = \sum_{(u,v) \in T} w(u,v)$$
[Figure: a given graph G = (V, E) and its MST T]
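As a rough illustration of the objective above, the following minimal sketch runs Prim's algorithm on the complete Euclidean graph of a small point set. The function name `prim_mst` and the sample points are hypothetical, chosen only for this example.

```python
import math

def prim_mst(points):
    """Return MST edges (i, j, weight) of the complete Euclidean graph on points."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not in the tree yet.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the connection cost of every remaining vertex through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

points = [(0, 0), (1, 0), (1, 1), (5, 5)]   # illustrative data only
tree = prim_mst(points)
total = sum(w for _, _, w in tree)           # w(T), the sum of edge weights
```

This version scans all vertices in every step, so it takes O(N^2) time on N points.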
MST-based clustering
• Most widely used method 1: removing long MST edges
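A minimal sketch of this idea, assuming the MST is already available as a list of `(u, v, weight)` tuples: drop the k-1 heaviest edges and label the connected components that remain. The name `cut_longest_edges` and the toy edge list are hypothetical.

```python
def cut_longest_edges(n, mst_edges, k):
    """Split n vertices into k clusters by dropping the k-1 heaviest MST edges."""
    keep = max(0, len(mst_edges) - (k - 1))
    kept = sorted(mst_edges, key=lambda e: e[2])[:keep]
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, _ in kept:
        parent[find(u)] = find(v)
    # Map each component root to a compact cluster label 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Illustrative MST on 5 vertices with one clearly long edge (2, 3).
mst_edges = [(0, 1, 1.0), (1, 2, 1.2), (2, 3, 6.0), (3, 4, 0.9)]
labels = cut_longest_edges(5, mst_edges, k=2)
```

Here the number of clusters k is an input. Choosing it, or choosing a length threshold instead, is left open in this sketch.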
MST-based clustering
• Removing long MST-edges doesn’t
always work
MST-based clustering
• Most widely used method 2: removing inconsistent edges
A tree edge AB whose weight W(AB) is significantly larger than the average of the nearby edge weights on both sides of AB should be deleted.
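The rule above can be sketched as follows. The neighborhood depth (`depth`) and the threshold (`factor`) are assumptions made for this example, since the slide does not define "nearby" or "significantly larger". All names are hypothetical.

```python
from collections import defaultdict

def nearby_weights(adj, start, blocked, depth):
    """Weights of tree edges within `depth` hops of `start`, never crossing to `blocked`."""
    weights, frontier, seen = [], [start], {start, blocked}
    for _ in range(depth):
        nxt = []
        for u in frontier:
            for v, w in adj[u]:
                if v not in seen:
                    seen.add(v)
                    weights.append(w)
                    nxt.append(v)
        frontier = nxt
    return weights

def inconsistent_edges(mst_edges, depth=2, factor=2.0):
    """Flag edges much heavier than the average nearby weight on both sides."""
    adj = defaultdict(list)
    for u, v, w in mst_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    flagged = []
    for a, b, w in mst_edges:
        side_a = nearby_weights(adj, a, b, depth)
        side_b = nearby_weights(adj, b, a, depth)
        # Assumption: an edge with an empty side (a leaf edge) is never flagged.
        if side_a and side_b:
            avg_a = sum(side_a) / len(side_a)
            avg_b = sum(side_b) / len(side_b)
            if w > factor * avg_a and w > factor * avg_b:
                flagged.append((a, b, w))
    return flagged

# Illustrative path-shaped MST with one heavy edge in the middle.
mst_edges = [(0, 1, 1.0), (1, 2, 1.1), (2, 3, 5.0), (3, 4, 0.9), (4, 5, 1.0)]
flagged = inconsistent_edges(mst_edges)
```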
K-MST
• What is K-MST?
– Let G = (V, E) denote the complete graph.
– Let MST_1 denote the MST of G, computed as MST_1 = mst(V, E).
– Then MST_2 denotes the second-round MST of G: MST_2 = mst(V, E − MST_1).
– In general, MST_k = mst(V, E − MST_1 − … − MST_{k−1}).
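The definition above can be sketched directly: compute an MST, remove its edges from the edge set, and repeat. This is a minimal illustration using Kruskal's algorithm on the complete Euclidean graph. The names `kruskal` and `k_round_mst` and the sample points are hypothetical.

```python
import math
from itertools import combinations

def kruskal(n, edges):
    """MST (or spanning forest, if disconnected) of n vertices from (u, v, w) edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def k_round_mst(points, k):
    """Return [MST_1, ..., MST_k], each computed on the edges not used by earlier rounds."""
    n = len(points)
    remaining = [(i, j, math.dist(points[i], points[j]))
                 for i, j in combinations(range(n), 2)]
    rounds = []
    for _ in range(k):
        tree = kruskal(n, remaining)
        rounds.append(tree)
        used = {(u, v) for u, v, _ in tree}
        remaining = [e for e in remaining if (e[0], e[1]) not in used]
    return rounds

pts = [(0, 0), (1, 0), (2, 0.5), (3, 0), (4, 0.2)]   # illustrative data only
rounds = k_round_mst(pts, 2)
```

Note that a later round is only a spanning forest when the remaining edges no longer connect all vertices, for example when MST_1 is a star and its center has no edges left.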
K-MST
• K-MST-based graph
K-MST
• Typical clustering problems
– Separated problems and touching problems.
– Separated problems include distance-separated problems and density-separated problems.
K-MST-based clustering
• Definition of edge weight for separated
problems
$$w(e_{ab}) = 1 - \frac{\min\big(\mathrm{avg}(E_a - \{e_{ab}\}),\ \mathrm{avg}(E_b - \{e_{ab}\})\big)}{\rho(e_{ab})}$$
Three good features: (1) Weights of inter-cluster edges are considerably larger than those of intra-cluster edges. (2) The inter-cluster edges are distributed approximately equally between T1 and T2. (3) Apart from inter-cluster edges, most edges with large weights come from T2.
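The edge weight for separated problems can be tried out on a toy graph. The slide does not define its symbols, so this sketch assumes that ρ(e_ab) is the length of edge e_ab, that E_a is the set of graph edges incident to vertex a, and that avg is the mean length of a set of edges. The name `separation_weights` and the example graph are hypothetical.

```python
from collections import defaultdict

def separation_weights(edges):
    """edges: (a, b, rho) tuples of the combined graph; returns {(a, b): w(e_ab)}."""
    incident = defaultdict(list)
    for a, b, rho in edges:
        incident[a].append(((a, b), rho))
        incident[b].append(((a, b), rho))

    def avg_without(vertex, key):
        # Mean length of the edges at `vertex`, excluding the edge `key` itself.
        others = [rho for k, rho in incident[vertex] if k != key]
        return sum(others) / len(others) if others else None

    weights = {}
    for a, b, rho in edges:
        sides = [s for s in (avg_without(a, (a, b)), avg_without(b, (a, b)))
                 if s is not None]
        # Assumption: an edge with no neighboring edges at all gets weight 0.
        weights[(a, b)] = 1.0 - min(sides) / rho if sides else 0.0
    return weights

# Two tight triangles joined by one long edge (2, 3); illustrative data only.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 10.0)]
w = separation_weights(edges)
heaviest = max(w, key=w.get)
```

On this toy graph the long connecting edge receives a weight close to 1 and the short triangle edges receive 0, which matches feature (1) above.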
K-MST-based clustering
• Touching problems
Partition(cut1) and Partition(cut3) are similar; Partition(cut2) and Partition(cut3) are similar.
Fast approximate MST (FAMST)
• Traditional MST algorithms take O(N^2) time, which makes them impractical for large data sets.
• In practical applications, FAMST generally gives the same result as the exact MST.
• A FAMST can be found in O(N^1.55) time.
Fast approximate MST (FAMST)
• Scheme: Divide-and-Conquer
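The slide names only the general scheme, so the following is one possible divide-and-conquer sketch and not necessarily the authors' FAMST. It assumes a simple partition (sorting the points and cutting them into chunks of about √N points), an exact MST inside each chunk, and a final step that links chunks along the MST of their centroids. All names are hypothetical.

```python
import math

def prim(points, idx):
    """Exact MST over the points listed in idx; edges use the global indices."""
    if not idx:
        return []
    best = {i: (math.inf, -1) for i in idx[1:]}   # vertex -> (cost, tree neighbor)
    edges, u = [], idx[0]
    while best:
        for v in best:
            d = math.dist(points[u], points[v])
            if d < best[v][0]:
                best[v] = (d, u)
        u = min(best, key=lambda v: best[v][0])
        d, p = best.pop(u)
        edges.append((p, u, d))
    return edges

def approx_mst(points):
    """Approximate MST: exact trees inside chunks, then links between chunks."""
    n = len(points)
    size = max(1, math.isqrt(n))
    order = sorted(range(n), key=lambda i: points[i])
    chunks = [order[s:s + size] for s in range(0, n, size)]
    edges = []
    for chunk in chunks:                       # divide: solve each chunk exactly
        edges += prim(points, chunk)
    # Conquer: connect chunks along the MST of the chunk centroids.
    cents = [tuple(sum(points[i][d] for i in c) / len(c)
                   for d in range(len(points[0]))) for c in chunks]
    for a, b, _ in prim(cents, list(range(len(cents)))):
        w, i, j = min((math.dist(points[i], points[j]), i, j)
                      for i in chunks[a] for j in chunks[b])
        edges.append((i, j, w))
    return edges

grid = [(x, y) for x in range(4) for y in range(4)]   # illustrative data only
tree = approx_mst(grid)
```

The result is always a spanning tree with N-1 edges, but its total weight can exceed that of the exact MST when the partition separates points that the exact tree would join directly.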
Fast approximate MST (FAMST)
• Performance