Clustering Algorithms
KI2 - 7
Clustering Algorithms
Johan Everts
Kunstmatige Intelligentie / RuG
What is Clustering?
Find K clusters (or a classification that consists of K clusters) so that
the objects of one cluster are similar to each other whereas objects of
different clusters are dissimilar. (Bacher 1996)
The Goals of Clustering
Determine the intrinsic grouping in a set of unlabeled
data.
What constitutes a good clustering?
All clustering algorithms will produce clusters,
regardless of whether the data actually contains them
There is no gold standard; what counts as a good
clustering depends on the goal:
data reduction
“natural clusters”
“useful” clusters
outlier detection
Stages in clustering
Taxonomy of Clustering Approaches
Hierarchical Clustering
Agglomerative clustering treats each data point as a singleton cluster, and
then successively merges clusters until all points have been merged into a
single remaining cluster. Divisive clustering works the other way around:
it starts with all points in one cluster and recursively splits clusters.
Agglomerative Clustering
Single link
In single-link hierarchical clustering, we merge in each step the two
clusters whose two closest members have the smallest distance.
Agglomerative Clustering
Complete link
In complete-link hierarchical clustering, we merge in each step the two
clusters whose merger has the smallest diameter.
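The two merge criteria can be written as distance functions between clusters. A minimal sketch (not from the slides; the Euclidean metric and the function names are illustrative):

```python
# Single-link vs complete-link distance between two clusters of points.

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def single_link(c1, c2, dist=euclidean):
    # distance between the two CLOSEST members of the clusters
    return min(dist(p, q) for p in c1 for q in c2)

def complete_link(c1, c2, dist=euclidean):
    # distance between the two FARTHEST members
    # (the diameter the merged cluster would have)
    return max(dist(p, q) for p in c1 for q in c2)

a = [(0, 0), (1, 0)]
b = [(3, 0), (5, 0)]
print(single_link(a, b))    # 2.0, from (1,0) to (3,0)
print(complete_link(a, b))  # 5.0, from (0,0) to (5,0)
```

Agglomerative clustering with either criterion only differs in which of these two functions ranks the candidate merges.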
Example – Single Link AC
      BA   FI   MI   NA   RM   TO
BA     0  662  877  255  412  996
FI   662    0  295  468  268  400
MI   877  295    0  754  564  138
NA   255  468  754    0  219  869
RM   412  268  564  219    0  669
TO   996  400  138  869  669    0
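The merge sequence shown in the next slides can be reproduced with a short single-link sketch over this matrix (the code and helper names are illustrative, not part of the original slides):

```python
# Single-link agglomerative clustering on the distance matrix above:
# repeatedly merge the two clusters whose closest members are nearest.

cities = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = {
    ("BA", "FI"): 662, ("BA", "MI"): 877, ("BA", "NA"): 255, ("BA", "RM"): 412, ("BA", "TO"): 996,
    ("FI", "MI"): 295, ("FI", "NA"): 468, ("FI", "RM"): 268, ("FI", "TO"): 400,
    ("MI", "NA"): 754, ("MI", "RM"): 564, ("MI", "TO"): 138,
    ("NA", "RM"): 219, ("NA", "TO"): 869,
    ("RM", "TO"): 669,
}

def dist(a, b):
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def single_link(c1, c2):
    return min(dist(a, b) for a in c1 for b in c2)

clusters = [frozenset([c]) for c in cities]
merges = []
while len(clusters) > 1:
    # find the pair of clusters with the smallest single-link distance
    i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
               key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
    d = single_link(clusters[i], clusters[j])
    merged = clusters[i] | clusters[j]
    merges.append((d, sorted(merged)))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for d, members in merges:
    print(d, "/".join(members))
```

Printed in order, the merges are MI/TO at 138, NA/RM at 219, BA with NA/RM at 255, FI at 268, and finally the MI/TO cluster at 295, matching the tables that follow.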
Example – Single Link AC
Merge MI and TO (smallest distance: 138):

        BA    FI  MI/TO    NA    RM
BA       0   662   877   255   412
FI     662     0   295   468   268
MI/TO  877   295     0   754   564
NA     255   468   754     0   219
RM     412   268   564   219     0
Example – Single Link AC
Merge NA and RM (distance 219):

        BA    FI  MI/TO  NA/RM
BA       0   662   877    255
FI     662     0   295    268
MI/TO  877   295     0    564
NA/RM  255   268   564      0
Example – Single Link AC
Merge BA with NA/RM (distance 255):

          BA/NA/RM    FI  MI/TO
BA/NA/RM       0     268   564
FI           268       0   295
MI/TO        564     295     0
Example – Single Link AC
Merge FI with BA/NA/RM (distance 268):

             BA/FI/NA/RM  MI/TO
BA/FI/NA/RM       0        295
MI/TO           295          0
Example – Single Link AC
Taxonomy of Clustering Approaches
Square error
K-Means
Step 0: Start with a random partition into K clusters
Step 1: Generate a new partition by assigning each
pattern to its closest cluster center
Step 2: Compute the new cluster centers as the
centroids of the clusters
Step 3: Repeat steps 1 and 2 until the membership no
longer changes (the cluster centers then also remain
the same)
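The steps above can be sketched in a few lines of plain Python (the data, K, and the initial centers here are hand-picked for illustration rather than random):

```python
# K-means: alternate assignment (Step 1) and centroid update (Step 2)
# until membership stops changing (Step 3).

def kmeans(points, centers, max_iter=100):
    for _ in range(max_iter):
        # Step 1: assign each point to its closest center
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[i].append(p)
        # Step 2: recompute each center as the centroid of its cluster
        new_centers = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        # Step 3: stop when the centers (hence the membership) are unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, centers=[(0, 0), (10, 10)])
print(centers)
```

With these two well-separated groups the loop converges after one centroid update, leaving one center near (1/3, 1/3) and one near (31/3, 31/3).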
K-Means
K-Means – How many K’s ?
Locating the ‘knee’
The knee of a curve is defined as the point of
maximum curvature.
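One way to make "maximum curvature" concrete (an assumption, not spelled out in the slides) is to normalize the error-vs-K curve to the unit square and estimate discrete curvature with finite differences:

```python
# Locate the "knee" of a decreasing error curve as the point of
# maximum discrete curvature. Both axes are normalized first so the
# result does not depend on the scale of the error values.

def knee_index(values):
    n = len(values)
    lo, hi = min(values), max(values)
    y = [(v - lo) / (hi - lo) for v in values]
    h = 1.0 / (n - 1)  # grid spacing after normalizing the K axis
    best_i, best_c = None, -1.0
    for i in range(1, n - 1):
        d1 = (y[i + 1] - y[i - 1]) / (2 * h)            # first derivative
        d2 = (y[i + 1] - 2 * y[i] + y[i - 1]) / (h * h)  # second derivative
        c = abs(d2) / (1.0 + d1 * d1) ** 1.5             # curvature
        if c > best_c:
            best_i, best_c = i, c
    return best_i

# Hypothetical SSE values for K = 1..7: steep drop until K = 3, then flat
sse = [100.0, 40.0, 12.0, 10.0, 9.0, 8.5, 8.2]
print(knee_index(sse) + 1)  # prints 3: the knee is at K = 3
```

The SSE values here are made up to show the shape; in practice they come from running K-means once per candidate K.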
Leader - Follower
Online algorithm
Specify a threshold distance
Find the closest cluster center
If the distance is above the threshold, create a new cluster
Otherwise, add the instance to that cluster
Leader - Follower
Distance < Threshold
Find the closest cluster center
If the distance is above the threshold, create a new cluster
Otherwise, add the instance to the cluster and update the
cluster center
Leader - Follower
Distance > Threshold
Find the closest cluster center
If the distance is above the threshold, create a new cluster
Otherwise, add the instance to the cluster and update the
cluster center
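The online rule above fits in a few lines. A sketch, where the threshold and the center-update step size `eta` are illustrative assumptions:

```python
# Online leader-follower clustering: each instance either joins (and
# slightly moves) the nearest center, or founds a new cluster.

def leader_follower(stream, threshold, eta=0.1):
    centers = []
    for x in stream:
        if not centers:
            centers.append(list(x))  # first instance founds the first cluster
            continue
        # find the closest cluster center
        i = min(range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))
        d = sum((a - b) ** 2 for a, b in zip(x, centers[i])) ** 0.5
        if d > threshold:
            # distance above threshold: create a new cluster
            centers.append(list(x))
        else:
            # otherwise: add the instance and nudge the center toward it
            centers[i] = [c + eta * (a - c) for c, a in zip(centers[i], x)]
    return centers

stream = [(0, 0), (0.5, 0), (10, 10), (10, 10.5), (0.2, 0.1)]
print(leader_follower(stream, threshold=3.0))  # two centers emerge
```

Note the sensitivity the slides warn about: with a threshold below 0.5 the same stream would produce many more clusters.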
Kohonen SOM’s
The Self-Organizing Map (SOM) is an unsupervised
artificial neural network algorithm. It is a compromise
between biological modeling and statistical data processing.
Kohonen SOM’s
Each weight is representative of a certain input.
Input patterns are shown to all neurons simultaneously.
Competitive learning: the neuron with the largest response is chosen.
Kohonen SOM’s
Initialize weights
Repeat until convergence
Select next input pattern
Find Best Matching Unit
Update weights of winner and neighbours
Decrease learning rate & neighbourhood size
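The loop above can be sketched for a 1-D map of units trained on 2-D inputs. The linear decay schedules and triangular neighbourhood function here are simplifying assumptions for illustration (Kohonen's formulation typically uses a Gaussian neighbourhood):

```python
import random

# Minimal SOM training loop: find the Best Matching Unit (BMU) for each
# input, then pull the BMU and its map neighbours toward the input.

def train_som(data, m=10, epochs=50, seed=0):
    rng = random.Random(seed)
    # initialize weights randomly in the input range
    w = [[rng.random(), rng.random()] for _ in range(m)]
    for t in range(epochs):
        eta = 0.5 * (1 - t / epochs)                    # decreasing learning rate
        radius = max(1.0, (m / 2) * (1 - t / epochs))   # shrinking neighbourhood
        for x in data:
            # find the Best Matching Unit
            bmu = min(range(m),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(x, w[i])))
            # update winner and neighbours, weighted by distance on the map
            for i in range(m):
                h = max(0.0, 1 - abs(i - bmu) / radius)  # neighbourhood weight
                w[i] = [wi + eta * h * (a - wi) for wi, a in zip(w[i], x)]
    return w

data = [(0.1, 0.1), (0.15, 0.1), (0.9, 0.9), (0.85, 0.95)]
weights = train_som(data)
```

After training, units at the two ends of the map settle near the two input groups, with the units in between interpolating; that ordering along the map is the self-organization the slides refer to.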
Learning rate & neighbourhood size
Kohonen SOM’s
Distance related learning
Kohonen SOM’s
Some nice illustrations
Kohonen SOM’s
Kohonen SOM Demo (from ai-junkie.com):
mapping a 3D colorspace on a 2D Kohonen map
Performance Analysis
K-Means
Depends a lot on a priori knowledge (K)
Very Stable
Leader Follower
Depends a lot on a priori knowledge (Threshold)
Faster but unstable
Performance Analysis
Self Organizing Map
Stability and Convergence Assured
Principle of self-ordering
Slow and many iterations needed for convergence
Computationally intensive
Conclusion
No Free Lunch theorem
Any elevated performance over one class of problems
is exactly paid for in performance over another class
Ensemble clustering ?
Use SOM and the basic leader-follower to identify
clusters, and then use k-means clustering to refine.
Any Questions ?
?