
2010
Scientific Computing
2015/7/21

Problem Definition
Input:
• $X = \{x_1, x_2, \ldots, x_n\}$: a data set in $d$-dimensional space
• $m$: number of clusters (we don't use $k$ here to avoid confusion with the summation index)
Output:
• Cluster centers $c_j$, $1 \le j \le m$
• Assignment of each $x_i$ to one of the $m$ clusters:
  $a_{ij} \in \{0, 1\}$, $1 \le i \le n$, $1 \le j \le m$, with $\sum_{j=1}^{m} a_{ij} = 1, \ \forall i$
Requirement:
• The output should minimize the objective function defined on the next slide.
Objective Function
Objective function (distortion):
$$e_j = \sum_{x_i \in G_j} \| x_i - c_j \|^2$$
$$J(X; C, A) = \sum_{j=1}^{m} e_j = \sum_{j=1}^{m} \sum_{x_i \in G_j} \| x_i - c_j \|^2 = \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ij} \| x_i - c_j \|^2,$$
where $X = \{x_1, x_2, \ldots, x_n\}$, $C = \{c_1, c_2, \ldots, c_m\}$, and $a_{ij} = 1$ iff $x_i \in G_j$, with $\sum_{j=1}^{m} a_{ij} = 1, \ \forall i$.
• $d \cdot m$ (for matrix $C$) plus $n \cdot m$ (for matrix $A$) tunable parameters, with certain constraints on matrix $A$
• NP-hard problem if an exact solution is required
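The objective above can be written out directly in code. A minimal NumPy sketch (the function name `distortion` and the array shapes are my assumptions, not from the slides):

```python
import numpy as np

def distortion(X, C, A):
    """J(X; C, A) = sum_j sum_i a_ij * ||x_i - c_j||^2.

    X: (n, d) data, C: (m, d) centers, A: (n, m) binary assignment matrix.
    """
    # Squared distance from every point to every center: an (n, m) matrix
    sq_dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Only the assigned center contributes for each point, since A is 0/1
    return float((A * sq_dist).sum())

# Tiny check: two points sitting exactly on their centers give J = 0
X = np.array([[0.0, 0.0], [1.0, 1.0]])
C = np.array([[0.0, 0.0], [1.0, 1.0]])
A = np.eye(2)
print(distortion(X, C, A))  # 0.0
```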
Task 1: How to Find Assignment in A?
Goal
• Find $A$ to minimize $J(X; C, A)$ with fixed $C$
Facts
• A closed-form solution exists:
$$a_{ij} = \begin{cases} 1 & \text{if } j = \arg\min_{q} \| x_i - c_q \|^2 \\ 0 & \text{otherwise} \end{cases}$$
• Each such update cannot increase the objective: $J(X; C, A_p) \le J(X; C, A_{p-1})$
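The closed-form assignment rule translates into a few lines of NumPy (a sketch; `update_assignment` is my name for it):

```python
import numpy as np

def update_assignment(X, C):
    """Task 1: with C fixed, assign each x_i to its nearest center.

    Returns the binary matrix A with a_ij = 1 iff j = argmin_q ||x_i - c_q||^2.
    """
    sq_dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    nearest = sq_dist.argmin(axis=1)                              # index of closest center
    A = np.zeros((X.shape[0], C.shape[0]))
    A[np.arange(X.shape[0]), nearest] = 1.0                       # one 1 per row
    return A
```

Each row of the returned matrix has exactly one 1, which is the constraint $\sum_j a_{ij} = 1$ from the problem definition.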
Task 2: How to Find Centers in C?
Goal
• Find $C$ to minimize $J(X; C, A)$ with fixed $A$
Facts
• A closed-form solution exists (each center is the mean of its cluster):
$$c_j = \frac{\sum_{i=1}^{n} a_{ij} x_i}{\sum_{i=1}^{n} a_{ij}}$$
• Each such update cannot increase the objective: $J(X; C_p, A) \le J(X; C_{p-1}, A)$
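The center formula is a weighted mean, which NumPy expresses as one matrix product (a sketch; the name `update_centers` and the empty-cluster guard are my additions):

```python
import numpy as np

def update_centers(X, A):
    """Task 2: with A fixed, each center becomes the mean of its assigned points.

    c_j = (sum_i a_ij x_i) / (sum_i a_ij).
    """
    counts = A.sum(axis=0)                    # points per cluster, shape (m,)
    sums = A.T @ X                            # per-cluster coordinate sums, (m, d)
    # Guard against division by zero for empty clusters (an assumption of mine;
    # the slides do not specify how empty clusters are handled)
    counts = np.where(counts == 0, 1, counts)
    return sums / counts[:, None]
```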
Algorithm
1. Initialize
   - Select m initial cluster centers.
2. Find clusters
   - Assign each x_i to the cluster with the nearest center.
   - ⇒ Finds A to minimize J(X; C, A) with fixed C.
3. Find centers
   - Recompute each cluster center as the mean of the data in the cluster.
   - ⇒ Finds C to minimize J(X; C, A) with fixed A.
4. Stopping criterion
   - Stop if the clusters stay the same; otherwise go to step 2.
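The four steps above combine into a compact loop. A hedged Python sketch (the name `k_means`, the choice of random data points as initial centers, and the float-valued input are my assumptions; this is not the slides' MATLAB toolbox code):

```python
import numpy as np

def k_means(X, m, max_iter=100, seed=0):
    """Steps 1-4: initialize, alternate Tasks 1 and 2, stop when clusters are stable."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=m, replace=False)]   # step 1: m random data points
    prev = None
    for _ in range(max_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                      # step 2: nearest-center clusters
        if prev is not None and np.array_equal(labels, prev):
            break                                      # step 4: clusters unchanged
        for j in range(m):                             # step 3: recompute means
            if (labels == j).any():
                C[j] = X[labels == j].mean(axis=0)
        prev = labels
    return C, labels
```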
Stopping Criteria
Two stopping criteria:
• Repeat until there is no more change in the cluster assignment.
• Repeat until the distortion improvement is less than a threshold.
Facts
• Convergence is assured, since each update reduces (or preserves) J:
$$J(X; C_1, \cdot) \ge J(X; C_1, A_1) \ge J(X; C_2, A_1) \ge J(X; C_2, A_2) \ge J(X; C_3, A_2) \ge J(X; C_3, A_3) \ge \cdots$$
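The inequality chain can be checked numerically by recording J after every half-step (assignment update and center update). A sketch under my own naming (`k_means_trace`); the monotone behavior, not the code, is what the slides assert:

```python
import numpy as np

def k_means_trace(X, m, max_iter=20, seed=1):
    """Return J after every assignment update and every center update;
    the resulting sequence should be monotonically non-increasing."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    trace = []
    for _ in range(max_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        trace.append(d[np.arange(len(X)), labels].sum())   # J after updating A
        for j in range(m):
            if (labels == j).any():
                C[j] = X[labels == j].mean(axis=0)
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        trace.append(d[np.arange(len(X)), labels].sum())   # J after updating C
    return trace

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
t = k_means_trace(X, 3)
assert all(a >= b - 1e-9 for a, b in zip(t, t[1:]))  # J never increases
```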
Properties of K-means Clustering
• K-means finds an approximate solution efficiently.
• The distortion (squared error) is a monotonically non-increasing function of the iteration count.
• The goal is to minimize the squared error, but the algorithm can end up in a local minimum.
• To increase the probability of finding the global minimum, run k-means from several different initial conditions.
• "Cluster validation" refers to a set of methods that try to determine the best value of k.
• Other distance measures can be used in place of the Euclidean distance, with a corresponding change in how the centers are computed.
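The last bullet can be made concrete: with the L1 (Manhattan) distance, the point minimizing the within-cluster distance sum is the coordinate-wise median rather than the mean (this variant is usually called k-medians; the helper name `l1_center` below is hypothetical):

```python
import numpy as np

def l1_center(points):
    """For the L1 distance, sum_i |x_i - c| is minimized coordinate-wise
    by the median, so the center step uses np.median instead of the mean."""
    return np.median(points, axis=0)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
print(l1_center(pts))  # [1., 0.] -- robust to the outlier at x = 10
```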
K-means Snapshots
[Figure: snapshots of the k-means iterations]
Demo of K-means Clustering
• Toolbox download
  • Utility Toolbox
  • Machine Learning Toolbox
• Demos
  • kMeansClustering.m
  • vecQuantize.m
Demos of K-means Clustering
kMeansClustering.m
[Figure: scatter plots of the sample data in the Input 1 vs. Input 2 plane, both axes from -0.8 to 0.8]
Application: Image Compression
Goal
• Convert an image from true colors to indexed colors with minimum distortion.
Steps
• Collect data from a true-color image.
• Perform k-means clustering to obtain the cluster centers as the indexed colors.
• Compute the compression rate:
$$\text{before} = m \cdot n \cdot 3 \cdot 8$$
$$\text{after} = m \cdot n \cdot \log_2 c + 8 \cdot 3 \cdot c$$
$$\frac{\text{before}}{\text{after}} = \frac{m \cdot n \cdot 3 \cdot 8}{m \cdot n \cdot \log_2 c + 8 \cdot 3 \cdot c} = \frac{24}{\log_2 c + \dfrac{24 c}{m n}} \approx \frac{24}{\log_2 c}$$
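Plugging numbers into the rate formula shows how close the ratio is to the $24 / \log_2 c$ approximation; a small arithmetic check (the function name is mine):

```python
import math

def compression_ratio(m, n, c):
    """before = m*n*3*8 bits (true color);
    after = m*n*log2(c) bits for the index map plus 8*3*c bits for the palette."""
    before = m * n * 24
    after = m * n * math.log2(c) + 24 * c
    return before / after

# For a 480x640 image with c = 64 indexed colors,
# the ratio is close to 24 / log2(64) = 4.
print(round(compression_ratio(480, 640, 64), 3))  # 3.997
```

The palette term $24c$ is negligible for large images, which is why the approximation in the last step of the derivation holds.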
Example: Image Compression
Original image
Dimension: 480x640
Data size: 480*640*3*8 bits = 0.92 MB
Example: Image Compression
[Image slides: compressed results]
Code

X = imread('annie19980405.jpg');       % load the true-color image
image(X)
[m, n, p] = size(X);                   % p = 3 color planes
% Linear indices that gather each pixel's RGB triple into one column,
% giving a p-by-(m*n) data matrix with one column per pixel
index = (1:m*n:m*n*p)'*ones(1, m*n) + ones(p, 1)*(0:m*n-1);
data = double(X(index));
maxI = 6;
for i = 1:maxI
    centerNum = 2^i;                   % try 2, 4, ..., 64 indexed colors
    fprintf('i=%d/%d: no. of centers=%d\n', i, maxI, centerNum);
    center = kMeansClustering(data, centerNum);
    distMat = distPairwise(center, data);
    [minValue, minIndex] = min(distMat);   % nearest center for each pixel
    X2 = reshape(minIndex, m, n);          % index map
    map = center'/255;                     % colormap scaled to [0, 1]
    figure; image(X2); colormap(map); colorbar; axis image;
end
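For readers without MATLAB, here is a rough Python/NumPy equivalent of the script above. It inlines a basic k-means loop instead of calling the toolbox's `kMeansClustering`/`distPairwise`, so it is a sketch of the same idea, not a translation of the toolbox code:

```python
import numpy as np

def to_indexed(img, c, max_iter=10, seed=0):
    """Quantize an (h, w, 3) uint8 image to c indexed colors via k-means.

    Returns an (h, w) index map (like X2 in the MATLAB script) and a
    c-by-3 palette scaled to [0, 1] (like map = center'/255).
    """
    h, w, _ = img.shape
    data = img.reshape(-1, 3).astype(float)            # one row per pixel
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=c, replace=False)]
    for _ in range(max_iter):
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                      # nearest center per pixel
        for j in range(c):
            if (labels == j).any():
                centers[j] = data[labels == j].mean(axis=0)
    index_map = labels.reshape(h, w)
    palette = centers / 255.0
    return index_map, palette
```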