
EM Algorithm:
Expectation Maximization
Clustering Algorithm
Book: "Data Mining" (Morgan Kaufmann, Witten & Frank), pp. 218-227
Mining Lab. 김완섭
October 27, 2004
Contents

Clustering
K-Means vs. EM
Mixture Model
EM Algorithm
Simple examples of EM
EM Application: WEKA
References
Clustering (1/2)

Clustering?
  Clustering algorithms divide a data set into natural groups (clusters).
  Instances in the same cluster are similar to each other; they share certain properties.
  e.g., customer segmentation.

Clustering vs. Classification
  Classification: supervised learning.
  Clustering: unsupervised learning, with no target variable to be predicted.
Clustering (2/2)

Categorization of Clustering Methods

Partitioning methods
  K-Means / K-Medoids / PAM / CLARA / CLARANS
Hierarchical methods
  CURE / CHAMELEON / BIRCH
Density-based methods
  DBSCAN / OPTICS
Grid-based methods
  STING / CLIQUE / Wave-Cluster
Model-based methods (model-based, probability-based, statistical clustering)
  EM / COBWEB / Bayesian / Neural
K-Means (1)
Algorithm

Step 0:
  Select k objects as initial centroids.
Step 1 (Assignment):
  For each object, compute the distances to the k centroids.
  Assign each object to the cluster whose centroid is the closest.
Step 2 (New Centroids):
  Compute a new centroid for each cluster.
Step 3 (Convergence):
  Stop if the change in the centroids is less than the selected convergence criterion.
  Otherwise repeat from Step 1.
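To make the four steps concrete, here is a minimal Python sketch of the K-Means loop described above (an illustrative implementation, not Weka's; the function name and parameters are our own):

import math
import random

def kmeans(points, k, tol=1e-4, seed=0):
    """Minimal K-Means: points is a list of equal-length numeric tuples."""
    random.seed(seed)
    # Step 0: select k objects as the initial centroids.
    centroids = random.sample(points, k)
    while True:
        # Step 1 (Assignment): assign each object to the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Step 2 (New Centroids): the mean of each cluster becomes its new centroid.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the old centroid for an empty cluster
        # Step 3 (Convergence): stop when no centroid moved more than the tolerance.
        if max(math.dist(c, n) for c, n in zip(centroids, new_centroids)) < tol:
            return new_centroids, clusters
        centroids = new_centroids

Calling kmeans([(4, 4), (3, 4), (4, 2), (0, 2), (1, 1), (1, 0)], k=2), for instance, should converge to the two groups used in the worked calculation two slides below.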
K-Means (2)
Simple example

[Figure: K-Means iterations on a small 2-D data set - input data, random centroids, assignment, new centroids, repeated assignment until the centroids stop changing.]
K-Means (3)
Weakness on outliers (noise)
K-Means (4)
Calculation

Without the outlier:
0. Data: (4,4), (3,4) in cluster 1; (4,2), (0,2), (1,1), (1,0) in cluster 2.
1. 1) Centroids: <3.5, 4> and <1.5, 1.25>
   2) Assignment: <3.5, 4> gets (3,4), (4,4), (4,2); <1.5, 1.25> gets (0,2), (1,1), (1,0)
2. 1) New centroids: <3.67, 3.33> and <0.67, 1>
   2) Assignment: <3.67, 3.33> gets (3,4), (4,4), (4,2); <0.67, 1> gets (0,2), (1,1), (1,0) - unchanged, so the algorithm stops.

With the outlier (100, 0) added:
0. Data: (4,4), (3,4) in cluster 1; (4,2), (0,2), (1,1), (1,0), (100,0) in cluster 2.
1. 1) Centroids: <3.5, 4> and <21, 1>
   2) Assignment: <3.5, 4> gets (0,2), (1,1), (1,0), (3,4), (4,4), (4,2); <21, 1> gets only (100,0)
2. 1) New centroids: <2.17, 2.17> and <100, 0>
   2) Assignment: <2.17, 2.17> gets all six ordinary points; <100, 0> keeps only (100,0) - the outlier captures a centroid by itself and drags the other centroid toward the middle of everything else.
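The centroid arithmetic above can be re-checked with a few lines of Python (cluster assignments taken from the slide; the rounding differs slightly from the slide's figures):

cluster1 = [(3, 4), (4, 4), (4, 2)]
cluster2 = [(0, 2), (1, 1), (1, 0)]

def centroid(points):
    # coordinate-wise mean of the points in a cluster
    return tuple(round(sum(coord) / len(points), 2) for coord in zip(*points))

print(centroid(cluster1))  # (3.67, 3.33) - the slide writes <3.67, 3.3>
print(centroid(cluster2))  # (0.67, 1.0)

# With the outlier (100, 0) holding a centroid by itself, the other centroid
# is pulled to the middle of all six remaining points:
print(centroid(cluster1 + cluster2))  # (2.17, 2.17) - the slide writes <2.1, 2.1>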
K-Means (5)
Comparison with EM

K-Means
  Hard clustering: an instance belongs to only one cluster (instance I is assigned to either C1 or C2).
  Based on Euclidean distance.
  Not robust to outliers or to differing value ranges.

EM
  Soft clustering: an instance belongs to several clusters with membership probabilities (e.g., instance I belongs to C1 with probability 0.7 and to C2 with probability 0.3).
  Based on probability density.
  Can handle both numeric and nominal attributes.
Mixture Model (1)

A mixture is a set of k probability distributions, representing k clusters.
Each probability distribution has a mean and a variance.
The mixture model combines several normal distributions.
Mixture Model (2)

With only one numeric attribute and two clusters A and B, the mixture has five parameters:

  \mu_A, \mu_B, \sigma_A, \sigma_B, p_A
Mixture Model (3)
Simple Example

Probability that an instance x belongs to cluster A:

  \Pr[A \mid x] = \frac{\Pr[x \mid A]\,\Pr[A]}{\Pr[x]} = \frac{f(x; \mu_A, \sigma_A)\, p_A}{\Pr[x]}

where f is the normal probability density function:

  f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
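As a small illustration of this formula, the following Python sketch computes Pr[A | x] for a two-component mixture (function and parameter names are our own):

import math

def normal_pdf(x, mu, sigma):
    # f(x; mu, sigma) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def prob_cluster_a(x, mu_a, sigma_a, mu_b, sigma_b, p_a):
    # Pr[A | x] = f(x; mu_A, sigma_A) * p_A / Pr[x]
    num = normal_pdf(x, mu_a, sigma_a) * p_a
    evidence = num + normal_pdf(x, mu_b, sigma_b) * (1 - p_a)  # Pr[x] under the two-component mixture
    return num / evidence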
Mixture Model (4)
Probability Density Function

Normal distribution (Gaussian density function)
Poisson distribution

For the normal distribution, the density is

  p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 \right)
       = f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
Mixture Model (5)
Probability Density Function

At every iteration, the k mixture weights satisfy

  \sum_{h=1}^{k} w_h = 1, \qquad w_h \ge 0
EM Algorithm (1)

Step 1 (Initialization):
  Assign random probabilities (cluster weights) to each record.
Step 2 (Maximization Step): parameter adjustment
  Re-create the cluster model: re-compute the parameters Θ (mean, variance) of each normal distribution from the weighted records.
Step 3 (Expectation Step): weight adjustment
  Update each record's weights (cluster membership probabilities).
Step 4:
  Calculate the log-likelihood.
  If the value saturates, exit; if not, go to Step 2.
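A compact sketch of these four steps for a two-component, one-attribute Gaussian mixture might look as follows (illustrative Python, not the Weka implementation; the names and the tolerance are our own choices):

import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def em_two_gaussians(data, max_iter=100, eps=1e-6, seed=0):
    random.seed(seed)
    # Step 1 (Initialization): random membership weights for cluster A (cluster B gets 1 - w).
    w = [random.random() for _ in data]
    prev_ll = None
    for _ in range(max_iter):
        # Step 2 (Maximization): re-estimate p_A and both means/deviations from the weighted records.
        sum_a = sum(w)
        sum_b = len(data) - sum_a
        p_a = sum_a / len(data)
        mu_a = sum(wi * x for wi, x in zip(w, data)) / sum_a
        mu_b = sum((1 - wi) * x for wi, x in zip(w, data)) / sum_b
        sd_a = math.sqrt(sum(wi * (x - mu_a) ** 2 for wi, x in zip(w, data)) / sum_a) or 1e-3
        sd_b = math.sqrt(sum((1 - wi) * (x - mu_b) ** 2 for wi, x in zip(w, data)) / sum_b) or 1e-3
        # Step 3 (Expectation): update each record's weight Pr[A | x].
        dens_a = [p_a * normal_pdf(x, mu_a, sd_a) for x in data]
        dens_b = [(1 - p_a) * normal_pdf(x, mu_b, sd_b) for x in data]
        w = [a / (a + b) for a, b in zip(dens_a, dens_b)]
        # Step 4: compute the log-likelihood and stop once it saturates.
        ll = sum(math.log(a + b) for a, b in zip(dens_a, dens_b))
        if prev_ll is not None and abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
    return (mu_a, sd_a), (mu_b, sd_b), p_a, w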
EM Algorithm (2)
Initialization

Random probabilities (the M-Step below uses these as the initial weights) - example:

Num  Math  English  Cluster1   Cluster2
1    80    90       0.25       0.75
2    50    75       0.8        0.2
3    85    100      0.43       0.57
4    30    70       0.7        0.3
5    95    85       0.15       0.85
6    60    80       0.6        0.40
                    Sum: 2.93  Sum: 3.07
EM Algorithm (3)
M-Step: Parameters (Mean, Dev)

Estimating parameters from weighted instances.
Parameters: \mu_A, \mu_B, \sigma_A, \sigma_B (means, deviations).

  \mu_A = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}

  \sigma_A^2 = \frac{w_1 (x_1 - \mu)^2 + w_2 (x_2 - \mu)^2 + \dots + w_n (x_n - \mu)^2}{w_1 + w_2 + \dots + w_n}
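In code, these two estimates are just weighted averages; a small sketch (the helper names are our own):

def weighted_mean(xs, ws):
    # mu_A = (w1*x1 + ... + wn*xn) / (w1 + ... + wn)
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

def weighted_variance(xs, ws):
    # sigma_A^2 = (w1*(x1 - mu)^2 + ... + wn*(xn - mu)^2) / (w1 + ... + wn)
    mu = weighted_mean(xs, ws)
    return sum(w * (x - mu) ** 2 for w, x in zip(ws, xs)) / sum(ws)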
EM Algorithm (3)
M-Step: Parameters (Mean, Dev) - example (\mu_A, \mu_B, \sigma_A, \sigma_B)

Num  Math  English  Cluster-A  Cluster-B
1    80    90       0.25       0.75
2    50    75       0.8        0.2
3    85    100      0.43       0.57
4    30    70       0.7        0.3
5    95    85       0.15       0.85
6    60    80       0.6        0.40
                    Sum: 2.93  Sum: 3.07

  \mu_{A,\mathrm{math}} = \frac{0.25 \cdot 80 + 0.8 \cdot 50 + \dots + 0.6 \cdot 60}{0.25 + 0.8 + 0.43 + 0.7 + 0.15 + 0.6} \approx 49

  \sigma_{A,\mathrm{math}}^2 = \frac{0.25(80-49)^2 + 0.8(50-49)^2 + \dots + 0.6(60-49)^2}{0.25 + 0.8 + 0.43 + 0.7 + 0.15 + 0.6} \approx 376
EM Algorithm (4)
E-Step: Weight

Compute the weight of instance x for cluster h:

  w_h(x) = \frac{w_h\, f_h(x \mid \theta_h)}{\sum_i w_i\, f_i(x \mid \theta_i)}

where, treating the d attributes as independent,

  f_h(x \mid \theta_h) = \prod_{j=1}^{d} f_{h,j}(x_j \mid \theta_h)

and each per-attribute density is the normal density

  p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 \right)
EM Algorithm (5)
E-Step: Weight - example

Compute the weight of student 1 (Math 80, English 90) for cluster 1:

  w_{c1}(\text{student 1}) = \frac{0.25\, f_{c1}(\text{student 1} \mid \theta_{c1})}{0.25\, f_{c1}(\text{student 1} \mid \theta_{c1}) + 0.75\, f_{c2}(\text{student 1} \mid \theta_{c2})}
                           = \frac{0.25 \cdot 0.0016}{0.25 \cdot 0.0016 + 0.75 \cdot 0.00863} \approx 0.???  (computed in the snippet below)

where

  f_{c1}(\text{student 1} \mid \theta_{c1}) = f_{c1,\mathrm{math}}(\text{student 1} \mid \theta_{c1,\mathrm{math}}) \cdot f_{c1,\mathrm{eng}}(\text{student 1} \mid \theta_{c1,\mathrm{eng}}) \approx 0.0016

  f_{c1,\mathrm{math}}(x) = \frac{1}{19\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - 49}{19}\right)^2} \approx 0.025 \ \text{(at } x = 80\text{)}

  f_{c1,\mathrm{eng}}(x) = \frac{1}{11\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - 77}{11}\right)^2} \approx 0.064 \ \text{(at } x = 90\text{)}
EM Algorithm (6)
Objective Function (check)

Log-likelihood function: for all instances, sum the log of the probability of the instance under the mixture (the log is used to make the analysis easier).

1-dimensional data, 2 clusters A and B:

  \sum_i \log\left( p_A \Pr[x_i \mid A] + p_B \Pr[x_i \mid B] \right)

N-dimensional data, k clusters:

  L(\Theta) = \sum_{x \in D} \log\left( \sum_{h=1}^{k} w_h\, f_h(x \mid \mu_h, \Sigma_h) \right)

  f_h(x \mid \mu_h, \Sigma_h) = \frac{1}{\sqrt{(2\pi)^d\, |\Sigma_h|}} \exp\left( -\frac{1}{2}\, (x - \mu_h)^T \Sigma_h^{-1} (x - \mu_h) \right)

  \mu_h: mean vector,  \Sigma_h: covariance matrix
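For completeness, a hedged NumPy sketch of this multivariate density and of L(Θ) (the function names are ours; a real implementation would also guard against singular covariance matrices):

import numpy as np

def gaussian_density(x, mean, cov):
    # f_h(x | mu_h, Sigma_h) for a d-dimensional Gaussian
    d = len(mean)
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def log_likelihood(data, weights, means, covs):
    # L(Theta) = sum over x of log( sum over h of w_h * f_h(x | mu_h, Sigma_h) )
    return sum(
        np.log(sum(w * gaussian_density(x, m, c) for w, m, c in zip(weights, means, covs)))
        for x in data
    )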
EM Algorithm (7)
Objective Function (check)

For cluster c1:

  \Sigma_{c1} = \begin{pmatrix} 19 & 0 \\ 0 & 11 \end{pmatrix} \ \text{(covariance matrix)}, \qquad \mu_{c1} = (49, 77) \ \text{(mean vector)}

Single attribute:

  f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

Multiple attributes:

  f_h(x \mid \mu_h, \Sigma_h) = \frac{1}{\sqrt{(2\pi)^d\, |\Sigma_h|}} \exp\left( -\frac{1}{2}\, (x - \mu_h)^T \Sigma_h^{-1} (x - \mu_h) \right)
EM Algorithm (8)
Termination

The procedure stops when the log-likelihood saturates:

  | L(\Theta^{(j)}) - L(\Theta^{(j-1)}) | < \epsilon

[Figure: log-likelihood values Q0 ... Q4 plotted against the number of iterations, flattening out as the procedure converges.]
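Expressed as code, the stopping test is simply a comparison of successive log-likelihood values (the epsilon below is an arbitrary choice, not a value from the slides):

def has_converged(ll_new, ll_old, eps=1e-6):
    # stop when | L(theta_j) - L(theta_{j-1}) | < epsilon
    return ll_old is not None and abs(ll_new - ll_old) < eps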
EM Algorithm (1)
Simple Data

EM example
  6 data points (3 samples per class)
  2 classes (circle, rectangle)
EM Algorithm (2)

Likelihood function of the two component means \Theta_1, \Theta_2:

  \sum_i \log\left( p_A \Pr[x_i \mid A] + p_B \Pr[x_i \mid B] \right)
EM Algorithm (3)
EM Example (1)

Example dataset: 2 columns (Math, English), 6 records.

Num  Math  English
1    80    90
2    60    75
3    80    90
4    30    60
5    75    100
6    15    80
EM Example (2)

Distribution of Math
  mean: 56.67
  variance: 776.73
Distribution of English
  mean: 82.5
  variance: 197.50

[Figure: histograms of the Math and English scores over the 0-100 range.]
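These column statistics can be reproduced from the six records of EM Example (1) with Python's statistics module (sample variance; the slide's Math variance differs slightly in the last digits):

import statistics

math_scores = [80, 60, 80, 30, 75, 15]
english_scores = [90, 75, 90, 60, 100, 80]

print(round(statistics.mean(math_scores), 2), round(statistics.variance(math_scores), 2))
# 56.67 776.67  (slide: 56.67, 776.73)
print(round(statistics.mean(english_scores), 2), round(statistics.variance(english_scores), 2))
# 82.5 197.5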
EM Example (3)

Random cluster weights:

Num  Math  English  Cluster1   Cluster2
1    80    90       0.25       0.75
2    50    75       0.8        0.2
3    85    100      0.43       0.57
4    30    70       0.7        0.3
5    95    85       0.15       0.85
6    60    80       0.6        0.40
                    Sum: 2.93  Sum: 3.07
EM Example (4)

Iteration 1
  Maximization Step (parameter adjustment)
EM Example (4)

  L(\Theta) = \sum_{x \in D} \log\left( \sum_{h=1}^{k} w_h\, f_h(x \mid \mu_h, \Sigma_h) \right)

  f_h(x \mid \mu_h, \Sigma_h) = \frac{1}{\sqrt{(2\pi)^d\, |\Sigma_h|}} \exp\left( -\frac{1}{2}\, (x - \mu_h)^T \Sigma_h^{-1} (x - \mu_h) \right)

Example calculation for student 1 with respect to cluster 1:

  f_h(x \mid \mu_h, \Sigma_h)
  = \frac{1}{(2\pi)\sqrt{\left|\begin{pmatrix} 19 & 0 \\ 0 & 11 \end{pmatrix}\right|}} \exp\left( -\frac{1}{2} \left( \begin{pmatrix} 80 \\ 90 \end{pmatrix} - \begin{pmatrix} 49 \\ 77 \end{pmatrix} \right)^T \begin{pmatrix} 19 & 0 \\ 0 & 11 \end{pmatrix}^{-1} \left( \begin{pmatrix} 80 \\ 90 \end{pmatrix} - \begin{pmatrix} 49 \\ 77 \end{pmatrix} \right) \right)

  = \frac{1}{(2\pi)\sqrt{19 \cdot 11}} \exp\left( -\frac{1}{2} \begin{pmatrix} 31 & 13 \end{pmatrix} \begin{pmatrix} 1/19 & 0 \\ 0 & 1/11 \end{pmatrix} \begin{pmatrix} 31 \\ 13 \end{pmatrix} \right)

  = \frac{1}{(2\pi)\sqrt{19 \cdot 11}} \exp\left( -\frac{1}{2} \cdot \frac{31^2 \cdot 11 + 13^2 \cdot 19}{19 \cdot 11} \right)
EM Example (5)

Iteration 2
  Expectation Step (weight adjustment)
  Maximization Step (parameter adjustment)
EM Example (6)

Iteration 3
  Expectation Step (weight adjustment)
  Maximization Step (parameter adjustment)
EM Application (1)
Weka

Weka
  Waikato University, New Zealand
  Open-source mining tool
  http://www.cs.waikato.ac.nz/ml/weka

Experiment data
  Iris data
  Real data: department customer data, modified customer data
EM Application (2)
IRIS Data

Data info - attribute information:
  sepal length / sepal width / petal length / petal width (all in cm)
  class: Iris Setosa / Iris Versicolour / Iris Virginica
EM Application (3)
IRIS Data
EM Application (4)
Weka Usage

Weka clustering packages: weka.clusterers

Command-line execution:
  java weka.clusterers.EM -t iris.arff -N 2
  java weka.clusterers.EM -t iris.arff -N 2 -V

GUI execution:
  java -jar weka.jar
EM Application (4)
Weka Usage

Options for clustering in Weka:

  -t <training file>       Specify training file
  -T <test file>           Specify test file
  -x <number of folds>     Specify number of folds for cross-validation
  -s <random number seed>  Specify random number seed
  -l <input file>          Specify input file for model
  -d <output file>         Specify output file for model
  -p                       Only output predictions for test instances
EM Application (5)
Weka usage
EM Application (5)
Weka usage - input file format

% Summary Statistics:
%                 Min  Max  Mean  SD    Class Correlation
%  sepal length:  4.3  7.9  5.84  0.83   0.7826
%  sepal width:   2.0  4.4  3.05  0.43  -0.4194
%  petal length:  1.0  6.9  3.76  1.76   0.9490 (high!)
%  petal width:   0.1  2.5  1.20  0.76   0.9565 (high!)

@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth  REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth  REAL
@ATTRIBUTE class       {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
EM Application (6)
Weka usage - output format

Number of clusters: 3

Cluster: 0  Prior probability: 0.3333
Attribute: sepallength
  Normal Distribution. Mean = 5.006  StdDev = 0.3489
Attribute: sepalwidth
  Normal Distribution. Mean = 3.418  StdDev = 0.3772
Attribute: petallength
  Normal Distribution. Mean = 1.464  StdDev = 0.1718
Attribute: petalwidth
  Normal Distribution. Mean = 0.244  StdDev = 0.1061
Attribute: class
  Discrete Estimator. Counts = 51 1 1 (Total = 53)

Clustered instances:
  0   50 ( 33%)
  1   48 ( 32%)
  2   52 ( 35%)

Log likelihood: -2.21138
EM Application (6)
Result Visualization
References

Ian H. Witten and Eibe Frank. Data Mining. Morgan Kaufmann. pp. 218-255.
Jiawei Han. Data Mining: Concepts and Techniques. Chapter 8.
Frank Dellaert. The Expectation Maximization Algorithm. February 2002.
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models.