EM Algorithm:
Expectation Maximization Clustering Algorithm
Book: "Data Mining", Witten & Frank, Morgan Kaufmann, pp. 218-227
Mining Lab. 김완섭
October 27, 2004
Contents
Clustering
K-Means vs. EM
Mixture Model
EM Algorithm
Simple examples of EM
EM Application: Weka
References
Clustering (1/2)
What is clustering?
Clustering algorithms divide a data set into natural groups (clusters).
Instances in the same cluster are similar to each other; they share certain properties.
e.g. customer segmentation.
Clustering vs. Classification
Classification is supervised learning; clustering is unsupervised learning.
In clustering there is no target variable to be predicted.
Clustering (2/2)
Categorization of Clustering Methods
Partitioning methods: K-Means / K-Medoids / PAM / CLARA / CLARANS
Hierarchical methods: CURE / CHAMELEON / BIRCH
Density-based methods: DBSCAN / OPTICS
Grid-based methods: STING / CLIQUE / WaveCluster
Model-based methods (probability-based, statistical clustering): EM / COBWEB / Bayesian / Neural
K-Means (1)
Algorithm
Step 0 : (Initialization)
Select k objects as initial centroids.
Step 1 : (Assignment)
For each object compute distances to the k centroids.
Assign each object to the cluster whose centroid is closest.
Step 2 : (New Centroids)
Compute a new centroid for each cluster.
Step 3 : (Convergence)
Stop if the change in the centroids is less than the selected convergence criterion.
Otherwise repeat from Step 1.
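As a concrete sketch of these steps (not code from the slides; the function name and NumPy usage are illustrative assumptions):

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Minimal k-means sketch: X is an (n, d) array of points."""
    rng = np.random.default_rng(seed)
    # Step 0: select k objects as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 1 (assignment): distance of every object to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2 (new centroids): mean of the objects assigned to each cluster
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # guard against an empty cluster
                new_centroids[j] = members.mean(axis=0)
        # Step 3 (convergence): stop when the centroids barely move
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels
```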
K-Means (2)
Simple example (figure): input data and random initial centroids, then repeated assignment and new-centroid steps until the assignments no longer change.
K-Means (3)
Weakness: sensitive to outliers (noise).
K-Means (4)
Calculation
Without the outlier:
Data: (4,4), (3,4), (4,2), (0,2), (1,1), (1,0)
0. Initial partition: {(4,4), (3,4)} and {(4,2), (0,2), (1,1), (1,0)}.
1. 1) Centroids: <3.5, 4> and <1.5, 1.25>.
   2) Assignment: <3.5, 4> - (3,4), (4,4), (4,2); <1.5, 1.25> - (0,2), (1,1), (1,0).
2. 1) New centroids: <3.67, 3.33> and <0.67, 1>.
   2) Assignment unchanged, so the algorithm stops.
With the outlier (100, 0) added:
0. Initial partition: {(4,4), (3,4)} and {(4,2), (0,2), (1,1), (1,0), (100,0)}.
1. 1) Centroids: <3.5, 4> and <21.2, 1>.
   2) Assignment: <3.5, 4> - (0,2), (1,1), (1,0), (3,4), (4,4), (4,2); <21.2, 1> - (100,0).
2. 1) New centroids: <2.17, 2.17> and <100, 0>.
   2) Assignment unchanged; the single outlier keeps its own cluster and forces all six normal points into the other.
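The centroid arithmetic in the first example is easy to verify mechanically (a small NumPy check over the final partition):

```python
import numpy as np

cluster1 = np.array([(3, 4), (4, 4), (4, 2)], dtype=float)
cluster2 = np.array([(0, 2), (1, 1), (1, 0)], dtype=float)
print(cluster1.mean(axis=0))  # [3.67 3.33] (rounded)
print(cluster2.mean(axis=0))  # [0.67 1.  ] (rounded)
```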
K-Means (5)
Comparison with EM
K-Means
Hard clustering: an instance belongs to exactly one cluster.
Based on Euclidean distance.
Not robust to outliers or to differing value ranges.
EM
Soft clustering: an instance belongs to several clusters with membership probabilities (e.g. 0.7 for C1 and 0.3 for C2).
Based on probability density.
Can handle both numeric and nominal attributes.
Mixture Model (1)
A mixture is a set of k probability distributions, representing k clusters.
Each distribution has a mean and variance.
The mixture model combines several normal distributions.
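To make "combines several normal distributions" concrete, the following sketch draws samples from a two-component Gaussian mixture (the weight, means, and deviations are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_A = 1000, 0.4                         # p_A: mixing weight of cluster A (assumed)
# For each sample, first pick a component, then draw from that component's normal.
from_A = rng.random(n) < p_A
samples = np.where(from_A,
                   rng.normal(50, 10, n),  # cluster A: mu=50, sigma=10 (assumed)
                   rng.normal(80, 5, n))   # cluster B: mu=80, sigma=5 (assumed)
```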
Mixture Model (2)
With only one numeric attribute and two clusters A and B, the mixture has five parameters: $\mu_A, \sigma_A, \mu_B, \sigma_B$, and $p_A$.
Mixture Model (3)
Simple Example
Probability that an instance x belongs to cluster A:
$$\Pr[A \mid x] = \frac{\Pr[x \mid A]\,\Pr[A]}{\Pr[x]} = \frac{f(x; \mu_A, \sigma_A)\, p_A}{\Pr[x]}$$
where f is the normal probability density function:
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
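A direct one-attribute implementation of this posterior might look as follows (a sketch; the parameter values are hypothetical, not taken from the slides):

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density f(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical two-cluster model
mu_A, sigma_A, p_A = 50.0, 10.0, 0.6
mu_B, sigma_B, p_B = 80.0, 5.0, 0.4

x = 65.0
num_A = normal_pdf(x, mu_A, sigma_A) * p_A   # Pr[x|A] Pr[A]
num_B = normal_pdf(x, mu_B, sigma_B) * p_B   # Pr[x|B] Pr[B]
pr_A_given_x = num_A / (num_A + num_B)       # Pr[A|x], since Pr[x] = num_A + num_B
```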
Mixture Model (4)
Probability Density Functions
The normal (Gaussian) density function:
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Other distributions (e.g. the Poisson distribution) can also be used as component densities.
Mixture Model (5)
The mixture weights of the k component densities must satisfy
$$\sum_{h=1}^{k} w_h = 1, \qquad w_h \geq 0,$$
and are re-estimated at each iteration.
EM Algorithm (1)
Step 1. (Initialization)
Assign a random membership probability to each record.
Step 2. (Maximization Step) Parameter Adjustment
Re-create the cluster model: re-compute the parameters Θ (mean, variance) of each normal distribution.
Step 3. (Expectation Step) Weight Adjustment
Update each record's weights.
Step 4. (Convergence check)
Calculate the log-likelihood.
If the value saturates, exit; if not, go to Step 2.
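Putting the four steps together for the one-attribute, two-cluster case gives a loop like the following (a minimal sketch under those assumptions, not the slides' exact procedure):

```python
import numpy as np

def em_two_clusters(x, max_iter=100, eps=1e-6, seed=0):
    """EM for a 1-D mixture of two normals. x is a 1-D array."""
    rng = np.random.default_rng(seed)
    w = rng.random(len(x))           # Step 1: random membership weights for cluster A
    prev_ll = -np.inf
    for _ in range(max_iter):
        # Step 2 (M-step): weighted means/deviations and mixing proportion
        mu_a = np.average(x, weights=w)
        mu_b = np.average(x, weights=1 - w)
        sd_a = np.sqrt(np.average((x - mu_a) ** 2, weights=w))
        sd_b = np.sqrt(np.average((x - mu_b) ** 2, weights=1 - w))
        p_a = w.mean()
        # Step 3 (E-step): update each record's weight by Bayes' rule
        f_a = np.exp(-(x - mu_a) ** 2 / (2 * sd_a ** 2)) / (np.sqrt(2 * np.pi) * sd_a)
        f_b = np.exp(-(x - mu_b) ** 2 / (2 * sd_b ** 2)) / (np.sqrt(2 * np.pi) * sd_b)
        w = p_a * f_a / (p_a * f_a + (1 - p_a) * f_b)
        # Step 4: stop when the log-likelihood saturates
        ll = np.sum(np.log(p_a * f_a + (1 - p_a) * f_b))
        if abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
    return mu_a, sd_a, mu_b, sd_b, p_a
```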
EM Algorithm (2)
Initialization: assign random membership probabilities to each record, then run the first M-Step.
Example:
Num  Math  English  Cluster1  Cluster2
1    80    90       0.25      0.75
2    50    75       0.80      0.20
3    85    100      0.43      0.57
4    30    70       0.70      0.30
5    95    85       0.15      0.85
6    60    80       0.60      0.40
Sum                 2.93      3.07
EM Algorithm (3)
M-Step: Parameters (Mean, Dev)
Estimating the parameters $\mu_A, \sigma_A, \mu_B, \sigma_B$ (means, deviations) from weighted instances:
$$\mu_A = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$
$$\sigma_A^2 = \frac{w_1 (x_1 - \mu_A)^2 + w_2 (x_2 - \mu_A)^2 + \cdots + w_n (x_n - \mu_A)^2}{w_1 + w_2 + \cdots + w_n}$$
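Both formulas are weighted averages, so a per-attribute M-step sketch reduces to `np.average` (an illustrative helper, not from the slides):

```python
import numpy as np

def m_step(x, w):
    """Weighted mean and variance of one attribute for one cluster."""
    mu = np.average(x, weights=w)               # sum(w_i * x_i) / sum(w_i)
    var = np.average((x - mu) ** 2, weights=w)  # sum(w_i * (x_i - mu)^2) / sum(w_i)
    return mu, var
```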
EM Algorithm (3) - worked example
M-Step: Parameters (Mean, Dev)
Using the weight table above (Cluster-A weights 0.25, 0.8, 0.43, 0.7, 0.15, 0.6):
$$\mu_{A,\text{math}} = \frac{0.25 \cdot 80 + 0.8 \cdot 50 + \cdots + 0.6 \cdot 60}{0.25 + 0.8 + 0.43 + 0.7 + 0.15 + 0.6} = 49$$
$$\sigma_{A,\text{math}}^2 = \frac{0.25\,(80-49)^2 + 0.8\,(50-49)^2 + \cdots + 0.6\,(60-49)^2}{0.25 + 0.8 + 0.43 + 0.7 + 0.15 + 0.6} = 376$$
EM Algorithm (4)
E-Step: Weight
Compute each instance's weight for cluster h:
$$w_h(x) = \frac{w_h\, f_h(x \mid \theta_h)}{\sum_i w_i\, f_i(x \mid \theta_i)}$$
where, treating the d attributes as independent,
$$f_h(x \mid \theta_h) = \prod_{j=1}^{d} f_{h,j}(x_j \mid \theta_h)$$
and each per-attribute density is the normal density
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
EM Algorithm (5)
E-Step: Weight
Example: computing the weight of student 1 (Math 80, English 90) for cluster 1:
$$w_{c1}(\text{student 1}) = \frac{0.25\, f_{c1}(\text{student 1} \mid \theta_{c1})}{0.25\, f_{c1}(\text{student 1} \mid \theta_{c1}) + 0.75\, f_{c2}(\text{student 1} \mid \theta_{c2})} = \frac{0.25 \cdot 0.0016}{0.25 \cdot 0.0016 + 0.75 \cdot 0.00863} \approx 0.058$$
where
$$f_{c1}(\text{student 1} \mid \theta_{c1}) = f_{c1,\text{math}}(80 \mid \theta_{c1,\text{math}}) \cdot f_{c1,\text{eng}}(90 \mid \theta_{c1,\text{eng}}) = 0.025 \cdot 0.064 = 0.0016$$
with the per-attribute normal densities
$$f_{c1,\text{math}}(80) = \frac{1}{19\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{80 - 49}{19}\right)^2} = 0.025, \qquad f_{c1,\text{eng}}(90) = \frac{1}{11\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{90 - 77}{11}\right)^2} = 0.064$$
EM Algorithm (6)
Objective Function (check)
Log-likelihood Function
For all instances, sum the log of the probability of belonging to the clusters; the log is used for ease of analysis.
1-dimensional data, 2 clusters A and B:
$$\sum_i \log\left(p_A \Pr[x_i \mid A] + p_B \Pr[x_i \mid B]\right)$$
N-dimensional data, k clusters:
$$L(\Theta) = \sum_{x \in D} \log\left(\sum_{h=1}^{k} w_h\, f_h(x \mid \mu_h, \Sigma_h)\right)$$
$$f_h(x \mid \mu_h, \Sigma_h) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_h|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_h)^T \Sigma_h^{-1} (x - \mu_h)\right)$$
where $\mu_h$ is the mean vector and $\Sigma_h$ the covariance matrix of cluster h.
EM Algorithm (7)
Objective Function (check)
For cluster c1 in the running example:
$$\mu_{c1} = (49, 77) \;\text{(mean vector)}, \qquad \Sigma_{c1} = \begin{pmatrix} 19 & 0 \\ 0 & 11 \end{pmatrix} \;\text{(covariance matrix)}$$
Because the covariance matrix is diagonal, the multivariate density $f_h(x \mid \mu_h, \Sigma_h)$ factors into a product of one-dimensional normal densities $f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.
EM Algorithm (8)
Termination
The procedure stops when the log-likelihood saturates:
$$|L(\Theta_j) - L(\Theta_{j-1})| < \epsilon$$
(Figure: log-likelihood vs. number of iterations, rising and then leveling off.)
EM Algorithm (1)
Simple Data
EM example: 6 data points, 2 classes (circle, rectangle), 3 samples per class.
EM Algorithm (2)
Likelihood function of the two component means Θ1, Θ2:
$$\sum_i \log\left(p_A \Pr[x_i \mid A] + p_B \Pr[x_i \mid B]\right)$$
EM Algorithm (3)
EM Example (1)
Example dataset: 2 columns (Math, English), 6 records.
Num  Math  English
1    80    90
2    60    75
3    80    90
4    30    60
5    75    100
6    15    80
EM Example (2)
Distribution of Math: mean = 56.67, variance = 776.73
Distribution of English: mean = 82.5, variance = 197.50
(Figure: the two attribute distributions on a 0-100 scale.)
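These statistics can be reproduced from the dataset above, up to rounding (sample variance, n-1 in the denominator):

```python
import numpy as np

math_scores = np.array([80, 60, 80, 30, 75, 15], dtype=float)
eng_scores = np.array([90, 75, 90, 60, 100, 80], dtype=float)
print(math_scores.mean(), math_scores.var(ddof=1))  # 56.67, 776.67 (rounded)
print(eng_scores.mean(), eng_scores.var(ddof=1))    # 82.5, 197.5
```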
EM Example (3)
Random Cluster Weights
Num  Math  English  Cluster1  Cluster2
1    80    90       0.25      0.75
2    50    75       0.80      0.20
3    85    100      0.43      0.57
4    30    70       0.70      0.30
5    95    85       0.15      0.85
6    60    80       0.60      0.40
Sum                 2.93      3.07
EM Example (4)
Iteration 1
Maximization Step
(parameter adjustment)
EM Example (4)
The objective uses the same log-likelihood and multivariate density as before:
$$L(\Theta) = \sum_{x \in D} \log\left(\sum_{h=1}^{k} w_h\, f_h(x \mid \mu_h, \Sigma_h)\right), \qquad f_h(x \mid \mu_h, \Sigma_h) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_h|^{1/2}} \exp\!\left(-\frac{1}{2}(x - \mu_h)^T \Sigma_h^{-1} (x - \mu_h)\right)$$
Example calculation for student 1 (Math 80, English 90) against cluster 1:
$$f_{c1}(x \mid \theta_{c1}) = \frac{1}{2\pi \sqrt{\left|\begin{smallmatrix} 19 & 0 \\ 0 & 11 \end{smallmatrix}\right|}} \exp\!\left(-\frac{1}{2} \begin{pmatrix} 80 - 49 \\ 90 - 77 \end{pmatrix}^{\!T} \begin{pmatrix} 19 & 0 \\ 0 & 11 \end{pmatrix}^{\!-1} \begin{pmatrix} 80 - 49 \\ 90 - 77 \end{pmatrix}\right)$$
$$= \frac{1}{2\pi\sqrt{19 \cdot 11}} \exp\!\left(-\frac{1}{2} \begin{pmatrix} 31 \\ 13 \end{pmatrix}^{\!T} \begin{pmatrix} 1/19 & 0 \\ 0 & 1/11 \end{pmatrix} \begin{pmatrix} 31 \\ 13 \end{pmatrix}\right) = \frac{1}{2\pi\sqrt{19 \cdot 11}} \exp\!\left(-\frac{31^2 \cdot 11 + 13^2 \cdot 19}{2 \cdot 19 \cdot 11}\right)$$
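The same matrix arithmetic can be checked with a few lines of NumPy (a sketch; it simply evaluates the density above with the slide's mean vector and diagonal covariance):

```python
import numpy as np

mu = np.array([49.0, 77.0])    # cluster-1 mean vector
cov = np.diag([19.0, 11.0])    # cluster-1 covariance matrix
x = np.array([80.0, 90.0])     # student 1: (Math, English)

d = len(x)
diff = x - mu                  # (31, 13)
expo = -0.5 * diff @ np.linalg.inv(cov) @ diff
density = np.exp(expo) / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov)))
```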
EM Example (5)
Iteration 2
Expectation Step
(Weight adjustment)
Maximization Step
(parameter adjustment)
EM Example (6)
Iteration 3
Expectation Step
(Weight adjustment)
Maximization Step
(parameter adjustment)
EM Application (1)
Weka
Developed at the University of Waikato in New Zealand.
Open-source mining tool.
http://www.cs.waikato.ac.nz/ml/weka
Experiment Data
Iris data
Real data: department customer data / modified customer data
EM Application (2)
IRIS Data
Data Info
Attribute Information:
sepal length / sepal width / petal length / petal width (all in cm)
class : Iris Setosa / Iris Versicolour / Iris Virginica
EM Application (3)
IRIS Data
EM Application (4)
Weka Usage
Weka Clustering Packages
weka.clusterers
Command-line execution:
java weka.clusterers.EM -t iris.arff -N 2
java weka.clusterers.EM -t iris.arff -N 2 -V
GUI execution:
java -jar weka.jar
EM Application (4)
Weka Usage
Options for clustering in Weka:
-t <training file>       Specify training file
-T <test file>           Specify test file
-x <number of folds>     Specify number of folds for cross-validation
-s <random number seed>  Specify random number seed
-l <input file>          Specify input file for model
-d <output file>         Specify output file for model
-p                       Only output predictions for test instances
EM Application (5)
Weka usage
EM Application (5)
Weka usage - input file format
% Summary Statistics:
%                 Min  Max  Mean  SD    Class Correlation
%   sepal length: 4.3  7.9  5.84  0.83   0.7826
%   sepal width:  2.0  4.4  3.05  0.43  -0.4194
%   petal length: 1.0  6.9  3.76  1.76   0.9490 (high!)
%   petal width:  0.1  2.5  1.20  0.76   0.9565 (high!)

@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
EM Application (6)
Weka usage – output format
Number of clusters: 3
Cluster: 0 Prior probability: 0.3333
Attribute: sepallength
Normal Distribution. Mean = 5.006 StdDev = 0.3489
Attribute: sepalwidth
Normal Distribution. Mean = 3.418 StdDev = 0.3772
Attribute: petallength
Normal Distribution. Mean = 1.464 StdDev = 0.1718
Attribute: petalwidth
Normal Distribution. Mean = 0.244 StdDev = 0.1061
Attribute: class
Discrete Estimator. Counts = 51 1 1 (Total = 53)
0   50 ( 33%)
1   48 ( 32%)
2   52 ( 35%)
Log likelihood: -2.21138
EM Application (6)
Result Visualization
References
Data Mining: Practical Machine Learning Tools and Techniques. Ian H. Witten and Eibe Frank. Morgan Kaufmann. pp. 218-255.
Data Mining: Concepts and Techniques. Jiawei Han. Chapter 8.
The Expectation Maximization Algorithm. Frank Dellaert. February 2002.
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Jeff A. Bilmes.