Classification Methods ZeroR, OneR Saed Sayad University of Toronto

Classification Methods
ZeroR, OneR
Saed Sayad
5/25/2016
University of Toronto
ZeroR
ZeroR is the simplest data mining method.

Inputs: Outlook, Temp., Humidity, Windy
Target: Play

Outlook  | Temp. | Humidity | Windy | Play
---------+-------+----------+-------+-----
Sunny    | Hot   | High     | False | No
Sunny    | Hot   | High     | True  | No
Overcast | Hot   | High     | False | Yes
Rainy    | Mild  | High     | False | Yes
Rainy    | Cool  | Normal   | False | Yes
Rainy    | Cool  | Normal   | True  | No
Overcast | Cool  | Normal   | True  | Yes
Sunny    | Mild  | High     | False | No
Sunny    | Cool  | Normal   | False | Yes
Rainy    | Mild  | Normal   | False | Yes
Sunny    | Mild  | Normal   | True  | Yes
Overcast | Mild  | High     | True  | Yes
Overcast | Hot   | Normal   | False | Yes
Rainy    | Mild  | High     | True  | No
ZeroR
ZeroR is about finding the class with the maximum frequency.

Play (original order): No, No, Yes, Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No
Play (sorted):         No, No, No, No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes

f(No)  = 5 / 14 = 0.36
f(Yes) = 9 / 14 = 0.64
max f(C) = f(Yes) = 0.64, so ZeroR predicts Play = Yes
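The frequency count above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code; the function name `zero_r` is an assumption introduced here, and the `play` list is the Play column of the weather data in its original row order.

```python
from collections import Counter

# Play column of the weather data, in the original row order.
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]


def zero_r(targets):
    """Return (majority_class, relative_frequency) for a list of class labels."""
    counts = Counter(targets)
    majority_class, count = counts.most_common(1)[0]
    return majority_class, count / len(targets)


majority, freq = zero_r(play)
print(majority, round(freq, 2))  # Yes 0.64
```

Because ZeroR never looks at the inputs, its accuracy is usually read as a baseline that any other classifier should beat.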
OneR Algorithm
For each attribute (input variable),
{
    For each value of that attribute, make a rule as follows:
    {
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    }
    Calculate the error rate of this attribute's rules
}
Choose the attribute whose rules have the smallest error rate
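The pseudocode above could be written in Python roughly as follows. This is a sketch under stated assumptions: the function name `one_r`, the dict-per-row data layout, and the tie-breaking (the first attribute with the lowest error count wins) are choices made here for illustration, not part of the slides.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (best_attribute, rules, errors) with the fewest training errors."""
    best = None
    for attr in attributes:
        # Count how often each class appears for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One rule per value: assign the most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the examples that do not belong to their value's majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        # Strict "<" keeps the first attribute on ties (an arbitrary choice).
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


columns = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
data = """Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No"""
rows = [dict(zip(columns, line.split())) for line in data.strip().splitlines()]

attr, rules, errors = one_r(rows, columns[:-1], "Play")
print(attr, rules, errors)
# Outlook {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': 'Yes'} 4
```

The error count is divided by the number of examples (14 here) to get the error rate; since every attribute is evaluated on the same examples, comparing raw counts gives the same ranking.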
OneR
Class counts of Play (No / Yes) for each value of each attribute in the weather data:

Outlook  | No  Yes
---------+--------
Sunny    | 3   2
Overcast | 0   4
Rainy    | 2   3

Temp     | No  Yes
---------+--------
Hot      | 2   2
Mild     | 2   4
Cool     | 1   3

Humidity | No  Yes
---------+--------
High     | 4   3
Normal   | 1   6

Windy    | No  Yes
---------+--------
False    | 2   6
True     | 3   3
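The error rate of each attribute's rules follows directly from the frequency tables: for every value, the examples outside the majority class are the errors. A small sketch of that arithmetic is below; the `tables` dictionary and the `error_rate` helper are names introduced here for illustration.

```python
# (No, Yes) counts per attribute value, copied from the frequency tables.
tables = {
    "Outlook": {"Sunny": (3, 2), "Overcast": (0, 4), "Rainy": (2, 3)},
    "Temp": {"Hot": (2, 2), "Mild": (2, 4), "Cool": (1, 3)},
    "Humidity": {"High": (4, 3), "Normal": (1, 6)},
    "Windy": {"False": (2, 6), "True": (3, 3)},
}


def error_rate(table):
    """Return (errors, total) when each value predicts its majority class."""
    total = sum(sum(counts) for counts in table.values())
    # With two classes, the minority count of each value is its error count.
    errors = sum(min(counts) for counts in table.values())
    return errors, total


for name, table in tables.items():
    errors, total = error_rate(table)
    print(f"{name:8s} {errors}/{total} = {errors / total:.2f}")
# Outlook  4/14 = 0.29
# Temp     5/14 = 0.36
# Humidity 4/14 = 0.29
# Windy    5/14 = 0.36
```

Outlook and Humidity tie at 4 errors out of 14, so either could be chosen; the rules given in the slides use Outlook.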
OneR
OneR is a simple but powerful method that often performs not much worse than more complex methods.

Outlook  | No  Yes
---------+--------
Sunny    | 3   2
Overcast | 0   4
Rainy    | 2   3

Rules for predicting Play from Outlook:
If Outlook = Sunny Then Play = No
If Outlook = Overcast Then Play = Yes
If Outlook = Rainy Then Play = Yes
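Applying the three rules is a plain lookup. The sketch below checks them against the 14 training examples; the names `rules`, `predict`, and `examples` are illustrative, and the accuracy shown is measured on the training data only, not on unseen data.

```python
# The three OneR rules, as a lookup table from Outlook to Play.
rules = {"Sunny": "No", "Overcast": "Yes", "Rainy": "Yes"}


def predict(outlook):
    """Predict Play from Outlook using the OneR rules."""
    return rules[outlook]


# (Outlook, Play) pairs of the 14 training examples, in row order.
examples = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

correct = sum(predict(outlook) == play for outlook, play in examples)
print(f"{correct}/{len(examples)}")  # 10/14
```

For comparison, always predicting Yes (the ZeroR answer) is right for 9 of the same 14 examples.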
OneR : Missing values and numeric attributes

Outlook  | Temp | Humidity | Windy | Play
---------+------+----------+-------+-----
Sunny    | 85   | High     | False | No
Sunny    | 80   | High     | True  | No
Overcast | 83   | High     | False | Yes
Rainy    | 70   | High     | False | Yes
Rainy    | 68   | ?        | False | Yes
Rainy    | 65   | Normal   | True  | No
Overcast | 64   | Normal   | True  | Yes
Sunny    | 72   | High     | ?     | No
Sunny    | 69   | Normal   | False | Yes
Rainy    | 75   | Normal   | False | Yes
Sunny    | 75   | Normal   | True  | Yes
?        | 72   | High     | True  | Yes
Overcast | 81   | Normal   | False | Yes
Rainy    | 71   | High     | True  | No
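The slide does not say how the "?" entries are handled. One common convention, assumed here rather than taken from the slides, is to treat a missing value as just another attribute value with its own rule. A minimal sketch for the Outlook column of this table:

```python
from collections import Counter, defaultdict

# (Outlook, Play) pairs from the table with missing values; "?" is missing.
pairs = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("?", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

# Assumption: "?" is counted like any other value, so it gets its own bucket.
counts = defaultdict(Counter)
for value, label in pairs:
    counts[value][label] += 1

rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
print(rules, errors)
# {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': 'Yes', '?': 'Yes'} 4
```

Other treatments, such as dropping rows with missing values, are possible and would give different counts.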
OneR : Numeric attributes discretization
1- The training examples are sorted according to the values of the numeric attribute

Temp  64  65  68  69  70  71  72  72  75  75  80  81  83  85
Play  yes no  yes yes yes no  no  yes yes yes no  yes yes no

2- Partitioning the sequence by placing breakpoints wherever the class changes

Temp  64  | 65  | 68  69  70  | 71  72  | 72  75  75  | 80  | 81  83  | 85
Play  yes | no  | yes yes yes | no  no  | yes yes yes | no  | yes yes | no
OneR : Numeric attributes discretization (cont.)
3- Require a minimum number of examples (e.g., 3) of the majority class in each partition

Temp  64  65  68  69  70  | 71  72  72  75  75  | 80  81  83  85
Play  yes no  yes yes yes | no  no  yes yes yes | no  yes yes no

4- Merge adjacent partitions which have the same majority class

Temp  64  65  68  69  70  71  72  72  75  75  | 80  81  83  85
Play  yes no  yes yes yes no  no  yes yes yes | no  yes yes no

5- The rule for the discretized temperature:

temp: <= 77.5 -> yes
      >  77.5 -> no
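The discretization steps can be sketched as one greedy pass over the sorted examples. This is an illustration under several assumptions that the slides do not spell out: a partition is closed only once its majority class has `min_bucket` examples and the next example has a different class; a breakpoint is never placed between two equal values (the two 72s); ties for the majority class go to the class seen first in the partition; and breakpoints sit halfway between neighbouring values. The names `majority` and `discretize` are introduced here.

```python
from collections import Counter


def majority(labels):
    """Most frequent label; ties go to the label seen first (arbitrary choice)."""
    counts = Counter(labels)
    return max(counts, key=counts.get)


def discretize(values, labels, min_bucket=3):
    """Return [(upper_bound, class), ...]; the last upper bound is None."""
    # Step 1: sort the examples by the numeric attribute (stable sort).
    pairs = sorted(zip(values, labels), key=lambda p: p[0])

    # Steps 2-3: grow a partition until its majority class has at least
    # min_bucket examples, the next class differs, and the value changes.
    partitions, current = [], []
    for i, (value, label) in enumerate(pairs):
        current.append((value, label))
        seen = [lab for _, lab in current]
        top = majority(seen)
        if i + 1 < len(pairs) and seen.count(top) >= min_bucket:
            next_value, next_label = pairs[i + 1]
            if next_label != top and next_value != value:
                partitions.append(current)
                current = []
    if current:
        partitions.append(current)  # the last partition may be smaller

    # Step 4: merge adjacent partitions with the same majority class.
    merged = []
    for part in partitions:
        cls = majority([lab for _, lab in part])
        if merged and merged[-1][1] == cls:
            merged[-1] = (merged[-1][0] + part, cls)
        else:
            merged.append((part, cls))

    # Step 5: put each breakpoint halfway between neighbouring partitions.
    rules = []
    for k, (part, cls) in enumerate(merged):
        if k + 1 < len(merged):
            upper = (part[-1][0] + merged[k + 1][0][0][0]) / 2
        else:
            upper = None
        rules.append((upper, cls))
    return rules


temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no",
        "yes", "yes", "yes", "no", "yes", "yes", "no"]
print(discretize(temps, play))  # [(77.5, 'yes'), (None, 'no')]
```

The last partition (80 to 85) holds two yes and two no examples, so its class depends on the tie-break. The first-seen rule used here yields no, which matches the rule on the slide; breaking the tie toward yes would merge everything into a single partition with no breakpoint at all.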