Transcript Document

Classification with
Decision Trees and Rules
Evgueni Smirnov
Overview
• Classification Problem
• Decision Trees for Classification
• Decision Rules for Classification
Classification Task
Given:
• X is an instance space defined as {Xi}i ∈1..N
where Xi is a discrete/continuous variable.
• Y is a finite class set.
• Training data D ⊆ X × Y.
Find:
• Class y∈ Y of an instance x ∈X.
Instances, Classes, Instance Spaces
A class is a set of objects in a world that are unified by a reason. A
reason may be a similar appearance, structure or function.
(Figure: friendly robots, an example class.)
Example. The set: {children, photos, cat, diplomas} can be
viewed as a class “Most important things to take out of your
apartment when it catches fire”.
Instances, Classes, Instance Spaces
(Figure: the instance space X for the robot domain. An example instance: head = square, body = round, smiling = yes, holding = flag, color = yellow; its class is friendly robots.)
Instances, Classes, Instance Spaces
(Figure: the same instance space X with a hypothesis, smiling = yes, covering the friendly robots; the example instance head = square, body = round, smiling = yes, holding = flag, color = yellow falls inside it.)
Classification problem
(Figure: the classification problem illustrated in the instance space X.)
Decision Trees for Classification
• Classification Problem
• Definition of Decision Trees
• Variable Selection: Impurity Reduction,
Entropy, and Information Gain
• Learning Decision Trees
• Overfitting and Pruning
• Handling Variables with Many Values
• Handling Missing Values
• Handling Large Data: Windowing
Decision Trees for Classification
• A decision tree is a tree where:
– Each interior node tests a variable
– Each branch corresponds to a variable value
– Each leaf node is labelled with a class (class node)
(Figure: an example decision tree. Interior nodes test the variables A1, A2 and A3; branches are labelled with the values a11, a12, a13, a21, a22, a31, a32; leaves are labelled with the classes c1 and c2.)
A simple database: playtennis
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
Decision Tree For Playing Tennis
Outlook
  sunny    -> Humidity
                high   -> no
                normal -> yes
  overcast -> yes
  rainy    -> Windy
                false -> yes
                true  -> no
Classification with Decision Trees
Classify(x: instance, node: a node of the decision tree)
• if node is a class (leaf) node then
– return the class of node;
• else
– determine the child of node that matches x;
– return Classify(x, child).
A Python sketch follows the figure below.
(Figure: the same example decision tree as before, with nodes A1, A2, A3, branch values a11, a12, a13, a21, a22, a31, a32, and class leaves c1 and c2.)
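A minimal sketch of this procedure in Python, assuming a hypothetical nested representation in which a leaf is its class label and an interior node is a (variable, {value: child}) pair:

# A leaf node is just a class label; an interior node is a pair
# (variable_name, {variable_value: child_node}).
def classify(x, node):
    if not isinstance(node, tuple):          # leaf reached: return its class
        return node
    variable, children = node
    child = children[x[variable]]            # follow the branch matching x
    return classify(x, child)

# The playtennis tree shown earlier and one instance:
tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Windy", {"False": "Yes", "True": "No"}),
})
print(classify({"Outlook": "Sunny", "Humidity": "Normal"}, tree))   # prints Yes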
Decision Tree Learning
Basic Algorithm:
1. Xi ← the “best” decision variable for a node N.
2. Assign Xi as the decision variable for node N.
3. For each value of Xi, create a new descendant of N.
4. Sort the training examples to the leaf nodes.
5. IF the training examples are perfectly classified, THEN stop, ELSE iterate over the new leaf nodes.
Variable Quality Measures
(Figure: the playtennis data split on Outlook.)
Outlook = Sunny:
  Temp  Hum     Wind    Play
  Hot   High    Weak    No
  Hot   High    Strong  No
  Mild  High    Weak    No
  Cool  Normal  Weak    Yes
  Mild  Normal  Strong  Yes
Outlook = Overcast:
  Temp  Hum     Wind    Play
  Hot   High    Weak    Yes
  Cool  Normal  Strong  Yes
Outlook = Rain:
  Temp  Hum     Wind    Play
  Mild  High    Weak    Yes
  Cool  Normal  Weak    Yes
  Cool  Normal  Strong  No
  Mild  Normal  Weak    Yes
  Mild  High    Strong  No
Variable Quality Measures
• Let S be a sample of training instances and pj be
the proportions of instances of class j (j=1,…,J)
in S.
• Define an impurity measure I(S) that satisfies:
– I(S) is minimum only when pi = 1 and pj = 0 for j ≠ i
(all objects are of the same class);
– I(S) is maximum only when pj =1/J
(there is exactly the same number of objects of all
classes);
– I(S) is symmetric with respect to p1,…,pJ;
Reduction of Impurity: Discrete Variables
• The “best” variable is the variable Xi that determines a
split maximizing the expected reduction of impurity:
ΔI(S, Xi) = I(S) − Σj ( |Sxij| / |S| ) · I(Sxij)
where Sxij is the subset of instances from S s.t. Xi = xij.
(Figure: the split of S on Xi into the subsets Sxi1, Sxi2, …, Sxij.)
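As an aside (not from the slides), a minimal Python sketch of the expected impurity reduction; Gini impurity is used here as one possible choice of I(S) satisfying the properties above:

from collections import Counter

def gini(labels):
    # Gini impurity, one possible choice of I(S): 1 - sum_j p_j^2
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_reduction(labels, values, impurity=gini):
    # Delta I(S, Xi) = I(S) - sum_j |Sxij| / |S| * I(Sxij)
    n = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(part) / n * impurity(part) for part in subsets.values())
    return impurity(labels) - remainder

# Example: a split that perfectly separates two Yes and two No instances
print(impurity_reduction(["Yes", "Yes", "No", "No"], ["a", "a", "b", "b"]))   # 0.5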
Information Gain: Entropy
Let S be a sample of training examples, and
p+ is the proportion of positive examples in S and
p- is the proportion of negative examples in S.
Then the entropy of S measures its impurity:
E(S) = − p+ log2 p+ − p− log2 p−
Entropy Example
In the Play Tennis dataset we had two target
classes: yes and no
Out of 14 instances, 9 classified yes, rest no
 9 
 9 


p yes = log2   = 0.41
 14 
 14 








pno = -  5  log2  5  = 0.53
 14 
 14 




E (S ) = p yes  pno = 0.94
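The same computation as a small Python sketch (not part of the slides):

import math

def entropy(pos, neg):
    # E(S) = -p+ log2 p+ - p- log2 p-   (0 * log2 0 is taken to be 0)
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(9, 5), 2))   # 0.94, as in the example above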
The playtennis data used in the example (Windy recorded as False/True):
Outlook   Temp.  Humidity  Windy  Play
Sunny     Hot    High      False  No
Sunny     Mild   High      False  No
Sunny     Hot    High      True   No
Sunny     Cool   Normal    False  Yes
Overcast  Hot    High      False  Yes
Rainy     Mild   Normal    False  Yes
Rainy     Mild   High      False  Yes
Sunny     Mild   Normal    True   Yes
Rainy     Cool   Normal    False  Yes
Overcast  Mild   High      True   Yes
Rainy     Cool   Normal    True   No
Overcast  Hot    Normal    False  Yes
Overcast  Cool   Normal    True   Yes
Rainy     Mild   High      True   No
Information Gain
Information Gain is the expected reduction in entropy
caused by partitioning the instances from S according
to a given discrete variable.
Gain(S, Xi) = E(S) − Σj ( |Sxij| / |S| ) · E(Sxij)
where Sxij is the subset of instances from S s.t. Xi = xij.
(Figure: the split of S on Xi into the subsets Sxi1, Sxi2, …, Sxij.)
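A sketch of the same computation in Python; applied to the Outlook column of the playtennis table it gives Gain(S, Outlook) ≈ 0.247, the value (rounded to 0.246) quoted on the gain-ratio slide later in this section:

import math
from collections import Counter, defaultdict

def entropy_of(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, values):
    # Gain(S, Xi) = E(S) - sum_j |Sxij| / |S| * E(Sxij)
    n = len(labels)
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)
    remainder = sum(len(part) / n * entropy_of(part) for part in subsets.values())
    return entropy_of(labels) - remainder

# Outlook and Play columns of the playtennis data (D1..D14):
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(round(information_gain(play, outlook), 3))   # ~0.247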
Example
(Figure: the playtennis data split on Outlook, repeated.)
Outlook = Sunny:
  Temp  Hum     Wind    Play
  Hot   High    Weak    No
  Hot   High    Strong  No
  Mild  High    Weak    No
  Cool  Normal  Weak    Yes
  Mild  Normal  Strong  Yes
Outlook = Overcast:
  Temp  Hum     Wind    Play
  Hot   High    Weak    Yes
  Cool  Normal  Strong  Yes
Outlook = Rain:
  Temp  Hum     Wind    Play
  Mild  High    Weak    Yes
  Cool  Normal  Weak    Yes
  Cool  Normal  Strong  No
  Mild  Normal  Weak    Yes
  Mild  High    Strong  No
Which attribute should be tested here?
Gain (Ssunny , Humidity) = .970 - (3/5) 0.0 - (2/5) 0.0 = .970
Gain (Ssunny , Temperature) = .970 - (2/5) 0.0 - (2/5) 1.0 - (1/5) 0.0 = .570
Gain (Ssunny , Wind) = .970 - (2/5) 1.0 - (3/5) .918 = .019
Continuous Variables
Temp.  Play          Temp. (sorted)  Play
80     No            64              Yes
85     No            65              No
83     Yes           68              Yes
75     Yes           69              Yes
68     Yes           70              Yes
65     No            71              No
64     Yes           72              No
72     No            72              Yes
75     Yes           75              Yes
70     Yes           75              Yes
69     Yes           80              No
72     Yes           81              Yes
81     Yes           83              Yes
71     No            85              No

Candidate binary splits and their impurity reductions:
Temp. < 64.5   ΔI = 0.048
Temp. < 66.5   ΔI = 0.010
Temp. < 70.5   ΔI = 0.045
Temp. < 73.5   ΔI = 0.001
Temp. < 77.5   ΔI = 0.025
Temp. < 80.5   ΔI = 0.000
Temp. < 84     ΔI = 0.113
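A sketch of how such thresholds can be generated and scored: sort the (value, class) pairs and evaluate the midpoint between every two adjacent distinct values as a binary split. With entropy as the impurity measure this reproduces the ΔI values above (up to rounding):

import math
from collections import Counter

def entropy_of(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Evaluate the midpoint between every two adjacent distinct sorted values.
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy_of(labels)
    best = (None, -1.0)
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue
        threshold = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [label for value, label in pairs if value < threshold]
        right = [label for value, label in pairs if value >= threshold]
        gain = (base - len(left) / n * entropy_of(left)
                     - len(right) / n * entropy_of(right))
        if gain > best[1]:
            best = (threshold, gain)
    return best

temp = [80, 85, 83, 75, 68, 65, 64, 72, 75, 70, 69, 72, 81, 71]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temp, play))   # (84.0, ~0.113), the largest DI in the table above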
ID3 Algorithm
Informally:
– Determine the variable with the highest
information gain on the training set.
– Use this variable as the root and create a branch for each value the variable can take.
– For each branch, repeat the process with the subset of the training set sorted into that branch (a sketch follows below).
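A compact recursive sketch of this procedure, using the nested (variable, branches) tree representation from the earlier classification sketch; entropy and gain are recomputed inline so the sketch is self-contained:

import math
from collections import Counter, defaultdict

def entropy_of(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(instances, labels, variables):
    # instances: list of dicts mapping variable name -> value
    # labels: list of class labels; variables: names still available for tests
    if len(set(labels)) == 1:                 # perfectly classified: leaf
        return labels[0]
    if not variables:                         # no tests left: majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(var):                            # information gain of a candidate test
        subsets = defaultdict(list)
        for x, y in zip(instances, labels):
            subsets[x[var]].append(y)
        remainder = sum(len(p) / len(labels) * entropy_of(p)
                        for p in subsets.values())
        return entropy_of(labels) - remainder

    best = max(variables, key=gain)           # the "best" decision variable
    children = {}
    for value in set(x[best] for x in instances):
        rows = [(x, y) for x, y in zip(instances, labels) if x[best] == value]
        xs, ys = zip(*rows)
        children[value] = id3(list(xs), list(ys),
                              [v for v in variables if v != best])
    return (best, children)                   # interior node: (variable, branches)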
Hypothesis Space Search in ID3
• The hypothesis space is
the set of all decision trees
defined over the given set
of variables.
• ID3’s hypothesis space is a complete space; i.e., the target tree is there!
• ID3 performs a simple-to-complex, hill-climbing search through this space.
Hypothesis Space Search in ID3
• The evaluation function is
the information gain.
• ID3 maintains only a single current decision tree.
• ID3 performs no backtracking in its search.
• ID3 uses all training instances at each step of the search.
Decision Trees are Non-linear Classifiers
(Figure: a decision tree over two numeric attributes A1 and A2, with tests such as A2 < 0.33, A1 < 0.91, A2 < 0.91, A1 < 0.23, A2 < 0.75, A2 < 0.49 and A2 < 0.65 leading to good/bad leaves, together with the corresponding axis-parallel decision regions in the unit square spanned by A1 and A2.)
Posterior Class Probabilities
(Figure: the playtennis tree with class counts at its leaves.)
Outlook = Sunny: 2 pos and 3 neg, so Ppos = 0.4, Pneg = 0.6
Outlook = Overcast: 2 pos and 0 neg, so Ppos = 1.0, Pneg = 0.0
Outlook = Rainy, Windy = False: 3 pos and 0 neg, so Ppos = 1.0, Pneg = 0.0
Outlook = Rainy, Windy = True: 0 pos and 2 neg, so Ppos = 0.0, Pneg = 1.0
Overfitting
Definition: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some hypothesis h’ ∈ H such that h has a smaller error than h’ over the training instances, but h’ has a smaller error than h over the entire distribution of instances.
Reasons for Overfitting
(Figure: the playtennis tree: Outlook at the root; sunny → Humidity (high: no, normal: yes); overcast → yes; rainy → Windy (false: yes, true: no).)
• Noisy training instances. Consider a noisy training example:
Outlook = Sunny; Temp = Hot; Humidity = Normal; Wind = True; PlayTennis = No
This example is inconsistent with the training instances:
Outlook = Sunny; Temp = Cool; Humidity = Normal; Wind = False; PlayTennis = Yes
Outlook = Sunny; Temp = Mild; Humidity = Normal; Wind = True; PlayTennis = Yes
Reasons for Overfitting
(Figure: the tree grown to fit the noisy example. Under Outlook = sunny and Humidity = normal, the single yes leaf is replaced by further tests on Windy and Temp that separate the noisy instance from the two consistent ones.)
Outlook = Sunny; Temp = Hot; Humidity = Normal; Wind = True; PlayTennis = No
Outlook = Sunny; Temp = Cool; Humidity = Normal; Wind = False; PlayTennis = Yes
Outlook = Sunny; Temp = Mild; Humidity = Normal; Wind = True; PlayTennis = Yes
Reasons for Overfitting
• A small number of instances is associated with a leaf node. In this case it is possible for coincidental regularities to occur that are unrelated to the actual class borders.
(Figure: a scatter of + and − instances; around a leaf supported by very few instances there is an area with probably wrong predictions.)
Approaches to Avoiding Overfitting
• Pre-pruning: stop growing the tree earlier,
before it reaches the point where it perfectly
classifies the training data
• Post-pruning: Allow the tree to overfit the
data, and then post-prune the tree.
Pre-pruning
• It is difficult to decide when to stop growing the tree.
• A possible scenario is to stop when a leaf node gets fewer than m training instances. Here is an example for m = 5.
(Figure: the playtennis tree before and after pre-pruning with m = 5. The Humidity test under sunny and the Windy test under rainy would each create leaves with only 3 and 2 instances, so they are not grown and the corresponding nodes become leaves.)
Validation Set
• A validation set is a set of instances used to evaluate the utility of nodes in decision trees. The validation set has to be chosen so that it is unlikely to suffer from the same errors or fluctuations as the set used for decision-tree training.
• Usually before pruning the training data is split
randomly into a growing set and a validation set.
Reduced-Error Pruning
(Sub-tree replacement)
Split the data into growing and validation sets.
Pruning a decision node d consists of:
1. removing the subtree rooted at d.
2. making d a leaf node.
3. assigning d the most common
classification of the training
instances associated with d.
(Figure: the unpruned tree: Outlook at the root; sunny → Humidity (high: no, normal: yes); overcast → yes; rainy → Windy (false: yes, true: no). The leaves of the candidate node hold 3 and 2 training instances.)
Accuracy of the tree on the validation
set is 90%.
Reduced-Error Pruning
(Sub-tree replacement)
Split the data into growing and validation sets.
Pruning a decision node d consists of:
1. removing the subtree rooted at d.
2. making d a leaf node.
3. assigning d the most common
classification of the training
instances associated with d.
(Figure: the tree after replacing the Humidity sub-tree: Outlook at the root; sunny → no; overcast → yes; rainy → Windy (false: yes, true: no).)
Accuracy of the tree on the validation
set is 92.4%.
Reduced-Error Pruning
(Sub-tree replacement)
Split the data into growing and validation sets.
Pruning a decision node d consists of:
1. removing the subtree rooted at d.
2. making d a leaf node.
3. assigning d the most common
classification of the training
instances associated with d.
Do until further pruning is harmful:
1. Evaluate impact on validation set
of pruning each possible node
(plus those below it).
2. Greedily remove the one that most
improves validation set accuracy.
(Figure: the same pruned tree: Outlook at the root; sunny → no; overcast → yes; rainy → Windy (false: yes, true: no).)
Accuracy of the tree on the validation
set is 92.4%.
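A sketch of sub-tree replacement in Python. It is a simplified bottom-up variant, not the greedy global loop above: a node is replaced by a leaf whenever the leaf is at least as accurate as the sub-tree on the validation instances that reach that node. The nested tree representation from the earlier sketches is assumed:

from collections import Counter

def classify(x, node):
    # A leaf is a class label; an interior node is (variable, {value: child}).
    if not isinstance(node, tuple):
        return node
    var, children = node
    child = children.get(x[var])
    return classify(x, child) if child is not None else None   # unseen value

def accuracy(node, data):
    # data: list of (instance, label) pairs
    return sum(classify(x, node) == y for x, y in data) / len(data) if data else 1.0

def reduced_error_prune(node, grow, valid):
    # grow / valid: the growing- and validation-set pairs that reach this node.
    if not isinstance(node, tuple) or not grow:
        return node
    var, children = node
    new_children = {
        value: reduced_error_prune(child,
                                   [(x, y) for x, y in grow if x[var] == value],
                                   [(x, y) for x, y in valid if x[var] == value])
        for value, child in children.items()
    }
    subtree = (var, new_children)
    # Candidate replacement: most common class of the growing instances here.
    majority = Counter(y for _, y in grow).most_common(1)[0][0]
    leaf_accuracy = (sum(y == majority for _, y in valid) / len(valid)
                     if valid else 1.0)
    # Replace the sub-tree by the leaf if that does not hurt validation accuracy.
    return majority if leaf_accuracy >= accuracy(subtree, valid) else subtree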
Reduced-Error Pruning
(Sub-tree replacement)
(Figure: Reduced Error Pruning example. A sequence of trees T1–T5 over the playtennis variables, obtained by successive sub-tree replacements, is shown together with their errors on the growing set (ErrorGS) and on the validation set (ErrorVS); the reported pairs are 0%/10%, 6%/8%, 13%/15%, 27%/25% and 33%/35%.)
Reduced-Error Pruning
(Sub-tree raising)
Split the data into growing and validation sets.
Raising a sub-tree with root d consists of:
1. removing the sub-tree rooted at the parent of d.
2. placing d in the place of its parent.
3. sorting the training instances associated with the parent of d using the sub-tree with root d.
(Figure: the unpruned tree: Outlook at the root; sunny → Humidity (high: no, normal: yes); overcast → yes; rainy → Windy (false: yes, true: no). The Humidity sub-tree, whose leaves hold 3 and 2 instances, is the candidate for raising.)
Accuracy of the tree on the validation
set is 90%.
Reduced-Error Pruning
(Sub-tree raising)
Split the data into growing and validation sets.
Raising a sub-tree with root d consists of:
1. removing the sub-tree rooted at the parent of d.
2. placing d in the place of its parent.
3. sorting the training instances associated with the parent of d using the sub-tree with root d.
(Figure: the raised sub-tree used in place of the whole tree: Humidity at the root, high → no, normal → yes.)
Accuracy of the tree on the validation
set is 73%. So, No!
Rule Post-Pruning
1. Convert tree to equivalent set of rules.
2. Prune each rule independently of others.
3. Sort final rules by their estimated accuracy, and consider them
in this sequence when classifying subsequent instances.
(Figure: the playtennis tree being converted into rules.)
IF (Outlook = Sunny) & (Humidity = High) THEN PlayTennis = No
IF (Outlook = Sunny) & (Humidity = Normal) THEN PlayTennis = Yes
……….
Decision Trees are non-linear. Can we make them linear?
(Figure: the same decision tree over A1 and A2 and its axis-parallel decision regions as shown earlier.)
Oblique Decision Trees
(Figure: an oblique split x + y < 1 separating the region Class = + from Class = −.)
• The test condition may involve multiple attributes
• More expressive representation
• Finding the optimal test condition is computationally expensive!
Variables with Many Values
(Figure: a split on a variable Letter with one branch per value a, b, c, …, y, z.)
• Problem:
– Such splits are not good: they fragment the data too quickly, leaving insufficient data at the next level
– Yet the reduction of impurity of such a test is often high (example: a split on the object id).
• Two solutions:
– Change the splitting criterion to penalize variables with
many values
– Consider only binary splits
Variables with Many Values
SplitInfo(S, A) = − Σ_{i=1..c} ( |Si| / |S| ) · log2( |Si| / |S| )
GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)
• Example: outlook in the playtennis
– InfoGain(outlook) = 0.246
– Splitinformation(outlook) = 1.577
– Gainratio(outlook) = 0.246/1.577=0.156 < 0.246
• Problem: the gain ratio favours unbalanced tests
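The numbers for outlook can be checked with a short sketch (the same playtennis columns as in the earlier information-gain sketch):

import math
from collections import Counter

def entropy_of(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

n = len(play)
subsets = {v: [y for o, y in zip(outlook, play) if o == v] for v in set(outlook)}
gain = entropy_of(play) - sum(len(s) / n * entropy_of(s) for s in subsets.values())
split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
print(round(gain, 3), round(split_info, 3), round(gain / split_info, 3))
# 0.247 1.577 0.156   (the slide rounds the gain to 0.246)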
Missing Values
1. If node n tests variable Xi, assign the most common value of Xi among the other instances sorted to node n.
2. If node n tests variable Xi, assign a probability to each possible value of Xi, estimated from the observed frequencies of the values of Xi among the instances at n. These probabilities are then used in the expected impurity reduction:
ΔI(S, Xi) = I(S) − Σj ( |Sxij| / |S| ) · I(Sxij)
Windowing
If the data do not fit in main memory, use windowing:
1. Select randomly n instances from the training data
D and put them in window set W.
2. Train a decision tree DT on W.
3. Determine a set M of instances from D
misclassified by DT.
4. W = W U M.
5. IF Not(StopCondition) THEN GoTo 2;
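A sketch of the loop; learn_tree and classify are hypothetical stand-ins for the tree learner and classifier, and a fixed round limit stands in for the stop condition:

import random

def windowing(data, n, learn_tree, classify, max_rounds=20):
    # data: list of (instance, label) pairs; learn_tree and classify are
    # stand-ins for the tree learner (e.g. id3) and the tree classifier.
    window = random.sample(data, n)                        # step 1
    tree = learn_tree(window)                              # step 2
    for _ in range(max_rounds):                            # stand-in stop condition
        misclassified = [(x, y) for x, y in data
                         if classify(x, tree) != y]        # step 3
        if not misclassified:
            break
        window += [p for p in misclassified if p not in window]   # step 4: W = W U M
        tree = learn_tree(window)                          # back to step 2
    return tree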
Summary Points
1. Decision tree learning provides a practical method for
concept learning.
2. ID3-like algorithms search a complete hypothesis space.
3. The inductive bias of decision trees is preference (search)
bias.
4. Overfitting the training data is an important issue in
decision tree learning.
5. A large number of extensions of the ID3 algorithm have
been proposed for overfitting avoidance, handling missing
attributes, handling numerical attributes, etc.
Learning Decision Rules
• Decision Rules
• Basic Sequential Covering Algorithm
• Learn-One-Rule Procedure
• Pruning
Definition of Decision Rules
Definition: Decision rules are rules with the following form:
if <conditions> then concept C.
Example: If you run the Prism algorithm from Weka on the weather
data you will get the following set of decision rules:
if outlook = overcast then PlayTennis = yes
if humidity = normal and windy = FALSE then PlayTennis = yes
if temperature = mild and humidity = normal then PlayTennis = yes
if outlook = rainy and windy = FALSE then PlayTennis = yes
if outlook = sunny and humidity = high then PlayTennis = no
if outlook = rainy and windy = TRUE then PlayTennis = no
Why Decision Rules?
• Decision rules are more compact.
• Decision rules are more understandable.
Example: Let X ∈ {0,1}, Y ∈ {0,1}, Z ∈ {0,1}, W ∈ {0,1}. The rules are:
if X = 1 and Y = 1 then 1
if Z = 1 and W = 1 then 1
otherwise 0
(Figure: the equivalent decision tree, which has to repeat the tests on Z and W in several branches and is considerably larger than the three rules.)
Why Decision Rules?
(Figure: decision boundaries of decision trees versus decision boundaries of decision rules on the same set of + and − instances.)
How to Learn Decision Rules?
1. We can convert trees to rules
2. We can use specific rule-learning methods
Sequential Covering Algorithms
function LearnRuleSet(Target, Attrs, Examples, Threshold):
LearnedRules := ∅
Rule := LearnOneRule(Target, Attrs, Examples)
while performance(Rule, Examples) > Threshold, do
LearnedRules := LearnedRules ∪ {Rule}
Examples := Examples \ {examples covered by Rule}
Rule := LearnOneRule(Target, Attrs, Examples)
sort LearnedRules according to performance
return LearnedRules
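A direct Python transcription of LearnRuleSet (a sketch): LearnOneRule, covers and performance are hypothetical stand-ins that the caller would supply, e.g. a top-down rule learner:

def learn_rule_set(target, attrs, examples, threshold,
                   learn_one_rule, covers, performance):
    # learn_one_rule, covers and performance are supplied by the rule learner.
    all_examples = list(examples)          # kept for the final sort
    learned_rules = []
    rule = learn_one_rule(target, attrs, examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # Remove the examples covered by the rule just learned.
        examples = [e for e in examples if not covers(rule, e)]
        rule = learn_one_rule(target, attrs, examples)
    # Sort the learned rules by performance on the full training data.
    learned_rules.sort(key=lambda r: performance(r, all_examples), reverse=True)
    return learned_rules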
Illustration
(Figures: sequential covering on a set of + and − instances. The rule under construction starts as IF true THEN pos, covering everything; it is specialised to IF A THEN pos and then to IF A & B THEN pos, which covers only positive instances. The covered instances are removed and a new rule is grown the same way: IF true THEN pos, then IF C THEN pos, then IF C & D THEN pos.)
Learning One Rule
• To learn one rule we use one of the
strategies below:
• Top-down:
– Start with maximally general rule
– Add literals one by one
• Bottom-up:
– Start with maximally specific rule
– Remove literals one by one
Bottom-up vs. Top-down
Bottom-up: typically more specific rules.
Top-down: typically more general rules.
(Figure: the two kinds of rules shown as regions over the same set of + and − instances.)
Learning One Rule
Bottom-up:
• Example-driven (AQ family).
Top-down:
• Generate-then-Test (CN-2).
Example of Learning One Rule
Heuristics for Learning One Rule
– When is a rule “good”?
• High accuracy;
• Less important: high coverage.
– Possible evaluation functions:
• Relative frequency: nc/n, where nc is the number of
correctly classified instances, and n is the number of
instances covered by the rule;
• m-estimate of accuracy: (nc + mp)/(n + m), where nc is the number of correctly classified instances, n is the number of instances covered by the rule, p is the prior probability of the class predicted by the rule, and m is the weight of p.
• Entropy.
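The first two evaluation functions as a small sketch; nc, n, p and m follow the definitions above, and the example numbers are made up:

def relative_frequency(nc, n):
    # nc correctly classified out of the n instances covered by the rule
    return nc / n

def m_estimate(nc, n, p, m):
    # prior p of the predicted class, weighted as m additional "virtual" instances
    return (nc + m * p) / (n + m)

# Example: a rule covering 6 instances, 5 of them correctly, class prior 0.5, m = 2
print(relative_frequency(5, 6))   # 0.833...
print(m_estimate(5, 6, 0.5, 2))   # 0.75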
How to Arrange the Rules
1. The rules are ordered according to the order they have been
learned. This order is used for instance classification.
2. The rules are ordered according to their accuracy. This
order is used for instance classification.
3. The rules are not ordered, but there exists a strategy for how to apply them (e.g., an instance covered by conflicting rules gets the classification of the rule that correctly classifies more training instances; if an instance is not covered by any rule, it gets the classification of the majority class in the training data).
Approaches to Avoiding Overfitting
• Pre-pruning: stop learning the decision
rules before they reach the point where they
perfectly classify the training data
• Post-pruning: allow the decision rules to overfit the training data, and then post-prune the rules.
Post-Pruning
1. Split instances into Growing Set and Pruning Set;
2. Learn set SR of rules using Growing Set;
3. Find the best simplification BSR of SR;
4. while ( Accuracy(BSR, Pruning Set) > Accuracy(SR, Pruning Set) ) do
   4.1 SR = BSR;
   4.2 Find the best simplification BSR of SR;
5. return BSR;
Incremental Reduced Error Pruning
(Figure: post-pruning splits the data once into D1, D2 and D3, while incremental reduced error pruning re-splits the remaining data (D1, D21, D22, D3) each time a rule is learned.)
Incremental Reduced Error Pruning
1. Split Training Set into Growing Set and Validation Set;
2. Learn rule R using Growing Set;
3. Prune the rule R using Validation Set;
4. if performance(R, Training Set) > Threshold
   4.1 Add R to Set of Learned Rules;
   4.2 Remove from Training Set the instances covered by R;
   4.3 go to 1;
5. else return Set of Learned Rules
Summary Points
1. Decision rules are easier for human comprehension
than decision trees.
2. Decision rules have simpler decision boundaries than
decision trees.
3. Decision rules are learned by sequential covering of
the training instances.
Lab 1: Some Details
Model Evaluation Techniques
• Evaluation on the training set: too optimistic
(Figure: the classifier is built on the training set and evaluated on that same training set.)
Model Evaluation Techniques
• Hold-out Method: depends on the make-up
of the test set.
(Figure: the data is split into a training set used to build the classifier and a separate test set used to evaluate it.)
• To improve the precision of the hold-out method:
it is repeated many times.
Model Evaluation Techniques
• k-fold Cross Validation
(Figure: k-fold cross-validation. The data is split into k folds; in each round one fold serves as the test set and the remaining folds as the training set, so every fold is used for testing exactly once.)
Intro to Weka
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {TRUE, FALSE}
@data
sunny,hot,high,FALSE,FALSE
sunny,hot,high,TRUE,FALSE
overcast,hot,high,FALSE,TRUE
rainy,mild,high,FALSE,TRUE
rainy,cool,normal,FALSE,TRUE
rainy,cool,normal,TRUE,FALSE
overcast,cool,normal,TRUE,TRUE
………….
References
• Mitchell, T. M. 1997. Machine Learning. New York: McGraw-Hill.
• Quinlan, J. R. 1986. Induction of Decision Trees. Machine Learning.
• Russell, S., and Norvig, P. 2010. Artificial Intelligence: A Modern Approach. New Jersey: Prentice Hall.