Decision Tree
Rong Jin
Determine Mileage Per Gallon
mpg    cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
good   4          low           low         low     high          75to78     asia
bad    6          medium        medium      medium  medium        70to74     america
bad    4          medium        medium      medium  low           75to78     europe
bad    8          high          high        high    low           70to74     america
bad    6          medium        medium      medium  medium        70to74     america
bad    4          low           medium      low     medium        70to74     asia
bad    4          low           medium      low     low           70to74     asia
bad    8          high          high        high    low           75to78     america
:      :          :             :           :       :             :          :
bad    8          high          high        high    low           70to74     america
good   8          high          medium      high    high          79to83     america
bad    8          high          high        high    low           75to78     america
good   4          low           low         low     low           79to83     america
bad    6          medium        medium      medium  high          75to78     america
good   4          medium        low         low     low           79to83     america
good   4          low           low         medium  high          79to83     america
bad    8          high          high        high    low           70to74     america
good   4          low           medium      low     medium        75to78     europe
bad    5          medium        medium      medium  medium        75to78     europe
A Decision Tree for Determining MPG
[Figure: a decision tree for determining MPG, branching on cylinders, displacement, horsepower, weight, acceleration, modelyear, and maker, with leaves labeled good or bad]
From the slides of Andrew Moore
Decision Tree Learning
Extremely popular method
Credit risk assessment
Medical diagnosis
Market analysis
Good at dealing with symbolic features
Easy to comprehend, compared to logistic regression models and support vector machines
Representational Power
Q: Can trees represent arbitrary Boolean
expressions?
Q: How many Boolean functions are there over N
binary attributes?
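The transcript leaves these questions unanswered, so as a hedged note: yes, a decision tree can represent any Boolean expression, since in the worst case it can branch on every attribute and dedicate one leaf to each truth-table row; counting those functions is then direct:

```latex
% A truth table over N binary attributes has 2^N rows, and each row can
% independently be assigned output 0 or 1, so
\[
  \#\{\text{Boolean functions over } N \text{ binary attributes}\} = 2^{2^N}.
\]
```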
How to Generate Trees from
Training Data
A Simple Idea
Enumerate all possible trees
Check how well each tree matches the training data
Pick the one that works best
Problems? There are far too many trees, and how do we determine the quality of a decision tree?
Solution: A Greedy Approach
Choose the most informative feature
Split the data set on that feature
Recurse until each data item is classified correctly (a minimal sketch follows below)
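One possible rendering of this greedy recursion in Python, in the style of ID3. The record layout (a list of dicts), the function names, and the use of mutual information as the feature score (introduced on the next slides) are illustrative assumptions, not code from the lecture:

```python
import math
from collections import Counter

def mutual_information(records, feature, label):
    """I(feature; label) in bits, estimated from the records."""
    n = len(records)
    joint = Counter((r[feature], r[label]) for r in records)
    px = Counter(r[feature] for r in records)
    py = Counter(r[label] for r in records)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def build_tree(records, features, label="mpg"):
    labels = [r[label] for r in records]
    if len(set(labels)) == 1:          # base case 1: all outputs agree
        return labels[0]
    if not features:                   # base case 2: nothing left to split on
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the most informative feature, split on it, recurse.
    best = max(features, key=lambda f: mutual_information(records, f, label))
    return {best: {v: build_tree([r for r in records if r[best] == v],
                                 [f for f in features if f != best], label)
                   for v in {r[best] for r in records}}}
```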
How to Determine the Best Feature?
Which feature is more informative about MPG?
What metric should be used?
Mutual information!
From Andrew Moore’s slides
Mutual Information for Selecting
Best Features
I(X; Y) = \sum_{x,y} P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)}

Y: MPG (good or bad), X: cylinders (3, 4, 6, 8)
From Andrew Moore’s slides
Another Example: Playing Tennis

Humidity split: root (9+, 5-); High → (3+, 4-); Norm → (6+, 1-)

I_h = \sum_{h \in \{\text{high}, \text{norm}\}} \sum_{p \in \{+, -\}} P(h, p) \log \frac{P(h, p)}{P(h)\, P(p)} = 0.151

Wind split: root (9+, 5-); Weak → (6+, 2-); Strong → (3+, 3-)

I_w = \sum_{w \in \{\text{weak}, \text{strong}\}} \sum_{p \in \{+, -\}} P(w, p) \log \frac{P(w, p)}{P(w)\, P(p)} = 0.048

Humidity carries more mutual information with the class label than Wind, so it is the better first split.
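A self-contained check of these two numbers (the helper and the count layout are mine; logs are base 2, so the results are in bits, matching the 0.151 and 0.048 on the slide):

```python
import math

def mutual_information(joint_counts):
    """I(X; Y) = sum over (x, y) of P(x, y) * log2(P(x, y) / (P(x) P(y))),
    estimated from a dict mapping (x, y) pairs to counts."""
    n = sum(joint_counts.values())
    px, py = {}, {}
    for (x, y), c in joint_counts.items():
        px[x] = px.get(x, 0) + c / n
        py[y] = py.get(y, 0) + c / n
    return sum((c / n) * math.log2((c / n) / (px[x] * py[y]))
               for (x, y), c in joint_counts.items() if c > 0)

# Humidity: High branch has 3+/4-, Norm branch has 6+/1-
print(mutual_information({("high", "+"): 3, ("high", "-"): 4,
                          ("norm", "+"): 6, ("norm", "-"): 1}))   # ~0.151
# Wind: Weak branch has 6+/2-, Strong branch has 3+/3-
print(mutual_information({("weak", "+"): 6, ("weak", "-"): 2,
                          ("strong", "+"): 3, ("strong", "-"): 3}))  # ~0.048
```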
Prediction for Nodes
What is the prediction for each node? (Each node predicts the majority class of the training records that reach it.)
From Andrew Moore’s slides
Prediction for Nodes
Recursively Growing Trees
Take the original dataset and partition it according to the value of the attribute we split on (cylinders = 4, cylinders = 5, cylinders = 6, cylinders = 8).
From Andrew Moore’s slides
Recursively Growing Trees
Then, for each partition, build a tree from those records (cylinders = 4, cylinders = 5, cylinders = 6, cylinders = 8).
From Andrew Moore’s slides
A Two Level Tree
Recursively growing trees
When Should We Stop Growing Trees?
Should we split this node?
Base Cases
Base Case One: If all records in the current data subset have the same output, then don't recurse
Base Case Two: If all records have exactly the same set of input attributes, then don't recurse
Base Cases: An Idea
Proposed Base Case 3: If all attributes have zero information gain, then don't recurse.
Is this a good idea?
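It is not: the standard counterexample (not shown in this transcript) is y = a XOR b, where each attribute alone has zero mutual information with y, so this base case would refuse to split even though a two-level tree classifies the data perfectly. A quick check:

```python
import math

def mi(pairs):
    """I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy, px, py = {}, {}, {}
    for x, y in pairs:
        pxy[(x, y)] = pxy.get((x, y), 0) + 1 / n
        px[x] = px.get(x, 0) + 1 / n
        py[y] = py.get(y, 0) + 1 / n
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

# y = a XOR b over the four equally likely inputs (a, b, y)
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(mi([(a, y) for a, b, y in data]))  # 0.0 -- attribute a alone is uninformative
print(mi([(b, y) for a, b, y in data]))  # 0.0 -- attribute b alone is uninformative
```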
Old Topic: Overfitting
What should we do?
Pruning
Pruning Decision Trees
Option 1: stop growing the tree in time.
Option 2: build the full decision tree as before, but when you can grow it no more, start to prune:
Reduced error pruning
Rule post-pruning
Reduced Error Pruning
Split the data into a training set and a validation set
Build a full decision tree over the training set
Keep removing the node whose removal maximally increases validation set accuracy, until further pruning hurts (see the sketch below)
[Figure: the original decision tree and the pruned decision tree, before and after reduced error pruning]
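A hedged sketch of reduced error pruning over nested-dict trees like those grown in the earlier sketch. Two simplifications are mine: replacement leaves predict the global training majority class rather than the per-node majority, and ties in validation accuracy are resolved toward the smaller tree:

```python
import copy
from collections import Counter

def predict(tree, record):
    """Follow branches until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        (feature, branches), = tree.items()
        tree = branches.get(record[feature], next(iter(branches.values())))
    return tree

def accuracy(tree, records, label):
    return sum(predict(tree, r) == r[label] for r in records) / len(records)

def internal_paths(tree, prefix=()):
    """Yield the (feature, value) path to every internal node."""
    if isinstance(tree, dict):
        yield prefix
        (feature, branches), = tree.items()
        for value, subtree in branches.items():
            yield from internal_paths(subtree, prefix + ((feature, value),))

def collapsed(tree, path, leaf):
    """Copy of the tree with the node at `path` replaced by `leaf`."""
    if not path:
        return leaf
    new = copy.deepcopy(tree)
    node = new
    for feature, value in path[:-1]:
        node = node[feature][value]
    feature, value = path[-1]
    node[feature][value] = leaf
    return new

def reduced_error_prune(tree, train, validation, label="mpg"):
    majority = Counter(r[label] for r in train).most_common(1)[0][0]
    while isinstance(tree, dict):
        base = accuracy(tree, validation, label)
        # Try collapsing each internal node; keep the best resulting tree.
        candidate = max((collapsed(tree, p, majority) for p in internal_paths(tree)),
                        key=lambda t: accuracy(t, validation, label))
        if accuracy(candidate, validation, label) < base:
            break                      # no single pruning step helps any more
        tree = candidate
    return tree
```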
Rule Post-Pruning
Convert the tree into rules (one rule per leaf)
Prune each rule by removing preconditions whenever doing so improves its estimated accuracy
Sort the final rules by their estimated accuracy
Most widely used method (e.g., in C4.5)
Other methods: statistical significance tests (chi-square)
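The first step, converting a nested-dict tree into rules, is a short recursion; this helper and its (preconditions, class) rule representation are illustrative, not C4.5’s actual code:

```python
def tree_to_rules(tree, preconditions=()):
    """Walk a nested-dict tree (like those grown earlier) and emit one
    rule per leaf as (preconditions, predicted_class)."""
    if not isinstance(tree, dict):        # leaf: predicted class
        return [(list(preconditions), tree)]
    (feature, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, preconditions + ((feature, value),))
    return rules

# e.g. {'cylinders': {4: 'good', 8: 'bad'}} yields
# [([('cylinders', 4)], 'good'), ([('cylinders', 8)], 'bad')]
```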
Real-Valued Inputs
What should we do to deal with real-valued inputs?
mpg    cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
good   4          97            75          2265    18.2          77         asia
bad    6          199           90          2648    15            70         america
bad    4          121           110         2600    12.8          77         europe
bad    8          350           175         4100    13            73         america
bad    6          198           95          3102    16.5          74         america
bad    4          108           94          2379    16.5          73         asia
bad    4          113           95          2228    14            71         asia
bad    8          302           139         3570    12.8          78         america
:      :          :             :           :       :             :          :
good   4          120           79          2625    18.6          82         america
bad    8          455           225         4425    10            70         america
good   4          107           86          2464    15.5          76         europe
bad    5          131           103         2830    15.9          78         europe
Information Gain
x: a real-valued input
t: a split value (threshold)
Find the split value t such that the mutual information I(x, y : t) between x and the class label y is maximized (a sketch follows below).
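A minimal sketch of that search, assuming the score is the information gain I(x, y : t) = H(y) - H(y | x < t) and that candidate thresholds are midpoints between consecutive distinct values of x; the function names and the toy data are mine:

```python
import math

def entropy(labels):
    """H(Y) in bits."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_split(xs, ys):
    """Return the threshold t (a midpoint between consecutive distinct
    x values) maximizing I(x, y : t) = H(y) - H(y | x < t)."""
    base = entropy(ys)
    pairs = sorted(zip(xs, ys))
    best_t, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - cond > best_gain:
            best_t, best_gain = t, base - cond
    return best_t, best_gain

# Toy subset of the weight column from the table above:
print(best_split([2265, 2648, 2600, 4100], ["good", "bad", "bad", "bad"]))
# -> (2432.5, ~0.811): this threshold cleanly separates good from bad
```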
Conclusions
Decision trees are the single most popular data
mining tool
Easy to understand
Easy to implement
Easy to use
Computationally cheap
It’s possible to get in trouble with overfitting
They do classification: predict a categorical output from categorical and/or real-valued inputs
Software
Most widely used decision tree package: C4.5 (or C5.0)
http://www2.cs.uregina.ca/~hamilton/courses/831/notes/ml/dtrees/c4.5/tutorial.html
Source code, tutorial
The End