Decision Tree Learning
Presented by Ping Zhang
Nov. 26th, 2007
Introduction

 Decision tree learning is one of the most widely used and practical methods for inductive inference.
 Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.
 Decision tree learning is robust to noisy data and capable of learning disjunctive expressions.
Decision tree representation

 Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.
 Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute.
Decision Tree for PlayTennis
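The tree figure for this slide did not survive as text. As a stand-in, here is the well-known PlayTennis tree from Mitchell's textbook (which this deck appears to follow), written as a small Python function; the function name and dict representation are mine:

```python
def classify_play_tennis(instance):
    """Sort an instance down the standard PlayTennis tree.

    `instance` is a dict such as
    {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}.
    Returns "Yes" or "No".
    """
    if instance["Outlook"] == "Sunny":
        # Sunny days are decided by humidity.
        return "No" if instance["Humidity"] == "High" else "Yes"
    if instance["Outlook"] == "Overcast":
        # Overcast days are always positive in this example.
        return "Yes"
    # Remaining branch: Outlook == "Rain", decided by wind strength.
    return "No" if instance["Wind"] == "Strong" else "Yes"


print(classify_play_tennis({"Outlook": "Rain", "Humidity": "High", "Wind": "Weak"}))  # Yes
```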
When to Consider Decision Trees

 Instances describable by attribute-value pairs
 Target function is discrete valued
 Disjunctive hypothesis may be required
 Possibly noisy training data
 Examples (classification problems):
   Equipment or medical diagnosis
   Credit risk analysis
Top-Down Induction of Decision Trees
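The algorithm listing for this slide is not in the transcript. Below is a minimal Python sketch of the ID3-style top-down greedy loop, under stated assumptions: examples are dicts mapping attribute names to categorical values, and the helper names (entropy, information_gain, id3) are mine, not from the slides.

```python
from collections import Counter
from math import log2

def entropy(examples, target):
    # Entropy(S) = sum_i -p_i log2(p_i), over the target-value proportions.
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(examples, attr, target):
    # Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v).
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, attributes, target):
    labels = [ex[target] for ex in examples]
    # Base case 1: every example has the same label -> leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # Base case 2: no attributes left to test -> majority-vote leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree

# e.g. id3(train, ["Outlook", "Temperature", "Humidity", "Wind"], "PlayTennis")
```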
Entropy (1)
Entropy (2)
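The equations for these two entropy slides are not in the transcript; the standard definitions, in Mitchell's notation, are, for a boolean target with proportions p_oplus of positive and p_ominus of negative examples,

    Entropy(S) = -p_{\oplus} \log_2 p_{\oplus} - p_{\ominus} \log_2 p_{\ominus}

and, for a target taking c different values with proportions p_i,

    Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i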
Information Gain
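Likewise, the standard gain equation: the expected reduction in entropy from partitioning S on attribute A,

    Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

where S_v is the subset of S for which attribute A has value v.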
Training Examples
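The table of training examples is not in the transcript. Assuming it was Mitchell's standard PlayTennis sample (14 days, 9 positive and 5 negative), the entropy of the whole sample works out to

    Entropy(S) = -(9/14) \log_2 (9/14) - (5/14) \log_2 (5/14) \approx 0.940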
Selecting the Next Attribute
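Under the same assumption, the gains at the root come out as Gain(S, Outlook) = 0.246, Gain(S, Humidity) = 0.151, Gain(S, Wind) = 0.048, and Gain(S, Temperature) = 0.029, so Outlook is selected as the root test.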
Which attribute should be tested here?
Hypothesis Space Search by ID3

 Hypothesis space is complete
   Target function is surely in there
 Outputs only a single hypothesis
 No backtracking
   Local minima
 Statistically-based search choices
   Robust to noisy data
 Inductive bias: “prefer shortest tree”
From ID3 to C4.5

C4.5 made a number of improvements to ID3. Some of these are:
 Handling both continuous and discrete attributes
 Handling training data with missing attribute values
 Handling attributes with differing costs
 Pruning trees after creation
Overfitting in Decision Trees
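This slide is title-only in the transcript; the usual formal statement (Mitchell's) is that a hypothesis h in H overfits the training data if some alternative h' in H fits the training data worse but the true distribution D better:

    \mathrm{error}_{train}(h) < \mathrm{error}_{train}(h') \quad\text{and}\quad \mathrm{error}_{D}(h') < \mathrm{error}_{D}(h)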
Reduced-Error Pruning
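This slide is also title-only here; the standard procedure is: hold out a validation set; consider pruning each internal node, where pruning means replacing the node's subtree with a leaf labeled by the majority class of its training examples; greedily prune whichever node most improves accuracy on the validation set; and stop when any further pruning would hurt validation accuracy.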
Rule Post-Pruning

 Convert the tree to an equivalent set of rules
 Prune each rule by removing any preconditions whose removal improves its estimated accuracy
 Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances
 Perhaps the most frequently used method
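As a concrete instance (assuming the usual PlayTennis tree), the leftmost path becomes the rule

    IF (Outlook = Sunny) AND (Humidity = High) THEN PlayTennis = No

and pruning would consider removing either precondition, keeping the removal only if the rule's estimated accuracy does not drop.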
Continuous Valued Attributes

 Create a discrete attribute to test a continuous one
 There are two candidate thresholds, as worked through below
 The information gain can be computed for each of the candidate attributes, Temperature > 54 and Temperature > 85, and the best can be selected (Temperature > 54)
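The sorted-examples table behind those two thresholds is not in the transcript. In Mitchell's version, the PlayTennis examples sorted by Temperature are 40, 48, 60, 72, 80, 90 with PlayTennis = No, No, Yes, Yes, Yes, No, and candidate thresholds sit at the midpoints where the classification changes: (48 + 60)/2 = 54 and (80 + 90)/2 = 85. A minimal sketch of that midpoint rule (the function name is mine):

```python
def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values whose labels differ."""
    pairs = sorted(zip(values, labels))
    return [(a + b) / 2
            for (a, la), (b, lb) in zip(pairs, pairs[1:])
            if la != lb]

# Mitchell's Temperature example: prints [54.0, 85.0].
print(candidate_thresholds([40, 48, 60, 72, 80, 90],
                           ["No", "No", "Yes", "Yes", "Yes", "No"]))
```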
Attributes with many Values

Problem:
 If an attribute has many values, Gain will select it
 Imagine using the attribute Date: it would have the highest information gain of any of the attributes, but the resulting decision tree would not be useful.
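The slide states the problem only; the standard remedy, adopted by C4.5 and described in Mitchell, is to normalize gain by the split information of the attribute:

    SplitInformation(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}

    GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)}

where S_1 through S_c are the subsets produced by the c values of A. A many-valued attribute like Date has large split information, which pushes its gain ratio down.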
Missing Attribute Values
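This slide is title-only in the transcript; the standard strategies (again following Mitchell) for an example missing a value for attribute A at node n are: assign the most common value of A among the examples at n; assign the most common value among the examples at n with the same classification; or, as C4.5 does, split the example into fractional pieces weighted by the observed frequencies of A's values.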
Attributes with Costs

 Consider medical diagnosis, where BloodTest has a cost of $150
 How can we learn a consistent tree with low expected cost?
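One standard answer, described in Mitchell rather than on the slide, is to fold cost into the selection measure, for example Tan and Schlimmer's Gain^2(S, A) / Cost(A), or Nunez's (2^{Gain(S, A)} - 1) / (Cost(A) + 1)^w, where w in [0, 1] controls the importance of cost.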
Conclusion

Decision tree learning:
 Is simple to understand and interpret
 Requires little data preparation
 Can handle both numerical and categorical data
 Uses a white box model
 Allows a model to be validated using statistical tests
 Is robust, and performs well on large data sets in a short time