Decision Trees
10-601 Recitation
1/17/08
Mary McGlohon
[email protected]
Announcements
• HW 1 out - DTs and basic probability
• Due Mon, Jan 28 at start of class
• Matlab
  • High-level language, specialized for matrices
  • Built-in plotting software, lots of math libraries
  • On campus lab machines
  • Interest in tutorial?
AttendClass?

Raining?
  False -> Yes
  True  -> Is10601?
    True  -> Yes
    False -> Material?
      Old -> No
      New -> Before10?
        True  -> No
        False -> Yes

Represent as a logical expression.
AttendClass?

[Same tree as above.]

Represent as a logical expression:

AttendClass = Yes if:
(Raining = False) OR
(Is10601 = True) OR
(Material = New AND Before10 = False)
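As a quick sanity check, the expression can be written directly as code. A minimal Python sketch (the function name and boolean encoding are ours, just for illustration):

    def attend_class(raining, is10601, material, before10):
        # AttendClass = Yes if (Raining = False) OR (Is10601 = True)
        #                   OR (Material = New AND Before10 = False)
        return (not raining) or is10601 or (material == "New" and not before10)

    # Example: raining, not 10-601, new material, before 10am -> don't attend
    print(attend_class(True, False, "New", True))   # False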
Split decisions
• There are other trees that are logically equivalent.
• How do we know which one to use?
• Depends on what is important to us.
Information Gain
• Classically we rely on "information gain", which uses the principle that we want to use the least number of bits, on average, to get our idea across.
• Suppose I want to send a weather forecast with 4 possible outcomes: Rain, Sun, Snow, and Tornado. 4 outcomes = 2 bits.
• In Pittsburgh there's Rain 90% of the time, Snow 5%, Sun 4.9%, and Tornado .01%. So if you assign Rain a 1-bit code (and longer codes to the rarer outcomes), you use fewer than 2 bits on average.
Entropy

H(S) = -p+ log2(p+) - p- log2(p-)

Set S has 6 positive, 2 negative examples.
H(S) = -.75 log2(.75) - .25 log2(.25) = 0.8113
[Table: 8 training examples with attributes Rain, Is10601, Before10, Material and label Attend; 6 examples are positive (+) and 2 are negative (-).]
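The course homework uses Matlab, but here is a minimal Python sketch of the entropy calculation (the helper name entropy is ours):

    import math

    def entropy(pos, neg):
        # H(S) = -p+ log2(p+) - p- log2(p-), taking 0 * log2(0) = 0
        total = pos + neg
        h = 0.0
        for count in (pos, neg):
            p = count / total
            if p > 0:
                h -= p * math.log2(p)
        return h

    print(entropy(6, 2))   # 0.8112...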
Conditional Entropy

"The average number of bits it would take to encode a message Y, given knowledge of X":

H(Y|X) = sum over x of P(X=x) * H(Y|X=x)
Conditional Entropy

H(Attend|Rain) =
H(Attend|Rain=T) * P(Rain=T) + H(Attend|Rain=F) * P(Rain=F) =
1 * 0.5 + 0 * 0.5 = 0.5
[Table: the same 8 examples, split on Rain. The Rain=True subset (2+, 2-) has entropy 1; the Rain=False subset (4+, 0-) has entropy 0.]
Information Gain

IG(S,A) = H(S) - H(S|A)

"How much conditioning on attribute A increases our knowledge (decreases entropy) of S."
Information Gain

IG(Attend, Rain) =
H(Attend) - H(Attend|Rain) =
.8113 - .5 = .3113
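And information gain is just the difference, matching the slide's numbers:

    ig_rain = entropy(6, 2) - h_attend_given_rain
    print(ig_rain)   # 0.8113 - 0.5 = 0.3113 (approximately)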
What about this?

For some dataset, could we ever build this DT?

[A larger decision tree with Material at the root, Raining and Before10 tests at the next level, and Is10601 tests below; leaves are Yes/No.]

What if you were taking 20 classes, and it rains 90% of the time?
What about this?

[The same tree, rooted at Material.]

If most information is gained from Material or Before10, we won't ever need to traverse to Is10601. So even a bigger tree (node-wise) may be "simpler", for some sets of data.

What if you were taking 20 classes, and it rains 90% of the time?
Node-based pruning
• Until further pruning is harmful:
  • For each node n in trained tree T:
    • Let Tn' be T without n (and its descendants). Assign the removed node to be the "best choice" under that traversal.
    • Record error of Tn' on validation set.
  • Let T = Tk', where Tk' is the pruned tree with the best performance on the validation set.
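A rough Python sketch of this loop, assuming trees stored as nested dicts with a "majority" label at each internal node (the representation and helper names are ours, not from the slides):

    def classify(tree, x):
        # Walk down until a leaf label, treating pruned nodes as leaves.
        while isinstance(tree, dict) and not tree.get("pruned"):
            tree = tree["branches"][x[tree["attr"]]]
        return tree["majority"] if isinstance(tree, dict) else tree

    def accuracy(tree, data):
        return sum(classify(tree, x) == y for x, y in data) / len(data)

    def internal_nodes(tree):
        nodes, stack = [], [tree]
        while stack:
            t = stack.pop()
            if isinstance(t, dict):
                nodes.append(t)
                stack.extend(t["branches"].values())
        return nodes

    def prune(tree, validation):
        # Until further pruning is harmful: try removing each node, keep the best.
        while True:
            best_node, best_acc = None, accuracy(tree, validation)
            for node in internal_nodes(tree):
                node["pruned"] = True            # temporarily make n a leaf
                acc = accuracy(tree, validation)
                node["pruned"] = False
                if acc > best_acc:
                    best_node, best_acc = node, acc
            if best_node is None:                # no pruning helps; stop
                return tree
            best_node["pruned"] = True           # commit the best pruning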
Node-based pruning

For each node, record performance on the validation set of the tree without that node.

Suppose our initial tree has 0.7 accuracy on the validation set.

[The trained tree, rooted at Material.]
Node-based pruning

Let's test this node...

[The same tree, with the node under Material=New, Before10=True selected for removal.]
Node-based pruning

Suppose that most examples where Material=New and Before10=True are "Yes". Our new subtree has "Yes" here.

[The pruned tree: the selected node is replaced by the leaf "Yes".]

Now, test this tree!
Node-based pruning

Suppose we get accuracy of 0.73 on this pruned tree. Repeat the test procedure by removing a different node from the original tree...
Node-based pruning

Try this tree (with a different node pruned)...

[The original tree with a different node replaced by a leaf.]

Now, test this tree and record its accuracy.
Node-based pruning

Once we test all possible prunings, modify our tree T with the pruning that has the best performance.

Repeat the entire pruning selection procedure on the new T, replacing T each time with the best-performing pruned tree, until we no longer gain anything by pruning.
Rule-based pruning

[The trained tree, rooted at Material.]

1. Convert the tree to rules, one for each leaf:

IF Material=Old AND Raining=False THEN Attend=Yes
IF Material=Old AND Raining=True AND Is601=True THEN Attend=Yes
...
Rule-based pruning

2. Prune each rule. For instance, to prune this rule:

• IF Material=Old AND Raining=F THEN Attend=T

Test each potential rule with a precondition removed on the validation set, and compare to the performance of the original rule on that set:

• IF Material=Old THEN Attend=T
• IF Raining=F THEN Attend=T
Rule-based pruning

Suppose we got the following accuracy for each rule:

• IF Material=Old AND Raining=F THEN Attend=T -- 0.6
• IF Material=Old THEN Attend=T -- 0.5
• IF Raining=F THEN Attend=T -- 0.7

Then, we would keep the best one and drop the others.
Rule-based pruning

Next rule: compare the original rule with each rule with one precondition removed:

• IF Material=Old AND Raining=T AND Is601=T THEN Attend=T
• IF Material=Old AND Raining=T THEN Attend=T
• IF Material=Old AND Is601=T THEN Attend=T
• IF Raining=T AND Is601=T THEN Attend=T
Rule-based pruning

Suppose we got the following accuracy for each rule:

• IF Material=Old AND Raining=T AND Is601=T THEN Attend=T -- 0.6
• IF Material=Old AND Raining=T THEN Attend=T -- 0.75
• IF Material=Old AND Is601=T THEN Attend=T -- 0.3
• IF Raining=T AND Is601=T THEN Attend=T -- 0.65

If a shorter rule works better, we may also choose to further prune on this step before moving on to the next leaf. Well, maybe not this time!
Once we have done the same pruning procedure for each rule in the tree...

Rule-based pruning

3. Order the "kept rules" by their accuracy, and do all subsequent classification with that priority:

• IF Material=Old AND Raining=T THEN Attend=T -- 0.75
• IF Raining=F THEN Attend=T -- 0.7
• ... (and so on for the other pruned rules) ...
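A compact sketch of the per-rule pruning step, assuming each rule is a (preconditions, label) pair and accuracy is measured on the validation examples the rule covers (all names here are ours):

    def rule_accuracy(rule, data):
        conds, label = rule
        covered = [(x, y) for x, y in data if all(x[a] == v for a, v in conds)]
        return sum(y == label for _, y in covered) / len(covered) if covered else 0.0

    def prune_rule(rule, data):
        # Greedily drop preconditions while that improves validation accuracy.
        conds, label = rule
        best_acc = rule_accuracy(rule, data)
        improved = True
        while improved:
            improved = False
            for i in range(len(conds)):
                shorter = conds[:i] + conds[i + 1:]
                acc = rule_accuracy((shorter, label), data)
                if acc > best_acc:
                    conds, best_acc, improved = shorter, acc, True
                    break
        return (conds, label), best_acc

    # Keep the best version of each rule, then order rules by accuracy:
    # pruned = sorted((prune_rule(r, val) for r in rules), key=lambda p: -p[1])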
Adding randomness

What if you didn't know if you had new material? For instance, you wanted to classify this:

Rain  Is601  Before10  Material  Attend?
T     F      F         ???       ?

[The decision tree from before, rooted at Raining.]
Adding randomness

[Classifying the example above, we reach the Material node: where to go?]

You could look at the training set, and see that when Rain=T and 10601=F, p fraction of the examples had new material. Then flip a p-biased coin and descend the appropriate branch. But that might not be the best idea. Why not?
Adding randomness

Also, you may have missing data in the training set.

[Training examples like: Rain=T, Is601=F, Material=???, with a known Attend label.]

There are also methods to deal with this using probability:

"Well, 60% of the time when Rain and not 601, there's new material (among the examples where we know the material). So we'll just randomly select 60% of rainy, non-601 examples where we don't know the material, to have new material."
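A minimal sketch of that fill-in trick (the helper name and dict layout are ours): estimate the conditional distribution of Material from examples where it is known, then sample the missing ones from it.

    import random

    def fill_missing_material(examples):
        # Examples are dicts; Material is None where unknown.
        known = [e for e in examples if e["Material"] is not None]
        for e in examples:
            if e["Material"] is None:
                # Materials seen among known examples with the same Rain/Is601
                # values; random.choice samples each value with its empirical
                # frequency (e.g. "New" 60% of the time for rainy, non-601 rows).
                pool = [k["Material"] for k in known
                        if k["Rain"] == e["Rain"] and k["Is601"] == e["Is601"]]
                if pool:
                    e["Material"] = random.choice(pool)
        return examples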
Adventures in Probability
• That approach tends to work well. Still, we may have the following trouble.
• What if there aren't very many training examples where Rain=True and 10601=False? Wouldn't we still want to use examples where Rain=False to get the missing value?
• Well, it "depends". Stay tuned for lecture next week!