Transcript: Lecture 8

Today’s Topics
• HW1 Due 11:55pm Today (no later than next Tuesday)
• HW2 Out, Due in Two Weeks
• Next Week We’ll Discuss the Make-Up Midterm
• Be Sure to Check your @wisc.edu Email! (Forward to your Work Email?)
• When is 100 < 99? (Unrelated to AI)
• Unstable Algorithms (mentioned on a slide last week)
• D-Tree Wrapup
• What ‘Space’ does ID3 Search? (Transition to new AI topic: SEARCH)
Unstable Algorithms
• An idea from the stats community
• An ML algorithm is unstable if small changes to the training set can lead to large changes in the learned model
• D-trees are unstable, since one different example can change the root (see the sketch below)
• k-NN is stable, since the impact of each example is local
• Ensembles work best with unstable algorithms
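To make the contrast concrete, here is a minimal sketch, assuming scikit-learn and a synthetic toy dataset (both my own illustrative choices, not part of the lecture): flip a single training label and compare how much the learned d-tree changes versus how much k-NN's predictions change.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + 0.3 * rng.normal(size=100) > 0).astype(int)

# Same training set except for a single flipped label.
y_perturbed = y.copy()
y_perturbed[0] = 1 - y_perturbed[0]

t1 = DecisionTreeClassifier(random_state=0).fit(X, y)
t2 = DecisionTreeClassifier(random_state=0).fit(X, y_perturbed)
# For an unstable learner, even the root test can change
# (it may or may not on this particular toy dataset).
print("root feature, original vs. perturbed:",
      t1.tree_.feature[0], t2.tree_.feature[0])

k1 = KNeighborsClassifier(n_neighbors=5).fit(X, y)
k2 = KNeighborsClassifier(n_neighbors=5).fit(X, y_perturbed)
X_test = rng.normal(size=(200, 4))
# k-NN is stable: the flipped example only affects queries near it.
print("fraction of k-NN predictions unchanged:",
      np.mean(k1.predict(X_test) == k2.predict(X_test)))
```

The same flip-one-example experiment is why ensembles (bagging, random forests) help d-trees much more than they help k-NN.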
ID3 Recap:
Questions Addressed
• How closely should we fit the training data?
– Completely, then prune
– Use tuning sets to score candidates
– Learn forests and no need to prune! Why?
• How do we judge features?
– Use info theory (Shannon); see the entropy / info-gain sketch below
• What if a feature has many values?
– Convert to Boolean-valued features (see the second sketch below)
• D-trees can also handle missing feature values
(but we won’t cover this for d-trees)
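The ‘info theory (Shannon)’ heuristic is the usual entropy / information-gain computation; here is a small self-contained Python sketch (the function names are mine, not from the course code):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, labels, f):
    """Expected drop in entropy from splitting on discrete feature index f."""
    n = len(labels)
    by_value = {}
    for x, y in zip(examples, labels):
        by_value.setdefault(x[f], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder
```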
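And one common way to realize ‘convert to Boolean-valued features’ for a many-valued feature, as a tiny illustrative sketch (one Boolean feature per possible value; the helper name is hypothetical):

```python
def to_boolean_features(value, all_values):
    """Replace one many-valued feature by a bank of Boolean features,
    one per possible value."""
    return [value == v for v in sorted(all_values)]

# Example: to_boolean_features("red", {"blue", "green", "red"})
# -> [False, False, True]   (Color==blue?, Color==green?, Color==red?)
```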
ID3 Recap (cont.)
[Slide annotation: “Looks like a d-tree!”]
• What if some features cost more to evaluate
(eg, CAT scan vs Temperature)?
– Use an ad-hoc correction factor
• Best way to use in an ensemble?
– Random forests often perform quite well
• Batch vs. incremental (aka online) learning?
– ID3 is basically a ‘batch’ approach
– Incremental variants exist, but since ID3 is so fast, why not simply rerun it ‘from scratch’ whenever a mistake is made? (see the sketch below)
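A minimal sketch of that ‘rerun from scratch whenever a mistake is made’ idea, with scikit-learn’s DecisionTreeClassifier standing in for ID3 (the streaming interface and names are my own assumptions, not the course’s code):

```python
from sklearn.tree import DecisionTreeClassifier

def online_by_batch_retraining(stream):
    """Use a batch tree learner incrementally: predict each incoming example
    and, whenever the prediction is wrong (or no model exists yet), retrain
    from scratch on everything seen so far.
    `stream` is assumed to yield (feature_vector, label) pairs."""
    seen_X, seen_y, model = [], [], None
    for x, y in stream:
        wrong = model is None or model.predict([x])[0] != y
        seen_X.append(x)
        seen_y.append(y)
        if wrong:
            # Mistake (or first example): rebuild the whole tree.
            model = DecisionTreeClassifier().fit(seen_X, seen_y)
    return model
```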
ID3 Recap (cont.)
• What about real-valued outputs?
– Could learn a linear approximation for various regions of the feature space (see the regression sketch after this list)
[Figure: Venn-style diagram with regions labeled by linear models, eg 3 f1 - f2, f1 + 2 f2, f4]
• How rich is our language for
describing examples?
– Limited to fixed-length feature vectors
(but they are surprisingly effective)
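A minimal sketch of ‘a separate linear approximation per region’, assuming scikit-learn, a single hand-chosen split, and non-empty regions (the split rule and names are illustrative, not the lecture’s algorithm):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_two_region_linear(X, y, feature=0, threshold=0.0):
    """Split the feature space once, then fit a linear model to the target
    in each region (assumes both regions contain training examples)."""
    in_left = X[:, feature] <= threshold
    return {
        "left": LinearRegression().fit(X[in_left], y[in_left]),
        "right": LinearRegression().fit(X[~in_left], y[~in_left]),
    }

def predict_two_region(models, X, feature=0, threshold=0.0):
    in_left = X[:, feature] <= threshold
    preds = np.empty(len(X))
    if in_left.any():
        preds[in_left] = models["left"].predict(X[in_left])
    if (~in_left).any():
        preds[~in_left] = models["right"].predict(X[~in_left])
    return preds
```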
Summary of ID3
Strengths
– Good technique for learning models from ‘real world’
(eg, noisy) data
– Fast, simple, and robust
– Potentially considers complete hypothesis space
– Successfully applied to many real-world tasks
– Results (trees or rules) are human-comprehensible
– One of the most widely used techniques in data mining
Summary of ID3 (cont.)
Weaknesses
– Requires fixed-length feature vectors
– Only makes axis-parallel (univariate) splits
– Not designed to make probabilistic predictions
– Non-incremental
– Hill-climbing algorithm (poor early decisions can be disastrous); however, extensions exist
A Sample Search Tree
- so we can use another search method besides hill climbing (a ‘greedy’ algorithm)
• Nodes are PARTIALLY COMPLETE D-TREES
• Expand the ‘left-most’ (in yellow) question mark (?) of the current node
• All possible trees can be generated (given thresholds ‘implied’ by real values in the train set)
[Figure: search tree of partial d-trees. The start node is a single ‘?’ leaf. Its children come from the operators ‘create a leaf node’ (labeled - or +) and ‘add F1’, ‘add F2’, ..., ‘add FN’ as the test at the ‘?’; each resulting tree has new ‘?’ leaves to expand in turn. In the example, F2 is assumed to score best.]
Viewing ID3 as a Search Algorithm

Search Space: the space of all decision trees constructible using the current feature set
Operators: add a node (ie, grow the tree)
Search Strategy: hill climbing
Heuristic Function: information gain (other d-tree algo’s use similar ‘purity measures’)
Start Node: an isolated leaf node marked ‘?’
Goal Node: a tree that separates all the training data (‘post-pruning’ may be done later to reduce overfitting)
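Reading the table as code: a toy, self-contained sketch of ID3 as hill-climbing search (my own illustration, not the course implementation; it assumes discrete features, does no pruning, and repeats a small entropy helper so it runs on its own). The comments map each piece back to the rows above.

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, labels, features):
    """Start node: a lone '?' leaf (this call, holding all the data).
    Goal node: a (sub)tree whose leaves separate the training data.
    Operator: add a node, ie replace a '?' leaf with a feature test.
    Search strategy: hill climbing, ie pick the best test and never backtrack.
    Heuristic: information gain."""
    if len(set(labels)) == 1:            # pure branch: goal reached, make a leaf
        return labels[0]
    if not features:                     # no tests left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(f):                         # information-gain heuristic
        groups = {}
        for x, y in zip(examples, labels):
            groups.setdefault(x[f], []).append(y)
        remainder = sum(len(ys) / len(labels) * _entropy(ys)
                        for ys in groups.values())
        return _entropy(labels) - remainder

    best = max(features, key=gain)       # the greedy (hill-climbing) choice
    children = {}
    for v in {x[best] for x in examples}:
        idx = [i for i, x in enumerate(examples) if x[best] == v]
        children[v] = id3([examples[i] for i in idx],
                          [labels[i] for i in idx],
                          [f for f in features if f != best])
    return (best, children)              # internal node: (feature, {value: subtree})
```

Because the choice of `best` is never revisited, a poor early split can only be compensated for lower in the tree, which is exactly the hill-climbing weakness listed on the previous slide.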
What We’ve Covered So Far
[Slide graphic groups the covered material under Algo’s, Methodology, and Issues]
• Supervised ML Algorithms
– Instance-based (kNN)
– Logic-based (ID3, Decision Stumps)
– Ensembles (Random Forests, Bagging, Boosting)
• Train/Tune/Test Sets, N-Fold Cross Validation
• Feature Space, (Greedily) Searching Hypothesis Spaces
• Parameter Tuning (‘Model Selection’), Feature Selection (info gain)
• Dealing w/ Real-Valued and Hierarchical Features
• Overfitting Reduction, Occam’s Razor
• Fixed-Length Feature Vectors, Graph/Logic-Based Reps of Examples
• Understandability of Learned Models, “Generalizing not Memorizing”
• Briefly: Missing Feature Values, Stability (to small changes in training sets)