Transcript Slides
Machine Learning in Practice
Lecture 8
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Plan for the Day
Announcements
Should be finalizing plans for term project
Weka helpful hints
Spam Dataset
Overcoming some limits of linear functions
Discussing ordinal attributes in light of linear functions
Weka Helpful Hints
Feature Selection
* Click here to start setting up feature selection
Feature selection algorithms pick out a subset of the features that work best
Usually they evaluate each feature in isolation
Feature Selection
* Now click here
Feature Selection
* Now click here.
* Now pick your base
classifier just like before
Feature Selection
* Finally you will configure
the feature selection
Setting Up Feature Selection
* First click here.
Setting Up Feature Selection
* Select ChiSquaredAttributeEval
Setting Up Feature Selection
* Now click here.
Setting Up Feature Selection
* Select Ranker
Setting Up Feature Selection
* Now click here
Setting Up Feature Selection
* Set the number of features you want
Setting Up Feature Selection
The number you pick should not be larger than the number of features available
The number should not be larger than the number of coded examples you have
Examining Which Features are Most Predictive
You can find a ranked list of features in the Performance Report if you use feature selection
* Predictiveness score
* Frequency
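The same chi-squared ranking can be sketched outside the Weka GUI. The snippet below uses scikit-learn's `SelectKBest` with the `chi2` score as a rough stand-in for Weka's ChiSquaredAttributeEval plus Ranker; the word features and counts are toy values, not part of the lecture's dataset.

```python
# Sketch of chi-squared feature ranking (scikit-learn stand-in for
# Weka's ChiSquaredAttributeEval + Ranker). Toy data, for illustration.
from sklearn.feature_selection import SelectKBest, chi2

# Rows are messages, columns are word counts (non-negative, as chi2
# requires); labels are 1 = spam, 0 = not spam.
X = [[3, 0, 1],
     [4, 0, 0],
     [0, 2, 1],
     [0, 3, 0]]
y = [1, 1, 0, 0]
feature_names = ["free", "meeting", "the"]

# Keep the 2 best features, as when setting the Ranker's cutoff in Weka.
selector = SelectKBest(chi2, k=2).fit(X, y)
ranked = sorted(zip(feature_names, selector.scores_),
                key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

As on the slide, the output is a ranked list: the discriminative words ("free", "meeting") score high, while a word that appears equally in both classes ("the") scores zero.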
Spam Data Set
* Which algorithm will work best?
Word frequencies; runs of $ and !; capitalization
All numeric attributes
Spam versus NotSpam
Spam Data Set
Decision Trees (.85 Kappa)
SMO (linear function) (.79 Kappa)
Naïve Bayes (.6 Kappa)
What did SMO learn?
Decision tree model
More on Linear Functions
… exploring the idea of nonlinearity
Limits of linear functions
Numeric Prediction with the CPU Data
Predicting CPU performance from computer configuration
All attributes are numeric, as well as the output
Numeric Prediction with the CPU Data
Could discretize the output and predict good, mediocre, or bad performance
Numeric prediction allows you to make arbitrarily many distinctions
Linear Regression
R-squared = .87
Outliers
* Notice that here it’s the really high values that fit the line the least well. That’s not always the case.
The two most highly weighted features
Exploring the Attribute Space
* Identify outliers with respect to typical attribute values.
The two most highly weighted features
Within 1 standard deviation of the mean value
Trees for Numeric Prediction
Looks like we may need a representation that allows for a nonlinear solution
Regression trees can handle a combination of numeric and nominal attributes
M5P: computes a linear regression function at each leaf node of the tree
Look at CPU performance data and compare a simple linear regression (R = .93) with M5P (R = .98)
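The contrast can be sketched in scikit-learn, which has no M5P; `DecisionTreeRegressor`, which predicts a constant at each leaf rather than a linear function, is used here as a rough stand-in for a model tree, on synthetic nonlinear data rather than the CPU dataset.

```python
# Sketch: a single linear regression vs. a regression tree on a
# nonlinear numeric target (stand-in for the CPU data comparison;
# scikit-learn has no M5P, so a constant-leaf tree is used instead).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 2, size=300)  # nonlinear target

lin = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# The tree's piecewise fit tracks the curvature the line misses.
print(f"linear R^2: {lin.score(X, y):.2f}")
print(f"tree   R^2: {tree.score(X, y):.2f}")
```

As on the CPU data, the single line fits reasonably well but the tree fits better, because each leaf can model its own region of the attribute space.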
Results on CPU data with M5P
* Plot annotations: more data here; biggest outliers here
Multi-Layer Networks can learn arbitrarily complex functions
Multilayer Perceptron
Best Results So Far
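A minimal sketch of the multilayer perceptron idea, using scikit-learn's `MLPRegressor` as a stand-in for Weka's MultilayerPerceptron (an assumption; the architecture and data are illustrative). The target is a parabola, which a linear function cannot fit at all but hidden layers approximate easily.

```python
# Sketch: a multilayer perceptron learning a nonlinear function that
# defeats linear regression (MLPRegressor as a stand-in for Weka's
# MultilayerPerceptron; synthetic data, illustrative settings).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = (X[:, 0] - 5.0) ** 2 + rng.normal(0, 0.5, size=300)  # parabola

Xs = StandardScaler().fit_transform(X)  # NNs want normalized inputs
lin = LinearRegression().fit(Xs, y)
mlp = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=5000,
                   random_state=0).fit(Xs, y)

print(f"linear R^2: {lin.score(Xs, y):.2f}")  # near zero: symmetric target
print(f"MLP    R^2: {mlp.score(Xs, y):.2f}")
```

The hidden layers are what buy the expressive power: with no hidden layer the network reduces to a linear function.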
Forcing a Linear Function
Note that it weights the features differently than the linear regression
Partly because of normalization
Regression trees split on MMAX
The neural network emphasizes MMIN
Review of Ordinal Attributes
Feature Space Design for Linear Functions
Often features will be numeric
Continuous values
May be more likely to generalize properly with discretized values
We discussed the fact that you lose ordering and distance
With respect to linear functions, it may be more important that you lose the ability to think in terms of ranges
Explicitly coding ranges allows for a simple form of nonlinearity
Ordinal Values
Weka technically does not have ordinal attributes
But you can simulate them with “temperature coding”!
Try to represent “If X less than or equal to .35”?
Bin A (.2, .25): coded A
Bin B (.28, .31, .35): coded A or B
Bin C (.45, .47, .52): coded A or B or C
Bin D (.6, .63): coded A or B or C or D
Now how would you represent X <= .35?
Feat2 = 1 (the “A or B” feature)
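The temperature coding above can be sketched as a small function. The bin boundaries follow the slide's example; the function name is my own for illustration.

```python
# Sketch of "temperature coding" for an ordinal attribute: each bin
# boundary becomes a cumulative binary feature, so a linear model can
# express a range test like X <= .35 with a single feature weight.
def temperature_code(x):
    """Return cumulative indicators [A, A or B, A or B or C, A or B or C or D]."""
    # Upper bounds of bins A-D, taken from the slide's example values.
    bounds = [0.25, 0.35, 0.52, 0.63]
    bin_index = next(i for i, b in enumerate(bounds) if x <= b)
    # Every cumulative feature at or past the value's bin is switched on.
    return [1 if i >= bin_index else 0 for i in range(len(bounds))]

print(temperature_code(0.28))  # in bin B -> [0, 1, 1, 1]
print(temperature_code(0.45))  # in bin C -> [0, 0, 1, 1]
```

"X <= .35" is then exactly the second cumulative feature being 1 (Feat2 = 1), which is the simple form of nonlinearity a plain numeric encoding cannot give a linear function.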
Take Home Message
Linear functions cannot learn interactions between attributes
If you need to account for interactions:
Multiple layers
Tree-like representations
Attributes that represent ranges
Later in the semester we’ll talk about other approaches