Transcript Document

Pattern Recognition and Machine Learning
Part 2
Lucy Kuncheva
School of Computer Science
Bangor University
[email protected]
1
Pattern Recognition – DIY using WEKA
2
The weka (also known as Maori hen or woodhen) (Gallirallus australis) is a flightless bird species of the rail family. It is endemic to New Zealand, where four subspecies are recognized. Weka are sturdy brown birds, about the size of a chicken. As omnivores, they feed mainly on invertebrates and fruit.
http://en.wikipedia.org/wiki/Weka
3
WEKA
http://www.cs.waikato.ac.nz/ml/weka/
“WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. WEKA contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.”
4
WEKA
And we will be using only the hammer...
5
PROBLEM
Data set:
Your data sets are of the WIDE type: small number of objects, large number of features
[Figure: an N-by-n data matrix; the rows are the objects 1..N and the columns are the features 1..n (also called attributes, variables, covariates...); a highlighted cell marks the value of feature #2 for object #3.]
6
WEKA
Prepare the .arff file:
1. Open the data in an ASCII (plain-text) editor
2. Add rows
• @RELATION one_word
• @ATTRIBUTE name NUMERIC
... for all features
• @ATTRIBUTE class {1,2}
... for the class variable
• @DATA
3. Paste the data underneath (a minimal example file is sketched below)
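A minimal sketch of what the finished .arff file might look like (the relation name, attribute names, and data values below are made up for illustration; the real file would have one @ATTRIBUTE line per feature):

@RELATION talent_data
@ATTRIBUTE grip_test_left NUMERIC
@ATTRIBUTE grip_test_right NUMERIC
@ATTRIBUTE height_standing_cm NUMERIC
@ATTRIBUTE class {1,2}
@DATA
34.5,36.0,168.2,1
28.1,27.9,155.0,2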
7
Feature selection
(b) Feature subsets
Two questions:
• How do we select the subsets?
• How do we evaluate the worth of a subset?
8
Feature selection
(b) Feature subsets
Two questions:
• How do we select the subsets? (Not our problem now)
• How do we evaluate the worth of a subset?
  Wrapper: classification accuracy
  Filter: some easier-to-calculate proxy for the classification accuracy
  Embedded: a decision tree classifier, SVM
9
Feature selection
(b) Feature subsets
Two questions:
• How do we select the subsets?
  Ranker; Bespoke; Greedy (Sequential Forward Selection, SFS); Heuristic search; Random; Genetic Algorithms (GA); Swarm optimisation
• How do we evaluate the worth of a subset?
  Wrapper; Filter; Embedded (as on the previous slide)
10
Feature selection methods
CfsSubsetEval
FCBF (Fast Correlation-Based Filter), originally proposed for microarray data analysis (Yu and Liu, 2003). The idea of FCBF is that the features worth keeping should be correlated with the class variable but not correlated among themselves (a toy illustration is sketched below the reference).
1. L. Yu and H. Liu (2003). Feature selection for high-dimensional data: A fast correlation-based filter solution.
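FCBF itself ranks features by symmetrical uncertainty, but the “correlated with the class” part of the idea can be illustrated with a much cruder proxy. The MATLAB fragment below is only a toy sketch (it is neither FCBF nor WEKA's CfsSubsetEval, and it ignores the inter-feature correlation step); it assumes a numeric data matrix X and numeric class labels L, as in the script later in these slides, and the corr function from the Statistics Toolbox:

% Toy relevance ranking: |correlation of each feature with the class labels|
% (NOT FCBF: FCBF uses symmetrical uncertainty and also discards features
% that are highly correlated with already-selected ones)
for j = 1:size(X,2)
    relevance(j) = abs(corr(X(:,j), L));
end
[~, order] = sort(relevance, 'descend');   % feature indices, best first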
12
Feature selection methods
ReliefFAttributeEval
Relief-F (Kira and Rendell, 1992; Kononenko et al., 1997).
For each object in the data set, find the nearest neighbour from the same class (NearHit) and the nearest neighbour from the opposite class (NearMiss) using all features. The relevance score of a feature increases if the feature value in the current object is closer to that in the NearHit than to that in the NearMiss. Otherwise, the relevance score of the feature decreases. (A rough code sketch follows the references.)
1. K. Kira and L. Rendell (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. AAAI-92 Proceedings.
2. I. Kononenko et al. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, 7(1), 39-55.
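For those who prefer to see the idea in code, here is a rough MATLAB sketch of the two-class Relief scoring loop described above. It is an illustrative sketch only (not WEKA's ReliefFAttributeEval), and it assumes a numeric data matrix X with standardised features and numeric class labels L, as in the script later in these slides:

% Rough two-class Relief sketch: subtract the feature-wise difference
% to the NearHit, add the difference to the NearMiss
w = zeros(1, size(X,2));                    % relevance scores, one per feature
for i = 1:size(X,1)
    d = sum((X - X(i,:)).^2, 2);            % squared distances to object i
    d(i) = inf;                             % exclude the object itself
    dh = d; dh(L ~= L(i)) = inf;            % candidates from the same class
    dm = d; dm(L == L(i)) = inf;            % candidates from the other class
    [~, hit]  = min(dh);                    % NearHit
    [~, miss] = min(dm);                    % NearMiss
    w = w - abs(X(i,:) - X(hit,:)) + abs(X(i,:) - X(miss,:));
end
% larger w(j) means feature j is deemed more relevant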
13
Feature selection methods
Relief-F.
[Figure: the current object with its NearHit and NearMiss in a two-feature scatter plot; in this example the relevance score for feature x increases, while the relevance score for feature y decreases.]
14
Feature selection methods
SVM. This classifier builds a linear function that separates the classes. The hyperplane is calculated so as to maximise the distance to the nearest points. The absolute values of the coefficients in front of the features can be interpreted as “importance”.

SVM-RFE. RFE stands for “Recursive Feature Elimination” (Guyon et al., 2006). Starting with an SVM on the entire feature set, a fraction of the features with the lowest weights is dropped. A new SVM is trained with the remaining features, and the feature set is subsequently reduced in the same way. The procedure stops when the set of the desired cardinality is reached. While SVM-RFE has been found to be extremely useful for wide data such as functional magnetic resonance imaging (fMRI) data (DeMartino et al., 2008), it has been observed that the RFE step is not always needed (Abeel et al., 2010; Geurts et al., 2005).
SVMAttributeEval
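A rough sketch of the recursive elimination loop is given below. This is not WEKA's SVMAttributeEval; it is an illustration assuming MATLAB's Statistics and Machine Learning Toolbox (fitcsvm), a numeric data matrix X, numeric class labels L, and a desired number of features k, with one feature dropped per iteration:

% Rough SVM-RFE sketch: repeatedly drop the feature with the smallest |weight|
feats = 1:size(X,2);                          % indices of the surviving features
while numel(feats) > k
    mdl = fitcsvm(X(:,feats), L, ...
        'KernelFunction', 'linear', 'Standardize', true);
    [~, worst] = min(abs(mdl.Beta));          % least important surviving feature
    feats(worst) = [];                        % eliminate it and retrain
end
% feats now holds the indices of the k retained features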
15
Feature selection methods
[Screenshots of the parameter settings in WEKA: for SVM-RFE, one feature is eliminated at each iteration (default); for the plain SVM ranking, this value is set to 0.]
16
Feature selection methods
For this example, both SVM and SVM-RFE give the same result.

SVM / SVM-RFE ranked attributes:
6 2 GRIP_TEST_Right
5 5 HEIGHT_Standing_cm
4 1 GRIP_TEST_Left
3 4 HEIGHT_Seated_cm
2 3 WEIGHT_Kg
1 6 ARM_SPAN_cm

Relief-F ranked attributes:
0.07863 2 GRIP_TEST_Right
0.07549 5 HEIGHT_Standing_cm
0.05528 4 HEIGHT_Seated_cm
0.05414 1 GRIP_TEST_Left
0.03172 3 WEIGHT_Kg
0.00797 6 ARM_SPAN_cm

FCBF
Selected attributes: 1,2,5 : 3
GRIP_TEST_Left
GRIP_TEST_Right
HEIGHT_Standing_cm
17
PROBLEM
Feature selection methods
While these results are (probably) curious, there is no statistical significance we can attach to them...
(The SVM / SVM-RFE, Relief-F, and FCBF rankings are as on the previous slide.)
18
Time for a coffee-break 
19
Feature selection methods
Permutation test
Feature of interest: X. Class label variable: Y (say, G/N).

X     Y
4.3   G
2.1   N
1.8   G
2.3   G
3.2   N
...   ...

Let XG be the sample from class G, and XN the sample from class N.

A two-sample t-test can be used to test the hypothesis of equal means when XG and XN come from approximately normal distributions. If we cannot ascertain this condition, use PERMUTATION tests.

Quantity of interest: V = | mXG - mXN | (the difference between the two class means).
Observed value for our data: V*.

Question: What is the probability of observing V* (or larger) if there were no relationship between X and the class label Y?
20
Feature selection methods
Permutation test
[Figure: histogram of V for permuted labels for feature “1. ANTHRO- HEIGHT - Standing (cm)”; x-axis: abs(mean1 - mean2), y-axis: # occurrences. The observed value V* lies far in the right tail: there is a very small chance of obtaining the observed V* or larger. p-value = 0.0046.]
21
Feature selection methods
Permutation test

p-value   feature
0.0046    1. ANTHRO- HEIGHT - Standing (cm)
0.0058    1. ANTHRO- HEIGHT - Seated (cm)
0.0077    1. ANTHRO - GRIP TEST Right
0.0100    2.1 DT PACE BOWL - Average MPH
0.0123    1. ANTHRO-WEIGHT (Kg)
0.0159    1. ANTHRO - GRIP TEST - Left
0.0193    1. ANTHRO - ARM SPAN (cm)
0.0266    2.1 DT PACE BOWL - max MPH
0.0319    8.1 FT - SPRINT (40m)
0.0489    8.1 FT - SPRINT (30m)
22
The Dead Salmon
Lo and behold! Brain activity responding to the stimuli!

Neuroscientist Craig Bennett purchased a whole Atlantic salmon, took it to a lab at Dartmouth, and put it into an fMRI machine used to study the brain. The beautiful fish was to be the lab’s test object as they worked out some new methods.

So, as the fish sat in the scanner, they showed it “a series of photographs depicting human individuals in social situations.” To maintain the rigor of the protocol (and perhaps because it was hilarious), the salmon, just like a human test subject, “was asked to determine what emotion the individual in the photo must have been experiencing.”
23
Bonferroni correction for multiple comparisons
= the simplest and most conservative method to control the familywise error rate
If we increase the number of hypotheses in a test, we also increase the likelihood of witnessing a rare event, and therefore of declaring a difference when there is none. So, if the desired significance level for the whole family of n tests should be (at most) α, then the Bonferroni correction tests each individual hypothesis at a significance level of α/n.

In our case, with n = 50 features, the individual significance level is 0.05/50 = 0.001.
24
Feature selection methods
Permutation test

PROBLEM: None of the features survives the Bonferroni correction (p < 0.001 for significance level 0.05).

(The p-values and features are as in the table on slide 22.)
25
Feature selection methods
Permutation test
More PROBLEMs
1. If there are permutation tests in WEKA, they are hidden very well...
2. If there is Bonferroni correction in WEKA, it is hidden very well too...
Solution?
DIY...
26
Feature selection methods
Permutation test
Here is an algorithm for those of you with some programming experience:
(the null hypothesis is “no difference”, i.e., V = 0 in expectation; assume that the greater the V, the larger the difference)
1. Calculate the observed value V*. Choose the number of iterations, e.g., T = 10,000.
2. for i = 1:T
a) Permute the labels randomly
b) Calculate and store V(i) with the permuted labels
3. end (for)
4. Calculate the p-value as the proportion of V greater than or equal to V* .
5. If you do this experiment for n features, compare p with alpha/n, where alpha is your
chosen significance level (typically alpha = 0.05).
27
Feature selection methods
Permutation test
And here is a MATLAB script
% Permutation test (assume that there are no missing values)
clear, close, clc
X = xlsread('ECB U13 2010 talent testing data.xlsx',...
'U13 Talent Test Raw Data','G2:L27');
[~,Y] = xlsread('ECB U13 2010 talent testing data.xlsx',...
'U13 Talent Test Raw Data','F2:F27'); % symbolic label
[~,Names] = xlsread('ECB U13 2010 talent testing data.xlsx',...
'U13 Talent Test Raw Data','G1:L1'); % feature names
% Convert Y to numbers (1 selected, 2 not selected)
u = unique(Y); L = ones(size(Y)); L(strcmp(u(1),Y)) = 2;
T = 20000; % number of random permutations
continues on the next slide
28
Feature selection methods
And here is a MATLAB script
Permutation test
continued from previous slide...
for i = 1:T
    la = L(randperm(numel(L)));               % randomly permuted labels
    for j = 1:size(X,2)                       % for each feature
        fe = X(:,j);
        V(i,j) = abs(mean(fe(la == 1)) - mean(fe(la == 2)));
    end
end
% p-values for the features
for j = 1:size(X,2)
    V_star(j) = abs(mean(X(L == 1,j)) - mean(X(L == 2,j)));
    p(j) = mean(V(:,j) >= V_star(j));         % proportion of permuted V >= observed V*
    fprintf('%35s %.4f\n',Names{j},p(j))
end
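To reproduce a plot like the histogram on slide 21, the stored permutation values can be plotted directly (a sketch assuming the variables V and V_star from the script above; xline needs MATLAB R2018b or later):

% Optional: visualise the permutation distribution for feature j
j = 5;                                        % e.g., standing height
histogram(V(:,j))
hold on, xline(V_star(j), 'r', 'Observed value'), hold off
xlabel('abs(mean1 - mean2)'), ylabel('# occurrences')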
29
Feature selection methods
Permutation test
MATLAB output
1. ANTHRO - GRIP TEST - Left        0.0176
1. ANTHRO - GRIP TEST Right         0.0087
1. ANTHRO-WEIGHT (Kg)               0.0124
1. ANTHRO- HEIGHT - Seated (cm)     0.0059
1. ANTHRO- HEIGHT - Standing (cm)   0.0055
1. ANTHRO - ARM SPAN (cm)           0.0202
The numbers may vary slightly from one run to the next because of the random number generator; the larger the number of iterations T, the more stable they become.
The p-values are not Bonferroni-corrected. The correction should be applied if necessary (a possible way to do this is sketched below).
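A possible way to apply the correction to the p vector produced by the script (the values of alpha and n below are simply those used on slide 24):

% Bonferroni correction of the permutation-test p-values
alpha = 0.05;                    % family-wise significance level
n = 50;                          % number of tests in the whole study (slide 24);
                                 % use numel(p) if only these features are tested
survives = p < alpha / n;        % logical vector: which features survive
fprintf('%d of %d features survive the correction\n', sum(survives), numel(p))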
30
Time for a coffee-break 
31
Time for our classifiers!!!
The Classification tab
[Screenshot of the Classify tab: choose a classifier (SVM), choose a training/testing protocol, and, when everything is chosen, click the button to run.]
32
Where to find the results
The confusion
matrix
33
Where to find the results
Classification accuracy
(and classification error)
34
And a lot lot more ...
35
Thank you!
36