Discriminant Analysis

Discriminant analysis is used to determine which
variables discriminate between two or more
naturally occurring groups.
Computationally, discriminant function analysis is
very similar to analysis of variance (ANOVA).
Sunday, 12 April 2015, 10:07 PM
For example, an educational researcher may want
to investigate which variables discriminate
between high school graduates who decide (1) to
go to college, (2) to attend a trade or
professional school, or (3) to seek no further
training or education. For that purpose the
researcher could collect data on numerous
variables prior to students' graduation. After
graduation, most students will naturally fall into
one of the three categories. Discriminant
Analysis could then be used to determine which
variable(s) are the best predictors of students'
subsequent educational choice.
For example, a medical researcher may record
different variables relating to patients'
backgrounds in order to learn which variables
best predict whether a patient is likely to recover
completely (group 1), partially (group 2), or not at
all (group 3). A biologist could record different
characteristics of similar types (groups) of
flowers, and then perform a discriminant function
analysis to determine the set of characteristics
that allows for the best discrimination between
the types.
The data consist of five measurements on each of
32 skulls found in the southwestern and eastern
districts of Tibet.
1. Greatest length of skull (measure 1)
2. Greatest horizontal breadth of skull (measure 2)
3. Height of skull (measure 3)
4. Upper face length (measure 4)
5. Face breadth between outermost points of
cheekbones (measure 5)
There are also place and grouping variables.
The data can be divided into two groups. The
first comprises skulls 1 to 17 found in graves in
Sikkim and the neighbouring area of Tibet (Type
A skulls). The remaining 15 skulls (Type B skulls)
were picked up on a battlefield in the Lhasa
district and are believed to be those of native
soldiers from the eastern province of Khams.
These skulls were of particular interest since it
was thought at the time that Tibetans from
Khams might be survivors of a particular human
type, unrelated to the Mongolian and Indian types
that surrounded them.
There are two questions that might be of
interest for these data:
Do the five measurements discriminate between
the two assumed groups of skulls and can they be
used to produce a useful rule for classifying
other skulls that might become available?
Taking the 32 skulls together, are there any
natural groupings in the data and, if so, do they
correspond to the groups assumed?
Classification is an important component of
virtually all scientific research. Statistical
techniques concerned with classification are
essentially of two types. The first (cluster
analysis) aims to uncover groups of observations
from initially unclassified data. The second
(discriminant analysis) works with data that is
already classified into groups to derive rules for
classifying new (and as yet unclassified)
individuals on the basis of their observed variable
values.
Initially it is wise to take a look at your raw data.
Select matrix scatter
Use Define to select.
Select matrix variables and markers.
Note that greatest length of skull is above the list shown.
Use OK to accept.
While this diagram only allows us to assess the
group separation in two dimensions, it seems to
suggest that face breadth between outermost
points of cheekbones (meas5), greatest length
of skull (meas1), and upper face length (meas4)
provide the greatest discrimination between the
two skull types.
We shall now use Fisher’s linear discriminant
function to derive a classification rule for
assigning skulls to one of the two predefined
groups on the basis of the five measurements
available.
Now proceed to complete the analysis.
As before use the secondary screens to select the
grouping variable (place) and use Define Range.
Select the independents, use OK to run.
From the statistics button select
Now proceed to complete the analysis.
In the resulting descriptive output, the means and standard
deviations of each of the five measurements, for each type of
skull and overall, are given in the Group Statistics table.

Group Statistics
                                                                           Valid N (listwise)
Place / measurement                               Mean      Std. Deviation  Unweighted  Weighted
Sikkem or Tibet
  Greatest length of skull                        174.824   6.7475          17          17.000
  Greatest horizontal breadth of skull            139.353   7.6030          17          17.000
  Height of skull                                 132.000   6.0078          17          17.000
  Upper face length                                69.824   4.5756          17          17.000
  Face breadth between outermost points of
  cheek bones                                     130.353   8.1370          17          17.000
Lhasa
  Greatest length of skull                        185.733   8.6269          15          15.000
  Greatest horizontal breadth of skull            138.733   6.1117          15          15.000
  Height of skull                                 134.767   6.0263          15          15.000
  Upper face length                                76.467   3.9118          15          15.000
  Face breadth between outermost points of
  cheek bones                                     137.500   4.2384          15          15.000
Total
  Greatest length of skull                        179.938   9.3651          32          32.000
  Greatest horizontal breadth of skull            139.063   6.8412          32          32.000
  Height of skull                                 133.297   6.0826          32          32.000
  Upper face length                                72.938   5.3908          32          32.000
  Face breadth between outermost points of
  cheek bones                                     133.703   7.4443          32          32.000
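As a quick consistency check on the Group Statistics table, the Total means should equal the size-weighted averages of the two group means. The sketch below only re-uses numbers printed in the table; the variable and function names are illustrative and not part of the SPSS output.

```python
# Group sizes and group means copied from the Group Statistics table.
n_sikkem, n_lhasa = 17, 15
means_sikkem = [174.824, 139.353, 132.000, 69.824, 130.353]
means_lhasa = [185.733, 138.733, 134.767, 76.467, 137.500]
total_reported = [179.938, 139.063, 133.297, 72.938, 133.703]


def weighted_mean(mean_a, mean_b, n_a, n_b):
    """Size-weighted average of two group means."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


total_computed = [
    weighted_mean(a, b, n_sikkem, n_lhasa)
    for a, b in zip(means_sikkem, means_lhasa)
]

# Print the recomputed overall means next to the values reported by SPSS.
for computed, reported in zip(total_computed, total_reported):
    print(f"{computed:8.3f}  (table: {reported:.3f})")
```

Small discrepancies in the third decimal place would only reflect rounding of the printed group means.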
The within-group covariance matrices shown in the
Covariance Matrices table suggest that the sample values
differ to some extent; see Box’s test for equality of
covariances (Log Determinants and Test Results tables).

Covariance Matrices
(columns are measures 1 to 5, in the same order as the rows)

Place / measurement                             Meas. 1   Meas. 2   Meas. 3   Meas. 4   Meas. 5
Sikkem or Tibet
  Greatest length of skull (1)                   45.529    25.222    12.391    22.154    27.972
  Greatest horizontal breadth of skull (2)       25.222    57.805    11.875     7.519    48.055
  Height of skull (3)                            12.391    11.875    36.094     -.313     1.406
  Upper face length (4)                          22.154     7.519     -.313    20.936    16.769
  Face breadth between outermost points of
  cheek bones (5)                                27.972    48.055     1.406    16.769    66.211
Lhasa
  Greatest length of skull (1)                   74.424    -9.523    22.737    17.794    11.125
  Greatest horizontal breadth of skull (2)       -9.523    37.352   -11.263      .705     9.464
  Height of skull (3)                            22.737   -11.263    36.317    10.724     7.196
  Upper face length (4)                          17.794      .705    10.724    15.302     8.661
  Face breadth between outermost points of
  cheek bones (5)                                11.125     9.464     7.196     8.661    17.964
The within-group covariance matrices shown in the
Covariance Matrices table suggest that the sample values
differ to some extent, but according to Box’s test for
equality of covariances (tables Log Determinants and
Test Results) these differences are not statistically
significant (F(15,3490) = 1.2, p = 0.25).
Log Determinants
Place where skulls were found    Rank    Log Determinant
Sikkem or Tibet                  5       16.164
Lhasa                            5       15.773
Pooled within-groups             5       16.727
The ranks and natural logarithms of determinants
printed are those of the group covariance matrices.

Test Results
Box's M               22.371
F     Approx.          1.218
      df1             15
      df2           3489.901
      Sig.              .249
Tests null hypothesis of equal population covariance matrices.
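The Box's M statistic can be recovered from the Log Determinants table. The sketch below assumes the usual definition, M = (N - g) ln|S_pooled| - sum over groups of (n_i - 1) ln|S_i|, and uses only the printed log determinants and group sizes; the names are illustrative.

```python
# (group size, log determinant) pairs copied from the Log Determinants table.
groups = {"Sikkem or Tibet": (17, 16.164), "Lhasa": (15, 15.773)}
log_det_pooled = 16.727

n_total = sum(n for n, _ in groups.values())
df_within = n_total - len(groups)  # 32 cases - 2 groups = 30

# Box's M compares the pooled log determinant with the group log determinants.
box_m = df_within * log_det_pooled - sum(
    (n - 1) * log_det for n, log_det in groups.values()
)
print(f"Box's M = {box_m:.3f}")
```

The result agrees with the reported 22.371 up to the rounding of the printed log determinants.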
It appears that the equality of covariance matrices
assumption needed for Fisher’s linear discriminant
approach to be strictly correct is valid here.
In practice, Box’s test is not of great use since even if it
suggests a departure from the equality hypothesis, the
linear discriminant may still be preferable over a
quadratic function. Here we shall simply assume normality
for our data relying on the robustness of Fisher’s
approach to deal with any minor departure from the
assumption.
In the resulting discriminant analysis output, the eigenvalue
(here 0.93) represents the ratio of the between-group
sum of squares to the within-group sum of squares of
the discriminant scores. It is this criterion that is
maximized in discriminant function analysis.

Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .930 (a)     100.0           100.0          .694
a. First 1 canonical discriminant functions were used in the
analysis.
The canonical correlation is simply the Pearson
correlation between the discriminant function scores and
group membership coded as 0 and 1. For the skull data,
the canonical correlation value is 0.694, so that
0.694^2 × 100 = 48% of the variance in the discriminant
function scores can be explained by group differences.
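For a single discriminant function, the canonical correlation follows directly from the eigenvalue, since the eigenvalue is the between-to-within ratio and the squared correlation is the between-to-total ratio. A minimal sketch, using only the eigenvalue printed in the Eigenvalues table (variable names are illustrative):

```python
import math

eigenvalue = 0.930  # from the Eigenvalues table

# Squared canonical correlation = between / (between + within) = e / (1 + e).
explained_share = eigenvalue / (1 + eigenvalue)
canonical_correlation = math.sqrt(explained_share)

print(f"canonical correlation = {canonical_correlation:.3f}")
print(f"variance explained    = {100 * explained_share:.0f}%")
```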
Wilks' Lambda provides a test for assessing the null
hypothesis that in the population the vectors of means of
the five measurements are the same in the two groups.
The lambda coefficient is defined as the proportion of
the total variance in the discriminant scores not
explained by differences among the groups, here 51.8%.
The formal test confirms that the sets of five mean skull
measurements differ significantly between the two sites
(χ²(5) = 18.1, p = 0.003). If the equality of mean vectors
hypothesis had been accepted, there would be little point
in carrying out a linear discriminant function analysis.

Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .518            18.083       5    .003
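The chi-square value in the Wilks' Lambda table can be reproduced from lambda itself. The sketch below assumes Bartlett's approximation, chi-square = -(N - 1 - (p + g)/2) ln(lambda) with p(g - 1) degrees of freedom, which matches the printed value up to rounding; the names are illustrative.

```python
import math

n_cases, n_variables, n_groups = 32, 5, 2
wilks_lambda = 0.518   # from the Wilks' Lambda table
eigenvalue = 0.930     # from the Eigenvalues table

# With one discriminant function, lambda = 1 / (1 + eigenvalue).
lambda_from_eigenvalue = 1 / (1 + eigenvalue)

# Bartlett's chi-square approximation and its degrees of freedom.
chi_square = -(n_cases - 1 - (n_variables + n_groups) / 2) * math.log(wilks_lambda)
degrees_of_freedom = n_variables * (n_groups - 1)

print(f"lambda from eigenvalue = {lambda_from_eigenvalue:.3f}")
print(f"chi-square = {chi_square:.2f} on {degrees_of_freedom} df")
```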
Next we come to the Classification Function
Coefficients. This table is displayed as a result of
checking Fisher’s in the Statistics sub-dialogue box.
Classification Function Coefficients
                                               Place where skulls were found
                                               Sikkem or Tibet    Lhasa
Greatest length of skull                          1.468             1.558
Greatest horizontal breadth of skull              2.361             2.205
Height of skull                                   2.752             2.747
Upper face length                                  .775              .952
Face breadth between outermost points of
cheek bones                                        .195              .372
(Constant)                                     -514.956          -545.419
Fisher's linear discriminant functions
The table can be used to find Fisher’s linear discriminant
function by simply subtracting the coefficients given for
each variable in the two groups, giving the following result:

                                                  Sikkem or Tibet   Lhasa    Difference
Greatest length of skull (measure 1)              1.468             1.558    -0.090
Greatest horizontal breadth of skull (measure 2)  2.361             2.205     0.156
Height of skull (measure 3)                       2.752             2.747     0.005
Upper face length (measure 4)                     0.775             0.952    -0.177
Face breadth between outermost points of
cheekbones (measure 5)                            0.195             0.372    -0.177

Z = -0.090 meas1 + 0.156 meas2 + 0.005 meas3 - 0.177 meas4 - 0.177 meas5

The difference between the constant coefficients
provides the sample mean of the discriminant function
scores
z̄ = 30.463
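The subtraction described above can be sketched in a few lines. The coefficients are copied from the Classification Function Coefficients table; the Sikkem-minus-Lhasa order follows the Difference column, and the variable names are illustrative.

```python
# Coefficients copied from the Classification Function Coefficients table.
measures = ["meas1", "meas2", "meas3", "meas4", "meas5"]
sikkem = [1.468, 2.361, 2.752, 0.775, 0.195]
lhasa = [1.558, 2.205, 2.747, 0.952, 0.372]
const_sikkem, const_lhasa = -514.956, -545.419

# Sikkem-minus-Lhasa differences, matching the Difference column.
differences = [round(s - l, 3) for s, l in zip(sikkem, lhasa)]
const_difference = round(const_sikkem - const_lhasa, 3)

for name, diff in zip(measures, differences):
    print(f"{name}: {diff:+.3f}")
print(f"constant difference: {const_difference:.3f}")
```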
The coefficients defining Fisher’s linear discriminant
function in the equation are proportional to the
unstandardised coefficients given in the “Canonical
Discriminant Function Coefficients” table which is
produced when Unstandardised is checked in the
Statistics sub-dialogue box.
Canonical Discriminant Function Coefficients
                                               Function 1
Greatest length of skull                          .048
Greatest horizontal breadth of skull             -.083
Height of skull                                  -.003
Upper face length                                 .095
Face breadth between outermost points of
cheek bones                                       .095
(Constant)                                     -16.222
Unstandardized coefficients
The discriminant scores computed from these coefficients can be
compared with the average of their group means (shown in the
Functions at Group Centroids table) to allocate skulls into
groups. Here the threshold against which a skull’s discriminant
score is evaluated is
0.0585 = ½ (-0.877 + 0.994)

Functions at Group Centroids
Place where skulls were found    Function 1
Sikkem or Tibet                  -.877
Lhasa                             .994
Unstandardized canonical discriminant
functions evaluated at group means

Thus new skulls with discriminant scores above 0.0585
would be assigned to the Lhasa site (type B);
otherwise, they would be classified as type A.
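The allocation rule can be written down directly from the two group centroids. This is a minimal sketch: the centroids come from the Functions at Group Centroids table, while the function name and the example scores are made up for illustration.

```python
# Group centroids copied from the Functions at Group Centroids table.
centroid_sikkem, centroid_lhasa = -0.877, 0.994

# The threshold is the midpoint between the two centroids.
threshold = (centroid_sikkem + centroid_lhasa) / 2


def allocate(score, cutoff=threshold):
    """Return the skull type suggested by a discriminant score."""
    return "B (Lhasa)" if score > cutoff else "A (Sikkem or Tibet)"


print(f"threshold = {threshold:.4f}")
for score in (-1.2, 0.0, 0.8):  # made-up scores, for illustration only
    print(score, "->", allocate(score))
```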
When variables are measured on different scales, the
magnitude of an unstandardised coefficient provides
little indication of the relative contribution of the
variable to the overall discrimination. The “Standardized
Canonical Discriminant Function Coefficients” listed
attempt to overcome this problem by rescaling of the
variables to unit standard deviation.
Standardized Canonical Discriminant Function Coefficients
                                               Function 1
Greatest length of skull                          .367
Greatest horizontal breadth of skull             -.578
Height of skull                                  -.017
Upper face length                                 .405
Face breadth between outermost points of
cheek bones                                       .627
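The rescaling can be checked numerically: each standardized coefficient is the unstandardized coefficient multiplied by the pooled within-group standard deviation of that variable. The sketch below takes the variances from the diagonals of the Covariance Matrices table and the coefficients from the two coefficient tables; agreement is only approximate because the printed unstandardized coefficients are rounded to three decimals.

```python
import math

n_sikkem, n_lhasa = 17, 15
# Diagonals of the two within-group covariance matrices (measures 1 to 5).
var_sikkem = [45.529, 57.805, 36.094, 20.936, 66.211]
var_lhasa = [74.424, 37.352, 36.317, 15.302, 17.964]
unstandardized = [0.048, -0.083, -0.003, 0.095, 0.095]
reported = [0.367, -0.578, -0.017, 0.405, 0.627]

# Pooled within-group standard deviation of each measurement.
pooled_sd = [
    math.sqrt(((n_sikkem - 1) * va + (n_lhasa - 1) * vb) / (n_sikkem + n_lhasa - 2))
    for va, vb in zip(var_sikkem, var_lhasa)
]
standardized = [b * sd for b, sd in zip(unstandardized, pooled_sd)]

for computed, table_value in zip(standardized, reported):
    print(f"{computed:+.3f}  (table: {table_value:+.3f})")
```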
For our data, such standardisation is not necessary since
all skull measurements were in millimetres.
Standardization should, however, not matter much since
the within-group standard deviations were similar across
different skull measures. According to the standardized
coefficients, skull height (meas3) seems to contribute
little to discriminating between the two types of skulls.
A question of some importance about a discriminant
function is: how well does it perform? One possible
method of evaluating performance is to apply the derived
classification rule to the data set and calculate the
misclassification rate.
Repeat using the following classification.
Now proceed to complete the analysis.
This is known as the re-substitution estimate and the
corresponding results are shown in the Original part of
the Classification Results table. According to this
estimate, 81.3% of skulls can be correctly classified as
type A or type B on the basis of the discriminant rule.
Classification Results (b, c)
                                                       Predicted Group Membership
                         Place where skulls were found   Sikkem or Tibet   Lhasa    Total
Original             Count   Sikkem or Tibet             14                 3        17
                             Lhasa                        3                12        15
                     %       Sikkem or Tibet             82.4              17.6     100.0
                             Lhasa                       20.0              80.0     100.0
Cross-validated (a)  Count   Sikkem or Tibet             12                 5        17
                             Lhasa                        6                 9        15
                     %       Sikkem or Tibet             70.6              29.4     100.0
                             Lhasa                       40.0              60.0     100.0
a. Cross validation is done only for those cases in the analysis. In cross
validation, each case is classified by the functions derived from all cases other
than that case.
b. 81.3% of original grouped cases correctly classified.
c. 65.6% of cross-validated grouped cases correctly classified.
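The percentages in footnotes b and c are simply the diagonal counts of each classification table divided by the total number of skulls. A small sketch using the counts from the Classification Results table (the function name is illustrative):

```python
# Rows: actual group (Sikkem or Tibet, Lhasa); columns: predicted group.
original = [[14, 3], [3, 12]]
cross_validated = [[12, 5], [6, 9]]


def percent_correct(table):
    """Share of cases on the diagonal of a square classification table."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return 100 * correct / total


print(f"original:        {percent_correct(original):.2f}%")
print(f"cross-validated: {percent_correct(cross_validated):.2f}%")
```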
However, estimating misclassification rates in this way is
known to be overly optimistic and several alternatives for
estimating misclassification rates in discriminant analysis
have been suggested. One of the most commonly used of
these alternatives is the so-called leaving-one-out
method, in which the discriminant function is first
derived from only n – 1 sample members, and then used to
classify the observation left out. The procedure is
repeated n times, each time omitting a different
observation.
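The leaving-one-out procedure can be sketched as follows. This is an illustrative NumPy implementation of a two-group Fisher rule with a midpoint cut-off, not the SPSS routine, and it runs on synthetic stand-in data because the raw skull measurements are not reproduced here; all function names are assumptions.

```python
import numpy as np


def fit_fisher_rule(X, y):
    """Fit a two-group Fisher linear discriminant; y holds labels 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    mean0, mean1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
              + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    weights = np.linalg.solve(pooled, mean1 - mean0)
    cutoff = weights @ (mean0 + mean1) / 2  # midpoint of projected group means
    return weights, cutoff


def classify(weights, cutoff, X):
    """Assign label 1 to rows whose discriminant score exceeds the cut-off."""
    return (X @ weights > cutoff).astype(int)


def resubstitution_rate(X, y):
    """Proportion correct when the rule is applied to its own training data."""
    weights, cutoff = fit_fisher_rule(X, y)
    return float(np.mean(classify(weights, cutoff, X) == y))


def leave_one_out_rate(X, y):
    """Refit on n - 1 cases, classify the omitted case, repeat n times."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        weights, cutoff = fit_fisher_rule(X[keep], y[keep])
        hits += int(classify(weights, cutoff, X[i:i + 1])[0] == y[i])
    return hits / len(X)


# Synthetic stand-in data (NOT the skull measurements): 17 and 15 cases, 5 variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (17, 5)), rng.normal(1.0, 1.0, (15, 5))])
y = np.array([0] * 17 + [1] * 15)
print("re-substitution:", resubstitution_rate(X, y))
print("leave-one-out:  ", leave_one_out_rate(X, y))
```

On real data the leave-one-out rate is typically lower than the re-substitution rate, because each case is classified by a rule that never saw it.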
The Cross-validated part of the Classification Results
table shows the results from applying this procedure.
The correct classification rate now drops to 65.6%, a
considerably lower success rate than suggested by the
simple re-substitution rule.
We now turn to applying cluster analysis to the skull
data. Here the prior classification of the skulls will be
ignored and the data simply “explored” to see if there is
any evidence of interesting “natural” groupings of the
skulls and, if there is, whether these groups correspond
in any way with Morant’s classification.
Here we will use two hierarchical agglomerative
clustering procedures, complete and average linkage
clustering and then k-means clustering.
Select Analyze > Classify > Hierarchical Cluster
In the usual way select the variables of interest
Select the plots desired
Select the desired method
Now proceed to complete the analysis.
The complete linkage clustering output shows which
skulls or clusters are combined at each stage of the
cluster procedure. First, skull 8 is joined with skull 13
since the Euclidean distance between these two skulls is
smaller than the distance between any other pair of
skulls. The distance is shown in the column labelled
“Coefficients”.
Second, skull 15 is joined with skull 17 and so on.
Agglomeration Schedule
         Cluster Combined                       Stage Cluster First Appears
Stage    Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2         Next Stage
 1        8          13           3.041          0           0                 4
 2       15          17           5.385          0           0                14
 3        9          23           5.701          0           0                11
 4        8          19           5.979          1           0                 8
 5       24          28           6.819          0           0                17
 6       21          22           6.910          0           0                21
 7       16          29           7.211          0           0                15
 8        7           8           8.703          0           4                13
 9        2           3           8.874          0           0                14
10       27          30           9.247          0           0                23
11        5           9           9.579          0           3                13
12       18          32           9.874          0           0                18
13        5           7          10.700         11           8                24
14        2          15          11.522          9           2                28
15        6          16          12.104          0           7                22
16       14          25          12.339          0           0                21
17       24          31          13.528          5           0                23
18       11          18          13.537          0          12                22
19        1          20          13.802          0           0                26
20        4          10          14.062          0           0                28
21       14          21          15.588         16           6                25
22        6          11          16.302         15          18                24
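How such a schedule arises can be sketched with a small complete linkage routine: at each stage the two clusters with the smallest "furthest-neighbour" Euclidean distance are merged, and the merged cluster keeps the lower label, as in the table. The points below are made up for illustration and are not skull measurements; the function name is an assumption, not an SPSS call.

```python
from itertools import combinations
from math import dist


def complete_linkage(points):
    """Return the merge schedule as (cluster_1, cluster_2, distance) tuples."""
    clusters = {i: [i] for i in range(len(points))}
    schedule = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Complete linkage: distance between the two most distant members.
            d = max(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merged cluster keeps label a
        schedule.append((a, b, d))
    return schedule


# Made-up two-dimensional points, for illustration only.
toy_points = [(0, 0), (0, 1), (5, 0), (5, 3), (20, 0)]
for stage, (a, b, d) in enumerate(complete_linkage(toy_points), start=1):
    print(f"stage {stage}: join {a} and {b} at distance {d:.3f}")
```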
The dendrogram is simpler to interpret.
The dendrogram may, on occasions, also be useful in
deciding the number of clusters in a data set with a
sudden increase in the size of the difference in adjacent
steps taken as an informal indication of the appropriate
number of clusters to consider.
For the dendrogram, a fairly large jump occurs between
stages 29 and 30 (indicating a three-group solution) and
an even bigger one between this penultimate and the
ultimate fusion of groups (a two-group solution).
For an alternate approach use
Now proceed to produce the plot
The initial steps agree with the complete linkage solution,
but eventually the trees diverge with the average linkage
dendrogram successively adding small clusters to one
increasingly large cluster. For the average linkage
dendrogram it is not clear where to cut the dendrogram
to give a specific number of groups.
Since we believe there are two groups a final cluster
analysis, employing this information, may be attempted.
The variable selection and number of clusters are shown.
In the resulting cluster output, the Initial Cluster
Centers table displays the starting values used by the
algorithm.

Initial Cluster Centers
                                               Cluster 1   Cluster 2
Greatest length of skull                       200.0       167.0
Greatest horizontal breadth of skull           139.5       130.0
Height of skull                                143.5       125.5
Upper face length                               82.5        69.5
Face breadth between outermost points of
cheek bones                                    146.0       119.5
The Iteration History table indicates that the algorithm
has converged.

Iteration History (a)
             Change in Cluster Centers
Iteration    1         2
1            16.626    16.262
2              .000      .000
a. Convergence achieved due to no or small
change in cluster centers. The maximum
absolute coordinate change for any center is
.000. The current iteration is 2. The minimum
distance between initial centers is 48.729.
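The iteration history reflects the standard k-means loop: assign each case to its nearest centre, recompute the centres, and stop once the centres no longer move. The sketch below is a plain-Python illustration on made-up points; it is not the SPSS implementation, and the function name and tolerance are assumptions.

```python
from math import dist


def k_means(points, initial_centers, max_iter=10, tol=1e-9):
    """Return (labels, centers, iterations) for a basic k-means run."""
    centers = [tuple(c) for c in initial_centers]
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centre for each point.
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's centre
        change = max(dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if change <= tol:  # no centre moved: converged
            break
    return labels, centers, iteration


# Made-up points and starting centres, for illustration only.
toy_points = [(1, 1), (2, 1), (1, 2), (9, 9), (8, 9), (9, 8)]
labels, centers, iterations = k_means(toy_points, [(1, 1), (9, 9)])
print(labels, centers, iterations)
```

In this toy run the centres move in the first iteration and stay put in the second, which is the same pattern as in the Iteration History table.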
The Final Cluster Centers table describes the final
cluster solution.

Final Cluster Centers
                                               Cluster 1   Cluster 2
Greatest length of skull                       188.4       174.1
Greatest horizontal breadth of skull           141.3       137.6
Height of skull                                135.8       131.6
Upper face length                               77.6        69.7
Face breadth between outermost points of
cheek bones                                    138.5       130.4
The Number of Cases in each Cluster table gives the size
of each cluster in the final solution.

Number of Cases in each Cluster
Cluster    1    13.000
           2    19.000
Valid           32.000
Missing           .000
How does the k-means two-group solution compare with
the original classification of the skulls into types A and
B?
We can investigate this by first using the Save button on
the k-Means Cluster Analysis dialogue box to save cluster
membership for each skull in the Data View spreadsheet.
The new categorical variable now available (labelled
QCL_1) can be cross-tabulated with assumed skull type
(variable place). The display shows the resulting table;
the k-means clusters largely agree with the skull types as
originally suggested by Morant, with cluster 1 consisting
primarily of Type B skulls (those from Lhasa) and cluster
2 containing mostly skulls of Type A (from Sikkim and
the neighbouring area of Tibet). Only six skulls are
wrongly placed.
                            Assumed type of skull
Cluster Number of Case      A (Count)    B (Count)
1                            2           11
2                           15            4
The k-means clusters largely agree with the skull types
as originally suggested, with cluster 1 consisting primarily
of Type B skulls (those from Lhasa) and cluster 2
containing mostly skulls of Type A (from Sikkim and the
neighbouring area of Tibet). Only six skulls are wrongly
placed.
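The count of wrongly placed skulls can be read off the cross-tabulation once each cluster is matched to the skull type it mostly contains. A small sketch, with the counts taken from the cross-tabulation (cluster 1: 2 Type A and 11 Type B; cluster 2: 15 Type A and 4 Type B); the function name is illustrative. Cluster labels are arbitrary, so both possible matchings are tried and the better one is kept.

```python
from itertools import permutations

# Cross-tabulation of k-means cluster (rows) by assumed skull type (columns).
crosstab = {1: {"A": 2, "B": 11}, 2: {"A": 15, "B": 4}}


def best_agreement(table):
    """Return (agreeing, misplaced) under the best cluster-to-type matching."""
    clusters = list(table)
    types = list(next(iter(table.values())))
    total = sum(sum(row.values()) for row in table.values())
    agreeing = max(
        sum(table[c][t] for c, t in zip(clusters, matching))
        for matching in permutations(types)
    )
    return agreeing, total - agreeing


agreeing, misplaced = best_agreement(crosstab)
print(f"{agreeing} skulls agree, {misplaced} are wrongly placed")
```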