Advanced Statistical Methods for Research, Math 736/836
Cluster Analysis
Part 1: Hierarchical Clustering
Watch out for the occasional paper clip!
©2009 Philip J. Ramsey, Ph.D.
 Yet another important multivariate exploratory method is referred
to as Cluster Analysis.
 Once again we are studying a multivariate method that is by itself
the subject of numerous textbooks, websites, and academic courses.
 The Classification Society of North America
(http://www.classification-society.org/csna/csna.html) deals
extensively with Cluster Analysis and related topics.
 The Journal of Classification (http://www.classification-society.org/csna/joc.html) publishes numerous research papers on
Cluster Analysis.
 Cluster Analysis, like PCA, is a method of analysis we refer to as unsupervised learning.
 A good text is Massart, D. L. and Kaufman, L. (1983), The
Interpretation of Analytical Chemical Data by the Use of Cluster
Analysis, New York: John Wiley & Sons.
 Supervised Learning refers to the scenario where we have a
predetermined structure among the variables or observations.
 As an example, some of the variables are considered responses and
the remaining variables are predictors or inputs to a model. Multiple
Regression analysis is a good example of this type of supervised
learning.
 Another type of supervised learning occurs when we have a
predetermined classification structure among the observations and
we attempt to develop a classification model that accurately predicts
that structure as a function of covariates (variables) in the dataset.
 Logistic regression, Discriminant Analysis, and CART modeling
are examples of this type of supervised learning.
 In Cluster Analysis we attempt to estimate an empirical
classification scheme among the observations or variables or both.
 Clustering is a multivariate technique of grouping observations
together that are considered similar in some manner – usually based
upon a distance measure.
 Clustering can incorporate any number of variables for N
observations.
 The variables must be numeric variables for which numerical
differences or distances make sense – hierarchical clustering in JMP
allows nominal and ordinal variables under certain conditions.
 The common situation is that the N observations are not scattered uniformly throughout the space defined by the variables, but rather they form clumps, or locally dense areas, or modes, or clusters.
 The goal of Cluster Analysis is the identification of these naturally occurring clusters, which helps to characterize the distribution of the N observations.
 Basically clustering consists of a set of algorithms to explore
hidden structure among the observations.
 The goal is to separate the observations into groups or clusters such
that observations within a cluster are as homogeneous as possible
and the different groups are as heterogeneous as possible.
 Often we have no a priori hypotheses about the nature of the
possible clusters and rely on the algorithms to define the clusters.
 Identifying a meaningful set of groupings from cluster analysis is
as much or more a subject matter task as a statistical task.
 Generally, no formal methods of inference are used in cluster analysis; it is strictly exploratory, although some t and F tests may be used.
 In some applications of cluster analysis, experts may have a predetermined number of clusters that should exist; however, the algorithm determines the composition of the clusters.
 Cluster Analysis techniques implemented in JMP generally fall into
two broad categories.
 Hierarchical Clustering, where we have no preconceived notion of how many natural clusters may exist; it is a combining process;
 K-means Clustering where we have a predetermined idea of
the number of clusters that may exist. Everitt and Dunn refer to
this as “Optimization Methods.”
 A subset of K-means Clustering is referred to as mixture
models analysis (models refer to multivariate probability
distributions usually assumed to be Normal).
 If one has a very large dataset, say > 2000 records (depending on computing resources), then the K-means approach might be used due to the large number of possible classification groups that must be considered.
 The cluster structures have 4 basic forms:
 Disjoint Clusters where each object can be in only one cluster
– K-means clustering falls in this category;
 Hierarchical Clusters where one cluster may be contained entirely within a superior cluster;
 Overlapping Clusters where objects can belong
simultaneously to two or more clusters. Often constraints are
placed on the amount of overlapping objects in clusters;
 Fuzzy Clusters are defined by probabilities of membership in
clusters for each object. The clusters can be any of the three types
listed above.
 The most common types of clusters used in practice are disjoint or hierarchical.
 Hierarchical clustering is also known as agglomerative hierarchical
clustering because we start with a set of N single member clusters
and then begin combining them based upon various distance criteria.
 The process ends when we have a final, single cluster containing N
members.
 The result of hierarchical clustering is presented graphically by way of a dendrogram or tree diagram, as in the sketch below.
 One problem with the hierarchical method is that a large number of
possible classification schemes are developed and the researcher has
to decide which of the schemes is most appropriate.
 Two-way hierarchical clustering can also be performed where we
simultaneously cluster on the observations and the variables.
 Clustering among variables is typically based upon correlation
measures.
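The course carries out hierarchical clustering in JMP; as a rough companion sketch, the same agglomerative process can be run in Python with SciPy. The data and labels below are made up for illustration, not from the course files.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))                     # 10 observations, 4 numeric variables

Z = linkage(X, method="ward")                    # agglomerate 10 singletons down to 1 cluster
dendrogram(Z, labels=[f"obs{i}" for i in range(10)])
plt.ylabel("joining distance")
plt.show()
```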
 Clustering of observations is typically based upon Euclidean
distance measures between the clusters.
 We typically try to find clusters of observations such that the distances (dissimilarities) between the clusters are maximized for a given set of clusters – as different as possible.
 There are a number of different methods by which the distances
between clusters are computed and the methods usually give
different results in terms of the cluster compositions.
 Example: The dataset BirthDeath95.JMP contains statistics on 25
nations from 1995. We will introduce hierarchical clustering using
the JMP Cluster platform, which is located in the “Multivariate
Methods” submenu. The platform provides most of the popular
clustering algorithms.
 Example continued: The procedure
begins with 25 individual clusters and
combines observations into clusters
until finally only a single cluster of 25
observations exists. The researcher must
determine how many clusters are most
appropriate. In JMP the user can
dynamically select the number of
clusters by clicking and dragging the
diamond above or below the
dendrogram (see picture to right) and
see how the memberships change in
order to come to a final set of clusters.
In the dendrogram to the right, the
countries are assigned markers
according to their membership in 1 of
the 4 clusters designated.
 Example continued: Below is the cluster history from JMP. Notice that Greece and
Italy are the first cluster
formed followed by Australia
and USA. At the 9th stage
Australia, USA, and Argentina
combine into a cluster.
Eventually Greece and Italy
join that cluster.
 The countries combine into
fewer clusters at each stage
until there exists only 1 cluster
at the top of the dendrogram or
tree.
Clustering History

Number of Clusters   Distance        Leader        Joiner
24                   0.141569596     Greece        Italy
23                   0.204865881     Australia     USA
22                   0.215801094     Philippines   Vietnam
21                   0.216828958     Cameroon      Nigeria
20                   0.226494960     Egypt         India
19                   0.370451385     Ethiopia      Somalia
18                   0.415752930     Chile         Costa Rica
17                   0.518773384     China         Indonesia
16                   0.574383932     Argentina     Australia
15                   0.609473010     Chile         Mexico
14                   0.637642141     Cameroon      Kenya
13                   0.701263019     Chile         Thailand
12                   0.739182206     China         Philippines
11                   0.744788865     Argentina     Greece
10                   0.877722286     Bolivia       Nicaragua
9                    0.894878623     Haiti         Zambia
8                    1.073430799     Bolivia       Egypt
7                    1.135510496     Chile         Kuwait
6                    1.595721560     Bolivia       Cameroon
5                    1.829760843     Chile         China
4                    2.246948815     Ethiopia      Haiti
3                    2.714981909     Argentina     Chile
2                    3.296092971     Bolivia       Ethiopia
1                    7.702060221     Argentina     Bolivia
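For readers working outside JMP, a hedged sketch: SciPy's linkage matrix records the same information as JMP's Clustering History, one merge per row with the distance bridged. The data below are simulated stand-ins, not the BirthDeath95.JMP values.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 4))                     # stand-in for 25 countries, 4 variables

Z = linkage(X, method="ward")
n = X.shape[0]
for step, (a, b, dist, size) in enumerate(Z, start=1):
    print(f"{n - step:>2} clusters remain  distance = {dist:.4f}  "
          f"joined nodes {int(a)} and {int(b)} (new cluster size {int(size)})")
```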
 Example continued: The cluster designations can be saved to
the data table for further analysis – option under the red arrow in
the Report Window. Graphical analysis is often very important to
understanding the cluster structure.
 In this case we will use a new feature in JMP 8 called Graph
Builder located as the first option in the Graph Menu. Graph
Builder basically allows the user to construct trellis graphics.
 The interface for Graph Builder allows the user to drag and drop
variables from the Select Columns window to various areas of the
graph builder template.
 The user simply tries various combinations until a desired graph
is constructed. By right clicking on the graph it is possible to
control what type of display appears on the graph. As an example,
one may prefer box plots or histograms for each cell of the plot.
 The next slide shows a Graph Builder display for the 4 clusters.
 Example continued: Graph Builder display of the clusters.
From the graph can
you see how the four
variables define the
four clusters? As an
example, how do the
clusters vary for
Baby Mort?
 Example continued: Below we show a Fit Y by X plot of Birth
Rate vs. Death Rate with 90% density ellipses for each cluster.
[Figure: Bivariate Fit of Death Rate by Birth Rate with 90% normal density ellipses for Clusters 1 through 4; individual countries are labeled by name.]
 Example continued: Below is a Bubble plot of the data. Note
that circles are colored by cluster number.
 As mentioned, hierarchical clustering determines the clusters based upon distance measures. JMP supports the five most common types of distance measures used to create clusters.
 The goal at each stage of clustering is to combine clusters that are
most similar in terms of distance between the clusters. The different
measures of inter-cluster distance can arrive at very different
clustering sequences.
 Average Linkage: the distance between two clusters is the average distance between pairs of observations, one from each cluster. Average linkage tends to join clusters with small variances and is slightly biased toward producing clusters with the same variance. The distance formula is
$$ d_{AB} = \frac{1}{n_A n_B} \sum_{i \in A} \sum_{j \in B} d_{ij} $$
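As a sketch of the formula above, with two small made-up clusters A and B, the average-linkage distance is just the mean of all between-cluster pairwise distances:

```python
import numpy as np
from scipy.spatial.distance import cdist

def average_linkage(A, B):
    """Mean of all pairwise Euclidean distances, one observation from each cluster."""
    return cdist(A, B).mean()                        # (1 / (n_A * n_B)) * sum of d_ij

A = np.array([[0.0, 0.0], [1.0, 0.0]])               # made-up cluster A (n_A = 2)
B = np.array([[4.0, 3.0], [5.0, 3.0], [4.0, 4.0]])   # made-up cluster B (n_B = 3)
print(average_linkage(A, B))
```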
 Centroid Method: the distance between two clusters is defined
as the squared Euclidean distance between their means. The
centroid method is more robust to outliers than most other
hierarchical methods but in other respects might not perform as
well as Ward's method or Average Linkage. Distance for the
centroid method is
$$ d_{AB} = \left\lVert \bar{X}_A - \bar{X}_B \right\rVert^2 $$
 Ward's Method: For each of k clusters let ESSk be the sum of
squared deviations of each item in the cluster from the centroid of
the cluster. If there are currently k clusters then the total ESS =
ESS1 + ESS2 + …+ ESSk. At each stage all possible unions of
cluster pairs are tried and the two clusters providing the least
increase in ESS are combined. Initially ESS = 0 for the N
individual clusters and for the final single cluster
$$ ESS = \sum_{i=1}^{N} \left( x_i - \bar{x} \right)' \left( x_i - \bar{x} \right) $$
 Ward's Method: At each stage, the method is biased toward
creating clusters with the same numbers of observations. For
Ward’s method the distance between clusters is calculated as
$$ d_{AB} = \frac{\left\lVert \bar{X}_A - \bar{X}_B \right\rVert^2}{\dfrac{1}{n_A} + \dfrac{1}{n_B}} $$
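A corresponding sketch of the Ward distance above, again with made-up clusters:

```python
import numpy as np

def ward_distance(A, B):
    """Squared distance between cluster means divided by (1/n_A + 1/n_B)."""
    diff = A.mean(axis=0) - B.mean(axis=0)
    return (diff @ diff) / (1.0 / len(A) + 1.0 / len(B))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 3.0], [5.0, 3.0], [4.0, 4.0]])
print(ward_distance(A, B))
```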
 Single Linkage: the distance between two clusters is the
minimum distance between an observation in one cluster and an
observation in the other cluster. Clusters with the smallest distance
are joined at each stage. Single linkage has many desirable
theoretical properties, but has performed poorly in Monte Carlo
studies (Milligan 1980). The inter-cluster distance measure is
d AB  min  dij 
iA
jB
 Single Linkage: As mentioned this method has done poorly in
Monte Carlo studies, however it is the only clustering method that
can detect long, string-like clusters often referred to as chains. It is
also very good at detecting irregular, non-ellipsoidal shaped
clusters. Ward’s method for example assumes that the underlying
clusters are approximately ellipsoidal in shape.
 Complete Linkage: the distance between two clusters is the
maximum distance between an observation in one cluster and an
observation in the other cluster. At each stage pairs of clusters are
joined that have the smallest distance. Complete linkage is strongly
biased toward producing clusters with roughly equal diameters and
can be severely distorted by moderate outliers (Milligan 1980).
Distance for the Complete linkage cluster method is
d AB  max  dij 
iA
jB
 Example: The following is a simple example with 5
observations to illustrate the idea of clustering. We will use the
complete linkage method. We will start with a 5 by 5 symmetric
matrix of Euclidean distances between the 5 observations.
At stage 1 we combine #3 and #5 to form a cluster since they are the closest. At stage 2 we compute a new distance matrix and join #2 and #4 since they are the closest.
0

9 0



3 7 0



6
5
9
0


11 10 2 8 0 
d (35)1  max  dij   11
iA
jB
d (35)2  max  dij   10
iA
jB
d (35)4  max  dij   9
0

11 0



10 9 0 


9
6
5
0


iA
jB
 Example continued: At stage 3 we compute a new distance
matrix
Since cluster (24) and #1
are closest they are joined
into a new cluster.
$$ d_{(35)(24)} = \max_{i \in A,\; j \in B} d_{ij} = 10, \qquad d_{(24)1} = \max_{i \in A,\; j \in B} d_{ij} = 9 $$

$$ D = \begin{bmatrix} 0 & & \\ 10 & 0 & \\ 11 & 9 & 0 \end{bmatrix} $$

where the rows and columns correspond to clusters (35), (24), and #1.
Finally at the last stage cluster (241) is joined with cluster (35) to
create the final cluster of 5 observations. The clustering stages
can easily be visualized in this simple case without a
dendrogram.
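This worked example can be checked against SciPy's complete-linkage implementation; the condensed distance vector below encodes the same 5-by-5 matrix. This is an illustrative check, not part of the course materials.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Condensed distances d12, d13, d14, d15, d23, d24, d25, d34, d35, d45 from the matrix above.
d = np.array([9, 3, 6, 11, 7, 5, 10, 9, 2, 8], dtype=float)
Z = linkage(d, method="complete")
print(Z)
# Z lists merges in order; observations are 0-indexed, so #3 and #5 appear as 2 and 4.
# Expected: {#3,#5} at distance 2, {#2,#4} at distance 5, #1 joins (24) at 9, final merge at 11.
```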
 We next work through the use of JMP for Hierarchical clustering.
As mentioned Clustering is one of the platforms for Multivariate
Methods and is located under that submenu.
For Hierarchical clustering, select the desired distance measure. Ward is the default.
 Select the columns containing the data on which the hierarchical
clustering will be performed. If the rows of the data matrix are
identified by a label column then put that variable in the Label box.
If you do not want the data standardized prior to clustering, then
deselect the Standardize Data default. The clusters are very scale
dependent, so many experts advise standardization if scales are not
commensurate.
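A minimal sketch of what the Standardize Data option does, assuming made-up data in which the two variables are on very different scales:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(3)
X = np.column_stack([rng.normal(50, 10, 30),     # variable on a large scale
                     rng.normal(0.5, 0.1, 30)])  # variable on a small scale

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # each column: mean 0, sd 1
Z = linkage(X_std, method="ward")                # clusters now weight both variables equally
```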
 In the report window the dendrogram is displayed and many
analysis options exist under the hotspot at the top of the report.
Click and drag the red diamond on the
dendrogram to change the number of
clusters you wish to display. JMP selects a default number of clusters, but it is not necessarily optimal.
Alternatively one can use the “Number of
Clusters” option in the report menu.
The scree plot at the bottom displays the
distance that was bridged in order to join
clusters at each stage.
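Outside JMP, cutting the tree at a chosen number of clusters and plotting the joining distances can be sketched as follows (made-up data; fcluster plays the role of dragging the diamond, and the third linkage column is what the scree plot displays):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(8)
X = rng.normal(size=(25, 4))
Z = linkage(X, method="ward")

labels = fcluster(Z, t=4, criterion="maxclust")  # cut the tree at 4 clusters
print(labels)

# Scree-style plot: the distance bridged at each join (third column of Z).
plt.plot(np.arange(len(Z), 0, -1), Z[:, 2], marker="o")
plt.xlabel("number of clusters")
plt.ylabel("joining distance")
plt.show()
```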
Once the number of clusters has been
decided, then it is a good idea to save the
clusters to the data table and mark them.
Simply select the options from the menu or
right click at the bottom of the dendrogram.
If you decide to change the number of
clusters, JMP will update the markers and
cluster designations. As shown earlier we
often wish to save the clusters to the data
table for further analysis in other JMP
platforms.
 If you mouse click on a branch of the dendrogram, then all of the
observations in that branch are highlighted on the dendrogram and
selected in the data table.
[Figure: Dendrogram of the 25 countries with a two-way color map for Literacy, Baby Mort, Birth Rate, and Death Rate.]
 A color map can be added to the
dendrogram to help understand the
relationships between the
observations and the variables in
the columns. The map contains a
progressive color code from
smallest value to largest value. As
part of the color map, a two way
clustering can be performed where
a cluster analysis of the variables is
added to the bottom of the
observation dendrogram. Clustering of the variables is based on correlation,
with negative correlations
indicating dissimilarity.
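A hedged approximation of the two-way color map is seaborn's clustermap, which clusters both rows and columns of a data frame. Note it uses Euclidean distance on both axes by default, rather than JMP's correlation-based variable clustering, and the data frame below is made up (the column names simply mirror this example's variables).

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(25, 4)),
                  columns=["Literacy", "Baby Mort", "Birth Rate", "Death Rate"])

# Rows and columns are each clustered; standard_scale=1 rescales every column to 0-1.
sns.clustermap(df, method="ward", standard_scale=1)
```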
 If a color map is desired it can be advantageous to select a display
order column, which will order the observations based on values of
the specified column.
 A good candidate for an ordering column is to perform PCA and
then save only the first PC. This can then be specified as the
ordering column in the launch window.
 The color map on the left is ordered, the one on the right is not.
[Figure: Two dendrograms with color maps for the 25 countries; the one on the left is ordered by the first principal component, the one on the right is not.]
 Example: We use the dataset CerealBrands.JMP. The data
contains information on 43 breakfast cereals. We have also saved
the first principal component score as an ordering column.
[Figure: Two-way clustering dendrogram of the 43 cereals with a color map for Calories, Sugar, Fat, Sodium, Carbohydrates, Protein, Fiber, and Potassium.]
 Example: Using two way
clustering with Prin1 as an
ordering variable we get the
following set of clusters.
We select 5 as the desired
number of clusters. Do the
clusters seem logical?
Which variables seem
important to the cluster
development?
 Example: Below is a Fit Y by X plot of Carbohydrates vs. Calories
with the clusters displayed.
[Figure: Bivariate Fit of Carbohydrates by Calories with the five clusters displayed; selected cereals are labeled by name.]
 Example: To the right is an
analysis using single linkage.
Notice that the clustering
sequence is significantly
different.
[Figure: Dendrogram of the 43 cereals using single linkage.]
 An obvious question is which hierarchical clustering method is
preferred.
 Unfortunately, over the decades numerous simulation studies have
been performed to attempt to answer this question and the overall
results tend to be inconsistent and confusing to say the least.
 In the studies, generally Ward’s method and average linkage have
tended to perform the best in finding the correct clusters, while single
linkage has tended to perform the worst.
 A problem in evaluating clustering algorithms is that each tends to
favor clusters with certain characteristics such as size, shape, or
dispersion.
 Therefore, a comprehensive evaluation of clustering algorithms
requires that one look at artificial clusters with various characteristics.
For the most part this has not been done.
 Most evaluation studies have tended to use compact clusters of
equal variance and size; often the clusters are based on a multivariate
normal distribution.
 Ward’s method is biased toward clusters of equal size and
approximately spherical shape, while average linkage is biased toward
clusters of equal variance and spherical shape.
 Therefore, it is not surprising that Ward’s method and average
linkage tend to be the winners in simulation studies.
 In fact, most clustering algorithms are biased toward regularly
shaped regions and may perform very poorly if the clusters are
irregular in shape.
 Recall that single linkage does well if one has elongated, irregularly
shaped clusters.
 In practice, one has no idea about the characteristics of clusters.
 If the natural clusters are well separated from each other, then any of
the clustering algorithms are likely to perform very well.
 To illustrate we use the artificial dataset WellSeparateCluster.JMP
and apply both Ward’s method and single linkage clustering.
 The data consist of three very distinct (well separated) clusters.
 Below are the clustering results for both Ward’s method, displayed on the left, and single linkage, displayed on the right.
 Notice that both methods easily identify the three clusters.
 Note, the clusters are multivariate normal with equal sizes.
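A small simulation sketch of this point, using three made-up, well-separated bivariate normal clusters; both Ward's method and single linkage should recover the three groups of 50:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
X = np.vstack([rng.normal([0, 0], 0.5, (50, 2)),     # three tight, widely separated clusters
               rng.normal([10, 0], 0.5, (50, 2)),
               rng.normal([5, 10], 0.5, (50, 2))])

for method in ("ward", "single"):
    labels = fcluster(linkage(X, method=method), t=3, criterion="maxclust")
    print(method, np.bincount(labels)[1:])           # both should report three clusters of 50
```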
 If the clusters are not well separated, then the various algorithms
will perform quite differently. We illustrate with the data set
PoorSeparateCluster.JMP and below is a plot of the clusters.
 The plot on the left is for Ward’s method and on the right single
linkage using Proc Cluster in SAS – the highlighted points are
observations for which Proc Cluster could not determine cluster membership.
 Ward’s method has done quite well, while single linkage has done
poorly and could not determine cluster membership for quite a few
observations.
 Next we look at multivariate normal clusters, but this time they are of different sizes and dispersions. The dataset UnequalCluster.JMP
contains the results.
 On the left is Ward’s method and on the right is the average linkage method using Proc Cluster in SAS.
 Ward’s method and average linkage produced almost identical
results. However, they both tended toward clusters of equal size and
assigned too many observations to the smallest cluster.
 Next we look at two elongated clusters. We will compare Ward’s
method to single linkage clustering. Generally, single linkage is
supposed to be superior for elongated clusters. The data are in the file
ElongateCluster.JMP.
 On the left below is Ward’s method and to the right single linkage
method using Proc Cluster in SAS.
 Ward’s method finds two clusters of approximately equal size, but classifies them poorly. The single linkage method correctly identifies the
two elongated clusters.
 When one has elongated clusters this indicates correlation or
covariance structure among the variables used to form the clusters.
 Sometimes transformations on the variables can generate more
spherical clusters that are more easily detected by Ward’s method or
similar methods.
 In theory the method is straightforward; however, if one does not know the number of clusters or the covariance structure within each cluster, they have to be approximated from the data – not always easy to do well in practice.
 Proc Aceclus in SAS can be used to perform such transformations
prior to clustering with Proc Cluster.
 We can perform a rough approximation to the method in JMP by
converting the original variables to principal components and then
clustering on the principal component scores.
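A rough sketch of that approximation, using made-up elongated clusters: compute principal components on the correlation matrix, standardize the scores (as JMP's cluster platform would by default), and apply Ward's method to them.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)
t = rng.uniform(0, 10, 100)
X = np.vstack([np.column_stack([t, t + rng.normal(0, 0.3, 100)]),        # elongated cluster 1
               np.column_stack([t, t + 3 + rng.normal(0, 0.3, 100)])])   # parallel cluster 2

# PCA on the correlation matrix: standardize, then project onto the eigenvectors (Prin1, Prin2).
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, eigvecs = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
scores = Xs @ eigvecs[:, ::-1]
scores = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)     # standardize the scores

labels = fcluster(linkage(scores, method="ward"), t=2, criterion="maxclust")
print(np.bincount(labels)[1:])     # with standardized scores, the split should follow Prin2
```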
 To illustrate we will use the elongated cluster data and perform a
PCA on correlations in the Multivariate platform and save the two
principal component scores to the data table.
 Next we will perform clustering, using Ward’s method, on the
principal component scores.
 Below is a scatter plot based on the PC’s.
Although not perfect spheres,
the PC’s are more spherical in
shape than the original
variables and more easily
separated in the Prin2
direction. Proc Aceclus
produced similar results and is
not shown.
 Below are the results of clustering on the PC’s using Ward’s method.
 Notice that the two clusters are perfectly classified by Ward’s
method on the PC’s, while the method did not fare well on the original
variables.
 To illustrate the clustering, we can use Graph Builder in JMP to
show that indeed the clusters are primarily determined by the
difference in Prin2 between the two clusters.
 We examine one more scenario where we have nonconvex,
elongated clusters. Because of the cluster shape the PCA approach
will not work in this case.
 The data are contained in the file NonConvexCluster.JMP.
 Below is a plot of the two clusters.
 Below are the results of clustering using Ward’s method and single
linkage.
 Ward’s method misclassifies some of the observations in cluster 2,
while the single linkage method has virtually identified the two
clusters correctly.
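A sketch of the same comparison on made-up, nonconvex "moon" shaped clusters; single linkage can follow the chains, while Ward's method tends to cut across them:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
t = rng.uniform(0, np.pi, 150)
moon1 = np.column_stack([np.cos(t), np.sin(t)]) + rng.normal(0, 0.05, (150, 2))
moon2 = np.column_stack([1 - np.cos(t), 0.5 - np.sin(t)]) + rng.normal(0, 0.05, (150, 2))
X = np.vstack([moon1, moon2])

for method in ("ward", "single"):
    labels = fcluster(linkage(X, method=method), t=2, criterion="maxclust")
    print(method, np.bincount(labels)[1:])      # single linkage should recover the two moons
```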