Steps to Performing a Cluster Analysis
Rod Funk
Chestnut Health Systems
Bloomington, IL
Performing a Cluster Analysis
• The first step is deciding which variables you want to cluster on
• Data can be continuous, counts, or dichotomous
• Are the variables at one time point, or do you want to look at trajectories across time?
– If across time, the data need to be in horizontal (wide) format: one row per adolescent
– We name variables by time with a suffix for wave: _0 for intake, _3 for 3 months (e.g., dcs_0, dcs_3, dcs_6)
• Cluster analysis also expects data for every variable used in the analysis. If a record is missing even one variable, no cluster membership will be calculated for that record.
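The two data requirements above (one row per adolescent, and no missing values on any clustering variable) can be sketched outside SPSS as follows. This is a minimal Python illustration, not the presenter's procedure; the function names `long_to_wide` and `complete_cases`, the `id` and `wave` keys, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict

def long_to_wide(rows, id_key="id", wave_key="wave", value_keys=("dcs",)):
    """Pivot long records (one row per person per wave) into one row per person."""
    wide = defaultdict(dict)
    for row in rows:
        for key in value_keys:
            # Name each column by variable plus wave suffix, e.g. dcs_0, dcs_3.
            wide[row[id_key]][f"{key}_{row[wave_key]}"] = row.get(key)
    return dict(wide)

def complete_cases(wide, columns):
    """Keep only records with a non-missing value on every clustering variable."""
    return {
        pid: rec for pid, rec in wide.items()
        if all(rec.get(col) is not None for col in columns)
    }

# Hypothetical data: person 2 has no 3-month record.
long_rows = [
    {"id": 1, "wave": 0, "dcs": 4},
    {"id": 1, "wave": 3, "dcs": 2},
    {"id": 2, "wave": 0, "dcs": 5},
]
wide = long_to_wide(long_rows)
usable = complete_cases(wide, ["dcs_0", "dcs_3"])
```

Here only person 1 remains in `usable`, which mirrors how a record with any missing clustering variable ends up without a cluster assignment.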
Handling Missing Data
• Scale level: When creating a scale that has shown good internal consistency (alpha > .7), we calculate the score from the average of the answers, as long as there are at least 3 valid answers:
– Compute dcs=rnd(mean.3(l3a15d,l3a16d,l3a17d,l3a18d,l3a19d)*5).
• Item level: random replacement of missing values
– sort cases by loc xchk1.
– rmv ms2w=median(s2w,2).
– compute ms2w=rnd(ms2w).
– This replaces a missing s2w with the median of the 4 surrounding cases (2 valid values before and 2 after, in the sorted order)
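The logic of both replacement rules can be sketched in Python. This is an illustration under stated assumptions, not a translation of SPSS internals: `rnd`, `prorated_scale`, and `median_of_neighbors` are hypothetical names, missing values are represented as `None`, and leaving a value missing when either side has fewer than `span` valid neighbors is a choice made for this sketch.

```python
import math
from statistics import median

def rnd(x):
    """Round half away from zero (Python's round() rounds halves to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def prorated_scale(items, min_valid=3):
    """Mean of the valid items times the item count, or None with too few valid items."""
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None
    return rnd(sum(valid) / len(valid) * len(items))

def median_of_neighbors(series, span=2):
    """Fill each missing value with the median of `span` valid values on each side."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        before = [v for v in series[:i] if v is not None][-span:]
        after = [v for v in series[i + 1:] if v is not None][:span]
        # Assumption for this sketch: stay missing when either side is short.
        if len(before) == span and len(after) == span:
            filled[i] = rnd(median(before + after))
    return filled

# Four of five items answered: mean 0.75, times 5 items, rounded.
score = prorated_scale([1, 1, None, 0, 1])
# Cases are assumed to be sorted already, so neighbors are similar cases.
filled = median_of_neighbors([1, 3, None, 4, 8])
```

Because the neighbors are taken in file order, the sort step in the syntax above determines which cases supply the replacement value.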
Handling Missing Data
• Replacement of variables across time
• For scales whose items were not asked at a wave:
– Use regression to predict the scale from the other items in the cluster at that wave, along with the intake and last-wave values
• For a missing wave of data: as long as it is not the first or last wave, interpolate using the average of the two surrounding waves.
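The interpolation rule for a missing middle wave can be sketched as follows. This is a minimal Python illustration; the function name `interpolate_middle_waves` and the use of `None` for a missing wave are assumptions. It fills only from observed neighbors, so two consecutive missing waves are left alone.

```python
def interpolate_middle_waves(values):
    """Fill a missing interior wave with the mean of its two neighboring waves."""
    filled = list(values)
    for i in range(1, len(values) - 1):
        prev_v, next_v = values[i - 1], values[i + 1]
        # Only interior waves with both neighbors observed are filled.
        if values[i] is None and prev_v is not None and next_v is not None:
            filled[i] = (prev_v + next_v) / 2
    return filled

# Hypothetical scores at intake, 3, 6, and 9 months; the 3-month wave is missing.
waves = interpolate_middle_waves([10, None, 6, 4])
```

A missing first or last wave is never touched, since the loop skips both ends of the series.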
Running the Cluster Analysis
• Sample syntax
– CLUSTER Zpci_0 Zrpci_3 Zrpci_6 Zrpci_9 Zrpci_12
Zpci_30 Zici_0 Zrici_3 Zrici_6 Zrici_9 Zrici_12 ZSco01
Zmdci_0 Zrdci_3 Zrdci_6 Zrdci_9 Zrdci_12 ZSco02 Zl3v_0
Zrl3d_3 Zrl3d_6 Zrl3d_9 Zrl3d_12 Zl3d_30 Zl3w_0 Zrl3e_3
Zrl3e_6 Zrl3e_9 Zrl3e_12 Zl3e_30 Zmaxce_0 Zrmaxce_3
Zrmaxce_6 Zrmaxce_9 Zrmaxce_12 Zmaxce_30
  /METHOD WARD
  /MEASURE=SEUCLID
  /PRINT SCHEDULE
  /PLOTS NONE
  /SAVE CLUSTER(2,12).
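To show what Ward's method with squared Euclidean distance does, and what saving a range of solutions means, here is a small pure-Python sketch. It is not SPSS's implementation: `ward_clusters` is a hypothetical name, the points are made up, and the cluster numbers it assigns will not necessarily match the numbering SPSS saves. At each step it merges the pair of clusters whose merger adds the least to the within-cluster sum of squares.

```python
def ward_clusters(points, k):
    """Agglomerate points with Ward's criterion until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        dims = len(points[0])
        return [sum(points[i][d] for i in members) / len(members) for d in range(dims)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > k:
        pairs = [(merge_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    # Number the clusters 1..k and return one label per input point.
    labels = [0] * len(points)
    for number, members in enumerate(clusters, start=1):
        for idx in members:
            labels[idx] = number
    return labels

# Hypothetical standardized scores for five cases on two variables.
points = [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0]]
# Analogous to saving several solutions: one membership list per cluster count.
solutions = {k: ward_clusters(points, k) for k in (2, 3)}
```

Each entry in `solutions` plays the role of one saved membership variable, so the 2-cluster and 3-cluster solutions can be compared case by case.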
Demonstration
• Purpose
– To show how to take the results of the cluster analysis and create a table and figures for validating and deciding on the proper number of clusters.
– Will cover pivot tables in SPSS output, pasting into Excel, and graphing in Excel