Variable Cluster Analysis: Usree Kirtania, MS; Cynthia Davis, MS


Variable Cluster Analysis:
A useful approach to identify underlying dimensions of a questionnaire
Usree Kirtania, MS; Cynthia Davis, MS
Institute for Community Health Promotion, Nov 2006
Brown University
OBJECTIVE: To identify the underlying dimensions of a questionnaire using the variable cluster analysis (VARCLUS) approach.
Introduction
_____________________________________________________
Variable cluster analysis, implemented in SAS as PROC VARCLUS, is an
alternative variable reduction method that often has distinct advantages
over the traditional Factor Analysis (FA) approach. The method borrows
ideas from both factor analysis and hierarchical clustering, and produces
either disjoint or hierarchical clusters. The distinct clusters from
VARCLUS help to identify the underlying dimensions of a questionnaire,
which is essential for developing a well-constructed scale score.
Data
________________________________________________
We applied the VARCLUS method to a 94-item food habit questionnaire
(STFHQ) from the SISTERTALK study, a weight control intervention
program for African American women (N=461). Each introductory item is
followed by several behavioral items. We used 57 behavioral items in the
analysis.
Example:
Introductory item: How often did you eat bacon or sausage?
Behavioral item: How often was it low fat or turkey bacon?
Multi-level responses: almost always/often, sometimes, rarely, never.
For all behavioral items, a higher score indicates higher fat intake behavior.
Missing values generated in behavioral items by a response of ‘never’ to an
introductory item were imputed with zero.
VARCLUS Procedure
______________________________________________________
Divisive Clustering
[Figure: divisive clustering tree for five variables. Axis: 2nd eigenvalue, with ticks at 1.7, 1, 0.7 and 0 and a line marked "Threshold (Default)". The cluster {X1, X2, X3, X4, X5} splits into {X1, X3, X4} and {X2, X5}; {X1, X3, X4} then splits into {X1, X3} and {X4}.]
1) All variables start in one cluster.
2) 2nd eigenvalue > specified threshold => additional dimensions.
3) The initial cluster is divided into two clusters.
4) The procedure stops when each cluster satisfies the criterion
2nd eigenvalue < specified threshold.
5) Variables have relatively high correlation with their own cluster and
low correlation with other clusters.
6) VARCLUS uses the first principal component based on the correlation
matrix or the first centroid component based on the covariance matrix.
7) VARCLUS generates the 1-R² ratio:
1-R² ratio = (1 - R² own cluster) / (1 - R² next closest cluster)
VARCLUS procedure vs. Factor Analysis (FA)
______________________________________________
VARCLUS
• MAXEIGEN= option using the correlation matrix, or PERCENT= option using the covariance matrix
• Number of clusters: clusters that meet 2nd eigenvalue < specified threshold
• Rotation: ortho-oblique
• Cluster representative: (1-R²) ratio (lower is better)
Factor Analysis (FA)
• Estimate communalities using Squared Multiple Correlation (SMC)
• Number of factors: scree plot, Kaiser-Guttman rule
• Rotation: Varimax, Promax
• Factor representative: factor loading (higher is better)
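The divisive steps of the VARCLUS procedure and the 1-R² ratio it reports can be illustrated with a minimal Python sketch. This is not PROC VARCLUS. It assumes standardized items and the correlation matrix, compares each cluster's 2nd eigenvalue with a threshold (the `maxeigen` argument, named after the SAS option), and, as a simplifying assumption, splits a cluster by the sign of its second eigenvector rather than by the rotation and reassignment that SAS performs. All function names and the simulated data are hypothetical.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def split_clusters(X, maxeigen=1.0):
    """Divisive clustering of the columns of X (simplified, not PROC VARCLUS)."""
    Z = standardize(X)
    todo = [list(range(Z.shape[1]))]  # step 1: all variables in one cluster
    done = []
    while todo:
        idx = todo.pop()
        if len(idx) < 2:
            done.append(idx)  # a single variable cannot be split further
            continue
        vals, vecs = np.linalg.eigh(np.corrcoef(Z[:, idx], rowvar=False))
        second_val, second_vec = vals[-2], vecs[:, -2]  # eigh sorts ascending
        # Assumption: split by the sign of the second eigenvector's entries.
        left = [j for j, w in zip(idx, second_vec) if w >= 0]
        right = [j for j, w in zip(idx, second_vec) if w < 0]
        if second_val <= maxeigen or not left or not right:
            done.append(idx)  # step 4 criterion met, or no usable split
        else:
            todo += [left, right]  # steps 2-3: extra dimension, so split
    return sorted(done)


def cluster_component(Z, idx):
    """Scores on the first principal component of one cluster (step 6)."""
    if len(idx) == 1:
        return Z[:, idx[0]]
    _, vecs = np.linalg.eigh(np.corrcoef(Z[:, idx], rowvar=False))
    return Z[:, idx] @ vecs[:, -1]


def one_minus_r2_ratio(X, clusters):
    """Return {variable index: (1 - R2 own) / (1 - R2 next closest)} (step 7)."""
    Z = standardize(X)
    comps = [cluster_component(Z, idx) for idx in clusters]
    ratios = {}
    for k, idx in enumerate(clusters):
        for j in idx:
            r2 = [np.corrcoef(Z[:, j], c)[0, 1] ** 2 for c in comps]
            others = [r for m, r in enumerate(r2) if m != k]
            r2_next = max(others) if others else 0.0  # one cluster: no neighbour
            ratios[j] = (1.0 - r2[k]) / (1.0 - r2_next)
    return ratios


def simulate_items(n=500, seed=0):
    """Hypothetical data: two correlated latent traits, three items each."""
    rng = np.random.default_rng(seed)
    f1 = rng.standard_normal(n)
    f2 = 0.5 * f1 + np.sqrt(0.75) * rng.standard_normal(n)
    noise = 0.5 * rng.standard_normal((n, 6))
    return np.column_stack([f1, f1, f1, f2, f2, f2]) + noise


X = simulate_items()
clusters = split_clusters(X, maxeigen=1.0)
ratios = one_minus_r2_ratio(X, clusters)
print(clusters)
print({j: round(r, 2) for j, r in sorted(ratios.items())})
```

With a threshold of 1.0 the simulated items should separate into their two blocks, and a larger threshold should leave them in a single cluster. Items with a low ratio are strongly tied to their own cluster and weakly tied to the next closest one.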
Fat related eating behaviors using VARCLUS procedure
_________________________________________________
A preliminary VARCLUS run suggested 7 distinct clusters, which explain 59%
of the total variation. Cluster items and their 1-R² ratios are presented below.
Higher fat food (α=0.74): Chick1 (0.39), Ffish1 (0.56), Sitdn6 (0.58), Ffood6 (0.60)
Restaurant (α=0.72): Sitdn1 (0.41), Sitdn3 (0.41), Sitdn5 (0.52), Sitdn7 (0.50)
Lean fat food (α=0.48): Grdmt1 (0.52), Grdmt2 (0.59), Redmt1 (0.62), Redmt3 (0.52)
Chinese food (α=0.65): Chins1 (0.32), Chins2 (0.46), Chins4 (0.50)
Milk fat (α=0.71): Milk1 (0.19), Milk21 (0.16)
Fruit as snack/dessert (α=0.76): Otdes4 (0.23), Snack3 (0.23)
Adding fat (α=0.59): Hotcr1 (0.49), Sandw1 (0.51), Potat3 (0.57)
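Because a lower 1-R² ratio marks a better cluster representative, one candidate item per cluster can be read off the values listed above. The short Python sketch below does this with the ratios from the poster; the dictionary layout and the function name are hypothetical, and ties (for example Sitdn1 and Sitdn3 at 0.41) are resolved by taking the first item listed.

```python
# 1-R² ratios copied from the cluster listing above.
ratios = {
    "Higher fat food": {"Chick1": 0.39, "Ffish1": 0.56, "Sitdn6": 0.58, "Ffood6": 0.60},
    "Restaurant": {"Sitdn1": 0.41, "Sitdn3": 0.41, "Sitdn5": 0.52, "Sitdn7": 0.50},
    "Lean fat food": {"Grdmt1": 0.52, "Grdmt2": 0.59, "Redmt1": 0.62, "Redmt3": 0.52},
    "Chinese food": {"Chins1": 0.32, "Chins2": 0.46, "Chins4": 0.50},
    "Milk fat": {"Milk1": 0.19, "Milk21": 0.16},
    "Fruit as snack/dessert": {"Otdes4": 0.23, "Snack3": 0.23},
    "Adding fat": {"Hotcr1": 0.49, "Sandw1": 0.51, "Potat3": 0.57},
}


def representative(items):
    """Return the item with the lowest 1-R² ratio (first listed wins ties)."""
    return min(items, key=items.get)


representatives = {name: representative(items) for name, items in ratios.items()}
for name, item in representatives.items():
    print(f"{name}: {item} ({ratios[name][item]:.2f})")
```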
Fat related eating behaviors using Factor Analysis
__________________________________________________
FA produced 7 factors, which explain 62% of the total variation.
Higher fat food factor (8 items, α=0.74)
Chinese food factor (3 items, α=0.65)
Restaurant factor (4 items, α=0.72)
Milk fat factor (3 items, α=0.70)
Low/lean fat food factor (4 items, α=0.44)
Fruit as snack/dessert factor (2 items, α=0.76)
Adding fat factor (2 items, α=0.44)
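The α values reported for the clusters and factors are presumably Cronbach's alpha, although the poster does not say so or state how they were computed. Under that assumption, a minimal Python sketch of the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of the total score), looks like this. The function name and the toy data are hypothetical.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of the scale score
    return k / (k - 1) * (1.0 - item_variances / total_variance)


# Toy example: three respondents, two items that rise together.
scores = np.array([[1, 2], [2, 4], [3, 6]])
print(round(cronbach_alpha(scores), 3))
```

Alpha rises as the items in a scale covary more strongly, which is why it is a natural check on the items grouped into each cluster or factor.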
Conclusions
__________________________________________________
Not having to estimate communalities makes the VARCLUS procedure simple.
Because it produces distinct clusters, VARCLUS makes it easier to detect and
explain underlying dimensions than the Factor Analysis approach, which
produces overlapping factors.
We should therefore consider the VARCLUS approach and use it more often
alongside FA because of its simplicity and interpretability.
Contact: Usree Kirtania, MS
Statistical Data Analyst
[email protected]