Multivariate Methods
Nels Johnson and Matt Williams
Laboratory for Interdisciplinary
Statistical Analysis
Outline
• Principal Component Analysis
• Factor Analysis
• Multivariate T Tests
• MANOVA
• Multidimensional Scaling
• Correspondence Analysis
PCA – Motivating Examples
• You have measured a number of variables
concerning the size of aphids. You’d like to
reduce the number of variables used for
classification.
• You have a bunch of football statistics for
teams and would like to organize related
teams based on these statistics.
What is it?
• Based on an eigenvalue decomposition of the covariance
matrix S (or correlation matrix R) of the variables.
• Goal: Find linear combinations of the variables that have
maximum variance.
• Obtained by transforming the variables so that the
covariance of the new variables is diagonal.
• These new variables are called the principal components
(PCs) and their covariance matrix contains the eigenvalues
along the diagonal.
• This transformation can be thought of as a rotation of the
axes.
• Note: No variables are designated as dependent.
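The points above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a small simulated data set (the data and variable names are hypothetical, not from the talk): the eigenvectors of S rotate the centered data, and the covariance matrix of the resulting scores is diagonal with the eigenvalues on the diagonal.

```python
import numpy as np

# Hypothetical data: 200 observations of 3 variables, two of them correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.3 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.5 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

S = np.cov(X, rowvar=False)               # sample covariance matrix S
eigvals, eigvecs = np.linalg.eigh(S)      # eigh handles symmetric matrices, ascending order
order = np.argsort(eigvals)[::-1]         # reorder so PC1 has the largest variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = (X - X.mean(axis=0)) @ eigvecs   # rotate the centered data onto the PCs
S_scores = np.cov(scores, rowvar=False)   # covariance matrix of the new variables

print(np.round(S_scores, 3))              # off-diagonal entries are (numerically) zero
print(np.round(eigvals, 3))               # diagonal entries match the eigenvalues
```

Using the correlation matrix R instead of S would amount to standardizing the columns of X first.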
What do we get out of it?
• We can form an index measure (i.e. a score) or
a weighted average of variables based on a
subset of the PCs.
• This reduces the number of variables we have
to work with.
• With some subject matter area knowledge we
might be able to interpret the meaning of
some of the PCs based on correlations.
How to reduce the number of PCs?
• Pick a proportion of variation you want to
explain ahead of time, then pick the number of PCs
so that the sum of their eigenvalues divided by the
sum of all eigenvalues (the proportion of variation
explained by those PCs) is at least that amount.
• Scree Plots
• All PCs with eigenvalue >1 (Kaiser’s Rule, for PCA on the correlation matrix R)
• Broken stick method
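Three of these rules are easy to express in code. The sketch below uses made-up eigenvalues and hypothetical function names. The broken stick rule is written in one common form (keep a PC while its share of variance exceeds the expected share of the k-th largest piece of a randomly broken stick), which is an assumption about the variant intended here.

```python
def n_components_for(eigenvalues, target):
    """Smallest k whose leading eigenvalues explain at least `target` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


def kaiser_count(eigenvalues):
    """Number of eigenvalues above 1 (meaningful for the correlation matrix R)."""
    return sum(1 for value in eigenvalues if value > 1.0)


def broken_stick_count(eigenvalues):
    """Keep leading PCs while their variance share beats the broken-stick expectation."""
    ordered = sorted(eigenvalues, reverse=True)
    p = len(ordered)
    total = sum(ordered)
    count = 0
    for k, value in enumerate(ordered, start=1):
        expected = sum(1.0 / i for i in range(k, p + 1)) / p
        if value / total > expected:
            count += 1
        else:
            break
    return count


# Hypothetical eigenvalues of a 5-variable correlation matrix (they sum to 5).
eigenvalues = [2.5, 1.25, 0.75, 0.25, 0.25]
k_80 = n_components_for(eigenvalues, 0.80)    # cumulative shares: 0.50, 0.75, 0.90
k_kaiser = kaiser_count(eigenvalues)
k_stick = broken_stick_count(eigenvalues)
print(k_80, k_kaiser, k_stick)
```

The rules need not agree, as this example shows, so the choice is usually checked against a scree plot and subject matter knowledge.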
What are some issues?
• The scale on which variables are measured matters.
– Standardize variables so they are all on same scale.
• Variables with a high amount of variability (i.e.
large variance) will naturally steer the
decomposition.
– Again, standardize the variables.
• When separation occurs perpendicular to an axis
(i.e. PC) it might not be picked up without looking
at other axes.
– Plot the pairwise scores for each PC. This may require
looking at too many graphs to be feasible.
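The first two issues can be seen in a small simulation. The sketch assumes NumPy and two hypothetical variables, one recorded on a scale 1000 times larger than the other. On the raw scale the first PC is essentially the large-variance variable alone. After standardizing, the covariance matrix becomes the correlation matrix R and both variables contribute.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=100)
b = a + rng.normal(size=100)
X = np.column_stack([a, 1000.0 * b])      # second column on a much larger scale

# PCA on the raw covariance matrix S
S = np.cov(X, rowvar=False)
vals_S, vecs_S = np.linalg.eigh(S)        # ascending order, so the last entry is PC1
share_raw = vals_S[-1] / vals_S.sum()     # proportion of variance on PC1
pc1_raw = vecs_S[:, -1]                   # PC1 coefficients

# Standardize each column, then repeat
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.cov(Z, rowvar=False)               # equals the correlation matrix of X
vals_R, vecs_R = np.linalg.eigh(R)
share_std = vals_R[-1] / vals_R.sum()

print(round(share_raw, 6), np.round(pc1_raw, 4))
print(round(share_std, 3), np.round(vecs_R[:, -1], 3))
```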
Scree Plot
Biplot of Scores
Factor Analysis – Some Motivating
Examples
• You have the ratings people give to their
family members in areas such as Kindness,
Intelligence, Happiness, etc. You want to associate
family members with some sort of overall
construct of these words.
• You have conducted a survey and want to
group questions based on the topic they address.
What is it?
• We assume the variables Y can be summarized by some
underlying, unobserved, and reduced set of variables called
factors (you must pick how many factors).
• Goal is to estimate the factors.
• After the factors are estimated, the next goal is to
orthogonally rotate the solution to get simpler factors.
• For Principal Factor Solution (more later):
• Model: $Y - \mu = \text{loadings} \cdot \text{factors} + \text{error}$
• $\mathrm{var}(Y - \mu)$ or $\mathrm{corr}(Y - \mu) = V = \text{loadings} \cdot \text{loadings}^{T} + \Psi$
• The diagonals of $H = V - \Psi$ are called the communalities.
They are $R^2$-like numbers.
• $\Psi$ is called the specific variance.
How to Estimate the Factors?
• Three main ways:
– Principal Component Solution (Not PCA!)
• Focuses on the diagonal of V (the variance).
• Does poorly on the off diagonal (the covariance).
– Principal Factor Solution
• Focuses on the off diagonals of V and pretty much ignores the
diagonal.
– Maximum Likelihood Method
• Assume normality of error and estimate the factors and loadings
using an iterative MLE method.
• May give nonsensical answers (i.e. Heywood case).
• Can adjust iterative method so this doesn’t happen.
• Rotations are unique.
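As a rough illustration of the first method, the sketch below computes a principal component solution for a made-up correlation matrix, assuming NumPy. The loadings are taken as the leading eigenvectors scaled by the square roots of their eigenvalues, which is the usual textbook construction. The matrix R and the choice of m = 2 factors are hypothetical.

```python
import numpy as np

# Hypothetical correlation matrix for 4 variables (not data from the talk).
R = np.array([
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
])
m = 2                                            # number of factors, picked by the analyst

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:m]            # indices of the m largest eigenvalues
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])   # p x m loading matrix

communalities = (loadings ** 2).sum(axis=1)      # row sums of squared loadings
psi = np.diag(R) - communalities                 # specific variances
fitted = loadings @ loadings.T + np.diag(psi)    # loadings*loadings^T + Psi
residual = R - fitted

print(np.round(loadings, 3))
print(np.round(communalities, 3))
print(np.round(residual, 3))                     # zero diagonal, nonzero off-diagonal
```

The residual matrix shows the behavior described above: the diagonal of V is reproduced exactly, while the off-diagonal entries are only approximated.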
More On Rotations
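• If the rotation is orthogonal ($\text{rotation} \cdot \text{rotation}^{T} = I$) then
– $\text{loadings} \cdot \text{loadings}^{T} = \text{loadings} \cdot \text{rotation} \cdot \text{rotation}^{T} \cdot \text{loadings}^{T} = (\text{loadings} \cdot \text{rotation})(\text{loadings} \cdot \text{rotation})^{T}$
• So we can redistribute the total variance, and the
variation explained by each variable, differently
among the factors without actually changing
them.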
• Lots of methods to pick rotations.
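The identity above is easy to check numerically. The sketch assumes NumPy, a hypothetical 4 x 2 loading matrix, and an arbitrary rotation angle. The product loadings*loadings^T and the communalities are unchanged by the rotation, while the variance attributed to each factor is not.

```python
import numpy as np

# Hypothetical loadings for 4 variables on 2 factors.
L = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.5, -0.6],
    [0.4, -0.7],
])

theta = np.deg2rad(40.0)                          # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])    # orthogonal: T @ T.T = I

L_rot = L @ T                                     # rotated loadings

same_fit = np.allclose(L @ L.T, L_rot @ L_rot.T)  # loadings*loadings^T is unchanged
comm_before = (L ** 2).sum(axis=1)                # communality of each variable
comm_after = (L_rot ** 2).sum(axis=1)
per_factor_before = (L ** 2).sum(axis=0)          # variance attributed to each factor
per_factor_after = (L_rot ** 2).sum(axis=0)

print(same_fit)
print(np.round(per_factor_before, 3), np.round(per_factor_after, 3))
```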
Interpreting Analysis
• Loadings represent the covariance (or
correlation) between factors and variables.
• So we look for high loadings to represent how
underlying factors influence variables.
• With some subject matter knowledge we can
name factors based off of these loadings
(when they make sense).
Some Issues
• Results can change depending on model
choices (This is a big deal)!
– Number of factors
– Estimation method
– Rotation method
• Heywood cases when using MLE.
• Existence of actual factors is suspect.
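Example
[Table: variables y1 to y6 (rows) against factors f1, f2, f3 (columns), with x marking each high loading. Before rotation, 13 loadings are marked, spread across all three factors; after rotation, only 6 are marked.]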
Multivariate T Tests
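• Univariate t-test
– Normal data, with unknown mean and variance
– $t = (\bar{X} - \mu_0) / \sqrt{S^2/n}$, with $t_{\nu}^2 = F_{1,\nu}$
• Hotelling’s $T^2$ Test
– Multivariate Normal data with unknown mean
and Covariance
– $T^2 = (\bar{X} - \mu_0)^T (S/n)^{-1} (\bar{X} - \mu_0)$, with $\frac{\nu - p + 1}{\nu p} T^2_{p,\nu} = F_{p,\nu - p + 1}$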
One Sample Test
• Assumptions
– Observations are independent and multivariate
normal
• Testing
– Null Hypothesis: μ = μ0 (vectors)
– Alternative: μ ≠ μ0 (vectors)
– Test statistic: $T^2 = (\bar{X} - \mu_0)^T (S/n)^{-1} (\bar{X} - \mu_0)$
– Distribution: $\frac{n - p}{(n - 1)p} T^2_{p,n-1} = F_{p,n-p}$
Example: One Sample Test
• We are interested in 3
different types of
calcium in the soil
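• We wish to test whether the true mean vector
equals the hypothesized means (15, 6, 2.85)
• $\bar{Y} = (28.1, 7.18, 3.09)$
• $S = \begin{pmatrix} 140.54 & 49.68 & 1.94 \\ 49.68 & 72.25 & 3.68 \\ 1.94 & 3.68 & 0.25 \end{pmatrix}$
• $T^2 = 24.559$
• $T^2_{.05,3,9} = 16.766$
• Since 24.559 > 16.766, we reject the null hypothesis at the .05 level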
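The calculation can be reproduced from the summary statistics of this example. The sketch assumes NumPy and a sample size of n = 10, which is not stated on the slide but is inferred from the degrees of freedom (9) of the quoted critical value. Because the slide's entries are rounded, the computed statistic comes out close to, but not exactly at, the reported 24.559.

```python
import numpy as np

n = 10                                    # assumption: inferred from nu = n - 1 = 9
ybar = np.array([28.1, 7.18, 3.09])       # observed mean vector
mu0 = np.array([15.0, 6.0, 2.85])         # hypothesized mean vector
S = np.array([
    [140.54, 49.68, 1.94],
    [49.68, 72.25, 3.68],
    [1.94, 3.68, 0.25],
])                                        # sample covariance matrix (rounded)

p = len(mu0)
diff = ybar - mu0
# T^2 = (ybar - mu0)^T (S/n)^{-1} (ybar - mu0), using solve() instead of an explicit inverse
T2 = float(diff @ np.linalg.solve(S / n, diff))
F = (n - p) / ((n - 1) * p) * T2          # equivalent F statistic with (p, n - p) df

crit_T2 = 16.766                          # critical value quoted in the example
reject = T2 > crit_T2

print(round(T2, 3), round(F, 3), reject)
```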
Two Sample Test
• Assumptions
– Two groups of multivariate normal data
• Observations are independent
• Means may be different but covariance is the same for
both groups
• Testing
– Null Hypothesis: μ1 = μ2 (vectors)
– Alternative: μ1 ≠ μ2 (vectors)
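– Test statistic: $T^2 = (\bar{X}_1 - \bar{X}_2)^T \left( S_p \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \right)^{-1} (\bar{X}_1 - \bar{X}_2)$
– Distribution: $\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)p} T^2_{p,n_1+n_2-2} = F_{p,n_1+n_2-p-1}$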
Example: Two Sample Test
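• Four psychological tests
were given to 32 men
and 32 women
• We are interested in
seeing if the mean
vectors are the same
• $\bar{Y}_1 = (15.97, 15.91, 27.19, 22.75)$
• $\bar{Y}_2 = (12.34, 13.91, 16.66, 21.94)$
• $S_p = \begin{pmatrix} 7.164 & 6.047 & 5.693 & 4.701 \\ 6.047 & 15.89 & 8.492 & 5.856 \\ 5.693 & 8.492 & 29.36 & 13.98 \\ 4.701 & 5.856 & 13.98 & 22.32 \end{pmatrix}$
• $T^2 = 97.602$
• $T^2_{.01,4,62} = 15.373$
• Since 97.602 > 15.373, we reject the null hypothesis at the .01 level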
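The two-sample statistic can be computed the same way from the summary values in this example (two mean vectors, the pooled covariance matrix, and 32 subjects per group). The sketch assumes NumPy. Because the inputs are rounded, the result is near the reported 97.602 rather than identical to it.

```python
import numpy as np

n1, n2 = 32, 32
y1 = np.array([15.97, 15.91, 27.19, 22.75])   # mean vector, group 1
y2 = np.array([12.34, 13.91, 16.66, 21.94])   # mean vector, group 2
Sp = np.array([
    [7.164, 6.047, 5.693, 4.701],
    [6.047, 15.89, 8.492, 5.856],
    [5.693, 8.492, 29.36, 13.98],
    [4.701, 5.856, 13.98, 22.32],
])                                            # pooled covariance matrix (rounded)

p = len(y1)
diff = y1 - y2
# T^2 = diff^T (Sp (1/n1 + 1/n2))^{-1} diff
T2 = float(diff @ np.linalg.solve(Sp * (1 / n1 + 1 / n2), diff))
F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2   # F with (p, n1 + n2 - p - 1) df

crit_T2 = 15.373                              # critical value quoted in the example
reject = T2 > crit_T2

print(round(T2, 3), round(F, 3), reject)
```

With raw data instead of summaries, Sp would be the weighted average of the two sample covariance matrices with weights n1 - 1 and n2 - 1.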
Other Tests
• Two sample paired test
– Use difference vector D = X1 – X2
• Partial Tests
– Testing μi = μi0 in the presence of the other (p-1)
means
• What about more than 2 groups?
– We had ANOVA instead of a t-test
– Now we have MANOVA instead of a $T^2$ test
Multivariate Analysis of Variance
MANOVA
• Suppose we have data organized into several
groups, with each observation giving a vector
of responses
• We would like to test the hypothesis that all
the means for each of the groups are equal
• We can do this in a manner very similar to the
univariate Analysis of Variance (ANOVA)
MANOVA
• In ANOVA
– We compare Sums of Squares within groups to Sums
of Squares between groups
– Sums of Squares are the sums of the squared
differences between the observed values and the
means
• In MANOVA
– We compare Sums of Squares matrices from within
the groups to those between the groups
– E is the “within” Sums of Squares matrix
– H is the “between” Sums of Squares matrix
Four Tests
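• There are four tests based on the eigenvalues of
$E^{-1}H$: $\lambda_1 > \lambda_2 > \dots > \lambda_s$ with s ≤ pd
• Pillai: $V(s) = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i}$
• Lawley-Hotelling: $U(s) = \sum_{i=1}^{s} \lambda_i$
• Wilks’ Lambda: $\Lambda = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i}$
– (reject for small values)
• Roy’s Largest Root: $\theta = \frac{\lambda_1}{1 + \lambda_1}$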
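All four statistics are simple functions of the eigenvalues, as the short sketch below shows. The function name and the eigenvalues 2.0 and 0.5 are made up for illustration and are not taken from any example in the talk.

```python
import math


def manova_statistics(eigenvalues):
    """Four MANOVA test statistics from the eigenvalues of E^{-1}H."""
    lam = sorted(eigenvalues, reverse=True)
    return {
        "pillai": sum(l / (1 + l) for l in lam),        # V(s)
        "lawley_hotelling": sum(lam),                   # U(s)
        "wilks": math.prod(1 / (1 + l) for l in lam),   # Lambda, reject for small values
        "roy": lam[0] / (1 + lam[0]),                   # theta, uses only the largest root
    }


# Hypothetical eigenvalues, chosen so the arithmetic is easy to follow.
stats = manova_statistics([2.0, 0.5])
for name, value in stats.items():
    print(name, round(value, 4))
```

When one eigenvalue dominates, Roy's statistic captures nearly all the information. When the eigenvalues are similar in size, the statistics that use all of them carry more.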
Comparison of the Four Tests
• In the collinear case
– The groups have means that lie on a line in space
(approximately)
– θ ≥ U(s) ≥ Λ ≥ V(s) in terms of power
• In the diffuse case
– The group means are spread out in a higher
dimensional space (not a line)
– θ ≤ U(s) ≤ Λ ≤ V(s) in terms of power
Post-Test Analysis
• Just like with ANOVA, after the test we can
– Do pair-wise comparisons or contrasts
• In MANOVA we can also
– Do tests for the p individual variables
– F tests to identify which variables are different
Example: Rootstock Data
• We wish to compare
apple trees of different
rootstocks
• We have 8 trees from
each of 6 rootstocks
• Our four measurements
are
– Trunk girth at 4 years (y1)
– Extension growth at 4
years (y2)
– Trunk girth at 15 years (y3)
– Extension growth at 15
years (y4)
Rootstock Data
• Test Results
– Λ = .154 < $\Lambda_{.05,4,5,40}$ = .455
– V(s) = 1.305 > $V(s)_{.05}$ = .645
– U(s) = 2.921 > $U(s)_{.05}$
– θ = .652 > $\theta_{.05}$ = .377
• Follow-up tests for
individual variables
– Y1 : F = 1.93, p = .1094
– Y2 : F = 2.91, p = .024
– Y3 : F = 11.97, p < .0001
– Y4 : F = 12.16, p < .0001
Extensions
• Two-way MANOVA
• Multivariate Contrasts
• Mixed Models
• Split plot designs
• Profile Analysis
• Different $R^2$-like numbers
Multidimensional Scaling (MDS)
• Data is a distance or similarity matrix
– Many ways to generate
• Goal is to reduce dimension and visualize
– Often look at only 2 or 3 dimensions
• Motivating Examples
– Number of teeth for different species of mammals
– Discriminating between colors (red vs. orange)
– Distances between cities
Two Kinds of MDS
• Metric scaling (principal coordinates analysis)
– Distances (Euclidean) in the reduced dimension
are close to those measured in the full dimension
• Non-metric scaling
– Rank order of distances in the reduced dimension
is close to that measured in the full dimension
Types of Measures
• There are MANY measures that can be used
– Depends on type of data
– Depends on interest in observations vs. variables
• Properties
1. Minimum of 0, D(x,y) = 0 if x = y
2. Positive otherwise, D(x,y) > 0 if x ≠ y
3. Symmetric, D(x,y) = D(y,x)
4. Triangle Inequality, D(x,y) + D(y,z) ≥ D(x,z)
Types of Measures
• Measures that satisfy 1-4 are called Metrics
• Measures satisfying 1-3 are Semi-metrics
• Some measures have negative values and are
called Non-metrics
• Certain measures can be plotted or visualized in a
Euclidean space
– Distances and relationships plotted are meaningful
– This is a stronger property than the triangle inequality
Measures for our Examples
• Mammal teeth - counts of teeth types
– Manhattan (city block) distance
– Total difference in teeth counts between two species
• Difference between colors (Ekman)
– Similarity measure – converted to distance
– How well people distinguish between colors
– We use the Kruskal measure (non-metric)
• Distances between cities
– Euclidean distance
– Miles between cities
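Two of the measures named above are short enough to write out. The sketch below uses made-up tooth-count vectors with 8 categories (the values are hypothetical, not the actual teeth data) and checks the symmetry and triangle inequality properties for the city block distance.

```python
import math


def manhattan(x, y):
    """City block distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical counts of 8 tooth types for three species.
species_a = [2, 3, 1, 1, 3, 3, 3, 3]
species_b = [3, 3, 1, 1, 4, 4, 3, 3]
species_c = [1, 1, 0, 0, 2, 1, 3, 3]

d_ab = manhattan(species_a, species_b)
d_bc = manhattan(species_b, species_c)
d_ac = manhattan(species_a, species_c)

print(d_ab, d_bc, d_ac)
print(d_ab == manhattan(species_b, species_a))   # symmetry
print(d_ab + d_bc >= d_ac)                       # triangle inequality
print(euclidean([0, 0], [3, 4]))
```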
Basic Procedure for MDS
• Metric Scaling
– Eigenvalue/eigenvector decomposition
– Choose a reduced number of components that still
preserves distances
– Create new coordinates based on reduced
components
• Non-metric scaling
– Reduce dimensions but preserve rank order
– Done using Isotonic regression and iterative
algorithms
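The metric scaling steps can be sketched as follows, assuming NumPy and the standard double-centering construction for principal coordinates. The function name and the five test points are hypothetical. Non-metric scaling is not covered here, since it needs an iterative fit.

```python
import numpy as np


def classical_mds(D, k=2):
    """Principal coordinates (metric scaling) from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalue/eigenvector decomposition
    order = np.argsort(eigvals)[::-1][:k]         # keep the k largest eigenvalues
    kept = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(kept)      # new coordinates, n x k


# Hypothetical points in the plane, used only to build a distance matrix.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 2.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

coords = classical_mds(D, k=2)
D_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

print(np.round(coords, 3))
print(np.allclose(D, D_hat))                      # distances are preserved
```

The recovered coordinates match the original points only up to rotation, reflection, and translation, which is why the check compares distances rather than coordinates.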
Examples: Teeth Data
• 32 mammals and 8 categories of teeth
• We are interested in how “close” these
mammals are based on their teeth counts
• We use city block distance and want to
reduce things to 2 dimensions (from 8)
Teeth Data
Example: Ekman Color Study
• 14 different wavelengths
• 31 subjects asked to rate how well they could
distinguish between different pairs
• Ratings were averaged and scaled to get a
similarity index between 0 and 1
• We use non-metric scaling and look at a
reduction to 2 dimensions (from 14)
Color Study
Example: Distances between cities
• We have 10 U.S. cities and distances between
all pairs
• Can we reduce this distance matrix to a lower
dimension like 2 (from 10)?
City Distances
Comments on MDS
• There are MANY measures we can use
– Some make more sense than others
– It depends on the data and what you are
interested in
– Different measures can lead to different results
• How many dimensions should you use?
– It’s easiest to explain 2-3 dimensions
– There are different criteria or guidelines for metric
and non-metric scaling
One More Example
• Suppose we have data that can be organized
into a two-way table of binary or count values.
• For a small table we can do some contingency
table analyses like tests for homogeneity or
independence.
• For large tables we might like to reduce or
summarize the table
• One method is called Correspondence
Analysis
Correspondence Analysis
• Our distance measure is the Pearson chi-square measure between the observed cell
value and its expected value.
• As before, we need to decide if we are
interested in our subjects or our variables
• Similar or analogous to PCA and MDS in terms
of dimension reduction and interpretation.
• Unfortunately, the terminology is a little
different. So be careful.
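The Pearson chi-square measure for each cell can be computed directly from a two-way table. The sketch below uses a made-up 2 x 2 table and a hypothetical function name. Expected values are the usual row total times column total over grand total.

```python
def pearson_contributions(table):
    """Per-cell (observed - expected)^2 / expected for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand   # under independence
            out_row.append((observed - expected) ** 2 / expected)
        contributions.append(out_row)
    return contributions


# Hypothetical 2 x 2 table of counts.
table = [[10, 20], [30, 40]]
contrib = pearson_contributions(table)
chi_square = sum(sum(row) for row in contrib)      # Pearson chi-square statistic
total_inertia = chi_square / sum(map(sum, table))  # chi-square divided by the grand total

for row in contrib:
    print([round(value, 4) for value in row])
print(round(chi_square, 4), round(total_inertia, 5))
```

Correspondence analysis then decomposes this total inertia across a few dimensions, much as PCA decomposes total variance.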
Example: Postal Employees
• Postal employees for 6 positions were drug tested
• Results include negative, marijuana, cocaine, and
other
• We are interested in identifying any patterns or
trends
Postal Employees
Sources
• We compiled the information from this talk
from Methods of Multivariate Analysis 2nd ed.
by Alvin C. Rencher and from our notes from
STAT 5504 compiled by Dr. Eric Smith, Dept. of
Statistics.
• Thanks! Any questions?