Principal Component Analysis

Tanya and Caroline
Overview
• Its basic function is to condense data
• PCA is used when several underlying factors shape the data
– For example, differences in geology between two areas
• Unlike Bray-Curtis ordination, PCA is objective: its axes are determined entirely by the data
• It finds the most useful angle from which to view the pattern formed by the data points
PCA is NOT…
1. Factor Analysis or Principal Coordinates Analysis (PCO)
2. A test of significance
3. Based on a null hypothesis (none is required)
4. A way to decide objectively, before ordination, which variables to include
5. A way to decide, after analysis, which variables were unimportant
6. Able to cope with missing values
2-D vs. multi-D
• With two variables, make a scatter plot of all data points
• As the number of variables increases, the data space becomes harder to visualize
This is where PCA comes in!
PCA
• Simplifies data by reducing the dimensions of data space
1. Finds the most informative viewpoint from which to view the data as a scatter plot
2. Produces low-dimensional images of high-dimensional shapes
3. Shows the amount of variance accounted for by each axis
• Find the first principal axis, which always passes through the overall mean (centroid) of the dataset
• Find the second principal axis, which must be orthogonal (at 90°) to the first axis
• Each successive axis explains less variance
than its predecessors and is assumed to be less
important
• The first principal axis accounts for the greatest possible percentage of the overall variance; the second principal axis accounts for the greatest possible share of the variance that remains
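The axis properties above can be checked numerically. The sketch below is a minimal illustration, assuming Python with NumPy and made-up random data (not the data from the slides): it centres the data on the overall mean, takes the eigenvectors of the covariance matrix as the principal axes, and sorts them by the variance they carry. `principal_axes` is a hypothetical helper name.

```python
import numpy as np

def principal_axes(data):
    """Hypothetical helper: eigenvalues and eigenvectors of the covariance
    matrix of `data` (samples in rows), largest eigenvalue first."""
    centred = data - data.mean(axis=0)           # axes pass through the overall mean
    cov = np.cov(centred, rowvar=False)          # variables are in columns
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort: most variance first
    return eigvals[order], eigvecs[:, order]

# Made-up illustrative data: 50 samples of 3 variables, two of them correlated
rng = np.random.default_rng(0)
x = rng.normal(size=50)
data = np.column_stack([x,
                        0.8 * x + rng.normal(scale=0.3, size=50),
                        rng.normal(size=50)])

eigvals, eigvecs = principal_axes(data)
print("variance along each axis:", eigvals)                 # non-increasing
print("axis 1 . axis 2:", eigvecs[:, 0] @ eigvecs[:, 1])    # ~0, i.e. orthogonal
```

Projecting the centred data onto each eigenvector gives scores whose variance equals the corresponding eigenvalue, which is why the eigenvalues measure how much variance each axis accounts for.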
Example
Mechanics of PCA
1. Normalizing data
2. Generating principal axes
• Correlation matrix → eigenvalues + eigenvectors → loadings
• Eigenvalue – the factor by which multiplying its eigenvector by the matrix rescales it (the "rate of growth per multiplication")
• Eigenvector – the direction (pattern) that the matrix only rescales, without changing its direction
• Eigenvalues and eigenvectors summarize the underlying structure of a matrix
3. Interpretation of eigenvalues – gives the importance of each ordination axis: the largest eigenvalue indicates the first principal axis, the next largest the second, and so on
4. Deriving axis scores – multiply the normalized data by the first eigenvector to get scores on the first principal axis, then do the same with the second eigenvector for the second axis
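The steps above can be sketched in Python with NumPy. This is a minimal illustration, not the method of the source text: it assumes that "normalizing" means standardizing each variable to mean 0 and standard deviation 1 (which makes this a correlation-matrix PCA), and `pca` is a hypothetical function name. It returns the unscaled eigenvectors; some texts call their elements loadings, while others scale each eigenvector by the square root of its eigenvalue. The sign of each eigenvector is arbitrary, so axis scores may come out mirrored.

```python
import numpy as np

def pca(data):
    """Hypothetical helper: correlation-matrix PCA.

    data: array of shape (n_samples, n_variables) with no missing values.
    Returns (eigenvalues, eigenvectors, scores), largest eigenvalue first.
    """
    data = np.asarray(data, dtype=float)
    if np.isnan(data).any():
        raise ValueError("PCA cannot cope with missing values")

    # Normalize: centre each variable on 0 and scale to unit standard deviation
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

    # Generate principal axes: correlation matrix -> eigenvalues + eigenvectors
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh: symmetric input

    # Interpret eigenvalues: largest gives the first principal axis, etc.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Derive axis scores: normalized data x eigenvector, one column per axis
    scores = z @ eigvecs
    return eigvals, eigvecs, scores

# Made-up illustrative data: 30 samples of 4 variables
rng = np.random.default_rng(1)
x = rng.normal(size=30)
data = np.column_stack([x, x + rng.normal(scale=0.5, size=30),
                        rng.normal(size=30), rng.normal(size=30)])
eigvals, eigvecs, scores = pca(data)

# Eigenvector check: multiplying v by the matrix only rescales it by lambda
corr = np.corrcoef(data, rowvar=False)
v1 = eigvecs[:, 0]
print(np.allclose(corr @ v1, eigvals[0] * v1))   # True
```

Because the data are standardized, the eigenvalues of a correlation-matrix PCA always sum to the number of variables (here 4).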
Normalization
Example
• Two sites: site 1 is a heath and site 2 is a mound
• PCA is run only on the data for the 8 plant species (vegetation)
% Variance
• The larger the variance an ordination axis accounts for, the more of the information in the data has been condensed into that axis
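As a rough illustration, assuming the usual definition that an axis's share of the variance is its eigenvalue divided by the sum of all eigenvalues, the percentages can be computed as below. The eigenvalues are made up for the example, and `percent_variance` is a hypothetical helper name.

```python
import numpy as np

def percent_variance(eigenvalues):
    """Hypothetical helper: % of total variance accounted for by each axis."""
    ev = np.asarray(eigenvalues, dtype=float)
    return 100 * ev / ev.sum()

# Made-up eigenvalues from a PCA of 4 standardized variables (they sum to 4)
pct = percent_variance([2.5, 1.0, 0.25, 0.25])
cumulative = np.cumsum(pct)
# pct        -> 62.5, 25.0, 6.25, 6.25
# cumulative -> 62.5, 87.5, 93.75, 100.0
```

With these made-up values, the first two axes together condense 87.5% of the variance.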
Kaiser-Guttman vs. Broken Stick
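Kaiser-Guttman and the broken-stick model are two common rules for deciding how many axes to keep. The sketch below assumes the following forms: Kaiser-Guttman keeps axes whose eigenvalue exceeds the average eigenvalue (which equals 1 for a correlation-matrix PCA), and the broken-stick rule keeps axes whose share of the variance exceeds the share expected if the total variance were divided at random, b_k = (1/p) Σ_{i=k}^{p} 1/i for p axes. The function names and eigenvalues are hypothetical.

```python
import numpy as np

def kaiser_guttman(eigenvalues):
    """Flag axes whose eigenvalue exceeds the mean eigenvalue
    (the mean is 1 for a correlation-matrix PCA)."""
    ev = np.asarray(eigenvalues, dtype=float)
    return ev > ev.mean()

def broken_stick(p):
    """Expected proportion of variance for axes 1..p under the broken-stick
    model: b_k = (1/p) * sum(1/i for i in k..p)."""
    inv = 1.0 / np.arange(1, p + 1)
    tail_sums = np.cumsum(inv[::-1])[::-1]    # sum of 1/i from k to p, for each k
    return tail_sums / p

def broken_stick_keep(eigenvalues):
    """Flag axes whose observed proportion of variance exceeds the broken-stick value."""
    ev = np.asarray(eigenvalues, dtype=float)
    return ev / ev.sum() > broken_stick(len(ev))

# Made-up eigenvalues from a PCA of 4 standardized variables
ev = [2.4, 1.05, 0.35, 0.2]
print(kaiser_guttman(ev))      # [ True  True False False]
print(broken_stick_keep(ev))   # [ True False False False]
```

With these made-up values, Kaiser-Guttman keeps two axes while the broken-stick rule keeps only one, because the second axis (26.25% of the variance) falls just short of its broken-stick expectation (about 27.1%). In practice, axes are usually retained only up to the first one that fails the chosen test.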
Let’s GRAPH!
Homework
1. What’s the purpose of PCA and what 3
things does it give us?
2. Define eigenvalue and eigenvector.
3. Interpret Figure 6.10 on page 111: what does the first principal axis show, and what does the second principal axis show?