• A measure of how "spread out" a sequence of numbers is
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
– μ is the mean of the n values
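A minimal sketch of this spread measure (the population variance, dividing by n). The function name `variance` and the sample values are illustrative assumptions, not part of the slides.

```python
def variance(xs):
    """Population variance: the mean squared deviation from the mean."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


tight = [9, 10, 11]    # values clustered near the mean
spread = [0, 10, 20]   # same mean, much more spread out

print(variance(tight))    # about 0.67
print(variance(spread))   # about 66.67
```

Both sequences have the same mean (10), so the difference in the result comes only from how far the values sit from it.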
• A measure of correlation between data
– Data set of size n
– Each data element has 3 fields:
• Birth date
• [Collect data from class]
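$\mathrm{cov}(i, j) = \frac{1}{n}\sum_{k=1}^{n}(x_{k,i} - \mu_i)(x_{k,j} - \mu_j)$
– $x_{k,i}$ is field i of data element k; $\mu_i$ is the mean of field i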
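A minimal sketch of the covariance between two fields, dividing by n as in the variance. The function name `covariance` and the made-up values are assumptions for illustration; they are not the class data.

```python
def covariance(xs, ys):
    """Population covariance between two equally long sequences."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


base = [1, 2, 3, 4]
rising = [10, 20, 30, 40]    # goes up when base goes up
falling = [40, 30, 20, 10]   # goes down when base goes up

print(covariance(base, rising))    # positive
print(covariance(base, falling))   # negative
print(covariance(base, base))      # covariance with itself is the variance
```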
• The diagonal entries are the variances of the individual features
• The off-diagonal entries measure the correlation between two features
– Large positive == positive correlation
• one goes up, other goes up
– Large negative == negative correlation
• one goes up, other goes down
– Near-zero == no correlation
– [How large depends on the range of values]
• You can calculate it with a matrix:
– Raw Matrix is a p x q matrix
• p features
• q samples
– Convert to mean-deviation form
• Calculate the average sample
• Subtract this from all samples.
– Multiply MeanDev (a p x q matrix) by its transpose
(a q x p matrix)
– Multiply by 1/q (q = the number of samples) to get the p x p covariance matrix.
• [Calculate our covariance matrix]
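The matrix recipe above can be sketched with NumPy. The raw matrix here is a made-up example with p = 2 features and q = 4 samples, and the final scaling divides by the number of samples.

```python
import numpy as np

# Hypothetical raw matrix: p = 2 features (rows), q = 4 samples (columns).
raw = np.array([[1.0, 2.0, 3.0, 4.0],
                [10.0, 20.0, 30.0, 40.0]])
p, q = raw.shape

# Mean-deviation form: subtract the average sample from every column.
mean_sample = raw.mean(axis=1, keepdims=True)   # p x 1
mean_dev = raw - mean_sample                    # p x q

# (p x q) times (q x p) gives a p x p matrix; scale by 1 / number of samples.
cov = (mean_dev @ mean_dev.T) / q

print(cov)
```

As a cross-check, `np.cov(raw, bias=True)` treats rows as features and divides by the sample count, so it should agree with `cov`.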
• An EigenSystem of a square matrix A is:
– A vector 𝒗 (the eigenvector)
– A scalar λ (the eigenvalue)
• Such that:
𝐴𝑣 = λ𝑣
(the zero vector isn't an eigenvector)
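• In general, not all matrices have real eigenvectors (a 2D rotation by 90°, for example, has none), but a symmetric matrix such as a covariance matrix always does.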
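A small sketch that checks the defining property numerically. The matrix `A` is an arbitrary symmetric example (an assumption, chosen because covariance matrices are symmetric), and `np.linalg.eigh` is NumPy's solver for symmetric matrices.

```python
import numpy as np

# A small symmetric matrix, standing in for a covariance matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order; the eigenvectors are the
# columns of `vectors`, each of unit length.
values, vectors = np.linalg.eigh(A)

for i in range(len(values)):
    lam = values[i]
    v = vectors[:, i]
    # Check that A v equals lambda v for this eigen-pair.
    print(lam, v, np.allclose(A @ v, lam * v))
```

The sign of each eigenvector is arbitrary: if v satisfies the equation, so does -v.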
EigenSystems and PCA
• When you calculate the eigen-system of an n x
n Covariance matrix you get:
– n eigenvectors (each of dimension n)
– n matching eigenvalues
• The eigenvector with the biggest eigen-value "explains" the largest
amount of variance in the data set.
• Say we have a 2d data set
– First eigen-pair (v1 = [0.8, 0.6], λ1 = 800.0)
– Second eigen-pair (v2 = [-0.6, 0.8], λ2 = 100.0)
– 8x as much variance is along v1 as along v2.
– v1 and v2 are perpendicular to each other.
– v1 and v2 define a new set of basis vectors for this data set.
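The claims about this 2d example can be checked with a few lines of plain Python. The helper `dot` is an illustrative name; the numbers are the ones from the example.

```python
def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


v1, lam1 = [0.8, 0.6], 800.0
v2, lam2 = [-0.6, 0.8], 100.0

print(dot(v1, v2))              # 0 means perpendicular
print(dot(v1, v1), dot(v2, v2)) # about 1 means unit length
print(lam1 / lam2)              # variance ratio between the two directions
print(lam1 / (lam1 + lam2))     # fraction of the total variance along v1
```

The last line shows that v1 alone accounts for roughly 89% of the total variance (800 out of 900).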
Conversions between basis vectors
• Let's take one data point…
– Let's say it is [-1.5, 0.4] in "world units"
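• Project it onto v1 and v2 to get the coordinates relative to the new basis
(v1, v2 are unit-length basis vectors; P is the data point)
$\mathit{newCoord} = [P \cdot v_1,\; P \cdot v_2]$
To convert back to "world units":
$P = \mathit{newCoord}_0 \, v_1 + \mathit{newCoord}_1 \, v_2$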
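A worked sketch of the round trip for the point [-1.5, 0.4], using the eigenvectors v1 = [0.8, 0.6] and v2 = [-0.6, 0.8] from the 2d example. The helper `dot` and the variable names are illustrative assumptions.

```python
def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


v1 = [0.8, 0.6]
v2 = [-0.6, 0.8]
P = [-1.5, 0.4]   # the data point in "world units"

# Project onto each unit-length basis vector to get the new coordinates.
new_coord = [dot(P, v1), dot(P, v2)]

# Convert back: scale each basis vector by its coordinate and add them up.
world = [new_coord[0] * v1[i] + new_coord[1] * v2[i] for i in range(2)]

print(new_coord)   # about [-0.96, 1.22]
print(world)       # about [-1.5, 0.4]
```

The round trip recovers the original point only because v1 and v2 are perpendicular and of unit length.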
PCA and compression
– n (the number of features) is high (~100)
– Most of the variance is captured by 3 eigenvectors.
– You can throw out the other 97 eigen-vectors.
– You can represent most of the data for each
sample using just 3 numbers per sample (instead
of ~100).
• For a large data set, the savings can be huge.
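A back-of-the-envelope sketch of the saving. The sample count of 10,000 is a hypothetical figure, and the overhead accounting (storing the kept eigenvectors and the average sample once) is an assumption about how the compressed form would be stored.

```python
n_features = 100     # from the slide: n is about 100
n_kept = 3           # eigenvectors kept
n_samples = 10_000   # hypothetical data set size (not from the slides)

raw_numbers = n_samples * n_features

# Compressed form: 3 coordinates per sample, plus the shared overhead of
# the 3 kept eigenvectors and the average sample (each n_features long).
compressed_numbers = n_samples * n_kept + n_kept * n_features + n_features

print(raw_numbers)                        # 1000000
print(compressed_numbers)                 # 30400
print(raw_numbers / compressed_numbers)   # roughly 33x fewer numbers
```

The shared overhead is fixed, so the ratio approaches 100 / 3 as the number of samples grows.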
1. Collect database images
a. Subject looking straight ahead, no emotion, neutral
i. on the top include all of the eyebrows
ii. on the bottom include just to the chin
iii. on the sides, include all of the face.
c. Resize to 32x32, grayscale (the eigen-solver limits the image size)
d. In code, include a way to convert to (and from) a
2. Calculate the average image
a. Just pixel (Vector element) by element.
3. Calculate the Covariance matrix
4. Calculate the EigenSystem
a. Keep the eigen-pairs that preserve n% of the
data variance (98% or so)
b. Your Eigen-database is the 32x32 average image
and the (here) 8 32x32 eigen-face images.
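Steps 2 through 4 can be sketched with NumPy as follows. This is a sketch under assumptions, not the course code: the "faces" are random stand-in images, step 1d is assumed to mean flattening each image into a 1024-element vector, and the number of eigenfaces kept will differ from the 8 mentioned above because the data differs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: 10 random 32x32 "faces" (real code would load cropped photos).
faces = rng.random((10, 32, 32))

# Assumed step 1d: flatten each image to a vector, one column per face.
samples = faces.reshape(len(faces), -1).T          # 1024 x 10

# Step 2: average image, element by element.
average = samples.mean(axis=1, keepdims=True)      # 1024 x 1

# Step 3: covariance matrix from the mean-deviation form.
mean_dev = samples - average
cov = (mean_dev @ mean_dev.T) / samples.shape[1]   # 1024 x 1024

# Step 4: eigen-system; eigh returns ascending eigenvalues, so reverse them.
values, vectors = np.linalg.eigh(cov)
values, vectors = values[::-1], vectors[:, ::-1]

# Step 4a: keep the fewest eigen-pairs that cover 98% of the total variance.
target = 0.98
explained = np.cumsum(values) / np.sum(values)
k = int(np.searchsorted(explained, target)) + 1
eigenfaces = vectors[:, :k]                        # 1024 x k

# Step 4b: the eigen-database, viewable as 32x32 images again.
average_image = average.reshape(32, 32)
eigenface_images = eigenfaces.T.reshape(k, 32, 32)
print(k, average_image.shape, eigenface_images.shape)
```

With only 10 samples, at most 9 eigenvalues are non-zero, so k can never exceed 9 here regardless of the 1024 features.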
5. Represent each of your faces as a q-value vector (q =
# of eigenfaces).
Subtract the average and project onto the q eigenfaces
The images I'm showing here are the original image and
the 8-value "eigen-coordinates".
6. (for demonstration of compression)
– You can reconstruct a compressed image by:
• Start with a copy of the average image, X
• Repeat for each eigenface:
– Add the eigen-coord * eigenface to X
– Here are the reconstructions of the 2 images on the last slide.
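Steps 5 and 6 (project, then reconstruct) can be sketched as below. The function names `build_eigenfaces`, `to_eigen_coords`, and `reconstruct` are hypothetical, and the faces are random stand-in vectors, so the reconstruction error printed at the end says nothing about real photographs.

```python
import numpy as np


def build_eigenfaces(samples, q):
    """samples: features x count. Returns (average, top-q eigenvectors as columns)."""
    average = samples.mean(axis=1, keepdims=True)
    mean_dev = samples - average
    cov = (mean_dev @ mean_dev.T) / samples.shape[1]
    values, vectors = np.linalg.eigh(cov)      # ascending eigenvalue order
    return average, vectors[:, ::-1][:, :q]    # largest eigenvalues first


def to_eigen_coords(image_vec, average, eigenfaces):
    """Step 5: subtract the average, then project onto each eigenface."""
    return eigenfaces.T @ (image_vec - average[:, 0])


def reconstruct(coords, average, eigenfaces):
    """Step 6: start from the average image and add coord * eigenface."""
    x = average[:, 0].copy()
    for coord, face in zip(coords, eigenfaces.T):
        x += coord * face
    return x


rng = np.random.default_rng(1)
samples = rng.random((10, 32 * 32)).T          # stand-in for 10 flattened faces
average, eigenfaces = build_eigenfaces(samples, q=8)

original = samples[:, 0]
coords = to_eigen_coords(original, average, eigenfaces)   # 8 numbers
approx = reconstruct(coords, average, eigenfaces)         # 1024 numbers
print(coords.shape, np.linalg.norm(original - approx))
```

Keeping more eigenfaces can only lower the reconstruction error, because each extra eigenface adds one more perpendicular direction to the approximation.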
7. Facial Recognition
– Take a novel image (same size as database images)
– Using the eigenfaces computed earlier (this novel
image is usually NOT part of this computation),
calculate its q eigen-coordinates.
– Calculate the q-dimensional distance
(pythagorean theorem in q-dimensions) between
this image and each database image.
The database image with the smallest distance is your best match.
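The recognition step can be sketched as a nearest-neighbour search in eigen-coordinates. Everything here is an illustrative assumption: the database is random stand-in data, q is fixed at 8, and the "novel" image is simulated by adding a little noise to database face 3.

```python
import numpy as np

rng = np.random.default_rng(2)
database = rng.random((10, 32 * 32))            # 10 flattened stand-in faces (rows)

# Eigenfaces computed from the database only.
average = database.mean(axis=0)
mean_dev = (database - average).T               # 1024 x 10
cov = (mean_dev @ mean_dev.T) / len(database)
values, vectors = np.linalg.eigh(cov)           # ascending eigenvalue order
eigenfaces = vectors[:, ::-1][:, :8]            # q = 8, largest eigenvalues first


def eigen_coords(image_vec):
    """Subtract the average and project onto the q eigenfaces."""
    return eigenfaces.T @ (image_vec - average)


db_coords = np.array([eigen_coords(face) for face in database])   # 10 x 8

# Simulated novel image: database face 3 plus a little noise.
novel = database[3] + 0.01 * rng.standard_normal(32 * 32)
novel_coords = eigen_coords(novel)

# q-dimensional Euclidean distance to every database image.
distances = np.linalg.norm(db_coords - novel_coords, axis=1)
best = int(np.argmin(distances))
print(best, distances[best])
```

In practice a distance threshold would also be needed to reject faces that are not in the database at all; that threshold is outside what the slides cover.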