Transcript Document
Lecture XXVII
Orthonormal Bases and Projections
Suppose that a set of vectors {x1,…,xr} for a basis for
some space S in Rm space such that r m. For
mathematical simplicity, we may want to form an
orthogonal basis for this space. One way to form such
a basis is the Gram-Schmit orthonormalization. In
this procedure, we want to generate a new set of
vectors {y1,…yr} that are orthonormal.
The Gram-Schmit process is:
y1 x1
x2 ' y1
y 2 x2
y1
y1 ' y1
x3 ' y1 x3 ' y2
y3 x3
y1 ' y1 y2 ' y2
zi
yi
yi ' yi 2
1
Example
1
9
x1 3 , x2 7
4
16
1
9 7 16 3
70
1
9
4 13
3 50
y2 7
13
1
16
4 20
1 3 4 3 13
4
The vectors can then be normalized to one. However,
to test for orthogonality:
70
13
1 3 4 5013 0
20
13
Theorem 2.13 Every r-dimensional vector space, except
the zero-dimensional space {0}, has an orthonormal
basis.
Theorem 2.14 Let {z1,…zr} be an orthornomal basis for
some vector space S, of Rm. Then each x Rm can be
expressed uniquely as
x u v
were u S and v is a vector that is orthogonal to every
vector in S.
Definition 2.10 Let S be a vector subspace of Rm. The
orthogonal complement of S, denoted S, is the
collection of all vectors in Rm that are orthogonal to
every vector in S: That is, S={x:x Rm and x’y=0 for all
yS}.
Theorem 2.15. If S is a vector subspace of Rm then its
orthogonal complement S is also a vector subspace of
Rm.
Projection Matrices
The orthogonal projection of an m x 1 vector x onto a
vector space S can be expressed in matrix form.
Let {z1,…zr} be any othonormal basis for S while {z1,…zm}
is an orthonormal basis for Rm. Any vector x can be
written as:
x 1z1 r zr r 1zr 1 m zm u v
Aggregating 1’,2’)’ where 1=(1 …r)’ and
2=(r+1…m)’ and assuming a similar decomposition of
Z=[Z1 Z2], the vector x can be written as:
x Z Z11 Z 2 2
u Z11
v Z 2 2
given orthogonality, we know that Z1’Z1=Ir and Z1’Z2=(0),
and so
1
1
Z1Z1 ' x Z1Z1 ' Z1 Z 2 Z1 0 Z11 u
2
2
Theorem 2.17 Suppose the columns of the m x r matrix
Z1 from an orthonormal basis for the vector space S
which is a subspace of Rm. If x Rm, the orthogonal
projection of x onto S is given by Z1Z1’x.
Projection matrices allow the division of the space into a
spanned space and a set of orthogonal deviations from
the spanning set. One such separation involves the
Gram-Schmit system.
In general, if we define the m x r matrix X1=(x1,…xr) and
define the linear transformation of this matrix that
produces an orthonormal basis as A, so that:
Z1 X1 A
we are left with the result that:
Z1 ' Z1 A' X1 ' X1 A I r
Given that the matrix A is nonsingular, the projection
matrix that maps any vector x onto the spanning set
then becomes:
1
PS Z1Z1 ' X1 AA' X1 ' X1 ( X ' X ) X1 '
Ordinary least squares is also a spanning
decomposition. In the traditional linear model:
y Xb
yˆ Xbˆ
within this formulation b is chosen to minimize the
error between y and estimated y:
y Xb ' y Xb
This problem implies minimizing the distance
between the observed y and the predicted plane Xb,
which implies orthogonality. If X has full column
rank, the projection space becomes X(X’X)-1X’ and the
projection then becomes:
Xb X X ' X X ' y
1
Premultiplying each side by X’ yields:
X ' Xb X ' X X ' X X ' y
1
b X ' X X ' X X ' X X ' y
1
b X ' X X ' y
1
1
Idempotent matrices can be defined as any matrix
such that AA=A.
Note that the sum of square errors under the projection
can be expressed as:
SSE y Xb ' y Xb
I X X ' X X 'y ' I X X ' X X 'y
y ' I X X ' X X 'I X X ' X X 'y
y X X ' X X ' y ' y X X ' X X ' y
1
1
1
1
N
N
1
n
1
n
In general, the matrix In-X(X’X)-1X’ is referred to as an
idempotent matrix. An idempotent matrix is one that
AA=A:
I
n
X X ' X X ' In X X ' X X '
1
1
I n X X ' X X ' X X ' X X '
1
1
X X ' X X ' X X ' X X '
1
In X X ' X X '
1
1
Thus, the SSE can be expressed as:
SSE y ' I n X X ' X X ' y
1
y ' y y ' X X ' X X ' y v' v
1
which is the sum of the orthogonal errors from the
regression
Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors (or more appropriately
latent roots and characteristic vectors) are defined by
the solution
Ax x
for a nonzero x. Mathematically, we can solve for the
eigenvalue by rearranging the terms:
Ax x 0
A I x 0
Solving for then involves solving the characteristic
equation that is implied by:
A I 0
Again using the matrix in the previous example:
9
5
1 9 5
1 0 0 1
3 7 8 0 1 0 3
7
8
2 3 5
0 0 1
2
3
5
5 14 132 3 0
In general, there are m roots to the characteristic
equation. Some of these roots may be the same. In
the above case, the roots are complex. Turning to
another example:
5 3 3
A 4 2 3 {1,2,5}
4 4 5
The eigenvectors are then determined by the linear
dependence in A-I matrix. Taking the last example:
4 3 3
A I 4 3 3
4 4 4
Obviously, the first and second rows are linear. The
reduced system then implies that as long as x1=x2 and
x3=0, the resulting matrix is zero.
Theorem 11.5.1 For any symmetric matrix, A, there exists
an orthogonal matrix H (that is, a square matrix
satisfying H’H=I) wuch that:
H ' AH L
where L is a diagonal matrix. The diagonal elements of
L are called the characteristic roots (or eigenvalues) of
A. The ith column of H is called the characteristic vector
(or eigenvector) of A corresponding to the characteristic
root of A.
This proof follows directly from the definition of
eigenvalues. Letting H be a matrix with eigenvalues in
the columns it is obvious that
AH LH
H ' AH H ' LH LH ' H L
Kronecker Products
Two special matrix operations that you will encounter
are the Kronecker product and vec() operators.
The Kronecker product is a matrix is an element by
element multiplication of the elements of the first
matrix by the entire second matrix:
a11 B a12 B
a B a B
21
22
A B
am1 B am 2 B
a1n B
a2 n B
amn B
The vec(.) operator then involves stacking the columns
of a matrix on top of one another.