Differentiation with respect to a vector, matrix
Correlation
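The sample covariance matrix:
$$
S_{p \times p} = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{12} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{1p} & s_{2p} & \cdots & s_{pp} \end{bmatrix}
$$
where
$$
s_{ik} = \frac{1}{n-1} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)(x_{kj} - \bar{x}_k)
$$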
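The sample correlation matrix:
$$
R_{p \times p} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{12} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{1p} & r_{2p} & \cdots & 1 \end{bmatrix}
$$
where
$$
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}\, s_{kk}}} = \frac{\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)(x_{kj} - \bar{x}_k)}{\sqrt{\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n} (x_{kj} - \bar{x}_k)^2}}
$$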
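Note:
$$
R = D^{-1} S D^{-1}
$$
where
$$
D_{p \times p} = \begin{bmatrix} \sqrt{s_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{s_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{s_{pp}} \end{bmatrix}
$$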
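As an illustrative sketch (not part of the original slides), the definitions of S, R and D can be checked numerically with NumPy. The function name `covariance_and_correlation` and the simulated data are assumptions made for this example. The code stores observations in rows, whereas the slides index x_ij by variable i and observation j.

```python
import numpy as np

def covariance_and_correlation(X):
    """Return (S, R) for a data matrix X with n rows (observations) and p columns (variables)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)                 # subtract each variable's sample mean
    S = centered.T @ centered / (n - 1)           # s_ik as defined above
    D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))    # D = diag(sqrt(s_11), ..., sqrt(s_pp))
    R = D_inv @ S @ D_inv                         # R = D^{-1} S D^{-1}
    return S, R

# Simulated data, used only to exercise the function (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
S, R = covariance_and_correlation(X)
print(np.round(R, 3))
```

The result should agree with NumPy's built-in `np.cov(X, rowvar=False)` and `np.corrcoef(X, rowvar=False)`, which use the same n − 1 divisor.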
Tests for Independence and Non-zero Correlation
Test for zero correlation (independence between two variables)
The test statistic:
$$
t = r_{ij} \sqrt{\frac{n-2}{1 - r_{ij}^2}}
$$
If independence is true then the test statistic t will have a t distribution with $\nu = n - 2$ degrees of freedom.
The test is to reject independence if:
$$
|t| > t_{\alpha/2}^{(n-2)}
$$
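Test for non-zero correlation ($H_0: \rho = \rho_0$)
The test statistic
$$
z = \frac{\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) - \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right)}{\frac{1}{\sqrt{n-3}}}
$$
If $H_0$ is true the test statistic z will have approximately a Standard Normal distribution.
We then reject $H_0$ if:
$$
|z| > z_{\alpha/2}
$$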
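A minimal sketch of both test statistics, using only the Python standard library. The function names are hypothetical. The inputs r = 0.391 and n = 65 are taken from the Relaxation/Motivation correlation in the example later in these slides, and the 5% level is an assumption. The t critical value is left to a table, since the standard library has no t quantile function.

```python
import math
from statistics import NormalDist

def t_statistic_zero_correlation(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); compare with a t table on n - 2 d.f."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def fisher_z_statistic(r, rho0, n):
    """z = [0.5 ln((1+r)/(1-r)) - 0.5 ln((1+rho0)/(1-rho0))] * sqrt(n - 3)."""
    z_r = 0.5 * math.log((1.0 + r) / (1.0 - r))
    z_0 = 0.5 * math.log((1.0 + rho0) / (1.0 - rho0))
    return (z_r - z_0) * math.sqrt(n - 3)

def reject_two_sided(z, alpha=0.05):
    """Reject H0 when |z| exceeds the upper alpha/2 standard Normal quantile."""
    return abs(z) > NormalDist().inv_cdf(1.0 - alpha / 2.0)

r, n = 0.391, 65
t = t_statistic_zero_correlation(r, n)
z = fisher_z_statistic(r, rho0=0.0, n=n)
print(round(t, 2), round(z, 2), reject_two_sided(z))
```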
Partial Correlation
Conditional Independence
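Recall
If
$$
\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}, \qquad \mathbf{x}_1 \text{ is } q \times 1, \quad \mathbf{x}_2 \text{ is } (p-q) \times 1
$$
has p-variate Normal distribution with mean vector
$$
\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}
$$
and Covariance matrix
$$
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}
$$
Then the conditional distribution of $\mathbf{x}_i$ given $\mathbf{x}_j$ is $q_i$-variate Normal distribution with mean vector
$$
\boldsymbol{\mu}_{i \cdot j} = \boldsymbol{\mu}_i + \Sigma_{ij}\Sigma_{jj}^{-1}(\mathbf{x}_j - \boldsymbol{\mu}_j)
$$
and Covariance matrix
$$
\Sigma_{ii \cdot j} = \Sigma_{ii} - \Sigma_{ij}\Sigma_{jj}^{-1}\Sigma_{ij}'
$$
(with $\Sigma_{21} = \Sigma_{12}'$).
The matrix
$$
\Sigma_{2 \cdot 1} = \Sigma_{22} - \Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
is called the matrix of partial variances and covariances.
The (i, j)th element of the matrix $\Sigma_{2 \cdot 1}$,
$$
\sigma_{ij \cdot 1,2,\ldots,q}
$$
is called the partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.
$$
\rho_{ij \cdot 1,2,\ldots,q} = \frac{\sigma_{ij \cdot 1,2,\ldots,q}}{\sqrt{\sigma_{ii \cdot 1,2,\ldots,q}\;\sigma_{jj \cdot 1,2,\ldots,q}}}
$$
is called the partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.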
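Let
$$
S = \begin{bmatrix} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{bmatrix}
$$
denote the sample Covariance matrix.
Let
$$
S_{2 \cdot 1} = S_{22} - S_{12}'S_{11}^{-1}S_{12}
$$
The (i, j)th element of the matrix $S_{2 \cdot 1}$,
$$
s_{ij \cdot 1,2,\ldots,q}
$$
is called the sample partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.
Also
$$
r_{ij \cdot 1,2,\ldots,q} = \frac{s_{ij \cdot 1,2,\ldots,q}}{\sqrt{s_{ii \cdot 1,2,\ldots,q}\;s_{jj \cdot 1,2,\ldots,q}}}
$$
is called the sample partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.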
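The sample partial covariance and correlation matrices can be sketched in a few lines of NumPy. This is an illustration under assumptions: the function name `sample_partial_correlation`, the argument layout (conditioning variables in `X1`, remaining variables in `X2`, observations in rows) and the simulated data are not from the slides.

```python
import numpy as np

def sample_partial_correlation(X1, X2):
    """Partial covariance and correlation of the columns of X2 given the columns of X1."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    q = X1.shape[1]
    S = np.cov(np.hstack([X1, X2]), rowvar=False)     # partitioned sample covariance matrix
    S11, S12, S22 = S[:q, :q], S[:q, q:], S[q:, q:]
    S2_1 = S22 - S12.T @ np.linalg.solve(S11, S12)    # S22 - S12' S11^{-1} S12
    d = np.sqrt(np.diag(S2_1))
    R2_1 = S2_1 / np.outer(d, d)                      # r_ij.1..q = s_ij.1..q / sqrt(s_ii.1..q s_jj.1..q)
    return S2_1, R2_1

# Simulated data (hypothetical): three variables that all depend on two conditioning variables.
rng = np.random.default_rng(2)
X1 = rng.normal(size=(200, 2))
X2 = X1 @ rng.normal(size=(2, 3)) + rng.normal(size=(200, 3))
S2_1, R2_1 = sample_partial_correlation(X1, X2)
print(np.round(R2_1, 3))
```

A useful cross-check is that the same partial correlations are obtained by regressing each column of `X2` on `X1` (with an intercept) and correlating the residuals.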
Test for zero partial correlation (conditional independence between two variables given a set of p independent variables)
The test statistic
$$
t = r_{ij \cdot x_1, \ldots, x_p} \sqrt{\frac{n-p-2}{1 - r_{ij \cdot x_1, \ldots, x_p}^2}}
$$
$r_{ij \cdot x_1, \ldots, x_p}$ = the partial correlation between $y_i$ and $y_j$ given $x_1, \ldots, x_p$.
If independence is true then the test statistic t will have a t distribution with $\nu = n - p - 2$ degrees of freedom.
The test is to reject independence if:
$$
|t| > t_{\alpha/2}^{(n-p-2)}
$$
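Test for non-zero partial correlation
$$
H_0: \rho_{ij \cdot x_1, \ldots, x_p} = \rho^0_{ij \cdot x_1, \ldots, x_p}
$$
The test statistic
$$
z = \frac{\frac{1}{2}\ln\left(\frac{1 + r_{ij \cdot x_1, \ldots, x_p}}{1 - r_{ij \cdot x_1, \ldots, x_p}}\right) - \frac{1}{2}\ln\left(\frac{1 + \rho^0_{ij \cdot x_1, \ldots, x_p}}{1 - \rho^0_{ij \cdot x_1, \ldots, x_p}}\right)}{\frac{1}{\sqrt{n-p-3}}}
$$
If $H_0$ is true the test statistic z will have approximately a Standard Normal distribution.
We then reject $H_0$ if:
$$
|z| > z_{\alpha/2}
$$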
The Multiple Correlation Coefficient
Testing independence between a single variable and a group of variables
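Definition
Suppose
$$
\mathbf{x} = \begin{bmatrix} y \\ \mathbf{x}_1 \end{bmatrix}, \qquad y \text{ is } 1 \times 1, \quad \mathbf{x}_1 \text{ is } p \times 1
$$
has (p + 1)-variate Normal distribution with mean vector
$$
\boldsymbol{\mu} = \begin{bmatrix} \mu_y \\ \boldsymbol{\mu}_1 \end{bmatrix}
$$
and Covariance matrix
$$
\Sigma = \begin{bmatrix} \sigma_{yy} & \boldsymbol{\sigma}_{1y}' \\ \boldsymbol{\sigma}_{1y} & \Sigma_{11} \end{bmatrix}
$$
We are interested in whether the variable y is independent of the vector $\mathbf{x}_1$.
The multiple correlation coefficient is the maximum correlation between y and a linear combination of the components of $\mathbf{x}_1$.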
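Derivation
Let
$$
\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} y \\ \mathbf{a}'\mathbf{x}_1 \end{bmatrix} = \begin{bmatrix} 1 & \mathbf{0}' \\ 0 & \mathbf{a}' \end{bmatrix} \begin{bmatrix} y \\ \mathbf{x}_1 \end{bmatrix} = A\mathbf{x}
$$
This vector has a bivariate Normal distribution with mean vector
$$
A\boldsymbol{\mu} = \begin{bmatrix} \mu_y \\ \mathbf{a}'\boldsymbol{\mu}_1 \end{bmatrix}
$$
and Covariance matrix
$$
A\Sigma A' = \begin{bmatrix} \sigma_{yy} & \boldsymbol{\sigma}_{1y}'\mathbf{a} \\ \mathbf{a}'\boldsymbol{\sigma}_{1y} & \mathbf{a}'\Sigma_{11}\mathbf{a} \end{bmatrix}
$$
The multiple correlation coefficient is the maximum correlation between y and $\mathbf{a}'\mathbf{x}_1$.
The correlation between y and $\mathbf{a}'\mathbf{x}_1$:
$$
\rho(\mathbf{a}) = \frac{\boldsymbol{\sigma}_{1y}'\mathbf{a}}{\sqrt{\sigma_{yy}\,\mathbf{a}'\Sigma_{11}\mathbf{a}}}
$$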
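Thus we want to choose $\mathbf{a}$ to maximize $\rho(\mathbf{a})$.
Equivalently, maximize
$$
\rho^2(\mathbf{a}) = \frac{(\boldsymbol{\sigma}_{1y}'\mathbf{a})^2}{\sigma_{yy}\,\mathbf{a}'\Sigma_{11}\mathbf{a}} = \frac{1}{\sigma_{yy}} \cdot \frac{\mathbf{a}'\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a}}{\mathbf{a}'\Sigma_{11}\mathbf{a}}
$$
Note:
$$
\frac{d\rho^2(\mathbf{a})}{d\mathbf{a}} = \frac{1}{\sigma_{yy}} \cdot \frac{\dfrac{d(\mathbf{a}'\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a})}{d\mathbf{a}}\,(\mathbf{a}'\Sigma_{11}\mathbf{a}) - (\mathbf{a}'\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a})\,\dfrac{d(\mathbf{a}'\Sigma_{11}\mathbf{a})}{d\mathbf{a}}}{(\mathbf{a}'\Sigma_{11}\mathbf{a})^2}
$$
$$
= \frac{1}{\sigma_{yy}} \cdot \frac{2\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a}\,(\mathbf{a}'\Sigma_{11}\mathbf{a}) - 2\Sigma_{11}\mathbf{a}\,(\mathbf{a}'\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a})}{(\mathbf{a}'\Sigma_{11}\mathbf{a})^2}
$$
$$
= \frac{\boldsymbol{\sigma}_{1y}'\mathbf{a}}{\sigma_{yy}} \cdot \frac{2(\mathbf{a}'\Sigma_{11}\mathbf{a})\,\boldsymbol{\sigma}_{1y} - 2\Sigma_{11}\mathbf{a}\,(\mathbf{a}'\boldsymbol{\sigma}_{1y})}{(\mathbf{a}'\Sigma_{11}\mathbf{a})^2} = \mathbf{0}
$$
or
$$
(\mathbf{a}'\Sigma_{11}\mathbf{a})\,\boldsymbol{\sigma}_{1y} = \Sigma_{11}\mathbf{a}\,(\mathbf{a}'\boldsymbol{\sigma}_{1y})
$$
or
$$
\mathbf{a}_{opt} = \frac{\mathbf{a}'\Sigma_{11}\mathbf{a}}{\mathbf{a}'\boldsymbol{\sigma}_{1y}}\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y} = k\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}
$$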
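The multiple correlation coefficient is independent of the value of k (for k > 0).
$$
\rho_{y \cdot x_1, \ldots, x_p} = \frac{\boldsymbol{\sigma}_{1y}'\mathbf{a}_{opt}}{\sqrt{\sigma_{yy}\,\mathbf{a}_{opt}'\Sigma_{11}\mathbf{a}_{opt}}} = \frac{k\,\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}{\sqrt{\sigma_{yy}\,(k\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y})'\,\Sigma_{11}\,(k\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y})}}
$$
$$
= \frac{\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}{\sqrt{\sigma_{yy}\,\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}} = \sqrt{\frac{\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}{\sigma_{yy}}}
$$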
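We are interested in whether the variable y is independent of the vector $\mathbf{x}_1$, that is, whether $\boldsymbol{\sigma}_{1y} = \mathbf{0}$ and
$$
\rho_{y \cdot x_1, \ldots, x_p} = \sqrt{\frac{\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}{\sigma_{yy}}} = 0
$$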
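The sample Multiple correlation coefficient
Let
$$
S = \begin{bmatrix} s_{yy} & \mathbf{s}_{1y}' \\ \mathbf{s}_{1y} & S_{11} \end{bmatrix}
$$
denote the sample covariance matrix.
Then the sample Multiple correlation coefficient is
$$
r_{y \cdot x_1, \ldots, x_p} = \sqrt{\frac{\mathbf{s}_{1y}'S_{11}^{-1}\mathbf{s}_{1y}}{s_{yy}}}
$$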
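The formula can be illustrated with a short NumPy sketch. The function name `sample_multiple_correlation` and the simulated data are assumptions made for the example. The function also returns the sample analogue of a_opt (taking k = 1), so that the "maximum correlation" interpretation can be checked directly.

```python
import numpy as np

def sample_multiple_correlation(y, X1):
    """Return (r, a_opt): r = sqrt(s_1y' S11^{-1} s_1y / s_yy), a_opt = S11^{-1} s_1y."""
    y = np.asarray(y, dtype=float)
    X1 = np.asarray(X1, dtype=float)
    S = np.cov(np.column_stack([y, X1]), rowvar=False)   # y first, then the p x-variables
    s_yy, s_1y, S11 = S[0, 0], S[1:, 0], S[1:, 1:]
    a_opt = np.linalg.solve(S11, s_1y)                   # S11^{-1} s_1y (k = 1)
    r = np.sqrt(s_1y @ a_opt / s_yy)
    return r, a_opt

# Simulated data (hypothetical coefficients), observations in rows.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(40, 3))
y = X1 @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=40)
r, a_opt = sample_multiple_correlation(y, X1)
print(round(r, 4), round(np.corrcoef(y, X1 @ a_opt)[0, 1], 4))
```

The two printed numbers should coincide: the ordinary correlation between y and the linear combination a_opt'x1 equals the multiple correlation coefficient, and any other choice of coefficients gives a correlation no larger in absolute value.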
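Testing for independence between y and $\mathbf{x}_1$
The test statistic
$$
F = \frac{n-p-1}{p} \cdot \frac{r_{y \cdot x_1, \ldots, x_p}^2}{1 - r_{y \cdot x_1, \ldots, x_p}^2} = \frac{n-p-1}{p} \cdot \frac{\mathbf{s}_{1y}'S_{11}^{-1}\mathbf{s}_{1y}}{s_{yy} - \mathbf{s}_{1y}'S_{11}^{-1}\mathbf{s}_{1y}}
$$
If independence is true then the test statistic F will have an F distribution with $\nu_1 = p$ degrees of freedom in the numerator and $\nu_2 = n - p - 1$ degrees of freedom in the denominator.
The test is to reject independence if:
$$
F > F_{\alpha}(p,\; n-p-1)
$$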
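As a small worked sketch, the F statistic can be computed directly from a sample multiple correlation coefficient. The function name and the input values (r = 0.5, n = 65, p = 2) are hypothetical and chosen only for illustration. The critical value would be read from an F table with p and n − p − 1 degrees of freedom.

```python
def f_statistic_multiple_correlation(r, n, p):
    """F = ((n - p - 1) / p) * r^2 / (1 - r^2), with (p, n - p - 1) degrees of freedom."""
    return (n - p - 1) / p * r ** 2 / (1.0 - r ** 2)

# Hypothetical inputs: r = 0.5 from n = 65 cases and p = 2 predictors.
F = f_statistic_multiple_correlation(0.5, n=65, p=2)
df = (2, 65 - 2 - 1)
print(round(F, 3), df)
```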
Canonical Correlation Analysis
The problem
Quite often, when one has collected data on several variables, the variables are grouped into two (or more) sets and the researcher is interested in whether one set of variables is independent of the other set.
In addition, if it is found that the two sets of variates are dependent, it is then important to describe and understand the nature of this dependence.
The appropriate statistical procedure in this case is called Canonical Correlation Analysis.
Canonical Correlation: An Example
In the following study the researcher was interested in whether specific instructions on how to relax when taking tests and how to increase motivation would affect performance on standardized achievement tests in
• Reading,
• Language and
• Mathematics.
A group of 65 third- and fourth-grade students were rated after the instruction and immediately prior to taking the Scholastic Achievement tests on:
• how relaxed they were (X1) and
• how motivated they were (X2).
In addition, data were collected on the three achievement tests
• Reading (Y1),
• Language (Y2) and
• Mathematics (Y3).
The data are tabulated below.
X1 = Relaxation, X2 = Motivation, Y1 = Reading, Y2 = Language, Y3 = Math
Case   X1   X2    Y1    Y2    Y3
   1    7   14   311   436   154
   2   43   25   501   455   765
   3   32   21   507   473   702
   4   17   12   453   392   401
   5   23   12   419   337   284
   6   10   16   545   538   414
   7   22   21   509   512   491
   8   13   19   320   308   517
   9   31   21   357   296   496
  10   24   26   485   372   685
  11   26   21   811   748   902
  12   35   20   367   436   393
  13   24   17   242   349   137
  14   20    8   237   140   331
  15   38   27   417   648   618
  16   32   19   429   446   458
  17   14   11   555   579   438
  18   24   12   599   497   414
  19   38   25   403   383   606
  20   30    8   550   324   674
  21   22   25   377   496   242
  22   36   28   671   585   710
  23    3   22   498   488   481
  24   44   28   477   583   260
  25   24   25   609   413   670
  26   33   18   521   522   716
  27   24   21   495   645   491
  28   28   20   400   555   624
  29   34    7   258   175   276
  30   39   20   466   541   348
  31    7   19   709   757   589
  32   13   17   586   472   492
  33   32   18   418   361   428
  34   40   20   362   416   107
  35   40   18   596   592   622
  36   35   17   431   346   493
  37   33   17   361   414   404
  38   40   27   663   451   651
  39   31   15   569   462   398
  40   29   19   699   622   478
  41   37   16   187   223   221
  42   21   23  1132   839  1044
  43   24   15   457   410   400
  44   19   14   413   448   520
  45   33   22   569   605   615
  46   19   19   650   685   440
  47   26   22   424   427   482
  48   20   15   475   604   742
  49   22   21   519   612   446
  50   37   22   338   463   327
  51   41   28   674   613   534
  52   29   35   381   624   565
  53   25   12   199   171   316
  54   27   21   577   523   699
  55   22   20   425   466   402
  56    4   11   392   192   354
  57   27   22   401   520   558
  58   28   23   321   410   460
  59   33   20   682   433   743
  60   33   24   719   727  1052
  61   31   33   672   705   650
  62   20   11   366   309   537
  63   26   25   581   558   386
  64   23   10   681   530   581
  65   30   22  1019   917   880
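Definition: (Canonical variates and Canonical correlations)
Let
$$
\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}, \qquad \mathbf{x}_1 \text{ is } q \times 1, \quad \mathbf{x}_2 \text{ is } (p-q) \times 1
$$
have p-variate Normal distribution with
$$
\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}
$$
Let
$$
U_1 = \mathbf{a}_1'\mathbf{x}_1 = a_1^{(1)}x_1 + \cdots + a_q^{(1)}x_q
$$
and
$$
V_1 = \mathbf{b}_1'\mathbf{x}_2 = b_1^{(1)}x_{q+1} + \cdots + b_{p-q}^{(1)}x_p
$$
be such that $U_1$ and $V_1$ have achieved the maximum correlation $\phi_1$.
Then $U_1$ and $V_1$ are called the first pair of canonical variates and $\phi_1$ is called the first canonical correlation coefficient.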
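Derivation: (1st pair of Canonical variates and Canonical correlation)
Now
$$
\begin{bmatrix} U_1 \\ V_1 \end{bmatrix} = \begin{bmatrix} a_1^{(1)}x_1 + \cdots + a_q^{(1)}x_q \\ b_1^{(1)}x_{q+1} + \cdots + b_{p-q}^{(1)}x_p \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1'\mathbf{x}_1 \\ \mathbf{b}_1'\mathbf{x}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{b}_1' \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} = A\mathbf{x}
$$
Thus $\begin{bmatrix} U_1 \\ V_1 \end{bmatrix}$ has covariance matrix
$$
A\Sigma A' = \begin{bmatrix} \mathbf{a}_1' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{b}_1' \end{bmatrix} \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix} \begin{bmatrix} \mathbf{a}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{b}_1 \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1 \\ \mathbf{b}_1'\Sigma_{12}'\mathbf{a}_1 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1 \end{bmatrix}
$$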
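hence
$$
\rho_{U_1V_1} = \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\sqrt{\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1 \; \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}}
$$
Thus we want to choose $\mathbf{a}_1$ and $\mathbf{b}_1$ so that $\rho_{U_1V_1}$, or equivalently
$$
V = \rho_{U_1V_1}^2 = \frac{(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2}{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)}
$$
is at a maximum.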
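Computing derivatives
$$
\frac{\partial V}{\partial \mathbf{a}_1} = \frac{1}{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1} \cdot \frac{2(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)\,\Sigma_{12}\mathbf{b}_1\,(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1) - (\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2\,2\Sigma_{11}\mathbf{a}_1}{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)^2} = \mathbf{0}
$$
or
$$
\Sigma_{12}\mathbf{b}_1\,(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1) = (\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)\,\Sigma_{11}\mathbf{a}_1
$$
and
$$
\frac{\partial V}{\partial \mathbf{b}_1} = \frac{1}{\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1} \cdot \frac{2(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)\,\Sigma_{12}'\mathbf{a}_1\,(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1) - (\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2\,2\Sigma_{22}\mathbf{b}_1}{(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)^2} = \mathbf{0}
$$
or
$$
\Sigma_{12}'\mathbf{a}_1\,(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1) = (\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)\,\Sigma_{22}\mathbf{b}_1
$$
or
$$
\mathbf{b}_1 = \frac{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}\,\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1
$$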
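Thus, substituting $\mathbf{b}_1 = \dfrac{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}\,\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1$ into $\Sigma_{12}\mathbf{b}_1\,(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1) = (\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)\,\Sigma_{11}\mathbf{a}_1$:
$$
\frac{(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)}{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}\,\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1 = (\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)\,\Sigma_{11}\mathbf{a}_1
$$
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1 = \frac{(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2}{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)}\,\mathbf{a}_1 = k\,\mathbf{a}_1
$$
This shows that $\mathbf{a}_1$ is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$, with eigenvalue
$$
k = \frac{(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2}{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)} = \rho_{U_1V_1}^2
$$
Thus $\rho_{U_1V_1}^2$ is maximized when k is the largest eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\mathbf{a}_1$ is the eigenvector associated with the largest eigenvalue.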
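Also
$$
\frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}\,\Sigma_{22}\mathbf{b}_1 = \Sigma_{12}'\mathbf{a}_1 \quad \text{or} \quad \mathbf{b}_1 = \frac{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}\,\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1
$$
and
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1 = k\,\mathbf{a}_1, \qquad k = \frac{(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2}{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)}
$$
Premultiplying both sides by $\Sigma_{22}^{-1}\Sigma_{12}'$:
$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\left(\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1\right) = k\left(\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1\right)
$$
Since $\mathbf{b}_1$ is a scalar multiple of $\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1$,
$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\,\mathbf{b}_1 = \frac{(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2}{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)}\,\mathbf{b}_1 = k\,\mathbf{b}_1
$$
so $\mathbf{b}_1$ is an eigenvector of $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ associated with the same eigenvalue k.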
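Summary:
The first pair of canonical variates
$$
U_1 = \mathbf{a}_1'\mathbf{x}_1 = a_1^{(1)}x_1 + \cdots + a_q^{(1)}x_q
$$
$$
V_1 = \mathbf{b}_1'\mathbf{x}_2 = b_1^{(1)}x_{q+1} + \cdots + b_{p-q}^{(1)}x_p
$$
are found by finding $\mathbf{a}_1$ and $\mathbf{b}_1$, eigenvectors of the matrices
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}' \quad \text{and} \quad \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
respectively, associated with the largest eigenvalue (same for both matrices).
The largest eigenvalue of the two matrices is the square of the first canonical correlation coefficient $\phi_1$:
$$
\phi_1 = \sqrt{\text{the largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'} = \sqrt{\text{the largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}}
$$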
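Note:
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}' \quad \text{and} \quad \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
have exactly the same non-zero eigenvalues.
Proof:
Let $\lambda \neq 0$ and $\mathbf{a}$ be an eigenvalue and eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$.
Then
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a} = \lambda\mathbf{a}
$$
and
$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a} = \lambda\,\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}
$$
that is,
$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\,\mathbf{b} = \lambda\mathbf{b} \quad \text{where} \quad \mathbf{b} = \Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}
$$
Thus $\lambda$ and $\mathbf{b} = \Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}$ are an eigenvalue and eigenvector of $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$.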
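The eigenvalue relationship can be checked numerically. The following is a minimal NumPy sketch on simulated data (an assumption, as are the variable names and the choice q = 2, p = 5), using a sample covariance matrix in place of Σ. It builds both matrices, compares their eigenvalues, and confirms that b = S22⁻¹S12'a is an eigenvector of the second matrix.

```python
import numpy as np

# Simulated data with correlated columns (hypothetical), observations in rows.
rng = np.random.default_rng(3)
data = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
S = np.cov(data, rowvar=False)
q = 2                                   # first set: columns 0..1, second set: columns 2..4
S11, S12, S22 = S[:q, :q], S[:q, q:], S[q:, q:]

M1 = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)   # S11^-1 S12 S22^-1 S12'
M2 = np.linalg.solve(S22, S12.T) @ np.linalg.solve(S11, S12)   # S22^-1 S12' S11^-1 S12

w1, V1 = np.linalg.eig(M1)
w1, V1 = w1.real, V1.real               # eigenvalues are real in theory; drop round-off
order = np.argsort(w1)[::-1]
w1, V1 = w1[order], V1[:, order]
w2 = np.sort(np.linalg.eigvals(M2).real)[::-1]

a1 = V1[:, 0]                           # eigenvector for the largest eigenvalue of M1
b1 = np.linalg.solve(S22, S12.T @ a1)   # b = S22^-1 S12' a
U = data[:, :q] @ a1                    # first canonical variate of set 1
V = data[:, q:] @ b1                    # first canonical variate of set 2
print(np.round(w1, 4), np.round(w2, 4))
print(round(np.sqrt(w1[0]), 4), round(np.corrcoef(U, V)[0, 1], 4))
```

The second matrix is 3 × 3 here, so it carries one extra eigenvalue that is zero up to round-off. The last line shows that the sample correlation between U and V equals the square root of the largest eigenvalue.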
The remaining canonical variates and canonical correlation coefficients
The second pair of canonical variates
$$
U_2 = \mathbf{a}_2'\mathbf{x}_1 = a_1^{(2)}x_1 + \cdots + a_q^{(2)}x_q
$$
$$
V_2 = \mathbf{b}_2'\mathbf{x}_2 = b_1^{(2)}x_{q+1} + \cdots + b_{p-q}^{(2)}x_p
$$
are found by finding $\mathbf{a}_2$ and $\mathbf{b}_2$, so that
1. $(U_2, V_2)$ are independent of $(U_1, V_1)$.
2. The correlation between $U_2$ and $V_2$ is maximized.
The correlation, $\phi_2$, between $U_2$ and $V_2$ is called the second canonical correlation coefficient.
The ith pair of canonical variates
$$
U_i = \mathbf{a}_i'\mathbf{x}_1 = a_1^{(i)}x_1 + \cdots + a_q^{(i)}x_q
$$
$$
V_i = \mathbf{b}_i'\mathbf{x}_2 = b_1^{(i)}x_{q+1} + \cdots + b_{p-q}^{(i)}x_p
$$
are found by finding $\mathbf{a}_i$ and $\mathbf{b}_i$, so that
1. $(U_i, V_i)$ are independent of $(U_1, V_1), \ldots, (U_{i-1}, V_{i-1})$.
2. The correlation between $U_i$ and $V_i$ is maximized.
The correlation, $\phi_i$, between $U_i$ and $V_i$ is called the ith canonical correlation coefficient.
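Derivation: (2nd pair of Canonical variates and Canonical correlation)
Now
$$
\begin{bmatrix} U_1 \\ V_1 \\ U_2 \\ V_2 \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1'\mathbf{x}_1 \\ \mathbf{b}_1'\mathbf{x}_2 \\ \mathbf{a}_2'\mathbf{x}_1 \\ \mathbf{b}_2'\mathbf{x}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{b}_1' \\ \mathbf{a}_2' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{b}_2' \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} = A\mathbf{x}
$$
has covariance matrix
$$
A\Sigma A' = \begin{bmatrix} \mathbf{a}_1' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{b}_1' \\ \mathbf{a}_2' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{b}_2' \end{bmatrix} \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix} \begin{bmatrix} \mathbf{a}_1 & \mathbf{0} & \mathbf{a}_2 & \mathbf{0} \\ \mathbf{0} & \mathbf{b}_1 & \mathbf{0} & \mathbf{b}_2 \end{bmatrix}
$$
$$
= \begin{bmatrix} \mathbf{a}_1'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1 & \mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 \\ * & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1 & \mathbf{b}_1'\Sigma_{12}'\mathbf{a}_2 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_2 \\ * & * & \mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_2'\Sigma_{12}\mathbf{b}_2 \\ * & * & * & \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 \end{bmatrix}
$$
(entries marked * follow by symmetry).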
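Now
$$
\rho_{U_2V_2} = \frac{\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2}{\sqrt{\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 \; \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2}}
$$
and maximizing
$$
\rho_{U_2V_2}^2 = \frac{(\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2)^2}{(\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2)(\mathbf{b}_2'\Sigma_{22}\mathbf{b}_2)}
$$
is equivalent to maximizing
$$
(\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2)^2
$$
subject to
$$
\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 = 1, \quad \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 = 1, \quad \mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 = 0, \quad \mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 = 0, \quad \mathbf{b}_1'\Sigma_{12}'\mathbf{a}_2 = 0 \quad \text{and} \quad \mathbf{b}_1'\Sigma_{22}\mathbf{b}_2 = 0
$$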
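Using the Lagrange multiplier technique
$$
V = (\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2)^2 + \lambda_1(1 - \mathbf{a}_2'\Sigma_{11}\mathbf{a}_2) + \lambda_2(1 - \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2) + \lambda_3\,\mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 + \lambda_4\,\mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 + \lambda_5\,\mathbf{b}_1'\Sigma_{12}'\mathbf{a}_2 + \lambda_6\,\mathbf{b}_1'\Sigma_{22}\mathbf{b}_2
$$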
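Now
$$
\frac{\partial V}{\partial \mathbf{a}_2} = 2(\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2)\,\Sigma_{12}\mathbf{b}_2 - 2\lambda_1\Sigma_{11}\mathbf{a}_2 + \lambda_3\Sigma_{11}\mathbf{a}_1 + \lambda_5\Sigma_{12}\mathbf{b}_1 = \mathbf{0}
$$
and
$$
\frac{\partial V}{\partial \mathbf{b}_2} = 2(\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2)\,\Sigma_{12}'\mathbf{a}_2 - 2\lambda_2\Sigma_{22}\mathbf{b}_2 + \lambda_4\Sigma_{12}'\mathbf{a}_1 + \lambda_6\Sigma_{22}\mathbf{b}_1 = \mathbf{0}
$$
also
$$
\frac{\partial V}{\partial \lambda_i} = 0, \quad i = 1, \ldots, 6
$$
gives the restrictions.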
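These equations can be used to show that $\mathbf{a}_2$ and $\mathbf{b}_2$ are eigenvectors of the matrices
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}' \quad \text{and} \quad \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
respectively, associated with the 2nd largest eigenvalue (same for both matrices).
The 2nd largest eigenvalue of the two matrices is the square of the 2nd canonical correlation coefficient $\phi_2$:
$$
\phi_2 = \sqrt{\text{the 2nd largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'} = \sqrt{\text{the 2nd largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}}
$$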
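Continuing
Coefficients for the ith pair of canonical variates, $\mathbf{a}_i$ and $\mathbf{b}_i$, are eigenvectors of the matrices
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}' \quad \text{and} \quad \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
respectively, associated with the ith largest eigenvalue (same for both matrices).
The ith largest eigenvalue of the two matrices is the square of the ith canonical correlation coefficient $\phi_i$:
$$
\phi_i = \sqrt{\text{the } i\text{th largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'} = \sqrt{\text{the } i\text{th largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}}
$$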
Example
Variables
• Relaxation score (X1)
• Motivation score (X2)
• Reading (Y1),
• Language (Y2) and
• Mathematics (Y3).
Summary Statistics
UNIVARIATE SUMMARY STATISTICS
-----------------------------
                             STANDARD
VARIABLE          MEAN      DEVIATION
1 Relax       26.87692        9.50412
2 Mot         19.41538        5.83066
3 Read       499.03077      172.25508
4 Lang       485.83077      156.08957
5 Math       512.52308      195.18614
CORRELATIONS
------------
             Relax     Mot    Read    Lang    Math
                 1       2       3       4       5
Relax   1    1.000
Mot     2    0.391   1.000
Read    3    0.002   0.280   1.000
Lang    4    0.050   0.510   0.781   1.000
Math    5    0.127   0.340   0.713   0.556   1.000
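As a rough check, the canonical correlations can be recomputed from the printed correlations. The sketch below uses the correlation matrix R in place of the covariance matrix, which leaves the eigenvalues unchanged because canonical correlations do not depend on the scale of the variables. The variable names are assumptions, and since the inputs are rounded to three decimals, the results agree with the program output (eigenvalues 0.35029 and 0.02523) only approximately.

```python
import numpy as np

# Correlations as printed (order: Relax, Mot, Read, Lang, Math), rounded to 3 decimals.
R = np.array([
    [1.000, 0.391, 0.002, 0.050, 0.127],
    [0.391, 1.000, 0.280, 0.510, 0.340],
    [0.002, 0.280, 1.000, 0.781, 0.713],
    [0.050, 0.510, 0.781, 1.000, 0.556],
    [0.127, 0.340, 0.713, 0.556, 1.000],
])
R11, R12, R22 = R[:2, :2], R[:2, 2:], R[2:, 2:]   # set 1 = (Relax, Mot), set 2 = (Read, Lang, Math)

# R11^-1 R12 R22^-1 R12', whose eigenvalues are the squared canonical correlations
M = np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R12.T)
eigenvalues = np.sort(np.linalg.eigvals(M).real)[::-1]
canonical_corr = np.sqrt(eigenvalues)
print(np.round(eigenvalues, 3), np.round(canonical_corr, 3))
```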
Canonical Correlation Statistics
                              NUMBER OF     BARTLETT'S TEST FOR REMAINING EIGENVALUES
EIGENVALUE     CANONICAL      EIGENVALUES   CHI-SQUARE     D.F.     TAIL PROB.
               CORRELATION
                                              27.86          6        0.0001
0.35029        0.59186            1            1.56          2        0.4586
0.02523        0.15885
BARTLETT'S TEST ABOVE INDICATES THE NUMBER OF CANONICAL
VARIABLES NECESSARY TO EXPRESS THE DEPENDENCY BETWEEN THE
TWO SETS OF VARIABLES. THE NECESSARY NUMBER OF CANONICAL
VARIABLES IS THE SMALLEST NUMBER OF EIGENVALUES SUCH THAT
THE TEST OF THE REMAINING EIGENVALUES IS NON-SIGNIFICANT.
FOR EXAMPLE, IF A TEST AT THE .01 LEVEL WERE DESIRED,
THEN
1 VARIABLES WOULD BE CONSIDERED NECESSARY.
HOWEVER, THE NUMBER OF CANONICAL VARIABLES OF PRACTICAL
VALUE IS LIKELY TO BE SMALLER.
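The printout does not state the formula behind Bartlett's test. A common form of Bartlett's approximation, χ² = −(n − 1 − (q1 + q2 + 1)/2) Σ ln(1 − λ_i) over the remaining eigenvalues, with (q1 − k)(q2 − k) degrees of freedom after removing k eigenvalues, reproduces the printed chi-square values and degrees of freedom up to rounding. The sketch below rests on that assumption about what the program computes; the function name is hypothetical.

```python
import math

def bartlett_chi_square(eigenvalues, n, q1, q2):
    """For k = 0, 1, ... removed eigenvalues, return (k, chi_square, df) for the remaining ones."""
    factor = n - 1 - (q1 + q2 + 1) / 2.0
    results = []
    for k in range(len(eigenvalues)):
        remaining = eigenvalues[k:]
        chi2 = -factor * sum(math.log(1.0 - lam) for lam in remaining)
        df = (q1 - k) * (q2 - k)
        results.append((k, chi2, df))
    return results

# Eigenvalues from the printout; n = 65 students, 2 variables in the first set, 3 in the second.
results = bartlett_chi_square([0.35029, 0.02523], n=65, q1=2, q2=3)
for k, chi2, df in results:
    print(k, round(chi2, 2), df)
```

The first row (k = 0) tests all eigenvalues and the second row (k = 1) tests what remains after the first canonical correlation is removed, matching the reading of the table described in the program's note.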
CANONICAL VARIABLE LOADINGS
---------------------------
(CORRELATIONS OF CANONICAL VARIABLES WITH ORIGINAL VARIABLES)
FOR FIRST SET OF VARIABLES
             CNVRF1   CNVRF2
                  1        2
Relax   1     0.197    0.980
Mot     2     0.979    0.203
-----------------------------
CANONICAL VARIABLE LOADINGS
---------------------------
(CORRELATIONS OF CANONICAL VARIABLES WITH ORIGINAL VARIABLES)
FOR SECOND SET OF VARIABLES
             CNVRS1   CNVRS2
                  1        2
Read    3     0.504   -0.361
Lang    4     0.900   -0.354
Math    5     0.565    0.391
------------------------------
Summary
U1 = 0.197 Relax + 0.979 Mot
V1 = 0.504 Read + 0.900 Lang + 0.565 Math
$\phi_1$ = 0.592
U2 = 0.980 Relax + 0.203 Mot
V2 = 0.391 Math - 0.361 Read - 0.354 Lang
$\phi_2$ = 0.159