Differentiation with respect to a vector, matrix


Correlation
The sample covariance matrix:

$$
\underset{p\times p}{S} =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p}\\
s_{12} & s_{22} & \cdots & s_{2p}\\
\vdots & \vdots & & \vdots\\
s_{1p} & s_{2p} & \cdots & s_{pp}
\end{bmatrix}
\quad\text{where}\quad
s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij}-\bar{x}_i)(x_{kj}-\bar{x}_k)
$$
The sample correlation matrix:

$$
\underset{p\times p}{R} =
\begin{bmatrix}
1 & r_{12} & \cdots & r_{1p}\\
r_{12} & 1 & \cdots & r_{2p}\\
\vdots & \vdots & & \vdots\\
r_{1p} & r_{2p} & \cdots & 1
\end{bmatrix}
\quad\text{where}\quad
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}\,s_{kk}}}
= \frac{\sum_{j=1}^{n}(x_{ij}-\bar{x}_i)(x_{kj}-\bar{x}_k)}
{\sqrt{\sum_{j=1}^{n}(x_{ij}-\bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n}(x_{kj}-\bar{x}_k)^2}}
$$
Note:

$$
R = D^{-1} S D^{-1}
\quad\text{where}\quad
\underset{p\times p}{D} =
\begin{bmatrix}
\sqrt{s_{11}} & 0 & \cdots & 0\\
0 & \sqrt{s_{22}} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sqrt{s_{pp}}
\end{bmatrix}
$$
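As a quick numerical check, the relation $R = D^{-1}SD^{-1}$ can be verified with NumPy. This is only a sketch, assuming NumPy is available; the data matrix below is hypothetical.

```python
import numpy as np

# Hypothetical data matrix: n = 5 observations (rows) on p = 3 variables (columns)
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 3.0, 2.0],
              [4.0, 5.0, 6.0],
              [5.0, 4.0, 5.0]])

S = np.cov(X, rowvar=False)       # sample covariance matrix (divisor n - 1)
R = np.corrcoef(X, rowvar=False)  # sample correlation matrix

# D = diag(sqrt(s_11), ..., sqrt(s_pp)); the note above says R = D^{-1} S D^{-1}
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
print(np.allclose(R, D_inv @ S @ D_inv))  # True
```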
Tests for Independence and Non-zero Correlation

Test for zero correlation (independence between two variables)

The test statistic is
$$
t = r_{ij}\sqrt{\frac{n-2}{1-r_{ij}^2}}
$$
If independence is true then the test statistic t will have a t distribution with $\nu = n-2$ degrees of freedom. The test is to reject independence if:
$$
|t| > t_{\alpha/2}^{(n-2)}
$$
Test for non-zero correlation ($H_0:\ \rho = \rho_0$)

The test statistic is
$$
z = \frac{\tfrac{1}{2}\ln\!\left(\dfrac{1+r}{1-r}\right)
       - \tfrac{1}{2}\ln\!\left(\dfrac{1+\rho_0}{1-\rho_0}\right)}
      {\sqrt{\dfrac{1}{n-3}}}
$$
If $H_0$ is true the test statistic z will have approximately a standard Normal distribution. We then reject $H_0$ if:
$$
|z| > z_{\alpha/2}
$$
Partial Correlation
Conditional Independence
Recall: if
$$
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\begin{matrix} q \\ p-q \end{matrix}
$$
has a p-variate Normal distribution with mean vector
$$
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
\begin{matrix} q \\ p-q \end{matrix}
$$
and covariance matrix
$$
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}
$$
then the conditional distribution of $x_i$ given $x_j$ is multivariate Normal with mean vector
$$
\mu_{i\cdot j} = \mu_i + \Sigma_{ij}\Sigma_{jj}^{-1}(x_j - \mu_j)
$$
and covariance matrix
$$
\Sigma_{ii\cdot j} = \Sigma_{ii} - \Sigma_{ij}\Sigma_{jj}^{-1}\Sigma_{ji}
$$
The matrix
$$
\Sigma_{2\cdot 1} = \Sigma_{22} - \Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
is called the matrix of partial variances and covariances. The (i, j)th element of $\Sigma_{2\cdot 1}$, $\sigma_{ij\cdot 1,2,\dots,q}$, is called the partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1,\dots,x_q$, and
$$
\rho_{ij\cdot 1,2,\dots,q} = \frac{\sigma_{ij\cdot 1,2,\dots,q}}
{\sqrt{\sigma_{ii\cdot 1,2,\dots,q}\,\sigma_{jj\cdot 1,2,\dots,q}}}
$$
is called the partial correlation between $x_i$ and $x_j$ given $x_1,\dots,x_q$.
Let
$$
S = \begin{bmatrix} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{bmatrix}
$$
denote the sample covariance matrix, and let
$$
S_{2\cdot 1} = S_{22} - S_{12}'S_{11}^{-1}S_{12}
$$
The (i, j)th element of $S_{2\cdot 1}$, $s_{ij\cdot 1,2,\dots,q}$, is called the sample partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1,\dots,x_q$. Also
$$
r_{ij\cdot 1,2,\dots,q} = \frac{s_{ij\cdot 1,2,\dots,q}}
{\sqrt{s_{ii\cdot 1,2,\dots,q}\,s_{jj\cdot 1,2,\dots,q}}}
$$
is called the sample partial correlation between $x_i$ and $x_j$ given $x_1,\dots,x_q$.
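The block formula for $S_{2\cdot 1}$ translates directly into code. A sketch assuming NumPy; the data matrix is hypothetical, and for q = 1 the result can be checked against the familiar first-order formula $r_{23\cdot 1} = (r_{23}-r_{12}r_{13})/\sqrt{(1-r_{12}^2)(1-r_{13}^2)}$.

```python
import numpy as np

def sample_partial_correlations(S, q):
    """Sample partial covariances S_{2.1} = S22 - S12' S11^{-1} S12 and the
    corresponding partial correlations, conditioning on the first q variables."""
    S11, S12 = S[:q, :q], S[:q, q:]
    S22 = S[q:, q:]
    S2_1 = S22 - S12.T @ np.linalg.solve(S11, S12)
    d = np.sqrt(np.diag(S2_1))
    return S2_1, S2_1 / np.outer(d, d)

# Hypothetical data: 6 observations on 3 variables; condition on the first one
X = np.array([[1.0, 2.0, 1.5],
              [2.0, 1.0, 2.5],
              [3.0, 4.0, 3.0],
              [4.0, 3.0, 5.0],
              [5.0, 6.0, 4.5],
              [6.0, 5.0, 6.5]])
S = np.cov(X, rowvar=False)
S2_1, R2_1 = sample_partial_correlations(S, q=1)
```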
Test for zero partial correlation (conditional independence between two variables given a set of p other variables)

The test statistic is
$$
t = r_{ij\cdot x_1,\dots,x_p}\sqrt{\frac{n-p-2}{1-r_{ij\cdot x_1,\dots,x_p}^{2}}}
$$
where $r_{ij\cdot x_1,\dots,x_p}$ is the sample partial correlation between $x_i$ and $x_j$ given $x_1,\dots,x_p$. If conditional independence is true then the test statistic t will have a t distribution with $\nu = n-p-2$ degrees of freedom. The test is to reject independence if:
$$
|t| > t_{\alpha/2}^{(n-p-2)}
$$
Test for non-zero partial correlation
($H_0:\ \rho_{ij\cdot x_1,\dots,x_p} = \rho^{0}_{ij\cdot x_1,\dots,x_p}$)

The test statistic is
$$
z = \frac{\tfrac{1}{2}\ln\!\left(\dfrac{1+r_{ij\cdot x_1,\dots,x_p}}{1-r_{ij\cdot x_1,\dots,x_p}}\right)
       - \tfrac{1}{2}\ln\!\left(\dfrac{1+\rho^{0}_{ij\cdot x_1,\dots,x_p}}{1-\rho^{0}_{ij\cdot x_1,\dots,x_p}}\right)}
      {\sqrt{\dfrac{1}{n-p-3}}}
$$
If $H_0$ is true the test statistic z will have approximately a standard Normal distribution. We then reject $H_0$ if:
$$
|z| > z_{\alpha/2}
$$




The Multiple Correlation Coefficient

Testing independence between a single variable and a group of variables
Definition

Suppose
$$
x = \begin{bmatrix} y \\ x_1 \end{bmatrix}
\begin{matrix} 1 \\ p \end{matrix}
$$
has a (p + 1)-variate Normal distribution with mean vector
$$
\mu = \begin{bmatrix} \mu_y \\ \mu_1 \end{bmatrix}
\begin{matrix} 1 \\ p \end{matrix}
$$
and covariance matrix
$$
\Sigma = \begin{bmatrix} \sigma_{yy} & \sigma_{1y}' \\ \sigma_{1y} & \Sigma_{11} \end{bmatrix}
$$
We are interested in whether the variable y is independent of the vector $x_1$. The multiple correlation coefficient is the maximum correlation between y and a linear combination of the components of $x_1$.
Derivation

Let
$$
u = \begin{bmatrix} y \\ v \end{bmatrix}
  = \begin{bmatrix} y \\ a'x_1 \end{bmatrix}
  = \begin{bmatrix} 1 & 0' \\ 0 & a' \end{bmatrix}
    \begin{bmatrix} y \\ x_1 \end{bmatrix} = Ax
$$
This vector has a bivariate Normal distribution with mean vector
$$
A\mu = \begin{bmatrix} \mu_y \\ a'\mu_1 \end{bmatrix}
$$
and covariance matrix
$$
A\Sigma A' = \begin{bmatrix} \sigma_{yy} & \sigma_{1y}'a \\ a'\sigma_{1y} & a'\Sigma_{11}a \end{bmatrix}
$$
The multiple correlation coefficient is the maximum correlation between y and $a'x_1$. The correlation between y and $a'x_1$ is
$$
\rho(a) = \frac{\sigma_{1y}'a}{\sqrt{\sigma_{yy}\,a'\Sigma_{11}a}}
$$
Thus we want to choose a to maximize $\rho(a)$, or equivalently to maximize
$$
\rho^2(a) = \frac{a'\sigma_{1y}\sigma_{1y}'a}{\sigma_{yy}\,a'\Sigma_{11}a}
$$
Differentiating with respect to a:
$$
\frac{d\,\rho^2(a)}{da}
= \frac{1}{\sigma_{yy}}\cdot
  \frac{2\,\sigma_{1y}(\sigma_{1y}'a)\,(a'\Sigma_{11}a) - 2\,\Sigma_{11}a\,(a'\sigma_{1y})(\sigma_{1y}'a)}
       {(a'\Sigma_{11}a)^2} = 0
$$
hence
$$
(a'\Sigma_{11}a)\,\sigma_{1y} = (a'\sigma_{1y})\,\Sigma_{11}a
$$
or
$$
a_{\mathrm{opt}} = \frac{a'\Sigma_{11}a}{a'\sigma_{1y}}\,\Sigma_{11}^{-1}\sigma_{1y}
               = k\,\Sigma_{11}^{-1}\sigma_{1y}
$$
The multiple correlation coefficient is independent of the value of k.
$$
\rho_{y\mid x_1,\dots,x_p} = \rho(a_{\mathrm{opt}})
= \frac{\sigma_{1y}'a_{\mathrm{opt}}}{\sqrt{\sigma_{yy}\,a_{\mathrm{opt}}'\Sigma_{11}a_{\mathrm{opt}}}}
= \frac{k\,\sigma_{1y}'\Sigma_{11}^{-1}\sigma_{1y}}
       {\sqrt{\sigma_{yy}\,k^2\,\sigma_{1y}'\Sigma_{11}^{-1}\Sigma_{11}\Sigma_{11}^{-1}\sigma_{1y}}}
= \sqrt{\frac{\sigma_{1y}'\Sigma_{11}^{-1}\sigma_{1y}}{\sigma_{yy}}}
$$
The variable y is independent of the vector $x_1$ if $\sigma_{1y} = 0$, and in that case
$$
\rho_{y\mid x_1,\dots,x_p} = \sqrt{\frac{\sigma_{1y}'\Sigma_{11}^{-1}\sigma_{1y}}{\sigma_{yy}}} = 0
$$
The sample multiple correlation coefficient

Let
$$
S = \begin{bmatrix} s_{yy} & s_{1y}' \\ s_{1y} & S_{11} \end{bmatrix}
$$
denote the sample covariance matrix. Then the sample multiple correlation coefficient is
$$
r_{y\mid x_1,\dots,x_p} = \sqrt{\frac{s_{1y}'S_{11}^{-1}s_{1y}}{s_{yy}}}
$$
Testing for independence between y and $x_1$

The test statistic is
$$
F = \frac{n-p-1}{p}\cdot\frac{r^2_{y\mid x_1,\dots,x_p}}{1-r^2_{y\mid x_1,\dots,x_p}}
  = \frac{n-p-1}{p}\cdot\frac{s_{1y}'S_{11}^{-1}s_{1y}}{s_{yy}-s_{1y}'S_{11}^{-1}s_{1y}}
$$
If independence is true then the test statistic F will have an F distribution with $\nu_1 = p$ degrees of freedom in the numerator and $\nu_2 = n-p-1$ degrees of freedom in the denominator. The test is to reject independence if:
$$
F > F_{\alpha}(p,\,n-p-1)
$$
Canonical Correlation Analysis
The problem
Quite often one has collected data on several variables. The variables are grouped into two (or more) sets, and the researcher is interested in whether one set of variables is independent of the other set. In addition, if the two sets of variates are found to be dependent, it is then important to describe and understand the nature of this dependence. The appropriate statistical procedure in this case is called Canonical Correlation Analysis.
Canonical Correlation: An Example
In the following study the researcher was interested in whether specific instructions on how to relax when taking tests and how to increase motivation would affect performance on three standardized achievement tests:
• Reading,
• Language and
• Mathematics
A group of 65 third- and fourth-grade students were rated, after the instruction and immediately prior to taking the Scholastic Achievement tests, on:
• how relaxed they were (X1) and
• how motivated they were (X2).
In addition, data were collected on the three achievement tests:
• Reading (Y1),
• Language (Y2) and
• Mathematics (Y3).
The data are tabulated on the next page.
Case  Relaxation (X1)  Motivation (X2)  Reading (Y1)  Language (Y2)  Math (Y3)
  1         7               14              311           436           154
  2        43               25              501           455           765
  3        32               21              507           473           702
  4        17               12              453           392           401
  5        23               12              419           337           284
  6        10               16              545           538           414
  7        22               21              509           512           491
  8        13               19              320           308           517
  9        31               21              357           296           496
 10        24               26              485           372           685
 11        26               21              811           748           902
 12        35               20              367           436           393
 13        24               17              242           349           137
 14        20                8              237           140           331
 15        38               27              417           648           618
 16        32               19              429           446           458
 17        14               11              555           579           438
 18        24               12              599           497           414
 19        38               25              403           383           606
 20        30                8              550           324           674
 21        22               25              377           496           242
 22        36               28              671           585           710
 23         3               22              498           488           481
 24        44               28              477           583           260
 25        24               25              609           413           670
 26        33               18              521           522           716
 27        24               21              495           645           491
 28        28               20              400           555           624
 29        34                7              258           175           276
 30        39               20              466           541           348
 31         7               19              709           757           589
 32        13               17              586           472           492
 33        32               18              418           361           428
Case  Relaxation (X1)  Motivation (X2)  Reading (Y1)  Language (Y2)  Math (Y3)
 34        40               20              362           416           107
 35        40               18              596           592           622
 36        35               17              431           346           493
 37        33               17              361           414           404
 38        40               27              663           451           651
 39        31               15              569           462           398
 40        29               19              699           622           478
 41        37               16              187           223           221
 42        21               23             1132           839          1044
 43        24               15              457           410           400
 44        19               14              413           448           520
 45        33               22              569           605           615
 46        19               19              650           685           440
 47        26               22              424           427           482
 48        20               15              475           604           742
 49        22               21              519           612           446
 50        37               22              338           463           327
 51        41               28              674           613           534
 52        29               35              381           624           565
 53        25               12              199           171           316
 54        27               21              577           523           699
 55        22               20              425           466           402
 56         4               11              392           192           354
 57        27               22              401           520           558
 58        28               23              321           410           460
 59        33               20              682           433           743
 60        33               24              719           727          1052
 61        31               33              672           705           650
 62        20               11              366           309           537
 63        26               25              581           558           386
 64        23               10              681           530           581
 65        30               22             1019           917           880
Definition (canonical variates and canonical correlations)

Let
$$
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\begin{matrix} q \\ p-q \end{matrix}
$$
have a p-variate Normal distribution with
$$
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
\begin{matrix} q \\ p-q \end{matrix}
\quad\text{and}\quad
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}
$$
Let
$$
U_1 = a_1'x_1 = a_1^{(1)}x_1 + \dots + a_q^{(1)}x_q
$$
and
$$
V_1 = b_1'x_2 = b_1^{(1)}x_{q+1} + \dots + b_{p-q}^{(1)}x_p
$$
be such that $U_1$ and $V_1$ have achieved the maximum correlation $\phi_1$. Then $U_1$ and $V_1$ are called the first pair of canonical variates and $\phi_1$ is called the first canonical correlation coefficient.
Derivation (1st pair of canonical variates and canonical correlation)

Now
$$
\begin{bmatrix} U_1 \\ V_1 \end{bmatrix}
= \begin{bmatrix} a_1'x_1 \\ b_1'x_2 \end{bmatrix}
= \begin{bmatrix} a_1' & 0' \\ 0' & b_1' \end{bmatrix}
  \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = Ax
$$
Thus $(U_1, V_1)'$ has covariance matrix
$$
A\Sigma A' =
\begin{bmatrix} a_1' & 0' \\ 0' & b_1' \end{bmatrix}
\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}
\begin{bmatrix} a_1 & 0 \\ 0 & b_1 \end{bmatrix}
= \begin{bmatrix} a_1'\Sigma_{11}a_1 & a_1'\Sigma_{12}b_1 \\ b_1'\Sigma_{12}'a_1 & b_1'\Sigma_{22}b_1 \end{bmatrix}
$$
hence
$$
\rho_{U_1V_1} = \frac{a_1'\Sigma_{12}b_1}{\sqrt{a_1'\Sigma_{11}a_1\;b_1'\Sigma_{22}b_1}}
$$
Thus we want to choose $a_1$ and $b_1$ so that $\rho_{U_1V_1}$, or equivalently
$$
V = \rho^2_{U_1V_1} = \frac{(a_1'\Sigma_{12}b_1)^2}{(a_1'\Sigma_{11}a_1)(b_1'\Sigma_{22}b_1)}
$$
is at a maximum.
Computing derivatives:
$$
\frac{\partial V}{\partial a_1}
= \frac{2(a_1'\Sigma_{12}b_1)\,\Sigma_{12}b_1\,(a_1'\Sigma_{11}a_1) - (a_1'\Sigma_{12}b_1)^2\,2\Sigma_{11}a_1}
       {(a_1'\Sigma_{11}a_1)^2\,(b_1'\Sigma_{22}b_1)} = 0
$$
and
$$
\frac{\partial V}{\partial b_1}
= \frac{2(a_1'\Sigma_{12}b_1)\,\Sigma_{12}'a_1\,(b_1'\Sigma_{22}b_1) - (a_1'\Sigma_{12}b_1)^2\,2\Sigma_{22}b_1}
       {(a_1'\Sigma_{11}a_1)\,(b_1'\Sigma_{22}b_1)^2} = 0
$$
From the second equation,
$$
b_1 = \frac{b_1'\Sigma_{22}b_1}{a_1'\Sigma_{12}b_1}\,\Sigma_{22}^{-1}\Sigma_{12}'a_1
$$
Substituting this into the first equation gives
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\,a_1 = k\,a_1
\quad\text{where}\quad
k = \frac{(a_1'\Sigma_{12}b_1)^2}{(a_1'\Sigma_{11}a_1)(b_1'\Sigma_{22}b_1)} = \rho^2_{U_1V_1}
$$
This shows that $a_1$ is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$. Thus $\rho^2_{U_1V_1}$ is maximized when k is the largest eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $a_1$ is the eigenvector associated with the largest eigenvalue.
Also, from
$$
b_1 = \frac{b_1'\Sigma_{22}b_1}{a_1'\Sigma_{12}b_1}\,\Sigma_{22}^{-1}\Sigma_{12}'a_1
$$
the same argument with the roles of $a_1$ and $b_1$ interchanged shows that
$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\,b_1 = k\,b_1
\quad\text{with}\quad k = \rho^2_{U_1V_1}
$$
so $b_1$ is the eigenvector of $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ associated with the largest eigenvalue.
Summary:

The first pair of canonical variates
$$
U_1 = a_1'x_1 = a_1^{(1)}x_1 + \dots + a_q^{(1)}x_q,
\qquad
V_1 = b_1'x_2 = b_1^{(1)}x_{q+1} + \dots + b_{p-q}^{(1)}x_p
$$
are found by finding $a_1$ and $b_1$, eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and
$\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively,
associated with the largest eigenvalue (the same for both matrices).
The largest eigenvalue of the two matrices is the square of the first canonical correlation coefficient $\phi_1$:
$$
\phi_1^2 = \text{largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'
         = \text{largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
Note: $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ have exactly the same nonzero eigenvalues.

Proof: let $\lambda$ and a be an eigenvalue and eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$. Then
$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\,a = \lambda a
$$
Multiplying on the left by $\Sigma_{22}^{-1}\Sigma_{12}'$:
$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\,\bigl(\Sigma_{22}^{-1}\Sigma_{12}'a\bigr)
= \lambda\,\bigl(\Sigma_{22}^{-1}\Sigma_{12}'a\bigr)
$$
Thus $\lambda$ and $b = \Sigma_{22}^{-1}\Sigma_{12}'a$ are an eigenvalue and eigenvector of $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$.
The remaining canonical variates and canonical correlation coefficients

The second pair of canonical variates
$$
U_2 = a_2'x_1 = a_1^{(2)}x_1 + \dots + a_q^{(2)}x_q,
\qquad
V_2 = b_2'x_2 = b_1^{(2)}x_{q+1} + \dots + b_{p-q}^{(2)}x_p
$$
are found by finding $a_2$ and $b_2$ so that
1. $(U_2, V_2)$ are independent of $(U_1, V_1)$.
2. The correlation between $U_2$ and $V_2$ is maximized.
The correlation, $\phi_2$, between $U_2$ and $V_2$ is called the second canonical correlation coefficient.

The ith pair of canonical variates
$$
U_i = a_i'x_1 = a_1^{(i)}x_1 + \dots + a_q^{(i)}x_q,
\qquad
V_i = b_i'x_2 = b_1^{(i)}x_{q+1} + \dots + b_{p-q}^{(i)}x_p
$$
are found by finding $a_i$ and $b_i$ so that
1. $(U_i, V_i)$ are independent of $(U_1, V_1), \dots, (U_{i-1}, V_{i-1})$.
2. The correlation between $U_i$ and $V_i$ is maximized.
The correlation, $\phi_i$, between $U_i$ and $V_i$ is called the ith canonical correlation coefficient.
Derivation (2nd pair of canonical variates and canonical correlation)

Now
$$
\begin{bmatrix} U_1 \\ V_1 \\ U_2 \\ V_2 \end{bmatrix}
= \begin{bmatrix} a_1'x_1 \\ b_1'x_2 \\ a_2'x_1 \\ b_2'x_2 \end{bmatrix}
= \begin{bmatrix} a_1' & 0' \\ 0' & b_1' \\ a_2' & 0' \\ 0' & b_2' \end{bmatrix} x = Ax
$$
has covariance matrix
$$
A\Sigma A' =
\begin{bmatrix}
a_1'\Sigma_{11}a_1 & a_1'\Sigma_{12}b_1 & a_1'\Sigma_{11}a_2 & a_1'\Sigma_{12}b_2 \\
b_1'\Sigma_{12}'a_1 & b_1'\Sigma_{22}b_1 & b_1'\Sigma_{12}'a_2 & b_1'\Sigma_{22}b_2 \\
* & * & a_2'\Sigma_{11}a_2 & a_2'\Sigma_{12}b_2 \\
* & * & * & b_2'\Sigma_{22}b_2
\end{bmatrix}
$$
Now
$$
\rho_{U_2V_2} = \frac{a_2'\Sigma_{12}b_2}{\sqrt{a_2'\Sigma_{11}a_2\;b_2'\Sigma_{22}b_2}}
$$
a  b 


 a  a   b b 
2

2
U 2V2
and maximizing
2
2
Is equivalent to maximizing
11 2
12 2
2
22 2
 a  b 
2
2
12 2
subject to
 a2  0
a2 11a2  1, b222b2  1, a111a2  0, a112b2  0, b112
and b122b2
Using the Lagrange multiplier technique

V  a2 12b2

2

 1 1  a2 11a2   2 1  b222b2

 a2  6b122b2
3a111a2  4a112b2  5b112

V  a2 12b2

2

 1 1  a2 11a2   2 1  b222b2

 a2  6b122b2
3a111a2  4a112b2  5b112
Now
$$
\frac{\partial V}{\partial a_2}
= \Sigma_{12}b_2 - 2\lambda_1\Sigma_{11}a_2 + \lambda_3\Sigma_{11}a_1 + \lambda_5\Sigma_{12}b_1 = 0
$$
and
$$
\frac{\partial V}{\partial b_2}
= \Sigma_{12}'a_2 - 2\lambda_2\Sigma_{22}b_2 + \lambda_4\Sigma_{12}'a_1 + \lambda_6\Sigma_{22}b_1 = 0
$$
also
$$
\frac{\partial V}{\partial \lambda_i} = 0,\quad i = 1,\dots,6
$$
gives the restrictions. These equations can be used to show that $a_2$ and $b_2$ are eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and
$\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively,
associated with the 2nd largest eigenvalue (the same for both matrices).
The 2nd largest eigenvalue of the two matrices is the square of the 2nd canonical correlation coefficient $\phi_2$:
$$
\phi_2^2 = \text{2nd largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'
         = \text{2nd largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
Continuing, the coefficients for the ith pair of canonical variates, $a_i$ and $b_i$, are eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and
$\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively,
associated with the ith largest eigenvalue (the same for both matrices).
The ith largest eigenvalue of the two matrices is the square of the ith canonical correlation coefficient $\phi_i$:
$$
\phi_i^2 = i\text{th largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'
         = i\text{th largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
Example

Variables:
• Relaxation score (X1)
• Motivation score (X2)
• Reading (Y1)
• Language (Y2)
• Mathematics (Y3)
Summary Statistics

UNIVARIATE SUMMARY STATISTICS

VARIABLE         MEAN        STANDARD DEVIATION
1  Relax       26.87692          9.50412
2  Mot         19.41538          5.83066
3  Read       499.03077        172.25508
4  Lang       485.83077        156.08957
5  Math       512.52308        195.18614
CORRELATIONS

           Relax    Mot     Read    Lang    Math
             1       2       3       4       5
Relax  1   1.000
Mot    2   0.391   1.000
Read   3   0.002   0.280   1.000
Lang   4   0.050   0.510   0.781   1.000
Math   5   0.127   0.340   0.713   0.556   1.000
Canonical Correlation Statistics

EIGENVALUE   CANONICAL      BARTLETT'S TEST FOR REMAINING EIGENVALUES
             CORRELATION    CHI-SQUARE    D.F.    TAIL PROB.
 0.35029       0.59186         27.86        6       0.0001
 0.02523       0.15885          1.56        2       0.4586

NUMBER OF EIGENVALUES: 1
BARTLETT'S TEST ABOVE INDICATES THE NUMBER OF CANONICAL
VARIABLES NECESSARY TO EXPRESS THE DEPENDENCY BETWEEN THE
TWO SETS OF VARIABLES. THE NECESSARY NUMBER OF CANONICAL
VARIABLES IS THE SMALLEST NUMBER OF EIGENVALUES SUCH THAT
THE TEST OF THE REMAINING EIGENVALUES IS NON-SIGNIFICANT.
FOR EXAMPLE, IF A TEST AT THE .01 LEVEL WERE DESIRED,
THEN
1 VARIABLES WOULD BE CONSIDERED NECESSARY.
HOWEVER, THE NUMBER OF CANONICAL VARIABLES OF PRACTICAL
VALUE IS LIKELY TO BE SMALLER.
continued
CANONICAL VARIABLE LOADINGS
(CORRELATIONS OF CANONICAL VARIABLES WITH ORIGINAL VARIABLES)

FOR FIRST SET OF VARIABLES
            CNVRF1    CNVRF2
              1         2
Relax  1    0.197     0.980
Mot    2    0.979     0.203

FOR SECOND SET OF VARIABLES
            CNVRS1    CNVRS2
              1         2
Read   3    0.504    -0.361
Lang   4    0.900    -0.354
Math   5    0.565     0.391
Summary

U1 = 0.197 Relax + 0.979 Mot
V1 = 0.504 Read + 0.900 Lang + 0.565 Math
φ1 = 0.592

U2 = 0.980 Relax + 0.203 Mot
V2 = 0.391 Math − 0.361 Read − 0.354 Lang
φ2 = 0.159
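As a cross-check, the canonical correlations reported above can be reproduced from the correlation matrix in the summary statistics. A sketch assuming NumPy; small discrepancies arise because the printed correlations are rounded to three decimals.

```python
import numpy as np

# Correlation matrix from the summary statistics (order: Relax, Mot, Read, Lang, Math)
R = np.array([[1.000, 0.391, 0.002, 0.050, 0.127],
              [0.391, 1.000, 0.280, 0.510, 0.340],
              [0.002, 0.280, 1.000, 0.781, 0.713],
              [0.050, 0.510, 0.781, 1.000, 0.556],
              [0.127, 0.340, 0.713, 0.556, 1.000]])

q = 2  # first set: Relax, Mot; second set: Read, Lang, Math
R11, R12, R22 = R[:q, :q], R[:q, q:], R[q:, q:]

# Squared canonical correlations = eigenvalues of R11^{-1} R12 R22^{-1} R12'
M = np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R12.T)
phi = np.sqrt(np.sort(np.linalg.eigvals(M).real)[::-1])
# phi[0] ~ 0.59 and phi[1] ~ 0.16, matching the printed 0.59186 and 0.15885
```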