Correlation and Covariance
R. F. Riesenfeld
(Based on web slides by
James H. Steiger)
Goals
⇨ Introduce concepts of
    Covariance
    Correlation
⇨ Develop computational formulas
R F Riesenfeld Sp 2010
CS5961 Comp Stat
Covariance
⇨ Variables may change in relation to each
other
⇨ Covariance measures how much the
movement in one variable predicts the
movement in a corresponding variable
Smoking and Lung Capacity
⇨ Example: investigate relationship
between cigarette smoking and lung
capacity
⇨ Data: a sample group's responses on
smoking habits, paired with their measured
lung capacities
Smoking v Lung Capacity Data
N    Cigarettes (X)    Lung Capacity (Y)
1    0                 45
2    5                 42
3    10                33
4    15                31
5    20                29
Smoking and Lung Capacity
[Figure: scatter plot of Lung Capacity (Y) versus Smoking (yrs)]
Smoking v Lung Capacity
⇨ Observe that as smoking exposure goes
up, corresponding lung capacity goes
down
⇨ Variables covary inversely
⇨ Covariance and Correlation quantify
relationship
Covariance
⇨ Variables that covary inversely, like
smoking and lung capacity, tend to
appear on opposite sides of the group
means

When smoking is above its group mean, lung
capacity tends to be below its group mean.
⇨ The average product of deviations measures
the extent to which the variables covary, the
degree of linkage between them
The Sample Covariance
⇨ Similar to variance, for theoretical
reasons, average is typically computed
using (N - 1), not N. Thus,
$$S_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$
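The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `sample_covariance` and the variable names are assumptions chosen here, and the data are the five smoking observations from the table.

```python
from typing import Sequence


def sample_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum of products of deviations from the means, divided by (N - 1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Smoking data from the slides (hypothetical variable names)
cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

s_xy = sample_covariance(cigs, lung_cap)
print(s_xy)  # -53.75
```

The negative sign reflects the inverse relationship: above-average smoking pairs with below-average lung capacity.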
Calculating Covariance
Cigs (X)    Lung Cap (Y)
0           45
5           42
10          33
15          31
20          29
$\bar{X} = 10$    $\bar{Y} = 36$
Calculating Covariance
Cigs (X)    $(X - \bar{X})$    $(X - \bar{X})(Y - \bar{Y})$    $(Y - \bar{Y})$    Cap (Y)
0           -10                -90                             9                  45
5           -5                 -30                             6                  42
10          0                  0                               -3                 33
15          5                  -25                             -5                 31
20          10                 -70                             -7                 29
                               Σ = -215
Covariance Calculation (2)
Evaluation yields,
$$S_{xy} = \frac{1}{4}(-215) = -53.75$$
Covariance under Affine Transformation
Let $L_i = aX_i + b$ and $M_i = cY_i + d$. Then,
$$\delta l_i = a\,\delta x_i, \qquad \delta m_i = c\,\delta y_i,$$
where $\delta u_i = u_i - \bar{u}$.
Evaluating, in turn, gives,
$$S_{LM} = \frac{1}{N-1} \sum_{i=1}^{N} \delta l_i\,\delta m_i$$
Covariance under Affine Transf (2)
Evaluating further,
$$\begin{aligned}
S_{LM} &= \frac{1}{N-1} \sum_{i=1}^{N} \delta l_i\,\delta m_i \\
       &= \frac{1}{N-1} \sum_{i=1}^{N} a\,\delta x_i\, c\,\delta y_i \\
       &= ac\,\frac{1}{N-1} \sum_{i=1}^{N} \delta x_i\,\delta y_i
\end{aligned}$$
$$\therefore\; S_{LM} = ac\,S_{xy}$$
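The scaling result can be checked numerically. The sketch below is an illustration only: the constants a, b, c, d are arbitrary values picked here, and `sample_covariance` is a hypothetical helper implementing the (N - 1) definition of covariance.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the (N - 1) divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Smoking data from the slides
cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

# Arbitrary illustrative constants (assumptions, not from the slides)
a, b, c, d = 2.0, 3.0, -0.5, 7.0

l_vals = [a * x + b for x in cigs]      # L_i = a X_i + b
m_vals = [c * y + d for y in lung_cap]  # M_i = c Y_i + d

s_xy = sample_covariance(cigs, lung_cap)
s_lm = sample_covariance(l_vals, m_vals)

# The shifts b and d drop out; only the scale factors a and c remain.
print(s_lm, a * c * s_xy)
```

Both printed values agree up to floating-point rounding, which matches the derivation: the additive constants cancel in the deviations and the product ac factors out of the sum.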
(Pearson) Correlation Coefficient $r_{xy}$
⇨ Like covariance, but uses Z-values (standardized
scores) instead of deviations. Hence, invariant under
linear transformation of the raw data (up to a sign
change when exactly one scale factor is negative).
$$r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} z_{x_i} z_{y_i}$$
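A short sketch of the Z-value form follows. It assumes each Z-value is the deviation divided by the sample standard deviation computed with (N - 1), which the slide does not state explicitly; the helper names `z_scores` and `pearson_r` are hypothetical.

```python
import math


def z_scores(values):
    """Standardize values using the sample standard deviation (N - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(xs, ys):
    """Sum of products of paired Z-values, divided by (N - 1)."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(p * q for p, q in zip(zx, zy)) / (len(xs) - 1)


# Smoking data from the slides
cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

r = pearson_r(cigs, lung_cap)

# Rescaling with positive factors (arbitrary illustrative values) leaves r unchanged.
r_rescaled = pearson_r([2.0 * x + 3.0 for x in cigs],
                       [0.5 * y + 7.0 for y in lung_cap])
print(round(r, 4), round(r_rescaled, 4))
```

Standardizing removes both the units and the offsets of the raw data, which is why the rescaled data give the same coefficient.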
Alternative (common) Expression
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
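As a worked check on the smoking data, assuming $s_x$ and $s_y$ denote sample standard deviations computed with (N - 1) (the slides do not spell this out), the ratio can be evaluated from the deviations already tabulated:

```latex
\begin{aligned}
s_x^2 &= \tfrac{1}{4}\left(100 + 25 + 0 + 25 + 100\right) = 62.5, \\
s_y^2 &= \tfrac{1}{4}\left(81 + 36 + 9 + 25 + 49\right) = 50, \\
r_{xy} &= \frac{s_{xy}}{s_x s_y}
        = \frac{-53.75}{\sqrt{62.5 \times 50}}
        \approx \frac{-53.75}{55.90}
        \approx -0.9615
\end{aligned}
```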
Computational Formula 1
$$s_{xy} = \frac{1}{N-1} \left[ \sum_{i=1}^{N} X_i Y_i - \frac{\sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{N} \right]$$
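The computational formula needs only running sums, not the means. A minimal sketch, with the hypothetical function name `covariance_from_sums`, applied to the smoking data:

```python
def covariance_from_sums(xs, ys):
    """Sample covariance from the sums of X, Y, and XY (no deviations needed)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum_x * sum_y / n) / (n - 1)


# Smoking data from the slides
cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

s_xy_sums = covariance_from_sums(cigs, lung_cap)
print(s_xy_sums)  # -53.75
```

The result matches the value obtained from the deviation products. One caveat, offered as a general numerical note rather than something from the slides: with large values, the subtraction of two nearly equal sums can lose precision in floating point.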
Computational Formula 2
$$r_{xy} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}}$$
Table for Calculating $r_{xy}$
       Cigs (X)    $X^2$    XY      $Y^2$    Cap (Y)
       0           0        0       2025     45
       5           25       210     1764     42
       10          100      330     1089     33
       15          225      465     961      31
       20          400      580     841      29
Σ =    50          750      1585    6680     180
Computing $r_{xy}$ from Table
$$r_{xy} = \frac{5(1585) - 50(180)}{\sqrt{\left[ 5(750) - 50^2 \right] \left[ 5(6680) - 180^2 \right]}} = \frac{7925 - 9000}{\sqrt{(3750 - 2500)(33400 - 32400)}}$$
Computing Correlation
$$r_{xy} = \frac{-1075}{\sqrt{(1250)(1000)}}$$
$$r_{xy} = -0.9615$$
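The table-based calculation can be reproduced with a short sketch of Computational Formula 2. The function name `pearson_r_from_sums` is hypothetical; it returns the numerator and denominator alongside the coefficient so the intermediate values can be compared with the slides.

```python
import math


def pearson_r_from_sums(xs, ys):
    """Pearson r from the column sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator, denominator, numerator / denominator


# Smoking data from the slides
cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

num, den, r_table = pearson_r_from_sums(cigs, lung_cap)
print(num)                # -1075
print(round(r_table, 4))  # -0.9615
```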
Conclusion: $r_{xy} = -0.96$
⇨ $r_{xy} = -0.96$ implies it is almost certain
that a smoker will have diminished lung capacity
⇨ Greater smoking exposure implies greater
likelihood of lung damage
End
Covariance & Correlation
Notes