Correlation and Covariance
R. F. Riesenfeld
(Based on web slides by
James H. Steiger)
Goals
⇨ Introduce concepts of
Covariance
Correlation
⇨ Develop computational formulas
R F Riesenfeld Sp 2010
CS5961 Comp Stat
2
Covariance
⇨ Variables may change in relation to each
other
⇨ Covariance measures how much the
movement in one variable predicts the
movement in a corresponding variable
Smoking and Lung Capacity
⇨ Example: investigate relationship
between cigarette smoking and lung
capacity
⇨ Data: a sample group's responses on
smoking habits, paired with their measured
lung capacities
Smoking v Lung Capacity Data
N    Cigarettes (X)    Lung Capacity (Y)
1    0                 45
2    5                 42
3    10                33
4    15                31
5    20                29
Smoking and Lung Capacity
[Figure: scatter plot of Lung Capacity (Y) against Smoking (yrs)]
Smoking v Lung Capacity
⇨ Observe that as smoking exposure goes
up, corresponding lung capacity goes
down
⇨ Variables covary inversely
⇨ Covariance and Correlation quantify
relationship
Covariance
⇨ Variables that covary inversely, like
smoking and lung capacity, tend to
appear on opposite sides of the group
means
When smoking is above its group mean, lung
capacity tends to be below its group mean.
⇨ Average product of deviations measures
the extent to which variables covary, the
degree of linkage between them
The Sample Covariance
⇨ Similar to variance, for theoretical
reasons, average is typically computed
using (N - 1), not N. Thus,
\[ S_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) \]
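The sample covariance formula can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `sample_covariance` and the toy data are assumptions chosen for the example.

```python
def sample_covariance(xs, ys):
    """Sample covariance of paired data, using the (N - 1) divisor."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of paired deviations from the two means.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1)


# Toy data (not from the slides): y = 2x, so the covariance is positive.
print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0
```

Dividing by N instead of (N - 1) would give the population form; the sketch follows the (N - 1) convention stated above.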
Calculating Covariance
Cigs (X)    Lung Cap (Y)
0           45
5           42
10          33
15          31
20          29
Means: \(\bar{X} = 10\), \(\bar{Y} = 36\)
Calculating Covariance
Cigs (X)    X - \bar{X}    (X - \bar{X})(Y - \bar{Y})    Y - \bar{Y}    Cap (Y)
0           -10            -90                           9              45
5           -5             -30                           6              42
10          0              0                             -3             33
15          5              -25                           -5             31
20          10             -70                           -7             29
                           ∑ = -215
Covariance Calculation (2)
Evaluation yields,
\[ S_{xy} = \frac{1}{4}(-215) = -53.75 \]
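The hand calculation can be checked with a short script. This is a sketch using the five smoking and lung-capacity pairs from the data slide; the variable names are illustrative assumptions.

```python
# Data from the Smoking v Lung Capacity slide.
cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

n = len(cigs)
x_bar = sum(cigs) / n      # mean of X
y_bar = sum(lung_cap) / n  # mean of Y

# Product of deviations for each pair, as in the calculation table.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(cigs, lung_cap)]
sum_of_products = sum(products)
s_xy = sum_of_products / (n - 1)

print(x_bar, y_bar)      # 10.0 36.0
print(sum_of_products)   # -215.0
print(s_xy)              # -53.75
```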
Covariance under Affine Transformation
Let \(L_i = aX_i + b\) and \(M_i = cY_i + d\). Then,
\[ l_i = a x_i, \qquad m_i = c y_i, \]
where lowercase letters denote deviations from the mean, \(u_i = U_i - \bar{U}\).
Evaluating, in turn, gives,
\[ S_{LM} = \frac{1}{N-1} \sum_{i=1}^{N} l_i m_i \]
Covariance under Affine Transformation (2)
Evaluating further,
\[
\begin{aligned}
S_{LM} &= \frac{1}{N-1} \sum_{i=1}^{N} l_i m_i \\
       &= \frac{1}{N-1} \sum_{i=1}^{N} (a x_i)(c y_i) \\
       &= ac \, \frac{1}{N-1} \sum_{i=1}^{N} x_i y_i
\end{aligned}
\]
\[ S_{LM} = ac \, S_{xy} \]
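The scaling result can be checked numerically. The sketch below assumes a hypothetical helper `sample_covariance` and arbitrary illustrative constants a, b, c, d; it applies the two affine maps to the smoking data and compares the covariance of the transformed variables with ac times the original covariance.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance of paired data, using the (N - 1) divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

# Arbitrary constants chosen for illustration only.
a, b, c, d = 2.0, 7.0, -3.0, 1.0

ls = [a * x + b for x in cigs]      # L_i = a X_i + b
ms = [c * y + d for y in lung_cap]  # M_i = c Y_i + d

s_xy = sample_covariance(cigs, lung_cap)
s_lm = sample_covariance(ls, ms)

# The shifts b and d drop out; only the product of the scale factors remains.
print(s_lm, a * c * s_xy)
print(math.isclose(s_lm, a * c * s_xy))
```

Changing b or d leaves `s_lm` unchanged, which is the practical content of the derivation: covariance ignores shifts but depends on the units of both variables.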
(Pearson) Correlation Coefficient rxy
⇨ Like covariance, but uses Z-values instead
of deviations. Hence, invariant under
positive linear transformation of the raw data
(the sign flips if exactly one scale factor is negative).
\[ r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} z_{x_i} z_{y_i} \]
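A minimal sketch of the Z-value definition follows. The helper names `z_scores` and `pearson_r_from_z` are assumptions made for this example, and the rescaling constants are arbitrary positive values used only to illustrate that the coefficient does not change when the raw data are rescaled.

```python
import math


def z_scores(values):
    """Standardize values using the sample (N - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]


def pearson_r_from_z(xs, ys):
    """Average product of paired Z-values, with the (N - 1) divisor."""
    n = len(xs)
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / (n - 1)


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]
r = pearson_r_from_z(cigs, lung_cap)

# Rescale both variables with positive factors (arbitrary illustrative values).
r_scaled = pearson_r_from_z([2 * x + 7 for x in cigs],
                            [0.5 * y - 3 for y in lung_cap])

print(round(r, 4))  # -0.9615
print(math.isclose(r, r_scaled))
```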
Alternative (common) Expression
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]
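The ratio form can be sketched with Python's standard `statistics` module, whose `stdev` function uses the same (N - 1) divisor as the sample covariance. The function name `pearson_r` is an assumption for this example.

```python
import statistics


def pearson_r(xs, ys):
    """Correlation as sample covariance over the product of sample std devs."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # statistics.stdev is the sample standard deviation (N - 1 divisor).
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]
print(round(pearson_r(cigs, lung_cap), 4))  # -0.9615
```

Because the same divisor appears in the numerator and in both standard deviations, the (N - 1) factors cancel, so the population forms would give the same ratio.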
Computational Formula 1
\[ s_{xy} = \frac{1}{N-1} \left[ \sum_{i=1}^{N} X_i Y_i - \frac{\left( \sum_{i=1}^{N} X_i \right) \left( \sum_{i=1}^{N} Y_i \right)}{N} \right] \]
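This computational form needs only three running sums, so it can be evaluated in a single pass without first computing the means. A sketch, with the function name `covariance_from_sums` assumed for illustration:

```python
def covariance_from_sums(xs, ys):
    """Sample covariance from raw sums, without computing deviations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum_x * sum_y / n) / (n - 1)


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]
print(covariance_from_sums(cigs, lung_cap))  # -53.75
```

The result matches the deviation-based calculation on the same data. With floating-point data whose means are large relative to their spread, subtracting two large sums can lose precision, so the deviation form may be preferable in that setting.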
Computational Formula 2
\[ r_{xy} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}} \]
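The same idea extends to the correlation coefficient: five sums are enough. The sketch below assumes the hypothetical function name `pearson_r_from_sums` and evaluates the formula on the smoking data.

```python
import math


def pearson_r_from_sums(xs, ys):
    """Pearson correlation from the raw sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if either variable is constant;
    # this sketch does not handle that case.
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]
print(round(pearson_r_from_sums(cigs, lung_cap), 4))  # -0.9615
```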
Table for Calculating rxy
Cigs (X)    X^2    XY      Y^2     Cap (Y)
0           0      0       2025    45
5           25     210     1764    42
10          100    330     1089    33
15          225    465     961     31
20          400    580     841     29
∑ = 50      750    1585    6680    180
Computing rxy from Table
\[ r_{xy} = \frac{5(1585) - 50(180)}{\sqrt{\left[ 5(750) - 50^2 \right] \left[ 5(6680) - 180^2 \right]}} = \frac{7925 - 9000}{\sqrt{(3750 - 2500)(33400 - 32400)}} \]
Computing Correlation
\[ r_{xy} = \frac{-1075}{\sqrt{1250 \times 1000}} \]
\[ r_{xy} = -0.9615 \]
rxy = -0.96 Conclusion
⇨ rxy = -0.96 implies near certainty that a
smoker will have diminished lung capacity
⇨ Greater smoking exposure implies greater
likelihood of lung damage
End
Covariance & Correlation
Notes