S1: Chapter 6 Correlation
Dr J Frost ([email protected])
Last modified: 21st November 2013
Recap of correlation
[Figure: four scatter plots, each to be classified by the strength and type of its correlation. Axis labels: Weekly time on internet (hours), Age, Maths Score, English Score; Cost of train fare against Distance travelled (km); Crime Rate against Number of people in city called 'Dave'. Answers revealed: weak negative correlation, weak positive correlation, strong positive correlation, no correlation.]
Recap
If we let $S_{xx} = \Sigma (x - \bar{x})^2$ then the variance is:
$\text{Variance} = \dfrac{\Sigma (x - \bar{x})^2}{n} = \dfrac{S_{xx}}{n}$
Recall that variance gives the extent to which the variable $x$ “varies”!
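The recap can be checked numerically. The following is a minimal sketch, not part of the original slides; the data values are made up purely so the arithmetic is easy to follow.

```python
# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(x)
mean_x = sum(x) / n                          # x-bar
s_xx = sum((xi - mean_x) ** 2 for xi in x)   # S_xx: sum of squared deviations
variance = s_xx / n                          # Variance = S_xx / n

print(s_xx, variance)  # 32.0 4.0
```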
Covariance
We can extend variance to two variables.
We might be interested in how one variable varies with another.
[Figure: scatter plot of Cost of train fare against Distance travelled (km).]
We can say that as distance (say $x$) increases, the cost (say $y$) increases. Thus the covariance of $x$ and $y$ is positive.
Covariance
Comment on the covariance between the variables.
[Figure: two scatter plots of $y$ against $x$.]
First plot: As $y$ increases, $x$ doesn't change very much. So the covariance is small (but positive).
Second plot: As $x$ increases, $y$ doesn't change very much. So the covariance is small (but positive).
Covariance
Comment on the covariance between the variables.
[Figure: two scatter plots of $y$ against $x$.]
First plot: As $x$ increases, $y$ decreases. So the covariance is negative.
Second plot: As $y$ varies, $x$ doesn't vary at all. So we say that the variables are independent, and the covariance is 0.
Covariance
$\text{Covariance}(x, y) = \dfrac{S_{xy}}{n}$ where $S_{xy} = \Sigma (x - \bar{x})(y - \bar{y})$
Notice that if we replace $y$ with $x$, we have $\dfrac{S_{xx}}{n}$, which we saw earlier is the variance of $x$,
i.e. $\text{Covariance}(x, x) = \text{Variance}(x)$
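As an illustration of the definition, here is a small sketch that computes the covariance directly from the deviations. The function name `covariance` and the journey data are hypothetical, introduced only for this example.

```python
def covariance(x, y):
    """Covariance(x, y) = S_xy / n, where S_xy sums (x - x_bar)(y - y_bar)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return s_xy / n

# Hypothetical journeys: distance in km and fare in pounds.
distance = [10, 20, 30, 40]
fare = [5, 9, 14, 20]

print(covariance(distance, fare))        # 62.5  (positive: they rise together)
print(covariance(distance, fare[::-1]))  # -62.5 (negative: one falls as the other rises)
print(covariance(distance, distance))    # 125.0 (the variance of distance)
```

The last line shows the special case noted above: the covariance of a variable with itself is its variance.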
Simpler formulae for $S_{xx}$, $S_{xy}$
$S_{xx} = \Sigma x^2 - \dfrac{(\Sigma x)^2}{n}$
$S_{xy} = \Sigma xy - \dfrac{\Sigma x \, \Sigma y}{n}$
You're given these in the formula booklet, but it's worth memorising them.
Notice that the first is just the same formula as for variance, except we've just multiplied everything by $n$, since $\text{Var}(x) = \dfrac{S_{xx}}{n}$
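A short derivation, not part of the original slides, of why the shortcut for $S_{xy}$ agrees with the definition $S_{xy} = \Sigma (x - \bar{x})(y - \bar{y})$. It uses only $\bar{x} = \Sigma x / n$ and $\bar{y} = \Sigma y / n$; setting $y = x$ gives the $S_{xx}$ version.

```latex
\begin{align*}
S_{xy} &= \Sigma (x - \bar{x})(y - \bar{y}) \\
       &= \Sigma xy - \bar{y}\,\Sigma x - \bar{x}\,\Sigma y + n\,\bar{x}\,\bar{y} \\
       &= \Sigma xy - \frac{\Sigma x\,\Sigma y}{n} - \frac{\Sigma x\,\Sigma y}{n} + \frac{\Sigma x\,\Sigma y}{n} \\
       &= \Sigma xy - \frac{\Sigma x\,\Sigma y}{n}
\end{align*}
```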
Product Moment Correlation Coefficient (PMCC)
While the sign (i.e. positive or negative) of the covariance is
helpful, the magnitude (i.e. size) is hard to interpret.
We can turn our covariance into a correlation coefficient…
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
$r$ is known as the Product Moment Correlation Coefficient (PMCC).
Dividing by $\sqrt{S_{xx} S_{yy}}$ forces the result to be between -1 and 1. We'll interpret what that means in a second.
Product Moment Correlation Coefficient (PMCC)
Baby                       A     B     C     D     E     F
Head circumference ($x$)   31.1  33.3  30.0  31.5  35.0  30.2
Gestation period ($y$)     36    37    38    38    40    40

$n = 6$, $\Sigma x = 191.1$, $\Sigma y = 229$, $\Sigma x^2 = 6105.39$, $\Sigma y^2 = 8753$, $\Sigma xy = 7296.7$
$S_{xx} = \Sigma x^2 - \dfrac{(\Sigma x)^2}{n} = 18.855$
$S_{yy} = \Sigma y^2 - \dfrac{(\Sigma y)^2}{n} = 12.833$
$S_{xy} = \Sigma xy - \dfrac{\Sigma x \, \Sigma y}{n} = 3.05$
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 0.196$
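The working for the baby data can be reproduced in a few lines. This is a sketch for checking the arithmetic, not a method from the slides; the variable names are chosen here for readability.

```python
from math import sqrt

x = [31.1, 33.3, 30.0, 31.5, 35.0, 30.2]  # head circumference
y = [36, 37, 38, 38, 40, 40]              # gestation period
n = len(x)

# Summary statistics, as they would be given in an exam question.
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Shortcut formulae from the formula booklet.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.196
```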
Product Moment Correlation Coefficient (PMCC)
Quite often the
values are given to
you in an exam.
Let's do it on our calculators!
(Using the same data as the previous slide: head circumference $x$ and gestation period $y$ for babies A to F.)
• Put in Stats mode: MODE → 2
• Select 2 for A + BX (i.e. calculations to do with linear relationships)
• Insert the data into your table. Use the arrow keys and “=” to add the values.
• Once done, press the AC button. This “accepts” your table of values.
• Press SHIFT + 1, and choose 5 for REGRESSION.
• Select 3 for $r$. $r$ is now in your calculation, so press =.
Interpreting the PMCC
We've seen the PMCC varies between -1 and 1.
$r = 1$ means perfect positive correlation.
$r = 0$ means no correlation.
$r = -1$ means perfect negative correlation.
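The three boundary cases can be seen with tiny data sets. This is a minimal sketch with made-up values; the helper `pmcc` is a name introduced here, computing $r$ from the deviation form of $S_{xx}$, $S_{yy}$ and $S_{xy}$.

```python
from math import sqrt

def pmcc(x, y):
    """r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4]                 # hypothetical values
print(pmcc(x, [3, 5, 7, 9]))     # 1.0  (points lie exactly on a rising line)
print(pmcc(x, [9, 7, 5, 3]))     # -1.0 (points lie exactly on a falling line)
print(pmcc(x, [1, 2, 2, 1]))     # 0.0  (no linear trend)
```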
Interpreting the PMCC
[Figure: the four scatter plots from the recap, now labelled with PMCC values $r = 0.8$, $r = 0$, $r = -0.4$ and $r = 0.96$. Axis labels: Weekly time on internet (hours), Age, Maths Score, English Score; Cost of train fare against Distance travelled (km); Crime Rate against Number of people in city called 'Dave'.]
Exercises
Page 122 Exercise 6B
Q1, 4, 5, 7, 9
Limitations of correlation
Often there's a third variable that explains two others, but the two
variables themselves are not connected.
Q1: The number of cars on the road has increased, and the number of DVD
recorders bought has decreased. Is there a correlation between the two
variables?
Buying a car does not necessarily mean that you will not buy a DVD recorder,
so we cannot say there is a correlation between the two.
Q2: Over the past 10 years the memory capacity of personal computers has
increased, and so has the average life expectancy of people in the western
world. Is there a correlation between these two variables?
The two are not connected, but both are due to scientific development over
time (i.e. a third variable!)
Effects of coding
We know that $\text{Variance}(x) = \dfrac{S_{xx}}{n}$ and $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
Therefore, if all our data values $x$ get $k$ times bigger in size and values $y$ become $m$ times bigger, what happens to…
(Recap) The variance of $x$: $k^2$ times as big
$S_{xx}$: $k^2$ times as big
$S_{yy}$: $m^2$ times as big
$S_{xy}$: $km$ times as big
$r$: Unaffected!
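These scaling claims can be checked numerically. The sketch below uses made-up data and scale factors; the second factor is called `m` here as an assumption (only `k` is legible in the slide), and `summary` is a helper name introduced for this example.

```python
from math import isclose, sqrt

def summary(x, y):
    """Return (S_xx, S_yy, S_xy, r) for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xx, s_yy, s_xy, s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4]   # hypothetical data
y = [3, 5, 6, 9]
k, m = 3, 2        # hypothetical positive scale factors for x and y

s_xx, s_yy, s_xy, r = summary(x, y)
c_xx, c_yy, c_xy, c_r = summary([k * a for a in x], [m * b for b in y])

print(isclose(c_xx, k ** 2 * s_xx))  # S_xx is k^2 times as big
print(isclose(c_yy, m ** 2 * s_yy))  # S_yy is m^2 times as big
print(isclose(c_xy, k * m * s_xy))   # S_xy is km times as big
print(isclose(c_r, r))               # r is unaffected
```

The factors cancel in $r$ because the numerator grows by $km$ and the denominator by $\sqrt{k^2 m^2} = km$ when both factors are positive.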
Effects of coding
For the purposes of the S1 exam, you just need to remember that:
• Coding affects $S_{xx}$ in the same way that the variance is affected, i.e. if the variance becomes 9 times larger, so does $S_{xx}$.
• If $x$ and/or $y$ are coded, the PMCC is unaffected.
Example
x   1020  1032  1028  1034  1023  1038
y    320   335   345   355   360   380

Coding: $p = \dfrac{x - 1020}{1}$, $q = \dfrac{y - 300}{5}$

p      0    12     8    14     3    18
q      4     7     9    11    12    16

We can now just find the PMCC of this new data set, and no further adjustment is needed.
$r = 0.655$
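To confirm that no adjustment is needed, the PMCC can be computed for both the original and the coded data. This is a sketch for checking the example; the helper `pmcc` and the coded-variable names `p` and `q` are assumptions made here for illustration.

```python
from math import sqrt

def pmcc(x, y):
    """r = S_xy / sqrt(S_xx * S_yy), using the shortcut formulae."""
    n = len(x)
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return s_xy / sqrt(s_xx * s_yy)

x = [1020, 1032, 1028, 1034, 1023, 1038]
y = [320, 335, 345, 355, 360, 380]

# Coded values: subtract 1020 from x; subtract 300 from y, then divide by 5.
p = [(a - 1020) / 1 for a in x]
q = [(b - 300) / 5 for b in y]

print(round(pmcc(x, y), 3), round(pmcc(p, q), 3))  # 0.655 0.655
```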