S1: Chapter 6 Correlation

Dr J Frost ([email protected])
Last modified: 21st November 2013
Recap of correlation
Describe each scatter diagram by its strength (weak or strong) and its type (positive or negative).
[Figure: four scatter diagrams]
• Weekly time on internet (hours) against Age: weak negative correlation
• Maths Score against English Score: weak positive correlation
• Cost of train fare against Distance travelled (km): strong positive correlation
• Crime Rate against Number of people in city called 'Dave': no correlation
Recap
If we let $S_{xx} = \Sigma (x - \bar{x})^2$, then the variance is:
$$\text{Variance} = \frac{\Sigma (x - \bar{x})^2}{n} = \frac{S_{xx}}{n}$$
Recall that variance gives the extent to which the variable $x$ 'varies'!
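As a quick check of the recap, the sketch below computes $S_{xx}$ and the variance straight from the definition. The function names `s_xx` and `variance` and the data values are invented for illustration and do not come from the slides.

```python
def s_xx(xs):
    """Sum of squared deviations from the mean: S_xx = sum of (x - mean)^2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def variance(xs):
    """Variance as defined on the slide: S_xx / n."""
    return s_xx(xs) / len(xs)


# Made-up data with mean 5, chosen so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(s_xx(data))      # 32.0
print(variance(data))  # 4.0
```

The squared deviations are 9, 1, 1, 1, 0, 0, 4 and 16, which sum to 32, and dividing by $n = 8$ gives 4.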
Covariance
We can extend variance to two variables.
We might be interested in how one variable varies with another.
[Figure: scatter diagram of Cost of train fare against Distance travelled (km)]
We can say that as distance (say $x$) increases, the cost (say $y$) increases. Thus the covariance of $x$ and $y$ is positive.
Covariance
Comment on the covariance between the variables.
[Figure: two scatter diagrams of $y$ against $x$]
First diagram: as $y$ increases, $x$ doesn't change very much. So the covariance is small (but positive).
Second diagram: as $x$ increases, $y$ doesn't change very much. So the covariance is small (but positive).
Covariance
Comment on the covariance between the variables.
[Figure: two scatter diagrams of $y$ against $x$]
First diagram: as $x$ increases, $y$ decreases. So the covariance is negative.
Second diagram: as $y$ varies, $x$ doesn't vary at all. So we say that the variables are independent, and the covariance is 0.
Covariance
$$\text{Covariance}(x, y) = \frac{S_{xy}}{n}, \quad \text{where } S_{xy} = \Sigma (x - \bar{x})(y - \bar{y})$$
Notice that if we replace $y$ with $x$, we have $\frac{S_{xx}}{n}$, which we saw earlier is the variance of $x$,
i.e. $\text{Covariance}(x, x) = \text{Variance}(x)$.
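The covariance formula can be sketched in a few lines. The names `s_xy` and `covariance`, and the distance and fare figures, are made up for illustration (they are not read off the train fare diagram).

```python
def s_xy(xs, ys):
    """S_xy = sum of (x - mean_x) * (y - mean_y) over paired values."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def covariance(xs, ys):
    """Covariance as defined on the slide: S_xy / n."""
    return s_xy(xs, ys) / len(xs)


# Hypothetical data: the fare rises as the distance rises.
distance = [10, 20, 30, 40]
fare = [5, 9, 13, 17]

print(covariance(distance, fare))        # 50.0, positive
print(covariance(distance, fare[::-1]))  # -50.0, fare falls as distance rises
print(covariance(distance, distance))    # 125.0, the variance of distance
```

The last line illustrates the remark that replacing $y$ with $x$ turns the covariance into the variance of $x$.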
Simpler formulae for $S_{xx}$, $S_{xy}$
$$S_{xx} = \Sigma x^2 - \frac{(\Sigma x)^2}{n}$$
$$S_{xy} = \Sigma xy - \frac{(\Sigma x)(\Sigma y)}{n}$$
You're given these in the formula booklet, but it's worth memorising them.
Notice that the first is just the same formula as for variance, except we've multiplied everything by $n$, since $\text{Var}(x) = \frac{S_{xx}}{n}$.
Product Moment Correlation Coefficient (PMCC)
While the sign (i.e. positive or negative) of the covariance is
helpful, the magnitude (i.e. size) is hard to interpret.
We can turn our covariance into a correlation coefficient:
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
$r$ is known as the Product Moment Correlation Coefficient (PMCC).
Dividing by $\sqrt{S_{xx} S_{yy}}$ forces the result to lie between -1 and 1. We'll interpret what that means in a second.
Product Moment Correlation Coefficient (PMCC)
Baby                        A      B      C      D      E      F
Head circumference ($x$)    31.1   33.3   30.0   31.5   35.0   30.2
Gestation period ($y$)      36     37     38     38     40     40

$n = 6$, $\Sigma x = 191.1$, $\Sigma y = 229$, $\Sigma x^2 = 6105.39$, $\Sigma y^2 = 8753$, $\Sigma xy = 7296.7$

$$S_{xx} = \Sigma x^2 - \frac{(\Sigma x)^2}{n} = 18.855$$
$$S_{yy} = \Sigma y^2 - \frac{(\Sigma y)^2}{n} = 12.833$$
$$S_{xy} = \Sigma xy - \frac{(\Sigma x)(\Sigma y)}{n} = 3.05$$
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 0.196$$
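The working for the baby data can be reproduced with a short script. This is a minimal sketch using the simpler formulae for $S_{xx}$, $S_{yy}$ and $S_{xy}$; the variable names are chosen here for illustration.

```python
from math import sqrt

# Data from the table: head circumference (x) and gestation period (y).
head = [31.1, 33.3, 30.0, 31.5, 35.0, 30.2]
gestation = [36, 37, 38, 38, 40, 40]

n = len(head)
sum_x = sum(head)
sum_y = sum(gestation)
sum_x2 = sum(x * x for x in head)
sum_y2 = sum(y * y for y in gestation)
sum_xy = sum(x * y for x, y in zip(head, gestation))

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / sqrt(s_xx * s_yy)

# Rounded as on the slide: S_xx, S_yy, S_xy, r
print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 2), round(r, 3))
```

The printed values should match the slide's 18.855, 12.833, 3.05 and 0.196.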
Product Moment Correlation Coefficient (PMCC)
Quite often the
values are given to
you in an exam.
Let's do it on our calculators, using the same data as before (babies A to F, head circumference $x$ and gestation period $y$).
• Put in Stats mode: MODE → 2
• Select 2 for A + BX (i.e. calculations to do with linear relationships)
• Insert the data into your table. Use the arrow keys and '=' to add the values.
• Once done, press the AC button. This 'accepts' your table of values.
• Press SHIFT + 1, and choose 5 for REGRESSION.
• Select 3 for $r$. $r$ is now in your calculation, so press =.
Interpreting the PMCC
We've seen the PMCC varies between -1 and 1.
$r = 1$ means perfect positive correlation.
$r = 0$ means no correlation.
$r = -1$ means perfect negative correlation.
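The three special values can be illustrated with tiny made-up data sets. The helper `pmcc` below is a sketch written for this purpose, and all the data are invented. Note that $r$ measures linear correlation, so the last data set gives $r = 0$ even though $y$ is completely determined by $x$.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on a rising line
falling = [10 - 3 * x for x in xs]  # points exactly on a falling line
print(pmcc(xs, rising))   # 1.0 (perfect positive)
print(pmcc(xs, falling))  # -1.0 (perfect negative)

# Symmetric U-shape: the positive and negative products cancel, so S_xy = 0.
u_xs = [-2, -1, 0, 1, 2]
u_ys = [x * x for x in u_xs]
print(pmcc(u_xs, u_ys))   # 0.0
```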
Interpreting the PMCC
[Figure: the four scatter diagrams from the recap, each labelled with its PMCC]
• Weekly time on internet (hours) against Age: $r = -0.4$
• Maths Score against English Score: $r = 0.8$
• Cost of train fare against Distance travelled (km): $r = 0.96$
• Crime Rate against Number of people in city called 'Dave': $r = 0$
Exercises
Page 122 Exercise 6B
Q1, 4, 5, 7, 9
Limitations of correlation
Often there's a 3rd variable that explains two others, but the two variables themselves are not connected.
Q1: The number of cars on the road has increased, and the number of DVD recorders bought has decreased. Is there a correlation between the two variables?
Buying a car does not necessarily mean that you will not buy a DVD recorder, so we cannot say there is a correlation between the two.
Q2: Over the past 10 years the memory capacity of personal computers has increased, and so has the average life expectancy of people in the western world. Is there a correlation between these two variables?
The two are not connected, but both are due to scientific development over time (i.e. a third variable!)
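A small numerical sketch shows why a high PMCC on its own cannot establish a connection. Every number below is invented purely for illustration (these are not real memory or life expectancy figures): both lists simply rise from one year to the next, and that alone produces a PMCC close to 1.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical values for ten consecutive years (invented for this sketch).
memory_gb = [1, 2, 4, 6, 8, 12, 16, 20, 24, 32]
life_expectancy = [76.0, 76.3, 76.5, 76.9, 77.2, 77.4, 77.8, 78.0, 78.3, 78.6]

r = pmcc(memory_gb, life_expectancy)
print(round(r, 2))  # close to 1, although neither variable causes the other
```

Here time is the third variable: each list increases with the year, so the two move together without either one influencing the other.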
Effects of coding
We know that $\text{Variance}(x) = \frac{S_{xx}}{n}$ and $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$.
Therefore, if all our data values $x$ get $k$ times bigger in size and values $y$ become $q$ times bigger, what happens to...
(Recap) The variance of $x$: $k^2$ times as big
$S_{xx}$: $k^2$ times as big
$S_{yy}$: $q^2$ times as big
$S_{xy}$: $kq$ times as big
$r$: Unaffected!
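These scaling effects can be checked numerically. The sketch below uses made-up data and hypothetical positive scale factors $k = 3$ and $q = 2$; the helper names `sums` and `pmcc` are chosen here for illustration.

```python
from math import sqrt, isclose


def sums(xs, ys):
    """Return (S_xx, S_yy, S_xy) using deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


def pmcc(xs, ys):
    s_xx, s_yy, s_xy = sums(xs, ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
k, q = 3, 2  # hypothetical scale factors
big_xs = [k * x for x in xs]
big_ys = [q * y for y in ys]

print(sums(xs, ys))          # (10.0, 6.0, 6.0)
print(sums(big_xs, big_ys))  # (90.0, 24.0, 36.0): times k^2, q^2 and kq
print(isclose(pmcc(xs, ys), pmcc(big_xs, big_ys)))  # True
```

In $r$ the factor $kq$ on top cancels with $\sqrt{k^2 q^2} = kq$ underneath, which is why the PMCC is unchanged for positive $k$ and $q$.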
Effects of coding
For the purposes of the S1 exam, you just need to remember that:
• Coding affects $S_{xx}$ in the same way that the variance is affected, i.e. if the variance becomes 9 times larger, so does $S_{xx}$.
• If $x$ and/or $y$ are coded, the PMCC is unaffected.
Example
Coding: $p = \frac{x - 1020}{1}$, $q = \frac{y - 300}{5}$

$x$     $y$    $p$   $q$
1020    320    0     4
1032    335    12    7
1028    345    8     9
1034    355    14    11
1023    360    3     12
1038    380    18    16

We can now just find the PMCC of this new data set, and no further adjustment is needed.
$r = 0.655$
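As a check on this example, the sketch below applies the coding to the original values and computes the PMCC for both the coded and the uncoded data. The helper `pmcc` is written here for illustration.

```python
from math import sqrt, isclose


def pmcc(xs, ys):
    """Product moment correlation coefficient: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Original data from the example.
x = [1020, 1032, 1028, 1034, 1023, 1038]
y = [320, 335, 345, 355, 360, 380]

# Coding from the example: p = (x - 1020) / 1, q = (y - 300) / 5.
p = [(xi - 1020) / 1 for xi in x]
q = [(yi - 300) / 5 for yi in y]

r_coded = pmcc(p, q)
r_original = pmcc(x, y)
print(round(r_coded, 3), round(r_original, 3))  # both 0.655
print(isclose(r_coded, r_original))             # True
```

Working with the coded values keeps the sums small, which is the practical reason for coding when calculating by hand.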