Statistics - East Norfolk Sixth Form College

“Teach A Level Maths”
Statistics 1
© Christine Crisp
Introduction to S1
You met some statistical diagrams when you did GCSE.
The next three presentations and this one remind you of
them and point out some details that you may not have
met before.
We will start with stem and leaf diagrams (including back-to-back).
Stem and leaf diagrams are sometimes called stem plots.

e.g. The table below gives the number of hours worked in a particular week by a sample of 30 men.

35  41  33  31  30  45  35  36  51  32
32  30  28  35  34  33  35  36  32  42
33  31  41  21  34  35  34  46  32  35

I’ll use intervals of 5 hours to draw the diagram, i.e. 20–24, 25–29, etc.

Weekly hours of 30 men
5 | 1
4 | 5 6
4 | 1 1 2
3 | 5 5 5 5 5 5 6 6
3 | 0 0 1 1 2 2 2 2 3 3 3 4 4 4
2 | 8
2 | 1

The stem shows the tens and the leaves the units, e.g. 46 is 4 tens and 6 units.
N.B. 35 goes in the row 3 | 5 5 5 5 5 5 6 6 (with 36), not in the row below it (3 | 0 0 1 1 ...).
We must show a key.
Key: 3 | 5 means 35 hours

Weekly hours of 30 men
5 | 1
4 | 5 6
4 | 1 1 2
3 | 5 5 5 5 5 5 6 6
3 | 0 0 1 1 2 2 2 2 3 3 3 4 4 4
2 | 8
2 | 1
Key: 3 | 5 means 35 hours

If you tip your head to the right and look at the diagram you can see it is just a bar chart with more detail.
Points to notice:
• The leaves are in numerical order.
• The diagram uses raw (not grouped) data.
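
A minimal Python sketch of how a stem and leaf diagram like the one for the weekly hours can be built from the raw data. The function name `stem_and_leaf` and its `interval` parameter are illustrative and not part of the original slides. It assumes non-negative whole numbers and an interval of 5 or 10, and for simplicity it leaves out any row that has no leaves.

```python
from collections import defaultdict

def stem_and_leaf(values, interval=5):
    """Return rows (stem, leaves) of a stem and leaf diagram, highest row first.

    Each row covers `interval` units, so interval=5 gives rows such as
    30-34 and 35-39 that share the stem 3.
    """
    rows = defaultdict(list)
    for v in sorted(values):                     # sorting keeps leaves in order
        row_start = (v // interval) * interval   # e.g. 36 -> 35, 34 -> 30
        rows[row_start].append(v % 10)           # leaf = units digit
    return [(start // 10, rows[start]) for start in sorted(rows, reverse=True)]

hours = [35, 41, 33, 31, 30, 45, 35, 36, 51, 32,
         32, 30, 28, 35, 34, 33, 35, 36, 32, 42,
         33, 31, 41, 21, 34, 35, 34, 46, 32, 35]

diagram = stem_and_leaf(hours)
for stem, leaves in diagram:
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print("Key: 3 | 5 means 35 hours")
```

Because 35 // 5 * 5 is 35, the value 35 starts the upper "3" row rather than ending the lower one, which matches the diagram.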

The diagram below is a back-to-back stem and leaf diagram giving the weight in grams of eggs collected from ostriches and emus.
This method can be used to compare two sets of data.

 Ostrich |    | Emu
 8 3 1 0 | 27 | 2 4 8
 7 6 2 1 | 28 | 2 4 6 7 9
     4 1 | 29 | 0 3 5
     1 0 | 30 | 7
Key: 27 | 2 = 272
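
As a rough illustration, the back-to-back layout can be produced in Python as follows. The lists `ostrich` and `emu` are the weights read back from the diagram using the key, on the assumption that the left-hand side is read in the same way (stem, then leaf). The function name `back_to_back` is hypothetical.

```python
def back_to_back(left, right):
    """Return rows (left_leaves, stem, right_leaves), lowest stem first.

    Stem = value // 10 and leaf = units digit. Left-hand leaves are listed
    in descending order so that they increase outwards from the stem.
    """
    stems = sorted({v // 10 for v in left + right})
    rows = []
    for s in stems:
        left_leaves = sorted((v % 10 for v in left if v // 10 == s), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == s)
        rows.append((left_leaves, s, right_leaves))
    return rows

# Weights (grams) read back from the diagram, assuming the key 27 | 2 = 272
ostrich = [270, 271, 273, 278, 281, 282, 286, 287, 291, 294, 300, 301]
emu = [272, 274, 278, 282, 284, 286, 287, 289, 290, 293, 295, 307]

rows = back_to_back(ostrich, emu)
for left_leaves, stem, right_leaves in rows:
    left_text = " ".join(str(d) for d in left_leaves).rjust(8)
    right_text = " ".join(str(d) for d in right_leaves)
    print(left_text, "|", stem, "|", right_text)
```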

A grouped data stem and leaf diagram
Data
2, 5, 5, 8, 12, 16, 17, 17, 19, 20, 22, 22, 24, 25, 25, 25, 27, 27, 29, 29, 36
Draw a stem and leaf diagram using groupings 0–4, 5–9, 10–14 etc.

Group    Stem | Leaves
0–4       0   | 2
5–9       0   | 5 5 8
10–14     1   | 2
15–19     1   | 6 7 7 9
20–24     2   | 0 2 2 4
25–29     2   | 5 5 5 7 7 9 9
30–34     3   |
35–39     3   | 6
Key: 1 | 2 = 12, 3 | 6 = 36

Histogram:
A bar chart for continuous data. The bars are drawn up to the class boundaries, with NO GAPS between bars. A class boundary lies halfway between the upper limit of one group and the lower limit of the next (except in age questions).
For groups 0-9, 10-19, 20-29 etc., the class boundaries between the groups occur at 9.5, 19.5 etc.
So any quantity below 9.5 is in group 1 and any quantity between 9.5 and 19.5 is in group 2.
The edges of the bars are drawn at 9.5, 19.5 etc.
It is very important that the area of each bar is proportional to the frequency.
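
The rule for class boundaries can be sketched in Python. The helper names `interior_boundaries` and `group_number` are illustrative only, and the choice to put a value lying exactly on a boundary into the higher group is an assumption, since the slides do not cover that case. The sketch does not apply to age data, which the slides treat separately.

```python
from bisect import bisect_right

def interior_boundaries(classes):
    """Boundary between successive classes: halfway between the upper
    limit of one class and the lower limit of the next."""
    return [(upper + next_lower) / 2
            for (_, upper), (next_lower, _) in zip(classes, classes[1:])]

def group_number(x, boundaries):
    """1-based group that x falls in (assumption: a value exactly on a
    boundary is placed in the higher group)."""
    return bisect_right(boundaries, x) + 1

groups = [(0, 9), (10, 19), (20, 29)]
boundaries = interior_boundaries(groups)

print(boundaries)                      # [9.5, 19.5]
print(group_number(9.4, boundaries))   # 1
print(group_number(9.6, boundaries))   # 2
```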

Histograms
e.g. The projected population of the U.K. for 2005 (by age)

AGE (years)   Freq (millions)
0 – 9         7
10 – 19       8
20 – 29       7
30 – 39       9
40 – 49       9
50 – 59       8
60 – 69       6
70 – 79       4
80 – 89       2
90+           0
Source: USA IDB

Suppose the data are grouped so that those below 20 and above 69 are combined.

AGE (years)   Freq (millions)
0 – 19        15
20 – 29       7
30 – 39       9
40 – 49       9
50 – 59       8
60 – 69       6
70+           6

To draw the diagram we must have an upper class value.
I chose a sensible figure, 109, so the last class becomes 70 – 109.

If we use the data below to draw an age/frequency graph then it is very misleading, as the 1st and last bars dominate.

AGE (years)   Freq (millions)
0 – 19        15
20 – 29       7
30 – 39       9
40 – 49       9
50 – 59       8
60 – 69       6
70 – 109      6

[Bar chart of frequency (millions) against age (years)]

Bar 1 should represent just over twice as many people as bar 2, but it appears to be about 4 times as many.
So frequencies are represented by areas.

A histogram shows frequencies as areas.
To draw the histogram, we need to find the width and height of each column.
The width is the class width: upper class boundary (u.c.b.) minus lower class boundary (l.c.b.).
Since these are ages, the 1st class, for example, has u.c.b. = 20 and l.c.b. = 0, so the width is 20.

AGE (years)   Freq (millions)   Class width
0 – 19        15                20
20 – 29       7
30 – 39       9
40 – 49       9
50 – 59       8
60 – 69       6
70 – 109      6

Area of a rectangle = width × height
So, frequency = width × height
⇒ height = frequency / width

AGE (years)   Freq (millions)   Class width
0 – 19        15                20
20 – 29       7                 10
30 – 39       9                 10
40 – 49       9                 10
50 – 59       8                 10
60 – 69       6                 10
70 – 109      6                 40

The height is called the frequency density.
e.g. For the 1st class, freq. density = 15 / 20 = 0·75

We can now draw the histogram.

AGE (years)   Freq (millions)   Class width   Freq density
0 – 19        15                20            0·75
20 – 29       7                 10            0·7
30 – 39       9                 10            0·9
40 – 49       9                 10            0·9
50 – 59       8                 10            0·8
60 – 69       6                 10            0·6
70 – 109      6                 40            0·15
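
The frequency densities for the age data can be checked with a few lines of Python. This is only a sketch: the tuple layout and the name `frequency_density` are illustrative, and the class boundaries used are the age boundaries 0, 20, 30, ..., 70, 110 implied by the class widths.

```python
import math

# (lower class boundary, upper class boundary, frequency in millions)
classes = [(0, 20, 15), (20, 30, 7), (30, 40, 9), (40, 50, 9),
           (50, 60, 8), (60, 70, 6), (70, 110, 6)]

def frequency_density(lcb, ucb, freq):
    """Height of the bar: frequency divided by class width (u.c.b. - l.c.b.)."""
    return freq / (ucb - lcb)

densities = [round(frequency_density(*c), 2) for c in classes]
print(densities)  # [0.75, 0.7, 0.9, 0.9, 0.8, 0.6, 0.15]

# The area of each bar (width x height) gives back the frequency.
areas_match = all(
    math.isclose((ucb - lcb) * frequency_density(lcb, ucb, f), f)
    for lcb, ucb, f in classes
)
print(areas_match)  # True
```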

[Histogram: The projected population of the U.K. for 2005 (by age), with frequency density on the y-axis]

Notice that the frequencies for the last 2 classes are the same. On the histogram the areas showing these classes are the same.
If we had plotted frequency on the y-axis, the diagram would be very misleading. (It would suggest there are 6 million in each age group 70 – 79, 80 – 89, 90 – 99 and 100 – 109.)

SUMMARY
Histograms are used to display grouped frequency data.
• Frequency is shown by area.
• The y-axis is used for frequency density.
• Class width is given by u.c.b. – l.c.b., where u.c.b. is the upper class boundary and l.c.b. is the lower class boundary.
• frequency density = frequency / class width

Exercise
95 components are tested until they fail. The table gives the times taken (hours) until failure.

Time to failure (hours)   0-19   20-29   30-39   40-44   45-49   50-59   60-89
Number of components      5      8       16      22      18      16      10

Find 3 things wrong with the histogram which represents the data in the table.
Answer:
• Frequency has been plotted instead of frequency density.
• There is no title.
• There are no units on the x-axis.
[Figures: the incorrect diagram and the correct diagram, titled "Time taken for 95 components to fail"]

Length of millipede   Class boundaries   Frequency   Class width   Freq. density
0 – 9                 0 – 9.5            6           9.5           0.63
10 – 19               9.5 – 19.5         18          10            1.8
20 – 39               19.5 – 39.5        14          20            0.7

Note: bars drawn at 9.5, 19.5 and 39.5.
[Histogram showing length of millipede: frequency density against length]

Cumulative Frequency Graphs
e.g. The projected population of the U.K. for 2005, by age:

AGE (years)   Freq (millions)   Cu.F (millions)
0 – 9         7                 7
10 – 19       8                 15
20 – 29       7                 22
30 – 39       9                 31
40 – 49       9                 40
50 – 59       8                 48
60 – 69       6                 54
70 – 79       4                 58
80 – 89       2                 60
90+           0                 60
Source: USA IDB

Why does the frequency for the 90+ group appear as 0?
ANS: The data are given to the nearest million. The projected figure was 113,000.
In drawing the diagram I shall miss out this group.
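
The cumulative frequency column is just a running total of the frequencies. A short Python sketch, using the figures from the table (the variable names are illustrative):

```python
from itertools import accumulate

ages = ["0-9", "10-19", "20-29", "30-39", "40-49",
        "50-59", "60-69", "70-79", "80-89", "90+"]
freq = [7, 8, 7, 9, 9, 8, 6, 4, 2, 0]   # millions

# Each entry is the total of all frequencies up to and including that class.
cum_freq = list(accumulate(freq))

for age, f, cf in zip(ages, freq, cum_freq):
    print(f"{age:>6}  {f:>2}  {cf:>2}")
```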
Points to notice:
• There is no gap between 9 and 10 as the data are continuous.
• Points are plotted at upper class boundaries (u.c.bs.), e.g. the u.c.b. for 0 – 9 would normally be 9·5
Age data have different u.c.bs. Can you say why this is?
ANS: If I ask children their ages, they reply 9 even if they are nearly 10, so the 0 – 9 group contains children right up to age 10, NOT just nine and a half.
The u.c.bs. for this data set are 10, 20, 30, . . .

AGE (yrs)   f (m)   Cu.f (m)   u.c.b. (yrs)
0 – 9       7       7          10
10 – 19     8       15         20
20 – 29     7       22         30
30 – 39     9       31         40
40 – 49     9       40         50
50 – 59     8       48         60
60 – 69     6       54         70
70 – 79     4       58         80
80 – 89     2       60         90
Source: USA IDB

[Cumulative frequency graph: The projected population of the U.K. for 2005 (by age), plotted against age (yrs)]

The median age is estimated as the age corresponding to a cumulative frequency of 30 million.
The median age is 39 years.
(Half the population of the U.K. will be over 39 in 2005.)
The quartiles are found similarly:
LQ = ¼(n+1)th item of data, giving lower quartile: 20 years
UQ = ¾(n+1)th item of data, giving upper quartile: 56 years
The interquartile range is 36 years
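
The median and quartile readings can be approximated numerically. The sketch below joins the plotted points with straight lines (linear interpolation), which is an assumption: a hand-drawn curve may give slightly different readings. It reads the graph at cumulative frequencies of 15, 30 and 45 million, and the function name `value_at` is illustrative.

```python
def value_at(target, points):
    """Age at which the cumulative frequency reaches `target`.

    `points` is a list of (u.c.b., cumulative frequency) pairs, starting
    from the lowest boundary with cumulative frequency 0.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (x1 - x0) * (target - c0) / (c1 - c0)
    raise ValueError("target is outside the range of the cumulative frequencies")

# (upper class boundary in years, cumulative frequency in millions)
points = [(0, 0), (10, 7), (20, 15), (30, 22), (40, 31), (50, 40),
          (60, 48), (70, 54), (80, 58), (90, 60)]

median = value_at(30, points)           # about 38.9, i.e. 39 years
lower_quartile = value_at(15, points)   # 20.0
upper_quartile = value_at(45, points)   # 56.25, i.e. about 56 years
iqr = upper_quartile - lower_quartile   # 36.25, i.e. about 36 years

print(round(median), round(lower_quartile), round(upper_quartile), round(iqr))
```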
If the retirement age were to be 65 for everyone, how many people would be retired?
ANS: The graph gives a cumulative frequency of 51 million at age 65, so
( 60 – 51 ) million = 9 million
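
This reading goes the other way: from an age to a cumulative frequency. A small Python sketch, again assuming straight lines between the plotted points (the function name `cum_freq_at` is hypothetical):

```python
def cum_freq_at(x, points):
    """Cumulative frequency at value x, by linear interpolation between
    successive (u.c.b., cumulative frequency) points."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return c0 + (c1 - c0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the range of the class boundaries")

# (upper class boundary in years, cumulative frequency in millions)
points = [(0, 0), (10, 7), (20, 15), (30, 22), (40, 31), (50, 40),
          (60, 48), (70, 54), (80, 58), (90, 60)]

total = points[-1][1]               # 60 million
below_65 = cum_freq_at(65, points)  # 51.0 million
retired = total - below_65          # 9.0 million

print(below_65, retired)
```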

Exercise
The table and diagram show the number of flowers in a sample of 43 antirrhinum plants.

x           f    Cu.f
20-39       6    6
40-59       10   16
60-79       12   28
80-99       7    35
100-119     5    40
120-139     1    41
140-159     1    42
160-179     1    43
Source: O.N.Bishop

[Cumulative frequency graph: Number of flowers on antirrhinum plants]

Estimate the median number of flowers and the percentage of plants that have more than 90 flowers.
Solution:
The u.c.bs., where we plot the points, are at 39·5, 59·5 etc.
There are 43 observations, so the median is given by the 21·5th one.
Median = 70
The graph gives a cumulative frequency of 32 at 90 flowers, so
Number with more than 90 flowers = 43 – 32 = 11
Percentage with more than 90 flowers ≈ 11 / 43 ≈ 26%