
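1. Mean
- easy to calculate but is affected by extreme values
- to calculate use:
$\text{Mean} = \frac{\text{Sum of all values}}{\text{Total number of values}}$
e.g. Calculate the mean of 6, 11, 3, 14, 8
$\text{Mean} = \frac{6 + 11 + 3 + 14 + 8}{5} = \frac{42}{5} = 8.4$
(Push equals on the calculator after adding and BEFORE dividing.)
e.g. Calculate the mean of 6, 11, 3, 14, 8, 100
$\text{Mean} = \frac{6 + 11 + 3 + 14 + 8 + 100}{6} = \frac{142}{6} = 23.7$ (1 d.p.)
The single extreme value (100) pulls the mean up from 8.4 to 23.7.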
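A minimal Python sketch of the mean formula, using the two examples above. The function name `mean` is our own choice, not something from the notes.

```python
def mean(values):
    """Sum of all values divided by the total number of values."""
    return sum(values) / len(values)


small = [6, 11, 3, 14, 8]
with_extreme = small + [100]  # add the extreme value from the second example

print(mean(small))                   # 8.4
print(round(mean(with_extreme), 1))  # 23.7 (1 d.p.)
```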
2. Median
- middle number when all values are PLACED IN ORDER (two ways: use the position formula, or cross off data)
- harder to calculate but is not affected by extreme values
a) for an odd number of values, the median is the middle value
e.g. Find the median of 39, 44, 38, 37, 42, 40, 42, 39, 32
32, 37, 38, 39, 39, 40, 42, 42, 44
To find the position of the median use $\frac{n+1}{2}$, where n = number of values:
$\frac{n+1}{2} = \frac{9+1}{2} = \frac{10}{2} = 5$, so the median is the 5th value.
Median = 39
Or cross off data, one value at a time from each end, until you reach the middle value.
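b) for an even number of values, the median is the average of the two middle values
e.g. Find the median of 69, 71, 68, 85, 73, 73, 64, 75
64, 68, 69, 71, 73, 73, 75, 85
$\frac{n+1}{2} = \frac{8+1}{2} = 4.5$, so the median lies halfway between the 4th and 5th values
(or cross off data from each end until two values are left in the middle).
$\text{Median} = \frac{71 + 73}{2} = \frac{144}{2} = 72$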
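The (n + 1) / 2 position rule can be sketched in Python as below; the function name `median` is our own. A whole-number position picks one value, while a position ending in .5 averages the two values either side of it.

```python
def median(values):
    """Median using the (n + 1) / 2 position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position, e.g. 5 or 4.5
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position ends in .5: average the two middle values either side of it
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median([39, 44, 38, 37, 42, 40, 42, 39, 32]))  # 39
print(median([69, 71, 68, 85, 73, 73, 64, 75]))      # 72.0
```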
3. Mode
- mainly useful for finding the most popular item
- is the most common value (there can be none, one or more)
e.g. Find the mode of 188, 93, 4, 93, 15, 0, 100, 15
Mode = 15 and 93 (each appears twice)
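A short Python sketch that returns every mode, since there can be more than one. Treating a data set where every value appears exactly once as having no mode is our assumption about what "none" means here.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (there can be more than one)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: every value occurs once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([188, 93, 4, 93, 15, 0, 100, 15]))  # [15, 93]
```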
Range
- can show how spread out the data is
- is the difference between the largest and smallest values
e.g. Find the range of 4, 2, 6, 9, 8
Range = highest value – lowest value
= 9 – 2
= 7 (2 – 9)
Note: It's a good idea to write in brackets the values that make up the range.
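A small Python sketch of the range calculation that also returns the lowest and highest values, matching the advice to write them in brackets. The function name `data_range` is our own.

```python
def data_range(values):
    """Range = highest value - lowest value; also return (lowest, highest)."""
    lowest, highest = min(values), max(values)
    return highest - lowest, (lowest, highest)


spread, (low, high) = data_range([4, 2, 6, 9, 8])
print(f"Range = {high} - {low} = {spread} ({low} - {high})")  # Range = 9 - 2 = 7 (2 - 9)
```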
Frequency Tables
- useful when dealing with large amounts of discrete data
e.g. Here are the numbers of fundraising tickets sold by 25 members of a hockey team. Place the data on a frequency table.
3, 5, 0, 1, 0, 2, 5, 2, 4, 0, 1, 2, 3, 5, 7, 2, 3, 3, 1, 4, 3, 3, 2, 0, 1

No. of tickets sold (x) | Tally   | Frequency (f) | x.f
0                       | IIII    | 4             | 0 × 4 = 0
1                       | IIII    | 4             | 1 × 4 = 4
2                       | IIII/   | 5             | 2 × 5 = 10
3                       | IIII/ I | 6             | 3 × 6 = 18
4                       | II      | 2             | 4 × 2 = 8
5                       | III     | 3             | 5 × 3 = 15
6                       |         | 0             | 0
7                       | I       | 1             | 7 × 1 = 7
Total                   |         | 25            | 62

To find the mean, we need the sum of the ticket numbers multiplied by their frequencies, divided by the total frequency. Check that the total frequency matches the question (25 members)!
$\text{Mean} = \frac{\text{sum of } x.f}{\text{total frequency}} = \frac{62}{25} = 2.48$ tickets
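A minimal Python sketch of building this frequency table and taking the mean from it. Using `collections.Counter` for the tally is our choice; listing every x from the lowest to the highest value keeps the row for 6 tickets (f = 0).

```python
from collections import Counter

tickets = [3, 5, 0, 1, 0, 2, 5, 2, 4, 0, 1, 2, 3, 5, 7, 2, 3, 3, 1, 4, 3, 3, 2, 0, 1]

# Frequency (f) of each number of tickets sold (x), including values with f = 0
counts = Counter(tickets)
table = [(x, counts[x]) for x in range(min(tickets), max(tickets) + 1)]

total_f = sum(f for _, f in table)     # check: should match the 25 members
sum_xf = sum(x * f for x, f in table)  # sum of the x.f column

for x, f in table:
    print(x, f, x * f)
print("Total", total_f, sum_xf)    # Total 25 62
print("Mean =", sum_xf / total_f)  # Mean = 2.48
```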
To find the median, determine its position using the previous formula:
$\frac{n+1}{2} = \frac{25+1}{2} = 13$
Now, add down the frequency column (4, 4 + 4 = 8, 8 + 5 = 13) to locate the position of the median: the 13th value is in the row for 2 tickets.
Therefore: Median = 2 tickets
To find the mode, look for the highest frequency
Therefore: Mode = 3 tickets
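The "add down the frequency column" step can be sketched in Python as below. It assumes an odd total frequency, so the median position is a whole number, and it reports only the first x with the highest frequency, so ties for the mode are not handled.

```python
# Frequency table from the ticket example: (x, f) pairs
table = [(0, 4), (1, 4), (2, 5), (3, 6), (4, 2), (5, 3), (6, 0), (7, 1)]

n = sum(f for _, f in table)
position = (n + 1) / 2  # 13 for n = 25

# Add down the frequency column until the running total reaches the median position
running_total = 0
for x, f in table:
    running_total += f
    if running_total >= position:
        median = x
        break

# Mode: the x value with the highest frequency (first one found if tied)
mode = max(table, key=lambda row: row[1])[0]

print("Median =", median, "tickets")  # Median = 2 tickets
print("Mode =", mode, "tickets")      # Mode = 3 tickets
```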
Types of Data
1. Discrete Data
– usually found by counting, usually whole numbers
e.g. Number of cars passing the school
2. Continuous Data
– usually found by measuring
e.g. Weights and heights of students
Types of Graphs
1. Bar Graph
– shows discrete data
– must have GAPS between bars
e.g. The table below shows the number of times 28 students went out for dinner last month. Place the data on a bar graph.

Number of dinners | 0 | 1 | 2 | 3 | 4 | 5
Frequency         | 2 | 6 | 8 | 6 | 4 | 2
[Bar graph: "Students out for Dinner", with Number of dinners (0 to 5) on the horizontal axis and Frequency (0 to 8) on the vertical axis. Don't forget a title or axis labels, and note the gaps between bars.]
2. Dot Plots
– are like a bar graph
– each dot represents one item
e.g. Plot these 15 golf scores on a dot plot
70, 72, 68, 74, 74, 78, 77, 70, 72, 72, 76, 72, 76, 75, 78
Range the plot between the lowest and highest values.
[Dot plot: "Golf Scores", with an axis from 68 to 78.]
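A rough Python sketch of the same dot plot drawn sideways in text: one row per score from the lowest to the highest value, and one "o" per golfer. The sideways text layout is our own simplification of the usual vertical dots.

```python
from collections import Counter

scores = [70, 72, 68, 74, 74, 78, 77, 70, 72, 72, 76, 72, 76, 75, 78]
counts = Counter(scores)

# One row per score from the lowest to the highest value; one "o" per golfer
rows = []
for score in range(min(scores), max(scores) + 1):
    rows.append(f"{score} | " + "o" * counts[score])

print("\n".join(rows))
```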
3. Pictograms
– use symbols to represent fixed numbers
– a key shows the value of the symbol
e.g. Using an appropriate symbol, draw a pictogram displaying the number of hours per week spent completing homework for the following subjects.
[Pictogram: "Hours of Study in a Week" for Maths, English and Science, with a KEY showing one symbol = 1 hour.]
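4. Pie Graphs
– show comparisons
– slices are called sectors
– use percentages and angles (protractor and compass)
e.g. Students of a class travelled to school in the following manner. Show this on a pie graph.
Walked = 6
Cycled = 5
Car = 4
Bus = 9
To calculate the angle of each sector use:
$\text{Angle} = \frac{\text{Amount of sector}}{\text{Total data}} \times 360^\circ$
Walked = $\frac{6}{24} \times 360^\circ = 90^\circ$
Cycled = $\frac{5}{24} \times 360^\circ = 75^\circ$
Car = $\frac{4}{24} \times 360^\circ = 60^\circ$
Bus = $\frac{9}{24} \times 360^\circ = 135^\circ$
[Pie graph: "Student Mode of Transport", with sectors labelled Walked, Cycled, Car and Bus.]
Note: Instead of labels, a key could also be used.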
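A minimal Python sketch of the sector-angle formula for this example. Multiplying by 360 before dividing by the total is our choice; it keeps these particular results exact.

```python
transport = {"Walked": 6, "Cycled": 5, "Car": 4, "Bus": 9}
total = sum(transport.values())  # 24 students

# Angle of sector = amount of sector / total data x 360 (multiply first, then divide)
angles = {method: amount * 360 / total for method, amount in transport.items()}

for method, angle in angles.items():
    print(f"{method}: {angle:g} degrees")
print("Check:", sum(angles.values()))  # the sectors should add to 360
```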
5. Strip Graph
– shows the proportion of each part to the whole
– should have a scale
– linked to pie graphs
e.g. Using the pie graph example, a strip graph could be drawn using a scale of 1 cm = 2 students.
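Under the 1 cm = 2 students scale, each part's length is its count divided by 2. A small Python sketch using the pie graph data:

```python
transport = {"Walked": 6, "Cycled": 5, "Car": 4, "Bus": 9}
students_per_cm = 2  # scale: 1 cm = 2 students

# Length of each part of the strip = number of students / students per cm
lengths = {method: count / students_per_cm for method, count in transport.items()}

for method, length in lengths.items():
    print(f"{method}: {length} cm")
print("Whole strip:", sum(lengths.values()), "cm")  # Whole strip: 12.0 cm
```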
Quartiles
– are measures of spread; together with the median, they split the data into quarters
– the method used is similar to finding the median
When the data is in order:
– the lower quartile (LQ) has 25% or ¼ of the data below it.
– the upper quartile (UQ) has 75% or ¾ of the data below it.
– the Interquartile Range (IQR) = UQ – LQ
e.g.
Find the LQ, UQ and the interquartile range of the following data
6, 6, 6, 7, 8, 9, 10, 10, 11, 14, 16, 16, 17, 19, 20, 20, 24, 24, 25, 29
Note: always find the median first (or cross off data).
Median position = $\frac{20 + 1}{2} = \frac{21}{2} = 10.5$, so the median lies between the 10th and 11th values and splits the data into two halves of 10 values.
Position of the LQ and UQ within each half = $\frac{10 + 1}{2} = \frac{11}{2} = 5.5$
$\text{LQ} = \frac{8 + 9}{2} = \frac{17}{2} = 8.5$
$\text{UQ} = \frac{20 + 20}{2} = \frac{40}{2} = 20$
$\text{IQR} = \text{UQ} - \text{LQ} = 20 - 8.5 = 11.5$
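e.g.
Find the LQ, UQ and the interquartile range of the following data
5, 6, 8, 10, 11, 11, 12, 15, 18, 22, 23, 28, 30
Remember, always find the median first (or cross off data):
Median position = $\frac{13 + 1}{2} = \frac{14}{2} = 7$, so the median is the 7th value, 12.
As the median is an actual piece of data, it is ignored when finding the LQ and UQ, leaving two halves of 6 values.
Position of the LQ and UQ within each half = $\frac{6 + 1}{2} = \frac{7}{2} = 3.5$
$\text{LQ} = \frac{8 + 10}{2} = \frac{18}{2} = 9$
$\text{UQ} = \frac{22 + 23}{2} = \frac{45}{2} = 22.5$
$\text{IQR} = \text{UQ} - \text{LQ} = 22.5 - 9 = 13.5$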
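A Python sketch of the halves method used in both examples: split the ordered data at the median, leaving the median out when n is odd, then take the median of each half. Other quartile conventions exist (calculator and spreadsheet functions may give slightly different values), so this sketch follows only the method shown here; the function names are our own.

```python
def median(ordered):
    """Median of data that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


def quartiles(values):
    """Return (LQ, median, UQ) using the halves method from these notes.

    When n is odd the median is an actual piece of data, so it is left out
    of both halves before the LQ and UQ are found.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return median(lower_half), median(ordered), median(upper_half)


example_1 = [6, 6, 6, 7, 8, 9, 10, 10, 11, 14, 16, 16, 17, 19, 20, 20, 24, 24, 25, 29]
example_2 = [5, 6, 8, 10, 11, 11, 12, 15, 18, 22, 23, 28, 30]

for data in (example_1, example_2):
    lq, med, uq = quartiles(data)
    print(f"LQ = {lq}, median = {med}, UQ = {uq}, IQR = {uq - lq}")
```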
Stem and Leaf Plots
– record and organise data
– the most significant figures form the stem and the final digits form the leaves
– can be drawn in back-to-back form to compare two sets of data
e.g.
Place the following heights (in m) onto a back-to-back stem and leaf plot
BOYS = 1.59, 1.69, 1.47, 1.43, 1.82, 1.70, 1.73, 1.35, 1.76, 1.68,
1.62, 1.84, 1.45, 1.50, 1.54, 1.73, 1.84, 1.71, 1.66
GIRLS = 1.44, 1.46, 1.63, 1.29, 1.48, 1.57, 1.51, 1.42, 1.34, 1.45,
1.57, 1.59, 1.42
Look at the highest and lowest data values to decide the range of the stem.
Place the final digits of the data on the graph on the correct side.

Unordered Graph of Heights
         Boys | Stem | Girls
      4, 4, 2 | 1.8  |
1, 3, 6, 3, 0 | 1.7  |
   6, 2, 8, 9 | 1.6  | 3
      4, 0, 9 | 1.5  | 7, 1, 7, 9
      5, 3, 7 | 1.4  | 4, 6, 8, 2, 5, 2
            5 | 1.3  | 4
              | 1.2  | 9

Ordered Graph of Heights
         Boys | Stem | Girls
      4, 4, 2 | 1.8  |
6, 3, 3, 1, 0 | 1.7  |
   9, 8, 6, 2 | 1.6  | 3
      9, 4, 0 | 1.5  | 1, 7, 7, 9
      7, 5, 3 | 1.4  | 2, 2, 4, 5, 6, 8
            5 | 1.3  | 4
              | 1.2  | 9
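A Python sketch that builds the ordered back-to-back plot from the raw heights. Converting each height to whole hundredths with `round` (so 1.57 becomes 157) is our choice to avoid floating-point error; the stem is then 157 // 10 = 15 (shown as 1.5) and the leaf is 7.

```python
from collections import defaultdict

boys = [1.59, 1.69, 1.47, 1.43, 1.82, 1.70, 1.73, 1.35, 1.76, 1.68,
        1.62, 1.84, 1.45, 1.50, 1.54, 1.73, 1.84, 1.71, 1.66]
girls = [1.44, 1.46, 1.63, 1.29, 1.48, 1.57, 1.51, 1.42, 1.34, 1.45,
         1.57, 1.59, 1.42]


def stems_and_leaves(heights):
    """Group heights (m, 2 d.p.) into stems (15 means 1.5) with ordered leaves."""
    plot = defaultdict(list)
    for h in heights:
        hundredths = round(h * 100)  # 1.57 -> 157, avoids float error
        plot[hundredths // 10].append(hundredths % 10)
    return {stem: sorted(leaves) for stem, leaves in plot.items()}


boy_plot, girl_plot = stems_and_leaves(boys), stems_and_leaves(girls)
all_stems = sorted(set(boy_plot) | set(girl_plot), reverse=True)

for stem in all_stems:
    # Boys' leaves are reversed so they increase moving away from the stem
    left = ", ".join(str(leaf) for leaf in reversed(boy_plot.get(stem, [])))
    right = ", ".join(str(leaf) for leaf in girl_plot.get(stem, []))
    print(f"{left:>15} | {stem / 10:.1f} | {right}")
```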
For each statistic, make sure to write down the whole value (e.g. 1.57 m), not just the 'leaf'!
When finding the median, LQ and UQ, make sure you count/cross off in the right direction (on the boys' side, the leaves increase moving left, away from the stem)!
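e.g. From the ordered plot, state the minimum, maximum, LQ, median, UQ, IQR and range statistics for each side.
Boys: Median position = $\frac{19 + 1}{2} = 10$; LQ/UQ position = $\frac{9 + 1}{2} = 5$ (within each half of 9)
Girls: Median position = $\frac{13 + 1}{2} = 7$; LQ/UQ position = $\frac{6 + 1}{2} = 3.5$ (within each half of 6)

Statistic | BOYS                 | GIRLS
Minimum   | 1.35 m               | 1.29 m
Maximum   | 1.84 m               | 1.63 m
LQ        | 1.50 m               | 1.42 m
Median    | 1.68 m               | 1.46 m
UQ        | 1.73 m               | 1.57 m
IQR       | 1.73 – 1.50 = 0.23 m | 1.57 – 1.42 = 0.15 m
Range     | 1.84 – 1.35 = 0.49 m | 1.63 – 1.29 = 0.34 m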
Remember: If you find it hard to calculate statistics from a graph, write the data out in a line first!
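Following that advice, a Python sketch that writes each data set out in order and then reads off all seven statistics. It uses the same halves method as the notes (median left out of both halves when n is odd); rounding the differences to 2 d.p. is our choice, to hide floating-point noise such as 0.22999999999999998.

```python
boys = [1.59, 1.69, 1.47, 1.43, 1.82, 1.70, 1.73, 1.35, 1.76, 1.68,
        1.62, 1.84, 1.45, 1.50, 1.54, 1.73, 1.84, 1.71, 1.66]
girls = [1.44, 1.46, 1.63, 1.29, 1.48, 1.57, 1.51, 1.42, 1.34, 1.45,
         1.57, 1.59, 1.42]


def median(ordered):
    """Median of data that is already in order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def summary(values):
    """Minimum, maximum, LQ, median, UQ, IQR and range (halves method)."""
    data = sorted(values)  # write the data out in a line first
    n = len(data)
    lq = median(data[: n // 2])
    uq = median(data[(n + 1) // 2 :])  # skips the median itself when n is odd
    return {
        "Minimum": data[0],
        "Maximum": data[-1],
        "LQ": lq,
        "Median": median(data),
        "UQ": uq,
        "IQR": round(uq - lq, 2),
        "Range": round(data[-1] - data[0], 2),
    }


for name, heights in (("BOYS", boys), ("GIRLS", girls)):
    print(name, summary(heights))
```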
Box and Whisker Plots
– show the minimum, maximum, LQ, median and UQ
– ideal for comparing two sets of data
e.g.
Using the height data from the stem and leaf diagrams, draw two box and whisker plots (Boys and Girls).
Note: Use the minimum and maximum values to determine the length of the scale.
[Box and whisker plot: "Box and Whisker Plot of Boys and Girls Heights", with Males and Females each marked at Minimum, LQ, Median, UQ and Maximum on a Height (m) scale from 1.20 to 1.90.]
Question: What is the comparison between the boy and girl heights?
ANSWER?
EVIDENCE?
Grouped Frequency Tables
– used when dealing with a large amount of continuous data and groups are needed
e.g.
Listed below are the heights (in cm) of 25 students. Represent the data on a frequency table.
167, 173, 171, 149, 162, 174, 185, 165, 160, 170, 173, 161, 158, 172, 168,
168, 178, 170, 180, 166, 183, 150, 164, 161, 164
Note: Make sure you have enough groups, but don't make them too small!

Interval  | Tally         | Freq. (f) | Midpoint (x) | x.f
140 – 149 | I             | 1         | 144.5        | 144.5
150 – 159 | II            | 2         | 154.5        | 309
160 – 169 | IIII/ IIII/ I | 11        | 164.5        | 1809.5
170 – 179 | IIII/ III     | 8         | 174.5        | 1396
180 – 189 | III           | 3         | 184.5        | 553.5
TOTAL     |               | 25        |              | 4212.5

e.g. Midpoint of 140 – 149 = $\frac{140 + 149}{2} = 144.5$, and $x.f = 144.5 \times 1 = 144.5$
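To calculate the mean, a midpoint is needed and the formula used is:
$\text{Mean} = \frac{\text{sum of } x.f}{\text{total frequency}}$, where x is the midpoint of each interval
e.g. Calculate the mean from the above data and state the modal interval.
$\text{Mean} = \frac{4212.5}{25} = 168.5$ cm (an estimate, because each height is replaced by the midpoint of its interval)
Modal Interval = 160 – 169 cm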
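A Python sketch of the grouped table, the midpoint mean and the modal interval. The intervals 140–149 to 180–189 and the (low + high) / 2 midpoints follow the table above; the variable names are our own.

```python
heights = [167, 173, 171, 149, 162, 174, 185, 165, 160, 170, 173, 161, 158,
           172, 168, 168, 178, 170, 180, 166, 183, 150, 164, 161, 164]

# Class intervals 140-149, 150-159, ..., 180-189, as in the table
intervals = [(low, low + 9) for low in range(140, 190, 10)]

rows = []
for low, high in intervals:
    f = sum(1 for h in heights if low <= h <= high)
    midpoint = (low + high) / 2  # e.g. (140 + 149) / 2 = 144.5
    rows.append((low, high, f, midpoint, midpoint * f))

total_f = sum(r[2] for r in rows)
sum_xf = sum(r[4] for r in rows)
estimated_mean = sum_xf / total_f
modal = max(rows, key=lambda r: r[2])  # interval with the highest frequency

print(f"Total f = {total_f}, sum of x.f = {sum_xf}")  # Total f = 25, sum of x.f = 4212.5
print(f"Mean is about {estimated_mean} cm")           # Mean is about 168.5 cm
print(f"Modal interval = {modal[0]} - {modal[1]} cm") # Modal interval = 160 - 169 cm
```

Because the midpoints stand in for the real values, the result is only an estimate: the mean of the 25 raw heights is 4192 / 25 = 167.68 cm.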
Histograms
– display grouped data
– frequency is along the vertical axis; group intervals are along the horizontal axis
– there are NO gaps between bars
e.g. Graph the grouped frequency table data about heights onto a histogram.
Note: The groups from the table form the intervals along the horizontal axis, and the highest frequency determines the height of the vertical axis.
[Histogram: "Student Heights", with Height (cm) from 140 to 190 on the horizontal axis and Frequency (no. of students) from 0 to 12 on the vertical axis.]
– Side by side histograms can also be used to compare data.
[Side by side histograms: "Female Heights" and "Male Heights", each with heights from 140 to 200 on the horizontal axis and frequency from 0 to 8 on the vertical axis.]
Question: What is the comparison between the female and male heights?
ANSWER?
EVIDENCE?
Scatter Diagrams
– look for a relationship between two measured variables
– points are plotted like co-ordinates
e.g. Below are the heights and weights of Year 7 boys. Place them on a scatter plot.

Height (cm) | Weight (kg)
144         | 48
152         | 52
161         | 50
148         | 49
155         | 53
140         | 47
158         | 58
139         | 45
147         | 50
150         | 51
152         | 49
138         | 46
137         | 44
145         | 49

Note: Use the data to determine the scale to use on both axes.
[Scatter diagram: "Scatter Diagram for Boys' Heights and Weights", with Height (cm) from 135 to 160 on the horizontal axis, Weight (kg) from 45 to 60 on the vertical axis, and a line of best fit drawn through the points.]
Outliers can generally be ignored when drawing the line of best fit.
If the points form a line (or close to one), we can say there is a relationship between the two variables.
Question: What is the relationship between the boys' height and weight?
ANSWER?
EVIDENCE?
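The notes draw the line of best fit by eye. As an alternative (a standard technique that the notes do not cover), this Python sketch fits a least-squares line through the 14 points; a positive slope means taller boys tend to be heavier. The names `slope` and `intercept` are our own.

```python
heights = [144, 152, 161, 148, 155, 140, 158, 139, 147, 150, 152, 138, 137, 145]  # cm
weights = [48, 52, 50, 49, 53, 47, 58, 45, 50, 51, 49, 46, 44, 49]               # kg

pairs = list(zip(heights, weights))  # each (height, weight) is one plotted point

mean_h = sum(heights) / len(heights)
mean_w = sum(weights) / len(weights)

# Least-squares line: weight = slope * height + intercept (passes through the means)
sxy = sum((h - mean_h) * (w - mean_w) for h, w in pairs)
sxx = sum((h - mean_h) ** 2 for h in heights)
slope = sxy / sxx
intercept = mean_w - slope * mean_h

print(f"weight = {slope:.2f} x height + ({intercept:.1f})")
print("Positive relationship" if slope > 0 else "No positive relationship")
```

Here the slope comes out at about 0.39 kg per cm.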
Time Series
– a collection of measurements recorded at specific intervals, where the quantity changes with time.
Features of Time Series
a) Order is important, with all measurements retained to examine trends
b) Long term trends, where measurements definitely tend to increase or decrease
c) Seasonal trends, resulting in up and down patterns
e.g. Draw a time series graph for the following data:

Season  | Quarterly sales ($)
Sept.   | 9040
Dec.    | 8650
Mar. 96 | 8370
June    | 9250
Sept.   | 9033
Dec.    | 8578
Mar. 97 | 8495
June    | 9407
Sept.   | 9209
Dec.    | 8740
Mar. 98 | 8618
June    | 9504
Sept.   | 9246
Dec.    | 8929
Mar. 99 | 8670

[Time series graph: "Quarterly Sales for Elliot's Fish and Chips Shop", with Quarter/Years on the horizontal axis and Sales ($) from 8300 to 9900 on the vertical axis.]
Join up each of the points.
Question: What are the short and long term trends?
ANSWER?
EVIDENCE?
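One way to gather evidence for the trends is to line up each quarter with the same quarter in the other years. A Python sketch, assuming the 15 values run in order from the first Sept. quarter to Mar. 99 as in the table:

```python
seasons = ["Sept.", "Dec.", "Mar.", "June"]
sales = [9040, 8650, 8370, 9250, 9033, 8578, 8495, 9407,
         9209, 8740, 8618, 9504, 9246, 8929, 8670]  # first Sept. to Mar. 99, in order

# Group each quarter's sales with the same quarter in the other years
by_season = {s: [] for s in seasons}
for i, value in enumerate(sales):
    by_season[seasons[i % 4]].append(value)

for season, values in by_season.items():
    print(season, values)
```

Reading along each list shows the long term movement for that quarter (for example, the Mar. and June figures rise every year), while comparing quarters within a year shows the seasonal pattern: June is the highest quarter and Mar. the lowest in each year.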
Good graphs should have:
- an accurate heading (watch out for emotive headings)
- scales in even steps
- scales starting from zero unless a break is shown
- values that are easy to read
- (for bar graphs) bars of the same width and similar shading
Population: The entire group of members under consideration
Sample: When part of the group is surveyed
Census: Whole population is surveyed
Survey: Collection of information from some or all members of a population
Sampling Frame: A list covering the target population
A Good Sampling Frame:
- lists each unit only once
- has each unit distinguishable from the others
- is up to date
When planning an investigation:
- think carefully about what you are trying to find (question)
- what data is needed
- how will you obtain the data
- is the method practical and convenient
- how will you record the information
- how will you present the data
A sample should:
1) Be large enough to be representative of whole population
2) Have people/items in it that are representative of the population
It is best to choose samples that are large and random but size may be
affected by time, money, personnel, equipment etc.
Simple random sampling:
1- obtain a population list
2- number each member
3- use a random number table or the random number function on a calculator to choose members
Systematic sampling:
1- obtain a population list
2- randomly select a starting point on the list
3- select every nth member until the desired sample size is reached
Note: n is found by: $n = \frac{\text{Population/group size}}{\text{Size of sample needed}}$
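Both sampling methods can be sketched in Python with the standard `random` module. The 120-student list is hypothetical, and two details are our assumptions: n is rounded down when the division is not exact, and the random start is chosen within the first n members so the sample never runs off the end of the list.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Pick `size` different members, each equally likely (like a random number table)."""
    rng = random.Random(seed)
    return rng.sample(population, size)


def systematic_sample(population, size, seed=None):
    """Random start, then every nth member, where n = population size / sample size."""
    rng = random.Random(seed)
    step = len(population) // size  # n, rounded down (assumption)
    start = rng.randrange(step)     # random starting point within the first n (assumption)
    return [population[start + k * step] for k in range(size)]


students = [f"Student {i}" for i in range(1, 121)]  # hypothetical list of 120 students
print(simple_random_sample(students, 10, seed=1))
print(systematic_sample(students, 10, seed=1))      # every 12th student
```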
Possible sources of error:
- Biased sample
- Wrong measurements
- Poorly worded, misleading questions
- Mistakes in calculations and/or display
When comparing two sets of data:
1. If the two sets of data are NOT related (have no effect on each other)
Use the words COMPARE OR COMPARISON
e.g. What is the COMPARISON between…
How does … COMPARE to …
THEN: (also if justifying statements)
- Get as many statistics as possible (averages, quartiles, max and min, range etc.)
- Draw a STEM and LEAF GRAPH and a BOX and WHISKER PLOT
(maybe SIDE BY SIDE HISTOGRAMS)
- Answer your question in one sentence. Remember to use "generally/on average".
- Back up your answer with at least 2-3 statements using the data from your
statistics/graphs (at least one about the average and one about the spread)
2. If the two sets of data ARE related (do have an effect on each other)
Use the words RELATE OR RELATIONSHIP
e.g. What is the RELATIONSHIP between…
How does … RELATE to …
THEN:
- Get as many statistics as possible
- Draw a SCATTERPLOT
- Answer your question in one sentence
- Back up your answer with at least 2-3 statements
3. If it is a single set of data taken over time, we look for short and long term trends.
- Write your question in the following manner:
What are the SHORT and LONG TERM trends in ….
THEN:
- Get as many statistics as possible
- Draw a TIME SERIES GRAPH
- Answer your question and back it up with justifications
1. In terms of Data Collection
Typical Limitations
- Sample too small
- Not random or representative
- Taken over too short a time period
- Outliers distort data
Improvements
- Obtain a bigger sample
- Get a representative sample
- Take data over a longer time period
- Ignore extreme outliers
2. In terms of Your Process
Typical Limitations
- Not enough statistics calculated
- Not enough graphs used (the data could be compared better)
- Scales on graphs too large
- The way the graphs are drawn
Improvements
- Calculate more statistics
- Use other graphs (e.g. comparative histograms)
- Change the scales on graphs (smaller)
- Alter the way the graphs are drawn