Transcript Chapter 5

Statistics
• Population Data:
• Including data from ALL people or items with the
characteristic one wishes to understand.
• Sample Data:
• Utilizing a set of data collected and/or selected
from a statistical population by a defined
procedure.
• Can you think of examples?
• When would you use population data?
• EX:
• When would you use sample data?
• EX:
• 5 main sampling methods:
• Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
• Convenience Sampling
Random Sampling
• The “pick a name out of a hat” technique
• Random number table
• Random number generator
Hawkes and Marsh (2004)
Systematic Sampling
• All data is sequentially numbered
• Every nth piece of data is chosen
Hawkes and Marsh (2004)
Stratified Sampling
• Data is divided into subgroups (strata)
• Strata are based on a specific characteristic
• Age
• Education level
• Etc.
• Use random sampling within each stratum
Hawkes and Marsh (2004)
Cluster Sampling
• Data is divided into clusters
• Usually geographic
• Random sampling used to choose clusters
• All data used from selected clusters
Hawkes and Marsh (2004)
Convenience Sampling
• Data is chosen based on convenience
• BE WARY OF BIAS!
Hawkes and Marsh (2004)
• Bias means how far from the true value the
estimated value is.
• If a value has zero bias it is called unbiased.
• Why is this important in statistical studies?
• Selection Bias
• Omitted-Variable Bias
• Funding Bias
• Reporting/Response Bias
• Analytical Bias
• Exclusion Bias
• Can you think of others?
In a class of 18 students, 6 are chosen for an assignment.
Sampling Type | Example
Random | Pull 6 names out of a hat
Systematic | Select every 3rd student
Stratified | Divide the class into 2 equal age groups. Randomly choose 3 from each group
Cluster | Divide the class into 6 groups of 3 students each. Randomly choose 2 groups
Convenience | Take the 6 students closest to the teacher
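The five selections in the class-of-18 example can be sketched in Python. This is only an illustration: the seat numbers 1 to 18, the split into two halves standing in for the two age groups, the assumption that seats 1 to 6 are closest to the teacher, and the fixed seed are all assumptions made for the sketch, not part of the slides.

```python
import random

# Hypothetical roster: 18 students identified by seat number 1..18.
students = list(range(1, 19))
rng = random.Random(5)  # arbitrary fixed seed so reruns give the same draws

# Random: the "names out of a hat" draw, without replacement.
random_sample = rng.sample(students, 6)

# Systematic: every 3rd student in seat order (3, 6, ..., 18).
systematic_sample = students[2::3]

# Stratified: two equal strata of 9 (standing in for two age groups),
# then a random draw of 3 inside each stratum.
strata = [students[:9], students[9:]]
stratified_sample = [s for stratum in strata for s in rng.sample(stratum, 3)]

# Cluster: six clusters of 3; choose 2 clusters at random and keep
# every student in the chosen clusters.
clusters = [students[i:i + 3] for i in range(0, 18, 3)]
cluster_sample = [s for cluster in rng.sample(clusters, 2) for s in cluster]

# Convenience: the 6 seats assumed to be closest to the teacher.
convenience_sample = students[:6]

for name, sample in [
    ("Random", random_sample),
    ("Systematic", systematic_sample),
    ("Stratified", stratified_sample),
    ("Cluster", cluster_sample),
    ("Convenience", convenience_sample),
]:
    print(f"{name:<12} {sorted(sample)}")
```

Only the systematic and convenience selections are fully determined by the seat order; the other three change with the seed.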
• Determine average student age
• Sample of 10 students
• Ages of 50 statistics students (data points numbered 1 to 50, reading left to right, top to bottom):
18 21 42 32 17 18 18 18 19 22
25 24 23 25 18 18 19 19 20 21
19 29 22 17 21 20 20 24 36 18
17 19 19 23 25 21 19 21 24 27
21 22 19 18 25 23 24 17 19 20
• Random number generator
Data Point Location | Corresponding Data Value
35 | 25
48 | 17
37 | 19
14 | 25
47 | 24
4 | 32
33 | 19
35 | 25
34 | 23
3 | 42
Mean | 25.1
• Take every 5th data point
Data Point Location | Corresponding Data Value
5 | 17
10 | 22
15 | 18
20 | 21
25 | 21
30 | 18
35 | 25
40 | 27
45 | 25
50 | 20
Mean | 21.4
• Take the first 10 data points
Data Point Location | Corresponding Data Value
1 | 18
2 | 21
3 | 42
4 | 32
5 | 17
6 | 18
7 | 18
8 | 18
9 | 19
10 | 22
Mean | 22.5
Sampling Method vs. Average Age
Sampling Method | Average Age
Random Sampling | 25.1
Systematic Sampling | 21.4
Convenience Sampling | 22.5
Actual | 21.7
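The sample means in this comparison can be recomputed from the 50 listed ages. The sketch below assumes the data point locations are 1-based and follow the order in which the ages are listed; the random locations are the ten values from the random number generator slide, and the helper name `value_at` is introduced only for illustration.

```python
from statistics import mean

# Ages of the 50 statistics students, in listed order (location 1 first).
ages = [
    18, 21, 42, 32, 17, 18, 18, 18, 19, 22,
    25, 24, 23, 25, 18, 18, 19, 19, 20, 21,
    19, 29, 22, 17, 21, 20, 20, 24, 36, 18,
    17, 19, 19, 23, 25, 21, 19, 21, 24, 27,
    21, 22, 19, 18, 25, 23, 24, 17, 19, 20,
]

def value_at(location):
    """Return the age stored at a 1-based data point location."""
    return ages[location - 1]

# Locations from the random number generator slide (35 was drawn twice).
random_locations = [35, 48, 37, 14, 47, 4, 33, 35, 34, 3]
systematic_locations = list(range(5, 51, 5))   # every 5th data point
convenience_locations = list(range(1, 11))     # the first 10 data points

random_mean = mean(value_at(i) for i in random_locations)
systematic_mean = mean(value_at(i) for i in systematic_locations)
convenience_mean = mean(value_at(i) for i in convenience_locations)
actual_mean = mean(ages)

print("Random:     ", random_mean)
print("Systematic: ", systematic_mean)
print("Convenience:", convenience_mean)
print("Actual:     ", round(actual_mean, 1))
```

Each sample has only 10 values, so a single unusual age (such as the 42 at location 3) can move a sample mean well away from the actual mean.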
• In a group of two or three, create a list of at least
3 pros and 3 cons for each type of sampling.
• In the same group, create a list of when you may
use each type of sampling and for what reason.
• As a group determine which type of sampling is
overall the best, and which is overall the easiest.
• Measures of Central Tendency: Values that describe
the center of a distribution. The mean, median, and
mode are 3 measures of central tendency.
• Mean: A measure of central tendency that is
determined by dividing the sum of all values in a data
set by the number of values.
• Frequency Distribution Table: A table that lists a group
of data values, as well as the number of times each
value appears in the data set.
• Outliers: Extreme values in a data set.
• µ (pronounced ‘mu’)
• Symbol which represents the population mean
• ∑
• Symbol which means ‘the sum of’; represents the
addition of numbers
• N
• Symbol which represents the number of data
values of a given population
• In words:
• Mean = (sum of the values) / (the number of values)
• In mathematical symbols:
• $\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{\sum x}{N}$
• $x_1$, $x_2$, etc. are the given data values
• x̄ (pronounced ‘x bar’)
• Symbol which represents the sample mean
• ∑
• Symbol which means ‘the sum of’; represents the
addition of numbers
• n
• Symbol which represents the number of data
values of a given sample
• In words:
• Mean = (sum of the values) / (the number of values)
• In mathematical symbols:
• $\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$
• $x_1$, $x_2$, etc. are the given data values
• Mark operates a donut business which has 8
employees. Their ages are as follows: 55, 63, 34,
59, 29, 46, 51, 41.
• Find the mean age of the workers.
• Which will we use? Population or Sample? Why?
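As a worked check, the mean formula can be applied directly to the eight ages. One reasonable reading is that the list covers every employee of the business, so the result is treated here as a population mean (µ with N = 8); that reading, and the function name `arithmetic_mean`, are assumptions of this sketch.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Ages of all 8 employees, so N = 8.
employee_ages = [55, 63, 34, 59, 29, 46, 51, 41]

total = sum(employee_ages)          # 378
mu = arithmetic_mean(employee_ages)  # 378 / 8 = 47.25

print(f"sum = {total}, N = {len(employee_ages)}, mean = {mu}")
```

The arithmetic is identical for a sample; only the symbols change (x̄ and n instead of µ and N).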
• The selling prices for the last 10 houses sold in a
small town are listed below:
• $125,000 $142,000 $129,500 $89,500 $105,000
$144,000 $168,300 $96,000 $182,300 $212,000
• Calculate the mean selling price of the last 10
homes that were sold. Is this a population or
sample?
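A short sketch of the house-price calculation, using only the ten prices listed above. Whether the ten sales count as a population or a sample depends on what question is being asked, so the code computes the mean without choosing a symbol.

```python
# Selling prices of the last 10 houses sold, in dollars.
prices = [
    125_000, 142_000, 129_500, 89_500, 105_000,
    144_000, 168_300, 96_000, 182_300, 212_000,
]

total = sum(prices)               # 1,393,600
mean_price = total / len(prices)  # 139,360.0

print(f"Total: ${total:,}")
print(f"Mean selling price: ${mean_price:,.2f}")  # $139,360.00
```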
• 60 students were asked how many books they had read over
the past 12 months. The results are listed in the frequency
distribution table below. Calculate the mean number of books
read by each student.
Books | Frequency
0 | 1
1 | 6
2 | 8
3 | 10
4 | 13
5 | 8
6 | 5
7 | 6
8 | 3
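With a frequency distribution table, each value is multiplied by its frequency before summing, and the total is divided by the sum of the frequencies. A minimal sketch for the books table, with the dictionary layout chosen only for illustration:

```python
# Books read -> number of students who gave that answer.
book_frequencies = {0: 1, 1: 6, 2: 8, 3: 10, 4: 13, 5: 8, 6: 5, 7: 6, 8: 3}

total_students = sum(book_frequencies.values())  # 60
total_books = sum(books * freq for books, freq in book_frequencies.items())  # 240
mean_books = total_books / total_students        # 4.0

print(f"{total_books} books / {total_students} students = {mean_books}")
```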
• The following data shows the heights in centimeters
of a group of 10th grade students. Organize the
data in a frequency distribution table and
calculate the mean height of the students.
• 183 179 176 183 171 170 164 167
158 182 176 167 171 183 179 176
182 170 183 171 158 171 176 182
164 167 170 179 183 176 183 170
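One way to organize the heights is to tally them with `collections.Counter` and then compute the mean from the tally. This is a sketch of that approach, not the only way to build the table.

```python
from collections import Counter

# Heights in centimeters, in the order listed.
heights = [
    183, 179, 176, 183, 171, 170, 164, 167,
    158, 182, 176, 167, 171, 183, 179, 176,
    182, 170, 183, 171, 158, 171, 176, 182,
    164, 167, 170, 179, 183, 176, 183, 170,
]

# Frequency distribution table: height -> number of students.
frequency_table = Counter(heights)

print("Height | Frequency")
for height in sorted(frequency_table):
    print(f"{height} | {frequency_table[height]}")

n = sum(frequency_table.values())                        # 32
total = sum(h * f for h, f in frequency_table.items())   # 5570
mean_height = total / n                                  # 174.0625

print(f"Mean height: {mean_height} cm")
```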
• The mean can be affected by extreme values or
outliers.
• Example:
• If you are employed by a company that pays
all of its employees a salary between
$60,000 and $70,000, you could estimate the
mean salary to be about $65,000. However, if
you include the CEO's $150,000 salary, the
mean increases, and the fewer employees
there are, the larger the increase.
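The effect of the CEO's salary can be seen with a small made-up payroll. The seven staff salaries below are hypothetical values chosen to lie between $60,000 and $70,000; only the $150,000 figure comes from the example.

```python
from statistics import mean

# Hypothetical staff salaries between $60,000 and $70,000.
staff_salaries = [60_000, 62_000, 64_000, 65_000, 66_000, 68_000, 70_000]
ceo_salary = 150_000  # the outlier from the example

mean_without_ceo = mean(staff_salaries)               # 65000
mean_with_ceo = mean(staff_salaries + [ceo_salary])   # 75625

print(f"Mean without CEO: ${mean_without_ceo:,}")
print(f"Mean with CEO:    ${mean_with_ceo:,}")
```

With these numbers, one outlier raises the mean above every staff salary; with more staff, the same outlier would shift the mean less.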
• To calculate the mean of a sample on the calculator:
• STAT → Edit → Enter your data into L1 → 2nd Quit
• STAT → CALC → 1-Var Stats → Enter → Enter
• Use technology to determine the mean of the
following set of numbers:
• 24, 25, 25, 25, 26, 26, 27, 27, 28, 28, 31, 32
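If a calculator is not at hand, Python's standard `statistics` module can stand in as the technology. The function below is a rough analogue of a one-variable summary; its name and the choice of fields are assumptions of this sketch and do not reproduce the calculator's exact output.

```python
from statistics import mean, median

def one_var_summary(values):
    """Return a few figures a one-variable summary typically reports."""
    return {
        "n": len(values),
        "sum": sum(values),
        "mean": mean(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }

data = [24, 25, 25, 25, 26, 26, 27, 27, 28, 28, 31, 32]
summary = one_var_summary(data)

for label, value in summary.items():
    print(f"{label:>6}: {value}")
# The mean is 324 / 12 = 27.
```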
• In Tim’s school, there are 25 teachers. Each teacher
travels to school every morning in his or her own
car. The distribution of the driving times (in minutes)
from home to school for the teachers is shown in the
table below:
Driving Times | Number of Teachers
0 to 10 minutes | 3
10 to 20 minutes | 10
20 to 30 minutes | 6
30 to 40 minutes | 4
40 to 50 minutes | 2
• The following table shows the frequency
distribution of the number of hours spent per week
sending text messages on a cell phone by 60 10th grade
students at a local high school. Calculate the mean
number of hours per week spent texting.
Time per Week (hours) | Number of Students
0 to less than 5 | 8
5 to less than 10 | 11
10 to less than 15 | 15
15 to less than 20 | 12
20 to less than 25 | 9
25 to less than 30 | 5
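When data is grouped into classes, the individual values are not known, so the mean can only be estimated. A common approach, assumed here because the slides do not name a method, is to let each class midpoint stand in for every value in that class. The function name `grouped_mean` and the tuple layout are choices made for this sketch; it is applied to both the driving-time table and the texting table.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data from (lower, upper, frequency) rows.

    Each class is represented by its midpoint, (lower + upper) / 2.
    """
    total_frequency = sum(freq for _, _, freq in classes)
    weighted_sum = sum((lower + upper) / 2 * freq for lower, upper, freq in classes)
    return weighted_sum / total_frequency

# Driving times in minutes for the 25 teachers.
driving_times = [(0, 10, 3), (10, 20, 10), (20, 30, 6), (30, 40, 4), (40, 50, 2)]

# Hours per week spent texting for the 60 students.
texting_hours = [(0, 5, 8), (5, 10, 11), (10, 15, 15),
                 (15, 20, 12), (20, 25, 9), (25, 30, 5)]

driving_mean = grouped_mean(driving_times)   # 545 / 25 = 21.8
texting_mean = grouped_mean(texting_hours)   # 840 / 60 = 14.0

print(f"Estimated mean driving time: {driving_mean} minutes")
print(f"Estimated mean texting time: {texting_mean} hours per week")
```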
• Median: The value of the middle term in a set of
organized data.
• Cumulative Frequency: The sum of the frequencies
up to and including that frequency.
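Cumulative frequency is a running total, which also makes it a convenient way to locate the median in a frequency table. The sketch below reuses the books-read table for the 60 students from earlier in this chapter; the helper `value_at_position` is hypothetical, and the median step as written assumes an even number of data values.

```python
from itertools import accumulate

books = [0, 1, 2, 3, 4, 5, 6, 7, 8]
frequencies = [1, 6, 8, 10, 13, 8, 5, 6, 3]

# Running total of the frequencies up to and including each row.
cumulative = list(accumulate(frequencies))
# [1, 7, 15, 25, 38, 46, 51, 57, 60]

def value_at_position(position):
    """Return the data value sitting at a 1-based position in sorted order."""
    for value, running_total in zip(books, cumulative):
        if position <= running_total:
            return value
    raise ValueError("position is beyond the end of the data")

n = cumulative[-1]  # 60 students, an even count
# For an even count, average the two middle positions (30th and 31st).
median_books = (value_at_position(n // 2) + value_at_position(n // 2 + 1)) / 2

print("Cumulative frequencies:", cumulative)
print("Median number of books:", median_books)  # 4.0
```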
• Find the median of the following set of data:
• 12, 2, 16, 8, 14, 10, 6
• First organize the data from least to greatest.
• Then find the middle number. When there are
two middle numbers, add them together and
divide by 2.
• Find the median of the following data:
• 7, 9, 3, 4, 11, 1, 8, 6, 1, 4
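The two steps (order the data, then take the middle) can be written out as a small function. This is a plain sketch of the procedure described above, applied to both data sets; the name `find_median` is chosen for illustration.

```python
def find_median(values):
    """Order the data, then return the middle value.

    With an even count, the two middle values are averaged.
    """
    ordered = sorted(values)   # least to greatest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

first = [12, 2, 16, 8, 14, 10, 6]
second = [7, 9, 3, 4, 11, 1, 8, 6, 1, 4]

print(sorted(first), "->", find_median(first))    # middle of 7 values: 10
print(sorted(second), "->", find_median(second))  # (4 + 6) / 2 = 5.0
```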
• The amount of money spent by each of 15 high
school girls for a prom dress is shown below. Find
the median price of a prom dress.
• $250 $175 $325 $195 $450
$300 $275 $350 $425 $150
$375 $300 $400 $225 $360
• To calculate the median of a sample on the calculator:
• STAT → Edit → Enter your data into L1 → 2nd Quit
• STAT → CALC → 1-Var Stats → Enter → Enter
• Scroll down to Med, which gives the median of
the data.
• The local police department spent the holiday weekend
ticketing drivers who were speeding. 50 locations within the
state were targeted. The number of tickets issued during the
weekend in each of the locations is shown below. What is the
median number of speeding tickets issued?
• 32 11 3 31 7 17 19 12 10 5
3 25 37 40 15 24 35 7 36 9
8 18 27 37 40 2 16 6 13 10
15 33 42 17 26 19 21 41 9 21
16 23 38 23 18 41 28 33 46 29
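With 50 values, sorting by hand is error-prone, so this is a natural place to use technology. A minimal sketch using `statistics.median` from Python's standard library:

```python
from statistics import median

# Tickets issued at each of the 50 locations, in the order listed.
tickets = [
    32, 11, 3, 31, 7, 17, 19, 12, 10, 5,
    3, 25, 37, 40, 15, 24, 35, 7, 36, 9,
    8, 18, 27, 37, 40, 2, 16, 6, 13, 10,
    15, 33, 42, 17, 26, 19, 21, 41, 9, 21,
    16, 23, 38, 23, 18, 41, 28, 33, 46, 29,
]

ordered = sorted(tickets)
# With 50 values, the median averages the 25th and 26th ordered values.
middle_pair = (ordered[24], ordered[25])   # (19, 21)
median_tickets = median(tickets)           # 20.0

print("Two middle values:", middle_pair)
print("Median number of tickets:", median_tickets)
```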
• Mode: The value or values that occur with the
greatest frequency in a data set.
• Unimodal: The term used to describe the
distribution of a data set that has only one mode.
• Bimodal: The term used to describe the distribution
of a data set that has 2 modes.
• Multimodal: The term used to describe the
distribution of a data set that has more than two
modes.
• The posted speed limit along a busy highway is 65
miles per hour. The following values represent the
speeds (in mph) of 10 cars that were stopped for
violating the speed limit. Find the mode.
• 76 81 79 80 78 83 77 79 82 75
• Is this unimodal, bimodal, or multimodal?
• The ages of 12 randomly selected customers at a
local coffee shop are listed below. What is the
mode of the ages?
• 23 21 29 24 31 21 27 23 24 32 33 19
• Is this unimodal, bimodal, or multimodal?
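Both mode questions can be checked with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest frequency. The helper `describe_modes` and its labels are assumptions of this sketch; note that if every value occurred exactly once, it would report them all as modes, whereas many textbooks would say such data has no mode.

```python
from statistics import multimode

def describe_modes(values):
    """Return the sorted mode(s) and a label based on how many there are."""
    modes = sorted(multimode(values))
    label = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, label

speeds = [76, 81, 79, 80, 78, 83, 77, 79, 82, 75]
customer_ages = [23, 21, 29, 24, 31, 21, 27, 23, 24, 32, 33, 19]

speed_modes, speed_label = describe_modes(speeds)
age_modes, age_label = describe_modes(customer_ages)

print(speed_modes, speed_label)  # [79] unimodal
print(age_modes, age_label)      # [21, 23, 24] multimodal
```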
• Questions?