Chapter 3 Numerically Summarizing Data

3.2 Measures of Dispersion
To order food at a McDonald’s restaurant, one must choose from multiple lines, while at a Wendy’s restaurant, one enters a single line. The following data represent the wait time (in minutes) in line for a simple random sample of 30 customers at each restaurant during the lunch hour. For each sample, answer the following:
(a) What was the mean wait time?
(b) Draw a histogram of each restaurant’s wait time.
(c) Which restaurant’s wait time appears more dispersed? Which line would you prefer to wait in? Why?
Wait Time at Wendy’s (minutes)
1.50  2.53  1.88  3.99  0.90  0.79
1.20  2.94  1.90  1.23  1.01  1.46
1.40  1.00  0.92  1.66  0.89  1.33
1.54  1.09  0.94  0.95  1.20  0.99
1.72  0.67  0.90  0.84  0.35  2.00
Wait Time at McDonald’s (minutes)
3.50  0.00  1.97  0.00  3.08  0.00
0.26  0.71  0.28  2.75  0.38  0.14
2.22  0.44  0.36  0.43  0.60  4.54
1.38  3.10  1.82  2.33  0.80  0.92
2.19  3.04  2.54  0.50  1.17  0.23
The mean wait time in each line is 1.39
minutes.
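The mean in part (a) and the histograms in part (b) can be checked with a short Python sketch. The data lists are copied from the two samples above; the helper name `text_histogram` and the bin width of 0.5 minutes are illustrative choices, not part of the original exercise.

```python
from collections import Counter
from statistics import mean

# Wait times (minutes), copied from the two samples above
wendys = [1.50, 2.53, 1.88, 3.99, 0.90, 0.79, 1.20, 2.94, 1.90, 1.23,
          1.01, 1.46, 1.40, 1.00, 0.92, 1.66, 0.89, 1.33, 1.54, 1.09,
          0.94, 0.95, 1.20, 0.99, 1.72, 0.67, 0.90, 0.84, 0.35, 2.00]
mcdonalds = [3.50, 0.00, 1.97, 0.00, 3.08, 0.00, 0.26, 0.71, 0.28, 2.75,
             0.38, 0.14, 2.22, 0.44, 0.36, 0.43, 0.60, 4.54, 1.38, 3.10,
             1.82, 2.33, 0.80, 0.92, 2.19, 3.04, 2.54, 0.50, 1.17, 0.23]


def text_histogram(values, width=0.5):
    """Hypothetical helper: print a text histogram with bins [0, w), [w, 2w), ..."""
    counts = Counter(int(x // width) for x in values)  # bin index for each value
    for b in range(max(counts) + 1):
        lo = b * width
        print(f"{lo:4.1f}-{lo + width:4.1f} | {'*' * counts[b]}")
    return counts


for name, times in (("Wendy's", wendys), ("McDonald's", mcdonalds)):
    print(f"{name}: mean wait = {mean(times):.2f} minutes")
    text_histogram(times)
```

With 0.5-minute bins, the McDonald’s wait times occupy more bins than the Wendy’s wait times, which is the dispersion that part (c) asks about.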
The range, R, of a variable is the difference between the largest data value and the smallest data value. That is,
Range = R = Largest Data Value – Smallest Data Value
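A minimal Python sketch of the range formula follows. It is applied to the restaurant wait times because the Section 3.1 student data are not reproduced in this section; the function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range R = largest data value - smallest data value."""
    return max(values) - min(values)


# Wait times (minutes) from the Wendy's and McDonald's samples above
wendys = [1.50, 2.53, 1.88, 3.99, 0.90, 0.79, 1.20, 2.94, 1.90, 1.23,
          1.01, 1.46, 1.40, 1.00, 0.92, 1.66, 0.89, 1.33, 1.54, 1.09,
          0.94, 0.95, 1.20, 0.99, 1.72, 0.67, 0.90, 0.84, 0.35, 2.00]
mcdonalds = [3.50, 0.00, 1.97, 0.00, 3.08, 0.00, 0.26, 0.71, 0.28, 2.75,
             0.38, 0.14, 2.22, 0.44, 0.36, 0.43, 0.60, 4.54, 1.38, 3.10,
             1.82, 2.33, 0.80, 0.92, 2.19, 3.04, 2.54, 0.50, 1.17, 0.23]

print(f"Wendy's range:    {data_range(wendys):.2f} minutes")     # 3.99 - 0.35
print(f"McDonald's range: {data_range(mcdonalds):.2f} minutes")  # 4.54 - 0.00
```

Both samples have the same mean, yet the ranges differ (3.64 versus 4.54 minutes), a first hint that the McDonald’s wait times are more spread out.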
EXAMPLE Finding the Range of a Set of Data
Find the range of the student data collected in Section 3.1.
The population variance of a variable is the sum of the squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the arithmetic mean of the squared deviations about the population mean. The population variance is represented by the lowercase Greek letter sigma squared, σ²:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
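The definition translates directly into code. Here is a minimal Python sketch; the function name `population_variance` and the eight data values are hypothetical, chosen only so the arithmetic is easy to follow (μ = 5, the squared deviations sum to 32, so σ² = 32/8 = 4).

```python
import statistics


def population_variance(values):
    """sigma^2: arithmetic mean of the squared deviations about the population mean."""
    n = len(values)          # N, the number of observations in the population
    mu = sum(values) / n     # population mean
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, for illustration only
print(population_variance(data))  # 4.0
print(statistics.pvariance(data))  # standard-library equivalent
```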
Note: When using the above formula, do not round until the last computation. Use as many decimal places as your calculator allows to avoid round-off errors.
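The effect of rounding too early can be seen with the Wendy’s wait times. The sum of squared deviations is smallest about the exact mean, so rounding the mean first (here, hypothetically, to one decimal place, 1.4) always inflates it. This is a sketch for illustration only.

```python
# Wendy's wait times (minutes) from the sample above
wendys = [1.50, 2.53, 1.88, 3.99, 0.90, 0.79, 1.20, 2.94, 1.90, 1.23,
          1.01, 1.46, 1.40, 1.00, 0.92, 1.66, 0.89, 1.33, 1.54, 1.09,
          0.94, 0.95, 1.20, 0.99, 1.72, 0.67, 0.90, 0.84, 0.35, 2.00]

exact_mean = sum(wendys) / len(wendys)  # about 1.3907, kept at full precision
early_mean = round(exact_mean, 1)       # rounded too soon: 1.4

ss_exact = sum((x - exact_mean) ** 2 for x in wendys)
ss_early = sum((x - early_mean) ** 2 for x in wendys)

print(f"sum of squared deviations, unrounded mean: {ss_exact:.4f}")
print(f"sum of squared deviations, mean rounded to {early_mean}: {ss_early:.4f}")
```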
EXAMPLE Computing a Population Variance
Compute the population variance of the population data collected in Section 3.1.
The sample variance, s², is computed by determining the sum of the squared deviations about the sample mean and then dividing this result by n − 1:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
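A minimal Python sketch of the sample variance, using the same hypothetical eight values, now treated as a sample. The only change from the population formula is the divisor n − 1, so the result is 32/7 rather than 4. The function name `sample_variance` is hypothetical; `statistics.variance` is the standard-library equivalent.

```python
import statistics


def sample_variance(values):
    """s^2: sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n  # sample mean
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, for illustration only
print(sample_variance(data))      # 32 / 7
print(statistics.variance(data))  # standard-library equivalent
```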
Note: A statistic that consistently overestimates or underestimates a parameter is called biased. To obtain an unbiased estimate of the population variance, we divide the sum of the squared deviations about the sample mean by n − 1.
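The bias described in this note can be seen in a small simulation. This sketch assumes, purely for illustration, a normal population with variance 1 and samples of size n = 5. Dividing by n should average out near (n − 1)/n = 0.8 of the true variance, while dividing by n − 1 should average out near 1.

```python
import random
from statistics import fmean

random.seed(1)       # fixed seed so the run is repeatable
n = 5                # assumed sample size
trials = 20_000      # number of simulated samples

biased, unbiased = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]  # population variance is 1
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    biased.append(ss / n)          # divides by n: tends to underestimate
    unbiased.append(ss / (n - 1))  # divides by n - 1: unbiased

print(f"average of SS/n:     {fmean(biased):.3f}")
print(f"average of SS/(n-1): {fmean(unbiased):.3f}")
```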
EXAMPLE Computing a Sample Variance
Compute the sample variance using the sample data from Section 3.1.
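The population standard deviation is denoted by σ. It is obtained by taking the square root of the population variance, so that
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
Likewise, the sample standard deviation, s, is the square root of the sample variance:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$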
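In code, each standard deviation is just the square root of the matching variance. This sketch reuses hypothetical illustration data and checks the square roots against the standard-library functions `statistics.pstdev` (divide by N) and `statistics.stdev` (divide by n − 1).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, for illustration only

sigma = math.sqrt(statistics.pvariance(data))  # population SD: square root of sigma^2
s = math.sqrt(statistics.variance(data))       # sample SD: square root of s^2

print(sigma, statistics.pstdev(data))  # both equal 2.0
print(s, statistics.stdev(data))
```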
EXAMPLE Computing a Population Standard Deviation and Sample Standard Deviation
Compute the population and sample standard deviations for the data obtained in Section 3.1.
EXAMPLE Comparing Standard Deviations
Determine the standard deviation of the waiting time for Wendy’s and for McDonald’s. Which is larger? Why?
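Sample standard deviation for Wendy’s: 0.738 minutes
Sample standard deviation for McDonald’s: 1.265 minutes
The McDonald’s standard deviation is larger because its wait times are more dispersed: they run from 0.00 to 4.54 minutes, compared with 0.35 to 3.99 minutes at Wendy’s.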
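The two sample standard deviations can be reproduced with the standard-library function `statistics.stdev`, which divides by n − 1. The data lists are copied from the samples above.

```python
from statistics import stdev

# Wait times (minutes) from the Wendy's and McDonald's samples above
wendys = [1.50, 2.53, 1.88, 3.99, 0.90, 0.79, 1.20, 2.94, 1.90, 1.23,
          1.01, 1.46, 1.40, 1.00, 0.92, 1.66, 0.89, 1.33, 1.54, 1.09,
          0.94, 0.95, 1.20, 0.99, 1.72, 0.67, 0.90, 0.84, 0.35, 2.00]
mcdonalds = [3.50, 0.00, 1.97, 0.00, 3.08, 0.00, 0.26, 0.71, 0.28, 2.75,
             0.38, 0.14, 2.22, 0.44, 0.36, 0.43, 0.60, 4.54, 1.38, 3.10,
             1.82, 2.33, 0.80, 0.92, 2.19, 3.04, 2.54, 0.50, 1.17, 0.23]

print(f"Wendy's:    s = {stdev(wendys):.3f} minutes")     # 0.738
print(f"McDonald's: s = {stdev(mcdonalds):.3f} minutes")  # 1.265
```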
EXAMPLE Using the Empirical Rule
The following data represent the serum HDL
cholesterol of the 54 female patients of a family
doctor.
41  62  67  60  54  45  48  75  69
60  54  47  43  77  69  60  55  47
38  58  70  61  56  48  35  82  65
62  56  48  37  39  72  63  56  50
44  85  74  64  57  52  44  55  74
64  58  52  44  54  74  64  59  53
(a) Compute the population mean and standard
deviation.
(b) Draw a histogram to verify that the data are bell-shaped.
(c) Determine the percentage of patients that have
serum HDL within 3 standard deviations of the
mean according to the Empirical Rule.
(d) Determine the percentage of patients that have
serum HDL between 33.8 and 81 according to the
Empirical Rule.
(e) Determine the actual percentage of patients
that have serum HDL between 33.8 and 81.
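(a) Using a TI-83 Plus graphing calculator, we find μ ≈ 57.4 and σ ≈ 11.7. (The calculator’s sample standard deviation, Sx, is about 11.8; that is the value used in parts (c) and (d) below, and using 11.7 instead shifts each limit by at most 0.3.)
(b) [Figure: histogram of serum HDL cholesterol for the 54 patients]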
(c) According to the Empirical Rule, approximately 99.7% of the patients will have serum HDL cholesterol levels within 3 standard deviations of the mean. That is, approximately 99.7% of the patients will have serum HDL cholesterol levels greater than or equal to 57.4 − 3(11.8) = 22.0 and less than or equal to 57.4 + 3(11.8) = 92.8.
(d) Because 33.8 is 2 standard deviations below the
mean (57.4 - 2(11.8) = 33.8) and 81 is 2 standard
deviations above the mean (57.4 + 2(11.8) = 81), the
Empirical Rule states that approximately 95% of the
data will lie between 33.8 and 81.
(e) There are no observations below 33.8. There
are 2 observations greater than 81. Therefore,
52/54 = 96.3% of the data lie between 33.8 and 81.
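Parts (a) and (e) can be checked in Python with the standard `statistics` module. The data list is copied from the table above. The sketch prints both standard deviations: dividing by N gives about 11.7, while dividing by n − 1 gives about 11.8.

```python
from statistics import fmean, pstdev, stdev

# Serum HDL cholesterol of the 54 female patients, copied from the table above
hdl = [41, 62, 67, 60, 54, 45, 48, 75, 69,
       60, 54, 47, 43, 77, 69, 60, 55, 47,
       38, 58, 70, 61, 56, 48, 35, 82, 65,
       62, 56, 48, 37, 39, 72, 63, 56, 50,
       44, 85, 74, 64, 57, 52, 44, 55, 74,
       64, 58, 52, 44, 54, 74, 64, 59, 53]

mu = fmean(hdl)
sigma = pstdev(hdl)  # population standard deviation (divide by N)
s = stdev(hdl)       # sample standard deviation (divide by n - 1), for comparison
print(f"mu = {mu:.1f}, sigma = {sigma:.2f}, s = {s:.2f}")

# Part (e): count the observations between 33.8 and 81, inclusive
lower, upper = 33.8, 81
inside = sum(lower <= x <= upper for x in hdl)
print(f"{inside}/{len(hdl)} = {inside / len(hdl):.1%} lie between {lower} and {upper}")
```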
EXAMPLE Using Chebyshev’s Theorem
Using the data from the previous example, use
Chebyshev’s Theorem to
(a) determine the percentage of patients that have
serum HDL within 3 standard deviations of the
mean.
(b) determine the percentage of patients that have
serum HDL between 33.8 and 81.
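A minimal sketch, assuming the usual statement of Chebyshev’s Theorem: for any k > 1, at least 1 − 1/k² of the data lie within k standard deviations of the mean. For part (b), 33.8 and 81 are 2 standard deviations from the mean, as computed in part (d) of the previous example. The function name `chebyshev_min_fraction` is hypothetical.

```python
def chebyshev_min_fraction(k):
    """Lower bound from Chebyshev's Theorem: at least 1 - 1/k**2 of the data
    lie within k standard deviations of the mean, for any k > 1."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


# (a) k = 3; (b) 33.8 and 81 lie k = 2 standard deviations from the mean
for k in (3, 2):
    print(f"k = {k}: at least {chebyshev_min_fraction(k):.1%} of patients")
```

Unlike the Empirical Rule, these bounds (at least 88.9% and at least 75%) hold for any distribution shape, which is why they are more conservative than 99.7% and 95%.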