Chapter 3 Slides

Chapter 3
Statistics for Describing,
Exploring, and Comparing Data
3-1 Review and Preview
3-2 Measures of Center
3-3 Measures of Variation
3-4 Measures of Relative Standing and
Boxplots
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
Preview
 Descriptive Statistics
In this chapter we’ll learn to summarize or
describe the important characteristics of
a known set of data
 Inferential Statistics
In later chapters we’ll learn to use sample
data to make inferences or
generalizations about a population
Section 3-2
Measures of Center
Measure of Center
 Measure of Center
the value at the center or middle of a
data set
 The three common measures of
center are the mean, the median, and
the mode.
 Mean
the measure of center obtained by
adding the data values and then
dividing the total by the number of
values
What most people call an average
also called the arithmetic mean.
Notation
$\Sigma$ (Greek capital letter sigma) denotes the sum of a set of values.
$x$ is the variable usually used to represent the data values.
$n$ represents the number of data values in a sample.
Example of summation
• If there are n data values denoted as $x_1, x_2, \ldots, x_n$, then:
$\sum x = x_1 + x_2 + \cdots + x_n$
Example of summation
• Data: 21, 25, 32, 48, 53, 62, 62, 64
Then:
$\sum x = 21 + 25 + 32 + 48 + 53 + 62 + 62 + 64 = 367$
Sample Mean
$\bar{x}$ is pronounced ‘x-bar’ and denotes the
mean of a set of sample values:
$\bar{x} = \dfrac{\sum x}{n}$
Example of Sample Mean
• Data: 21, 25, 32, 48, 53, 62, 62, 64
Then:
$\bar{x} = \dfrac{21 + 25 + 32 + 48 + 53 + 62 + 62 + 64}{8} = \dfrac{367}{8} = 45.875 \approx 45.9$
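The sample mean computation above can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original slides; the variable names (`data`, `total`, `x_bar`) are chosen here purely for illustration.

```python
# Sample mean of the slide's data: add the values, then divide by the count.
data = [21, 25, 32, 48, 53, 62, 62, 64]

total = sum(data)    # sum of all data values (sigma x)
n = len(data)        # number of data values in the sample
x_bar = total / n    # sample mean (x-bar)

print(f"sum = {total}, n = {n}, mean = {x_bar}")
# Round-off rule: carry one more decimal place than the original data.
print(f"rounded mean = {x_bar:.1f}")
```

Replacing the last value 64 with 300 in `data` is a quick way to see how strongly a single extreme value moves the mean.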
Notation
µ Greek letter mu used to denote the
population mean
N represents the number of data values in a
population.
Population Mean
$\mu = \dfrac{\sum x}{N}$
Note: here x represents the data values in the
population
Mean
 Advantages
 Is relatively reliable: means of samples
drawn from the same population don’t vary
as much as other measures of center
 Takes every data value into account
Mean
 Disadvantage
Is sensitive to every data value; one
extreme value can affect it dramatically,
so it is not a resistant measure of center
Example:
21, 25, 32, 48, 53, 62, 62, 64 → $\bar{x} \approx 45.9$
21, 25, 32, 48, 53, 62, 62, 300 → $\bar{x} \approx 75.4$
Median
 Median
the measure of center which is the middle
value when the original data values are
arranged in order of increasing (or
decreasing) magnitude
 often denoted by x~
(pronounced ‘x-tilde’)
 is not affected by an extreme value - is a
resistant measure of the center
Finding the Median
First sort the values (arrange them in
order), then follow one of these two rules:
1. If the number of data values is odd,
the median is the number located in
the exact middle of the list.
2. If the number of data values is even,
the median is found by computing the
mean of the two middle numbers.
Example of Median
• 6 data values: 5.40, 1.10, 0.42, 0.73, 0.48, 1.10
• Sorted data: 0.42, 0.48, 0.73, 1.10, 1.10, 5.40
(even number of values – no exact middle)
median $= \dfrac{0.73 + 1.10}{2} = 0.915$
Example of Median
• 7 data values: 5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66
• Sorted data: 0.42, 0.48, 0.66, 0.73, 1.10, 1.10, 5.40
(odd number of values – the exact middle is the 4th value)
median $= 0.73$
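The two median rules (odd count: take the middle value; even count: average the two middle values) can be sketched in Python as follows. The function name `median` and the list names are illustrative choices, not something taken from the slides.

```python
def median(values):
    """Return the median: middle value of the sorted data, or the
    mean of the two middle values when the count is even."""
    ordered = sorted(values)          # step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: exact middle exists
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

even_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]        # 6 values
odd_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]   # 7 values

print(median(even_case))
print(median(odd_case))
```

Python's standard library offers `statistics.median`, which applies the same two rules; the hand-written version is shown only to make the steps explicit.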
 Mode
the value that occurs with the greatest
frequency
 Data set can have one, more than one,
or no mode
Mode
Bimodal: two data values occur with the same greatest frequency
Multimodal: more than two data values occur with the same greatest frequency
No Mode: no data value is repeated
Mode is the only measure of central
tendency that can be used with nominal data
Mode - Examples
a) 5.40 1.10 0.42 0.73 0.48 1.10
Mode is 1.10
b) 27 27 27 55 55 55 88 88 99
Bimodal: 27 and 55
c) 1 2 3 6 7 8 9 10
No Mode
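The three mode cases in these examples (one mode, bimodal, no mode) can be checked with a short Python sketch. The helper name `modes` is hypothetical, and returning an empty list for "no mode" is a convention assumed here for illustration.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the greatest frequency.
    An empty list means no value is repeated (no mode)."""
    if not values:
        return []
    counts = Counter(values)             # frequency of each distinct value
    top = max(counts.values())           # greatest frequency
    if top == 1:                         # nothing repeats: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))      # example a)
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))      # example b)
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))                 # example c)
```

Because the function only counts occurrences, it also works on nominal data such as a list of color names.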
Definition
 Midrange
the value midway between the
maximum and minimum data values in
the original data set (average of
highest and lowest data values)
midrange $= \dfrac{\text{maximum data value} + \text{minimum data value}}{2}$
Example of Midrange
• Data values: 5.41, 1.13, 0.42, 0.49, 0.65, 1.86, 1.69
midrange $= \dfrac{5.41 + 0.42}{2} = 2.915$
Midrange
 Sensitive to extremes
because it uses only the maximum
and minimum values, so rarely used
 Redeeming Features
(1) very easy to compute
(2) reinforces that there are several
ways to define the center
(3) avoids confusion with median
Best Measure of Center
Round-off Rule for
Measures of Center
Carry one more decimal place than is
present in the original set of values.
Example of Round-off Rule
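• Data values: 5.41, 1.13, 0.42, 0.49, 0.65, 1.86, 1.69
$\bar{x} = \dfrac{5.41 + 1.13 + 0.42 + 0.49 + 0.65 + 1.86 + 1.69}{7} = \dfrac{11.65}{7} = 1.6642857\ldots$
The original values have two decimal places, so carry three: $\bar{x} \approx 1.664$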
Example problem 12, page 95
• These data values represent
weight gain or loss in kg for a
simple random sample (SRS)
of 18 college freshmen (negative
data values indicate weight loss)
11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2
• Do these values support the
legend that college students gain
15 pounds (6.8 kg) during their
freshman year? Explain
Example problem 12, page 95
• Sample Mean
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{34}{18} \approx 1.9$ kg
• Median
Sorted data: -5 -2 -2 -2 -2 0 0 1 1 2 2 3 3 4 5 7 8 11
median $= \dfrac{1 + 2}{2} = 1.5$ kg
Example problem 12, page 95
• Mode is -2
• Midrange
midrange $= \dfrac{-5 + 11}{2} = 3.0$ kg
Example problem 12, page 95
• All of the measures of center are
below 6.8 kg (15 pounds)
• Based on measures of center,
these data values do not support
the idea that college students
gain 15 pounds (6.8 kg) during
their freshman year
Critical Thinking
Think about whether the results
are reasonable.
See example 6, page 89
Think about the method used to
collect the sample data.
See example 7, page 89
Mean/Median with Graphing
Calculator
• First, enter the list of data values
• Then select 2nd STAT (LIST) and
arrow right to MATH, option 3:mean( or
4:median(
• and input the desired list
Example of Computing the
Mean Using Calculator
Sorted amounts of Strontium-90
(in millibecquerels) in a simple
random sample of baby teeth
obtained from Philadelphia
residents born after 1979
Note: these data are related to the Three
Mile Island nuclear power plant
accident in 1979.
$\bar{x} = \dfrac{\sum x}{n} = 149.2$
Example of Computing the
Mean Using Calculator
Median is
150
Part 2
Beyond the Basics of
Measures of Center
Mean from a Frequency
Distribution
Assume that all sample values in
each class are equal to the class
midpoint.
Mean from a Frequency
Distribution
$\bar{x} = \dfrac{\sum (f \cdot x)}{\sum f}$, using the class midpoints for the variable x and the class frequencies for f
Example of Computing the
Mean from a Frequency
Distribution
Heights (inches) of 25 Women
67 64 65 65 64 59 67 67 72 65 64 62 66
67 66 60 70 68 61 64 60 68 65 66 62
Example (cont.)
Frequency distribution of heights of women

HEIGHT   FREQUENCY
59-60    3
61-62    3
63-64    4
65-66    7
67-68    6
69-70    1
71-72    1

Class midpoints are:
59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5
Example (cont.)
Multiply each class midpoint by its class frequency, add the products, and divide by the total frequency:
$\bar{x} = \dfrac{3(59.5) + 3(61.5) + 4(63.5) + 7(65.5) + 6(67.5) + 1(69.5) + 1(71.5)}{25} = \dfrac{1621.5}{25} = 64.86$
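The frequency-distribution calculation can be sketched in Python by treating every value in a class as equal to the class midpoint. The tuple layout `(lower limit, upper limit, frequency)` and the variable names are assumptions made for this illustration.

```python
# Each class: (lower limit, upper limit, frequency), from the heights table.
classes = [
    (59, 60, 3), (61, 62, 3), (63, 64, 4), (65, 66, 7),
    (67, 68, 6), (69, 70, 1), (71, 72, 1),
]

# Sum of f * x, where x is the class midpoint (lower + upper) / 2.
weighted_sum = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
total_f = sum(f for _, _, f in classes)      # sum of the frequencies
mean_from_table = weighted_sum / total_f     # approximate sample mean

print(weighted_sum, total_f, round(mean_from_table, 2))
```

The result is an approximation of the mean of the raw data, because the individual heights inside each class are replaced by the midpoint.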
Weighted Mean
When data values are assigned
different weights, we can compute a
weighted mean.
$\bar{x} = \dfrac{\sum (w \cdot x)}{\sum w}$
Example: Weighted Mean
In this class, the homework/quiz
average is weighted 10%, the two exams
are weighted 30% each (60% in total), and
the final exam is weighted 30%.
Suppose a student earns a
homework/quiz average of 87, exam
scores of 80 and 92, and a final exam
score of 85. Compute the weighted
average.
Example: Weighted Mean
Weighted average is:
$\dfrac{0.10(87) + 0.30(80) + 0.30(92) + 0.30(85)}{1.00} = 85.8$
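The weighted mean can be sketched as a small Python function. The name `weighted_mean` and its argument order are illustrative assumptions; the scores and weights are the ones used in the example above.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (weight * value) divided by the sum of weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [87, 80, 92, 85]             # homework/quiz, exam 1, exam 2, final
weights = [0.10, 0.30, 0.30, 0.30]    # course weights, summing to 1.00

print(round(weighted_mean(scores, weights), 1))
```

Dividing by the sum of the weights means the weights do not have to add up to 1; raw point values such as 10, 30, 30, 30 would give the same result.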
Skewed and Symmetric
 Symmetric
distribution of data is symmetric if the
left half of its histogram is roughly a
mirror image of its right half
 Skewed
distribution of data is skewed if it is not
symmetric and extends more to one
side than the other
Skewed Left or Right
 Skewed to the left
(also called negatively skewed) have a
longer left tail, mean and median are to
the left of the mode
 Skewed to the right
(also called positively skewed) have a
longer right tail, mean and median are
to the right of the mode
Shape of the Distribution
The mean and median cannot
always be used to identify the
shape of the distribution.
Shape of the Distribution
• Consider the data:
5 5 5 5 5 10 10 10 10 10 10 15 15 15 15 15
20 20 20 20 25 25 25 30 30 30 35 35 40 45
• Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{560}{30} \approx 18.7$
• Median: $\tilde{x} = \dfrac{15 + 15}{2} = 15$
• The distribution is skewed and the mean is “pulled”
in the direction of the outliers 40 and 45
Distribution Skewed Left
• Outliers that are much less than the median
Symmetric Distribution
• No outliers
Distribution Skewed Right
• Outliers that are much more than the median
Recap
In this section we have
discussed:
 Types of measures of center
Mean
Median
Mode
 Mean from a frequency distribution
 Weighted means
 Best measures of center
 Skewness
Section 3-3
Measures of Variation
Key Concept
Discuss characteristics of variation, in
particular, measures of variation, such as
standard deviation, for analyzing data.
Make understanding and interpreting the
standard deviation a priority.
Part 1
Basic Concepts of
Measures of Variation
Why is it important to
understand variation?
• A measure of the center by itself
can be misleading
• Example:
Two nations with the same
median family income are very
different if one has extremes of
wealth and poverty and the other
has little variation among families
(see the following table).
Example of variation
          Data Set A   Data Set B
          50,000       10,000
          60,000       20,000
          70,000       70,000
          80,000       120,000
          90,000       130,000
MEAN      70,000       70,000
MEDIAN    70,000       70,000
Data set B has more variation
about the mean
Histograms: example of variation
Data set B has more variation
about the mean (Target).
How do we quantify variation?
Definition
The range of a set of data values is
the difference between the
maximum data value and the
minimum data value.
Range = (maximum value) – (minimum value)
Example of range.
Data:
27 28 25 6 27 30 26
Range = 30 - 6 = 24
Range (cont.)
Ignoring the outlier of 6 in the previous data set
gives data
27 28 25 27 30 26
Range = 30 - 25 = 5
This shows that the range is very sensitive to
extreme values; therefore not as useful as other
measures of variation.
Definition
The standard deviation of a set of
sample values, denoted by s, is a
measure of variation of values about
the mean.
Sample Standard
Deviation Formula
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Steps to calculate the sample
standard deviation
1. Calculate the sample mean $\bar{x}$
2. Find the squared deviation from
the sample mean for each sample
data value: $(x - \bar{x})^2$
3. Add the squared deviations
4. Divide the sum in step 3 by n-1
5. Take the square root of the
quotient in step 4
Example: Standard Deviation
Given the data set:
8, 5, 12, 8, 9, 15, 21, 16, 3
Find the standard deviation
Example: Standard Deviation
• Find the mean
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{8 + 5 + 12 + 8 + 9 + 15 + 21 + 16 + 3}{9} = \dfrac{97}{9} \approx 10.78$
Data Value   Squared Deviation From the Mean
8            $(8 - 10.78)^2 = 7.73$
5            $(5 - 10.78)^2 = 33.41$
12           $(12 - 10.78)^2 = 1.49$
8            $(8 - 10.78)^2 = 7.73$
9            $(9 - 10.78)^2 = 3.17$
15           $(15 - 10.78)^2 = 17.81$
21           $(21 - 10.78)^2 = 104.45$
16           $(16 - 10.78)^2 = 27.25$
3            $(3 - 10.78)^2 = 60.53$
Example: Standard Deviation
• Add the squared deviations
(last column in the table above)
$7.73 + 33.41 + 1.49 + 7.73 + 3.17 + 17.81 + 104.45 + 27.25 + 60.53 = 263.57$
Example: Standard Deviation
• Divide the sum by 9 – 1 = 8:
$263.57 / 8 \approx 32.95$
• Take the square root:
$\sqrt{32.95} \approx 5.74$
$s \approx 5.7$
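The five steps for the sample standard deviation can be followed one by one in Python. This is a minimal sketch under the assumption that the data fit in a plain list; the function name `sample_std` is hypothetical.

```python
import math

def sample_std(values):
    """Sample standard deviation, following the five steps on the slide."""
    n = len(values)
    x_bar = sum(values) / n                            # step 1: sample mean
    squared_devs = [(x - x_bar) ** 2 for x in values]  # step 2: squared deviations
    total = sum(squared_devs)                          # step 3: add them
    variance = total / (n - 1)                         # step 4: divide by n - 1
    return math.sqrt(variance)                         # step 5: square root

data = [8, 5, 12, 8, 9, 15, 21, 16, 3]
s = sample_std(data)
print(round(s, 2), round(s, 1))
```

The code keeps full precision until the end, in line with the advice to round only the final answer. The standard library function `statistics.stdev` computes the same quantity.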
Round-Off Rule for
Measures of Variation
When rounding the value of a
measure of variation, carry one more
decimal place than is present in the
original set of data.
Round only the final answer, not values in
the middle of a calculation.
Sample Standard Deviation
(Shortcut Formula)
$s = \sqrt{\dfrac{n \sum (x^2) - (\sum x)^2}{n(n - 1)}}$
Example: Standard Deviation
• Page 110, problem 10
Data:
2 1 1 1 1 1 1 4 1 2 2 1 2 3
3 2 3 1 3 1 3 1 3 2 2
The range is 4 - 1 = 3
Determine the standard deviation
using the previous formula
Example: Standard Deviation
• We need to find each of the following:
$n$, $\sum x$, and $\sum (x^2)$
Data Table
(25 data values)
TOTALS: $\sum x = 47$, $\sum (x^2) = 109$
Example: Standard Deviation
• Thus:
$n = 25$
$\sum x = 47$
$\sum (x^2) = 109$
Example: Standard Deviation
• And:
$s = \sqrt{\dfrac{n \sum (x^2) - (\sum x)^2}{n(n - 1)}} = \sqrt{\dfrac{25(109) - (47)^2}{25(24)}} = \sqrt{\dfrac{516}{600}} = \sqrt{0.86} \approx 0.9$
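The shortcut formula needs only three totals: the count, the sum of the values, and the sum of the squared values. A hedged Python sketch (the function name `sample_std_shortcut` is an illustrative choice) applied to the 25 data values of this example:

```python
import math

def sample_std_shortcut(values):
    """Sample standard deviation via the shortcut formula
    sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(values)
    sum_x = sum(values)                    # sigma x
    sum_x2 = sum(x * x for x in values)    # sigma (x^2)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

data = [2, 1, 1, 1, 1, 1, 1, 4, 1, 2, 2, 1, 2, 3,
        3, 2, 3, 1, 3, 1, 3, 1, 3, 2, 2]

print(len(data), sum(data), sum(x * x for x in data))
print(round(sample_std_shortcut(data), 1))
```

The shortcut is algebraically equivalent to the definition based on squared deviations from the mean, so both routes give the same value up to rounding.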
Example: Standard Deviation
• Page 110, problem 10
Do the results make sense?
NO: although we can compute the
standard deviation, the result is not
meaningful because the data are actually
categorical (the numbers are codes describing
phenotype) and do not represent counts or
measurements. Note that if we renumbered the
categories, the standard deviation would
change even though the underlying data would not.
Standard Deviation Important Properties
 The standard deviation is a measure of
variation of all values from the mean.
 The value of the standard deviation s is
never negative; it is zero only when all of the data values are equal.
 The value of the standard deviation s can
increase dramatically with the inclusion of
one or more outliers (data values far away
from all others).
 The units of the standard deviation s are the
same as the units of the original data values.
Calculator Example
• The data from problem 20 on page 111
are the strontium-90 data from previous class
examples
Calculator Example
Amounts of Strontium-90
(in millibecquerels) in a simple
random sample of baby teeth
obtained from Philadelphia
residents born after 1979
Note: these data are related to the Three
Mile Island nuclear power plant
accident in 1979.
DIRECTIONS:
Find standard deviation
Calculator Example
•To compute standard deviation:
1) Enter the list of data values
2) Select 2nd STAT (LIST) and
arrow right to MATH option 7:stdDev(
3) Input the data by choosing the desired
list
Previous example:
s = 15.0 millibecquerels
Calculator Example
Press STAT. Arrow to the right to
CALC. Now choose option #1: 1-Var
Stats. When 1-Var Stats appears on
the home screen, tell the calculator
the name of the list containing the
data
Calculator Example
• Strontium-90 data, 1-Var Stats
Displays:
$\bar{x} = 149.15$ (mean)
$\sum x = 5966$
$\sum x^2 = 898586$
$S_x = 14.98$ (sample standard deviation)
ETC.
Population Standard
Deviation
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
This formula is similar to the previous
formula, but instead, the population mean
and population size are used.
Variance
• The variance of a set of values is a
measure of variation equal to the
square of the standard deviation.
• Sample variance: $s^2$, the square of the
sample standard deviation $s$
• Population variance: $\sigma^2$, the square of
the population standard deviation $\sigma$
Unbiased Estimator
The sample variance $s^2$ is an
unbiased estimator of the population
variance $\sigma^2$, which means values of
$s^2$ tend to target the value of $\sigma^2$
instead of systematically tending to
overestimate or underestimate $\sigma^2$.
Variance
 Unlike standard deviation, the units
of variance do not match the units of
the original data set; they are the
square of the units in the original
data set.
Variance - Notation
$s$ = sample standard deviation
$s^2$ = sample variance
$\sigma$ = population standard deviation
$\sigma^2$ = population variance
Part 2
Beyond the Basics of
Measures of Variation
Range Rule of Thumb
for many data sets, the vast majority
(such as 95%) of sample values lie
within two standard deviations of the
mean.
Range Rule of Thumb
•usual values in a data set are those
that are “typical” and not too extreme
•rough estimates of the minimum and
maximum “usual” sample values can
be computed using the “range rule of
thumb”
Range Rule of Thumb
minimum “usual” value
= (mean) – 2 × (standard deviation)
$= \bar{x} - 2s$
Range Rule of Thumb
maximum “usual” value
= (mean) + 2 × (standard deviation)
$= \bar{x} + 2s$
Range Rule of Thumb
• Usual values fall between the
minimum and maximum usual
values:
$\bar{x} - 2s \le x \le \bar{x} + 2s$
• Otherwise the value is unusual
Example: Standard Deviation
• Page 110, problem 12
Data (kg):
11 3 0 -2 3 -2 -2 5 -2
7 2 4 1 8 1 0 -5 2
Example (cont.)
• Thus:
$n = 18$
$\sum x = 34$
$\sum (x^2) = 344$
Example (cont.)
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{34}{18} \approx 1.9$ kg
$s = \sqrt{\dfrac{n \sum (x^2) - (\sum x)^2}{n(n - 1)}} = \sqrt{\dfrac{18(344) - (34)^2}{18(17)}} = \sqrt{\dfrac{5036}{306}} \approx 4.1$ kg
Example (cont.)
• Minimum usual value
$\bar{x} - 2s = 1.9 - 2(4.1) = -6.3$ kg
• Maximum usual value
$\bar{x} + 2s = 1.9 + 2(4.1) = 10.1$ kg
Example (cont.)
• Because 6.8 kg falls between
the minimum usual value and
the maximum usual value, it is
not considered “unusual”
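The range rule of thumb for usual values can be sketched in Python using the rounded summary statistics from this example (mean 1.9 kg, standard deviation 4.1 kg). The function names `usual_range` and `is_unusual` are hypothetical.

```python
def usual_range(mean, s):
    """Return (minimum usual value, maximum usual value) = mean -/+ 2s."""
    return mean - 2 * s, mean + 2 * s

def is_unusual(x, mean, s):
    """A value is unusual if it lies outside mean - 2s .. mean + 2s."""
    low, high = usual_range(mean, s)
    return x < low or x > high

low, high = usual_range(1.9, 4.1)          # weight-change data, in kg
print(round(low, 1), round(high, 1))
print(is_unusual(6.8, 1.9, 4.1))           # the legendary 15 pounds (6.8 kg)
```

The check is only a rough screen: it says 6.8 kg is not an unusual individual gain, which is a different question from whether the typical gain is 6.8 kg.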
Range Rule of Thumb for
Estimating Standard Deviation s
To roughly estimate the standard
deviation from a collection of known
sample data use
$s \approx \dfrac{\text{range}}{4}$
where
range = (maximum value) – (minimum value)
Example: Standard Deviation
• Page 110, problem 12
Data (kg):
11 3 0 -2 3 -2 -2 5 -2
7 2 4 1 8 1 0 -5 2
Range rule of thumb:
$s \approx \dfrac{11 - (-5)}{4} = \dfrac{16}{4} = 4$ kg
Properties of the
Standard Deviation
• Measures the variation among data
values
• Values close together have a small
standard deviation, but values with
much more variation have a larger
standard deviation
• Has the same units of measurement
as the original data
Properties of the
Standard Deviation
• For many data sets, a value is unusual
if it differs from the mean by more
than two standard deviations
• Compare standard deviations of two
different data sets only if they use
the same scale and units, and they
have means that are approximately
the same
Empirical (or 68-95-99.7) Rule
For data sets having a distribution that is
approximately bell shaped, the following
properties apply:
 About 68% of all values fall within 1
standard deviation of the mean.
 About 95% of all values fall within 2
standard deviations of the mean.
 About 99.7% of all values fall within 3
standard deviations of the mean.
The Empirical Rule
Example: The Empirical Rule
Page 113, problem 34
One standard deviation from the
mean is between:
$\bar{x} - s = 125.0 - 0.3 = 124.7$ volts
$\bar{x} + s = 125.0 + 0.3 = 125.3$ volts
About 68% of the data are between 124.7
volts and 125.3 volts
Example: The Empirical Rule
Page 113, problem 34
Two standard deviations from the
mean is between:
$\bar{x} - 2s = 125.0 - 2(0.3) = 124.4$ volts
$\bar{x} + 2s = 125.0 + 2(0.3) = 125.6$ volts
About 95% of the data are between 124.4
volts and 125.6 volts
Example: The Empirical Rule
Page 113, problem 34
Three standard deviations from the
mean is between:
$\bar{x} - 3s = 125.0 - 3(0.3) = 124.1$ volts
$\bar{x} + 3s = 125.0 + 3(0.3) = 125.9$ volts
About 99.7% of the data are between 124.1
volts and 125.9 volts
Example: The Empirical Rule
Page 113, problem 34
a) About 95% of the data are between 124.4
volts and 125.6 volts
b) About 99.7% of the data are between
124.1 volts and 125.9 volts
Chebyshev’s Theorem
The proportion (or fraction) of any set of
data lying within K standard deviations
of the mean is always at least $1 - 1/K^2$,
where K is any positive number greater
than 1.
• For K = 2, at least 3/4 (or 75%) of all
values lie within 2 standard
deviations of the mean.
• For K = 3, at least 8/9 (or 89%) of all
values lie within 3 standard
deviations of the mean.
Example: Chebyshev’s Theorem
Page 113, problem 36
By Chebyshev’s Theorem, at least
$1 - 1/3^2 = 1 - 1/9 = 8/9 \approx 0.89 = 89\%$
of the data must lie
within three standard deviations of
the mean
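Chebyshev's bound is simple enough to tabulate for several values of K. The sketch below uses exact fractions so that 3/4 and 8/9 appear as such; the function name `chebyshev_lower_bound` is an illustrative choice.

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Minimum proportion of any data set lying within k standard
    deviations of the mean: 1 - 1/k^2, valid for k > 1."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    return 1 - 1 / Fraction(k) ** 2

for k in (2, 3, 4):
    bound = chebyshev_lower_bound(k)
    print(k, bound, f"{float(bound):.0%}")
```

Unlike the empirical rule, this bound makes no assumption about the shape of the distribution, which is why it guarantees only "at least 75%" within 2 standard deviations rather than "about 95%".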
Example: Chebyshev’s Theorem
Page 113, problem 36
The minimum voltage amount within
3 standard deviations of the mean is
$\bar{x} - 3s = 125.0 - 3(0.3) = 124.1$ volts
and the maximum amount is
$\bar{x} + 3s = 125.0 + 3(0.3) = 125.9$ volts
Comparing Variation in
Different Samples
It’s good practice to compare two
sample standard deviations directly only
when the sample means are
approximately the same.
In general, it is better to use the
coefficient of variation when comparing
any two standard deviations.
Coefficient of Variation
The coefficient of variation (or CV) for a set
of nonnegative sample or population data,
expressed as a percent, describes the
standard deviation relative to the mean.
Sample: CV $= \dfrac{s}{\bar{x}} \cdot 100\%$
Population: CV $= \dfrac{\sigma}{\mu} \cdot 100\%$
Example: Comparing Variation
Page 112, problem 24
Waiting times for Jefferson Valley:
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{71.5}{10} = 7.15$ min
$s = \sqrt{\dfrac{n \sum (x^2) - (\sum x)^2}{n(n - 1)}} \approx 0.48$ min
CV $= s / \bar{x} = 0.48 / 7.15 \approx 0.067 = 6.7\%$
Example: Comparing Variation
Page 112, problem 24
Waiting times for Providence:
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{71.5}{10} = 7.15$ min
$s = \sqrt{\dfrac{n \sum (x^2) - (\sum x)^2}{n(n - 1)}} \approx 1.82$ min
CV $= s / \bar{x} = 1.82 / 7.15 \approx 0.255 = 25.5\%$
Example: Comparing Variation
Page 112, problem 24
The CV for Jefferson Valley wait times is much
smaller than the CV for Providence wait
times, which implies that the variation for
Jefferson Valley is much smaller than
that for Providence
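The coefficient of variation is a one-line computation once the mean and standard deviation are known. The sketch below uses the rounded summary statistics quoted in this example (the raw waiting times are not listed in these slides); the function name `coefficient_of_variation` is a hypothetical choice.

```python
def coefficient_of_variation(s, mean):
    """CV as a percent: standard deviation relative to the mean."""
    if mean <= 0:
        raise ValueError("CV is intended for nonnegative data with a positive mean")
    return s / mean * 100

cv_jefferson = coefficient_of_variation(0.48, 7.15)   # minutes
cv_providence = coefficient_of_variation(1.82, 7.15)  # minutes

print(f"Jefferson Valley CV = {cv_jefferson:.1f}%")
print(f"Providence CV = {cv_providence:.1f}%")
```

Because the CV is a ratio, it has no units, which is what makes it suitable for comparing variation between data sets measured on different scales.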
Formula for Standard
Deviation of a Sample
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Why do we use n-1 instead of n?
Rationale for using n – 1
versus n
There are only n – 1 independent
values. With a given mean, only n – 1
values can be freely assigned any
number before the last value is
determined (see homework problem
from 3-2 number 35).
Rationale for using n – 1
versus n
Dividing by n – 1 yields better results
than dividing by n. It causes $s^2$ to
target $\sigma^2$, whereas division by n causes
$s^2$ to underestimate $\sigma^2$.
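One way to see this targeting behavior is to list every possible sample from a tiny population and average the resulting variances. The population {1, 2, 5} and the sample size of 2 (sampling with replacement) are assumptions chosen only to keep the enumeration small; they do not come from the slides.

```python
from itertools import product

population = [1, 2, 5]                       # assumed toy population
N = len(population)
mu = sum(population) / N                     # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over a chosen divisor."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

# All 9 equally likely samples of size n = 2, drawn with replacement.
samples = list(product(population, repeat=2))
n = 2
mean_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
mean_div_n = sum(variance(s, n) for s in samples) / len(samples)

print(f"population variance      = {sigma2:.4f}")
print(f"average using n - 1      = {mean_div_n_minus_1:.4f}")
print(f"average using n          = {mean_div_n:.4f}")
```

Averaged over all samples, the n – 1 version matches the population variance, while the n version comes out too small.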
Recap
In this section we have looked at:
• Range
• Standard deviation of a sample and
population
• Variance of a sample and population
• Range rule of thumb
• Empirical rule
• Chebyshev’s theorem
• Coefficient of variation (CV)
Section 3-4
Measures of Relative
Standing and Boxplots
Key Concepts
•This section introduces measures of
relative standing, which are numbers
showing the location of data values
relative to the other values within a data
set.
•Measures of relative standing can be
used to compare values from different
data sets, or to compare values within
the same data set.
Key Concepts
•The most important concept is the z
score. We will also discuss percentiles
and quartiles, as well as a new
statistical graph called the boxplot.
Part 1
Basics of z Scores,
Percentiles, Quartiles, and
Boxplots
Z score
 z Score
(or standardized value)
the number of standard deviations
that a given value x is above or below
the mean
Measures of Position: z Score
Sample: $z = \dfrac{x - \bar{x}}{s}$
Population: $z = \dfrac{x - \mu}{\sigma}$
Round z scores to 2 decimal places
Interpreting Z Scores
Whenever a value is less than the mean, its
corresponding z score is negative
Ordinary values:
–2 ≤ z score ≤ 2
Unusual Values:
z score < –2 or z score > 2
Example of Z score
• Problem 6 on page 127
Given mean and standard
deviation:
$\bar{x} = 43.8$ years and $s = 8.9$ years
a) Difference between Hoffman’s
age and mean age is
$x - \bar{x} = 38 - 43.8 = -5.8$ years
Example of Z score
• Problem 6 on page 127
Note: the book answer is the absolute
value of the difference between
Hoffman’s age and the mean age, which
is
$|x - \bar{x}| = |38 - 43.8| = |-5.8 \text{ years}| = 5.8$ years
Example of Z score
• Problem 6 on page 127
b) How many standard deviations
is that?
Here take the ratio of the absolute
value of the difference and the
standard deviation:
$\dfrac{5.8}{8.9} \approx 0.65$
Example of Z score
• Problem 6 on page 127
c) Convert Hoffman’s age to a z
score
Here take the ratio of the difference
and the standard deviation:
$z = \dfrac{-5.8}{8.9} \approx -0.65$
Example of Z score
• Problem 6 on page 127
d) Is Hoffman’s age usual or
unusual?
“Usual” means the z score is between –2 and 2, and since –0.65 is between –2 and 2, Hoffman’s age is usual.
Example of Z score
• Problem 10 on page 127
Given mean and standard deviation
for heights of women:
$\bar{x} = 63.6$ in and $s = 2.5$ in
Example of Z score
• Problem 10 on page 127
Z scores for height requirements
for women in army
• Minimum height requirement z
score
$z = \dfrac{x - \bar{x}}{s} = \dfrac{58 - 63.6}{2.5} = -2.24$
Example of Z score
• Problem 10 on page 127
Z scores for height requirements
for women in army
• Maximum height requirement z
score
$z = \dfrac{x - \bar{x}}{s} = \dfrac{80 - 63.6}{2.5} = 6.56$
Example of Z score
• Problem 10 on page 127
• Minimum height requirement z
score is less than -2 so it is
considered unusual
• Maximum height requirement z
score is greater than 2 so it is
considered unusual
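The z score calculations in these examples follow a single pattern, sketched below in Python. The function names `z_score` and `is_unusual_z` are hypothetical; the numbers are the ones given for Hoffman's age and for the height requirements.

```python
def z_score(x, mean, s):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / s

def is_unusual_z(z):
    """Unusual values have z < -2 or z > 2."""
    return z < -2 or z > 2

hoffman = z_score(38, 43.8, 8.9)        # age in years
min_height = z_score(58, 63.6, 2.5)     # height in inches
max_height = z_score(80, 63.6, 2.5)     # height in inches

for label, z in [("Hoffman", hoffman), ("min height", min_height),
                 ("max height", max_height)]:
    print(f"{label}: z = {z:.2f}, unusual = {is_unusual_z(z)}")
```

The units cancel in the division, so a z score is a pure number regardless of whether the data are in years or inches.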
Example of Z score
• Problem 13 on page 128
• A score of 1840 on the SAT test, with a
mean of 1518 and a standard deviation
of 325 for SAT scores, has a z score of
$z = \dfrac{x - \bar{x}}{s} = \dfrac{1840 - 1518}{325} \approx 0.99$
Example of Z score
• Problem 13 on page 128
• A score of 26 on the ACT test, with a
mean of 21.1 and a standard deviation
of 4.8 for ACT scores, has a z score of
$z = \dfrac{x - \bar{x}}{s} = \dfrac{26 - 21.1}{4.8} \approx 1.02$
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.
3.1 -
Example of Z score
• Problem 13 on page 128
Comparing z scores, the ACT z score
(1.02) is larger than the SAT z score
(0.99). The ACT score is farther above
its mean than the SAT score (relative
to the standard deviation), so the ACT
score is relatively better.
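As a rough check of this comparison, the sketch below standardizes both test scores and picks the larger z score. The helper name `z_score` and the dictionary layout are assumptions made for illustration; the ACT mean is taken as 21.1, the value that reproduces the slide's z score of 1.02.

```python
def z_score(x, mean, std_dev):
    """Standardize x so that scores on different scales can be compared."""
    return (x - mean) / std_dev


# (score, mean, standard deviation) for each test
tests = {
    "SAT": (1840, 1518, 325),
    "ACT": (26, 21.1, 4.8),  # assumed mean; it matches z = 1.02 on the slide
}

z_scores = {name: z_score(*values) for name, values in tests.items()}
better = max(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
print("Relatively better score:", better)
```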
Percentiles
are measures of location. There are 99
percentiles denoted P1, P2, . . . P99,
which divide a set of data into 100
groups with about 1% of the values in
each group.
Finding the Percentile
of a Data Value
Percentile of value x = $\frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
Example of Finding Percentile
• Problem 16 on page 128
Data (NFL superbowl total points):
36 37 37 39 39 41 43 44 44 47 50 53
54 55 56 56 57 59 61 61 65 69 69 75
Find the percentile corresponding
to 65
Example of Finding Percentile
• Problem 16 on page 128
Percentile of value x = $\frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100 = \frac{20}{24} \cdot 100 \approx 83$
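The same count can be sketched in Python. The function name `percentile_of_value` is hypothetical; it follows the slide formula literally (values strictly less than x) and rounds the result to a whole number.

```python
def percentile_of_value(data, x):
    """Percent of data values strictly less than x, rounded to a whole number."""
    below = sum(1 for value in data if value < x)
    return round(below / len(data) * 100)


# NFL Super Bowl total points from Problem 16
super_bowl_points = [
    36, 37, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
    54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 69, 75,
]

print(percentile_of_value(super_bowl_points, 65))  # 20 of 24 values are below 65
```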
Converting from the kth Percentile to
the Corresponding Data Value
Notation
n total number of values in the
data set
k percentile being used
L locator that gives the
position of a value
Pk kth percentile
$L = \frac{k}{100} \cdot n$
[Figure: Converting from the kth Percentile to the Corresponding Data Value]
Example of Finding Data Value
Given Its Percentile
• Problem 22 on page 128
Data (NFL superbowl total points):
36 37 37 39 39 41 43 44 44 47 50 53
54 55 56 56 57 59 61 61 65 69 69 75
Find the data value at P80
Example of Finding Data Value
Given Its Percentile
• Problem 22 on page 128
• For the 80th percentile $P_{80}$, the k
value in the locator formula is 80:
$L = \frac{k}{100} \cdot n = \frac{80}{100} \cdot 24 = 19.2$
• If L is not a whole number, then
round up to the next whole number:
$L = 20$
Example of Finding Data Value
Given Its Percentile
• The 20th score is 61:
36 37 37 39 39 41 43 44 44 47 50 53
54 55 56 56 57 59 61 61 65 69 69 75
$x_{20} = 61$
Thus: $P_{80} = 61$
Quartiles
Are measures of location, denoted Q1, Q2, and
Q3, which divide a set of data into four groups
with about 25% of the values in each group.
• Q1 (First Quartile) separates the bottom
25% of sorted values from the top 75%.
• Q2 (Second Quartile) same as the median;
separates the bottom 50% of sorted
values from the top 50%.
• Q3 (Third Quartile) separates the bottom
75% of sorted values from the top 25%.
Quartiles
• The first quartile is the 25th percentile: $Q_1 = P_{25}$
• The second quartile is the 50th percentile: $Q_2 = P_{50}$
• The third quartile is the 75th percentile: $Q_3 = P_{75}$
Example of Finding Data Value
Given Its Quartile
• Problem 20 on page 128
Data (NFL superbowl total points):
36 37 37 39 39 41 43 44 44 47 50 53
54 55 56 56 57 59 61 61 65 69 69 75
Find the data value at $Q_1$
Example of Finding Data Value
Given Its Quartile
• Convert quartile to percentile:
$Q_1 = P_{25}$
• For the 25th percentile $P_{25}$, the k
value in the locator formula is 25:
$L = \frac{k}{100} \cdot n = \frac{25}{100} \cdot 24 = 6$
Example of Finding Data Value
Given Its Quartile
• If L is a whole number, find the
percentile by adding the Lth value
and the next value, then dividing by
2
• Sixth value is 41 and seventh value
is 43:
$Q_1 = \frac{x_6 + x_7}{2} = \frac{41 + 43}{2} = 42$
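Both locator rules (round up when L is not a whole number, average two neighbors when it is) can be sketched in one Python function. The name `percentile_value` is hypothetical, and the function assumes k is a whole number from 1 to 99. Statistical software often uses other interpolation rules, so its results may differ from this textbook procedure.

```python
def percentile_value(data, k):
    """Data value at the kth percentile, using the locator L = (k / 100) * n."""
    if not 1 <= k <= 99:
        raise ValueError("k must be a whole number from 1 to 99")
    values = sorted(data)
    # Integer arithmetic avoids floating-point doubt about whether L is whole.
    whole, remainder = divmod(k * len(values), 100)
    if remainder == 0:
        # L is a whole number: average the Lth value and the next one.
        return (values[whole - 1] + values[whole]) / 2
    # L is not whole: round up, so take the (whole + 1)th value.
    return values[whole]


super_bowl_points = [
    36, 37, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
    54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 69, 75,
]

print(percentile_value(super_bowl_points, 80))  # L = 19.2, use the 20th value
print(percentile_value(super_bowl_points, 25))  # L = 6, average 6th and 7th
```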
Quartiles
Q1, Q2, Q3
divide ranked scores into four equal parts
(minimum) 25% Q1 25% Q2 (median) 25% Q3 25% (maximum)
Some Other Statistics
• Interquartile Range (or IQR): $Q_3 - Q_1$
• Semi-interquartile Range: $\frac{Q_3 - Q_1}{2}$
• Midquartile: $\frac{Q_3 + Q_1}{2}$
• 10 - 90 Percentile Range: $P_{90} - P_{10}$
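A minimal sketch of the quartile-based statistics, assuming the quartiles have already been found. The function name `quartile_spread` is illustrative; the inputs Q1 = 140 and Q3 = 158.5 are the quartiles from the Strontium-90 example in these slides.

```python
def quartile_spread(q1, q3):
    """Statistics built from the first and third quartiles."""
    iqr = q3 - q1
    return {
        "iqr": iqr,                    # interquartile range
        "semi_iqr": iqr / 2,           # semi-interquartile range
        "midquartile": (q3 + q1) / 2,  # midpoint of the two quartiles
    }


spread = quartile_spread(q1=140, q3=158.5)
for name, value in spread.items():
    print(f"{name}: {value}")
```

The 10 - 90 percentile range is the same kind of subtraction, applied to P90 and P10 instead of the quartiles.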
5-Number Summary
• For a set of data, the 5-number
summary consists of the
minimum value; the first quartile,
Q1; the median (or second
quartile, Q2); the third quartile,
Q3; and the maximum value.
Boxplot
• A boxplot (or box-and-whisker diagram) is a graph of a data set
that consists of a line extending
from the minimum value to the
maximum value, and a box with
lines drawn at the first quartile,
Q1; the median; and the third
quartile, Q3.
Boxplots
Boxplot of Movie Budget Amounts
(See example 7 and 8 on page 121)
Boxplots - Normal Distribution
Normal Distribution:
Heights from a Simple Random Sample of Women
Boxplots - Skewed Distribution
Skewed Distribution:
Salaries (in thousands of dollars) of NCAA Football Coaches
Example of Five Number
Summary and Boxplots
• Problem 30 on page 128 uses 20
data values from Strontium-90 data
ordered as follows:
128 130 133 137 138 142 142 144 147 149
151 151 151 155 156 161 163 163 166 172
Example of Five Number
Summary and Boxplots
• Min data value is 128
• First quartile
$Q_1 = P_{25}$
$L = \frac{k}{100} \cdot n = \frac{25}{100} \cdot 20 = 5$
$Q_1 = \frac{x_5 + x_6}{2} = \frac{138 + 142}{2} = 140$
Example of Five Number
Summary and Boxplots
• Second quartile
$Q_2 = P_{50} = \text{median}$
$L = \frac{k}{100} \cdot n = \frac{50}{100} \cdot 20 = 10$
$Q_2 = \frac{x_{10} + x_{11}}{2} = \frac{149 + 151}{2} = 150$
Example of Five Number
Summary and Boxplots
• Third quartile
$Q_3 = P_{75}$
$L = \frac{k}{100} \cdot n = \frac{75}{100} \cdot 20 = 15$
$Q_3 = \frac{x_{15} + x_{16}}{2} = \frac{156 + 161}{2} = 158.5$
Example of Five Number
Summary and Boxplots
• Max data value is 172
• Five Number Summary
128 140 150 158.5 172
• Boxplot?
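The five-number summary for the Strontium-90 values can be reproduced with a short Python sketch. The names `locate` and `five_number_summary` are hypothetical; `locate` restates the locator rule used in these slides so that the block runs on its own. Calculators and software may compute quartiles by a different rule and report slightly different values for other data sets.

```python
def locate(values, k):
    """kth percentile of already-sorted values via L = (k / 100) * n."""
    whole, remainder = divmod(k * len(values), 100)
    if remainder == 0:
        # L is whole: average the Lth value and the next one.
        return (values[whole - 1] + values[whole]) / 2
    # L is not whole: round up to the next position.
    return values[whole]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    return (
        values[0],
        locate(values, 25),
        locate(values, 50),
        locate(values, 75),
        values[-1],
    )


strontium_90 = [
    128, 130, 133, 137, 138, 142, 142, 144, 147, 149,
    151, 151, 151, 155, 156, 161, 163, 163, 166, 172,
]

print(five_number_summary(strontium_90))
```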
Example of Five Number
Summary
• Boxplot
Calculator: Five Number
Summary
• Five Number Summary (from
website mathbits.com)
• Enter the data in a list:
Calculator: Five Number
Summary
• Go to STAT - CALC and
choose 1-Var Stats
Calculator: Five Number
Summary
• On the HOME screen, when
1-Var Stats appears, type the list
containing the data.
Calculator: Five Number
Summary
• Arrow down to the five number
summary (last five items in the list)
Calculator: Boxplot
• CLEAR out the graphs under y = (or
turn them off).
• Enter the data into the calculator lists.
(choose STAT, #1 EDIT and type in
entries)
Calculator: Boxplot
• Press 2nd STATPLOT and choose #1 PLOT 1.
Be sure the plot is ON, the second box-and-whisker
icon is highlighted, and that the list you
will be using is indicated next to Xlist.
Calculator: Boxplot
• To see the box-and-whisker plot, press
ZOOM and #9 ZoomStat. Press the
TRACE key to see on-screen data about
the box-and-whisker plot.
Part 2
Outliers and
Modified Boxplots
OMIT THIS PART OF 3-4
Recap
In this section we have discussed:
• z Scores
• z Scores and unusual values
• Percentiles
• Quartiles
• Converting a percentile to corresponding
data values
• Other statistics
• 5-number summary
Putting It All Together
Always consider certain key factors:
• Context of the data
• Source of the data
• Sampling Method
• Measures of Center
• Measures of Variation
• Distribution
• Outliers
• Changing patterns over time
• Conclusions
• Practical Implications