DIS_ch_4 - Investigadores CIDE


HAWKES LEARNING SYSTEMS
math courseware specialists
Copyright © 2010 by Hawkes Learning
Systems/Quant Systems, Inc.
All rights reserved.
Chapter 4
Describing Data from One Variable
Sections 4.1-4.3a Measures of Location
4.1 Measures of Location
Objectives:
• To calculate the mean, median, and mode.
• To determine the most appropriate measure of center.
Measures of Location:
• If we think about a data set as a group of data values that cluster
around some central value, then the central value provides a focal
point for the set, a location of sorts.
• Unfortunately, the notion of central value is a vague concept,
which is as much defined by the way it is measured as by the
notion itself.
• There are several statistical measures that are used to define the
notion of center: the arithmetic mean, trimmed mean, median, and
mode.
The Arithmetic Mean:
• Suppose there are n observations in a data set, consisting of the
observations $x_1, x_2, \ldots, x_n$; then the arithmetic mean is $\frac{1}{n}(x_1 + x_2 + \cdots + x_n)$.
• The mean is what we typically call the “average” of a data set.
• To calculate the mean, simply add all the values and divide by the
total number in the data set.
• Mean should only be used for quantitative data.
• Outliers have a dramatic effect on the mean value.
The Arithmetic Mean:
• If we use mathematical notation, the formula can be simplified to
$\frac{\sum x_i}{n}$, where $x_i$ is the $i$th data value in the data set and $\Sigma$
(pronounced sigma) is a mathematical notation for adding values.
• There are two symbols that are associated with mean:
• $\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$, the sample mean, and
• $\mu = \frac{1}{N}(x_1 + x_2 + \cdots + x_N)$, the population mean.
• Here n refers to the size of the sample and N refers to the size
of the population. Otherwise, the calculations are made in
precisely the same way.
Example:
Calculate the sample mean of the following heights in inches.
63, 68, 71, 67, 63, 72, 66, 67, 70
Solution:
$\bar{x} = \frac{63 + 68 + 71 + 67 + 63 + 72 + 66 + 67 + 70}{9} = \frac{607}{9} \approx 67.4$ inches.
When calculating the mean, round to one more decimal place than
what is given in the data.
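The calculation above can be sketched in a few lines of Python. This is only an illustration: the names `heights` and `sample_mean` are made up for the example and are not part of the courseware.

```python
# Heights in inches, taken from the example above.
heights = [63, 68, 71, 67, 63, 72, 66, 67, 70]

def sample_mean(values):
    """Add all the values and divide by how many there are."""
    return sum(values) / len(values)

mean_height = sample_mean(heights)

# The data are whole numbers, so the result is reported to one decimal place.
print(round(mean_height, 1))  # 67.4
```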
Deviation:
• Given some point A and a data point x, then x – A represents how
far x deviates from A. This difference is also called a deviation.
• The table below shows the deviations from the mean for the
following sample data values: 4, 10, 7, 15. The mean of this data
set is 9.
$\bar{x} = \frac{1}{4}(4 + 10 + 7 + 15) = 9$.

Data Value $x_i$     Deviation from the mean $(x_i - 9)$
4                    –5
10                   1
7                    –2
15                   6

$\sum (x_i - 9) = 0$
Notice that the sum of the deviations is zero. This illustrates why the
mean is a measure of central tendency. If we calculate the deviations
about any other value the sum of the deviations will not equal zero.
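As a quick check of this property, the short Python sketch below recomputes the deviations for the data set 4, 10, 7, 15. The comparison point 8 is an arbitrary choice made for illustration, and the variable names are hypothetical.

```python
data = [4, 10, 7, 15]
mean = sum(data) / len(data)                 # 9.0

# Deviations about the mean: each value minus the mean.
deviations = [x - mean for x in data]        # [-5.0, 1.0, -2.0, 6.0]
print(sum(deviations))                       # 0.0

# Deviations about some other point (8 is an arbitrary choice) do not cancel.
other_point = 8
sum_about_other = sum(x - other_point for x in data)
print(sum_about_other)                       # 4
```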
The Median:
• The median of a set of data values is the middle value in an
ordered array. The same number of values is on either side of the
median value.
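To find the median:
1. Arrange the data in ascending order.
2. Count the number of values in the data.
3. If the number of values is odd, the median is the middle value in the data.
   If the number of values is even, the median is the sum of the two middle
   values in the data divided by two.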
Example:
Calculate the median of the following sets of data.
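a. 15 16 11 22 19 10 17 22
Solution:
The ordered data are 10 11 15 16 17 19 22 22. There are eight values, so the
median is the sum of the two middle values divided by two:
$\frac{16 + 17}{2} = 16.5$
b. 2.6 3.3 5.0 1.8 0.7 2.2 4.1 6.1 6.7
Solution:
The ordered data are 0.7 1.8 2.2 2.6 3.3 4.1 5.0 6.1 6.7. There are nine
values, so the median is the middle (fifth) value, 3.3.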
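The sort, count, and pick-the-middle procedure can be sketched in Python as follows. The function name `median` is chosen for the example; Python's standard library also offers `statistics.median`, which should give the same results on these data.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if even)."""
    ordered = sorted(values)          # arrange the data in ascending order
    n = len(ordered)                  # count the number of values
    middle = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[middle]
    # even count: sum of the two middle values divided by two
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([15, 16, 11, 22, 19, 10, 17, 22]))                # 16.5
print(median([2.6, 3.3, 5.0, 1.8, 0.7, 2.2, 4.1, 6.1, 6.7]))   # 3.3
```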
The Trimmed Mean:
• The trimmed mean ignores an equal percentage of the highest
and lowest values in calculating the mean.
To calculate a 10% trimmed mean:
1. Arrange the data in ascending order.
2. Delete the lowest 10% of the values.
3. Delete the highest 10% of the values.
4. Calculate the arithmetic mean of the remaining 80% of the values.
Example:
Consider the following data:
16 18 20 21 23 23 24 32 36 42
mean = 25.5 median = 23
Find the 10% trimmed mean.
Solution:
Since there are 10 observations, removing the highest 10% and lowest 10%
means only removing one observation from each end of the data.
10% trimmed mean $= \frac{18 + 20 + 21 + 23 + 23 + 24 + 32 + 36}{8} = 24.625$
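A minimal Python sketch of the trimming steps is shown below. The function name `trimmed_mean` is hypothetical, and rounding the number of deleted values down when the percentage does not give a whole number is an assumption of this sketch; the slides do not address that case.

```python
def trimmed_mean(values, percent):
    """Mean after deleting `percent` % of the values from each end."""
    ordered = sorted(values)
    # Number of values to delete from EACH end (assumption: round down).
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [16, 18, 20, 21, 23, 23, 24, 32, 36, 42]
print(trimmed_mean(data, 10))  # 24.625
print(trimmed_mean(data, 0))   # 25.5, the ordinary mean
```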
Resistant Measures:
• Statistical measures which are not affected by outliers are said to
be resistant.
• The mean is not a resistant measure.
• The trimmed mean is a resistant measure.
The Mode:
• The mode of a data set is the most frequently occurring value.
• The mode is the only measure of centralness that can be applied
to nominal data.
• When a data set has two modes it is said to be bimodal.
• When the data set has more than two modes it is said to be
multimodal.
Example:
Calculate the mode of each data set.
a.
63 68 71 67 63 72 66 67 70
Solution:
There are two modes: 63 and 67. The data set is bimodal.
b.
51 77 54 51 68 70 54 65 51
Solution:
51 occurs three times. The mode is 51.
c.
1 5 7 3 2 0 4 6
Solution:
Each value appears only once. There is no mode.
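The three cases above (two modes, one mode, no mode) can be reproduced with a short Python sketch. The helper name `modes` is hypothetical; returning an empty list when every value appears once follows the convention used in part c.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if every value appears once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # each value appears only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([63, 68, 71, 67, 63, 72, 66, 67, 70]))  # [63, 67]
print(modes([51, 77, 54, 51, 68, 70, 54, 65, 51]))  # [51]
print(modes([1, 5, 7, 3, 2, 0, 4, 6]))              # []
```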
The Relationship between the Mean and Median:
• The shape of the data determines how the mean, median, and
mode are related.
• For a bell-shaped distribution, the mean, median, and mode are
identical.
Skewed Distributions:
• Not all data produce distributions which follow a bell-shaped curve.
• If the distribution of the data has a long tail to the right, it is said to be
skewed to the right, or positively skewed.
• Conversely, if the distribution has a long tail on the left, the data is said
to be skewed to the left, or negatively skewed.
If the data is positively skewed, the
median will be smaller than the
mean.
If the data is negatively skewed,
the median will be larger than the
mean.
Section 4.2 Selecting a Measure of Location
Selecting a Measure of Location:
• The objective of using descriptive statistics is to provide
measures which convey useful summary information about the
data.
• When selecting a statistic to represent the central value of a data
set, the first question involves what type of data is being analyzed.
• The arithmetic mean is frequently, but not always, the most
reasonable measure of centralness.
Selecting a Measure of Location:
The table below defines the applicable levels of measurement for each
measure of location.

Measure of     Qualitative            Quantitative
Location       nominal    ordinal     interval    ratio
mean                                  yes         yes
median                    yes         yes         yes
mode           yes        yes         yes         yes
t-mean                                yes         yes

The table below defines the sensitivity to outliers for each measure of
location.

Measure of Location    Sensitivity to outliers
mean                   very sensitive
median                 not sensitive
mode                   not sensitive
t-mean                 not sensitive
Selecting a Measure of Location:
• The mean and median are the same value when the data is
symmetrical.
• When the data is nominal or ordinal, the mean should not be
calculated.
• When the data is at least interval and there are no outliers the
mean is a reasonable choice.
• When the data is ordinal, the median is the best choice.
• The median is a good measure of central tendency since it is not
sensitive to outliers.
• The median can be applied to all levels of measurement except
nominal.
• The mode can be applied to all levels of data, but is not very
useful for quantitative data.
• If the data is nominal, there is only one choice, the mode.
Time Series Data and Measures of Centralness:
• The graph below shows the average gas price over a number of
years. In this non-stationary time series, the central value of the
process is trending upward.
• One way to capture this movement is with a moving average.
Moving Average:
• A moving average is obtained by adding consecutive
observations for a number of periods and dividing the result by the
number of periods included in the average.
• The table below shows the average US gas price from 1991 to
2002 along with the 2 and 3 period moving averages.
Year    Average US    2 Period          3 Period
        Gas Price     Moving Average    Moving Average
1991    1.09
1992    1.10          1.095
1993    1.07          1.085             1.087
1994    1.08          1.075             1.083
1995    1.11          1.095             1.087
1996    1.21          1.160             1.133
1997    1.18          1.195             1.167
1998    1.01          1.095             1.133
1999    1.14          1.075             1.110
2000    1.49          1.315             1.213
2001    1.38          1.435             1.337
2002    1.34          1.360             1.403

• The two-period moving average for 1992 averages the time series in 1991
and 1992: $\frac{1.09 + 1.10}{2} = 1.095$.
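The moving averages can be recomputed from the yearly prices with a short Python sketch. The function name `moving_average` and the dictionary layout are assumptions made for the illustration. As a check, the 1998 three-period value averages 1.21, 1.18, and 1.01, which gives 1.133.

```python
# Average US gas price by year, from the table above.
prices = {1991: 1.09, 1992: 1.10, 1993: 1.07, 1994: 1.08, 1995: 1.11, 1996: 1.21,
          1997: 1.18, 1998: 1.01, 1999: 1.14, 2000: 1.49, 2001: 1.38, 2002: 1.34}

def moving_average(series, periods):
    """Average each year with the (periods - 1) years before it."""
    years = sorted(series)
    result = {}
    for i in range(periods - 1, len(years)):
        window = [series[y] for y in years[i - periods + 1:i + 1]]
        result[years[i]] = round(sum(window) / periods, 3)
    return result

two_period = moving_average(prices, 2)
three_period = moving_average(prices, 3)
print(two_period[1992])    # 1.095
print(three_period[1998])  # 1.133
```

The first year has no two-period average and the first two years have no three-period average, because there are not yet enough earlier observations to fill the window.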
Moving Average:
• The chart below displays the time series and the two- and three-period moving averages.
• Notice that both of the averages follow the time series quite
closely.
Sections 4.1-4.3b Measures of Dispersion
4.3 Measures of Dispersion
Objective:
• To compute the range, variance, and standard deviation.
Measuring Variation:
• Many of the good measures of variation use the concept of
deviation from the mean.
• If the mean is a focal point or base, use it as a common basis
from which to measure variation.
• The distance that a point is from its mean is called the deviation
from the mean.
• The sum of the positive deviations equals the sum of the absolute
values of the negative deviations.
• The deviations will always sum to zero.
• Many of the variability measures average the deviations in some
form.
Example:
A data set and its deviations from the mean are calculated in the
table below. Notice that the sum of the deviations is zero.
Data set: 3, 12, 20, 15, 0
Mean = 10
Data Value    Deviation from the mean (data – mean = deviation)
3             3 – 10 = –7
12            12 – 10 = 2
20            20 – 10 = 10
15            15 – 10 = 5
0             0 – 10 = –10
Mean Absolute Deviation:
• The sample mean absolute deviation (MAD) is
$\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$.
• Computes the average distance from the mean of a data set.
• If data set A has a larger mean absolute deviation than data set B, then it is
reasonable to believe that data set A has more variability than data set B.
• Intuitive measure of variation.
• Theoretical development has been hampered due to the
difficulty that absolute values pose to calculus.
• Sensitive to outliers and not a resistant measure.
Example:
Suppose six people participated in a 1000 meter run. Their times,
measured in minutes, are given below. The mean time is 8.333 minutes.
Calculate the mean absolute deviation.
Time in min.    Deviation               Absolute Deviation    % of total
4               4 – 8.333 = –4.333      4.333                 38.23
10              10 – 8.333 = 1.667      1.667                 14.71
9               9 – 8.333 = 0.667       0.667                 5.88
11              11 – 8.333 = 2.667      2.667                 23.53
9               9 – 8.333 = 0.667       0.667                 5.88
7               7 – 8.333 = –1.333      1.333                 11.77
Total                                   11.334                100.00

The total of the absolute deviations is
4.333 + 1.667 + 0.667 + 2.667 + 0.667 + 1.333 = 11.334.
Each percentage is that row's share of the total; for the first row,
$\frac{4.333}{11.334} \times 100 = 38.23$.
Mean Absolute Deviation $= \frac{11.334}{6} = 1.889$ minutes.
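The same calculation can be sketched in Python. The function name `mean_absolute_deviation` is hypothetical, and the code works from the unrounded mean, so the intermediate values differ slightly from the rounded table entries while the final result agrees to three decimals.

```python
times = [4, 10, 9, 11, 9, 7]   # run times in minutes, from the example above

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

mad = mean_absolute_deviation(times)
print(round(mad, 3))  # 1.889
```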
Variance and Standard Deviation:
• Standard deviation and variance are the most common
measures of variability.
• The standard deviation and variance also provide numerical
measures of how the data varies around the mean.
• If the data is tightly packed around the mean, the standard
deviation and variance will be relatively small.
• If the data is widely dispersed about the mean, the standard
deviation and variance will be relatively large.
Variance:
• The variance of a data set containing the complete set of
population data is given by
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
and is called the population variance.
• The variance of a data set containing sample data is given by
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
and is called the sample variance.
Example:
Given the following times in minutes of 6 persons running the 1000 meter
course, compute the sample variance. The sample mean is 8.333.
4, 10, 9, 11, 9, 7
Data     Deviation               Squared Deviation    % of total
4        4 – 8.333 = –4.333      18.7749              59.93
10       10 – 8.333 = 1.667      2.7789               8.87
9        9 – 8.333 = 0.667       0.4449               1.42
11       11 – 8.333 = 2.667      7.1129               22.70
9        9 – 8.333 = 0.667       0.4449               1.42
7        7 – 8.333 = –1.333      1.7769               5.67
Total                            31.33                100.00

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{31.33}{5} = 6.266$ squared minutes.
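A Python sketch of the sample variance follows, with the standard library's `statistics.variance` used as a cross-check. The function name `sample_variance` is hypothetical. Because the code does not round the mean or the squared deviations along the way, it gives about 6.267 rather than the 6.266 obtained above from rounded intermediate values. The last line takes the square root, which is the sample standard deviation.

```python
import math
import statistics

times = [4, 10, 9, 11, 9, 7]   # run times in minutes

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

s2 = sample_variance(times)
assert math.isclose(s2, statistics.variance(times))  # library cross-check

print(round(s2, 3))             # 6.267
print(round(math.sqrt(s2), 3))  # 2.503
```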
Standard Deviation:
• The standard deviation is the square root of the variance.
• There are two measures of variance, so there will be two
standard deviations.
• The sample standard deviation $s = \sqrt{s^2}$.
• The population standard deviation $\sigma = \sqrt{\sigma^2}$.
• It is important to remember the symbols above since standard
deviation is a fundamental statistical concept.
Standard Deviation:
• Standard deviation is the square root of the average squared
deviation.
• It can also be used to measure how far the data values are from
the mean.
• Relatively few data values will be more than two standard deviations
from the mean.
• Like the variance, the standard deviation is sensitive to outliers.
• The presence of outliers tarnishes the interpretation of the
standard deviation as a typical deviation.
Range:
• The range is the difference between the largest and smallest data
values.
Example:
Calculate the range of the following data set.
4, 6, 16, 9, 24, 8, 0, 12, 1
Solution:
The largest value is 24 and the smallest value is 0.
Range = 24 – 0 = 24.
Section 4.4 Measures of Relative Position
Objectives:
• Determine the percentiles and locations of specific data points.
• Find the quartiles of the data.
• Determine the z-score as a measure of relative position.
Pth Percentile:
• Given a set of data $x_1, x_2, \ldots, x_n$, the Pth percentile is a value, say X,
such that at least P percent of the data is less than or equal to X
and at least (100 – P) percent of the data is greater than or equal
to X.
• The most often used measure of relative position is the
percentile.
Pth Percentile:
To determine the Pth percentile:
• Form an ordered array by placing the data in order from
smallest to largest.
• To find the location of the Pth percentile in the ordered array,
let $\ell = n \cdot \frac{P}{100}$,
where n is the number of observations in the ordered data.
• If $\ell$ is not an integer, then round up to the next greatest integer.
• If $\ell$ is an integer, then average the data value in the $\ell$
location with the data value in the $\ell + 1$ location.
• Remember, $\ell$ is not the percentile; $\ell$ is the location of the
percentile in the ordered array.
Determining the Pth Percentile Flow Chart:
1. Arrange the data in ascending order.
2. To find the Pth percentile in the ordered data, calculate
   $\ell = n \cdot \frac{P}{100}$, where n is the number of observations in the
   ordered data.
3. Is $\ell$ an integer?
   Yes: Average the data value in the $\ell$ location with the data value in
   the $\ell + 1$ location.
   No: Round $\ell$ up to the next greatest integer, then find the data value
   in the $\ell$th location.
Example:
Find the 50th percentile for the following data set.
3, 5, 0, 1, 9, 2, 7
Solution:
location $= 7 \cdot \frac{50}{100} = 3.5$
Since the location is not an integer, the value is rounded up to 4.
0, 1, 2, 3, 5, 7, 9
Thus, the fourth observation in the ordered array would be the
median.
The median value (which is the 50th percentile) equals 3.
Example:
Find the 50th percentile for the following data set.
3, 5, 0, 1, 9, 2, 7, 6
Solution:
location $= 8 \cdot \frac{50}{100} = 4$
Since the location is an integer, we average the 4th value and the
5th value of the ordered array.
0, 1, 2, 3, 5, 6, 7, 9
$\frac{3 + 5}{2} = \frac{8}{2} = 4$
The 50th percentile for this data set is 4.
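The location rule used in the two examples above can be sketched in Python. The function name `percentile_value` is hypothetical, and the sketch assumes P is a whole number strictly between 0 and 100, which lets integer arithmetic decide whether the location is an integer.

```python
def percentile_value(values, p):
    """Pth percentile by the location rule (p: whole number, 0 < p < 100)."""
    ordered = sorted(values)            # form the ordered array
    product = len(ordered) * p          # location = product / 100
    if product % 100 == 0:
        loc = product // 100            # location is an integer
        # average the values in locations loc and loc + 1 (counting from 1)
        return (ordered[loc - 1] + ordered[loc]) / 2
    loc = product // 100 + 1            # not an integer: round up
    return ordered[loc - 1]

print(percentile_value([3, 5, 0, 1, 9, 2, 7], 50))     # 3
print(percentile_value([3, 5, 0, 1, 9, 2, 7, 6], 50))  # 4.0
```

Other percentile conventions exist (statistical libraries often interpolate between neighboring values), so results from library functions may differ from this rule on small data sets.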
Percentile:
• The percentile of some data value x is given by:
percentile of x $= \frac{\text{number of data values} \le x}{\text{total number of data values}} \cdot 100$
Example:
Find the percentile of 45 for the following data set.
67, 45, 63, 58, 35, 54, 27, 66, 21, 48
Solution:
The ordered data are:
21, 27, 35, 45, 48, 54, 58, 63, 66, 67
The values less than or equal to 45 are 21, 27, 35, and 45,
so the number of values less than or equal to 45 is 4.
percentile of 45 $= \frac{4}{10} \cdot 100 = 4 \cdot 10 = 40$.
Quartiles:
• The 25th, 50th, and 75th percentiles are known as quartiles and
are denoted as Q1, Q2, and Q3.
• Quartiles serve as markers to divide the data.
• Q1 separates the lowest 25 percent.
• Q2 represents the median (50th percentile).
• Q3 marks the beginning of the top 25 percent of the data.
• Since quartiles are nothing more than percentiles, we construct
them in the same way as before.
Example:
Find Q1, Q2, and Q3 for the following data set of test scores.
50, 50, 62, 75, 77, 82, 86, 87, 88, 88
Solution:
Location of $Q_1$: $10 \cdot \frac{25}{100} = 2.5$, which rounds up to 3.
Location of $Q_2$: $10 \cdot \frac{50}{100} = 5$, so the 5th and 6th values are averaged.
Location of $Q_3$: $10 \cdot \frac{75}{100} = 7.5$, which rounds up to 8.
$Q_1$ = 25th percentile = 3rd data value = 62.
$Q_2$ = 50th percentile $= \frac{77 + 82}{2} = 79.5$.
$Q_3$ = 75th percentile = 8th data value = 87.
Interquartile Range:
• The interquartile range, which describes the range of the middle
fifty percent of the data, is given by
Interquartile range = Q3 – Q1.
• For the previous example the interquartile range is 87 – 62 = 25.
• A data point is considered an outlier if it lies more than 1.5 times the
interquartile range above the 75th percentile or more than 1.5 times the
interquartile range below the 25th percentile.
Box Plots:
• An important use of quartiles is the construction of box plots.
• A box plot is a graphical summary of data that looks like a box.
• It provides an alternative to the histogram for displaying data.
• A box plot is a graphical summary of central tendency, the spread, the
skewness, and the potential existence of outliers in the data.
• Below is a box plot of the test scores data set.
[Figure: box plot of the test scores on a horizontal axis running from 0 to 130.]
• The plot is constructed from five summary measures:
• largest data value
• smallest data value
• 25th percentile
• 75th percentile
• median
Example:
Find the outliers in this new data set of test scores.
12, 50, 62, 75, 77, 82, 86, 87, 88, 126
Q1 = 62, Q2 = 79.5, Q3 = 87, and interquartile range = 25
Solution:
Larger than 75th percentile + 1.5 times the interquartile range:
$87 + 1.5 \times 25 = 124.5$
Smaller than 25th percentile – 1.5 times the interquartile range:
$62 - 1.5 \times 25 = 24.5$
The outliers of this data set are 12 and 126.
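The outlier check can be sketched in Python by combining the percentile location rule with the 1.5 times interquartile range cutoffs. The names `percentile_value`, `lower_fence`, and `upper_fence` are hypothetical, and the percentile helper assumes P is a whole number strictly between 0 and 100.

```python
def percentile_value(values, p):
    """Pth percentile by the location rule (p: whole number, 0 < p < 100)."""
    ordered = sorted(values)
    product = len(ordered) * p              # location = product / 100
    loc = product // 100
    if product % 100 == 0:                  # integer location: average two values
        return (ordered[loc - 1] + ordered[loc]) / 2
    return ordered[loc]                     # rounded-up location, counting from 1

scores = [12, 50, 62, 75, 77, 82, 86, 87, 88, 126]
q1 = percentile_value(scores, 25)           # 62
q3 = percentile_value(scores, 75)           # 87
iqr = q3 - q1                               # 25
lower_fence = q1 - 1.5 * iqr                # 24.5
upper_fence = q3 + 1.5 * iqr                # 124.5

outliers = [x for x in scores if x < lower_fence or x > upper_fence]
print(outliers)  # [12, 126]
```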
Z-Scores:
• The z-score transforms the data value into the number of
standard deviations that value is from the mean.
$z = \frac{x - \mu}{\sigma}$
Remember:
$\mu$ = mean
$\sigma$ = standard deviation
• Describing the number of standard deviations is a fundamental
concept of statistics.
• It is used as a standardization technique.
• If the z-score is negative, the value is less than the mean.
• If the z-score is positive, the value is greater than the mean.
• The z-score is a unit-free measure.
Example:
Suppose you scored an 86 on your biology test and a 94 on your psychology
test. The mean and standard deviation of the two tests are given below.

Course        Mean    Standard Deviation
Biology       74      10
Psychology    82      11

What are the z-scores for your two tests?
On which test did you perform relatively better?
Solution:
The z-score for the biology test is:
$z = \frac{86 - 74}{10} = 1.2$.
The z-score for the psychology test is:
$z = \frac{94 - 82}{11} = 1.09$.
Even though the raw score on the psychology test is larger than the raw
score on the biology test, the performance on the biology test was slightly
better.
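The comparison can be sketched in Python. The function name `z_score` is chosen for the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

biology = z_score(86, 74, 10)
psychology = z_score(94, 82, 11)
print(round(biology, 2), round(psychology, 2))  # 1.2 1.09

# The larger z-score marks the relatively better performance.
better = "biology" if biology > psychology else "psychology"
print(better)  # biology
```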
Sections 4.5-4.10 Applying the Standard Deviation
Objectives:
• To calculate the coefficient of variation and use it to compare
the variation of different data sets.
• To calculate the mean, variance, and standard deviation of
grouped data.
• To use the empirical rule and Chebyshev’s Theorem to
describe the variability of data.
Section 4.5 Using the Standard Deviation
Empirical Rule:
If the distribution is bell-shaped:
One sigma rule: about 68% of the data should lie
within one standard deviation of the mean.
A deviation of more than one sigma is to be
expected once every three observations.
Two sigma rule: about 95% of the data should lie
within two standard deviations of the mean.
A deviation of more than two sigma is to be
expected about once every twenty observations.
Three sigma rule: about 99.7% of the data should
lie within three standard deviations of the mean.
A deviation of more than three sigma is to be
expected about once every 333 observations,
slightly less than 0.3% of the time.
Chebyshev’s Theorem:
• The proportion of any data set lying within k standard
deviations of the mean is at least $1 - \frac{1}{k^2}$, for $k > 1$.
• k = 2: At least $1 - \frac{1}{2^2} = \frac{3}{4}$ (or 75%) of the data values lie
within 2 standard deviations of the mean, for any data set.
• k = 3: At least $1 - \frac{1}{3^2} = \frac{8}{9}$ (or 88.9%) of the data values lie
within 3 standard deviations of the mean, for any data set.
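The bound is simple to tabulate. The Python sketch below uses exact fractions so the results match the fractions quoted above. The function name `chebyshev_bound` is hypothetical, and restricting k to whole numbers is a simplification of this sketch; the theorem itself holds for any k greater than 1.

```python
from fractions import Fraction

def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - Fraction(1, k ** 2)

print(chebyshev_bound(2))                    # 3/4
print(chebyshev_bound(3))                    # 8/9
print(round(float(chebyshev_bound(3)), 3))   # 0.889
```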
Section 4.8 The Coefficient of Variation
Coefficient of Variation:
• The coefficient of variation compares the variation in data
sets.
• For sample data:
$CV = \frac{s}{\bar{x}} \cdot 100\%$
• For a population:
$CV = \frac{\sigma}{\mu} \cdot 100\%$
• The coefficient of variation standardizes the variation
measure.
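As an illustration, the Python sketch below applies the sample formula to the six running times from Section 4.3. Reusing that data set here is a choice made for the example, not something the slides do, and the function name `coefficient_of_variation` is hypothetical.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

times = [4, 10, 9, 11, 9, 7]   # run times in minutes
cv = coefficient_of_variation(times)
print(round(cv, 1))  # 30.0
```

Because the standard deviation and the mean carry the same units, the ratio is unit free, which is what allows data sets measured on different scales to be compared.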
Section 4.9 Analyzing Grouped Data
Finding the Mean of Grouped Data:
• Finding the mean of grouped data involves finding the
midpoint of each of the classes in the frequency distribution
and then weighting each of these midpoints by the number of
observations in the class. Let
$f_i$ = number of observations in the $i$th group,
$N$ = the total number of observations in all classes, $N = \sum f_i$,
$M_i$ = midpoint of the $i$th class, and
$n$ = the number of observations in the sample.
• For a population the mean of grouped data is given by
$\mu = \frac{\sum f_i M_i}{N}$.
• If the grouped data represent sample observations the mean
is given by
$\bar{x} = \frac{\sum f_i M_i}{n}$.
Finding the Variance of Grouped Data:
• Let
$f_i$ = number of observations in the $i$th group,
$N$ = the total number of observations in all classes, $N = \sum f_i$,
$M_i$ = midpoint of the $i$th class, and
$n$ = the number of observations in the sample.
• The population variance of grouped data is given by the
expression
$\sigma^2 = \frac{\sum f_i M_i^2 - \frac{\left(\sum f_i M_i\right)^2}{N}}{N}$.
• The sample variance is given by
$s^2 = \frac{\sum f_i M_i^2 - \frac{\left(\sum f_i M_i\right)^2}{n}}{n - 1}$.
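The grouped-data calculations can be sketched in Python. The slides give no grouped data set, so the midpoints and frequencies below are hypothetical values invented for the illustration, as are the function names. The variance uses the computational form: the sum of each frequency times its squared midpoint, minus the squared sum of frequency times midpoint divided by the number of observations, all divided by N for a population or n - 1 for a sample.

```python
# Hypothetical frequency distribution (not from the slides).
midpoints = [5, 15, 25]     # class midpoints M_i
frequencies = [2, 5, 3]     # class frequencies f_i

def grouped_mean(mid, freq):
    """Weight each midpoint by its class frequency."""
    return sum(f * m for f, m in zip(freq, mid)) / sum(freq)

def grouped_variance(mid, freq, sample=True):
    """Computational form of the grouped-data variance."""
    n = sum(freq)
    sum_fm = sum(f * m for f, m in zip(freq, mid))
    sum_fm2 = sum(f * m ** 2 for f, m in zip(freq, mid))
    numerator = sum_fm2 - sum_fm ** 2 / n
    return numerator / (n - 1 if sample else n)

print(grouped_mean(midpoints, frequencies))                    # 16.0
print(grouped_variance(midpoints, frequencies, sample=False))  # 49.0
print(round(grouped_variance(midpoints, frequencies), 3))      # 54.444
```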
Section 4.10 Proportions
Proportions:
• A proportion measures the fraction of a group that
possesses some characteristic.
• To calculate the proportion, simply count the number in the
group that possess the characteristic and divide the count by
the number in the group. Let
X = number that possess the characteristic,
N = number in the population, and
n = number in the sample; then
$p = \frac{X}{N}$, the population proportion, and
$\hat{p} = \frac{X}{n}$, the sample proportion.
• The symbol $\hat{p}$ is pronounced p-hat.
Example:
Suppose your statistics class is composed of 48 students of which 4
are left-handed. What proportion of the class is left-handed? What
proportion of the class is right-handed?
Solution:
$p = \frac{X}{N} = \frac{4}{48} \approx .083$
Then .083 is the proportion of people in the class that are left-handed.
$p = \frac{X}{N} = \frac{44}{48} \approx .917$
Then .917 is the proportion of people in the class that are right-handed.
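A short Python sketch of the same calculation follows. The function name `proportion` is hypothetical, and the sketch assumes, as the example does, that every student is either left-handed or right-handed.

```python
def proportion(count, group_size):
    """Fraction of the group that possesses the characteristic."""
    return count / group_size

class_size = 48
left_handed = 4
right_handed = class_size - left_handed   # assumes everyone is one or the other

p_left = proportion(left_handed, class_size)
p_right = proportion(right_handed, class_size)
print(round(p_left, 3))    # 0.083
print(round(p_right, 3))   # 0.917
```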