
Chapter 3
Descriptive Statistics: Numerical
Methods
McGraw-Hill/Irwin
Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.
Descriptive Statistics
3.1 Describing Central Tendency
3.2 Measures of Variation
3.3 Percentiles, Quartiles and Box-and-Whiskers Displays
3.4 Covariance, Correlation, and the
Least Squares Line
3.5 Weighted Means and Grouped Data
(Optional)
3.6 The Geometric Mean (Optional)
3-2
Describing Central Tendency
• In addition to describing the shape of a
distribution, we want to describe the data
set’s central tendency
– A measure of central tendency represents
the center or middle of the data
3-3
Parameters and Statistics
• A population parameter is a number
calculated from all the population
measurements that describes some
aspect of the population
• A sample statistic is a number
calculated using the sample
measurements that describes some
aspect of the sample
3-4
Measures of Central Tendency
Mean, µ: The average or expected value
Median, Md: The value of the middle point of the ordered measurements
Mode, Mo: The most frequent value
3-5
The Mean
Population: X1, X2, …, XN, with population mean µ
Sample: x1, x2, …, xn, with sample mean $\bar{x}$
Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
3-6
The Sample Mean
For a sample of size n, the sample mean is defined as
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
and is a point estimate of the population mean µ
• It is the value to expect, on average and in the long run
3-7
Example 3.1: The Car Mileage Case
• Example 3.1: Sample mean for the first five
car mileages from Table 3.1:
30.8, 31.7, 30.1, 31.6, 32.1
$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$
$\bar{x} = \frac{30.8 + 31.7 + 30.1 + 31.6 + 32.1}{5} = \frac{156.3}{5} = 31.26$
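The sample mean calculation in Example 3.1 can be reproduced with a short Python sketch. The function name `sample_mean` is an illustrative choice, not something from the slides; the data are the five mileages listed above.

```python
# First five car mileages from Table 3.1, as listed in Example 3.1.
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]


def sample_mean(values):
    """Return the sum of the measurements divided by the number of measurements."""
    return sum(values) / len(values)


x_bar = sample_mean(mileages)
print(round(x_bar, 2))  # 31.26
```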
3-8
The Median
The median Md is a value such that 50% of
all measurements, after having been
arranged in numerical order, lie above (or
below) it
1. If the number of measurements is odd, the
median is the middlemost measurement in the
ordering
2. If the number of measurements is even, the
median is the average of the two middlemost
measurements in the ordering
3-9
Example: Car Mileage Case
• Example 3.1: First five observations
from Table 3.1:
30.8, 31.7, 30.1, 31.6, 32.1
• In order: 30.1, 30.8, 31.6, 31.7, 32.1
• The number of measurements is odd, so the
median is the middle one, 31.6
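The odd/even rule for the median can be sketched in Python as follows. The function name `median` and the even-sized list `[1, 2, 3, 4]` are illustrative assumptions; the sketch also assumes a non-empty list of measurements.

```python
def median(values):
    """Return the middle measurement (odd count) or the average of the
    two middle measurements (even count) after sorting."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the middlemost measurement in the ordering.
        return ordered[mid]
    # Even count: the average of the two middlemost measurements.
    return (ordered[mid - 1] + ordered[mid]) / 2


# First five car mileages from Table 3.1.
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]
print(median(mileages))      # 31.6
print(median([1, 2, 3, 4]))  # 2.5 (illustrative even-sized data)
```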
3-10
The Mode
The mode Mo of a population or sample of
measurements is the measurement that
occurs most frequently
– Modes are the values that are observed “most
typically”
– Sometimes higher frequencies at two or more
values
• If there are two modes, the data is bimodal
• If more than two modes, the data is multimodal
– When data are in classes, the class with the
highest frequency is the modal class
• The tallest box in the histogram
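A minimal Python sketch of finding the mode, or several modes when the highest frequency is shared. The function name `modes` and both data lists are hypothetical and chosen only to show the unimodal and bimodal cases.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([3, 4, 4, 5, 5, 5]))  # [5]     one mode
print(modes([3, 3, 4, 5, 5]))     # [3, 5]  two modes, so the data is bimodal
```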
3-11
Histogram Describing the 50 Mileages
3-12
Relationships Among Mean, Median
and Mode
3-13
Measures of Variation
• Knowing the measures of central tendency is
not enough
• Both of the distributions below have identical
measures of central tendency
3-14
Measures of Variation
Range: Largest minus the smallest measurement
Variance: The average of the squared deviations of all the population measurements from the population mean
Standard Deviation: The square root of the variance
3-15
The Range
• Largest minus smallest
• Measures the interval spanned by all
the data
• For Figure 3.13, largest repair time is 5
and smallest is 3
• Range is 5 – 3 = 2 days
3-16
Population Variance and Standard
Deviation
• The population variance (σ²) is the
average of the squared deviations of
the individual population measurements
from the population mean (µ)
• The population standard deviation (σ) is
the positive square root of the
population variance
3-17
Variance
• For a population of size N, the
population variance σ² is:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}$
• For a sample of size n, the sample
variance s² is:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
3-18
Standard Deviation
• Population standard deviation (σ): $\sigma = \sqrt{\sigma^2}$
• Sample standard deviation (s): $s = \sqrt{s^2}$
3-19
Example: Chris’s Class Sizes This
Semester
• Data points are: 60, 41, 15, 30, 34
• Mean is 36
• Variance is:
$\sigma^2 = \frac{(60 - 36)^2 + (41 - 36)^2 + (15 - 36)^2 + (30 - 36)^2 + (34 - 36)^2}{5} = \frac{576 + 25 + 441 + 36 + 4}{5} = \frac{1082}{5} = 216.4$
Standard deviation is:
$\sigma = \sqrt{216.4} = 14.71$
3-20
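The class-size example (60, 41, 15, 30, 34) treats the five classes as a whole population, so the divisor is N. A Python sketch of that calculation follows; the function name `population_variance` is an illustrative assumption.

```python
import math

# Chris's class sizes this semester, treated as a complete population.
class_sizes = [60, 41, 15, 30, 34]


def population_variance(values):
    """Average of the squared deviations from the population mean (divisor N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


sigma_sq = population_variance(class_sizes)
sigma = math.sqrt(sigma_sq)
print(sigma_sq)         # 216.4
print(round(sigma, 2))  # 14.71
```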
Example: Sample Variance and
Standard Deviation
• Example 3.7: Data for the first five car
mileages from Table 3.1 are 30.8, 31.7,
30.1, 31.6, 32.1
• The sample mean is 31.26
$s^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1} = \frac{(30.8 - 31.26)^2 + (31.7 - 31.26)^2 + (30.1 - 31.26)^2 + (31.6 - 31.26)^2 + (32.1 - 31.26)^2}{4} = \frac{2.572}{4} = 0.643$
$s = \sqrt{s^2} = \sqrt{0.643} = 0.8019$
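The sample calculation in Example 3.7 differs from the population case only in the divisor, n - 1 instead of N. A Python sketch, with `sample_variance` as an illustrative function name:

```python
import math

# First five car mileages from Table 3.1.
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


s_sq = sample_variance(mileages)
s = math.sqrt(s_sq)
print(round(s_sq, 3))  # 0.643
print(round(s, 4))     # 0.8019
```

Python's standard `statistics` module follows the same split: `statistics.variance` divides by n - 1, and `statistics.pvariance` divides by N.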
3-21
The Empirical Rule for Normal
Populations
• If a population has mean µ and standard
deviation σ and is described by a normal
curve, then
– 68.26% of the population measurements lie within
one standard deviation of the mean: [µ-σ, µ+σ]
– 95.44% of the population measurements lie within
two standard deviations of the mean: [µ-2σ, µ+2σ]
– 99.73% of the population measurements lie within
three standard deviations of the mean: [µ-3σ,
µ+3σ]
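The three percentages can be checked numerically. For a normal population, the fraction within k standard deviations of the mean equals erf(k/√2), which the sketch below evaluates with Python's `math.erf`. The function name `normal_coverage` is illustrative. The computed values for k = 1 and k = 2 come out as 68.27% and 95.45%; the slide's 68.26% and 95.44% differ in the last digit, most likely because they were taken from a rounded normal table.

```python
import math


def normal_coverage(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_coverage(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```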
3-22
The Empirical Rule and Tolerance
Intervals
3-23
Example 3.9: The Car Mileage Case
Continued
• An estimated 68.26% of all individual cars will have
mileages in the range
[$\bar{x}$ ± s] = [31.6 ± 0.8] = [30.8, 32.4] mpg
• An estimated 95.44% of all individual cars will have
mileages in the range
[$\bar{x}$ ± 2s] = [31.6 ± 1.6] = [30.0, 33.2] mpg
• An estimated 99.73% of all individual cars will have
mileages in the range
[$\bar{x}$ ± 3s] = [31.6 ± 2.4] = [29.2, 34.0] mpg
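The three intervals in Example 3.9 follow directly from the mean of 31.6 and standard deviation of 0.8 used on the slide. A Python sketch, where the function name `tolerance_interval` and the rounding to one decimal place are assumptions made for display:

```python
# Sample mean and standard deviation as used in Example 3.9.
x_bar = 31.6
s = 0.8


def tolerance_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd), rounded to one decimal place."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))


for k in (1, 2, 3):
    print(k, tolerance_interval(x_bar, s, k))
# 1 (30.8, 32.4)
# 2 (30.0, 33.2)
# 3 (29.2, 34.0)
```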
3-24
Estimated Tolerance Intervals in the Car
Mileage Case
3-25
Chebyshev’s Theorem
• Let µ and σ be a population’s mean and
standard deviation. Then, for any value
k > 1:
• At least 100(1 - 1/k²)% of the
population measurements lie in the
interval [µ-kσ, µ+kσ]
• Only practical for a non-mound-shaped
population that is not very
skewed
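To see what the bound 100(1 - 1/k²)% gives for specific k, here is a small Python sketch. The function name `chebyshev_lower_bound` is illustrative; the values k = 2 and k = 3 are chosen only as examples.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of measurements within k standard deviations
    of the mean, by Chebyshev's Theorem (requires k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 100 * (1 - 1 / k ** 2)


for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 2))
# 2 75.0
# 3 88.89
```

These bounds hold for any population shape, which is why they are looser than the Empirical Rule percentages for a normal population.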
3-26
z Scores
• For any x in a population or sample, the
associated z score is
$z = \frac{x - \text{mean}}{\text{standard deviation}}$
• The z score is the number of standard
deviations that x is from the mean
– A positive z score is for x above (greater
than) the mean
– A negative z score is for x below (less
than) the mean
3-27
Example: z Score
• Population of profit margins for five American
companies: 8%, 10%, 15%, 12%, 5%
• µ = 10%, σ = 3.406%
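The z scores for the five profit margins can be computed from the stated µ and σ. The Python sketch below recomputes both from the data, treating the five companies as a population; the variable names are illustrative, and the z scores are rounded to two decimals for display.

```python
import math

# Profit margins (in percent) for the five companies in the example.
margins = [8, 10, 15, 12, 5]

# Population mean and population standard deviation (divisor N).
mu = sum(margins) / len(margins)
sigma = math.sqrt(sum((x - mu) ** 2 for x in margins) / len(margins))

# z score: number of standard deviations each margin lies from the mean.
z_scores = [round((x - mu) / sigma, 2) for x in margins]

print(mu)               # 10.0
print(round(sigma, 3))  # 3.406
print(z_scores)         # [-0.59, 0.0, 1.47, 0.59, -1.47]
```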
3-28
Coefficient of Variation
• Measures the size of the standard deviation
relative to the size of the mean
• Coefficient of variation = (standard
deviation / mean) × 100%
• Used to:
– Compare the relative variabilities of values about
the mean
– Compare the relative variability of populations or
samples with different means and different
standard deviations
– Measure risk
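As a worked instance of the formula, the sketch below applies it to the profit-margin figures from the z score example (µ = 10%, σ = 3.406%). The function name `coefficient_of_variation` is illustrative, and the zero-mean check is an added safeguard rather than something stated on the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100


# Profit-margin figures from the z score example: sigma = 3.406%, mu = 10%.
print(round(coefficient_of_variation(3.406, 10), 2))  # 34.06
```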
3-29