Transcript Chapter 3

Business Statistics, 3e
by Ken Black
Chapter 3
Descriptive Statistics
Business Statistics: Contemporary Decision Making, 3e, by Black. © 2001 South-Western/Thomson Learning
Learning Objectives
• Distinguish between measures of central
tendency, measures of variability, and
measures of shape
• Understand the meanings of mean, median,
mode, quartile, percentile, and range
• Compute mean, median, mode, percentile,
quartile, range, variance, standard deviation,
and mean absolute deviation on ungrouped
data
Learning Objectives -- Continued
• Differentiate between sample and
population variance and standard deviation
• Understand the meaning of standard
deviation as it is applied by using the
empirical rule and Chebyshev’s theorem
• Compute the mean, median, standard
deviation, and variance on grouped data
• Understand box and whisker plots,
skewness, and kurtosis
Measures of Central Tendency:
Ungrouped Data
• Measures of central tendency yield
information about “particular places or
locations in a group of numbers.”
• Common Measures of Location
– Mode
– Median
– Mean
– Percentiles
– Quartiles
Mode
• The most frequently occurring value in a
data set
• Applicable to all levels of data
measurement (nominal, ordinal, interval,
and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more
than two modes
Mode -- Example
• The mode is 44.
• There are more 44s than any other value.
Data (24 values, ordered):
35 37 37 39 40 40 41 41 43 43 43 43
44 44 44 44 44 45 45 46 46 46 46 48
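As a rough check of the example above, the mode can be found by counting how often each value occurs. The following is a minimal Python sketch; the function name `modes` is an illustrative choice and does not come from the slides. It returns a list so that bimodal and multimodal data sets are reported in full.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]
print(modes(data))  # [44]
```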
Median
• Middle value in an ordered array of
numbers.
• Applicable for ordinal, interval, and ratio
data
• Not applicable for nominal data
• Unaffected by extremely large and
extremely small values.
Median: Computational Procedure
• First Procedure
– Arrange the observations in an ordered array.
– If there is an odd number of terms, the median
is the middle term of the ordered array.
– If there is an even number of terms, the median
is the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is
given by (n+1)/2.
Median: Example
with an Odd Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
• If the 22 is replaced by 100, the median is
15.
• If the 3 is replaced by -103, the median is
15.
Median: Example
with an Even Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
• There are 16 terms in the ordered array.
• Position of median = (n+1)/2 = (16+1)/2 = 8.5
• The median is between the 8th and 9th terms,
14.5.
• If the 21 is replaced by 100, the median is
14.5.
• If the 3 is replaced by -88, the median is 14.5.
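The two median examples can be reproduced with a short Python sketch of the (n+1)/2 position rule. The function name `median_by_position` is hypothetical, chosen only for illustration.

```python
def median_by_position(values):
    """Median located by the 1-based position (n + 1) / 2 in the ordered array."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    lower = ordered[int(position) - 1]       # term at the whole-number position
    if position.is_integer():
        return lower                         # odd n: the middle term
    upper = ordered[int(position)]           # even n: the next term up
    return (lower + upper) / 2               # average of the middle two terms


odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = odd[:-1]
print(median_by_position(odd))   # 15
print(median_by_position(even))  # 14.5
```

Replacing the largest or smallest term with an extreme value leaves both results unchanged, as the examples state.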
Arithmetic Mean
• Commonly called ‘the mean’
• Is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set,
including extreme values
• Computed by summing all values in the
data set and dividing the sum by the number
of values in the data set
Population Mean
$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$
$= \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$
Sample Mean
$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
$= \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$
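The population and sample means use the same arithmetic; only the notation differs. A minimal Python sketch of both examples follows, with `mean` as an illustrative helper name.

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


population = [24, 13, 19, 26, 11]
sample = [57, 86, 42, 38, 90, 66]
print(mean(population))        # 18.6
print(round(mean(sample), 3))  # 63.167
```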
Percentiles
• Measures of central tendency that divide a
group of data into 100 parts
• At least n% of the data lie below the nth
percentile, and at most (100 - n)% of the data
lie above the nth percentile
• Example: 90th percentile indicates that at least
90% of the data lie below it, and at most 10%
of the data lie above it
• The median and the 50th percentile have the
same value.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
Percentiles: Computational Procedure
• Organize the data into an ascending ordered
array.
• Calculate the percentile location: $i = \frac{P}{100}(n)$
• Determine the percentile’s location and its
value.
• If i is a whole number, the percentile is the
average of the values at the i and (i+1)
positions.
• If i is not a whole number, the percentile is at
the position given by the whole-number part of (i+1).
Percentiles: Example
• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Location of 30th percentile: $i = \frac{30}{100}(8) = 2.4$
• The location index, i, is not a whole
number; i+1 = 2.4+1=3.4; the whole
number portion is 3; the 30th percentile is at
the 3rd location of the array; the 30th
percentile is 13.
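The location-index procedure can be sketched in Python as below. This assumes P is an integer between 1 and 99, so the whole-number test can be done with exact integer arithmetic; the function name `percentile` is illustrative. Library routines often use different interpolation rules and may give slightly different answers.

```python
def percentile(values, p):
    """Percentile by the location index i = (P/100)(n), for integer p in 1..99."""
    if not values:
        raise ValueError("percentile of an empty data set is undefined")
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(values)
    # whole = whole-number part of i; remainder == 0 means i is a whole number
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # average of the values at positions i and i + 1 (1-based)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # whole-number part of i + 1 as a 1-based position is index `whole` (0-based)
    return ordered[whole]


raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))  # 13
```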
Quartiles
• Measures of central tendency that divide a group
of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second
quartile
• Q3: 75% of the data set is below the third quartile
• Q1 is equal to the 25th percentile
• Q2 is located at 50th percentile and equals the
median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the
data set
Quartiles
[Figure: a distribution divided by Q1, Q2, and Q3 into four parts, each containing 25% of the data]
Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121, 122,
125, 129
• Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
• Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
• Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
Variability
[Figure: two panels, "No Variability in Cash Flow" and "Variability in Cash Flow", each marked with the mean]
Variability
[Figure: two panels, "Variability" and "No Variability"]
Measures of Variability:
Ungrouped Data
• Measures of variability describe the spread
or the dispersion of a set of data.
• Common Measures of Variability
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
Range
• The difference between the largest and the
smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest - Smallest = 48 - 35 = 13
Data (24 values, ordered):
35 37 37 39 40 40 41 41 43 43 43 43
44 44 44 44 44 45 45 46 46 46 46 48
Interquartile Range
• Range of values between the first and third
quartiles
• Range of the “middle half”
• Less influenced by extremes
$Interquartile\ Range = Q_3 - Q_1$
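The slides give no numeric example for the interquartile range, so the sketch below reuses the ordered array from the quartiles example (106 through 129). It is a minimal Python illustration, with `percentile` as a hypothetical helper that follows the location-index procedure for integer percentiles.

```python
def percentile(values, p):
    """Percentile via the location index i = (P/100)(n); p is an integer 1..99."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]


data = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1                       # range of the middle half
data_range = max(data) - min(data)  # uses only the two extremes
print(q1, q2, q3)       # 111.5 118.5 123.5
print(iqr, data_range)  # 12.0 23
```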
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
• Deviations from the mean: -8, -4, 3, 4, 5
[Figure: number line from 0 to 20 showing each deviation from the mean: -8, -4, +3, +4, +5]
Mean Absolute Deviation
• Average of the absolute deviations from the
mean
X        X - μ    |X - μ|
5        -8       8
9        -4       4
16       +3       3
17       +4       4
18       +5       5
Total    0        24

$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$
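The mean absolute deviation calculation can be reproduced with a few lines of Python. This is a sketch only; `mean_absolute_deviation` is an illustrative name.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)


data = [5, 9, 16, 17, 18]
print(mean_absolute_deviation(data))  # 4.8
```

Without the absolute value, the deviations always sum to zero, which is why they cannot be averaged directly.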
Population Variance
• Average of the squared deviations from the
arithmetic mean
X        X - μ    (X - μ)^2
5        -8       64
9        -4       16
16       +3       9
17       +4       16
18       +5       25
Total    0        130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
Population Standard Deviation
• Square root of the
variance
X        X - μ    (X - μ)^2
5        -8       64
9        -4       16
16       +3       9
17       +4       16
18       +5       25
Total    0        130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$
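A minimal Python sketch of the population variance and standard deviation for the data set 5, 9, 16, 17, 18 follows. The function names are illustrative; Python's standard `statistics.pvariance` and `statistics.pstdev` compute the same quantities.

```python
import math


def population_variance(values):
    """Average of the squared deviations from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [5, 9, 16, 17, 18]
print(population_variance(data))           # 26.0
print(round(population_std_dev(data), 1))  # 5.1
```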
Sample Variance
• Sum of the squared deviations from the
arithmetic mean, divided by n - 1

X        X - X̄    (X - X̄)^2
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866   (totals)

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663,866}{3} = 221,288.67$
Sample Standard Deviation
• Square root of the
sample variance
X        X - X̄    (X - X̄)^2
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866   (totals)

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663,866}{3} = 221,288.67$
$S = \sqrt{S^2} = \sqrt{221,288.67} = 470.41$
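The sample versions differ from the population versions only in the divisor, n - 1 instead of N. The sketch below reproduces the example in Python; the function names are illustrative, and the standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2398, 1844, 1539, 1311]
print(round(sample_variance(data), 2))  # 221288.67
print(round(sample_std_dev(data), 2))   # 470.41
```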
Uses of Standard Deviation
• Indicator of financial risk
• Quality Control
– construction of quality control charts
– process capability studies
• Comparing populations
– household incomes in two cities
– employee absenteeism at two plants
Standard Deviation as an
Indicator of Financial Risk
Annualized Rate of Return
Financial Security    μ      σ
A                     15%    3%
B                     15%    7%
Empirical Rule
• Data are normally distributed (or approximately
normal)

Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7
Chebyshev’s Theorem
• Applies to all distributions
$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2}$
for k > 1
Chebyshev’s Theorem
• Applies to all distributions
Number of Standard    Distance from    Minimum Proportion of Values
Deviations            the Mean         Falling Within Distance
k = 2                 μ ± 2σ           1 - 1/2^2 = 0.75
k = 3                 μ ± 3σ           1 - 1/3^2 = 0.89
k = 4                 μ ± 4σ           1 - 1/4^2 = 0.94
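The minimum proportions in the table follow directly from 1 - 1/k^2. A small Python sketch, with an illustrative function name, is shown here.

```python
def chebyshev_minimum_proportion(k):
    """Lower bound on the proportion of values within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(k, round(chebyshev_minimum_proportion(k), 2))
# 2 0.75
# 3 0.89
# 4 0.94
```

These are guaranteed minimums for any distribution; for approximately normal data the empirical rule gives the larger figures of 95% and 99.7% at two and three standard deviations.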
Coefficient of Variation
• Ratio of the standard deviation to the mean,
expressed as a percentage
• Measurement of relative dispersion
$C.V. = \frac{\sigma}{\mu}(100)$
Coefficient of Variation
$\mu_1 = 29$, $\sigma_1 = 4.6$
$C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$
$\mu_2 = 84$, $\sigma_2 = 10$
$C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$
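The two coefficients of variation can be checked with a short Python sketch; `coefficient_of_variation` is an illustrative name.

```python
def coefficient_of_variation(sigma, mu):
    """Standard deviation as a percentage of the mean."""
    return sigma / mu * 100


cv1 = coefficient_of_variation(4.6, 29)
cv2 = coefficient_of_variation(10, 84)
print(round(cv1, 2))  # 15.86
print(round(cv2, 2))  # 11.9
```

Although the second data set has the larger standard deviation (10 versus 4.6), its dispersion relative to its mean is smaller.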
Measures of Central Tendency
and Variability: Grouped Data
• Measures of Central Tendency
– Mean
– Median
– Mode
• Measures of Variability
– Variance
– Standard Deviation
Mean of Grouped Data
• Weighted average of class midpoints
• Class frequencies are the weights
$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \dots + f_i M_i}{f_1 + f_2 + f_3 + \dots + f_i}$
Calculation of Grouped Mean
Class Interval    Frequency (f)    Class Midpoint (M)    fM
20-under 30       6                25                    150
30-under 40       18               35                    630
40-under 50       11               45                    495
50-under 60       11               55                    605
60-under 70       3                65                    195
70-under 80       1                75                    75
Total             50                                     2150

$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
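The grouped mean is a frequency-weighted average of class midpoints, which the following Python sketch reproduces. Representing each class as a `(lower, upper, frequency)` tuple is an assumption made for illustration.

```python
def grouped_mean(classes):
    """Weighted average of class midpoints, with frequencies as weights."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fm / total_f


# (lower limit, upper limit, frequency) for each class interval
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
print(grouped_mean(classes))  # 43.0
```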
Median of Grouped Data
$Median = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$
Where:
$L$ = the lower limit of the median class
$cf_p$ = cumulative frequency of class preceding the median class
$f_{med}$ = frequency of the median class
$W$ = width of the median class
$N$ = total of frequencies
Median of Grouped Data -- Example
Class Interval    Frequency    Cumulative Frequency
20-under 30       6            6
30-under 40       18           24
40-under 50       11           35
50-under 60       11           46
60-under 70       3            49
70-under 80       1            50
N = 50

$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$
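The grouped median formula can be sketched in Python by walking the classes until the cumulative frequency reaches N/2. The `(lower, upper, frequency)` tuple layout and the function name are illustrative assumptions.

```python
def grouped_median(classes):
    """Median = L + ((N/2 - cf_p) / f_med) * W for the median class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:  # this is the median class
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f


classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
print(round(grouped_median(classes), 3))  # 40.909
```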
Mode of Grouped Data
• Midpoint of the modal class
• Modal class has the greatest frequency
Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1

$Mode = \frac{30 + 40}{2} = 35$
Variance and Standard Deviation
of Grouped Data
Population
$\sigma^2 = \frac{\sum f(M - \mu)^2}{N}$
$\sigma = \sqrt{\sigma^2}$
Sample
$S^2 = \frac{\sum f(M - \bar{X})^2}{n - 1}$
$S = \sqrt{S^2}$
Population Variance and Standard
Deviation of Grouped Data
Class Interval    f     M     fM      M - μ    (M - μ)^2    f(M - μ)^2
20-under 30       6     25    150     -18      324          1944
30-under 40       18    35    630     -8       64           1152
40-under 50       11    45    495     2        4            44
50-under 60       11    55    605     12       144          1584
60-under 70       3     65    195     22       484          1452
70-under 80       1     75    75      32       1024         1024
Total             50          2150                          7200

$\sigma^2 = \frac{\sum f(M - \mu)^2}{N} = \frac{7200}{50} = 144$
$\sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$
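The grouped population variance and standard deviation can be checked with the Python sketch below. As before, the `(lower, upper, frequency)` class layout and the function name are assumptions for illustration.

```python
import math


def grouped_population_variance(classes):
    """Sum of f * (M - mu)^2 over all classes, divided by N."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lower + upper) / 2, f) for lower, upper, f in classes]
    mu = sum(f * m for m, f in midpoints) / n
    return sum(f * (m - mu) ** 2 for m, f in midpoints) / n


classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]
variance = grouped_population_variance(classes)
print(variance)             # 144.0
print(math.sqrt(variance))  # 12.0
```

For sample data the same sum would be divided by n - 1 instead of N.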
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a distribution
• Kurtosis
– Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal shape
– Platykurtic: flat and spread out
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness
Skewness
[Figure: three distributions: Negatively Skewed, Symmetric (Not Skewed), Positively Skewed]
Skewness
[Figure: positions of the mean, median, and mode in Negatively Skewed, Symmetric (Not Skewed), and Positively Skewed distributions]
Coefficient of Skewness
• Summary measure for skewness
$S = \frac{3(\mu - M_d)}{\sigma}$
• If S < 0, the distribution is negatively skewed
(skewed to the left).
• If S = 0, the distribution is symmetric (not
skewed).
• If S > 0, the distribution is positively skewed
(skewed to the right).
Coefficient of Skewness
$\mu_1 = 23$, $M_{d1} = 26$, $\sigma_1 = 12.3$
$S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$
$\mu_2 = 26$, $M_{d2} = 26$, $\sigma_2 = 12.3$
$S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$
$\mu_3 = 29$, $M_{d3} = 26$, $\sigma_3 = 12.3$
$S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$
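The three cases above can be reproduced with a short Python sketch of the coefficient of skewness. The function names are illustrative only.

```python
def coefficient_of_skewness(mu, median, sigma):
    """S = 3 * (mean - median) / standard deviation."""
    return 3 * (mu - median) / sigma


def describe(s):
    """Classify the sign of S as in the slide's three cases."""
    if s < 0:
        return "negatively skewed"
    if s > 0:
        return "positively skewed"
    return "symmetric"


for mu in (23, 26, 29):
    s = coefficient_of_skewness(mu, 26, 12.3)
    print(mu, round(s, 2), describe(s))
```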
Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out
[Figure: three distributions: Leptokurtic, Mesokurtic, Platykurtic]
Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
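The fence formulas can be sketched in Python as follows. The slides give no data for this step, so the example borrows Q1 = 111.5 and Q3 = 123.5 from the quartiles example earlier in the chapter; the function name and the dictionary keys are illustrative.

```python
def fences(q1, q3):
    """Inner and outer fences at 1.5 and 3.0 interquartile ranges from the box."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }


result = fences(111.5, 123.5)
print(result["lower_inner"], result["upper_inner"])  # 93.5 141.5
print(result["lower_outer"], result["upper_outer"])  # 75.5 159.5
```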
Box and Whisker Plot
[Figure: box and whisker plot marking the Minimum, Q1, Q2, Q3, and Maximum]
Skewness: Box and Whisker Plots,
and Coefficient of Skewness
[Figure: box and whisker plots for three cases: S < 0, Negatively Skewed; S = 0, Symmetric (Not Skewed); S > 0, Positively Skewed]