Transcript Chapter 3
© 2002 Thomson / South-Western
Chapter 3 Descriptive Statistics Slide 3-1
Learning Objectives
• Distinguish between measures of central tendency, measures of variability, and measures of shape
• Understand the meanings of mean, median, mode, quartile, percentile, and range
• Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation
Slide 3-2
Learning Objectives -- Continued
• Differentiate between sample and population variance and standard deviation
• Understand the meaning of standard deviation as it is applied by using the empirical rule
• Understand box and whisker plots, skewness, and kurtosis
Slide 3-3
Measures of Central Tendency
• Measures of central tendency yield information about “particular places or locations in a group of numbers.”
• Common Measures of Location
– Mode
– Median
– Mean
– Percentiles
– Quartiles
Slide 3-4
Mode
• The most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more than two modes
Slide 3-5
Mode -- Example
• The mode is 44.
• There are more 44s than any other value.
Data (24 values, in ascending order):
35 37 37 39 40 40
41 41 43 43 43 43
44 44 44 44 44 45
45 46 46 46 46 48
Slide 3-6
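The mode of the data on the slide above can be checked with a short Python sketch. The helper name `modes` is an illustration only, not something from the slides; it returns every value tied for the highest frequency, so it also covers the bimodal and multimodal cases.

```python
from collections import Counter

# The 24 values from the Mode example slide, in ascending order.
data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

def modes(values):
    """Return every value tied for the highest frequency, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes(data))            # [44]
print(Counter(data)[44])      # 5 occurrences of 44
print(modes([1, 1, 2, 2]))    # [1, 2], a bimodal data set
```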
Median (ΔΙΑΜΕΣΟΣ)
• Middle value in an ordered array of numbers
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely small values
Slide 3-7
Median: Computational Procedure
• First Procedure
– Arrange observations in an ordered array.
– If the number of terms is odd, the median is the middle term of the ordered array.
– If the number of terms is even, the median is the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is given by (n+1)/2.
– When n is even, this position falls halfway between the two middle terms, whose average is the median.
Slide 3-8
Median: Example with an Odd Number of Terms
Ordered array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
• If the 22 is replaced by 100, the median remains at 15.
• If the 3 is replaced by -103, the median remains at 15.
Slide 3-9
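The median procedure from the two slides above can be sketched in Python as follows. The function name `median` is chosen here for illustration; it sorts the input, then takes the middle term (odd count) or the average of the two middle terms (even count).

```python
def median(values):
    """Median by the first procedure: sort, then take the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd count: the middle term
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two

terms = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]

print((len(terms) + 1) / 2)            # 9.0, the median's position
print(median(terms))                   # 15
print(median(terms[:-1] + [100]))      # 15, after replacing 22 with 100
print(median([-103] + terms[1:]))      # 15, after replacing 3 with -103
```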
Mean (ΜΕΣΟΣ)
• Is the average of a group of numbers
• Applicable for interval and ratio data, not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set
Slide 3-10
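To illustrate the point that the mean is affected by extreme values, the sketch below reuses the 17-term array from the median example and replaces 22 with 100. The helper names `mean` and `middle_term` are assumptions made for this illustration.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

def middle_term(ordered):
    """Median of an already ordered array with an odd number of terms."""
    return ordered[len(ordered) // 2]

terms = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
with_extreme = terms[:-1] + [100]      # replace 22 with 100

print(round(mean(terms), 2))           # 13.29  (226 / 17)
print(round(mean(with_extreme), 2))    # 17.88  (304 / 17)
print(middle_term(terms), middle_term(with_extreme))   # 15 15
```

The single extreme value moves the mean by more than four units, while the median stays at 15.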
Population Mean
$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{93}{5} = 18.6$
Slide 3-11
Sample Mean
$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{379}{6} \approx 63.17$
Slide 3-12
Quartiles
Measures of central tendency that divide a group of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile
Slide 3-13
Quartiles, continued
• Q1 is equal to the 25th percentile
• Q2 is located at the 50th percentile and equals the median
• Q3 is equal to the 75th percentile
Quartile values are not necessarily members of the data set.
Slide 3-14
Quartiles
[Figure: Q1, Q2, and Q3 divide the ordered data into four groups, each holding 25% of the values.]
Slide 3-15
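Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121, 122, 125, 129 (n = 8)
• Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
• Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
• Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
Each location i is a whole number here, so each quartile is the average of the i-th and (i+1)-th terms of the ordered array.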
Slide 3-16
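The location calculation in the quartile example can be sketched in Python. The function name `percentile` is hypothetical. The whole-number branch follows the slide (average of the i-th and (i+1)-th terms); the branch for a fractional i is an assumption, since the slide only shows whole-number locations, and it uses the common textbook convention of taking the next term up.

```python
import math

def percentile(values, p):
    """Percentile by the location method i = (p / 100) * n, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if i == int(i):
        k = int(i)
        # Whole-number location: average the k-th and (k+1)-th terms (1-based).
        return (ordered[k - 1] + ordered[k]) / 2
    # Assumed convention for a fractional location: use the next term up.
    return ordered[math.floor(i)]

array = [106, 109, 114, 116, 121, 122, 125, 129]
print(percentile(array, 25))   # 111.5
print(percentile(array, 50))   # 118.5
print(percentile(array, 75))   # 123.5
```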
Measures of Variability
• Measures of variability describe the spread or the dispersion of a set of data.
• Common Measures of Variability
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
Slide 3-17
Variability
[Figure: two panels, "No Variability in Cash Flow" and "Variability in Cash Flow", each marked with its mean.]
Slide 3-18
Variability
[Figure: two panels, "Variability" and "No Variability".]
Slide 3-19
Range
• The difference between the largest and the smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest - Smallest = 48 - 35 = 13
Data (24 values, in ascending order):
35 37 37 39 40 40
41 41 43 43 43 43
44 44 44 44 44 45
45 46 46 46 46 48
Slide 3-20
Interquartile Range
• Range of values between the first and third quartiles
• Range of the “middle half”
• Less influenced by extremes
$\text{Interquartile Range} = Q_3 - Q_1$
Slide 3-21
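As a quick check of the two measures above, this sketch computes the range of the 24-value data set and the interquartile range of the eight-value array from the quartile example. It takes Q1 and Q3 as the averages of the 2nd/3rd and 6th/7th ordered terms, as in that example; the variable names are illustrative.

```python
# Range: the 24 values from the Range slide.
data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]
value_range = max(data) - min(data)
print(value_range)              # 13

# Interquartile range: the ordered array from the quartile example.
ordered = [106, 109, 114, 116, 121, 122, 125, 129]
q1 = (ordered[1] + ordered[2]) / 2    # average of the 2nd and 3rd terms
q3 = (ordered[5] + ordered[6]) / 2    # average of the 6th and 7th terms
iqr = q3 - q1
print(q1, q3, iqr)              # 111.5 123.5 12.0
```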
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
• Deviations from the mean: -8, -4, 3, 4, 5
[Figure: number line from 0 to 20 showing the deviations -8, -4, +3, +4, +5 around the mean.]
Slide 3-22
Mean Absolute Deviation
• Average of the absolute deviations from the mean

  X     X - μ    |X - μ|
  5      -8        +8
  9      -4        +4
 16      +3        +3
 17      +4        +4
 18      +5        +5
 Sum      0        24

$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$
Slide 3-23
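The table above can be reproduced with a few lines of Python. This is a minimal sketch using the slide's data set; the variable names are chosen for illustration.

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)                     # 13.0
deviations = [x - mu for x in data]            # [-8.0, -4.0, 3.0, 4.0, 5.0]
absolute = [abs(d) for d in deviations]        # [8.0, 4.0, 3.0, 4.0, 5.0]
mad = sum(absolute) / len(data)

print(sum(deviations))   # 0.0, deviations from the mean always sum to zero
print(sum(absolute))     # 24.0
print(mad)               # 4.8
```

Because the signed deviations cancel out, the absolute value is what makes the average informative.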
Population Variance
• Average of the squared deviations from the arithmetic mean

  X     X - μ    (X - μ)²
  5      -8        64
  9      -4        16
 16      +3         9
 17      +4        16
 18      +5        25
 Sum      0       130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
Slide 3-24
Population Standard Deviation
• Square root of the variance

  X     X - μ    (X - μ)²
  5      -8        64
  9      -4        16
 16      +3         9
 17      +4        16
 18      +5        25
 Sum      0       130

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} \approx 5.1$
Slide 3-25
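The population variance and standard deviation on the two slides above can be verified with this short sketch. It computes the values by hand and compares them with `statistics.pvariance` and `statistics.pstdev` from the Python standard library.

```python
import math
import statistics

population = [5, 9, 16, 17, 18]
N = len(population)

mu = sum(population) / N                          # 13.0
squared = [(x - mu) ** 2 for x in population]     # [64.0, 16.0, 9.0, 16.0, 25.0]
variance = sum(squared) / N                       # 130 / 5 = 26.0
std_dev = math.sqrt(variance)                     # about 5.1

print(variance, round(std_dev, 2))                # 26.0 5.1
print(statistics.pvariance(population))           # same variance from the library
```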
Empirical Rule
• Data are normally distributed (or approximately normal)

Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7
Slide 3-26
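The percentages in the empirical rule are rounded values of the normal distribution. As a sketch, the share of a normal distribution within k standard deviations of the mean equals erf(k/√2), which Python's `math.erf` can evaluate; the function name `share_within` is an assumption made for this illustration.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k) * 100:.1f}%")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```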
Sample Variance
• Average of the squared deviations from the arithmetic mean, computed with n - 1 in the denominator

     X     $X - \bar{X}$    $(X - \bar{X})^2$
 2,398         625             390,625
 1,844          71               5,041
 1,539        -234              54,756
 1,311        -462             213,444
 7,092           0             663,866   (sums)

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} \approx 221{,}288.67$
Slide 3-27
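Sample Standard Deviation
• Square root of the sample variance

     X     $X - \bar{X}$    $(X - \bar{X})^2$
 2,398         625             390,625
 1,844          71               5,041
 1,539        -234              54,756
 1,311        -462             213,444
 7,092           0             663,866   (sums)

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} \approx 221{,}288.67$
$S = \sqrt{S^2} = \sqrt{221{,}288.67} \approx 470.41$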
Slide 3-28
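The sample calculations above differ from the population versions only in the n - 1 denominator. The sketch below recomputes them from the slide's four observations and cross-checks against `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

sample = [2398, 1844, 1539, 1311]
n = len(sample)

x_bar = sum(sample) / n                          # 7092 / 4 = 1773.0
squared = [(x - x_bar) ** 2 for x in sample]     # squared deviations
s2 = sum(squared) / (n - 1)                      # 663,866 / 3
s = math.sqrt(s2)

print(round(s2, 2))                              # 221288.67
print(round(s, 2))                               # 470.41
print(round(statistics.variance(sample), 2))     # 221288.67
```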
Coefficient of Variation
• Ratio of the standard deviation to the mean, expressed as a percentage
• Measurement of relative dispersion
$C.V. = \frac{\sigma}{\mu}(100)$
Slide 3-29
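Coefficient of Variation
• Data set 1: $\mu_1 = 29$; $C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{\sigma_1}{29}(100)$
• Data set 2: $\mu_2 = 84$, $\sigma_2 = 10$; $C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) \approx 11.90$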
Slide 3-30
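A small sketch of the coefficient of variation follows. The function name is hypothetical. The first call uses the second data set from the slide above (mean 84, standard deviation 10); the second reuses the population 5, 9, 16, 17, 18 from the earlier variance slides as an extra illustration.

```python
import math

def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

cv_2 = coefficient_of_variation(10, 84)
print(round(cv_2, 2))            # 11.9

# Extra illustration with the population 5, 9, 16, 17, 18.
population = [5, 9, 16, 17, 18]
mu = sum(population) / len(population)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))
print(round(coefficient_of_variation(sigma, mu), 2))   # 39.22
```

Because the result is a percentage of the mean, it allows data sets measured on different scales to be compared.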
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a distribution
• Kurtosis
– Peakedness of a distribution
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness
Slide 3-31
Skewness
[Figure: three distribution curves labeled Negatively Skewed, Symmetric (Not Skewed), and Positively Skewed.]
Slide 3-32
Skewness
[Figure: positions of the mean, median, and mode, left to right, on three curves.]
• Negatively Skewed: Mean, Median, Mode
• Symmetric (Not Skewed): Mean, Median, and Mode coincide
• Positively Skewed: Mode, Median, Mean
Slide 3-33
Coefficient of Skewness
• Summary measure for skewness
$S = \frac{3(\mu - M_d)}{\sigma}$
• If S < 0, the distribution is negatively skewed (skewed to the left).
• If S = 0, the distribution is symmetric (not skewed).
• If S > 0, the distribution is positively skewed (skewed to the right).
Slide 3-34
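Coefficient of Skewness
• Case 1: $\mu_1 = 23$, $M_{d_1} = 26$; $S_1 = \frac{3(\mu_1 - M_{d_1})}{\sigma_1} = \frac{3(23 - 26)}{\sigma_1} < 0$
• Case 2: $\mu_2 = 26$, $M_{d_2} = 26$; $S_2 = \frac{3(\mu_2 - M_{d_2})}{\sigma_2} = \frac{3(26 - 26)}{\sigma_2} = 0$
• Case 3: $\mu_3 = 29$, $M_{d_3} = 26$; $S_3 = \frac{3(\mu_3 - M_{d_3})}{\sigma_3} = \frac{3(29 - 26)}{\sigma_3} > 0$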
Slide 3-35
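The coefficient of skewness, S = 3(μ - Md)/σ, can be sketched in Python as below. The function name `skewness_coefficient` is an illustration. The first part applies it to the population 5, 9, 16, 17, 18 from the earlier slides. The second part uses the three mean and median pairs from the slide above with a placeholder standard deviation of 10.0, which is a hypothetical value chosen only to show the sign of S.

```python
import math
from statistics import median

def skewness_coefficient(mean, med, std_dev):
    """S = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - med) / std_dev

population = [5, 9, 16, 17, 18]
mu = sum(population) / len(population)                  # 13.0
md = median(population)                                 # 16
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))
s = skewness_coefficient(mu, md, sigma)
print(round(s, 3))            # -1.765, negatively skewed

PLACEHOLDER_SIGMA = 10.0      # hypothetical, not taken from the slides
for mean in (23, 26, 29):
    print(mean, skewness_coefficient(mean, 26, PLACEHOLDER_SIGMA))
# 23 -0.9
# 26 0.0
# 29 0.9
```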
Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out
[Figure: three curves labeled Leptokurtic, Mesokurtic, and Platykurtic.]
Slide 3-36
Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
Slide 3-37
Box and Whisker Plot, continued
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
Slide 3-38
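The fence formulas above can be applied to the quartiles from the earlier quartile example (Q1 = 111.5, Q3 = 123.5). This is a minimal sketch; the function name `fences` and the dictionary layout are assumptions made for illustration.

```python
def fences(q1, q3):
    """Inner and outer fences of a box and whisker plot from Q1 and Q3."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

result = fences(111.5, 123.5)
print(result["iqr"])      # 12.0
print(result["inner"])    # (93.5, 141.5)
print(result["outer"])    # (75.5, 159.5)

# Every value in the example array lies inside the inner fences.
array = [106, 109, 114, 116, 121, 122, 125, 129]
low, high = result["inner"]
print(all(low <= x <= high for x in array))   # True
```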
Box and Whisker Plot
[Figure: box and whisker plot marking Minimum, Q1, Q2, Q3, and Maximum.]
Slide 3-39
Skewness: Box and Whisker Plots, and Coefficient of Skewness
[Figure: three box and whisker plots: Negatively Skewed (S < 0), Symmetric (Not Skewed) (S = 0), and Positively Skewed (S > 0).]
Slide 3-40