Transcript Chapter 3

© 2002 Thomson / South-Western

Chapter 3 Descriptive Statistics Slide 3-1

Learning Objectives
• Distinguish between measures of central tendency, measures of variability, and measures of shape
• Understand the meanings of mean, median, mode, quartile, percentile, and range
• Compute mean, median, mode, percentile, quartile, range, variance, standard deviation, and mean absolute deviation

Slide 3-2

Learning Objectives -- Continued
• Differentiate between sample and population variance and standard deviation
• Interpret the standard deviation by applying the empirical rule
• Understand box and whisker plots, skewness, and kurtosis

Slide 3-3

Measures of Central Tendency
• Measures of central tendency yield information about “particular places or locations in a group of numbers.”
• Common measures of location:
– Mode
– Median
– Mean
– Percentiles
– Quartiles

Slide 3-4

Mode
• The most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more than two modes

Slide 3-5

Mode -- Example
Data set (24 values, in ascending order):
35 37 37 39 40 40 41 41 43 43 43 43
44 44 44 44 44 45 45 46 46 46 46 48
• The mode is 44.
• There are more 44s (five) than any other value.
Slide 3-6
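A minimal Python sketch (not part of the original slides) that finds the mode by counting occurrences. The helper name `modes` is hypothetical; it returns every most-frequent value, so a bimodal or multimodal data set yields two or more values. Python 3.8+ also provides `statistics.multimode`, which returns the same set of values.

```python
from collections import Counter

# The 24 values from the mode example, in ascending order.
data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

def modes(values):
    """Return every value that occurs with the highest frequency.

    One value means unimodal, two means bimodal, more means multimodal.
    """
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes(data))         # [44]
print(Counter(data)[44])   # 5
```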

Median
• Middle value in an ordered array of numbers
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely small values

Slide 3-7

Median: Computational Procedure
• First Procedure
– Arrange observations in an ordered array.
– If the number of terms is odd, the median is the middle term of the ordered array.
– If the number of terms is even, the median is the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is given by (n + 1)/2.

Slide 3-8
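Both procedures fit in a few lines of Python. This is a sketch assuming the input is a non-empty list of numbers; `median` is a hypothetical helper, and the standard library's `statistics.median` does the same job. The even-length test data are the ordered array from the quartile example later in this chapter.

```python
def median(values):
    """Median of a non-empty list, using the (n + 1)/2 position rule."""
    ordered = sorted(values)              # first, build the ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    position = (n + 1) / 2                # 1-based position of the median
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take the middle term.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between the two middle terms.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = [106, 109, 114, 116, 121, 122, 125, 129]
print(median(odd))    # 15
print(median(even))   # 118.5
```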

Median: Example with an Odd Number of Terms
Ordered array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of median = (n + 1)/2 = (17 + 1)/2 = 9
• The median is the 9th term, 15.
• If the 22 is replaced by 100, the median remains at 15.
• If the 3 is replaced by -103, the median remains at 15.

Slide 3-9
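The robustness claims on this slide can be checked with Python's standard `statistics.median`. The sketch rebuilds the 17-term array, makes each replacement, and, for contrast, shows that the mean does move toward the new extreme value.

```python
import statistics

ordered = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
print(len(ordered), statistics.median(ordered))   # 17 15

# Replace the largest value (22) with 100, then the smallest (3) with -103.
high = ordered[:-1] + [100]
low = [-103] + ordered[1:]
print(statistics.median(high), statistics.median(low))   # 15 15

# The mean, by contrast, is pulled toward each extreme value.
print(statistics.mean(low) < statistics.mean(ordered) < statistics.mean(high))   # True
```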

Mean
• The average of a group of numbers
• Applicable for interval and ratio data; not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set

Slide 3-10

Population Mean
$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{93}{5} = 18.6$$

Slide 3-11

Sample Mean
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{379}{6} \approx 63.17$$

Slide 3-12
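Population and sample means are computed the same way; only the notation (μ and N versus X̄ and n) signals which one is meant. A minimal Python sketch, applied to two data sets that appear later in this chapter; `mean` is a hypothetical helper, and `statistics.mean` gives the same values.

```python
def mean(values):
    """Sum of the values divided by how many there are.

    Same arithmetic for a population (mu = sum X / N)
    and a sample (X-bar = sum X / n).
    """
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

data = [5, 9, 16, 17, 18]           # population from the deviation slides
print(sum(data), mean(data))        # 65 13.0

sample = [2398, 1844, 1539, 1311]   # sample from the variance slides
print(sum(sample), mean(sample))    # 7092 1773.0
```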

Quartiles
Measures of central tendency that divide a group of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second quartile
• Q3: 75% of the data set is below the third quartile

Slide 3-13

Quartiles, continued
• Q1 is equal to the 25th percentile
• Q2 is located at the 50th percentile and equals the median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the data set

Slide 3-14

Quartiles
[Figure: the ordered data divided into four segments of 25% each, with Q1, Q2, and Q3 at the boundaries]

Slide 3-15

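Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121, 122, 125, 129 (n = 8)
• Location of a quartile in the ordered array: i = (P/100)(n), where P is the percentile. When i is a whole number, the quartile is the average of the ith and (i + 1)th values.
$$Q_1:\quad i = \frac{25}{100}(8) = 2, \qquad Q_1 = \frac{109 + 114}{2} = 111.5$$
$$Q_2:\quad i = \frac{50}{100}(8) = 4, \qquad Q_2 = \frac{116 + 121}{2} = 118.5$$
$$Q_3:\quad i = \frac{75}{100}(8) = 6, \qquad Q_3 = \frac{122 + 125}{2} = 123.5$$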

Slide 3-16
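The location rule from the example can be sketched in Python. The whole-number case follows the slide; handling a fractional i by rounding the location up is an assumption about the textbook's convention and is marked as such in the code. The helper `percentile` is hypothetical. Software packages use several different quartile conventions (Python's `statistics.quantiles`, for instance, interpolates by default), so other tools can report slightly different quartiles for the same data.

```python
import math

def percentile(values, p):
    """P-th percentile via the location rule i = (P/100) * n (sketch).

    Positions are 1-based, as on the slide; p is an integer in (0, 100].
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    n = len(ordered)
    if (p * n) % 100 == 0:                # i is a whole number
        i = p * n // 100
        if i == n:                        # no (i + 1)-th value exists
            return ordered[-1]
        return (ordered[i - 1] + ordered[i]) / 2
    # Assumption: a fractional location is rounded up to the next position.
    return ordered[math.ceil(p * n / 100) - 1]

data = [106, 109, 114, 116, 121, 122, 125, 129]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print(q1, q2, q3)   # 111.5 118.5 123.5
```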

Measures of Variability
• Measures of variability describe the spread or the dispersion of a set of data.
• Common measures of variability:
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation

Slide 3-17

Variability
[Figure: cash flow plotted around its mean, one panel showing no variability in cash flow and one showing variability in cash flow]

Slide 3-18

Variability
[Figure: two panels contrasting data with variability and data with no variability]
Slide 3-19

Range
• The difference between the largest and the smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example (24 values, in ascending order):
35 37 37 39 40 40 41 41 43 43 43 43
44 44 44 44 44 45 45 46 46 46 46 48
Range = Largest - Smallest = 48 - 35 = 13

Slide 3-20
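A short Python sketch of the range on the example data; `value_range` is a hypothetical helper name. It also shows the slide's point that only the two extremes matter: changing an interior value leaves the range alone, while changing the largest value changes it.

```python
# 24-value data set from the range example.
data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

print(value_range(data))          # 13

# Changing an interior value leaves the range unchanged.
interior = data.copy()
interior[12] = 41                 # one of the 44s becomes 41
print(value_range(interior))      # 13

# Changing an extreme value changes the range.
extreme = data[:-1] + [60]        # the 48 becomes 60
print(value_range(extreme))       # 25
```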

Interquartile Range
• Range of values between the first and third quartiles
• Range of the “middle half”
• Less influenced by extremes
$$IQR = Q_3 - Q_1$$

Slide 3-21
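To see why the IQR is less influenced by extremes, the Python sketch below applies it to the ordered array from the quartile example and then makes the largest value extreme. It repeats a compact version of the quartile location rule so that it runs on its own; the helper names `percentile` and `iqr` are hypothetical, and rounding a fractional location up is an assumption.

```python
import math

def percentile(values, p):
    """Location rule i = (P/100) * n, as in the quartile example (sketch)."""
    ordered = sorted(values)
    n = len(ordered)
    if (p * n) % 100 == 0:                       # whole-number location
        i = p * n // 100
        return ordered[-1] if i == n else (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(p * n / 100) - 1]   # assumed: round up

def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    return percentile(values, 75) - percentile(values, 25)

data = [106, 109, 114, 116, 121, 122, 125, 129]
print(iqr(data), max(data) - min(data))          # 12.0 23

# Make the largest value extreme: the range jumps, the IQR does not move.
extreme = data[:-1] + [500]
print(iqr(extreme), max(extreme) - min(extreme))   # 12.0 394
```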

Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean:
$$\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$$
• Deviations from the mean: -8, -4, +3, +4, +5
[Figure: the five values on a number line from 0 to 20, with each deviation (-8, -4, +3, +4, +5) measured from the mean of 13]

Slide 3-22
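A quick Python check of the deviations above. It also shows that deviations from the mean always sum to zero, which is why the next measures work with absolute values or squares rather than the raw deviations.

```python
data = [5, 9, 16, 17, 18]
mu = sum(data) / len(data)          # 65 / 5 = 13.0
deviations = [x - mu for x in data]

print(deviations)        # [-8.0, -4.0, 3.0, 4.0, 5.0]
print(sum(deviations))   # 0.0
```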

Mean Absolute Deviation
• Average of the absolute deviations from the mean

  X     X - μ    |X - μ|
  5      -8         8
  9      -4         4
 16      +3         3
 17      +4         4
 18      +5         5
Sum       0        24

$$\text{M.A.D.} = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$

Slide 3-23
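The M.A.D. calculation as a minimal Python sketch; `mean_absolute_deviation` is a hypothetical helper that treats the input as a population (dividing by N), as the slide does.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

data = [5, 9, 16, 17, 18]
print(mean_absolute_deviation(data))   # 4.8
```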

Population Variance
• Average of the squared deviations from the arithmetic mean

  X     X - μ    (X - μ)²
  5      -8        64
  9      -4        16
 16      +3         9
 17      +4        16
 18      +5        25
Sum       0       130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

Slide 3-24

Population Standard Deviation
• Square root of the variance

  X     X - μ    (X - μ)²
  5      -8        64
  9      -4        16
 16      +3         9
 17      +4        16
 18      +5        25
Sum       0       130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} \approx 5.1$$

Slide 3-25
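The population variance and standard deviation from the two previous slides, computed step by step in Python and cross-checked against the standard library's `statistics.pvariance` and `statistics.pstdev`, which also divide by N.

```python
import math
import statistics

data = [5, 9, 16, 17, 18]
N = len(data)
mu = sum(data) / N

squared = [(x - mu) ** 2 for x in data]   # 64, 16, 9, 16, 25
variance = sum(squared) / N               # 130 / 5
sigma = math.sqrt(variance)

print(sum(squared), variance)   # 130.0 26.0
print(round(sigma, 2))          # 5.1

# Standard-library cross-check (population versions divide by N).
print(statistics.pvariance(data), round(statistics.pstdev(data), 2))
```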

Empirical Rule
• Applies to data that are normally distributed (or approximately normal)

Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7

Slide 3-26
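The 68, 95, and 99.7 percentages are rounded areas under the normal curve. A short check with Python's `statistics.NormalDist` (Python 3.8+) computes the share of a standard normal distribution lying within k standard deviations of the mean; the slide rounds the first two figures to whole percentages.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)
coverage = {}
for k in (1, 2, 3):
    # Share of a normal distribution within k standard deviations of the mean.
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(k, round(100 * coverage[k], 1))
# 1 68.3
# 2 95.4
# 3 99.7
```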

Sample Variance
• Sum of the squared deviations from the sample mean, divided by n - 1

      X       X - X̄     (X - X̄)²
  2,398        625       390,625
  1,844         71         5,041
  1,539       -234        54,756
  1,311       -462       213,444
Sum 7,092        0       663,866

Sample mean: X̄ = 7,092 / 4 = 1,773
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

Slide 3-27

Sample Standard Deviation
• Square root of the sample variance

      X       X - X̄     (X - X̄)²
  2,398        625       390,625
  1,844         71         5,041
  1,539       -234        54,756
  1,311       -462       213,444
Sum 7,092        0       663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$
$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$

Slide 3-28
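The sample variance and standard deviation from the two previous slides in Python. The only change from the population versions is the n - 1 divisor; the standard library's `statistics.variance` and `statistics.stdev` use the same divisor and serve as a cross-check.

```python
import math
import statistics

sample = [2398, 1844, 1539, 1311]
n = len(sample)
x_bar = sum(sample) / n                      # 7092 / 4 = 1773.0

ss = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations
s2 = ss / (n - 1)                            # divide by n - 1, not n
s = math.sqrt(s2)

print(x_bar, ss)                    # 1773.0 663866.0
print(round(s2, 2), round(s, 2))    # 221288.67 470.41

# Standard-library cross-check (sample versions divide by n - 1).
print(round(statistics.variance(sample), 2), round(statistics.stdev(sample), 2))
```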

Coefficient of Variation
• Ratio of the standard deviation to the mean, expressed as a percentage
• Measurement of relative dispersion
$$\text{C.V.} = \frac{\sigma}{\mu}(100)$$

Slide 3-29

Coefficient of Variation
$$\mu_1 = 29: \qquad \text{C.V.}_1 = \frac{\sigma_1}{\mu_1}(100) = \frac{\sigma_1}{29}(100)$$
$$\mu_2 = 84,\ \sigma_2 = 10: \qquad \text{C.V.}_2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$$

Slide 3-30
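A minimal sketch of the C.V. formula in Python; `coefficient_of_variation` is a hypothetical helper. It reproduces C.V.2 from the slide and, as a second illustration, applies the formula to the five-value population from the variance slides (μ = 13, σ = √26).

```python
import math

def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

print(round(coefficient_of_variation(10, 84), 2))   # 11.9

# Five-value population from the variance slides.
data = [5, 9, 16, 17, 18]
mu = sum(data) / len(data)
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
print(round(coefficient_of_variation(sigma, mu), 2))   # 39.22
```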

Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values on one side of a distribution
• Kurtosis
– Peakedness of a distribution
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness

Slide 3-31

Skewness
[Figure: three distributions labeled Negatively Skewed, Symmetric (Not Skewed), and Positively Skewed]

Slide 3-32

Skewness
[Figure: positions of the mean, median, and mode. Negatively skewed: mean, median, mode from left to right. Symmetric (not skewed): mean, median, and mode coincide. Positively skewed: mode, median, mean from left to right.]

Slide 3-33

Coefficient of Skewness
• Summary measure for skewness
$$S_k = \frac{3(\mu - M_d)}{\sigma}$$
where M_d is the median.
• If S_k < 0, the distribution is negatively skewed (skewed to the left).
• If S_k = 0, the distribution is symmetric (not skewed).
• If S_k > 0, the distribution is positively skewed (skewed to the right).

Slide 3-34

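Coefficient of Skewness
$$\mu_1 = 23,\ M_{d1} = 26: \qquad S_{k1} = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{\sigma_1} = -\frac{9}{\sigma_1} < 0$$
$$\mu_2 = 26,\ M_{d2} = 26: \qquad S_{k2} = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{\sigma_2} = 0$$
$$\mu_3 = 29,\ M_{d3} = 26: \qquad S_{k3} = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{\sigma_3} = \frac{9}{\sigma_3} > 0$$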

Slide 3-35
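A sketch of the coefficient of skewness in Python; `pearson_skewness` is a hypothetical helper name (the formula is commonly called Pearson's second coefficient of skewness). The slide's standard deviations are not recoverable here, so the sign check uses a placeholder σ = 1.0, which is not the slide's value: only the sign carries over, not the magnitude. The sketch then applies the formula to the data set 5, 9, 16, 17, 18, whose mean (13) lies below its median (16).

```python
import statistics

def pearson_skewness(mean, median, sd):
    """Coefficient of skewness S_k = 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

# Sign check on the three cases from the slide.
# sd = 1.0 is a placeholder (assumption), not the slide's standard deviation.
for mu, md in ((23, 26), (26, 26), (29, 26)):
    print(mu, md, pearson_skewness(mu, md, 1.0))
# 23 26 -9.0
# 26 26 0.0
# 29 26 9.0

data = [5, 9, 16, 17, 18]
mu = statistics.mean(data)        # 13
md = statistics.median(data)      # 16
sigma = statistics.pstdev(data)   # population standard deviation
print(pearson_skewness(mu, md, sigma) < 0)   # True: negatively skewed
```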

Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out
[Figure: leptokurtic, mesokurtic, and platykurtic distribution curves]

Slide 3-36

Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set

Slide 3-37

Box and Whisker Plot, continued
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR

Slide 3-38
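A Python sketch of the fence formulas, using the quartiles from the quartile example (Q1 = 111.5, Q2 = 118.5, Q3 = 123.5). The helper name `fences` is hypothetical. Values falling outside the fences are commonly flagged as possible outliers; for this small data set none do.

```python
def fences(q1, q3):
    """Inner and outer fences computed from the first and third quartiles."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return iqr, inner, outer

# Quartiles from the quartile example.
iqr, inner, outer = fences(111.5, 123.5)
print(iqr)     # 12.0
print(inner)   # (93.5, 141.5)
print(outer)   # (75.5, 159.5)

data = [106, 109, 114, 116, 121, 122, 125, 129]
five_number = (min(data), 111.5, 118.5, 123.5, max(data))   # min, Q1, Q2, Q3, max
outside_inner = [x for x in data if x < inner[0] or x > inner[1]]
print(five_number)     # (106, 111.5, 118.5, 123.5, 129)
print(outside_inner)   # []
```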

Box and Whisker Plot
[Figure: box-and-whisker plot marking the minimum, Q1, Q2, Q3, and the maximum]

Slide 3-39

Skewness: Box and Whisker Plots, and Coefficient of Skewness
[Figure: box-and-whisker plots for a negatively skewed distribution (S_k < 0), a symmetric distribution (S_k = 0), and a positively skewed distribution (S_k > 0)]
Slide 3-40