Transcript Chapter 3

Descriptive Statistics – Central Tendency & Variability

Chapter 3 (Part 2) MSIS 111 Prof. Nick Dedeke

Learning Objectives

Distinguish between measures of central tendency, measures of variability, measures of shape, and measures of association.

Compute variance, standard deviation, and mean absolute deviation on ungrouped data.

Differentiate between sample and population variance and standard deviation.

Learning Objectives -- Continued

Understand the meaning of standard deviation as it is applied by using the empirical rule and Chebyshev’s theorem.

Compute the mean, mode, standard deviation, and variance on grouped data.

Understand skewness, kurtosis, and box and whisker plots.

Measures of Central Tendency: Ungrouped Data

Measures of central tendency yield information about the center, or middle part, of a group of numbers.

Common Measures of Central Tendency

• Mode
• Median
• Mean
• Percentiles
• Quartiles

Mode

• The most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more than two modes

Mode -- Example

The mode is 44.

44 is the most frequently occurring data value.

35 37 37 39 40 40 41 44 45 41 44 46 43 44 46 43 44 46 43 44 46 43 45 48
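The mode of the data above can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative and not part of the original slides.

```python
from collections import Counter

# Data values from the mode example above.
data = [35, 37, 37, 39, 40, 40, 41, 44, 45, 41, 44, 46,
        43, 44, 46, 43, 44, 46, 43, 44, 46, 43, 45, 48]

counts = Counter(data)
highest = max(counts.values())

# Collect every value that reaches the highest frequency, so bimodal
# and multimodal data sets would be reported with all of their modes.
modes = sorted(value for value, freq in counts.items() if freq == highest)

print(modes, highest)   # [44] 5
```

Returning a list rather than a single number keeps the sketch usable for bimodal and multimodal data.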

Median

• Middle value in an ordered array of numbers
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely small values

Median: Computational Procedure

First Procedure

• Arrange the observations in an ordered array.
• If there is an odd number of terms, the median is the middle term of the ordered array.
• If there is an even number of terms, the median is the average of the middle two terms.

Second Procedure

• The median’s position in an ordered array is given by (n+1)/2.

Median: Example with an Odd Number of Terms

Ordered Array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22

• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, which is 15.
• If the 22 is replaced by 100, the median is still 15.
• If the 3 is replaced by -103, the median is still 15.

Median: Example with an Even Number of Terms

Ordered Array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21

• There are 16 terms in the ordered array.
• Position of median = (n+1)/2 = (16+1)/2 = 8.5
• The median is the average of the 8th and 9th terms: (14 + 15)/2 = 14.5.

NOTE

• If the 21 is replaced by 100, the median is still 14.5.
• If the 3 is replaced by -88, the median is still 14.5.
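The two median examples can be reproduced with a small Python sketch of the (n+1)/2 position rule. The function name `median` is illustrative; the standard library's `statistics.median` would give the same values.

```python
def median(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a whole number.
        return ordered[middle]
    # Even n: average the two terms on either side of position (n + 1) / 2.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = odd[:-1]   # the same array without the final 22

print(median(odd))                  # 15
print(median(even))                 # 14.5

# Replacing an extreme value leaves the median unchanged.
print(median(odd[:-1] + [100]))     # 15
print(median([-88] + even[1:]))     # 14.5
```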

Arithmetic Mean

• Commonly called ‘the mean’
• Is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set

Population Mean

Data for total population: 57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43    

$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{57 + 57 + 86 + 86 + 42 + 42 + 43 + 56 + 57 + 42 + 42 + 43}{12} = \frac{653}{12} = 54.4167$$

Mean for a Sample of 3

$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{57 + 86 + 42}{3} = \frac{185}{3} = 61.667$$
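As a quick check of both means, the following Python sketch sums the values and divides by the count. The variable names are illustrative only.

```python
# Population data and the sample of 3 from the slides above.
population = [57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43]
sample = [57, 86, 42]

# The arithmetic is the same for both; only the notation differs
# (mu and N for a population, X-bar and n for a sample).
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(round(population_mean, 4))   # 54.4167
print(round(sample_mean, 3))       # 61.667
```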

Example: Computing Central Tend. Measures using Frequency Tables

X_i      F_i     F_i * X_i
55       2       110
60       1       60
100      3       300
125      5       625
140      4       560
Total    15      1655

$$\text{Mean} = \frac{\sum F_i X_i}{\sum F_i} = \frac{1655}{15} = 110.33$$

Mode = 125

Median position = (15+1)/2 = 8th

Median value = 125
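The frequency-table calculations above can be sketched in Python as follows. The variable names are illustrative, and the mode line assumes the table has a single mode.

```python
import statistics

values = [55, 60, 100, 125, 140]   # X_i
freqs = [2, 1, 3, 5, 4]            # F_i

n = sum(freqs)
mean = sum(f * x for x, f in zip(values, freqs)) / n

# Mode: the value with the largest frequency (assumes a single mode).
mode = max(zip(values, freqs), key=lambda pair: pair[1])[0]

# Median: expand the table into an ordered list, then take the middle term.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
median = statistics.median(expanded)

print(n, round(mean, 2), mode, median)   # 15 110.33 125 125
```

Expanding the table is fine for small frequencies; for large tables a cumulative-frequency lookup would avoid building the full list.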

Exercise: Computing Central Tend. Measures using Frequency Tables

X_i      F_i       F_i * X_i
1        2
4        4
6        3
10       3
12       2
Total    n = 14

Mean = $\sum F_i X_i / \sum F_i$ =

Mode =

Median position =

Median value =

Exercise: Central Tendency Measures for Grouped Data

Class interval       Frequency (F_i)    Midpoint (M_i)
[1 – 3) inches       16                 2
[3 – 5) inches       2                  4
[5 – 7) inches       4                  6
[7 – 9) inches       3                  8
[9 – 11) inches      9                  10
[11 – 13) inches     6                  12
Total                40

Modal class:

Median position:

Median class:

Example: Central Tendency Measures for Grouped Data

Class interval       Frequency (F_i)    Midpoint (M_i)    (F_i)*(M_i)
[1 – 3) inches       16                 2                 32
[3 – 5) inches       2                  4                 8
[5 – 7) inches       4                  6                 24
[7 – 9) inches       3                  8                 24
[9 – 11) inches      9                  10                90
[11 – 13) inches     6                  12                72
Total                40                                   250

Find the mean for the distribution:

$$\text{Mean} = \frac{\sum F_i M_i}{n} = \frac{250}{40} = 6.25 \text{ inches}$$

Exercise: Central Tendency Measures for Grouped Data

Class interval      Frequency (F_i)    Midpoint (M_i)    (F_i)*(M_i)
[1 – 2) inches      2
[2 – 3) inches      2
[3 – 4) inches      4
[4 – 5) inches      2
[5 – 6) inches      1

Find the mean for the distribution:

Mean = $\left(\sum F_i M_i\right) / n$ =          inches

Exercise: Computing Central Tend. Measures using Frequency Tables

We want to choose one of two suppliers. We have data on how late their deliveries were (in hours). Which supplier has better measures of central tendency?

Supplier 1

X_i      F_i       F_i * X_i
1        2         2
4        4         16
6        3         18
10       3         30
12       2         24
Total    n = 14    90

Supplier 2

X_i      F_i       F_i * X_i
0        2         0
1        0         0
4        3         12
6        5         30
10       4         40
Total    n = 14    82

Measures of Dispersion: Variability

[Figure: two cash-flow charts, one with no variability (same amounts) and one with variability (different amounts), each marked with its mean.]

Measures of Variability: Ungrouped Data

Measures of variability describe the spread or the dispersion of a set of data.

Common Measures of Variability

• Range
• Interquartile Range
• Mean Absolute Deviation
• Variance
• Standard Deviation
• Z scores
• Coefficient of Variation

Range

• The difference between the largest and the smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest - Smallest = 48 - 35 = 13

Data: 35 37 37 39 40 40 41 41 43 43 43 43 44 44 44 44 44 45 45 46 46 46 46 48

Interquartile Range

• Range of values between the first and third quartiles
• Range of the middle 50% of the ordered data set
• Less influenced by extremes

$$\text{Interquartile Range} = Q_3 - Q_1$$
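A minimal Python sketch of the range and interquartile range, using the 24 values from the range example. The quartile convention is an assumption: `statistics.quantiles` defaults to its "exclusive" method, which is only one of several conventions, so Q1 and Q3 may differ slightly from the values a particular textbook procedure gives.

```python
import statistics

# The 24 data values from the range example, in ascending order.
data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

# Range uses only the two extremes.
data_range = max(data) - min(data)   # 48 - 35 = 13

# With n=4, quantiles() returns the three cut points Q1, Q2, Q3.
# Assumption: the default "exclusive" method is an acceptable convention.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```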

Deviation from the Mean

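Data set: 5, 9, 16, 17, 18

Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$

Deviations from the mean ($X_i - \mu$): -8, -4, +3, +4, +5

[Figure: number line from 0 to 20 showing the deviations -8, -4, +3, +4, +5 around the mean of 13.]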

Mean Absolute Deviation

Average of the absolute deviations from the mean

X        X - μ    |X - μ|
5        -8       +8
9        -4       +4
16       +3       +3
17       +4       +4
18       +5       +5
Total    0        24

$$\text{M.A.D.} = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$
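The mean absolute deviation for the data set 5, 9, 16, 17, 18 can be verified with a few lines of Python. This is a sketch with illustrative variable names.

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)             # 13.0
deviations = [x - mu for x in data]    # -8, -4, +3, +4, +5

# Raw deviations always sum to zero, which is why the
# absolute values are averaged instead.
total_deviation = sum(deviations)
mad = sum(abs(d) for d in deviations) / len(data)

print(mu, total_deviation, mad)        # 13.0 0.0 4.8
```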

Population Variance

Average of the squared deviations from the arithmetic mean

X        X - μ    (X - μ)^2
5        -8       64
9        -4       16
16       +3       9
17       +4       16
18       +5       25
Total    0        130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

Population Standard Deviation

Square root of the variance

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{130}{5}} = \sqrt{26.0} \approx 5.1$$

Sample Variance

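Average of the squared deviations from the arithmetic mean, computed with n - 1 rather than n in the denominator

X        X - X̄    (X - X̄)^2
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
Total    7,092    0        663,866

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$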

Sample Standard Deviation

Square root of the sample variance

$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{663{,}866}{3}} = \sqrt{221{,}288.67} \approx 470.41$$
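The difference between the population and sample formulas is only the denominator (N versus n - 1). The Python sketch below applies each formula to the data used in this chapter: the five-value population 5, 9, 16, 17, 18 and the four-value sample 2,398, 1,844, 1,539, 1,311. The function names are illustrative; the last two lines cross-check against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


population = [5, 9, 16, 17, 18]
sample = [2398, 1844, 1539, 1311]

sigma_sq = population_variance(population)
sigma = math.sqrt(sigma_sq)
s_sq = sample_variance(sample)
s = math.sqrt(s_sq)

print(sigma_sq, round(sigma, 2))       # 26.0 5.1
print(round(s_sq, 2), round(s, 2))     # 221288.67 470.41

# The standard library agrees with the hand-written formulas.
assert math.isclose(sigma_sq, statistics.pvariance(population))
assert math.isclose(s_sq, statistics.variance(sample))
```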

Uses of Standard Deviation

• Indicator of financial risk
• Quality Control
    • construction of quality control charts
    • process capability studies
• Comparing populations
    • household incomes in two cities
    • employee absenteeism at two plants

Exercise: Computing Standard Deviation using Frequency Tables Which one has better statistical measures of central tendency?

Supplier 2 (mean = 82/14 ≈ 5.86 hours)

X_i      F_i       F_i * X_i
0        2         0
1        0         0
4        3         12
6        5         30
10       4         40
Total    n = 14    82

Exercise: Computing Standard Deviation using Frequency Tables Which one has better statistical measures of central tendency?

Supplier 1 (mean = 90/14 ≈ 6.43 hours)

X_i      F_i       F_i * X_i
1        2         2
4        4         16
6        3         18
10       3         30
12       2         24
Total    n = 14    90

Mode = 4 hours

Median position = (14+1)/2 = 7.5

Median value = 6 hours

Mean = 90/14 ≈ 6.43 hours

Which supplier is better? Why?

Standard Deviation as an Indicator of Financial Risk

Financial Security    Annualized Rate of Return (μ)    Standard Deviation (σ)
A                     15%                              3%
B                     15%                              7%

Variance and Standard Deviation of Grouped Data

Population

$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} \qquad \sigma = \sqrt{\sigma^2}$$

Sample

$$S^2 = \frac{\sum f (M - \bar{X})^2}{n - 1} \qquad S = \sqrt{S^2}$$
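A minimal Python sketch of the population formulas for grouped data. The midpoints and frequencies are the ones used in the grouped-data example that follows in these slides (classes 20-under 30 through 70-under 80); the variable names are illustrative.

```python
import math

midpoints = [25, 35, 45, 55, 65, 75]   # M, the class midpoints
freqs = [6, 18, 11, 11, 3, 1]          # f, the class frequencies

N = sum(freqs)

# Grouped mean: each midpoint stands in for every value in its class.
mu = sum(f * m for f, m in zip(freqs, midpoints)) / N

# Grouped population variance: frequency-weighted squared deviations.
sum_sq = sum(f * (m - mu) ** 2 for f, m in zip(freqs, midpoints))
sigma_sq = sum_sq / N
sigma = math.sqrt(sigma_sq)

print(N, mu, sum_sq, sigma_sq, sigma)   # 50 43.0 7200.0 144.0 12.0
```

For the sample versions, the only change would be dividing by `N - 1` instead of `N`.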

Population Variance and Standard Deviation of Grouped Data

Class Interval    f     M     fM      M - μ    (M - μ)^2    f(M - μ)^2
20-under 30       6     25    150     -18      324          1944
30-under 40       18    35    630     -8       64           1152
40-under 50       11    45    495     2        4            44
50-under 60       11    55    605     12       144          1584
60-under 70       3     65    195     22       484          1452
70-under 80       1     75    75      32       1024         1024
Total             50          2150                          7200

$$\mu = \frac{\sum fM}{N} = \frac{2150}{50} = 43$$

$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} = \frac{7200}{50} = 144$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$$

Measures of Shape

Skewness

• Absence of symmetry
• Extreme values in one side of a distribution

Kurtosis

• Peakedness of a distribution
• Leptokurtic: high and thin
• Mesokurtic: normal shape
• Platykurtic: flat and spread out

Box and Whisker Plots

• Graphic display of a distribution
• Reveals skewness

Relationship of Mean, Median and Mode


Empirical Rule

• Data are normally distributed (or approximately normal)

Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7

Chebyshev’s Theorem

Applies to all distributions

$$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2} \quad \text{for } k > 1$$

Chebyshev’s Theorem

Applies to all distributions

Number of Standard Deviations    Distance from the Mean    Minimum Proportion of Values Falling Within Distance
K = 2                            μ ± 2σ                    1 - (1/2)^2 = 0.75
K = 3                            μ ± 3σ                    1 - (1/3)^2 = 0.89
K = 4                            μ ± 4σ                    1 - (1/4)^2 = 0.94
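The Chebyshev minimum proportions for K = 2, 3, and 4 follow directly from 1 - 1/k². A short Python sketch (the function name `chebyshev_bound` is illustrative):

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 2))
# 2 0.75
# 3 0.89
# 4 0.94
```

These are lower bounds that hold for any distribution, so they are looser than the 95 and 99.7 percent figures the empirical rule gives for normally distributed data.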

Box and Whisker Plot

Five specific values are used:

• Median, Q2
• First quartile, Q1
• Third quartile, Q3
• Minimum value in the data set
• Maximum value in the data set

Inner Fences

• IQR = Q3 - Q1
• Lower inner fence = Q1 - 1.5 IQR
• Upper inner fence = Q3 + 1.5 IQR

Outer Fences

• Lower outer fence = Q1 - 3.0 IQR
• Upper outer fence = Q3 + 3.0 IQR
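The fence formulas can be sketched in Python as below. The quartile values passed in are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the slides.

```python
def fences(q1, q3):
    """Return the IQR and the inner and outer fences for given quartiles."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3.0 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }


# Hypothetical quartiles, for illustration only.
result = fences(q1=10, q3=20)
print(result)
```

With Q1 = 10 and Q3 = 20 the IQR is 10, so the inner fences fall at -5 and 35 and the outer fences at -20 and 50.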

Box and Whisker Plot

[Figure: box and whisker plot marking the minimum, Q1, Q2 (median), Q3, and maximum.]

Exercises