Transcript Chapter 3

Chapter 3


Methods for organizing, displaying and describing data using tables, graphs and summary measures

• Raw data is made more manageable
• Raw data is presented in a logical form
• Patterns can be seen from organised data

• Frequency tables
• Graphical techniques
• Measures of Central Tendency
• Measures of Spread (variability)

• Organize and display data using tables and graphs
  a) presentation of qualitative data
  b) presentation of quantitative data
• Describe the characteristics of a data set using statistical measures
  a) measures of central tendency
  b) measures of dispersion
  c) measures of skewness
  d) Box and whisker plot
  e) Population vs sample

Definition:

Data recorded in the sequence in which they are collected, before they are processed or ranked, are called raw data.

Ex: Ages of 50 students

21 18 25 22 25 19 20 19 28 23
24 19 31 21 18 25 22 19 20 37
29 19 23 22 27 34 19 18 22 23
26 25 23 21 21 27 22 19 20 25
37 25 23 19 21 33 23 26 21 24

Qualitative Data

Data that cannot be measured numerically but can be classified into different categories.

Example: gender, status of a student, nationality, race

Quantitative Data

Data that can be measured numerically.

Example: income, heights, gross sales, prices of homes, number of cars owned and number of accidents

a) Organizing qualitative data

(i) Frequency distributions

(ii) Relative frequency and percentage distributions

b) Graphing qualitative data

(i) Bar graphs

(ii) Pie charts

A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.

A sample of 20 employees from large companies was selected and these employees were asked how stressful their jobs were. In the responses recorded below, very represents very stressful, somewhat means somewhat stressful and none stands for not stressful at all.

somewhat none somewhat very none very somewhat none somewhat somewhat very somewhat very somewhat somewhat somewhat very very none very

Stress on job    Tally        Frequency (f)
Very             |||| ||      7
Somewhat         |||| ||||    9
None             ||||         4
                              Sum = 20

Frequency Distribution of Stress on Job

Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)

Stress on job    Relative frequency    Percentage (%)
Very             7/20 = 0.35           0.35(100) = 35
Somewhat         9/20 = 0.45           0.45(100) = 45
None             4/20 = 0.20           0.20(100) = 20
                 Sum = 1.00            Sum = 100%

Relative frequency and percentage distributions of stress on job
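The two tables for the stress data can be reproduced with a short script. This is only a sketch: the function name frequency_table is made up for illustration and is not part of the slides.

```python
from collections import Counter

# The 20 responses from the stress-on-job example
responses = (
    "somewhat none somewhat very none very somewhat none somewhat somewhat "
    "very somewhat very somewhat somewhat somewhat very very none very"
).split()


def frequency_table(values):
    """Map each category to (frequency, relative frequency, percentage)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}


table = frequency_table(responses)
for cat in ("very", "somewhat", "none"):
    f, rel, pct = table[cat]
    print(f"{cat:<10}{f:>3}{rel:>7.2f}{pct:>6.0f}%")
```

The relative frequencies always add up to 1 and the percentages to 100, which is a quick check on the tally.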

Bar graph: A graph made of bars where the categories are on the horizontal axis and the frequencies (or relative frequencies) are on the vertical axis.

[Figure: bar graph; categories heart, cancer, stroke, CLRD, accident on the horizontal axis; vertical axis from 0 to 60]

Pie chart: A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories is called a pie chart.

[Figure: pie chart; categories heart, cancer, stroke, CLRD, accident]

1 Array Numerical Data
2 Types of quantitative data
3 Frequency Distributions
  a Histogram
  b Polygon
  c Ogive
  d Stem & Leaf

1. Organizes data to focus on major features

i. Ascending
   Example: 1, 2, 3, 4, 5, …

ii. Descending
   Example: 10, 9, 8, 7, 6, …

iii. Range (difference between the largest and smallest values)
   Example: the largest height is 74 inches and the smallest height is 60 inches, so the range is 74 – 60 = 14 inches

With the data arranged in an array we can:
o Quickly notice the lowest and highest values in the data
o Easily divide the data into sections
o Easily see values that occur frequently
o Observe variability in the data

Raw Data: Yards Produced by 30 Carpet Looms (ungrouped data)

16.2 15.4 16.0 16.6 15.9 15.8 16.0 16.8 16.9 16.8
15.7 16.4 15.2 15.8 15.9 16.1 15.6 15.9 15.6 16.0
16.4 15.8 15.7 16.2 15.6 15.9 16.3 16.3 16.0 16.3

Data Array: Daily Production in Yards of 30 Carpet Looms (ascending, read down each column)

15.2 15.7 15.9 16.0 16.2 16.4
15.4 15.7 15.9 16.0 16.3 16.6
15.6 15.8 15.9 16.0 16.3 16.8
15.6 15.8 15.9 16.1 16.3 16.8
15.6 15.8 16.0 16.2 16.4 16.9
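As a minimal sketch, the data array and the range for the carpet-loom data can be obtained by sorting. The variable names below are illustrative only.

```python
# Raw data: yards produced by 30 carpet looms
looms = [
    16.2, 15.4, 16.0, 16.6, 15.9, 15.8, 16.0, 16.8, 16.9, 16.8,
    15.7, 16.4, 15.2, 15.8, 15.9, 16.1, 15.6, 15.9, 15.6, 16.0,
    16.4, 15.8, 15.7, 16.2, 15.6, 15.9, 16.3, 16.3, 16.0, 16.3,
]

array_ascending = sorted(looms)                  # data array, smallest first
array_descending = sorted(looms, reverse=True)   # data array, largest first
data_range = max(looms) - min(looms)             # largest minus smallest

print(array_ascending[:5])       # the five lowest values
print(round(data_range, 1))
```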

Types of quantitative data

• Discrete data: integer values such as 0, 1, 2. Example: number of children, number of cars, ...
• Continuous data: values measured on a continuous scale. Example: weight, length, time, area, price, 256.312 grams

A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data.

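Weekly Earnings (RM)    Num of Employees (f)
401 - 600               9
601 - 800               12
801 - 1000              39
1001 - 1200             15
1201 - 1400             9
1401 - 1600             6

In this table:
• the variable is Weekly Earnings (RM)
• each row is a class; for example, the third class is 801 - 1000
• the lower limit of the sixth class is 1401 and its upper limit is 1600
• the Num of Employees column is the frequency column, and its entries are the frequencies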

Class boundary = (upper limit of a class + lower limit of the next class) / 2

Ex:
Upper boundary of first class = (600 + 601)/2 = 600.5
Lower boundary of second class = (601 + 600)/2 = 600.5

Upper boundary of one class = Lower boundary of the next class

Class width = upper boundary - lower boundary

Ex:
Width of first class = 600.5 - 400.5 = 200
Width of second class = 800.5 - 600.5 = 200

Class midpoint = (lower limit + upper limit) / 2

Ex: Midpoint of the first class = (401 + 600)/2 = 500.5

Ex: Midpoint of the second class = (601 + 800)/2 = 700.5
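The boundary, width and midpoint rules can be applied to every class of the weekly earnings table at once. The sketch below assumes all classes are separated by the same gap (1 here, between 600 and 601); class_summary is a hypothetical helper name.

```python
# Class limits from the weekly earnings table
limits = [(401, 600), (601, 800), (801, 1000),
          (1001, 1200), (1201, 1400), (1401, 1600)]


def class_summary(class_limits):
    """Return boundaries, width and midpoint for each class."""
    # Gap between the upper limit of one class and the lower limit of the next
    gap = class_limits[1][0] - class_limits[0][1]
    rows = []
    for lower, upper in class_limits:
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,
            "midpoint": (lower + upper) / 2,
        })
    return rows


rows = class_summary(limits)
for row in rows:
    print(row)
```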

Height (cm)    Number of Students
60 - 62        10
63 - 65        18
66 - 68        42
69 - 71        27
72 - 74        8
Total          105

Each range of heights, such as 60 - 62, is a class interval, and the number of students is the frequency.

i. First class limits: lower class limit = 60, upper class limit = 62
ii. First class boundaries: lower boundary = 59.5, upper boundary = 62.5
iii. Class width: c = 62.5 - 59.5 = 3
iv. First class midpoint = (60 + 62)/2 = 61
v. Class frequency = number of students

Weekly Earnings (RM)    Num of Employees (f)
400 - 600               9
600 - 800               12
800 - 1000              39
1000 - 1200             15
1200 - 1400             9
1400 - 1600             6

Class limit = Class boundary

Raw Data:

15.2 15.2 15.3 15.3 15.3 15.3 15.3 15.4 15.4 15.4
15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.5 15.5
15.5 15.5 15.5 15.5 15.6 15.6 15.6 15.7 15.7 15.7

Frequency Distribution

Class    Tallies        Frequency
15.2     //             2
15.3     ////           5
15.4     //// //// /    11
15.5     //// /         6
15.6     ///            3
15.7     ///            3

Relative Frequency Distribution

Class    Frequency (1)    Relative Frequency (1) ÷ 30    Cumulative Relative Freq.
15.2     2                0.07                           0.07
15.3     5                0.17                           0.23
15.4     11               0.37                           0.60
15.5     6                0.20                           0.80
15.6     3                0.10                           0.90
15.7     3                0.10                           1.00
Total    30               1.00

Relative frequencies are rounded to two decimal places; the cumulative values are computed from the unrounded fractions (for example 7/30 ≈ 0.23).

When constructing a frequency distribution table, we need to make the following three major decisions:
• Number of Classes
• Class Width
• Lower Limit of the First Class / Starting Point

Number of Classes

$k = 1 + 3.3 \log n$ (log to base 10)

Class width

$i \geq \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of classes } (k)}$

Lower Limit of the First Class / Starting Point

Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.

1. Determine the Class Interval Size (or Class Width)

Example: Given the following data

100  74  84  95  95 110  99  87 100 108
 85 103  99  83  91  91  84 110 113 105
100  98 100 108 100  98 100 107  79  86
123 107  87 105  88  85  99 101  93  99

R = 123 - 74 = 49

Number of Classes

$k = 1 + 3.3 \log n = 1 + 3.3 \log 40 = 6.3 \approx 6$

Class Width

$i \geq \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of classes } (k)} = \frac{49}{6} \approx 8.2$, so $i \geq 9$ after rounding up
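The three decisions for this data set can be checked numerically. The sketch below follows the slide formulas; the starting point 71 and the width 10 are taken from the classes 71 - 80, 81 - 90, … used in the grouped table of this example, and the variable names are illustrative.

```python
import math

data = [
    100, 74, 84, 95, 95, 110, 99, 87, 100, 108,
    85, 103, 99, 83, 91, 91, 84, 110, 113, 105,
    100, 98, 100, 108, 100, 98, 100, 107, 79, 86,
    123, 107, 87, 105, 88, 85, 99, 101, 93, 99,
]

n = len(data)
k_raw = 1 + 3.3 * math.log10(n)        # number-of-classes rule from the slide
k = round(k_raw)                       # rounded to a whole number of classes
value_range = max(data) - min(data)    # R = largest - smallest
min_width = math.ceil(value_range / k) # smallest whole-number class width

# Classes 71 - 80, 81 - 90, ... (start and width as used in the slides)
start, width = 71, 10
frequencies = [
    sum(1 for x in data if lower <= x <= lower + width - 1)
    for lower in range(start, start + k * width, width)
]

print(round(k_raw, 1), k, value_range, min_width)
print(frequencies)
```

Any convenient width at least as large as min_width works; the width of 10 used here gives round class limits.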

Grouped Frequency Distribution

Column headings: Class Interval, Frequency (1), Relative Frequency (1)/40, Cumulative, Midpoint, %

Class Interval
71 - 80
81 - 90
91 - 100
101 - 110
111 - 120
121 - 130

• 6 classes
• For the class 91 - 100: lower limit = 91, upper limit = 100
• Midpoint of the first class = (71 + 80)/2 = 75.5
• Class width = 130.5 – 120.5 = 10

Class Boundary

Is given by the midpoint of the upper limit of one class and the lower limit of the next class. Class boundaries are also called real class limits.

• A histogram is a certain kind of graph that can be drawn for a frequency distribution, a relative frequency distribution or a percentage distribution.
• To draw a histogram, mark the classes on the horizontal axis and the frequencies (or relative frequencies or percentages) on the vertical axis.
• A histogram is called a frequency histogram, a relative frequency histogram or a percentage histogram depending on the vertical axis.

[Figure: frequency histogram; classes from 15.2 to 17.0 in steps of 0.3 on the horizontal axis; frequency from 0 to 12 on the vertical axis]

Class        Frequency
15.2-15.5    2
15.5-15.8    5
15.8-16.1    11
16.1-16.4    6
16.4-16.7    3
16.7-17.0    3

• A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
• A polygon graph consists of class midpoints on the horizontal axis and the frequencies, relative frequencies or percentages on the vertical axis.

[Figure: frequency polygon; horizontal axis: Production Level in Yards, 15.0 to 17.1; vertical axis: frequency from 0 to 12]

An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.

• Each value is divided into two portions (a stem and a leaf). The leaves for each stem are shown separately in a display.
• An advantage of a stem and leaf display is that we do not lose information on individual observations.
• It is constructed only for quantitative data.

The following are scores of 30 college students on a statistics test:

75 52 80 96 65 79 71 87
93 95 69 72 81 61 76 86
79 68 50 92 83 84 77 64
71 87 72 92 57 98

Construct a stem and leaf display.

1. Split each score into two parts.
2. The first part contains the first digit, which is called the stem.
3. The second part contains the second digit, which is called the leaf.
4. Arrange the stems in increasing order.

stem | leaves
5 | 2
6 |
7 | 5
8 |
9 |

The complete stem and leaf display for the scores is shown below:

5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8

From the display, stem 7 has the highest frequency, followed by stems 8, 9, 6 and 5.

The leaves for each stem are ranked in increasing order as below:

5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
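The ranked stem and leaf display for the 30 test scores can be built by splitting each score into its tens digit and units digit. This is a sketch for two-digit values only; stem_and_leaf is an illustrative name.

```python
from collections import defaultdict

scores = [
    75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
    69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
    83, 84, 77, 64, 71, 87, 72, 92, 57, 98,
]


def stem_and_leaf(values):
    """Group the units digits (leaves) under their tens digits (stems)."""
    display = defaultdict(list)
    for value in sorted(values):          # sorting ranks stems and leaves
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```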

Diastolic blood pressure on 120 people: 60 Type A people vs. 60 Type B people

• Type A: Extremely hostile, competitive, impatient: 53, 57, 58, 59, 59, 60, …
• Type B: Laid back people: 51, 52, 59, 59, 60, …

Type A
5 | 37899
6 | 00001111
6 | 2223333
6 | 444455555
6 | 666777778888
7 | 0000111
7 | 333444789
8 | 011

Type B
5 | 1299
6 | 0001122233333
6 | 4445555555777
6 | 888889
7 | 0000111
7 | 222333466899
8 | 0000
9 | 3

[Figure: the Type A display repeated with its modes marked]

• distinguish among the measures of central tendency, measures of dispersion and measures of skewness.
• calculate values for common measures of location, including the arithmetic mean, median and mode.
• calculate values for common measures of dispersion, including range, variance, standard deviation and quartile deviation.
• calculate values for measures of skewness.

Statistical Measures

• Measure of central tendency (measure of location): to show where the centre of the data is
• Measure of dispersion (measure of spread): to show how spread out the data are around the centre
• Measure of skewness (measure of asymmetry): to show whether the frequency distribution is symmetrical about the mean or skewed

MEASURE OF CENTRAL TENDENCY

1. Mean

a) Set of values
   $\bar{x} = \frac{\sum x}{n}$
   - Add all observations
   - Divide this sum by the number of observations
b) Simple frequency distribution
   $\bar{x} = \frac{\sum fx}{\sum f}$
c) Grouped frequency
   $\bar{x} = \frac{\sum fx}{\sum f}$ (x = class midpoint)

◦ Advantages
  • it is widely understood.
  • the value of every item is included in the computation of the mean.
  • it is well suited to further statistical analysis.
◦ Disadvantages
  • its value may not correspond to any actual value.
  • it might be affected (distorted) by extremely high or low values.

Example

a. The arithmetic mean (mean) of the numbers 8, 3, 5, 12 and 10 is ...

b. If 5, 8, 6 and 2 occur with frequencies 3, 2, 4 and 1, the mean is ...

c. Find the mean of the following frequency distribution.

Class      Frequency
1 - 3      1
4 - 6      4
7 - 9      8
10 - 12    6
13 - 15    3
16 - 18    1

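Solution

a. $\bar{x} = \frac{\sum x}{n} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6$

b. $\bar{x} = \frac{\sum fx}{\sum f} = \frac{5(3) + 8(2) + 6(4) + 2(1)}{3 + 2 + 4 + 1} = \frac{57}{10} = 5.7$

c.
Class      f     x (midpoint)    fx
1 - 3      1     2               2
4 - 6      4     5               20
7 - 9      8     8               64
10 - 12    6     11              66
13 - 15    3     14              42
16 - 18    1     17              17
Total      23                    211

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{211}{23} = 9.17$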
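The three mean calculations can be checked with a few lines of code. This is a sketch; weighted_mean is an illustrative helper, and for grouped data the class midpoints stand in for the individual values.

```python
def weighted_mean(values, freqs):
    """Sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# a) set of values
numbers = [8, 3, 5, 12, 10]
mean_a = sum(numbers) / len(numbers)

# b) simple frequency distribution
mean_b = weighted_mean([5, 8, 6, 2], [3, 2, 4, 1])

# c) grouped frequency distribution: x is the class midpoint
classes = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
freqs = [1, 4, 8, 6, 3, 1]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
mean_c = weighted_mean(midpoints, freqs)

print(mean_a, mean_b, round(mean_c, 2))
```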

2. Median (middle value of a distribution or array)

a) Set of data
   - Arrange the observations in order of increasing size
   - Find the number of observations and the middle observation
   - Identify the median as this middle value
b) Simple frequency distribution
   Position of the median (n = sample size):
   - n odd: the $\frac{n + 1}{2}$-th value
   - n even: the average of the $\frac{n}{2}$-th and $\left(\frac{n}{2} + 1\right)$-th values
c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method

(i) Graphical Method

[Figure: graphical method for the median; the value read from the graph is Median = 700]

(ii) Interpolation Method

$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right) C_m$

Where:
$L_m$ = the lower boundary of the class containing the median.
$n$ = the total frequency.
$F_{m-1}$ = the cumulative frequency in the classes immediately preceding the class containing the median.
$f_m$ = the frequency in the class containing the median.
$C_m$ = the width of the class in which the median lies.

◦ Advantages
  • it is unaffected by extremely high or low values.
  • it can be used when certain end values of a set or distribution are difficult, expensive or impossible to obtain; it is particularly appropriate to ‘life’ data.
  • it can be used with non-numeric data if desired, provided the measurements can be naturally ordered.
  • it will often assume a value equal to one of the original data.
◦ Disadvantages
  • it is difficult to handle theoretically in more advanced statistical work, so its use is restricted to analysis at a basic level.
  • it fails to reflect the full range of values.

Example

a. The times taken to inspect five units coming from a production line are recorded as 13, 14, 11, 17 and 11 minutes. What is the median?

b. Find the median of the following frequency distribution.

Class        Frequency
118 - 126    3
127 - 135    5
136 - 144    9
145 - 153    12
154 - 162    5
163 - 171    4
172 - 180    2

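Solution

a. Arrange the data in increasing order: 11, 11, 13, 14, 17

Position of the median = $\frac{n + 1}{2} = \frac{5 + 1}{2} = 3$, so the median is the 3rd value: median = 13

b.
Class        f     F
118 - 126    3     3
127 - 135    5     8
136 - 144    9     17
145 - 153    12    29    (median class)
154 - 162    5     34
163 - 171    4     38
172 - 180    2     40

$\frac{n}{2} = \frac{40}{2} = 20$, so the median lies in the class 145 - 153.

$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right) C_m = 144.5 + \left(\frac{\frac{40}{2} - 17}{12}\right)(9) = 146.75 \approx 147$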
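The interpolation method for the median can be sketched as a small function. It assumes equally spaced classes given by their limits, so that the boundaries sit half a gap outside the limits; grouped_median is an illustrative name.

```python
def grouped_median(class_limits, freqs):
    """Median of grouped data by interpolation inside the median class."""
    # Gap between consecutive classes (1 for limits such as 126 and 127)
    gap = class_limits[1][0] - class_limits[0][1]
    n = sum(freqs)
    cumulative = 0  # cumulative frequency before the current class
    for (lower, upper), f in zip(class_limits, freqs):
        if f > 0 and cumulative + f >= n / 2:
            lower_boundary = lower - gap / 2
            width = (upper + gap / 2) - lower_boundary
            return lower_boundary + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")


classes = [(118, 126), (127, 135), (136, 144), (145, 153),
           (154, 162), (163, 171), (172, 180)]
freqs = [3, 5, 9, 12, 5, 4, 2]

median = grouped_median(classes, freqs)
print(median)
```

For the example distribution this gives 146.75, which the slide rounds to 147.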

3. Mode (value which occurs most often)

a) Set of data
b) Simple frequency distribution
   - Draw a frequency table for the data
   - Identify the mode as the most frequent value
   Mode = value that appears most frequently
c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method

(i) Graphical Method

[Figure: histogram; horizontal axis: Mileage (km), classes 110 - 120 to 170 - 180; vertical axis: frequency from 0 to 16; the value read from the graph is Mode = 146]

(ii) Interpolation Method

$\text{Mode} = L + \left(\frac{D_1}{D_1 + D_2}\right) C$

Where:
$L$ = the lower class boundary of the class containing the mode.
$C$ = the class width of the class containing the mode.
$D_1$ = the difference between the largest frequency and the frequency immediately preceding it ($f_0 - f_-$).
$D_2$ = the difference between the largest frequency and the frequency immediately following it ($f_0 - f_+$).

Mode

The mode of a set of data is the value which occurs most often or, equivalently, has the largest frequency.

◦ Advantages
  • it is a more appropriate average to use in situations where it is useful to know the most common value.
  • it is easy to understand, not difficult to calculate and can be used when a distribution has open-ended classes.
  • it is not affected by extreme values.
◦ Disadvantages
  • it ignores dispersion around the modal value and it does not take all the values into account.
  • it is unsuitable for further statistical analysis.
  • although it ignores extreme values, it is thought to be too much affected by the most popular class when a distribution is significantly skewed.

Example

a. Find the mode of the following frequency distribution.

Class      Frequency
1 - 3      1
4 - 6      4
7 - 9      8     (mode class)
10 - 12    6
13 - 15    3
16 - 18    1

Solution

$\text{Mode} = L + \left(\frac{D_1}{D_1 + D_2}\right) C = 6.5 + \left(\frac{8 - 4}{(8 - 4) + (8 - 6)}\right)(9.5 - 6.5) = 8.5$
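The interpolation formula for the mode can be sketched in the same style. The function below assumes equally spaced classes and a single modal class, and treats a missing neighbouring class as having frequency 0 (an assumption, since the slides do not cover that case); grouped_mode is an illustrative name.

```python
def grouped_mode(class_limits, freqs):
    """Mode of grouped data: L + D1 / (D1 + D2) * C."""
    gap = class_limits[1][0] - class_limits[0][1]
    i = freqs.index(max(freqs))                       # modal class
    before = freqs[i - 1] if i > 0 else 0             # frequency preceding it
    after = freqs[i + 1] if i < len(freqs) - 1 else 0 # frequency following it
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    lower, upper = class_limits[i]
    lower_boundary = lower - gap / 2
    width = (upper + gap / 2) - lower_boundary
    return lower_boundary + d1 / (d1 + d2) * width


classes = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
freqs = [1, 4, 8, 6, 3, 1]

mode = grouped_mode(classes, freqs)
print(mode)
```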

MEASURE OF DISPERSION

1. Range = maximum value – minimum value

2. Standard deviation

a) Set of data
   $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
   - Calculate the mean value
   - Find the deviation of each observation from this mean
   - Square these deviations
   - Add the squares
   - Divide this sum by the number of observations
   - Take the square root of the value obtained
b) Simple frequency distribution
   $s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
c) Grouped frequency
   $s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$ where x = class mid-point

[Figure: Comparing standard deviation]

3. Variance

a) Set of data
   $v = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
b) Simple frequency distribution
   $s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$
c) Grouped frequency
   $s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$ where x = class mid-point

Example

a. Find the variance and standard deviation of the following data:

Class        Frequency
0 - 4.9      3
5 - 9.9      5
10 - 14.9    7
15 - 19.9    6
20 - 24.9    2

Solution

Class        f     x        x^2         fx        fx^2
0 - 4.9      3     2.45     6.0025      7.35      18.0075
5 - 9.9      5     7.45     55.5025     37.25     277.5125
10 - 14.9    7     12.45    155.0025    87.15     1085.0175
15 - 19.9    6     17.45    304.5025    104.7     1827.015
20 - 24.9    2     22.45    504.0025    44.9      1008.005
Total        23                         281.35    4215.5575

$s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{4215.5575}{23} - \left(\frac{281.35}{23}\right)^2 = 33.6484 \approx 33.65$

$s = \sqrt{s^2} = \sqrt{33.6484} \approx 5.8$
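The grouped variance calculation can be verified with a short sketch that follows the slide formula (dividing by the total frequency). The function name grouped_variance is illustrative.

```python
import math


def grouped_variance(class_limits, freqs):
    """Variance of grouped data: sum(f x^2)/sum(f) - (sum(f x)/sum(f))^2."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    total = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return sum_fx2 / total - (sum_fx / total) ** 2


classes = [(0, 4.9), (5, 9.9), (10, 14.9), (15, 19.9), (20, 24.9)]
freqs = [3, 5, 7, 6, 2]

variance = grouped_variance(classes, freqs)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 1))
```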

4. Chebyshev’s Theorem

- By using the mean and standard deviation, we can find the proportion or percentage of the total observations that fall within a given interval about the mean using Chebyshev’s theorem.
- For any number k greater than 1, at least $\left(1 - \frac{1}{k^2}\right)$ of the data values lie within k standard deviations of the mean.

[Figure: at least $(1 - 1/k^2)$ of the values lie in the shaded area between $\mu - k\sigma$ and $\mu + k\sigma$]

Example

The average systolic blood pressure for 4000 women who were screened for high blood pressure was found to be 187 with a standard deviation of 22. Using Chebyshev’s theorem, find at least what percentage of women in this group have a systolic blood pressure between 143 and 231.

Solution:

$\mu = 187$ and $\sigma = 22$

To find the percentage of women with blood pressure between 143 and 231, first find how far each endpoint is from the mean:

143 - 187 = -44 and 231 - 187 = 44

k is obtained by dividing the distance between the mean and each endpoint by the standard deviation:

$k = \frac{44}{22} = 2$

$1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = 0.75$

At least 75% of the women have systolic blood pressure between 143 and 231.

[Figure: interval from 143 = $\mu - 2\sigma$ to 231 = $\mu + 2\sigma$, centred at $\mu = 187$]
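The Chebyshev calculation can be sketched as a function. When the interval is not symmetric about the mean, the sketch uses the shorter distance, which keeps the bound valid; that choice and the function name are assumptions for illustration, not something stated in the slides.

```python
def chebyshev_lower_bound(mean, std_dev, low, high):
    """Minimum proportion of values in [low, high] by Chebyshev's theorem."""
    # k = distance from the mean in standard deviations (shorter side)
    k = min(mean - low, high - mean) / std_dev
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k ** 2


proportion = chebyshev_lower_bound(mean=187, std_dev=22, low=143, high=231)
print(f"at least {proportion:.0%}")
```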

5. Empirical Rule

- The empirical rule applies only to a specific type of distribution called a bell-shaped distribution, also known as the normal curve.

• 68% of the observations lie within one standard deviation of the mean
• 95% of the observations lie within two standard deviations of the mean
• 99.7% of the observations lie within three standard deviations of the mean

[Figure: bell-shaped curve with the 68%, 95% and 99.7% regions marked between $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$ and $\mu + 3\sigma$]

Example 1

The age distribution of a sample of 5000 persons is bell shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.

Solution:

$\bar{x} = 40$ and $s = 12$

To find the percentage of people aged between 16 and 64, find how far each endpoint is from the mean:

16 - 40 = -24 and 64 - 40 = 24

Dividing the distance, 24, by the standard deviation, 12, shows that the distance is equal to 2s:

$\frac{24}{12} = 2$

so that 16 = $\bar{x} - 2s$ and 64 = $\bar{x} + 2s$.

Because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the people in the sample are 16 to 64 years old.

Example 2

Assume the incomes of all single-parent households last year produce a bell-shaped distribution with a mean of RM23,500 and a standard deviation of RM4,500. Determine the range of income that covers:

68% of households = (RM19,000, RM28,000)
95% of households = (RM14,500, RM32,500)
99.7% of households = (RM10,000, RM37,000)
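The three income ranges in Example 2 follow directly from the empirical rule. A minimal sketch, with empirical_intervals as an illustrative name:

```python
def empirical_intervals(mean, std_dev):
    """Intervals covering about 68%, 95% and 99.7% of a bell-shaped distribution."""
    return {
        pct: (mean - k * std_dev, mean + k * std_dev)
        for k, pct in ((1, 68), (2, 95), (3, 99.7))
    }


intervals = empirical_intervals(mean=23_500, std_dev=4_500)
for pct, (low, high) in intervals.items():
    print(f"{pct}%: (RM{low:,}, RM{high:,})")
```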

6. Coefficient of variation

$CV = \frac{\text{standard deviation } (s)}{\bar{x}} \times 100\%$

• The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.

• Investopedia explains the coefficient of variation (CV) as follows: in the investing world, the coefficient of variation allows you to determine how much volatility (risk) you are assuming in comparison to the amount of return you can expect from your investment. In simple language, the lower the ratio of standard deviation to mean return, the better your risk-return tradeoff.

Comparing coefficient of variation: the higher the coefficient of variation, the more dispersed are the data.

Example 2

New Car: Mean = RM20,100, Standard deviation = RM6,125
Used Car: Mean = RM5,485, Standard deviation = RM2,730
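The slide gives only the means and standard deviations for the two car groups. As a sketch, the coefficient of variation formula can be applied to both; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv_new = coefficient_of_variation(std_dev=6_125, mean=20_100)
cv_used = coefficient_of_variation(std_dev=2_730, mean=5_485)

print(f"New car CV:  {cv_new:.2f}%")
print(f"Used car CV: {cv_used:.2f}%")
print("More dispersed:", "used cars" if cv_used > cv_new else "new cars")
```

Although the used car prices have the smaller standard deviation, their coefficient of variation is larger, so relative to the mean they are the more dispersed group.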

7. Quartile Deviation

- Quartiles are defined as the values which quarter the data
  $Q_1$ - first quartile - the value below which 25% of the observations lie
  $Q_2$ - second quartile - the value below which half of the data lie (median)
  $Q_3$ - third quartile - the value below which 75% of the observations lie

Quartile Deviation = $\frac{Q_3 - Q_1}{2}$
Inter-quartile range = $Q_3 - Q_1$

a) Set of data
b) Simple frequency distribution
   Position of $Q_1 = \frac{n + 1}{4}$ and position of $Q_3 = \frac{3(n + 1)}{4}$
c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method

(i) Graphical Method

[Figure: cumulative frequency F plotted against x; Q1 and Q3 are read off at the heights n/4 and 3n/4]

(ii) Interpolation Method

$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}}\right) C_{Q_1}$

Where:
$L_{Q_1}$ = the lower boundary of the class containing Q1.
$n$ = the total frequency.
$F_{Q_1 - 1}$ = the cumulative frequency in the classes immediately preceding the class containing Q1.
$f_{Q_1}$ = the frequency in the class containing Q1.
$C_{Q_1}$ = the width of the class in which Q1 lies.

$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}}\right) C_{Q_3}$

Where:
$L_{Q_3}$ = the lower boundary of the class containing Q3.
$n$ = the total frequency.
$F_{Q_3 - 1}$ = the cumulative frequency in the classes immediately preceding the class containing Q3.
$f_{Q_3}$ = the frequency in the class containing Q3.
$C_{Q_3}$ = the width of the class in which Q3 lies.

Example

a. Find the quartile deviation of the following data:

Class        f      F
0 - 9.9      5      5
10 - 19.9    19     24
20 - 29.9    38     62
30 - 39.9    43     105
40 - 49.9    34     139
50 - 59.9    17     156
60 - 69.9    4      160

Solution

Position of $Q_1 = \frac{n}{4} = \frac{160}{4} = 40$, so Q1 lies in the class 20 - 29.9.
Position of $Q_3 = \frac{3n}{4} = \frac{3(160)}{4} = 120$, so Q3 lies in the class 40 - 49.9.

$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}}\right) C_{Q_1} = 19.95 + \left(\frac{\frac{160}{4} - 24}{38}\right)(10) = 24.16$

$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}}\right) C_{Q_3} = 39.95 + \left(\frac{\frac{3(160)}{4} - 105}{34}\right)(10) = 44.36$

Therefore the quartile deviation is

Quartile Deviation = $\frac{Q_3 - Q_1}{2} = \frac{44.36 - 24.16}{2} = 10.1$
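The quartile calculation for grouped data can be sketched with one general interpolation function, called with 0.25 for Q1 and 0.75 for Q3. It assumes equally spaced classes given by their limits; grouped_quantile is an illustrative name.

```python
def grouped_quantile(class_limits, freqs, p):
    """Value below which a proportion p of grouped data lies (interpolation)."""
    gap = class_limits[1][0] - class_limits[0][1]   # 0.1 for 9.9 and 10
    target = p * sum(freqs)                         # n/4 for Q1, 3n/4 for Q3
    cumulative = 0                                  # frequency before this class
    for (lower, upper), f in zip(class_limits, freqs):
        if f > 0 and cumulative + f >= target:
            lower_boundary = lower - gap / 2
            width = (upper + gap / 2) - lower_boundary
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")


classes = [(0, 9.9), (10, 19.9), (20, 29.9), (30, 39.9),
           (40, 49.9), (50, 59.9), (60, 69.9)]
freqs = [5, 19, 38, 43, 34, 17, 4]

q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
quartile_deviation = (q3 - q1) / 2

print(round(q1, 2), round(q3, 2), round(quartile_deviation, 1))
```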

MEASURE OF SKEWNESS

• Skewness is the degree of asymmetry.
• It is a method to describe a data distribution.
• Data which are not symmetrical may be either positively or negatively skewed.

[Figure: negative skewness; positive skewness]

[Figure: Symmetric Histogram: mean, median and mode coincide; Positive Skewed Histogram: mode, median, mean from left to right; negatively skewed histogram: mean, median, mode from left to right]

Example

a. What type of distribution is described by the following information?

Mean = 56
Median = 58.1
Mode = 63

Answer: Negatively skewed

b.
1 | 1 1 2 2 3 3 4 5 6 7
2 | 3 4 4 5 6 6
3 | 1 1 2 2 2 3
4 | 0 0 1

Based on the stem-and-leaf plot above, find the i) median, ii) mode, iii) mean and iv) describe the shape of the distribution.

Answer: i) 24 ii) 32 iii) 23.76 iv) Negatively skewed distribution

c.
Class        Frequency
0 - 100      5
100 - 200    19
200 - 300    38
300 - 400    43
400 - 500    34

Based on the distribution table i) construct a histogram, and ii) describe the shape of the distribution.

[Figure: Curve A, Curve B and Curve C]

Curve A: Positively Skewed
Curve B: Negatively Skewed

BOX-AND-WHISKER PLOT

A plot that shows the center, spread and skewness of a data set. It is constructed by drawing a box and two whiskers that use the median, the first quartile, the third quartile, and the smallest and largest values in the data set that lie between the lower and upper inner fences.

[Figure: box-and-whisker plot marked Minimum, Q1, Q2, Q3, Maximum]

Example

The following data are the incomes (in thousands of dollars) for a sample of 12 households.

35 104 29 39 44 58 72 34 64 41 50 54

Construct a box-and-whisker plot for these data.

Solution:

Step 1: Rank the data and find the median, the quartiles and the interquartile range (IQR)

29 34 35 39 41 44 50 54 58 64 72 104

median = (44 + 50)/2 = 47
Q1 = (35 + 39)/2 = 37
Q3 = (58 + 64)/2 = 61
IQR = Q3 - Q1 = 61 - 37 = 24

Step 2: Determine the lower and upper inner fences

1.5 × IQR = 1.5 × 24 = 36
Lower inner fence = Q1 - 36 = 37 - 36 = 1
Upper inner fence = Q3 + 36 = 61 + 36 = 97

Step 3: Determine the smallest and the largest values in the data set within the two inner fences

Smallest value = 29
Largest value = 72

Step 4: Draw the box, marking the first quartile, the median and the third quartile

[Figure: box drawn above a scale from 25 to 105]

Step 5: Join the box to the smallest and largest values within the two inner fences; these two lines are called whiskers

[Figure: completed box-and-whisker plot above a scale from 25 to 105; the value 104 is marked * as an outlier]

The data are skewed to the right.

Outlier: a value that falls outside the two inner fences (a value that is very small or very large relative to the rest of the data).
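The numbers needed for the box-and-whisker plot can be computed as below. The quartiles are taken as the medians of the lower and upper halves of the ranked data, which matches the worked solution; other software may use slightly different quartile conventions. The function name is illustrative.

```python
from statistics import median

incomes = [35, 104, 29, 39, 44, 58, 72, 34, 64, 41, 50, 54]


def box_plot_summary(values):
    """Median, quartiles, inner fences, whisker ends and outliers."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])      # median of the lower half
    q2 = median(data)
    q3 = median(data[-half:])     # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


summary = box_plot_summary(incomes)
for name, value in summary.items():
    print(name, value)
```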

[Figure: three distributions with skewness S < 0 (Negatively Skewed), S = 0 (Symmetric, Not Skewed) and S > 0 (Positively Skewed)]

[Figure: box-and-whisker plots for Left-Skewed, Symmetric and Right-Skewed data, each marked Q1, Q2, Q3]

• Median close to the center of the box - symmetrical
• Median to the left of the center of the box - positively skewed
• Median to the right of the center of the box - negatively skewed
• Whiskers are the same length - symmetrical
• Right whisker is longer than the left whisker - positively skewed
• Left whisker is longer than the right whisker - negatively skewed

• A bimodal distribution has two modes.
• All classes occur with approximately the same frequency in a uniform distribution.
• An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph.

POPULATION VERSUS SAMPLE

Measurement           Sample       Population
Mean                  $\bar{x}$    $\mu$
Standard deviation    $s$          $\sigma$
Variance              $s^2$        $\sigma^2$

1. The following are the ages of all eight employees of a small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees. (POPULATION)

2. The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:

10 5 19 8 3

Find the median. (SAMPLE)

Example 3. The table below gives the 2002 payrolls (in millions of dollars) of five MLB teams.

MLB Team                 2002 Total Payroll (millions of dollars)
Anaheim Angels           62
Atlanta Braves           93
New York Yankees         126
St Louis Cardinals       75
Tampa Bay Devil Rays     34

Find the variance and standard deviation of these data. (SAMPLE)
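The slides mark each exercise as POPULATION or SAMPLE but do not show the corresponding formulas. The sketch below uses Python's standard statistics module and assumes the usual convention that a sample variance divides by n - 1 while a population variance divides by n; that convention is an assumption here, not something stated in the slides.

```python
import statistics

# Exercise 1: ages of all eight employees (population mean)
ages = [53, 32, 61, 27, 39, 44, 49, 57]
mean_age = statistics.mean(ages)

# Exercise 2: weight lost by a sample of five members (median)
weight_lost = [10, 5, 19, 8, 3]
median_loss = statistics.median(weight_lost)

# Example 3: payrolls of five MLB teams, treated as a sample
payrolls = [62, 93, 126, 75, 34]
sample_variance = statistics.variance(payrolls)   # divides by n - 1
sample_std_dev = statistics.stdev(payrolls)

# For comparison only: the same data treated as a whole population
population_variance = statistics.pvariance(payrolls)   # divides by n

print(mean_age, median_loss)
print(sample_variance, round(sample_std_dev, 2), population_variance)
```

The gap between the two variance figures shows why it matters whether a data set is labelled as a sample or as a population.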

Example 4. Following are the 2002 earnings (in thousands of dollars) before taxes for all six employees of a small company.

48.50 38.40 65.50 22.6

Calculate the variance and standard deviation for these data. (POPULATION)