Dispersion - WordPress.com


In the words of Bowley, “Dispersion is the measure of the variation of the items.” According to Conar, “Dispersion is a measure of the extent to which the individual items vary.”



Measures of dispersion

Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other.

- The more similar the scores are to each other, the lower the measure of dispersion will be.
- The less similar the scores are to each other, the higher the measure of dispersion will be.
- In general, the more spread out a distribution is, the larger the measure of dispersion will be.

 Which of the distributions of scores has the larger dispersion?

The upper distribution has more dispersion because the scores are more spread out. That is, they are less similar to each other.

[Figure: two frequency distributions of scores, each with scores 1 to 10 on the horizontal axis and frequencies 0 to 125 on the vertical axis]

The following are the main methods of measuring dispersion:

- Range
- Interquartile Range and Quartile Deviation
- Mean Deviation
- Standard Deviation
- Coefficient of Variation
- Lorenz Curve

The Range is defined as the difference between the largest score in the set of data and the smallest score in the set of data, $X_L - X_S$.

What is the range of the following data: 4 8 1 6 6 2 9 3 6 9?

The largest score ($X_L$) is 9; the smallest score ($X_S$) is 1; the range is $X_L - X_S = 9 - 1 = 8$.

The range is used when

- you have ordinal data, or
- you are presenting your results to people with little or no knowledge of statistics.

The range is rarely used in scientific work as it is fairly insensitive:

- It depends on only two scores in the set of data, $X_L$ and $X_S$.
- Two very different sets of data can have the same range: 1 1 1 1 9 vs 1 3 5 7 9.
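The range calculation above can be sketched in a few lines of Python. The function name `data_range` is an illustrative choice, not something from the slides; the data sets are the ones quoted in the text.

```python
def data_range(scores):
    """Return the range: largest score minus smallest score."""
    return max(scores) - min(scores)


# Worked example from the slides.
scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]
print(data_range(scores))  # 8

# Two very different data sets that share the same range.
clustered = [1, 1, 1, 1, 9]
spread = [1, 3, 5, 7, 9]
print(data_range(clustered), data_range(spread))  # 8 8
```

Because only the two extreme scores enter the calculation, the clustered and the evenly spread data sets are indistinguishable by range alone.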

Interquartile range

The interquartile range (IR) is defined as the difference of the upper and lower quartiles.

Example: Upper quartile = $Q_3$, Lower quartile = $Q_1$

Interquartile Range = $Q_3 - Q_1$

Quartile Deviation, also called the semi-interquartile range (SIR), is half of the difference between the upper and lower quartiles.

Example: Quartile Deviation = $(Q_3 - Q_1) / 2$

What is the SIR for the following data?

2 4 6 8 10 12 14 20 30 60

- 25% of the scores are below 5, so 5 is the first quartile (25th percentile).
- 25% of the scores are above 25, so 25 is the third quartile (75th percentile).

SIR = $(Q_3 - Q_1) / 2 = (25 - 5) / 2 = 10$

The relative measure of quartile deviation is called the Coefficient of Quartile Deviation.

Example: Coefficient of Q.D. = $(Q_3 - Q_1) / (Q_3 + Q_1)$
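As a minimal sketch, the three quartile-based measures can be computed together once the quartiles are known. The function name `quartile_measures` is hypothetical, and the quartile values 5 and 25 are taken directly from the worked example in the slides rather than recomputed, since different quartile conventions can give slightly different values for the same data.

```python
def quartile_measures(q1, q3):
    """Return (interquartile range, quartile deviation, coefficient of QD)."""
    iqr = q3 - q1                 # Q3 - Q1
    qd = iqr / 2                  # (Q3 - Q1) / 2, the semi-interquartile range
    coefficient = iqr / (q3 + q1) # (Q3 - Q1) / (Q3 + Q1), a unit-free ratio
    return iqr, qd, coefficient


# Quartiles as stated in the slides: Q1 = 5, Q3 = 25.
iqr, qd, coefficient = quartile_measures(5, 25)
print(iqr, qd, round(coefficient, 3))  # 20 10.0 0.667
```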

Mean Deviation is also known as average deviation. The deviations may be taken from any average, usually the Mean, Median or Mode.

While taking deviations we ignore negative signs and treat all deviations as positive.

The formula of MD is given below:

MD = $\sum |d| / N$ (deviation taken from mean)
MD = $\sum |m| / N$ (deviation taken from median)
MD = $\sum |z| / N$ (deviation taken from mode)

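| Class | Midpoint $x_i$ | Frequency $f_i$ | $x_i f_i$ | $\lvert x_i - \bar{x} \rvert$ | $\lvert x_i - \bar{x} \rvert f_i$ |
|---|---|---|---|---|---|
| 10-15 | 12.5 | 3 | 37.5 | 9.286 | 27.85 |
| 15-20 | 17.5 | 5 | 87.5 | 4.286 | 21.43 |
| 20-25 | 22.5 | 7 | 157.5 | 0.714 | 4.99 |
| 25-30 | 27.5 | 4 | 110 | 5.714 | 22.85 |
| 30-35 | 32.5 | 2 | 65 | 10.714 | 21.42 |
| Total | | 21 | 457.5 | 30.714 | 98.57 |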

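Solution: the mean is $\bar{x} = \sum x_i f_i / N = 457.5 / 21 = 21.786$. For grouped data each absolute deviation must be weighted by its frequency, so MD (deviation taken from mean) = $\sum f_i \lvert x_i - \bar{x} \rvert / N = 98.57 / 21 = 4.694$.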
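The grouped-data calculation can be checked with a short Python sketch. The function name `grouped_mean_deviation` is an illustrative assumption; the midpoints and frequencies are those of the table. The sketch weights each absolute deviation by its class frequency before dividing by the total frequency.

```python
def grouped_mean_deviation(midpoints, frequencies):
    """Return (mean, mean deviation about the mean) for grouped data."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    # Weight each absolute deviation by the class frequency.
    weighted_abs_dev = sum(f * abs(x - mean) for x, f in zip(midpoints, frequencies))
    return mean, weighted_abs_dev / n


midpoints = [12.5, 17.5, 22.5, 27.5, 32.5]
frequencies = [3, 5, 7, 4, 2]
mean, md = grouped_mean_deviation(midpoints, frequencies)
print(round(mean, 3), round(md, 3))  # 21.786 4.694
```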

- When the deviate scores are squared in variance, their unit of measure is squared as well.
- E.g. if people’s weights are measured in pounds, then the variance of the weights would be expressed in pounds² (or squared pounds).
- Since squared units of measure are often awkward to deal with, the square root of variance is often used instead.
- The standard deviation is the square root of variance.

Merits of standard deviation:

- It is a very popular scientific measure of dispersion.
- From SD we can calculate Skewness, Correlation, etc.
- It considers all the items of the series.
- The squaring of deviations makes them positive, so the difficulty about algebraic signs that was experienced in the case of mean deviation is not found here.

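Demerits of standard deviation:

- Calculation is difficult, not as easy as Range and QD.
- It always depends on the AM (arithmetic mean).
- Extreme items gain great importance.

The formula of SD is $\sigma = \sqrt{\sum d^2 / N}$, where $d$ is the deviation of an item from the arithmetic mean.

Problem: Calculate the Standard Deviation of the following series: X = 40, 44, 54, 60, 62, 64, 70, 80, 90, 96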
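One way to work the problem above is the following Python sketch, which applies the formula $\sqrt{\sum d^2 / N}$ with deviations taken from the arithmetic mean. The function name `population_sd` is an assumption made for illustration.

```python
import math


def population_sd(values):
    """Standard deviation as sqrt(sum(d^2) / N), with d measured from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


series = [40, 44, 54, 60, 62, 64, 70, 80, 90, 96]
print(sum(series) / len(series))         # 66.0 (arithmetic mean)
print(round(population_sd(series), 2))   # 17.46
```

Here the squared deviations sum to 3048, so the variance is 304.8 and the standard deviation is about 17.46.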

Standard deviation = $\sqrt{\text{variance}}$

Variance = (standard deviation)$^2$ = (S.D.)$^2$

When calculating variance, it is often easier to use a computational formula which is algebraically equivalent to the definitional formula:

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{\sum (X - \mu)^2}{N}$$

where $\sigma^2$ is the population variance, $X$ is a score, $\mu$ is the population mean, and $N$ is the number of scores.

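| $X$ | $X^2$ | $X - \mu$ | $(X - \mu)^2$ |
|---|---|---|---|
| 9 | 81 | 2 | 4 |
| 8 | 64 | 1 | 1 |
| 6 | 36 | -1 | 1 |
| 5 | 25 | -2 | 4 |
| 8 | 64 | 1 | 1 |
| 6 | 36 | -1 | 1 |
| $\sum = 42$ | $\sum = 306$ | $\sum = 0$ | $\sum = 12$ |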

 2   X 2    2 N N   306  6 42 2 6 306  6 294  12  2 6  2    X    2 N  12 6  2 22



Variance is defined as the average of the squared deviations:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

- First, the formula says to subtract the mean from each of the scores.
- This difference is called a deviate or a deviation score.
- The deviate tells us how far a given score is from the typical, or average, score.
- Thus, the deviate is a measure of dispersion for a given score.

Why can’t we simply take the average of the deviates? That is, why isn’t variance defined as:

$$\sigma^2 \ne \frac{\sum (X - \mu)}{N}$$

This is not the formula for variance!

- One of the definitions of the mean was that it always made the sum of the scores minus the mean equal to 0.
- Thus, the average of the deviates must be 0, since the sum of the deviates must equal 0.
- To avoid this problem, statisticians square the deviate scores prior to averaging them.
- Squaring the deviate scores makes all the squared scores non-negative.
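A quick numerical check illustrates why the plain average of the deviates is useless as a measure of spread. This is a minimal sketch using the six scores from the earlier worked example; the variable names are illustrative.

```python
scores = [9, 8, 6, 5, 8, 6]
mu = sum(scores) / len(scores)  # population mean, 7.0

deviates = [x - mu for x in scores]
squared_deviates = [d ** 2 for d in deviates]

# The deviates cancel out, so their sum (and average) is zero.
print(sum(deviates))          # 0.0
# Squaring removes the signs, so the squared deviates do not cancel.
print(sum(squared_deviates))  # 12.0
```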

- Variance is the mean of the squared deviation scores.
- The larger the variance is, the more the scores deviate, on average, away from the mean.
- The smaller the variance is, the less the scores deviate, on average, from the mean.

Because the sample mean is not a perfect estimate of the population mean, the formula for the variance of a sample is slightly different from the formula for the variance of a population:

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$

where $s^2$ is the sample variance, $X$ is a score, $\bar{X}$ is the sample mean, and $N$ is the number of scores.
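Python's standard `statistics` module provides both versions: `pvariance` divides by $N$ and `variance` divides by $N - 1$. The sketch below treats the six scores from the earlier example first as a whole population and then as a sample, purely for illustration.

```python
import statistics

scores = [9, 8, 6, 5, 8, 6]

population_variance = statistics.pvariance(scores)  # divides by N
sample_variance = statistics.variance(scores)       # divides by N - 1

print(f"{population_variance:.1f}")  # 2.0
print(f"{sample_variance:.1f}")      # 2.4
```

The sum of squared deviations is 12 in both cases; only the divisor changes (6 versus 5), so the sample variance is always somewhat larger.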

The Lorenz curve is a graphical method of studying dispersion. It was given by the famous statistician Max O. Lorenz. It has great utility in the study of the degree of inequality in the distribution of income and wealth.
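The slides do not show how the curve is constructed, so the following is only a rough sketch of one common construction: sort the incomes, then pair the cumulative share of people with the cumulative share of income. The function name `lorenz_points` and the income figures are hypothetical.

```python
def lorenz_points(incomes):
    """Return (cumulative population share, cumulative income share) pairs."""
    ordered = sorted(incomes)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]  # the curve starts at the origin
    running = 0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points


incomes = [10, 20, 30, 40]  # hypothetical incomes, not from the slides
for population_share, income_share in lorenz_points(incomes):
    print(f"{population_share:.2f} {income_share:.2f}")
```

Plotting these points against the diagonal line of perfect equality shows the degree of inequality: the further the curve bends below the diagonal, the more unequal the distribution.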