EPSY 439 - Texas A&M University


EPSY 439

Variability

Range Variance Standard Deviation

z-score

Concept Map

Variability - Chapter 3

Index of Variability
Distribution
Single Score
Range
Standard Deviation
Variance
z Scores
Depends only on extreme scores
Same mean, same range, different form
Computation: Population variability
Deviation-Score
Scores: (X - Mean)
Computation: Raw-Score
Deviation
Graphing Standard Deviations
Measure of a sample's variability
Computation: Deviation-Score
Estimate of a population's variability
Computation: Raw-Score
Concept of N - 1
Describes the relation of an X to the Mean with respect to the variability of the distribution
Represents # of SDs a score is above or below the Mean
Gives you a way to compare raw scores
Standard Score
Range between -3 and +3

April 24, 2020 Copyright 2000 - Robert J. Hall

Overview

The mean, as a single value, is a useful measure for describing
• the center of a distribution and
• where the most frequent scores in a distribution are located.

It does not tell us very much, however, about
• scores away from the center of the distribution and/or
• scores that occur infrequently.

Overview

Thus, to describe a data set accurately, we need to know not only where scores are centered but also how much individual scores within the distribution differ from one another. Measures of variability, then, provide context (spread/consistency) for measures of center.

Overview

Three different distributions have the same mean score

Sample A: 0, 2, 6, 10, 12 ($\bar{X} = 6$)
Sample B: 8, 7, 6, 5, 4 ($\bar{X} = 6$)
Sample C: 6, 6, 6, 6, 6 ($\bar{X} = 6$)

Each of the three samples has a mean of 6, so if you did not look at the raw scores, you might think that we have three identical distributions. Clearly, they are not the same.

Overview

Distance between the locations of scores in three distributions

Sample A ($\bar{X} = 6$): 0, 2, 6, 10, 12
Sample B ($\bar{X} = 6$): 4, 5, 6, 7, 8
Sample C ($\bar{X} = 6$): 6, 6, 6, 6, 6

[Figure: the scores of each sample marked with an X on a number line running from 0 to 12]

Here you can see differences that separate scores in the three samples.
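The comparison of the three samples can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics` module; the helper name `describe` is hypothetical and not part of the course material. It shows that the means agree while two simple indices of spread do not.

```python
from statistics import mean, pstdev

# The three samples from the overview slides.
samples = {
    "A": [0, 2, 6, 10, 12],
    "B": [4, 5, 6, 7, 8],
    "C": [6, 6, 6, 6, 6],
}

def describe(scores):
    """Return (mean, max - min, standard deviation dividing by N)."""
    return mean(scores), max(scores) - min(scores), pstdev(scores)

for name, scores in samples.items():
    m, spread, sd = describe(scores)
    print(f"Sample {name}: mean = {m}, max - min = {spread}, SD = {sd:.2f}")
```

All three samples report a mean of 6, but the spread shrinks from Sample A to Sample B and vanishes for Sample C.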

Statistical Model of Variability

The statistical model for abnormality requires two statistical concepts:
• a measure of the average and
• a measure of the deviation from the average

– We have examined the standard techniques for defining the average score; we will now focus on methods for measuring deviations.

– The fact that scores deviate from average means that they are variable. Variability, then, is one of the most basic statistical concepts.

Meaning of Variability

The term variability has much the same meaning in statistics as it has in everyday language; to say that things are variable means that they are not all the same.

• In statistics, our goal is to measure the amount of variability for a particular set of scores, a distribution.

– Are scores all clustered together, or are they scattered over a wide range of values?

Measuring Variability

A good measure of variability should:
• provide an accurate picture of the spread of the distribution, and
• give an indication of how well an individual score (or group of scores) represents the entire distribution.

– In inferential statistics, where we often use small samples to answer general questions about large populations, variability is a particularly important concern. Why?

Types of Variability

In this section we will consider three different measures of variability:
• the range,
– interquartile range
• the standard deviation, and
• the variance

Of these three, the standard deviation is by far the most important.

Range

• The range is the distance between the largest score ($X_{max}$) and the smallest score in the distribution ($X_{min}$).
• In determining this distance, you must also take into account the real limits of the maximum and minimum values.
• The range, therefore, is computed as:

$\text{range} = \text{URL}\,X_{max} - \text{LRL}\,X_{min}$

Range

• The range is the most obvious way of describing how spread out the scores are in the distribution.

*** Problem ***
• Range is completely determined by the two extreme values and ignores the other scores in the distribution.

Range

Example:
• The following two distributions have exactly the same range (10 points in each case) but
– Distribution 1 clusters scores at the upper end of the range
– Distribution 2 spreads scores out over the range

Distribution 1: 1, 8, 9, 9, 10, 10
Distribution 2: 1, 2, 4, 6, 8, 10
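The range computation with real limits can be sketched in a few lines. This is an illustrative sketch only: it assumes scores are recorded to the nearest whole unit, so the upper real limit (URL) is half a unit above the maximum and the lower real limit (LRL) is half a unit below the minimum. The function name `range_with_real_limits` and the `unit` parameter are hypothetical.

```python
def range_with_real_limits(scores, unit=1.0):
    """Range = URL of the largest score minus LRL of the smallest score.

    Assumes each score is measured to the nearest `unit`, so the real
    limits lie half a unit beyond the recorded extreme scores.
    """
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit

distribution_1 = [1, 8, 9, 9, 10, 10]
distribution_2 = [1, 2, 4, 6, 8, 10]

print("Distribution 1 range:", range_with_real_limits(distribution_1))
print("Distribution 2 range:", range_with_real_limits(distribution_2))
```

Both calls give 10.5 - 0.5 = 10 points, because only the two extreme scores enter the computation; rearranging the interior scores leaves the result unchanged.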

Range

Summary
• Because the range does not consider all the scores in the distribution, it often does not give an accurate description of the variability for the entire distribution.
• Generally, the range is considered to be a crude and unreliable measure of variability.
– John Tukey has developed ways of looking at data that give prominence to the dispersion of data using what are called the boxplot and interquartile range.

Interquartile Range

• A distribution can be divided into four equal parts using quartiles.
• The first quartile ($Q_1$) is the score that separates the lower 25% of the distribution from the rest.
• The second quartile ($Q_2$, or median) is the score that has exactly two quarters, or 50%, of the distribution below it.

Interquartile Range

• Finally, the third quartile ($Q_3$) is the score that divides the bottom three-fourths of the distribution from the top quarter.
• The interquartile range is the distance between the first quartile and the third quartile:

$\text{interquartile range} = Q_3 - Q_1$

Interquartile Range

[Figure: frequency distribution for a set of 16 scores (N = 16). The bottom 25% of the scores lie below $Q_1 = 4.5$ and the top 25% lie above $Q_3 = 8$; the interquartile range is 3.5 points.]

Boxplot

[Figure: boxplot marking Min, $Q_1 = 4.5$, $Q_2 = 6$, $Q_3 = 8$, and Max, with IQR = 3.5]

Construction of Boxplot

• Draw a number line that includes the range of observations.
• Compute $Q_1$, $Q_2$, and $Q_3$.
• Above the line drawn in the first step, draw a box extending from $Q_1$ to $Q_3$.
• Inside the box, draw a line at the median.

Construction of Boxplot

• To identify the outliers, compute the lower and upper fences, which are located at 1.5(IQR) below $Q_1$ and above $Q_3$. That is, the lower fence is

$L = Q_1 - 1.5(\text{IQR})$

and the upper fence is

$U = Q_3 + 1.5(\text{IQR})$

Construction of Boxplot

• Observations located beyond the fences are classified as outliers and are identified with an asterisk (*).
• If there are no outliers, extend horizontal line segments (whiskers) from the ends of the box to the smallest and largest observations.
• If there are outliers, extend the whiskers to the smallest and largest non-outliers.

Example

Temperatures at time of space shuttle launches:

53 57 58 63 66 67 67 67 68 69 70 70
70 70 72 73 75 75 76 76 78 79 80 81

Five number summary:

Low 53, $Q_1$ 66.84, Median 70, $Q_3$ 75.5, High 81

Example - Frequency Distribution

Median: the 50th percentile falls in the score interval 70, whose real limits are 69.5 (cumulative 41.7%) and 70.5 (cumulative 58.3%).

$$\text{Median} = 69.5 + \frac{50 - 41.7}{16.6}(1) = 69.5 + \frac{8.3}{16.6}(1) = 69.5 + 0.5 = 70$$

Example - Frequency Distribution

First Quartile: the 25th percentile falls in the score interval 67, whose real limits are 66.5 (cumulative 20.8%) and 67.5 (cumulative 33.3%).

$$Q_1 = 66.5 + \frac{25 - 20.8}{12.5}(1) = 66.5 + \frac{4.2}{12.5}(1) = 66.5 + 0.34 = 66.84$$

Example - Frequency Distribution

Third Quartile: the 75th percentile falls in the score interval 75, whose real limits are 74.5 (cumulative 66.7%) and 75.5 (cumulative 75.0%).

$$Q_3 = 74.5 + \frac{75.0 - 66.7}{8.3}(1) = 74.5 + \frac{8.3}{8.3}(1) = 74.5 + 1.00 = 75.5$$
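The quartiles in this example appear to be obtained by interpolating within the real limits of the score interval that contains the target percentile. The sketch below is one way to express that procedure, assuming whole-unit scores; the function name `percentile_score` is hypothetical. Because it works from exact frequencies rather than rounded percentages, it returns 66.83 for the first quartile where the five number summary reports 66.84.

```python
from collections import Counter

temperatures = [53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81]

def percentile_score(scores, percent, unit=1.0):
    """Score at a cumulative percent, interpolating within real limits."""
    counts = Counter(scores)
    target = percent / 100 * len(scores)   # cumulative frequency to reach
    below = 0                              # cumulative frequency below the interval
    for value in sorted(counts):
        freq = counts[value]
        if below + freq >= target:
            lower_real_limit = value - unit / 2
            # Move into the interval in proportion to the frequency still needed.
            return lower_real_limit + (target - below) / freq * unit
        below += freq
    raise ValueError("percent must be between 0 and 100")

q1 = percentile_score(temperatures, 25)
median = percentile_score(temperatures, 50)
q3 = percentile_score(temperatures, 75)
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}")
```

Other quartile conventions (for example, those built into statistical libraries) can give slightly different values for the same data, so results should be compared only within one convention.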

Example - Summary Statistics

[Table: summary statistics for space shuttle launch temperature]

Example - Histogram

[Figure: histogram of launch temperature. Std. Dev = 7.22, Mean = 70.0, N = 24.00]

Example - Boxplot

[Figure: boxplot of launch temperature, N = 24, vertical axis from 50 to 90]

O1 indicates that case number 1, 53°, is an outlier in this data set. Case 1 was the temperature at the time that the Challenger space shuttle was launched. On the morning of January 28, 1986, Challenger exploded and crashed 73 seconds after takeoff. There were no survivors. A simple boxplot such as this might have raised enough of a red flag for mission control to abort the launch.
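The outlier classification can be reproduced from the fence rule. The following sketch assumes the quartiles reported in the five number summary ($Q_1 = 66.84$, $Q_3 = 75.5$); the function name `fences` and the parameter `k` are hypothetical.

```python
temperatures = [53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81]

# Quartiles as reported in the five number summary (taken as given here).
q1, q3 = 66.84, 75.5

def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = fences(q1, q3)
outliers = [t for t in temperatures if t < lower or t > upper]
non_outliers = [t for t in temperatures if lower <= t <= upper]
whiskers = (min(non_outliers), max(non_outliers))

print(f"lower fence = {lower:.2f}, upper fence = {upper:.2f}")
print("outliers:", outliers)
print("whiskers extend to:", whiskers)
```

With these quartiles the lower fence is about 53.85, so the 53° launch falls just below it and is the only observation flagged; the whiskers then run to the smallest and largest non-outliers.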

Victims of Challenger Disaster

Seven astronauts were killed when the space shuttle Challenger suddenly exploded shortly after launch on January 28, 1986. The crewmembers who died during the mission are shown here.

Standard Deviation - Index of Variability

• The most commonly used measure of the variability of scores in a distribution is the standard deviation.
• It is an index of the variability (spread) of scores about the mean of a distribution, something like the average dispersion or deviation of the scores ($X$) about their mean ($\bar{X}$).

– The greater the spread of scores, the greater the standard deviation.

Standard Deviation - Illustration

– For example, in the figure below, the scores in distribution A cluster about the mean of the distribution, and the scores in distribution B are spread out.

• The standard deviation for distribution A, then, would be less than the standard deviation for distribution B.

Standard Deviation - Definition

• More formally, the standard deviation may be defined as the square root of the average squared deviation of scores from the mean of the distribution, measured in units of the original score.
• This definition is reflected in the deviation score formula:

$$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

Standard Deviation - Explanation

• A major limitation of the range as a measure of the variability of scores in a distribution is that it is based on two extreme, often unstable, scores.

• Another way to measure the spread of scores in a distribution is to consider how far each score lies from the mean, the most stable measure of central tendency, by subtracting the mean from each of the scores in the distribution.

Standard Deviation - Deviation Scores

• This process produces deviation scores that can be represented as:

$$x = X - \bar{X}$$

• These deviation scores provide a measure of the distance of each raw score from the mean of the distribution.

Standard Deviation - $\sum (X - \bar{X})$

• Since we want a measure that takes all deviation scores into account, the most obvious next step would be to sum up the deviation scores.

• Since the mean is the balance point of the distribution, however, we know that this attempt is doomed to failure:

$$\sum (X - \bar{X}) = 0$$

Standard Deviation - $\sum (X - \bar{X})^2$

• How do we get around this problem?

• Square each deviation score to eliminate the sign and to preserve the relative distance between scores and the mean.

• This would yield:

$$\sum (X - \bar{X})^2$$

Standard Deviation - Control for $N$

• We’re finished, right?

• No.

• We still have a problem comparing across distributions because different distributions are likely to have different numbers of people.

• To control for the size of the distribution, we get a measure of the “average” amount of variability about the mean of the distribution.

Standard Deviation - Average Deviation

• This average can be obtained by dividing the sum of squared deviation scores by $N$:

$$\frac{\sum (X - \bar{X})^2}{N}$$

Standard Deviation - Average Dispersion

• This formula will give us something like the average dispersion of scores in the distribution about the mean; it is defined as the variance.

• The last problem is to return the measure of variability back to the original score scale.

• This problem can be solved by taking the square root.

Standard Deviation - Square Root

$$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
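The steps leading to this formula can be traced one at a time in code. The following is a minimal sketch that uses Sample A from the overview (0, 2, 6, 10, 12) as illustrative data; the variable names are hypothetical.

```python
from math import sqrt

scores = [0, 2, 6, 10, 12]                  # Sample A from the overview
n = len(scores)
mean = sum(scores) / n                      # 6.0

# Step 1: deviation scores, X minus the mean.
deviations = [x - mean for x in scores]

# Step 2: summing them fails, because the mean is the balance point.
sum_of_deviations = sum(deviations)         # 0.0

# Step 3: square each deviation to remove the sign.
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)    # 104.0

# Step 4: divide by N to control for the size of the distribution (variance).
variance = sum_of_squares / n

# Step 5: take the square root to return to the original score scale.
standard_deviation = sqrt(variance)

print("deviations:", deviations)
print("sum of deviations:", sum_of_deviations)
print("variance:", variance)
print(f"standard deviation: {standard_deviation:.2f}")
```

Each intermediate variable corresponds to one slide in the derivation, so the effect of skipping a step (for example, not squaring) can be seen directly.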

Standard Deviation - Example (Deviation Score)

X     Mean     x = X - Mean     x^2
1     5        -4               16
3     5        -2                4
5     5         0                0
7     5         2                4
9     5         4               16

$\sum x^2 = 40$

$$S_X = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.83$$

Standard Deviation - Example (Raw Score)

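X     X^2
1      1
3      9
5     25
7     49
9     81

$\sum X = 25$, $\sum X^2 = 165$

$$S_X = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}} = \sqrt{\frac{165 - \frac{(25)^2}{5}}{5}} = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.83$$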
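The deviation-score and raw-score computations should agree on the same data. The sketch below checks this for the scores 1, 3, 5, 7, 9; the function names `sd_deviation_score` and `sd_raw_score` are hypothetical, and both divide by N.

```python
from math import sqrt

def sd_deviation_score(scores):
    """Standard deviation from deviation scores: sqrt(sum((X - mean)^2) / N)."""
    n = len(scores)
    mean = sum(scores) / n
    return sqrt(sum((x - mean) ** 2 for x in scores) / n)

def sd_raw_score(scores):
    """Standard deviation from raw scores: sqrt((sum(X^2) - (sum X)^2 / N) / N)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x * x for x in scores)
    return sqrt((sum_x_squared - sum_x ** 2 / n) / n)

scores = [1, 3, 5, 7, 9]
print(f"deviation-score formula: {sd_deviation_score(scores):.2f}")
print(f"raw-score formula:       {sd_raw_score(scores):.2f}")
```

The raw-score form needs only the running totals of X and X squared, which is why it was convenient for hand calculation; the deviation-score form follows the definition more directly.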

Organizational Chart - Variability

Describing variability (differences between scores)

Descriptive measures are used to describe a known sample of scores or a population. In formulas, final division uses $N$.
• To describe sample variance, compute $S_X^2$. Taking the square root gives the sample standard deviation $S_X$.
• To describe population variance, compute $\sigma_X^2$. Taking the square root gives the population standard deviation $\sigma_X$.

Inferential measures are used to estimate the population based on a sample. In formulas, final division uses $N - 1$.
• To estimate population variance, compute $s_X^2$. Taking the square root gives the estimated population standard deviation $s_X$.
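The difference between dividing by N and dividing by N - 1 can be seen on the launch-temperature data. This sketch assumes Python's standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by N - 1; the variable names are hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

temperatures = [53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81]

# Descriptive measures: final division uses N.
descriptive_variance = pvariance(temperatures)
descriptive_sd = pstdev(temperatures)

# Inferential measures: final division uses N - 1.
estimated_variance = variance(temperatures)
estimated_sd = stdev(temperatures)

print(f"dividing by N:     variance = {descriptive_variance:.2f}, SD = {descriptive_sd:.2f}")
print(f"dividing by N - 1: variance = {estimated_variance:.2f}, SD = {estimated_sd:.2f}")
```

For these 24 temperatures the N - 1 version gives a standard deviation of about 7.22, which matches the value printed on the histogram slide, while the N version gives about 7.07.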
