Making Sense of the Social World, 4th Edition. Chapter 8, Quantitative Data Analysis.

What Are the Options for Summarizing Distributions?

Measures of Central Tendency: mode, median, and mean
Measures of Variation: range, interquartile range, variance, and standard deviation
Chambliss/Schutt, Making Sense of the Social World 4th edition
© 2012 SAGE Publications
The Mode
The mode is the most frequent value in a distribution.
[Figure: bar chart, "Respondent's Religious Preference (GSS94)"; x-axis: R's religious preference (Protestant, Catholic, Jewish, None, Other); y-axis: count, 0 to 2000]
In a distribution of Americans’ religious affiliations, Protestant
Christian is the most frequently occurring value—the largest
single group.
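As a rough illustration of finding a mode, the sketch below tallies a small list of categorical responses and picks the most frequent one. The ten responses are hypothetical and chosen only for illustration; they are not the GSS94 counts shown in the chart.

```python
from collections import Counter

# Hypothetical responses for illustration only; these are NOT the GSS94 data.
responses = [
    "Protestant", "Catholic", "Protestant", "None", "Jewish",
    "Protestant", "Catholic", "Other", "Protestant", "None",
]

# Counter tallies how many times each category occurs.
counts = Counter(responses)

# most_common(1) returns a one-item list holding the (value, count) pair
# with the highest count, which is the mode of the distribution.
mode_value, mode_count = counts.most_common(1)[0]

print(counts)
print(f"Mode: {mode_value} ({mode_count} of {len(responses)} cases)")
```

Because the mode only counts occurrences, it works for nominal variables such as religious preference, where no ordering or arithmetic is meaningful.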
The Median
The median is the position average, or the point that divides the
distribution in half (the 50th percentile).
HIGHEST YEAR OF SCHOOL COMPLETED

                Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0              4      0.1            0.1                 0.1
         1              1      0.0            0.0                 0.2
         2              3      0.1            0.1                 0.3
         3              6      0.2            0.2                 0.5
         4             12      0.4            0.4                 0.9
         5             15      0.5            0.5                 1.4
         6             19      0.6            0.6                 2.0
         7             29      1.0            1.0                 3.0
         8            109      3.6            3.6                 6.6
         9             85      2.8            2.8                 9.5
         10           102      3.4            3.4                12.9
         11           168      5.6            5.6                18.5
         12           929     31.0           31.1                49.6
         13           277      9.3            9.3                58.9
         14           321     10.7           10.7                69.6
         15           146      4.9            4.9                74.5
         16           433     14.5           14.5                89.0
         17            97      3.2            3.2                92.2
         18           119      4.0            4.0                96.2
         19            46      1.5            1.5                97.8
         20            64      2.1            2.1                99.9
         DK             3      0.1            0.1               100.0
         Total       2988     99.9          100.0
Missing  NAP            4      0.1
Total                2992    100.0

The median in a frequency distribution is determined by identifying the value corresponding to a cumulative percentage of 50.
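The cumulative-percentage rule can be sketched in a few lines. The frequencies below are the valid numeric rows of the schooling table; leaving out the 3 "DK" cases, and the helper name `median_from_frequencies`, are assumptions made for this illustration.

```python
# Frequencies from the "Highest year of school completed" table.
# Assumption: only numeric responses are used (the DK and NAP cases are dropped).
frequencies = {
    0: 4, 1: 1, 2: 3, 3: 6, 4: 12, 5: 15, 6: 19, 7: 29, 8: 109, 9: 85,
    10: 102, 11: 168, 12: 929, 13: 277, 14: 321, 15: 146, 16: 433,
    17: 97, 18: 119, 19: 46, 20: 64,
}


def median_from_frequencies(freqs):
    """Return the first value whose cumulative percentage reaches 50."""
    total = sum(freqs.values())
    cumulative = 0
    for value in sorted(freqs):
        cumulative += freqs[value]
        if 100 * cumulative / total >= 50:
            return value
    raise ValueError("empty frequency distribution")


median_years = median_from_frequencies(frequencies)
print(f"Median years of schooling: {median_years}")
```

Through 12 years the cumulative percentage is still just under 50, so the median falls at 13 years. This sketch does not handle the edge case where the cumulative percentage lands exactly on 50, where the median is conventionally taken midway between that value and the next.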
The Mean
The mean is just the arithmetic average.
Mean = Sum of values of cases / Number of cases
In algebraic notation, the equation is:
$\bar{Y} = \frac{\sum Y_i}{N}$
The Mean, cont’d
For example, to calculate the mean value of eight
cases, we add the values of all the cases (Yi)
and divide by the number of cases (N):
(28 + 117 + 42 + 10 + 77 + 51 + 64 + 55) /8 = 444/8 = 55.5
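The same arithmetic can be checked with a short script. This is only a sketch of the calculation above, using the eight case values from the example.

```python
# The eight case values from the worked example.
values = [28, 117, 42, 10, 77, 51, 64, 55]

total = sum(values)   # sum of Y_i over all cases
n = len(values)       # N, the number of cases
mean = total / n      # arithmetic average

print(f"{total} / {n} = {mean}")
```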
Measures of Variation
It is important to know that the median household income in the
United States is a bit over $40,000 a year, but if the variation in
income isn’t known—the fact that incomes range from zero up to
hundreds of millions of dollars—we haven’t really learned much.
Measures of variation capture how widely or densely spread
income (for instance) is.
Four popular measures of variation for quantitative variables are
the range, the interquartile range, the variance, and the standard
deviation (which is the single most popular measure of
variability).
The Range
The range is the simplest measure of variation, calculated
as the highest value in a distribution minus the lowest value,
plus 1:
Range = Highest value – Lowest value + 1
It often is important to report the range of a distribution, to
identify the whole range of possible values that might be
encountered.
The Range, cont’d.
Say that you surveyed 10 people and asked them how many times they saw the movie Star Wars, and their answers looked like this:
Number of times respondent saw Star Wars: 0, 2, 2, 3, 4, 4, 5, 20, 2, 1
The range for “times respondent saw Star Wars” is 20 – 0 + 1 = 21.
However, since the range can be drastically altered by just one exceptionally high or low value (termed an outlier), it’s not a good summary measure for most purposes.
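A minimal sketch of the range calculation follows, using the ten Star Wars responses and the "plus 1" convention given in these slides (many other texts define the range simply as highest minus lowest). Dropping the single response of 20 shows how strongly one outlier drives the result.

```python
# Number of times each of the 10 respondents saw Star Wars.
times_seen = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

# Range as defined in these slides: highest value - lowest value + 1.
value_range = max(times_seen) - min(times_seen) + 1

# Recompute without the outlier (the respondent who answered 20).
without_outlier = [t for t in times_seen if t != 20]
range_without_outlier = max(without_outlier) - min(without_outlier) + 1

print(f"Range with outlier:    {value_range}")
print(f"Range without outlier: {range_without_outlier}")
```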
Interquartile Range
The interquartile range avoids the problem created by
outliers, by showing the range where most cases lie.
Quartiles are the points in a distribution corresponding to
the first 25% of the cases, the first 50% of the cases, and
the first 75% of the cases.
In the example of number of times respondents saw Star Wars, the first 25% of cases fall between 0 and 1.75 times. The second quartile falls between 1.75 and 2.5 times. The third quartile falls between 2.5 and 4.25 times. The last quartile, of course, is between 4.25 and 20 times.
Interquartile Range, cont’d
The interquartile range is the difference between
the first quartile and the third quartile (plus 1).
In our Star Wars example, the interquartile range is
4.25 – 1.75 + 1 = 3.50.
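The quartile cut points depend on the interpolation rule used. As a sketch, Python's `statistics.quantiles` with its "exclusive" method happens to reproduce the 1.75, 2.5, and 4.25 quoted above; other software or other methods may give slightly different cut points for the same data.

```python
import statistics

# Number of times each of the 10 respondents saw Star Wars.
times_seen = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

# n=4 asks for the three cut points that split the data into quartiles.
# The "exclusive" method interpolates at positions based on (N + 1).
q1, q2, q3 = statistics.quantiles(times_seen, n=4, method="exclusive")

# Interquartile range with the "plus 1" convention used in these slides.
iqr_plus_one = q3 - q1 + 1

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"Interquartile range = {iqr_plus_one}")
```

Note that the response of 20 has no effect on the result, which is why the interquartile range is less sensitive to outliers than the range.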
The Variance
The variance, in its statistical definition, is the average
squared deviation of each case from the mean; you take
each case’s distance from the mean, square that number,
and take the average of all such numbers.
$\sigma^2 = \frac{\sum (Y_i - \bar{Y})^2}{N}$
Symbol key: $\bar{Y}$ = mean; N = number of cases; $\sum$ = sum over all cases; $Y_i$ = value of case i on variable Y.
The Variance, cont’d
Number of times respondent saw Star Wars, with each case's deviation from the mean ($\bar{Y} = 43/10 = 4.3$):

$Y_i$        $(Y_i - \bar{Y})$    $(Y_i - \bar{Y})^2$
  0               -4.3                 18.49
  2               -2.3                  5.29
  2               -2.3                  5.29
  3               -1.3                  1.69
  4               -0.3                  0.09
  4               -0.3                  0.09
  5                0.7                  0.49
 20               15.7                246.49
  2               -2.3                  5.29
  1               -3.3                 10.89
Total = 43         0.0                294.10

$\sigma^2 = \frac{\sum (Y_i - \bar{Y})^2}{N} = \frac{294.1}{10} = 29.41$
The variance takes into account the amount by
which each case differs from the mean.
As you can see it is affected by outliers, such as
the person who saw Star Wars 20 times.
It is mainly useful for computing the standard
deviation, which comes next in our list here.
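As a sketch, the variance can be computed directly from the ten Star Wars responses, which sum to 43 and so have a mean of 4.3. The code follows the divide-by-N definition given in these slides; `statistics.pvariance` uses the same definition, whereas `statistics.variance` divides by N - 1 and would give a different number.

```python
import statistics

# Number of times each of the 10 respondents saw Star Wars.
times_seen = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

n = len(times_seen)
mean = sum(times_seen) / n

# Squared distance of each case from the mean.
squared_deviations = [(y - mean) ** 2 for y in times_seen]

# Average squared deviation (population form: divide by N).
variance = sum(squared_deviations) / n

print(f"Mean = {mean:.1f}")
print(f"Sum of squared deviations = {sum(squared_deviations):.2f}")
print(f"Variance = {variance:.2f}")
print(f"statistics.pvariance = {statistics.pvariance(times_seen):.2f}")
```

The single response of 20 contributes most of the sum of squared deviations, which is the outlier sensitivity described above.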
The Standard Deviation
The standard deviation is simply the square root of
the variance. It is the square root of the average
squared deviation of each case from the mean:
$\sigma = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{N}}$
Symbol key: $\bar{Y}$ = mean; N = number of cases; $\sum$ = sum over all cases; $Y_i$ = value of case i on variable Y; $\sqrt{\ }$ = square root.
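Continuing with the ten Star Wars responses, the sketch below takes the square root of the divide-by-N variance and compares it with `statistics.pstdev`, which uses the same population definition (`statistics.stdev` would divide by N - 1 instead).

```python
import math
import statistics

# Number of times each of the 10 respondents saw Star Wars.
times_seen = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

# Population variance: average squared deviation from the mean.
variance = statistics.pvariance(times_seen)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(f"Variance = {variance:.2f}")
print(f"Standard deviation = {std_dev:.3f}")
print(f"statistics.pstdev = {statistics.pstdev(times_seen):.3f}")
```

Unlike the variance, the standard deviation is expressed in the variable's own units (here, number of viewings), which makes it easier to interpret.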
The standard deviation has mathematical properties that make it the
preferred measure of variability in many cases, particularly when a
variable is normally distributed.
[Figure: histogram of Scores; Std. Dev = 12.67, Mean = 75.0, N = 25]
A graph of a normal distribution looks like a bell, with one “hump” in
the middle, centered around the population mean, and the number of
cases tapering off on both sides of the mean.
A normal distribution is symmetric: If you folded it in half at its center (at
the population mean), the two halves would match perfectly.
[Figure: histogram of Scores; Std. Dev = 12.67, Mean = 75.0, N = 25]
If a variable is normally distributed, about 68% of the cases (roughly two-thirds) will lie within 1 standard deviation of the distribution’s mean, and 95% of the cases will lie within 1.96 standard deviations of the mean.
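These proportions can be checked numerically. The sketch below assumes an idealized normal distribution with the mean (75.0) and standard deviation (12.67) reported for the Scores histogram; the 25 observed scores themselves are not available here, and the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist

# Assumption: an ideal normal curve with the histogram's reported mean and SD.
scores = NormalDist(mu=75.0, sigma=12.67)


def share_within(dist, k):
    """Proportion of a normal distribution within k SDs of its mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


within_1 = share_within(scores, 1)
within_196 = share_within(scores, 1.96)

print(f"Within 1 SD:    {within_1:.4f}")
print(f"Within 1.96 SD: {within_196:.4f}")
```

The result does not depend on the particular mean and standard deviation chosen; any normal distribution gives the same two proportions.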
Different Statistics for Different Data

                       Nominal   Ordinal   Interval/Ratio
Mode                      X         X            X
Median                              X            X
Mean                                X            X
Range                               X            X
Interquartile Range                              X
Variance                                         X
Standard Deviation                               X
Crosstabulation of Voting in 2000 by Family Income: Cell Counts and Percentages

FAMILY INCOME: CELL COUNTS
Voting          <$20,000   $20,000-$34,999   $35,000-$59,999   $60,000+
Voted              178           239               364            761
Did not vote       182           135               168            193
Total (n)         (360)         (374)             (532)          (954)

FAMILY INCOME: PERCENTAGES
Voting          <$20,000   $20,000-$34,999   $35,000-$59,999   $60,000+
Voted              49%           64%               68%            80%
Did not vote       51%           36%               32%            20%
Total             100%          100%              100%           100%

Source: General Social Survey, 2004. Weighted.
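The percentage table can be derived from the cell counts by dividing each cell by its column total. The sketch below does this for the four income groups; the dictionary layout and variable names are choices made for this illustration.

```python
# Cell counts from the crosstabulation, keyed by family income group.
counts = {
    "<$20,000": {"Voted": 178, "Did not vote": 182},
    "$20,000-$34,999": {"Voted": 239, "Did not vote": 135},
    "$35,000-$59,999": {"Voted": 364, "Did not vote": 168},
    "$60,000+": {"Voted": 761, "Did not vote": 193},
}

# Percentage within each income group (column percentages),
# rounded to whole percents as in the table.
percentages = {}
for income, cells in counts.items():
    column_total = sum(cells.values())
    percentages[income] = {
        outcome: round(100 * n / column_total) for outcome, n in cells.items()
    }

for income, cells in percentages.items():
    print(f"{income:>16}: voted {cells['Voted']}%, "
          f"did not vote {cells['Did not vote']}%")
```

Percentaging within income groups is what makes the groups comparable despite their different sizes: the share who voted rises steadily from the lowest to the highest income category.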
Summary statistics describe particular features of a
distribution and facilitate comparison among
distributions.
The next step is to test for associations . . .