Making Sense of the Social World, 4th Edition. Chapter 8: Quantitative Data Analysis.
What Are the Options for Summarizing Distributions?

Measures of Central Tendency: mode, median, and mean
Measures of Variation: range, interquartile range, variance, and standard deviation

Chambliss/Schutt, Making Sense of the Social World 4th edition © 2012 SAGE Publications

The Mode

The mode is the most frequent value in a distribution.

[Figure: bar chart, "Respondent's Religious Preference (GSS94)". Horizontal axis: R's religious preference (Protestant, Catholic, Jewish, None, Other). Vertical axis: count.]

In a distribution of Americans' religious affiliations, Protestant Christian is the most frequently occurring value, that is, the largest single group.

The Median

The median is the position average, or the point that divides the distribution in half (the 50th percentile).

HIGHEST YEAR OF SCHOOL COMPLETED

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     0                4       0.1             0.1                  0.1
          1                1       0.0             0.0                  0.2
          2                3       0.1             0.1                  0.3
          3                6       0.2             0.2                  0.5
          4               12       0.4             0.4                  0.9
          5               15       0.5             0.5                  1.4
          6               19       0.6             0.6                  2.0
          7               29       1.0             1.0                  3.0
          8              109       3.6             3.6                  6.6
          9               85       2.8             2.8                  9.5
          10             102       3.4             3.4                 12.9
          11             168       5.6             5.6                 18.5
          12             929      31.0            31.1                 49.6
          13             277       9.3             9.3                 58.9
          14             321      10.7            10.7                 69.6
          15             146       4.9             4.9                 74.5
          16             433      14.5            14.5                 89.0
          17              97       3.2             3.2                 92.2
          18             119       4.0             4.0                 96.2
          19              46       1.5             1.5                 97.8
          20              64       2.1             2.1                 99.9
          DK               3       0.1             0.1                100.0
          Total         2988      99.9           100.0
Missing   NAP              4       0.1
Total                   2992     100.0

The median in a frequency distribution is determined by identifying the value corresponding to a cumulative percentage of 50.

The Mean

The mean is just the arithmetic average.

Mean = Sum of values of cases / Number of cases

In algebraic notation, the equation is:

\[ \bar{Y} = \frac{\sum Y_i}{N} \]
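The three measures of central tendency can be checked with a few lines of Python. This is only a minimal sketch: the frequencies are copied from the years-of-school table above (leaving out the DK and NAP rows, which carry no numeric value), the eight cases are the ones used in the textbook's worked example of the mean, and the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies for 0..20 years of school, copied from the table above.
# The DK and NAP rows are omitted because they have no numeric value.
freq = [4, 1, 3, 6, 12, 15, 19, 29, 109, 85, 102,
        168, 929, 277, 321, 146, 433, 97, 119, 46, 64]
years = list(range(21))
n = sum(freq)

# Mode: the value with the largest frequency.
mode_years = max(years, key=lambda y: freq[y])

# Median: the first value whose cumulative percentage reaches 50.
cum_pct = [100 * c / n for c in accumulate(freq)]
median_years = next(y for y, p in zip(years, cum_pct) if p >= 50)

# Mean: sum of the values divided by the number of cases
# (the eight cases from the worked example of the mean).
cases = [28, 117, 42, 10, 77, 51, 64, 55]
mean_value = sum(cases) / len(cases)

print(mode_years, median_years, mean_value)  # 12 13 55.5
```

The median comes out as 13 because the cumulative percentage is still below 50 through 12 years and first passes 50 at 13 years.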
The Mean, cont'd

For example, to calculate the mean value of eight cases, we add the values of all the cases (\(Y_i\)) and divide by the number of cases (\(N\)):

(28 + 117 + 42 + 10 + 77 + 51 + 64 + 55) / 8 = 444 / 8 = 55.5

Measures of Variation

It is important to know that the median household income in the United States is a bit over $40,000 a year, but if the variation in income isn't known (incomes range from zero up to hundreds of millions of dollars), we haven't really learned much. Measures of variation capture how widely or densely spread income (for instance) is.

Four popular measures of variation for quantitative variables are the range, the interquartile range, the variance, and the standard deviation (which is the single most popular measure of variability).

The Range

The range is the simplest measure of variation, calculated as the highest value in a distribution minus the lowest value, plus 1:

Range = Highest value - Lowest value + 1

It often is important to report the range of a distribution, to identify the whole range of possible values that might be encountered.

The Range, cont'd

Say that you surveyed 10 people and asked them how many times they saw the movie Star Wars. In their answers (listed below), the lowest value is 0 and the highest is 20, so the range for "times respondent saw Star Wars" is 20 - 0 + 1 = 21.

However, since the range can be drastically altered by just one exceptionally high or low value (termed an outlier), it's not a good summary measure for most purposes.
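The effect of a single outlier on the range can be seen in a short Python sketch. It uses the ten Star Wars responses from the example and the "plus 1" definition of the range given above. The helper name `value_range` is an illustrative choice, not something from the textbook.

```python
def value_range(values):
    """Range as defined above: highest value minus lowest value, plus 1."""
    return max(values) - min(values) + 1

# The ten responses to "how many times did you see Star Wars?"
star_wars = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

full_range = value_range(star_wars)

# Dropping the one respondent who answered 20 shows how strongly
# a single outlier drives the range.
without_outlier = value_range([v for v in star_wars if v != 20])

print(full_range, without_outlier)  # 21 6
```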
Number of times respondent saw Star Wars (10 respondents): 0 2 2 3 4 4 5 20 2 1

Interquartile Range

The interquartile range avoids the problem created by outliers by showing the range where most cases lie. Quartiles are the points in a distribution corresponding to the first 25% of the cases, the first 50% of the cases, and the first 75% of the cases.

In the example of the number of times respondents saw Star Wars, the first 25% of cases fall within the range of 0 to 1.75 times. The second quartile falls within the range of 1.75 to 2.5 times. The third quartile falls within 2.5 to 4.25 times. The last quartile, of course, is between 4.25 and 20 times.

Interquartile Range, cont'd

The interquartile range is the difference between the first quartile and the third quartile (plus 1). In our Star Wars example, the interquartile range is 4.25 - 1.75 + 1 = 3.50.

The Variance

The variance, in its statistical definition, is the average squared deviation of each case from the mean: you take each case's distance from the mean, square that number, and take the average of all such numbers.

\[ \sigma^2 = \frac{\sum (Y_i - \bar{Y})^2}{N} \]

Symbol key: \(\bar{Y}\) = mean; \(N\) = number of cases; \(\sum\) = sum over all cases; \(Y_i\) = value of case i on variable Y.

Number of times respondent saw Star Wars (mean = 43 / 10 = 4.3):

  Y_i     Y_i - mean     (Y_i - mean)^2
    0           -4.3              18.49
    2           -2.3               5.29
    2           -2.3               5.29
    3           -1.3               1.69
    4           -0.3               0.09
    4           -0.3               0.09
    5            0.7               0.49
   20           15.7             246.49
    2           -2.3               5.29
    1           -3.3              10.89
Total = 43       0.0             294.10

The Variance, cont'd

\[ \sigma^2 = \frac{\sum (Y_i - \bar{Y})^2}{N} = \frac{294.1}{10} = 29.41 \]

The variance takes into account the amount by which each case differs from the mean.
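Both the quartiles and the variance are easy to recompute from the raw responses, which is a useful check on hand arithmetic. The sketch below rests on one assumption: quartile cut points depend on the interpolation rule, and the "exclusive" method of Python's `statistics.quantiles` (position p × (n + 1) in the sorted data) is used here because it reproduces the cut points quoted above. Other software may default to a different rule and return slightly different quartiles.

```python
from statistics import quantiles

# The ten responses to "how many times did you see Star Wars?"
star_wars = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

# Quartile cut points. The "exclusive" method interpolates at position
# p * (n + 1) in the sorted data (an assumption about the slide's method).
q1, q2, q3 = quantiles(star_wars, n=4, method="exclusive")

# Interquartile range, following the text's "plus 1" convention.
iqr = q3 - q1 + 1
print(q1, q2, q3, iqr)  # 1.75 2.5 4.25 3.5

# Variance: the average squared deviation from the mean (divide by N).
n = len(star_wars)
mean = sum(star_wars) / n
squared_deviations = [(y - mean) ** 2 for y in star_wars]
variance = sum(squared_deviations) / n
print(mean, round(variance, 2))
```

Note that the variance here divides by N, as in the formula above. Many statistics packages divide by N - 1 by default when treating the data as a sample.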
As you can see, the variance is affected by outliers, such as the person who saw Star Wars 20 times. It is mainly useful for computing the standard deviation, which comes next in our list.

The Standard Deviation

The standard deviation is simply the square root of the variance. It is the square root of the average squared deviation of each case from the mean:

\[ \sigma = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{N}} \]

Symbol key: \(\bar{Y}\) = mean; \(N\) = number of cases; \(\sum\) = sum over all cases; \(Y_i\) = value of case i on variable Y; \(\sqrt{\ }\) = square root.

The standard deviation has mathematical properties that make it the preferred measure of variability in many cases, particularly when a variable is normally distributed.

[Figure: histogram of scores with a normal curve. Horizontal axis: Scores. Std. Dev = 12.67, Mean = 75.0, N = 25.]

A graph of a normal distribution looks like a bell, with one "hump" in the middle, centered around the population mean, and the number of cases tapering off on both sides of the mean.

A normal distribution is symmetric: if you folded it in half at its center (at the population mean), the two halves would match perfectly.

If a variable is normally distributed, about 68% of the cases (roughly two-thirds) will lie within 1 standard deviation of the distribution's mean, and 95% of the cases will lie within 1.96 standard deviations of the mean.
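The standard deviation and the 68% / 95% figures for a normal distribution can both be checked numerically. The sketch below computes the standard deviation of the ten Star Wars responses with the divide-by-N formula above, then uses Python's `statistics.NormalDist` for the normal-curve shares. The helper name `share_within` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

# The ten responses to "how many times did you see Star Wars?"
star_wars = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

# Standard deviation: square root of the average squared deviation.
n = len(star_wars)
mean = sum(star_wars) / n
std_dev = sqrt(sum((y - mean) ** 2 for y in star_wars) / n)

# Standard normal distribution (mean 0, standard deviation 1).
z = NormalDist()

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

print(round(std_dev, 2))
print(round(share_within(1), 4), round(share_within(1.96), 4))  # 0.6827 0.95
```

The Star Wars responses are far from normally distributed (one respondent answered 20), so the 68% / 95% rule should not be expected to describe them. The two parts of the sketch are independent illustrations.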
Different Statistics for Different Data

                       Nominal   Ordinal   Interval/Ratio
Mode                      X         X            X
Median                              X            X
Mean                                X            X
Range                               X            X
Interquartile Range                              X
Variance                                         X
Standard Deviation                               X

Crosstabulation of Voting in 2000 by Family Income: Cell Counts and Percentages

FAMILY INCOME: CELL COUNTS

Voting         <$20,000   $20,000-$34,999   $35,000-$59,999   $60,000+
Voted               178               239               364        761
Did not vote        182               135               168        193
Total (n)         (360)             (374)             (532)      (954)

FAMILY INCOME: PERCENTAGES

Voting         <$20,000   $20,000-$34,999   $35,000-$59,999   $60,000+
Voted               49%               64%               68%        80%
Did not vote        51%               36%               32%        20%
Total              100%              100%              100%       100%

Source: General Social Survey, 2004. Weighted.

Summary statistics describe particular features of a distribution and facilitate comparison among distributions. The next step is to test for associations . . .
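The percentages in the voting-by-income crosstabulation are column percentages: each cell count is divided by the total for its income category. A minimal Python sketch, using the cell counts from the crosstabulation and an illustrative helper name, reproduces them:

```python
# Cell counts from the crosstabulation: one entry per family-income column.
counts = {
    "<$20,000":        {"Voted": 178, "Did not vote": 182},
    "$20,000-$34,999": {"Voted": 239, "Did not vote": 135},
    "$35,000-$59,999": {"Voted": 364, "Did not vote": 168},
    "$60,000+":        {"Voted": 761, "Did not vote": 193},
}

def column_percentages(cells):
    """Convert one column of cell counts to whole-number percentages of the column total."""
    total = sum(cells.values())
    return {label: round(100 * count / total) for label, count in cells.items()}

percentages = {income: column_percentages(cells) for income, cells in counts.items()}

for income, pct in percentages.items():
    print(income, pct)
```

Percentaging within each income category is what makes the columns comparable even though the categories differ in size (360 cases in the lowest category, 954 in the highest).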