Statistics - Healey Chapter 10-11


Week 10
Chapter 10 - Hypothesis Testing III: The Analysis of Variance (ANOVA)
&
Chapter 11 – Hypothesis Testing IV: Chi Square

Chapter 10
Hypothesis Testing III: The Analysis of Variance (ANOVA)
In This Presentation
• The basic logic of analysis of variance (ANOVA)
• A sample problem applying ANOVA
• The Five Step Model
• Limitations of ANOVA
• Post hoc techniques
Chapter 8
[Diagram: a single random sample of 100 Education Majors drawn from one group (all Education Majors) in the population of Penn State University]
Chapter 9
[Diagram: two random samples, 100 males and 100 females, drawn from two groups (all males, all females) in the population of Pennsylvania]
In this Chapter
[Diagram: three random samples, 100 Protestants, 100 Catholics, and 100 Jews, drawn from three groups in the population of Pennsylvania]
Basic Logic
• ANOVA can be used in situations where the researcher is interested in the differences in sample means across three or more categories
• Examples:
  - How do Protestants, Catholics and Jews vary in terms of number of children?
  - How do Republicans, Democrats, and Independents vary in terms of income?
  - How do older, middle-aged, and younger people vary in terms of frequency of church attendance?
Basic Logic
• ANOVA is used when:
  - The independent variable has more than two categories
  - The dependent variable is measured at the interval or ratio level
Basic Logic
• Can think of ANOVA as an extension of the t test for more than two groups
  - The t test can only be used when the independent variable has two categories
• ANOVA asks “are the differences between the samples large enough to reject the null hypothesis and justify the conclusion that the populations represented by the samples are different?” (p. 243)
• The Ho is that the population means are the same:
  - Ho: μ1 = μ2 = μ3 = … = μk
Basic Logic
• If the Ho is true, the sample means should be about the same value
  - If the Ho is true, there will be little difference between sample means
• If the Ho is false, there should be substantial differences between categories, combined with relatively little difference within categories
  - The sample standard deviations should be low in value
  - If the Ho is false, there will be big differences between sample means combined with small values for sample standard deviations
Basic Logic
• The larger the differences between the sample means, the more likely the Ho is false – especially when there is little difference within categories
• When we reject the Ho, we are saying there are differences between the populations represented by the samples
Example 1
We have administered the support for capital punishment scale to a sample of 20 people who are equally divided across five religious categories
Example 1
Hypothesis Test of ANOVA
• Step 1: Make assumptions and meet test requirements
  - Independent random samples
  - Interval-ratio level of measurement
  - Normally distributed populations
  - Equal population variances
Example 1
• Step 2: State the null hypothesis
  - Ho: μ1 = μ2 = μ3 = μ4 = μ5
  - H1: At least one of the population means is different
• Step 3: Select the sampling distribution and establish the critical region
  - Sampling distribution = F distribution
  - Alpha = 0.05
  - dfw = 15, dfb = 4
  - F(critical) = 3.06
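The critical value can be reproduced with scipy's F distribution (a quick check, not part of the original slides):

```python
from scipy import stats

# Critical F for alpha = 0.05 with dfb = 4 (numerator) and
# dfw = 15 (denominator) degrees of freedom
f_critical = stats.f.ppf(1 - 0.05, dfn=4, dfd=15)
print(round(f_critical, 2))  # 3.06
```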
Example 1
• Step 4: Compute the test statistic
  - F = 2.57
• Step 5: Make a decision and interpret the results
  - F(critical) = 3.06
  - F(obtained) = 2.57
  - The test statistic does not fall in the critical region, so we fail to reject the null hypothesis – support for capital punishment does not differ across the populations of religious affiliations
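The whole five-step test can be run end to end in Python. The scores below are hypothetical, since the slide's raw data table is not reproduced here; they only illustrate the mechanics of a one-way ANOVA with five groups of four cases each:

```python
from scipy import stats

# Hypothetical support-for-capital-punishment scores for five
# religious groups of four respondents each (illustrative only)
groups = {
    "Protestant": [8, 12, 13, 17],
    "Catholic":   [10, 12, 18, 20],
    "Jewish":     [10, 14, 14, 22],
    "None":       [15, 20, 21, 24],
    "Other":      [15, 16, 24, 25],
}

# F(obtained) and its p-value for the one-way ANOVA
f_obtained, p_value = stats.f_oneway(*groups.values())
f_critical = stats.f.ppf(0.95, dfn=4, dfd=15)  # alpha = 0.05

print(f"F(obtained) = {f_obtained:.2f}, F(critical) = {f_critical:.2f}")
if f_obtained > f_critical:
    print("Reject Ho: at least one population mean differs")
else:
    print("Fail to reject Ho")
```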
Limitations of ANOVA
1. Requires interval-ratio level measurement of the dependent variable and roughly equal numbers of cases in the categories of the independent variable
2. Statistically significant differences are not necessarily important
3. The alternative (research) hypothesis is not specific – it only asserts that at least one of the population means differs from the others
  - Use post hoc techniques for more specific differences
USING SPSS
• On the top menu, click on “Analyze”
• Select “Compare Means”
• Select “One Way ANOVA”
ANOVA in SPSS
• Analyze / Compare means / One-way ANOVA
[Screenshot: ANOVA dialog box]
ANOVA output

Source            Sum of Squares   df   Mean Square        F   Sig.
Between Groups           309.600    2       154.800   14.221   .000
Within Groups            293.900   27        10.885
Total                    603.500   29
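The mean squares and the F ratio in this table follow directly from the sums of squares and degrees of freedom; a minimal check of the arithmetic:

```python
# Recompute the ANOVA table entries from the printed sums of squares
ss_between, df_between = 309.600, 2
ss_within, df_within = 293.900, 27

ms_between = ss_between / df_between  # 154.800
ms_within = ss_within / df_within     # 10.885
f_ratio = ms_between / ms_within      # 14.221

print(f"MSB = {ms_between:.3f}, MSW = {ms_within:.3f}, F = {f_ratio:.3f}")
```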
Chapter 11
Hypothesis Testing IV: Chi Square
In This Presentation
• Bivariate (crosstabulation) tables
• The basic logic of Chi Square
• The terminology used with bivariate tables
• The computation of Chi Square with an example problem
• The five step model
• Limitations of Chi Square
The Bivariate Table
Bivariate tables: display the scores of cases on two different variables at the same time
The Bivariate Table
[Slide shows a sample bivariate table]
Note the two dimensions: rows and columns.
What is the independent variable?
What is the dependent variable?
Where are the row and column marginals?
Where is the total number of cases (N)?
Chi Square
• Chi Square can be used:
  - with variables that are measured at any level (nominal, ordinal, interval or ratio)
  - with variables that have many categories or scores
  - when we don’t know the shape of the population or sampling distribution
Basic Logic
• Independence:
  - “Two variables are independent if the classification of a case into a particular category of one variable has no effect on the probability that the case will fall into any particular category of the second variable” (p. 274)
Basic Logic
• Chi Square as a test of statistical significance is a test for independence
Basic Logic
• Chi Square is a test of significance based on bivariate, crosstabulation tables (also called crosstabs)
• We are looking for significant differences between the actual cell frequencies OBSERVED in a table (fo) and those that would be EXPECTED by random chance or if cell frequencies were independent (fe)
Computation of Chi Square Example
• RQ: Is the probability of securing employment in the field of social work dependent on the accreditation status of the program?
• NULL HYP: The probability of securing employment in the field of social work is NOT dependent on the accreditation status of the program. (The variables are independent)
• HYP: The probability of securing employment in the field of social work is dependent on the accreditation status of the program. (The variables are dependent)
Computation of Chi Square Example
[Slide shows the observed frequency table]
Computation of Chi Square
Expected frequency (fe) for the top-left cell:
fe = (row marginal × column marginal) / N = (40 × 55) / 100 = 22
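The same computation for every cell, as a short numpy sketch. Only the first row marginal (40) and first column marginal (55) appear on the slide; the complementary marginals of 60 and 45 are assumed here so the table sums to N = 100:

```python
import numpy as np

# Marginals: 40 and 55 are from the slide; 60 and 45 are assumed
row_totals = np.array([40, 60])
col_totals = np.array([55, 45])
N = row_totals.sum()  # 100

# fe = (row marginal * column marginal) / N for every cell
fe = np.outer(row_totals, col_totals) / N
print(fe)  # [[22. 18.] [33. 27.]] -- top-left cell is 22
```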
Computation of Chi Square Example
Step 1: Make Assumptions and Meet Test Requirements
• Independent random samples
• Level of measurement is nominal
  - Note the minimal assumptions. In particular, note that no assumption is made about the shape of the sampling distribution. The chi square test is nonparametric, or distribution-free
Step 2: State the Null Hypothesis
• Ho: The variables are independent
  - Another way to state the Ho, more consistently with previous tests:
  - Ho: fo = fe
• H1: The variables are dependent
  - Another way to state the H1:
  - H1: fo ≠ fe
Step 3: Select the Sampling Distribution and Establish the Critical Region
• Sampling Distribution = Chi Square, χ2
• Alpha = 0.05
• df = (r-1)(c-1) = (2-1)(2-1) = 1
• χ2 (critical) = 3.841
Step 4: Calculate the Test Statistic
• χ2 (obtained) = 10.78
Step 5: Make a Decision and Interpret the Results of the Test
• χ2 (critical) = 3.841
• χ2 (obtained) = 10.78
• The test statistic falls in the critical region, so reject Ho
• There is a significant relationship between employment status and accreditation status in the population from which the sample was drawn
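The test can be reproduced in Python. The observed counts below are hypothetical: they are simply chosen to be consistent with the marginals used above (rows 40/60, columns 55/45) and to come out near the reported χ2 of 10.78:

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts consistent with the assumed marginals
observed = np.array([[30, 10],
                     [25, 35]])

# Pearson chi square without Yates' continuity correction, to match
# the hand computation in the slides
chi2, p, df, expected = stats.chi2_contingency(observed, correction=False)
chi2_critical = stats.chi2.ppf(0.95, df)  # alpha = 0.05, df = 1

print(f"chi2(obtained) = {chi2:.2f}, chi2(critical) = {chi2_critical:.3f}")
# chi2(obtained) ≈ 10.77 > 3.841, so reject Ho
```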
Interpreting Chi Square
• The chi square test tells us only if the variables are independent or not
• It does not tell us the pattern or nature of the relationship
• To investigate the pattern, compute percentages within each column and compare across the columns
Computation of Chi Square
• Are the homicide rate and volume of gun sales related for a sample of 25 cities? (Problem 11.4, p. 295)
• The bivariate table shows the relationship between homicide rate (columns) and gun sales (rows)
• This 2 x 2 table has 4 cells
Step 1: Make Assumptions and Meet Test Requirements
• Independent random samples
• Level of measurement is nominal
  - Note the minimal assumptions. In particular, note that no assumption is made about the shape of the sampling distribution. The chi square test is nonparametric, or distribution-free
Step 2: State the Null Hypothesis
• Ho: The variables are independent
  - Another way to state the Ho, more consistently with previous tests:
  - Ho: fo = fe
• H1: The variables are dependent
  - Another way to state the H1:
  - H1: fo ≠ fe
Step 3: Select the Sampling Distribution and Establish the Critical Region
• Sampling Distribution = χ2
• Alpha = 0.05
• df = (r-1)(c-1) = (2-1)(2-1) = 1
• χ2 (critical) = 3.841
Step 4: Calculate the Test Statistic
• χ2 (obtained) = 2.00
Step 5: Make a Decision and Interpret the Results of the Test
• χ2 (critical) = 3.841
• χ2 (obtained) = 2.00
• The test statistic does not fall in the critical region, so we fail to reject the Ho
• There is no significant relationship between homicide rate and gun sales in the population from which the sample was drawn
Interpreting Chi Square
• Cities low on homicide rate were high in gun sales, and cities high in homicide rate were low in gun sales
• As homicide rates increase, gun sales decrease
• We found this relationship not to be significant, but it does have a clear pattern

                    Homicide Rate
Gun Sales     Low            High           Total
High          8 (66.7%)      5 (38.5%)      13
Low           4 (33.3%)      8 (61.5%)      12
Total         12 (100%)      13 (100%)      25
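Here the observed counts are given in the table, so both the test statistic and the column percentages can be checked directly:

```python
import numpy as np
from scipy import stats

# Observed counts from the table: rows = gun sales (High, Low),
# columns = homicide rate (Low, High)
observed = np.array([[8, 5],
                     [4, 8]])

chi2, p, df, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2(obtained) = {chi2:.2f}")  # 1.99, which the slides round to 2.00

# Column percentages show the (non-significant) pattern
col_pct = 100 * observed / observed.sum(axis=0)
print(col_pct.round(1))  # [[66.7 38.5] [33.3 61.5]]
```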
Limitations of Chi Square
1. Difficult to interpret when variables have many categories
  - Best when variables have four or fewer categories
2. With small sample size, cannot assume that Chi Square sampling distribution will be accurate
  - Small sample: a high percentage of cells have expected frequencies of 5 or less (a quick screen for this is sketched below)
3. Like all tests of hypotheses, Chi Square is sensitive to sample size
  - As N increases, obtained Chi Square increases
  - With large samples, trivial relationships may be significant
It is important to remember that statistical significance is not the same as substantive significance
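A minimal sketch of that small-sample screen, using a made-up table: compute the expected frequencies and report the share of cells at 5 or below:

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Hypothetical small-sample table, used only to illustrate the check
observed = np.array([[3, 7],
                     [6, 4]])

expected = expected_freq(observed)
share_small = np.mean(expected <= 5)
print(f"{share_small:.0%} of cells have expected frequencies of 5 or less")
```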
Chi Square in SPSS
Step 4: computing the test statistic in SPSS
Chi Square in SPSS
• Step 5: making a decision and interpreting the results of the test
overweight_1 * urban Crosstabulation

                                      urban
                                      0        1       Total
overweight_1   0   Count              329      468       797
                   Expected Count   385.7    411.3     797.0
               1   Count              155       48       203
                   Expected Count    98.3    104.7     203.0
Total              Count              484      516      1000
                   Expected Count   484.0    516.0    1000.0
Chi-Square Tests

                               Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             79.699b    1   .000
Continuity Correction a        78.301     1   .000
Likelihood Ratio               82.696     1   .000
Fisher's Exact Test                                         .000         .000
Linear-by-Linear Association   79.619     1   .000
N of Valid Cases               1000

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 98.25.

The Pearson Chi-Square value is the result (χ2 obtained).
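The Pearson value can be reproduced from the crosstab counts. Note that chi2_contingency applies Yates' continuity correction to 2x2 tables by default, which would give the Continuity Correction row instead, so it is disabled here:

```python
import numpy as np
from scipy import stats

# Observed counts from the overweight_1 * urban crosstab
observed = np.array([[329, 468],
                     [155, 48]])

chi2, p, df, expected = stats.chi2_contingency(observed, correction=False)
print(f"Pearson chi2 = {chi2:.3f}")  # 79.699, Sig. = .000
```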
Chi Square in SPSS

Symmetric Measures

                                                Value   Approx. Sig.
Nominal by Nominal   Contingency Coefficient    .272    .000
N of Valid Cases                                1000

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

The nominal symmetric measures indicate both the strength and the significance of the relationship between the row and column variables of a crosstabulation.
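The contingency coefficient reported above can be verified from the Pearson chi square, since C = sqrt(χ2 / (χ2 + N)):

```python
import math

# Pearson chi square and N from the SPSS output above
chi2, N = 79.699, 1000
C = math.sqrt(chi2 / (chi2 + N))
print(round(C, 3))  # 0.272
```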