Inferential Statistics III


Chi-Square
χ²
Parking lot exercise
• Graph the distribution of car values
for each parking lot
• Fill in the frequency and
percentage tables
The “null” hypothesis
• Inferential statistics use samples to draw conclusions about a population
• Whenever we use inferential statistics the “null hypothesis” applies
– Null hypothesis: Any apparent effect of the independent variable(s) on the dependent variable(s) was produced by chance
– Unless you can show otherwise, THE NULL IS ASSUMED TO BE TRUE
• Researchers always want to REJECT the null hypothesis
– Rejecting the null hypothesis supports the working hypothesis
• The only way to reject the null is for the results of statistical tests (e.g., difference between the means) to be very, very substantial
• How substantial? The test statistic (e.g., r, t, b, χ²) must be so large that it goes well beyond what one would expect from sampling error alone
• How far is that? To reject the null, the probability of getting a result this large by chance alone, if the null were true, must be LESS than 5 in 100 (p < .05)
• How do we know if it is?
– If you’re doing the computation, compare the test statistic to a table
– If you’re reading a study, is there an asterisk by the test statistic? Usually one asterisk (*) means p < .05: fewer than 5 chances in 100 that chance alone produced the result. Two asterisks (**) are better (p < .01, fewer than 1 in 100). Three (***) is great (p < .001, fewer than 1 in 1,000).
• If there are NO asterisks, the null hypothesis cannot be rejected
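The asterisk convention can be sketched as a small lookup. This is only an illustration: the function name `significance_stars` is made up here, and the cutoffs are the three levels named on the slide.

```python
def significance_stars(p_value: float) -> str:
    """Return the asterisks conventionally printed beside a test statistic."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # no asterisk: the null hypothesis cannot be rejected


for p in (0.2, 0.03, 0.004, 0.0002):
    print(f"p = {p}: '{significance_stars(p)}'")
```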
Chi-Square (χ²)
• Chi-square (χ²) is used when all variables are categorical (not ordinal)
• Example: Does gender affect court disposition?
• Used with moderate size random samples
• Tests for relationship between two nominal variables (categorical,
cannot be ordered) that have been cross tabulated
• Evaluates difference between Observed and Expected cell
frequencies:
– “Observed” means the cell frequencies that are actually present
– “Expected” means the cell frequencies we would “expect” if
there was no relationship between the variables (null
hypothesis is true)
– If there is no difference, χ² is zero
– The greater the difference, the larger the value of χ²
Class exercise
Hypothesis: Gender → Disposition

Observed cell frequencies

                 Court disposition
Gender        Jail    Released    Total
Male            84          16      100
Female          30          20       50
Total          114          36    n = 150
Creating the “Expected” table –
cell frequencies if the null hypothesis is true
Expected cell frequency = (Independent variable category total / Grand total) × Dependent variable category total

                 Court disposition
Gender        Jail    Released    Total
Male            __          __      100
Female          __          __       50
Total          114          36    n = 150
Male/Jail: (100/150) × 114 = 76
Male/Released: (100/150) × 36 = 24
Female/Jail: (50/150) × 114 = 38
Female/Released: (50/150) × 36 = 12
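The same four calculations can be sketched in code. This is a minimal illustration using the observed counts from the class exercise. The function name `expected_frequencies` and the dictionary layout are assumptions made for the example, and exact fractions are used so that no rounding error creeps in.

```python
from fractions import Fraction

# Observed counts from the class exercise: rows = gender, columns = disposition.
observed = {
    "Male": {"Jail": 84, "Released": 16},
    "Female": {"Jail": 30, "Released": 20},
}


def expected_frequencies(table):
    """Expected count per cell = (row total / grand total) * column total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return {
        row: {col: Fraction(row_totals[row], grand_total) * col_totals[col]
              for col in col_totals}
        for row in table
    }


expected = expected_frequencies(observed)
for row, cells in expected.items():
    for col, value in cells.items():
        print(f"{row}/{col}: {float(value):.1f}")
```

With these marginals every expected count comes out as a whole number, so no rounding is needed.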
Expected frequencies

                 Court disposition
Gender        Jail    Released    Total
Male            76          24      100
Female          38          12       50
Total          114          36    n = 150
Obtaining χ²

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

• O = observed frequency; E = expected frequency (what we would get if the null hypothesis is true)
• χ² is the ratio of systematic variation to chance variation
• The higher the ratio (the more systematic variation exceeds chance variation), the more likely that we can reject the null
• Chi-square is not a good measure of the strength of a relationship because its significance level is closely tied to sample size
• It over-estimates significance with very large samples and under-estimates it with very small samples
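The sample-size point can be illustrated with a short sketch. The three tables below keep the same proportions as the gender and disposition exercise and differ only in size. The half-size and tenfold tables are hypothetical, made up for the illustration, and `chi_square` is a helper name chosen here.

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # unrounded expected count
            total += (o - e) ** 2 / e
    return total


half = [[42, 8], [15, 10]]            # hypothetical: same proportions, n = 75
base = [[84, 16], [30, 20]]           # class exercise, n = 150
tenfold = [[840, 160], [300, 200]]    # hypothetical: same proportions, n = 1,500

for label, table in (("n = 75", half), ("n = 150", base), ("n = 1,500", tenfold)):
    print(f"{label}: chi-square = {chi_square(table):.2f}")
```

The pattern in the cells is identical in all three tables, yet the statistic grows in direct proportion to the number of cases.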
Observed frequencies

                 Court disposition
Gender        Jail    Released    Total
Male            84          16      100
Female          30          20       50
Total          114          36    n = 150

Expected frequencies

                 Court disposition
Gender        Jail    Released    Total
Male            76          24      100
Female          38          12       50
Total          114          36    n = 150
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(84-76)^2}{76} + \frac{(16-24)^2}{24} + \frac{(30-38)^2}{38} + \frac{(20-12)^2}{12} = 10.5$$

χ² = 10.5
df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
Reject the null hypothesis: there is less than one chance in a hundred that the
relationship between gender and court disposition is due to chance (p < .01)
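The table lookup can be sketched as follows. The critical values are the standard chi-square table entries for one degree of freedom. The names `CRITICAL_DF1`, `degrees_of_freedom` and `smallest_level_reached` are chosen for this illustration only.

```python
# Critical chi-square values for 1 degree of freedom, as printed in standard tables.
CRITICAL_DF1 = {0.05: 3.841, 0.01: 6.635, 0.001: 10.828}


def degrees_of_freedom(n_rows, n_cols):
    """df = (rows - 1) * (columns - 1), not counting the Total row and column."""
    return (n_rows - 1) * (n_cols - 1)


def smallest_level_reached(chi_sq, critical=CRITICAL_DF1):
    """Return the strictest significance level whose critical value is met, or None."""
    reached = [level for level, cutoff in critical.items() if chi_sq >= cutoff]
    return min(reached) if reached else None


df = degrees_of_freedom(2, 2)
level = smallest_level_reached(10.5)
print(f"df = {df}; reject the null at p < {level}")
```

A statistic of 10.5 clears the .01 cutoff (6.635) but not the .001 cutoff (10.828), which matches the p < .01 conclusion.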
Class exercise
Hypothesis: More building alarms → Less crime
• Hypothesis: Building alarms lead to less crime
• Randomly sampled 120 businesses with alarms
– 50 had crimes, 70 didn’t
• Randomly sampled 90 businesses without alarms
– 50 had crimes, 40 didn’t
• Build an observed table, then an expected table
• Remember, they’re tables, so place the values of the independent variable in rows
• Compute χ²

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Observed (obtained) frequencies

              Crime
Alarm       Y      N    Total
Y          50     70      120
N          50     40       90
Total     100    110      210

Expected (by chance) frequencies

              Crime
Alarm       Y      N    Total
Y          57     63      120
N          43     47       90
Total     100    110      210
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(50-57)^2}{57} + \frac{(70-63)^2}{63} + \frac{(50-43)^2}{43} + \frac{(40-47)^2}{47} = 3.82$$

χ² = 3.82
df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
To reject at the .05 level, χ² must be 3.841 or greater
Fail to reject the null hypothesis: NO significant relationship; the difference
we see could be due to chance
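One caution when a result sits this close to the cutoff: the expected frequencies in this exercise are rounded to whole numbers (for example, 120/210 × 100 is 57.14, entered as 57). The sketch below computes the statistic both ways. The helper `chi_square` and its `round_expected` flag are names made up for the illustration.

```python
observed = [[50, 70], [50, 40]]   # rows: alarm yes/no; columns: crime yes/no


def chi_square(observed, round_expected):
    """Sum of (O - E)^2 / E, optionally rounding each E to a whole number."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            if round_expected:
                e = round(e)      # whole-number expected counts, as in the exercise
            total += (o - e) ** 2 / e
    return total


rounded = chi_square(observed, round_expected=True)
exact = chi_square(observed, round_expected=False)
print(f"rounded expected counts:   {rounded:.2f}")
print(f"unrounded expected counts: {exact:.2f}")
```

With rounded expected counts the statistic is about 3.82, as in the exercise. With unrounded counts it is about 3.98, which is above 3.841, so for borderline results it is safer to carry the decimals through before consulting the table.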
Demonstrating the meaning of “expected”
Expected (by chance) frequencies

              Crime
Alarm       Y      N    Total
Y          57     63      120
N          43     47       90
Total     100    110      210

Expected (by chance) frequencies, as row percentages

              Crime
Alarm       Y      N    Total
Y         47%    53%      120
N         48%    52%       90
Total     100    110      210
Checking the expected frequencies table by converting it into percentages
In a properly done expected table, as you change the value of the independent variable, the
distribution across the dependent variable shouldn’t change (the one-point difference between rows here comes from rounding)
A properly done expected table will always show no relationship: it’s the null hypothesis
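This check can be sketched in code using the marginals from the alarm exercise. Exact fractions are used so rounding does not blur the result. The function name `expected_row_shares` and the row and column labels are assumptions made for the illustration.

```python
from fractions import Fraction

row_totals = {"Alarm": 120, "No alarm": 90}
col_totals = {"Crime": 100, "No crime": 110}


def expected_row_shares(row_totals, col_totals):
    """Share of each row's expected count that falls in each column."""
    n = sum(row_totals.values())
    shares = {}
    for row, r_total in row_totals.items():
        shares[row] = {}
        for col, c_total in col_totals.items():
            expected = Fraction(r_total, n) * c_total   # unrounded expected count
            shares[row][col] = expected / r_total
    return shares


shares = expected_row_shares(row_totals, col_totals)
for row, cols in shares.items():
    line = ", ".join(f"{col} {float(share):.1%}" for col, share in cols.items())
    print(f"{row}: {line}")
```

Both rows show the same split (about 47.6% and 52.4%), which is simply the column totals divided by the grand total.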
Back to the parking lots…
• Use the frequency (not percentage) table to
create a “frequencies expected” table
(meaning, expected if there is no
relationship)
• This table should artificially reflect no
relationship between income and car value
• Instructions on next slide…
Computing expected frequencies:
Expected frequency = (Row marginal / Total cases) × Column marginal
• Now compute the Chi-Square
• Instructions on next slide
Computing Chi-Square
1. Cell by corresponding cell, subtract EXPECTED from
OBSERVED.
2. Square each difference.
3. Divide each result by the frequency EXPECTED.
4. Total them up.
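The four steps can be followed one at a time in a short sketch. The parking lot counts are not reproduced in this transcript, so the cells below come from the gender and court disposition exercise instead. The variable names are chosen for the illustration.

```python
# Cells read row by row: Male/Jail, Male/Released, Female/Jail, Female/Released.
observed = [84, 16, 30, 20]
expected = [76, 24, 38, 12]

differences = [o - e for o, e in zip(observed, expected)]   # step 1: O minus E
squares = [d ** 2 for d in differences]                     # step 2: square each
ratios = [sq / e for sq, e in zip(squares, expected)]       # step 3: divide by E
chi_sq = sum(ratios)                                        # step 4: total them up

print(f"{'O':>4} {'E':>4} {'O-E':>5} {'(O-E)^2':>8} {'/E':>7}")
for o, e, d, sq, r in zip(observed, expected, differences, squares, ratios):
    print(f"{o:>4} {e:>4} {d:>5} {sq:>8} {r:>7.2f}")
print(f"Chi-square = {chi_sq:.2f}")
```

Squaring in step 2 is what keeps positive and negative differences from cancelling each other out.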
In scientific research the greatest risk we can take of being wrong is five in one hundred (.05 column). Our Chi-square, 8.66, is more than the minimum required
of 7.815. So we can reject the NULL hypothesis and accept the WORKING
hypothesis that higher income persons drive more expensive cars.
Homework
Homework exercise
Hypothesis: Sergeants have more stress than patrol officers
                               Job Stress
Position on police force    Low    High    Total
Sergeant                     30      60       90
Patrol Officer               86      24      110
Total                       116      84      200
Source: Fitzgerald & Cox, Research Methods in Criminal Justice, p. 165
1. Calculate expected cell frequencies (null hypothesis of no relationship is true)
2. Compute Chi-square
3. Use table in Appendix E to determine your chi-square’s probability level
4. Can we reject the null hypothesis?
Homework answer
Observed

                               Job Stress
Position on police force    Low    High    Total
Sergeant                     30      60       90
Patrol Officer               86      24      110
Total                       116      84      200

Expected

                               Job Stress
Position on police force    Low    High    Total
Sergeant                     52      38       90
Patrol Officer               64      46      110
Total                       116      84      200

Source: Fitzgerald & Cox, Research Methods in Criminal Justice, p. 165
$$\chi^2 = \frac{(30-52)^2}{52} + \frac{(60-38)^2}{38} + \frac{(86-64)^2}{64} + \frac{(24-46)^2}{46} = 40.1$$

χ² = 40.1
df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
To reject at the .05 level, χ² must be 3.841 or greater
Reject the null hypothesis: less than 1 chance in 1,000 that the
relationship is due to chance
Practice for the final
• You will test a hypothesis using two categorical variables and determine whether the independent variable has a statistically significant effect.
• You will be asked to state the null hypothesis.
• You will use supplied data to create an Observed frequencies table. You will use it to create an Expected frequencies table. You will be given a formula but should know the procedure.
• You will compute the Chi-Square statistic and degrees of freedom. You will be given formulas but should know the procedures by heart.
• You will use the Chi-Square table to determine whether the results support the working hypothesis.
– Print and bring to class: http://www.sagepub.com/fitzgerald/study/materials/appendices/app_e.pdf
• Sample question: Hypothesis is that alarm systems prevent burglary. Random sample of 120 businesses with an alarm system and 90 without. Fifty businesses of each kind were burglarized.
– Null hypothesis: No significant difference in crime between businesses with and without alarms
Observed frequencies

              Crime
Alarm       Y      N    Total
Y          50     70      120
N          50     40       90
Total     100    110      210

Expected frequencies

              Crime
Alarm       Y      N    Total
Y          57     63      120
N          43     47       90
Total     100    110      210

$$\frac{(50-57)^2}{57} + \frac{(70-63)^2}{63} + \frac{(50-43)^2}{43} + \frac{(40-47)^2}{47} = .86 + .78 + 1.14 + 1.04 = 3.82$$

– Chi-Square = 3.82
– df = (r − 1) × (c − 1) = 1
– Check the table. Do the results support the working hypothesis? No. Chi-Square
must be at least 3.84 to reject the null hypothesis of no relationship between
alarm systems and crime at the .05 level (no more than five chances in 100 that a result this large is due to chance alone)