Transcript Chi-Square
Social Statistics: Chi-square
This week
What is chi-square?
CHIDIST
Nonparametric statistics
Parametric statistics
A main branch of statistics
Assumes the data follow a particular type of probability distribution (e.g., the normal distribution).
Makes inferences about the parameters of that distribution (e.g., the population mean and variance).
Assumption: the sample is large enough to represent the population (e.g., a sample size of around 30).
Parametric statistics are not distribution-free (they require a probability distribution).
Nonparametric statistics
Nonparametric statistics (distribution-free statistics)
Does not rely on the assumption that the data are drawn from a given probability distribution (the data model is not specified).
It is widely used for studying populations that take on a ranked order (e.g., movie reviews from one to four stars, opinions about hotel rankings), so it suits ordinal data.
It makes fewer assumptions, so it can be applied in situations where less is known about the application in question.
It may require a larger sample size than parametric statistics to draw conclusions with the same degree of confidence.
Nonparametric statistics
Nonparametric statistics (distribution-free statistics)
Data with frequencies or percentages
Number of kids in different grades
The percentage of people receiving Social Security
One-sample/Two-sample chi-square
One-sample chi-square includes only one dimension
Two-sample chi-square includes two dimensions
Whether the number of respondents is equally distributed across all levels of education.
Whether the voting for the school voucher has a pattern of preference.
Whether preference for the school voucher is independent of political party affiliation and gender.
Compute chi-square
One-sample chi-square test
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O: the observed frequency
E: the expected frequency
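The formula above can be sketched in Python as a sum over categories. This is a minimal illustration; the function name `chi_square_statistic` is hypothetical and not part of the course material. Applied to the school voucher counts from the example that follows, with an expected count of 30 per category, it should give the obtained value of 20.6.

```python
def chi_square_statistic(observed, expected):
    """Return the chi-square statistic: the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# School voucher example: for, maybe, against; 90 respondents split evenly gives E = 30.
stat = chi_square_statistic([23, 17, 50], [30, 30, 30])
print(round(stat, 2))  # 20.6
```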
Example
Question: Is the number of respondents equally distributed across all opinions?
One-sample chi-square
Preference for School Voucher
Category     for    maybe    against    total
Frequency    23     17       50         90
Chi-square steps
Step 1: a statement of the null and research hypotheses
There is no difference in the frequency or proportion across the categories:
$H_0: P_1 = P_2 = P_3$
There is a difference in the frequency or proportion across the categories:
$H_1: P_1 \neq P_2 \neq P_3$
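Under $H_0$ every category has the same proportion, so each expected frequency is the total count divided by the number of categories (90 / 3 = 30 for the voucher data). A small sketch, assuming the equal-proportion null stated above; `expected_equal` is a hypothetical helper name.

```python
def expected_equal(observed):
    """Expected count per category if H0 (P1 = P2 = ... = Pk) is true: N / k."""
    total = sum(observed)   # N, the total number of respondents
    k = len(observed)       # number of categories
    return [total / k] * k


print(expected_equal([23, 17, 50]))  # [30.0, 30.0, 30.0]
print(expected_equal([13, 9, 8]))    # [10.0, 10.0, 10.0]
```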
Chi-square steps
Step 2: setting the level of risk (also called the level of significance or Type I error rate) associated with the null hypothesis
α = 0.05
Chi-square steps
Step 3: selection of the proper test statistic
Frequency data → nonparametric procedures → chi-square
Chi-square steps
Step 4: computation of the test statistic value (called the obtained value)
Category                   for     maybe    against    Total
Observed frequency (O)     23      17       50         90
Expected frequency (E)     30      30       30         90
D (difference, O − E)      −7      −13      20
(O − E)²                   49      169      400
(O − E)²/E                 1.63    5.63     13.33      20.60
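The Step 4 table can be reproduced row by row with a short sketch (the helper `chi_square_table` is hypothetical). Each row holds O, E, the difference D = O − E, its square, and the category's contribution (O − E)²/E; summing the contributions gives the obtained value.

```python
def chi_square_table(labels, observed, expected):
    """Return one row per category: (label, O, E, O - E, (O - E)^2, (O - E)^2 / E)."""
    rows = []
    for label, o, e in zip(labels, observed, expected):
        d = o - e
        rows.append((label, o, e, d, d ** 2, d ** 2 / e))
    return rows


rows = chi_square_table(["for", "maybe", "against"], [23, 17, 50], [30, 30, 30])
for label, o, e, d, d2, contrib in rows:
    print(f"{label:<8} O={o:<3} E={e:<3} D={d:<4} D^2={d2:<4} D^2/E={contrib:.2f}")

# The obtained value is the sum of the last column.
print(f"Obtained value: {sum(r[5] for r in rows):.2f}")  # 20.60
```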
Chi-square steps
Step 5: determination of the value needed to reject the null hypothesis, using the appropriate table of critical values for the particular statistic
Distribution of Chi-Square
df = r − 1 (r = number of categories)
If the obtained value > the critical value, reject the null hypothesis.
If the obtained value < the critical value, fail to reject (retain) the null hypothesis.
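The decision rule can be written as a small function. For the critical value, this sketch assumes df = 2 (three categories, as in both examples in these slides): with 2 degrees of freedom the right-tail probability of the chi-square distribution is exactly $e^{-x/2}$, so the critical value is $-2\ln\alpha$, which gives 5.991 for α = 0.05 and matches the table value. For other df, use the table of critical values or a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)`. The function names here are hypothetical.

```python
import math


def degrees_of_freedom(num_categories):
    """One-sample chi-square: df = r - 1."""
    return num_categories - 1


def critical_value_df2(alpha):
    """Critical value for df = 2 ONLY.

    With 2 degrees of freedom the chi-square tail probability is exp(-x / 2),
    so solving exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    return -2 * math.log(alpha)


def decide(obtained, critical):
    """Reject H0 if the obtained value exceeds the critical value."""
    return "reject H0" if obtained > critical else "fail to reject H0"


df = degrees_of_freedom(3)          # for / maybe / against -> df = 2
crit = critical_value_df2(0.05)     # about 5.991, matching the table value
print(df, round(crit, 3), decide(20.6, crit))  # 2 5.991 reject H0
```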
Chi-square steps
Chi-square steps
Step 6: a comparison of the obtained value and the critical value is made
Obtained value = 20.6; critical value = 5.991 (df = 3 − 1 = 2, α = 0.05)
Chi-square steps
Steps 7 and 8: decision time
What is your conclusion, why, and how should it be interpreted?
Another example
We’ll settle the age-old debate of whether people can actually detect their favorite cola based solely on taste. I blindfold 30 Coke lovers and have them sample 3 colas. Is there a true difference, or are these preference differences explainable by chance?
Hypotheses
Null: There are no preferences; the population is divided evenly among the brands.
Alternative: There are preferences; the population is not divided evenly among the brands.
Chance Model
df = C − 1 = 3 − 1 = 2 (C = number of categories); set α = .05
For df = 2, $\chi^2_{\text{crit}} = 5.99$
Calculate Chi-Square
Category                   Coke    Pepsi    RC Cola    Total
Observed frequency (O)     13      9        8          30
Expected frequency (E)     10      10       10         30
D (difference, O − E)      3       −1       −2
(O − E)²                   9       1        4
(O − E)²/E                 0.9     0.1      0.4        1.4
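The whole cola calculation, from expected counts to the decision, fits in a few lines. This sketch again assumes df = 2 so that the closed-form critical value $-2\ln(0.05) \approx 5.99$ applies; the variable names are illustrative only.

```python
import math

# Observed counts from the blind taste test (30 tasters, 3 colas).
observed = {"Coke": 13, "Pepsi": 9, "RC Cola": 8}

n = sum(observed.values())   # 30 tasters
k = len(observed)            # 3 categories
expected = n / k             # 10.0 per cola if preferences are evenly divided (H0)

# Obtained value: sum of (O - E)^2 / E over the three colas.
chi2_obt = sum((o - expected) ** 2 / expected for o in observed.values())

df = k - 1
assert df == 2, "the closed-form critical value below holds only for df = 2"
chi2_crit = -2 * math.log(0.05)   # about 5.99 at alpha = .05

print(round(chi2_obt, 2), round(chi2_crit, 2))  # 1.4 5.99
print("reject H0" if chi2_obt > chi2_crit else "fail to reject H0")  # fail to reject H0
```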
Decision and Conclusion
$\chi^2_{\text{crit}} = 5.99$
$\chi^2_{\text{obt}} = 1.40$
$\chi^2_{\text{obt}} < \chi^2_{\text{crit}}$
Fail to reject the null hypothesis: the observed differences are explainable by chance, so there is no evidence that preferences differ among the colas when the logos are removed.
Excel functions
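CHIDIST(x, degrees_freedom) returns the right-tailed probability of the chi-square distribution, i.e., the p-value for an obtained value x (newer Excel versions also provide the equivalent CHISQ.DIST.RT).
CHIDIST(20.6, 2) = 0.0000336 < 0.05
CHIDIST(1.40, 2) = 0.496585304 > 0.05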
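Outside Excel, the same right-tailed probability can be computed from the regularized upper incomplete gamma function, since P(χ² > x) with df degrees of freedom equals Q(df/2, x/2). The sketch below is a hypothetical stand-in named `chidist`, using the standard power-series and continued-fraction evaluations (as described in Numerical Recipes); in practice a tested library call such as SciPy's `scipy.stats.chi2.sf(x, df)` is preferable.

```python
import math


def _lower_series(s, x, max_iter=500, eps=1e-15):
    """Regularized lower incomplete gamma P(s, x) via its power series (used for x < s + 1)."""
    term = 1.0 / s
    total = term
    for n in range(1, max_iter):
        term *= x / (s + n)
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))


def _upper_cf(s, x, max_iter=500, eps=1e-15):
    """Regularized upper incomplete gamma Q(s, x) via a continued fraction (modified Lentz)."""
    tiny = 1e-300
    b = x + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(-x + s * math.log(x) - math.lgamma(s)) * h


def chidist(x, degrees_freedom):
    """Right-tailed chi-square probability P(X > x), analogous to Excel's CHIDIST."""
    if degrees_freedom < 1:
        raise ValueError("degrees_freedom must be at least 1")
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 1.0
    s, half_x = degrees_freedom / 2.0, x / 2.0
    if half_x < s + 1.0:
        return 1.0 - _lower_series(s, half_x)
    return _upper_cf(s, half_x)


print(f"{chidist(20.6, 2):.7f}")  # 0.0000336
print(f"{chidist(1.40, 2):.9f}")  # 0.496585304
```

The split between the two evaluations follows the usual rule of thumb: the series converges quickly when x/2 is below df/2 + 1, and the continued fraction converges quickly above it.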