Chapter 4: Random Variables and Probability Distributions


Statistics
Chapter 13: Categorical Data Analysis
Where We’ve Been


Presented methods for making inferences about the population proportion associated with a two-level qualitative variable (i.e., a binomial variable)
Presented methods for making inferences about the difference between two binomial proportions
McClave, Statistics, 11th ed. Chapter 13:
Categorical Data Analysis
Where We’re Going



Discuss qualitative (categorical) data with more than two outcomes
Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable, called a one-way analysis
Present a chi-square hypothesis test relating two qualitative variables, called a two-way analysis
13.1: Categorical Data and the
Multinomial Experiment

Properties of the Multinomial Experiment
1. The experiment consists of n identical trials.
2. There are k possible outcomes (called classes, categories, or cells) to each trial.
3. The probabilities of the k outcomes, denoted by p1, p2, …, pk, where p1 + p2 + … + pk = 1, remain the same from trial to trial.
4. The trials are independent.
5. The random variables of interest are the cell counts n1, n2, …, nk of the number of observations that fall into each of the k categories.
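
The five properties can be illustrated with a small simulation. The sketch below is an illustration only and is not part of the text: the function name `simulate_multinomial`, the seed, and the choice of three equally likely outcomes are assumptions made for the example.

```python
import random

def simulate_multinomial(n, probs, seed=None):
    """Run n independent, identical trials with k = len(probs) outcomes
    and return the cell counts n1, ..., nk."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    # Every trial uses the same probabilities (property 3) and is drawn
    # independently of the others (property 4).
    for outcome in rng.choices(range(k), weights=probs, k=n):
        counts[outcome] += 1
    return counts

# Hypothetical example: 150 trials, three equally likely outcomes.
counts = simulate_multinomial(150, [1/3, 1/3, 1/3], seed=1)
print(counts, sum(counts))
```

The cell counts always sum to n, which is why only k - 1 of them are free to vary.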
13.2: Testing Categorical
Probabilities: One-Way Table

Suppose three candidates are running for office, and 150 voters are asked their preferences.
Candidate 1 is the choice of 61 voters.
Candidate 2 is the choice of 53 voters.
Candidate 3 is the choice of 36 voters.
Do these data suggest the population may prefer one candidate over the others?
13.2: Testing Categorical
Probabilities: One-Way Table
Candidate 1 is the choice of 61 voters, Candidate 2 is the choice of 53 voters, and Candidate 3 is the choice of 36 voters (n = 150).

$H_0: p_1 = p_2 = p_3 = \frac{1}{3}$ (no preference)
$H_a$: At least one of the proportions exceeds $\frac{1}{3}$

$E(\text{number of votes for each candidate} \mid H_0) = \frac{150}{3} = 50$, so $E_1 = E_2 = E_3 = 50$.

A chi-square ($\chi^2$) test is used to test $H_0$.

$$\chi^2 = \frac{[n_1 - E_1]^2}{E_1} + \frac{[n_2 - E_2]^2}{E_2} + \frac{[n_3 - E_3]^2}{E_3}$$

$$\chi^2 = \frac{[61 - 50]^2}{50} + \frac{[53 - 50]^2}{50} + \frac{[36 - 50]^2}{50} = 6.52$$

$\chi^2_{.05,\, df = 2} = 5.99147$
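
The arithmetic for the three-candidate example can be checked with a few lines of Python. This is a minimal sketch; the variable names are chosen for illustration, and the critical value is copied from the slide rather than computed.

```python
observed = [61, 53, 36]          # votes for candidates 1, 2, 3
n = sum(observed)                # 150 voters
expected = [n / 3] * 3           # E1 = E2 = E3 = 50 under H0

# Sum of squared deviations, each scaled by its expected count.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 5.99147               # chi-square critical value for alpha = .05, df = 2

print(round(chi_sq, 2))          # 6.52
print(chi_sq > critical)         # True
```

The three terms are 2.42, 0.18, and 3.92, so most of the statistic comes from Candidates 1 and 3.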
13.2: Testing Categorical
Probabilities: One-Way Table
Because the test statistic 6.52 exceeds the critical value 5.99147, reject the null hypothesis.
13.2: Testing Categorical
Probabilities: One-Way Table
Test of a Hypothesis about Multinomial Probabilities: One-Way Table
$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$
where $p_{1,0}, p_{2,0}, \ldots, p_{k,0}$ represent the hypothesized values of the multinomial probabilities
$H_a$: At least one of the multinomial probabilities does not equal its hypothesized value
Test statistic: $\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i}$
where $E_i = n p_{i,0}$ is the expected cell count given the null hypothesis.
Rejection region: $\chi^2 > \chi^2_{\alpha}$, with $(k - 1)$ df.
13.2: Testing Categorical
Probabilities: One-Way Table
Conditions Required for a Valid $\chi^2$ Test: One-Way Table
1. A multinomial experiment has been conducted.
2. The sample size n will be large enough so that, for every cell, the expected cell count E(ni) will be equal to 5 or more.
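
The general one-way test and the expected-count condition can be combined in one helper. The following is a sketch under stated assumptions: the function name `chi_square_one_way` and the `min_expected` parameter are hypothetical, and the function returns only the statistic and degrees of freedom, leaving the comparison with a tabled critical value to the reader.

```python
def chi_square_one_way(observed, null_probs, min_expected=5):
    """Return (chi-square statistic, degrees of freedom) for a one-way table.

    observed   -- cell counts n1, ..., nk
    null_probs -- hypothesized probabilities p1,0, ..., pk,0
    """
    if len(observed) != len(null_probs):
        raise ValueError("need one hypothesized probability per cell")
    if abs(sum(null_probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in null_probs]   # Ei = n * p_i,0
    # Condition 2: every expected cell count should be at least 5.
    if min(expected) < min_expected:
        raise ValueError("an expected cell count is below the minimum")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Usage with the three-candidate data from earlier in this section.
stat, df = chi_square_one_way([61, 53, 36], [1/3, 1/3, 1/3])
print(round(stat, 2), df)
```

A sample that is too small, such as `chi_square_one_way([3, 4, 5], [1/3, 1/3, 1/3])` with expected counts of 4, raises `ValueError` instead of returning a statistic that the chi-square approximation does not support.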
13.2: Testing Categorical
Probabilities: One-Way Table
Example 13.2: Distribution of Opinions About Marijuana Possession Before Television Series has Aired

Legalization    Decriminalization    Existing Law    No Opinion
7%              18%                  65%             10%

Table 13.2: Distribution of Opinions About Marijuana Possession After Television Series has Aired

Legalization    Decriminalization    Existing Law    No Opinion
39              99                   336             26
13.2: Testing Categorical
Probabilities: One-Way Table
Expected Distribution of 500 Opinions About Marijuana Possession After Television Series has Aired

Legalization    Decriminalization    Existing Law    No Opinion
500(.07)=35     500(.18)=90          500(.65)=325    500(.10)=50

$H_0: p_1 = .07,\ p_2 = .18,\ p_3 = .65,\ p_4 = .10$
$H_a$: At least one of the proportions differs from its null hypothesis value.
Test statistic: $\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i}$
Rejection region: $\chi^2 > \chi^2_{.01,\, df = 3} = 11.3449$
$$\chi^2 = \frac{(39 - 35)^2}{35} + \frac{(99 - 90)^2}{90} + \frac{(336 - 325)^2}{325} + \frac{(26 - 50)^2}{50} = 13.249$$
Because the test statistic 13.249 exceeds the critical value 11.3449, reject the null hypothesis.
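
Rejecting the null hypothesis does not say which category changed. One informal way to look, sketched below, is to print each cell's contribution to the statistic. The variable names are illustrative, and the per-cell breakdown is an added diagnostic, not a step shown on the slides.

```python
categories = ["Legalization", "Decriminalization", "Existing Law", "No Opinion"]
observed = [39, 99, 336, 26]
null_probs = [0.07, 0.18, 0.65, 0.10]

n = sum(observed)                                   # 500 opinions
expected = [n * p for p in null_probs]              # 35, 90, 325, 50
# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

for name, c in zip(categories, contributions):
    print(f"{name:18s}{c:8.3f}")
print(f"{'Total':18s}{chi_sq:8.3f}")
```

The "No Opinion" cell contributes 576/50 = 11.52 of the total 13.249, so the drop in respondents with no opinion accounts for most of the departure from the hypothesized distribution.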
13.2: Testing Categorical
Probabilities: One-Way Table

Inferences can be made on any single proportion as well:

A 95% confidence interval on the proportion of citizens in the viewing area with no opinion is

$$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4}$$

where $\hat{p}_4 = \frac{n_4}{n} = \frac{26}{500} = .052$ and $\sigma_{\hat{p}_4} \approx \sqrt{\frac{\hat{p}_4 (1 - \hat{p}_4)}{n}} = \sqrt{\frac{.052(.948)}{500}} = .0099$.

$$\hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} = .052 \pm 1.96(.0099) = .052 \pm .019$$
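
The interval can be reproduced numerically. This is a minimal sketch with illustrative variable names; it uses the same large-sample standard error and the same 1.96 multiplier as the slide.

```python
import math

n4, n = 26, 500                                  # "No Opinion" count and sample size
p_hat = n4 / n                                   # sample proportion, .052
se = math.sqrt(p_hat * (1 - p_hat) / n)          # estimated standard error, about .0099
margin = 1.96 * se                               # half-width of the 95% interval, about .019
lower, upper = p_hat - margin, p_hat + margin

print(f"{p_hat:.3f} +/- {margin:.3f}")
print(f"({lower:.3f}, {upper:.3f})")
```

The whole interval lies below the hypothesized value of .10 for this category, which is consistent with the rejection of the null hypothesis above.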
13.3: Testing Categorical
Probabilities: Two-Way Table

Chi-square analysis can also be used
to investigate studies based on
qualitative factors.

Does having one characteristic make it
more/less likely to exhibit another
characteristic?
McClave, Statistics, 11th ed. Chapter 13:
Categorical Data Analysis
16
13.3: Testing Categorical
Probabilities: Two-Way Table
The columns are divided according to the subcategories for one qualitative variable and the rows for the other qualitative variable.

                        Column
Row              1      2      ...    c      Row Totals
1                n11    n12    ...    n1c    R1
2                n21    n22    ...    n2c    R2
...              ...    ...    ...    ...    ...
r                nr1    nr2    ...    nrc    Rr
Column Totals    C1     C2     ...    Cc     n
13.3: Testing Categorical
Probabilities: Two-Way Table
General Form of a Two-Way (Contingency) Table Analysis: A Test for Independence
$H_0$: The two classifications are independent
$H_a$: The two classifications are dependent
Test statistic: $\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}$
where $E_{ij} = \frac{R_i C_j}{n}$
and $R_i$ = total for row $i$, $C_j$ = total for column $j$, $n$ = sample size
Rejection region: $\chi^2 > \chi^2_{\alpha}$, df = $(r - 1)(c - 1)$
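
The expected-count formula follows directly from the independence hypothesis. The short derivation below is a sketch; the marginal-probability symbols $p_{i\cdot}$ and $p_{\cdot j}$ are notation introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
H_0:\quad p_{ij} &= p_{i\cdot}\, p_{\cdot j}
  && \text{(independence: joint probability is the product of the marginals)} \\
\hat{p}_{i\cdot} &= \frac{R_i}{n}, \qquad \hat{p}_{\cdot j} = \frac{C_j}{n}
  && \text{(marginals estimated from row and column totals)} \\
E_{ij} &= n\, \hat{p}_{i\cdot}\, \hat{p}_{\cdot j}
        = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n}
        = \frac{R_i C_j}{n}
\end{align*}
```

Because the row proportions and the column proportions each sum to 1, only $(r - 1) + (c - 1)$ marginal probabilities are estimated, which leaves $(rc - 1) - (r - 1) - (c - 1) = (r - 1)(c - 1)$ degrees of freedom.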
13.3: Testing Categorical
Probabilities: Two-Way Table

The results of a survey regarding marital status and religious affiliation are reported below (Example 13.3 in the text).

                                  Religious Affiliation
Marital Status             A      B      C      D      None   Totals
Divorced                   39     19     12     28     18     116
Married, never divorced    172    61     44     70     37     384
Totals                     211    80     56     98     55     500

H0: Marital status and religious affiliation are independent
Ha: Marital status and religious affiliation are dependent
13.3: Testing Categorical
Probabilities: Two-Way Table

The expected frequencies (see Figure 13.4) are included below in parentheses:

                                             Religious Affiliation
Marital Status             A               B             C             D             None          Totals
Divorced                   39 (48.95)      19 (18.56)    12 (12.99)    28 (22.74)    18 (12.76)    116
Married, never divorced    172 (162.05)    61 (61.44)    44 (43.01)    70 (75.26)    37 (42.24)    384
Totals                     211             80            56            98            55            500

The chi-square value computed with SAS is 7.1355, with p-value = .1289. Even at the α = .10 level, we cannot reject the null hypothesis.
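
The SAS figures can be cross-checked with plain Python. This is a sketch under stated assumptions: the function names are hypothetical, and the p-value uses a closed-form upper-tail probability that is specific to 4 degrees of freedom rather than a general chi-square routine.

```python
import math

def expected_counts(table):
    """Expected cell counts E_ij = R_i * C_j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Divorced; Married, never divorced.
# Columns: religious affiliation A, B, C, D, None.
observed = [[39, 19, 12, 28, 18],
            [172, 61, 44, 70, 37]]

stat, df = chi_square_two_way(observed)
# Upper-tail chi-square probability for df = 4 only:
# P(X > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(round(stat, 4), df, round(p_value, 4))
```

With df = (2 - 1)(5 - 1) = 4, the statistic and p-value come out at approximately 7.1355 and .1289, in line with the SAS output quoted above. For other table sizes a general chi-square distribution function would be needed in place of the df = 4 shortcut.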
13.4: A Word of Caution About
Chi-Square Tests
Relative ease of use and widespread applications lead to misuse and misinterpretation.
13.4: A Word of Caution About
Chi-Square Tests
To produce (possibly) valid $\chi^2$ results, be sure that:
The sample is from the correct population
Expected counts are ≥ 5
You avoid Type II errors by not accepting non-rejected null hypotheses
You avoid mistaking dependence for causation