Transcript Slide 1
The greatest achievement in life is
to be able to get up again from failure.
1
Categorical Data Analysis
Chapter 5 II: Logistic Regression for Qualitative/Mixed Factors
2
Anova-Type Representation of Factors
• Binary response variable: Y ~ Bernoulli(p)
• Qualitative factors: A, B, …
SAS textbook Sec 8.4
3
Example: Berkeley Admissions Data (Table 2.10)
Major | Men: # of applicants | Men: % admitted | Women: # of applicants | Women: % admitted
A     | 825                  | 62              | 108                    | 82
B     | 560                  | 63              | 25                     | 68
C     | 325                  | 37              | 593                    | 34
D     | 417                  | 33              | 375                    | 35
E     | 191                  | 28              | 393                    | 24
F     | 373                  | 6               | 341                    | 7
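As a rough illustration of the quantities a logistic model works with, the sketch below turns the rounded admission percentages from the Berkeley table into sample logits and men-versus-women odds ratios for each major. The names `berkeley`, `logit`, `sample_logits`, and `odds_ratios` are assumptions made for this example. Because the percentages are rounded, the results are approximate.

```python
import math

# Rounded percentages and applicant counts copied from the Berkeley table.
# Keys are majors; values are (n_men, pct_men, n_women, pct_women).
berkeley = {
    "A": (825, 62, 108, 82),
    "B": (560, 63, 25, 68),
    "C": (325, 37, 593, 34),
    "D": (417, 33, 375, 35),
    "E": (191, 28, 393, 24),
    "F": (373, 6, 341, 7),
}


def logit(p):
    """Log odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sample_logits(table):
    """Return {major: (logit_men, logit_women)} from the rounded percentages."""
    return {
        major: (logit(pct_m / 100.0), logit(pct_w / 100.0))
        for major, (_, pct_m, _, pct_w) in table.items()
    }


def odds_ratios(table):
    """Return {major: odds(men admitted) / odds(women admitted)}."""
    return {
        major: math.exp(l_men - l_women)
        for major, (l_men, l_women) in sample_logits(table).items()
    }


if __name__ == "__main__":
    for major, ratio in odds_ratios(berkeley).items():
        print(f"Major {major}: men-vs-women odds ratio = {ratio:.2f}")
```

An odds ratio below 1 means the admission odds for men are lower than for women in that major, and a ratio above 1 means the opposite.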
4
Anova-Type Logistic Regression
• Only one factor (e.g. Department)
$$\log\frac{p_i}{1-p_i} = \alpha + A_i$$
• Only main effects of two factors
$$\log\frac{p_{ij}}{1-p_{ij}} = \alpha + A_i + B_j$$
• Full model
$$\log\frac{p_{ij}}{1-p_{ij}} = \alpha + A_i + B_j + AB_{ij}$$
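One way to see what the interaction term adds is to compare two levels of B within a fixed level i of A. The short derivation below is a sketch under assumed notation: the intercept is written as \alpha, and levels 1 and 2 of B are picked only for illustration.

```latex
\begin{aligned}
\text{Main effects only:}\quad
  & \log\frac{p_{i1}}{1-p_{i1}} - \log\frac{p_{i2}}{1-p_{i2}}
    = (\alpha + A_i + B_1) - (\alpha + A_i + B_2) = B_1 - B_2 \\
\text{Full model:}\quad
  & \log\frac{p_{i1}}{1-p_{i1}} - \log\frac{p_{i2}}{1-p_{i2}}
    = (B_1 - B_2) + (AB_{i1} - AB_{i2})
\end{aligned}
```

Under the main-effects model the log odds ratio between two levels of B is the same at every level of A. The full model lets it change with i through the interaction terms.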
5
Anova-Type Logistic Regression
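• Parameterization (in SAS):
The effect at the last level of each factor is
set to 0
• (Regular) logistic regression expression by
dummy variables (one-factor example), where x_k = 1 if the factor is at level k and 0 otherwise
$$\log\frac{p}{1-p} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{I-1} x_{I-1}$$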
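The dummy-variable form can be sketched for the one-factor case using the men's admission percentages by major from Table 2.10. This is a minimal illustration, not SAS output: the function names are assumptions, and the symbols alpha (intercept) and beta_k (level effects) are assumed notation. It relies on the fact that a one-factor model fitted to grouped data reproduces the sample logits, so with the last level as baseline, alpha is that level's logit and each beta_k is a difference from it. The inputs are rounded percentages, so the values are approximate.

```python
import math


def dummy_row(level, levels):
    """Indicators x_1..x_{I-1}; the last level is the all-zero baseline."""
    return [1 if level == k else 0 for k in levels[:-1]]


def one_factor_params(props, levels):
    """Return (alpha, betas) with the last level's effect fixed at 0.

    alpha is the sample logit of the last level; beta_k is the difference
    between level k's sample logit and alpha.
    """
    def logit(p):
        return math.log(p / (1.0 - p))

    alpha = logit(props[levels[-1]])
    betas = [logit(props[k]) - alpha for k in levels[:-1]]
    return alpha, betas


def linear_predictor(alpha, betas, x):
    """alpha + sum_k beta_k * x_k."""
    return alpha + sum(b * xi for b, xi in zip(betas, x))


levels = ["A", "B", "C", "D", "E", "F"]
# Men's admission proportions by major (rounded percentages from Table 2.10).
men = {"A": 0.62, "B": 0.63, "C": 0.37, "D": 0.33, "E": 0.28, "F": 0.06}
alpha, betas = one_factor_params(men, levels)

if __name__ == "__main__":
    for major in levels:
        eta = linear_predictor(alpha, betas, dummy_row(major, levels))
        print(major, round(1.0 / (1.0 + math.exp(-eta)), 2))
```

Applying the inverse logit to each linear predictor returns the input proportions, which is a quick check that the coding is consistent.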
6
Mixed-type Logistic Regression
• Binary response variable: Y ~ Bernoulli(p)
• Qualitative factors: A, B, …
• Quantitative factors: X
SAS textbook Sec 8.5
7
Example: Horseshoe Crab
• Dataset is given in Table 4.3, textbook
• Each female crab had a male crab
attached to her in her nest; other males
residing near her are called satellites
• Y = # of satellites
• X = female crab’s color (C), spine condition
(S), weight (Wt), and carapace width (W)
– C = 1 to 4 (light to dark)
– S = 1 to 3 (good to worst)
8
Mixed-Type Logistic Regression
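Numerical factors Wt and W, together with:
• Only one factor (e.g. color)
$$\log\frac{p_i}{1-p_i} = \alpha + C_i + \beta_1 \mathrm{Wt} + \beta_2 W$$
• Only main effects of two factors
$$\log\frac{p_{ij}}{1-p_{ij}} = \alpha + C_i + S_j + \beta_1 \mathrm{Wt} + \beta_2 W$$
• With interaction effects (not the saturated model)
$$\log\frac{p_{ij}}{1-p_{ij}} = \alpha + C_i + S_j + CS_{ij} + \beta_1 \mathrm{Wt} + \beta_2 W$$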
9
Mixed-Type Logistic Regression
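• Parameterization (PROC GENMOD in SAS):
The effect at the last level of each factor is
set to 0
• (Regular) logistic regression expression by
dummy variables (C + W example), where x_k = 1 if the color is at level k and 0 otherwise
$$\log\frac{p}{1-p} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{I-1} x_{I-1} + \gamma W$$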
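The C + W expression can be sketched as a design row plus an inverse logit. This is only an illustration under stated assumptions: the helper names are invented for the example, gamma as the width coefficient is assumed notation, and the coefficient values and the width of 25.0 are hypothetical placeholders, not estimates from the horseshoe crab data.

```python
import math

# C = 1 to 4 (light to dark); the last level (4) is the baseline.
COLOR_LEVELS = [1, 2, 3, 4]


def design_row(color, width):
    """Return [1, x_1, x_2, x_3, W]: intercept, color dummies, carapace width."""
    if color not in COLOR_LEVELS:
        raise ValueError(f"unknown color level: {color}")
    dummies = [1.0 if color == k else 0.0 for k in COLOR_LEVELS[:-1]]
    return [1.0] + dummies + [float(width)]


def predicted_probability(coefs, row):
    """Inverse logit of the linear predictor sum_k coefs[k] * row[k]."""
    eta = sum(c * x for c, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-eta))


# HYPOTHETICAL coefficients (alpha, beta_1, beta_2, beta_3, gamma).
# They are placeholders for illustration, NOT fitted values.
example_coefs = [-10.0, 1.0, 1.0, 0.5, 0.4]
# Color 2 and width 25.0 are likewise arbitrary example inputs.
p = predicted_probability(example_coefs, design_row(color=2, width=25.0))

if __name__ == "__main__":
    print(f"predicted probability under the placeholder coefficients: {p:.3f}")
```

A crab of color 4 gets an all-zero dummy block, so its linear predictor reduces to alpha + gamma W. This matches the rule that the last level's effect is set to 0.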
10
Quantitative Treatment of
Ordinal factors
• Assign scores to the categories of
each ordinal factor
• Treat the ordinal factors as
quantitative factors when fitting the GLM,
e.g. color
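As a worked sketch for color with carapace width W as the other predictor, suppose the scores c = 1, 2, 3, 4 are assigned to the four color categories. The equally spaced scores and the symbols \alpha, \beta, \gamma are assumptions made for illustration.

```latex
\log\frac{p}{1-p} = \alpha + \beta c + \gamma W, \qquad c \in \{1, 2, 3, 4\}

\left[\alpha + \beta (c+1) + \gamma W\right] - \left[\alpha + \beta c + \gamma W\right] = \beta
```

With these scores, moving to the next darker color changes the logit by the same amount \beta at any fixed width. The model spends one parameter on color, where the dummy-variable coding spends I - 1 = 3.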
11
Goodness of Fit
• Deviance, or comparison to the full
(saturated) model
• Residuals
• Model comparisons (likelihood-ratio, L-R, tests)
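For grouped binomial data, these three checks can be sketched in a few lines. This is a minimal sketch, not a replacement for SAS output: the function names are assumptions, fitted probabilities are assumed to lie strictly between 0 and 1, and the counts in the usage lines are made-up toy numbers that come from no dataset in these slides.

```python
import math


def _xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_deviance(successes, totals, fitted_probs):
    """Deviance against the saturated model for grouped binomial data:
    2 * sum[ y*log(y/mu) + (n-y)*log((n-y)/(n-mu)) ], where mu = n * p_hat.
    """
    dev = 0.0
    for y, n, p in zip(successes, totals, fitted_probs):
        mu = n * p
        dev += _xlogy(y, y / mu) + _xlogy(n - y, (n - y) / (n - mu))
    return 2.0 * dev


def pearson_residuals(successes, totals, fitted_probs):
    """(y - n*p_hat) / sqrt(n*p_hat*(1-p_hat)) for each group."""
    return [
        (y - n * p) / math.sqrt(n * p * (1.0 - p))
        for y, n, p in zip(successes, totals, fitted_probs)
    ]


def lr_statistic(deviance_reduced, deviance_full):
    """L-R statistic for two nested models: the difference in deviances."""
    return deviance_reduced - deviance_full


# Toy counts (made up for illustration): two groups of 50 trials each.
toy_y, toy_n = [30, 10], [50, 50]
dev_saturated = binomial_deviance(toy_y, toy_n, [0.6, 0.2])  # fits exactly
dev_common_p = binomial_deviance(toy_y, toy_n, [0.4, 0.4])   # one shared p
g2 = lr_statistic(dev_common_p, dev_saturated)
```

The L-R statistic is compared with a chi-squared distribution whose degrees of freedom equal the difference in the number of parameters between the two models, which is 1 in the toy comparison.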
12