Transcript Slide 1

The greatest achievement in life is
to be able to get up again from failure.
1
Categorical
Data Analysis
Chapter 5 II: Logistic
Regression for
Qualitative/Mixed Factors
2
Anova Type Representation
of Factors
• Binary response variable: Y ~ Bernoulli(p)
• Qualitative factors: A, B, …
SAS textbook Sec 8.4
3
Example: Berkeley Admissions Data (Table 2.10)
Major   Men: # of applicants   Men: % admitted   Women: # of applicants   Women: % admitted
A       825                    62                108                      82
B       560                    63                25                       68
C       325                    37                593                      34
D       417                    33                375                      35
E       191                    28                393                      24
F       373                    6                 341                      7
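The models on the following slides work on the log-odds scale, so it can help to see the table on that scale first. The sketch below converts the admission percentages to sample log-odds; the variable and function names are illustrative only.

```python
import math

# Percent admitted by major and sex, copied from the Berkeley admissions table.
pct_admitted = {
    "A": {"men": 62, "women": 82},
    "B": {"men": 63, "women": 68},
    "C": {"men": 37, "women": 34},
    "D": {"men": 33, "women": 35},
    "E": {"men": 28, "women": 24},
    "F": {"men": 6, "women": 7},
}


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Sample log-odds of admission for each major and sex.
sample_logits = {
    major: {sex: logit(pct / 100.0) for sex, pct in row.items()}
    for major, row in pct_admitted.items()
}

for major, row in sample_logits.items():
    print(f"{major}: men {row['men']:+.2f}, women {row['women']:+.2f}")
```

Positive values mean admission is more likely than rejection; negative values mean the opposite.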
4
Anova-Type Logistic Regression
• Only one factor (e.g., Department)
$$\log\frac{p_i}{1-p_i} = \alpha + \beta^A_i$$
• Only main effects of two factors
$$\log\frac{p_{ij}}{1-p_{ij}} = \alpha + \beta^A_i + \beta^B_j$$

• Full model
$$\log\frac{p_{ij}}{1-p_{ij}} = \alpha + \beta^A_i + \beta^B_j + \beta^{AB}_{ij}$$
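A quick way to compare the three models is to count their free parameters. The sketch below assumes one level of each factor is fixed at 0 for identifiability; the function name `n_parameters` is hypothetical. For the Berkeley data (6 majors, 2 sexes) the full model has as many parameters as there are cells, which is why it reproduces the observed proportions exactly.

```python
def n_parameters(levels_a, levels_b=None, interaction=False):
    """Count free parameters of an ANOVA-type logistic model.

    Assumes a baseline constraint: one level of each factor has effect 0.
    """
    count = 1 + (levels_a - 1)  # intercept + main effect of A
    if levels_b is not None:
        count += levels_b - 1  # main effect of B
        if interaction:
            count += (levels_a - 1) * (levels_b - 1)  # A x B interaction
    return count


I, J = 6, 2  # majors A-F, sexes men/women

one_factor = n_parameters(I)
main_effects = n_parameters(I, J)
full_model = n_parameters(I, J, interaction=True)

print(one_factor, main_effects, full_model)
print("full model is saturated:", full_model == I * J)
```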

5
Anova-Type Logistic Regression
• Parameterization (in SAS):
The effect at the last level of each factor is
set to 0
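• (Regular) logistic regression expression by
dummy variables (one factor example)
$$\log\frac{p}{1-p} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{I-1} x_{I-1}$$
where x_k = 1 if the factor is at level k and 0 otherwise (k = 1, ..., I-1)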
6
Mixed-type Logistic Regression
• Binary response variable: Y ~ Bernoulli(p)
• Qualitative factors: A, B, …
• Quantitative factors: X
SAS textbook Sec 8.5
7
Example: Horseshoe Crab
• Dataset is given in Table 4.3, textbook
• Each female crab had a male crab
attached to her in her nest; other males
residing near her are called satellites
• Y= # of satellites
• X= female crab’s color (C), spine condition
(S), weight (Wt), and carapace width (W)
– C = 1 to 4 (light to dark);
– S = 1 to 3 (good to worst)
8
Mixed-Type Logistic Regression
Numerical factors Wt and W, together with:
• Only one factor (e.g., color)
$$\log\frac{p_i}{1-p_i} = \alpha + \beta^C_i + \gamma_1\,\mathit{Wt} + \gamma_2 W$$
• Only main effects of two factors
$$\log\frac{p_{ij}}{1-p_{ij}} = \alpha + \beta^C_i + \beta^S_j + \gamma_1\,\mathit{Wt} + \gamma_2 W$$

• With interaction effects (Not the saturated model)
$$\log\frac{p_{ij}}{1-p_{ij}} = \alpha + \beta^C_i + \beta^S_j + \beta^{CS}_{ij} + \gamma_1\,\mathit{Wt} + \gamma_2 W$$

9
Mixed-Type Logistic Regression
• Parameterization (PROC GENMOD in SAS):
The effect at the last level of each factor is
set to 0
• (Regular) logistic regression expression by
dummy variables (C + W example)
$$\log\frac{p}{1-p} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{I-1} x_{I-1} + \gamma W$$

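The dummy-variable form for the C + W example can be sketched as a design row per crab. This is a minimal illustration, assuming color is coded 1 to 4 with the last level as baseline; the functions `design_row` and `linear_predictor`, the width value, and the coefficient values are all hypothetical.

```python
def design_row(color, width, n_colors=4):
    """Return [1, x1, ..., x_{I-1}, W] for one crab.

    The last color level gets all-zero dummies (baseline constraint).
    """
    if not 1 <= color <= n_colors:
        raise ValueError(f"color must be in 1..{n_colors}")
    dummies = [1.0 if color == level else 0.0 for level in range(1, n_colors)]
    return [1.0] + dummies + [float(width)]


def linear_predictor(row, coefficients):
    """Compute the log-odds as the dot product of a design row and coefficients."""
    return sum(x * c for x, c in zip(row, coefficients))


# Hypothetical coefficients: intercept, three color dummies, width slope.
coefficients = [-12.0, 1.3, 1.4, 1.1, 0.45]

lightest = design_row(color=1, width=26.0)
darkest = design_row(color=4, width=26.0)

print(lightest, linear_predictor(lightest, coefficients))
print(darkest, linear_predictor(darkest, coefficients))
```

At equal width, the difference between the two linear predictors is the color-1 coefficient, since the darkest color is the baseline.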
10
Quantitative Treatment of
Ordinal factors
• For each ordinal factor, assign scores to
its categories
• Treat the ordinal factors as
quantitative factors when fitting the GLM,
e.g., color
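Treating color as quantitative replaces the three color effects with a single slope. The sketch below assumes equally spaced scores 1 to 4 and hypothetical values for `alpha` and `beta`; it shows the main consequence, which is that the odds ratio between adjacent color categories is the same constant, exp(beta).

```python
import math

# Assumption: equally spaced scores for the four color categories.
color_scores = {1: 1, 2: 2, 3: 3, 4: 4}


def probability(color, alpha, beta):
    """P(Y = 1) when the logit is linear in the color score."""
    eta = alpha + beta * color_scores[color]
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    return p / (1.0 - p)


alpha, beta = 2.0, -0.5  # hypothetical coefficients, not fitted values

# Odds ratio for each step to the next darker color.
adjacent_odds_ratios = [
    odds(probability(c + 1, alpha, beta)) / odds(probability(c, alpha, beta))
    for c in (1, 2, 3)
]

print(adjacent_odds_ratios)
print(math.exp(beta))
```

Other score choices are possible; unequal spacing would change which pairs of categories share the same odds ratio.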
11
Goodness of Fit
• Deviance (comparison to the full
model)
• Residuals
• Model comparisons (L-R tests)
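The deviance for grouped binary data can be sketched as below, using the male applicants from the Berkeley table. Admitted counts are an assumption here: they are recovered by rounding, since the table reports only percentages. The intercept-only model is compared against the saturated fit; with six groups, the one-factor (Department) model is saturated, so this deviance is also the likelihood-ratio statistic for the Department effect, on 5 degrees of freedom.

```python
import math

# Male applicants and percent admitted per major (Berkeley table).
applicants = [825, 560, 325, 417, 191, 373]
pct_admitted = [62, 63, 37, 33, 28, 6]

# Assumption: approximate counts obtained by rounding n * pct / 100.
admitted = [round(n * pct / 100) for n, pct in zip(applicants, pct_admitted)]


def binomial_deviance(successes, totals, fitted_probs):
    """G^2 = 2 * sum[ y log(y/mu) + (n-y) log((n-y)/(n-mu)) ], mu = n * p."""
    total = 0.0
    for y, n, p in zip(successes, totals, fitted_probs):
        mu = n * p
        if y > 0:
            total += y * math.log(y / mu)
        if y < n:
            total += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * total


# Intercept-only model: one common admission probability.
pooled = sum(admitted) / sum(applicants)
null_deviance = binomial_deviance(admitted, applicants, [pooled] * len(applicants))

# Saturated model: fitted probabilities equal the observed proportions.
observed = [y / n for y, n in zip(admitted, applicants)]
saturated_deviance = binomial_deviance(admitted, applicants, observed)

df = len(applicants) - 1
CHI2_95_DF5 = 11.07  # 0.95 quantile of chi-square with 5 df
reject_null = null_deviance > CHI2_95_DF5

print(f"null deviance = {null_deviance:.1f} on {df} df, reject: {reject_null}")
```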
12