Single-Factor Studies
KNNL – Chapter 16
Single-Factor Models
• Independent Variable can be qualitative or
quantitative
• If Quantitative, we typically assume a linear,
polynomial, or no “structural” relation
• If Qualitative, we typically have no “structural”
relation
• Balanced designs have equal numbers of replicates at
each level of the independent variable
• When no structure is assumed, we refer to the models as
“Analysis of Variance” (ANOVA) models, and use indicator
variables for the treatments in the regression model
Single-Factor ANOVA Model
• Model Assumptions for Model Testing
All probability distributions are normal
All probability distributions have equal variance
Responses are random samples from their
probability distributions, and are independent
• Analysis Procedure
Test for differences among factor level means
Follow-up (post-hoc) comparisons among pairs or
groups of factor level means
Cell Means Model
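$r$ = number of levels of the study factor
$n_i$ = number of replicates (cases, units) for the $i$th level of the study factor
$n_T = n_1 + \cdots + n_r = \sum_{i=1}^{r} n_i$ = overall sample size (number of cases)
$$Y_{ij} = \mu_i + \varepsilon_{ij} \qquad i = 1,\ldots,r; \quad j = 1,\ldots,n_i$$
$Y_{ij}$ = response for the $j$th case within the $i$th level of the study factor
$\mu_i$ = population mean for the $i$th level of the study factor
$\varepsilon_{ij} \sim NID(0, \sigma^2)$, where NID = Normally and Independently Distributed
$$E\{Y_{ij}\} = \mu_i \qquad \sigma^2\{Y_{ij}\} = \sigma^2$$
$\Rightarrow$ the $Y_{ij}$ are independent $N(\mu_i, \sigma^2)$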
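To make the cell means model concrete, here is a minimal simulation sketch in Python. The level means, error standard deviation, replicate counts, and seed are all hypothetical values chosen for illustration; none of them come from the slides.

```python
import random

random.seed(16)  # arbitrary seed so the draw is reproducible

mu = [12.0, 16.0, 20.0]   # hypothetical level means mu_i
sigma = 2.0               # hypothetical common error standard deviation
n = [2, 2, 2]             # replicates n_i per level

# Y_ij = mu_i + eps_ij with eps_ij ~ N(0, sigma^2), drawn independently
y = [[mu_i + random.gauss(0.0, sigma) for _ in range(n_i)]
     for mu_i, n_i in zip(mu, n)]

n_T = sum(n)  # overall sample size
for i, y_i in enumerate(y, start=1):
    print(f"level {i}:", [round(v, 2) for v in y_i])
```

Each inner list holds the $n_i$ responses for one factor level. Only the mean differs between levels, and the error variance is shared.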
Cell Means Model – Regression Form
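Suppose $r = 3$ and $n_1 = n_2 = n_3 = 2$:
$$
\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
\qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}
$$
$$
E\{\mathbf{Y}\} = \begin{bmatrix} E\{Y_{11}\} \\ E\{Y_{12}\} \\ E\{Y_{21}\} \\ E\{Y_{22}\} \\ E\{Y_{31}\} \\ E\{Y_{32}\} \end{bmatrix}
= \mathbf{X}\boldsymbol{\beta}
= \begin{bmatrix} \mu_1 \\ \mu_1 \\ \mu_2 \\ \mu_2 \\ \mu_3 \\ \mu_3 \end{bmatrix}
\qquad
\sigma^2\{\boldsymbol{\varepsilon}\} = \sigma^2 \mathbf{I}_6
$$
$$
\mathbf{X'X} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\qquad
\mathbf{X'Y} = \begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} + Y_{22} \\ Y_{31} + Y_{32} \end{bmatrix}
$$
$$
\hat{\boldsymbol{\beta}} = (\mathbf{X'X})^{-1}\mathbf{X'Y}
= \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{bmatrix}
\begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} + Y_{22} \\ Y_{31} + Y_{32} \end{bmatrix}
= \begin{bmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \end{bmatrix}
= \begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{bmatrix}
$$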
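The matrix algebra for the $r = 3$, $n_i = 2$ layout can be checked numerically. The sketch below uses made-up responses (hypothetical data) and plain Python lists. It relies on $\mathbf{X'X}$ being diagonal for the cell means coding, so the inverse reduces to an elementwise division.

```python
# Toy data (hypothetical): r = 3 levels, n_i = 2 replicates each
y = [10.0, 14.0, 15.0, 17.0, 18.0, 22.0]
level = [0, 0, 1, 1, 2, 2]   # factor level of each case (0-based)
r = 3

# Indicator design matrix: X[k][i] = 1 if case k belongs to level i
X = [[1.0 if level[k] == i else 0.0 for i in range(r)] for k in range(len(y))]

# X'X and X'Y by direct summation
xtx = [[sum(row[a] * row[b] for row in X) for b in range(r)] for a in range(r)]
xty = [sum(row[a] * yk for row, yk in zip(X, y)) for a in range(r)]

# X'X is diagonal here, so (X'X)^{-1} X'Y is an elementwise division
beta_hat = [xty[a] / xtx[a][a] for a in range(r)]

print(xtx)        # diagonal matrix with the n_i on the diagonal
print(beta_hat)   # the three level sample means
```

The estimates are the level sample means, matching $\hat{\mu}_i = \bar{Y}_{i\cdot}$.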
Model Interpretations
• Factor Level Means
Observational Studies – The $\mu_i$ represent the population
means of units from the populations defined by the
factor levels
Experimental Studies – The $\mu_i$ represent the mean responses
for the various factor levels, had each been assigned to the
whole population of experimental units
• Fixed and Random Factors
Fixed Factors – All levels of interest are observed in study
Random Factors – Factor levels included in study represent a
sample from a population of factor levels
Fitting ANOVA Models
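Notation:
$$Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij} \qquad \bar{Y}_{i\cdot} = \frac{Y_{i\cdot}}{n_i} = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i}$$
$$Y_{\cdot\cdot} = \sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij} \qquad \bar{Y}_{\cdot\cdot} = \frac{Y_{\cdot\cdot}}{n_T} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}}{n_T} = \sum_{i=1}^{r} \frac{n_i}{n_T}\,\bar{Y}_{i\cdot}$$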
Least Squares and Maximum Likelihood Estimation
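Error Sum of Squares:
$$Q = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} \varepsilon_{ij}^2$$
$$\frac{\partial Q}{\partial \mu_k} = -2\sum_{j=1}^{n_k} (Y_{kj} - \mu_k)$$
Setting $\dfrac{\partial Q}{\partial \mu_k} = 0$:
$$\sum_{j=1}^{n_k} Y_{kj} = n_k \hat{\mu}_k \;\Rightarrow\; \hat{\mu}_k = \frac{\sum_{j=1}^{n_k} Y_{kj}}{n_k} = \bar{Y}_{k\cdot} \qquad k = 1,\ldots,r$$
Likelihood:
$$L(\mu_1,\ldots,\mu_r,\sigma^2 \mid Y_{11},\ldots,Y_{r n_r}) = \left(\frac{1}{2\pi\sigma^2}\right)^{n_T/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2\right\}$$
Maximizing the likelihood with respect to $\mu_1,\ldots,\mu_r$ $\equiv$ minimizing $\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2$
$\Rightarrow \hat{\mu}_k = \bar{Y}_{k\cdot}, \quad k = 1,\ldots,r$
Fitted values: $\hat{Y}_{ij} = \bar{Y}_{i\cdot}$
Residuals: $e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$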
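As a numerical sanity check on the least squares result, the sketch below evaluates the criterion $Q$ at the level means and at nearby candidate values. The responses and the shift of 0.5 are hypothetical choices for illustration.

```python
groups = [[10.0, 14.0], [15.0, 17.0], [18.0, 22.0]]  # hypothetical responses

def Q(mu):
    """Error sum of squares for candidate level means mu."""
    return sum((y - m) ** 2 for g, m in zip(groups, mu) for y in g)

mu_hat = [sum(g) / len(g) for g in groups]                    # level sample means
residuals = [[y - m for y in g] for g, m in zip(groups, mu_hat)]

q_min = Q(mu_hat)
# Shifting any single estimate away from its level mean increases Q
q_shifted = [Q([m + (0.5 if k == i else 0.0) for k, m in enumerate(mu_hat)])
             for i in range(len(groups))]

print(mu_hat, q_min, q_shifted)
print([sum(e) for e in residuals])  # residuals sum to zero within each level
```

Because $Q$ is a sum of separate quadratics in each $\mu_k$, perturbing one estimate at a time is enough to show the level means sit at the minimum.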
Analysis of Variance
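Decomposition of the total deviation:
$$\underbrace{Y_{ij} - \bar{Y}_{\cdot\cdot}}_{\text{Total deviation}} = \underbrace{\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}}_{\text{Deviation of trt mean from overall mean}} + \underbrace{Y_{ij} - \bar{Y}_{i\cdot}}_{\text{Deviation from trt mean (residual)}}$$
Squaring and summing over all cases:
$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$$
since the cross-product term is zero:
$$\sum_{i=1}^{r}\sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot}) = \sum_{i=1}^{r} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0$$
Total (Corrected) Sum of Squares: $SSTO = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 \qquad df_{TO} = n_T - 1$
Treatment Sum of Squares: $SSTR = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \qquad df_{TR} = r - 1$
Error Sum of Squares: $SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 \qquad df_E = n_T - r$
Note: $SSTO = SSTR + SSE \qquad df_{TO} = df_{TR} + df_E$
Useful result:
$$s_i^2 = \frac{\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{n_i - 1} \;\Rightarrow\; \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 = (n_i - 1)s_i^2$$
$$\Rightarrow\; SSE = \sum_{i=1}^{r} (n_i - 1)s_i^2 \qquad df_E = \sum_{i=1}^{r} (n_i - 1) = n_T - r$$
Mean Squares:
$$MSTR = \frac{SSTR}{r - 1} \qquad MSE = \frac{SSE}{n_T - r}$$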
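The sums of squares and the "useful result" for SSE can be verified on a small unbalanced example. This is a sketch on hypothetical data, using only the Python standard library. `statistics.variance` is the sample variance with divisor $n_i - 1$.

```python
from statistics import variance

# Hypothetical unbalanced data: n_1 = 3, n_2 = 2, n_3 = 4
groups = [[10.0, 12.0, 14.0], [15.0, 17.0], [21.0, 23.0, 25.0, 27.0]]
r = len(groups)
n_T = sum(len(g) for g in groups)

grand_mean = sum(sum(g) for g in groups) / n_T
level_means = [sum(g) / len(g) for g in groups]

ssto = sum((y - grand_mean) ** 2 for g in groups for y in g)
sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, level_means))
sse = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)

# Useful result: SSE pooled from the within-level sample variances s_i^2
sse_from_variances = sum((len(g) - 1) * variance(g) for g in groups)

mstr = sstr / (r - 1)
mse = sse / (n_T - r)
print(ssto, sstr + sse)          # the two agree up to rounding
print(sse, sse_from_variances)   # the two agree up to rounding
print(mstr, mse)
```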
ANOVA Table
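| Source | df | SS | MS | E{MS} |
| Treatments | $r - 1$ | $SSTR = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$ | $MSTR = \dfrac{SSTR}{r - 1}$ | $\sigma^2 + \dfrac{\sum_{i=1}^{r} n_i (\mu_i - \mu_\cdot)^2}{r - 1}$ |
| Error | $n_T - r$ | $SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$ | $MSE = \dfrac{SSE}{n_T - r}$ | $\sigma^2$ |
| Total | $n_T - 1$ | $SSTO = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$ | | |
Note:
$$SSTR = \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2 - n_T \bar{Y}_{\cdot\cdot}^2 \qquad SSE = \sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}^2 - \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2$$
Expectations of the pieces:
$$E\{Y_{ij}\} = \mu_i,\ \ \sigma^2\{Y_{ij}\} = \sigma^2 \;\Rightarrow\; E\{Y_{ij}^2\} = \sigma^2 + \mu_i^2 \;\Rightarrow\; E\left\{\sum_{i=1}^{r}\sum_{j=1}^{n_i} Y_{ij}^2\right\} = n_T\sigma^2 + \sum_{i=1}^{r} n_i\mu_i^2$$
$$E\{\bar{Y}_{i\cdot}\} = \mu_i,\ \ \sigma^2\{\bar{Y}_{i\cdot}\} = \frac{\sigma^2}{n_i} \;\Rightarrow\; E\{\bar{Y}_{i\cdot}^2\} = \frac{\sigma^2}{n_i} + \mu_i^2 \;\Rightarrow\; E\left\{\sum_{i=1}^{r} n_i\bar{Y}_{i\cdot}^2\right\} = r\sigma^2 + \sum_{i=1}^{r} n_i\mu_i^2$$
$$E\{\bar{Y}_{\cdot\cdot}\} = \frac{\sum_{i=1}^{r} n_i\mu_i}{n_T} = \mu_\cdot,\ \ \sigma^2\{\bar{Y}_{\cdot\cdot}\} = \frac{\sigma^2}{n_T} \;\Rightarrow\; E\{\bar{Y}_{\cdot\cdot}^2\} = \frac{\sigma^2}{n_T} + \mu_\cdot^2 \;\Rightarrow\; E\{n_T\bar{Y}_{\cdot\cdot}^2\} = \sigma^2 + n_T\mu_\cdot^2$$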
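The expected mean squares in the table follow by combining the expectations of the three pieces $\sum\sum Y_{ij}^2$, $\sum n_i\bar{Y}_{i\cdot}^2$, and $n_T\bar{Y}_{\cdot\cdot}^2$. A short worked version of that last step, using $\mu_\cdot = \sum n_i\mu_i / n_T$ (so that $\sum n_i\mu_i = n_T\mu_\cdot$):

```latex
\begin{align*}
E\{SSE\} &= \Big(n_T\sigma^2 + \sum_{i=1}^{r} n_i\mu_i^2\Big) - \Big(r\sigma^2 + \sum_{i=1}^{r} n_i\mu_i^2\Big)
          = (n_T - r)\sigma^2
          &&\Rightarrow\; E\{MSE\} = \sigma^2 \\
E\{SSTR\} &= \Big(r\sigma^2 + \sum_{i=1}^{r} n_i\mu_i^2\Big) - \big(\sigma^2 + n_T\mu_\cdot^2\big)
           = (r - 1)\sigma^2 + \sum_{i=1}^{r} n_i\mu_i^2 - n_T\mu_\cdot^2 \\
          &= (r - 1)\sigma^2 + \sum_{i=1}^{r} n_i(\mu_i - \mu_\cdot)^2
          &&\Rightarrow\; E\{MSTR\} = \sigma^2 + \frac{\sum_{i=1}^{r} n_i(\mu_i - \mu_\cdot)^2}{r - 1}
\end{align*}
```

The final equality for $E\{SSTR\}$ uses $\sum n_i(\mu_i - \mu_\cdot)^2 = \sum n_i\mu_i^2 - 2\mu_\cdot\sum n_i\mu_i + n_T\mu_\cdot^2 = \sum n_i\mu_i^2 - n_T\mu_\cdot^2$.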
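F-Test for H0: $\mu_1 = \cdots = \mu_r$
$H_0: \mu_1 = \cdots = \mu_r$
$H_A:$ Not all $\mu_i$ are equal
Test Statistic: $F^* = \dfrac{MSTR}{MSE}$
Under the null hypothesis (and independence and normality of errors):
$$\frac{SSTR}{\sigma^2} \sim \chi^2_{r-1} \qquad \frac{SSE}{\sigma^2} \sim \chi^2_{n_T - r} \qquad \text{and are independent (independent even if } H_0 \text{ false)}$$
$$\Rightarrow\; F^* = \frac{\left.\dfrac{SSTR}{\sigma^2}\right/ (r - 1)}{\left.\dfrac{SSE}{\sigma^2}\right/ (n_T - r)} = \frac{MSTR}{MSE} \sim F(r - 1,\ n_T - r)$$
Decision Rule: Reject $H_0$ if $F^* = \dfrac{MSTR}{MSE} \ge F(1 - \alpha;\ r - 1,\ n_T - r)$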
General Linear Test of Equal Means
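$H_0: \mu_1 = \cdots = \mu_r = \mu_c$, where $\mu_c$ = Common Mean (Reduced Model)
$H_A:$ Not all $\mu_i$ are equal (Complete Model)
Reduced Model: $\hat{\mu}_c = \bar{Y}_{\cdot\cdot} = \hat{Y}_{ij}$
$$SSE(R) = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = SSTO \qquad df_R = n_T - 1$$
Complete (Full) Model: $\hat{\mu}_i = \bar{Y}_{i\cdot} = \hat{Y}_{ij}$
$$SSE(F) = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 = SSE \qquad df_F = n_T - r$$
Test Statistic:
$$F^* = \frac{\dfrac{SSE(R) - SSE(F)}{df_R - df_F}}{\dfrac{SSE(F)}{df_F}} = \frac{\dfrac{SSTO - SSE}{(n_T - 1) - (n_T - r)}}{\dfrac{SSE}{n_T - r}} = \frac{\dfrac{SSTR}{r - 1}}{\dfrac{SSE}{n_T - r}} = \frac{MSTR}{MSE}$$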
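The general linear test can be traced step by step: fit the reduced and full models, take the two error sums of squares, and form $F^*$. A minimal sketch on hypothetical data follows.

```python
groups = [[10.0, 14.0], [15.0, 17.0], [18.0, 22.0]]  # hypothetical responses
r = len(groups)
n_T = sum(len(g) for g in groups)

grand_mean = sum(sum(g) for g in groups) / n_T

# Reduced model: one common mean, so every fitted value is the overall mean
sse_r = sum((y - grand_mean) ** 2 for g in groups for y in g)
df_r = n_T - 1

# Full model: one mean per level, so the fitted value is the level mean
sse_f = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
df_f = n_T - r

f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
print(sse_r, sse_f, f_star)
```

Here `sse_r - sse_f` equals SSTR and `df_r - df_f` equals $r - 1$, so `f_star` is the same quantity as $MSTR/MSE$.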
Factor Effects Model
Alternative Form of Model (Necessary for interactions in multi-factor models):
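$$\mu_i = \mu_\cdot + (\mu_i - \mu_\cdot) = \mu_\cdot + \tau_i \;\Rightarrow\; Y_{ij} = \mu_\cdot + \tau_i + \varepsilon_{ij}$$
$\tau_i = \mu_i - \mu_\cdot$ = "Effect" of the $i$th factor level
$\varepsilon_{ij} \sim NID(0, \sigma^2)$
Defining $\mu_\cdot$:
Unweighted Mean: $\mu_\cdot = \dfrac{\sum_{i=1}^{r} \mu_i}{r} \;\Rightarrow\; \sum_{i=1}^{r} \tau_i = 0$
Weighted Mean: $\mu_\cdot = \sum_{i=1}^{r} w_i\mu_i$ s.t. $\sum_{i=1}^{r} w_i = 1 \;\Rightarrow\; \sum_{i=1}^{r} w_i\tau_i = 0$
Weights may represent the population sizes in observational studies
Note: $\mu_1 = \cdots = \mu_r \;\Leftrightarrow\; \tau_1 = \cdots = \tau_r = 0$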
Regression Approach – Factor Effects Model
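Suppose $r = 3$ and $n_1 = n_2 = n_3 = 2$ and Unweighted Mean Model: $\tau_1 + \tau_2 + \tau_3 = 0 \;\Rightarrow\; \tau_3 = -\tau_1 - \tau_2$
$$
\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix}
\qquad
\mathbf{X} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \end{bmatrix}
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \mu_\cdot \\ \tau_1 \\ \tau_2 \end{bmatrix}
\qquad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}
$$
$$
E\{\mathbf{Y}\} = \begin{bmatrix} E\{Y_{11}\} \\ E\{Y_{12}\} \\ E\{Y_{21}\} \\ E\{Y_{22}\} \\ E\{Y_{31}\} \\ E\{Y_{32}\} \end{bmatrix}
= \mathbf{X}\boldsymbol{\beta}
= \begin{bmatrix} \mu_\cdot + \tau_1 \\ \mu_\cdot + \tau_1 \\ \mu_\cdot + \tau_2 \\ \mu_\cdot + \tau_2 \\ \mu_\cdot - \tau_1 - \tau_2 \\ \mu_\cdot - \tau_1 - \tau_2 \end{bmatrix}
= \begin{bmatrix} \mu_\cdot + \tau_1 \\ \mu_\cdot + \tau_1 \\ \mu_\cdot + \tau_2 \\ \mu_\cdot + \tau_2 \\ \mu_\cdot + \tau_3 \\ \mu_\cdot + \tau_3 \end{bmatrix}
$$
$$
\mathbf{X'X} = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 4 \end{bmatrix}
\qquad
\mathbf{X'Y} = \begin{bmatrix} Y_{11} + Y_{12} + Y_{21} + Y_{22} + Y_{31} + Y_{32} \\ Y_{11} + Y_{12} - Y_{31} - Y_{32} \\ Y_{21} + Y_{22} - Y_{31} - Y_{32} \end{bmatrix}
$$
$$
\hat{\boldsymbol{\beta}} = (\mathbf{X'X})^{-1}\mathbf{X'Y}
= \begin{bmatrix} 1/6 & 0 & 0 \\ 0 & 1/3 & -1/6 \\ 0 & -1/6 & 1/3 \end{bmatrix}
\begin{bmatrix} Y_{11} + Y_{12} + Y_{21} + Y_{22} + Y_{31} + Y_{32} \\ Y_{11} + Y_{12} - Y_{31} - Y_{32} \\ Y_{21} + Y_{22} - Y_{31} - Y_{32} \end{bmatrix}
= \begin{bmatrix} \bar{Y}_{\cdot\cdot} \\ \bar{Y}_{1\cdot} - \bar{Y}_{\cdot\cdot} \\ \bar{Y}_{2\cdot} - \bar{Y}_{\cdot\cdot} \end{bmatrix}
= \begin{bmatrix} \hat{\mu}_\cdot \\ \hat{\tau}_1 \\ \hat{\tau}_2 \end{bmatrix}
$$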
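The factor effects regression for the $r = 3$, $n_i = 2$ layout can be reproduced by forming the normal equations and solving them. The sketch below uses hypothetical responses and a small hand-written Gauss-Jordan solver (the helper `solve` is illustrative, not part of any library).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]       # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(m[k][c]))
        m[c], m[p] = m[p], m[c]                        # bring the pivot row up
        m[c] = [v / m[c][c] for v in m[c]]             # scale the pivot to 1
        for k in range(n):
            if k != c:                                 # clear column c elsewhere
                m[k] = [vk - m[k][c] * vc for vk, vc in zip(m[k], m[c])]
    return [row[n] for row in m]

y = [10.0, 14.0, 15.0, 17.0, 18.0, 22.0]               # hypothetical responses
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1], [1, -1, -1], [1, -1, -1]]

xtx = [[float(sum(row[a] * row[b] for row in X)) for b in range(3)] for a in range(3)]
xty = [sum(row[a] * yk for row, yk in zip(X, y)) for a in range(3)]

mu_dot_hat, tau1_hat, tau2_hat = solve(xtx, xty)
tau3_hat = -tau1_hat - tau2_hat                        # from the sum-to-zero constraint
print(xtx)
print(mu_dot_hat, tau1_hat, tau2_hat, tau3_hat)
```

With these data the level means are 12, 16, and 20 and the overall mean is 16, so the estimated effects are the level means minus the overall mean.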
Factor Effects Model with Weighted Mean
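Weights are relative sample sizes: $w_i = \dfrac{n_i}{n_T}$
$$\sum_{i=1}^{r} w_i\tau_i = 0 \;\Rightarrow\; \sum_{i=1}^{r} \frac{n_i}{n_T}\tau_i = 0 \;\Rightarrow\; \sum_{i=1}^{r} n_i\tau_i = 0$$
$$\Rightarrow\; n_r\tau_r = -\sum_{i=1}^{r-1} n_i\tau_i \;\Rightarrow\; \tau_r = -\sum_{i=1}^{r-1} \frac{n_i}{n_r}\tau_i$$
$$Y_{ij} = \mu_\cdot + \tau_1 X_{ij1} + \cdots + \tau_{r-1} X_{ij,r-1} + \varepsilon_{ij}$$
$$X_{ij1} = \begin{cases} 1 & \text{if } i = 1 \\ -\dfrac{n_1}{n_r} & \text{if } i = r \\ 0 & \text{otherwise} \end{cases}
\qquad \cdots \qquad
X_{ij,r-1} = \begin{cases} 1 & \text{if } i = r - 1 \\ -\dfrac{n_{r-1}}{n_r} & \text{if } i = r \\ 0 & \text{otherwise} \end{cases}$$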
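To see how the weighted coding works with unequal sample sizes, the sketch below builds the coded design row for each level and checks two things: the effects $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$ satisfy $\sum n_i\tau_i = 0$, and the coded rows reproduce the level means. The data are hypothetical, and the effects are plugged in directly rather than obtained by running a regression.

```python
groups = [[10.0, 12.0, 14.0], [15.0, 17.0], [21.0]]   # hypothetical, unbalanced
n = [len(g) for g in groups]
r, n_T = len(groups), sum(n)

def coded_row(i):
    """Design row (1, X_1, ..., X_{r-1}) for a case in level i (0-based)."""
    if i < r - 1:
        return [1.0] + [1.0 if k == i else 0.0 for k in range(r - 1)]
    # last level: X_k = -n_k / n_r for every k
    return [1.0] + [-n[k] / n[r - 1] for k in range(r - 1)]

level_means = [sum(g) / len(g) for g in groups]
mu_dot = sum(n_i * m for n_i, m in zip(n, level_means)) / n_T   # weighted mean
tau = [m - mu_dot for m in level_means]

beta = [mu_dot] + tau[:r - 1]        # (mu., tau_1, ..., tau_{r-1})
fitted = [sum(x * b for x, b in zip(coded_row(i), beta)) for i in range(r)]
constraint = sum(n_i * t for n_i, t in zip(n, tau))

print(coded_row(r - 1))   # coded row for the last level
print(fitted)             # matches the level means up to rounding
print(constraint)         # zero up to rounding
```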
Regression for Cell Means Model
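$$Y_{ij} = \mu_i + \varepsilon_{ij} = \mu_1 X_{ij1} + \cdots + \mu_r X_{ijr} + \varepsilon_{ij}$$
$$X_1 = \begin{cases} 1 & \text{if } i = 1 \\ 0 & \text{if } i \ne 1 \end{cases}
\qquad \cdots \qquad
X_r = \begin{cases} 1 & \text{if } i = r \\ 0 & \text{if } i \ne r \end{cases}$$
$$\boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_r \end{bmatrix}
\qquad
\hat{\boldsymbol{\beta}} = \begin{bmatrix} \bar{Y}_{1\cdot} \\ \vdots \\ \bar{Y}_{r\cdot} \end{bmatrix}$$
When fitting with a regression package, no intercept is used
Under $H_0: \mu_1 = \cdots = \mu_r = \mu_c$:
$$\mathbf{X} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \qquad \boldsymbol{\beta} = \mu_c \qquad \hat{\boldsymbol{\beta}} = \bar{Y}_{\cdot\cdot}$$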
Randomization (aka Permutation) Tests
• Treats the units in the study as a finite population of units, each with a fixed error term $\varepsilon_{ij}$
• When the randomization procedure assigns the unit to treatment $i$, we observe $Y_{ij} = \mu_\cdot + \tau_i + \varepsilon_{ij}$
• When there are no treatment effects (all $\tau_i = 0$), $Y_{ij} = \mu_\cdot + \varepsilon_{ij}$
• We can compute a test statistic, such as $F^*$, under all (or in practice, many) potential treatment arrangements of the observed units (responses)
• The p-value is the proportion of these arrangements whose test statistic is as or more extreme than the one from the original arrangement
• Total number of potential permutations = $\dfrac{n_T!}{n_1! \cdots n_r!}$
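For a very small study the randomization distribution can be enumerated exactly. The sketch below does this for hypothetical data with $n_1 = n_2 = n_3 = 2$, using `itertools.permutations` from the Python standard library. For larger studies one would sample random arrangements instead of enumerating them all.

```python
from itertools import permutations
from math import factorial, prod

y = [10.0, 14.0, 15.0, 17.0, 18.0, 22.0]   # hypothetical responses, in original order
sizes = [2, 2, 2]                           # n_1, n_2, n_3

def f_stat(values):
    """F* = MSTR / MSE, with values split into consecutive groups of the given sizes."""
    groups, start = [], 0
    for n_i in sizes:
        groups.append(values[start:start + n_i])
        start += n_i
    n_T, r = len(values), len(sizes)
    grand = sum(values) / n_T
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (sstr / (r - 1)) / (sse / (n_T - r))

f_obs = f_stat(y)

# All orderings of the responses. Each distinct treatment arrangement appears
# n_1! n_2! n_3! times, so the proportion below is unaffected by the repeats.
f_all = [f_stat(list(p)) for p in permutations(y)]
p_value = sum(f >= f_obs - 1e-12 for f in f_all) / len(f_all)

n_arrangements = factorial(len(y)) // prod(factorial(n_i) for n_i in sizes)
print(f_obs, n_arrangements, p_value)
```

The small tolerance in the comparison guards against floating-point ties between arrangements that give the same $F^*$.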
Power Approach to Sample Size Choice - Tables
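When the means are not all equal, the $F$-statistic is non-central $F$:
$$F^* \sim F(r - 1,\ n_T - r,\ \phi) \quad\text{where}\quad \phi = \frac{1}{\sigma}\sqrt{\frac{\sum_{i=1}^{r} n_i(\mu_i - \mu_\cdot)^2}{r}} \quad\text{where}\quad \mu_\cdot = \frac{\sum_{i=1}^{r} n_i\mu_i}{n_T}$$
When all sample sizes are equal:
$$\phi = \frac{1}{\sigma}\sqrt{\frac{n}{r}\sum_{i=1}^{r} (\mu_i - \mu_\cdot)^2} \quad\text{where}\quad \mu_\cdot = \frac{\sum_{i=1}^{r} \mu_i}{r}$$
The power of the test, when conducted at the significance level of $\alpha$:
$$\text{Power} = \Pr\{F^* \ge F(1 - \alpha;\ r - 1,\ n_T - r) \mid \phi\}$$
See Table B.11
Choose sample sizes so that the power is sufficiently high for specific
means $\mu_1,\ldots,\mu_r$ or effects levels of interest $\tau_1,\ldots,\tau_r$
Table B.12 is simple to use for equal sample sizes and
$\Delta = \max(\mu_i) - \min(\mu_i)$, the range of the mean levels of interest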
Power Approach to Sample Size Choice – R Code
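When the means are not all equal, the $F$-statistic is non-central $F$:
$$F^* \sim F(r - 1,\ n_T - r,\ \delta) \quad\text{where}\quad \delta = \frac{\sum_{i=1}^{r} n_i(\mu_i - \mu_\cdot)^2}{\sigma^2} \quad\text{where}\quad \mu_\cdot = \frac{\sum_{i=1}^{r} n_i\mu_i}{n_T}$$
When all sample sizes are equal:
$$\delta = \frac{n\sum_{i=1}^{r} (\mu_i - \mu_\cdot)^2}{\sigma^2} \quad\text{where}\quad \mu_\cdot = \frac{\sum_{i=1}^{r} \mu_i}{r}$$
The power of the test, when conducted at the significance level of $\alpha$:
$$\text{Power} = \Pr\{F^* \ge F(1 - \alpha;\ r - 1,\ n_T - r) \mid F^* \sim F(r - 1,\ n_T - r,\ \delta)\}$$
In R:
$F(1 - \alpha;\ r - 1,\ n_T - r)$ = qf(1 - alpha, r - 1, nT - r)
Power = $1 - \beta$ = 1 - pf(qf(1 - alpha, r - 1, nT - r), r - 1, nT - r, ncp = delta)
where $\beta$ here denotes the Type II error probability and $\tau_i = \mu_i - \mu_\cdot$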
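The two noncentrality quantities are easy to confuse. The table-based approach uses $\phi = \frac{1}{\sigma}\sqrt{\sum n_i(\mu_i - \mu_\cdot)^2 / r}$, while R's `pf()` takes the noncentrality parameter `ncp` $= \sum n_i(\mu_i - \mu_\cdot)^2 / \sigma^2$. The sketch below computes both for hypothetical means, sample sizes, and $\sigma$, and checks that `ncp` $= r\phi^2$. It does not compute the power itself, which needs a noncentral $F$ distribution function.

```python
from math import sqrt

mu = [12.0, 16.0, 20.0]   # hypothetical level means of interest
n = [2, 2, 2]             # planned replicates per level
sigma = 2.0               # hypothetical error standard deviation

r, n_T = len(mu), sum(n)
mu_dot = sum(n_i * m for n_i, m in zip(n, mu)) / n_T              # weighted mean
ss_means = sum(n_i * (m - mu_dot) ** 2 for n_i, m in zip(n, mu))

phi = sqrt(ss_means / r) / sigma   # parameter used with the power tables
ncp = ss_means / sigma ** 2        # value to pass as ncp to R's pf()
print(phi, ncp, r * phi ** 2)
```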
Power Approach to Finding “Best” Treatment
Goal: Determining the best treatment (one with highest or lowest mean):
$1 - \alpha$ = Probability the treatment with highest (lowest) sample mean
has highest (lowest) population mean
$\lambda$ = Difference between highest (lowest) mean and 2nd highest (lowest) mean
$r$ = Number of treatments
Table B.13 gives $\dfrac{\lambda\sqrt{n}}{\sigma}$ for various $r$, $1 - \alpha$
Solve for $n$ for given $\lambda$, $\sigma$