Transformations - Oregon State University

Fixed vs. Random Effects
 Fixed effect
– we are interested in the effects of the treatments (or blocks) per se
– if the experiment were repeated, the levels would be the same
– conclusions apply to the treatment (or block) levels that were tested
– treatment (or block) effects sum to zero: $\sum_i \tau_i = 0$
 Random effect
– represents a sample from a larger reference population
– the specific levels used are not of particular interest
– conclusions apply to the reference population
• inference space may be broad (all possible random effects) or narrow (just the random effects in the experiment)
– goal is generally to estimate the variance among treatments (or other groups), $\sigma^2_T$

 Need to know which effects are fixed or random to determine appropriate F tests in ANOVA
Fixed or Random?
 lambs born from common parents (same ram and ewe) are given different formulations of a vitamin supplement
 comparison of new herbicides for potential licensing
 comparison of herbicides used in different decades (1980s, 1990s, 2000s)
 nitrogen fertilizer treatments at rates of 0, 50, 100, and 150 kg N/ha
 years of evaluation of new canola varieties (2008, 2009, 2010)
 location of a crop rotation experiment that is conducted on three farmers’ fields in the Willamette Valley (Junction City, Albany, Woodburn)
 species of trees in an old growth forest
Fixed and random models for the CRD
$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$

Fixed Model (Model I)
Source       df        Expected Mean Square
Treatment    t - 1     $\sigma^2_e + r\theta^2_T$
Error        tr - t    $\sigma^2_e$
where $\theta^2_T = \sum_i \tau_i^2 / (t - 1)$ is the variance among fixed treatment effects

Random Model (Model II)
Source       df        Expected Mean Square
Treatment    t - 1     $\sigma^2_e + r\sigma^2_T$
Error        tr - t    $\sigma^2_e$
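The expected mean squares above imply two routine calculations: the F ratio MS(Treatment) / MS(Error), and, under the random model, an estimate of $\sigma^2_T$ obtained by equating the mean squares to their expectations. The following is a minimal Python sketch of that arithmetic. The function name and the numeric mean squares are hypothetical, chosen only for illustration.

```python
def crd_f_and_variance_component(ms_treatment, ms_error, r):
    """F ratio and method-of-moments estimate of sigma^2_T for a CRD.

    Under the random model E(MS_treatment) = sigma^2_e + r * sigma^2_T and
    E(MS_error) = sigma^2_e, so (MS_treatment - MS_error) / r estimates sigma^2_T.
    """
    f_ratio = ms_treatment / ms_error
    sigma2_t_hat = (ms_treatment - ms_error) / r
    return f_ratio, sigma2_t_hat


# Hypothetical mean squares for illustration only (r = 4 replicates).
f_ratio, sigma2_t_hat = crd_f_and_variance_component(50.0, 10.0, r=4)
print(f"F = {f_ratio:.2f}, estimated sigma^2_T = {sigma2_t_hat:.2f}")
```

Note that this estimate can come out negative when MS(Treatment) is smaller than MS(Error); the sketch does not truncate it.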
Models for the RBD
$Y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij}$

Fixed Model
Source       df                Expected Mean Square
Block        r - 1             $\sigma^2_e + t\theta^2_B$
Treatment    t - 1             $\sigma^2_e + r\theta^2_T$
Error        (r - 1)(t - 1)    $\sigma^2_e$

Random Model
Source       df                Expected Mean Square
Block        r - 1             $\sigma^2_e + t\sigma^2_B$
Treatment    t - 1             $\sigma^2_e + r\sigma^2_T$
Error        (r - 1)(t - 1)    $\sigma^2_e$

Mixed Model (treatments fixed, blocks random)
Source       df                Expected Mean Square
Block        r - 1             $\sigma^2_e + t\sigma^2_B$
Treatment    t - 1             $\sigma^2_e + r\theta^2_T$
Error        (r - 1)(t - 1)    $\sigma^2_e$

where $\theta^2_T = \sum_j \tau_j^2 / (t - 1)$ and $\theta^2_B = \sum_i \beta_i^2 / (r - 1)$
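One consequence of the RBD expected mean squares can be written out as a short worked statement. In each of the three models, the treatment expected mean square exceeds the error expected mean square only by r times a treatment term, so treatments are tested against error in every case. The symbols $\theta^2_T$ (variance among fixed treatment effects) and $\sigma^2_T$ (variance of random treatment effects) are the notation assumed here, and the block uses `\text` and `cases` from the `amsmath` package.

```latex
\[
E(MS_{\text{Treatment}}) - E(MS_{\text{Error}}) =
\begin{cases}
r\,\theta^2_T & \text{treatments fixed} \\
r\,\sigma^2_T & \text{treatments random}
\end{cases}
\qquad\Longrightarrow\qquad
F = \frac{MS_{\text{Treatment}}}{MS_{\text{Error}}}
\]
```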
RBD Mixed Model Analyses with SAS
Distribution                         Treatments Fixed, Blocks Fixed                  Treatments Fixed, Blocks Random
Normal (continuous)                  Linear Model (LM): PROC GLM                     Linear Mixed Model (LMM): PROC MIXED
Non-normal (categories or counts)    Generalized Linear Model (GLM): PROC GENMOD     Generalized Linear Mixed Model (GLMM): PROC GLIMMIX
 Mixed Models - contain both random and fixed effects
 Note that PROC GLM fits only the Linear Model (LM); despite its name, it does not fit Generalized Linear Models
 PROC GLIMMIX can handle all of the situations above
Generalized Linear Models
 An alternative to data transformations
 Principle is to make the model fit the data, rather
than changing the data to fit the model
 Models include link functions that allow
heterogeneous variances and nonlinearity
 Analysis and estimation are based on maximum
likelihood methods
 Becoming more widely used - recommended by the
experts
 Need some understanding of the underlying theory
to implement properly
Notes adapted from ASA GLMM Workshop, Long Beach, CA, 2010
Generalized Linear Models
ANOVA/Regression model is fit to a non-normal data set
Three elements:
1. Random component – a probability distribution for $Y_i$ from the exponential family of distributions (this is known)
2. Systematic component – represents the linear predictors (X variables) in the model
$\eta_i = \eta + \tau_i$
Form is mean + trt effect
No error term
3. Link function – links the random and systematic elements
$\eta_i = g(\mu_i)$
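The three elements can be illustrated with a small numerical sketch. It assumes a Poisson random component with a log link, so the inverse link is the exponential function. The treatment labels, function names, and values are hypothetical and are set directly on the link scale; nothing here is fitted from data.

```python
import math


def linear_predictor(eta, tau_i):
    """Systematic component: overall term plus treatment effect, no error term."""
    return eta + tau_i


def inverse_log_link(eta_i):
    """Inverse of the log link g(mu) = log(mu): maps the linear predictor to a mean."""
    return math.exp(eta_i)


# Hypothetical values on the link (log) scale, for illustration only.
eta = math.log(10.0)
treatment_effects = {"A": 0.0, "B": math.log(2.0)}

means = {
    name: inverse_log_link(linear_predictor(eta, tau_i))
    for name, tau_i in treatment_effects.items()
}
print(means)
```

Because the effects are additive on the log scale, they act multiplicatively on the mean: treatment B's mean is twice treatment A's.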
Log of Distribution = “Log-Likelihood”
 Binary responses (0 or 1)
 Probability of success follows a binomial distribution
$$\frac{N!}{Y!\,(N-Y)!}\,P^{Y}(1-P)^{N-Y} = \binom{N}{Y} P^{Y}(1-P)^{N-Y}$$
$$\log\left[\binom{N}{Y} P^{Y}(1-P)^{N-Y}\right] = Y \log\left(\frac{P}{1-P}\right) + N \log(1-P) + \log\binom{N}{Y}$$
The term $Y \log\left(\frac{P}{1-P}\right)$ takes the form Y * function of P; $\log\left(\frac{P}{1-P}\right)$ is the “canonical parameter”
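The rearrangement of the binomial log-likelihood can be checked numerically. The sketch below evaluates the log of the binomial probability directly and again in the form Y * (canonical parameter) + N log(1 - P) + log of the binomial coefficient. The function names and the values of Y, N, and P are hypothetical, picked only to exercise the identity.

```python
import math


def binomial_loglik_direct(y, n, p):
    """Log of the binomial probability, computed term by term."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)


def binomial_loglik_canonical(y, n, p):
    """Same quantity written as Y * log(P / (1 - P)) + N log(1 - P) + log C(N, Y)."""
    canonical_parameter = math.log(p / (1.0 - p))
    return y * canonical_parameter + n * math.log(1.0 - p) + math.log(math.comb(n, y))


# Hypothetical values for illustration only.
y, n, p = 3, 10, 0.25
direct = binomial_loglik_direct(y, n, p)
canonical = binomial_loglik_canonical(y, n, p)
print(direct, canonical)
```

The two printed values agree up to floating-point rounding.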
Example – logit link
link: $\eta = \log\left(\frac{\mu}{1-\mu}\right)$
µ can only vary from 0 to 1
$\eta$ can take on any value
Use an inverse function to convert means to the original scale
$\mu = \frac{e^{\eta}}{1 + e^{\eta}}$
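The logit link and its inverse can be sketched in a few lines of Python. The function names are hypothetical. The inverse is written in two algebraically equivalent branches, an implementation choice made here rather than something taken from the slides, so that large positive values of the linear predictor do not overflow.

```python
import math


def logit(mu):
    """Logit link: maps a mean in (0, 1) to the whole real line."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    return math.log(mu / (1.0 - mu))


def inverse_logit(eta):
    """Inverse link e^eta / (1 + e^eta): maps any real eta back into (0, 1)."""
    if eta >= 0:
        # Equivalent form 1 / (1 + e^-eta) avoids overflow for large eta.
        return 1.0 / (1.0 + math.exp(-eta))
    exp_eta = math.exp(eta)
    return exp_eta / (1.0 + exp_eta)


# Round trip: link scale and back to the original scale.
for mu in (0.05, 0.5, 0.9):
    print(mu, logit(mu), inverse_logit(logit(mu)))
```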
Some Common Distributions & Link(s)
Distribution    Variable Type          Mean     Variance            Common Link(s)
Normal          Continuous             $\mu$    $\sigma^2$          Identity: $\eta = \mu$
Binomial        Discrete proportion    $\mu$    $\mu(1-\mu)/N$      logit, probit
Poisson         Discrete count         $\mu$    $\mu$               log: $\eta = \log(\mu)$
Exponential     Continuous             $\mu$    $\mu^2$             $\log(\mu)$, $1/\mu$
Linear Models for an RBD in SAS
 Treatments fixed, Blocks fixed
– PROC GLM (normal) or PROC GENMOD (non-normal)
– all effects appear in model statement
Model Response = Block Treatment;
 Treatments fixed, Blocks random
– PROC MIXED (normal) or PROC GLIMMIX (non-normal)
– Only fixed effects appear in model statement
Model Response = Treatment;
Random Block;
GLIMMIX basic syntax for an RBD
proc glimmix;
  class treatment block;
  /* fixed effects; the s (solution) option prints the parameter estimates */
  model response = treatment / link=log s dist=poisson;
  /* random effects */
  random block;
  lsmeans treatment / ilink diff;
run;
 fixed effects go in the model statement
 random effects go in the random statement
 default means and standard errors from the lsmeans statement are on the log scale
 ilink option gives back-transformed means and estimated standard errors on the original scale
 diff option requests significance tests between all possible pairs of treatments in the trial
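The back-transformation requested by the ilink option can be sketched by hand for the log link. The mean on the original scale is the exponential of the log-scale estimate, and an approximate standard error follows from the delta method (derivative of the inverse link times the log-scale standard error). This is an illustration in Python of that arithmetic, not a reproduction of SAS output. The function name and the input values are hypothetical.

```python
import math


def back_transform_log(estimate, std_err):
    """Back-transform a log-scale mean and its standard error.

    The mean uses the inverse link exp(); the standard error uses the delta
    method, where d/d(eta) exp(eta) = exp(eta).
    """
    mean_original = math.exp(estimate)
    se_original = mean_original * std_err
    return mean_original, se_original


# Hypothetical log-scale estimate and standard error, for illustration only.
mean_original, se_original = back_transform_log(math.log(20.0), 0.1)
print(f"mean = {mean_original:.3f}, approximate SE = {se_original:.3f}")
```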
Estimation in LMM, GLM, and GLMM
 Does not use Least Squares estimation
 Does not calculate Sums of Squares or Mean Squares
 Estimates are by Maximum Likelihood
Output includes
 Source of variation
 degrees of freedom
 F tests and p-values
 Treatment means and standard errors
 Comparisons of means and standard errors
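To make "estimates are by Maximum Likelihood" concrete, here is a toy sketch for the simplest possible case: a Poisson model with a log link and a single overall mean. It maximizes the log-likelihood by Newton-Raphson on the log scale. This only illustrates the idea and is not a description of what the SAS procedures do internally. The function name and the counts are hypothetical.

```python
import math


def poisson_mle_log_scale(counts, tol=1e-10, max_iter=100):
    """Maximum likelihood estimate of eta, where mu = exp(eta), for Poisson counts."""
    total = sum(counts)
    n = len(counts)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive count")
    # Start at or above the solution so the Newton steps decrease toward it.
    eta = math.log(max(counts))
    for _ in range(max_iter):
        mu = math.exp(eta)
        score = total - n * mu      # first derivative of the log-likelihood
        information = n * mu        # minus the second derivative
        step = score / information
        eta += step
        if abs(step) < tol:
            break
    return eta


# Hypothetical counts for illustration only.
counts = [2, 4, 6]
eta_hat = poisson_mle_log_scale(counts)
print(eta_hat, math.exp(eta_hat))
```

For this one-parameter model the maximum has a closed form, the log of the sample mean, which gives a check on the iteration.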