Statistics - University of Delaware

Statistics
April 16, 2009
Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D.
Nemours Bioinformatics Core Facility
Nemours Biomedical Research
Experimental Design Terminology
• An Experimental Unit is the entity on which a measurement or an observation is made. For example, subjects are the experimental units in most clinical studies.
• Homogeneous Experimental Units: Units that are as uniform as possible on all characteristics that could affect the response.
• Randomization is the process of assigning experimental units randomly to different experimental groups.
• Replication is the repetition of an entire experiment or a portion of an experiment under two or more sets of conditions.
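Randomization is easy to carry out in R. The following is a minimal sketch, assuming 60 hypothetical subjects and three equally sized groups; the subject ids, group labels, and seed are illustrative and not taken from the course data.

```r
# Hypothetical example: randomize 60 subjects to 3 groups of 20.
set.seed(2009)                          # arbitrary seed, only for reproducibility
subject_id <- 1:60                      # hypothetical experimental units
labels <- rep(c("A", "B", "placebo"), each = 20)

assignment <- data.frame(
  subject = subject_id,
  group   = factor(sample(labels))      # random permutation of the group labels
)

table(assignment$group)                 # each group receives 20 subjects
```

Permuting a fixed set of labels, rather than drawing a label independently for each subject, keeps the group sizes equal.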
Experimental Design Terminology
• A Factor is a controllable independent variable that is being investigated to determine its effect on a response. E.g., treatment group is a factor.
• Factors can be fixed or random:
– Fixed -- the factor can take on a discrete number of values and these are the only values of interest.
– Random -- the factor can take on a wide range of values and one wants to generalize from specific values to all possible values.
• Each specific value of a factor is called a level. E.g., if the treatment groups are A, B, and placebo, then the factor has three levels.
Experimental Design Terminology
• Effect is the change in the average response between two factor levels.
• That is, factor effect = (average response at one level) - (average response at a second level).
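As a small worked example, the effect can be computed from the three group means of PLUC.post reported in the one-way ANOVA output of these slides. Treating group 1 as the reference level is an assumption made here for illustration.

```r
# Group means of PLUC.post as reported in the one-way ANOVA output
group_means <- c(g1 = 10.15657, g2 = 12.15339, g3 = 14.12242)

# Effect = average response at one level minus average response at another
effect_2_vs_1 <- unname(group_means["g2"] - group_means["g1"])
effect_3_vs_1 <- unname(group_means["g3"] - group_means["g1"])

round(c(effect_2_vs_1, effect_3_vs_1), 4)
```

These differences agree with the estimates shown for the contrasts "2 - 1" and "3 - 1" in the Tukey output.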
Experimental Design Terminology
• Interaction is a joint factor effect in which the effect of one factor depends on the levels of the other factors.
[Figure: two panels, "No interaction effect of factor A and B" and "Interaction effect of factor A and B"]
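One way to see an interaction numerically is to compare the effect of factor A at each level of factor B. The sketch below uses hypothetical cell means (the numbers and level names are invented for illustration): when the effect of A is the same at every level of B there is no interaction.

```r
# Hypothetical cell means: rows are levels of A, columns are levels of B
no_interaction <- matrix(c(10, 14, 12, 16), nrow = 2,
                         dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))
with_interaction <- matrix(c(10, 14, 12, 20), nrow = 2,
                           dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))

# Effect of A (a2 minus a1) computed separately at each level of B
effect_of_A <- function(m) m["a2", ] - m["a1", ]

effect_of_A(no_interaction)     # same effect at b1 and b2: no interaction
effect_of_A(with_interaction)   # effect differs between b1 and b2: interaction
```

In a plot of cell means, the first case gives parallel lines and the second gives non-parallel lines.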
Analysis of Variance (ANOVA)
• The analysis of variance (ANOVA) is a technique for decomposing the total variability of a response variable into:
– Variability due to the experimental factor(s), and
– Variability due to error (i.e., factors that are not accounted for in the experimental design).
• The basic purpose of ANOVA is to test the equality of several means.
• A fixed effect model includes only fixed factors in the model.
• A random effect model includes only random factors in the model.
• A mixed effect model includes both fixed and random factors in the model.
The basic ANOVA situation
• Type of variables: a quantitative response and categorical (factor) predictors (independent variables).
• Main question: Are the mean responses of the different groups equal?
• One categorical variable with only 2 levels (groups):
– 2-sample t-test
• One categorical variable with more than two levels (groups):
– One-way ANOVA
• Two or more categorical variables, each with at least two levels (groups):
– Factorial ANOVA
Graphical Investigation
Graphical investigation:
• side-by-side box plots
• multiple histograms
[Figure: Side by Side Boxplots]
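Both kinds of plot can be drawn in base R. This is a minimal sketch on simulated data: the data frame sim and its values are assumptions for illustration, with column names chosen to mirror the grp and PLUC.post variables used in the demos.

```r
# Simulated stand-in for the course data: 3 groups of 20 observations
set.seed(1)
sim <- data.frame(
  grp       = factor(rep(1:3, each = 20)),
  PLUC.post = rnorm(60, mean = rep(c(10, 12, 14), each = 20), sd = 2)
)

# Side-by-side box plots: one box per group on a common scale
boxplot(PLUC.post ~ grp, data = sim, xlab = "grp", ylab = "PLUC.post")

# Multiple histograms: one panel per group, sharing the same x-axis range
old_par <- par(mfrow = c(1, 3))
for (g in levels(sim$grp)) {
  hist(sim$PLUC.post[sim$grp == g], main = paste("grp", g),
       xlab = "PLUC.post", xlim = range(sim$PLUC.post))
}
par(old_par)
```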
One-way analysis of Variance
• One factor with k levels or groups. E.g., the 3 treatment groups in our default data.
• The total variation of the observations (SST) can be split into two components: variation between groups (SSG) and variation within groups (SSE).
• Variation between groups is due to differences among the groups, e.g., different treatment groups or different doses of the same treatment.
• Variation within groups is the inherent variation among the observations within each group.
• The completely randomized design (CRD) is an example of a design analyzed with one-way analysis of variance.
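The decomposition SST = SSG + SSE can be checked directly. The sketch below computes the three sums of squares by hand on simulated data (the values are simulated here and are not the course data).

```r
# Simulated response for 3 groups of 20 observations
set.seed(1)
y <- rnorm(60, mean = rep(c(10, 12, 14), each = 20), sd = 2)
g <- factor(rep(1:3, each = 20))

grand_mean <- mean(y)
group_mean <- ave(y, g)                    # each observation's own group mean

SST <- sum((y - grand_mean)^2)             # total variation
SSG <- sum((group_mean - grand_mean)^2)    # variation between groups
SSE <- sum((y - group_mean)^2)             # variation within groups

all.equal(SST, SSG + SSE)                  # the two components add up to the total
```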
One-way analysis of Variance
• Model:
$y_{ij} = \mu + \alpha_i + e_{ij}$
– $y_{ij}$ is the jth observation of the ith group
– $\alpha_i$ is the effect of the ith group
– $\mu$ is the grand mean
– $e_{ij}$ is the error
• Assumptions:
– Observations $y_{ij}$ are independent.
– The errors $e_{ij}$ are normally distributed with mean zero and constant standard deviation.
– The second assumption implies that the response variable within each group is normal (check using a q-q plot, histogram, or test for normality) and that the standard deviations of all groups are equal (rule of thumb: the ratio of the largest to the smallest standard deviation should not exceed about 2:1).
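The checks named above can be sketched in R. The standard deviations are the three group values reported in the numSummary output of these slides; the residuals are simulated stand-ins, since the raw data are not reproduced here.

```r
# Group standard deviations of PLUC.post as reported in the numSummary output
group_sd <- c(2.227334, 1.874268, 2.011497)

# Rule of thumb: largest sd should be no more than about twice the smallest
sd_ratio <- max(group_sd) / min(group_sd)
sd_ratio

# Normality checks, shown on simulated residuals (an assumption for illustration;
# in practice use residuals(Model1) from the fitted aov model)
set.seed(1)
res <- rnorm(60)
qqnorm(res); qqline(res)     # points close to the line suggest normality
shapiro.test(res)            # formal test for normality
```

With a ratio of about 1.19, the equal standard deviation assumption looks reasonable for these groups.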
One-way analysis of Variance
• Hypothesis:
Ho: Means of all groups are equal.
Ha: At least one group mean differs from the others.
– Ha doesn’t say how or which ones differ.
– Can follow up with “multiple comparisons”.
• ANOVA table for one-way classified data (k groups, n observations in total):
Sources of Variation | Sum of Squares | df  | Mean Sum of Squares | F-Ratio
Group                | SSG            | k-1 | MSG = SSG/(k-1)     | F = MSG/MSE
Error                | SSE            | n-k | MSE = SSE/(n-k)     |
Total                | SST            | n-1 |                     |
Note: A large F means that MSG is large compared to MSE.
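The table entries can be reproduced by hand. The sketch below uses the sums of squares and degrees of freedom reported in the one-way ANOVA output of these slides (k = 3 groups, n = 60 observations) and obtains the P-value from the F distribution.

```r
# Sums of squares from the reported one-way ANOVA output
SSG <- 157.282
SSE <- 237.880
k <- 3        # number of groups
n <- 60       # total number of observations

MSG <- SSG / (k - 1)          # mean square between groups
MSE <- SSE / (n - k)          # mean square error
F_ratio <- MSG / MSE

# Upper-tail probability of the F distribution with (k-1, n-k) df
p_value <- pf(F_ratio, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

c(F = F_ratio, p = p_value)
```

The result agrees, up to rounding of the inputs, with the F value of 18.844 shown in the output.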
Summary One-way ANOVA
• Significance of the differences between the groups depends on:
– the difference in the means
– the standard deviations of each group
– the sample sizes
• A useful website (thanks to Bette for sending it to the class): www.psych.utah.edu/stat/introstats/anovaflash.html
• ANOVA determines the P-value from the F statistic.
Multiple comparisons
• If the F test in the ANOVA table is significant, we then want to find which pairs of groups are significantly different. The following procedures are commonly used:
– Fisher’s Least Significant Difference (LSD)
– Tukey’s HSD method
– Bonferroni’s method
– Scheffe’s method
– Dunn’s multiple-comparison procedure
– Dunnett’s procedure
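Two of these procedures are available in base R without extra packages. The sketch below runs Tukey's HSD and Bonferroni-adjusted pairwise t-tests on simulated data; the data frame sim is an assumption for illustration and is not the course data.

```r
# Simulated stand-in data: 3 groups of 20 observations
set.seed(1)
sim <- data.frame(
  grp = factor(rep(1:3, each = 20)),
  y   = rnorm(60, mean = rep(c(10, 12, 14), each = 20), sd = 2)
)

fit <- aov(y ~ grp, data = sim)

# Tukey's HSD: simultaneous confidence intervals for all pairwise differences
tukey <- TukeyHSD(fit, "grp")
tukey

# Bonferroni: pairwise t-tests with p-values multiplied by the number of comparisons
bonf <- pairwise.t.test(sim$y, sim$grp, p.adjust.method = "bonferroni")
bonf
```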
One-way ANOVA – Rcmdr Demo
• Make sure that group variables are in factor form:
– Data -> Manage variables in active data set -> Convert numeric variables to factors -> pick variables, select "Supply level names" or "Use numbers" for the factor levels, and give a new name or keep the same name.
• To run one-way ANOVA:
– Statistics -> Means -> One-way ANOVA -> Model Name, pick response variable (e.g., PLUC.post), pick group variable (e.g., grp), select pairwise comparisons of means.
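The reason the group variable must be a factor can be seen at the command line. In this sketch on simulated data (the data frame sim and the level labels are assumptions for illustration), a numeric grp is treated as a single slope with 1 degree of freedom, whereas a factor grp gives the k-1 = 2 degrees of freedom of a one-way ANOVA.

```r
# Simulated stand-in data with a numeric group code
set.seed(1)
sim <- data.frame(
  grp       = rep(1:3, each = 20),
  PLUC.post = rnorm(60, mean = rep(c(10, 12, 14), each = 20), sd = 2)
)

# Numeric grp: aov fits a straight line in grp (1 df)
df_numeric <- summary(aov(PLUC.post ~ grp, data = sim))[[1]]$Df[1]

# Convert to a factor, supplying hypothetical level names
sim$grp <- factor(sim$grp, labels = c("A", "B", "placebo"))

# Factor grp: aov compares the three group means (2 df)
df_factor <- summary(aov(PLUC.post ~ grp, data = sim))[[1]]$Df[1]

c(numeric = df_numeric, factor = df_factor)
```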
One-way ANOVA output: PLUC.post on treatment groups
> Model1 <- aov(PLUC.post ~ grp, data=data)
> summary(Model1)
            Df  Sum Sq Mean Sq F value    Pr(>F)
grp          2 157.282  78.641  18.844 5.225e-07 ***
Residuals   57 237.880   4.173
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> numSummary(data$PLUC.post , groups=data$grp, statistics=c("mean", "sd"))
      mean       sd  n
1 10.15657 2.227334 20
2 12.15339 1.874268 20
3 14.12242 2.011497 20
One-way ANOVA output: PLUC.post on treatment groups
> .Pairs <- glht(Model1, linfct = mcp(grp = "Tukey"))
> confint(.Pairs)
Simultaneous Confidence Intervals
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = PLUC.post ~ grp, data = data)
Estimated Quantile = 2.4064
95% family-wise confidence level
Linear Hypotheses:
           Estimate    lwr    upr
2 - 1 == 0   1.9968 0.4422 3.5514
3 - 1 == 0   3.9658 2.4113 5.5204
3 - 2 == 0   1.9690 0.4144 3.5236
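The interval for the contrast "2 - 1" can be reconstructed from quantities already reported in these slides: the mean square error (4.173), the group size (20), and the estimated quantile (2.4064). The sketch also compares that quantile with the studentized range quantile from base R, which should be close but need not match exactly because glht estimates its quantile numerically.

```r
# Quantities taken from the reported output
MSE         <- 4.173       # residual mean square from the ANOVA table
n_per_group <- 20          # observations per group
estimate    <- 1.9968      # estimated difference, group 2 minus group 1
quantile_glht <- 2.4064    # "Estimated Quantile" reported by confint()

# Standard error of a difference between two group means
se_diff <- sqrt(MSE * (1 / n_per_group + 1 / n_per_group))

lwr <- estimate - quantile_glht * se_diff
upr <- estimate + quantile_glht * se_diff
round(c(lwr = lwr, upr = upr), 4)

# Tukey critical value from the studentized range (3 means, 57 error df)
q_tukey <- qtukey(0.95, nmeans = 3, df = 57) / sqrt(2)
q_tukey
```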
One-way ANOVA output: PLUC.post on treatment groups
[Figure: 95% family-wise confidence level; confidence intervals for the pairwise differences 2-1, 3-1, and 3-2; x-axis: Linear Function]
Analysis of variance of factorial
experiment (Two or more factors)
• Factorial experiment:
– The effects of two or more factors, including their interactions, are investigated simultaneously.
– For example, consider two factors A and B. The total variation of the response is then split into variation due to A, variation due to B, variation due to their interaction AB, and variation due to error.
Analysis of variance of factorial
experiment (Two or more factors)
• Model with two factors (A, B) and their interaction:
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$
$\mu$ is the general mean
$\alpha_i$ is the effect of the ith level of factor A
$\beta_j$ is the effect of the jth level of factor B
$(\alpha\beta)_{ij}$ is the interaction effect of the ith level of A and the jth level of B
$e_{ijk}$ is the error
• Assumptions: The same as in one-way ANOVA.
Analysis of variance of factorial
experiment (Two or more factors)
• Null Hypotheses:
– Hoa: Means of all groups of factor A are equal.
– Hob: Means of all groups of factor B are equal.
– Hoab: $(\alpha\beta)_{ij} = 0$ for all i and j, i.e., there is no interaction between factors A and B.
Two Factor ANOVA
• ANOVA for two factors A and B with their interaction AB (k levels of A, p levels of B, r replicates per cell):
Sources of Variation  | Sum of Squares | df         | Mean Sum of Squares      | F-Ratio
Main Effect A         | SSA            | k-1        | MSA = SSA/(k-1)          | MSA/MSE
Main Effect B         | SSB            | p-1        | MSB = SSB/(p-1)          | MSB/MSE
Interaction Effect AB | SSAB           | (k-1)(p-1) | MSAB = SSAB/((k-1)(p-1)) | MSAB/MSE
Error                 | SSE            | kp(r-1)    | MSE = SSE/(kp(r-1))      |
Total                 | SST            | kpr-1      |                          |
Two-factor ANOVA - Rcmdr Demo
Statistics -> Means -> Multi-way ANOVA -> Model Name, pick
response variable (e.g., PLUC.post), pick factors
(e.g., grp and Ped)
Two-way ANOVA output: PLUC.post on treatment groups and Ped
> Model2 <- (lm(PLUC.post ~ grp*Ped, data=data))
> Anova(Model2)
Anova Table (Type II tests)
Response: PLUC.post
           Sum Sq Df F value    Pr(>F)
grp       157.282  2 19.2016 5.024e-07 ***
Ped         0.842  1  0.2055    0.6521
grp:Ped    15.880  2  1.9387    0.1538
Residuals 221.159 54
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), mean, na.rm=TRUE) # means
   Ped
grp         1        2
  1  9.591563 10.72158
  2 12.397521 11.90926
  3 14.798619 13.44621
Two-way ANOVA output: PLUC.post on treatment groups and Ped
> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), sd, na.rm=TRUE) # std. deviations
   Ped
grp        1        2
  1 2.656100 1.645896
  2 1.850990 1.964046
  3 2.034403 1.840354
> tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), function(x) sum(!is.na(x))) # counts
   Ped
grp  1  2
  1 10 10
  2 10 10
  3 10 10
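The degrees of freedom and F values in the two-way table can be checked against the formulas of the two-factor ANOVA table. The sketch uses the reported sums of squares with k = 3 levels of grp, p = 2 levels of Ped, and r = 10 observations per cell (the cell counts shown in the output).

```r
k <- 3     # levels of factor A (grp)
p <- 2     # levels of factor B (Ped)
r <- 10    # replicates per cell

# Degrees of freedom from the two-factor ANOVA table
df <- c(A = k - 1, B = p - 1, AB = (k - 1) * (p - 1), Error = k * p * (r - 1))
sum(df) == k * p * r - 1     # the components add up to the total df

# Sums of squares from the reported Anova table
SS <- c(A = 157.282, B = 0.842, AB = 15.880, Error = 221.159)

MS <- SS / df                                      # mean squares
F_values <- MS[c("A", "B", "AB")] / MS[["Error"]]  # each effect tested against MSE
round(F_values, 4)
```

The error df of 54 and the three F ratios agree with the reported table up to rounding of the inputs.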
Repeated Measures
• The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject.
• In repeated measurements designs, we are often concerned with two types of variability:
– Between-subjects - Variability associated with different groups of subjects who are treated differently (equivalent to between-groups effects in one-way ANOVA)
– Within-subjects - Variability associated with measurements made on an individual subject.
Repeated Measures
• Examples of Repeated Measures designs:
A. Two groups of subjects treated with different drugs for
whom responses are measured at six-hour increments for
24 hours. Here, DRUG treatment is the between-subjects
factor and TIME is the within-subjects factor.
B. Students in three different statistics classes (taught by
different instructors) are given a test with five problems and
scores on each problem are recorded separately. Here
CLASS is a between-subjects factor and PROBLEM is a
within-subjects factor.
Repeated Measures
• When measures are made over time, as in example A, we want to assess:
– how the dependent measure changes over time independent of treatment (i.e., the main effect of time)
– how treatments differ independent of time (i.e., the main effect of treatment)
– how treatment effects differ at different times (i.e., the treatment by time interaction).
• Repeated measures require special treatment because:
– Observations made on the same subject are not independent of each other.
– Adjacent observations in time are likely to be more correlated than non-adjacent observations.
Repeated Measures
[Figure: y-axis: Response; x-axis: Time]
Repeated Measures
• Methods of repeated measures ANOVA:
– Univariate - Uses a single outcome measure.
– Multivariate - Uses multiple outcome measures.
– Mixed Model Analysis - One or more factors (other than subject) are random effects.
• We will discuss only the univariate approach.
Repeated Measures
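• Assumptions:
– Subjects are independent.
– The repeated observations for each subject follow a multivariate normal distribution.
– The variances of the differences between all pairs of within-subject levels are equal. This assumption is known as sphericity; it holds, for example, when all levels have equal variances and the correlation between any pair of levels is the same (compound symmetry).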
Repeated Measures
• Test for sphericity:
– Mauchly’s test
• Violation of the sphericity assumption leads to inflated F statistics and hence an inflated type I error rate.
• Three common corrections for violation of sphericity:
– Greenhouse-Geisser correction
– Huynh-Feldt correction
– Lower Bound correction
• All three methods adjust the degrees of freedom using a correction factor called epsilon.
• Epsilon lies between 1/(k-1) and 1, where k is the number of levels of the within-subject factor.
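To make the role of epsilon concrete, here is a minimal sketch of the Greenhouse-Geisser estimate computed from a covariance matrix of the k repeated measures. The helper gg_epsilon and both example matrices are assumptions written for illustration; they are not part of the course material or of any package.

```r
# Hypothetical helper: Greenhouse-Geisser epsilon from a k x k covariance matrix S
gg_epsilon <- function(S) {
  k <- nrow(S)
  C <- contr.poly(k)              # k x (k-1) matrix of orthonormal contrasts
  M <- t(C) %*% S %*% C           # covariance of the contrast scores
  sum(diag(M))^2 / ((k - 1) * sum(M^2))
}

k <- 3

# Compound symmetry (equal variances, equal correlations): sphericity holds
S_cs <- matrix(0.5, k, k) + diag(0.5, k)
eps_cs <- gg_epsilon(S_cs)

# Correlation decaying with distance in time: sphericity is violated
S_ar1 <- 0.8^abs(outer(1:k, 1:k, "-"))
eps_ar1 <- gg_epsilon(S_ar1)

c(compound_symmetry = eps_cs, decaying = eps_ar1, lower_bound = 1 / (k - 1))

# The correction multiplies both degrees of freedom of the within-subject F test
n_subjects <- 60
adjusted_df <- eps_ar1 * c(k - 1, (k - 1) * (n_subjects - 1))
```

An epsilon of 1 leaves the degrees of freedom unchanged, while smaller values shrink them and make the F test more conservative.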
Repeated Measures - R Demo
• The following R commands arrange the data in default.csv for the repeated measures ANOVA (you can also use the R function gl() for this manipulation):
attach(x)                          # x is the data frame holding default.csv
sid1 <- c(sid, sid)                # subject id, once per time point
grp1 <- c(grp, grp)                # treatment group, once per time point
plc <- c(PLUC.pre, PLUC.post)      # stack the pre and post measurements
time <- c(rep(1,60), rep(2,60))    # 1 = pre, 2 = post
• The following code runs the repeated measures ANOVA:
summary(aov(plc ~ factor(grp1)*factor(time) + Error(factor(sid1))))
Repeated Measures R output:
Error: factor(sid1)
             Df  Sum Sq Mean Sq F value    Pr(>F)
factor(grp1)  2 121.640  60.820  16.328 2.476e-06 ***
Residuals    57 212.313   3.725
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Within
                          Df  Sum Sq Mean Sq F value    Pr(>F)
factor(time)               1 172.952 172.952 33.8676 2.835e-07 ***
factor(grp1):factor(time)  2  46.818  23.409  4.5839   0.01426 *
Residuals                 57 291.083   5.107
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
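The same kind of analysis can be run end to end without attach(). The following is a self-contained sketch on simulated data: the data frame wide, its values, and the effect sizes are assumptions for illustration, with column names chosen to mirror those used in the demo.

```r
# Simulated wide-format data: one row per subject, pre and post measurements
set.seed(1)
n <- 60
wide <- data.frame(sid = 1:n, grp = rep(1:3, each = 20))
wide$PLUC.pre  <- rnorm(n, mean = 10, sd = 2)
wide$PLUC.post <- wide$PLUC.pre + rnorm(n, mean = c(0, 2, 4)[wide$grp], sd = 2)

# Long format: one row per subject and time point
long <- data.frame(
  sid  = factor(rep(wide$sid, 2)),
  grp  = factor(rep(wide$grp, 2)),
  time = factor(rep(1:2, each = n)),     # 1 = pre, 2 = post
  plc  = c(wide$PLUC.pre, wide$PLUC.post)
)

# grp is tested between subjects; time and grp:time are tested within subjects
fit <- aov(plc ~ grp * time + Error(sid), data = long)
summary(fit)
```

The Error(sid) term tells aov to separate the between-subject stratum from the within-subject stratum, which is why the output has two tables.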
Thank you