Basics of Statistics


Experimental Design Terminology
• An Experimental Unit is the entity on which a measurement or an observation is made. For example, subjects are the experimental units in most studies.
• Homogeneous Experimental Units: units that are as uniform as possible on all characteristics that could affect the response.
• A Block is a group of homogeneous experimental units.
• A Replication is the repetition of an entire experiment, or a portion of an experiment, under two or more sets of conditions.
• A Factor is a controllable independent variable that is thought to influence the response.
Experimental Design Terminology
• Factors can be fixed or random.
  ◦ Fixed: the factor can take on a discrete number of values, and these are the only values of interest.
  ◦ Random: the factor can take on a wide range of values, and one wants to generalize from the specific values studied to all possible values.
• Each specific value of a factor is called a level.
Experimental Design Terminology
• A Covariate is an independent variable that is not manipulated by the experimenter but still affects the response.
• An Effect is the change in the average response between two factor levels.
• An Interaction is a joint factor effect in which the effect of each factor depends on the levels of the other factors.
• A Design (layout) of the experiment includes the choice of factors and factor levels, the number of replications, blocking, randomization, and the assignment of factor-level combinations to experimental units.
Experimental Design Terminology
• Sum of Squares (SS): Let $x_1, \dots, x_n$ be n observations. The sum of squares of these n observations can be written as $x_1^2 + x_2^2 + \dots + x_n^2$, or in notation $\sum_{i=1}^{n} x_i^2$. In corrected form this sum of squares can be written as $\sum_{i=1}^{n} (x_i - \bar{x})^2$.
• Degrees of freedom (df): the number of quantities entering the SS minus the number of restrictions on them. For example, the corrected SS uses n quantities of the form $(x_i - \bar{x})$, and they satisfy one constraint, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$. So the df for this SS is n - 1.
• Mean Sum of Squares (MSS): the SS divided by its df.
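The three definitions above can be sketched in a few lines of Python. This is only an illustration: the sample values and the function name `corrected_sum_of_squares` are made up for the example and do not come from the slides.

```python
def corrected_sum_of_squares(xs):
    """Return the corrected sum of squares: sum of (x_i - mean)^2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

# Hypothetical sample, chosen only so the arithmetic is easy to follow (mean = 5).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

ss = corrected_sum_of_squares(sample)   # corrected SS
df = len(sample) - 1                    # n quantities minus 1 constraint
mss = ss / df                           # mean sum of squares

# The single constraint: deviations from the mean add up to zero.
mean = sum(sample) / len(sample)
deviation_total = sum(x - mean for x in sample)

print("SS =", ss, "df =", df, "MSS =", round(mss, 4))
print("sum of deviations =", deviation_total)
```

For this sample the MSS is the usual sample variance, which is one way to see why the divisor is n - 1 rather than n.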
Experimental Design Terminology
• The analysis of variance (ANOVA) is a technique for decomposing the total variability of a response variable into:
  ◦ Variability due to the experimental factor(s), and
  ◦ Variability due to error (i.e., factors that are not accounted for in the experimental design).
• The basic purpose of ANOVA is to test the equality of several means.
• A fixed effect model includes only fixed factors in the model.
• A random effect model includes only random factors in the model.
• A mixed effect model includes both fixed and random factors in the model.
One-way analysis of Variance
• One factor with k levels or groups, e.g., 3 treatment groups in a drug study.
• The main objective is to examine the equality of the means of the different groups.
• The total variation of the observations (SST) can be split into two components: variation between groups (SSG) and variation within groups (SSE).
• Variation between groups is due to differences between the groups, e.g., different treatment groups or different doses of the same treatment.
• Variation within groups is the inherent variation among the observations within each group.
• The completely randomized design (CRD) is an example of a design analyzed by one-way analysis of variance.
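The split of the total variation can be written out explicitly. The notation here is an assumption made for this sketch: $y_{ij}$ is the jth observation in group i, $n_i$ is the size of group i, $\bar{y}_{i\cdot}$ is the mean of group i, and $\bar{y}_{\cdot\cdot}$ is the mean of all observations.

```latex
\begin{aligned}
\mathrm{SST} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_{\cdot\cdot} \right)^2 \\
\mathrm{SSG} &= \sum_{i=1}^{k} n_i \left( \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot} \right)^2 \\
\mathrm{SSE} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( y_{ij} - \bar{y}_{i\cdot} \right)^2 \\
\mathrm{SST} &= \mathrm{SSG} + \mathrm{SSE}
\end{aligned}
```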
One-way analysis of variance
Consider the layout of a study with 16 subjects intended to compare 4 treatment groups (G1-G4). Each group contains four subjects (S1-S4).

        S1    S2    S3    S4
G1      Y11   Y12   Y13   Y14
G2      Y21   Y22   Y23   Y24
G3      Y31   Y32   Y33   Y34
G4      Y41   Y42   Y43   Y44
One-way analysis of Variance
• Model:
  $y_{ij} = \mu + \alpha_i + e_{ij}$
  where $y_{ij}$ is the jth observation of the ith group, $\alpha_i$ is the effect of the ith group, $\mu$ is the general mean, and $e_{ij}$ is the error.
• Assumptions:
  ◦ The observations $y_{ij}$ are independent.
  ◦ The errors $e_{ij}$ are normally distributed with mean zero and constant standard deviation.
One-way analysis of Variance
• Hypotheses:
  Ho: The means of all groups are equal.
  Ha: At least one group mean differs from the others.
• Analysis of variance (ANOVA) table for one-way classified data (k groups, n observations in total):

Source of Variation   Sum of Squares   df      Mean Sum of Squares   F-Ratio
Group                 SSG              k - 1   MSG = SSG/(k - 1)     F = MSG/MSE
Error                 SSE              n - k   MSE = SSE/(n - k)
Total                 SST              n - 1
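The entries of the table can be computed directly from their definitions. The following is a minimal sketch in plain Python, assuming the groups are given as lists of numbers; the function name `one_way_anova` and the toy data are hypothetical and chosen only so the arithmetic can be checked by hand. It does not compute a p-value, which would need the F distribution from a statistics library.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA table entries for a list of groups."""
    k = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group, within-group, and total sums of squares.
    ssg = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    sse = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)

    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return {"SSG": ssg, "SSE": sse, "SST": sst,
            "df_group": k - 1, "df_error": n - k, "df_total": n - 1,
            "MSG": msg, "MSE": mse, "F": msg / mse}

# Hypothetical toy data: three groups with means 2, 3, and 6.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
table = one_way_anova(toy_groups)
for name, value in table.items():
    print(name, round(value, 4))
```

Computing SST separately, rather than as SSG + SSE, makes it easy to confirm that the decomposition holds for the data at hand.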
Multiple comparisons
• If the F test in the ANOVA table is significant, the next step is to find which pairs of groups are significantly different. The following procedures are commonly used:
  ◦ Fisher’s Least Significant Difference (LSD)
  ◦ Tukey’s method
  ◦ Bonferroni’s method
  ◦ Scheffe’s method
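As a rough illustration of how two of these procedures relate: Fisher's LSD and Bonferroni's method can both start from pairwise t statistics built on the pooled MSE, and Bonferroni then tests each pair at a stricter level, alpha divided by the number of comparisons. The sketch below assumes that formulation; the function names and toy data are hypothetical, and the critical t values (with n - k degrees of freedom) would still have to come from a table or a statistics library.

```python
from itertools import combinations
from math import sqrt

def pairwise_t_statistics(groups):
    """Return pairwise t statistics (pooled MSE) and the error df."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    mse = sse / (n - k)                       # pooled within-group variance
    stats = {}
    for i, j in combinations(range(k), 2):
        se = sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j])))
        stats[(i + 1, j + 1)] = (means[i] - means[j]) / se
    return stats, n - k

def bonferroni_alpha(alpha, k):
    """Per-comparison level when all k(k-1)/2 pairs are compared."""
    m = k * (k - 1) // 2
    return alpha / m

# Hypothetical toy data: three groups with means 2, 3, and 6.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
t_stats, df_error = pairwise_t_statistics(toy_groups)
for pair, t in t_stats.items():
    print("groups", pair, "t =", round(t, 4))
print("error df =", df_error)
print("Bonferroni level for alpha = 0.05:", round(bonferroni_alpha(0.05, 3), 5))
```

With LSD each |t| is compared with the critical value at level alpha; with Bonferroni the same |t| is compared with the critical value at the smaller level printed on the last line.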
One-way ANOVA - Demo
• MS Excel:
  ◦ Put the response data (hgt) for each group (grp) in side-by-side columns (see the next slide).
  ◦ Select Tools/Data Analysis and select Anova: Single Factor from the Analysis Tools list. Click OK.
  ◦ Select the Input Range (for our example A1:C21), mark Grouped By: Columns, and check Labels in first row.
  ◦ Select the output range and then click OK.
One-way ANOVA MS-Excel Data Layout
grp1       grp2       grp3
52.50647   47.83167   43.82211
43.14426   43.69665   40.41359
55.91853   40.73333   57.65748
45.68187   46.56424   41.96207
54.76593   53.03273   49.08051
45.27999   56.41127   55.97797
41.95513   43.69635   47.71419
43.67319   50.92119   41.75912
44.01857   40.36097   46.21859
44.54295   51.95264   53.61966
46.1765    57.1638    49.63484
40.02826   50.98321   57.68229
58.09449   49.23148   49.08471
43.25757   48.01014   40.59866
42.07507   45.98231   38.99055
46.80839   55.32514   54.74286
43.80479   47.21837   53.74624
57.60508   40.30104   44.82507
42.47022   50.56327   45.85581
57.4945    55.55145   41.36863
One-way ANOVA MS-Excel output: height on treatment groups
Anova: Single Factor

SUMMARY
Groups   Count   Sum        Average    Variance
grp1     20      949.3017   47.46509   36.88689
grp2     20      975.5313   48.77656   28.10855
grp3     20      954.7549   47.73775   37.50739

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        19.15632   2    9.578159   0.280329   0.756571   3.158846
Within Groups         1947.554   57   34.16761
Total                 1966.71    59
One-way ANOVA - Demo
• SPSS:
  ◦ Select Analyze > Compare Means > One-Way ANOVA.
  ◦ Set Dependent List to the response (hgt) and Factor to the group variable (grp). Then click Post Hoc and select the multiple comparisons (LSD, Tukey, Bonferroni, or Scheffe), click Options and select Homogeneity of variance test, click Continue, and then OK.
One-way ANOVA SPSS output: height on treatment groups
ANOVA
hgt
                 Sum of Squares   df   Mean Square   F      Sig.
Between Groups   19.156           2    9.578         .280   .757
Within Groups    1947.554         57   34.168
Total            1966.710         59
One-way ANOVA R output: height on treatment groups
> grp <- as.factor(grp)
> summary(aov(hgt ~ grp))
            Df  Sum Sq Mean Sq F value Pr(>F)
grp          2   19.16    9.58  0.2803 0.7566
Residuals   57 1947.55   34.17
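The Excel, SPSS, and R tables above can be cross-checked by recomputing the sums of squares from the 60 heights in the data layout. This is a sketch in plain Python rather than the R session the slides used; the variable and function names are chosen for this example, and the p-value is left out because it needs the F distribution from a statistics library.

```python
# Heights (hgt) for the three treatment groups, copied from the data layout.
grp1 = [52.50647, 43.14426, 55.91853, 45.68187, 54.76593,
        45.27999, 41.95513, 43.67319, 44.01857, 44.54295,
        46.1765, 40.02826, 58.09449, 43.25757, 42.07507,
        46.80839, 43.80479, 57.60508, 42.47022, 57.4945]
grp2 = [47.83167, 43.69665, 40.73333, 46.56424, 53.03273,
        56.41127, 43.69635, 50.92119, 40.36097, 51.95264,
        57.1638, 50.98321, 49.23148, 48.01014, 45.98231,
        55.32514, 47.21837, 40.30104, 50.56327, 55.55145]
grp3 = [43.82211, 40.41359, 57.65748, 41.96207, 49.08051,
        55.97797, 47.71419, 41.75912, 46.21859, 53.61966,
        49.63484, 57.68229, 49.08471, 40.59866, 38.99055,
        54.74286, 53.74624, 44.82507, 45.85581, 41.36863]

def one_way_sums_of_squares(groups):
    """Return (SSG, SSE, F) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    f_ratio = (ssg / (k - 1)) / (sse / (n - k))
    return ssg, sse, f_ratio

ssg, sse, f_ratio = one_way_sums_of_squares([grp1, grp2, grp3])
print("Between groups SS:", round(ssg, 3))
print("Within groups SS: ", round(sse, 3))
print("F:", round(f_ratio, 4))
```

The printed values are meant to be compared with the Between Groups, Within Groups, and F entries reported by the three packages.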
Analysis of variance of factorial experiment (Two or more factors)
• Factorial experiment: the effects of two or more factors, including their interactions, are investigated simultaneously. For example, consider two factors A and B. The total variation of the response is then split into variation for A, variation for B, variation for their interaction AB, and variation due to error.
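For a balanced design this split can be written out explicitly. The notation is an assumption made for this sketch: factor A has k levels, factor B has p levels, each combination has r replicates, $y_{ijl}$ is the lth replicate at level i of A and level j of B, and a dot in a subscript means the average over that index.

```latex
\begin{aligned}
\mathrm{SSA}  &= p\,r \sum_{i=1}^{k} \left( \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot} \right)^2 \\
\mathrm{SSB}  &= k\,r \sum_{j=1}^{p} \left( \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot} \right)^2 \\
\mathrm{SSAB} &= r \sum_{i=1}^{k} \sum_{j=1}^{p} \left( \bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot} \right)^2 \\
\mathrm{SSE}  &= \sum_{i=1}^{k} \sum_{j=1}^{p} \sum_{l=1}^{r} \left( y_{ijl} - \bar{y}_{ij\cdot} \right)^2 \\
\mathrm{SST}  &= \mathrm{SSA} + \mathrm{SSB} + \mathrm{SSAB} + \mathrm{SSE}
\end{aligned}
```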
Analysis of variance of factorial experiment (Two or more factors)
• Model with two factors (A, B) and their interaction:
  $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$
  where $\mu$ is the general mean, $\alpha_i$ is the effect of the ith level of factor A, $\beta_j$ is the effect of the jth level of factor B, $(\alpha\beta)_{ij}$ is the interaction effect of the ith level of A and the jth level of B, and $e_{ijk}$ is the error.
• Assumptions: the same as in one-way ANOVA.
Analysis of variance of factorial experiment (Two or more factors)
• Null hypotheses:
  ◦ Hoa: The means of all groups of factor A are equal.
  ◦ Hob: The means of all groups of factor B are equal.
  ◦ Hoab: $(\alpha\beta)_{ij} = 0$ for all i and j, i.e. the two factors A and B do not interact.
Analysis of variance of factorial experiment (Two or more factors)
• ANOVA for two factors A and B with their interaction AB (k levels of A, p levels of B, r replications per combination):

Source of Variation     Sum of Squares   df               Mean Sum of Squares              F-Ratio
Main Effect A           SSA              k - 1            MSA = SSA/(k - 1)                MSA/MSE
Main Effect B           SSB              p - 1            MSB = SSB/(p - 1)                MSB/MSE
Interaction Effect AB   SSAB             (k - 1)(p - 1)   MSAB = SSAB/[(k - 1)(p - 1)]     MSAB/MSE
Error                   SSE              kp(r - 1)        MSE = SSE/[kp(r - 1)]
Total                   SST              kpr - 1
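The table entries can be computed from cell, row, and column means when every factor combination has the same number of replicates. The sketch below assumes such a balanced layout, stored as `cells[i][j]` holding the replicates for level i of A and level j of B; the function name `two_way_anova` and the toy data are hypothetical, and p-values are not computed.

```python
def two_way_anova(cells):
    """Two-factor ANOVA with replication for a balanced layout."""
    k = len(cells)            # levels of factor A
    p = len(cells[0])         # levels of factor B
    r = len(cells[0][0])      # replicates per combination
    n = k * p * r

    grand = sum(y for row in cells for cell in row for y in cell) / n
    cell_mean = [[sum(cell) / r for cell in row] for row in cells]
    a_mean = [sum(cell_mean[i]) / p for i in range(k)]
    b_mean = [sum(cell_mean[i][j] for i in range(k)) / k for j in range(p)]

    ssa = p * r * sum((m - grand) ** 2 for m in a_mean)
    ssb = k * r * sum((m - grand) ** 2 for m in b_mean)
    ssab = r * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                   for i in range(k) for j in range(p))
    sse = sum((y - cell_mean[i][j]) ** 2
              for i in range(k) for j in range(p) for y in cells[i][j])

    mse = sse / (k * p * (r - 1))
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
            "SST": ssa + ssb + ssab + sse,
            "F_A": (ssa / (k - 1)) / mse,
            "F_B": (ssb / (p - 1)) / mse,
            "F_AB": (ssab / ((k - 1) * (p - 1))) / mse}

# Hypothetical toy data: 2 levels of A, 2 levels of B, 2 replicates each.
toy_cells = [[[1, 3], [5, 7]],
             [[2, 4], [10, 12]]]
for name, value in two_way_anova(toy_cells).items():
    print(name, round(value, 4))
```

The formulas used here rely on equal replication; unbalanced data would need a different approach, such as the Type III sums of squares that SPSS reports.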
Two-factor with replication - Demo
• MS Excel:
  ◦ Put the response data for the two factors in a layout like the one on the next page.
  ◦ Select Tools/Data Analysis and select Anova: Two-Factor With Replication from the Analysis Tools list. Click OK.
  ◦ Select the Input Range and enter Rows per sample, which is the number of replications (Excel needs equal replications for every level). For the data on the next page this is 10 (10 rows for each shade).
  ◦ Select the output range and then click OK.
Two-factor ANOVA MS-Excel Data Layout:
         grp1       grp2       grp3
shade1   52.50647   47.83167   43.82211
shade1   43.14426   43.69665   40.41359
shade1   55.91853   40.73333   57.65748
shade1   45.68187   46.56424   41.96207
shade1   54.76593   53.03273   49.08051
shade1   45.27999   56.41127   55.97797
shade1   41.95513   43.69635   47.71419
shade1   43.67319   50.92119   41.75912
shade1   44.01857   40.36097   46.21859
shade1   44.54295   51.95264   53.61966
shade2   46.1765    57.1638    49.63484
shade2   40.02826   50.98321   57.68229
shade2   58.09449   49.23148   49.08471
shade2   43.25757   48.01014   40.59866
shade2   42.07507   45.98231   38.99055
shade2   46.80839   55.32514   54.74286
shade2   43.80479   47.21837   53.74624
shade2   57.60508   40.30104   44.82507
shade2   42.47022   50.56327   45.85581
shade2   57.4945    55.55145   41.36863
Two-factor ANOVA MS-Excel output: height on treatment group, shades, and their interaction
Anova: Two-Factor With Replication

SUMMARY    grp1       grp2       grp3       Total
shade1
Count      10         10         10         30
Sum        471.4869   475.201    478.2253   1424.913
Average    47.14869   47.5201    47.82253   47.49711
Variance   26.773     29.80231   38.11298   29.46458

shade2
Count      10         10         10         30
Sum        477.8149   500.3302   476.5297   1454.675
Average    47.78149   50.03302   47.65297   48.48916
Variance   50.87686   26.02977   41.05331   37.84396

Total
Count      20         20         20
Sum        949.3017   975.5313   954.7549
Average    47.46509   48.77656   47.73775
Variance   36.88689   28.10855   37.50739

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Sample                14.76245   1    14.76245   0.416532   0.521406   4.01954
Columns               19.15632   2    9.578159   0.270254   0.764212   3.168246
Interaction           18.9572    2    9.478602   0.267445   0.766341   3.168246
Within                1913.834   54   35.44137
Total                 1966.71    59
Two-factor ANOVA - Demo
• SPSS:
  ◦ Select Analyze > General Linear Model > Univariate.
  ◦ Select the variables, e.g. Dependent Variable: response (hgt), and Fixed Factors: grp and shades.
  ◦ Make the other selections as follows: click Post Hoc and select the multiple comparisons (LSD, Tukey, Bonferroni, or Scheffe), click Options and select Homogeneity of variance test, click Continue, and then OK.
Two-factor ANOVA SPSS output: height on treatment group, shades, and their interaction

Between-Subjects Factors
             N
grp      1   20
         2   20
         3   20
Shades   1   30
         2   30

Tests of Between-Subjects Effects
Dependent Variable: hgt
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   52.876(a)                 5    10.575        .298       .912
Intercept         138200.445                1    138200.445    3899.410   .000
grp               19.156                    2    9.578         .270       .764
Shades            14.762                    1    14.762        .417       .521
grp * Shades      18.957                    2    9.479         .267       .766
Error             1913.834                  54   35.441
Total             140167.155                60
Corrected Total   1966.710                  59
a R Squared = .027 (Adjusted R Squared = -.063)
Two-factor ANOVA R output: height on treatment group, shades, and their interaction
> grp <- as.factor(grp)
> shades <- as.factor(shades)
> summary(aov(hgt ~ grp + shades + grp*shades))
            Df  Sum Sq Mean Sq F value Pr(>F)
grp          2   19.16    9.58  0.2703 0.7642
shades       1   14.76   14.76  0.4165 0.5214
grp:shades   2   18.96    9.48  0.2674 0.7663
Residuals   54 1913.83   35.44
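As with the one-way case, the two-factor tables can be cross-checked from the raw heights. The sketch below is in plain Python rather than R, assumes the layout shown earlier (the first 10 rows are shade1, the last 10 are shade2, and the columns are grp1 to grp3), and uses the balanced-design formulas; the variable names are chosen for this example and p-values are not computed.

```python
# Rows of (grp1, grp2, grp3) heights; rows 0-9 are shade1, rows 10-19 shade2.
rows = [
    (52.50647, 47.83167, 43.82211), (43.14426, 43.69665, 40.41359),
    (55.91853, 40.73333, 57.65748), (45.68187, 46.56424, 41.96207),
    (54.76593, 53.03273, 49.08051), (45.27999, 56.41127, 55.97797),
    (41.95513, 43.69635, 47.71419), (43.67319, 50.92119, 41.75912),
    (44.01857, 40.36097, 46.21859), (44.54295, 51.95264, 53.61966),
    (46.1765, 57.1638, 49.63484), (40.02826, 50.98321, 57.68229),
    (58.09449, 49.23148, 49.08471), (43.25757, 48.01014, 40.59866),
    (42.07507, 45.98231, 38.99055), (46.80839, 55.32514, 54.74286),
    (43.80479, 47.21837, 53.74624), (57.60508, 40.30104, 44.82507),
    (42.47022, 50.56327, 45.85581), (57.4945, 55.55145, 41.36863),
]
n_shades, n_grps, reps = 2, 3, 10

# cells[s][g] holds the 10 replicates for shade s and treatment group g.
cells = [[[rows[t][g] for t in range(reps * s, reps * (s + 1))]
          for g in range(n_grps)] for s in range(n_shades)]

grand = sum(y for row in rows for y in row) / (n_shades * n_grps * reps)
cell_mean = [[sum(c) / reps for c in shade] for shade in cells]
shade_mean = [sum(cell_mean[s]) / n_grps for s in range(n_shades)]
grp_mean = [sum(cell_mean[s][g] for s in range(n_shades)) / n_shades
            for g in range(n_grps)]

ss_shades = n_grps * reps * sum((m - grand) ** 2 for m in shade_mean)
ss_grp = n_shades * reps * sum((m - grand) ** 2 for m in grp_mean)
ss_inter = reps * sum(
    (cell_mean[s][g] - shade_mean[s] - grp_mean[g] + grand) ** 2
    for s in range(n_shades) for g in range(n_grps))
ss_error = sum((y - cell_mean[s][g]) ** 2
               for s in range(n_shades) for g in range(n_grps)
               for y in cells[s][g])
mse = ss_error / (n_shades * n_grps * (reps - 1))

print("grp SS:       ", round(ss_grp, 3), " F:", round(ss_grp / 2 / mse, 4))
print("shades SS:    ", round(ss_shades, 3), " F:", round(ss_shades / 1 / mse, 4))
print("grp:shades SS:", round(ss_inter, 3), " F:", round(ss_inter / 2 / mse, 4))
print("Residual SS:  ", round(ss_error, 3), " MSE:", round(mse, 4))
```

In the Excel output, Sample corresponds to shades and Columns to grp, so the printed values line up with those rows as well as with the SPSS and R tables.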