Design of experiment II: ANOVA and multiple hypotheses testing

• ANOVA
• Interactions
• Contrasts
• Multiple hypothesis tests

ANOVA tables

Standard ANOVA tables look like:

effect   df     SS     MS             F         prob
v1       d1     SS1    MS1 = SS1/d1   MS1/MSe   pr1
...      ...    ...    ...            ...       ...
vp       dp     SSp    MSp = SSp/dp   MSp/MSe   prp
error    de     SSe    MSe = SSe/de
total    N-1    SSt

Where v1, ..., vp are different factors and each vi is a vector of levels. For each vi we want to test whether all its levels have the same effect (H0). The df-s are the degrees of freedom corresponding to each hypothesis.

The SSh-s are the corresponding sums of squares (h denotes hypothesis). F is the F-value referred to the F distribution; its degrees of freedom are (di, de). The prob-s are the corresponding probabilities. If a particular probability is very low (less than 0.05) then we reject the hypothesis that all levels of that factor have the same effect. These values are calculated using the likelihood ratio test. Let us say we want to test the hypothesis H0: the effects of all levels of vi are equal to each other, vs H1: at least one of the effects is different. Then we maximise the likelihood under the null hypothesis to find the corresponding variance, and then we maximise the likelihood under the alternative hypothesis and find the corresponding variance.

Then we calculate the sums of squares for the null and alternative hypotheses and find the F statistic.

LR test for ANOVA

Suppose the estimated variances are σ̂0² under the null hypothesis and σ̂² under the alternative hypothesis. Then the mean sums of squares for the hypothesis and for the error are:

SSh/dfh = n(σ̂0² − σ̂²)/dfh     (for the null hypothesis)
SSe/dfe = n σ̂²/dfe            (for the alternative hypothesis)

where n is the number of observations. Since the first sum of squares is a multiple of a χ² variable with dfh degrees of freedom, the second is a multiple of a χ² variable with dfe degrees of freedom, they are independent and the multiplicative factors are the same, their ratio

F = (SSh/dfh) / (SSe/dfe)

has an F-distribution with degrees of freedom (dfh, dfe). In the simplest case the degrees of freedom of the hypothesis are the number of levels of the factor minus 1.

The degrees of freedom of a hypothesis are defined by the number of constraints it implies. The degrees of freedom of the error are the number of observations minus the number of (estimated) parameters. Using this type of ANOVA table we can only tell whether there are significant differences between means; it does not tell us which ones are significantly different.
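As a rough sketch of this comparison in R (the data frame d, response y and factor f are hypothetical names, not taken from these notes), the same F statistic can be obtained by fitting the null and alternative models and comparing them:

# Null model: a common mean for all levels; alternative: one mean per level of f.
fit0 <- lm(y ~ 1, data = d)
fit1 <- lm(y ~ f, data = d)
anova(fit0, fit1)   # F statistic and p-value for comparing the two models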


Example: Two way ANOVA

Let us consider an example taken from Box, Hunter and Hunter. The experiment was carried out on animals: the survival times of the animals were recorded for various poisons and treatments.

The table is (rows are poisons, columns are treatments, four replicate survival times per cell):

poison        A                       B                       C                       D
  I     0.31 0.45 0.46 0.43   0.82 1.10 0.88 0.72   0.43 0.45 0.63 0.76   0.45 0.71 0.66 0.62
  II    0.36 0.29 0.40 0.23   0.92 0.61 0.49 1.24   0.44 0.35 0.31 0.40   0.56 1.02 0.71 0.38
  III   0.22 0.21 0.18 0.23   0.30 0.37 0.38 0.29   0.23 0.25 0.24 0.22   0.30 0.36 0.31 0.33

ANOVA table

ANOVA table produced by R:

            Df  Sum Sq Mean Sq F value    Pr(>F)
pois         2 1.03828 0.51914 22.5135 4.551e-07 ***
treat        3 0.92569 0.30856 13.3814 5.057e-06 ***
pois:treat   6 0.25580 0.04263  1.8489    0.1170
Residuals   36 0.83013 0.02306

The most important values are the F value and Pr(>F).
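A sketch of the R call that would produce a table of this form, assuming the data are held in a data frame poisons with columns time, poison and treat (the names used later in these notes):

# Two-way ANOVA with both main effects and the poison:treatment interaction.
lm0 <- aov(time ~ poison * treat, data = poisons)
summary(lm0)   # gives Df, Sum Sq, Mean Sq, F value and Pr(>F)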

In this table we have tests for the main effects - pois and treat. Moreover, we have the “interaction” between these two factors. If there are significant interactions then it is difficult to separate the effects of the two factors; they should be considered simultaneously. In this case the Pr(>F) for the interaction is not very small, but it is also not large enough to discard the interaction effects with confidence. In such situations a transformation of the observations might help.

Let us consider the ANOVA table for the transformed observations, using the transformation 1/y. Now the ANOVA table looks like:

            Df Sum Sq Mean Sq F value    Pr(>F)
pois         2 34.903  17.452 72.2347 2.501e-13 ***
treat        3 20.449   6.816 28.2131 1.457e-09 ***
pois:treat   6  1.579   0.263  1.0892    0.3874
Residuals   36  8.697   0.242

ANOVA table

According to the table, after the transformation the Pr(>F) corresponding to the interaction term is high. It means that the interaction for the transformed observations is not significant, so we could remove the interaction term and build the ANOVA table without it. It will look like:

            Df Sum Sq Mean Sq F value    Pr(>F)
pois         2 34.903  17.452  71.326 3.124e-14 ***
treat        3 20.449   6.816  27.858 4.456e-10 ***
Residuals   42 10.276   0.245

Now we can say that there are significant differences between poisons as well as between treatments.

Sometimes it is useful to apply a transformation to reduce the effect of interactions. Several different transformations (inverse, inverse square, log) can be tried. The Box and Cox transformation (boxcox command in R) may help to find a suitable transformation.
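A sketch of how the Box and Cox transformation could be explored for these data; the boxcox function lives in the MASS package and the poisons data frame is assumed as above:

library(MASS)
# Profile log-likelihood over the Box-Cox parameter lambda; a maximum near
# lambda = -1 would point to the reciprocal transformation 1/y.
bc <- boxcox(time ~ poison * treat, data = poisons)
bc$x[which.max(bc$y)]   # lambda with the highest profile likelihood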

Interactions

For one-way ANOVA we need 1) to find out if there is a significant effect (difference between means); 2) to analyse and find out where this difference is located. For two-way or higher-order ANOVA we first need to consider interactions. We would like to remove interactions if we can; otherwise tests for the main effects (differences between means of observations corresponding to different levels of the factors) may not be reliable. If we see significant interactions then we can try transformations of the data. The Box and Cox family of variance stabilising transformations is one way of removing interactions. If we find the necessary transformation then we can carry out further analysis using the transformed data, or use a GLM with the corresponding link function.
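One simple way to look for interactions graphically is an interaction plot; a minimal sketch for the poisons data (data frame and column names assumed as above):

# Roughly parallel lines suggest little interaction between the two factors.
with(poisons, interaction.plot(treat, poison, time,
                               xlab = "treatment", trace.label = "poison"))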

If interactions cannot be removed then we may need to combine all pairs of levels together and work with them as if we had a one-way ANOVA with IJ levels, where I is the number of levels of the first factor and J is the number of levels of the second factor.
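In R the two factors can be combined with interaction(); a minimal sketch, again assuming the poisons data frame:

# Treat every (poison, treatment) pair as one level of a single factor.
poisons$cell <- interaction(poisons$poison, poisons$treat)
summary(aov(time ~ cell, data = poisons))   # one-way ANOVA with I*J levels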

What to do if there is an effect

ANOVA tables tell us whether there is an effect somewhere, but they do not say where this effect is. If we conclude that there are significant differences then we should carry on and find out where these differences are. One way (not recommended) is to look at the confidence intervals produced by confint. For example, for the poison data:

                   2.5 %      97.5 %
(Intercept)   0.22201644  0.40631690
poison1      -0.09299276  0.01986776
poison2      -0.13414253 -0.06898247
treatB        0.23217989  0.49282011
treatC       -0.05198677  0.20865344
treatD        0.08967989  0.35032011

Since these intervals are calculated after deciding that there are significant main effects, they are meant to be more reliable (we already know that there are effects, so the effects we see in this table may correspond to actual effects). This approach is called Fisher's least significant differences - Fisher's LSD. Confidence intervals based directly on the t distribution are very optimistic when many tests are performed simultaneously.
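A sketch of the call that would give such intervals (additive model and data frame assumed as before; the exact coefficient labels depend on how the factors are coded):

# Ordinary (Fisher LSD style) confidence intervals for the coefficients of
# the additive model, looked at only after the overall F tests were significant.
lm1 <- lm(time ~ poison + treat, data = poisons)
confint(lm1)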

Contrasts

Let us say we have m parameters μ1, ..., μm to estimate, and we have a vector c with the property Σi ci = 0. Then we can build the hypothesis H0: Σi μi ci = 0. The quantity Σi μi ci is called a contrast. (If we are dealing with an unbalanced ANOVA then the condition Σi ni ci = 0 should be considered instead.) If we have an m x k matrix C then we can form k contrasts from the m elements. The matrix C is called the contrasts matrix. Usually this matrix is orthogonal, i.e.

Σj cij = 0 and Σj cij ckj = 0.

In ANOVA we are interested in effects, i.e. differences between the effects of different levels of some factors. These can be written using a contrasts matrix. The default contrasts matrix used in R - contr.treatment - uses this type of contrast matrix.

Note that R uses contrasts by default: during estimation it is usually not the parameters themselves but the quantities corresponding to the contrasts that are estimated.
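For illustration, some of the built-in contrasts matrices for a factor with four levels can be inspected directly in R:

contr.treatment(4)   # R's default: each level compared with the first
contr.sum(4)         # "sum to zero" contrasts
contr.helmert(4)     # an orthogonal contrasts matrix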

Finding where the effect is located is a multiple hypotheses testing problem.

Multiple comparison

• One comparison - use a t-test or equivalent
• Few comparisons - use Bonferroni
• Many comparisons - Tukey's honest significant differences, Holm, Scheffé or others

Bonferroni correction

If there is only one comparison then we can use a t-test or intervals based on the t distribution. However, as the number of tests increases, the probability that a significant effect will be observed when there is no real effect becomes very large. It can be calculated as 1-(1-α)^n, where α is the significance level and n is the number of comparisons. For example, if the significance level is 0.05 and the number of comparisons (tests) is 10, then the probability that at least one significant effect will be detected by chance is 1-(1-0.05)^10 ≈ 0.40.

Bonferroni suggested using α/n instead of α when designing simultaneous confidence intervals. It means that the intervals for μi - μj are calculated using

difference ± t(α/(2n)) × (se of comparison)

Clearly, when n becomes very large these intervals become very conservative.
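The numbers quoted above can be reproduced directly in R:

# Probability of at least one false positive in n independent tests at level alpha.
alpha <- 0.05; n <- 10
1 - (1 - alpha)^n   # approximately 0.40
alpha / n           # Bonferroni-adjusted level for each individual comparison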

The Bonferroni correction is recommended when only a few effects are compared.

pairwise.t.test in R can do Bonferroni correction.

If the Bonferroni correction is used then the p-values are multiplied by the number of comparisons. (Note that if we are testing the effects of I levels of a factor then the number of comparisons is I(I-1)/2.)

Bonferroni correction: Example

Let us take the example dataset - poisons and try Bonferroni correction for each factor:

pairwise.t.test(poisons$time, poisons$poison, p.adjust.method = "none")

  1       2
2 0.32844 -
3 3.3e-05 0.00074

pairwise.t.test(poisons$time, poisons$poison, p.adjust.method = "bonferroni")

  1      2
2 0.9853 -
3 1e-04  0.0022

As can be seen, each p-value is multiplied by the number of comparisons, 3*2/2 = 3. If the corresponding adjusted p-value becomes more than one then it is truncated to one. The table says that there are significant differences between the effects of poisons 1 and 3 and between poisons 2 and 3; the difference between the effects of poisons 1 and 2 is not significant. Note: the R command pairwise.t.test can be used for one-way ANOVA only.

Holm correction

Another correction for multiple tests, Holm's correction, is less conservative than the Bonferroni correction. It is a modification of the Bonferroni correction and works in a sequential manner.

Let us say we need to make n comparisons and the significance level is α. We calculate the p-values for all of the comparisons, sort them in ascending order and apply the following procedure:

1) Set i = 1.
2) If p_i < α/(n-i+1) then comparison i is significant, otherwise it is not.
3) If comparison number i is significant then increment i by one and, if i ≤ n, go to step 2.

The number of significant effects is the number of comparisons declared significant before the procedure stops.

When reporting p-values, the Holm correction works similarly to Bonferroni but in a sequential manner. If we have m comparisons then the smallest p-value is multiplied by m, the second smallest is multiplied by m-1, and in general the j-th smallest is multiplied by m-j+1.
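The same sequential adjustment can be reproduced with p.adjust; a small sketch using the unadjusted pairwise p-values for the poisons shown in the Bonferroni example above:

p <- c(0.32844, 3.3e-05, 0.00074)    # unadjusted pairwise p-values
p.adjust(p, method = "holm")         # 0.32844, 9.9e-05, 0.00148
p.adjust(p, method = "bonferroni")   # 0.98532, 9.9e-05, 0.00222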

Holm correction: example

Let us take the example - the data set poisons and try Holm correction for each factor:

pairwise.t.test(poisons$time, poisons$poison, p.adjust.method = "none")

  1       2
2 0.32844 -
3 3.3e-05 0.00074

pairwise.t.test(poisons$time, poisons$poison, p.adjust.method = "holm")   # this correction is the default in R

  1      2
2 0.3284 -
3 1e-04  0.0015

The smallest p-value is multiplied by 3, the second by 2 and the largest by 1. The table says that there are significant differences between the effects of poisons 1 and 3 and between poisons 2 and 3; the difference between the effects of poisons 1 and 2 is not significant.

Tukey’s honest significant difference

This test is used to calculate simultaneous confidence intervals for differences of all effects.

Tukey's range distribution: if we have a random sample of size N from a normal distribution, then the distribution of the studentised range, (max_i(x_i) - min_i(x_i))/sd, is called Tukey's distribution.

Let us say we want to test whether μi - μj is 0. For simultaneous 100(1-α)% confidence intervals we need to calculate, for all pairs, the lower and upper limits of the interval using:

difference ± q(α; l, ν) × sd × (1/Ji + 1/Jj)^0.5 / √2

where q(α; l, ν) is the α-quantile of Tukey's distribution, l is the number of levels to be compared, ν is the degrees of freedom used to calculate sd, Ji and Jj are the numbers of observations used to calculate the estimates of μi and μj, and sd is the standard deviation.

Tukey’s honest significant difference

The R command to perform this test is TukeyHSD. It takes an object produced by aov as input and gives confidence intervals for all possible differences. For example, for the poison data (to use this command you should carry out the analysis with aov):

lm1 = aov(time~poison+treat, data=poisons)
TukeyHSD(lm1)

$poison
        diff        lwr        upr     p adj
2-1 -0.073125 -0.2089936  0.0627436 0.3989657    # not significant
3-1 -0.341250 -0.4771186 -0.2053814 0.0000008    # significant
3-2 -0.268125 -0.4039936 -0.1322564 0.0000606    # significant

$treat
           diff         lwr         upr     p adj
B-A  0.36250000  0.18976135  0.53523865 0.0000083    # significant
C-A  0.07833333 -0.09440532  0.25107198 0.6221729    # not significant
D-A  0.22000000  0.04726135  0.39273865 0.0076661    # significant
C-B -0.28416667 -0.45690532 -0.11142802 0.0004090    # significant
D-B -0.14250000 -0.31523865  0.03023865 0.1380432    # not significant
D-C  0.14166667 -0.03107198  0.31440532 0.1416151    # not significant

Scheffe’s simultaneous confidence intervals

If we have a parameter vector β, then a linear combination c^T β is estimable if there exists a vector a such that E(a^T y) = c^T β. Scheffé's theorem states that a simultaneous 100(1-α)% confidence interval for all estimable c^T β is:

c^T β̂ ± (q F_{q,n-r}(α))^(1/2) (var(c^T β̂))^(1/2)

where q is the dimension of the space of all possible contrasts, r is the rank of X (the design matrix) and n is the number of observations. It can also be applied for regression surface confidence intervals:

x^T β̂ ± (q F_{q,n-r}(α))^(1/2) (σ̂² x^T (X^T X)^(-1) x)^(1/2)
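A rough sketch of computing a Scheffé-type simultaneous interval for one contrast by hand in R; the contrast vector, the value of q and the coefficient ordering are illustrative assumptions, not taken from these notes:

# Scheffe-style interval for the (hypothetical) contrast treatB - treatC,
# assuming default treatment contrasts so that the coefficients are
# (Intercept), two poison terms, treatB, treatC, treatD.
fit  <- lm(time ~ poison + treat, data = poisons)
cvec <- c(0, 0, 0, 1, -1, 0)
est  <- sum(cvec * coef(fit))
se   <- drop(sqrt(t(cvec) %*% vcov(fit) %*% cvec))
q    <- 3                                          # dimension of the contrast space (illustrative)
crit <- sqrt(q * qf(0.95, q, nrow(poisons) - fit$rank))
c(lower = est - crit * se, upper = est + crit * se)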

Testing for homogeneity of variances

One of the simplest tests for homogeneity of variances is Levene's test. It can be done using the following algorithm: 1) fit the linear model; 2) calculate the residuals; 3) fit a linear model using the absolute values of the residuals as the response vector; 4) test for differences. If the p-value is very small then the variances are not equal (not homogeneous), otherwise the variances are homogeneous. There is also an R command

levene.test

Another test for homogeneity of variances is Bartlett's test. It can be done using bartlett.test in R. Bartlett's test is sensitive to violations of the normality assumption; Levene's test is less sensitive to such violations.
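A minimal sketch of both tests in R; bartlett.test is in base R, while a Levene-type test is available as leveneTest in the car package (the exact command name depends on the package version). The poisons data frame is assumed as before:

# Bartlett's test: groups defined by the poison-treatment combinations.
bartlett.test(time ~ interaction(poison, treat), data = poisons)
# Levene-type test (more robust to departures from normality).
library(car)
leveneTest(time ~ poison * treat, data = poisons)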

Testing for homogeneity of variances

Example:

lm2 = lm(time~poison+treat, data=poisons)
summary(lm(abs(lm2$res)~poison+treat, data=poisons))

Residual standard error: 0.09448 on 42 degrees of freedom
Multiple R-squared: 0.2533, Adjusted R-squared: 0.1644
F-statistic: 2.849 on 5 and 42 DF, p-value: 0.02649

The p-value is 0.026, which is small enough (usually you would look at around 0.05), so the variances are not homogeneous. Repeating the test for the transformed response:

lm2 = lm(1/time~poison+treat, data=poisons)
summary(lm(abs(lm2$res)~poison+treat, data=poisons))

Residual standard error: 0.2634 on 42 degrees of freedom
Multiple R-squared: 0.1494, Adjusted R-squared: 0.04812
F-statistic: 1.475 on 5 and 42 DF, p-value: 0.2183

Now the p-value is sufficiently big, so we can say that the variances are homogeneous.

References

1. Stuart, A., Ord, K.J. and Arnold, S. (1999) Kendall's Advanced Theory of Statistics, Volume 2A.
2. Box, G.E.P., Hunter, W.G. and Hunter, J.S. (1978) Statistics for Experimenters.
3. Berthold, M.J. and Hand, D.J. Intelligent Data Analysis.
4. Cox, G.M. and Cochran, W.G. Experimental Design.
5. Dalgaard, P. Introductory Statistics with R.

Exercise 4

Take the data set from: http://www.ysbl.york.ac.uk/~garib/mres_course/2010/data4.txt

and analyse it using ANOVA.

Description of this data set is: http://www.ysbl.york.ac.uk/~garib/mres_course/2010/data4_descr.txt