
SPH 247 Statistical Analysis of Laboratory Data

April 2, 2013 SPH 247 Statistical Analysis of Laboratory Data 1

ANOVA—Fixed and Random Effects

- We will review the analysis of variance (ANOVA) and then move to random and fixed effects models
- Nested models are used to look at levels of variability (days within subjects, replicate measurements within days)
- Crossed models are often used when there are both fixed and random effects

The Basic Idea

- The analysis of variance is a way of testing whether observed differences between groups are too large to be explained by chance variation
- One-way ANOVA is used when there are k ≥ 2 groups for one factor, and no other quantitative variable or classification factor

Data

         A    B    C
         9   10   12
         7    9   14
         7    8   14
         9    9   12

Data = Grand Mean + Column Deviations from grand mean + Cell Deviations from column mean

Are the column deviations from the grand mean too big to be accounted for by the cell deviations from the column means?


Column Means

         A    B    C
         8    9   13
         8    9   13
         8    9   13
         8    9   13

Deviations from Column Means

         A    B    C
         1    1   -1
        -1    0    1
        -1   -1    1
         1    0   -1
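The decomposition can be checked numerically. The following is a minimal Python sketch (standard library only, written for this summary rather than taken from the course materials) that computes the grand mean, the column means, the two sums of squares and the one-way F statistic for the toy A/B/C data.

```python
# One-way ANOVA by hand for the toy data (three columns, four cells each)
data = {
    "A": [9, 7, 7, 9],
    "B": [10, 9, 8, 9],
    "C": [12, 14, 14, 12],
}

all_values = [y for col in data.values() for y in col]
grand_mean = sum(all_values) / len(all_values)
col_means = {k: sum(v) / len(v) for k, v in data.items()}

# Between columns: (column size) * (column mean - grand mean)^2, summed over columns
ss_between = sum(len(v) * (col_means[k] - grand_mean) ** 2 for k, v in data.items())
# Within columns: squared cell deviations from their own column mean
ss_within = sum((y - col_means[k]) ** 2 for k, v in data.items() for y in v)

k = len(data)
N = len(all_values)
df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print("grand mean:", grand_mean, "column means:", col_means)
print("SS between:", ss_between, "SS within:", ss_within, "F:", round(f_stat, 3))
```

With a grand mean of 10 and column means 8, 9 and 13, the between-column sum of squares is 4 × (4 + 1 + 9) = 56 on 2 df, the within-column sum of squares is 10 on 9 df, and F = 28 / (10/9) = 25.2, so the column deviations are far larger than the cell deviations would explain.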

red.cell.folate

Red cell folate data (package ISwR, R Documentation)

Description: The 'folate' data frame has 22 rows and 2 columns. It contains data on red cell folate levels in patients receiving three different methods of ventilation during anesthesia.

Format: This data frame contains the following columns:
- folate: a numeric vector. Folate concentration (μg/l).
- ventilation: a factor with levels 'N2O+O2,24h': 50% nitrous oxide and 50% oxygen, continuously for 24 hours; 'N2O+O2,op': 50% nitrous oxide and 50% oxygen, only during operation; 'O2,24h': no nitrous oxide, but 35-50% oxygen for 24 hours.

    > library(ISwR)
    > data(red.cell.folate)
    > help(red.cell.folate)
    > summary(red.cell.folate)
         folate          ventilation
     Min.   :206.0   N2O+O2,24h:8
     1st Qu.:249.5   N2O+O2,op :9
     Median :274.0   O2,24h    :5
     Mean   :283.2
     3rd Qu.:305.5
     Max.   :392.0
    > attach(red.cell.folate)
    > plot(folate ~ ventilation)

[Figure: box plots of folate by ventilation (N2O+O2,24h, N2O+O2,op, O2,24h)]

    > folate.lm <- lm(folate ~ ventilation)
    > summary(folate.lm)

    Call:
    lm(formula = folate ~ ventilation)

    Residuals:
        Min      1Q  Median      3Q     Max
    -73.625 -35.361  -4.444  35.625  75.375

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            316.62      16.16  19.588 4.65e-14 ***
    ventilationN2O+O2,op   -60.18      22.22  -2.709   0.0139 *
    ventilationO2,24h      -38.62      26.06  -1.482   0.1548
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 45.72 on 19 degrees of freedom
    Multiple R-Squared: 0.2809,     Adjusted R-squared: 0.2052
    F-statistic: 3.711 on 2 and 19 DF,  p-value: 0.04359

    > anova(folate.lm)
    Analysis of Variance Table

    Response: folate
                Df Sum Sq Mean Sq F value  Pr(>F)
    ventilation  2  15516    7758  3.7113 0.04359 *
    Residuals   19  39716    2090
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
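A short Python sketch (standard library only) recomputing the F test from the sums of squares in the anova() table. It relies on the fact that when the numerator has exactly 2 degrees of freedom, the upper tail of the F distribution has the closed form (1 + 2f/df2)^(−df2/2); for general degrees of freedom one would use pf(f, df1, df2, lower.tail = FALSE) in R instead.

```python
# Recompute the folate F test from the printed sums of squares
ss_vent, df_vent = 15516.0, 2     # ventilation
ss_resid, df_resid = 39716.0, 19  # residuals

ms_vent = ss_vent / df_vent       # mean square for ventilation
ms_resid = ss_resid / df_resid    # residual mean square (error variance)
f_stat = ms_vent / ms_resid

# Closed-form upper tail, valid only when the numerator df is exactly 2
p_value = (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)

r_squared = ss_vent / (ss_vent + ss_resid)
resid_se = ms_resid ** 0.5        # residual standard error

print(f"F = {f_stat:.4f}, p = {p_value:.5f}")
print(f"R^2 = {r_squared:.4f}, residual SE = {resid_se:.2f}")
```

The results agree with the summary(folate.lm) output to the printed precision (small differences come from the rounded sums of squares).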

Two- and Multi-way ANOVA

- If there is more than one factor, the sum of squares can be decomposed according to each factor, and possibly according to interactions
- One can also have factors and quantitative variables in the same model (cf. analysis of covariance)
- All have similar interpretations

Heart rates after enalaprilat (ACE inhibitor)

Description: The heart.rate data frame has 36 rows and 3 columns. It contains data for nine patients with congestive heart failure before and shortly after administration of enalaprilat, in a balanced two-way layout.

Format:
- hr: a numeric vector. Heart rate in beats per minute.
- subj: a factor with levels '1' to '9'.
- time: a factor with levels '0' (before), '30', '60', and '120' (minutes after administration).

    > data(heart.rate)
    > attach(heart.rate)
    > heart.rate
        hr subj time
    1   96    1    0
    2  110    2    0
    3   89    3    0
    4   95    4    0
    5  128    5    0
    6  100    6    0
    7   72    7    0
    8   79    8    0
    9  100    9    0
    10  92    1   30
    ...
    18 106    9   30
    19  86    1   60
    ...
    27 104    9   60
    28  92    1  120
    ...
    36 102    9  120

    > plot(hr ~ subj)
    > plot(hr ~ time)
    > hr.lm <- lm(hr ~ subj + time)
    > anova(hr.lm)
    Analysis of Variance Table

    Response: hr
              Df Sum Sq Mean Sq F value    Pr(>F)
    subj       8 8966.6  1120.8 90.6391 4.863e-16 ***
    time       3  151.0    50.3  4.0696   0.01802 *
    Residuals 24  296.8    12.4
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    > sres <- resid(lm(hr ~ subj))
    > plot(sres ~ time)

Note that because this balanced design is orthogonal, the ANOVA results do not depend on the order of the terms in the model. The residuals sres have the subject effects removed, so plotting them against time shows the time effect without the large between-subject variation.

[Figure: box plots of hr by subj (1 to 9)]

[Figure: box plots of hr by time (0, 30, 60, 120)]

[Figure: box plots of the subject-adjusted residuals sres by time (0, 30, 60, 120)]
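The split of the total sum of squares into subject, time and residual parts can be written out by hand for a balanced table with one observation per cell. A minimal Python sketch on a small HYPOTHETICAL 3 × 4 table (not the heart.rate data, which is only partly listed here): in a complete balanced layout the three parts add up exactly to the total sum of squares, which is why the order of terms does not matter.

```python
# HYPOTHETICAL balanced layout: rows = subjects, columns = time points
rows = [
    [70, 74, 72, 71],
    [85, 88, 84, 86],
    [60, 66, 63, 62],
]
n_subj, n_time = len(rows), len(rows[0])
grand = sum(map(sum, rows)) / (n_subj * n_time)
subj_means = [sum(r) / n_time for r in rows]
time_means = [sum(r[j] for r in rows) / n_subj for j in range(n_time)]

ss_total = sum((y - grand) ** 2 for r in rows for y in r)
ss_subj = n_time * sum((m - grand) ** 2 for m in subj_means)
ss_time = n_subj * sum((m - grand) ** 2 for m in time_means)
# Residual: what is left after removing both additive main effects
ss_resid = sum(
    (rows[i][j] - subj_means[i] - time_means[j] + grand) ** 2
    for i in range(n_subj)
    for j in range(n_time)
)

df_subj, df_time = n_subj - 1, n_time - 1
df_resid = df_subj * df_time
ms_resid = ss_resid / df_resid
f_subj = (ss_subj / df_subj) / ms_resid
f_time = (ss_time / df_time) / ms_resid
print(f"SS: subj {ss_subj:.2f}, time {ss_time:.2f}, resid {ss_resid:.2f}, total {ss_total:.2f}")
print(f"F(subj) = {f_subj:.2f} on {df_subj},{df_resid}; F(time) = {f_time:.2f} on {df_time},{df_resid}")
```

In R the same table comes from anova(lm(y ~ subj + time)); the hand calculation is only meant to show where the numbers come from.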

Fixed and Random Effects

- A fixed effect is a factor whose levels can be duplicated (e.g. dosage of a drug)
- A random effect is one whose levels cannot be duplicated, for example:
  - Patient/subject
  - Repeated measurement
- The analysis of data with random effects can differ in important ways from an analysis with fixed effects only
- The error term is always a random effect

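Fixed Effect One-way ANOVA

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad \sum_{i=1}^{a} \alpha_i = 0, \qquad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$$

$$E(\mathrm{MSE}) = \sigma_\epsilon^2, \qquad E(\mathrm{MSA}) = \sigma_\epsilon^2 + \frac{n \sum_{i=1}^{a} \alpha_i^2}{a - 1}$$

$$H_0: \alpha_1 = \cdots = \alpha_a = 0, \qquad \frac{\mathrm{MSA}}{\mathrm{MSE}} \sim F\big(a - 1,\ a(n - 1)\big) \text{ under the null}$$

Here a is the number of levels of the factor and n is the number of replicates per level.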
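The expected mean square formula can be checked by simulation. A minimal Python sketch (standard library only), assuming μ = 10, σ_ε = 1, n = 4 replicates and fixed effects α = (−2, −1, 3), which are the column deviations of the toy A/B/C data; these values are illustrative choices, not part of the original slides. Averaged over many simulated data sets, MSA should be close to 1 + 4 × 14/2 = 29 and MSE close to 1.

```python
import random
import statistics


def one_way_ms(groups):
    """Return (MSA, MSE) for a balanced one-way layout (list of equal-length lists)."""
    a, n = len(groups), len(groups[0])
    grand = statistics.fmean(y for g in groups for y in g)
    means = [statistics.fmean(g) for g in groups]
    msa = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return msa, mse


random.seed(1)
mu, sigma = 10.0, 1.0
alpha = [-2.0, -1.0, 3.0]  # assumed fixed effects (they sum to zero)
n = 4                       # replicates per level
a = len(alpha)

reps = 20000
sum_msa = sum_mse = 0.0
for _ in range(reps):
    groups = [[mu + al + random.gauss(0, sigma) for _ in range(n)] for al in alpha]
    msa, mse = one_way_ms(groups)
    sum_msa += msa
    sum_mse += mse

mean_msa, mean_mse = sum_msa / reps, sum_mse / reps
expected_msa = sigma**2 + n * sum(al**2 for al in alpha) / (a - 1)
print(f"average MSA {mean_msa:.2f} (theory {expected_msa}), average MSE {mean_mse:.3f} (theory {sigma**2})")
```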

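Random Effect One-way ANOVA

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad \alpha_i \sim N(0, \sigma_\alpha^2), \qquad \epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$$

$$E(\mathrm{MSE}) = \sigma_\epsilon^2, \qquad E(\mathrm{MSA}) = \sigma_\epsilon^2 + n\,\sigma_\alpha^2$$

where n is the number of replicates per level of α.

$$H_0: \sigma_\alpha^2 = 0, \qquad \frac{\mathrm{MSA}}{\mathrm{MSE}} \sim F\big(a - 1,\ a(n - 1)\big) \text{ under the null}, \qquad \hat\sigma_\alpha^2 = \frac{\mathrm{MSA} - \mathrm{MSE}}{n}$$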
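To see how the moment estimator behaves, here is a small simulation sketch in Python, assuming a = 5 levels, n = 2 replicates and σ_α = σ_ε = 1 (illustrative values, not from the slides). The estimator is unbiased on average, but because it is a difference of two mean squares it comes out negative in a noticeable fraction of runs; such negative estimates are often truncated at zero in practice.

```python
import random
import statistics


def one_way_ms(groups):
    """Return (MSA, MSE) for a balanced one-way layout (list of equal-length lists)."""
    a, n = len(groups), len(groups[0])
    grand = statistics.fmean(y for g in groups for y in g)
    means = [statistics.fmean(g) for g in groups]
    msa = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return msa, mse


random.seed(2)
a, n = 5, 2                         # assumed: 5 levels, 2 replicates per level
sigma_alpha, sigma_eps = 1.0, 1.0   # assumed true standard deviations

estimates = []
for _ in range(20000):
    groups = []
    for _ in range(a):
        effect = random.gauss(0, sigma_alpha)  # a fresh random level effect
        groups.append([effect + random.gauss(0, sigma_eps) for _ in range(n)])
    msa, mse = one_way_ms(groups)
    estimates.append((msa - mse) / n)  # moment estimator of sigma_alpha^2

mean_estimate = statistics.fmean(estimates)
share_negative = sum(e < 0 for e in estimates) / len(estimates)
print(f"mean estimate {mean_estimate:.3f} (true {sigma_alpha**2}), negative in {share_negative:.1%} of runs")
```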

Estradiol data from Rosner

- 5 subjects from the Nurses' Health Study
- One blood sample each
- Each sample assayed twice for estradiol (and three other hormones)
- The within variability is strictly technical/assay variability
- Variability within a person over time will be much greater


    > anova(lm(Estradiol ~ Subject, data = endocrin))
    Analysis of Variance Table

    Response: Estradiol
              Df Sum Sq Mean Sq F value   Pr(>F)
    Subject    4 593.31 148.329  24.546 0.001747 **
    Residuals  5  30.21   6.043
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The replication error variance is 6.043, so the standard deviation of replicates is 2.46 pg/mL. This compares with average levels across subjects ranging from 8.05 to 18.80 pg/mL.

The estimated variance across subjects is (148.329 − 6.043)/2 = 71.143.

The standard deviation across subjects is 8.43 pg/mL. If we average the replicates, we get five values whose variance is 148.329/2 = 74.16; this is slightly larger than 71.143 because it also includes half of the replication variance (6.043/2 = 3.02).
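The same arithmetic as a short Python sketch (standard library only), taking the mean squares from the anova() output and r = 2 assays per blood sample:

```python
import math

# Mean squares from anova(lm(Estradiol ~ Subject, data = endocrin))
ms_subject = 148.329  # between subjects, 4 df
ms_error = 6.043      # between duplicate assays, 5 df
r = 2                 # assays per blood sample

var_assay = ms_error                       # E(MSE) = sigma_e^2
var_subject = (ms_subject - ms_error) / r  # E(MS_subject) = sigma_e^2 + r * sigma_s^2
sd_assay = math.sqrt(var_assay)
sd_subject = math.sqrt(var_subject)
print(f"assay SD {sd_assay:.2f} pg/mL, subject variance {var_subject:.3f}, subject SD {sd_subject:.2f} pg/mL")
```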

Fasting Blood Glucose

- Part of a larger study that also examined glucose tolerance during pregnancy
- Here we have 53 subjects with 6 tests each, at intervals of at least a year
- The response is glucose in mg/100mL

    > anova(lm(FG ~ Subject, data = fg2))
    Analysis of Variance Table

    Response: FG
               Df Sum Sq Mean Sq F value    Pr(>F)
    Subject    52  10936 210.310  2.9235 9.717e-09 ***
    Residuals 265  19064  71.938
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

- The estimated within-subject variance is 71.938, so the standard deviation is 8.48 mg/100mL
- The estimated between-subject variance is (210.310 − 71.938)/6 = 23.062, with sd = 4.80 mg/100mL
- The variance of the 53 subject means is 210.310/6 = 35.05, which is larger because it includes a component of the within-subject variance (71.938/6 = 11.99)
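A short Python sketch of the glucose calculation. It also checks the identity behind the last point: in a balanced design, the variance of the subject means estimates σ²_between + σ²_within/6, which equals the Subject mean square divided by 6.

```python
import math

# Mean squares from anova(lm(FG ~ Subject, data = fg2))
ms_between = 210.310  # Subject, 52 df
ms_within = 71.938    # Residuals, 265 df
n_tests = 6           # tests per subject

var_within = ms_within
var_between = (ms_between - ms_within) / n_tests

# Variance of one subject's average of n_tests values
var_subject_mean = var_between + var_within / n_tests

print(f"within SD {math.sqrt(var_within):.2f}, between SD {math.sqrt(var_between):.2f} mg/100mL")
print(f"variance of subject means {var_subject_mean:.2f} = {var_between:.3f} + {var_within / n_tests:.2f}")
```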

Nested Random Effects Models

- Cooperative trial with 6 laboratories, one analyte (7 in the full data set), 3 batches per lab (a month apart), and 2 replicates per batch
- Estimate the variance components due to labs, batches, and replicates
- Test for significance if possible
- Effects are lab, batch within lab, and error

Analysis using lm or aov

    > anova(lm(Conc ~ Lab + Lab:Bat, data = coop2))
    Analysis of Variance Table

    Response: Conc
              Df  Sum Sq Mean Sq F value    Pr(>F)
    Lab        5 1.89021 0.37804 60.0333 1.354e-10 ***
    Lab:Bat   12 0.20440 0.01703  2.7049   0.02768 *
    Residuals 18 0.11335 0.00630

The test for batch within lab is correct, but the test for lab is not: its denominator should be the Lab:Bat mean square, so F(5, 12) = 0.37804/0.01703 = 22.198, giving p ≈ 1.1e-5, still significant.

    Effect      Variance     SD
    Residual    0.00630      0.0794
    Batch       0.00537      0.0733
    Lab         0.0602       0.2453
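The corrected test needs an F(5, 12) tail probability. In R this is pf(22.198, 5, 12, lower.tail = FALSE); the sketch below is a standard-library Python equivalent, assuming the power-series form of the regularized incomplete beta function (adequate for these small degrees of freedom, not a production-grade implementation). As a sanity check, it should reproduce the p-values R printed for the folate F test and for the batch-within-lab test.

```python
import math


def reg_inc_beta(x, a, b, tol=1e-15, max_terms=100_000):
    """Regularized incomplete beta I_x(a, b) from the power series
    I_x(a, b) = x^a (1 - x)^b / (a B(a, b)) * 2F1(a + b, 1; a + 1; x)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1) / (a + b + 2):
        # The series is slow near x = 1, so use I_x(a, b) = 1 - I_{1-x}(b, a)
        return 1.0 - reg_inc_beta(1.0 - x, b, a, tol, max_terms)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_prefactor = a * math.log(x) + b * math.log1p(-x) - math.log(a) - log_beta
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + b + k) / (a + 1 + k) * x  # ratio of successive series terms
        total += term
        if term < tol * total:
            break
    return math.exp(log_prefactor) * total


def f_upper_tail(f, d1, d2):
    """P(F > f) for an F distribution with d1 and d2 degrees of freedom."""
    return reg_inc_beta(d2 / (d2 + d1 * f), d2 / 2, d1 / 2)


# Lab effect tested against the Lab:Bat mean square, not the residual
ms_lab, ms_lab_bat = 0.37804, 0.01703
f_lab = ms_lab / ms_lab_bat
p_lab = f_upper_tail(f_lab, 5, 12)
print(f"F(5, 12) = {f_lab:.3f}, p = {p_lab:.2e}")
```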

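Expected Mean Squares

With l laboratories, b batches per laboratory, and r replicates per batch:

$$E(\mathrm{MS}_L) = br\,\sigma_L^2 + r\,\sigma_B^2 + \sigma_e^2 \qquad \text{(laboratories)}$$

$$E(\mathrm{MS}_B) = r\,\sigma_B^2 + \sigma_e^2 \qquad \text{(batches within laboratories)}$$

$$E(\mathrm{MSE}) = \sigma_e^2 \qquad \text{(replicates within batches)}$$

Solving for the variance components gives the estimates

$$\hat\sigma_e^2 = \mathrm{MSE}, \qquad \hat\sigma_B^2 = \frac{\mathrm{MS}_B - \mathrm{MSE}}{r}, \qquad \hat\sigma_L^2 = \frac{\mathrm{MS}_L - \mathrm{MS}_B}{br}$$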
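Plugging the coop2 mean squares into these estimators is simple arithmetic. A minimal Python sketch, taking the mean squares from the lm ANOVA table and b = 3, r = 2 from the design; the resulting SDs (about 0.0794, 0.0733 and 0.2453) agree with what lme reports for this balanced design.

```python
import math

# Mean squares from anova(lm(Conc ~ Lab + Lab:Bat, data = coop2))
ms_lab, ms_batch, ms_error = 0.37804, 0.01703, 0.00630
b, r = 3, 2  # batches per laboratory, replicates per batch

var_e = ms_error                         # E(MSE)  = sigma_e^2
var_batch = (ms_batch - ms_error) / r    # E(MS_B) = r sigma_B^2 + sigma_e^2
var_lab = (ms_lab - ms_batch) / (b * r)  # E(MS_L) = b r sigma_L^2 + r sigma_B^2 + sigma_e^2

for name, var in [("Residual", var_e), ("Batch", var_batch), ("Lab", var_lab)]:
    print(f"{name:8s} variance {var:.5f}  SD {math.sqrt(var):.4f}")
```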

Analysis using lme

- Uses the R package nlme
- Two separate formulas, one for the fixed effects and one for the random effects
- In this case, the only fixed effect is the overall mean (intercept)
- Nested random effects use the / notation:

    library(nlme)
    lme(Conc ~ 1, random = ~1 | Lab/Bat, data = coop2)

    lme(Conc ~ 1, random = ~1 | Lab/Bat, data = coop2)
    Linear mixed-effects model fit by REML
      Data: coop2
      Log-restricted-likelihood: 21.02158
      Fixed: Conc ~ 1
    (Intercept)
      0.5080556

    Random effects:
     Formula: ~1 | Lab
            (Intercept)
    StdDev:   0.2452922

     Formula: ~1 | Bat %in% Lab
            (Intercept)   Residual
    StdDev:  0.07326702 0.07935504

    Number of Observations: 36
    Number of Groups:
             Lab Bat %in% Lab
               6           18

The intercept (0.5080556) is the average concentration; the Lab StdDev (0.2453) is the SD of labs; and the Bat %in% Lab and Residual StdDevs (0.0733 and 0.0794) are the SDs of batches and of replicates.

Hypothesis Tests

- When data are balanced, one can compute expected mean squares, and in many cases construct a valid F test
- In more complex cases, or when data are unbalanced, this is more difficult
- One requirement for certain hypothesis tests to be valid is that the null hypothesis value is not on the edge (boundary) of the possible parameter values
  - For H0: α = 0, α could be either positive or negative, so 0 is an interior value
  - For H0: σ² = 0, negative variances are not possible, so 0 is on the boundary

    Effect      Variance     SD
    Residual    0.00630      0.0794
    Batch       0.00537      0.0733
    Lab         0.0602       0.2453

- The variance among replicates a month apart (0.00630 + 0.00537 = 0.01167) is about twice that of replicates on the same day (0.00630), and the standard deviations are 0.1080 and 0.0794. Relative to the average concentration of 0.508, these are CVs of about 21% and 16%, respectively
- The variance among values from different labs is about 0.00630 + 0.00537 + 0.0602 = 0.0718, with a standard deviation of 0.268 and a CV of about 53%
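The CVs come from summing the relevant variance components, taking the square root, and dividing by the average concentration. A minimal Python sketch, assuming the component estimates in the table and the lme intercept 0.5080556 as the average; the scenario labels are illustrative.

```python
import math

grand_mean = 0.5080556  # lme fixed intercept: average concentration
var_e, var_batch, var_lab = 0.00630, 0.00537, 0.06017  # estimated variance components

scenarios = {
    "same batch": var_e,
    "same lab, different batch": var_e + var_batch,
    "different labs": var_e + var_batch + var_lab,
}
results = {}
for label, var in scenarios.items():
    sd = math.sqrt(var)
    cv = sd / grand_mean  # coefficient of variation relative to the average level
    results[label] = (var, sd, cv)
    print(f"{label:26s} variance {var:.5f}  SD {sd:.4f}  CV {cv:.0%}")
```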

More complex models

- When data are balanced and the expected mean squares can be computed, this is a valid way to do testing and estimation
- Programs like lme and lmer in R and PROC MIXED in SAS can handle complex models
- But this is probably a time when you should consult an expert