Statistics review

Basic concepts:

• Variability measures
• Distributions
• Hypotheses
• Types of error

Common analyses:

• T-tests
• One-way ANOVA
• Randomized block ANOVA
• Two-way ANOVA

The t-test

Asks: do two samples come from different populations?

[Figure: DATA from samples A and B. Ho: NO, they do not come from different populations; otherwise YES.]

The t-test

Depends on whether the difference between samples is much greater than difference within sample.

[Figure: samples A and B; between >> within…]

[Figure: samples A and B; between < within…]

The t-test

T-statistic = (Difference between means) / (Standard error within each sample)

Standard error:

$$ \sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}} $$

The t-test

How many degrees of freedom?

$$ (n_1 - 1) + (n_2 - 1) $$

Why does this seem familiar?

$$ \sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}} $$
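The two formulas above can be put together in a short sketch. It assumes that s² is a pooled variance shared by both samples, which is one reading of the slide (s² carries no sample subscript, and the degrees of freedom pool both samples). The function name and the data are made up for illustration.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test with a pooled variance."""
    n1, n2 = len(sample_a), len(sample_b)
    df = (n1 - 1) + (n2 - 1)
    # Assumption: s^2 is the pooled variance, i.e. the within-sample
    # sums of squares added together and divided by the pooled df.
    s2 = ((n1 - 1) * variance(sample_a) + (n2 - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(s2 / n1 + s2 / n2)
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Hypothetical data, invented for this example only.
sample_a = [4.0, 5.0, 6.0]
sample_b = [1.0, 2.0, 3.0]
t, df = pooled_t(sample_a, sample_b)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t is then compared with a critical value for the same df.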

T-tables

v          0.10     0.05     0.025
1          3.078    6.314    12.706
2          1.886    2.920    4.303
3          1.638    2.353    3.182
4          1.533    2.132    2.776
infinity   1.282    1.645    1.960

Careful! This table is built for one-tailed tests. Only common stats table where to do a two-tailed test (A…

T-tables

Two samples, each n=3, with t-statistic of 2.50: significantly different?

No! With v = (3-1) + (3-1) = 4, a two-tailed test at alpha = 0.05 uses the 0.025 column, where the critical value is 2.776, and 2.50 < 2.776.
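The table lookup for this question can be sketched in a few lines. The critical values are copied from the t-table in these slides; the dictionary layout and the function name are assumptions made for the example, and only the df rows shown in the table are supported.

```python
# One-tailed critical values of t, copied from the t-table in the slides.
# Outer key: degrees of freedom (v). Inner key: one-tailed alpha.
T_TABLE = {
    1: {0.10: 3.078, 0.05: 6.314, 0.025: 12.706},
    2: {0.10: 1.886, 0.05: 2.920, 0.025: 4.303},
    3: {0.10: 1.638, 0.05: 2.353, 0.025: 3.182},
    4: {0.10: 1.533, 0.05: 2.132, 0.025: 2.776},
    float("inf"): {0.10: 1.282, 0.05: 1.645, 0.025: 1.960},
}


def is_significant(t_stat, df, alpha=0.05, two_tailed=True):
    """Compare |t| with the tabulated critical value."""
    # The table is one-tailed, so a two-tailed test reads the alpha / 2 column.
    column = round(alpha / 2, 3) if two_tailed else alpha
    return abs(t_stat) > T_TABLE[df][column]


df = (3 - 1) + (3 - 1)  # two samples, each n = 3
print(is_significant(2.50, df))                    # two-tailed, alpha = 0.05
print(is_significant(2.50, df, two_tailed=False))  # one-tailed, alpha = 0.05
```

The same t-statistic falls short of 2.776 in the two-tailed test but exceeds 2.132 in the one-tailed test, which is why the choice of column matters.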

If you have two samples with similar n and S.E., why do you know instantly that they are not significantly different if their error bars overlap?

• the difference in means < 2 x S.E., i.e. t-statistic < 2
• and, for any df, t must be > 1.96 to be significant!
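The argument can be written out as a short derivation. It assumes the error bars are plus or minus one S.E. and that both samples have the same S.E.; under those assumptions the bound on t is even tighter than 2.

```latex
\begin{aligned}
\mathrm{SE}_{\text{diff}} &= \sqrt{\mathrm{SE}^2 + \mathrm{SE}^2} = \sqrt{2}\,\mathrm{SE} \\
\text{bars overlap} &\;\Rightarrow\; |\bar{x}_1 - \bar{x}_2| < 2\,\mathrm{SE} \\
t = \frac{|\bar{x}_1 - \bar{x}_2|}{\mathrm{SE}_{\text{diff}}} &< \frac{2\,\mathrm{SE}}{\sqrt{2}\,\mathrm{SE}} = \sqrt{2} \approx 1.41 < 1.96
\end{aligned}
```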

One-way ANOVA

General form of the t-test, can have more than 2 samples

Ho: All samples the same…
Ha: At least one sample different

[Figure: DATA for samples A, B and C, shown under Ho and under Ha]

One-way ANOVA

Just like t-test, compares differences between samples to differences within samples

T-test statistic (t) = (Difference between means) / (Standard error within sample)

ANOVA statistic (F) = (MS between groups) / (MS within group)

Mean squares:

MS = Sum of squares / df

Everyone gets a lot of cake (high MS) when:

• Lots of cake (high SS)
• Few forks (low df)

Mean squares:

MS = Sum of squares / df

Analogous to variance

Variance:

$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} $$

Numerator: sum of squared differences. Denominator: df.
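A small sketch makes the analogy concrete: computing a mean square by hand as SS / df gives the same number as the sample variance. The function name and the data are invented for illustration.

```python
from statistics import variance


def mean_square(values):
    """Sum of squared differences from the mean, divided by df = n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    sum_of_squares = sum((x - x_bar) ** 2 for x in values)
    df = n - 1
    return sum_of_squares / df


# Hypothetical data: mean 4, sum of squares 8, df 3.
data = [2.0, 4.0, 4.0, 6.0]
print(mean_square(data))  # hand-computed SS / df
print(variance(data))     # standard-library sample variance
```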

ANOVA tables

Source                       df      SS    MS                F         p
Treatment (between groups)   df(X)   SSX   MSX = SSX/df(X)   MSX/MSE   Look up!
Error (within groups)        df(E)   SSE   MSE = SSE/df(E)
Total                        df(T)   SST

SST = SSX + SSE

$$ F = \frac{MSX}{MSE} = \frac{SSX / df(X)}{SSE / df(E)} $$
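As a rough illustration of how the table entries relate to raw data, the sketch below fills in df, SS, MS and F for a list of samples. The function name, the dictionary keys and the data are assumptions for this example; the p-value is left to a table lookup, as on the slide.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # SSX: squared distance of each group mean from the grand mean,
    # counted once per observation in that group.
    ssx = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # SSE: squared distance of each observation from its own group mean.
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # SST: computed as if there were no groups at all.
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    df_x = len(groups) - 1
    df_e = len(all_values) - len(groups)
    msx, mse = ssx / df_x, sse / df_e
    return {
        "df_x": df_x, "df_e": df_e, "df_t": df_x + df_e,
        "ssx": ssx, "sse": sse, "sst": sst,
        "msx": msx, "mse": mse, "f": msx / mse,
    }


# Hypothetical data: three samples of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
table = one_way_anova(groups)
for name, value in table.items():
    print(name, round(value, 3))
```

Computing SST separately is a useful check, since it should equal SSX + SSE up to rounding.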

Do three species of palms differ in growth rate? We have 5 observations per species. Complete the table!

Source                       df       SS    MS   F   p
Treatment (between groups)            69
Error (within groups)        k(n-1)
Total                                 104

Hint: For the total df, remember that we calculate total SS as if there are no groups (total variance)…

Note: treatment df always k-1

Source                       df   SS    MS     F      p
Treatment (between groups)   2    69    34.5   11.8   ?
Error (within groups)        12   35    2.92
Total                        14   104

Is it significant? At alpha = 0.05, F(2,12) = 3.89
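The arithmetic behind the completed table can be checked with a few lines. The SS values, group counts and the critical F come from the exercise; the variable names are chosen for this sketch.

```python
k, n = 3, 5                            # 3 palm species, 5 observations each
ss_treatment, ss_total = 69.0, 104.0   # the two SS values given in the exercise

df_treatment = k - 1          # treatment df is always k - 1
df_error = k * (n - 1)        # error df is k(n - 1)
df_total = k * n - 1          # total df, as if there were no groups

ss_error = ss_total - ss_treatment        # because SST = SSX + SSE
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error
f_stat = ms_treatment / ms_error

F_CRITICAL = 3.89             # F(2,12) at alpha = 0.05, as quoted on the slide
significant = f_stat > F_CRITICAL

print(df_treatment, df_error, df_total)
print(ss_error, ms_treatment, round(ms_error, 2), round(f_stat, 1))
print("significant at alpha = 0.05:", significant)
```

Because the F-statistic is larger than the critical value, the p-value left as "?" in the table must be below 0.05.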

Randomized block

[Figure: three blocks: BLOCK A (good patch), BLOCK B (medium patch), BLOCK C (poor patch)]

Pro: Can remove between-block SS from error SS… may increase power of test

Con: Blocks use up error degrees of freedom

[Figure: SS partitioned into Treatment and Error without blocks, and into Treatment, Block and Error with blocks]

Do the benefits outweigh the costs? Does MS error go down?

F = (Treatment SS / treatment df) / (Error SS / error df)
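The trade-off can be explored numerically. The sketch below assumes a design with exactly one observation per treatment in each block, and compares the F-statistic when blocks are ignored with the F-statistic when the block SS is removed from the error SS. The function name and the yields are invented for illustration.

```python
def randomized_block_f(data):
    """data[b][t] is the single observation for block b under treatment t.

    Returns (F ignoring blocks, F with block SS removed from the error SS).
    """
    n_blocks, n_treat = len(data), len(data[0])
    values = [x for row in data for x in row]
    grand = sum(values) / len(values)

    ss_total = sum((x - grand) ** 2 for x in values)
    treat_means = [sum(row[t] for row in data) / n_blocks for t in range(n_treat)]
    block_means = [sum(row) / n_treat for row in data]
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)

    df_treat = n_treat - 1
    # Blocks ignored: block-to-block variation stays inside the error term.
    df_err_plain = n_treat * (n_blocks - 1)
    f_plain = (ss_treat / df_treat) / ((ss_total - ss_treat) / df_err_plain)
    # Blocks used: block SS leaves the error SS, but costs error df.
    df_err_blocked = df_err_plain - (n_blocks - 1)
    ss_err_blocked = ss_total - ss_treat - ss_block
    f_blocked = (ss_treat / df_treat) / (ss_err_blocked / df_err_blocked)
    return f_plain, f_blocked


# Hypothetical yields. Rows: good, medium and poor patch (blocks A, B, C).
# Columns: three treatments.
data = [
    [10.0, 12.0, 14.5],
    [7.0, 9.5, 11.0],
    [4.0, 5.5, 8.0],
]
f_plain, f_blocked = randomized_block_f(data)
print(round(f_plain, 2), round(f_blocked, 2))
```

With these made-up numbers the patches differ a lot, so MS error drops sharply once blocks are accounted for; with blocks that barely differ, the lost error df could outweigh the gain.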

Two-way ANOVA

Just like one-way ANOVA, except subdivides the treatment SS into:

• Treatment 1
• Treatment 2
• Interaction 1&2

Two-way ANOVA

Suppose we wanted to know if moss grows thicker on north or south side of trees, and we look at 10 aspen and 10 fir trees:

Aspect (2 levels, so 1 df)

Tree species (2 levels, so 1 df)

Aspect x species interaction (1df x 1df = 1df)

Error?

k(n-1) = 4 (10-1) = 36

Source                  df   SS            MS            F
Aspect                  1    SS(Aspect)    MS(Aspect)    MS(As) / MSE
Species                 1    SS(Species)   MS(Species)   MS(Sp) / MSE
Aspect x Species        1    SS(Int)       MS(Int)       MS(Int) / MSE
Error (within groups)   36   SSE           MSE
Total                   39   SST
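The df bookkeeping for the moss example can be sketched as below. The function and key names are made up for illustration, and the sketch assumes the same number of observations in every aspect-by-species group.

```python
def two_way_df(levels_a, levels_b, n_per_group):
    """Degrees of freedom for a balanced two-way ANOVA."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_interaction = df_a * df_b
    k = levels_a * levels_b               # number of groups
    df_error = k * (n_per_group - 1)      # k(n - 1)
    df_total = k * n_per_group - 1
    return {
        "a": df_a, "b": df_b, "interaction": df_interaction,
        "error": df_error, "total": df_total,
    }


# Moss example: 2 aspects x 2 tree species, 10 observations per group.
moss = two_way_df(2, 2, 10)
print(moss)
```

The treatment, interaction and error df add up to the total df, which is a quick way to check a table.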

Interactions

Combination of treatments gives non-additive effect

Additive effect:

[Figure: lines for Alder and Fir plotted against aspect (North, South); values shown: 5, 3, 2]

Anything not parallel!

[Figure: two example plots against aspect (North, South)]

Careful!

If you log-transformed your variables, the absence of interaction is a multiplicative effect: log(a) + log(b) = log(ab)

[Figure: two plots against aspect (North, South)]
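A tiny numeric sketch shows the point about log transforms. The cell means are invented and built multiplicatively (one factor multiplies by 4, the other by 2); base-2 logs are used only to keep the arithmetic exact. The contrast function is an illustrative helper, not a full ANOVA.

```python
import math


def interaction_contrast(cells):
    """cells is a 2 x 2 table of means: rows are species, columns are aspect.

    The result is zero when the two lines are parallel (additive effects).
    """
    (a_north, a_south), (b_north, b_south) = cells
    return (a_north - a_south) - (b_north - b_south)


# Hypothetical means: species effect is x4, aspect effect is x2.
raw = [[8.0, 4.0], [2.0, 1.0]]
logged = [[math.log2(x) for x in row] for row in raw]

print(interaction_contrast(raw))     # non-zero: lines are not parallel
print(interaction_contrast(logged))  # zero: parallel on the log scale
```

On the raw scale the data show an interaction; after the log transform the lines are parallel, so "no interaction" on the log scale corresponds to a multiplicative effect on the original scale.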