Multivariate Analysis in Landscape Ecology


Accuracy and power of randomization tests
in multivariate analysis of variance with
vegetation data
Valério De Patta Pillar
Departamento de Ecologia
Universidade Federal do Rio Grande do Sul
Porto Alegre, Brazil
[email protected]
http://ecoqua.ecologia.ufrgs.br
• Randomization testing:
– Became practical with fast microcomputers.
– Applicable to most cases analyzed by classical methods.
– Applicable to cases not covered by classical methods.
How good is randomization testing?
• Is it accurate?
• Is it powerful enough?
Group comparison by randomization testing
1. Choose a test criterion (θ) to compare the groups.
2. Permute the data according to the conditions stated by the null hypothesis (Ho) that the groups do not differ.
3. Calculate the test criterion in the random data (θ°) and compare it to the value found in the observed data (θ).
4. After many iterations, the probability P(θ° ≥ θ) is the number of iterations with θ° ≥ θ divided by the total number of iterations.
5. Reject Ho if P(θ° ≥ θ) is smaller than a threshold (α).
Manly, B. F. J. 1997. Randomization, Bootstrap and Monte Carlo
Methods in Biology. 2nd ed. Chapman and Hall.
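The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software used in this talk: `randomization_test`, the toy criterion `mean_difference` and the toy data are hypothetical, chosen only to show choosing θ, permuting under Ho, recomputing θ° and counting θ° ≥ θ.

```python
import numpy as np


def randomization_test(data, groups, statistic, n_iter=1000, seed=None):
    """Return (theta, P), where P estimates P(theta_o >= theta) under Ho.

    statistic(data, groups) is the test criterion theta. Under Ho the group
    labels are exchangeable, so every iteration shuffles them at random.
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    theta = statistic(data, groups)            # criterion in the observed data
    hits = 0
    for _ in range(n_iter):
        shuffled = rng.permutation(groups)     # one random data set under Ho
        if statistic(data, shuffled) >= theta:
            hits += 1                          # iteration with theta_o >= theta
    return theta, hits / n_iter


def mean_difference(x, g):
    """Hypothetical toy criterion: |mean of group 1 - mean of group 2|."""
    return abs(x[g == 1].mean() - x[g == 2].mean())


x = np.array([1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1])
g = np.array([1, 1, 1, 1, 2, 2, 2, 2])
theta, p = randomization_test(x, g, mean_difference, n_iter=5000, seed=1)
alpha = 0.05
print(f"theta = {theta:.2f}, P = {p:.4f}, reject Ho: {p < alpha}")
```

Here only the observed split and its mirror image reach θ, so P should come out near 2/70 ≈ 0.03. A common variant also counts the observed arrangement as one of the iterations, which keeps P above zero; the sketch follows the slide's wording literally.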
Randomization test criteria for multivariate comparisons of any number of groups

Sum of squares between groups* (Qb):

$$Q_b = Q_t - Q_w$$

where

$$Q_t = \frac{1}{n}\sum_{h=1}^{n-1}\sum_{i=h+1}^{n} d_{hi}^2$$

is the total sum of squares of the n(n-1)/2 pairwise squared dissimilarities between n sampling units, and

$$Q_w = \sum_{c=1}^{k} Q_{wc}$$

is the sum of squares within k groups, such that

$$Q_{wc} = \frac{1}{n_c}\sum_{h=1}^{n-1}\sum_{i=h+1}^{n} d_{hi|c}^2$$

where d²hi|c is the squared dissimilarity comparing units h and i that both belong to group c (pairs not both in c do not contribute).

Pseudo F-ratio: F = Qb/Qw, the ratio between the sum of squares between groups (Qb) and within groups (Qw).

*Pillar, V. D. & Orlóci, L. 1996. J. Veg. Sci. 7:585-592.
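A minimal NumPy sketch of these criteria, assuming `d2` is a symmetric matrix of squared dissimilarities with a zero diagonal. The function names are illustrative, not taken from the author's software. Summing the full square block of a group counts every pair h < i twice, hence the division by 2. The check at the end uses Euclidean data, for which Qt and Qw equal the squared deviations around the overall and group centroids.

```python
import numpy as np


def q_group(d2, idx):
    """(1/n_c) times the sum of d2[h, i] over pairs h < i of the units in idx."""
    block = d2[np.ix_(idx, idx)]
    return block.sum() / 2.0 / len(idx)   # the full block counts each pair twice


def sums_of_squares(d2, groups):
    """Return (Qt, Qw, Qb) following the formulas above."""
    groups = np.asarray(groups)
    qt = q_group(d2, np.arange(len(groups)))
    qw = sum(q_group(d2, np.flatnonzero(groups == c)) for c in np.unique(groups))
    return qt, qw, qt - qw


def pseudo_f(d2, groups):
    """Pseudo F-ratio Qb/Qw (undefined when Qw is zero)."""
    _, qw, qb = sums_of_squares(d2, groups)
    return qb / qw


# Random Euclidean example: 9 units, 3 variables, 3 groups of 3
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 3))
labels = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
qt, qw, qb = sums_of_squares(d2, labels)
print(qt, qw, qb, pseudo_f(d2, labels))
```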
An example
Cover-abundance of two plant functional types (PFTs) in 14 experimental plots of natural grassland, under different levels of N fertilizer addition (Sosinski 2000).

Plots       1     2     3     4     5     6     7     8     9    10
N level     0    30    30   100   100   100   100   170   170   200
PFT 4     5.6   5.2   4.8   3.0   3.2   1.6   0.6   0.8   1.4   0.4
PFT 13    1.4   0.0   0.6   2.2   2.0   2.2   0.0   0.8   2.6   5.6

Observed squared distance matrix
        1     2     3     4     5     6     7     8     9    10
 1    0
 2    2.1   0
 3    1.3   0.5   0
 4    7.4   9.7   5.8   0
 5    6.1   8.0   4.5   0.1   0
 6   16.6  17.8  12.8   2.0   2.6   0
 7   27.0  21.2  18.0  10.6  10.8   5.8   0
 8   23.4  20.0  16.0   6.8   7.2   2.6   0.7   0
 9   19.1  21.2  15.6   2.7   3.6   0.2   7.4   3.6   0
10   44.7  54.4  44.4  18.3  20.8  13.0  31.4  23.2  10.0   0
Is there a significant effect of N on
vegetation composition as defined by these
two PFTs?
Total sum of squares (Qt) = (2.1 + 1.3 + ... + 10.0)/10 = 60.088
Sum of squares within groups (Qw) = 0/1 + 0.5/2 + (0.1 + 2 + 2.6 + 10.6 + 10.8 + 5.8)/4 + 3.6/2 + 0/1 = 10.02
Sum of squares between groups (Qb) = 60.088 - 10.02 = 50.068
How common is a Qb ≥ 50.068 if Ho were true (that the composition is unrelated to group)?
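The arithmetic above can be checked with a short script. The cover values are read from the data table of this example, and Euclidean distance is assumed (its squares reproduce the observed squared distance matrix to one decimal). With unrounded distances the N = 30 term is 0.52/2 rather than 0.5/2, but the totals agree with the slide.

```python
import numpy as np

# Cover-abundance of PFT 4 and PFT 13 in plots 1-10, and the N level of each plot
X = np.array([[5.6, 1.4], [5.2, 0.0], [4.8, 0.6], [3.0, 2.2], [3.2, 2.0],
              [1.6, 2.2], [0.6, 0.0], [0.8, 0.8], [1.4, 2.6], [0.4, 5.6]])
n_level = np.array([0, 30, 30, 100, 100, 100, 100, 170, 170, 200])

# Squared Euclidean distances between all pairs of plots
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)


def q_group(idx):
    """(1/n_c) times the sum of squared distances over the pairs within idx."""
    return d2[np.ix_(idx, idx)].sum() / 2.0 / len(idx)


qt = q_group(np.arange(len(X)))
qw = sum(q_group(np.flatnonzero(n_level == g)) for g in np.unique(n_level))
qb = qt - qw
print(round(qt, 3), round(qw, 3), round(qb, 3))  # 60.088 10.02 50.068
```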
Reference set under Ho
If Ho is true, the observation vector in a given sampling unit is independent of the group to which the unit belongs.

Factor groups         0   30   30  100  100  100  100  170  170  200
Observation vectors   1    2    3    4    5    6    7    8    9   10

One possible permutation:
Factor groups         0   30   30  100  100  100  100  170  170  200
Observation vectors   4    8    7    6   10    3    9    2    5    1
A random permutation and corresponding statistics

Permuted squared distance matrix
Group  Vector
  0      4     0
 30      8     6.8   0
 30      7    10.6   0.7   0
100      6     2.0   2.6   5.8   0
100     10    18.3  23.2  31.4  13.0   0
100      3     5.8  16.0  18.0  12.8  44.4   0
100      9     2.7   3.6   7.4   0.2  10.0  15.6   0
170      2     9.7  20.0  21.2  17.8  54.4   0.5  21.2   0
170      5     0.1   7.2  10.8   2.6  20.8   4.5   3.6   8.0   0
200      1     7.4  23.4  27.0  16.6  44.7   1.3  19.1   2.1   6.1   0
Total sum of squares (Qt) = (6.8 + 10.6 + ... + 6.1)/10 = 60.088
Sum of squares within groups (Qwo) = 0/1 + 0.7/2 + (13 + 12.8 + 44.4 + 0.2 + 10 + 15.6)/4 + 8/2 + 0/1 = 28.35
Sum of squares between groups (Qbo) = 60.088 - 28.35 = 31.738
Since 31.738 < 50.068 (Qbo < Qb), this iteration adds zero to the count of cases in which Qbo ≥ Qb.
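The single iteration above can be reproduced by reassigning the observation vectors to the fixed group positions. The plot data are read from the table of this example and Euclidean distance is assumed; `q_within` is a hypothetical helper name. With unrounded distances the within-group sum comes out at about 28.32 rather than the slide's 28.35, because the slide adds distances already rounded to one decimal.

```python
import numpy as np

X = np.array([[5.6, 1.4], [5.2, 0.0], [4.8, 0.6], [3.0, 2.2], [3.2, 2.0],
              [1.6, 2.2], [0.6, 0.0], [0.8, 0.8], [1.4, 2.6], [0.4, 5.6]])
groups = np.array([0, 30, 30, 100, 100, 100, 100, 170, 170, 200])


def q_within(data, labels):
    """Sum over groups of (1/n_c) * sum of pairwise squared Euclidean distances."""
    total = 0.0
    for g in np.unique(labels):
        sub = data[labels == g]
        d2 = ((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=-1)
        total += d2.sum() / 2.0 / len(sub)
    return total


qt = q_within(X, np.zeros(len(X)))        # one single group gives Qt
qb = qt - q_within(X, groups)             # observed Qb

# Permutation from the slide: positions 1..10 receive these observation vectors
order = np.array([4, 8, 7, 6, 10, 3, 9, 2, 5, 1]) - 1
qw_o = q_within(X[order], groups)
qb_o = qt - qw_o
print(f"Qwo = {qw_o:.2f}, Qbo = {qb_o:.3f}, counts toward P: {qb_o >= qb}")
```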
After 10000 random permutations…
----------------------------------------------------------------------
RANDOMIZATION TEST
----------------------------------------------------------------------
Elapsed time: 1 second
Number of iterations: 10000
Group partition of sampling units:
Sampling units:  1  2  3  4  5  6  7  8  9 10
Factor N:        1  2  2  3  3  3  3  4  4  5
Source of variation        Sum of squares (Q)    P(Qbo ≥ Qb)
----------------------------------------------------------------------
N:
  Between groups                50.068             0.0049
  Within groups                 10.02
----------------------------------------------------------------------
Total                           60.088

Group centroid vectors in each group:
Factor N:
Group 1 (n=1):  5.6  1.4
Group 2 (n=2):  5    0.3
Group 3 (n=4):  2.1  1.6
Group 4 (n=2):  1.1  1.7
Group 5 (n=1):  0.4  5.6
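A sketch of the complete test for this example, assuming Euclidean distance so that each Q can be computed as squared deviations around centroids (equivalent to the pairwise formula). The seed, the small tolerance used to count floating-point ties and the helper names are assumptions; a Monte Carlo run should give a P close to, but not exactly, the 0.0049 printed above.

```python
import numpy as np

X = np.array([[5.6, 1.4], [5.2, 0.0], [4.8, 0.6], [3.0, 2.2], [3.2, 2.0],
              [1.6, 2.2], [0.6, 0.0], [0.8, 0.8], [1.4, 2.6], [0.4, 5.6]])
groups = np.array([0, 30, 30, 100, 100, 100, 100, 170, 170, 200])


def q_within(data, labels):
    """Within-group sum of squares: squared deviations from each group centroid."""
    return sum(((data[labels == g] - data[labels == g].mean(axis=0)) ** 2).sum()
               for g in np.unique(labels))


qt = ((X - X.mean(axis=0)) ** 2).sum()   # unchanged by any permutation
qb = qt - q_within(X, groups)

rng = np.random.default_rng(2024)
n_iter = 10000
hits = 0
for _ in range(n_iter):
    permuted = X[rng.permutation(len(X))]              # random reassignment of vectors
    if qt - q_within(permuted, groups) >= qb - 1e-9:   # tolerance for float ties
        hits += 1
p = hits / n_iter

centroids = {g: X[groups == g].mean(axis=0).round(2) for g in np.unique(groups)}
print(f"Qb = {qb:.3f}, P(Qbo >= Qb) = {p:.4f}")
for g, c in centroids.items():
    print(g, c)
```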
Two-factor designs
Test criterion:
Qb = Qt - Qw is based on the groups defined by the joint states of the
factors.
Qb is partitioned as
Qb = Qb|A + Qb|B + Qb|AB
where
Qb|A: sum of squares between la groups according to factor A disregarding
factor B
Qb|B: sum of squares between lb groups according to factor B disregarding
factor A
Qb|AB: sum of squares of the interaction AB, obtained by difference.
F-ratio = Qb/Qw
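The partition can be checked on the two-factor example that follows (8 vegetation units, factors A and B). The lower triangle of its observed matrix is copied below and its entries are used directly as the d² values, since their sum divided by 8 gives the reported total of 314. Differences of a few hundredths from the printed sums come from the one-decimal rounding of the matrix. Names are illustrative.

```python
import numpy as np

# Lower triangle of the 8-unit observed matrix (row i lists units 1..i-1)
LOWER = [
    [102.7],
    [40.7, 101.0],
    [149.1, 55.2, 149.7],
    [42.0, 133.2, 46.5, 176.9],
    [118.7, 62.5, 102.5, 93.2, 111.8],
    [50.7, 95.9, 47.7, 153.9, 47.5, 97.8],
    [66.9, 75.7, 69.3, 120.8, 85.3, 57.0, 57.7],
]
n = len(LOWER) + 1
d2 = np.zeros((n, n))
for i, row in enumerate(LOWER, start=1):
    d2[i, :i] = row
d2 = d2 + d2.T                             # symmetric, zero diagonal

A = np.array([1, 1, 2, 2, 1, 1, 2, 2])     # factor A (landscape position)
B = np.array([1, 2, 1, 2, 1, 2, 1, 2])     # factor B (grazing)


def q_within(d2, labels):
    """Sum over groups of (1/n_c) * sum of d2 over the pairs within the group."""
    total = 0.0
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        total += d2[np.ix_(idx, idx)].sum() / 2.0 / len(idx)
    return total


qt = q_within(d2, np.zeros(n))
qw = q_within(d2, 10 * A + B)              # groups = joint states of A and B
qb = qt - qw
qb_a = qt - q_within(d2, A)                # factor A, disregarding B
qb_b = qt - q_within(d2, B)                # factor B, disregarding A
qb_ab = qb - qb_a - qb_b                   # interaction, obtained by difference
for name, q in [("A", qb_a), ("B", qb_b), ("AxB", qb_ab), ("Qb", qb)]:
    print(f"{name:4s} Q = {q:8.2f}  F = {q / qw:.4f}")
print(f"Qw = {qw:.2f}, Qt = {qt:.2f}")
```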
Unrestricted permutation in two-factor design

Groups factor A       1 1 1 1 2 2 2 2
Groups factor B       1 2 3 4 1 2 3 4
Observation vectors   1 2 3 4 5 6 7 8

One permutation:
Groups factor A       1 1 1 1 2 2 2 2
Groups factor B       1 2 3 4 1 2 3 4
Observation vectors   6 8 1 4 5 7 2 3
Two-factor Multivariate Analysis of Variance

Observed
A  B
1  1     0.0
1  2   102.7    0.0
2  1    40.7  101.0    0.0
2  2   149.1   55.2  149.7    0.0
1  1    42.0  133.2   46.5  176.9    0.0
1  2   118.7   62.5  102.5   93.2  111.8    0.0
2  1    50.7   95.9   47.7  153.9   47.5   97.8    0.0
2  2    66.9   75.7   69.3  120.8   85.3   57.0   57.7    0.0
Data: Species (57) composition in 8 vegetation
units surveyed in two landscape positions
(factor A) and two grazing levels (factor B).
One random permutation
A  B
1  1     0.0
1  2    57.7    0.0
2  1    66.9   50.7    0.0
2  2    69.3   47.7   40.7    0.0
1  1    75.7   95.9  102.7  101.0    0.0
1  2    85.3   47.5   42.0   46.5  133.2    0.0
2  1    57.0   97.8  118.7  102.5   62.5  111.8    0.0
2  2   120.8  153.9  149.1  149.7   55.2  176.9   93.2    0.0

                                  Observed   Iteration 2
Factor A:                            21.5        26.6
Factor B:                           129.1        37.5
Interaction A x B:                   26.9        54
Qb, combination of A and B:         177.5       118.1
Sum of squares within groups:       136.5       195.9
Total sum of squares:               314         314
…
After 10000 random permutations…
Source of variation        Sum of squares (Q)   F = Qb/Qw   P(Fo ≥ F)
----------------------------------------------------------------------
A (landscape position):
  Between groups                 21.519          0.15769     0.7172
----------------------------------------------------------------------
B (grazing):
  Between groups                129.11           0.9461      0.0246
----------------------------------------------------------------------
A x B                            26.913          0.19722     0.5776
----------------------------------------------------------------------
Between groups                  177.54           1.301       0.1236
Within groups                   136.46
----------------------------------------------------------------------
Total                           314
Data: Species (57) composition in 8 vegetation units surveyed in two landscape positions (factor A) and
two grazing levels (factor B). Unrestricted random permutations. Test criterion F-ratio = Qb/Qw.
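A sketch of the unrestricted test behind this table: the (A, B) label pairs are shuffled together across the 8 units, which is equivalent to permuting whole observation vectors, and each F = Q/Qw is recomputed. The matrix is copied again so the block runs on its own; the seed, the 5000 iterations and the tie tolerance are assumptions, so the P values should land near, not exactly on, the printed ones.

```python
import numpy as np

LOWER = [
    [102.7],
    [40.7, 101.0],
    [149.1, 55.2, 149.7],
    [42.0, 133.2, 46.5, 176.9],
    [118.7, 62.5, 102.5, 93.2, 111.8],
    [50.7, 95.9, 47.7, 153.9, 47.5, 97.8],
    [66.9, 75.7, 69.3, 120.8, 85.3, 57.0, 57.7],
]
n = len(LOWER) + 1
d2 = np.zeros((n, n))
for i, row in enumerate(LOWER, start=1):
    d2[i, :i] = row
d2 = d2 + d2.T
A = np.array([1, 1, 2, 2, 1, 1, 2, 2])
B = np.array([1, 2, 1, 2, 1, 2, 1, 2])


def q_within(labels):
    """Sum over groups of (1/n_c) * sum of d2 over the pairs within the group."""
    total = 0.0
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        total += d2[np.ix_(idx, idx)].sum() / 2.0 / len(idx)
    return total


QT = q_within(np.zeros(n))


def f_ratios(a, b):
    """F = Q/Qw for A, B, A x B and all between-group variation."""
    qw = q_within(10 * a + b)
    qb = QT - qw
    qa = QT - q_within(a)
    qbb = QT - q_within(b)
    return np.array([qa, qbb, qb - qa - qbb, qb]) / qw


observed = f_ratios(A, B)
rng = np.random.default_rng(7)
n_iter = 5000
hits = np.zeros(4)
for _ in range(n_iter):
    perm = rng.permutation(n)                 # same shuffle for the A and B labels
    hits += f_ratios(A[perm], B[perm]) >= observed - 1e-9
p = hits / n_iter
for name, f, pv in zip(["A", "B", "AxB", "Between"], observed, p):
    print(f"{name:8s} F = {f:.4f}  P = {pv:.4f}")
```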
Restricted permutations
• In two-factor (not nested) designs, for testing one factor, permutations may be restricted to occur within the levels of the other factor (Edgington 1987).
• Restricted permutation within the levels of factor A (for testing factor B):

Groups factor A                 1 1 1 1 2 2 2 2
Groups factor B                 1 1 2 2 1 1 2 2
Observation vector identities   1 2 3 4 5 6 7 8

One permutation:
Groups factor A                 1 1 1 1 2 2 2 2
Groups factor B                 1 1 2 2 1 1 2 2
Observation vector identities   2 4 3 1 5 7 8 6
Edgington, E. S. 1987. Randomization Tests. Marcel Dekker, New York.
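Restricted permutation can be sketched by shuffling unit indices separately inside each level of the stratifying factor. The helper name and seed are illustrative; the output is one permutation of the kind shown above, with identities 1 to 4 staying in level 1 of factor A and 5 to 8 in level 2.

```python
import numpy as np


def restricted_permutation(strata, rng):
    """Permutation of unit indices that only shuffles within each stratum."""
    strata = np.asarray(strata)
    perm = np.arange(len(strata))
    for level in np.unique(strata):
        idx = np.flatnonzero(strata == level)
        perm[idx] = rng.permutation(idx)      # shuffle inside this level only
    return perm


A = np.array([1, 1, 1, 1, 2, 2, 2, 2])        # factor A (kept fixed)
B = np.array([1, 1, 2, 2, 1, 1, 2, 2])        # factor B (being tested)
rng = np.random.default_rng(3)
perm = restricted_permutation(A, rng)
# Observation vector identities (1-based) now sitting at each position
print("Groups factor A              ", A)
print("Groups factor B              ", B)
print("Observation vector identities", perm + 1)
```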

Permutations of residuals instead of raw data
Permutation of observation vectors from which the effects of both factors were removed can overcome the impossibility of exact tests for interactions.
Residuals are computed in the data before obtaining the dissimilarity matrix.
For testing the interaction in two-factor analysis, the residuals remove both factors:

$$z_{hijk} = y_{hijk} - \bar{y}_{hi\cdot\cdot} - \bar{y}_{h\cdot j\cdot} + \bar{y}_{h\cdot\cdot\cdot}$$

y_hijk: observation of variable h in unit k, belonging to group i in factor A and to group j in factor B;
ȳ_hi..: mean for variable h in factor A group i;
ȳ_h.j.: mean for variable h in factor B group j;
ȳ_h...: overall mean for variable h in the data set.

Anderson, M.J. and ter Braak, C. 2003. Journal of Statistical Computation and Simulation 73: 85-113.
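A minimal sketch of the residual formula, assuming a balanced design and observations stored as a units × variables array (each column is one variable h). The function name is illustrative. In a balanced design, purely additive A and B effects leave residuals of zero, so only the interaction (and noise) remains to be permuted; the residuals would then be turned into a dissimilarity matrix as described above.

```python
import numpy as np


def interaction_residuals(Y, a, b):
    """z = y - mean(A group) - mean(B group) + overall mean, column by column."""
    Y = np.asarray(Y, dtype=float)
    a, b = np.asarray(a), np.asarray(b)
    Z = Y + Y.mean(axis=0)                    # + overall mean of each variable
    for labels in (a, b):
        for level in np.unique(labels):
            rows = labels == level
            Z[rows] -= Y[rows].mean(axis=0)   # - mean of that factor group
    return Z


a = np.array([1, 1, 1, 1, 2, 2, 2, 2])
b = np.array([1, 1, 2, 2, 1, 1, 2, 2])
# Two variables with purely additive A and B effects (no interaction)
Y = np.column_stack([10 + 3 * (a == 2) + 5 * (b == 2),
                     2 - 1 * (a == 2) + 4 * (b == 2)])
print(np.allclose(interaction_residuals(Y, a, b), 0))  # True
```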

----------------------------------------------------------------------
RANDOMIZATION TEST
----------------------------------------------------------------------
Elapsed time: 7 seconds
Number of iterations: 10000
Group partition of sampling units:
Sampling units:       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Landscape position:   1  1  2  2  3  3  4  4  1  1  2  2  3  3  4  4
Grazing:              1  2  1  2  1  2  1  2  1  2  1  2  1  2  1  2
Source of variation          Sum of squares (Q)   F = Qb/Qw   P(Fo ≥ F)
----------------------------------------------------------------------
Landscape position:
  Between groups                  439.55           1.6858      0.0003
  Contrasts:
    1 -1  0  0                     21.519          0.073575    0.8301
    1  0 -1  0                     97.293          0.31375     0.058
    1  0  0 -1                    250              1.315       0.059
    0  1 -1  0                    107.92           0.34032     0.0519
    0  1  0 -1                    240.88           1.2219      0.0564
    0  0  1 -1                    161.48           0.75192     0.0563
    1  1 -2  0                    129.64           0.26931     0.0041
    1  1  1 -3                    288.39           0.43803     0.0014
----------------------------------------------------------------------
Grazing:
  Between groups                  122.94           0.47152     0.0055
----------------------------------------------------------------------
LanPosition x Grazing             123.56           0.4739      0.1605
----------------------------------------------------------------------
Between groups                    686.05           2.6312      0.0003
Within groups                     260.73
----------------------------------------------------------------------
Total                             946.78
Two-factor multivariate analysis of variance by randomization testing for the effects of landscape position and grazing level in natural grassland, southern Brazil (data from Pillar 1986). The data set contains 16 pooled community stands described by 60 species.
Restricted random permutations for testing the factors landscape position and grazing.
Permutation of residuals removing both factors for testing the interaction.
How good is randomization testing in two-factor multivariate analysis of variance?

[Figure: Simulation of interaction. Group averages for levels 1 and 2 of Factor A and Factor B.]
• For each case, 1000 data sets were generated with the distributional properties of real vegetation data, and each was subjected to multivariate analysis of variance with randomization testing.
• When a factor or interaction effect is set to zero, the proportion of Ho rejections under a given α threshold estimates the Type I error, the probability of wrongly rejecting Ho when it is true.
• If the Type I error is equal to α, the test is exact.
• When a factor or interaction effect is > 0, the proportion of Ho rejections estimates the power of the test, which is the complement (1 minus) of the Type II error, the probability of not rejecting Ho when it is false.
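Estimating Type I error or power from a batch of simulated tests reduces to counting rejections. In this sketch the normal-approximation 95% interval is an assumption (the slides do not say how "not different from 0.05" was judged), and the uniform P values are only a stand-in for the P values of 1000 simulated data sets under Ho.

```python
import numpy as np


def rejection_rate(p_values, alpha=0.05):
    """Proportion of data sets with P < alpha, plus an approximate 95% interval."""
    p = np.asarray(p_values, dtype=float)
    rate = float(np.mean(p < alpha))
    se = np.sqrt(rate * (1.0 - rate) / p.size)   # binomial standard error
    return rate, (rate - 1.96 * se, rate + 1.96 * se)


# When Ho is true, P values are roughly uniform, so the rate estimates alpha
rng = np.random.default_rng(11)
null_p = rng.uniform(size=1000)                  # stand-in for 1000 simulated tests
type_i, ci = rejection_rate(null_p)
print(f"estimated type I error = {type_i:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Applied to P values from data sets with a nonzero effect, the same proportion estimates power instead.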
Simulated data generated with distributional
properties of real data
Data set: 16 grassland units described by cover of 60 species.
Two factors: landscape position (top-convex, concave-lowland) and grazing levels
(grazed, ungrazed).
Procedure described by Peres-Neto & Olden (2000):
1. Calculate the mean (μ_ij) and the standard deviation (σ_ij) for each species vector i within each group j defined by the four factor-level combinations;
2. Standardize these vectors to mean 0 and standard deviation 1, t_hij = (x_hij − μ_ij)/σ_ij;
3. Randomly permute whole stand vectors across groups;
4. Restore the original dispersion within each group by computing new observations s_hij = t_hij σ_ij, thus defining a data set with the conditions specified by Ho;
5. Apply to the species vectors the corresponding group differences for factor and interaction effects;
6. Perform the randomization tests using 1000 random permutations;
7. Repeat steps (3) to (6) 1000 times, recording the proportion of Ho rejections.
Peres-Neto, P.R. & Olden, J.D. 2000. Animal Behaviour 61: 79-86.
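Steps 1 to 5 can be sketched as below for a stands × species array. This is an illustrative reading of the procedure, not the authors' code: the sample standard deviation (ddof=1), setting standardized values to 0 for species that are constant within a group, the form of the effects (one value or vector added per group), and the gamma-distributed stand-in data are all assumptions. Steps 6 and 7 would wrap this function in the randomization test and the outer repetition loop.

```python
import numpy as np


def simulate_data(X, groups, effects, rng):
    """One simulated data set: Ho structure (steps 1-4) plus group effects (step 5).

    X: stands x species array; groups: group of each stand (factor-level
    combination); effects: dict mapping group -> value or species vector added
    to the stands of that group (hypothetical effect pattern).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    T = np.zeros_like(X)
    sd = {}
    for g in levels:                                  # steps 1-2
        rows = groups == g
        mu = X[rows].mean(axis=0)
        sd[g] = X[rows].std(axis=0, ddof=1)
        safe = np.where(sd[g] > 0, sd[g], 1.0)        # constant species -> t = 0
        T[rows] = (X[rows] - mu) / safe
    T = T[rng.permutation(len(X))]                    # step 3: permute whole stands
    S = np.empty_like(T)
    for g in levels:                                  # step 4, then step 5
        rows = groups == g
        S[rows] = T[rows] * sd[g] + effects.get(g, 0.0)
    return S


rng = np.random.default_rng(5)
X = rng.gamma(2.0, 1.0, size=(16, 6))       # hypothetical stand-in for cover data
groups = np.repeat([1, 2, 3, 4], 4)         # four factor-level combinations
no_effect = simulate_data(X, groups, {}, rng)              # conditions of Ho
factor1 = simulate_data(X, groups, {3: 1.0, 4: 1.0}, rng)  # hypothetical shift
```

The shift assumes, for illustration only, that groups 3 and 4 share the second level of factor 1.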
Increasing effect of one factor, no interaction: proportion of Ho rejection

                    Unrestricted permutations
Effects             Qb                    F-ratio               Restricted     Residuals F
1     2     1x2     1      2      1x2     1      2      1x2     1      2       1x2
0.00  0.00  0.00    0.031  0.030  0.048   0.032  0.042  0.049   0.043  0.042   0.056
0.16  0.00  0.00    0.977  0.012  0.015   0.971  0.036  0.041   0.971  0.042   0.041
0.32  0.00  0.00    1.000  0.000  0.000   1.000  0.012  0.023   1.000  0.040   0.047

With no factor and interaction effects, the type I error is not different from 0.05, as expected when using α = 0.05.
As the effect of factor 1 increases, the type I errors for factor 2 and for the interaction are underestimated with unrestricted permutations using Qb and the F-ratio, but not with restricted permutations and residuals.

Results of power evaluation by data simulation in two-factor MANOVA. The proportion of Ho rejections at α = 0.05 was obtained for 1000 simulated data sets generated on the basis of plant community data with 16 units and 60 species, with increasing difference between the two groups for factor 1 and no interaction. Each factor combination had an equal number of units. For each data set, a randomization test was run with 1000 iterations.
Increasing effect of both factors, no interaction: proportion of Ho rejection

Effects             Qb                    F-ratio               Restricted  Residuals F
1     2     1x2     1      2      1x2     1      2      1x2     1           1x2
0.16  0.16  0.00    0.777  0.803  0.003   0.932  0.925  0.027   0.971       0.041
0.32  0.32  0.00    1.000  1.000  0.000   1.000  1.000  0.005   1.000       0.053

As the effects of both factors increase, the type I error for the interaction is underestimated with unrestricted permutations using Qb and the F-ratio, but not with residuals.
No factor effects, increasing interaction: proportion of Ho rejection

Effects             Qb                    F-ratio               Restricted  Residuals F
1     2     1x2     1      2      1x2     1      2      1x2     1           1x2
0.00  0.00  0.16    0.009  0.007  0.970   0.023  0.039  0.963   0.000       0.956
0.00  0.00  0.32    0.000  0.000  1.000   0.007  0.009  1.000   0.000       1.000

As the interaction effect increases, the type I error for both factors is underestimated with Qb and the F-ratio, under both unrestricted and restricted permutations.
But main factors should not be considered at all when an interaction is present!
Increasing effect of both factors, with weak or stronger interaction: proportion of Ho rejection

Effects             Qb                    F-ratio               Restricted  Residuals F
1     2     1x2     1      2      1x2     1      2      1x2     1           1x2
0.08  0.08  0.16    0.112  0.158  0.200   0.184  0.198  0.230   0.159       0.255
0.24  0.24  0.16    0.987  0.989  0.001   1.000  1.000  0.061   1.000       0.222
0.16  0.16  0.32    0.445  0.505  0.555   0.849  0.828  0.860   0.823       0.941
0.32  0.32  0.32    0.999  0.995  0.000   1.000  1.000  0.421   1.000       0.956

As the effects of both factors increase, the power of the test with permutations of raw data to detect the interaction decreases when using Qb and the F-ratio, but not when permuting residuals.