CHAPTER 24
MRPP (Multi-response Permutation Procedures)
and Related Techniques
Tables, Figures, and Equations
From: McCune, B. & J. B. Grace. 2002. Analysis of
Ecological Communities. MjM Software Design,
Gleneden Beach, Oregon http://www.pcord.com
How it works
1. Calculate distance matrix, D.
2. Calculate the average distance $x_i$ within each group i.
3. Calculate delta (the weighted mean within-group distance)
$$\delta = \sum_{i=1}^{g} C_i x_i$$
for g groups, where C is a weight that depends on the number of items in the
groups (normally $C_i = n_i / N$, where $n_i$ is the number of items in group i and
N is the total number of items).
Table 24.1. Methods for weighting groups in MRPP.
| Formula | Comments |
|---|---|
| $C_i = n_i / \sum n_i$ | A natural weighting recommended by Mielke (1984) and used in most recent applications of MRPP. |
| $C_i = (n_i - 1) / \sum (n_i - 1)$ | With squared Euclidean distance this weighting results in an MRPP statistic that is equivalent to a 2-sample t-test or one-way ANOVA F-test (Mielke et al. 1982; Zimmerman et al. 1985). While this option accounts for degrees of freedom, this is a foreign concept to permutation procedures. |
| $C_i = 1 / g$ | Not recommended but available for experimentation. |
| $C_i = n_i (n_i - 1) / \sum n_i (n_i - 1)$ | Not recommended but available for experimentation. Used in some early applications of MRPP. |
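The four weightings in Table 24.1 differ only in the quantity that is normalized across groups. The sketch below makes that explicit; the function name `group_weights`, the method labels, and the group sizes are illustrative assumptions, not part of any MRPP software.

```python
def group_weights(sizes, method="n"):
    """Return the weights C_i for groups of the given sizes (Table 24.1)."""
    if method == "n":          # C_i = n_i / sum(n_i)
        raw = [n for n in sizes]
    elif method == "n-1":      # C_i = (n_i - 1) / sum(n_i - 1)
        raw = [n - 1 for n in sizes]
    elif method == "equal":    # C_i = 1 / g
        raw = [1 for _ in sizes]
    elif method == "pairs":    # C_i = n_i(n_i - 1) / sum(n_i(n_i - 1))
        raw = [n * (n - 1) for n in sizes]
    else:
        raise ValueError(f"unknown weighting method: {method}")
    total = sum(raw)
    return [r / total for r in raw]


sizes = [5, 10, 15]  # hypothetical group sizes
for method in ("n", "n-1", "equal", "pairs"):
    print(method, [round(c, 3) for c in group_weights(sizes, method)])
```

Whatever the method, the weights sum to 1, so delta remains a weighted mean of the within-group distances. With equal group sizes all four methods give the same weights.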
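4. Determine probability of a $\delta$ this small or smaller.
The number of possible partitions (M) for two groups is
$$M = \frac{N!}{n_1! \, n_2!}$$
[Diagram: a data matrix of sample units (SU) by species, beside a column assigning each SU to a group (1, 1, 2, 2, 3, etc.).]
Proportion of these that have $\delta$ smaller than the observed $\delta$:
$$p = \frac{1 + \text{no. smaller deltas}}{\text{total no. possible partitions}}$$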
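For a very small data set the partitions can be enumerated directly. The sketch below does this for two groups, using hypothetical one-variable data, absolute difference as the distance, and the weighting $C_i = n_i/N$; the helper names are assumptions made for illustration. Real analyses usually rely on an approximation or on random permutations, because M grows very quickly with N.

```python
from itertools import combinations
from math import comb


def mean_within(points, members):
    """Average absolute difference over all pairs of items in one group."""
    pairs = list(combinations(members, 2))
    return sum(abs(points[a] - points[b]) for a, b in pairs) / len(pairs)


def delta(points, group1, group2):
    """Weighted mean within-group distance with C_i = n_i / N."""
    n_total = len(group1) + len(group2)
    return (len(group1) / n_total * mean_within(points, group1)
            + len(group2) / n_total * mean_within(points, group2))


points = [1, 2, 3, 10, 11, 12]   # hypothetical data
observed_g1 = (0, 1, 2)          # indices of the items observed in group 1
everyone = set(range(len(points)))
observed = delta(points, observed_g1, tuple(everyone - set(observed_g1)))

n_partitions = 0
n_smaller = 0
for g1 in combinations(range(len(points)), len(observed_g1)):
    g2 = tuple(everyone - set(g1))
    n_partitions += 1
    # Count only partitions whose delta is strictly smaller than observed.
    if delta(points, g1, g2) < observed - 1e-12:
        n_smaller += 1

p = (1 + n_smaller) / n_partitions
print(n_partitions, comb(len(points), len(observed_g1)), p)
```

Here M = 6!/(3! 3!) = 20, no partition has a smaller delta than the observed one, and so p = 1/20 = 0.05, the smallest value attainable with six items in two groups of three.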
Figure 24.2. Frequency distribution of delta under the null hypothesis,
compared to the observed delta. The area under the curve less than the
observed delta is the probability of type I error under the null hypothesis of
no difference between groups.
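The test statistic, T, is:
$$T = \frac{\delta - m_\delta}{s_\delta}$$
where $m_\delta$ and $s_\delta$ are the mean and standard deviation of $\delta$ under
the null hypothesis.
$$T = \frac{\text{observed } \delta - \text{expected } \delta}{\text{s.dev. of expected } \delta}$$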
5. Calculate the effect size that is independent of the sample size. This is
provided by the chance-corrected within-group agreement (A):
$$A = 1 - \frac{\delta}{m_\delta} = 1 - \frac{\text{observed } \delta}{\text{expected } \delta}$$
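Both T and A follow directly from the observed delta and the moments of delta under the null hypothesis. As a quick arithmetic check, the sketch below plugs in the rounded values reported for Sørensen distances in Table 24.4 (observed 0.400, expected 0.625, variance 0.0019); the function names are illustrative assumptions.

```python
from math import sqrt


def mrpp_t(observed, expected, variance):
    """T = (observed delta - expected delta) / s.dev. of delta under the null."""
    return (observed - expected) / sqrt(variance)


def mrpp_a(observed, expected):
    """Chance-corrected within-group agreement, A = 1 - observed / expected."""
    return 1 - observed / expected


t = mrpp_t(0.400, 0.625, 0.0019)
a = mrpp_a(0.400, 0.625)
print(round(t, 2), round(a, 3))
```

The results, about -5.16 and 0.36, are close to the tabled T = -5.14 and A = 0.359; the small differences are what one would expect from starting with rounded inputs.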
Figure 24.3. Fifteen sample units in species space, each sample unit assigned to one of three groups. Axes: Species 1 (horizontal) and Species 2 (vertical), each scaled from 0 to 20; the legend distinguishes Groups 1, 2, and 3.
Table 24.2. A species data matrix of 15 plots by 2 species, their assignments to three
groups, and Sørensen distances among plots. Between-group distances (shaded in the original table) are
ignored by MRPP.
| Plot | Sp1 | Sp2 | Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 20 | 1 | 0.000 | 0.250 | 0.375 | 0.818 | 0.909 | 0.212 | 0.290 | 0.375 | 0.379 | 0.429 | 0.840 | 1.000 | 0.933 | 1.000 | 1.000 |
| 2 | 0 | 12 | 1 | 0.250 | 0.000 | 0.167 | 0.714 | 0.857 | 0.040 | 0.043 | 0.167 | 0.143 | 0.200 | 0.765 | 1.000 | 0.909 | 1.000 | 1.000 |
| 3 | 2 | 10 | 1 | 0.375 | 0.167 | 0.000 | 0.714 | 0.714 | 0.200 | 0.130 | 0.000 | 0.143 | 0.200 | 0.529 | 0.750 | 0.727 | 0.846 | 0.862 |
| 4 | 0 | 2 | 1 | 0.818 | 0.714 | 0.714 | 0.000 | 0.500 | 0.733 | 0.692 | 0.714 | 0.636 | 0.600 | 0.429 | 1.000 | 0.833 | 1.000 | 1.000 |
| 5 | 1 | 1 | 1 | 0.909 | 0.857 | 0.714 | 0.500 | 0.000 | 0.867 | 0.846 | 0.714 | 0.818 | 0.800 | 0.429 | 0.667 | 0.667 | 0.875 | 0.895 |
| 6 | 0 | 13 | 2 | 0.212 | 0.040 | 0.200 | 0.733 | 0.867 | 0.000 | 0.083 | 0.200 | 0.182 | 0.238 | 0.778 | 1.000 | 0.913 | 1.000 | 1.000 |
| 7 | 0 | 11 | 2 | 0.290 | 0.043 | 0.130 | 0.692 | 0.846 | 0.083 | 0.000 | 0.130 | 0.100 | 0.158 | 0.750 | 1.000 | 0.905 | 1.000 | 1.000 |
| 8 | 2 | 10 | 2 | 0.375 | 0.167 | 0.000 | 0.714 | 0.714 | 0.200 | 0.130 | 0.000 | 0.143 | 0.200 | 0.529 | 0.750 | 0.727 | 0.846 | 0.862 |
| 9 | 0 | 9 | 2 | 0.379 | 0.143 | 0.143 | 0.636 | 0.818 | 0.182 | 0.100 | 0.143 | 0.000 | 0.059 | 0.714 | 1.000 | 0.895 | 1.000 | 1.000 |
| 10 | 0 | 8 | 2 | 0.429 | 0.200 | 0.200 | 0.600 | 0.800 | 0.238 | 0.158 | 0.200 | 0.059 | 0.000 | 0.692 | 1.000 | 0.889 | 1.000 | 1.000 |
| 11 | 3 | 2 | 3 | 0.840 | 0.765 | 0.529 | 0.429 | 0.429 | 0.778 | 0.750 | 0.529 | 0.714 | 0.692 | 0.000 | 0.333 | 0.467 | 0.684 | 0.727 |
| 12 | 4 | 0 | 3 | 1.000 | 1.000 | 0.750 | 1.000 | 0.667 | 1.000 | 1.000 | 0.750 | 1.000 | 1.000 | 0.333 | 0.000 | 0.429 | 0.556 | 0.619 |
| 13 | 9 | 1 | 3 | 0.933 | 0.909 | 0.727 | 0.833 | 0.667 | 0.913 | 0.905 | 0.727 | 0.895 | 0.889 | 0.467 | 0.429 | 0.000 | 0.250 | 0.333 |
| 14 | 14 | 0 | 3 | 1.000 | 1.000 | 0.846 | 1.000 | 0.875 | 1.000 | 1.000 | 0.846 | 1.000 | 1.000 | 0.684 | 0.556 | 0.250 | 0.000 | 0.097 |
| 15 | 17 | 0 | 3 | 1.000 | 1.000 | 0.862 | 1.000 | 0.895 | 1.000 | 1.000 | 0.862 | 1.000 | 1.000 | 0.727 | 0.619 | 0.333 | 0.097 | 0.000 |
Table 24.3. Average within-group distance calculated from three
different distance matrices. The average within-group distance is used
as the test statistic.
| Group | Sørensen | Ranked Sørensen | Euclidean |
|---|---|---|---|
| 1 | 0.602 | 0.453 | 9.78 |
| 2 | 0.149 | 0.159 | 2.79 |
| 3 | 0.449 | 0.337 | 7.79 |
| Average | 0.400 | 0.316 | 6.79 |
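The Sørensen column of Table 24.3 can be recomputed from the species data in Table 24.2. The sketch below is one way to do it in plain Python, assuming the usual quantitative Sørensen (Bray-Curtis) formula, the sum of absolute differences divided by the sum of all abundances in the two plots; the function and variable names are illustrative.

```python
from itertools import combinations

# (Sp1, Sp2) abundances for plots 1-15 and their groups, from Table 24.2
plots = [(0, 20), (0, 12), (2, 10), (0, 2), (1, 1),
         (0, 13), (0, 11), (2, 10), (0, 9), (0, 8),
         (3, 2), (4, 0), (9, 1), (14, 0), (17, 0)]
groups = [1] * 5 + [2] * 5 + [3] * 5


def sorensen(u, v):
    """Sørensen (Bray-Curtis) distance between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))


def within_group_means(plots, groups, dist):
    """Average distance over all pairs of plots within each group."""
    means = {}
    for g in sorted(set(groups)):
        members = [p for p, label in zip(plots, groups) if label == g]
        pairs = list(combinations(members, 2))
        means[g] = sum(dist(u, v) for u, v in pairs) / len(pairs)
    return means


means = within_group_means(plots, groups, sorensen)
# delta with the weighting C_i = n_i / N
n_total = len(plots)
delta = sum(groups.count(g) / n_total * x for g, x in means.items())

for g, x in means.items():
    print(g, round(x, 3))
print("delta", round(delta, 3))
```

Because the three groups are the same size, delta here is simply the unweighted average of the three within-group means, which is the "Average" row of the table.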
Table 24.4. Summary statistics for MRPP of simple example. Results are
given for three different distance matrices, comparing across all groups, as
well as for multiple pairwise comparisons for the Sørensen distances. The
pairwise comparisons were also made with MRPP. Expected, variance, and skewness refer to $\delta$ under the null hypothesis.
| | Observed $\delta$ | Expected | Variance | Skewness | T | p | A |
|---|---|---|---|---|---|---|---|
| Sørensen distances | 0.400 | 0.625 | 0.0019 | -1.24 | -5.14 | 0.0007 | 0.359 |
| Ranked Sørensen | 0.316 | 0.495 | 0.0012 | -1.26 | -5.18 | 0.0007 | 0.361 |
| Euclidean | 6.79 | 9.97 | 0.5177 | -1.18 | -4.43 | 0.0017 | 0.320 |
| Multiple comparisons (Sørensen) | | | | | | | |
| 1 vs 2 | 0.376 | 0.397 | 0.0005 | -0.31 | -0.93 | 0.1730 | 0.055 |
| 1 vs 3 | 0.526 | 0.699 | 0.0019 | -1.26 | -3.89 | 0.0039 | 0.248 |
| 2 vs 3 | 0.299 | 0.627 | 0.0036 | -2.11 | -5.49 | 0.0017 | 0.523 |
Blocked MRPP (MRBP)
Given b blocks and g groups (treatments), the MRPP
statistic is modified to:
$$\delta = \left[ g \binom{b}{2} \right]^{-1} \sum_{i=1}^{g} \sum_{j<k} \Delta(x_{ij}, x_{ik})$$
where $\Delta(x, y)$ is the distance between points x and y in the
p-dimensional space.
The combinatoric term is simply the number of items
represented in the double summation.
Average distance function commensuration. This option equalizes the
contribution of each variable to the distance function. For each variable m the
sum of deviations ($Dev_m$) is calculated:
$$Dev_m = \sum_{i=1}^{g} \sum_{j=1}^{b} \sum_{k=1}^{g} \sum_{l=1}^{b} \left| x_{mij} - x_{mkl} \right|^{V}$$
V is set to 2 for squared Euclidean distance or 1 for Euclidean distance. Then
each element x of the data matrix is divided by the sum of the deviations for
the corresponding variable to produce the transformed value y:
$$y_{mij} = x_{mij} / Dev_m$$
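The quadruple sum simply runs over every ordered pair of observations of variable m, regardless of group or block. A minimal sketch, using hypothetical data for two variables recorded on very different scales (the function names are also assumptions):

```python
def sum_of_deviations(values, v=1):
    """Dev_m: sum of |x - x'|**v over all ordered pairs of observations."""
    return sum(abs(a - b) ** v for a in values for b in values)


def commensurate(columns, v=1):
    """Divide every value of each variable by that variable's Dev_m."""
    return [[x / sum_of_deviations(col, v) for x in col] for col in columns]


# Hypothetical data: each inner list holds one variable's values
# over all group-by-block combinations.
columns = [[1, 2, 4, 7], [100, 300, 400, 900]]
scaled = commensurate(columns, v=1)

for col, new in zip(columns, scaled):
    print(sum_of_deviations(col), [round(y, 4) for y in new])
```

With V = 1 the transformed variables all have a sum of deviations of 1, so neither variable dominates the distance merely because of its units.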
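Table 24.5. Example comparing results from raw data versus data aligned within
blocks (each block's median subtracted, giving a block median of zero) as input to Blocked MRPP.
| | Raw Data, Block 1 | Raw Data, Block 2 | Aligned Data, Block 1 | Aligned Data, Block 2 |
|---|---|---|---|---|
| Group 1 | 4 | 9 | 1.5 | 1.5 |
| Group 2 | 2 | 7 | -0.5 | -0.5 |
| Group 3 | 3 | 8 | 0.5 | 0.5 |
| Group 4 | 1 | 2 | -1.5 | -5.5 |
| Median | 2.5 | 7.5 | 0 | 0 |

| | Raw Data | Aligned Data |
|---|---|---|
| Observed $\delta$ | 4 = (5+5+5+1)/4 | 1 = (0+0+0+4)/4 |
| Expected $\delta$ | 4.375 | 2.250 |
| Agreement (A) | 0.086 | 0.556 |
| p | 0.184 | 0.016 |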
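The observed and expected delta and the agreement A for the Table 24.5 data can be recomputed with a short script. The sketch below assumes a single variable, absolute difference as the distance, and an expected delta obtained by averaging over every reassignment of groups within blocks; the function names are illustrative, and the p-values in the table are not reproduced.

```python
from itertools import combinations, permutations, product
from math import comb
from statistics import median


def mrbp_delta(data):
    """Blocked MRPP delta; data[i][j] is the value for group i in block j."""
    g, b = len(data), len(data[0])
    total = sum(abs(row[j] - row[k])
                for row in data for j, k in combinations(range(b), 2))
    return total / (g * comb(b, 2))


def expected_delta(data):
    """Mean delta over every reassignment of groups within blocks."""
    g, b = len(data), len(data[0])
    blocks = [[row[j] for row in data] for j in range(b)]
    deltas = []
    # Block 1 stays fixed; only the relative order of the other blocks matters.
    for perms in product(*(permutations(col) for col in blocks[1:])):
        shuffled = [[blocks[0][i]] + [p[i] for p in perms] for i in range(g)]
        deltas.append(mrbp_delta(shuffled))
    return sum(deltas) / len(deltas)


def align(data):
    """Subtract each block's median from the values in that block."""
    b = len(data[0])
    medians = [median(row[j] for row in data) for j in range(b)]
    return [[row[j] - medians[j] for j in range(b)] for row in data]


raw = [[4, 9], [2, 7], [3, 8], [1, 2]]  # Table 24.5: groups 1-4 by blocks 1-2
results = {}
for label, data in (("raw", raw), ("aligned", align(raw))):
    d, e = mrbp_delta(data), expected_delta(data)
    results[label] = (d, e, 1 - d / e)
    print(label, d, e, round(1 - d / e, 3))
```

Alignment removes the large difference between block medians, so the observed delta falls from 4 to 1 while the expected delta falls only from 4.375 to 2.25. That is why A rises from about 0.086 to about 0.556.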
Analysis of similarity (ANOSIM)
Elements of a similarity matrix among all sample units are
ranked. The highest similarity is given a rank of 1.
$$R = \frac{\bar{r}_B - \bar{r}_W}{M/2}$$
where:
$\bar{r}_B$ = mean of the rank similarities for between-group pairs
$\bar{r}_W$ = mean of the rank similarities for within-group pairs
M = n(n-1)/2
n = the total number of sample units
The denominator constrains R to the range -1 to 1.
Positive values indicate differences among groups.
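A small sketch of the R statistic follows. It ranks distances in ascending order, which is equivalent to ranking similarities with the highest similarity as rank 1, and gives tied values their average rank (an assumption about tie handling). The data and function names are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values ascending (1 = smallest), giving ties their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def anosim_r(dist, groups):
    """R statistic from a square distance matrix and a list of group labels."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    # Smallest distance = highest similarity = rank 1.
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    m = n * (n - 1) / 2
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


points = [0, 1, 10, 11]  # hypothetical one-variable data
dist = [[abs(a - b) for b in points] for a in points]
r_separated = anosim_r(dist, [1, 1, 2, 2])
r_interleaved = anosim_r(dist, [1, 2, 1, 2])
print(r_separated, r_interleaved)
```

When the groups match the two obvious clusters, every within-group pair is more similar than every between-group pair and R reaches 1. Interleaving the labels makes within-group pairs less similar than between-group pairs, and R becomes negative.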
The Qb method
The test criterion is the sum of the squared distances between groups:
Qb = Qt - Qw
The total sum of squares (Qt) is based on one triangle of the distance
matrix, the triangle having n(n-1)/2 terms, each term being a squared
distance between two entities j and k:
$$Q_t = \frac{1}{n} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} d_{jk}^2$$
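The within-group sum of squares Qwg is summed across all g groups:
$$Q_w = \sum_{i=1}^{g} Q_{wg}$$
where
$$Q_{wg} = \frac{1}{n_g} \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} d_{jk}^2 \qquad (j, k \in g)$$
that is, the sum is restricted to pairs of entities that both belong to group g, and $n_g$ is the number of entities in that group.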
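These sums need only the distance matrix and the group labels. The sketch below uses hypothetical one-variable data with unequal group sizes; the function names are assumptions.

```python
from itertools import combinations


def q_total(dist):
    """Qt: squared distances in one triangle of the matrix, divided by n."""
    n = len(dist)
    return sum(dist[j][k] ** 2 for j, k in combinations(range(n), 2)) / n


def q_within(dist, groups):
    """Qw: for each group, within-group squared distances divided by n_g."""
    total = 0.0
    for g in set(groups):
        members = [i for i, label in enumerate(groups) if label == g]
        ss = sum(dist[j][k] ** 2 for j, k in combinations(members, 2))
        total += ss / len(members)
    return total


points = [0, 1, 2, 10, 11]  # hypothetical data
groups = ["a", "a", "a", "b", "b"]
dist = [[abs(p - q) for q in points] for p in points]

qt = q_total(dist)
qw = q_within(dist, groups)
qb = qt - qw
print(qt, qw, qb)
```

For these Euclidean distances Qt = 110.8 equals the ordinary sum of squared deviations from the grand mean (4.8), and Qb = 108.3 equals the ordinary between-group sum of squares. The distance-based form matters because it also works for distance measures that have no meaningful centroid.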
NPMANOVA (= perMANOVA)
Figure 24.4. The sum of squared distances
from points to the centroid (left) can be
calculated from the average squared
interpoint distance (right).
The total sum of squares of a distance matrix D with N rows and N
columns is
$$SS_T = \frac{1}{N} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_{ij}^2$$
The residual (within-group) sum of squares for a one-way
classification is
$$SS_R = \frac{1}{n} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_{ij}^2 \, \epsilon_{ij}$$
where n is the number of observations per group, N is the number of sample units,
and $\epsilon_{ij} = 1$ if i and j are in the same group, but $\epsilon_{ij} = 0$ if in different groups.
The sum of squares between groups is then
SSA = SST - SSR
so we can calculate a pseudo-F-ratio:
$$F = \frac{SS_A / (a - 1)}{SS_R / (N - a)}$$
where a is the number of groups. If the distance matrix contains Euclidean
distances, then this gives the traditional parametric univariate F ratio.
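The equivalence with the parametric F ratio can be checked numerically for a single variable. The sketch below computes the pseudo-F from a Euclidean distance matrix and compares it with a classical one-way ANOVA computed from group means; it assumes a balanced design, and the data and function names are hypothetical.

```python
from itertools import combinations


def pseudo_f(dist, groups):
    """Pseudo-F for a balanced one-way design, from a distance matrix."""
    n_total = len(groups)
    a = len(set(groups))
    n = n_total // a  # observations per group (balanced design assumed)
    pairs = list(combinations(range(n_total), 2))
    ss_t = sum(dist[i][j] ** 2 for i, j in pairs) / n_total
    ss_r = sum(dist[i][j] ** 2 for i, j in pairs
               if groups[i] == groups[j]) / n
    ss_a = ss_t - ss_r
    return (ss_a / (a - 1)) / (ss_r / (n_total - a))


def anova_f(values, groups):
    """Classical one-way ANOVA F ratio for a single variable."""
    labels = sorted(set(groups))
    grand = sum(values) / len(values)
    ss_between = ss_within = 0.0
    for g in labels:
        member = [v for v, label in zip(values, groups) if label == g]
        mean = sum(member) / len(member)
        ss_between += len(member) * (mean - grand) ** 2
        ss_within += sum((v - mean) ** 2 for v in member)
    a, n_total = len(labels), len(values)
    return (ss_between / (a - 1)) / (ss_within / (n_total - a))


values = [1, 3, 6, 8, 12, 16]  # hypothetical data, three groups of two
groups = [1, 1, 2, 2, 3, 3]
dist = [[abs(p - q) for q in values] for p in values]

f_dist = pseudo_f(dist, groups)
f_classic = anova_f(values, groups)
print(round(f_dist, 4), round(f_classic, 4))
```

Both routes give the same F for these data. With a non-Euclidean distance such as Sørensen, only the distance-based calculation is available, and significance is judged by permutation rather than from the F distribution.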
For a two-factor design (say factors A and B), one calculates the following terms:
SSA = within-group sum of squares for A, ignoring any influence of B
SSB = within-group sum of squares for B, ignoring any influence of A
SSR = residual sum of squares, pooling the sum of squares within groups
defined by each of the combinations of factors A and B
SSAB = interaction sum of squares for AB, by subtraction:
SSAB = SST - SSA - SSB - SSR
If factor B is nested within A, then
SSB(A) = SST - SSA - SSR
and there is no interaction term.