Rotation Tests - Computing exact adjusted p-values in multiresponse experiments
Øyvind Langsrud, MATFORSK, Norwegian Food Research Institute.
Multiple responses are common in industrial and scientific experimentation, and these response variables are often related in some way. Digitisations of continuous curves and several related measures of the same physical phenomenon are examples of such data. An ordinary analysis of variance (or general linear model) of each response variable yields several p-values (raw p-values). We may want to adjust these p-values so that the experimentwise error rate is controlled. Bonferroni adjustment is the best-known method, but it is extremely conservative when the responses are strongly dependent.
It is, however, possible to compute exact and non-conservative adjusted p-values by Monte Carlo testing. The unknown covariance matrix is handled by conditioning on sufficient statistics, and this methodology is called rotation testing. Compared with permutation tests, the permutations are replaced by proper random rotations. Permutation tests avoid the multivariate normality assumption, but they are limited to relatively simple models. A rotation test, on the other hand, can be applied to adjust p-values from any general linear model. Instead of controlling the experimentwise (or familywise) error rate, a rotation testing method can also be made to control the false discovery rate. This type of p-value adjustment has become very popular in microarray data analysis. More information and free software can be found at
http://www.matforsk.no/ola/rotation.htm.
- a member of the Food Science Alliance | NLH - Matforsk - Akvaforsk
Slide no.: 1 | Author: Øyvind Langsrud
Campylobacter experiment
[Figure: experimental design with the factors Temperature (5C, 25C), Atmosphere (Aerobic, Microaerobic, Anaerobic) and Days (2, 4, 7), together with a plot of the spectra across the wavelengths]
- Three biological replicates (block variable)
- 312 FT-IR wavelengths as multiple responses
  - polysaccharide region [1200-900 cm-1]
Analysis with MINITAB - first wavelength
Analysis of Variance for 1200.6, using Adjusted SS for Tests

Source      DF  Seq SS     Adj SS     Adj MS      F      P
BioRep       2  0.0001545  0.0001407  0.0000703   7.00  0.003
Atmos        2  0.0000639  0.0000698  0.0000349   3.47  0.044
Day          2  0.0001147  0.0000730  0.0000365   3.63  0.038
Temp         1  0.0002607  0.0001881  0.0001881  18.71  0.000
Atmos*Temp   2  0.0000002  0.0000007  0.0000004   0.03  0.966
Day*Temp     2  0.0000082  0.0000082  0.0000041   0.41  0.669
Error       31  0.0003117  0.0003117  0.0000101
Total       42  0.0009139

Least Squares Means for 1200.6

Day  Mean      SE Mean
2    0.005694  0.000774
4    0.003469  0.000915
7    0.002255  0.001069
Because of missing values, Atmos*Day could not be handled by MINITAB.
Analysis with 50-50 MANOVA

--- 50-50 MANOVA ver. 1.71 --- 43 objects -- 1 responses:
Source      DF    exVarSS   nPC  nBu  exVarPC  exVarBU  p-Value
BioRep      2     0.155716  1    0    1.000    1.000    0.003046
Atmos       2     0.075998  1    0    1.000    1.000    0.043882
Day         2     0.075725  1    0    1.000    1.000    0.044326
Temp        1     0.298146  1    0    1.000    1.000    0.000014
Atmos*Day   3(4)  0.037330  1    0    1.000    1.000    0.347352
Atmos*Temp  2     0.004477  1    0    1.000    1.000    0.814765
Day*Temp    2     0.000229  1    0    1.000    1.000    0.989520
Error       28    0.303750  - STANDARDIZATION ON ------------

--- 50-50 MANOVA ver. 1.71 --- 43 objects -- 312 responses:
Source      DF    exVarSS   nPC  nBu  exVarPC  exVarBU  p-Value
BioRep      2     0.134434  2    11   0.842    1.000    0.000000
Atmos       2     0.062564  2    11   0.806    1.000    0.000496
Day         2     0.110177  2    11   0.812    1.000    0.000000
Temp        1     0.090016  2    11   0.809    1.000    0.000000
Atmos*Day   3(4)  0.018259  2    11   0.811    1.000    0.929734
Atmos*Temp  2     0.028598  2    11   0.810    1.000    0.102187
Day*Temp    2     0.027059  2    11   0.800    1.000    0.348228
Error       28    0.428493  - STANDARDIZATION ON ------------
Adjusted means as curves
[Figure: adjusted means for Day = 2, 4 and 7, plotted as curves across the wavelengths]
Effect of Day - 312 single response p-values
- Ordinary significance tests are no longer suitable
  - Many significant results are caused by random variation alone (since there are several tests/responses)
- The p-values need to be adjusted
  - so that they become interpretable
Adjusted p-values
- So that the experimentwise (or familywise) error rate is controlled
- Bonferroni correction (classical method)
  - pAdj = min(1, #responses × pRaw)
  - Conservative upper bound (in practice often too conservative)
  - Dependence among responses is not taken into account
- Modern methods
  - Make active use of the dependence among responses
  - Permutation tests
  - Rotation tests
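A minimal sketch of the Bonferroni formula above in Python, assuming NumPy is available. The function name `bonferroni_adjust` and its optional `n_tests` argument are illustrative only and are not taken from any program mentioned in these slides. The cap at 1 keeps every adjusted value a valid probability.

```python
import numpy as np

def bonferroni_adjust(p_raw, n_tests=None):
    """Bonferroni-adjusted p-values: pAdj = min(1, #responses * pRaw).

    n_tests defaults to the number of p-values supplied; it can be set
    explicitly when only some of the raw p-values are passed in.
    """
    p = np.asarray(p_raw, dtype=float)
    m = p.size if n_tests is None else n_tests
    return np.minimum(1.0, m * p)

# Three hypothetical raw p-values from three responses
adj_small = bonferroni_adjust([0.001, 0.02, 0.4])
# Smallest raw p-value among the 312 wavelengths in this experiment
adj_best = bonferroni_adjust([0.000060], n_tests=312)
```

For the most significant wavelength (pRaw = 0.000060), Bonferroni gives 312 × 0.000060 ≈ 0.0187, about 14 times the rotation-based pAdjFWE of 0.001340 reported for that wavelength in the tables of adjusted p-values.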
Assume a regression model
(simplified model without constant term)
$Y_{n\times q} = X_{n\times r} B_{r\times q} + E_{n\times q}$, $\quad H_0: B = 0$
- Separate F-tests for each response
  - Random variables: $F_1, F_2, \dots, F_q$
  - Observed values: $f_1, f_2, \dots, f_q$
- Maximal F-value (= minimum p-value) obtained for response number $k$
  - Raw p-value: $p_k = P(F_k \ge f_k)$
  - Adjusted p-value: $\tilde{p}_k = P\left(\max_{i=1,\dots,q} F_i \ge f_k\right)$
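Adjusting the minimum p-value by permutations
$f_{\max} = \max_{i=1,\dots,q}(f_i) = G(X, Y)$
- For $m = 1, 2, \dots, M$:
  - permute the data ($Y \leftarrow P^{(m)} Y$)
  - compute the maximal F-statistic from these data: $f^{(m)}_{\max} = G(X, P^{(m)} Y)$
- Compute the p-value as $p = \dfrac{\#\left(f^{(m)}_{\max} \ge f_{\max}\right)}{M + 1}$, where the observed data set is counted as one of the $M + 1$ data sets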
How is dependence handled?
- Estimate of the covariance matrix under $H_0$: $\frac{1}{n} Y^T Y$
- Estimate based on permuted data: $\frac{1}{n} (P^{(m)} Y)^T P^{(m)} Y = \frac{1}{n} Y^T P^{(m)T} P^{(m)} Y = \frac{1}{n} Y^T Y$, since $P^{(m)T} P^{(m)} = I$
- The permutation test is a conditional test
  - Conditioned on the covariance matrix estimate
  - Conditioned on sufficient statistics for the unknown parameters
- Fisher's exact test for 2×2 contingency tables
  - is the most famous conditional test
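A quick numerical check of the identity above, using a hypothetical 6 × 3 data matrix and NumPy: because a permutation matrix satisfies $P^T P = I$, permuting the rows of Y leaves the covariance estimate $\frac{1}{n} Y^T Y$ unchanged, which is why the permutation test is conditional on that estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 6, 3
Y = rng.standard_normal((n, q))        # hypothetical data under H0 (Y = E)
P = np.eye(n)[rng.permutation(n)]      # random permutation matrix P^(m)

S_obs = Y.T @ Y / n                    # covariance estimate under H0
S_perm = (P @ Y).T @ (P @ Y) / n       # same estimate computed from permuted data
```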
Conditional test under multivariate normality?
- Need the distribution of $Y$ (under $H_0$) conditioned on $Y^T Y$
- Answer:
  - $Y$ is distributed as $R Y_{obs}$,
  - where $Y_{obs}$ is the observed matrix
  - and where $R$ is a uniformly distributed orthogonal matrix (random rotation matrix)
- Relation to well-known tests
  - t-tests, F-tests, Hotelling's $T^2$ and Wilks' $\Lambda$ are special cases of rotation testing
  - But the distributions of these test statistics do not depend on $Y^T Y$
  - Conditioning is not needed
  - Simulations are not necessary
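One standard way to simulate a uniformly distributed (Haar) orthogonal matrix is to take the QR decomposition of a matrix of independent standard normal entries and fix the column signs. The sketch below uses that construction, which is a common technique and not necessarily the one implemented in 50-50 MANOVA; `random_orthogonal` is a hypothetical helper, and NumPy is assumed. It also checks that the rotated data have the same $Y^T Y$ as the observed data, so $R Y_{obs}$ stays within the conditioning set.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix from the uniform (Haar) distribution."""
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    # QR is unique only up to column signs; multiplying each column of Q by
    # the sign of the matching diagonal entry of R makes Q exactly Haar-distributed
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(5)
Y_obs = rng.standard_normal((8, 3))      # hypothetical observed data
R_m = random_orthogonal(8, rng)          # simulated random rotation R^(m)
Y_rot = R_m @ Y_obs
```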
Adjusting the minimum p-value by rotations
$f_{\max} = \max_{i=1,\dots,q}(f_i) = G(X, Y)$
- For $m = 1, 2, \dots, M$:
  - simulate rotated data ($Y \leftarrow R^{(m)} Y$), where $R^{(m)}$ is a simulated random rotation matrix
  - compute the maximal F-statistic from these data: $f^{(m)}_{\max} = G(X, R^{(m)} Y)$
- Compute the p-value as $p = \dfrac{\#\left(f^{(m)}_{\max} \ge f_{\max}\right)}{M + 1}$, where the observed data set is counted as one of the $M + 1$ data sets
- In practice, a much more efficient algorithm is applied
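A naive sketch of the rotation loop, assuming NumPy, the regression model without a constant term, and Haar-distributed rotations generated by QR with sign correction. The helper names `f_statistics`, `random_orthogonal` and `rotation_min_p` are hypothetical. Every draw simulates a full n × n orthogonal matrix; this sketch does not attempt to reproduce the more efficient algorithm that the slide says is used in practice.

```python
import numpy as np

def f_statistics(X, Y):
    """Separate F-statistics for H0: B = 0 in Y = X B + E (no constant term)."""
    n, r = X.shape
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ B_hat
    ss_reg = np.sum(fitted ** 2, axis=0)
    ss_res = np.sum((Y - fitted) ** 2, axis=0)
    return (ss_reg / r) / (ss_res / (n - r))

def random_orthogonal(n, rng):
    """Haar-distributed n x n orthogonal matrix via QR with sign correction."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

def rotation_min_p(X, Y, M=999, seed=None):
    """Monte Carlo p-value for G(X, Y) = max_i f_i using random rotations R^(m) Y."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    f_max = f_statistics(X, Y).max()               # observed maximal F-statistic
    count = 0
    for _ in range(M):
        Y_rot = random_orthogonal(n, rng) @ Y      # rotated data R^(m) Y
        if f_statistics(X, Y_rot).max() >= f_max:
            count += 1
    # The observed data set counts as one of the M + 1 data sets
    return (count + 1) / (M + 1)

# Hypothetical data: response 0 depends strongly on X, responses 1-3 are noise
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2))
Y = rng.standard_normal((20, 4))
Y[:, 0] += 3.0 * X[:, 0]
p = rotation_min_p(X, Y, M=199, seed=2)
```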
Adjusting the other p-values
(permutations or rotations)
- Remove the response with the minimum p-value
- Adjust the minimum p-value in the new data set
- and so on
- Enforce monotonicity
- All calculations can be done simultaneously
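The steps above can be sketched as a step-down procedure in which one set of simulated rotations serves all responses at once: for each simulated data set, the successive maxima over "the remaining responses" are computed in a single pass. This is a minimal sketch assuming NumPy and the same no-intercept regression model; `stepdown_adjusted_p` and its helpers are hypothetical names, and swapping `random_orthogonal(n, rng) @ Y` for a row permutation of Y would give the permutation version.

```python
import numpy as np

def f_statistics(X, Y):
    """Separate F-statistics for H0: B = 0 in Y = X B + E (no constant term)."""
    n, r = X.shape
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ B_hat
    ss_res = np.sum((Y - fitted) ** 2, axis=0)
    return (np.sum(fitted ** 2, axis=0) / r) / (ss_res / (n - r))

def random_orthogonal(n, rng):
    """Haar-distributed n x n orthogonal matrix via QR with sign correction."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

def stepdown_adjusted_p(X, Y, M=999, seed=None):
    """Step-down adjusted p-values for all q responses from one set of rotations."""
    rng = np.random.default_rng(seed)
    f_obs = f_statistics(X, Y)
    order = np.argsort(-f_obs)               # most significant response first
    f_sorted = f_obs[order]
    counts = np.zeros(f_obs.size)
    for _ in range(M):
        f_sim = f_statistics(X, random_orthogonal(Y.shape[0], rng) @ Y)[order]
        # Entry j (0-based): maximal simulated F after removing the j most
        # significant responses, i.e. over response j and all less significant ones
        succ_max = np.maximum.accumulate(f_sim[::-1])[::-1]
        counts += succ_max >= f_sorted
    p_sorted = (counts + 1) / (M + 1)
    p_sorted = np.maximum.accumulate(p_sorted)   # enforce monotonicity
    p_adj = np.empty_like(p_sorted)
    p_adj[order] = p_sorted                      # back to the original response order
    return p_adj

# Hypothetical data: response 0 depends strongly on X, responses 1-3 are noise
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2))
Y = rng.standard_normal((20, 4))
Y[:, 0] += 3.0 * X[:, 0]
p_adj = stepdown_adjusted_p(X, Y, M=199, seed=2)
```

The first entry in the sorted order equals the single-step adjusted minimum p-value; later entries use maxima over fewer responses, and the running maximum keeps the adjusted p-values in the same order as the raw ones.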
Permutation test or rotation test
- Exact permutation testing
  - The only assumption: independent observations
  - Useless with few observations (too few distinct permutations)
  - Useless for complex ANOVA and regression models
- Exact rotation testing
  - Assumes multivariate normality
  - Does not need as many observations as permutation testing
  - Can be used for complex ANOVA and regression models
  - F-test ⊂ rotation test (the F-test is a special case of the rotation test)
Adjusted p-values (FWE)
- Non-adjusted p-values (RAW)
  - False significance at the 1% level is expected in 1% of all the investigated responses (those with a true null hypothesis)
  - If you have 5000 responses …..
- "Classically" adjusted p-values (FWE)
  - False significance at the 1% level is expected in no more than 1% of all experiments where the method is applied
  - The experimentwise (or familywise) error rate is controlled
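The difference between the two error rates can be seen in a small simulation. The sketch below assumes a hypothetical scenario in which all 5000 null hypotheses are true and the tests are independent, so every raw p-value is uniform on (0, 1). Bonferroni adjustment stands in for a "classical" FWE method because it is simple under independence. Across 1000 simulated experiments the code records how many raw p-values fall below 1%, and how often at least one false significance occurs.

```python
import numpy as np

# Hypothetical scenario: 5000 independent responses, all null hypotheses true
rng = np.random.default_rng(7)
n_responses, alpha, n_experiments = 5000, 0.01, 1000

false_raw = np.empty(n_experiments)   # false significances per experiment (RAW)
any_raw = 0                           # experiments with at least one RAW false significance
any_fwe = 0                           # same, after Bonferroni adjustment (controls FWE)
for e in range(n_experiments):
    p_raw = rng.uniform(size=n_responses)
    n_sig = int(np.sum(p_raw < alpha))
    false_raw[e] = n_sig
    any_raw += int(n_sig > 0)
    any_fwe += int(np.any(np.minimum(1.0, n_responses * p_raw) < alpha))

mean_false_raw = false_raw.mean()          # close to 1% of 5000 = 50
fwe_raw = any_raw / n_experiments          # close to 1
fwe_bonferroni = any_fwe / n_experiments   # about 0.01 or less
```

With raw p-values, roughly 50 false significances per experiment are expected, so nearly every experiment contains at least one; after FWE adjustment only about 1% of the experiments do.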
False Discovery Rate (FDR)
- Adjusted p-values according to the False Discovery Rate
  - False significance at the 1% level is expected in 1% of all cases (responses) reported as significant at the 1% level
  - If you have 5000 responses and 200 are reported as significant at the 1% level, you would expect two of these to be false
- The term "q-values" has been proposed instead of "adjusted p-values"
Calculation of FDR adjusted p-values
- Several methods exist
- Most of them do not handle the dependence among the responses
  - but they are OK if the "weak dependence requirement" is met
- New variant
  - based on rotations (or alternatively permutations)
  - handles any kind of dependence
  - conservative compared with other methods,
    since the method does not involve an estimate of the number of responses with true null hypotheses
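For comparison, here is a sketch of the classical Benjamini-Hochberg step-up procedure, one of the existing FDR methods that does not model the dependence among responses. It is valid under independence and certain kinds of positive dependence. This is not the rotation-based variant described above. `bh_adjust` is a hypothetical name, and NumPy is assumed.

```python
import numpy as np

def bh_adjust(p_raw):
    """Benjamini-Hochberg FDR-adjusted p-values (step-up procedure)."""
    p = np.asarray(p_raw, dtype=float)
    q = p.size
    order = np.argsort(p)                              # ascending raw p-values
    ranked = p[order] * q / np.arange(1, q + 1)        # p_(i) * q / i
    # Step-up: running minimum from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(q)
    adjusted[order] = np.minimum(ranked, 1.0)          # back to the original order
    return adjusted

# Four hypothetical raw p-values
adj = bh_adjust([0.01, 0.04, 0.03, 0.2])
```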
Adjusted p-values (first 30 wavelengths)
(rankNr = rank of the raw p-value among the 312 wavelengths; varNr = wavelength number; varName = wavenumber in cm-1)

rankNr  varNr  varName  pRaw      pAdjFDR   pAdjFWE
151     1      1200.6   0.044326  0.089791  0.308669
153     2      1199.64  0.044794  0.090035  0.310380
152     3      1198.67  0.044399  0.089791  0.308669
149     4      1197.71  0.043170  0.088950  0.303313
148     5      1196.74  0.041301  0.085699  0.294036
145     6      1195.78  0.038873  0.082265  0.282597
143     7      1194.82  0.036089  0.077248  0.268276
140     8      1193.85  0.033120  0.072581  0.252768
137     9      1192.89  0.030078  0.067314  0.236018
135     10     1191.92  0.027126  0.061613  0.218551
132     11     1190.96  0.024367  0.056594  0.202177
130     12     1189.99  0.021827  0.051500  0.186164
125     13     1189.03  0.019559  0.047599  0.173055
124     14     1188.07  0.017581  0.043434  0.159513
120     15     1187.1   0.015869  0.040234  0.147983
117     16     1186.14  0.014414  0.037658  0.137538
115     17     1185.17  0.013182  0.034951  0.128280
113     18     1184.21  0.012153  0.032861  0.120358
110     19     1183.24  0.011315  0.031373  0.113914
108     20     1182.28  0.010663  0.030080  0.108626
104     21     1181.32  0.010197  0.029590  0.105095
102     22     1180.35  0.009898  0.029204  0.102575
101     23     1179.39  0.009792  0.029204  0.101727
103     24     1178.42  0.009918  0.029204  0.102743
106     25     1177.46  0.010350  0.029702  0.106292
109     26     1176.49  0.011244  0.031373  0.113354
114     27     1175.53  0.012874  0.034500  0.125852
119     28     1174.57  0.015757  0.040234  0.147242
129     29     1173.6   0.021030  0.049988  0.181178
138     30     1172.64  0.031110  0.069138  0.241854
Adjusted p-values (30 most significant wavelengths)
rankNr  varNr  varName  pRaw      pAdjFDR   pAdjFWE
1       180    1027.99  0.000060  0.001281  0.001340
2       181    1027.03  0.000062  0.001281  0.001364
3       179    1028.95  0.000063  0.001281  0.001385
4       182    1026.06  0.000067  0.001281  0.001471
5       178    1029.92  0.000070  0.001281  0.001509
6       49     1154.32  0.000070  0.001281  0.001516
7       48     1155.28  0.000070  0.001281  0.001526
8       50     1153.35  0.000075  0.001281  0.001622
9       47     1156.24  0.000077  0.001281  0.001656
10      183    1025.1   0.000079  0.001281  0.001690
11      177    1030.88  0.000081  0.001281  0.001726
12      51     1152.39  0.000087  0.001323  0.001826
13      46     1157.21  0.000092  0.001351  0.001912
14      176    1031.85  0.000098  0.001352  0.002009
15      184    1024.13  0.000098  0.001352  0.002038
16      52     1151.42  0.000108  0.001433  0.002218
17      175    1032.81  0.000121  0.001527  0.002454
18      45     1158.17  0.000123  0.001527  0.002490
19      185    1023.17  0.000130  0.001562  0.002607
20      53     1150.46  0.000146  0.001701  0.002887
21      174    1033.78  0.000153  0.001728  0.003001
22      186    1022.2   0.000180  0.001968  0.003465
23      44     1159.14  0.000186  0.001983  0.003574
24      173    1034.74  0.000195  0.002025  0.003705
25      54     1149.49  0.000213  0.002143  0.003998
26      172    1035.7   0.000247  0.002392  0.004533
27      187    1021.24  0.000259  0.002451  0.004749
28      171    1036.67  0.000310  0.002848  0.005586
29      43     1160.1   0.000325  0.002923  0.005855
30      55     1148.53  0.000334  0.002940  0.006000
p-values
[Figure: RAW, FWE, FDR and Bonferroni (Bonfer) p-values plotted for all 312 wavelengths; vertical axis from 0 to 1]
Rotation Tests - Conclusion
- Simulation principle for computing exact Monte Carlo p-values for any test statistic
- Based on the multivariate normal distribution
- Generalisation of classical tests
- Related to permutation testing
- Useful for computing adjusted p-values (F-tests)
  - FWE, FDR
  - General linear models (ANOVA and regression)
  - Implemented in the 50-50 MANOVA program (www.matforsk.no/ola)