Nonparametric Methods III - TIGP Bioinformatics Program


Nonparametric Methods III
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
1
PART 4: Bootstrap and Permutation Tests
Introduction
• References
• Bootstrap Tests
• Permutation Tests
• Cross-validation
• Bootstrap Regression
• ANOVA

2
References
• Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.
• http://cran.r-project.org/doc/contrib/FoxCompanion/appendix-bootstrapping.pdf
• http://cran.r-project.org/bin/macosx/2.1/check/bootstrap-check.ex
• http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf

3
Hypothesis Testing (1)
• A statistical hypothesis test is a method of making statistical decisions from and about experimental data.
• Null-hypothesis testing just answers the question of “how well the findings fit the possibility that chance factors alone might be responsible.”
• This is done by asking and answering a hypothetical question.
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
4
Hypothesis Testing (2)
Hypothesis testing is largely the product of Ronald
Fisher, Jerzy Neyman, Karl Pearson and (son) Egon
Pearson. Fisher was an agricultural statistician who
emphasized rigorous experimental design and
methods to extract a result from few samples
assuming Gaussian distributions. Neyman (who
teamed with the younger Pearson) emphasized
mathematical rigor and methods to obtain more
results from many samples and a wider range of
distributions. Modern hypothesis testing is an
(extended) hybrid of the Fisher vs. Neyman/Pearson
formulation, methods and terminology developed in
the early 20th century.
5
Hypothesis Testing (3)
6
Hypothesis Testing (4)
7
Hypothesis Testing (5)
8
Hypothesis Testing (7)
• Parametric Tests:
• Nonparametric Tests:
  • Bootstrap Tests
  • Permutation Tests
9
Confidence Intervals vs. Hypothesis Testing (1)
• Interval estimation (“confidence intervals”) and hypothesis testing are two different ways of expressing the same information.
http://www.une.edu.au/WebStat/unit_materials/c5_inferential_statistics/confidence_interv_hypo.html
10
Confidence Intervals vs. Hypothesis Testing (2)
• If the exact p-value is reported, then the relationship between confidence intervals and hypothesis testing is very close. However, the objective of the two methods is different:
  • Hypothesis testing relates to a single conclusion of statistical significance vs. no statistical significance.
  • Confidence intervals provide a range of plausible values for your population.
http://www.nedarc.org/nedarc/analyzingData/advancedStatistics/convidenceVsHypothesis.html
11
Confidence Intervals vs. Hypothesis Testing (3)
• Which one?
  • Use hypothesis testing when you want to do a strict comparison with a pre-specified hypothesis and significance level.
  • Use confidence intervals to describe the magnitude of an effect (e.g., mean difference, odds ratio, etc.) or when you want to describe a single sample.
http://www.nedarc.org/nedarc/analyzingData/advancedStatistics/convidenceVsHypothesis.html
12
P-value
http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf
13
Achieved Significance Level (ASL)
Definition:
A hypothesis test is a way of deciding whether or not the data decisively reject the hypothesis $H_0$.
Definition:
The achieved significance level of the test (ASL) is defined as
$$\mathrm{ASL} = P(\hat{\theta}^* \ge \hat{\theta} \mid H_0).$$
The smaller the ASL, the stronger the evidence that $H_0$ is false.
The ASL is an estimate of the p-value obtained by permutation and bootstrap methods.
https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf
14
Bootstrap Tests
• Methodology
• Flowchart
• R code
15
Bootstrap Tests
• Beran (1988) showed that bootstrap inference is refined when the quantity bootstrapped is asymptotically pivotal.
• It is often used as a robust alternative to inference based on parametric assumptions.
http://socserv.mcmaster.ca/jfox/Books/Companion/appendixbootstrapping.pdf
16
Hypothesis Testing by a Pivot
Pivot or pivotal quantity: a function of observations whose distribution does
not depend on unknown parameters.
http://en.wikipedia.org/wiki/Pivotal_quantity
Examples:
1. A pivot: $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$, when the $X_i$ are iid $N(\mu, \sigma^2)$, $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$, and $\sigma$ is known.
2. An asymptotic pivot: $T = \dfrac{\bar{X} - \mu}{S / \sqrt{n}} \xrightarrow{D} N(0, 1)$ as $n \to \infty$, when the $X_i$ are iid $N(\mu, \sigma^2)$, $\sigma$ is unknown, and $S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$.
17
One Sample Bootstrap Tests
• The T statistic can be regarded as a pivot or an asymptotic pivot when the data are normally distributed.
• Bootstrap T tests can be applied when the data are not normally distributed.
18
Bootstrap T tests
• Flowchart
• R code
19
Flowchart of Bootstrap T Tests
Data: $\mathbf{x} = (x_1, x_2, ..., x_n) \Rightarrow \hat{\theta} = s(\mathbf{x})$ and $t_0 = \dfrac{\hat{\theta} - \theta_0}{\hat{\sigma}(\hat{\theta})}$
Bootstrap B times: $\mathbf{x}^{*1}, \mathbf{x}^{*2}, ..., \mathbf{x}^{*B}$, giving $\hat{\theta}^*_1, \hat{\theta}^*_2, ..., \hat{\theta}^*_B$ and $t^*_b = \dfrac{\hat{\theta}^*_b - \theta_0}{\hat{\sigma}(\hat{\theta}^*_b)}$
$\mathrm{ASL}_{Boot} = \#\{t^*_b \ge t_0\} / B$
20
Bootstrap T Tests by R
21
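A minimal R sketch of the one-sample bootstrap T test in the flowchart above. The data vector x, the null value mu0, the number of replications B, and the shift of the data so that H0 holds before resampling (the usual construction for bootstrap hypothesis tests) are illustrative assumptions, not the original slide's example.

set.seed(1)
x   <- c(10, 27, 31, 40, 46, 50, 52, 104, 146)  # hypothetical sample
mu0 <- 30                                       # null value, H0: mu = mu0
B   <- 1000
n   <- length(x)
t0  <- (mean(x) - mu0) / (sd(x) / sqrt(n))      # observed T statistic

x.null <- x - mean(x) + mu0                     # shift the data so that H0 holds
t.star <- replicate(B, {
  xb <- sample(x.null, n, replace = TRUE)       # one bootstrap sample
  (mean(xb) - mu0) / (sd(xb) / sqrt(n))         # bootstrap T statistic
})

ASL.boot <- mean(t.star >= t0)                  # achieved significance level
ASL.boot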
An Example of Bootstrap T Tests by R
22
Bootstrap Tests by the “BCa” Method
• The BCa percentile method is an efficient method to generate bootstrap confidence intervals.
• There is a correspondence between confidence intervals and hypothesis testing.
• So, we can use the BCa percentile method to test whether H0 is true.
• Example: use BCa to calculate a p-value.
23
BCa Confidence Intervals:
• Use the R function “boot.ci” in package “boot”
• Use the R function “bcanon” in package “bootstrap”
• http://qualopt.eivd.ch/stats/?page=bootstrap
• http://www.stata.com/capabilities/boot.html
24
http://finzi.psych.upenn.edu/R/library/boot/DESCRIPTION
25
An Example of “boot.ci(boot)” in R
26
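A minimal sketch of boot.ci() from the “boot” package; the skewed sample, the statistic (the mean), and the number of replications R are illustrative assumptions.

library(boot)
set.seed(1)
x <- rexp(30, rate = 0.5)                      # hypothetical skewed sample

mean.fun <- function(data, indices) mean(data[indices])  # statistic in the form boot() expects

boot.out <- boot(data = x, statistic = mean.fun, R = 2000)
boot.ci(boot.out, conf = 0.95, type = "bca")   # BCa confidence interval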
http://finzi.psych.upenn.edu/R/library/bootstrap/DESCRIPTION
27
An Example of “bcanon(bootstrap)” in R
28
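A minimal sketch of bcanon() from the “bootstrap” package; the sample, the statistic, and the alpha levels are illustrative assumptions.

library(bootstrap)
set.seed(1)
x <- rexp(30, rate = 0.5)                      # hypothetical skewed sample

bca.out <- bcanon(x, nboot = 2000, theta = mean,
                  alpha = c(0.025, 0.975))     # BCa endpoints for a 95% interval
bca.out$confpoints                             # the BCa confidence points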
BCa by http://qualopt.eivd.ch/stats/?page=bootstrap
29
Use BCa to calculate p-value by R
30
Two Sample Bootstrap Tests
• Flowchart
• R code
31
Flowchart of Two-Sample Bootstrap Tests
Sample 1: $\mathbf{y} = (y_1, y_2, ..., y_n)$
Sample 2: $\mathbf{x} = (x_1, x_2, ..., x_m)$
Combine ($m + n = N$): combined data $\mathbf{d} = (d_1, d_2, ..., d_n, d_{n+1}, ..., d_{n+m}) = (\mathbf{y}, \mathbf{x}) \Rightarrow \hat{\theta} = s(\mathbf{y}) - s(\mathbf{x})$
Bootstrap B times: $\mathbf{d}^*_1 = (\mathbf{y}^*_1, \mathbf{x}^*_1),\ \mathbf{d}^*_2 = (\mathbf{y}^*_2, \mathbf{x}^*_2),\ ...,\ \mathbf{d}^*_B = (\mathbf{y}^*_B, \mathbf{x}^*_B)$
$\hat{\theta}^*_b = s(\mathbf{y}^*_b) - s(\mathbf{x}^*_b)$
$\mathrm{ASL}_{Boot} = \#\{\hat{\theta}^*_b \ge \hat{\theta}\} / B$
32
Two-Sample Bootstrap Tests by R
33
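A minimal sketch of the two-sample bootstrap test in the flowchart above: under H0 the two groups come from the same distribution, so both are resampled from the combined data. The two samples and B are illustrative assumptions, not the original slide's example.

set.seed(1)
y <- c(94, 197, 16, 38, 99, 141, 23)           # hypothetical sample 1
x <- c(52, 104, 146, 10, 51, 30, 40, 27, 46)   # hypothetical sample 2
B <- 2000
n <- length(y); m <- length(x)
d <- c(y, x)                                   # combined data
theta.hat <- mean(y) - mean(x)                 # observed statistic

theta.star <- replicate(B, {
  y.star <- sample(d, n, replace = TRUE)       # resample group 1 from d
  x.star <- sample(d, m, replace = TRUE)       # resample group 2 from d
  mean(y.star) - mean(x.star)
})

ASL.boot <- mean(theta.star >= theta.hat)      # achieved significance level
ASL.boot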
Output (1)
34
Output (2)
35
Permutation Tests
• Methodology
• Flowchart
• R code
36
Permutation
• In several fields of mathematics, the term permutation is used with different but closely related meanings. They all relate to the notion of (re-)arranging elements from a given finite set into a sequence.
http://en.wikipedia.org/wiki/Permutation
37
Permutation Tests
• A permutation test is also called a randomization test, re-randomization test, or an exact test.
• If the labels are exchangeable under the null hypothesis, then the resulting tests yield exact significance levels.
• Confidence intervals can then be derived from the tests.
• The theory has evolved from the works of R.A. Fisher and E.J.G. Pitman in the 1930s.
http://en.wikipedia.org/wiki/Pitman_permutation_test
38
Applications of Permutation Tests (1)
We can use a permutation test only when
we can see how to resample in a way that
is consistent with the study design and
with the null hypothesis.
http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf
39
Applications of Permutation Tests (2)
• Two-sample problems, when the null hypothesis says that the two populations are identical. We may wish to compare population means, proportions, standard deviations, or other statistics.
• Matched pairs designs, when the null hypothesis says that there are only random differences within pairs. A variety of comparisons is again possible.
• Relationships between two quantitative variables, when the null hypothesis says that the variables are not related. The correlation is the most common measure of association, but not the only one.
http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf
40
Inference by Permutation Tests
A traditional way is to assume the hypotheses $F_a \sim N(\mu_a, \sigma^2)$ and $F_b \sim N(\mu_b, \sigma^2)$; the null hypothesis then becomes $\mu_a = \mu_b$.
Under $H_0$, the statistic $\hat{\theta} = \bar{X}_a - \bar{X}_b$ can be modelled as a normal distribution with mean 0 and variance $\sigma^2_{\hat{\theta}} = \sigma^2 \left( \dfrac{1}{m} + \dfrac{1}{n} \right)$.
The ASL is then computed by
$$\mathrm{ASL} = \int_{\hat{\theta}}^{\infty} \frac{1}{\sqrt{2\pi \hat{\sigma}^2_{\hat{\theta}}}} \, e^{-(\hat{\theta}^*)^2 / (2 \hat{\sigma}^2_{\hat{\theta}})} \, d\hat{\theta}^*,$$
where $\sigma$ is unknown and has to be estimated from the data by
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (X_{ai} - \bar{X}_a)^2 + \sum_{i=1}^{m} (X_{bi} - \bar{X}_b)^2}{m + n - 2}.$$
We will reject $H_0$ if $\mathrm{ASL} < \alpha$.
https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf
41
Flowchart of the Permutation Test for Mean Shift in One Sample
Sample: $x_1, x_2, ..., x_n, x_{n+1}, x_{n+2}, ..., x_{n+m}$, with $n + m = N$
Observed partition: $O_{11} = \mathbf{x}_1$ (treatment group), $O_{12} = \mathbf{x}_2$ (control group), and $\hat{\theta} = s(\mathbf{x}_1) - s(\mathbf{x}_2)$
Partition into 2 subsets B times: $(G_{11}, G_{12}), (G_{21}, G_{22}), ..., (G_{B1}, G_{B2})$, giving $(\mathbf{x}^*_{11}, \mathbf{x}^*_{21}), (\mathbf{x}^*_{12}, \mathbf{x}^*_{22}), ..., (\mathbf{x}^*_{1B}, \mathbf{x}^*_{2B})$ (treatment group, control group)
$\hat{\theta}^*_b = s(\mathbf{x}^*_{1b}) - s(\mathbf{x}^*_{2b})$
$\mathrm{ASL}_{Perm} = \#\{\hat{\theta}^*_b \ge \hat{\theta}\} / B$, with $B \le \binom{N}{n}$
42
An Example for One Sample Permutation Test by R
http://mason.gmu.edu/~csutton/
43
EandTCh15a.txt
44
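A minimal sketch of the permutation test for a mean shift described in the flowchart above: the N = n + m observations are randomly re-partitioned into a "treatment" subset of size n and a "control" subset B times. The data values and group sizes are illustrative assumptions, not the contents of EandTCh15a.txt.

set.seed(1)
x <- c(94, 197, 16, 38, 99, 141, 23,           # first n = 7 values: treatment
       52, 104, 146, 10, 51, 30, 40, 27, 46)   # last m = 9 values: control
n <- 7; N <- length(x); B <- 5000

theta.hat <- mean(x[1:n]) - mean(x[-(1:n)])    # observed mean difference

theta.star <- replicate(B, {
  idx <- sample(N, n)                          # a random subset of size n
  mean(x[idx]) - mean(x[-idx])                 # difference for this partition
})

ASL.perm <- mean(theta.star >= theta.hat)      # achieved significance level
ASL.perm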
An Example of Output Results
45
Flowchart of the Permutation Test for Mean Shift in Two Samples
Sample 1: $\mathbf{y} = (y_1, y_2, ..., y_n)$
Sample 2: $\mathbf{x} = (x_1, x_2, ..., x_m)$
Combine ($m + n = N$): combined data $\mathbf{d} = (d_1, d_2, ..., d_n, d_{n+1}, ..., d_{n+m}) = (\mathbf{y}, \mathbf{x}) \Rightarrow \hat{\theta} = s(\mathbf{y}) - s(\mathbf{x})$
Partition into 2 subsets B times: $(G_{11}, G_{12}), (G_{21}, G_{22}), ..., (G_{B1}, G_{B2})$, giving $(\mathbf{y}^*_1, \mathbf{x}^*_1), (\mathbf{y}^*_2, \mathbf{x}^*_2), ..., (\mathbf{y}^*_B, \mathbf{x}^*_B)$ (treatment subgroup, control subgroup)
$\hat{\theta}^*_b = s(\mathbf{y}^*_b) - s(\mathbf{x}^*_b)$
$\mathrm{ASL}_{Perm} = \#\{\hat{\theta}^*_b \ge \hat{\theta}\} / B$, with $B \le \binom{N}{n}$
46
Bootstrap Tests vs. Permutation Tests
• The permutation test and the bootstrap test give very similar results.
• $\mathrm{ASL}_{Perm}$ is the exact probability when $B = \binom{N}{n}$.
• $\mathrm{ASL}_{Boot}$ is not an exact probability, but it is guaranteed to be an accurate estimate of the ASL as the number of bootstrap samples B goes to infinity.
https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf
47
Cross-validation
• Methodology
• R code
48
Cross-validation
• Cross-validation, sometimes called rotation estimation, is the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis.
  • The initial subset of data is called the training set.
  • The other subset(s) are called validation or testing sets.
http://en.wikipedia.org/wiki/Cross-validation
49
Overfitting Problems
• In statistics, overfitting is fitting a statistical model that has too many parameters.
• When the degrees of freedom in parameter selection exceed the information content of the data, this leads to arbitrariness in the final (fitted) model parameters, which reduces or destroys the ability of the model to generalize beyond the fitting data.
• The concept of overfitting is also important in machine learning.
• In both statistics and machine learning, in order to avoid overfitting, it is necessary to use additional techniques (e.g., cross-validation, early stopping, Bayesian priors on parameters, or model comparison) that can indicate when further training is not resulting in better generalization.
http://en.wikipedia.org/wiki/Overfitting
50
library(bootstrap)
?crossval
51
An Example of Cross-validation by R
52
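A minimal sketch of crossval() from the “bootstrap” package applied to a simple linear regression; the simulated data, the least-squares fitting functions, and the choice of 5 folds are illustrative assumptions, not the original slide's example.

library(bootstrap)
set.seed(1)
n <- 50
x <- matrix(rnorm(n), ncol = 1)                # one predictor
y <- 2 + 3 * x[, 1] + rnorm(n)                 # hypothetical response

theta.fit     <- function(x, y) lsfit(x, y)                 # fit on the training folds
theta.predict <- function(fit, x) cbind(1, x) %*% fit$coef  # predict on the held-out fold

cv <- crossval(x, y, theta.fit, theta.predict, ngroup = 5)

mean((y - cv$cv.fit)^2)                          # cross-validated prediction error
mean((y - theta.predict(theta.fit(x, y), x))^2)  # apparent (resubstitution) error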
Output
53
Bootstrap Regression
• Bootstrapping pairs:
  Resample from the sample pairs $\{x_i, y_i\}$.
• Bootstrapping residuals:
  1. Fit $y_i = x_i \hat{\beta}$ to the original sample and obtain the residuals.
  2. Resample from the residuals.
54
Bootstrapping Pairs by R
http://www.stat.uiuc.edu/~babailey/stat328/lab7.html
55
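A minimal sketch of bootstrapping pairs for a simple linear regression: resample the (x_i, y_i) pairs with replacement and refit the model. The simulated data and B are illustrative assumptions, not the lab's original example.

set.seed(1)
n <- 30
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 2)              # hypothetical (x, y) pairs
B <- 1000

boot.coef <- t(replicate(B, {
  idx <- sample(n, replace = TRUE)             # resample the pairs {x_i, y_i}
  coef(lm(y[idx] ~ x[idx]))                    # refit on the resampled pairs
}))

apply(boot.coef, 2, sd)                        # bootstrap SEs of intercept and slope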
Output
56
Bootstrapping Residuals by R
http://www.stat.uiuc.edu/~babailey/stat328/lab7.html
57
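A minimal sketch of bootstrapping residuals for the same simple linear regression: fit the model once, then resample the residuals and add them to the fitted values. The simulated data and B are illustrative assumptions.

set.seed(1)
n <- 30
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 2)              # hypothetical (x, y) pairs
B <- 1000

fit   <- lm(y ~ x)                             # 1. fit the original sample
res   <- residuals(fit)
y.hat <- fitted(fit)

boot.coef <- t(replicate(B, {
  y.star <- y.hat + sample(res, replace = TRUE)  # 2. resample the residuals
  coef(lm(y.star ~ x))                           # refit with the new responses
}))

apply(boot.coef, 2, sd)                        # bootstrap SEs of intercept and slope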
Bootstrapping residuals
58
ANOVA
• When random errors follow a normal distribution:
• When random errors do not follow a normal distribution:
  • Bootstrap tests:
  • Permutation tests:
59
An Example of ANOVA by R (1)
• Example: Twenty lambs are randomly assigned to three different diets. The weight gain (in two weeks) is recorded. Is there a difference among the diets?
• Reference:
  • http://mcs.une.edu.au/~stat261/Bootstrap/bootstrap.R
60
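A minimal sketch of a permutation test for the one-way ANOVA in the lamb example: permute the diet labels and recompute the F statistic. The weight-gain values below are simulated placeholders, not the original data from the reference.

set.seed(1)
gain <- c(rnorm(7, 5), rnorm(7, 6), rnorm(6, 5.5))   # hypothetical weight gains
diet <- factor(rep(c("A", "B", "C"), c(7, 7, 6)))    # 20 lambs, 3 diets
B    <- 5000

F.obs <- anova(lm(gain ~ diet))[1, "F value"]        # classical F statistic

F.star <- replicate(B, {
  diet.star <- sample(diet)                          # permute the diet labels
  anova(lm(gain ~ diet.star))[1, "F value"]
})

ASL.perm <- mean(F.star >= F.obs)                    # permutation p-value
ASL.perm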
An Example of ANOVA by R (1)
61
An Example of ANOVA by R (2)
62
An Example of ANOVA by R (3)
63
Output (1)
64
Output (2)
65
Output (3)
66
Output (4)
67
Output (5)
68
Output (6)
69
Output (7)
70
The Second Example of ANOVA by R (1)
• Data source:
  • http://finzi.psych.upenn.edu/R/library/rpart/html/kyphosis.html
• Reference:
  • http://www.stat.umn.edu/geyer/5601/examp/parm.html
• Kyphosis is a misalignment of the spine. The data are on 83 laminectomy (a surgical procedure involving the spine) patients. The predictor variables are age and age^2 (that is, a quadratic function of age), number (the number of vertebrae involved in the surgery), and start (the vertebra number of the first vertebra involved). The response is the presence or absence of kyphosis after the surgery (and perhaps caused by it).
71
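A minimal sketch in the spirit of the kyphosis example: a parametric bootstrap test of the logistic regression, comparing the observed drop in deviance with its distribution when responses are simulated under the null (intercept-only) model. The model formula and B follow the description above; the details of the referenced example may differ.

library(rpart)                                 # provides the kyphosis data
set.seed(1)
data(kyphosis)
B <- 1000

full <- glm(Kyphosis ~ Age + I(Age^2) + Number + Start,
            family = binomial, data = kyphosis)
null <- glm(Kyphosis ~ 1, family = binomial, data = kyphosis)
dev.obs <- deviance(null) - deviance(full)     # observed drop in deviance

p.null <- fitted(null)                         # P(kyphosis present) under H0
dev.star <- replicate(B, {
  y.star <- rbinom(nrow(kyphosis), 1, p.null)  # simulate responses under H0
  f  <- glm(y.star ~ Age + I(Age^2) + Number + Start,
            family = binomial, data = kyphosis)
  n0 <- glm(y.star ~ 1, family = binomial, data = kyphosis)
  deviance(n0) - deviance(f)
})

mean(dev.star >= dev.obs)                      # parametric bootstrap p-value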
The Second Example of ANOVA by R (2)
72
The Second Example of ANOVA by R (3)
73
The Second Example of ANOVA by R (4)
74
Output (1)
Data = kyphosis
75
Output (2)
76
Output (3)
77
Output (4)
78
Output (5)
#deviance
#p-value
79
Output (6)
80
Exercises:
• Write your own programs similar to those examples presented in this talk.
• Write programs for those examples mentioned at the reference web pages.
• Write programs for the other examples that you know.

Practice Makes Perfect!
81