Nonparametric Methods III
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
PART 4: Bootstrap and Permutation Tests

- Introduction
- References
- Bootstrap Tests
- Permutation Tests
- Cross-validation
- Bootstrap Regression
- ANOVA
References
- Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.
- http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf
- http://cran.r-project.org/bin/macosx/2.1/check/bootstrap-check.ex
- http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf
Hypothesis Testing (1)
- A statistical hypothesis test is a method of making statistical decisions from and about experimental data.
- Null-hypothesis testing just answers the question of "how well the findings fit the possibility that chance factors alone might be responsible."
- This is done by asking and answering a hypothetical question.
- http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
Hypothesis Testing (2)
- Hypothesis testing is largely the product of Ronald Fisher, Jerzy Neyman, Karl Pearson, and (son) Egon Pearson. Fisher was an agricultural statistician who emphasized rigorous experimental design and methods to extract a result from few samples assuming Gaussian distributions.
Hypothesis Testing (3)
- Neyman (who teamed with the younger Pearson) emphasized mathematical rigor and methods to obtain more results from many samples and a wider range of distributions. Modern hypothesis testing is an (extended) hybrid of the Fisher vs. Neyman/Pearson formulation, methods, and terminology developed in the early 20th century.
Hypothesis Testing (7)
- Parametric Tests
- Nonparametric Tests:
  - Bootstrap Tests
  - Permutation Tests
Confidence Intervals vs. Hypothesis Testing (1)
- Interval estimation ("confidence intervals") and point estimation ("hypothesis testing") are two different ways of expressing the same information.
- http://www.une.edu.au/WebStat/unit_materials/c5_inferential_statistics/confidence_interv_hypo.html
Confidence Intervals vs. Hypothesis Testing (2)
- If the exact p-value is reported, then the relationship between confidence intervals and hypothesis testing is very close. However, the objectives of the two methods are different:
  - Hypothesis testing relates to a single conclusion of statistical significance vs. no statistical significance.
  - Confidence intervals provide a range of plausible values for the population.
Confidence Intervals vs. Hypothesis Testing (3)
- Which one?
  - Use hypothesis testing when you want to do a strict comparison with a pre-specified hypothesis and significance level.
  - Use confidence intervals to describe the magnitude of an effect (e.g., mean difference, odds ratio, etc.) or when you want to describe a single sample.
- http://www.nedarc.org/nedarc/analyzingData/advancedStatistics/convidenceVsHypothesis.html
P-value
- The p-value is the probability, computed assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually observed.
- http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf
Achieved Significance Level (ASL)
- Definition: a hypothesis test is a way of deciding whether or not the data decisively reject the hypothesis $H_0$.
- The achieved significance level (ASL) of the test is defined as $\mathrm{ASL} = P(\hat{\theta}^* \ge \hat{\theta} \mid H_0)$, where $\hat{\theta}$ is the observed value of the statistic and $\hat{\theta}^*$ is distributed according to the null hypothesis.
- The smaller the ASL, the stronger the evidence that $H_0$ is false.
- The ASL is an estimate of the p-value obtained by permutation and bootstrap methods.
- https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf
Bootstrap Tests
- Methodology
- Flowchart
- R code
Bootstrap Tests
- Beran (1988) showed that bootstrap inference is refined when the quantity bootstrapped is asymptotically pivotal.
- It is often used as a robust alternative to inference based on parametric assumptions.
- http://socserv.mcmaster.ca/jfox/Books/Companion/appendix-bootstrapping.pdf
Hypothesis Testing by a Pivot (1)
- A pivot, or pivotal quantity, is a function of observations whose distribution does not depend on unknown parameters.
- http://en.wikipedia.org/wiki/Pivotal_quantity
- Example of a pivot:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$
when $X_i \overset{\mathrm{iid}}{\sim} N(\mu, \sigma^2)$, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, and $\sigma$ is known.
Hypothesis Testing by a Pivot (2)
- An asymptotic pivot:
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \xrightarrow{D} N(0, 1) \text{ as } n \to \infty$$
when $X_i \overset{\mathrm{iid}}{\sim} N(\mu, \sigma^2)$ with $\sigma$ unknown, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.
One Sample Bootstrap Tests
- The T statistic can be regarded as a pivot or an asymptotic pivot when the data are normally distributed.
- Bootstrap T tests can be applied when the data are not normally distributed.

Bootstrap T Tests
- Flowchart
- R code
Flowchart of Bootstrap T Tests
1. From the data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, compute $\hat{\theta} = s(\mathbf{x})$ and the observed statistic $t_0 = \dfrac{\hat{\theta} - \theta_0}{\hat{\sigma}(\hat{\theta})}$.
2. Bootstrap $B$ times: draw resamples $\mathbf{x}^{*1}, \mathbf{x}^{*2}, \ldots, \mathbf{x}^{*B}$ and compute $t_b^* = \dfrac{\hat{\theta}_b^* - \hat{\theta}}{\hat{\sigma}(\hat{\theta}_b^*)}$ for $b = 1, \ldots, B$.
3. Estimate $\widehat{\mathrm{ASL}}_{\mathrm{Boot}} = \#\{t_b^* \ge t_0\} / B$.
Bootstrap T Tests by R
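The slide's code and output are not shown in the transcript; below is a minimal sketch of a one-sample bootstrap T test following the flowchart above. The data and the null value theta0 are hypothetical.

# One-sample bootstrap T test: a sketch with made-up data.
set.seed(1)
x <- rexp(30, rate = 1)               # a skewed sample, n = 30
theta0 <- 1                           # hypothesized mean under H0
B <- 2000

n <- length(x)
theta.hat <- mean(x)
t0 <- (theta.hat - theta0) / (sd(x) / sqrt(n))   # observed t statistic

t.star <- numeric(B)
for (b in 1:B) {
  xb <- sample(x, n, replace = TRUE)  # bootstrap resample
  # center at theta.hat: under the bootstrap, theta.hat plays the role of theta0
  t.star[b] <- (mean(xb) - theta.hat) / (sd(xb) / sqrt(n))
}

ASL.boot <- mean(t.star >= t0)        # estimated ASL (one-sided)
ASL.boot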
Bootstrap Tests by the BCa Method
- The BCa percentile method is an efficient method for generating bootstrap confidence intervals.
- There is a correspondence between confidence intervals and hypothesis testing.
- So, we can use the BCa percentile method to test whether $H_0$ is true.
- Example: use BCa to calculate the p-value.
BCa Confidence Intervals
- Use the R function "boot.ci" in the package "boot".
- Use the R function "bcanon" in the package "bootstrap".
- http://qualopt.eivd.ch/stats/?page=bootstrap
- http://www.stata.com/capabilities/boot.html
The R function "boot.ci" in package "boot"
- http://finzi.psych.upenn.edu/R/library/boot/DESCRIPTION
An Example of "boot.ci" in R
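The example and its output are not reproduced in the transcript; a minimal sketch of boot.ci with hypothetical data:

# BCa confidence interval via boot::boot.ci, with made-up data.
library(boot)

set.seed(1)
x <- rnorm(30, mean = 0.5)

# boot() expects a statistic of the form function(data, indices)
mean.fun <- function(d, i) mean(d[i])

b <- boot(x, statistic = mean.fun, R = 2000)
boot.ci(b, conf = 0.95, type = "bca")   # BCa percentile interval

If the null value $\theta_0$ falls outside the $1 - \alpha$ BCa interval, $H_0$ is rejected at level $\alpha$; this is how the interval yields a test.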
The R function "bcanon" in package "bootstrap"
- http://finzi.psych.upenn.edu/R/library/bootstrap/DESCRIPTION
An Example of "bcanon" in R
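Again, the slide's code is not in the transcript; a minimal sketch of bcanon with hypothetical data:

# BCa confidence points via bootstrap::bcanon, with made-up data.
library(bootstrap)   # companion package to Efron & Tibshirani (1993)

set.seed(1)
x <- rnorm(30, mean = 0.5)

# bcanon(x, nboot, theta, alpha) returns BCa confidence points
out <- bcanon(x, nboot = 2000, theta = mean, alpha = c(0.025, 0.975))
out$confpoints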
BCa
- http://qualopt.eivd.ch/stats/?page=bootstrap
Two Sample Bootstrap Tests
- Flowchart
- R code
Flowchart of Two-Sample Bootstrap Tests
1. Sample 1: $\mathbf{y} = (y_1, y_2, \ldots, y_n)$; Sample 2: $\mathbf{x} = (x_1, x_2, \ldots, x_m)$. Combine them ($m + n = N$) into $\mathbf{d} = (d_1, d_2, \ldots, d_n, d_{n+1}, \ldots, d_{n+m}) = (\mathbf{y}, \mathbf{x})$ and compute $\hat{\theta} = s(\mathbf{y}) - s(\mathbf{x})$.
2. Bootstrap $B$ times: draw resamples $\mathbf{d}_b^* = (\mathbf{y}_b^*, \mathbf{x}_b^*)$ from the combined data and compute $\hat{\theta}_b^* = s(\mathbf{y}_b^*) - s(\mathbf{x}_b^*)$ for $b = 1, \ldots, B$.
3. Estimate $\widehat{\mathrm{ASL}}_{\mathrm{Boot}} = \#\{\hat{\theta}_b^* \ge \hat{\theta}\} / B$.
Two-Sample Bootstrap Tests by R
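The slide's code is not in the transcript; a minimal sketch of the two-sample bootstrap test above, with hypothetical data:

# Two-sample bootstrap test for a mean shift, with made-up data.
set.seed(1)
y <- rnorm(20, mean = 1.0)       # sample 1, n = 20
x <- rnorm(15, mean = 0.5)       # sample 2, m = 15
n <- length(y); m <- length(x)

theta.hat <- mean(y) - mean(x)   # observed statistic
d <- c(y, x)                     # combined data (the null pools both samples)
B <- 2000

theta.star <- numeric(B)
for (b in 1:B) {
  db <- sample(d, n + m, replace = TRUE)   # resample from the combined data
  theta.star[b] <- mean(db[1:n]) - mean(db[(n + 1):(n + m)])
}

ASL.boot <- mean(theta.star >= theta.hat)  # estimated ASL (one-sided)
ASL.boot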
Permutation Tests
- Methodology
- Flowchart
- R code
Permutation
- In several fields of mathematics, the term permutation is used with different but closely related meanings. They all relate to the notion of (re-)arranging elements from a given finite set into a sequence.
- http://en.wikipedia.org/wiki/Permutation
Permutation Tests (1)
- A permutation test is also called a randomization test, re-randomization test, or exact test.
- If the labels are exchangeable under the null hypothesis, then the resulting tests yield exact significance levels.
Permutation Tests (2)
- Confidence intervals can then be derived from the tests.
- The theory has evolved from the works of R.A. Fisher and E.J.G. Pitman in the 1930s.
- http://en.wikipedia.org/wiki/Pitman_permutation_test
Applications of Permutation Tests (1)
- We can use a permutation test only when we can see how to resample in a way that is consistent with the study design and with the null hypothesis.
- http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf
Applications of Permutation Tests (2)
- Two-sample problems, when the null hypothesis says that the two populations are identical. We may wish to compare population means, proportions, standard deviations, or other statistics.
- Matched pairs designs, when the null hypothesis says that there are only random differences within pairs. A variety of comparisons is again possible.
- Relationships between two quantitative variables, when the null hypothesis says that the variables are not related. The correlation is the most common measure of association, but not the only one.
Inference by Permutation Tests (1)
- A traditional way is to consider parametric hypotheses $F_a \sim N(\mu_a, \sigma^2)$ and $F_b \sim N(\mu_b, \sigma^2)$, so that the null hypothesis becomes $\mu_a = \mu_b$.
- Under $H_0$, the statistic $\hat{\theta} = \bar{X}_a - \bar{X}_b$ can be modeled as normally distributed with mean 0 and variance $\sigma_{\hat{\theta}}^2 = \sigma^2 \left( \frac{1}{m} + \frac{1}{n} \right)$.
- https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf
Inference by Permutation Tests (2)
- The ASL is then computed by
$$\mathrm{ASL} = \int_{\hat{\theta}}^{\infty} \frac{1}{\sqrt{2\pi}\,\hat{\sigma}_{\hat{\theta}}} \, e^{-\frac{(\hat{\theta}^*)^2}{2 \hat{\sigma}_{\hat{\theta}}^2}} \, d\hat{\theta}^*$$
when $\sigma$ is unknown and has to be estimated from the data by
$$\bar{\sigma}^2 = \frac{\sum_{i=1}^{n} (X_{ai} - \bar{X}_a)^2 + \sum_{i=1}^{m} (X_{bi} - \bar{X}_b)^2}{m + n - 2}.$$
- We reject $H_0$ if $\mathrm{ASL} < \alpha$.
Flowchart of the Permutation Test for Mean Shift in One Sample
1. Sample $x_1, x_2, \ldots, x_n, x_{n+1}, x_{n+2}, \ldots, x_{n+m}$, where the first $n$ observations $\mathbf{x}_1$ form the treatment group and the last $m$ observations $\mathbf{x}_2$ form the control group ($n + m = N$); compute $\hat{\theta} = s(\mathbf{x}_1) - s(\mathbf{x}_2)$.
2. Partition into two subsets $B$ times: each partition $(G_{b1}, G_{b2})$ assigns $n$ observations to a treatment subset $\mathbf{x}_{1b}^*$ and $m$ observations to a control subset $\mathbf{x}_{2b}^*$; compute $\hat{\theta}_b^* = s(\mathbf{x}_{1b}^*) - s(\mathbf{x}_{2b}^*)$.
3. Estimate $\widehat{\mathrm{ASL}}_{\mathrm{Perm}} = \#\{\hat{\theta}_b^* \ge \hat{\theta}\} / B$, with $B = C_n^N$ for the exact test.
An Example for One Sample Permutation Test by R
- http://mason.gmu.edu/~csutton/EandTCh15a.txt
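The code at the URL above is not reproduced in the transcript; a minimal sketch of the one-sample permutation test, with hypothetical data:

# One-sample permutation test for a mean shift, with made-up data.
set.seed(1)
x <- c(rnorm(7, mean = 1), rnorm(9, mean = 0))   # first 7 = treatment, last 9 = control
n <- 7; m <- 9; N <- n + m

theta.hat <- mean(x[1:n]) - mean(x[(n + 1):N])   # observed statistic
B <- 5000                                        # sampled partitions, B <= choose(N, n)

theta.star <- numeric(B)
for (b in 1:B) {
  idx <- sample(N, n)                            # random relabeling: treatment indices
  theta.star[b] <- mean(x[idx]) - mean(x[-idx])
}

ASL.perm <- mean(theta.star >= theta.hat)        # estimated permutation ASL
ASL.perm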
Flowchart of the Permutation Test for Mean Shift in Two Samples
1. Sample 1: $\mathbf{y} = (y_1, y_2, \ldots, y_n)$; Sample 2: $\mathbf{x} = (x_1, x_2, \ldots, x_m)$. Combine them ($m + n = N$) into $\mathbf{d} = (d_1, d_2, \ldots, d_n, d_{n+1}, \ldots, d_{n+m}) = (\mathbf{y}, \mathbf{x})$ and compute $\hat{\theta} = s(\mathbf{y}) - s(\mathbf{x})$.
2. Partition into two subsets $B$ times: each partition $(G_{b1}, G_{b2})$ splits the combined data into a treatment subgroup $\mathbf{y}_b^*$ and a control subgroup $\mathbf{x}_b^*$; compute $\hat{\theta}_b^* = s(\mathbf{y}_b^*) - s(\mathbf{x}_b^*)$.
3. Estimate $\widehat{\mathrm{ASL}}_{\mathrm{Perm}} = \#\{\hat{\theta}_b^* \ge \hat{\theta}\} / B$, with $B = C_n^N$ for the exact test.
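A minimal sketch of this two-sample permutation test with hypothetical data; here the statistic $s$ is taken to be the median, to emphasize that any statistic can be permuted:

# Two-sample permutation test using a difference of medians, with made-up data.
set.seed(1)
y <- rnorm(12, mean = 1.0)   # sample 1 (treatment), n = 12
x <- rnorm(10, mean = 0.4)   # sample 2 (control), m = 10
n <- length(y); m <- length(x); N <- n + m

d <- c(y, x)                                  # combined data
theta.hat <- median(y) - median(x)            # observed statistic
B <- 5000

theta.star <- numeric(B)
for (b in 1:B) {
  idx <- sample(N, n)                         # relabel n points as "treatment"
  theta.star[b] <- median(d[idx]) - median(d[-idx])
}

ASL.perm <- mean(theta.star >= theta.hat)     # estimated permutation ASL
ASL.perm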
Bootstrap Tests vs. Permutation Tests
- The permutation test and the bootstrap test give very similar results.
- $\widehat{\mathrm{ASL}}_{\mathrm{Perm}}$ is the exact probability when $B = C_n^N$.
- $\widehat{\mathrm{ASL}}_{\mathrm{Boot}}$ is not an exact probability, but it is guaranteed to be an accurate estimate of the ASL as the number of bootstrap replications $B$ goes to infinity.
- https://www.cs.tcd.ie/Rozenn.Dahyot/453Bootstrap/05_Permutation.pdf
Cross-validation
- Methodology
- R code
Cross-validation
- Cross-validation, sometimes called rotation estimation, is the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis.
- The initial subset of data is called the training set.
- The other subset(s) are called validation or testing sets.
- http://en.wikipedia.org/wiki/Cross-validation
Overfitting Problems (1)
- In statistics, overfitting is fitting a statistical model that has too many parameters.
- When the degrees of freedom in parameter selection exceed the information content of the data, this leads to arbitrariness in the final (fitted) model parameters, which reduces or destroys the ability of the model to generalize beyond the fitting data.
Overfitting Problems (2)
- The concept of overfitting is also important in machine learning.
- In both statistics and machine learning, in order to avoid overfitting, it is necessary to use additional techniques (e.g., cross-validation, early stopping, Bayesian priors on parameters, or model comparison) that can indicate when further training is not resulting in better generalization.
- http://en.wikipedia.org/wiki/Overfitting
The R function "crossval" in package "bootstrap"
An Example of Cross-validation by R
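The slide's example is not in the transcript; a minimal sketch of crossval from the "bootstrap" package, with hypothetical regression data:

# 5-fold cross-validation of a simple linear model via bootstrap::crossval.
library(bootstrap)

set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)   # made-up regression data

theta.fit     <- function(x, y) lsfit(x, y)                 # fit on training folds
theta.predict <- function(fit, x) cbind(1, x) %*% fit$coef  # predict the held-out fold

cv <- crossval(x, y, theta.fit, theta.predict, ngroup = 5)
mean((y - cv$cv.fit)^2)      # cross-validated mean squared error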
Bootstrap Regression
- Bootstrapping pairs: resample from the sample pairs $\{(x_i, y_i)\}$.
- Bootstrapping residuals:
  1. Fit $y_i = x_i \hat{\beta}$ to the original sample and obtain the residuals.
  2. Resample from the residuals.
Bootstrapping Pairs by R
- http://www.stat.uiuc.edu/~babailey/stat328/lab7.html
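The lab code at the URL above is not reproduced here; a minimal sketch of bootstrapping pairs in a simple linear regression, with hypothetical data:

# Bootstrapping (x, y) pairs to get the sampling variability of the slope.
set.seed(1)
n <- 40
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 2)   # made-up regression data
B <- 2000

slope.star <- numeric(B)
for (b in 1:B) {
  idx <- sample(n, n, replace = TRUE)      # resample whole pairs
  slope.star[b] <- coef(lm(y[idx] ~ x[idx]))[2]
}

sd(slope.star)                             # bootstrap standard error of the slope
quantile(slope.star, c(0.025, 0.975))      # percentile confidence interval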
Bootstrapping Residuals by R
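A matching sketch of bootstrapping residuals; it reuses the hypothetical x, y, n, and B from the pairs example above:

# Bootstrapping residuals: keep x fixed, resample the residuals.
fit0  <- lm(y ~ x)        # fit on the original sample
res   <- resid(fit0)
y.hat <- fitted(fit0)

slope.star.res <- numeric(B)
for (b in 1:B) {
  yb <- y.hat + sample(res, n, replace = TRUE)  # new responses from resampled residuals
  slope.star.res[b] <- coef(lm(yb ~ x))[2]
}

sd(slope.star.res)        # bootstrap standard error of the slope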
ANOVA
- When random errors follow a normal distribution: the classical ANOVA F test applies.
- When random errors do not follow a normal distribution, use:
  - Bootstrap tests
  - Permutation tests
An Example of ANOVA by R (1)
- Example: Twenty lambs are randomly assigned to three different diets. The weight gain (in two weeks) is recorded. Is there a difference among the diets?
- http://mcs.une.edu.au/~stat261/Bootstrap/bootstrap.R
An Example of ANOVA by R (2)-(7)
[code and output slides; not included in the transcript]
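In place of the missing slides, a minimal sketch of a permutation F test for the lamb example; the weight-gain values below are hypothetical stand-ins, while the actual data are in bootstrap.R at the URL above:

# Permutation ANOVA: permute diet labels and recompute the F statistic.
set.seed(1)
gain <- c(8, 9, 12, 11, 10, 14, 13, 12, 15, 16,
          9, 10, 11, 13, 12, 8, 7, 10, 9, 11)      # made-up gains for 20 lambs
diet <- factor(rep(c("A", "B", "C"), c(7, 7, 6)))  # three diets

F.obs <- anova(lm(gain ~ diet))[1, "F value"]      # observed F statistic
B <- 5000

F.star <- numeric(B)
for (b in 1:B) {
  F.star[b] <- anova(lm(gain ~ sample(diet)))[1, "F value"]  # permuted labels
}

ASL.perm <- mean(F.star >= F.obs)                  # permutation p-value
ASL.perm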
Another Example of ANOVA by R (1)
- Data source: http://finzi.psych.upenn.edu/R/library/rpart/html/kyphosis.html
- Reference: http://www.stat.umn.edu/geyer/5601/examp/parm.html
Another Example of ANOVA by R (2)
- Kyphosis is a misalignment of the spine. The data are on 83 laminectomy (a surgical procedure involving the spine) patients. The predictor variables are age and age^2 (that is, a quadratic function of age), the number of vertebrae involved in the surgery, and start, the number of the first vertebra involved. The response is the presence or absence of kyphosis after the surgery (and perhaps caused by it).
Another Example of ANOVA by R (3)-(6)
[code and output slides; not included in the transcript]
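As a stand-in for the missing code, a minimal sketch of fitting the kyphosis model with a parametric bootstrap of its coefficients; this follows the general approach of Geyer's parm.html page above, but the code below is an assumption, not the slide's original:

# Logistic regression for the kyphosis data, plus a parametric bootstrap.
library(rpart)       # the rpart package ships the kyphosis data frame
data(kyphosis)

fit <- glm(Kyphosis ~ Age + I(Age^2) + Number + Start,
           family = binomial, data = kyphosis)
summary(fit)

# Parametric bootstrap: simulate responses from the fitted model, refit.
set.seed(1)
B <- 500
p.hat <- fitted(fit)
beta.star <- matrix(NA, B, length(coef(fit)))
for (b in 1:B) {
  y.star <- rbinom(nrow(kyphosis), 1, p.hat)   # simulated responses
  beta.star[b, ] <- coef(glm(y.star ~ Age + I(Age^2) + Number + Start,
                             family = binomial, data = kyphosis))
}
apply(beta.star, 2, sd)   # parametric bootstrap standard errors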
Exercises
- Write your own programs similar to the examples presented in this talk.
- Write programs for the examples mentioned at the reference web pages.
- Write programs for other examples that you know.
- Practice Makes Perfect!