What are non-parametric tests?

Statistics for Health Research
Non-Parametric
Methods
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Objectives of Presentation
• Introduction
• Ranks & Median
• Paired Wilcoxon Signed Rank
• Mann-Whitney test (or Wilcoxon Rank Sum test)
• Spearman’s Rank Correlation Coefficient
• Others….
What are non-parametric tests?
• ‘Parametric’ tests involve estimating
parameters such as the mean, and
assume that sample means are
‘Normally’ distributed
• Often data do not follow a Normal
distribution, e.g. number of cigarettes
smoked, cost to NHS, etc.
• Positively skewed distributions
A positively skewed distribution
[Figure: histogram of units of alcohol per week (x-axis) against frequency (y-axis). Mean = 8.03, Std. Dev. = 12.952, N = 30]
What are non-parametric tests?
• ‘Non-parametric’ tests were developed for
these situations where fewer assumptions
have to be made
• Sometimes called Distribution-free tests
• NP tests STILL have assumptions but are
less stringent
• NP tests can be applied to Normal data but
parametric tests have greater power IF
assumptions met
Ranks
• Practical differences between
parametric and NP are that NP
methods use the ranks of values
rather than the actual values
• E.g.
1,2,3,4,5,7,13,22,38,45 - actual
1,2,3,4,5,6, 7, 8, 9,10 - rank
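As a rough illustration of the ranking step, the sketch below ranks the values from the example above. The helper name `average_ranks` is made up for this sketch, and giving tied values the average of the positions they occupy is an assumption here (it is the usual convention, but the slide's example has no ties).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    # First position of v in the sorted list, plus half the number of extra ties.
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


actual = [1, 2, 3, 4, 5, 7, 13, 22, 38, 45]
print(average_ranks(actual))        # ranks 1.0 to 10.0, as in the example
print(average_ranks([3, 1, 3]))     # the two tied 3s share rank 2.5
```

Note how the large gaps between 13, 22, 38 and 45 disappear once the values are replaced by ranks, which is why rank-based methods are less affected by skewness.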
Median
• The median is the value above and
below which 50% of the data lie.
• If the data is ranked in order, it is
the middle value
• In symmetric distributions the mean
and median are the same
• In skewed distributions, median more
appropriate
Median
• BPs:
135, 138, 140, 140, 141, 142, 143
Median=140
• No. of cigarettes smoked:
0, 1, 2, 2, 2, 3, 5, 5, 8, 10
Median=2.5
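The two medians above can be checked with a short sketch using Python's standard `statistics` module. The comparison with the mean is added here only to illustrate the earlier point about skewed data; the variable names are illustrative.

```python
from statistics import mean, median

bps = [135, 138, 140, 140, 141, 142, 143]
cigarettes = [0, 1, 2, 2, 2, 3, 5, 5, 8, 10]

# Odd number of values: the median is the single middle value.
print(median(bps))          # 140
# Even number of values: the median is the average of the two middle values (2 and 3).
print(median(cigarettes))   # 2.5
# In the positively skewed cigarette data the mean is pulled above the median.
print(mean(cigarettes))     # 3.8
```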
T-test
• T-test used to test whether the
mean of a sample is significantly different
from a hypothesised population mean
• T-test relies on the sample being
drawn from a Normally distributed
population
• If sample not Normal then use the
Wilcoxon Signed Rank Test as an
alternative
Wilcoxon tests
• Frank Wilcoxon was a chemist
in the USA who wanted to develop a
test similar to the t-test but without the
requirement of a Normal distribution
• Presented paper in 1945
• Wilcoxon Signed Rank ≡ paired t-test
• Wilcoxon Rank Sum ≡ independent t-test
Wilcoxon Signed Rank Test
• NP test relating to the median as
measure of central tendency
• The ranks of the absolute
differences between the data and the
hypothesised median are calculated
• The ranks for the negative and the
positive differences are then summed
separately (W- and W+ resp.)
• The minimum of these is the test
statistic, W
Wilcoxon Signed Rank Test
Normal Approximation
• As the number of ranks (n) becomes
larger, the distribution of W becomes
approximately Normal
• Generally, if n>20
• Mean W=n(n+1)/4
• Variance W=n(n+1)(2n+1)/24
• Z=(W-mean W)/SD(W)
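The formulas above can be sketched in a few lines of Python. The function name `signed_rank_z` and the inputs W = 100, n = 25 are hypothetical values chosen only for illustration; they do not come from the presentation. The sketch applies no continuity or tie correction, so statistical packages may report slightly different values.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_z(w, n):
    """Normal approximation for the Wilcoxon signed rank statistic W."""
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24
    z = (w - mean_w) / sqrt(var_w)
    # Two-sided p-value from the standard Normal distribution.
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return mean_w, var_w, z, p_two_sided


# Hypothetical inputs for illustration only (n > 20, as the slide recommends).
mean_w, var_w, z, p = signed_rank_z(w=100, n=25)
print(mean_w, var_w)            # 162.5 1381.25
print(round(z, 2), round(p, 3))
```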
Wilcoxon Signed Rank Test
Assumptions
• Population should be approximately
symmetrical but need not be Normal
• Results must be classified as either
being greater than or less than the
median ie exclude results=median
• Can be used for small or large
samples
Paired samples t-test
• Disadvantage: Assumes data are a
random sample from a population
which is Normally distributed
• Advantage: Uses all detail of the
available data, and if the data are
normally distributed it is the most
powerful test
The Wilcoxon Signed Rank Test
for Paired Comparisons
• Disadvantage: Only the sign (+ or -)
and rank of each change are analysed,
not its actual size
• Advantage: Easy to carry out and
data need not come from a Normal
distribution
Paired And Not Paired
Comparisons
• If you have the same sample
measured on two separate occasions
then this is a paired comparison
• Two independent samples is not a
paired comparison
• Different samples which are
‘matched’ by age and gender are
paired
The Wilcoxon Signed Rank Test
for Paired Comparisons
• Similar calculation to the Wilcoxon
Signed Rank test, only the
differences in the paired results are
ranked
• Example using SPSS:
A group of 10 patients with chronic
anxiety receive sessions of cognitive
therapy. Quality of Life scores are
measured before and after therapy.
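Wilcoxon Signed Rank Test
example
QoL Score
Before  After  Diff  Rank of |Diff|  -/+
6       9      3     4.5             +
5       12     7     9               +
3       9      6     8               +
4       9      5     7               +
2       3      1     1.5             +
1       1      0     -               tied
3       2      -1    1.5             -
8       12     4     6               +
6       9      3     4.5             +
12      10     -2    3               -
Negative differences: 2 (rank sum W- = 4.5)
Positive differences: 7 (rank sum W+ = 40.5)
1 tied (excluded from the ranking)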
Wilcoxon Signed Rank Test
example
SPSS output: p < 0.05
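As a check on the Quality of Life example, the sketch below computes the rank sums directly from the before and after scores. It follows the definition given earlier (rank the absolute differences, then sum the ranks for negative and positive differences separately) and assumes that zero differences are dropped and that ties share an average rank. The function name `signed_rank_sums` is illustrative; this is a hand calculation, not a reproduction of the SPSS procedure.

```python
before = [6, 5, 3, 4, 2, 1, 3, 8, 6, 12]
after = [9, 12, 9, 9, 3, 1, 2, 12, 9, 10]


def signed_rank_sums(x, y):
    """Return (W-, W+, n) for paired samples, ignoring zero differences."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(value):
        # Average of the 1-based positions occupied by this absolute value.
        return abs_sorted.index(value) + 1 + (abs_sorted.count(value) - 1) / 2

    w_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)
    w_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
    return w_minus, w_plus, len(diffs)


w_minus, w_plus, n_used = signed_rank_sums(before, after)
print(w_minus, w_plus, n_used)      # 4.5 40.5 9
print(min(w_minus, w_plus))         # test statistic W = 4.5
```

With only 9 non-zero differences the Normal approximation is not recommended by the n>20 guideline, so the significance of W would normally be taken from tables or software.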
Mann-Whitney test ≡ Wilcoxon
Rank Sum
• Used when we want to compare two
unrelated or INDEPENDENT groups
• For parametric data you would use
the unpaired (independent) samples
t-test
[Photo: HB Mann]
The assumptions of the t-test
were:
1. The distribution of the measure in each
group is approx Normally distributed
2. The variances are similar
Example (1)
The following data shows the number
of alcohol units per week collected in a
survey:
Men (n=13): 0,0,1,5,10,30,45,5,5,1,0,0,0
Women (n=14): 0,0,0,0,1,5,4,1,0,0,3,20,0,0
Is the amount greater in men compared
to women?
Example (2)
How would you test whether the
distributions in both groups are
approximately Normally distributed?
• Plot histograms
• Stem and leaf plot
• Box-plot
• Q-Q or P-P plot
Boxplots of alcohol units per week by gender
[Figure: boxplots of units of alcohol per week (y-axis) by gender (Male, Female), with outlying cases marked]
Example (3)
Are those distributions symmetrical?
Definitely not!
They are both highly skewed so not
Normal. If the data are still not Normal after
transformation then use a non-parametric test: Mann-Whitney
The boxplots suggest perhaps that males tend to
have a higher intake than women.
Mann-Whitney on SPSS
Normal approx (NS)
Mann-Whitney (NS)
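To show what the Mann-Whitney calculation involves for the alcohol data, the sketch below pools the two groups, ranks the pooled values (ties share an average rank) and computes the rank sum and U statistic for each group. The helper names are illustrative, and the U formula used (rank sum minus n(n+1)/2) is the standard textbook definition rather than anything stated in the slides. No p-value is computed; the many tied zeros mean a tie-corrected or exact method from software is the safer route.

```python
men = [0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0]
women = [0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def mann_whitney_u(x, y):
    """Return rank sums and U statistics for two independent samples."""
    ranks = average_ranks(list(x) + list(y))   # rank the pooled data
    r_x = sum(ranks[:len(x)])
    r_y = sum(ranks[len(x):])
    u_x = r_x - len(x) * (len(x) + 1) / 2
    u_y = r_y - len(y) * (len(y) + 1) / 2
    return r_x, r_y, u_x, u_y


r_men, r_women, u_men, u_women = mann_whitney_u(men, women)
print(r_men, r_women)    # 207.5 170.5
print(u_men, u_women)    # 116.5 65.5
```

The two U values always add up to the product of the group sizes (13 × 14 = 182 here), which is a useful check on a hand calculation.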
Spearman Rank Correlation
• Method for investigating the
relationship between 2 measured
variables
• Non-parametric equivalent to
Pearson correlation
• Variables are either non-Normal or
measured on ordinal scale
Spearman Rank Correlation
Example
A researcher wishes to assess whether
the distance to general practice
influences the time of diagnosis of
colorectal cancer.
The null hypothesis would be that
distance is not associated with time to
diagnosis. Data collected for 7 patients
Distance from GP and time to diagnosis
Distance (km)   Time to diagnosis (weeks)
5               6
2               4
4               3
8               4
20              5
45              5
10              4
Scatterplot
Distance from GP and time to diagnosis
Distance (km)  Time (weeks)  Rank for distance  Rank for time  Difference in ranks (d)  d²
2              4             1                  3              -2                       4
4              3             2                  1              1                        1
5              6             3                  7              -4                       16
8              4             4                  3              1                        1
10             4             5                  3              2                        4
20             5             6                  5.5            0.5                      0.25
45             5             7                  5.5            1.5                      2.25
Total                                                          0                        Σd² = 28.5
Spearman Rank Correlation
Example
The formula for Spearman’s rank
correlation is:
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where n is the number of pairs and d is the difference in ranks for each pair
Spearman’s in SPSS
Spearman Rank Correlation
Example
In our example, rs=0.468
In SPSS we can see that this value is
not significant, i.e. p=0.29
Therefore there is no significant
relationship between the distance to a
GP and the time to diagnosis, but note
that the correlation is quite high!
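The sketch below recomputes the coefficient for the seven patients in two ways. Applying the simple Σd² formula to the ranks gives about 0.491, while computing a Pearson correlation on the ranks (a common way of handling tied ranks) gives 0.468, which matches the value quoted above. That the quoted figure comes from the tie-corrected form is an inference from the arithmetic, not something stated in the slides, and the function names are illustrative.

```python
from math import sqrt

distance = [2, 4, 5, 8, 10, 20, 45]
weeks = [4, 3, 6, 4, 4, 5, 5]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_simple(x, y):
    """Spearman's r_s from the sum of squared rank differences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_on_ranks(x, y):
    """Pearson correlation of the ranks, which allows for tied ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cross = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cross / sqrt(ss_x * ss_y)


print(round(spearman_simple(distance, weeks), 3))     # 0.491
print(round(spearman_on_ranks(distance, weeks), 3))   # 0.468
```

The two versions agree exactly when there are no ties; here the tied times to diagnosis (three patients at 4 weeks, two at 5 weeks) account for the difference.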
Spearman Rank Correlation
• Correlations lie between –1 and +1
• A correlation coefficient close to
zero indicates weak or no
correlation
• A significant rs value depends on
sample size and tells you that it’s
unlikely these results have arisen by
chance
• Correlation does NOT measure
causality, only association
Chi-squared test
• Used when comparing 2 or more
groups of categorical or nominal
data (as opposed to measured data)
• Already covered!
• In SPSS Chi-squared test is test of
observed vs. expected in single
categorical variable
More than 2 groups
• So far we have been comparing 2
groups
• If we have 3 or more independent
groups and data is not Normal we
need NP equivalent to ANOVA
• If independent samples use Kruskal-Wallis
• If related samples use Friedman
• Same assumptions as before
Parametric related to Non-parametric test
Parametric Tests                Non-parametric Tests
Single sample t-test            Wilcoxon signed rank test
Paired sample t-test            Paired Wilcoxon signed rank
2 independent samples t-test    Mann-Whitney test (Note: sometimes called Wilcoxon Rank Sum test!)
One-way Analysis of Variance    Kruskal-Wallis
Pearson’s correlation           Spearman Rank
Repeated Measures               Friedman
Summary
Non-parametric
• Non-parametric methods have fewer
assumptions than parametric tests
• So useful when these assumptions not met
• Often used when sample size is small and
difficult to tell if Normally distributed
• Non-parametric methods are a ragbag of
tests developed over time with no
consistent framework
• Read in datasets LDL, etc and carry out
appropriate Non-Parametric tests