Lecture 10 - University of Pennsylvania

Lecture 10
• Review Rank Sum test (Chapter 4.2)
• Welch t-test for comparing two normal
populations with unequal spreads (Chapter
4.3.2)
• Practical and statistical significance
(Chapter 4.5.1)
Rank Sum Test Review
• Let F and G be the two distributions. The rank
sum test is a test of $H_0: F = G$ vs. $H_a: F \neq G$.
Validity depends only on having independent
samples from two populations.
• Useful when sample sizes are small (<30) and
populations appear nonnormal.
• Implementation: Compute T = sum of ranks in
Group 1. Compute $z = \dfrac{T - \text{Mean}(T)}{\text{SD}(T)}$,
where Mean(T) and SD(T) are calculated under $H_0$.
p-value = 2*Prob(Z > |z|), where Z is a standard
normal r.v.
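The implementation steps above can be sketched in Python. This is a minimal illustration, not JMP's implementation: the function names `midranks` and `rank_sum_test` and the data are hypothetical. Ties receive midranks, and Mean(T) and SD(T) use the usual formulas under H0 (n1 times the average pooled rank, and the SD of the pooled ranks times sqrt(n1*n2/(n1+n2))). No continuity correction is applied, so software output may differ slightly.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test(group1, group2):
    """Return (T, z, two-sided p-value) using the normal approximation."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    t_stat = sum(ranks[:n1])                          # T = sum of ranks in group 1
    mean_t = n1 * mean(ranks)                         # Mean(T) under H0
    sd_t = stdev(ranks) * sqrt(n1 * n2 / (n1 + n2))   # SD(T) under H0
    z = (t_stat - mean_t) / sd_t
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return t_stat, z, p_value


# Hypothetical data, for illustration only.
g1 = [12.1, 15.3, 9.8, 20.4, 17.6]
g2 = [8.2, 7.9, 11.0, 6.5, 10.1, 9.0]
T, z, p = rank_sum_test(g1, g2)
print(f"T = {T}, z = {z:.3f}, p = {p:.4f}")
```

With these made-up numbers the group 1 ranks are 5, 8, 9, 10 and 11, so T = 43, compared with Mean(T) = 30 under H0.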
Example
• Study of role of vitamin C in schizophrenia.
• Twenty schizophrenic patients and 15 controls
with a diagnosis of neurosis of different origin
were selected for study.
• Dose of vitamin C was given to schizophrenic
patients and controls. Total amount of urinary
vitamin C excreted during a six hour period was
measured.
• Data in schizovitaminc.JMP.
[Figure: Oneway Analysis of Total By Group. Vertical axis: Total; horizontal axis: Group (Nonschizophrenic, Schizophrenic).]
Oneway Analysis of Total By Group
t Test
Nonschizophrenic-Schizophrenic
Assuming equal variances

Difference      87.193     t Ratio      2.115377
Std Err Dif     41.219     DF           33
Upper CL Dif   171.054     Prob > |t|   0.0420
Lower CL Dif     3.333     Prob > t     0.0210
Confidence       0.95      Prob < t     0.9790
Wilcoxon / Kruskal-Wallis Tests (Rank Sums)

Level              Count   Score Sum   Score Mean   (Mean-Mean0)/Std0
Nonschizophrenic      15         344      22.9333               2.456
Schizophrenic         20         286      14.3000              -2.456

2-Sample Test, Normal Approximation

  S         Z   Prob>|Z|
344   2.45620     0.0140
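As a rough check on the rank sum output above, the z statistic can be approximated from the reported counts and score sum alone. The sketch below assumes no ties, so that under H0 Mean(T) = n1(N+1)/2 and SD(T) = sqrt(n1*n2*(N+1)/12) with N = n1 + n2. The variable names are illustrative. JMP's reported Z of 2.4562 is slightly smaller, presumably because it adjusts for ties and applies a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 15, 20   # nonschizophrenic and schizophrenic counts from the output
T = 344           # score sum (rank sum) for the nonschizophrenic group
N = n1 + n2

mean_T = n1 * (N + 1) / 2              # 270.0
sd_T = sqrt(n1 * n2 * (N + 1) / 12)    # 30.0 (assumes no ties)
z = (T - mean_T) / sd_T
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Mean(T) = {mean_T}, SD(T) = {sd_T}, z = {z:.3f}, p = {p_value:.4f}")
```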
Meaning of null hypothesis in
rank sum test
• The rank sum test is most useful if the following
condition holds:
– Condition: Two population distributions F and G are the
same, except that they are shifted by a constant, so that
G is higher than F by that amount.
• If condition holds, the rank sum test is a test of
whether or not two populations have same center
(and are equal).
• If condition does not hold, then the null hypothesis
of the rank sum test, $H_0: F = G$, might not be true
(and hence will be rejected in large samples) even
if populations have same center.
Cognitive Load in Teaching
• Case Study 4.1.2
• A randomized experiment was done to compare (i)
a conventional approach to teaching coordinate
geometry in which presentation is split into
diagram, text and algebra with (ii) a modified
approach in which algebraic manipulations and
explanations are presented as part of the graphical
display. Students’ performance on a test was
compared after being taught by two methods.
• Both distributions are highly skewed. In addition,
there were five students who did not come to any
solution in the five minutes allotted so that their
solution times are censored (all that is known
about them is that they exceed 300 seconds).
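Censoring is one reason a rank-based test suits these data. If every censored time is recorded as 300 and no student finished in more than 300 seconds, the censored observations tie for the largest ranks even though their exact values are unknown. A minimal sketch with hypothetical solution times (not the case-study data) and a hypothetical helper `midranks`:

```python
def midranks(values):
    """Assign ranks 1..N, averaging the ranks of tied values."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1       # first 1-based position of v
        count = sorted_vals.count(v)           # number of tied copies of v
        rank_of[v] = first + (count - 1) / 2   # average of the tied ranks
    return [rank_of[v] for v in values]


# Hypothetical times in seconds; 300 marks a censored (unfinished) attempt.
times = [68, 130, 300, 95, 300, 242, 300, 154]
ranks = midranks(times)
print(ranks)
```

The three censored times share rank 7, the average of ranks 6, 7 and 8. Replacing them with any values above 300 would change the order among those three observations only, so the rank sum for a group containing all of them would be unchanged.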
[Figure: Oneway Analysis of TIME By TREATMT. Vertical axis: TIME; horizontal axis: TREATMT (CONVENTIONAL, MODIFIED).]
Wilcoxon / Kruskal-Wallis Tests (Rank Sums)

Level          Count   Score Sum   Score Mean   (Mean-Mean0)/Std0
CONVENTIONAL      14         269      19.2143               3.018
MODIFIED          14         137       9.7857              -3.018

2-Sample Test, Normal Approximation

  S          Z   Prob>|Z|
137   -3.01826     0.0025

1-way Test, ChiSquare Approximation

ChiSquare   DF   Prob>ChiSq
   9.2495    1       0.0024
Welch t-test for comparing
normal pops. with unequal spread
• t-test assumes populations have equal standard
deviations. Rule of thumb: t-test remains
approximately valid if ratio of larger standard
deviation to smaller standard deviation is less than
2.
• Welch’s t-test for unequal spreads: Rather than
pooling to obtain single estimate of population
SD, use individual sample SD’s to estimate
respective population SD’s, resulting in different
formula for standard error of $\bar{Y}_2 - \bar{Y}_1$:
$SE_W(\bar{Y}_2 - \bar{Y}_1) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}, \qquad t_W = \dfrac{\bar{Y}_2 - \bar{Y}_1}{SE_W(\bar{Y}_2 - \bar{Y}_1)}$
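The Welch standard error and t-statistic can be sketched as follows. The data and the function name `welch_t` are hypothetical. The degrees of freedom use Satterthwaite's approximation, which the slide does not state but which is the usual choice for Welch's test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(y1, y2):
    """Return (t_W, SE_W, approximate df) for the difference mean(y2) - mean(y1)."""
    n1, n2 = len(y1), len(y2)
    v1, v2 = variance(y1) / n1, variance(y2) / n2   # s1^2/n1 and s2^2/n2
    se_w = sqrt(v1 + v2)                            # unpooled standard error
    t_w = (mean(y2) - mean(y1)) / se_w
    # Satterthwaite approximation to the degrees of freedom (an assumption here).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_w, se_w, df


# Hypothetical samples with visibly different spreads.
y1 = [10.0, 12.0, 11.0, 13.0, 9.0]
y2 = [20.0, 35.0, 5.0, 28.0, 12.0, 40.0]
t_w, se_w, df = welch_t(y1, y2)
print(f"t_W = {t_w:.3f}, SE_W = {se_w:.3f}, df = {df:.2f}")
```

The approximate degrees of freedom always fall between min(n1, n2) - 1 and n1 + n2 - 2, and they are usually not a whole number.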
Welch’s t-test
• Welch’s t-test assumes populations are
normal but doesn’t assume equal SD.
• Welch’s t-test uses different (approximate) degrees
of freedom, so its p-value is only approximate.
• JMP implementation: Analyze, Fit Y by X,
click on unequal variances under red
triangle next to Oneway Analysis.
• Vitamin C in schizophrenia study:
Welch Anova testing Means Equal, allowing Std Devs Not Equal

F Ratio   DFNum    DFDen   Prob > F
 3.7648       1   19.502     0.0669

t Test
1.9403
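The two statistics in this output are related. With one numerator degree of freedom, the F ratio is the square of the Welch t statistic, so Prob > F is the two-sided p-value for the t statistic on about 19.5 degrees of freedom. A quick check from the reported values:

```latex
\[
t_W^2 = (1.9403)^2 \approx 3.7648 = F, \qquad \text{df} \approx 19.502
\]
```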
Value of equal spread model
• If two populations have same spread and
same shape, then difference in means is
entirely adequate summary of their
difference.
• If two populations have different means and
different standard deviations, then
difference in means may be inadequate
summary. See Display 4.11.
Practical and Statistical Significance
• Section 4.5.1
• p-values indicate statistical significance, the
extent to which a null hypothesis is
contradicted by data
• This must be distinguished from practical
significance, the practical importance of the
finding.
Example
• Investigators compare WISC vocabulary scores for big city
and rural children.
• They take a simple random sample of 2500 big city
children and an independent simple random sample of
2500 rural children.
• The big city children average 26 on the test and their SD is
10 points; the rural children average only 25 and their SD
is 10 points
• Two sample t-test: $t = \dfrac{1}{10\sqrt{\frac{1}{2500} + \frac{1}{2500}}} \approx 3.5$,
p-value $\approx .0004$ (two-sided).
• The difference between big city children and rural children is
highly significant. The investigators conclude that rural children
are lagging behind in development of language skills and
launch a crusade to pour money into rural schools.
Example Continued
• Confidence interval for mean difference between rural and
big city children: Approximate 95% CI =
$1 \pm 2 \cdot 10\sqrt{\frac{1}{2500} + \frac{1}{2500}} = (0.43, 1.57)$.
• WISC test – 40 words child has to define. Two points
given for correct definition, one for partially correct
definition.
• Likely value of mean difference between big city and rural
children is about one partial understanding of a word out of
forty.
• Not a good basis for a crusade. Actually investigators have
shown that there is almost no difference between big city
and rural children on WISC vocabulary scale.
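The arithmetic in this example can be reproduced from the summary statistics alone. The sketch below uses a normal approximation for the p-value, which is reasonable with 2500 children per group; the variable names are illustrative. With the unrounded standard error the t-statistic comes out near 3.5, while rounding the standard error to 0.3 first gives about 3.3.

```python
from math import sqrt
from statistics import NormalDist

n_city, n_rural = 2500, 2500
mean_city, mean_rural = 26.0, 25.0
sd_city, sd_rural = 10.0, 10.0

diff = mean_city - mean_rural
se = sqrt(sd_city**2 / n_city + sd_rural**2 / n_rural)   # about 0.283
t = diff / se
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))          # normal approximation
ci = (diff - 2 * se, diff + 2 * se)                       # approximate 95% CI

print(f"SE = {se:.3f}, t = {t:.2f}, p = {p_two_sided:.5f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval is narrow and sits close to 1 point on a scale where a single partially correct definition is worth 1 point, which is why the result is statistically but not practically significant.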
Practical vs. Statistical Significance
• The p-value of a test depends on the sample size.
With a large sample, even a small difference can
be “statistically significant,” that is, hard to explain
by the luck of the draw. This doesn’t necessarily
make it important. Conversely, an important
difference may not be statistically significant if the
sample is too small.
• Always accompany p-values for tests of
hypotheses with confidence intervals. Confidence
intervals provide information about the likely
magnitude of the difference and thus provide
information about its practical importance.
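The dependence of the p-value on sample size can be illustrated by holding the observed difference fixed and varying n. The sketch below reuses the WISC figures (difference of 1 point, SD of 10 in each group) with hypothetical per-group sample sizes of 25, 250 and 2500; the function name `two_sample_summary` is illustrative and the p-values use a normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_summary(diff, sd, n):
    """z, two-sided p, and approximate 95% CI for two groups of size n with common SD."""
    se = sd * sqrt(2 / n)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, (diff - 2 * se, diff + 2 * se)


results = {}
for n in (25, 250, 2500):
    z, p, ci = two_sample_summary(diff=1.0, sd=10.0, n=n)
    results[n] = (z, p, ci)
    print(f"n = {n:4d}: z = {z:.2f}, p = {p:.4f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The observed difference is the same in every row. Only the sample size changes, yet the p-value moves from far above to far below conventional thresholds, while the confidence interval narrows around a difference of 1 point.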
