Two sample t-test - Harvard University


Comparison of two samples
Summer program
Brian Healy
Previous classes

Hypothesis testing
– Null and Alternative hypotheses
– Test statistic
– p-value
– Conclusion
Confidence intervals
Comparison of CI to hypothesis test
Power and sample size

What are we doing today?

Two-sample t-test
– Paired t-test
– Independent samples
   – Equal variance
   – Unequal variance

Sample size for two samples
Big picture
Up to this point, we have only concerned
ourselves with one sample. Often we want
to compare one group to another. What
happens when we are comparing two
samples?
– There is variability in both samples, and the
two samples may be related
– Much of the theory is the same

Example



One of the first studies I analyzed was a tumor size
study. Having an accurate measure of tumor size is
extremely important because it allows a physician to
accurately determine if a tumor is growing, shrinking or
remaining constant.
The problem is that often the measurements of the
tumor size vary from physician to physician.
In the past, tumor size was measured using the linear
distance across the tumor, but this was found to be very
variable because of the irregular shape of some tumors.
A new method called the RECIST criteria traces the
outside of the tumor. The RECIST method was believed
to give more consistent measures of the volume of the
tumor.
Available data
For a portion of the study, a pair of doctors was
shown the same set of tumor pictures. The
volume of each tumor was measured by the two
physicians separately under similar conditions.
– Question of interest: Did the measurements
from the two physicians significantly differ?
– If not, then there would be no evidence that the
volume measurements change based on
physician.

20 scans were measured by each physician (10 are shown here)
Measurements are in cm^3
What can you say about these samples?
– Two measurements on the same tumor
– They are related, so we must account for this
– Much research in statistics deals with how to handle correlated data, but in this case it is pretty easy

Tumor   Dr. 1   Dr. 2
1       15.8    17.2
2       22.3    20.3
3       14.5    14.2
4       15.7    18.5
5       26.8    28.0
6       24.0    24.8
7       21.8    20.3
8       23.0    25.4
9       29.3    27.5
10      20.5    19.7
Dependent sample

We can measure the effect of the treatment in each person by taking the difference
$$d_i = x_{1i} - x_{2i}$$
Instead of having two samples, we can consider our dataset to be one sample of differences
– Just like the one sample problem

Tumor   Dr. 1   Dr. 2   Difference
1       15.8    17.2    -1.4
2       22.3    20.3     2.0
3       14.5    14.2     0.3
4       15.7    18.5    -2.8
5       26.8    28.0    -1.2
6       24.0    24.8    -0.8
7       21.8    20.3     1.5
8       23.0    25.4    -2.4
9       29.3    27.5     1.8
10      20.5    19.7     0.8
Differences

Volume from Dr. 1
– Population mean: $\mu_1$
– Sample mean: $\bar{x}_1$

Volume from Dr. 2
– Population mean: $\mu_2$
– Sample mean: $\bar{x}_2$

Difference
– Population mean: $\delta = \mu_1 - \mu_2$
– Sample mean: $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$
Distribution of differences

Assuming the $d_i$'s are normally distributed, we can
use the t-distribution with $n-1$ degrees of freedom (dof), where $n$ is the
number of differences
$$t = \frac{\bar{d} - \delta}{s_d / \sqrt{n}}$$
Standard deviation of differences
$$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}$$
Test statistic acts just like one sample
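As a minimal sketch in R, the quantities above can be computed by hand for the 10 scans listed in the table. Only 10 of the 20 scans are shown, so these numbers will not match the full-data results; the names dr1, dr2, d_bar and s_d are chosen here for illustration.

```r
# The 10 tumor volumes (cm^3) listed in the table; the full study has 20
dr1 <- c(15.8, 22.3, 14.5, 15.7, 26.8, 24.0, 21.8, 23.0, 29.3, 20.5)
dr2 <- c(17.2, 20.3, 14.2, 18.5, 28.0, 24.8, 20.3, 25.4, 27.5, 19.7)

d      <- dr1 - dr2                       # one sample of differences
n      <- length(d)
d_bar  <- mean(d)                         # sample mean of the differences
s_d    <- sd(d)                           # sd() uses the n - 1 denominator
t_stat <- (d_bar - 0) / (s_d / sqrt(n))   # test statistic under H0: delta = 0

# The same statistic from the built-in paired test, for comparison
t_builtin <- unname(t.test(dr1, dr2, paired = TRUE)$statistic)
```

The hand-computed t_stat and the value returned by t.test should agree, which illustrates that the paired test is a one-sample test on the differences.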
Picture
We can see that the assumption of normality of the differences is reasonable in this case
[Figure: Histogram of diff; x-axis: diff (-4 to 3), y-axis: Frequency (0 to 5)]
Paired t-test
1) Null hypothesis: No difference between physicians
   H0: $\mu_{dr1} = \mu_{dr2}$, i.e. $\delta = \mu_{dr1} - \mu_{dr2} = 0$
2) Two dependent samples; alpha=0.05
3) Test statistic: t-statistic with $n-1 = 19$ dof
   $$t = \frac{\bar{d} - \delta}{s_d / \sqrt{n}} = \frac{-0.24}{1.66 / \sqrt{20}} = -0.646$$
4) p-value=0.53
5) Fail to reject null hypothesis
6) Conclusion: there is no evidence of a difference in
tumor volume measurement based on physician
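The arithmetic in steps 3 and 4 can be checked from the summary values alone. This is a sketch that assumes the rounded values quoted on the slide (mean difference -0.24, standard deviation 1.66, n = 20), so the result may differ from the full-data output in the third decimal place.

```r
d_bar <- -0.24   # mean of the 20 differences (rounded, from the slide)
s_d   <- 1.66    # standard deviation of the differences (rounded, from the slide)
n     <- 20

t_stat  <- (d_bar - 0) / (s_d / sqrt(n))       # H0: delta = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value, both tails
```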
Confidence interval

Confidence interval for paired t-test
constructed in the same way as one-sample t-test
$$\left(\bar{d} - t_{1-\alpha/2}\frac{s_d}{\sqrt{n}},\ \bar{d} + t_{1-\alpha/2}\frac{s_d}{\sqrt{n}}\right)$$
For our example, the confidence interval is
(-1.02, 0.54)
Note that the conclusion from the hypothesis
test and the confidence interval are the same: the interval contains 0
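A short sketch of the interval calculation, again assuming the rounded summary values from the slides (-0.24, 1.66, n = 20); qt() supplies the t quantile.

```r
d_bar <- -0.24; s_d <- 1.66; n <- 20     # rounded summary values from the slides
alpha <- 0.05

t_crit <- qt(1 - alpha / 2, df = n - 1)              # t quantile with 19 dof
ci     <- d_bar + c(-1, 1) * t_crit * s_d / sqrt(n)  # lower and upper limits
contains_zero <- ci[1] < 0 && ci[2] > 0              # agrees with failing to reject H0
```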

Paired t-test in R

Using the help menu, determine how to
complete the paired t-test in R.
Paired t-test in R



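data <- read.table("P:\\pairedscans.dat", header=FALSE)  # file has no header row
dr1 <- data[,1]; dr2 <- data[,2]    # one column of measurements per physician
t.test(dr1, dr2, paired=TRUE)       # paired t-test on the differences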
– The output provides the p-value and the confidence interval
Paired t-test
data: data[, 1] and data[, 2]
t = -0.6456, df = 19, p-value = 0.5262
alternative hypothesis: true difference in means is not
equal to 0
95 percent confidence interval:
-1.0180279 0.5380279
sample estimates:
mean of the differences
-0.24
Extensions

Some additional examples of paired samples
are:
– Differences between left and right eye
– Differences between dominant and non-dominant hand
– Matched samples

When you have more than two samples, other
techniques are needed to account for the correlation between
the samples
– Multivariate / longitudinal data
Unpaired samples

Often it is impractical to design a study that
uses the same patients for both groups
– Ex. Comparison of cholesterol in males and
females
– Ex. Time constraints

Since the samples are not paired, we
cannot use the differences between the
individual observations
– Must adjust previous analysis
Example
Another aspect of the tumor volume study was
trying to compare the tumor volume among
patients with different forms of cancer. The
average tumor size is important to know so that the
effect of treatment can be determined.
– This study included patients with brain, breast and
liver tumors, but initially we will only compare
the brain and breast cancers.
– All of the tumors were measured using the
RECIST method

Null hypothesis
The null hypothesis is that there is no difference
between the volume of the tumor in the two
forms of cancer
– H0: $\mu_{brain} = \mu_{breast}$, or $\mu_{brain} - \mu_{breast} = 0$
– More generally, we can test if the difference
between two groups is a specific value, $\mu_1 - \mu_2 = D$
   – This occurs when comparing two treatment groups
   and we are interested in whether the two groups differ
   by a specific amount
Each patient contributes one observation
Can estimate the population values from the sample
– Mean and standard deviation in brain cancer group, $(\mu_1, \sigma_1)$, estimated with $(\bar{x}_1, s_1)$
– Mean and standard deviation in breast cancer group, $(\mu_2, \sigma_2)$, estimated with $(\bar{x}_2, s_2)$

Are the two groups the same?
– H0: $\mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$
– To determine this, we are going to look at $\bar{x}_1 - \bar{x}_2$
– We also need to know (for independent samples)
$$\mathrm{Var}(\bar{x}_1 - \bar{x}_2) = \mathrm{Var}(\bar{x}_1) + \mathrm{Var}(\bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
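The variance formula for the difference of two independent sample means can be checked by simulation. The sketch below is only an illustration: the group sizes are taken from the tumor example, while the population standard deviations (2 and 3) and the number of replications are hypothetical choices.

```r
set.seed(1)                  # for reproducibility
n1 <- 20; n2 <- 28           # group sizes from the tumor example
sigma1 <- 2; sigma2 <- 3     # hypothetical population standard deviations

# Difference of two independent sample means, repeated many times
diffs <- replicate(20000,
                   mean(rnorm(n1, 0, sigma1)) - mean(rnorm(n2, 0, sigma2)))

empirical   <- var(diffs)                       # simulated variance
theoretical <- sigma1^2 / n1 + sigma2^2 / n2    # sum of the two variances
```

The two values should be close, with the gap shrinking as the number of replications grows.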
Difference in the sample means


We are going to use the difference of the means as our
test statistic, but we need to estimate the variance of
this difference to determine if the difference is
significant
Basic form of test statistic:
– Standard deviations known
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
– Standard deviations unknown
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
The estimate of the standard deviation $s_{\bar{x}_1 - \bar{x}_2}$ depends on whether
– The samples have equal variance OR
– The samples have unequal variance
Equal variance

Sometimes we will be willing to assume that
the variance in the two groups is equal:
$$\sigma_1^2 = \sigma_2^2 = \sigma^2$$
If we know this variance, we can use the z-statistic
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Often we have to estimate $\sigma^2$ with the sample
variance from each of the samples, $s_1^2$, $s_2^2$
– Since we have two estimates of one quantity,
we pool the two estimates
Equal variance continued

The estimate of $\sigma^2$ is given by:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
The t-statistic based on the pooled variance is, as always, very
similar to the z-statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
The t-statistic has a t-distribution with $n_1 + n_2 - 2$
degrees of freedom
For the tumor volume study, there were 20 brain cancer subjects and 28 breast cancer subjects
The summary statistics and histograms for the data are given here
– What can you say about the distributions?
– Does the equal variance assumption seem valid in this case?

         Brain       Breast
n        20          28
xbar     16.2 cm^3   17.5 cm^3
s^2      3.49        6.0

[Figure: Histogram of size[(gr == 0)] (brain); x-axis: size[(gr == 0)], y-axis: Frequency]
[Figure: Histogram of size[(gr == 1)] (breast); x-axis: size[(gr == 1)], y-axis: Frequency]
Hypothesis test
1) H0: mean brain tumor size = mean breast tumor size
2) Two independent samples with equal variance; alpha = 0.05
3) Using the unrounded sample means (16.15 and 17.49):
   $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{16.15 - 17.49 - 0}{2.23\sqrt{\frac{1}{20} + \frac{1}{28}}} = -2.054$$
4) p-value: 0.046
5) Reject null hypothesis
6) Conclusion: There is a significant difference in
the size of brain and breast cancer tumors
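The pooled standard deviation and the test statistic in step 3 can be reproduced from the summary statistics. This sketch assumes the sample variances from the summary table (3.49 and 6.0) and the sample means rounded from the R output (16.15 and 17.49); small rounding differences from the full-data result are expected.

```r
n1 <- 20;       n2 <- 28        # brain, breast
xbar1 <- 16.15; xbar2 <- 17.49  # sample means (rounded from the R output)
s2_1 <- 3.49;   s2_2 <- 6.0     # sample variances from the summary table

# Pooled variance: the two sample variances weighted by their degrees of freedom
s2_p <- ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)
s_p  <- sqrt(s2_p)

t_stat <- (xbar1 - xbar2 - 0) / (s_p * sqrt(1 / n1 + 1 / n2))
df     <- n1 + n2 - 2
```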
R code

If we only had the summary statistics above, we can
calculate the test statistic and then compare it to
the t-distribution using
pt(-2.054, df=46)
to determine the area in the lower tail
– How do we convert this into the appropriate p-value?
– With the full data, we can use
data <- read.table("cancer.dat", header=TRUE)
gr <- data[,1]; size <- data[,2]    # group code and tumor size
t.test(size[(gr==0)], size[(gr==1)], var.equal=TRUE)
R output
Two Sample t-test
data: size[(gr == 0)] and size[(gr == 1)]
t = -2.054, df = 46, p-value = 0.04568
alternative hypothesis: true difference in means is
not equal to 0
95 percent confidence interval:
-2.65174438 -0.02682705
sample estimates:
mean of x mean of y
16.15000 17.48929
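One way to turn the lower-tail area into the two-sided p-value reported in the output is to double the tail area beyond the observed statistic. A minimal sketch, using the rounded statistic from the slide:

```r
t_stat <- -2.054                              # test statistic from the slide

lower_tail <- pt(t_stat, df = 46)             # area below the observed statistic
p_value    <- 2 * pt(-abs(t_stat), df = 46)   # two-sided: add the matching upper tail
```

Using -abs(t_stat) makes the same line work whether the observed statistic is negative or positive.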
Unequal variance
Often, we are unwilling to assume that the
variances are equal
We now write the test statistic as:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
The distribution of this statistic is difficult to
derive and we approximate the distribution using
a t-distribution with $\nu$ degrees of freedom
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
This is called the Satterthwaite or Welch
approximation
– When you complete a two-sample t-test in R
and the variances are not assumed equal, this
approximation is used
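The degrees-of-freedom formula is easier to read as a small function. The sketch below uses a hypothetical helper name, welch_df, and made-up inputs; it is not part of base R, which does this calculation internally in t.test.

```r
# Satterthwaite/Welch approximate degrees of freedom (hypothetical helper)
# s2_1, s2_2: sample variances; n1, n2: sample sizes
welch_df <- function(s2_1, n1, s2_2, n2) {
  v1 <- s2_1 / n1
  v2 <- s2_2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Made-up inputs: equal variances and equal n give n1 + n2 - 2,
# the degrees of freedom of the pooled test
df_equal <- welch_df(4, 15, 4, 15)

# Very unequal variances pull the value down toward the n - 1 of the noisier group
df_unequal <- welch_df(1, 15, 100, 15)
```

The approximate degrees of freedom always fall between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2.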
Example
For the comparison of the brain cancers to the liver cancers, the variances are much more different.
– Let's use the unequal variance two sample t-test in this case

         Brain       Liver
n        20          17
xbar     16.2 cm^3   19.35 cm^3
s^2      3.49        14.4
Example
1) H0: mean brain tumor size = mean liver tumor size
2) Two independent samples with unequal variance; alpha = 0.05
3) $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{16.15 - 19.35 - 0}{\sqrt{\frac{3.5}{20} + \frac{14.4}{17}}} = -3.17$$
4) p-value: 0.0044
5) Reject null hypothesis
6) Conclusion: There is a significant difference in
the size of brain and liver tumors
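The brain versus liver comparison can be reproduced from the summary statistics as a rough check. This sketch assumes the rounded values from the slides (means 16.15 and 19.35, variances 3.49 and 14.4), so it matches the full-data R output only to about two decimal places.

```r
n1 <- 20;       n2 <- 17        # brain, liver
xbar1 <- 16.15; xbar2 <- 19.35  # sample means from the slide
s2_1 <- 3.49;   s2_2 <- 14.4    # sample variances from the summary table

se     <- sqrt(s2_1 / n1 + s2_2 / n2)   # no pooling of the variances
t_stat <- (xbar1 - xbar2 - 0) / se

# Satterthwaite/Welch approximate degrees of freedom
nu <- (s2_1 / n1 + s2_2 / n2)^2 /
  ((s2_1 / n1)^2 / (n1 - 1) + (s2_2 / n2)^2 / (n2 - 1))

p_value <- 2 * pt(-abs(t_stat), df = nu)  # two-sided p-value
```

Note that the degrees of freedom are not a whole number; pt() accepts fractional values.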
R output
> t.test(size[(gr==0)],size[(gr==2)])
Welch Two Sample t-test
data: size[(gr == 0)] and size[(gr == 2)]
t = -3.1666, df = 22.48, p-value = 0.00439
alternative hypothesis: true difference in means is
not equal to 0
95 percent confidence interval:
-5.288291 -1.105827
sample estimates:
mean of x mean of y
16.15000 19.34706
Practice
Get the TV dataset from the course folder
– We want to compare the amount of TV
the boys and girls watch. Perform the
most appropriate test. Boys are coded as
0 and girls are coded as 1.

Can we test if the variances are equal?
Since we can never be sure if the
variances are equal, could we test if they
are equal?
Of course we can!!!
– But, remember there is error in every
statistical test
– Sometimes it is just preferred to use the
unequal variance test unless there is a good
reason to assume the variances are equal
Equality of variance
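H0: $\sigma_1^2 = \sigma_2^2$
To test this hypothesis, we use the sample
variances: $s_1^2$, $s_2^2$
If one of the variances is much larger than the
other, this is evidence against the null
As we discussed a couple classes ago (for normally distributed data):
$$\frac{s_1^2}{\sigma_1^2} \sim \frac{\chi^2_{n_1 - 1}}{n_1 - 1}, \qquad \frac{s_2^2}{\sigma_2^2} \sim \frac{\chi^2_{n_2 - 1}}{n_2 - 1}$$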
Test of equality






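One way to test if the two variances are equal is to check
if the ratio $\sigma_1^2 / \sigma_2^2$ is equal to 1 (H0: ratio=1)
Under the null, the ratio $\frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$ simplifies to $\frac{s_1^2}{s_2^2}$
The ratio of 2 independent chi-square random variables, each divided by its degrees of freedom, has an F-distribution
The F-distribution is defined by the numerator and
denominator degrees of freedom
Here we have an F-distribution with $n_1 - 1$ and $n_2 - 1$
degrees of freedom
This works better with $s_1^2 > s_2^2$ (the larger sample variance in the numerator)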
F-distribution
Here is the F-distribution with 5 and 500 degrees of freedom
– Note the skew of the distribution
[Figure: Histogram of rf(10000, df1 = 5, df2 = 500); x-axis: rf(10000, df1 = 5, df2 = 500) (0 to 5), y-axis: Frequency (0 to 3000)]
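The histogram can be regenerated with the call shown in its title. In this sketch the seed and the variable name x are arbitrary choices, so the exact bar heights will differ from the slide.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
x <- rf(10000, df1 = 5, df2 = 500)   # 10,000 draws from the F(5, 500) distribution
hist(x)                              # long right tail

# With right skew, the mean sits above the median
skew_check <- c(mean = mean(x), median = median(x))
```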
Example
> var.test(size[(gr==1)],size[(gr==0)])
F test to compare two variances
data: size[(gr == 1)] and size[(gr == 0)]
F = 1.719, num df = 27, denom df = 19, p-value = 0.2247
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.710335 3.904512
sample estimates:
ratio of variances
1.719033
> var.test(size[(gr==2)],size[(gr==0)])
F test to compare two variances
data: size[(gr == 2)] and size[(gr == 0)]
F = 4.1182, num df = 16, denom df = 19, p-value = 0.004156
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
1.589643 11.111060
sample estimates:
ratio of variances
4.118214
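The first var.test result above (breast versus brain) can be approximated from the summary statistics alone. This is a sketch assuming the rounded sample variances 6.0 and 3.49; the two-sided p-value is taken here as twice the smaller tail area.

```r
s2_brain  <- 3.49; n_brain  <- 20
s2_breast <- 6.0;  n_breast <- 28

F_stat <- s2_breast / s2_brain            # larger sample variance in the numerator
df1 <- n_breast - 1                       # numerator degrees of freedom
df2 <- n_brain - 1                        # denominator degrees of freedom

upper   <- pf(F_stat, df1, df2, lower.tail = FALSE)  # area above the observed ratio
p_value <- 2 * min(upper, 1 - upper)                 # two-sided p-value
```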
Practice


Example from Rosner,
Principles of Biostatistics
(Problems 8.88 and 8.89)
The following table compares the balance scores in patients with rheumatoid arthritis (RA) and osteoarthritis (OA). Which test is most appropriate and what is the conclusion you would draw?

      Mean   SD    n
RA    3.4    3.0   36
OA    2.5    2.8   30
Power and sample size
As with the one sample case, we can find
power and sample size for a two sample
problem
– For two dependent samples, the power
and sample size can be calculated exactly
as in the one sample case because the
paired t-test is a one sample problem
– For two independent samples, the power
and sample size calculation is slightly different

One sample case (review)

To find the sample size in the one sample case
we needed
– The hypothesized difference in the means
– The alpha level
– The power
– The variance in the sample
– One-sided or two-sided test
For a two-sided test:
$$n = \frac{\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{(\mu_1 - \mu_0)^2}$$
Two sample case
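We still need to have the same pieces of
information, now with a variance for each group.
For equal sample size, the number needed in each group is
$$n = \frac{(\sigma_1^2 + \sigma_2^2)\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{(\mu_2 - \mu_1)^2}$$
For sample sizes $n_2 = k n_1$,
$$n_1 = \frac{(\sigma_1^2 + \sigma_2^2 / k)\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{(\mu_2 - \mu_1)^2}$$
$$n_2 = \frac{(k\sigma_1^2 + \sigma_2^2)\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{(\mu_2 - \mu_1)^2}$$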
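The two sample formulas can be sketched the same way. The function name two_sample_n and the inputs below are hypothetical; k = 1 gives the equal sample size case.

```r
# Two-sided two-sample sizes with allocation ratio k = n2 / n1 (hypothetical helper)
# delta is the hypothesized difference mu2 - mu1
two_sample_n <- function(sigma1, sigma2, delta, k = 1, alpha = 0.05, power = 0.80) {
  z2 <- (qnorm(1 - alpha / 2) + qnorm(power))^2
  n1 <- (sigma1^2 + sigma2^2 / k) * z2 / delta^2
  n2 <- (k * sigma1^2 + sigma2^2) * z2 / delta^2
  c(n1 = n1, n2 = n2)
}

# Made-up inputs: both standard deviations 1, difference 0.5
equal   <- two_sample_n(1, 1, 0.5)          # same n in each group
unequal <- two_sample_n(1, 1, 0.5, k = 2)   # second group twice as large
n_per_group <- ceiling(equal[["n1"]])       # round up to a whole subject
```

In the unequal case the second group comes out at k times the first, as the formulas require, and the total number of subjects is larger than with equal allocation.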