Clinical Trials Methods - Centre for Clinical Trials


Clinical Trial Writing II
Sample Size Calculation and
Randomization
Liying XU (Tel: 22528716)
CCTER
CUHK
31st July 2002
1. Sample Size Planning
1.1 Introduction

Fundamental Points

Clinical trials should have sufficient
statistical power to detect differences
between groups considered to be of
clinical interest. Therefore calculation
of sample size with provision for
adequate levels of significance and power
is an essential part of planning.
Five Key Questions Regarding the Sample
Size

• What is the main purpose of the trial?
• What is the principal measure of patient outcome?
• How will the data be analyzed to detect a treatment difference? (The test statistic: t-test, χ² or CI.)
• What type of results does one anticipate with standard treatment?
• H0 and HA: how small a treatment difference is it important to detect, and with what degree of certainty? (α, β and δ.)
• How should treatment withdrawals and protocol violations be dealt with? (Data set used.)
SSC: Only an Estimate
Parameters used in the calculation are
estimates with uncertainty and are often
based on very small prior studies.
• Population may be different
• Publication bias: overly optimistic
• Different inclusion and exclusion criteria
• Mathematical models are approximations

What should be in the protocol?

• Sample size justification
• Methods of calculation
• Quantities used in calculation:
  • variances
  • mean values
  • response rates
  • difference to be detected
Realistic and Conservative

Overestimated size:
• unfeasible
• early termination

Underestimated size:
• justify an increase
• extension in follow-up
• incorrect conclusion (WORSE)
What is α (Type I error)?
The probability of erroneously
rejecting the null hypothesis.
• (Putting a useless medicine onto the market!)

What is β (Type II error)?
The probability of erroneously failing
to reject the null hypothesis.
• (Keeping a good medicine away from patients!)

What is Power?

Power quantifies the ability of the
study to find true differences of
various values of δ.

Power = 1−β = P(accept H1 | H1 is true)

This is the chance of correctly identifying H1
(correctly identifying a better medicine).
What is δ?

δ is the minimum difference between groups
that is judged to be clinically important.

• The minimal effect which has clinical relevance in the management of patients
• The anticipated effect of the new treatment (larger)
The Choice of α and β depends on:
• the medical and practical consequences of the two kinds of errors
• the prior plausibility of the hypothesis
• the desired impact of the results

The Choice of α and β
• α=0.10 and β=0.2 for preliminary trials that are likely to be replicated.
• α=0.01 and β=0.05 for trials that are unlikely to be replicated.
• α=β if both test and control treatments are new, about equal in cost, and there are good reasons to consider them both relatively safe.
• α>β if there is no established control treatment and the test treatment is relatively inexpensive, easy to apply and not known to have any serious side effects.
• α<β (the most common approach, 0.05 and 0.2) if the control treatment is already widely used and is known to be reasonably safe and effective, whereas the test treatment is new, costly, and produces serious side effects.
1.2 SSC for Continuous Outcome
Variables
• H0: δ = μC − μI = 0
• HA: δ = μC − μI ≠ 0
• If the variance σ² is known, the test statistic is

$$ Z = \frac{\bar{x}_C - \bar{x}_I}{\sigma \sqrt{1/N_C + 1/N_I}} $$

• If |Z| > Z_α, H0 will be rejected at the α level of significance.

A total sample of 2N would be needed to
detect a true difference δ between μI and μC
with power (1−β) and significance level α, by the
formula:

$$ 2N = \frac{4 (Z_\alpha + Z_\beta)^2 \sigma^2}{\delta^2} $$
Example 1

An investigator wishes to estimate the sample size
necessary to detect a 10 mg/dl difference in
cholesterol level in a diet intervention group
compared to the control group. The variance from
other data is estimated to be (50 mg/dl)². For a
two-sided 5% significance level, Z_α = 1.96, and for 90%
power, Z_β = 1.282.

$$ 2N = 4(1.96 + 1.282)^2 (50)^2 / 10^2 \approx 1050 $$
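The arithmetic of Example 1 can be checked with a few lines of Python. This is a minimal sketch of the 2N formula above, assuming a two-sided test and a known common variance; the function name `total_n_continuous` is only illustrative and does not come from the slides.

```python
from statistics import NormalDist


def total_n_continuous(delta, sigma, alpha=0.05, power=0.90):
    """Total sample size 2N for comparing two means (two-sided test, known sigma)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value Z_alpha
    z_beta = NormalDist().inv_cdf(power)           # Z_beta, where power = 1 - beta
    return 4 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


# Example 1: detect 10 mg/dl when the standard deviation is 50 mg/dl.
print(round(total_n_continuous(delta=10, sigma=50)))
```

Because the function uses unrounded normal quantiles, the result can differ by a unit or so from a hand calculation with 1.96 and 1.282; in practice the total is rounded up to a convenient even number.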
Example 1a
Baseline Adjustment

An investigator interested in the mean levels of
change might want to test whether diet
intervention lowers serum cholesterol from
baseline levels when compared with a control.
• H0: Δ_C − Δ_I = 0
• HA: Δ_C − Δ_I ≠ 0
• σ_Δ = 20 mg/dl, δ = 10 mg/dl

$$ 2N = 4(1.96 + 1.282)^2 (20)^2 / 10^2 \approx 170 $$
A Professional Statement

A sample size of 85 in each group will
have 90% power to detect a difference
in means of 10.0, assuming that the
common standard deviation is 20.0,
using a two-group t-test with a 0.05
two-sided significance level.
Values of f(α, β) = (Z_α + Z_β)² to be used in the formula
for sample size calculation

                      β (Type II error)
α (Type I error)      0.05     0.1      0.2      0.5
0.1                   10.8     8.6      6.2      2.7
0.05                  13.0     10.5     7.9      3.8
0.02                  15.8     13.0     10.0     5.4
0.01                  17.8     14.9     11.7     6.6
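The tabulated values can be regenerated from normal quantiles. The sketch below assumes that α is two-sided (so Z_α is the 1 − α/2 quantile) and that β is one-sided, which reproduces the table; the function name `f` simply mirrors the table heading.

```python
from statistics import NormalDist


def f(alpha, beta):
    """(Z_alpha + Z_beta)^2 with a two-sided alpha and a one-sided beta."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2


alphas = [0.1, 0.05, 0.02, 0.01]
betas = [0.05, 0.1, 0.2, 0.5]

# One row per alpha, one column per beta, as in the table above.
for alpha in alphas:
    row = "  ".join(f"{f(alpha, beta):5.1f}" for beta in betas)
    print(f"alpha={alpha:<5} {row}")
```

A few cells may differ from the printed table in the last digit, because the table appears to have been rounded from less precise quantiles.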
1.3 SSC for a Binary Outcome

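Two independent samples

$$ Z = \frac{\hat{p}_C - \hat{p}_I}{\sqrt{\bar{p}(1 - \bar{p})\,(1/N_C + 1/N_I)}} $$

where, in the analysis, $\bar{p} = (r_I + r_C) / (N_I + N_C)$, and for planning $\bar{p} = (p_C + p_I) / 2$.

$$ 2N = \frac{4 (Z_\alpha + Z_\beta)^2 \, \bar{p}(1 - \bar{p})}{(p_C - p_I)^2} $$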
Example 2

Suppose the annual event rate in the
control group is anticipated to be 20%. The
investigator hopes that the intervention will
reduce the annual rate to 15%. The study
is planned so that each participant will be
followed for 2 years. Therefore, if the
assumptions are accurate, approximately
40% of the participants in the control
group and 30% of the participants in the
intervention group will develop an event.

$$ 2N = 4(1.96 + 1.282)^2 (0.35)(0.65) / (0.4 - 0.3)^2 = 956 \approx 960 $$
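Example 2 can be reproduced in code. This is a minimal sketch of the binary-outcome formula with the pooled proportion taken as the average of the two anticipated rates; `total_n_binary` is an illustrative name, not one used in the slides.

```python
from statistics import NormalDist


def total_n_binary(p_c, p_i, alpha=0.05, power=0.90):
    """Total sample size 2N for comparing two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_c + p_i) / 2  # planning value of the pooled proportion
    return 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p_c - p_i) ** 2


# Example 2: 40% events under control versus 30% under intervention.
print(round(total_n_binary(0.40, 0.30)))
```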
A Professional Statement

A two-group χ² test with a 0.05 two-sided
significance level will have 90%
power to detect the difference between a
Group 1 proportion, P1, of 0.40 and a
Group 2 proportion, P2, of 0.30 (odds
ratio of 0.643) when the sample size in
each group is 480.
Table 1.3 Approximate total sample size for comparing
various proportions in two groups with significance level (α)
of 0.05 and power (1−β) of 0.8 and 0.9

True proportions                 α=0.05 (one-sided)     α=0.05 (two-sided)
pC (control)  pI (intervention)  1−β=0.90   1−β=0.80    1−β=0.90   1−β=0.80
0.60          0.50               850        610         1040       780
0.60          0.40               210        160         260        200
0.60          0.30               90         70          120        90
0.60          0.20               50         40          60         50
0.50          0.40               850        610         1040       780
0.50          0.30               210        150         250        190
0.50          0.25               130        90          160        120
0.50          0.20               90         60          110        80
0.40          0.30               780        560         960        720
0.40          0.25               330        240         410        310
0.40          0.20               180        130         220        170
0.30          0.20               640        470         790        590
0.30          0.15               270        190         330        250
0.30          0.10               140        100         170        130
0.20          0.15               1980       1430        2430       1810
0.20          0.10               440        320         540        400
0.20          0.05               170        120         200        150
0.10          0.05               950        690         1170       870
From Table 1.3 you can see:
• As the power (1−β) increases, N increases.
• As the difference δ between pC and pI increases, N decreases.

Paired Binary Outcome

McNemar's test

$$ N_p = \frac{(Z_\alpha + Z_\beta)^2 f}{d^2} $$

d = difference in the proportion of successes
(d = pI − pC)
f = the proportion of participants whose response is
discordant (the pair of outcomes are not the
same)
Example 3

Consider an eye study where one eye
is treated for loss in visual acuity by a
new laser procedure and the other
eye is treated by standard therapy.
The failure rate on the control, pC, is
estimated to be 0.4, and the new
procedure is projected to reduce the
failure rate to 0.20. The discordant
rate f is assumed to be 0.50.
• α = 0.05
• The power 1−β = 0.90
• f = 0.5
• pC = 0.4
• pI = 0.2

$$ N_p = \frac{(1.96 + 1.282)^2 (0.5)}{(0.4 - 0.2)^2} \approx 262 \times 0.5 \approx 132 $$
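The number of pairs in Example 3 can be verified as follows. This is a minimal sketch of the paired formula N_p = (Z_α + Z_β)² f / d², assuming a two-sided α; the function name `n_pairs_mcnemar` is hypothetical.

```python
import math
from statistics import NormalDist


def n_pairs_mcnemar(d, f, alpha=0.05, power=0.90):
    """Number of pairs N_p for a paired binary outcome (McNemar's test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 * f / d ** 2


# Example 3: pC = 0.4, pI = 0.2, discordant rate f = 0.5.
pairs = n_pairs_mcnemar(d=0.2 - 0.4, f=0.5)
print(math.ceil(pairs))  # round up to a whole number of pairs
```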
1.4 Adjusting for Non-adherence
• R_O = drop-out rate
• R_I = drop-in rate
• N* = N / (1 − R_O − R_I)²
• If R_O = 0.20 and R_I = 0.05, then N* = 1.78N

1.5 Adjusting for Multiple Comparisons

α' = α / k

k = the number of multiple comparison
variables
Table 1.4 Adjusting for Randomization Ratio

Randomization Ratio    Increase in total N
1:1                    0
1:2                    +12.5%
1:3                    +33%
1:4                    +56%
1:5                    +80%
1:6                    +100%
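The percentages in Table 1.4 are consistent with a simple formula. The sketch below assumes, as in the two-sample formulas earlier, that the variance of the treatment difference is proportional to 1/N_C + 1/N_I; under that assumption a 1:k allocation needs (1 + k)² / (4k) times the total N of a 1:1 allocation. The formula and the name `relative_increase` are not stated in the slides.

```python
def relative_increase(k):
    """Fractional increase in total N for a 1:k allocation versus 1:1."""
    return (1 + k) ** 2 / (4 * k) - 1


for k in range(1, 7):
    print(f"1:{k}  +{100 * relative_increase(k):.1f}%")
```

For 1:6 the formula gives slightly more than the +100% shown in the table, which looks like a rounded figure.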
1.6 Adjusting for loss of follow up

If p is the proportion of subjects lost to
follow-up, the number of subjects must be
increased by a factor of 1/(1-p).
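Both inflation factors, the one for non-adherence in section 1.4 and the one for loss to follow-up above, are one-line calculations. The following is a minimal sketch; the function names are illustrative and the sample size of 960 is borrowed from Example 2 only as an input.

```python
def adjust_for_nonadherence(n, drop_out, drop_in):
    """Section 1.4: N* = N / (1 - R_O - R_I)^2."""
    return n / (1 - drop_out - drop_in) ** 2


def adjust_for_loss_to_follow_up(n, p_lost):
    """Section 1.6: inflate N by a factor of 1 / (1 - p)."""
    return n / (1 - p_lost)


# The slide's non-adherence case: R_O = 0.20, R_I = 0.05.
print(round(adjust_for_nonadherence(1, 0.20, 0.05), 2))
# A trial needing 960 subjects with 20% expected loss to follow-up.
print(round(adjust_for_loss_to_follow_up(960, 0.20)))
```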
1.7 Other Factors:
• the rate of attrition of subjects during a trial
• intermediate analyses

Sample size re-estimation
• Event rates are lower than anticipated
• Variability is larger than expected
• Re-estimate without unblinding the data and without making treatment comparisons

1.8 Power Calculation
(assuming we compare two medicines)

Power Depends on 4 Elements:
• The real difference between the two medicines, δ
  • Big δ → big power
• The variation among individuals, σ
  • Small σ → big power
• The sample size, n
  • Large n → big power
• Type I error, α
  • Large α → big power
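How the four elements drive power can be seen numerically. The sketch below assumes a two-sided z-test comparing two means with equal group sizes and a known common σ, and it ignores the negligible probability of rejecting in the wrong direction; `power_two_means` is a hypothetical name.

```python
from statistics import NormalDist


def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means."""
    nd = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5      # standard error of the difference
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(abs(delta) / se - z_alpha)


base = power_two_means(delta=10, sigma=20, n_per_group=85)
print(f"baseline power:  {base:.2f}")
print(f"bigger delta:    {power_two_means(15, 20, 85):.2f}")
print(f"smaller sigma:   {power_two_means(10, 15, 85):.2f}")
print(f"larger n:        {power_two_means(10, 20, 120):.2f}")
print(f"larger alpha:    {power_two_means(10, 20, 85, alpha=0.10):.2f}")
```

The baseline uses the design from Example 1a (85 per group, δ = 10, σ = 20), so its power should come out close to the planned 90%.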
Sensitivity of the sample size estimate
• to a variety of deviations from these assumptions
• a power table
Table 1 Statistical Power of the Tanzania
Vitamin and HIV Infection Trial (N=960)

               Effect of B: 0%       Effect of B: 15%      Effect of B: 30%
               Loss to follow up     Loss to follow up     Loss to follow up
Effect of A    0%    20%   33%       0%    20%   33%       0%    20%   33%
30%            89%   82%   74%       85%   76%   68%       79%   69%   61%
25%            75%   65%   58%       69%   59%   52%       62%   52%   45%
Example 4
Regret for Low Power Due to Small
Sample?

I have a set of data in which the mean change
between the 2 groups is significantly
different (p<0.05). But when I
calculate the power it gives only
50%. How should I interpret this? Also,
can someone kindly advise as to whether it
is meaningful (or pointless) to calculate
the power when the result is statistically
significant?
Books and Software

• Sample Size Tables for Clinical Studies (second edition), by David Machin, Michael Campbell, Peter Fayers and Alain Pinol. Blackwell Science, 1997.
• PASS 2000, available in CCTER
• nQuery 4.0, available in CCTER

2. Randomization
• Definition: randomization is a process by which each
participant has the same chance of being
assigned to either intervention or control.
Fundamental Point
• Randomization tends to produce study
groups comparable with respect to known
and unknown risk factors, removes
investigator bias in the allocation of
participants, and guarantees that statistical
tests will have valid significance levels.
Two Types of Bias in Randomization

• Selection bias occurs if the allocation process is predictable. If any
bias exists as to what treatment particular types of
participants should receive, then a selection bias
might occur.
• Accidental bias can arise if the randomization procedure does not
achieve balance on risk factors or prognostic
covariates, especially in small studies.
Fixed Allocation Randomization
Fixed allocation randomization procedures
assign the intervention to participants with
a pre-specified probability, usually equal,
and that allocation probability is not altered
as the study progresses.
• Simple randomization
• Blocked randomization
• Stratified randomization
Randomization Types

Simple randomization
Simple Randomization

• Option 1: toss an unbiased coin for a randomized
trial with two treatments (call them A and B).
• Option 2: use a random digit table. A randomization
list may be generated by using the digits, one per
treatment assignment, starting with the top row and
working downwards.
• Option 3: use a random number-producing algorithm,
available on most digital computer systems.
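Option 3 can be sketched with Python's standard pseudo-random generator. This is only an illustration: the arm labels A and B follow Option 1, while the function name and the fixed seed (used so that the list can be regenerated for audit) are assumptions.

```python
import random


def simple_randomization(n_patients, arms=("A", "B"), seed=None):
    """Assign each patient independently, with equal probability per arm."""
    rng = random.Random(seed)  # a seeded generator makes the list reproducible
    return [rng.choice(arms) for _ in range(n_patients)]


allocation = simple_randomization(20, seed=2002)
print("".join(allocation))
print({arm: allocation.count(arm) for arm in ("A", "B")})
```

Running this with different seeds shows the drawback listed on the next slide: with only 20 patients the two arms are often unequal in size.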
Advantages
• Each treatment assignment is completely
unpredictable, and probability theory
guarantees that in the long run the numbers
of patients on each treatment will not be
radically different.
• It is easy to implement.
Disadvantages
• Unequal groups: one treatment is assigned more often than
another.
• Time imbalance or chronological bias: one treatment is given with greater frequency
at the beginning of a trial and another with
greater frequency at the end of the trial.
• Simple randomization is not often used, even for
large studies.

Randomization Types

Blocked randomization
Blocked Randomization
(permuted block randomization)
• Blocked randomization ensures exactly equal
treatment numbers at certain equally spaced
points in the sequence of patient assignments.
• A table of random permutations is used
containing, in random order, all possible
combinations (permutations) of a small series of
figures.
• Block size: 6, 8, 10, 16, 20.

Advantages
• The balance between the number of
participants in each group is guaranteed
during the course of randomization. The
number in each group will never differ by
more than b/2, where b is the length of the
block.
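The b/2 bound can be checked directly on a generated list. The sketch below shuffles a balanced block with a pseudo-random generator instead of reading a table of random permutations, which is a substitution for illustration; the function name and seed are hypothetical.

```python
import random


def permuted_block_randomization(n_blocks, block_size=4, arms=("A", "B"), seed=None):
    """Concatenate randomly permuted blocks, each balanced across the arms."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))  # equal count per arm
        rng.shuffle(block)                              # random order within the block
        sequence.extend(block)
    return sequence


seq = permuted_block_randomization(n_blocks=5, block_size=6, seed=1)
running, worst = 0, 0
for arm in seq:
    running += 1 if arm == "A" else -1   # running difference A minus B
    worst = max(worst, abs(running))
print("".join(seq), "max imbalance:", worst)
```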
Disadvantages
• Analysis may be more complicated (in theory).
• The correct analysis could have greater power.
• Varying the block size can keep the
randomization from becoming predictable.
• Mid-block inequality might occur if an interim
analysis is intended.
Randomization Types

Stratified randomization
[Figure: stratification tree. Geographic location (U.S., Europe), then previous exposure (Yes, No), then site (lymph, skin, breast).]
Stratified Randomization

Stratified randomization process involves
measuring the level of the selected factors for
participants, determining to which stratum each
belongs, and performing the randomization within
the stratum. Within each stratum, the
randomization process itself could be simple
randomization, but in practice most clinical trials
use some blocked randomization strategy.
Table 3. Stratification Factors and Levels
(3×2×3 = 18 Strata)

Age            Sex          Smoking history
1. 40-49 yr    1. Male      1. Current smoker
2. 50-59 yr    2. Female    2. Ex-smoker
3. 60-69 yr                 3. Never smoked
Table 4 Stratified Randomization with Block Size of Four

Strata   Age     Sex   Smoking   Group assignment
1        40-49   M     Current   ABBA BABA..
2        40-49   M     Ex        BABA BBAA..
3        40-49   M     Never     Etc.
4        40-49   F     Current
5        40-49   F     Ex
6        40-49   F     Never
7        50-59   M     Current
8        50-59   M     Ex
9        50-59   M     Never
10       50-59   F     Current
11       50-59   F     Ex
12       50-59   F     Never
etc.
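A list like Table 4 can be produced by running a blocked randomization separately within every stratum. This is a minimal sketch using the three factors of Table 3 and a block size of four; the variable names, the seed and the choice of two blocks per stratum are assumptions made for illustration.

```python
import random
from itertools import product

AGES = ["40-49", "50-59", "60-69"]
SEXES = ["M", "F"]
SMOKING = ["Current", "Ex", "Never"]


def stratified_block_lists(block_size=4, n_blocks=2, seed=0):
    """Return one blocked A/B assignment list per stratum."""
    rng = random.Random(seed)
    lists = {}
    for stratum in product(AGES, SEXES, SMOKING):       # 3 x 2 x 3 = 18 strata
        assignments = []
        for _ in range(n_blocks):
            block = ["A", "B"] * (block_size // 2)      # balanced block
            rng.shuffle(block)
            assignments.append("".join(block))
        lists[stratum] = " ".join(assignments)
    return lists


for number, (stratum, assignment) in enumerate(stratified_block_lists().items(), 1):
    print(number, *stratum, assignment)
```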
Advantages
• Stratification makes the two study groups
comparable with regard to specified factors.
• The power of the study can be increased by
taking the stratification into account in the
analysis.
Disadvantages
• The prognostic factors used in stratified
randomization may be unimportant, and other
factors identified later may be of more
importance.
Mechanism

Trial Type                        Mechanism
No central registration office    Randomization list, sealed envelopes
Double blind drug trial           Pharmacist will be involved
Multi-centre trial                Central registration office
Single-centre trial               Independent person responsible for patient registration and randomization
An Example of Stratified Randomization
Patients will be stratified according to the following
criteria:
• 1) Treatment center (Hospital A vs Hospital B vs Hospital C)
• 2) N-stage (N2 vs N3)
• 3) T-stage (T1-2 vs T3-4)

What should be in the protocol?

A dynamic allocation scheme will be used to
randomize patients in equal proportions within
each of 12 strata. The scheme first creates
time-ordered blocks of size divisible by three and then
uses simple randomization to divide the patients
in each block into three treatment arms, in equal
proportion. The block sizes will be chosen
randomly so that each block contains either 6 or
9 patients.
This procedure helps to ensure both
randomness and investigator blinding (the block
sizes are known only to the statistician), as
recommended by Freedman et al.
Randomization will be generated by the
consulting statistician in sealed envelopes,
labeled by stratum, which will be unsealed after
patient registration.
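The block scheme described in this protocol text can be sketched for a single stratum as follows. The arm labels A, B and C, the function name and the seed are hypothetical, and shuffling a balanced block is used here as one way to split a block into three arms in equal proportion.

```python
import random


def random_size_blocks(n_blocks, arms=("A", "B", "C"), block_sizes=(6, 9), seed=None):
    """Generate balanced blocks whose size is chosen at random from block_sizes."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        size = rng.choice(block_sizes)          # 6 or 9, hidden from investigators
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        blocks.append(block)
    return blocks


# One such list would be prepared for each of the 12 strata.
for block in random_size_blocks(n_blocks=4, seed=12):
    print(len(block), "".join(block))
```

Because the block boundaries are not known in advance, an investigator who sees the first assignments cannot deduce the remaining ones in the block with certainty.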
Adaptive Randomization
• Number adaptive
  • Biased coin method
• Baseline adaptive (MINIMIZATION)
• Outcome adaptive
Biased Coin Method
• Advantages
  • Investigators cannot determine the next
  assignment by discovering the blocking
  factor.
• Disadvantages
  • Complexity in use
  • Statistical analysis is cumbersome
Minimization
• Minimization is a well-accepted statistical
method to limit imbalance in relatively small
randomized clinical trials in conditions with
known important prognostic baseline
characteristics.
• It is called minimization because imbalance in
the distribution of prognostic factors is
minimized.
Table 1 Some baseline characteristics of patients in a controlled trial
of mustine versus talc in the control of pleural effusions
in patients with breast cancer (Fentiman et al, 1983)

                                        Treatment
                                        Mustine (n=23)   Talc (n=23)
Mean age (SE)                           50.3 (1.5)       55.3 (2.2)
Stage of disease:
  1 or 2                                52%              74%
  3 or 4                                48%              26%
Mean interval in months between BC
diag. and effusion diag. (SE)           33.1 (6.2)       60.4 (13.1)
Postmenopausal                          43%              74%
Minimization Factors
• Age (years): <=50 or >50
• Stage of disease: 1 or 2, or 3 or 4
• Time between diagnosis of cancer and diagnosis of effusions (months): <=30 or >30
• Menopausal: Pre or Post
Table 2 Characteristics of the first 29 patients in a clinical
trial using minimization to allocate treatment

                          Mustine (n=15)   Talc (n=14)
Age            <=50       7                6
               >50        8                8
Stage          1 or 2     11               11
               3 or 4     4                3
Time interval  <=30m      6                4
               >30m       9                10
Menopausal     Pre        7                5
               Post       8                9
Table 3
Calculation of imbalance in patient characteristics
for allocating treatment to the thirtieth patient

                        Mustine (n=15)   Talc (n=14)
Age >50                 8                8
Stage 3 or 4            4                3
Time interval <=30m     6                4
Postmenopausal          8                9
Total                   26               24
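The totals in Table 3 come from summing, for each arm, the Table 2 counts that match the new patient's characteristics. The sketch below repeats that calculation and then picks the arm with the smaller total, which is the usual deterministic minimization rule and is an assumption here, since the slides show only the totals; some schemes instead assign the favoured arm with a high probability, and ties are normally broken at random.

```python
# Counts for the first 29 patients, copied from Table 2.
counts = {
    "Mustine": {("age", "<=50"): 7, ("age", ">50"): 8,
                ("stage", "1 or 2"): 11, ("stage", "3 or 4"): 4,
                ("interval", "<=30m"): 6, ("interval", ">30m"): 9,
                ("menopausal", "Pre"): 7, ("menopausal", "Post"): 8},
    "Talc": {("age", "<=50"): 6, ("age", ">50"): 8,
             ("stage", "1 or 2"): 11, ("stage", "3 or 4"): 3,
             ("interval", "<=30m"): 4, ("interval", ">30m"): 10,
             ("menopausal", "Pre"): 5, ("menopausal", "Post"): 9},
}

# The thirtieth patient, as implied by the rows of Table 3.
patient_30 = {"age": ">50", "stage": "3 or 4",
              "interval": "<=30m", "menopausal": "Post"}


def minimization_totals(counts, patient):
    """Sum, per arm, the existing patients sharing each of the new patient's levels."""
    return {arm: sum(table[(factor, level)] for factor, level in patient.items())
            for arm, table in counts.items()}


totals = minimization_totals(counts, patient_30)
chosen = min(totals, key=totals.get)  # arm with the smaller total reduces imbalance
print(totals, "->", chosen)
```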
Advantages
• It can reduce the imbalance to a minimum
level, especially in small trials.
• A computer program is available (called Mini), and
it is also not difficult to perform 'by hand'.
• Minimization and stratification on the same
prognostic factors produce similar levels of
power, but minimization may add slightly more
power if stratification does not include all of the
covariates.

Disadvantages
• It is a somewhat more complicated process than
simple randomization.
Practical Considerations

Study type                            Randomization
Large studies                         Blocked
Large, multicentre studies            Stratified by centre
Small studies                         Blocked and stratified by centre
Large number of prognostic factors    Minimization
Large studies                         Stratified analysis without stratified randomization