Clinical Trials: A Short Course


Statistical Methods in
Clinical Trials
II Categorical Data
Ziad Taib
Biostatistics
AstraZeneca
March 7, 2012
Types of Data
• Quantitative
  – Continuous (e.g. blood pressure, time to event)
  – Discrete (e.g. number of relapses)
• Qualitative (categorical)
  – Nominal (e.g. sex)
  – Ordered categorical (e.g. pain level)
Types of data analysis (inference)
• Parametric vs non-parametric
• Model-based vs data-driven
• Frequentist vs Bayesian
Inference problems
1. Binary data (proportions)
   • One sample
   • Paired data
2. Ordered categorical data
3. Combining categorical data
4. Logistic regression
5. A Bayesian alternative
Categorical data
In an RCT, endpoints and surrogate endpoints can be categorical or ordered categorical variables. In the simplest case we have binary responses (e.g. responders vs non-responders). In outcomes research it is common to use several ordered categories (no improvement, moderate improvement, high improvement).
Bernoulli experiment
A random experiment with two outcomes (e.g. hole in one?):
• 1 = success, with probability p
• 0 = failure, with probability 1 - p
Binary variables
• Sex
• Mortality
• Presence/absence of an AE
• Responder/non-responder according to some pre-defined criteria
• Success/failure
Estimation
• Assume that a treatment has been applied to n patients and that at the end of the trial they were classified according to how they responded to the treatment: 0 meaning not cured and 1 meaning cured. The data at hand are thus a sample y1, …, yn of n independent binary variables.
• The probability p of being cured by this treatment can be estimated by the sample proportion

  p̂ = (y1 + … + yn)/n

which is unbiased and has variance p(1 - p)/n.
Hypothesis testing
• We can test the null hypothesis H0: p = p0
• using the test statistic

  Z = (p̂ - p0) / √(p0(1 - p0)/n)

• When n is large, Z follows, under the null hypothesis, the standard normal distribution (NB: not when p is very small or very large).
Hypothesis testing
• For moderate values of n we can use the exact Bernoulli distribution of the individual responses, leading to the sum y = y1 + … + yn being binomially distributed, i.e. y ~ Bin(n, p0) under the null hypothesis.
• As with continuous variables, tests can be inverted to build confidence intervals.
Example 1: Hypothesis test based on the binomial distribution
Consider testing

  H0: p = 0.5 against Ha: p > 0.5

where n = 10 and y = number of successes = 8.

p-value = (probability of obtaining a result at least as extreme as the one observed)
  = Prob(8 or more responders) = P(8) + P(9) + P(10)
  = {using the binomial formula} = 0.0547
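The binomial p-value above can be checked with a few lines of Python (a sketch using only the standard library; binom_tail is an illustrative helper, not part of any package):

```python
from math import comb

def binom_tail(n, p, y):
    """Exact one-sided p-value P(Y >= y) for Y ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y, n + 1))

# Example 1: n = 10, y = 8 successes, H0: p = 0.5
p_value = binom_tail(10, 0.5, 8)
print(round(p_value, 4))  # 0.0547
```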
Example 2
RCT of two analgesic drugs A and B given in a random order to each of 100 patients. After both treatment periods, each patient states a preference for one of the drugs.
Result: 65 patients preferred A and 35 preferred B.

Example (cont'd)
Hypotheses: H0: p = 0.5 against H1: p ≠ 0.5
Observed test statistic: z = 2.90
p-value: p = 0.0037
(exact p-value using the binomial distribution = 0.0035)
95% CI for p: (0.56; 0.74)
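The figures in this example can be reproduced in Python (a sketch; the value z = 2.90 appears to correspond to a continuity correction of 0.5, since without it z = 3.00):

```python
from math import comb, erfc, sqrt

n, y = 100, 65
p_hat = y / n

# z with a continuity correction of 0.5, under H0: p = 0.5
z = (y - 0.5 - n * 0.5) / sqrt(n * 0.25)
p_normal = erfc(z / sqrt(2))            # two-sided normal p-value

# exact two-sided binomial p-value: 2 * P(Y >= 65) under Bin(100, 0.5)
p_exact = 2 * sum(comb(n, k) for k in range(y, n + 1)) / 2**n

# 95% Wald confidence interval for p
se = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

print(round(z, 2), round(p_normal, 4), round(p_exact, 4))  # 2.9 0.0037 0.0035
```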
Example 3
We want to test whether the proportion of patients experiencing an early improvement after some treatment is 0.35. n = 312 patients were observed, among which 147 experienced such an improvement, yielding a proportion of 47.1%. The Z value is 4.3, yielding a p-value of 0.00002; using the exact distribution, 0.00001. Of course n here is large, so the normal approximation is good enough. A 95% confidence interval for the proportion is [41.6%, 52.7%] and does not contain the point 0.35.
Two proportions
• Sometimes we want to compare the proportion of successes in two separate groups. For this purpose we take two samples of sizes n1 and n2. We let yi1 and pi1 = yi1/ni denote the observed number and proportion of successes in the ith group. The difference in population proportions of successes and its large-sample variance can be estimated by

  d = p11 - p21,  v(d) = p11(1 - p11)/n1 + p21(1 - p21)/n2

Two proportions (continued)
• Assume we want to test the null hypothesis that there is no difference between the proportions of success in the two groups. Under the null hypothesis, we can estimate the common proportion by

  p0 = (y11 + y21)/(n1 + n2)

• Its large-sample variance is estimated by

  v(p0) = p0(1 - p0)(1/n1 + 1/n2)
Example 4
NINDS trial in acute ischemic stroke

  Treatment   n     responders*
  rt-PA       312   147 (47.1%)
  placebo     312   122 (39.1%)

  *early improvement defined on a neurological scale

Point estimate: 0.080 (s.e. = 0.0397)
95% CI: (0.003; 0.158)
p-value: 0.043
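These numbers can be reproduced with a short Python sketch (the unpooled variance is used for the confidence interval and the pooled variance, under H0, for the test):

```python
from math import erfc, sqrt

n1 = n2 = 312
y1, y2 = 147, 122                 # responders on rt-PA and on placebo
p1, p2 = y1 / n1, y2 / n2

d = p1 - p2                                     # point estimate
se_d = sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)    # unpooled standard error
ci = (d - 1.96 * se_d, d + 1.96 * se_d)

# pooled proportion under H0: p1 = p2, used for the test
p0 = (y1 + y2) / (n1 + n2)
se0 = sqrt(p0 * (1 - p0) * (1/n1 + 1/n2))
z = d / se0
p_value = erfc(abs(z) / sqrt(2))                # two-sided p-value

print(round(d, 3), round(p_value, 3))  # 0.08 0.043
```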
Two proportions (Chi square)
• The problem of comparing two proportions
can sometimes be formulated as a problem
of independence! Assume we have two
groups as above (treatment and placebo).
Assume further that the subjects were
randomized to these groups. We can then
test for independence between belonging to
a certain group and the clinical endpoint
(success or failure). The data can be
organized in the form of a contingency table
in which the marginal totals and the total
number of subjects are considered as fixed.
2 x 2 Contingency table

              RESPONSE
  TREATMENT   Failure   Success   Total
  Drug        Y10       Y11       Y1.
  Placebo     Y20       Y21       Y2.
  Total       Y.0       Y.1       N = Y..
2 x 2 Contingency table

              RESPONSE
  TREATMENT   Failure   Success   Total
  Drug        165       147       312
  Placebo     190       122       312
  Total       355       269       N = 624
Hypergeometric distribution
• An urn contains W white balls and R red balls: N = W + R
• n balls are drawn at random without replacement
• Y is the number of white balls (successes) drawn
• Y follows the hypergeometric distribution with parameters (N, W, n)
Contingency tables
• N subjects in total
• y.1 of these are special (successes)
• y1. are drawn at random
• Y11 = number of successes among these y1.
• Y11 is HG(N, y.1, y1.)
In general, E(Y11) = m11 = y1.y.1/N.
Contingency tables
• The null hypothesis of independence is tested using the chi-square statistic

  χ² = N(Y11Y20 - Y10Y21)² / (Y1.Y2.Y.0Y.1)

• which, under the null hypothesis, is chi-square distributed with one degree of freedom, provided the sample sizes in the two groups are large (over 30) and the expected frequency in each cell is non-negligible (over 5).
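For a 2 x 2 table the statistic reduces to a one-line computation; a Python sketch (chi2_2x2 is an illustrative helper):

```python
def chi2_2x2(y10, y11, y20, y21):
    """Shortcut chi-square for a 2 x 2 table:
    chi2 = N (Y11 Y20 - Y10 Y21)^2 / (Y1. Y2. Y.0 Y.1)"""
    n1, n2 = y10 + y11, y20 + y21        # row totals
    c0, c1 = y10 + y20, y11 + y21        # column totals
    N = n1 + n2
    return N * (y11 * y20 - y10 * y21) ** 2 / (n1 * n2 * c0 * c1)

# NINDS: drug row (165 failures, 147 successes), placebo row (190, 122)
print(round(chi2_2x2(165, 147, 190, 122), 3))  # 4.084
```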
Contingency tables
• For moderate sample sizes we use Fisher's exact test, i.e. we calculate the desired probabilities using the exact hypergeometric distribution. The variance of Y11 can then be calculated. To illustrate, consider

  v11 = y1.y2.y.0y.1 / (N²(N - 1))

• Using this and the expectation m11 we have the randomization chi-square statistic (Y11 - m11)²/v11. With fixed margins only one cell is allowed to vary. Randomization is crucial for this approach.
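A Python sketch of the two-sided Fisher's exact test for the NINDS table (fisher_exact_2x2 is an illustrative helper; the common two-sided convention of summing all tables no more probable than the observed one is assumed here):

```python
from math import comb

def fisher_exact_2x2(y10, y11, y20, y21):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities
    of all tables with the observed margins that are no more probable
    than the observed table."""
    n1, n2 = y10 + y11, y20 + y21
    c1 = y11 + y21                       # total number of successes
    N = n1 + n2
    denom = comb(N, c1)
    pmf = lambda a: comb(n1, a) * comb(n2, c1 - a) / denom
    p_obs = pmf(y11)
    lo, hi = max(0, c1 - n2), min(n1, c1)
    return sum(pmf(a) for a in range(lo, hi + 1) if pmf(a) <= p_obs * (1 + 1e-9))

# NINDS table: SAS reports a two-tailed p-value of 0.052
p = fisher_exact_2x2(165, 147, 190, 122)
print(round(p, 3))
```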
The (Pearson) chi-square test

  [A 3 x 5 contingency table: one factor with levels i, ii, iii, the other with levels A-E; cell counts niA … niiiE, row totals ni, nii, niii and column totals nA … nE.]

The chi-square test is used for testing the independence between the two factors.
The (Pearson) chi-square test
The test statistic is:

  χ² = Σi Σj (Oij - Eij)² / Eij

where Oij = observed frequencies and Eij = expected frequencies (under independence). The test statistic approximately follows a chi-square distribution.
Example 5
Chi-square test for a 2 x 2 table
Examining the independence between two treatments and a classification into responder/non-responder is equivalent to comparing the proportions of responders in the two groups.
NINDS again

  Observed frequencies               Expected frequencies
             rt-PA   placebo                    rt-PA   placebo
  non-resp   165     190       355              177.5   177.5     355
  responder  147     122       269              134.5   134.5     269
             312     312                        312     312
• p0 = (122 + 147)/624 = 0.43
• v(p0) = 0.00157
which gives a p-value of 0.043 in all these cases. This suggests that the drug is better than placebo. However, when using Fisher's exact test, or the chi-square test with a continuity correction, the p-value is 0.052.
SAS output

  TABLE OF GRP BY Y

  Frequency (Row Pct)   nonresp        resp           Total
  placebo               190 (60.90)    122 (39.10)    312
  rt-PA                 165 (52.88)    147 (47.12)    312
  Total                 355            269            624
  STATISTICS FOR TABLE OF GRP BY Y

  Statistic                     DF   Value   Prob
  Chi-Square                    1    4.084   0.043
  Likelihood Ratio Chi-Square   1    4.089   0.043
  Continuity Adj. Chi-Square    1    3.764   0.052
  Mantel-Haenszel Chi-Square    1    4.077   0.043
  Fisher's Exact Test (Left)              0.982
                      (Right)             0.026
                      (2-Tail)            0.052
  Phi Coefficient                    0.081
  Contingency Coefficient            0.081
  Cramer's V                         0.081

  Sample Size = 624
Odds, odds ratios and relative risks
The odds of success in group i is estimated by

  oddsi = pi1 / (1 - pi1)

The odds ratio of success between the two groups is estimated by

  OR = odds1 / odds2

Define the risk of success in the ith group as the proportion of cases with success. The relative risk between the two groups is estimated by

  RR = p11 / p21
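For the NINDS table these estimates work out as follows (a Python sketch):

```python
# Odds, odds ratio and relative risk for the NINDS 2 x 2 table
p1, p2 = 147 / 312, 122 / 312      # proportions of responders

odds1 = p1 / (1 - p1)              # = 147/165
odds2 = p2 / (1 - p2)              # = 122/190
OR = odds1 / odds2                 # odds ratio, matches the SAS value 1.387
RR = p1 / p2                       # relative risk

print(round(OR, 3), round(RR, 3))  # 1.387 1.205
```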
Categorical data
• Nominal
  – E.g. patient residence at end of follow-up (hospital, nursing home, own home, etc.)
• Ordinal (ordered)
  – E.g. some global rating:
    • Normal, not at all ill
    • Borderline mentally ill
    • Mildly ill
    • Moderately ill
    • Markedly ill
    • Severely ill
    • Among the most extremely ill patients
Categorical data & chi-square test

  [The same 3 x 5 contingency table as above: factor levels i, ii, iii against A-E.]

The chi-square test is useful for detection of a general association between treatment and categorical response (on either the nominal or ordinal scale), but it cannot identify a particular relationship, e.g. a location shift.
Nominal categorical data

                        Disease category
  Treatment group   dip   snip   fup   bop   other   Total
  A                 33    15     34    26    8       116
  B                 28    18     34    20    14      114
  Total             61    33     68    46    22      230

Chi-square test: χ² = 3.084, df = 4, p = 0.544
Ordered categorical data
• Here we assume two groups, one receiving the drug and one placebo. The response is assumed to be ordered categorical with J categories.
• The null hypothesis is that the distribution of subjects over the response categories is the same for both groups.
• Again the randomization and the HG distribution lead to the same chi-square test statistic, but this time with (J - 1) df. Moreover, the same relationship exists between the two versions of the chi-square statistic.
The Mantel-Haenszel statistic
The aim here is to combine data from several (H) strata for comparing two groups, drug and placebo. The expected frequency m11h and the variance v11h for each stratum h are used to define the Mantel-Haenszel statistic

  X²MH = [Σh (Y11h - m11h)]² / Σh v11h

which is chi-square distributed with one df.
Logistic regression
• Consider again the Bernoulli situation, where Y is a binary r.v. (success or failure) with p being the success probability. Sometimes Y can depend on some other factors or covariates. Since Y is binary we cannot use ordinary linear regression.
Logistic regression
• Logistic regression is part of a category of statistical
models called generalized linear models (GLM). This
broad class of models includes ordinary regression and
ANOVA, as well as multivariate statistics such as
ANCOVA and loglinear regression. An excellent
treatment of generalized linear models is presented in
Agresti (1996).
• Logistic regression allows one to predict a discrete
outcome, such as group membership, from a set of
variables that may be continuous, discrete, dichotomous,
or a mix of any of these. Generally, the dependent or
response variable is dichotomous, such as
presence/absence or success/failure.
Simple linear regression

Table 1: Age and systolic blood pressure (SBP) among 33 adult women

  Age  SBP    Age  SBP    Age  SBP
  22   131    41   139    52   128
  23   128    41   171    54   105
  24   116    46   137    56   145
  27   106    47   111    57   141
  28   114    48   115    58   153
  29   123    49   133    59   157
  30   117    49   128    63   155
  32   122    50   183    67   176
  33   99     51   130    71   172
  35   121    51   133    77   178
  40   147    51   144    81   217

[Figure: scatter plot of SBP (mm Hg) against age (years) with the fitted line SBP = 81.54 + 1.222 × Age]

adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974
Simple linear regression
• Relation between 2 continuous variables (SBP and age)

  y = α + β1x1

• Regression coefficient β1
  – Measures association between y and x
  – Amount by which y changes on average when x changes by one unit
  – Estimated by the least squares method
Multiple linear regression
• Relation between a continuous variable and a set of i continuous variables

  y = α + β1x1 + β2x2 + … + βixi

• Partial regression coefficients βi
  – Amount by which y changes on average when xi changes by one unit and all the other x's remain constant
  – Measures association between xi and y adjusted for all other x's
• Example
  – SBP versus age, weight, height, etc.
Multiple linear regression

  y = α + β1x1 + β2x2 + … + βixi

y: response variable, outcome variable, dependent variable
x1, …, xi: predictor variables, explanatory variables, covariables, independent variables
Logistic regression

Table 2: Age and signs of coronary heart disease (CD)

  Age  CD    Age  CD    Age  CD
  22   0     40   0     54   0
  23   0     41   1     55   1
  24   0     46   0     58   1
  27   0     47   0     60   1
  28   0     48   0     60   0
  30   0     49   1     62   1
  30   0     49   0     65   1
  32   0     50   1     67   1
  33   0     51   0     71   1
  35   1     51   1     77   1
  38   0     52   0     81   1
How can we analyse these data?
• Compare mean age of diseased and non-diseased
  – Non-diseased: 38.6 years
  – Diseased: 58.7 years (p < 0.0001)
• Linear regression?
[Figure: dot-plot of the data from Table 2, signs of coronary disease (yes/no) against age (0-100 years)]
Logistic regression (2)

Table 3: Prevalence (%) of signs of CD according to age group

  Age group   # in group   # diseased   % diseased
  20-29       5            0            0
  30-39       6            1            17
  40-49       7            2            29
  50-59       7            4            57
  60-69       5            4            80
  70-79       2            2            100
  80-89       1            1            100
[Figure: dot-plot of the data from Table 3, % diseased (0-100) against age group]
Logistic function (1)

  P(y|x) = e^(α+βx) / (1 + e^(α+βx))

[Figure: the S-shaped logistic curve, probability of disease (0 to 1) against x]
Transformation
The logit of P(y|x):

  ln[ P(y|x) / (1 - P(y|x)) ] = α + βx

is linear in x, and is equivalent to

  P(y|x) = e^(α+βx) / (1 + e^(α+βx))
Fitting the equation to the data
• Linear regression: least squares or maximum likelihood
• Logistic regression: maximum likelihood
• Likelihood function
  – Estimates the parameters α and β
  – Practically easier to work with the log-likelihood

  L(β) = ln l(β) = Σ(i=1..n) { yi ln π(xi) + (1 - yi) ln[1 - π(xi)] }
Maximum likelihood
• Iterative computing (Newton-Raphson)
  – Choice of arbitrary starting values for the coefficients (usually 0)
  – Computing of the log-likelihood
  – Variation of the coefficients' values
  – Reiteration until maximisation (plateau)
• Results
  – Maximum likelihood estimates (MLE) for α and β
  – Estimates of P(y) for a given value of x
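The Newton-Raphson iteration can be sketched in Python for a single covariate (an illustration, not production code; applied to the NINDS data with a 0/1 treatment indicator it reproduces the SAS estimates -0.4430 and 0.3275 quoted in this course):

```python
from math import exp, log

def fit_logistic(x, y, iters=25):
    """Newton-Raphson ML fit of P(y=1|x) = 1/(1 + exp(-(a + b*x))) for a
    single covariate (a bare-bones sketch: no step-halving, no convergence
    checks)."""
    a = b = 0.0                              # usual starting values
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + exp(-(a + b * xi)))
            ga += yi - p                     # gradient of the log-likelihood
            gb += (yi - p) * xi
            w = p * (1 - p)                  # weights of the information matrix
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det     # Newton step
        b += (haa * gb - hab * ga) / det
    return a, b

# NINDS with a 0/1 treatment indicator: intercept = log(122/190),
# slope = log(odds ratio)
x = [1] * 312 + [0] * 312
y = [1] * 147 + [0] * 165 + [1] * 122 + [0] * 190
a, b = fit_logistic(x, y)
print(f"{a:.4f} {b:.4f}")  # -0.4430 0.3275
```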
Multiple logistic regression
• More than one independent variable
  – Dichotomous, ordinal, nominal, continuous …

  ln[ P / (1 - P) ] = α + β1x1 + β2x2 + … + βixi

• Interpretation of βi
  – Increase in log-odds for a one-unit increase in xi with all the other x's constant
  – Measures association between xi and the log-odds adjusted for all other x's
Statistical testing
• Question
  – Does the model including a given independent variable provide more information about the dependent variable than the model without this variable?
• Three tests
  – Likelihood ratio statistic (LRS)
  – Wald test
  – Score test
Likelihood ratio statistic
• Compares two nested models

  Log(odds) = α + β1x1 + β2x2 + β3x3   (model 1)
  Log(odds) = α + β1x1 + β2x2          (model 2)

• LR statistic

  -2 log (likelihood model 2 / likelihood model 1)
  = [-2 log likelihood model 2] - [-2 log likelihood model 1]

The LR statistic is a χ² with DF = number of extra parameters in the larger model.
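For the NINDS data the LR statistic can be computed directly from the cell counts, since with grouped binary data the log-likelihood depends only on the fitted proportions (a Python sketch):

```python
from math import log

def neg2loglik(cells):
    """-2 log-likelihood for grouped binary data, where all the cells in
    `cells` (a list of (successes, failures) pairs) share one fitted p."""
    s = sum(c[0] for c in cells)
    f = sum(c[1] for c in cells)
    p = s / (s + f)
    return -2 * (s * log(p) + f * log(1 - p))

# NINDS: the smaller model fits one common proportion, the larger model
# one proportion per treatment arm
null_model = neg2loglik([(147, 165), (122, 190)])
full_model = neg2loglik([(147, 165)]) + neg2loglik([(122, 190)])
lr = null_model - full_model
print(round(null_model, 3), round(full_model, 3), round(lr, 3))
```

The three printed values match the SAS -2 LOG L entries (853.157 and 849.069) and the likelihood ratio chi-square (4.089) up to rounding.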
Example 6
Fitting a Logistic regression model to the
NINDS data, using only one covariate
(treatment group).
NINDS again

  Observed frequencies
             non-resp   responder   Total
  rt-PA      165        147         312
  placebo    190        122         312
  Total      355        269         624
SAS output
The LOGISTIC Procedure

  Response Profile
  Ordered Value   Binary Outcome   Count
  1               EVENT            269
  2               NO EVENT         355

  Model Fitting Information and Testing Global Null Hypothesis BETA=0
  Criterion   Intercept Only   Intercept and Covariates   Chi-Square for Covariates
  AIC         855.157          853.069                    .
  SC          859.593          861.941                    .
  -2 LOG L    853.157          849.069                    4.089 with 1 DF (p=0.0432)
  Score       .                .                          4.084 with 1 DF (p=0.0433)
  Analysis of Maximum Likelihood Estimates
  Variable   DF   Parameter   Standard   Wald         Pr >         Standardized   Odds
                  Estimate    Error      Chi-Square   Chi-Square   Estimate       Ratio
  INTERCPT   1    -0.4430     0.1160     14.5805      0.0001       .              .
  GRP        1    0.3275      0.1622     4.0743       0.0435       0.090350       1.387
Logistic regression example
• AZ trial (CLASS) in acute stroke comparing clomethiazole (n=678) with placebo (n=675)
• Response defined as a Barthel Index score ≥ 60 at 90 days
• Covariates:
  – STRATUM (time to start of treatment: 0-6, 6-12)
  – AGE
  – SEVERITY (baseline SSS score)
  – TRT (treatment group)
SAS output
  Response Profile
  Ordered Value   BI_60   Count
  1               1       750
  2               0       603

  Analysis of Maximum Likelihood Estimates
  Variable   DF   Parameter   Standard   Wald         Pr >         Standardized   Odds
                  Estimate    Error      Chi-Square   Chi-Square   Estimate       Ratio
  INTERCPT   1    2.4244      0.5116     22.4603      0.0001       .              .
  TRT        1    0.1299      0.1310     0.9838       0.3213       0.035826       1.139
  STRATUM    1    0.1079      0.1323     0.6648       0.4149       0.029751       1.114
  AGE        1    -0.0673     0.00671    100.6676     0.0001       -0.409641      0.935
  SEVERITY   1    0.0942      0.00642    215.0990     0.0001       0.621293       1.099
  Conditional Odds Ratios and 95% Confidence Intervals (Wald Confidence Limits)
  Variable   Unit     Odds Ratio   Lower   Upper
  TRT        1.0000   1.139        0.881   1.472
  STRATUM    1.0000   1.114        0.859   1.444
  AGE        1.0000   0.935        0.923   0.947
  SEVERITY   1.0000   1.099        1.085   1.113
A Bayesian alternative
Case-control
• Imagine a randomised clinical trial or a case-control study. The analysis uses a chi-square test and the corresponding p-value. If this turns out to be less than 0.05 we declare significance.
Example 7:
Some studies from around 1990 suggested that the risk of CHD is associated with childhood poverty. Since infection with the bacterium H. pylori is also linked to poverty, some researchers suspected H. pylori to be the missing link. In a case-control study where levels of infection were considered in patients and controls, the following results were obtained.

  Infection level   Cases (CHD)   Controls
  High              60%           39%          n11 + n12
  Low               40%           61%          n21 + n22
                    n11 + n21     n12 + n22

The posterior probability of the null hypothesis given the data D is

  P[H0 | D] = [ 1 + (1 - P[H0]) / (P[H0] · BF) ]^(-1)

where the Bayes factor is

  BF = P[D | H0] / P[D | H1]
The chi-square statistic having, in this case, the value 4.73 yields a p-value of 0.03, which is less than the formal level of significance 0.05. There is, however, no theoretical reason to believe that this result is true. So we take again P(H0) = 0.5. This leads to

  P[H0 | D] = [ 1 + 1/BF ]^(-1) = BF / (BF + 1)
Berger and Sellke (1987) have shown that, for a very wide range of cases including the case-control case,

  BF ≥ √χ² · e^((1 - χ²)/2)

Using the value 4.73 for the chi-square statistic leads to a BF value of at least 0.337.
(M. A. Mendall et al., Relation between H. pylori infection and coronary heart disease, Heart J. (1994).)
Conclusion

  P[H0 | D] ≥ 0.337 / (0.337 + 1) = 0.252

Taking another (more or less sceptical) attitude does not change the conclusion that much:

  P(H0) = 0.75  =>  P[H0 | D] > 0.5
  P(H0) = 0.25  =>  P[H0 | D] > 0.1
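The Bayes-factor bound and the posterior probabilities in this section can be verified with a short Python sketch (bf_lower_bound and posterior_h0 are illustrative helpers):

```python
from math import exp, sqrt

def bf_lower_bound(chi2):
    """Lower bound on the Bayes factor P(D|H0)/P(D|H1) as quoted in the
    text (Berger and Sellke style): sqrt(chi2) * e^((1 - chi2)/2)."""
    return sqrt(chi2) * exp((1 - chi2) / 2)

def posterior_h0(prior, bf):
    """P(H0 | D) = [1 + (1 - prior) / (prior * BF)]^-1."""
    return 1 / (1 + (1 - prior) / (prior * bf))

bf = bf_lower_bound(4.73)
print(round(bf, 3))                      # 0.337
print(round(posterior_h0(0.50, bf), 3))  # 0.252
print(posterior_h0(0.75, bf) > 0.5)      # True
print(posterior_h0(0.25, bf) > 0.1)      # True
```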
Questions or Comments?