Variable Selection for Tailoring treatment


Variable Selection for
Tailoring Treatment
L. Gunter, J. Zhu & S.A. Murphy
ASA, Nov 11, 2008
1
Outline
• Motivation
• Need for Variable Selection
• Characteristics of a Tailoring Variable
• A New Technique for Finding Tailoring Variables
• Comparisons
• Discussion
2
Motivating Example
STAR*D "Sequenced Treatment Alternatives to Relieve Depression"
[Figure: STAR*D design flowchart. Treatment levels Two, Three and Four, with columns for Preference, Treatment, Intermediate Outcome (Remission or Nonremission) and Follow-up; R marks randomization. Treatment labels shown: CIT; Augment: CIT + BUS, CIT + BUP, L2-Tx + THY, L2-Tx + LI; Switch: BUP, SER, VEN, MIRT, NTP; also TCP and MIRT + VEN.]
30+ baseline variables, 10+ variables at each treatment level, both categorical and continuous
3
Simple Example
Nefazodone - CBASP Trial
[Figure: Randomization to one of two arms: Nefazodone, or Nefazodone + Cognitive Behavioral Analysis System of Psychotherapy (CBASP)]
50+ baseline covariates, both categorical and continuous
4
Simple Example
Nefazodone - CBASP Trial
X: patient’s medical history, severity of depression, current symptoms, etc.
A: Nefazodone OR Nefazodone + CBASP
R: depression symptoms post treatment
Which variables in X are important for tailoring the treatment?
5
Optimization
• We want to select the treatment that “optimizes” R:
$$\arg\max_a E[R \mid X, A = a]$$
• The optimal choice of treatment may depend on X
6
Optimization
• The optimal treatment(s) is given by
• The value of d is
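The two quantities named on this slide can be sketched in the notation of the previous slide. The symbols $d$ (a decision rule mapping $X$ to a treatment), $d^*$ and $V_d$ are assumed here; they follow the usual definitions for a randomized trial and are not taken from the slide itself.

```latex
d^*(X) \in \arg\max_a E[R \mid X, A = a], \qquad
V_d = E\big[\, E[R \mid X, A = d(X)] \,\big]
```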
7
Need for Variable Selection
• In clinical trials many pretreatment variables are
collected to improve understanding and inform
future treatment
• Yet in clinical practice, only the most informative
variables for tailoring treatment can be collected.
• A combination of theory, clinical experience and
statistical variable selection methods can be
used to determine which variables are important.
8
Current Statistical Variable
Selection Methods
• Current statistical variable selection methods
focus on finding good predictors of the response
• We also need variables that help determine which
treatment is best for which types of patients, i.e.
tailoring variables
• Experts typically have knowledge on which
variables are good predictors, but intuition about
tailoring variables is often lacking
9
What is a Tailoring Variable?
• Tailoring variables help us determine which treatment is best
• Tailoring variables qualitatively interact with the treatment; different values of the tailoring variable result in different best treatments.
[Figure: three panels plotting R against a variable, with one line for A=0 and one for A=1. X1: No Interaction. X2: Non-qualitative Interaction. X3: Qualitative Interaction.]
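The three cases on this slide can be told apart mechanically when each treatment arm's mean response is a straight line in the variable. The sketch below is only an illustration under that assumption; the function name, the (intercept, slope) representation and the default range 0 to 1 are hypothetical choices, not part of the talk.

```python
def classify_interaction(line0, line1, x_min=0.0, x_max=1.0, tol=1e-9):
    """Classify the X-by-A interaction for two linear response curves.

    line0 and line1 are (intercept, slope) pairs for E[R | X=x, A=0]
    and E[R | X=x, A=1] (an assumed, simplified model).
    """
    (b0, s0), (b1, s1) = line0, line1
    # Parallel lines: the treatment effect does not depend on X.
    if abs(s1 - s0) <= tol:
        return "no interaction"
    # Treatment effect (A=1 minus A=0) at each end of the range of X.
    effect_lo = (b1 - b0) + (s1 - s0) * x_min
    effect_hi = (b1 - b0) + (s1 - s0) * x_max
    # The effect is linear in x, so it changes sign inside the range
    # exactly when the two endpoint effects have opposite signs.
    if effect_lo * effect_hi < 0:
        return "qualitative interaction"
    return "non-qualitative interaction"
```

Only the last case changes which treatment is best for some patients, which is what makes a variable useful for tailoring.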
10
Qualitative Interactions
• Qualitative interactions have been discussed by
many within the statistical literature (e.g. Byar & Corle, 1977; Peto,
1982; Shuster & Van Eys, 1983; Gail & Simon, 1985; Yusuf et al.,
1991; Senn, 2001; Lagakos, 2001)
• Many express skepticism concerning the validity of
qualitative interactions when found in studies
• Our approach for finding qualitative interactions
should be robust against spurious results
11
Qualitative Interactions
• We focus on two important factors
– The magnitude of the interaction between the variable and the treatment indicator
– The proportion of patients for whom the best choice of treatment changes given knowledge of the variable
[Figure: three panels plotting R against a variable, with one line for A=0 and one for A=1. X4: big interaction, big proportion. X5: small interaction, big proportion. X6: big interaction, small proportion.]
12
Ranking Score S
• Ranking Score:
$$S_j = \hat{E}\left[\max_a \hat{E}[R \mid X_j = x_j, A = a] - \hat{E}[R \mid X_j = x_j, A = a^*]\right]$$
where $a^* = \arg\max_a \hat{E}[R \mid A = a]$
• S estimates the quantity described by Parmigiani (2002) as the value of information.
[Figure: R plotted against Xj, with one line for A=0 and one for A=1, where A=1 is a*]
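A minimal sketch of how the ranking score could be computed for one candidate variable. It assumes a binary treatment and fits a separate least-squares line of R on Xj within each arm, which is one simple way to obtain the inner conditional means; the helper names `fit_line` and `ranking_score` are hypothetical and the talk does not specify this estimator.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for y regressed on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope

def ranking_score(x, a, r):
    """S for one variable x, treatment labels a, responses r."""
    arms = sorted(set(a))
    fits, arm_means = {}, {}
    for arm in arms:
        xs = [xi for xi, ai in zip(x, a) if ai == arm]
        rs = [ri for ri, ai in zip(r, a) if ai == arm]
        fits[arm] = fit_line(xs, rs)   # estimate of E[R | X_j, A=arm]
        arm_means[arm] = mean(rs)      # estimate of E[R | A=arm]
    # a* is the best treatment when X_j is ignored.
    a_star = max(arms, key=lambda arm: arm_means[arm])

    def predict(arm, xi):
        intercept, slope = fits[arm]
        return intercept + slope * xi

    # Average gain from choosing the best arm at each x over always using a*.
    return mean(
        max(predict(arm, xi) for arm in arms) - predict(a_star, xi)
        for xi in x
    )
```

The score is zero whenever a* is the best arm at every observed value of the variable, so only variables whose fitted lines cross within the data contribute.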
13
Ranking Score S
• Higher S scores correspond to stronger
evidence of a qualitative interaction
between X and A
• We use this ranking in a variable selection
algorithm to select important tailoring
variables.
– Avoid over-fitting in $\hat{E}[R \mid X, A]$ due to the large
number of X variables
– Consider variables jointly
14
Variable Selection Algorithm
1. Select important predictors of R from (X,
X*A) using Lasso
-- Select the tuning parameter using BIC
2. Select all X*A variables with nonzero S.
-- Use the predictors from step 1 to form a linear
regression estimator of $\hat{E}[R \mid X, A]$, and use it to compute S.
15
Lasso
• Lasso on (X, A, XA)
(Tibshirani, 1996)
– Lasso minimization criterion:
n


2
ˆ
 ( )  arg min Ri  Zi      1 

 i 1

where Zi is the vector of predictors for patient i,
λ is a penalty parameter
– Coefficient for A not penalized
– Value of λ chosen by Bayesian Information Criterion
(BIC) (Zou, Hastie & Tibshirani, 2007)
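The criterion above, with the coefficient for A left unpenalized, can be minimized by cyclic coordinate descent. The sketch below is one standard way to do that and is not claimed to be the solver used in the talk; the function names and the `unpenalized` argument are hypothetical. It assumes any intercept is supplied as a column of ones in Z and listed as unpenalized.

```python
def soft_threshold(value, cutoff):
    """Shrink value toward zero by cutoff, returning 0 inside [-cutoff, cutoff]."""
    if value > cutoff:
        return value - cutoff
    if value < -cutoff:
        return value + cutoff
    return 0.0

def lasso_cd(Z, R, lam, unpenalized=(), n_iter=200):
    """Minimise sum_i (R_i - Z_i.beta)^2 + lam * sum_j |beta_j|,
    where the sum over j skips the column indices in `unpenalized`."""
    n, p = len(Z), len(Z[0])
    beta = [0.0] * p
    resid = list(R)  # residuals R - Z.beta for beta = 0
    col_ss = [sum(Z[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes beta_j.
            rho = sum(Z[i][j] * resid[i] for i in range(n)) + col_ss[j] * beta[j]
            if j in unpenalized:
                new = rho / col_ss[j]  # plain least-squares update
            else:
                new = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new
    return beta
```

Passing the column index of A in `unpenalized` keeps the treatment main effect in the model for every λ, while main effects of X and the X*A interactions can still be shrunk exactly to zero.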
16
Variable Selection Algorithm
3. Rank order (X, X*A) variables selected in
steps 1 & 2 using a weighted Lasso
-- Weight is 1 if variable is not an interaction
-- Otherwise weight for kth interaction is
-is a small positive number.
-- Produces a combined ranking of the selected
(X, X*A) variables (say p variables).
17
Variable Selection Algorithm
4. Choose between variable subsets using a
criterion that trades off maximal value of
information and complexity.
-- The ordering of the p variables creates p subsets of
variables. Estimate the value of information for
each of the p subsets:
$\hat{V}_k - \hat{V}_0, \quad k = 1, \ldots, p$
-- Select the subset k with the largest value of the criterion
18
Simulations
• Data simulated under a wide variety of realistic
decision making scenarios (with and without
qualitative interactions)
– Used X from the CBASP study, generated new A and R
• Compared:
– New method: S with variable selection algorithm
– Standard method: BIC Lasso on (X, A, XA)
• 1000 simulated data sets: recorded percentage
of time each variable’s interaction with
treatment was selected for each method
19
Simulation Results
Generative Model                                        Ave # of Spurious Interactions    Ave % increase in Value
                                                        Selected over BIC LASSO           over BIC LASSO*
No Interactions                                         0.5                               -0.03
Non-qualitative Interactions Only                       0.1                               0.00
Qualitative Interaction Only                            1.1                               0.23
Both Qualitative and Non-qualitative Interactions       0.2                               0.39
* Over the total possible increase; 1000 data sets each of size 440
20
Simulation Results
• Pros: when the model contained
qualitative interactions, the new method
gave significant increases in expected
response over BIC-Lasso
• Cons: the new method resulted in a slight
increase in the number of spurious
interactions over BIC-Lasso
21
Nefazodone - CBASP Trial
Aim of the Nefazodone CBASP trial – to
compare efficacy of three alternate treatments
for major depressive disorder (MDD):
1. Nefazodone,
2. Cognitive behavioral-analysis system of
psychotherapy (CBASP)
3. Nefazodone + CBASP
Which variables might help tailor the depression
treatment to each patient?
22
Nefazodone - CBASP Trial
• For our analysis we used data from 440 patients with
X: 61 baseline variables
A: Nefazodone vs. Nefazodone + CBASP
R: Hamilton’s Rating Scale for Depression score, post treatment
23
Method Application and
Confidence Measures
• When applying the new method to real data, it is desirable to have a measure of reliability and to control the family-wise error rate
• We used bootstrap sampling to assess reliability
– On each of 1000 bootstrap samples:
  1. Run variable selection method
  2. Record the interaction variables selected
– Calculate selection percentages over bootstrap samples
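The bootstrap procedure above can be sketched as follows. The `select` argument stands in for the whole variable selection method and is a hypothetical callable that takes a list of patient records and returns the names of the interaction variables it selected; the function name and the fixed seed are likewise assumptions made for illustration.

```python
import random

def selection_percentages(rows, select, n_boot=1000, seed=0):
    """Percentage of bootstrap samples in which each variable is selected.

    rows:   list of patient records (any type that `select` understands)
    select: hypothetical callable, select(sample) -> iterable of variable names
    """
    rng = random.Random(seed)
    n = len(rows)
    counts = {}
    for _ in range(n_boot):
        # Draw n records with replacement from the original data.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        # Count each variable at most once per bootstrap sample.
        for name in set(select(sample)):
            counts[name] = counts.get(name, 0) + 1
    return {name: 100.0 * c / n_boot for name, c in counts.items()}
```

Variables that are never selected simply do not appear in the returned dictionary, so a missing key should be read as 0%.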
24
Error Rate Thresholds
• To help control the family-wise error rate, compute the following inclusion thresholds for selection percentages:
  1. Repeat 100 times
     a. Permute interactions to remove effects from the data
        i. Run method on 1000 bootstrap samples of permuted data
        ii. Calculate selection percentages over bootstrap samples
     b. Record largest selection percentage over the p interactions
  2. Threshold: (1-α)th percentile over 100 max selection percentages
• Select all interactions with selection percentage greater than threshold
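A sketch of the threshold computation described above. How the interactions are permuted is not detailed on the slide, so `permute` and `max_selection_pct` are hypothetical callables supplied by the user: the first returns a permuted copy of the data, the second runs the bootstrap on that copy and returns the largest selection percentage over the p interactions. The percentile convention used here is also an assumption.

```python
import math
import random

def inclusion_threshold(rows, permute, max_selection_pct,
                        alpha=0.05, n_perm=100, seed=0):
    """(1 - alpha) percentile of the largest selection percentages
    obtained on n_perm permuted copies of the data."""
    rng = random.Random(seed)
    maxima = sorted(
        max_selection_pct(permute(rows, rng)) for _ in range(n_perm)
    )
    # Empirical quantile: smallest recorded maximum with at least a
    # (1 - alpha) share of the maxima at or below it.
    rank = math.ceil(round((1.0 - alpha) * n_perm, 9))
    return maxima[max(0, rank - 1)]
```

An interaction would then be reported only if its selection percentage on the real data is greater than the returned threshold.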
25
Error Rate Thresholds
• When tested in simulations using new
method, error rate threshold effectively
controlled family-wise error rate
• This augmentation of bootstrap sampling
and thresholding was also tested on BIC
Lasso and effectively controlled the family-wise error rate in simulations
26
Nefazodone - CBASP Trial
[Figure: two panels, BIC Lasso and New Method, plotting % of time chosen against variable number; the variables ALC and OCD are labelled in both panels]
27
Interaction Plot
[Figure: Fitted R against Alcohol Dependence (0 or 1), with one line for Txt=Nef and one for Txt=Combo]
28
Interaction Plot
[Figure: Fitted R against Obsessive Compulsive Disorder (0 or 1), with one line for Txt=Nef and one for Txt=Combo]
29
Discussion
• This method provides a list of potential
tailoring variables while reducing the
number of false leads.
• Replication is required to confirm the
usefulness of a tailoring variable.
• Our long term goal is to generalize this
method so that it can be used with data
from Sequential, Multiple Assignment,
Randomized Trials as illustrated by
STAR*D.
30
• Email Susan Murphy at [email protected]
for more information!
• This seminar can be found at
http://www.stat.lsa.umich.edu/~samurphy/seminars/ASA11.11.08.ppt
• Support: NIDA P50 DA10075, NIMH R01 MH080015
and NSF DMS 0505432
• Thanks for technical and data support go to
– A. John Rush, MD, Betty Jo Hay Chair in Mental Health at the
University of Texas Southwestern Medical Center, Dallas
– Martin Keller and the investigators who conducted the trial ‘A
Comparison of Nefazodone, the Cognitive Behavioral-analysis
System of Psychotherapy, and Their Combination for Treatment
of Chronic Depression’
31
Lasso Weighting Scheme
• Lasso minimization criterion equivalent to:
p
n


2
ˆ
 ( )  arg min wi Ri  Z i      w j  j 

j 1
 i 1

so smaller wj means greater importance
• Weights $w_j = v_j / \sum_{k=1}^{p} v_k$ where
– vj = 1
for predictive variables
– vj = S j (maxk (Sk )   ) for prescriptive variables
34
AGV Criterion
• For a subset of k variables, $X_{\{k\}}$, the Average Gain in Value (AGV) criterion is
$$AGV_k = \frac{\left(\max_a \hat{E}[R \mid X_{\{k\}}, A = a] - \hat{E}[R \mid A = a^*]\right) m^*}{\left(\max_a \hat{E}[R \mid X_{\{m^*\}}, A = a] - \hat{E}[R \mid A = a^*]\right) k}$$
where $m^* = \arg\max_k \max_a \hat{E}[R \mid X_{\{k\}}, A = a]$
• The criterion selects the subset of variables with the maximum proportion of increase in E[R] per variable
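A small sketch of the AGV computation, assuming the estimated values for the nested subsets are already available as a list. The function name `agv` and the input layout are hypothetical: `values[0]` plays the role of the estimated value with no tailoring, and `values[k]` the estimated value when the first k ranked variables are used.

```python
def agv(values):
    """Return (best k, list of AGV_k for k = 1..p).

    values[0]: estimated value using no tailoring variables
    values[k]: estimated value using the first k ranked variables
    """
    p = len(values) - 1
    v0 = values[0]
    gains = [values[k] - v0 for k in range(1, p + 1)]
    # m* is the subset size with the largest estimated value.
    m_star = max(range(1, p + 1), key=lambda k: values[k])
    best_gain = gains[m_star - 1]
    if best_gain <= 0:
        # No subset improves on ignoring X, so the ratio is undefined.
        return None, []
    # Gain per variable for subset k, relative to gain per variable at m*.
    scores = [(gains[k - 1] / k) / (best_gain / m_star) for k in range(1, p + 1)]
    k_best = max(range(1, p + 1), key=lambda k: scores[k - 1])
    return k_best, scores
```

For example, with values `[10.0, 12.0, 12.5, 13.0]` the largest value is reached with three variables, but most of the gain comes from the first one, so the criterion prefers k = 1.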
35
Simulation Results (S-score)
[Figure: four panels (New Method and BIC Lasso, each shown for two simulation settings) plotting % of time chosen against variable number. One pair of panels marks Qualitative, Non-qualitative and Spurious Interactions; the other pair marks Qualitative and Spurious Interactions.]
36