Dennis_NIH_9-10-08_presentation


Practical Applications of Measurement to Addiction Research
(“Why do we care?”)
Michael L. Dennis, Ph.D.
Chestnut Health Systems, Bloomington, IL
Presentation at NIH Pre-session of the International Conference on Outcome
Measurement, September 10, 2008, Rockville, MD. This presentation supported
by National Institute on Drug Abuse (NIDA) grant no R37 DA11323 and Center
for Substance Abuse Treatment (CSAT), Substance Abuse and Mental Health
Services Administration (SAMHSA) contract 270-07-019. The opinions are
those of the author and do not reflect official positions of the consortium or
government. Available on line at www.chestnut.org/LI/Posters or by contacting
Joan Unsicker at 720 West Chestnut, Bloomington, IL 61701, phone: (309) 827-6026, fax: (309) 829-4661, e-mail: [email protected]
Objectives are to...

Examine why more traditional clinical trials type
researchers need to care about measurement

Provide explicit practical examples of how
addressing measurement in Addiction Research can
help improve it
Since the early 1960s, Jacob Cohen and
colleagues have suggested that clinical trials
research should:

Focus on statistical power, which is
- the probability of finding what you are looking for given that it is there

Combine data from multiple clinical trials into meta-analyses, which can be used
- as a more stable estimate of truth
- to evaluate the accuracy of our early estimates and how methods can be improved
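As a rough illustration of power as "the probability of finding what you are looking for given that it is there", the sketch below approximates the power of a two-group comparison. It assumes a two-sided test, equal group sizes and a normal approximation to the t-test; the function name `power_two_group` and the example sample size are hypothetical and not from the presentation.

```python
from statistics import NormalDist


def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized difference d (Cohen's d)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    shift = d * (n_per_group / 2) ** 0.5     # expected value of the test statistic
    # Probability that the statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical example: 50 participants per group.
for d in (0.2, 0.4, 0.8):
    print(f"d={d}: power is about {power_two_group(d, 50):.2f}")
```

With a true effect of zero the function returns alpha, which is a quick sanity check on the approximation.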
In a review of over 200 meta-analyses of
medical, social and legal studies published
between 1960 and 1990, Lipsey consistently found:

Fewer than a third of the individual articles coded even mentioned
- the statistical power of their core contrast
- the reliability, validity, or sensitivity of their outcome measure

Relative to the final effect size estimated from the meta-analysis, the studies averaged less than 50% power
- in other words, it was more accurate to flip a coin than to use a statistical test the way they were being used “on average” in the published literature
Movement to Improve the Methodological
Quality of Clinical Trials Research

In 1993 a group of 30 experts (medical journal editors,
clinical trialists, epidemiologists, and methodologists)
met in Ottawa to try to identify methodological gaps in
the literature

In 1996 this growing group issued the Consolidated
Standards of Reporting Trials (CONSORT;
www.consort-statement.org)

Since 2000, NIH has required a Data and Safety Monitoring
Board (DSMB) on all Phase 3 and multi-site Phase 2 studies
(Notice OD-00-38), which also pushes CONSORT

Today virtually every major medical, psychiatric,
psychological, criminological, and social journal has
signed onto CONSORT
Basic ways to increase power
While the most common
approach, these are also the
most expensive and
logistically difficult to do

Increase sample size

Increase observations

Target a higher severity/less heterogeneous
sample

Increase implementation

Reduce measurement error

Reduce unexplained variance (which may be
systematic)

More accurately model error and unexplained
variance in analysis
Today’s focus
Observed Effect Size as a function of “True” effect size (Cohen’s d) and reliability of dependent variable
[Figure: Observed Effect Size (observed d, 0.00 to 0.90) plotted against Reliability of Dependent Variable (0.10 to 1.00, where 1.00 = no measurement error), with one curve per True Effect Size (d=.2, d=.4, d=.8)]
“Observed” effect size goes down with lower reliability
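The relationship in this chart can be sketched numerically. The slide does not state the formula behind its curves, so the example below assumes the classical test theory result for random error in the dependent variable only, where the observed standardized difference equals the true difference times the square root of the reliability. The function name `observed_d` is hypothetical.

```python
def observed_d(true_d, reliability):
    """Attenuated effect size under classical test theory (an assumption here)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    # Measurement error inflates the observed SD by 1/sqrt(reliability),
    # which shrinks the standardized mean difference by sqrt(reliability).
    return true_d * reliability ** 0.5


for true_d in (0.2, 0.4, 0.8):
    row = ", ".join(f"r={r:.1f}: {observed_d(true_d, r):.2f}" for r in (0.4, 0.7, 1.0))
    print(f"true d={true_d} -> {row}")
```

If the original curves were drawn under a different attenuation model, the values printed here will differ from the chart.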
Sample size required for 80% power as a function of “True” effect size (Cohen’s d) and reliability of dependent variable
[Figure: n per group for 80% power (0 to 1000) plotted against Reliability of Dependent Variable (0.10 to 1.00), with one curve per True Effect Size (d=.2, d=.4, d=.8)]
A reliability of .7 doubles sample size requirements
Increasing reliability from .4 to .7 cuts sample size requirements by over 50%
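The link between reliability and required sample size can be sketched as follows. This is a minimal example under stated assumptions: a two-sided, two-group comparison, a normal approximation to the t-test, and an attenuation model chosen by the hypothetical `exponent` parameter. The slide does not say how its curves were computed; both function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(observed_d, power=0.80, alpha=0.05):
    """Per-group n for a two-sided, two-group comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / observed_d) ** 2)


def n_given_reliability(true_d, reliability, exponent=0.5):
    # exponent=0.5: classical attenuation, observed d = true d * sqrt(reliability)
    # exponent=1.0: observed d taken as true d * reliability (alternative assumption)
    return n_per_group(true_d * reliability ** exponent)


for exponent in (0.5, 1.0):
    perfect = n_given_reliability(0.4, 1.0, exponent)
    at_70 = n_given_reliability(0.4, 0.7, exponent)
    at_40 = n_given_reliability(0.4, 0.4, exponent)
    print(f"exponent={exponent}: n at r=1.0 is {perfect}, r=.7 is {at_70}, r=.4 is {at_40}")
```

Under these assumptions the `exponent=1.0` variant roughly doubles n at a reliability of .7 and cuts n by about two thirds when reliability rises from .4 to .7, which is closer to the callouts on the slide. The square-root variant gives about a 43% increase and a 43% reduction.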
Some common sources of discordant
answers in test-retest questions that can be
readily addressed are:

Unclear time periods
Badly worded double negatives
Constantly changing response sets
Difficult to use (or time consuming) response sets
Behaviors/traits that vary within a range (disturbance)
Abstract concepts not defined well by a single question
Impact of Comprehensive Data Collection Protocol Certification on Measurement Issues
[Figure: bar chart of Cohen's d \a by measure, axis from -0.6 to 0]
Proportion of Inconsistencies (100%)*: -0.39
Duration (in Minutes)*: -0.25
Denial/Misrepresentation (Staff Rating)*: -0.24
Context Effect (Staff Report): -0.10
Proportion of Missing Data (100%): -0.04
Atypicalness (Outfit in Logits): -0.03
Randomness (Infit in Logits): -0.03
\a Cohen's d = (Post Certification - Pre Certification)/Pooled STD
* p<.05
Source: GAIN coordinating center
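The effect size defined in the footnote, (Post Certification - Pre Certification)/Pooled STD, can be computed as in the sketch below. The function name `cohens_d` and the interview durations are hypothetical and are not GAIN data.

```python
from statistics import mean, variance


def cohens_d(post, pre):
    """Cohen's d as (mean of post - mean of pre) / pooled standard deviation."""
    n_post, n_pre = len(post), len(pre)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_post - 1) * variance(post) + (n_pre - 1) * variance(pre)) / (
        n_post + n_pre - 2
    )
    return (mean(post) - mean(pre)) / pooled_var ** 0.5


# Hypothetical interview durations in minutes, before and after certification.
pre_minutes = [100, 120, 140]
post_minutes = [90, 110, 130]
print(cohens_d(post_minutes, pre_minutes))
```

A negative value means the measure went down after certification, which is the direction of every bar in the chart.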
Staff Experience Matters as well
Major improvement over the first 15 interviews
Most improvements have occurred by 60 interviews
Source: GAIN coordinating center
Impact of the Number of Observations on Reliability Across Observations by Initial Reliability in a Wave
[Figure: Reliability Across Observations (0.0 to 1.0) plotted against number of Observations (1 to 18), with one curve per R in wave (0.1 to 0.9)]
Two observations (e.g., pre & post test) more reliable than post only
The lower the reliability, the longer it takes to reach a point of diminishing returns on more observations
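Curves of this shape are what the Spearman-Brown prophecy formula produces, so the sketch below uses it to show diminishing returns from additional observations. That choice is an assumption, since the slide does not name the formula. It also treats the observations as parallel measures with the same single-wave reliability. The function name `reliability_across` is hypothetical.

```python
def reliability_across(r_single, k):
    """Spearman-Brown reliability of the average of k parallel observations."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


for r_single in (0.3, 0.6, 0.9):
    values = ", ".join(f"k={k}: {reliability_across(r_single, k):.2f}" for k in (1, 2, 4, 8))
    print(f"R in wave={r_single} -> {values}")
```

Comparing k=1 with k=2 in the output corresponds to the post-only versus pre-and-post contrast noted on the slide.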
Some examples of increasing reliability with
multiple observations

Baseline observation to separate individual differences
Multiple observations to separate trajectories
Multiple observations nested within a hierarchical structure (e.g., patients within staff or site)
Blood pressure, lung capacity, motivation, readiness to change, attitudes or other things that tend to vary in a range (aka disturbance)
Redoing a urine or BAC test when the reading is unexpected or is contested by the participant
Redoing a positive HIV test for confirmation
Identify Cut Points Where a Question Like
“Peak Use” Is Likely to Become Unreliable
[Figure: Peak Joints Reported at Time 1 on GAIN plotted against Peak Joints Reported at Time 2 on Form 90]
Source: Dennis et al 2004
Impact of Number of Items on Reliability (Alpha) Observed by Average Inter-item Correlation
[Figure: Reliability (Alpha) (0.0 to 1.0) plotted against Number of Items (1 to 30), with one curve per Avg Item R (0.1 to 0.9)]
Generally target .7 to .9
Behavioral Measures (e.g., how many days, times) have high reliability and max out around 3-5 items
Symptom counts related to a syndrome or latent construct usually max out in 5-13 items
Covert Scales (e.g., MMPI), summative indices, and other measures with low inter item R may take 30 items (or more)
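To make the link between item count and alpha concrete, the sketch below uses the standardized form of Cronbach's alpha, which depends only on the number of items and the average inter-item correlation. Whether the slide's curves used this exact form is an assumption. The function names and the default target of .7 are illustrative.

```python
def standardized_alpha(n_items, avg_r):
    """Standardized Cronbach's alpha from item count and average inter-item r."""
    return n_items * avg_r / (1 + (n_items - 1) * avg_r)


def items_needed(avg_r, target=0.7, max_items=200):
    """Smallest number of items whose standardized alpha reaches the target."""
    for n_items in range(1, max_items + 1):
        if standardized_alpha(n_items, avg_r) >= target:
            return n_items
    return None  # target not reachable within max_items


for avg_r in (0.2, 0.3, 0.6):
    print(f"avg inter-item r={avg_r}: items needed for alpha of .7 = {items_needed(avg_r)}")
```

Lower average inter-item correlations push the required item count up quickly, which is the pattern the three callouts describe.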
Example of how scales can also be inter-related and used for validation
Structure of GAIN’s Psychopathology Measures
[Figure: diagram of the GAIN’s psychopathology measures and Validity Checks, with the annotations below]
Higher scores associated with alcohol and drug abuse medication (methadone, naltrexone, Antabuse, buprenorphine)
Higher scores associated with greater dysfunction (e.g., dropping out of school, unemployment, financial problems, homelessness) and/or substance induced legal, mental health, physical health, and withdrawal problems
Higher scores associated with mental health treatment (e.g., Ritalin, Adderall, lithium), special/alternative education, school or work problems, gambling and other evidence of impulse control problems, and/or anti-social/borderline personality disorders
Higher scores associated with mental health treatment (e.g., anti depressants, serotonin reuptake inhibitors (SSRI), and monoamine oxidase inhibitors (MAOI), sedatives) and/or a history of traumatic victimization, and/or high levels of stress
Higher scores associated with arrests, detention/jail time, probation, parole, size of drug habit
Key Advantages of Creating Scales
and Indices for Clinical Research

One of the lowest cost ways to reduce measurement error and increase statistical power
Reduce clinical omissions and backtracking for validity checks
Increase conceptual robustness, interpretability and make it easier to explain to others
Facilitates profiling over a large number of items
Formal Measurement Models Can Be Used to

Place people along a more reliable/sensitive ruler (aka common or latent factor)
Look at the slope/discrimination of items (primarily 2 parameter IRT)
Relate items in terms of their average severity
Look at the match/mismatch of people and item locations (primarily Rasch / 1 parameter IRT)
Study real differences by primary substance, gender, race, age or other groups
Identify potential bias at the item and test level by gender, race or other groups
Identify atypical patterns of answers (e.g., outfit)
Identify random response patterns (e.g., infit)
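The difference between item location (severity) and item slope (discrimination) can be illustrated with the standard logistic item response function. This sketch assumes a dichotomous item, which simplifies the multi-category ratings used in the GAIN. The person and item values are hypothetical, and `p_endorse` is an illustrative name.

```python
from math import exp


def p_endorse(theta, severity, discrimination=1.0):
    """Probability that a person at theta endorses an item.

    discrimination=1.0 for every item gives the Rasch (1 parameter) form;
    letting it vary by item gives the 2 parameter IRT form.
    """
    return 1.0 / (1.0 + exp(-discrimination * (theta - severity)))


# Hypothetical person severity and item parameters, all in logits.
person = 0.5
for label, severity, slope in (
    ("low severity item", -0.5, 1.0),
    ("high severity item", 1.0, 1.0),
    ("high severity, steep item", 1.0, 2.0),
):
    print(f"{label}: p = {p_endorse(person, severity, slope):.2f}")
```

When the person's location equals the item's severity the probability is 0.5 regardless of slope. A steeper slope only changes how fast the probability moves away from 0.5 on either side.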
Note you can also create summary
measures across different sources of data
Source: Lennox et al 2006 (CFI=.98)
Impact of Item Discrimination (aka steepness of slope) on Sample Size Requirements
[Figure: n per group for 80% power (0 to 1000) plotted against Average Item Discrimination/slope (0.5 = flat to 2.5 = steep), with one curve per True Effect Size (number of items): d=.2 (50 items), d=.4 (10 items), d=.8 (10 items)]
16-36% reduction in sample size
IRT focuses on better use of items with low / range of discrimination
Rasch focuses on finding high discrimination items so that differences between items can be ignored
Why Use Rasch and IRT?

Raw, Rasch and IRT scales generally correlate over .95 and vary by less than 5% in sample size requirements
The big advantage of going to Rasch and IRT is that they can be used to:
- reduce scale length (aka cost) through computer adaptive interviewing (as just described by Dr. Riley)
- explore and test assumptions about how items are related to each other
- explore and test assumptions about how items/scales vary by subgroups
- identify people with atypical presentations
- identify people who appear to be responding randomly
Example: Evaluating the
Substance Use Disorders (SUD) Concept

Much of our conceptual basis of addiction comes from Jellinek’s 1960 “disease” model of adult alcoholism
Edwards & Gross (1976) codified this into a set of bio-psycho-social symptoms related to a “dependence” syndrome
In practice, they are typically complemented by a set of separate “abuse” symptoms that represent other key reasons why people enter treatment
DSM 3, 3R, 4, 4TR, ICD 8, 9, & 10, and ASAM’s PPC1 and PPC2 all focus on this syndrome
Note that these symptoms are only correlated about .4 to .6 with “use” (e.g., ASI, SFS) or “problem” scales (e.g., MAST, DAST, CAGE) more commonly used in treatment research
DSM (GAIN) Symptoms of Dependence
(3+ Symptoms)
Physiological
n. Tolerance (you needed more alcohol or drugs to get high or found that the
same amount did not get you as high as it used to?)
p. Withdrawal (you had withdrawal problems from alcohol or drugs like
shaking hands, throwing up, having trouble sitting still or sleeping, or that you
used any alcohol or drugs to stop being sick or avoid withdrawal problems?)
Non-physiological
q. Loss of Control (you used alcohol or drugs in larger amounts, more often or
for a longer time than you meant to?)
r. Unable to Stop (you were unable to cut down or stop using alcohol or
drugs?)
s. Time Consuming (you spent a lot of your time either getting alcohol or
drugs, using alcohol or drugs, or feeling the effects of alcohol or drugs?)
t. Reduced Activities (your use of alcohol or drugs caused you to give up,
reduce or have problems at important activities at work, school, home or
social events?)
u. Continued Use Despite Personal Problems (you kept using alcohol or drugs
even after you knew it was causing or adding to medical, psychological or
emotional problems you were having?)
DSM (GAIN) Symptoms of Abuse
(1+ symptoms)
h. Role Failure (you kept using alcohol or drugs even though
you knew it was keeping you from meeting your
responsibilities at work, school, or home?)
j. Hazardous Use (you used alcohol or drugs where it made
the situation unsafe or dangerous for you, such as when
you were driving a car, using a machine, or where you
might have been forced into sex or hurt?)
k. Legal problems (your alcohol or drug use caused you to
have repeated problems with the law?)
m. Continued Use after Legal/Social Problems (you kept
using alcohol or drugs even after you knew it could get
you into fights or other kinds of legal trouble?)
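The thresholds on these two slides (3+ of the 7 dependence symptoms, 1+ of the 4 abuse symptoms) can be turned into a simple grouping rule. The sketch below is a simplification for illustration only: it ignores time frames and clinical judgment, treats abuse as applying only when dependence is absent, and labels people with 1-2 dependence symptoms and no abuse symptoms as diagnostic orphans. The function name and category labels are hypothetical.

```python
def classify_sud(dependence_count, abuse_count, tolerance=False, withdrawal=False):
    """Group a respondent by symptom counts (simplified, illustrative rule)."""
    if not 0 <= dependence_count <= 7 or not 0 <= abuse_count <= 4:
        raise ValueError("expected 0-7 dependence and 0-4 abuse symptoms")
    if dependence_count >= 3:
        # Tolerance or withdrawal marks the physiological subtype.
        if tolerance or withdrawal:
            return "physiological dependence"
        return "non-physiological dependence"
    if abuse_count >= 1:
        return "abuse"
    if dependence_count >= 1:
        return "diagnostic orphan"
    return "no diagnosis"


print(classify_sud(4, 1, withdrawal=True))
print(classify_sud(2, 0))
print(classify_sud(1, 2))
```

The order of the checks matters: dependence is tested first, so a respondent meeting both thresholds is grouped under dependence.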
On-Going Debates About SUD Concept
• Formal assumption that symptoms of “physiological dependence” (either tolerance or withdrawal) are markers of high severity
• Debate about whether “abuse” symptoms should be dropped, thought of as early dependence, or thought of as moderate/high severity markers that warrant treatment even in the absence of a full syndrome
• Debate about whether to treat diagnostic orphans (1-2 symptoms of dependence) as abuse or continue to ignore them
• Concern about whether the current symptoms (which were based primarily on adult data) are appropriate for use with adolescents
• Concern about the sensitivity to change
Conrad et al 2007
Data Source and Methods

Data from 2474 Adolescents, 344 Young Adults and 661
Adults interviewed between 1998 and 2005 with the
Global Appraisal of Individual Needs (GAIN; Dennis et al
2003)

Participants recruited at intake to Early Intervention,
Outpatient, Intensive Outpatient, Short, Moderate & Long
term Residential, Corrections Based and Post Residential
Outpatient Continuing Care as part of 72 local evaluations
around the U.S. and pooled into a common data set

Analysis here focuses on the GAIN Substance Use
Disorder Scale (SUDS) with symptoms of dependence and
abuse overall and by substance. The rating scale is 3=past
month, 2=past 2-12 months, 1=more than a year ago and
0=never.

Analyses done with a combination of Winsteps and Facets
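The recency rating described above (3=past month, 2=past 2-12 months, 1=more than a year ago, 0=never) supports more than one scoring. The sketch below derives a raw past-year symptom count and a recency-weighted count that scores past month as 2 and past 2-12 months as 1. The function names and the example ratings are hypothetical, and the real analyses were done in Winsteps and Facets rather than with code like this.

```python
# Recency weights: past month = 2, past 2-12 months = 1, anything else = 0.
RECENCY_WEIGHT = {3: 2, 2: 1, 1: 0, 0: 0}


def past_year_count(ratings):
    """Number of symptoms reported within the past year (rating 2 or 3)."""
    return sum(1 for rating in ratings if rating >= 2)


def weighted_count(ratings):
    """Symptom count with more recent symptoms weighted more heavily."""
    return sum(RECENCY_WEIGHT[rating] for rating in ratings)


# Hypothetical ratings for the 11 dependence and abuse symptoms.
ratings = [3, 3, 2, 1, 0, 0, 2, 0, 0, 0, 0]
print(past_year_count(ratings), weighted_count(ratings))
```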
The GAIN’s
Substance Problem Scale (SPS)

DSM-IV Clinical Diagnosis categories and course
specifiers (Kappa of .5 to .7)

Epidemiological Lifetime, Past Year and/or Past
Month Diagnosis categories (Kappa of .5 to .7)

Dimensional Symptom counts for lifetime, past
year and/or past month with internal consistencies
of .8 to .9 (test retest of .7 to .9)
Sample Characteristics
                        Adolescents:   Young Adult:   Adults:
                        <18            18-25          26+
                        (n=2474)       (n=344)        (n=661)
Male                    74%            58%            47%
Caucasian               48%            54%            29%
African American        18%            27%            63%
Hispanic                12%            7%             2%
Average Age             15.6           20.2           37.3
Substance Disorder      85%            82%            90%
Internal Disorder       53%            62%            67%
External Disorder       63%            45%            37%
Crime/Violence          64%            51%            34%
Residential Tx          31%            56%            74%
Current CJ/JJ invol.    69%            74%            45%
Note: all significant, p < .01
Item Relationships Across Substances
[Figure: symptoms plotted by Rasch Severity Measure (axis from -0.60 to 0.80)]
Withdrawal (+0.34)
Desp.PH/MH (+0.10)
Despite Legal (+0.10)
Give up act. (+0.05)
Can't stop (+0.05)
Tolerance (0.00)
Average Item Severity (0.00)
Hazardous (-0.03)
Loss of Control (-0.10)
Role Failure (-0.12)
Fights/troub. (0.17)
Time Cons. (-0.21)
1st dimension explains 75% of variance (2nd explains 1.2%)
Physiological Sx: While Withdrawal is High severity, Tolerance is only Moderate
Dependence Sx: Other dependence Symptoms spread over continuum
Abuse Sx: Abuse Symptoms are also spread over continuum
Symptom Severity Varied by Drug
[Figure: Rasch Severity Measure (-0.60 to 0.80) for each symptom, with separate points for ALC, AMP, CAN, COC and OPI]
Legend: AMP (+0.89), OPI (+0.44), AVG (0.00), COC (-0.22), ALC (-0.44), CAN (-0.67)
Withdrawal much less likely for CAN
Easier to endorse time consuming for CAN
Easier to endorse fighting/trouble for ALC/CAN
Easier to endorse hazardous use for ALC/CAN
Easier to endorse moderate Sx for COC/OPI
Easier to endorse despite legal problem for ALC/CAN
Easier to endorse Withdrawal for AMP/OPI
Symptom Severity Varied Even More By Age
[Figure: Rasch Severity Measure for each symptom and for average symptom severity, with separate points by Age (<18, 18-25, 26+)]
Adults more likely to endorse most symptoms
Continued use in spite of legal problems more likely among Adol/YA
More likely to lead to fights among Adol/YA
Hazardous use more likely among Adol/YA
Comparing Substances
Amp 0.88
Opi 0.43
Coc -0.21
Alc -0.44
Can -0.66
Rasch Severity by Past Month Status
[Figure: Rasch Severity Measure (-3.50 to 2.00) by group: None; Diagnostic Orphan (including in early remission); Lifetime SUD (in early remission, in CE 45+ days); Abuse Only; Dependence Only; Both Abuse and Dependence]
Diagnostic Orphans (1-2 dependence symptoms) are lower, but still overlap with other clinical groups
Severity by Past Year Symptom Count
[Figure: Rasch Severity Measure (-4.00 to 2.00) by past year symptom count (0 to 11)]
1. Better Gradation
2. Still a lot of overlap in range
Severity by Number of Past Year SUD Diagnoses
[Figure: Rasch Severity Measure (-4.00 to 2.00) by number of past year SUD diagnoses (0 to 5)]
1. Better Gradation
2. Less overlap in range
Severity by Weighted (past month=2, past year=1) Number of Substance x SUD Symptoms
[Figure: Rasch Severity Measure (-4.00 to 2.00) by weighted count group: 0, 1-4, 5-8, 9-12, 13-16, 17-20, 21-24, 25-30, 31-40, 41+]
1. Better Gradation
2. Much less overlap in range
Average Severity by Age
[Figure: Rasch severity (-4.00 to 2.00) by age group: Adolescent (<18), Young Adult (18-25), Adult (26+)]
1. Average goes up with age
2. Complete overlap in range
3. Narrowing of distribution on higher severity at older ages
Construct Validity (i.e., does it matter?)
[Table: four SUD severity measures (rows) against five validation measures named on the slide: Past Week Withdrawal, Frequency Of Use, Emotional Problems, Recovery Environment, Social Risk]
DSM diagnosis \a                0.47   0.40   0.32   0.30   0.30
Symptom Count Continuous \b     0.48   0.43   0.39   0.32   0.31
Weighted Symptom Rasch \c       0.57   0.46   0.39   0.39   0.32
Weighted Drug x Symptom \c,d    0.26   0.27   0.19   0.29   0.09
\a Categorized as past year physiological dependence, non-physiological dependence, abuse, other
\b Raw past year symptom count (0-11)
\c Symptoms weighted by recency (2=past month, 1=2-12 months ago, 0=other)
\d Symptoms by drug (alcohol, amphetamine, cannabis, cocaine, opioids)
Past year symptom count did better than DSM
Rasch does a little better still
Weighted symptom by drug count severity did WORSE
Implications for SUD Concept

“Tolerance” is not a good marker of high severity; withdrawal (and substance induced health problems) are
“Abuse” symptoms are consistent with the overall syndrome and represent moderate severity or “other reasons to treat in the absence of the full blown syndrome”
Diagnostic orphans are lower severity, but relevant
Pattern of symptoms varies by substance and age, but all symptoms are relevant
“Adolescents” experienced the same range of symptoms, though they (and young adults) were particularly more likely to be involved with the law, use in hazardous situations, and to get into fights at lower severity
Symptom Counts appear to be more useful than the current DSM approach to categorizing severity
While weighting by recency & drug delineated severity, it did not improve construct validity