Measurement issues
Jean Bourbeau, MD
Respiratory Epidemiology and Clinical Research Unit
McGill University
Clinical Epidemiology (679)
June 8, 2005
Objectives

Define categorical and continuous variables

Define 2 sources of variation: biological and measurement error
(random and bias)

Describe classification measures and their focus: functional,
descriptive and methodological

Define and discuss the advantages and disadvantages of objective
and subjective health measures

Define the psychometric properties of measurement instruments:
reliability, validity, responsiveness

Discuss key questions and concerns about each of the
psychometric properties of an instrument: reliability, validity and
responsiveness

Define and discuss minimal clinically important difference
Reading
Fletcher, Chapter 2
Outline
of Measurement issues
1. Measurements
2. Source of variation
3. Classification
4. Health measurements
5. Measurement properties
Measurement
We need to assign numbers to
certain clinical phenomena to make
them manageable and “scientific”
Measurement
Measure:
•A scale or test is an instrument to
measure clinical phenomena; a score is
a value on the scale in a given patient
If implemented in an objective and
standardized manner, these measures
can be used for program evaluation and
research
Measurement
The attributes or events that are measured in
a research study are called « variables »
Variables are measured according to 2 types:
•Categorical
•Continuous
Categorical variables
•Also called discrete variables
•Dichotomous
or Polychotomous (multilevel):
- Nominal
- Ordinal
Dichotomous categorical
variables
Examples:
•Vital status (alive vs dead)
•Yes or no (response to a question)
•Sex (male vs female)
Polychotomous categorical
variables
Nominal:
•Named categories that bear no
ordered relationship to one another
Example:
•Hair colour, race, or country of origin
Nominal scale
Hierarchy of mathematical adequacy:
• Lowest level (not a measurement but a
classification)
• Uses numbers as labels (such as for male or
female)
• No inference can be drawn from the relative
size of the numbers used
Polychotomous categorical
variables
Ordinal:
•Named categories that bear an ordered
relationship to one another
•The intervals need not be equal
Example:
•Ordinal pain scale that includes « pain
severity »: none, mild, moderate, and severe
•Deep tendon reflex: absent, 1+, 2+, 3+, or 4+
Ordinal scale
Hierarchy of mathematical adequacy:
• Numbers are again used as labels for
response categories
• Numbers reflect the increasing order of the
characteristic being measured (mild,
moderate, severe)
• The actual values of the numbers, and the
numerical distances between them, hold no intrinsic meaning
Continuous variables
•Also called dimensional, quantitative or
interval variables
•Expressed as integers, fractions, or decimals
in which equal distances exist between
successive intervals
•Examples: age, blood pressure, blood sugar
Interval scale
Hierarchy of mathematical adequacy:
• Numbers are assigned to the response
categories in such a way that a unit change
represents a constant change across the range
of the scale (temperature in degrees Celsius)
Ratio scale
Hierarchy of mathematical adequacy:
• Possible to state how many times greater
one score is than another
• This improves on the interval scale by
including a true zero point
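A small worked example may help separate the interval scale from the ratio scale. The Python sketch below uses temperature: Celsius is the interval-scale example given above, while Kelvin is not mentioned in the lecture and is used here only as an illustration of a scale with a true zero. All variable names are hypothetical.

```python
# Interval vs ratio scale, illustrated with temperature.
# Kelvin is an assumption added for illustration; the lecture only names Celsius.

def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (absolute zero is -273.15 degrees C)."""
    return celsius + 273.15

morning_c, afternoon_c = 10.0, 20.0

# Differences are meaningful on both scales: a 10-degree change is a 10-kelvin change.
diff_celsius = afternoon_c - morning_c
diff_kelvin = celsius_to_kelvin(afternoon_c) - celsius_to_kelvin(morning_c)

# Ratios are only meaningful when zero means "none of the quantity".
ratio_celsius = afternoon_c / morning_c  # arithmetic gives 2.0, but 20 C is not "twice as hot"
ratio_kelvin = celsius_to_kelvin(afternoon_c) / celsius_to_kelvin(morning_c)  # about 1.035

print(diff_celsius, round(diff_kelvin, 2), ratio_celsius, round(ratio_kelvin, 3))
```

The equal differences show why Celsius qualifies as an interval scale; the mismatch between the two ratios shows why it does not qualify as a ratio scale.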
Sources of variation
2 sources of variation:
•Biological variation
•Measurement error
Biological variation
Sources:
•Dynamic nature of most biologic
entities (differences in age, sex, race, or
disease status)
•Temporal variation
(sometimes predictable, such as the
circadian cycle of plasma cortisol)
Measurement error
2 different types:
•Random (chance error)
•Bias (systematic error)
Measurement error
Can arise from:
•The method (measuring instrument)
•Observer (the measurer)
Measurement error
Variability can arise between methods of
making the measurement or between observers
When the same method or observer repeats the
measurement:
• Intramethod or intraobserver variability
Between two or more methods or observers:
• Intermethod or interobserver variability
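Interobserver variability on a categorical scale is often summarized with a chance-corrected agreement statistic. The Python sketch below computes Cohen's kappa, which the lecture does not name; it is offered here as one common choice, and the ratings are invented for illustration.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two observers rating the same subjects.

    Assumes both lists have the same length and that agreement expected by
    chance is below 1 (otherwise the denominator would be zero).
    """
    n = len(ratings_a)
    # Proportion of subjects on which the two observers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each observer's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pain-severity ratings by two observers on the same 8 patients.
observer_1 = ["none", "mild", "mild", "moderate", "severe", "mild", "none", "severe"]
observer_2 = ["none", "mild", "moderate", "moderate", "severe", "mild", "mild", "severe"]

print(round(cohens_kappa(observer_1, observer_2), 2))
```

The same function applied to one observer's ratings on two occasions would give an intraobserver figure. Note that kappa treats the categories as nominal; for ordinal scales a weighted variant is often preferred.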
Consequences of erroneous
measurement
Individual
•Makes no difference whether the error is systematic
or random
Group
•Variability in the absence of bias should not change
the average group value
•However, it can have deleterious consequences
when one is seeking associations or correlations
between 2 measures (analytic bias)
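The group-level consequences can be sketched with a short simulation. The Python code below is a minimal illustration under stated assumptions: the "true" values, the size of the random error, and the +5 bias are all invented, and `pearson` is a helper written for this example.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical "true" values of two related measures in a group.
true_x = [rng.gauss(120, 15) for _ in range(n)]
true_y = [0.5 * x + rng.gauss(0, 5) for x in true_x]

# Random error: noise centred on zero. Bias: a constant offset of +5.
noisy_x = [x + rng.gauss(0, 15) for x in true_x]
biased_x = [x + 5 for x in true_x]

mean_true = statistics.fmean(true_x)
mean_noisy = statistics.fmean(noisy_x)
mean_biased = statistics.fmean(biased_x)

r_true = pearson(true_x, true_y)
r_noisy = pearson(noisy_x, true_y)
r_biased = pearson(biased_x, true_y)

print("means:", round(mean_true, 1), round(mean_noisy, 1), round(mean_biased, 1))
print("correlations:", round(r_true, 2), round(r_noisy, 2), round(r_biased, 2))
```

In this setup the random error leaves the group mean close to the true mean but weakens the correlation with the second measure, while the constant bias shifts the mean and leaves the correlation unchanged.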
Regression toward the mean
•Individual measurement is subject to both
biologic variation and measurement error
•An extremely high or low value obtained in
an individual from a group is more likely to be
an error than is an intermediate value
•Tendency toward a less extreme value is
greater than the tendency for an intermediate
value to become more extreme
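Regression toward the mean can also be shown by simulation. The Python sketch below assumes each observed value is a stable true value plus independent measurement error; the means, standard deviations, and the "top 10%" selection rule are hypothetical choices made for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 5000

# Each individual has a stable true value; every measurement adds fresh error.
true_values = [rng.gauss(100, 10) for _ in range(n)]
first = [t + rng.gauss(0, 10) for t in true_values]
second = [t + rng.gauss(0, 10) for t in true_values]

# Select individuals whose FIRST measurement falls in the top 10%.
cutoff = sorted(first)[int(0.9 * n)]
extreme = [i for i in range(n) if first[i] >= cutoff]

overall_mean = statistics.fmean(first)
mean_first_extreme = statistics.fmean([first[i] for i in extreme])
mean_second_extreme = statistics.fmean([second[i] for i in extreme])

print(round(overall_mean, 1), round(mean_first_extreme, 1), round(mean_second_extreme, 1))
```

On remeasurement the selected group remains above the overall mean but is less extreme than it first appeared, even though nothing about the individuals has changed.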
Classification measures
Functional focus on:
• Purpose of application of the measures
Descriptive focus on:
• Their scope
Methodological focus on:
• Technical aspects
Functional classification
•Measures have discriminative,
evaluative or predictive properties
•Choice of measure depends on the
purpose(s) for which it will be used
Functional classification
Discriminative instrument:
Can discriminate between people with
different levels of a particular attribute or
disease
• For example:
• NYHA scale
• MRC dyspnea scale
MRC Dyspnea Scale (grades run from none to severe)
•Grade 1: Breathless with strenuous exercise
•Grade 2: Short of breath when hurrying on the
level or walking up a slight hill
•Grade 3: Walks slower than people of the same
age on the level or stops for breath while
walking at own pace on the level
•Grade 4: Stops for breath after walking 100 yards
•Grade 5: Too breathless to leave the house or
breathless when dressing
Functional classification
Predictive instrument:
•Can predict a clinical diagnosis
(diagnostic test) or the likelihood of a
future event (prognostic test)
5-year survival in COPD
[Figure: 5-year survival shown in two panels, "FEV1" (according to staging as defined by the ATS Guidelines, % predicted FEV1) and "Dyspnea MRC scale" (according to the level of dyspnea as evaluated by the MRC Dyspnea Scale)]
Nishimura K, et al. Chest 2002; 121: 1434-1440.
Functional classification
Evaluative instrument:
Can measure change over time in the
same person
•For example:
• Dyspnea subscale of the Chronic
Respiratory Questionnaire (CRQ), a
COPD-specific quality of life questionnaire
Descriptive classification
•Large number of possible
categories
•Can categorize by:
• Domain (dyspnea, fatigue, emotion)
• Generic or specific
COPD Questionnaires
General:
• Used in any population
• Allow cross-condition comparison
• Cover co-morbid conditions and effects of treatment
• Do not focus on HRQL in COPD
• Contain irrelevant items
• Insensitive to small changes
Disease-Specific:
• Focus on relevant aspects of HRQL
• Greater sensitivity for disease changes
• Increased responsiveness
• No cross-condition comparisons
Methodological classification
•Large number of possible
categories
•Can categorize by:
• Interviewer versus self-administered
• Objective versus subjective
Health measurements
Measurements may be based on:
• Laboratory or diagnostic tests (objective)
• Indicators in which the patient or the clinician
makes a judgement (subjective)
Health measurements
Unfortunately, “subjective” is also used in
other ways:
• To indicate whether or not the variable is observable
Examples:
• Objective indicator such as « The ability to climb
stairs »
• Subjective indicators such as « pain or feelings »
Objective vs Subjective
Objective:
• More often continuous (lab data)
• Few categorical (vital status, sex and race)
Subjective:
• Greater potential for bias or variability on the part of
the observer
• Many variables that are most important in caring for
patients are « soft » and subjective
• For example: pain, mood, dyspnea, ability to work
The example of CABG
Why is quality of life important in studies
of CABG patients?
• Survival with surgery > medical treatment for
patients with left main and triple-vessel disease
• Survival similar in patients with less severe
disease
CASS NEJM 1984; European cooperative study Lancet 1982.
As Feinstein has emphasized
The tendency of clinical investigators to
focus on “objective” rather than
“subjective” measurements can result
in research that is both dehumanizing
and irrelevant
Subjective vs Objective
measurement
Objective vs Subjective
Data traditionally considered objective
(“hard”) can be seen to have feet of
softer clay
Example:
•X-ray and cytopathologic diagnoses have
been shown to be subject to
considerable intra- and interobserver
variability
Subjective health
measurements
May be grouped into 3 main categories:
• General feelings of well-being
• Symptoms of illness
• Adequacy of a person’s functioning
Subjective health
measurements
Advantages:
• Amplify the data obtainable from morbidity and
mortality statistics
• Give insights into matters of human concern such
as pain, suffering, or depression
• Offer a systematic way to record the « voice of
the patient »
• Do not require expensive or invasive procedures
Subjective health
measurements
Disadvantages:
• Contrast sharply with the inherent reliability
of, for example, mortality rates
• Seem more susceptible to bias
• Applying these measures to an entire
population is more difficult or impossible
Subjective health
measurements
The use of rating methods suitable for
statistical analysis permits subjective
health measurements to rival the
quantitative strengths of the traditional
“objective” indicators
Health measurements
Scientific basis:
• Subjective judgements as a valid approach
to measurement derive from the field of
psychophysics;
• Psychophysical principles were later
incorporated into psychometrics from which
most of the techniques used to develop
subjective measurements of health have
been derived
Psychometric properties
Definition:
• Psychometrics is the science of using
standardized tests or scales to measure
attributes of a person or object
Numerical estimates of health
Many scaling methods exist for:
• Translating « indicators » into numerical
estimates of severity
• Once this is done, the estimates may be combined into
an overall score, termed a « health index »
Psychometric properties
Criteria for a scoring system:
•Reliability
•Validity
•Responsiveness
•Minimal clinically important difference
(MCID)
Reliability
Definition:
•The extent to which the same results
are obtained when the measurement is
repeated
Poor reliability may reflect either true (temporal)
variation or random measurement error
Reliability
Key Questions:
•Internal consistency
•Test-retest reliability (reproducibility)
Key Concern:
•Error
(error attenuates relationships and makes it
more difficult to detect treatment effects)
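Internal consistency is commonly summarized with Cronbach's alpha. The lecture does not name a statistic, so the Python sketch below is an assumption about how this key question might be checked in practice; the item scores are invented.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    `items` is a list with one inner list per item; each inner list holds
    that item's scores for the same respondents, in the same order.
    """
    k = len(items)
    item_variances = [statistics.variance(item) for item in items]
    # Total score per respondent, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical 3-item dyspnea scale answered by 6 respondents (scores 1 to 7).
item_scores = [
    [2, 4, 5, 3, 6, 7],
    [3, 4, 6, 3, 5, 7],
    [2, 5, 5, 4, 6, 6],
]

print(round(cronbach_alpha(item_scores), 2))
```

Alpha approaches 1 when items move together across respondents and falls toward 0 when they are unrelated. Test-retest reliability would instead compare scores from the same respondents on two occasions.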
Validity
Definition:
•The extent to which the measurement
corresponds to the « true » value
(some accepted « gold standard ») or
behaves as expected
Validity depends on minimizing
measurement error caused by bias
Types of measurement validity
Content validity
Construct validity (convergent, discriminant)
Criterion validity (predictive, concurrent)
Cross-cultural validity
“Situational” validity
Content validity
Definition:
•The extent to which the items sampled for
inclusion in the instrument adequately
represent the domain of content
(particular domain area) addressed by the
instrument
Content validity
Key Questions:
• Theoretical foundation of the instrument
• Instrument development: primary sources
of information, origin of the items,
and selection of the scaling structure
• Rules applied for content validation:
patient and/or clinician validation;
scientific review
• Instrument is appropriate for the study
under consideration
Content validity
Key concern:
•Without validity, an instrument has no
meaning
Construct validity
Definition:
•The extent to which the instrument measures
an abstract concept (construct) or attribute;
evaluated by comparison with instruments
measuring related constructs
•Convergent (scores agree with those of
instruments measuring the same concept) or
discriminant (scores are distinguishable from those of
instruments measuring related but different concepts)
Criterion validity
Definition:
•Extent to which the instrument relates to an
external criterion (criterion of practical value)
•Concurrent (able to correlate to a present
criterion) or predictive (able to correlate to a
future criterion)
Construct validity
It is important to understand that a
direct test of the validity of an abstract
concept such as impaired health due to
disease is not possible
Construct validity
Key Questions:
• Factor structure of the measure consistent
with expectations
• Scores from the instrument correlate with
those of other instruments (measuring the
same or related constructs)
• Score from the instrument independent of
scores from instruments measuring
dissimilar constructs
• Differentiates groups known to differ on the
attribute being measured, e.g., HRQL
Testing construct validity
•The most widely used method is the
multitrait-multimethod matrix
•It involves testing a series of
hypotheses concerning relationships
between the new instrument and a
range of reference measures of
disease activity
Construct validity
Key concern:
•Without validity, an instrument has no
meaning
Cross-cultural validity
Definition:
•The extent to which an instrument
developed and tested in one cultural
group is appropriate for, and behaves
similarly in, another
Cross-cultural validity
Key Questions:
•Items appropriate for the culture under
consideration
•Instrument translated culturally and
linguistically
•Evidence of reliability and validity
“Situational” validity
Definition:
•The extent to which an instrument is
appropriate for use in any given
situation
“Situational” validity
Key Questions:
•Instrument measures an appropriate
outcome for the trial
•Instrument valid for the specific purpose of
the trial
•Sufficiently reliable and responsive for this
purpose
•Sample size sufficient to detect the change
“Situational” validity
Key Issues:
•Validity can be situation specific; an
instrument valid for one situation is not
necessarily valid for another
•Failure to detect treatment effects may be a
function of study design, rather than a
limitation of the instrument
Responsiveness
Definition:
The extent to which scores change with a
given change in the condition or disease state
Key Questions:
• Instrument has been evaluated for responsiveness
• Effect sizes have been associated with the
instrument in well-designed trials.
Key concerns:
• The ability to track changes
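Two indices often reported for responsiveness are the effect size (mean change divided by the baseline standard deviation) and the standardized response mean (mean change divided by the standard deviation of the changes). The lecture mentions effect sizes without defining them, so the definitions and data in this Python sketch are assumptions for illustration.

```python
import statistics

def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return statistics.fmean(changes) / statistics.stdev(baseline)

def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return statistics.fmean(changes) / statistics.stdev(changes)

# Hypothetical scores for 6 patients before and after an intervention.
before_scores = [3.0, 3.5, 4.0, 2.5, 3.0, 4.5]
after_scores = [4.0, 4.0, 5.0, 3.5, 3.5, 5.0]

print(round(effect_size(before_scores, after_scores), 2))
print(round(standardized_response_mean(before_scores, after_scores), 2))
```

Both indices express change in standard deviation units, which allows instruments with different score ranges to be compared in the same patients.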
MCID
Definition:
The smallest difference that clinicians and
patients would care about
Key Questions:
• MCID has been established
• Method used to establish this score
Key concerns:
• The ability to detect true treatment effects
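Once an MCID has been established for an instrument, observed changes can be read against it. The Python sketch below is a minimal illustration: the MCID value of 0.5 and the change scores are hypothetical placeholders, not values taken from the lecture.

```python
import statistics

def proportion_responders(changes, mcid):
    """Share of patients whose improvement is at least the MCID."""
    return sum(change >= mcid for change in changes) / len(changes)

# Hypothetical per-patient changes on a questionnaire, and a hypothetical MCID.
score_changes = [0.2, 0.6, 1.0, -0.1, 0.5, 0.8, 0.3, 0.7]
assumed_mcid = 0.5

mean_change = statistics.fmean(score_changes)
responders = proportion_responders(score_changes, assumed_mcid)

print("mean change meets MCID:", mean_change >= assumed_mcid)
print("proportion of responders:", responders)
```

Reporting both the mean change and the proportion of responders guards against a mean that meets the MCID only because a few patients improved a great deal.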
Benefits of Pulmonary Rehabilitation
[Figure: two panels, "Functional exercise capacity" (6-MWD, N=444) and "Health status" (CRQ dyspnea, N=519)]
Lacasse Y, et al. Cochrane Database Syst Rev 2002; 3:CD003793.
Key messages
Some simple criteria:
•The system must address a well-defined clinical phenomenon
•The scale has to have a clearly defined ranking in a
hierarchical order (reasonable clinical or mathematical
criteria)
•The different stages or categories have to be mutually
exclusive
•The scale has to be adapted to the area of measurement
where it will be applied
•Creating complex or assembled scores such as quality of
life requires one to address the issues concerning the inner
structure of a score
Key messages
Quote from McDowell and Newell:
•Ultimately the selection of a measurement contains an
element of art and perhaps even luck; it is often prudent to
apply more than one measurement whenever possible.
•This has the advantage of reinforcing the conclusions of the
study when the results from ostensibly similar methods are in
agreement, and it also serves to increase our general
understanding of the comparability of the measurements we
use.