Development and Validation of a health status measure
513-611A STUDY DESIGN AND ANALYSIS I
Sept 29, 2003
Development and Validation of a health
status measure
Susan Stock, MD MSc FRCPC
Institut national de santé publique du Québec
Direction de santé publique de Montreal-Centre
Dept of Epidemiology, Biostatistics and Occupational Health, McGill
Plan
Types of Health Status Measures
Steps in the development of a health status
measure
Steps in the development of the Neck and Upper
Limb Index
Steps in the validation of a health status measure
Steps in the validation of the Neck and Upper
Limb Index
Susan Stock : Developing & Validating Health Status Measures
Health status measure
A health outcome questionnaire that quantifies symptoms,
function, feelings and/or behaviour directly from the
respondent to measure overall health status (generic
instrument) or disorder-specific health status
Vary in scope
Activities of daily living ("ADL", e.g. self care,
mobility)
Functional status – measure capacity or performance of
physical functioning, e.g. household tasks, work,
recreational activities
Health-related "quality of life" instruments - measure
not only physical functioning but also psychological,
social and role functioning
Health status measures
allow patient/subject to identify impact of a disorder or
health problem on his/her life across many dimensions
based on his/her experience rather than the interpretation
of a health care professional
Useful in a wide range of studies and clinical contexts:
In studies of aetiology, prevalence and prognostic
factors they can be incorporated into case definitions
that distinguish according to severity
In intervention studies and health services research
they can be used as the primary outcome to demonstrate
change over time in health status
Development of Health Status
Measures: references
Streiner DL, Norman GR. Health measurement scales. A
practical guide to their development and use. Second edition.
New York: Oxford University Press, 1995: 28-53
Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in clinical trials. CMAJ 1986; 134: 889-895
Guyatt GH, Jaeschke R, Feeny DH, Patrick DL. Measurement in
clinical trials: Choosing the right approach.
Juniper EF, Guyatt GH, Jaeschke R. How to develop and
validate a new health-related quality of life instrument.
In Spilker B (ed), Quality of Life and Pharmacoeconomics
in Clinical Trials, Second edition. Lippincott-Raven
Publishers, Philadelphia, 1996
Neck and Upper Limb Index (NULI)
Health-related quality of life instrument:
specific to neck and upper extremity musculoskeletal
disorders
capable of measuring changes within subjects over time in
intervention studies
capable of distinguishing between subjects (i.e., assess
severity) in prognostic, prevalence or etiologic studies
applicable to both French and English speaking
populations in Canada, and
practical and easy to use in clinical settings
Neck and Upper Limb Index (NULI)
In order to develop an instrument that was equally
appropriate to the two major cultural and linguistic
groups in Canada
Conducted two separate studies with similar
protocols for item reduction and selection and
subsequent validation
one in an Ontario English-speaking population
the other in a Quebec French-speaking
population
Steps in development of a health
status measure
Search for appropriate existing measure!
If none available:
Identify domains of interest
Generate potential items
Refine items and pre-test
Choose appropriate response scale(s) for the items
Carry out item reduction and item selection
strategies
Steps in development of the NULI
Identification of domains of interest
Generation of potential items
Item refinement and pre-testing
English item reduction and selection study
French translation of potential items
French item reduction and selection study
Comparison of English and French results
Selection of 20 items appropriate for both
populations
Reliability and validity testing of the final 20-item
instrument in both English and French populations
Domain
a dimension of life potentially affected by the
disorder or health problem in question
e.g. self care, household responsibilities, work, social
life, sexual life, mood, self esteem, transportation,
recreation, sleep, financial impact of disorder,
iatrogenic effect of evaluation or treatment
Identifying domains & generating
items
Strategies for identifying the most appropriate
domains of interest and for generating potential
items are aimed at optimizing content validity
the extent to which the measurement incorporates all
the relevant content or domains of the phenomenon
under study
NULI:
Identifying domains & generating items
review of relevant literature (rheumatology, rehabilitation,
orthopaedics, back pain)
review of existing health status instruments identified by
bibliographic search and contact with experts
clinical experience of investigators
survey of 30 clinicians
interviews with 33 worker-patients who presented with
neck and/or upper limb disorders in five clinical
occupational health settings
Evaluating content validity of existing
instruments
Identify relevant domains for the concept of interest and
evaluate whether instruments measure these domains
adequately
Identify number or proportion of items in each instrument
that are not relevant to the concept you wish to measure
Ref: Stock SR, Cole DC, Tugwell P. Review of applicability of
existing functional status measures to the study of workers with
musculoskeletal disorders of the neck and upper limb. Am J Indust
Med 1996, 29, 679-688
An example of evaluation of
content validity
Distribution of items among domains for
selected musculoskeletal functional status
instruments
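Instrument | Work | Self care | Household & family | Social life | Recreation | Sleep | Mood | Sex life
PDI | 2 | 4 | 2 | - | 2 | - | - | -
DRI | 2 | 1 | 1 | - | 2 | - | - | -
NULI | 4 | 1 | 4 | - | 2 | 2 | 4 | -
DASH | 1 | 1 | 9 | 1 | 3 (+4) | 1 | - | 1
ASES | 1 | 4 | - | - | 1 | 1 | - | -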
NULI:
Identifying domains & generating items
80 questions in 8 domains identified through
investigator clinical experience, existing
instruments and literature
52 additional items and 2 domains generated by
clinician survey
48 additional items and 2 more domains identified
by patient interviews
Total of 12 domains identified
Item refinement
Redundant items eliminated
Pool of approximately 150 items with 7-30 items per
domain
Wording of items
Literacy editor to ensure Grade 6 language
“Applicability”: Screening question developed for activity-related items to evaluate whether the item was applicable
or relevant to the subject (work, household and family
responsibilities, transportation/driving, recreation, and
social activities; sexual life)
Vacuuming, shovelling snow
Sports activities
Item refinement:
Choice of response scale
Response scale: 7-point numbered scale with
verbal anchors
Maximize reliability: reliability of a scale rises
rapidly as the number of divisions increases to
seven and then rises more slowly until there are 11
points (Streiner and Norman 1995, Nunnally and Wilson 1975,
Nishisato and Torii 1985)
Response scale :
number of points on a scale
Loss of test re-test reliability:
7-10 categories: little reduction of reliability
5 categories reduces reliability by 12%
2 categories reduces reliability by 35%
Optimum number of points recommended: (5 to) 7
categories
(Reference: Streiner and Norman 1995, Chap 4)
Treating rating scales as interval data statistically
will result in less measurement error when there
are more items
Scaling: number of points on a scale
Potential sources of error when there are few
points on a scale:
Uncertainty, confusion of respondents
Reduction in reliability
Loss of efficiency of the instrument
More subjects needed to show an effect (S Suissa J
Clin Epidemiol 1991, 44: 241-8)
Lower correlation with other measures (Hunter &
Schmidt 1990, J Applied Psychol 75:334-49)
Pre-test
Pre-test in 10 clients with musculoskeletal
disorders of neck or upper extremity in a
vocational rehabilitation clinic
To identify questions that are unclear, ambiguous,
difficult to understand or inappropriate
Revise items following pre-test
Inter-rater reliability testing
inter-rater reliability study of revised potential items
English study conducted on 38 worker-patients with neck
and upper limb disorders in four clinical settings prior to
the item selection study; French inter-rater reliability study
was conducted with 16 worker-patients
2 raters interviewed each patient on the same day, at 2-4
hour intervals
Following the second interview, feedback was sought from
respondents to further identify any ambiguous items or
those difficult to understand
ICC (intraclass correlations) calculated for the mean of
items in each domain and for each individual item.
Items with low inter-rater reliability (ICC<0.7) identified
and source of difficulty reviewed with the interviewers.
Items were reformulated where indicated.
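The ICC step above can be sketched in code. This is a minimal illustration only: the slides do not say which form of ICC was used, so the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function name and the 0.7 cut-off helper are illustrative, not part of the study.

```python
from statistics import mean

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per subject; each row holds one score per rater.
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares of a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

def needs_review(ratings, threshold=0.7):
    """True when an item's inter-rater ICC falls below the threshold."""
    return icc_2_1(ratings) < threshold
```

With two raters per patient, each row would hold the two scores given to one item (or to one domain mean) for one patient.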
Interviewer training
3-5-day training sessions for interviewers
to be familiar with content of questions, use of
scales
to teach appropriate standardised technique
interviewers trained to probe in a non-directive,
non-biasing fashion, and be interpersonally neutral
feedback on tape-recorded interviews
role-playing of interviews with potentially difficult
subjects
Interviewer training
To reduce bias and random error and ensure strict adherence to
research protocol
Inform re purpose of study, type of data to be gathered, how results
will be used
Familiarize with questionnaire, understand every item
How to handle first meeting with respondent, techniques for building
rapport
How to answer questions commonly asked by respondents
Confidentiality procedures
When and how to probe
How to ask questions
How to record responses
Checking the questionnaire
How to end interviews
How to deal with special situations (angry, tearful, or verbose
respondents)
Item reduction studies
Study procedure:
Pre and post-treatment administration of 170 potential
items and validating measures to 119 English-speaking
Ontario workers and to 93 French-speaking Quebec
workers with neck or upper limb disorders recruited from
occupational and physiotherapy clinics
7-30 specific items in each of the 12 domains including a global
question about the overall impact of the disorder on that domain
An additional administration 3-7 days after the initial
administration for test re-test reliability
Subjects rank ordered the 12 domains according to the
relative importance of the impact of their musculoskeletal
disorder on these dimensions of their lives
NULI Item reduction
Objective of item reduction:
To identify and omit items that were irrelevant,
had poor test re-test reliability,
discriminated poorly or were unresponsive to
change
Criteria for item reduction
Applicability of activity related items
Eliminate items not applicable to at least 80% study
population
Eliminate items not applicable to at least 70% of men
and 70% of women
e.g. vacuuming applicable to 49% men 83% women
Shovelling snow not applicable to 82% women
Reproducibility
Eliminate items with test re-test Pearson correlation coefficient
≤ 0.5
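The applicability and reproducibility criteria above can be expressed as two simple checks. This is a hedged sketch: the function names and data layout are hypothetical, and the comparison against 0.5 assumes that the threshold is a lower bound on the test re-test correlation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def share(flags):
    """Proportion of True values in a list of booleans."""
    return sum(flags) / len(flags)

def passes_applicability(applicable, is_female,
                         overall_min=0.80, per_sex_min=0.70):
    """applicable, is_female: parallel lists of booleans, one per subject.

    Keeps an item only if it applies to at least 80% of all subjects
    and to at least 70% of men and 70% of women.
    """
    women = [a for a, f in zip(applicable, is_female) if f]
    men = [a for a, f in zip(applicable, is_female) if not f]
    return (share(applicable) >= overall_min
            and share(men) >= per_sex_min
            and share(women) >= per_sex_min)

def passes_reproducibility(test_scores, retest_scores, r_min=0.5):
    """Assumption: items are kept only when r exceeds r_min."""
    return pearson_r(test_scores, retest_scores) > r_min
```

For instance, an item applicable to 49% of men and 83% of women, like vacuuming in the example above, fails the per-sex check whatever its overall applicability.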
Criteria for item reduction
Internal consistency
Eliminate items with correlation ≤ 0.3 between item score and: (1)
mean of all items in the domain without that item; (2) the global
question score for the domain
Responsiveness to change
Eliminate items with correlation ≤ 0.3 between the residual
change score of the item (pre-treatment to post-treatment) and the residual
change score of the domain Global Score
Discriminative ability
Eliminate items with a skewness statistic greater than 2 times
the standard error of this statistic
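Two of these criteria lend themselves to a short sketch: the correlation between each item and the mean of the remaining items in its domain, and the skewness check. The function names are hypothetical. The skewness estimator (adjusted Fisher-Pearson) and its standard-error formula are assumptions, since the slides do not name the ones used.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_rest_correlations(domain_scores):
    """domain_scores: one row per subject, one column per item.

    Returns, for each item, its correlation with the mean of the
    other items in the domain (the domain mean without that item).
    """
    k = len(domain_scores[0])
    result = []
    for j in range(k):
        item = [row[j] for row in domain_scores]
        rest = [(sum(row) - row[j]) / (k - 1) for row in domain_scores]
        result.append(pearson_r(item, rest))
    return result

def skewness_and_se(values):
    """Adjusted Fisher-Pearson sample skewness and its standard error."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    skew = (m3 / m2 ** 1.5) * sqrt(n * (n - 1)) / (n - 2)
    se = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se

def too_skewed(values):
    """True when the skewness exceeds twice its standard error in size."""
    skew, se = skewness_and_se(values)
    return abs(skew) > 2 * se
```

An item would be a candidate for elimination when its item-rest correlation is at or below 0.3, or when `too_skewed` returns True for its score distribution.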
Measuring change
Problem with change scores: regression to the
mean (tendency of outlying scores to return to the
mean)
By chance, low pre-test scores will tend to be higher on post-test
and high pre-test scores will tend to be lower on post-test
Possible solution: residual change scores
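One common way to compute residual change scores is to regress post-test scores on pre-test scores and keep the residuals, so that the part of the post-test score predictable from the pre-test score is removed. The sketch below assumes ordinary least squares with a single predictor; the slides do not give the exact procedure, and the function name is illustrative.

```python
def residual_change_scores(pre, post):
    """Residuals of the least-squares regression of post on pre.

    A positive residual means the post-test score is higher than
    expected given the pre-test score; a negative one, lower.
    """
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    sxy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]
```

By construction the residuals are uncorrelated with the pre-test scores, which is why they are less affected by regression to the mean than raw change scores.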
Selection of final domains
Selection of domains: based on the relative impact and importance that study
subjects attributed to each domain
mean score of the global question for each domain and
domain rankings
calculated for each study population as well as by gender
committee of co-investigators reviewed these data and, through
consensus discussion, arrived at a choice of priority domains and
the number of items of each domain the final instrument should
include
Selection among remaining items
Comparison of Global Question Mean Scores for
Each Domain between Quebec and Ontario
Study Populations
Mean global question score (scale 1-7)
Domain | Ontario | Québec
Work | 5.4 | 5.3
Sleep | 4.9 | 5.2
Recreation | 4 | 4.3
Mood | 3.9 | 3.8
Housework | 3.7 | 4
Self esteem | 3.5 | 4
Self care | 3.3 | 3.4
Financial | 3 | 2
Driving | 2.9 | 3
Sex life | 2.9 | 4.2
Iatrogenic | 2.5 | 3.2
Social | 2.1 | 2.6
Comparison of Mean Ranking for Each Domain
between Quebec and Ontario Study Populations
Scale: 13 - mean rank (axis 1-12)
Domain | Ontario | Quebec
Work | 10.8 | 10.3
Household/family | 8.4 | 8.1
Sleep | 8.1 | 8.6
Mood | 6.9 | 6.3
Recreation | 6.5 | 7.2
Financial ($$$) | 6.5 | 4.2
Self care | 6.5 | 6.7
Driving | 5.9 | 6.3
Self esteem | 5.7 | 5.1
Iatrogenic | 4.8 | 5.9
Social | 4.7 | 4.8
Sex life | 3.7 | 4.5
Selection of remaining items
Selection of the most responsive and most discriminating
items that covered the priority domains
Number of items that would result in an instrument that
takes no more than 5-10 minutes to complete (version 1 =
35 items; version 2 = 20 items)
Among items with similar responsiveness and
discriminative ability, selection was based on the clinical
judgement of the co-investigator research committee
Translation into French
double reverse parallel translation method (Vallerand
1989)
translation into French of the English questionnaire by two
independent translators (versions A and B)
the two French versions (versions A and B) translated into English
by two different translators (versions C and D)
versions C and D compared to the original English version by a
committee comprised of three bilingual study researchers (two
francophones, one anglophone) and discrepancies resolved through
consensus to arrive at a revised French translation, version E
version E pre-tested on 16 francophone workers with neck or
upper extremity disorders to identify ambiguous or difficult to
understand items
results of the pre-test reviewed by the research translation
committee and a final French version of the questionnaire was
agreed upon (version F).
Criteria for acceptance of a French
formulation
meaning of the French version was as close as
possible to the English one
the most simple term would be selected (in order
to be understandable at a Grade 6 or lower
reading level)
French syntax would be respected
the terms most commonly used in current Quebec
French would be selected
Comparison of English and French
item reduction results
Compare demographic profile of the 2 populations
compare English and French subjects’ mean responses for the global
question of each domain by t-test for univariate analyses and multiple
regression analyses controlling for sex, age, income and duration of
symptoms
compare English and French subjects’ mean ranking scores for each
domain by Wilcoxon rank-sum test for univariate analyses and by
partial Spearman correlations between the mean ranking score of each
domain and the study group status (i.e., English or French study group)
controlling for sex, age, income and duration of symptoms
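The univariate rank comparison described above can be sketched as a Wilcoxon rank-sum test. This is an illustration under stated assumptions: it uses the normal approximation with a tie correction and no continuity correction, which may differ from what the statistical package used in the study did, and the function name is hypothetical. The covariate-adjusted analyses (multiple regression, partial Spearman correlations) are not covered.

```python
from math import sqrt, erfc

def rank_sum_test(a, b):
    """Wilcoxon rank-sum test, normal approximation with tie correction.

    a, b: domain ranking scores for the two study groups.
    Returns (z, two_sided_p); z is negative when group a tends to
    have lower ranks than group b.
    """
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2      # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    expected = n1 * (n + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_a - expected) / sqrt(variance)
    p = erfc(abs(z) / sqrt(2))          # two-sided p-value
    return z, p
```

Here `a` and `b` would hold, for one domain, the rank each English-speaking and each French-speaking subject gave to that domain.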
Comparison of Ontario and Quebec study
populations
Characteristic | Ontario (n=119) | Quebec (n=93)
Gender | 40.3% female; 59.7% male | 55.9% female; 44.1% male
Mean age (S.D.) | 39.7 yr (± 10.1) | 41.1 yr (± 10.0)
% cases with duration of injury > 6 months | 30.4 | 58.8
% cases off work | 72.9 | 57.0
% cases on WCB | 67.8 | 26.9
The Quebec study population was more likely to be female (p=.02), to have had
symptoms > 6 months (p=.001) and to still be at work (p=.02), and less likely to be on
WCB benefits (p=.0001)
Comparison of mean rank of each domain
between English and French study
subjects: univariate analyses
Domain | Wilcoxon rank-sum test (p)
Personal care | .798
Family and domestic responsibilities | .448
Work | .099
Transportation | .412
Mood | .168
Self esteem | .069
Sleep | .251
Sexual life | .018
Recreation | .084
Social life | .442
Financial impact | .000
Iatrogenic effects | .017
Correlation of study status (English or French)
to mean domain ranking controlling for age,
gender, income and duration of symptoms
Domain | Partial Spearman correlation coefficient | p
Personal care | .003 | .966
Family and domestic responsibilities | -.037 | .622
Work | -.092 | .216
Transportation | .076 | .313
Mood | -.116 | .122
Self esteem | -.135 | .075
Sleep | .077 | .305
Sexual life | .146 | .051
Recreation | .170 | .022
Social life | .061 | .415
Financial impact | -.293 | .0001
Iatrogenic effects | .180 | .015
Multiple regression for each domain to assess whether study
status (English or French) was a predictor of the mean score of
the domain global question when controlling for age, gender,
income and duration of symptoms
Domain | Standardised coefficient | Signif.
Personal care | -.016 | .829
Family and domestic responsibilities | -.004 | .950
Work | -.041 | .587
Transportation | -.032 | .676
Mood | -.080 | .290
Self esteem | -.020 | .791
Sleep | -.013 | .869
Sexual life | .309 (1) | .0004
Recreation | .102 | .193
Social life | .108 | .156
Financial impact | -.230 (2) | .002
Iatrogenic effects | .149 (1) | .049
(1) A positive coefficient indicates that French study subjects had significantly higher mean global scores than English subjects for that domain
(2) A negative coefficient indicates that English study subjects had significantly higher mean global scores than French subjects for that domain
Synthesis of English-French
comparisons
Sexual life:
Statistically significant differences in mean ranking and mean
domain global score but clinically insignificant difference in
ranking
Domain did not meet applicability criteria
Financial impact/iatrogenic effects:
Statistically significant differences in mean ranking and mean
domain global score, probably reflecting differences in proportion
of subjects off work and differences in clinical treatment program
Overall no major differences in mean domain rankings or
mean domain scores or in results of individual item
reduction
A single instrument could be developed for both
populations
Final instrument
20 items:
4 work
7 physical activities (self care, domestic
responsibilities, leisure)
6 psychosocial (mood, self esteem, social role function)
2 sleep
1 iatrogenic
Validation of a health status measure
Internal consistency
Reproducibility (test re-test reliability)
Validity
Content
Criterion or convergent
Construct
Predictive
Responsive to change
Measures of internal consistency
Cronbach alpha (0.0-1.0)
An estimate of the correlation between the total score across a
series of items from a rating scale and the total score that would
have been obtained had a comparable series of items been
employed
Inter-item correlations
Item-total correlations (total ± item)
Correlation of item to mean of items (mean ± item)
Split half reliability (items randomly divided and 2 sub-scales
correlated)
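As an illustration of the first measure above, Cronbach's alpha can be computed from the item variances and the variance of the total score. This is a minimal sketch using the standard formula; the function name and data layout are illustrative.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a set of items.

    scores: one row per subject, one column per item.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha approaches 1 when the items rise and fall together across subjects, and drops toward 0 when they vary independently.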
Reliability
Test re-test reliability: the stability exhibited when a
measurement is repeated under identical conditions
calculation of the intra-class correlation (ICC) for two
administrations of the index, 3-7 days apart, in 99
Ontario subjects and 33 Quebec subjects
Internal consistency: intercorrelation between items of a
scale meant to measure the same concept
Cronbach’s alpha calculated for 119 Ontario subjects
and 93 Quebec subjects present at the initial pre-treatment administration of the questionnaires
Ways of improving reproducibility
Increase the number of items in a test or measurement
scale
Increase the number of response choices for each item
Reduce inter-observer variation (training of interviewers,
standardised protocol)
Reduce ambiguity in questions
Validity
An expression of the degree to which a measurement
measures what it purports to measure (Last)
Is the scale measuring what it was intended to measure?
How do we demonstrate validity?
Subjective judgement by “experts”:
Face validity: the extent to which, on the face of it, the
measurement appears to be assessing the desired
qualities
Content validity: the extent to which the measurement
incorporates all the relevant content or domains of the
phenomenon under study
How do we demonstrate validity?
Criterion validity: extent to which the
measurement correlates with an external criterion
(preferably a "gold standard")
How do we demonstrate validity?
Convergent (concurrent) validity: correlation
between measurement of interest and another measurement
known to measure the same concept and measured at the
same time (0.4-0.8)
Predictive validity: ability of a measurement to predict
the criterion
Reliability and Responsiveness of Revised
20-item NULI / IDVQ
Measure | Ontario (based on original items) | Quebec (based on revised format)
Reproducibility: test re-test reliability (ICC) | 0.88 (n = 99) | 0.83 (n = 33)
Internal consistency (Cronbach alpha) | 0.90 (n = 119) | 0.93 (n = 93)
Responsiveness (standardised response mean with 95% CI) | 1.48 (1.1-1.8) (n = 33) | 1.63 (1.3-2.0) (n = 35)
Example of convergent validity:
Pearson’s Correlations between NULI and other
measures
Measure | Ontario (based on original items) | Québec (based on revised format)
1-item Global question (mean subject-clinician) | 0.60 | 0.73
Pain Scale | 0.42 | 0.55
Shoulder abduction | -0.32 | -0.47
Scratch test | 0.37 | 0.30
Hand grip strength (Jamar) | 0.29 | -0.41
SIP | 0.66 | N/A
Physical component SF-36 | N/A | -0.50
Mental component SF-36 | N/A | -0.52
How do we demonstrate validity?
Construct validity: the extent to which the
measurement corresponds to theoretical constructs
concerning the phenomenon under study
e.g., testing a hypothesis about whether the measure will
distinguish between 2 groups who differ with respect to the
concept of interest
e.g. NULI: those who returned to work had significantly lower
NULI scores at the post-treatment administration than those
who did not return to work at that time
Tests theory and measure at the same time
Responsiveness
The ability of a measure to detect change (in the
construct being measured) over time
AKA « sensitivity to change »
Important when testing effectiveness of an
intervention
Statistical measures of responsiveness
Effect size – ability to detect the effect of treatments
Ratio of the difference between groups to the variability within
groups
Numerator: raw change score
Denominator:
• standard deviation of pre-test scores vs
• SD of change scores vs
• standard error of change score vs
• SD of change score in stable subjects
Example: Standardised response mean = mean change score / SD
of change scores
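The choice of denominator can be made concrete with a small sketch computing two of the variants listed above. The function names are illustrative, and the sign convention (change = pre-test minus post-test, so a fall in score gives a positive value) is an assumption.

```python
from statistics import mean, stdev

def change_scores(pre, post):
    """Raw change per subject. Assumption: change = pre - post."""
    return [a - b for a, b in zip(pre, post)]

def standardised_response_mean(pre, post):
    """Mean change score divided by the SD of the change scores."""
    change = change_scores(pre, post)
    return mean(change) / stdev(change)

def effect_size_pretest_sd(pre, post):
    """Mean change score divided by the SD of the pre-test scores."""
    return mean(change_scores(pre, post)) / stdev(pre)
```

The two indices share a numerator, so they differ only in how much variability the denominator captures: spread of change within subjects for the SRM, spread between subjects at baseline for the second index.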
Responsiveness to change of NULI
Standardized response mean (SRM)
calculated for 33 Ontario subjects and 35
Quebec subjects whom both subject and
clinician deemed improved on a 1-item
global question of disability
Comparison of standardised response means of
Revised 20-item NULI / IDVQ and other measures
Measure | Ontario | Québec
NULI (IDVQ)-20 | 1.48 (1.1, 1.8) | 1.63 (1.3, 2.0)
Pain Scale | 1.22 (0.9, 1.6) | 1.73 (1.4, 2.1)
Shoulder abduction | -0.61 (-1.0, -0.3) | -1.16 (-1.6, -0.7)
Scratch test | 0.02 (-0.3, 0.4) | 0.59 (0.2, 1.0)
Hand grip strength (Jamar) | -0.80 (-1.2, -0.5) | -0.33 (-0.8, 0.1)
SIP (total) | 1.14 (0.8, 1.5) | -
Physical component SF-36 | - | -1.26 (-1.6, -0.9)
Mental component SF-36 | - | -0.48 (-0.5, 0.2)
Existing instrument vs. designing
your own
Development of a reliable, valid instrument is a
lengthy, complicated process
Whenever possible, use existing instrument with
known reliability and validity
When choosing among existing instruments,
choose the instrument with the best reliability,
validity and/or responsiveness that will measure
the concept you wish to measure
Choosing a health outcome measure:
Internet resources
1. Quality of life compendium: choosing a quality of life instrument
(from the Dept of Public Health and Primary Health Care,
University of Bergen, Norway)
www.uib.no/isf/people/doc/qol/comp0006.htm
2. Quality of Life Assessment in Medicine - Internet Resources
http://www.qlmed.org/url.htm
3. Clinician’s computer-assisted guide to the choice of instruments
for quality of life assessment in medicine
http://www.glamm.com/ql/guide.htm
4. Medical Outcomes Trust Scientific Advisory Committee
Instrument Review Criteria
http://www.outcomes-trust.org/bulletin/34sacrev.htm