Working Paper No.3 22 November 2005 STATISTICAL COMMISSION and UN ECONOMIC COMMISSION FOR EUROPE STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT) CONFERENCE OF EUROPEAN STATISTICIANS WORLD HEALTH ORGANIZATION (WHO) Joint UNECE/WHO/Eurostat Meeting on.


STATISTICAL COMMISSION and UN ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS

Joint UNECE/WHO/Eurostat Meeting on the Measurement of Health Status (Budapest, Hungary, 14-16 November 2005) Working Paper No.3

22 November 2005

STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT) WORLD HEALTH ORGANIZATION (WHO) Session 3-Invited paper

Task Force on the Development of a Common Instrument to Measure Health States: Conceptual and Logistic Issues in Item Construction

Cameron N. McIntosh; Julie Bernier; Jean-Marie Berthelot; Sarah Connor Gorber; Michael C. Wolfson Statistics Canada Ottawa, Ontario, Canada


Selected Domains

For inclusion on the common instrument, the task force selected the following 10 domains, for which specific items were to be constructed:

1. Physical Functioning: Mobility
2. Physical Functioning: Dexterity
3. Vitality/Fatigue
4. Affect (happiness, depression)
5. Anxiety (worry, fear, nervousness)
6. Vision (visual acuity)
7. Hearing (auditory acuity)
8. Pain and Discomfort
9. Social Relationships (including aspects of communication)
10. Cognition
    (a) memory and concentration
    (b) problem solving and thinking

Developing Questions for the Health Domains

• A number of conceptual and logistic issues needed to be considered in the item construction process for all domains; these can be grouped under the following five major headings:
(1) Number of Questions per Domain
(2) Questions Should be Uni-Dimensional
(3) Duration of the Recall Period for the Questions
(4) Dealing With Technical and Medicinal Prosthetics
(5) Item Wording and Response Categories

(1) Number of Questions per Domain

• trade-off between adequate domain coverage and operational feasibility of the survey instrument; ideally, each domain should be assessed using only one or two items
• multi-faceted domains (e.g., Cognition) may necessitate multiple items to enhance measurement precision
• Filter questions should be considered for screening out respondents with “no limitations” on a given domain
Advantage: might conserve interview time since not all response categories would need to be read in all cases
Disadvantage: might result in a bias toward “no” responses, as it provides relief from the mental effort needed to generate an estimate of functioning
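The filter-question flow described above can be sketched as a simple skip pattern. This is only an illustration under stated assumptions: the function name `ask_domain`, the question wording, and the three severity labels are hypothetical and do not come from the task force's instrument.

```python
from typing import Callable, Optional

# Hypothetical response categories; the source does not specify any wording.
SEVERITY_LEVELS = ["mild", "moderate", "severe"]


def ask_domain(domain: str, answer: Callable[[str], str]) -> Optional[str]:
    """Ask a yes/no filter first; read the full category list only after "yes".

    `answer` stands in for the respondent: it receives the question text and
    returns the reply. Returns None when no limitation is reported, otherwise
    the severity level chosen in the follow-up question.
    """
    filter_reply = answer(f"Do you have any difficulty with {domain}? (yes/no)")
    if filter_reply.strip().lower() != "yes":
        # Follow-up is skipped: this saves interview time, but an easy "no"
        # may also let the respondent avoid estimating their functioning.
        return None
    follow_up = answer(
        f"How much difficulty do you have with {domain}? "
        f"({', '.join(SEVERITY_LEVELS)})"
    )
    level = follow_up.strip().lower()
    if level not in SEVERITY_LEVELS:
        raise ValueError(f"unexpected response: {follow_up!r}")
    return level
```

A respondent answering "no" is asked one question instead of two, which is where both the time saving and the potential bias toward "no" arise.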

(2) Questions Should be Uni-Dimensional

• to maximize measurement precision, each item should assess only one domain (or domain aspect); “double-barreled” response categories should be avoided, for example (EQ-5D):
1. I am not anxious or depressed
2. I am moderately anxious or depressed
3. I am extremely anxious or depressed
• responses to items mixing different concepts are difficult to interpret; it is not known which part of the question was being answered
• multiple concepts within a single question might also confuse respondents and prompt natural questions to interviewers, for example:
“If I am not anxious but am moderately depressed, should I pick 1 or 2?”

(3) Duration of the Recall Period for the Questions

• respondents need to base their functional status reports on some time period
• just asking about “general” or “usual” functioning might provide the least biased estimates, as it helps to avoid picking up the impact of time-limited health conditions (e.g., flu)
• a problem is that “usual” or “general” are vague terms that might not have consistent meaning across countries and cultures, and may pose translation difficulties
• a specific recall period (e.g., the previous 30 days) would help standardize measurement, as well as facilitate translation
• choice of a specific recall period must take several factors into account
• the shorter the recall period, the greater the tendency to consider only frequent, highly patterned events of lower intensity
• the longer the recall period, the greater the tendency to consider infrequent, more intense events (e.g., intense episodes of anger)
• the optimum recall period would lead to a balanced consideration of domain-related events (i.e., events of varying intensity)

Telescoping: events are improperly included in or excluded from the recall period
Forward telescoping: an event that is better represented in memory (highly vivid and intense) is incorrectly included in the recall period
Backward telescoping: an event that is more poorly represented in memory (less vivid and intense) is incorrectly excluded from the recall period
• questions may need to reinforce that the focus is on respondents’ lives during the specified recall period only

(4) Dealing With Technical and Medicinal Prosthetics

• to accurately measure capacity and feelings, the questions may need to incorporate information on the use of aids (e.g., walking equipment, glasses and contact lenses, hearing aids, medication for controlling pain and regulating mood)
• if certain items do not specify the use of aids, respondents who use aids might pose natural questions to interviewers, for example:
“Do you mean how much difficulty I have getting around the neighbourhood with or without my walker/wheelchair?”
“Are you referring to the intensity of my pain when I am on or off my medication?”
• questions for domains where aids are most relevant (e.g., mobility, vision, hearing, pain and discomfort) should probably mention the use of aids in the preamble and/or the response categories

(5) Item Wording and Response Categories

• terminology will have to be chosen carefully in order to facilitate translation and international comparability of concepts
• language that is either overly colloquial or overly scientific should be avoided
• it might be best to assess capacity in terms of “difficulty in doing __”; questions directly using the terms “capacity” (or “ability”) might be ambiguous for respondents
• need to determine whether problems in functioning will be assessed in terms of frequency (how often), intensity (how bad), or both

• Response category cut-point shift problem: the same underlying level of capacity or feeling may not receive the same rating across countries, cultures, or individuals (e.g., limitations seen as “mild” in one culture may be seen as “severe” in another; the frequency of a given problem might be rated as “some of the time” in one culture and “all of the time” in another)
• an alternative to full sets of quantifiers and qualifiers would be to use scales with qualifiers or quantifiers on the endpoints only (e.g., Visual Analogue Scale, or a ladder)
• measurement precision is lessened when descriptors are not attached to all scale values; also, it may be optimal to define every domain level for future preference measurement
• both types of items (i.e., a fully defined system of levels versus endpoint labels only) should be subjected to cognitive testing
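The cut-point shift problem can be illustrated with a toy model. The sketch below assumes a latent difficulty score on a 0 to 100 scale and two groups that place their category boundaries differently; the groups, labels, and threshold values are invented for illustration and are not taken from any survey.

```python
from bisect import bisect_right

# Hypothetical ordered response labels.
LABELS = ["none", "mild", "moderate", "severe"]

# Hypothetical cut-points on a 0-100 latent difficulty scale. Each list gives
# the boundaries between adjacent labels for one group of respondents.
CUT_POINTS = {
    "group_a": [10, 40, 70],
    "group_b": [5, 15, 28],
}


def rate(latent: float, group: str) -> str:
    """Map a latent difficulty level to the label that group would choose."""
    return LABELS[bisect_right(CUT_POINTS[group], latent)]


# The same underlying level receives different ratings in the two groups.
print(rate(30, "group_a"))  # mild
print(rate(30, "group_b"))  # severe
```

Under these assumed thresholds, comparing the raw category labels across groups would suggest a difference in health status where the underlying level is identical.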

Issues Requiring Input

• What should be the upper limit on questions for each domain?

• How do we arrive at an optimal balance between precision in measurement (i.e., maintaining item uni-dimensionality) and operational feasibility (i.e., having a reasonably brief survey module)?

• What is the best recall period for the items?

• What is the best way to incorporate information on technical and medicinal prosthetics into the items?

• Should there be response category labels for every level of a domain, or should there be scales with labels on the endpoints only? How do we derive a set of internationally comparable descriptors?
