Transcript Strand21
Assessment Methodology: Lessons from OMERACT Meetings
Vibeke Strand, MD, Biopharmaceutical Consultant; Adjunct Clinical Professor, Division of Immunology, Stanford University
OMERACT: Outcome Measures in Rheumatology Clinical Trials
• I (1992): Rheumatoid Arthritis Clinical Trials
• II (1994): Adverse Events → Establishment of Registries; Health Related Quality of Life; Economic Evaluations
• III (1996): Osteoarthritis; Osteoporosis; Psychosocial Measures
• IV (1998): Longitudinal Observational Studies; RA Response Criteria / Imaging; Ankylosing Spondylitis → ASAS; Systemic Lupus Erythematosus
• 5 (2000): MCID; Economics: Cost Effectiveness; Imaging: Radiography and MRI
• 6 (2002): Economic Evaluations; Imaging
What is OMERACT?
• Data-driven process to define outcome measures to be used in RCTs and LOS for each clinical indication
• Domains derived from the “Ds”: Discomfort, Disability, Dollar cost, Death
• Literature reviews, data available from LOS and RCTs:
  – Validity of currently defined instruments to assess outcome
  – “Data mining” to better understand clinical response
  – Correlation of patient-reported responses with other outcome measures
  – Definition of “minimally clinically important improvement” = MCID
• Presentation of evidence and development of consensus at each conference
  – Representatives from: academia, clinical investigators, regulatory agencies, sponsors, clinical rheumatologists
• Goal: to develop recommendations for:
  – “Core Set” of minimum number of domains / outcome measures assessed in RCTs and LOS
  – Working agenda identifying ‘need’ to focus future work
• Previous OMERACT recommendations have been ratified by WHO / ILAR in RA, OA, SLE, including HRQOL and economic evaluations
The OMERACT ‘Umbrella’
• Rheumatoid arthritis: EULAR, ACR
• JRA: PRCSG
• SLE: SLICC, EULAR
• Osteoarthritis: OARSI
• Ankylosing spondylitis: ASAS
• Pain: IMMPACT
The OMERACT Filter
• TRUTH: Face, content, construct and criterion validity
  – Is the measure truthful? Does it measure what is intended?
• DISCRIMINATION: Reliability and sensitivity to change
  – Does the measure discriminate between situations [states] of interest?
• FEASIBILITY:
  – Can the outcome easily be measured given constraints of time, money and interpretability?
Boers et al: J Rheum 1998; 25:198-9
Rheumatoid Arthritis: OMERACT I, 1992
• RCTs available, but data limited
  – Only a few included a measure of physical function
  – General ‘belief’ that none had demonstrated convincing efficacy
• “Paper patients” derived from actual RCT data
  – → [healthy] arguments regarding changes reported
  – Clear disagreement about importance of MD global assessments
• Participants ranked patient-reported physical function and SJC highest when assessing efficacy
• Facilitated recognition that ‘perception’ of benefit is variable
ACR Response Criteria
• Defined and ratified after OMERACT I; data-driven nominal group process
• Based on Paulus criteria and on statistical analyses of CSSRD and MTX RCTs, selecting the measures that best differentiated active therapy from placebo
• Require ≥20% improvement in 5 of 7 measures:
  – Tender joint count and swollen joint count, and
  – 3 of the following 5: MD global, physical function (HAQ), pain by VAS, patient global, ESR and/or CRP
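The responder rule above can be expressed as a short check. What follows is a minimal Python sketch, not an official ACR tool. It assumes each measure is stored so that a lower value is better, and that a single acute-phase reactant (ESR or CRP) stands in for "ESR and/or CRP". The names `is_acr20` and `pct_improvement`, and the dictionary keys, are hypothetical.

```python
# Hypothetical ACR20 check: every measure is oriented so that lower = better.
# "acute_phase" stands in for ESR and/or CRP (assumption: one reactant per study).
OTHER_MEASURES = ("md_global", "haq", "pain_vas", "patient_global", "acute_phase")


def pct_improvement(baseline: float, followup: float) -> float:
    """Percent improvement from baseline; positive when the score fell."""
    if baseline == 0:
        return 0.0  # assumption: a zero baseline cannot show improvement
    return 100.0 * (baseline - followup) / baseline


def is_acr20(baseline: dict, followup: dict, threshold: float = 20.0) -> bool:
    """>= threshold% improvement in TJC and SJC plus >= 3 of the other 5 measures."""
    joints_ok = all(
        pct_improvement(baseline[k], followup[k]) >= threshold for k in ("tjc", "sjc")
    )
    n_other = sum(
        pct_improvement(baseline[k], followup[k]) >= threshold for k in OTHER_MEASURES
    )
    return joints_ok and n_other >= 3


# Hypothetical patient: TJC, SJC, MD global, pain and ESR improve >= 20%; HAQ and patient global do not.
base = {"tjc": 20, "sjc": 15, "md_global": 60, "haq": 1.5,
        "pain_vas": 70, "patient_global": 65, "acute_phase": 40}
week24 = {"tjc": 12, "sjc": 10, "md_global": 45, "haq": 1.4,
          "pain_vas": 50, "patient_global": 60, "acute_phase": 30}
print(is_acr20(base, week24))  # True
```

The threshold is a parameter, so stricter cut-offs can be checked with the same two-part rule.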
EULAR Response Definition
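DAS28 score reached | Decrease in DAS28 >1.2 | Decrease >0.6 to ≤1.2 | Decrease ≤0.6
≤3.2 | Good | Moderate | None
>3.2 and ≤5.1 | Moderate | Moderate | None
>5.1 | Moderate | None | None
• Derived by discriminant function analysis of patients with active vs inactive RA; disease activity state determined by treatment changes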
Van Gestel et al. Arth Rheum 1996; 26:705-11
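A EULAR response depends on both the DAS28 reached and the decrease from baseline. The slide does not give the DAS28 formula, so the sketch below assumes the widely used DAS28-ESR version. That version uses 28-joint counts, ESR in mm/h, and patient global assessment on a 0-100 mm VAS; CRP-based variants also exist. The sketch pairs this formula with the response grid for this slide. Function names are hypothetical.

```python
import math


def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, patient_global_vas: float) -> float:
    """DAS28 with ESR: 0.56*sqrt(TJC28) + 0.28*sqrt(SJC28) + 0.70*ln(ESR) + 0.014*GH."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h) + 0.014 * patient_global_vas)


def eular_response(das28_baseline: float, das28_followup: float) -> str:
    """Classify EULAR response from the DAS28 reached and the decrease from baseline."""
    decrease = das28_baseline - das28_followup
    if decrease > 1.2:
        # Any decrease > 1.2 is at least moderate; good only if DAS28 reached <= 3.2.
        return "good" if das28_followup <= 3.2 else "moderate"
    if decrease > 0.6:
        # Decrease > 0.6 to <= 1.2: moderate unless DAS28 remains > 5.1.
        return "moderate" if das28_followup <= 5.1 else "none"
    return "none"  # decrease <= 0.6


# Hypothetical patient values.
baseline = das28_esr(tjc28=12, sjc28=9, esr_mm_h=45, patient_global_vas=70)
followup = das28_esr(tjc28=3, sjc28=2, esr_mm_h=18, patient_global_vas=30)
print(round(baseline, 2), round(followup, 2), eular_response(baseline, followup))
```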
As Demonstrated in RA, Responder Analyses Have Face and Content Validity
• Allow assessment of multiple domains
• Facilitate comparison of efficacy across:
  – Products
  – Heterogeneous populations, and
  – Disease indications
• May lead to tiered approach to label indications
• Precedent: ACR Responder Index in RA
  – DAS28 both confirms active disease at baseline and ‘clinical responses’
  – Additional data by x-ray and HRQOL
Rheumatoid Arthritis: Later Efforts
• Demonstrated that ‘generic’ measures of HRQOL are sensitive to change in RA RCTs
• Identified ‘MCID’ for HAQ and SF-36, facilitating:
  – Comparisons across products, disease populations
  – Economic evaluations
• Helped to show impact of ‘Rheumatic Diseases’ to WHO
  – In this Bone and Joint Decade
  – Identified importance of Rheumatic Diseases relative to CV, DM, HTN, OP
• [Hopefully] → allocation of more resources to identify and treat Rheumatic Diseases
Minimum Clinically Important Differences [MCID]
• Degree of improvement perceptible to patients = clinically important / meaningful
• Defined by patient query, Delphi technique
  – OMERACT: 33-36% improvement; 18% > placebo
• Confirmed by statistical correlations with patient global assessments in RCTs in RA and OA
• Determining the proportion of patients with clinically important improvement gives a more interpretable result with direct clinical implications
Minimum Clinically Important Differences [MCID]
Instrument | Score range | Direction of scoring | MCID | Literature
HAQ DI | 0-3 | – (lower is better) | 0.22 | 1-4
SF-36 domains | 0-100 | + (higher is better) | 5-10 points | 2, 5-7
SF-36 PCS/MCS | mean 50 ± 10 | + (higher is better) | 2.5-5 points | 2, 5-7
1 Guzman et al. Arth Rheum. 1996;39:S208
2 Kosinski et al. Arth Rheum. 2000;43:1478-87
3 Redelmeier et al. Arch Intern Med. 1993;153:1337-42
4 Wells et al. J Rheumatol. 1993;20:557-60
5 Kosinski et al. Arth Rheum. 2000;43:S140
6 Samsa et al. Pharmacoeconomics. 1999;15:141-155
7 Thumboo et al. J Rheumatol. 1999;26:97-102
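The slides recommend reporting the proportion of patients who reach the MCID, not only the mean change. The minimal sketch below uses the thresholds in the table. For HAQ DI the MCID is 0.22 and lower is better. For SF-36 PCS higher is better, and 2.5 points is taken as an assumed cut-off from the 2.5-5 range. The helper names and the change scores are hypothetical.

```python
from typing import Sequence


def meets_mcid(change: float, mcid: float, higher_is_better: bool) -> bool:
    """True if a change from baseline reaches the MCID in the improving direction."""
    return change >= mcid if higher_is_better else change <= -mcid


def proportion_meeting_mcid(changes: Sequence[float], mcid: float,
                            higher_is_better: bool) -> float:
    """Share of patients whose change reaches the MCID."""
    if not changes:
        raise ValueError("no patients")
    return sum(meets_mcid(c, mcid, higher_is_better) for c in changes) / len(changes)


# Hypothetical change scores for illustration only.
haq_changes = [-0.50, -0.10, -0.30, 0.00, -0.22]  # HAQ DI: improvement is negative
pcs_changes = [6.0, 1.0, 3.0, -2.0]               # SF-36 PCS: improvement is positive

print(proportion_meeting_mcid(haq_changes, mcid=0.22, higher_is_better=False))  # 0.6
print(proportion_meeting_mcid(pcs_changes, mcid=2.5, higher_is_better=True))    # 0.5
```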
Health Assessment Questionnaire (HAQ)
• Widely accepted, validated, rheumatology-specific instrument to assess physical function in RA
• Gold standard: OMERACT / FDA Guidance
• 20 questions covering 8 categories of activities: dressing and grooming; arising; eating; walking; hygiene; reaching; gripping; activities of daily living
• HAQ Disability Index (HAQ DI)
  – Scores the worst item within each of the eight categories
  – Adjusted for use of aids and devices
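The slide summarizes HAQ DI scoring, and the sketch below follows the rules as they are commonly described in the Stanford HAQ scoring instructions. Each item is scored 0-3. The worst (highest) item sets the category score. Use of aids, devices or help raises a category score below 2 up to 2. The index is the mean of the answered categories, and here at least 6 of 8 must be answered. Treat these rules as assumptions to check against the official instructions. The function and argument names are hypothetical.

```python
from typing import Optional, Sequence


def haq_di(categories: Sequence[Sequence[Optional[int]]],
           aids_or_help: Sequence[bool]) -> Optional[float]:
    """HAQ DI from 8 categories of items scored 0-3 (None marks an unanswered item)."""
    if len(categories) != 8 or len(aids_or_help) != 8:
        raise ValueError("expected 8 categories")
    category_scores = []
    for items, aided in zip(categories, aids_or_help):
        answered = [s for s in items if s is not None]
        if not answered:
            continue  # category skipped entirely
        score = max(answered)  # the worst (highest) item sets the category score
        if aided and score < 2:
            score = 2  # assumption: aids/devices or help raise the category score to 2
        category_scores.append(score)
    if len(category_scores) < 6:
        return None  # assumption: too few categories answered to score
    return sum(category_scores) / len(category_scores)


# Hypothetical answers: 20 items across the 8 categories; aids used for "arising".
cats = [[0, 1], [1, 0], [0, 0, 0], [2, 1], [0, 0, 0], [0, 0], [1, 0, 0], [3, 0, 0]]
aids = [False, True, False, False, False, False, False, False]
print(haq_di(cats, aids))  # 1.125
```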
Mean Improvement in HAQ Disability Index, Year-2 Cohorts at 24 Months
[Figure: bar chart of LEF, MTX and SSZ in US301, MN301/303/305 and MN302/304, shown against the MCID of -0.22; *LEF vs MTX, p = 0.01; 69% to 86% of patients per arm achieved the MCID.]
ATTRACT: HAQ Disability Index Mean Improvement through Week 102
[Figure: bar chart of MTX + placebo vs infliximab 3 mg/kg q8w, 3 mg/kg q4w, 10 mg/kg q8w, 10 mg/kg q4w and all infliximab; infliximab vs MTX + placebo, p < 0.001.]
ERA: Mean Change in HAQ DI at Month 12
[Figure: bar chart of MTX vs ETN shown against the MCID; baseline HAQ DI 1.6 in both arms; mean changes of -0.70 and -0.80.]
Kosinski et al. AJMC. 2002;8:231-240
Mean Changes in HAQ DI at Weeks 24 and 52: Anakinra + MTX
[Figure: bar chart shown against the MCID. Baseline HAQ DI: 1.38 (placebo), 1.43 (active). Placebo + MTX: -0.18 (Week 24), -0.15 (Week 52). Anakinra + MTX: -0.29 (Week 24), -0.28 (Week 52).]
Fleishman et al. Arth Rheum. 2002;46:S574.
Mean Changes in HAQ DI at Weeks 24 and 52: DE019, Adalimumab + MTX
[Figure: bar chart shown against the MCID. Baseline HAQ DI: placebo 1.48; adalimumab 20 mg weekly 1.45; adalimumab 40 mg eow 1.44. Mean changes: placebo -0.24 and -0.25; adalimumab arms -0.56 to -0.61 at Weeks 24 and 52.]
Keystone E. Arthritis & Rheum 2002; 46(9) suppl.
ASPIRE RCT: Mean Changes in HAQ DI from Weeks 30 to 54
Arm | MTX | MTX + INF 3 mg/kg | MTX + INF 6 mg/kg
Baseline HAQ DI | 1.5 | 1.5 | 1.5
Mean change | -0.75 | -0.78 | -0.79
% achieving MCID | 65 | 76 | 76
Smolen et al. Ann Rheum Dis 2003;62:S64
TEMPO RCT: Mean Changes in HAQ DI at Week 52
Arm | MTX | ETN | MTX + ETN
Baseline HAQ DI | 1.7 | 1.7 | 1.8
Mean change | -0.61 | -0.66 | -0.97
(MCID shown on the original chart for reference)
SF-36: Short Form 36 Health Survey
• Validated, widely used generic measure of HRQOL
• 8 domains:
  – Scored 0-100; age-, sex-adjusted rates
• 2 summary scores:
  – Physical Component (PCS): measures how decrements in physical function affect day-to-day activities; impact of physical impairment / disability on HRQOL
  – Mental Component (MCS): impact of mental affect and symptoms of pain on HRQOL
• Norm-based scoring (mean 50, SD 10)
SF-36 Two-Component Model
• Physical component: Physical Function, Role Physical, Bodily Pain, General Health
• Mental component: Vitality, Social Function, Role Emotional, Mental Health
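Norm-based scoring puts a score on a scale where the reference population has mean 50 and SD 10. On that scale, 30 corresponds to the "2 SDs below US norm" line on the following slides. The actual PCS and MCS are built by standardizing all eight domains and weighting them with published factor-score coefficients. The sketch below shows only the final linear T-score step. The population mean and SD passed in are placeholders, not real US norms.

```python
def norm_based_score(raw: float, pop_mean: float, pop_sd: float) -> float:
    """Linear T-score: the reference population maps to mean 50, SD 10."""
    return 50.0 + 10.0 * (raw - pop_mean) / pop_sd


def sds_from_norm(t_score: float) -> float:
    """How many SDs a norm-based score lies above (+) or below (-) the norm of 50."""
    return (t_score - 50.0) / 10.0


# Placeholder reference values (hypothetical), not published US norms.
print(norm_based_score(raw=40.0, pop_mean=80.0, pop_sd=20.0))  # 30.0, i.e. 2 SDs below
print(round(sds_from_norm(30.9), 2))
```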
US301: Baseline SF-36 Scores, US Norms vs US301 Population
[Figure: US norms (age/sex adjusted) vs the US301 study population on a 0-100 scale for Physical Function, Role Physical, Bodily Pain, General Health Perception, Vitality, Social Function, Role Emotional and Mental Health.]
US301: Mean Improvement in SF-36, Year-2 Cohorts
[Figure: leflunomide (24 months, n = 93) and methotrexate (24 months, n = 89) vs baseline and US norms (age/sex adjusted), by SF-36 domain; higher scores indicate better health.]
Mean Changes in SF-36 Scores: DE019, Adalimumab + MTX
[Figure: bar chart of mean change by SF-36 domain, placebo vs adalimumab 40 mg every other week, shown against the MCID.]
Keystone E. Arthritis & Rheum 2002; 46 suppl.
Leflunomide and Methotrexate: Mean Changes in SF-36 PCS, Year-2 Cohort (US301)
[Figure: PCS at baseline, 12 months and 24 months for LEF (n = 93) and MTX (n = 97). Baseline PCS 30.2 to 30.9, about 2 SDs below the US norm of 50; follow-up PCS 38.6 to 42.7.]
Etanercept and Methotrexate: Mean Changes in SF-36 PCS at 12 Months (ERA)
[Figure: PCS at baseline and 12 months for ETN 25 mg (n = 193) and MTX (n = 199). Baseline PCS 28.0 to 29.2, about 2 SDs below the US norm of 50; 38.7 to 38.8 at 12 months.]
Kosinski et al. AJMC. 2002;8:231-240.
Infliximab: Median Improvement in SF-36 PCS at Month 24 (ATTRACT)
Baseline PCS: 23.9 to 30.8; median improvements ranged from 2.8 to 6.9
Arm (n) | p-value vs placebo
MTX + placebo (n = 88) | reference
3 mg/kg q 8 wks (n = 86) | 0.011
3 mg/kg q 4 wks (n = 86) | <0.001
10 mg/kg q 8 wks (n = 87) | <0.001
10 mg/kg q 4 wks (n = 81) | <0.001
Kavanaugh et al. Arth Rheum. 2000;43:S147.
Anakinra+MTX: Mean Improvement in SF-36 PCS at Month 12
Baseline PCS: 29.9 (placebo), 28.8 (active)
Fleishman et al. Arth Rheum. 2002;46:S574.
Correlation Between HAQ and SF-36
Scales correlated with HAQ: SF-36 PCS and Physical Function (PF)
Reference | Study
Ruta (1) | n/a
Talamo (2) | n/a
Kavanaugh (3) | Infliximab/ATTRACT
Kosinski (4) | Etanercept/ERA
Lubeck (5) | Etanercept/RAPOLO
Strand (6) | Leflunomide/US301
Reported correlations ranged from -0.51 to -0.82.
1 Ruta et al. Br J Rheum. 1998;37:425-436.
2 Talamo et al. Br J Rheum. 1997;36:463-469.
3 Kavanaugh et al. A&R. 2000;43:S147.
4 Kosinski et al. Medical Care. 1999;37:MS23-39.
5 Lubeck et al. Value in Health. 2001;4:MS2,163.
6 Strand et al. A&R. 2001;44:S187.
MCID Values Are Consistent in RCTs in RA
• Improvements in HAQ DI and SF-36 in RA with newly approved therapies are statistically significant; more importantly, they are CLINICALLY MEANINGFUL
• MCID values are consistent across agents and patient populations
  – Disease-specific [‘relevant’] measure: HAQ
  – Generic measure: SF-36
• Improvements in disease-specific measures are highly correlated with those in generic measures
MCID Workshop: Identifying Candidate Measures to Define ‘Low Disease Activity State’
• Pain
• Function
• Inflammation
• Health-related quality of life
• Structural damage
• Toxicity
• Comorbidity
• Fatigue
Osteoarthritis
• OMERACT III: 1996
• Candidate instruments to assess:
  – Pain
  – Stiffness
  – Physical function
• Limited data from RCTs; treatments offering only symptomatic benefit
• Identification of a ‘Core Set’ of 4 domains as a foundation for future work
• Research agenda: identification of ‘disease control’ and ‘biologic markers’ of response
Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index
• Self-administered questionnaire
• Developed by querying patients with hip or knee OA
  – Reflects physical activities most affected by symptoms and disease manifestations
• Composite score based on 24 questions; subscores:
  – Pain (5 questions)
  – Joint stiffness (2 questions)
  – Physical function (17 questions)
• Scored by 0-4 Likert or 0-10 cm VAS scales
• Improvement = negative change
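The minimal sketch below shows Likert-version WOMAC scoring. It assumes simple summation of the 0-4 items within each subscale, with the composite equal to the sum of all 24 items. It does not cover missing items or the VAS version. Names are hypothetical.

```python
from typing import Dict, Sequence

# Item counts per subscale, as listed above (24 items in total).
ITEMS_PER_SUBSCALE = {"pain": 5, "stiffness": 2, "function": 17}


def womac_likert(pain: Sequence[int], stiffness: Sequence[int],
                 function: Sequence[int]) -> Dict[str, int]:
    """Sum 0-4 Likert items into subscale and composite scores (higher = worse)."""
    responses = {"pain": pain, "stiffness": stiffness, "function": function}
    scores: Dict[str, int] = {}
    for name, items in responses.items():
        expected = ITEMS_PER_SUBSCALE[name]
        if len(items) != expected:
            raise ValueError(f"{name}: expected {expected} items, got {len(items)}")
        if any(not 0 <= s <= 4 for s in items):
            raise ValueError(f"{name}: Likert items must be scored 0-4")
        scores[name] = sum(items)
    scores["composite"] = scores["pain"] + scores["stiffness"] + scores["function"]
    return scores


# Hypothetical patient: every item moves from 2 to 1 between baseline and week 12.
baseline = womac_likert([2] * 5, [2] * 2, [2] * 17)
week12 = womac_likert([1] * 5, [1] * 2, [1] * 17)
change = {k: week12[k] - baseline[k] for k in baseline}
print(change)  # negative values = improvement
```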
[Figure: OMERACT III voting on OA outcome domains. Domains considered: pain, physical function, patient global, imaging (≥1 yr), stiffness, HRQOL / utility, MD global, inflammation, biologic markers, and other (e.g., performance-based measures, flares, time to surgery, analgesic count).]
% voting for inclusion | Placement | Consequence
≥90% | Inner core | Core set
≥30% to <90% | Middle core | Strongly recommended (e.g., HRQOL / utility)
0% to <30% | Outer core | Optional
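The placement rule in the voting figure is a simple threshold mapping. The sketch below uses the thresholds from the figure legend. The function name and the example percentages are hypothetical.

```python
from typing import Tuple


def omeract_placement(pct_voting: float) -> Tuple[str, str]:
    """Map the share of votes for inclusion to (placement, consequence)."""
    if not 0 <= pct_voting <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if pct_voting >= 90:
        return ("inner core", "core set")
    if pct_voting >= 30:
        return ("middle core", "strongly recommended")
    return ("outer core", "optional")


for pct in (90, 36, 8):  # hypothetical vote shares
    print(pct, omeract_placement(pct))
```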
Outcome Measures in OA: OARSI Guidelines OMERACT Core Set and ‘Strongly Recommended’
• Pain: WOMAC pain / stiffness subscales
  – Differentiating pain from stiffness
• Physical function: WOMAC physical function subscale
• Patient global assessment:
  – Signal joint
  – Transition question
  – How to phrase the question?
In all the ways arthritis affects you, how are you doing today?
• HRQOL / utilities: WOMAC composite score; SF-36; EQ-5D / utilities
• MD global assessment
WOMAC Scores in OA RCTs: Identifying MCID
• MCID in WOMAC composite score, Likert scale:
  – Anchored to patient global assessment
  – 12-wk pivotal OA RCTs with celecoxib: 10.1 [0-89]
  – Pain, stiffness, physical function: 2.1 [0-20], 1.2 [0-8], 6.5 [0-61]
Zhao et al. Pharmacother 1999;19:1269-78
• MCID in WOMAC VAS:
  – Anchored to patient response to Rx [0-4 Likert scale]
  – 6-wk RCTs, OA of hip and knee; rofecoxib vs ibuprofen vs placebo
  – Pain, stiffness, physical function: 9.7, 10 and 9.3 mm VAS
  – 11 mm VAS for patient global assessment
Ehrich et al: J Rheum 2000; 27:2635-2641
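Both MCID estimates are "anchored": change on WOMAC is calibrated against an external patient rating. The slides do not describe the exact analysis. The sketch below therefore uses one common anchor-based approach instead: the mean change among patients who rate themselves minimally improved. The category labels and data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def anchor_based_mcid(changes: Sequence[float], anchor: Sequence[str],
                      minimal_category: str) -> float:
    """Mean change among patients whose anchor rating is the minimal-improvement category."""
    if len(changes) != len(anchor):
        raise ValueError("changes and anchor ratings must be paired")
    selected = [c for c, a in zip(changes, anchor) if a == minimal_category]
    if not selected:
        raise ValueError("no patients in the minimal-improvement category")
    return mean(selected)


# Hypothetical WOMAC composite changes (negative = improvement) and anchor ratings.
womac_change = [-20.0, -12.0, -9.0, -2.0, 1.0]
global_rating = ["much better", "somewhat better", "somewhat better", "no change", "no change"]
print(anchor_based_mcid(womac_change, global_rating, "somewhat better"))  # -10.5
```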
Improvement in WOMAC Composite Scores at Week 12: Pivotal OA RCTs, Celecoxib
[Figure: bar chart for CT20 (knee), CT21 (knee) and CT54 (hip); placebo, Cel 50, Cel 100, Cel 200 and Nap 500; MCID = 10.1 (SE = 0.4); *p < .05 vs placebo.]
Zhao et al. Pharmacother 1999;19:1269-78
WOMAC Physical Function Subscale, Knee or Hip OA, at 12 Months: Pivotal RCT, Rofecoxib
[Figure: mean change from baseline at weeks 2, 4, 8, 12, 26, 39 and 52 after randomization for rofecoxib 12.5 mg, rofecoxib 25 mg and diclofenac 150 mg; mean baseline = 69.6 mm; MCID = 9.3; P < 0.05 for all groups, treatment response compared with baseline.]
Cannon GW, et al. Arthritis Rheum. 2000;43:978-987.
SF-36 in Osteoarthritis RCTs
Truth or Validity
• Domains, especially Bodily Pain, discriminated differences / changes in symptoms over time
• Closer correlation with patient-assessed outcomes
Ware et al: A+R 1996; 39:S90
Feasibility or Reliability
• Ceiling effects minimal; floor effects for RP and RE domains
Ware et al: A+R 1996; 39:S90
• Able to detect effects of arthritis in a community sample
Hill et al: J Rheum 1999; 26:2029-35
Discrimination or Responsiveness
• In longitudinal tests, BP domain and PCS summary score most responsive, even within 2-6 weeks
Bellamy et al, A+R 2000; S221
• Valid and responsive measure of TKR, especially long term
Brooks et al, A+R 1997; 40:S110
• Short-term treatment → significant improvement in MCS
Ehrich et al: J Rheum 2000; 27:2635-2641
Mean Improvement in SF-36: All Rofecoxib v Normative Data
[Figure: mean improvement with rofecoxib by domain (PF, RP, Pain, GHP, Vitality, SOC, RE, MH), compared with the difference between US population norms for ages 45-54 and 55-64 (Ware et al 1993).]
Change in SF-36 Scores at Week 12: OA of Knee, Pivotal Trial with Celecoxib
[Figure: bar chart by domain (PF, RP, BP, GH, VT, SF, RE, MH) for placebo, Cel 50, Cel 100, Cel 200 and Nap 500; *p < .05 vs placebo.]
Use of WOMAC and SF-36 in RCTs of OA Conclusions Based on the COX-2 Experience
• WOMAC questionnaire reflects clinical improvement consistent with other patient-assessed measures
• Proved valid, reliable and sensitive to change
  – Pain and stiffness subscales reflect symptoms
  – Physical function subscale dominates composite score
  – WOMAC composite score is a disease-specific measure of HRQOL
• Correlates closely with improvements reported by generic SF-36
• Based on MCID calculations, Likert and VAS versions similarly sensitive to change
OMERACT 4 SLE Module 1998: Goal
• To develop consensus on required outcome domains to be assessed in clinical trials in SLE • Paucity of data from Randomized Controlled Trials [RCTs]; Most evidence derived from Longitudinal Observational Studies [LOS]
Strand et al: J Rheum 1999; 26:490-497; Smolen et al: J Rheum 1999; 26:504-507
Disease Activity Indices: BILAG, ECLAM, LAI, SLAM, SLEDAI
• Good evidence for validity, discrimination, feasibility in published cohort [LOS] studies
  – Changes in one index correlated with others
  – Recommendation to use index of choice
• Computer generation of all 5 indices facilitates:
  – Clinical research efforts: SLICC, ESCICIT, EURO-LUPUS
  – Exchange of information: interested parties, biotech / pharma
• Some limitations when used as primary outcome measures in RCTs; ongoing efforts to improve
SF-36: Sensitive to Change in LOS in SLE
• Baseline domain scores low in SLE
  – v. age/gender matched norms for Canada, Norway, UK, US
  – v. serious medical problems (IDDM, CAD)
Gladman et al: J Rheum 1995; 23:1953-5
• In cohort studies, reflects changes in disease activity measures
  – disease activity in PF, BP, GHP
  – disease activity SF-36 domain scores, esp. PF
Gordon et al: A+R 1997; 40:S112; Gladman et al: Clin Exp Rheum 1995; 14:305-8; Stoll et al: J Rheum 1997; 24:309-13 and 1608-14; Fortin et al: Lupus 1998; 7:101-7
• Decrements in multiple domains correlate with increased disease activity and damage
Abu-Shakra et al: J Rheum 1999; 26:306-9; Thumboo et al: J Rheum 1999; 26:97-102; Wang et al: J Rheum 2001; 28:525-32
  – Immunosuppressive use
Rood et al: J Rheum 2000; 27:2057-9
  – ESRD
Vu, Escalante: J Rheum 1999; 26:2595-2601
Domains Recommended by OMERACT 4
• Disease activity:
  – Disease activity scores: SLEDAI, BILAG, ECLAM, SELENA SLEDAI, SLAM-R
  – Definitions of active nephritis by U/A, 24-hour CCr, proteinuria; “Renal flare”; “Major SLE flare”
• Damage:
  – ACR/SLICC Damage Index
  – End stage renal disease [ESRD]
  – Doubling of serum creatinine
  – Chronicity index on biopsy
  – Bone loss due to disease activity and/or corticosteroids
• HRQOL: SF-36
[Should also include: adverse events; economic costs, including health utilities]
As reviewed in Schiffenbauer et al: EBM Treatment of SLE; BJR: in press
Ankylosing Spondylitis: ASAS
• A successful and relevant example
• To be discussed by Robert Landewe and Juergen Braun
Systemic Sclerosis Workshop: OMERACT 6
Absence of data:
• Few ‘failed’ RCTs
• Limited information from LOS
Assessment by organ system involvement:
• Renal
• Cardio-pulmonary
• Muscle
• HRQOL
• Skin
• GI
OMERACT 7: May 12-16, 2004, Asilomar, California
• Module: RA: Definition of Low Disease Activity
• Module updates:
  – Imaging in Ankylosing Spondylitis [ASAS]
  – Working Group on Safety
• Workshops:
  – Outcome Measures in Psoriatic Arthritis
  – Outcome Measures in Fibromyalgia
  – Outcome Measures in Gout
  – The Patient Perspective in Outcome Measures