Evaluation Instruments

Evaluation Rating Forms
Craig McClure, MD
May 15, 2003
Educational Outcomes
Service Group
Typical Use of Rating Scales
End of rotation (global)
After a single encounter (focused)
To incorporate input from multiple evaluators
Videotaped encounters
NOT as a yes/no checklist for single encounters
Alternate Forms
Multiple episodes versus focused (single) episode
Measuring global (six domains) versus task-specific behavior
Global Rating of Learner
Domains of competence, not specific skills, tasks, or behaviors
Completed retrospectively concerning multiple days and activities
May be from multiple sources
Use rating scales
Focused Rating Scale
Single patient encounter
Concerning a specific task, skill, or behavior
Advantages (Global)
Easy to develop
Easy to use (training minimal)
Can be used to evaluate all domains
Reasonable reliability when:
Focused evaluation
Tailored to competencies measured
Systematic Rater Errors (Global)
Leniency/Severity
Range Restriction
Halo Effect
Inappropriate Weighting
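The first three errors leave traces in the numbers, so a program could screen completed forms for them. The following is a minimal sketch, assuming ratings on a 1-5 scale stored per rater and per learner. The data, the function names, and the three indicators are illustrative assumptions, not a validated method, and inappropriate weighting is not covered.

```python
from statistics import mean, pstdev

# Hypothetical data: ratings[rater][learner] = one 1-5 score per domain,
# always in the same domain order.
ratings = {
    "rater_a": {"learner_1": [5, 5, 5], "learner_2": [5, 5, 4], "learner_3": [4, 5, 5]},
    "rater_b": {"learner_1": [4, 2, 3], "learner_2": [2, 3, 1], "learner_3": [3, 5, 4]},
}


def all_scores(by_learner):
    """Flatten one rater's scores across all learners and domains."""
    return [score for scores in by_learner.values() for score in scores]


def rater_summary(ratings):
    """Return crude per-rater indicators of three systematic errors."""
    grand_mean = mean(
        score for by_learner in ratings.values() for score in all_scores(by_learner)
    )
    summary = {}
    for rater, by_learner in ratings.items():
        scores = all_scores(by_learner)
        summary[rater] = {
            # Positive = more lenient than the group, negative = more severe.
            "leniency": mean(scores) - grand_mean,
            # A small overall spread may point to range restriction.
            "spread": pstdev(scores),
            # A small spread across domains within each learner may point to halo.
            "within_learner_spread": mean(pstdev(s) for s in by_learner.values()),
        }
    return summary


summary = rater_summary(ratings)
for rater, indicators in summary.items():
    print(rater, {name: round(value, 2) for name, value in indicators.items()})
```

These indicators only flag raters worth a second look. A rater who scores high and narrow may simply have worked with strong learners.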
Drawbacks (Global)
Content validity uncertain
Questionable validity of general assessments extrapolated to whole domain
Inefficient at directing learner improvement
Accuracy variable
Generosity factor
Poor discrimination between learners
Mixed Research Results
Discriminating between competence levels
Reliably rating more skilled physicians higher than less skilled
Reliability of ratings
Reproducibility
Best: knowledge
Harder: patient care, interpersonal skills
Clarify Evaluative Objectives
Global versus focused
Define using competency-based language emphasized by ACGME
Group the Competencies
Patient Care,
Medical Knowledge,
Practice-Based Learning and Improvement,
Interpersonal and Communication Skills,
Professionalism, and
Systems-Based Practice.
Composition of Form
Short is better than long
Big font is better than small
Clean is better than cluttered
Each Behavior is Evaluated Independently
Otherwise:
Evaluator uncertain what to evaluate
Learner uncertain what to address
Decide on Options in the Scale
Best if minimum of five
Best if a descriptor is present for each option
Absence of middle labels skews ratings toward the positive side
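The two recommendations above can be checked mechanically when a form is drafted. The following is a minimal sketch, assuming a scale is written as an ordered list of (value, descriptor) pairs. The function name `check_scale` and that representation are hypothetical.

```python
def check_scale(options, minimum_options=5):
    """Return a list of problems found in a draft rating scale.

    options: ordered list of (value, descriptor) pairs; a missing
    descriptor is given as None or an empty string.
    """
    problems = []
    if len(options) < minimum_options:
        problems.append(
            f"only {len(options)} options; at least {minimum_options} recommended"
        )
    unlabeled = [value for value, descriptor in options
                 if not descriptor or not descriptor.strip()]
    if unlabeled:
        problems.append(f"options without a descriptor: {unlabeled}")
    return problems


# A scale with labels only at the two ends, the pattern the slide warns about.
endpoints_only = [(1, "Poor"), (2, ""), (3, ""), (4, ""), (5, "Excellent")]
print(check_scale(endpoints_only))
```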
Primacy Effect
“The results showed that when the
positive side of the scale was on the
left, the ratings were more positive
and had reduced variance than
when the positive label was on the
right.”
Lake Wobegon Effect
Where all the children are above average
Faculty tend to interpret anchors as more negative than literal
Generosity effect
Consider Changing Anchors
If the desire is to keep evaluative anchors:
Poor, fair, below average, average, above average, and excellent
Very poor, poor, fair, good, very good, excellent
Consider Using Frequency Anchors
Frequency of observable resident behaviors from “never” to “always”
Judgmental ratings need considerable evaluator education to minimize inter-rater variability
Permits program director (PD) competency judgment
Example of Stem for Frequency Anchor
Resident demonstrates respect in speaking to patient…
Never,
25%,
50%,
75%,
Always
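A frequency-anchored item like this one is easy to store and to pool across evaluators. The following is a minimal sketch, assuming "Never" is scored as 0.0 and "Always" as 1.0. That numeric mapping and the names `FREQUENCY_ANCHORS` and `mean_frequency` are illustrative assumptions.

```python
# Assumed numeric values for the five anchors on the slide.
FREQUENCY_ANCHORS = {
    "Never": 0.0,
    "25%": 0.25,
    "50%": 0.5,
    "75%": 0.75,
    "Always": 1.0,
}


def mean_frequency(responses):
    """Average the anchors selected by several evaluators for one item."""
    if not responses:
        raise ValueError("no responses to average")
    return sum(FREQUENCY_ANCHORS[label] for label in responses) / len(responses)


# Four hypothetical evaluators answering the stem above.
print(mean_frequency(["Always", "75%", "75%", "50%"]))
```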
Competency Judgment at Program Level
Permits competency definitions to vary by year of training
Diminishes effect of inter-rater variability
Focuses on observable behavior
Requires less training of evaluators
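One way to picture the division of labor: evaluators report only how often a behavior was observed, and the program applies its own standard for each year of training. The following is a minimal sketch. The threshold values, the three-year program length, and the names `THRESHOLD_BY_YEAR` and `meets_expectation` are hypothetical and purely illustrative.

```python
# Hypothetical program-level standards: the minimum mean observed frequency
# (0.0 to 1.0) expected at each year of training. Illustrative numbers only.
THRESHOLD_BY_YEAR = {1: 0.50, 2: 0.75, 3: 0.90}


def meets_expectation(observed_frequencies, training_year):
    """Compare pooled evaluator reports with the standard for that year."""
    if not observed_frequencies:
        raise ValueError("no observations to judge")
    pooled = sum(observed_frequencies) / len(observed_frequencies)
    return pooled >= THRESHOLD_BY_YEAR[training_year]


# The same evaluator reports, judged against two different years.
reports = [0.75, 0.75, 1.0]
print(meets_expectation(reports, training_year=1))
print(meets_expectation(reports, training_year=3))
```

Because the standard lives in one table rather than in each evaluator's head, changing the expectation for a given year does not require retraining the evaluators.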
References
Evaluations, S. Swing, Academic Emergency Medicine 2002;9:1278-88
Assessment of Communication and Interpersonal Skills Competencies, Academic Emergency Medicine 2002;9:1257-69
ACGME/ABMS Joint Initiative Toolbox of Assessment Methods, September 2000
References (2)
Challenges in using rater judgments in medical education, M.A. Albanese, Journal of Evaluation in Clinical Practice 6(3):305-319