Defending Your
Licensing Examination
Programme
Deborah Worrad
Registrar and Executive Director
College of Massage Therapists of Ontario
Presented at CLEAR’s 23rd Annual Conference
Toronto, Ontario September, 2003
Critical Steps
Job Analysis Survey
Blueprint for Examination
Item Development & Test Development
Cut Scores & Scoring/Analysis
Security
Subject Matter Experts
Selection
Broadly representative of the profession
Specialties of practice
Ethnicity
Age distribution
Education level
Gender distribution
Representation from newly credentialed practitioners
Geographical distribution
Urban vs. rural practice locations
Practice settings
Job Analysis Survey
Provides:
the framework for the examination development
a critical element for ensuring that valid interpretations are made about an individual’s exam performance
a link between what is done on the job and how candidates are evaluated for competency
Job Analysis Survey
Comprehensive survey of critical knowledge, skills and abilities (KSAs) required by an occupation
Relative importance, frequency and level of proficiency of tasks must be established
Job Analysis Survey
Multiple sources of information should be used to develop the KSAs
The survey must be detailed enough to yield the data needed to support exam construction (the blueprint)
Job Analysis Survey
Good directions
User-friendly, simple layout
Demographic information requested from respondents
Reasonable rating scale
Pilot test
Job Analysis Survey
The survey is sent to either a representative sample (large profession) or all members (small profession)
With computer technology, the JAS can be done online, saving the costs associated with printing and mailing
Motivating members to complete the survey may be necessary
Job Analysis Survey
Statistical analysis of results must include elimination of outliers and respondents with personal agendas
A final technical report with the data analysis must be produced
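To make the outlier-elimination step above concrete, the sketch below flags respondents whose overall rating pattern sits far from the rest of the sample. It is only an illustration, assuming a respondents-by-tasks rating matrix; the z-score cutoff and the profile-correlation check are arbitrary choices, not a prescribed procedure.

import numpy as np

def flag_outlier_respondents(ratings, z_cut=3.0):
    """ratings: respondents x tasks matrix of JAS scale ratings.
    Flags respondents whose mean rating is extreme, or whose ratings run
    counter to the group's mean task profile."""
    mean_profile = ratings.mean(axis=0)                    # average rating per task
    person_means = ratings.mean(axis=1)                    # each respondent's average rating
    z = (person_means - person_means.mean()) / person_means.std(ddof=1)
    fit = np.array([np.corrcoef(r, mean_profile)[0, 1] for r in ratings])
    return (np.abs(z) > z_cut) | (fit < 0)                 # True = review before analysis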
Blueprint for Examination
An examination based on a JAS provides the foundation for the programme’s content validity
The data from the JAS on the tasks and KSAs critical to effective performance are used to create the examination blueprint
Subject Matter Experts review the blueprint to confirm the results of the data analysis
Item Development
Items must fit the test blueprint and be properly referenced
Principles of item writing must be followed, and the writers trained to create items that will properly discriminate at an entry level
The writers must be demographically representative of practitioners
Item Development
Item editing is completed by a team of Subject Matter Experts (SMEs) for content review and verification of accuracy
Items are translated into a second language at this point if required
Items should be pre-tested with large enough samples
Examination Psychometrics
Options:
Computer adaptive model
Paper and pencil model with item response theory (IRT) and pre-testing
Equipercentile equating using an embedded set of items on every form for equating and establishing a pass score
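As a rough illustration of the equipercentile option above, the sketch below maps each raw score on a new form to the reference-form score with the same percentile rank. It is a single-group, unsmoothed version for illustration only; an operational programme using embedded common items would add a common-item (NEAT) adjustment and smoothing.

import numpy as np

def equipercentile_table(new_form_scores, ref_form_scores):
    """Map each attainable raw score on the new form to the reference-form
    score with the same mid-percentile rank (no smoothing)."""
    points = np.arange(0, new_form_scores.max() + 1)
    pr = np.array([(np.mean(new_form_scores < x) +
                    np.mean(new_form_scores <= x)) / 2 for x in points])
    equated = np.quantile(ref_form_scores, np.clip(pr, 0, 1))
    return dict(zip(points, np.round(equated, 2)))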
Test Development
The relationship between test specifications and content must be logical and defensible
Test questions are linked to the blueprint, which is linked to the JAS
Exam materials must be secure
Test Development
Elements of test development differ depending on the model you are using
Generally, develop a test form ensuring:
Items selected meet statistical requirements
Items match the blueprint
No item cues another item
No repetition of the same items
Cut Scores
Use an approved method to establish minimal competence standards required to pass the examination
This establishes the cut score (pass level)
Cut Scores
One method is the modified Angoff, in which an SME panel makes judgements about the minimally competent candidate’s ability to answer each item correctly
This method is frequently used by testing programmes and is relatively quick to complete
Cut Scores
The SMEs provide an estimate of the proportion of minimally competent candidates who would respond correctly to each item
This process is completed for all items and an average rating established for each item
Individual item rating data are analyzed to establish the passing score
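A minimal sketch of the arithmetic just described, with a small made-up ratings matrix: the item ratings are averaged across judges and the averages are summed to give the raw passing score. Operationally the ratings would come from a trained SME panel, usually over several rounds of discussion.

import numpy as np

# ratings[judge, item] = estimated proportion of minimally competent
# candidates who would answer the item correctly (hypothetical values)
ratings = np.array([
    [0.70, 0.55, 0.90, 0.60],
    [0.65, 0.60, 0.85, 0.70],
    [0.75, 0.50, 0.95, 0.65],
])

item_means = ratings.mean(axis=0)   # average rating per item
raw_cut = item_means.sum()          # expected raw score of a minimally competent candidate
print(item_means, round(raw_cut, 2))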
Scoring
Scoring must be correct in all aspects:
Scanning
Error checks
Proper key
Quality control
Reporting
Scoring/Analysis
Test item analysis on item difficulty and item discrimination must be conducted
Adopt a model of scoring appropriate for your exam (IRT, equipercentile equating)
You must ensure that passing scores are fair and consistent, eliminating the impact of varying difficulty among forms
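The difficulty and discrimination statistics mentioned above can be computed straight from a scored response matrix. A minimal sketch, assuming dichotomous (0/1) scoring and using the corrected item-total correlation as the discrimination index.

import numpy as np

def item_stats(responses):
    """responses: candidates x items matrix of 0/1 item scores."""
    difficulty = responses.mean(axis=0)              # proportion correct (p-value) per item
    totals = responses.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(responses[:, i], totals - responses[:, i])[0, 1]
        for i in range(responses.shape[1])
    ])                                               # corrected item-total correlation
    return difficulty, discrimination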
Scoring
Adopting a scaled score for reporting results to candidates may be beneficial
Scaling scores facilitates the reporting of any shifts in the passing point due to the ease or difficulty of a form
Cut scores may vary depending on the test form, so scaling enables reporting on a common scale
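One simple way to report on a common scale is a linear transformation anchored at the cut score, as sketched below. The scaled values (450 for the passing point, 600 for the maximum) are arbitrary illustrative choices, not values from the presentation.

def to_scaled(raw, raw_cut, raw_max, scaled_cut=450.0, scaled_max=600.0):
    """Linearly rescale raw scores so every form's passing score reports as
    scaled_cut and the maximum raw score reports as scaled_max."""
    slope = (scaled_max - scaled_cut) / (raw_max - raw_cut)
    return scaled_cut + slope * (raw - raw_cut)

# A harder form (cut 112/200) and an easier form (cut 118/200) both report
# their passing point as 450 on the common scale.
print(to_scaled(112, raw_cut=112, raw_max=200), to_scaled(118, raw_cut=118, raw_max=200))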
Security
For all aspects of the work related to examinations, proper security procedures must be followed, including:
Passwords and password maintenance
Programme software security
Back-ups
Encryption for email transmissions
Confidentiality agreements
Security
Exam administration security must include:
Exam materials locked in fireproof vaults
Security of delivery of exam materials
Diligence in dealing with changes in technology if computer delivery of the exam is used
Presentation Follow-up
Please pick up a handout from this presentation
AND/OR
Presentation materials will be posted on CLEAR’s website
Defending Your
Licensing Examination
Program
With Data
Robert C. Shaw, Jr., PhD
Presented at CLEAR’s 23rd Annual Conference
Toronto, Ontario September, 2003
The Defense Triangle
[Figure: the defense triangle; labels shown include “Test Score Use” and “Content”]
Content
Standard 14.14 (1999) – “The content domain to be covered by a credentialing test should be defined clearly and justified in terms of the importance of the content . . .”
We typically evaluate tasks along an importance dimension, or a significance dimension that incorporates importance and frequency, and along an extent dimension
Content
Task importance/significance scale
points
4. Extremely
3. Very
2. Moderately
1. Not
Task extent scale point
0. Never Performed
Content
We require each task to independently surpass importance/significance and extent exclusion rules
We do not composite task ratings
We are concerned about diluting tests with relatively trivial content (high extent, low importance) or including content that may be unfair to test (low extent, high importance)
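A minimal sketch of independent exclusion rules of the kind described above: a task is retained only if it separately clears an importance/significance threshold and an extent threshold, with no compositing of the two ratings. The specific cutoff values are assumptions for illustration.

import numpy as np

def retain_tasks(importance_means, extent_means,
                 min_importance=2.0, min_extent=0.5):
    """Each task must independently pass both rules; the two ratings are
    never combined into a single composite index."""
    passes_importance = importance_means >= min_importance   # at least 'Moderately' on average
    passes_extent = extent_means >= min_extent                # performed often enough
    return passes_importance & passes_extent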
Content
Selecting a subset of tasks and labeling them critical is only defensible when the original list was reasonably complete
We typically ask task inventory respondents how adequately the task list covered the job (completely, adequately, inadequately)
We then calculate the percentages of respondents who selected each option
Content
Evaluate task rating consistency
Were the people consistent? Intraclass correlation
Were tasks consistently rated within each content domain? Coefficient alpha
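For the within-domain consistency check above, coefficient alpha can be computed directly from the respondent-by-task rating matrix for one content domain; a minimal sketch follows. The between-respondent intraclass correlation rests on the same variance-partitioning idea but uses an ANOVA-based formula.

import numpy as np

def coefficient_alpha(ratings):
    """ratings: respondents x tasks matrix for a single content domain."""
    k = ratings.shape[1]                                  # number of tasks
    sum_task_var = ratings.var(axis=0, ddof=1).sum()      # sum of per-task variances
    total_var = ratings.sum(axis=1).var(ddof=1)           # variance of respondent totals
    return (k / (k - 1)) * (1 - sum_task_var / total_var)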
Content
Reliability (consistency) of task ratings, by content section

Content Section       N of Tasks   N of Respondents*   Between Respondents   Between Tasks
                                                       (Intraclass)          (Coefficient Alpha)
I.                    49           252                 .95                   .96
II.                   26           278                 .94                   .94
III.                  33           261                 .95                   .95
IV.                   33           259                 .95                   .94
Total Tasks           141
Weighted Grand Mean                                    .95                   .95

*Only those who responded to every task in each section with a rating of 0 to 4 were included for these analyses.
Content
We typically ask task inventory respondents in what percentages they would allocate items across content areas, to lend support to the structure of the outline
I encourage a task force to explicitly follow these results or follow the rank order
Because items are specified according to the outline, we feel these results demonstrate broader support for test specifications beyond the task force
Content
What percentage of items would you allocate to each content area?

Content Section   Mean %   Minimum   Maximum
I.                22.8     10        55
II.               25.8     5         50
III.              28.9     10        50
IV.               22.5     10        45
Total             100.0
Reliability
Test scores lack utility until one can show the measurement scale is reasonably precise
Reliability
Test score precision is often expressed in terms of:
Kuder-Richardson Formula 20 (KR 20) when items are dichotomously (i.e., 0 or 1) scored
Coefficient Alpha when items are scored on a broader scale (e.g., 0 to 5)
Standard Error of Measurement
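A minimal sketch of two of the statistics named above for dichotomously scored items: KR-20 from the item p-values and the total-score variance, and the standard error of measurement derived from it.

import numpy as np

def kr20_and_sem(responses):
    """responses: candidates x items matrix of 0/1 scores."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                        # item difficulty (proportion correct)
    var_total = responses.sum(axis=1).var(ddof=1)     # variance of total scores
    kr20 = (k / (k - 1)) * (1 - np.sum(p * (1 - p)) / var_total)
    sem = np.sqrt(var_total) * np.sqrt(1 - kr20)      # score SD x sqrt(1 - reliability)
    return kr20, sem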
Reliability
Standard 14.15 (1999) – “Estimates of the reliability of test-based credentialing decisions should be provided.”
“Comment: . . . Other types of reliability estimates and associated standard errors of measurement may also be useful, but the reliability of the decision of whether or not to certify is of primary importance”
Decision Consistency Index
[Figure: decision consistency shown as the agreement between a first attempt and a theoretic second attempt, linked to the test’s reliability]
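A rough, model-free sketch of the decision-consistency idea in the figure above: split the test into two half-forms as a stand-in for a theoretic second attempt, classify each candidate pass/fail on each half at a proportionally reduced cut, and report the proportion classified the same way both times. Operational programmes typically use model-based estimates (e.g., Livingston-Lewis or Subkoviak) rather than this simplification.

import numpy as np

def split_half_decision_consistency(responses, raw_cut):
    """responses: candidates x items matrix of 0/1 scores; raw_cut is the
    full-test passing score."""
    half_a, half_b = responses[:, 0::2], responses[:, 1::2]
    cut_a = raw_cut * half_a.shape[1] / responses.shape[1]
    cut_b = raw_cut * half_b.shape[1] / responses.shape[1]
    pass_a = half_a.sum(axis=1) >= cut_a
    pass_b = half_b.sum(axis=1) >= cut_b
    return np.mean(pass_a == pass_b)      # proportion of consistent pass/fail decisions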
Criterion
The criterion to which test scores are related can be represented by two planks:
Criterion-Related Study
Minimal Competence Expectation
Criterion
Most programs rely on the minimal competence criterion expressed in a passing point study
Criterion
Judges’ expectations are expressed through:
text describing minimally competent practitioners
item difficulty ratings
We calculate an intraclass correlation to focus on the consistency with which judges gave ratings
We find confidence intervals around the mean rating
Criterion
We use the discrimination value to look for aberrant behavior from judges

Judge       Rating Sum   SD      Disc
1           73.93        11.21   0.81
2           71.67        9.19    0.78
3           72.73        9.46    0.79
4           76.93        9.95    0.85
5           76.73        10.81   0.69
6           72.68        10.24   0.74
7           76.10        8.71    0.78
8 (LOW)     69.69        10.35   0.73
9           74.25        12.01   0.75
10          77.20        7.97    0.80
11 (HIGH)   80.83        9.60    0.66
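One plausible way to compute a judge "discrimination" value like the one tabled above is to correlate each judge's item ratings with the mean rating of the remaining judges, so a low value flags a judge whose pattern departs from the panel. Whether this matches the exact statistic behind the table is an assumption; the sketch only illustrates the screening idea.

import numpy as np

def judge_discrimination(ratings):
    """ratings: judges x items matrix of standard-setting ratings.
    Returns, per judge, the correlation between that judge's ratings and
    the mean of the other judges' ratings."""
    disc = []
    for j in range(ratings.shape[0]):
        others = np.delete(ratings, j, axis=0).mean(axis=0)
        disc.append(np.corrcoef(ratings[j], others)[0, 1])
    return np.array(disc)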
Criterion
(Callouts on the slide mark the mean of judges’ ratings and the selected passing score within this table.)

Raw Cut Score   Decision Consistency   Pass %
122             0.821                  49.1
121             0.814                  50.9
120             0.811                  52.7
119             0.810                  54.5
118             0.814                  63.6
117             0.820                  65.5
116             0.829                  67.3
115             0.840                  70.9
114             0.851                  74.5
113             0.863                  78.2
112             0.875                  80.0
111             0.886                  81.8
110             0.897                  81.8
109             0.906                  81.8
108             0.915                  83.6
107             0.923                  83.6
106             0.930                  85.5
105             0.936                  85.5
104             0.941                  90.9
103             0.946                  92.7
Criterion
One of my clients was sued in 1975
In spite of evidence linking test content to a 1973 role delineation study, the court would not dismiss the case
Issues that required defense were:
discrimination or adverse impact from test score use
job-relatedness of test scores
Criterion
Only after a criterion-related validation study was conducted was the suit settled
Criterion
Theoretic model of these studies
[Figure: a supervisor rating inventory and the test, both built on critical content, linked by the correlation of ratings and test scores]
Criterion
Test Bias Study
Compare regression lines predicting job performance from test scores for focal and comparator groups
There are statistical procedures available to determine whether slopes and intercepts significantly differ
Differences in mean scores are not necessarily a critical indicator
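A minimal sketch of the regression comparison described above, using simulated data: regressing job performance on test score, group, and their interaction, where the group term tests for an intercept difference and the interaction term tests for a slope difference. The variable names and generated data are illustrative assumptions, not the client study.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
group = np.repeat(["focal", "comparator"], n // 2)
test_score = rng.normal(100, 15, n)
job_rating = 0.05 * test_score + rng.normal(0, 1.0, n)   # same relationship in both groups

df = pd.DataFrame({"job_rating": job_rating, "test_score": test_score, "group": group})

# C(group) tests the intercept difference; the interaction tests the slope difference
model = smf.ols("job_rating ~ test_score * C(group)", data=df).fit()
print(model.summary())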
The Defense Triangle
[Figure: the defense triangle repeated as a closing recap; labels shown include “Test Score Use” and “Content”]
Presentation Follow-up
Presentation materials will be posted on
CLEAR’s website
[email protected]
Defending Your Program:
Strengthening Validity in
Existing Examinations
Ron Rodgers, Ph.D.
Director of Measurement Services
Continental Testing Services (CTS)
Presented at CLEAR’s 23rd Annual Conference
Toronto, Ontario September, 2003
What Can Go Wrong?
1. Job/practice analysis & test specs
2. Item development & documentation
3. Test assembly procedures & controls
4. Candidate information: before & after
5. Scoring accuracy & item revalidation
6. Suspected cheating & candidate appeals
7. Practical exam procedures & scoring
Job Analysis & Test Specs
Undocumented (or no) job analysis
Embedded test specifications
Unrepresentative populations for job analysis or pilot testing
Misuse of “trial forms” and data to support “live” examinations
Item Development
Do item authors and reviewers sign and understand non-disclosure agreements?
How does each question reflect job analysis results and test specifications?
Should qualified candidates be able to answer Qs correctly with information available during the examination?
Item Development
Do any questions offer cues that answer other questions on an exam?
Do item patterns offer cues to marginally qualified, test-savvy candidates?
Is the longest answer always correct?
If None of the above or All of the above Qs are used, are these always correct?
Do True-False questions follow clear patterns?
Do other detectable patterns cue answers?
Item Documentation
Are all Qs supported by references cited for and available to all candidates?
Do any questions cite item authors or committee members as “references”?
Are page references cited for each Q?
Are citations updated as new editions of each reference are published?
Candidate Information
Are all references identified to and equally available to all candidates?
Are content outlines for each test provided to help candidates prepare?
Are sample Qs given to all candidates?
Are candidates told what they must/may bring and use during the examination?
Test Assembly Controls
Are parallel forms assembled to be of approximately equal difficulty?
Is the answer key properly balanced?
Approx. equal numbers of each option
Limit consecutive Qs with the same answer
Avoid repeated patterns of responses
Avoid long series of Qs without an option
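A minimal sketch of automated checks for the key-balancing points above: count how often each option is keyed and find the longest run of consecutive questions sharing the same keyed answer. The option set and run-length threshold are illustrative assumptions.

from collections import Counter

def check_key_balance(key, options="ABCD", max_run=3):
    """key: keyed answers in presentation order, e.g. ['A', 'C', 'B', ...]."""
    counts = Counter(key)
    print("Keyed option counts:", {o: counts.get(o, 0) for o in options})
    longest, run = 1, 1
    for prev, cur in zip(key, key[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest > max_run:
        print(f"Warning: {longest} consecutive questions share the same keyed answer")

check_key_balance(list("AABBBBCDACBD"))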
Suspected Cheating
Is potential cheating behavior at the test site clearly defined for onsite staff?
Are candidates informed of possible consequences of suspected cheating?
Are staff trained to respond fairly and appropriately to suspected cheating?
Are procedures in place to help staff document/report suspected cheating?
Scoring Controls
How is accuracy of the answer key verified?
Do item analyses show any anomalies in candidate performance on the test?
Are oddly performing Qs revalidated?
Identify ambiguities in sources or Qs
Verify that each Q has one right answer
Give credit to all candidates when needed
Are scoring adjustments applied fairly?
Are rescores/refunds issued as needed?
Candidate Appeals
How do candidates request rescoring?
Do policies allow cancellation of scores when organized cheating is found?
Harvested Qs on websites, in print
Are appeal procedures available?
Are appeal procedures explained?
How is test security protected during candidate appeal procedures?
Practical Examinations
Is the test uniform for all candidates?
Is the passing score defensible?
Are scoring controls in place to limit bias for or against individual candidates?
Are scoring criteria well-documented?
Are judges well-trained to apply scoring criteria consistently?
Are scoring judgments easy to record?
How are marginal scores resolved?
Presentation Follow-up
Please pick up a handout from this presentation
AND/OR
Presentation materials will be posted on CLEAR’s website