Establishing Passing Standards Without Gambling

Photo by Mel Curtis. Monitor on Psychology, September 2002.
What is a Standard?
Cut Score = Test score a candidate must attain to pass
Raw Score - for example, 36 correct
Percent Correct - 36/50 = 72%
Scaled Score - for example, 300
Standard = The level of knowledge or proficiency a
candidate must demonstrate to pass
Why Do We Need
a Formal Procedure?
70%?
• Defensibility
• Fairness
• Validity
Important Factors in
Setting a Standard
• Subject Matter Experts
• A Representative Committee
• Ample Time
• Actual Test Items (Questions)
• Definition of the Level of Proficiency
Distinguishing Qualified and Unqualified
Candidates
Some Common Methods
of Standard Setting
• Angoff
• Item Mapping
• Bookmark
• Contrasting Groups
The (Modified) Angoff
Method
Subject Matter Experts define the minimally
competent (borderline) candidate in terms of
knowledge.
They evaluate every item in the test and estimate
this candidate’s chances of answering correctly.
The mean estimate across all experts and all items
determines the passing score.
Steps in a Typical
Angoff Session
Take the Test!
Define the "Just Sufficiently Qualified"
(JSQ) or Minimally Competent (MC)
Candidate
What is the Probability the JSQ Candidate
Will Answer an Item Correctly?
Steps in Angoff,
continued
Determine Probabilities in
Groups of 10 Items - Discuss
Change Probability Estimates if Desired
Add Mean Ratings for All Items to
Calculate Cut Score
Evaluate: Is This Cut Reasonable?
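The arithmetic behind the last two steps can be sketched in a few lines. This is a minimal illustration, not the presenters' procedure: the function name `angoff_cut_score` and the three-expert, five-item ratings are hypothetical, chosen only to show how per-item mean ratings add up to a raw cut score.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return (raw_cut, proportion_correct) from Angoff ratings.

    ratings[e][i] is expert e's estimate of the probability that the
    just-sufficiently-qualified (JSQ) candidate answers item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(expert) != n_items for expert in ratings):
        raise ValueError("every expert must rate every item")
    # Mean rating per item, averaged across experts.
    item_means = [mean(expert[i] for expert in ratings) for i in range(n_items)]
    # Adding the item means gives the expected raw score of the JSQ candidate.
    raw_cut = sum(item_means)
    return raw_cut, raw_cut / n_items


# Hypothetical ratings: 3 experts x 5 items (illustrative values only).
ratings = [
    [0.80, 0.60, 0.70, 0.90, 0.50],
    [0.70, 0.60, 0.80, 0.90, 0.60],
    [0.90, 0.60, 0.60, 0.90, 0.40],
]
raw_cut, pct = angoff_cut_score(ratings)
print(f"Raw cut score: {raw_cut:.2f} of {len(ratings[0])} items ({pct:.0%})")
```

In practice the raw value would be rounded to a whole number of items, and the committee would still ask whether the resulting cut is reasonable.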
Advantages of the
Angoff Method
• A relatively straightforward process
• No data necessary
• Has held up in court
Disadvantages of the
Angoff Method
• Must look at every item on test(s)
• Time and cost
• Fatigue, inattention, “rushing”
• Difficulty of accurately estimating
probabilities
Item Mapping
A graphical method of determining the
level of competence necessary for licensure
Item Mapping Process
Administer items to a pilot group.
Collect statistics, including the difficulty of each item.
Group items by difficulty. Display in a graph.
Subject Matter Experts define the minimally
competent (borderline) candidate.
SMEs evaluate a sample of items: Does the borderline
candidate have at least a 50% chance of answering
correctly?
Evaluate: Is this cut reasonable?
Rasch Model
if
Candidate Ability = Item Difficulty
then
Chance of a correct answer = 50%
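The Rasch relationship above, and the way it turns SME judgments into a cut, can be sketched as follows. The logistic form of `rasch_p_correct` is the standard Rasch model with ability and difficulty on the same scale. The function `item_mapping_cut`, its majority-vote rule, and the votes themselves are assumptions made for illustration; the slides do not say how SME judgments are combined.

```python
import math


def rasch_p_correct(ability, difficulty):
    """Rasch model: probability of a correct answer.

    Ability and difficulty are on the same (logit) scale; when they are
    equal the probability is exactly 0.5.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_mapping_cut(votes_by_difficulty):
    """Hypothetical rule: walk up the difficulty columns and return the
    hardest column at which a majority of SMEs still say the borderline
    candidate has at least a 50% chance of answering correctly.
    """
    cut = None
    for difficulty in sorted(votes_by_difficulty):
        votes = votes_by_difficulty[difficulty]
        if sum(votes) * 2 > len(votes):  # strict majority says "yes"
            cut = difficulty
        else:
            break
    return cut


# Ability equal to difficulty gives a 50% chance.
p_equal = rasch_p_correct(1.2, 1.2)

# Hypothetical SME votes per difficulty column (True = "at least 50%").
votes = {
    95: [True, True, True],
    100: [True, True, True],
    105: [True, True, False],
    110: [False, True, False],
    115: [False, False, False],
}
cut_difficulty = item_mapping_cut(votes)
print(p_equal, cut_difficulty)
```

Because a 50% chance corresponds to ability equal to difficulty, the difficulty level where the judgments flip can be read as the borderline candidate's ability, and therefore as the cut.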
Item Mapping for NCBTMB - Therapeutic Massage and Bodywork
[Figure: item map. Item ID numbers are stacked in columns by difficulty. Horizontal axis: Difficulty, from easier items (80) to harder items (125), in steps of 5.]
Candidate Ability Distribution
[Figure: distribution of candidate ability. Horizontal axis: Scaled Ability (85.5 to 120.5). Vertical axis: number of candidates (0 to 300). Labeled values: 99.8%, 97.0%, 84.0%, 46.0%.]
Item Mapping
Advantages
• Sound statistical basis
• More discussion (no “rushing”)
• Portrait of the borderline candidate
• Multiple forms cut simultaneously
• Time
Disadvantages
• Less straightforward (Rasch model)
• Requires empirical data
Bookmark Method
Conceptually Similar to Item Mapping
Administer items to a pilot group.
Collect statistics, including the difficulty of each item.
Order items by difficulty. Display in a booklet.
Subject Matter Experts define the minimally
competent (borderline) candidate.
Bookmark Method,
continued
SMEs review items and place a bookmark between
items the minimally acceptable candidate is
likely to answer correctly and items this
candidate is unlikely to answer correctly.
Discuss and repeat the process, aiming for agreement.
Evaluate: Is this cut reasonable?
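One way to turn a set of bookmark placements into a single cut is sketched below. The slides do not specify how placements are combined, so taking the median is an assumption, as are the function name `bookmark_cut` and all of the values used.

```python
from statistics import median_low


def bookmark_cut(ordered_difficulties, bookmarks):
    """Combine SME bookmark placements into one cut (hypothetical rule).

    ordered_difficulties: item difficulties in booklet order (easy to hard).
    bookmarks: for each SME, the number of items placed before the bookmark,
               i.e. items the borderline candidate is likely to get right.
    Returns (position, difficulty of the last item before the bookmark).
    """
    if list(ordered_difficulties) != sorted(ordered_difficulties):
        raise ValueError("booklet must be ordered from easiest to hardest")
    # median_low always returns one of the actual placements.
    position = median_low(bookmarks)
    if not 1 <= position <= len(ordered_difficulties):
        raise ValueError("bookmark position outside the booklet")
    return position, ordered_difficulties[position - 1]


# Hypothetical booklet of 8 items and placements from 5 SMEs.
difficulties = [85, 90, 95, 100, 105, 110, 115, 120]
placements = [4, 5, 5, 6, 5]
position, cut_difficulty = bookmark_cut(difficulties, placements)
print(position, cut_difficulty)
```

A wide spread in placements would be a signal to discuss and repeat the round before accepting the cut.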
Bookmark Method
Advantages
• More discussion (no “rushing”)
• Portrait of the borderline candidate
• More focus on item content over
entire exam
• High level of face validity
Disadvantages
• Tends to be time-consuming
• Requires empirical data
Contrasting Groups
Administer items to a pilot group.
Subject Matter Experts classify each candidate
as qualified or unqualified based on other data.
Score the exam and order candidate IDs by score.
Find a score or a narrow range of scores for which
approximately half of the candidates have been
labeled unqualified.
Contrasting Groups
Number of Candidates

Score    Qualified    Unqualified    Percent Qualified
46-50        5             0               100
41-45       14             1                93
36-40       25             7                78
31-35       22            10                69
26-30       17            12                59
21-25       11            12                49
16-20        4            12                33
11-15        0             6                 0
0-10         0             1                 0
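The search for the score range where about half of the candidates are labeled unqualified can be sketched with the counts from the slide above. The function name `contrasting_groups_cut` and the "closest to 50%" rule are assumptions for illustration. Note that the counts as transcribed give about 48% and 25% qualified for the 21-25 and 16-20 bands, which differ from the 49 and 33 printed on the slide; the sketch works from the counts.

```python
bands = ["46-50", "41-45", "36-40", "31-35", "26-30",
         "21-25", "16-20", "11-15", "0-10"]
qualified = [5, 14, 25, 22, 17, 11, 4, 0, 0]
unqualified = [0, 1, 7, 10, 12, 12, 12, 6, 1]


def contrasting_groups_cut(bands, qualified, unqualified):
    """Return the score band whose share of unqualified candidates is
    closest to 50% (hypothetical selection rule), with that share."""
    best_band, best_share, best_gap = None, None, None
    for band, q, u in zip(bands, qualified, unqualified):
        total = q + u
        if total == 0:
            continue  # no candidates in this band, nothing to compare
        share_unqualified = 100.0 * u / total
        gap = abs(share_unqualified - 50.0)
        if best_gap is None or gap < best_gap:
            best_band, best_share, best_gap = band, share_unqualified, gap
    return best_band, best_share


cut_band, share = contrasting_groups_cut(bands, qualified, unqualified)
print(f"Cut falls in score band {cut_band} "
      f"({share:.0f}% of candidates labeled unqualified)")
```

With these counts the cut lands in the 21-25 band, where 12 of 23 candidates are labeled unqualified.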
Contrasting Groups
Not widely used in licensure testing
― Subjectivity of judgments (Q or UnQ)
― Connection to job is less direct
― Often not feasible to get judgments
?