~ Test Construction and Validation ~
Fundamental Points and Practices
Stephen J. Vodanovich, Ph.D.
~ Identifying The Item Domain ~
[a.k.a. Where do the questions come from?]
Test Item Domain
• Specific, defined content area (e.g., course exam, training program)
• Expert opinion, observation (e.g., professional literature)
• Job analysis (identification of major job tasks, duties)
Job Analysis Overview
Job (or Job Category)
Task Identification: Task 1, Task 2, Task 3, Task 4
KSA Identification: KSA 1, KSA 2, KSA 3, KSA 4
• Rate Tasks and KSAs
• Connect KSAs to Tasks
~ Sample Task Rating Form ~
Tasks 1-7 are each rated on five scales:
Frequency of use
5 = almost all of the time
4 = frequently
3 = occasionally
2 = seldom
1 = not performed at all
Importance of performing successfully
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance
Importance for new hire
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance
Distinguishes between superior & adequate performance
5 = a great deal
4 = considerably
3 = moderately
2 = slightly
1 = not at all
Damage if error occurs
5 = extreme damage
4 = considerable damage
3 = moderate damage
2 = very little damage
1 = virtually no damage
~ Sample KSA Rating Form ~
KSAs A-G are each rated on three scales:
Importance for acceptable job performance
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance
Importance for new hire
5 = extremely important
4 = very important
3 = moderately important
2 = slightly important
1 = of no importance
Distinguishes between superior & adequate performance
5 = a great deal
4 = considerably
3 = moderately
2 = slightly
1 = not at all
Sample Task -- KSA Matrix
To what extent is each KSA needed when performing each job task?
5 = Extremely necessary, the job task cannot be performed without the KSA
4 = Very necessary, the KSA is very helpful when performing the job task
3 = Moderately necessary, the KSA is moderately helpful when performing the job task
2 = Slightly necessary, the KSA is slightly helpful when performing the job task
1 = Not necessary, the KSA is not used when performing the job task
[Matrix: KSAs A-H crossed with Job Tasks 1-7; each cell holds a rating from 1 to 5]
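One way to summarize a completed Task -- KSA matrix is to average each cell across raters and then list the tasks to which each KSA is strongly linked. The sketch below assumes hypothetical ratings from three SMEs for two KSAs and two tasks. The function names and the cutoff of 4.0 ("very necessary") are illustrative assumptions, not part of the original form.

```python
from statistics import mean

# Hypothetical data: one dict per SME, ratings[sme][ksa][task] on the 1-5 scale.
ratings = [
    {"A": {1: 5, 2: 2}, "B": {1: 3, 2: 4}},
    {"A": {1: 4, 2: 1}, "B": {1: 3, 2: 5}},
    {"A": {1: 5, 2: 2}, "B": {1: 2, 2: 4}},
]

def mean_linkage(ratings):
    """Average each KSA-by-task cell across all SMEs."""
    first = ratings[0]
    return {
        ksa: {task: mean(r[ksa][task] for r in ratings) for task in first[ksa]}
        for ksa in first
    }

def linked_tasks(matrix, cutoff=4.0):
    """For each KSA, list the tasks whose mean rating meets the assumed cutoff."""
    return {
        ksa: [task for task, m in row.items() if m >= cutoff]
        for ksa, row in matrix.items()
    }

matrix = mean_linkage(ratings)
links = linked_tasks(matrix)
print(links)
```

With these made-up ratings, KSA A is linked to Task 1 and KSA B to Task 2. A different cutoff, or a median instead of a mean, would be equally defensible choices.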
~ Writing Test Items ~
• Write a lot of questions
• Write more questions for the most critical KSAs
• Consider the reading level of the test takers
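As a rough illustration of writing more questions for the most critical KSAs, the sketch below splits a fixed number of draft items across KSAs in proportion to their mean importance ratings. The importance values, the item total, and the function name are hypothetical. Proportional allocation with largest-remainder rounding is only one reasonable rule, not one prescribed by the slides.

```python
def allocate_items(importance, total_items):
    """Split total_items across KSAs in proportion to their importance ratings."""
    total = sum(importance.values())
    exact = {k: total_items * v / total for k, v in importance.items()}
    counts = {k: int(x) for k, x in exact.items()}  # round every share down first
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the KSAs with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

# Hypothetical mean importance ratings (1-5 scale) for three KSAs.
importance = {"A": 4.5, "B": 3.0, "C": 2.5}
plan = allocate_items(importance, 25)
print(plan)
```

Here KSA A, the most important, receives 11 of the 25 items, B receives 8, and C receives 6.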
~ Selecting Test Items ~
• Initial review by Subject Matter Experts (SMEs)
• Connect items to KSAs
• Assess difficulty of items relative to job requirements
• Suggest revisions to items and answers
Sample Item Rating Form
• Connect each item to a KSA or two
• Rate the difficulty of each item (5-point scale) relative to the level of KSA needed in the job
~ Statistical Properties of Items ~
• Item difficulty levels. The goal is to keep items of moderate difficulty (e.g., p-values between .40 and .60).
The “p-value” is the percentage of people getting each item correct.
[Figure: distribution curve with the horizontal axis marked from -4 to +4 around the mean]
RELIABILITY ANALYSIS - SCALE (ALL)
Item    Mean     Std Dev   Cases
Q1      .7167    .4525     120.0
Q2      .7583    .4299     120.0
Q3      .8167    .3886     120.0
Q4      .9333    .2505     120.0
Q5      .9583    .2007     120.0
Q6      .9000    .3013     120.0
Q7      .6333    .4839     120.0
Q8      .8750    .3321     120.0
Q9      .8000    .4017     120.0
Q10     .6167    .4882     120.0
Q11     .9750    .1568     120.0
Q12     .8083    .3953     120.0
Q13     .7583    .4299     120.0
Q14     .5083    .5020     120.0
Answers are scored as correct (“1”) or wrong (“0”), so the mean of each item is its p-value (the difficulty level, or % of people getting that item correct).
Slide callouts: Easy items; Acceptable items
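Because items are scored 0/1, the p-value and standard deviation of an item follow directly from the number of people answering correctly. The sketch below is a minimal illustration; the function names and the toy response data are assumptions. The .40 to .60 band is the "moderate difficulty" guideline stated earlier in this section.

```python
from math import sqrt

def item_p_values(responses):
    """responses: one list of 0/1 item scores per person; returns p per item."""
    n = len(responses)
    k = len(responses[0])
    return [sum(person[i] for person in responses) / n for i in range(k)]

def binary_sd(p, n):
    """Sample standard deviation (n - 1 denominator) of a 0/1 item."""
    return sqrt(p * (1 - p) * n / (n - 1))

def classify(p, low=0.40, high=0.60):
    """Label an item relative to the moderate-difficulty band."""
    if p > high:
        return "easy"
    if p < low:
        return "hard"
    return "moderate"

# Toy data (hypothetical): 4 people, 3 items.
toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]
p_values = item_p_values(toy)

# 86 of 120 correct reproduces the Q1 row; 61 of 120 reproduces the Q14 row.
q1_p, q14_p = 86 / 120, 61 / 120
q1_sd, q14_sd = binary_sd(q1_p, 120), binary_sd(q14_p, 120)
print(round(q1_p, 4), round(q1_sd, 4), classify(q1_p))
print(round(q14_p, 4), round(q14_sd, 4), classify(q14_p))
```

Under these assumptions Q1 (p = .7167) falls above the band and is labeled easy, while Q14 (p = .5083) falls inside it.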
~ Statistical Properties of Items (cont.) ~
Internal Consistency
• Item correlations with each other. The goal is to select items that relate moderately to each other or “hang together” reasonably well (e.g., item-total score correlations between .40 and .60, plus “alpha if item deleted” information).
~ Item-Total Statistics ~
Alpha = .8374
Item    Scale mean if    Scale variance if    Corrected item-total    Alpha if
        item deleted     item deleted         correlation             item deleted
Q1      43.3750          67.0599              .2285                   .8356
Q2      43.3333          67.7031              .1513                   .8370
Q3      43.2750          66.5708              .3527                   .8335
Q4      43.1583          67.7814              .2700                   .8354
Q5      43.1333          68.6711              .0741                   .8374
Q6      43.1917          68.8117              .0111                   .8385
Q7      43.4583          65.8302              .3685                   .8327
Q8      43.2167          67.0283              .3346                   .8341
Q9      43.2917          65.9562              .4353                   .8319
Q10     43.4750          67.4952              .1526                   .8373
Q11     43.1167          68.8938              .0152                   .8378
Q12     43.2833          67.9022              .1381                   .8371
Q13     43.3333          65.9216              .4085                   .8322
Q14     43.5833          65.2871              .4214                   .8315
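The item-total statistics above can be computed from raw item scores. The sketch below is one plain-Python way to do it, assuming the standard Cronbach's alpha formula and a "corrected" item-total correlation that correlates each item with the sum of the remaining items. The four-item data set and all function names are hypothetical and unrelated to the 120 cases reported above.

```python
from statistics import mean, variance

def cronbach_alpha(items):
    """items: list of k columns, each holding the same people's scores."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_total_stats(items):
    """Per-item statistics, each computed with that item left out of the scale."""
    stats = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(scores) for scores in zip(*rest)]
        stats.append({
            "scale_mean_if_deleted": mean(rest_total),
            "scale_variance_if_deleted": variance(rest_total),
            "corrected_item_total_r": pearson(col, rest_total),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return stats

# Hypothetical 0/1 scores for 6 people on 4 items; the last item fits poorly.
items = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
]
alpha = cronbach_alpha(items)
stats = item_total_stats(items)
```

In this toy data the fourth item correlates negatively with the rest of the scale, and alpha rises when it is dropped. The same pattern, in milder form, appears in the table above for Q6: a corrected item-total correlation of .0111 and an alpha if item deleted of .8385, which is above the overall alpha of .8374.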
~ Legal Concerns ~
Kirkland v. Department of Correctional Services (1974)
"Without such an analysis (job analysis) to single out the critical knowledge, skills and abilities required by the job, their importance relative to each other, and the level of proficiency demanded as to each attribute, a test constructor is aiming in the dark and can only hope to achieve job relatedness by blind luck."
• The KSAs tested for must be critical to successful job performance
• Portions of the exam should be accurately weighted to reflect the relative importance to the job of the attributes for which they test
• The level of difficulty of the exam material should match the level of difficulty of the job
Construct Validation
[Figure: multitrait-multimethod matrix layout. Traits A, B, and C are each measured by Method 1 (Paper & Pencil), Method 2 (Clinical Interview), and Method 3 (Peer observation). The three same-method blocks are labeled Mono Method and the three cross-method blocks are labeled Hetero Method.]
[Figure: multitrait-multimethod matrix with values. Traits A (Boredom), B (Dep), and C (Anxiety) are each measured by Method 1 (Paper & Pencil), Method 2 (Clinical Interview), and Method 3 (Peer observation). Cells are labeled as Reliability Figures; Hetero-Trait, Mono-Method; Mono-Trait, Hetero-Method; or Hetero-Trait, Hetero-Method.]
Cell values in extracted order (45 values; row and column positions not recoverable):
.33 .36 .87 .55 .20 .08 .92 .20 .46 .12 .54 .93 .15 .15 .53 .62 .55 .55 .20 .15 .61 .35 .41 .90 .21 .46 .13 .40 .54 .37 .49 .93 .15 .15 .53 .31 .32 .49 .91 .89 .82 .66 .54 .52 .87
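To show how the cell types in a multitrait-multimethod matrix are read, the sketch below sorts each correlation into Mono-Trait; Hetero-Method, Hetero-Trait; Mono-Method, or Hetero-Trait; Hetero-Method, then averages each group. The correlations are hypothetical (two traits, two methods) and are not taken from the slide. The function names are illustrative assumptions.

```python
from statistics import mean

def classify_pair(measure_1, measure_2):
    """Each measure is a (trait, method) tuple; returns the MTMM cell type."""
    (trait_1, method_1), (trait_2, method_2) = measure_1, measure_2
    if trait_1 == trait_2 and method_1 != method_2:
        return "monotrait-heteromethod"    # convergent validity evidence
    if trait_1 != trait_2 and method_1 == method_2:
        return "heterotrait-monomethod"    # shared method variance
    if trait_1 != trait_2 and method_1 != method_2:
        return "heterotrait-heteromethod"  # should be the lowest values
    raise ValueError("same trait and same method: this is a reliability cell")

def summarize(correlations):
    """Average the correlations within each cell type."""
    groups = {}
    for (m1, m2), r in correlations.items():
        groups.setdefault(classify_pair(m1, m2), []).append(r)
    return {cell_type: mean(values) for cell_type, values in groups.items()}

# Hypothetical correlations between (trait, method) measures.
correlations = {
    (("A", "paper"), ("A", "interview")): 0.60,
    (("B", "paper"), ("B", "interview")): 0.65,
    (("A", "paper"), ("B", "paper")): 0.35,
    (("A", "interview"), ("B", "interview")): 0.40,
    (("A", "paper"), ("B", "interview")): 0.20,
    (("B", "paper"), ("A", "interview")): 0.15,
}
summary = summarize(correlations)
```

In this made-up example the same-trait, different-method correlations average highest and the different-trait, different-method correlations average lowest, which is the pattern usually taken as support for construct validity.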