Assessment of Academic Achievement
IN THE NAME OF GOD
TEST CONSTRUCTION
WORKSHOP
J. KOOHPAYEHZADEH, M.D., MPH
Education Development Center
Iran University of Medical Sciences
4/13/2015
TEST CONSTRUCTION Workshop
1
“Tell me, I forget.
Ask me, I remember.
Involve me, I understand.”
Why Test?
Testing is 50% of teaching.
Well-defined educational objectives: a prerequisite for assessment
Example for this session:
At the end of this session, participants will be able to:
Name at least three differences between summative and formative assessment
List at least three written assessment methods (AM)
Name the most effective AM for assessing clinical skills
Describe the most effective AM for assessing attitudes
Evaluating Students: Tests Are Not the Only Way!
Tests
Projects
Performance
Participation
Assessment
How?
Why?
When?
What?
When should we assess?
Summative: at the end of instruction
Formative: during instruction
Pre-test: before instruction
Why do we assess?
1. To encourage learning
2. To inform the student
3. To inform the teacher
4. To improve learning activities
5. To select students
6. To certify
7. To establish readiness for promotion
Why Evaluate Students?
To help students improve
To assess student learning
To determine whether the teacher is teaching effectively
As a motivation tool
To communicate with others, such as parents
What should we assess?
Knowledge
Skills
Attitudes
Who Should Assess?
Faculty
Self
Peers
Tutors
Other team members
Standardized patients, patients
External and internal examiners
Public, society, …
(360° assessment)
Where?
Does: workplace assessment
Shows how: test center / skills lab
Knows how: examination hall
Knows: examination hall
How to use assessment?
Summative: usually undertaken at the end of a training programme; it determines whether the educational objectives have been successfully achieved. In summative assessment (e.g., an exam), students usually receive a grade or a mark.
Formative: testing that is part of the developmental or ongoing teaching/learning process. It should include delivering feedback to the student.
Formative assessment
Feedback, feedback, feedback, feedback, feedback
THANK YOU
ANY QUESTIONS?
Stages of test development
Conceptualization
Construction
Tryout
Item analysis
Revision
Conceptualization
An idea…
Conceptualization
What will it measure?
What is the objective?
Is there a need?
Who will use it?
Etc…
Test Construction Principles
Adequate provision should be made for evaluating all of the teacher's instructional objectives.
The test should reflect the approximate
proportion of emphasis in the course.
Preparing the test
The preliminary draft of the test should
be prepared as early as possible.
As a rule, the test should include more than one type of item.
Preparing the test, continued
The content of the test should range
from very easy to very difficult for the
group being measured.
The items in the test should be
arranged in order of difficulty.
The items should be so phrased that
the content rather than the form of the
statement will determine the answer.
Preparing the test, continued
A regular sequence in the pattern of
response should be avoided.
The directions to the pupils should be
as clear, complete and concise as
possible.
One question should not provide the
answer to another question.
Item Analysis
The process of determining which items are "good"
Tools in item analysis:
Item difficulty index
Item reliability index
Item validity index
Item discrimination index
Characteristics of Assessment Tools
Reliability
If an assessment is repeated with the same trainees, they should get the same results.
Validity
What is it?
The degree to which a measurement instrument truly measures what it is intended to measure
Importance:
If a test does not measure what it is meant to measure, it is useless.
Reliability is a prerequisite for validity but is not sufficient by itself.
Standardization
What is it?
All students are tested on the same items, patients, and tasks, and according to the same criteria
Importance:
No one gets easier or more difficult questions than anyone else (fairness).
Feasibility
What is it?
Importance
Objectivity
What is it?
The level of agreement among independent assessors (experts) about the right answer to a given question
Importance:
Decreases intra-rater and inter-rater bias
Characteristics of a test
Validity: the degree of accuracy with which a measuring instrument measures what it is intended to measure
Reliability: the degree of consistency with which a measuring instrument measures a variable
Objectivity: the degree of agreement among the independent judgments of several expert examiners about the correct answers for each component of the measuring instrument
Practicability: the overall ease of using a test, both for the test developer and for the students
Relationship between validity and reliability
[Figure: target diagrams contrasting combinations of validity (+/-) and reliability (+/-)]
Table of specifications (test blueprint)
A two-dimensional table:
1. Horizontal dimension: the instructional content to be covered
2. Vertical dimension: levels of the cognitive domain (knowledge, comprehension, application, analysis, ...)
Example table of specifications (number of questions)
Content \ Level      Knowledge   Comprehension   Application   Analysis
Heart failure        2           1               0             0
Shock                2           1               1             1
Digoxin toxicity     1           1               1             0
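Table of specifications (template)
Content dimension (rows): topics 1, 2, 3, ...
Objective dimension (columns): knowledge (sub-levels 1 to 3), comprehension (sub-levels 1 and 2), analysis, synthesis, evaluation
Totals: number of questions per topic and per level, and percentage of questions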
Allocating questions by teaching hours
$$\text{Proportion of teaching hours for a topic (section)} = \frac{\text{hours spent teaching the topic}}{\text{total teaching hours of the course}}$$
$$\text{Percentage of questions for a section} = \text{proportion of teaching hours for the topic} \times 100$$

Example: a course of 2 credit units (36 teaching hours) and a 50-question test
Topic    Teaching hours    Percentage of questions    Number of questions
1        4                 11%                        6
2        2
3        8
...
Total    36                100%                       50

$$\text{Percentage of questions for section 1} = \frac{4}{36} \times 100 \approx 0.11 \times 100 = 11\%$$
If a 50-question test is to be prepared from this course, the number of questions for section 1 is:
$$50 \times \frac{11}{100} \approx 6$$
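The allocation rule above can be sketched in Python. This is a minimal illustration, not part of the workshop material: the function name `allocate_questions` is hypothetical, and because the slide lists only the first three topics, the 22 "remaining" hours that complete the 36-hour course are an assumption.

```python
def allocate_questions(hours, total_questions):
    """Allocate test questions to topics in proportion to teaching hours.

    hours: dict mapping topic -> hours spent teaching that topic.
    Returns a dict mapping topic -> (percentage of questions, number of questions).
    Plain rounding is used, so the counts may not add up exactly to
    total_questions; the test author adjusts them by hand if needed.
    """
    total_hours = sum(hours.values())
    allocation = {}
    for topic, h in hours.items():
        share = h / total_hours  # proportion of teaching hours for this topic
        allocation[topic] = (round(share * 100), round(share * total_questions))
    return allocation


# 36-hour course, 50-question test; topics 1-3 follow the slide,
# "Remaining topics" is a hypothetical bucket for the other 22 hours.
course_hours = {"Topic 1": 4, "Topic 2": 2, "Topic 3": 8, "Remaining topics": 22}
allocation = allocate_questions(course_hours, 50)
for topic, (percent, n_questions) in allocation.items():
    print(f"{topic}: {percent}% of questions, {n_questions} questions")
```

Topic 1 comes out at 11% and 6 questions, as on the slide. With these hypothetical hours the rounded counts total 51 rather than 50, which is why a final manual adjustment is usually needed.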
Thank you for your time
Any Questions or Comments?
Types of tests
1. Written
   Objective: MCQ
   Subjective: essay
2. Oral
3. Practical: logbook, portfolio, Mini-CEX, MSF, OSCE, DOPS
What are the assessment tools?
Written
  Open-ended
    Essay: restricted response; extended response
    Short answer
  Closed
    True-false
    Matching
    Multiple-choice
Assignments
Types of essay tests
Extended response: synthesis and evaluation levels
Restricted response: comprehension, application, and analysis levels
Types of short-answer tests
For the lower levels of the cognitive domain (up to application at most)
Question
Completion
Identification (association)
Types of objective tests
True-false
Matching
Multiple-choice
Assessment tools and Miller's pyramid (Miller 1990)
Does (action):
1. Professionalism evaluation form
2. End-of-rotation evaluation
3. 360° evaluations
4. Mini-CEX
5. Critical incident reports
6. Record reviews
Shows how (decision making):
1. OSCE
2. SP exam
3. Computer-simulated patient
Knows how (reasoning):
1. Oral exam
2. Essay
3. MCQ
Knows (awareness):
1. Oral exam
2. Essay
3. MCQ
How to assess knowledge, skills, and attitudes
                      Written exams   Clinical exams   Viva
Knowledge             ++++            +                ++
Psychomotor skills    -               ++++             -
Attitude              -               +                +
Tips for compiling written tests
Arrange the questions in the following order:
1. True-false
2. Matching
3. Multiple-choice
4. Short answer
5. Essay
Arrange the questions from easy to difficult.
Arrange the questions in sequence following the main organization of the content.
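The first two ordering rules can be combined into one sort key, as in the Python sketch below. It assumes each item already carries a format label and a difficulty estimate `p` (proportion answering correctly, e.g. from a tryout); the field names are hypothetical, and the third rule (following the organization of the content) is not encoded because it can conflict with the other two.

```python
# Position of each item format in the recommended order.
FORMAT_ORDER = {
    "true-false": 0,
    "matching": 1,
    "multiple-choice": 2,
    "short-answer": 3,
    "essay": 4,
}


def order_items(items):
    """Sort items by format (true-false first, essay last), then from easy to
    difficult within each format. A higher p (more students correct) means an
    easier item, so p is sorted in descending order."""
    return sorted(items, key=lambda item: (FORMAT_ORDER[item["format"]], -item["p"]))


draft = [
    {"id": "Q1", "format": "essay", "p": 0.40},
    {"id": "Q2", "format": "multiple-choice", "p": 0.55},
    {"id": "Q3", "format": "true-false", "p": 0.90},
    {"id": "Q4", "format": "multiple-choice", "p": 0.80},
    {"id": "Q5", "format": "matching", "p": 0.70},
]
print([item["id"] for item in order_items(draft)])
```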
MCQ structure
Stem
Options (responses)
Key: the correct answer
Distractors: the incorrect answers
Types of multiple-choice items
Single correct answer
Best answer
Negative
Millman's 21 rules for MCQs
1. The stem should contain the main problem and quantities.
2. Each item should be as short as possible (while keeping the sentences clear).
3. Avoid negative questions in the stem as far as possible. If a negative is used, underline it or write it in bold.
4. The stem should be worded so that it states the main problem without help from the options. The options should also be as independent of one another as possible.
5. Ask for the best answer, or use terms such as "most" and "primary" (when more than one option is partly correct).
6. In stems with a blank to be filled in, the omitted part should, as far as possible, not be placed at the beginning of the sentence.
7. The linguistic difficulty of the options should be low.
8. Each option should test a single point.
9. Avoid repeating words in the options as far as possible, unless there is a logical sequence.
10. Distractors should be plausible and attractive (when the stem measures real understanding).
11. All options should agree grammatically with the stem; for example, if the stem is plural, all options should be plural.
12. The options should be similar in sentence length and in technical and practical difficulty.
13. The stem and options should be uniform and homogeneous in grammar, subject content, and form.
14. Avoid patterns in the sequence of correct answers across the exam (for example, the correct answers should not run a, b, c, d in order, and most of them should not be c).
15. Provide at least 4 options for each item.
16. Avoid wording that creates a similarity between the stem and the options.
17. Avoid copying phrases verbatim from the textbook.
18. Avoid stems that give away the answer to another question.
19. The options should not overlap, include one another, or mean the same thing.
20. Avoid specific determiners such as "always" and "never".
21. When asking about the understanding of a term or concept, present the term first and then offer a series of characteristics or definitions as the options.
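Rule 14 can be checked mechanically once the answer key is written. The Python sketch below is illustrative only: the function name and both thresholds (`max_run`, `max_share`) are arbitrary assumptions, not values given by Millman or the workshop.

```python
from collections import Counter
from itertools import groupby


def check_answer_key(key, options="abcd", max_run=3, max_share=1.5):
    """Flag answer-key patterns that rule 14 warns against.

    key: string with the correct option for each item, e.g. "bdac".
    max_run: longest acceptable run of the same correct option.
    max_share: an option is flagged if it is the key more than max_share
               times as often as a uniform spread would give.
    Both thresholds are arbitrary choices for illustration.
    """
    if not key:
        return []
    warnings = []
    expected = len(key) / len(options)
    counts = Counter(key)
    for opt in options:
        if counts[opt] > max_share * expected:
            warnings.append(f"option {opt!r} is the key for {counts[opt]} of {len(key)} items")
    # Longest run of identical consecutive keys
    longest = max(len(list(group)) for _, group in groupby(key))
    if longest > max_run:
        warnings.append(f"{longest} consecutive items share the same key")
    # Keys that simply cycle through the options (a, b, c, d, a, b, ...)
    cycle = (options * (len(key) // len(options) + 1))[: len(key)]
    if len(key) >= len(options) and key == cycle:
        warnings.append("the keys follow the cyclic order " + options)
    return warnings


print(check_answer_key("bdacadbc"))   # balanced, no pattern
print(check_answer_key("abcdabcd"))   # cyclic pattern
print(check_answer_key("cccccabd"))   # "c" overused and a long run
```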
Thank you for your time
Any Questions or Comments?
Minimum Pass Level (MPL): calculating the pass mark
$$\mathrm{MPL}_{\text{item}} = \frac{\text{value assigned to the correct option}}{\text{sum of the values assigned to all options}}$$
$$\mathrm{MPL}_{\text{exam}} = \frac{\text{sum of the MPLs of the exam items}}{\text{number of items}}$$
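The two MPL formulas translate directly into code. In this sketch the option values are hypothetical: it assumes a Nedelsky-style judgment in which every option a borderline student cannot eliminate (including the key) is given 1 and every option such a student would rule out is given 0, so the item MPL becomes 1 divided by the number of remaining options. The slide itself does not say how the values are assigned.

```python
def mpl_item(option_values, key_index):
    """Item MPL = value assigned to the correct option / sum of the values of all options."""
    return option_values[key_index] / sum(option_values)


def mpl_exam(item_mpls):
    """Exam MPL = sum of the item MPLs / number of items."""
    return sum(item_mpls) / len(item_mpls)


# Hypothetical judgments for a 3-item test, 4 options per item, key listed first.
# 1 = option a borderline student could not rule out; 0 = option they would eliminate.
items = [
    ([1, 1, 0, 0], 0),  # two options remain  -> item MPL 1/2
    ([1, 1, 1, 0], 0),  # three options remain -> item MPL 1/3
    ([1, 0, 0, 0], 0),  # only the key remains -> item MPL 1
]
item_mpls = [mpl_item(values, key) for values, key in items]
print([round(m, 3) for m in item_mpls], round(mpl_exam(item_mpls), 3))
```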
Item Analysis
The main purpose of item analysis is to improve the test.
Analyze items to identify:
• Potential mistakes in scoring
• Ambiguous/tricky items
• Alternatives that do not work well
• Problems with time limits
Types of tests
Criterion-referenced tests
Norm-referenced (competitive) tests
TYPES OF TESTS BY PURPOSE
1. Norm-referenced tests
a. Discrimination is the most important aspect
b. Easy items are eliminated
2. Criterion-referenced tests
a. Discrimination is not of critical importance
b. Items are not altered or eliminated because of their difficulty
Criterion-Referenced
Before the test, specific criteria are set to ensure that students have acquired a minimum of knowledge and particular abilities; a student's success or failure is judged by comparing his or her performance with these predetermined criteria.
This method is used mainly for final exams and for awarding certificates.
Examples: the entrance exam of a pilot school; a specialty board exam.
Norm-Referenced
The results of all students are compared with one another. The pass mark is set arbitrarily or on the basis of the scores obtained by the students.
This method is used mainly for entrance and diagnostic exams.
Example: university entrance exams.
Item analysis in norm-referenced tests
ITEM ANALYSIS
Item analysis has three parts:
1. Item difficulty
2. Item discrimination
3. Distractor analysis
Steps of item analysis
1. Determine each student's score
2. Rank the students by performance
3. Form the upper and lower groups
4. Calculate the difficulty index for each item
5. Calculate the discrimination index for each item
6. Critically review the items
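The six steps can be sketched as one Python function. It assumes responses are already scored 0/1 per item; the function name, the `fraction` parameter (27% of students per group, as on the later slides), and the sample data are illustrative. Ties at a group boundary are broken by the original order of the students, which is one arbitrary choice among several, and step 6 remains a human judgment.

```python
def item_analysis(responses, fraction=0.27):
    """Norm-referenced item analysis.

    responses: one list of 0/1 item scores per student (1 = correct).
    fraction: share of students placed in each of the upper and lower groups.
    Returns one (p, D) tuple per item.
    """
    # Step 1: total score of each student
    totals = [sum(r) for r in responses]
    # Step 2: rank students from highest to lowest total (stable sort)
    ranked = sorted(range(len(responses)), key=lambda i: totals[i], reverse=True)
    # Step 3: equal-sized upper and lower groups (U = L)
    n = max(1, round(fraction * len(responses)))
    upper, lower = ranked[:n], ranked[-n:]
    results = []
    for item in range(len(responses[0])):
        up = sum(responses[i][item] for i in upper)  # U_p
        lp = sum(responses[i][item] for i in lower)  # L_p
        p = (up + lp) / (2 * n)  # Step 4: difficulty index
        d = (up - lp) / n        # Step 5: discrimination index
        results.append((p, d))
    return results  # Step 6: the author reviews items with poor p or D


# Hypothetical 0/1 scores of ten students on three items.
scores = [
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0],
    [0, 1, 0], [1, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0],
]
results = item_analysis(scores, fraction=0.3)
for number, (p, d) in enumerate(results, start=1):
    print(f"Item {number}: p = {p:.2f}, D = {d:.2f}")
```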
Item analysis card
Date of administration: 2/11/73
Test title: Inferential statistics
Item topic: Correlation coefficient
Which of the following values represents the largest correlation coefficient?
A) 0.55
*B) 0.61
C) 0.49
D) 0.23

Group        A   B*   C   D   No answer   Total
Upper 25%    0   5    3   0   2           10
Lower 25%    5   2    3   0   0           10
Difficulty index = 35%; discrimination index = 0.3
Tests of individual differences
Two groups of individuals:
U: upper group, the 27% highest scorers
L: lower group, the 27% lowest scorers
U = L
U_p: upper-group individuals who got the item right
L_p: lower-group individuals who got the item right
$$p = \frac{U_p + L_p}{U + L} \quad \text{(item difficulty index)}$$
$$D = \frac{U_p - L_p}{U} \quad \text{(item discrimination index)}$$
Example (cont.)
60 students took the test.
Item 14: among the 16 upper scorers, 5 got the item right; among the 16 lower scorers, only 1 got it right.
$$p = \frac{5 + 1}{32} \approx .19 \qquad D = \frac{5 - 1}{16} = .25$$
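The counting formulas can be checked with a few lines of Python. This is a sketch; the function names are hypothetical, and the numbers reproduce the item 14 example and the item analysis card, reading 5 and 2 as the upper- and lower-group counts for the key (10 students per group).

```python
def difficulty_index(up, lp, u, l):
    """p = (U_p + L_p) / (U + L)."""
    return (up + lp) / (u + l)


def discrimination_index(up, lp, u):
    """D = (U_p - L_p) / U, for equal-sized upper and lower groups."""
    return (up - lp) / u


# Item 14: 5 of 16 upper and 1 of 16 lower scorers answered correctly.
print(round(difficulty_index(5, 1, 16, 16), 2), discrimination_index(5, 1, 16))
# Item analysis card: 5 of 10 upper and 2 of 10 lower students chose the key.
print(difficulty_index(5, 2, 10, 10), discrimination_index(5, 2, 10))
```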
ITEM ANALYSIS
Difficulty (p): 0 to 1
0 (hard) ______ 0.5 (moderate) ______ 1.0 (easy)
ITEM ANALYSIS
Example:
30 students in class
5 of the top 10 scorers got the item correct
3 of the bottom 10 scorers got the item correct
$$p = \frac{5 + 3}{10 + 10} = \frac{8}{20} = .4 \quad \text{(moderate difficulty)}$$
ITEM ANALYSIS
Discrimination index (D)
0 (none) ______ 0.5 (moderate) ______ 1.0 (excellent)
Negative values: something is wrong with the item
ITEM ANALYSIS
Example:
30 students in class
10 of the top 10 scorers got the item correct
2 of the bottom 10 scorers got the item correct
$$D = \frac{10 - 2}{(10 + 10)/2} = \frac{8}{10} = .8 \quad \text{(good discrimination)}$$
Interpreting the item discrimination index
The larger the discrimination index, the greater the item's discriminating power; the smaller the index, the weaker its discriminating power.
Consequently, the good items in a test are those with moderate difficulty and high discrimination.
D Index Rule of Thumb for Classroom Tests
D index          Interpretation
40% and above    excellent discrimination
25% to 39%       acceptable discrimination
below 25%        poor discrimination
Summary of Standards of Acceptance
Item difficulty (p): 30% to 90%
Item discrimination (D): 25% and above
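The rule of thumb and the acceptance standards can be applied to computed indices as in this Python sketch. The function names are hypothetical, and treating a D of exactly 0.40 as excellent is an assumption about the boundary.

```python
def classify_discrimination(d):
    """Rule of thumb for classroom tests, with D expressed as a proportion."""
    if d >= 0.40:
        return "excellent"
    if d >= 0.25:
        return "acceptable"
    return "poor"


def meets_standards(p, d, p_range=(0.30, 0.90), d_min=0.25):
    """Standards of acceptance: 0.30 <= p <= 0.90 and D >= 0.25."""
    return p_range[0] <= p <= p_range[1] and d >= d_min


# (p, D) pairs from the worked examples above.
for name, p, d in [("Item 14", 0.1875, 0.25), ("Card item", 0.35, 0.30)]:
    print(name, classify_discrimination(d), meets_standards(p, d))
```

Item 14 discriminates acceptably but fails the difficulty standard (p is below .30), so it would be flagged for review.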
Difficulty Index
below 0.3: too difficult
0.3 to 0.7: acceptable
0.5 to 0.6: recommended
above 0.7: too easy
Format                                       Ideal difficulty (%)
Five-response multiple-choice                70
Four-response multiple-choice                74
Three-response multiple-choice               77
True-false (two-response multiple-choice)    85
Discrimination Index
below 0.15: throw out
0.15 to 0.25: check
0.25 to 0.35: good
above 0.35: excellent
Be aware:
Very easy or very difficult test items have little discrimination.
Items of moderate difficulty (60% to 80% answering correctly) are generally more discriminating.
Point-biserial correlation
Used to correlate a dichotomous variable with a continuous variable
In testing, used to correlate a person's performance on an item (correct, incorrect) with his or her total test score
Used as an index of item discrimination
The point-biserial ranges from -1.00 to +1.00
The higher, the better; as a general rule, values above +0.20 are desirable
Point-biserial formula
$$r_{pb} = \frac{M_p - M_q}{S_t}\sqrt{p\,(1 - p)}$$
M_p: mean test score of the people who got the item correct
M_q: mean test score of the people who got the item incorrect
S_t: standard deviation of the test scores
p: item facility (IF) for the item; 1 - p: 1 - IF for the item
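A minimal Python sketch of the point-biserial formula follows. It assumes S_t is the population standard deviation (dividing by n), a choice under which the result equals Pearson's r between the 0/1 item scores and the totals; the scores are hypothetical, and the function requires at least one correct and one incorrect response.

```python
from statistics import mean, pstdev


def point_biserial(item, totals):
    """r_pb = (M_p - M_q) / S_t * sqrt(p * (1 - p)).

    item: 0/1 scores on one item; totals: total test scores in the same order.
    S_t is the population standard deviation (dividing by n); with that choice
    the result equals Pearson's r between the item scores and the totals.
    """
    correct = [t for i, t in zip(item, totals) if i == 1]
    incorrect = [t for i, t in zip(item, totals) if i == 0]
    p = len(correct) / len(item)  # item facility (IF)
    return (mean(correct) - mean(incorrect)) / pstdev(totals) * (p * (1 - p)) ** 0.5


# Hypothetical scores for eight students.
item_scores = [1, 1, 1, 0, 1, 0, 0, 1]
total_scores = [9, 8, 7, 6, 7, 4, 5, 8]
print(round(point_biserial(item_scores, total_scores), 3))
```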
Item analysis in criterion-referenced tests
Criterion-referenced tests
Two groups of individuals:
U: upper group (above the criterion)
L: lower group (below the criterion)
U_p: upper-group individuals who got the item right
L_p: lower-group individuals who got the item right
$$p = \frac{U_p + L_p}{U + L} \quad \text{(item difficulty index)}$$
$$D = \frac{U_p}{U} - \frac{L_p}{L} \quad \text{(item discrimination index)}$$
Example
A test of mastery of Istanbul geography. The outcome is that 60 individuals are "masters" and 20 failed the test.
Item 3: 45 "masters" and 10 of those who failed got the item right.
What are the item difficulty and item discrimination indices?
$$p = \frac{45 + 10}{60 + 20} \approx .69 \qquad D = \frac{45}{60} - \frac{10}{20} = .75 - .50 = .25$$
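For criterion-referenced tests the two groups usually differ in size, so the discrimination index compares proportions rather than raw counts. A short Python sketch with hypothetical function names, reproducing the Istanbul geography example:

```python
def cr_difficulty(up, lp, u, l):
    """p = (U_p + L_p) / (U + L), with U = masters and L = non-masters."""
    return (up + lp) / (u + l)


def cr_discrimination(up, lp, u, l):
    """D = U_p / U - L_p / L; the two groups may differ in size."""
    return up / u - lp / l


# Item 3: 45 of 60 masters and 10 of 20 non-masters answered correctly.
print(round(cr_difficulty(45, 10, 60, 20), 2), cr_discrimination(45, 10, 60, 20))
```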
Item analysis in criterion-referenced tests
Goal: the extent to which individuals attain the intended knowledge after completing the course.
Depending on the instructional objective, an item may be difficult or easy; the difficulty index carries a different meaning in this kind of exam.
Very easy or very difficult items do not necessarily need to be changed or removed (provided they are sufficiently valid).
Items in these tests are analyzed by giving a pretest and a posttest and comparing the results.
Pre-test/post-test results (B = pre-test, A = post-test; + = correct, - = incorrect)
Student   Item 1   Item 2   Item 3   Item 4   Item 5
          B  A     B  A     B  A     B  A     B  A
ح .د      -  +     +  +     -  -     +  -     -  +
س .ن      -  +     +  +     -  -     +  -     +  +
خ .پ      -  +     +  +     -  -     +  -     -  +
ش .ف      -  +     +  +     -  -     +  -     -  +
د .ه      -  +     +  +     -  -     -  -     +  +
ف .پ      -  +     +  +     -  -     +  -     -  -

$$S = \frac{R_a - R_b}{T}$$
S: sensitivity to instructional effects
R_a: number of people who answered the item correctly after instruction
R_b: number of people who answered the item correctly before instruction
T: number of people who answered the item both before and after instruction
For the best item in a criterion-referenced test, S equals 1.
Items with an S of zero or less (negative) cannot measure the effect of instruction.
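The sensitivity index can be computed per item from the pre/post table. This Python sketch assumes the first column of each pair in the table is the pre-test and numbers the items in column order; the function name and the string encoding of the results are illustrative.

```python
def sensitivity(pre, post):
    """S = (R_a - R_b) / T for one item.

    pre, post: strings of '+' (correct) and '-' (incorrect), one character per
    student, in the same student order before and after instruction.
    """
    t = len(pre)         # T: students tested both times
    rb = pre.count("+")  # R_b: correct before instruction
    ra = post.count("+")  # R_a: correct after instruction
    return (ra - rb) / t


# (pre, post) results for five items and six students, read from the table
# assuming the first column of each pair is the pre-test.
items = {
    1: ("------", "++++++"),
    2: ("++++++", "++++++"),
    3: ("------", "------"),
    4: ("++++-+", "------"),
    5: ("-+--+-", "+++++-"),
}
for number, (pre, post) in items.items():
    print(f"Item {number}: S = {sensitivity(pre, post):.2f}")
```

Under this reading, item 1 is the ideal case (S = 1), items 2 and 3 (S = 0) and item 4 (negative S) cannot measure the effect of instruction, and item 5 lies in between.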
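Analysis of essay and performance tests
$$\text{Difficulty index} = \frac{\text{mean item score}}{\text{possible range of item scores}}$$
$$\text{Discrimination index} = \frac{\text{upper-group mean item score} - \text{lower-group mean item score}}{\text{possible range of item scores}}$$
Example (item scored from 1 to 6):
$$\text{Difficulty index} = \frac{4.2}{6 - 1} = 0.84 \qquad \text{Discrimination index} = \frac{5.3 - 2.8}{6 - 1} = 0.5$$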
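The essay-item indices can be sketched in Python as follows. The function names and scores are hypothetical, and the code follows the formulas literally, dividing by the possible range (maximum minus minimum score); note that with a 1-to-6 scale a mean near the top of the scale would give a difficulty value above 1.

```python
from statistics import mean


def essay_difficulty(scores, min_score, max_score):
    """Difficulty index = mean item score / possible range of item scores."""
    return mean(scores) / (max_score - min_score)


def essay_discrimination(upper_scores, lower_scores, min_score, max_score):
    """Discrimination index = (upper-group mean - lower-group mean) / possible range."""
    return (mean(upper_scores) - mean(lower_scores)) / (max_score - min_score)


# Hypothetical scores on an essay item scored from 1 to 6.
all_scores = [4, 5, 3, 5, 4]
upper_group = [6, 5]
lower_group = [3, 2]
print(round(essay_difficulty(all_scores, 1, 6), 2),
      round(essay_discrimination(upper_group, lower_group, 1, 6), 2))
```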
Distractor analysis
Each distractor should attract at least one person from the lower (weak) group.
A distractor should attract more weak students than strong students.
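The two distractor rules can be applied to the upper- and lower-group counts of an item. This Python sketch uses the counts from the item analysis card (key B), reading them as upper and lower 25% groups; the function name and report format are illustrative.

```python
def review_distractors(upper, lower, key):
    """Check each distractor against the two distractor rules.

    upper, lower: dicts mapping option -> number of students in the upper /
    lower group who chose it. key: the correct option.
    Returns a dict mapping each distractor -> list of problems (empty = OK).
    """
    report = {}
    for option in upper:
        if option == key:
            continue
        problems = []
        if lower[option] < 1:
            problems.append("attracts no one from the lower group")
        if lower[option] <= upper[option]:
            problems.append("does not attract more lower- than upper-group students")
        report[option] = problems
    return report


# Counts from the item analysis card (key B).
upper_counts = {"A": 0, "B": 5, "C": 3, "D": 0}
lower_counts = {"A": 5, "B": 2, "C": 3, "D": 0}
for option, problems in review_distractors(upper_counts, lower_counts, "B").items():
    print(option, problems or "OK")
```

Option A works as intended, option C attracts both groups equally, and option D attracts no one, so C and D would be revised.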
Thank you for your time
Any Questions or Comments?
Two issues in using instruments...
1. Validity: the degree to which the instrument measures what it purports to measure
2. Reliability: the degree to which the instrument consistently measures what it purports to measure
Types of reliability...
1. Stability
2. Equivalence
3. Internal consistency
1. Stability ("test-retest"): the degree to which two scores on the same instrument are consistent over time
2. Equivalence ("equivalent forms"): the degree to which identical instruments (except for the actual items included) yield identical scores
3. Internal consistency ("split-half" reliability with the Spearman-Brown correction formula, Kuder-Richardson and Cronbach's alpha reliabilities, scorer/rater reliability): the degree to which one instrument yields consistent results
RELIABILITY
TEST-RETEST (COEFFICIENT OF STABILITY)
PARALLEL FORMS (COEFFICIENT OF EQUIVALENCE)
INTERNAL CONSISTENCY
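The coefficients of stability and equivalence are commonly computed as a correlation between two sets of scores from the same students. The Python sketch below assumes Pearson's r for that purpose; the function name and the scores are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two score lists for the same students."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return covariance / (pstdev(x) * pstdev(y))


# Hypothetical scores of five students on two administrations of one test
# (test-retest) or on two parallel forms (equivalence).
first = [12, 15, 9, 18, 14]
second = [13, 14, 10, 17, 15]
print(round(pearson_r(first, second), 2))
```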
INTERNAL CONSISTENCY
SPLIT-HALF METHOD
SPEARMAN-BROWN PROPHECY FORMULA
KUDER-RICHARDSON METHOD
COEFFICIENT ALPHA
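The split-half method and the Spearman-Brown correction can be sketched together in Python. This assumes an odd-even split of the items (one common choice among several) and Pearson's r between the two half-test scores; the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two score lists for the same students."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return covariance / (pstdev(x) * pstdev(y))


def spearman_brown(r, factor=2):
    """Predicted reliability of a test lengthened by `factor`:
    r_new = factor * r / (1 + (factor - 1) * r)."""
    return factor * r / (1 + (factor - 1) * r)


def split_half_reliability(responses):
    """Odd-even split-half reliability, corrected to full length with Spearman-Brown."""
    odd = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical 0/1 scores of five students on six items.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 2))
```

The correction is needed because each half is only half as long as the full test, and shorter tests are less reliable.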
KR20
$$\mathrm{KR}_{20} = \frac{K}{K-1}\left(\frac{S_x^2 - \sum pq}{S_x^2}\right)$$
K = number of trials or items
S_x^2 = variance of the total scores
p = proportion answering the item right
q = proportion answering the item wrong (1 - p)
Σpq = sum of the pq products over all K items
KR20 Example
Item   p     q     pq
1      .50   .50   .25
2      .25   .75   .1875
3      .80   .20   .16
4      .90   .10   .09
Σpq = 0.6875
If Mean = 2.45 and SD = 1.2 (so S_x^2 = 1.44), what is KR20?
$$\mathrm{KR}_{20} = \frac{4}{3} \times \frac{1.44 - 0.6875}{1.44} \approx .70$$
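KR20 can be computed either from summary statistics, as in the worked example, or from a 0/1 response matrix. In this Python sketch the function names are hypothetical, and the matrix version assumes the population variance of the total scores (dividing by n); with that choice KR20 coincides with Cronbach's alpha for dichotomous items.

```python
from statistics import pvariance


def kr20_from_summary(p_values, total_variance):
    """KR20 = K/(K-1) * (S_x^2 - sum(pq)) / S_x^2."""
    k = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)
    return (k / (k - 1)) * (total_variance - sum_pq) / total_variance


def kr20(responses):
    """KR20 from a 0/1 response matrix (rows = students, columns = items),
    using the population variance of the total scores."""
    n = len(responses)
    k = len(responses[0])
    p_values = [sum(row[j] for row in responses) / n for j in range(k)]
    totals = [sum(row) for row in responses]
    return kr20_from_summary(p_values, pvariance(totals))


# Worked example: four items, SD of total scores = 1.2.
print(round(kr20_from_summary([0.50, 0.25, 0.80, 0.90], 1.2 ** 2), 2))
```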
KR21
If all test items are assumed to be equally difficult, KR20 can be simplified to KR21:
$$\mathrm{KR}_{21} = \frac{K S^2 - M(K - M)}{(K-1)\,S^2}$$
K = number of trials or items
S^2 = variance of the test scores
M = mean of the test scores
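KR21 needs only the number of items, the mean, and the variance. As an illustration (not a slide example), the Python sketch below applies it to the summary statistics of the KR20 example; the function name is hypothetical.

```python
def kr21(k, mean_score, variance):
    """KR21 = (K * S^2 - M * (K - M)) / ((K - 1) * S^2)."""
    return (k * variance - mean_score * (k - mean_score)) / ((k - 1) * variance)


# Summary statistics of the KR20 example: K = 4, mean = 2.45, SD = 1.2.
print(round(kr21(4, 2.45, 1.2 ** 2), 2))
```

The result (about .45) is well below the KR20 of .70 for the same data, because the four items differ considerably in difficulty and so violate the equal-difficulty assumption; computed from the same variance, KR21 is never larger than KR20.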
RELIABILITY OF CRITERION-REFERENCED TESTS
LINDMAN AND MERENDA
Rule of Thumb for Acceptable Reliability Coefficients for Classroom Tests
Reliability coefficient    Interpretation
.70 or higher              acceptable reliability
Characteristics of assessment methods
Types of validity:
Face validity
Content validity (determined by expert judgment):
1. Item validity
2. Sampling validity (blueprinting)
Predictive validity
Concurrent validity
Construct validity
Types of validity...
1. Content validity
2. Criterion-related validity
3. Construct validity
1. Content validity: the degree to which
an instrument measures an intended
content area
3. Construct validity: a series of studies confirming that the instrument really measures what it purports to measure
Forms of content validity…
…sampling validity: does the instrument reflect the total content area?
…item validity: are the items included on the instrument relevant to the measurement of the intended content area?
2. Criterion-related validity: an individual takes two forms of an instrument, which are then correlated to discriminate between those individuals who possess a certain characteristic and those who do not
Forms of criterion-related validity…
…concurrent validity: the degree to which scores on one test correlate with scores on another test when both tests are administered in the same time frame
…predictive validity: the degree to which a test can predict how well an individual will do in a future situation
Types of Validity
1. Content validity: face validity; sampling validity
2. Empirical validity: concurrent validity; predictive validity
3. Construct validity
Item discrimination
How well does the item separate those who know the material from those who do not?
In LXR, it is measured by the point-biserial correlation (r_pb), which ranges from -1 to 1.
r_pb is the correlation between performance on the item and performance on the exam.
Item discrimination
A positive r_pb means that those scoring higher on the exam were more likely to answer the item correctly (better discrimination).
A negative r_pb means that high scorers on the exam answered the item wrong more frequently than low scorers (poor discrimination).
A desirable r_pb correlation is +0.20 or higher.
Evaluation of Distractors
Distractors are designed to fool those who do not know the material; students who do not know the answer guess among the choices.
Distractors should therefore be equally popular:
expected number choosing each distractor = number who answered the item wrong / number of distractors
Distractors ideally have a low or negative r_pb.
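The "equally popular" rule gives an expected count for each distractor. This Python sketch compares observed and expected counts using the option counts of LXR Example 4 (key B); the function name is hypothetical.

```python
def distractor_popularity(counts, key):
    """Compare each distractor with the count expected if all were equally popular.

    counts: dict mapping option -> number of students who chose it.
    key: the correct option.
    Returns a dict mapping distractor -> (observed, expected).
    """
    distractors = [option for option in counts if option != key]
    wrong = sum(counts[option] for option in distractors)
    expected = wrong / len(distractors)
    return {option: (counts[option], expected) for option in distractors}


# Option counts from LXR Example 4 (key B).
example_4 = {"A": 11, "B": 43, "C": 3, "D": 22, "E": 8}
for option, (observed, expected) in distractor_popularity(example_4, "B").items():
    print(f"{option}: observed {observed}, expected {expected:.0f}")
```

With 44 wrong answers spread over four distractors, each would be expected to draw 11 students; C (3) and D (22) deviate most, which matches the comment on that example.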
LXR Example 1 (* correct answer)
Option   N    %     Avg % correct on exam   r_pb
A*       86   99%   85.3%                   +.06
B        0    0%    0%                      ---
C        0    0%    0%                      ---
D        1    1%    82.0%                   -.06
E        0    0%    0%                      ---
Very easy item; would probably review the alternatives to make sure they are not ambiguous and do not provide clues that they are wrong.
LXR Example 2 (* correct answer)
Option   N    %     Avg % correct on exam   r_pb
A        0    0%    0%                      ---
B        21   24%   80.7%                   -.33
C*       65   74%   87.2%                   +.36
D        2    2%    78.7%                   -.13
E        0    0%    0%                      ---
Three of the alternatives are not functioning well; would review them.
LXR Example 3 (* correct answer)
Option   N    %     Avg % correct on exam   r_pb
A        3    3%    83.0%                   -.07
B        1    1%    80.0%                   -.09
C*       15   17%   83.4%                   -.15
D        5    6%    82.2%                   -.12
E        66   76%   86.8%                   +.23
Probably a miskeyed item. The correct answer is likely option E.
LXR Example 4 (* correct answer)
Option   N    %     Avg % correct on exam   r_pb
A        11   13%   81.5%                   -.24
B*       43   49%   87.4%                   +.35
C        3    3%    82.3%                   -.09
D        22   25%   84.5%                   -.08
E        8    9%    82.4%                   -.15
Relatively hard item with good discrimination. Would review alternatives C and D to see why they attract a relatively low and a relatively high number of students, respectively.
LXR Example 5 (* correct answer)
Option   N    %     Avg % correct on exam   r_pb
A        3    3%    83.0%                   -.07
B*       60   69%   85.3%                   +.002
C        1    1%    80.0%                   -.09
D        5    6%    82.2%                   -.12
E        18   21%   86.8%                   +.13
Poor discrimination for the correct choice "B". Choice "E" actually does a better job of discriminating. Would review the item for proper keying, ambiguous wording, proper wording of the alternatives, etc. This item needs revision.
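The judgments in these examples can be partly automated: a keyed answer with a low r_pb, or a distractor with a positive r_pb higher than the key's, suggests miskeying or ambiguity. This Python sketch uses the +0.20 guideline from the item discrimination slides; the function name and the "rival" criterion are illustrative assumptions, and the output only flags items for human review.

```python
def review_keyed_item(rpb, key, threshold=0.20):
    """Flag a weak key and any distractors that discriminate better than the key.

    rpb: dict mapping option -> r_pb (None if no student chose the option).
    key: the keyed (supposedly correct) option.
    Returns (weak_key, rivals): weak_key is True if the key's r_pb is below
    `threshold`; rivals lists distractors with a positive r_pb above the key's.
    """
    key_r = rpb[key]
    rivals = [
        option for option, r in rpb.items()
        if option != key and r is not None and r > 0 and r > key_r
    ]
    return key_r < threshold, rivals


examples = {
    "Example 3": ({"A": -0.07, "B": -0.09, "C": -0.15, "D": -0.12, "E": 0.23}, "C"),
    "Example 4": ({"A": -0.24, "B": 0.35, "C": -0.09, "D": -0.08, "E": -0.15}, "B"),
    "Example 5": ({"A": -0.07, "B": 0.002, "C": -0.09, "D": -0.12, "E": 0.13}, "B"),
}
for name, (rpb, key) in examples.items():
    print(name, review_keyed_item(rpb, key))
```

Examples 3 and 5 are flagged with option E as the rival, while Example 4 passes, in line with the comments on those items.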