Evaluation of Academic Achievement


IN THE NAME OF GOD
TEST CONSTRUCTION
WORKSHOP
J.KOOHPAYEHZADEH M.D , MPH
Education development center
Iran University of Medical
Sciences
4/13/2015
TEST CONSTRUCTION Workshop
1
“Tell me, I forget.
Ask me, I remember.
Involve me, I understand.”
Why Test?
Testing is 50% of teaching

Well-defined educational objectives are a prerequisite for assessment
Example for this session:
At the end of this session, participants will be able to:
• name at least three differences between summative and formative assessment
• list at least three written assessment methods (AM)
• name the most effective AM for assessing clinical skills
• describe the most effective AM for assessing attitudes

Evaluating Students:
Tests Are Not the Only Way!
• Tests
• Projects
• Performance
• Participation

Assessment
• How?
• Why?
• When?
• What?

When should we assess?
• Summative: at the end of instruction
• Formative: during instruction
• Pre-test: before instruction

Why do we assess?
1. To encourage learning
2. To inform the student
3. To inform the teacher
4. To correct learning activities
5. To select students
6. To certify
7. To gain readiness for promotion

Why Evaluate Students?
• To help students improve
• To assess student learning
• To determine if the teacher is teaching
• As a motivation tool
• To communicate with others, such as parents

What?
• Knowledge
• Skill
• Attitude

Who Should Assess? (360° assessment)
• Faculty
• Self
• Peers
• Tutors
• Other team members
• Standardized patients, patients
• External and internal examiners
• Public, society, …

Where?
• Does: workplace assessment
• Shows how: test center / skill lab
• Knows how: examination hall
• Knows: examination hall

How to use assessment?
• Summative: usually undertaken at the end of a training programme; it determines whether the educational objectives have been successfully achieved. With summative assessment the student usually receives a grade or a mark (for example, an exam).
• Formative: testing that is part of the developmental or ongoing teaching/learning process. It should include delivery of feedback to the student.

Formative assessment
• Feedback
• Feedback
• Feedback
• Feedback
• Feedback
THANK YOU
ANY QUESTIONS?
Stages of test development
Conceptualization
Construction
Tryout
Item analysis
Revision
Conceptualization
An idea…

Conceptualization
• What will it measure?
• What is the objective?
• Is there a need?
• Who will use it?
• Etc.

Test Construction Principles
• Adequate provision should be made for evaluating all of the teacher's objectives for the instruction.
• The test should reflect the approximate proportion of emphasis given to each topic in the course.

Preparing the test
• The preliminary draft of the test should be prepared as early as possible.
• As a rule, the test should include more than one type of item.

Preparing the test, continued
• The content of the test should range from very easy to very difficult for the group being measured.
• The items in the test should be arranged in order of difficulty.
• The items should be phrased so that the content, rather than the form of the statement, determines the answer.

Preparing the test, continued
• A regular sequence in the pattern of responses should be avoided.
• The directions to the pupils should be as clear, complete, and concise as possible.
• One question should not provide the answer to another question.

Item Analysis
• The process of determining which items are “good”
• Tools in item analysis:
  • Item difficulty index
  • Item reliability index
  • Item validity index
  • Item discrimination index

Characteristics of Assessment Tools

Reliability
If an assessment is repeated with the same trainees, they should get the same results.

Validity
• What is it? The degree to which a measurement instrument truly measures what it is intended to measure.
• Importance: if an assessment does not test what it is meant to test, the test is useless. Reliability is a prerequisite for validity but is not sufficient by itself.

Standardization
• What is it? All students are tested on the same test items, patients, and tasks, and according to the same criteria.
• Importance: no one gets easier or more difficult questions than anyone else (fairness).

Feasibility
• What is it?
• Importance

Objectivity
• What is it? The level of agreement among independent assessors (experts) about the right answer to a given question.
• Importance: decreases intra-rater and inter-rater bias.

Characteristics of a test
• Validity: how accurately a measuring instrument measures the intended subject.
• Reliability: how consistently a measuring instrument measures a variable.
• Objectivity: the degree of agreement among the independent judgments of several expert examiners about the good answers for each component of the measuring instrument.
• Practicability: the overall ease of using a test, both for the test constructor and for the students.

The relationship between validity and reliability
[Figure: target diagrams showing combinations of high (+) and low (-) validity and reliability]

Test blueprint (Table of specifications)
A two-dimensional table:
1. Horizontal dimension: the intended instructional content
2. Vertical dimension: the levels of the cognitive domain (knowledge, comprehension, application, analysis, …)

Number of questions per content area and cognitive level:
Instructional content   Knowledge   Comprehension   Application   Analysis
Heart failure           2           1               0             0
Shock                   2           1               1             1
Digoxin poisoning       1           1               1             0

Test blueprint
• Content dimension (rows): topics 1, 2, 3, and the total number of questions
• Objective dimension (columns): knowledge, comprehension, analysis, synthesis, evaluation, the total number of questions, and the percentage of questions

Ratio of teaching hours for each topic (section) = number of hours spent teaching the topic / total number of teaching hours in the course (credit unit)
Percentage of questions for each section = 100 × ratio of teaching hours for the topic

Topics of a 2-credit course (36 hours)   Teaching hours   Percentage of questions   Number of questions
1.                                       4                11%                       6
2.                                       2
3.                                       8
Total                                    36               100%                      50

Ratio of teaching hours for section 1 = 4 / 36 = 0.11
Percentage of questions for section 1 = 0.11 × 100 = 11%
If a 50-question test is to be prepared for this course, the number of questions for section 1 is 11 / 100 × 50 ≈ 6.
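The allocation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `allocate_questions` and the topic labels are hypothetical, and rounding each topic to the nearest whole question is an assumption (rounded counts may not add up exactly to the test length and may need manual adjustment).

```python
def allocate_questions(hours_by_topic, total_hours, total_questions):
    """Return {topic: (percent_of_questions, number_of_questions)}.

    percent   = 100 * (hours spent on the topic / total teaching hours)
    questions = percent / 100 * total_questions, rounded (assumed rule)
    """
    allocation = {}
    for topic, hours in hours_by_topic.items():
        ratio = hours / total_hours          # share of teaching time
        percent = 100 * ratio                # share of questions, in percent
        questions = round(percent / 100 * total_questions)
        allocation[topic] = (percent, questions)
    return allocation


# Figures from the slide: a 36-hour course, a 50-question test,
# and topics taught for 4, 2 and 8 hours (labels are hypothetical).
course = {"Topic 1": 4, "Topic 2": 2, "Topic 3": 8}
plan = allocate_questions(course, total_hours=36, total_questions=50)

for topic, (percent, questions) in plan.items():
    print(f"{topic}: {percent:.0f}% of questions, {questions} questions")
```

For Topic 1 this reproduces the slide's figures: about 11% of the questions, which is 6 of 50.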
Thank you for your Time
Any Questions or Comments?

Types of tests
1. Written
   • Objective: MCQ
   • Non-objective: essay
2. Oral
3. Practical
Log Book, Portfolio, Mini-CEX, MSF, OSCE, DOPS

What are assessment tools?
• Written, open-ended:
  • Essay: restricted response, extended response
  • Short answer
• Written, closed-ended: true-false, matching, multiple-choice
• Assignments

Types of essay tests
• Extended response: for the synthesis and evaluation levels
• Restricted response: for the comprehension, application, and analysis levels

Types of short-answer tests
For the lower levels of the cognitive domain (at most up to the application level)
• Question
• Completion
• Identification (association)

Types of objective tests
• True-false
• Matching
• Multiple-choice

ASSESSMENT TOOLS (Miller’s Pyramid, Miller 1990)
• Does (action):
  1. Professionalism evaluation form
  2. End-of-rotation evaluation
  3. 360° evaluations
  4. Mini-CEX
  5. Critical incident reports
  6. Record reviews
• Shows how (decision making):
  1. OSCE
  2. SP exam
  3. Computer-simulated patient
• Knows how (reasoning):
  1. Oral exam
  2. Essay
  3. MCQ
• Knows (awareness):
  1. Oral exam
  2. Essay
  3. MCQ

How to assess knowledge, skills, and attitudes
                     Written exams   Clinical exams   Viva
Knowledge            ++++            +                ++
Psychomotor skills   -               ++++             -
Attitude             -               +                +

Notes on constructing written tests
• Arrange the questions in the following order:
  1. True-false
  2. Matching
  3. Multiple-choice
  4. Short answer
  5. Essay
• Order the questions from easy to difficult.
• Arrange the questions to follow the original organization of the material.

MCQ
• Stem
• Option (response)
  • Distractor: an incorrect response
  • Key: the correct response

Types of multiple-choice questions
• Only one correct option
• Best correct option
• Negative

Millman's 21 rules for MCQs
1. The stem should contain the main problem and the quantities involved.
2. Each item should be as short as possible (while keeping the sentences clear).
3. Avoid negative wording in the stem as far as possible. If it is used, underline the negative phrase or set it in bold type.
4. The stem should be written so that it states the main problem without help from the options. The options should also be independent of one another as far as possible.
5. Ask for the best answer, or use words such as "most" and "primary" (when more than one answer is relatively correct).
6. In stems that contain a blank, the omitted part to be filled in should, as far as possible, not be placed at the beginning of the sentence.
7. The linguistic difficulty of the options should be low.
8. Each option should address a single point.
9. Avoid repeating words in the options as far as possible, unless there is a logical sequence.
10. Distractors should be plausible and attractive (provided the stem measures real understanding).
11. All options should agree grammatically with the stem; for example, if the stem is plural, all options should be plural.
12. Options should be uniform in sentence length and in technical and practical difficulty.
13. The stem and the options should be consistent and homogeneous in grammar, subject content, and form.
14. Avoid a pattern in the position of the correct answer across the exam questions (the correct answers should not run a, b, c, d in order, and most correct answers should not be c).
15. Provide at least 4 options for each item.
16. Avoid wording that makes the stem and an option resemble each other.
17. Avoid using the exact wording of the textbook.
18. Avoid stems that give away the answer to a later question.
19. Options should not include one another or in effect mean the same thing.
20. Avoid specific absolute terms such as "always" and "never".
21. When asking about the understanding of a term or concept, present the term first and then offer options made up of characteristics and definitions.
Thank you for your Time
Any Questions or Comments?

M.P.L. (Minimum Pass Level)
Calculating the minimum pass level
• Pass level for each question = value assigned to the correct option / sum of the values assigned to all options
• Pass level for the exam = sum of the pass levels of the exam questions / number of questions
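The two ratios above can be written as a short Python sketch. The option values below are invented purely for illustration (the slide does not give any), and the function names `question_pass_level` and `exam_pass_level` are hypothetical.

```python
def question_pass_level(option_values, correct_index):
    """Value of the correct option divided by the sum of all option values."""
    return option_values[correct_index] / sum(option_values)


def exam_pass_level(questions):
    """Mean of the per-question pass levels.

    `questions` is a list of (option_values, correct_index) pairs.
    """
    levels = [question_pass_level(values, key) for values, key in questions]
    return sum(levels) / len(levels)


# Invented example: two four-option questions with assumed option values.
example_exam = [
    ([1, 1, 0, 0], 0),  # correct option shares the total with one other option
    ([1, 0, 0, 0], 0),  # only the correct option carries any value
]
mpl = exam_pass_level(example_exam)
print(f"Minimum pass level: {mpl:.2f}")
```

With these assumed values the per-question levels are 0.5 and 1.0, so the exam-level figure is their mean.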

Item Analysis
• The main purpose of item analysis is to improve the test.
• Analyze items to identify:
  • Potential mistakes in scoring
  • Ambiguous or tricky items
  • Alternatives that do not work well
  • Problems with time limits

Types of tests: criterion-referenced and norm-referenced tests
• Criterion-referenced tests
• Norm-referenced (competitive) tests

TYPES OF TESTS BY PURPOSE
1. Norm-referenced tests
   a. Discrimination is the most important aspect
   b. Easy items are eliminated
2. Criterion-referenced tests
   a. Discrimination is not of critical importance
   b. Items are not altered or eliminated because of difficulty

Criterion-referenced
• Before the test is administered, specific criteria are set to ensure that examinees have the minimum required knowledge and abilities; a student's success or failure on the test is judged by comparing his or her performance with these criteria.
• This method is used mostly for final examinations and for awarding certificates.
• Examples: the entrance test of a pilot school; the specialty board examination

Norm-referenced
• The results of all students are compared with one another. The pass level is set by convention or according to the scores the students obtain.
• This method is used mostly for entrance and diagnostic examinations.
• Example: the university entrance examination

Item analysis in norm-referenced tests

ITEM ANALYSIS
Item analysis of an assessment tool has 3 parts:
1. Item difficulty
2. Item discrimination
3. Distractor analysis

Steps of item analysis
1. Determine the score of each student
2. Rank the students by performance
3. Determine the upper and lower groups
4. Calculate the difficulty coefficient (index) for each question
5. Calculate the discrimination coefficient (index) for each question
6. Review the questions critically

Item analysis card
Test date: 2/11/73
Test title: Inferential statistics
Item topic: Correlation coefficient
Which of the following numbers represents the greatest correlation coefficient?
a. 0.55
b. 0.61 (* correct answer)
c. 0.49
d. 0.23

Group       a    b*   c    d    No answer   Total
Upper 25%   0    5    3    0    2           10
Lower 25%   5    2    3    0    0           10
Difficulty coefficient = 35%
Discrimination coefficient = 0.3

Tests of individual differences
• Two groups of individuals:
  • U: upper group, the 27% of highest scorers
  • L: lower group, the 27% of lowest scorers
  • U = L
• $U_p$: upper-group individuals who got the item right
• $L_p$: lower-group individuals who got the item right
• Item difficulty index: $p = \frac{U_p + L_p}{U + L}$
• Item discrimination index: $D = \frac{U_p - L_p}{U}$

Example (cont.)
• 60 students took the test.
• Item 14: among the 16 upper scorers, 5 got the item right; among the 16 lower scorers, only 1 got the item right.
$p = \frac{5 + 1}{32} = .19$
$D = \frac{5 - 1}{16} = .25$
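The Item 14 calculation can be checked with a small Python sketch. The function names `group_size_27` and `item_indices` are hypothetical, and rounding 27% of the class to the nearest whole examinee is an assumption that happens to match the slide (27% of 60 gives 16).

```python
def group_size_27(n_examinees):
    """Size of the upper (and lower) group: 27% of examinees, rounded (assumed)."""
    return round(0.27 * n_examinees)


def item_indices(upper_correct, lower_correct, group_size):
    """Return (p, D) for equally sized upper and lower groups.

    p = (U_p + L_p) / (U + L)   item difficulty index
    D = (U_p - L_p) / U         item discrimination index
    """
    p = (upper_correct + lower_correct) / (2 * group_size)
    d = (upper_correct - lower_correct) / group_size
    return p, d


# Item 14 from the slide: 60 examinees, 5 upper scorers and 1 lower scorer correct.
size = group_size_27(60)
p, d = item_indices(upper_correct=5, lower_correct=1, group_size=size)
print(f"group size = {size}, p = {p:.2f}, D = {d:.2f}")
```

The unrounded difficulty is 0.1875, which the slide reports as .19.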

ITEM ANALYSIS
Difficulty (p): 0 - 1
0 (hard) ______ 0.5 (moderate) ______ 1.0 (easy)

ITEM ANALYSIS
Example:
• 30 students in class
• 5 of the top 10 scorers got the question correct
• 3 of the bottom 10 scorers got the question correct
$p = \frac{5 + 3}{10 + 10} = \frac{8}{20} = .4$ (moderate difficulty)

ITEM ANALYSIS
Discrimination Index (D)
0 (none) ______ 0.5 (moderate) ______ 1.0 (excellent)
Negative (-): something is wrong

ITEM ANALYSIS
Example:
• 30 students in class
• 10 of the top 10 scorers got the question correct
• 2 of the bottom 10 scorers got the question correct
$D = \frac{10 - 2}{(10 + 10)/2} = \frac{8}{10} = .8$ (good discrimination)

Interpreting the item discrimination coefficient
• The larger the discrimination coefficient, the greater the discriminating power of the item; the smaller the coefficient, the lower its discriminating power.
• Consequently, the good items of a test are those with a moderate difficulty coefficient and a high discrimination coefficient.

D Index Rule of Thumb for Classroom Tests
D Index          Interpretation
40% and above    excellent discrimination
25% to 39%       acceptable discrimination
Below 25%        poor discrimination

Summary of Standards of Acceptance
• Item difficulty (p): 30% - 90%
• Item discrimination (D): 25% and above
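These rules of thumb are easy to apply mechanically. The sketch below is one possible encoding, with hypothetical function names; treating a D index of exactly 40% as "excellent" is an assumption, since the slide's bands do not say where that boundary value falls.

```python
def interpret_discrimination(d):
    """Classify a discrimination index given as a proportion (0.25 means 25%)."""
    if d >= 0.40:          # assumption: exactly 40% counts as excellent
        return "excellent"
    if d >= 0.25:
        return "acceptable"
    return "poor"


def meets_standards(p, d):
    """Acceptance standards from the slide: p in 30%-90% and D of 25% or more."""
    return 0.30 <= p <= 0.90 and d >= 0.25


# Two items worked earlier in the deck: (p = .4, D = .8) and (p = .19, D = .25).
for p, d in [(0.40, 0.80), (0.19, 0.25)]:
    print(p, d, interpret_discrimination(d), meets_standards(p, d))
```

The second item discriminates acceptably but is too difficult to meet the acceptance standards.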

Difficulty Index
• Below 0.3: too difficult
• 0.3 to 0.7: acceptable
• 0.5 to 0.6: recommended
• Above 0.7: too easy

Format                                      Ideal Difficulty
Five-response multiple-choice               70
Four-response multiple-choice               74
Three-response multiple-choice              77
True-false (two-response multiple-choice)   85

Discrimination Index
• Below 0.15: discard
• 0.15 to 0.25: check
• 0.25 to 0.35: good
• Above 0.35: excellent

Be aware
• Very easy or very difficult test items discriminate little.
• Items of moderate difficulty (60% to 80% answering correctly) are generally more discriminating.

Point-biserial correlation
• Used to correlate a dichotomous variable with a continuous variable
• In testing, used to correlate a person’s performance on an item (correct or incorrect) with their total test score
• Used as an index of item discrimination
• The point-biserial ranges from -1.00 to +1.00
• The higher, the better; as a general rule, > +0.20 is desirable
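
Point-biserial formula
$r_{pb} = \frac{\bar{X}_{correct} - \bar{X}_{incorrect}}{S_X} \sqrt{IF \, (1 - IF)}$
• $\bar{X}_{correct}$: mean on the test for people who got the item correct
• $\bar{X}_{incorrect}$: mean on the test for people who got the item incorrect
• $S_X$: standard deviation for the test
• $IF$: IF for the item (the proportion who got the item correct); $1 - IF$: 1 - IF for the item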
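A minimal Python sketch of the point-biserial calculation follows. It assumes the standard form of the formula (difference of the two group means, divided by the population standard deviation of total scores, times the square root of IF × (1 − IF)); the function name and the five-examinee data set are invented for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between one item and total test score.

    item_correct: 1/0 (or True/False) per examinee for the item.
    total_scores: total test score per examinee, in the same order.
    Raises an error if everyone (or no one) answered the item correctly,
    because one of the two group means is then undefined.
    """
    right = [s for c, s in zip(item_correct, total_scores) if c]
    wrong = [s for c, s in zip(item_correct, total_scores) if not c]
    item_facility = len(right) / len(total_scores)   # IF
    sd = pstdev(total_scores)                        # population SD of totals
    return (mean(right) - mean(wrong)) / sd * sqrt(item_facility * (1 - item_facility))


# Invented data: the three highest scorers answered the item correctly.
item = [1, 1, 1, 0, 0]
totals = [9, 8, 7, 6, 5]
r_pb = point_biserial(item, totals)
print(f"r_pb = {r_pb:.3f}")
```

Computed this way, the value equals the ordinary Pearson correlation between the 0/1 item scores and the total scores.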

Item analysis in criterion-referenced tests

Criterion-referenced tests
• Two groups of individuals:
  • U: upper group (above the criterion)
  • L: lower group (below the criterion)
• $U_p$: upper-group individuals who got the item right
• $L_p$: lower-group individuals who got the item right
• Item difficulty index: $p = \frac{U_p + L_p}{U + L}$
• Item discrimination index: $D = \frac{U_p}{U} - \frac{L_p}{L}$

Example
• A test of mastery of Istanbul geography. The outcome is that 60 individuals are “masters” and 20 failed the test.
• Item 3: 45 “masters” and 10 of those who failed got the item right.
• What are the item difficulty and item discrimination indices?
$p = \frac{45 + 10}{60 + 20} = .69$
$D = \frac{45}{60} - \frac{10}{20} = .75 - .50 = .25$
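Because the two groups differ in size here, the discrimination index compares proportions rather than raw counts. A short Python sketch of the Item 3 calculation, with a hypothetical function name and parameter names:

```python
def criterion_item_indices(masters_correct, failers_correct, n_masters, n_failers):
    """Return (p, D) for a criterion-referenced item.

    p = (U_p + L_p) / (U + L)
    D = U_p / U - L_p / L     (difference of the two proportions correct)
    """
    p = (masters_correct + failers_correct) / (n_masters + n_failers)
    d = masters_correct / n_masters - failers_correct / n_failers
    return p, d


# Item 3 from the slide: 45 of 60 masters and 10 of 20 non-masters correct.
p, d = criterion_item_indices(45, 10, 60, 20)
print(f"p = {p:.2f}, D = {d:.2f}")
```

The unrounded difficulty is 0.6875, which the slide reports as .69.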

Item analysis in criterion-referenced tests
Purpose: to determine how far individuals have attained the intended knowledge after completing the course
• Depending on the instructional objective, a question may be difficult or easy.
• The difficulty index carries a different meaning in this kind of exam.
• Very easy or very difficult questions do not necessarily need to be changed or removed (provided they have adequate validity).
• To review the questions in these tests, a pre-test and a post-test are used and their results are compared.

Pre-test and post-test results by question (+ correct, - incorrect)
          Question 1    Question 2    Question 3    Question 4    Question 5
Person    Pre   Post    Pre   Post    Pre   Post    Pre   Post    Pre   Post
H. D.     -     +       +     +       -     -       +     -       -     +
S. N.     -     +       +     +       -     -       +     -       +     +
Kh. P.    -     +       +     +       -     -       +     -       -     +
Sh. F.    -     +       +     +       -     -       +     -       -     +
D. H.     -     +       +     +       -     -       -     -       +     +
F. P.     -     +       +     +       -     -       +     -       -     -

$S = \frac{R_a - R_b}{T}$
• S = sensitivity to instructional effects
• $R_a$ = number of people who answered the question correctly after instruction
• $R_b$ = number of people who answered the question correctly before instruction
• T = number of people who answered the question both before and after instruction

The S coefficient of the best item in a criterion-referenced test equals one.
Items with an S coefficient of zero or less cannot measure the effect of instruction.
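The S index can be computed per question from paired pre-test and post-test results. The sketch below uses invented response patterns for six examinees (not the table from the slide), and the function name `instructional_sensitivity` is hypothetical.

```python
def instructional_sensitivity(pre, post):
    """S = (R_a - R_b) / T for one question.

    pre, post: lists of booleans (True = correct) for the examinees who
    answered the question both before and after instruction.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must cover the same examinees")
    t = len(pre)       # T: examinees who answered both times
    r_a = sum(post)    # R_a: correct after instruction
    r_b = sum(pre)     # R_b: correct before instruction
    return (r_a - r_b) / t


# Invented patterns for six examinees.
ideal_item = instructional_sensitivity([False] * 6, [True] * 6)     # wrong before, right after
too_easy_item = instructional_sensitivity([True] * 6, [True] * 6)   # right both times
print(ideal_item, too_easy_item)
```

An item that everyone misses before instruction and answers correctly afterwards reaches the maximum of 1, while an item answered correctly both times scores 0 and says nothing about the effect of instruction.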

Analysis of essay and performance tests
• Difficulty coefficient = mean score on the question / possible range of scores on the question = 4.2 / (6 - 1)
• Discrimination coefficient = difference between the mean scores of the upper and lower groups on the question / possible range of scores on the question = (5.3 - 2.8) / (6 - 1)

Distractor analysis
• Each distractor should attract at least one person from the weak group.
• A distractor should attract weak examinees more than strong examinees.
Thank you for your Time
Any Questions or Comments?

Two issues in using instruments...
1. Validity: the degree to which the instrument measures what it purports to measure
2. Reliability: the degree to which the instrument consistently measures what it purports to measure

Types of reliability...
1. Stability
2. Equivalence
3. Internal consistency

1. Stability (“test-retest”): the degree to which two scores on the same instrument are consistent over time

2. Equivalence (“equivalent forms”): the degree to which identical instruments (except for the actual items included) yield identical scores

3. Internal consistency (“split-half” reliability with the Spearman-Brown correction formula, Kuder-Richardson and Cronbach’s alpha reliabilities, scorer/rater reliability): the degree to which one instrument yields consistent results

RELIABILITY
• TEST-RETEST (COEFFICIENT OF STABILITY)
• PARALLEL FORM (COEFFICIENT OF EQUIVALENCE)
• INTERNAL CONSISTENCY

INTERNAL CONSISTENCY
• SPLIT-HALF METHOD (SPEARMAN-BROWN PROPHECY FORMULA)
• KUDER-RICHARDSON METHOD
• COEFFICIENT ALPHA

KR20
$KR_{20} = \frac{K}{K - 1} \cdot \frac{S_x^2 - \sum pq}{S_x^2}$
• K = number of trials or items
• $S_x^2$ = variance of scores
• p = proportion answering the item right
• q = proportion answering the item wrong
• $\sum pq$ = sum of the pq products for all K items

KR20 Example
Item   p     q     pq
1      .50   .50   .25
2      .25   .75   .1875
3      .80   .20   .16
4      .90   .10   .09
$\sum pq = 0.6875$
If Mean = 2.45 and SD = 1.2, what is KR20?
$KR_{20} = \frac{4}{3} \times \frac{1.44 - 0.6875}{1.44} = .70$
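The worked example can be reproduced with a few lines of Python. The function name `kr20` is hypothetical; the inputs are the four p values and the SD of 1.2 given on the slide.

```python
def kr20(p_values, score_variance):
    """Kuder-Richardson formula 20.

    p_values: proportion answering each item correctly.
    score_variance: variance of total test scores (SD squared).
    """
    k = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)   # q = 1 - p for each item
    return (k / (k - 1)) * (score_variance - sum_pq) / score_variance


# Values from the slide: four items, SD = 1.2 (so the variance is 1.44).
reliability = kr20([0.50, 0.25, 0.80, 0.90], score_variance=1.2 ** 2)
print(f"KR20 = {reliability:.2f}")
```

The mean of 2.45 is not needed for KR20; only the item p values and the score variance enter the formula.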

KR21
If all test items are assumed to be equally difficult, KR20 can be simplified to KR21:
$KR_{21} = \frac{K S^2 - \bar{X}(K - \bar{X})}{(K - 1) S^2}$
• K = number of trials or items
• $S^2$ = variance of the test
• $\bar{X}$ = mean of the test
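A minimal Python sketch of KR21 follows. The function name `kr21` is hypothetical, and reusing the figures from the KR20 example (K = 4, mean = 2.45, SD = 1.2) is an assumption made only to have concrete numbers; the slides do not work a KR21 example.

```python
def kr21(k, mean_score, score_variance):
    """Kuder-Richardson formula 21, which needs only K, the mean and the variance."""
    numerator = k * score_variance - mean_score * (k - mean_score)
    denominator = (k - 1) * score_variance
    return numerator / denominator


# Assumed inputs, borrowed from the KR20 example: K = 4, mean = 2.45, SD = 1.2.
value = kr21(k=4, mean_score=2.45, score_variance=1.2 ** 2)
print(f"KR21 = {value:.2f}")
```

With these inputs KR21 comes out lower than the .70 obtained with KR20, as it does whenever the items differ in difficulty.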

RELIABILITY OF CRITERION-REFERENCED TESTS
LINDMAN AND MERENDA

Rule of Thumb for Acceptable Reliability Coefficients for Classroom Tests
Reliability Coefficient   Interpretation
.70 or higher             acceptable reliability

Characteristics of the assessment method
Types of Validity:
• Face
• Content
  1. Item validity
  2. Sampling validity
• Predictive
• Concurrent
• Construct
Determined by expert judgment
Blueprinting

Types of validity...
1. Content validity
2. Criterion-related validity
3. Construct validity

1. Content validity: the degree to which an instrument measures an intended content area

3. Construct validity: a series of studies validates that the instrument really measures what it purports to measure

Forms of content validity…
…sampling validity: does the instrument reflect the total content area?
…item validity: are the items included on the instrument relevant to the measurement of the intended content area?

2. Criterion-related validity: an individual takes two forms of an instrument, which are then correlated to discriminate between those individuals who possess a certain characteristic and those who do not

Forms of criterion-related validity…
…concurrent validity: the degree to which scores on one test correlate with scores on another test when both tests are administered in the same time frame
…predictive validity: the degree to which a test can predict how well an individual will do in a future situation

Types of Validity
1. Content validity
   • Face validity
   • Sampling validity (content validity)
2. Empirical validity
   • Concurrent validity
   • Predictive validity
3. Construct validity

Item discrimination
• How well does the item separate those who know the material from those who do not?
• In LXR, measured by the point-biserial (rpb) correlation (ranges from -1 to 1).
• rpb is the correlation between item and exam performance.

Item discrimination
• A positive rpb means that those scoring higher on the exam were more likely to answer the item correctly (better discrimination).
• A negative rpb means that high scorers on the exam answered the item wrong more frequently than low scorers (poor discrimination).
• A desirable rpb correlation is +0.20 or higher.

Evaluation of Distractors
• Distractors are designed to fool those who do not know the material. Those who do not know the answer guess among the choices.
• Distractors should be equally popular (# expected = # who answered the item wrong / # of distractors).
• Distractors ideally have a low or negative rpb.
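The "equally popular" check can be sketched in Python as follows. The function name is hypothetical; the option counts are those of LXR Example 4 in this deck (A 11, B* 43, C 3, D 22, E 8).

```python
def expected_per_distractor(option_counts, correct_option):
    """Expected responses per distractor if wrong answers were spread evenly.

    expected = number who answered the item wrong / number of distractors
    """
    wrong = sum(n for option, n in option_counts.items() if option != correct_option)
    n_distractors = len(option_counts) - 1
    return wrong / n_distractors


# Counts from LXR Example 4; B is the keyed answer.
counts = {"A": 11, "B": 43, "C": 3, "D": 22, "E": 8}
expected = expected_per_distractor(counts, correct_option="B")

for option, n in counts.items():
    if option != "B":
        print(f"{option}: observed {n}, expected {expected:.0f}")
```

Here 44 examinees answered incorrectly across four distractors, so about 11 responses per distractor would be expected; C attracts far fewer and D far more, which matches the review note attached to that example.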

LXR Example 1
(* correct answer)
Option                  A*      B      C      D       E
N                       86      0      0      1       0
%                       99%     0%     0%     1%      0%
Avg % correct on exam   85.3%   0%     0%     82.0%   0%
rpb                     +.06    ---    ---    -.06    ---
Very easy item. Would probably review the alternatives to make sure they are not ambiguous and do not provide clues that they are wrong.

LXR Example 2
(* correct answer)
Option                  A      B       C*      D       E
N                       0      21      65      2       0
%                       0%     24%     74%     2%      0%
Avg % correct on exam   0%     80.7%   87.2%   78.7%   0%
rpb                     ---    -.33    +.36    -.13    ---
Three of the alternatives are not functioning well. Would review them.

LXR Example 3
(* correct answer)
Option                  A       B       C*      D       E
N                       3       1       15      5       66
%                       3%      1%      17%     6%      76%
Avg % correct on exam   83.0%   80.0%   83.4%   82.2%   86.8%
rpb                     -.07    -.09    -.15    -.12    +.23
Probably a miskeyed item. The correct answer is likely option E.

LXR Example 4
(* correct answer)
Option                  A       B*      C       D       E
N                       11      43      3       22      8
%                       13%     49%     3%      25%     9%
Avg % correct on exam   81.5%   87.4%   82.3%   84.5%   82.4%
rpb                     -.24    +.35    -.09    -.08    -.15
Relatively hard item with good discrimination. Would review alternatives C and D to see why they attract a relatively low and a relatively high number of students, respectively.

LXR Example 5
(* correct answer)
Option                  A       B*      C       D       E
N                       3       60      1       5       18
%                       3%      69%     1%      6%      21%
Avg % correct on exam   83.0%   85.3%   80.0%   82.2%   86.8%
rpb                     -.07    +.002   -.09    -.12    +.13
Poor discrimination for correct choice “B”. Choice “E” actually does a better job of discriminating. Would review the item for proper keying, ambiguous wording, proper wording of alternatives, etc. This item needs revision.