Item Response Theory


Item Response Theory
Shortcomings of Classical True
Score Model
• Sample dependence
• Limitation to the specific test situation
• Dependence on parallel forms
• Same error variance for all
Sample Dependence
• The first shortcoming of CTS is that the values of
commonly used item statistics in test development
such as item difficulty and item discrimination
depend on the particular examinee samples in
which they are obtained. The average level of
ability and the range of ability scores in an
examinee sample influence, often substantially, the
values of the item statistics.
• The difficulty index changes with the sample's ability
level, and the discrimination index differs between a
heterogeneous sample and a homogeneous one.
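A small simulation can make this sample dependence concrete. The sketch below (purely illustrative; the response model and ability values are assumptions, not fitted data) computes the classical difficulty index, the proportion of examinees answering correctly, for the same item in a low-ability and a high-ability sample:

```python
import random

random.seed(0)

def p_value(responses):
    """Classical difficulty index: proportion of examinees answering correctly."""
    return sum(responses) / len(responses)

def simulate_responses(abilities, difficulty):
    """Toy response model: an examinee answers correctly when ability plus
    noise exceeds the item difficulty (illustrative only, not a fitted model)."""
    return [1 if a + random.gauss(0, 0.5) > difficulty else 0 for a in abilities]

item_difficulty = 0.0
low_ability_sample = [-1.0] * 500   # hypothetical low-ability group
high_ability_sample = [1.0] * 500   # hypothetical high-ability group

p_low = p_value(simulate_responses(low_ability_sample, item_difficulty))
p_high = p_value(simulate_responses(high_ability_sample, item_difficulty))

# the same item looks "hard" in one sample and "easy" in the other
print(p_low, p_high)
```

The item itself has not changed; only the sample has, yet the statistic that is supposed to describe the item moves with it.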
Limitation to the Specific Test
Situation
• The task of comparing examinees who have
taken samples of test items of differing
difficulty cannot easily be handled with
standard testing models and procedures.
Dependence on the Parallel
Forms
• The fundamental concept, test reliability, is
defined in terms of parallel forms.
Same Error Variance For All
• CTS presumes that the variance of errors of
measurement is the same for all examinees.
Item Response Theory
• The purpose of any test theory is to describe how
inferences from examinee item responses and/or
test scores can be made about unobservable
examinee characteristics or traits that are
measured by a test.
• An individual’s expected performance on a
particular test question, or item, is a function of
both the level of difficulty of the item and the
individual’s level of ability.
Item Response Theory
• Examinee performance on a test can be predicted
(or explained) by defining examinee
characteristics, referred to as traits, or abilities;
estimating scores for examinees on these traits
(called "ability scores"); and using the scores to
predict or explain item and test performance.
Since traits are not directly measurable, they are
referred to as latent traits or abilities. An item
response model specifies a relationship between
the observable examinee test performance and the
unobservable traits or abilities assumed to underlie
performance on the test.
Assumptions of IRT
• Unidimensionality
• Local independence
Unidimensionality Assumption
• It is possible to estimate an examinee's ability on
the same ability scale from any subset of items in
the domain of items that have been fitted to the
model. The domain of items needs to be
homogeneous in the sense of measuring a single
ability: if the domain of items is too heterogeneous,
the ability estimates will have little meaning.
• Most of the IRT models that are currently being
applied make the specific assumption that the
items in a test measure a single, or unidimensional
ability or trait, and that the items form a
unidimensional scale of measurement.
Local Independence
• This assumption states that an examinee's
responses to different items in a test are
statistically independent. For this
assumption to be true, an examinee's
performance on one item must not affect,
either for better or for worse, his or her
responses on any other items in the test.
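Local independence has a direct computational consequence: for an examinee of given ability, the probability of a whole response pattern is simply the product of the individual item probabilities. A minimal sketch (the probabilities below are hypothetical values, not estimates from data):

```python
# Probabilities of a correct answer on three items for one examinee
# (hypothetical values):
p = [0.9, 0.7, 0.4]
pattern = [1, 1, 0]  # observed responses: correct, correct, incorrect

# Under local independence, the pattern probability is just the product
# of the per-item probabilities (p_i for a correct response, 1 - p_i
# for an incorrect one).
prob = 1.0
for p_i, u in zip(p, pattern):
    prob *= p_i if u == 1 else (1 - p_i)

print(prob)  # 0.9 * 0.7 * 0.6
```

If items were dependent, for instance if one item's text gave away another's answer, this factorization would no longer hold.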
Item Characteristic Curves
• Specific assumptions about the relationship
between the test taker's ability and his or her
performance on a given item are explicitly
stated in the mathematical formula, or item
characteristic curve (ICC).
Item Characteristic Curves
• The form of the ICC is determined by the
particular mathematical model on which it
is based. The types of information about
item characteristics may include:
• (1) the degree to which the item
discriminates among individuals of differing
levels of ability (the 'discrimination'
parameter a);
Item Characteristic Curves
• (2) the level of difficulty of the item (the
'difficulty' parameter b), and
• (3) the probability that an individual of low
ability can answer the item correctly (the
'pseudo-chance' or 'guessing' parameter c).
• One of the major considerations in the
application of IRT models, therefore, is the
estimation of these item parameters.
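The three parameters come together in the standard three-parameter logistic (3PL) model, where the probability of a correct response is P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))). A minimal sketch, with hypothetical parameter values:

```python
import math

def icc(theta, a, b, c):
    """Three-parameter logistic ICC: probability of a correct response as a
    function of ability theta, given discrimination a, difficulty b, and
    pseudo-chance parameter c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# hypothetical item parameters
a, b, c = 1.5, 0.0, 0.2

print(icc(-3.0, a, b, c))  # low ability: probability stays near c (guessing)
print(icc(b, a, b, c))     # at theta = b: halfway between c and 1, i.e. 0.6
print(icc(3.0, a, b, c))   # high ability: probability approaches 1
```

Note that the curve's lower asymptote is c, its value at θ = b is exactly halfway between c and 1, and a governs how steeply it rises at that point.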
ICC
[Figure: an ICC plotting probability of a correct response against the ability scale]
• pseudo-chance parameter c: p = 0.20 for two items
• difficulty parameter b: halfway between the pseudo-chance parameter and one
• discrimination parameter a: proportional to the slope of the ICC at the point of the difficulty parameter. The steeper the slope, the greater the discrimination parameter.
Ability Score
• 1. The test developer collects a set of observed
item responses from a relatively large number of
test takers.
• 2. After an initial examination of how well
various models fit the data, an IRT model is
selected.
• 3. Through an iterative procedure, parameter
estimates are assigned to items and ability scores
to individuals, so as to maximize the agreement, or
fit, between the particular IRT model and the test
data.
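Step 3 can be sketched very crudely for the ability side of the problem: holding item parameters fixed, find the θ that maximizes the likelihood of the observed response pattern. Real IRT software uses iterative numerical methods (e.g. Newton-Raphson) and estimates item and ability parameters jointly; the grid search and 3PL parameter values below are illustrative assumptions only:

```python
import math

def p_correct(theta, a, b, c):
    """3PL item characteristic curve."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, pattern):
    """Log-likelihood of a response pattern, using local independence."""
    ll = 0.0
    for (a, b, c), u in zip(items, pattern):
        p = p_correct(theta, a, b, c)
        ll += math.log(p) if u == 1 else math.log(1 - p)
    return ll

# hypothetical item parameters (a, b, c), assumed already estimated
items = [(1.2, -1.0, 0.2), (1.0, 0.0, 0.2), (1.5, 0.5, 0.2), (0.8, 1.5, 0.2)]
pattern = [1, 1, 1, 0]  # one examinee's observed responses

# crude grid search over the ability scale, standing in for the
# iterative procedure that operational software would use
grid = [i / 100 for i in range(-400, 401)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, items, pattern))
print(theta_hat)
```

The resulting θ̂ is the "ability score" of step 3 for this examinee, on the same scale as the item difficulty parameters.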
Item Information Function
• The limitations on CTS theory approaches to
precision of measurement are addressed in the IRT
concept of information function. The item
information function refers to the amount of
information a given item provides for estimating
an individual's level of ability, and is a function of
both the slope of the ICC and the amount of
variation at each ability level.
• The information function of a given item will be at
its maximum for individuals whose ability is at or
near the value of the difficulty parameter.
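For the 3PL model, the item information function has the standard closed form I(θ) = a² · (Q/P) · ((P − c)/(1 − c))², where P is the ICC probability and Q = 1 − P. A minimal sketch with hypothetical item parameters, showing that information peaks near the difficulty parameter b:

```python
import math

def p_correct(theta, a, b, c):
    """3PL item characteristic curve."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """3PL item information: a^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = p_correct(theta, a, b, c)
    q = 1 - p
    return a**2 * (q / p) * ((p - c) / (1 - c))**2

# hypothetical item parameters
a, b, c = 1.5, 0.0, 0.2

# information is largest for examinees near the item's difficulty b,
# and falls off for abilities well above or below it
print(item_information(b, a, b, c))
print(item_information(b + 2, a, b, c))
print(item_information(b - 2, a, b, c))
```

Larger a yields a sharper, taller information peak, which is why highly discriminating items measure most precisely in a narrow ability range.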
Item Information Function
• [Figure: item information functions for three
illustrative items] Of the three items shown:
• (1) provides the most information about
differences in ability at the lower end of the ability
scale.
• (2) provides relatively little information at any
point on the ability scale.
• (3) provides the most information about
differences in ability at the high end of the ability
scale.
Test Information Function
• The test information function (TIF) is the sum of
the item information functions, each of which
contributes independently to the total, and is a
measure of how much information a test provides
at different ability levels.
• The TIF is the IRT analog of CTS theory
reliability and the standard error of measurement.
Item Bank
• If there is a need for regular test administration and
analysis, the construction of an item bank may be
considered.
• An item bank is not a simple collection of test items
stored in raw form; rather, items are stored with
parameters assigned on the basis of CTS or IRT
models.
• An item bank should also have a data-processing
system that assures the steady quality of the data in
the bank (describing, classifying, accepting, and
rejecting items).
Specifications in CTS Item Bank
• Form of items
• Type of item parts
• Describing data
• Classifying data
Form of Items
• Dichotomous
  Listening comprehension
    Statement + question + choices
    Short conversation + question + choices
    Long conversation / passage + some questions + choices
  Reading comprehension
    Passage + some questions + choices
    Passage + T/F questions
  Syntactic knowledge / vocabulary
    Question stem with blank/underlined parts + choices
  Cloze
    Passage + choices
Form of Items
• Nondichotomous
  Listening comprehension
    Dictation
      Dictation passage with blanks to be filled
Describing data
• Ability measured
• Difficulty index
• Discrimination
• Storage code
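The describing-data fields above map naturally onto a record structure in an item-bank system. A minimal sketch (the record layout, storage-code scheme, and acceptance thresholds are illustrative assumptions, not a prescribed standard):

```python
from dataclasses import dataclass

@dataclass
class BankItem:
    """One record in a CTS-style item bank; fields follow the
    describing-data list above (storage-code scheme is hypothetical)."""
    storage_code: str      # e.g. "RC-0042"
    ability_measured: str  # e.g. "reading comprehension"
    difficulty: float      # classical difficulty index (p-value), 0..1
    discrimination: float  # e.g. a point-biserial correlation

bank = [
    BankItem("RC-0042", "reading comprehension", 0.55, 0.41),
    BankItem("LC-0107", "listening comprehension", 0.72, 0.33),
    BankItem("GR-0015", "syntactic knowledge", 0.93, 0.12),
]

# a simple acceptance rule a data-processing system might apply
# (illustrative thresholds): reject items that are too easy/hard
# or that discriminate poorly
accepted = [it for it in bank
            if 0.2 <= it.difficulty <= 0.8 and it.discrimination >= 0.3]
print([it.storage_code for it in accepted])
```

Routines like this acceptance filter are one way the bank's data-processing system can carry out the describing, classifying, accepting, and rejecting of items mentioned earlier.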