LG675_5 - University of Essex


LG675
Session 5: Reliability II
Sophia Skoufaki
[email protected]
15/2/2012

• What is item analysis?
• How can we conduct item analysis for
  a) norm-referenced data-collection instruments?
     (only the statistical analyses available through SPSS)
  b) criterion-referenced data-collection measures?
• How can we examine the reliability of criterion-referenced data-collection instruments?
  (We will work with some typical scenarios.)
2
Item analysis: definition
• The kind of reliability analysis used to identify items in a data-collection instrument (e.g., questions in a questionnaire, tasks/questions in a language test) which do not measure the same thing as the other items.
• It is conducted on data from the pilot study. The aim is to improve our data-collection instrument by removing any irrelevant items.
3

NB: This item analysis is different from the item analysis (also called 'analysis by items') which is part of data analysis in experiments. That analysis is done to ensure that the findings of an experiment are generalisable not only to people with similar characteristics to those who participated in the experiment but also to items similar to those in the experiment (Clark 1973).

If you plan to conduct an experiment, see Phil's discussion of this term and his SPSS how-to:
http://privatewww.essex.ac.uk/~scholp/statsquibs.htm#item
4
Reminder: Classification of data-collection instruments according to the basis of grading

Data-collection instruments divide into:
• norm-referenced
• criterion-referenced
5
Item analysis for norm-referenced measures

According to the traditional approach to item analysis, items are examined in terms of:
1. Item facility: a measure of how easy an item is. High facility means an easy item.
• An easy way to assess it is to look at the percentage of people who answer each item correctly (see the sketch below).
• The data-collection instrument as a whole should have a facility of about 0.5, and most items should have a facility around that level.
6
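(A minimal sketch, in Python rather than SPSS, of item facility computed as a proportion of correct answers. The scores are invented for illustration; 1 = correct, 0 = incorrect.)

# Item facility (IF) = the proportion of people who answer an item correctly.
scores = {
    "item1": [1, 1, 1, 0, 1, 1],  # 5 of 6 people answered correctly
    "item2": [1, 0, 1, 0, 0, 1],
    "item3": [0, 0, 1, 0, 0, 0],
}

for item, answers in scores.items():
    facility = sum(answers) / len(answers)
    print(f"{item}: IF = {facility:.2f}")

# Output: item1: IF = 0.83 (easy), item2: IF = 0.50 (near the 0.5 target),
# item3: IF = 0.17 (hard)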
Understanding item facility
• This is an activity from http://www.caacentre.ac.uk/dldocs/BP2final.pdf.
• Input the file 'three_tests_IF.sav' into SPSS. This file shows the item facility for each question in three tests.
• Examine the item facilities in each test and try to spot problematic item facilities (a sketch for flagging them automatically follows below).
• Which test seems to be the best, in that it contains items which will be able to distinguish among students of various proficiency levels?
7
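(A supplementary sketch, assuming the common rule of thumb that items with facility below about 0.3 or above about 0.7 discriminate poorly; the IF values and thresholds here are illustrative, not those in 'three_tests_IF.sav'.)

# Flag items whose facility falls outside a rule-of-thumb band.
LOW, HIGH = 0.3, 0.7  # conventional band, not a hard rule

test_a = {"q1": 0.95, "q2": 0.55, "q3": 0.12, "q4": 0.48}  # invented IFs

for item, if_value in test_a.items():
    if if_value < LOW:
        verdict = "too hard"
    elif if_value > HIGH:
        verdict = "too easy"
    else:
        verdict = "ok"
    print(f"{item}: IF = {if_value:.2f} -> {verdict}")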
Item analysis for norm-referenced measures (cont.)

2. Item discrimination: a measure of how different performance on an item is from performance on the other items.
• It can be assessed via a correlation between the item's score and the score on the whole measure.
• It can also be assessed via Cronbach's α if item deleted (see the sketch below).
8
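(A minimal sketch, assuming dichotomously scored items and invented data: the corrected item-total correlation, which correlates each item with the total of the remaining items, and Cronbach's α recomputed with that item deleted. SPSS reports both in the Item-Total Statistics table of its Reliability Analysis output.)

from statistics import pvariance, mean

def cronbach_alpha(items):
    """items: a list of per-item score lists, all of equal length."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_var = sum(pvariance(it) for it in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

items = [
    [1, 1, 1, 0, 1, 0, 1, 1],  # item 1
    [1, 1, 0, 0, 1, 0, 1, 0],  # item 2
    [0, 1, 0, 1, 0, 1, 0, 0],  # item 3: the suspect item
]

for i, item in enumerate(items):
    rest = [it for j, it in enumerate(items) if j != i]
    rest_total = [sum(p) for p in zip(*rest)]
    r = pearson(item, rest_total)     # corrected item-total correlation
    alpha_wo = cronbach_alpha(rest)   # Cronbach's alpha if item deleted
    print(f"item {i + 1}: r = {r:+.2f}, alpha if deleted = {alpha_wo:.2f}")

(An item with a low or negative correlation, whose deletion raises α, is a candidate for removal.)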
SPSS: Item analysis for norm-referenced measures

• Do the activity described in the box on pages 26-27 of Phil's 'Simple statistical approaches to reliability and item analysis' handout.
• Then do the activity described in the box on pages 29-30.
• Also calculate item facility as a percentage of correct answers.
9
Item analysis for criterion-referenced measures (Brown 2003)

• Difference Index (DI): item facility in the post-test minus item facility in the pre-test.
• B-index: item facility for students who passed the test minus item facility for those who failed it.
(A worked sketch of both indices follows below.)
10
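(A minimal sketch of both indices with invented scores; 1 = correct, 0 = incorrect. The pass/fail classification is assumed for illustration.)

def facility(scores):
    return sum(scores) / len(scores)

pre_test  = {"s1": 0, "s2": 0, "s3": 1, "s4": 0, "s5": 0}  # before instruction
post_test = {"s1": 1, "s2": 1, "s3": 1, "s4": 0, "s5": 1}  # after instruction
passed    = {"s1": True, "s2": True, "s3": True, "s4": False, "s5": False}

# Difference Index: post-test IF minus pre-test IF
di = facility(post_test.values()) - facility(pre_test.values())

# B-index: IF among students who passed minus IF among those who failed
pass_scores = [post_test[s] for s, p in passed.items() if p]
fail_scores = [post_test[s] for s, p in passed.items() if not p]
b_index = facility(pass_scores) - facility(fail_scores)

print(f"DI = {di:.2f}")            # DI = 0.60
print(f"B-index = {b_index:.2f}")  # B-index = 0.50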
SPSS: Item analysis for criterion-referenced measures

• This is an activity from Brown (2003), who used Excel to calculate the DI and B-index on two data sets.
• Download the article as a PDF file from http://jalt.org/test/bro_18.htm.
• Input the data from page 20 in SPSS.
• Calculate the DI via Transform… Compute.
11
Reliability of criterion-referenced measures

There are two basic approaches:
1. Threshold loss agreement
This approach examines the proportion of people who consistently did better than the cut-off point ('masters') and the proportion of those who consistently did worse ('non-masters'). It uses a test-retest method.
Example statistic: Cohen's kappa (AKA the 'kappa coefficient'); a worked sketch follows the table figure below.
12
The structure of Cohen's kappa table in this scenario
[Figure from Brown and Hudson 2002: 171: a 2x2 table cross-classifying each person as master or non-master on each of the two test administrations.]
13
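(A minimal sketch of Cohen's kappa computed from a 2x2 table with the structure above; the cell counts are invented, not Brown and Hudson's.)

#                      administration 2
#                      master    non-master
# admin 1 master          a          b
# admin 1 non-master      c          d
a, b, c, d = 18, 3, 2, 7   # invented counts of people
n = a + b + c + d

p_observed = (a + d) / n   # proportion classified the same way twice

# Chance agreement: the product of the marginal proportions per category.
p_chance = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)

# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"kappa = {kappa:.2f}")  # kappa = 0.62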
Reliability of criterion-referenced measures (cont.)

2. Squared error loss agreement
These statistical tests are like the previous ones, but they also assess how consistent the degree of mastery/non-mastery is.
Example: the phi(lambda) dependability index (not available in SPSS; see Brown 2005: 206-207, and the sketch below).
14
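(A minimal sketch of phi(lambda), following the single-administration short-cut formula presented in Brown (2005: 206-207):
phi(lambda) = 1 - (1/(k-1)) * (M(1-M) - V) / ((M - lambda)^2 + V),
where k is the number of items, M and V are the mean and variance of the examinees' proportion scores, and lambda is the cut-point as a proportion. The data and cut-point are invented, and the population variance is used here; check Brown (2005) for the exact computational form.)

from statistics import mean, pvariance

proportion_scores = [0.9, 0.8, 0.85, 0.6, 0.7, 0.95, 0.5, 0.75]  # invented
k = 20      # number of items on the (hypothetical) test
cut = 0.7   # lambda: 70% counts as mastery

M = mean(proportion_scores)
V = pvariance(proportion_scores)

phi_lambda = 1 - (1 / (k - 1)) * (M * (1 - M) - V) / ((M - cut) ** 2 + V)
print(f"phi(lambda = {cut}) = {phi_lambda:.3f}")  # about 0.632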
SPSS: Assessing the reliability of a criterion-referenced measure through Cohen's kappa

• Go to page 172 at http://books.google.co.uk/books?id=brDfGghl3qIC&pg=PA169&source=gbs_toc_r&cad=3#v=onepage&q&f=false
• Input the data in SPSS.
• Conduct the kappa test.
15
Next week
• Statistics for validity assessment
• ANOVA with one independent variable
16
References

• Brown, J.D. 2003. Criterion-referenced item analysis (the difference index and B-index). Shiken: JALT Testing & Evaluation SIG Newsletter 7(3), 18-24.
• Brown, J.D. 2005. Testing in Language Programs: A Comprehensive Guide to English Language Assessment. New York: McGraw-Hill.
• Clark, H.H. 1973. The language-as-fixed-effect fallacy. Journal of Verbal Learning and Verbal Behavior 12, 335-359.
• Scholfield, P. 2011. Simple statistical approaches to reliability and item analysis. LG675 handout. University of Essex.
17
Suggested readings

On the statistics used for item analysis:
• Brown, J.D. 2003. Criterion-referenced item analysis (the difference index and B-index). Shiken: JALT Testing & Evaluation SIG Newsletter 7(3), 18-24.
• Scholfield, P. 2011. Simple statistical approaches to reliability and item analysis. LG675 handout. University of Essex. (pp. 24-33)

On the statistics used to assess the reliability of criterion-referenced measures:
• Brown, J.D. 2005. Testing in Language Programs: A Comprehensive Guide to English Language Assessment. New York: McGraw-Hill. (chapter 9)
• Brown, J.D. and Hudson, T. 2002. Criterion-Referenced Language Testing. Cambridge: Cambridge University Press. (chapter 5)
18