
Lesson Seven
Reliability
Contents

Definition of reliability
Indication of reliability: the reliability coefficient
Ways of obtaining a reliability coefficient:
  Alternate/Parallel forms
  Test-retest
  Split-half
  (Inter-)rater (or scorer) reliability
Two ways of testing reliability
How to make sure the test is reliable
Definition of Reliability

“The consistency of measures across different times, test forms, raters, and other characteristics of the measurement context” (Bachman, 1990, p. 24).
The accuracy or precision with which a test measures something; the consistency, dependability, or stability of test results.
Reliability coefficient (r)

The reliability coefficient quantifies the reliability of a test and allows us to compare the reliability of different tests.
0 ≤ r ≤ 1 (the ideal is r = 1, which means the test gives precisely the same results for particular testees regardless of when it happens to be administered).
If r = 1, the test is 100% reliable.
A good achievement test: r ≥ .90.
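In practice, r is usually estimated as a correlation between two sets of scores from the same testees. A minimal Python sketch, with made-up scores; the same function fits the parallel-forms and test-retest methods described next:

# Estimate r as the Pearson correlation between two sets of scores
# from the same testees (all scores here are hypothetical).
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

form_a = [78, 85, 62, 90, 71]  # scores on one form/administration
form_b = [75, 88, 60, 93, 70]  # the same testees on the other
print(round(pearson_r(form_a, form_b), 2))  # 0.99: close to 1, good reliability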
Alternate/Parallel forms: the most stringent method

Two forms, two administrations.
Equivalent forms (i.e., different items testing the same topics) are taken by the same test takers on different days.
If r (computed as in the sketch above) is high, the test is said to have good reliability.
[Diagram: one test plan yields two equivalent forms, Form A and Form B.]
Test-retest

One form, two administrations.
The same test is administered to the same testees with a short time lag, and r is then calculated between the two sets of scores.
Appropriate for highly speeded tests.
[Diagram: one test (Test A) is administered twice, as Trial 1 and Trial 2.]
Split-half (Spearman-Brown procedure)

One test, one administration.
Split the test into halves (e.g., odd questions vs. even questions) to form two sets of scores, then correlate the halves; see the sketch after this slide.
A measure of internal consistency (also: KR-20, KR-21).
[Diagram: six questions split into two halves, Q1–Q3 and Q4–Q6.]
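A minimal sketch of the split-half procedure, assuming made-up right/wrong item responses. The Spearman-Brown correction steps the half-test correlation up to a full-test estimate, r_full = 2·r_half / (1 + r_half); KR-20 is sketched as well since the slide mentions it. statistics.correlation needs Python 3.10+:

import statistics

def spearman_brown(r_half):
    """Step the half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)

def kr20(matrix):
    """Kuder-Richardson 20: internal consistency for right/wrong items."""
    k = len(matrix[0])                        # number of items
    totals = [sum(row) for row in matrix]     # each testee's total score
    var_total = statistics.pvariance(totals)  # variance of total scores
    pq = sum(p * (1 - p)
             for p in (sum(row[i] for row in matrix) / len(matrix)
                       for i in range(k)))    # sum of item p*q values
    return (k / (k - 1)) * (1 - pq / var_total)

# Hypothetical responses: one row per testee, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
odd_half  = [sum(row[0::2]) for row in responses]  # Q1, Q3, Q5
even_half = [sum(row[1::2]) for row in responses]  # Q2, Q4, Q6
r_half = statistics.correlation(odd_half, even_half)  # Python 3.10+
print(round(spearman_brown(r_half), 2))  # ~0.72
print(round(kr20(responses), 2))         # ~0.81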
(Inter-)rater (or scorer) reliability

Needed for subjective tests (e.g., writing and oral tests) when two or more independent raters are involved in scoring.
Raters should be trained before scoring.
Compare the scores given to the same testees by different raters: if r is high, there is inter-rater reliability.
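As an illustration, inter-rater reliability can be checked the same way, by correlating the scores two raters gave the same scripts (the ratings below are made up; statistics.correlation needs Python 3.10+):

import statistics

# Hypothetical scores two independent raters gave the same five essays.
rater_1 = [14, 12, 17, 9, 15]
rater_2 = [15, 11, 18, 10, 14]
r = statistics.correlation(rater_1, rater_2)  # Python 3.10+
print(f"inter-rater r = {r:.2f}")  # 0.94: high, so the raters agree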
Two ways of testing reliability

Examine the amount of variation: the Standard Error of Measurement (SEM). The smaller, the better.
Calculate the reliability coefficient, r. The bigger, the better.
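The two are linked by the standard formula SEM = SD × √(1 − r). A minimal sketch with made-up numbers:

import math

sd = 8.0   # hypothetical standard deviation of the test scores
r = 0.91   # hypothetical reliability coefficient
sem = sd * math.sqrt(1 - r)  # SEM = SD * sqrt(1 - r)
print(f"SEM = {sem:.2f}")    # 2.40: a testee's true score is likely within
                             # about +/- 1 SEM of the observed score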
How to make sure the test is reliable (for teachers)

Take enough samples of behavior.
Try to avoid ambiguous items.
Provide clear and explicit instructions.
Make sure the test is well laid out.
Provide uniform and non-distracting conditions.
Try to use objective tests.
Try to use direct tests.
Have independent, trained raters.
Identify test takers by number, not by name.
Use multiple independent scorings for subjective tests.
(Hughes, 1989, pp. 36-41).