Transcript Document

Technology to mark scripts
Scanners read the multiple-choice responses and store the data in a database. The software then scores the responses.
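A minimal sketch of the scan-and-score flow just described, assuming a simple answer-key lookup; the key, the response format, and the scoring rule are illustrative, not details of any particular scanning system:

```python
# Minimal sketch of the scan-and-score flow. The answer key, the response
# format, and the scoring rule are illustrative assumptions.

ANSWER_KEY = {1: "B", 2: "D", 3: "A"}  # item number -> correct option

def score_script(responses):
    """Count how many stored responses match the answer key."""
    return sum(1 for item, choice in responses.items()
               if ANSWER_KEY.get(item) == choice)

# One candidate's responses, as they might be read back from the database.
print(score_script({1: "B", 2: "C", 3: "A"}))  # prints 2
```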
Technology to mark scripts
ETS
The Criterion℠ Online Essay Evaluation Service automatically evaluates essay responses using e-rater and the Critique writing analysis tools.
• E-rater® gives holistic scores for essays.
• Critique™ provides real-time feedback about grammar, usage, mechanics and style, and organization and development.
• C-rater™ offers automated analysis of conceptual information in short-answer, free responses.
FILE 4 examens-leacock.pdf (highlighted pages)
Using automated marking
• multiple-choice marking done by machine
• automated short-answer marking (character recognition); a sketch follows this list
• process-based marking (increasing intelligence in the marking engine and redesign of the tests)
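Where automated short-answer marking is rule-based, the engine can normalise the recognised text and match it against accepted answers. A minimal sketch, assuming hypothetical items and accepted-answer lists:

```python
# Hedged sketch of rule-based short-answer marking applied after character
# recognition. The accepted-answer lists are illustrative assumptions, not
# a published mark scheme.

ACCEPTED = {
    "q1": {"photosynthesis"},
    "q2": {"7", "seven"},
}

def mark_short_answer(item, recognised_text):
    """Normalise the recognised text and test it against accepted answers."""
    return recognised_text.strip().lower() in ACCEPTED.get(item, set())

print(mark_short_answer("q2", " Seven "))  # True
```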
Technology to mark scripts
E-marking by human raters
What is e-marking?
The story of the Assessment and Qualifications Alliance (AQA)
First e-marking in January 2005 (83 GCSE papers)
General / Expert marking (overseas?)
Technology to mark scripts
AQA
“This summer we intend to use electronic marking for 2 million scripts involving 2812 examiners in 102 GCSE papers and eight GCE papers. A full list is provided in the attached update to the questions and answers initially provided last spring.” (2007)
(My personal involvement – describe how it works: table leaders, seeded items, micro-management, etc.)
Technology to mark scripts
AQA
An alternative:

Automated marking: key in all short responses and then decide which are correct (double entry); a sketch follows.
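A minimal sketch of the double-entry check, assuming two operators key the same responses independently; the response values are illustrative:

```python
# Sketch of double-entry verification: two operators key the same short
# responses independently, and any disagreement is flagged for re-keying.

def keying_mismatches(first, second):
    """Return the positions where the two independent keyings disagree."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]

entry_a = ["cat", "7", "oxygen"]
entry_b = ["cat", "1", "oxygen"]
print(keying_mismatches(entry_a, entry_b))  # [1] -> re-key response 1
```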

Electronic mark collection
E-marking research
AQA: we have yet to establish that examiners can assess long pieces of continuous writing as effectively on screen as on paper.
Using on-screen marking
• Real-time quality marking, through early detection and remediation of aberrant marking (a sketch follows this list);
• random distribution of scripts and items to markers;
• specialisation of markers in a limited number of items;
• reduction of clerical errors, because the computer sums the marks;
• elimination of paper distribution;
• greater security (no papers lost or compromised); and
• item-level data available for diagnosis / reporting.
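One way the early-detection bullet could work in practice, using the seeded items mentioned earlier: scripts with known reference marks are mixed into each marker's queue and deviations are monitored. A hedged sketch, in which the tolerance value is an illustrative assumption:

```python
# Sketch of aberrant-marking detection with seeded items: scripts carrying
# known reference marks are mixed into a marker's queue, and a marker whose
# mean absolute deviation from the reference exceeds a tolerance is flagged
# for remediation. The tolerance value is an illustrative assumption.

TOLERANCE = 1.0  # maximum acceptable mean absolute deviation, in marks

def flag_aberrant(marker_marks, reference_marks):
    deviations = [abs(m - r) for m, r in zip(marker_marks, reference_marks)]
    return sum(deviations) / len(deviations) > TOLERANCE

print(flag_aberrant([4, 5, 2], [4, 3, 2]))  # False: mean deviation ~0.67
```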
Using on-screen marking
Easy to use
Less bureaucracy
(The markers have been trained)
Using on-screen marking

Remote standardisation: in face-to-face meetings, all examiners must work through the standardisation material at the pace determined by the Principal Examiner. The online system allows each individual to work at their own pace, revisiting those aspects they find most challenging. Instant feedback at individual item level is provided, and the examiner can refer to it at any point during the marking period.

E-marking = on-screen marking
Technology to mark scripts
ETS Online Scoring Network (OSN)
ETS employs raters to evaluate essays online. Prospective raters are trained to apply scoring criteria and must complete a certification test. Raters begin each session by practice scoring at least one set of essays. If questions arise during calibration or actual scoring, the rater and scoring leader can telephone each other to discuss issues.
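A sketch of the kind of calibration gate this description implies: the rater scores a pre-scored set and proceeds only if agreement with the reference scores is high enough. The 70% exact-agreement threshold is an assumption for illustration:

```python
# Sketch of a session-start calibration gate: the rater scores a pre-scored
# essay set and may proceed only if agreement with the reference scores is
# high enough. The 70% exact-agreement threshold is an illustrative
# assumption, not a documented OSN rule.

REQUIRED_AGREEMENT = 0.7

def passes_calibration(rater_scores, reference_scores):
    exact = sum(1 for r, k in zip(rater_scores, reference_scores) if r == k)
    return exact / len(reference_scores) >= REQUIRED_AGREEMENT

print(passes_calibration([4, 3, 5, 4], [4, 3, 4, 4]))  # True: 3/4 agree
```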
Quality of marking
“Quality control is a formal systematic process designed to ensure that expected quality standards are achieved during scoring, equating, and reporting of test scores.”
Allalouf, A. (2007). Quality control procedures in the scoring, equating, and reporting of test scores. Educational Measurement: Issues and Practice, 26(1), 36–43.
Quality of marking
The quotation below illustrates the implications of time pressure on staff. It is attributed to Joanne M. Lenke, the former president of Harcourt, referring to a huge error of judgment that affected some 257,000 examinees in 1999.
“[The error] might have been caught if the company had more than two days to analyze data from 4.3 million test forms . . . before Wednesday’s deadline for posting results on the Internet.”
(Colvin & Groves, 1999; Rhoades & Madaus, 2003)
Quality of marking
In May 2001, the New York Times published an article with the title “Speed over accuracy?”
Quality of marking
USA:
On September 15, 1999, days after school started, McGraw-Hill admitted that the same error that affected Tennessee scores had also incorrectly lowered New York City students’ percentile rankings at the lower end of the scale. Thus, 8,668 children whose correct scores were above the cutoff had mistakenly been compelled to go to summer school.
Quality of marking
Scotland:
In the summer of 2000, exam scores arrived late, and inaccuracies were reported. By mid-August, over 5,000 potential university students had received incomplete or inaccurate results. University placements (admittance decisions) were down by almost 9%, … students might not be granted their first choice.
Quality of marking
In 2000, Pearson's NCS subsidiary made a scoring error on the Minnesota high school graduation test that wrongly denied 50 students the diplomas they had earned. After a Minnesota court concluded that "the error was preceded by years of quality control problems at NCS" due to a "culture emphasizing profitability and cost-cutting," Pearson created a settlement pool of $7 million for test takers and paid $16,000 to each of the seniors who missed graduation.
Schaeffer, R. (2006). Testimony of Robert Schaeffer before the New York Senate Higher Education Committee. Retrieved from http://www.fairtest.org/univ/SAT_Error_testimony.html
Consistency of marking
Double marking: Cyprus, Slovenia, Greece, Australia,
New Zealand, Russia
In England, PricewaterhouseCoopers (2005) reported that the awarding bodies double-mark an average of 12% of scripts as part of their quality assurance procedures.
Consistency of marking
• Disagreement between markers: why it happens.
Martin, D. (2005). Report on the performance of the New Zealand Qualifications Authority in the delivery of secondary school qualifications. State Services Commission. Retrieved from http://www.ssc.govt.nz/upload/downloadable_files/part-two-reviewnzqa.pdf on 13/2/2007. (code 596, check page 49)
• Discrepancy resolution techniques
Johnson, R. L., Penny, J., Fisher, S., & Kuhs, T. (2003). Score resolution: an investigation of the reliability and validity of resolved scores. Applied Measurement in Education, 16(4), 299–322.
Consistency of marking
The QCS Test (Queensland Core Skills Test)
Writing Task paper: each response is marked at least three times by different markers working independently; marker consistency is monitored.
Short Response paper: each candidate response is marked at least twice by different markers working independently (referee marking; a sketch of one resolution rule follows); marker consistency is monitored.
Multiple Choice papers are marked electronically.
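A sketch of one common resolution rule for double marking, not necessarily the QCS procedure: marks that agree within a threshold are averaged, otherwise a referee's mark decides. The threshold and the averaging rule are illustrative assumptions:

```python
# Sketch of double marking with referee resolution: if two independent
# marks agree within a threshold they are averaged; otherwise a third,
# independent referee mark decides. The threshold and the averaging rule
# are illustrative assumptions, not the documented QCS procedure.

THRESHOLD = 1  # maximum acceptable difference between the two marks

def resolve(mark_a, mark_b, referee_mark):
    if abs(mark_a - mark_b) <= THRESHOLD:
        return (mark_a + mark_b) / 2
    return referee_mark

print(resolve(5, 6, referee_mark=5))  # 5.5: marks agree, averaged
print(resolve(3, 6, referee_mark=4))  # 4: referee decides
```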
Assessment for learning
In classrooms where assessment for learning is practiced, students know at the outset of a unit of study what they are expected to learn.
Assessment for learning
The teacher will work with the student to understand what she or he knows about the topic, as well as to identify any gaps or misconceptions. Assessment for learning occurs at all stages of the learning process.
Demonstration of Reports