Craft and science:
European and American
traditions in assessment
AEA-Europe 5th annual conference,
Budapest, Hungary; November 2004
Dylan Wiliam, ETS
What can we say about the
differences?
• Nothing really
• If you’re not confused, you don’t really
understand the situation
• USA and Europe each have ~50 systems
• Variability within is greater than difference
between
• But here goes…
Ludicrously simplistic comparison
European                American
Craft                   Science
Achievement             Effort, Aptitude
Public                  Private
Open                    Closed
Constructed-response    Multiple-choice
Accepting authority     Litigious
Education in America
• Highly localized
– 50 states, 17 000 school districts, 100 000 schools
– Education controlled and funded locally
– High proportion of offices filled by election
– Residential segregation
– Huge discrepancies in per-student funding
• Lower Merion school district: $19600 per year
• Rural Arizona: $3000 per year
• Grade-based system, but not operated as such
• Structure of ethnicity and class quite different
from Europe
Assessment in schools
• European tradition:
– Examination-based
– Synoptic
– Focus on achievement
• American tradition:
– Coursework-based
– Component-based
– Focus on effort
• Correlation of IQ with school achievement:
– UK: ~0.70
– US: ~0.45
Quality in education
• US 20th century industrial success
– Based on a mass education system of
moderate quality
• European emphasis:
– Elite education of high quality
– Scaled up without substantial loss of quality
Assessment for accountability
• Demand for accountability
– (don’t let the fox guard the chicken coop)
• Introduction of accountability tests in US
– largely by private-sector publishers
• Profit margin on tests: 0 to 5%
• Profit margin on textbooks: ~40%
High-stakes assessment
• Assessment can be high-stakes for
– Students
– Teachers and schools
• In Europe, high-stakes assessment of
students has been used to evaluate teachers
• In the USA assessment of schools has been
broadened to matter for students
Education and the law
• Key issues:
– Precision in law-making
• Constitution, Bill of Rights,
– Availability of appeals and remedies
• Litigation, ‘Grade court’
– Recovery of defendants’ costs
• Possible in most of Europe
• Not possible in most states in the USA
Entry to higher education
• Key issues
– Selection
– Placement
• Combined in most European countries
• Separate in the USA
Origins of intelligence assessment
• British empiricist tradition
– Knowledge comes from experience (outside)
– Tests of sensory acuity (Galton, Cattell)
– Innate differences in acuity of individuals
– Focus on measurement
• Continental rationalist tradition
– Knowledge comes from reasoning (inside)
– Tests of reasoning (Binet)
– All students share the same trajectory, at different speeds
– Focus on classification
The big test
• Binet & Simon’s ideas brought to USA by
Goddard
• US army recruits 3m new soldiers in 1917
• Yerkes proposes testing for the ‘feebleminded’
• Terman proposes testing all recruits
• Otis develops the multiple choice format
• 1 726 966 recruits tested by January 1919
• No use made of results
• But mass group testing is here to stay
The development of the SAT
• 1920: College Entrance Examination
Board sets up a commission:
to investigate and report on general intelligence examinations
and other new types of examinations offered in several secondary
school subjects
• After several more commissions …
• Scholastic Aptitude Test administered to
College Board applicants in 1926
Technical issues in the SAT
• Key issues:
– Interpretability over time
• All 82 versions of the SAT administered
between April 1942 and May 1969 were
equated to the original norm group taking the
test in 1941
– Legal defensibility
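The slide does not say which equating method was used. As a rough illustration of what "equating to the original norm group" means, the sketch below uses linear equating, one textbook approach chosen here for simplicity. The function name and all scores are hypothetical, and the two groups are assumed to be of equivalent ability.

```python
from statistics import mean, pstdev

def linear_equate(score, new_form_scores, reference_scores):
    """Map a raw score on a new form onto the reference scale.

    Linear equating matches the mean and standard deviation of the new
    form to those of the reference (norm) group. This is an assumed,
    simplified method, not necessarily the one used for the SAT.
    """
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_ref, sd_ref = mean(reference_scores), pstdev(reference_scores)
    return mu_ref + sd_ref * (score - mu_new) / sd_new

# Hypothetical data: raw scores on a new form, scaled scores of the norm group.
new_form = [10, 20, 30]
norm_group = [400, 500, 600]
equated = linear_equate(30, new_form, norm_group)
print(round(equated, 1))
```

A raw score one standard deviation above the new form's mean is reported as one standard deviation above the norm group's mean, so scores keep the same meaning from one version to the next.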
Reliability
• Consistency under changes in
– occasion (test-retest)
– scorer (mark-remark)
– items (question-requestion)
[Slide table with columns for Europe and USA; cell entries not recovered]
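The three kinds of consistency listed above are usually summarised by coefficients. A minimal sketch follows, assuming a Pearson correlation for test-retest and mark-remark consistency and Cronbach's alpha for consistency across items. These are common choices that the slide itself does not name, and the function names and data are hypothetical.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two lists of scores for the same students."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(item_scores):
    """Internal consistency; item_scores holds one list of item marks per student."""
    k = len(item_scores[0])
    item_vars = [pvariance([s[i] for s in item_scores]) for i in range(k)]
    total_var = pvariance([sum(s) for s in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical marks for five students.
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 14, 10, 17, 15]    # same test, later occasion
second_marker = [12, 16, 9, 17, 13]      # same scripts, different scorer
per_item = [[3, 4, 5], [4, 5, 6], [2, 3, 4], [5, 6, 7], [4, 4, 6]]

print("test-retest:", round(pearson(first_sitting, second_sitting), 3))
print("mark-remark:", round(pearson(first_sitting, second_marker), 3))
print("items (alpha):", round(cronbach_alpha(per_item), 3))
```

Each coefficient targets a different source of inconsistency, so a test can look reliable on one and weak on another.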



Speech acts
• Perlocutionary speech acts are statements
about what was, is or will be (e.g. Michael
knows his number bonds to 10)
• Illocutionary speech acts are
performative: they create social facts (e.g.
“I now pronounce you husband and
wife”)
Social facts
Interviewer: Did you call them the way
you saw them, or did you call them
the way they were?
Umpire: The way I called them was the
way they were.
Assessments as speech acts
• Assessments in the US are treated as
perlocutionary speech acts
• Assessments in Europe are treated as
illocutionary speech acts
• That’s why there is no measurement
error in Europe
Item-response modelling
• All test theories assume an item response model
– Classical test theory assumes a flat line
– Guttman scaling assumes a step function
– All real items are somewhere between the two
• US modellers assumed a logistic curve
– Computationally tractable (if unidimensionality is
also assumed)
– Can be made very close to cumulative normal
• Others question these assumptions
– e.g. Goldstein (1979, 1980, 1982, 1989)
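The response models named above can be compared numerically. This is a minimal sketch, not taken from the talk. The function names and the ability grid are hypothetical, treating the "flat line" as a constant success probability is one reading of the slide, and the scaling constant 1.702 is an assumed, commonly quoted value for bringing the logistic curve close to the cumulative normal.

```python
import math

def flat_line(theta, p=0.5):
    """Flat response: success probability does not depend on ability theta."""
    return p

def step_function(theta, difficulty=0.0):
    """Guttman-style item: fails below the difficulty, succeeds at or above it."""
    return 1.0 if theta >= difficulty else 0.0

def logistic(theta, difficulty=0.0, scale=1.702):
    """Logistic response curve; scale=1.702 is an assumed matching constant."""
    return 1.0 / (1.0 + math.exp(-scale * (theta - difficulty)))

def normal_ogive(theta, difficulty=0.0):
    """Cumulative normal response curve."""
    return 0.5 * (1.0 + math.erf((theta - difficulty) / math.sqrt(2.0)))

# Largest gap between the scaled logistic and the cumulative normal
# over abilities from -4 to 4 in steps of 0.01.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(logistic(t) - normal_ogive(t)) for t in grid)
print(f"max |logistic - normal| = {max_gap:.4f}")
```

With this scaling the two curves differ by less than one percentage point at every ability on the grid, which is the sense in which the logistic "can be made very close to cumulative normal".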
Assessment formats
• Debates about assessment formats are often
disguised debates about constructs
– Bias is a property of inferences, not tests
– So, multiple-choice tests are not biased
• Multiple-choice vs Constructed-response
– CR items yield more information, but take longer
– MC items yield more information per minute
– Fewer items means more student-task effects
• Correlations between MC and CR formats are
high, but can change (e.g. NAEP)
• Reliance on MC items has backwash effects
Effect of assessment format
[Figure: median (vertical axis, -0.20 to 0.15) plotted against interquartile range (horizontal axis, 0.1 to 0.6) for constructed-response and multiple-choice formats in science, social studies, mathematics and language arts]
Standard setting
• Test-centred vs. examinee-centred
– Key issue: do you set the cut-score before
or after you see the results
• Policy-oriented vs. evidence-oriented
– Key issue: do you adjust the cut-score to fit
the test, or adjust the test to fit the cut-score
Standard setting
                    Policy-oriented                     Evidence-oriented
Test-centred        Graded assessment, mastery tests    Angoff, Nedelsky, Ebel
Examinee-centred    UK public examinations              Borderline groups, contrasting groups
Not invented here syndrome…
• Constructivism
• Standards-based assessment/Outcomes-based
assessment
Standards-based assessment
• What?!
• Originally criteria for high-school
diploma set locally
• Introduction of state tests
• In many (most?) cases state tests are not
aligned to district curricula
No Child Left Behind Act
• Reauthorization of the Elementary and
Secondary Education Act (ESEA)
– Commanded bi-partisan support
– Not a plot to declare all state schools failing
• States must establish state standards
– But are free to decide how to do this
– Huge differences in standards
• Students tested
– Language and maths grades 3 to 8 and in high school
– Science 3 times (in grades K-5, 6-8, 9-12)
Key features of NCLB
• All students to be ‘proficient’ by 2014
– Achievement rather than growth
– States determine intermediate steps to this goal
• Some states opt for steady progress
• Others go for ‘Balloon payments’
• Each year, each school must make adequate yearly
progress to this goal
– Cohort based
– Disaggregation of key groups
• Students with special needs
• Ethnic minorities
• Language learners
• Failure to achieve AYP has profound impact
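The difference between "steady progress" and "balloon payments" can be sketched as two schedules of intermediate targets that both reach 100% proficient in 2014. The baseline year and percentage, the function names and the cubic shape of the back-loaded path are all hypothetical. Real state plans set their own steps.

```python
def steady_targets(start_year, start_pct, end_year=2014, end_pct=100.0):
    """Equal annual increments from the baseline to the final target."""
    years = end_year - start_year
    step = (end_pct - start_pct) / years
    return {start_year + i: start_pct + step * i for i in range(years + 1)}

def balloon_targets(start_year, start_pct, end_year=2014, end_pct=100.0, power=3):
    """Back-loaded path: small early increments, large late ones.

    power > 1 controls how much of the gain is deferred (an assumed shape).
    """
    years = end_year - start_year
    gap = end_pct - start_pct
    return {start_year + i: start_pct + gap * (i / years) ** power
            for i in range(years + 1)}

# Hypothetical state starting at 40% proficient in 2002.
steady = steady_targets(2002, 40.0)
balloon = balloon_targets(2002, 40.0)
for year in (2002, 2008, 2014):
    print(year, round(steady[year], 1), round(balloon[year], 1))
```

Both schedules satisfy the 2014 requirement, but the back-loaded one postpones most of the required improvement to the final years.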
Exit from higher education
• Key issues
– Qualification
– Licensure
• Combined in most European countries
• Separate in the USA
The mangle of practice
• Andrew Pickering (1995)
– Critique of traditional views of science
• Science is what scientists do
• Science as a series of truths waiting to be found
– The development of traditions of
assessment is not just bound up in culture
– It is the result of messy, contingent,
fragile, politically and personally
influenced events
In summary
• Viewed from outside, any national assessment system
seems to work in practice, but not in theory
• Assessment systems are much smarter than they appear…
• …and are exquisitely attuned to the constraints and
affordances provided by the contexts in which they
operate.
• We can learn from them, but we cannot import them