Transcript Slide 1

Designing an assessment system
Presentation to the Scottish Qualifications
Authority, August 2007
Dylan Wiliam
Institute of Education, University of London
www.dylanwiliam.net
Overview
• The purposes of assessment
• The structure of the assessment system
• The locus of assessment
• The extensiveness of the assessment
• Assessment format
• Scoring models
• Quality issues
• The role of teachers
• Contextual issues
Functions of assessment
Three functions of assessment:
• For evaluating institutions (evaluative)
• For describing individuals (summative)
• For supporting learning
– Monitoring learning: Whether learning is taking place
– Diagnosing (informing) learning: What is not being learnt
– Forming learning: What to do about it
No system can easily support all three functions
• Traditionally, we have grouped the first two, and ignored the third
– Learning is sidelined; summative and evaluative functions are weakened
• Instead, we need to separate the first (evaluative) from the other two
The Lake Wobegon effect
“All the women are strong, all the men are good-looking, and all the children are
above average.” Garrison Keillor
[Figure: test scores plotted against time, rising steadily]
Goodhart’s law
All performance indicators lose their usefulness when used as objects of policy
• Privatization of British Rail
• Targets in the Health Service
• “Bubble” students in high-stakes settings
Reconciling different pressures
The “high-stakes” genie is out of the bottle, and we cannot put it back
The clearer you are about what you want, the more likely you are to get it, but the less likely it is to mean anything
The only thing left to us is to try to develop “tests worth teaching to”
This is fundamentally an issue of validity.
Validity
Validity is a property of inferences, not of assessments
“One validates, not a test, but an interpretation of data arising from a specified
procedure” (Cronbach, 1971; emphasis in original)
• No such thing as a valid (or indeed invalid) assessment
• No such thing as a biased assessment
• A pons asinorum (a threshold test of real understanding) for thinking about assessment
Threats to validity
Inadequate reliability
Construct-irrelevant variance
• The assessment includes aspects that are irrelevant to the construct of interest
– the assessment is “too big”
Construct under-representation
• The assessment fails to include important aspects of the construct of interest
– the assessment is “too small”
With a clear construct definition, all of these are technical issues, not value issues
Two key challenges
Construct-irrelevant variance
• Sensitivity to instruction
Construct under-representation
• Extensiveness of assessment
Sensitivity to instruction
[Figure: distribution of attainment after 1 year of instruction, for an item highly sensitive to instruction]
Sensitivity to instruction (2)
[Figure: distribution of attainment after 1 year of instruction, for an item moderately sensitive to instruction]
Sensitivity to instruction (3)
[Figure: distribution of attainment after 1 year of instruction, for an item relatively insensitive to instruction]
Sensitivity to instruction (4)
[Figure: distribution of attainment after 1 year of instruction, for an item completely insensitive to instruction]
Consequences (1)
[Figure]
Consequences (2)
[Figure]
Consequences (3)
[Figure]
Insensitivity to instruction
Primarily attributable to the fact that learning is slower than assumed
Exacerbated by the normal mechanisms of test development
Leads to erroneous attributions about the effects of schooling
A sensitivity to instruction index

Test                          Sensitivity index
IQ-type test (insensitive)      0
NAEP                            6
TIMSS                           8
ETS “STEP” tests (1957)         8
ITBS                           10
Completely sensitive test     100
Extensiveness of assessment
Using teacher assessment in certification is attractive:
• Increases reliability (increased test time)
• Increases validity (addresses aspects of construct under-representation)
But problematic
• Lack of trust (“Fox guarding the hen house”)
• Problems of biased inferences (construct-irrelevant variance)
• Can introduce new kinds of construct under-representation
The challenge
To design an assessment system that is:
• Distributed
– So that evidence collection is not undertaken entirely at the end
• Synoptic
– So that learning has to accumulate
A possible model
All students are assessed at test time
Different students in the same class are assigned different tasks
The performance of the class defines an “envelope” of scores, e.g.
• Advanced: 5 students
• Proficient: 8 students
• Basic: 10 students
• Below basic: 2 students
Teacher allocates levels on the basis of whole-year performance
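The allocation step in this model can be sketched in code. The slide does not specify the mechanism, so the sketch below assumes one plausible reading: the teacher ranks students by whole-year performance, and the test-derived envelope fixes how many students may receive each level. The function name and student identifiers are illustrative, not part of the original proposal.

```python
# Hypothetical sketch of the "envelope" allocation model described above.
# Assumption: the teacher can rank students by whole-year performance;
# the class's test performance fixes the number of students per level.

def allocate_levels(students_by_rank, envelope):
    """Assign levels to students, best-ranked first.

    students_by_rank: list of student ids, strongest first.
    envelope: list of (level, count) pairs, highest level first.
    """
    levels = {}
    i = 0
    for level, count in envelope:
        for student in students_by_rank[i:i + count]:
            levels[student] = level
        i += count
    return levels

# Example with the counts from the slide (25 students in total)
envelope = [("Advanced", 5), ("Proficient", 8),
            ("Basic", 10), ("Below basic", 2)]
ranked = [f"student_{n:02d}" for n in range(25)]  # ordered by year-long work
levels = allocate_levels(ranked, envelope)
```

Note that the envelope constrains only the counts; which particular students get which level rests on the teacher's judgment of the whole year's work, not on the test alone.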
Benefits and problems
Benefits
• The only way to teach to the test is to improve everyone’s performance on everything (which is what we want!)
• Validity and reliability are enhanced
Problems
• Students’ scores are not “inspectable”
• Assumes student motivation
The effects of context
• Beliefs about what constitutes learning
• Beliefs in the reliability and validity of the results of various tools
• A preference for and trust in numerical data, with a bias towards a single number
• Trust in the judgments and integrity of the teaching profession
• Belief in the value of competition between students
• Belief in the value of competition between schools
• Belief that test results measure school effectiveness
• Fear of national economic decline and education’s role in this
• Belief that the key to schools’ effectiveness is strong top-down management
Conclusion
There is no “perfect” assessment system anywhere. Each nation’s
assessment system is exquisitely tuned to local constraints and
affordances.
Assessment practices have impacts on teaching and learning which
may be strongly amplified or attenuated by the national context.
The overall impact of particular assessment practices and initiatives is
determined at least as much by culture and politics as it is by
educational evidence and values.
Conclusion (2)
It is probably idle to draw up maps for the ideal assessment policy for a
country, even though the principles and the evidence to support such an
ideal might be clearly agreed within the ‘expert’ community.
Instead, focus on those arguments and initiatives which are least
offensive to existing assumptions and beliefs, and which will
nevertheless serve to catalyze a shift in them while at the same time
improving some aspects of present practice.
Questions?
Comments?
Institute of Education
University of London
20 Bedford Way
London WC1H 0AL
Tel +44 (0)20 7612 6000
Fax +44 (0)20 7612 6126
Email [email protected]