Transcript Document
Oxford University Centre for Educational Assessment
Assessment & Learning: fields apart?
Jo-Anne Baird, Therese Hopfenbeck
David Andrich & Gordon Stobart
July 16, 2015
The need for a review on Assessment and Learning
• Knowledge Economy
– economic importance
• The Audit Society
– important societal control function
– assessment defines what counts as valuable learning through these
mechanisms
• Multiple, high-stakes
– assessment’s domination over learning
• Assessment is agenda-setting
• 21st Century has seen interesting developments already
• Build cumulatively on what has already been done
What do we mean by theory - functions
Abstraction
Abductive reasoning
Wallander has a theory that Schwarzman killed Inga
Distinguished from practice
It’s just a theory
Normative
How things ought to be
Explanatory
Descriptive
May be causal
May be predictive
May be formalised in logic or mathematical equations
What do we mean by theory - focus
Scientific theory
Relates to empirical phenomena
Has an internal logic
Should be empirically testable
Substantive theory
Learning
Developmental psychology
Test theory
Metrology
Psychometrics
Overview
1. Relationships between substantive learning theory & assessment
2. Theoretical and philosophical dilemmas
3. Case studies - applications
– International tests
– Assessment for Learning
4. Conclusions
Assessment and Learning
Jo-Anne Baird & David Andrich
Behaviourist theory of learning
Learning is demonstrated in behaviour
Mental processes are not important
Study of animals tells us about human learning
(eg rats & pigeons)
Learning as a reaction to stimuli in the
environment, such as teaching
Behaviourist approach to assessment
Control conditions
Measure memory for facts
Compare performance with criteria or norms
Global score for performance on ability in
subject area
Norm- or criterion-referenced
Cognitive-constructivist theory of learning
Learning occurs in the brain
Cognition, especially meta-cognition important
Memorisation of facts not so impressive
Building of mental models of the world
Integrate and build upon previous knowledge and
learning
Novice-expert differences
Cognitive-constructivist assessments
Higher order skills
Synthesis
Evaluation
Problem solving
Extended tasks
Assessed in terms of novice-expert continuum
Socio-cultural theory of learning
Learning is a social event
Learning is situated and context-dependent
Learning is value-laden
Learning does not happen within, but
between people
Socio-constructivist approach to assessment
Holistic, qualitative feedback emphasised
Authentic tasks important
Groups as well as individuals assessed
Self- and peer-assessment important
Engagement with criteria
Theories of learning and assessment practices
Have learning theory and assessment
practice informed each other?
Or have they been growing apart?
Chronology and relationships
Learning theories have been contemporaneous
Cognitive psychology superseded behaviourism
Vygotsky’s work overlapped considerably with behaviourist thinking
Cognitive constructivism and social constructivism
Links between forms of assessment and learning theory not
clear
e.g. Multiple choice format can be used to assess cognition
Implications of theories for assessment practice not
straightforward
e.g. Skinner (1989)
Good instruction demands two things: students must be told
immediately whether what they do is right or wrong and, when right,
they must be directed to the step to be taken next.
Assessment and psychometrics: they are different
Lawn (2008) Crossing the Atlantic – history of the development
of different approaches to assessment in European countries
and the US
Early 80s ‘Rasch wars’ in the UK
Nuttall, Gipps, Broadfoot, Black, Harlen … argued for a more
educationally sound approach to assessment that was learner-centred
Baird & Black (2013) outlined how psychometrics does not fit
well with a range of educational assessment purposes:
curricula change (construct), criteria public, correlation between
questions, qualification-focus (not item), multiple dimensions in
tests, pre-testing not always feasible …
Psychometrics literature – how has it related to learning theory?
Psychometric models – representational measurement
The Ferguson Committee – 1940 British Association for the Advancement
of Science
No evidence that psychological assessments were quantitative
Based upon Campbell’s 1920s arguments against psychophysics
Rebuttal by Stevens (1946), giving us the different forms of ‘measurement’ – nominal,
ordinal, interval, ratio [learning can be measured in each of these ways]
Measurement as a product of the instrument
Luce and Tukey’s additive conjoint measurement – mathematical proof
that ratio scales could be produced from transformations of ordinal
variables.
BUT the assumptions of additive conjoint measurement are not met by
assessment data.
For example, transitivity (If A>B and B>C then A>C): item parameters change
across time and sub-populations
Michell (1997) says this isn’t measurement in the scientific sense, but he
says that about all of psychological assessment and by implication
educational assessment
Psychometric models – classical test theory
Based upon centuries of statistical work
Lord and Novick (1968) made the great leap forward
BUT suffers from the same problems as representational theory
True score = learning part of the equation, but what is it? Theoretical
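The true-score idea on this slide is usually written as a decomposition of the observed score. The following is a sketch in standard classical test theory notation; the symbols X (observed score), T (true score), E (error) and ρ (reliability) are not on the slide and are assumed here.

```latex
\begin{align}
X &= T + E, \\
\mathbb{E}[E] &= 0, \qquad \operatorname{Cov}(T, E) = 0, \\
\rho_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}.
\end{align}
```

Only X is ever observed; T is defined by the model rather than measured directly, which is one way to read the slide's description of the true score as theoretical.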
Psychometric models – latent trait theory
Comes from factor analytic models produced by Spearman (1904)
One parameter (difficulty), two parameter (& discrimination), three
parameter (& guessing) typically used.
Multi-dimensional forms also available
Only the Rasch form (one parameter) can deal with the transitivity
problems
Unlike in the representational theory approach, Rasch is probabilistic, so
deviations from the model are handled
More than one parameter causes problems for transitivity
BUT still does not deal with Michell’s criticisms
Are psychological constructs quantifiable?
Not just a problem for psychometrics – for the assessment field broadly
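One way to illustrate the transitivity point on this slide is to compare how items are ordered for people of different ability. The sketch below uses the standard logistic item response function; the item names, parameter values and helper functions are invented for illustration and do not come from the presentation. Under the Rasch form the ordering of two items is the same at every ability level, whereas with unequal discriminations the item curves can cross and the ordering can flip.

```python
import math

def p_correct(theta, b, a=1.0, c=0.0):
    """Logistic item response function.

    theta: person ability, b: item difficulty,
    a: discrimination (2PL), c: guessing (3PL).
    With a=1 and c=0 this reduces to the Rasch (one parameter) form.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def ordering(theta, items):
    """Item names sorted from easiest to hardest for a person of ability theta."""
    return sorted(items, key=lambda name: -p_correct(theta, **items[name]))

# Hypothetical items: same difficulties in both sets,
# but the second set also has unequal discriminations.
rasch_items = {"A": {"b": -0.5}, "B": {"b": 0.5}}
two_pl_items = {"A": {"b": -0.5, "a": 0.5}, "B": {"b": 0.5, "a": 2.0}}

abilities = [-2.0, 0.0, 2.0]
rasch_orderings = [ordering(t, rasch_items) for t in abilities]
two_pl_orderings = [ordering(t, two_pl_items) for t in abilities]

for t, r, s in zip(abilities, rasch_orderings, two_pl_orderings):
    print(f"theta={t:+.1f}  Rasch: {r}  2PL: {s}")
```

In this toy example item A is easier than item B for everyone under the Rasch form, but under the two parameter form item B becomes the easier item for the most able person, so "A is easier than B" no longer holds across the whole population.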
What does it mean to assess attainment?
Cronbach & Meehl (1955)
A construct is some postulated attribute of people, assumed to be
reflected in test performance. In test validation the attribute about
which we make statements in interpreting a test is a construct.
Educational attainment constructs – grading & scoring scheme
Assessment by association
Dimensionality
All based upon correlation
How can we combine different things?
Invariance
Do the scores mean the same thing across tests and groups of
students?
Physics envy
Michell (2008) – requirements for real numbers should be
satisfied
Kane (2008)
Educational assessments do not meet these strictures
Physics is held up as an idealistic example of measurement,
but
Realist - numbers exist independently from humans
Physical measurements took a long time to develop – theories &
apparatus
Measurement in physics shows inconsistency across too (Hedges,
1987)
Need externality to our measures & multiple ways of
measuring to substantiate that there is a real phenomenon
Creation of constructs
1. Theory-based
2. Empirically-driven
3. Subject-matter expert devised
4. Policy-driven
Agenda-setting activity
“Different methods and theories have implications for the ways in which
concepts such as learning or educational reform or fairness are
formulated, studied and promoted as a practical activity. Perhaps more
profoundly and subtly, these methods and theories affect the ways human
beings are represented and, ultimately the ways they come to understand
themselves and others …” Moss (2005)
“…it may not always be clear to what extent an attribute is conceptually
independent of the methods of measurement, especially in human
science applications.” Maul (2013)
Philosophical position
Field is essentially modernist, Borsboom claims realist
too
Attributes have an independent existence
They are discoverable using scientific methods
Neopragmatic, postmodern test theory
Agnostic as to the nature of the attributes and their independent
existence
Accept that the attributes might be defined in part, or entirely, by the
assessment apparatus
Use triangulation to advance knowledge of the attribute and
measurement system
Fields apart
Goldstein, Laming and others have pointed out that there is no
theory in test theory – only mathematical models
McGrath (2005) – psychometrics has caused the problem
Borsboom blames psychologists for lack of theory underlying
psychological tests
Andrich has argued that it should be a collaboration between
substantive theorists and psychometricians
Sijtsma (2006) – models of learning that underpin test design
are often either not referred to or remain in puberty, infancy or
even at the foetal stage
Has psychometrics (or assessment) helped us to understand
learning?
An answer to somebody else’s problem –
Baird & Black (2013)
Measurement systems have their own, internal logic
Measurement doesn’t tell us about the phenomenon of interest
Nothing about a set of numbers tells you what they measure
(Maraun, 1988)
Educational attainment – setting out the construct has an
intentional element
How could assessment inform learning theory and vice versa?
Empirical data can test theory and help to move it forward
We do not seem to have used it systematically
Craft knowledge of examiners and other educators – raises questions
about what kind/level of theory we expect from assessment data
Mark Wilson making serious attempts to do this
Dependency of the data upon curriculum exposure and other population
characteristics leads to problems with invariance: frame of reference
needs to be taken more seriously
Sociocultural learning theory does not fit well with standardisation-as-fairness principles of assessment
Assessment is an expression of educational values
International tests & learning
Therese Hopfenbeck & Jo-Anne Baird
International tests
Learning theory
• Cognitive
Constructs
• Uni-dimensional
Quantifiability
• Modernist, realist
Invariance
• Crucial to their interpretation
International tests influence learning based
upon three processes
1) what counts as valuable learning
2) how national assessment systems are developed around the
world
3) how students approach learning since there is evidence that
students adapt their learning approaches according to the tests
given
Example: the case of Norway
THE CURRICULA APPROACH – TIMSS,
Science, grade 8
Most underground caves are formed by the
action of water on
a) Granite
b) Limestone
c) Sandstone
d) Shale
Cognitive domain: Factual Knowledge; main topic: Earth’s structure and physical features.
How international tests have influenced
national assessment systems
1) Norway: Introduced national tests in 2004. The reading tests
are based upon the PISA reading framework (Frones et al
2012).
2) Denmark: Introduced national tests after low performance in
PISA (Egelund, 2008).
3) Japan: Changed item format on their national tests to more
open-response items like those in PISA (Schleicher, 2009).
4) Korea: PISA-like tasks on their University Entrance Exam
(Schleicher, 2009).
5) Germany: Introduction of national educational standards and
more focus upon external assessment (Ertl, 2006).
Literature review in three steps
Step 1
Broad search AHELO, PIAAC, PIRLS, PISA, TIMSS
More than 1000 articles detected.
Step 2
Narrowing to peer-reviewed (805)
No grey literature, but categorized reports, book reviews and
conference papers separately.
Step 3
Quality checks of relevant articles based upon reading abstracts.
Developing categories based upon model-article and coding in
End-note. Meetings with discussions on categories and coding.
Still in process of quality checking.
Published peer-reviewed articles from 1993 - 2013
[Figure: number of published articles per year; x-axis 1990 to 2015, y-axis 0 to 140]
Interest in Science
Secondary analysis of student questionnaire data
Formative assessment & learning
Gordon Stobart & Therese N. Hopfenbeck
Assessment for learning
Learning theory
• Socio-cultural? (Cognitive)
Constructs
• Concepts, not constructs
Quantifiability
• Postmodern, qualitative feedback (largely)
Invariance
• Not assumed
Students’ capacity to learn can be strengthened
if they
1. Understand what they are to learn and what is
expected of them.
2. Receive feedback that tells them about the
quality of their work or performance.
3. Receive advice on how they can improve.
4. Are involved in their own learning, for example
by assessing their own work and progress.
Four principles of assessment (Fire prinsipper om vurdering), Utdanningsdirektoratet.
The challenges
Lack of theoretical consensus on AfL and formative
assessment
Few researchers use the original articles
Development of stories – which are flawed if you look at the
original work
The vast majority of studies on AfL are small-scale action
research designs and are published in a wide range of journals.
A concern for the review is that current definitions of formative
assessment/AfL cover a wide range of teaching and learning
practices while research designs often lack an action theory (what
is causing change), often accompanied by a lack of systematic
data collection (for example baseline data before a research
initiative).
Overselling?
Whilst claims for large effect sizes are regularly made in the
literature, the evidence for these has increasingly been critiqued,
for example by Bennett (2011) and Kingston and Nash (2011).
The effects of formative assessment upon learning have been
over-sold by some authors. This is unfortunate because the
limited empirical research suggests a modest, but educationally
significant, impact on teaching and learning.
“Good instruction demands two things: students must be told
immediately whether what they do is right or wrong and, when
right, they must be directed to the step to be taken next.”
B.F. Skinner, in a letter to Science in 1989
We presume that Skinner meant that students be directed to the next
step even if they were wrong, although he did not write that.
Conclusions
• Assessment models need to have a closer relationship with learning
• Educational assessment is a social construct – it is often an intentional,
agenda-setting activity
• Test theory provides statistical models which will vary in utility with context
• Cognitive learning theory is the current model in assessment
• International tests influence learning through policy
• Assessment for learning influences learning through practice
• Assessment outcomes have powerful effects – the numbers have a life of
their own. Unless we change our practices, the effects will be increasingly
detrimental to learning