Transcript Document

Oxford University Centre for Educational Assessment
Assessment & Learning: fields apart?
Jo-Anne Baird, Therese Hopfenbeck
David Andrich & Gordon Stobart
July 16, 2015
The need for a review on Assessment and Learning
• Knowledge Economy
– economic importance
• The Audit Society
– important societal control function
– assessment defines what counts as valuable learning through these
mechanisms
• Multiple, high-stakes assessments
– assessment’s domination over learning
• Assessment is agenda-setting
• 21st Century has seen interesting developments already
• Build cumulatively on what has already been done
What do we mean by theory - functions
• Abstraction
• Abductive reasoning
– Wallander has a theory that Schwarzman killed Inga
– Distinguished from practice
– It’s just a theory
• Normative
– How things ought to be
• Explanatory
• Descriptive
– May be causal
– May be predictive
– May be formalised in logic or mathematical equations
What do we mean by theory - focus
• Scientific theory
– Relates to empirical phenomena
– Has an internal logic
– Should be empirically testable
• Substantive theory
– Learning
– Developmental psychology
• Test theory
– Metrology
– Psychometrics
Overview
1. Relationships between substantive learning theory & assessment
2. Theoretical and philosophical dilemmas
3. Case Studies - applications
– International tests
– Assessment for Learning
4. Conclusions
Oxford University Centre for Educational Assessment
Assessment and Learning
Jo-Anne Baird & David Andrich
July 16, 2015
Behaviourist theory of learning
• Learning is demonstrated in behaviour
• Mental processes are not important
• Study of animals tells us about human learning (eg rats & pigeons)
• Learning as a reaction to stimuli in the environment, such as teaching
Behaviourist approach to assessment
• Control conditions
• Measure memory for facts
• Compare performance with criteria or norms
• Global score for performance on ability in subject area
• Norm- or criterion-referenced
Cognitive-constructivist theory of learning
• Learning occurs in the brain
• Cognition, especially meta-cognition, important
• Memorisation of facts not so impressive
• Building of mental models of the world
• Integrate and build upon previous knowledge and learning
• Novice-expert differences
Cognitive-constructivist assessments
• Higher order skills
– Synthesis
– Evaluation
– Problem solving
• Extended tasks
• Assessed in terms of novice-expert continuum
Socio-cultural theory of learning
• Learning is a social event
• Learning is situated and context-dependent
• Learning is value-laden
• Learning does not happen within, but between, people
Socio-constructivist approach to assessment
• Holistic, qualitative feedback emphasised
• Authentic tasks important
• Groups as well as individuals assessed
• Self- and peer-assessment important
• Engagement with criteria
Theories of learning and assessment practices
• Have learning theory and assessment practice informed each other?
• Or have they been growing apart?
Chronology and relationships
• Learning theories have been contemporaneous
– Cognitive psychology superseded behaviourism
– Vygotsky’s work overlapped considerably with behaviourist thinking
– Cognitive constructivism and social constructivism
• Links between forms of assessment and learning theory not clear
– e.g. multiple choice format can be used to assess cognition
• Implications of theories for assessment practice not straightforward
– e.g. Skinner (1989): “Good instruction demands two things: students must be told immediately whether what they do is right or wrong and, when right, they must be directed to the step to be taken next.”
Assessment and psychometrics: they are different
• Lawn (2008) Crossing the Atlantic – a history of the development of different approaches to assessment in European countries and the US
– Early 80s ‘Rasch wars’ in the UK
– Nuttall, Gipps, Broadfoot, Black, Harlen … argued for a more educationally sound approach to assessment that was learner-centred
• Baird & Black (2013) outlined how psychometrics does not fit well with a range of educational assessment purposes: curricula change (construct), criteria public, correlation between questions, qualification-focus (not item), multiple dimensions in tests, pre-testing not always feasible …
• Psychometrics literature – how has it related to learning theory?
Psychometric models – representational measurement
• The Ferguson Committee – 1940 British Association for the Advancement of Science
– No evidence that psychological assessments were quantitative
– Based upon Campbell’s 1920s arguments against psychophysics
– Rebuttal by Stevens (1946), giving us the different forms of ‘measurement’ – nominal, ordinal, interval, ratio [learning can be measured in each of these ways]
– Measurement as a product of the instrument
• Luce and Tukey’s additive conjoint measurement – mathematical proof that ratio scales could be produced from transformations of ordinal variables
• BUT the assumptions of additive conjoint measurement are not met by assessment data
– For example, transitivity (if A>B and B>C then A>C): item parameters change across time and sub-populations
• Michell (1997) says this isn’t measurement in the scientific sense, but he says that about all of psychological assessment and, by implication, educational assessment
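As a sketch of what those assumptions amount to (standard Luce & Tukey notation, not taken from the slides), additive conjoint measurement asks whether the ordering of person–item pairs can be represented additively,
\[
(p, i) \succsim (q, j) \iff \theta_p + \delta_i \ge \theta_q + \delta_j ,
\]
and tests this through cancellation conditions, for example double cancellation:
\[
(p_1, i_2) \succsim (p_2, i_1) \ \text{and}\ (p_2, i_3) \succsim (p_3, i_2) \ \Rightarrow\ (p_1, i_3) \succsim (p_3, i_1).
\]
These conditions presuppose an ordering that is transitive and invariant across the persons and items sampled, which is precisely what shifting item parameters undermine.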
Psychometric models – classical test theory
• Based upon centuries of statistical work
• Lord and Novick (1968) made the great leap forward
• BUT suffers from the same problems as representational theory
– True score is the ‘learning’ part of the equation, but what is it? It remains theoretical
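A minimal sketch of the classical test theory decomposition referred to above (standard notation, not from the slides):
\[
X = T + E, \qquad T \equiv \mathbb{E}(X), \qquad \operatorname{Cov}(T, E) = 0, \qquad \rho_{XX'} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)} .
\]
The true score \(T\) is defined as the expected observed score over hypothetical replications of the same test, which is why the question above bites: it is a statistical expectation tied to the instrument, not a substantive account of what has been learned.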
Psychometric models – latent trait theory
• Come from factor analytic models produced by Spearman (1904)
– One parameter (difficulty), two parameter (& discrimination), three parameter (& guessing) typically used
– Multi-dimensional forms also available
– Only the Rasch form (one parameter) can deal with the transitivity problems
• Unlike in the representational theory approach, Rasch is probabilistic, so deviations from the model are handled
• More than one parameter causes problems for transitivity
• BUT still does not deal with Michell’s criticisms
– Are psychological constructs quantifiable?
– Not just a problem for psychometrics – for the assessment field broadly
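A sketch of the item response functions behind the one-, two- and three-parameter families (standard IRT notation, not from the slides):
\[
P(X_{pi} = 1 \mid \theta_p) = c_i + (1 - c_i)\,\frac{\exp\big(a_i(\theta_p - b_i)\big)}{1 + \exp\big(a_i(\theta_p - b_i)\big)},
\]
with \(b_i\) the item difficulty, \(a_i\) the discrimination and \(c_i\) the guessing parameter; the Rasch model is the special case \(a_i = 1\), \(c_i = 0\). Only in that case does the person parameter cancel when two items are compared,
\[
\frac{P(X_{pi}=1)\,P(X_{pj}=0)}{P(X_{pi}=0)\,P(X_{pj}=1)} = \exp(b_j - b_i),
\]
which is the sense in which only the one-parameter form preserves an invariant ordering of items across persons.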
What does it mean to assess attainment?
• Cronbach & Meehl (1955)
– “A construct is some postulated attribute of people, assumed to be reflected in test performance. In test validation the attribute about which we make statements in interpreting a test is a construct.”
• Educational attainment constructs – grading & scoring scheme
• Assessment by association
– All based upon correlation
• Dimensionality
– How can we combine different things?
• Invariance
– Do the scores mean the same thing across tests and groups of students?
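One way to state the invariance question formally (a sketch in standard measurement-invariance notation, not from the slides): scores ‘mean the same thing’ across groups when the response distribution depends on group membership \(g\) only through the attribute,
\[
P(X \mid \theta, g) = P(X \mid \theta) \quad \text{for all groups } g,
\]
so that students at the same level of the construct have the same expected score regardless of the group or test form to which they belong.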
Physics envy
• Michell (2008) – requirements for real numbers should be satisfied
– Educational assessments do not meet these strictures
• Kane (2008) – physics is held up as an idealistic example of measurement, but:
– Realist – numbers exist independently from humans
– Physical measurements took a long time to develop – theories & apparatus
– Measurement in physics shows inconsistency too (Hedges, 1987)
• Need externality to our measures & multiple ways of measuring to substantiate that there is a real phenomenon
Creation of constructs
1. Theory-based
2. Empirically-driven
3. Subject-matter expert devised
4. Policy-driven
• Agenda-setting activity
– “Different methods and theories have implications for the ways in which concepts such as learning or educational reform or fairness are formulated, studied and promoted as a practical activity. Perhaps more profoundly and subtly, these methods and theories affect the ways human beings are represented and, ultimately, the ways they come to understand themselves and others …” Moss (2005)
– “… it may not always be clear to what extent an attribute is conceptually independent of the methods of measurement, especially in human science applications.” Maul (2013)
Philosophical position
• Field is essentially modernist; Borsboom claims realist too
– Attributes have an independent existence
– They are discoverable using scientific methods
• Neopragmatic, postmodern test theory
– Agnostic as to the nature of the attributes and their independent existence
– Accept that the attributes might be defined in part, or entirely, by the assessment apparatus
– Use triangulation to advance knowledge of the attribute and measurement system
Fields apart
• Goldstein, Laming and others have pointed out that there is no theory in test theory – only mathematical models
– McGrath (2005) – psychometrics has caused the problem
– Borsboom blames psychologists for lack of theory underlying psychological tests
– Andrich has argued that it should be a collaboration between substantive theorists and psychometricians
• Sijtsma (2006) – models of learning that underpin test design are often either not referred to or remain in puberty, infancy or even at the foetal stage
• Has psychometrics (or assessment) helped us to understand learning?
An answer to somebody else’s problem – Baird & Black (2013)
• Measurement systems have their own, internal logic
• Measurement doesn’t tell us about the phenomenon of interest
• Nothing about a set of numbers tells you what they measure (Maraun, 1988)
• Educational attainment – setting out the construct has an intentional element
How could assessment inform learning theory and vice versa?
• Empirical data can test theory and help to move it forward
– We do not seem to have used it systematically
– Craft knowledge of examiners and other educators – raises questions about what kind/level of theory we expect from assessment data
• Mark Wilson is making serious attempts to do this
• Dependency of the data upon curriculum exposure and other population characteristics leads to problems with invariance: frame of reference needs to be taken more seriously
• Sociocultural learning theory does not fit well with standardisation-as-fairness principles of assessment
• Assessment is an expression of educational values
Oxford University Centre for Educational Assessment
International tests & learning
Therese Hopfenbeck & Jo-Anne Baird
July 16, 2015
International tests
Learning theory • Cognitive
Constructs • Uni-dimensional
Quantifiability • Modernist, realist
Invariance • Crucial to their interpretation
International tests influence learning based
upon three processes
1) what counts as valuable learning
2) how national assessment systems are developed around the
world
3) how students approach learning, since there is evidence that students adapt their learning approaches to the tests given
Example: the case of Norway
THE CURRICULA APPROACH – TIMSS,
Science, grade 8
Most underground caves are formed by the
action of water on
a) Granite
b) Limestone
c) Sandstone
d) Shale
Cognitive domain: Factual Knowledge; main topic: Earth’s structure and physical features.
How international tests have influenced national assessment systems
1) Norway: introduced national tests in 2004. The reading tests are based upon the PISA reading framework (Frones et al., 2012).
2) Denmark: introduced national tests after performing poorly in PISA (Egelund, 2008).
3) Japan: changed item format on their national tests to more open responses like those in PISA (Schleicher, 2009).
4) Korea: PISA-like tasks on their University Entrance Exam (Schleicher, 2009).
5) Germany: introduction of national educational standards and more focus upon external assessment (Ertl, 2006).
Literature review in three steps
Step 1
Broad search on AHELO, PIAAC, PIRLS, PISA and TIMSS.
More than 1,000 articles identified.
Step 2
Narrowing to peer-reviewed articles (805).
No grey literature, but reports, book reviews and conference papers were categorised separately.
Step 3
Quality checks of relevant articles based upon reading abstracts.
Developing categories based upon a model article and coding in EndNote. Meetings with discussions on categories and coding.
Quality checking is still in progress.
Published peer-reviewed articles from 1993 - 2013
[Chart: number of published peer-reviewed articles per year, c. 1990–2015; annotated ‘Interest in Science’ and ‘Secondary analysis of student questionnaire data’.]
Oxford University Centre for Educational Assessment
Formative assessment & learning
Gordon Stobart & Therese N. Hopfenbeck
July 16, 2015
Assessment for learning
Learning theory • Socio-cultural? (Cognitive)
Constructs • Concepts, not constructs
Quantifiability • Postmodern, qualitative feedback (largely)
Invariance • Not assumed
Students’ capacity to learn can be strengthened if they:
1. Understand what they are meant to learn and what is expected of them.
2. Receive feedback that tells them about the quality of their work or performance.
3. Receive advice on how they can improve.
4. Are involved in their own learning, for example by assessing their own work and progress.
Four principles of assessment, Utdanningsdirektoratet (Norwegian Directorate for Education and Training).
The challenges
• Lack of theoretical consensus on AfL and formative assessment
• Few researchers use the original articles
• Development of stories, which are flawed if you look at the original work
The vast majority of studies on AfL are small-scale action research designs and are published in a wide range of journals. A concern for the review is that current definitions of formative assessment/AfL cover a wide range of teaching and learning practices, while research designs often lack a theory of action (what is causing change) and are often accompanied by a lack of systematic data collection (for example, baseline data before a research initiative).
Overselling?
Whilst claims for large effect sizes are regularly made in the
literature, the evidence for these has increasingly been critiqued,
for example by Bennett (2011) and Kingston and Nash (2011).
The effects of formative assessment upon learning have been
over-sold by some authors. This is unfortunate because the
limited empirical research suggests a modest, but educationally
significant, impact on teaching and learning.
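For readers unfamiliar with the quantities being debated, the figures quoted in this literature are typically reported as standardized mean differences (a standard definition, not taken from the review itself):
\[
d = \frac{\bar{X}_{\text{intervention}} - \bar{X}_{\text{control}}}{s_{\text{pooled}}},
\]
and the critiques cited above question whether the studies behind the widely quoted large values actually support them.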
“Good instruction demands two things: students must be told
immediately whether what they do is right or wrong and, when
right, they must be directed to the step to be taken next. We
presume that Skinner meant that students be directed to the next
step even if they were wrong, although he did not write that”.
B.F. Skinner, in a letter to Science in 1989
Conclusions
• Assessment models need to have a closer relationship with learning
• Educational assessment is a social construct – it is often an intentional,
agenda-setting activity
• Test theory provides statistical models which will vary in utility with context
• Cognitive learning theory is the current model in assessment
• International tests influence learning through policy
• Assessment for learning influences learning through practice
• Assessment outcomes have powerful effects – the numbers have a life of their own. Unless we change our practices, the effects will be more detrimental to learning