Combining reliability and validity in assessment: Can

Institute of Education Doctoral School
Summer Conference 2011
Mr. Niall Douglas BSc MA MBS MCollT
1. Reliability and validity in HE assessment
2. Consequences of assessment unreliability on plagiarism
3. My proposed research
4. Current market situation and likely future demands in HE assessment (if we have time)
5. Ethical consequences

While I was reading for my second undergraduate degree at St. Andrews (2004-2008), one of the most common complaints I heard from students was about inconsistency in grading.

In other words, grades varied a LOT in qualitative assessment.
Most students also took issue with how quantitative assessment is graded up to 100%, whereas qualitative assessment is only graded up to 75-80%.
This was seen – quite correctly, in my opinion – as a hidden subsidy of quantitative study.


It turns out that St. Andrews undergraduates are hardly alone in taking issue with this: Yorke (2000) found that this “unfairness” features highly in student dissatisfaction surveys [1].
So why does UK Higher Education persist with these marking inconsistencies, i.e. unreliability in assessment?

[1]: Yorke, M. (2000) 'The quality of the student experience: what can institutions learn from data relating to non-completion?', Quality in Higher Education, vol. 6, pp. 61-75.

One area in which UK HE diverges from US HE is in choosing validity over reliability in assessment.
Validity of assessment is measured by how well the assessment achieves learning outcomes.
For example, one might choose a suboptimal form of assessment (in terms of reliability, marking overheads or even the prevention of cheating) if that form has superior effects on student learning outcomes.


In particular, UK HE tries to allow for more creativity and originality in student writing – something inherently only subjectively assessable – even if grades consequently vary a lot, even with the same marker and the same student.

This is why UK HE marks qualitative assessments up to just 75-80%: it has been repeatedly found that grades in qualitative assessment have a variance of up to ±20% (Hartog, Rhodes and Burt, 1936; Cox, 1967; Branthwaite, Trueman and Berrisford, 1981; Elton and Johnston, 2002).



Marking to 75-80% heavily compresses that variance into a ±12.5-15% range, 20% of which is ±2.5-3% – hence the historical UK HE choice of 10% bands between 1H, 2H1, 2H2 etc.
Of course, this is not widely realised by students, nor even by staff! But it is done as a “least worst” outcome, despite penalising qualitative study.
If you want to know a LOT more about this, see Elton, L. and Johnston, B. (2002) 'Assessment in universities: a critical review of research'.
Hartog, S.P., Rhodes, E.C. and Burt, S.C. (1936) The marks of examiners: being a comparison of marks allotted to examination scripts by independent examiners and boards of examiners, together with a section on a viva voce examination, Macmillan.
Cox, R. (1967) 'Examinations and higher education: a survey of the literature', Higher Education Quarterly, vol. 21, pp. 292-340.
Branthwaite, A., Trueman, M. and Berrisford, T. (1981) 'Unreliability of marking: further evidence and a possible explanation', Educational Review, vol. 33, pp. 41-46.



Of course, because creativity and originality are enabled by this choice, UK HE generates the highest grade volatility for precisely its most creative and original students!
Therefore, while this preference for validity is admirable, does it have unintended consequences for the incentives to plagiarise?
Here are some facts about plagiarism ...


Firstly, cheating detection software is extremely easy to fool. In fact, there are sites on the internet which will pre-check your work for you to make sure you won’t get caught.
Secondly, there is a vibrant market in most universities of Masters students being paid by undergraduates to write essays for them – these, being original, don’t get detected.

As an example, the going rate in University College Cork was about €100 per thousand lines, with a refund if the student didn’t get a 2H2 or higher.


A study by de Lambert et al. (2002) found that 80% of New Zealand HE students plagiarise [5].
In my own experience at Hull, St. Andrews and University College Cork, it is even more common than that. I would actually say it is ubiquitous – as in, I have never met a recent student who hasn’t!
Additionally, for some subjects like Computer Science, the problem is over-detection of plagiarism, because coursework answers tend to be very similar to one another.

Being falsely accused of plagiarism when you did not in fact plagiarise is a highly unpleasant experience, I can assure you!

[5]: de Lambert, K., Ellen, N. and Taylor, L. (2002) 'Prevalence of academic dishonesty in tertiary institutions: The New Zealand story', New Zealand Journal of Applied Business Research, vol. 1.

From my own experience, I reckon that the top students selectively plagiarise whereas the bottom students randomly plagiarise.

I have a feeling that, if true, this might be an excellent proxy for measuring a student’s understanding of their study itself.
This would support the hypothesis that the most creative and original students deliberately plagiarise not out of laziness or incapability, but rather as an expert method for reducing assessment volatility.
And surely such adaptability is a valuable capability in a student?



This raises a very interesting thought ...
Perhaps, instead of looking at all plagiarism as inherently negative, the patterns in plagiarism might offer a superior indicator of the ability of the student?
Taking it a step further, perhaps the overall patterns of gestating an assessed work might be an even better indicator?

In other words, instead of assessing final works as standalone items, we also assess how those works were generated.


My research intends to develop a software tool which records a person’s activities as they gestate an assessed work.
This record is then formed into an audit trail which can be embedded into the Microsoft Office document submitted for assessment as proof of originality of the work.

Note that the student can delete items from the trail before submission. However, we retain what they delete.
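As a minimal sketch of how such a trail might be represented – every name and field here is a hypothetical illustration, not the tool’s actual format – deletions are tombstoned rather than erased:

```python
import time
from dataclasses import dataclass, field

@dataclass
class TrailEvent:
    """One recorded activity (hypothetical format, for illustration only)."""
    timestamp: float   # seconds since epoch, when the activity happened
    kind: str          # e.g. "search", "page_view", "edit", "citation"
    detail: str        # e.g. the query string or URL involved
    deleted_by_student: bool = False  # tombstone: hidden from markers, kept for analysis

@dataclass
class AuditTrail:
    events: list[TrailEvent] = field(default_factory=list)

    def record(self, kind: str, detail: str) -> None:
        self.events.append(TrailEvent(time.time(), kind, detail))

    def delete(self, index: int) -> None:
        # The student "deletes" an entry, but we only mark it as deleted:
        # it is excluded from the submitted document yet retained
        # server-side for statistical analysis.
        self.events[index].deleted_by_student = True

    def for_submission(self) -> list[TrailEvent]:
        # Only the undeleted entries are embedded in the Office document.
        return [e for e in self.events if not e.deleted_by_student]
```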




These audit trails are accumulated in an internet cloud over time, along with ALL material seen by each student during the gestation of their works.
This allows large-scale statistical analyses to be performed, looking for correlations between awarded grade and (for example) how a student goes about writing an essay: what they search for, where they look and in what sequence, and of course what they delete from their submitted audit trails (a toy example of such an analysis follows).
You can also run statistical analyses of many other things, such as typical patterns in word usage or the impact of citations on students.
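To make the grade-correlation idea concrete, here is a toy sketch – the feature names and all figures are invented for illustration, not real data:

```python
import pandas as pd

# Hypothetical per-student process features extracted from audit trails.
df = pd.DataFrame({
    "searches_before_first_draft": [12, 3, 25, 8, 17],
    "sources_viewed":              [30, 5, 44, 19, 28],
    "trail_items_deleted":         [2, 9, 1, 4, 0],
    "awarded_grade":               [68, 52, 71, 61, 66],
})

# Rank (Spearman) correlation is a reasonable first look, since grades
# are bounded and process counts tend to be skewed.
print(df.corr(method="spearman")["awarded_grade"])
```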
This lets you do a number of very cool things ...

For example, students can now compare how they write an
essay to how other people write their essays


Universities can compare their grading with the average grading of their past (or of other universities) to discover whether there are any anomalies in their current processes. In particular, the statistics can flag those students in a cohort most likely to have the highest variance in grade.

This lets you turn summative assessment into formative assessment – good!
It lets the assessing department determine partial risk and deploy double or even external marking to those students most likely to need it (a sketch of such risk-based selection follows below). This is a vast improvement over the random double marking currently employed.
It also provides an independent external check on the assessor which considerably exceeds the QAA’s present gold standard, because it checks each and every assessment rather than just a random sample.
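A minimal sketch of that risk-based selection, assuming a per-student grade-variance prediction already exists (all names and numbers here are illustrative):

```python
def select_for_double_marking(students, budget):
    """Pick the scripts most likely to have volatile grades, up to `budget`.

    `students` is a list of (student_id, predicted_grade_variance) pairs,
    where the variance prediction comes from the audit-trail statistics.
    This replaces choosing scripts for double marking at random.
    """
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    return [student_id for student_id, _variance in ranked[:budget]]

# Example: with a budget of 2 double-marked scripts, the two
# highest predicted-variance students are selected.
cohort = [("s001", 4.2), ("s002", 12.9), ("s003", 7.5), ("s004", 2.1)]
print(select_for_double_marking(cohort, budget=2))  # ['s002', 's003']
```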


And there are obvious major benefits for the detection and elimination of plagiarism, by increasing the effort needed to plagiarise and decreasing the rewards for doing so.
There are some further benefits:

The software can also analyse what you are writing while you write it and suggest further reading, on the basis of previously written essays stored in the cloud (one possible mechanism is sketched below).
It’s great for reflective practice, because I am not entirely sure how I gestate my own academic outputs. Watching myself write these slides, I would wonder what it looks like from outside my head!
It could also be a boon to distance assessment, the importance of which is only just beginning ...
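On the reading-suggestion idea above, a toy sketch of one standard approach – TF-IDF similarity against a stored corpus; the essay titles and query are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus of previously written essays stored in the cloud.
library = [
    "Reliability and validity trade-offs in higher education assessment",
    "Detecting plagiarism with string matching and stylometry",
    "Economies of scale in distance education provision",
]

def suggest_reading(draft_text, corpus, top_n=2):
    """Rank stored essays by TF-IDF cosine similarity to the live draft."""
    vectoriser = TfidfVectorizer()
    matrix = vectoriser.fit_transform(corpus + [draft_text])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(scores, corpus), reverse=True)
    return [text for _score, text in ranked[:top_n]]

print(suggest_reading("why marking is unreliable in university assessment", library))
```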

On the advantages for distance (actually de-campused) education:

Most people have a statistically fingerprintable “style” to their research and writing. This lets you check that a submitted work came from the same person as before (a toy fingerprint comparison is sketched after this list).
Moreover, that style slowly evolves over time. If a person used my software from the very beginning of their academic career, you would get a graph of how their style evolves over time.
This lets you calculate teaching practice quality, as well as letting you compare students when they go before the workplace, i.e. as a recruitment selection tool.
The possibilities only begin there ... I don’t have time to mention more here, but even if it only partially works there are a multitude of uses for this data.
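A toy version of such a style fingerprint – character n-gram profiles compared by cosine similarity, a well-known stylometric baseline rather than the tool’s actual method:

```python
from collections import Counter
from math import sqrt

def style_fingerprint(text, n=3):
    """Character n-gram frequency profile - a crude stylometric fingerprint."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprints; 1.0 means identical style."""
    shared = set(fp_a) & set(fp_b)
    dot = sum(fp_a[g] * fp_b[g] for g in shared)
    norm = sqrt(sum(v * v for v in fp_a.values())) * sqrt(sum(v * v for v in fp_b.values()))
    return dot / norm if norm else 0.0

# If a newly submitted essay scores far below the student's usual
# self-similarity, it may not have been written by the same person.
earlier = style_fingerprint("essay text known to be written by the student ...")
submitted = style_fingerprint("newly submitted essay text to be verified ...")
print(similarity(earlier, submitted))
```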


This research is not just about doing cool things; it is also about solving a major pressing need for UK Higher Education.
UK HE has already adapted to a substantial drop in per-student funding.

For example, expenditure per student in 1984/85 was £8,600 in 2009 pounds. In 2008/09, it was just £7,389!
That’s a drop of 14% per student!
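As a quick check of that arithmetic:

$$\frac{8600 - 7389}{8600} = \frac{1211}{8600} \approx 14.1\%$$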
UK HE has already substantially improved its productivity ...
From an economic perspective, UK HE has experienced tremendous productivity improvements.



Total Factor Productivity (TFP) analysis of UK HE finds a 51.5% improvement in TFP between 1980/81 and 1992/93 [3].
As a comparison, UK HE storms past Chinese HE in both TFP and TFP growth [4]; in China, TFP growth was actually negative over 1998-2002.

[3]: Flegg, A.T. et al. (2003) 'Measuring the Efficiency and Productivity of British Universities: An Application of DEA and the Malmquist Approach'.
[4]: Ng, Y.C. and Li, S. (2009) 'Efficiency and productivity growth in Chinese universities during the post-reform period'.

UK HE faces some large structural shifts in its primary market. According to the British Council (2004), 'Vision 2020: Forecasting international student mobility - a UK perspective':

Global transnational HE demand is forecast to rise from 2.1m places to 5.8m by 2020 (a 176% increase).
UK HE currently holds about 13% of the global transnational HE market. While this is projected to rise only to 15% by 2020, it still means a trebling of international students for UK HE, or a 34% rise in total students attending UK HE when you include projected domestic growth.
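The trebling follows from the forecast numbers:

$$0.13 \times 2.1\,\text{m} \approx 0.27\,\text{m} \quad\rightarrow\quad 0.15 \times 5.8\,\text{m} \approx 0.87\,\text{m}, \qquad \frac{0.87}{0.27} \approx 3.2$$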


If you look at the costs of providing a HE course, they break down as follows:

Fixed costs:
- Planning and developing course design to fit other courses and quality standards
- Developing course materials

Running costs:
- Providing infrastructure, e.g. buildings, computers, libraries
- Marketing, advertising and non-per-student administration
- Teaching

Per-student costs:
- Tutorials
- Assessment
- Per-student administration

As you gain students (especially distance students):
1. Fixed costs per student drop
2. Running costs per student drop
3. Per-student costs do NOT drop

(Economic and accounting) productivity is strongly correlated with economies of scale.
Therefore further economies of scale are inevitably tied to reducing per-student costs (a toy cost model is sketched below).
And that implies reducing the cost of assessing students ...
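Here is that toy cost model – all figures are invented purely to show the shape of the curve:

```python
def cost_per_student(students, fixed=500_000.0, running=2_000_000.0,
                     per_student=1_500.0):
    """Illustrative (invented) figures: only the per-student component
    fails to dilute as enrolment grows."""
    return fixed / students + running / students + per_student

for n in (1_000, 10_000, 100_000):
    print(n, round(cost_per_student(n), 2))
# 1000   4000.0   <- fixed and running costs dominate
# 10000  1750.0
# 100000 1525.0   <- the floor approaches the per-student component,
#                    i.e. tutorials, assessment and per-student admin
```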



Given that UK HE has already plucked the “low hanging fruit” in productivity gains, how is it supposed to improve still further with a 34% increase in student numbers by 2020?
Especially given that new low-cost HE providers are becoming increasingly cost-competitive on the global stage by using much cheaper labour to reduce tutorial and per-student admin costs?
In short: either UK HE innovates or it will get pushed out of its global market.

Lastly, the Higher Education Achievement Report (HEAR) wants to phase out overall degree classifications in favour of academic transcripts and even more detailed per-topic assessments.

My worry is that this may penalise the more creative and original students even further, by exposing the volatility in their component grades.
And that’s where my research comes in! If it works, it could be a great tool for simultaneously reducing assessment costs whilst improving assessment quality and being more meritocratic. A win-win-win!


Just because it appears to be a win-win-win does not necessarily mean that there isn’t a big potential for problems ...
After all, for every “good” use of such personally intimate data there are many “bad” uses.

For example, it certainly would be a gold mine for marketing.
Or one might even use this data to identify early those most likely to become subversive, engage in criminal behaviour, or even fail to repay loans.





Some of these problems can be inhibited by deliberately designing the data storage in a form which makes it hard to regress such indicators.
However, many of these problems also affect all social networking users in general.
We definitely comply with the EU Data Protection Directive, so we are currently legal and meeting current best practice.
And while we can identify individuals, anyone we give access to our data for research purposes cannot (without the individual user’s explicit permission).

In the end, it is probably too early to say much more without knowing what will work and what won’t, i.e. we have to build it and see what is possible first.

We are, however, very mindful of these issues.
Thank you for listening!
Questions?