Research Metrics
What was proposed …
… what might work
Jonathan Adams
Oct 2006
Overview
• RAE was seen as burdensome and distorting
• Treasury proposed a metrics-based QR allocation
system
• The outline metric model is inadequate, unbalanced and
provides no quality assurance
• A basket of metrics might nonetheless provide a
workable way of reducing the peer review load
• Research is a complex process so no assessment
system sufficient to purpose is going to be completely
“light touch”
The background
• RAE introduced in 1986
– ABRC and UGC consensus to increase selectivity
• Format settled by 1992
• Progressive improvement in UK impact
• Dynamic change and improvement at all levels
The RAE period is linked to an increase in UK share of world citations
[Chart: UK share (%) of world citations, 1981-2003; arrows indicate RAE years]
UK performance gain is seen across all RAE grades (data are core sciences, grade at RAE96)
[Chart: average normalised impact (world average = 1.0), 1991-2000, for units graded 4, 3A and 3B at RAE96; percentage gains of 16%, 12% and 17% are annotated]
Treasury proposals
• RAE peer review produced a grade
– Weighting factor in QR allocation model
– Quality assurance
• But there were doubters
– Community said the RAE was onerous
– Peer review was opaque
– Funding appeared [too] widely distributed
• Treasury wanted transparent simplification of the
allocation side
The ‘next steps’ model
• Noted correlation between QR and earned income (RC
or total)
– Evidence drew attention to statistical link in work on dual support
for HEFCE and UUK in 2001 & 2002
• Treasury hard-wired the model as an allocation system
– So RC income determines QR
• But …
– Statistical correlation is not a sufficient argument
– Income is not a measure of quality and should not be used as a
driver for evaluation and reward
QR and RC income scale together, but the residual variance would have an impact
[Scatter plot: RC income vs QR funding (2003-04), with Research Council income (£0-70m) on the x-axis and QR funding (£0-80m) on the y-axis. Labelled institutions include UC London, Oxford, Cambridge, Imperial, Manchester, KC London, Soton and Warwick; individual 'winner' and 'loser' institutions are marked. HEPI produced additional analyses in its report.]
Unmodified outcomes of outline metrics model perturb current system unduly

£ millions                        Current HEFCE QR    Change
WINNERS
Univ Southampton                        32.6           +15.7
Univ Cambridge                          73.5           +13.3
Univ Leicester                          11.9            +6.4
Univ Manchester                         54.3            +6.0
LOSERS
Univ Oxford                             72.2            -4.8
Royal Holloway, Univ London              9.9            -5.1
Univ Arts London                         6.3            -5.7
Imperial Coll London                    66.2            -6.6
Univ Coll London                        73.7            -9.6
King's Coll London                      38.5           -11.8

A new model might produce reasonable change, but few would accept that the current QR allocations are as erroneous as these outcomes suggest
The problem
• The Treasury model over-simplifies
• Outcomes are unpredictable
– There are confounding factors such as subject mix
– Even within subjects there are complex cost patterns
• The outcome does not inspire confidence and would
affect morale
• There are no checks and balances
– Risk of perverse outcomes, drift from original model
– Drivers might affect innovation, emerging fields, new staff
• There is no quality assurance
What are we trying to achieve?
We want to lighten the peer review burden so we need ‘indicators’ to
evaluate ‘research performance’ but not simplistic mono-metrics
[Diagram: research as a 'black box'. What we want to know is research quality; what we have to use are the measurable inputs (time, funding) and outputs (numbers, publications).]
Informed assessment comes from an integrated
picture of research, not single metrics
[Diagram: the research process and the data available on it.
– Inputs and activity support: research grants and contracts, income from funding agents; staff, trainees, facilities and projects; research capacity; people; collaborating organisations
– Outputs from research, which produce knowledge as discovery: journal papers and reports, co-authorship, citation impact
– Outputs from development, leading to knowledge as process or product: patents, licenses, spin-out companies, joint ventures, know-how
– Derived information and contribution to the economy: skilled employees, trained technical staff, trainee students and researchers, increased ability to tackle and solve industrial problems, improved commercial competitiveness, an increased pool of trained and highly skilled personnel, a shared information base, improved networking and cooperation, improved collaboration, recognition, innovation, growth
Data sources include RAE and HESA (for the UK), Thomson ISI, Thomson Derwent, EuroStat, OECD and Evidence Ltd.]
Data options for metrics and indicators
• Primary data from a research phase
– Input, activity, output, impact
• Secondary data from combinations of these
– e.g. money or papers per FTE
• Three attributes for every datum
– Time, place, discipline
– This limits possible sources of valid data
• Build up a picture
– Weighted use of multiple indicators (see the sketch below)
– Balance adjusted for subject
– Balance adjusted for policy purpose
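As an illustration only (none of this detail is in the slides; the indicator names, benchmarks and weights are hypothetical), a minimal sketch of how a weighted, subject-adjusted basket of indicators might be combined:

```python
# Hypothetical sketch of a "basket of metrics": each indicator is rebased
# against a subject benchmark, then combined with weights that could be
# adjusted per subject or per policy purpose. All names and numbers are
# illustrative, not taken from the presentation.

# Assumed subject benchmarks, e.g. expected values per FTE for the discipline
BENCHMARKS = {
    "chemistry": {"income_per_fte": 150_000, "papers_per_fte": 3.0, "citation_impact": 1.0},
}

# Weights on each indicator; the balance would be tuned for subject and purpose
WEIGHTS = {"income_per_fte": 0.4, "papers_per_fte": 0.2, "citation_impact": 0.4}

def basket_score(unit: dict, subject: str) -> float:
    """Combine several indicators, each rebased to its subject benchmark."""
    bench = BENCHMARKS[subject]
    return sum(weight * (unit[indicator] / bench[indicator])
               for indicator, weight in WEIGHTS.items())

# A hypothetical chemistry unit: a score of 1.0 means "at benchmark" overall
unit = {"income_per_fte": 180_000, "papers_per_fte": 2.5, "citation_impact": 1.3}
print(f"Basket score: {basket_score(unit, 'chemistry'):.2f}")
```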
We need assured data sourcing
• Where the data comes from
– Indicator data must emerge naturally from the process being
evaluated
– Artificial PIs are just that, artificial
• Who collects and collates the data
– This affects accessibility, quality and timeliness
• HESA
– Data quality and validation
– Discipline structure
• Game playing
We need to agree discipline mapping
What is Chemistry?
[Diagram: 'chemistry' can be delimited by FUNDING (the Research Council chemistry grants committee vs other funders), by ACTIVITY (the university's School of Chemistry vs other departments), or by OUTPUT (ISI chemistry journals vs other journals and other researchers).]
We have to agree how to account for the
distribution of data values e.g. income
[Chart: frequency distribution of research income for units in UoA14 Biology (RAE2001), from the minimum to the maximum income category, shown both as gross income per unit and as income per FTE; annotated values of £10m per unit and £250k per FTE are marked.]
Distribution of data values - impact
The variables for which we have metrics are skewed and therefore difficult to picture in a simple way.
[Chart: frequency distribution of impact category (normalised to world average) for UK Physics papers for 1995 (n = 2,323), from zero to the maximum category; the world average is marked.]
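For reference, one standard normalisation consistent with "world average = 1.0" (the slides do not spell out the formula, so the exact convention here is an assumption):

```latex
% Assumed conventional definition; not stated explicitly in the slides.
% c_i = citations to paper i; \bar{c}_{f,y} = world average citations for
% papers in the same field f and publication year y.
\[
  \text{normalised impact of paper } i \;=\; \frac{c_i}{\bar{c}_{f,y}} ,
  \qquad
  \text{impact of a unit} \;=\; \frac{1}{n}\sum_{i=1}^{n}\frac{c_i}{\bar{c}_{f(i),y(i)}}
\]
% A value of 1.0 equals the world average; values are bounded below by zero
% and strongly right-skewed, hence the categorised profiles used below.
```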
Agree purpose for data usage
• Data are only indicators
– So we need some acceptable reference system
• Skewed profiles are difficult to interpret
• We need simple, transparent descriptions
– Benchmarks
– Make comparisons
– Track changes
• Use metrics to monitor performance
– Set baseline against RAE2008 outcomes
– Check thresholds to trigger fuller reassessment
Example - categorising impact data
[Diagram: all papers divide into uncited papers (= 0) and cited papers (> 0). Cited papers divide into papers cited less often than benchmark (categories >0-0.125, 0.125-0.25, 0.25-0.5, 0.5-1) and papers cited more often than benchmark; the latter split into papers cited more than benchmark but less than four times as often (1-2, 2-4) and papers cited more than four times as often as benchmark (4-8, >8).]
This grouping is the equivalent of a log2 transformation. There is no place for zero values on a log scale.
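This grouping lends itself to a direct computation. A minimal sketch (illustrative only; the benchmark values and the boundary convention are assumptions), assigning hypothetical papers to the categories above and building a profile:

```python
# Illustrative sketch: bin papers by rebased impact (RBI), i.e. citations
# normalised to a world benchmark for the same field and year, into the
# log2-style categories described above. Data and benchmarks are hypothetical.
from collections import Counter

EDGES = [0.125, 0.25, 0.5, 1, 2, 4, 8]
LABELS = ["RBI = 0", "RBI >0-0.125", "RBI 0.125-0.25", "RBI 0.25-0.5",
          "RBI 0.5-1", "RBI 1-2", "RBI 2-4", "RBI 4-8", "RBI >8"]

def rbi_category(citations: int, world_benchmark: float) -> str:
    """Assign a paper to an impact category relative to its benchmark."""
    if citations == 0:
        return LABELS[0]                  # uncited papers have no place on a log scale
    rbi = citations / world_benchmark     # rebased (normalised) impact
    for i, edge in enumerate(EDGES):
        if rbi <= edge:
            return LABELS[i + 1]
    return LABELS[-1]

# Hypothetical papers: (citations, world benchmark for the same field and year)
papers = [(0, 10.0), (2, 10.0), (9, 10.0), (25, 10.0), (120, 10.0)]
profile = Counter(rbi_category(c, b) for c, b in papers)
for label in LABELS:
    print(f"{label:>16}: {profile.get(label, 0)}")
```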
UK ten-year profile
[Chart: percentage of UK output 1995-2004 (680,000 papers) in each rebased impact (RBI) category: RBI = 0, >0-0.125, 0.125-0.25, 0.25-0.5, 0.5-1, 1-2, 2-4, 4-8 and >8. The mode, the mode of cited papers, the median and the average (RBI = 1.24) are marked, along with a suggested 'threshold of excellence?'.]
Subject profiles and UK reference
[Chart: percentage of output 1995-2004 in each RBI category; the percentage of UK output for the subject and time period is shown as a smoothed line, with the UK average shown as a red symbol.]
HEIs – 10 year totals – 4.1
[Chart: percentage of output 1995-2004 in each RBI category for three HEIs: a leading research university, a big civic 'Robbins' type university and a former polytechnic. Smoothing the lines would reveal the shape of each profile.]
HEIs – 10 year totals – 4.2
[Chart: the same three HEI profiles by RBI category; absolute volume would add a further element for comparisons.]
Conclusions
• We can reduce the peer review burden by increased use of metrics
– But the transition won’t be simple
• Research is a complex, expert system
• Assessment needs to produce
– Confidence among the assessed
– Quality assurance among users
– Transparent outcome for funding bodies
• Light touch is possible, but not featherweight
– Initiate a metrics basket linked to RAE2008 peer review
– Set benchmarks & thresholds, then track the basket
– Invoke panel reviews to evaluate change, but only where variance exceeds band markers across multiple metrics (a sketch follows below)
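A minimal sketch of that trigger rule (illustrative only; the metric names, band width and "how many metrics" rule are assumptions, not the presentation's specification):

```python
# Hypothetical sketch: track a basket of metrics against an RAE2008 baseline
# and flag a unit for panel review only when it drifts outside a band marker
# on several metrics at once. Names, bands and data are illustrative.

BASELINE = {"citation_impact": 1.1, "income_per_fte": 160_000, "papers_per_fte": 2.8}
BAND = 0.20          # tolerate +/-20% drift around the baseline
MIN_METRICS_OUT = 2  # require out-of-band variance on at least two metrics

def needs_panel_review(current: dict) -> bool:
    """Trigger fuller reassessment only when multiple metrics leave their bands."""
    out_of_band = sum(
        1 for metric, base in BASELINE.items()
        if abs((current[metric] - base) / base) > BAND
    )
    return out_of_band >= MIN_METRICS_OUT

current = {"citation_impact": 0.8, "income_per_fte": 120_000, "papers_per_fte": 2.9}
print("Panel review triggered:", needs_panel_review(current))
```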
Overview (reprise)
• RAE was seen as burdensome and distorting
• Treasury proposed a metrics-based QR allocation
system
• The outline model is inadequate, unbalanced and
provides no quality assurance
• A basket of metrics might nonetheless provide a
workable way of reducing the peer review load
• But research is a complex process so no assessment
system sufficient to purpose is going to be completely
“light touch”