Transcript Slide 1

What’s so special about the social sciences?
Peter Burnhill
Director, EDINA national academic data centre,
University of Edinburgh, Scotland UK
Bloomsbury Conference on e-publishing and e-publications
University College London, 24/25 June 2010
short answer:
Some things but not everything
Overview for a longer answer
1. Autobiographic Apologia
– Yesterday and Yesteryears
2. Research publication and data for research in the social sciences
– some evidence (some old and re-used)
– all that is digital are not data
3. Societal Big Challenges
– a sense of place
4. Our shared task
– ease and continuity of access
– citation and linking
* Linked Data: Semantic Web anyone?
5. Socio-Informatics & the Internet

Will not take the full two hours ….
autobiography as commentary
1. Social Science Research Council [now ESRC]
– ‘Scientific Officer’ for Economic & Social History and Statistics
(left to do MSc Statistics at London School of Economics)
2. Scottish Education Data Archive, until mid ‘80s
– Survey statistician: school leavers, YTS, 16-19 cohort surveys; demand for HE
3. Edinburgh University Data Library, mid-1980s & on
– Manager: set-up and development
– President of IASSIST, 1997 – 2001: social science data professionals
4. ESRC Regional Research Laboratory for Scotland 1986/90
– Co-director: early days of Geographical Information Systems (GIS)
5. Graduate School, Faculty of Social Science, 1987 – 1997
– Senior Lecturer, teaching quantitative/survey methods
6. EDINA national data centre, mid-1990s to present: my day job
– Director: set-up and continuous development
7. Digital Curation Centre, 2004/05 as Interim Director
– set-up & definition of ‘data curation + digital preservation’
Taken from a PPT to JISC in July 2004 …
What is this digital curation anyway?
digital curation: ... digital objects and data, over their life-cycle,
for current & future generations of use ...
= f(data curation & digital preservation)
• data curation [when high current/ongoing interest]
– actions needed to maintain and utilise digital data & research results
over entire life-cycle
– data creation & management; adding value; generating new sources of
information & knowledge, for use
• digital preservation [for longevity; fall off in interest]
– long-run technological/legal accessibility & usability
– storage, maintenance & accessibility of information content in digital
material over the long-term, for use
– OAIS concept of designated community
Taken from a PPT to JISC in July 2004 …
The term Digital Curation is a rather recent invention.
• The Digital Data Curation Task Force - Report of the Task
Force Strategy Discussion Day (2002) states
– Tony Hey took up the term which had been used by Dr John Taylor,
Director General of the Research Councils, to distinguish the actions
involved in caring for digital data beyond its original use, from digital
preservation. The concept’s reach extends beyond libraries.
• The e-Science Curation Report (2003) proposed the following
distinctions:
– Curation: The activity of managing and promoting the use of data from
its point of creation, to ensure it is fit for contemporary purpose, and
available for discovery and re-use. For dynamic datasets this may mean
continuous enrichment or updating to keep it fit for purpose. Higher levels
of curation will also involve maintaining links with annotation and with
other published materials.
– Archiving: A curation activity which ensures that data is properly
selected, stored, can be accessed and that its logical and physical integrity
is maintained over time, including security and authenticity.
– Preservation: An activity within archiving in which specific items of data
are maintained over time so that they can still be accessed and
understood through changes in technology.
research, learning & teaching in UK universities & colleges
acting as platform for network-level services
& helping to build the JISC Integrated Information Environment
National Data Centres
JISC Collections
JISC Sub-Committees
UK funding councils
Research
Councils UK
EDINA Management Board met yesterday
to review its 3-year Strategy and its Budget
from JISC for the coming year
Reading & Reference Room: supporting scholarly communication
No longer host specialist Abstract & Index databases …
SUNCAT
UK serials union catalogue: what’s held where
the Depot
international Open Access facility to support self deposit of peer-reviewed papers
EDINA Strategy in this area just reviewed following:
• RLUK/JISC Resource Discovery Task Force
• SCONUL Shared Services Business Case
• EDINA Focus Groups on
i. ‘ease and continuity of access’
ii. Arts, Humanities & Social Sciences
iii. new technologies
Ensuring
researchers, students and their teachers have
ease and continuity of access
to online scholarly resources
access
to content & services
licence
to use
Creative Commons
licensing
Should apply to different types of resource:
typically journal articles,
but also now OER learning materials, data etc
additional considerations
Search (Re-)Use
Modify/Combine Share (Issue/Publish)
P.Burnhill, Edinburgh 2009
Finds the agencies looking after an e-journal,
and the volumes being preserved
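A lookup of this kind can be pictured as a join between metadata on extant e-journals, keyed by ISSN, and metadata on preservation action reported by each agency. The sketch below is illustrative only: the ISSN, the journal title and the volume ranges are made-up placeholders, and the function names are hypothetical, not the interface of any actual registry service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationAction:
    """One agency's statement that it preserves a run of volumes."""
    issn: str
    agency: str
    first_volume: int
    last_volume: int


# Metadata on extant e-journals, keyed by ISSN (placeholder values).
JOURNALS = {"0000-0000": "Hypothetical Journal of Examples"}

# Metadata on preservation action (placeholder volume ranges).
ACTIONS = [
    PreservationAction("0000-0000", "CLOCKSS", 1, 12),
    PreservationAction("0000-0000", "Portico", 5, 20),
]


def who_preserves(issn: str) -> list[tuple[str, range]]:
    """Return (agency, volumes) pairs reported for one e-journal."""
    return [
        (a.agency, range(a.first_volume, a.last_volume + 1))
        for a in ACTIONS
        if a.issn == issn
    ]


def agencies_for_volume(issn: str, volume: int) -> list[str]:
    """List the agencies that report preserving one given volume."""
    return sorted(
        agency for agency, volumes in who_preserves(issn) if volume in volumes
    )


if __name__ == "__main__":
    print(JOURNALS["0000-0000"], agencies_for_volume("0000-0000", 8))
```

Keying both sets of metadata on the ISSN is the design choice that matters here: it lets the registry answer "who is looking after this title, and which volumes" without the agencies having to agree on anything else.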
Geo-spatial resources: Map & Data Place
Multimedia resources: Sound & Pictures Show
• platform for search and download of film, video and audio
– wide range of subject coverage, including documentary film
– Licensed for use in learning, teaching and research
• Being re-worked as the Digital Media Hub, combining
– Film & Sound Online
* initial 600 hours of film, digitised for downloading
– NewsFilm Online
* 3000 hours of material from ITN & Reuters
* Over 4TBs of clips to download
* Release of product from JISC Digitisation programmes
– Plus Education Image Gallery of still photography
– Visual and Sound Materials Portal project
* Discovering all sorts of audio-visual material
• Special interest for social science as non-print
record of 20th Century: the first A-V century
– With new forms of research material to use and to master
Defining the Social Sciences
a collection of disciplines that variously apply theorising
and systematic method to the study of human society
from family to politics, from law/religion to economy:
of what it is to be human and our interaction among ourselves
and with our environment, whether on land, sea or the Internet
Teaching draws upon schooling: social arithmetic of Qualified Empirical Statements
1. We make provisional statements about the world
* in the language of our theory and the context of time & place
2. on basis of evidence derived from the [real] world
* conditioned by our theory and choice of systematic method
3. seeking to qualify our statements
* with imperative to express our measures of uncertainty
Pattern of research publication in the social sciences
‘The Four Literatures of Social Science’
(Diana Hicks, 2004)
Handbook of Quantitative Science and Technology Studies, Henk Moed (Ed)
• All more trans-disciplinary than comparable scientific literatures
1. international journal articles: the SSCI-indexed currency of evaluation
2. books: can have a high citation/impact
3. national literatures: knowledge developed in context
embedded in their society; influenced by national trends & policy concerns
4. non-scholarly publications: knowledge into application
enlightenment or knowledge transfer to the non-scholarly public
Hicks states Burnhill and Tubby-Hille (1994) “investigated this issue in some
depth [with] publications database from [ESRC] grant reports [and] survey ..
.. Assigning non peer reviewed journals to .. enlightenment .. suggests that psychologists,
statisticians and geographers do not publish much in non-scholarly literature. Other fields do.
Even economics, normally quite scientific in its publication patterns, exhibits a healthy
percentage of articles in non-scholarly venues. Linguistics, education and sociology lead in
share of non-scholarly publications.”
‘On measuring the relation between social science research activity and research
publication’ Research Evaluation 4 (3) December 1994
Pattern of research publication in the social sciences
Table from Burnhill and Tubby-Hille (1994) reproduced in Vasilakos et al (2007)
‘Evaluating the Performance of UK Research in Economics’, [sponsored by the Royal Economic Society]
Keele Economics Research Papers, ISSN 1740-231X www.keele.ac.uk/depts/ec/kerp
Pattern of research publication in the social sciences
from Burnhill and Tubby-Hille (1994), not yet reproduced by anyone
Pattern of research publication in the social sciences
Following the trace to
Keele Economics Research Papers, ISSN 1740-231X
www.keele.ac.uk/depts/ec/kerp
led me to:
What’s special about social sciences: policy & action
“philosophers have only interpreted the world,
the point is to change it”
Karl Marx (1845), Thesis 11
published in 1924 in German & Russian translation; in English in 1938
appeared in Engels’ edited version in 1888, as ‘Theses on Feuerbach’
Not the moment to debate origins of social science:
of Hume, Ferguson,
Smith, Hegel, Marx, Kant, Jung, Parsons, Durkheim, Popper etc
– even Jeremy Bentham (UCL),
nor of modern theorists,
but along with development and shifts in theory …
what is key is that …
the practice of social science, and the modality of peer communication
and publication in the discipline, has much to do with its connection to
the urgency of interaction with agencies of civil society
UK: ESRC Strategic Plan & Societal Big Challenges
Six Strategic Challenges:
1. Global Economic Performance, Policy & Management
2. Health & Well-being
3. Environment, Energy & Resilience
4. Security, Conflict & Justice
5. Social Diversity & Population Dynamics
6. New Technology, Innovation & Skills
7. Public Debt & the ConDem Government
Data as scholarship: a cultural shift?
“You are not finished until you have done the
research, published the results, and published
the data, receiving formal credit for everything.”
Mark A. Parsons (2006)
International Polar Year
Preserve or Perish
“A scholar’s positive contribution is measured by
the sum of the original data that he contributes.
Hypotheses come and go but data remain.”
in Advice to a Young Investigator (1897) Santiago Ramón y Cajal
(Nobel Prize winner, 1906)
What’s special about social sciences: third party data
• Demand for data to carry out secondary data analysis
– Social sciences do not generate all the data they need to
address their research questions
* Do not command the resources (funding/expertise)
– few research groups and Government could get funding to manufacture original data
• ESRC-led National Data Strategy, 14 Actions:
• potential research value of new types of data (transactions data and tracking records)
• new data infrastructures via EU and Euro Strategy Forum for Research Infrastructures
• improved access to Census of Population data
• a geo-spatial resources advisory service (JISC/ESRC)
• collaborative agreements with agencies within and outside UK
• sharing of data resources across ‘North/South’ global networks
• Explains why data libraries and archives have been around so long
– IASSIST: International Association for Social Science Information Services & Technology
– annual conference since 1974; www.iassistdata.org
– DISC-UK: a group of data libraries in UK universities (including EUDL)
– Providing ease of access to data held elsewhere (including UKDA)
– Datashare project to support institutional responsibilities for data
» alongside Institutional Repositories
Note: Not all that is digital are data (& vice versa)
a) Data derive importance from their evidential value
– the empirical base for (scholarly) statement & decision-making
– Provenance (how data are derived) is very important
b) Differences in ways that disciplines in Humanities &
Social Sciences assess scholarship and evidence
– in what they regard as data, as value for their subject
– mix of approach to epistemology, inc document tradition
c) Data represented (encoded) as numbers or words
– often derived from observation (with issues of phenomenology!)
– or as pictures or sounds (not encoded - pre-data?)
* access to (now digitally/digitised) record of experience
– or algorithmic models (as with physical & life sciences)
* modelling is widespread in economics, psychology, social
statistics, geography etc
Our shared task:
To ensure ease & continuing access to record of scholarship
– research publications and research data
Consider at least three types of (research) data:
A. Supplementary data [enhanced publication]
– multimedia files: part of the published article that presents
research argument and conclusions
* more than linear text, limited tabular and graphical display
* enhances user experience with various multimedia objects
B. Research dataset(s) upon which conclusions based
– check analysis of those data to support statements made
C. Database(s) from which datasets were assembled
– for reproducibility (exposure to refutation) and new work
via alternative analysis and updates to the database(s)
– these are curated in situ – by data centres / originators
Citation and linking
• Citation of the datasets used (Type B data)
– verification of analysis, that the figures and conclusions
accurately reflect those data
• Citation of database(s) (Type C data)
– for reproducibility (exposure to refutation)
– to prompt new work via alternative analysis and updates to
the database(s)
– to credit those who curate the data needed for scholarship
Plus hyperlink to the database from the published article
… and back again from the database to the published article
Links to presentations, blogs, websites, funders etc related to the
same research activity and same researcher(s) (Type D data?)
Obtaining the citation at source
1. CIESIN
“Most of our datasets and products contain a suggested citation
on the Web site as to where the data was obtained”
“Whenever possible, we urge you to cite the use of data
and web resources in the reference section”
– http://sedac.ciesin.columbia.edu/citations/
2. How to Cite Statistics Canada Products:
“This guide has been developed for authors, editors, researchers,
academics, students, librarians and data librarians.
“It describes, in three steps, how to build your reference
when citing Statistics Canada products”
– http://www.statcan.gc.ca/pub/12-591-x/12-591-x2006001-eng.htm
Get it from those who make the data available: the data publishers
cf Cataloguing in Publication!
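If the data publisher supplies the elements of a citation, assembling the reference becomes mechanical. The sketch below shows one way that might look; the field names and the layout of the rendered string are assumptions made for illustration, not the format recommended by CIESIN or Statistics Canada, and the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Citation elements as a data publisher might supply them (hypothetical fields)."""
    creator: str
    year: int
    title: str
    publisher: str
    url: str
    accessed: Optional[str] = None

    def render(self) -> str:
        """Assemble a single reference string in an assumed, illustrative layout."""
        text = (
            f"{self.creator} ({self.year}). {self.title} [dataset]. "
            f"{self.publisher}. {self.url}"
        )
        if self.accessed:
            text += f" (accessed {self.accessed})"
        return text


# Placeholder values only; not a real dataset.
EXAMPLE = DatasetCitation(
    creator="Example Research Group",
    year=2010,
    title="Example Survey of School Leavers",
    publisher="Example Data Centre",
    url="http://example.org/data/1",
    accessed="24 June 2010",
)

if __name__ == "__main__":
    print(EXAMPLE.render())
```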
Link remains the key verb
But need to shift attention from
• Linking resolver (unidirectional)
– From metadata reference to full text of article
* SICI-Citation | Z39.50
* DOI | OpenURL | http
to
• Linked Data (relational, bi-directional)
– Between resources in the weave of the Web
* Using URIs as names for things
– Not just URLs (the addresses on the web) but the URIs
* Using RDF/XML to define the relationships between the resources
– RDF triples: subject / relationship / object
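The subject / relationship / object pattern can be sketched without any RDF library. In the example below the two URIs are hypothetical placeholders (example.org), the `references` and `isReferencedBy` properties are borrowed from the Dublin Core Terms vocabulary as one plausible choice of relationship, and the helper names are invented for illustration. It shows the bi-directional point: the link can be followed from article to dataset and back again.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject / relationship / object."""
    subject: str
    predicate: str
    obj: str


# Dublin Core Terms namespace; the two properties used below belong to it.
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical URIs naming an article and a dataset it draws on.
ARTICLE = "http://example.org/article/1"
DATASET = "http://example.org/dataset/1"


def link_both_ways(article: str, dataset: str) -> list[Triple]:
    """State the relationship in both directions."""
    return [
        Triple(article, DCTERMS + "references", dataset),
        Triple(dataset, DCTERMS + "isReferencedBy", article),
    ]


def to_ntriples(triples: list[Triple]) -> str:
    """Serialise as N-Triples-style lines (every term here is a URI)."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples)


def objects_of(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """Follow one relationship outward from a named resource."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


GRAPH = link_both_ways(ARTICLE, DATASET)

if __name__ == "__main__":
    print(to_ntriples(GRAPH))
```

A linking resolver would hold only the first of these two statements; holding both is what lets a reader arrive at the dataset and still find the article.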
Resource Description Framework (RDF), and URIs
• framework for representing information in Web; identifiers
• http://www.w3.org/TR/rdf-concepts/
• http://www.w3.org/TR/rdf-primer/
RDF graph: Article & Supplementary Data
http://www.emeraldinsight.com/fig/0350570303002.png
1. Build and publish as metadata in XML format to be found on the web
2. Publishing text and data/multimedia content in XML will delight researchers
• Researchers want to access ‘article as data’, via computational algorithm
uses Linked Data
Parse to ‘mark up’ archaeological site record (metadata)
Enriching resources with contextual metadata
• Overcoming sparse metadata problem that inhibits discoverability
– using ancillary information in the metadata
– evoking ‘has Event’ relation
• Initial focus on (digitised) 20th Century newsfilm footage
Sparse Metadata
The only data we have:
• 1st October 1995
• Cyprus
• Disturbance (street disturbance)
• British soldiers
• Broadcast on TV News
finding related text for mining
and so auto-creating metadata
to improve discoverability and provide/enhance context
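One very simple way to picture "finding related text" is to score candidate passages by how many terms they share with the sparse record. The sketch below does only that; it is not the method used in the project described, the candidate passages are invented for the example, and the field names in the record are assumptions.

```python
import re

# The sparse record from the slide, held as a small dictionary.
SPARSE_RECORD = {
    "date": "1st October 1995",
    "place": "Cyprus",
    "event": "street disturbance",
    "agents": "British soldiers",
}


def words(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens found in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def record_terms(record: dict[str, str]) -> set[str]:
    """All tokens of three or more characters from the record's values."""
    found: set[str] = set()
    for value in record.values():
        found.update(w for w in words(value) if len(w) > 2)
    return found


def score(text: str, record: dict[str, str]) -> int:
    """Number of record terms that also occur in the candidate text."""
    return len(record_terms(record) & words(text))


def rank(candidates: list[str], record: dict[str, str]) -> list[str]:
    """Candidates ordered from most to least overlap with the record."""
    return sorted(candidates, key=lambda t: score(t, record), reverse=True)


# Invented passages, for illustration only.
CANDIDATES = [
    "Weather report for Scotland.",
    "British soldiers were involved in a street disturbance in Cyprus in October 1995.",
]

if __name__ == "__main__":
    for passage in rank(CANDIDATES, SPARSE_RECORD):
        print(score(passage, SPARSE_RECORD), passage)
```

Text that scores well could then be mined for further names, places and events to attach to the footage as contextual metadata.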
Digital Library as applied Information Science
Michael Buckland, Presidential Address, American
Society for Information Science, JASIS’s 50th (1998)
2 traditions/mentalities co-exist in Information Science
1. Document tradition: signifying record-ness
2. Computational tradition: various uses of formal techniques
* non-convergent mentalities working to build the ‘digital library’
a) modernisation of library services
b) infrastructure to access complex databases
Aside: first met Clifford Lynch when visiting Professor Buckland at
UC Berkeley on the occasion of the IASSIST Conference in 1994
Time for me to stop …
Hoping that I have left some space/place for questions

Thank you
Acknowledgements
[email protected]
http://edina.ac.uk
Tel.: +44 (0)131 650 3302
Fax: +44 (0)131 650 3308
Abstract Data Model: Figure 1 in reference paper in Serials, March 2009
Piloting an E-journals Preservation Registry Service
[Diagram: SERVICES (user requirements): E-J Preservation Registry Service. E-Journal Preservation Registry holds (a) METADATA on extant e-journals and (b) METADATA on preservation action. Data dependency: ISSN Register; Digital Preservation Agencies, e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.]
Challenge to Ensure Continuing Access
[Diagram labels: Author (article); peer review; E-prints; Publisher (learned society): article, serial issue; Licence; Institutional arrangement; Library (serial); Reader (article); ‘Forma£’ Economy; Continuity of access; Licensed Online Access; ILL/docdel; E-prints free to web access; Institutional Repositories; Subject Repositories; peer exchange; Long term digital preservation]
Informal: ‘invisible college’ and the ‘gift economy’
Increasing dominance of The Web
[Diagram labels: Author (article); Publisher: article, serial issue; Licence; Institutional arrangement; Library (serial); Reader (article); ‘Forma£’ Economy; Web 2.0/3.0: Semantic web, mash-ups, Blogs, RSS feeds, Wikis; free to web access; Role of Institutional Repositories?; peer to peer exchange]
Informal: ‘invisible college’ and the ‘gift economy’
Research Data
Creator
Generates (curates) data for own purpose, or as part of team
… wants/has to ‘put’ it somewhere for use by others
(perhaps to be recognised by a peer community)
Researcher
Key User (Researcher) Verbs:
Discover data of interest
Locate service on that data with documentation on provenance etc
Request permission to use service
Access to service/data
Evidential value of data in analysis as ‘object of desire’
• The term “curation” builds on our understanding of the word
“curator”, somebody who keeps something for the public good,
whose value often needs to be brought out by the curator.
• Firstly, this open context implies more support for explicit policies
with regard to data sharing, and it has major implications for
structuring and tools.
• Secondly, the digital curator is store-keeper but he is also closely
linked to promoting new science, making sure that his user-base is
solid, sufficient, and looking forward to identify new ways to serve
present and future researchers. The digital curator should take an
active role in promoting and adding value to his holdings, hold
exhibitions, run joint events; he should manage the value of his
collection.
More definitions
There does seem to be a lack of clarity. Some terms worth
distinguishing are:
• data preservation : a general term probably equivalent to digital
preservation in this context
• digital preservation : could be, and probably is, interpreted as
simply ensuring the original bits and bytes are accessible
• digital information preservation : this is what is referred to in
the OAIS standard - what is important is not the original "bits and
bytes" but the content. An OAIS ensures that the content is
accessible, understandable and usable.
• curation : general term - taking care of things
– if someone currently calls themselves a “curator” – do we accept their
definition?
• data curation : looking after and adding value to data
• digital curation : looking after and somehow "adding value" to
digital data. This probably implies creating some new data from the
existing, in order to make the latter more useful and "fit for
purpose".
• information curation : not seen in the wild
• evidence : bit preservation plus authenticity and trust?
Ensuring
researchers, students and their teachers have
ease and continuity of access
to online scholarly resources
access
to content & services
location registry/discovery
licence
to use
licence registry
entitlement history
archiving registry
Suncat & Zetoc
OpenURL Router
UKAMF registry
content registry
Use case: article–length work published in e-journals
ISSN Register as a key content registry; need registry of ToC
[Curation is additional but has relation to ease and continuing access.]
P.Burnhill, Edinburgh 2009