Social Science: Implementation and Management of a National Initiative - the example of the UK Data Archive
Prof K. Schürer
Director, UK Data Archive
President, CESSDA
Project Co-ordinator, EC FP7/ESFRI PPP
1st African Digital Curation Conference
Pretoria, 12-13 February, 2008
UKDA history & overview
• Archive established in 1968 (as ‘Data Bank’)
• Funded by (then) SSRC to provide a service to UK HE sector
• Initial focus on government survey data
• New distributed service established 1 Jan. 2003
– Economic and Social Data Service (ESDS)
• Mixed data types and formats
– Specialist Qualidata unit and History Data Service
• Still predominately funded to provide service for HE/FE sectors
– ESRC, JISC, University of Essex
– Project funding (EC, JISC, MRC, AHRC, etc.)
• Since 2005 designated as ‘Place of Deposit’ by TNA
• C.60 staff – mixture of staff expertise/skills
UKDA now runs a family of services
• ESDS
• Census Programme Registration Service and Portal
• History Data Service
• HistPop
• Rural Environment and Land Use Data Support Service
• Nesstar Support Service
Plus R&D projects:
• SToRE
• DeXT
• CESSDA-PPP …
Specialist data services
• ESDS Government
• ESDS International
• ESDS Longitudinal
• ESDS Qualidata
Greater emphasis on:
– value-added data and documentation
– enhanced resource discovery
– improved delivery services
– support and training for the secondary use of data for research, learning and teaching
– outreach and promotion
Data RIs can work
• 5,000+ datasets in the collection
• 250+ new datasets are added each year (and new
versions)
• C.50,000 orders for data per year
• C. 115,000 online sessions
• C. 48,000 registered users
• C. 17,000,000 page hits and C. 950,000 pdf
downloads last year
• Gateway to other collections (CESSDA, ICPSR)
Who produces social science data?
• Government agencies
• HE/FE sector
• Private sector
• Within HE/FE not just ESRC funded
– MRC, NERC, AHRC, Wellcome, Leverhulme, Rowntree
• Increasing number of large digitisation projects
– JISC, NOF
• Increasing tendency for government agencies to contract out survey work to private sector (NatCen)
• University sector tends not to get Government contracts
• Devolution
• Local Government
Types of data
• Quant and Quali
• Surveys
• Censuses
• Administrative data
• Also increasing amounts of ‘non-survey’ type data
– Images
– Sound
– Video
– Mixed media
Selection guided by ‘use’ & ‘users’
• UKDA seeks to identify and acquire material within broad areas
• Discipline coverage: at the broadest level, data and other electronic resources relating to society, in particular data about individuals or groups of individuals. This includes strategic social science and economic datasets e.g. unemployment statistics, major household surveys.
• Geographic coverage: data across a broad geographic coverage focusing on the United Kingdom and cross-national datasets but including material from other countries where appropriate and in particular where these provide opportunities for comparative research e.g. European data.
• Temporal coverage: there are no restrictions on temporal coverage, although pre-1945 accessions are acquired through History Data Service.
• Time series and panel data: data are sought which create or add to a time series and/or panel survey.
• Thematic coverage: to create a coherent body of materials relating to a particular discipline or field of enquiry e.g. health.
But do you need to keep everything?
• Curation needs to be informed by selection and
appraisal
• Resource allocation
• Rights issues / technical issues
• Short term vs long term
• Aggregative value
• Need to estimate what are the costs of NOT
preserving something
ESRC award holders
• Have been required to offer data for access/curation for over 20 years (more recently MRC)
• Some reluctance
• Need to work with carrots and sticks
• Move to data management life-cycle approach
• Move to self-archiving
Preservation in outline
• Standard directory structure for complete dataset
– Everything in one place
– Consistent structure makes precisely locating information easy
– Makes caching of specific information types simple
– Makes future migration to other systems and formats easier
• Data and documentation stored in portable format
– Ability to freely and intelligently read on many platforms
– Easier conversion to required format
– Easier migration to new portable format
• Study Number
– Note and Read files
– Data format files (SPSS exp, SAS, SIR)
– Original deposited format
– mrdoc: machine readable document files (pdf, word, ascii)
– Processing information and control files
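As a rough illustration of the standard directory structure described above, the following Python sketch creates the same set of sub-directories for every study. It assumes each study sits in a directory named after its study number; apart from "mrdoc", which appears on the slide, the sub-directory names and the helper functions are hypothetical and not taken from the UKDA system.

```python
from pathlib import Path

# Sub-directory names are illustrative assumptions; only "mrdoc" is named on the slide.
STUDY_LAYOUT = {
    "notes": "Note and Read files",
    "data": "Data format files (SPSS exp, SAS, SIR)",
    "original": "Original deposited format",
    "mrdoc": "Machine readable document files (pdf, word, ascii)",
    "processing": "Processing information and control files",
}


def create_study_tree(root: Path, study_number: int) -> Path:
    """Create the same sub-directories for a study under root/<study_number>."""
    study_dir = root / str(study_number)
    for name in STUDY_LAYOUT:
        (study_dir / name).mkdir(parents=True, exist_ok=True)
    return study_dir


def locate(root: Path, study_number: int, kind: str) -> Path:
    """Return the directory holding one kind of material for a study."""
    if kind not in STUDY_LAYOUT:
        raise KeyError(f"unknown material type: {kind}")
    return root / str(study_number) / kind
```

Because every study shares one layout, a lookup such as locate(root, 1234, "mrdoc") needs no per-study knowledge, which is the point the slide makes about locating information and migrating it later.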
Media
• Paper based
– Punched card, paper tape or manuscripts
• Magnetic
– Various reel to reel and cartridge based (QIC)
• Optical
– e.g. WORM, CD-ROM
• Storage environment, age and quality of original material
• Rescue methods and services
Multi-copies, multi-formats, multimedia, multi-places
• Two copies on separate media in main system
• Up to 10 different versions of each individual file in the shadow area
• Read only CD-ROM copy with error checking
• Complete off-site near-line copy of all data with a high level of security protection
• Tape monitoring and refresh strategy
• Front end copy to reduce load on main system
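Keeping several copies is only useful if they can be shown to agree. The sketch below is one possible way to check this, assuming a SHA-256 checksum is computed for each copy of a file; the slide does not say which error-checking method the UKDA uses, and the function names here are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(copies: list[Path]) -> dict[str, list[Path]]:
    """Group copies by checksum; more than one group means a copy has diverged."""
    groups: dict[str, list[Path]] = {}
    for copy in copies:
        groups.setdefault(sha256_of(copy), []).append(copy)
    return groups


def copies_agree(copies: list[Path]) -> bool:
    """True when every copy of the file has the same checksum."""
    return len(check_copies(copies)) == 1
```

A check of this kind could be run as part of a monitoring and refresh cycle: when the copies disagree, the groups returned by check_copies show which copies still match each other.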
Standards & Security
• BS7799 - Information security
• Machine room conforms to main fire and environmental control standards
• Conforms to BS5588 parts 3 and 9, BS5839 parts 1, 2 and 3, BS5306 part 4 and BS7083
• Conformity to BS6266, BS4783 parts 4, 5 and 7
• BS5454 store room
But – curation and long-term preservation should not (cannot?) happen in isolation
Challenges
• Legal issues
– Move toward greater openness and transparency
– Freedom of Information Act, 2000
– Yet greater concerns with confidentiality (Data Protection Act, 1998)
– Statistics Act (approved researcher status)
• Data issues
– Confidentiality – secure data service
– Ethical issues (RECs)
– ‘grey’ data/publications (research outputs)
Challenges #2
• Technical issues
– e-Science and data grid
– Institutional repositories
– ‘Self-archiving’ (UKDA-Store)
– Facebook/Google generation
• Political issues
– RCUK, OECD statements on research outputs
• Resource issues
– Who pays?
General Aims of the CESSDA Research Infrastructure PPP
• The focus of this project will be a major upgrade of the CESSDA RI to ensure that
European Social Science and Humanities (SSH) researchers have access to, and gain
support for, data resources they require to conduct research of the highest quality,
irrespective of the location of either researcher or data within the European Research
Area (ERA).
• The project will also improve the CESSDA RI so that member organisations are able to
transcend the limitations of their national resources through the creation of a common
platform, mission and stronger form of integration in which expertise is genuinely pooled,
shared and applied in a co-ordinated pan-European experience.
• This project will facilitate the delivery of a fully-integrated data archive infrastructure for
the SSH, allowing seamless, permanent access to as many data holdings across Europe
as possible.
Current Project Partners:
ADP, Slovenia
ADPSS, Italy
CIS, Spain
CNRS-RQ, France
DANS, Netherlands
DDA, Denmark
DISC, Sweden
EKKE, Greece
FORS, Switzerland
FSD, Finland
GESIS, Germany
NSD, Norway
RODA, Romania
SDA, Czech Republic
TARKI, Hungary
UK Data Archive, United Kingdom
WISDOM, Austria
Other CESSDA members:
CEPS/INSTEAD, Luxembourg
ESSDA, Estonia
ISSDA, Ireland
Belgium
Portugal
[Map showing countries which are CESSDA members]
Associated
Europe
• LSZDA (Latvian Databank of Social Sciences)
• Academy of Sciences, RigaDBSR (Bank of Social Data)
• Institute of Sociology, Academy of Sciences Russian Sociological Data Archive,
• Social Science Data Archive at REGLO Slovak Archive of Social Data
• Archive of the Institute for Sociology at the Slovak Academy of Sciences
• Archive of Sociological Data, Warsaw
• Rudjer Boskovic Institute, University of Zagreb
• University of Belgrade, Serbia
• Institute of Sociology of the National Academy of Sciences of Ukraine
• Kyiv National Taras Shevchenko University, Ukraine
• KIIS (Kiev International Institute of Sociology), Databank
• CPIJM, Centre for Political Studies and Public Opinion Research,
• University of Ss. Cyril and Methodius,
North America
• ICPSR, Inter-university Consortium for Political and Social Research, Michigan, USA
• SSHRC, Canada
Australia
• ASSDA, Australian Social Science Data Archive, Canberra
Partners:
Work Package Descriptions and their Beneficiaries
• WP1: Management and Co-ordination (UKDA)
• WP2: Dissemination Management (UKDA)
• WP3: Defining the Strategic, Financial,
Governance and Legal Framework (UKDA)
• WP4: Controlled Vocabularies (FSD)
• WP5: Developing the CESSDA RI one-stop-shop Portal (NSD)
• WP6: Strengthening the CESSDA RI
(RODA)
• WP7: Widening the CESSDA RI (GESIS)
• WP9: Deepening the CESSDA RI by building an infrastructure
for content harmonisation and conversion (GESIS)
• WP10: Data collection, dissemination and access issues
(CNRS-RQ)
• WP11: Investigating the potential of grid technologies (UKDA)
• WP12: Technical Support for the Preparatory Phase (NSD)
[Diagram of work package relationships: WP 1 – Management and Co-ordination; WP 2 – Dissemination; WP 12 – Technical; WP 3, WP 4, WP 5, WP 6, WP 7, WP 8, WP 9, WP 10, WP 11; WP13 & WP14 (External)]
Thank you