
Policies and procedures for data publication from the PREPARDE project
Sarah Callaghan, Fiona Murphy, Jonathan Tedds, Varsha Khodiyar, John Kunze, Rebecca Lawrence, Matthew S. Mayernik, Angus Whyte, Wil Wilcox, Timothy Roberts
#preparde
[email protected] @sorcha_ni
Why cite and publish data?
• Pressure from (UK) government to make data from publicly funded research available for free
• Scientists want attribution and credit for their work
• The public want to know what the scientists are doing
• Research funders want reassurance that they're getting value for money
• Relies on peer review of science publications (well established) and of data (not done yet!)
• Allows the wider research community to find and use datasets, and understand the quality of the data
• Extra incentive for scientists to submit their data to data centres in appropriate formats and with full metadata
Image: http://www.evidencebasedmanagement.com/blog/2011/11/04/newevidence-on-big-bonuses/
Data, Reproducibility and Science
Science should be reproducible: other people doing the same experiments in the same way should get the same results.
Observational data is not reproducible (unless you have a time machine!), therefore we need to have access to the data to confirm the science is valid!
Image: http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences
• Lead institution: University of Leicester
• Partners:
 – British Atmospheric Data Centre (BADC)
 – US National Center for Atmospheric Research (NCAR)
 – California Digital Library (CDL)
 – Digital Curation Centre (DCC)
 – University of Reading
 – Wiley-Blackwell
 – Faculty of 1000 Ltd
• Project lead: Dr Jonathan Tedds (University of Leicester, [email protected])
• Project manager: Dr Sarah Callaghan (BADC, [email protected])
• Length of project: 12 months
• Project start date: 1 July 2012
• Project end date: 30 June 2013
Geoscience Data Journal, Wiley-Blackwell and the Royal Meteorological Society
• Partnership formed between the Royal Meteorological Society and academic publisher Wiley-Blackwell to develop a mechanism for the formal publication of data in the open access Geoscience Data Journal.
• GDJ publishes short data articles cross-linked to, and citing, datasets that have been deposited in approved data centres and awarded DOIs (or other permanent identifiers).
• A data article describes a dataset, giving details of its collection, processing, software, file formats, etc., without the requirement of novel analyses or groundbreaking conclusions.
• It covers the when, how and why the data were collected, and what the data product is.
How we publish data
The traditional online journal model:
1) Author prepares the paper using word processing software (with the journal template).
2) Author submits the paper as a PDF/Word file to a journal (any online journal system).
3) Reviewer reviews the PDF file against the journal's acceptance criteria.
Overlay journal model for publishing data:
1) Author prepares the data paper using word processing software (with the journal template) and the dataset using appropriate tools.
2a) Author submits the data paper to the data journal (e.g. Geoscience Data Journal).
2b) Author submits the dataset to a repository (e.g. BADC, BODC).
3) Reviewer reviews the data paper and the dataset it points to against the journal's acceptance criteria.
PREPARDE topics
Example steps/workflow required for a researcher to publish a data paper, with three main areas of interest (shown in orange on the workflow diagram):
1. Workflows and cross-linking between journal and repository
2. Repository accreditation
3. Scientific peer review of data
• Division of areas of responsibility between repository-controlled processes and journal-controlled processes
Workflows
• Data centres:
 – CEDA (broken down by type of data submitter)
 – NCAR Earth Observing Laboratory (EOL): Computing, Data, and Software Facility
 – NCAR CISL Research Data Archive (RDA), http://rda.ucar.edu/
 – NERC DOI minting workflow
• Journals:
 – Geoscience Data Journal
 – International Journal of Digital Curation (control)
Data repository workflows
• Workflows are very varied! There is no one-size-fits-all method.
• A single data centre can have multiple workflows, depending on interactions with external sources (“Engaged submitter” / “Data dumper” / “Third party requester”).
Repository Workflow – NCAR Comp. & Info. Systems Lab Research Data Archive (RDA)
Data ingest / data preparation:
• Automated file collection.
• Check integrity of file receipts.
• Compare bytes and checksums (if available) with original data providers; if not OK, check with the data provider for changes to files.
Processing:
• Validate files: using software, read the full content of every file.
• Pull out metadata.
• Identify errors and metadata holes; if errors are found, contact the data provider.
• Do time-series checks.
• Check metadata against internal standards/expectations.
• If necessary, filter data or fix metadata.
Access development phase (with notification to the provider/user community):
• Embargo.
• Online data (most demanded), tape-based archive, and remote backup.
• Publish metadata via user GUIs.
• Metadata database: spatial info, temporal info, Global Change Master Directory (GCMD) keywords, parameters, format table relationships.
• Distribute metadata (e.g. via OAI-PMH) to GCMD, NCAR CDP, BADC, …
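The integrity checks in the ingest step are easy to automate. A minimal sketch in Python (the manifest format and file names are illustrative assumptions, not the RDA's actual tooling) that compares sizes and SHA-256 checksums against a provider-supplied manifest:

```python
import hashlib
from pathlib import Path

def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_receipts(data_dir: Path, manifest: dict) -> list:
    """Compare received files against a provider manifest mapping
    file name -> (size_in_bytes, sha256); return the names that fail."""
    failures = []
    for name, (expected_size, expected_sha) in manifest.items():
        path = data_dir / name
        if not path.exists():
            failures.append(f"{name}: missing")
        elif path.stat().st_size != expected_size:
            failures.append(f"{name}: size mismatch")
        elif sha256sum(path) != expected_sha:
            failures.append(f"{name}: checksum mismatch")
    return failures

# Any failure here triggers the "check with data provider" loop above, e.g.:
# problems = verify_receipts(Path("/ingest/incoming"),
#                            {"obs_201207.nc": (104857600, "ab12...")})
```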
Journal workflow
The aim is to minimise the effort needed to submit a data paper by taking advantage of already-submitted metadata.
Sharing metadata also ensures that additions/corrections made in one location get propagated through to the others.
Generic data publication workflow:
• Dashed lines indicate linking (via URL) or citation (via DOI).
• Solid lines indicate the results of, or inputs into, processes.
• Dotted lines indicate where the results of a process need to be fed back into another process.
• Journal responsibilities are orange, data centre's are purple.
Work on workflows is now being extended as part of the RDA's Workflows Working Group, part of the Publishing Data Interest Group.
Cross-linking
This is what we have to focus on for PREPARDE: demonstrate cross-linking between GDJ and a data repository (BADC/NCAR).
Unfortunately this direct cross-linking isn't scalable! There is a need for off-the-shelf solutions that can work across multiple research domains.
Cross-linking – the ideal situation
A registry could provide other functions as well as being an intermediary between journals and data repositories, like:
• Certify data centres are “trustworthy”
• Administer the linking mechanism
• Provide search and metrics functions
Disadvantages:
• Single point of failure
• Difficulty of standardisation across different research domains
Could OpenAIRE be this registry? Could DataCite? Could re3data.org? The registry would need to be discipline agnostic!
Do we have a start?
DataCite has standardised a set of bibliographic metadata that must be submitted before a DOI for a dataset can be minted by a repository. This standardised metadata flows from the repositories into the DataCite Metadata Store, and is then made openly available via the DataCite metadata search: http://search.datacite.org/ui
Given a DOI, a journal can then easily find the DOI's standard metadata. DataCite also has a content resolver: http://data.datacite.org/static/index.html
What's missing is the return link, where the journal can let the repository know that a dataset has been cited (directly or via DataCite).
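Looking up that standard metadata from a journal workflow is a one-request job. A minimal sketch in Python (assuming the requests library and DOI content negotiation, which DataCite supports in addition to the content resolver above):

```python
import requests

def datacite_metadata(doi: str) -> dict:
    """Resolve a dataset DOI to its DataCite metadata record (JSON),
    via content negotiation on the DOI resolver."""
    response = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.datacite.datacite+json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Example (any DataCite dataset DOI works the same way):
# record = datacite_metadata("10.5285/...")   # hypothetical dataset DOI
# print(record["titles"], record["publisher"], record["publicationYear"])
```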
What PREPARDE has done
Standardised metadata flows from the repositories (NCAR, BADC) into the DataCite Metadata Store.
• We already have a link from the GDJ data article to the data repository, thanks to the DOI.
• GDJ can also pull the standard DOI metadata attached to that DOI from the DataCite Metadata Store.
• GDJ needs to inform the repository that their dataset has been cited/published, bearing in mind scaling issues!
• At this time, we have a manual workaround (i.e. email).
Live data paper!
The dataset citation is the first thing in the paper (after the abstract) and is also included in the reference list (to take advantage of citation count systems).
DOI: 10.1002/gdj3.2
The dataset catalogue page (which is also the DOI landing page) carries a reference to the data article and a clickable link to it.
Other types of cross-linking
1. Data repository banner ads
2. Geographical maps
3. Pulling metadata from the data repository into journal workflows
4. “Data behind the graph”
For each type of cross-linking we investigated:
• Type of cross-linking
• Reason for cross-linking
• Current procedures
• How to implement this cross-link in Geoscience Data Journal (GDJ)
• How to roll out this cross-link to other journals
• Further work and issues
Data repository banner ads (1)
Example banner link in a ScienceDirect article (http://www.sciencedirect.com/science/article/pii/S0921818111001159)
Data repository banner ads (2)
• Allows readers of the article to get to the dataset in the repository, or to the top level of the repository (where they can browse/search for the data)
• The article is text mined for strings such as flags, accession numbers or names of data repositories
• A taxonomy and controlled vocabulary will help automate this
• Webpage real estate tends to be limited!
• A flyover image and link might be more appropriate than a fixed ad
• Requires a relationship between journal publishers and repositories to ensure that the ad/logo is up to date and the link is correct
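That text-mining step can start very simply. A minimal sketch in Python (the DOI pattern is simplified and the repository list is an illustrative assumption, not a controlled vocabulary from the project) for spotting dataset identifiers and repository names in article text:

```python
import re

# Simplified DOI pattern (e.g. 10.1002/gdj3.2); a taxonomy/controlled
# vocabulary would replace this illustrative repository list in practice.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;/:A-Za-z0-9]+")
REPOSITORY_NAMES = ["BADC", "BODC", "PANGAEA", "NCAR RDA", "figshare"]

def find_data_links(article_text: str) -> dict:
    """Return candidate dataset DOIs and repository mentions found in
    an article, for deciding which banner/link to display."""
    dois = [d.rstrip(".,;") for d in DOI_PATTERN.findall(article_text)]
    mentions = [name for name in REPOSITORY_NAMES
                if name.lower() in article_text.lower()]
    return {"dois": dois, "repositories": mentions}

# Example:
# find_data_links("Data (doi:10.1002/gdj3.2) are held at the BADC.")
# -> {'dois': ['10.1002/gdj3.2'], 'repositories': ['BADC']}
```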
Geographical maps (1)
Example mapping of geolocation metadata on the Pangaea data repository landing page (http://doi.pangaea.de/10.1594/PANGAEA.735719)
Geographical maps (2)
Example Elsevier article on ScienceDirect displaying geolocation metadata on a map for the dataset referred to in the article.
Geographical maps (3)
• Takes advantage of geolocation data present in the dataset's metadata.
• Allows the plotting of multiple dataset locations on the same map.
• Option to ingest geolocation metadata from the repository or from the DataCite metadata.
• Best not to duplicate metadata unnecessarily in different locations, i.e. keep metadata about the dataset with the dataset in the repository.
• Standardisation is key: ingesting metadata from multiple repositories using different methods for each is not scalable.
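Pulling coordinates out of the DataCite record is enough to drive such a map. A minimal sketch, assuming DataCite JSON with a schema v4-style geoLocations field, as returned by the datacite_metadata lookup sketched earlier:

```python
def extract_points(record: dict) -> list:
    """Collect (latitude, longitude) pairs from a DataCite metadata
    record's geoLocations, ready for plotting on a map."""
    points = []
    for loc in record.get("geoLocations") or []:
        point = loc.get("geoLocationPoint")
        if point:
            points.append((float(point["pointLatitude"]),
                           float(point["pointLongitude"])))
    return points

# Plot several datasets on one map:
# all_points = [p for doi in dois
#               for p in extract_points(datacite_metadata(doi))]
```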
Pulling metadata from the data repository into journal workflows (1)
Example figshare widget embedded in an F1000Research paper (http://f1000research.com/articles/1-3/v1). The widget provides access to the data in figshare, enables the metadata to be previewed within the article, and provides repository metadata about the dataset (namely number of views, shares, downloads, etc.).
Pulling metadata from the data repository into journal workflows (2)
• Pre-publication metadata shared between repositories and journals at the article submission stage:
 – reduces duplication of effort by the author in entering dataset metadata twice
 – ensures consistency of information between journal and repository
Possibilities:
• The author inputs minimal dataset information, such as the DOI, and the journal uses the DOI to locate the metadata and add the necessary information into the journal article (see the sketch below).
• For repositories requiring significant amounts of metadata, it may be possible to create a tool that automatically generates the first draft of a very structured data article.
• Implementation of embedded widgets requires many-to-many relationships to be built up to map the dataset metadata appropriately.
• How much dataset metadata should the reviewer see on the journal site?
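A minimal sketch of that first possibility, reusing the datacite_metadata lookup from earlier (the submission-form field names here are hypothetical):

```python
def prefill_submission(doi: str) -> dict:
    """Draft the journal's dataset fields from the DOI alone, so the
    author does not have to re-enter metadata already held by the
    repository and DataCite."""
    record = datacite_metadata(doi)
    return {
        "dataset_doi": doi,
        "dataset_title": record["titles"][0]["title"],
        "dataset_creators": [c["name"] for c in record["creators"]],
        "dataset_publisher": record["publisher"],
        "dataset_year": record["publicationYear"],
    }
```

Keeping the repository's record as the single source of truth also means corrections propagate, rather than diverging between the journal and repository copies.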
“Data behind the graph” (1)
Active Chart created and displayed (http://activecharts.org/share/a7dd3bae149b2aba5b8f0d895e00d364), featuring user-selection tick boxes to display/hide data and replot it.
“Data behind the graph” (2)
Left: F1000Research data plotting tool showing the raw data and the author's graph of that data (Nicholls et al. 2013, doi:10.12688/f1000research.2-150.v1).
Right: Example replot of the data using the F1000Research data plotting tool, enabling readers to replot raw spreadsheet data, changing the x and y axes as appropriate.
“Data behind the graph” (3)
• Accessibility to the data behind the graph enables researchers to make direct comparisons with previously published, or as yet unpublished, results.
• Clicking on the plot would redirect the reader to the subset of the data used to create that plot.
• Issues with citation granularity: is it appropriate to assign a DOI to just the subset of a larger dataset that underlies a particular graph?
• Imagine a mixed ecosystem in the future, where repository-managed data, cross-linked with research articles, exists alongside small, specific, image-related datasets that are hosted alongside, and more closely bound to, the articles themselves.
• Relies on authors being willing and able to submit the exact data subsets they used to create each figure.
• Will involve additional work, both in producing the subsets and archiving them.
Recommendations and conclusions (1)
Standardisation of metadata:
• Automatic processes for the linking and sharing of metadata need to be developed.
• These require common standards.
PREPARDE recommends the DataCite metadata schema as a common metadata kernel for sharing and exchanging dataset metadata.
It is also recommended that an agreed geolocation standard be implemented, given the wide range of multidisciplinary datasets that can be combined in this way.
https://xkcd.com/927/
Recommendations and conclusions (2)
Use of DOIs and data citation:
• Use DOIs for linking data to publications.
• In the context of a formal data citation, PREPARDE recommends the DataCite citation structure given in the DataCite metadata schema v3.0 (http://schema.datacite.org), though where appropriate to the scientific domain, other permanent identifiers may be used.
• Citations of data should be included in the reference list of the article.
• Journals' author guidelines should be updated to request that authors cite the datasets used in their article.
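The recommended citation structure can be generated directly from the DataCite record. A minimal sketch, reusing the datacite_metadata lookup from earlier (the exact punctuation follows the common "Creator (PublicationYear): Title. Publisher. Identifier" pattern and is an assumption, not a normative rendering):

```python
def datacite_citation(record: dict) -> str:
    """Format a DataCite-style data citation:
    Creator (PublicationYear): Title. Publisher. Identifier."""
    creators = "; ".join(c["name"] for c in record["creators"])
    title = record["titles"][0]["title"]
    return (f"{creators} ({record['publicationYear']}): {title}. "
            f"{record['publisher']}. doi:{record['doi']}")

# print(datacite_citation(datacite_metadata("10.5285/...")))  # hypothetical DOI
```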
Recommendations and conclusions (3)
Role of a centralised, third-party registry:
• Simplify the process of passing information between data repositories and journals.
As yet this registry does not exist, though some existing initiatives (DataCite, OpenAIRE) provide some aspects of the service that would be required of it. Although not data-related, CrossRef also provides some aspects of this registry service.
PREPARDE recommends that this be investigated through the Publishing Data Interest Group of the Research Data Alliance.
Problems still to solve
• Automatic methods for:
 – the (data) journal informing the repository that a dataset has been cited
 – the repository linking back to the paper citing the dataset
• Sharing of dataset metadata between repository and journal:
 – so the paper author doesn't have to repeatedly enter metadata in multiple locations
 – so corrections made in one place can be propagated across
• Centralised registry for cross-linking:
 – to deal with scalability issues in direct linking between journals and repositories
• Methods for issuing corrections to data after the data paper has been published
The general problem: http://xkcd.com/974/
Repository accreditation
The link between the data paper and the dataset is crucial!
• How do data journal editors know a repository is trustworthy?
• How can repositories prove they're trustworthy?
What makes a repository trustworthy?
• Many things: mission, processes, expertise, workflows, history, systems, documentation, …
• Assessing trustworthiness requires assessing the entire repository workflow.
• PREPARDE / IDCC13 workshop: report at http://proj.badc.rl.ac.uk/preparde/attachment/wiki/DeliverablesList/PREPARDE_IDCC_WshopReport.pdf
• Peer review of data is implicitly peer review of the repository (data centre).
And what does “trustworthy” mean, when you get right down to it?
Repository accreditation schemes
These schemes look at all of the business of running a repository, but don't directly address the issues required for data publication.
Data for publication needs to:
• Be persistent
• Be permanently identified
• Be provided with a landing page
• Have standard publication metadata
• Have accessibility/licensing information
Document at: http://bit.ly/ZhYHZl
Feedback to: https://www.jiscmail.ac.uk/DATAPUBLICATION
Repository accreditation
For data publication, a repository must be actively managed in order to:
1. Enable access to the dataset
2. Ensure dataset persistence
3. Ensure dataset stability
4. Enable searching and retrieval of datasets
5. Collect information about repository statistics
The guidelines are split into general principles and subject-specific appendices; only the Earth and life sciences are covered in the appendices at this time.
1. Enable access to the dataset
a. Ensure that data will be accessible (either as open data, or provide information on conditions of access and a clear point of contact).
b. Have a policy in place allowing appropriate access for peer reviewers, as required as part of support for the data peer-review process.
 i. In the context of data, peer reviewers are individuals with appropriate scientific and/or technical expertise who produce or use data.
Image: http://lolcatzencyclopedia.files.wordpress.com/2011/02/lolcat___computer_eating_by_tenkyougan1.jpg
2. Ensure dataset persistence (1)
a. Have a clear and public assertion of responsibility to preserve the data and provide access to the data over the long term.
b. Have an appropriate, formal succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope.
c. Repositories must develop and implement suitable quality control and security measures to ensure the metadata is correct and the data themselves are maintained and curated to avoid degradation.
 i. User feedback can and should be used to strengthen and correct the metadata as needed.
Image: http://sardonicsalad.com/?p=667
2. Ensure dataset persistence (2)
d. Assign globally unique persistent IDs to the published datasets and maintain a repository-managed URI associated with each of those IDs. These URIs should also be associated with versions of the datasets.
e. Permanent IDs for the dataset must resolve to a publicly accessible landing page, which must:
 i. be open and human readable (and it would be preferred that it also be provided in a machine-readable format)
 ii. describe the data object and include appropriate metadata and the permanent identifier (used to identify the page in the first place)
 iii. be maintained, even if the data has been retracted.
Preserving data: how not to do it
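Requirement (e) lends itself to an automated spot check. A minimal sketch (assuming the requests library; the two checks shown are illustrative, not the guidelines' formal test suite) that follows a permanent ID to its landing page:

```python
import requests

def check_landing_page(doi: str) -> bool:
    """Follow a dataset DOI to its landing page and apply two sample
    checks: the page resolves publicly, and the page text mentions the
    permanent identifier used to reach it."""
    response = requests.get(f"https://doi.org/{doi}",
                            allow_redirects=True, timeout=30)
    return response.status_code == 200 and doi in response.text
```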
3. Ensure dataset stability
a. Stability means that the exact same version of the dataset that was cited can be returned to when the citation is resolved.
b. If dataset versioning is supported, new versions should be permanently identified and linked from the original, published dataset landing page, without overwriting the original version linked from the article. The database should provide time-stamped versions of archival data.
Image: http://www.ptgear.co.uk/wp-content/uploads/2011/03/elephant-on-stability-ball.jpg
4. Enable searching and retrieval of datasets
a. Allow users to easily determine whether a dataset has been peer reviewed or been subject to a robust quality assurance process.
b. Provide appropriate metadata about the dataset in human-readable form on the landing page (see point 2.e), and when possible in standardized machine-readable formats, e.g. the DataCite metadata schema (http://schema.datacite.org).
c. Provide appropriate information about licensing and permissions, and manage access to restricted or embargoed material as appropriate.
d. Provide access to allow metadata for the datasets to be searched and retrieved through interfaces designed for both humans and computers.
Image: http://beingthecomedian.blogspot.co.uk/2011/01/weeeeee-im-horsie.html
5. Collect information about repository statistics
a. Publish statistics on the level of access to any deposited item that is publicly accessible, to contribute to metrics of the item's publication impact.
b. Publish information to enable journals and depositors to assess its take-up in the community it aims to serve, e.g. about any operational agreement with a well-established journal, learned society or equivalent body.
Image: http://peacock-maths.org/page28.php
What we learned about repository accreditation
• It is a very contentious subject!
 – Repository accreditation schemes exist, but don't have significant numbers of members.
 – The reason for the lack of uptake of repository accreditation schemes is not clear:
  • Do repositories feel that there is no clear benefit?
  • Is the accreditation process unclear, too arduous and/or confusing?
• Repositories seem to be content to rely on their own reputations to demonstrate their suitability as archives for data publication.
 – We think this will change in the near future, as data publication and data stability become more important.
 – Further work is needed to identify blockers to the uptake of repository accreditation schemes.
Data Peer Review for Publishing Data
Dr Jonathan Tedds, [email protected], @jtedds
Senior Research Fellow; Director, Health And Research Data Informatics, Department of Health Sciences, University of Leicester
PI, #PREPARDE: http://www.le.ac.uk/projects/preparde
Editor-in-Chief, Open Health Data Journal (Ubiquity)
Co-Chair, Research Data Alliance – WDS Publishing Data IG
Why open, why peer review?
From the Royal Society's Science as an Open Enterprise report (June 2012, http://royalsociety.org/policy/projects/science-public-enterprise/report/):
• “As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database.”
• “We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable.” [p.7]
Issues linking data to the scientific record:
• Data persistence
• Data and metadata quality
• Attribution and credit for data producers
Geoffrey Boulton (Edinburgh), lead author:
• “Science has been sleepwalking into a crisis of replicability... and of the credibility of science”
• “Publishing articles without making the data available is scientific malpractice”
Peer review of data: the perfect disaster?
• Support for the peer review process:
 – scholars contribute peer reviews with little formal reward
 – an opportunity to polish and refine understanding of the cutting edge of research
• But the peer review system is under stress:
 – exploding numbers of journals, conferences, and grant applications
 – self-publication tools (blogs and wikis) allow scholars to disseminate their research results and products faster and more directly
• Now we are adding research data into the publication and peer review queues… see Mayernik et al., accepted, BAMS!
Peer-review of data
• Technical:
 – author guidelines for GDJ
 – Funder Data Value Checklist
 – implicit peer review of the repository?
• Scientific:
 – pre-publication?
 – post-publication? e.g. F1000R
 – guidelines on uncertainty, e.g. IPCC
 – discipline specific?
 – EU INSPIRE spatial formatting
• Societal:
 – contribution to human knowledge
 – reliability
Image: http://libguides.luc.edu/content.php?pid=5464&sid=164619
Open peer review of data?
ESSD peer review ensures that the datasets are:
• Plausible, with no immediately detectable problems;
• Of sufficiently high quality, with their limitations clearly stated;
• Well annotated by standard metadata and available from a certified data center/repository;
• Customary with regard to their format(s) and/or access protocol, and expected to be useable for the foreseeable future;
• Openly accessible (toll free).
Earth System Science Data journal: http://www.earth-system-science-data.net/
Rebecca Lawrence, Data Publishing: peer review, shared standards and collaboration, http://www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf8-engaging-publishers
Faculty of 1000 open peer review
Sanity check:
• Adherence to format and a suitable basic structure
• A standard basic protocol structure is adhered to
• Data stored in the most appropriate and stable location
Open peer review:
• Is the method used appropriate for the scientific question being asked?
• Has enough information been provided to be able to replicate the experiment?
• Have appropriate controls been conducted, and the data presented?
• Is the data in a useable format/structure?
• Are stated data limitations and possible sources of error appropriately described?
• Does the data ‘look’ OK? (optional; e.g. microarray data)
Draft recommendations on peer review of data
• Summary recommendations from the workshop at the British Library, 11 March 2013
• Workshop attendees included funders, publishers, repository managers, researchers…
• Draft recommendations were put up for discussion and feedback was captured
• Feedback from the community is still welcome
• 2nd workshop, 24 June: put the recommendations to peer reviewers!
Document at: http://bit.ly/DataPRforComment
Feedback to: https://www.jiscmail.ac.uk/DATAPUBLICATION
Draft recommendations on data peer review
Summary recommendations from the workshop at the British Library, 11 March 2013:
• Connecting data review with data management planning
• Connecting scientific review, technical review and curation
• Connecting data review with article review
• 4-5 draft recommendations in each of the above
• Assist researchers, publishers, journal editors, reviewers, data centres and institutional repositories to map requirements for data peer review
• Matrix of stakeholders vs processes:
 – assists in assigning responsibilities for a given context
 – new for most disciplines
 – learn from disciplines where this already happens
Connecting data review with data management planning
1. All research funders should at least require a “data sharing plan” as part of all funding proposals, and if a submitted data sharing plan is inadequate, appropriate amendments should be proposed.
2. Research organisations should manage research data according to recognised standards, providing relevant assurance to funders so that additional technical requirements do not need to be assessed as part of the funding application peer review. (Additional note: research organisations need to provide adequate technical capacity to support the management of the data that the researchers generate.)
3. Research organisations and funders should ensure that adequate funding is available within an award to encourage good data management practice.
4. Data sharing plans should indicate how the data can and will be shared, and publishers should refuse to publish papers which do not clearly indicate how underlying data can be accessed, where appropriate.
Connecting scientific, technical review and curation
1. Articles and their underlying data or metadata (by the same or other authors) should be multi-directionally linked, with appropriate management for data versioning.
2. Journal editors should check data repository ingest policies to avoid duplication of effort, but provide further technical review of important aspects of the data where needed. (Additional note: a map of ingest/curation policies of the different repositories should be generated.)
3. If there is a practical/technical issue with data access (e.g. files don't open or exist), then the journal should inform the repository of the issue. If there is a scientific issue with the data, then the journal should inform the author in the first instance; if the author does not respond adequately to serious issues, then the journal should inform the institution, who should take the appropriate action. Repositories should have a clear policy in place to deal with any feedback.
Connecting data review with article review
1. For all articles where the underlying data is being submitted, authors need to provide adequate methods and software/infrastructure information as part of their article. Publishers of these articles should have a clear data peer review process for authors and referees.
2. Publishers should provide simple and, where appropriate, discipline-specific data review (technical and scientific) checklists as basic guidance for reviewers.
3. Authors should clearly state the location of the underlying data. Publishers should provide a list of known trusted repositories or, if necessary, provide advice to authors and reviewers on alternative suitable repositories for the storage of their data.
4. For data peer review, the authors (and journal) should ensure that the data underpinning the publication, and any tools required to view it, are fully accessible to the referee. The referees and the journal then need to ensure appropriate access is in place following publication.
5. Repositories need to provide clear terms and conditions for access, and ensure that datasets have permanent and unique identifiers.
TODO
• What's missing?
 – Need context, including the long tail and international perspectives
 – Currently assume a lot:
  • the publishing paradigm
• Processes/workflows
 – Suggest criteria in at least one discipline as an example?
  • International Journal of Epidemiology & statistical review
 – Open community review?
• Who are they for?
 – The long tail
 – Journal submission systems: model more generically
• What next?
 – How much would it cost in resources to implement these recommendations?
 – A future RDA WG?
  • Practical training in data review?
  • RDA Workflows WG: can we map the recommendations to the workflows?
  • Is your org ready to buy into this?
Please! Tell us what you think
Always happy to get input from others!
#preparde
[email protected], [email protected]
@sorcha_ni, @jtedds
http://citingbytes.blogspot.co.uk/
Guidelines on peer review for data: http://bit.ly/DataPRforComment
Guidelines for repository accreditation for data publication: http://bit.ly/ZhYHZl
Feedback to: [email protected]
Project website: http://www.le.ac.uk/projects/preparde
Project blog: http://proj.badc.rl.ac.uk/preparde/blog
Image credit: http://bit.ly/9H4qBX