Introduction to the Dryad Digital Repository A nonprofit repository for data underlying the international scientific and medical literature. April 2013 DataDryad.org.


• The End
– To make data archiving and reuse standard within scientific communication.
• The Means
– Enable low-burden data archiving at the time of manuscript submission.
– Promote researcher benefits from data archiving.
– Promote responsible data reuse.
– Empower journals, societies & publishers in shared governance.
– Ensure sustainability and long-term preservation.
• The Scope
– Research data in science and medicine
– Primarily data underlying findings in peer-reviewed articles
– Also data from some non-peer-reviewed publications (e.g. dissertations)
– And some non-data content (e.g. software scripts, figures)
The value proposition
• For authors and researchers, Dryad…
– increases the impact of, and citations to, published research
– preserves and makes available others’ data
– frees researchers from the burden of data preservation and access
• For journals, publishers, and societies, Dryad…
– frees journals from the burden of maintaining supplemental data
– supports all varieties of data archiving policies
• For libraries and institutions, Dryad…
– makes data available at no cost, under clear terms of use
– helps fulfill their research data management mandates
• For funders, Dryad…
– provides a cost-effective mechanism to make research more accessible
Data archiving has many benefits
• Direct
– Verification of published research
– Preserving accessibility to data
– Allowing reuse and repurposing of data
– Discoverability of data
• Indirect (costs avoided)
– Redundant data collection
– Inefficient legacy data curation
– Burden of sharing-upon-request
– Opportunity cost of science not done
• Near term
– Protection against personnel turnover
– Availability for review and validation
• Long term
– Secure long-term stewardship
– Increased impact per publication
• Private
– Increased citations
– New collaborations
– New research opportunities
– Fulfilling funding mandates
• Public
– More efficient use of research dollars
– Public trust in science
– Educational opportunities
– Improved methodologies
– More informed policy
Modified from Beagrie et al. (2009) Keeping Research Data Safe 2
Dryad focuses on the long tail of orphan data
[Figure: long-tail curve of data volume (y-axis) against rank frequency of datatype (x-axis), with specialized repositories (e.g. GenBank, GBIF) at the head and orphan data in the tail.]
Many datasets belong to the long tail. Though less standardized, they can be rich in information content and have unique value.
After Heidorn (2008)
http://hdl.handle.net/2142/9127
Why use Dryad rather than Supplementary Online Materials?
Feature | Dryad | SOM
Discoverable: indexed and exposed to both web and bibliographic search engines | ✔ | ✗
Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers | ✔ | ✗*
Permanent: processes in place to promote preservation (incl. format migration) | ✔ | ✔/✗**
Curated: quality control by both automated processes and human inspection | ✔ | ✗*
Ease of deposit: streamlined deposit, allowance for large and complex datasets | ✔ | ✔/✗**
Formatted for reuse: support for non-PDF file formats | ✔ | ✔/✗**
Updatable: new versions of data files can be added, metadata can be enhanced | ✔ | ✗
Support for embargoes: can delay release of data in accordance with journal policy | ✔ | ✗
Free reuse: no paywall, clear terms of reuse (all data released under CC Zero) | ✔ | ✔/✗**
Economy of scale: cost efficiency from shared infrastructure | ✔ | ✔/✗**
Alignment to organizational mission: focus on archiving and reuse of scientific data | ✔ | ✗
* A few publisher SOM sites are exceptions to the general rule.
** Practices differ among publishers; see Smit (2011), doi:10.1045/january2011-smit.
Researchers and journals are using Dryad for archiving
…and using the data for research
Journals benefit when data is reused
Journal | Integration date | Data packages | Data downloads | Avg. downloads per package
All Journals | | 2,765 | 127,396 | 46
Molecular Ecology | 2009-11-29 | 615 | 23,604 | 38
Evolution | 2010-05-04 | 380 | 12,524 | 33
American Naturalist | 2009-08-29 | 208 | 11,195 | 54
Journal of Evolutionary Biology | 2010-07-12 | 205 | 5,729 | 28
A “Data Package” is all of the data files for a journal article. All Dryad data packages link to the associated journal article.
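The last column of the table is total downloads divided by the number of data packages, rounded to the nearest whole number. As a quick consistency check, the minimal Python sketch below recomputes that column from the figures in the table; the dictionary layout and the function name are illustrative only, not a Dryad data format or API.

```python
# (data packages, data downloads), copied from the table above.
TABLE = {
    "All Journals": (2765, 127396),
    "Molecular Ecology": (615, 23604),
    "Evolution": (380, 12524),
    "American Naturalist": (208, 11195),
    "Journal of Evolutionary Biology": (205, 5729),
}


def avg_downloads_per_package(packages: int, downloads: int) -> int:
    """Average downloads per data package, rounded to the nearest integer."""
    return round(downloads / packages)


for journal, (packages, downloads) in TABLE.items():
    print(f"{journal}: {avg_downloads_per_package(packages, downloads)}")
```

Each printed value matches the corresponding entry in the "Avg. downloads per package" column.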
Dryad integrates article and data submission
Dryad works with the manuscript workflow of journals to:
– Simplify the process of data submission for authors,
– Allow authors to deposit, to a single repository, gigabytes of data files in their original formats,
– Ensure permanent bidirectional links between the article and the data, and increased visibility for both,
– Ensure that the data is accessible once the article becomes available,
– Offer the option of making data available for editorial or peer review, via secure access for editors and reviewers,
– Give authors the option to embargo public access to data for a limited time after publication, if permitted by the journal's data policy.
Options are customized to meet the requirements of each journal.
Over 30 integrated partner journals
…and more being added regularly
The American Naturalist
Biology Letters
BMJ Open
Biological Journal of the Linnean Society
Ecological Monographs
eLife
Evolutionary Applications
Evolution
Functional Ecology
GMS German Medical Science
Heredity
Journal of Animal Ecology
Journal of Evolutionary Biology
Journal of Fish and Wildlife Management
Journal of Heredity
Journal of Open Public Health Data
Journal of Paleontology
Methods in Ecology and Evolution
Molecular Ecology and Molecular Ecology Resources
Paleobiology
PLOS Biology, PLOS Genetics
Systematic Biology
ZooKeys & 7 other Pensoft journals
Trustworthy repository infrastructure
• Making data available is the primary mission of the organization
– No paywalls or restrictive licenses (all data released under CC Zero)
– The same data may be hosted by other services (non-exclusivity)
• Built on DSpace, an open-source repository platform used by hundreds of institutional repositories
• Multiple machine and human interfaces for discovery and access
– Dublin Core metadata, harvestable through OAI-PMH
– DOIs registered through DataCite
– Curators add metadata to enhance keyword searching
• Assurance of data integrity and permanent availability
– Service mirroring and backup
– File migration and bit-level integrity assurance
– Organizational failover through DataONE and CLOCKSS
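Because the metadata is exposed as Dublin Core over OAI-PMH, any harvester can request it with the protocol's standard `ListRecords` verb and the `oai_dc` metadata prefix, which every OAI-PMH repository must support. The Python sketch below, using only the standard library, builds such a request URL and extracts titles and identifiers from a response. It is a minimal illustration under stated assumptions: `BASE_URL` is a hypothetical placeholder, not Dryad's documented endpoint; the sample response is invented; and a real harvester would also fetch the URL over HTTP and follow `resumptionToken`s for paging.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical placeholder: substitute the repository's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_dc_records(xml_text: str) -> list[dict]:
    """Extract Dublin Core titles and identifiers from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for dc in root.iterfind(".//oai:record/oai:metadata/oai_dc:dc", NS):
        records.append({
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "identifiers": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    return records


# Invented sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Data from: an example study</dc:title>
          <dc:identifier>doi:10.xxxx/example</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL))
print(parse_dc_records(SAMPLE))
```

Parsing with namespace prefixes keeps the record header's OAI identifier separate from the Dublin Core `dc:identifier`, where a DOI would normally appear.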
Dryad as an organization
• Governed by an interim Board, 2009-2011.
• Now a nonprofit organization incorporated in North Carolina, USA.
• Membership open to all stakeholder organizations, including scientific societies, publishers, funding agencies, universities & institutes.
• Governed by an elected 12-member Board of Directors
– Nominated and elected by the Membership
• First annual membership meeting: 24 May 2013, Oxford.
Dryad’s business plan
• Deposit fees are the primary source of revenue, for several reasons:
– The time of deposit is when the majority of costs are incurred
– Revenue scales with costs (i.e. volume of deposits)
– The costs are distributed both fairly and widely
– This enables Dryad to make access to the data free in perpetuity
• Membership fees will cover costs of annual membership meetings
• Project grants will supplement the operational budget for R&D activities
Payment plans
Plan | Contract? | Paid by | Cost¹
1. Voucher | No | Any organization, in advance | $65 per data package (members); $70 per data package (nonmembers)
2. Deferred payment | 1 yr. | Any organization, in advance | $70 per data package (members); $75 per data package (nonmembers)
3. Subscription | 2 yrs. | Journal or journals; fee based on the total number of research articles published by the journal(s) in the prior year | Unlimited number of submissions for a fixed fee; base fee of $25 per research article for members, $30 for nonmembers
Individual deposit | No | Author, at time of deposit | $80 per data package, with waivers for submissions from low-income economies
¹ Up to a fixed deposit size (currently 10 GB). Additional charges for larger deposits.
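To compare plans for a given journal, the per-package prices and the per-article subscription fee from the table can be multiplied out. The Python sketch below does this under simplifying assumptions that are not stated on the slide: every package stays within the size limit in footnote 1 (no overage charges), no waivers apply, and the subscription fee is treated as a yearly charge. The names `plan_costs`, `PER_PACKAGE` and `SUBSCRIPTION_PER_ARTICLE` are hypothetical.

```python
# Prices from the payment-plan table, in US dollars: (members, nonmembers).
PER_PACKAGE = {
    "voucher": (65, 70),
    "deferred": (70, 75),
    "individual": (80, 80),  # individual deposit has a single price
}
# Assumption: the subscription base fee is charged per year.
SUBSCRIPTION_PER_ARTICLE = (25, 30)


def plan_costs(packages: int, research_articles: int, member: bool) -> dict[str, int]:
    """Estimated cost of one year's deposits under each plan.

    packages: data packages deposited in the year (per-package plans).
    research_articles: research articles published in the prior year,
        the basis of the subscription fee (unlimited submissions).
    Assumes no overage charges and no waivers.
    """
    i = 0 if member else 1
    costs = {plan: packages * prices[i] for plan, prices in PER_PACKAGE.items()}
    costs["subscription"] = research_articles * SUBSCRIPTION_PER_ARTICLE[i]
    return costs


# Hypothetical member journal: 100 data packages, 200 research articles.
print(plan_costs(packages=100, research_articles=200, member=True))
# {'voucher': 6500, 'deferred': 7000, 'individual': 8000, 'subscription': 5000}
```

Under these assumptions, a member journal's subscription costs less than vouchers once more than about 38% (25/65) of its research articles come with a data package.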
To learn more
• Repository home: http://datadryad.org
• News: http://blog.datadryad.org
• Project documentation: http://wiki.datadryad.org
• Twitter: @datadryad
• Code: http://code.google.com/p/dryad
or contact us:
• http://datadryad.org/feedback
• Todd Vision, Director, [email protected]
• Laura Wendell, Dryad Executive Director, [email protected]