Transcript lib.hku.hk

PUBLISH YOUR DATA Nature Publishing Group's new initiatives to promote and credit open data sharing

HKU, September 18th, 2014

Andrew L. Hufton

Managing Editor, Scientific Data Nature Publishing Group [email protected]

Nature Publishing Group shares a similar mission:

Since 1869, Nature's mission has been:

To communicate the world's best and most important science to scientists across the world and to the wider community interested in science.

Traditional Nature journals

Nature

• Publishes the most significant advances with the widest implications.

• Significance should be readily apparent to anyone from any field.

Nature Research Journals

• Publishes the most significant advances across the discipline each covers.

• Significance should be apparent to anyone in that discipline.

NPG's other in-house journals

High impact.

Targeted at specialists.

High quality.

Rapid publication.

Helping you publish, discover and reuse research data

An introduction to the editorial process

Open access at Nature Publishing Group

Publish your data

Helping scientists publish important datasets, and ensuring they get credit for sharing.

Other initiatives to promote and credit open, reproducible research

The Editorial Process

From Submission to Publication

At the Nature-titles

Submitted Manuscript → Editor → Reject without review (50-80%) or send to Referees

Referees → Evaluations → Decision: Reject, Accept, or Revise and reconsider

From Submission to Publication

At the Scientific-titles

Submitted Manuscript → Managing Editor → Editorial Board Member → Reject without review (Out of Scope only) or send to Referees

Referees → Evaluations → Decision: Reject, Accept, or Revise and reconsider

Open access & Nature Publishing Group

What is Open Access?

• Part of a global trend to encourage wider and easier access to research ideas and information.

• Wider dissemination of scientific knowledge speeds the pace of scientific progress.

• The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (22nd October 2003) defined Open Access as the ability of others to “copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship.”

Growth of gold Open Access by region

Laakso and Björk, BMC Medicine 2012, 10:124. doi:10.1186/1741-7015-10-124 (Reproduced under CC-BY http://creativecommons.org/licenses/by/2.0)

Why publish Open Access?

Because it’s better for science.

• Scientific knowledge belongs to everyone.

• Science progresses more rapidly when new ideas, new results and new understanding are shared most freely.

• Public understanding of science is improved by public access to primary research.

Why publish Open Access?

Because HKU wants you to!

Prof Tsui, Vice Chancellor of HKU, signed the Berlin Declaration in November 2009

From the HKU OA policy: “By sharing our intellectual output, we and our community can realize greater benefits economically, socially, and intellectually.”

Why publish Open Access?

Because it’s better for YOU!

“We found strong evidence that, even in a journal that is widely available in research libraries, OA articles are more immediately recognized and cited by peers than non-OA articles published in the same journal.”

Eysenbach, G. Citation Advantage of Open Access Articles. PLoS Biology 4, e157 (2006). http://dx.doi.org/10.1371/journal.pbio.0040157

Green versus Gold: Green Open Access

• Free but usually with restrictions.

• arXiv.org, PubMed Central, …

• Allowed by all Nature journals under the following terms:

- Submitted version can be posted at any time.

- Final refereed version can be posted 6 months after publication.

- Published version (that is, the published PDF) should never be posted.
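The three self-archiving terms listed above can be written down as a small rule check. The Python sketch below is illustrative only: the function names, the version labels ("submitted", "refereed", "published_pdf") and the way six calendar months are counted (clamping to the last day of a shorter month) are assumptions made for this example, not part of any NPG tool or policy text.

```python
import calendar
from datetime import date

# From the slide: the final refereed version can be posted 6 months after publication.
EMBARGO_MONTHS = 6


def add_months(d: date, months: int) -> date:
    """Shift d forward by whole calendar months, clamping the day to the month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def may_self_archive(version: str, published: date, today: date) -> bool:
    """Apply the three green Open Access terms; the version labels are hypothetical."""
    if version == "submitted":
        return True  # can be posted at any time
    if version == "refereed":
        return today >= add_months(published, EMBARGO_MONTHS)
    if version == "published_pdf":
        return False  # the published PDF should never be posted
    raise ValueError(f"unknown version: {version}")
```

For a paper published on 18 September 2014, this sketch would allow the refereed version to be posted from 18 March 2015 onward.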

Green versus Gold: Gold Open Access

• Fewer restrictions:

- Full rights to do whatever YOU want to do with the final paper.

- Limited rights for others (depending on the license) to do what THEY want with the paper.

• Author pays Article Processing Charge.

• Free for all to download in perpetuity.

Nature Communications

• Publishes significant advances that have the potential to influence thinking of specialists in a field.

• Broad appeal isn't a prerequisite for publication … but great science is!

• 2013 Impact Factor = 10.742.

• Choice of subscription access or Open Access!

• Specialist scope means the chances of being published are more than twice that of other Nature journals.

Scientific Reports

• Impact Factor: 5.078.

• Speed: Scientific Reports is committed to providing rapid publication service.

• Acceptance rate: Over 60%.

• Scope: Publishes technically sound, original research papers in all areas of the natural and clinical sciences.

• International Editorial Board: 1600 experts across all disciplines.

• Visibility: Over 800,000 article page views per month.

http://www.nature.com/srep

Open Access means anybody can download, read, and cite your paper

Publish your data

In 1953 a scientific work could change the world with…

• One page of text.

• Two authors.

• One figure.

• No raw data.

Watson, J. D. & Crick, F. H. C. Molecular structure of nucleic acids. Nature 171, 737-738 (1953).

… but in the 21st century scientific discovery is more about data and collaboration.

In 2012 the Encyclopedia of DNA Elements (ENCODE) generated:

• Thirty papers.

• Across three different journals.

• From thirty-two different research institutes.

The Data Deluge

Photo by Shalom Jacobovitz, via Wikipedia

Data, data, data

Depositions of datasets in archives continue to grow, surpassing journal articles in biomedical research.

Figure: Growth of biomedical research publications (red; current total >19 million), alongside the accumulation of research data, including nucleic acid sequences (black; current total ~163 million), computer-annotated protein sequences (magenta; current total 9 million), manually annotated protein sequences (green; current total 500,000) and protein structures (blue; current total 60,000).

Source: Biochemical Journal 2009 424, 317-333 - Teresa K. Attwood, Douglas B. Kell and others.


The Data Journal concept

• Data must be well described before others can use it and benefit from it.

• Scientists who share data in a reusable manner deserve credit through citable publications.

• Several journals now offer “data paper” article-types, including GigaScience, F1000Research, Earth System Science Data, and Biodiversity Data Journal.

Helping you publish, discover and reuse research data

Honorary Academic Editor: Susanna-Assunta Sansone

Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators

Visit: nature.com/scientificdata

Email: [email protected]

Tweet: @ScientificData

Now Live!

Get Credit for Sharing Your Data

Publications will be indexed and citable.

Open-access

Authors select from three Creative Commons licenses for the main Data Descriptor. Each publication is supported by CC0 metadata.

Focused on Data Reuse

All the information others need to reuse the data; no interpretative analysis, or hypothesis testing

Peer-reviewed

Rigorous peer-review focused on technical data quality and reuse value

Promoting Community Data Repositories

Not a new data repository; data stored in community data repositories

Data Descriptor: relation with traditional articles

Data Descriptors contain methods and technical analyses supporting the quality of the measurements. They do not contain tests of new scientific hypotheses.

Data Descriptor:

• What did I do to generate the data?

• How was the data processed?

• Where is the data?

• Who did what, when?

Traditional article: Synthesis, Analysis, Conclusions

When should you submit a manuscript to Scientific Data?

• Alongside your article at a Nature journal.

• Describe standalone datasets that don’t fit in your other publications.

• Release data used in your previous research articles.

Focus on RNA sequencing quality control (SEQC)

In the September issue of Nature Biotechnology

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

SEQC/MAQC-III Consortium | doi:10.1038/nbt.2957

The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance

Wang et al. | doi:10.1038/nbt.3001

Cross-platform ultradeep transcriptomic profiling of human reference RNA samples by RNA-Seq

Xu et al. | doi:10.1038/sdata.2014.20

Transcriptomic profiling of rat liver samples in a comprehensive study design by RNA-Seq

Gong et al. | doi:10.1038/sdata.2014.21
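Each paper in this focus is identified by a DOI, which is what lets the research articles and the Data Descriptors link to one another. As a minimal sketch, assuming the public resolver at https://doi.org/ and a hypothetical helper name `doi_to_url`, the identifiers above can be turned into links like this (DOIs containing unusual characters would also need URL-encoding, which this sketch skips):

```python
DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Turn 'doi:10.1038/...' or '10.1038/...' into a resolvable link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    return DOI_RESOLVER + doi


# The four SEQC papers listed on the slide.
seqc_dois = [
    "doi:10.1038/nbt.2957",       # SEQC/MAQC-III Consortium, Nature Biotechnology
    "doi:10.1038/nbt.3001",       # Wang et al., Nature Biotechnology
    "doi:10.1038/sdata.2014.20",  # Xu et al., Scientific Data
    "doi:10.1038/sdata.2014.21",  # Gong et al., Scientific Data
]

for doi in seqc_dois:
    print(doi_to_url(doi))
```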

Stem Cells

• Associated Nature Article

• Data at figshare & NCBI GEO

• Integrated figshare data viewer

Neuroscience

• New Dataset

• Data in OpenfMRI

• Source code in GitHub

Big Data

• Code in GitHub

Environmental

• New Dataset

• Data in figshare

• Code in figshare

• Integrated figshare data viewer

• Cited in Science

Making data discoverable

Linking between research papers, Data Descriptors, and data records

The Data Descriptor article-type

A Data Descriptor has two components:

• Article or narrative component (PDF and HTML)

• Experimental metadata or structured component (in-house curated, machine-readable formats)

Data Descriptor

Focus on data reuse: detailed descriptions of the methods and technical analyses supporting the quality of the measurements.

Does not contain tests of new scientific hypotheses.

Sections:

• Title

• Abstract

• Background & Summary

• Methods

• Technical Validation

• Data Records

• Usage Notes

• Figures & Tables

• References

• Data Citations
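An author preparing a draft could check its headings against the section names listed above. The Python sketch below is only an illustration: `EXPECTED_SECTIONS` and `missing_sections` are hypothetical names, and treating every listed section as required is an assumption, since the slide only lists them.

```python
# Section names as listed on the slide, in order.
EXPECTED_SECTIONS = [
    "Title",
    "Abstract",
    "Background & Summary",
    "Methods",
    "Technical Validation",
    "Data Records",
    "Usage Notes",
    "Figures & Tables",
    "References",
    "Data Citations",
]


def missing_sections(headings):
    """Return the expected sections absent from a draft, ignoring case and outer spaces."""
    present = {h.strip().lower() for h in headings}
    return [s for s in EXPECTED_SECTIONS if s.lower() not in present]


draft_headings = ["Title", "Abstract", "Methods", "Data Records", "References"]
print(missing_sections(draft_headings))
```

For the sample draft this reports the five sections that are still to be written, in the order the slide gives them.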

Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group, incl.:

- CODATA

- Research Data Alliance

- Force11

Data Descriptor: structured metadata (CC0)

In-house curation team:

• assists users to submit the structured content via simple templates and an internal authoring tool

• performs value-added semantic annotation of the experimental metadata

For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification.

[Diagram labels: analysis, method, script, data file or record in a database]

The right licence for the right content

Data Descriptor article: licensed under one of three Creative Commons licenses, by author choice.

Metadata: released under the CC0 waiver to maximize reuse and aid data miners.

Data: the primary datasets will reside in public repositories. Partnering with figshare and Dryad, which both use the CC0 waiver.

Editorial process & policies

Editorial Board

Active scientists oversee peer-review.

Peer-review assesses:

• The completeness of the description

• Alignment with community standards

• Data deposition in an appropriate repository

• Technical quality of the measurements

• Reuse value

Clear data sharing policies

• Data must be deposited to an approved data repository before manuscript submission, prior to peer-review.

• If datasets are private, they must be made accessible to editors and referees in a secure and confidential manner.

• Authors must agree to release data to the public, without undue restrictions, at the time of publication.

• Reasonable controls allowed for datasets with human privacy restrictions.

Our recommended repositories

• We currently recognize over 60 public data repositories.

• We have integrated systems with both figshare and Dryad.

• Earth sciences repositories include: Pangaea, ORNL DAAC, NERC Data Centres, and more.

• We work with you to find the best place to archive your data.

Other initiatives to promote open-science at the Nature Publishing Group


Promoting Reproducible Science

• Strong data deposition requirements in fields with well-established repositories, across all Nature-titles.

• New life sciences reproducibility checklist helps ensure that key information is included in each manuscript.

• Collaboration between the Nature-titles and Scientific Data to promote wider data sharing.

Data Citations

Formally link Data Descriptor to external data records

Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group, incl.:

- CODATA

- Research Data Alliance

- Force11

NPG supports ORCIDs

ORCIDs are open, unique personal identifiers for researchers.

ORCIDs help you get credit for your scientific output

• A persistent digital identifier that distinguishes you from every other researcher

• Register for your own at www.orcid.org

• Include your ORCID when you are listed as an author on a manuscript, in grants, and on your website
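An ORCID iD is 16 characters long, and its last character is a check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can often be caught before it goes into a manuscript or grant form. The Python sketch below shows one way to run that check; the function names are hypothetical, and the sample iDs are the example identifiers used in ORCID's own documentation, not real researchers on this slide.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape of an iD such as 0000-0002-1825-0097 and its final check character."""
    compact = orcid.strip().replace("-", "").upper()
    if re.fullmatch(r"[0-9]{15}[0-9X]", compact) is None:
        return False
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

Passing this check only means the iD is well formed; it does not confirm that the iD is registered or belongs to a particular person.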


Concluding points

• Make the most out of your research

• Share your data, get the credit you deserve

• Register for an ORCID today

Thanks!

Managing Editor, Scientific Data: Andrew L. Hufton [email protected]

Honorary Academic Editor: Susanna-Assunta Sansone

Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators

Visit: nature.com/scientificdata

Email: [email protected]

Tweet: @ScientificData