Data Citation,
Persistent Identifiers,
and Trusted Repositories
George Alter
ICPSR
Emerging infrastructure offers new
incentives for data sharing
• Data repositories track data use
• Citation services demonstrate the impact of
data
• Links between data and publications enable
cross-disciplinary discovery
How do we enable replication of results
and reuse of existing data?
• Data citation
• Persistent identifiers
• Trusted repositories
Why is data citation important?
• Creators of data have a right to expect their work
to be acknowledged
• Citation will enhance the careers of data
producers
• Recognition encourages others to share data
• Citations are an element in evaluating the impact
of a data collection
NSF Biosketch
f. Biographical Sketch(es)
(c) Products
A list of: (i) up to five products most closely related to the
proposed project; and (ii) up to five other significant
products, whether or not related to the proposed project.
Acceptable products must be citable and accessible
including but not limited to publications, data sets,
software, patents, and copyrights…
Each product must include full citation information
including (where applicable and practicable) names of all
authors, date of publication or release, title, title of
enclosing work such as journal or book, volume, issue,
pages, website and Uniform Resource Locator (URL) or
other Persistent Identifier.
Thomson Reuters
Data Citation Index
Search for
“Murder
cases”
Thomson Reuters
Data Citation Index
Results list showing
Data Repositories,
Data studies and
Data sets
Citation
count
Thomson Reuters
Data Citation Index
DATA STUDY RECORD DISPLAY
Navigate to a list of publications
describing research where data was
used
Parent source repository reference and
hyperlink to repository record
Grant funding information
Thomson Reuters
Data Citation Index
DATA STUDY RECORD DISPLAY
CONTINUED
Recommendation of
how to cite the data
Navigation to associated datasets and
link to download data from source
repository
Inconsistent Referencing of Data
Data-PASS letter to the American Political Science
Association, August 25, 2010
Similar letters were sent to the American Economic Association, the American Educational Research
Association, and the American Sociological Association.
Standards for Citation of Data Exist
General Guidance
• Data Synthesis Group Data Citation
Principles
• Data Preservation Alliance for the
Social Sciences (Data-PASS)
• CODATA/ICSTI Task Group on Data
Citation
Exemplars of Practice
• Institute for Quantitative Social
Science (IQSS), Harvard University:
Dataverse Network
• Inter-university Consortium for
Political and Social Research
• Journal of Statistical Software
• American Sociological Review
Data
Citation
Author Guidelines Should Say:
• Data should be cited in the same place (references,
footnotes) as publications
• Each citation must include these basic elements:
– Title
– Author
– Date
– Persistent identifier (such as the Digital Object Identifier,
URN, or Handle)
– Version and fixity information are strongly recommended
• Data should be accessible in a trusted digital repository
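The elements listed above can be sketched as a small data structure. This is a minimal illustration, not a format prescribed by any of the standards named in this talk: the class name, the field order, the punctuation of the output, and the use of SHA-256 as fixity information are all assumptions, and the author, title, and DOI in the example are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Basic elements of a data citation (hypothetical structure)."""
    author: str
    date: str
    title: str
    identifier: str                 # persistent identifier: DOI, URN, or Handle
    version: Optional[str] = None   # strongly recommended
    fixity: Optional[str] = None    # strongly recommended

    def format(self) -> str:
        # The ordering and punctuation here are an assumed style,
        # not one mandated by a particular journal.
        parts = [f"{self.author} ({self.date}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.identifier}.")
        if self.fixity:
            parts.append(f"[fixity: {self.fixity}]")
        return " ".join(parts)


def sha256_fixity(data: bytes) -> str:
    """Return a checksum string that lets a reader verify the exact bytes cited."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    citation = DataCitation(
        author="Doe, Jane",
        date="2013",
        title="Example Survey Data",
        identifier="doi:10.1234/example",
        version="2",
        fixity=sha256_fixity(b"contents of the data file"),
    )
    print(citation.format())
```

Keeping version and fixity optional mirrors the guideline: the four basic elements are required, while the other two are recommended.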
Data archives provide citations
Persistent
Identifier (Handle)
Data archives provide citations
Persistent
Identifier (DOI)
Persistent Identifiers
• A long-lasting reference to a digital object
• URLs point to locations, which are unstable
• Persistent Identifiers provide a name and a
locator
• Digital Object Identifiers (DOIs) are widely
used for publications
• DOIs are resolved by Registration Agencies
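The name-plus-locator idea can be sketched in a few lines: the DOI itself is the stable name, and prefixing it with the public resolver address (https://doi.org/) yields a locator that redirects to wherever the object currently lives. The function names are hypothetical, the regular expression is only a rough heuristic for what a DOI looks like, and the DOI used in the usage example is made up.

```python
import re
from urllib.parse import quote

# Heuristic shape check (an assumption, not the formal DOI syntax):
# "10.", a numeric registrant prefix, a slash, then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip common prefixes so only the bare DOI name remains."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a resolver URL (the locator) from a DOI (the name)."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


if __name__ == "__main__":
    # Hypothetical DOI, for illustration only.
    print(doi_to_url("doi:10.1234/example.v1"))
```

Storing the bare DOI and building the URL only when needed keeps a citation valid even if the resolver address changes.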
DOIs are also used
to resolve rights
and subscriptions
When DOIs are used in citations, the citing
articles can be recovered by search engines.
This data set has
been used in 187
publications, but
only 9 used the
DOI.
DataCite
• DOI Registration Agency created for scientific data
– Maintains the resolution infrastructure
– Maintains a searchable database of metadata
– Manages the identifiers over the long term
– Establishes and shares best practice
• Focused on improving the scholarly infrastructure around
datasets and other non-textual information
• Founded December 1st 2009 in London
ORCID: Open Researcher & Contributor ID
• Central registry of unique identifiers for
researchers
• Disambiguation of authors
• Links to other author ID systems:
– Thomson Reuters ResearcherID
– Scopus Author ID
ORCID
• Automated
search for your
publications
• Links ORCID to
publication DOIs
ODIN: ORCID and DataCite
Interoperability Network
• Uniquely identify scientists and data sets and
connect this information across multiple
services and infrastructures
Trusted Digital Repositories
“A trusted digital repository is one whose mission is
to provide reliable, long-term access to managed
digital resources to its designated community, now
and in the future.”
Trusted Digital Repositories: Attributes and Responsibilities: An RLG-OCLC Report, Research Libraries Group, first published in May 2002
Standards for
digital repositories
grew out of work
at NASA to save
data from
satellites.
http://public.ccsds.org/publications/archive/650x0m2.pdf
Certification Systems for Repositories
differ in depth and rigor
• ISO/DIS 16363 Audit and certification of
trustworthy digital repositories
– Trustworthy Repositories Audit & Certification: Criteria and Checklist
(TRAC)
• Data Seal of Approval
• ICSU World Data System
ISO/DIS 16363 grew out of
Trustworthy Repositories Audit &
Certification (TRAC)
http://www.crl.edu/sites/default/files/attachments/pages/trac_0.
Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
is an external audit covering:
• Organizational Infrastructure
– Governance & viability
– Organizational structure & staffing
– Policy framework
– Financial sustainability
• Digital Object Management
– Acquiring content
– Creating the archival package
– Preservation planning
– Access management
• Technologies, Technical Infrastructure, & Security
http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/trac
Data Seal of Approval
• An international, community-driven effort to
certify trustworthy repositories in a
lightweight way
• 16 guidelines based on ISO 16363 (the ISO
standard has over 100 guidelines)
• Repository conducts self-assessment, which is
then peer-reviewed and seal is awarded
Data Seal of Approval
World Data System
Created by the International Council for Science
(ICSU)
Data-PASS mission:
• Archive social science data collections at risk
of being lost.
• Catalog and promote access to archived
collections in the Data-PASS shared catalog.
• Preserve archived collections through
replication.
• Advocate best practices in digital preservation.
• The Institute for Quantitative Social Science at Harvard
University
• The Howard W. Odum Institute for Research in Social
Science at the University of North Carolina-Chapel Hill
• The Inter-university Consortium for Political and Social
Research (ICPSR) at the University of Michigan
• The Electronic and Special Media Records Services Division,
National Archives and Records Administration
• The Roper Center for Public Opinion Research at the
University of Connecticut
• The Social Science Data Archive at the University of
California, Los Angeles (UCLA)
• Qualitative Data Repository (Syracuse)
Data-PASS Common Catalog
When data are available, authors
should provide program code, but…
• Subsets of ANES, GSS, WVS, Afrobarometer, …
are unnecessary and potentially misleading
• Research transparency is better served by
providing program code from the original data
to the published results
Program Code is Intellectual Property
too!
If authors must supply program code, it should
be
• Respected as intellectual property
• Accessible
• Citable (DOI etc.)
• Cited in references
runmycode.org
• Repository
for sharing
program
code
Where are we now?
• Links between data and publications are not
available, because journals do not cite data
consistently.
• Without consistent citation, aggregators
(Thomson Reuters, Scopus, Google Scholar)
cannot automate links.
• Research opportunities are being missed.
Journals are in a unique position to
• Implement citation standards
• Encourage and reward transparency and
reproducibility
• Influence researcher behavior