Transcript Document
Data Citation, Persistent Identifiers, and Trusted Repositories
George Alter
ICPSR

Emerging infrastructure offers new incentives for data sharing
• Data repositories track data use
• Citation services demonstrate the impact of data
• Links between data and publications enable cross-disciplinary discovery

How do we enable replication of results and reuse of existing data?
• Data citation
• Persistent identifiers
• Trusted repositories

Why is data citation important?
• Creators of data have a right to expect their work to be acknowledged
• Citation will enhance the careers of data producers
• Recognition encourages others to share data
• Citations are an element in evaluating the impact of a data collection

NSF Biosketch
f. Biographical Sketch(es)
(c) Products
A list of: (i) up to five products most closely related to the proposed project; and (ii) up to five other significant products, whether or not related to the proposed project. Acceptable products must be citable and accessible including but not limited to publications, data sets, software, patents, and copyrights… Each product must include full citation information including (where applicable and practicable) names of all authors, date of publication or release, title, title of enclosing work such as journal or book, volume, issue, pages, website and Uniform Resource Locator (URL) or other Persistent Identifier.
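The NSF text above names the elements a citable product needs: authors, date, title, and a URL or other persistent identifier. As a rough illustration, those elements could be held in a small record and rendered as a citation string. This is only a sketch. The `DataCitation` class, its field names, the output layout, and the sample values (including the `10.1234/example` DOI) are hypothetical placeholders, not an NSF, ICPSR, or DataCite format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical record for the citation elements named in the NSF text."""
    authors: List[str]
    year: int
    title: str
    identifier: str  # a DOI link, Handle, URN, or plain URL

    def format(self) -> str:
        # Semicolons keep names written as "Last, First" unambiguous.
        names = "; ".join(self.authors)
        return f"{names} ({self.year}). {self.title}. {self.identifier}"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a link through the public doi.org resolver."""
    return "https://doi.org/" + doi.strip()


# Placeholder values only; this is not a real data set.
citation = DataCitation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2013,
    title="Example Survey of Households [data set]",
    identifier=doi_to_url("10.1234/example"),
)
print(citation.format())
```

Storing the identifier as a resolvable link rather than a bare string means the same citation serves both as a name for the data and as a way to locate it.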
Thomson Reuters Data Citation Index
Search for “Murder cases”

Thomson Reuters Data Citation Index
Results list showing Data Repositories, Data studies and Data sets
Citation count

Thomson Reuters Data Citation Index
DATA STUDY RECORD DISPLAY
Navigate to a list of publications describing research where data was used
Parent source repository reference and hyperlink to repository record
Grant funding information

Thomson Reuters Data Citation Index
DATA STUDY RECORD DISPLAY CONTINUED
Recommendation of how to cite the data
Navigation to associated datasets and link to download data from source repository

Inconsistent Referencing of Data
Data-PASS letter to the American Political Science Association, August 25, 2010
Similar letters sent to the American Economic Association, American Educational Research Association, and American Sociological Association.

Standards for Citation of Data Exist
General Guidance
• Data Synthesis Group Data Citation Principles
• Data Preservation Alliance for the Social Sciences (Data-PASS)
• CODATA/ICSTI Task Force on Data Citation
Exemplars of Practice
• Institute for Quantitative Social Science (IQSS), Harvard University: Dataverse Network
• Inter-university Consortium for Political and Social Research
• Journal of Statistical Software
• American Sociological Review

Data Citation Author Guidelines Should Say:
• Data should be cited in the same place (references, footnotes) as publications
• Each citation must include these basic elements:
– Title
– Author
– Date
– Persistent identifier (such as the Digital Object Identifier, URN, or Handle)
– Version and fixity information are strongly recommended
• Data should be accessible in a trusted digital repository

Data archives provide citations
Persistent Identifier (Handle)

Data archives provide citations
Persistent Identifier (DOI)

Persistent Identifiers
• A long-lasting reference to a digital object
• URLs point to locations, which are unstable
• Persistent Identifiers provide a name and a locator
• Digital Object Identifiers (DOIs) are widely used for publications
• DOIs are resolved by Registration Agencies

DOIs are also used to resolve rights and subscriptions

When DOIs are used in citations, the citing articles can be recovered by search engines.
This data set has been used in 187 publications, but only 9 used the DOI.

DataCite
• DOI Registration Agency created for scientific data
– Maintains the resolution infrastructure
– Maintains a searchable database of metadata
– Manages the identifiers over the long term
– Establishes and shares best practice
• Focused on improving the scholarly infrastructure around datasets and other non-textual information
• Founded December 1st 2009 in London

ORCID: Open Researcher & Contributor ID
• Central registry of unique identifiers for researchers
• Disambiguation of authors
• Links to other author ID systems:
– Thomson Reuters ResearcherID
– Scopus Author ID

ORCID
• Automated search for your publications
• Links ORCID to publication DOIs

ODIN: ORCID and DataCite Interoperability Network
• Uniquely identify scientists and data sets and connect this information across multiple services and infrastructures

Trusted Digital Repositories
“A trusted digital repository is one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future.”
Trusted Digital Repositories: Attributes and Responsibilities: An RLG-OCLC Report, Research Libraries Group, first published in May 2002

Standards for digital repositories grew out of work at NASA to save data from satellites.
http://public.ccsds.org/publications/archive/650x0m2.pdf

Certification Systems for Repositories differ in depth and rigor
• ISO/DIS 16363 Audit and certification of trustworthy digital repositories
– Trusted Digital Repositories and Audit Checklist (TRAC)
• Data Seal of Approval
• ICSU World Data System

ISO/DIS 16363 grew out of Trusted Repositories Audit & Certification (TRAC)
http://www.crl.edu/sites/default/files/attachments/pages/trac_0.

Trusted Digital Repositories and Audit Checklist (TRAC) is an external audit covering:
Organizational Infrastructure
• Governance & viability
• Organizational structure & staffing
• Policy framework
• Financial sustainability
Digital Object Management
• Acquiring content
• Creating the archival package
• Preservation planning
• Access management
Technologies, Technical Infrastructure, & Security
http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/trac

Data Seal of Approval
• An international, community-driven effort to certify trustworthy repositories in a lightweight way
• 16 guidelines based on ISO 16363 (the ISO standard has over 100 guidelines)
• The repository conducts a self-assessment, which is then peer-reviewed before the seal is awarded

World Data System
Created by the International Council for Science (ICSU)

Data-PASS mission:
• Archive social science data collections at risk of being lost.
• Catalog and promote access to archived collections in the Data-PASS shared catalog.
• Replicated preservation of archived collections.
• Advocate best practices in digital preservation.

• The Institute for Quantitative Social Science at Harvard University
• The Howard W. Odum Institute for Research in Social Science at the University of North Carolina-Chapel Hill
• The Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan
• The Electronic and Special Media Records Service Division, National Archives and Records Administration
• The Roper Center for Public Opinion Research at the University of Connecticut
• The Social Science Data Archive at the University of California, Los Angeles (UCLA)
• Qualitative Data Repository (Syracuse)

Data-PASS Common Catalog

When data are available, authors should provide program code, but…
• Subsets of ANES, GSS, WVS, Afrobarometer, … are unnecessary and potentially misleading
• Research transparency is better served by providing program code from the original data to the published results

Program Code is Intellectual Property too!
If authors must supply program code, it should be
• Respected as intellectual property
• Accessible
• Citable (DOI etc.)
• Cited in references

runmycode.org
• Repository for sharing program code

Where are we now?
• Links between data and publications are not available, because journals do not cite data consistently.
• Without consistent citation, aggregators (Thomson Reuters, Scopus, Google Scholar) cannot automate links.
• Research opportunities are being missed.

Journals are in a unique position to
• Implement citation standards
• Encourage and reward transparency and reproducibility
• Influence researcher behavior
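
The author guidelines earlier in this talk recommend that data citations carry version and fixity information. One common way to produce fixity information is a cryptographic checksum of the data file, which lets a later user confirm that the file in hand is byte-for-byte the one that was cited. The following is a minimal sketch using SHA-256 from Python's standard library. The function names and the sample bytes are illustrative assumptions, and the talk itself does not prescribe a particular algorithm.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def file_fixity(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fixity(data: bytes, expected_hex: str) -> bool:
    """Check data against a previously recorded digest."""
    return fixity_digest(data) == expected_hex.lower()


# Illustrative bytes standing in for a deposited data file.
original = b"id,age\n1,34\n2,58\n"
recorded = fixity_digest(original)

print(matches_fixity(original, recorded))               # unchanged copy: True
print(matches_fixity(original + b"3,41\n", recorded))   # altered copy: False
```

Recording the digest alongside the version in a citation would let readers detect silent changes to a data file, even when the persistent identifier still resolves.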