An Emergent Micro-Services Approach to Digital Curation


UC3
CNI Spring 2010 Membership Meeting
Baltimore, April 12-13, 2010
Standards and Best Practices for Datasets and
Other Supplemental Journal Article Materials
DataCite @ UC3
Stephen Abrams
Patricia Cruse
John Kunze
UC Curation Center
California Digital Library
University of California
DataCite @ UC3
The California Digital Library was founded by the University of
California in 1997 to take advantage of emerging technologies
transforming the way digital information is published and
accessed
In collaboration with the UC libraries and other partners, the
CDL has assembled one of the world’s largest digital research
libraries and changed the ways that faculty, students, and
researchers use information
– Collection development, licensing, mass digitization, and cataloging
– Digital special collections
– Discovery and delivery
– Publishing
– UC Curation Center (UC3)
DataCite @ UC3
UC3’s participation in DataCite is a continuation of
our ongoing activities in digital curation
– The set of policies and practices focused on managing
and adding value to a body of trusted digital content
over time
[Figure: the scholarly lifecycle and the information lifecycle. Activity labels: Share, Create, Research, Teach, Learn, Collect, Discover, Publish, Manage, Preserve, Gather, Access]
The gap between possibility and practice
Journal articles
– Most articles held in multiple academic and national libraries
– Libraries ensure long-term storage and access
– Extensive mechanisms for publication and discovery
– Established funded mechanisms for archival management
– Citations form the basis of impact analysis
Data
– Few archives in widely visible facilities
– Difficult data management after project funding ceases
– Little opportunity for publication, informal discovery
– Ad hoc funding sources, if at all
– Not included in impact analysis
What we’d like to enable…
– Precise identification of datasets at appropriate granularity
– Bi-directional linking between traditional publications and the data underlying them
– Domain-specific discovery to facilitate innovative reuse of data
– Citation “credit” for data producers and publishers
– Use metrics for data
CDL discovery services
ark:/a50600/rb2468097
doi:10.5060/rb2468097
http://n2t.net/a5060/rb2468097
CDL eScholarship publishing
Supplementary Data
Reichl, R., Waldinger, R., et al. (2006)
Table A: Survey of Attitudes …
Table B: Latinos in LA Basin …
…
Licensed resources
Supplementary data
DataONE
Identity is a fundamental curation service
[Figure: stack of curation services. Layer labels: Value, Service, Curation, Preservation, Context, State]
– Annotation of content by consumers
– Notification of new content availability
– Search of content and metadata
– Index to enable fast search
– Transformation to create derivatives
– Ingest of content for curation
– Characterization to extract content properties
– Inventory of curated content and metadata
– Replication for safety
– Fixity to verify bit-level integrity
– Storage for long-term retention
– Identity for long-term reference
Easy Identifiers (EZID)
– Tier 1: Anonymous request for persistent identifier
– Tier 2: Tier 1, plus supply of a resolvable URL (cf. tinyurl)
– Tier 3: Tier 2, but authenticated (enabling link checking and personalized services)
– Tier 4: Tier 3, plus supply of metadata (enhanced discovery and resolution)
– Tier 5: Tier 4, plus supply of the digital asset (for local or brokered hosting)
– Tier 6: Tier 5, plus supply of the asset from the web (cf. Zotero)
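Because each tier is defined in terms of the one before it, the tiers can be read as a cumulative list. The sketch below is only an illustration of that reading, not EZID code: the names `TIER_ADDITIONS`, `features_for_tier`, and `is_anonymous` are hypothetical, and the short feature labels paraphrase the slide.

```python
# Hypothetical model of the cumulative tiers listed above (not EZID code).
# Each entry is (tier number, what that tier adds to the previous one).
TIER_ADDITIONS = [
    (1, "persistent identifier"),
    (2, "resolvable URL"),
    (3, "authentication"),
    (4, "metadata"),
    (5, "digital asset"),
    (6, "asset supplied from the web"),
]


def features_for_tier(tier: int) -> list:
    """Return everything a request at the given tier involves, in tier order."""
    if not 1 <= tier <= len(TIER_ADDITIONS):
        raise ValueError(f"unknown tier: {tier}")
    return [feature for number, feature in TIER_ADDITIONS if number <= tier]


def is_anonymous(tier: int) -> bool:
    """Tiers 1 and 2 are anonymous; authentication starts at Tier 3."""
    return tier < 3


if __name__ == "__main__":
    for tier in range(1, 7):
        print(tier, features_for_tier(tier))
```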
User-facing EZID interfaces
Two primary methods: mint and bind
id = mint (scheme, namespace)
bind (id, url)
bind (id, metadata)
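To make the two methods concrete, here is a minimal in-memory sketch of mint and bind. It is an assumption-laden illustration, not the EZID implementation: the class name `IdentifierRegistry`, the `resolve` helper, and the generated identifier format (a zero-padded counter under scheme and namespace) are all invented for the example.

```python
import itertools


class IdentifierRegistry:
    """In-memory stand-in for an identifier service (hypothetical)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._records = {}

    def mint(self, scheme, namespace):
        """Create a new identifier; the string format here is made up."""
        identifier = f"{scheme}:{namespace}/{next(self._counter):06d}"
        self._records[identifier] = {"url": None, "metadata": {}}
        return identifier

    def bind(self, identifier, url=None, metadata=None):
        """Attach a URL and/or metadata to an already minted identifier."""
        record = self._records[identifier]  # KeyError if never minted
        if url is not None:
            record["url"] = url
        if metadata is not None:
            record["metadata"].update(metadata)
        return record

    def resolve(self, identifier):
        """Return the bound URL, or None if nothing has been bound yet."""
        return self._records[identifier]["url"]


if __name__ == "__main__":
    registry = IdentifierRegistry()
    new_id = registry.mint("doi", "UC3")
    registry.bind(new_id, url="http://example.org/dataset")
    registry.bind(new_id, metadata={"title": "Example dataset"})
    print(new_id, registry.resolve(new_id))
```

Keeping mint separate from bind mirrors the slide: an identifier can exist before anything is known about the object it will eventually point to.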
Interface implementations
– HTML form (fields: Identifier; Scheme: DOI; Namespace: UC3; Mint button)
– Email
mailto: [email protected]
– REST
POST /mint/scheme/namespace HTTP/1.1
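The REST form of mint shown above can be sketched as a small request builder. Only the request line pattern comes from the slide; the function name `build_mint_request`, the host `ezid.example.org`, and the empty request body are assumptions made for illustration, and nothing is sent over the network.

```python
from urllib.parse import quote


def build_mint_request(scheme, namespace, host="ezid.example.org"):
    """Build the text of a mint request without sending it.

    The path follows the slide's pattern /mint/scheme/namespace.
    The host name and the empty body are placeholders.
    """
    # Percent-encode each path segment so unusual characters stay valid.
    path = "/mint/{}/{}".format(quote(scheme, safe=""), quote(namespace, safe=""))
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Length: 0",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_mint_request("doi", "UC3"))
```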
Repository for cited data
UC3 Merritt Ingest Service (screenshot of the submission form)
Submission package
– Package verification (optional)
– File: a single object (file, container, or manifest) or a batch of objects (manifest)
– Checksum: MD5
Submission operation
– Create a new object
– Update an existing object
Object profile
– Profile: My profile
Object identifier
– Primary identifier: leave blank to have an identifier assigned automatically
Object description (optional)
– Creator
– Title
– Date
– Local identifier
Copyright © 2009 The Regents of the University of California
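The ingest form asks for an MD5 checksum so that a submitted file can be verified. As a rough sketch of what a submitter might do on their side, the following computes an MD5 digest and compares it with an expected value. The function names are hypothetical and this is not Merritt code; it only uses Python's standard `hashlib` module.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(data: bytes, expected_hex: str) -> bool:
    """Compare computed and expected digests, ignoring case and whitespace."""
    return md5_hex(data) == expected_hex.strip().lower()


if __name__ == "__main__":
    payload = b"hello"
    print(md5_hex(payload))
    print(matches_checksum(payload, md5_hex(payload)))
```

The same comparison, repeated over time against a stored digest, is the basic idea behind a fixity check.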
Pilot projects
UC ETDs
http://www.escholarship.org/
Dryad
http://datadryad.org/repo
UC Berkeley Water Resources Center
http://www.lib.berkeley.edu/WRCA
UC Berkeley Jepson Herbarium
http://ucjeps.berkeley.edu/jeps
Next steps
– Work with DataCite partners to establish metadata and citation standards and best practices
– Integrate support for DataCite DOIs into EZID
– Promote data citation for research, teaching, and learning on UC campuses and by funded project partners
– Increase the visibility of UC3 as a DataCite registration agency and the UC3 curation environment for data hosting
Summary
– Digital resources lacking identification cannot be curated
– Data should be seen (and supported) as a new kind of publication
– Scholarly inquiry is facilitated by bi-directional linking between articles and the data on which they are based
– DataCite plays a vital role in supporting data as citable publication
– UC3 is working with campus and external partners to provide effective data citation services
For more information
DataCite
http://www.datacite.org/
UC Curation Center
http://www.cdlib.org/services/uc3
[email protected]
[email protected]
[email protected]