PREMIS Deep Dive – Personal View


PREMIS
Practical Strategies For Preservation Metadata
Mark Evans, Tessella
Digital Preservation Boot Camp – PASIG meeting, Washington DC, 22nd May 2013
Contents
• PREMIS Basics
• PREMIS - Conformance
• PREMIS - A practical approach
• Where next?
• Useful resources
Different Types of Metadata?
• Descriptive
– supports identification and discovery of a resource
• Administrative
– supports the management and tracking of a resource
• Structural
– defines the arrangement and composition of a resource
• Preservation
– supports activities intended to ensure the long-term usability of a resource
What is PREMIS?
• PREservation Metadata: Implementation Strategies
• PREMIS is an Information Model:
– Focus is on the preservation of digital objects
– “The information a repository uses to support the digital preservation process”
– “Things that most working preservation repositories need to know to support digital preservation functions”
– Data dictionary defines a set of semantic units
What is Out of Scope in PREMIS?
• Descriptive metadata
– Many existing standards support this
• File Format specific metadata
– Metadata that pertains to only one file format or class of formats
• Implementation metadata
– Metadata that describes specific policies and practices of an individual repository
• Detailed media and hardware information
– Left to other communities to define
– Technical environment metadata is in scope
Image taken from “Understanding PREMIS”; Caplan, Library of Congress, 2009
PREMIS Basics: Usage
• Repository Design
– Provides guidelines on what information should be obtained and maintained by a preservation repository
• Repository evaluation
– Provides a checklist to determine effective preservation management of digital objects
• Exchange of objects between repositories
– Provides a common set of data elements that can be understood by the provider and consumer repositories
PREMIS Basics - Always had intellectual entities
Diagram: a hierarchy of intellectual entities (Collection, Sub-Collection, Record Series, Item), each level carrying descriptive metadata, alongside structural metadata.
PREMIS Basics – Now We Have Digital Objects
Diagram: “Records of XYZ Committee” held as a series of digital objects, each with its own technical metadata:
• 1995 – WordPerfect
• 2000 – Word 97
• 2005 – Word 2002
• 2010 – Word 2010
Technical metadata:
• Fixity
• Checksum
• Size
• Format
• Version
• Environment
• Hardware
• Operating system
• Rendering software
• Embedded images
• Media properties
• Type, age etc
• Digital provenance
• Authenticity
• Digital signatures
• Inhibitors
• Significant Properties
PREMIS Basics - And need to do more things…
Diagram: “Records of XYZ Committee”, format migration from the Original Representation to the Migrated Representation:
• 1995 – WordPerfect → 1995 – PDF/A
• 2000 – Word 97 → 2000 – PDF/A
• 2005 – Word 2002 → 2005 – PDF/A
• 2010 – Word 2010 → 2010 – PDF/A
PREMIS Basics - 3 Types of Digital Object
• Representation – Set of digital objects needed to render an Intellectual Entity
• File – A named and ordered sequence of bytes that is known by an operating system
• Bitstream – Contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes
Diagram: the representation “1995 – WordPerfect” shown with the files Chapter 1.doc, Chapter 2.doc, Chapter 3.doc and Chapter 4.doc.
PREMIS Basics – Data Model
• Intellectual Entity
– Content that can be described as a unit
• Rights
– Assertion of rights and permissions
• Objects
– Units of information in digital form
• Agents
– People, organizations or software
• Events
– Actions that involve an Object and an Agent known to the system
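The entities above can be sketched as simple record types. This is a minimal illustration only: the class names, field names and sample identifiers are assumptions made for the example, not names taken from the PREMIS Data Dictionary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A person, organization or software."""
    identifier: str
    name: str


@dataclass
class DigitalObject:
    """A unit of information in digital form."""
    identifier: str
    intellectual_entity: str  # the content this object belongs to


@dataclass
class Event:
    """An action involving an Object and an Agent known to the system."""
    identifier: str
    event_type: str
    object_ids: List[str] = field(default_factory=list)
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class RightsStatement:
    """An assertion of rights and permissions."""
    identifier: str
    basis: str
    object_ids: List[str] = field(default_factory=list)


# Hypothetical sample data: entities refer to each other by identifier.
obj = DigitalObject("obj-1", "Records of XYZ Committee")
tool = Agent("agent-1", "Hypothetical migration tool")
migration = Event("evt-1", "migration", [obj.identifier], [tool.identifier])
rights = RightsStatement("rights-1", "copyright", [obj.identifier])
```

Linking by identifier rather than by nesting keeps each entity independent, which is one plausible way to mirror the separate entities of the data model.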
PREMIS Basics – Semantic Units
• Semantic Units:
– Convey a piece of information / knowledge
– Do not specify how they should be represented in a particular system (as opposed to metadata elements)
– Should be exportable to other systems
– May have a direct mapping to metadata elements in an XML schema
• Containers and sub-units
– Some semantic units are defined as containers
– Facilitates a hierarchical structure in the data dictionary
– Extension containers are allowed
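As a rough illustration of a container and its sub-units, the sketch below holds the objectIdentifier container (with its objectIdentifierType and objectIdentifierValue sub-units) as a plain dictionary and maps it onto XML elements. The identifier values and the helper `to_xml` are assumptions for the example, and the output omits namespaces, so it is not valid against the PREMIS XML schema.

```python
import xml.etree.ElementTree as ET

# A container semantic unit holding two sub-units, stored as a plain dict.
# How it is stored internally is up to the system; only the meaning matters.
object_identifier = {
    "objectIdentifier": {
        "objectIdentifierType": "local",      # sample value
        "objectIdentifierValue": "obj-0001",  # sample value
    }
}


def to_xml(name, value):
    """Map a semantic unit (container or simple value) onto an XML element."""
    element = ET.Element(name)
    if isinstance(value, dict):
        # Containers become parent elements; sub-units become children.
        for sub_name, sub_value in value.items():
            element.append(to_xml(sub_name, sub_value))
    else:
        element.text = str(value)
    return element


name, value = next(iter(object_identifier.items()))
xml_text = ET.tostring(to_xml(name, value), encoding="unicode")
print(xml_text)
```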
PREMIS Basics - Objects
Semantic Units
• Identifier
• Category (Representation, File, Bitstream)
• Preservation level
• Significant properties:
– Type (e.g., page count)
– Value (e.g., 7)
• Characteristics:
– Fixity
– Size (bytes)
– Format (Designation, Registry, Note)
– Creating application
– Inhibitors
• Original name
• Storage
• Environment:
– …
– Software
– Hardware
– …
• Signature Information
• Relationship
• Linked events
• Linked intellectual entity
• Linked rights statement
Diagram: PREMIS data model (Intellectual Entity, Rights, Objects, Agents, Events).
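A few of these units can be derived directly from the bytes of a file. The sketch below computes fixity and size for a File object; the dictionary keys, the choice of SHA-256 and the sample content are assumptions for illustration, not requirements of PREMIS.

```python
import hashlib


def describe_file(identifier, original_name, content):
    """Build a minimal File object record from raw bytes."""
    digest = hashlib.sha256(content).hexdigest()
    return {
        "identifier": identifier,
        "category": "file",
        "characteristics": {
            # Fixity: a checksum that can be recomputed later to detect change.
            "fixity": {"algorithm": "SHA-256", "digest": digest},
            # Size in bytes.
            "size": len(content),
        },
        "original_name": original_name,
    }


# Hypothetical sample content standing in for a real file's bytes.
record = describe_file("obj-0001", "Chapter 1.doc", b"example bytes")
print(record["characteristics"]["size"])
```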
PREMIS Basics - Events
Semantic Units:
• Identifier
• Type
• Date Time
• Detail
• Outcome Information
• Linking Agent Identifier
• Linking Object Identifier
Diagram: PREMIS data model (Intellectual Entity, Rights, Objects, Agents, Events).
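The Event semantic units listed above could be captured along these lines. This is a sketch under stated assumptions: the key names, the helper `make_event`, the "fixity check" event type and the identifiers are all illustrative.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type, detail, outcome, agent_id, object_id):
    """Record an action together with the agent and object it involved."""
    return {
        "identifier": str(uuid.uuid4()),
        "type": event_type,
        # Timezone-aware timestamp in ISO 8601 form.
        "date_time": datetime.now(timezone.utc).isoformat(),
        "detail": detail,
        "outcome_information": outcome,
        "linking_agent_identifier": agent_id,
        "linking_object_identifier": object_id,
    }


# Hypothetical example: a checksum was recomputed and matched the stored one.
event = make_event(
    "fixity check",
    "Recomputed checksum and compared it with the stored value",
    "pass",
    "agent-1",
    "obj-0001",
)
```

Keeping the agent and object as linking identifiers means the event can be stored and exported separately from the entities it refers to.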
PREMIS Basics - Rights
Semantic Units
• Identifier
• Basis
• Copyright Information
• Licence Information
• Statute Information
• Rights granted
• Linking object
• Linking agent
Diagram: PREMIS data model (Intellectual Entity, Rights, Objects, Agents, Events).
PREMIS Basics - Agents
Semantic Units
• Identifier
• Name
• Type
Diagram: PREMIS data model (Intellectual Entity, Rights, Objects, Agents, Events).
PREMIS Basics – Example Dictionary Entry
PREMIS Conformance
• To be conformant:
– Implemented semantic units should have the stated definition, constraints and applicability prescribed in the Data Dictionary
– If a repository's unit shares a name with a PREMIS semantic unit, it must share its definition
– If it does not share the name, its definition must map to the PREMIS semantic unit (if mandatory)
– Can be more stringent, but NOT more liberal (can add constraints but not remove them)
– An export of semantic units must contain all mandatory elements for the entities that are supported
• Internal conformance
– Conformant within the repository
• External conformance
– Repository must be able to accept / export conformant semantic units
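The export rule above (all mandatory elements present for the supported entities) lends itself to a simple check. In this sketch the table of mandatory units is an illustrative subset written for the example and should be verified against the Data Dictionary; the function name is hypothetical.

```python
# Illustrative subset only: consult the PREMIS Data Dictionary for the
# authoritative list of mandatory semantic units per entity.
MANDATORY_UNITS = {
    "object": {"objectIdentifier", "objectCategory"},
    "event": {"eventIdentifier", "eventType", "eventDateTime"},
    "agent": {"agentIdentifier"},
}


def missing_mandatory_units(entity_type, exported_units):
    """Return the mandatory units absent from an exported record.

    Entity types the repository does not support have no requirements here.
    """
    required = MANDATORY_UNITS.get(entity_type, set())
    return sorted(required - set(exported_units))


# Hypothetical export that forgot the event date/time.
export = {"eventIdentifier": "evt-1", "eventType": "migration"}
print(missing_mandatory_units("event", export))  # ['eventDateTime']
```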
PREMIS Conformance
• Not required for conformance
– Support for all entity types
– Use of semantic unit names internally
– Use of inference or mapping
• There is a PREMIS XML Schema, but you do NOT have to use it to be CONFORMANT.
– E.g. Use of PREMIS “in” METS provides some overlap
– Planets data model has PREMIS extensions
PREMIS Practicalities: e.g., SDB
• Pre-dates PREMIS (Representation = Manifestation)
• Need to respond quickly: Add extra fields to entities
• Need more entities:
– Intellectual Entities broken down more than via cataloguing
– Complex relationship with Representations
– Use for automated migration and validation
• Don’t want to hold lots of repeated information:
– Use PRONOM PUIDs so Registry implied for every Format
• Hold Storage separate from immutable metadata
• Do have option to export to PREMIS XML schema:
– Make implicit information explicit
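Making implicit information explicit on export might look like the sketch below: the internal record stores only a PUID, and the export step spells out the registry name that the PUID implies. This is not SDB's actual implementation; the record layout, the function name and the sample PUID are assumptions, while the formatRegistry sub-unit names follow the PREMIS Data Dictionary.

```python
def expand_format(puid):
    """Turn a bare PRONOM PUID into explicit PREMIS format registry units."""
    return {
        "formatRegistry": {
            # Implied internally for every format, stated explicitly on export.
            "formatRegistryName": "PRONOM",
            "formatRegistryKey": puid,
        }
    }


def export_object(internal_record):
    """Build an export record without repeating data in internal storage."""
    return {
        "objectIdentifier": internal_record["id"],
        "format": expand_format(internal_record["puid"]),
    }


# Hypothetical internal record: only the PUID is held, not the registry name.
internal = {"id": "obj-0001", "puid": "fmt/95"}
exported = export_object(internal)
print(exported["format"]["formatRegistry"]["formatRegistryName"])  # PRONOM
```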
Example of PREMIS in SDB
PREMIS – Governance Model
• “Self-governed” by community
• PREMIS Editorial Committee
• If you get involved...
– Likely to get invited on!
• Does react to the community
• But it is everyone’s part-time job!
• PREMIS 3.0:
– Adds Intellectual Entity
– Adds Environments
– Allows for less verbosity
CONCLUSIONS
• PREMIS:
– Information Model for digital preservation
– Allows for implementation variations
– Allows for extensions (low conformance barrier)
– Reacts to community
– JOIN IN!
Useful Resources
• PREMIS specification
http://www.loc.gov/standards/premis/
• PREMIS primer
http://www.loc.gov/standards/premis/understanding-premis.pdf
• Conformance Guidance
http://www.loc.gov/standards/premis/premis-conformance-oct2010.pdf
• PREMIS Implementers Group (PIG)
http://www.loc.gov/standards/premis/pig.html
Mark Evans – [email protected]
http://www.digital-preservation.com