Assessing Digital Objects with JHOVE2
Hannah Frost and team*
Stanford University Libraries and Academic Information Resources
Introduction
Repositories collect lots of technical metadata, but lack tools to use it to better understand the objects in their care, and to apply it precisely in management and operations.
Recent years have seen the development of a number of metadata tools designed to identify and characterize digital objects in preservation workflows: JHOVE, Metadata Extraction Tool, DROID, and most recently FITS. Each is available to the digital preservation community, and the community makes good use of them.
Repositories dutifully collect and store the technical and structural metadata exposed and output by these tools, yet typically repositories have limited or no means to analyze and evaluate object characterization data in ways that would facilitate more effective management of the objects under their care.
According to a recent Planets survey report, less than a third of organizations reported that they have “complete control over the formats that they will accept and enter into their archives.” [1]
Repository managers have concerns about risks associated with file formats, and with file format obsolescence generally. To support preservation services for content considered to be encoded in “risky” formats, some repositories are developing policies and profiles that reflect their local concerns and operational contexts. [2] They seek tools to assess the technical metadata gathered and stored in routine repository operations against those policies in order to make sense of it on local terms and inform a decision-making process, such as:
• accept / reject
• determine level of risk
• assign level of service
• take action now / later.
Recent efforts to develop and apply assessment methodologies in digital object workflows and repository operations include:
• AONS II (Automated Obsolescence Notification System), National Library of Australia and APSR [3]
• CIV (Configurable Image Validator), Library of Congress
• Institutional Technology Profiles, National Library of New Zealand [4]
Assessment Rules
In JHOVE2 – a next-generation characterization tool currently in development at California Digital Library, Portico, and Stanford University – the team has designed an approach to facilitate policy-based assessment as an object is processed. The tool produces characterization data through a series of identification, feature extraction, validation, and assessment processes. It tells you what you have, as the starting point for iterative preservation planning and action.
It is possible to assess the properties of an object against a set of “rules” configured by the user. Using logical expressions as its terms, a rule is an “assertion” about prior characterization properties. An assertion may be concerned with:
• The presence or absence of a property;
• Constraints on property values;
• Combinations of properties or values.
In assessment, the evaluation of the assertion results in a new characterization property. In this sense, the process generates custom metadata that has significance in the context in which the object is being managed.
The basic formation of a rule is shown below. The user configures the property and value to test, and selects an evaluatory phrase relating the two to form the complete assertion:

    Assertion: <property> | evaluatory phrase | value

where the evaluatory phrase is one of Is Equal To, Is Not Equal To, Is Greater Than, Is Less Than, Contains, or Does Not Contain.

In addition, the user provides two responses for each rule: one to report if the assertion is true (Response If True) and one to report if it is false (Response If False). This response constitutes the customizable metadata that is available for subsequent processing or analysis. Rules can be executed as atoms, or chained together to form compound statements for more complex assessments.
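As a concrete illustration of this rule formation, the following minimal sketch shows how a single rule might be represented and evaluated in Java. The AssessmentRule class, its Phrase enum, and the holds/assess methods are names invented for this illustration; they are not the JHOVE2 API.

    import java.util.Map;

    /**
     * Illustrative only: a single assessment rule of the form
     * (property, evaluatory phrase, value) with two configured responses.
     */
    public class AssessmentRule {

        public enum Phrase {
            IS_EQUAL_TO, IS_NOT_EQUAL_TO, IS_GREATER_THAN,
            IS_LESS_THAN, CONTAINS, DOES_NOT_CONTAIN
        }

        private final String property;         // e.g. "BitDepth"
        private final Phrase phrase;           // e.g. IS_EQUAL_TO
        private final String value;            // e.g. "24"
        private final String responseIfTrue;   // e.g. "Accept"
        private final String responseIfFalse;  // e.g. "Reject"

        public AssessmentRule(String property, Phrase phrase, String value,
                              String responseIfTrue, String responseIfFalse) {
            this.property = property;
            this.phrase = phrase;
            this.value = value;
            this.responseIfTrue = responseIfTrue;
            this.responseIfFalse = responseIfFalse;
        }

        /** True when the assertion holds for the given characterization properties. */
        public boolean holds(Map<String, String> characterization) {
            String observed = characterization.getOrDefault(property, "");
            switch (phrase) {
                case IS_EQUAL_TO:     return observed.equals(value);
                case IS_NOT_EQUAL_TO: return !observed.equals(value);
                // Numeric comparisons assume the property is present and numeric.
                case IS_GREATER_THAN: return Double.parseDouble(observed) > Double.parseDouble(value);
                case IS_LESS_THAN:    return Double.parseDouble(observed) < Double.parseDouble(value);
                case CONTAINS:        return observed.contains(value);
                default:              return !observed.contains(value); // DOES_NOT_CONTAIN
            }
        }

        /** The configured response becomes a new, locally meaningful property. */
        public String assess(Map<String, String> characterization) {
            return holds(characterization) ? responseIfTrue : responseIfFalse;
        }
    }

Under these assumptions, the rule of example (1) below would be constructed as new AssessmentRule("Message [Information]", Phrase.CONTAINS, "Non-wordAlignedOffset", "Acceptable", "Acceptable").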
Implementation
Technical implementation of assessment within the JHOVE2 framework is in design; prototyping will begin soon. A leading requirement is that rule configuration is simple. Non-technical staff and technical staff alike must be able to easily configure rules and “run an assessment”. The JHOVE2 release in 2010 will include a small selection of sample rules and a thorough tutorial.
Assessment with JHOVE2 has natural applications in ingest, migration, publishing and digitization workflows. A clean, well-defined, open API will be available so that JHOVE2 can be extended to build tools capable of more complex analyses, such as a weighted scoring system or matching of technology profiles.

JHOVE2 can be integrated with other identification tools as well as format and software registries to form robust policy engines and other rules-based systems. Such systems have great potential in supporting and enabling digital preservation activities and services both at the local level and across the community.
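As a sketch of one such downstream analysis, a weighted scoring pass over rule outcomes could be layered on top of the hypothetical AssessmentRule sketch above; the weights, the threshold, and the score method here are assumptions made for the illustration, not part of JHOVE2.

    import java.util.Map;

    /** Illustrative weighted scoring over rule outcomes (not part of JHOVE2). */
    public class WeightedScoringSketch {

        /**
         * Sum the weights of the rules whose assertions hold for the object,
         * then compare the total against a locally configured threshold.
         */
        public static String score(Map<AssessmentRule, Double> weightedRules,
                                   Map<String, String> characterization,
                                   double threshold) {
            double total = 0.0;
            for (Map.Entry<AssessmentRule, Double> entry : weightedRules.entrySet()) {
                if (entry.getKey().holds(characterization)) {
                    total += entry.getValue();
                }
            }
            return total >= threshold ? "Acceptable" : "At Risk";
        }
    }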
Examples
(1) TIFF with nonaligned byte offset
Rule Configuration
    Assertion: Message [Information], Contains, Non-wordAlignedOffset
    Response If True: Acceptable
    Response If False: Acceptable
Output
    Assessment:
    Assertion: Message [Information], Contains, Non-wordAlignedOffset
    Result: True
    Response If True: Acceptable

(2) PDF with malformed dictionary
Rule Configuration
    Assertion: Message [Error], Contains, Malformed dictionary
    Response If True: At Risk
    Response If False: No Risk
Output
    Assessment:
    Assertion: Message [Error], Contains, Malformed dictionary
    Result: True
    Response If True: At Risk

(3) WAVE does not meet encoding specification
Rule Configuration
    Assertion1: isValid, isEqualTo, True
    Assertion2: BitDepth, isEqualTo, 24
    Assertion3: SamplingFrequency, isEqualTo, 96000
    Response If True: Accept
    Response If False: Reject
Output
    Assessment:
    Assertion1: isValid, isEqualTo, True
    Assertion2: BitDepth, isEqualTo, 24
    Assertion3: SamplingFrequency, isEqualTo, 96000
    Result: False
    Response If False: Reject
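To show how the compound rule of example (3) might be expressed with the hypothetical AssessmentRule sketch above, here is one possible configuration and evaluation; the AND semantics for chaining and the sample property values are assumptions for the illustration.

    import java.util.List;
    import java.util.Map;

    public class WaveRuleDemo {
        public static void main(String[] args) {
            // Characterization properties as reported upstream (sample values;
            // the 16-bit depth deliberately fails the 24-bit assertion).
            Map<String, String> props = Map.of(
                    "isValid", "True",
                    "BitDepth", "16",
                    "SamplingFrequency", "96000");

            // The three assertions of example (3). The per-assertion responses are
            // placeholders; the compound rule's Accept/Reject responses apply below.
            List<AssessmentRule> assertions = List.of(
                    new AssessmentRule("isValid", AssessmentRule.Phrase.IS_EQUAL_TO, "True", "Accept", "Reject"),
                    new AssessmentRule("BitDepth", AssessmentRule.Phrase.IS_EQUAL_TO, "24", "Accept", "Reject"),
                    new AssessmentRule("SamplingFrequency", AssessmentRule.Phrase.IS_EQUAL_TO, "96000", "Accept", "Reject"));

            // Chained as a compound statement: every assertion must hold (assumed AND semantics).
            boolean result = assertions.stream().allMatch(a -> a.holds(props));
            System.out.println("Result: " + result);
            System.out.println(result ? "Response If True: Accept" : "Response If False: Reject");
        }
    }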
About the JHOVE2 Project
JHOVE2 is a collaboration by the California Digital Library, Portico, and Stanford University Libraries with funding from the Library of Congress’ National Digital Information Infrastructure and Preservation Program. The two-year project will conclude, and the open source tool will be released, in September 2010.
Bibliography
1. Planets (2009). Survey Analysis Report, IST-2006-033789, DT11-D1. http://www.planets-project.eu/market-survey/reports/docs/Planets_DT11-D1_SurveyReport.pdf
2. Rog, J. and van Wijk, C. (2008). Evaluating File Formats for Long-term Preservation. National Library of the Netherlands; The Hague, The Netherlands. http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf
3. Pearson, D. and Webb, C. (2008). Defining File Format Obsolescence: A Risky Journey. International Journal of Digital Curation. Vol 1: No 3. http://www.ijdc.net/index.php/ijdc/article/view/76
4. De Vorsey, K. and McKinney, P. (2009). One Man’s Obsoleteness is Another Man’s Innovation: A Risk Analysis Methodology for Digital Collections. Presented at Archiving 2009, Arlington, Virginia, May 2009.
* The JHOVE2 Team is …
CDL: Stephen Abrams, Patricia Cruse, John Kunze, Marisa Strong, Perry Willett
Portico: John Meyer, Sheila Morrissey, Evan Owens
Stanford: Richard Anderson, Tom Cramer, Hannah Frost
with Walter Henry, Nancy Hoebelheinrich, Keith Johnson, Justin Littman
http://confluence.ucop.edu/display/JHOVE2Info/