Digital Archives and Encoded Archival Description


Archives, Digital Archives and
Encoded Archival Description
Chris Prom
Assistant University Archivist
University of Illinois
Mortenson Visiting Scholars Tech Training
April 19, 2006
Intro
• Overview of Archives, Arrangement and
Description
• Review Standards and Tools related to
Archival Description
• Review Standards and Tools for providing
access to digital archival materials
• Lots of interaction
Archives Background
• Archives: Organized non-current
“records”; generated by institutions
• Manuscripts: non-current “papers”;
generated by individuals or families
• Preserved because of ‘enduring’ value
– Not necessarily ‘permanent value’
• Both generally referred to as “collections”
The Archival Mission
• Identify, preserve, make available records and papers
From Gregory Hunter, Developing and Maintaining Practical Archives
• Nature
– Libraries: Published, discrete, make sense on own, multiple copies
– Archives: Unpublished, grouped with related items, make no sense on own
• Creator
– Libraries: Many
– Archives: One parent organization
• Method of Creation
– Libraries: Each created separately
– Archives: Organically produced as part of normal business or life
• How Received
– Libraries: Selected as items
– Archives: Appraised as groups
• How Arranged
– Libraries: By subject classification
– Archives: Provenance and original order (structure and function)
• How Described
– Libraries: By item
– Archives: In aggregate (record group, series, collection)
• Where Described
– Libraries: Built into item itself (provided title, author, CIP data), in catalog
– Archives: Prepared by archivist (e.g. supplied title) in finding aids, guides, inventories, databases
• How Accessed
– Libraries: Items circulate
– Archives: No circulation
Based on chart in Hunter, Developing. . . p. 7
Archival Appraisal 101
• Process of determining
‘value’
• Done over aggregates not
items
• Primary: operational, legal,
fiscal, administrative
• Secondary: Historical or
‘archival’ value
• Types of archival value
– Evidential: documents
organization and
functioning of organization
– Informational: sheds light
on people, events, things
aside from organization
Credit: Hunter, p. 51
Archival Arrangement 101
• Provenance
– Records from one creator must not be intermingled
with those from another
– NOT by subject
• Original order
– Maintain records in order placed by creator
• Five “levels” of arrangement
– Repository
– Record group/subgroup (organizationally related group)
– Record series (set of files or documents maintained as a unit)
– File (folder, binder, packs for convenient use)
– Item (one document, letter, etc.)
Levels of Arrangement: Examples
(In each pair, the first entry is the University Archives example and the second is the Special Collections example.)
• Repository
– University Archives
– Special Collections
• Record Group
– College of Engineering
– Champaign County Republican Party
• Series
– Dean’s Office Correspondence Files
– Speaker’s Committee File
• File Unit
– Federal Aviation Administration
– Barry Goldwater, 1960-70
• Item
– Letter to FAA Director, June 12, 1968
– Copy of remarks by Goldwater to CCRP, August 23, 1965
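The five levels can be pictured as a simple tree. The sketch below is only an illustration built from the University Archives example on this slide; the `Unit` class, its `add` and `path` methods, and the rule that each child sits exactly one level below its parent are simplifying assumptions, not part of any archival standard or tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five levels of arrangement, from broadest to narrowest.
LEVELS = ["repository", "record group", "series", "file", "item"]


@dataclass
class Unit:
    level: str
    title: str
    parent: Optional["Unit"] = field(default=None, repr=False, compare=False)
    children: List["Unit"] = field(default_factory=list, repr=False, compare=False)

    def add(self, level: str, title: str) -> "Unit":
        """Attach a child that sits exactly one level below this unit."""
        position = LEVELS.index(self.level)
        if position == len(LEVELS) - 1:
            raise ValueError("an item cannot contain further units")
        expected = LEVELS[position + 1]
        if level != expected:
            raise ValueError(f"expected {expected!r}, got {level!r}")
        child = Unit(level, title, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up to the repository and return the full context of a unit."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.title)
            node = node.parent
        return " > ".join(reversed(parts))


repo = Unit("repository", "University Archives")
group = repo.add("record group", "College of Engineering")
series = group.add("series", "Dean's Office Correspondence Files")
folder = series.add("file", "Federal Aviation Administration")
item = folder.add("item", "Letter to FAA Director, June 12, 1968")
print(item.path())
```

Printing the path of the item shows why an archival item "makes no sense on its own": its meaning comes from the chain of units above it.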
Arrangement of “Papers”
• The mixed repository model
• Term “series” in papers often refers to internal
divisions in a collection.
• Thurgood Marshall Papers:
– “The collection is arranged in five series:
• United States Court of Appeals File, 1957-1965, n.d.
• United States Solicitor General File, 1965-1967, n.d.
• Supreme Court File, 1967-1991, n.d.
• Miscellany, 1949-1963
• Oversize, 1967, 1991”
Description of Archives
• Establish administrative control over archival
materials
– Locate collections
– Identify their source, creators (chain of custody)
– Outline contents
• Establish intellectual control
– General nature of repository
– General contents of collection
– Detailed information on specific collections
– Summarize information across several collections
• Important for both authentication and access
• Internal vs. Public finding aids
Principles of Description*
• “Multilevel Description”
– Proceed from general to specific
– Provide information relevant to the level of
description
– Link each level of description to next higher
unit of description
– Do not repeat information, provide it only at
highest appropriate level
* Summarized from ISAD(G) General International Standard Archival Description
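One way to picture "do not repeat information" together with "link each level to the next higher unit" is a lookup that walks upward until it finds a value. This is a minimal sketch under stated assumptions: the `Description` class, the `lookup` method, and the sample collection are all hypothetical and are not taken from ISAD(G) itself.

```python
class Description:
    """One level of description, linked to the next higher level."""

    def __init__(self, title, parent=None, **fields):
        self.title = title
        self.parent = parent      # link to the next higher unit of description
        self.fields = fields      # only information relevant to this level

    def lookup(self, name):
        """Return a value from this level, or inherit it from a higher level."""
        node = self
        while node is not None:
            if name in node.fields:
                return node.fields[name]
            node = node.parent
        return None


# Hypothetical sample data: access terms are recorded once, at the top.
collection = Description("Example Family Papers", access="Open to researchers")
series = Description("Correspondence", parent=collection, dates="1950-1960")
folder = Description("Letters, A-C", parent=series)

print(folder.lookup("access"))   # inherited from the collection level
print(folder.lookup("dates"))    # inherited from the series level
```

The folder carries no access statement of its own; the value is stated once at the highest appropriate level and resolved through the parent links.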
Finding Aid
• Basic Access Tool is the “Finding Aid” also
known as ‘inventory’ or ‘register’.
– Prefatory material
– Introduction
– Biographical sketch/agency history
– Scope and content note
– Series description (organization)
– Container Listing
– Index (less used now with electronic finding aids)
Elements of Description
• 26 in ISAD (G) (www.ica.org/biblio/cds/isad_g_2e.pdf)
• Identity
– Reference code, title, dates, level of description
• Context
– Name of creator, biographical or admin history, source of
materials
• Content/Structure
– Scope/content, appraisal information, arrangement
• Conditions of Access/Use
• Allied Materials (copies, originals, related)
• Notes
• Description Control (author of description, revisions)
Finding Aid Examples
• Reston Papers and Third Armored Division
Assn (bring along)
• American Crystal Sugar Co.
• Thurgood Marshall Papers
Questions?
• Next:
– Overview of standards and tools for
description of paper and electronic materials,
and tools for access to electronic collections.
Establishing a good descriptive
system
• Takes planning, awareness of resources
• Deciding on ‘platform’ or computers should
be LAST step
• Better to describe all materials at high
level than put all effort into one collection
• Beware tendency to do lower levels of
description before higher levels
• Inventory MUST be the key
• Use a content standard
Describing Archives: A Content
Standard
• Provides rules/advice about the quality and
structure of informational content
– 8 principles
– What to put in the 26 elements recommended by
ISAD (G)
– Rules for describing creators and forms of names
– Complement to AACR2
– Provides mapping to appropriate data structure
standards
MARC21
• Advantages: Can use regular library
software, provides integrated access with
non-archival materials
• Disadvantages: Can undermine
provenance, relationship to other materials
may be lost
• Recommendation: USE MARC Cataloging
as first step in PUBLIC finding aids
Cataloging Archival Materials
MARC 21 Sample
Typical Fields for Cataloging
Archival Materials
• Personal Name: 100
• Corporate Name: 110
• Title: 245a,b
• Inclusive Dates: 245f
• Physical Description (volume): 300
• Arrangement/Organization: 351
• Biographical/Historical Note: 545
• Scope/content note: 520
• Restrictions on Access: 506
• Terms of Use: 540
• Provenance: 561
• Subject added entry: 650s
• Personal name added entry: 700
• Personal name as subject: 600
• Corporate name as subject: 610
• Link to finding aid or digital collection: 856
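The field list above can be treated as a lookup table from descriptive element to MARC tag. The sketch below is a simplified illustration only: the dictionary keys, the `to_tagged_lines` function, and the sample record are hypothetical, and the output is a plain "tag value" display, not real MARC 21 with indicators and subfields.

```python
# Descriptive element -> MARC tag, taken from the field list on this slide.
MARC_FIELDS = {
    "personal_name": "100",
    "corporate_name": "110",
    "title": "245",
    "physical_description": "300",
    "arrangement": "351",
    "access_restrictions": "506",
    "scope_content": "520",
    "terms_of_use": "540",
    "bioghist": "545",
    "provenance": "561",
    "finding_aid_link": "856",
}


def to_tagged_lines(record):
    """Turn a {element: value} dict into 'tag value' lines in tag order."""
    pairs = sorted((MARC_FIELDS[key], value) for key, value in record.items())
    return [f"{tag} {value}" for tag, value in pairs]


# Hypothetical collection-level record.
sample = {
    "title": "Jane Doe Papers",
    "personal_name": "Doe, Jane",
    "scope_content": "Correspondence and diaries.",
}
lines = to_tagged_lines(sample)
for line in lines:
    print(line)
```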
Word-Processed Finding Aids
• Advantages: Easy to create, maintain
• Disadvantages: Not in standard format,
cannot exchange with others, lack of
coded fields
• Recommendation: Very useful for most
institutions. Can be published to Internet
via PDF
Encoded Archival Description
(EAD)
• Data structure standard for descriptions
of manuscripts or archives, i.e. finding aids
• At any level of granularity
• Typically collection level
• SGML and XML versions of the DTD
• <dao> tag for linking to archival surrogates
EAD
• Advantages: Best interoperability and data
exchange, easier to implement with others
(consortia)
• Disadvantages: Tool development still
weak, steep learning curve.
• Recommendation: If you have good
technical skills, and a basic archival
program is in place, and resources are
available, implement it
EAD Samples
• Static:
– http://web.library.uiuc.edu/ahx/ead/ua/1505023/1505023f.html
– http://www.amphilsoc.org/library/mole/e/edwards.htm
• Conversion on server:
http://www.amphilsoc.org/library/mole/e/edwards.xml
• PDF: http://www.amphilsoc.org/library/mole/e/edwards.pdf
• In digital library software:
– http://www.umich.edu/~bhl/EAD/index.html
– http://www.oac.cdlib.org/
• Other implementations
– Cheshire: http://www.archiveshub.ac.uk/
EAD Structure 1
• XML: perfect way to implement principles
of ‘multi-level description’
– many elements optional
– most repeatable at any level, nesting can vary
– Normalization possible, but not common for
most finding aids
EAD Structure 2
• <eadheader> (information about EAD File)
– <eadid> unique id
– <filedesc>
• <titlestmt>
• <publicationstmt>
• <notestmt>
– <profiledesc>
• <creation>
• <langusage>
– <revisiondesc>
• <frontmatter> (deprecated element, repeats info for
display; sits between <eadheader> and <archdesc>)
• <archdesc> (information about materials being described)
Common Top-Level <archdesc> Elements
• <did> (descriptive id)
– <origination>
– <unittitle>
– <unitdate>
– <physdesc>
– <abstract>
– <repository>
– <unitid>
• <bioghist>
• <scopecontent>
• <arrangement>
• <controlaccess>
• <accessrestrict>
Other elements include <accruals>, <acqinfo>, <altformatavail>, <appraisal>,
<custodhist>, <prefercite>, <processinfo>, <userestrict>, <relatedencoding>,
<separatedmaterial>, <otherfindaid>, <bibliography>, <odd>
Linking elements: some based on XLink spec; suite of linking elements includes
<archref>, <extref>, <daogrp>
All of the above elements are repeatable for components of the collection, at any
level in the <dsc> (description of subordinate components)
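To make the header/description split concrete, here is a minimal sketch that builds a skeleton EAD file with Python's standard `xml.etree.ElementTree`. The `build_ead` function and the sample values are hypothetical, and the result is not validated against the EAD DTD; a real finding aid needs many more elements.

```python
import xml.etree.ElementTree as ET


def build_ead(eadid, title, creator, dates, scope):
    """Build a bare-bones <ead> tree: header first, then the description."""
    ead = ET.Element("ead")

    # <eadheader>: information about the EAD file itself.
    header = ET.SubElement(ead, "eadheader")
    ET.SubElement(header, "eadid").text = eadid
    filedesc = ET.SubElement(header, "filedesc")
    titlestmt = ET.SubElement(filedesc, "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = title

    # <archdesc>: information about the materials being described.
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "origination").text = creator
    ET.SubElement(did, "unittitle").text = title
    ET.SubElement(did, "unitdate").text = dates
    scopecontent = ET.SubElement(archdesc, "scopecontent")
    ET.SubElement(scopecontent, "p").text = scope
    return ead


# Hypothetical sample values.
ead = build_ead(
    eadid="example-0001",
    title="Example Family Papers",
    creator="Example Family",
    dates="1950-1960",
    scope="Letters and diaries.",
)
print(ET.tostring(ead, encoding="unicode"))
```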
Description of Subordinate
Components
• nested components (i.e. <c> [unnumbered] or
<c01>, <c02>, etc. [numbered]) represent
intellectual structure of materials being described
• <container> elements (within each level) represent
physical arrangement
• Maximum depth of 12 levels (not a good idea to use
all of them)
• All elements available in archdesc top level also
available in any component (typically not used)
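The difference between intellectual structure (nested components) and physical arrangement (`<container>`) can be sketched as follows. The `add_component` helper and the sample series and folder are hypothetical; the code uses numbered components and enforces the 12-level limit mentioned above, but does not validate against the EAD DTD.

```python
import xml.etree.ElementTree as ET


def add_component(parent, depth, level, title, container=None):
    """Add a numbered component (<c01> to <c12>) under parent and return it."""
    if not 1 <= depth <= 12:
        raise ValueError("EAD numbered components run from c01 to c12")
    component = ET.SubElement(parent, f"c{depth:02d}", level=level)
    did = ET.SubElement(component, "did")
    ET.SubElement(did, "unittitle").text = title
    if container is not None:
        # Physical location: a (type, label) pair such as ("box", "1").
        container_type, label = container
        ET.SubElement(did, "container", type=container_type).text = label
    return component


dsc = ET.Element("dsc")
series = add_component(dsc, 1, "series", "Correspondence")
folder = add_component(series, 2, "file", "Letters, 1960-1965", ("box", "1"))
print(ET.tostring(dsc, encoding="unicode"))
```

The nesting of `<c02>` inside `<c01>` records that the folder belongs to the series, while the `<container>` element records only where it sits on the shelf.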
A “raw” EAD File
• http://web.library.uiuc.edu/ahx/ead/xml/2620016.xml
EAD Tools: Creation
• Current options
– Text editors (cheap, no built in validation,
transformation or unicode support)
• Notetab
• Word Processors
– XML editors (graphical view, built in validation,
transformation, unicode support, FOP; tend to be
buggy)
• XML Spy
• oXygen
• XMetal (not recommended)
– EAD Cookbook highly recommended, templates for
Notetab, oXygen
EAD Tools: Display
• Most common to transform to HTML
– Static via xsl stylesheet on command line or in
authoring software, then upload files to server
– Client-side via link to css or xsl (dicey)
– Server side transform engine (saxon, msxml,
xalan, etc) via servlets
• Dynamic (searchable)
– dlxs findaid class
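The static option amounts to: read the EAD file, pull out the pieces to show, write an HTML page, and upload it. A production setup would normally do this with an XSLT stylesheet and an engine such as Saxon or Xalan; the sketch below deliberately swaps in plain Python from the standard library to show the same idea without extra software. The `ead_to_html` function and the sample EAD fragment are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily trimmed EAD file.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <scopecontent><p>Letters and diaries.</p></scopecontent>
    <dsc>
      <c01 level="series"><did><unittitle>Correspondence</unittitle></did></c01>
      <c01 level="series"><did><unittitle>Diaries</unittitle></did></c01>
    </dsc>
  </archdesc>
</ead>"""


def ead_to_html(xml_text):
    """Produce a very small static HTML page from an EAD document."""
    root = ET.fromstring(xml_text)
    title = root.findtext("archdesc/did/unittitle", default="Untitled")
    scope = root.findtext("archdesc/scopecontent/p", default="")
    series = [
        c.findtext("did/unittitle", default="")
        for c in root.findall("archdesc/dsc/c01")
    ]
    items = "".join(f"<li>{escape(s)}</li>" for s in series)
    return (
        f"<html><body><h1>{escape(title)}</h1>"
        f"<p>{escape(scope)}</p><ul>{items}</ul></body></html>"
    )


html_page = ead_to_html(SAMPLE_EAD)
print(html_page)
```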
XML Transformations
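[Diagram: a single XML file is passed through an XSL parser with different stylesheets (XSLT1 to XSLT4) to produce different HTML outputs (HTML1 to HTML4); an XSL-FO stylesheet produces PDF]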
Typical XSL file
Collection Management Tools
• Advantages: Software tailored for
Archives, easy data entry
• Disadvantages: Few options currently
exist. May be difficult to ‘migrate’ forward
at a future point. Also not automatically
online
“CMT” Examples
• Past Perfect http://www.museumsoftware.com/
• Archivist Toolkit http://www.archiviststoolkit.org/
• UIUC “Archival Information System”
AIS Demo
• www.chrisprom.com/ais/admin
• Login: guest
• Password: guest
Break for Questions
• Next: Digital Archives Standards and Tools
Digital Libraries or Archives?
• Nature
– Libraries: Published items, each item discrete, make sense on own, multiple copies
– Archives: Unpublished, grouped with related items, make no sense on own
• Creator
– Libraries: Many different
– Archives: One parent organization
• Method of Creation
– Libraries: Each created separately
– Archives: Organically produced as part of normal business or life
• How Received
– Libraries: Selected as items
– Archives: Appraised as groups
• How Arranged
– Libraries: By subject classification
– Archives: Provenance and original order (structure and function)
• How Described
– Libraries: By item
– Archives: In aggregate (record group, series, collection)
• Where Described
– Libraries: Built into item itself (provided title, author, CIP data), in catalog
– Archives: Prepared by archivist (e.g. supplied title) in finding aids, guides, inventories, databases
• How Accessed
– Libraries: Items circulate
– Archives: No circulation
The “on a horse” problem
• Best systems mix archival and
library approaches
• Complete item description AND
• Full context AND
• Link to complete collection
(including description of off line
items)
Sample of Digital Library/Archive
Projects
• http://memory.loc.gov/ammem/index.html
• http://www.oac.cdlib.org/
• http://www.ohiomemory.org/index.html
• http://www.library.yale.edu/mssa/
• http://www.marquette.edu/library/MUDC/
• http://www.library.uiuc.edu/archives/coll/dl/bot/bot.html
Digital Library/Archive Standards
• Background on Metadata
• For images: Dublin Core
• For texts: TEI
• For information exchange: METS, OAI
• For Digital Preservation: OAIS Reference Model
Archivists and Metadata
• Structured data about an information resource
• Metadata by itself doesn’t “do” anything.
• Metadata schemas provide “buckets” for information
about resources.
• Metadata needs to be interpreted by a system or
user.
• Metadata provides context to help machines (and
more importantly people) interpret content
• People usually talk about applying metadata to
digital materials, but. . . . . .
These are metadata fields
This is Metadata
• Same thing electronically
Metadata Fields
The metadata itself
Now as XML “metadata”
• Descriptive and administrative
This is Not Metadata
This is!
Metadata is about context and relationships
This is metadata, but. . .
• Incomplete
• Embedded in object
• Not self-explaining
More complete
• Not embedded
• Relational
• Not self-explaining
Metadata, code, and human user beginning to do something with metadata
• But. . .
• Not self-explaining
• Can’t be exchanged
Now as XML metadata
• Non-embedded
• Self-explaining
• But relationships lost
Dublin Core
• Developed in 1995 for authors to describe
own web resources
• Very simple, only 15 broad categories in
the “simple” version
• Advantages: commonly held set of
elements is easy to understand, built into
many current tools
• Disadvantages: loss of specificity
The 15 elements:
• Content
– Coverage
– Description
– Title
– Type
– Relation
– Source
– Subject
– Audience (an additional element from the qualified set, not one of the 15 simple elements)
• Intellectual Property
– Contributor
– Creator
– Publisher
– Rights
• Instantiation
– Date
– Format
– Identifier
– Language
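As a rough illustration of how simple the "simple" version is, the sketch below builds a small Dublin Core record with Python's standard library. The namespace URI is the standard one for the Dublin Core element set; the `dc_record` function, the `record` wrapper element, and the sample photograph are hypothetical. Only the 15 simple elements are accepted, so `audience`, which belongs to the larger qualified set, is rejected.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of simple Dublin Core.
SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dc_record(fields):
    """Build a record from {element: [values]}; every element is repeatable."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in SIMPLE_DC:
            raise ValueError(f"{name!r} is not a simple Dublin Core element")
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Hypothetical digitized photograph.
photo = dc_record({
    "title": ["Engineering Hall, exterior view"],
    "subject": ["Buildings", "Universities"],
    "type": ["Image"],
    "format": ["image/jpeg"],
})
print(ET.tostring(photo, encoding="unicode"))
```

The loss of specificity is visible here: there is nowhere to say which series or folder the photograph came from except in free text or a `relation` value.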
Dublin Core Resources
• http://dublincore.org/
• http://www.ukoln.ac.uk/metadata/dcdot/
Text Encoding Initiative
• Encode any text with structural markup,
deep semantic markup, or any
combination of the two
• Section for metadata in <teiHeader>
• http://www.tei-c.org/
• Typically need xml editor to create,
software such as DLXS to display
• http://media.library.uiuc.edu/projects/bot/xml/index.htm
OAIS Reference Model
• Based on Archival Principles
• Three parties involved with digital
information
– Producers; SIP: Submission Information Package
– Managers; AIP: Archival Information Package
– Consumers (Users); DIP: Dissemination Information
Package
• http://www.library.cornell.edu/iris/tutorial/dpm/foundation/oais/index.html
“Simple” OAIS Model
METS
• Metadata Encoding and Transmission Standard
• Standard for encoding descriptive,
administrative, and structural metadata
regarding objects within a digital library
• Outgrowth of Making of America II project
• Provides metadata for compound text and
image-based works
• Need purpose-built software to display and
navigate.
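To give a feel for what "structural metadata" means for a compound object, here is a minimal sketch of a METS skeleton for a multi-page item, built with Python's standard library. The element names (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`) and the namespace come from the METS schema; the `build_mets` function, the file IDs, and the page URLs are hypothetical, and the output is not validated and omits the descriptive and administrative sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(page_urls):
    """List each page image in the file section and order it in the structure map."""
    mets = ET.Element(q("mets"))
    file_grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    book = ET.SubElement(ET.SubElement(mets, q("structMap")), q("div"), TYPE="book")
    for number, url in enumerate(page_urls, start=1):
        file_id = f"FILE{number:03d}"
        file_el = ET.SubElement(file_grp, q("file"), ID=file_id)
        ET.SubElement(
            file_el, q("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url},
        )
        page = ET.SubElement(book, q("div"), TYPE="page", ORDER=str(number))
        ET.SubElement(page, q("fptr"), FILEID=file_id)
    return mets


# Hypothetical page images.
mets = build_mets([
    "http://repository.example.org/images/page1.jpg",
    "http://repository.example.org/images/page2.jpg",
])
print(ET.tostring(mets, encoding="unicode"))
```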
METS: Why bother?
• Based on the OAIS Reference Model. It includes
support for:
– Submission Information Package
– Archival Information Package
– Dissemination Information Package
• Not only for transfer and archival management, but for
giving access to, navigating an object
• It “plays well” with other systems (EAD, MARC, TEI, VRA
etc)
• Software will be coming (support in Archivist Toolkit,
NDIIPP projects)
• BUT. . . . It is currently very complex.
OAI-PMH
• Open Archives Initiative Protocol for
Metadata Harvesting
• Not cross-database searching
• metadata harvesting
• Data Providers (expose collections in a
common syntax)
• Service Providers (use metadata
harvested via the OAI-PMH as a basis for
building value-added services)
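A service provider harvests by sending plain HTTP requests to a data provider. The sketch below only builds the request URL for the protocol's `ListRecords` verb with the `oai_dc` metadata format; the `list_records_url` function and the base URL are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # harvest only one set of records
    if from_date:
        params["from"] = from_date    # harvest only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical data provider address.
url = list_records_url("http://repository.example.org/oai", from_date="2006-01-01")
print(url)
```

The harvester stores the returned metadata locally and searches its own copy, which is why this is harvesting and not cross-database searching.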
OAI Example
• OAIster: http://oaister.umdl.umich.edu/o/oaister/
Tools for Digital Library/Archive
Projects
• CONTENTdm http://www.dimema.com/
– Pro: Very good, support for Dublin Core, OAI
– Con: expensive
– Recommendation: Skip it
• Greenstone http://www.greenstone.org/cgi-bin/library
– Pros: Free, (relatively) easy to configure, low
hardware requirements, can run on internet or publish
to CD, supported by UNESCO, targeted at
developing nations
– Con: tends to be ‘item-centric’, difficult to aggregate
materials
– Recommendation: Use it, but as part of large
descriptive system
Thanks!!!!
• This powerpoint online at:
– http://web.library.uiuc.edu/ahx/workpap