South Carolina Information Technology Directors Association
September 8, 2008
Bill Henry, Matt Guzzi
SC Department of Archives and History
Background – Last Year
2007 NHPRC grant proposal not funded
AZ Archives submitted a multi-state grant proposal to the Library of Congress
AZ proposal had the same basic goals
SC applied too late to be included in the funding
SC paid its own expenses to join the project
Electronic Archives Funding
One-time funding from the General Assembly to:
- Digitize paper records
- Capture agency website snapshots
- Purchase hardware and software
Library of Congress approved additional funds for the project
SC is now a fully funded partner
What is PeDALS?
Persistent Digital Archives and Library System
Multi-state grant project funded by the Library of Congress and the Institute of Museum and Library Services
Five state partners: Arizona, Florida, New York, Wisconsin, South Carolina
Project will run 18-24 months; if it is successful, SCDAH intends to continue participating beyond this period
At the end of the project, each partner will have a functioning digital archives system
Why is PeDALS Needed?
An increasing number of long-term and archival records are created and maintained only in digital formats
Traditional archival practices designed for paper records won’t work in the digital environment
We need the ability to preserve electronic records in a way that demonstrates their authenticity and protects their integrity
PeDALS is both a learning opportunity and a chance to implement a functioning system
Technical Goals
To develop a curatorial rationale that can be implemented in software to support an automated, integrated workflow for processing collections of digital records
To build “digital stacks”: storage with appropriate controls for preservation and disaster preparedness
Traditional Curatorial Processes for Paper Records
Appraisal
Acquisition
Arrangement and description
Housing and storage
Reference and access
Preservation
Curatorial Rationale for Digital Records
Transformation of traditional, paper-based practices into the digital arena
Focus on the rules, not the records
Automate the rules
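The idea of focusing on the rules rather than on individual records can be illustrated with a small rule table that is written once and applied to every record in a transfer. This is a minimal Python sketch under stated assumptions, not the PeDALS implementation (the slides place the rules in Microsoft BizTalk®); the `Rule` type, the `restricted_by_statute` field, and the access values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record metadata: a plain dict of field name -> value.
Record = Dict[str, object]


@dataclass
class Rule:
    name: str
    applies: Callable[[Record], bool]  # condition tested against a record's metadata
    action: Callable[[Record], None]   # change made to the record when the condition holds


def apply_rules(record: Record, rules: List[Rule]) -> List[str]:
    """Apply each matching rule in order; return the names of the rules that fired."""
    fired = []
    for rule in rules:
        if rule.applies(record):
            rule.action(record)
            fired.append(rule.name)
    return fired


# Illustrative (hypothetical) rules: defined once, applied to every record.
RULES = [
    Rule(
        "restricted-by-statute",
        lambda r: bool(r.get("restricted_by_statute")),
        lambda r: r.update(access="restricted"),
    ),
    Rule(
        "default-public",
        lambda r: "access" not in r,
        lambda r: r.update(access="public"),
    ),
]
```

Because rules run in order, the default rule fires only when no earlier rule has already set an access value; adding a new policy means adding a rule, not touching each record.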
Digital Stacks
More than storing the data (CD, tape, disk)
LOCKSS
1. Automatic integrity checking and error detection
2. Secure
3. Geographically distributed
Additional Goals
To build a community of shared practice that meets the needs of a wide range of repositories
- For best practices
- For resource sharing
To remove barriers by keeping costs as low as possible
The Open Archival Information System (OAIS) Reference Model
OAIS is an international standard (ISO 14721)
Defines a minimal set of responsibilities for long-term preservation
Can be applied to any information or object that needs to be retained long-term
OAIS does not prescribe a particular design or implementation
http://public.ccsds.org/publications/archive/650x0b1.pdf
View of an OAIS Environment
[Diagram: Producer, OAIS (PeDALS), Consumer, Management]
PeDALS (OAIS) Functional Areas
Ingest
Archival storage
Data management
Administration
Preservation planning
Access
PeDALS Overview - 1
Agency records in an electronic records system are transferred via the Internet to the PeDALS system
Supplemental processing checks for file integrity and completeness prior to transfer
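A pre-transfer integrity and completeness check could start from a manifest that records a hash for every file the agency intends to send. This is a minimal Python sketch, assuming SHA-256 as the hash algorithm (the slides do not name one) and a directory of files as the transfer set; `build_manifest` and `find_missing` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Set


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large records need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_missing(root: Path, expected: Iterable[str]) -> Set[str]:
    """Return expected relative paths that are not present under root."""
    return {name for name in expected if not (root / name).is_file()}
```

The manifest travels with the records, so the receiving side can recompute each hash and confirm nothing changed in transit.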
PeDALS Overview - 2
Agency records with associated metadata are transferred to a middleware server (Microsoft BizTalk®)
Rules-based software will transform records into a format for long-term storage, along with a copy for web access
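A rules-based transformation step could be driven by a table that maps each incoming format to a preservation format and an access format. The sketch below only plans output file names; the `FORMAT_RULES` table, its target formats, and the `preservation/` and `access/` folders are hypothetical assumptions, and the actual conversions would be performed by separate tools.

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple

# Hypothetical policy: source extension -> (preservation extension, access extension).
FORMAT_RULES: Dict[str, Tuple[str, str]] = {
    ".tif": (".tif", ".jpg"),   # keep the master image; derive a smaller web copy
    ".doc": (".pdf", ".pdf"),   # e.g. PDF/A for preservation, ordinary PDF for the web
    ".xml": (".xml", ".html"),  # keep structured data; render a readable web page
}


def plan_copies(filename: str) -> Dict[str, str]:
    """Return target paths for the long-term copy and the web-access copy."""
    src = PurePosixPath(filename)
    rule = FORMAT_RULES.get(src.suffix.lower())
    if rule is None:
        raise ValueError(f"no transformation rule for {src.suffix!r}")
    preservation_ext, access_ext = rule
    return {
        "preservation": str(PurePosixPath("preservation") / src.with_suffix(preservation_ext).name),
        "access": str(PurePosixPath("access") / src.with_suffix(access_ext).name),
    }
```

Keeping the policy in a table means supporting a new format is a data change rather than a code change, and unknown formats are rejected instead of silently passed through.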
PeDALS Overview - 3
Records are transferred into LOCKSS servers for long-term preservation
LOCKSS is a “dark archive”: records stored there are not directly accessible to the public
PeDALS Overview - 4
Public access will be provided via the web
Restricted records will be blocked from public access
PeDALS Network Architecture
Agencies will be able to log in and upload records to the South Carolina Digital Archive.
BizTalk will check incoming records for completeness and verify that each file matches its hash value on upload.
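The completeness and hash check can be sketched as a comparison between the files actually received and a manifest supplied by the agency. This is an illustrative Python sketch, not BizTalk code; it assumes SHA-256 digests (the slides do not name the algorithm) and a manifest mapping file names to hex digests, and `verify_upload` and `is_acceptable` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def verify_upload(received: Dict[str, bytes], manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare uploaded files with the agency's manifest (file name -> SHA-256 hex digest)."""
    expected, got = set(manifest), set(received)
    mismatched = [
        name
        for name in sorted(expected & got)
        if hashlib.sha256(received[name]).hexdigest() != manifest[name]
    ]
    return {
        "missing": sorted(expected - got),     # listed in the manifest but not uploaded
        "unexpected": sorted(got - expected),  # uploaded but not listed
        "mismatched": mismatched,              # uploaded, but content does not match its hash
    }


def is_acceptable(report: Dict[str, List[str]]) -> bool:
    """A transfer passes only if every problem list is empty."""
    return not any(report.values())
```

Reporting all three problem types at once, rather than stopping at the first failure, gives the agency a complete list to correct before resubmitting.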
Archivist Review
Once records are received, the Archivist is notified by email.
The files are then reviewed, and a high-level description is entered in the catalog database.
The SIP (Submission Information Package) is created.
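A SIP bundles the transferred files with the archivist's description and a fixity value for each file. This is a minimal Python sketch under assumptions: the field names are hypothetical and do not reflect the PeDALS core metadata or data dictionary, SHA-256 is assumed as the fixity algorithm, and JSON is used only as a convenient serialization.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass
class SubmissionPackage:
    agency: str
    description: str       # high-level description entered by the archivist
    files: Dict[str, str]  # file name -> SHA-256 hex digest
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def create_sip(agency: str, description: str, contents: Dict[str, bytes]) -> SubmissionPackage:
    """Record a fixity value for every file alongside the descriptive metadata."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in sorted(contents.items())}
    return SubmissionPackage(agency=agency, description=description, files=digests)


def sip_to_json(sip: SubmissionPackage) -> str:
    """Serialize the package metadata in a stable, human-readable form."""
    return json.dumps(asdict(sip), indent=2, sort_keys=True)
```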
BizTalk
This is where the magic happens.
BizTalk Processes
The DIP (Dissemination Information Package) is created.
The catalog database is updated with access, description, and preservation information.
The archival records are placed on the Manifest Server for ingest into LOCKSS.
The public access database is updated.
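The order of these processing steps can be expressed as a simple pipeline in which each step reads and updates a shared context. This is a hypothetical Python sketch, not BizTalk orchestration code: plain dicts and lists stand in for the catalog, the Manifest Server, and the public access database, and the sketch assumes restricted records are withheld from the public database, consistent with the access restrictions described in this presentation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical processing context shared by the steps below.
Context = Dict[str, Any]


def create_dip(ctx: Context) -> None:
    """Build the dissemination package from the web-access copies."""
    ctx["dip"] = {"id": ctx["sip_id"], "files": list(ctx["access_files"])}


def update_catalog(ctx: Context) -> None:
    """Record access, description, and preservation information in the catalog."""
    ctx["catalog"][ctx["sip_id"]] = {
        "access": ctx["access"],
        "description": ctx["description"],
        "preservation_files": list(ctx["preservation_files"]),
    }


def stage_for_lockss(ctx: Context) -> None:
    """Queue the archival copies on the manifest server for LOCKSS ingest."""
    ctx["manifest_queue"].append(ctx["sip_id"])


def update_public_db(ctx: Context) -> None:
    """Publish the DIP only when the record is open to the public."""
    if ctx["access"] == "public":
        ctx["public_db"][ctx["sip_id"]] = ctx["dip"]


STEPS: List[Callable[[Context], None]] = [create_dip, update_catalog, stage_for_lockss, update_public_db]


def process(ctx: Context) -> Context:
    """Run every step in order against the same context."""
    for step in STEPS:
        step(ctx)
    return ctx
```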
LOCKSS
(Lots of Copies Keep Stuff Safe)
Based at Stanford University.
LOCKSS has primarily been used for scientific journals and publications.
Open source; runs on OpenBSD, a multi-platform, 4.4BSD-based, UNIX-like operating system.
LOCKSS
Boots from CD, so no operating system is installed on the server.
Communicates over a VPN (virtual private network).
Files for LOCKSS are stored on a separate admin server running Linux.
One LOCKSS cluster with 7 servers in our private distributed LOCKSS network.
Initially set up to take in 1 TB of data; capacity can be expanded.
LOCKSS Storage
Dark, secure archival storage
LOCKSS is a sophisticated data storage system that scans for and repairs file corruption and other data integrity problems
Level 4 firewalls and geographic distribution provide added security
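The scan-and-repair behavior can be illustrated with a simplified majority vote across the copies of one file held on each server. This is not the LOCKSS protocol, which uses its own peer-polling scheme; the sketch assumes SHA-256 comparison and that every copy is reachable at once, and `audit_and_repair` is a hypothetical name.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def audit_and_repair(replicas: Dict[str, bytes]) -> List[str]:
    """Compare every server's copy of one file; overwrite copies that disagree with the majority.

    Returns the sorted names of servers whose copies were repaired.
    """
    if not replicas:
        return []
    digests = {server: hashlib.sha256(data).hexdigest() for server, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        # No strict majority: refuse to guess which copy is authentic.
        raise RuntimeError("no majority agreement; manual review required")
    good_copy = next(replicas[s] for s, d in digests.items() if d == winner)
    repaired = []
    for server, digest in digests.items():
        if digest != winner:
            replicas[server] = good_copy
            repaired.append(server)
    return sorted(repaired)
```

With many independent copies, a single damaged copy is outvoted and replaced, which is the intuition behind "Lots of Copies Keep Stuff Safe".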
Public Access Process
BizTalk process: AIP (Archival Information Package)
This process moves records from LOCKSS to the public access web server based on each record’s access date.
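The access-date rule amounts to a periodic selection of records whose restriction period has ended. A minimal Python sketch, assuming each record carries a single access date and that the caller knows which records are already on the web server; `ArchivedRecord` and `records_to_release` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ArchivedRecord:
    record_id: str
    access_date: date  # first date on which the record may be shown to the public


def records_to_release(
    records: Iterable[ArchivedRecord], today: date, already_public: Set[str]
) -> List[str]:
    """Return IDs of records whose access date has arrived and that are not yet on the web server."""
    return sorted(
        r.record_id
        for r in records
        if r.access_date <= today and r.record_id not in already_public
    )
```

Run on a schedule, a check like this leaves restricted records in the dark archive until their access date passes, with no manual release step.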
PeDALS Network Architecture
The web server will provide Internet access to records through a web-based search interface.
Access to records restricted by statute or otherwise will be blocked during the restriction period.
Restricted records are held in the LOCKSS dark archive; no user copy is sent to the web server until public access is allowed.
Future Public Access
We are currently implementing the web component of Rediscovery.
This will allow the public to search our holdings.
We hope to use BizTalk to automatically populate the Rediscovery catalog.
Public access will be provided through URLs to the Rediscovery web component.
PeDALS Open Archival Information System (OAIS) Network Architecture
Records Eligible for PeDALS
Permanently valuable electronic records scheduled for transfer to the SCDAH
Pilot project agencies and records:
Judicial Department – Supreme Court Case Files
Election Commission – Voter Registration Master Files
Public Service Commission – Orders
DHEC – Electronic Index to Death Certificates
Project Status
Core metadata defined and data dictionary completed
System design completed
Hardware and software acquired and installed
Agency partners and records identified
System prototype built (AZ & SC)
BizTalk® training completed
On the Horizon
Other states purchase and configure hardware and software
First ingest of records in early winter
Develop public search website
Post-Grant
Move from pilot to production mode
Develop procedures for agency participation
Expand participation to additional agencies and records
PeDALS
Bill Henry
Electronic Records Consultant
[email protected]
(803) 896-6137
Matt Guzzi
Electronic Records Archivist
[email protected]
(803) 896-6103