South Carolina Information Technology Directors
Association
September 8, 2008
Bill Henry, Matt Guzzi
SC Department of Archives and History
Background – Last Year

2007 NHPRC grant proposal not funded

AZ Archives submitted multi-state grant
proposal to Library of Congress

AZ proposal had same basic goals

SC too late for funding

Paid own expenses to join project
Electronic Archives Funding

One-time funding from General Assembly

Digitize paper records

Capture agency website snapshots

Purchase hardware and software

Library of Congress approved additional
funds for project

SC now a fully-funded partner
What is PeDALS?

Persistent Digital Archives and Library System

Multi-state grant project funded by the Library of
Congress and the Institute of Museum and
Library Services

Five state partners: Arizona, Florida, New York,
Wisconsin, South Carolina

Project will run 18-24 months; if successful,
SCDAH intends to continue participation beyond
this period

At the end of the project each partner will have a
functioning digital archives system
Why is PeDALS Needed?

An increasing number of long-term and
archival records are created and
maintained only in digital formats

Traditional archival practices designed for
paper records won’t work in digital
environment

Need ability to preserve electronic records
so that we can demonstrate authenticity
and protect integrity

PeDALS is both a learning opportunity and
a chance to implement a functioning
system
Technical Goals

To develop a curatorial rationale that can
be implemented in software to support an
automated, integrated workflow to
process collections of digital records

To build “digital stacks” – storage that has
appropriate controls for preservation and
disaster preparedness
Traditional Curatorial Processes
for Paper Records

Appraisal

Acquisition

Arrangement and description

Housing and storage

Reference and access

Preservation
Curatorial Rationale for Digital
Records

Transformation of traditional, paper-based
practices into the digital arena

Focus on the rules, not the records

Automate the rules
Digital Stacks

More than storing the data (CD, tape,
disk)

LOCKSS
1. Automatic integrity checking and
error detection
2. Secure
3. Geographically distributed
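
The integrity-checking idea above can be illustrated with a small Python sketch. It is a deliberate simplification, not the actual LOCKSS polling protocol: each copy of a file is hashed, the majority digest is treated as correct, and any copy that disagrees is overwritten. The function names, the node names, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Repair copies that disagree with the majority; return repaired node names.

    `replicas` maps a hypothetical node name to that node's copy of one file.
    """
    digests = {node: sha256_hex(content) for node, content in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # Without a strict majority there is no trustworthy copy to repair from.
        raise ValueError("no majority agreement; manual review needed")
    good_copy = next(c for n, c in replicas.items() if digests[n] == winner)
    repaired = []
    for node in list(replicas):
        if digests[node] != winner:
            replicas[node] = good_copy  # overwrite the damaged copy
            repaired.append(node)
    return repaired
```

With three copies where one has been corrupted, the call returns the name of the damaged node and leaves all three copies identical. More copies in more locations make a wrong majority less likely, which is the point of geographic distribution.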
Additional Goals

To build a community of shared practice
that meets the needs of a wide range of
repositories
- For best practices
- For resource sharing

To remove barriers by keeping costs as
low as possible
The Open Archival Information System
(OAIS) Reference Model

OAIS is an international (ISO) standard

Defines minimal set of responsibilities for
long-term preservation

Can be applied to any information or
object that needs to be retained long-term

OAIS does not prescribe a specific design or
implementation

http://public.ccsds.org/publications/archive/650x0b1.pdf
View of an OAIS Environment

[Diagram: Producer, OAIS (PeDALS), Consumer, Management]
PeDALS (OAIS) Functional
Areas

Ingest

Archival storage

Data management

Administration

Preservation planning

Access
PeDALS Overview - 1

Agency records in an electronic records
system are transferred via the Internet to
the PeDALS system

Supplemental processing checks for file
integrity and completeness prior to
transfer
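
One common way to support an integrity and completeness check before transfer is to build a manifest of checksums on the sending side. The sketch below is only an illustration of that idea. The deck does not describe the manifest format or hash algorithm PeDALS uses, so the function name `build_manifest`, the dictionary layout, and SHA-256 are all assumptions.

```python
import hashlib
from pathlib import Path


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to `folder`) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # Use forward slashes so the manifest reads the same on any OS.
            manifest[path.relative_to(folder).as_posix()] = digest
    return manifest
```

The manifest travels with the records. The number of entries tells the receiver whether the transfer is complete, and the digests tell it whether each file arrived unchanged.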
PeDALS Overview - 2

Agency records with associated metadata
are transferred to middleware server
(Microsoft BizTalk®)

Rules-based software will transform
records into format for long-term storage
along with a copy for web access
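
A rules-based transformation step can be pictured as a lookup table that maps each incoming format to a preservation format and an access format. The Python sketch below only plans the output file names and performs no conversion. It is not BizTalk code, and the format pairs in `RULES` are hypothetical, since the deck does not list the formats PeDALS uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source_ext: str        # extension of the incoming file
    preservation_ext: str  # format kept in long-term storage
    access_ext: str        # format served on the public website


# Hypothetical format policy, for illustration only.
RULES = [
    Rule(".doc", ".pdf", ".pdf"),
    Rule(".tif", ".tif", ".jpg"),
]


def plan_outputs(filename, rules=RULES):
    """Return (preservation_name, access_name), or None if no rule matches."""
    stem, ext = os.path.splitext(filename)
    for rule in rules:
        if rule.source_ext == ext.lower():
            return stem + rule.preservation_ext, stem + rule.access_ext
    return None
```

Because the policy lives in the table rather than in the code, changing how a format is handled means editing a rule, not reprocessing logic for individual records.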
PeDALS Overview - 3

Records are transferred into LOCKSS
servers for long-term preservation

LOCKSS is a “dark archives”
PeDALS Overview - 4

Public access will be provided via the web

Restricted records will be blocked from public
access
PeDALS Network
Architecture

Agencies will have the
ability to log in and
upload records to the
South Carolina Digital
Archive.

BizTalk will check the
incoming records for
completeness and
match the hash
value on upload.
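
The completeness and hash check on upload can be sketched in Python as follows. This shows the logic only and is not how a BizTalk orchestration is written. The manifest layout (file name mapped to a hex digest), the use of SHA-256, and the function name `verify_upload` are assumptions.

```python
import hashlib


def verify_upload(manifest: dict[str, str],
                  received: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare received files against the sender's manifest.

    `manifest` maps file name to expected SHA-256 hex digest;
    `received` maps file name to the bytes that actually arrived.
    """
    expected, actual = set(manifest), set(received)
    corrupted = sorted(
        name for name in expected & actual
        if hashlib.sha256(received[name]).hexdigest() != manifest[name]
    )
    return {
        "missing": sorted(expected - actual),     # listed but never arrived
        "unexpected": sorted(actual - expected),  # arrived but not listed
        "corrupted": corrupted,                   # arrived with a different hash
    }
```

An upload would be accepted only when all three lists are empty; anything else can be reported back to the agency for retransmission.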
Archivist Review

Once records are received, the Archivist
will receive an email.

The files will then be reviewed and a high-level
description will be entered in the
Database Catalog.

The SIP (Submission Information
Package) is created.
BizTalk

This is where the magic happens.
BizTalk Processes

DIP (Dissemination Information Package)
created.

The Catalog database is updated with
Access, Description and Preservation
Information.

The Archival records are placed on the
Manifest Server for Ingest into LOCKSS.

The public access database is updated.
LOCKSS
(Lots of Copies Keep Stuff Safe)

Based at Stanford University.

LOCKSS has primarily been used for
scientific journals and publications.

Open source and uses OpenBSD, which is
a multi-platform 4.4BSD-based UNIX-like
operating system.
LOCKSS

Boots from CD = No operating system
installed on the server.

Communicates using a virtual private
network (VPN).

Files for LOCKSS are stored on a separate
Admin server running Linux.

One LOCKSS cluster with seven servers in our
private distributed LOCKSS network.

Initially set up to take in 1 TB of data and
can be expanded.
LOCKSS Storage

Dark, secure archival storage

LOCKSS is a sophisticated data storage system that
scans for and repairs file corruption and other
data integrity problems

Level 4 firewalls and geographic distribution
provide added security
Public Access Process

BizTalk Process - AIP (Archival
Information Package).

This process moves records from LOCKSS
to the Public Access web server based on
the record access date.
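
Release by access date amounts to a date comparison run on a schedule. A minimal Python sketch follows, assuming each record carries a single `access_date` field; the `Record` class and its field names are hypothetical and are not taken from the PeDALS data dictionary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    identifier: str
    access_date: date  # first day the record may be shown publicly


def releasable(records, today):
    """Return the records whose restriction period has ended as of `today`."""
    return [r for r in records if r.access_date <= today]
```

Records that fail the comparison stay in dark storage only, and are picked up automatically by a later run once their access date has passed.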
PeDALS Network
Architecture

Web server will provide
Internet access to records
through a web-based search
interface.

Access to records restricted by
statute or otherwise will be
blocked during the restriction
period.

Restricted records are held in
the LOCKSS dark archive; no
user copy is sent to the web
server until public access is
allowed.

Future Public Access

We are currently in the process of
implementing the web component of
Rediscovery.

This will allow the public to search our
holdings.

We are hoping to use BizTalk to automatically
populate the Rediscovery catalog.

Public access will be granted through URLs to
the Rediscovery web component.
PeDALS Open Archival Information System
(OAIS) Network Architecture
Records Eligible for PeDALS

Permanently valuable electronic records
scheduled for transfer to the SCDAH

Pilot project agencies and records:
- Judicial Department – Supreme Court Case Files
- Election Commission – Voter Registration Master Files
- Public Service Commission – Orders
- DHEC – Electronic Index to Death Certificates
Project Status

Core metadata defined and data
dictionary completed

System design completed

Hardware and software acquired and
installed

Agency partners and records identified

System prototype built (AZ & SC)

BizTalk® training completed
On the Horizon

Other states purchase and configure
hardware & software

First ingest of records in early winter

Develop public search website
Post-Grant

Move from pilot to production mode

Develop procedures for agency
participation

Expand participation to additional
agencies and records
PeDALS

Bill Henry
Electronic Records Consultant
[email protected]
(803) 896-6137

Matt Guzzi
Electronic Records Archivist
[email protected]
(803) 896-6103