Basic electronic records
Preserving and Providing Long-Term Access to Archival
Electronic Records
Thomas J. Ruller
www.truller.com
© 2003 Thomas J. Ruller
Workshop topics
Defining and identifying records and
documents in a digital environment
Selecting and preparing records for long-term preservation
Preserving records and documents in digital
formats
Providing access to digital materials and
special challenges
Basic archival objectives
Select the information that is most
important
Ensure that it remains useable,
understandable and accessible over time
Protect it from changes in hardware and
software
Make it understandable to current
technology
These objectives are accomplished when you:
Identify all the components of a record.
Ensure all of the components are managed
and cared for.
Move components to a standards-based data
and systems architecture.
Protect components from change over time.
Life-Cycle Management is the Key
The archival administration of records in electronic form requires
LIFE-CYCLE management.
Actions taken at the beginning of the life cycle impact preservation
and access.
Start at the beginning
Starting at the beginning... all digital data
is simply binary code
Information is
represented by 1 or 0
Each 1 or 0 is a bit
8 ones and zeroes
together make a byte
7-bit ASCII is a
standard for
representing characters
in 7 bits; the
eighth bit of the byte is
often used as a check (parity) bit
01000001 = A
01100001 = a
01000010 = B
01100010 = b
Unicode is an
emerging standard for
encoding text in many languages and scripts
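The bit patterns on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names `bits_to_char` and `char_to_bits` are illustrative and not part of the workshop material, and the sketch simply ignores the eighth bit rather than verifying it as a check bit.

```python
def bits_to_char(bits: str) -> str:
    """Decode one 8-bit pattern, written as '0'/'1' characters, into an ASCII character."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly eight binary digits")
    # Mask off the eighth (high) bit; 7-bit ASCII carries meaning in the low 7 bits only.
    return chr(int(bits, 2) & 0x7F)


def char_to_bits(ch: str) -> str:
    """Encode a single 7-bit ASCII character as an 8-digit binary string."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "08b")


def utf8_hex(text: str) -> str:
    """Show the bytes that UTF-8 (one Unicode encoding) uses for the given text."""
    return text.encode("utf-8").hex()
```

For example, `bits_to_char("01000001")` gives `A` and `char_to_bits("b")` gives `01100010`, matching the table above. A character outside ASCII, such as `é`, needs more than one byte in UTF-8.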
Data vs. Metadata
Data is the actual information that is created
and acted upon in the information system
Metadata is either applied to the data or
created by its use. It is information about
the data.
With electronic records, data and metadata
are inseparable.
South Carolina Public Record
"Public record" includes all books, papers, maps,
photographs, cards, tapes, recordings, or other
documentary materials regardless of physical form or
characteristics prepared, owned, used, in the
possession of, or retained by a public body.
Does this work in a digital environment?
Records, Documents and Data
Data
– All recorded
information
Documents
– discrete and
identifiable
– logical structure
– stored as more than
one component
Records
– support accountability,
transactional
– recorded
– in any form
– created, received or
maintained
– by an organization or
person
– in the transaction of
business
– and kept as evidence
Records are defined through
Content
– What information and facts do they contain?
Context
– Who created these records, when, why and for what
purpose?
Structure
– What are the components of this record and how are
those components organized?
Metadata is the Key
Information about the structure and organization
of the digital information
Needed to make the information useable and understandable
Information about the access and use of the information
Needed to substantiate its authenticity
Identifying an Organization’s Records
What is important to document?
What are the “recordkeeping requirements”
of the organization?
What is considered the “record” of a
transaction or element of business?
What are all of the components of
information that make this a “record”?
Qualities of “Authentic”
electronic records:
• Rules governing “Documentary Form”
• Rules Governing Annotations
• Medium
• Context
Authenticity
Identity and integrity of the record
Access privileges
Protective procedures to guard against loss and
corruption
Protective procedures for media and technology
Established “documentary forms”
Procedures for authentication
Procedures for moving records from active to
inactive status
Creating Useful Records
From the University of Pittsburgh project
Develop information systems that are
capable of creating and keeping records
– Metadata that defines the context of records
– Who created this information
– When was it created
– What happened to it after it was created
The DIRKS Methodology
• Understand the business, regulatory and social context in which they
operate (step A: preliminary investigation)
• Identify their need to create, control, retrieve and dispose of records (that is,
their recordkeeping requirements) through an analysis of their business
activities and environmental factors (steps B and C: analysis of business
activity and identification of recordkeeping requirements)
• Assess the extent to which existing organizational strategies (such as
policies, procedures and practices) satisfy their recordkeeping requirements
(step D)
• Redesign existing strategies or design new strategies to address unmet or
poorly satisfied requirements (steps E and F)
• Implement, maintain and review these strategies (steps G and H)
Simple strategies for office documents
Develop filing systems for
storing electronic files.
Develop naming
conventions for types of
documents.
Set baseline standards for
important business
documents.
Office defines procedures
for all documents.
Filing systems should
support implementation of
retention schedule.
Electronic files are copied
to archive storage.
Electronic files are
migrated to new formats.
Records are equivalent regardless
of format
Identify what information must be created
Determine how long the information must
be maintained
Understand the uses and functions of the
records, both primary and secondary
Select the “best evidence” of the activity,
fact, transaction or event and maintain it.
Questions
How do you apply these distinctions to
personal papers?
Can you apply these distinctions “after the
fact”?
Can data or information be institutional
records?
Defining records with
business rules
Key source of documentation
Key to understanding:
– Risks
– Documentation needs
– Archiving needs
Often not written
Required for complex data systems
Sources of business rules
Law and regulation
Federal law and regulation
Operational procedures
Data Administration
System documentation
These same considerations can apply to individuals
and organizations.
Document Management System
Solutions
DOD 5015.2
Requires a file plan
Requires
implementation
support
Helps organizations
select and file mail
and documents they
need to keep
A Word About Metadata
Data about the structure and content of the
information
– Code books
– File layouts
– Database entity-relationship diagram
Data about the access and use of the
information
– Audit trails, etc.
More on Metadata
Different levels of metadata needed for
different types of records
– High risk records that require greater
authenticity require lots of access and use
metadata
– Low risk records may require no access and use
metadata
– Virtually all records require some kind of
structural documentation
Documentation
File and record layout
Codebook
Data-flow diagrams
Entity-relationship diagrams
Log files and audit trails
Electronic repository system
Electronic records inventory
Name of system
Function of system
List of sub-systems
Functions of sub-systems
Recordkeeping requirements supported
List of files/procedures that support the
recordkeeping requirements
Electronic records system
inventory continued
Operating environment
Organization of the data
Source of documentation
Related audit-trails
Purge/Migration criteria
Documentation of off-line resources
Security documentation/procedures
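The inventory elements on these two slides map naturally onto a structured record. The following is a minimal sketch, assuming one entry per system. The class and field names are illustrative choices that mirror the slide headings; they are not taken from any standard or from the workshop itself.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class SubSystem:
    name: str
    function: str


@dataclass
class SystemInventoryEntry:
    # One field per inventory element listed on the slides.
    system_name: str
    system_function: str
    sub_systems: List[SubSystem] = field(default_factory=list)
    recordkeeping_requirements: List[str] = field(default_factory=list)
    supporting_files: List[str] = field(default_factory=list)
    operating_environment: str = ""
    data_organization: str = ""
    documentation_source: str = ""
    related_audit_trails: List[str] = field(default_factory=list)
    purge_migration_criteria: str = ""
    offline_resources: str = ""
    security_procedures: str = ""

    def missing_fields(self) -> List[str]:
        """Names of inventory elements that have not been filled in yet."""
        return [name for name, value in asdict(self).items() if value in ("", [])]
```

Calling `missing_fields()` on a partly completed entry gives a quick checklist of what still has to be gathered from the records creator.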
Retention Scheduling /
Purge-Migration
What information is kept?
What files does it go to?
Where do the files get stored?
How long are they stored there?
What happens at the end of each stage of
the life of the data?
Note: This is a systems tool, not a recordkeeping
process.
Purge/Migration
or Archiving Criteria
Purging data from system or migrating to
tape
In the language of the data manager
Provide specific guidance on what
files/data/processes are maintained
Easily and frequently updated
Can be documented
Evidential value and electronic
records
Tries to answer the question, what did they
know?
– Uses metadata on the access and use of the
system.
– Requires “snapshots” of data to link to
metadata
– Available only for “high-risk” environments
Legal Evidence
Proving that the information is the “best
evidence” of a fact, activity, or action.
– Complete
– Accurate
– Created in normal course of business
– Authentic
– Original
All accomplished through “use” metadata and
business rules
Appraisal of legacy electronic
records
Content appraisal
– Determines whether the function and the
information are of archival quality
Technical appraisal
– Determines whether the data is complete and
structured in such a way that it can be
preserved.
Content Appraisal
Basic archival questions
– Informational value
» What facts?
– Completeness of data
– Time series
– Function of data
– Accuracy and reliability
– Evidential value
Technical Appraisal
Structure of information
Types of storage used for the files
Purge/Migration criteria and methods
Physical storage methods and capabilities of
creator
Analysis of documentation
File sizes, integrity of database,
completeness of files, etc.
Complex Relational Database
Entity Relationship Diagram
File and record layouts
Business rules
Function to entity diagram or
documentation
Archiving criteria and plan
Annual Report Web Page
Appraisal of annual report
Downloaded HTML/XML files
Downloaded cascading style sheet
definition
Downloaded images and graphics
Inventory of all document components
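An inventory of a page's components can be drafted automatically from the downloaded HTML. The sketch below uses Python's standard `html.parser` module and lists only images, stylesheets and external scripts. The class name `ComponentInventory` and the helper `inventory` are illustrative, and a real page may depend on components this sketch does not detect (frames, embedded objects, content generated by scripts).

```python
from html.parser import HTMLParser


class ComponentInventory(HTMLParser):
    """Collect the external files that an HTML page refers to."""

    def __init__(self):
        super().__init__()
        self.components = []  # (kind, url) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this method.
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.components.append(("image", attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower()
            if "stylesheet" in rel:
                self.components.append(("stylesheet", attributes["href"]))
        elif tag == "script" and attributes.get("src"):
            self.components.append(("script", attributes["src"]))


def inventory(html_text):
    """Return the list of (kind, url) components found in html_text."""
    parser = ComponentInventory()
    parser.feed(html_text)
    parser.close()
    return parser.components
```

The resulting list can be compared against the files actually downloaded to confirm that every component of the report was captured.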
Web resource preservation
Understand that the web is not a
recordkeeping system by itself, it is a
delivery system
Static HTML pages are only one small
aspect of web content
The web is multi-component and extremely
dynamic
Web harvesting tool example
HTTrack is one
example, used by
National Library of
Australia
Open source software
used to harvest
specific web sites
Copies HTML and
other files into your
file system.
Works well for static
content
Modifies links for
retrieval
Internet Archive
www.archive.org
• Wholesale “capture” of all static HTML content
• Not selective or related to changes in content
• Uses Alexa web crawler and proprietary storage method
Internet Archive Footer
<!-- SOME LINK HREF'S ON THIS PAGE HAVE BEEN REWRITTEN BY THE WAYBACK MACHINE
OF THE INTERNET ARCHIVE IN ORDER TO PRESERVE THE TEMPORAL INTEGRITY OF THE SESSION.
-->
<SCRIPT language="Javascript">
<!--
// FILE ARCHIVED ON 19970606072913 AND RETRIEVED FROM THE
// INTERNET ARCHIVE ON 20020416142610.
// JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
// ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
// SECTION 108(a)(3)).
.
//-->
</SCRIPT>
</HTML>
Selection guidelines for government
publications
Procedures for managing and adding to the
archive
Guidelines for transfer of ownership
Practices for persistence of links and for
naming resources
pandora.nla.gov.au
Accessioning
Physical and legal acquisition of records
May result in a copy of records transferred
to archives as the “record copy”
Direct relationship to purge/migration
criteria
“first step” in preservation process
Accessioning guidelines
Recording mode for data
– Standard label ASCII or EBCDIC (cartridge or
tape only)
– Density of recording
– Operating system requirements
– What files are on the media
– ISO standard for CD-ROM
Accessioning guidelines
Documentation must accompany records
– Code books
– File and record layouts
– Data dictionaries
– Index files
– Media documentation
Accessioning guidelines
Metadata and documentation are also
transferred at the time records are transferred
– Log files are most common metadata
– Linked metadata files common in database
environments
– Data dictionary file
– Entity Relationship Diagram
– DTD for encoded text
Accessioning procedures
“Pre-Accessioning” analysis
– Test whether the records you receive are
complete, match the documentation, and are in
accessible data structures
– Simple dump of sample of data
» applies to all digital materials
– Compare to documentation
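A "simple dump" and a first comparison to the documentation can be sketched as follows. This assumes the documentation describes fixed-length records. The function names and the idea of checking the file size against the documented record length are illustrative assumptions, not a procedure prescribed by the workshop.

```python
import os


def dump_sample(path, record_length, records=3):
    """Return the first few fixed-length records as (offset, hex, printable text) rows."""
    rows = []
    with open(path, "rb") as handle:
        for index in range(records):
            chunk = handle.read(record_length)
            if not chunk:
                break
            # Replace bytes outside printable ASCII with '.' so the row is easy to read.
            text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
            rows.append((index * record_length, chunk.hex(), text))
    return rows


def matches_documented_length(path, record_length):
    """True when the file size is a whole multiple of the documented record length."""
    size = os.path.getsize(path)
    return size > 0 and size % record_length == 0
```

If the printable column is mostly dots, the file may be in EBCDIC or a binary format, and if the size check fails, the documented record length probably does not describe this file.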
Accessioning procedures
Make two copies of the records
– One for accessioning
– One for off-site storage
Expect to spend a lot of time working with
the records creator to acquire
documentation and work out details of
media formats, etc.
Accessioning procedures
Tools:
– SPSS for statistical or numeric files
– Desktop database for simple databases
– Robust RDBMS for SQL based databases
– Word processor for desktop files
– High-end PCs/Unix workstations
– Tape drives
– DTDs or RFCs
– System Modelling tools
Accessioning procedures
Storage media
– The only “archival” digital media are open-reel
tape or 3480/3490 or 3590 magnetic cartridges.
No other media have been rigorously tested or
conform to national/international standards.
Accessioning Procedures
Digital Linear Tape (DLT) is emerging and has been
“field tested” as a possible alternative to
tape cartridges.
Preservation Components
Management of media
to ensure that the
information can be
retrieved from some
storage medium
Preservation of the
data
Preservation of the
means to understand
and interpret the data
Preservation of the
metadata to
demonstrate that the
data is an authentic
record
Preservation
Media preservation
– Store only on tape or cartridge
– Periodic (3-5 year) rewind of each volume
– Periodic (5-10 year) copy to new media
– Migrate to new media when
appropriate/necessary
– Maintain environmental controls
– Maintain use and archival copies
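The rewind and recopy cycles above can be tracked with a small scheduling helper. This is a sketch only: it assumes the conservative lower bound of each range (3 years for rewinding, 5 years for copying), and the constant and function names are illustrative.

```python
from datetime import date

REWIND_INTERVAL_YEARS = 3  # assumption: lower bound of the 3-5 year rewind cycle
COPY_INTERVAL_YEARS = 5    # assumption: lower bound of the 5-10 year recopy cycle


def add_years(start: date, years: int) -> date:
    """Return the same calendar day a number of years later."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        return start.replace(year=start.year + years, day=28)


def due_actions(last_rewind: date, last_copy: date, today: date):
    """List the maintenance actions that are due for one tape volume."""
    actions = []
    if today >= add_years(last_rewind, REWIND_INTERVAL_YEARS):
        actions.append("rewind")
    if today >= add_years(last_copy, COPY_INTERVAL_YEARS):
        actions.append("copy to new media")
    return actions
```

Running `due_actions` over every volume in the media log produces a work list for the periodic maintenance described on the slide.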
Preservation
Migration:
– Move to new hardware and software
environments because data is in a standard
format
Emulation
– Create software tools that mimic the original
encoding scheme and save the bitstream as-is
Preservation
Migration:
– David Bearman
– Kenneth Thibodeau
– Margaret Hedstrom
– 40 years of experience
Emulation
– Jeff Rothenberg
– Universal Preservation Format (sort of)
The 1980s video game
Missile Command running under emulation via the
Internet…2003
Preservation via Migration
Objective:
– Understandable
– Useable
– Accessible
Accomplished through:
– Media
– Recording methods
– Data formats
Preservation through standard formats
Information
maintained in a
standard, ubiquitous
format.
Formats are backward
compatible.
Formats that ensure no
information loss.
Information is
migrated to target
preservation format.
Data structures and
encoding format
should both be in
appropriate format.
Life-cycle standards
Industry standards are often the best
available approach
Ubiquitous environments
Archivability begins at the point of creation
Archiving must add value if creators are to
incorporate it.
Standards list
Text
– SGML
– PDF
– ASCII
– XML
Sound/Video
– MPG
E-mail
– SMTP
– Encoded text
– SQL database
Database
– SQL
– xBase
– XML
GIS
– SDTS
– Content Standard
Recording Modes
– ASCII
– EBCDIC
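Converting between the two recording modes is a routine migration step. The sketch below uses Python's built-in `cp037` codec, which is one common EBCDIC code page (US/Canada). That choice is an assumption: the code page actually used must be confirmed from the documentation that accompanies the records. The function names are illustrative.

```python
EBCDIC_CODEC = "cp037"  # assumption: confirm the real code page from the media documentation


def ebcdic_to_text(raw: bytes, codec: str = EBCDIC_CODEC) -> str:
    """Decode EBCDIC bytes into a Python (Unicode) string."""
    return raw.decode(codec)


def text_to_ascii_bytes(text: str) -> bytes:
    """Encode text as ASCII, failing loudly if a character has no ASCII equivalent."""
    return text.encode("ascii")


def looks_like_ascii(raw: bytes) -> bool:
    """Rough heuristic: every byte is printable ASCII or common whitespace."""
    return all(b in (9, 10, 13) or 32 <= b < 127 for b in raw)
```

The same five letters are stored as different bytes in each mode, which is why the recording mode has to be documented at accessioning: `bytes.fromhex("C8C5D3D3D6")` decodes to `HELLO` under `cp037` but is unreadable if treated as ASCII.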
Preservation of meaning through
documentation and metadata
File layouts and formats must be defined in structural
metadata or documentation
Meaning of specific attributes must be explained in
structural metadata.
Information that supports completeness and authenticity
must be maintained through maintenance of logs, record
counts and business process documentation.
All of this information must be maintained and migrated as
well if it is in electronic format.
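To illustrate why the file layout and code book must travel with the data, the sketch below reads one fixed-width record using a layout and a code book. Both `LAYOUT` and `CODEBOOK` are hypothetical examples invented for this illustration; real ones come from the creator's documentation.

```python
# Hypothetical file layout: (field name, starting offset, length in characters).
LAYOUT = [("case_id", 0, 6), ("status", 6, 1), ("year", 7, 4)]

# Hypothetical code book: meaning of coded values, per field.
CODEBOOK = {"status": {"A": "active", "C": "closed"}}


def parse_record(line, layout=LAYOUT, codebook=CODEBOOK):
    """Split one fixed-width record into named fields and translate coded values."""
    expected = max(start + length for _, start, length in layout)
    if len(line) < expected:
        raise ValueError(f"record is {len(line)} characters, layout needs {expected}")
    record = {}
    for name, start, length in layout:
        raw = line[start:start + length].strip()
        # Use the code book where one exists; otherwise keep the raw value.
        record[name] = codebook.get(name, {}).get(raw, raw)
    return record
```

Without the layout the string `000123A1998` is just eleven characters. With it, the record reads as case `000123`, status `active`, year `1998`.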
Preservation of documentation and
metadata
Sources
– Entity Relationship Diagrams (UML)
– New file layouts
– New documentation of recording modes
Access
User guide is primary vehicle for access
Good user guides enable the researcher to
understand the records before actually
seeing the data itself.
Modern technology tools greatly enhance
the ability to locate discrete pieces of
information.
OAIS
Ingest: to bring a
bitstream into the
system.
– Wraps bits to identify
them for retrieval
– Ensures bits are
complete and accurate
Storage:
– maintains bits over
time
Dissemination
– Query capability
– Reproduction of bits
into useful information
– Protection of sensitive
information
OAIS does not
discriminate about
bits. Selection and
context are needed
Ken Thibodeau, “Building the Archives of the Future,” D-Lib Magazine, February 2001
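The three functions on this slide can be illustrated with a toy in-memory store. This is not an OAIS implementation: the class name `TinyArchive`, its methods and the use of SHA-256 digests to check that bits are "complete and accurate" are all assumptions made for the sketch.

```python
import hashlib


class TinyArchive:
    """Toy illustration of ingest, storage and dissemination of bitstreams."""

    def __init__(self):
        self._packages = {}

    def ingest(self, identifier, data, expected_sha256=None):
        """Wrap a bitstream with an identifier and a digest, verifying it if possible."""
        digest = hashlib.sha256(data).hexdigest()
        if expected_sha256 is not None and digest != expected_sha256:
            raise ValueError("bitstream does not match the digest supplied by the producer")
        if identifier in self._packages:
            raise KeyError(f"identifier already in use: {identifier}")
        self._packages[identifier] = {"sha256": digest, "data": bytes(data)}
        return digest

    def audit(self):
        """Storage check: identifiers whose bits no longer match their ingest digest."""
        return [
            identifier
            for identifier, package in self._packages.items()
            if hashlib.sha256(package["data"]).hexdigest() != package["sha256"]
        ]

    def disseminate(self, identifier):
        """Return the stored bits for an identifier after re-checking their integrity."""
        package = self._packages[identifier]
        if hashlib.sha256(package["data"]).hexdigest() != package["sha256"]:
            raise ValueError(f"stored bits for {identifier} have changed since ingest")
        return package["data"]
```

Note that nothing in the sketch knows what the bits mean. As the slide says, selection and context have to be supplied by the archivist.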
Research trends
Functional
requirements research
– Focus on records
– At beginning of life cycle
– Based on model
developed at
University of
Pittsburgh
Migration/preservation
research
– Emerging from digital
libraries work
– Heavy focus on digital
images
– RLG is a major player
in this work
– InterPARES
– NARA
Research trends (continued)
Functional
requirements research
– Trustworthy
Information Systems
Migration/preservation
research
– SDSC/PERPOS
– CAMiLEON
Key Archival Research
RLG Digital Libraries
and various task forces
of Preserv section
US NARA sponsored
research at UCSD and
Georgia Tech
InterPARES
Program
Implementations at:
– Delaware State
Archives
– Indiana University
– State of Michigan
– Cornell University
Your skills
Keep a general
understanding of how
records can be created
Learn business process
analysis
Learn project
management
Understand trends in
standards
Understand
preservation research
Read:
– JASIS
– PC
– HBR
Concluding guidance
Remember that
records are
representations of
activity, not physical
devices
Electronic records
management requires
life-cycle management
Preservation is much
more than media
management
Documentation is of
equal importance to
the data.
Records are more than
just the data.