Preservation of Electronic Mail Druscie Simpson NC State Archives November 19, 2004 E-mail: The Digital Divide Also Multiplies.


Preservation of Electronic Mail
Druscie Simpson
NC State Archives
November 19, 2004
E-mail:
The Digital Divide Also Multiplies
E-mail as a Burden




The Radicati Group and Merrill Lynch estimate that email is
growing at a rate of 300% annually. The Age (July 8, 2003)
The real problem: not more email, but “larger and larger
attachments, generating an average of 5MB of email content”
daily. The Age (July 8, 2003)
Email generates about 400,000 terabytes of new information
each year worldwide
About 31 billion emails are sent daily, on the Internet and
elsewhere, a figure which is expected to double by 2006
(source: International Data Corporation (IDC)). The average
email is about 59 kilobytes in size, thus the annual flow of
emails worldwide is 667,585 terabytes. (How Much Information 2003, UC
Berkeley)
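
The annual figure above follows directly from the two numbers quoted with it. A minimal sketch of that arithmetic, assuming decimal units (1 terabyte = 10^9 kilobytes) and a 365-day year; the constant and function names are illustrative only:

```python
# Figures quoted on the slide (How Much Information 2003, UC Berkeley).
EMAILS_PER_DAY = 31_000_000_000   # about 31 billion e-mails sent daily
AVG_SIZE_KB = 59                  # average e-mail size in kilobytes
DAYS_PER_YEAR = 365               # assumption: non-leap year
KB_PER_TB = 1_000_000_000         # assumption: decimal units, 1 TB = 10^9 KB


def annual_email_volume_tb(emails_per_day, avg_size_kb, days=DAYS_PER_YEAR):
    """Return the yearly worldwide e-mail volume in terabytes."""
    total_kb = emails_per_day * avg_size_kb * days
    return total_kb / KB_PER_TB


print(annual_email_volume_tb(EMAILS_PER_DAY, AVG_SIZE_KB))  # prints 667585.0
```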
What do I do with
ALL that e-mail?!


Why are we so
interested in E-Mail
and Digital Records?
Email’s far reaching
effects
Loss of Corporate Knowledge
Imagine you’re new in the office. All of
the information to do your job was on
your computer. Your predecessor
deleted the information before leaving
or it was password protected. You
don’t have the password.
Legal Implications



If it is in an e-mail and it is
sent from, received by,
or stored on a
government computer,
it is a legal record
Never put anything in
an e-mail you don’t
want on the front page
of the local paper.
Always CYO (cover your
office).
Users have several options for
keeping their saved e-mails:




They may leave it on the mail provider’s
server
They may leave it on a web-based mail
server such as Hotmail or Yahoo
They may store it in their e-mail client such as
Outlook, Eudora, Netscape
They may store it on the file system of their
PC as individual .eml files (MS Outlook
Express Electronic Mail)


In each of these circumstances the actual
byte stream used to represent the e-mail
message is slightly different.
While an e-mail server and e-mail client are
obliged to communicate with each other using
standards (SMTP, POP3, and IMAP) they are
not required to store the e-mail using any sort
of standard.
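
What the standards do fix is the format of the message itself (headers, a blank line, then the body), and an .eml file typically holds that text as-is. A minimal sketch, assuming Python's standard email package and a made-up sample message, of reading such a message independently of the client that stored it:

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical sample message, standing in for the contents of an .eml file.
RAW_MESSAGE = (
    b"From: sender@example.org\r\n"
    b"To: recipient@example.org\r\n"
    b"Subject: Budget meeting\r\n"
    b"Date: Fri, 19 Nov 2004 09:00:00 -0500\r\n"
    b"\r\n"
    b"The meeting is moved to 2 pm.\r\n"
)


def summarize(raw_bytes):
    """Parse a raw Internet message and return its main fields."""
    msg = message_from_bytes(raw_bytes, policy=default)
    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "body": msg.get_content().strip(),  # single-part plain-text message
    }


print(summarize(RAW_MESSAGE)["subject"])
```

Client-specific stores such as Outlook .pst files wrap the message in a proprietary container, so they cannot be read this way without a conversion step first.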
We will be looking for a solution that
will have the widest possible use


Start with an IMAP server
Enhance the server so that it can take the
contents of its message store and create the
desired standard XML files, in a format called XMTP



Using XMTP, SMTP messages can be
transformed via XSLT into HTML pages for
viewing. XMTP has been used to implement a
telemedicine consultation system using SMTP e-mail and HTML.
Still in the testing phase; not launched yet
http://sourceforge.net/projects/smtp/
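
To give a feel for what an XML rendering of a message might look like, here is a rough sketch using only Python's standard library. The element names (Message, Headers, Header, Body) are hypothetical and are not the XMTP schema; the sample message is made up, and attachments are ignored.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes
from email.policy import default

# Hypothetical sample message used only for illustration.
SAMPLE = (
    b"From: sender@example.org\r\n"
    b"To: archives@example.org\r\n"
    b"Subject: Test\r\n"
    b"\r\n"
    b"Hello.\r\n"
)


def message_to_xml(raw_bytes):
    """Return an XML string holding the headers and plain-text body."""
    msg = message_from_bytes(raw_bytes, policy=default)
    root = ET.Element("Message")  # hypothetical element names, not XMTP's
    headers = ET.SubElement(root, "Headers")
    for name, value in msg.items():
        header = ET.SubElement(headers, "Header", name=name)
        header.text = str(value)
    # Pick the plain-text part; attachments are out of scope for this sketch.
    body_part = msg.get_body(preferencelist=("plain",))
    body = ET.SubElement(root, "Body")
    body.text = body_part.get_content() if body_part is not None else ""
    return ET.tostring(root, encoding="unicode")


print(message_to_xml(SAMPLE))
```

A real preservation format would also need to carry attachments, character sets, and the original header order, which this sketch does not attempt.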


IMAP seems to be the only protocol that
supports moving and copying e-mail
messages from place to place while
preserving the e-mail message’s native
format.
This means that no matter where the e-mail
message ends up, almost any IMAP
compliant e-mail client can send it to an
“archives” server.
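
A minimal sketch of that idea, assuming two already-authenticated connections of the kind provided by Python's standard imaplib module. The function name and folder names are hypothetical; the sketch fetches each message in its raw form from one server and appends it unchanged to a folder on the other.

```python
def copy_folder(source, archive, source_folder, archive_folder):
    """Copy every message in source_folder to archive_folder.

    `source` and `archive` are assumed to be logged-in imaplib.IMAP4
    (or IMAP4_SSL) connections. Returns the number of messages copied.
    """
    # The server typically answers NO if the folder already exists.
    archive.create(archive_folder)
    # Open read-only so the user's own mailbox is left untouched.
    source.select(source_folder, readonly=True)
    status, data = source.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed on source server")
    copied = 0
    for num in data[0].split():
        status, parts = source.fetch(num, "(RFC822)")
        if status != "OK":
            continue  # skip messages the server could not return
        raw_message = parts[0][1]  # the message exactly as the server holds it
        # No flags and no explicit date: let the archive server assign them.
        archive.append(archive_folder, None, None, raw_message)
        copied += 1
    return copied
```

In use, the two connections would be opened with something like imaplib.IMAP4_SSL(host) followed by login(user, password) against the agency server and the archives server; the host names and credentials are deployment-specific and not given in the slides.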
How?


Have the user send e-mail directly to a server
hosted by the NC State Archives
Have the user send e-mail to an enhanced
IMAP server maintained by their agency


This would let the agency access the archived
e-mail messages locally
The IMAP server could then send snapshots to us, or
send us the XMTP files on electronic media via USPS




Have the user collect and send .pst files to the NC
State Archives
Archives will open them with Outlook and move them
to the enhanced IMAP server (process would be
automated)
Archives should also be able to access packages of
e-mail in other formats since Outlook can convert
from Eudora, Netscape, etc.
Once loaded into Outlook, the e-mail packages would
then be sent to the IMAP server.

Any strategy based on intercepting the
data stream is out, since we want to collect
the e-mail messages only after the user has
been given a chance to cull and organize
them.


Our proposal is to use hMailServer (an open
source project hosted on SourceForge), which is an IMAP
server that uses MySQL or Microsoft SQL
Server as its message store.
http://www.hmailserver.com



The hMailServer installation contains a
minimal MySQL-installation, so if you don't
already have a database server in your
network, MySQL is installed automatically
when you install hMailServer.
The XML creation utility could interface
directly with the message store instead of the
IMAP protocol.
hMailServer comes with an accompanying COM
component that can be used to access the
data store
Life of an e-mail message







E-mail message is sent to the user’s mail server
User downloads the message to his/her mailbox
User optionally places the message into a folder
on his/her local system
User creates a folder on the “Archive” IMAP server
User moves the mail from his/her inbox or
specified folder to the folder on the “Archives”
IMAP server
An administrator requests that the IMAP server
create one or more XML files containing the user’s
e-mail
XML files are saved as a preservation copy
Access to Email #1


Load the XML into ENCompass
Utilize the IMAP server by enhancing it to
provide web access to its native store similar
to the user interface provided by Lurker
 http://sourceforge.net/projects/lurker
Access to Email #2

Utilize Documentum by enhancing it to
ingest the XML produced by the IMAP server.


Documentum server would be used purely as an
e-mail repository, not as a document management
application.
Utilize Documentum as a document management
application to interfile e-mail messages into
named record series
Access to Email #3

Move e-mail messages into a SharePoint
Portal Server



Use Outlook to collect the messages from the IMAP
server and send them to SPP.
Switch-to-Switch Protocol. Protocol specified in
the DLSw standard, used by routers to establish
DLSw connections, locate resources, forward
data, and handle flow control and error recovery.?
XML files would serve purely as a preservation
copy.
This Particular Project

Take 6 gigabytes of e-mail from Governor Jim
Hunt’s administration (1993-2001; bulk dates
1997-2001) and make it accessible and
preservable.



E-mail has been appraised and culled to create
the core for preservation
E-mail is in Microsoft Outlook .pst files and can be
accessed only by using the correct version of
Outlook
Create/utilize programs to move the e-mails
out of Microsoft’s proprietary .pst format into a
non-proprietary and stable XML format



Also want to write software that is more
universal in scope and can be used with most
electronic records.
Hire a programmer to write code to convert
the .pst files from their format to XML format
Take the converted XML files and load them
onto our server and make them available to
the public via the web and searchable
through our online catalog system
(ENCompass/MARS)
Wish us luck!



We are very excited to have this
opportunity to explore this potential
solution
We hope to take what we learn and
apply it to the collection of other
electronic government resources that
are archival
We’ll keep you posted!