Transcript Document

Digital Continuity: Tips for Managing
e-Legacy Records
Stephen Clarke
Senior Advisor
Digital Sustainability Programme
“Vindictive” Data Loss
• Gareth Pert, 23, nearly crippled Hamilton business
Progressive Hydraulics while acting out of "pure
vindictiveness", by deliberately wiping data
• Files containing information about international
patents, crucial project data and five years' worth of
engineering drawings were affected
• Police said the data deleted was worth more than
$150,000 but the true cost is incalculable because of
delayed or lost projects and time spent on recovery.
• Computer forensics specialists could recover only 40%
of the data lost.
The Company CIO Said…
"Electronic data is
actually worth a lot
more money than you
think. It's not until you
lose it that you realise
what a key component it
is in your business."
That was deliberate, but benign data loss is
actually a greater threat
• Media Failure. All storage media must be expected to
degrade with time, causing irrecoverable bit errors, and
to suffer sudden catastrophic irrecoverable loss of data.
• Hardware Failure. All hardware components must be
expected to suffer transient recoverable failures, such
as power loss, and catastrophic irrecoverable failures.
• Software Failure. All software components must be
expected to suffer from bugs that pose a risk to the
stored data.
• Communication Errors. Network transfers may suffer
errors that checksums fail to detect.
• Failure of Network Services. Domain names and
persistent URLs will suffer both transient
and irrecoverable failures.
Portable media decay
Data corruption (bit-rot)
Only one bit of a Byte is corrupted in this image!
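A minimal sketch of how single-bit corruption like this can be detected, assuming a checksum was recorded when the file was first stored. The sample bytes and the choice of SHA-256 are illustrative assumptions, not something specified in the slides.

```python
import hashlib


def flip_bit(data: bytes, byte_index: int, bit_index: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit_index
    return bytes(corrupted)


def sha256_hex(data: bytes) -> str:
    """Compute the SHA-256 digest used here as the stored checksum."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the contents of a real file (hypothetical sample data).
original = b"five years' worth of engineering drawings"
stored_checksum = sha256_hex(original)  # recorded when the file was stored

damaged = flip_bit(original, byte_index=3, bit_index=0)

# Exactly one bit differs between the two byte strings...
bits_changed = sum(bin(a ^ b).count("1") for a, b in zip(original, damaged))
print("bits changed:", bits_changed)
# ...yet the checksum no longer matches, so the damage is detectable.
print("checksum still matches:", sha256_hex(damaged) == stored_checksum)
```

Re-computing checksums on a schedule and comparing them with the stored values is one way to notice bit-rot before the only good copy is lost.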
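That was deliberate, but benign data loss is
actually a greater threat (cont.)
• Media & Hardware Obsolescence. All media and
hardware components will eventually fail, and before
then may become obsolete: no longer supported,
replaceable or able to work with newer systems.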
• Software Obsolescence. Similarly, software
components will become obsolete. This will often be
manifested as format obsolescence, when information
can no longer be decoded from the storage format into
a legible form.
• Operator Error. Operator actions must be expected to
include both recoverable and irrecoverable errors.
• Natural Disaster. Natural disasters, such as flood, fire
and earthquake, must be anticipated.
Hardware Obsolescence
• Hardware has a limited life span
Software Platform
Obsolescence
Assuming you have all
the right hardware and
storage, you then need
the right software and
operating system to
interpret the data and
render it as it is
supposed to look.
• Application software
• Operating System
• Display
Storage Formats Obsolescence
• Storage media have a limited life span
That was deliberate, but benign data loss is
actually a greater threat (cont.)
• External Attack. All systems connected to public
networks are vulnerable to viruses and worms.
• Internal Attack. Much abuse of computer systems
involves insiders, those who have or used to have
authorized access to the system.
• Economic Failure. Information in digital form is much
more vulnerable to interruptions in the money supply
than information on paper.
• Organizational Failure. Organisations may not plan
and provide sufficient resources for ensuring their
digital assets are protected.
http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html
There are a wide variety of e-legacy
records
• Email
• SMS/Text messages
• Databases
• GIS
• Textual records
• Audiovisual recordings
• Pictures and images
(scanned docs)
• Intranets and shared
workspaces
• As well as…
The Web
There are a wide variety of standards
There are a wide variety of vendors
Sheer volume
• The volume of digital
information being
created is increasing
exponentially.
• In 2008 the digital
content created
exceeded storage
capacity for the first time.
• By 2011, the volume of
digital content will be 10
times the size it was in
2006.
• By 2011, almost half of
all information created
will not have a
permanent home.
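As a rough worked example, the "10 times the size it was in 2006" figure implies a compound annual growth rate of roughly 58%. The sketch below derives that from the two years quoted on the slide and assumes steady year-on-year growth, which the slide does not state.

```python
# Figures taken from the slide: content in 2011 is 10 times the 2006 volume.
growth_factor = 10
years = 2011 - 2006

# Compound annual growth: factor_per_year ** years == growth_factor
factor_per_year = growth_factor ** (1 / years)
print(f"implied growth per year: {(factor_per_year - 1) * 100:.0f}%")
```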
e-Legacy Records Issues - Technological
• Hardware / media obsolescence
• Operating system obsolescence
• Software application obsolescence
• Storage media obsolescence
e-Legacy Records Issues - Organisational
• Proprietary formats and DRM can impact on your
ability to access information
• New IT implementations often don’t take account of
existing systems, so information gets orphaned
• Benign neglect is commonplace
• Lack of controlling indexes or context
• Idiosyncratic titling and folder structures
• Lack of organisational awareness and willingness
AAArgggghhhh!!!!
Where do I start?
• Identify what you have
• Make an inventory of formats or software environments
you use
• Prioritise ‘at risk’ information
• Migrate where there are ‘quick wins’ e.g. from older
versions of Microsoft Office products, ppt, Word, Excel,
etc.
• Raise awareness and get senior management support
• Draft organisational or departmental policies
• Does the material need to be retained, or can I dispose
of it?
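The "identify what you have" and "make an inventory of formats" steps can be started with a very small script. This is only a sketch: it counts files by extension, which is a crude proxy for format, and the function name is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root: str) -> Counter:
    """Count files under root, grouped by lower-cased file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under a placeholder label.
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling `inventory_by_extension(...)` on a shared drive or export folder and reading `.most_common()` on the result gives a first list of formats to prioritise. Extensions can be wrong or missing, so treat the counts as a starting point only.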
Make friends with your IT people
Courtesy National Archives of Australia
Make friends with dept. secretaries and PAs
They know where everything is!
Steps to managing e-legacy records
• Identify the creators of the records contained in the
legacy system
• Identify the physical format
• Determine the software format
• Identify the context of the records’ use where possible
• Apply appraisal, disposal and sentencing, migration
strategies and risk analysis
• Convert to open formats
Identifying creators
Implement an institutional
knowledge management
programme to find out about:
• Organisational administrative
history
• Individuals’ names, roles and
positions
• Project working groups
• Previous mergers or
amalgamations
• New functions or functions no
longer carried out
• What all those %$#@#+#
acronyms mean!
Tools that are available to help with
identifying file formats include:
• PRONOM
• DROID
• JHOVE
• National Software Reference Library
• Wotsit
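To show the basic idea behind signature-based format identification, here is a minimal sketch that matches the first bytes of a file against a few well-known "magic numbers". It is not the API of DROID, JHOVE or any tool listed above; the function names and the short signature table are assumptions for illustration, and real tools use far larger signature registries.

```python
# A handful of well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. legacy .doc/.xls/.ppt)"),
    (b"PK\x03\x04", "ZIP container (e.g. ODF, OOXML)"),
]


def identify(header: bytes) -> str:
    """Return a format label for the first bytes of a file, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


print(identify(b"%PDF-1.4\n"))
print(identify(b"PK\x03\x04\x14\x00"))
print(identify(b"plain text"))
```

Checking content rather than file names matters for legacy material, where extensions are often missing or misleading.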
Open a hardware museum?
Find out what hardware you have in-house
• 8” Drives, 5 ¼” drives, cartridge players etc.
Find out what software you have in-house
• Earlier versions of Windows, Photoshop, in-house
developed software, proprietary systems, etc.
Risk evaluation (and tools)
• Risk associated with records’ formats, with context and
authenticity
• The AS/NZS 4360:1999 Standard on Risk
Management
• DRAMBORA - Digital Repository Audit Method Based
on Risk Assessment
• Trustworthy Repositories Audit & Certification (TRAC):
Criteria and Check-list
• NESTOR - Network of Expertise in Long-Term Storage
of Digital Resources
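A minimal sketch of one common way to rank risks: score each set of records as likelihood times impact and deal with the highest scores first. The 1 to 5 scales, the class name and the example holdings are all assumptions for illustration; they are not taken from AS/NZS 4360, DRAMBORA, TRAC or NESTOR.

```python
from dataclasses import dataclass


@dataclass
class RecordSet:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain) chance of loss
    impact: int      # 1 (negligible) to 5 (severe) consequence of loss

    @property
    def risk_score(self) -> int:
        return self.likelihood * self.impact


# Hypothetical holdings with made-up ratings, for illustration only.
holdings = [
    RecordSet("Email archive in current mail system", likelihood=2, impact=3),
    RecordSet("Engineering drawings on 5.25-inch disks", likelihood=5, impact=4),
    RecordSet("Project database on unsupported server", likelihood=4, impact=4),
]

# Highest score first: these are the 'at risk' records to tackle first.
ranked = sorted(holdings, key=lambda r: r.risk_score, reverse=True)
for item in ranked:
    print(f"{item.risk_score:>2}  {item.name}")
```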
Undertake a review (records audit / survey)
• What is the Business Value?
• Are there Compliance or legal hold considerations?
• Financial implications
• litigation
• unnecessary storage costs
• fraud
• loss of contracts or agreements
• accounts payable/receivable errors and/or
omissions
Digital Preservation Tactics
• Normalisation
• Migration (AKA conversion or technology refresh)
• Emulation
• Encapsulation
Open Source Tools
• Fedora – digital archive
• DSpace – digital archive
• DROID – format
recognition
• JHOVE – format
recognition
• SIARD – database
archiving
• XENA – normalisation
• www.sourceforge.net
Open Format Examples
• ODF – OpenDocument Format
• XML – eXtensible Markup Language
• HTML – Hypertext Markup Language
• PNG – Portable Network Graphics
• FLAC – Free Lossless Audio Codec
• There are emerging and de facto standards e.g.
PDF/A, OOXML, ODF, JPEG 2000, TIFF, etc.
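When planning conversion to open formats, a simple lookup from legacy extension to a suggested open target can help sort files into "quick wins" and cases needing manual review. The mapping below is purely illustrative and the function name is hypothetical; real targets depend on organisational policy and on what the records contain.

```python
from pathlib import PurePath

# Illustrative mapping only; real targets depend on organisational policy.
OPEN_TARGETS = {
    ".doc": "ODF text (.odt)",
    ".xls": "ODF spreadsheet (.ods)",
    ".ppt": "ODF presentation (.odp)",
    ".bmp": "PNG (.png)",
    ".wav": "FLAC (.flac)",
}


def suggest_target(filename: str) -> str:
    """Suggest an open target format, or flag the file for manual review."""
    extension = PurePath(filename).suffix.lower()
    return OPEN_TARGETS.get(extension, "review manually")


# Hypothetical file names, for illustration only.
for name in ["minutes_1998.DOC", "budget.xls", "site_plan.dwg"]:
    print(f"{name} -> {suggest_target(name)}")
```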
It’s not just a technology issue
• Survey staff on what older e-records they have and
encourage them to self-migrate
• Use institutional knowledge and
find out what systems have
been used and where old
equipment is
• Engagement is higher when
staff feel involved (de-mystify)
• Implement policies and
procedures so that
obsolescence will be managed
in future
Is addressing digital continuity difficult or
expensive?
• Digital continuity actions required are incremental, and
needn’t cost a lot of money (e.g. setting policies,
procedures and migration plans)
• Recognise that digital continuity is a risk, carry out a
risk assessment and prioritise mitigating actions (focus
resources on the important stuff)
• More effectively managing the information you need to
keep and disposing of what you no longer need should
cut costs and could also deliver efficiency benefits and
savings
• Tackling risks now and as part of a planned response
is often more cost effective than waiting until
technology risks occur (data recovery is expensive and
not always possible).
Any Good News?
• File formats are becoming obsolete more slowly than
previously thought (the market is stabilising)
• Storage costs are coming down
• Established practice is emerging (you can reuse)
• Trusted standards are being adopted (XML, ODF,
PDF/A, JPEG 2000, etc.)
• You’re not alone!
• There is a growing international community of practice
• We’re providing guidance and support to help you, and
• The Digital Continuity Action Plan provides a public
sector wide framework to facilitate collaborative
approaches.
Any Questions?
Courtesy National Archives of Australia