DSpace, digital preservation, and business models

ERPANET Seminar
Business Models related to Digital Preservation
September 20-22, 2004
Julie Walker
MIT Libraries
Characterizing the
digital preservation market
The digital preservation challenge
“Information is being produced in greater quantities and with
greater frequency than at any time in history. Electronic media,
especially the Internet, make it possible for almost anyone to
become a "publisher." How will society preserve this information
and make it available to future generations? How will libraries and
other repositories classify this information so that their patrons can
find it with the same ease that they can locate a book on a shelf?
The ease with which electronic information can be created and
"published" makes much of what is available today, gone tomorrow.
Thus there is an urgent need to preserve this information before it
is forever lost.”
(Source: National Digital Information Infrastructure and Preservation Program.
http://www.digitalpreservation.gov/)
Characteristics of the problem
Obsolescence of technology
Accelerating rates of data collection and content
creation
Growing complexity of digital information
resources
Complex digital objects that require specific
software applications for reuse
Resource-intensive curatorial process
Need for funding and business models
(Source: “It’s about time: Research challenges in digital archiving and long-term preservation”. NSF and Library of
Congress sponsored study. http://www.digitalpreservation.gov/repor/NSF_LC_Final_Report.pdf)
Market for digital preservation
solutions
Libraries, archives, museums, and other cultural
institutions

Preserving intellectual and cultural heritage
Government agencies, private corporations, not-for-profit organizations, and private citizens


Preserving digital assets
Legal and regulatory issues for government agencies
(Source: “It’s about time: Research challenges in digital archiving and long-term preservation”. NSF and Library of
Congress sponsored study. http://www.digitalpreservation.gov/repor/NSF_LC_Final_Report.pdf)
Diverse set of projects tackling
various aspects of the problem
DSpace
Storage Resource Broker (SRB)
Australia National Library PANDORA
UK Digital Curation Centre (DCC)
UK National Archives PRONOM
DLF Global Digital Format Registry
U. of Pennsylvania Typed Object Model (TOM)
FCLA Dark Archive In The Sunshine State (DAITSS)
Royal Dutch Library & Elsevier Science/EArchiving Agreement
Many more…
What does DSpace have to do with
digital preservation?
DSpace is…
An open source digital asset management
system
A technology platform for Institutional
Repositories
A federation of digital repositories across
multiple academic research institutions
A production service of the MIT Libraries to
its local research community
Institutional Repository
Institution-based
Scholarly material in digital formats
Cumulative and perpetual
Open and interoperable
(Source: Crow, Raymond. “The Case for Institutional Repositories:
A SPARC Position Paper.” http://www.arl.org/sparc/IR/ir.html)
Institutional Repositories
are unlike traditional archives
Acquisition at point of creation


Submissions can come directly from the creators
Includes non-document material
Shared curatorial control

Institutions and creators can establish content
guidelines or policies
Shared selection, processing responsibility

Scalable, less resource-intensive approach
Digital Preservation
Repositories don’t “do” preservation
Preservation operations are defined by:
Digital collections in hand
Cost/benefit tradeoffs
Local policy

Digital Preservation
MIT Philosophy
Lots of digital material is already lost
Most digital material is at risk
Preserving bits is better than nothing
Capture as much information as possible
Evaluate cost/benefit tradeoffs over time

Digital Preservation Categories
Supported



Provides for future content usability
Migration for texts, images, audio, etc.
Emulation for software, multimedia, etc.
Unsupported


Bit preservation at minimum
Migration when possible
e.g. commercial conversion services
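Bit preservation, the minimum commitment for unsupported formats above, is commonly checked by recording a checksum for each stored item and recomputing it later. The following is only a minimal sketch of that idea, assuming SHA-256 and a plain in-memory dictionary as the manifest; the function names are hypothetical and this is not a description of how DSpace itself works.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of an item's bits as a hex string."""
    return hashlib.sha256(data).hexdigest()


def record_fixity(items: dict[str, bytes]) -> dict[str, str]:
    """Build a manifest (item name -> checksum) at deposit time.

    The manifest layout is an assumption made for this sketch.
    """
    return {name: checksum(data) for name, data in items.items()}


def audit(items: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the sorted names of items that are missing or whose bits changed."""
    failed = []
    for name, expected in manifest.items():
        data = items.get(name)
        if data is None or checksum(data) != expected:
            failed.append(name)
    return sorted(failed)
```

An audit that returns an empty list means every recorded item is still bit-identical to what was deposited; any name it returns would need to be restored from another copy.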
Digital Preservation Policy
MIT Policy
Supported formats
e.g. TIFF, SGML/XML, PDF…
Known/unsupported formats
e.g. Microsoft Word, PowerPoint (common)…
e.g. Lotus 1-2-3, VisiCalc, WordStar (less common)…
Unknown/unsupported formats
Highly complex and rare formats
e.g. one-of-a-kind software programs…
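The three policy levels can be read as a lookup from file format to support level, and from support level to the preservation actions named in the categories above. The sketch below is illustrative only: the registry contains just the example formats from the slide, the names `SupportLevel`, `FORMAT_LEVELS`, `support_level`, and `planned_actions` are hypothetical, and anything not listed is assumed to fall under unknown/unsupported.

```python
from enum import Enum


class SupportLevel(Enum):
    SUPPORTED = "supported"
    KNOWN = "known/unsupported"
    UNKNOWN = "unknown/unsupported"


# Illustrative registry built only from the examples given in the policy.
FORMAT_LEVELS = {
    "tiff": SupportLevel.SUPPORTED,
    "sgml": SupportLevel.SUPPORTED,
    "xml": SupportLevel.SUPPORTED,
    "pdf": SupportLevel.SUPPORTED,
    "microsoft word": SupportLevel.KNOWN,
    "powerpoint": SupportLevel.KNOWN,
    "lotus 1-2-3": SupportLevel.KNOWN,
    "visicalc": SupportLevel.KNOWN,
    "wordstar": SupportLevel.KNOWN,
}


def support_level(format_name: str) -> SupportLevel:
    """Look up a format; unlisted formats are treated as unknown (assumption)."""
    return FORMAT_LEVELS.get(format_name.strip().lower(), SupportLevel.UNKNOWN)


def planned_actions(level: SupportLevel) -> list[str]:
    """Map a support level to the actions listed under the two categories."""
    if level is SupportLevel.SUPPORTED:
        return ["bit preservation", "migration or emulation"]
    # Both unsupported levels: bits at minimum, migration when possible.
    return ["bit preservation", "migration when possible"]
```

For example, `planned_actions(support_level("WordStar"))` yields bit preservation plus migration when possible, which mirrors the unsupported category.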
DSpace preservation research and
development
DSpace@Cambridge: development work on type
registries, automated ingest, preservation action
plans, and specific format investigation
SRB: large-scale storage infrastructure
SIMILE: infrastructure to cope with arbitrary
metadata formats using RDF
Proposal for archiving scientific datasets


Technically and organizationally
Working with MIT Computational and Systems
Biology Program
What is the DSpace Federation?
DSpace Federation
What?


Emerging community of DSpace users/installations
Open source software (OSS) community

Mostly sponsored programmers from DSpace installation sites
Who?

Research-generating organizations




(e.g. libraries, government agencies, museums, archives)
world-wide
Overlapping/complementary research interests
NGOs and industry
DSpace Federation
Why?
Drive DSpace development
open source development model
Build critical mass of content
support useful interoperation and research test bed
Leverage distributed expertise
e.g. in metadata and digital preservation
How might the DSpace Federation serve
as a potential business model for digital
preservation?
What is needed…
“Long-term digital archiving requires
systems, institutions, and business models that
are robust enough to withstand technological
failures, shifting computing platforms and
media, changes in institutional missions and
interruptions in management funding.”
(Source: “It’s about time: Research challenges in digital archiving and long-term preservation”.
NSF and Library of Congress sponsored study. http://www.digitalpreservation.gov/repor/NSF_LC_Final_Report.pdf)
Vision for the DSpace Federation
[Diagram: the DSpace Federation at the center, surrounded by DSpace installations, the DSpace OSS community, service providers/value-added resellers, the DSpace services user base, and related initiatives.
Named organizations in the diagram: MIT, HP, U. Cambridge, U. Rochester, U. Toronto, U. Amsterdam, ANU, Hong Kong U. Sci. & Tech, BioMed Central, OCLC.
Other labels: NGOs, corporations, government agencies, libraries, independent developers/hackers, user sponsored development resources, industry sponsored development resources, consulting firm, services org., Internet co., IT services co., hardware co.]
A federation of DSpace
installations provides…
Safety in numbers (e.g. large community of
adopters with vested interest)
Critical mass of content for testing
Variety of use cases for managing digital content
Collaboration opportunities
Relationships with related initiatives
Defined market for digital preservation services
DSpace can serve as a focal point for
examining economic issues
Further research and development will help drive
down preservation costs by identifying ways to:




Reduce the up-front ingest costs
Automate ongoing preservation processes
Distribute and share costs
Develop economies of scale
Comparison of a variety of use cases will further
understanding of the economic issues


Identify common issues and costs
Opportunity to share best practices, particularly for
revenue models
Collaboration will produce a greater
impact than individual initiatives
Yield results that will meet the needs of
many
Raise awareness of issues
Collectively lobby proprietary software
vendors
Pursue joint funding opportunities for high
visibility projects
DSpace technology platform is
positioned to address preservation
Captures

Digital research material in any format directly from creators
Describes


Descriptive, technical, rights metadata
Assigns persistent identifiers
Distributes


Delivers via Web, with necessary access control
Open and visible archive
Preserves


Large-scale, stable, managed long-term storage (bit preservation)
Active research and development in preserving access to content
DSpace is already being used
across the identified market
115 institutions have registered for private name space
50-50 US/non-US
Colleges and universities
Museums and archives
Research organizations
Government agencies
Private industry
Open Source Software enables
distributed community R&D
Code available to all, free of charge
Shared responsibility for software enhancement
and evolution
Shared benefit from research and development
work
Ability to leverage distributed expertise


metadata
digital preservation
Service providers/VARs provide
software and services
Implementation
SW bundling/integration
Consulting
Content management
Archival storage
Application hosting
Migration and emulation
Digital archaeology
Risks
Maintaining momentum of DSpace Open
Source Software community
Business models at individual universities –
will they be able to sustain DSpace and
involvement in OSS community?
Will users put digital items in DSpace?
Other emerging, leapfrogging technologies
“Digital information will never
survive by accident.”
(Source: Neil Beagrie, British Library)