Storage Resource Broker Building Preservation Environments Reagan W. Moore San Diego Supercomputer Center [email protected] http://www.sdsc.edu/srb/ Topics • Preservation environments • Digital library technology • Data grid technology • Fundamental concepts.

Storage Resource Broker
Building Preservation
Environments
Reagan W. Moore
San Diego Supercomputer Center
[email protected]
http://www.sdsc.edu/srb/
Topics
• Preservation environments
• Digital library technology
• Data grid technology
• Fundamental concepts / future research
• Data / information / knowledge
• Persistent objects
• Knowledge management
Preservation
• Archival processes through which a digital entity is
extracted from its creation environment, and then
supported in a preservation environment, while
maintaining authenticity and integrity information.
• Extraction process requires insertion of support
infrastructure underneath the digital material
• Goal is infrastructure independence, the ability to use
any commercial storage system, database, or access
mechanism
Preservation Communities
• InterPARES - diplomatics
• Preservation of records
• NARA
• Preservation of records from federal agencies
• State archives
• Preservation of submitted “collections”
• Continuum model
• Preservation of active data and records
Preservation
• What differentiates a preservation
environment from a digital library?
Digital Libraries
• Support the community vocabulary
• Discovery and browse using community
relevant terms
• Support the community data format
• Maintain information on the data format of
each item
• Support the community access services
• Provide services that manipulate and display
the community data format
Preservation Mandates
• Diplomatics
• Authenticity
• Integrity
• NARA
• Infrastructure independence
• Scalability
• State archives
• Automation of archival processes
InterPARES - Diplomatics
• Authenticity - maintain links to metadata for:
• Date record is made
• Date record is transmitted
• Date record is received
• Date record is set aside [i.e. filed]
• Name of author (person or organization issuing the record)
• Name of addressee (person or organization for whom the record is intended)
• Name of writer (entity responsible for the articulation of the record’s content)
• Name of originator (electronic address from which record is sent)
• Name of recipient(s) (person or organization to whom the record is sent)
• Name of creator (entity in whose archival fonds the record exists)
• Name of action or matter (the activity for which the record is created)
• Name of documentary form (e.g. E-mail, report, memo)
• Identification of digital components
• Identification of attachments (e.g. digital signature)
• Archival bond (e.g. classification code)
InterPARES - Diplomatics
• Integrity - maintain links to metadata for
• Name(s) of the handling office / officer
• Name of office of primary responsibility for keeping
the record
• Annotations or comments
• Actions carried out on the record
• Technical modifications due to transformative
migration
• Validation
Preservation Approach
• Provide mechanisms to:
• Create archival context for the content
• Context is preservation metadata (provenance, administrative,
descriptive, structural, behavioral)
• Content is the submitted digital entity
• Assert integrity - the consistency between the context
and the content
• Track operations done on material and update context
• Assert authenticity - that the material represents the
original site
• Track the chain of custody
• Manage technology evolution (encoding standard,
storage repository, information repository, access
methods)
Data Grids
• What is the difference between a
preservation environment and a data
grid?
Data Grids
• Manage shared collections that are
distributed in space
• Location of item, access controls, checksums
• Implement infrastructure independence
• Standard operations for interacting with
storage repositories
• Implement presentation independence
• Standard APIs to support porting of user
interfaces
Preservation Environment
• Digital library infrastructure that
supports
• Preservation metadata
• Arrangement and description of items
• Access mechanisms
• Data grid infrastructure that supports
• Shared collections that are migrated forward in
time
• Management of technology evolution
• Administrative metadata providing status of
records
Infrastructure Independence
Data Access Methods (Web Browser, DSpace, OAI-PMH)
Storage Repository
Naming conventions provided by storage systems:
• Storage location
• User name
• File name
• File context (creation date,…)
• Access constraints
Data Grids Provide a Level of Indirection for Each Naming Convention
Data Access Methods (C library, Unix, Web Browser)
Data Collection
Storage Repository naming convention → Data Grid logical name space:
• Storage location → Logical resource name space
• User name → Logical user name space
• File name → Logical file name space
• File context (creation date,…) → Logical context (metadata)
• Access constraints → Control/consistency constraints
Data is organized as a shared collection
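The indirection listed above can be sketched as a small catalog that maps logical names to physical ones. This is an illustrative toy, not the SRB or MCAT implementation: the class names, resource names, host names, and paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str        # logical resource name (hypothetical, e.g. "sdsc-archive")
    physical_path: str   # name used by the underlying storage system


@dataclass
class Catalog:
    """Toy metadata catalog: maps logical names to physical locations."""
    resources: dict = field(default_factory=dict)   # logical resource -> storage host
    files: dict = field(default_factory=dict)       # logical file name -> [Replica]
    metadata: dict = field(default_factory=dict)    # logical file name -> attributes

    def register(self, logical_name, resource, physical_path, **attrs):
        if resource not in self.resources:
            raise KeyError(f"unknown logical resource: {resource}")
        self.files.setdefault(logical_name, []).append(Replica(resource, physical_path))
        self.metadata.setdefault(logical_name, {}).update(attrs)

    def resolve(self, logical_name):
        """Return (storage host, physical path) pairs for every replica."""
        return [(self.resources[r.resource], r.physical_path)
                for r in self.files[logical_name]]


catalog = Catalog(resources={"sdsc-archive": "hpss.example.org",
                             "umd-disk": "fs.example.edu"})
catalog.register("/nara/eap/record-0001", "sdsc-archive",
                 "/hpss/u/srb/a1b2c3", created="2005-02-22")
catalog.register("/nara/eap/record-0001", "umd-disk", "/data/srb/000123")

locations = catalog.resolve("/nara/eap/record-0001")
print(locations)
```

Applications only ever see the logical file name; moving a replica to a new storage system changes a catalog entry, not the name users depend on.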
Demonstration
• Logical file name space
• Distinguished user name space
• Shared collection
• Distributed data storage
• Replication as a file property
• Digital entities
Data Grids
• Provide two levels of indirection:
• Low level API used to interact with storage
repositories
• Standard operations for manipulating files in a
storage system
• Standard operations for manipulating a catalog
stored in a database
• High level API used to support user interfaces
• Three basic APIs - “C” library call, Unix shell
commands, Java class library
• Other interfaces are ported on top of the basic
APIs.
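A minimal sketch of the low-level indirection, assuming a hypothetical `StorageDriver` interface with three standard operations. The class and method names are invented for illustration and are not the SRB driver API; the in-memory driver merely stands in for an archive or database blob store.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical low-level interface: one subclass per repository type."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...


class UnixFileDriver(StorageDriver):
    def __init__(self, root: str):
        self.root = root

    def _full(self, path: str) -> str:
        return os.path.join(self.root, path.lstrip("/"))

    def put(self, path, data):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as fh:
            fh.write(data)

    def get(self, path):
        with open(self._full(path), "rb") as fh:
            return fh.read()

    def delete(self, path):
        os.remove(self._full(path))


class MemoryDriver(StorageDriver):
    """Stand-in for an archive or database blob store."""

    def __init__(self):
        self.blobs = {}

    def put(self, path, data):
        self.blobs[path] = data

    def get(self, path):
        return self.blobs[path]

    def delete(self, path):
        del self.blobs[path]


def copy_between(src: StorageDriver, dst: StorageDriver, path: str) -> None:
    """Higher-level operation written only against the standard interface."""
    dst.put(path, src.get(path))


unix = UnixFileDriver(tempfile.mkdtemp())
memory = MemoryDriver()
unix.put("/collection/a.txt", b"record")
copy_between(unix, memory, "/collection/a.txt")
```

Because `copy_between` depends only on the abstract operations, supporting a new repository type means writing one new driver rather than touching the higher-level code.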
Storage Resource Broker 3.3
[Architecture diagram, layers from top to bottom]
• Application
• Access interfaces: C Library, Java; Unix Shell; Linux I/O; C++; NT Browser, Kepler Actors; DLL / Python, Perl, Windows; HTTP, OAI, DSpace, WSDL, (WSRF); OpenDAP, GridFTP
• Federation Management
• Consistency & Metadata Management / Authorization, Authentication, Audit
• Logical Name Space; Latency Management; Data Transport; Metadata Transport
• Database Abstraction
• Databases: DB2, Oracle, Sybase, Postgres, mySQL, Informix
• Storage Repository Abstraction
• Archives - Tape: Sam-QFS, DMF, HPSS, ADSM, UniTree, ADS
• ORB
• File Systems: Unix, NT, Mac OSX
• Databases: DB2, Oracle, Sybase, Postgres, mySQL, Informix
Accessing Multiple Types of
Storage Systems
[Diagram: a User Application accessing the Archive at SDSC, the Archive at NARA, and the Archive at U Md]
Standard Data Access Operations
Remote operations
Unix file system
Latency management
Procedures
Transformations
Third party transfer
Filtering
Queries
Collective operations
Replication
Fault tolerance
Load leveling
Common set of operations for interacting with every type of storage repository
[Diagram: a User Application accessing the Archive at SDSC, the Archive at NARA, and the Archive at U Md]
Accessing Data at Multiple Sites
• Each site has its own naming convention for files
• A data grid provides a uniform way to name and access the files across the sites
[Diagram: a User Application accessing the Archive at SDSC, the Archive at NARA, and the Archive at U Md]
Building a Distributed Collection
Logical name space
Location independent identifier
Persistent identifier
Collection owned data
Authenticity metadata
Access controls
Audit trails
Checksums
Descriptive metadata
Inter-realm authentication
Single sign-on system
Common naming convention and set of attributes for describing digital entities
[Diagram: a User Application accessing, through the Data Grid, the Archive at SDSC, the Archive at NARA, and the Archive at U Md]
Federated Server Architecture
[Diagram labels: Read Application; Logical Name or Attribute Condition; Peer-to-peer Brokering; Parallel Data Access; SRB servers; SRB agents; MCAT; Server(s) Spawning; Data Access; storage resources R1 and R2; message flow numbered 1 to 6]
MCAT functions:
1. Logical-to-Physical mapping
2. Identification of Replicas
3. Access & Audit Control
Managing Access
• Authenticate users independently of
storage systems
• Preservation environment owns the data
• Authorize data access independently of
storage system
• ACLs on both data and metadata
• Maintain audit trails of all accesses
• Both read and write
Collection-owned Data
• Store data at remote storage system under
data-grid ID
• Access data through data grid servers
• Track all operations on data and update state
information
• User authenticates to a data grid server
• Access controls are checked for permissions
• Data grid servers authenticate messages from other
servers
• Remote server authenticates to remote storage
system
• Multiple authentication mechanisms
• GSI / challenge-response / tickets
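A toy sketch of authorization and auditing performed by the data grid rather than by the storage system. The user names, object name, and record layout are hypothetical; real deployments would use the authentication mechanisms listed above instead of trusting a user string.

```python
from datetime import datetime, timezone

# Hypothetical access-control list: logical name -> {user: set of permissions}
acl = {"/nara/eap/record-0001": {"archivist@nara": {"read", "write"},
                                 "researcher@umd": {"read"}}}
audit_trail = []  # every attempt is recorded, whether or not it succeeds


def access(user: str, logical_name: str, operation: str) -> bool:
    """Check the data grid ACL and log the attempt."""
    allowed = operation in acl.get(logical_name, {}).get(user, set())
    audit_trail.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "object": logical_name,
        "operation": operation,
        "allowed": allowed,
    })
    return allowed


can_read = access("researcher@umd", "/nara/eap/record-0001", "read")
can_write = access("researcher@umd", "/nara/eap/record-0001", "write")
```

Since the collection owns the data, the storage system only needs to trust the data grid server; per-user permissions live in the catalog.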
Provide Context for Data
• Properties of files
• Provenance - source
• Descriptive attributes
• Structure
• Organize properties as metadata in a
collection hierarchy
• Define operations on file properties
• Manage state information - location, replicas,
containers
• Separate context management from content
management
• Maintain consistency of context as operations are
done on content
Database Operations
• Standard interface to support
• Schema extension - user defined attributes
• Snowflake table creation
• SQL generation
• Import and export of XML files
• Bulk metadata load and unload
• Operations required to manage a
catalog that resides in a database
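Two of these operations, user-defined attributes and SQL generation, can be sketched with SQLite. The table layout and function names below are assumptions made for illustration; they do not reproduce the MCAT schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, logical_name TEXT UNIQUE);
    -- User-defined attributes live in a side table, so adding a new
    -- attribute needs no ALTER TABLE on the core schema.
    CREATE TABLE attribute (
        entity_id INTEGER REFERENCES entity(id),
        name TEXT,
        value TEXT
    );
""")


def add_entity(logical_name, **attrs):
    """Register an entity together with arbitrary name/value attributes."""
    cur = conn.execute("INSERT INTO entity (logical_name) VALUES (?)",
                       (logical_name,))
    conn.executemany(
        "INSERT INTO attribute (entity_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, k, str(v)) for k, v in attrs.items()])


def build_query(conditions):
    """Generate SQL that finds entities matching every attribute condition."""
    sql = "SELECT e.logical_name FROM entity e"
    params = []
    for i, (name, value) in enumerate(conditions.items()):
        sql += (f" JOIN attribute a{i} ON a{i}.entity_id = e.id"
                f" AND a{i}.name = ? AND a{i}.value = ?")
        params += [name, value]
    return sql + " ORDER BY e.logical_name", params


add_entity("/crawl/2003-04/page1.html", site="example.org", format="html")
add_entity("/crawl/2003-04/logo.gif", site="example.org", format="gif")

sql, params = build_query({"site": "example.org", "format": "html"})
matches = [row[0] for row in conn.execute(sql, params)]
```

Attribute names and values are passed as bound parameters; only the join aliases are generated as text.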
National Archives and Records Administration Research Prototype Persistent Archive
Demonstrate preservation
environment
• Authenticity
• Integrity
• Management of
technology evolution
• Mitigation of risk of data loss
• Replication of data
• Federation of catalogs
• Management of preservation
metadata
• Scalability
• EAP collection
• 350,000 files
• 1.2 TBs in size
Federation of Three
Independent Data Grids
• NARA MCAT: principal copy stored at NARA with complete metadata catalog
• U Md MCAT: replicated copy at U Md for improved access, load balancing and disaster recovery
• SDSC MCAT: deep archive at SDSC, no user access, but complete copy
Preservation Requirements
• Maintain authenticity and integrity of
electronic records
• Authenticity - assertion of provenance of data
• Integrity - assertion of invariance of bits
• Manage risk of data loss
• Media corruption / System failures / Operational errors
/ Natural disaster / Malicious users
• Manage technology obsolescence
• Support migration of collection to new systems
• Bulk data operations
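Integrity as "invariance of bits" is commonly checked by recomputing a checksum for each replica and comparing it with the value recorded at ingest. The sketch below assumes SHA-256 (the slides do not name an algorithm) and uses in-memory byte strings in place of real replicas.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the slides do not name an algorithm.
    return hashlib.sha256(data).hexdigest()


def verify_replicas(replicas: dict, recorded: str) -> dict:
    """Map each replica name to True if its bits still match the recorded checksum."""
    return {name: checksum(data) == recorded for name, data in replicas.items()}


original = b"electronic record"
recorded = checksum(original)            # stored in the catalog at ingest
replicas = {
    "NARA": original,
    "U Md": original,
    "SDSC": b"electronic recorf",        # simulated media corruption
}
status = verify_replicas(replicas, recorded)
damaged = [name for name, ok in status.items() if not ok]
```

A damaged replica can then be repaired from any copy whose checksum still matches, which is one reason replication and integrity checking are managed together.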
Replication
• How many replicas are enough?
Federation
Data Access Methods (Web Browser, DSpace, OAI-PMH)
Data Collection A and Data Collection B are each managed by a Data Grid with its own:
• Logical resource name space
• Logical user name space
• Logical file name space
• Logical context (metadata)
• Control/consistency constraints
Access controls and consistency constraints on cross registration of digital entities
Data Grid Zones
• Choose how name spaces will be shared
• Cross register storage resources
• May the other data grid write to my storage?
• Cross register user names
• Users are authenticated by their home zone
• Cross register files
• Can replicate files into another data grid
• Cross register metadata
• Can build a copy of the metadata catalog
Peer-to-Peer Data Grids
[Diagram: federation models arranged by replication, access, and consistency constraints]
• Families: Peer-to-Peer Data Grids, Replication Data Grids, Hierarchical Data Grids, Federation Environments
• Federation models: Free Floating, Occasional Interchange, Replicated Data, User and Data Replica, Resource Interaction, Replicated Catalog, Nomadic, Snow Flake, Master Slave, Deep Archive
• Axes: Replication Constraints, Access Constraints, Consistency Constraints
• Transition labels: Partial User-ID Sharing; Partial Resource Sharing; No Metadata Synch; System Set Access Controls; System Controlled Complete Synch; Complete User-ID Sharing; System Managed Replication; Connection From Any Zone; Complete Resource Sharing; Hierarchical Zone Organization; One Shared User-ID; System Controlled Partial Synch; No Resource Sharing; Super Administrator Zone Control; No User-ID Sharing
Examples of Extensibility
• Storage Repository Driver evolution
• Initially supported Unix file system
• Added archival access - UniTree, HPSS
• Added FTP/HTTP
• Added database blob access
• Added database table interface
• Added Windows file system
• Added project archives - Dcache, Castor, ADS
• Added Object Ring Buffer, Datascope
• Adding GridFTP version 3.3
• Database management evolution
• Postgres
• DB2
• Oracle
• Informix
• Sybase
• mySQL (most difficult port - no locks, no views, limited SQL)
Examples of Extensibility
• The 3 fundamental APIs are C library, shell commands,
Java
• Other access mechanisms are ported on top of these interfaces
• API evolution
• Initial access through C library, Unix shell command
• Added inQ Windows browser (C++ library)
• Added mySRB Web browser (C library and shell commands)
• Added Java (Jargon)
• Added Perl/Python load libraries (shell command)
• Added WSDL (Java)
• Added OAI-PMH, OpenDAP, DSpace digital library (Java)
• Added Kepler actors for dataflow access (Java)
• Adding GridFTP version 3.3 (C library)
Storage Resource Broker Collections at SDSC
(2/22/2005)
Collection | GBs of data stored | Number of files | Number of users
Data Grid
NSF/ITR - National Virtual Observatory | 53,862 | 9,536,751 | 100
NSF - National Partnership for Advanced Computational Infrastructure | 31,263 | 6,435,338 | 380
Hayden Planetarium - Evolution of the Solar System visualizations | 7,201 | 113,600 | 178
Public collections - NSF/NPACI - Joint Center for Structural Genomics | 5,455 | 3,405,266 | 67
NSF/NPACI - Biology and Environmental collections | 20,364 | 52,159 | 67
NSF - TeraGrid, ENZO Cosmology simulations | 155,980 | 1,157,168 | 3,176
NIH - Biomedical Informatics Research Network | 9,830 | 6,632,159 | 241
Miscellaneous static collections | 8,013 | 161,352 | 241
Digital Library
NLM - Digital Embryo image collection | 720 | 45,365 | 23
NSF/NPACI - Long Term Ecological Reserve | 253 | 8,892 | 36
NSF/NPACI - Grid Portal | 2,620 | 53,048 | 460
NIH - Alliance for Cell Signaling microarray data | 559 | 71,318 | 21
NSF - National Science Digital Library SIO Explorer collection | 2,654 | 1,052,202 | 27
NSF/NPACI - Transana education research video collection | 92 | 2,387 | 26
NSF/ITR - Southern California Earthquake Center | 99,010 | 2,074,138 | 64
Persistent Archive
NHPRC Persistent Archive Testbed (Kentucky, Ohio, Michigan, Minnesota) | 90 | 372,947 | 28
UCSD Libraries archive | 4,147 | 408,050 | 29
NARA - Research Prototype Persistent Archive | 991 | 455,094 | 58
NSF - National Science Digital Library persistent archive | 3,572 | 26,918,638 | 136
TOTAL | 404 TB | 59 million | 5,167
Sites Using the SRB
Academia Sinica, Taiwan
ASCC, Computing Centre, Taiwan
Australian National University
Bedford Oceanography, Canada
Bioinformatics Institute, Singapore
CSIRO, Australia
Data Storage Institute, Singapore
EGEE, French National Center
GeoForschungsZentrum, Germany
James Cook University, Australia
KEK High Energy Physics, Japan
Max Planck Institute, Netherlands
Parallab, Norway
South Australian Advanced Computing
UIB (Parallab), Norway
University of Amsterdam
University of Cambridge, Astronomy
University of Cambridge, e-Science
University of Edinburgh
University of Genoa, Italy
University of Hong Kong
University of Manchester
University of Oslo
University of Southampton
York Univ (UK)
CiteSeer, Penn State
City Univ. of New York
Geospatial Environment, UCSD
Drexel University
EOSDIS Distributed Active, NASA Goddard
Georgia Tech
Kentucky State Libraries & Archives
Library of Congress
Los Alamos National Lab
NASA Ames
NASA Goddard Space Flight Center
NCSA Grid Computing
NIH (NCI Center for Bioinformatics)
Penn State University
Pittsburgh Supercomputing Center
Purdue University, Indiana
Stanford University
TACC, University of Texas
Texas A & M
UC Santa Cruz
UCLA
UCSD Neuroscience
University of Maryland
University of Michigan, CAC department
University of New Mexico
University of Washington
University of Wisconsin
USC
Yale University
Research Areas
• Characterization of data / information /
knowledge
• Preservation architecture
• Knowledge management - dynamic
application of preservation policies
• Persistent object - characterization of
digital entities
Characterizing Knowledge
• Data - bits that comprise a digital entity
• Information - a semantic label that is applied
to data
• Knowledge - relationships between semantic
labels
• Metadata - the combination of the semantic
label and the data
• The creation of a semantic label is driven by
the application of a process / relationship
• Information is the result of applying
knowledge relationships
• Information is the reification of knowledge
Knowledge Management
• Reify relationships to improve access
performance
• Easier to query on metadata than to apply the
original relationships
• Manage state information about the reification
process - support for relationship changes
• Support levels of granularity for application of
relationships - collective properties versus
procedural properties
• Goal is to build a scalable knowledge
management system
Preservation Strategies
• Emulation
• Migrate the display application onto new operating
systems
• Equivalent to forcing use of candlelight to look at 16th
century documents
• Transformative migration
• Migrate the encoding format to the new standard
• Migration period is expected to be 5-10 years
• Persistent object
• Characterize the encoding format
• Migrate the characterization forward in time
Persistent Objects
[Diagram: two timelines marked 1980, 1990, 2000, 2010, 2020, one for Display Applications and one for Digital Entities]
• Characterize standard manipulation operations
• Characterize encoding format - data structure
Preservation Standards
• OAIS - Open Archival Information
System
• Submission Information Package (SIP)
• Archival Information Package (AIP)
• Dissemination Information Package (DIP)
• Producer Archive Interface Abstract
Methodology Standard
• (CCSDS Document 651.0-R-1)
Containers
• SRB provides support for aggregation
of files into a container
• AIP is the aggregation of both
preservation context and the records
into a container
• What is the appropriate form for a self-describing container?
Self-instantiating Archive
• Preservation of Digital Data with Self-Validating, Self-Instantiating Knowledge-Based Archives, B. Ludäscher, R. Marciano,
R. Moore, SIGMOD Record, ACM, 30(3), pp.
54-63, 2001.
• An archives consists of the application of
archival processes to create the collection
managed in the preservation environment
• Instantiation corresponds to the application of
the archival processes to the original data
Example Web Crawl
• National Science Digital Library maintains
registry of URLs for education material at
Cornell
• Crawl sites
• Recursion to a depth of 10 redirections
• Restriction to pages within initial site plus one level
outside site
• Store material on processing platform
• 70,000 URLs - 2 million digital entities, 200 GB
• On average 30 files per URL,
• Each file with average size 100 kBytes
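The averages quoted above can be checked against the rounded totals with a few lines of arithmetic. Decimal units (1 GB = 1,000,000 kB) are an assumption here.

```python
urls = 70_000
files_per_url = 30
avg_file_kb = 100

digital_entities = urls * files_per_url            # files harvested by the crawl
total_gb = digital_entities * avg_file_kb / 1e6    # kB -> GB, decimal units assumed
print(digital_entities, total_gb)
```

This gives 2.1 million digital entities and 210 GB, consistent with the rounded figures of 2 million and 200 GB.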
Collection Requirements
• Provide containers for managing small
files
• 26 million files, average size 100 kB
• Aggregate data in containers before storage
• Support web-based access to archived
data
• Redirect web page internal HTTP links to data
grid handles
• Support integrity
• Manage checksums on files
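Aggregating small files before storage can be sketched as greedy packing into fixed-capacity containers. The 40 MB capacity follows the archive file size mentioned in these slides; the function name, file names, and the simple first-fit policy are assumptions, not the SRB container implementation.

```python
def pack(files, capacity=40_000_000):
    """Greedily group (name, size) pairs into containers of at most `capacity` bytes.

    A single file larger than `capacity` is placed in a container of its own.
    """
    containers, current, used = [], [], 0
    for name, size in files:
        if current and used + size > capacity:
            containers.append(current)      # close the full container
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        containers.append(current)
    return containers


# 1,000 hypothetical files of 100 kB each
files = [(f"entity-{i:04d}", 100_000) for i in range(1000)]
containers = pack(files)
sizes = [len(c) for c in containers]
```

At 100 kB per file each container holds 400 files, so the archive stores a few large objects instead of thousands of small ones.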
Accessioning Web Sites
• Use OAI harvesting to extract URLs from the NSDL
repository
• Crawl each URL and process each digital entity
• Replace internal URLs with data grid logical names
• Aggregate digital entities into containers (files) for storage.
Archives store files that are 40 MBytes in size.
• Generate archival context
• Register digital entity into a data grid
• Use collection hierarchy to associate web crawl properties with
each file (date, site, initial URL, …)
• Write processed files into a storage system managed
by a data grid
• Replicate data on Grid Bricks and archival storage system
• Provide OAI interface for reporting validation results
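The link-replacement step can be sketched as follows. It is a simplification under stated assumptions: a regular expression stands in for a real HTML parser, only quoted `href` and `src` attributes are handled, and the logical naming scheme (`collection/site/path`) is hypothetical.

```python
import re
from urllib.parse import urljoin, urlparse


def rewrite_links(html: str, page_url: str, collection: str) -> str:
    """Point same-site href/src links at logical names under `collection`."""
    site = urlparse(page_url).netloc

    def replace(match):
        attr, quote, target = match.group(1), match.group(2), match.group(3)
        absolute = urlparse(urljoin(page_url, target))
        if absolute.netloc != site:
            return match.group(0)           # external link: leave untouched
        logical = f"{collection}/{site}{absolute.path}"
        return f"{attr}={quote}{logical}{quote}"

    return re.sub(r'(href|src)=(["\'])(.*?)\2', replace, html)


page = '<a href="/lessons/1.html">L1</a> <img src="http://other.org/x.gif">'
rewritten = rewrite_links(page, "http://example.org/index.html", "/nsdl/2003-04")
print(rewritten)
```

Internal links now resolve through the data grid, so an archived page keeps working after the original site changes or disappears.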
Persistent Archive Collections
• Build collections based on date crawled
• For each collection, use separate folder to
hold digital entities associated with the
original URL
• Typically 30 digital entities per URL
• Aggregate digital entities into containers before
storage
• Preservation metadata maintained for each
digital entity
• Administrative, descriptive, structural, behavioral
A Few Statistics on NSDL Content
SDSC Crawl (April 03, 4 Links Deep)
Status | Number of digital entities
received correctly | 1,530,206
no data received | 51
see other | 5
forbidden | 311
file not found | 38,386
internal server error | 946
application error | 15
service temp. overloaded | 8
WIMS User Error | 1
Gone | 1
unused | 1
redirection w/out location | 1
total digital entities | 1,569,932
error percentage | 2.53%
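The totals in these statistics can be recomputed from the per-status counts. The sketch treats every status other than "received correctly" as an error, which is the reading that reproduces the quoted percentage.

```python
status_counts = {
    "received correctly": 1_530_206,
    "no data received": 51,
    "see other": 5,
    "forbidden": 311,
    "file not found": 38_386,
    "internal server error": 946,
    "application error": 15,
    "service temp. overloaded": 8,
    "WIMS User Error": 1,
    "Gone": 1,
    "unused": 1,
    "redirection w/out location": 1,
}

total = sum(status_counts.values())
errors = total - status_counts["received correctly"]
error_pct = round(100 * errors / total, 2)
print(total, errors, error_pct)
```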
Encoding Formats Present in Archive
Digital Entity Type | Number of files
html | 331557
gif | 157891
jpg | 136445
xml | 21528
txt | 17433
pdf | 9369
css | 4073
doc | 862
asp | 819
ppt | 161
xls | 15
CSS - Cascading Style Sheet
ASP - Microsoft Active Server Page
Automated Processes: Categorizing the
“Space” of all Descriptive Patterns
• Data-driven validation of descriptive metadata
from NARA Archival Information Locator
records
• Exhaustive examination of every metadata occurrence
• Automatic creation of an open-source
relational database implementation
• Accumulation of all descriptive patterns
• Based on deriving “Descriptive Signatures”
relying on regular expressions
• Creation of a Perl-based Validation Regular Expression
Tool
• Refined regular expression to identify anomalies in the
legacy metadata
• Annotated artifacts introduced by archival processes
A String Analysis Approach
Accumulate all occurrence strings at each level of
description in the hierarchy, and derive a
regular expression that characterizes all
instances:
• Record Group OR Collection (total of ~550)
• Series
• File Unit
• Item OR ItemAV (audio-visual)
___________________________________________
• Physical Occurrence
• Media Occurrence
• Object
A String Analysis Approach
Example - structural characterization:
• At the Series level, possible patterns are
(S=Series, I=Item, O=Object, F=FileUnit):
• SIOSIOOOOOO
• SIO
• SFFFF
• SIIIIII
• SIOIOOOOO
• SFIIII
• An inferred regular expression is:
• S( F*(I+O+)* | I+ )*
• Relational tables are derived from these
regular expressions for each of the 9 levels
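The inferred expression can be tested against the sample patterns. In this sketch the layout whitespace is removed and the groups are made non-capturing. The first sample contains a second "S", so a repeated form (one or more consecutive Series) is also tried; that reading is an assumption, not something the slides state.

```python
import re

# The inferred expression from the slide, with layout whitespace removed.
SERIES = r"S(?:F*(?:I+O+)*|I+)*"

observed = ["SIOSIOOOOOO", "SIO", "SFFFF", "SIIIIII", "SIOIOOOOO", "SFIIII"]

# Does each observed string match exactly one Series?
single_series = {p: bool(re.fullmatch(SERIES, p)) for p in observed}

# Assumption: a string may describe several consecutive Series.
series_run = {p: bool(re.fullmatch(f"(?:{SERIES})+", p)) for p in observed}

for pattern in observed:
    print(pattern, single_series[pattern], series_run[pattern])
```

Strings that fail both checks are candidates for the anomaly analysis, since they describe a structure the majority signature does not cover.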
Metadata Validation
• Analyze each regular expression to
identify the classes of anomalies
• Cases in which a subset of the objects have a
unique characterization different from the
majority of the objects
• Identify cases with incorrect metadata tags
• Identify cases with missing metadata or
missing objects
• Identify changes in metadata definitions
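One way to surface the first class of anomaly is to count how often each descriptive signature occurs and flag the rare ones. This is an illustrative sketch in Python, not the Perl-based tool itself: the 10% threshold and record identifiers are assumptions, and the tag names are borrowed from the record group expressions.

```python
from collections import Counter


def signature(tags):
    """Collapse a tag sequence into a descriptive signature string."""
    return "".join(tags)


def find_anomalies(records, threshold=0.1):
    """Return records whose signature is used by fewer than `threshold` of all records."""
    counts = Counter(signature(tags) for tags in records.values())
    total = sum(counts.values())
    rare = {sig for sig, n in counts.items() if n / total < threshold}
    return {rid: signature(tags) for rid, tags in records.items()
            if signature(tags) in rare}


# Hypothetical records: 19 share one signature, one is missing the Mtld tag.
records = {f"rg-{i}": ["Ti", "Mtld", "Grno", "Tisd", "Tied"] for i in range(19)}
records["rg-19"] = ["Ti", "Grno", "Tisd", "Tied"]

anomalies = find_anomalies(records)
print(anomalies)
```

A flagged record is not necessarily wrong; it may reflect a change in metadata definitions, which is why each class of anomaly is reviewed separately.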
Regular Expressions
• COLLECTION (2 characterizations):
• ***********
• ="TiMtldColid(XcXs)*(Date)?(Ab)?(Tcsd(Tcsdq)?Tced(Tcedq)?)?(Tisd(Tisdq)?Tied(Tiedq)?)?"
• ="(Odonor)?(Pdonor)*(Daut(Ndad)?)?(FatFan)?Dcgsd"
• RECORD GROUP (2 characterizations):
• *************
• ="TiMtldGrno(Date)?(Tcsd)?(Tcsdq)?(Tced)?(Tcedq)?Tisd(Tisdq)?Tied(Tiedq)?"
• ="(FatFan)*Dcgsd"
Regular Expressions
• SERIES (5 characterizations):
• *******
• ="Ti(Altti)*MtldS(Grno)?(Formerrg)*(Colid)?"
• ="(Acnum)*(Arra)?(Chn)?(Date)?(Funcu)?(Gen)*(Numb)?(Scale)?(Ab)?(Tran)?(Itn)?(Staff)?(Rctno)*(Dano)*(XcXs)+"
• ="((Tcsd)?(Tcsdq)?Tced(Tcedq)?)?(Tisd(Tisdq)?Tied(Tiedq)?)?"
• ="(Grt)*(Srt)*(OcontrOcontrtp)*(Orefer)*(Tgn)*(Lan)*(PcontrPcontrtp)*(Prefer)*(Subj)*((Ars)?(Sar)*(Arsn)?)"
• ="(Urrs)?(Surr)?(Urrn)?((Daut)?(Ndad)?)*(Fat(Fan)?)*(MpiMpt(Mpn)?)*(Taed)?(Tst)?(CrorgCrorgtp)?(CrindCrindtp)*(Dcgsd)"
• --> SarSar: Decision to combine them
Regular Expressions
• ITEM (4 characterizations):
• *****
• ="Ti(Altti)*Mtld(Grno)?(Formerrg)?(Colid)?(Acnum)?(Arra)?(Date)?(Gen)*(Ab)?(Staff)?(XcXs)+"
• ="((Tcsd)*(Tcsdq)?(Tced)*(Tcedq)?)?"
• ="(Tpd(Tpdq)?)*(Grt)+(Srt)*(OcontrOcontrtp)*(Orefer)*(Tgn)*(PcontrPcontrtp)*(Prefer)*(Subj)*(Ars)?(Sar)*(Arsn)?"
• ="Urrs(Surr)?(Urrn)?((Daut)?Ndad)*(MpiMpt(Mpn)?)*(CrorgCrorgtp)?Dcgsd"
• --> SarSar: Decision to combine them
• --> Tcsd/Tced:
Regular Expression
• FILE UNIT (4 characterizations):
• **********
• ="Ti(Altti)?Mtld(Grno)?(Formerrg)?(Colid)?(Acnum)?(Arra)?(Gen)*(Ab)?(XcXs)+"
• ="((Tcsd)*(Tcsdq)?(Tced)*(Tcedq)?)?(Tisd(Tisdq)?Tied(Tiedq)?)?"
• ="(Grt)+(Srt)*(OcontrOcontrtp)?(Orefer)*(Tgn)*(PcontrPcontrtp)?(Prefer)*(Subj)*(Ars)?(Sar)*(Arsn)?"
• ="Urrs(Surr)?(Urrn)?((Daut)?Ndad)?(MpiMpt(Mpn)?)?(CrorgCrorgtp)?Dcgsd"
• --> SarSar: Decision to combine them
• --> Tcsd/Tced:
Lessons Learned
• Data-driven analysis of actual preservation
metadata can be used to implement a new
catalog on new technology
• Variant of self-instantiating archive, in which
the preservation structure and catalogs are
re-created
Preservation
• Archival processes through which a digital entity is
extracted from its creation environment and migrated
to a preservation environment, while maintaining
authenticity and integrity information.
• Extraction process requires insertion of support
infrastructure underneath the digital material,
characterization of the authenticity and integrity,
characterization of the digital encoding format, and
characterization of the display operations
• Goal is infrastructure independence, the ability to use
any commercial storage system, database, or access
mechanism
For More Information
Reagan W. Moore
San Diego Supercomputer Center
[email protected]
http://www.sdsc.edu/srb/