Using Digital Library Technology to Support Data Intensive


Digital Library Architecture
Reagan Moore
Chaitan Baru
Amarnath Gupta
George Kremenek
Bertram Ludaescher
Richard Marciano
Arcot Rajasekar
Wayne Schroeder
Michael Wan
Ilya Zaslavsky
Bing Zhu
(http://www.npaci.edu/DICE/)
National Partnership for Advanced Computational Infrastructure
What Types of Management Systems are Required?
• Data management
  • Ability to access multiple types of storage systems across separate administrative domains
• Information management
  • Ability to migrate a collection onto a new information repository
• Knowledge management
  • Rule-based ontology mapping
  • Characterization of the rules under which a collection is formed
  • Management of knowledge bases - Topic Maps
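As an illustration of the data management requirement, the following is a minimal sketch of a uniform access layer over several storage systems. Every name in it (StorageDriver, InMemoryDriver, Broker, the resource names and paths) is hypothetical and chosen for this example; it is not the API of any system named in these slides, and the in-memory driver only stands in for a real file system, archive, or database back end.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical uniform interface over one kind of storage system."""

    @abstractmethod
    def put(self, path, data):
        ...

    @abstractmethod
    def get(self, path):
        ...


class InMemoryDriver(StorageDriver):
    """Stand-in for a real back end (file system, archive, database)."""

    def __init__(self, domain):
        self.domain = domain  # administrative domain that owns the resource
        self._objects = {}

    def put(self, path, data):
        self._objects[path] = data

    def get(self, path):
        return self._objects[path]


class Broker:
    """Routes each request to the driver registered under a resource name."""

    def __init__(self):
        self._resources = {}

    def register(self, resource, driver):
        self._resources[resource] = driver

    def put(self, resource, path, data):
        self._resources[resource].put(path, data)

    def get(self, resource, path):
        return self._resources[resource].get(path)

    def copy(self, src, dst, path):
        """Copy an object between resources, possibly across domains."""
        self.put(dst, path, self.get(src, path))


broker = Broker()
broker.register("disk-cache", InMemoryDriver(domain="site-a"))
broker.register("archive", InMemoryDriver(domain="site-b"))
broker.put("disk-cache", "/survey/image001", b"pixels")
broker.copy("disk-cache", "archive", "/survey/image001")
```

The caller names a resource and a path, and never deals with the protocol of the underlying system; supporting a new storage type in this sketch means adding one driver class.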
Information Management Hierarchy
• Persistent Archives
  • Storage of the information model and data model along with the data
• Data Grid
  • Access to data in a different administrative domain
• Digital Library - Presentation / Information Discovery
  • Interlib - ADEPT, UC Berkeley Digital Library
• Data Collection
  • Extensible Meta-data Catalog - EMCAT
• Data Handling
  • SDSC Storage Resource Broker - SRB
• Archival Storage
  • High Performance Storage System - HPSS
Digital Library Data Management
• Persistent identifiers
  • Ability to move a data set without the name changing
• Data set replicas
  • Management of multiple copies of a data set
• Archival backup of data sets
  • Integration of disk data caches with archival storage
• Persistent archives
  • Management of a collection through multiple cycles of technology evolution
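The first three items can be sketched as a small catalog that maps a persistent logical name to one or more physical replicas. This is only an illustration under stated assumptions: the Catalog and Replica classes, the resource names, and the paths are all hypothetical and do not reproduce the schema of any catalog mentioned in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # where the copy lives, e.g. a disk cache or an archive
    physical_path: str  # location of the copy on that resource


@dataclass
class Catalog:
    """Hypothetical catalog mapping persistent logical names to replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, replica):
        self.entries.setdefault(logical_name, []).append(replica)

    def replicas(self, logical_name):
        return list(self.entries[logical_name])

    def move(self, logical_name, old_resource, new_replica):
        """Relocate one copy; the logical name stays the same."""
        kept = [r for r in self.entries[logical_name]
                if r.resource != old_resource]
        self.entries[logical_name] = kept + [new_replica]

    def resolve(self, logical_name, preferred=("disk-cache", "archive")):
        """Pick a replica, trying the disk cache before the archive."""
        by_resource = {r.resource: r for r in self.entries[logical_name]}
        for resource in preferred:
            if resource in by_resource:
                return by_resource[resource]
        raise LookupError(f"no replica of {logical_name} on {preferred}")


catalog = Catalog()
name = "/collection/sky/tile-17"  # persistent identifier (made up)
catalog.register(name, Replica("disk-cache", "/cache/a1/tile17.fits"))
catalog.register(name, Replica("archive", "/hpss/sky/0017"))
first = catalog.resolve(name)

# Move the cached copy; users keep addressing the same logical name.
catalog.move(name, "disk-cache", Replica("disk-cache", "/cache/b9/tile17.fits"))
after_move = catalog.resolve(name)
```

Because applications only ever hold the logical name, a copy can be moved, replicated, or backed up to archival storage by updating the catalog alone.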
SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB / MCAT architecture diagram. Labels: Application; Resource; User; File SID; DBLobj SID; Obj SID; Remote Proxies; SRB; MCAT; Dublin Core; Application Meta-data; ADSM; HPSS; DB2; Oracle; Unix; Third-party copy; DataCutter]
Common Information Model
• eXtensible Markup Language (XML)
  • Use tags to define semantic context for components of the data set
• Document Type Definition (DTD)
  • Provides semi-structured representation for organizing tags that can be applied to groups of digital objects
• Development of standards for tags
  • Digital sky, Protein Data Bank, Neuroscience brain images
  • California Digital Library - Art Museum Image Consortium
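A small example may make the roles of XML and the DTD concrete. The element names, the DTD, and the sample record below are invented for illustration and are not one of the tag standards listed above. Python's standard library does not validate against a DTD, so the DTD is shown only for reference and a hand-written check stands in for validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for a collection of image records (reference only;
# xml.etree.ElementTree does not validate against it).
DTD = """
<!ELEMENT collection (object*)>
<!ELEMENT object (title, creator, format)>
<!ATTLIST object id ID #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT creator (#PCDATA)>
<!ELEMENT format (#PCDATA)>
"""

# A tagged record: each tag gives semantic context to one component.
DOCUMENT = """
<collection>
  <object id="img-001">
    <title>Brain section 12</title>
    <creator>Example Lab</creator>
    <format>image/tiff</format>
  </object>
</collection>
"""

REQUIRED_CHILDREN = ["title", "creator", "format"]


def check_object(obj):
    """Return True if the element has an id and its children in DTD order."""
    tags = [child.tag for child in obj]
    return "id" in obj.attrib and tags == REQUIRED_CHILDREN


root = ET.fromstring(DOCUMENT.strip())
objects = root.findall("object")
titles = {o.get("id"): o.findtext("title") for o in objects}
all_valid = all(check_object(o) for o in objects)
```

The same DTD can describe every object in a collection, which is what allows a group of digital objects to be queried by tag rather than by file layout.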
Applications
• Support for distributed data collections
• Federation of data collections to form a digital library
• Integration of digital libraries with archives
• Finding aids for federation of digital libraries through mediation of information
• Data grids for data access
• Persistent archives
Electronic Records Archive (ERA)
[Figure: ERA architecture diagram. Labels: TRANSFER; ACCESSION; ARCHIVES; REFERENCE; Media Handlers; Accessioning Work Bench (snapin); Reference Workbench (snapin); METADATA REPOSITORY; RECORDS REPOSITORY; TAPE; CD; DISK; Image; Photo; Video; Audio; Text; Geographical Information System; Compound Records; WRAPPER; UNWRAPPER; record; Arrangement; ARC; Presentation; FTP; WEB; Internet; Intranet; Query & Reference Tools; Retrieve Records; Catalog; Metadata wrapper; Database; Order Fulfillment]
More Information
http://www.npaci.edu/DICE