Digital Object Architecture
and the
Handle System
Larry Lannom
20 June 2006
Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
http://www.handle.net/
What is the Problem?
• Managing information in the Net over very long
periods of time – e.g. centuries or more
• Dealing with very large amounts of information in the
Net over time
• When information, its location(s) and even the
underlying systems may change dramatically over time
• Respecting and protecting rights, interests and value
A Meta-level Architecture
• Allows for arbitrary types of information systems
• Allows for dynamic formatting and data typing
• Can accommodate interoperability between
multiple different information systems
• Allows metadata schema to be identified and
typed
Digital Object Architecture: Motivation
• To reformulate the Internet architecture around the notion of
uniquely identifiable data structures
• Enabling existing and new types of information to be reliably
managed and accessed in the Internet environment over long
periods of time
• Providing mechanisms to stimulate innovation, the creation of
dynamic new forms of expression and to manifest older forms
• While supporting intellectual property protection and fine-grained
access control, and enabling well-formed business practices to
emerge
Digital Object Architecture: Components
– Digital Objects (DOs)
• Structured data, independent of the platform on which it was created
• Consisting of “elements” of the form <type,value>
• One of which is its unique, persistent identifier
– Resolution of Unique Identifiers
• Maps an identifier into “state information” about the DO
• Handle System is a general purpose resolution system
– Repositories from which DOs may be accessed
• And into which they may be deposited
– Metadata Registries
• Repositories that contain general information about DOs
• Supports multiple metadata schemes
• Can map queries into unique DO specifications (via handles)
What is a Digital Object?
• Defined data structure, machine independent
• Consisting of a set of elements
– Each of the form <type,value>
– One of which is the unique identifier
• Identifiers are known as “Handles”
– Format is “prefix/suffix”
– Prefix is unique to a naming authority
– Suffix can be any string of bits assigned by that authority
• Data structure can be parsed; types can be resolved within the
architecture
• Associated properties record and transaction record containing
metadata and usage information
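The structure described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the element type name "handle", and the use of text strings for suffixes (the slide allows any string of bits) are hypothetical choices, not part of the architecture's specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Handle:
    """An identifier of the form prefix/suffix."""
    prefix: str  # unique to a naming authority
    suffix: str  # assigned by that authority (modeled as text here)

    @classmethod
    def parse(cls, text: str) -> "Handle":
        # Split on the first slash only, so a suffix may contain slashes.
        prefix, sep, suffix = text.partition("/")
        if not sep or not prefix or not suffix:
            raise ValueError(f"not a prefix/suffix handle: {text!r}")
        return cls(prefix, suffix)

    def __str__(self) -> str:
        return f"{self.prefix}/{self.suffix}"


@dataclass
class DigitalObject:
    """A set of <type,value> elements, one of which is the identifier."""
    elements: list[tuple[str, object]] = field(default_factory=list)

    @property
    def handle(self) -> Handle:
        # "handle" is a hypothetical type name for the identifier element.
        for type_, value in self.elements:
            if type_ == "handle":
                return Handle.parse(str(value))
        raise LookupError("digital object has no identifier element")

    def values_of(self, type_: str) -> list[object]:
        """Return every value stored under the given element type."""
        return [v for t, v in self.elements if t == type_]
```

Keeping the elements as a plain list of pairs, rather than fixed fields, mirrors the slide's point that the data structure can be parsed without knowing the types in advance.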
Repository Notion
• Logical external interface: RAP (Repository Access Protocol)
• Any hardware & software configuration
Repositories & Digital Objects
• Each Digital Object has its own unique & persistent ID
• Content providers assign IDs
• No theoretical limits on number of DOs per repository
• Objects may be replicated in multiple repositories
• Repositories are accessed through RAP
Handle System
• Provides basic identifier resolution system for Internet
• Logically centralized, but physically distributed and highly
scalable
• Enables association of one or more typed values, e.g., IP
address, public key, URL, with each id
• Optimized for speed and reliability
• Secure resolution with its own PKI as an option
• Open, well-defined protocol and data model
• Provides infrastructure for application domains, e.g.,
digital libraries & publishing, network mgmt, id mgmt ...
Handles Resolve to Typed Data
Handle: 10.123/456

Data type   Index   Handle data
URL         1       http://acme.com/….
URL         2       http://a-books.com/….
DLS         9       acme/repository
HS_ADMIN    100     acme.admin/jsmith
XYZ         12      1001110011110
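The typed values listed for handle 10.123/456 can be modeled as a small record that is filtered by data type or by index. The sketch below is illustrative only: the `HandleValue` tuple and the two lookup helpers are hypothetical names, and the data is copied from the slide with its elisions left as they are.

```python
from typing import NamedTuple, Optional


class HandleValue(NamedTuple):
    index: int
    type: str
    data: str


# Values for handle 10.123/456 as shown on the slide; the "…." elisions
# are reproduced, not filled in.
RECORD = [
    HandleValue(1, "URL", "http://acme.com/…."),
    HandleValue(2, "URL", "http://a-books.com/…."),
    HandleValue(9, "DLS", "acme/repository"),
    HandleValue(100, "HS_ADMIN", "acme.admin/jsmith"),
    HandleValue(12, "XYZ", "1001110011110"),
]


def by_type(record: list[HandleValue], wanted: str) -> list[HandleValue]:
    """All values of one data type, e.g. every URL stored for the handle."""
    return [v for v in record if v.type == wanted]


def by_index(record: list[HandleValue], index: int) -> Optional[HandleValue]:
    """The value stored at one index, or None if that index is unused."""
    for v in record:
        if v.index == index:
            return v
    return None
```

A client that only needs a location would call `by_type(RECORD, "URL")` and ignore the other types, which is what lets one handle carry several kinds of state at once.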
Handle Resolution
• The Handle System is a collection of handle services, each of which
consists of one or more replicated sites, each of which may have one
or more servers.
[Figure: a client resolves handles through the Global Handle Registry (GHR) and Local Handle Services (LHS). Each service consists of Site 1, Site 2, ... Site n, and each site of servers #1, #2, ... #n. Example: handle 123.456/abc holds URL (index 4) http://www.acme.com/ and URL (index 8) http://www.ideal.com/.]
Handle Clients
Request to client: resolve hdl:10.1000/1
1. Client sends a request to the Global Handle Registry to resolve
0.NA/10.1000 (the naming authority handle for 10.1000)
2. Global Handle Registry responds with the service information for
10.1000, which describes the Acme Local Handle Service
Service Information - Acme Local Handle Service

Site               Server     IP Address      Port #   Public Key   ...
Primary Site       Server 1   123.45.67.8     2641     K03RLQ...    ...
Primary Site       Server 2   123.52.67.9     2641     5&M#FG...    ...
Secondary Site A   Server 1   321.54.678.12   2641     F^*JLS...    ...
Secondary Site A   Server 2   321.54.678.14   2641     3E$T%...     ...
Secondary Site A   Server 3   762.34.1.1      2641     A2S4D...     ...
Secondary Site B   Server 1   123.45.67.4     2641     N0L8H7...    ...
3. Client queries Server 3 in Secondary Site A of the Acme Local
Handle Service for 10.1000/1
4. Server responds with handle data
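The four resolution steps can be sketched with in-memory tables standing in for the Global Handle Registry and the Acme Local Handle Service. Several things here are assumptions made for illustration: the function names, the rule for choosing a server within a site (a stable hash of the handle), and the handle data stored for 10.1000/1, which the slides do not show. The site layout and addresses are the slide's illustrative values, not real hosts, and no network traffic is involved.

```python
import hashlib

# Global Handle Registry: naming authority handle -> service information
# (site name -> server addresses, as laid out on the slide).
GLOBAL_REGISTRY = {
    "0.NA/10.1000": {
        "Primary Site": ["123.45.67.8", "123.52.67.9"],
        "Secondary Site A": ["321.54.678.12", "321.54.678.14", "762.34.1.1"],
        "Secondary Site B": ["123.45.67.4"],
    }
}

# Handle data held by the local handle service. This record is made up
# for the sketch; the slides do not list the values of 10.1000/1.
LOCAL_SERVICE_DATA = {
    "10.1000/1": [(1, "URL", "http://www.example.org/")],
}


def naming_authority_handle(handle: str) -> str:
    """Map 10.1000/1 to 0.NA/10.1000, the handle that describes its prefix."""
    prefix = handle.split("/", 1)[0]
    return f"0.NA/{prefix}"


def pick_server(handle: str, servers: list[str]) -> str:
    # Assumption: spread handles across a site's servers with a stable hash.
    digest = hashlib.sha256(handle.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def resolve(handle: str, site: str = "Primary Site"):
    # Steps 1-2: ask the global registry for the prefix's service information.
    service_info = GLOBAL_REGISTRY[naming_authority_handle(handle)]
    # Step 3: query one server within the chosen site.
    server = pick_server(handle, service_info[site])
    # Step 4: that server answers with the handle data.
    return server, LOCAL_SERVICE_DATA[handle]
```

Calling `resolve("10.1000/1", site="Secondary Site A")` returns the address of one of that site's three servers together with the stored values. Because every site is a replica, any site can answer the same query.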
Handle Clients
• Web client via proxy: the web client sends an HTTP GET for
http://hdl.handle.net/123.456/abc to a proxy/web server, which
resolves the handle in the Handle System (GHR and LHSs) and uses the
returned handle data to answer with an HTTP redirect
• Client plug-in: the client resolves hdl:/123.456/abc itself, sending
a resolve handle request to the Handle System and receiving handle
data
• Web-based administration: a web client reaches admin forms on a web
server over HTTP; the web server uses the Handle Admin API
• Custom client
• Handle administration client
• Handle resolution and handle administration embedded in another
process
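For the web-client case, turning a handle into the proxy URL shown on the slide is a string operation. The helper below is a minimal sketch: `proxy_url` is a hypothetical name, and percent-encoding the handle with `urllib.parse.quote` is an assumption about what a careful client would do, not something the slides specify.

```python
from urllib.parse import quote

PROXY = "http://hdl.handle.net/"  # proxy address shown on the slide


def proxy_url(handle: str, proxy: str = PROXY) -> str:
    """Build the URL a web client would GET to resolve a handle via the proxy."""
    # Accept "hdl:123.456/abc" as well as the slide's "hdl:/123.456/abc".
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    handle = handle.lstrip("/")
    # Percent-encode characters that are unsafe in a URL path; keep "/".
    return proxy + quote(handle, safe="/")
```

The web client then follows the HTTP redirect it receives, so no handle-aware software is needed on the client side.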
Handle System Usage
• Library of Congress
• DTIC (Defense Technical Information Center)
• IDF (International DOI Foundation)
– CrossRef (scholarly journal consortium)
– CAL (Copyright Agency Ltd - Australia)
– MEDRA (Multilingual European DOI Registration Agency)
– Nielsen BookData (bibliographic data - ISBN)
– R.R. Bowker (bibliographic data - ISBN)
– Office of Publications of the European Community
– German National Library of Science and Technology
• NTIS (National Technical Information Service)
• DSpace (MIT + HP)
• ADL (DoD Advanced Distributed Learning initiative)
• Assorted Digital Library Projects
• In development: Globus Alliance
Handle System Usage
• Assigned Prefixes (June 06)
– DOI - 1772
– Other - 801
• Handles
– DOI - 22+ M
– Other - Additional millions (total per prefix known only to prefix manager;
LANL adding 600M but privately)
• Global
– Core: three service sites (added locations being considered)
– 53 M resolutions
Handle System Management and Standards
• Specification
– RFC 3650: Overview
– RFC 3651: Namespace and Service Definition
– RFC 3652: Protocol
• DoDI 1322
– Will mandate Handle System use as part of ADL-R
• ISO standards track for DOI
• HSAC - Handle System Advisory Committee
– Approx 15 members representing big users
– Goal: evolve to oversee the system
ADL Registry (ADL-R)
• Technological and Organizational Infrastructure
– Register the existence and access conditions for Learning Objects
relevant to the DoD ‘Enterprise’
– Provide user interface to search the registry
• Integrates existing technologies
– Handle System for identification and access
– XML for object description and submission
– LOM metadata
– Repository for metadata object storage and access
– Lucene search engine
• Running at CNRI in initial production phase
ADL-R Input
[Figure: collections (ATSC with content objects A1-A3, NAVAIR with N1-N4, Marines with M1, M10, M20) submit metadata to ADL-R input processing (Parse, Authenticate, Validate, Return), which feeds the registry's metadata objects and search engine. Handles are kept in the Handle System (GHR and LHSs; the figure labels DTIC, LOC, IDF, ADL-R, NSDL and UWisc).]
1. NAVAIR has Handle Prefix 123 and names N1 hdl:123/4. It submits
metadata for N1 as XML:
    <xml>
    <title>Course 1</title>
    <org>J-School</org>
    <hdl>123/4</hdl>
    ........
    </xml>
2. Input processing returns results and a log.
3. Input process creates Metadata Object for N1 named hdl:abc/d...
4. ...and creates two handles: hdl:abc/d for the Metadata Object &
hdl:123/4 for the Content Object.
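The input steps (parse, authenticate, validate, create the Metadata Object and its handles, return) can be sketched as one function. This is a simplified illustration, not the ADL-R implementation: the `register` function, the prefix check used as a stand-in for authentication, the required-field list, the numeric suffix generator, and the dictionary standing in for the Handle System are all assumptions. The sample XML repeats the fields shown on the slide and omits the elided ones.

```python
import itertools
import xml.etree.ElementTree as ET

SUBMISSION = """<xml>
  <title>Course 1</title>
  <org>J-School</org>
  <hdl>123/4</hdl>
</xml>"""

REGISTRY_PREFIX = "abc"         # prefix of the slide's metadata handle hdl:abc/d
_suffixes = itertools.count(1)  # hypothetical suffix generator


def register(submission_xml: str, submitter_prefix: str, handles: dict) -> str:
    """Process one submission and return the new metadata object's handle."""
    # Parse: flatten the submission into a field -> text mapping.
    root = ET.fromstring(submission_xml)
    record = {child.tag: (child.text or "").strip() for child in root}

    # Authenticate (stand-in): a submitter may only name objects under
    # its own handle prefix, e.g. NAVAIR under prefix 123.
    content_handle = record.get("hdl", "")
    if content_handle.split("/", 1)[0] != submitter_prefix:
        raise PermissionError(f"{content_handle!r} is outside prefix {submitter_prefix}")

    # Validate: the required-field list is an assumption for this sketch.
    missing = [f for f in ("title", "org", "hdl") if not record.get(f)]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Create the metadata object and the two handles.
    metadata_handle = f"{REGISTRY_PREFIX}/{next(_suffixes)}"
    handles[metadata_handle] = {"kind": "metadata", "record": record}
    handles[content_handle] = {"kind": "content", "metadata": metadata_handle}

    # Return the result to the submitter.
    return metadata_handle
```

After `register(SUBMISSION, "123", handles)`, the dictionary holds one entry for the metadata object and one for content object 123/4, each reachable through its own handle.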
Searching the Registry
1. Client does a search. Results point to Metadata Object abc/d.
2. If desired, client gets Metadata Object abc/d to view full registry
metadata.
3. Client decides to get Content Object N1 and resolves handle 123/4
to get its access location and other conditions.
4. Client requests a copy of Content Object N1 from NAVAIR.
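The search path can be traced with three small lookup tables standing in for the registry, the Handle System, and the NAVAIR repository. Everything here is a simplified sketch: the table contents beyond the handles and names on the slides are placeholders, the substring match stands in for the registry's search engine, and the function names are hypothetical.

```python
# Registry: metadata objects, keyed by their own handles.
METADATA_OBJECTS = {
    "abc/d": {"title": "Course 1", "org": "J-School", "hdl": "123/4"},
}

# Handle System: the content handle resolves to where the object lives.
HANDLE_RECORDS = {
    "123/4": {"repository": "NAVAIR"},
}

# Repositories: the content itself (placeholder bytes for N1).
REPOSITORIES = {
    "NAVAIR": {"123/4": b"<content of N1>"},
}


def search(term: str) -> list[str]:
    """Step 1: return handles of metadata objects whose title matches."""
    term = term.lower()
    return [h for h, md in METADATA_OBJECTS.items() if term in md["title"].lower()]


def get_metadata(metadata_handle: str) -> dict:
    """Step 2: fetch the full registry metadata for one result."""
    return METADATA_OBJECTS[metadata_handle]


def fetch_content(content_handle: str) -> bytes:
    """Steps 3-4: resolve the content handle, then ask that repository."""
    record = HANDLE_RECORDS[content_handle]
    return REPOSITORIES[record["repository"]][content_handle]
```

The registry never stores the content itself. It stores only the content object's handle, so NAVAIR can move N1 and update the handle record without touching the registry.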
ADL-R
[Figure: ADL-R acts as the CORDRA registry of one CORDRA community, holding object-level metadata for that community's content repositories.]
CORDRA
[Figure: a federation of CORDRA communities. Each community has a CORDRA registry and content repositories. A master registry of registries and intermediate registries of registries hold federation-level metadata about the community registries.]