Digital Object Architecture - Max Planck Institute for

Persistent Identification: The Handle System
Larry Lannom
Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
http://www.handle.net/
Digital Object Architecture - Goals
• Framework for managing Digital (Information) Objects
• Give it a name and talk to it
– Don’t worry about where it is
– Don’t worry about what it’s made of
• Rise above details of application versions and content formats
Digital Object Architecture Components
• Client
• Repositories / Collections
• Resource Discovery
– Search Engines
– Metadata Databases
– Catalogues, Guides, etc.
• Resolution System
Handle System
• Go from name to attributes
• Fundamental indirection system for Digital
Object management on the net
• No free lunch
– Added layer of infrastructure
– Must be managed
CNRI Handle System
• Distributed, scalable, secure
• Enforces unique names
• Enables association of one or more typed values (e.g., URL) with each name
• Optimized for speed and reliability
• Open, well-defined protocol and data model
• Provides infrastructure for application domains, e.g., digital libraries, electronic publishing ...
Handle System Usage
• Library of Congress
• DTIC (Defense Technical Information Center)
• IDF (International DOI Foundation)
– CrossRef (scholarly journal consortium)
– Enpia (Korean content management technology firm)
– CDI (U.S. content management technology firm)
– LON (U.S. learning object technology firm)
– CAL (Copyright Agency Ltd - Australia)
– TSO (U.K. publisher & info mgmt service provider)
– MEDRA (Multilingual European DOI Registration Agency)
– Nielsen BookData (bibliographic data - ISBN)
– R.R. Bowker (bibliographic data - ISBN)
– Office of Publications of the European Community
• NTIS (National Technical Information Service)
• DSpace (MIT + HP)
• CORDRA (ADL's Federated Content Repository Model)
• Globus Toolkit (in development)
Handles Resolve to Typed Data
Handle: 10.123/456
Data Type | Index | Handle Data
URL | 1 | http://acme.com/….
URL | 2 | http://a-books.com/….
DLS | 9 | acme/repository
HS_ADMIN | 100 | acme.admin/jsmith
XYZ | 12 | 1001110011110
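As a rough illustration of this data model, the sketch below stores a handle record as typed values keyed by index, using the 10.123/456 example. It assumes indexes must be unique within one handle, and it simplifies the HS_ADMIN value to the administrator handle string; the real admin value (RFC 3651) carries more structure, such as permission bits. `HandleValue` and `make_record` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class HandleValue:
    """One typed value in a handle record."""
    index: int  # unique within the handle
    type: str   # e.g. "URL", "DLS", "HS_ADMIN"
    data: str


def make_record(values: Iterable[HandleValue]) -> Dict[int, HandleValue]:
    """Key the values by index, rejecting duplicate indexes."""
    record: Dict[int, HandleValue] = {}
    for value in values:
        if value.index in record:
            raise ValueError(f"duplicate index {value.index}")
        record[value.index] = value
    return record


# The 10.123/456 example; HS_ADMIN is simplified to the admin handle.
record = make_record([
    HandleValue(1, "URL", "http://acme.com/…."),
    HandleValue(2, "URL", "http://a-books.com/…."),
    HandleValue(9, "DLS", "acme/repository"),
    HandleValue(100, "HS_ADMIN", "acme.admin/jsmith"),
    HandleValue(12, "XYZ", "1001110011110"),
])

for index in sorted(record):
    value = record[index]
    print(value.type, index, value.data)
```

Because the type is just a string, new kinds of data (like the XYZ value) can be attached without changing the record structure.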
The Two Types of Handle Query
1. Request all data
"Give me all data associated with handle 10.1000/123."
[Diagram: the client sends the query to the Handle System (Global Handle Registry, GHR, and Local Handle Services, LHS) and receives every value of the handle.]
Handle: 10.1000/123
Index | Type | Data
3 | URL | URL1 (Server in US)
2 | URL | URL2 (Server in Asia)
5 | URL | URL3 (Server in Europe)
10 | PK | public key
9 | EM | email address
4 | IP | rights data
2. Request all data of a given type
"Give me all data of type URL associated with handle 10.1000/123."
[Diagram: the same query path through the Handle System (GHR and LHSs), but only the values of type URL are returned.]
Handle: 10.1000/123
Index | Type | Data
3 | URL | URL1 (Server in US)
2 | URL | URL2 (Server in Asia)
5 | URL | URL3 (Server in Europe)
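The difference between the two queries can be sketched with an in-memory store. This is a minimal Python model, not the Handle System protocol: `STORE` and `resolve` are hypothetical names, and the real resolution request (RFC 3652) can also restrict a query by index, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HandleValue:
    index: int
    type: str
    data: str


# Hypothetical store holding the 10.1000/123 values shown on the slides.
STORE = {
    "10.1000/123": [
        HandleValue(3, "URL", "URL1 (Server in US)"),
        HandleValue(2, "URL", "URL2 (Server in Asia)"),
        HandleValue(5, "URL", "URL3 (Server in Europe)"),
        HandleValue(10, "PK", "public key"),
        HandleValue(9, "EM", "email address"),
        HandleValue(4, "IP", "rights data"),
    ],
}


def resolve(handle: str, value_type: Optional[str] = None) -> List[HandleValue]:
    """Query type 1 when value_type is None (all values); otherwise query type 2."""
    values = STORE.get(handle, [])
    if value_type is None:
        return list(values)
    return [v for v in values if v.type == value_type]


print(len(resolve("10.1000/123")))  # 6
print([v.data for v in resolve("10.1000/123", "URL")])
```

The typed query lets a client that only wants locations avoid receiving keys, contact addresses and rights data.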
Handle Resolution
The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.
[Diagram: a client, the Global Handle Registry (GHR) and several Local Handle Services (LHS); each service has Sites 1 … n, and each site has servers #1 … #n.]
Example handle: 123.456/abc
URL | 4 | http://www.acme.com/
URL | 8 | http://www.ideal.com/
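The service/site/server hierarchy maps naturally onto nested records. The classes below are a hypothetical sketch, not part of any Handle System API; the addresses are borrowed from the Acme Local Handle Service example in this deck, and 2641 is the Handle System's registered port.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    number: int
    address: str
    port: int = 2641  # registered Handle System port, as on the slides


@dataclass
class Site:
    name: str
    is_primary: bool
    servers: List[Server] = field(default_factory=list)


@dataclass
class HandleService:
    """A handle service: one or more replicated sites."""
    name: str
    sites: List[Site] = field(default_factory=list)

    def primary_site(self) -> Site:
        return next(s for s in self.sites if s.is_primary)

    def server_count(self) -> int:
        return sum(len(s.servers) for s in self.sites)


acme = HandleService("Acme Local Handle Service", [
    Site("Primary Site", True, [Server(1, "123.45.67.8"), Server(2, "123.52.67.9")]),
    Site("Secondary Site B", False, [Server(1, "123.45.67.4")]),
])
print(acme.primary_site().name, acme.server_count())
```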
Handle Clients
Request to Client: Resolve hdl:10.1000/1
1. The client sends a request to the Global Handle Registry to resolve 0.NA/10.1000 (the naming authority handle for 10.1000).
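Step 1 only needs the handle's prefix. A minimal sketch, assuming the prefix is everything before the first "/" (the suffix may itself contain further slashes); `naming_authority_handle` is a hypothetical helper name.

```python
def naming_authority_handle(handle: str) -> str:
    """Return the naming authority handle (0.NA/<prefix>) for a handle."""
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    prefix, sep, suffix = handle.partition("/")  # split at the first "/" only
    if not (prefix and sep and suffix):
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    return "0.NA/" + prefix


print(naming_authority_handle("hdl:10.1000/1"))  # 0.NA/10.1000
```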
Handle Clients
Request to Client: Resolve hdl:10.1000/1
2. The Global Handle Registry responds with the service information for 10.1000.

Service Information - Acme Local Handle Service
Site | Server | IP Address | Port # | Public Key | ...
Primary Site | Server 1 | 123.45.67.8 | 2641 | K03RLQ... | ...
Primary Site | Server 2 | 123.52.67.9 | 2641 | 5&M#FG... | ...
Secondary Site A | Server 1 | 321.54.678.12 | 2641 | F^*JLS... | ...
Secondary Site A | Server 2 | 321.54.678.14 | 2641 | 3E$T%... | ...
Secondary Site A | Server 3 | 762.34.1.1 | 2641 | A2S4D... | ...
Secondary Site B | Server 1 | 123.45.67.4 | 2641 | N0L8H7... | ...
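In step 3 the client contacts one particular server within a site. The service information tells the client how handles are spread across a site's servers by hashing. The function below is a sketch under an assumed scheme (MD5 of the upper-cased handle, read as an integer and taken modulo the number of servers); the actual hash options and byte selection are defined in RFC 3651 and may differ. `pick_server` is a hypothetical name.

```python
import hashlib


def pick_server(handle: str, server_count: int) -> int:
    """Return a 1-based server number within a site for the given handle.

    Assumed scheme: MD5 of the upper-cased UTF-8 handle, read as a
    big-endian integer, modulo the number of servers in the site.
    """
    if server_count < 1:
        raise ValueError("a site needs at least one server")
    digest = hashlib.md5(handle.upper().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % server_count + 1


# Secondary Site A has three servers in the Acme example.
print(pick_server("10.1000/1", 3))
```

Since every client computes the same hash, no lookup table of handle-to-server assignments is needed; each site only publishes its server list.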
Handle Clients
Request to Client: Resolve hdl:10.1000/1
3. The client queries Server 3 in Secondary Site A of the Acme Local Handle Service for 10.1000/1.
[Diagram: the Acme Local Handle Service with a Primary Site and Secondary Sites A and B, each holding numbered servers.]
Handle Clients
Request to Client: Resolve hdl:10.1000/1
4. The server responds with the handle data.
[Diagram: Server 3 in Secondary Site A of the Acme Local Handle Service returns the handle data to the client.]
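Steps 1 to 4 can be strung together in a toy resolver. Everything here is hypothetical: the registry is a plain dict from naming authority handles to sites, the within-site hash is an assumed MD5-modulo scheme, and the client simply tries each site in turn, which stands in for the site selection a real client performs. The URL value is illustrative only.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Value = Tuple[str, str]  # (type, data)


def naming_authority_handle(handle: str) -> str:
    """0.NA/<prefix>: the handle the Global Handle Registry is asked about."""
    return "0.NA/" + handle.split("/", 1)[0]


def server_index(handle: str, server_count: int) -> int:
    """Assumed hashing: MD5 of the upper-cased handle modulo the server count."""
    digest = hashlib.md5(handle.upper().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % server_count


class Site:
    """A replicated site: every site of a service holds all of its handles."""

    def __init__(self, name: str, server_count: int) -> None:
        self.name = name
        self.servers: List[Dict[str, List[Value]]] = [{} for _ in range(server_count)]

    def store(self, handle: str, values: List[Value]) -> None:
        self.servers[server_index(handle, len(self.servers))][handle] = values

    def query(self, handle: str) -> Optional[List[Value]]:
        return self.servers[server_index(handle, len(self.servers))].get(handle)


def resolve(ghr: Dict[str, List[Site]], uri: str) -> Optional[List[Value]]:
    handle = uri[len("hdl:"):] if uri.startswith("hdl:") else uri
    sites = ghr[naming_authority_handle(handle)]  # steps 1-2: service information
    for site in sites:                            # step 3: query a server in a site
        values = site.query(handle)
        if values is not None:                    # step 4: handle data returned
            return values
    return None


# Hypothetical setup: one service with a primary and a secondary site.
primary, site_a = Site("Primary Site", 2), Site("Secondary Site A", 3)
for site in (primary, site_a):
    site.store("10.1000/1", [("URL", "http://www.acme.com/")])
ghr = {"0.NA/10.1000": [site_a, primary]}

print(resolve(ghr, "hdl:10.1000/1"))  # [('URL', 'http://www.acme.com/')]
```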
Handle Clients
[Diagram: a web client sends an HTTP GET for http://hdl.handle.net/123.456/abc to a proxy/web server, which resolves the handle against the Handle System (GHR and LHSs), receives the handle data, and answers with an HTTP redirect. A handle administration client also works with the Handle System.]
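The proxy's job reduces to: take the handle from the request path, resolve it, and redirect to a URL value. A sketch with a hypothetical in-memory `HANDLES` table, using the 123.456/abc values from the Handle Resolution slide; choosing the URL with the lowest index is an assumption, not necessarily what hdl.handle.net does.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote, urlsplit

# Hypothetical handle data: (type, index, data) for 123.456/abc.
HANDLES: Dict[str, List[Tuple[str, int, str]]] = {
    "123.456/abc": [
        ("URL", 4, "http://www.acme.com/"),
        ("URL", 8, "http://www.ideal.com/"),
    ],
}


def proxy_response(request_url: str) -> Tuple[int, Dict[str, str]]:
    """Return the (status, headers) a resolving proxy might send for an HTTP GET."""
    handle = unquote(urlsplit(request_url).path).lstrip("/")
    urls = sorted((index, data) for vtype, index, data in HANDLES.get(handle, [])
                  if vtype == "URL")
    if not urls:
        return 404, {}
    return 302, {"Location": urls[0][1]}  # assumption: lowest-index URL wins


print(proxy_response("http://hdl.handle.net/123.456/abc"))
# (302, {'Location': 'http://www.acme.com/'})
```

The browser never speaks the handle protocol; it only follows an ordinary HTTP redirect.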
Handle Clients
[Diagram: a client with a handle plug-in sends a resolve-handle request for hdl:/123.456/abc directly to the Handle System (GHR and LHSs) and receives the handle data; a handle administration client also works with the Handle System.]
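A plug-in first has to recognize an hdl: URI and split it into prefix and suffix. A minimal parser; tolerating the extra "/" after "hdl:" (as in hdl:/123.456/abc) is an assumption made so that both that form and hdl:10.1000/1 parse. `parse_hdl_uri` is a hypothetical name.

```python
from typing import Tuple


def parse_hdl_uri(uri: str) -> Tuple[str, str]:
    """Split an hdl: URI into (prefix, suffix)."""
    scheme = "hdl:"
    if not uri.lower().startswith(scheme):
        raise ValueError(f"not an hdl: URI: {uri!r}")
    handle = uri[len(scheme):].lstrip("/")  # assumption: tolerate hdl:/ and hdl://
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"missing prefix or suffix in {uri!r}")
    return prefix, suffix


print(parse_hdl_uri("hdl:/123.456/abc"))  # ('123.456', 'abc')
```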
Handle Clients
[Diagram: handle administration over the Web: a web client sends HTTP requests to a web server hosting admin forms, which uses the Handle Admin API to work with the Handle System (GHR and LHSs).]
Handle Clients
[Diagram: handle administration with a custom client alongside the Web client; both work with the Handle System (GHR and LHSs).]
Handle Clients
[Diagram: handle administration embedded in another process, alongside the Web client, working with the Handle System (GHR and LHSs).]
Handle Clients
[Diagram: both handle resolution and handle administration embedded in another process, working with the Handle System (GHR and LHSs).]
HS Administration
• Ownership is at the handle level
• Administrators defined by handles
• Administrator handles contain keys
• All admin transactions validated via challenge/response from server to client
• Allows distributed administration
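The challenge/response idea can be sketched in a few lines. To stay within Python's standard library, this substitutes an HMAC over a shared secret for a public-key signature, and the key table `ADMIN_KEYS` is hypothetical; the real message formats and algorithms are specified in RFC 3652. Binding the response to both a fresh challenge and the request bytes means a captured answer cannot be replayed for another transaction.

```python
import hashlib
import hmac
import secrets

# Hypothetical: the admin handle "acme.admin/jsmith" holds this secret key.
ADMIN_KEYS = {"acme.admin/jsmith": b"example-shared-secret"}


def make_challenge() -> bytes:
    """Server side: a fresh random nonce for each admin transaction."""
    return secrets.token_bytes(16)


def answer_challenge(secret: bytes, challenge: bytes, request: bytes) -> bytes:
    """Client side: prove possession of the key, bound to this request."""
    return hmac.new(secret, challenge + request, hashlib.sha256).digest()


def verify(admin_handle: str, challenge: bytes, request: bytes, response: bytes) -> bool:
    """Server side: recompute using the key found via the admin handle."""
    secret = ADMIN_KEYS.get(admin_handle)
    if secret is None:
        return False
    expected = hmac.new(secret, challenge + request, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
request = b"MODIFY 10.1000/1"  # hypothetical request bytes
response = answer_challenge(b"example-shared-secret", challenge, request)
print(verify("acme.admin/jsmith", challenge, request, response))  # True
```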
Handle System Usage
• Prefixes
– DOI - 900
– Other - 400
• Handles
– DOI - 14M
– Other - unknown
• Global
– Three service sites (all currently in VA)
– 10M resolutions last month
Handle System Management and Standards
• Specification
– RFC 3650: Overview
– RFC 3651: Namespace and Service Definition
– RFC 3652: Protocol
• HSAC - Handle System Advisory Committee
• URI/URL/URN
– IETF favors URN; we don’t see any advantage
• Extra layer of indirection, still need the native protocol
– Many other groups pressuring for URI
– What are the practical implications?
– Open to advice
Handle Application Issues
• What are you identifying?
– Abstraction? Manifestation?
– Version of a manifestation of an abstraction?
– This specific set of bits?
• When are two things the same?
– For the purpose of what?
– No one thing is the same as another thing (or they wouldn’t be two things)
• “Roughly speaking, to say of two things that they are identical is nonsense, and to say of one thing that it is identical with itself is to say nothing at all” (L.W.)
• What explicit metadata should be made available?
• Type system for handle tuples
• To what does the handle resolve? (Given a handle, what can I do?)
• How to bind the handle and the resource
– Resource contains the handle?
– Handle contains a fingerprint?
• How do you support the infrastructure?
• Persistence is a social/organizational issue: the Handle System is just a building block
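One reading of "Handle contains a fingerprint?" is to store a cryptographic hash of the exact bits as a handle value. The sketch below assumes a hypothetical value type `FINGERPRINT.SHA256` (not a registered type) and hypothetical article content; note that it only recognizes "this specific set of bits", so any reformatting of the resource breaks the match.

```python
import hashlib
from typing import Iterable, Tuple

FINGERPRINT_TYPE = "FINGERPRINT.SHA256"  # hypothetical type, not a registered one


def fingerprint_value(content: bytes) -> Tuple[str, str]:
    """A (type, data) pair to store in the handle alongside its URL."""
    return FINGERPRINT_TYPE, hashlib.sha256(content).hexdigest()


def matches(values: Iterable[Tuple[str, str]], content: bytes) -> bool:
    """True if any fingerprint value in the handle matches these exact bytes."""
    digest = hashlib.sha256(content).hexdigest()
    return any(vtype == FINGERPRINT_TYPE and data == digest for vtype, data in values)


article = b"<html>...article.html...</html>"  # hypothetical content
values = [("URL", "http://abc.com/article.html"), fingerprint_value(article)]
print(matches(values, article))          # True
print(matches(values, article + b"\n"))  # False: one extra byte breaks it
```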
www.handle.net
Mirroring
[Diagram: a Local Handle Service made up of a Primary Site (Servers P1, P2, P3), Secondary Site "A" (Servers SA1, SA2) and Secondary Site "B" (Servers SB1, SB2, SB3, SB4).]
Mirroring
[Diagram: Local Handle Service with Primary Site servers P1-P3, Secondary Site "A" servers SA1-SA2 and Secondary Site "B" servers SB1-SB4.]
When Secondary Site "A" started running, each secondary server sent a request to each server in the Primary Site asking for updates.
Mirroring
[Diagram: Local Handle Service with Primary Site servers P1-P3, Secondary Site "A" servers SA1-SA2 and Secondary Site "B" servers SB1-SB4.]
Each server P1-P3 "knows" which handles in its transaction log hash to which secondary server, and sends them.
Each secondary will continue to request updates on a regular basis. The request is made in the form of "all transactions since transaction X".
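The pull-based mirroring can be modelled with a transaction log and a filter. This is a sketch under assumptions: transaction ids increase monotonically within each primary server's log, the log is sorted by id, and the handle-to-server hash is an assumed MD5-modulo scheme rather than the one defined in RFC 3651. `Transaction`, `updates_for` and `SecondaryServer` are hypothetical names.

```python
import hashlib
from typing import Dict, List, NamedTuple


class Transaction(NamedTuple):
    txn_id: int  # assumed to increase monotonically within one primary's log
    handle: str
    action: str


def server_for(handle: str, server_count: int) -> int:
    """Assumed hash: MD5 of the upper-cased handle modulo the server count (0-based)."""
    digest = hashlib.md5(handle.upper().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % server_count


def updates_for(log: List[Transaction], since: int,
                server: int, server_count: int) -> List[Transaction]:
    """Primary side: all transactions after `since` whose handle hashes to `server`."""
    return [t for t in log
            if t.txn_id > since and server_for(t.handle, server_count) == server]


class SecondaryServer:
    def __init__(self, number: int, site_size: int) -> None:
        self.number = number        # 0-based position within its site
        self.site_size = site_size  # number of servers in the secondary site
        self.last_seen: Dict[str, int] = {}  # primary server -> last txn id seen
        self.applied: List[Transaction] = []

    def pull(self, primary: str, log: List[Transaction]) -> None:
        """Ask one primary server for "all transactions since transaction X"."""
        since = self.last_seen.get(primary, 0)
        self.applied.extend(updates_for(log, since, self.number, self.site_size))
        if log:  # the log is sorted, so its last id covers everything examined
            self.last_seen[primary] = max(since, log[-1].txn_id)


log = [Transaction(i, f"10.1000/{i}", "ADD") for i in range(1, 7)]
site_a = [SecondaryServer(0, 2), SecondaryServer(1, 2)]  # like SA1 and SA2
for server in site_a:
    server.pull("P1", log)
print(sum(len(s.applied) for s in site_a))  # 6: each transaction lands once
```

Repeated pulls are cheap because each secondary only asks for what it has not yet seen.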
Mirroring
[Diagram: a client working with a Local Handle Service made up of a Primary Site (Servers P1, P2, P3), Secondary Site "A" (Servers SA1, SA2) and Secondary Site "B" (Servers SB1, SB2, SB3, SB4).]
For example, for a given new administrative action, the admin client knows from the hash that the action must be performed on Primary Server P2. Server P2 then knows to send that action to Server SA2 in Secondary Site "A" and to Server SB1 in Secondary Site "B".
Metadata Collection and DOI Registration
[Diagram: Publishers 1 … n send handle data and metadata to a Metadata Collection Service (indexes, filters, queries). The handle data goes into the Handle System; the wholesale metadata goes on to other data services and VARs.]
Appropriate Copy Problem
[Diagram: at XYZ University, a reference carries the DOI for article.html in ABC Journal. The link http://dx.doi.org/10.123/456 goes to the dx.doi.org proxy server, which resolves 10.123/456 through the Handle System to http://abc.com/article.html at the ABC Journal publisher (abc.com), even though XYZ University holds a local copy of article.html.]
Appropriate Copy Problem: solved
[Diagram: the reference at XYZ University links to http://dx.doi.org/10.123/456?cookie. The dx.doi.org proxy server understands cookies and redirects to XYZ University's local server, which asks for metadata ("Metadata?"), keeps a metadata database, and serves the local copy of article.html in ABC Journal instead of the copy at the publisher (abc.com).]
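The cookie-aware proxy decision can be sketched as a single function. The cookie name `institution`, the `LOCAL_SERVERS` table and the local server URL are hypothetical; the point is only that the proxy redirects to an institution's local server when it recognizes the cookie, and otherwise to the URL registered in the Handle System.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Hypothetical data for illustration only.
HANDLE_URLS = {"10.123/456": "http://abc.com/article.html"}
LOCAL_SERVERS = {"xyz-university": "http://local.xyz.example/resolve"}


def redirect_target(doi: str, cookies: Dict[str, str]) -> Optional[str]:
    """Where a cookie-aware proxy might redirect a request for `doi`."""
    local = LOCAL_SERVERS.get(cookies.get("institution", ""))
    if local is not None:
        # Known institution: send the user to its local server with the DOI.
        return f"{local}?doi={quote(doi, safe='')}"
    # Otherwise fall back to the URL registered in the Handle System.
    return HANDLE_URLS.get(doi)


print(redirect_target("10.123/456", {}))
# http://abc.com/article.html
print(redirect_target("10.123/456", {"institution": "xyz-university"}))
# http://local.xyz.example/resolve?doi=10.123%2F456
```

The DOI stays the same in the reference; only the redirect decision changes, which is why the local server can later fall back to the publisher when it holds no copy.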
Appropriate Copy Problem: solved w/o local copy
[Diagram: the cookie-aware dx.doi.org proxy server still redirects to XYZ University's local server, but there is no local copy of article.html (marked X). Using metadata ("Metadata?") and its metadata database, the local server sends the user on to article.html at abc.com, the ABC Journal publisher.]
Appropriate Copy Problem: extensible solution
[Diagram: http://dx.doi.org/10.123/456?cookie goes to the cookie-aware dx.doi.org proxy server, which redirects to XYZ University's local server. The local server asks the Handle System for the metadata location ("Metadata Location?"), receives e.g. Meta1.com, and asks that service for the metadata ("Metadata?"). Meta1.com, Meta2.com and Meta3.com are metadata collection services. The local copy of article.html is marked X, so the user is sent to article.html at abc.com, the ABC Journal publisher.]