The Handle System:
and its role in a Digital Object Architecture
Robert E. Kahn
CNRI
Workshop on Frontiers
in Distributed Information Systems
Presidio of San Francisco
July 31 – August 1, 2003
Objective of the Framework
[Figure: two parallel objectives. Internet objective: best-effort packet delivery across heterogeneous networks. Framework objective: seamless interoperability across heterogeneous information systems, by federating heterogeneous systems.]
Internet Comparison
• IP Addresses → Machines
• Gateways (now routers) help with access
• TCP handles end-end issues
– Remove duplicate packets
– Restructure the arriving fragmented stream
– Perform end-end error detection & retransmission
– Provide flow control
Further Scoping the Problem
[Figure: plot of Time to Resolve Query against Complexity of Query, annotated “Initial Focus on Queries with Complexity = Zero”. Application areas shown: Literary, Music, Video, Financial, Grid, Enum, RFID. Other labels: “Simple Lookup” (URL, IP addresses) and “Unfederated Databases”.]
Basic Attributes of the Approach
• Digital Objects (i.e. Data Structures)
• Unique Identifiers → Digital Objects
• Resolution & Administration Mechanism
– Maintains Uniqueness of Ids → DOs as long as they persist
– Maps Ids → Useful State Information
– Is distributed and scalable
– Does not involve complete search
Digital Object
• Set of elements, each of <Type, Value>
• Parsable across heterogeneous platforms
• One element must be the unique identifier
• Properties Record contains metadata
• Transaction Record records usage
• Most users wish to access its Essence
• Key Metadata is part of the Essence
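A minimal sketch of the structure listed above, assuming a digital object can be modeled as a list of <Type, Value> elements. The class names and the type names ("identifier", "properties", "essence") are hypothetical and chosen only for illustration; the framework itself does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """One <Type, Value> pair of a digital object."""
    type: str
    value: object


@dataclass
class DigitalObject:
    """A set of typed elements; one of them carries the unique identifier."""
    elements: list = field(default_factory=list)

    def identifier(self):
        # Assumption: the identifier element uses the type name "identifier".
        ids = [e.value for e in self.elements if e.type == "identifier"]
        if len(ids) != 1:
            raise ValueError("expected exactly one identifier element")
        return ids[0]

    def values_of(self, type_name):
        """Return every value stored under the given type name."""
        return [e.value for e in self.elements if e.type == type_name]


do = DigitalObject([
    Element("identifier", "2304568.40/12345678"),
    Element("properties", {"title": "example"}),
    Element("essence", b"payload bytes"),
])
print(do.identifier())
```

Because every element is tagged with its type, a reader on any platform can walk the list and skip types it does not understand.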
Digital Object
[Figure: a Digital Object drawn as an Internal Data Structure surrounded by Methods and Disseminators]
The internal data structure is not directly accessible by the programmer.
Access to the object is subject to control by the owner. For example, a market in disseminators is possible.
Purposely Silent about
• What Types
• What Type of Types
• What Values
• What metadata or metadata schema
• What state information in Handle Records
• Policies and Procedures in general
• There are policies for Global, however
A Range of Possibilities
• Identifiers are persistent – e.g. DOIs
• Identifiers are transient – e.g. Grid
• Identifiers are resolvable
• Resolution information is not accessible
• Digital Objects are fixed, unchangeable
• Access to Digital Objects is fixed, even if DOs are changeable
Repository Notion
[Figure: a repository built on any hardware & software configuration, reached through a logical external interface (RAP)]
Nesting of Repositories
[Figure: nested repositories with aggregation & de-aggregation; levels labeled Content, Structure, and Core]
Core Interface must be present at each level
Other levels could be separately defined later
Federated Repositories
• The key issue is commonality of interests in accessing information from multiple repositories.
• Financial information is a prime application area.
• Metadata Registries allow for searching based on “user-supplied” inputs. The use of handles (however branded) can simplify access.
• Access via local repositories is an operationally desirable capability.
MetaObjects & Metadata Registries
• MetaObjects provide a structural basis for
indirection and for organizing information
• Metadata is used to characterize digital
objects, to access their identifiers and to
assist in cross referencing
• Metadata Registries provide uniform access
to metadata.
Handle Format
2304568.40/12345678
Prefix (Naming Authority): 2304568.40
Suffix (Item ID, any format): 12345678
In use, a Handle is an opaque string.
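The prefix/suffix split can be sketched as follows. The helper name `split_handle` is hypothetical. Because the item ID may take any format, the sketch splits only at the first slash and leaves the rest of the string untouched.

```python
def split_handle(handle):
    """Split a handle into (prefix, suffix) at the first '/'.

    The prefix is the naming authority; the suffix is the item ID,
    which may itself contain further slashes.
    """
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    return prefix, suffix


print(split_handle("2304568.40/12345678"))
```

Clients that only resolve handles do not need this split, since in use the handle is passed around as an opaque string.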
Handles Resolve to Typed Data
Just one example - also looks like a digital object
Handle: 2304568.40/12345678
Handle Record (extensible data types):
Data type | Handle data
URL       | http://www.loc.gov/.....
URL       | http://www.loc2.gov/..
RAP       | loc/repository
XYZ       | 1001110011110
Handles can also have semantics but we frown on it! Resolution is independent of semantics in every instance.
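A minimal sketch of resolving a handle to typed data, using the example record from this slide. The in-memory table and the function name `resolve` are illustrative assumptions, not the Handle System client API; the URL values are kept truncated exactly as they appear on the slide.

```python
# Hypothetical in-memory store: handle -> list of (data type, handle data).
HANDLE_RECORDS = {
    "2304568.40/12345678": [
        ("URL", "http://www.loc.gov/....."),
        ("URL", "http://www.loc2.gov/.."),
        ("RAP", "loc/repository"),
        ("XYZ", "1001110011110"),
    ],
}


def resolve(handle, wanted_type=None):
    """Return all typed values of a handle, or only those of one type."""
    values = HANDLE_RECORDS[handle]
    if wanted_type is None:
        return list(values)
    return [(t, d) for (t, d) in values if t == wanted_type]


print(resolve("2304568.40/12345678", "URL"))
```

The lookup treats the handle as an opaque key and never inspects its contents, which matches the point that resolution is independent of semantics.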
Allocation of Prefixes
1 - System Uses
2 - High Fan in/out Organizations
3 - “
4 - Businesses and formal organizations
5 - “
6 - Individuals and anything that can't fit above
7 - “
8 - “
Creating & Resolving Type Information Dynamically
• Prefixes of the form 0.X are reserved for
defining resolvable “system information”
such as types and naming authorities
• 0.type/<type> is a handle for the type in
brackets
• 0.na/<na> is a handle for a particular na
• Non-system types can also be created by
individual users
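The 0.X convention can be sketched with a few helpers. The function names are hypothetical; only the `0.type/<type>` and `0.na/<na>` forms come from the slide.

```python
def type_handle(type_name):
    """Handle that identifies a data type, e.g. 0.type/URL."""
    return f"0.type/{type_name}"


def na_handle(naming_authority):
    """Handle that identifies a naming authority, e.g. 0.na/1895.22."""
    return f"0.na/{naming_authority}"


def is_system_handle(handle):
    """True if the handle's prefix has the reserved form 0.X."""
    prefix = handle.partition("/")[0]
    return prefix.startswith("0.")


print(type_handle("URL"), na_handle("1895.22"))
```

Since type and naming-authority definitions are themselves handles, a client can look them up with the same resolution mechanism it uses for any other handle.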
Global Handle Resolution
[Figure: multiple handle servers (HS1, HS2, HPS3, HS4) serving both HANDLE ADMINISTRATION and HANDLE RESOLUTION]
(Handles are uniformly spread by hashing)
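One way to picture handles being spread across servers by hashing is sketched below. The choice of SHA-256 and the modulo step are assumptions made for illustration; the slide does not say which hash function or assignment rule the Handle System uses. The server labels are copied from the figure.

```python
import hashlib

# Server labels as they appear in the figure.
SERVERS = ["HS1", "HS2", "HPS3", "HS4"]


def server_for(handle, servers=SERVERS):
    """Pick a server deterministically from a hash of the handle."""
    digest = hashlib.sha256(handle.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce modulo the
    # number of servers (illustrative rule, not the real protocol).
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


for h in ["1895.22/1011", "2304568.40/12345678", "1217/thesis.txt"]:
    print(h, "->", server_for(h))
```

Any client that knows the server list and the hash rule can compute where a handle lives, so no search across servers is needed.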
Global & Local Handle Resolution
[Figure: Global Handle Servers (HS1, HS2, HPS3, HS4) with HANDLE ADMINISTRATION and HANDLE RESOLUTION, alongside Local HANDLE RESOLUTION]
How do handles resolve...
[Figure: a Handle Client, the GHR, and local handle services LHS A, LHS B, LHS C, LHS D ... LHS n inside the Handle System]
1. Client to GHR: Where is 1895.22/1011? The GHR returns: Map of LHS B
2. Client to LHS B: Give me all data for 1895.22/1011. LHS B returns: Handle Data
Two steps to resolve a handle:
• Client queries GHR: “Which Handle Service has 1895.22/1011?”
• GHR responds with a “map” showing the client which servers within the responsible LHS it can query for that handle.
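The two-step exchange can be sketched as an in-memory simulation. Everything here is hypothetical scaffolding (the tables, the server names `lhs-b-1` and `lhs-b-2`, the example URL, and the function names); it shows only the order of the queries, not the wire protocol.

```python
# Step 1 data: the GHR maps a prefix to the responsible local handle
# service and the servers within it (the "map").
GHR = {
    "1895.22": {"service": "LHS B", "servers": ["lhs-b-1", "lhs-b-2"]},
}

# Step 2 data: each server of the local handle service holds handle records.
LHS_SERVERS = {
    "lhs-b-1": {},
    "lhs-b-2": {"1895.22/1011": [("URL", "http://example.org/1011")]},
}


def query_ghr(handle):
    """Step 1: ask the GHR which service is responsible for the handle."""
    prefix = handle.partition("/")[0]
    return GHR[prefix]


def query_lhs(server, handle):
    """Step 2: ask one server of the local handle service for the data."""
    return LHS_SERVERS[server].get(handle)


def resolve(handle):
    service_map = query_ghr(handle)
    for server in service_map["servers"]:
        data = query_lhs(server, handle)
        if data is not None:
            return data
    raise KeyError(handle)


print(resolve("1895.22/1011"))
```

In this sketch the client simply tries each server in the map in turn; a real client could instead use the map to go straight to the right server.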
Administration of Handle Records
Example handles: univ/thesis.txt, 1217/4913527, univ/4913527, 1217/thesis.txt
(the handles shown above identify digital objects)
Naming authorities shown in the figure:
univ, univ.csl, univ.csl.17, univ.csl.17.2
1217, 1217.34, 1217.34.1
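The naming authorities in this figure appear to nest by dotted segments. Assuming each dot marks one further level of delegation (an interpretation of the figure, not something the slide states), the chain of authorities above a given name can be listed like this. The function name is hypothetical.

```python
def authority_chain(naming_authority):
    """List a naming authority and its ancestors, outermost first.

    Assumption: each dot-separated segment adds one level of delegation.
    """
    parts = naming_authority.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(authority_chain("univ.csl.17.2"))
print(authority_chain("1217.34.1"))
```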
The Global Handle Registry
[Figure: the Global Handle Registry connected to local handle services: DOI Handle Service, MIT Handle Service, CMU Handle Service, LOC Handle Service, DTIC Handle Service, Twin Bays Handle Service, Liqid Krystal Handle Service, Korean Ctrl Lib Handle Service, Nat’l Lib Australia Handle Service]
• The GHR is a unique handle service that stores the identity and
location of all local handle services (LHS) and tells a handle client
which service to query to resolve a handle.
• All handle clients (for resolution or administration) know how to
contact and query the GHR.
Groups of Handle Servers
[Figure: handle servers labeled P and S arranged into Group A, Group B, Group C, and Group D]
Handle Clients
Administration
Use the Java™ Handle Client Tool provided in the distribution for creating or updating handles one-at-a-time or via a batch.
or
Develop your own administration client.
Handle Clients
Resolution
Download a web browser plug-in which enables browsers to recognize the handle protocol.
or
Append a handle to the URL of a proxy server (e.g. http://hdl.handle.net/<handle>) which understands both HTTP and HDL protocols.
or
Develop your own resolution client.
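The proxy option amounts to building a URL from the handle. A minimal sketch, assuming the handle only needs standard percent-encoding of unsafe characters; the helper name `proxy_url` is hypothetical, and no network request is made.

```python
from urllib.parse import quote

PROXY = "http://hdl.handle.net/"


def proxy_url(handle):
    """Build the proxy URL for a handle, percent-encoding unsafe characters."""
    # Keep '/' unescaped so the prefix/suffix separator stays readable.
    return PROXY + quote(handle, safe="/")


print(proxy_url("2304568.40/12345678"))
```

Opening the resulting URL in any browser lets the proxy do the handle resolution, so no plug-in or custom client is required.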
Setting up a Local Handle Service...
• Download the software from
http://www.handle.net.
• Follow the instructions in the installation script.
• Send your “site bundle”, containing the IP address
of your server and your administrator information,
to the Global Handle Registry (GHR)
administrator.
Organization of the International DOI Foundation
[Figure: organization chart]
Members (mostly book & journal publishers) → Membership Dues → IDF
IDF:
- Policies & Procedures
- Licensing the DOI™
- Qualifying RAs
- Marketing the DOI brand
CDD:
- 4¢ per DOI on deposit – 1X; min $20K/yr
- 1¢ per DOI in CDD on 12/31 – annual
- ½¢ per DOI in CDD after $50K per RA
IDF is a non-profit organization with offices in Washington, DC (AAP) and Geneva, Switzerland (IPA).
Business Potential
• Enabling new forms of Creativity
– New forms of expression
– Representing value as Digital Objects
• Selling infrastructure technology & services
• Enabling Third Party value-added capabilities
• Helping organizations manage their own information
better & offer new types of services
• Stimulating access to “surface information” and
“embedded information” with appropriate access
controls and conditions of use
Evolution of Policy for Global
• Original Policy
– Best efforts service; run in-house
– Cost paid by the Government
– Available to the research community for free
• Current Policy (still in flux)
– Best efforts service; run 7x24 with backup
– Free to the research community; commercial users pay
after a period of experimentation
– Handle System Advisory Committee oversees costs and
evolution.
Cost of Global Services
• IPv4: several million addresses; about 50M TLDs
(excluding CCs)
• At say $20 per year per TLD, the cost of global
registration and resolution services is about $1B per
year – this is inefficient, very profitable or both
• The handle system is almost as large as DNS (there are
over 10M DOIs alone) and costs about $250K per year
at present.
• The DNS can be run within the handle system, if
desired; but the handle system can support IPv4 and
IPv6 without DNS
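The arithmetic behind these figures can be checked directly. The inputs are the slide's own round numbers; the per-DOI figure is only a rough upper bound, since the slide says there are over 10M DOIs and DOIs are only part of the handle system.

```python
# Figures quoted on the slide.
registrations = 50_000_000       # "about 50M TLDs (excluding CCs)"
fee_per_year = 20                # "say $20 per year"
dns_cost_per_year = registrations * fee_per_year

handle_cost_per_year = 250_000   # "about $250K per year"
doi_count = 10_000_000           # "over 10M DOIs alone"
handle_cost_per_doi = handle_cost_per_year / doi_count

print(f"DNS registration/resolution: ${dns_cost_per_year:,} per year")
print(f"Handle system: at most ${handle_cost_per_doi:.3f} per DOI per year")
```

On these numbers the DNS estimate is $1B per year against $250K per year for the handle system, a ratio of 4000 to 1.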
Applications of the Technology
• Identity Management (DHS)
• PKI Infrastructure
• Personal Locator Information
• Efficient Communications
• Steganography
• Managing Digital Cash
• Managing Business Transactions (e.g. email)
• Learning of more up to date Publications
• Cataloguing and Indexing