Managing Digital Objects on the Net by Robert E. Kahn Corporation for National Research Initiatives Reston, Virginia National Online 2001 New York City May 15, 2001


Managing Digital Objects on the Net
by
Robert E. Kahn
Corporation for National Research Initiatives
Reston, Virginia
National Online 2001
New York City
May 15, 2001
Digital Libraries & Publishing
• Content is Everything
• Rights, Interests & Value Prevail
• Technologists design & develop systems
• Lawyers control the use of content
• Copyright Law governs
• Systems must implement policy
• Policy is directly impacted by technology
• Collaboration is the name of the game
Issues to Consider
• What can you do with information accessed
on the network?
• How do you know that the information has
not been altered in some material way?
• How can you (as an owner) control your
information in the network environment?
• Must you retain physical copies for archival
purposes and for authenticity?
Business Potential
• Selling infrastructure technology & services
• Enabling Third Party value-added capabilities
• Helping organizations manage their own
information better & offer new types of
services
• Stimulating access to “surface information”
and “embedded information” with appropriate
access controls and conditions of use
Objective of the Framework
[Diagram: Internet objective: Best-effort Packet Delivery across Heterogeneous Networks. Framework objective: Seamless Interoperability across Heterogeneous Information Systems.]
Federating Heterogeneous Systems
A Digital Library Example
• Any material stored anywhere is accessible
the same way local material is accessible
• No fanfare about manifesting the material
• No limitations on time-frame if the material
and its supporting systems are “managed”
• Framework incorporates search & creation
but defers defining them for now
• Encourages third-party value-added services
Further Scoping the Problem
[Graph: Time to Resolve Query vs. Complexity of Query]
Initial Focus on Queries with Complexity = Zero
Key Attributes of the
Infrastructure
• Structured Information as Digital Objects
• Persistent, unique and resolvable identifiers
• Repositories to store Digital Objects
• “Terms & Conditions” for each Digital
Object supplied by the “owner” of the object
• Integrated in an open-architecture System
• In a communications network environment
Nature of the Repository
• Not like a bookshelf or a pantry
• More like a service-oriented restaurant
• One can “deposit” & “access” digital objects
• Deposit produces a “stored digital object”
• Access results in a “communications service”
that disseminates information
• Like restaurant ordering results in a culinary
service which results in an eating experience
Repository Access
Digital objects come into existence for a user group by having a handle
that can be accurately resolved by that group and by being stored in a
repository accessible by that group
[Diagram labels: Repository, Digital Object, Property Record, Transaction Record, Manifest Mechanisms, Other Repositories]
Repositories can be digital objects
Access means run a defined service
on a specified digital object
Deposit (H,Svc)
Access (H,Svc)
Disseminations appear as digital objects
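The deposit and access operations described on this slide can be illustrated with a small in-memory sketch. This is not the actual repository access protocol: the class names, the service names ("get", "size") and the simplified deposit signature are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str                      # the identifier H
    data: bytes
    properties: dict = field(default_factory=dict)


class Repository:
    """Toy in-memory repository (hypothetical, for illustration only)."""

    def __init__(self):
        self._store = {}             # handle -> stored digital object
        self.transactions = []       # simple transaction record
        # Hypothetical services a caller may name in Access(H, Svc).
        self._services = {
            "get": lambda obj: obj.data,
            "size": lambda obj: str(len(obj.data)).encode(),
        }

    def deposit(self, handle, data):
        """Deposit produces a stored digital object."""
        obj = DigitalObject(handle, data)
        self._store[handle] = obj
        self.transactions.append(("deposit", handle))
        return obj

    def access(self, handle, service):
        """Access runs a defined service on the specified digital object."""
        obj = self._store[handle]
        result = self._services[service](obj)
        self.transactions.append(("access", handle, service))
        # The dissemination is itself packaged as a digital object.
        return DigitalObject(handle, result, {"service": service})


repo = Repository()
repo.deposit("2304568.40/12345678", b"hello")
dissemination = repo.access("2304568.40/12345678", "size")
print(dissemination.data)
```

The point of the sketch is that access returns the result of a service, not the stored bits themselves, which is the restaurant analogy from the previous slide.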
Interactions between Repositories
[Diagram labels: Repository A, Repository B, Stored Digital Object, User’s Computer]
Nesting of Repositories
[Diagram: nested levels labeled Core, Structure and Content, with Aggregation & De-aggregation]
Core Interface must be present at each level
Other levels could be separately defined later
Digital Objects as Structured
Information
• Works are incorporeal
• Copies are material objects that embody
structured information
• A “Book” is a way of structuring information
• A copy of a Book can be produced as “ink on
paper”
• A “Digital Object” is another way of
structuring information
Digital Object Structuring
Every Digital Object consists of a set of typed bit sequences
Handle is the first Bit Sequence
Headers:
Header 1 ==> Ver; DT of Dissem; Orig of Dissem; TTL
Header 2 ==> Ver; DT of Deposit; Orig of Deposit; TTL
Element:
Bit Sequence ==> <type><length><value>
Types are resolvable
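The <type><length><value> layout can be sketched as a simple binary encoding. The field widths used here (a 2-byte type and a 4-byte length, big-endian) and the numeric type codes are assumptions chosen for illustration; the slide does not specify them.

```python
import struct

# Hypothetical type codes; in the framework, types are themselves resolvable.
TYPE_HANDLE = 1
TYPE_TEXT = 2

HEADER = ">HI"                       # assumed: 2-byte type, 4-byte length
HEADER_SIZE = struct.calcsize(HEADER)


def encode_element(type_id: int, value: bytes) -> bytes:
    """Encode one typed bit sequence as <type><length><value>."""
    return struct.pack(HEADER, type_id, len(value)) + value


def decode_elements(buf: bytes):
    """Decode a concatenation of <type><length><value> elements."""
    elements = []
    offset = 0
    while offset < len(buf):
        type_id, length = struct.unpack_from(HEADER, buf, offset)
        offset += HEADER_SIZE
        elements.append((type_id, buf[offset:offset + length]))
        offset += length
    return elements


# The handle is placed first, as on the slide.
encoded = (
    encode_element(TYPE_HANDLE, b"2304568.40/12345678")
    + encode_element(TYPE_TEXT, b"body")
)
decoded = decode_elements(encoded)
print(decoded)
```

Because every element carries its own length, a reader can skip element types it does not understand and still find the next element.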
MetaObjects & Metadata Registries
• MetaObjects provide a structural basis for
indirection and for organizing information
• Metadata is used to characterize digital
objects, to access their identifiers and to
assist in cross referencing
• Metadata Registries provide uniform access
to metadata.
The Handle System
• Distributed name service based on open
standard that is scalable, extendable, and
efficient
• First general purpose indirection system on the
Internet to provide user-defined state
information - optimized for speed & reliability
• Can be used to locate repositories that contain
digital objects given their handles
• More generally, can be used to provide indirect
references and other rapid lookup information
Handle System Features
• Full-featured name service that supports
both name resolution and administration
• Internationalized namespace that supports
non-ASCII native characters
• Secured name service that supports
client/server authentication, service integrity,
and confidentiality
• Persistent namespace that separates the
name of any underlying digital object from
its location
Handle Format
2304568.40/12345678
Prefix (Naming Authority): 2304568.40
Suffix (Item ID, any format): 12345678
In use, a Handle is an opaque string.
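As a minimal sketch of the format, a handle can be split into its naming authority and item ID at the first "/". The function name and the error handling are illustrative assumptions; in normal use the handle is passed around as an opaque string and is not parsed by clients.

```python
def split_handle(handle: str):
    """Split a handle into (prefix, suffix) at the first '/'.

    The suffix may be in any format, so it is left untouched
    even if it contains further '/' characters.
    """
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return prefix, suffix


prefix, suffix = split_handle("2304568.40/12345678")
print(prefix, suffix)
```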
Handles Resolve to Typed Data
Just one example - also looks like a digital object
Handle: 2304568.40/12345678
Handle Record (Extensible Data Types):
Data type    Handle data
URL          http://www.loc.gov/.....
URL          http://www.loc2.gov/..
RAP          loc/repository
XYZ          1001110011110
Handles can also have semantics but
we frown on it! Resolution is independent
of semantics in every instance
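Resolution to typed data can be sketched as a lookup that returns (type, value) pairs and optionally filters by type. The in-memory table, the function name and the example.org URLs are hypothetical placeholders; this is not the Handle System client API.

```python
# Hypothetical handle records: each handle maps to a list of typed values.
RECORDS = {
    "2304568.40/12345678": [
        ("URL", "http://example.org/copy-1"),   # placeholder value
        ("URL", "http://example.org/copy-2"),   # placeholder value
        ("RAP", "loc/repository"),
    ],
}


def resolve(handle, data_type=None):
    """Return the typed values for a handle, optionally filtered by type.

    The handle is treated as an opaque key: nothing is inferred
    from its spelling.
    """
    values = RECORDS.get(handle, [])
    if data_type is None:
        return list(values)
    return [value for type_name, value in values if type_name == data_type]


print(resolve("2304568.40/12345678", "URL"))
print(resolve("2304568.40/12345678", "RAP"))
```

A client that only understands one data type can ask for just that type and ignore the rest, which is what makes the set of types extensible.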
Handle Resolution
[Diagram: Handle Servers HS1, HS2, HS3 and HS4]
Insert, Delete, Change Handle Record for Ha
Resolve Handle for Hb
(Handles are uniformly spread by hashing)
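One way to spread handles uniformly over a set of servers is to hash the handle and take the result modulo the number of servers. The sketch below assumes SHA-256 and a fixed list of four servers named as on the slide; the hash function actually used by the Handle System is not stated here.

```python
import hashlib

SERVERS = ["HS1", "HS2", "HS3", "HS4"]


def server_for(handle: str, servers=SERVERS) -> str:
    """Pick the server responsible for a handle by hashing it.

    SHA-256 is an assumption; any well-distributed hash would do.
    """
    digest = hashlib.sha256(handle.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Administration and resolution of the same handle go to the same server.
print(server_for("Ha"), server_for("Hb"))
```

Because the mapping depends only on the handle, a client and an administrator independently arrive at the same server without consulting a directory.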
Administration of Handle Records
univ/thesis.txt
1217/4913527
univ/4913527
1217/thesis.txt
(the handles shown above identify digital objects)
univ
1217
univ.csl
univ.csl.17
univ.csl.17.2
1217.34
1217.34.1
Groups of Handle Servers
[Diagram: handle servers labeled P and S, organized into Group A, Group B, Group C and Group D]
Repositories & Digital Objects
Each Digital Object has its own unique & persistent ID
Content Providers want to assign IDs
Could be upwards of millions of DOs per Repository
[Diagram labels: IPv6, REPOSITORY]
CORDS
• Copyright Office Registration, Recordation
& Deposit System
• Allows on-line Registration of claims to
copyright
• Permits qualified external repositories
• Retains signed applications with
fingerprints of submitted digital objects
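The idea of retaining a fingerprint of a submitted digital object can be sketched with a cryptographic hash. SHA-256 and the function names are assumptions for illustration; the fingerprinting method used by CORDS is not described on the slide.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Compute a fingerprint of a digital object's bits (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_fingerprint: str) -> bool:
    """Check a later copy against the fingerprint retained at registration."""
    return fingerprint(data) == recorded_fingerprint


submitted = b"text of the deposited work"
recorded = fingerprint(submitted)

print(is_unaltered(submitted, recorded))
print(is_unaltered(submitted + b" (edited)", recorded))
```

A retained fingerprint lets a copy held in an external repository be checked later without the registry having to keep the object itself.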
Federated Repositories
• Key issue is commonality of interests in
accessing information from multiple
repositories.
• Financial information is a prime application area
• Metadata Registries allow for searching based
on “user-supplied” inputs. The use of handles
(however branded) can simplify access.
• Access via local repositories is an operationally
desirable capability.
Archival Systems
• The Digital Object Infrastructure provides a
structural basis for the development of
archival systems
• It provides a solid conceptual basis for the
development of “federated repositories”
• It lends itself to long-term efficiencies as
archived information is ported from platform
to platform with evolution of the technology
Conclusions
• Managing Digital Objects is the challenge
• Technology Components are available from R&D
• Robust Versions are needed for industry
acceptance - needs commercialization
• Applications (with user-friendly interfaces) need
to be developed & deployed
• These can fundamentally alter the net, how it is
used and its impact on business and society