Managing Digital Objects on the Net by Robert E. Kahn Corporation for National Research Initiatives Reston, Virginia National Online 2001 New York City May 15, 2001
Digital Libraries & Publishing
• Content is Everything
• Rights, Interests & Value Prevails
• Technologists design & develop systems
• Lawyers control the use of content
• Copyright Law governs
• Systems must implement policy
• Policy is directly impacted by technology
• Collaboration is the name of the game

Issues to Consider
• What can you do with information accessed on the network?
• How do you know that the information has not been altered in some material way?
• How can you (as an owner) control your information in the network environment?
• Must you retain physical copies for archival purposes and for authenticity?

Business Potential
• Selling infrastructure technology & services
• Enabling Third Party value-added capabilities
• Helping organizations manage their own information better & offer new types of services
• Stimulating access to “surface information” and “embedded information” with appropriate access controls and conditions of use

Objective of the Framework
[Diagram: Internet objective: Heterogeneous Networks, Best-effort Packet Delivery. Framework objective: Heterogeneous Information Systems, Seamless Interoperability. Caption: Federating Heterogeneous Systems]

A Digital Library Example
• Any material stored anywhere is accessible the same way local material is accessible
• No fanfare about manifesting the material
• No limitations on time-frame if the material and its supporting systems are “managed”
• Framework incorporates search & creation but defers on defining it for now
• Encourages third-party value-added services

Further Scoping the Problem
[Chart: Time to Resolve Query vs. Complexity of Query]
Initial Focus on Queries with Complexity = Zero

Key Attributes of the Infrastructure
• Structured Information as Digital Objects
• Persistent, unique and resolvable identifiers
• Repositories to store Digital Objects
• “Terms &
Conditions” for each Digital Object supplied by the “owner” of the object
• Integrated in an open-architecture System
• In a communications network environment

Nature of the Repository
• Not like a bookshelf or a pantry
• More like a service-oriented restaurant
• One can “deposit” & “access” digital objects
• Deposit produces a “stored digital object”
• Access results in a “communications service” that disseminates information
• Like restaurant ordering, which results in a culinary service, which results in an eating experience

Repository Access
Digital objects come into existence for a user group by having a handle that can be accurately resolved by that group and by being stored in a repository accessible by that group

Repository
[Diagram labels: Digital Object; Property Record; Transaction Record; Manifest Mechanisms; Other Repositories]
Repositories can be digital objects
Access means run a defined service on a specified digital object
Deposit (H,Svc)
Access (H,Svc)
Disseminations appear as digital objects

Interactions between Repositories
[Diagram labels: Repository A; Repository B; Stored Digital Object; User’s Computer]

Nesting of Repositories
Aggregation & De-aggregation
[Diagram labels: Content; Core; Structure Core]
Interface must be present at each level
Other levels could be separately defined later

Digital Objects as Structured Information
• Works are incorporeal
• Copies are material objects that embody structured information
• A “Book” is a way of structuring information
• A copy of a Book can be produced as “ink on paper”
• A “Digital Object” is another way of structuring information

Digital Object Structuring
Every Digital Object consists of a set of typed bit sequences
Digital Object Headers
Header 1 ==> Ver; DT of Dissem; Orig of Dissem; TTL
Header 2 ==> Ver; DT of Deposit; Orig of Deposit; TTL
Handle: Handle is the first Bit Sequence Element
Bit Sequence ==> <type><length><value>
Types are resolvable

MetaObjects & Metadata Registries
• MetaObjects provide a structural basis for indirection and
for organizing information
• Metadata is used to characterize digital objects, to access their identifiers and to assist in cross referencing
• Metadata Registries provide uniform access to metadata

The Handle System
• Distributed name service based on an open standard that is scalable, extensible, and efficient
• First general purpose indirection system on the Internet to provide user-defined state information - optimized for speed & reliability
• Can be used to locate repositories that contain digital objects given their handles
• More generally, can be used to provide indirect references and other rapid lookup information

Handle System Features
• Full-featured name service that supports both name resolution and administration
• Internationalized namespace that supports non-ASCII native characters
• Secured name service that supports client/server authentication, service integrity, and confidentiality
• Persistent namespace that separates the name of any underlying digital object from its location

Handle Format
2304568.40/12345678
Prefix: Naming Authority (2304568.40)
Suffix: Item ID, any format (12345678)
In use, a Handle is an opaque string.

Handles Resolve to Typed Data
Just one example - also looks like a digital object
Handle: 2304568.40/12345678
Extensible Data Types

Data type    Handle data
URL          http://www.loc.gov/.....
URL          http://www.loc2.gov/..
RAP          loc/repository
XYZ          1001110011110

Handle Record
Handles can also have semantics but we frown on it!
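The handle format and typed handle record described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Handle System's actual data model or API: the class names `Handle` and `HandleRecord`, the method names, and the example.org URLs are hypothetical. Only the sample handle, the prefix/suffix split, and the `URL` and `RAP` type names come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Handle:
    """Hypothetical model of a handle: <prefix>/<suffix>."""
    prefix: str  # naming authority
    suffix: str  # item ID, any format

    @classmethod
    def parse(cls, text: str) -> "Handle":
        # Split on the first "/" only, since the item ID may be in any format.
        prefix, sep, suffix = text.partition("/")
        if not sep or not prefix or not suffix:
            raise ValueError(f"not a handle: {text!r}")
        return cls(prefix, suffix)

    def __str__(self) -> str:
        return f"{self.prefix}/{self.suffix}"


@dataclass
class HandleRecord:
    """Hypothetical handle record: a list of (data type, data) pairs."""
    handle: Handle
    values: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, data_type: str, data: str) -> None:
        self.values.append((data_type, data))

    def of_type(self, data_type: str) -> List[str]:
        # A handle may carry several values of the same type (e.g. two URLs).
        return [data for kind, data in self.values if kind == data_type]


# Usage: the handle is the sample from the slide; the URLs are placeholders.
handle = Handle.parse("2304568.40/12345678")
record = HandleRecord(handle)
record.add("URL", "http://example.org/copy-1")
record.add("URL", "http://example.org/copy-2")
record.add("RAP", "loc/repository")
```

Nothing in the lookup inspects the meaning of the prefix or suffix; the handle is treated as an opaque string once parsed, which matches the slide's advice.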
Resolution is independent of semantics in every instance

Handle Resolution
[Diagram: Handle Servers HS1, HS2, HS3, HS4]
Insert, Delete, Change Handle Record for Ha
Resolve Handle for Hb
(Handles are uniformly spread by hashing)

Administration of Handle Records
univ/thesis.txt
1217/4913527
univ/4913527
1217/thesis.txt
(the handles shown above identify digital objects)
[Diagram labels: univ; 1217; univ.csl; univ.csl.17; univ.csl.17.2; 1217.34; 1217.34.1]

Groups of Handle Servers
[Diagram labels: P; S; S; S; S; Group A; Group B; Group C; Group D]

Repositories & Digital Objects
Each Digital Object has its own unique & persistent ID
Content Providers want to assign IDs
[Diagram labels: IPv6; REPOSITORY]
Could be upwards of millions of DOs per Repository

CORDS
• Copyright Office Registration, Recordation & Deposit System
• Allows on-line Registration of claims to copyright
• Permits qualified external repositories
• Retains signed applications with fingerprints of submitted digital objects

Federated Repositories
• Key issue is commonality of interests in accessing information from multiple repositories
• Financial Information is a prime application area
• Metadata Registries allow for searching based on “user-supplied” inputs. The use of handles (however branded) can simplify access.
• Access via local repositories is an operationally desirable capability
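The Handle Resolution slide above says that handles are uniformly spread across handle servers by hashing. A minimal sketch of that idea follows, assuming a SHA-256 digest reduced modulo the number of servers. The slides do not say which hash function or placement rule the Handle System uses, so both are assumptions, and the `HandleService` class and its methods are hypothetical names. The server names HS1 to HS4 and the sample handle are taken from the slides.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Record = List[Tuple[str, str]]  # (data type, data) pairs


class HandleService:
    """Hypothetical in-memory stand-in for a group of handle servers."""

    def __init__(self, server_names: List[str]) -> None:
        self.server_names = list(server_names)
        # One dictionary of handle records per server.
        self.stores: Dict[str, Dict[str, Record]] = {
            name: {} for name in self.server_names
        }

    def server_for(self, handle: str) -> str:
        # Assumption: hash the whole handle string, ignoring any semantics,
        # and reduce the digest modulo the number of servers.
        digest = hashlib.sha256(handle.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.server_names)
        return self.server_names[index]

    def insert(self, handle: str, record: Record) -> None:
        self.stores[self.server_for(handle)][handle] = list(record)

    def resolve(self, handle: str) -> Optional[Record]:
        # Only the responsible server is consulted; None means "not found".
        return self.stores[self.server_for(handle)].get(handle)

    def delete(self, handle: str) -> None:
        self.stores[self.server_for(handle)].pop(handle, None)


# Usage: administration and resolution both route through the same hash.
service = HandleService(["HS1", "HS2", "HS3", "HS4"])
service.insert("univ/thesis.txt", [("RAP", "loc/repository")])
found = service.resolve("univ/thesis.txt")
```

Because the server is chosen from the handle string alone, a client needs no directory of which server holds which record. A simple modulo rule like this one would move most records if the number of servers changed; a production design would need to address that.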

Archival Systems
• The Digital Object Infrastructure provides a structural basis for the development of archival systems
• It provides a solid conceptual basis for the development of “federated repositories”
• It lends itself to long-term efficiencies as archived information is ported from platform to platform with evolution of the technology

Conclusions
• Managing Digital Objects is the challenge
• Technology Components are available from R&D
• Robust Versions are needed for industry acceptance - this requires commercialization
• Applications (with user-friendly interfaces) need to be developed & deployed
• These can fundamentally alter the net, how it is used, and its impact on business and society