October 18 Shanghai - Leonardo Chiariglione

NAMING AND MEANING
Key to the management of intellectual property in digital media
Europe-China Conference on Intellectual Property in Digital Media
Shanghai Oct 18 2006
Norman Paskin
TERTIUS Ltd
Naming
• Assigning an identifier to a referent
• Identifier: unique persistent alphanumeric string (“number”, “name”, “lexical token”) specifying a referent
– Unique: one to many: an identifier specifies one and only one referent (but a referent may have more than one identifier)
– Persistent: once assigned, does not change referent
• Resolution: process by which an identifier is input to a network service which returns its associated referent and/or descriptive information about it (metadata).
• Referent: the object which is identified by the identifier, whether or not resolution returns that object.
• Object: any entity within the scope of the identifier system.
– may be abstract, physical or digital, since all these forms of entity are of relevance in content management (e.g. creations, resources, agreements, people, organisations)
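
The “unique” and “persistent” properties can be illustrated with a small sketch. This is only a toy model, not part of any real identifier system: the class name, method names and the identifiers "id-001" and "id-002" are all hypothetical.

```python
class IdentifierRegistry:
    """Toy registry (hypothetical) enforcing uniqueness and persistence."""

    def __init__(self):
        self._referent_by_id = {}

    def assign(self, identifier, referent):
        # Unique + persistent: an identifier names one referent and,
        # once assigned, may never be pointed at a different one.
        if identifier in self._referent_by_id:
            if self._referent_by_id[identifier] != referent:
                raise ValueError(f"{identifier!r} is already assigned")
            return
        self._referent_by_id[identifier] = referent

    def referent_of(self, identifier):
        return self._referent_by_id[identifier]

    def identifiers_of(self, referent):
        # A referent may legitimately carry more than one identifier.
        return sorted(
            i for i, r in self._referent_by_id.items() if r == referent
        )


registry = IdentifierRegistry()
registry.assign("id-001", "Robinson Crusoe (work)")
registry.assign("id-002", "Robinson Crusoe (work)")
```

Here two identifiers name the same referent, which is allowed, while an attempt to re-assign "id-001" to a different referent is rejected.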
Naming
• First class naming: Digital Object Architecture
– “Digital information needs to be a first class citizen in the networked environment” (Kahn/Wilensky 1995)
• First class = one that has an identity independent of any other item
• Handle System
– Part of the Digital Object Architecture: a system for persistent naming for digital objects and other resources on the Internet, and efficiently resolving those names to data
• DOI (Digital Object Identifier) system
– One application of the Handle System, which adds to it additional features – social and technical infrastructure, policies, metadata management.
• Internet
– the global information system that is logically linked by a globally unique address space and communications using TCP/IP and provides high level services layered on these (or successors)
– Not DNS; not the Web (includes P2P, VoIP, etc.)
• DNS: Domain Name System
– maps domain names (computer hostnames) to IP addresses.
What is being named?
• Granularity: the extent to which a collection of information has been subdivided for purposes of identification (e.g. a collection; a book; tables and figures)
– Functional Granularity: it should be possible to identify an entity whenever it needs to be distinguished
• Precisely what is being named?
– The work “Robinson Crusoe”?
– The Norton edition of “Robinson Crusoe”?
– The PDF version of the Norton edition of…. ?
– The PDF version of…held on this server…?
– Most digital objects of interest have compound form, simultaneously embodying several referents
– Resolution of an identifier may give the referent, or only metadata; or a “manifestation”
• Resolution of an identifier
– Persistence: “get me the right thing”
– Contextual resolution: “get me the thing that is right for me”
– Appropriate copy resolution (e.g. OpenURL context-sensitive linking): same content in different contexts
– Full contextual resolution (e.g. DVIA): different content in different contexts
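
Appropriate copy resolution can be sketched as a lookup that takes the requester's context into account. This is a minimal illustration only, not how OpenURL or any real resolver works: the identifier, the context label "library-a" and the example.org-style URLs are hypothetical.

```python
# Hypothetical table: one identifier, several copies of the SAME content.
COPIES = {
    "example-id-1": {
        "default": "https://publisher.example/item",
        "library-a": "https://library-a.example/mirror/item",
    }
}


def resolve_in_context(identifier, context=None):
    """Return the copy that is right for this requester.

    Falls back to the default copy when the context is unknown, so the
    identifier still resolves ("get me the right thing").
    """
    copies = COPIES[identifier]
    return copies.get(context, copies["default"])
```

A reader at "library-a" is sent to the local mirror, while everyone else gets the default copy. Full contextual resolution would go further and return different content, not just a different copy.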
Resolution
• DNS is current basis of resolution of web-based identifiers
– URL: not a first class name; an attribute: a location of a file on the WWW
• specification allows addressing by full path to host (IP address); rarely used.
• if the content of the file is moved, the URL link won't find it ("404 not found", or manual redirection, or automated redirection which may not persist).
• if the content, but not location, of the file is changed, a user may not know this.
– URN: naming convention for the content of files.
• Specification independent of technologies; but DNS the only present technique
• No widely standardised ways of using this: can't type URNs into browsers except in certain special circumstances.
– URI: collective name for URN and URL schemes.
• Not the basis of other non-web identifiers – e.g. Skype names
• DNS not a good general-purpose name system
– Does not meet requirements of first class name + appropriate granularity
– Not first class names: all URIs at one location have to be ultimately managed by the same domain name owner, which makes URLs brittle for any piece of content which could possibly change owners
– No granularity of administration per name by anyone other than a network administrator
– URLs are grouped by domain name and then by some hierarchical structure, originally based on file trees, now possibly unconnected from that but still a hierarchy
– Problems of security, updating and internationalisation
– Potential scalability problems in the face of new technologies
What is the problem?
• Managing information in the Net over very long periods of time:
– centuries or more
• Dealing with very large amounts of information in the Net over time
• When information, its location(s) and the underlying systems may change dramatically over time
• Respecting and protecting rights, interests and value
• Allow for
– arbitrary types of information systems
– dynamic formatting and data typing
– interoperability between multiple different information systems
– metadata schema to be identified and typed
• Solution to this problem was put forward as Digital Object Architecture (Kahn/Wilensky 1995+) and has been successfully developed and deployed
Digital Object Architecture
– Digital Objects (DOs)
• Structured data, independent of creation platform
• Consisting of “elements” of the form <type,value>
• One of which is its unique, persistent identifier
– Resolution of Unique Identifiers
• Maps an identifier into “state information” about the DO
• Identifiers are known as “Handles”
– Format is “prefix/suffix” (e.g. 10.100/1234)
– Prefix is unique to a naming authority
– Suffix can be any string of bits assigned by that authority
– Handle System is a general purpose resolution system
– Repositories from which DOs may be accessed
– Metadata Registries
• Repositories that contain general information about DOs
• Supports multiple metadata schemes
• Can map queries into unique DO specifications (via handles)
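
The “prefix/suffix” format can be illustrated with a short parsing sketch. The names Handle and parse_handle are hypothetical, and splitting at the first “/” is an assumption made here so that a suffix may itself contain slashes. It is not taken from the slides.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    prefix: str  # unique to a naming authority
    suffix: str  # any string assigned by that authority


def parse_handle(text):
    """Split a handle string into prefix and suffix.

    Assumption: the first "/" separates the two parts, so the suffix
    may contain further "/" characters.
    """
    prefix, separator, suffix = text.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {text!r}")
    return Handle(prefix, suffix)


handle = parse_handle("10.100/1234")
```

For the example on the slide this gives the prefix "10.100" and the suffix "1234".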
Handles resolve to typed data
Handle: 10.123/456

Data type   Index   Handle data
URL         1       http://acme.com/….
URL         2       http://a-books.com/….
DLS         9       acme/repository
HS_ADMIN    100     acme.admin/jsmith
XYZ         12      1001110011110
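
The idea of one handle resolving to several typed values can be sketched as follows. This is an in-memory illustration only, not the Handle System protocol or client API: HandleValue, RECORDS and resolve_handle are hypothetical names, and the data are the example values from the slide (the truncated URLs are kept as shown).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    data_type: str
    index: int
    data: str


# Example record from the slide; the URLs are truncated in the original.
RECORDS = {
    "10.123/456": [
        HandleValue("URL", 1, "http://acme.com/…."),
        HandleValue("URL", 2, "http://a-books.com/…."),
        HandleValue("DLS", 9, "acme/repository"),
        HandleValue("HS_ADMIN", 100, "acme.admin/jsmith"),
        HandleValue("XYZ", 12, "1001110011110"),
    ]
}


def resolve_handle(handle, data_type=None):
    """Return all values for a handle, or only those of one data type."""
    values = RECORDS.get(handle, [])
    if data_type is None:
        return list(values)
    return [value for value in values if value.data_type == data_type]
```

A client interested only in locations would call resolve_handle("10.123/456", "URL") and receive the two URL values, while the other typed values stay available to other services.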
Handle System
• Part of the Digital Object Architecture: www.handle.net (Bob Kahn)
• Basic resolution system for Internet: identify objects, not servers.
• Optimized for speed, reliability, scaling (compared to DNS)
• Open, well-defined protocol and data model (RFCs 3650, 3651, 3652)
– Free protocol; service at cost (non-profit);
– freely available to be used as engine underneath other named identifiers.
• Separation of control of the handle and who runs the servers
– distributed administration, granularity at the handle level
• Any Unicode character set
– China: CNNIC (.CN registrar) has integrated DNS and handle
• All transactions can be secure and certified
– own PKI as an option
• Not all data public: individual values within a handle can be private.
• No semantics in the identifier
• Logically centralized, physically distributed and highly scalable
• Does not need DNS, but can work with DNS:
– deployed via tools e.g. HTTP proxies, client plug-ins, server software, etc.
Handle System usage
• Provides infrastructure for application domains, e.g., digital libraries & publishing, network management, id management ...
• Library of Congress
• DTIC (Defense Technical Information Center)
• IDF (International DOI Foundation)
– CrossRef (scholarly journal consortium)
– Office of Publications of the European Community
– CAL (Copyright Agency Ltd - Australia)
– MEDRA (Multilingual European DOI Registration Agency)
– Nielsen BookData (bibliographic data - ISBN)
– R.R. Bowker (bibliographic data - ISBN)
– German National Library of Science and Technology etc.
• NTIS (National Technical Information Service)
• D-Space (MIT + HP)
• ADL (DoD Advanced Distributed Learning initiative)
• Several Digital Library projects (e.g. ARROW)
• In development: Globus Alliance (for GRID computing)
Handle System usage
• Assigned Prefixes
– DOI: 2028
– DSpace: 453
– Other apps: 406
• Handles
– DOI: 25+ M
– Other: additional millions (total per prefix known only to prefix manager; e.g. LANL adding 600M but privately)
• Global Handle System
– Core: three service sites (added locations being considered)
– c. 50 million direct resolutions per month
– c. 50 million proxy server resolutions
The DOI System
• DOI (Digital Object Identifier) system: www.doi.org
• Initially developed in the publishing industry, but now applied more widely
• Currently being standardised in ISO (TC46/SC9), the home of ISBN and other “content identifiers”
• One application of the Handle System
• adds to it additional features – social and technical infrastructure,
policies, metadata management.
[Diagram: the DOI system (doi>) shown as three components: naming scheme and resolution; data model for declaring meaning; policies]
Naming scheme and resolution
• The Handle System
• An identifier “container” e.g.
– 10.1234/NP5678
– 10.5678/ISBN-0-7645-4889-4
– 10.2224/2004-10-ISO-DOI
• Resolve from DOI to data
– Initially resolve to location (URL) – persistence
– May be to multiple data:
• Multiple locations
• Metadata
• Services
• Extensible
DOI policies
• Implementation through International DOI Foundation
• Not-for-profit body: federation of appointed agencies
– Governance and agreed scope, policy, “rules of the road”
– Technical infrastructure: resolution mechanism, proxy servers, mirrors, back-up, central dictionary
– Social infrastructure: persistence commitments, fall-back procedures, cost-recovery (self-sustaining), shared use of IDF tools etc.
• Registration agencies
– Each can develop own applications
– Any business model
– Use in “own brand” ways appropriate for their community
Data Model for declaring meaning
• DOI Data Model = Metadata tools:
– a data dictionary to define
– a grouping mechanism to relate
• Necessary for interoperability
• Able to use existing metadata
– Mapped using a standard dictionary
– Can describe any entity at any level of granularity
• See “DOI and data dictionaries” www.doi.org
Meaning
• Assigning metadata to a referent, to enable semantic interoperability
– “say what the referent is”
– Resolution of an identifier may give the referent, or only metadata; or a “manifestation”
• Semantic:
– Do two identifiers from different schemes actually denote the same referent?
– If A says “owner” and B says “owner”, are they referring to the same thing?
– If A says “released” and B says “disseminated”, do they mean different things?
• Interoperability: the ability for identifiers to be used in services outside the direct control of the issuing assigner
– Identifiers assigned in one context may be encountered, and may be reused, in another place or time - without consulting the assigner. You can’t assume that your assumptions made on assignment will be known to someone else.
• Persistence = interoperability with the future
Tools to ensure meaning
• Basis: “Interoperability of Data in E-Commerce Systems” (indecs), 1998-2000: http://www.indecs.org
• Focus: generic intellectual property and how to make data about it interoperable
• Who: EC + groups from the content, author, creator, library, publisher and rights communities
• What: Pioneered a model of event-based metadata as a solution for integrating management of rights.
• Led to: a structured ontology (data dictionary); tools for mapping terms precisely; inference tools etc.:
– contextual ontology architecture
[Diagram: metadata schemes (e.g. ONIX, LOM) connected by an agreed term-by-term mapping or “crosswalk”. Example: the ONIX term “Author” and the term “Writer” in the metadata scheme NormanRights are mapped as ONIX:Author = NormanRights:Writer]
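
A term-by-term crosswalk can be sketched as a lookup table between schemes. This is an illustration only: the table layout and the function translate are hypothetical, and the only mapping included is the one from the slide (ONIX:Author = NormanRights:Writer). The sample record is invented for the example.

```python
# (source scheme, source term, target scheme) -> target term
CROSSWALK = {
    ("ONIX", "Author", "NormanRights"): "Writer",
}


def translate(record, source, target, crosswalk=CROSSWALK):
    """Translate a record term by term; report terms with no mapping."""
    translated = {}
    unmapped = []
    for term, value in record.items():
        target_term = crosswalk.get((source, term, target))
        if target_term is None:
            unmapped.append(term)
        else:
            translated[target_term] = value
    return translated, unmapped


onix_record = {"Author": "Daniel Defoe", "Title": "Robinson Crusoe"}
translated, unmapped = translate(onix_record, "ONIX", "NormanRights")
```

Only “Author” is carried over, as “Writer”. “Title” is reported as unmapped because no agreed mapping exists for it in this table, which is the kind of gap a simple crosswalk leaves open.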
Metadata interoperability: semantic problems
Mappings are not simple:
• Different names (and languages) for the same thing (Author vs Writer)
• Same name for different things (title, Title)
• Data elements at different levels of speciality (title vs FullTitle, AlternativeTitle).
• Different allowed values for elements (“pii” vs “not pii”)
• Data at different levels of granularity (journal_article vs SerialArticleWork/SerialArticleVersion).
• Data in different structures (article as attribute of journal or vice versa).
• Data from different sources (local codes vs ONIX codes).
• Different contextual meaning (DOI of what…?)
• Different representation (1 title vs n titles).
• Different mandatory requirements (ISSN mandatory vs optional)
• Schemas are being updated all the time.
• ... etc.
To manage all of this requires a coherent structured approach.
Contextual analysis
[Diagram: Agent: Norman Paskin; Time: 2006-10-18; Place: Shanghai; Resource: 061018IPDMShanghai.ppt]
Contextual analysis
[Diagram: Agent, Time, Resource and Place linked through an event: “Norman Paskin presented 061018.ppt in Shanghai on 18 Oct 2006”]
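
The contextual analysis above can be sketched as a small data structure in which an event ties together an agent, a time, a place and a resource. This is a toy illustration, not the indecs model or any published schema: the class name Context, its fields and the describe method are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Context:
    """An event relating the four terms shown on the slide."""
    context_type: str  # the kind of event, e.g. "presented"
    agent: str
    time: date
    place: str
    resource: str

    def describe(self):
        return (
            f"{self.agent} {self.context_type} {self.resource} "
            f"in {self.place} on {self.time.isoformat()}"
        )


event = Context(
    context_type="presented",
    agent="Norman Paskin",
    time=date(2006, 10, 18),
    place="Shanghai",
    resource="061018IPDMShanghai.ppt",
)
```

Calling event.describe() rebuilds the sentence from the slide out of separately typed values, which is the point of event-based metadata: each term can be mapped or queried on its own.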
Context Model
[Diagram: a Context (with ContextType) connected by RelatingTerms to the basic terms Agent, Time, Place and Resource. Labels on the links: HasValue, HasAgentType, HasTimeType, HasPlaceType, HasResourceType. Type nodes: AgentType, TimeType, PlaceType, ResourceType. Key: values of basic terms; types of basic terms.]
Tools to ensure meaning
Contextual Ontology approach is used in:
• ISO MPEG-21 Rights Data Dictionary (http://iso21000-6.net/)
• DOI Data Dictionary (http://www.doi.org)
• DDEX digital data exchange - music industry (http://ddex.net/)
• ONIX: Book industry (+) messaging schemas (www.editeur.org)
• Rightscom’s OntologyX - licensee of output, plus own work on tools (www.rightscom.com)
• Digital Library Federation - communication of licence terms (ERMI: ONIX for licensing terms)
• ACAP: Content Access (http://www.the-acap.org/)
• etc.
Naming and Meaning
• Naming: prerequisite for management of digital information entities
– name and manage information in the form of digital objects
– naming conventions for identifying (first class naming)
– service for using object names to locate and disseminate objects
– infrastructure for extensible distributed digital information services
– agnostic as to technology (web, mobile, P2P, etc.) - assumes only the existence of the internet protocol (or successors)
• Meaning: prerequisite for enabling digital information entities to interact
– interoperability and digital policy management
– semantic interoperability: building on the indecs (interoperability of data in e-commerce systems) principles
– deployment of a context-based ontology mapping.