Follow the Fox to Renardus: an Academic Subject Gateway Service for Europe
Cross-browsing and Cross-searching in a
Distributed Network of Subject Gateways:
Architecture, Data Model and Classification
Dr. Heike Neuroth & Traugott Koch
State Library of Lower Saxony and the University Library of Göttingen, Germany
NetLab, Lund University Library Development Department, Sweden
www.renardus.org
Content

Renardus (aim, partners, etc.)

Subject Gateway (definition, elements)

Renardus Application Profile (working steps, metadata
core set, data model, etc.)

Renardus Collection Level Description

Renardus Technical Approach

DDC Mapping for Cross-Browsing (methods, mapping
relationships etc.)

Outlook
ELAG 2001, Prague 6-8 June 2001
Neuroth & Koch
What is Renardus?


EU-funded project:

EC funding: 1.7 million EUR; including non-funded costs: 2.3 million EUR

1 January 2000 - 30 June 2002

under the “Information Society Technologies” programme (IST-1999-10562),
“Promoting a User-friendly Information Society”, a major theme
of the European Union's 5th Framework Programme
Partners drawn from 7 countries:

Project Management: National Library Den Haag (NL)

Denmark, Finland, Sweden, France, United Kingdom, Germany
Objectives


to provide access to distributed quality-controlled
subject gateways (high quality metadata collections)
across Europe via one single interface:

cross-search

cross-browse
and to develop, define:

metadata solutions

Renardus Application Profile, Renardus Namespaces,
Renardus Collection Level Description

technical solutions

organizational/business models
Member Subject Gateways

DAINet: German Agricultural Information System
Document Server DEPOSIT: Deposit of German Online Dissertations
DutchESS: Dutch Electronic Subject Service
EELS: Engineering Electronic Library, Sweden
FVL: The Finnish Virtual Library
NOVAGate: Libraries of Nordic Agricultural & Veterinary Univ.
SSG-FI: MathGuide, Geo-Guide, History Guide, Anglistik Guide
RDN hubs: Resource Discovery Network (EEVL, SOSIG, OMNI, ...)
Danish Electronic Research Library (future partner)
Les Signets: Collection of Internet Resources (future partner)
Subject Gateway
“Quality-controlled subject gateways are Internet-services which
apply a rich set of quality measures to support systematic resource
discovery. Considerable manual effort is used to secure a selection
of resources which meet quality criteria and to display a rich
description of these resources with standards-based metadata.
Regular checking and updating ensure good collection management.
A main goal is to provide a high quality of subject access through
indexing resources using controlled vocabularies and by offering a
deep classification structure for advanced searching and browsing.”
Subject Gateway cont.
Elements:

creation: manual/intellectual, experts etc.
selection and collection development: policy, selection criteria etc.
collection management: maintenance of collection etc.
resource description/metadata: rich set of metadata, formalized content description etc.
subject classification/subject access: controlled vocabularies etc.
standards: allow interoperability etc.
value-adding features: display, usage features etc.
Working Steps - General

selection of necessary/meaningful elements:
  for a service like Renardus: “Meta-Subject Gateway”, European service (multilingual access, search, browse)
  for search, filter, sort, and display options
  for browse, subject access

selection of common metadata format (exchange format):
  Dublin Core Metadata Element Set v1.1
  Dublin Core Qualifiers
  others
  home-grown
Working Steps - Analysis

first survey of partners' metadata formats and detailed
description of each subject gateway

GENERAL
  name of SG, acronym, responsible organization, source of funding, time for record creation, general description etc.

COLLECTION/SELECTION
  target user group, common primary language of target audience, collection scope, geographical and language coverage, selection criteria, granularity, resource types, resource formats etc.

CONTENT - METADATA
  metadata scheme, metadata set, crosswalks, interoperability, cataloging rules, authority files etc.
Working Steps - Analysis cont.

CONTENT - OTHERS
  metadata browsable, searchable
  language(s) of descriptions, thesauri, interface, translation support etc., keywords, classification systems, etc.

INDEX TYPE/TECHNICAL NOTES
  search engine, indexing system, structure of data storage etc.

INTELLECTUAL PROPERTY RIGHTS (IPR)
  copyright, branding

VARIOUS
  (quality) control, link checking, record checking/update etc.
  backlinks of the gateway, statistical analysis of log files etc.

etc.
First Results

definition of 8 metadata elements without detailed semantics, syntax based on Dublin Core:

DC.Title
DC.Creator
DC.Description
DC.Subject
DC.Identifier
DC.Language
DC.Type
Country
Renardus Data Model

detailed investigations of each element about:
  semantics and syntax of each element
  qualifiers (refinements, encoding schemes)
  cataloging rules (creator, description, keywords)
  namespace
  repeatability of each element
  form of obligation (mandatory, strongly recommended, optional)
  language qualifier (for title, description, subject)

and:
  administrative elements
  future elements (rights, publisher), additional elements (format, etc.)
  common browsing structure via classification system (home-grown, reuse of an existing system, which one)
Renardus Data Model cont.
[Diagram slide; image not reproduced in this transcript]
Renardus Application Profile

Renardus Application Profile based on four namespaces,
to be encoded in RDF/XML:

Dublin Core Namespace: [DCMES version 1.1] Dublin Core
Metadata Element Set, Version 1.1: Reference Description

Dublin Core Qualifiers Namespace: [DCMES Qualifiers (2000-07-11)] Dublin Core Qualifiers

Renardus Namespace: [RMES version 0.1, 2001-04-30]
Renardus Metadata Element Set

Renardus Namespace Qualifiers: [RMES Qualifiers version 0.1,
2001-04-30] Renardus Metadata Element Set
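
A minimal sketch of what a record encoded in RDF/XML with these namespaces might look like when generated from Python. The RDF and Dublin Core namespace URIs are the published ones; the Renardus namespace URI and the `rmes:country` element name are hypothetical placeholders, since the slides do not state them.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical placeholder: the slides do not state the Renardus namespace URI.
RMES_NS = "http://example.org/renardus/rmes/0.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Ask ElementTree to use readable prefixes when serializing.
for prefix, uri in (("rdf", RDF_NS), ("dc", DC_NS), ("rmes", RMES_NS)):
    ET.register_namespace(prefix, uri)


def build_record(resource_url, title, description, subjects, country, lang="en"):
    """Serialize one resource description as an RDF/XML string."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": resource_url})

    def add(ns, name, text, with_lang=False):
        element = ET.SubElement(desc, f"{{{ns}}}{name}")
        element.text = text
        if with_lang:
            element.set(XML_LANG, lang)  # language tag on the element

    add(DC_NS, "title", title, with_lang=True)
    add(DC_NS, "description", description, with_lang=True)
    for subject in subjects:  # subject may be repeated
        add(DC_NS, "subject", subject, with_lang=True)
    add(RMES_NS, "country", country)  # hypothetical element name
    return ET.tostring(root, encoding="unicode")


print(build_record("http://www.example.org/resource", "An example resource",
                   "A short description.", ["Mathematics"], "DE"))
```

The language tag is attached as `xml:lang`, which is one common way to carry it in RDF/XML; the actual Renardus encoding may differ.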
Renardus AP cont.
“content metadata”
Title and Title.Alternative
title: DCMES: mandatory, not repeatable, language tag
title.alternative: DCMES Qualifiers: optional, repeatable, language tag
Creator
DCMES: strongly recommended, repeatable
RMES Qualifiers (LastName, FirstName): strongly recommended,
repeatable
Description
DCMES: mandatory in text version, repeatable, language tag
Renardus AP cont.
Subject
DCMES: mandatory, repeatable, language tag
DCMES Qualifiers: strongly recommended, repeatable, language tag
RMES Qualifiers (all other encoding schemes): mandatory, repeatable,
language tag
RMES Qualifiers (Ren-DDC): mandatory, repeatable
Identifier
DCMES Qualifiers: mandatory, repeatable (probably in the pilot system)
RMES Qualifiers: “Operational System” with Qualifiers “Archive”, “Mirror” ...
Language
DCMES Qualifiers: strongly recommended, repeatable
Renardus AP cont.
Type
DCMES: strongly recommended, repeatable
DCMES Qualifiers (DCT1): strongly recommended, repeatable
DCMES Qualifiers (DCT2): “Operational System”
Country
RMES Qualifiers: strongly recommended, not repeatable
“administrative metadata”
Full Record URL
RMES Qualifiers: strongly recommended, not repeatable
SBIG ID
RMES Qualifiers: mandatory, not repeatable
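
The obligation and repeatability rules listed on the application profile slides can be sketched as a small validation table. This is an illustration only: the dictionary keys (`full_record_url`, `sbig_id`, and so on) and the `check_record` helper are names invented here, qualifier-level rules are collapsed to one rule per element, and the real Renardus normalization tools are not described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    obligation: str   # "mandatory", "strongly recommended" or "optional"
    repeatable: bool


# One simplified rule per element, taken from the profile slides.
PROFILE = {
    "title": Rule("mandatory", False),
    "title_alternative": Rule("optional", True),
    "creator": Rule("strongly recommended", True),
    "description": Rule("mandatory", True),
    "subject": Rule("mandatory", True),
    "identifier": Rule("mandatory", True),
    "language": Rule("strongly recommended", True),
    "type": Rule("strongly recommended", True),
    "country": Rule("strongly recommended", False),
    "full_record_url": Rule("strongly recommended", False),
    "sbig_id": Rule("mandatory", False),
}


def check_record(record):
    """Check a record (element name -> list of values) against PROFILE.

    Returns (errors, warnings) as two lists of messages.
    """
    errors, warnings = [], []
    for name, rule in PROFILE.items():
        values = record.get(name, [])
        if not values:
            if rule.obligation == "mandatory":
                errors.append(f"missing mandatory element: {name}")
            elif rule.obligation == "strongly recommended":
                warnings.append(f"missing recommended element: {name}")
        elif len(values) > 1 and not rule.repeatable:
            errors.append(f"element is not repeatable: {name}")
    for name in record:
        if name not in PROFILE:
            warnings.append(f"element not in profile: {name}")
    return errors, warnings


record = {
    "title": ["An example resource"],
    "description": ["A short description."],
    "subject": ["Mathematics"],
    "identifier": ["http://www.example.org/resource"],
    "sbig_id": ["example-gateway"],
}
errors, warnings = check_record(record)
print(errors)
print(warnings)
```

Separating errors (mandatory rules) from warnings (strongly recommended elements) mirrors the three forms of obligation named in the data model.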
Renardus CLD Schema


Collection Level Description: simple description of
collections, locations and related people or organizations
in Renardus: to provide information about participating
Subject Gateways:

users choose Subject Gateways for thematic search (semi-automatic selection by subject)

well-structured background information (human and machine
readable)

promotion

registry of Subject Gateways
Renardus CLD Format

Format: based on RSLP Collection Description
(UKOLN):

Dublin Core metadata elements (e.g. DC.Title, DC.Description,
DC.Subject)

RSLP metadata elements (cld.country)

Renardus specific metadata elements (e.g. rencld:acronym,
rencld:subjectNotation, rencld:resourceLanguage etc.)
Renardus CLD Tool

WWW-based form
RDF, RDF/XML, and text encoding
the file is saved locally; each partner can update its description at any time
the Renardus broker gathers all Subject Gateway descriptions
Renardus Technical Approach

PREPARATION

investigation:
  of available standards and technologies
  of functional and user requirements
  of service provider requirements
formulation of use cases in UML
development of data model
  data model
choosing architecture (decentralized vs. centralized)
  architectural diagram
search/retrieval protocol
common profile (map data model to the protocol Z39.50)
  Z39.50 profile, Bath compliant
Renardus Technical Approach cont.

IMPLEMENTATION

data normalization
  encoding RDF/XML (RDF normalizing toolkit)
  classification mapping (mapping tool adapted from CARMENx)
  CLDs (CLD tool adapted from RSLP)
creation of participants' Renardus servers (Z39.50, Z'mbol)
implementation of broker software and functionality
  cross-searching (Zebril and modified EUROPAGATE simultaneous gateway)
  cross-browsing (browsing tool, SQL)
user interface implementation (with use cases)
  screen layout (Zebril and HTML, Javascript)
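
The idea of a broker that cross-searches several gateway servers at once can be sketched as a simple fan-out and merge. This is not the Renardus broker and uses no Z39.50 library: each gateway is a plain Python callable standing in for a remote target, and the gateway names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def cross_search(query, gateways):
    """Send one query to every gateway in parallel and merge the hits.

    `gateways` maps a gateway name to a callable that takes the query and
    returns a list of record dicts. Returns (records, failed_gateways).
    """
    records, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(gateways))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in gateways.items()}
        for name, future in futures.items():
            try:
                hits = future.result()
            except Exception:
                failed.append(name)  # one failing gateway does not abort the search
                continue
            # Tag every hit with the gateway it came from.
            records.extend({**hit, "gateway": name} for hit in hits)
    return records, failed


def make_gateway(titles):
    """Build a stand-in gateway that does substring matching on titles."""
    return lambda query: [{"title": t} for t in titles
                          if query.lower() in t.lower()]


def broken_gateway(query):
    raise ConnectionError("gateway unreachable")


gateways = {
    "gateway_a": make_gateway(["Geometry tutorials", "Algebra notes"]),
    "gateway_b": make_gateway(["Soil science portal", "Geometry archive"]),
    "gateway_c": broken_gateway,
}
records, failed = cross_search("geometry", gateways)
print(records)
print(failed)
```

Keeping the originating gateway on each record lets a user interface link back to the full record at the source, which a broker over distributed collections would typically need.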
DDC Mapping for Cross-Browsing

why subject cross-browsing and classification?

why switching language?
  browsing/mapping from DDC to the local systems/browsing structures

why DDC?
  comparison to alternatives
  research license, allowed changes

analysis of partners' classification systems
  types, adaptations, number of levels and classes, subject overlap
DDC Mapping for Cross-Browsing cont.


mapping approaches and issues

mapping methods

mapping between classes, not between individual resources

priorities: e.g. only well used classes are mapped

recommendations for local improvements
mapping relationships

fully equivalent, narrower and broader equivalent, major and
minor overlap

reuse for retrieval result clustering
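
A minimal sketch of how class-to-class mappings with the five relationship types could be represented and looked up. The ranking of the relationships from closest to loosest is an assumption made here for ordering purposes, and the gateway names and local class notations are invented; the actual syntax of the Renardus mapping information is not given in the slides.

```python
from dataclasses import dataclass
from enum import IntEnum


class Relation(IntEnum):
    # Assumed ranking, closest match first.
    FULLY_EQUIVALENT = 1
    NARROWER_EQUIVALENT = 2
    BROADER_EQUIVALENT = 3
    MAJOR_OVERLAP = 4
    MINOR_OVERLAP = 5


@dataclass(frozen=True)
class Mapping:
    ddc_class: str    # DDC notation, e.g. "510"
    gateway: str      # hypothetical gateway name
    local_class: str  # invented local notation
    relation: Relation


def targets_for(ddc_class, mappings):
    """Return the local classes mapped to one DDC class, closest first."""
    hits = [m for m in mappings if m.ddc_class == ddc_class]
    return sorted(hits, key=lambda m: (m.relation, m.gateway))


mappings = [
    Mapping("510", "gateway_b", "SCI-MATH", Relation.MAJOR_OVERLAP),
    Mapping("510", "gateway_a", "MATH", Relation.FULLY_EQUIVALENT),
    Mapping("516", "gateway_a", "MATH-GEO", Relation.FULLY_EQUIVALENT),
    Mapping("510", "gateway_c", "MATH-ALG", Relation.NARROWER_EQUIVALENT),
]
for m in targets_for("510", mappings):
    print(m.gateway, m.local_class, m.relation.name)
```

Because the mapping is between classes rather than individual resources, the table stays small and a browsing page for a DDC class only needs this lookup to offer links into the local structures.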
DDC Mapping for Cross-Browsing cont.


technical solution

sources: local classifications, CORC Web Dewey

mapping tool adapted from CARMENx (MySQL, PHP, Javascript)

syntax of the mapping information

creation of the browsing pages
usage of the DDC mapping in Renardus

“browse and jump”

why not virtual browsing?

DDC classification search (in advanced search)

user interface solutions
[Figure slide with panels labelled MSC 2000 and DDC; images not reproduced in this transcript]
DDC Mapping for Cross-Browsing cont.


future

recommendations for subject access efforts in gateways and
brokers

multilingual access to the DDC top-levels

automatic mapping (and classification) as support

owners should take over for sustainable mapping
documentation

DDC mapping report (D7.4)

practical mapping guidelines (D7.4)

paper at IFLA Satellite Conf., August 2001
Outlook

June 2001: Public Deliverable WP 6, D6.5
  Renardus Application Profile
  Renardus Namespaces
  Renardus Collection Level Description
  DDC Mapping

June 2001: Beta-Version of Renardus broker
  first DDC mapping results
  first evaluations of broker will start

November 2001
  Renardus Workshop for future participating Subject Gateways
URLs & References

Renardus - http://www.renardus.org
  SUB Renardus - http://renardus.sub.uni-goettingen.de/ (also with D7.4)
  News Digest SIGN-UP Form - http://www.renardus.org/news/sign-up.html
  Evaluation of existing data models (D6.1) - http://www.renardus.org/deliverables/d6_1/docframe.htm

DCMI Dublin Core Metadata Initiative - http://www.dublincore.org/
  Dublin Core Metadata Element Set, Version 1.1: Reference Description - http://www.dublincore.org/documents/dces/
  Dublin Core Qualifiers - http://www.dublincore.org/documents/dcmesqualifiers/
  DCMI Agents Working Group - http://www.dublincore.org/groups/agents/
  DCMI Type Working Group - http://www.dublincore.org/groups/type/
URLs & References

RSLP Collection Description - http://www.ukoln.ac.uk/metadata/rslp/
  CLD Collection Level Description - http://ukoln.ac.uk/metadata/cld/
  RSLP Collection Description Tool - http://www.ukoln.ac.uk/metadata/rslp/tool/

Subject Gateways (Traugott Koch): Online Information Review, Vol. 24, Number 1, 2000
Cross-Search

basic index: Title, Description, Subject

field search:

Title

Creator (in DC Simple and later on in RMES Qualifiers)

Description

DDC Captions (also cross-browsable!)

Subject (in future: several encoding schemes for keyword and
classification systems of partners)

Type
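
The difference between the basic index and a field search can be sketched as follows. This is a simplified in-memory illustration with case-insensitive substring matching, not the Z39.50 search the broker performs; the record layout and function names are assumptions.

```python
# Fields covered by the basic index, as listed on the slide.
BASIC_INDEX = ("title", "description", "subject")


def matches(record, term, field=None):
    """True if `term` occurs in `field`, or in any basic-index field if none is given."""
    fields = (field,) if field else BASIC_INDEX
    needle = term.casefold()
    return any(needle in value.casefold()
               for name in fields
               for value in record.get(name, []))


def search(records, term, field=None):
    """Return the records matching `term`, keeping their original order."""
    return [r for r in records if matches(r, term, field)]


records = [
    {"title": ["Geometry tutorials"], "description": ["Lecture notes"],
     "subject": ["Mathematics"], "creator": ["Example, Anna"]},
    {"title": ["Soil maps"], "description": ["Maps with geometry data"],
     "subject": ["Agriculture"], "creator": ["Example, Bo"]},
]
print(len(search(records, "geometry")))           # basic index
print(len(search(records, "geometry", "title")))  # field search on title
print(len(search(records, "example")))            # creator is not in the basic index
```

Creator is searchable only as a field here, which follows the slide: the basic index covers Title, Description and Subject.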
Filter Options

Type

DCMI Type 1 (mapping of partners' document types to Dublin
Core Type 1)

in future also meaningful: mapping to Sub Type List of DCMI?

Probably no Renardus specific type list

Language (of resources and languages of metadata =
Language Tag)

Country
Sorting

Title (alphabetic sorting)

in future: Type, Language, Country? (central architecture)

Subject: Ren-DDC Classification


mapping relation (fully equivalent, narrower equivalent, broader
equivalent, major overlap, minor overlap)
in discussion:

Subject - Keywords: sorting by subject indexing group:
controlled vocabulary versus free keywords, but problematic!