In pursuit of interoperability: Can we standardize mapping types?

Stella G Dextre Clarke, Project Leader, ISO NP 25964

Overview

- Compare mapping types used in some well-known projects: MACS; CrissCross; RENARDUS; KoMoHe
- And in Doerr’s well-cited paper on “Semantic problems of thesaurus mapping”
- And in 3 standards: BS 8723-4, SKOS and the forthcoming ISO 25964-2
- Ask how feasible it is to achieve standardization

MACS Project

- Context: enabling multilingual access to collections indexed with different vocabularies
- Vocabularies are all subject heading schemes
- All mappings are considered equivalence
- Equivalence can be simple or compound
- Two types of compound equivalence:
  - Heading A = Heading B OR Heading C
  - Heading A = Heading B AND Heading C

CrissCross Project

- Context: improving access to vocabularies and heterogeneously indexed collections (in one natural language)
- One-way mappings
- From a subject headings scheme to a classification scheme
- Many mappings from one keyword
- “Degrees of determinacy” rather than distinct mapping types: D1, D2, D3, D4

RENARDUS Project

- Context: search/browse across gateways using different classification schemes
- One-way mappings, from DDC to local schemes
- Five mapping types:
  - fully equivalent
  - broader equivalent
  - narrower equivalent
  - major overlap
  - minor overlap

GESIS/KoMoHe

- Context: distributed search across systems using 25 different vocabularies (thesauri and classification schemes)
- (Separate) mappings in both directions
- Three basic mapping types:
  - Equivalence
  - Hierarchical
  - Associative
- Also there is an explicit “null relationship”
- Any mapping can be one-to-one or one-to-many
- Every mapping can have a “relevance rating” of high, medium or low

Doerr’s findings

(see http://journals.tdl.org/jodi/article/view/31/32)
- Context: query transformation is assumed to be the main application of mappings
- All the vocabularies discussed are thesauri, applied to documents and/or museum collections
- Basic types of mapping are:
  - exact equivalence
  - inexact equivalence
  - broader equivalence
  - narrower equivalence
- Exact, broader and narrower equivalence can be simple or compound
- Compound equivalence means a Boolean expression of target terms using AND, OR or NOT (but in practice no examples are given using NOT)

BS 8723-4

- Provides for mapping search terms or index terms
- Emphasis on thesauri, although other vocabulary types are taken into account
- Basic mapping types: equivalence, hierarchical, associative
- Hierarchical subdivides into broader/narrower
- Equivalence subdivides into simple/compound
- Degrees of equivalence (such as exact, inexact, partial) are discussed but not formalised as distinct types other than those described above

SKOS (Simple Knowledge Organization System) data model

- Context is sharing/linking KOSs via the Web
- SKOS development began with thesauri, but has extended to classification schemes, subject heading schemes, etc.
- Basic mapping “properties” (skos:mappingRelation):
  - skos:closeMatch (symmetric)
  - skos:exactMatch (symmetric, transitive)
  - skos:relatedMatch (symmetric)
  - skos:broadMatch (inverse of skos:narrowMatch)
  - skos:narrowMatch (inverse of skos:broadMatch)
- No provision for compound mappings
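The symmetric and inverse characteristics listed for the SKOS mapping properties mean that one stated mapping implies a second one in the opposite direction. The following is a minimal sketch of that inference in plain Python, not an RDF toolkit; the concept identifiers (`A:laptops`, `B:roads`, etc.) and the function name `implied_mappings` are hypothetical, and the transitivity of skos:exactMatch is deliberately left out to keep the example short.

```python
# Characteristics of the SKOS mapping properties, as listed on the slide.
SYMMETRIC = {"skos:closeMatch", "skos:exactMatch", "skos:relatedMatch"}
INVERSE = {
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
}


def implied_mappings(triples):
    """Return the stated triples plus those implied by symmetry and inverses.

    Transitivity of skos:exactMatch is not computed here.
    """
    result = set(triples)
    for subject, prop, obj in triples:
        if prop in SYMMETRIC:
            result.add((obj, prop, subject))
        elif prop in INVERSE:
            result.add((obj, INVERSE[prop], subject))
    return result


# Hypothetical concept identifiers in two vocabularies, A and B.
stated = {
    ("A:laptops", "skos:exactMatch", "B:notebooks"),
    ("A:streets", "skos:broadMatch", "B:roads"),
}
inferred = implied_mappings(stated)
for triple in sorted(inferred):
    print(triple)
```

Each stated mapping here yields exactly one additional triple, so the two statements become four.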

ISO 25964-2 (still in draft)

- A revision of ISO 2788 and ISO 5964 as well as BS 8723
- Provides for mapping search terms or index terms
- Emphasis on thesauri, although other vocabulary types are taken into account
- Basic mapping types:
  - Equivalence
    Laptop computers EQ Notebook computers
  - Hierarchical
    Roads NM Streets; Streets BM Roads
  - Associative
    Journals RM Magazines
- “Inexact” can apply to any mapping, but most commonly to equivalence
  Horticulture ~EQ Gardening

ISO 25964-2 mapping types

- Basic mapping types:
  - Equivalence
  - Hierarchical
  - Associative
- “Inexact” can apply to any mapping, but most commonly to equivalence

ISO 25964-2 mapping types in more detail

- Basic mapping types:
  - Equivalence
    - Simple
    - Compound
      - Intersecting compound equivalence
      - Cumulative compound equivalence
  - Hierarchical
    - Broader
    - Narrower
  - Associative
- “Inexact” can apply to any mapping, but most commonly to equivalence, including compound equivalence

ISO 25964-2 equivalence mappings in more detail

- Simple
  Laptop computers EQ Notebook computers
- Compound
  - Intersecting compound equivalence
    Women executives EQ Women + Executives
  - Cumulative compound equivalence
    Inland waterways EQ rivers | canals

Intersecting versus cumulative equivalence

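- Intersecting: Women executives EQ Women + Executives
- Cumulative: Inland waterways EQ rivers | canals

[Diagrams: one pair of circles labelled “women” and “executives”, with “women executives”; one pair of circles labelled “rivers” and “canals”, with “inland waterways”]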

Some key messages re compound equivalence

  If you use mappings for conversion of index terms, you implement intersecting equivalents quite differently from cumulative equivalents.

With simple equivalence (exact or inexact) and with hierarchical or associative mappings, two-way conversions are usually OK; but compound equivalence typically works in one direction only.
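To illustrate why the two kinds of compound equivalence need different handling, here is a minimal sketch of the query-conversion side, assuming the usual Boolean reading in which intersecting targets are combined with AND and cumulative targets with OR. The slides do not spell out a policy for converting index terms, so none is implemented here. The function names, the `mappings` dictionary layout and the kind labels are hypothetical; only the three example mappings come from the slides.

```python
def to_target_query(term, mappings):
    """Rewrite one source search term as a Boolean query on the target vocabulary."""
    kind, targets = mappings[term]
    if kind == "simple":
        return targets[0]
    if kind == "intersecting":
        # All target terms must apply together.
        return "(" + " AND ".join(targets) + ")"
    if kind == "cumulative":
        # Any one of the target terms may apply.
        return "(" + " OR ".join(targets) + ")"
    raise ValueError(f"unknown mapping kind: {kind}")


def reverse_simple(mappings):
    """Invert only the simple mappings.

    Compound mappings are skipped because no single target term
    maps back to the source term on its own.
    """
    return {
        targets[0]: ("simple", [source])
        for source, (kind, targets) in mappings.items()
        if kind == "simple"
    }


# Example mappings taken from the slides; the dictionary layout is an assumption.
mappings = {
    "Laptop computers": ("simple", ["Notebook computers"]),
    "Women executives": ("intersecting", ["Women", "Executives"]),
    "Inland waterways": ("cumulative", ["rivers", "canals"]),
}

print(to_target_query("Women executives", mappings))  # (Women AND Executives)
print(to_target_query("Inland waterways", mappings))  # (rivers OR canals)
print(reverse_simple(mappings))
```

Only the simple mapping survives `reverse_simple`, which reflects the point that compound equivalence typically works in one direction only: neither “Women” nor “rivers” alone can be converted back to the source term.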

Inexact: another complication for equivalence mappings

- Simple
  Laptop computers EQ Notebook computers
- Compound
  - Intersecting compound equivalence
    Women executives EQ Women + Executives
  - Cumulative compound equivalence
    Inland waterways EQ rivers | canals
- Inexact simple equivalence
  Lawns ~EQ Turf
- Inexact compound equivalence
  Women executives ~EQ Females + Managers
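The notation used in these examples (a tag such as EQ, BM, NM or RM, an optional “~” for inexact, and “+” or “|” between target terms) is regular enough to be read by a program. The sketch below parses one such statement into a small record. It follows only the conventions visible on the slides and is not the exchange format of any standard; the `Mapping` class, its field names and `parse_mapping` are hypothetical.

```python
import re
from dataclasses import dataclass

# Tags as used on the slides.
TAGS = {"EQ": "equivalence", "BM": "broader", "NM": "narrower", "RM": "associative"}

# source term, optional "~" for inexact, tag, then one or more target terms
PATTERN = re.compile(
    r"^(?P<source>.+?)\s+(?P<inexact>~?)(?P<tag>EQ|BM|NM|RM)\s+(?P<targets>.+)$"
)


@dataclass
class Mapping:
    source: str
    kind: str        # equivalence, broader, narrower or associative
    inexact: bool
    compound: str    # simple, intersecting or cumulative
    targets: list


def parse_mapping(line):
    """Parse one statement such as 'Women executives ~EQ Females + Managers'."""
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a mapping statement: {line!r}")
    target_text = match["targets"]
    if "+" in target_text:
        compound, parts = "intersecting", target_text.split("+")
    elif "|" in target_text:
        compound, parts = "cumulative", target_text.split("|")
    else:
        compound, parts = "simple", [target_text]
    return Mapping(
        source=match["source"],
        kind=TAGS[match["tag"]],
        inexact=match["inexact"] == "~",
        compound=compound,
        targets=[part.strip() for part in parts],
    )


for statement in (
    "Lawns ~EQ Turf",
    "Inland waterways EQ rivers | canals",
    "Women executives ~EQ Females + Managers",
):
    print(parse_mapping(statement))
```

Keeping “inexact” as a separate flag rather than a fourth mapping type mirrors the slides, where it can qualify simple and compound equivalence alike.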

Major/minor overlap: yet another complication

- Found useful in the RENARDUS project
- Is there a parallel with the KoMoHe “relevancy rating”?
- Earlier versions of SKOS allowed “majorMatch” and “minorMatch”; these were subsequently deprecated
- It would apply to inexact equivalence; maybe also to hierarchical and associative mappings?
- How would you judge it in cases of compound equivalence?
- A recent draft of ISO 25964 admits major/minor as an optional attribute of inexact equivalence, in the context of a particular application

Now we come to the crunch: Can we standardize these mapping types?

We can certainly write them in a standards document, but can we make them stick? Will real users implement them according to the guidance rules in the standard?

To make a standard stick:

- Keep it simple
- Address a real need
- Adopt rules that are already broadly accepted in the user community
- Keep it within the implementation range of available software
- Make the standard available easily and free, or at least at a low price
- Commit to lifelong maintenance

Want a copy of ISO 25964-2?

- A draft is due to appear in January 2011, “ISO DIS 25964-2”, with the hope of attracting comments from potential users
- The official way to get it is through your national standards body (e.g. DIN)
- Distribution policies vary from one country to another; last time round we found a way to make the draft available online free of charge and free of passwords, on the BSI site

Send me an email and I’ll alert you when the DIS is released. [email protected]

References (abbreviated)

- MACS: Landry, Patrice. Multilingual subject access: the linking approach of MACS. Cataloging & Classification Quarterly. 2004; 37(3/4):177-191
- CrissCross: http://linux2.fbi.fh-koeln.de/crisscross/swd-ddc-mapping_en.html
- RENARDUS: http://www.mpdl.mpg.de/staff/tkoch/publ/preifla-final.html
- KoMoHe: http://www.gesis.org/en/research/programs-and-projects/knowledge technologies/project-overview/komohe/
- Doerr: http://journals.tdl.org/jodi/article/view/31/32
- SKOS: http://www.w3.org/TR/skos-reference/
- BS 8723-4:2007 Structured vocabularies for information retrieval. Guide. Interoperability between vocabularies. British Standards Institution
- ISO 25964-2 (still in draft). Thesauri and interoperability with other vocabularies – Part 2: Interoperability with other vocabularies