A New Standard for Controlled Vocabularies - NKOS


Standards for Controlled
Vocabularies
1. U.S. Standard (NISO Z39.19)
2. British Standard (BS 8723)
3. IFLA Guidelines
Marcia Lei Zeng, Kent State University
7th NKOS Workshop, JCDL2005, Denver
I. U.S. Standard for
Controlled Vocabularies
– NISO Z39.19
NISO Z39.19-200x Guidelines for the
Construction, Format, and Management of
Monolingual Controlled Vocabularies
Some of the slides are based on
Emily Fayen's June 2004 SLA presentation and
Margie Hlava’s talk at the 2005 Data Harmony User Group meeting
A little bit of history…
• ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Thesauri – 1993
• The most frequently requested NISO Standard
• In spite of its age, the Standard is still relevant
• 1999: NISO Workshop on Electronic Thesauri
  http://www.niso.org/news/events_workshop/thes99rpt.html
• 2002: NISO initiates revision of Z39.19

Scope
• Expand beyond thesaurus
• Make more user-friendly
• Explain important concepts
• Explain principles of vocabulary control
• Include electronic information environment
• Include additional user search methods:
  - Browse
  - Navigate
  - Keyword searching
• Expand beyond A & I services
• Include Web applications

The Team:
• Vivian Bliss – Microsoft
• Carol Brent – ProQuest
• John Dickert – DTIC
• Lynn El-Hoshy – Library of Congress
• Marjorie Hlava – Access Innovations
• Stephen Hearn – ALA
• Sabine Kuhn – Chemical Abstracts Service
• Pat Kuhr – H.W. Wilson Company
• Diane McKerlie – DMA Consulting
• Peter Morville – Semantic Studios
• Stuart Nelson – National Library of Medicine
• Allan Savage – National Library of Medicine
• Diane Vizine-Goetz – OCLC
• Marcia Lei Zeng – Special Libraries Association

Z39.19 Chapters
Content
1 Introduction
2 Scope
3 Referenced Standards
4 Definitions, Abbreviations, and Acronyms
5 Controlled Vocabularies – Purpose, Concepts, Principles, and Structure
6 Term Choice, Scope, and Form
7 Compound Terms
8 Relationships
9 Displaying Controlled Vocabularies
10 Interoperability
11 Construction, Testing, Maintenance, and Management Systems

What’s new?
1993 edition                          Revision
Coverage: documents                   Coverage: content objects
Types of vocabularies: thesauri       Types of vocabularies: lists, synonym rings, taxonomy
Post-coordinated                      Pre-coordinated
Printed formats                       Web format
Monolingual vocabularies              Multilingual vocabularies (general)
                                      Interoperability
                                      Facet analysis

Principles of Controlled Vocabularies
Four important principles of vocabulary control guide the design and development of controlled vocabularies:
• eliminating ambiguity
• controlling synonyms
• establishing relationships among terms where appropriate
• testing and validation of terms

Type of vocabulary control
Lists
A list is a simple group of terms
Example:
Alabama
Alaska
Arkansas
California
Colorado
....
Frequently used in Web site pick lists and pull-down menus

Source: The J. Paul Getty Museum's implementation of The Museum System
software by Gallery Systems
Synonym Rings
A synonym ring is a list of synonyms or near-synonyms that are used interchangeably for retrieval purposes

Synonym Rings -- Examples
-- Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms.
-- Frequently used in systems where the content is not indexed or the indexing vocabulary is not controlled
e.g., cholesterol:
Cholesterol
Blood Cholesterol
Serum Cholesterol
Good Cholesterol
Bad Cholesterol
LDL
....

An example from International SEMATECH;
a search for Silicon would look like this:
Your search was submitted as “SILICON” or “SI”
Synonym Rings: Use
• Synonym rings are used to expand queries for content objects.
  - If a user enters any one of the terms in a ring as a query, the system retrieves all items that contain any of the terms in the cluster.
• Synonym rings are often used in systems where the underlying content objects are left in their unstructured natural-language format.
  - Control is achieved through the interface, by drawing similar terms together into these clusters.
• Synonym rings are used in conjunction with search engines and provide a minimal amount of control over the diversity of the language found in the texts of the underlying documents.
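The query expansion described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine implements it: the ring contents reuse the cholesterol and silicon examples from the slides, while the function names, the sample documents, and the whole-word matching rule are assumptions made for the sketch.

```python
import re

# Illustrative synonym rings (terms taken from the slide examples, lowercased).
SYNONYM_RINGS = [
    {"cholesterol", "blood cholesterol", "serum cholesterol",
     "good cholesterol", "bad cholesterol", "ldl"},
    {"silicon", "si"},
]

def expand_query(query):
    """Return all terms in the ring that contains the query term.

    A term that belongs to no ring is returned on its own.
    """
    term = query.strip().lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}

def search(query, documents):
    """Retrieve documents whose text contains any term of the expanded query.

    Matching is on whole words (an assumption of this sketch), so the ring
    member "si" does not match every word that merely contains "si".
    """
    terms = expand_query(query)
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b") for t in terms]
    return [doc for doc in documents
            if any(p.search(doc.lower()) for p in patterns)]

# Hypothetical unindexed, natural-language content.
docs = [
    "Serum cholesterol was measured weekly.",
    "LDL levels fell after treatment.",
    "Rice straw is used as fodder.",
]
matches = search("Cholesterol", docs)
```

Here `matches` holds the first two documents: neither was indexed, and the second never mentions the word the user typed, yet both are retrieved because the control is applied at query time.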
Taxonomies
A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy
Example:
Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon
Frequently used in Web navigation systems
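As a rough sketch of how such a hierarchy can drive Web navigation, the Python below stores the chemistry example as child-to-parent links. It assumes each term in the example is nested under the one before it; the function names are hypothetical, and a polyhierarchy would need a list of parents per term instead of a single parent.

```python
# Child -> parent links for the chemistry example (single hierarchy assumed).
PARENT = {
    "Organic chemistry": "Chemistry",
    "Polymer chemistry": "Organic chemistry",
    "Nylon": "Polymer chemistry",
}

def breadcrumb(term):
    """Return the path from the top of the hierarchy down to the term."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))

def descendants(term):
    """Return every term below the given term, depth first."""
    found = []
    for child, parent in PARENT.items():
        if parent == term:
            found.append(child)
            found.extend(descendants(child))
    return found

trail = breadcrumb("Nylon")
below = descendants("Organic chemistry")
```

`trail` is the kind of breadcrumb a navigation page would show for Nylon, and `below` lists the terms a browse page for Organic chemistry could offer as drill-down links.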
Thesauri
A thesaurus is a controlled vocabulary with multiple types of relationships
Example:
Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw

Thesauri (cont.)
Relationship types:
• Use/Used For – indicates preferred term
• Hierarchy – indicates broader and narrower terms
• Associative – almost unlimited types of relationships may be used
The thesaurus is the most complex format for controlled vocabularies and is widely used.
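The three relationship types can be pictured as fields on a term record. The sketch below is one possible in-memory representation, not a format prescribed by Z39.19; the class and function names are assumptions, and the data is the Rice example from the previous slide.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """One preferred term with its thesaurus relationships."""
    name: str
    used_for: list = field(default_factory=list)  # UF: non-preferred entry terms
    broader: list = field(default_factory=list)   # BT: hierarchy, upward
    narrower: list = field(default_factory=list)  # NT: hierarchy, downward
    related: list = field(default_factory=list)   # RT: associative

rice = Term(
    "Rice",
    used_for=["paddy"],
    broader=["Cereals", "Plant products"],
    narrower=["Brown rice"],
    related=["Rice straw"],
)

def build_entry_index(terms):
    """Map every entry term (preferred or non-preferred) to its preferred term."""
    index = {}
    for term in terms:
        index[term.name.lower()] = term.name
        for entry in term.used_for:
            index[entry.lower()] = term.name
    return index

def display(term):
    """Render a term in the conventional UF/BT/NT/RT layout."""
    lines = [term.name]
    for tag, values in (("UF", term.used_for), ("BT", term.broader),
                        ("NT", term.narrower), ("RT", term.related)):
        lines.extend(f"  {tag} {value}" for value in values)
    return "\n".join(lines)

index = build_entry_index([rice])
preferred = index["paddy"]
entry = display(rice)
```

Looking up "paddy" in `index` leads to the preferred term Rice, which is the Use/Used For relationship in action; `entry` reproduces the display shown in the example.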
Interoperability
• One of the most important issues from the 1999 workshop
• Question: How to
  - compare indexes
  - perform searches
  - merge databases that have been developed using different controlled vocabularies?

Interoperability (cont.)
• Factors Affecting Interoperability
• Multilingual Controlled Vocabularies
• Searching
• Indexing
• Merging Databases
• Merging Controlled Vocabularies
• Achieving Interoperability
• Storage and Maintenance of Relationships among Terms in Multiple Controlled Vocabularies

Review and Comments
• http://www.niso.org
• Ballot period: April 11, 2005 - May 25, 2005
• Current voting status (as of June 5, 2005):
  - YES: 40
  - NO: 0
  - ABSTAIN: 4

II. The British Standard
BS 8723: Structured Vocabularies for
Information Retrieval – Guide
Slides based on the presentation by
Stella G Dextre Clarke
Alan Gilchrist
Leonard Will
at ISKO 2004, London
Existing thesaurus standards
• ISO 2788-1986 Guidelines for the establishment and development of monolingual thesauri
  = BS 5723:1987
• ISO 5964-1985 Guidelines for the establishment and development of multilingual thesauri
  = BS 6723:1985

What needs updating?
• Printed versus electronic application
• Guidance on management software
• Interoperability:
  - Mapping between thesauri and other types of vocabulary
  - Formats/protocols for data exchange with downstream applications
• Applicability to end-user applications, not just those for information professionals

Outline of new standard
BS 8723: Structured vocabularies for information retrieval – Guide
• Part 1 – Definitions, symbols and abbreviations
• Part 2 – Thesauri
• Part 3 – Vocabularies other than thesauri
• Part 4 – Interoperability between vocabularies
• Part 5 – Interoperation between vocabularies and other components of information storage and retrieval systems

Part 3 chapters
• Classification schemes
• Subject heading lists
• Taxonomies
• Ontologies
• Semantic nets (?)
• Search thesauri

Issues for Part 3
• How much guidance is needed on how to build other sorts of vocabulary?
• Should we describe the idiosyncrasies of existing schemes, even where we judge there is a ‘better’ way?
• To provide a basis for Part 4, Part 3 should pick out the characteristics of different vocabulary types that govern when and how you can map them. But some of the observable characteristics might not be what we’d recommend. What to do?

Part 4: Interoperability between vocabularies
• Huge demand for accessing information that has been indexed with another language and/or vocabulary. The buzzword is ‘Mapping’. The Semantic Web is just one application.
• Part 4 to include multilingual thesauri as a special case of mapping between vocabularies.
• Part 4 applies to situations in which more than one language or vocabulary is in use, but access to all resources is needed through the one vocabulary chosen by the user.

Part 4: Interoperability between vocabularies (cont.)
• BS 8723 Part 4 has a wider scope than BS 6723, which was concerned only with multilingual thesauri.
• It covers all of the previous ground and extends the scope to:
  - thesauri in different dialects of one language
  - different thesauri in a single language
  - situations where a thesaurus interoperates with one or more different types of structured vocabulary, such as classification schemes
  - situations where not all the interoperating vocabularies have the same status and/or function.
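To make the idea of mapping concrete, here is a minimal Python sketch of access to several collections "through the one vocabulary chosen by the user". It is not drawn from BS 8723: the mapping table, the vocabulary labels, the document identifiers, and the function name are all hypothetical, and the terms are borrowed from the taps/faucets example used later in these slides.

```python
# Hypothetical mapping table: (user vocabulary, term) -> equivalent terms
# in the other vocabularies.
MAPPINGS = {
    ("en-GB", "taps"): {"en-US": ["faucets"], "fr": ["robinet"]},
}

# Hypothetical collections, each indexed with its own vocabulary.
INDEXES = {
    "en-GB": {"taps": ["doc-uk-1"]},
    "en-US": {"faucets": ["doc-us-1", "doc-us-2"]},
    "fr": {"robinet": ["doc-fr-1"]},
}

def search_all(user_vocab, term):
    """Search every collection using only the user's own vocabulary.

    The user's term is looked up directly in the matching index, then
    translated through the mapping table for each other vocabulary.
    """
    hits = list(INDEXES.get(user_vocab, {}).get(term, []))
    for target_vocab, target_terms in MAPPINGS.get((user_vocab, term), {}).items():
        for target_term in target_terms:
            hits.extend(INDEXES.get(target_vocab, {}).get(target_term, []))
    return hits

results = search_all("en-GB", "taps")
```

A one-to-one table like this is the simplest case. Where the vocabularies differ in structure, status, or function, a mapping may have to point to broader, narrower, or combined terms, which is where the harder questions raised above begin.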
Part 5: Interoperability with applications
• Vocabularies must work with
  - Search engines
  - Content Management Systems
  - Web publishing software, etc.
• Build on existing formats and protocols for data exchange
  - e.g. Z39.50 and Zthes, XML schema? DTD? MARC? SKOS Core Schema? Topic Map? ADL gazetteer protocol? Anything else?

Review and Comments
• Request a copy of Parts 1 and 2:
  - Parts 1 and 2 are numbered 04/30086620 DC and 04/30094113 DC.
  - The documents may be ordered from BSI Customer Services:
    tel +44(0)208-996-9001 or
    email [email protected]

III. IFLA Guidelines for Multilingual Thesauri
IFLA Classification and Indexing Section
Released for comments April 2005
IFLA Classification and Indexing Section
WG on Guidelines for Multilingual Thesauri
• Chair: Gerhard J.A. Riesthuis (Netherlands)
• Members:
  - Lois Mai Chan (USA)
  - Patrice Landry (Switzerland)
  - Pia Leth (Sweden)
  - Ia McIlwaine (United Kingdom)
  - Martin Kunz (Germany)
  - Dorothy McGarry (USA)
  - Max Naudi (France)
  - Marcia Lei Zeng (USA)

Three approaches in the development of multilingual thesauri:
1. building a new thesaurus from the bottom up
   - starting with one language and adding another language or languages
   - starting with more than one language simultaneously
2. combining existing thesauri
   - merging two or more existing thesauri into one new (multilingual) information retrieval language to be used in indexing and retrieval
   - linking existing thesauri and subject heading languages to each other; using the existing thesauri and/or subject heading languages both in indexing and retrieval
3. translating a thesaurus into one or more other languages

Semantic problems
Semantic problems pertain to equivalence relations between terms used as preferred and non-preferred terms in information retrieval languages.
• Equivalence relations exist not only within each separate language involved, but also between the languages (intra-language equivalence and inter-language equivalence).
• Intra-language homonymy and inter-language homonymy are also considered semantic questions.
• Additional problems pertaining to semantics involve the scope, form and choice of thesaurus terms.

Structural problems
Structural problems involve hierarchical and associative relations between the terms.
• An important question in this respect is whether the structure should be the same or different for each language.
  - In most if not all cases of linking, the structure will most probably not be the same in all the information retrieval languages involved.
  - In the other approaches mentioned it is possible in principle to apply the same structure to all languages.

Contents covered by the guidelines
• Building multilingual thesauri starting from scratch
  - Structure
  - Morphology and Semantics
• Starting from existing thesauri
  - Merging
  - Linking
• Glossary
• Appendix:
  - An example of a non-symmetrical thesaurus

Examples are in multiple languages
English (British)           English (USA)               Dutch                         French
cranes (birds)              cranes (birds)              kraanvogels                   grue (oiseau)
cranes (lifting equipment)  cranes (lifting equipment)  hijskranen                    grue (appareil de levage)
                                                          SN voor andere typen
                                                          kranen, zie aldaar
water taps                  water faucets               waterkranen                   robinet à eau
gas taps                    gas faucets                 gaskranen                     robinet à gaz
taps                        faucets                     kranen                        robinet
  NT water taps               NT water faucets            SN voor kranen als            NT robinet à eau
  NT gas taps                 NT gas faucets              hijswerktuig gebruik          NT robinet à gaz
                                                          hijskranen
                                                          NT waterkranen
                                                          NT gaskranen
That cranes is a homograph in English does not necessarily mean that the equivalent terms in other languages are also homographs. The Dutch term kranen is a homograph too, but with the meanings cranes (lifting equipment) and taps.
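The point about homographs not lining up across languages can be checked mechanically. The Python sketch below records, for each language, which concepts a bare word can denote; the words and meanings come from the table above, while the concept identifiers and the function name are made up for the illustration.

```python
# Bare word -> concepts it can denote, per language. Concept identifiers
# ("crane_bird", "crane_lifting", "tap") are hypothetical labels.
SENSES = {
    "en": {
        "cranes": ["crane_bird", "crane_lifting"],
        "taps": ["tap"],
    },
    "nl": {
        "kraanvogels": ["crane_bird"],
        "hijskranen": ["crane_lifting"],
        "kranen": ["crane_lifting", "tap"],
    },
    "fr": {
        "grue": ["crane_bird", "crane_lifting"],
        "robinet": ["tap"],
    },
}

def homographs(language):
    """Return the words of a language that denote more than one concept."""
    return {word: concepts
            for word, concepts in SENSES[language].items()
            if len(concepts) > 1}

english = homographs("en")
dutch = homographs("nl")
french = homographs("fr")
```

In this data the English and French homographs cover the same pair of concepts, but the Dutch homograph kranen pairs lifting equipment with taps instead. Each language therefore needs its own qualifiers or scope notes, which is why the Dutch column carries SN entries where the English column uses parenthetical qualifiers.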
World-Wide Review
Invitation to: World-Wide Review of IFLA Guidelines for Multilingual Thesauri
• Comments due by July 31, 2005
• URL: http://www.ifla.org/VII/s29/wgmt-invitation.htm
• Contact me at: [email protected]