Metadata & Controlled Vocabularies: What Are They and What


Guidelines and Principles for Developing Search and Browse Vocabularies
May 31, 2003
Rice University
Houston, TX
Amy J. Warner, PhD
[email protected]
Epicurious.com
2
Navigation/Taxonomy
Vehicle Brands
Cars
MR2 Spyder
Celica
Celica Brochure
Matrix
Avalon
Camry Solara
Camry
Camry Brochure
Prius
Corolla
ECHO
SUVs/Vans
Land Cruiser
Sequoia
4Runner
Sienna
Highlander
RAV4
Trucks
Tundra
Tacoma
Vehicle Parts
Vehicle Accessories
Carriers
Bicycle Carriers
Ski Carriers
Roof Racks
Splash Guards
Security Systems
Tires
.
.
Engines & Transmissions
.
.
3
Synonym Rings
Cholesterol
Blood Cholesterol
Serum Cholesterol
Good Cholesterol
Bad Cholesterol
LDL
.
.
.
4
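A synonym ring like the one above treats every member as equivalent at search time, with no preferred term. A minimal sketch, assuming a ring built from the "vehicle collisions" terms used elsewhere in this deck; the function names `expand_query` and `search` are illustrative, not part of any standard:

```python
# Hypothetical synonym ring; each set holds terms treated as equivalent.
SYNONYM_RINGS = [
    {"vehicle collisions", "vehicle crashes", "crashes", "collisions"},
]


def expand_query(term, rings=SYNONYM_RINGS):
    """Return the term plus every term that shares a ring with it."""
    key = term.strip().lower()
    expanded = {key}
    for ring in rings:
        if key in ring:
            expanded |= ring
    return sorted(expanded)


def search(term, documents, rings=SYNONYM_RINGS):
    """Return documents whose text contains any member of the expanded query."""
    variants = expand_query(term, rings)
    return [doc for doc in documents if any(v in doc.lower() for v in variants)]
```

A search for "crashes" would then also retrieve a document that only mentions "vehicle collisions", which is the point of the ring: the user does not need to guess the wording used in the collection.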
Medline
5
MeSH & UMLS
6
Controlled Vocabulary Defined
• A subset of natural language.
• A list of preferred and (sometimes) variant terms.
• With semantic relationships (hierarchical and associative) (sometimes) defined.
• Used to tag document attributes (describe facets).
  – Topic / Subtopic
  – Audience
  – Language
  – Form
• Or can be used to create a labeling scheme for navigation.
7
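Tagging document attributes from a controlled vocabulary can be sketched as a check that each tag value is an allowed term for its facet. The facet names below come from the slide; the allowed values and the `validate_tags` function are assumptions made up for illustration:

```python
# Facet names follow the slide; the allowed values are hypothetical examples.
FACETS = {
    "topic": {"vehicle collisions", "vehicle safety"},
    "audience": {"consumers", "engineers"},
    "language": {"english", "spanish"},
    "form": {"brochure", "report"},
}


def validate_tags(tags, facets=FACETS):
    """Return a list of problems; an empty list means every tag is allowed."""
    problems = []
    for facet, value in tags.items():
        if facet not in facets:
            problems.append(f"unknown facet: {facet}")
        elif value.lower() not in facets[facet]:
            problems.append(f"'{value}' is not a preferred term for {facet}")
    return problems
```

For example, `validate_tags({"topic": "Vehicle safety", "form": "brochure"})` returns an empty list, while a tag such as `{"form": "poster"}` is reported because "poster" is not in the list for that facet.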
Cornerstones of Vocabulary Control
• Use unambiguous labels/search terms.
• Make distinctions among labels/search terms clear.
• Make choices about wording and specificity of labels/search terms based on user testing and on size of collection.
• Use other semantic relationships (hierarchical, associative) if necessary to organize large lists of labels/search terms.
8
Continuum of Vocabulary Control
Less → More

Synonym Control
• USE/Used for relationship
    Vehicle crashes
      USE Vehicle collisions
    Vehicle collisions
      UF Vehicle crashes
• Synonym Rings
    Vehicle collisions
    Vehicle crashes
    Crashes
    Collisions

Hierarchical Relationships
• Broader/Narrower Terms
    Vehicle collisions
      NT Truck collisions
    Truck collisions
      BT Vehicle collisions
• Browse Categories

Associative Relationships
• Part/Whole
• Cause/Effect
• etc.
    Vehicle parts
      RT Vehicles
    Vehicles
      RT Vehicle parts

Vehicle safety
Truck safety
Truck collisions
Vehicle safety
• Site Index
• Taxonomies
9
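The USE/UF, BT/NT, and RT pairs on this slide are reciprocal: recording one side implies the other. A minimal sketch of a data structure that keeps both sides in step, assuming a simple in-memory store; the `Thesaurus` class and its method names are hypothetical:

```python
from collections import defaultdict

# Each relationship code paired with the code recorded on the other term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # term -> relation code -> set of target terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, term, relation, target):
        """Record a relationship and its reciprocal in one step."""
        self.relations[term][relation].add(target)
        self.relations[target][RECIPROCAL[relation]].add(term)

    def preferred(self, term):
        """Follow a USE reference to the preferred term, if there is one."""
        targets = self.relations[term]["USE"]
        return next(iter(targets)) if targets else term

    def narrower(self, term):
        """Collect all narrower terms, however many levels down."""
        found, stack = set(), [term]
        while stack:
            for child in self.relations[stack.pop()]["NT"]:
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found
```

Calling `add("Vehicle collisions", "NT", "Truck collisions")` also records `BT Vehicle collisions` under "Truck collisions", so the two entries cannot drift apart.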
Steps in Controlled Vocabulary Construction
• Group terms by subject (facet analysis).
• Link synonyms and variants.
    Synonym Rings
      Vehicle collisions
      Vehicle crashes
      Crashes
      Collisions
• Identify broader and narrower terms.
    Taxonomies / Hierarchies
• Identify related terms.
    Thesauri
10
Purposes of Standard
• Base choices on ‘best practice’.
• Base choices on known principles.
• Foster interoperability.
11
Current NISO Thesaurus Standard
• Guidelines for the Construction, Format, and Management of Monolingual Thesauri: Z39.19-1993.
• Not a technical standard, but a set of guidelines.
• Emphasizes search thesauri.
• Emphasizes postcoordinate retrieval.
• Used mainly for abstracting and indexing services.
• Does not put the standard in context.
12
Why Revise
• Not revised since 1993.
• Number of downloads high, reflecting interest.
• Does not take the web environment into account.
  – Navigation schemes are controlled vocabularies too.
  – Is out of date in terms of computing technology in general:
    • Software for managing thesauri has advanced.
    • Software for leveraging thesauri through an interface has advanced.
• Currently little attention paid to user testing.
13
Term forms
• Currently
  – Emphasizes rigid rules for grammatical form.
  – Emphasizes short phrases as terms.
• Suggested revision
  – Loosen rules on grammatical form.
  – Allow for longer, more complex phrases.
• Rationale
  – Software can perform automatic stemming.
  – Navigation schemes are more precoordinate.
14
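The rationale about automatic stemming can be illustrated with a deliberately naive plural stripper: if software conflates "crash" and "crashes" at match time, the vocabulary no longer has to enforce one grammatical form. This is a sketch under that assumption, not the Porter algorithm or any production stemmer; `naive_stem`, `normalize`, and `matches` are hypothetical names:

```python
def naive_stem(word):
    """Strip common English plural endings (illustrative rules only)."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # categories -> category
    if word.endswith(("shes", "ches", "xes", "sses")):
        return word[:-2]                # crashes -> crash
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # collisions -> collision
    return word


def normalize(phrase):
    """Stem every word in a phrase so variant forms compare equal."""
    return " ".join(naive_stem(w) for w in phrase.split())


def matches(query, term):
    """True when the query and the vocabulary term share the same stems."""
    return normalize(query) == normalize(term)
```

With this in place, a user's "vehicle crash" matches the term "Vehicle crashes" even though the grammatical forms differ. Real stemmers handle many more cases and exceptions than these three rules.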
Semantic Relationships
• Current standard
  – Only accounts for explicit equivalence relationships.
  – Hierarchical relationship only allowed for genus-species relationship, with a few exceptions.
  – Associative relationship only allowed across categories.
• Proposed revision
  – Provide guidelines for choosing unambiguous labels.
  – Provide guidelines for loose, browse categories.
• Rationale
  – Labeling schemes and pick lists often do not account for explicit synonymy relationships.
  – Hierarchical navigation schemes need to be less rigid.
15
Browse Categories
16
Usability Testing
• Current standard
  – Discusses users but does not include guidelines for testing with users.
• Proposed revision
  – Provide guidelines for open card sort testing of high-level categories.
  – Provide guidelines for closed card sorting of term groups under high-level categories.
• Rationale
  – User testing is an important consideration for choosing terms and term relationships.
17
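Results from a closed card sort can be summarized by counting, for each term, which predefined category participants chose most often and how strongly they agreed. A minimal sketch, assuming each participant's result is a mapping from term to category; the data layout and the `tally_closed_sort` name are assumptions for illustration:

```python
from collections import Counter


def tally_closed_sort(placements):
    """Summarize a closed card sort.

    placements: one dict per participant, mapping term -> chosen category.
    Returns term -> (most common category, share of placements that chose it).
    """
    by_term = {}
    for participant in placements:
        for term, category in participant.items():
            by_term.setdefault(term, Counter())[category] += 1

    results = {}
    for term, counts in by_term.items():
        category, votes = counts.most_common(1)[0]
        results[term] = (category, votes / sum(counts.values()))
    return results
```

Terms with a low agreement share are candidates for relabeling or for a cross-reference from a second category.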
Display
• Current standard
  – Emphasizes print copies of thesauri.
  – Screen display section oriented toward display of print copy.
• Proposed revision
  – Oriented more toward displays of vocabularies that only exist in digital format.
• Rationale
  – Most web vocabularies do not have print counterparts.
18
Interoperability
• Current standard
  – Does not address issues associated with interoperability.
• Proposed revision
  – Will address major issues and problems associated with interoperability, including multiple languages.
• Rationale
  – Being able to share information within and among organizations.
19
Construction and Maintenance
• Current standard
  – Emphasizes maintenance problems in print vocabularies.
  – Discusses software that manages stand-alone vocabularies.
• Proposed revision
  – Advance standards for changing, adding, deleting terms automatically.
  – Provide guidance for software that is connected to information retrieval systems.
• Rationale
  – Software has advanced significantly.
20
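Changing or deleting a term automatically means every relationship that points at it has to be updated in the same operation. A minimal sketch, assuming the vocabulary is held as a plain dictionary of term -> relation code -> set of terms; `rename_term` and `delete_term` are hypothetical helpers, not functions of any thesaurus software:

```python
def rename_term(vocab, old, new):
    """Rename a term and update every relationship that points at it."""
    if old not in vocab:
        raise KeyError(old)
    if new in vocab:
        raise ValueError(f"{new!r} already exists")
    vocab[new] = vocab.pop(old)
    for relations in vocab.values():
        for targets in relations.values():
            if old in targets:
                targets.discard(old)
                targets.add(new)


def delete_term(vocab, term):
    """Remove a term and drop every reference to it from other terms."""
    vocab.pop(term)
    for relations in vocab.values():
        for targets in relations.values():
            targets.discard(term)
```

In a vocabulary connected to a retrieval system, the same change would also have to reach the documents already tagged with the old term; that step is outside this sketch.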
Process for Revising Standard
• Appoint editor.
• Appoint advisory group.
• Draft revision.
• Discuss drafts with advisory group.
• Vote on final draft by NISO board.
21
Editor & Advisory Group
• Amy Warner, lexonomy.com
• Vivian Bliss, Microsoft
• Carol Brent, ProQuest
• John Dickert, U.S. DoD
• Lynn El-Hoshy, Library of Congress
• Emily Fayen, SDC liaison
• Patricia Harpring, Getty
• Stephen Hearn, American Library Association
• Sabine Kuhn, American Chemical Society/Chemical Abstracts
• Pat Kuhr, H.W. Wilson
• Diane McKerlie, Design Strategy
• Peter Morville, Semantic Studios
• Stuart Nelson, National Library of Medicine
• Diane Vizine-Goetz, OCLC
• Marcia Lei Zeng, Special Libraries Association
22
Progress to Date
• Agreement on scope of revision.
• Agreement that guidelines should be placed in context.
• Agreement that guidelines should be educational as well as prescribing best practice.
• Agreement that guidelines should be forward-looking in terms of new technologies.
• Agreement to write guidelines for elements and features that all vocabularies have in common, then consider their differences.
• Survey conducted to determine use of standard, other standards, software.
23
Other Players
• Communication with editor of British Standard.
• Communication and work with W3C to address issues of implementation of controlled vocabularies.
24
Relationship with Semantic Web and OWL
• Semantic Web is an ontological framework.
• Both terms in the ontology and the relationships between them are standardized using OWL (Web Ontology Language).
• Both the terms and the relationships are ‘deep’ semantically.
• This is a structure into which ‘shallower’ terms provided by using Z39.19 could be inserted.
• This would enhance interoperability because although we would not have complete agreement on vocabularies, we would have agreement on an effective structure for exchanging them.
25
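One way to picture an agreed structure for exchanging vocabularies is to flatten thesaurus records into subject-predicate-object triples, the general shape that Semantic Web statements take. The sketch below is only an illustration of that shape: the predicate names are made up and are not OWL vocabulary, and `to_triples` is a hypothetical function:

```python
# Hypothetical predicate names; these are NOT terms defined by OWL.
PREDICATES = {
    "BT": "broaderTerm",
    "NT": "narrowerTerm",
    "RT": "relatedTerm",
    "UF": "usedFor",
    "USE": "use",
}


def to_triples(vocab):
    """Flatten term -> {relation code -> targets} into sorted triples."""
    triples = []
    for term, relations in vocab.items():
        for code, targets in relations.items():
            for target in targets:
                triples.append((term, PREDICATES[code], target))
    return sorted(triples)
```

Two organizations with different vocabularies could still exchange output in this form, because both sides agree on the structure of a statement even where the terms themselves differ.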
Contact Me
Amy J. Warner
[email protected]
www.lexonomy.com
26