Languages for aboutness  Indexing languages: – Terminological tools • Thesauri (CV – controlled vocabulary) • Subject headings lists (CV) • Authority files for named entities.

Languages for aboutness

Indexing languages:
– Terminological tools
• Thesauri (CV – controlled vocabulary)
• Subject headings lists (CV)
• Authority files for named entities (people, places, structures, organizations)
– Classification
– Keyword lists
– Natural language systems (broad interpretation)
Subject Analysis

What is something about?
– What is the content of an object “about”?

Different methods (Wilson, 1968)
– Counting (objective method)
– Purposive method
– Method appealing to unity
– What stands out

Challenges
– Non-text
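
Of Wilson's four methods, counting is the one that is easiest to approximate mechanically. The following is a minimal sketch, not Wilson's own procedure: it assumes plain-text input, a deliberately tiny stopword list, and a hypothetical helper named count_candidates. The other three methods rely on human judgment and are not covered.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "by"}

def count_candidates(text, top_n=3):
    """Return the top_n most frequent non-stopword tokens in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]

sample = ("Robins are birds. Birds migrate in winter. "
          "The migration of birds is studied by tagging robins.")
print(count_candidates(sample, top_n=2))
```

The most frequent words are only candidates for aboutness; an indexer would still need to map them to the vocabulary in use.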
Aboutness: How to do it!

Read the document [Intellectual reading]
– look for key features
– many indexers mark up the items
– indexers rarely have time to read the whole document

Determine aboutness [Conceptual analysis]

Translate aboutness into the vocabulary or scheme you are using
– In general:
• Subject headings: 1-3 headings
• Descriptors: 5-8 descriptors
• Classification: 1 notation (should it be only one?)
Features of indexing languages:



Involve rules and require maintenance
Can be generated via automatic, human, or auto-human processes
Different processes generally display different strengths and weaknesses.
Features of indexing languages:

With the exception of a few general-domain tools, they are generally domain specific.
– MeSH
– NASA Thesaurus
– Astronomy Thesaurus
– ERIC thesaurus
http://www.darmstadt.gmd.de/~lutes/thesoecd.html

Concepts (or concept representations) are arranged in a discernible order
Language schema designs

Classified--grouping
– Hierarchies and facets
MeSH Browser
http://www.nlm.nih.gov/mesh/MBrowser.html
Art and Architecture (Getty AAT)
http://www.getty.edu/research/conducting_research/vocabularies/aat/

Alphabetical -- horizontal
– Verbal/Alphabetical (ordering/filing challenges)
Controlled Vocabulary

A list or database of subject terms in which each concept has a preferred term or phrase that is used to represent it in the retrieval tool; the terms not used have references to the preferred term (syndetic structure), and often scope notes.
Thesaurus (structured thesaurus)

Lexical semantic relationships
Composed of indexing terms/descriptors
Descriptors = representations of concepts
Concepts = units of meaning (Svenonius)
Thesaurus

Preferred terms
Non-preferred terms
Semantic relations between terms
How to apply terms (guidelines, rules)
Scope notes
Adding terms (how to produce terms that are not listed explicitly in the thesaurus)
Preferred Terms

Control form of the term
• Spelling, grammatical form
• Theatre / Theater
• MLA / Modern language association

Choose the preferred term among synonyms
• Brain cancer or Brain Neoplasms?
Common thesaural identifiers

SN Scope Note
– Instruction, e.g. don’t invert phrases
USE Use (another term in preference to this one)
UF Used For
BT Broader Term
NT Narrower Term
RT Related Term
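
One way to picture these identifiers is as fields on a term record. The sketch below is an illustration only: the class name TermRecord, its field names, and the two-space indented display are assumptions, not a format prescribed by any standard. The sample terms come from the examples elsewhere in these slides.

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    """Hypothetical record holding the common thesaural identifiers for one term."""
    term: str
    sn: str = ""                             # SN: scope note
    use: str = ""                            # USE: set only on non-preferred terms
    uf: list = field(default_factory=list)   # UF: non-preferred terms this one replaces
    bt: list = field(default_factory=list)   # BT: broader terms
    nt: list = field(default_factory=list)   # NT: narrower terms
    rt: list = field(default_factory=list)   # RT: related terms

    def display(self):
        """Render the record as an indented, alphabetical-display style entry."""
        lines = [self.term]
        if self.sn:
            lines.append(f"  SN {self.sn}")
        if self.use:
            lines.append(f"  USE {self.use}")
        for label, values in (("UF", self.uf), ("BT", self.bt),
                              ("NT", self.nt), ("RT", self.rt)):
            for value in values:
                lines.append(f"  {label} {value}")
        return "\n".join(lines)

birds = TermRecord(term="Birds", nt=["Robins"])
entry = TermRecord(term="Nuclear Power", use="Nuclear Energy")
print(birds.display())
print(entry.display())
```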
Semantic Relationships

Hierarchy
Equivalence
Association
Hierarchies of Meaning
‘Glass’
– ‘Beer glass’
– ‘Wine glass’
• ‘White wine glass’
• ‘Red wine glass’
From: Controlled Vocabularies / Paul Miller, Interoperability Focus, UKOLN
Hierarchy

Level of generality – both are preferred terms
BT (broader term)
– Robins BT Birds

NT (narrower term)
– Birds NT Robins
– Inheritance, very specific rules
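
BT and NT are reciprocal, so one can be derived from the other. A minimal sketch, assuming each term has at most one broader term (real thesauri may allow several) and assuming the glass terms nest as the "Hierarchies of Meaning" slide suggests. The function names are hypothetical.

```python
from collections import defaultdict

# term -> its broader term (BT); assumes a single BT per term.
BT = {
    "Robins": "Birds",
    "Beer glass": "Glass",
    "Wine glass": "Glass",
    "White wine glass": "Wine glass",
    "Red wine glass": "Wine glass",
}

def narrower_terms(bt):
    """Derive the reciprocal NT links from the BT links."""
    nt = defaultdict(list)
    for child, parent in bt.items():
        nt[parent].append(child)
    return dict(nt)

def ancestors(term, bt):
    """Walk up the hierarchy and return all broader terms, nearest first."""
    chain = []
    while term in bt:
        term = bt[term]
        chain.append(term)
    return chain

NT = narrower_terms(BT)
print(NT["Birds"])
print(ancestors("Red wine glass", BT))
```

Walking up the chain is what makes inheritance possible: a search on ‘Glass’ can be expanded to include everything indexed under its narrower terms.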
Equivalence

When two or more terms represent the same concept
One is the preferred term (descriptor), where all the information is collected
The other is the non-preferred term and helps the user to find the appropriate term
Equivalence

Non-preferred term USE Preferred term
– Nuclear Power USE Nuclear Energy
– Periodicals USE Serials

Preferred term UF (used for) Non-preferred term
– Nuclear Energy UF Nuclear Power
– Serials UF Periodicals
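
USE and UF are two views of the same link. The sketch below stores only the USE references from this slide and derives the UF side from them; the function names used_for and resolve are hypothetical.

```python
# Non-preferred term -> preferred term (the USE references from the slide).
USE = {
    "Nuclear Power": "Nuclear Energy",
    "Periodicals": "Serials",
}

def used_for(use):
    """Derive the reciprocal UF entries: preferred term -> non-preferred terms."""
    uf = {}
    for non_preferred, preferred in use.items():
        uf.setdefault(preferred, []).append(non_preferred)
    return uf

def resolve(term, use):
    """Return the preferred term for a lookup; preferred terms map to themselves."""
    return use.get(term, term)

UF = used_for(USE)
print(UF)
print(resolve("Periodicals", USE))
```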
Association

One preferred term is related to another preferred term
Non-hierarchical
“See also” function
In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy
Association

Related Terms (RT) can be used to show these links within the thesaurus
– Bed RT Bedding
– Paint Brushes RT Painting
– Vandalism RT Hostility
– Programming RT Software
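
RT links are conventionally reciprocal: if Bed RT Bedding, then Bedding RT Bed. A minimal sketch under that assumption, recording each pair once and indexing it in both directions. The function name related_index is hypothetical.

```python
# Each RT pair from the slide, stored once.
RT_PAIRS = [
    ("Bed", "Bedding"),
    ("Paint Brushes", "Painting"),
    ("Vandalism", "Hostility"),
    ("Programming", "Software"),
]

def related_index(pairs):
    """Build a symmetric lookup so each term lists the terms related to it."""
    index = {}
    for a, b in pairs:
        index.setdefault(a, set()).add(b)
        index.setdefault(b, set()).add(a)
    return index

RT = related_index(RT_PAIRS)
print(RT["Bedding"])
print(RT["Programming"])
```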
Thesauri Guides

National Information Standards Organization. (1993). Guidelines for the construction, format, and management of monolingual thesauri. ANSI/NISO Z39.19-1993. Bethesda, MD: NISO Press. [SILS reference Z695.N36 1994 or http://www.niso.org/standards/resources/z3919.pdf]
Aitchison, Jean & Gilchrist, Alan. Thesaurus Construction: A Practical Guide. 3rd ed. London: Aslib, 1997.
Willpower Information Management Consultants. http://www.willpower.demon.co.uk/thesprin.htm
Thesauri Directory

Indexing Resources on the WWW
– http://www.slais.ubc.ca/resources/indexing/database1.htm
• explore the ASIST Thesaurus

Controlled vocabularies
– http://sky.fit.qut.edu.au/~middletm//cont_voc.html

Web Compendium
– http://www.darmstadt.gmd.de/~lutes/thesauri.html
Thesauri/Keywords
– Created according to standards: Z39.19 (ANSI)
– Single-term concepts/postcoordination
• “Wireless network” & “home computer”
• “Terrorism”, “Attacks” & “United States”
– Popular in the online environment
– Lend to recall
– Lend to multilingual environment

Subject Heading Lists
– Rules and guidelines
– “Thesaurification”
– More multi-word concepts/precoordination
• “Wireless home computer network”
• $y Terrorism attacks $z United States
– STRINGS
– Lend to precision
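
The difference between the two approaches shows up at search time. The sketch below is illustrative only: the document ids, index entries, and function names are all hypothetical. Postcoordination combines single terms when searching (set intersection); precoordination matches a heading that was combined at indexing time.

```python
# Hypothetical index: document id -> single-concept descriptors assigned to it.
POSTCOORDINATED = {
    "doc1": {"Wireless network", "Home computer"},
    "doc2": {"Wireless network"},
    "doc3": {"Home computer"},
}

# Hypothetical index: document id -> precoordinated headings assigned to it.
PRECOORDINATED = {
    "doc1": {"Wireless home computer network"},
    "doc2": {"Wireless network"},
    "doc3": {"Home computer"},
}

def search_post(index, *terms):
    """Return documents indexed with ALL of the given terms (combined at search time)."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)

def search_pre(index, heading):
    """Return documents carrying exactly this heading (combined at indexing time)."""
    return sorted(doc for doc, headings in index.items() if heading in headings)

print(search_post(POSTCOORDINATED, "Wireless network", "Home computer"))
print(search_post(POSTCOORDINATED, "Wireless network"))
print(search_pre(PRECOORDINATED, "Wireless home computer network"))
```

In this toy data a single descriptor retrieves more documents (recall), while the combined heading retrieves only the one document indexed under it (precision).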