Languages for aboutness Indexing languages: – Terminological tools • Thesauri (CV – controlled vocabulary) • Subject headings lists (CV) • Authority files for named entities.
Languages for aboutness
Indexing languages:
– Terminological tools
• Thesauri (CV – controlled vocabulary)
• Subject headings lists (CV)
• Authority files for named entities (people,
places, structures, organizations)
– Classification
– Keyword lists
– Natural language systems (broad
interpretation)
1
Subject Analysis
What is something about?
– What is the content of an object “about”?
Different methods (Wilson, 1968)
– Counting (objective method)
– Purposive method
– Method appealing to unity
– What stands out
Challenges
– Non-text
2
Aboutness: How to do it!
Read the document [Intellectual
reading]
– look for key features
– many indexers mark up the items
– rarely have time to read the whole document
Determine aboutness [Conceptual
analysis]
Translate aboutness into the vocabulary
or scheme you are using
In general:
– Subject headings: 1-3 headings
– Descriptors: 5-8 descriptors
– Classification: 1 notation (should it be only
one?)
3
Features of indexing languages:
Involve rules and require maintenance
Can be generated via automatic, human,
or auto-human processes
Different processes generally display
different strengths and weaknesses.
4
Features of indexing languages:
With the exception of a few general
domain tools, they are generally domain
specific.
– MeSH
– NASA Thesaurus
– Astronomy Thesaurus
– ERIC thesaurus
http://www.darmstadt.gmd.de/~lutes/thesoecd.html
Concepts (or concept representations)
are arranged in a discernable order
5
Language schema designs
Classified--grouping
– Hierarchies and facets
MeSH Browser
http://www.nlm.nih.gov/mesh/MBrowser.html
Art and Architecture (Getty AAT)
http://www.getty.edu/research/conducting_research/vocabularies/aat/
Alphabetical -- horizontal
– Verbal/Alphabetical (ordering/filing challenges)
6
Controlled Vocabulary
A list or a database of subject terms in
which each concept has a preferred
term or phrase that will be used to
represent it in the retrieval tool; the
terms not used have references
(syndetic structure), and often scope
notes.
7
Thesaurus (structured thesaurus)
Lexical semantic relationships
Composed of indexing
terms/descriptors
Descriptors = representations of
concepts
Concepts = Units of meaning
(Svenonius)
8
Thesaurus
Preferred terms
Non-preferred terms
Semantic relations between terms
How to apply terms (guidelines, rules)
Scope notes
Adding terms (How to produce terms
that are not listed explicitly in the
thesaurus)
9
Preferred Terms
Control form of the term
• Spelling, grammatical form
• Theatre / Theater
• MLA / Modern language association
Choose preferred term between
synonyms
• Brain cancer or Brain Neoplasms?
10
Common thesaural identifiers
SN Scope Note
– Instruction, e.g. don’t invert phrases
USE Use (another term in preference to this one)
UF Used For
BT Broader Term
NT Narrower Term
RT Related Term
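As a minimal sketch of how these identifiers might appear on a single thesaurus entry, the following Python models one term record and prints it in the conventional labelled layout. The `TermRecord` class and its field names are hypothetical, introduced only for illustration; the sample term is taken from the equivalence examples in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """Hypothetical record for one preferred term and its labelled links."""
    term: str
    scope_note: str = ""                           # SN
    used_for: list = field(default_factory=list)   # UF
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT

    def display(self) -> str:
        """Render the record with the common thesaural identifiers."""
        lines = [self.term]
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        labelled = (
            ("UF", self.used_for),
            ("BT", self.broader),
            ("NT", self.narrower),
            ("RT", self.related),
        )
        for label, values in labelled:
            lines.extend(f"  {label} {value}" for value in values)
        return "\n".join(lines)


record = TermRecord("Nuclear Energy", used_for=["Nuclear Power"])
print(record.display())
```

Empty fields are skipped, so a record only shows the identifiers it actually uses.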
11
Semantic Relationships
Hierarchy
Equivalence
Association
12
Hierarchies of Meaning
‘Beer Glass’
‘White wine glass’
‘Glass’
‘Wine Glass’
‘Red wine glass’
From: Controlled Vocabularies/ Paul Miller Interoperability Focus UKOLN
13
Hierarchy
Level of generality – both preferred
terms
BT (broader term)
– Robins BT Birds
NT (narrower term)
– Birds NT Robins
– Inheritance, very specific rules
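The BT/NT pairing above can be sketched in Python. This is an illustrative sketch, not a prescribed implementation: it assumes each term has at most one broader term, and the glass links are an assumed reading of the “Hierarchies of Meaning” slide. The names `BROADER`, `narrower_index`, and `broader_chain` are hypothetical.

```python
from collections import defaultdict

# Assumed BT links: each term points to its single broader term.
BROADER = {
    "Robins": "Birds",
    "Wine glass": "Glass",
    "Beer glass": "Glass",
    "Red wine glass": "Wine glass",
    "White wine glass": "Wine glass",
}


def narrower_index(broader):
    """Derive NT links as the inverse of the BT links."""
    narrower = defaultdict(list)
    for term, parent in broader.items():
        narrower[parent].append(term)
    return {parent: sorted(children) for parent, children in narrower.items()}


def broader_chain(term, broader):
    """Walk upward from a term, collecting every broader term in order."""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain


print(narrower_index(BROADER)["Birds"])
print(broader_chain("Red wine glass", BROADER))
```

Storing only the BT direction and deriving NT from it keeps the two directions from drifting out of step.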
14
Equivalence
When two or more terms represent the
same concept
One is the preferred term (descriptor),
where all the information is collected
The other is the non-preferred and
helps the user to find the appropriate
term
15
Equivalence
Non-preferred term USE Preferred term
– Nuclear Power USE Nuclear Energy
– Periodicals USE Serials
Preferred term UF (used for) Non-preferred
term
– Nuclear Energy UF Nuclear Power
– Serials UF Periodicals
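A minimal sketch of the USE/UF pairing, using the two examples above: the USE references are stored once, and the UF entries are derived from them. The names `USE`, `used_for`, and `preferred` are hypothetical, chosen only for illustration.

```python
# USE references: non-preferred term -> preferred term.
USE = {
    "Nuclear Power": "Nuclear Energy",
    "Periodicals": "Serials",
}


def used_for(use):
    """Derive UF entries (preferred -> non-preferred terms) from USE references."""
    uf = {}
    for non_preferred, preferred_term in use.items():
        uf.setdefault(preferred_term, []).append(non_preferred)
    return uf


def preferred(term, use):
    """Return the preferred term for a lookup, following a USE reference if any."""
    return use.get(term, term)


print(preferred("Periodicals", USE))
print(used_for(USE)["Nuclear Energy"])
```

A term with no USE reference is returned unchanged, on the assumption that it is already a preferred term.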
16
Association
One preferred term is related to
another preferred term
Non-hierarchical
“See also” function
In any large thesaurus, a significant number
of terms will mean similar things or cover
related areas, without necessarily being
synonyms or fitting into a defined hierarchy
17
Association
Related Terms (RT) can be used to
show these links within the thesaurus
– Bed RT Bedding
– Paint Brushes RT Painting
– Vandalism RT Hostility
– Programming RT Software
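The RT examples above can be sketched as a lookup that works in both directions. This assumes RT links are reciprocal (if A is related to B, then B is related to A), which is common practice but is not stated on the slide. The names `RT_PAIRS` and `related_index` are hypothetical.

```python
from collections import defaultdict

# RT pairs taken from the examples above.
RT_PAIRS = [
    ("Bed", "Bedding"),
    ("Paint Brushes", "Painting"),
    ("Vandalism", "Hostility"),
    ("Programming", "Software"),
]


def related_index(pairs):
    """Build a 'see also' index, recording each RT link in both directions."""
    related = defaultdict(set)
    for first, second in pairs:
        related[first].add(second)
        related[second].add(first)
    return {term: sorted(others) for term, others in related.items()}


index = related_index(RT_PAIRS)
print(index["Bedding"])
print(index["Programming"])
```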
18
Thesauri Guides
National Information Standards Organization.
(1993). Guidelines for the construction, format,
and management of monolingual thesauri.
ANSI/NISO Z39.19-1993. Bethesda, MD: NISO
Press. [SILS reference Z695.N36 1994 or
http://www.niso.org/standards/resources/z3919.pdf]
Aitchison, Jean & Gilchrist, Alan. Thesaurus
Construction: A Practical Guide. 3rd ed. London:
Aslib, 1997.
Willpower Information Management Consultants
http://www.willpower.demon.co.uk/thesprin.htm
19
Thesauri Directory
Indexing Resources on the WWW
– http://www.slais.ubc.ca/resources/indexing/database1.htm
– explore ASIST Thesaurus
Controlled vocabularies
– http://sky.fit.qut.edu.au/~middletm//cont_voc.html
Web Compendium
– http://www.darmstadt.gmd.de/~lutes/thesauri.html
20
Thesauri/Keywords
– Created according to standards: Z39.19 (ANSI)
– Single-term concepts/postcoordination
• “Wireless network” & “home computer”
• “Terrorism” “Attacks” & “United States”
– More popular in the online environment
– Lend to recall
– Lend to multilingual environment
Subject Heading Lists
– Rules and guidelines
– “Thesaurification”
– Multi-word concepts/precoordination
• “Wireless home computer network”
• $y Terrorism attacks $z United States
– STRINGS
– Lend to precision
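The difference between postcoordination and precoordination can be sketched with a toy search. The records, document ids, and function names below are hypothetical: in the postcoordinate case the searcher combines single-concept descriptors at query time, and in the precoordinate case the indexer has already combined the concepts into one heading string.

```python
# Hypothetical records indexed with single-concept descriptors (postcoordinate).
DESCRIPTORS = {
    "doc1": {"Wireless network", "Home computer"},
    "doc2": {"Wireless network", "Office computer"},
    "doc3": {"Home computer", "Printers"},
}

# The same hypothetical records indexed with precoordinated heading strings.
HEADINGS = {
    "doc1": {"Wireless home computer network"},
    "doc2": {"Wireless office computer network"},
    "doc3": {"Home computer printers"},
}


def postcoordinate_search(index, *terms):
    """Return documents carrying every requested descriptor (combined at search time)."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in index.items() if wanted <= assigned)


def precoordinate_search(index, heading):
    """Return documents carrying the exact precoordinated heading string."""
    return sorted(doc for doc, assigned in index.items() if heading in assigned)


print(postcoordinate_search(DESCRIPTORS, "Home computer"))
print(postcoordinate_search(DESCRIPTORS, "Wireless network", "Home computer"))
print(precoordinate_search(HEADINGS, "Wireless home computer network"))
```

In this toy data a single descriptor retrieves more documents, while the combined query and the heading string each narrow the result to one. That is one way to read the slide's pairing of postcoordination with recall and precoordination with precision.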
21