Languages for aboutness

Indexing languages:
– Terminological tools
  • Thesauri (CV – controlled vocabulary)
  • Subject headings lists (CV)
  • Authority files for named entities (people, places, structures, organizations)
– Classification
– Keyword lists
– Natural language systems (broad interpretation)

Subject Analysis
What is something about?
– What is the content of an object “about”?
Different methods (Wilson, 1968)
– Counting (objective method)
– Purposive method
– Method appealing to unity
– What stands out
Challenges
– Non-text

Aboutness: How to do it!
Read the document [Intellectual reading]
– look for key features
– many indexers mark up the items
– rarely have time to read the whole document
Determine aboutness [Conceptual analysis]
Translate aboutness into the vocabulary or scheme you are using
In general:
– Subject headings: 1-3 headings
– Descriptors: 5-8 descriptors
– Classification: 1 notation (should it only be one!?)

Features of indexing languages:
Involve rules and require maintenance
Can be generated via automatic, human, or auto-human processes
Different processes generally display different strengths and weaknesses

Features of indexing languages:
With the exception of a few general domain tools, they are generally domain specific.
– MeSH
– NASA Thesaurus
– Astronomy Thesaurus
– ERIC thesaurus
http://www.darmstadt.gmd.de/~lutes/thesoecd.html
Concepts (or concept representations) are arranged in a discernible order

Language schema designs
Classified -- grouping
– Hierarchies and facets
MeSH Browser http://www.nlm.nih.gov/mesh/MBrowser.html
Art and Architecture (Getty AAT) http://www.getty.edu/research/conducting_research/vocabularies/aat/
Alphabetical -- horizontal
– Verbal/Alphabetical (ordering/filing challenges)

Controlled Vocabulary
A list or a database of subject terms in which each concept has a preferred term or phrase that will be used to represent it in the retrieval tool; the terms not used have references (syndetic structure), and often scope notes.

Thesaurus (structured thesaurus)
Lexical semantic relationships
Composed of indexing terms/descriptors
Descriptors = representations of concepts
Concepts = Units of meaning (Svenonius)

Thesaurus
Preferred terms
Non-preferred terms
Semantic relations between terms
How to apply terms (guidelines, rules)
Scope notes
Adding terms (How to produce terms that are not listed explicitly in the thesaurus)

Preferred Terms
Control form of the term
• Spelling, grammatical form
• Theatre / Theater
• MLA / Modern Language Association
Choose preferred term between synonyms
• Brain cancer or Brain Neoplasms?

Common thesaural identifiers
SN   Scope Note – Instruction, e.g. don’t invert phrases
USE  Use (another term in preference to this one)
UF   Used For
BT   Broader Term
NT   Narrower Term
RT   Related Term

Semantic Relationships
Hierarchy
Equivalence
Association

Hierarchies of Meaning
‘Glass’
  ‘Beer Glass’
  ‘Wine Glass’
    ‘White wine glass’
    ‘Red wine glass’
From: Controlled Vocabularies / Paul Miller, Interoperability Focus, UKOLN

Hierarchy
Level of generality – both preferred terms
BT (broader term)
– Robins BT Birds
NT (narrower term)
– Birds NT Robins
– Inheritance, very specific rules

Equivalence
When two or more terms represent the same concept
One is the preferred term (descriptor), where all the information is collected
The other is the non-preferred term and helps the user to find the appropriate term

Equivalence
Non-preferred term USE Preferred term
– Nuclear Power USE Nuclear Energy
– Periodicals USE Serials
Preferred term UF (used for) Non-preferred term
– Nuclear Energy UF Nuclear Power
– Serials UF Periodicals

Association
One preferred term is related to another preferred term
Non-hierarchical
“See also” function
In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy

Association
Related Terms (RT) can be used to show these links within the thesaurus
– Bed RT Bedding
– Paint Brushes RT Painting
– Vandalism RT Hostility
– Programming RT Software

Thesauri Guides
National Information Standards Organization. (1993). Guidelines for the construction, format, and management of monolingual thesauri. ANSI/NISO Z39.19-1993. Bethesda, MD: NISO Press. [SILS reference Z695.N36 1994 or http://www.niso.org/standards/resources/z3919.pdf]
Aitchison, Jean & Gilchrist, Alan. Thesaurus Construction: A Practical Guide. 3rd ed. London: Aslib, 1997.
Willpower Information Management Consultants http://www.willpower.demon.co.uk/thesprin.htm

Thesauri Directory
Indexing Resources on the WWW
– http://www.slais.ubc.ca/resources/indexing/database1.htm
– explore ASIST Thesaurus
Controlled vocabularies
– http://sky.fit.qut.edu.au/~middletm//cont_voc.html
Web Compendium
– http://www.darmstadt.gmd.de/~lutes/thesauri.html

Thesauri/Keywords
according to standards Z39.19 (ANSI)
Single term concepts/postcoordination
“Wireless network” & “home computer”
“Terrorism” “Attacks” & “United States”
popular in the online environment
Lend to recall
Lend to multilingual environment

Subject Heading Lists
Rules and guidelines
“Thesaurification”
Created multi-word concepts/precoordination
“Wireless home computer network”
$y Terrorism attacks $z United States
More STRINGS
Lend to precision
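
Worked sketch (not from the original slides): the USE/UF, BT/NT and RT relationships, and the contrast between postcoordination and precoordination, can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions. The terms come from the slide examples, but the data layout, the document ids (doc1, doc2, doc3) and the helper names (invert, preferred, related, search_post, search_pre) are hypothetical and chosen only for illustration. They do not reflect any real thesaurus software.

```python
from collections import defaultdict

# Toy thesaurus built from the slide examples (layout is an assumption).
USE = {"Nuclear Power": "Nuclear Energy", "Periodicals": "Serials"}  # equivalence
BT = {"Robins": "Birds"}                                             # hierarchy
RT = {("Bed", "Bedding"), ("Programming", "Software")}               # association


def invert(mapping):
    """Derive the reciprocal relation (USE -> UF, BT -> NT)."""
    inverse = defaultdict(set)
    for source, target in mapping.items():
        inverse[target].add(source)
    return dict(inverse)


UF = invert(USE)  # preferred term -> non-preferred terms
NT = invert(BT)   # broader term -> narrower terms


def preferred(term):
    """Follow a USE reference; preferred terms map to themselves."""
    return USE.get(term, term)


def related(term):
    """RT is symmetric, so look at both ends of each pair."""
    return {b for a, b in RT if a == term} | {a for a, b in RT if b == term}


# Postcoordination: single-concept descriptors, combined at search time.
POST_INDEX = {
    "doc1": {"Wireless network", "Home computer"},
    "doc2": {"Wireless network"},
    "doc3": {"Nuclear Energy"},
}

# Precoordination: multi-word headings, combined by the indexer.
PRE_INDEX = {
    "doc1": {"Wireless home computer network"},
}


def search_post(*terms):
    """Boolean AND over descriptors, after resolving entry terms."""
    wanted = {preferred(t) for t in terms}
    return sorted(doc for doc, descs in POST_INDEX.items() if wanted <= descs)


def search_pre(heading):
    """Match a complete precoordinated heading string."""
    return sorted(doc for doc, heads in PRE_INDEX.items() if heading in heads)


print(search_post("Wireless network"))                   # ['doc1', 'doc2']
print(search_post("Wireless network", "Home computer"))  # ['doc1']
print(search_post("Nuclear Power"))                      # ['doc3']
print(search_pre("Wireless home computer network"))      # ['doc1']
```

In this toy setup the single descriptor “Wireless network” retrieves both documents (broader recall), while the precoordinated string matches only the document indexed under that exact heading (tighter precision), which mirrors the trade-off listed on the last slide.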