Creating Topic Maps


ONTOPEDIA
The Identity of Everything
Creating Topic Maps
+ Topic Maps and Knowledge Organization
Steve Pepper
[email protected]
Oslo University College, 2007-09-15
www.ontopedia.net
Course agenda
Week 37 – 09-08: Introduction to Topic Maps – Part 1
Week 38 – 09-15: Creating a topic map
Week 39 – 09-22: Introduction to Topic Maps – Part 2
Week 42 – 10-13: Ontology-driven editing
Week 43 – 10-20: The machinery of Topic Maps
Week 46 – 11-10: (Semantic Web)
Week 48 – 11-24: (Ontologies)
Terminology:
– Topic Maps: The technology and the standard
– topic maps: The artefacts (documents) we create
Today’s agenda
Quick recap: basic concepts and building blocks
Topic Maps and Knowledge Organization
– Metadata, taxonomies, thesauri, faceted classification
Interchange syntaxes
– XTM, LTM and CTM
Demo: Creating a topic map using LTM
– Pay close attention...
Recap: Core concepts
A pool of information or data, and a knowledge layer consisting of
• Topics
– a set of topics representing the key subjects of the domain in question
• Associations
– representing relationships between subjects
• Occurrences
– links to information that is somehow relevant to a given subject
= The TAO of Topic Maps
[Figure: a knowledge layer above an information layer; topics Puccini, Tosca, Madame Butterfly and Lucca, connected by “composed by” and “born in” associations]
Recap: Basic building blocks
Basic building blocks are
– Topics: e.g. “Puccini”, “Lucca”, “Tosca”
– Associations: e.g. “Puccini was born in Lucca”
– Occurrences: e.g. “http://www.opera.net/puccini/bio.html is a biography of Puccini”
Each of these constructs can be typed
– Topic types: “composer”, “city”, “opera”
– Association types: “born in”, “composed by”
– Occurrence types: “biography”, “street map”, “synopsis”
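As a rough preview, the building blocks above might be written in LTM (the notation introduced later in this lecture) along the following lines. The topic identifiers and display names are illustrative assumptions; only the subjects, the types, and the biography URL come from the slide.

```ltm
/* Topic types are themselves topics */
[composer = "Composer"]
[city = "City"]
[opera = "Opera"]

/* Topics, each typed with one of the topic types above */
[puccini : composer = "Puccini"]
[lucca : city = "Lucca"]
[tosca : opera = "Tosca"]

/* Associations: association-type(role-player, role-player) */
born-in(puccini, lucca)
composed-by(puccini, tosca)

/* Occurrence: {topic, occurrence-type, "URL"} */
{puccini, biography, "http://www.opera.net/puccini/bio.html"}
```

Note that the association and occurrence types (born-in, composed-by, biography) are referenced by identifier only, so they become topics without needing a declaration of their own.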
Topic Maps and Knowledge Organization
Keywords & controlled vocabularies
Taxonomies, thesauri & classifications
Indexes & glossaries
Ontologies
Bibliographic languages
Work language
– Author language
– Title language
– Edition language
– Subject language
   • Classification language
   • Index language
Document language
– Production language
– Carrier language
– Location language
Source: Svenonius, Elaine (2000): The Intellectual Foundation of Information Organization. Cambridge, MA: MIT Press (p.54)

Work languages
– “Work languages describe information entities, their intellectual (as opposed to physical) attributes, and relationships among them.” (p.87)
Document languages
– “A document is a particular space-time embodiment of information: a document language describes and provides access to this embodiment.” (p.107)
Subject languages
– “A subject language is used to depict what a document is about.” (p.127)
Two perspectives
Works have tended to be conflated with documents
– So in practice there have been two kinds of language
Document languages
– describe the work and its manifestations
– document-centric (or resource-centric), e.g.
   • document metadata (Dublin Core)
   • bibliographic records (MARC)
Subject languages
– describe the subject space in which the work exists
– subject-centric, e.g.
   • thesauri, taxonomies (ICD)
   • classification schemes (LCSH, DDC)
   • faceted classification (Colon)
Metadata
“Data about data”
– Information about documents
– e.g. author, title, publisher, date, format, keywords
Useful for managing the content
– Especially suitable for librarians
Somewhat useful for searching
– Especially for experts
Less useful for end-users
– the user starts out wanting to know more about a subject
– traditional metadata, however, focuses on the document
– if aboutness is provided at all, it gets squeezed into a single field
Example:
Title: Creating Topic Maps
Author: Steve Pepper
Date: 2007-09-13
Format: appl/ppt
Keywords: topic maps, syntax, knowledge organization
Keywords
Primitive form of subject-based classification
– The keywords are used to describe the subject
– Cheap and simple… Folksonomies and tagging.
But also problematic because authors
– misspell keywrods,
– use different keywords/terms/tags for the same thing, and
– use keywords that make no sense
Secondary problem
– No way for the user to find out what keywords have been used
A keyword is a topic name
Controlled vocabularies
Solution: create a list of legal keywords!
– Requires somewhere to keep the list, and a process for new terms
Benefits
– Solves problems of misspelling and duplicates (synonyms)
Disadvantages
– Introduces some overhead (a flat list is difficult to manage)
– Users can still search using the wrong terms
– Users (and authors) still have difficulty finding terms
A controlled vocabulary is a well-defined set of topics with one name per topic
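Read as a topic map, a controlled vocabulary could be sketched in LTM as a flat list of topics, each with exactly one name and no types or associations. The terms below are made up for illustration.

```ltm
/* A controlled vocabulary: one topic per legal keyword, one name each */
[opera = "Opera"]
[operetta = "Operetta"]
[oratorio = "Oratorio"]
[cantata = "Cantata"]
```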
Taxonomies
Organize the keywords into a tree
– Most general at the top, more specific further down
– Common structure used by Yahoo!, etc.
– The folder metaphor
   • file systems, email, favourites
Requires relationships between terms
– Relationships state that one term is more specific than another
– Advantage: terms somewhat easier to find
– Disadvantage: real world does not fit neatly into a hierarchy
A taxonomy is a set of topics related through a specific type of hierarchical association
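A minimal LTM sketch of this idea, assuming a single association type called broader-narrower with role types broader and narrower (these names and the terms are illustrative, not prescribed by any standard):

```ltm
/* Terms are topics */
[music = "Music"]
[vocal-music = "Vocal music"]
[opera = "Opera"]

/* One hierarchical association type links each term to its parent */
broader-narrower(music : broader, vocal-music : narrower)
broader-narrower(vocal-music : broader, opera : narrower)
```

The tree is implied by the associations; nothing in the notation itself prevents a term from having two parents, which is one way a topic map can relax the strict hierarchy.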
Thesauri
Like a taxonomy, but with some extensions
– Also better defined: there are ISO standards for thesauri
Relationship types:
– BT Broader term / NT Narrower term
– USE Preferred term / UF Non-preferred terms
– RT Related term
– SN Scope note
A thesaurus is a set of topics related through particular, predefined association types
– BT/NT (hierarchical) and RT (untyped, associative)
– (Scope notes are a kind of occurrence)
– (USE and UF represent multiple names for the same concept/topic)
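The mapping above might look roughly like this in LTM. The identifiers (broader-narrower, related-to, scope-note, non-preferred), the terms, and the wording of the scope note are all assumptions made for the sake of the example.

```ltm
/* USE/UF: one topic, a preferred name plus a scoped non-preferred name */
[opera = "Opera"
       = "Lyric drama" / non-preferred]
[vocal-music = "Vocal music"]
[libretto = "Libretto"]

/* BT/NT: hierarchical association */
broader-narrower(vocal-music : broader, opera : narrower)

/* RT: associative relationship */
related-to(opera : related, libretto : related)

/* SN: the scope note as an internal occurrence */
{opera, scope-note, [[Use for staged dramatic works set to music.]]}
```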
Faceted classification
Invented by S. R. Ranganathan in the 1930s
– Defines a number of facets or dimensions
– Defines a set of terms within each facet
– Sometimes these terms are arranged in a taxonomy
– Documents are classified against each facet separately
A faceted classification is a collection of topic “hierarchies”
– Each “hierarchy” contains topics whose names are used as terms within a particular facet
– XFML: An XML interchange syntax for faceted classification inspired by Topic Maps
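One possible LTM rendering of two facets and a document classified against each of them is sketched below. The facets, the terms, the document, and the association types broader-narrower and classified-as are hypothetical.

```ltm
/* Facet 1: genre */
[genre = "Genre"]
[opera = "Opera"]
broader-narrower(genre : broader, opera : narrower)

/* Facet 2: period */
[period = "Period"]
[c19 = "19th century"]
broader-narrower(period : broader, c19 : narrower)

/* A (made-up) document classified against each facet separately */
[doc1 = "An Introduction to Italian Opera"]
classified-as(doc1 : document, opera : term)
classified-as(doc1 : document, c19 : term)
```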
Expressivity progression
open model
– Topic maps: use any types, properties, and relationships you like
fixed model
– Faceted classification: multiple vocabularies, taxonomies or thesauri (one per facet)
– Thesauri: more formal taxonomy; still no topic types; two association types
– Taxonomy: terms arranged in a hierarchy; no topic types; single association type
no model
– Controlled vocabulary, folksonomies: just a list of terms; no topic types; no associations
Document-centric approaches
Traditional metadata is document-centric
– Provides substantial descriptive power for documents
– Allows connection into subject-based classification
– Crucial for the management of content
– However, users are most interested in the subjects
Taxonomies, thesauri, and faceted classification are also document-centric
– These are methods for subject-based classification
– They provide hardly any descriptive power for subjects
Subject-centric approaches
Topic maps are subject-centric
– They provide great descriptive power for subjects
– Good as finding aids, because subjects are what users care about
Documents can be treated as subjects
– This enables topic maps to capture metadata as well
– It also enables topic maps to stitch metadata and subject-based classification together into one seamless whole
Topic Maps is the knowledge model par excellence:
– A subject-centric knowledge model that encompasses every other kind of knowledge organization model
– Topic Maps can therefore be used to relate and combine taxonomies, indexes, thesauri, classifications, etc. etc.
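To illustrate treating a document as a subject, the metadata record shown earlier for this presentation could be sketched in LTM as follows. The identifiers, type names, and association types (authored-by, is-about) are assumptions; the values come from the metadata slide.

```ltm
/* The document is a topic like any other */
[creating-topic-maps : presentation = "Creating Topic Maps"]
[steve-pepper : person = "Steve Pepper"]

/* Simple metadata fields become occurrences */
{creating-topic-maps, date, [[2007-09-13]]}
{creating-topic-maps, format, [[appl/ppt]]}

/* The author becomes a topic, linked by an association */
authored-by(creating-topic-maps : work, steve-pepper : author)

/* Aboutness: keywords become topics in their own right */
[topic-maps = "Topic Maps"]
[knowledge-organization = "Knowledge organization"]
is-about(creating-topic-maps : document, topic-maps : subject)
is-about(creating-topic-maps : document, knowledge-organization : subject)
```

Because the keywords are now topics rather than strings in a single field, they can carry their own names, occurrences, and associations, which is where metadata and subject-based classification meet.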
Syntaxes
XTM, LTM and CTM
What are they?
When should I use which?
Topic Maps Syntaxes
HyTM (HyTime Topic Maps)
– Original syntax, expressed in terms of SGML and HyTime
– No longer part of ISO 13250
XTM (XML Topic Maps Syntax)
– Later, XML-based syntax, recently moved to version 2.0
– Easy to understand but very verbose
LTM (Linear Topic Map Notation)
– Defined by Ontopia in 2001 and supported by other products
– A simple ASCII syntax for rapid prototyping
CTM (Compact Topic Maps Syntax)
– ISO standard replacement for LTM
– Complete draft exists, but no implementations yet
Topic Map – XTM 1.0 Syntax
<!ELEMENT topicMap
   ( topic | association | mergeMap )* >
<!ATTLIST topicMap
   id          ID     #IMPLIED
   xmlns       CDATA  #FIXED 'http://www.topicmaps.org/xtm/1.0/'
   xmlns:xlink CDATA  #FIXED 'http://www.w3.org/1999/xlink'
   xml:base    CDATA  #IMPLIED >

<?xml version="1.0" encoding="ISO-8859-1"?>
<topicMap
   xmlns="http://www.topicmaps.org/xtm/1.0/"
   xmlns:xlink="http://www.w3.org/1999/xlink"
>
   <!-- topics, associations, and mergeMap elements go here -->
</topicMap>
Topic Map – LTM Syntax
/* topics, associations, and occurrences go here */
Topic – XTM 1.0 Syntax
<!ELEMENT topic
( instanceOf*, subjectIdentity?, ( baseName | occurrence )* )
>
<!ATTLIST topic
id ID #REQUIRED
>
<topic id="italy">
...
</topic>
<topic id="puccini">
...
</topic>
Topic – LTM Syntax
[topic-id]
[italy]
[puccini]
Topic Name – XTM 1.0 Syntax (1 of 2)
<!ELEMENT baseName ( scope?, baseNameString, variant* ) >
<!ATTLIST baseName
   id ID #IMPLIED >
<!ELEMENT baseNameString ( #PCDATA ) >
<!ATTLIST baseNameString
   id ID #IMPLIED >
<!ELEMENT variant ( parameters, variantName?, variant* ) >
<!ATTLIST variant
   id ID #IMPLIED >
<!ELEMENT variantName ( resourceRef | resourceData ) >
<!ATTLIST variantName
   id ID #IMPLIED
>
Topic Name – XTM 1.0 Syntax (2 of 2)
<topic id="la-boheme">
<baseName>
<baseNameString>La Bohème</baseNameString>
<variant>
<parameters>
<subjectIndicatorRef
xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#sort"/>
</parameters>
<variantName>
<resourceData>Bohème, La</resourceData>
</variantName>
</variant>
</baseName>
</topic>
Topic Name – LTM Syntax
[topic-id = basename; sortname?; dispname?]
[la-boheme = "La Bohème"; "Bohème, La"]
Topic Type – XTM 1.0 Syntax
Use <instanceOf> subelement
<topic id="opera">
...
</topic>
<topic id="tosca">
<instanceOf>
<topicRef xlink:href="#opera"/>
</instanceOf>
</topic>
<topic id="boito">
<instanceOf>
<topicRef xlink:href="#composer"/>
</instanceOf>
<instanceOf>
<topicRef xlink:href="#librettist"/>
</instanceOf>
</topic>
Topic Type – LTM Syntax
[topic-id : topic-type]
[tosca : opera]
[boito : composer librettist]
Occurrence – XTM 1.0 Syntax
Use <occurrence> subelement:
external/internal resources: <resourceRef> or <resourceData>
<!ELEMENT occurrence
( instanceOf?, scope?, ( resourceRef | resourceData ) )
>
<!ATTLIST occurrence
id ID #IMPLIED
>
<topic id="la-boheme">
<occurrence>
<instanceOf><topicRef xlink:href="#homepage"/></instanceOf>
<resourceRef
xlink:href="http://www.opera.it/Opere/La-Boheme/La-Boheme.html"/>
</occurrence>
<occurrence>
<instanceOf><topicRef xlink:href="#premiere-date"/></instanceOf>
<resourceData>1896 (1 Feb)</resourceData>
</occurrence>
</topic>
Occurrence – LTM Syntax
{topic-id, occurrence-type, [URL | data]}
{la-boheme, homepage,
"http://www.opera.it/Opere/La-Boheme/La-Boheme.html"}
{la-boheme, premiere-date, [[1896 (1 Feb)]]}
Topic – Complete XTM 1.0 Syntax
<topic id="la-boheme">
<instanceOf><topicRef xlink:href="#opera"/></instanceOf>
<baseName>
<baseNameString>La Bohème</baseNameString>
<variant>
<parameters>
<subjectIndicatorRef
xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#sort"/>
</parameters>
<variantName><resourceData>Boheme, La</resourceData></variantName>
</variant>
</baseName>
<occurrence>
<instanceOf><topicRef xlink:href="#homepage"/></instanceOf>
<resourceRef
xlink:href="http://www.opera.it/Opere/La-Boheme/La-Boheme.html"/>
</occurrence>
<occurrence>
<instanceOf><topicRef xlink:href="#premiere-date"/></instanceOf>
<resourceData>1896 (1 Feb)</resourceData>
</occurrence>
</topic>
Topic – Complete LTM Syntax
[la-boheme : opera = "La Bohème"; "Boheme, La"]
{la-boheme, homepage,
"http://www.opera.it/Opere/La-Boheme/La-Boheme.html"}
{la-boheme, premiere-date, [[1896 (1 Feb)]]}
Association – XTM 1.0 Syntax
<!ELEMENT association (instanceOf?, scope?, member+)>
<!ATTLIST association
   id ID #IMPLIED>
<!ELEMENT member (roleSpec?, (topicRef | ...)+) >
<!ATTLIST member
   id ID #IMPLIED>
<!ELEMENT roleSpec (topicRef | ...) >
<association>
<instanceOf><topicRef xlink:href="#composed-by"/></instanceOf>
<member>
<roleSpec><topicRef xlink:href="#composer"/></roleSpec>
<topicRef xlink:href="#puccini"/>
</member>
<member>
<roleSpec><topicRef xlink:href="#work"/></roleSpec>
<topicRef xlink:href="#tosca"/>
</member>
</association>
Association – LTM Syntax
assoc-type ( role-player, role-player, ... )
composed-by( puccini , tosca )
Note 1: There can be more than two role-players in an association. We’ll talk about that next week.
Note 2: The above is a simplification, because we have not yet talked about role types. We’ll do that next week.
The exact syntax should be as follows:
assoc-type ( role-player : role-type,
role-player : role-type, ... )
composed-by( puccini : composer, tosca : work )
When omitted, the role type will be assumed to be identical to the type of the role-playing topic.
This can be a useful short-hand and we will use it for now, but it is not always what you want...
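As a small illustration of Note 1, an association with three role players and explicit role types might be written as follows. The association type premiere, the role types, and the topic identifiers are hypothetical names chosen for this sketch.

```ltm
/* Three role players, each with an explicit role type */
premiere(tosca : work, rome : place, teatro-costanzi : venue)
```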
Subject Identity – XTM 1.0 Syntax
<!ELEMENT topic (instanceOf*, subjectIdentity?,...)>
<!ELEMENT subjectIdentity (resourceRef?, (topicRef | subjectIndicatorRef)*) >
<!-- Refer to a resource as subject: -->
<topic id="foo">
<subjectIdentity>
<resourceRef xlink:href="http://www.ontopia.net"/>
</subjectIdentity>
<baseName>
<baseNameString>The Ontopia Website</baseNameString>
</baseName>
</topic>
<!-- Refer to a subject indicator: -->
<topic id="bar">
<subjectIdentity>
<subjectIndicatorRef xlink:href="http://www.ontopia.net/about.html"/>
</subjectIdentity>
<baseName>
<baseNameString>Ontopia</baseNameString>
</baseName>
</topic>
Subject Identity – LTM Syntax
[topic-id = names %subject-address-URL]
[topic-id = names @subject-indicator-URL]
/* Refer to a resource as subject: */
[foo = "The Ontopia Website" %"http://www.ontopia.net" ]
/* Refer to a subject indicator: */
[bar = "Ontopia" @"http://www.ontopia.net/about.html"]
Scope – XTM 1.0 Syntax
<!-- "scope" subelements on baseName, occurrence, and association
(also "parameters" on variantName) -->
<topic id="composed-by">
<baseName>
<baseNameString>composed by</baseNameString>
</baseName>
<baseName>
<scope><topicRef xlink:href="#composer"/></scope>
<baseNameString>composer of</baseNameString>
</baseName>
</topic>
<topic id="la-boheme2">
<baseName>
<baseNameString>La Bohème (Leoncavallo)</baseNameString>
</baseName>
<baseName>
<scope><topicRef xlink:href="#leoncavallo"/></scope>
<baseNameString>La Bohème</baseNameString>
</baseName>
</topic>
Scope – LTM syntax
(name or occurrence or association)
/ scoping-topic(s)
[composed-by = "composed by"
= "composer of" / composer ]
[la-boheme1 = "La Bohème (Puccini)"
= "La Bohème" / puccini ]
[la-boheme2 = "La Bohème (Leoncavallo)"
= "La Bohème" / leoncavallo ]
Demo: Creating a topic map
Home assignment
1. Prerequisites
– You have installed Java and the OKS Samplers
– You know the basics of LTM
   • http://www.ontopia.net/download/ltm.html
2. Create your first topic map
– Decide what domain you want to cover
– Write LTM in a text editor (Notepad, TextPad, emacs, ...)
– Keep it in its own directory
– Copy to .../apache-tomcat/webapps/omnigator/WEB-INF/topicmaps for testing in the Omnigator
– Use Reload function
Your own topic map
Choose something that really interests you
– It’s much more fun than something boring!
Some ideas:
– Sport (football, cricket, ...)
– Culture (music, film, literature, theatre, ...)
– Study courses
– Project management
– Conference website
– Languages
– Geography
This first topic map is your own personal one
– The next one will be a group project for term assessment
Requirements:
– Minimum 4 topic types, 4 association types, 4 occurrence types
– Minimum 10 topics, 20 associations, 10 occurrences
– Send to [email protected] by Monday 29 September
Next lecture
Monday September 22
Same time, same place
Agenda
– Advanced features (roles, scope, identity, reification)
– Help with home assignment