Towards Terminology Services: Reflections from the FACET Project Doug Tudhope Hypermedia Research Unit University of Glamorgan OCLC seminar, April, 2006

Presentation
• FACET Project
– Faceted Knowledge Organisation Systems (KOS)
– Semantic query expansion
– Web Demonstrator
– Evaluation
– Need for standard representations and API
• Current work
– Terminology Services
– Pilot KOS web service browser
– Semantic expansion service?
• Role for KOS in the Semantic Web?
– Need to articulate context/rationale for KOS
– What kind of Semantic Web?
FACET - Faceted Access to Cultural
hEritage Terminology
FACET - a collaborative project investigating the potential of
semantic expansion in retrieval
Aims:
• Integration of thesaurus into search process / interface
• Semantic query expansion
taking advantage of facet structure
http://www.comp.glam.ac.uk/~FACET/
FACET Collaborators
• Research Council Funding: EPSRC 3 years
• National Museum of Science and Industry (NMSI):
National Railway Museum and Science Museum Collections Database
• J. Paul Getty Trust
Art and Architecture Thesaurus (AAT)
• Museum Documentation Association (MDA)
Railway Thesaurus
• Canadian Heritage Information Network (CHIN)
Advisors
Semantic Expansion
Expanding over relationships in thesauri and related KOS
allows the system to play an active role
• Ranking of matching results by semantic closeness
• Query Expansion (automatic/interactive)
• Augmented Browsing tools
Underpinning technologies:
• Measures of distance over the semantic index space
• Multi-concept Matching Function
• Immediate application to controlled vocabulary indexing,
but also relevant to free text query expansion
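A minimal sketch of expansion over thesaurus relationships with a distance measure. The thesaurus fragment, the traversal cost per relationship type (BT, NT, RT) and the threshold are all invented for illustration; they are not the values or the algorithm used in FACET.

```python
import heapq

# Toy thesaurus fragment: term -> list of (relationship, related term).
# Terms and links are illustrative only, not taken from the AAT.
THESAURUS = {
    "armchair": [("BT", "chair")],
    "chair": [("NT", "armchair"), ("NT", "rocking chair"), ("BT", "seating")],
    "rocking chair": [("BT", "chair")],
    "seating": [("NT", "chair"), ("NT", "bench")],
    "bench": [("BT", "seating")],
}

# Hypothetical cost of traversing each relationship type.
COST = {"BT": 3, "NT": 2, "RT": 4}


def expand(term, max_distance):
    """Return {term: semantic distance} for all terms within max_distance."""
    best = {term: 0}
    queue = [(0, term)]
    while queue:
        dist, current = heapq.heappop(queue)
        if dist > best.get(current, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for relation, neighbour in THESAURUS.get(current, []):
            new_dist = dist + COST[relation]
            if new_dist <= max_distance and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return best


def closeness(distance, max_distance):
    """Map a distance to a 0..1 closeness score (1 = the starting term itself)."""
    return 1 - distance / max_distance
```

Terms reached by cheaper paths score as semantically closer, which is one way results could be ranked or candidate expansion terms offered to the user.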
Faceted Knowledge Organisation Systems
Faceted systems based on primary division
into fundamental, high-level categories (facets)
Compound descriptors (multi-concept headings) are synthesised
by combination of terms from limited number of fundamental facets
In constructing AAT, adjectival noun phrases very common:
e.g. painted oak furniture
“Rather than enumerate the nearly infinite number of object and
subject descriptions needed by thesaurus users, the AAT decided to
pursue the building blocks of these descriptors in the form of a faceted
vocabulary”
(Guide to Indexing and Cataloging with the Art & Architecture Thesaurus)
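The building-block idea can be sketched as follows. The facet names and the assignment of terms to facets are simplified assumptions for illustration, not the actual AAT facet structure.

```python
from itertools import product

# Hypothetical facets with a few terms each (not the real AAT facets).
FACETS = {
    "technique": ["painted", "carved"],
    "material": ["oak", "mahogany"],
    "object": ["furniture", "chairs"],
}


def synthesise(**chosen):
    """Build a compound descriptor from at most one term per facet, in facet order."""
    for facet, term in chosen.items():
        if term not in FACETS[facet]:
            raise ValueError(f"{term!r} is not a term in facet {facet!r}")
    return " ".join(chosen[facet] for facet in FACETS if facet in chosen)


def enumerate_all():
    """Every full compound an enumerative vocabulary would have to list."""
    return [" ".join(combo) for combo in product(*FACETS.values())]
```

With these toy facets, six building-block terms are enough to synthesise `synthesise(technique="painted", material="oak", object="furniture")`, while an enumerated list of the full compounds already needs eight entries and grows multiplicatively as terms are added.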
Compound Descriptors and Queries
e.g. painted oak furniture
• Multi-concept subject headings allow highly specific
descriptions and offer promise of precise queries
• However practical focus has tended to be on
cataloguing rather than searching
• Poses problems for recall in retrieval and for browsing.
Full potential yet to be exploited in retrieval
Matching Problem
“The major problem lies in developing a system whereby individual parts of
subject headings containing multiple AAT terms are broken apart, individually
exploded hierarchically, and then reintegrated to answer a query with
relevance”
(Toni Petersen, AAT Director)
e.g.
Query: mahogany, dark yellow, brocading, Edwardian, armchair
Descriptor: oak, light yellow, crests, ovals, brocade, Victorian, Carver chair
Potentially extra / missing / partially and non-matching terms
Focus term: must match after expansion
Query expansion (on query as a whole)
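One way to picture a multi-concept matching function is sketched below. The closeness scores are invented, the averaging rule is an assumption, and treating armchair as the focus term is a choice made for this illustration; none of this reproduces the published FACET matching function.

```python
# Hypothetical closeness scores (0..1) between query terms and index terms,
# as might result from expansion over a thesaurus. Values are invented.
CLOSENESS = {
    ("mahogany", "oak"): 0.6,
    ("dark yellow", "light yellow"): 0.8,
    ("brocading", "brocade"): 0.7,
    ("Edwardian", "Victorian"): 0.5,
    ("armchair", "Carver chair"): 0.9,
}


def term_match(query_term, descriptor):
    """Best closeness of one query term against any term in the descriptor."""
    scores = [
        1.0 if query_term == index_term else CLOSENESS.get((query_term, index_term), 0.0)
        for index_term in descriptor
    ]
    return max(scores, default=0.0)


def match(query, descriptor, focus):
    """Degree of match in 0..1; 0 if the focus term does not match at all."""
    if term_match(focus, descriptor) == 0.0:
        return 0.0
    return sum(term_match(term, descriptor) for term in query) / len(query)


QUERY = ["mahogany", "dark yellow", "brocading", "Edwardian", "armchair"]
DESCRIPTOR = ["oak", "light yellow", "crests", "ovals", "brocade", "Victorian", "Carver chair"]
score = match(QUERY, DESCRIPTOR, focus="armchair")
```

Extra descriptor terms (crests, ovals) do not penalise the score in this sketch, and a query term with no counterpart contributes zero, so partial matches still rank above non-matches.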
FACET Queries with Results
Evaluation and user study with standalone version
• Exploratory
to assess how people search for information and how
thesauri can inform this process.
• Formative
to support further development of the research
prototype.
• Dorothee Blocks PhD Thesis 2004. A qualitative
study of thesaurus integration for end-user searching.
About the Evaluation (from Blocks 2004)
• Qualitative evaluation methodology employed
• Participants were professionals in collaborating
institutions
• 20 sessions totalling 22 hours were conducted
• Each participant was given 3 tasks to complete, e.g.
“Please search the collection
for something similar to the
item in the photograph. Please
try to be specific.”
Some issues from the evaluation
• Initial allocation of functionality
to interface elements
did not support the stages
of the search process
• Breaking down tasks into components from different facets
• Reformulating queries
• Expansion control on individual terms
• Model of controlled vocabulary search process
The complete model diagram (Blocks 2004)
• Setting up the query
• Evaluating the success of the query
• Executing the query
• Retrieval of results
• Inspecting individual results – using information can lead to query reformulation
• Matching user terms to KOS
• Selecting suitable KOS terms
• Including terms in the query
System Architecture
Components shown in the architecture diagram:
• Compiled VB client interface and web browser interface
• Persistent XML data: queries, parameters etc.
• Query and matching functions
• Expansion engine (and data structure)
• Application interfaces
• Application data objects
• Database interaction module
• Active-X Data Objects (ADO)
• Data access components
• Transact SQL stored procedures
• SQL Server databases: collections & thesaurus
• Database
Interactive/Automatic Thesaurus Query Expansion
• Statistical IR
Uncontrolled vocabulary, auto-indexing
IQE/AQE – terms added to query
Exact match with probabilistic weighting
Tampere experiments with thesaurus AQE and strongly-structured
queries support faceted approach
Greenberg recent experiments on QE by thesaurus relationships
• FACET
Controlled vocabulary, intellectual indexing
Hybrid I/A QE – user selects terms to expand then AQE
Semantic degree-of-match with faceted queries
FACET Web Demonstrator
• Illustrates thesaurus based expansion and faceted search
• Intended as an exploration of FACET research outcomes
via dynamically generated Web components
rather than a complete final interface
• Based on custom API for thesaurus programmatic access
• Browser-based interface (ASP application), using a combination
of server-side scripting and compiled components
http://www.comp.glam.ac.uk/~FACET/webdemo/
http://jodi.tamu.edu/Articles/v04/i04/Binding/
FACET Web Demonstrator
Semantic Query Expansion
Some lessons learned
• Results show potential of faceted KOS for
– Query expansion with semantically ranked results
– Real-time implementation of multi-concept matching function
– Semantic expansion as a browsing tool
– Potential to combine with statistical and linguistic techniques
How to generalise?
• Need for common KOS representations and APIs
Towards Terminology Services
• KOS-based services as elements of applications with some form
of search/indexing component
• Next phase of work looks at common KOS representation
formats and API protocols - making content available via
programmatic interfaces
• Eg SKOS Core (RDF/XML) Schema and SKOS API deliverables
of SWAD-Europe Thesaurus Activity - http://www.w3.org/2001/sw/Europe/reports/thes
• Experiments with XPATH-based KOS interfaces (using XML and
SKOS schemas) promising for relatively small KOS held within
the web browser, e.g. interactive possibilities, such as rollover.
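The experiments above ran in the web browser. As a rough analogue only, the sketch below queries a small SKOS RDF/XML fragment with the limited XPath subset supported by Python's standard library. The fragment and its example.org URIs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented SKOS fragment; URIs are placeholders.
SKOS_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/kos/chairs">
    <skos:prefLabel>chairs</skos:prefLabel>
    <skos:narrower rdf:resource="http://example.org/kos/armchairs"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/kos/armchairs">
    <skos:prefLabel>armchairs</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/kos/chairs"/>
  </skos:Concept>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF = "{" + NS["rdf"] + "}"  # ElementTree spells namespaced attributes this way


def narrower_labels(root, label):
    """Preferred labels of the concepts narrower than the concept with `label`."""
    concept = root.find(f"skos:Concept[skos:prefLabel='{label}']", NS)
    if concept is None:
        return []
    labels = []
    for link in concept.findall("skos:narrower", NS):
        uri = link.get(RDF + "resource")
        for target in root.findall("skos:Concept", NS):
            if target.get(RDF + "about") == uri:
                labels.append(target.findtext("skos:prefLabel", namespaces=NS))
    return labels


root = ET.fromstring(SKOS_XML)
```

Because the whole KOS is held in memory, lookups like `narrower_labels(root, "chairs")` need no server round trip, which is what makes interactive features such as rollover feasible for relatively small vocabularies.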
SKOS API
• SKOS Core (RDF/XML) Schema and SKOS API deliverables of
SWAD-Europe Thesaurus Activity - http://www.w3.org/2001/sw/Europe/reports/thes
• SKOS API designed to provide programmatic access to thesauri
and related KOS in SKOS Core – builds on previous NKOS
work on KOS protocols
• Example SKOS API calls
– getConcept (uri)
– getConceptsMatchingKeyword/Regex (string)
– getAllConceptRelatives (concept)
– getSupportedSemanticRelations
– getAllConceptRelatives (concept, relation)
– getAllConceptsByPath (concept, relation, distance)
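To make the shape of these calls concrete, here is a toy in-memory stand-in. Only the call names come from the list above; the `Concept` fields, parameter types, return values and traversal behaviour are assumptions, and this is not the actual SKOS API binding.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: str
    # relation name -> URIs of related concepts, e.g. {"broader": [...]}
    relations: dict = field(default_factory=dict)


class InMemoryThesaurus:
    """Toy stand-in for a thesaurus service (hypothetical, for illustration)."""

    def __init__(self, concepts):
        self._by_uri = {c.uri: c for c in concepts}

    def get_concept(self, uri):  # cf. getConcept (uri)
        return self._by_uri[uri]

    def get_concepts_matching_keyword(self, text):  # cf. getConceptsMatchingKeyword
        return [c for c in self._by_uri.values() if text.lower() in c.pref_label.lower()]

    def get_supported_semantic_relations(self):  # cf. getSupportedSemanticRelations
        return sorted({r for c in self._by_uri.values() for r in c.relations})

    def get_all_concept_relatives(self, concept, relation=None):
        # cf. getAllConceptRelatives (concept) and (concept, relation)
        uris = [u for r, targets in concept.relations.items()
                if relation in (None, r) for u in targets]
        return [self._by_uri[u] for u in uris]

    def get_all_concepts_by_path(self, concept, relation, distance):
        # cf. getAllConceptsByPath: follow one relation for up to `distance` steps
        found, frontier = {}, [concept]
        for _ in range(distance):
            frontier = [n for c in frontier
                        for n in self.get_all_concept_relatives(c, relation)
                        if n.uri not in found and n.uri != concept.uri]
            found.update({n.uri: n for n in frontier})
        return list(found.values())


furniture = Concept("ex:a", "Furniture", {"narrower": ["ex:b"]})
chairs = Concept("ex:b", "Chairs", {"broader": ["ex:a"], "narrower": ["ex:c"]})
armchairs = Concept("ex:c", "Armchairs", {"broader": ["ex:b"]})
kos = InMemoryThesaurus([furniture, chairs, armchairs])
```

A call such as `kos.get_all_concepts_by_path(furniture, "narrower", 2)` returns both Chairs and Armchairs, illustrating how a path-based call could support expansion in a single request.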
Pilot KOS Browser Client Web Service
• Developed pilot to work with a remote server as an initial
experiment with the SKOS API, a 'rich client' browser displaying
details for thesaurus concepts via web service calls
• Uses GEMET - GEneral Multilingual Environmental Thesaurus
• DREFT demonstration web services server based on SKOS API
developed at ILRT, Bristol University http://www.w3.org/2001/sw/Europe/reports/thes/dreft/
• Only a subset of SKOS API calls were available at time of work
due to local requirements
So we investigated possibilities with just 2 API calls
Pilot SKOS API Web Service Browser
getConcept
getAllConceptRelatives
show semantically connected
concepts but not relationships
Navigation history and
local cache of retrieved concepts
implemented
API needs more work
but is a possible basis for web services
Caching
• Thesaurus data relatively static: change unlikely during a session
• Retrieved concepts are cached to avoid repeated server calls
[Diagrams: web service server, local concept cache (URI to concept) and navigation history (Previous / Current / Next), shown for three cases: navigate to a new concept not previously retrieved; navigate to next concept in the history; navigate to previous concept in the history]
• Caching of concepts helps prevent unnecessary repeated
server calls.
• Implementation of concept caching made a significant difference
to apparent speed of operation
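A minimal sketch of a concept cache combined with a navigation history. The class and method names are hypothetical, `fetch` stands in for the web service call, and discarding the forward history on a new navigation is an assumption borrowed from web browsers rather than a documented behaviour of the pilot.

```python
class CachingBrowser:
    """Concept browser with a local cache and previous/next history."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable: uri -> concept (e.g. a web service call)
        self._cache = {}       # uri -> concept already retrieved
        self._history = []     # visited URIs, oldest first
        self._position = -1    # index of the current URI in the history

    def _load(self, uri):
        if uri not in self._cache:  # only call the server on a cache miss
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]

    def navigate(self, uri):
        """Go to a concept, discarding any 'next' entries beyond the current one."""
        concept = self._load(uri)
        del self._history[self._position + 1:]
        self._history.append(uri)
        self._position += 1
        return concept

    def previous(self):
        """Step back in the history (stays put at the oldest entry)."""
        if not self._history:
            return None
        if self._position > 0:
            self._position -= 1
        return self._load(self._history[self._position])

    def next(self):
        """Step forward in the history (stays put at the newest entry)."""
        if not self._history:
            return None
        if self._position < len(self._history) - 1:
            self._position += 1
        return self._load(self._history[self._position])
```

Moving back and forth through the history, or revisiting a concept by link, is served from the cache, so the server is contacted only once per distinct URI during a session.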
Future issues
More complex services as API protocol elements:
• more advanced natural language functionality
• cross-mapping provision
• data-dependent filters (such as number of postings)
• semantic expansion as a service
– different configurations of KOS interface displays by single call
– novel interfaces, such as navigation via semantic expansion
– query expansion for various ranked result query services
– term suggestion to assist indexing/annotation
More details:
KOS at your Service: Programmatic Access to Knowledge Organisation
Systems http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Binding/
Taxonomy of Knowledge Organisation Systems
Gail Hodge
• Term Lists
– Authority Files, Glossaries, Gazetteers, Dictionaries
• Classification and Categorization
– Subject Headings
– Classification Schemes and Taxonomies (e.g. DDC, scientific taxonomies)
• Relationship Schemes
– Thesauri
– Semantic Networks (e.g. WordNet)
– (Ontologies)
http://www.clir.org/pubs/abstract/pub91abst.html
Bridge/migration between KOS and Ontologies?
• KOS as elements of higher level ontologies and schemas
– can help leverage them.
• Eg map a thesaurus to a top Ontology
• SKOS RDF/XML Schemas as a possible bridging step
• Ontologies (taken as formal precise definition of relationships)
can be combined with inference rules and logic systems
in applications with well defined objects and operations
But rationale behind KOS not well understood in Semantic Web
How do intended contexts of use compare?
Types of Knowledge Organisation System (KOS)
from Zeng & Salaba: FRBR Workshop, OCLC 2005
Relationship Groups: Ontologies, Semantic networks, Thesauri
Classification & Categorization: Classification schemes, Taxonomies, Categorization schemes, Subject Headings
Term Lists: Synonym Rings, Authority Files, Glossaries/Dictionaries, Gazetteers, Pick lists
[Figure axis: Natural language to Controlled language]
Ontology and Information Systems (Barry Smith)
• “Philosophical ontology as I shall conceive it here is what is
standardly called descriptive or realist ontology. It seeks not
explanation but rather a description of reality in terms of a
classification of entities that is exhaustive in the sense that it can
serve as an answer to such questions as: What classes of
entities are needed for a complete description and
explanation of all the goings-on in the universe?”
• Ontological Commitment
“Some philosophers have thought that the way to do
ontology is exclusively through the investigation of
scientific theories. With the work of Quine (1953) there arose
in this connection a new conception of the proper method of
ontology, according to which the ontologist’s task is to establish
what kinds of entities scientists are committed to in their
theorizing.”
Two Types of Ontology Systems (Barry Smith)
• “Perhaps we can resolve our puzzle as to the degree to which
information systems ontologists are indeed concerned to
provide theories which are true of reality – as Patrick Hayes
would claim – by drawing on a distinction made by Andrew
Frank (1997) between two types of information systems
ontology.
• On the one hand there are ontologies – like Ontek’s PACIS and
IFOMIS’s BFO – which were built to represent some pre-existing
domain of reality. Such ontologies must reflect the
properties of the objects within its domain in such a way that
there obtain substantial and systematic correlations between
reality and the ontology itself.
• On the other hand there are administrative information systems,
where (as Frank sees it) there is no reality other than the one
created through the system itself. The system is thus, by
definition, correct.”
AI Ontology Background (Barry Smith)
• Knowledge Representation Ontologies
growing out of background in:
– “Database Tower of Babel Problem” (e-commerce)
– Modelling of scientific theories (Gene ontology etc)
• AI goal radically extending scope of automation
• “Generally, and in part for reasons of computational efficiency
rather than ontological adequacy, information systems
ontologists have devoted the bulk of their efforts to constructing
concept-hierarchies; they have paid much less attention to
the question of how the concepts represented within such
hierarchies are in fact instantiated in the real world of what
happens and is the case.”
What is an Ontology? (T. Gruber) - http://ksl-web.stanford.edu/people/gruber/
• “In the context of knowledge sharing, I use the term ontology to mean a
specification of a conceptualization. That is, an ontology is a description
(like a formal specification of a program) of the concepts and relationships that
can exist for an agent or a community of agents.
• Practically, an ontological commitment is an agreement to use a vocabulary (i.e.,
ask queries and make assertions) in a way that is consistent (but not complete)
with respect to the theory specified by an ontology. We build agents that
commit to ontologies. We design ontologies so we can share
knowledge with and among these agents.
• A conceptualization is an abstract, simplified view of the world that
we wish to represent for some purpose. Every knowledge base,
knowledge-based system, or knowledge-level agent is committed to some
conceptualization, explicitly or implicitly.
• For AI systems, what "exists" is that which can be represented.
When the knowledge of a domain is represented in a declarative formalism, the
set of objects that can be represented is called the universe of discourse.”
Semiotic Triangle (Ogden and Richards, 1923)
reproduced in Campbell et al. 1998,
Representing Thoughts, Words, and Things in the UMLS
Often referred to in Semantic Web literature
Needs to be problematised
Only indirect link via an interpreter
(AI) Ontology tends to be …
Instance of scientific concept
Fact in a ‘possible world’
information retrieval (subject) KOS tends to be
Probable relevance
- aboutness
Inter/Intra indexer consistency ?
(eg Bates 1986)
typically a complex entity
KOS - Informal by design?
• KOS designed to assist perceived needs of information retrieval
users rather than modelling a simplified reality of a domain
– basis of (much) KOS construction is intended assistance
in indexing/ searching/browsing and generalised retrieval
as much as logical properties of attributes
– implications:
levels of specialisation
granularity of relationships
• Many KOS by design informal structures
– pragmatic compromises for different uses
– semantic relationships often ‘fuzzy’
• Semantic organisation understood as conventional
– could be otherwise, different viewpoints inevitable
– users assisted to explore and appropriate
Distributed KOS meaningful?
• Meaning of a concept depends on its semantic context within a
KOS (and indexing practice, relevance judgements)
Eg of KOS fragment
(Getty AAT in
FACET Web Demonstrator)
Not necessarily straightforward to:
• apply KOS concepts out of this context (eg magenta)
• link in to other distributed structures and contexts
• Some ‘open world’ Semantic Web implications problematic?
How to apply KOS?
• What is the purpose of a given KOS?
- we need to specify/articulate more clearly
• Domain dependent level of precision in concept use
Important to take into account how applications will process concepts
Current KOS relationships at a useful level of generality
for many retrieval-based applications (with some specialisation?)
• Cost/benefit issues for KOS applications
in granularity of relationships and degree of formalisation
KOS in what kind of Semantic Web?
• Role for knowledge-based interactive tools
in semantic web applications
(in addition to emphasis on AI machine reasoning)
– Reminiscent of old debates on
appropriate limits to automation
– A balance between system and human ‘agency’
– Expert Systems or … Systems for Experts ?
Smart, interactive tools
allowing scope for tacit knowledge, informal representations
Contact Information
Doug Tudhope
School of Computing
University of Glamorgan
Pontypridd CF37 1DL
Wales, UK
http://www.comp.glam.ac.uk/pages/staff/dstudhope
References
Bates M. 1986. Subject access in online catalogs: a design model, Journal of the American Society for
Information Science, 37(6), 357-376.
Binding C., Tudhope D. 2004. KOS at your Service: Programmatic Access to Knowledge Organisation
Systems. JoDI 4(4), http://jodi.tamu.edu/Articles/v04/i04/Binding/
Blocks D., Cunliffe D., Tudhope D. 2006 (in press). A reference model for user-system interaction in
thesaurus-based searching. Journal of the American Society for Information Science and Technology.
Campbell K., Oliver D., Spackman K., Shortliffe E. 1998. Representing Thoughts, Words, and Things in the
UMLS. Journal of the American Medical Informatics Association, 5 (5), 421-431.
FACET Web demonstrator http://www.comp.glam.ac.uk/~FACET/webdemo/
FACET Xpath browsers http://www.comp.glam.ac.uk/~FACET/formats/
Greenberg J. 2001. Automatic query expansion via lexical-semantic relationships, Journal of the American
Society for Information Science and Technology, 52(5), pp. 402-415.
Gruber T. What is an ontology? http://ksl-web.stanford.edu/people/gruber/
Hendler J. Ontologies on the Semantic Web. In (S. Staab, Ed.) Trends & Controversies, IEEE Intelligent
Systems, 73-74.
Järvelin K., Kekäläinen J., Niemi T. 2003. ExpansionsTool: concept-based query extension and
construction. Information Retrieval, 4(3/4), pp. 231-255.
Smith B. 2003. Ontology. In: L. Floridi (ed.), Blackwell Guide to the Philosophy of Computing and
Information, Oxford: Blackwell, 155–166. (Longer draft at
http://ontology.buffalo.edu/ontology(PIC).pdf)
Tudhope D., Binding C., Blocks D., Cunliffe D. 2002. Compound Descriptors in Context: A Matching
Function for Classifications and Thesauri. JCDL 2002, 84-93.
Tudhope D., Binding C. 2005. Towards Terminology Services: experiences with a pilot web service
thesaurus browser. Proc. International Conference on Dublin Core and Metadata Applications, (DC 2005),
269-273. (version forthcoming in ASIST Bulletin).
Tudhope D., Binding C., Blocks D., Cunliffe D. 2006 (in press). Query expansion via conceptual distance
in thesaurus indexed collections. Journal of Documentation.