Terminology and Ontology Management Systems

Dr. W. Ceusters
CTO
Language & Computing nv
www.landc.be
Overview
• Terms and definitions
• The Semantic Web: sense and nonsense
• L&C’s approach to Semantics Assisted
Knowledge Management, concentrating on:
– “formal” ontology management
– relationship with language
– Software support
• Examples in healthcare
• Conclusions
“Terminology”
• “set of terms representing the system of concepts
of a particular subject field.”
(ISO 1087:1990)
• “a theory, i.e. the set of premises, arguments and
conclusions required for explaining the
relationships between concepts and terms.”
(Sager 1990)
“Ontology”
• In Information Science:
– “An ontology is a description (like a formal
specification of a program) of the concepts and
relationships that can exist for an agent or a
community of agents.”
(Tom Gruber)
• In Philosophy:
– “Ontology is the science of what is, of the kinds and
structures of objects, properties, events, processes
and relations in every area of reality.”
(Barry Smith)
Ontology through the ages
• 350 BC:
– Aristotle
• ‘first philosophy’ → ‘metaphysics’ → ‘Ontology’
• 1613:
– Rudolf Göckel (Goclenius) Lexicon philosophicum
– Jacob Lorhard (Lorhardus) Theatrum philosophicum
• 1721:
– Bailey’s dictionary defines ontology as ‘an Account of being
in the Abstract’.
• 1964:
– Ingarden
• ‘ontology’ = the study of what might exist
• ‘metaphysics’ = the study of which of the various alternative ontologies
proffered is true of reality.
Terminology, Ontology and Logic
• “Ontologies are not limited to conservative
definitions, that is, definitions in the traditional
logic sense that only introduce terminology and
do not add any knowledge about the world.”
(Herbert Enderton, 1972)
• Main additional requirement:
– one needs to state axioms that do constrain the
possible interpretations for the defined terms
Ontology and Language
• “The subject of ontology is the study of the
categories of things that exist or may exist in
some domain. The product of such a study,
called an ontology, is a catalog of the types of
things that are assumed to exist in a domain of
interest D from the perspective of a person who
uses a language L for the purpose of talking
about D.”
(John Sowa)
From buzz-word to the “O-word”
• “An ontology is a classification methodology for formalizing a
subject's knowledge or belief system in a structured way.
Dictionaries and encyclopedias are examples of ontologies.”
(X1)
• “A terminology (or classification) is a kind of ontology by
definition and it should preserve (and "understand") the
relationships between the 1,000s of terms in it or else it would
become a mere dictionary (or at best a thesaurus).”
(X2)
• “Ontologies are Web pages that contain a mystical unifying force
that gives differing labels common meaning.”
(X3)
Amen !!!
• “Give folks a loose standard and the first thing
many of them do is exploit its weaknesses for
their personal gain.”
NICHOLAS PETRELEY
Computerworld
• “Give folks a loose standard and the first thing
the clever ones do is exploit the ignorance of the
others for their personal gain.”
WERNER CEUSTERS
(in a vicious mood)
"Where there is the sound
of a blow, there is respect”
(Pashtun proverb)
• “I repeatedly get confused by the (in my opinion
structurally confusing) terminology of those
people (like Y) who try to do ontology but end up
just studying concepts.”
(X, pers. comm.)
The basics are indeed confusing enough
[Figure: how different authors name the same distinctions. Labels shown:
Conceptualisation, Concept, Universal, Nth-order Universal, Particular, Set,
Individual, Instance and Real world, attributed to Guarino, Hegel, Smith and
the “Conceptualists”.]
IT or philosophy: does it matter ?
• Does what I see exist ?
– hallucinations, illusions, ... The Matrix
• What is the relationship between me, my life and
my body ?
• If X IS-A Y, does it need to be a Y ?
– Cf. Y = “person” versus Y = “nurse”
• If X stops being a Y, does it cease to exist ?
– Cf. Y = “person” versus Y = “nurse”
Does it matter ?
The answer is YES: working through
many such philosophical questions has
proven to be the only way to build
clean ontologies
Tim Berners-Lee:
“I had a dream”
Ontologies and the semantic web
• The goal of the Semantic Web is to make it
possible for software to find the data it needs on
the Web, understand it, cross-reference it and
apply it to a particular task.
• “I should be able to tell my Web-enabled
handheld device to schedule an appointment with
a dentist within 20 miles of home and let the
computer do the rest.”
(X3)
If it were just that simple ...
• “I should be able to tell my Web-enabled handheld device to
schedule an appointment with a dentist within 20 miles of home
and let the computer do the rest.”
• So the SW must understand natural language ?
• So the SW must know when the requester is free ?
• So the SW must understand that it is to take care
of the requester’s teeth, and not to have a nice dinner
date ?
• So the SW must understand where the requester
lives ?
• So the SW can then deduce what the actual length
of “20 miles” is for this particular person ?
The solution ...
• Build one common ontology.
• Use precise, unambiguous terms to name the
concepts in the ontology.
• Annotate webpages by using this ontology.
• Train people in using the terms in the same sense
as understood by the ontology.
... is an extremely naïve solution !
This is (a piece of) the reality ...
• Computers don’t understand natural language
(yet).
• Web pages are in free text.
• Manual ontological mark-up of web pages is
infeasible.
• No single, common ontology will ever exist !
• Nobody can make humans always use terms
in the same way.
Pray your computer isn’t Irish ...
X: “Hallo stranger, you appear to be traveling?”
Y: “Yes, I always travel when on a journey.”
X: “And pray, what might your name be?”
Y: “It might be Sam Patch, but it isn't.”
X: “Have you been long in these parts?”
Y: “Never longer than at present - 5 feet 9.”
X: “Do you get anything new?”
Y: “Yes, I bought a new whetstone this morning.”
Copyright © 1996 Electronic Historical Publications
L&C’s approach to
Semantics Assisted
Knowledge Management
Mission of L&C nv
We hereby declare
... To provide
users and developers
of systems for
knowledge management
with tools and services
for efficient and accurate
data-entry and retrieval by
exploiting the full power of
automated (medical) natural
language understanding
[Chart: employees, share value (1000 Euro) and R/D ratio per year, 1990 to
2001 (vertical axis 0 to 40), with the labels Anthem, Multi-Tale, Dome, GIU,
Select, C-Care, Liquid and Mobidev along the timeline.]
L&C’s integrated approach
NLU enabling tools for
knowledge supported
data-entry and -retrieval
Medical and linguistic
knowledge required for
language understanding
Data structure and
function library for
language understanding
L&C’s LinkFactory
Linguistic-semantic Function Library
Storage Functions
T-DEFINE(“méningite”, french, c-meningitis)
C-DEFINE(c-meningitis, c-inflammation HAS-LOC c-meninges)
Retrieval Functions
GET-TERMS(c-meningitis, {french, dutch})
→ “méningite”, “hersenvliesontsteking”
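The storage and retrieval calls above can be mimicked with a small in-memory sketch. This is not LinkFactory's actual API: the class `TermStore` and the Python method names `t_define`, `c_define` and `get_terms` are hypothetical stand-ins for T-DEFINE, C-DEFINE and GET-TERMS, and the Dutch term is assumed to have been stored through its own T-DEFINE call.

```python
class TermStore:
    """Toy in-memory stand-in for a term/concept function library (hypothetical)."""

    def __init__(self):
        self._terms = {}        # concept id -> language -> list of terms
        self._definitions = {}  # concept id -> (parent, link type, target)

    def t_define(self, term, language, concept):
        """Attach a term in a given language to a concept (cf. T-DEFINE)."""
        self._terms.setdefault(concept, {}).setdefault(language, []).append(term)

    def c_define(self, concept, parent, link_type, target):
        """Define a concept as a parent concept plus one link (cf. C-DEFINE)."""
        self._definitions[concept] = (parent, link_type, target)

    def get_terms(self, concept, languages):
        """Return the concept's terms in the requested languages (cf. GET-TERMS)."""
        by_language = self._terms.get(concept, {})
        return [term for lang in languages for term in by_language.get(lang, [])]


store = TermStore()
store.t_define("méningite", "french", "c-meningitis")
store.t_define("hersenvliesontsteking", "dutch", "c-meningitis")  # assumed call
store.c_define("c-meningitis", "c-inflammation", "HAS-LOC", "c-meninges")

print(store.get_terms("c-meningitis", ["french", "dutch"]))
# prints ['méningite', 'hersenvliesontsteking']
```

Keeping terms and concept definitions in separate tables mirrors the slide's split between term-level (T-) and concept-level (C-) functions.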
Architectural Overview
[Architecture diagram. Labels: LinkBase database; JDBC; Java; LinkFactory
Server (Unix workstation, PC, Mac); RMI, Corba, Soap; LAN, WAN, Internet;
server business objects; LinkFactory Workbench with concept tree, linktype
tree, criteria / full definitions, translate, ...]
Client Graphical Objects
Built-in Quality Control
• Knowledge entered is immediately used to check the
validity of subsequent entries
• Version management
• User management with:
– Allowed actions based on experience
– Personal audit trail
• Clear and formal separation from 3rd party systems to
avoid copying mistakes such as:
– UMLS’ cyclical ISA relationships
– SNOMED-RT’s “very usual = always” modelling
– Most systems’ overloaded hierarchical relations
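One way to picture how entered knowledge can validate later entries is a check that refuses any new IS-A link that would close a cycle, the kind of error the slide attributes to UMLS. The sketch below is only an assumption about how such a check could look, not a description of LinkFactory; `IsaHierarchy`, its methods and the concept `c-disease` are hypothetical.

```python
class IsaHierarchy:
    """Toy IS-A store that rejects links which would create a cycle (hypothetical)."""

    def __init__(self):
        self._parents = {}  # concept -> set of direct IS-A parents

    def ancestors(self, concept):
        """Return every concept reachable by following IS-A links upward."""
        seen = set()
        stack = list(self._parents.get(concept, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents.get(current, ()))
        return seen

    def add_isa(self, child, parent):
        """Add 'child IS-A parent' unless it would make the hierarchy cyclic."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} IS-A {parent} would create a cycle")
        self._parents.setdefault(child, set()).add(parent)


hierarchy = IsaHierarchy()
hierarchy.add_isa("c-meningitis", "c-inflammation")
hierarchy.add_isa("c-inflammation", "c-disease")  # illustrative concept only

try:
    # This link would make c-disease its own ancestor, so it is refused.
    hierarchy.add_isa("c-disease", "c-meningitis")
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

Because the check runs at entry time against everything already stored, a cyclic link never reaches the hierarchy and needs no later clean-up.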
The content
[Diagram. Labels: Formal Domain Ontology; Cassandra Linguistic Ontology;
a Lexicon and Grammar for each of Language A, Language B, Others ...;
Proprietary Terminologies, ICPC, SNOMED, ICD, MEDRA.]
Based on formal logics
[Diagram: hierarchy of spatial link types. Recoverable labels:
HAS-OVERLAPPING-REGION, HAS-PARTIAL-SPATIAL-OVERLAP, IS-SPATIAL-PART-OF,
IS-PROPER-SPAT.-PART-OF, HAS-SPATIAL-PART, HAS-PROPER-SPATIAL-PART,
HAS-DISCRETED-REGION, HAS-DISCONNECTED-REGION, HAS-CONNECTING-REGION,
HAS-SPATIAL-POINT-REFERENCE, IS-INSIDE-CONVEX-HULL-OF,
IS-PARTLY-IN-CONVEX-HULL-OF, IS-OUTSIDE-CONVEX-HULL-OF, IS-GEO-INSIDE-OF,
IS-TOPO-INSIDE-OF, plus tangential, non-tangential and equivalent spatial
part variants.]
Example: joint anatomy
• joint HAS-HOLE joint space
• joint capsule IS-OUTER-LAYER-OF joint
• meniscus
– IS-INCOMPLETE-FILLER-OF joint space
– IS-TOPO-INSIDE joint capsule
– IS-NON-TANGENTIAL-MATERIAL-PART-OF joint
• joint
– IS-CONNECTOR-OF bone X
– IS-CONNECTOR-OF bone Y
• synovia
– IS-INCOMPLETE-FILLER-OF joint space
• synovial membrane IS-BONAFIDE-BOUNDARY-OF joint space
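Statements like those in the joint anatomy example are naturally held as (subject, link type, object) triples. The following sketch uses plain Python and is not L&C's representation; the helper names `links_from` and `subjects_with` are hypothetical, and the last link type is written with the hyphenation the other link types suggest.

```python
# Each statement from the joint anatomy slide as a (subject, link type, object) triple.
TRIPLES = [
    ("joint", "HAS-HOLE", "joint space"),
    ("joint capsule", "IS-OUTER-LAYER-OF", "joint"),
    ("meniscus", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    ("meniscus", "IS-TOPO-INSIDE", "joint capsule"),
    ("meniscus", "IS-NON-TANGENTIAL-MATERIAL-PART-OF", "joint"),
    ("joint", "IS-CONNECTOR-OF", "bone X"),
    ("joint", "IS-CONNECTOR-OF", "bone Y"),
    ("synovia", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    ("synovial membrane", "IS-BONAFIDE-BOUNDARY-OF", "joint space"),
]


def links_from(subject):
    """Return the (link type, object) pairs asserted for a subject."""
    return [(link, obj) for subj, link, obj in TRIPLES if subj == subject]


def subjects_with(link_type, obj):
    """Return every subject that has the given link to the given object."""
    return [subj for subj, link, target in TRIPLES
            if link == link_type and target == obj]


print(links_from("meniscus"))
print(subjects_with("IS-INCOMPLETE-FILLER-OF", "joint space"))
# second line prints ['meniscus', 'synovia']
```

A flat triple list is enough for lookups like these; reasoning over the link types (for example, inferring containment from part-of links) would need the formal link-type hierarchy on top.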
Linguistic and domain ontologies
[Diagram. Concepts: Generalised Possession, Human, Healthcare phenomenon,
Having a healthcare phenomenon, Patient, Patient at risk, Patient at risk
for osteoporosis, Risk Factor, Risk factor for osteoporosis, Osteoporosis.
Link types: IS-A, Has-possessor, Has-possessed, Is-possessor-of,
Has-Healthcare-phenomenon, Is-RiskFactor-Of.]
Linking external ontologies
[Diagram. External terms MESH-2001 : “Seizures”, MESH-2001 : “Convulsions”,
Snomed-RT : “Convulsion” and Snomed-RT : “Seizure”, related among themselves
by ISA and IS-narrower-than, are each tied by Has-CCC links to the L&C
concepts, which are connected by IS-A links: L&C : Health crisis,
L&C : Seizure, L&C : Convulsion, L&C : Epileptic convulsion.]
Status of LinkBase as of 01-12-2002
• 920.000 (850.000) concepts
• 2.300.000 terms
• 320 link-types
• 3.000.000 link instances
• 300.000 links to 3rd party systems
• But:
– Never finished !
– Quality sufficient for current applications
Linguistic Application Components
[Diagram: a Processor turns Text into a Result. Other labels: Linguistic
Knowledge, Task Knowledge, Domain representation, Goal representation.]
Some available components
• Coding tools: FastCode
• Semantic indexers: TeSSI
• Spell checkers and type ahead: FastType
• Semi-controlled language parsers in restricted
domains: FreePharma
• Ontology browser
• Stochastic dependency-based indexer: C-Link
• (Ir)relevant document classifier for very low
prevalence data sets
Automated application building
[Diagram. Labels: formal representation of classification system; mapping
data; coding data; Domain+Linguistic ontology; LinCBase; LinCFactory;
FastCode Generator; FastCode server; FastCode client.]
Ontologies for Semantic document management
[Diagram. Labels: LinkFactory; domain ontology; user query; Q-analyser,
Q-matcher, QBuilder; FastCode, FastType; document; topic list; topic
assignment; TeSSI indexing; index; document collection; document(s)
retrieved.]
Key principles of success
Clean separation of knowledge (adapted from A. Rector)
but with close interoperability (W. Ceusters)
• Conceptual knowledge: the knowledge of sensible domain
concepts
• Knowledge of definitions and criteria: how to determine
if a concept applies to a particular instance
• Surface linguistic knowledge: how to express the concepts
in any given language
• Knowledge of classification and coding systems: how an
expression has been classified by such a system
• Pragmatic knowledge: what users usually say or think,
what they consider important, how to integrate in software
Conclusion
• Traditional approaches to knowledge management are
insufficient
• Formal terminologies provide:
– better QA methods for developing “semantics aware” systems, especially
for multi-lingual use
– better ways to have them used by machines rather than people
• Formal ontologies are candidates to become the new
pillars in IT when a number of criteria are satisfied
– Accept language as a medium of communication, but be independent of any
specific language
– Multi-lingual
– Domain-oriented
– Supported by a methodology, services and tools
• They are not a goal, but a means !