Language, terminology and ontology in a medical context

Language, terminology and ontology in a medical context:
theory and reality in industrial applications
Werner CEUSTERS
CTO
Language & Computing nv
www.landc.be
Presentation overview
1. L&C’s current key application
2. Some problems related to existing medical terminology systems
3. Major reason: a too simplistic interpretation of the semantic/semiotic triangle
4. L&C’s answer: a cognitive approach to terminology connecting Aristotelian Realism with Linguistic Functionalism
5. How this is implemented and used
6. Conclusions and issues for debate
Mission of L&C nv
We hereby declare ...
To provide users and developers of systems for knowledge management with tools and services for efficient and accurate data entry and retrieval by exploiting the full power of automated (medical) natural language understanding
[Chart: employees, share value (1000 Euro) and R/D ratio per year, 1990 to 2002. Names shown along the timeline: Anthem, Multi-Tale, Dome, GIU, Select, C-Care, Liquid, Mobidev, Homey, Poirot, Inface, SCOP.]
Most Successful Application
• Automatic semantic annotation of documents
– semantic indexing and retrieval
– relevance ranking of in-document semantics
– coding towards medical terminologies
• Often requires formalisation of clients’ proprietary terminologies
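The three tasks listed above (annotation, relevance ranking, coding) can be illustrated with a deliberately naive sketch. It is not how TeSSI works: it uses plain dictionary lookup, and the lexicon, concept identifiers and external codes are all hypothetical names invented for the example.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: surface term -> concept identifier (assumption).
LEXICON = {
    "heart attack": "C_MYOCARDIAL_INFARCTION",
    "myocardial infarction": "C_MYOCARDIAL_INFARCTION",
    "aspirin": "C_ASPIRIN",
    "infection": "C_INFECTION",
}

# Hypothetical mapping from concepts to codes of an external terminology.
CODES = {"C_MYOCARDIAL_INFARCTION": "EXT-001", "C_ASPIRIN": "EXT-002"}


def annotate(text):
    """Return sorted (start, end, concept) tuples for lexicon terms in text."""
    hits = []
    lowered = text.lower()
    for term, concept in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((match.start(), match.end(), concept))
    return sorted(hits)


def rank(annotations):
    """Rank concepts by how often they are mentioned in the document."""
    return Counter(concept for _, _, concept in annotations).most_common()


def code(annotations):
    """Map annotated concepts to external codes; None means no code found."""
    return {concept: CODES.get(concept) for _, _, concept in annotations}


text = ("The patient had a heart attack. After the myocardial infarction, "
        "aspirin was given. No infection.")
annotations = annotate(text)
print(rank(annotations))
print(code(annotations))
```

Two different surface terms collapse onto one concept, which is what lets the ranking count meaning instead of strings; a concept without an entry in `CODES` shows up as `None`.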
TeSSI Indexing Service
TEX Indexing component
Result 1: Annotated document
Result 2: relevance ranking
View on MeSH coding
Behind the scene (1): Semantics based shallow processing
Behind the scene (2): disambiguation
But: this system is NOT perfect!
[Screenshot callouts, interleaved in extraction: no code found in MeSH; scope of “infections” missed; further fragments: recognised, the word, large, not, syntactic]
Some problems
Major problem: most medical terminologies are built following unsound principles
Problem areas: 1 to 6 [numbered callouts on the figure]
ICD-10 “class-of” hierarchy
Problems of mapping
Problems with naïve merging
UMLS: semantic drift, ISA circles
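How an ISA circle can arise from naïve merging is easy to show with a small sketch. The concept names and the two source hierarchies below are made up for illustration; each source is acyclic on its own, and the circle only appears once their IS-A links are pooled without checking.

```python
def merge(*sources):
    """Naively pool child -> parents IS-A links from several sources."""
    merged = {}
    for source in sources:
        for child, parents in source.items():
            merged.setdefault(child, set()).update(parents)
    return merged


def find_isa_cycle(isa):
    """Return one IS-A cycle as a list of concepts, or None if acyclic.

    Depth-first search; a node that is still on the current path (GREY)
    and is reached again closes a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for parent in isa.get(node, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(isa):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical sources: each one is a valid hierarchy by itself.
source_a = {"a": {"b"}, "b": {"c"}}
source_b = {"c": {"a"}}

print(find_isa_cycle(source_a))                   # no cycle in source A
print(find_isa_cycle(merge(source_a, source_b)))  # cycle after merging
```

The recursive search is fine for a sketch; a hierarchy with very deep chains would need an iterative version to stay within Python's recursion limit.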
UMLS: superficial term analysis
Is a “formal” approach better?
SNOMED-RT
ISA?
Are these the only problems?
Don’t dream!
The semiotic/semantic triangle
• Concept – Model – Mind
• Language – Term – Symbol
• Reality – Object
Views shown: “prototype” view, “characteristics” view
Where it went wrong ...
• Normative interpretation (Wüster, ISO/TC37, preferred terms, ...)
• Too much focus on the concepts
• Assuming independence amongst the vertices of the triangle
• Does not survive an intentional framework
Too much focus on the concepts
• “conceptualism” versus “realism”
• “ontology” versus “Ontology”
• oversimplification of reality
– discussions on 3D or 4D, instead of 3D and 4D
– limitations of set theory
• “independence of language” paradigm that does not accept language as a medium of communication
Triangular dependencies
Vertices: concept, term, object
• The structures of language are partially determined by our conceptualisation of the world.
• Human observation is determined by the possibilities and restrictions of the human body.
Names shown on the slide: Halliday, Lakoff
• “Baboons and humans have different cut-off points for discerning "same" objects because our verbal expression for "same" makes the idea of "same" more restrictive.” Fagot and Wasserman (Centre for Research in Cognitive Neuroscience in Marseille)
Intentional framework problem
“Brussels” “capital of Belgium”
“I want to go to ...”
The basis of L&C’s approach
[Diagram labels: Aristotelian realism; the description of Halliday’s systemic functional grammar; LinkBase; Formal Domain Ontology; Cassandra Linguistic Ontology; Language A and Language B, each with Lexicon and Grammar; terminologies: ICPC, SNOMED, ICD, MEDDRA, Proprietary Terminologies, Others ...]
Example: joint anatomy
• joint HAS-HOLE joint space
• joint capsule IS-OUTER-LAYER-OF joint
• meniscus
– IS-INCOMPLETE-FILLER-OF joint space
– IS-TOPO-INSIDE joint capsule
– IS-NON-TANGENTIAL-MATERIAL-PART-OF joint
• joint
– IS-CONNECTOR-OF bone X
– IS-CONNECTOR-OF bone Y
• synovia
– IS-INCOMPLETE-FILLER-OF joint space
• synovial membrane IS-BONAFIDE-BOUNDARY-OF joint space
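The joint anatomy statements above are subject, relation, object triples, so they can be stored and queried as such. The sketch below is only an illustration of that idea, not LinkBase's actual storage model; the `query` helper is a hypothetical name, and the boundary relation is spelled with a hyphen on the assumption that one was lost at a line break.

```python
# The relations from the slide as (subject, relation, object) triples.
TRIPLES = [
    ("joint", "HAS-HOLE", "joint space"),
    ("joint capsule", "IS-OUTER-LAYER-OF", "joint"),
    ("meniscus", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    ("meniscus", "IS-TOPO-INSIDE", "joint capsule"),
    ("meniscus", "IS-NON-TANGENTIAL-MATERIAL-PART-OF", "joint"),
    ("joint", "IS-CONNECTOR-OF", "bone X"),
    ("joint", "IS-CONNECTOR-OF", "bone Y"),
    ("synovia", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    ("synovial membrane", "IS-BONAFIDE-BOUNDARY-OF", "joint space"),
]


def query(subject=None, relation=None, obj=None):
    """Return all triples matching the given fields; None is a wildcard."""
    return [
        (s, r, o)
        for s, r, o in TRIPLES
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]


# What partially fills the joint space?
fillers = [s for s, _, _ in query(relation="IS-INCOMPLETE-FILLER-OF",
                                  obj="joint space")]
print(fillers)

# Everything stated about the meniscus.
for triple in query(subject="meniscus"):
    print(triple)
```

Keeping the relation name as data, instead of folding everything into IS-A or part-of, is what allows a query to distinguish a filler of the joint space from its boundary.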
Linguistic and domain ontologies
[Diagram. Concepts: Generalised Possession, Human, Patient, Patient at risk, Patient at risk for osteoporosis, Healthcare phenomenon, Having a healthcare phenomenon, Risk Factor, Risk factor for osteoporosis, Osteoporosis. Relations: IS-A, Has-possessor, Has-possessed, Is-possessor-of, Has-Healthcare-phenomenon, Is-RiskFactor-Of. Numbered links 1 to 4 connect the linguistic and the domain side.]
The industrial challenge
• How to set up a profitable business taking into account
– the problems intrinsic to medical terminologies
• the obligation for many clients to use them
• the existence of many different, incompatible systems
• often based on a design that is not suited for automatic processing
– the problems intrinsic to natural language processing
– the business environment in which our applications have to run
– the high costs for “doing right what others do wrong”
Integration of Research, maintenance and production
[Diagram. Three zones: research, maintenance, production. Components: WWW, “WebAgent”, “GapFinder”, Medico-Linguistic Ontology, TeSSI, ..., “Model-H1”, various beans. Data flowing between them: candidate knowledge, gap, text to analyse, client document set, index, relevance ranking, marked up text, candidate classified terms, new-terms, corrected document, uncorrected document, gold standard corpus.]
The solution
A terminological workbench for maintenance AND production!
LinkFactory ToolSet
Managing different views
• Internal ontology
• External ontology
• Criteria
• Mappings
• Definitions
• Terms
Assisted term classification
GapFinder and WebAgent
1. concept queried for
2. information used for document relevance assessment
3. new words retrieved
4. context-information for new word
5. relevant pages found
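Items 3 and 4 of the legend (new words and their context) can be sketched in a few lines. This is an assumption-laden toy, not the GapFinder implementation: `find_gaps`, the `window` parameter and the sample lexicon are hypothetical, and a real system would compare against the ontology's lexicon with proper morphological analysis.

```python
import re


def find_gaps(pages, lexicon, window=3):
    """Collect words absent from the lexicon, each with surrounding context.

    pages   -- iterable of page texts already judged relevant
    lexicon -- set of known (lower-case) words
    window  -- number of context words kept on each side (assumption)
    """
    gaps = {}
    for page in pages:
        tokens = re.findall(r"[a-z]+", page.lower())
        for i, token in enumerate(tokens):
            if token not in lexicon:
                context = " ".join(tokens[max(0, i - window): i + window + 1])
                gaps.setdefault(token, []).append(context)
    return gaps


pages = ["Osteoporosis increases fracture risk"]
lexicon = {"osteoporosis", "increases", "risk"}
for word, contexts in find_gaps(pages, lexicon).items():
    print(word, contexts)
```

The contexts are what a terminologist would look at when deciding where a candidate term belongs, which is why they are kept alongside the word itself.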
Research: automatic ontology extraction
Some details
Conclusions (and issues for discussion)
1. Medical natural language understanding (MNLU) is a very complex endeavour;
2. Terminology is a necessary component, though not the most important one;
3. Traditional terminology approaches have less value for MNLU than those that accept a close relationship with language and reality (e.g. the “sociocognitive” approach);
4. Some traditional, authoritative sources in medical terminology follow a wrong approach;
5. Do away with Conceptualism and adopt Realism.
Successful industrial application of MNLU requires (amongst other things):
1. Acceptance of the previous points without discussion;
2. 200% awareness of the state of the art;
3. the capacity to make valid judgments about the expected outcomes of a (new) theory without requiring full proofs;
4. an integrated service/production and maintenance/development environment to transform costs into revenue.