Transcript Document

Institute for
Healthcare
Informatics
R T U New York State
Center of Excellence in
Bioinformatics & Life Sciences
Ontology Seminar
A common framework for representing data
and what they are about
April 1, 2013 – 106 Jacobs Hall, North Campus, University at Buffalo, 4-6pm
Werner CEUSTERS, MD
Will Hsu
Ontology Research Group, Center of Excellence
in Bioinformatics and Life Sciences,
Institute for Healthcare Informatics,
Department of Psychiatry
Neuroscience Program,
School of Medicine and Biomedical Sciences,
University at Buffalo, NY, USA
http://www.org.buffalo.edu/RTU
Overview
• Data versus what they are about
• Issues with data documentation and data quality
tools
• What we need – at least – to resolve the issues
– Ontological Realism
– Referent Tracking
• A methodology explored in the OPMQoL project.
A crucial distinction: data and what they are about
[Figure labels: First-Order Reality; observation & measurement; data organization; model development; Representation; Generic beliefs; application; use; outcome; add; verify; further R&D; Δ = (instrument and study optimization)]
A non-trivial relation
[Figure: referents on one side, references on the other]
For instance: meaning and impact of changes
• Are differences in data about the same entities in reality at different points in time due to:
– changes in first-order reality?
– changes in our understanding of reality?
– inaccurate observations?
– registration mistakes?
Ceusters W, Smith B. A Realism-Based Approach to the Evolution of Biomedical Ontologies. AMIA 2006 Proceedings, Washington DC,
2006;:121-125. http://www.referent-tracking.com/RTU/sendfile/?file=CeustersAMIA2006FINAL.pdf
What makes it non-trivial?
• Referents
– are (meta-)physically the way they are,
– relate to each other in an objective way,
– follow ‘laws of nature’.
• Window on reality restricted by:
– what is physically and technically observable,
– faithfulness of the ontology used,
– fit between ontological commitments and computational views.
• References
– follow, ideally, the syntactic-semantic conventions of some representation language,
– are restricted by the expressivity of that language,
– reference collections come, for correct interpretation, with documentation outside the representation.
Standard DBMS architectures are inward looking
A. Silberschatz, H.F. Korth, S. Sudarshan. Database System Concepts. McGraw-Hill. ISBN 0-07-352332-1
http://mrbool.com/architecture-of-a-dbms/25785
A colleague shares his research data set
A closer look
• What are you going to ask him right away?
• What do these various values stand for and how do they relate to each other?
– Might this mean that patient #5057 had sex only once, at the age of 39?
Documenting datasets
Sources
Data generation
Data organization
Data collection sheets
Instruction manuals
Interpretation criteria
Diagnostic criteria
Assessment instruments
Terminologies
Data validation procedures
Data dictionaries
Ontologies
If not used for data collection and organization, these sources can be used post hoc to document, and perhaps increase, the clarity, faithfulness and comparability of existing data collections.
Issues with data documentation
and data quality tools
The dataset’s data dictionary (codebook)
Field Name | Description | Type | Missing Value | Range | Coding Values
id | Subject id | numeric | none | [5033,6387] |
age | Subject’s age | numeric | none | [14,85] | Age in years
sex | Subject’s gender | 0/1 | none | | 0 – male, 1 – female
q3 | Have you had pain in the face, jaw, temple, in front of the ear or in the ear in the past month? | 0/1 | none | | 0 – no, 1 – yes
an_8_gcps_1 | How would you rate your facial pain on a 0 to 10 scale at the present time, that is right now, where 0 is "no pain" and 10 is "pain as bad as could be"? | numeric | “.” | 0-10 | 0 – no pain to 10 – pain as bad as could be
an_9_gcps_2 | In the past six months, how intense was your worst pain rated on a 0 to 10 scale where 0 is "no pain" and 10 is "pain as bad as could be"? | numeric | “.” | 0-10 | 0 – no pain to 10 – pain as bad as could be
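A codebook of this kind can at least be turned into a mechanical check. The following is a minimal sketch, assuming the ranges and missing-value tokens listed above and raw values that arrive as strings; the names `Field`, `CODEBOOK` and `check_record` are illustrative and not part of the dataset or of any tool mentioned in this seminar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    low: int
    high: int
    missing: Optional[str] = None  # token that marks a missing value, if any


# Ranges taken from the codebook; 'sex' and 'q3' are 0/1 coded.
CODEBOOK = {
    "id": Field("id", 5033, 6387),
    "age": Field("age", 14, 85),
    "sex": Field("sex", 0, 1),
    "q3": Field("q3", 0, 1),
    "an_8_gcps_1": Field("an_8_gcps_1", 0, 10, missing="."),
    "an_9_gcps_2": Field("an_9_gcps_2", 0, 10, missing="."),
}


def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one raw record."""
    problems = []
    for name, raw in record.items():
        field = CODEBOOK.get(name)
        if field is None:
            problems.append(f"{name}: not in codebook")
            continue
        if field.missing is not None and raw == field.missing:
            continue  # documented missing-value token, nothing to check
        try:
            value = int(raw)
        except (TypeError, ValueError):
            problems.append(f"{name}: {raw!r} is not numeric")
            continue
        if not field.low <= value <= field.high:
            problems.append(f"{name}: {value} outside [{field.low}, {field.high}]")
    return problems
```

A record can pass every one of these checks and still leave open what its values are about: the check confirms that `sex` is 0 or 1, not what that 1 stands for in reality.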
Documentation in SAS program
Example: assessing TMJ Anatomy
Sagittal and coronal MR images of a TMJ
©2003 by Radiological Society of North America
Sommer O J et al. Radiographics 2003;23:e14-e14
Radiology RDC/TMD Examination: data collection sheet
RDC/TMD: a collaborator’s data dictionary
– Fieldnames in that collaborator’s data collection
– Allowed values for the fields
Does anybody see something disturbing?
This data dictionary alone is not reliable!
That these variables are about the
condylar head of the TMJ is ‘lost in
translation’!
‘meaning’ of values in data collections
‘The patient with patient identifier ‘PtID4’ is
stated to have had a panoramic X-ray of the
mouth which is interpreted to show subcortical
sclerosis of that patient’s condylar head of the
right temporomandibular joint’
[Figure: an arrow labelled ‘meaning’ links the cell value 1 in the dataset to the statement above]
Ambiguities: are assertions about particulars or types?
• ‘Persistent idiopathic facial pain (PIFP)’
= ‘persistent facial pain with varying presentations …’
[Figure: under ‘types’, persistent facial pain with presentation type1, presentation type2 and presentation type3; under ‘particulars’, my pain, her pain and his pain, each shown at times t1, t2 and t3]
– if the description is about types, then the three particular pains
fall under PIFP.
– if the description is about (arbitrary) particulars, then only her
pain falls under PIFP.
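The two readings can be made concrete with a small sketch. The observations below are invented for illustration (they are not read off the slide's figure) and are chosen only so that they reproduce the two conclusions just stated; both function names are hypothetical.

```python
# Hypothetical observations: the presentation type each particular pain
# shows at times t1, t2 and t3 (values invented for illustration only).
OBSERVED = {
    "my pain": ["type1", "type1", "type1"],
    "her pain": ["type1", "type2", "type3"],
    "his pain": ["type3", "type3", "type3"],
}


def falls_under_pifp_type_reading(presentations):
    # Reading 1: 'varying presentations' describes the type PIFP, whose
    # instances differ from one another. Any persistent facial pain that
    # shows one of the presentation types qualifies.
    return len(presentations) > 0


def falls_under_pifp_particular_reading(presentations):
    # Reading 2: 'varying presentations' describes each particular pain,
    # so one and the same pain must itself change presentation over time.
    return len(set(presentations)) > 1


type_reading = [n for n, p in OBSERVED.items() if falls_under_pifp_type_reading(p)]
particular_reading = [n for n, p in OBSERVED.items() if falls_under_pifp_particular_reading(p)]
```

Under these assumed observations, `type_reading` contains all three pains and `particular_reading` contains only her pain, so the same textual description selects different sets of patients depending on how it is read.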
Separate knowledge from what it is about.
• ‘13.1.2.4 Painful trigeminal neuropathy
attributed to MS plaque’
• ‘attributed to’ relates to somebody’s opinion about
what is the case, not to what is the case.
– the mistake: a feature on the side of the clinician – his (not)
knowing - is taken to be a feature on the side of the patient.
• Similar mistakes:
– ‘Probable migraine’
– ‘facial pain of unknown origin’ (not in ICHD).
ICHD diagnostic criteria for PIFP
• Persistent idiopathic facial pain (PIFP):
A. Facial or oral pain for at least three months fulfilling criteria B-F
B. Pain occurs daily for more than 2 hours per day
C. Pain has the following features
1. Poorly localized, does not follow a peripheral nerve distribution.
2. Dull, aching, nagging
D. Clinical neurological examination is normal
E. Simple laboratory investigations including imaging of the face and jaws exclude dental cause.
F. Not better accounted for by another ICHD-III diagnosis.
http://ihs-classification.org (current version)
Criteria do not replace definitions
• ‘13.1.1.1 Classical trigeminal neuralgia, purely
paroxysmal’, has the criterion ‘at least three attacks of
facial pain fulfilling criteria B-E’.
• This does not mean: a patient with 2 such attacks does not
exhibit this type of neuralgia;
• It rather means: do not diagnose the patient (yet) as
exhibiting this type of neuralgia.
• If ‘chronic pain’ is defined as ‘pain lasting longer than three months’, at what point in time does a patient start to have that type of pain?
Intermediate conclusion
• Datasets have no value without appropriate
documentation,
• Accurate documentation is hard to come by,
• Even when documentation is accurate, it is hardly
machine processable.
• Question: would it be possible to construct self-explanatory datasets?
An old idea: Self-Identifying Data
Jeremy Bailey. A Self-Defining Hierarchical Data System. In: R. J. Hanisch, R. J. V. Brissenden, and J. Barnes, eds.
Astronomical Data Analysis Software and Systems II. ASP Conference Series, Vol. 52, 1993
What we need
– at least –
to resolve the issues
Ontological Realism
Referent Tracking
The basis of Ontological Realism
1. There is an external reality which
is ‘objectively’ the way it is;
2. That reality is accessible to us;
3. We build in our brains cognitive
representations of reality;
4. We communicate with others about what is there, and what we believe is there.
Smith B, Kusnierczyk W, Schober D, Ceusters W. Towards a Reference Terminology for Ontology Research and Development in the
Biomedical Domain. Proceedings of KR-MED 2006, Biomedical Ontology in Action, November 8, 2006, Baltimore MD, USA
Ontological Realism makes three crucial distinctions
1. Between data and what data are about;
2. Between continuants and occurrents;
3. Between what is generic and what is specific.
Smith B, Ceusters W. Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies.
Applied Ontology, 2010.
How does this painting illustrate
the distinction between data and
what they are about?
L3. Linguistic representations about (1), (2) or (3)
L2. Clinicians’ beliefs about (1)
L1. First Order Reality: entities (particular or generic) with objective existence which are not about anything
Ontological Realism makes crucial distinctions
• Between data and what data are about;
• Between continuants and occurrents:
– obvious differences:
• a person versus his life
• a disease versus its course
• space versus time
– more subtle differences:
• observation (data-element) versus observing
• diagnosis versus making a diagnosis
• message versus transmitting a message
BFO 2.0 continuants (April 1, 2013)
Independent continuant
  Material entity
    Object
    Object aggregate
    Fiat object part
  Immaterial entity
    Continuant fiat boundary
    Site
    Spatial region
Specifically dependent continuant
  Quality
    Relational quality
  Realizable entity
    Role (externally-grounded realizable entity)
    Disposition (internally-grounded realizable entity)
      Function
Generically dependent continuant
BFO 2.0 occurrents (April 1, 2013)
process
complete process
history
sectional process
process profile
process boundary
temporal region
zero-dimensional temporal region
one-dimensional temporal region
spatiotemporal region
Between ‘generic’ and ‘specific’
[Figure: a grid of three levels against two columns, Generic and Specific.
Levels: L3. Representation; L2. Beliefs (knowledge); L1. First-order reality.
Generic entries: DIAGNOSIS, INDICATION, PATHOLOGICAL STRUCTURE, DRUG, MIGRAINE, HEADACHE, PERSON, DISEASE, PAIN.
Specific entries: my EHR, my doctor’s work plan, my doctor’s diagnosis, my doctor’s computer, my doctor, me, my migraine, my headache.
Other labels: pain classification, EHR, Basic Formal Ontology, ICHD, Referent Tracking.]
The essential pieces
[Figure: the particulars ‘me’, ‘my life’ and ‘my 4D STR’, linked to the categories material object, dependent continuant (some quality), history, spatial region, temporal region and spacetime region. Relation labels: instanceOf, … at t, located-in at t, participantOf at t, occupies, projectsOn, projectsOn at t.]
Should be obvious for ontologists, but …
• Comments by ICBO 2013 reviewers:
(with self-claimed ‘high confidence’ in their expertise)
– There is a problem with the relation ‘X instance-of Y at time t’, because the time-index ‘t’ remains unclear.
• is the restriction of a continuant to a proper time segment of the life-time a continuant too?
• Is the restriction of a continuant to a time-point, a continuant again?
→ unclarity is in the eyes of those who look only through (OWL-)DL glasses
– This paper describes a mechanism for […] into relationships of the form: x r1 y r2 t, where x i[s a]n individual, y is a term, and t is a time.
→ confuses terms with what they are about
– … the uniqueness of the entities behind #1, #1, #3 can only be derived from the whole description.
→ never bothered to read any papers about this topic
Fundamental goals of Referent Tracking
• Who remembers?
Fundamental goals of Referent Tracking
explicit reference to the
concrete individual entities
relevant to the accurate
description of some
portion of reality, ...
Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records.
J Biomed Inform. 2006 Jun;39(3):362-78.
Method: numbers instead of words
– Introduce an Instance
Unique Identifier (IUI)
for each relevant
particular (individual)
entity
78
Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records.
J Biomed Inform. 2006 Jun;39(3):362-78.
Fundamental goals of ‘our’ Referent Tracking
Use these identifiers in expressions using a language that acknowledges the structure of reality:
e.g.: a yellow ball:
then not: yellow(#1) and ball(#1)
rather: #1: the ball (Indep. cont.) #2: #1’s yellow (Quality)
Then still not:
ball(#1) and yellow(#2) and hascolor(#1, #2)
but rather:
instance-of(#1, ball, since t1)
instance-of(#2, yellow, since t2)
inheres-in(#2, #1, since t2)
Strong foundations in realism-based ontology
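The yellow-ball example can be sketched as data. This is a minimal illustration, not the Referent Tracking system's actual data model: the class names, the `new_iui` helper and the convention that the dependent entity comes first in `InheresIn` are assumptions made here.

```python
from dataclasses import dataclass
from itertools import count

_next_iui = count(1)


def new_iui() -> str:
    """Mint a fresh Instance Unique Identifier; identifiers are never reused."""
    return f"#{next(_next_iui)}"


@dataclass(frozen=True)
class InstanceOf:
    particular: str  # IUI of a particular
    universal: str   # name of a universal taken from some ontology
    since: str       # time index, needed because the particular is a continuant


@dataclass(frozen=True)
class InheresIn:
    dependent: str   # IUI of the quality (assumed argument order)
    bearer: str      # IUI of the entity that bears the quality
    since: str


ball = new_iui()     # the ball itself (independent continuant)
yellow = new_iui()   # that ball's yellow (quality), a distinct particular

statements = [
    InstanceOf(ball, "ball", since="t1"),
    InstanceOf(yellow, "yellow", since="t2"),
    InheresIn(yellow, ball, since="t2"),
]
```

The point of the design is that the ball and its colour each get their own identifier, so a later statement that the ball has faded can refer to the very same quality rather than to the word ‘yellow’.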
The shift envisioned
• From:
– ‘this man is a 40 year old patient with molar caries’
• To (something like):
– ‘this-1 on which depend this-2 and this-3 has this-4’, where
• this-1 instanceOf human being …
• this-2 instanceOf age-of-40-years …
• this-2 qualityOf this-1 …
• this-3 instanceOf patient-role …
• this-3 roleOf this-1 …
• this-4 instanceOf caries …
• this-4 partOf this-5 …
• this-5 instanceOf molar …
• this-5 partOf this-1 …
• …
• this-1 … this-5 (first position in each statement): denotators for particulars
• instanceOf, qualityOf, roleOf, partOf (second position): denotators for appropriate relations
• human being, age-of-40-years, patient-role, caries, molar, this-1, this-5 (third position): denotators for universals or particulars
• the trailing ‘…’ of each statement: time stamp in case of continuants
Relevance: the way RT-compatible EHRs ought to interact with
representations of generic portions of reality
[Figure labels: instance-of at t; caused by; #105]
Should be obvious for ontologists, but …
• Comments by ICBO 2013 reviewers:
(with self-claimed ‘high confidence’ in their expertise)
– The basic idea is to introduce unique identifiers for denoting the real world entities. This notion is very similar to the URI (uniform resource identifier), used in RDF and the semantic web.
→ Rather just a bit similar, but WITH a more precise semantics
– All together we achieve a sentence F(#1,#2,#3) with three constants, denoting real world entities. Since a direct link between #i and a real entity does not solve the problem of uniqueness, the uniqueness of the entities behind #1, #1, #3 can only be derived from the whole description.
Should be obvious for ontologists, but …
• The idea of denoting entities by UII is trivial, and is already used
within the framework of RDF and the semantic web.
• However: for RDF
– No distinction between classes and instances (individuals)
<Species, type, Class>
<Lion, type, Species>
<Leo, type, Lion>
– Properties can themselves have properties
<hasDaughter, subPropertyOf, hasChild>
<hasDaughter, type, familyProperty>
– No distinction between language constructors and ontology vocabulary, so constructors can be applied to themselves/each other
<type, range, Class>
<Property, type, Class>
<type, subPropertyOf, subClassOf>
• → No ontological foundations
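The first of these points can be checked mechanically. The sketch below holds the slide's three `type` triples as plain Python tuples (no RDF library is used, and the helper names are illustrative) and looks for names that are treated as an instance in one triple and as a class in another.

```python
# The triples from the slide, as plain (subject, predicate, object) tuples.
TRIPLES = [
    ("Species", "type", "Class"),
    ("Lion", "type", "Species"),
    ("Leo", "type", "Lion"),
]


def used_as_instance(triples):
    """Names that appear as the subject of a 'type' triple."""
    return {s for s, p, o in triples if p == "type"}


def used_as_class(triples):
    """Names that appear as the object of a 'type' triple."""
    return {o for s, p, o in triples if p == "type"}


# Names on both sides of 'type': an instance in one triple, a class in another.
both = used_as_instance(TRIPLES) & used_as_class(TRIPLES)
```

Here `both` holds `Lion` and `Species`. Nothing in the triples themselves says whether `Leo`, `Lion` and `Species` stand for a particular animal, a universal, or a grouping of universals; that is the information a realism-based representation makes explicit.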
Towards self-explanatory datasets
in OPMQoL
Specific Aims of the OPMQoL Project
1. describe the portions of reality covered by the five datasets by
means of a realism-based ontology (OPMQoL),
2. design bridging axioms required to express the data dictionaries of the datasets in terms of the OPMQoL and translate these axioms into the query languages used by the underlying databases,
3. validate OPMQoL by querying the datasets with and without using the ontology and by comparing the results as a function of the clinical question identified,
4. document the development and validation approach in a way that
other groups can re-use and expand OPMQoL, and use our
approach in other domains.
Considered datasets
• ‘US Dataset’ (724 patients) resulted from the NIH funded
RDC/TMD Validation Project,
• ‘Hadassah Dataset’ (306 patients) from the Orofacial Pain
Clinic at the Faculty of Dentistry, Hadassah,
• ‘German Dataset’ (416 patients) of patients seeking treatment
for orofacial pain at the Department of Prosthodontics and
Materials Sciences, University of Leipzig,
• ‘Swedish Dataset’ of 46 consecutive Atypical Odontalgia (AO) patients recruited from 4 orofacial pain clinics in Sweden, as well as data about age- and gender-matched control patients, 35 of whom were pain-free and 41 of whom were TMD patients,
• ‘UK Dataset’ (168 patients) of facial pain of non-dental origin present for a minimum of three months.
Linking the instruments and other tools
• analyze data dictionaries, assessment instruments,
study criteria and corresponding terminologies,
• build realism-based application ontologies to link
these sources to realism-based reference
ontologies.
[Figure: class diagram with three parts: Terminology component, Data component, Ontology component.
Classes shown: terminology, term, concept, data dictionary, data collection, measurement datum, representational artifact, assessment instrument, assessment instrument ontology, data collection ontology, application ontology, reference ontology, ontology, bridging axiom, entity, denotator.
Associations shown (with multiplicities such as 1, 0..1, 0..* and 1..*): uses / used-for, part-of / has-part, expresses / expressed-by, broader / narrower, corresponds-to, used-in, means, denotes / denoted by.]
Mapping assessment instrument terms, ontology and patient cases
Objectives of the ‘sources’ analysis
• Find for each value V in the data collections all
possible configurations of entities (according to
our best scientific understanding) for which the
following can be true:
– V
– ‘it is stated that V’
• Describe these possible configurations by means
of sentences from a formal language that mimic
the structure of reality.
Objectives of the ‘sources’ analysis (2)
• For example,
– for the value stating that ‘The patient with patient
identifier ‘PtID4’ has had a panoramic X-ray of the
mouth which is interpreted to show subcortical
sclerosis of that patient’s condylar head of the right
temporomandibular joint’ to be true,
– this statement must have been made,
– for the statement to be true, there must have been that
patient, an X-ray, etc, …
– BUT! It is not necessarily true that that patient has
indeed the sclerosis as diagnosed.
Methodology (1): for the 1st order reality
1. Formulate for each variable in the data collection a
sentence explaining as accurately as possible what the
variable stands for,
2. list the entities in reality that the terms in the sentence
denote,
3. list recursively for all entities listed further entities that
ontologically must exist for the entity under scrutiny to
exist,
4. classify all entities in terms of realism-based ontologies
(RBO),
5. specify all obtaining relationships between these entities,
6. outline all possible configurations of such entities for the
sentence to be true.
Step 1: formulate a statement
‘The patient with patient identifier ‘PtID4’ is
stated to have had a panoramic X-ray of the
mouth which is interpreted to show subcortical
sclerosis of that patient’s condylar head of the
right temporomandibular joint’
[Figure: an arrow labelled ‘meaning’ links the cell value 1 in the dataset to the statement above]
Step 2 (1): list the entities denoted
• ‘1(The patient) with 2(patient identifier ‘PtID4’) 3(is stated) 4(to have had) a 5(panoramic X-ray) of 6(the mouth) which 7(is interpreted) to 8(show) 9(subcortical sclerosis of 10(that patient’s condylar head of the 11(right temporomandibular joint)))’
CLASS | INSTANCE IDENTIFIER
person | IUI-1
patient identifier | IUI-2
assertion | IUI-3
technically investigating | IUI-4
panoramic X-ray | IUI-5
mouth | IUI-6
interpreting | IUI-7
seeing | IUI-8
diagnosis | IUI-9
condylar head of right TMJ | IUI-10
right TMJ | IUI-11
notes:
– colors have no meaning here, just provide easy reference,
– this first list can be different, any such differences being resolved in step 3
Step 2 (2): provide directly referential descriptions
CLASS | INSTANCE IDENTIFIER | DIRECTLY REFERENTIAL DESCRIPTION
person | IUI-1 | the person to whom IUI-2 is assigned
patient identifier | IUI-2 | the patient identifier of IUI-1
assertion | IUI-3 | 'the patient with patient identifier PtID4 has had a panoramic X-ray of the mouth which is interpreted to show subcortical sclerosis of that patient’s right temporomandibular joint'
technically investigating | IUI-4 | the technically investigating of IUI-6
panoramic X-ray | IUI-5 | the panoramic X-ray that resulted from IUI-4
mouth | IUI-6 | the mouth of IUI-1
interpreting | IUI-7 | the interpreting of the signs exhibited by IUI-5
seeing | IUI-8 | the seeing of IUI-5 which led to IUI-7
diagnosis | IUI-9 | the diagnosis expressed by means of IUI-3
condylar head of right TMJ | IUI-10 | the condylar head of the right TMJ of IUI-1
right TMJ | IUI-11 | the right TMJ of IUI-1
Step 3: identify further entities that ontologically must
exist for each entity under scrutiny to exist.
CLASS | INSTANCE IDENTIFIER | DIRECTLY REFERENTIAL DESCRIPTION
assigner role | IUI-12 | the assigner role played by the entity while it performed IUI-21
assigning | IUI-21 | the assigning of IUI-2 to IUI-1 by the entity with role IUI-12
asserting | IUI-20 | the asserting of IUI-3 by the entity with asserter role IUI-13
asserter role | IUI-13 | the asserter role played by the entity while it performed IUI-20
investigator role | IUI-14 | the investigator role played by the entity while it performed IUI-4
panoramic X-ray machine | IUI-15 | the panoramic X-ray machine used for performing IUI-4
image bearer | IUI-16 | the image bearer in which IUI-5 is concretized and that participated in IUI-8
interpreter role | IUI-17 | the interpreter role played by the entity while it performed IUI-7
perceptor role | IUI-18 | the perceptor role played by the entity while it performed IUI-8
diagnostic criteria | IUI-19 | the diagnostic criteria used by the entity that performed IUI-7 to come to IUI-9
study subject role | IUI-22 | the study subject role which inheres in IUI-1
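The recursive character of step 3 can be sketched as a closure over a ‘requires’ table. The table below is hypothetical: its entries are loosely modelled on the entities listed above, but which entity presupposes which is an assumption made for illustration, and `required_entities` is not a function of any existing tool.

```python
# Hypothetical 'requires' table: for a class of entity, the further classes
# of entity that must exist for an instance of it to exist.
REQUIRES = {
    "patient identifier": ["assigning"],
    "assigning": ["assigner role"],
    "assertion": ["asserting"],
    "asserting": ["asserter role"],
    "panoramic X-ray": ["technically investigating", "image bearer"],
    "technically investigating": ["investigator role", "panoramic X-ray machine"],
}


def required_entities(start, requires=REQUIRES):
    """Return every class reachable from `start`, in breadth-first order."""
    found = []
    queue = list(start)
    seen = set(start)
    while queue:
        current = queue.pop(0)
        for needed in requires.get(current, []):
            if needed not in seen:  # list each entity once only
                seen.add(needed)
                found.append(needed)
                queue.append(needed)
    return found
```

For example, `required_entities(["patient identifier", "assertion"])` yields the assigning and asserting processes first and then the roles those processes presuppose. Tracking what has already been seen mirrors the requirement that the same entity never receives two IUIs.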
Step 3: some remarks
• interpreter role, perceptor role, …
– reference to roles rather than the entity in which the
roles inhere because it may be the same entity and one
should not assign several IUIs to the same entity
• each description follows similar principles as
Aristotelian definitions but is about particulars
rather than universals
Step 4: classify all entities in terms of realism-based ontologies
CLASS | HIGHER CLASS
person | BFO: Object
patient identifier | IAO: Information Content Entity
assertion | IAO: Information Content Entity
technically investigating | OBI: Assay
panoramic X-ray | IAO: Image
mouth | FMA: Mouth
interpreting | MFO: Assessing
seeing | BFO: Process
diagnosis | IAO: Information Content Entity
condylar head of right TMJ | FMA: Right condylar process of mandible
right TMJ | FMA: Right temporomandibular joint
assigner role | BFO: Role
assigning | BFO: Process
study subject role | OBI: Study subject role
• requires more ontological and philosophical skills than domain expertise or expertise with Protégé,
• not just term matching
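Once the classification has been made by hand, it can be stored as a lookup so that gaps are easy to spot. The sketch below copies the CLASS and HIGHER CLASS pairs from the table above; the two helper functions are illustrative additions, and the lookup does not replace the ontological analysis that produced it.

```python
# CLASS -> HIGHER CLASS pairs copied from the table above.
HIGHER_CLASS = {
    "person": "BFO: Object",
    "patient identifier": "IAO: Information Content Entity",
    "assertion": "IAO: Information Content Entity",
    "technically investigating": "OBI: Assay",
    "panoramic X-ray": "IAO: Image",
    "mouth": "FMA: Mouth",
    "interpreting": "MFO: Assessing",
    "seeing": "BFO: Process",
    "diagnosis": "IAO: Information Content Entity",
    "condylar head of right TMJ": "FMA: Right condylar process of mandible",
    "right TMJ": "FMA: Right temporomandibular joint",
    "assigner role": "BFO: Role",
    "assigning": "BFO: Process",
    "study subject role": "OBI: Study subject role",
}


def ontologies_used(mapping):
    """Sorted prefixes (BFO, FMA, ...) of the ontologies the mapping draws on."""
    return sorted({target.split(":")[0] for target in mapping.values()})


def unclassified(classes, mapping):
    """Classes from steps 2 and 3 that have no higher class assigned yet."""
    return [c for c in classes if c not in mapping]
```

Calling `unclassified(["person", "asserter role"], HIGHER_CLASS)` flags `asserter role`, which appears in step 3 but not in the table shown on this slide.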
Step 5: specify relationships between these entities
• For instance:
– at least during the taking of the X-ray the study subject
role inheres in the patient being investigated:
• IUI-22 inheres-in IUI-1 during t1
– the patient participates at that time in the investigation
• IUI-4 has-participant IUI-1 during t1
• These relations need to follow the principles of the
Relation Ontology.
Smith B, Ceusters W, Klagges B, Koehler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector A, Rosse C.
Relations in biomedical ontologies, Genome Biology 2005, 6:R46.
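A minimal sketch of how relational expressions of this form could be held as structured data. The Relation type, the parse function and the regular expression are hypothetical illustrations that assume the 'subject relation target during time' layout used on the slide; they are not part of any Referent Tracking software.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str    # IUI of the first relatum
    predicate: str  # relation name as written on the slide
    target: str     # IUI of the second relatum
    time: str       # temporal index, e.g. "t1"


# Assumed surface form: "<IUI> <relation> <IUI> during <time>"
_PATTERN = re.compile(r"^(IUI-\d+) ([a-z-]+) (IUI-\d+) during (\w+)$")


def parse(expression: str) -> Relation:
    """Turn one relational expression into a Relation tuple."""
    match = _PATTERN.match(expression.strip())
    if match is None:
        raise ValueError(f"not a relational expression: {expression!r}")
    return Relation(*match.groups())
```

For example, parse("IUI-4 has-participant IUI-1 during t1") yields a tuple whose predicate is "has-participant" and whose time index is "t1".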
Step 6: outline all possible configurations of such entities for the sentence to be true (a one-semester course on its own)
• Such outlines are collections of relational expressions of the sort just described.
• Variant configurations for the example:
– perceptor and interpreter are the same or distinct human beings,
– the X-ray machine is unreliable and produced artifacts which the interpreter took to be signs motivating his diagnosis, while the patient does indeed have the disorder specified by the diagnosis (the clinician was lucky),
– …
Methodology (2): for each dataset
• Build a formal template which describes:
– the results of steps 4-6 of the 1st order analysis,
– the relationships between:
• the 1st order entities and the corresponding data items in the
data set,
• data items themselves.
• Build a prototype able to generate on the basis of
the template for each subject (patient) in the
dataset an RT-compatible representation of his 1st
and 2nd order entities.
The template
Partial Template for 3 variables (in the ‘German’ dataset)
RN  Var  RT  REF                        Min    Max    Val
1        IM  patient_study_record
2   id   LV  patient_identifier
3   id   IM  patient
4   sex  CV  gender
5   sex  CV  male                                     0
6   sex  CV  female                                   1
7   sex  UA  sex                        BLANK  BLANK
8   q3   CV  no_pain_in_lower_face                    0
9   q3   CV  pain_in_lower_face                       1
10  q3   IM  in_the_past_month
11  q3   IM  lower_face
12  q3   IM  time_of_q3_concretization
13  q3   RP  an_8_gcps_1                0      0      0
14  q3   UP  an_8_gcps_1                1      10     0
15  q3   UA  an_8_gcps_1                BLANK  BLANK  1
16  q3   JA  an_8_gcps_1                BLANK  BLANK  0
3 variables in the ‘German’ dataset
RN  Var  RT  REF                        Min    Max    Val
1        IM  patient_study_record
2   id   LV  patient_identifier
3   id   IM  patient
4   sex  CV  gender
5   sex  CV  male                                     0
6   sex  CV  female                                   1
7   sex  UA  sex                        BLANK  BLANK
8   q3   CV  no_pain_in_lower_face                    0
9   q3   CV  pain_in_lower_face                       1
10  q3   IM  in_the_past_month
11  q3   IM  lower_face
12  q3   IM  time_of_q3_concretization
13  q3   RP  an_8_gcps_1                0      0      0
14  q3   UP  an_8_gcps_1                1      10     0
15  q3   UA  an_8_gcps_1                BLANK  BLANK  1
16  q3   JA  an_8_gcps_1                BLANK  BLANK  0
• q3: answer to the question: ‘Have you had pain in the face, jaw, temple, in front of the ear or in the ear in the past month?’
• an_8_gcps_1: answer to the question: ‘How would you rate your facial pain on a 0 to 10 scale at the present time, that is right now, where 0 is "no pain" and 10 is "pain as bad as could be"?’
Record Types in the template
RN  Var  RT  REF                        Min    Max    Val
1        IM  patient_study_record
2   id   LV  patient_identifier
3   id   IM  patient
4   sex  CV  gender
5   sex  CV  male                                     0
6   sex  CV  female                                   1
7   sex  UA  sex                        BLANK  BLANK
8   q3   CV  no_pain_in_lower_face                    0
9   q3   CV  pain_in_lower_face                       1
10  q3   IM  in_the_past_month
11  q3   IM  lower_face
12  q3   IM  time_of_q3_concretization
13  q3   RP  an_8_gcps_1                0      0      0
14  q3   UP  an_8_gcps_1                1      10     0
15  q3   UA  an_8_gcps_1                BLANK  BLANK  1
16  q3   JA  an_8_gcps_1                BLANK  BLANK  0
Record types (RT):
• LV: Literal Value
• CV: Coded Value
• IM: Implicit
• JA: Justified Absence
• UA: Unjustified Absence
• UP: Unjustified Presence
• RP: Redundant Presence
Condition-based xA/xP determination
RN  Var  RT  REF          Min    Max    Val
7   sex  UA  sex          BLANK  BLANK
13  q3   RP  an_8_gcps_1  0      0      0
14  q3   UP  an_8_gcps_1  1      10     0
15  q3   UA  an_8_gcps_1  BLANK  BLANK  1
16  q3   JA  an_8_gcps_1  BLANK  BLANK  0
If
the value of REF is either outside the range of Min/Max or ‘BLANK’
and
the value for Var is as indicated by Val, including no value at all,
then
the presence or absence of the corresponding data item is of a sort indicated by RT.
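The five condition rows can be read as a small rule table. The sketch below is one possible reading, not the project's implementation. It assumes that a rule fires when the REF variable's value lies within Min/Max (or is missing when Min/Max are BLANK); this is how rows 13 and 14 appear to make sense (a rating of 0 after a "no pain" answer is redundant, a rating of 1 to 10 is unjustified), although the slide's wording says "outside the range". Inverting the comparison in ref_matches gives the literal wording. All names (Rule, RULES, ref_matches, classify) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

BLANK = None  # stands for the template's BLANK, i.e. no value recorded


@dataclass(frozen=True)
class Rule:
    rn: int
    var: str
    rt: str
    ref: str
    lo: Optional[int]   # Min column
    hi: Optional[int]   # Max column
    val: Optional[int]  # Val column (BLANK = no value at all)


# Rows 7 and 13-16 of the template
RULES = [
    Rule(7, "sex", "UA", "sex", BLANK, BLANK, BLANK),
    Rule(13, "q3", "RP", "an_8_gcps_1", 0, 0, 0),
    Rule(14, "q3", "UP", "an_8_gcps_1", 1, 10, 0),
    Rule(15, "q3", "UA", "an_8_gcps_1", BLANK, BLANK, 1),
    Rule(16, "q3", "JA", "an_8_gcps_1", BLANK, BLANK, 0),
]


def ref_matches(rule, ref_value):
    """ASSUMPTION: Min/Max delimit the values that trigger the rule."""
    if rule.lo is BLANK:
        return ref_value is BLANK
    return ref_value is not BLANK and rule.lo <= ref_value <= rule.hi


def classify(record):
    """Return (RN, RT) for every rule whose two conditions hold for one record."""
    return [
        (rule.rn, rule.rt)
        for rule in RULES
        if ref_matches(rule, record.get(rule.ref))
        and record.get(rule.var) == rule.val
    ]
```

Under this reading, a record with q3 = 0 and an_8_gcps_1 = 7 is flagged as row 14 (UP), and a record with q3 = 1 and no an_8_gcps_1 value as row 15 (UA).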
Conditional selection of descriptions
RT compatible part of the template
RN  IUI(P)    P-Type           P-Rel        P-Targ   Trel  Time
1   #psrec-   DATASET-RECORD                         at    t
2   #pid-     DENOTATOR        denotes      #pat-    at    t
3   #pat-     PATIENT                                at    t
4   #patg-    GENDER           inheres-in   #pat-    at    t
5   #patg-    MALE-GENDER      inheres-in   #pat-    at    t
6   #patg-    FEMALE-GENDER    inheres-in   #pat-    at    t
7   #patgL-   UNDERSPEC-ICE                          at    t
8   #pat-                      lacks-pcp    PAIN     at    #tq3-
9   #pq3-     PAIN             participant  #pat-    at    #tq3-
10  #tq3-     MONTH-PERIOD
11  #patlf-   LOWER-FACE       part-of      #pat-    at    t
12  #cq3-     TIME-PERIOD      after        #tq3-
13  #q3L-                      corresp-w    #q3L0-   at    t
14  #q3L-     DISINFORMATION                         at    t
15  #q3L-     UNDERSPEC-ICE                          at    t
16  #q3L-     J-BLANK-ICE                            at    t
IUI(L) column (row alignment not recoverable): #pidL-, #patL-, #patgL-; #q3L0-, #q3L1-; #q3L-, #q3L-, #q3L-, #q3L-
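A minimal sketch of how a prototype might instantiate template rows for one subject. It assumes, and this is not stated on the slide, that tokens of the form '#...-' are IUI stems to which a subject identifier is appended; TemplateRow, instantiate and generate are hypothetical names, and only four rows of the template are included.

```python
from typing import NamedTuple, Optional


class TemplateRow(NamedTuple):
    rn: int
    iui_p: str
    p_type: Optional[str]
    p_rel: Optional[str]
    p_targ: Optional[str]
    trel: Optional[str]
    time: Optional[str]


# Rows 2, 3, 4 and 9 of the RT compatible part of the template
ROWS = [
    TemplateRow(2, "#pid-", "DENOTATOR", "denotes", "#pat-", "at", "t"),
    TemplateRow(3, "#pat-", "PATIENT", None, None, "at", "t"),
    TemplateRow(4, "#patg-", "GENDER", "inheres-in", "#pat-", "at", "t"),
    TemplateRow(9, "#pq3-", "PAIN", "participant", "#pat-", "at", "#tq3-"),
]


def instantiate(token, subject_id):
    """ASSUMPTION: '#...-' tokens are stems completed by the subject id."""
    if token is not None and token.startswith("#") and token.endswith("-"):
        return token + subject_id
    return token


def generate(rows, subject_id):
    """Produce the subject-specific rows from the template rows."""
    return [
        TemplateRow(r.rn, *(instantiate(x, subject_id) for x in r[1:]))
        for r in rows
    ]
```

Calling generate(ROWS, "0042") would, under that assumption, turn row 2 into '#pid-0042 DENOTATOR denotes #pat-0042 at t', so that all rows generated for the same subject share the same completed identifiers.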
RT compatible part of the template
RN  IUI(P)    P-Type           P-Rel        P-Targ   Trel  Time
1   #psrec-   DATASET-RECORD                         at    t
2   #pid-     DENOTATOR        denotes      #pat-    at    t
3   #pat-     PATIENT                                at    t
4   #patg-    GENDER           inheres-in   #pat-    at    t
5   #patg-    MALE-GENDER      inheres-in   #pat-    at    t
6   #patg-    FEMALE-GENDER    inheres-in   #pat-    at    t
7   #patgL-   UNDERSPEC-ICE                          at    t
8   #pat-                      lacks-pcp    PAIN     at    #tq3-
9   #pq3-     PAIN             participant  #pat-    at    #tq3-
10  #tq3-     MONTH-PERIOD
11  #patlf-   LOWER-FACE       part-of      #pat-    at    t
12  #cq3-     TIME-PERIOD      after        #tq3-
13  #q3L-                      corresp-w    #q3L0-   at    t
14  #q3L-     DISINFORMATION                         at    t
15  #q3L-     UNDERSPEC-ICE                          at    t
16  #q3L-     J-BLANK-ICE                            at    t
IUI(L) column (row alignment not recoverable): #pidL-, #patL-, #patgL-; #q3L0-, #q3L1-; #q3L-, #q3L-, #q3L-, #q3L-
• #patg-: denotes (when instantiated) the gender of the patient
• #patgL-: denotes (when instantiated) the data item concretized in the dataset in relation to the gender of the patient
Work in progress: IAO (?) related types
• UNDERSPECIFIED-ICE
– ICE which describes a portion of reality at
determinable rather than determinate level
• DISINFORMATION
– GDC which provides erroneous information
• J-BLANK-ICE
– GDC which conveys there should not be an ICE
concretized.
Acknowledgement
The work described is funded in part by
grant 1R01DE021917-01A1
from the National Institute of Dental and
Craniofacial Research (NIDCR). The content of
this presentation is solely the responsibility of the
author and does not necessarily represent the
official views of the NIDCR or the National
Institutes of Health.