Transcript Document
Institute for Healthcare Informatics R T U New York State Center of Excellence in Bioinformatics & Life Sciences

Ontology Seminar
A common framework for representing data and what they are about
April 1, 2013 – 106 Jacobs Hall, North Campus, University at Buffalo, 4-6pm

Werner CEUSTERS, MD; Will Hsu
Ontology Research Group, Center of Excellence in Bioinformatics and Life Sciences; Institute for Healthcare Informatics; Department of Psychiatry; Neuroscience Program; School of Medicine and Biomedical Sciences
University at Buffalo, NY, USA
http://www.org.buffalo.edu/RTU

Overview
• Data versus what they are about
• Issues with data documentation and data quality tools
• What we need – at least – to resolve the issues
  – Ontological Realism
  – Referent Tracking
• A methodology explored in the OPMQoL project.

A crucial distinction: data and what they are about
[Diagram labels: First-Order Reality; observation & measurement; data organization; model development; Representation; Generic beliefs; application; use; outcome; add; verify; further R&D; Δ = (instrument and study optimization)]

A non-trivial relation
[Diagram labels: Referents; References]

For instance: meaning and impact of changes
• Are differences in data about the same entities in reality at different points in time due to:
  – changes in first-order reality?
  – changes in our understanding of reality?
  – inaccurate observations?
  – registration mistakes?

Ceusters W, Smith B. A Realism-Based Approach to the Evolution of Biomedical Ontologies.
AMIA 2006 Proceedings, Washington DC, 2006:121-125. http://www.referent-tracking.com/RTU/sendfile/?file=CeustersAMIA2006FINAL.pdf

What makes it non-trivial?
• Referents
  – are (meta-)physically the way they are,
  – relate to each other in an objective way,
  – follow ‘laws of nature’.
• Window on reality restricted by:
  – what is physically and technically observable,
  – faithfulness of the ontology used,
  – fit between ontological commitments and computational views.
• References
  – follow, ideally, the syntactic-semantic conventions of some representation language,
  – are restricted by the expressivity of that language,
  – reference collections come, for correct interpretation, with documentation outside the representation.

Standard DBMS architectures are inward looking
A. Silberschatz, H.F. Korth, S. Sudarshan. Database System Concepts. McGraw-Hill. ISBN 0-07-352332-1
http://mrbool.com/architecture-of-a-dbms/25785

A colleague shares his research data set

A closer look
• What are you going to ask him right away?
• What do these various values stand for and how do they relate to each other?
  – Might this mean that patient #5057 had sex only once, at the age of 39?
Documenting datasets
Sources for data generation and data organization:
• Data collection sheets
• Instruction manuals
• Interpretation criteria
• Diagnostic criteria
• Assessment instruments
• Terminologies
• Data validation procedures
• Data dictionaries
• Ontologies
If not used for data collection and organization, these sources can be used post hoc to document, and perhaps increase, the level of clarity and faithfulness in, and the comparability of, existing data collections.

Issues with data documentation and data quality tools

The dataset’s data dictionary (codebook)
Field Name | Description | Type | Missing Value | Range | Coding Values
id | Subject id | numeric | none | [5033,6387] |
age | Subject’s age | numeric | none | [14,85] | Age in years
sex | Subject’s gender | 0/1 | none | | 0 – male, 1 – female
q3 | Have you had pain in the face, jaw, temple, in front of the ear or in the ear in the past month? | 0/1 | none | | 0 – no, 1 – yes
an_8_gcps_1 | How would you rate your facial pain on a 0 to 10 scale at the present time, that is right now, where 0 is "no pain" and 10 is "pain as bad as could be"? | numeric | “.” | 0-10 | 0 – no pain to 10 – pain as bad as could be
an_9_gcps_2 | In the past six months, how intense was your worst pain rated on a 0 to 10 scale where 0 is "no pain" and 10 is "pain as bad as could be"? | numeric | “.” | 0-10 | 0 – no pain to 10 – pain as bad as could be

Documentation in SAS program

Example: assessing TMJ Anatomy

Sagittal and coronal MR images of a TMJ
©2003 by Radiological Society of North America. Sommer O J et al. Radiographics 2003;23:e14-e14

Radiology RDC/TMD Examination: data collection sheet

RDC/TMD: a collaborator’s data dictionary
[Figure callouts: Fieldnames in that collaborator’s data collection; Allowed values for the fields]

Does anybody see something disturbing?

This data dictionary alone is not reliable!
That these variables are about the condylar head of the TMJ is ‘lost in translation’!
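A codebook such as the one shown for the pain dataset can at least be made machine-checkable. The sketch below is a minimal illustration, not part of the original slides: the `Field` class, the `CODEBOOK` dictionary and the `check_record` helper are hypothetical names, while the field names, ranges and the "." missing-value marker are taken from the codebook slide. Note that a check like this only tests values against the documented ranges; it cannot tell what the values are about, which is exactly the information that gets ‘lost in translation’.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """One codebook entry (hypothetical structure, for illustration only)."""
    description: str
    low: int
    high: int
    missing: Optional[str] = None  # documented missing-value marker, if any


# Ranges and codings as listed on the codebook slide.
CODEBOOK = {
    "id": Field("Subject id", 5033, 6387),
    "age": Field("Subject's age in years", 14, 85),
    "sex": Field("Subject's gender (0 = male, 1 = female)", 0, 1),
    "q3": Field("Pain in face, jaw, temple or ear in the past month (0 = no, 1 = yes)", 0, 1),
    "an_8_gcps_1": Field("Facial pain right now, 0-10", 0, 10, missing="."),
    "an_9_gcps_2": Field("Worst pain in the past six months, 0-10", 0, 10, missing="."),
}


def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, raw in record.items():
        field = CODEBOOK.get(name)
        if field is None:
            problems.append(f"{name}: not in codebook")
            continue
        if field.missing is not None and raw == field.missing:
            continue  # documented missing value, nothing to check
        try:
            value = int(raw)
        except (TypeError, ValueError):
            problems.append(f"{name}: {raw!r} is not numeric")
            continue
        if not field.low <= value <= field.high:
            problems.append(f"{name}: {value} outside [{field.low}, {field.high}]")
    return problems


# Made-up example records; the values are not taken from any real dataset.
print(check_record({"id": "5057", "age": "39", "sex": "1", "an_8_gcps_1": "."}))
print(check_record({"age": "120", "sex": "."}))
```

A record that passes every check can still be uninterpretable: nothing in the codebook says which anatomical structure, which assessment, or whose statement a value refers to.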
‘meaning’ of values in data collections
‘The patient with patient identifier ‘PtID4’ is stated to have had a panoramic X-ray of the mouth which is interpreted to show subcortical sclerosis of that patient’s condylar head of the right temporomandibular joint’
[Diagram labels: meaning; 1]

Ambiguities: are assertions about particulars or types?
• ‘Persistent idiopathic facial pain (PIFP)’ = ‘persistent facial pain with varying presentations …’
[Diagram: types (persistent facial pain; presentation type1, presentation type2, presentation type3) shown above particulars (my pain, her pain, his pain), each particular marked at times t1, t2, t3]
  – if the description is about types, then the three particular pains fall under PIFP.
  – if the description is about (arbitrary) particulars, then only her pain falls under PIFP.

Separate knowledge from what it is about.
• ‘13.1.2.4 Painful trigeminal neuropathy attributed to MS plaque’
• ‘attributed to’ relates to somebody’s opinion about what is the case, not to what is the case.
  – the mistake: a feature on the side of the clinician – his (not) knowing – is taken to be a feature on the side of the patient.
• Similar mistakes:
  – ‘Probable migraine’
  – ‘facial pain of unknown origin’ (not in ICHD).

ICHD diagnostic criteria for PIFP
• Persistent idiopathic facial pain (PIFP):
  A. Facial or oral pain for at least three months fulfilling criteria B-F
  B. Pain occurs daily for more than 2 hours per day
  C. Pain has the following features
     1. Poorly localized, does not follow a peripheral nerve distribution.
     2. Dull, aching, nagging
  D. Clinical neurological examination is normal
  E. Simple laboratory investigations including imaging of the face and jaws exclude dental cause.
  F. Not better accounted for by another ICHD-III diagnosis.
http://ihs-classification.org (current version)

Criteria do not replace definitions
• ‘13.1.1.1 Classical trigeminal neuralgia, purely paroxysmal’ has the criterion ‘at least three attacks of facial pain fulfilling criteria B-E’.
• This does not mean: a patient with 2 such attacks does not exhibit this type of neuralgia;
• It rather means: do not diagnose the patient (yet) as exhibiting this type of neuralgia.
• If ‘chronic pain’ is defined as ‘pain lasting longer than three months’, at what point in time does a patient start to have that type of pain?

Intermediate conclusion
• Datasets have no value without appropriate documentation,
• Accurate documentation is hard to come by,
• Even when documentation is accurate, it is hardly machine processable.
• Question: would it be possible to construct self-explanatory datasets?

An old idea: Self-Identifying Data
Jeremy Bailey.
A Self-Defining Hierarchical Data System. In: R. J. Hanisch, R. J. V. Brissenden, and J. Barnes, eds. Astronomical Data Analysis Software and Systems II. ASP Conference Series, Vol. 52, 1993

What we need – at least – to resolve the issues
• Ontological Realism
• Referent Tracking

The basis of Ontological Realism
1. There is an external reality which is ‘objectively’ the way it is;
2. That reality is accessible to us;
3. We build in our brains cognitive representations of reality;
4. We communicate with others about what is there, and what we believe is there.
Smith B, Kusnierczyk W, Schober D, Ceusters W. Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain. Proceedings of KR-MED 2006, Biomedical Ontology in Action, November 8, 2006, Baltimore MD, USA

Ontological Realism makes three crucial distinctions
1. Between data and what data are about;
2. Between continuants and occurrents;
3. Between what is generic and what is specific.
Smith B, Ceusters W. Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies. Applied Ontology, 2010.

How does this painting illustrate the distinction between data and what they are about?
L3: Linguistic representations about (1), (2) or (3)
L2: Clinicians’ beliefs about (1)
L1: First Order Reality: entities (particular or generic) with objective existence which are not about anything

Ontological Realism makes crucial distinctions
• Between data and what data are about;
• Between continuants and occurrents:
  – obvious differences:
    • a person versus his life
    • a disease versus its course
    • space versus time
  – more subtle differences:
    • observation (data-element) versus observing
    • diagnosis versus making a diagnosis
    • message versus transmitting a message

BFO 2.0 continuants (April 1, 2013)
Independent continuant
Material entity
Object
Object aggregate
Fiat object part
Immaterial entity
Continuant fiat boundary
Site
Spatial region
Specifically dependent continuant
Quality
Relational quality
Realizable entity
Role (externally-grounded realizable entity)
Disposition (internally-grounded realizable entity)
Function
Generically dependent continuant

BFO 2.0 occurrents (April 1, 2013)
process complete process history sectional process; process profile; process boundary; temporal region; zero-dimensional temporal region; one-dimensional temporal region; spatiotemporal region

Between ‘generic’ and ‘specific’
[Diagram labels: Generic; Specific; L3. Representation; L2. Beliefs (knowledge); L1. First-order reality; pain classification; EHR; ICHD; DIAGNOSIS; INDICATION; PATHOLOGICAL STRUCTURE; DRUG; MIGRAINE; HEADACHE; PERSON; DISEASE; PAIN; my EHR; my doctor’s work plan; my doctor’s diagnosis; my doctor’s computer; my doctor; me; my migraine; my headache; Basic Formal Ontology; Referent Tracking]

The essential pieces
[Diagram labels: dependent continuant; material object; history; spatial region; temporal region; spacetime region; me; my life; my 4D STR; some quality; some spatial region; some temporal region; relations: instanceOf at t, participantOf at t, located-in at t, occupies, projectsOn, projectsOn at t]

Should be obvious for ontologists, but …
• Comments by ICBO 2013 reviewers (with self-claimed ‘high confidence’ in their expertise):
  – There is a problem with the relation ‘X instance-of Y at time t’, because the time-index ‘t’ remains unclear.
    • is the restriction of a continuant to a proper time segment of the life-time a continuant too?
    • Is the restriction of a continuant to a time-point, a continuant again?
    Presenter’s remark: unclarity is in the eyes of those who look only through (OWL-)DL glasses
  – This paper describes a mechanism for […] into relationships of the form: x r1 y r2 t, where x i[s a]n individual, y is a term, and t is a time.
    Presenter’s remark: confuses terms with what they are about
  – … the uniqueness of the entities behind #1, #1, #3 can only be derived from the whole description.
    Presenter’s remark: never bothered to read any papers about this topic

Fundamental goals of Referent Tracking
• Who remembers?
Fundamental goals of Referent Tracking
explicit reference to the concrete individual entities relevant to the accurate description of some portion of reality, ...
Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.

Method: numbers instead of words
  – Introduce an Instance Unique Identifier (IUI) for each relevant particular (individual) entity
Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.

Fundamental goals of ‘our’ Referent Tracking
Use these identifiers in expressions using a language that acknowledges the structure of reality, e.g. for a yellow ball:
  – then not: yellow(#1) and ball(#1)
  – rather:
      #1: the ball (Indep. cont.)
      #2: #1’s yellow (Quality)
  – then still not: ball(#1) and yellow(#2) and hascolor(#1, #2)
  – but rather:
      instance-of(#1, ball, since t1)
      instance-of(#2, yellow, since t2)
      inheres-in(#1, #2, since t2)
Strong foundations in realism-based ontology

The shift envisioned
• From:
  – ‘this man is a 40 year old patient with molar caries’
• To (something like):
  – ‘this-1 on which depend this-2 and this-3 has this-4’, where
    • this-1 | instanceOf | human being | …
    • this-2 | instanceOf | age-of-40-years | …
    • this-2 | qualityOf | this-1 | …
    • this-3 | instanceOf | patient-role | …
    • this-3 | roleOf | this-1 | …
    • this-4 | instanceOf | caries | …
    • this-4 | partOf | this-5 | …
    • this-5 | instanceOf | molar | …
    • this-5 | partOf | this-1 | …
    • …
Callouts, added one per slide build:
  – first column: denotators for particulars
  – second column: denotators for appropriate relations
  – third column: denotators for universals or particulars
  – fourth column (…): time stamp in case of continuants

Relevance: the way RT-compatible EHRs ought to interact with representations of generic portions of reality
[Diagram labels: instance-of at t; #105; caused by]

Should be obvious for ontologists, but …
• Comments by ICBO 2013 reviewers (with self-claimed ‘high confidence’ in their expertise):
  – The basic idea is to introduce unique identifiers for denoting the real world entities. This notion is very similar to the URI (uniform resource identifier), used in RDF and the semantic web.
    Presenter’s remark: rather just a bit similar, but WITH a more precise semantics
  – All together we achieve a sentence F(#1,#2,#3) with three constants, denoting real world entities. Since a direct link between #i and a real entity does not solve the problem of uniqueness, the uniqueness of the entities behind #1, #1, #3 can only be derived from the whole description.

Should be obvious for ontologists, but …
• The idea of denoting entities by UII is trivial, and is already used within the framework of RDF and the semantic web.
• However: for RDF
  – No distinction between classes and instances (individuals)
      <Species, type, Class>
      <Lion, type, Species>
      <Leo, type, Lion>
  – Properties can themselves have properties
      <hasDaughter, subPropertyOf, hasChild>
      <hasDaughter, type, familyProperty>
  – No distinction between language constructors and ontology vocabulary, so constructors can be applied to themselves/each other
      <type, range, Class>
      <Property, type, Class>
      <type, subPropertyOf, subClassOf>
• No ontological foundations

Towards self-explanatory datasets in OPMQoL

Specific Aims of the OPMQoL Project
1. describe the portions of reality covered by the five datasets by means of a realism-based ontology (OPMQoL),
2. design bridging axioms required to express the data dictionaries of the datasets in terms of the OPMQoL and translate these axioms into the query languages used by the underlying databases,
3. validate OPMQoL by querying the datasets with and without using the ontology and by comparing the results as a function of the clinical question identified,
4.
document the development and validation approach in a way that other groups can re-use and expand OPMQoL, and use our approach in other domains.

Considered datasets
• ‘US Dataset’ (724 patients) resulted from the NIH funded RDC/TMD Validation Project,
• ‘Hadassah Dataset’ (306 patients) from the Orofacial Pain Clinic at the Faculty of Dentistry, Hadassah,
• ‘German Dataset’ (416 patients) of patients seeking treatment for orofacial pain at the Department of Prosthodontics and Materials Sciences, University of Leipzig,
• ‘Swedish Dataset’ of 46 consecutive Atypical Odontalgia (AO) patients recruited from 4 orofacial pain clinics in Sweden, as well as data about age- and gender-matched control patients, 35 of whom are pain-free and 41 of whom are TMD patients,
• ‘UK Dataset’ (168 patients) with facial pain of non-dental origin present for a minimum of three months.

Linking the instruments and other tools
• analyze data dictionaries, assessment instruments, study criteria and corresponding terminologies,
• build realism-based application ontologies to link these sources to realism-based reference ontologies.

[Diagram: UML-style model. Components: Terminology component; Data component; Ontology component. Classes: terminology; term; concept; data dictionary; data collection; measurement datum; representational artifact; assessment instrument; assessment instrument ontology; data collection ontology; application ontology; reference ontology; ontology; bridging axiom; entity; denotator. Relations: uses / used-for; part-of / has-part; expressed-by / expresses; broader / narrower; corresponds-to; used-in; means; denotes / denoted by.]

Mapping assessment instrument terms, ontology and patient cases

Objectives of the ‘sources’ analysis
• Find for each value V in the data collections all possible configurations of entities (according to our best scientific understanding) for which the following can be true:
  – V
  – ‘it is stated that V’
• Describe these possible configurations by means of sentences from a formal language that mimic the structure of reality.

Objectives of the ‘sources’ analysis (2)
• For example,
  – for the value stating that ‘The patient with patient identifier ‘PtID4’ has had a panoramic X-ray of the mouth which is interpreted to show subcortical sclerosis of that patient’s condylar head of the right temporomandibular joint’ to be true,
  – this statement must have been made,
  – for the statement to be true, there must have been that patient, an X-ray, etc, …
  – BUT! It is not necessarily true that that patient has indeed the sclerosis as diagnosed.

Methodology (1): for the 1st order reality
1. Formulate for each variable in the data collection a sentence explaining as accurately as possible what the variable stands for,
2. list the entities in reality that the terms in the sentence denote,
3. list recursively, for all entities listed, further entities that ontologically must exist for the entity under scrutiny to exist,
4.
classify all entities in terms of realism-based ontologies (RBO),
5. specify all obtaining relationships between these entities,
6. outline all possible configurations of such entities for the sentence to be true.

Step 1: formulate a statement
‘The patient with patient identifier ‘PtID4’ is stated to have had a panoramic X-ray of the mouth which is interpreted to show subcortical sclerosis of that patient’s condylar head of the right temporomandibular joint’
[Diagram labels: meaning; 1]

Step 2 (1): list the entities denoted
• ‘1(The patient) with 2(patient identifier ‘PtID4’) 3(is stated) 4(to have had) a 5(panoramic X-ray) of 6(the mouth) which 7(is interpreted) to 8(show) 9(subcortical sclerosis of 10(that patient’s condylar head of the 11(right temporomandibular joint)))’
CLASS | INSTANCE IDENTIFIER
person | IUI-1
patient identifier | IUI-2
assertion | IUI-3
technically investigating | IUI-4
panoramic X-ray | IUI-5
mouth | IUI-6
interpreting | IUI-7
seeing | IUI-8
diagnosis | IUI-9
condylar head of right TMJ | IUI-10
right TMJ | IUI-11
notes:
  – colors have no meaning here, just provide easy reference,
  – this first list can be different, any such differences being resolved in step 3

Step 2 (2): provide directly referential descriptions
CLASS | INSTANCE IDENTIFIER | DIRECTLY REFERENTIAL DESCRIPTION
person | IUI-1 | the person to whom IUI-2 is assigned
patient identifier | IUI-2 | the patient identifier of IUI-1
assertion | IUI-3 | 'the patient with patient identifier PtID4 has had a panoramic X-ray of the mouth which is interpreted to show subcortical sclerosis of that patient’s right temporomandibular joint'
technically investigating | IUI-4 | the technically investigating of IUI-6
panoramic X-ray | IUI-5 | the panoramic X-ray that resulted from IUI-4
mouth | IUI-6 | the mouth of IUI-1
interpreting | IUI-7 | the interpreting of the signs exhibited by IUI-5
seeing | IUI-8 | the seeing of IUI-5 which led to IUI-7
diagnosis | IUI-9 | the diagnosis expressed by means of IUI-3
condylar head of right TMJ | IUI-10 | the condylar head of the right TMJ of IUI-1
right TMJ | IUI-11 | the right TMJ of IUI-1

Step 3: identify further entities that ontologically must exist for each entity under scrutiny to exist.
CLASS | INSTANCE IDENTIFIER | DIRECTLY REFERENTIAL DESCRIPTION
assigner role | IUI-12 | the assigner role played by the entity while it performed IUI-21
assigning | IUI-21 | the assigning of IUI-2 to IUI-1 by the entity with role IUI-12
asserting | IUI-20 | the asserting of IUI-3 by the entity with asserter role IUI-13
asserter role | IUI-13 | the asserter role played by the entity while it performed IUI-20
investigator role | IUI-14 | the investigator role played by the entity while it performed IUI-4
panoramic X-ray machine | IUI-15 | the panoramic X-ray machine used for performing IUI-4
image bearer | IUI-16 | the image bearer in which IUI-5 is concretized and that participated in IUI-8
interpreter role | IUI-17 | the interpreter role played by the entity while it performed IUI-7
perceptor role | IUI-18 | the perceptor role played by the entity while it performed IUI-8
diagnostic criteria | IUI-19 | the diagnostic criteria used by the entity that performed IUI-7 to come to IUI-9
study subject role | IUI-22 | the study subject role which inheres in IUI-1

Step 3: some remarks
• interpreter role, perceptor role, …
  – reference to roles rather than to the entity in which the roles inhere, because it may be the same entity and one should not assign several IUIs to the same entity
• each description follows similar principles as Aristotelian definitions but is about particulars rather than universals

Step 4: classify all entities in terms of realism-based ontologies
CLASS | HIGHER CLASS
person | BFO: Object
patient identifier | IAO: Information Content Entity
assertion | IAO: Information Content Entity
technically investigating | OBI: Assay
panoramic X-ray | IAO: Image
mouth | FMA: Mouth
interpreting | MFO: Assessing
seeing | BFO: Process
diagnosis | IAO: Information Content Entity
condylar head of right TMJ | FMA: Right condylar process of mandible
right TMJ | FMA: Right temporomandibular joint
assigner role | BFO: Role
assigning | BFO: Process
study subject role | OBI: Study subject role
• requires more ontological and philosophical skills than domain expertise or expertise with Protégé,
• not just term matching

Step 5: specify relationships between these entities
• For instance:
  – at least during the taking of the X-ray the study subject role inheres in the patient being investigated:
    • IUI-22 inheres-in IUI-1 during t1
  – the patient participates at that time in the investigation:
    • IUI-4 has-participant IUI-1 during t1
• These relations need to follow the principles of the Relation Ontology.
Smith B, Ceusters W, Klagges B, Koehler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector A, Rosse C. Relations in biomedical ontologies, Genome Biology 2005, 6:R46.
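The relational expressions of Step 5 can be pictured as time-indexed tuples over IUIs. The following is a minimal sketch under stated assumptions, not the data model of any actual Referent Tracking system: the `Relation` class and the `relations_about` helper are hypothetical names, `t1` is a placeholder for the time of the X-ray, and the IUI numbers follow the Step 2 and Step 3 tables (IUI-1 the person, IUI-4 the technically investigating, IUI-22 the study subject role).

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    """One relational expression (hypothetical structure, for illustration)."""
    subject: str   # IUI of the first relatum
    relation: str  # relation name, e.g. one defined in the Relation Ontology
    target: str    # IUI of a particular, or the name of a universal
    time: str      # time index, needed whenever a continuant is involved


relations: List[Relation] = [
    Relation("IUI-1", "instance-of", "person", "t1"),
    Relation("IUI-22", "instance-of", "study subject role", "t1"),
    Relation("IUI-22", "inheres-in", "IUI-1", "t1"),
    Relation("IUI-4", "has-participant", "IUI-1", "t1"),
]


def relations_about(iui: str, rels: List[Relation]) -> List[Relation]:
    """Return every relational expression in which the given IUI occurs."""
    return [r for r in rels if iui in (r.subject, r.target)]


for r in relations_about("IUI-1", relations):
    print(f"{r.subject} {r.relation} {r.target} during {r.time}")
```

Keeping the time index as a required field mirrors the point made earlier in the talk: relations involving continuants are asserted at a time, not timelessly.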
Step 6: outline all possible configurations of such entities for the sentence to be true (a one-semester course on its own)
• Such outlines are collections of relational expressions of the sort just described.
• Variant configurations for the example:
  – perceptor and interpreter are the same or distinct human beings,
  – the X-ray machine is unreliable and produced artifacts which the interpreter took to be signs motivating his diagnosis, while the patient does indeed have the disorder specified by the diagnosis (the clinician was lucky),
  – …

Methodology (2): for each dataset
• Build a formal template which describes:
  – the results of steps 4-6 of the 1st order analysis,
  – the relationships between:
    • the 1st order entities and the corresponding data items in the dataset,
    • the data items themselves.
• Build a prototype able to generate, on the basis of the template, for each subject (patient) in the dataset an RT-compatible representation of his 1st and 2nd order entities.
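The prototype described above can be pictured as a loop that completes template rows once per subject. This is a minimal sketch under several assumptions, not the OPMQoL prototype: it borrows the placeholder notation of the template slides ("#pat-", "#patg-"), assumes a placeholder is completed by appending a subject key, uses a hypothetical "instance-of" predicate for typing, and leaves out time indexing. `TemplateRow`, `instantiate` and `generate` are illustrative names.

```python
from typing import NamedTuple, Optional


class TemplateRow(NamedTuple):
    iui_p: str              # placeholder IUI of the particular, e.g. "#pat-"
    p_type: Optional[str]   # type the particular instantiates, if any
    p_rel: Optional[str]    # relation to another particular, if any
    p_targ: Optional[str]   # placeholder IUI (or type name) the relation targets


# Four rows in the style of the template; the selection is illustrative.
TEMPLATE = [
    TemplateRow("#psrec-", "DATASET-RECORD", None, None),
    TemplateRow("#pid-", "DENOTATOR", "denotes", "#pat-"),
    TemplateRow("#pat-", "PATIENT", None, None),
    TemplateRow("#patg-", "GENDER", "inheres-in", "#pat-"),
]


def instantiate(placeholder, subject_key):
    """Complete a placeholder such as '#pat-' with a subject key (assumed scheme)."""
    if placeholder is not None and placeholder.startswith("#"):
        return placeholder + subject_key
    return placeholder  # type names such as 'PAIN' are left untouched


def generate(template, subject_key):
    """Yield (IUI, predicate, object) tuples for one subject in the dataset."""
    for row in template:
        iui = instantiate(row.iui_p, subject_key)
        if row.p_type:
            yield (iui, "instance-of", row.p_type)  # hypothetical predicate name
        if row.p_rel:
            yield (iui, row.p_rel, instantiate(row.p_targ, subject_key))
```

Calling `list(generate(TEMPLATE, "007"))` produces six tuples for that subject, among them `("#pid-007", "denotes", "#pat-007")`. Keeping the template as data rather than code is the point of the design: the same loop serves every dataset once its template is written.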
The template

Partial Template for 3 variables (in the ‘German’ dataset)

RN  Var  RT  REF                        Min    Max    Val
1        IM  patient_study_record
2   id   LV  patient_identifier
3   id   IM  patient
4   sex  CV  gender
5   sex  CV  male                                     0
6   sex  CV  female                                   1
7   sex  UA  sex                        BLANK  BLANK
8   q3   CV  no_pain_in_lower_face                    0
9   q3   CV  pain_in_lower_face                       1
10  q3   IM  in_the_past_month
11  q3   IM  lower_face
12  q3   IM  time_of_q3_concretization
13  q3   RP  an_8_gcps_1                0      0      0
14  q3   UP  an_8_gcps_1                1      10     0
15  q3   UA  an_8_gcps_1                BLANK  BLANK  1
16  q3   JA  an_8_gcps_1                BLANK  BLANK  0

3 variables in the ‘German’ dataset
• q3: answer to the question ‘Have you had pain in the face, jaw, temple, in front of the ear or in the ear in the past month?’
• an_8_gcps_1: answer to the question ‘How would you rate your facial pain on a 0 to 10 scale at the present time, that is right now, where 0 is "no pain" and 10 is "pain as bad as could be"?’

Record Types in the template
• LV: Literal Value
• CV: Coded Value
• IM: Implicit
• JA: Justified Absence
• UA: Unjustified Absence
• UP: Unjustified Presence
• RP: Redundant Presence

Condition-based xA/xP determination

RN  Var  RT  REF          Min    Max    Val
7   sex  UA  sex          BLANK  BLANK
13  q3   RP  an_8_gcps_1  0      0      0
14  q3   UP  an_8_gcps_1  1      10     0
15  q3   UA  an_8_gcps_1  BLANK  BLANK  1
16  q3   JA  an_8_gcps_1  BLANK  BLANK  0

If the value of REF is either outside the range of Min/Max or ‘BLANK’ and the value for Var is as indicated by Val, including no value at all, then the presence or absence of the corresponding data item is of a sort indicated by RT.
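The condition rows can be evaluated mechanically against one dataset record. The sketch below is one possible reading, not the project's code. It assumes a row applies when REF lies inside Min..Max, or is empty where Min and Max read BLANK. The slide's own wording says "outside the range", so the inside reading is an assumption, chosen because it makes rows 13 to 16 coherent (for example, no pain in the past month plus an empty present-pain rating yields JA). Changing that reading means changing one comparison in `ref_matches`. `Rule`, `RULES`, `ref_matches` and `record_types` are hypothetical names, and an empty cell is encoded as `None`.

```python
from typing import NamedTuple, Optional

BLANK = None  # encodes an empty cell, both in the template and in the data


class Rule(NamedTuple):
    rn: int              # row number in the template
    var: str             # variable whose value is compared with Val
    rt: str              # record type assigned when the row applies
    ref: str             # variable whose value is tested against Min/Max
    lo: Optional[int]    # Min (BLANK when the template says BLANK)
    hi: Optional[int]    # Max (BLANK when the template says BLANK)
    val: Optional[int]   # required value of Var; None means "no value at all"


# Rows 7 and 13-16 of the template, transcribed from the condition table.
RULES = [
    Rule(7, "sex", "UA", "sex", BLANK, BLANK, None),
    Rule(13, "q3", "RP", "an_8_gcps_1", 0, 0, 0),
    Rule(14, "q3", "UP", "an_8_gcps_1", 1, 10, 0),
    Rule(15, "q3", "UA", "an_8_gcps_1", BLANK, BLANK, 1),
    Rule(16, "q3", "JA", "an_8_gcps_1", BLANK, BLANK, 0),
]


def ref_matches(rule, ref_value):
    """Assumed reading: REF is blank for BLANK rows, else inside Min..Max."""
    if rule.lo is BLANK and rule.hi is BLANK:
        return ref_value is BLANK
    return ref_value is not BLANK and rule.lo <= ref_value <= rule.hi


def record_types(record):
    """Return (row number, record type) for every rule that applies to a record."""
    return [
        (rule.rn, rule.rt)
        for rule in RULES
        if ref_matches(rule, record.get(rule.ref))
        and record.get(rule.var) == rule.val
    ]
```

Under these assumptions, a record with `q3 = 0` and no `an_8_gcps_1` value triggers row 16 (JA), while `q3 = 0` together with a rating of 3 triggers row 14 (UP).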
Conditional selection of descriptions

RT compatible part of the template

RN  IUI(L)   IUI(P)   P-Type          P-Rel        P-Targ   Trel  Time
1            #psrec-  DATASET-RECORD                        at    t
2   #pidL-   #pid-    DENOTATOR       denotes      #pat-    at    t
3   #patL-   #pat-    PATIENT                               at    t
4   #patgL-  #patg-   GENDER          inheres-in   #pat-    at    t
5            #patg-   MALE-GENDER     inheres-in   #pat-    at    t
6            #patg-   FEMALE-GENDER   inheres-in   #pat-    at    t
7            #patgL-  UNDERSPEC-ICE                         at    t
8   #q3L0-   #pat-                    lacks-pcp    PAIN     at    #tq3-
9   #q3L1-   #pq3-    PAIN            participant  #pat-    at    #tq3-
10           #tq3-    MONTH-PERIOD
11           #patlf-  LOWER-FACE      part-of      #pat-    at    t
12           #cq3-    TIME-PERIOD     after        #tq3-
13  #q3L-    #q3L-                    corresp-w    #q3L0-   at    t
14  #q3L-    #q3L-    DISINFORMATION                        at    t
15  #q3L-    #q3L-    UNDERSPEC-ICE                         at    t
16  #q3L-    #q3L-    J-BLANK-ICE                           at    t

• #patg- denotes (when instantiated) the gender of the patient
• #patgL- denotes (when instantiated) the data item concretized in the dataset in relation to the gender of the patient

Work in progress: IAO (?) related types
• UNDERSPECIFIED-ICE
  – ICE which describes a portion of reality at determinable rather than determinate level
• DISINFORMATION
  – GDC which provides erroneous information
• J-BLANK-ICE
  – GDC which conveys that there should not be an ICE concretized.

Acknowledgement
The work described is funded in part by grant 1R01DE021917-01A1 from the National Institute of Dental and Craniofacial Research (NIDCR). The content of this presentation is solely the responsibility of the author and does not necessarily represent the official views of the NIDCR or the National Institutes of Health.