Data Normalization
Dr. Stan Huff
Acknowledgements
• Tom Oniki
• Joey Coyle
• Craig Parker
• Yan Heras
• Cessily Johnson
• Roberto Rocha
• Lee Min Lau
• Alan James
• Many, many, others…
#2
What are detailed clinical models?
Why do we need them?
#3
A diagram of a simple clinical model
Clinical Element Model for Systolic Blood Pressure
SystolicBP : SystolicBPObs
  data: 138 mmHg
  quals:
    BodyLocation : BodyLocation
      data: Right Arm
    PatientPosition : PatientPosition
      data: Sitting
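To make the key/data/quals structure concrete, here is a minimal Python sketch that holds the systolic blood pressure instance above in memory. The `Element` class and `qualifier_value` helper are hypothetical illustrations, not part of the CEM tooling; only the keys, model types, and values come from the diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical in-memory node of a clinical element model instance."""
    key: str                     # what the element describes, e.g. "SystolicBP"
    model_type: str              # the model it conforms to, e.g. "SystolicBPObs"
    data: str                    # the element's value
    quals: List["Element"] = field(default_factory=list)  # qualifying elements


def qualifier_value(elem: Element, key: str) -> Optional[str]:
    """Return the data of the first qualifier with the given key, if any."""
    for q in elem.quals:
        if q.key == key:
            return q.data
    return None


sbp = Element("SystolicBP", "SystolicBPObs", "138 mmHg", quals=[
    Element("BodyLocation", "BodyLocation", "Right Arm"),
    Element("PatientPosition", "PatientPosition", "Sitting"),
])

print(qualifier_value(sbp, "PatientPosition"))  # Sitting
```

Because the qualifiers hang off the observation they modify, "Right Arm" can only ever be read as the location of this measurement.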
#4
Need for a standard model
• A stack of coded items is ambiguous (SNOMED CT)
  – Numbness of right arm and left leg
    • Numbness (44077006)
    • Right (24028007)
    • Arm (40983000)
    • Left (7771000)
    • Leg (30021000)
  – Numbness of left arm and right leg
    • Numbness (44077006)
    • Left (7771000)
    • Arm (40983000)
    • Right (24028007)
    • Leg (30021000)
  – Both findings reduce to the same five codes, so the stack alone cannot say which side goes with which limb
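The ambiguity can be demonstrated directly. This Python sketch treats a "stack" as an unordered set of the SNOMED CT codes listed above and compares it with a structured form in which each finding binds a laterality to its own body site; the tuple layout is an illustrative assumption, not a CEM structure.

```python
# SNOMED CT codes as listed on the slide
NUMBNESS, RIGHT, LEFT, ARM, LEG = "44077006", "24028007", "7771000", "40983000", "30021000"

# Flat stack of codes: grouping is lost, only membership remains.
flat_a = frozenset([NUMBNESS, RIGHT, ARM, LEFT, LEG])  # numbness of right arm and left leg
flat_b = frozenset([NUMBNESS, LEFT, ARM, RIGHT, LEG])  # numbness of left arm and right leg

# Structured: each (finding, laterality, body site) tuple keeps its own binding.
structured_a = frozenset([(NUMBNESS, RIGHT, ARM), (NUMBNESS, LEFT, LEG)])
structured_b = frozenset([(NUMBNESS, LEFT, ARM), (NUMBNESS, RIGHT, LEG)])

print(flat_a == flat_b)              # True: the two findings are indistinguishable
print(structured_a == structured_b)  # False: the model keeps them apart
```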
#5
What if there is no model?
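[Figure: two data-entry screens. Site #1 uses separate fields, “Hct, manual” and “Hct, auto”; Site #2 uses a single “Hct” field plus a method choice of Manual, Auto, or Estimated. Values shown include 37 % and 35 %.]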
#6
HL7 V2.X Messages
• Site 1:
  OBX|1|NM|4545-0^Hct, manual||37|%|
  OBX|2|NM|4544-3^Hct, auto||35|%|
• Site 2:
  OBX|1|NM|20570-8^Hct||37|%|….|manual|
  OBX|2|NM|20570-8^Hct||35|%|….|auto|
Too many ways to say the same thing
• A single name/code and value
  – Hct, manual is 37 %
• Two names/codes and values
  – Hct is 37 %
  – Method is manual (spun)
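A minimal Python sketch of normalizing both sites' OBX segments to one post-coordinated form (code plus method). It assumes well-formed segments: numeric value in OBX-5, units in OBX-6, and, for Site 2, the method in OBX-17 (the slide elides the fields in between, so the sketch fills them with empty fields). The `PRECOORD_TO_POSTCOORD` table and `HctObservation` class are hypothetical; the LOINC codes are the ones on the slide.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical map: pre-coordinated LOINC code -> (post-coordinated code, method).
PRECOORD_TO_POSTCOORD = {
    "4545-0": ("20570-8", "manual"),
    "4544-3": ("20570-8", "auto"),
}


@dataclass(frozen=True)
class HctObservation:
    code: str              # LOINC code of the post-coordinated observation
    method: Optional[str]  # "manual", "auto", or None if unknown
    value: float
    unit: str


def normalize_obx(segment: str) -> HctObservation:
    fields = segment.split("|")        # fields[n] is OBX-n
    code = fields[3].split("^")[0]     # OBX-3 identifier, first component
    value = float(fields[5])           # OBX-5 observation value
    unit = fields[6]                   # OBX-6 units
    # OBX-17 (observation method), if the segment carries it
    method = fields[17] if len(fields) > 17 and fields[17] else None
    if code in PRECOORD_TO_POSTCOORD:  # split a pre-coordinated code apart
        code, method = PRECOORD_TO_POSTCOORD[code]
    return HctObservation(code, method, value, unit)


site1 = "OBX|1|NM|4545-0^Hct, manual||37|%|"
# OBX-7..OBX-16 left empty; OBX-17 carries the method
site2 = "OBX|1|NM|20570-8^Hct||37|%" + "|" * 11 + "manual"
print(normalize_obx(site1) == normalize_obx(site2))  # True
```

Choosing the post-coordinated form as the target means a single code (20570-8) identifies every hematocrit, while the method survives as a separate, queryable value.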
#8
Model fragment in XML
Pre-coordinated representation
<observation>
  <cd> Hct, manual (LOINC 4545-0) </cd>
  <value> 37 % </value>
</observation>
Post-coordinated (compositional) representation
<observation>
  <cd> Hct (LOINC 20570-8) </cd>
  <qualifier>
    <cd> Method </cd>
    <value> Manual </value>
  </qualifier>
  <value> 37 % </value>
</observation>
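The two fragments say the same thing, so one can be mechanically rewritten into the other. The slide's `<cd>` content is free text; this Python sketch therefore assumes a simplified form that carries the LOINC code in a `code` attribute, and the `DECOMPOSE` table is a hypothetical illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: pre-coordinated code -> (base code, base name, method).
DECOMPOSE = {
    "4545-0": ("20570-8", "Hct", "Manual"),
    "4544-3": ("20570-8", "Hct", "Auto"),
}


def to_post_coordinated(xml_text: str) -> ET.Element:
    """Rewrite a pre-coordinated <observation> as base code + method qualifier."""
    pre = ET.fromstring(xml_text)
    base_code, base_name, method = DECOMPOSE[pre.find("cd").get("code")]
    post = ET.Element("observation")
    cd = ET.SubElement(post, "cd", code=base_code)
    cd.text = base_name
    qual = ET.SubElement(post, "qualifier")
    ET.SubElement(qual, "cd").text = "Method"
    ET.SubElement(qual, "value").text = method
    post.append(pre.find("value"))  # the measured value carries over unchanged
    return post


pre_xml = ('<observation><cd code="4545-0">Hct, manual</cd>'
           '<value unit="%">37</value></observation>')
print(ET.tostring(to_post_coordinated(pre_xml), encoding="unicode"))
```

Going from pre- to post-coordinated is a table lookup; the reverse direction needs the same table read backwards, which is why isosemantic models are usually mapped to one agreed storage form.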
#9
Isosemantic Models
Pre-coordinated Model
HematocritManual (LOINC 4545-0) : HematocritManualModel
  data: 37 %
Post-coordinated Model (Storage Model)
Hematocrit (LOINC 20570-8) : HematocritModel
  data: 37 %
  quals:
    Hematocrit Method : HematocritMethodModel
      data: Manual
# 10
Relational database implications
Patient Identifier | Date and Time | Observation Type | Observation Value | Units
123456789 | 7/4/2005 | Hct, manual | 37 | %
123456789 | 7/19/2005 | Hct, auto | 35 | %

Patient Identifier | Date and Time | Observation Type | Method | Observation Value | Units
123456789 | 7/4/2005 | Hct | manual | 37 | %
123456789 | 7/19/2005 | Hct | auto | 35 | %

If the patient’s hematocrit is <= 35 then …
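A small sqlite3 sketch of the rule “hematocrit <= 35” against both table designs. Table and column names are assumptions, dates are rewritten in ISO form so they sort, and the third row (“Hct, estimated”) is a hypothetical new variant added to show what happens when a site introduces one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design 1 (pre-coordinated): the method is folded into obs_type.
conn.execute("CREATE TABLE obs_pre (patient_id TEXT, obs_date TEXT, "
             "obs_type TEXT, value REAL, units TEXT)")
conn.executemany("INSERT INTO obs_pre VALUES (?, ?, ?, ?, ?)", [
    ("123456789", "2005-07-04", "Hct, manual", 37, "%"),
    ("123456789", "2005-07-19", "Hct, auto", 35, "%"),
    ("123456789", "2005-08-01", "Hct, estimated", 30, "%"),  # hypothetical new variant
])

# Design 2 (post-coordinated): the method is a separate column.
conn.execute("CREATE TABLE obs_post (patient_id TEXT, obs_date TEXT, "
             "obs_type TEXT, method TEXT, value REAL, units TEXT)")
conn.executemany("INSERT INTO obs_post VALUES (?, ?, ?, ?, ?, ?)", [
    ("123456789", "2005-07-04", "Hct", "manual", 37, "%"),
    ("123456789", "2005-07-19", "Hct", "auto", 35, "%"),
    ("123456789", "2005-08-01", "Hct", "estimated", 30, "%"),
])

# Design 1: the rule must enumerate every variant its author knew about...
pre_hits = conn.execute(
    "SELECT obs_date FROM obs_pre WHERE obs_type IN ('Hct, manual', 'Hct, auto') "
    "AND value <= 35 ORDER BY obs_date").fetchall()
# ...design 2 needs only the base observation type.
post_hits = conn.execute(
    "SELECT obs_date FROM obs_post WHERE obs_type = 'Hct' "
    "AND value <= 35 ORDER BY obs_date").fetchall()

print(pre_hits)   # [('2005-07-19',)]  the estimated value is silently missed
print(post_hits)  # [('2005-07-19',), ('2005-08-01',)]
```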
# 11
More complicated items:
• Signs, symptoms
• Diagnoses
• Problem list
• Family History
• Use of negation – “No Family Hx of Cancer”
• Description of a heart murmur
• Description of breath sounds
  – “Rales in right and left upper lobes”
  – “Rales, rhonchi, and egophony in right lower lobe”
# 12
What do we model?
• All health care data, including:
  – Allergies
  – Problem lists
  – Laboratory results
  – Medication and diagnostic orders
  – Medication administration
  – Physical exam and clinical measurements
  – Signs, symptoms, diagnoses
  – Clinical documents
  – Procedures
  – Family history, medical history and review of symptoms
# 13
How are the models used?
• EMR: data entry screens, flow sheets, reports, ad hoc queries
  – Basis for application access to clinical data
• Data normalization
  – Creation of maps from models in the local system to the standard model
• Target for the output of structured data from NLP
  – Validation of data as it is stored in the database
• Phenotype algorithms (decision logic)
  – Basis for referencing data in phenotype definitions
• Does NOT dictate physical storage strategy
# 14
Model Source Expression (CDL)
model BloodPressurePanel is panel
{
    key code(BloodPressurePanel_KEY_ECID);
    statement SystolicBloodPressureMeas systolicBloodPressureMeas optional
        systolicBloodPressureMeas.methodDevice.conduct(methodDevice)
        systolicBloodPressureMeas.bodyLocationPrecoord.conduct(bodyLocationPrecoord)
        systolicBloodPressureMeas.bodyPosition.conduct(bodyPosition)
        systolicBloodPressureMeas.relativeTemporalContext.conduct(relativeTemporalContext)
        systolicBloodPressureMeas.subject.conduct(subject)
        systolicBloodPressureMeas.observed.conduct(observed)
        systolicBloodPressureMeas.reportedReceived.conduct(reportedReceived)
        systolicBloodPressureMeas.verified.conduct(verified);
    statement DiastolicBloodPressureMeas diastolicBloodPressureMeas optional
        ….
    statement MeanArterialPressureMeas meanArterialPressureMeas optional
        ….
    qualifier MethodDevice methodDevice optional;
        md.code.domain(BloodPressureMeasurementDevice_DOMAIN_ECID);
    qualifier BodyLocationPrecoord bodyLocationPrecoord optional;
        blp.code.domain(BloodPressureBodyLocationPrecoord_DOMAIN_ECID);
    modifier Subject subject optional;
    attribution Observed observed optional;
    attribution ReportedReceived reportedReceived optional;
    attribution Verified verified optional;
}
# 15
Compiler
[Figure: the CE Translator compiles a CE Source File into an XML Template (.xsd), a Java Class, an “In Memory” Form, and HTML. Candidate additional outputs: SMArt RDF?, UML?, openEHR Archetype?, HL7 RIM Static Models?, OWL?]
# 16
Artifacts Used
• CDL Model Definition
• CEM XML Schema
• HL7 Data Source
• CEM XML Instance
StandardLabObsQuantitative - CDL Definition
import StandardLabObs;
import ReferenceRangeNar;
model StandardLabObsQuantitative is statement extends StandardLabObs {
    key domain(StandardLabObsQuantitative_KEY_VALUESET_ECID);
    data PQ primaryPQValue unit.domain(UnitsOfMeasure_VALUESET_ECID)
        alternate {
            match CD secondaryCDValue code.domain(LabValue_VALUESET_ECID);
            match CD altCDValue code.domain(LabValue_VALUESET_ECID);
            otherwise ST altSTValue;
        };
    qualifier ReferenceRangeNar referenceRangeNar card(0..1);
    constraint primaryPQValue.isNullReasonCode.domain(LabNullFlavor_VALUESET_ECID);
    constraint abnormalInterpretation.CD.code.domain(AbnormalInterpretationNumericNom_VALUESET_ECID);
    constraint deltaFlag.CD.code.domain(DeltaFlagNumericNom_VALUESET_ECID);
}
StandardLabObsQuantitative - Schema Snippet
<xs:complexType name="StandardLabObsQuantitative">
<xs:sequence>
<xs:element name="key" minOccurs="0" maxOccurs="1" type="CD"/>
<xs:element name="primaryPQValue" type="PQ"/>
<xs:element name="referenceRangeNar" minOccurs="0" maxOccurs="1" type="ReferenceRangeNar"/>
<xs:element name="accessionNumber" minOccurs="0" maxOccurs="1" type="AccessionNumber"/>
<xs:element name="fillerOrderNumber" minOccurs="0" maxOccurs="1" type="FillerOrderNumber"/>
<xs:element name="placerOrderNumber" minOccurs="0" maxOccurs="1" type="PlacerOrderNumber"/>
<xs:element name="resultStatus" minOccurs="0" maxOccurs="1" type="ResultStatus"/>
<xs:element name="reportingPriority" minOccurs="0" maxOccurs="1" type="ReportingPriority"/>
<xs:element name="abnormalInterpretation" minOccurs="0" maxOccurs="1" type="AbnormalInterpretation"/>
<xs:element name="ordinalInterpretation" minOccurs="0" maxOccurs="1" type="OrdinalInterpretation"/>
<xs:element name="deltaFlag" minOccurs="0" maxOccurs="1" type="DeltaFlag"/>
<xs:element name="responsibleObserver" minOccurs="0" maxOccurs="unbounded" type="ResponsibleObserver"/>
<xs:element name="performingLaboratory" minOccurs="0" maxOccurs="1" type="PerformingLaboratory"/>
<xs:element name="comment" minOccurs="0" maxOccurs="unbounded" type="Comment"/>
<xs:element name="subject" minOccurs="0" maxOccurs="1" type="Subject"/>
<xs:element name="specimenCollected" minOccurs="0" maxOccurs="1" type="SpecimenCollected"/>
<xs:element name="specimenReceivedByLab" minOccurs="0" maxOccurs="1" type="SpecimenReceivedByLab"/>
<xs:element name="resulted" minOccurs="0" maxOccurs="1" type="Resulted"/>
<xs:element name="patientId" minOccurs="0" maxOccurs="1" type="anonymous.2"/>
<xs:element name="status" minOccurs="0" maxOccurs="1" type="anonymous"/>
<xs:element name="instanceId" minOccurs="0" maxOccurs="1" type="anonymous.2"/>
<xs:element name="typeId" minOccurs="0" maxOccurs="1" type="anonymous.3"/>
</xs:sequence>
<xs:attribute name="class" type="statement.type" default="statement"/>
<xs:attribute name="type" type="ecid.type" default="b1ceaebb-dd15-4317-3f99-67ef3af81778"/>
</xs:complexType>
HL7 Source Instance
MSH|^~\&|OADD|153|DADD|XNEPHA|20110208000109||ORU^R01|20110207000036|T|2.2||||
EVN|R01|201102080000|
PID||1234567|274382554|007261|WHYLING^KAYLIE^O'TEST||19460413|F||W|||(801)224-1528|(866)772-3150||||21443041|535194412|
PV1||O|XNEPHA^XNEPHA^^IM||||28826^Allyson^Josephine^ O'TEST|^||||||||||OP||||||||||||||||||||||||||201102070000||||||||
ORC|RE||F506556|||||||||28826^Allyson^Josephine^ O'TEST ||||^|
OBR||^|F506556^|HCT^HEMATOCRIT|R||201102071554|||70011^ROSEN,AUBRY^ O'TEST |||20110207161200|^|28826^Allyson^Josephine^ O'TEST||||M2415648||||C|F|RFP^RFP|^^^^^R|^~^~^|||||||
OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|IM^Performed at Inte|58528^ANDERSON^MARK|
LabObsQuantitative - XML Instance Snippet
<labObsQuantitative type="b1ceaebb-dd15-4317-3f99-67ef3af81778">
<key>
<code>
<value>20570-8</value>
</code>
<codeSystem>
<value>LOINC</value>
</codeSystem>
<originalText>HCT</originalText>
</key>
<primaryPQValue>
<operator>
<value>equals</value>
</operator>
<unit>
<value>%</value>
</unit>
<value>48</value>
</primaryPQValue>
<referenceRangeNar type="6f422ce6-7bc6-2cc2-8c96-58c137b5c9fc">
…
</referenceRangeNar>
<abnormalInterpretation type="9a3c3c60-18f7-5a91-c10c-c15532a96303">
…
</abnormalInterpretation>
</labObsQuantitative>
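The step from the HL7 source instance to the CEM XML instance can be sketched in Python with `xml.etree.ElementTree`. This covers only the key and `primaryPQValue` parts of the instance; the reference range and abnormal interpretation are omitted, and `LOCAL_TO_LOINC` is a hypothetical one-entry table (its HCT to 20570-8 pair matches the instance above). An unknown local code raises `KeyError` rather than producing an unmapped instance.

```python
import xml.etree.ElementTree as ET

# Illustrative site-code -> LOINC table; HCT -> 20570-8 matches the instance above.
LOCAL_TO_LOINC = {"HCT": "20570-8"}
CEM_TYPE = "b1ceaebb-dd15-4317-3f99-67ef3af81778"  # type id copied from the instance


def add_wrapped(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Add <tag><value>text</value></tag>, the nesting the CEM instance uses."""
    el = ET.SubElement(parent, tag)
    ET.SubElement(el, "value").text = text
    return el


def obx_to_cem(obx: str) -> ET.Element:
    f = obx.split("|")                      # f[n] is field OBX-n
    local_code = f[3].split("^")[0]         # OBX-3 identifier, first component
    root = ET.Element("labObsQuantitative", type=CEM_TYPE)
    key = ET.SubElement(root, "key")
    add_wrapped(key, "code", LOCAL_TO_LOINC[local_code])
    add_wrapped(key, "codeSystem", "LOINC")
    ET.SubElement(key, "originalText").text = local_code
    pq = ET.SubElement(root, "primaryPQValue")
    add_wrapped(pq, "operator", "equals")
    add_wrapped(pq, "unit", f[6])           # OBX-6 units
    ET.SubElement(pq, "value").text = f[5]  # OBX-5 observation value
    return root


obx = ("OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|"
       "IM^Performed at Inte|58528^ANDERSON^MARK|")
print(ET.tostring(obx_to_cem(obx), encoding="unicode"))
```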
Issues
• Different groups use models differently
– NLP versus EMR
• Structuring the models to meet more than one use
• Options for different granularities of models
– Hematocrit model, model of pneumonia
– Quantitative lab result model, x-ray finding
• Terminology integration – use of standards and terminology services
• Models for “rare” kinds of data
  – Medication being taken by a friend, not recommended by the physician
# 22
Questions?
# 23
Data Normalization
Dr. Christopher Chute
IHC-Medication, Mayo, IHC LAB to CEM
[Figure: two UIMA pipelines. Medications: HL7 (Meds) → HL7 Initializer → IHC-GCN to RxNorm Annotator (IHC RxNorm resource) → Drug CEM CAS Consumer. Labs: HL7 (Labs) → HL7 Initializer → GenericLABAnnotator (Mayo LOINC resource, IHC LOINC resource) → LAB CEM CAS Consumer. Mirth and SharpDb also appear in the diagram.]
UIMA Normalization Pipeline
• Convert HL7 v2.x lab / medication order messages into CEM XML instances
  – Load the SofA with the HL7 message
  – Create segment objects in the CAS
  – Normalize segments in the CAS
  – Transform segments into CEM instances
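The four steps above can be sketched as a chain of stages that each read and enrich a shared analysis structure. This is plain Python, not the UIMA API: the `Cas` dict stands in for a real CAS, and the stage names and the `LOCAL_TO_LOINC` table are hypothetical.

```python
from typing import Any, Callable, Dict, List

Cas = Dict[str, Any]  # hypothetical stand-in for a UIMA CAS

# Illustrative site-code -> LOINC table (HCT -> 20570-8 matches the CEM instance earlier).
LOCAL_TO_LOINC = {"HCT": "20570-8"}


def create_segments(cas: Cas) -> Cas:
    # HL7 v2 separates segments with carriage returns and fields with '|'.
    cas["segments"] = [seg.split("|") for seg in cas["sofa"].split("\r") if seg]
    return cas


def normalize_segments(cas: Cas) -> Cas:
    # Replace the local OBX-3 code with LOINC when the table knows it.
    for seg in cas["segments"]:
        if seg[0] == "OBX":
            local = seg[3].split("^")[0]
            seg[3] = LOCAL_TO_LOINC.get(local, local)
    return cas


def transform_to_cem(cas: Cas) -> Cas:
    # One simplified CEM-like record per OBX: code, value (OBX-5), unit (OBX-6).
    cas["cem"] = [{"code": s[3], "value": s[5], "unit": s[6]}
                  for s in cas["segments"] if s[0] == "OBX"]
    return cas


PIPELINE: List[Callable[[Cas], Cas]] = [create_segments, normalize_segments, transform_to_cem]


def run(message: str) -> List[Dict[str, str]]:
    cas: Cas = {"sofa": message}  # load the SofA with the HL7 message
    for stage in PIPELINE:
        cas = stage(cas)
    return cas["cem"]


print(run("PID||1234567\rOBX|1|NM|HCT^HEMATOCRIT|1.1|48|%\r"))
# [{'code': '20570-8', 'value': '48', 'unit': '%'}]
```

Keeping each stage independent is what lets new annotators be inserted into the list without touching the others.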
Mayo, IHC LAB to CEM
One of the new pipelines created to normalize HL7 2.x lab messages into CEM instances. We pre-processed the HL7 messages, converting them from HL7 pipe-delimited syntax into HL7 XML format.
[Figure: pipe-delimited HL7 → Mirth → HL7 (XML) → HL7 Initializer → Generic LAB Annotators (Mayo LOINC resource, IHC LOINC resource) → LAB CEM CAS Consumer → Mirth → SharpDb.]
Mayo, IHC LAB to CEM: UIMA Pipeline Flow
[Figure: an HL7 message (segments such as PID, PV1, OBX) passes through Initialize (SOFA = HL7-XML) → Parse CAS → Normalize CAS → Transform → CEM. Example codes shown: 10109 and 45373-3.]
Normalization Anatomy
Lab Annotators
• HL7 Segment Parser
• Syntactic Integrity
• Date-Time to ISO Format
• LOINC lookups, using:
  – IHC codes to LOINC table
  – Mayo codes to LOINC table
  – LexGrid/CTS2 Terminology Services
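As one example of an annotator step, here is a Python sketch of the “Date-Time to ISO Format” conversion. It handles only the three HL7 v2 timestamp lengths that appear in the sample message (YYYYMMDD, YYYYMMDDHHMM, YYYYMMDDHHMMSS) and deliberately rejects anything else; time-zone offsets and fractional seconds, which HL7 also allows, are out of scope for this sketch.

```python
from datetime import datetime

# Timestamp lengths seen in the sample message, mapped to strptime formats.
_FORMATS = {8: "%Y%m%d", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}


def hl7_ts_to_iso(ts: str) -> str:
    """Convert an HL7 v2 timestamp to ISO 8601 (date-only stays date-only)."""
    fmt = _FORMATS.get(len(ts))
    if fmt is None:
        raise ValueError(f"unsupported HL7 timestamp: {ts!r}")
    dt = datetime.strptime(ts, fmt)
    return dt.date().isoformat() if len(ts) == 8 else dt.isoformat()


for ts in ("19460413", "201102071554", "20110208000109"):
    print(ts, "->", hl7_ts_to_iso(ts))
# 19460413 -> 1946-04-13
# 201102071554 -> 2011-02-07T15:54:00
# 20110208000109 -> 2011-02-08T00:01:09
```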
Architectural Opportunities
[Figure: components shown include HL7 2.x inputs (including Mayo), Mirth handling time, syntax, etc., semantic normalization in the CAS, CAS-to-XML conversion, and output in CEM format or CDA.]
Tactical Next Steps
Enhancements
• Single CEM for multiple OBX segments
• Efficiently utilize terminology services
• Incorporate a library for HL7 clean-up routines
• Increase scope of vocabulary standardization
• Enhancements for the Drug Annotator
  – Context enhancement issue
  – Drug name surprises
Additional Vocabularies
• Review sources used for normalization opportunities, e.g.:
  – In HL7 OBR segments
    • Standardize Service ID (codes)
  – In HL7 OBX segments
    • Standardize units
    • Standardize reference ranges
    • Standardize normal flags
Drug Name Disambiguation
Real patient data presented an unusual case in drug names: “ToDAY” is a brand name for cephapirin sodium. This presents an interesting named-entity disambiguation use case, since the same string is also an ordinary English word.
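A toy Python illustration of why this name is hard: a case-insensitive dictionary lookup turns every “today” into a drug mention, while an exact-case lookup avoids that but would miss the brand in all-caps text such as “TODAY”. The one-entry lexicon and the sample note are hypothetical; a real annotator would need surrounding context to decide.

```python
import re
from typing import List

# Hypothetical one-entry brand lexicon; the ToDAY -> cephapirin sodium pair is from the slide.
BRANDS = {"ToDAY": "cephapirin sodium"}
_LOWER = {name.lower(): generic for name, generic in BRANDS.items()}


def words(text: str) -> List[str]:
    return re.findall(r"[A-Za-z]+", text)


def match_case_insensitive(text: str) -> List[str]:
    """Naive lookup: every 'today' in ordinary prose becomes a drug mention."""
    return [_LOWER[w.lower()] for w in words(text) if w.lower() in _LOWER]


def match_exact_case(text: str) -> List[str]:
    """Stricter lookup: requires the brand's own capitalization."""
    return [BRANDS[w] for w in words(text) if w in BRANDS]


note = "Patient seen today; ToDAY infusion given."
print(match_case_insensitive(note))  # ['cephapirin sodium', 'cephapirin sodium']
print(match_exact_case(note))        # ['cephapirin sodium']
```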
Where Persistence Fits In…
[Figure: numbered data flow (steps 1 to 10, plus 6a) linking the IHC backend CDR systems, the Mayo EDT System, Mirth Connect, the IHC and SHARP NwHIN Aurion Gateways (each behind a firewall), a second Mirth Connect instance, the UIMA Pipeline, and the CEM Instance Database.]
Persistence Channels
One channel per model:
CEM Model | Mirth Channel
Administrative Diagnosis | CemAdminDxToDatabase
Standard Lab Panel | CemLabToDatabase
Ambulatory Medication Order | CemMedicationToDatabase
• Data stored as an XML instance of the model
• Fields extracted from XML to use as indices
• XML Schema defined for each model
• Stored using database transactions
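General Channel Design
[Figure: a Channel whose input Connector reads CEM XML Instances from the Input Message Directory and whose output Connector writes them to the Persistence Store; messages are then moved to the Processed Message Directory or, on failure, to the Error Message Directory.]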
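The directory-based flow can be approximated in a few lines of Python. This is a hypothetical sketch of the pattern, not how Mirth channels are built (Mirth channels are configured in the Mirth Connect administrator); the directory names are assumptions, and a plain list stands in for the persistence store.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def run_channel(input_dir: Path, processed_dir: Path, error_dir: Path,
                store: List[str]) -> None:
    """Process each XML message in input_dir once, then move it out of the inbox."""
    for msg in sorted(input_dir.glob("*.xml")):
        try:
            root = ET.parse(msg).getroot()                       # reject malformed XML
            store.append(ET.tostring(root, encoding="unicode"))  # stand-in for the DB write
            target = processed_dir
        except ET.ParseError:
            target = error_dir
        shutil.move(str(msg), str(target / msg.name))


with tempfile.TemporaryDirectory() as tmp:
    inbox, done, errors = (Path(tmp) / n for n in ("input", "processed", "error"))
    for d in (inbox, done, errors):
        d.mkdir()
    (inbox / "good.xml").write_text("<labObsQuantitative/>")
    (inbox / "bad.xml").write_text("<labObsQuantitative>")  # unclosed tag
    store: List[str] = []
    run_channel(inbox, done, errors, store)
    print(sorted(p.name for p in done.iterdir()), sorted(p.name for p in errors.iterdir()))
```

Moving every message out of the input directory, whether it succeeded or failed, keeps a message from being persisted twice and leaves failures available for inspection.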
SharpDB: a CEM Instance Database
Database Tables
Table | Purpose
Demographics | Patient demographics (one row per patient)
PatientCrossReference | Associates internal patient ID with site patient ID (one row per cross map)
SourceData | Information about the original source data (one row per instance message)
PatientData | CEM instance XML with some source information (one row per instance message)
IndexData | Indices into the XML instance (multiple rows per instance message: AdminDx, one per message; Lab, one per observation; Medication, one per orderable item)
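A minimal sqlite3 sketch of storing one lab instance as XML plus an index row extracted from it, inside a single transaction. The column layouts are hypothetical simplifications of the `PatientData` and `IndexData` tables above (SharpDB itself ran on MySQL), and the XML paths follow the lab instance snippet shown earlier.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, much-simplified columns for two of the SharpDB tables.
SCHEMA = """
CREATE TABLE PatientData (instance_id INTEGER PRIMARY KEY, patient_id INTEGER, xml TEXT);
CREATE TABLE IndexData (instance_id INTEGER REFERENCES PatientData(instance_id),
                        code TEXT, value TEXT);
"""


def store_lab_instance(conn: sqlite3.Connection, patient_id: int, xml_text: str) -> int:
    """Store one lab CEM instance and its index row atomically; return its id."""
    root = ET.fromstring(xml_text)                # parse before touching the DB
    code = root.findtext("key/code/value")        # LOINC code of the observation
    value = root.findtext("primaryPQValue/value")
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute("INSERT INTO PatientData (patient_id, xml) VALUES (?, ?)",
                           (patient_id, xml_text))
        conn.execute("INSERT INTO IndexData VALUES (?, ?, ?)",
                     (cur.lastrowid, code, value))
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
instance = ("<labObsQuantitative><key><code><value>20570-8</value></code></key>"
            "<primaryPQValue><value>48</value></primaryPQValue></labObsQuantitative>")
iid = store_lab_instance(conn, patient_id=1, xml_text=instance)
print(conn.execute("SELECT code, value FROM IndexData WHERE instance_id = ?", (iid,)).fetchone())
```

Queries then filter on the small index table and fetch the full XML only for matching instances.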
Patient Demographics
• Each message contains patient demographics
• Demographics are created from the first message received for a site patient ID
• An internal patient ID is created and cross-mapped to the site patient ID
• SharpDB is keyed off the internally generated patient ID
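The get-or-create logic described above can be sketched with sqlite3. Column layouts are hypothetical, and the sketch assumes the cross map is keyed by (site, site patient ID) so that identical IDs from two sites never collide; the sample patient values come from the HL7 source instance shown earlier.

```python
import sqlite3

# Hypothetical, simplified columns for the Demographics and PatientCrossReference tables.
SCHEMA = """
CREATE TABLE Demographics (patient_id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT);
CREATE TABLE PatientCrossReference (
    site TEXT, site_patient_id TEXT,
    patient_id INTEGER REFERENCES Demographics(patient_id),
    PRIMARY KEY (site, site_patient_id));
"""


def internal_patient_id(conn, site, site_patient_id, name, birth_date):
    """Return the internal ID for a site patient, creating demographics on first sight."""
    row = conn.execute(
        "SELECT patient_id FROM PatientCrossReference "
        "WHERE site = ? AND site_patient_id = ?", (site, site_patient_id)).fetchone()
    if row is not None:
        return row[0]
    with conn:  # create the demographics row and its cross map together
        cur = conn.execute("INSERT INTO Demographics (name, birth_date) VALUES (?, ?)",
                           (name, birth_date))
        conn.execute("INSERT INTO PatientCrossReference VALUES (?, ?, ?)",
                     (site, site_patient_id, cur.lastrowid))
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = internal_patient_id(conn, "IHC", "274382554", "WHYLING^KAYLIE", "1946-04-13")
again = internal_patient_id(conn, "IHC", "274382554", "WHYLING^KAYLIE", "1946-04-13")
print(first, again)  # 1 1
```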
Running in a Cloud…
Various images were installed:
– NwHIN Gateway provided by Aurion
– Mirth Connect, our interface engine
– UIMA pipelines of various sorts
– MySQL database for persistence
– JBoss / Drools rules engine
All open source, running in an Ubuntu cloud!
SHARP Hardware Infrastructure
[Figure: a Cloud Server running the Cloud, Walrus, Cluster and Storage Controllers, with persistence storage and an admin client interface over VPN/LAN used to manage the cloud; Node Servers 1 through 11, each with a Node Controller hosting several VMs, connected by a private switch; image storage; a Build/Backup Server; users connect to instances over VPN/LAN.]
Hardware | No. of Physical Machines | CPU | Memory | Disk | Disk Space | Networking | Functionality | No. of NICs
Cloud Server | 1 | 8 | 12 GB | 10000 RPM SAS | 1 TB | 1 Gbps | Cloud, Walrus, Cluster and Storage Controller | 4
Node Server | 1 | 8 | 32 GB | 10000 RPM SAS | 1 TB | 1 Gbps | Node Controller | 4
Node Server | 8 | 24 | 128 GB | 10000 RPM SAS | 600 GB/600 GB | 1 Gbps | Node Controller | 4
Node Server | 1 | 8 | 64 GB | 7200 RPM SATA | 1 TB/1 TB | 1 Gbps | Node Controller | 4
Node Server | 1 | 8 | 32 GB | 10000 RPM SAS | 4 TB | 1 Gbps | Node Controller | 4
Build/Backup Server | 1 | 2 | 8 GB | 7200 RPM SATA | 2 TB | 1 Gbps | Build and Backup | 2
Storage | 2 | | | 10000 RPM SAS | 7.5 TB | 1 Gbps | Persistence and Image Storage |
Storage | 2 | | | 10000 RPM SAS | 3.6 TB | 1 Gbps | Volume Storage |
Cisco 48 Port Switch | 2 | | | | | 1 GB | |
Data Normalization Summary
• Initial “tracer shot” at data normalization
  – Cloud-based processing using open source tools
  – Proof of concept: UIMA for data normalization
  – Move on to new problems / solutions…
  – Opportunities exist:
    • Add new annotators (modules) to the pipelines
    • Widen usage and scope of vocabulary services
    • Switch to real live flows and add HOSS clean-up routines
    • Various tweaks in NLP algorithms