HL7 The Data Standard for Biomedical Informatics

HL7 Version 3 Data Types: Overview
HL7 Spring Meeting, San Antonio, TX, May 4, 2004
http://aurora.regenstrief.org/v3dt/tutorial.ppt
Gunther Schadow, Regenstrief Institute, Indiana University School of Medicine, Indianapolis, IN
Purpose
• Give overview of HL7 data types
• Empower you to read the specification
• Give some rationale
8/10/2003
Copyright (c) 1999-2003 Regenstrief Institute for Health Care
Semantics first …
• Data types are the fundamental constituents of all health care information.
• Share meaning across different technologies.
• Only value, no identity or state.
… representation later
• Representations should preserve information content.
  • e.g. real numbers have precision that can hide in the representation.
  • but there is some latitude
  • purpose is to fit data types into the ITS (Implementable Technology Specification) technology
• Existing representations (ITS)
  • XML
  • UML for use with OCL
  • literal forms
Basic types
[Figure: diagram of the basic data types. Labels in the figure: Text; Information; Ordinal; Quantity; Identifier, Names, Codes. Types shown: Character String, Encapsulated Data, String with code, Boolean, Integer, Real, Ratio, Physical Quantity, Units of Measure, Monetary Amount, Point in Time, Instance Identifier, Concept Descriptor, Entity Name, Postal Address, Telecom. Addr.]
Orthogonal issues
• Collections
  • set, list, bag
  • interval
• Incomplete information
  • null values
• Uncertainty
  • probability distributions
• History
  • linear, cyclic
[Figure: diagram with repeated labels: Text, Information, Number, Ordinal, Quantities, Symbol, Concepts & Things]
ANY Data Value
• has a data type
• can be missing (NULL)
• null “flavors”
Null Flavors
• NI – no information
• NA – not applicable
• UNK – unknown
• NASK – not asked
• ASKU – asked but unknown
• NAV – temporarily not available
• MSK – masked
Boolean (BL)
• true or false … or NULL
• except if BN – Boolean non-NULL
• x AND true = x
• x OR false = x
NOT   |
true  | false
false | true
NULL  | NULL

AND   | true  | false | NULL
true  | true  | false | NULL
false | false | false | false
NULL  | NULL  | false | NULL

OR    | true  | false | NULL
true  | true  | true  | true
false | true  | false | NULL
NULL  | true  | NULL  | NULL
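The Boolean rules on this slide amount to three-valued logic. A minimal sketch in Python, assuming `None` stands in for NULL; the function names `bl_not`, `bl_and` and `bl_or` are hypothetical and not taken from any HL7 library:

```python
from typing import Optional

BL = Optional[bool]  # assumption: None plays the role of NULL


def bl_not(x: BL) -> BL:
    """NOT of NULL stays NULL."""
    return None if x is None else (not x)


def bl_and(x: BL, y: BL) -> BL:
    """A single false decides the result, even if the other operand is NULL."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def bl_or(x: BL, y: BL) -> BL:
    """A single true decides the result, even if the other operand is NULL."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


# The two identities from the slide hold for all three values.
for x in (True, False, None):
    assert bl_and(x, True) is x
    assert bl_or(x, False) is x
```

A BN (Boolean non-NULL) value would simply never be `None` in this sketch.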
Character String (ST)
• “this is an example”
Encapsulated Data (ED)
• “this is an example”
• a string (ST) is a restriction of ED
Encapsulated Data
• Inline data
  • binary data representation base64
  • ST is special case: representation text
  • MIME media type describes what it is (like email attachments)
• By reference
  • for bulky data (images)
  • references are simply URLs
  • integrity check hash values for safety
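The two ways of carrying encapsulated data can be sketched as follows. This is an illustration only: the class and field names are hypothetical, the URL is a placeholder, and SHA-1 is assumed as the hash algorithm purely for the example.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ED:
    media_type: str                          # MIME media type, e.g. "text/plain"
    data: Optional[bytes] = None             # inline content
    reference: Optional[str] = None          # URL for by-reference content
    integrity_check: Optional[bytes] = None  # hash of the referenced data

    def inline_base64(self) -> str:
        """Base64 text form of the inline bytes."""
        return base64.b64encode(self.data or b"").decode("ascii")


def by_reference(url: str, media_type: str, content: bytes) -> ED:
    """Build a reference-only ED and record a hash of the content it points to."""
    digest = hashlib.sha1(content).digest()  # assumption: SHA-1 for illustration
    return ED(media_type, reference=url, integrity_check=digest)


def verify(ed: ED, fetched: bytes) -> bool:
    """Check fetched bytes against the stored integrity check."""
    return hashlib.sha1(fetched).digest() == ed.integrity_check


text = ED("text/plain", data="this is an example".encode("utf-8"))
image = by_reference("http://example.org/xray.png", "image/png", b"\x89PNG...")
```

The receiver of `image` would fetch the URL and call `verify` before trusting the bytes.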
Entity Names (EN)
• based on much international harmonization work (1999)
• modeled as “markup” of strings
• name is a string with certain name parts identified as given, family, prefix, etc.
• delimiters: comma, dash, space, etc.; no newlines.
Entity Names (EN)
• Example: “Habtemariam Kassa”
• which is given name, which is family name?
• if we know, we can say
<given>Habtemariam</given> <family>Kassa</family>
• or
<given>Kassa</given> <family>Habtemariam</family>
Entity Names (EN)
• EN – entity name
  • name parts: prefix, given, family, suffix
  • name part qualifiers
• PN – person name
  • mostly the same as EN
• ON – organization name
  • much simplified EN with only suffix (for legal status: Inc., Ltd., GmbH, etc.)
• TN – trivial name
  • just a string, e.g., “Lake Michigan”
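The idea of a name as a marked-up string can be tried out with Python's standard XML library. This is a sketch, not the normative XML representation: the wrapping `<name>` element and the helper `make_name` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def make_name(parts):
    """Build a name from (part_type, text) pairs, separated by single spaces."""
    name = ET.Element("name")  # assumption: wrapper element called "name"
    for i, (part_type, text) in enumerate(parts):
        part = ET.SubElement(name, part_type)
        part.text = text
        if i < len(parts) - 1:
            part.tail = " "  # the delimiter stays part of the string content
    return name


name = make_name([("given", "Habtemariam"), ("family", "Kassa")])

# The markup identifies the parts ...
markup = ET.tostring(name, encoding="unicode")
# ... and dropping the markup gives back the plain string.
plain = "".join(name.itertext())
```

Swapping the two part types changes the markup but leaves the plain string unchanged, which is the point of the example on the earlier slide.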
Postal Address (AD)
• Like names, modeled as a “markup” of strings.
• Parts for street, city, postal code, etc.
• Addresses usually have multiple lines.
Instance Identifier (II)
• Simple and guaranteed globally unambiguous.
• Mandatory root
  • ISO OID: e.g. 2.16.840.1.113883.1122
  • DCE UUID (aka GUID)
  • HL7 reserved unique identifiers (RID)
• Optional extension
  • for alphanumeric identifiers
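A root-plus-extension identifier is easy to model. The sketch below is a simplification under stated assumptions: it accepts only OID and UUID roots (HL7 reserved identifiers are left out), the OID pattern is a loose syntactic check, and the extension value `"12345"` is made up.

```python
import re
import uuid
from dataclasses import dataclass
from typing import Optional

# Loose syntactic check: dot-separated numeric arcs, first arc 0, 1 or 2.
OID_PATTERN = re.compile(r"^[0-2](\.(0|[1-9][0-9]*))+$")


def is_oid(s: str) -> bool:
    return bool(OID_PATTERN.match(s))


def is_uuid(s: str) -> bool:
    try:
        uuid.UUID(s)
        return True
    except ValueError:
        return False


@dataclass(frozen=True)
class II:
    root: str                        # mandatory, globally unique
    extension: Optional[str] = None  # optional, unique within the root

    def __post_init__(self):
        if not (is_oid(self.root) or is_uuid(self.root)):
            raise ValueError(f"root must be an OID or UUID: {self.root!r}")


a = II("2.16.840.1.113883.1122", "12345")
b = II("2.16.840.1.113883.1122", "12345")
c = II(str(uuid.uuid4()))  # a UUID root needs no extension
```

Two identifiers are the same exactly when root and extension both match, which the generated dataclass equality provides.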
Concept Descriptors (CD)
• Guaranteed unambiguous
• Mandatory codeSystem
  • specified as OID or other UID
• Mandatory code
  • specified as string (ST)
• optional displayName (ST)
• optional originalText (ED)
Concept Descriptors (CD)
• optional translations
  • to map codes between different codeSystems
  • local code
  • standard code
• optional qualifiers
  • only allowed for codeSystems that define them, e.g. SNOMED, HCPCS
Restrictions on Coded Types
• Concept Descriptor (CD)
  • everything
• Coded with Equivalents (CE)
  • no qualifiers but translations
• Coded Value (CV)
  • only code, codeSystem
  • no translations
• Coded Simple Value (CS)
  • only code, FIXED codeSystem
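The restrictions can be read as checks on a full concept descriptor. The sketch below is a simplified reading of the slides, not the specification: qualifiers are modeled as plain nested codes, the helper names are hypothetical, and the codes and code system OIDs are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CD:
    code: str
    code_system: str
    display_name: Optional[str] = None
    original_text: Optional[str] = None
    translations: List["CD"] = field(default_factory=list)
    qualifiers: List["CD"] = field(default_factory=list)  # simplified model


def conforms_to_ce(cd: CD) -> bool:
    """CE: translations allowed, qualifiers not."""
    return not cd.qualifiers


def conforms_to_cv(cd: CD) -> bool:
    """CV: neither qualifiers nor translations."""
    return not cd.qualifiers and not cd.translations


def conforms_to_cs(cd: CD, fixed_code_system: str) -> bool:
    """CS: like CV, and the code system is fixed by the context."""
    return conforms_to_cv(cd) and cd.code_system == fixed_code_system


# Hypothetical local code carrying a translation to a hypothetical standard code.
standard = CD(code="0000-0", code_system="2.16.840.1.113883.19.5.2")
local = CD(code="GLU", code_system="2.16.840.1.113883.19.5.1",
           display_name="Glucose", translations=[standard])
```

Here `local` is acceptable where a CE is expected but not where a CV is expected, because it carries a translation.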
Quantities
• Integer (INT)
• Real (REAL)
• Physical Quantity (PQ)
• Monetary Amount (MO)
• Point in time (TS)
Integer (INT)
• 1, -2, 3 …
• 10000000000000000000000000001
• no limit on size
• Special NULL flavors
  • positive infinity
  • negative infinity
Real
• 1, -2, 3, 1.1, 2.001, 3.1234e-5
• 1.000000000000000000000000001
• precision!
• no limit on size or precision
• Special NULL flavors
  • positive infinity
  • negative infinity
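Why precision matters can be seen by comparing a binary float with a decimal type that keeps the digits as written. This sketch uses Python's `decimal` module as a stand-in for REAL and the built-in `int` as a stand-in for INT; that mapping is an assumption of the example, not something the specification prescribes.

```python
from decimal import Decimal


def significant_digits(d: Decimal) -> int:
    """Number of digits in the coefficient, i.e. the precision as written."""
    return len(d.as_tuple().digits)


# Equal as numbers, but the literal form carries different precision.
a = Decimal("1.1")
b = Decimal("1.10")
assert a == b
assert str(a) != str(b)
assert significant_digits(a) == 2
assert significant_digits(b) == 3

# A value from the slide: exact as a Decimal, lost in a binary float.
literal = "1.000000000000000000000000001"
tiny = Decimal(literal)
assert str(tiny) == literal
assert float(tiny) == 1.0

# Integers have no size limit either.
big = int("10000000000000000000000000001")
assert big + 1 == int("10000000000000000000000000002")
```

A representation that routes such values through a fixed-width float silently drops information, which is the kind of loss a representation should avoid.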
Physical Quantity (PQ)
• A real number with a coded unit
• REAL value, CS unit
• 1 m, 100 cm, 5 mL, 20 mg/dL
• 1 m = 100 cm
• 1 [in_i] = 2.54 cm
Units of Measure
• Units defined in the Unified Code for Units of Measure (UCUM)
  • compatible with ISO 2955 (“ISO+”)
  • ANSI X3.50 customary units included but new symbols defined
• Semantics defined
  • based on dimensional analysis
  • 1 J = 1000 m² s⁻² g¹ = <1000, [2,-2,1,0,0,0,0]>
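The pair of a magnitude and a dimension vector can be computed mechanically from base units. A minimal sketch, assuming the vector positions are ordered (meter, second, gram, radian, kelvin, coulomb, candela); the helper names are hypothetical:

```python
from fractions import Fraction
from typing import Tuple

Unit = Tuple[Fraction, Tuple[int, ...]]  # (magnitude, dimension vector)


def base(i: int) -> Unit:
    """Base unit number i: magnitude 1 and a single exponent of 1."""
    return (Fraction(1), tuple(1 if k == i else 0 for k in range(7)))


def mul(a: Unit, b: Unit) -> Unit:
    """Multiply units: magnitudes multiply, exponents add."""
    return (a[0] * b[0], tuple(x + y for x, y in zip(a[1], b[1])))


def power(a: Unit, n: int) -> Unit:
    """Raise a unit to an integer power."""
    return (a[0] ** n, tuple(x * n for x in a[1]))


def scale(factor: int, a: Unit) -> Unit:
    """Apply a prefix or conversion factor."""
    return (Fraction(factor) * a[0], a[1])


m, s, g = base(0), base(1), base(2)
kg = scale(1000, g)

# joule = kg * m^2 / s^2, expressed in gram-based canonical form
J = mul(kg, mul(power(m, 2), power(s, -2)))
assert J == (Fraction(1000), (2, -2, 1, 0, 0, 0, 0))
```

Two units are comparable exactly when their dimension vectors are equal, and the ratio of their magnitudes is the conversion factor.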
Constraints on PQ
• don’t constrain on specific unit
  • e.g. don’t say “length unit must be centimeter (not meter, not inches)”
• instead constrain on dimensionality
  • e.g. say: length ~ 1 m
    • “any unit comparable with meter”
    • 1.00 m = 100 cm = 39.4 [in_i]
  • e.g. say: pauseQuantity ~ 1 s
    • “any unit comparable with second”
    • 1 d = 24 h = 1440 min = 86400 s
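A constraint such as "length ~ 1 m" can be checked by comparing dimensions rather than unit codes. The sketch below uses a tiny hand-written unit table with dimension labels instead of full UCUM parsing; the table, class and method names are assumptions made for the example.

```python
from dataclasses import dataclass
from fractions import Fraction

# unit code -> (factor to the canonical unit, dimension label)
UNITS = {
    "m": (Fraction(1), "length"),
    "cm": (Fraction(1, 100), "length"),
    "[in_i]": (Fraction(254, 10000), "length"),
    "s": (Fraction(1), "time"),
    "min": (Fraction(60), "time"),
    "h": (Fraction(3600), "time"),
    "d": (Fraction(86400), "time"),
}


@dataclass(frozen=True)
class PQ:
    value: Fraction
    unit: str

    def canonical(self):
        """Value in the canonical unit, together with the dimension."""
        factor, dimension = UNITS[self.unit]
        return (Fraction(self.value) * factor, dimension)

    def comparable_with(self, other: "PQ") -> bool:
        """True if both quantities have the same dimension."""
        return UNITS[self.unit][1] == UNITS[other.unit][1]

    def same_quantity(self, other: "PQ") -> bool:
        return self.canonical() == other.canonical()

    def convert(self, unit: str) -> "PQ":
        value, dimension = self.canonical()
        factor, target_dimension = UNITS[unit]
        if dimension != target_dimension:
            raise ValueError(f"{self.unit} is not comparable with {unit}")
        return PQ(value / factor, unit)


one_meter = PQ(Fraction(1), "m")
assert PQ(Fraction(100), "cm").comparable_with(one_meter)  # satisfies "length ~ 1 m"
assert not PQ(Fraction(30), "s").comparable_with(one_meter)
assert one_meter.same_quantity(PQ(Fraction(100), "cm"))
```

A receiver that needs centimeters calls `convert("cm")` instead of forcing every sender to use one unit.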
Alternative Unit Codes?
• UCUM is mandatory for PQ itself
  • is the most complete units vocabulary
  • unambiguously defined
• Physical Quantity Representation (PQR) available to refer to other unit codes:
  • like CV with a “value” attribute
  • code, codeSystem, value
Monetary Amount (MO)
• A real number with a currency code
  • REAL value, CS currency
• 1 USD, 30 CZK, .9 EUR
• 1 USD = 27 CZK ???
  • no fixed conversion factors
  • not the same as physical quantity
Ratio (RTO)
• numerator : denominator
• each can be any quantity: REAL, INT, PQ, MO
  • usually mixed types
• use only if you want to avoid canceling
• don’t use just because you have a quotient
  • 10 mL/min, 180 g/mol are just PQ
Point in Time (Timestamp, TS)
• usually expressed as calendar date and time
  • e.g. YYYYMMDDHHMMSS.nnnn…
• related with elapsed time (PQ) as
  • TS t2 – TS t1 = PQ Δt
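The relation between points in time and elapsed time can be tried with Python's `datetime`. This is a sketch under simplifying assumptions: the hypothetical helper `parse_ts` handles only the full YYYYMMDDHHMMSS form with an optional fraction, and ignores time zones and shorter, less precise literals.

```python
from datetime import datetime, timedelta


def parse_ts(literal: str) -> datetime:
    """Parse YYYYMMDDHHMMSS with an optional fractional-seconds part."""
    main, _, fraction = literal.partition(".")
    t = datetime.strptime(main, "%Y%m%d%H%M%S")
    if fraction:
        t += timedelta(seconds=float("0." + fraction))
    return t


t1 = parse_ts("19870605043210.001")
t2 = parse_ts("19870605044210.001")  # made-up second timestamp, ten minutes later

# Subtracting two points in time gives an elapsed time.
delta = t2 - t1
assert delta == timedelta(minutes=10)

# Adding an elapsed time to a point in time gives a point in time again.
assert t1 + delta == t2
```

The difference is a duration, here 600 s, which corresponds to a physical quantity in units of time rather than to another timestamp.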
• BAG
  • unordered
  • multiples of same value
• LIST
  • ordered
  • multiples
• SET
  • unordered
  • no multiples
aka “collections”
Continuous Sets
• Intervals (IVL) are sets too
  • e.g., the set of numbers between 0.5 and 1.75.
  • also known as “ranges”
• properties: low, high
  • e.g., IVL<TS>
    • low - start time
    • high - end time
• but also: width, center, …
• any property can be left unspecified
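Because low, high, width and center are related, any sufficient pair determines the rest. A minimal sketch over plain numbers, assuming closed boundaries and treating an unspecified property as unknown; the class and method names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IVL:
    low: Optional[float] = None
    high: Optional[float] = None
    width: Optional[float] = None
    center: Optional[float] = None

    def complete(self) -> "IVL":
        """Derive the missing properties where the known ones allow it."""
        low, high, width, center = self.low, self.high, self.width, self.center
        if low is None and high is not None and width is not None:
            low = high - width
        if low is None and center is not None and width is not None:
            low = center - width / 2
        if high is None and low is not None and width is not None:
            high = low + width
        if low is not None and high is not None:
            width, center = high - low, (low + high) / 2
        return IVL(low, high, width, center)

    def contains(self, x: float) -> Optional[bool]:
        """Membership test; None means it cannot be decided."""
        c = self.complete()
        if c.low is None or c.high is None:
            return None
        return c.low <= x <= c.high


r = IVL(low=0.5, high=1.75).complete()
assert (r.width, r.center) == (1.25, 1.125)
assert IVL(center=1.125, width=1.25).complete().low == 0.5
```

An interval with only `low` given stays partially unknown, so `contains` returns `None` instead of guessing.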
Time and Timing Datatypes
• Elapsed time: 10 min, 30 s, etc.
  • a Physical Quantity (like any other)
• Point in time: 19870605043210.001
• Interval of time: 19870605..19870613
• Periodic interval of time (PIVL)
  • period = 7d, phase = [19870605;19870606[
• Event related interval of time (EIVL):
  • e.g., 1h AC, CC, HS
• Arbitrary Set of Time aka “GTS”
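Membership in a periodic interval reduces to a modulo operation on the offset from the phase. A sketch for the example above (period 7 d, phase from 5 June 1987 up to but not including 6 June 1987), with the hypothetical helper `in_pivl`:

```python
from datetime import datetime, timedelta


def in_pivl(t: datetime, phase_low: datetime, phase_high: datetime,
            period: timedelta) -> bool:
    """True if t falls in some repetition of the half-open phase [low; high[."""
    offset = (t - phase_low) % period  # always between 0 and period
    return offset < (phase_high - phase_low)


phase_low = datetime(1987, 6, 5)
phase_high = datetime(1987, 6, 6)
period = timedelta(days=7)

assert in_pivl(datetime(1987, 6, 5, 12, 0), phase_low, phase_high, period)
assert in_pivl(datetime(1987, 6, 12, 0, 0), phase_low, phase_high, period)
assert not in_pivl(datetime(1987, 6, 6, 0, 0), phase_low, phase_high, period)
```

The same test works for times before the phase, since the modulo of a negative offset still lands inside one period.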
Periodic Time
[Figure: one period T with time offset Δt and phase φ marked]
f = 1/T
ω = 2π f
φ = ω Δt
• Frequency f = 3/d, same as period T = 8 h.
• Phase φ (~ Δt) can address any point in the period.
• If phase is a range, we get periodic time intervals.
Arbitrary Sets of Time
• Composed by set-operations
• Example: Every other day from Monday to Friday 8:00 AM to 10:00 AM for six consecutive times.
[Figure: calendar strips (Sa Su Mo Tu We Th Fr) showing how the example is composed: the Monday–Friday interval of each week, every other day (day offsets 0, 2, 4, … 16), the daily 8:00-10:00 interval, and an outer bound interval; the six resulting occurrences fall on Mo, We, Fr, Tu, Th, Mo.]
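The example schedule can be reproduced by combining simple sets of days. The sketch below filters an every-other-day sequence by the Monday to Friday set and attaches the 8:00 to 10:00 interval; the start date and the function names are assumptions chosen for illustration.

```python
from datetime import date, datetime, time, timedelta
from itertools import islice


def every_other_day(start: date):
    """Endless sequence: start, start + 2 days, start + 4 days, ..."""
    day = start
    while True:
        yield day
        day += timedelta(days=2)


def is_monday_to_friday(day: date) -> bool:
    return day.weekday() < 5  # Monday is 0, Friday is 4


def schedule(start: date, count: int = 6):
    """Intersect the two day sets and attach the daily 8:00-10:00 interval."""
    days = (d for d in every_other_day(start) if is_monday_to_friday(d))
    return [(datetime.combine(d, time(8, 0)), datetime.combine(d, time(10, 0)))
            for d in islice(days, count)]


start = date(2004, 5, 3)  # assumption: an arbitrary Monday
assert start.weekday() == 0
slots = schedule(start)

# Six occurrences on Mo, We, Fr, Tu, Th, Mo; weekend hits are skipped.
assert [low.weekday() for low, _ in slots] == [0, 2, 4, 1, 3, 0]
```

Limiting the result to six occurrences plays the role of the outer bound interval in the figure.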
History
• sometimes need to add a “valid time” to a data element
  • called History Item (HXIT)
• can collect a list of valid-time annotated values as a history
  • i.e. a LIST<HXIT<T>>
  • called a History (HIST)
Uncertainty
• need to annotate a value with some sense of (un-)certainty
• discrete values (e.g. diagnosis)
  • code annotated with probability number (percentage) (UVP)
  • non-parametric probability distribution (NPPD), i.e. list of alternative UVP values
• continuous values (e.g. PQ)
  • Parametric Probability Distribution (PPD)
  • expected value extended with standard deviation
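Both kinds of uncertainty annotation are small wrappers around a value. A minimal sketch with hypothetical class and field names and made-up example values; the distribution type of a PPD is left out for brevity:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UVP:
    value: str          # e.g. a diagnosis code
    probability: float  # between 0 and 1


@dataclass
class PPD:
    value: float               # expected value
    standard_deviation: float
    unit: str


def is_valid_nppd(alternatives: List[UVP]) -> bool:
    """Probabilities must lie in [0, 1] and must not add up to more than 1."""
    probabilities = [a.probability for a in alternatives]
    in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    return in_range and sum(probabilities) <= 1.0 + 1e-9


def most_likely(alternatives: List[UVP]) -> UVP:
    return max(alternatives, key=lambda a: a.probability)


# Discrete case: alternative diagnoses with probabilities (made-up values).
nppd = [UVP("diagnosis-A", 0.7), UVP("diagnosis-B", 0.2), UVP("diagnosis-C", 0.1)]

# Continuous case: a measurement with expected value and standard deviation.
glucose = PPD(value=100.0, standard_deviation=5.0, unit="mg/dL")
```

A consumer that ignores uncertainty can still read `most_likely(nppd).value` or `glucose.value` as an ordinary value.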
Take Home Points
• A small set of principal data types
• Each may have a few variations and helpers
• Extensions and combinations
• High level data types relevant for health care data
thank you
http://aurora.regenstrief.org/v3dt/tutorial.ppt