Using Electronic Medical Records
for Research: Practical Issues and
Implementation Hurdles
Prakash M. Nadkarni MD
1
Benefits of EMRs
Most of the data that you want is often in
the EMR
Sample Size Analyses
Cohort identification/recruitment
Detail Data
You can implement many research related
workflows
Appointment scheduling enables interventions
at the patient's convenience.
2
EMRs don't do everything
Even Epic warns you about the need to
interoperate with software designed
specifically for clinical research (CRIS=Clinical
Research Information System).
Even CRISs are sub-specialized: Project
management/finance, grant management
workflows, federal paperwork (FDA
Investigational New Drug applications),
general or specialized data capture (e.g.,
patient diaries, adaptive questionnaires).
3
Challenge: No Study Calendar
Not all patients are enrolled at the same time.
Specific evaluations or interventions are done
at specific time points ("events") relative to
the start of participation in the study (or some
arbitrary point, e.g., working backwards from
a scheduled MRI scan).
Each time point may have a permissible
range or window (e.g., the "6-month follow-up"
may occur between 5 and 7 months).
Given a protocol/study calendar, a CRIS will
*generate* a provisional patient calendar.
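A minimal sketch of how a provisional patient calendar could be generated from a protocol calendar. The class and function names, the two-event protocol, and the 30-day approximation of a month are illustrative assumptions, not taken from any particular CRIS. A negative offset would express "working backwards" from an anchor such as a scheduled scan.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ProtocolEvent:
    name: str
    offset_days: int   # target day, counted from the anchor date
    early_days: int    # permissible days before the target
    late_days: int     # permissible days after the target


def patient_calendar(anchor, protocol):
    """Return {event name: (earliest, target, latest)} for one patient."""
    calendar = {}
    for ev in protocol:
        target = anchor + timedelta(days=ev.offset_days)
        calendar[ev.name] = (
            target - timedelta(days=ev.early_days),
            target,
            target + timedelta(days=ev.late_days),
        )
    return calendar


def in_window(calendar, event_name, visit_date):
    """True if the visit falls inside the permissible window for the event."""
    earliest, _, latest = calendar[event_name]
    return earliest <= visit_date <= latest


# Hypothetical two-event protocol; months are approximated as 30 days.
PROTOCOL = [
    ProtocolEvent("Baseline", 0, 0, 7),
    ProtocolEvent("6-month follow-up", 180, 30, 30),
]

# The anchor is this patient's own enrollment date, so each patient
# gets a different calendar from the same protocol.
calendar = patient_calendar(date(2024, 1, 15), PROTOCOL)
for name, (earliest, target, latest) in calendar.items():
    print(f"{name}: target {target}, window {earliest} to {latest}")
```

The same `in_window` check is what an "out-of-range visits" report would apply to each recorded visit.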
4
Study Calendar (2)
The protocol is worked out based on information
yield of the evaluation and expected rate of
change in the parameters evaluated, evaluation
cost and patient risk. An Event-CRF Cross-Table
enforces consistency.
CRISs use "Unscheduled" events to deal with
emergency conditions.
An entire set of reports is calendar-driven – e.g.,
scheduled events, missing forms, out-of-range
visits.
In Epic, the closest to Calendar functionality is the
Chemotherapy module (Beacon)
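As an illustration of the Event-CRF cross-table and the calendar-driven "missing forms" report mentioned on this slide, here is a small sketch. The event names, form names, and function names are hypothetical; a real CRIS would hold the cross-table in its database rather than in a dictionary.

```python
# Hypothetical Event-CRF cross-table: which forms are due at which event.
EVENT_CRFS = {
    "Baseline": {"Demographics", "Medical history", "Vitals"},
    "6-month follow-up": {"Vitals", "Adverse events"},
}


def missing_forms(received):
    """received: {event name: set of CRF names already entered}.

    Returns {event name: sorted list of forms still missing},
    omitting events that are complete.
    """
    report = {}
    for event, expected in EVENT_CRFS.items():
        missing = expected - received.get(event, set())
        if missing:
            report[event] = sorted(missing)
    return report


def unexpected_forms(received):
    """Forms entered against an event where the cross-table does not allow them."""
    report = {}
    for event, forms in received.items():
        extra = forms - EVENT_CRFS.get(event, set())
        if extra:
            report[event] = sorted(extra)
    return report


received = {
    "Baseline": {"Demographics", "Vitals"},
    "6-month follow-up": {"Vitals", "Adverse events", "Demographics"},
}
print(missing_forms(received))
print(unexpected_forms(received))
```

Checking in both directions is one way the cross-table can enforce consistency: forms that are due but absent, and forms entered where the protocol does not call for them.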
5
Non-adherence to Standards
If a vendor ignores national/international
controlled terminology standards, data
pooling in cross-institutional collaborations
is difficult.
For procedures, Epic does not use Current
Procedural Terminology (CPT). Instead,
procedures are identified by idiosyncratic
abbreviations created by hurried users, which
are hard to interpret except by those users
and vary across institutions.
6
Standards Challenges (2)
Of the 15,000 laboratory tests in our instance of
Epic, only about 8% are currently mapped
to the Logical Observation Identifiers
Names and Codes (LOINC) vocabulary.
Sometimes the same procedure or lab test is
defined more than once in a master table.
The definitions are unhelpful, and one must look at
the actual data to determine which are used, e.g.,
a histogram showing the number of tests performed over a
period of time, and the maximum and minimum values.
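The "look at the actual data" step can be sketched as a per-definition usage summary: how many results each master-table entry has over a period, and the range of values seen. The test identifiers and result rows below are made up for illustration; in practice the rows would come from a query against the results table.

```python
def summarize_usage(results):
    """results: iterable of (test_id, numeric_value) pairs.

    Returns {test_id: {"n": count, "min": lowest, "max": highest}}.
    """
    stats = {}
    for test_id, value in results:
        entry = stats.setdefault(test_id, {"n": 0, "min": value, "max": value})
        entry["n"] += 1
        entry["min"] = min(entry["min"], value)
        entry["max"] = max(entry["max"], value)
    return stats


# Hypothetical master-table IDs for two definitions that carry the same
# label, plus made-up result rows for the period of interest.
MASTER_IDS = ["NA_1", "NA_2"]
rows = [("NA_1", 138.0), ("NA_1", 141.0), ("NA_1", 129.0)]

stats = summarize_usage(rows)
for test_id in MASTER_IDS:
    summary = stats.get(test_id)
    print(test_id, summary if summary else "never resulted in this period")
```

A definition with no results is probably unused; two definitions with very different value ranges may be using different units or specimen types.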
7
Redundancy and heterogeneity
The data may have been stored more
than once, and in different ways, in
different parts of the medical record
BMI is recorded in two different places.
"Uncontrolled" local terminologies
Flowsheets where blood pressure is recorded
redundantly as text "124/82". (Not in UIHC,
fortunately.)
The procedure and lab definition lists are also
only semi-controlled.
8
Duplicate Elements
Pseudo-redundancy: Subtly different data
elements that are given the same label in
the user interface
Baby's birth weight is recorded both at the
time of delivery and at the time of admission
to a NICU. The two are not semantically the
same: with interventions, the former may be
significantly more (or less) than the latter.
9
“Wrong” structure
Much data (discharge summaries, etc.) is
stored as text, requiring human abstraction or
Natural language processing (NLP).
NLP is not 100% accurate, requiring sensitivity
and specificity to be traded off. It is especially
hard with progress notes that are replete with
abbreviations and that may have little
grammatical structure.
Much of the published NLP work relies on
idiosyncrasies of a particular dataset (e.g., the
use of Epic templates) to achieve higher
accuracy, and is not always generalizable.
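To make the sensitivity/specificity trade-off concrete, here is a small sketch that scores an NLP classifier against a human-abstracted gold standard at two decision thresholds. The scores, labels, and thresholds are invented for illustration and do not come from any real system.

```python
def sensitivity_specificity(gold, predicted):
    """gold, predicted: equal-length sequences of booleans, one per note.

    Assumes the gold standard contains at least one positive and one
    negative note, so neither denominator is zero.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up data: gold-standard labels from human abstraction, and the
# confidence score an NLP system assigned to each note.
gold = [True, True, True, False, False, False]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]

for threshold in (0.55, 0.35):
    predicted = [s >= threshold for s in scores]
    sens, spec = sensitivity_specificity(gold, predicted)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With this toy data, lowering the threshold catches every true case but admits a false positive; raising it does the reverse. Which side to favor depends on the use: cohort screening followed by chart review can tolerate false positives more easily than a final analysis dataset can.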
10
The Needle in the Haystack
The Epic schema contains several thousand tables,
many of them unused or with empty fields.
Incomplete or out-of-date documentation.
The first time, one may spend more time locating
a particular data element than actually pulling it
out.
Persons doing data extraction need to add value
by providing signposts and tips, to help others
who have to do the same task later.
Even with a data warehouse, this problem will
recur as long as data definitions are suboptimal.
11
Real-time cohort identification
must be done judiciously
"Best Practice Alerts" can drain system
resources and reduce responsiveness.
Do you really need real-time subject
identification? Would a 24-hour delay be
acceptable? Cases that may need real time:
ICU-related clinical studies; transfusion in preemies.
12
Transforming the Data
The form in which data is recorded in the
EMR is not necessarily the form in which it
is most conveniently analyzed or reported.
Registries often require creating derived
variables
Converting numerical data into categories – e.g.,
binning children by birth weight
Converting numeric values or existence/absence
of data into Yes/No: Is the bilirubin > 5 mg/dL?
Did the neonate receive nitric oxide inhalation
for pulmonary hypertension?
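The derived variables listed above can be sketched as small transformation functions. The birth-weight cut-offs are the conventional ELBW/VLBW/LBW thresholds (under 1000 g, 1500 g, and 2500 g); a given registry may define its bins differently. The function names, the "Unknown" category for missing values, and the order-name string are assumptions made for illustration.

```python
def birth_weight_category(grams):
    """Bin a birth weight in grams into conventional categories."""
    if grams < 1000:
        return "ELBW"   # extremely low birth weight
    if grams < 1500:
        return "VLBW"   # very low birth weight
    if grams < 2500:
        return "LBW"    # low birth weight
    return "Normal or above"


def exceeds(value, threshold):
    """Numeric value to Yes/No; a missing value is reported as Unknown."""
    if value is None:
        return "Unknown"
    return "Yes" if value > threshold else "No"


def received(order_names, item):
    """Existence/absence of an order to Yes/No."""
    return "Yes" if item in order_names else "No"


# Made-up neonate record.
record = {
    "birth_weight_g": 1320,
    "bilirubin_mg_dl": 6.2,
    "orders": {"Inhaled nitric oxide", "Caffeine citrate"},
}
derived = {
    "bw_category": birth_weight_category(record["birth_weight_g"]),
    "bilirubin_gt_5": exceeds(record["bilirubin_mg_dl"], 5.0),
    "nitric_oxide": received(record["orders"], "Inhaled nitric oxide"),
}
print(derived)
```

Note that absence of an order in the extract is treated here as "No"; whether that is safe depends on how complete the order data is.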
13
Interfacing with statistical
software
Before: sample size, randomization
After: Analysis, fitting to models
Some CRISs (e.g., REDCap, TrialDB) will output
SAS/SPSS-formatted data files, with definitions
for all variables (including enumerations for all
categorical variables; SAS has a command
called PROC FORMAT for categorical data).
EMRs still lag.
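As a rough idea of what exporting enumerations for categorical variables involves, the sketch below turns a dictionary of code-to-label mappings into a SAS PROC FORMAT step. It is not the export logic of REDCap or TrialDB; the format names and mappings are hypothetical, and only numeric codes are handled.

```python
def proc_format(formats):
    """formats: {format name: {numeric code: label}}. Returns SAS source text."""
    lines = ["proc format;"]
    for name, mapping in formats.items():
        lines.append(f"  value {name}")
        for code, label in sorted(mapping.items()):
            # SAS escapes a single quote inside a quoted string by doubling it.
            escaped = label.replace("'", "''")
            lines.append(f"    {code} = '{escaped}'")
        lines.append("  ;")
    lines.append("run;")
    return "\n".join(lines)


# Hypothetical enumerations for two categorical variables.
FORMATS = {
    "yesno": {0: "No", 1: "Yes"},
    "bwcat": {1: "ELBW", 2: "VLBW", 3: "LBW", 4: "Normal or above"},
}
sas_source = proc_format(FORMATS)
print(sas_source)
```

The generated text would accompany the data file so that the statistical package displays labels rather than raw codes.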
14
Data Warehouse
A database that is optimized for fast query,
preferably by end-users, without interactive
updates
Solves some problems, but not others
More homogeneous structure – i.e., a handful of
tables rather than thousands.
However, the problem of locating variables of
interest doesn't go away. With indifferent
documentation of the variables, the problem of
hunting for variables of interest is transferred
from the concierge/analyst to the end-user, which
may worsen the problem.
15
Special Challenges in EMR Data
Interpretation /Reliability
Data entry errors in source data, often a
consequence of “copy and paste”.
Coding of categorical variables does not
accommodate nuances in the medical history or
diagnostic findings.
Depending on the source, billing data may have
been up-coded (Humana).
Outcome data may be lacking – absence of return
visit data may simply mean that patient failed to
improve and went elsewhere.
16
Special Challenges (2)
Data fragmentation – especially where healthcare
is provided by separate institutions.
Data is observational – treatments and exposures
are not assigned randomly.
Confounding Bias – socioeconomic factors might
lead patients to use suboptimal treatments
Selection/sampling Bias – atypical demographic
attributes of the cohort whose data you are
seeing may limit the inferences that you can make
about the general population.
17
Frontiers: Genetic Data
There are no technical barriers to the
incorporation of limited genetic data for an
individual (e.g., SNPs or specific
mutations) in structured (i.e., readily
analyzable) form.
The major current issue is the limited
understanding of genetic data and
definitions by EMR vendors.
Whole-genome data is still a long way off. A
single record would be larger than the
bulk of existing non-image EMR data.
18
Conclusions
None of the challenges are insurmountable,
but they take a lot of effort and resources
to address
Most of the fixes are long-term, involving:
Manual mapping to controlled vocabulary
terms
Change in processes
Maintaining descriptive documentation that
must continually be checked for usability and
currency.
19
Further Reading
Masys DR, et al. Technical desiderata for the
integration of genomic data into Electronic
Health Records. J Biomed Inform. 2012
Jun;45(3):419-22.
Nadkarni, Ohno-Machado and Chapman. Natural
Language Processing: A Tutorial. Journal of the
American Medical Informatics Association, 2011.
PMC3168328.
Hoffman & Podgurski. "Big, bad data". Journal of
Law, Medicine and Ethics, (2013) 41:1, pp 56-60.
http://www.ncvhs.hhs.gov/130430b6.pdf
20
Questions?
21