Using Electronic Medical Records Systems for Clinical Research

Using Electronic Medical Records Systems for Clinical Research: Benefits and Challenges

Prakash M. Nadkarni

Introduction

• Opportunities: availability of clinical, financial and administrative data in electronic form.
• Challenges:
  • Using EMR software for research operations.
  • Using EMR data for research: how suitable is care-oriented data for clinical research needs?
  • Querying EMRs directly to answer research questions.

EMR/Clinical Research Information System (CRIS) Differences: Research Subjects

• Subjects are not necessarily “patients”.
• Personal Health Information (PHI) may be optional.
• Not all screened subjects are enrolled.
• A subject may be enrolled in more than one study, simultaneously or sequentially.
• Eligibility criteria determine which screened subjects can be enrolled.

EMR/CRIS Differences: The Study Calendar

• Events/visits and the study calendar: specific evaluations or interventions are performed at specific time points ("events") relative to the start of the study.
• Not all patients are enrolled at the same time, so each subject's calendar is anchored to that subject's own enrollment date.
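As a minimal sketch of how a per-subject study calendar can be computed, assume a hypothetical protocol expressed as day offsets from enrollment, each with a tolerance window. The event names, offsets and windows are made up for illustration:

```python
from datetime import date, timedelta

# Hypothetical protocol: event name -> (day offset from enrollment, allowed window in days)
PROTOCOL_EVENTS = {
    "Baseline": (0, 0),
    "Week 4": (28, 3),
    "Week 12": (84, 7),
}

def subject_calendar(enrollment_date: date) -> dict[str, tuple[date, date, date]]:
    """Return (earliest, target, latest) visit dates for each event, for one subject."""
    calendar = {}
    for event, (offset, window) in PROTOCOL_EVENTS.items():
        target = enrollment_date + timedelta(days=offset)
        calendar[event] = (target - timedelta(days=window), target, target + timedelta(days=window))
    return calendar

# Two subjects enrolled on different dates get different target dates for the same event.
for enrolled in (date(2024, 1, 1), date(2024, 3, 1)):
    print(enrolled, subject_calendar(enrolled)["Week 12"][1])
```

The protocol is defined once, in relative terms; absolute dates exist only per subject, which is what lets staggered enrollment work.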

EMR/CRIS Differences: Electronic Data Capture (EDC)

• CRIS EDC is far more structured and fine-grained than EMR data capture; free-text comments are only a last resort.
• CRISs may need to support real-time self-reporting of subject data.
• Conversely, CRIS EDC is not always real-time.
• Quality-control considerations dictate many workflow steps.
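A minimal sketch of structured, fine-grained EDC with automated edit checks, one common quality-control step. The `FieldDef` format, the form and the ranges are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldDef:
    """One structured case-report-form item (hypothetical definition format)."""
    name: str
    dtype: type
    low: Optional[float] = None
    high: Optional[float] = None
    required: bool = True

# Hypothetical vital-signs form: discrete, typed, range-checked fields.
VITALS_FORM = [
    FieldDef("systolic_bp", int, 60, 260),
    FieldDef("heart_rate", int, 20, 250),
    FieldDef("comment", str, required=False),  # free text only as a last resort
]

def validate(form: list[FieldDef], entry: dict) -> list[str]:
    """Return edit-check failures (data queries) for one form submission."""
    issues = []
    for f in form:
        value = entry.get(f.name)
        if value is None:
            if f.required:
                issues.append(f"{f.name}: missing")
            continue
        if not isinstance(value, f.dtype):
            issues.append(f"{f.name}: expected {f.dtype.__name__}")
        elif (f.low is not None and value < f.low) or (f.high is not None and value > f.high):
            issues.append(f"{f.name}: {value} outside [{f.low}, {f.high}]")
    return issues

print(validate(VITALS_FORM, {"systolic_bp": 300, "heart_rate": 72}))
```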

EMR/CRIS Differences: Trans-Institutional Scope

• For trans-institutional scope, Web technology is virtually mandated.
• Site restriction in multi-site studies: end users and investigators access only their own site's patients.
• Trans-national issues: software localization/globalization (the same software, with a different language and layout).

EMR/CRIS Differences: User Roles

• CRISs support differential access to studies: most users of a CRIS are unaware of the other studies in the same database.
• Some users have read-only access to the data; some can only view reports.
• Only certain users may be allowed to enter data in particular forms, or even to view certain "blinded" data.
• Data analysts typically do not need access to PHI. However, in multi-institutional studies, they are typically not site-restricted (see Trans-Institutional Scope, above).
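A minimal sketch of how study assignment, site restriction and PHI masking might combine when filtering records for a user. The role names, field names, and the rule that analysts see all sites but no PHI are assumptions modeled on the slide, not a specific CRIS's security model:

```python
from dataclasses import dataclass, field

PHI_FIELDS = {"name", "mrn", "birth_date"}  # hypothetical PHI columns

@dataclass
class User:
    name: str
    role: str                                        # hypothetical roles: "investigator", "analyst"
    studies: set[str]                                # studies this user is assigned to
    sites: set[str] = field(default_factory=set)     # sites this user may see (ignored for analysts)

def visible_records(user: User, records: list[dict]) -> list[dict]:
    """Apply study, site and PHI restrictions for one user."""
    result = []
    for rec in records:
        if rec["study"] not in user.studies:
            continue  # users never see studies they are not assigned to
        if user.role != "analyst" and rec["site"] not in user.sites:
            continue  # site restriction applies to everyone except analysts
        if user.role == "analyst":
            rec = {k: v for k, v in rec.items() if k not in PHI_FIELDS}  # strip PHI for analysts
        result.append(rec)
    return result

records = [
    {"study": "S1", "site": "A", "mrn": "1", "value": 5},
    {"study": "S1", "site": "B", "mrn": "2", "value": 7},
    {"study": "S2", "site": "A", "mrn": "3", "value": 9},
]
print(visible_records(User("ann", "analyst", {"S1"}), records))
```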

EMR/CRIS Differences: Summary

• EMRs are intended primarily to support patient care, not research; CRISs are designed specifically for research protocols.
• EMRs may inter-operate with CRISs, e.g., through their laboratory, pharmacy and scheduling sub-systems.
• An EMR *may* be used for structured EDC in intra-institutional studies if the only alternative is paper, or if data entry would otherwise be duplicated.
• Claims by any EMR vendor that its system is CRIS-capable should be viewed skeptically.

EMR Data for Research: The Nature of EMR Data

• Significant dependence on narrative text, which is often the gold standard for clinical findings.
• Use of administrative/billing data as a surrogate for clinical data.
• Miscoding and variations in coding.

Using EMR Data for Research

• Primarily useful for hypothesis suggestion/generation rather than confirmation:
  • The sample size may be too small to achieve statistical significance.
  • Most data-mining tests show only association, which does not prove causation.
• Selection of patients matching complex criteria, e.g., for sample-size projections for a planned study. This is a strength of i2b2: no IRB approval is needed because only anonymized data is returned.
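A minimal sketch of a count-only cohort query in the spirit of the i2b2 query tool, where criteria are grouped into panels (items within a panel are ORed, panels are ANDed, and a panel may be excluded). The concept codes and patient data are hypothetical, and a real system runs this against a database rather than an in-memory dictionary:

```python
# Hypothetical de-identified facts: patient number -> set of coded concepts.
FACTS = {
    1: {"DX:diabetes", "RX:metformin"},
    2: {"DX:diabetes", "RX:insulin", "DX:ckd"},
    3: {"DX:hypertension"},
    4: {"DX:diabetes"},
}

def cohort_count(panels: list[tuple[set[str], bool]]) -> int:
    """Count patients satisfying every panel.

    Each panel is (concepts, exclude). A patient satisfies a panel if they have ANY of
    its concepts; if exclude is True, the panel is negated. Only an aggregate count is
    returned, never patient identifiers.
    """
    count = 0
    for concepts in FACTS.values():
        if all(bool(concepts & panel) != exclude for panel, exclude in panels):
            count += 1
    return count

# Diabetes AND (metformin OR insulin) AND NOT chronic kidney disease
print(cohort_count([
    ({"DX:diabetes"}, False),
    ({"RX:metformin", "RX:insulin"}, False),
    ({"DX:ckd"}, True),
]))
```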

Medical Natural Language Processing 101

• NLP is concerned with extraction of meaningful information from human-language input.
• The ultimate goal is to transform unstructured text into a structured form.
• Most NLP applications are targeted toward specific goals, e.g., identification of medications or adverse drug events.
• NLP is not 100% accurate.

Medical NLP 101: Symbolic/Rule-Based Approaches

• Linguistic/symbolic NLP approaches employ hand-crafted grammar rules to parse text into units of speech (symbols), which are then processed further.
• Still used successfully for limited problems.
• This approach does not always scale: it is labor-intensive, produces ambiguous parses, and gives poor results with telegraphic text.
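As an illustration of a hand-crafted rule for a limited problem, the sketch below extracts drug/dose/unit/frequency mentions with a single regular-expression pattern. The drug lexicon and the pattern are hypothetical and far from complete:

```python
import re

# Hypothetical mini-lexicon; real systems use large curated drug vocabularies.
DRUGS = ["metformin", "lisinopril", "aspirin"]

# Hand-crafted rule: <drug> <number> <unit> [<frequency>]
MED_RULE = re.compile(
    r"\b(?P<drug>" + "|".join(DRUGS) + r")\s+"
    r"(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g)\b"
    r"(?:\s+(?P<freq>daily|bid|tid|qid))?",
    re.IGNORECASE,
)

def extract_medications(text: str) -> list[dict]:
    """Return one dict per span of text that matches the rule."""
    return [m.groupdict() for m in MED_RULE.finditer(text)]

note = "Continue Metformin 500 mg bid; start lisinopril 10mg daily. Aspirin held."
print(extract_medications(note))
```

A telegraphic variant such as "metformin 500" (no unit) is silently missed; covering such variants means writing and maintaining more rules, which is why this approach scales poorly.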

Medical NLP 101: Statistical NLP

• Relies on large bodies of text that humans have annotated with the correct answers.
• Uses probabilistic methods for prediction: the larger and more representative the training data, the better the results.
• Approaches include Support Vector Machines (SVMs), Hidden Markov Models (HMMs), and Conditional Random Fields (CRFs).
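To show the statistical approach on a toy scale, the sketch below estimates a Hidden Markov Model from a tiny hand-annotated corpus (counts with add-one smoothing) and tags a new sentence with the Viterbi algorithm. The corpus and the MED/O tag set are hypothetical; real systems train on far larger annotated collections:

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-annotated training sentences: (word, tag) pairs.
TRAINING = [
    [("start", "O"), ("aspirin", "MED"), ("daily", "O")],
    [("stop", "O"), ("aspirin", "MED")],
    [("start", "O"), ("heparin", "MED"), ("now", "O")],
]

def train_hmm(sentences):
    """Estimate add-one-smoothed HMM log-probabilities from annotated data."""
    tags = sorted({t for s in sentences for _, t in s})
    vocab = {w for s in sentences for w, _ in s}
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for s in sentences:
        start[s[0][1]] += 1
        for (_, prev), (_, cur) in zip(s, s[1:]):
            trans[prev][cur] += 1
        for w, t in s:
            emit[t][w] += 1
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves probability mass for unseen words
    log_start = {t: math.log((start[t] + 1) / (len(sentences) + n_tags)) for t in tags}
    log_trans = {p: {c: math.log((trans[p][c] + 1) / (sum(trans[p].values()) + n_tags))
                     for c in tags} for p in tags}
    def log_emit(t, w):
        return math.log((emit[t][w] + 1) / (sum(emit[t].values()) + n_words))
    return tags, log_start, log_trans, log_emit

def viterbi(words, tags, log_start, log_trans, log_emit):
    """Return the most probable tag sequence for `words`."""
    best = [{t: log_start[t] + log_emit(t, words[0]) for t in tags}]
    back = [{}]
    for w in words[1:]:
        scores, ptrs = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + log_trans[p][t])
            scores[t] = best[-1][prev] + log_trans[prev][t] + log_emit(t, w)
            ptrs[t] = prev
        best.append(scores)
        back.append(ptrs)
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptrs in reversed(back[1:]):  # follow back-pointers from the last word to the first
        path.append(ptrs[path[-1]])
    return path[::-1]

model = train_hmm(TRAINING)
print(viterbi(["stop", "heparin", "daily"], *model))  # ['O', 'MED', 'O']
```

The word "warfarin" never appears in the training data, yet `viterbi(["start", "warfarin"], *model)` still tags it MED, purely because MED usually follows "start" in the annotated examples.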

Medical NLP 101: Subproblems

• NLP software typically works as a pipeline of modules: modules for low-level tasks precede those for high-level tasks.
• Low-level tasks:
  • Segmentation: sentence and word boundary detection, problem-specific boundary detection
  • Part-of-speech tagging
  • Morphological decomposition of compound words
  • Aggregation: identification of phrases
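A minimal sketch of the pipeline idea: each module takes the document produced by the previous one and adds a layer of annotation, so low-level modules (here, naive sentence segmentation and tokenization) must run before anything that builds on their output. The dictionary-based document format and the splitting rules are assumptions for illustration:

```python
import re
from typing import Callable

Doc = dict  # hypothetical document representation shared by all modules

def segment_sentences(doc: Doc) -> Doc:
    """Low-level module: naive sentence boundary detection after '.' or ';'."""
    parts = re.split(r"(?<=[.;])\s+", doc["text"])
    doc["sentences"] = [p.strip() for p in parts if p.strip()]
    return doc

def tokenize(doc: Doc) -> Doc:
    """Low-level module: word boundary detection; needs sentences from the previous step."""
    doc["tokens"] = [re.findall(r"\w+|[^\w\s]", s) for s in doc["sentences"]]
    return doc

def run_pipeline(text: str, modules: list[Callable[[Doc], Doc]]) -> Doc:
    doc: Doc = {"text": text}
    for module in modules:  # order matters: each module consumes earlier modules' output
        doc = module(doc)
    return doc

result = run_pipeline("No chest pain. BP 120/80 today.", [segment_sentences, tokenize])
print(result["tokens"])
```

The naive splitter breaks after abbreviations such as "Pt.", which is why problem-specific boundary detection appears as a task in its own right.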

Medical NLP 101: Subproblems (2)

• High-level tasks:
  • Spelling and grammatical error correction
  • Named Entity Recognition, including medical concept recognition
  • Word/abbreviation disambiguation
  • Negation and uncertainty identification
  • Relationship extraction
  • Temporal inferencing
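A simplified sketch of negation identification in the style of NegEx: trigger words negate a concept that appears within a few tokens after them, and terms such as "but" end a trigger's scope. The trigger and terminator lists and the five-token window are assumptions for illustration:

```python
import re

# Hypothetical, deliberately tiny term lists; NegEx-style systems use much larger curated lists.
NEGATION_TRIGGERS = {"no", "denies", "without", "negative"}
SCOPE_TERMINATORS = {"but", "however"}
SCOPE = 5  # assumed window: a trigger can negate a concept up to 5 tokens later

def is_negated(sentence: str, concept: str) -> bool:
    """Return True if the first mention of `concept` falls inside a negation trigger's scope."""
    tokens = re.findall(r"\w+", sentence.lower())
    target = concept.lower().split()
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            # Walk backwards from the concept; a terminator closes any earlier trigger's scope.
            for tok in reversed(tokens[max(0, i - SCOPE):i]):
                if tok in SCOPE_TERMINATORS:
                    return False
                if tok in NEGATION_TRIGGERS:
                    return True
            return False
    raise ValueError(f"concept {concept!r} not found in sentence")

print(is_negated("Patient denies chest pain.", "chest pain"))         # True
print(is_negated("No fever, but reports chest pain.", "chest pain"))  # False
```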

Medical NLP: Practical Issues

• Changing the workflow and introducing structure can eliminate a difficult problem.
• Reuse code to avoid reinventing the wheel.
• General vs. specific solutions.
• Tools need commoditization.

Querying EMR Data: Technological Considerations

• A database cannot be designed simultaneously for rapid query and for efficient interactive, multi-user updates.
• EMR database designs are transaction-oriented.
• EMRs are optimized for "patient/entity-centric" queries (all data for one patient), not "attribute-centric" queries (all patients with a given finding or value).

Data Warehousing 101

• Principle: operating on a separate, read-only copy of the data, on separate hardware, yields better query performance.
• Structural tweaks include adding redundant (denormalized) data and precomputing aggregate values.
• Special types of indexes (e.g., bitmap indexes) yield improved query performance.
• "Star schemas" characterize most warehouse designs.
• Farmers vs. explorers (Inmon).
• "Virtual" integration ("federation").
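A minimal star-schema sketch using Python's built-in `sqlite3` module: a central fact table of lab results keyed to patient, test and date dimension tables, plus a precomputed aggregate table of the kind a warehouse builds ahead of time. Table names, columns and values are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, sex TEXT, birth_year INTEGER);
CREATE TABLE dim_test    (test_key INTEGER PRIMARY KEY, test_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Central fact table: one row per lab result, with a foreign key to each dimension
CREATE TABLE fact_lab (
    patient_key INTEGER REFERENCES dim_patient,
    test_key    INTEGER REFERENCES dim_test,
    date_key    INTEGER REFERENCES dim_date,
    value       REAL
);
""")
con.executemany("INSERT INTO dim_patient VALUES (?, ?, ?)", [(1, "F", 1950), (2, "M", 1962)])
con.executemany("INSERT INTO dim_test VALUES (?, ?)", [(1, "HbA1c"), (2, "Creatinine")])
con.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(20240115, 2024, 1), (20240210, 2024, 2)])
con.executemany("INSERT INTO fact_lab VALUES (?, ?, ?, ?)", [
    (1, 1, 20240115, 7.2), (2, 1, 20240115, 6.1), (1, 1, 20240210, 6.8), (2, 2, 20240210, 1.1),
])

# Precompute an aggregate once (count and mean per test per month) so later queries skip the facts.
con.execute("""
CREATE TABLE agg_monthly AS
SELECT t.test_name, d.year, d.month, COUNT(*) AS n, AVG(f.value) AS mean_value
FROM fact_lab f
JOIN dim_test t ON f.test_key = t.test_key
JOIN dim_date d ON f.date_key = d.date_key
GROUP BY t.test_name, d.year, d.month
""")
for row in con.execute("SELECT * FROM agg_monthly WHERE test_name = 'HbA1c' ORDER BY month"):
    print(row)
```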

Data Warehousing: Practical Considerations

• After a warehouse is built, the need to create custom reports may increase rather than decrease.
• The critical requirement for effective ad hoc query is a comprehensive understanding of the data; this is generally a full-time effort.

Special Considerations: Querying of Clinical Data

• Both EMRs and large-scale CRISs typically store clinical data in Entity-Attribute-Value (EAV) form:
  • Hundreds of thousands of clinical parameters exist across all medical domains.
  • The vast majority of parameters are inapplicable to any particular subject/patient, so a conventional table with one column per parameter would be mostly empty.
  • EAV is a triple: Entity = patient + point in time; Attribute = parameter; Value = the value of that parameter.
• Epic flowsheet data uses EAV.
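A minimal sketch of EAV rows and of the patient-centric versus attribute-centric query styles. The data are hypothetical, and with an in-memory list both queries are full scans; in a real database the difference lies in which indexes exist (e.g., by entity versus by attribute):

```python
from collections import defaultdict

# EAV rows: (entity, attribute, value), where the entity is (patient, time point).
# Patient IDs, dates and parameter names are hypothetical.
eav_rows = [
    (("P1", "2024-01-15"), "systolic_bp", 142),
    (("P1", "2024-01-15"), "heart_rate", 88),
    (("P2", "2024-01-16"), "systolic_bp", 118),
    (("P2", "2024-01-16"), "temperature_c", 38.4),
]

def patient_view(rows, patient_id):
    """Patient-centric query: everything recorded for one patient."""
    return [(ts, attr, val) for (pid, ts), attr, val in rows if pid == patient_id]

def attribute_view(rows, attribute, predicate):
    """Attribute-centric query: all patients whose value of one parameter meets a condition."""
    return sorted({pid for (pid, _), attr, val in rows if attr == attribute and predicate(val)})

def pivot(rows):
    """Convert EAV rows to one wide record per entity; inapplicable parameters are simply absent."""
    wide = defaultdict(dict)
    for entity, attr, val in rows:
        wide[entity][attr] = val
    return dict(wide)

print(attribute_view(eav_rows, "systolic_bp", lambda v: v >= 140))
```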

Standardization

• The mere presence of structure does not solve all problems.
• Synonyms in narrative text are unavoidable and must be reduced to the same concept; controlled medical vocabularies (e.g., the UMLS) help.
• The UMLS is not a panacea; institutions will therefore evolve their own internal controlled vocabularies.
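A minimal sketch of reducing synonyms to a single concept through a local controlled vocabulary. The concept IDs (prefixed LOCAL:) and synonym lists are hypothetical placeholders, not UMLS identifiers:

```python
import re
from typing import Optional

# Hypothetical internal vocabulary: concept ID -> (preferred term, synonyms).
# Real mappings would draw on a source such as the UMLS plus local additions.
VOCABULARY = {
    "LOCAL:0001": ("myocardial infarction", ["mi", "heart attack", "myocardial infarct"]),
    "LOCAL:0002": ("hypertension", ["htn", "high blood pressure"]),
}

# Build a lookup from every normalized surface form to its concept ID.
LOOKUP = {}
for concept_id, (preferred, synonyms) in VOCABULARY.items():
    for term in [preferred, *synonyms]:
        LOOKUP[term] = concept_id

def normalize(term: str) -> Optional[str]:
    """Map a free-text term to a concept ID, or None if it is not in the vocabulary."""
    key = re.sub(r"\s+", " ", term.strip().lower())
    return LOOKUP.get(key)

print(normalize("Heart  Attack"), normalize("HTN"), normalize("angina"))
```

Unmapped terms (here, "angina") return None; in practice these are the gaps that push institutions toward maintaining their own vocabulary additions.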

Standardization Considerations

• Standardizing your definitions: the second law of thermodynamics applies, since definitions left unmanaged drift toward disorder.
• Poor definition quality becomes a problem if pooled-data analysis (or meta-analysis) is intended.
• Features of certain systems predispose to disorder (e.g., Learn As You Go, separate definitions databases).
• Even the best system is not immune: people follow the path of least resistance.
• Consistent definitions are difficult to achieve after the fact (Deming).

EMR Use as the Basis for Research Hypotheses

• Conflicting evidence regarding EMR benefit still appears.
• A *well-designed* EMR may be beneficial.
• Electronic alerting systems by themselves may not improve care unless EMRs also reduce workload through automatic actions.
• Review vendor-supplied templates carefully.

Conclusions: Future EMR Evolution

• EMRs that fully support CRIS capability are unlikely to evolve.
• No software should attempt to do everything:
  • Storage engines differ in their capabilities.
  • A jack-of-all-trades approach (doing everything in a mediocre manner) is not viable.
  • It is difficult (or impossible) to devise a logically consistent user-interface metaphor that applies to diverse, unrelated features.
  • Example: Microsoft Office.

Inter-operation (1)

• Co-existing, inter-operating best-of-breed packages (e.g., CRISs, genomic/proteomic data-management packages) offer the best usability and feature set.
• There may be minimal data duplication: e.g., EMRs may pull in very limited summary information on critical genetic data for selected patients, so that it is immediately visible.

Inter-operation (2)

• CRIS/EMR:
  • Bulk import of laboratory parameters, to avoid duplicate data entry.
  • Automatic grading of laboratory-based adverse events (oncology studies; Richesson et al.).
  • Use for scheduling research subject visits.
  • Pharmacy subsystem for drug dispensation.
  • EMR for primary EDC in intra-institutional studies if the only alternative is paper, or if data entry would otherwise be duplicated.
• EMR/specialized EMR:
  • Picture-archiving systems.
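As a sketch of automatic grading of laboratory-based adverse events after a bulk import, the code below grades each value by its ratio to the upper limit of normal (ULN). The cutoffs are hypothetical placeholders written in a ULN-multiple style; real grading must follow the criteria the study protocol specifies:

```python
# Hypothetical cutoffs as (ULN multiple, grade), checked from most to least severe.
# Placeholders only; they are not taken from any published grading criteria.
EXAMPLE_CUTOFFS = [(20.0, 4), (5.0, 3), (3.0, 2), (1.0, 1)]

def grade_lab_value(value: float, uln: float, cutoffs=EXAMPLE_CUTOFFS) -> int:
    """Return the adverse-event grade for one result (0 = not above ULN)."""
    ratio = value / uln
    for threshold, grade in cutoffs:
        if ratio > threshold:
            return grade
    return 0

def grade_bulk(results):
    """Grade imported (subject_id, value, uln) rows in one pass."""
    return [(subject_id, grade_lab_value(value, uln)) for subject_id, value, uln in results]

imported = [("S1", 50, 40), ("S2", 30, 40), ("S3", 240, 40)]  # hypothetical imported lab rows
print(grade_bulk(imported))
```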

Inter-operation (3)

• Application Programming Interfaces (APIs):
  • All large packages (CRISs, EMRs, 'omics systems) require APIs to make inter-operation efficient.
  • APIs are vendor-specific. Inter-operation standards (e.g., the HL7 Virtual Medical Record) have not gained much traction.
  • Currently, many vendors set unreasonable financial and other barriers to use of their APIs (e.g., official certification, withholding of documentation).
  • EMRs lag behind the software industry's trend toward open source.
