Transcript: SHARP Area 4 - the Mayo Clinic Informatics Website

Strategic Health IT Advanced Research Projects (SHARP)
Area 4: Secondary Use
Dr. Friedman on-site visit, Mayo Clinic
3 September 2010
SHARP: Area 4: Secondary Use of EHR Data
• 14 academic and industry partners
• Develop tools and resources that influence and
extend secondary uses of clinical data
• Cross-integrated suite of projects and products
• Clinical Data Normalization
• Natural Language Processing (NLP)
• Phenotyping (cohorts and eligibility)
• Common pipeline tooling (UIMA) and scaling
• Data Quality (metrics, missing value management)
• Evaluation Framework (population networks)
Collaborations
• Agilex Technologies
• CDISC (Clinical Data Interchange
Standards Consortium)
• Centerphase Solutions
• Deloitte
• Group Health, Seattle
• IBM Watson Research
Labs
• University of Utah
• Harvard Univ. & i2b2
• Intermountain Healthcare
• Mayo Clinic
• Minnesota HIE (MNHIE)
• MIT and i2b2
• SUNY and i2b2
• University of Pittsburgh
• University of Colorado
Themes & Projects
Major Achievements
• Foster social connections across
projects
• Recognition by team members that not
all problems must be solved within their
team
• NLP and phenotypes
• Phenotypes and CEM normalization
• Shared responsibility for overlapping
dependencies
The bookends - Projects 1&6
Data Normalization & Evaluation
Christopher G. Chute
Stan Huff (Peter Haug)
Overview
• Build a generalizable data normalization pipeline (a minimal sketch follows after this list)
• Establish a globally available resource for health
terminologies and value sets
• Establish and expand modular library of
normalization algorithms
• Iteratively test normalization pipelines, including
NLP where appropriate, against normalized
forms, and tabulate discordance.
• Use cohort identification algorithms against both EMR data and EDW data, normalized against CEMs
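As a rough, hypothetical illustration of what the pipeline is meant to do, the Java sketch below maps one "wild-type" lab row onto a drastically simplified CEM-like instance. All class, field, and map names here are invented for this example; real CEMs carry far richer typed attributes, qualifiers, and coded value sets.

```java
import java.util.Map;

// Hypothetical, greatly simplified stand-in for a Clinical Element Model instance.
class SimpleLabCem {
    String patientId;
    String loincCode;   // normalized test code
    double value;
    String unit;        // normalized unit
    String observedAt;  // ISO-8601 timestamp

    @Override
    public String toString() {
        return patientId + " " + loincCode + "=" + value + " " + unit + " @ " + observedAt;
    }
}

public class NormalizationSketch {
    // Illustrative local-code-to-LOINC map; in the project this role would be
    // played by terminology services rather than a hard-coded table.
    static final Map<String, String> LOCAL_TO_LOINC = Map.of("GLU_SER", "2345-7");

    static SimpleLabCem normalize(Map<String, String> wildTypeRow) {
        SimpleLabCem cem = new SimpleLabCem();
        cem.patientId = wildTypeRow.get("mrn");
        cem.loincCode = LOCAL_TO_LOINC.getOrDefault(wildTypeRow.get("local_code"), "UNMAPPED");
        cem.value = Double.parseDouble(wildTypeRow.get("result"));
        cem.unit = wildTypeRow.get("units").replace("mg/dl", "mg/dL"); // trivial unit cleanup
        cem.observedAt = wildTypeRow.get("drawn_ts");
        return cem;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of(
                "mrn", "12345", "local_code", "GLU_SER",
                "result", "108", "units", "mg/dl", "drawn_ts", "2010-09-03T08:15:00");
        System.out.println(normalize(row));
    }
}
```

In the actual framework the hard-coded code map would be replaced by terminology services, and the target structure by formally modeled CEMs.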
Progress
• Designation of Clinical Element Models (CEMs) as
canonical form
• Utilizing use-case scenarios (PAD, CPNA, etc.) for CEM normalization
• Exploration into generalizable CEM models – diagnosis,
medications, labs.
• Development of processes/tools to identify relevant
existing CEM models within CEM libraries
• Development of processes to identify missing CEMs for
data (and classes of data) in use-cases
• Preliminary population of phenotype use-cases
Planned
• Adopt eMERGE EleMap tooling for CEMs to populate the canonical model
• Formalize Meaningful Use vocabularies into LexGrid server
• Design other components of the Data Normalization framework (Terminology Services, NHIN connections); a value-set lookup sketch follows after this list
• Model end-to-end flow needed to produce normalized data
from structured data and unstructured (natural language) data:
• High level description of process for taking “wild-type” data
instances to canonical CEM instances
• Applicability to use-case data as well as to general classes
of data
• Adopt UIMA data flows for normalization services
• Examine Regenstrief and SHARP 3 modules
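To make the terminology-services idea concrete, here is a minimal, hypothetical value-set lookup in Java. The interface, class, and value-set names are invented for illustration (this is not the LexGrid API), though the ICD-9-CM codes shown are real peripheral arterial disease codes.

```java
import java.util.Set;

// Hypothetical value-set service; in the project this would be backed by a
// LexGrid terminology server rather than an in-memory set.
interface ValueSetService {
    boolean isMember(String valueSetId, String code);
}

class InMemoryValueSetService implements ValueSetService {
    // ICD-9-CM: 443.9 peripheral vascular disease unspecified; 440.20/440.21
    // atherosclerosis of native arteries of the extremities.
    private static final Set<String> PAD_ICD9 = Set.of("443.9", "440.20", "440.21");

    @Override
    public boolean isMember(String valueSetId, String code) {
        return "PAD_DX".equals(valueSetId) && PAD_ICD9.contains(code);
    }
}

public class ValueSetSketch {
    public static void main(String[] args) {
        ValueSetService vs = new InMemoryValueSetService();
        System.out.println(vs.isMember("PAD_DX", "443.9"));  // true
        System.out.println(vs.isMember("PAD_DX", "250.00")); // false (diabetes code)
    }
}
```

A server-backed implementation of the same interface would let normalization and phenotyping code stay unchanged while value sets are curated and versioned centrally.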
Project 2
Clinical Natural Language
Processing (cNLP)
Dr. Guergana Savova
Overview
• Overarching goal
• High-throughput phenotype extraction
from clinical free text based on
standards and the principle of
interoperability
• Focus
• Information extraction (IE): transformation of unstructured text into structured representations (CEMs); a minimal UIMA pipeline sketch follows below
• Merging clinical data extracted from free text with structured data
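Since cTAKES components run as UIMA analysis engines, processing a note end to end looks roughly like the sketch below. The descriptor path is a placeholder, the note text is synthetic, and error handling is omitted.

```java
import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.XMLInputSource;

public class ClinicalPipelineSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path: point this at an aggregate analysis engine
        // descriptor, e.g., one assembled from cTAKES components.
        String descriptorPath = "desc/AggregateClinicalPipeline.xml";

        // Parse the XML descriptor and instantiate the analysis engine.
        ResourceSpecifier spec = UIMAFramework.getXMLParser()
                .parseResourceSpecifier(new XMLInputSource(descriptorPath));
        AnalysisEngine engine = UIMAFramework.produceAnalysisEngine(spec);

        // Feed a synthetic clinical note into the CAS and run the pipeline;
        // annotators add named entities and attributes to the CAS, which a
        // downstream step could map onto CEM instances.
        JCas jcas = engine.newJCas();
        jcas.setDocumentText("Patient started on metformin 500 mg twice daily for type 2 diabetes.");
        engine.process(jcas);

        engine.destroy();
    }
}
```

Which annotators run, and in what order, is entirely determined by the aggregate descriptor, which is what makes site-specific NLP tools swappable within the common pipeline.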
Progress
• Detailed 4-year project plan
• Tasks in execution:
• Investigative tasks: (1) defining CEMs and attributes as normalization targets for NLP, (2) defining the set of clinical named entities and their attributes, (3) methods for cNE discovery
• Engineering tasks: (1) defining users, (2) incorporating site NLP tools into cTAKES and UIMA, (3) common conventions and requirements, (4) de-identification flow and data sharing
• Forging cross-SHARP collaborations (SHARP 3, PIs Kohane and Mandl)
Planned
• Y1
• Gold standard for cNEs, relations and CEMs
• Focus on methods for cNE discovery and populating relevant CEMs (many subtasks)
• Projected module releases:
• Medication extraction (Nov’10)
• CEM OrderMedAmb population (Mar’11)
• Deep parser for cTAKES (Nov’10)
• Dependency parser for cTAKES (Jan’11)
• Collaboration with SHARP 3 by providing medication extraction capabilities for the medication SMaRT app
Project 3
High throughput Phenotyping (HTP)
Dr. Jyoti Pathak
Overview
• Overarching goal
• To develop techniques and algorithms that
operate on normalized EMR data to identify
cohorts of potentially eligible subjects on the
basis of disease, symptoms, or related
findings
• Focus
• Portability of phenotyping algorithms (a minimal cohort-filter sketch follows below)
• Representation of phenotyping logic
• Measuring the goodness of EMR data
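As a toy illustration of cohort identification over normalized data, the Java sketch below applies a single PAD-style inclusion rule (ankle-brachial index below 0.9, the conventional cutoff). All record and field names are hypothetical, and the real eMERGE PAD algorithm combines diagnoses, procedures, and exclusion criteria.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical normalized observation; in the project these would be CEM
// instances produced by the normalization and NLP pipelines.
record Observation(String patientId, String code, double value) {}

public class PhenotypeFilterSketch {
    // Toy inclusion rule: any ankle-brachial index (ABI) reading below 0.9.
    static boolean meetsPadCriterion(List<Observation> patientObs) {
        return patientObs.stream()
                .filter(o -> "ABI".equals(o.code()))
                .anyMatch(o -> o.value() < 0.9);
    }

    public static void main(String[] args) {
        List<Observation> obs = List.of(
                new Observation("p1", "ABI", 0.72),
                new Observation("p2", "ABI", 1.05));

        // Group observations by patient and keep patients meeting the rule.
        List<String> cohort = obs.stream()
                .collect(Collectors.groupingBy(Observation::patientId))
                .entrySet().stream()
                .filter(e -> meetsPadCriterion(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());

        System.out.println("Candidate PAD cohort: " + cohort); // [p1]
    }
}
```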
Progress
• Explored use case phenotypes from
eMERGE network for HTP process
validation
• Representation of phenotype
descriptions and data elements using
Clinical Element Models
• Preliminary execution of phenotyping
algorithms (Peripheral Arterial Disease)
to compare aggregate data
Planned
• Interaction and collaboration with Data
Normalization and NLP teams to develop
“data collection widgets”
• Representation of phenotyping execution logic in a machine-processable format/language (sketched below)
• Development of machine learning methods
for semi-automatic cohort identification
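One way to picture such a machine-processable representation: express each criterion as data that can be serialized and shared across sites, with a small interpreter at each site. The Java sketch below is purely illustrative; the field names, operator enum, and the "ABI < 0.9" example are assumptions, not the project's chosen formalism.

```java
// Hypothetical machine-processable phenotyping criterion: the logic lives in
// data (which could be serialized and shared across sites) rather than in
// site-specific code. All names here are illustrative only.
enum Op { LT, LE, GT, GE, EQ }

record Criterion(String codeSystem, String code, Op op, double threshold) {
    boolean isSatisfiedBy(String observedCode, double observedValue) {
        if (!code.equals(observedCode)) return false;
        return switch (op) {
            case LT -> observedValue <  threshold;
            case LE -> observedValue <= threshold;
            case GT -> observedValue >  threshold;
            case GE -> observedValue >= threshold;
            case EQ -> observedValue == threshold;
        };
    }
}

public class PhenotypeLogicSketch {
    public static void main(String[] args) {
        // "Ankle-brachial index below 0.9" expressed as data instead of code.
        Criterion abiLow = new Criterion("LOCAL", "ABI", Op.LT, 0.9);
        System.out.println(abiLow.isSatisfiedBy("ABI", 0.72)); // true
        System.out.println(abiLow.isSatisfiedBy("ABI", 1.05)); // false
    }
}
```

Because the criterion is plain data, the same definition can be executed unchanged at different sites, which is the portability goal named above.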
Project 4
Infrastructure & Scalability
Jeff Ferraro
Marshall Schor
Calvin Beebe
UIMA exploitation
• Some initial discussions on UIMA were held in a
meeting at MIT attended by Peter Szolovits (MIT)
and Guergana Savova (Harvard) and some of
their team members.
• A plan is underway for a UIMA "deep dive" for
other members from Intermountain Health and
Mayo.
• A discussion is pending to understand how UIMA might fit with RPE (in particular, BPEL).
RPE = Retrieve Process for Execution: an IHE (Integrating the Healthcare Enterprise) profile to automate collaborative workflow between healthcare and secondary-use domains.
Infrastructure Progress
• Code repository – reviewed requirements (e.g., SVN); need pre-release work areas for project teams; the bulk of materials will be in a public repository.
• Licensing compatibility discussion – initial discussions on open-source licensing that is consistent with UIMA and other project teams' tooling. Will need to survey teams.
• Initial platform discussions – still working on the Sandbox (“Shared”) environment; need to consider the Cloud in later phases of the project.
Planned
• Review repository options with:
• ONC, Source Forge, Open Health Tools
• Need to establish straw man proposal
for Sandbox configuration.
• Conduct cross-project discussions:
• Inventory tools that can be shared.
• Inventory data that can be shared.
• Identify shared environment site location.
• Initiate high-level requirements gathering.
Project 5
Data Quality
Dr. Kent Bailey
(Kim Lemmerman)
Overview
• Support data quality and ascertain data quality
issues across projects
• Deploy and enhance methods for resolving missing or conflicting data (a simple missingness metric is sketched below)
• Integrate methods into UIMA pipelines
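A minimal sketch of one such metric, per-field missingness over a handful of hypothetical normalized records; the field names and values are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Computes the fraction of records missing each field of interest.
public class MissingnessSketch {
    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
                Map.of("mrn", "p1", "dx_code", "443.9", "dx_date", "2010-06-01"),
                Map.of("mrn", "p2", "dx_code", "440.21"),          // missing dx_date
                Map.of("mrn", "p3", "dx_date", "2010-07-15"));     // missing dx_code

        for (String field : Arrays.asList("mrn", "dx_code", "dx_date")) {
            long missing = records.stream()
                    .filter(r -> !r.containsKey(field) || r.get(field).isBlank())
                    .count();
            double rate = (double) missing / records.size();
            System.out.printf("%s: %.0f%% missing%n", field, 100 * rate);
        }
    }
}
```

Conflict detection and the comparison of expected versus actual quality would layer additional checks over the same kind of record stream, and could run as steps in the shared UIMA pipelines.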
Progress & Planned
• Integrate across projects and gather
requirements and standards to establish data
quality plan and metrics
• Compare expected quality of data to actual data
quality
• Provide recommendations and methods to improve data quality and/or possible outcomes
Cross-Area 4 Program Efforts
Lacey Hart
Progress
• Started early with face-to-face collaboration; cross-pollination of knowledge
• Individual project efforts synergized, with timelines in sync; use cases vetted and determined for the first six months of focus
• IRB and data-sharing issues have been raised, with best practices shared and an inventory of existing agreements between institutions reviewed
Planned
• Best practices for IRB submissions and template protocol material will be made available, with applicable state implications
• Data use agreements will be completed across sites where needed in the short term; effort toward a ‘consortium’ agreement will commence for long-term data sharing needs
Cross-ONC Efforts
Dr. Christopher Chute
SHARP Area Synergies
1. Security: ensure the integrity of pipelined data is not compromised
2. Cognitive: explore how normalized
data and phenotypes can contribute to
decisions
3. Applications: Potential for shared
architectural strategies
Beacon Synergies
• High-throughput data normalization and
phenotyping (SHARP)
• Applied to population laboratory
(Beacon)
• Validate on consented sub-samples
• Potential to include ALL patients in
population area – regardless of provider
SHARP Area 4: More information…
http://sharpn.org
SE MN Beacon: More information…
http://informatics.mayo.edu/beacon