Transcript SHARP High-Throughput Phenotyping Jyoti Pathak, Ph.D.
Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data Project 3: High-Throughput Phenotyping
Project Lead: Jyotishman Pathak, PhD PI: Christopher G. Chute, MD, DrPH
June 12, 2012
Electronic health records (EHRs) driven phenotyping
• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings
SHARPn High-Throughput Phenotyping ©2012 MFMER | slide-2
Current HTP project themes
• Standardization of phenotype definitions
• Library of phenotyping algorithms
• Phenotyping workbench
• Machine learning techniques for phenotyping
• Just-in-time phenotyping
Algorithm Development Process - Modified
• Standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Conversion of structured phenotype criteria into executable queries
• clinical data
  • Create new and re-use existing clinical element models (CEMs)
[Diagram labels: Rules; Semi-Automatic Execution; Evaluation; Phenotype Algorithm; Visualization; Transform; Data; Mappings; NLP, SQL]
[Welch et al. 2012] [Thompson et al., submitted 2012] [Li et al., submitted 2012]
NQF Quality Data Model (QDM)
• Standard of the National Quality Forum (NQF)
  • A structure and grammar to represent quality measures in a standardized format
• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as a set of XML schemas
  • Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm, etc.)
116 Meaningful Use Phase I Quality Measures
Example: Diabetes & Lipid Mgmt. - I Human readable HTML
Example: Diabetes & Lipid Mgmt. - II Computable XML
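A QDM criterion pairs a data element (category and state) with a value set and, optionally, a temporal relation to a reference date. The sketch below models that idea in plain Python rather than in the QDM XML schemas. The class and function names are hypothetical, the codes inside the value set are placeholders (only the OID comes from the slide), and reading "> 1 year(s) starts before or during" as "started more than one year before the reference date" is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ValueSet:
    oid: str
    name: str
    codes: frozenset  # (code_system, code) pairs


@dataclass(frozen=True)
class ClinicalEvent:
    category: str     # e.g. "Diagnosis", "Procedure"
    state: str        # e.g. "Active", "Performed"
    code_system: str
    code: str
    start: date


def matches(event, category, state, value_set):
    """True if the event has the given category/state and its code is in the value set."""
    return (
        event.category == category
        and event.state == state
        and (event.code_system, event.code) in value_set.codes
    )


def starts_before_or_during(event, reference_end, more_than_years=0):
    """Assumed reading of the temporal relation.

    With more_than_years=0 the event must start on or before reference_end.
    Otherwise it must start more than that many years earlier; a year is
    approximated as 365 days to keep the sketch short.
    """
    if more_than_years:
        return event.start < reference_end - timedelta(days=365 * more_than_years)
    return event.start <= reference_end


# The OID is the one quoted on the QDM slide; the member code is a placeholder.
steroid_dm = ValueSet(
    oid="2.16.840.1.113883.3.464.0001.113",
    name="steroid induced diabetes Value Set GROUPING",
    codes=frozenset({("ICD-9", "PLACEHOLDER-1")}),
)

dx = ClinicalEvent("Diagnosis", "Active", "ICD-9", "PLACEHOLDER-1", date(2010, 5, 1))
measurement_end = date(2011, 12, 31)

print(matches(dx, "Diagnosis", "Active", steroid_dm))
print(starts_before_or_during(dx, measurement_end, more_than_years=1))
```

Keeping the value set separate from the criterion mirrors the QDM idea that the same group of codes can be reused across measures.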
Drools-based Phenotyping Architecture
[Diagram components, in order]
• Clinical Element Database
• Data Access Layer
• Transformation Layer: transform physical representation to normalized logical representation (Fact Model)
• Business Logic
• Inference Engine (Drools)
• Service for Creating Output (File, Database, etc.)
• List of Diabetic Patients
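The layering in this architecture can be sketched end to end in a few lines. This is a minimal illustration, not the SHARPn implementation: plain Python functions stand in for the Drools inference engine, and the row layout, the `Fact` class, the code "DM", and every function name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rows, as a data access layer might return them from a
# clinical element database (the physical representation).
RAW_ROWS = [
    {"pid": "P1", "element": "diagnosis", "code": "DM"},
    {"pid": "P2", "element": "diagnosis", "code": "HTN"},
    {"pid": "P3", "element": "diagnosis", "code": "DM"},
]


@dataclass(frozen=True)
class Fact:
    """Normalized logical representation (the fact model)."""
    patient_id: str
    element: str
    code: str


def fetch_rows():
    """Data access layer: read raw rows (here, from an in-memory list)."""
    return list(RAW_ROWS)


def to_facts(rows):
    """Transformation layer: physical rows to normalized facts."""
    return [Fact(r["pid"], r["element"], r["code"]) for r in rows]


def diabetes_rule(fact):
    """Business logic: one rule that fires on a diabetes diagnosis."""
    return fact.element == "diagnosis" and fact.code == "DM"


def infer(facts, rules):
    """Stand-in for the inference engine: patients for whom any rule fires."""
    return {f.patient_id for f in facts if any(rule(f) for rule in rules)}


def write_output(patient_ids):
    """Output service: here, simply a sorted list."""
    return sorted(patient_ids)


diabetic_patients = write_output(infer(to_facts(fetch_rows()), [diabetes_rule]))
print(diabetic_patients)
```

Because the rules only ever see `Fact` objects, the data source can change without touching the business logic, which is the point of separating the transformation layer from the inference engine.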
Automatic translation from NQF QDM criteria to Drools
[Li et al., submitted 2012]
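To make the translation step concrete, the sketch below renders one structured criterion as the text of a Drools rule (DRL). It is an illustration only and not the translator described by Li et al.: the dictionary layout, the function name, and the fact type and its fields (`Diagnosis`, `status`, `valueSetOid`, `patientId`) are hypothetical, and a real rule base would also need matching fact classes on the Java side.

```python
def criterion_to_drl(criterion):
    """Render one structured criterion as the text of a Drools rule."""
    category = criterion["category"]
    state = criterion["state"]
    name = criterion["name"]
    oid = criterion["value_set_oid"]
    lines = [
        # Matching patient ids are collected in a global list.
        "global java.util.List results",
        "",
        f'rule "{category}, {state}: {name}"',
        "when",
        # Pattern over a hypothetical fact type named after the QDM category.
        f'    {category}( status == "{state}",',
        f'        valueSetOid == "{oid}",',
        "        $pid : patientId )",
        "then",
        "    results.add( $pid );",
        "end",
    ]
    return "\n".join(lines)


# Criterion taken from the QDM example earlier in the deck.
criterion = {
    "category": "Diagnosis",
    "state": "Active",
    "name": "steroid induced diabetes",
    "value_set_oid": "2.16.840.1.113883.3.464.0001.113",
}

drl = criterion_to_drl(criterion)
print(drl)
```

Generating rule text from structured criteria, rather than writing rules by hand, is what lets a library of QDM definitions be executed without per-algorithm programming.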
The “executable” Drools flow
Phenotype library and workbench - I http://phenotypeportal.org
1. Convert QDM to Drools
2. Execute rules by querying the CEM database
3. Generate summary reports
Phenotype library and workbench - II http://phenotypeportal.org
Phenotype library and workbench - III
Machine learning and HTP - I
• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the “hard work”
  • Validate against expert-developed ones
[Carroll et al. 2011]
Machine learning and HTP - II
• Origins from sales data
• Items (columns): co-morbid conditions
• Transactions (rows): patients
• Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients.
  • One of those conditions should be DM.
• Support: # of transactions the itemset appeared in
  • Support({TB, DLM, ND})=3
• Frequent: an itemset I is frequent if support(I) > minsup
[Table: patients 001-005 (rows) against conditions TB, DLM, ND, …, IEC (columns); Y marks a condition present for that patient]
[Figure: lattice of itemsets over items A, B, C, D (A, B, C, D, AB, AC, AD, BC, BD, CD, ABD, ACD); X: infrequent]
[Simon et al. 2012]
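The definitions above (support, minsup, and the requirement that one condition be DM) can be sketched with a brute-force search. The patient rows below are invented for illustration and are not the table from the slide; they are merely chosen so that Support({TB, DLM, ND}) = 3, the one value the slide states. The function names are hypothetical, and the search enumerates combinations level by level rather than using Apriori-style candidate generation.

```python
from itertools import combinations

# Invented toy data: patient id -> set of co-morbid conditions.
patients = {
    "001": {"DM", "TB", "DLM", "ND"},
    "002": {"DM", "TB", "DLM", "ND"},
    "003": {"TB", "DLM", "ND", "IEC"},
    "004": {"DM", "DLM"},
    "005": {"DLM"},
}


def support(itemset, transactions):
    """Number of transactions (patients) that contain every item in the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def frequent_itemsets(transactions, minsup, must_contain=None):
    """Return {itemset: support} for itemsets with support > minsup.

    If must_contain is given, only itemsets that include it are reported.
    """
    items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = support(itemset, transactions)
            if count > minsup:
                found_any = True
                if must_contain is None or must_contain in itemset:
                    result[itemset] = count
        if not found_any:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return result


print(support(frozenset({"TB", "DLM", "ND"}), patients))
for itemset, count in frequent_itemsets(patients, minsup=1, must_contain="DM").items():
    print(sorted(itemset), count)
```

The early exit relies on the fact that every subset of a frequent itemset is itself frequent, which is the same property the lattice figure uses to mark infrequent branches.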
Just-in-Time phenotyping - I
• Transfusion-related Acute Lung Injury (TRALI)
• Transfusion-associated Circulatory Overload (TACO)
Electronic Health Records and Phenomics
Just-in-Time phenotyping - II TRALI/TACO “sniffer”
Active Surveillance for TRALI and TACO
Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) were reported to the blood bank by the clinical service. Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.
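The percentages follow directly from the counts on the slide. A quick check, with a hypothetical helper name:

```python
def reporting_rate(reported, identified):
    """Percentage of algorithm-identified cases that were also reported."""
    return 100.0 * reported / identified


trali_rate = reporting_rate(11, 88)
taco_rate = reporting_rate(5, 45)

print(f"TRALI: {trali_rate:.1f}% reported, {88 - 11} of 88 not reported")
print(f"TACO: {taco_rate:.1f}% reported, {45 - 5} of 45 not reported")
```

Put the other way, 77 of the 88 TRALI cases and 40 of the 45 TACO cases found by the algorithm were not reported by the clinical service.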
Publications to date (conservative)
[Bar chart: papers, abstracts, and manuscripts under review by year: Year 1 (2011), Year 2 (2012), Year 3 (2013)]
2011 Milestones
• Standardized definitions for phenotype criteria
• Rules-based environment for phenotype algorithm execution
• National library for standardized phenotype definitions (collaboration with eMERGE)
• Machine learning techniques for algorithm definitions
• Online, real-time phenotype execution
• Phenotyping algorithm authoring environment
2012 Milestones
• Machine learning techniques for algorithm definitions
• Online, real-time phenotype execution
• Collaboration with NQF, Query Health and i2b2 infrastructures
• Use cases and demonstrations
  • MU quality metrics (w/ NQF, Query Health)
  • Cohort identification (w/ eMERGE, PGRN)
  • Value analysis (w/ Mayo CSHCD, REP)
  • Clinical trial alerting (w/ Mayo Cancer Ctr./CTSA)
Project 3: Collaborators & Acknowledgments
• CDISC (Clinical Data Interchange Standards Consortium)
  • Rebecca Kush, Landen Bain
• Centerphase Solutions
  • Gary Lubin, Jeff Tarlowe
• Group Health Seattle
  • David Carrell
• Harvard University/MIT
  • Guergana Savova, Peter Szolovits
• Intermountain Healthcare/University of Utah
  • Susan Welch, Herman Post, Darin Wilcox, Peter Haug
• Mayo Clinic
  • Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor, Chris Chute