Transcript Slide 1

HIPAA Compliant Environment for Translational Research Data and Analytics at Utah

CASC HIPAA Working Group Meeting, April 23rd, 2014

Julio C. Facelli, PhD
Professor and Vice Chair, Department of Biomedical Informatics
Director, Biomedical Informatics Core, Center for Clinical and Translational Science

This work has been partially supported by grants from the NIH, National Center for Research Resources award UL1RR025764, National Library of Medicine 5RC2LM010798 and DHHS Health Resources & Services award 1D1BRH20425-01-00.

Design Criteria

• The need to provide very large storage capacity and diverse analytical software that are well integrated into a high performance computing setting
• The ability to provide virtual machines (VMs) to deploy applications containing PHI; for example, clinical trials database management tools or specialized systems to provide personalized health data accessible to patients
• Isolate the environment as much as possible, while still utilizing parts of the core HPC infrastructure

Caveat: The University of Utah fully owns the University of Utah Health Hospitals and Clinics, so there is only one covered entity.

Physical Infrastructure

• The physical hardware resides in a data center with controlled room access
• These hosts are racked in a locked cabinet and have locked server bezels
• Physical access to the data center is reviewed biannually and documented on an access-controlled departmental wiki
• Backups are restricted to one specific backup server on one particular port. Backup data traffic is automatically encrypted (Blowfish) on the client side before traversing the network
• All CHPC staff who interact with the PE take the university’s HIPAA training courses


SWASEY

• Provides statistical resources on a Windows server
• 64-bit, 48 cores (4 CPUs x 12 cores @ 2600 MHz, AMD Opteron 6238 Bulldozer), 512 GB of RAM, two 1 TB drives in RAID 1
• Access via remote VDI
• Applications such as SAS, SPSS, WEKA, NLP, R, text miner, MySQL Workbench, Eclipse, Tomcat, Microsoft Office & some DB utilities
• This host also mounts the same file space as the HPC analytic environment

Software Available

• We share our complete applications tree with the PE in read-only mode
• The most popular applications used in the HIPAA environment are MySQL, a collection of NLP tools including MetaMap and CLUTO13 (a clustering tool), the WEKA data mining package, and the R statistical package
• The two predominant pipeline strategies our researchers employ are UIMA for NLP work and PHP-wrapped R for bioinformatics projects

Administrative procedures

• Have an active account in the University of Utah's Kerberos authentication system. This can be extended to external collaborators as well.
• Have an active CHPC departmental account, where sponsorship and approval of a Principal Investigator (PI) is required.
• Have an active CHPC account created in the Protected Environment’s NIS. This requires verification and completion of the University’s HIPAA privacy and security training courses.
• Be added to the HIPAA Virtual Private Network (VPN) pool, and use this VPN encrypted tunnel to access designated login nodes.

Permission to use a given dataset is governed by the approval of the University's Institutional Review Board (IRB). If the IRB approves a project that uses a PHI dataset, the researcher is given an IRB number, which is then shared with the CHPC. The researcher lists the users who will be permitted to access the data. That list is independently verified with the IRB, and it forms the basis of the UNIX group defined for the project. At this point, the data may be transferred to CHPC, and only the NIS group will have access to it.
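The account prerequisites and the IRB-verified user list described on this slide can be sketched as a simple membership check. This is only an illustration under stated assumptions: the `User` fields, `missing_prerequisites`, and `build_project_group` are hypothetical names, not CHPC tooling, and the real process involves manual verification steps that code like this does not capture.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class User:
    """Hypothetical record of the four prerequisites named on the slide."""
    name: str
    has_kerberos_account: bool = False   # University of Utah Kerberos
    has_chpc_account: bool = False       # requires PI sponsorship
    has_pe_nis_account: bool = False     # requires HIPAA training
    in_hipaa_vpn_pool: bool = False      # needed to reach the login nodes


def missing_prerequisites(user: User) -> List[str]:
    """Return the labels of any prerequisites the user has not yet met."""
    checks = [
        ("University Kerberos account", user.has_kerberos_account),
        ("CHPC departmental account", user.has_chpc_account),
        ("PE NIS account", user.has_pe_nis_account),
        ("HIPAA VPN pool membership", user.in_hipaa_vpn_pool),
    ]
    return [label for label, ok in checks if not ok]


def build_project_group(requested: Iterable[str],
                        irb_verified: Iterable[str],
                        users: Dict[str, User]) -> List[str]:
    """Keep only users who were listed by the researcher, confirmed by the
    IRB, and who meet every account prerequisite."""
    verified = set(irb_verified)
    members = []
    for name in requested:
        user = users.get(name)
        if user is None or name not in verified:
            continue
        if not missing_prerequisites(user):
            members.append(name)
    return sorted(members)


# Made-up example data.
users = {
    "alice": User("alice", True, True, True, True),
    "bob": User("bob", True, True, False, True),    # no PE NIS account yet
    "carol": User("carol", True, True, True, True),  # not on the IRB list
}
group = build_project_group(["alice", "bob", "carol"], ["alice", "bob"], users)
print(group)
```

In this made-up example only `alice` ends up in the project group: `bob` lacks the PE NIS account, and `carol` was not confirmed by the IRB.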

OpenFurther

[Architecture diagram. Components: Query Tool; Federated Query Engine; Data Source Adapters; Data Model Transformations; Admin & Security Components; Virtual Identity Resolution on the GO (VIRGO); Quality & Analytics Framework; Metadata Repository; Terminology/Ontology Server. Data sources shown: itBioPath; Biospecimen LIMS; UPDB-L Population & Public Health EHR.]

[Data flow diagram. Labels: Query Tool; Quality Analysis; Counts & Data; Terminology Server; VIRGO; Metadata Repository; Security; adapters (ADAPT) connecting to Data Sources.]
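The general pattern named on this slide (a federated query engine that fans a query out to data source adapters, each translating its source's data model) can be illustrated with a minimal sketch. This is not OpenFurther's API: `SourceAdapter`, `federated_query`, the field names, and the sample rows are all hypothetical and exist only to show the idea.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class SourceAdapter:
    """Hypothetical adapter: maps one source's field names to a common model."""

    def __init__(self, name: str, rows: List[Record], field_map: Dict[str, str]):
        self.name = name
        self.rows = rows
        self.field_map = field_map  # local field name -> common field name

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        results = []
        for row in self.rows:
            # Data model transformation: rename local fields to common ones.
            common = {self.field_map[k]: v for k, v in row.items() if k in self.field_map}
            if predicate(common):
                common["source"] = self.name
                results.append(common)
        return results


def federated_query(adapters: List[SourceAdapter],
                    predicate: Callable[[Record], bool]) -> List[Record]:
    """Send the same query to every adapter and concatenate the results."""
    combined: List[Record] = []
    for adapter in adapters:
        combined.extend(adapter.query(predicate))
    return combined


# Made-up sources with different local schemas.
ehr = SourceAdapter(
    "ehr",
    [{"pt_age": 7, "dx": "asthma"}, {"pt_age": 40, "dx": "asthma"}],
    {"pt_age": "age", "dx": "diagnosis"},
)
registry = SourceAdapter(
    "registry",
    [{"AGE_YRS": 9, "DIAG": "asthma"}, {"AGE_YRS": 12, "DIAG": "diabetes"}],
    {"AGE_YRS": "age", "DIAG": "diagnosis"},
)

matches = federated_query(
    [ehr, registry],
    lambda r: r["age"] < 18 and r["diagnosis"] == "asthma",
)
counts = {}
for record in matches:
    counts[record["source"]] = counts.get(record["source"], 0) + 1
print(counts)
```

The query is written once against the common field names, and each adapter is responsible for its own translation. A real system would also need identity resolution and terminology mapping, which this sketch leaves out.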

OpenFurther: Demo Version

• Open Source Instance
• Demonstrative version that can be downloaded, tried, modified and deployed for testing and experimentation purposes.
• Public Datasets:
  – OMOP: Secondary data for Comparative Effectiveness Research
  – OpenMRS: Medical Record System
• Released: AMIA 2013
• http://openfurther.org/

Air Quality Integration Architecture

[Architecture diagram. Labels: Query Tool; Quality Analysis; Counts & Data; Terminology Server; Metadata Repository; VIRGO; Security; Air Quality Data Modeling Unit; High Spatiotemporal Resolution; Uncertainty. Adapters connect to: EPA Air Quality Web Service; Satellite-derived Aerosol Optical Depth Measurements; Clinical Data Sources.]

Hurdle Lab Use of the HIPAA-compliant PE: The POET Project

• POET addresses a central challenge in clinical natural language processing: noisy text

  – Short forms/misspellings: “REASON FOR EXAMINATION: r/o ptx, s/p aicd removal”
  – Ungrammatical (nonprose): “Doppler: Complete pulse and color flow”
  – Lists: “WBC-6.7 RBC-3.96* HGB-13.9* HCT-39.8* MCV-100*”
  – Split clinical concepts: “She had tumor found in both her left and right ovaries.”

• We use ApexArch (CHPC PE) and the POET interactive node for this work, running under the UIMA processing framework.

• The POET project was the first to explore a HIPAA-compliant space at CHPC (~2009)

Support for POET by NLM grants R21LM009967 and R01LM10981.
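To make the "Lists" kind of noisy text concrete, here is a minimal sketch that pulls name/value pairs out of the lab-value string quoted on this slide. It is not the POET method: the regular expression and the `parse_lab_list` helper are hypothetical, and the code only records whether an asterisk is present without assuming what it means.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: NAME-NUMBER with an optional trailing asterisk.
LAB_TOKEN = re.compile(r"([A-Za-z]+)-(\d+(?:\.\d+)?)(\*?)")


def parse_lab_list(text: str) -> List[Tuple[str, float, bool]]:
    """Return (test name, numeric value, has_asterisk) for each token found."""
    return [
        (name, float(value), flag == "*")
        for name, value, flag in LAB_TOKEN.findall(text)
    ]


labs = parse_lab_list("WBC-6.7 RBC-3.96* HGB-13.9* HCT-39.8* MCV-100*")
for name, value, starred in labs:
    print(name, value, starred)
```

A fixed pattern like this handles one list layout only, which is part of why noisy clinical text is hard: real notes mix many such layouts with prose, abbreviations, and misspellings.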

Hurdle Lab Use of the HIPAA-compliant PE: The qDIET Project

• Partnering with a large, national grocery chain and the USDA, qDIET explores assessing the food quality of households to improve nutritional health
  – ~100,000 unique UPC codes; ~120M item records
• qDIET technology: maps to various USDA databases
  – Healthy Eating Index
  – USDA Food Plans
  – Food Patterns: e.g., Mediterranean Diet
• We have 13 months of grocery item detail on ~140,000 households in four geographical areas.

• We use Swasey for our modeling (need security and ‘big iron’)

Support for qDIET by a VP Seed Grant and the National Children’s Study

Del Fiol Lab Pediatric Patient Summary

• Overall goal
  – Implement regional quality system, guided by the Medical Home model
  – Utah Children’s Healthcare Improvement Collaboration (CHIC)
• Target population
  – Improve care for children with special health care needs
• Collaborators
  – Chuck Norlin, MD (Department of Pediatrics)
• Funding source
  – CMS
  – 5-year grant; approximately $10 million total funding ($750,000 for PPS)

Pediatric Patient Summary

• Collaborative platform for providers, care managers, and parents of children with special care needs
  – Summary of a child’s record, maintained by the child’s providers and parents
  – Notifications and summary of critical events (e.g., ER visit, hospital admission)
  – Leverages State clinical health information exchange

Projects Using CHC Resources

• 2 projects relying on the electronic-Asthma Tracker hosted by CHC.
• Project 1: Improving Post-Hospital Transitions and Ambulatory Care for Children with Asthma
  – Principal Investigator: Flory Nkoy, MD, MS, MPH
  – Funding: AHRQ
• Project 2: Redesigning Ambulatory Care Delivery to Enhance Asthma Control in Children
  – Principal Investigator: Flory Nkoy, MD, MS, MPH
  – Funding Agency: PCORI
• Collaborators:

Predictive Modeling of Call Outcomes to Poison Control Center Recommendations

National Institute for Nursing Research (NR0101119-01, PI Ellington - CON)

• 2007-2011
• Co-Investigator Mollie R. Cummins PhD, RN led a predictive modeling aim for this study
• Using PHI from a regional poison control center, we applied statistical and machine learning approaches to develop predictive models of call outcomes
• Data and analytic tools hosted within the HIPAA protected compute cluster enabled a secure, high performance computing environment