APPLICATIONS OF BIG DATA IN HEALTHCARE

PERFORMANCE MANAGEMENT OF QUALITY OF CARE

Ben Eze, Akosua Asamoah

Group 9

Outline

   

• Motivation
• Big Data in Health Care
• Healthcare Applications
  • Health Analytics
  • Natural Language Processing
  • Medical Imagery
  • Human Genome
  • BASN and IoT
  • Disease Outbreaks Management
• Challenges to Big Data Adoption in Health Care

A Case for Big Data – 9/11 Airspace shutdown

How much revenue was lost as a result of the airspace shutdown on 9/11?

• ETL processes and data analytics were 3 months behind, so the question could not be answered in time.
• Without Big Data: this is the scenario we have with healthcare data today.

The Civil Aviation Directorate – also known as Transport Canada Civil Aviation (TCCA)

Heritage Health Prize (2012)

Goals:
• Identify high-risk patients and have them treated.
• Develop a model that can predict the number of days a patient would spend in a hospital in the next year.
• Reduce the number of unnecessary hospitalizations.
• Decrease the cost of care.

Source: http://www.heritagehealthprize.com/c/hhp
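
A length-of-stay model like the one described in the goals needs a score that tolerates the many patients with zero hospital days. The sketch below uses a root-mean-squared logarithmic error (RMSLE); that this matches the competition's own scoring is an assumption here, and the function name and sample values are purely illustrative.

```python
import math

def rmsle(predicted_days, actual_days):
    """Root-mean-squared logarithmic error between two equal-length sequences."""
    if len(predicted_days) != len(actual_days) or not predicted_days:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted_days, actual_days)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predictions and outcomes for four patients
# (days in hospital during the following year).
predicted = [0.2, 1.5, 0.0, 4.0]
actual = [0, 2, 0, 5]
score = rmsle(predicted, actual)
print(f"RMSLE: {score:.3f}")
```

The logarithm damps the penalty for large absolute errors on long stays, so a model is not dominated by a handful of extreme patients.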

White House BRAIN Initiative

• The BRAIN Initiative is a US National Institutes of Health (NIH) project.
• The $100 million initiative, started in 2013, is expected to change our understanding of the human brain.
• It aims to accelerate the development of new treatments for many neurological diseases.
• Brain activity recordings (EEG), cardiac measurements (ECG) and other physiological measures are high-volume, high-velocity data.
• A five-day epilepsy evaluation would generate about 1.6 GB of data for a single patient.

Big Data in Health Care

Big Data characteristics of healthcare data

• Large and heavy: a collection of clinical observations over the lifetime of a patient.
• High-dimensional and longitudinal: comprises longitudinal and high-dimensional data entities with complex relationships.
• Streaming database: data is available in real-time and batched modes.
• Structured, unstructured and semi-structured data: the data structure is very heterogeneous.

Health care BIG DATA CHALLENGE

• Worldwide digital healthcare data: 500 PB in 2012, expected to reach 25,000 PB in 2020.
• An average hospital needs to manage up to 665 TB of patient data.
• 80% of this data is unstructured medical imaging data.
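
The two worldwide figures imply a growth rate, and the 80% share implies an imaging volume per hospital. A quick calculation, using only the numbers quoted above (variable names are illustrative):

```python
# Figures quoted on the slide.
pb_2012 = 500
pb_2020 = 25_000
years = 2020 - 2012

# Compound annual growth rate implied by the two data points.
growth_factor = pb_2020 / pb_2012          # 50x overall
cagr = growth_factor ** (1 / years) - 1    # roughly 0.63

# Share of an average hospital's 665 TB that is unstructured imaging data.
hospital_tb = 665
imaging_tb = hospital_tb * 0.80            # about 532 TB

print(f"Implied annual growth: {cagr:.0%}")
print(f"Unstructured imaging data per hospital: {imaging_tb:.0f} TB")
```

In other words, the projection amounts to data volume growing by a little under two thirds every year.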

Health Data Processing Pyramid

Pyramid levels, top to bottom, with the items shown beside each level:

• Actionable Insights / Knowledge: SOA/event-driven KPI, visualization, reports
• Information / Minable Data: data mining, data warehouses
• Big Data / Information Extraction: Hadoop, HBase, NoSQL, Hive, Pig

Goals of Big Data Analytics in healthcare

Data sources: genomic data, electronic health records, behavioral data, sensor data, public health data

Outcomes: evidence-based medicine, improved quality of care

• Reduce wait times, length of hospitalization and cost of health care
• Provide the right intervention at the right time
• Streamline health care processes
• Improve outcomes through smarter decisions
• Detect disease outbreaks early
• Discover new social behaviors

Healthcare Applications

Big Data - Healthcare DATA ANALYTICS

Information Extraction
• Structured: CPT, ICD-9/10, DICOM, LOINC, NDC, SNOMED, RxNorm, etc.
• Unstructured: clinical notes (lots of abbreviations and misspellings)
• Semi-structured: HL7

Data Analytics
• Patient profiling
• Predictive analysis
• Prescriptive analysis
• Attributes correlation
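
To illustrate why HL7 counts as semi-structured: an HL7 v2 message is plain text, but its segments are split into fields by `|` and into components by `^`. The sketch below assumes the default delimiters and the usual PID field positions (PID-3 identifier, PID-5 name, PID-7 birth date, PID-8 sex); the function name and the sample segment are made up for illustration.

```python
def parse_pid(segment):
    """Pull a few fields out of an HL7 v2 PID segment (pipe-delimited)."""
    fields = segment.split("|")
    if fields[0] != "PID":
        raise ValueError("not a PID segment")

    def field(n):
        # Trailing empty fields may be omitted, so guard the index.
        return fields[n] if n < len(fields) else ""

    # Components inside a field are separated by '^'.
    name_parts = field(5).split("^")
    return {
        "patient_id": field(3).split("^")[0],
        "family_name": name_parts[0],
        "given_name": name_parts[1] if len(name_parts) > 1 else "",
        "birth_date": field(7),
        "sex": field(8),
    }

# Made-up example segment; not real patient data.
sample = "PID|1||123456^^^HOSP^MR||DOE^JANE||19800101|F"
record = parse_pid(sample)
print(record)
```

A production system would use a dedicated HL7 library and honour the delimiters declared in the message header rather than hard-coding them.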

Natural Language Processing (NLP)

• Words, names and patterns are identified using a library of pattern rules together with gazetteer lists. The gazetteer lists hold names associated with annotation types such as first name, surname, hospital name, address and date, as well as clinical codes like CPT and LOINC.
• Processed data is retrieved as a term-document matrix, with terms sorted by relevance.
• The most relevant terms are then stored in a database, with metadata linking each data element to its source document, a database record and the patient identifier.

Pipeline components shown in the diagram:
• Inputs: data sources, dynamic gazetteer load, custom gazetteer lists/tables
• Steps: load unstructured data, extract top medical terms, build the terms-document matrix, extract annotations for each patient
• Natural Language Processing Discovery Module
• NLP engine: language tokenizer, sentence splitter, POS tagger, NE transducer, spelling fixes, stemming, internal gazetteer lists, pattern rules
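
The gazetteer lookup and term-document matrix steps can be sketched in a few lines. This is a toy stand-in, not the pipeline used in the presentation: the gazetteer entries, the notes and all function names are invented, and raw frequency is used as a crude proxy for relevance.

```python
import re
from collections import Counter

# Hypothetical gazetteer: surface term -> annotation type.
GAZETTEER = {
    "diabetes": "Disease",
    "hypertension": "Disease",
    "metformin": "Drug",
    "insulin": "Drug",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(notes):
    """Count gazetteer terms per document: matrix[term][doc_id] = frequency."""
    matrix = {}
    for doc_id, text in notes.items():
        counts = Counter(t for t in tokenize(text) if t in GAZETTEER)
        for term, n in counts.items():
            matrix.setdefault(term, {})[doc_id] = n
    return matrix

def top_terms(matrix, k=3):
    """Rank terms by total frequency across documents, ties broken alphabetically."""
    totals = {term: sum(docs.values()) for term, docs in matrix.items()}
    return sorted(totals, key=lambda t: (-totals[t], t))[:k]

# Invented clinical notes keyed by a document identifier.
notes = {
    "note-1": "Pt with diabetes, started on metformin. Diabetes poorly controlled.",
    "note-2": "History of hypertension and diabetes. Continue insulin.",
}
matrix = term_document_matrix(notes)
ranked = top_terms(matrix)
print(ranked)
```

Keeping the document identifier in the matrix is what allows each extracted term to be traced back to its source note and, from there, to the patient.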

Medical Imagery

• An average hospital generates 665 TB of patient data.
• 80% of this data is unstructured image data such as CT scans, MRI and X-rays.
• Medical archives are growing by 20-40% annually.
• Medical image data is huge, high-dimensional and complex.
• Extracting information from this data is complex and computationally intensive.
• The body is a source of Big Data.

Source: http://medcitynews.com/2013/03/the-body-in-bytes-medical-images-as-a-source-of-healthcare-big-data-infographic/

Human Genome

• The human genome is made up of DNA consisting of 4 building blocks (A, T, C, G).
• Each genome contains over 3 billion base pairs.
• The Human Genome Project (1990 to 2003) successfully sequenced the 3 billion DNA base pairs.
• Big Data is reducing the time it takes to sequence genes and making it more cost-effective.
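
Because there are only four bases, each one fits in 2 bits, which gives a lower bound on the raw size of one genome. The sketch below is illustrative only (the encoding table and function names are assumptions); real sequencing output is far larger because it includes many overlapping reads plus quality scores and metadata.

```python
# Two bits are enough to distinguish the four bases.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def pack(sequence):
    """Pack a DNA string into bytes, four bases per byte (zero-padded at the end)."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        chunk = sequence[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])

# Raw size of one genome at 2 bits per base.
genome_bases = 3_000_000_000
genome_megabytes = genome_bases * 2 / 8 / 1_000_000

print(unpack(pack("GATTACA"), 7))
print(f"{genome_megabytes:.0f} MB per genome at 2 bits per base")
```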

Virginia Tech – Human Genome analytics using Hadoop on MS Azure

• Virginia Tech is one of the leading research institutions, with a US$454 million portfolio of projects that includes DNA sequencing analysis.
• Producing 15 PB of genome data annually, Virginia Tech was generating information faster than it could analyze it.
• To reduce costs and improve access to DNA sequencing tools and analysis, the Virginia Tech team decided to create an on-demand cloud computing model based on Microsoft Azure HDInsight.
• Throughput has gone from analyzing one genome in 2 weeks to over 100 genomes a day.
• Cost has gone down from US$100 million per genome in 2001 to under US$6,000 per genome.
• Veritas Genetics can deliver the entire human genome for less than US$1,000 through a smartphone app.

https://youtu.be/TnhZqkLchIM

IoT and Body Area Sensor Networks (BASN)

• Advancements in mobile computing through smartphones provide a unique opportunity in healthcare.
• BASNs have unique roles in health applications, particularly in supporting real-time decision making and therapeutic treatments.
• IoT devices can monitor vitals and internal organ functions for chronic and acute care patients.

Examples of sensed signals and devices:
• Breathing activity
• ECG and heart rate
• Insulin pump
• Blood pH, glucose, temperature, dissolved oxygen, carbon dioxide
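
Real-time decision support on a sensor stream can be as simple as smoothing recent readings and checking them against a range. A minimal sketch, assuming a single heart-rate stream; the class name, window size and thresholds are hypothetical and are not clinical guidance.

```python
from collections import deque

class VitalMonitor:
    """Keep a sliding window of readings and flag when the window mean leaves a range."""

    def __init__(self, low, high, window=5):
        self.low = low
        self.high = high
        self.readings = deque(maxlen=window)  # oldest reading drops out automatically

    def add(self, value):
        """Record one reading; return True if the smoothed value is out of range."""
        self.readings.append(value)
        mean = sum(self.readings) / len(self.readings)
        return not (self.low <= mean <= self.high)

# Illustrative heart-rate bounds in beats per minute.
monitor = VitalMonitor(low=50, high=110, window=3)
stream = [72, 75, 118, 125, 131, 90]
alerts = [monitor.add(bpm) for bpm in stream]
print(alerts)
```

Averaging over a window means a single noisy spike (118 here) does not raise an alert, while a sustained rise does.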

Disease outbreak and public health

• Social media is playing a critical role in detecting disease outbreaks, through continuous analysis of data from patients, discussion forums and social media posts.
• HealthMap software flagged Ebola 9 days before the outbreak was announced.
• Metadata such as the patient identifier and location attributes in social media posts help pinpoint the approximate location of an incident.
• The gathered data is clustered daily using the location attributes.
• Dense clusters are then identified, possibly through a visualization system.
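
The daily location clustering can be approximated by snapping each post to a grid cell and counting posts per cell per day. This grid binning is a simple stand-in for a real clustering algorithm and is not how HealthMap is known to work; the function name, cell size, threshold and sample posts are all invented.

```python
from collections import Counter

def dense_cells(posts, cell_deg=0.5, min_posts=3):
    """Group posts by (day, grid cell) and return the cells with many posts.

    posts: iterable of (day, latitude, longitude) tuples.
    """
    counts = Counter()
    for day, lat, lon in posts:
        # Snap each coordinate to a square grid cell of cell_deg degrees.
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        counts[(day, cell)] += 1
    return {key: n for key, n in counts.items() if n >= min_posts}

# Invented symptom-related posts: (day, latitude, longitude).
posts = [
    ("2024-01-01", 8.61, -10.12),
    ("2024-01-01", 8.70, -10.20),
    ("2024-01-01", 8.55, -10.05),
    ("2024-01-01", 45.42, -75.69),
    ("2024-01-02", 8.62, -10.11),
]
hotspots = dense_cells(posts)
print(hotspots)
```

Only the cell with three posts on the same day is reported; isolated posts and the next day's single post fall below the threshold.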

Challenges to Big Data Adoption in Health Care

Big Data challenges in healthcare

• Healthcare stakeholders continue to opt for vertical scalability (bigger servers) instead of horizontal scalability (more servers).
• Cloud computing and Big Data analytics go hand in hand.
• The data is noisy, heterogeneous and longitudinal, and there are too many standards.
• Challenges to the integration of healthcare processes
• Security, privacy and confidentiality issues
• Skillset gaps

References

Sun, J. & Reddy, C. (2013). Big Data Analytics for Healthcare. Tutorial presentation at the SIAM International Conference on Data Mining, Austin, TX, 2013. [Online]. Available: http://siam.org/meetings/sdm13/sun.pdf , http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/SDM2013.pdf

“Big Data,” Gartner, 2015. [Online]. Available: http://www.gartner.com/it-glossary/big-data

Heritage Provider Network Health Prize. [Online]. Available: http://www.heritagehealthprize.com/c/hhp

“HealthMap.” [Online]. Available: http://www.healthmap.org/ebola/#projection

“Ebola was flagged up by computer software nine days BEFORE it was announced: HealthMap used social media to spot disease.” [Online]. Available: http://www.dailymail.co.uk/sciencetech/article-2722164/Ebola-flagged-computer-software-nine-days-BEFORE-announced-HealthMap-used-social-media-spot-disease.html

H. Hromic, D. Le Phuoc, M. Serrano, A. Antonic, I. P. Zarko, C. Hayes, and S. Decker, “Real time analysis of sensor data for the Internet of Things by means of clustering and event processing,” Communications (ICC), 2015 IEEE International Conference on, pp. 685–691, 2015.

BRAIN Initiative. [Online]. Available: https://www.whitehouse.gov/share/brain-initiative

Virginia Tech. [Online]. Available: https://www.microsoft.com/en/server-cloud/cloud-os/customer-stories/virginia-tech.aspx