Lab seminar Presentation
Imputation-enhanced Prediction
of Septic Shock In ICU
Patients
Joyce C. Ho, Cheng H. Lee, and Joydeep Ghosh
University of Texas at Austin
HI-KDD 2012: ACM SIGKDD Workshop on Health
Informatics
Presenter: Kiyana Zolfaghar
Outline
Motivation
Challenges of Clinical Data
Predictive model for
Sepsis Risk
Septic Shock
Impact of imputation methods on prediction
Results
Sepsis and Septic shock
Sepsis: a severe, systemic inflammatory response with a presumed or identified source of infection
Severe Sepsis: sepsis with one or more of organ dysfunction, hypoperfusion, or hypotension
Septic Shock: a complication characterized by low blood pressure despite treatment with >600 mL of fluid inputs in the last hour
Motivation
Septic Shock as a Severe Illness
10th most common cause of death in Western societies
25% of ICU bed utilization in Western countries
Mortality rates range from 12.8% for sepsis to 45.7% for septic shock
Motivation for Prediction of Septic Shock in ICU Patients
Early intervention and therapy can improve patient outcomes
Treatment transition: from treatment by critical care physicians in later phases to proactive treatment in early phases
Prediction of Sepsis and Septic shock
Data mining approach
for identifying patients at risk for developing sepsis
Predictive models
Regression method
Decision trees
Support vector machines
Bayesian classification, …
Issues Regarding Classification and Prediction
Data Preparation
Feature selection
Data cleaning
remove or reduce noise
treatment of missing values
Challenges of Clinical Data
Typically noisy and inconsistently gathered
Manual recording of patient data at irregular intervals
Accurate measurement of physiological variables requires invasive techniques
Large amounts of missing data in clinical studies
Naïve Solution
Simply ignoring subjects or features with missing data
Dramatic decrease in sample sizes or
feature spaces
Bias in the results
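The shrinkage caused by the naive approach can be seen on a toy example. This is a minimal sketch with made-up records (the field names `hr`, `temp`, and `sbp` and all values are hypothetical, not taken from MIMIC-II); it only illustrates how quickly complete-case filtering discards patients.

```python
# Toy records: None marks a missing measurement (all values are hypothetical).
records = [
    {"hr": 88, "temp": 37.1, "sbp": 110},
    {"hr": None, "temp": 38.4, "sbp": 95},
    {"hr": 102, "temp": None, "sbp": 90},
    {"hr": 95, "temp": 36.9, "sbp": None},
    {"hr": 76, "temp": 37.0, "sbp": 120},
]


def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


complete = drop_incomplete(records)
fraction_lost = 1 - len(complete) / len(records)
print(f"kept {len(complete)} of {len(records)} records "
      f"({fraction_lost:.0%} discarded)")
```

Each record here is missing at most one value, yet more than half of the sample is lost.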
The Paper Contribution
Investigates the role and impact of imputation methods
while building predictive models for
Sepsis risk
Septic shock
Methodology of Research
Data Selection
Building predictive models for sepsis and Septic shock
Leveraging different imputation methods on data
Results
Dataset Description
MIMIC-II Database
(Multiparameter Intelligent Monitoring in Intensive Care)
Publicly and freely available
Includes a very large population of ICU patients
Contains high temporal resolution data including
lab results
electronic documentation
monitor trends and waveforms.
Funded by: National Institute of Biomedical Imaging and Bioengineering
Clinical Records in MIMIC-II
Overview of the data categories
General
• Patient demographics
• Hospital admissions & discharge Info.
• Room tracking, death dates
• ICD-9 codes
Physiological measures
Hourly vital sign metrics
Medication records
Lab test results
Fluid Balance
Input and output records
Notes and Reports
Discharge summary, nursing progress notes
Radiology and echo reports.
Data Selection and Target Classes
Dataset size: 12,179 patients
Exclude patients younger than 18 at time of admission
Include patients with at least ten observations of BP, TEMP, HR…
Target class
Sepsis Risk Prediction
• Patients identified by ICD-9 codes ("995.91" or "995.92")
• ~10.8% of dataset size (1,310 patients)
Septic Shock Prediction
• Patients with hypotension and total fluid intake >600 mL
• ~44.7% of sepsis patients (586 patients)
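The selection rules and the sepsis label can be sketched as simple predicates. This is an illustrative sketch only: the record layout (`age`, `n_obs`, `icd9`) and the sample patients are hypothetical, and a single observation count stands in for the per-variable requirement on the slide. The last two lines re-derive the quoted percentages from the patient counts.

```python
SEPSIS_ICD9 = {"995.91", "995.92"}  # ICD-9 codes named on the slide
MIN_AGE = 18                        # exclude patients under 18 at admission
MIN_OBSERVATIONS = 10               # require at least ten observations


def eligible(patient):
    """Apply the age and observation-count inclusion criteria."""
    return patient["age"] >= MIN_AGE and patient["n_obs"] >= MIN_OBSERVATIONS


def has_sepsis(patient):
    """Label a patient as septic if any ICD-9 code matches."""
    return any(code in SEPSIS_ICD9 for code in patient["icd9"])


# Hypothetical patients, for illustration only.
patients = [
    {"id": 1, "age": 64, "n_obs": 42, "icd9": ["995.92", "584.9"]},
    {"id": 2, "age": 16, "n_obs": 30, "icd9": ["995.91"]},
    {"id": 3, "age": 71, "n_obs": 4, "icd9": ["428.0"]},
    {"id": 4, "age": 55, "n_obs": 18, "icd9": ["410.71"]},
]
cohort = [p for p in patients if eligible(p)]
labels = {p["id"]: has_sepsis(p) for p in cohort}

# Sanity check of the class proportions quoted on the slide.
sepsis_share = 1310 / 12179  # sepsis patients / dataset size
shock_share = 586 / 1310     # septic shock patients / sepsis patients
```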
Predictive Model for Sepsis Risk
Features
Patient's Clinical History
• Demographic data (gender and ages)
• Medical history
• Basic health data (weight ..)
Measurements of Physiological Variables
Logistic regression as the prediction model, using:
only the clinical history features
clinical history features after stepwise regression
all available features
all available features after stepwise regression
Stepwise Logistic Regression Model
• Logistic Regression
• Type of regression analysis used for predicting the outcome of a categorical target variable
• Stepwise Regression
• The choice of predictive variables is carried out by an automatic procedure:
1. Start with no variables in the model
2. Test the addition of each variable using a chosen model comparison criterion
3. Add the variable (if any) that improves the model the most
4. Repeat this process until no addition improves the model
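The four steps of forward stepwise selection can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not name the comparison criterion, so AIC is assumed here; the logistic fit uses a small Newton solver written for this example; and the synthetic data (only columns 1 and 3 carry signal) is made up.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Fit logistic regression by Newton's method; return (weights, log-likelihood)."""
    Xb = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (y - p) - ridge * w
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        w = w + np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-Xb @ w)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return w, loglik


def aic(X, y, cols):
    """Akaike information criterion of the model using the given columns."""
    _, loglik = fit_logistic(X[:, cols], y)
    return 2 * (len(cols) + 1) - 2 * loglik


def forward_stepwise(X, y):
    selected = []                 # step 1: start with no variables
    best = aic(X, y, selected)
    while True:
        remaining = [c for c in range(X.shape[1]) if c not in selected]
        if not remaining:
            break
        # step 2: test the addition of each remaining variable
        scores = {c: aic(X, y, selected + [c]) for c in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break                 # step 4: stop when nothing improves the model
        selected.append(candidate)  # step 3: add the best variable
        best = scores[candidate]
    return selected


# Synthetic data: the outcome depends on columns 1 and 3 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
logits = 2.0 * X[:, 1] - 1.5 * X[:, 3]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
selected = forward_stepwise(X, y)
```

Backward stepwise selection mirrors this loop: start from all variables and repeatedly drop the one whose removal improves the criterion the most.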
Septic Shock Prediction Model
Features
physiologic and laboratory values
Importance of time in septic shock
• Feature matrices creation at reference times of 30, 60, 90, and 120
minutes prior to the onset of septic shock.
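One way to build a feature row per reference time is to take, for each variable, the latest observation recorded at or before the cutoff. This is a sketch under assumptions: the slides do not say how observations are summarized, so the last-value rule, the `(minute, value)` layout, and the variable names are hypothetical. Variables with no observation before the cutoff come out as `None`, which is where imputation becomes necessary.

```python
REFERENCE_OFFSETS_MIN = (30, 60, 90, 120)  # minutes before onset, from the slide


def last_value_before(observations, cutoff_min):
    """Return the latest value with timestamp <= cutoff, or None if there is none."""
    prior = [(t, v) for t, v in observations if t <= cutoff_min]
    return max(prior, key=lambda tv: tv[0])[1] if prior else None


def feature_rows(series_by_variable, onset_min):
    """Build one feature dict per reference offset for a single patient."""
    rows = {}
    for offset in REFERENCE_OFFSETS_MIN:
        cutoff = onset_min - offset
        rows[offset] = {
            name: last_value_before(obs, cutoff)
            for name, obs in series_by_variable.items()
        }
    return rows


# Hypothetical patient: (minute since admission, value) pairs per variable.
series = {
    "hr": [(0, 80), (50, 95), (100, 110)],
    "sbp": [(70, 90)],
}
rows = feature_rows(series, onset_min=130)
```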
Prediction Models
Logistic Regression
all available features
feature set after forward stepwise regression
feature set after backward stepwise regression
Support Vector Machine
Classification tree
Decision Tree Learning
Goal
• Create a model that predicts the value of a target variable based on several input variables
Learning a decision tree
Recursive partitioning based on a selected attribute
Stop partitioning when all samples for a given node belong to the same class
Decision tree types
Classification Trees
Regression Trees
[Figure: example decision tree splitting on Sex (Male / Female), Age (<= 9.5 / > 9.5), and sibsp (<= 2.5 / > 2.5), with leaves labeled Survived or died]
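Recursive partitioning with the stopping rule above can be sketched in a few lines. This is an illustrative sketch, not the tree learner used in the paper: Gini impurity is assumed as the attribute-selection criterion, and the toy rows of (heart rate, systolic blood pressure) with their labels are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels):
    """Recursively partition until every node contains a single class."""
    if len(set(labels)) == 1:
        return labels[0]  # stopping rule: the node is pure
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # fall back to majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t, build_tree(*zip(*left)), build_tree(*zip(*right)))


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical rows: (heart rate, systolic blood pressure).
rows = [(120, 85), (115, 80), (90, 118), (85, 125), (110, 95), (80, 110)]
labels = ["shock", "shock", "no_shock", "no_shock", "shock", "no_shock"]
tree = build_tree(rows, labels)
```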
Missing Value Imputation
Missing data in MIMIC-II
Excluding records with missing values leads to a 47.2% reduction in dataset size
Imputation Methods
1) Mean Feature Values (Mean for Subgroup)
Derived from the patients' gender and age group
• accounted for fundamental physiological differences between
genders and among age groups
Challenges
Mean substitution is especially problematic when there are many missing values
It distorts the distribution and variance of the feature
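Subgroup mean imputation can be sketched as below. The 20-year age bands, the field names, and the sample patients are assumptions made for illustration; the slides only state that the means are derived from gender and age group. A subgroup with no observed values falls back to the overall mean, which is also an assumption of this sketch.

```python
from collections import defaultdict


def age_band(age, width=20):
    """Hypothetical age grouping into fixed-width bands."""
    return int(age // width)


def impute_group_means(patients, feature):
    """Fill missing values of `feature` with the mean of the gender/age subgroup."""
    totals = defaultdict(lambda: [0.0, 0])  # subgroup -> [sum, count]
    for p in patients:
        if p[feature] is not None:
            entry = totals[(p["gender"], age_band(p["age"]))]
            entry[0] += p[feature]
            entry[1] += 1
    observed = [p[feature] for p in patients if p[feature] is not None]
    overall = sum(observed) / len(observed)

    imputed = []
    for p in patients:
        row = dict(p)  # copy so the input records are left unchanged
        if row[feature] is None:
            total, count = totals.get((p["gender"], age_band(p["age"])), (0.0, 0))
            row[feature] = total / count if count else overall
        imputed.append(row)
    return imputed


# Hypothetical patients; None marks a missing heart-rate value.
patients = [
    {"gender": "M", "age": 25, "hr": 70.0},
    {"gender": "M", "age": 30, "hr": 80.0},
    {"gender": "M", "age": 35, "hr": None},
    {"gender": "F", "age": 65, "hr": 90.0},
    {"gender": "F", "age": 70, "hr": None},
    {"gender": "F", "age": 22, "hr": None},
]
filled = impute_group_means(patients, "hr")
```

Every imputed patient in a subgroup receives the same value, which is why this approach shrinks the variance when many values are missing.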
Imputation Methods
2) Matrix Factorization-based Approaches
(Very popular in bioinformatics)
SVDImpute
• Uses a linear combination of the k most significant eigenvectors to predict the missing value
Probabilistic Principal Component Analysis (PPCA)
• Combines an Expectation-Maximization (EM) approach to Principal Component Analysis (PCA) with a probabilistic model
• Uses a likelihood function to penalize data far from the training set
Bayesian PCA
• EM approach + Bayesian model to calculate the likelihood for constructed
data
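The idea behind matrix-factorization imputation can be illustrated with a simplified iterative low-rank scheme. This sketch is in the spirit of SVDImpute but is not the published algorithm, and it does not implement PPCA or BPCA: missing entries start at the column mean and are repeatedly replaced by their values in a rank-k SVD reconstruction. The function name and the test matrix are made up for the example.

```python
import numpy as np


def svd_impute(X, rank=2, n_iter=100, tol=1e-6):
    """Fill NaN entries using repeated rank-`rank` SVD reconstructions."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)  # initial guess: column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        updated = np.where(missing, low_rank, X)  # observed entries stay fixed
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


# A rank-1 matrix with one hidden entry (true value 4.0).
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
observed = truth.copy()
observed[1, 1] = np.nan
completed = svd_impute(observed, rank=1)
```

Unlike a group mean, the reconstructed value depends on the patient's other measurements, because it is recovered from the correlation structure across features.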
Sepsis Risk Prediction Results
No baseline model to compare the results with
Evaluation metric
• AUC (area under the ROC curve)
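AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it directly from that definition; the labels and scores are invented for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions: 1 = developed sepsis, 0 = did not.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
value = auc(labels, scores)
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.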
Septic Shock Prediction Results
• The septic shock EWS as baseline
• Prediction model: logistic regression
• Predicts the onset of septic shock one hour in advance
• Uses invasively gathered data from MIMIC waveform data
Imputation-enhanced Prediction Of
Septic Shock
• Impact of various imputation methods at different reference times
• Compared with the baseline logistic regression model
AUC curves for predicting septic shock 60 minutes before onset
[Figure: septic shock prediction 60 minutes before onset for three types of models]
Effect of imputation on logistic regression coefficients for predicting septic shock
Coefficients are consistent across different imputation methods
Coefficients obtained with and without imputation are inconsistent
Non-imputed models suffer from over-fitting
Conclusion
Imputing missing data can improve model performance, especially when dealing with larger, noisier, and more incomplete datasets
Matrix factorization imputation methods like BPCA lead to models with better predictive accuracy than simpler approaches like group means