Big Data, Bias and Analytics – What Can Your EHR Really Tell You? ADAM WILCOX, PHD.

Source: Nature (Feb 13, 2013)
Hype Cycle for Emerging Technologies, Gartner (August 2014)
Outline
• Big Data Introduction
• Background and Experience
• Big Data – Bias Issues
• Advancing Big Data
• Next Steps and Conclusion
Knowledge Representation vs. Knowledge Discovery
[Figure: chart with a vertical axis from 0.5 to 1; recoverable labels include text, UMLS, NLP, machine learning (ML), queries (Q), rules, and physicians.]
Effect of Care Management: Outcomes
Costs/Clinic
• Salary + training + admin: $92,077
Benefits/Clinic
• Productivity (7 MDs): $99,986
• Hospitalizations ↓ *: $0
Total (benefits – cost): +$7,909
* Society would save, per clinic, $79,092 in reduced hospitalizations.
Dorr DA, Wilcox AB, et al. The effect of technology-supported, multidisease care management on the mortality and hospitalization of seniors. J Am Geriatr Soc. 2008 Dec;56(12):2195-202.
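The per-clinic total above is simple arithmetic, sketched here in Python. The dollar figures are taken from the slide; the variable names are illustrative assumptions, not part of the original analysis.

```python
# Dollar figures come from the slide; variable names are illustrative.
cost_per_clinic = 92_077         # salary + training + admin
productivity_benefit = 99_986    # productivity gain across 7 MDs
hospitalization_benefit = 0      # the clinic itself captures none of these savings
societal_savings = 79_092        # reduced hospitalizations, accruing to society

# Net benefit to the clinic = benefits the clinic captures minus its costs.
clinic_net = productivity_benefit + hospitalization_benefit - cost_per_clinic

print(f"Clinic net benefit: ${clinic_net:,}")
print(f"Savings to society (not captured by the clinic): ${societal_savings:,}")
```

The point of the split is that the clinic's own balance is modest, while the larger hospitalization savings land outside the clinic's budget.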
Increase in CDR View Access
[Figure: time series from Jul 01 2007 to Feb 28 2009; series: Eclipsys MRNs (axis 0 to 6000) and Tab % (axis 0% to 70%).]
[Diagram: data warehouse uses and architecture]
• Ad-Hoc Queries – Questions: Research (Define)
• Recurring – Automated Queries: Management Reports (Measure)
• OLAP – Analytics: Operational Reports (Analyze)
• Dashboards: Point of Care Reporting (Improve)
• Applications: Decision Support (Control)
Components: sources A, B, C; integration services; replicated databases; datamarts (DM); virtual data warehouse; data warehouse tools
WICER
Improve Use of Information for Learning Health System
• Informed strategy for healthcare transformation
• Measures to support real-time process and quality improvement
• Data and analytics driving research and discovery
Data Collection Methods
Measure | Raw Survey | Raw Clinical | Matched Clinical | Matched Survey | Matched vs. Matched | Clinical vs. Survey
Age | 47.55 | 52.33 | 51.12 | 50.12 | 0.072 | p << .0001
Proportion Female | 0.62 | 0.79 | 0.78 | 0.71 | 0.963 | p << .0001
Proportion Hispanic | 0.50 | 0.56 | 0.94 | 0.96 | p << .0001 | p << .0001
Weight kg | 75.69 | 77.16 | 76.99 | 75.42 | 0.851 | 0.851
Height cm | 160.34 | 158.23 | 161.31 | 161.25 | p << .0001 | p << .0001
BMI | 28.10 | 29.70 | 28.90 | 28.20 | 0.207 | 0.207
Prevalence of Smoking | 0.09 | 0.08 | 0.08 | 0.06 | 0.944 | p << .0001
Systolic | 127.23 | 128.48 | 127.50 | 127.68 | 0.204 | 0.164
Diastolic | 73.07 | 74.34 | 79.24 | 80.95 | p << .0001 | p << .0001
Prevalence of Diabetes | 0.04 | 0.09 | 0.22 | 0.16 | p << .0001 | p << .0001
(Survey = self-report, Clinical = >1 Diabetes ICD9 AND >1 abnormal test)
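The clinical diabetes definition in the note above (more than one diabetes ICD-9 code AND more than one abnormal test) can be sketched as a small rule. This is a minimal illustration, not the study's actual query. It assumes ICD-9 codes beginning with 250 identify diabetes, and it treats an HbA1c of 6.5 or higher as the "abnormal test", which the slide does not specify. The function name and patient records are made up.

```python
def has_clinical_diabetes(icd9_codes, hba1c_results, abnormal_threshold=6.5):
    """Apply the slide's rule: >1 diabetes ICD-9 code AND >1 abnormal test.

    Assumptions (not from the slide): diabetes codes are those starting
    with "250", and "abnormal" means HbA1c >= abnormal_threshold.
    """
    diabetes_codes = [code for code in icd9_codes if code.startswith("250")]
    abnormal_tests = [value for value in hba1c_results if value >= abnormal_threshold]
    return len(diabetes_codes) > 1 and len(abnormal_tests) > 1


# Made-up records for illustration only.
records = {
    "A": {"icd9": ["250.00", "250.02", "401.9"], "hba1c": [7.1, 6.9]},
    "B": {"icd9": ["250.00"], "hba1c": [7.4, 8.0]},            # only one code
    "C": {"icd9": ["250.00", "250.00"], "hba1c": [5.6, 6.6]},  # only one abnormal test
}

flags = {pid: has_clinical_diabetes(r["icd9"], r["hba1c"]) for pid, r in records.items()}
prevalence = sum(flags.values()) / len(flags)
print(flags, round(prevalence, 2))
```

A rule this strict will miss patients who have a single code or a single abnormal result, which is one way a clinical prevalence can differ from a self-reported one.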
Data Quality and Assessment
Weiskopf NG, Weng C. Methods and dimensions of data quality assessment: enabling
reuse for clinical research. JAMIA 2013
“New” Analytic Methods
• Bootstrapping
– t-tests: + Easy, + Powerful, + Widely implemented, - Not appropriate for all data types
– Non-parametric tests (Chi-square): + Easy, + Robust, + Widely implemented, - Less powerful
– Bootstrapping: + Robust, + Powerful, - Less common, - Requires special packages or programming
• Learning curves and over-fitting
[Figure: ROC area (0.7 to 0.9) versus training set size (30 to 300); series: med&neg, medical, negated, predictive, expert-value.]
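Bootstrapping, listed above alongside t-tests, is described as needing special packages or programming. The sketch below shows that a basic version fits in a few lines of standard-library Python. It computes a percentile bootstrap confidence interval for a difference in means. The function name, the resample count, and the simulated data are all illustrative assumptions.

```python
import random
import statistics


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b).

    Each group is resampled with replacement at its original size.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Simulated values for illustration only (not data from the talk).
data_rng = random.Random(42)
group_a = [data_rng.gauss(128, 15) for _ in range(200)]
group_b = [data_rng.gauss(120, 15) for _ in range(200)]

observed = statistics.fmean(group_a) - statistics.fmean(group_b)
low, high = bootstrap_mean_diff_ci(group_a, group_b)
print(f"observed difference {observed:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

The same resampling loop works for medians, proportions, or other statistics by swapping out `statistics.fmean`, which is the robustness the slide credits to the method.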
Big Data Analytic Approaches
• Hypothesis generation process
• Sub-population analysis
• Investigating surprises
– Often more revealing about data quality than real effects
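Sub-population analysis and the hunt for surprises can be sketched as a grouped rate comparison. The version below flags groups whose rate sits far from the overall rate, so their data can be reviewed before any effect is claimed. The field names, the 0.25 cutoff, and the rows are hypothetical.

```python
from collections import defaultdict


def rate_by_group(rows, group_key, outcome_key):
    """Return the outcome rate for each value of group_key."""
    totals = defaultdict(lambda: [0, 0])  # group -> [events, count]
    for row in rows:
        totals[row[group_key]][0] += int(row[outcome_key])
        totals[row[group_key]][1] += 1
    return {group: events / count for group, (events, count) in totals.items()}


# Hypothetical rows: one record per patient, with a clinic and an outcome.
rows = (
    [{"clinic": "north", "readmitted": r} for r in (True, False, False, False)]
    + [{"clinic": "south", "readmitted": r} for r in (False, True, False, False)]
    + [{"clinic": "east", "readmitted": r} for r in (True, True, True, False)]
)

rates = rate_by_group(rows, "clinic", "readmitted")
overall = sum(int(row["readmitted"]) for row in rows) / len(rows)

# A "surprise" here is any group far from the overall rate; the 0.25 cutoff is arbitrary.
surprises = sorted(g for g, r in rates.items() if abs(r - overall) > 0.25)
print(rates, round(overall, 3), surprises)
```

A flagged group is a prompt to check how its data were collected and coded first, in keeping with the slide's caution that surprises often say more about data quality than about real effects.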
Big Data: Next Steps to Make it Useful
• Know the data you need
• Use the data you have
• Get the data you want
• Adapt data to user needs
• Make value accessible
Minimum Requirements to Provide Value
• Secure database
• Data sources
• Patient-level integration
– Master Patient Index*
• Semantic integration
– Vocabulary*
• Excellent analysts
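Patient-level and semantic integration can be illustrated with a toy example. The sketch below assigns one master ID per patient, using a deterministic match on normalized name and birth date, and maps each source's local codes to a shared concept. This is a deliberate simplification of what a production Master Patient Index or vocabulary service does. Every name, code, and record here is hypothetical.

```python
def normalize_name(name):
    """Lowercase, drop commas, and sort tokens so 'Doe, Jane' matches 'Jane Doe'."""
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def link_patients(records):
    """Attach a master_id shared by records with the same (name, birth date) key."""
    master_index = {}
    linked = []
    for rec in records:
        key = (normalize_name(rec["name"]), rec["dob"])
        master_id = master_index.setdefault(key, len(master_index) + 1)
        linked.append({**rec, "master_id": master_id})
    return linked


# Hypothetical vocabulary: (source system, local code) -> shared concept.
LOCAL_TO_CONCEPT = {
    ("ehr", "GLU"): "glucose",
    ("lab", "Gluc-Ser"): "glucose",
}

# Made-up records from two source systems.
records = [
    {"source": "ehr", "name": "Doe, Jane", "dob": "1950-01-02", "code": "GLU", "value": 110},
    {"source": "lab", "name": "Jane Doe", "dob": "1950-01-02", "code": "Gluc-Ser", "value": 104},
    {"source": "lab", "name": "John Roe", "dob": "1948-07-30", "code": "Gluc-Ser", "value": 97},
]

integrated = [
    {**rec, "concept": LOCAL_TO_CONCEPT[(rec["source"], rec["code"])]}
    for rec in link_patients(records)
]
for row in integrated:
    print(row["master_id"], row["concept"], row["value"])
```

Once both steps are done, the two glucose results for the same patient line up under one ID and one concept. Without that alignment, analysis across sources has nothing consistent to join on.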
Patient Data Integration
Vocabulary and Data Density
Natural Language Processing
Factors Influencing Health
• Socioeconomic
• Health behaviors
• Clinical care
• Physical environment
Collecting Patient-Reported Outcomes
• Transcribing
• Patient Portals
• Scanning
• Tablet entry
Patient Reported Information: Tablets vs. Scanned Documents
[Table: Scanning vs. Tablets, each rated =, +, or -. Institutional criteria: equipment cost, infection risk, security, theft, data loss, patient mismatch, disaster recovery.]
Patient Reported Information: Tablets vs. Scanned Documents
[Table: Scanning vs. Tablets, each rated =, +, or -. Criteria: functionality, office workflow, education/training, data timeliness, branching logic, extensibility, patient experience, preference, security perception.]
Goal | Task | Use | User | Tool | QI Life-cycle | Cost/Instance | Instances | Required
Answer a specific question | Ad hoc query | Research | Researcher | SQL | Define | + | +++++ | Defined request
Observe trends | Recurring query | Management reports | Manager | Reporting application | Measure | ++ | ++++ | Available owner
Identify dependencies | Subpopulation analysis | Operational analysis | Analyst | Analytic tools | Analyze | +++ | +++ | Content expert/analyst
Assist decision making | Dashboard display | Point of care improvement | Clinical team | Registries | Improve | ++++ | ++ | Pilot site
Automate processes | Application | Decision support | Clinician/Role | EMR application | Control | +++++ | + | Institutional sponsor
Physical Activity