BioVU and the Synthetic Derivative
Erica Bowton, PhD
Program Manager, Personalized Medicine

What is BioVU?
• The move towards personalized medicine requires very large sample sets for discovery and validation
• BioVU: a biobank intended to support a broad view of biology and enable personalized medicine
• Contains de-identified DNA extracted from blood left over after clinically indicated testing of Vanderbilt patients who have not opted out
• Linked to the Synthetic Derivative, a de-identified EMR
[Figure: each patient record (e.g., John Doe) is converted by a one-way hash into a de-identified identifier (e.g., A7CCF99DE65732…) in the Synthetic Derivative (~2 million records), which can be updated. For eligible patients, DNA is extracted and linked to the same hashed identifier.]
How BioVU Samples Are Accepted
Accepted samples must:
• Be of good quality
• Have a sufficient amount of blood
• Be from a patient who has signed the BioVU form
• Be from a patient who has not opted out
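The four acceptance criteria above are conjunctive: a sample that fails any one of them is not accepted. The sketch below expresses that rule in Python. It is illustrative only: the `LeftoverSample` fields and the `is_accepted` name are hypothetical, and because the slide gives no quality or volume thresholds, those checks are modeled as booleans computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class LeftoverSample:
    # Hypothetical fields; the slide does not define quality or volume thresholds
    passes_quality_check: bool
    has_sufficient_blood: bool
    patient_signed_biovu_form: bool
    patient_opted_out: bool


def is_accepted(sample: LeftoverSample) -> bool:
    # All four criteria must hold; any single failure rejects the sample
    return (
        sample.passes_quality_check
        and sample.has_sufficient_blood
        and sample.patient_signed_biovu_form
        and not sample.patient_opted_out
    )
```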
The BioVU Form
A component of the Consent for Treatment process
Awareness Generation
• Posters in phlebotomy areas in English and Spanish
• Brochures freely available to VUMC clinics in English and Spanish
• BioVU hotline available for questions and opt-out
BioVU Sample Accrual: 176,448
Current accrual as of 2-19-2014: 155,090 adult and 21,472 pediatric samples
[Figure: adult and pediatric samples accrued over time, alongside anticipated adult and pediatric sample accrual (y-axis 0 to 225,000)]
Where are BioVU samples stored?
RTS SmaRTStore
BioVU Operations Oversight
[Figure: organizational chart showing the bodies that provide oversight of, or input and advice to, BioVU: Institutional Review Board, BioVU Protocol Review Committee, General Counsel, Med Ctr Ethics, Ethics Advisory Board*, Community Advisory Board*, and the Vice Chancellor's Office.]
Operations Oversight Board** (Vice Chancellor's Office)
• Vice Chancellor (Chair)
• Ethics/ELSI (2)
• Ctr Human Genetics Research (2)
• Clinical genetic testing lab (1)
• Genetics/Genetic Medicine (6)
• Clin. Pharmacology (PI)
• Patient advocacy (2)
• University counsel (1)
• Biostatistics (3)
• Cancer center (3)
• Pediatric genetics (1)
• Program staff
* Includes external members (or consists exclusively of them)
** (n) = number of members representing this discipline/area. Several members represent more than one area.
Resources for EMR-based research at VUMC
The Synthetic Derivative
• A de-identified, continuously updated image of the EMR (2 million records)
BioVU
• DNA samples available: >175,000
• Plasma collection underway
Redeposited genotypes
• Subjects with GWAS data: >12,000
• Subjects with any genotyping: >60,000
• >8,000,000,000 genotypes
The Synthetic Derivative
• Rich, multi-source database of de-identified clinical and demographic data
• A derivative of the EMR: information content is reduced by ‘scrubbing’ identifiers
• Systematically shifted event dates
• User interface tool that can be used for access and analysis
• Services available to help deliver results for non-standard queries (temporal queries, control matching, etc.)
• Contains ~2.1 million records
  o ~1 million with detailed longitudinal data
  o averaging 100,000 bytes in size
  o an average of 27 codes per record
• Records are updated over time and are current through 8/31/13
Synthetic Derivative Data Types
• Narratives, such as:
  o Clinical Notes
  o Discharge Summaries
  o History and Physicals
  o Problem Lists
  o Surgical Reports
  o Progress Notes
  o Letters
• Diagnostic Codes, Procedural Codes
• Forms (intake, assessment)
• Reports (pathology, ECGs, echocardiograms)
• Clinical Communications
• Lab Values and Vital Signs
• Medication Orders
• TraceMaster (ECGs)
• Tumor Registry
Technology + policy
De-identification
• A 128-character identifier (RUI) is derived from the MRN using the Secure Hash Algorithm (SHA-512)
• HIPAA identifiers are removed using a combination of custom techniques and established de-identification software
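The RUI derivation can be sketched with Python's standard `hashlib`: a SHA-512 digest is 64 bytes, so its hexadecimal form is exactly 128 characters. This is a minimal sketch, not Vanderbilt's implementation. The slide does not say whether the MRN is normalized or combined with a secret key before hashing, so the whitespace trimming, the optional `secret_key` parameter and the name `derive_rui` are all assumptions. A keyed hash (HMAC-SHA-512) is a common hardening step because MRNs come from a small, structured space that an attacker could otherwise enumerate and hash.

```python
import hashlib
import hmac
from typing import Optional


def derive_rui(mrn: str, secret_key: Optional[bytes] = None) -> str:
    """Hypothetical sketch: map an MRN to a 128-character hex identifier."""
    message = mrn.strip().encode("utf-8")  # assumed normalization: trim whitespace
    if secret_key is None:
        # Plain one-way hash: 64-byte SHA-512 digest -> 128 hex characters
        return hashlib.sha512(message).hexdigest()
    # Keyed variant (assumption): HMAC-SHA-512 also yields 128 hex characters
    return hmac.new(secret_key, message, hashlib.sha512).hexdigest()
```

Because the same MRN always maps to the same RUI, new data for a patient can be attached to the existing de-identified record, which is what allows the Synthetic Derivative to be updated over time.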
Date Shift
• Our algorithm shifts all dates within a record backward by up to 364 days; the shift is consistent within each record but differs across records
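A per-record date shift of this kind can be sketched as one random offset drawn per record and reused for every date in that record. The sketch below is hypothetical: the class name, the use of the `secrets` module, and an in-memory cache of offsets are assumptions, as is allowing an offset of 0 days. The slide states only the 364-day bound and the within-record consistency.

```python
import secrets
from datetime import date, timedelta

MAX_SHIFT_DAYS = 364  # upper bound stated on the slide


class RecordDateShifter:
    """Hypothetical sketch: one backward offset per record, reused for all its dates."""

    def __init__(self):
        self._offsets = {}  # record identifier -> shift in days

    def offset_for(self, record_id: str) -> int:
        # Draw a random offset (0 to 364 days) the first time a record is seen
        if record_id not in self._offsets:
            self._offsets[record_id] = secrets.randbelow(MAX_SHIFT_DAYS + 1)
        return self._offsets[record_id]

    def shift(self, record_id: str, original: date) -> date:
        # Subtract the record's offset so every shifted date moves backward
        return original - timedelta(days=self.offset_for(record_id))
```

Because every date in a record moves by the same amount, intervals between events in that record (for example, days from a prescription to a lab value) are preserved, while calendar dates cannot be matched across records.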
Restricted access & continuous oversight
• Access restricted to VU; not a public resource
• IRB approval for the study (non-human subjects research)
• Data Use Agreement
• Audit logs of all searches and data exports
Data Use Agreement
• No attempt at re-identification
• Inform BioVU staff if a record is identifiable
• Research confined to that which is described
• Genotypes must be re-deposited in BioVU
Phenotyping Approach
Algorithm development:
1. Identify the phenotype of interest
2. Develop and refine the case and control algorithms
3. Manually review the results to assess precision
4. If precision is ≥95%, deploy in BioVU; if it is <95%, return to step 2 and refine
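The precision gate in this loop can be sketched as follows. Precision here means the share of algorithm-flagged records that manual review confirms (true positives over all reviewed positives). The class and function names are hypothetical; only the 95% threshold comes from the slide.

```python
from dataclasses import dataclass

PRECISION_THRESHOLD = 0.95  # deployment cut-off from the slide


@dataclass
class ReviewResult:
    confirmed: int  # flagged records that manual review confirmed (true positives)
    rejected: int   # flagged records that manual review rejected (false positives)

    @property
    def precision(self) -> float:
        reviewed = self.confirmed + self.rejected
        if reviewed == 0:
            raise ValueError("no records were reviewed")
        return self.confirmed / reviewed


def next_step(result: ReviewResult) -> str:
    # Deploy once precision reaches 95%; otherwise go back and refine the algorithm
    return "deploy" if result.precision >= PRECISION_THRESHOLD else "refine"
```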
Disease Cohorts
Category / Disease                  Number in SD    Number in BioVU
Central Nervous System
  Alzheimer's                              3,429                497
  Parkinson's                              4,365                778
  Migraine                                15,699              3,299
  Dementia                                 3,747              1,045
Psychiatric
  Major Depressive Disorder               20,008              3,385
  ADHD                                    12,922              1,184
  Generalized Anxiety Disorder             5,828              1,195
  Schizophrenia                            4,069                495
(SD = Synthetic Derivative)
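Each BioVU cohort is a subset of the corresponding Synthetic Derivative cohort: only patients with banked DNA appear in BioVU. The short calculation below uses the counts from the table to compute what fraction of each SD cohort has a BioVU sample. The function and variable names are illustrative.

```python
# Counts copied from the Disease Cohorts table: (number in SD, number in BioVU)
COHORTS = {
    "Alzheimer's": (3429, 497),
    "Parkinson's": (4365, 778),
    "Migraine": (15699, 3299),
    "Dementia": (3747, 1045),
    "Major Depressive Disorder": (20008, 3385),
    "ADHD": (12922, 1184),
    "Generalized Anxiety Disorder": (5828, 1195),
    "Schizophrenia": (4069, 495),
}


def biovu_fraction(cohorts):
    # Share of each SD cohort that also has a DNA sample in BioVU
    return {name: biovu / sd for name, (sd, biovu) in cohorts.items()}


fractions = biovu_fraction(COHORTS)
for name, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name:30s} {frac:6.1%}")
```

The share ranges from about 9% (ADHD) to about 28% (Dementia).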
BioVU Utilization
Review process:
• Pre-review
• BioVU Committee review
  o Expedited review: genotyping data requests, reviewed by the BioVU Chair
  o Full review: DNA sample access requests, reviewed and scored by primary and secondary reviewers
BioVU projects:
• Requests: 104
• Approved so far: 86
[Figure: BioVU requests and approvals, split into data requests and DNA requests (y-axis 0 to 120)]
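The committee review step routes each request to one of two tracks depending on what is being requested. A minimal sketch of that routing, with hypothetical names; pre-review is assumed to have already happened and is not modeled.

```python
from enum import Enum


class RequestType(Enum):
    GENOTYPE_DATA = "genotyping data"
    DNA_SAMPLES = "DNA samples"


def review_track(request_type: RequestType) -> str:
    # Genotyping data requests: expedited review by the BioVU Chair
    if request_type is RequestType.GENOTYPE_DATA:
        return "expedited: BioVU Chair"
    # DNA sample access requests: full review, scored by two reviewers
    return "full: primary and secondary reviewers"
```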
Current BioVU Studies
[Figure: number of current BioVU studies by study area (y-axis 0 to 25)]
USE CASE 1
Synthetic Derivative Study
[Figure: BMI measurements over time (y-axis 15 to 40), with the normal range and a Zyprexa prescription marked]
Ability to analyze quantitative, longitudinal repeated measures
[Figure: distribution of BMI values (x-axis marks 13.3, 26.2, 40.9, 73.4, 300+; y-axis counts 0 to 900)]
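A longitudinal repeated-measures question like the one in this use case can be reduced, in its simplest form, to comparing one patient's measurements before and after a prescription date. This is a hypothetical illustration: the function name, the list-of-(date, BMI) input, and the use of simple means are assumptions, not the study's method. The comparison is valid on de-identified data because the date shift is consistent within each record, so whether a measurement precedes the prescription is preserved.

```python
from datetime import date
from statistics import mean


def bmi_before_after(measurements, rx_date):
    """Hypothetical sketch: mean BMI before vs. on/after a prescription date.

    measurements: list of (date, bmi) pairs from a single patient's record.
    Returns (mean_before, mean_after), or None if one side has no data.
    """
    before = [bmi for d, bmi in measurements if d < rx_date]
    after = [bmi for d, bmi in measurements if d >= rx_date]
    if not before or not after:
        return None  # need at least one measurement on each side
    return mean(before), mean(after)
```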
USE CASE 2
Existing Genetic Data
USE CASE 3
New Genotyping/Sequencing
[Figure: workflow for new genotyping. Under a data use agreement, an investigator query of the scrubbed (de-identified) records identifies cases and controls, which are tracked only by one-way hashed identifiers (e.g., B699tre563msd…, F5rt783mbncds…). The matching BioVU samples are retrieved for genotyping, and the investigator analyzes genotype-phenotype relations.]
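The sample-retrieval step amounts to joining the hashed identifiers of the selected cases and controls against the BioVU sample inventory. A minimal sketch, assuming a hypothetical inventory that maps each hashed identifier to a storage location; the function name and return shape are illustrative, not BioVU's actual system.

```python
def samples_to_pull(cases, controls, inventory):
    """Hypothetical sketch of sample retrieval.

    cases, controls: sets of hashed record identifiers from the investigator query.
    inventory: dict mapping hashed record identifier -> storage location.
    Returns (case_locations, control_locations, sorted list of IDs with no sample).
    """
    wanted = cases | controls
    missing = sorted(wanted.difference(inventory))  # iterating a dict yields its keys
    case_locations = {rid: inventory[rid] for rid in cases if rid in inventory}
    control_locations = {rid: inventory[rid] for rid in controls if rid in inventory}
    return case_locations, control_locations, missing
```

Working only with hashed identifiers means the samples can be pulled and plated without anyone on the research side handling an MRN.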
BioVU Project Life Cycle
BioVU (2-3 months)
• Access approvals/application
• Cohort identification
• Clinical data extraction
• Programming support
• Study design
• Agreements
VANTAGE: Vanderbilt Technologies for Advanced Genomics (1-2 months)
• Genotyping/sequencing approaches
• Assay design
• SNP selection
• Sample pulling and plating
VANGARD: Vanderbilt Technologies for Advanced Genomics Analysis and Research Design (1-2 months)
• Genomic data analysis and research design
• Biostatistical/bioinformatic support
For ALL BioVU Studies…
Resources:
1. BioVU Project Management: [email protected]
2. Programming services: IDASC CORE
3. Genomic technologies: VANTAGE CORE
4. Data analysis services: VANGARD CORE
https://starbrite.vanderbilt.edu/biovu/
END
Validating EMR phenotype algorithms
Disease                 Marker        Gene/region
Atrial fibrillation     rs2200733     Chr. 4q25
                        rs10033464    Chr. 4q25
Crohn's disease         rs11805303    IL23R
                        rs17234657    Chr. 5
                        rs1000113     Chr. 5
                        rs17221417    NOD2
                        rs2542151     PTPN22
Multiple sclerosis      rs3135388     DRB1*1501
                        rs2104286     IL2RA
                        rs6897932     IL7RA
Rheumatoid arthritis    rs6457617     Chr. 6
                        rs6679677     RSBN1
                        rs2476601     PTPN22
Type 2 diabetes         rs4506565     TCF7L2
                        rs12255372    TCF7L2
                        rs12243326    TCF7L2
                        rs10811661    CDKN2B
                        rs8050136     FTO
                        rs5219        KCNJ11
                        rs5215        KCNJ11
                        rs4402960     IGF2BP2
[Figure: published vs. observed odds ratios for each marker (axis marks 0.5, 1.0, 2.0, 5.0)]
(Ritchie et al., 2010)
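Each point in a published-vs-observed comparison like this is an odds ratio. As a hedged illustration, the sketch below computes an allelic odds ratio and a Woolf (log-based) 95% confidence interval from a 2×2 table of allele counts. This is a swapped-in textbook technique; published analyses may instead use regression models with covariates. The function name and argument order are assumptions.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Allelic odds ratio with a Woolf confidence interval.

    a: risk alleles in cases      b: other alleles in cases
    c: risk alleles in controls   d: other alleles in controls
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell count; apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An observed interval that contains the published odds ratio is one simple way to judge whether an EMR-derived phenotype replicates a known association.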