Informatics - Broad Institute

CaRE Center Informatics
NHLBI CaRE Center Meeting
Bethesda, MD
July 25, 2006
Marcia Nizzari
CaRE Center Informatics
• Builds on existing Genetic Analysis Platform
– Operational for 2+ years
– Genotyping and Resequencing
– Code base successfully reused
• CaRE Center enhancements:
– Data sharing strategy
– Phenotype/Trait thesaurus, meta thesaurus
– Customizable analytic pipelines
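The phenotype/trait thesaurus idea can be illustrated with a minimal sketch. It assumes a thesaurus is simply a mapping from cohort-specific trait labels to one shared canonical term. The labels, the `THESAURUS` table, and the `normalize_trait` and `harmonize` helpers are hypothetical and are not taken from the CaRE system.

```python
# Hypothetical thesaurus: cohort-specific labels -> canonical trait term.
THESAURUS = {
    "sbp": "systolic_blood_pressure",
    "systolic bp": "systolic_blood_pressure",
    "ldl-c": "ldl_cholesterol",
    "ldl cholesterol": "ldl_cholesterol",
}

def normalize_trait(label):
    """Return the canonical term for a label, or None if it is unknown."""
    # Lowercase and collapse whitespace so "  Systolic  BP " still matches.
    key = " ".join(label.strip().lower().split())
    return THESAURUS.get(key)

def harmonize(record):
    """Rename the keys of one phenotype record to canonical terms.

    Unknown labels are kept under their original name so nothing is lost.
    """
    out = {}
    for label, value in record.items():
        out[normalize_trait(label) or label] = value
    return out
```

With a shared vocabulary like this, phenotype records uploaded by different cohorts could be queried under the same trait names.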
User Experience – Production
Three “portals” or dashboards –
• Sample Management
– Register and fingerprint samples, manage storage and aliquots for experiments
– Record phenotypes for Individuals and Samples
• Project Management
– Manage Groups, Projects, plan your experiments
– Shunt filtered results into analysis pipelines
• Process/LIMS Management
– Design and execute experiments per platform (Affy, Illumina, Sequenom or resequencing), curate results
High Level Workflow – for CaRE
[Workflow diagram; labels listed by area]
• Production: BSP/GAP + CaRE enhancements
– Upload Samples, Peds, Individuals, Phenotypes
– Create Experiments (Samples x Features)
– Design and Execute Experiments
– QC/Curate Results
– Data Compile
– Data stores and services: BSP DB, Project DB, Feature DB, LIMS DBs, Data Vault, Web Services
• Analysis: Gene Pattern + CaRE analysis tools
– Summarize/Filter
– PLINK
– Association & Statistics Viewers
– Cohort’s Custom Algorithms, Viewers
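The "Create Experiments (Samples x Features)" step can be read as a cross product of a sample set and a feature set. The sketch below illustrates that reading only; the dictionary layout, the status values, and the function names are assumptions, not the platform's actual data model.

```python
from itertools import product

def create_experiment(name, samples, features):
    """Build the planned assay list: one entry per (sample, feature) pair."""
    return {
        "name": name,
        "assays": [
            {"sample": s, "feature": f, "status": "planned"}
            for s, f in product(samples, features)
        ],
    }

def progress(experiment):
    """Count assays by status, as a simple per-experiment progress report."""
    counts = {}
    for assay in experiment["assays"]:
        counts[assay["status"]] = counts.get(assay["status"], 0) + 1
    return counts

# Two samples crossed with three features gives six planned assays.
exp = create_experiment("demo", ["S1", "S2"], ["F1", "F2", "F3"])
```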
Production Screenshots
Upload Phenotypes, Create Experiments, Curate Results, Filter by Phenotype for Analysis
Project Management dashboard showing Phenotype Upload
Anticipate significant enhancements to handle CaRE Center requirements.
Project Management dashboard showing Experiment Definition
Experiments flow through the Process Dashboard for execution; they are the logical unit for reporting on progress.
Process Dashboard showing QC Report on Affy chemistry plates – Fingerprints to the right!
Lab techs and coordinators can view and curate plates; set up re-hyb and redo pipelines.
Project Management dashboard showing QC Statistics and Pheno Query
Production analysis workflow is executed prior to exporting data to the Gene Pattern pipeline for association study analyses.
Project Management dashboard
Search phenotypes to slice and dice results for analysis
The resulting subset will be piped into the Gene Pattern pipeline for analysis on a derived, curated dataset.
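Filtering by phenotype before analysis can be sketched as a set of predicates applied to each individual's phenotype record. The record layout and the `filter_by_phenotype` helper are hypothetical; the dashboard's real query mechanism is not described in the slides.

```python
def filter_by_phenotype(individuals, **criteria):
    """Keep individuals whose phenotypes satisfy every criterion.

    Each criterion is a predicate applied to the phenotype value;
    individuals missing a required phenotype are excluded.
    """
    selected = []
    for ind in individuals:
        phenos = ind["phenotypes"]
        if all(name in phenos and test(phenos[name])
               for name, test in criteria.items()):
            selected.append(ind)
    return selected

people = [
    {"id": "I1", "phenotypes": {"age": 62, "sex": "F"}},
    {"id": "I2", "phenotypes": {"age": 45, "sex": "F"}},
    {"id": "I3", "phenotypes": {"sex": "F"}},  # age not recorded
]
subset = filter_by_phenotype(
    people,
    age=lambda v: v >= 50,
    sex=lambda v: v == "F",
)
```

Only the subset returned here would be handed on to the analysis pipeline.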
User Experience – Analysis
• GenePattern framework
– Provides “pluggable” backplane
– Can string together tools in a pipeline
– Tracks everything for ‘reproducible research’
• For CaRE Center
– We create templates for our standard analysis methods
– Cohort teams can customize
– Streamlines publication!
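The idea of stringing tools into a pipeline while tracking every step can be sketched in a few lines. This is not the GenePattern API; `run_pipeline`, the step functions, and the log format are all hypothetical and only illustrate the "pluggable steps plus provenance log" pattern.

```python
import hashlib
import json

def run_pipeline(steps, data):
    """Run steps in order and record a provenance log.

    `steps` is a list of (name, function, params) tuples. Each function
    takes (data, **params) and returns new, JSON-serializable data.
    """
    log = []
    for name, func, params in steps:
        data = func(data, **params)
        # Hash each step's output so a rerun can be checked for identical results.
        digest = hashlib.sha256(
            json.dumps(data, sort_keys=True).encode("utf-8")
        ).hexdigest()
        log.append({"step": name, "params": params, "output_sha256": digest})
    return data, log

# Two hypothetical pluggable steps.
def drop_low_call_rate(rows, min_rate):
    return [r for r in rows if r["call_rate"] >= min_rate]

def keep_fields(rows, fields):
    return [{k: r[k] for k in fields} for r in rows]

rows = [{"id": "S1", "call_rate": 0.99}, {"id": "S2", "call_rate": 0.80}]
template = [
    ("qc", drop_low_call_rate, {"min_rate": 0.95}),
    ("project", keep_fields, {"fields": ["id"]}),
]
result, provenance = run_pipeline(template, rows)
```

A standard template is then just a saved list of steps, and a cohort team could customize it by changing parameters or swapping in a step of its own.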
Screenshots for Analysis
Gene Pattern framework with PLINK and custom reporting
High Level Workflow – for CaRE
Compiled Files for PLINK
QC Report (in browser)
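For readers unfamiliar with what compiled files for PLINK look like, the sketch below formats rows in PLINK's text PED and MAP layouts. A PED row holds family ID, individual ID, father ID, mother ID, sex, and phenotype, followed by two alleles per marker; a MAP row holds chromosome, marker ID, genetic distance, and base-pair position. The helper names and the sample values are hypothetical, and this is not the platform's actual export code.

```python
def ped_line(family_id, individual_id, father_id, mother_id, sex, phenotype, genotypes):
    """Format one PED row.

    `genotypes` is a list of (allele1, allele2) pairs, one per marker,
    in the same order as the MAP file. PLINK conventions used here:
    sex 1=male, 2=female; unknown parent is "0"; missing allele is "0".
    """
    fields = [family_id, individual_id, father_id, mother_id, str(sex), str(phenotype)]
    for a1, a2 in genotypes:
        fields.extend([a1, a2])
    return " ".join(fields)

def map_line(chromosome, marker_id, genetic_distance, bp_position):
    """Format one MAP row: chromosome, marker ID, genetic distance, bp position."""
    return "\t".join([str(chromosome), marker_id, str(genetic_distance), str(bp_position)])

# One individual with two markers; the second genotype is missing.
row = ped_line("FAM1", "IND1", "0", "0", 1, 2, [("A", "G"), ("0", "0")])
marker = map_line(1, "marker1", 0, 1000)
```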
Issues/Questions
• Scope of phenotype-related enhancements
• Group/Project structure for CaRE Center
• CaRE user visibility into Process Dashboard/LIMS
• Data release model decision
– Data Enclave scenarios and security
• User training and documentation
– Analysis methodology
– System and security training
Security for Production & Analysis
[Security diagram; labels listed by area]
• Users in JAAS domain: BSP Lab Technician; CaRE Cohort Technician; Broad Lab Technician, Coordinator; CaRE Scientist
• Project Management: Groups, Projects, Grants, Panels, Feature Sets, Sample Sets
– Proj Mgt Security Context (Project)
• Process/LIMS
– Lab Security Context (X-Project)
• Biological Samples Platform
– BSP Security Context (Sample Collection)
• Analysis Pipelines
– CaRE Analysis Security Context (scope based on rules of Data Enclave, could cover multiple Projects)
• Shareable Objects: Peds, Individuals, Phenotypes, Samples, Features
– LSIDs
– PIPS DB, Feature DB
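The security contexts above can be illustrated with a minimal access check in which each context carries a name and a scope. The class and function names, and the idea of representing a scope as a set of IDs, are assumptions made for illustration; the slides do not describe how the JAAS-based implementation is built.

```python
from dataclasses import dataclass, field

@dataclass
class SecurityContext:
    name: str          # e.g. "Proj Mgt", "Lab", "BSP", "CaRE Analysis"
    scope: frozenset   # IDs covered by this context (projects, sample collections)

@dataclass
class User:
    name: str
    contexts: list = field(default_factory=list)

def can_access(user, context_name, resource_id):
    """Allow access only if the user holds the named context and the
    resource falls inside that context's scope."""
    return any(
        c.name == context_name and resource_id in c.scope
        for c in user.contexts
    )

# A hypothetical CaRE scientist whose analysis context spans two projects.
scientist = User(
    "care_scientist",
    [SecurityContext("CaRE Analysis", frozenset({"P1", "P2"}))],
)
```

Scoping the analysis context to a set of projects mirrors the note that it "could cover multiple Projects" depending on the Data Enclave rules.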
[Network diagram; labels listed by area]
• Zones: The World (Internet “Cloud”); MIT; The Broad Institute
• Firewalls: Cisco Pix (shown twice); Core Router
• Radius DB: used for authentication for VPN access
• Access Rules for Subnets: explicit allows, e.g., allow host on LIMS to talk to host on server
• Allow Rules: explicit allows
– http = 80 -> host
– ssh = 22 -> host
– https = 443 (SSL)
• Hosts: Host A; Host B; host on LIMS; host on server; …
• Must be in the list to permit access
• Open jack; Wireless; unregistered 10.10 domain
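The "explicit allows" model amounts to default deny: traffic passes only when a rule for that destination and port is in the list. The sketch below shows the logic with the three ports named on the slide; the host names and the `is_allowed` function are hypothetical and do not represent the actual firewall configuration.

```python
# Hypothetical allow list: (destination host, destination port).
ALLOW_RULES = {
    ("host-a", 80),   # http
    ("host-a", 22),   # ssh
    ("host-a", 443),  # https (SSL)
}

def is_allowed(dest_host, dest_port, rules=ALLOW_RULES):
    """Default deny: traffic passes only if an explicit allow rule matches."""
    return (dest_host, dest_port) in rules
```

Anything not listed, such as another port on the same host or any port on an unlisted host, is refused.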
Acknowledgements
• Genetic Analysis Platform team
• Biological Sample Platform team
• GenePattern team
• Stacey Gabriel, David Altshuler, Mark Daly
• URLs:
– GenePattern: http://www.broad.mit.edu/cancer/software/genepattern/
– PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/
– Haploview: http://www.broad.mit.edu/mpg/haploview/
– Center for Genotyping and Analysis: http://www.broad.mit.edu/gen_analysis/genotyping/