Oncomine 2.0 - University of Michigan
ONCOMINE: A Bioinformatics Infrastructure for Cancer Genomics
Dan Rhodes
Chinnaiyan Laboratory
Bioinformatics Program
Cancer Biology Training Program
Medical Scientist Training Program
University of Michigan Medical School
Outline
Background
– DNA Microarrays and the Cancer Transcriptome
ONCOMINE
– Data collection, normalization & storage
– Statistical Analysis
– Visualization of Data and Analysis
ONCOMINE Data Integration
– Therapeutic Targets / Biomarkers
– Metabolic and Signaling Pathways
– Known protein-protein Interactions
ONCOMINE tutorial
The Cancer Transcriptome
180+ studies profiling human cancer
Each profiling 5 – 100+ samples
We estimate > 10,000 microarrays
10k chips measuring 20k genes = 200+ million data points
Oncomine
oncology + data-mining = oncomine
105 independent datasets (90 analyzed)
7,292 cancer microarrays
79 million gene expression measurements
382 distinct cancer signatures
> 5 million tests of differential expression
> 5 million tests of gene set enrichment
> 5 billion pairwise correlations
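Each pairwise correlation relates the expression profiles of two genes across the samples of one dataset. The slides do not name the coefficient, so the sketch below assumes Pearson correlation; the function names and the toy profiles are hypothetical, and this is not Oncomine's code.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles measured on the same samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance numerator and the two sums of squared deviations
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def top_coexpressed(query: str, profiles: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Rank every other gene by its correlation with the query gene's profile."""
    q = profiles[query]
    scores = [(gene, pearson(q, p)) for gene, p in profiles.items() if gene != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]
```

For example, with `profiles = {"GENE_A": [1, 2, 3, 4], "GENE_B": [2, 4, 6, 8], "GENE_C": [4, 3, 2, 1]}`, GENE_B is perfectly co-expressed with GENE_A and GENE_C is perfectly anti-correlated. Computing all pairs this way is quadratic in the number of genes per dataset, which is why the total climbs into the billions.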
Oncomine
Database – relational, Oracle 9.2
Statistical computing – R, Perl, Java
Front End – Java Server Pages
Server – Apache/Tomcat
Graphics – Scalable Vector Graphics (SVG)
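Because the graphics layer is SVG, a plot is just generated XML text that the browser renders. The sketch below is a hypothetical illustration, not the Oncomine front end (which used Java Server Pages): it emits a minimal SVG bar chart of one gene's normalized expression per sample, with bars above or below a zero line. The function name, sizes and colour are illustrative choices.

```python
from xml.sax.saxutils import escape


def expression_bars_svg(gene: str, values: list[float], width: int = 400, height: int = 200) -> str:
    """Return an SVG document with one bar per sample, scaled to the largest |value|."""
    if not values:
        raise ValueError("need at least one value")
    peak = max(abs(v) for v in values) or 1.0  # avoid dividing by zero for all-zero data
    bar_w = width / len(values)
    mid = height / 2  # zero line: positive values go up, negative values go down
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f"<title>{escape(gene)}</title>",
        f'<line x1="0" y1="{mid}" x2="{width}" y2="{mid}" stroke="black"/>',
    ]
    for i, v in enumerate(values):
        h = abs(v) / peak * mid
        y = mid - h if v >= 0 else mid
        parts.append(
            f'<rect x="{i * bar_w:.1f}" y="{y:.1f}" width="{bar_w * 0.8:.1f}" '
            f'height="{h:.1f}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Generating vector markup on the server keeps plots resolution-independent and lets labels stay searchable text.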
Data Collection
Monthly PubMed searches (cancer + microarray + transcriptome + tumor + gene expression profiling)
Gene Expression Repositories
– Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/)
– ArrayExpress (http://www.ebi.ac.uk/arrayexpress/)
– Stanford Microarray Database (http://genome-www5.stanford.edu/)
– Whitehead Cancer Genomics (http://www.broad.mit.edu/cancer/)
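A monthly literature search like the one described can be scripted against NCBI's E-utilities ESearch service. The sketch below only builds the query URL; the boolean combination of the slide's keywords, the 31-day window and `retmax` are assumptions, and `monthly_pubmed_query` is a hypothetical helper. Fetching the URL (for example with `urllib.request.urlopen`) would return XML listing matching PubMed IDs.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def monthly_pubmed_query(days: int = 31, retmax: int = 500) -> str:
    """Build an ESearch URL for recent cancer microarray papers.

    The way the slide's keywords are combined with AND/OR is an assumption.
    """
    term = (
        "(cancer OR tumor) AND "
        '(microarray OR transcriptome OR "gene expression profiling")'
    )
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "edat",  # Entrez date: when the record entered PubMed
        "reldate": days,     # restrict to records from the last `days` days
        "retmax": retmax,    # maximum number of IDs returned
    }
    return f"{ESEARCH}?{urlencode(params)}"
```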
Data Normalization
Global normalization: the same scaling factors applied to all features on a microarray (mean and variance normalization)
Affymetrix – quantile normalization
Spotted cDNA – loess normalization
– fit a loess curve to the M vs. A plot (M = log2 ratio, A = average log2 intensity) and remove the intensity-dependent trend
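The two array-wide schemes above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's R code: `scale_array` applies one shift and one scale factor per array (mean and variance normalization), and `quantile_normalize` forces all arrays to share the mean sorted distribution. Ties are broken by position, a simplification; the loess step for spotted arrays is not shown. Both function names are hypothetical.

```python
import statistics


def scale_array(values: list[float]) -> list[float]:
    """Global normalization: one shift and one scale factor for every feature on an array."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def quantile_normalize(arrays: list[list[float]]) -> list[list[float]]:
    """Give every array (one list per microarray, same feature order) the same value distribution.

    Ties are broken by position; production code usually averages tied ranks.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must measure the same features")
    # Reference distribution: mean of the k-th smallest value across all arrays
    sorted_cols = [sorted(a) for a in arrays]
    reference = [statistics.mean(col[k] for col in sorted_cols) for k in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # feature indices, smallest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference value at its rank
        result.append(out)
    return result
```

For two arrays `[5, 2, 3]` and `[4, 1, 6]`, the reference distribution is `[1.5, 3.5, 5.5]`, so the arrays become `[5.5, 1.5, 3.5]` and `[3.5, 1.5, 5.5]`.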
Data Storage
Generic data structures to accommodate a variety of data:
Samples
Microarray Features / Genes
Normalized Data
Statistical Tests
Gene Sets
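The five generic structures map naturally onto relational tables. The production system used Oracle 9.2; the sketch below substitutes Python's built-in `sqlite3` so it runs anywhere, and every table name, column name and toy row is hypothetical. It shows the kind of join the design enables: pulling normalized values for the members of one gene set.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sample      (sample_id INTEGER PRIMARY KEY, dataset TEXT, label TEXT);
CREATE TABLE feature     (feature_id INTEGER PRIMARY KEY, probe TEXT, gene_symbol TEXT);
CREATE TABLE measurement (sample_id INTEGER REFERENCES sample,
                          feature_id INTEGER REFERENCES feature,
                          value REAL,                      -- normalized expression
                          PRIMARY KEY (sample_id, feature_id));
CREATE TABLE gene_set    (set_id INTEGER PRIMARY KEY, source TEXT, name TEXT);
CREATE TABLE gene_set_member (set_id INTEGER REFERENCES gene_set, gene_symbol TEXT);
CREATE TABLE test_result (feature_id INTEGER REFERENCES feature, analysis TEXT,
                          statistic REAL, p_value REAL, q_value REAL);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_toy_data(conn: sqlite3.Connection) -> None:
    """Insert a few illustrative rows (not real data)."""
    conn.execute("INSERT INTO sample VALUES (1, 'study_1', 'tumor')")
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                     [(1, "probe_1", "GENE_A"), (2, "probe_2", "GENE_B")])
    conn.executemany("INSERT INTO measurement VALUES (1, ?, ?)", [(1, 2.5), (2, -0.7)])
    conn.execute("INSERT INTO gene_set VALUES (1, 'KEGG', 'example_pathway')")
    conn.execute("INSERT INTO gene_set_member VALUES (1, 'GENE_A')")


def values_for_gene_set(conn: sqlite3.Connection, set_id: int) -> list[tuple[str, float]]:
    """Normalized values for all genes that belong to one gene set."""
    sql = """SELECT f.gene_symbol, m.value
             FROM measurement m
             JOIN feature f ON f.feature_id = m.feature_id
             JOIN gene_set_member g ON g.gene_symbol = f.gene_symbol
             WHERE g.set_id = ?"""
    return conn.execute(sql, (set_id,)).fetchall()
```

Keeping gene sets as plain membership rows means any annotation source (GO, KEGG, a drug-target list) can be loaded without changing the schema.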
Differential Expression Analysis
Two-sided t-test for each gene
False discovery rate correction for multiple hypothesis testing
Implementation: R, connected to the Oracle database through RODBC
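The slide's formula did not survive extraction, so the sketch below makes its assumptions explicit: it uses Student's pooled-variance t statistic (Oncomine's exact variant is not stated here), computes the two-sided p-value through the regularized incomplete beta function, P(|T| ≥ |t|) = I_{ν/(ν+t²)}(ν/2, 1/2), and assumes the false discovery rate correction is Benjamini-Hochberg. It is written in dependency-free Python for illustration; in R the same steps would be `t.test` and `p.adjust(method = "BH")`.

```python
import math
import statistics


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Pooled-variance (Student) two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 3e-14
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last correction factor is ~1: converged
            break
    return h


def betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of a t statistic with df degrees of freedom."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

Applied per gene, `student_t` and `two_sided_p` give one p-value per microarray feature; `benjamini_hochberg` then adjusts the whole vector at once. For p-values `[0.01, 0.04, 0.03, 0.005]` it returns q-values `[0.02, 0.04, 0.04, 0.02]`.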
Oncomine Tutorial part I
• Gene Differential Expression
• Gene Co-Expression
• Study Differential Expression
WWW.ONCOMINE.ORG
EMAIL: SHORTCOURSE
PASSWORD: MCBI
Therapeutic Targets / Biomarkers
Gene Ontology Consortium
– Biological Process (apoptosis, cell cycle)
– Cellular Component (cytoplasmic membrane, extracellular)
– Molecular Function (kinase, phosphatase, protease, etc.)
Known Therapeutic Targets
– NCI Clinical Trials Database
– Therapeutic Target Database
Therapeutic Target Database
338 proteins with literature-documented inhibitors, antagonists, blockers, etc.
http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp
Known Drug Targets Expressed in Bladder Cancer
Secreted proteins highly expressed in ovarian cancer
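Examples such as known drug targets in bladder cancer or secreted proteins in ovarian cancer amount to intersecting a ranked cancer signature with an annotation gene set (for instance the GO term "extracellular" or the Therapeutic Target Database list). The sketch below assumes the signature is a list of `(gene, q_value)` pairs ordered from most to least over-expressed; the function name, cutoff and gene names are hypothetical.

```python
from typing import Iterable


def candidate_targets(signature: list[tuple[str, float]],
                      annotation: Iterable[str],
                      q_cutoff: float = 0.05) -> list[tuple[int, str, float]]:
    """Return (rank, gene, q) for significant signature genes that also carry the annotation.

    `signature` is assumed to be ordered from most to least over-expressed.
    """
    wanted = set(annotation)
    hits = []
    for rank, (gene, q) in enumerate(signature, start=1):
        if q <= q_cutoff and gene in wanted:
            hits.append((rank, gene, q))  # keep the original rank for prioritization
    return hits
```

Keeping the original rank lets the filtered list be read as "the most over-expressed secreted proteins" rather than an unordered overlap.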
Metabolic & Signaling Pathways
KEGG
– Kyoto Encyclopedia of Genes & Genomes
– 87 metabolic pathways, 1700 gene assignments
BioCarta
– Signaling pathways reviewed and entered by ‘expert’ biologists
– 215 signaling pathways, 3700 gene assignments
Pathway Enrichment Analysis
Identify pathways and functional groups of genes deregulated in particular cancer types
Enrichment Analysis using Kolmogorov-Smirnov Scanning (Lamb et al.)
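Figure: 20 genes ranked by differential expression, with gene-set members (*) at ranks 2, 4, 6, 7 and 18; the set's ranks (2, 4, 6, 7, 18) are compared vs. the full ranking (1, 2, 3, 4…, 19, 20).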
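The comparison in the figure can be scored with a one-sample Kolmogorov-Smirnov statistic: how far the gene set's ranks deviate from a uniform spread over 1..n. The sketch below uses a common signed formulation (positive when members cluster near the top of the ranking, negative near the bottom) plus a simple permutation p-value; whether Oncomine used exactly this variant is an assumption, and both function names are hypothetical.

```python
import random


def ks_enrichment(set_ranks: list[int], n: int) -> float:
    """Signed KS score for a gene set whose members sit at `set_ranks` among n ranked genes."""
    v = sorted(set_ranks)
    t = len(v)
    if t == 0 or v[0] < 1 or v[-1] > n:
        raise ValueError("ranks must lie in 1..n")
    # a: largest excess of the set's empirical CDF over the uniform CDF (top enrichment)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    # b: largest shortfall of the set's empirical CDF below the uniform CDF (bottom enrichment)
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def ks_permutation_p(set_ranks: list[int], n: int, n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of random gene sets of the same size scoring at least as extreme."""
    rng = random.Random(seed)
    observed = abs(ks_enrichment(set_ranks, n))
    t = len(set_ranks)
    hits = sum(abs(ks_enrichment(rng.sample(range(1, n + 1), t), n)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero
```

For the figure's example, `ks_enrichment([2, 4, 6, 7, 18], 20)` gives 0.45: after four of the five members (80% of the set) only the top 35% of the ranking has been passed.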
Pathway Enrichment
Liver vs. other normal tissues
Pathway Enrichment (cont.)
Pathway Enrichment Analysis
A search for the BioCarta pathways most enriched in a medulloblastoma signature (C2) uncovered involvement of the Ras/Rho pathway.
Pathway Enrichment Analysis (cont.)
A direct link to the BioCarta pathway diagram shows the details (medulloblastoma signature genes are boxed in red).
Known Protein-Protein Interactions
HPRD
– Human Protein Reference Database
– Manually curated
– 20,000+ papers, 15,000+ distinct interactions
PKDB
– Protein Kinase Database
– Natural language processing
– 60,000+ abstracts suggest an interaction; 16,000 distinct interactions
– Error prone
Co-RIF
– LocusLink Reference Into Function
– 12,000+ co-RIFs
Human Interactome Map (www.himap.org)
INTERACT
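Overlaying a cancer signature on curated interactions (HPRD, PKDB, Co-RIFs) reduces to simple graph operations. The sketch below is a hypothetical illustration, not the Human Interactome Map's code: it builds an undirected network from interaction pairs, extracts interactions in which both partners are in the signature, and lists outside proteins linked to several signature genes. Function names, the `min_links` threshold and the toy protein names are assumptions.

```python
from collections import defaultdict


def build_network(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from (protein_a, protein_b) interaction pairs."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def deregulated_subnetwork(adj: dict[str, set[str]], signature: list[str]) -> list[tuple[str, str]]:
    """Interactions in which both partners belong to the cancer signature."""
    sig = set(signature)
    return sorted({tuple(sorted((a, b))) for a in sig for b in adj.get(a, ()) if b in sig})


def signature_neighbours(adj: dict[str, set[str]], signature: list[str], min_links: int = 2) -> list[str]:
    """Non-signature proteins that interact with at least `min_links` signature genes."""
    sig = set(signature)
    counts: dict[str, int] = defaultdict(int)
    for g in sig:
        for partner in adj.get(g, ()):
            if partner not in sig:
                counts[partner] += 1
    return sorted(p for p, c in counts.items() if c >= min_links)
```

With interactions A-B, B-C, C-D, A-X and C-X and a signature of A, B and C, the deregulated subnetwork is A-B and B-C, and X emerges as a candidate hub because it touches two signature genes.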
Oncomine Tutorial Part II
Gene set filtering to identify therapeutic targets and biomarkers
Enrichment analysis to identify pathways and processes deregulated in cancer
Pathway and protein interaction networks deregulated in cancer
Acknowledgements
Chinnaiyan Lab
– Radhika, Terry, Vasu, Jianjun, Scott, Soory
Pandey Lab
IOB
– Shanker, Nandan