Broad Firehose - STAT115 Introduction to Computational Biology

April 15/16, 2014
Lin Liu
Yang Li
STAT115 STAT225 BIST512 BIO298 - Intro to Computational Biology
Broad Firehose
TCGA
The Cancer Genome Atlas
The Cancer Genome Atlas (TCGA): a project, begun in 2005, to
catalogue genetic mutations responsible for cancer, using genome
sequencing and bioinformatics.
http://cancergenome.nih.gov/
The project planned 500 patient samples, more than most genomics
studies, and used several techniques to analyze them: gene expression
profiling, copy number variation profiling, SNP genotyping, genome-wide
DNA methylation profiling, microRNA profiling, and exon sequencing of at
least 1,200 genes. TCGA is also sequencing the entire genomes of some
tumors, including at least 6,000 candidate genes and microRNA sequences.
TCGA analytical tools:
https://tcga-data.nci.nih.gov/tcga/tcgaAnalyticalTools.jsp
Broad GDAC Firehose
Broad Institute's Genome Data Analysis Center (GDAC):
On behalf of The Cancer Genome Atlas, we've designed and operate
scientific data and analysis pipelines which pump terabyte-scale genomic
datasets through scores of quantitative algorithms, in the hope of
accelerating the understanding of cancer.
Download Firehose Data:
What data can you download?
• Mutation calling, classifying, summarizing and significance testing
• Copy number alteration detection and significance testing
• Expression- and methylation-based clustering
• Associating genomic data with common clinical, treatment or survival groups
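The last item, associating genomic data with clinical groups, often reduces to a contingency-table test. Below is a minimal sketch, not Firehose's own code. It assumes a binary mutation status and a binary clinical variable and applies Fisher's exact test to the resulting 2x2 table using only the Python standard library. The function name fisher_exact_two_sided is hypothetical, and Firehose may use other tests for other variable types, such as survival.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper. Rows could be mutated / not mutated and columns
    clinical group 1 / clinical group 2.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of the table whose top-left cell is x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For a made-up example in which 6 of 8 samples in one clinical group carry a mutation versus 1 of 8 in the other, fisher_exact_two_sided(6, 1, 2, 7) gives about 0.041.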
So, Firehose has already processed the data
• Results from GISTIC, MutSigCV, etc.
• Exercises:
– How do you get all the MutSig results for ovarian cancer?
– ./firehose_get -tasks mut analyses latest ov
– Unpack the file gdac.broadinstitute.org_OV-TP.MutSigNozzleReportCV.Level_4.2014011500.0.0
– Check OV-TP.final_analysis_set.maf
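The last two exercise steps can also be scripted. The sketch below makes three assumptions: the downloaded archive is a gzipped tarball (Firehose archives normally end in .tar.gz, so check the filename on disk), it contains a member whose name ends in final_analysis_set.maf, and that MAF uses the standard Hugo_Symbol, Variant_Classification and Tumor_Sample_Barcode columns. The helper names read_maf_from_archive and samples_mutated_per_gene, and the set of classes treated as silent, are hypothetical illustration choices. None of them is part of Firehose or MutSigCV.

```python
import csv
import io
import tarfile
from collections import Counter
from typing import Dict, List

# Assumption: Variant_Classification values treated as not protein-altering.
# Illustrative only; not the definition used by Firehose or MutSigCV.
SILENT_CLASSES = {"Silent", "Intron", "3'UTR", "5'UTR", "3'Flank", "5'Flank", "IGR", "RNA"}


def read_maf_from_archive(archive_path: str,
                          suffix: str = ".final_analysis_set.maf") -> List[Dict[str, str]]:
    """Return the rows of the first member of a .tar.gz archive whose name ends with `suffix`."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                raw = tar.extractfile(member)
                text = io.TextIOWrapper(raw, encoding="utf-8")
                # Skip '#' comment/version lines that may precede the MAF header row
                lines = (line for line in text if not line.startswith("#"))
                # MAF is tab-separated; disable quote handling so quotes stay literal
                return list(csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE))
    raise FileNotFoundError(f"no member ending in {suffix!r} in {archive_path}")


def samples_mutated_per_gene(rows: List[Dict[str, str]]) -> Counter:
    """Count distinct tumor samples carrying at least one non-silent mutation, per gene."""
    seen = set()        # (gene, sample) pairs already counted
    counts = Counter()
    for row in rows:
        if row["Variant_Classification"] in SILENT_CLASSES:
            continue    # skip mutations assumed not to change the protein
        key = (row["Hugo_Symbol"], row["Tumor_Sample_Barcode"])
        if key not in seen:
            seen.add(key)
            counts[row["Hugo_Symbol"]] += 1
    return counts
```

Set path to the downloaded archive. Then samples_mutated_per_gene(read_maf_from_archive(path)).most_common(10) lists the ten genes mutated in the most samples. Counting samples rather than raw mutations keeps a single hypermutated tumor from dominating a gene's total.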
Firehose Documentation:
https://confluence.broadinstitute.org/display/GDAC/Documentation
Side issue: how to access TCGA BAM files
• https://cghub.ucsc.edu/
• http://www.ncbi.nlm.nih.gov/gap