Transcript Analysis pipeline

Weihong Xu
11/12/2008
Boston, MA
Outline
- Introduction to array design and library files
- Image quantification (DAT->CEL)
- CEL reduction (CEL->exprCEL, remove SNP probes)
- Low-level analysis (CEL->Expression Index)
- Practice session #1
- Expression Console
- High-level analysis (Expression Index -> Gene List)
- Practice session #2
- If time permits:
  - Visualization
  - Glue Grant Exon Array Tools (beta testing)
  - Practice session #3
11/12/2008
Glue Grant H1 Analysis Tutorial
Introduction to array design
- Significant changes from the Affymetrix Human Exon 1.0 ST array
  - More focused on known transcripts
  - Higher coverage
  - More comprehensive probe selection method
  - More content:
    - Exon probes: 3.2M (0.32M targets)
    - Junction probes: 1M (0.25M targets)
    - Coding SNPs: 1M (85K targets)
    - Untranslated regions (UTRs): 0.5M (50K targets)
    - Tiling of un-annotated units: 0.5M (50K targets)
    - …
- http://gluegrant1.stanford.edu/wiki/
Some definitions (TC, EC, PSR, JUC, …)
Potential analysis questions
- Gene expression
- Alternative splicing
- Transcript isoform deconvolution
- Allele-specific expression
- Antisense expression
- …
Introduction to library files
- Library files support multiple tools:
  - Quality control
  - Low-level analysis and expression analysis using APT and Expression Console
  - High-level analysis using dChip
  - Glue Grant Exon Analysis Tools
  - Visualization using cisGenomeBrowser or the UCSC Genome Browser
- Library and annotation database
  - http://gluegrant1.stanford.edu/phpMyAdmin/
  - username: ??? password: ???
  - hglue: all tables are read-only
  - GlueArraySandBox: for users to generate personalized library files and annotations
Major types of library files
- CLF: maps probe IDs to x/y positions in the CEL file
- PGF: groups probes (by probe ID) into probe sets
- PS: a list of probe set IDs
- MPS: a list of meta probe set IDs, each with its corresponding list of probe set IDs
- BGP: a list of probe IDs used for background correction
- QCC: a table of quality-control probe IDs and their corresponding types
- KIL: a list of probe IDs to be ignored in DABG (probes with GC count < 3)
- http://www.affymetrix.com/support/developer/powertools/changelog/FILE-FORMATS.html
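To make the PGF/PS/MPS relationships concrete, the following minimal Python sketch groups probe IDs into probe sets from PGF-style text and expands a meta probe set into its probes. It assumes the tab-indented PGF layout (probe set rows at indent 0, atom rows at one tab, probe rows at two tabs, `#%` header lines) and a simplified MPS with `probeset_id` and a space-separated `probeset_list` column; the real hGlue1_0.r3 files carry more columns, and all function names and IDs are hypothetical.

```python
def parse_pgf(lines):
    """Map probeset_id -> list of probe_ids from tab-indented PGF-style lines."""
    groups = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and "#%..." header lines
        depth = len(line) - len(line.lstrip("\t"))  # leading tabs = nesting level
        fields = line.lstrip("\t").split("\t")
        if depth == 0:
            current = fields[0]               # probe set row: probeset_id, type, ...
            groups.setdefault(current, [])    # keep probe sets even if they have no probes
        elif depth == 2 and current is not None:
            groups[current].append(fields[0])  # probe row: probe_id comes first
        # depth == 1 rows are atoms; this sketch does not keep them
    return groups

def parse_mps(lines):
    """Map meta probeset_id -> member probeset_ids (simplified MPS layout)."""
    rows = [l.rstrip("\n").split("\t") for l in lines if l.strip() and not l.startswith("#")]
    header, body = rows[0], rows[1:]
    id_col, list_col = header.index("probeset_id"), header.index("probeset_list")
    return {r[id_col]: r[list_col].split() for r in body}

def meta_probes(mps, pgf):
    """Expand each meta probe set into the probe IDs of its member probe sets."""
    return {meta: [p for ps in members for p in pgf.get(ps, [])]
            for meta, members in mps.items()}

# Toy input in the assumed shape (IDs are made up for illustration)
pgf_lines = [
    "#%header0=probeset_id\ttype",
    "#%header1=\tatom_id",
    "#%header2=\t\tprobe_id\ttype\tgc_count",
    "1\tmain",
    "\t1",
    "\t\t101\tpm:st\t10",
    "\t\t102\tpm:st\t12",
    "2\tmain",
    "\t2",
    "\t\t201\tpm:st\t9",
]
mps_lines = ["probeset_id\tprobeset_list", "TC1\t1 2"]

pgf = parse_pgf(pgf_lines)
meta = meta_probes(parse_mps(mps_lines), pgf)
print(meta)  # {'TC1': ['101', '102', '201']}
```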
Image quantification (DAT->CEL)
- Function: convert the pixel image into a probe intensity file
  - Gridding
  - Quantification
- Software:
  - GeneChip Operating Software (GCOS)
  - Affymetrix GeneChip Command Console (AGCC)
  - http://www.affymetrix.com/products_services/software/specific/command_console_software.affx
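Quantification can be illustrated with the cell statistic commonly described for Affymetrix scanner software: each cell's intensity is the 75th percentile of its pixels after the outer ring of pixels is dropped. The sketch below assumes gridding is already done (square cells of a known pixel size, aligned to the image origin); the grid handling and function names are illustrative, not GCOS or AGCC internals.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty sequence."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def quantify_cells(image, cell_size, q=75.0, border=1):
    """Return {(x, y): intensity} for an image (list of pixel rows) on a known square grid.

    Assumes cell_size > 2 * border so every cell keeps at least one pixel.
    """
    n_rows = len(image) // cell_size
    n_cols = len(image[0]) // cell_size
    cells = {}
    for y in range(n_rows):
        for x in range(n_cols):
            # pixels of cell (x, y) with a `border`-pixel ring dropped on every side
            pixels = [image[r][c]
                      for r in range(y * cell_size + border, (y + 1) * cell_size - border)
                      for c in range(x * cell_size + border, (x + 1) * cell_size - border)]
            cells[(x, y)] = percentile(pixels, q)
    return cells

# Toy 8x8 image with 4x4-pixel cells: pixel value = row * 8 + column
image = [[r * 8 + c for c in range(8)] for r in range(8)]
cells = quantify_cells(image, cell_size=4)
print(cells[(0, 0)], cells[(1, 0)])  # 17.25 21.25
```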
CEL file reduction (CEL->exprCEL)
- Function: remove SNP probes to address IRB concerns
- Script:
  Mac/Unix: modCEL.unix.pl --xymap=mapping_file \
      --CEL=path/*.CEL --OUTDIR=path --Prefix=expr
  PC: modCEL.pc.pl --xymap=mapping_file --CEL=filename.CEL --OUTDIR=path --Prefix=expr
- Parameters:
  - xymap: the mapping file, hGlue1_0.r3.CEL2exprCEL.xymay
  - Prefix: a string added to the CEL file name
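Conceptually, and assuming the xymap pairs each retained cell's original coordinates with its position in the reduced file, the reduction keeps the non-SNP cells and drops the rest. The sketch below uses a hypothetical in-memory form of the xymap (a dict from original (x, y) to new (x, y)); the slides do not show the real layout of hGlue1_0.r3.CEL2exprCEL.xymay or how modCEL.*.pl reads and writes CEL files, so this only illustrates the idea.

```python
def reduce_cel(intensities, xymap):
    """Keep only cells listed in xymap, re-keyed by their new coordinates.

    intensities: {(x, y): value} read from the original CEL (hypothetical in-memory form)
    xymap:       {(old_x, old_y): (new_x, new_y)}; SNP cells are simply not listed
    """
    reduced = {}
    for old_xy, new_xy in xymap.items():
        if old_xy in intensities:
            reduced[new_xy] = intensities[old_xy]
    return reduced

# Toy example: cell (1, 0) stands in for a SNP probe and is not in the map
cel = {(0, 0): 120.0, (1, 0): 95.5, (2, 0): 300.2}
xymap = {(0, 0): (0, 0), (2, 0): (1, 0)}
print(reduce_cel(cel, xymap))  # {(0, 0): 120.0, (1, 0): 300.2}
```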
Low-level analysis (CEL->Expression Index)
- APT/Expression Console and QC
  - Quality control
  - Extracting specific features
  - Background correction/normalization/summarization
- Practice session (~30 minutes to 1 hour)
APT/Expression Console
- APT: Affymetrix Power Tools
  - Supports both 3' expression arrays and exon arrays
  - Supports both expression and genotype analysis
  - apt-probeset-summarize: S(N(B)), i.e. summarization of normalized, background-corrected signal
  - apt-cel-extract: extract features
  - apt-dump-pgf: extract probe/probe set information
  - apt-summary-vis: generate visualization track files
  - apt-midas: alternative splicing
  - Memory management
  - http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx#1_1
Overview of quality control
- Function: ensure the quality and reproducibility of array results
- What to assess?
  - Probe level
    - Per array: signal distribution of different probe types
    - Across arrays: overall signal distribution, PM mean, background mean
  - Probe set level (PSR, TC)
    - Per array: Pos_vs_Neg_AUC, presence call
    - Across arrays: correlation plot (median correlation to other arrays in the same batch)
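Two of the probe-set-level metrics can be written down directly. An AUC for positive versus negative controls equals the Mann-Whitney probability that a randomly chosen positive-control signal exceeds a randomly chosen negative-control signal (ties count one half), and the across-array metric is each array's median correlation with the other arrays in the batch. The Python sketch below computes both; which probe sets serve as controls, any log transform, and the use of Pearson correlation are assumptions, not details taken from GlueQC.R or APT.

```python
import math
import statistics

def auc_pos_vs_neg(pos, neg):
    """ROC AUC as the Mann-Whitney pair fraction (ties count 0.5); O(n*m), fine for a sketch."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def median_correlation(arrays):
    """For each array (list of probe set signals), its median correlation to every other array."""
    return [statistics.median([pearson(a, b) for j, b in enumerate(arrays) if j != i])
            for i, a in enumerate(arrays)]

print(round(auc_pos_vs_neg([3, 4, 5], [1, 2, 3]), 4))         # 0.9444
print(median_correlation([[1, 2, 3], [2, 4, 6], [3, 2, 1]]))  # [0.0, 0.0, -1.0]
```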
Quality Control Tool – GlueQC.R
- Requires R and APT
- Syntax: Rscript GlueQC.R celpath outpath libpath
- Libraries:
  - hGlue1_0.r3.clf
  - hGlue1_0.r3.pgf
  - hGlue1_0.r3.PSR.ps
  - hGlue1_0.r3.TC.mps
  - hGlue1_0.r3.KIL
  - hGlue1_0.r3.qc.clfpgf
Density distribution plot
- Overall intensity range
- Separation between different probe types
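A density plot of this kind is a smoothed histogram of log2 intensities, one curve per probe type. The stdlib-only sketch below uses a plain normalized histogram instead of a kernel density; it assumes intensities are already grouped by probe type, and the bin width and function names are illustrative choices, not what GlueQC.R does.

```python
import math
from collections import Counter

def log2_histogram(values, bin_width=0.5):
    """Fraction of positive intensities in each log2 bin, keyed by bin start.

    Assumes at least one positive value; non-positive intensities are ignored.
    """
    logs = [math.log2(v) for v in values if v > 0]
    counts = Counter(math.floor(x / bin_width) * bin_width for x in logs)
    return {start: n / len(logs) for start, n in sorted(counts.items())}

def density_by_type(signals_by_type, bin_width=0.5):
    """One normalized histogram per probe type, e.g. {'exon': [...], 'background': [...]}."""
    return {ptype: log2_histogram(vals, bin_width) for ptype, vals in signals_by_type.items()}

print(log2_histogram([1, 2, 4, 4], bin_width=1.0))  # {0.0: 0.25, 1.0: 0.25, 2.0: 0.5}
```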
All-array density plot
- Check the similarity of intensity distributions across arrays
QC summary plot
- Check for outliers in each plot
- Flags should only be considered a cautionary sign, especially when the sample size is small
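The slides do not say how GlueQC sets its outlier flags. Purely as an illustration of flagging, the sketch below marks an array when a metric lies more than k median absolute deviations (MAD) from the batch median; both the rule and k = 3 are assumptions. With only a few arrays the median and MAD are themselves unstable, which is one reason to treat any flag as a caution sign.

```python
import statistics

def mad_outliers(values, k=3.0):
    """0/1 flags: 1 when |value - median| > k * MAD. Hypothetical rule, not GlueQC's."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0] * len(values)  # no spread at all: this sketch declines to flag anything
    return [1 if abs(v - med) > k * mad else 0 for v in values]

print(mad_outliers([0.94, 0.94, 0.95, 0.93, 0.60]))  # [0, 0, 0, 0, 1]
```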
QC summary table
CEL file | pm_mean | bgrd_mean | pos_vs_neg_auc | all_probeset_percent_called | PSR_cor_avg | TC_cor_avg | pm_mean_outlier | bgrd_mean_outlier | pos_vs_neg_auc_outlier | all_probeset_percent_called_outlier | PSR_cor_avg_outlier | TC_cor_avg_outlier
S_273467_hGlue1_0_R1.CEL | 125.48 | 54.92 | 0.84 | 0.46 | 0.94 | 0.98 | 0 | 0 | 0 | 0 | 0 | 0
S_273467_hGlue1_0_R2.CEL | 117.19 | 54.16 | 0.84 | 0.43 | 0.94 | 0.98 | 0 | 0 | 0 | 0 | 0 | 0
S_273468_hGlue1_0_R1.CEL | 112.81 | 50.97 | 0.81 | 0.47 | 0.94 | 0.99 | 0 | 0 | 0 | 0 | 0 | 0
Extract features
- Function: extract a subset of probe signals from CEL files
- Tool: apt-cel-extract
- Syntax: apt-cel-extract -o out.txt [-c chip.clf -p chip.pgf] [-d chip.cdf] [--probeset-ids=norm-exon.txt] *.cel
- Parameters:
  - If using --probeset-ids, the CLF and PGF files must be supplied
Examples
lowlevelanalysis/extractfeatures.bat
- Extract all raw probe signals
  >apt-cel-extract -o raw_probe_signal.txt --cel-files CELlist.txt
- Extract quantile-normalized and GC-background-corrected probe signals
  >apt-cel-extract -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp -a quant-norm,pm-gcbg -o bgc_probe_signal.txt --cel-files CELlist.txt
- Extract probe signals of a specific content type: "main->junction"
  >apt-dump-pgf -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf --probeset-type main --probeset-type junction -o juc.pgf
  >apt-cel-extract -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf --probe-ids juc.pgf -o juc_raw_probe_signal.txt --cel-files CELlist.txt
Background correction, normalization and summarization
- Goal: transform probe signals into a biologically meaningful expression measure
  - Background correction: remove non-target signal
  - Normalization: remove non-biological variance
  - Summarization: summarize probe signals into a probe set signal
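Quantile normalization (quant-norm in APT) makes every array share one intensity distribution: sort each array, average the sorted values rank by rank across arrays, then return each average to the position its rank came from. The sketch below is the textbook version; it assigns tied values arbitrary consecutive ranks and ignores the sketch= subsampling option, so it is not a re-implementation of APT's quant-norm.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length lists (one per CEL). Returns normalized copies."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # reference distribution: mean of the k-th smallest value across arrays
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from smallest to largest value
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # give each position the reference value for its rank
        result.append(out)
    return result

print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```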
apt-probeset-summarize
- Syntax
  apt-probeset-summarize -a rma-sketch [-a dabg] -c chip.clf -p chip.pgf -b chip.bgp -o outpath -m chip.mps [--kill-list chip.kil] *.CEL
- Parameters
  - -a: analysis method
    - Chipstream format: a comma-separated list of transformations, with specific parameters passed as key=value pairs, e.g.
      rma-bg,quant-norm.sketch=1.usepm=true.bioc=true,pm-only,med-polish
    - Predefined methods: rma-sketch, dabg, rma, plier, etc.
  - --kill-list: needed when the analysis involves gc-bg
  - Windows: use '--cel-files filename' instead of *.CEL
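A chipstream string is read as comma-separated stages, each a method name followed by dot-separated key=value options. The parser below is a hypothetical helper (not part of APT) that shows this reading; it assumes values contain no commas, and re-joins a value such as 0.5 that the dot split would otherwise break.

```python
def parse_chipstream(spec):
    """Split an APT-style chipstream spec into [(stage, {key: value}), ...]."""
    stages = []
    for stage in spec.split(","):
        parts = stage.split(".")
        name, options, last_key = parts[0], {}, None
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                options[key] = value
                last_key = key
            elif last_key is not None:
                # no '=' means this piece belonged to the previous value, e.g. "0.5"
                options[last_key] += "." + part
        stages.append((name, options))
    return stages

print(parse_chipstream("rma-bg,quant-norm.sketch=1.usepm=true.bioc=true,pm-only,med-polish"))
# [('rma-bg', {}), ('quant-norm', {'sketch': '1', 'usepm': 'true', 'bioc': 'true'}), ('pm-only', {}), ('med-polish', {})]
```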
apt-probeset-summarize (2)
- Background correction
  - gc-bg
  - rma-bg
  - mas5-bg
  - pm-gcbg
  - pm-mm
- Normalization
  - quant-norm
  - med-norm
- Summarization
  - plier/iter-plier
  - Median polish (RMA)
  - DABG
  - Median
  - No Li-Wong yet
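Median polish (the RMA summarization, med-polish in APT) fits a two-way additive model to one probe set's log2 probe-by-array matrix, log2 PM = overall + probe effect + array effect + residual, by alternately removing row and column medians; overall + array effect is then the expression index for each array. The sketch below follows Tukey's algorithm with a fixed number of sweeps; the sweep count is an assumption, and it does no convergence check (R's medpolish, for example, stops early once the residuals stabilize).

```python
import statistics

def median_polish(matrix, n_iter=10):
    """matrix: rows = probes, cols = arrays (log2 scale). Returns one summary per array."""
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # sweep row (probe) medians into the probe effects
        for i in range(n_rows):
            m = statistics.median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = statistics.median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # sweep column (array) medians into the array effects
        for j in range(n_cols):
            m = statistics.median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = statistics.median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return [overall + c for c in col_eff]

# Perfectly additive toy data: probe effects 0, 1, 2 and array levels 5, 7
print(median_polish([[5, 7], [6, 8], [7, 9]]))  # [6.0, 8.0]
```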
Examples
LowLevelAnalysis/bns.bat

PSR rma-sketch and dabg analysis
apt-probeset-summarize -a rma-sketch -a dabg -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -s hGlue1_0.r3.PSR.ps -o BNS/PSR --cel-files CELlist.txt --kill-list hGlue1_0.r3.kil

TC (transcript cluster) meta probe set: rma-sketch and a chipstream analysis
apt-probeset-summarize -a rma-sketch -a quant-norm.sketch=50000,pm-gcbg,iter-plier -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.TC.mps -o BNS/TC --cel-files CELlist.txt --kill-list hGlue1_0.r3.kil

Compute U133 Plus 2.0 probe sets
apt-probeset-summarize -a rma-sketch -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.U133plus2.mps -o BNS/u133plus2 --cel-files CELlist.txt

Compute Human Exon 1.0 ST transcript clusters
apt-probeset-summarize -a rma-sketch -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.HuEX_TC.mps -o BNS/huex --cel-files CELlist.txt
apt-probeset-summarize output
- [method].summary.txt: expression index matrix
- [method].report.txt: quality control measures
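The summary file can be loaded with a few lines of code. The reader below assumes the layout APT summary files are commonly seen with: '#'-prefixed header lines, then a tab-delimited matrix whose first column is probeset_id and whose remaining columns are one per CEL file. Anything beyond that (for example, whether values are log2) is not checked, and the sample contents are made up.

```python
import csv
import io

def read_summary(text):
    """Parse summary.txt-style text into (cel_names, {probeset_id: [values]})."""
    body = [line for line in text.splitlines() if line.strip() and not line.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(body)), delimiter="\t")
    header = next(reader)
    if header[0] != "probeset_id":
        raise ValueError("expected 'probeset_id' as the first column")
    data = {row[0]: [float(v) for v in row[1:]] for row in reader}
    return header[1:], data

# Toy file contents; the '#%' line stands in for APT's header block
sample = (
    "#%example-header=value\n"
    "probeset_id\tA.CEL\tB.CEL\n"
    "2\t7.51\t7.80\n"
    "3\t5.02\t4.96\n"
)
names, data = read_summary(sample)
print(names, data["2"])  # ['A.CEL', 'B.CEL'] [7.51, 7.8]
```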
Expression Console
- Improvements over the last tutorial
  - More summary options: EC, TC, JUC, EX, TX
  - Probes defined as core or extended (multi-probes)
  - Conversion to U133plus2 and HuEx formats
- Walk through an example
  - Summary
  - QC metrics
  - Link with annotation
  - Refer to doc/EC_Tutorial.doc (reused from the last tutorial)
Practice session #1
- CEL reduction (SNPremover)
- GlueQC
  - GlueQC on data/07-20-08/CELlist_test.txt (15 arrays)
- Low-level analysis
  - Feature extraction
    - Extract the raw probe intensities of 15 arrays
    - Extract quantile-normalized and GC-background-corrected probe intensities of "main->junction" from 15 arrays
  - B.N.S (background correction, normalization, summarization)
    - rma-sketch summary of PSR for 15 arrays
    - rma-sketch summary of TC for 15 arrays (use the mps file from lib/GenBase)
High-level analysis (Expression Index -> Gene List)
- Array annotation and annotation files
- Import APT results into dChip for high-level analysis
- A practice session
Array annotation (r3)
- Updates over the r2 version
  - Fixed a bug caused by a MySQL end-of-line problem
  - Added annotation for transcript, junction and other content
  - Added annotation files for dChip and GenBase
  - Added BED and REFFLAT files for the genome browser
  - Refer to lib/readme.doc for details
- Customization: http://gluegrant1.stanford.edu/phpMyAdmin/
hGlue1_0.r3.TC_annot.csv
probeset_id: TC0100032
seqname: chr1
start: 1205715
end: 1217272
strand: +
transcript_id: TR01031609 /// TR01031608 /// TR01031607 /// TR01031606 /// TR01031605 /// TR01031604 /// TR01031603 /// TR01031602 /// TR01031601 /// TR01008856 /// TR01008322 /// TR01008321 /// TR01008320 /// TR01008319 /// TR01002236 /// TR01001810
public_transcript_id: NM_002978 /// ENST00000379130 /// ENST00000379116 /// ENST00000379110 /// ENST00000379101 /// ENST00000379099 /// ENST00000338555 /// ENST00000325425 /// chr1.72.1019 /// chr1.72.1018 /// chr1.72.1017 /// chr1.72.1016 /// chr1.72.1015 /// chr1.72.1014 /// chr1.72.1013 /// chr1.72.1012 /// chr1.72.1011
gene_id: 6339
symbol: SCNN1D
description: sodium channel, nonvoltage-gated 1, delta /// Amiloride-sensitive sodium channel subunit delta (Epithelial Na(+) channel subunit delta) (Delta ENaC) (Nonvoltage-gated sodium channel 1 subunit delta) (SCNED) (Delta NaCH). [Source:Uniprot/SWISSPROT;Acc:P51172]
mapLocation: 1p36.3-p36.2
GO_biological_process: GO:0006811 // ion transport // IEA /// GO:0006814 // sodium ion transport // IEA /// GO:0050896 // response to stimulus // IEA /// GO:0050909 // sensory perception of taste // IEA
GO_molecular_function: GO:0005216 // ion channel activity // IEA /// GO:0005272 // sodium channel activity // IEA /// GO:0015280 // amiloride-sensitive sodium channel activity // IEA /// GO:0031402 // sodium ion binding // IEA
GO_cellular_component: GO:0016020 // membrane // IEA /// GO:0016021 // integral to membrane // IEA /// GO:0005624 // membrane fraction // NR /// GO:0005887 // integral to plasma membrane // NR /// GO:0016020 // membrane // IEA /// GO:0016021 // integral to membrane // IEA
unigene_id: Hs.512681
pubmed_id: 12477932 /// 14645214 /// 14702039 /// 14726523 /// 15084585 /// 15308635 /// 15489334 /// 16423824 /// 16710414 /// 16930535 /// 17472699 /// 18073141 /// 7499195 /// 8661065
transcript_count: 16
exon_cluster_count: 16
exon_count: 34
psr_count: 29
core_probeset_count: 29
extended_probeset_count: 11
core_probe_cout: 281
extended_probe_count: 11
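In this file, multi-valued fields separate entries with ' /// ', and each GO entry further separates ID, term and evidence code with ' // '. Two small helpers, assuming exactly the separators shown in the record above (the function names are hypothetical):

```python
def split_multi(value):
    """Split a ' /// '-delimited annotation field into a list (empty field -> [])."""
    return [v.strip() for v in value.split("///") if v.strip()]

def parse_go(value):
    """Turn 'GO:id // term // evidence /// ...' into [(go_id, term, evidence), ...]."""
    terms = []
    for entry in split_multi(value):  # split on '///' first, since it contains '//'
        go_id, term, evidence = [p.strip() for p in entry.split("//")]
        terms.append((go_id, term, evidence))
    return terms

print(split_multi("NM_002978 /// ENST00000379130"))  # ['NM_002978', 'ENST00000379130']
print(parse_go("GO:0006811 // ion transport // IEA /// GO:0050909 // sensory perception of taste // IEA"))
# [('GO:0006811', 'ion transport', 'IEA'), ('GO:0050909', 'sensory perception of taste', 'IEA')]
```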
hGlue1_0.r3.PSR_annot.csv
Row 1:
probeset_id: 2
seqname: chr6
start: 143813706
end: 143813810
strand: +
probe_count: 10
psr_id: PSR060006659
exon_cluster_id: EC060004997
transcript_cluster_id: TC0600776
exon_id: EX060014449|EX060018900
transcript_id: TR06002279|TR06009764
probeset_type: core
Row 2:
probeset_id: 3
seqname: chr15
start: 26701374
end: 26701512
strand: +
probe_count: 1
psr_id: PSR150000536
exon_cluster_id: EC150000483
transcript_cluster_id: TC1500128
exon_id: EX150003690|EX150010649
transcript_id: TR15001527|TR15002341|TR15002342|TR15002343|TR15002344|TR15002345|TR15002346|TR15002347|TR15002348|TR15002349|TR15002350|TR15002351|TR15002352|TR15002353|TR15002354|TR15002355|TR15002356|TR15002357|TR15002358|TR15002359|TR15002360|TR15002361|TR15002362|TR15002363|TR15002364|TR15002365
probeset_type: extended
hGlue1_0.r3.Junction_annot.csv
Row 1:
JunctionID: JUC0100000002
chrom: chr1
chromStart: 2090
chromEnd: 2476
strand: +
PSRID5: PSR010000001
PSRID3: PSR010000002
ECID5: EC010000001
ECID3: EC010000002
Exon2Exon: EX010034624_to_EX010034625 | EX010034624_to_EX010034626
TxIDs: TR01010028 | TR01010027
PublicTxIDs: chr1.1.1 | chr1.1.0
TcID: TC0100001
Type: constitutive
Row 2:
JunctionID: JUC0100000009
chrom: chr1
chromStart: 311127
chromEnd: 311901
strand: +
PSRID5: PSR010000013
PSRID3: PSR010000014
ECID5: EC010000011
ECID3: EC010000012
Exon2Exon: EX010034670_to_EX010034671 | EX010034670_to_EX010034674
TxIDs: TR01024688 | TR01024686
PublicTxIDs: chr1.26.726 | chr1.26.724
TcID: TC0100006
Type: alternative
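In the junction file, list-valued columns separate entries with ' | ', and each Exon2Exon entry joins two exon IDs with '_to_'. A hypothetical helper that turns such a field into exon pairs, based only on the example rows above:

```python
def exon_pairs(field):
    """'EXa_to_EXb | EXa_to_EXc' -> [('EXa', 'EXb'), ('EXa', 'EXc')]."""
    pairs = []
    for entry in field.split("|"):
        entry = entry.strip()
        if entry:
            left, right = entry.split("_to_")  # assumes exactly one '_to_' per entry
            pairs.append((left, right))
    return pairs

print(exon_pairs("EX010034624_to_EX010034625 | EX010034624_to_EX010034626"))
# [('EX010034624', 'EX010034625'), ('EX010034624', 'EX010034626')]
```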
dChip
- Improvements over the last tutorial
  - Added Gene Ontology, KEGG pathway and chromosome band analysis
- Walk through an example
  - Remove the extra header and tail
  - Import external data into dChip
  - Differential expression analysis
  - Clustering/enrichment
  - Chromosome/genome enrichment
Practice session #2
- dChip
Visualization - cisGenomeBrowser
- Lightweight version of the UCSC Genome Browser (Hui Jiang)
  - CEL image
  - Genomic region
- http://biogibbs.stanford.edu/~jiangh/browser/index.html
cisGenomeBrowser-CEL Image
cisGenomeBrowser-Genomic Region
cisGenomeBrowser
- Annotation tracks
  - hGlue1_0.r3.TC.refflat
  - hGlue1_0.r3.TX.refflat
  - Hg18.genefile (RefSeq track only)
- Signal tracks (visualization/genCisGenomeBrowserTrack.bat)
  - Probe raw-signal bar file
    >genbar.pl --coord=hGlue1_0.r3.Probe.BED --signal=raw_probe_signal.txt --outdir=Probe_barfile
  - PSR bar file
    >genbar.pl --coord=hGlue1_0.r3.PSR.BED --signal=PSR/rma-sketch.summary.txt --outdir=PSR_barfile
  - Gene bar file
    >genbar.pl --coord=hGlue1_0.r3.TC.BED --signal=TC/rma-sketch.summary.txt --outdir=TC_barfile
- Demo
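genbar.pl itself is not shown, and the bar files it produces are a binary format, so as a text-only illustration of the same join (coordinates from a BED file, values from one column of a summary file) the sketch below emits bedGraph-style lines (chrom, start, end, value). It assumes the BED name column (column 4) holds the same ID as the summary file's probeset_id; that correspondence, the helper name and the toy data are assumptions.

```python
def to_bedgraph(bed_lines, signal, column=0):
    """bed_lines: BED text lines with at least 4 columns; signal: {probeset_id: [values per CEL]}.

    Returns bedGraph-style lines for one CEL column, skipping IDs without a signal.
    """
    out = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and track/browser header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3].strip()
        if name in signal:
            out.append(f"{chrom}\t{start}\t{end}\t{signal[name][column]}")
    return out

bed = ["track name=toy", "chr1\t100\t200\tPSR_A", "chr1\t300\t400\tPSR_B"]
signal = {"PSR_A": [7.5, 7.8]}
print(to_bedgraph(bed, signal, column=1))  # ['chr1\t100\t200\t7.8']
```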
Other browsers
- UCSC Genome Browser (visualization/genUCSCBrowsreTrack.bat)
  - apt-summary-vis -g hGlue1_0.r3.PSR.BED PSR/rma-sketch.summary.txt --wiggle-colindex 1 -o CEL1.PSR.wig
  - The BED file must be tweaked so that PSRs do not overlap before the track works in the UCSC browser
- Affymetrix Genome Browser
  - apt-summary-vis -g hGlue1_0.r3.PSR.BED PSR/rma-sketch.summary.txt -o PSR.egr
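The slides say the PSR BED file must be tweaked so PSRs do not overlap, without giving the method. One simple possibility, sketched here under that caveat, is to sort intervals per chromosome and clip each start to the previous interval's end, dropping intervals that become empty. This discards the shared bases from the later PSR and may not be what the original scripts did.

```python
from itertools import groupby

def make_non_overlapping(intervals):
    """intervals: [(chrom, start, end, name)] in BED half-open coordinates.

    Returns intervals sorted by chrom/start with overlapping bases clipped away.
    """
    result = []
    ordered = sorted(intervals, key=lambda iv: (iv[0], iv[1], iv[2]))
    for chrom, group in groupby(ordered, key=lambda iv: iv[0]):
        prev_end = None
        for _, start, end, name in group:
            if prev_end is not None and start < prev_end:
                start = prev_end  # clip the part already covered by the previous interval
            if start < end:       # keep only intervals that are still non-empty
                result.append((chrom, start, end, name))
                prev_end = end
    return result

print(make_non_overlapping([("chr1", 100, 200, "a"), ("chr1", 150, 250, "b"),
                            ("chr1", 160, 180, "c"), ("chr2", 10, 20, "d")]))
# [('chr1', 100, 200, 'a'), ('chr1', 200, 250, 'b'), ('chr2', 10, 20, 'd')]
```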
Glue Grant Exon Array tool
- Highlights
  - Specially tailored for exon arrays
  - Command line with R interface
  - Probe-sequence-specific background model (MAT)
  - Summarization: probe selection (GenBase), Li-Wong model (dChip) and median polish (RMA)
  - Integrated alternative splicing analysis (MADS)
- Run analysis (GlueGrantExonArrayTool/runEAT.bat)
  ../../bin/GlueGrantExonArrayTool/eat.win32.exe EXPR_param.conf -l ../../data/07-2008/CELlist.txt
  ../../bin/GlueGrantExonArrayTool/eat.win32.exe MADS_param.conf -l ../../data/07-2008/CELlist.txt
Param.conf
- Specifies analysis parameters:
  - Analysis type
  - Library files
  - Background correction method
  - Normalization method
  - Summarization method
  - MADS parameters
- Example: /GlueGrantExonArrayTool/Expr_param.conf
Practice session #3
- cisGenomeBrowser
  - Generate bar files for PSR and TC of the 15 arrays in CELlist_test.txt from practice session #1
  - Search for genes of interest
- Glue Grant Analysis Tool
  - Repeat the steps in runEAT.bat
Thank you