Analysis pipeline
Transcript Analysis pipeline
Weihong Xu
11/12/2008
Boston, MA
Outline
Introduction to array design and library files
Image quantification (DAT->CEL)
CEL reduction (CEL->exprCEL, remove SNP)
Low level analysis (CEL->Expression Index)
Practice session #1
Expression Console
High level analysis (Expression Index -> Gene List)
Practice session #2
If time permits,
Visualization
Glue Grant Exon Array Tools (beta-testing)
Practice session #3
11/12/2008
Glue Grant H1 Analysis Tutorial
2
Introduction to array design
Significant changes over the Affymetrix exon array ST1.0
More focused on known transcripts
Higher coverage
More comprehensive probe selection method
More content:
exon probes 3.2M (0.32M targets)
junction probes 1M (0.25M targets)
coding SNP 1M (85K targets)
Untranslated Regions (UTR) 0.5M (50K targets)
tiling un-annotated units 0.5M (50K targets)
…
http://gluegrant1.stanford.edu/wiki/
Some definitions: TC (transcript cluster), EC (exon cluster), PSR (probe selection region), Juc (junction), …
Potential Analysis Questions
Gene expression
Alternative splicing
Transcript isoform deconvolution
Allele-specific expression
Antisense expression
…
Introduction to Library files
Support multiple tools:
Quality control
Low level analysis and expression analysis using APT and Expression Console
High level analysis using dChip
Glue Grant Exon Analysis Tools
Visualization using cisGenomeBrowser or UCSC Genome Browser
Library and annotation database
http://gluegrant1.stanford.edu/phpMyAdmin/
username: ??? password: ???
hglue – all tables are read-only
GlueArraySandBox – for users to generate personalized library files and annotations
Major types of library files
CLF – mapping of probe IDs to x/y positions in the CEL file
PGF – groups probes (by probe ID) into probe sets
PS – a list of probe set IDs
MPS – a list of meta probe set IDs, each with a corresponding list of probe set IDs
BGP – a list of probe IDs to be used in background correction
QCC – a table of probe IDs for quality control and their corresponding types
KIL – a list of probe IDs to be ignored in DABG (probes with GC < 3)
http://www.affymetrix.com/support/developer/powertools/changelog/FILE-FORMATS.html
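To make the MPS description concrete, here is a minimal Python sketch that reads a meta probe set table into a dictionary. The column names (`probeset_id`, `probeset_list`), the space-separated ID list, and the `#`-prefixed comment lines are assumptions about the file layout; check them against the FILE-FORMATS page before relying on them. `read_mps` is a hypothetical helper, not part of APT, and the IDs in the toy input are made up.

```python
import csv
import io


def read_mps(handle):
    """Map each meta probe set ID to its list of probe set IDs.

    Assumes a tab-separated table whose header names the columns
    'probeset_id' and 'probeset_list' (space-separated IDs), with
    optional comment lines starting with '#'. These are assumptions.
    """
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {row["probeset_id"]: row["probeset_list"].split() for row in reader}


# Toy input with made-up IDs, only to show the shape of the data.
example = io.StringIO(
    "#%chip_type=example\n"
    "probeset_id\tprobeset_list\n"
    "TC_A\tPSR_1 PSR_2 PSR_3\n"
    "TC_B\tPSR_4\n"
)
mps = read_mps(example)
print(mps["TC_A"])
```

The same dictionary shape (meta probe set to member probe sets) is what a custom MPS file would need to express when grouping probe sets differently.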
Image quantification (DAT->CEL)
Function: convert pixel image to probe intensity file
Gridding
Quantification
Software:
GeneChip Operating Software (GCOS)
Affymetrix GeneChip Command Console (AGCC)
http://www.affymetrix.com/products_services/software/specific/command_console_software.affx
CEL file reduction (CEL->exprCEL)
Function: remove SNP probe data to address IRB concerns
Script:
Mac/Unix: modCEL.unix.pl --xymap=mapping_file \
--CEL=path/*.CEL --OUTDIR=path --Prefix=expr
PC: modCEL.pc.pl --xymap=mapping_file \
--CEL=filename.CEL --OUTDIR=path --Prefix=expr
Parameters:
xymap – mapping file, hGlue1_0.r3.CEL2exprCEL.xymap
Prefix – a string that will be added to the CEL file name
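Because the PC script is shown taking one CEL file at a time, a small wrapper can generate one command per file. The Python sketch below only builds the command lines from the flags shown above and does not run anything. Launching the script through `perl`, the folder and file names, and the helper name `build_modcel_commands` are assumptions for illustration.

```python
def build_modcel_commands(cel_files, xymap, outdir, prefix="expr"):
    """Return one modCEL.pc.pl command (as an argument list) per CEL file.

    Only the flags shown on the slide are used. Running the script through
    'perl' is an assumption about the local setup.
    """
    commands = []
    for cel in cel_files:
        commands.append([
            "perl", "modCEL.pc.pl",
            f"--xymap={xymap}",
            f"--CEL={cel}",
            f"--OUTDIR={outdir}",
            f"--Prefix={prefix}",
        ])
    return commands


# Hypothetical file names; replace with your own CEL files and mapping file.
cmds = build_modcel_commands(
    ["data/array1.CEL", "data/array2.CEL"],
    xymap="mapping_file",
    outdir="exprCEL",
)
for cmd in cmds:
    print(" ".join(cmd))
# To execute, pass each list to subprocess.run(cmd, check=True).
```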
Low Level Analysis (CEL->Expression Index)
APT/Expression Console and QC
Quality control
Extracting specific features
Background correction/Normalization/Summarization
Practice session (~30 minutes to 1 hr)
APT/Expression Console
APT: Affymetrix Power Tools
Supports both 3’ expression arrays and exon arrays
Supports both expression and genotype analysis
apt-probeset-summarize -- S(N(B)): summarization of normalized, background-corrected signal
apt-cel-extract -- extract features
apt-dump-pgf -- extract probe/probe set information
apt-summary-vis -- generate visualization track files
apt-midas -- alternative splicing
Memory management
http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx#1_1
Overview of Quality Control
Function: ensure the quality and reproducibility of array results
What to assess?
Probe level
Per array: signal distribution of different probe types
Across arrays: overall signal distribution, PM-mean, BG-mean
Probe set level (PSR, TC)
Per array: Pos_vs_Neg_AUC, presence call
Across arrays: correlation plot (median correlation to other arrays in the same batch)
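The across-array metric "median correlation to other arrays in the same batch" can be illustrated with a short Python sketch. This is a toy illustration of the idea using Pearson correlation on made-up values, not the GlueQC.R implementation; the function names and the choice of Pearson correlation are assumptions.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def median_correlation(arrays):
    """For each array, the median of its correlations with all other arrays."""
    result = {}
    for name, values in arrays.items():
        others = [pearson(values, v) for n, v in arrays.items() if n != name]
        result[name] = median(others)
    return result


# Toy log2 expression values for four probe sets on three arrays (made up).
arrays = {
    "A.CEL": [5.0, 7.0, 9.0, 11.0],
    "B.CEL": [5.1, 7.2, 8.9, 11.3],
    "C.CEL": [9.0, 5.0, 10.0, 6.0],
}
scores = median_correlation(arrays)
for name, score in scores.items():
    print(f"{name}\t{score:.3f}")
```

An array whose median correlation is clearly lower than the rest of its batch (here C.CEL) is a candidate outlier worth inspecting.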
Quality Control Tool – GlueQC.R
requires R and APT
Syntax: Rscript GlueQC.R celpath outpath libpath
Libraries:
hGlue1_0.r3.clf
hGlue1_0.r3.pgf
hGlue1_0.r3.PSR.ps
hGlue1_0.r3.TC.mps
hGlue1_0.r3.KIL
hGlue1_0.r3.qc.clfpgf
Density distribution plot
Overall intensity range
Separation between different probe types
All array density plot
Check the similarity of intensity distribution across arrays
QC summary plot
Check outliers in each plot
Flags should only be treated as a caution sign, especially when the sample size is small
QC summary table
CEL file                  pm_mean  bgrd_mean  all_probeset_percent_called  pos_vs_neg_auc  PSR_cor_avg  TC_cor_avg  pm_mean_outlier  bgrd_mean_outlier  all_probeset_percent_called_outlier  pos_vs_neg_auc_outlier  PSR_cor_avg_outlier  TC_cor_avg_outlier
S_273467_hGlue1_0_R1.CEL  125.48   54.92      0.84                         0.46            0.94         0.98        0                0                  0                                    0                       0                    0
S_273467_hGlue1_0_R2.CEL  117.19   54.16      0.84                         0.43            0.94         0.98        0                0                  0                                    0                       0                    0
S_273468_hGlue1_0_R1.CEL  112.81   50.97      0.81                         0.47            0.94         0.99        0                0                  0                                    0                       0                    0
Extract features
Function: extract a subset of probe signals from CEL files
Tool: apt-cel-extract
Syntax: apt-cel-extract -o out.txt [-c chip.clf -p chip.pgf] [-d chip.cdf] [--probeset-ids=norm-exon.txt] *.cel
Parameters:
When using --probeset-ids, the CLF and PGF files must be supplied
Examples
lowlevelanalysis/extractfeatures.bat
Extract all raw probe signals
>apt-cel-extract -o raw_probe_signal.txt --cel-files CELlist.txt
Extract quantile-normalized and GC-background-corrected probe signals
>apt-cel-extract -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp -a quant-norm,pm-gcbg -o bgc_probe_signal.txt --cel-files CELlist.txt
Extract probe signals of a specific content type: “main->junction”
>apt-dump-pgf -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf --probeset-type main --probeset-type junction -o juc.pgf
>apt-cel-extract -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf --probe-ids juc.pgf -o juc_raw_probe_signal.txt --cel-files CELlist.txt
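The examples above pass the arrays through a list file (CELlist.txt). As far as I know, APT expects this to be a text file whose first line is the header `cel_files`, followed by one CEL path per line; treat that layout as an assumption and confirm it against the APT documentation. The Python sketch below writes such a file for every CEL file in a folder; `write_cel_list` and the demo file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_cel_list(cel_dir, list_path):
    """Write an APT-style CEL list: a 'cel_files' header, then one path per line."""
    cel_paths = sorted(
        p for p in Path(cel_dir).iterdir() if p.suffix.lower() == ".cel"
    )
    lines = ["cel_files"] + [str(p) for p in cel_paths]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return cel_paths


# Demonstration on a temporary folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.CEL", "a.CEL", "notes.txt"):
        Path(tmp, name).touch()
    list_file = os.path.join(tmp, "CELlist.txt")
    found = write_cel_list(tmp, list_file)
    content = Path(list_file).read_text().splitlines()
    print(content[0], len(found))
```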
Background correction, normalization and summarization
Goal: transform probe signals into biologically meaningful expression measures
Background correction -- remove non-target signal
Normalization -- remove non-biological variance
Summarization -- summarize probe signals into probe set signals
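The three steps can be seen end to end on toy data. The Python sketch below is a deliberately simplified illustration of the order background correction, then normalization, then summarization: it subtracts the median background-probe intensity, quantile-normalizes across arrays, and takes the median of log2 probe signals per probe set. It is not the RMA, PLIER or GC-background algorithm that APT implements, ties are handled naively, and all names and numbers are made up.

```python
import math
from statistics import median


def background_correct(intensities, background, floor=1.0):
    """Subtract the median background-probe intensity, keeping values >= floor."""
    bg = median(background)
    return [max(v - bg, floor) for v in intensities]


def quantile_normalize(columns):
    """Force every array (column) to share the same sorted distribution."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def summarize(column, probe_sets):
    """Median of log2 probe signals for each probe set."""
    return {
        name: median(math.log2(column[i]) for i in idx)
        for name, idx in probe_sets.items()
    }


# Two toy arrays with four probes each; array B is roughly twice as bright.
raw = {"A": [120, 300, 80, 900], "B": [240, 610, 150, 1700]}
background = {"A": [50, 60, 55], "B": [100, 90, 110]}
probe_sets = {"PS1": [0, 1], "PS2": [2, 3]}

corrected = [background_correct(raw[a], background[a]) for a in ("A", "B")]
normalized = quantile_normalize(corrected)
summaries = [summarize(col, probe_sets) for col in normalized]
print(summaries)
```

In this toy case the two arrays differ only by overall brightness, so after normalization their probe set summaries coincide, which is the kind of non-biological variance normalization is meant to remove.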
apt-probeset-summarize
Syntax
apt-probeset-summarize -a rma-sketch [-a dabg] -c chip.clf -p chip.pgf -b chip.bgp -o outpath -m chip.mps [--kill-list chip.kil] *.CEL
Parameters
-a, analysis method
Chipstream format: a comma-separated list of transformations, with specific parameters passed as key=value pairs, e.g.
rma-bg,quant-norm.sketch=1.usepm=true.bioc=true,pm-only,med-polish
Predefined methods: rma-sketch, dabg, rma, plier, etc.
--kill-list: needed when the analysis involves gc-bg
Windows: use '--cel-files filename' instead of *.CEL
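To see how a chipstream string decomposes, the Python sketch below splits the example above into its stages and key=value parameters. It follows only the structure described on this slide (commas between stages, dots between a stage name and its parameters); `parse_chipstream` is a hypothetical helper, not an APT function, and it does not validate stage or parameter names.

```python
def parse_chipstream(spec):
    """Split an analysis string into (stage, params) pairs.

    Stages are comma separated; within a stage, the name and its key=value
    parameters are separated by dots, following the slide's description.
    Caveat: a parameter value containing a dot (e.g. 0.5) would need
    smarter splitting than this sketch does.
    """
    stages = []
    for stage in spec.split(","):
        name, *pairs = stage.split(".")
        params = dict(pair.split("=", 1) for pair in pairs)
        stages.append((name, params))
    return stages


spec = "rma-bg,quant-norm.sketch=1.usepm=true.bioc=true,pm-only,med-polish"
for name, params in parse_chipstream(spec):
    print(name, params)
```

Read left to right, the stages are applied in order: background correction, normalization, probe selection, then summarization.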
apt-probeset-summarize (2)
Background correction
gc-bg
rma-bg
mas5-bg
pm-gcbg
pm-mm
Normalization
quant-norm
med-norm
Summarization
plier / iter-plier
Median polish (RMA)
DABG
Median
Li-Wong model not yet available
Examples
LowLevelAnalysis/bns.bat
PSR rma-sketch and dabg analysis
apt-probeset-summarize -a rma-sketch -a dabg -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -s hGlue1_0.r3.PSR.ps -o BNS/PSR --cel-files CELlist.txt --kill-list hGlue1_0.r3.kil
TC (transcript cluster) meta probe set, rma-sketch or chipstream
apt-probeset-summarize -a rma-sketch -a quant-norm.sketch=50000,pm-gcbg,iter-plier -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.TC.mps -o BNS/TC --cel-files CELlist.txt --kill-list hGlue1_0.r3.kil
Compute U133Plus2 probe sets
apt-probeset-summarize -a rma-sketch -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.U133plus2.mps -o BNS/u133plus2 --cel-files CELlist.txt
Compute Human Exon ST1.0 transcript clusters
apt-probeset-summarize -a rma-sketch -c hGlue1_0.r3.clf -p hGlue1_0.r3.pgf -b hGlue1_0.r3.antigenomic.bgp --qc-probesets hGlue1_0.r3.qcc -m hGlue1_0.r3.HuEX_TC.mps -o BNS/huex --cel-files CELlist.txt
apt-probeset-summarize output
[method].summary.txt – expression index matrix
[method].report.txt – quality control measures
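The expression index matrix can be loaded with a few lines of Python. The sketch below assumes the summary file is tab separated, starts with header lines prefixed by `#`, and has the probe set ID in the first column followed by one column per array; these layout details and the helper name `read_summary` are assumptions to verify against a real output file. The IDs and values in the toy input are made up.

```python
import csv
import io


def read_summary(handle):
    """Return (array_names, {probeset_id: [values]}) from a summary table.

    Assumes '#'-prefixed header lines, then a tab-separated table whose
    first column is the probe set ID and whose other columns are arrays.
    """
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    table = {row[0]: [float(v) for v in row[1:]] for row in reader}
    return header[1:], table


# Toy content standing in for a [method].summary.txt file.
example = io.StringIO(
    "#%example-header=value\n"
    "probeset_id\tarray1.CEL\tarray2.CEL\n"
    "PS_1\t7.25\t7.5\n"
    "PS_2\t3.0\t2.75\n"
)
arrays, table = read_summary(example)
print(arrays, table["PS_1"])
```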
Expression Console
Improvements since the last tutorial
More summary options: EC, TC, JUC, EX, TX
Classify probes as core or extended (multi probes)
Convert to U133plus2, HuEx format
Walk through an example
Summary
QC metrics
Link with annotation
Refer to doc/EC_Tutorial.doc (recycled from the last tutorial)
Practice session #1
CEL reduction (SNPremover)
GlueQC
GlueQC on data/07-20-08/CELlist_test.txt (15 arrays)
Low level Analysis
Feature extraction
Extract raw probe intensity of 15 arrays
Extract quantile-normalized and GC-background-corrected probe intensity of “main->junction” from 15 arrays
B.N.S
rma-sketch summary of PSR for 15 arrays
rma-sketch summary of TC for 15 arrays (use mps file from lib/GenBase)
High level analysis (Expression Index -> Gene List)
Array annotation and annotation files
Import APT results to dChip for high level analysis
A practice session
Array annotation (r3)
Updates over the r2 version
Corrected a bug caused by a MySQL end-of-line problem
Added annotation for Transcript, Junction and other contents
Added annotation files for dChip and GenBase
Added BED files and REFFLAT files for Genome Browser
Refer to lib/readme.doc for details
Customization: http://gluegrant1.stanford.edu/phpMyAdmin/
hGlue1_0.r3.TC_annot.csv
probeset_id              TC0100032
seqname                  chr1
start                    1205715
end                      1217272
strand                   +
transcript_id            TR01031609 /// TR01031608 /// TR01031607 /// TR01031606 /// TR01031605 /// TR01031604 /// TR01031603 /// TR01031602 /// TR01031601 /// TR01008856 /// TR01008322 /// TR01008321 /// TR01008320 /// TR01008319 /// TR01002236 /// TR01001810
public_transcript_id     NM_002978 /// ENST00000379130 /// ENST00000379116 /// ENST00000379110 /// ENST00000379101 /// ENST00000379099 /// ENST00000338555 /// ENST00000325425 /// chr1.72.1019 /// chr1.72.1018 /// chr1.72.1017 /// chr1.72.1016 /// chr1.72.1015 /// chr1.72.1014 /// chr1.72.1013 /// chr1.72.1012 /// chr1.72.1011
gene_id                  6339
symbol                   SCNN1D
description              sodium channel, nonvoltage-gated 1, delta /// Amiloride-sensitive sodium channel subunit delta (Epithelial Na(+) channel subunit delta) (Delta ENaC) (Nonvoltage-gated sodium channel 1 subunit delta) (SCNED) (Delta NaCH). [Source:Uniprot/SWISSPROT;Acc:P51172]
mapLocation              1p36.3-p36.2
GO_biological_process    GO:0006811 // ion transport // IEA /// GO:0006814 // sodium ion transport // IEA /// GO:0050896 // response to stimulus // IEA /// GO:0050909 // sensory perception of taste // IEA
GO_molecular_function    GO:0005216 // ion channel activity // IEA /// GO:0005272 // sodium channel activity // IEA /// GO:0015280 // amiloride-sensitive sodium channel activity // IEA /// GO:0031402 // sodium ion binding // IEA
GO_cellular_component    GO:0016020 // membrane // IEA /// GO:0016021 // integral to membrane // IEA /// GO:0005624 // membrane fraction // NR /// GO:0005887 // integral to plasma membrane // NR /// GO:0016020 // membrane // IEA /// GO:0016021 // integral to membrane // IEA
unigene_id               Hs.512681
pubmed_id                12477932 /// 14645214 /// 14702039 /// 14726523 /// 15084585 /// 15308635 /// 15489334 /// 16423824 /// 16710414 /// 16930535 /// 17472699 /// 18073141 /// 7499195 /// 8661065
transcript_count         16
exon_cluster_count       16
exon_count               34
psr_count                29
core_probeset_count      29
extended_probeset_count  11
core_probe_count         281
extended_probe_count     11
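Multi-valued fields in this annotation use `///` between entries and `//` between the parts of a GO entry (ID, term, evidence code), as seen in the record above. A minimal Python sketch for unpacking them follows; the helper names are hypothetical, and the parser assumes every GO entry has exactly three parts.

```python
def split_multi(value):
    """Split a '///'-separated annotation field into its entries."""
    return [entry.strip() for entry in value.split("///") if entry.strip()]


def parse_go(value):
    """Parse GO entries of the form 'ID // term // evidence'."""
    terms = []
    for entry in split_multi(value):
        go_id, term, evidence = (part.strip() for part in entry.split(" // "))
        terms.append({"id": go_id, "term": term, "evidence": evidence})
    return terms


# Values copied from the SCNN1D record shown on this slide.
go_field = (
    "GO:0006811 // ion transport // IEA /// "
    "GO:0006814 // sodium ion transport // IEA"
)
transcripts = split_multi("TR01031609 /// TR01031608 /// TR01031607")
go_terms = parse_go(go_field)
print(transcripts)
print(go_terms[1]["term"])
```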
hGlue1_0.r3.PSR_annot.csv
probeset_id            2                        3
seqname                chr6                     chr15
start                  143813706                26701374
end                    143813810                26701512
strand                 +                        +
probe_count            10                       1
psr_id                 PSR060006659             PSR150000536
exon_cluster_id        EC060004997              EC150000483
transcript_cluster_id  TC0600776                TC1500128
exon_id                EX060014449|EX060018900  EX150003690|EX150010649
transcript_id          TR06002279|TR06009764    TR15001527|TR15002341|TR15002342|TR15002343|TR15002344|TR15002345|TR15002346|TR15002347|TR15002348|TR15002349|TR15002350|TR15002351|TR15002352|TR15002353|TR15002354|TR15002355|TR15002356|TR15002357|TR15002358|TR15002359|TR15002360|TR15002361|TR15002362|TR15002363|TR15002364|TR15002365
probeset_type          core                     extended
hGlue1_0.r3.Junction_annot.csv
JunctionID   JUC0100000002                                            JUC0100000009
chrom        chr1                                                     chr1
chromStart   2090                                                     311127
chromEnd     2476                                                     311901
strand       +                                                        +
PSRID5       PSR010000001                                             PSR010000013
PSRID3       PSR010000002                                             PSR010000014
ECID5        EC010000001                                              EC010000011
ECID3        EC010000002                                              EC010000012
Exon2Exon    EX010034624_to_EX010034625 | EX010034624_to_EX010034626  EX010034670_to_EX010034671 | EX010034670_to_EX010034674
TxIDs        TR01010028 | TR01010027                                  TR01024688 | TR01024686
PublicTxIDs  chr1.1.1 | chr1.1.0                                      chr1.26.726 | chr1.26.724
TcID         TC0100001                                                TC0100006
Type         constitutive                                             alternative
dChip
Improvements since the last tutorial
Added Gene Ontology, KEGG pathway and chromosome band analysis
Walk through an example
Remove extra header and extra tail
Import external data into dChip
Differential expression analysis
Clustering/enrichment
Chromosome/genome enrichment
Practice session #2
dChip
Visualization - cisGenomeBrowser
Light version of UCSC Genome Browser (Hui Jiang)
CEL image
Genome Region
http://biogibbs.stanford.edu/~jiangh/browser/index.html
cisGenomeBrowser-CEL Image
cisGenomeBrowser-Genomic Region
cisGenomeBrowser
Annotation track
hGlue1_0.r3.TC.refflat
hGlue1_0.r3.TX.refflat
Hg18.genefile (refseq track only)
Signal track (visualization/genCisGenomeBrowserTrack.bat)
Probe raw signal barfile
>genbar.pl --coord=hGlue1_0.r3.Probe.BED --signal=raw_probe_signal.txt --outdir=Probe_barfile
PSR barfile
>genbar.pl --coord=hGlue1_0.r3.PSR.BED --signal=PSR/rma-sketch.summary.txt --outdir=PSR_barfile
Gene barfile
>genbar.pl --coord=hGlue1_0.r3.TC.BED --signal=TC/rma-sketch.summary.txt --outdir=TC_barfile
Demo
Other Browsers
UCSC Genome Browser (visualization/genUCSCBrowserTrack.bat)
apt-summary-vis -g hGlue1_0.r3.PSR.BED PSR/rma-sketch.summary.txt --wiggle-colindex 1 -o CEL1.PSR.wig
The BED file needs to be adjusted so that PSRs do not overlap before the track will work in the UCSC browser
Affymetrix Genome Browser
apt-summary-vis -g hGlue1_0.r3.PSR.BED PSR/rma-sketch.summary.txt -o PSR.egr
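The tutorial does not say how the BED file was adjusted to remove overlapping PSRs. One possible approach, sketched below in Python, is to sort the intervals and clip each one so that it starts no earlier than the end of the previous interval on the same chromosome, dropping any interval that becomes empty. This is an assumption for illustration only: it ignores strand, and the helper name and toy coordinates are made up.

```python
def clip_overlaps(intervals):
    """Trim intervals so none overlaps the previous one on the same chromosome.

    intervals: list of (chrom, start, end, name) tuples, BED-style
    (0-based start, end exclusive). Intervals that become empty are dropped.
    """
    result = []
    last_end = {}
    for chrom, start, end, name in sorted(intervals, key=lambda r: (r[0], r[1], r[2])):
        start = max(start, last_end.get(chrom, 0))
        if start < end:
            result.append((chrom, start, end, name))
            last_end[chrom] = end
    return result


# Toy intervals: "b" overlaps "a", and "c" lies entirely inside covered sequence.
toy = [
    ("chr1", 100, 200, "a"),
    ("chr1", 150, 250, "b"),
    ("chr1", 160, 180, "c"),
    ("chr2", 10, 20, "d"),
]
clipped = clip_overlaps(toy)
for row in clipped:
    print("\t".join(str(field) for field in row))
```

Clipping changes PSR boundaries, so a track built this way is suitable for display but should not be used to recompute signals.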
Glue Grant Exon Array tool
Highlights
Specially tailored for exon arrays
Command line with R interface
Probe-sequence-specific background model (MAT)
Summarization: probe selection (GenBase), Li-Wong model (dChip) and median polish (RMA)
Integrated alternative splicing analysis (MADS)
Run analysis (GlueGrantExonArrayTool/runEAT.bat)
../../bin/GlueGrantExonArrayTool/eat.win32.exe EXPR_param.conf -l ../../data/07-2008/CELlist.txt
../../bin/GlueGrantExonArrayTool/eat.win32.exe MADS_param.conf -l ../../data/07-2008/CELlist.txt
Param.conf
Specify analysis parameters
Analysis type
Library files
Background correction method
Normalization method
Summarization method
MADS parameters
Example: /GlueGrantExonArrayTool/Expr_param.conf
Practice session #3
cisGenomeBrowser
Generate bar files for PSR and TC of the 15 arrays in CELlist_test.txt from practice session #1
Search for genes of interest
Glue Grant Analysis Tool
Repeat the steps in runEAT.bat
Thank you