Analysis and Interpretation of
Microarray Data
Michael F. Miles, M.D., Ph.D.
Depts. of Pharmacology/Toxicology and Neurology
and the Center for Study of Biological Complexity
Virginia Commonwealth University
Richmond, VA
Expression Profiling: A Non-biased, Genomic Approach to
Resolving the Mechanisms of Addiction
Candidate Gene Studies
Cycles of Expression Profiling: “Molecular Triangulation”
Merge with Biological Databases
High Density DNA Microarrays
Oligonucleotide Array Analysis
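[Figure: target preparation and detection. Total RNA (5’ ... AAAA) is primed with Oligo(dT)-T7 and converted to dsDNA (Rtase/Pol II); T7 pol with CTP-biotin yields Biotin-cRNA; Hybridization to PM and MM probes; Streptavidin-phycoerythrin staining; Scanning.]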
Stepwise Analysis of
Microarray Data
• Low-level analysis -- image analysis,
expression quantitation
• Primary analysis -- is there a change in
expression?
• Secondary analysis -- what genes show
correlated patterns of expression?
(supervised vs. unsupervised)
• Tertiary analysis -- is there a phenotypic
“trace” for a given expression pattern?
GE Database (SQL Server)
Primary Analysis (MAS-5, S-score, d-chip, PDNN)
Normalize, De-noise
Statistical Filtering (e.g. SAM)
Hybridization and Scanning
Clustering Techniques
Experimental Design
Provisional Gene “Patterns”
Behavioral Validation
Molecular Validation (RT-PCR, in situ, Western)
Candidate Genes
Filtered Gene Lists
Overlay Biological Databases (PubGene, GenMAPP, EASE, WebQTL, etc.)
Quality Assessment
• Gene specific: R/G correlation, %BG, %spot,
biological variation
• Array specific: normalization factor, % genes
present, linearity, control/spike performance (e.g.
5’/3’ ratio, intensity)
• Across arrays: linearity, correlation, background,
normalization factors
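The "across arrays" correlation check can be sketched in a few lines. This is a minimal illustration, not a published QC procedure: the function names, the 0.9 cutoff, and the chip values are all assumptions made for the example. Each array is compared with every other array, and an array is flagged when its median correlation with the rest is low.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length signal vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_outlier_arrays(arrays, min_median_r=0.9):
    """Return names of arrays whose median correlation with the others is low.

    arrays: dict mapping array name -> list of log2 signals, same gene order.
    min_median_r: hypothetical cutoff chosen for illustration only.
    """
    flagged = []
    for name, signals in arrays.items():
        rs = [pearson(signals, other)
              for other_name, other in arrays.items() if other_name != name]
        # The median keeps one bad array from dragging down the good ones.
        if median(rs) < min_median_r:
            flagged.append(name)
    return flagged


# Made-up log2 signals for six genes on four chips.
arrays = {
    "chip_A": [7.1, 8.3, 5.2, 10.4, 6.6, 9.0],
    "chip_B": [7.0, 8.5, 5.0, 10.1, 6.9, 9.2],
    "chip_C": [7.3, 8.1, 5.4, 10.6, 6.4, 8.8],
    "chip_D": [9.8, 5.1, 8.9, 6.0, 10.2, 5.5],
}
flagged = flag_outlier_arrays(arrays)
print(flagged)
```

In practice the cutoff would depend on the tissue, the platform, and how much biological variation is expected between samples.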
Sources of Variance in Microarray Experiments
Type of Variance: Factors
• Biological
– Animal-animal differences (intra/inter cage, supplier)
– Genotype
– Circadian rhythms
– Stress
• Technical
– Sample treatment/harvesting (dissections, injections)
– Target preparation (enzyme lots, mRNA quality)
– Lot-to-lot chip variation
– Chip processing (scanning order)
• Environmental
– Temperature
– Handling
– Noise/odors
Chip Normalization Procedures
• Whole chip intensity
– Assumes relatively few changes, uniform error/noise
across chip and abundance classes
– Linear vs. “piece wise” linear (quantile, lowess)
• Spiked standards
– Requires exquisite technical control, assumes uniform
behavior
• Internal Standards
– Assumes no significant regulation
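Two of the whole-chip approaches above can be sketched as follows. This is a simplified illustration under stated assumptions: `scale_to_target` uses a plain mean and an arbitrary target of 100 (production software typically uses a trimmed mean), and `quantile_normalize` breaks ties by position rather than averaging them. Function names and data are hypothetical.

```python
def scale_to_target(signals, target=100.0):
    """Linear whole-chip scaling: multiply every signal by one factor so the
    chip mean equals `target`. Returns (scaled signals, normalization factor)."""
    factor = target / (sum(signals) / len(signals))
    return [s * factor for s in signals], factor


def quantile_normalize(chips):
    """Quantile normalization: force every chip to share the same distribution.

    chips: list of equal-length lists, one list of signals per chip.
    Each value is replaced by the mean, across chips, of the values that
    hold the same rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [sum(chip[i] for chip in sorted_chips) / len(chips)
                  for i in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


scaled, factor = scale_to_target([25.0, 50.0, 75.0])
print(scaled, factor)   # [50.0, 100.0, 150.0] 2.0

print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

The scaling factor returned by `scale_to_target` is the kind of "normalization factor" that can be tracked as a quality metric: a chip needing a much larger factor than its neighbors is worth a second look.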
Slide Normalization: Pieces and Pins
“Lowess” normalization,
Pin-specific Profiles
After Print-tip Normalization
http://www.ipam.ucla.edu/publications/fg2000/fgt_tspeed9.pdf
See also: Schuchhardt, J. et al., NAR 28: e47 (2000)
Affymetrix Arrays: PM-MM
Difference Calculation
Probe pairs control for non-specific hybridization of oligonucleotides
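A minimal sketch of a PM-MM difference summary for one probe set follows. It is a simplification for illustration, not the Affymetrix algorithm: here the single largest and smallest differences are dropped before averaging, and the function name, the `trim` parameter, and the intensities are all assumptions.

```python
def average_difference(pm, mm, trim=1):
    """Trimmed mean of PM - MM differences across the probe pairs of one gene.

    pm, mm: equal-length lists of perfect-match and mismatch intensities.
    trim: number of extreme differences dropped from each end (assumed value).
    """
    diffs = sorted(p - m for p, m in zip(pm, mm))
    if trim and len(diffs) > 2 * trim:
        diffs = diffs[trim:len(diffs) - trim]
    return sum(diffs) / len(diffs)


# Five hypothetical probe pairs; the fourth PM is an obvious outlier.
pm = [1200, 900, 1500, 5000, 1100]
mm = [400, 300, 500, 450, 1000]
print(average_difference(pm, mm))   # 800.0
```

Without trimming, the same data give a mean difference of 1410, which shows how strongly a single aberrant probe pair can distort the estimate.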
Probe Level Analysis: Challenges
• Large variability in PM and MM intensities
• Only 11-25 probe pairs
• MM is a complex mixture of true signal and
background
• Normalization required to compare across
chips
• Intensity dependent noise
• Etc.
Probe Level Analysis Methods
• AvgDiff -- Affymetrix 1996, trimmed mean with
exclusion of outliers, PM-MM
• MAS 5 -- Affymetrix 2001, modeled correction of
MM, Tukey’s bi-weight, PM-MM or PM-m
• MBEI -- Li and Wong 2001, modeled correction
and outlier detection, PM-MM or PM only
• RMA (Robust Multichip Analysis) -- Irizarry et al.
2002, PM only
• PDNN (Position Dependent Nearest Neighbor) -- Zhang et al. 2003, thermodynamic model for
probe interactions, PM only
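The Tukey bi-weight named under MAS 5 is a robust average that down-weights probes far from the median. The sketch below is a one-step version; the tuning constant `c=5` and the small `epsilon` are commonly quoted defaults and should be treated as assumptions here, as should the input values. It is meant to show the idea, not to reproduce MAS 5 output.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey bi-weight estimate of the center of `values`.

    Each value is weighted by (1 - u^2)^2, where u is its distance from the
    median in units of c * MAD; values with |u| >= 1 get zero weight.
    """
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + epsilon)  # epsilon avoids division by zero
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical log-scale probe values with one aberrant probe.
probe_values = [10.0, 10.2, 9.8, 10.1, 9.9, 25.0]
estimate = tukey_biweight(probe_values)
print(estimate)
```

Here the value 25.0 receives zero weight, so the estimate stays close to 10, whereas the plain mean of the same six values is 12.5.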
MAS 5 Fold-Change vs. S-scores
Secondary Analysis: Expression
Patterns
• Supervised multivariate analyses
– Support vector machines
• Non-supervised clustering methods
– Hierarchical
– K-means
– SOM
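As an illustration of the hierarchical option, the sketch below performs average-linkage agglomerative clustering with a correlation distance (1 - Pearson r). The gene names and profiles are invented, the function names are hypothetical, and the loop is written for clarity rather than speed; a constant profile would need special handling because its correlation is undefined.

```python
from math import sqrt


def correlation_distance(x, y):
    """1 - Pearson r: 0 for identical shapes, 2 for opposite shapes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / sqrt(sxx * syy)


def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    profiles: dict mapping gene -> list of expression values per condition.
    Cluster-to-cluster distance is the mean of all pairwise gene distances.
    """
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [correlation_distance(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Two rising and two falling profiles across four conditions (made up).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.1, 2.0, 0.9],
    "geneD": [8.2, 6.0, 4.1, 2.0],
}
clusters = average_linkage_clusters(profiles, n_clusters=2)
print(clusters)
```

Correlation distance groups genes by the shape of their profile rather than its magnitude, which is why geneA and geneB fall together despite differing roughly two-fold in absolute level.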
Use of S-score in Hierarchical Clustering of Brain Regional Expression Patterns
[Figure: clustered heat maps for AvgDiff and S-score; color scale -2, 0, +2 (relative change)]
Tertiary Analysis: Connecting
Function with Expression Patterns
• Annotation
– UniGene/Swiss-Prot, SOURCE, DAVID
• Biased functional assessment
– Manual, GenMAPP, GeneSpring
• Non-biased functional queries
– PubGene
– MAPPFinder, DAVID/EASE, GEPAS,
GOTree Machine, others
• Overlaying genomics and genetics
– WebQTL
Non-biased (semi) Functional
Group Analysis: GenMAPP
Expression Analysis Systematic Explorer -- EASE
http://apps1.niaid.nih.gov/david/upload.jsp
Genome Biol. 2003;4(10):R70. Epub 2003 Sep 11.
EASE -- Options in Analysis
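Tools of this kind ask whether a functional category appears in a gene list more often than chance would predict. A minimal sketch of that question using the hypergeometric upper tail (equivalent to a one-sided Fisher exact test) is shown below. It does not reproduce the statistic that EASE itself reports; the function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_p(list_hits, list_size, pop_hits, pop_size):
    """P(X >= list_hits) when drawing `list_size` genes without replacement
    from a population of `pop_size` genes, `pop_hits` of which belong to the
    category of interest."""
    max_hits = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
               for k in range(list_hits, max_hits + 1))
    return tail / total


# Hypothetical counts: 12 of 150 regulated genes fall in a category that
# holds 200 of the 10000 genes on the array.
p = overrepresentation_p(list_hits=12, list_size=150,
                         pop_hits=200, pop_size=10000)
print(p)
```

With these counts about 3 category members would be expected by chance, so 12 gives a small p-value. Because many categories are tested at once, some correction for multiple testing is normally applied before interpreting any single result.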
Efforts to Integrate Diverse Biological Databases
with Expression Information: PubGene
www.PubGen.org
[Figure: analysis flow (steps 1-11) across brain regions NAC, PFC and VTA, each labeled B6/D2, D2 Et and B6 Et]
Functional Annotation Association Mining (EASE)
High-throughput Literature Association Mining (PubGene)
Genetic Associations (WebQTL)
Additional Expression Associations (Molecular Triangulation)
Analysis Stages for Oligonucleotide Microarrays
• Normalization
– Description: Equalizes overall signal across arrays to be compared, ensures linearity of response across abundance classes
– Examples of methods: Whole chip (26); Quantile (27)
• Probe reduction
– Description: Combines signals from multiple probes or probe pairs to define “expression level”. Identifies genes with invalid or hypervariable expression levels.
– Examples of methods: Weighted average (MAS 4) (29); Tukey bi-weight (MAS 5) (30); Model-based (MBEI) (31); Log scale linear additive (RMA) (32); Position-dependent stacking energy modeling (PDNN) (33)
• Comparative
– Description: Compares expression of a gene across two or more arrays to determine significant changes in expression
– Examples of methods: t-test; rank order (MAS 5) (30); permutation (SAM) (46, 47); S-score (48)
• Multivariate studies
– Description: Identifies significant correlations in expression data across experiments/conditions
– Examples of methods: hierarchical clustering; k-means clustering; self-organizing maps; principal components analysis; & many more (34, 49)
• Biological overlay
– Description: Identify functions for given genes, clusters of genes; hypothesis generation
– Examples of methods: Multiple database access (SOURCE) (50); PubMed correlations (PubGene) (51); Gene Ontology rankings (GenMAPP, MAPPFinder, DAVID/EASE) (52, 53)
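For the comparative stage, the sketch below combines a t-statistic with label permutation for a single gene. It is a plain exact permutation test, not SAM, which adds a variance-stabilizing constant and estimates false discovery rates across all genes. The function names and expression values are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for one gene.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated; the p-value is the fraction of splits whose |t|
    is at least as large as the observed |t|.
    """
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    n = len(pooled)
    count = total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(welch_t(g1, g2)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical log2 signals for one gene, four control and four treated arrays.
control = [7.1, 7.3, 6.9, 7.0]
treated = [8.4, 8.6, 8.1, 8.5]
p_value = permutation_p_value(control, treated)
print(p_value)
```

With four arrays per group there are only 70 possible splits, so the smallest attainable two-sided p-value is 2/70, about 0.029. That floor is one reason replicate number matters at the experimental design stage.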
Bioinformatics Resources for Microarray Experiments
• SOURCE
– Description: Human, rat, mouse gene compilation from multiple databases; allows batch submissions for annotation
– Link: http://source.stanford.edu/cgi-bin/sourceSearch
• GeneLynx
– Description: Human, mouse gene compilation; multiple database links regarding gene/protein structure and function
– Link: http://www.genelynx.org/
• DAVID/EASE
– Description: Mines gene list for frequency of GO categories; annotation of gene list; statistical analysis of biological themes in gene list (EASE)
– Link: http://apps1.niaid.nih.gov/David/upload.asp
• GenMAPP/MAPPFinder
– Description: Superimposes array data on biological pathways; statistical ranking of functional groups
– Link: http://www.genmapp.org/
• FatiGO
– Description: Mines gene list for occurrence of GO terms; statistical comparison of two lists for over-representation
– Link: http://fatigo.bioinfo.cnio.es/
• PubGene
– Description: Finds associations between genes in biomedical literature; superimposes array data on literature links; commercial version available
– Link: http://www.pubgene.org/
• MEME
– Description: Search promoter regions of genes in list/cluster for conserved motifs
– Link: http://meme.sdsc.edu/meme/website/intro.html
Quaternary Analysis: Profiles to Physiology
Expression Profiling
Prot-Prot Interactions
BioMed Lit Relations
Expression Networks
Homolo-Gene
Ontology
Genetics
Pharmacology
Complex Trait