Microarray Analysis of Gene Expression in Huntington's Disease
Peripheral Blood - a Platform Comparison
CodeLink
compatible
General microarray data analysis workflow
From raw data to biological significance
Comparison statistics and correction for multiple testing
GeneSifter Overview
Gene Expression in Huntington's Disease Peripheral Blood
Identification of biological themes
Platform comparison
Analysis Workflow
Raw data
Normalized, scaled data
Differentially expressed genes
Identify and partition expression patterns
Gene Summaries
Biological themes
(Pathways, molecular function, etc.)
Analysis Workflow
Raw data
Data upload
Normalized, scaled data
Comparison statistics,
correction for multiple testing
Differentially expressed genes
Up and down regulated, magnitude,
clustering
Identify and partition expression patterns
Annotation (UniGene, Entrez Gene,
Gene Ontologies, etc.)
Gene Summaries
Ontology report, pathway report, z-score
Biological themes
(Pathways, molecular function, etc.)
Experiment Design
Experimental design determines what can be inferred from the data as well as determining the
confidence that can be assigned to those inferences. Careful experimental design and the
presence of biological replicates are essential to the successful use of microarrays.
•Type of experiment
– Two groups
– Three or more groups
• Time series
• Dose response
• Multiple treatment
The type of experiment and number of groups will affect the statistical
methods used to detect differential expression
•Replicates
– The more the better, but at least 3
– Biological better than technical
Rigorous statistical inferences cannot be made with a sample size of one.
The more replicates, the stronger the inference.
Supporting material: Experimental Design and Other Issues in Microarray Studies, Kathleen Kerr, http://ra.microslu.washington.edu/presentation/documents/KerrNAS.pdf
microarraysuccess.com
Differential Expression
The fundamental goal of microarray experiments is to identify genes that are differentially
expressed in the conditions being studied. Comparison statistics can be used to help identify
differentially expressed genes and cluster analysis can be used to identify patterns of gene
expression and to segregate a subset of genes based on these patterns.
•Statistical Significance
– Fold change
Fold change does not address the reproducibility of the observed difference and
cannot be used to determine the statistical significance.
– Comparison statistics
• 2 groups
– t-test, Welch’s t-test, Wilcoxon rank sum
• 3 or more groups
– ANOVA, Kruskal-Wallis
Comparison tests require replicates and use the variability within the replicates to
assign a confidence level as to whether the gene is differentially expressed.
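A minimal sketch of why fold change alone is not enough, assuming hypothetical signal values for two genes (the function names and numbers are illustrative and do not come from the study). Both genes can show the same 2-fold change while Welch's t statistic differs sharply, because the statistic uses the variability within the replicates. Turning t and the degrees of freedom into a p-value needs the t distribution from a statistics package and is left out here.

```python
import math
import statistics


def fold_change(group1, group2):
    """Ratio of mean signals (group2 over group1); ignores replicate variability."""
    return statistics.mean(group2) / statistics.mean(group1)


def welch_t(group1, group2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (statistics.mean(group2) - statistics.mean(group1)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical signals: three replicates per group.
tight_control, tight_disease = [100, 102, 98], [200, 204, 196]
noisy_control, noisy_disease = [20, 100, 180], [40, 200, 360]

for label, control, disease in [
    ("tight replicates", tight_control, tight_disease),
    ("noisy replicates", noisy_control, noisy_disease),
]:
    t, df = welch_t(control, disease)
    print(label, "fold change:", fold_change(control, disease), "t:", round(t, 2), "df:", round(df, 2))
```

With only three replicates per group the degrees of freedom are small, which is one reason the slides ask for at least 3 replicates and prefer more.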
Supporting material: Draghici S. (2002) Statistical intelligence: effective analysis of high-density microarray data. Drug Discov Today,
7(11 Suppl): S55-63.
Differential Expression
• Correction for multiple testing - Methods for adjusting the p-value from a
comparison test based on the number of tests performed. These adjustments
help to reduce the number of false positives in an experiment.
– FWER: Family-wise error rate corrections adjust the p-value so
that it reflects the chance of at least one false positive being found in the list.
• Bonferroni, Holm, Westfall and Young maxT
– FDR: False discovery rate corrections adjust the p-value so that it
reflects the expected proportion of false positives in the list.
• Benjamini and Hochberg, SAM
FWER corrections are more conservative, but FDR corrections are usually acceptable for “discovery”
experiments, i.e. where a small number of false positives is acceptable.
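The corrections named above can be sketched in a few lines. This is an illustrative stdlib-only version of the Bonferroni, Holm, and Benjamini and Hochberg adjustments, not the implementation GeneSifter uses. The function names and the example p-values are assumptions. The permutation-based methods (Westfall and Young maxT, SAM) are not covered.

```python
def bonferroni(pvals):
    """FWER: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """FWER, step-down: the k-th smallest p-value is multiplied by (m - k + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """FDR, step-up: the k-th smallest p-value is multiplied by m / k."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from largest to smallest p-value
        i = order[rank]
        value = min(1.0, pvals[i] * m / (rank + 1))
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))
print("Holm:      ", holm(raw))
print("BH (FDR):  ", benjamini_hochberg(raw))
```

For any input, the Benjamini and Hochberg values are never larger than the Holm values, which are never larger than the Bonferroni values. That ordering is what "more conservative" means in practice.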
Dudoit, S., et al. (2003) Multiple hypothesis testing in microarray experiments. Statistical Science 18(1): 71-103.
Reiner, A., et al. (2003) Identifying differentially expressed genes using false discovery rate controlling procedures.
Bioinformatics 19(3):368-375.
GeneSifter – Microarray Data Analysis
Accessibility
Web-based
Secure
Data management
Data
Annotation (MIAME)
Multiple upload tools
CodeLink
Affymetrix
Illumina
Agilent
Custom
Differential Expression - Powerful, accessible tools for
determining Statistical Significance
R based statistics
Bioconductor
Comparison Tests
t-test, Welch’s t-test,
Wilcoxon rank sum test, ANOVA
Correction for Multiple Testing
Bonferroni, Holm,
Westfall and Young maxT,
Benjamini and Hochberg
Unsupervised Clustering
PAM, CLARA, Hierarchical clustering
Silhouettes
GeneSifter – Microarray Data Analysis
Integrated tools for determining Biological Significance
One Click Gene Summary™
Ontology Report
Pathway Report
Search by ontology terms
Search by KEGG terms or Chromosome
The GeneSifter Data Center
• Free resource
Training
Research
Publishing
• 5 areas
Cardiovascular
Cancer
Neuroscience
Immunology
Oral Biology
• Access to :
Data
Analysis summary
Tutorials
WebEx
The GeneSifter Data Center
www.genesifter.net/dc
GeneSifter - Analysis Examples
2 groups
(Huntington's Blood vs Healthy Blood)
Differential expression
Fold change
Quality
t-test
False discovery rate
Data Upload
CodeLink
3 + groups
(Time series, dose response, etc.)
Differential expression
Fold change
Quality
ANOVA
False discovery rate
Visualization
Hierarchical clustering
PCA
Partitioning
PAM
Silhouettes
Biological significance
Gene Annotation
Ontology report
Pathway report
Background - Huntington’s Disease
Huntington’s Disease (HD)
•Autosomal dominant neurodegenerative disease
•Motor impairment
•Cognitive decline
•Various psychiatric symptoms
•Onset 30-50 years
•Mutant Huntingtin protein (polyglutamine)
•Affects transcriptional regulation
•Transcription effects may occur outside of CNS
Pairwise Analysis
Human blood expression for Huntington’s disease versus control, CodeLink
CodeLink Human 20K Bioarray
Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD,
Hersch SM, Hogarth P, Bouzou B, Jensen RV, Krainc D.
Genome-wide expression profiling of human blood reveals
biomarkers for Huntington's disease.
Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.
Background - Data
Genome-wide expression profiling of human blood reveals biomarkers for
Huntington's disease
Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD, Hersch SM, Hogarth P, Bouzou B,
Jensen RV, Krainc D.
Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.
Collected peripheral blood samples
•14 Controls
•12 Symptomatic HD patients
•5 Presymptomatic HD patients
Identified the 322 most differentially expressed genes (control vs. symptomatic HD) using the U133A array.
Used CodeLink 20K to confirm genes identified using the Affymetrix platform
Focused on 12 genes that showed the most significant difference between control and HD
Data available from GEO
Pairwise Analysis
Select group 1
14 normal
Select group 2
12 Huntington's
Pairwise Analysis
Already normalized
(median)
t-test
Quality filter – 0.75
(filters out genes with
signal less than 0.75)
Benjamini and Hochberg
(FDR)
Log transform data
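The quality filter and log transform settings above can be sketched as follows. The slide does not say how the filter is applied across samples, so this sketch assumes a gene is kept when its mean signal reaches the threshold in at least one group, and it assumes a base-2 log. The function names, sample names, and signal values are hypothetical.

```python
import math
import statistics


def quality_filter(expression, group1, group2, threshold):
    """Keep genes whose mean signal reaches the threshold in at least one group.

    Assumed rule for illustration; the actual filter rule is not stated on the slide.
    """
    kept = {}
    for gene, signals in expression.items():
        mean1 = statistics.mean(signals[s] for s in group1)
        mean2 = statistics.mean(signals[s] for s in group2)
        if max(mean1, mean2) >= threshold:
            kept[gene] = signals
    return kept


def log2_transform(expression):
    """Log2-transform every signal; signals must be positive."""
    return {
        gene: {sample: math.log2(value) for sample, value in signals.items()}
        for gene, signals in expression.items()
    }


# Hypothetical median-normalized signals for two controls and two HD samples.
expression = {
    "gene_high": {"c1": 2.0, "c2": 4.0, "h1": 8.0, "h2": 8.0},
    "gene_low": {"c1": 0.2, "c2": 0.3, "h1": 0.5, "h2": 0.4},
}
filtered = quality_filter(expression, ["c1", "c2"], ["h1", "h2"], threshold=0.75)
logged = log2_transform(filtered)
print(sorted(logged))
```

Filtering before the comparison test also reduces the number of tests, which matters for the multiple-testing correction applied afterwards.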
Pairwise Analysis – Gene List
Biological Significance
Gene Annotation Sources
• UniGene - organizes GenBank sequences into a non-redundant set of gene-oriented clusters.
Gene titles are assigned to the clusters and these titles are commonly used by researchers to
refer to that particular gene.
• LocusLink (Entrez Gene) - provides a single query interface to curated sequence and descriptive
information, including function, about genes.
• Gene Ontologies - The Gene Ontology™ Consortium provides controlled vocabularies for the
description of the molecular function, biological process and cellular component of gene products.
• KEGG - Kyoto Encyclopedia of Genes and Genomes provides information about both regulatory
and metabolic pathways for genes.
• Reference Sequences - The NCBI Reference Sequence project (RefSeq) provides reference
sequences for both the mRNA and protein products of included genes.
GeneSifter maintains its own copies of these databases and updates them automatically.
One-Click Gene Summary
Pairwise Analysis – Gene List
Ontology Report
Ontology Report : z-score
R = total number of genes meeting
selection criteria
N = total number of genes measured
r = number of genes meeting selection
criteria with the specified GO term
n = total number of genes measured with
the specific GO term
Reference:
Scott W Doniger, Nathan Salomonis, Kam D Dahlquist, Karen Vranizan, Steven C Lawlor and Bruce R Conklin; MAPPFinder: using
Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data, Genome Biology 2003, 4:R7
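The formula itself did not survive in this transcript. Using the four quantities defined above, the z-score in the cited MAPPFinder paper is the observed count r minus the expected count n(R/N), divided by the standard deviation under a hypergeometric model. The sketch below follows that definition. The function name and the example counts are hypothetical, and whether GeneSifter computes it in exactly this form is an assumption.

```python
import math


def ontology_z_score(r, n, R, N):
    """z-score for one GO term, following the MAPPFinder definition.

    R: genes meeting the selection criteria
    N: genes measured
    r: genes meeting the criteria that carry the GO term
    n: genes measured that carry the GO term
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts: 1000 of 10000 measured genes changed,
# and 30 of the 100 genes with this GO term are among them.
z = ontology_z_score(r=30, n=100, R=1000, N=10000)
print(round(z, 2))
```

A positive z-score means the term appears in the gene list more often than expected by chance, and a negative one means it appears less often.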
Z-score Report
KEGG Report
Pairwise Analysis - Summary
Human blood expression for Huntington’s disease versus control, CodeLink
12 HD, 14 Control
~20,000 genes
t-test, Benjamini and Hochberg (FDR)
5684 genes
Pattern selection
•2606 increased in HD
– Z-scores, biological processes: Protein biosynthesis (104), Ubiquitin cycle (123), RNA splicing (53)
– KEGG: Oxidative phosphorylation (35), Apoptosis (22)
•3078 decreased in HD
– Biological processes: Neurogenesis (90), Cell adhesion (120), Sodium ion transport (29),
G-protein coupled receptor signaling (114)
– KEGG: Neuroactive ligand-receptor interaction (56)
Pairwise Analysis
Human blood expression for Huntington’s disease versus control, Affymetrix
U133A Human Genome Array
MAS 5 signal
Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD,
Hersch SM, Hogarth P, Bouzou B, Jensen RV, Krainc D.
Genome-wide expression profiling of human blood reveals
biomarkers for Huntington's disease.
Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.
Pairwise Analysis - Affymetrix
Already normalized
(median)
t-test
Quality filter – 50
(filters out genes with
signal less than 50)
Benjamini and Hochberg
(FDR)
Log transform data
Pairwise Analysis – Gene List
Human blood expression for Huntington’s disease versus control, Affymetrix
Gene Lists – Common and Unique Genes
Platform comparison – Biological themes
Affymetrix
Platform comparison – Biological themes
CodeLink
Project Analysis - Clustering
Cluster by Samples – All Genes
CodeLink
Affymetrix
Cluster by Samples – ?
CodeLink
Affymetrix
Cluster by Samples – Y Chrom. Genes
CodeLink
Affymetrix
Platform Comparison - Summary
Platform      Transcripts total   Increased in HD   Overlap (LL genes)
CodeLink      19729               2606              41%
Affymetrix    22283               1976              65%
Top BP Ontologies
Ubiquitin cycle
RNA splicing
Regulation of translation
Apoptosis
Clustering of samples
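One way to read the two overlap percentages is as the share of each platform's gene list that also appears in the other platform's list, matched by LocusLink (LL) ID. Under that reading the same shared genes give different percentages because the two lists differ in size. The sketch below assumes that reading. The function name and the IDs are made up, and the slide does not state how the percentages were actually computed.

```python
def overlap_percent(genes_a, genes_b):
    """Percent of the genes in genes_a that also appear in genes_b."""
    a, b = set(genes_a), set(genes_b)
    return 100.0 * len(a & b) / len(a)


# Hypothetical LocusLink IDs for genes increased in HD on each platform.
codelink_up = {1, 2, 3, 4, 5}
affymetrix_up = {4, 5, 6}

print("CodeLink list found on Affymetrix:", overlap_percent(codelink_up, affymetrix_up))
print("Affymetrix list found on CodeLink:", overlap_percent(affymetrix_up, codelink_up))
```

Matching on a shared gene identifier rather than on probe IDs is what makes lists from two different array designs comparable at all.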
Platform Comparison - Summary
Platform      Increased in HD   Decreased in HD   Unique ontology
CodeLink      2606              3078              Oxidative Phos.
Affymetrix    1976              986               IL-6 Biosynthesis
MicroarraySuccess.com
Seven Keys to Successful Microarray Data Analysis
Experiment Design
Type of experiment
Two groups
Time series
Dose Response
Multiple treatments
Replicates
The more the better
Technical vs. biological
Platform Selection
Data Management
Platforms
cDNA
Oligo
One color
Two color
Databases
Feature Extraction Software
File formats
Experiment Annotation
Samples
Protocols
Raw Data
Storing
Retrieving
System Access
Usability
Intuitive
Special training
System Access
Single user desktop
Single user server
Web-based
Sharing data
In the lab
Collaboration
Academic partner – University of Washington
Differential Expression
Normalization
Differential Expression
Fold change
Comparison statistics
FWER/FDR
Pattern Identification
Clustering
Visualization
Partitioning
Biological Significance
Gene Annotation
UniGene
LocusLink
Gene Ontology
KEGG
OMIM
Single Genes
Gene Summaries
Gene Lists
Ontology Report
Pathway Report
Data Publication
MIAME
What is it?
Publication
Public databases
GEO
ArrayExpress
SMD
Using public data
Meta analysis
The GeneSifter Data Center
www.genesifter.net/dc
Thank You
www.genesifter.net
Trial account, tutorials, sample data and Data Center
Eric Olson
[email protected]
206.283.4363