Transcriptomics, ArrayExpress and Expression Profiler

Transcriptomics
Patrick Kemmeren
European Bioinformatics Institute
Genomics Lab, UMC Utrecht
What are microarrays?
[Figure: mRNA → cDNA → hybridise to microarray]
Transcriptomics?
[Figure: workflow of a microarray experiment. For each sample in the experiment: Sample → RNA extract → labelled nucleic acid → hybridisation → array (array design, genes) → Microarray, with a protocol attached to each step. Normalization and integration lead to the gene expression data matrix.]
Microarray data and annotation
[Figure: gene expression matrix of genes by samples, holding gene expression levels. Gene annotation is attached to the gene axis and sample annotation to the sample axis.]
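A minimal sketch of the structure described on this slide: an expression matrix kept together with its gene and sample annotation. The class name, gene names, sample names, annotation fields and values are all invented for illustration; they are not taken from ArrayExpress or from any real experiment.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionData:
    """Gene expression matrix plus annotation for both axes (illustrative only)."""
    genes: list[str]
    samples: list[str]
    levels: list[list[float]]  # levels[i][j] = level of gene i in sample j
    gene_annotation: dict[str, dict] = field(default_factory=dict)
    sample_annotation: dict[str, dict] = field(default_factory=dict)

    def __post_init__(self):
        # The matrix must have one row per gene and one column per sample.
        if len(self.levels) != len(self.genes):
            raise ValueError("one row of levels is needed per gene")
        if any(len(row) != len(self.samples) for row in self.levels):
            raise ValueError("each row needs one value per sample")

    def level(self, gene, sample):
        """Expression level of one gene in one sample."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def profile(self, gene):
        """Expression levels of one gene across all samples."""
        return dict(zip(self.samples, self.levels[self.genes.index(gene)]))

    def samples_where(self, key, value):
        """Samples whose annotation has the given key/value pair."""
        return [s for s in self.samples
                if self.sample_annotation.get(s, {}).get(key) == value]


# Hypothetical toy data: two genes measured in three samples.
data = ExpressionData(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    levels=[[1.0, 2.5, 0.3],
            [0.1, 0.2, 4.0]],
    gene_annotation={"geneA": {"description": "hypothetical kinase"}},
    sample_annotation={"s1": {"treatment": "control"},
                       "s2": {"treatment": "heat"},
                       "s3": {"treatment": "heat"}},
)

print(data.level("geneA", "s2"))
print(data.samples_where("treatment", "heat"))
```

Keeping the annotation next to the matrix is what makes gene-centric and sample-centric questions possible on the same data.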
Traditions of data sharing in Life Sciences
• Data used in publications should be made available so that
– the experiments can be reproduced and the conclusions can be verified
– others can build on the results
• In genome sequencing this has evolved into submissions to the public sequence databases DDBJ/EMBL/GenBank; most journals require such submissions
Sharing microarray data – which data?
[Figure: the kinds of microarray data that could be shared, in panels labelled A to D. Labels shown: array scans, spots, quantitations, genes, samples.]
MGED standards - MIAME
Sample
• Sample source
• Sample treatments
• Extraction protocol
• Labeling protocol
Hybridisation
• Hybridization protocol
Array
• Array design information
• Location of each element
• Description of each element
• Image
• Scanning protocol
• Software specifications
• Quantification matrix
• Analysis protocol
• Software specifications
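As a rough illustration of how the items on this slide could be used as a checklist, the sketch below reports which items a submission record is still missing. The dictionary layout, the snake_case field names and the "measurement" group name are assumptions made for this example; this is not the official MIAME specification or the MIAMExpress data model.

```python
# Checklist items taken from the slide; grouping of the last five items
# under "measurement" is an assumption made for this sketch.
MIAME_ITEMS = {
    "sample": ["sample_source", "sample_treatments",
               "extraction_protocol", "labeling_protocol"],
    "hybridisation": ["hybridization_protocol"],
    "array": ["array_design_information", "element_locations",
              "element_descriptions"],
    "measurement": ["image", "scanning_protocol", "quantification_matrix",
                    "analysis_protocol", "software_specifications"],
}


def missing_items(record):
    """Return 'part.item' names that are absent or empty in the record."""
    missing = []
    for part, items in MIAME_ITEMS.items():
        provided = record.get(part, {})
        for item in items:
            if not provided.get(item):
                missing.append(f"{part}.{item}")
    return missing


# Hypothetical, partially filled record; all values are placeholders.
example = {
    "sample": {"sample_source": "placeholder source",
               "sample_treatments": "none",
               "extraction_protocol": "protocol-1",
               "labeling_protocol": "protocol-2"},
    "hybridisation": {"hybridization_protocol": "protocol-3"},
    "array": {"array_design_information": "design-1"},
}

for name in missing_items(example):
    print("missing:", name)
```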
MIAME 6 parts of a microarray experiment
[Figure: graph of a microarray experiment (labelled with the experiment name) linking samples → extracts → labelled extracts → hybridizations. Colours are related to labels; shapes are related to array designs. Example shown: Rustici et al., S. pombe cell-cycle mutant data (2004).]
[Figure: ArrayExpress architecture and user functionality.
Submissions: MAGE-ML (XML) from external applications, or online through MIAMExpress and its database, with submission support and curation.
Database: data upload into the ArrayExpress Repository and the AE Data Warehouse.
User functionality: retrieval of raw and processed data for analysis (data download); gene-, sample- and experiment-centric queries; visualisation.]
MIAMExpress
• Submission and annotation tool
• Potential local data annotation tool
• Based on MIAME concepts
• Accepts protocol, array and experiment submissions
• User accounts allow re-use of protocols and arrays
• Works with your own or commercial arrays
MIAMExpress schema
[Figure: MIAMExpress submission workflow, starting from Create Account.]
• Protocol Submission: sample, extraction, labelling, hybridisation (hyb), scanning, image analysis and transformation protocols; Submit
• Array Submission: spotter output, clone tracker, file merging, ADF generation, array details; Submit
• Experiment Submission: samples 1…n, extracts 1…n, labels 1…n, hybridisations, arrays 1…n, data 1…n, combined experiment data; Submit
• MAGE-ML export using MAGEstk API
ArrayExpress Curation
ArrayExpress
http://www.ebi.ac.uk/arrayexpress
• A public repository for microarray data at the EBI
Data in ArrayExpress
[Figure: number of hybridizations (hybs) in ArrayExpress over time. x-axis: Oct-03 to Aug-04; y-axis: 0 to 8000.]
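Submissions by pipelines
[Figure: chart of submissions by pipeline. Pipelines shown: MEXP, SMDB, CAGE, TIGR, NASC, UMCU, SNGR, RZPD, FLYC, AFMX, EMBL, MANP, RUBN, DKFZ, WMIT, HGMP. One slice is labelled "Online (MIAMExpress) Submissions".]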
ArrayExpress data - by organism
Total ~ 7000 hybridisations
• Homo sapiens: 23%
• Mus musculus: 22%
• Arabidopsis thaliana: 19%
• Schizosaccharomyces pombe: 7%
• Also in the chart (percentages not shown): Drosophila melanogaster, Saccharomyces cerevisiae, Rattus norvegicus, Caenorhabditis elegans, Apis mellifera, Danio rerio, Gerbera hybrid cultivar, Hordeum vulgare, Medicago truncatula, Pan troglodytes, Platichthys flesus, Pseudomonas aeruginosa, Other
New! Gene-centric Query Prototype
http://www.ebi.ac.uk/aedw/ArrayExpress_main.html
• Driven by a BioMart backend
Expression Profiler
http://www.ebi.ac.uk/expressionprofiler
• An online microarray data analysis platform
What can you do with the data?
... view as a heatmap ...
Expression Profiler Data Viewer Component
What can you do with the data?
... cluster the data ...
Expression Profiler Hierarchical Clustering Component
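To make the clustering step concrete, here is a small sketch of agglomerative hierarchical clustering of gene expression profiles. It is not the Expression Profiler implementation: the choice of average linkage, the distance (1 minus Pearson correlation), the function names and the toy profiles are all assumptions made for illustration.

```python
from math import sqrt


def pearson_distance(x, y):
    """Distance between two profiles, defined as 1 - Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / (sx * sy)


def hierarchical_clustering(profiles):
    """Average-linkage agglomerative clustering.

    profiles maps a gene name to its list of expression levels.
    Returns the merges in order as (members_a, members_b, distance).
    """
    names = list(profiles)
    # Pairwise distances between individual genes, computed once.
    dist = {frozenset((a, b)): pearson_distance(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name,) for name in names]
    merges = []

    def linkage(c1, c2):
        # Average distance over all cross-cluster gene pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        d, i, j = min((linkage(c1, c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters) if i < j)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical profiles: g1/g2 rise together, g3/g4 fall together.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 2.5],
}
merges = hierarchical_clustering(toy)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

The order of merges is the information a dendrogram displays: the co-regulated pairs join first at small distances, and the two anti-correlated groups join last.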
What can you do with the data?
... look at GeneOntology enrichment of a selected cluster ...
Expression Profiler GO Annotation Component
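One common way to score enrichment of a GO term in a selected cluster is a one-sided hypergeometric test. The sketch below shows that calculation; it is an assumption that this test suits the case, the slides do not say which statistic the GO Annotation Component uses, and the function name and the numbers in the example are invented.

```python
from math import comb


def enrichment_p_value(cluster_size, annotated_in_cluster,
                       population_size, annotated_in_population):
    """Probability of seeing at least `annotated_in_cluster` genes with a
    given GO term when `cluster_size` genes are drawn at random from a
    population in which `annotated_in_population` genes carry the term."""
    total = comb(population_size, cluster_size)
    not_annotated = population_size - annotated_in_population
    upper = min(cluster_size, annotated_in_population)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(annotated_in_population, k)
               * comb(not_annotated, cluster_size - k)
               for k in range(annotated_in_cluster, upper + 1))
    return tail / total


# Hypothetical numbers: 10 genes in total, 3 carry the term,
# and a cluster of 3 genes contains all 3 of them.
p = enrichment_p_value(cluster_size=3, annotated_in_cluster=3,
                       population_size=10, annotated_in_population=3)
print(p)
```

A small p-value suggests the term is over-represented in the cluster; when many terms are tested, some correction for multiple testing would normally be applied as well.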
What can you do with the data?
... check out how clusterings compare ...
Expression Profiler Clustering Comparison Component
What can you do with the data?
... integrate several data types together ...
Expression Profiler Threeway Similarity Analysis
Available Components
– Data Selection
– Data Transformation
– Sequence Homology
– Missing Value Imputation
– Hierarchical Clustering & Kgroups Clustering
– Clustering Comparison
– Signature Algorithm
– Visual Pattern Matching
– SPEXS: Promoter Discovery
– Ordination (COA, PCA)
– Between Group Analysis
– Three-way Similarity Analysis
– GO Annotation
Uses:
• ArrayExpress suite of tools
• Standalone tool
• Locally installed (UJI, UMC Utrecht)
• Teaching tool
• Pipelines, workflows, high-throughput analysis
Acknowledgements
EBI Microarray Informatics Team
Alvis Brazma, Head of Microarray Informatics Group
Ahmet Oezcimen, Scientist (Oracle DBA)
Anastasia Samsonova, PhD student
Anjan Sharma, Scientist (Software Developer)
Anna Farne, Scientist (Curation)
Aurora Torrente, PhD Student
Bhuwan Tiwari, Trainee
Catherine Leroy, Summer Student
Ele Holloway, Scientist (Curation)
Gabriella Rustici, Scientist (Postdoc)
Gaurab Mukherjee, Scientist (Curation)
Gonzalo Garcia Lara, Scientist (Web Designer/Programmer)
Helen Parkinson, Scientist (Curation Coordinator)
Jaak Vilo, Consultant
Lev Soinov, Scientist (Postdoc Wellcome Trust)
Misha Kapushesky, Scientist (Scientific Application Programmer)
Mohammadreza Shojatalab, Scientist (Database Programmer)
Niran Abeygunawardena, Scientist (Web Designer/Programmer)
Patrick Kemmeren, Consultant
Per Lilja, Scientist (Database Programmer)
Philippe Rocca-Serra, Scientist (Nutrigenomics Proj. Coordinator)
Pierre Marguerite, Summer Student
Richard Coulson, Scientist (Biosapiens Project)
Sergio Contrino, Scientist (Database Programmer)
Steffen Durinck, Student
Susanna-Assunta Sansone, Scientist (Toxicogenomics Proj. Coordinator)
Tim Rayner, Scientist (Curation)
Ugis Sarkans, Scientist (Database Development Coordinator)
Original EP Development:
• Jaak Vilo (Tartu)
• Patrick Kemmeren (Utrecht)
• Misha Kapushesky
EP:NG Framework Development:
• Patrick Kemmeren (Utrecht)
• Misha Kapushesky
• Caroline Johnston (UCL)
Visualization Components:
• Misha Kapushesky
• Steffen Durinck (Leuven)
• Phil Hyoun Lee
Clustering Comparison:
• Aurora Torrente
• Christine Körner (Leipzig)
PCA/COA/BGA:
• Aedín Culhane (Cork)
Signature Algorithm:
• Jan Ihmels (Tel-Aviv)
Gene Ordering:
• Karlis Freivalds (Riga)
Normalisation:
• Caroline Johnston (UCL)
Web Services:
• Antonio Estruch (UJI)