
Microarray Databases and
MIAME (Minimum Information
About a Microarray Experiment)
Yong Liu
Bioinformatics Unit
7/21/2015
Outline
Review of microarray technology from
data/database perspective
Motivation behind the MIAME standard
MIAME: what’s in it?
Current existing microarray databases
Future development

DNA Microarray Technology
Cy5: ~650 nm
Cy3: ~550 nm
No differential expression
Induced
Repressed
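
The three outcomes listed on this slide are usually read off the ratio of the two channel intensities. A minimal sketch, assuming Cy5 labels the test sample and Cy3 the reference, and assuming a two-fold cutoff; neither the channel assignment nor the cutoff comes from the slides, and `classify_spot` is a hypothetical name:

```python
import math

def classify_spot(cy5, cy3, log2_cutoff=1.0):
    """Classify one spot from its Cy5 (test) and Cy3 (reference) intensities.

    Assumptions: Cy5 = test sample, Cy3 = reference, and a two-fold
    change (|log2 ratio| >= 1) counts as differential expression.
    """
    log_ratio = math.log2(cy5 / cy3)
    if log_ratio >= log2_cutoff:
        return "induced"
    if log_ratio <= -log2_cutoff:
        return "repressed"
    return "no differential expression"
```

With these assumptions, a spot with Cy5 = 4000 and Cy3 = 1000 is called induced, and the reverse is called repressed.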
Context is Everything!
“An observed phenotype is specific for the
conditions under study” (Pat Brown, Stanford
University)
Information recorded in a microarray database
should be usable on a standalone basis

– Any background information
– Automated data analysis and mining, i.e. not only
on record-by-record basis
– Data from different laboratories and different
technology platforms
Capturing Data and Meta-data in
Microarray Experiments
How Much Data?

Experiments
– 100 000 genes in human
– 320 cell types
– 2000 compounds
– 3 time points
– 2 concentrations
– 2 replicates

Data volume
– 8 x 10^11 data-points
– 1 x 10^15 bytes = 1 petaB of data
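
The data-point count is the product of the experimental factors listed on the slide. A short check of that arithmetic; the slide does not say how the petabyte figure is derived, so the bytes-per-data-point value below is only the quotient of the two numbers given:

```python
import math

# Experimental factors as listed on the slide
factors = {
    "genes": 100_000,
    "cell_types": 320,
    "compounds": 2000,
    "time_points": 3,
    "concentrations": 2,
    "replicates": 2,
}

# One data point per combination of all factors
data_points = math.prod(factors.values())  # 768_000_000_000, about 8 x 10^11

# Storage per data point implied by the 1 petabyte total (a derived quotient)
petabyte = 10**15
implied_bytes_per_point = petabyte / data_points  # about 1300 bytes
```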
Gene Expression Matrix

Spot/Image
quantiations
7/21/2015
Genes
Spots
The final gene expression matrix (on the right) is
needed for higher level analysis and mining.
Images
Samples
Gene
expression
levels
8
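
The step from spot quantitations to a genes-by-samples matrix can be sketched as follows. This is an illustration only: the `(sample, gene, value)` input layout and the choice to average replicate spots are assumptions, and `build_expression_matrix` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

def build_expression_matrix(spot_rows):
    """Collapse spot-level values into a genes x samples matrix.

    spot_rows: iterable of (sample, gene, value) tuples, one per spot.
    Replicate spots for the same gene and sample are averaged (an
    assumption); combinations with no spot are left as None.
    """
    grouped = defaultdict(list)
    for sample, gene, value in spot_rows:
        grouped[(gene, sample)].append(value)

    genes = sorted({gene for gene, _ in grouped})
    samples = sorted({sample for _, sample in grouped})
    matrix = [
        [mean(grouped[(g, s)]) if (g, s) in grouped else None for s in samples]
        for g in genes
    ]
    return genes, samples, matrix

# Hypothetical spot quantitations: geneA is spotted twice for sample s1
spots = [
    ("s1", "geneA", 1.0),
    ("s1", "geneA", 3.0),
    ("s2", "geneA", 4.0),
    ("s1", "geneB", 0.5),
]
genes, samples, matrix = build_expression_matrix(spots)
```

Rows of `matrix` follow `genes` and columns follow `samples`.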
MGED and MIAME


The need for one or more public repositories for
microarray gene expression data became apparent in 1998;
such repositories require data standards
MGED-1 (Microarray Gene Expression Database) Group:
November 14-15, 1999, Cambridge, UK
– Established five working groups, including the microarray data
annotation group (MIAME)

MGED-2: May 25 - 27, 2000, Heidelberg, Germany
– Endorsed a MIAME draft

MGED-3: March 29-31, 2001, Stanford University
– Adopted MIAME 1.0

MGED-4: Feb. 13-16, 2002, Boston
– Adopted MIAME 1.1
MIAME: Six Parts
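
One way to picture the six parts is as a completeness checklist for a submission record. A minimal sketch; the key names are taken from the slide titles that follow and are not an official MIAME schema, and `missing_miame_parts` is a hypothetical helper:

```python
# Part names follow the slide titles; they are illustrative keys only
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "normalization",
)

def missing_miame_parts(record):
    """Return the parts that are absent or empty in a submission record."""
    return [part for part in MIAME_PARTS if not record.get(part)]

# Hypothetical, partially filled record
record = {
    "experimental_design": {"type": "time course"},
    "samples": {"label": "Cy3"},
    "normalization": {},
}
missing = missing_miame_parts(record)
```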
MIAME Part 1 - Experimental Design: the set
of the hybridisation experiments as a whole

Author, contact information, citations
Type of experiment (e.g., time course, normal vs
diseased comparison)
Experimental factors – i.e. tested parameters in
the experiment (e.g. time, dose, genetic variation,
response to a compound)
List of organisms used in the experiment
List of platforms used
MIAME Part 2 - Array Design: each array
used and each element (spot) on the array



Array design related information (e.g. platform
type – in situ synthesized or spotted, array
provider, surface type – glass, membrane, other,
etc.)
Properties of each type of element on the array
generated by similar protocols (e.g.
synthesized oligos, PCR products, plasmids,
colonies, others) – may be simple or composite
(Affymetrix)
Each element (spot) on the array
MIAME Part 3 - Samples: samples used, the
extract preparation and labeling

Sample source and treatment
Hybridisation extract preparation
– Laboratory protocol, including extraction
method, whether RNA, mRNA, or genomic DNA
is extracted, amplification method

Labelling
– Laboratory protocol, including amount of
nucleic acids labelled, label used (e.g. Cy3,
Cy5, 33P, etc.)
MIAME Part 4 - Hybridizations: procedures
and parameters






The solution (e.g. concentration of
solutes)
Blocking agent
Wash procedure
Quantity of labelled target used
Time, concentration, volume, temperature
Description of the hybridisation
instruments
MIAME Part 5 - Measurements: images,
quantitation, specifications

Scanning information
– Scan parameters, including laser power,
spatial resolution, pixel space, PMT voltage
– Laboratory protocol for scanning, including
scanning hardware and software used

Image analysis information
– Image analysis software specification
– All parameters

Summarised information from possible
replicates
MIAME Part 6 - Normalization: types,
values, specifications
Normalisation strategy (spiking,
housekeeping genes, total array,
other)
Normalisation algorithm
Control array elements
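
As an illustration of two of the strategies named above, the sketch below computes a single scaling factor that brings the Cy3 channel in line with the Cy5 channel, either over the whole array or over a chosen set of control elements such as housekeeping genes. The function name, the choice to rescale Cy3, and the sample intensities are assumptions; real pipelines often use more elaborate algorithms.

```python
def normalisation_factor(cy5, cy3, control_idx=None):
    """Scaling factor for Cy3 so its summed intensity matches Cy5.

    control_idx=None  -> "total array" strategy (all elements)
    control_idx=[...] -> restrict to control elements, e.g. housekeeping genes
    """
    idx = range(len(cy5)) if control_idx is None else control_idx
    return sum(cy5[i] for i in idx) / sum(cy3[i] for i in idx)

# Hypothetical intensities for three array elements
cy5 = [300.0, 400.0, 500.0]
cy3 = [100.0, 200.0, 300.0]

total_factor = normalisation_factor(cy5, cy3)           # 1200 / 600 = 2.0
control_factor = normalisation_factor(cy5, cy3, [0])    # 300 / 100 = 3.0
cy3_scaled = [v * total_factor for v in cy3]
```

Recording which strategy and which control elements were used is what allows someone else to reproduce the normalised values.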

Current Existing Microarray
Databases

Local Installation
– AMAD, GeneDirector, mAdb, maxdSQL, NOMAD

Public Queries only
– ChipDB, RAD

Public Queries and Local Installation
– SMD

Public Data Deposition and Queries
– ArrayExpress, GEO, GXD

GeneX and GeNet
FOR MORE INFO...
Margaret Gardiner-Garden and Timothy G. Littlejohn, A comparison of
microarray databases, Briefings in Bioinformatics, May 2001
MIAME-compliant Systems

Different labs have different needs: a lab-centric system is more desirable
MIAME-compliant microarray database
systems are still under development
– Commercial
• GeneTraffic (www.iobion.com)
• PARTISAN arrayLIMS (www.clondiag.com)
• Rosetta Resolver (www.rosettabio.com)
• …….
– Open source
• GeneX and NOMAD, among others, are still under
development to be MIAME-compliant
Future Development

Establishing MIAME-compliant databases
– Different labs continue to develop their own systems

A data exchange format (MAGE-ML) for
communicating MIAME information
Microarray data has no central DB yet: distributed
data queries and data mining?
– HTTP/XML
– SOAP (Simple Object Access Protocol)
– WSDL (Web Services Description Language)
– UDDI (Universal Description, Discovery, and
Integration)
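
To make the XML exchange idea concrete, here is a minimal round-trip sketch using Python's standard library. The element and attribute names (`experiment`, `factor`, `name`) are invented for illustration and are not MAGE-ML, and the sample record is hypothetical.

```python
import xml.etree.ElementTree as ET

def to_xml(experiment):
    """Serialise a small experiment description to an XML string.

    Element names here are hypothetical, not the MAGE-ML vocabulary.
    """
    root = ET.Element("experiment", name=experiment["name"])
    for factor in experiment["factors"]:
        ET.SubElement(root, "factor").text = factor
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Parse the XML produced by to_xml back into a dictionary."""
    root = ET.fromstring(text)
    return {
        "name": root.get("name"),
        "factors": [f.text for f in root.findall("factor")],
    }

# Hypothetical record that one database might send to another
record = {"name": "time course", "factors": ["time", "dose"]}
round_tripped = from_xml(to_xml(record))
```

A shared format like this is what would let independently developed lab databases answer the same distributed query.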