Transcript Document
Microarray Databases and MIAME (Minimum Information About a Microarray Experiment)
Yong Liu
Bioinformatics Unit
7/21/2015

Outline
Review of microarray technology from a data/database perspective
Motivation behind the MIAME standard
MIAME: what’s in it?
Existing microarray databases
Future development

DNA Microarray Technology
[Figure labels: Cy5 (~650 nm); Cy3 (~550 nm); no differential expression; induced; repressed]

Context is Everything!
“An observed phenotype is specific for the conditions under study” (Pat Brown, Stanford University)
Information recorded in a microarray database should be usable on a standalone basis
– Any background information
– Automated data analysis and mining, i.e. not only on a record-by-record basis
– Data from different laboratories and different technology platforms

Capturing Data and Meta-data in Microarray Experiments

How Much Data?
Experiments
– 100 000 genes in human
– 320 cell types
– 2000 compounds
– 3 time points
– 2 concentrations
– 2 replicates
Data volume
– 8 x 10^11 data points
– 1 x 10^15 bytes = 1 petabyte of data

Gene Expression Matrix
The final gene expression matrix (on the right) is needed for higher-level analysis and mining.
[Figure labels: Images; Spots; Spot/image quantitations; Genes; Samples; Gene expression levels]

MGED and MIAME
The need for one or more public repositories for microarray gene expression data became apparent in 1998; such repositories require data standards
MGED-1 (Microarray Gene Expression Database Group): November 14-15, 1999, Cambridge, UK
– Established five working groups, including the microarray data annotation group (MIAME)
MGED-2: May 25-27, 2000, Heidelberg, Germany
– Endorsed a MIAME draft
MGED-3: March 29-31, 2001, Stanford University
– Adopted MIAME 1.0
MGED-4: Feb. 13-16, 2002, Boston
– Adopted MIAME 1.1

MIAME: Six Parts

MIAME Part 1 - Experimental Design: the set of hybridisation experiments as a whole
Author, contact information, citations
Type of experiment (e.g. time course, normal vs diseased comparison)
Experimental factors
– i.e. tested parameters in the experiment (e.g. time, dose, genetic variation, response to a compound)
List of organisms used in the experiment
List of platforms used

MIAME Part 2 - Array Design: each array used and each element (spot) on the array
Array design information (e.g. platform type: in situ synthesized or spotted; array provider; surface type: glass, membrane, other)
Properties of each type of element on the array generated by similar protocols (e.g. synthesized oligos, PCR products, plasmids, colonies, others)
– may be simple or composite (Affymetrix)
Each element (spot) on the array

MIAME Part 3 - Samples: samples used, extract preparation and labelling
Sample source and treatment
Hybridisation extract preparation
– Laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, amplification method
Labelling
– Laboratory protocol, including amount of nucleic acids labelled, label used (e.g. Cy3, Cy5, 33P)

MIAME Part 4 - Hybridizations: procedures and parameters
The solution (e.g. concentration of solutes)
Blocking agent
Wash procedure
Quantity of labelled target used
Time, concentration, volume, temperature
Description of the hybridisation instruments

MIAME Part 5 - Measurements: images, quantitation, specifications
Scanning information
– Scan parameters, including laser power, spatial resolution, pixel space, PMT voltage
– Laboratory protocol for scanning, including scanning hardware and software used
Image analysis information
– Image analysis software specification
– All parameters
Summarised information from possible replicates

MIAME Part 6 - Normalization: types, values, specifications
Normalisation strategy (spiking, housekeeping genes, total array, other)
Normalisation algorithm
Control array elements

Existing Microarray Databases
Local installation
– AMAD, GeneDirector, mAdb, maxdSQL, NOMAD
Public queries only
– ChipDB, RAD
Public queries and local installation
– SMD
Public data deposition and queries
– ArrayExpress, GEO, GXD
GeneX and GeNet
For more info: Margaret Gardiner-Garden and Timothy G. Littlejohn, A comparison of microarray databases, Briefings in Bioinformatics, May 2001

MIAME-compliant Systems
Different labs have different needs: a lab-centric system is more desirable
MIAME-compliant microarray database systems are still under development
– Commercial
  • GeneTraffic (www.iobion.com)
  • PARTISAN arrayLIMS (www.clondiag.com)
  • Rosetta Resolver (www.rosettabio.com)
  • …….
– Open source
  • GeneX and NOMAD, among others, are still being developed towards MIAME compliance

Future Development
Establishing MIAME-compliant databases
– Different labs continue to develop their own systems
Data exchange format (MAGE-ML) for communicating MIAME information
Microarray data has no central DB yet: distributed data queries and data mining?
– HTTP/XML
– SOAP (Simple Object Access Protocol)
– WSDL (Web Services Description Language)
– UDDI (Universal Description, Discovery, and Integration)
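
As a rough illustration of what a lab-built MIAME-compliant database might check before accepting a submission, the six MIAME parts listed in this talk can be modelled as one record with a simple completeness test. This is only a minimal sketch under stated assumptions: the class name MiameRecord, the attribute names, the free-form key/value layout, and the example values are all hypothetical. MIAME does not prescribe a Python schema, and the talk names MAGE-ML as the format for exchanging this information.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class MiameRecord:
    """One experiment described by the six MIAME parts from the talk.

    Attribute names and the dict-of-strings layout are hypothetical;
    they are chosen here only to mirror the six slide titles.
    """
    experimental_design: Dict[str, str] = field(default_factory=dict)  # Part 1
    array_design: Dict[str, str] = field(default_factory=dict)         # Part 2
    samples: Dict[str, str] = field(default_factory=dict)              # Part 3
    hybridizations: Dict[str, str] = field(default_factory=dict)       # Part 4
    measurements: Dict[str, str] = field(default_factory=dict)         # Part 5
    normalization: Dict[str, str] = field(default_factory=dict)        # Part 6


def missing_sections(record: MiameRecord) -> List[str]:
    """Return the names of the MIAME parts that carry no information."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical, partially filled submission.
record = MiameRecord(
    experimental_design={"type": "time course", "organism": "human"},
    samples={"label": "Cy3, Cy5"},
    normalization={"strategy": "housekeeping genes"},
)

# Lists the parts that still need to be filled in.
print(missing_sections(record))
```

An empty section only shows that nothing was recorded. The check says nothing about whether a filled section is detailed enough to satisfy MIAME.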