Publishing expression data from the SMD
Catherine Ball
Tuesday, May 30, 2006
[email protected]
http://smd.stanford.edu/
User Help: Tutorials and Workshops
• SMD Help & FAQ
http://genome-www.stanford.edu/microarray/helpindex.html
• SMD Tutorials – regularly scheduled (we hope)
– Welcome to SMD
– Data analysis, Normalization and Clustering
– Publishing expression data
– Power users and the data repository
– Interested? Email [email protected]
Publishing expression data : a tutorial
• What we will discuss:
– Publishing
  • Publisher’s requirements
  • Experimenter’s responsibilities
– Hybridization Annotation
  • Categories, Subcategories
  • Protocols
  • Procedures and parameters
  • Clinical Data
– Experiment Set Annotation
  • Organizing Data
  • Experiment Design Categories
  • Experimental Factors
  • Factor Values
– Making your data available
  • SMD
  • Web Supplements
  • Public Data repositories
• What we won’t discuss:
– User Registration
– Loader Accounts
– Submitting Data
– Finding Your Data
– Displaying Your Data
– Data Retrieval and Analysis
– Submitting a Printlist
– Data Normalization
– Data Quality Assessment
– Data Analysis (clustering)
– External User Tools (XCluster, TreeView, etc.)
Please fill out the sign-up sheet and survey form
Questions? email us at: [email protected]
Publishing expression data
• Background
• Publishing requirements and responsibilities
• Pre-publication responsibilities
– Hybridization Annotation
– Experiment Set Annotation
• Post-publication responsibilities
– Making your data available
Background : Interpretation and Analysis
• Extremely difficult to either interpret or analyze expression results without being aware of all the variables
– Biological characteristics, experimental design, protocol parameters, filtering parameters, etc.
• Typically, these annotations, if they exist at all, are not attached to the data
– Perhaps in a lab notebook, eventual publication (if ever published), or in the worst scenario, only in the experimenter’s head
Background : MGED
Microarray Gene Expression Database Society
• http://www.mged.org/
• Initially established November, 1999, Cambridge, UK.
• Realized there were serious problems in communicating genomic-scale expression results
• Keen interest in data standards, specifications, and transmission.
Background : Emerging standards
• MIAME : Minimum Information About a Microarray Experiment
– the requisite information needed to both verify your analysis and allow others to perform distinct analyses
– Nature Genetics (2001) 29, 365-371
• MAGE-ML: MicroArray Gene Expression Markup Language
– data format standard required for transmission and integration into other expression repositories
– Genome Biology (2002), 3(9):research0046.1–0046.9
Background : MIAME checklist
MGED Guide to authors, editors and reviewers of microarray gene expression papers
• In the interests of full disclosure and open research, a checklist of requirements was proposed, aimed at allowing manuscript readers “to understand the experiment, to identify the sequences being assayed, and to interpret the resulting data.”
• http://www.mged.org/Workgroups/MIAME/miame_checklist.html
Publication Requirement?
… also being adopted by Cell and The Lancet; others to follow…
Publishing responsibilities
• Pre-publication
– Provide the data and full annotation to the reviewers and editors.
– This may evolve to sending data to a repository prior to publication (reviewer anonymity)
• Post-publication
– For the foreseeable future, provide a static snapshot of the raw result data and filtered/clustered data along with the gene annotation at the time of publication
Implications of MIAME for Stanford Microarray Researchers
• As of December 1, 2002, anyone submitting a paper to a Nature journal must submit his/her data to a public microarray data repository (such as ArrayExpress).
• SMD users should start assembling and entering experimental data in preparation for more widespread acceptance of these standards.
MIAME checklist
Six parts
1. Biological Samples
2. Hybridizations
3. Data Normalization and Transformation
4. Experimental Design and Factors
5. Array Design
6. Measurements
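As a rough illustration, the six checklist parts can be treated as a completeness check on an experiment's annotation. The sketch below is not an SMD tool: the function name `missing_parts` and the `draft` dictionary are hypothetical, and "annotated" is assumed to mean simply that some non-empty text is recorded for a part.

```python
# The six checklist parts, in the order given on the slide.
MIAME_PARTS = [
    "Biological Samples",
    "Hybridizations",
    "Data Normalization and Transformation",
    "Experimental Design and Factors",
    "Array Design",
    "Measurements",
]


def missing_parts(annotation):
    """Return the checklist parts that have no (or empty) annotation."""
    return [part for part in MIAME_PARTS if not annotation.get(part)]


# Hypothetical, partially annotated experiment (illustration only).
draft = {
    "Biological Samples": "Strain and growth conditions described",
    "Hybridizations": "Hybridization protocol attached",
    "Array Design": "",  # present but empty, so it still counts as missing
}

print(missing_parts(draft))
```

A check of this kind only shows that something was entered for each part; whether the annotation is adequate remains a judgment for the experimenter and the reviewers.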
SMD Stores Procedures
• Biological Sample (Channels 1 and 2)
• Growth Conditions (Channels 1 and 2)
• Treatment (Channels 1 and 2)
• Extract Preparation (Channels 1 and 2)
• Chromatin IP
• Amplification (Channels 1 and 2)
• Labeling (Channels 1 and 2)
• Hybridization Conditions
• Scanning Procedure (Channels 1 and 2)
• Feature Extraction
• User-defined Procedures
Recording Procedural Details : Two Mechanisms
• Full text Protocols
– Great for providing the full documentation of the protocol to a fellow researcher, but…
– Poor for indicating which experimental parameter is the key to the experimental design
• Procedural parameters
– Great for supervised analysis and singling out the important details of the experiment, but…
– Poor for synthesizing the entire procedure together in a legible manner
Where are the tools?
Enter New Data
View Existing Data
List Existing Protocols
• Display within SMD, or View external resource
• Edit your protocol from the list
Edit Existing Protocol
Entering a New Protocol
• Choose the procedure
• Supply the formatted plain text, or a simple description if providing the URL
Flowchart to Add Annotations
Edit your hybridizations
Use “Edit” to add procedural details to your experiments
Experiment Types
• CGH
– Comparison of genomic copy number between samples (Comparative Genomic Hybridization).
• Chromatin IP
– Investigation of DNA-protein interactions in which protein-bound DNA is immunoprecipitated.
• Expression (Type I)
– Investigation of gene expression where the control sample is tailored to the particular experiment (not a common reference).
• Expression (Type II)
– Investigation of gene expression where the control RNA is made from a common reference.
• GMS
– Genome Mismatch Scanning. Investigation of the parental origin of genomic DNA.
Associating a protocol with a hybridization
• Associate a previously entered protocol
• Enter a new one, if need be
Adding Procedural Parameter Values for a Hybridization
• Same interface is used to add experimental parameter values
• Parameter values are linked directly to the hybridization
• Procedural parameters are modeled as experimental factors
Edit your hybridizations
Use “Edit” to add clinical annotation to your experiments
Associating Patient Information
• Patient parameters we store
– Age at diagnosis
– Sex
– Ethnicity
– Family History
– Status
– Time from Operation to Death
– Date of last follow-up
– Patient lost prior to follow-up?
Associating Clinical Sample Information
• Sample parameters we store
– Tracking Information
– Unique Sample ID
– Linking Database
– Sample Information
– Sample Source
– Time Post-mortem (hrs) of sample removal
– Sample State, Size
– Granularity
– Organ of origin
– Attending Surgeon
– Pre-Operative Information
– Prior Treatment
– Clinical Stage
– Post-Operative Information
– Tumor Grade, Size, Type
– Margins
– Time from Diagnosis To Operation
– Angioinvasion
– Total Lymph Nodes
– Positive Lymph Nodes
– Pathological Stages
– FollowUp Information
– Recurrence
– Post Operative Therapy
– Time from Operation to Recurrence
Batch Association of Annotations
Batch Entry
MIAME checklist : Data Normalization and Transformation
MIAME : Experimental Design
• Experimental Design and Factors
– type of experiment (set of hybridizations)
– the number of hybridizations performed
– experimental factors
– hybridization design
– the type of reference used for the hybridization
– quality control steps taken
Organizing Data: Arraylists vs Experiment Sets
• Arraylists
– Personal list of experiments
– Contains no annotation
– More difficult to share with others
– Flat file that exists in your loader account
– Accessed through Advanced Search
• Experiment Sets
– Annotated list of experiments
– Exists in the database, and is therefore dynamic (edit, delete, or annotate through a web interface)
– Easily shared with other users/collaborators
– Extensible
– Accessed through Basic Search
– Required for publication within SMD
Easily convert your arraylist into an experiment set
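To make the difference concrete, here is a minimal sketch of what such a conversion involves conceptually: a flat list of experiments becomes a named, described object that can carry annotation. This is not SMD's implementation. The arraylist format assumed here (one experiment ID per line, with `#` comment lines) and the `ExperimentSet` class are hypothetical; in SMD itself the conversion is done through the web interface.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentSet:
    """Hypothetical stand-in for an annotated experiment set."""
    name: str
    description: str  # for publications, e.g. the abstract or a figure legend
    experiment_ids: list = field(default_factory=list)
    factors: dict = field(default_factory=dict)  # factor name -> {experiment id: value}


def arraylist_to_set(text, name, description):
    """Build an ExperimentSet from flat arraylist text.

    Assumed format (not SMD's actual one): one experiment ID per line;
    blank lines, '#' comment lines, and repeated IDs are skipped.
    """
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line not in ids:
            ids.append(line)
    return ExperimentSet(name=name, description=description, experiment_ids=ids)


# Invented example content.
arraylist_text = """# hypothetical arraylist
12345
12346

12347
12345
"""

exp_set = arraylist_to_set(arraylist_text, "Example set", "Abstract or figure legend")
print(exp_set.experiment_ids)
```

The `factors` field starts out empty, which mirrors the point of the comparison above: the arraylist carries no annotation, so the annotation has to be added once the set exists.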
Selecting the data for inclusion within the experiment set
• Select experiments using either the basic or advanced search as a starting point
Experiment Set Creation
Experiment Set Organization
Base Annotation for the Experiment Set
– Set description
• For publications, this would likely be either the abstract or a figure legend
Finding Your Sets in SMD: Basic Search
Experiment Sets allow you to search data on pre-defined experiment groups.
Edit your Experiment Set
Experiment Factors : Step 1
Procedures
Parameters
Measurements?
Experiment Factors : Step 2
These values can be automatically acquired/suggested from your procedural parameter values, but only if you have annotated your experiments.
Note: full-text protocols cannot be used for this purpose; their role is to document the complete procedure.
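One way to picture how factor values could be suggested from procedural parameter values is sketched below. It is an assumption, not a description of SMD's logic: the rule used here is that a parameter whose value differs across the hybridizations in a set is a candidate experimental factor, while a parameter held constant is not. The hybridization names, parameter names, and values are all invented.

```python
from collections import defaultdict

# Hypothetical annotations: hybridization -> {procedural parameter: value}.
hybridizations = {
    "hyb_01": {"Treatment: drug": "none", "Treatment: time (h)": "0", "Labeling: dye": "Cy5"},
    "hyb_02": {"Treatment: drug": "rapamycin", "Treatment: time (h)": "2", "Labeling: dye": "Cy5"},
    "hyb_03": {"Treatment: drug": "rapamycin", "Treatment: time (h)": "4", "Labeling: dye": "Cy5"},
}


def suggest_factors(annotations):
    """Suggest experimental factors from procedural parameter values.

    Assumed rule: a parameter that takes more than one distinct value
    across the set is a candidate factor. Returns a mapping of
    parameter name -> sorted list of the values observed.
    """
    observed = defaultdict(set)
    for params in annotations.values():
        for name, value in params.items():
            observed[name].add(value)
    return {name: sorted(vals) for name, vals in observed.items() if len(vals) > 1}


print(suggest_factors(hybridizations))
```

In this invented set the labeling dye is constant, so only the drug and the treatment time are suggested. A hybridization with no parameter values contributes nothing, which is why suggestions are only possible for annotated experiments.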
Benefits of Experiment Annotation
• Meet MIAME requirements
• Meet publishing requirements (see above)
• Serve as a basis for new analysis tools
Post-publication responsibilities
• Making your data easily available and accessible for the foreseeable future
– SMD
– web supplement
– public repositories
Post-publication : SMD
• Send us the name of your MIAME-annotated experiment set
• We’ll make the arrays world-viewable for you, and publicize your paper
• Gene annotations and normalizations may change, so you must also provide a distinct, static view (web supplement)
Contact [email protected]
Post-publication : web supplement
• We encourage you to make a web supplement, which represents a snapshot of the data, as published
• Options:
1. You can make the website and host it on your own.
2. You can make the website on your own and ask us to host it.
3. You can ask us to construct one for you. Given the amount of work this entails, ask us ahead of time; the curator creating the website will usually expect collaborative consideration.
Contact [email protected]
Post-publication : repositories
– Submit your data to a public repository
• ArrayExpress at the EBI
– http://www.ebi.ac.uk/arrayexpress/
• Gene Expression Omnibus (GEO) at NCBI
– http://www.ncbi.nlm.nih.gov/geo/
– We produce valid MAGE-ML for experiment sets and array designs and can communicate these to the repositories for you
Contact [email protected]
If you require assistance with either the creation of a web supplement or submission of your dataset to a repository, contact us at [email protected]
MIAME Resources
• MIAME working group
– http://www.mged.org/miame
• MIAME checklist for authors, editors
– http://www.mged.org/miame/miame_checklist.html
SMD: Getting Help
• Click on the “Help” menu
– Tool-specific links will be listed at the top.
• Use the SMD help index to look for specific subjects
• Send e-mail to: [email protected]
SMD: Office Hours
• Grant building, S201
• Mondays 1-3 pm
• Wednesdays 2-4 pm
SMD Staff
Catherine Ball, Director
Gavin Sherlock, Co-Investigator
Farrell Wymore, Lead Programmer
Patrick Brown, Co-Investigator
Zac Zachariah, Systems Administrator
Janos Demeter, Computational Biologist
Don Maier, Senior Software Engineer
Michael Nitzberg, Database Administrator
Catherine Beauheim, Scientific Programmer
Heng Jin, Scientific Programmer
Takashi Kido, Visiting Scholar