Data Standards for Flow Cytometry

Ryan Brinkman
TFL, BC Cancer Research Centre
Why flow cytometry data standards?
• Increasing data throughput…
– 1,000 samples * 6 parameters / day
• Data (information) processing has to increase
– Slow
– Error prone
– Not standardized
– Most limiting aspect of technology
Our view of the solution
1. Standards
– Exchange data and analyses between software applications & researchers
– Allow for a flexible analysis pipeline
2. Consistent and thorough data annotation
– Which sample under what conditions?
3. High throughput QA/QC
– Early identification of problems essential in high throughput flow
Data Standards for Flow Cytometry
[Diagram: project components and the technologies behind them]
• Collaborative compendium web space (www.flowcyt.org)
• Translation, graphs, statistics (R/rflowcyt, Java)
• Experimental description checklist
• Descriptive vocabulary (OWL)
• Visualization
• Gating specification
• Object model (UML)
• Database schema (SQL)
• File formats (XML)
flowcyt.sf.net
Experimental Checklist
• Minimal information to describe a flow cytometry experiment
– Experimental overview
• Hypothesis, researcher
– Description of biomaterial
• Genus, species, contact details for sample
– Instrument settings
• Manufacturer, flow velocity, laser type, power
– Data collection and analysis
• FCS, sort gates, compensation
• Provide enough information to compare/reproduce experiments
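The checklist can be thought of as a record with required fields. The following is a minimal sketch in Python, assuming a flat record; the class name, the field names and the sample values are illustrative only and are not taken from the checklist itself.

```python
from dataclasses import dataclass, fields


@dataclass
class ExperimentChecklist:
    # Hypothetical field names that mirror the four checklist areas above.
    hypothesis: str = ""               # experimental overview
    researcher: str = ""               # experimental overview
    genus: str = ""                    # description of biomaterial
    species: str = ""                  # description of biomaterial
    instrument_manufacturer: str = ""  # instrument settings
    laser_type: str = ""               # instrument settings
    fcs_file: str = ""                 # data collection and analysis
    compensation: str = ""             # data collection and analysis


def missing_fields(record):
    """Return the names of checklist fields that were left empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Illustrative values only.
record = ExperimentChecklist(
    hypothesis="Treatment changes the lymphocyte fraction",
    researcher="A. Researcher",
    genus="Homo",
    species="sapiens",
    fcs_file="sample_001.fcs",
)
print(missing_fields(record))
```

A submission tool could refuse an experiment until `missing_fields` returns an empty list, which is one way to enforce "enough information to compare/reproduce experiments".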
Proposal for FCS4
• Focus FCS on data only
– no metadata
– list mode, uncompensated
• Focus on interoperability with a canonical approach
– Single data type
• External Data Representation (XDR)
• Single precision floating point & big endian
• No user defined keywords or segments
– $PAR (# of parameters)
– $PnN (short name for parameter)
– $PnR (range for each parameter)
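To make the single data type concrete, here is a minimal Python sketch of writing and reading list-mode events as big-endian single-precision floats, which is how XDR encodes a float. It covers only the value encoding; the file layout of the proposal is not described here, and the function names, parameter names and values below are assumptions for illustration.

```python
import struct


def encode_events(events):
    """Pack list-mode events as big-endian single-precision floats."""
    n_par = len(events[0])
    flat = [value for event in events for value in event]
    # ">" selects big-endian byte order, "f" a 4-byte IEEE 754 float.
    return n_par, struct.pack(">%df" % len(flat), *flat)


def decode_events(n_par, payload):
    """Unpack the payload back into one list of values per event."""
    count = len(payload) // 4
    flat = struct.unpack(">%df" % count, payload)
    return [list(flat[i:i + n_par]) for i in range(0, count, n_par)]


# The only keywords the proposal keeps; names and ranges are illustrative.
keywords = {"$PAR": 2, "$P1N": "FSC", "$P2N": "SSC", "$P1R": 1024, "$P2R": 1024}

# Two uncompensated events with two parameters each (illustrative values).
events = [[100.0, 250.5], [512.0, 3.25]]
n_par, payload = encode_events(events)
restored = decode_events(n_par, payload)
```

With one fixed encoding, a reader needs no per-file logic for byte order or data type, which is the interoperability argument made above.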
So where does all the other (meta)data go?
XML
• Gates
• Compensation
• Transformation
FCS
• Data
Not how to gate, but record what was gated
[Figure: Software Tool #1 and Software Tool #2 each report the same result, 42]
Gating-ML
• XML based description of gates
• Supported gate types
– Rectangular gates (n dimensions)
– Polygon gates (2 dimensions)
– Polytope gate (n dimensions, convex only)
– Ellipsoid gates (n dimensions)
– Decision trees
– Boolean collections of any of the types of gates
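The gate types listed above are all membership tests on an event. The sketch below, in Python, shows a rectangular gate, a polygon gate and a Boolean AND of the two. It illustrates the geometry only and is not the Gating-ML XML syntax; the function names, parameter names and event values are assumptions.

```python
def rectangle_gate(bounds):
    """bounds maps a parameter name to (low, high); any number of dimensions."""
    def inside(event):
        return all(low <= event[name] <= high
                   for name, (low, high) in bounds.items())
    return inside


def polygon_gate(x_name, y_name, vertices):
    """Two-dimensional polygon gate using the even-odd (ray casting) rule."""
    def inside(event):
        x, y = event[x_name], event[y_name]
        hit = False
        n = len(vertices)
        for i in range(n):
            x1, y1 = vertices[i]
            x2, y2 = vertices[(i + 1) % n]
            # Only edges that straddle the horizontal line through y can cross.
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    hit = not hit
        return hit
    return inside


def and_gate(*gates):
    """Boolean collection: an event passes only if it passes every gate."""
    def inside(event):
        return all(gate(event) for gate in gates)
    return inside


# Illustrative events and gate coordinates.
events = [
    {"FSC": 200, "SSC": 100},
    {"FSC": 800, "SSC": 100},
    {"FSC": 200, "SSC": 900},
]
rect = rectangle_gate({"FSC": (0, 500)})
triangle = polygon_gate("FSC", "SSC", [(0, 0), (1000, 0), (0, 1000)])
combined = and_gate(rect, triangle)
selected = [event for event in events if combined(event)]
```

Because each gate is a plain description of a region, two tools that read the same description and the same data file should select the same events.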
Outstanding gating issues
• As defined, gates are meaningless without FCS files
– All gates are sort gates
• Data file + filter → Result (unique answer)
• No probabilistic descriptions
– No concept of a re-useable gate
• (e.g., lymphocyte)
Transformation-ML
• XML based description of parameter transformation
– Gated using different scales
– Data visualization issues
• Predefined transformations
– Linear, quadratic, log (base e, 10, or any other), hyperlog, bi-exponential, logicle, split-scale
• Support for universal transformation description (MathML)
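A few of the predefined transformations can be sketched as named, parameterized functions. The Python below covers only linear, quadratic and log; the function names, the registry and its keys are assumptions for illustration and are not Transformation-ML element names.

```python
import math


def linear(a, b):
    """x -> a*x + b"""
    return lambda x: a * x + b


def quadratic(a, b, c):
    """x -> a*x^2 + b*x + c"""
    return lambda x: a * x * x + b * x + c


def log_base(base):
    """x -> log of x in the given base; defined for x > 0 only."""
    return lambda x: math.log(x, base)


# Hypothetical registry: a transformation is recorded as a name plus its
# parameters, so that another tool can rebuild the same function.
REGISTRY = {"linear": linear, "quadratic": quadratic, "log": log_base}


def build(name, *params):
    """Look up a transformation by name and bind its parameters."""
    return REGISTRY[name](*params)


to_decades = build("log", 10)
rescaled = [to_decades(value) for value in (10.0, 100.0, 1000.0)]
```

Recording the name and parameters, rather than the transformed values, lets a gate drawn on a log scale be reapplied to the untransformed data by any tool that knows the same definitions.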
FlowRDF
• Resource Description Framework (RDF)
– W3C standardized methodology on how to provide metadata to virtually “anything”
– Based on RDF statements (triples): subject, predicate, object
– XML encoded (RDF/XML)
– Common reusable concepts
• (e.g., Dublin Core)
– Direct ontology links
• Web Ontology Language (OWL) builds on RDF
– Links immutable FCS files to metadata through Life Science Identifier (LSID)
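The subject, predicate, object structure can be illustrated without any RDF library. In this Python sketch, triples are plain tuples; the LSID and the literal values are hypothetical, while the predicate URIs come from the Dublin Core element set. It is an in-memory illustration of the triple model, not the FlowRDF vocabulary or RDF/XML.

```python
# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical LSID standing in for one immutable FCS file.
FCS_FILE = "urn:lsid:example.org:fcs:0001"

# Each statement is (subject, predicate, object); values are illustrative.
triples = [
    (FCS_FILE, DC + "creator", "A. Researcher"),
    (FCS_FILE, DC + "format", "FCS"),
    (FCS_FILE, DC + "description", "Illustrative sample"),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return the statements that agree with every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in statements
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


creators = [o for (_, _, o) in match(triples, FCS_FILE, DC + "creator")]
```

The FCS file itself is never modified: metadata is attached from the outside by making the file's identifier the subject of new statements.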
Ontology
A structured vocabulary describing relationships between things
• What is an ontology for?
– To allow reuse of information across multiple applications
– Aid researchers in the collection of metadata surrounding each flow cytometry experiment
– To conduct structured queries on elements of flow cytometry experiments
Coordinated ontology effort
• OBI (formerly FuGO)
– Ontology for Biomedical Investigations
– Creating a general standard in which to encode data for functional genomics experiments
[Diagram: OBI shown together with Hypotheses, Microarray, SNP, Biological Materials, Protocols, and Flow Cytometry (Cytometers, Gating)]
obi.sf.net
Object Model
• Focus of another talk by Josef Spidlen
Implementations: Java & R
Tools for data file manipulation and analysis
Reference implementation of standards
• Java
– File format translation
– Process XML files and output results
– Facilitate the exchange of experimental details
• R (rflowcyt)
– Analyze FCM data using the R statistical package
– Standard and novel visualizations of data
– Automated QA/QC
– Implemented as part of Bioconductor
Reproducible Research
[Figure: a document of placeholder text with embedded tags <Dataset>BrinkmanLab 123</Dataset>, <Gate>123.12</Gate> and <Select>I..II</Select>, shown alongside Database, XML and Analysis Tool]
Implications of FCM Standards
• Promote & reinforce open scientific inquiry
– Exchange of data & analyses
– Automated, traceable and flexible analyses
• Provide a dataset larger than any single lab's
– Mechanism for new discoveries
• Facilitate basic and clinical research
Acknowledgements
• NIH/NIBIB (EB5034)
• Michael Ochs
• Thomas Moloshok
• Robert Gentleman
• Clayton Smith
• Perry Haaland
• Adam Treister
• Josef Spidlen
• Nolwenn LeMeur
• ISAC
• IEEE