BioPAX Biological Pathways Data Exchange www.biopaxwiki.org Joanne Luciano, PhD University of Manchester, Harvard Medical School BioPathways Consortium, BioPAX Group, Predictive Medicine, Inc. 25 Jan 2006 Cambridge, MA USA.

BioPAX
Biological Pathways
Data Exchange
www.biopaxwiki.org
Joanne Luciano, PhD
University of Manchester, Harvard Medical School
BioPathways Consortium, BioPAX Group,
Predictive Medicine, Inc.
25 Jan 2006
Cambridge, MA
USA
Pathway Data
Why does HCLS care?
(where we fit)
Pathway Research has Broad Impact
– Drug Discovery (pathway of target, safety)
– Basic Science (identify pathways)
– Disease Research (cancer pathways, diabetes, malaria)
– Environmental Research (microbial research)
Combine knowledge from multiple sources
– Whole is greater than the sum of its parts
– Biological knowledge is fragmented and isolated
– Need a database to manage resources
6-Nov-15
What is a Pathway?
Depends on who you ask!
– Metabolic Pathways (e.g. glycolysis)
– Molecular Interaction Networks (e.g. protein-protein interactions)
– Signaling Pathways (e.g. apoptosis)
– Gene Regulatory Networks (e.g. TFs in E. coli)
High Throughput Experimental Methods
[Figure: microarray, two-hybrid, mass spectrometry and genetics experiments, together with the existing literature, produce expression, interaction, function and protein-modification data, which end up spread across multiple pathway databases: an integration nightmare!]
Slide from Gary Bader
Pathway Databases
So many pathway databases, each with its own data model, format, and data access methods, and with its own internal inconsistencies.
More than 200 and growing
Source: Pathway Resource List (http://cbio.mskcc.org/prl/)
Slide from Mike Cary
BioPAX Closes Gaps in Pathway Data Space
[Figure: exchange languages versus pathway domains, each ranging from low to high detail. Database exchange formats: BioPAX, PSI-MI 2. Simulation model exchange formats: SBML, CellML (high detail, rate formulas). Domains: genetic interactions; molecular interaction networks (Pro:Pro); regulatory pathways (TF:Gene, including non-molecular); biochemical reactions (All:All); metabolic pathways (small molecules).]
Slide from Gary Bader
Research Community Need
Distributed Pathway Databases → Integrated Pathway Database
– Metabolic: WIT, BioCyc, Reactome, aMAZE, KEGG
– Molecular Interaction: BIND, DIP, HPRD, MINT, IntAct (PSI format)
– Cell Signaling: CSNDB, TRANSPATH
– Gene Regulatory Networks: TRANSFAC
– Also: INOH, PubGene, GeneWays
One Interface
Without BioPAX: >200 DBs and tools, each with its own format, so every application, database and user needs its own converters.
With BioPAX: one converter per data source or tool.
A common “computable semantic” enables scientific discovery.
Slide from Gary Bader (adapted)
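The slide's point is about scaling. A back-of-the-envelope sketch, assuming that without a shared format every source needs a direct translator to every other source (one per direction), and that with BioPAX each source or tool needs exactly one converter to and from the shared format. The function names are hypothetical.

```python
# Back-of-the-envelope count of converters needed to connect N pathway
# formats, with and without a shared exchange format (hypothetical helpers).


def pairwise_converters(n_formats: int) -> int:
    """No common format: one converter per ordered pair of formats
    (A->B and B->A are assumed to be separate programs)."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats: int) -> int:
    """Common exchange format such as BioPAX: one converter per source
    or tool, translating to and from the shared format."""
    return n_formats


if __name__ == "__main__":
    n = 200  # "more than 200 DBs and tools", rounded down
    print(pairwise_converters(n))  # 39800
    print(hub_converters(n))       # 200
```

The pairwise count grows quadratically while the hub count grows linearly, which is the argument for agreeing on one exchange format.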
Design Goals
Encapsulation
– An entire pathway in one record
Compatible
– Use existing standards wherever possible
Computable
– From file reading to logical inference
Successful
– Buy-in from the research community
Why OWL DL?
• Expressivity (biology = “complex relationships”)
• W3C standard (use existing and upcoming standards): “Semantic Web enabled”
• OWL has representations in RDF and XML (XML is the exchange language)
• Machine computable: enables full reasoning capability, from file reading to logical inference
– facilitates integration of knowledge, data, and tool development
– uncovers inconsistencies and new knowledge
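One of the simplest inferences an OWL reasoner makes is subclass transitivity: an instance asserted to be of one class also belongs to all of that class's superclasses. A minimal sketch of just that step, over an abridged fragment of the BioPAX Level 1 class hierarchy written out by hand. Check the class names against the released OWL file. Reasoners such as Pellet, Racer or FaCT++ do far more than this.

```python
# Minimal sketch of one OWL inference: the transitive closure of
# rdfs:subClassOf. The hierarchy is an abridged, hand-written fragment of
# BioPAX Level 1, for illustration only.
from typing import Dict, List, Set

SUBCLASS_OF: Dict[str, List[str]] = {
    "pathway": ["entity"],
    "interaction": ["entity"],
    "physicalEntity": ["entity"],
    "physicalInteraction": ["interaction"],
    "conversion": ["physicalInteraction"],
    "control": ["physicalInteraction"],
    "catalysis": ["control"],
    "biochemicalReaction": ["conversion"],
    "complexAssembly": ["conversion"],
    "transport": ["conversion"],
    "protein": ["physicalEntity"],
    "smallMolecule": ["physicalEntity"],
    "complex": ["physicalEntity"],
}


def superclasses(cls: str) -> Set[str]:
    """Every class that `cls` is a direct or indirect subclass of."""
    seen: Set[str] = set()
    stack = list(SUBCLASS_OF.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, []))
    return seen


def infer_types(asserted_type: str) -> Set[str]:
    """All types of an instance, given the one type asserted in the file."""
    return {asserted_type} | superclasses(asserted_type)


if __name__ == "__main__":
    # Typed only as complexAssembly, the instance is also an interaction.
    print(sorted(infer_types("complexAssembly")))
    # ['complexAssembly', 'conversion', 'entity', 'interaction', 'physicalInteraction']
```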
Different representations of the same pathway
<!ELEMENT reaction (substrate*,product*)>
<!ATTLIST reaction name %keggid.type; #REQUIRED>
<!ATTLIST reaction type %reactiontype.type; #REQUIRED>
<!ELEMENT substrate EMPTY>
<!ATTLIST substrate name %keggid.type; #REQUIRED>
<!ELEMENT product EMPTY>
<!ATTLIST product name %keggid.type; #REQUIRED>
Starts at α-D-glucose 1-phosphate
KEGG Reference Pathway GLYCOLYSIS
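To make the DTD concrete, here is a sketch that reads reaction records shaped like the fragment above, using only Python's standard library. The sample document and all its identifiers are hypothetical placeholders, not real KEGG entries. The `pathway` root element is also an assumption, because the DTD fragment does not show the enclosing element.

```python
# Sketch: reading <reaction>/<substrate>/<product> records shaped like the
# DTD fragment above. Sample content and identifiers are hypothetical.
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

SAMPLE = """
<pathway>
  <reaction name="rn:EXAMPLE_R1" type="reversible">
    <substrate name="cpd:EXAMPLE_C1"/>
    <product name="cpd:EXAMPLE_C2"/>
  </reaction>
</pathway>
"""


class Reaction(NamedTuple):
    name: str
    type: str
    substrates: List[str]
    products: List[str]


def parse_reactions(xml_text: str) -> List[Reaction]:
    """Collect every <reaction> with its substrate and product names."""
    root = ET.fromstring(xml_text.strip())
    return [
        Reaction(
            name=rxn.get("name"),  # #REQUIRED per the DTD
            type=rxn.get("type"),  # #REQUIRED per the DTD
            substrates=[s.get("name") for s in rxn.findall("substrate")],
            products=[p.get("name") for p in rxn.findall("product")],
        )
        for rxn in root.iter("reaction")
    ]


if __name__ == "__main__":
    for r in parse_reactions(SAMPLE):
        print(r.name, r.substrates, "->", r.products)
```

This representation names reactants only by identifier. Any other database needs its own, different reader for the same reactions.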
Different representations of the same pathway
reactions.dat
This file lists all chemical reactions in the PGDB (Pathway/Genome Database).
Starts at β-D-glucose 6-phosphate
Attributes: UNIQUE-ID, TYPES, COMMON-NAME, ACTIVATORS, BASAL-TRANSCRIPTION-VALUE, DBLINKS, DELTAG0, DEPRESSORS, EC-LIST, EC-NUMBER, ENZYMATIC-REACTION, EQUILIBRIUM-CONSTANT, IN-PATHWAY, INHIBITORS, LEFT, MOVED-IN, MOVED-OUT, OFFICIAL-EC?, REACTANTS, REQUIREMENTS, RIGHT, SIGNAL, SPECIES, SPONTANEOUS?, STIMULATORS, SYNONYMS
BioCyc Reference Pathway GLYCOLYSIS
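By contrast with the KEGG XML, the BioCyc file is a flat attribute-value list. A sketch of a reader, assuming the usual Pathway Tools flat-file layout: one `ATTRIBUTE - value` pair per line, `//` ending each record, `#` starting comment lines, and a leading `/` marking a continuation of the previous value. Verify this layout against the actual file. The sample record is a hypothetical placeholder.

```python
# Sketch of a reader for an attribute-value flat file such as reactions.dat.
# ASSUMED layout: "ATTRIBUTE - value" lines, "//" ends a record, "#" starts a
# comment, a leading "/" continues the previous value. Sample is hypothetical.
from collections import defaultdict
from typing import Dict, List

SAMPLE = """# comment lines are skipped
UNIQUE-ID - EXAMPLE-RXN-1
TYPES - EXAMPLE-CLASS
LEFT - EXAMPLE-CPD-A
LEFT - EXAMPLE-CPD-B
RIGHT - EXAMPLE-CPD-C
COMMENT - a value that is
/continued on the next line
//
"""

Record = Dict[str, List[str]]


def parse_dat(text: str) -> List[Record]:
    records: List[Record] = []
    current = defaultdict(list)  # attributes such as LEFT may repeat
    last_attr = None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        if line == "//":  # end of record
            records.append(dict(current))
            current, last_attr = defaultdict(list), None
        elif line.startswith("/") and last_attr:  # continuation line
            current[last_attr][-1] += " " + line[1:]
        elif " - " in line:
            attr, value = line.split(" - ", 1)
            current[attr].append(value)
            last_attr = attr
    if current:  # tolerate a final record without a closing "//"
        records.append(dict(current))
    return records
```

Each attribute maps to a list because attributes such as LEFT and RIGHT legitimately appear several times in one record.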
BioPAX uses other ontologies
• Use pointers to existing ontologies to provide supplemental annotation where appropriate
– Cellular location → GO Component
– Cell type → Cell.obo
– Organism → NCBI taxon DB
• Incorporate other standards where appropriate
– Chemical structure → SMILES, CML, InChI
BioPAX Ontology: Overview
– Pathway: a set of interactions & parts
– Physical entity: the parts
– Interaction: how the parts are known to interact
Level 1 v1.0 (July 7th, 2004)
Slide from Gary Bader (adapted)
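The three top-level notions can be pictured as plain data structures. The dataclasses below are a hypothetical sketch for intuition only. Their field names are simplified and are not BioPAX property names, and the sketch is not an official BioPAX API.

```python
# Hypothetical sketch of the three top-level BioPAX notions as dataclasses.
# Field names are simplified and are NOT the BioPAX property names.
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PhysicalEntity:
    """A part, e.g. a protein or a small molecule."""
    name: str
    kind: str  # e.g. "protein", "smallMolecule"


@dataclass
class Interaction:
    """How the parts are known to interact."""
    name: str
    participants: List[PhysicalEntity] = field(default_factory=list)


@dataclass
class Pathway:
    """A set of interactions and, through them, the parts involved."""
    name: str
    components: List[Interaction] = field(default_factory=list)

    def parts(self) -> List[PhysicalEntity]:
        """Distinct participants across all interactions, first-seen order."""
        seen: List[PhysicalEntity] = []
        for interaction in self.components:
            for p in interaction.participants:
                if p not in seen:
                    seen.append(p)
        return seen


if __name__ == "__main__":
    pyruvate = PhysicalEntity("pyruvate", "smallMolecule")
    pdha = PhysicalEntity("PdhA", "protein")
    pw = Pathway("example pathway", [
        Interaction("example interaction 1", [pyruvate, pdha]),
        Interaction("example interaction 2", [pyruvate]),
    ])
    print([p.name for p in pw.parts()])
```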
OWL
(semantics)
Instances
(data)
SBML annotated with BioPAX
<sbml xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <listOfSpecies>
    <!-- species is protein -->
    <species id="PdhA" metaid="PdhA">
      <annotation>
        <!-- protein is PdhA -->
        <bp:protein rdf:about="#PdhA"/>
      </annotation>
    </species>
    <!-- species is small molecule; "+" is not legal in an SBML id or metaid -->
    <species id="NADP" metaid="NADP" name="NADP+">
      <annotation>
        <!-- small molecule is NADP+ -->
        <bp:smallMolecule rdf:about="#NADP"/>
      </annotation>
    </species>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="pyruvate_dehydrogenase_cplx" metaid="pyruvate_dehydrogenase_cplx">
      <annotation>
        <bp:complexAssembly rdf:about="#pyruvate_dehydrogenase_cplx"/>
      </annotation>
    </reaction>
  </listOfReactions>
</sbml>
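Because the BioPAX terms sit inside ordinary SBML annotations, a generic XML parser can recover them. A sketch that maps each SBML species id to the BioPAX class used to annotate it. The sample is a trimmed, well-formed variant of a BioPAX-annotated SBML fragment. It uses `NADP` as the id because `+` is not allowed in an SBML id. The function name is hypothetical.

```python
# Sketch: recover the BioPAX class attached to each SBML species.
# Sample is a trimmed, well-formed SBML fragment; biopax_types is hypothetical.
import xml.etree.ElementTree as ET
from typing import Dict

BP = "http://www.biopax.org/release1/biopax-release1.owl"

SAMPLE = f"""<sbml xmlns:bp="{BP}"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation><bp:protein rdf:about="#PdhA"/></annotation>
    </species>
    <species id="NADP" metaid="NADP" name="NADP+">
      <annotation><bp:smallMolecule rdf:about="#NADP"/></annotation>
    </species>
  </listOfSpecies>
</sbml>"""


def biopax_types(sbml_text: str) -> Dict[str, str]:
    """Map SBML species id -> local name of its BioPAX annotation class."""
    root = ET.fromstring(sbml_text)
    prefix = "{" + BP + "}"  # ElementTree spells namespaced tags {uri}local
    result: Dict[str, str] = {}
    for species in root.iter("species"):
        annotation = species.find("annotation")
        if annotation is None:
            continue
        for child in annotation:
            if child.tag.startswith(prefix):
                result[species.get("id")] = child.tag[len(prefix):]
    return result


if __name__ == "__main__":
    print(biopax_types(SAMPLE))
```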
BioPAX: External References
<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <bp:smallMolecule rdf:about="#pyruvate">
      <bp:XREF>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:XREF>
    </bp:smallMolecule>
  </annotation>
</species>
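A unification cross-reference turns “is this the same molecule?” into a comparison of (database, identifier) pairs. A sketch that extracts those pairs from an annotation like the one above. It assumes that database names and LIGAND compound IDs can be compared case-insensitively, which is why both are upper-cased. The function name is hypothetical.

```python
# Sketch: extract (database, identifier) pairs from unificationXref elements.
# ASSUMES case-insensitive DB names and IDs; unification_xrefs is hypothetical.
import xml.etree.ElementTree as ET
from typing import List, Tuple

BP = "http://www.biopax.org/release1/biopax-release1.owl"
NS = {"bp": BP}

SAMPLE = f"""<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="{BP}"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <bp:smallMolecule rdf:about="#pyruvate">
      <bp:XREF>
        <bp:unificationXref rdf:ID="unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:XREF>
    </bp:smallMolecule>
  </annotation>
</species>"""


def unification_xrefs(xml_text: str) -> List[Tuple[str, str]]:
    """Normalized (DB, ID) pairs for every unificationXref in the document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for xref in root.iterfind(".//bp:unificationXref", NS):
        db = xref.findtext("bp:DB", default="", namespaces=NS).strip().upper()
        ident = xref.findtext("bp:ID", default="", namespaces=NS).strip().upper()
        pairs.append((db, ident))
    return pairs


if __name__ == "__main__":
    print(unification_xrefs(SAMPLE))
```

Two records from different sources that yield the same pair can then be treated as the same entity.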
BioPAX: Synonyms
<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
              xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <bp:smallMolecule rdf:about="#pyruvate">
      <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:SYNONYMS>BTS</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>
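Synonyms let a tool resolve the many names of one molecule to a single entry. A sketch that builds a lookup table from an annotation like the one above. It assumes case-folding is safe for these names, which can conflate distinct names in real data. The function name is hypothetical.

```python
# Sketch: build a synonym -> canonical-name index from BioPAX SYNONYMS.
# ASSUMES case-folding is safe for these names; synonym_index is hypothetical.
import xml.etree.ElementTree as ET
from typing import Dict

BP = "http://www.biopax.org/release1/biopax-release1.owl"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"bp": BP}

SAMPLE = f"""<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="{BP}" xmlns:rdf="{RDF}">
    <bp:smallMolecule rdf:about="#pyruvate">
      <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:SYNONYMS>BTS</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>"""


def synonym_index(xml_text: str) -> Dict[str, str]:
    """Map each lower-cased name or synonym to the molecule's canonical name."""
    root = ET.fromstring(xml_text)
    index: Dict[str, str] = {}
    for mol in root.iterfind(".//bp:smallMolecule", NS):
        canonical = mol.get(f"{{{RDF}}}about", "").lstrip("#")
        index[canonical.lower()] = canonical
        for syn in mol.iterfind("bp:SYNONYMS", NS):
            if syn.text:
                index[syn.text.strip().lower()] = canonical
    return index


if __name__ == "__main__":
    idx = synonym_index(SAMPLE)
    print(idx["pyruvic acid"], idx["bts"])
```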
Tools
Protégé Ontology Editor
GKB Editor (SRI)
SWOOP
Pellet
Racer
FaCT++
Pathway Tools
EditPlus (text editor)
Want more? See Jeremy & Alan
Overlap?
Integration
– Combine sources in a meaningful way
Identity
– Recognize the same things in different contexts and under different names
Composition
– Reusable representations of composite pathway components
• to help us manage, query, and reference them
Exchange
– Agreement on:
• What is to be exchanged
• How to represent it
• How to interpret it
Want more? See Alan, Jeremy, me
Hype graph
[Figure: Gartner hype graph from Carole Goble (ISWC2005), placing Gene Ontology, Microarray Gene Expression Database, BioDASH, BioPAX, UniProt and the Corporate Semantic Web along the curve.]
BioDASH: Bridging Chemistry and Molecular Biology
• Different views have different semantics: lenses
• When there is a correspondence between objects, a semantic binding is possible
Example object: UniProt:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
Slide from Eric Neumann and Dennis Quan
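The correspondence rule is in effect a join on shared cross-reference LSIDs. A sketch of that join, assuming each object is represented by its id and a list of xref LSID strings. The object ids and LSID strings in the usage note are illustrative, not taken from BioDASH.

```python
# Sketch of the correspondence rule as a join on shared xref LSIDs:
#   if ?target.xref.lsid == ?bpx:prot.xref.lsid
#   then ?target correspondsTo ?bpx:prot
# ASSUMED input shape: {object_id: [xref LSID strings]}.
from typing import Dict, List, Tuple


def correspondences(
    targets: Dict[str, List[str]],
    proteins: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """All (target, protein) pairs that share at least one xref LSID."""
    by_lsid: Dict[str, List[str]] = {}
    for prot_id, lsids in proteins.items():
        for lsid in lsids:
            by_lsid.setdefault(lsid, []).append(prot_id)
    pairs = set()
    for target_id, lsids in targets.items():
        for lsid in lsids:
            for prot_id in by_lsid.get(lsid, []):
                pairs.add((target_id, prot_id))
    return sorted(pairs)
```

For example, a target with the xref `urn:lsid:uniprot.org:uniprot:P49841` and a BioPAX protein carrying the same xref yield one `(target, protein)` pair. Indexing the proteins by LSID first avoids comparing every target with every protein.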
Seamark Demonstration: Identification of new drug candidates
1. Differentiate different forms of disease
2. Identify patient subgroups
3. Identify top biomarkers
4. Identify function
5. Identify biological and chemical properties and disease associations of biomarker
6. Identify documents
7. Identify role in metabolic pathways
8. Identify compounds that interact
9. Identify and compare function in other organisms
10. Identify any prior art
[Figure: linked data sources ProbeSet.rdf, GO.rdf, GO2OMIM.rdf, OMIM.rdf, GO2Enzyme.rdf, Enzymes.rdf, KEGG.rdf, GO2Keyword.rdf, Keywords.rdf, GO2UniProt.rdf, UniProt.rdf, IntAct.rdf, Taxonomy.rdf and PubMed.xml, connecting Probe, Gene, MIM Id, Enzyme, Compound, Pathway, Keyword, Protein, Organism and Citation.]
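Each step in this demonstration amounts to hopping from one linked data set to the next. A toy sketch of step 7 (probe → gene → GO term → enzyme → metabolic pathway) as a walk over triples, similar in spirit to a SPARQL property path. All identifiers and predicate names are made up for illustration and do not come from the Seamark files.

```python
# Toy sketch: follow a chain of predicates across linked triples.
# All identifiers and predicate names are hypothetical stand-ins for
# ProbeSet.rdf, GO2Enzyme.rdf, KEGG.rdf, ...
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("probe:P1", "measures", "gene:G1"),
    ("gene:G1", "annotatedWith", "go:T1"),
    ("go:T1", "mapsToEnzyme", "enzyme:E1"),
    ("enzyme:E1", "inPathway", "pathway:PW1"),
    ("enzyme:E1", "inPathway", "pathway:PW2"),
]


def follow(triples: Iterable[Triple], start: Set[str], path: List[str]) -> Set[str]:
    """Nodes reachable from `start` by taking the predicates in `path` in order."""
    triples = list(triples)
    frontier = set(start)
    for predicate in path:
        frontier = {o for (s, p, o) in triples if p == predicate and s in frontier}
    return frontier


if __name__ == "__main__":
    chain = ["measures", "annotatedWith", "mapsToEnzyme", "inPathway"]
    print(sorted(follow(TRIPLES, {"probe:P1"}, chain)))
```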
BioPAX Supporting Groups
Groups
• Memorial Sloan-Kettering Cancer Center: G. Bader, M. Cary, J. Luciano, C. Sander
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• University of Colorado Health Sciences Center: I. Shah
• BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
• Argonne National Laboratory: N. Maltsev, E. Marland
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
• NIST: R. Goldberg
• Stanford: T. Klein
• Columbia: A. Rzhetsky
• Dana Farber Cancer Institute: J. Zucker
• Millennium Pharma: Alan Ruttenberg
• Science Commons: Jonathan Rees
Databases
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
• Reactome (www.reactome.org)
• PharmGKB (www.pharmgkb.org)
• KEGG
Grants
• Department of Energy (Workshop)
Collaborating Organizations:
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML)
• Chemical Markup Language (CML)
The BioPAX Community