BioPAX: Biological Pathways Data Exchange
www.biopaxwiki.org
Joanne Luciano, PhD
University of Manchester, Harvard Medical School
BioPathways Consortium, BioPAX Group, Predictive Medicine, Inc.
25 Jan 2006, Cambridge, MA USA
Pathway Data: Why does HCLS care? (where we fit)
• Pathway research has broad impact
– Drug discovery (pathway of target, safety)
– Basic science (identify pathways)
– Disease research (cancer pathways, diabetes, malaria)
– Environmental research (microbial research)
• Combine knowledge from multiple sources
– The whole is greater than the sum of its parts
– Biological knowledge is fragmented and isolated
– A database is needed to manage these resources
6-Nov-15

What is a Pathway? Depends on who you ask!
[Figure: four kinds of "pathway": metabolic pathways (glycolysis), molecular interaction networks (protein-protein), signaling pathways (apoptosis), and gene regulatory networks (TFs in E. coli).]

High Throughput Experimental Methods
[Figure, slide from Gary Bader: microarray, two-hybrid, mass spectrometry, genetics, and the existing literature yield expression, interaction, function, and protein-modification data, which end up in multiple pathway databases: an integration nightmare!]

Pathway Databases
So many pathway databases, each with its own data model, format, and data access methods, and each with internal inconsistencies.
More than 200 and growing.
Source: Pathway Resource List (http://cbio.mskcc.org/prl/). Slide from Mike Cary.

Closes Gaps in Pathway Data Space
[Figure, slide from Gary Bader: exchange languages plotted against domain, from low to high detail. Database exchange formats: BioPAX, PSI-MI 2. Simulation model exchange formats: SBML, CellML. Domains: genetic interactions (non-molecular), molecular interactions and interaction networks (Pro:Pro, TF:Gene), regulatory pathways, metabolic pathways (small molecules), biochemical reactions (All:All), and rate formulas.]

Research Community Need
[Figure: pathway databases by type (metabolic, molecular interaction, cell signaling, gene regulatory networks), including WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, INOH, PubGene, and GeneWays; distributed pathway databases versus an integrated pathway database.]

One Interface
[Figure, slide from Gary Bader (adapted): applications, databases, and users without BioPAX versus with BioPAX; labels: "one converter per data source or tool", ">200 DBs and tools".]
With BioPAX, a common "computable semantic" enables scientific discovery.

Design Goals
• Encapsulation – an entire pathway in one record
• Compatible – use existing standards wherever possible
• Computable – from file reading to logical inference
• Successful – buy-in from the research community

Why OWL DL?
• Expressivity (biology = "complex relationships")
• W3C standard (use existing, and upcoming, standards)
• "Semantic Web enabled"
• OWL has representations in RDF and XML (XML is the exchange language)
• Machine computable
– Enables full reasoning capability, from file reading to logical inference
– Facilitates integration of knowledge and data, and tool development
– Uncovers inconsistencies and new knowledge

Different representations of the same pathways
    <!ELEMENT reaction (substrate*,product*)>
    <!ATTLIST reaction name %keggid.type; #REQUIRED>
    <!ATTLIST reaction type %reactiontype.type; #REQUIRED>
    <!ELEMENT substrate EMPTY>
    <!ATTLIST substrate name %keggid.type; #REQUIRED>
    <!ELEMENT product EMPTY>
    <!ATTLIST product name %keggid.type; #REQUIRED>
KEGG Reference Pathway: GLYCOLYSIS (starts at α-D-glucose 1-phosphate)

Different representations of the same pathways
reactions.dat: "This file lists all chemical reactions in the PGDB."
(Pathway diagram starts at β-D-glucose 6-phosphate.)
Attributes: UNIQUE-ID, TYPES, COMMON-NAME, ACTIVATORS, BASAL-TRANSCRIPTION-VALUE, DBLINKS, DELTAG0, DEPRESSORS, EC-LIST, EC-NUMBER, ENZYMATIC-REACTION, EQUILIBRIUM-CONSTANT, IN-PATHWAY, INHIBITORS, LEFT, MOVED-IN, MOVED-OUT, OFFICIAL-EC?, REACTANTS, REQUIREMENTS, RIGHT, SIGNAL, SPECIES, SPONTANEOUS?,
STIMULATORS, SYNONYMS
BioCyc Reference Pathway: GLYCOLYSIS

BioPAX uses other ontologies
• Use pointers to existing ontologies to provide supplemental annotation where appropriate
– Cellular location: GO Component
– Cell type: Cell.obo
– Organism: NCBI taxon DB
• Incorporate other standards where appropriate
– Chemical structure: SMILES, CML, InChI

BioPAX Ontology: Overview
[Figure, slide from Gary Bader (adapted), BioPAX Level 1 v1.0 (July 7th, 2004): a pathway is a set of interactions and parts; physical entities are the parts; interactions describe how the parts are known to interact.]

[Figure: OWL (semantics) alongside instances (data).]

SBML annotated with BioPAX
    <sbml xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
          xmlns:owl="http://www.w3.org/2002/07/owl#"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <listOfSpecies>
        <species id="PdhA" metaid="PdhA">          <!-- species is protein -->
          <annotation>
            <bp:protein rdf:ID="#PdhA"/>           <!-- protein is PdhA -->
          </annotation>
        </species>
        <species id="NADP+" metaid="NADP+">        <!-- species is small molecule -->
          <annotation>
            <bp:smallMolecule rdf:ID="#NADP+"/>    <!-- small molecule is NADP+ -->
          </annotation>
        </species>
      </listOfSpecies>
      <listOfReactions>
        <reaction id="pyruvate_dehydrogenase_cplx">
          <annotation>
            <bp:complexAssembly rdf:ID="#pyruvate_dehydrogenase_cplx"/>
          </annotation>
        </reaction>
      </listOfReactions>
    </sbml>

BioPAX: External References
    <species id="pyruvate" metaid="pyruvate">
      <annotation xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl">
        <bp:smallMolecule rdf:ID="#pyruvate">
          <bp:Xref>
            <bp:unificationXref rdf:ID="#unificationXref119">
              <bp:DB>LIGAND</bp:DB>
              <bp:ID>c00022</bp:ID>
            </bp:unificationXref>
          </bp:Xref>
        </bp:smallMolecule>
      </annotation>
    </species>

BioPAX: Synonyms
    <species id="pyruvate" metaid="pyruvate">
      <annotation xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl">
        <bp:smallMolecule rdf:ID="#pyruvate">
          <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
          <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
          <bp:SYNONYMS>BTS</bp:SYNONYMS>
          <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
        </bp:smallMolecule>
      </annotation>
    </species>

Tools
Protégé Ontology Editor, GKB Editor (SRI), SWOOP, Pellet, Racer, FaCT++, Pathway Tools, EditPlus (text editor)
Want more? See Jeremy and Alan.

Overlap?
• Integration – combine sources in a meaningful way
• Identity – recognize the same things in different contexts and under different names
• Composition – re-usable representations of composite pathway components, to help us manage, query, and reference them
• Exchange – agreement on:
– what is to be exchanged
– how to represent it
– how to interpret it
Want more? See Alan, Jeremy, or me.

Hype graph from Carole Goble, ISWC2005
[Figure: Gartner hype graph placing Gene Ontology, Microarray Gene Expression Database, BioDASH, BioPAX, UniProt, and the Corporate Semantic Web along the curve.]

BioDASH: Bridging Chemistry and Molecular Biology
• Different views have different semantics: lenses
• When there is a correspondence between objects, a semantic binding is possible
Example: Uniprot:P49841
Apply correspondence rule:
    if ?target.xref.lsid == ?bpx:prot.xref.lsid then ?target.correspondsTo.?bpx:prot
Slide from Eric Neumann and Dennis Quan

Seamark Demonstration: Identification of new drug candidates
1. Differentiate different forms of disease
2. Identify patient subgroups
3. Identify top biomarkers
4. Identify function
5. Identify biological and chemical properties and disease associations of the biomarker
6. Identify documents
7. Identify role in metabolic pathways
8. Identify compounds that interact
9. Identify and compare function in other organisms
10. Identify any prior art
[Figure: data sources linked through these steps: ProbeSet.rdf, GO2OMIM.rdf, OMIM.rdf, GO.rdf, GO2Enzyme.rdf, Enzymes.rdf, KEGG.rdf, GO2Keyword.rdf, Keywords.rdf, GO2UniProt.rdf, IntAct.rdf, UniProt.rdf, Taxonomy.rdf, PubMed.xml; entities: Probe, Gene, MIM Id, Enzyme, Compound, Pathway, Keyword, Protein, Organism, Citation.]

BioPAX Supporting Groups
Groups:
• Memorial Sloan-Kettering Cancer Center: G. Bader, M. Cary, J. Luciano, C. Sander
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• University of Colorado Health Sciences Center: I. Shah
• BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
• Argonne National Laboratory: N. Maltsev, E. Marland
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
• NIST: R. Goldberg
• Stanford: T. Klein
• Columbia: A. Rzhetsky
• Dana Farber Cancer Institute: J. Zucker
• Millennium Pharma: Alan Ruttenberg
• Science Commons: Jonathan Rees
Databases:
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
• Reactome (www.reactome.org)
• PharmGKB (www.pharmgkb.org)
• KEGG
Grants:
• Department of Energy (Workshop)
Collaborating Organizations:
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML)
• Chemical Markup Language (CML)
The BioPAX Community
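The "One Interface" slide argues that a shared format cuts down on converters. A back-of-the-envelope count makes this concrete. The sketch below assumes a simplified model that the slides do not spell out: without a shared format, every pair of resources needs its own bidirectional converter, while with BioPAX each resource needs a single converter to and from the common format. The figure of 200 is the "more than 200" lower bound from the Pathway Databases slide.

```python
def pairwise_converters(n: int) -> int:
    """Bidirectional converters needed if every pair of n resources is linked directly."""
    return n * (n - 1) // 2


def hub_converters(n: int) -> int:
    """Converters needed if each of n resources maps once to a shared exchange format."""
    return n


n = 200  # lower bound: "more than 200" databases and tools
print(f"direct pairwise: {pairwise_converters(n)}, via shared format: {hub_converters(n)}")
# prints: direct pairwise: 19900, via shared format: 200
```

Under this model, the pairwise count grows quadratically while the hub count grows linearly. That growth gap is the point of the slide, whatever the exact number of resources.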
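To contrast the KEGG representation on the first "Different representations of the same pathways" slide with BioPAX, here is a minimal Python sketch. It reads reaction elements shaped by that DTD using the standard-library ElementTree. Several details are assumptions, not taken from the slide: the `pathway` root element, the sample document itself, and the KEGG reaction and compound identifiers, which are chosen only for illustration and should be checked against KEGG before reuse. Real KGML files also carry more attributes than the DTD fragment shows.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped by the DTD on the slide:
# reaction (substrate*, product*), each with a required name attribute.
SAMPLE = """
<pathway>
  <reaction name="rn:R00959" type="reversible">
    <substrate name="cpd:C00103"/>
    <product name="cpd:C00668"/>
  </reaction>
</pathway>
"""


def read_reactions(xml_text):
    """Return (reaction name, reaction type, substrate names, product names) tuples."""
    root = ET.fromstring(xml_text.strip())
    reactions = []
    for rxn in root.iter("reaction"):
        substrates = [s.get("name") for s in rxn.findall("substrate")]
        products = [p.get("name") for p in rxn.findall("product")]
        reactions.append((rxn.get("name"), rxn.get("type"), substrates, products))
    return reactions


for name, kind, subs, prods in read_reactions(SAMPLE):
    print(name, kind, subs, "->", prods)
```

In this format, a reaction knows only the names of its participants. Everything else about a compound has to be looked up elsewhere by its KEGG identifier.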
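The BioCyc side of the same comparison is reactions.dat, an attribute-value flat file. The sketch below assumes the layout commonly used by Pathway Tools attribute-value files: one `ATTRIBUTE - value` pair per line, attributes that may repeat, `#` for comment lines, `//` to end a record, and continuation lines beginning with `/`. The sample record and its frame identifiers are hypothetical, and the parser is only a sketch, not a Pathway Tools API.

```python
from collections import defaultdict

# Hypothetical excerpt in the attribute-value layout (not copied from BioCyc).
SAMPLE = """\
# comment lines start with '#'
UNIQUE-ID - PGLUCISOM-RXN
TYPES - Small-Molecule-Reactions
LEFT - GLC-6-P
RIGHT - FRUCTOSE-6P
EC-NUMBER - EC-5.3.1.9
IN-PATHWAY - GLYCOLYSIS
//
"""


def parse_attribute_values(text):
    """Parse 'ATTRIBUTE - value' records separated by '//' lines.

    Each record becomes a dict mapping attribute -> list of values,
    because attributes such as LEFT or IN-PATHWAY may repeat."""
    records, current, last_attr = [], defaultdict(list), None
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        if line.rstrip() == "//":
            if current:
                records.append(dict(current))  # end of record
            current, last_attr = defaultdict(list), None
            continue
        if line.startswith("/") and last_attr is not None:
            # continuation line: extend the previous value
            current[last_attr][-1] += "\n" + line[1:]
            continue
        attr, sep, value = line.partition(" - ")
        if sep:
            current[attr].append(value)
            last_attr = attr
    if current:
        records.append(dict(current))  # tolerate a missing final '//'
    return records


for record in parse_attribute_values(SAMPLE):
    print(record["UNIQUE-ID"], record["LEFT"], "->", record["RIGHT"])
```

Storing every value as a list keeps repeated attributes such as LEFT intact. In the KEGG DTD, the same information appears as repeated substrate and product elements instead.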
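The SBML annotation slides attach a BioPAX class, cross-references, and synonyms to each SBML species. As a rough sketch, the standard-library ElementTree can pull those annotations back out; this is not an official SBML or BioPAX API. The namespace URI is copied from the "SBML annotated with BioPAX" slide. The sample merges fragments from the slides under a root element that declares the `bp` and `rdf` prefixes. Species elements are unqualified, as on the slides. Real SBML files declare a default SBML namespace, which this sketch would need to account for. ElementTree checks only XML well-formedness and does not validate the RDF.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as written on the slides.
BP = "http://www.biopax.org/release1/biopax-release1.owl"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = f"""<sbml xmlns:bp="{BP}" xmlns:rdf="{RDF}">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation><bp:protein rdf:ID="#PdhA"/></annotation>
    </species>
    <species id="pyruvate" metaid="pyruvate">
      <annotation>
        <bp:smallMolecule rdf:ID="#pyruvate">
          <bp:Xref>
            <bp:unificationXref rdf:ID="#unificationXref119">
              <bp:DB>LIGAND</bp:DB>
              <bp:ID>c00022</bp:ID>
            </bp:unificationXref>
          </bp:Xref>
          <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
          <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
        </bp:smallMolecule>
      </annotation>
    </species>
  </listOfSpecies>
</sbml>"""


def q(local):
    """Qualify a local name with the BioPAX namespace, ElementTree style."""
    return f"{{{BP}}}{local}"


def biopax_annotations(xml_text):
    """Map each SBML species id to its BioPAX class, xrefs, and synonyms."""
    root = ET.fromstring(xml_text)
    result = {}
    for species in root.iter("species"):
        annotation = species.find("annotation")
        if annotation is None:
            continue
        for entity in annotation:
            if not entity.tag.startswith("{" + BP + "}"):
                continue  # ignore non-BioPAX annotation content
            result[species.get("id")] = {
                "class": entity.tag.split("}", 1)[1],
                "xrefs": [(x.findtext(q("DB")), x.findtext(q("ID")))
                          for x in entity.iter(q("unificationXref"))],
                "synonyms": [s.text for s in entity.findall(q("SYNONYMS"))],
            }
    return result


for species_id, info in biopax_annotations(SAMPLE).items():
    print(species_id, info)
```

Because the BioPAX content lives inside SBML's `annotation` element, an SBML-only tool can ignore it. A BioPAX-aware tool can recover the species type, the LIGAND cross-reference, and the synonyms.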
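The BioDASH correspondence rule reads: if a target's xref LSID equals a BioPAX protein's xref LSID, then the target correspondsTo that protein. The Python sketch below only mimics the rule's logic over in-memory records; it does not reflect how BioDASH implemented it. The `Node` record layout and the LSID strings are hypothetical, apart from the UniProt accession P49841 shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical record: an identifier plus the LSID of its cross-reference."""
    uri: str
    xref_lsid: str


def correspondences(targets, bpx_proteins):
    """Apply: if ?target.xref.lsid == ?prot.xref.lsid then ?target correspondsTo ?prot.

    Returns (subject, predicate, object) triples."""
    # Index proteins by LSID so each target is matched without scanning every protein.
    index = {}
    for prot in bpx_proteins:
        index.setdefault(prot.xref_lsid, []).append(prot)
    triples = []
    for target in targets:
        for prot in index.get(target.xref_lsid, []):
            triples.append((target.uri, "correspondsTo", prot.uri))
    return triples


targets = [Node("chem:target-1", "urn:lsid:uniprot.org:uniprot:P49841")]
proteins = [
    Node("bpx:prot-1", "urn:lsid:uniprot.org:uniprot:P49841"),
    Node("bpx:prot-2", "urn:lsid:example.org:protein:other"),
]
print(correspondences(targets, proteins))
# prints: [('chem:target-1', 'correspondsTo', 'bpx:prot-1')]
```

The rule does not merge the two objects. It emits a link between them, so each "lens" keeps its own semantics while the binding makes the correspondence explicit.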