How Ontologies Add Value
BioPAX: Biological Pathway Data Exchange Ontology
Joanne Luciano
BioPAX Workgroup (biopax.org)
BioPathways Consortium Liaison (biopathways.org)
3 May 2005, KM Pro Forum, Bentley College, Waltham MA, USA
Introduction
BioPAX = Biopathway Exchange Language
Emerged at ISMB (Intelligent Systems for Molecular Biology):
• conceived at ISMB ’01
• born at ISMB ’02
• crawling at ISMB ’03 (Level 0.5)
• walking at ISMB ’04 (Level 1.0)
• now in the “terrible twos”
Ontology Intro
• Natural language does a poor job of conveying complex information without ambiguity
• Ontologies provide a means to give concise, unambiguous meanings to pieces of data from a particular domain
– thereby facilitating computational operations on the data
• Ontologies are becoming increasingly common in the biological community
– See http://obo.sourceforge.net/obo.htm
Ontology: Components
• Class hierarchy (example classes: chemical, protein)
• Values: occupy slots
• Relations & attributes: fields (slots) on the classes; their values can be other classes
• Constraints: define allowable values and connections within an ontology
• Objects: instances of classes
• Controlled vocabularies (CVs)
• BioPAX will use classes, attributes, constraints, values and CVs; objects are the user’s responsibility*
* From Peter Karp, “Ontologies: Definitions, Components, Subtypes”, SRI International, presentation available at http://www.biopax.org
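A minimal Python sketch of how these components fit together, assuming a toy ontology rather than the BioPAX schema: `OntClass`, `Obj`, `validate`, the `location` and `binds` slots, and the `LOCATION_CV` vocabulary are all hypothetical names. The class hierarchy follows the chemical/protein example, slots are inherited, a relation slot takes another class as its allowed type, and constraints plus a controlled vocabulary decide which values an object may hold.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    """A class in the hierarchy; slots map a slot name to (allowed type, optional CV)."""
    name: str
    parent: Optional[OntClass] = None
    slots: dict = field(default_factory=dict)

    def all_slots(self) -> dict:
        # Slots are inherited from superclasses; subclass definitions win.
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}

    def is_a(self, other: OntClass) -> bool:
        # Walk up the class hierarchy looking for `other`.
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False


@dataclass
class Obj:
    """An object (instance): values occupying the slots of its class."""
    cls: OntClass
    values: dict


def _type_ok(value, allowed) -> bool:
    if isinstance(allowed, OntClass):
        # Relation slot: the value must be an object of that class or a subclass.
        return isinstance(value, Obj) and value.cls.is_a(allowed)
    return isinstance(value, allowed)


def validate(obj: Obj) -> list[str]:
    """Check an object's values against the constraints of its class."""
    errors = []
    slots = obj.cls.all_slots()
    for slot, value in obj.values.items():
        if slot not in slots:
            errors.append(f"{slot}: not a slot of {obj.cls.name}")
            continue
        allowed, cv = slots[slot]
        if not _type_ok(value, allowed):
            errors.append(f"{slot}: wrong value type")
        elif cv is not None and value not in cv:
            errors.append(f"{slot}: {value!r} is not in the controlled vocabulary")
    return errors


# Hypothetical example mirroring the chemical/protein hierarchy on the slide.
LOCATION_CV = frozenset({"cytoplasm", "nucleus", "plasma membrane"})
chemical = OntClass("chemical", slots={"name": (str, None)})
protein = OntClass("protein", parent=chemical,
                   slots={"location": (str, LOCATION_CV), "binds": (chemical, None)})

atp = Obj(chemical, {"name": "ATP"})
hexokinase = Obj(protein, {"name": "hexokinase", "location": "cytoplasm", "binds": atp})
bad = Obj(protein, {"name": "X", "location": "mitochondria", "mass": 1.0})
print(validate(hexokinase))  # []
print(validate(bad))
```

Keeping objects separate from class definitions matches the division of labor above: the ontology supplies classes, slots, constraints, and CVs, while users supply the objects.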
What is a Pathway?
Depends on who you ask!
[Figure: pathway types with examples: Metabolic Pathways (glycolysis), Molecular Interaction Networks (protein-protein), Signaling Pathways (apoptosis), Gene Regulation (lac operon)]
High Throughput Experimental Methods
[Figure: high-throughput methods (microarray, two-hybrid, mass spectrometry, genetics) and existing literature (PubMed) as sources of expression, interaction, function, and protein-modification data]
Multiple pathway databases: an integration nightmare!
So many pathway databases…
Each has its own data model, format, and data access methods
Source: Pathway Resource List (http://cbio.mskcc.org/prl/)
Pathway Databases
WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, PubGene, GeneWays
Research Community Needs
Semantic Aggregation, Integration, Inference (Pedantic Aggravation, Irritation, and Interference)
A Common Exchange Language
Promotes collaboration (big science) and accessibility
[Figure: applications, databases, and users connected without BioPAX vs. with BioPAX; over 170 DBs and tools]
A common “computable semantic” enables scientific discovery
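The scalability argument can be put in numbers under one assumption about the figure: that without BioPAX every pair of resources needs its own directed converter, while with BioPAX each resource only needs an exporter to and an importer from the shared format. The function names are illustrative.

```python
def pairwise_converters(n: int) -> int:
    # Point-to-point: one directed converter for every ordered pair of distinct resources.
    return n * (n - 1)


def hub_converters(n: int) -> int:
    # Shared format: each resource needs one exporter to it and one importer from it.
    return 2 * n


for n in (10, 170):
    print(n, pairwise_converters(n), hub_converters(n))
# 10 90 20
# 170 28730 340
```

At the slide’s figure of over 170 DBs and tools, that is tens of thousands of pairwise converters versus a few hundred.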
Closes Gaps in Pathway Data Space
[Figure: “Exchange Language Domain” map of pathway data space, from low to high detail, covering interaction networks (molecular: Pro:Pro, TF:Gene; non-molecular: genetic interactions), molecular interactions (Pro:Pro, All:All, small molecules), regulatory pathways, and biochemical reactions / metabolic pathways. Formats placed on the map: database exchange formats (PSI-MI 2, BioPAX) and simulation model exchange formats (SBML, CellML), the latter adding rate formulas.]
Design Goals
• Encapsulation: an entire pathway in one record
• Compatible: use existing standards wherever possible
• Computable: from file reading to logical inference
• Successful: buy-in from the research community
Technical Goals
• Interoperability
– Integration and exchange of pathway data
– Interchange through a common (standard) representation
– Accommodate existing database representations
– Provide a basis for future databases
– Enable development of tools for searching and reasoning over the data
• Development of tools and an API to facilitate conversion (libBioPAX)
Technical Goals (cont’d)
Why OWL? Why OWL DL?
• Expressivity (biology = “complex relationships”)
• W3C standard (use existing standards), “Semantic Web enabled”
• XML based (the exchange language in computing)
• Machine computable
– Facilitates integration of knowledge and data, and tool development
– Uncovers inconsistencies and new knowledge
• OWL DL
– Enables full reasoning capability for users, from file reading to logical inference
– Complete: all conclusions are guaranteed to be computed
– Decidable: all computations will finish in finite time (with OWL Lite, in a short amount of time)
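A toy sketch of what “uncover inconsistencies and new knowledge” means, written in plain Python rather than with an OWL DL reasoner. The subclass and disjointness axioms are hypothetical, loosely modeled on BioPAX class names. Inferring an individual’s types through the transitive subclass relation is the “new knowledge”; landing in two disjoint classes at once is the “inconsistency”.

```python
from __future__ import annotations

# Hypothetical subclass axioms (child -> parent), loosely modeled on BioPAX class names.
SUBCLASS_OF = {
    "protein": "physicalEntity",
    "smallMolecule": "physicalEntity",
    "biochemicalReaction": "conversion",
    "conversion": "interaction",
    "physicalEntity": "entity",
    "interaction": "entity",
}

# Hypothetical disjointness axiom: nothing is both a physical entity and an interaction.
DISJOINT = [("physicalEntity", "interaction")]


def superclasses(cls: str) -> set[str]:
    """Transitive closure of the subclass relation, including cls itself."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def infer_types(asserted: set[str]) -> set[str]:
    """New knowledge: every class an individual belongs to, given its asserted classes."""
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls)
    return inferred


def inconsistencies(asserted: set[str]) -> list[tuple[str, str]]:
    """Inconsistency: disjoint class pairs that the individual falls into at once."""
    types = infer_types(asserted)
    return [(a, b) for a, b in DISJOINT if a in types and b in types]


print(sorted(infer_types({"biochemicalReaction"})))
# ['biochemicalReaction', 'conversion', 'entity', 'interaction']
print(inconsistencies({"protein", "biochemicalReaction"}))
# [('physicalEntity', 'interaction')]
```

A real reasoner handles far richer axioms (property restrictions, equivalence, cardinality), which is where the completeness and decidability guarantees of OWL DL matter.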
Social Logistics
• Get organized
• Make the decision & commitment
• 2 or 3 dedicated individuals to be the contact points
• Small core group
– Bi-weekly conference calls, bi-monthly face-to-face (F2F) meetings
– Commitment & resources: participants willing and able to cover their own costs; outside funding (DOE)
• Special interests and needs form subgroup task forces
– Core group member(s)
– Outside experts
• International representation & participation (outreach & community building)
– Conferences and mailing lists
– Individual follow-up
• Collaborate with complementary/competing representations
Social Logistics (cont’d)
How we engendered buy-in from the field, which made life much easier
Take things in steps:
• Pathway database vision → data exchange format as the 1st step
• Complexity levels: Level 1 supports metabolic pathways, followed by Level 2
Early success leads to early adoption, leads to increased probability of overall project success.
• Get buy-in and involvement; this leads to acceptance later
• Support the existing databases (BioCyc, WIT, BIND, etc.)
– Got database sources to agree to participate in the development, to ensure that their DBs will be properly represented
– Got database sources to agree to export in the new format once it is defined
Social Logistics (cont’d)
Get buy-in (continued)
• Community involvement and support
– Core group (represents the voice of the community; small, committed)
– Mailing list
– User community interaction (BioPAX-Boston)
– Subgroups
• International meetings and presentations
• Tool developers, modelers, users (researchers), ontology developers, database providers, complementary representations (SBML, CellML), like minds, and the general community
Implementation of BioPAX
• Designed using GKB Editor and Protégé
• BioPAX uses OWL to define the “schema”
• BioPAX instances store the data
• Technically, an ontology with instance data is a knowledge base
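To illustrate the schema/instance split, here is a sketch that writes one instance as RDF/XML, the XML serialization in which OWL data is commonly exchanged, using only Python’s standard library. The `bp` namespace URI is a hypothetical placeholder, and the class and property names (`protein`, `NAME`, `XREF`, `unificationXref`, `DB`, `ID`) are assumptions modeled on BioPAX naming; check them against the BioPAX OWL file before relying on them. The schema itself would live in a separate OWL file.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical stand-in namespace; substitute the namespace of the BioPAX release in use.
BP = "http://example.org/biopax-sketch#"

ET.register_namespace("rdf", RDF)
ET.register_namespace("bp", BP)


def protein_instance(local_id: str, name: str, db: str, db_id: str) -> ET.Element:
    """Build an RDF/XML document holding one protein individual with a cross-reference."""
    root = ET.Element(f"{{{RDF}}}RDF")
    prot = ET.SubElement(root, f"{{{BP}}}protein", {f"{{{RDF}}}ID": local_id})
    ET.SubElement(prot, f"{{{BP}}}NAME").text = name
    # The cross-reference is itself an individual, nested inside the property element.
    xref_prop = ET.SubElement(prot, f"{{{BP}}}XREF")
    xref = ET.SubElement(xref_prop, f"{{{BP}}}unificationXref")
    ET.SubElement(xref, f"{{{BP}}}DB").text = db
    ET.SubElement(xref, f"{{{BP}}}ID").text = db_id
    return root


doc = protein_instance("protein1", "example protein", "UniProt", "P49841")
print(ET.tostring(doc, encoding="unicode"))
```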
BioPAX – Ontology
Level 1: Metabolic Pathways
Creating and Editing
Mapping Pathways to BioPAX
[Figure: OWL (schema) and instances (individuals): the data]
Mapping Pathways to BioPAX (cont’d)
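Mapping a pathway into BioPAX amounts to re-expressing each source record as individuals of the ontology’s classes. A minimal sketch, assuming a hypothetical source-database record layout; the class names (`smallMolecule`, `protein`, `biochemicalReaction`, `catalysis`) and properties (`left`, `right`, `controller`, `controlled`) are modeled on BioPAX but should be treated as assumptions, and the individuals are plain dicts rather than OWL.

```python
from __future__ import annotations

from itertools import count


def map_reaction(record: dict) -> list[dict]:
    """Re-express one hypothetical source-database reaction record as BioPAX-style individuals."""
    ids = count(1)
    individuals = []
    by_name = {}

    def entity(cls: str, name: str) -> str:
        # One individual per distinct participant name, reused across roles.
        if name not in by_name:
            ind_id = f"#{cls}{next(ids)}"
            individuals.append({"id": ind_id, "class": cls, "name": name})
            by_name[name] = ind_id
        return by_name[name]

    left = [entity("smallMolecule", n) for n in record["substrates"]]
    right = [entity("smallMolecule", n) for n in record["products"]]
    rxn_id = f"#biochemicalReaction{next(ids)}"
    individuals.append({"id": rxn_id, "class": "biochemicalReaction",
                        "name": record["name"], "left": left, "right": right})
    # The enzyme is linked to the reaction through a separate catalysis individual.
    enzyme_id = entity("protein", record["enzyme"])
    individuals.append({"id": f"#catalysis{next(ids)}", "class": "catalysis",
                        "controller": enzyme_id, "controlled": rxn_id})
    return individuals


# Hypothetical source record: the first step of glycolysis.
record = {
    "name": "glucose phosphorylation",
    "enzyme": "hexokinase",
    "substrates": ["glucose", "ATP"],
    "products": ["glucose 6-phosphate", "ADP"],
}
for ind in map_reaction(record):
    print(ind)
```

One reason to model the enzyme through a separate catalysis individual, rather than as an attribute of the reaction, is that the same reaction can then carry several controllers while participants stay reusable across reactions.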
Challenges & Bottlenecks
• Scientific
– What’s a pathway? Depends on who you ask.
• Technical
– Each source has its own syntax & semantics
– Immaturity of tools for data integration
• Social / Logistical
– Community organization and adoption
• Financial
– Mostly volunteer effort by stakeholders
– Dept of Energy
Bridging Chemistry and Molecular Biology
• Different views have different semantics: lenses
• When there is a correspondence between objects, a semantic binding is possible
UniProt: P49841
Apply correspondence rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
Source: Eric Neumann
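The correspondence rule can be sketched in Python, assuming each object carries the LSIDs of its cross-references as a set; `Obj` and `correspondences` are hypothetical names, and the LSIDs use a placeholder authority (`example.org`) rather than real identifiers. Two objects from different views are bound whenever they share an LSID, which is the rule’s `?target.xref.lsid == ?bpx:prot.xref.lsid` test.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Obj:
    """An object from some view, carrying the LSIDs of its cross-references."""
    name: str
    xref_lsids: set[str] = field(default_factory=set)


def correspondences(targets: list[Obj], bpx_proteins: list[Obj]) -> list[tuple[str, str]]:
    """Apply the rule: if target.xref.lsid == prot.xref.lsid then target correspondsTo prot."""
    pairs = []
    for target in targets:
        for prot in bpx_proteins:
            # Any shared cross-reference LSID creates a semantic binding between the two views.
            if target.xref_lsids & prot.xref_lsids:
                pairs.append((target.name, prot.name))
    return pairs


# Illustrative LSIDs using a hypothetical authority, not verified identifiers.
chem_target = Obj("chem:target1", {"urn:lsid:example.org:uniprot:P49841"})
prot1 = Obj("bpx:prot1", {"urn:lsid:example.org:uniprot:P49841"})
prot2 = Obj("bpx:prot2", {"urn:lsid:example.org:uniprot:Q00000"})
print(correspondences([chem_target], [prot1, prot2]))
# [('chem:target1', 'bpx:prot1')]
```

The nested loop is quadratic; for large collections, indexing proteins by LSID in a dict first would make each lookup constant time.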
Enables Computable Biology
• BioPAX increases collaboration and accessibility in the field and enables “big science” because it delivers a scalable solution
• Captures the complex relationships inherent in biology
• Solves some nasty integration problems
• Saves a lot of time and money
BioPAX Supporting Groups
Groups:
• G. Bader, M. Cary, J. Luciano, C. Sander
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• University of Colorado Health Sciences Center: I. Shah
• BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
• Argonne National Laboratory: N. Maltsev, E. Marland
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
• NIST: R. Goldberg
• Stanford: T. Klein
• Columbia: A. Rzhetsky
• Dana-Farber Cancer Institute: J. Zucker
Databases:
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
• PharmGKB (www.pharmgkb.org)
Grants:
• Department of Energy (Workshop)
Collaborating Organizations:
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML)
• CellML
• Chemical Markup Language (CML)
The BioPAX Community
Thank you!
Questions?