How Ontologies Add Value BioPAX: Biological Pathway Data Exchange Ontology Joanne Luciano BioPAX Workgroup (biopax.org) BioPathways Consortium Liaison (biopathways.org) 3 May 2005 KM Pro Forum Bentley College, Waltham.


How Ontologies Add Value BioPAX: Biological Pathway Data Exchange Ontology

Joanne Luciano, BioPAX Workgroup (biopax.org), BioPathways Consortium Liaison (biopathways.org)

3 May 2005, KM Pro Forum, Bentley College, Waltham MA, USA

Introduction

BioPAX = Biopathway Exchange Language

Emerged at ISMB
• conceived at ISMB ’01
• born at ISMB ’02
• crawling at ISMB ’03 (Level 0.5)
• walking at ISMB ’04 (Level 1.0)
• now in the “terrible twos”

Ontology Intro

• Natural language does a poor job of conveying complex information without ambiguity
• Ontologies provide a means to give concise meanings to pieces of data from a particular domain
 – Thereby facilitating computational operations on the data
• Ontologies are becoming increasingly common in the biological community
 – See http://obo.sourceforge.net/obo.htm

Ontology: Components

• Class hierarchy
• Relations & attributes: fields (slots) on the classes, can be other classes
• Values: occupy slots
• Constraints: define allowable values and connections within an ontology
• Objects: instances of classes
• Controlled vocabularies (CVs)
• BioPAX will use classes, attributes, constraints, values and CVs. Objects are the user's responsibility *

[Figure: example class hierarchy showing the classes "chemical" and "protein"]

* From Peter Karp, “Ontologies: Definitions, Components, Subtypes”, SRI International, presentation available at http://www.biopax.org
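The components listed above can be sketched in a few lines of Python. This is only an illustration of the vocabulary (class hierarchy, slots, values, constraints, controlled vocabulary, instances), not BioPAX itself: the class names echo the slide's figure, while the `location` slot, the CV terms, and the `Interaction` class are hypothetical and were chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary (CV); the terms are placeholders.
CELLULAR_LOCATION_CV = {"cytoplasm", "nucleus", "membrane"}


@dataclass
class Chemical:
    """Root of a tiny class hierarchy (illustrative only)."""
    name: str  # a slot holding a plain value


@dataclass
class Protein(Chemical):
    """Subclass of Chemical; it inherits the 'name' slot."""
    location: str = "cytoplasm"  # a slot whose values are restricted by a CV

    def __post_init__(self):
        # Constraint: the slot value must come from the controlled vocabulary.
        if self.location not in CELLULAR_LOCATION_CV:
            raise ValueError(f"{self.location!r} is not a term in the CV")


@dataclass
class Interaction:
    """A class whose slot values are instances of other classes."""
    participants: list = field(default_factory=list)

    def add(self, participant):
        # Constraint: only instances of Chemical (or a subclass) may fill this slot.
        if not isinstance(participant, Chemical):
            raise TypeError("participants must be Chemical instances")
        self.participants.append(participant)


# Objects (instances) are supplied by the user of the ontology.
enzyme = Protein(name="example protein", location="nucleus")
binding = Interaction()
binding.add(enzyme)
```

Creating `Protein(name="x", location="moon")` raises `ValueError`, which is the kind of check a constraint in an ontology makes possible.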

What is a Pathway?

Depends on who you ask!

• Metabolic Pathways (e.g. Glycolysis)
• Molecular Interaction Networks (e.g. Protein-Protein)
• Signaling Pathways (e.g. Apoptosis)
• Gene Regulation (e.g. Lac Operon)

High Throughput Experimental Methods

[Figure: high-throughput methods (Microarray, Two-Hybrid, Mass Spectrometry, Genetics) and existing literature (PubMed) produce data on expression, interactions, function, and protein modifications, which end up in multiple pathway databases.]

Multiple Pathway Databases: Integration Nightmare!

So many pathway databases…

Each has its own data model, format, and data access methods

Source: Pathway Resource List (http://cbio.mskcc.org/prl/)

Pathway Databases

WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, PubGene, GeneWays

Research Community Needs

Semantic Aggregation, Integration, Inference (Pedantic Aggravation, Irritation, and Interference)

A Common Exchange Language

Promotes collaboration (big science), accessibility

[Figure: applications, databases, and users exchanging data without BioPAX and with BioPAX; over 170 DBs and tools]

Common “computable semantics” enables scientific discovery
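A rough count shows why a common exchange language matters at this scale. The sketch below assumes the figure contrasts point-to-point translation with translation through one shared format; that reading, the function names, and the use of 170 as a lower bound for "over 170 DBs and tools" are assumptions made for illustration.

```python
def pairwise_translators(n: int) -> int:
    """One-way converters needed if every resource talks directly to every other."""
    return n * (n - 1)


def hub_translators(n: int) -> int:
    """One-way converters needed if each resource only imports from and exports to a common format."""
    return 2 * n


n = 170  # lower bound taken from "over 170 DBs and tools"
print(pairwise_translators(n))  # 28730
print(hub_translators(n))       # 340
```

Under these assumptions the work grows quadratically without a shared format and only linearly with one.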

Closes Gaps in Pathway Data Space Exchange Language Domain

[Figure: map of the pathway data space by exchange format. Database exchange formats: PSI-MI 2, BioPAX. Simulation model exchange formats: SBML, CellML (rate formulas). Regions shown: genetic interactions; interaction networks (molecular and non-molecular; Pro:Pro, TF:Gene, genetic); molecular interactions (Pro:Pro, All:All); small molecules; regulatory pathways; biochemical reactions; metabolic pathways. Axes run from Low Detail to High Detail.]

Design Goals

• Encapsulation: An entire pathway in one record
• Compatible: Use existing standards wherever possible
• Computable: From file reading to logical inference
• Successful: Buy-in from the research community

Technical Goals

• Interoperability
 – Integration and exchange of pathway data
 – Interchange through a common (standard) representation
 – Accommodate existing database representations
 – Provide a basis for future databases
 – Enable development of tools for searching and reasoning over the database
• Development of tools and an API to facilitate conversion (libBioPAX)

Technical Goals (cont’d)

Why OWL? Why OWL DL?

• Expressivity (biology = “complex relationships”)
• W3C Standard (use existing standards), “Semantic Web enabled”
• XML based (the exchange language in computing)
• Machine Computable
 – Facilitate integration of knowledge, data, tool development
 – Uncover inconsistencies and new knowledge
 – OWL DL
   • Enable full reasoning capability for users, from file reading to logical inference
   • Complete: all conclusions are guaranteed to be computed
   • Decidable: all computations will finish in finite time (with OWL Lite, a short amount of time)

Social Logistics

• Get organized
• Make the decision & commitment
• 2 or 3 dedicated individuals to be the contact points
• Small core group
 – Bi-weekly conference calls, bi-monthly F2F
 – Commitment & resources
   • Participants willing and able to cover their costs
   • Outside funding (DOE)
• Special interests and needs form subgroup task forces
   • Core group member(s)
   • Outside experts
• International representation & participation (Outreach & Community Building)
   • conferences and mailing lists
   • follow-up and individual
• Collaborate with complementary/competing representations

Social Logistics (cont’d)

How we engendered buy-in from the field, which made life much easier

Take things in steps:
• Pathway Database vision -> Data Exchange Format as 1st step
• Levels of complexity: Level 1 supports Metabolic pathways, Level 2 ...

Early success leads to early adoption, which increases the probability of overall project success.

Get “buy in” and get involvement; this leads to acceptance later
• Support the existing databases (BioCyc, WIT, BIND, etc.)
 – Got database sources to agree to participate in the development, to ensure that their DBs will be properly represented
• Got database sources to agree to export in the new format once it is defined

Social Logistics (cont’d)

Get “buy in” (continued)
• Community Involvement and Support
 – Core group (represents voice of community; small, committed)
 – Mailing List
 – User community interaction (BioPAX-Boston)
 – Subgroups
• International Meetings and Presentations

[Figure: community members: tool developers, modelers, users (researchers), ontology developers, database providers, complementary representations (SBML, CellML), like minds, general community]

Implementation of BioPAX

• Designed using GKB Editor and Protégé
• BioPAX uses OWL to define the “Schema”
• BioPAX Instances store the data
• Technically, an ontology with instance data is a knowledge base
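The split between schema and instance data can be made concrete with plain subject-predicate-object triples. This is a minimal sketch in pure Python, not OWL and not the BioPAX schema: the class names, the instance names, and the `subClassOf` / `type` predicates are placeholders chosen for illustration.

```python
# Schema: statements about classes (hypothetical class names).
SCHEMA = {
    ("physicalEntity", "subClassOf", "entity"),
    ("protein", "subClassOf", "physicalEntity"),
    ("smallMolecule", "subClassOf", "physicalEntity"),
}

# Instance data: statements about individuals (hypothetical names).
INSTANCES = {
    ("protein_1", "type", "protein"),
    ("molecule_1", "type", "smallMolecule"),
}

# An ontology together with its instance data forms a knowledge base.
KNOWLEDGE_BASE = SCHEMA | INSTANCES


def superclasses(cls, kb):
    """Return cls plus every class reachable from it through subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in kb:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def instances_of(cls, kb):
    """Return individuals whose asserted type is cls or one of its subclasses."""
    return sorted(
        subject
        for subject, predicate, obj in kb
        if predicate == "type" and cls in superclasses(obj, kb)
    )


print(instances_of("physicalEntity", KNOWLEDGE_BASE))  # ['molecule_1', 'protein_1']
```

Neither individual is declared a `physicalEntity` directly; the answer follows from the schema, which is a small example of the inference a shared schema enables.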

BioPAX – Ontology

Level 1: Metabolic Pathways

Creating and Editing

Mapping Pathways to BioPAX

OWL (schema)

Instances (Individuals): data

Mapping Pathways to BioPAX


Challenges & Bottlenecks

• Scientific
 – What’s a pathway? Depends on who you ask.
• Technical
 – Each database has its own syntax & semantics
 – Immaturity of tools for data integration
• Social / Logistical
 – Community organization and adoption
• Financial
 – Mostly volunteer effort by stakeholders
 – Dept of Energy

Bridging Chemistry and Molecular Biology

Different Views have different semantics: Lenses

When there is a correspondence between objects, a semantic binding is possible

Uniprot: P49841

Apply Correspondence Rule:

if ?target.xref.lsid == ?bpx:prot.xref.lsid

then ?target.correspondsTo.?bpx:prot

Source: Eric Neumann
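The correspondence rule above can be read as a simple join on a shared identifier. The following Python sketch is one possible interpretation, not the rule engine used on the slide: the `Record` class, the record names, and the LSID string layout are hypothetical; only the accession P49841 comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record from one view (lens) that carries an external reference."""
    name: str
    xref_lsid: str  # identifier taken from the record's xref
    corresponds_to: list = field(default_factory=list)


def apply_correspondence_rule(targets, proteins):
    """Bind each target to every protein record that shares its xref LSID."""
    for target in targets:
        for protein in proteins:
            if target.xref_lsid == protein.xref_lsid:
                target.corresponds_to.append(protein.name)


# Assumed LSID layout, for illustration only.
SHARED_LSID = "urn:lsid:uniprot.org:uniprot:P49841"

chemistry_view = [
    Record("chem_target_1", SHARED_LSID),
    Record("chem_target_2", "urn:lsid:example.org:other:0000"),
]
biopax_view = [Record("bpx_protein_1", SHARED_LSID)]

apply_correspondence_rule(chemistry_view, biopax_view)
print(chemistry_view[0].corresponds_to)  # ['bpx_protein_1']
print(chemistry_view[1].corresponds_to)  # []
```

The binding is only as good as the identifiers: two views that reference the same protein under different identifier schemes would need a mapping step before this rule can fire.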


Enables Computable Biology

BioPAX:
• Increases collaboration and accessibility in the field and enables 'big science' because it delivers a scalable solution
• Captures the complex relationships inherent in biology
• Solves some nasty integration problems
• Saves a lot of time and money

BioPAX Supporting Groups

Groups
• G. Bader, M. Cary, J. Luciano, C. Sander
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• University of Colorado Health Sciences Center: I. Shah
• BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
• Argonne National Laboratory: N. Maltsev, E. Marland
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
• NIST: R. Goldberg
• Stanford: T. Klein
• Columbia: A. Rzhetsky
• Dana Farber Cancer Institute: J. Zucker

Databases
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
• PharmGKB (www.pharmgkb.org)

Grants
• Department of Energy (Workshop)

Collaborating Organizations:
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML)
• CellML
• Chemical Markup Language (CML)

The BioPAX Community


Thank you!

Questions?
