Interoperability With BioMoby 1.0

It’s Better Than Sharing Your Toothbrush!

Photo taken by http://flickr.com/people/mfsarwar/

A brief history of BioMoby

• Model Organism Bring Your own Database Interface Conference, Sept 2001 (MOBY-DIC)
• May 21, 2002 – Genome Canada Platform Award
• May 25, 2002 – API Version 0.1 deployed, including object ontology serialization into XML
• July 18, 2002 – First Moby Client (Gbrowse Moby)
• June 9, 2003 – API Version 0.5 deployed
• 2006 – Genome Canada Platform Award
• 2007 – Version 1.0 API submitted for publication

MOBY-DIC Chapter VII

7th Model Organism Bring Your own Database Interface Conference, Vancouver, BC, June 2007.

The Core Ahab’s

Wendy Richard Mylah Martin Eddie

Mark’s Screen… Ivan Paul Andreas

The BioMoby Plan

• Create an ontology of bioinformatics data-types
• Define a serialization of this ontology (data syntax)
• Create an open API over this ontology
• Define Web Service inputs and outputs v.v. Ontology
• Register Services in an ontology-aware Registry

• Machines can find an appropriate service
• Machines can execute that service unattended
• Ontology is community-extensible

Overview of BioMoby Transactions

[Figure: MOBY hosts & services (Gene names, Sequence, Express., Protein, Alleles, …) and MOBY Central]

Overview of BioMoby Transactions

Discovery of services that consume things LIKE sequences!

[Figure: Sequence, MOBY Central, and the discovered services Align, Phylogeny, Primers]

That has these features __ Object ontology

This is SCUFL – Simple Conceptual Unified Flow Language It is a complete record of everything you just did, and it can be saved for use in the Taverna workflow application that we will look at later…

Pipeline discovery “on the fly”

• No explicit coordination between providers
• Dynamic discovery of ~appropriate Services
• Automated execution of services

Some BioMoby statistics

Moby: Breadth

• Namespaces (data types): 418
• Objects (data syntaxes): >561
• Service Types (analytical categories): 112
• Providers: ~50 active
• Service Instances: ~1200 currently “alive”
  – In main Moby Central server in Canada
  – Others in “boutique” Moby registries serving specialized communities worldwide

Moby: Clients

• Gbrowse_moby (M Wilkinson)
• PlaNet Locus_View (H Schoof, R Ernst)
• Blue-Jay (P Gordon)
• Taverna (T Oinn, M Senger, E Kawas)
• MOWserv (INB, Spain)
• Remora (S Carrere, J Gouzy, INRA)
• MOBYLE (B Néron, P Tufféry, C Letondal, Pasteur Inst.)
• SeaHawk (P Gordon)

BioMoby in detail

• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries

Moby Namespaces

• A “Namespace” is a category of identifiers
  – NCBI has gi numbers (gi Namespace)
  – GO Terms have accession numbers (GO Namespace)
• Namespaces indicate data’s semantic type.
  – GO:0003476 → a Gene Ontology Term
  – gi|163483 → a GenBank record
• Though we are using the word “Namespace” correctly, it causes confusion!
  – “Namespace” in XML is tightly associated with an XML document and/or its syntax
  – In Moby, we are ONLY talking about data entities, NOT THEIR SYNTAX
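To make the namespace idea concrete, here is a minimal Python sketch that splits the two identifier styles quoted above into a (namespace, id) pair. The `MobyId` type and `parse_identifier` function are hypothetical names invented for this illustration; they are not part of any BioMoby library.

```python
from typing import NamedTuple


class MobyId(NamedTuple):
    """A data entity named by (namespace, id); says nothing about syntax."""
    namespace: str
    id: str


def parse_identifier(text: str) -> MobyId:
    """Split identifiers written like 'GO:0003476' or 'gi|163483'.

    The two separators handled here are just the ones visible on the slide;
    treating them as a general rule is an assumption of this sketch.
    """
    for separator in (":", "|"):
        namespace, found, local_id = text.partition(separator)
        if found and namespace and local_id:
            return MobyId(namespace, local_id)
    raise ValueError(f"no namespace separator found in {text!r}")


go_term = parse_identifier("GO:0003476")
genbank_record = parse_identifier("gi|163483")
```

The pair identifies the entity only; how that entity is represented is the job of the Object Ontology.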

BioMoby in detail

• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries

The MOBY Object Ontology

• Syntactic types are defined by a GO-like ontology
  – Class name at each node
  – Edges define the relationships between Classes
  – GO used as a model because of its familiarity in the community
• Edges define one of three relationships
  – ISA
    • Inheritance relationship
    • All properties of the parent are present in the child
  – HASA
    • Container relationship of ‘exactly 1’
  – HAS
    • Container relationship with ‘1 or more’
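The three edge types can be sketched as a small in-memory ontology. This is only an illustration: `MobyClass`, `ancestors` and `all_members` are hypothetical names, the class names are borrowed from the sequence example used elsewhere in this talk, and the member labels "Length" and "SequenceString" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MobyClass:
    name: str
    isa: Optional[str] = None  # single parent class (inheritance)
    # (label, class name) pairs; HASA means exactly 1, HAS means 1 or more
    hasa: List[Tuple[str, str]] = field(default_factory=list)
    has: List[Tuple[str, str]] = field(default_factory=list)


def ancestors(ontology: Dict[str, MobyClass], name: str) -> List[str]:
    """Follow ISA edges from a class up to the root, nearest parent first."""
    chain = []
    current = ontology[name].isa
    while current is not None:
        chain.append(current)
        current = ontology[current].isa
    return chain


def all_members(ontology: Dict[str, MobyClass], name: str) -> List[Tuple[str, str, str]]:
    """Members declared on the class plus those inherited via ISA, root first."""
    members = []
    for class_name in reversed([name] + ancestors(ontology, name)):
        cls = ontology[class_name]
        members += [("HASA", label, target) for label, target in cls.hasa]
        members += [("HAS", label, target) for label, target in cls.has]
    return members


ONTOLOGY = {
    "Object": MobyClass("Object"),
    "Integer": MobyClass("Integer", isa="Object"),
    "String": MobyClass("String", isa="Object"),
    "VirtualSequence": MobyClass("VirtualSequence", isa="Object",
                                 hasa=[("Length", "Integer")]),
    "GenericSequence": MobyClass("GenericSequence", isa="VirtualSequence",
                                 hasa=[("SequenceString", "String")]),
    "DNASequence": MobyClass("DNASequence", isa="GenericSequence"),
}
```

Because ISA means "all properties of the parent are present in the child", `all_members(ONTOLOGY, "DNASequence")` returns both the inherited Integer and the inherited String even though DNASequence declares nothing itself.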

The Simplest Moby Data Type

=‘111076’/>

Object: The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s) or its representation

Moby Primitives

• Integer ISA Object (e.g. 38)
• String ISA Object
• Float ISA Object
• DateTime ISA Object

A Derived Data-Type

• Integer ISA Object (e.g. 38)
• String ISA Object
• Virtual Sequence ISA Object
• Virtual Sequence HASA Integer
  – Describes the semantic relationship between the Integer and the Virtual Sequence

A Derived Data-Type

• Integer ISA Object (e.g. 38)
• String ISA Object (e.g. ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC)
• Virtual Sequence ISA Object, HASA Integer
• Generic Sequence ISA Virtual Sequence, HASA String

A Derived Data-Type

• Integer ISA Object (e.g. 38)
• String ISA Object
• Virtual Sequence ISA Object, HASA Integer
• Generic Sequence ISA Virtual Sequence, HASA String
• DNA Sequence ISA Generic Sequence
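A derived type like this might be written out as nested XML along the following lines. This is a rough sketch rather than the official Moby serialization: it reuses the `namespace`, `id` and `articleName` attributes that appear in this talk, but it omits any XML namespace prefix, and the element name `DNASequence`, the article names "Length" and "SequenceString", and the `dna_sequence_element` helper are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def dna_sequence_element(namespace: str, identifier: str,
                         length: int, sequence: str) -> ET.Element:
    """Build a DNA Sequence object: an identified container holding
    exactly one Integer and exactly one String (the two HASA members)."""
    root = ET.Element("DNASequence", {"namespace": namespace, "id": identifier})

    # Inherited from Virtual Sequence: HASA Integer
    length_el = ET.SubElement(
        root, "Integer", {"namespace": "", "id": "", "articleName": "Length"})
    length_el.text = str(length)

    # Inherited from Generic Sequence: HASA String
    string_el = ET.SubElement(
        root, "String", {"namespace": "", "id": "", "articleName": "SequenceString"})
    string_el.text = sequence
    return root


# Placeholder identifier and the example values from the slide.
element = dna_sequence_element(
    "gi", "163483", 38, "ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC")
xml_text = ET.tostring(element, encoding="unicode")
```

The outer element carries the identity (namespace plus id); the inner primitives carry the data, each labelled with its role in the parent.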

Legacy file formats

• Containing “String” allows ontological classes to represent legacy data types

TBLASTN 2.0.4 [Feb-24-1998]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|1401126 (504 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
336,723 sequences; 677,679,054 total letters

Searching...done

Sequences producing significant alignments:                           Score (bits)  E Value
gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...   1009          0.0
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...   58            4e-07
emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein                  53            1e-05

Binaries – pictures, movies

• Text-base64 is a Class that contains String
• Binaries are base64 encoded and passed in classes that inherit from text-base64
• base64_encoded_jpeg ISA text/base64 ISA text/plain HASA String

MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
BAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUx
HTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVl
bWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEf
MB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
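The encoding step itself is ordinary base64. A minimal sketch using only the Python standard library follows; the 72-character line width simply mimics the wrapped text shown above, and the function names are hypothetical.

```python
import base64


def encode_binary(payload: bytes, line_width: int = 72) -> str:
    """Base64-encode raw bytes and wrap the text into fixed-width lines,
    so it can be carried as the String content of an object."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "\n".join(
        encoded[i:i + line_width] for i in range(0, len(encoded), line_width)
    )


def decode_binary(text: str) -> bytes:
    """Reverse encode_binary: drop all whitespace, then base64-decode."""
    return base64.b64decode("".join(text.split()))


# Stand-in for the bytes of a picture or movie.
image_bytes = bytes(range(256))
wrapped = encode_binary(image_bytes)
```

A client that receives such an object only needs to know that it inherits from the base64 text class in order to decode it, whatever the specific subclass is.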

Extending legacy datatypes

• With legacy data-types defined, we can extend them as we see fit
• annotated_jpeg ISA base64_encoded_jpeg
• annotated_jpeg HASA 2D_Coordinate_set
• annotated_jpeg HASA Description

<annotated_jpeg namespace=‘TAIR_Image’ id=‘3343532’>
  <2D_Coordinate_set namespace=‘’ id=‘’ articleName=“pixelCoordinates”> 3554 663
  This is the phenotype of a ufo-1 mutant under long daylength, 16’C
  MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
  Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
</annotated_jpeg>

The same object…

annotated_jpeg
  – ISA base64_encoded_jpeg
  – HASA 2D_Coordinate_set
  – HASA Description

<2D_Coordinate_set namespace=‘’ id=‘’ articleName=“pixelCoordinates”> 3554 663
This is the phenotype of a ufo-1 mutant under long daylength, 16’C
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV

Cross reference types

• Simple – A MOBY Object
• Rich – Takes the form: ... Textual Description ...
  – …Incidentally, this avoids the problem of reification that is experienced in RDF

XML Schema?

• The Object Ontology allows new data-types WITHOUT new flatfile formats, and without having to understand e.g. XML Schema
• Minimize future heterogeneity
• Improve interoperability without requiring schema-to-schema mapping

XML Schema?

• Object Ontology terms have semantically rich names, but this is primarily for human intuition
  – DNA Sequence
  – Annotated_GIF
• Object Ontology does not define the meaning of an object to the machine
  – No machine-readable semantics
  – It does define the representation SYNTAX

A portion of the MOBY-S Object Ontology

…community-built!

BioMoby in detail

• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries

A Moby Central Query

Give me:
  – Services that consume THIS data-type in THIS syntax…
  – …do SOMETHING LIKE THIS to it…
  – …and provide me THAT data-type in response

Example

• Find me services that
  – consume FASTA sequence data,
  – do a BLAST with it,
  – and provide me lists of GenBank GI numbers in return.
• Query can be any or all of the above criteria
  – Also limit by service provider and service description keyword
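The shape of such a query can be sketched against a toy in-memory registry. Everything here is hypothetical: the service names, the providers, and the tiny ISA fragments are invented for the example, and this is not the real Moby Central API. The point is that every criterion is optional and that matching follows ISA edges, so a service declared for a general type is also found for its specializations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# child -> parent (ISA) edges; illustrative fragments, not the real ontologies
OBJECT_ISA: Dict[str, str] = {
    "DNASequence": "GenericSequence",
    "GenericSequence": "VirtualSequence",
    "VirtualSequence": "Object",
    "FASTA": "Object",
}
SERVICE_ISA: Dict[str, str] = {
    "Blast": "Alignment",
    "Alignment": "Analysis",
    "Analysis": "Service",
    "Retrieval": "Service",
}


def is_a(isa: Dict[str, str], child: str, ancestor: str) -> bool:
    """True if child equals ancestor or reaches it by following ISA edges."""
    current: Optional[str] = child
    while current is not None:
        if current == ancestor:
            return True
        current = isa.get(current)
    return False


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str        # syntactic type consumed (Object Ontology)
    service_type: str      # kind of operation (Service Ontology)
    output_namespace: str  # semantic type produced (Namespace)
    provider: str


def find_services(registry: List[Service],
                  input_type: Optional[str] = None,
                  service_type: Optional[str] = None,
                  output_namespace: Optional[str] = None,
                  provider: Optional[str] = None) -> List[Service]:
    """Return services matching every criterion that was supplied."""
    hits = []
    for service in registry:
        # My data must be the declared input type or a specialization of it.
        if input_type is not None and not is_a(OBJECT_ISA, input_type, service.input_type):
            continue
        # The service must be the requested kind of operation or a subtype.
        if service_type is not None and not is_a(SERVICE_ISA, service.service_type, service_type):
            continue
        if output_namespace is not None and service.output_namespace != output_namespace:
            continue
        if provider is not None and service.provider != provider:
            continue
        hits.append(service)
    return hits


REGISTRY = [
    Service("blastFasta", "FASTA", "Blast", "gi", "example.org"),
    Service("alignDNA", "GenericSequence", "Alignment", "gi", "example.net"),
    Service("getGoTerm", "Object", "Retrieval", "GO", "example.org"),
]

# "Consume FASTA, do a BLAST, give me GenBank GI numbers"
blast_hits = find_services(REGISTRY, input_type="FASTA",
                           service_type="Blast", output_namespace="gi")
```

Dropping criteria widens the search: `find_services(REGISTRY, input_type="DNASequence")` returns every service that can accept a DNA sequence, whatever it does with it.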

Remember!!

Moby Registry Query

INPUT TYPE | | TRANSFORMATION TYPE | | OUTPUT TYPE

A weakness of MOBY

Service discovery is horribly flawed due to insufficiently rich semantics…

The problem with Moby

Chickens go in; Pies come out!

The problem with Moby

What sort o’ pies?

The problem with Moby

Apple!

The MOBY-S Service Ontology

• A simple ISA hierarchy…
  – too simple!
• Primitive types include:
  – Analysis
  – Parsing
  – Registration
  – Retrieval
  – Resolution
  – Conversion
  – Rendering

A slice of the Service Ontology

“The Exploding Bicycle” - A. Rector, U Manchester

[Figure: ontology nodes shown: Service, Analysis, Alignment, Blast, NCBI_Blast, WU_Blast, Parse_NCBI_Blast, Parse_WU_Blast]

Summary so far

• BioMoby uses ontologies to describe both data types and data syntaxes
  – This is where the interoperability comes from
  – These are used to match consumers with providers during service discovery
• BioMoby uses a simple ontology to describe bioinformatics operations
  – This ontology is only marginally useful

Seahawk

• Highlight data in your browser and drag/drop it into Moby
• What could be easier than that?!

Paul MK Gordon and Christoph W Sensen, BMC Bioinformatics 2007, 8:208

Seahawk: A New Moby Client for Biologists

Drag ‘n’ drop, highlight existing data for use with MOBY Services

Paul Gordon & Christoph Sensen, BMC Bioinformatics, in press

Seahawk looks like a browser

How do I load data?

• Use the “open” button:
  – Text file (e.g. FASTA sequences)
  – HTML page (e.g. NCBI Entrez Web page)
  – RTF document (e.g. conference abstract)
  – MOBY XML document
• Drag ‘n’ Drop
  – Web links and desktop files
  – Highlighted text from open documents or Web pages

Under the Hood (Beneath the Bonnet?)

• Data has to be converted into Moby XML format to be used by Moby
• Moby data has to be converted back to human-readable text for presentation to the biologist

Again: How do I load data?

How do I Find Services?

Right-click

MOB rules are invoked

Resulting Moby XML is used for service search

How do I run a service?

• Click it!
• If necessary, a service’s extra parameters can be set
• Control+click submits using default params

How do I run a service?

• If required inputs are missing, the missing ones must be dragged into place.
• Unrecognized data will be rejected

How do I collate data?

• Seahawk clipboard lets you build collections of objects
• Seahawk “knows” the type of collection and will suggest appropriate Moby services

Seahawk Summary

• Seahawk integrates Moby Web Service discovery and execution into the biologist’s day-to-day “Web Surfing” activity
• It uses Regular Expressions and XSLT to move normal web or hard-drive-file data into and out of BioMoby
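The regular-expression half of that idea can be sketched in a few lines. The rule table below is a hypothetical stand-in, not Seahawk's actual rule format: each rule pairs a namespace with a pattern whose first capture group is the identifier, and every match in the highlighted text becomes a simple namespace-plus-id object.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# (namespace, pattern) pairs; the patterns cover only the two identifier
# styles that appear in this talk and are assumptions of this sketch.
RULES = [
    ("gi", re.compile(r"\bgi\|(\d+)")),
    ("GO", re.compile(r"\bGO:(\d{7})\b")),
]


def text_to_moby_objects(text: str) -> List[ET.Element]:
    """Scan free text and emit one Object element per recognized identifier."""
    objects = []
    for namespace, pattern in RULES:
        for match in pattern.finditer(text):
            objects.append(
                ET.Element("Object", {"namespace": namespace, "id": match.group(1)})
            )
    return objects


highlighted = "Query= gi|1401126 (504 letters), annotated with GO:0003476"
found = text_to_moby_objects(highlighted)
```

Text that matches no rule yields an empty list, which corresponds to the "unrecognized data" case: there is nothing to search the registry with.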

Why doesn’t Moby Use RDF/OWL?

Timeline of Moby/W3C Activities

[Figure: timeline, 2000 to 2006]
• W3C: RDF Candidate Spec; RDF Schema Candidate Spec; W3C Launches Semantic Web (SW) Activity Group; RDF/OWL Formal W3C Recommendations; Extensive SW toolbuilding…
• BioMoby: BioMoby XML Finalized; BioMoby Project Established; BioMoby Stable 0.85 API Published (>400 services); BioMoby Stable 1.0 API Published

Moby 2.0

Getting it right, the second time!

What BioMoby Already Does

Sequence Data → BLAST SERVER → Blast Hit

Sequence Data → givesBlastResult → Blast Hit (not “biologically” meaningful)

Sequence Data → hasHomologyTo → Blast Hit

…looks a lot like…

URI → hasHomologyTo → URI

Which is effectively just an RDF triple

Now think in reverse…

(in case you forgot…)

Moby Registry Query

INPUT TYPE | | TRANSFORMATION TYPE | | OUTPUT TYPE

Moby 2.0

What does Sequence Data have homology to?

Sequence Data → hasHomologyTo → ?

Query: FIND SERVICES THAT

Consume Sequence Data | | Provide hasHomologyTo Property | | Attached to other Sequence Data

Send data → BLAST SERVICE → Blast Hit

SPARQL

• A Semantic Web query language
• Queries “look like” graphs
• Find “X” with predicate “Y” attached to “Z”
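The "find X with predicate Y attached to Z" pattern can be illustrated without a SPARQL engine. The sketch below is plain Python triple-pattern matching, not SPARQL itself, and the `seq:` identifiers and the `hasLength` predicate are made up for the example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

GRAPH: List[Triple] = [
    ("seq:A", "hasHomologyTo", "seq:B"),
    ("seq:C", "hasHomologyTo", "seq:B"),
    ("seq:A", "hasLength", "38"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as an unbound variable."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Find X such that X hasHomologyTo seq:B
homologs_of_b = [s for s, _, _ in match(GRAPH, predicate="hasHomologyTo", obj="seq:B")]
```

A real SPARQL query expresses the same pattern declaratively, with `?x` in place of the unbound subject.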

• Moby 2.0 extends the SPARQL query language
• SPARQL queries contain concepts and the relationships between them (subject, predicate, object)
• We simply map RDF predicates onto Moby services capable of generating that relationship
• Registry query: “What Moby service consumes [subject] and generates the [predicate] relationship type?”
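One way to picture the predicate-to-service mapping is a lookup table keyed by (input type, predicate), where each entry is a service that can generate that relationship on demand. This is a sketch of the idea only, not the Moby 2.0 implementation: `fake_blast` is a stand-in for a remote service call, and the type name, table and `seq:` identifiers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
ServiceFn = Callable[[str], List[str]]


def fake_blast(sequence_id: str) -> List[str]:
    """Stand-in for invoking a BLAST service; returns canned hits."""
    canned = {"seq:A": ["seq:B", "seq:C"]}
    return canned.get(sequence_id, [])


# "What service consumes [subject type] and generates [predicate]?"
PREDICATE_REGISTRY: Dict[Tuple[str, str], ServiceFn] = {
    ("SequenceData", "hasHomologyTo"): fake_blast,
}


def resolve(subject: str, subject_type: str, predicate: str) -> List[Triple]:
    """Answer the pattern (subject, predicate, ?) by finding a service that
    generates the predicate and running it on the subject."""
    service = PREDICATE_REGISTRY.get((subject_type, predicate))
    if service is None:
        return []  # no known service can generate this relationship
    return [(subject, predicate, obj) for obj in service(subject)]


generated = resolve("seq:A", "SequenceData", "hasHomologyTo")
```

The triples do not need to exist anywhere beforehand; they are produced by executing the service when the query asks for them.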

But wait, there’s more!

Exploit knowledge in OWL ontologies to enhance query

Evaluate Query Expression:
  – Subject, Predicate → Look up and execute Moby service: consumes STK or proteins and looks up inhibitor molecules
  – Subject, Predicate → Look up and execute Moby service: consumes proteins and generates functional annotation info

This SPARQL query could be posed on a database of RAW, UNANNOTATED protein sequences, and be answered by Moby 2.0 (a.k.a. CardioSHARE)

Credits

• Genome Canada/Genome Alberta
• myGrid – Carole Goble in particular
• Spanish National Institute for Bioinformatics (INB) through Fundación Genoma España
• Generation Challenge Programme (GCP) of the Consultative Group for International Agricultural Research (CGIAR)
• Heart and Stroke Foundation of BC and Yukon (CardioSHARE)
• Microsoft Research (CardioSHARE)