Interoperability With BioMoby 1.0
It’s Better Than Sharing Your Toothbrush!
Photo taken by http://flickr.com/people/mfsarwar/
A brief history of BioMoby
• Model Organism Bring Your own Database Interface Conference (MOBY-DIC), Sept 2001
• May 21, 2002 – Genome Canada Platform Award
• May 25, 2002 – API Version 0.1 deployed, including object ontology serialization into XML
• July 18, 2002 – First Moby Client (Gbrowse Moby)
• June 9, 2003 – API Version 0.5 deployed
• 2006 – Genome Canada Platform Award
• 2007 – Version 1.0 API submitted for publication
MOBY-DIC Chapter VII
7th Model Organism Bring Your own Database Interface Conference, Vancouver, BC, June 2007.
The Core Ahab’s
Wendy Richard Mylah Martin Eddie
Mark’s Screen… Ivan Paul Andreas
The BioMoby Plan
• Create an ontology of bioinformatics data-types
• Define a serialization of this ontology (data syntax)
• Create an open API over this ontology
• Define Web Service inputs and outputs with respect to the ontology
• Register Services in an ontology-aware Registry
• Machines can find an appropriate service
• Machines can execute that service unattended
• Ontology is community-extensible
Overview of BioMoby Transactions
[Figure: MOBY hosts & services (Gene names, Sequence, Express., Protein, Alleles, …) and MOBY Central]
Overview of BioMoby Transactions
Discovery of services that consume things LIKE sequences!
[Figure: Sequence; Align, Phylogeny, Primers; MOBY Central]
That has these features __ Object ontology
This is SCUFL – Simple Conceptual Unified Flow Language It is a complete record of everything you just did, and it can be saved for use in the Taverna workflow application that we will look at later…
Pipeline discovery “on the fly”
• No explicit coordination between providers
• Dynamic discovery of ~appropriate Services
• Automated execution of services
Some BioMoby statistics
Moby: Breadth
• Namespaces (data types): 418
• Objects (data syntaxes): >561
• Service Types (analytical categories): 112
• Providers: ~50 active
• Service Instances: ~1200 currently “alive”
  – In main Moby Central server in Canada
  – Others in “boutique” Moby registries serving specialized communities worldwide
Moby: Clients
• Gbrowse_moby (M Wilkinson)
• PlaNet Locus_View (H Schoof, R Ernst)
• Blue-Jay (P Gordon)
• Taverna (T Oinn, M Senger, E Kawas)
• MOWserv (INB, Spain)
• Remora (S Carrere, J Gouzy, INRA)
• MOBYLE (B Néron, P Tufféry, C Letondal, Pasteur Inst.)
• SeaHawk (P Gordon)
BioMoby in detail
• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries
Moby Namespaces
• A “Namespace” is a category of identifiers
  – NCBI has gi numbers (gi Namespace)
  – GO Terms have accession numbers (GO Namespace)
• Namespaces indicate data’s semantic type.
  – GO:0003476 a Gene Ontology Term
  – gi|163483 a GenBank record
• Though we are using the word “Namespace” correctly, it causes confusion!
  – “Namespace” in XML is tightly associated with an XML document and/or its syntax
  – In Moby, we are ONLY talking about data entities, NOT THEIR SYNTAX
BioMoby in detail
• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries
The MOBY Object Ontology
• Syntactic types are defined by a GO-like ontology
  – Class name at each node
  – Edges define the relationships between Classes
  – GO used as a model because of its familiarity in the community
• Edges define one of three relationships
  – ISA
    • Inheritance relationship
    • All properties of the parent are present in the child
  – HASA
    • Container relationship of ‘exactly 1’
  – HAS
    • Container relationship with ‘1 or more’
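The three edge types above can be sketched as a small data structure. This is only an illustration in Python, not BioMoby's own implementation: the `MobyClass` type, its method names, the "content" and "description" member labels, and the assumption that text/plain sits directly under Object are all hypothetical. The class names and the "pixelCoordinates" label are taken from other slides in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass(eq=False)
class MobyClass:
    """One node of a toy Object Ontology (hypothetical structure)."""
    name: str
    isa: Optional["MobyClass"] = None                           # ISA: single parent
    hasa: Dict[str, "MobyClass"] = field(default_factory=dict)  # HASA: exactly 1
    has: Dict[str, "MobyClass"] = field(default_factory=dict)   # HAS: 1 or more

    def lineage(self) -> Iterator["MobyClass"]:
        """Yield this class, then each ancestor along the ISA edges."""
        node: Optional[MobyClass] = self
        while node is not None:
            yield node
            node = node.isa

    def is_a(self, other: "MobyClass") -> bool:
        """True if `other` is this class or one of its ISA ancestors."""
        return any(node is other for node in self.lineage())

    def members(self) -> Dict[str, "MobyClass"]:
        """Own members plus everything inherited from the parents."""
        merged: Dict[str, MobyClass] = {}
        for node in reversed(list(self.lineage())):
            merged.update(node.hasa)
            merged.update(node.has)
        return merged


# Class names from the slides; member labels "content" and "description" are assumed.
obj = MobyClass("Object")
string = MobyClass("String", isa=obj)
text_plain = MobyClass("text/plain", isa=obj, hasa={"content": string})
text_base64 = MobyClass("text/base64", isa=text_plain)
jpeg = MobyClass("base64_encoded_jpeg", isa=text_base64)
coords = MobyClass("2D_Coordinate_set", isa=obj)
description = MobyClass("Description", isa=obj)
annotated = MobyClass(
    "annotated_jpeg",
    isa=jpeg,
    hasa={"pixelCoordinates": coords, "description": description},
)

print(annotated.is_a(text_plain))   # inheritance follows the ISA chain
print(sorted(annotated.members()))  # inherited and own members together
```

Because ISA carries all parent properties into the child, `annotated_jpeg` ends up with the String member it inherits from text/plain as well as the two members it adds itself.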
The Simplest Moby Data Type
Object
The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s), nor its representation
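A minimal sketch of that idea in Python, assuming the unprefixed `namespace` and `id` attribute form shown on the annotated_jpeg slide. The helper names `moby_object` and `same_entity` are hypothetical, and the namespace strings "GO" and "gi" are assumptions about how those namespaces would be registered.

```python
import xml.etree.ElementTree as ET


def moby_object(namespace: str, identifier: str) -> ET.Element:
    """Build a bare Object: just a namespace and an id, no payload."""
    return ET.Element("Object", {"namespace": namespace, "id": identifier})


def same_entity(a: ET.Element, b: ET.Element) -> bool:
    """Two objects denote the same entity when namespace and id both match."""
    return (a.get("namespace"), a.get("id")) == (b.get("namespace"), b.get("id"))


go_term = moby_object("GO", "0003476")
genbank_record = moby_object("gi", "163483")

print(ET.tostring(go_term, encoding="unicode"))
print(same_entity(go_term, moby_object("GO", "0003476")))
print(same_entity(go_term, genbank_record))
```

Nothing in the element says where the record lives or how it is formatted, which is the point of the slide.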
Moby Primitives
• DateTime ISA Object
• Float ISA Object
• Integer ISA Object
• String ISA Object
A Derived Data-Type
• Integer ISA Object
• String ISA Object
• VirtualSequence ISA Object
• VirtualSequence HASA Integer
  – The label on the HASA edge describes the semantic relationship between the Integer and the VirtualSequence
Legacy file formats
• Containing “String” allows ontological classes to represent legacy data types
Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Query= gi|1401126 (504 letters)
Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
336,723 sequences; 677,679,054 total letters
Searching...done
Sequences producing significant alignments: Score (bits), E Value
gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA... 1009 0.0
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t... 58 4e-07
emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein 53 1e-05
Binaries – pictures, movies
• text/base64 is a Class that contains String
• Binaries are base64 encoded and passed in classes that inherit from text/base64
• base64_encoded_jpeg ISA text/base64 ISA text/plain HASA String
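The encoding step can be sketched in Python with the standard `base64` module. The element layout below is an assumption modelled on the attribute style used elsewhere in these slides; the helper names and the "content" articleName are hypothetical, and the payload is a placeholder byte string rather than a real image.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(class_name: str, namespace: str, identifier: str,
                payload: bytes) -> ET.Element:
    """Base64-encode raw bytes and store them in the object's String member."""
    elem = ET.Element(class_name, {"namespace": namespace, "id": identifier})
    content = ET.SubElement(
        elem, "String", {"namespace": "", "id": "", "articleName": "content"}
    )
    content.text = base64.b64encode(payload).decode("ascii")
    return elem


def unwrap_binary(elem: ET.Element) -> bytes:
    """Recover the original bytes from the String member."""
    return base64.b64decode(elem.find("String").text)


fake_image = b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG"
wrapped = wrap_binary("base64_encoded_jpeg", "TAIR_Image", "3343532", fake_image)

print(ET.tostring(wrapped, encoding="unicode"))
print(unwrap_binary(wrapped) == fake_image)
```

Since the encoded data is plain text inside a String, any client that understands text/plain can at least carry the object, even if it cannot render the picture.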
Extending legacy datatypes
• With legacy data-types defined, we can extend them as we see fit
• annotated_jpeg ISA base64_encoded_jpeg
• annotated_jpeg HASA 2D_Coordinate_set
• annotated_jpeg HASA Description

<annotated_jpeg namespace=‘TAIR_Image’ id=‘3343532’>
  <2D_Coordinate_set namespace=‘’ id=‘’ articleName=“pixelCoordinates”> … </2D_Coordinate_set>
  …
</annotated_jpeg>
The same object…
annotated_jpeg ISA base64_encoded_jpeg HASA 2D_Coordinate_set HASA Description
Cross reference types
• Simple – A MOBY Object
– …Incidentally, this avoids the problem of reification that is experienced in RDF
XML Schema?
The Object Ontology allows new data-types WITHOUT new flatfile formats, and without having to understand e.g. XML Schema
• Minimize future heterogeneity
• Improve interoperability without requiring schema-to-schema mapping
XML Schema?
• Object Ontology terms have semantically rich names, but this is primarily for human intuition
  – DNA Sequence
  – Annotated_GIF
• Object Ontology does not define the meaning of an object to the machine
  – No machine-readable semantics
• It does define the representation
  – SYNTAX
A portion of the MOBY-S Object Ontology
…community-built!
BioMoby in detail
• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries
A Moby Central Query
• Give me:
  – Services that consume THIS data-type in THIS syntax…
  – …do SOMETHING LIKE THIS to it…
  – …and provide me THAT data-type in response
Example
• Find me services that
  – consume FASTA sequence data,
  – do a BLAST with it,
  – and provide me lists of GenBank GI numbers in return.
• Query can be any or all of the above criteria
  – Also limit by service provider and service description keyword
Remember!!
Moby Registry Query
INPUT TYPE | | TRANSFORMATION TYPE | | OUTPUT TYPE
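The three-part query can be sketched as a filter over a toy registry. This is a Python illustration under stated assumptions, not the MOBY Central API: the service names, the `find_services` function, both ISA tables, and the matching rules (a service accepts its declared input class or any descendant; a query for a service type also matches more specific types) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical ISA tables (child -> parent); only the class names echo the slides.
OBJECT_ISA: Dict[str, str] = {
    "FASTA": "text/plain",
    "BLAST_Report": "text/plain",
    "text/plain": "Object",
    "GI_List": "Object",
}
SERVICE_ISA: Dict[str, str] = {
    "Analysis": "Service",
    "Alignment": "Analysis",
    "Blast": "Alignment",
    "NCBI_Blast": "Blast",
    "WU_Blast": "Blast",
}


def is_a(child: Optional[str], ancestor: str, isa: Dict[str, str]) -> bool:
    """Walk up the ISA table from `child` looking for `ancestor`."""
    while child is not None:
        if child == ancestor:
            return True
        child = isa.get(child)
    return False


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str
    service_type: str
    output_type: str


def find_services(registry: List[Service],
                  consumes: Optional[str] = None,
                  does: Optional[str] = None,
                  provides: Optional[str] = None) -> List[Service]:
    """Apply whichever of the three criteria were supplied (any or all)."""
    hits = []
    for svc in registry:
        if consumes is not None and not is_a(consumes, svc.input_type, OBJECT_ISA):
            continue
        if does is not None and not is_a(svc.service_type, does, SERVICE_ISA):
            continue
        if provides is not None and not is_a(svc.output_type, provides, OBJECT_ISA):
            continue
        hits.append(svc)
    return hits


registry = [
    Service("hypotheticalNcbiBlast", "FASTA", "NCBI_Blast", "GI_List"),
    Service("hypotheticalWuBlast", "FASTA", "WU_Blast", "BLAST_Report"),
    Service("hypotheticalTextStats", "text/plain", "Analysis", "Object"),
]

blast_to_gi = find_services(registry, consumes="FASTA", does="Blast", provides="GI_List")
print([svc.name for svc in blast_to_gi])
```

Dropping a criterion widens the result: asking only for services that consume FASTA also returns the generic text/plain service, because FASTA ISA text/plain in this toy table.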
A weakness of MOBY
Service discovery is horribly flawed due to insufficiently rich semantics…
The problem with Moby
Chickens go in; Pies come out!
What sort o’ pies?
Apple!
The MOBY-S Service Ontology
• A simple ISA hierarchy…
  – too simple!
• Primitive types include:
  – Analysis
  – Parsing
  – Registration
  – Retrieval
  – Resolution
  – Conversion
  – Rendering
A slice of the Service Ontology
[Figure: nodes shown include Service, Analysis, Alignment, Blast, NCBI_Blast, WU_Blast, Parse_NCBI_Blast, Parse_WU_Blast]
“The Exploding Bicycle” - A. Rector, U Manchester
Summary so far
• BioMoby uses ontologies to describe both data types and data syntaxes
  – This is where the interoperability comes from
  – These are used to match consumers with providers during service discovery
• BioMoby uses a simple ontology to describe bioinformatics operations
  – This ontology is only marginally useful
Seahawk
• Highlight data in your browser and drag/drop it into Moby
• What could be easier than that?!
Paul MK Gordon and Christoph W Sensen BMC Bioinformatics 2007, 8:208
Seahawk: A New Moby Client for Biologists
Drag ‘n’ drop, highlight existing data for use with MOBY Services
Paul Gordon & Christoph Sensen, BMC Bioinformatics, in press
Seahawk looks like a browser
How do I load data?
• Use the “open” button:
  – Text file (e.g. FASTA sequences)
  – HTML page (e.g. NCBI Entrez Web page)
  – RTF document (e.g. conference abstract)
  – MOBY XML document
• Drag ‘n’ Drop
  – Web links and desktop files
  – Highlighted text from open documents or Web pages
Under the Hood (Beneath the Bonnet?)
• Data has to be converted into Moby XML format to be used by Moby
• Moby data has to be converted back to human-readable text for presentation to the biologist
Again: How do I load data?
How do I Find Services?
• Right-click
• MOB rules are invoked
• Resulting Moby XML is used for service search
How do I run a service?
• Click it!
• If necessary, a service’s extra parameters can be set
• Control+click submits using default params
How do I run a service?
• If required inputs are missing, the missing ones must be dragged into place.
• Unrecognized data will be rejected
How do I collate data?
• Seahawk clipboard lets you build collections of objects
• Seahawk “knows” the type of collection and will suggest appropriate Moby services
Seahawk Summary
• Seahawk integrates Moby Web Service discovery and execution into the biologist’s day-to-day “Web Surfing” activity
• It uses Regular Expressions and XSLT to move normal web or hard-drive-file data into and out of BioMoby
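The regular-expression half of that conversion might look roughly like the Python sketch below. It is not Seahawk's actual rule format: the `RULES` table, the function name, and the namespace strings are hypothetical, and the two patterns simply follow the identifier examples from the Namespaces slide (GO:0003476 and gi|163483).

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical recognition rules: (pattern with one capture group, namespace).
RULES = [
    (re.compile(r"\bGO:(\d{7})\b"), "GO"),
    (re.compile(r"\bgi\|(\d+)\b"), "gi"),
]


def text_to_moby_objects(text: str) -> List[ET.Element]:
    """Scan free text and emit one Object per recognised identifier."""
    found = []
    for pattern, namespace in RULES:
        for match in pattern.finditer(text):
            found.append(
                ET.Element("Object", {"namespace": namespace, "id": match.group(1)})
            )
    return found


highlighted = "GO:0003476 a Gene Ontology Term; gi|163483 a GenBank record"
for obj in text_to_moby_objects(highlighted):
    print(ET.tostring(obj, encoding="unicode"))
```

Text that matches no rule yields an empty list, which corresponds to the "unrecognized data will be rejected" behaviour described earlier.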
Why doesn’t Moby Use RDF/OWL?
Timeline of Moby/W3C Activities (2000–2006)
• W3C: RDF Candidate Spec; RDF Schema Candidate Spec; W3C Launches Semantic Web (SW) Activity Group; RDF/OWL Formal W3C Recommendations; Extensive SW toolbuilding…
• BioMoby: BioMoby XML Finalized; BioMoby Project Established; BioMoby Stable 0.85 API Published (>400 services); BioMoby Stable 1.0 API Published
Moby 2.0
Getting it right, the second time!
What BioMoby Already Does
Sequence Data -> BLAST SERVER -> Blast Hit
Sequence Data givesBlastResult Blast Hit (not “biologically” meaningful)
Sequence Data hasHomologyTo Blast Hit
…looks a lot like…
URI hasHomologyTo URI
which is effectively just an RDF triple.
Now think in reverse…
(in case you forgot…)
Moby Registry Query
INPUT TYPE | | TRANSFORMATION TYPE | | OUTPUT TYPE
Moby 2.0
What does Sequence Data Have homology to?
hasHomologyTo Send data
BLAST SERVICE
Blast Hit
Query FIND SERVICES THAT
Consume Sequence Data | | Provide hasHomologyTo Property | | Attached to other Sequence Data
SPARQL
• A Semantic Web query language
• Queries “look like” graphs
Find “X” with predicate “Y” attached to “Z”
Moby 2.0 extends the SPARQL query language
• SPARQL queries contain concepts and the relationships between them (subject, predicate, object)
• We simply map RDF predicates onto Moby services capable of generating that relationship
• Registry query: “What Moby service consumes [subject] and generates the [predicate] relationship type?”
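The predicate-to-service mapping can be sketched as a lookup table keyed on subject type and predicate. This Python fragment is only an illustration of the idea, not the Moby 2.0 implementation: `PREDICATE_REGISTRY`, `answer_pattern`, and the stub service are hypothetical, and the stub returns made-up placeholder identifiers instead of calling a real BLAST service.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def hypothetical_blast_service(sequence_id: str) -> List[str]:
    """Stand-in for a real service call; returns placeholder hit identifiers."""
    return [f"{sequence_id}-hit-{n}" for n in (1, 2)]


# Registry question: which service consumes [subject type] and
# generates the [predicate] relationship?
PREDICATE_REGISTRY: Dict[Tuple[str, str], Callable[[str], List[str]]] = {
    ("SequenceData", "hasHomologyTo"): hypothetical_blast_service,
}


def answer_pattern(subject_type: str, subject: str, predicate: str) -> List[Triple]:
    """Answer the pattern (subject, predicate, ?x) by running a matching service."""
    service = PREDICATE_REGISTRY.get((subject_type, predicate))
    if service is None:
        return []  # no registered service can generate this relationship
    return [(subject, predicate, obj) for obj in service(subject)]


for triple in answer_pattern("SequenceData", "seq42", "hasHomologyTo"):
    print(triple)
```

The triples are produced on demand when the query is evaluated, which is how a query could be answered over data that holds no stored annotations.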
But wait, there’s more!
Exploit knowledge in OWL ontologies to enhance query
• Evaluate Query Expression
• Subject, Predicate: look up and execute Moby service that consumes STK or proteins and looks up inhibitor molecules
• Subject, Predicate: look up and execute Moby service that consumes proteins and generates functional annotation info
This SPARQL query could be posed on a database of RAW, UNANNOTATED Protein sequences, and be answered by Moby 2.0 (a.k.a. CardioSHARE)
Credits
• Genome Canada/Genome Alberta
• myGrid – Carole Goble in particular
• Spanish National Institute for Bioinformatics (INB) through Fundación Genoma España
• Generation Challenge Programme (GCP) of the Consultative Group for International Agricultural Research (CGIAR)
• Heart and Stroke Foundation of BC and Yukon (CardioSHARE)
• Microsoft Research (CardioSHARE)