BioRDF Overview and Update By Kei Cheung, Ph.D. Yale Center for Medical Informatics C-SHALS 2009, Boston, Massachusetts, February 25, 2009



BioRDF

- Objectives
  - Enhance the HCLS KB
  - Increase the value and use of the HCLS KB by identifying scientific use cases
  - Work on a human-friendly user interface
  - Document and publish findings to help accelerate and promote adoption of the Semantic Web
- Participants
  - Universities, pharmaceutical companies, start-up companies, government institutes, W3C, etc.

BioRDF Activities/Tasks

- Invited Talks
  - UMLS, NCBO, NIF, Biogateway, WikiNeuron, Gene Wiki, 3D Web Visualization, BioSIOC/aTag, VoID
- HCLS KB
  - Two instances of HCLS KB have been created
    - DERI (Virtuoso)
    - Neurocommons
    - Free University in Berlin (Allegro Graph)
  - SenseLab and TCMGeneDIT
- Neuroscience use case
  - add receptors to the picture
- aTags
  - Matthias Samwald, Kei Cheung
- Query Federation
  - Kei Cheung, Rob Frost, Kingsley Idehen, Scott Marshall, Adrian Paschke, Eric Prud’hommeaux, Matthias Samwald, Jun Zhao

Brain: Neuron and Synapse

Courtesy of NIDA

aTags

aTags

- Very simple, generic way of expressing biomedical statements
- A short snippet of text + a list of ontology terms used for describing the text
- Using established vocabularies (SIOC, OBO ontologies)
- Encoded in RDFa (easy to embed in existing HTML-based systems)

aTags

Transmitter T seems to activate receptor R
Receptor R is expressed in brain region B
Region B has strong axonal projections into brain region B2
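The structure described for aTags (a text snippet plus a list of ontology terms) can be sketched as a small data type. This is only an illustration: the `ATag` class, the `index_by_term` helper, and the `http://example.org/term/...` identifiers are hypothetical, and a real aTag would be encoded in RDFa using SIOC and OBO terms rather than held as Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ATag:
    """Hypothetical in-memory form of an aTag: a snippet plus ontology terms."""
    snippet: str
    terms: List[str] = field(default_factory=list)


def index_by_term(atags: List[ATag]) -> Dict[str, List[str]]:
    """Map each ontology term to the snippets tagged with it."""
    index: Dict[str, List[str]] = {}
    for tag in atags:
        for term in tag.terms:
            index.setdefault(term, []).append(tag.snippet)
    return index


# Placeholder term identifiers (assumptions, not real ontology URIs).
T = "http://example.org/term/TransmitterT"
R = "http://example.org/term/ReceptorR"
B = "http://example.org/term/RegionB"
B2 = "http://example.org/term/RegionB2"

ATAGS = [
    ATag("Transmitter T seems to activate receptor R", [T, R]),
    ATag("Receptor R is expressed in brain region B", [R, B]),
    ATag("Region B has strong axonal projections into brain region B2", [B, B2]),
]

INDEX = index_by_term(ATAGS)
```

Because the statements share terms (R and B), looking up a term in `INDEX` returns every snippet that mentions it, which is how separate statements can be chained together.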

aTags

- aTags will be created by
  - conversion of existing biomedical datasets
  - manual curation of data (highlight a text snippet in the browser and click on a del.icio.us-like bookmarklet)
- Design philosophy: simplicity and practicality
  - Use existing resources
  - Play along with existing systems (HTML content management, RDFa-enabled search engines)

Query Federation

A Journey to Query Federation: from SPARQL Endpoint to Linked Data

- Application demo
  - Receptor explorer
- Mismatch between Wikipedia and DBpedia
- Comparison of Triplestores
- Linked Data description and deployment
  - voiD
- FeDeRate
- URI

[Diagram labels: Receptor, HCLS KB (DERI), SenseLab, map, DBpedia, Receptors]

Receptor Explorer

[Diagram: VectorC Semantic Service Bus (App, ESB/SOA). Steps shown: genes involved in receptor; publications about gene; clinical trials referencing publications. Labels shown: Entrez Gene, Bio2RDF, PubMed, linkedct.org, Clinicaltrials.gov, RDF; web sites: Wikipedia, PubMed, Clinicaltrials.gov. Copyright 2008 VectorC, LLC]

A Semantic Mismatch between Wikipedia and DBpedia

[Side-by-side panels: Wikipedia, DBpedia]

Triplestore Comparison

Features | Virtuoso | Allegro Graph
Class Hierarchy Inference | Yes | Yes
Linked Data Deployment | Built-in support | 3rd party software (e.g., Pubby)
Query Federation | Linked Data Spaces (SPARQL against resource URIs) | Built-in support (Sesame and Oracle only). For other triplestores, a 3rd party middleware approach is required.

Federated Query (FeDeRate)

[Diagram: a federated query goes through FeDeRate query mediation and is split into local queries 1, 2, ..., n, sent to sources such as DBPedia (RDF) and IUPHAR (SQL)]

Federation Scenario

PREFIX db:
PREFIX re:
PREFIX dp:
SELECT ?abstract ?code ?ligand ?hum_seq_id ?chr ?refseq
FROM NAMED db:IUPHAR.prop
FROM NAMED db:DBPedia.rdf
WHERE {
  # Get info from the (SQL) IUPHAR receptor tables.
  GRAPH db:IUPHAR.prop {
    ?r re:Code ?code .
    ?r re:Ligand ?ligand .
    ?r re:Human_nucleotide ?hum_seq_id }
  # Get info from (RDF) DBPedia.
  GRAPH db:DBPedia.rdf {
    ?p dp:chromosome ?chr .
    ?p dp:refseq ?refseq .
    ?p dp:symbol ?symbol .
    ?p db:abstract ?abstract }
}

Example Join between IUPHAR & DBPedia (GABAB receptor)

[Result panels: IUPHAR, DBPedia]
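The mediation idea behind the federation scenario (one local query per source, then a join of the partial results) can be sketched in a few lines. This is not FeDeRate's implementation: the in-memory SQLite table and the list of triples merely stand in for IUPHAR and DBPedia, all values are placeholders, and joining `re:Code` against `dp:symbol` is an assumption, since the query as transcribed does not show the join condition.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]


def query_sql_source(conn: sqlite3.Connection) -> List[Row]:
    """Local query 1: read receptor rows from the relational source."""
    cursor = conn.execute("SELECT code, ligand, human_nucleotide FROM receptor")
    return [
        {"code": code, "ligand": ligand, "hum_seq_id": seq}
        for code, ligand, seq in cursor
    ]


def query_rdf_source(triples: Iterable[Triple]) -> List[Row]:
    """Local query 2: keep subjects that have all four requested properties."""
    by_subject: Dict[str, Row] = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, {})[predicate] = obj
    wanted = {
        "dp:chromosome": "chr",
        "dp:refseq": "refseq",
        "dp:symbol": "symbol",
        "db:abstract": "abstract",
    }
    rows: List[Row] = []
    for props in by_subject.values():
        if all(predicate in props for predicate in wanted):
            rows.append({var: props[pred] for pred, var in wanted.items()})
    return rows


def join_on_symbol(sql_rows: List[Row], rdf_rows: List[Row]) -> List[Row]:
    """Hash join: match the SQL 'code' against the RDF 'symbol' (assumed key)."""
    by_symbol: Dict[str, List[Row]] = {}
    for row in rdf_rows:
        by_symbol.setdefault(row["symbol"], []).append(row)
    return [
        {**left, **right}
        for left in sql_rows
        for right in by_symbol.get(left["code"], [])
    ]


def demo() -> List[Row]:
    """Run both local queries on placeholder data and join the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE receptor (code TEXT, ligand TEXT, human_nucleotide TEXT)"
    )
    conn.executemany(
        "INSERT INTO receptor VALUES (?, ?, ?)",
        [("CODE-1", "ligand-1", "seq-1"), ("CODE-2", "ligand-2", "seq-2")],
    )
    triples = [
        ("ex:p1", "dp:chromosome", "chr-placeholder"),
        ("ex:p1", "dp:refseq", "refseq-placeholder"),
        ("ex:p1", "dp:symbol", "CODE-1"),
        ("ex:p1", "db:abstract", "abstract-placeholder"),
        ("ex:p2", "dp:symbol", "CODE-3"),  # incomplete subject, filtered out
    ]
    try:
        return join_on_symbol(query_sql_source(conn), query_rdf_source(triples))
    finally:
        conn.close()
```

Calling `demo()` yields one joined row per receptor that appears in both sources. A real mediator would also translate the graph patterns into each source's query language, which this sketch skips.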

voiD: Vocabulary of Interlinked Datasets

- Motivation
  - Effective dataset selection
  - Efficient discovery of datasets, by search engines or data publishers
  - SPARQL query optimisation and query federation
- Two high-level concepts
  - Dataset: a dataset is published and maintained by a single provider and is accessible on the Web through dereferenceable URIs or a SPARQL endpoint
  - Linkset: a subset of a void:Dataset; it stores triples that express the interlinking relationships between datasets
- voiD Vocabulary, http://rdfs.org/ns/void/html
- voiD User's Guide, http://rdfs.org/ns/void-guide

Biological Dataset in voiD Format

:senselabontology a void:Dataset ;
    dcterms:title "SenseLab Neuron Ontology" ;
    dcterms:description "Neuroscience ontology derived from the SenseLab NeuronDB database." ;
    dcterms:license <> ; # TODO
    foaf:homepage ;
    void:exampleResource ;
    void:exampleResource ;
    void:exampleResource ;
    dcterms:creator :senselab ; ## this organization can be further defined
    dcterms:source ;
    dcterms:subject ;
    dcterms:subject ;
    dcterms:subject ;
    dcterms:subject ;
    dcterms:source ;
    void:feature :owl ; ## this technical feature can be further defined
    void:sparqlEndpoint ;
    void:vocabulary .
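A description like the one above can also be generated programmatically. The following is a minimal sketch that emits only a few of the properties shown (title, description, SPARQL endpoint). The helper names are hypothetical, and the endpoint used in the usage note is a placeholder rather than the real SenseLab endpoint.

```python
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
}


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping the characters that need it."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def void_dataset_turtle(
    local_name: str, title: str, description: str, sparql_endpoint: str
) -> str:
    """Return a small Turtle document describing one void:Dataset."""
    lines = ["@prefix %s: <%s> ." % (p, iri) for p, iri in PREFIXES.items()]
    lines.append("@prefix : <#> .")
    lines.append("")
    lines.append(":%s a void:Dataset ;" % local_name)
    lines.append("    dcterms:title %s ;" % turtle_literal(title))
    lines.append("    dcterms:description %s ;" % turtle_literal(description))
    lines.append("    void:sparqlEndpoint <%s> ." % sparql_endpoint)
    return "\n".join(lines) + "\n"
```

For example, `void_dataset_turtle("senselabontology", "SenseLab Neuron Ontology", "Neuroscience ontology derived from the SenseLab NeuronDB database.", "http://example.org/sparql")` returns a Turtle document whose subject is `:senselabontology`; the `example.org` endpoint is an assumption for illustration only.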

voiD Deployment

- Deploy a voiD file (in Turtle, RDF/XML, or RDFa format) onto the Web server
- Make it accessible to search engines, such as Sindice (http://sindice.com/)
- Publish a Semantic Sitemap file (sitemap.xml) on the server
  - “...... allows Data publishers to state where documents containing RDF data are located, and to advertise alternative means to access it ......” [1]
- Use the datasetURI property in the sitemap.xml to point to the voiD description of a dataset, e.g., http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology

[1] http://sw.deri.org/2007/07/sitemapextension/
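The last deployment step can be sketched as a script that writes a sitemap entry pointing at the voiD description. The `datasetURI` property comes from the slide. The extension namespace string and the `dataset` and `datasetLabel` element names are assumptions from memory and should be checked against the Semantic Sitemap specification at [1] before use.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed namespace for the Semantic Sitemap extension; verify against [1].
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_semantic_sitemap(label: str, void_uri: str) -> str:
    """Serialize a sitemap with one dataset entry pointing at a voiD description."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    dataset = ET.SubElement(urlset, "{%s}dataset" % SC_NS)
    ET.SubElement(dataset, "{%s}datasetLabel" % SC_NS).text = label
    ET.SubElement(dataset, "{%s}datasetURI" % SC_NS).text = void_uri
    return ET.tostring(urlset, encoding="unicode")


SITEMAP_XML = build_semantic_sitemap(
    "SenseLab Neuron Ontology",
    "http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology",
)
```

Writing `SITEMAP_XML` to `sitemap.xml` on the server would give crawlers a single place to find the voiD file.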

URI Issues

- Proliferation of synonymous URIs
  - http://dbpedia.org/resource/Dopamine_receptor
  - http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopaminergic_Receptor
- Potential problems
  - Performance
  - Maintenance
- Possible solutions
  - Involvement of a nomenclature committee (e.g., IUPHAR) and a domain authority (e.g., Neuroscience Information Framework or NIF)
  - Persistent/permanent URI scheme (e.g., PURL)
    - E.g., http://purl.org/nif/ontology/NIF-Molecule.owl#nifext_5832
- Dereferenceable URIs
  - A dereferenceable URI is a resource identification mechanism that uses the HTTP protocol to obtain a representation of the resource it identifies
  - For Linked Data, the representation takes the form of an information resource that describes the resource that the URI identifies.
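Dereferencing can be illustrated with a plain HTTP request that asks for an RDF representation through content negotiation. This is a sketch under assumptions: the `Accept` value and the function names are illustrative, and whether a given server honours the header depends on its deployment. The fragment (the part after `#`) is removed because HTTP clients do not send it to the server.

```python
import urllib.request
from urllib.parse import urldefrag

# Illustrative preference list; servers may support other RDF media types.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def build_dereference_request(uri: str) -> urllib.request.Request:
    """Prepare a GET request for the document that describes the resource."""
    document_uri, _fragment = urldefrag(uri)
    return urllib.request.Request(document_uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch the representation over the network and return the raw body."""
    request = build_dereference_request(uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For the SenseLab URI listed above, `build_dereference_request` targets the ontology document `http://purl.org/ycmi/senselab/neuron_ontology.owl`, and the term `Dopaminergic_Receptor` is then looked up inside the returned description.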

Future Directions

- Submit a paper describing the query federation work to a journal or conference
- Continue and extend current tasks: Query Federation and aTag
- Add new tasks
  - e.g., semantic wiki, workflow, user interface, …
- Expand the HCLS KB (both instances)
  - e.g., new datasets such as UMLS
- Collaborate with other task forces
  - e.g., LODD (natural alternative use case, Faviki) and SWAN/SIOC

The End