BioRDF Overview and Update By Kei Cheung, Ph.D. Yale Center for Medical Informatics C-SHALS 2009, Boston, Massachusetts, February 25, 2009



Objectives
• Enhance the HCLS KB
• Increase the value and use of the HCLS KB by identifying scientific use cases
• Work on a human-friendly user interface
• Document and publish findings to help accelerate and promote adoption of the Semantic Web

Participants
• Universities, pharmaceutical companies, start-up companies, government institutes, W3C, etc.

BioRDF Activities/Tasks

• Invited Talks
  – UMLS, NCBO, NIF, Biogateway, WikiNeuron, Gene Wiki, 3D Web Visualization, BioSIOC/aTag, VoID
• HCLS KB
  – Two instances of the HCLS KB have been created
  – DERI (Virtuoso)
  – Neurocommons
  – Free University in Berlin (Allegro Graph)
  – SenseLab and TCMGeneDIT
• Neuroscience use case
  – Add receptors to the picture
• aTags
  – Matthias Samwald, Kei Cheung
• Query Federation
  – Kei Cheung, Rob Frost, Kingsley Idehen, Scott Marshall, Adrian Paschke, Eric Prud’hommeaux, Matthias Samwald, Jun Zhao

Brain: Neuron and Synapse

Courtesy of NIDA



• Very simple, generic way of expressing biomedical statements
• A short snippet of text plus a list of ontology terms used for describing the text
• Uses established vocabularies (SIOC, OBO ontologies)
• Encoded in RDFa (easy to embed in existing HTML-based systems)


Transmitter T seems to activate receptor R.
Receptor R is expressed in brain region B.
Region B has strong axonal projections into brain region B2.


• aTags will be created by
  – conversion of existing biomedical datasets
  – manual curation of data (highlight a text snippet in the browser and click, as with a bookmarklet)
• Design philosophy: simplicity and practicality
  – Use existing resources
  – Play along with existing systems (HTML content management, RDFa-enabled search engines)
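To make the "text snippet plus ontology terms" idea concrete, here is a minimal Python sketch that renders one aTag as an RDFa-annotated HTML fragment. The choice of sioc:Item, sioc:content, and sioc:topic is an assumption made for illustration, and the term URIs are placeholders; the markup actually used for aTags may differ.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class ATag:
    """A short text snippet plus the ontology terms that describe it."""
    text: str
    term_uris: List[str] = field(default_factory=list)

    def to_rdfa(self) -> str:
        """Render the aTag as an HTML fragment carrying RDFa attributes.

        Assumption: sioc:Item / sioc:content / sioc:topic are illustrative
        choices, not necessarily the terms used by real aTags.
        """
        links = "".join(
            f'<a rel="sioc:topic" href="{escape(uri, quote=True)}"></a>'
            for uri in self.term_uris
        )
        return (
            '<div xmlns:sioc="http://rdfs.org/sioc/ns#" typeof="sioc:Item">'
            f'<span property="sioc:content">{escape(self.text)}</span>'
            f"{links}</div>"
        )


# Statement taken from the slides; the term URIs are hypothetical placeholders.
tag = ATag(
    text="Transmitter T seems to activate receptor R",
    term_uris=["http://example.org/term/T", "http://example.org/term/R"],
)
html_fragment = tag.to_rdfa()
print(html_fragment)
```

Because the output is ordinary HTML, a fragment like this could be pasted into an existing page or content management system without changing how the page renders.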

Query Federation

A Journey to Query Federation: from SPARQL Endpoint to Linked Data

• Application demo
• Receptor explorer
• Mismatch between Wikipedia and DBpedia
• Comparison of Triplestores
• Linked Data description and deployment
• voiD
• FeDeRate
• URI

[Diagram labels: HCLS KB (DERI), SenseLab, Receptor, map, DBpedia Receptors]

Receptor Explorer

[Figure: Receptor Explorer on the VectorC Semantic Service Bus. Labels: genes involved in receptor; publications about gene; clinical trials referencing publications; Entrez Gene; Bio2RDF; PubMed; Wikipedia; web sites]

Copyright 2008 VectorC, LLC

A Semantic Mismatch between Wikipedia and DBpedia

[Figure panels: Wikipedia, DBpedia]

Triplestore Comparison


Class Hierarchy Inference Linked Data Deployment Query Federation



Allegro Graph

Yes
Built-in support
Linked Data Spaces (SPARQL against resource URIs)
3rd-party software (e.g., Pubby)
Built-in support (Sesame and Oracle only). For other triplestores, a 3rd-party middleware approach is required.

Federated Query (FeDeRate)

[Diagram: a federated query is passed to FeDeRate (query mediation), which issues local query 1, local query 2, ..., local query n against sources such as DBPedia (RDF) and IUPHAR (SQL)]

Federation Scenario

PREFIX db:
PREFIX re:
PREFIX dp:

SELECT ?abstract ?code ?ligand ?hum_seq_id ?chr ?refseq
FROM NAMED db:DBPedia.rdf
WHERE {
  # Get info from the (SQL) IUPHAR receptor tables.
  GRAPH db:IUPHAR.prop {
    ?r re:Code ?code .
    ?r re:Ligand ?ligand .
    ?r re:Human_nucleotide ?hum_seq_id }
  # Get info from (RDF) DBPedia.
  GRAPH db:DBPedia.rdf {
    ?p dp:chromosome ?chr .
    ?p dp:refseq ?refseq .
    ?p dp:symbol ?symbol .
    ?p db:abstract ?abstract }
}

Example Join between IUPHAR & DBPedia (GABAB receptor)
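The federation scenario above (one local query per source, results combined by the mediator) can be sketched in Python as a simple hash join. This is not FeDeRate's implementation: the two source functions, the placeholder rows, and the choice of "symbol" as the join key are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


# Hypothetical stand-ins for the two sources. In a real deployment each
# callable would run a local query (SQL for IUPHAR, SPARQL for DBPedia).
def query_iuphar() -> List[Row]:
    return [
        {"symbol": "GENE_A", "code": "code-1", "ligand": "ligand-1"},
        {"symbol": "GENE_B", "code": "code-2", "ligand": "ligand-2"},
    ]


def query_dbpedia() -> List[Row]:
    return [
        {"symbol": "GENE_A", "chr": "chr-x", "refseq": "refseq-1"},
        {"symbol": "GENE_C", "chr": "chr-y", "refseq": "refseq-3"},
    ]


def federate(sources: List[Callable[[], List[Row]]], key: str) -> List[Row]:
    """Run each local query, then hash-join the result sets on `key`."""
    joined = sources[0]()
    for source in sources[1:]:
        # Index the next source's rows by the join key.
        index: Dict[str, List[Row]] = {}
        for row in source():
            index.setdefault(row[key], []).append(row)
        # Keep only rows whose key appears on both sides, merging columns.
        joined = [
            {**left, **right}
            for left in joined
            for right in index.get(left[key], [])
        ]
    return joined


# Assumed join key: a gene symbol shared by both sources.
results = federate([query_iuphar, query_dbpedia], key="symbol")
print(results)
```

Only rows present in both sources survive the join, which mirrors how a federated query returns a receptor only when both the SQL and the RDF side have matching data.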


voiD: Vocabulary of Interlinked Datasets

• Motivation
  – Effective dataset selection
  – Efficient discovery of datasets, by search engines or data publishers
  – SPARQL query optimisation and query federation
• Two high-level concepts
  – Dataset: a dataset is published and maintained by a single provider and accessible on the Web through dereferenceable URIs or a SPARQL endpoint
  – Linkset: a subset of a void:Dataset; stores triples that express the interlinking relationship between datasets
• voiD Vocabulary
• voiD User's Guide

Biological Dataset in voiD Format

:senselabontology a void:Dataset ;
    dcterms:title "SenseLab Neuron Ontology" ;
    dcterms:description "Neuroscience ontology derived from the SenseLab NeuronDB database." ;
    dcterms:license <> ; # TODO
    foaf:homepage ;
    void:exampleResource ;
    void:exampleResource ;
    void:exampleResource ;
    dcterms:creator :senselab ; ## this organization can be further defined
    dcterms:source ;
    dcterms:subject ;
    dcterms:subject ;
    dcterms:subject ;
    dcterms:subject ;
    dcterms:source ;
    void:feature :owl ; ## this technical feature can be further defined
    void:sparqlEndpoint ;
    void:vocabulary .
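One motivation for descriptions like the one above is dataset selection. The following Python sketch shows the idea under stated assumptions: the DatasetDescription class is a hypothetical in-memory stand-in for a parsed voiD description, and every URI in the catalog is a placeholder rather than a real dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class DatasetDescription:
    """Hypothetical holder for the few voiD/Dublin Core fields used here."""
    uri: str
    title: str                              # dcterms:title
    subjects: Set[str]                      # dcterms:subject values
    sparql_endpoint: Optional[str] = None   # void:sparqlEndpoint, if any


def select_datasets(
    catalog: List[DatasetDescription], subject: str
) -> List[DatasetDescription]:
    """Keep datasets that cover `subject` and expose a SPARQL endpoint."""
    return [d for d in catalog if subject in d.subjects and d.sparql_endpoint]


# Placeholder catalog; none of these URIs refer to real resources.
catalog = [
    DatasetDescription(
        uri="http://example.org/void#neurons",
        title="Example neuron dataset",
        subjects={"http://example.org/subject/Neuron"},
        sparql_endpoint="http://example.org/sparql",
    ),
    DatasetDescription(
        uri="http://example.org/void#dump-only",
        title="Example dataset without an endpoint",
        subjects={"http://example.org/subject/Neuron"},
    ),
]

selected = select_datasets(catalog, "http://example.org/subject/Neuron")
print([d.title for d in selected])
```

A federation engine could run a filter like this before query planning, so that only datasets that are both relevant and queryable receive a local query.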

voiD Deployment

• Deploy a voiD file (in Turtle, RDF/XML, or RDFa format) onto the Web server
• Make it accessible to search engines, such as Sindice
• Publish a Semantic Sitemap file (sitemap.xml) on the server
  – “... allows Data publishers to state where documents containing RDF data are located, and to advertise alternative means to access it ...” [1]
• Use the datasetURI property in the sitemap.xml to point to the voiD description of a dataset, e.g.,

[1]
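As a rough illustration of the last deployment step, the Python sketch below builds a sitemap.xml containing a dataset entry whose datasetURI points at a voiD description. The Semantic Sitemap extension namespace and the datasetLabel element are given from memory and should be treated as assumptions to verify against the Semantic Sitemap documentation; the label and URI are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: namespace of the Semantic Sitemap extension.
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_sitemap(dataset_label: str, void_uri: str) -> str:
    """Return sitemap XML with one dataset entry pointing at a voiD file."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC_NS}}}dataset")
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetLabel").text = dataset_label
    # datasetURI carries the URI of the dataset's voiD description.
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetURI").text = void_uri
    return ET.tostring(urlset, encoding="unicode")


# Placeholder label and URI, for illustration only.
sitemap_xml = build_sitemap(
    "Example dataset", "http://example.org/void.ttl#dataset"
)
print(sitemap_xml)
```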

URI Issues

• Proliferation of synonymous URIs
  – Potential problems
    – Performance
    – Maintenance
  – Possible solutions
    – Involvement of nomenclature committee (e.g., IUPHAR) and domain authority (e.g., Neuroscience Information Framework or NIF)
    – Persistent/permanent URI scheme (e.g., PURL)
    – E.g.,
• Dereferenceable URIs
  – A dereferenceable URI is a resource identification mechanism that uses the HTTP protocol to obtain a representation of the resource it identifies
  – For Linked Data, the representation takes the form of an information resource that describes the resource that the URI identifies
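Dereferencing, as described above, amounts to an HTTP GET that asks for an RDF representation. A minimal Python sketch follows, assuming the server supports content negotiation; the Accept value is one reasonable choice rather than a required one, and the example URI is a placeholder.

```python
import urllib.request

# Assumed preference order: Turtle first, RDF/XML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_dereference_request(uri: str) -> urllib.request.Request:
    """Build an HTTP GET that asks for an RDF representation of `uri`."""
    # A fragment identifier is never sent to the server, so drop it.
    document_uri = uri.split("#", 1)[0]
    return urllib.request.Request(document_uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch the description of `uri` (this performs a network call)."""
    request = build_dereference_request(uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Placeholder URI; only the request is built here, nothing is fetched.
request = build_dereference_request("http://example.org/resource/receptor#this")
print(request.full_url, request.get_header("Accept"))
```

Separating request construction from the network call keeps the negotiation logic easy to inspect without contacting a server.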

Future Directions

• Submit a paper describing the query federation work to a journal or conference
• Continue and extend current tasks: Query Federation and aTag
• Add new tasks
  – e.g., semantic wiki, workflow, user interface, …
• Expand the HCLS KB (both instances)
  – e.g., new datasets such as UMLS
• Collaborate with other task forces
  – e.g., LODD (natural alternative use case, Faviki) and SWAN/SIOC

The End