BioRDF Overview and Update
By Kei Cheung, Ph.D., Yale Center for Medical Informatics
C-SHALS 2009, Boston, Massachusetts, February 25, 2009
BioRDF
Objectives
• Enhance the HCLS KB
• Increase the value and use of the HCLS KB by identifying scientific use cases
• Work on a human-friendly user interface
• Document and publish findings to help accelerate and promote adoption of the Semantic Web
Participants
• Universities, pharmaceutical companies, start-up companies, government institutes, W3C, etc.
BioRDF Activities/Tasks
• Invited Talks
– UMLS, NCBO, NIF, Biogateway, WikiNeuron, Gene Wiki, 3D Web Visualization, BioSIOC/aTag, voiD
• HCLS KB
– Two instances of HCLS KB have been created
– DERI (Virtuoso)
– Neurocommons
– Free University in Berlin (Allegro Graph)
– SenseLab and TCMGeneDIT
• Neuroscience use case
– Add receptors to the picture
• aTags
– Matthias Samwald, Kei Cheung
• Query Federation
– Kei Cheung, Rob Frost, Kingsley Idehen, Scott Marshall, Adrian Paschke, Eric Prud’hommeaux, Matthias Samwald, Jun Zhao
Brain: Neuron and Synapse
Courtesy of NIDA
aTags
• Very simple, generic way of expressing biomedical statements
• A short snippet of text + a list of ontology terms used for describing the text
• Uses established vocabularies (SIOC, OBO ontologies)
• Encoded in RDFa (easy to embed in existing HTML-based systems)
aTags
• Transmitter T seems to activate receptor R
• Receptor R is expressed in brain region B
• Region B has strong axonal projections into brain region B2
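The aTag idea (a short text snippet plus a list of ontology terms, encoded in RDFa) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ATag` class, the `render_rdfa` function, the choice of `sioc:Item`, `sioc:content`, and `sioc:topic` as the carrying terms, and the `example.org` term URIs are all hypothetical and are not taken from the aTag specification.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List

SIOC_NS = "http://rdfs.org/sioc/ns#"


@dataclass
class ATag:
    """Hypothetical container: one text snippet plus ontology term URIs."""
    snippet: str
    term_uris: List[str] = field(default_factory=list)


def render_rdfa(tag: ATag) -> str:
    """Render the aTag as an HTML fragment with RDFa attributes.

    The markup layout is an assumption made for illustration only.
    """
    # One empty link per ontology term describing the snippet.
    links = "".join(
        f'<a rel="sioc:topic" href="{escape(uri, quote=True)}"></a>'
        for uri in tag.term_uris
    )
    return (
        f'<div xmlns:sioc="{SIOC_NS}" typeof="sioc:Item">'
        f'<span property="sioc:content">{escape(tag.snippet)}</span>'
        f"{links}</div>"
    )


# Example statement from the slide; the term URIs are placeholders.
tag = ATag(
    snippet="Transmitter T seems to activate receptor R",
    term_uris=[
        "http://example.org/term/TransmitterT",
        "http://example.org/term/ReceptorR",
    ],
)
html_fragment = render_rdfa(tag)
print(html_fragment)
```

Because the output is ordinary HTML with a few extra attributes, a fragment like this could be pasted into an existing page without changing how the page renders.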
aTags
• aTags will be created by
– conversion of existing biomedical datasets
– manual curation of data (highlight a text snippet in the browser and click on a del.icio.us-like bookmarklet)
• Design philosophy: simplicity and practicality
– Use existing resources
– Play along with existing systems (HTML content management, RDFa-enabled search engines)
Query Federation
A Journey to Query Federation: from SPARQL Endpoint to Linked Data
• Application demo
• Receptor explorer
• Mismatch between Wikipedia and DBpedia
• Comparison of Triplestores
• Linked Data description and deployment
• voiD
• FeDeRate
• URI
[Figure labels: Receptor; HCLS KB; DERI; SenseLab; map; DBpedia; Receptors]
Receptor Explorer
[Figure labels: Genes involved in receptor; Publications about gene; Clinical trials referencing publications; App; VectorC Semantic Service Bus (ESB/SOA); Entrez Gene; Bio2RDF; PubMed; linkedct.org; Clinicaltrials.gov; RDF; web sites (Wikipedia, PubMed, Clinicaltrials.gov). Copyright 2008 VectorC, LLC]
A Semantic Mismatch between Wikipedia and DBpedia
[Figure panels: Wikipedia; DBpedia]
Triplestore Comparison
Features | Virtuoso | Allegro Graph
Class Hierarchy Inference | Yes | Yes
Linked Data Deployment | Built-in support | 3rd party software (e.g., Pubby)
Query Federation | Linked Data Spaces (SPARQL against resource URIs) | Built-in support (Sesame and Oracle only). For other triplestores, a 3rd party middleware approach is required.
Federated Query (FeDeRate)
[Figure: a federated query passes through FeDeRate query mediation and is split into local queries 1, 2, ..., n against sources such as DBpedia (RDF) and IUPHAR (SQL)]
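The mediation pattern named here (one federated query, several local queries, results combined) can be illustrated with a small Python sketch. This is not FeDeRate's implementation: it skips query rewriting entirely and only shows the final step of combining per-source variable bindings with a natural join. The function names, the two stand-in sources, and all data values are made up for illustration.

```python
from typing import Callable, Dict, List

Binding = Dict[str, str]                     # variable name -> value
LocalQuery = Callable[[], List[Binding]]     # one sub-query against one source


def natural_join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Join two binding sets on every variable they have in common."""
    joined: List[Binding] = []
    for lb in left:
        for rb in right:
            shared = lb.keys() & rb.keys()
            if all(lb[var] == rb[var] for var in shared):
                joined.append({**lb, **rb})
    return joined


def mediate(local_queries: List[LocalQuery]) -> List[Binding]:
    """Run each local query and fold the results together."""
    results: List[Binding] = [{}]  # a single empty binding is the join identity
    for run in local_queries:
        results = natural_join(results, run())
    return results


# Stand-ins for a SQL-backed source and an RDF-backed source (made-up data).
def query_sql_source() -> List[Binding]:
    return [{"gene": "G1", "ligand": "L1"}, {"gene": "G2", "ligand": "L2"}]


def query_rdf_source() -> List[Binding]:
    return [{"gene": "G1", "chr": "chrA"}, {"gene": "G3", "chr": "chrB"}]


merged = mediate([query_sql_source, query_rdf_source])
print(merged)
```

Only rows that agree on the shared variable (`gene` in this toy data) survive the join, which is the behaviour a cross-source query relies on.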
Federation Scenario
PREFIX db:
FROM NAMED db:IUPHAR.prop
FROM NAMED db:DBPedia.rdf
WHERE {
  # Get info from the (SQL) IUPHAR receptor tables.
  GRAPH db:IUPHAR.prop {
    ?r re:Code ?code .
    ?r re:Ligand ?ligand .
    ?r re:Human_nucleotide ?hum_seq_id }
  # Get info from (RDF) DBPedia.
  GRAPH db:DBPedia.rdf {
    ?p dp:chromosome ?chr .
    ?p dp:refseq ?refseq .
    ?p dp:symbol ?symbol .
    ?p db:abstract ?abstract }
}
Example Join between IUPHAR & DBpedia (GABAB receptor)
[Figure panels: IUPHAR; DBpedia]
voiD: Vocabulary of Interlinked Datasets
• Motivation
– Effective dataset selection
– Efficient discovery of datasets, by search engines or data publishers
– SPARQL query optimisation and query federation
• Two high-level concepts
– Dataset: a dataset is published and maintained by a single provider and is accessible on the Web through dereferenceable URIs or a SPARQL endpoint
– Linkset: a subset of a void:Dataset that stores triples expressing the interlinking relationship between datasets
• voiD Vocabulary, http://rdfs.org/ns/void/html
• voiD User's Guide, http://rdfs.org/ns/void-guide
Biological Dataset in voiD Format
:senselabontology a void:Dataset ;
    dcterms:title "SenseLab Neuron Ontology" ;
    dcterms:description "Neuroscience ontology derived from the SenseLab NeuronDB database." ;
    dcterms:license <> ; # TODO
    foaf:homepage
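A description like the one above can also be generated programmatically. The following Python sketch builds a small Turtle document for a `void:Dataset` using plain string formatting. The helper names and the `http://example.org/sparql` endpoint are assumptions for illustration; only the title and description strings come from the slide, and the license and homepage fields are left out because their values are not given.

```python
from typing import Optional

# Standard namespace IRIs for voiD and Dublin Core terms.
PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix : <#> .\n"
)


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def void_dataset_turtle(
    local_name: str,
    title: str,
    description: str,
    sparql_endpoint: Optional[str] = None,
) -> str:
    """Return a Turtle document describing one void:Dataset."""
    lines = [
        f":{local_name} a void:Dataset ;",
        f"    dcterms:title {turtle_literal(title)} ;",
        f"    dcterms:description {turtle_literal(description)}",
    ]
    if sparql_endpoint is not None:
        lines[-1] += " ;"
        lines.append(f"    void:sparqlEndpoint <{sparql_endpoint}>")
    lines[-1] += " ."  # terminate the final statement
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


void_doc = void_dataset_turtle(
    "senselabontology",
    "SenseLab Neuron Ontology",
    "Neuroscience ontology derived from the SenseLab NeuronDB database.",
    sparql_endpoint="http://example.org/sparql",  # placeholder endpoint
)
print(void_doc)
```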
voiD Deployment
• Deploy a voiD file (in Turtle, RDF/XML, or RDFa format) onto the Web server
• Make it accessible to search engines, such as Sindice (http://sindice.com/)
• Publish a Semantic Sitemap file (sitemap.xml) on the server
– “...... allows Data publishers to state where documents containing RDF data are located, and to advertise alternative means to access it ......” [1]
• Use the datasetURI property in the sitemap.xml to point to the voiD description of a dataset, e.g., http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology
[1] http://sw.deri.org/2007/07/sitemapextension/
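As a rough illustration of the last step, the Python sketch below writes a minimal sitemap.xml whose dataset entry points at the voiD description. The element names (`sc:dataset`, `sc:datasetLabel`, `sc:datasetURI`) and the `sc` namespace IRI reflect my reading of the Semantic Sitemap extension [1] and should be checked against it; the `build_sitemap` helper itself is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed namespace for the Semantic Sitemap extension; verify against [1].
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_sitemap(label: str, dataset_uri: str) -> str:
    """Return sitemap XML with one dataset entry pointing at a voiD description."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC_NS}}}dataset")
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetLabel").text = label
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetURI").text = dataset_uri
    return ET.tostring(urlset, encoding="unicode")


VOID_URI = (
    "http://neuroweb.med.yale.edu/senselab/"
    "senselab-void.ttl#senselabontology"
)
sitemap_xml = build_sitemap("SenseLab Neuron Ontology", VOID_URI)
print(sitemap_xml)
```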
URI Issues
• Proliferation of synonymous URIs
– http://dbpedia.org/resource/Dopamine_receptor
– http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopaminergic_Receptor
• Potential problems
– Performance
– Maintenance
• Possible solutions
– Involvement of a nomenclature committee (e.g., IUPHAR) and a domain authority (e.g., Neuroscience Information Framework, or NIF)
– Persistent/permanent URI scheme (e.g., PURL), e.g., http://purl.org/nif/ontology/NIF-Molecule.owl#nifext_5832
• Dereferenceable URIs
– A dereferenceable URI is a resource identification mechanism that uses the HTTP protocol to obtain a representation of the resource it identifies
– For Linked Data, the representation takes the form of an information resource that describes the resource that the URI identifies
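One way to picture the maintenance cost of synonymous URIs is a lookup that maps every synonym to a single representative before data are joined. The Python sketch below does this with a small union-find structure. It is an illustration only: the `SynonymIndex` class is hypothetical, the representative it picks is arbitrary, and it is not one of the solutions proposed on the slide.

```python
from typing import Dict


class SynonymIndex:
    """Hypothetical index grouping URIs that denote the same thing."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, uri: str) -> str:
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited URI straight at the root.
        while self._parent[uri] != root:
            next_uri = self._parent[uri]
            self._parent[uri] = root
            uri = next_uri
        return root

    def add_synonym(self, uri_a: str, uri_b: str) -> None:
        """Record that two URIs identify the same resource."""
        root_a, root_b = self._find(uri_a), self._find(uri_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def canonical(self, uri: str) -> str:
        """Return the representative URI for this URI's synonym group."""
        return self._find(uri)


DBPEDIA = "http://dbpedia.org/resource/Dopamine_receptor"
SENSELAB = (
    "http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopaminergic_Receptor"
)

index = SynonymIndex()
index.add_synonym(DBPEDIA, SENSELAB)
print(index.canonical(SENSELAB) == index.canonical(DBPEDIA))
```

Every added synonym pair is another entry someone has to curate, which is why a shared persistent URI scheme reduces the need for an index like this.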
Future Directions
• Submit a paper describing the query federation work to a journal or conference
• Continue and extend current tasks: Query Federation and aTag
• Add new tasks, e.g., semantic wiki, workflow, user interface, …
• Expand the HCLS KB (both instances), e.g., new datasets such as UMLS
• Collaborate with other task forces, e.g., LODD (natural alternative use case, Faviki) and SWAN/SIOC