Provenir ontology: Towards a Framework for eScience Provenance Management

Satya S. Sahoo, Amit P. Sheth
Kno.e.sis Center, Wright State University

Microsoft eScience Workshop 2009, Pittsburgh, Oct 16

Outline

• Provenance: A Tale of Two Use Cases
• Provenance Ontologies: A Modular Approach
• Provenir: A Foundational Model of Provenance
• Provenance Query Infrastructure
• Application to Parasite Research

Provenance in GlycoProtein Analysis

[Workflow diagram. Recoverable labels, in the order they appear:]
• Cell Culture -> extract -> Glycoprotein Fraction
• proteolysis (proteolytic enzyme) -> Glycopeptides Fraction
• Separation technique I (1 -> n) -> Glycopeptides Fraction
• PNGase -> Peptide Fraction
• Separation technique II (n -> n*m) -> Peptide Fraction
• Mass spectrometry -> ms data, ms/ms data
• Data reduction -> ms peaklist, ms/ms peaklist
• binning -> N-dimensional array
• Peptide identification -> Peptide list
• Signal integration, Data correlation -> Parent protein and peptide list

Provenance in Parasite Research

[Workflow diagram: Gene Knockout and Strain Creation *. Recoverable labels:]
• Gene Name -> Sequence Extraction -> 3' & 5' Region
• Drug Resistant Plasmid; Plasmid Construction -> Knockout Construct Plasmid
• T.cruzi sample; Transfection -> Transfected Sample
• Drug Selection -> Selected Sample
• Cell Cloning -> Cloned Sample

• Provenance, from the French word “provenir”, describes the lineage or history of a data entity
• For Verification and Validation of Data Integrity, Process Quality, and Trust
• Issues in Provenance Management
o Interoperability
o Consistent Modeling
o Reduce Terminological Heterogeneity

* T.cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia

Ontologies for Provenance Modeling

• Advantages of using Ontologies
o Formal Description: Machine Readability, Consistent Interpretation
o Use Reasoning: Knowledge Discovery over Large Datasets
• Problem: A gigantic, monolithic Provenance Ontology is not feasible
• Solution: Modular Approach using a Foundational Ontology

[Diagram labels: FOUNDATIONAL ONTOLOGY; PARASITE EXPERIMENT; GLYCOPROTEIN EXPERIMENT; OCEANOGRAPHY]

Provenir Ontology

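[Diagram: the gene knockout workflow annotated with the Provenir classes AGENT, PROCESS, and DATA and the has_agent property. Recoverable labels:]
• Transfection Machine
• Gene Name, Sequence Extraction, 3' & 5' Region
• Drug Resistant Plasmid, Plasmid Construction, Knockout Construct Plasmid
• T.cruzi sample, Transfection, Transfected Sample
• Drug Selection, Selected Sample
• Cell Cloning, Cloned Sample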

Provenir Ontology Schema

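[Schema diagram. Recoverable labels:]
• Classes: AGENT, PROCESS, DATA, DATA COLLECTION, PARAMETER, SPATIAL, TEMPORAL, THEMATIC
• is_a: DATA COLLECTION is_a DATA; PARAMETER is_a DATA; SPATIAL, TEMPORAL, and THEMATIC is_a PARAMETER
• has_agent: links PROCESS and AGENT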
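The class hierarchy named on the schema slide can be sketched as a small lookup table. This is only an illustration, not the ontology's actual OWL encoding: the lowercase class names, the `ancestors` and `is_subclass_of` helpers, and the two domain-specific entries (`transfection`, `transfection_machine`, with assumed parents) are hypothetical choices made for the example.

```python
from typing import Dict, List

# is_a links read off the schema slide (child -> parent).
# Lowercase names are an illustrative convention, not the ontology's identifiers.
IS_A: Dict[str, str] = {
    "data_collection": "data",
    "parameter": "data",
    "spatial": "parameter",
    "temporal": "parameter",
    "thematic": "parameter",
}


def ancestors(cls: str, is_a: Dict[str, str]) -> List[str]:
    """Return the chain of parents of cls, nearest first."""
    chain: List[str] = []
    current = is_a.get(cls)
    while current is not None:
        chain.append(current)
        current = is_a.get(current)
    return chain


def is_subclass_of(cls: str, ancestor: str, is_a: Dict[str, str]) -> bool:
    """True if cls equals ancestor or reaches it by following is_a links."""
    return cls == ancestor or ancestor in ancestors(cls, is_a)


# A domain ontology extends the foundational table with its own is_a links.
# The parents chosen here are assumptions made for illustration.
DOMAIN_IS_A: Dict[str, str] = dict(IS_A)
DOMAIN_IS_A.update({
    "transfection": "process",
    "transfection_machine": "agent",
})
```

Keeping the foundational table separate from the domain table mirrors the modular approach: each domain adds entries without modifying the shared classes.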

Domain-specific Provenance: Parasite Experiment ontology

[Diagram: PARASITE EXPERIMENT ONTOLOGY classes linked by is_a to PROVENIR ONTOLOGY classes. Recoverable labels:]
• Provenir ontology classes: process, agent, data, data_collection, parameter, spatial_parameter, temporal_parameter, domain_parameter
• Parasite Experiment ontology classes: transfection_machine, location, drug_selection, transfection, strain_creation_protocol, cell_cloning, sample, Tcruzi_sample, transfection_buffer, Time:DateTime, Description
• Properties: has_agent, has_participant, has_parameter

* Parasite Experiment ontology available at: http://wiki.knoesis.org/index.php/Trykipedia

Trident Ontology for Oceanography

Provenance Query Classification

Classified Provenance Queries into Three Categories
• Type 1: Querying for Provenance Metadata
o Example: Which gene was used to create the cloned sample with ID = 65?
• Type 2: Querying for Specific Data Set
o Example: Find all knockout construct plasmids created by researcher Michelle using the “Hygromycin” drug resistant plasmid between April 25, 2008 and August 15, 2008
• Type 3: Operations on Provenance Metadata
o Example: Were the two cloned samples 65 and 46 prepared under similar conditions? Compare the associated provenance information

Provenance Query Operators

Four Query Operators, based on the Query Classification
• provenance(): Closure operation; returns the complete set of provenance metadata for the input data entity
• provenance_context(): Given a set of constraints defined on provenance, retrieves the datasets that satisfy the constraints
• provenance_compare(): Compares two sets of provenance information; adapts the RDF graph equivalence definition
• provenance_merge(): Combines two sets of provenance information using the RDF graph merge
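To make the four operators concrete, here is a minimal sketch over an in-memory set of triples. It is not the query engine's implementation: the toy data, the predicate names `derived_from` and `created_by`, and the function signatures are all assumptions. It also simplifies RDF semantics by treating every node as a named (ground) node, so graph equivalence reduces to set equality and graph merge to set union; blank nodes are not handled.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data; identifiers and predicates are illustrative only.
TRIPLES: Set[Triple] = {
    ("cloned_sample_65", "derived_from", "selected_sample_65"),
    ("selected_sample_65", "derived_from", "transfected_sample_65"),
    ("transfected_sample_65", "derived_from", "knockout_plasmid_A"),
    ("knockout_plasmid_A", "created_by", "Michelle"),
    ("cloned_sample_46", "derived_from", "selected_sample_46"),
    ("selected_sample_46", "derived_from", "knockout_plasmid_A"),
}


def provenance(entity: str, triples: Set[Triple]) -> Set[Triple]:
    """Closure: every triple reachable from entity via subject -> object links."""
    result: Set[Triple] = set()
    seen = {entity}
    frontier = [entity]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node:
                result.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append(o)
    return result


def provenance_context(constraints: Iterable[Tuple[str, str]],
                       triples: Set[Triple]) -> Set[str]:
    """Subjects whose provenance contains every (predicate, object) constraint."""
    wanted = set(constraints)
    subjects = {s for s, _, _ in triples}
    return {
        s for s in subjects
        if wanted <= {(p, o) for _, p, o in provenance(s, triples)}
    }


def provenance_compare(g1: Set[Triple], g2: Set[Triple]) -> bool:
    """Simplified equivalence for ground graphs: identical triple sets."""
    return set(g1) == set(g2)


def provenance_merge(g1: Set[Triple], g2: Set[Triple]) -> Set[Triple]:
    """Simplified merge for ground graphs: union of the triple sets."""
    return set(g1) | set(g2)
```

A Type 1 query then corresponds to `provenance("cloned_sample_65", TRIPLES)`, a Type 2 query to `provenance_context([("created_by", "Michelle")], TRIPLES)`, and a Type 3 query to comparing or merging the results of two `provenance` calls.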

Provenance Query Engine Architecture

• Available as API for integration with provenance management systems
• Input:
o Type of provenance query operator: provenance()
o Input value to query operator: cloned sample 65
o User details to connect to underlying Oracle RDF store

[Diagram labels: TRANSITIVE CLOSURE; QUERY OPTIMIZER]
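The three inputs listed on this slide can be pictured as a single request object. The sketch below is purely illustrative and does not reflect the engine's real API: `QueryRequest`, `StoreCredentials`, their field names, and the placeholder credential values are hypothetical; only the four operator names and the example input come from the slides.

```python
from dataclasses import dataclass

# Operator names as listed on the query operators slide.
OPERATORS = (
    "provenance",
    "provenance_context",
    "provenance_compare",
    "provenance_merge",
)


@dataclass(frozen=True)
class StoreCredentials:
    """Hypothetical user details for connecting to the RDF store."""
    user: str
    password: str
    connection_string: str


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical bundle of the three inputs named on the slide."""
    operator: str
    input_value: str
    credentials: StoreCredentials

    def __post_init__(self) -> None:
        # Reject operator names outside the four defined operators.
        if self.operator not in OPERATORS:
            raise ValueError(f"unknown query operator: {self.operator!r}")


# The slide's example: provenance() applied to cloned sample 65.
# Credential values are placeholders, not real connection details.
example = QueryRequest(
    operator="provenance",
    input_value="cloned sample 65",
    credentials=StoreCredentials("user", "secret", "placeholder-connection"),
)
```

Validating the operator name when the request is built keeps malformed requests from reaching the store.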

T.cruzi SPSE Provenance Management System

Conclusions

• Provenir ontology as a foundational model for provenance
• Extensible to model domain-specific provenance
o Parasite Experiment ontology
o Trident ontology
o ProPreO ontology
• Query Infrastructure to support provenance modeled using Provenir ontology
• Application in an NIH-funded project for Parasite Research

Acknowledgement

• Roger Barga - Microsoft Research, eScience
• D. Brent Weatherly - Center for Tropical and Emerging Diseases, University of Georgia
• Flora Logan - The Wellcome Trust Sanger Institute, Cambridge, UK
• Raghava Mutharaju - Kno.e.sis Center, Wright State University
• Pramod Anantharam - Kno.e.sis Center, Wright State University

References

• Provenir ontology: http://wiki.knoesis.org/index.php/Provenir_Ontology
• Provenance Management in Parasite Research: http://knoesis.wright.edu/library/resource.php?id=00712
• Provenance Management Framework: http://knoesis.wright.edu/research/semsci/application_domain/sem_prov/
• T.cruzi Semantic Problem Solving Environment: http://knoesis.wright.edu/research/semsci/application_domain/sem_life_sci/tcruzi_pse/