Provenir ontology: Towards a Framework for eScience Provenance Management
Satya S. Sahoo, Amit P. Sheth
Kno.e.sis Center, Wright State University
Microsoft eScience Workshop 2009, Pittsburgh, Oct 16
Outline
• Provenance: A Tale of Two Use Cases
• Provenance Ontologies: A Modular Approach
• Provenir: A Foundational Model of Provenance
• Provenance Query Infrastructure
• Application to Parasite Research
Provenance in GlycoProtein Analysis
[Figure: glycoprotein analysis workflow. Cell Culture → (extract) → Glycoprotein Fraction → (proteolysis, proteolytic enzyme) → Glycopeptides Fraction → (Separation technique I) → Glycopeptides Fraction → (PNGase) → Peptide Fraction → (Separation technique II) → Peptide Fraction → (Mass spectrometry) → ms data and ms/ms data → (Data reduction) → ms peaklist and ms/ms peaklist. Further labels: binning, N-dimensional array, Signal integration, Data correlation, Peptide identification, Peptide list, Parent protein and peptide list. Fraction counts are marked 1, n, and n*m along the separation steps.]
Provenance in Parasite Research
[Figure: Gene Knockout and Strain Creation*. Inputs: Drug Resistant Plasmid, T. cruzi sample, Gene Name. Steps and intermediate results: Sequence Extraction → 3' & 5' Region → Plasmid Construction → Knockout Construct Plasmid → Transfection → Transfected Sample → Drug Selection → Selected Sample → Cell Cloning → Cloned Sample.]
• Provenance, from the French word "provenir", describes the lineage or history of a data entity
• Used for verification and validation of data integrity, process quality, and trust
• Issues in provenance management
  o Interoperability
  o Consistent modeling
  o Reducing terminological heterogeneity
* T.cruzi Semantic Problem Solving Environment Project; courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
Ontologies for Provenance Modeling
• Advantages of using ontologies
  o Formal description: machine readability, consistent interpretation
  o Use of reasoning: knowledge discovery over large datasets
• Problem: a gigantic, monolithic provenance ontology is not feasible
• Solution: a modular approach using a foundational ontology
[Figure: a FOUNDATIONAL ONTOLOGY extended by domain ontologies for PARASITE EXPERIMENT, GLYCOPROTEIN EXPERIMENT, and OCEANOGRAPHY.]
Provenir Ontology
[Figure: the gene knockout workflow annotated with the Provenir classes AGENT, PROCESS, and DATA and the has_agent relation. Labels shown: Drug Resistant Plasmid, Transfection Machine, Gene Name, Sequence Extraction, 3' & 5' Region, Plasmid Construction, Knockout Construct Plasmid, T. cruzi sample, Transfection, Transfected Sample, Drug Selection, Selected Sample, Cell Cloning, Cloned Sample.]
Provenir Ontology Schema
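[Figure: Provenir ontology schema. Classes: AGENT, PROCESS, DATA, DATA COLLECTION, PARAMETER, and the SPATIAL, TEMPORAL, and THEMATIC parameter classes. Relations shown: is_a (five edges) and has_agent (PROCESS to AGENT).]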
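The schema can be sketched as a small set of triples. This is a minimal illustration, not the published ontology file: the lowercase class names follow the labels used elsewhere in the slides (the THEMATIC box is written here as domain_parameter), and the triple encoding and the `superclasses` helper are assumptions made for the example.

```python
# Illustrative encoding of the schema as (subject, predicate, object) triples.
# Names follow the slide labels; the encoding itself is an assumption.
SCHEMA = {
    ("data_collection", "is_a", "data"),
    ("parameter", "is_a", "data"),
    ("spatial_parameter", "is_a", "parameter"),
    ("temporal_parameter", "is_a", "parameter"),
    ("domain_parameter", "is_a", "parameter"),  # THEMATIC on the slide
    ("process", "has_agent", "agent"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following is_a edges."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is_a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(superclasses("temporal_parameter", SCHEMA)))
```

Following is_a edges this way is what lets a query written against a general class such as data also reach the more specific parameter classes.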
Domain-specific Provenance: Parasite Experiment ontology
[Figure: Parasite Experiment ontology* shown as an extension of the Provenir ontology. Provenir terms: process, agent, data, data_collection, parameter, spatial_parameter, temporal_parameter, domain_parameter, with relations is_a, has_agent, has_participant. Parasite Experiment terms: transfection_machine, location, drug_selection, transfection, strain_creation_protocol, cell_cloning, sample, Tcruzi_sample, transfection_buffer, Time:DateTime, Description, with relations is_a, has_participant, has_parameter.]
* Parasite Experiment ontology available at: http://wiki.knoesis.org/index.php/Trykipedia
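The modular idea, domain terms attached to foundational classes by is_a, can be sketched as a lookup that walks a domain term up to its Provenir class. The specific is_a links below are an assumed reading of the flattened figure, not confirmed by the slide text, and `provenir_root` is a hypothetical helper.

```python
# Top-level Provenir classes named on the slides.
PROVENIR_ROOTS = {"agent", "process", "data"}

# Assumed reading of the figure: which domain term is_a which parent.
IS_A = {
    "transfection": "process",
    "drug_selection": "process",
    "cell_cloning": "process",
    "transfection_machine": "agent",
    "sample": "data",
    "Tcruzi_sample": "sample",
}


def provenir_root(term):
    """Walk is_a links upward until a Provenir top-level class is reached.

    Returns None for terms that are unknown or caught in a cycle.
    """
    seen = set()
    while term not in PROVENIR_ROOTS:
        if term in seen or term not in IS_A:
            return None
        seen.add(term)
        term = IS_A[term]
    return term


for name in ("Tcruzi_sample", "transfection", "transfection_machine"):
    print(name, "->", provenir_root(name))
```

Because every domain term resolves to one of the three foundational classes, tooling written against Provenir can treat terms from different domain ontologies uniformly.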
Trident Ontology for Oceanography
Provenance Query Classification
Provenance queries are classified into three categories:
• Type 1: Querying for provenance metadata
  o Example: Which gene was used to create the cloned sample with ID = 65?
• Type 2: Querying for a specific data set
  o Example: Find all knockout construct plasmids created by researcher Michelle using the "Hygromycin" drug resistant plasmid between April 25, 2008 and August 15, 2008
• Type 3: Operations on provenance metadata
  o Example: Were the two cloned samples 65 and 46 prepared under similar conditions? Compare the associated provenance information.
Provenance Query Operators
Four query operators, based on the query classification:
• provenance(): closure operation; returns the complete set of provenance metadata for the input data entity
• provenance_context(): given a set of constraints defined on provenance, retrieves the datasets that satisfy the constraints
• provenance_compare(): compares two sets of provenance information; the definition adapts RDF graph equivalence
• provenance_merge(): combines two sets of provenance information using the RDF graph merge
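The four operators can be sketched over a plain set of triples. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph holds only ground triples (no blank nodes), so RDF graph equivalence reduces to set equality and RDF merge to set union; the closure follows outgoing edges only; the predicates `derived_from` and `used_plasmid` and the extra `candidates` argument are hypothetical.

```python
from collections import deque

# Hypothetical provenance graph; predicate names are invented for illustration.
GRAPH = {
    ("cloned_sample_65", "derived_from", "selected_sample_65"),
    ("selected_sample_65", "derived_from", "transfected_sample_65"),
    ("transfected_sample_65", "used_plasmid", "hygromycin_plasmid"),
    ("cloned_sample_46", "derived_from", "selected_sample_46"),
}


def provenance(entity, graph):
    """Closure: collect every triple reachable from entity via outgoing edges."""
    result, seen, queue = set(), {entity}, deque([entity])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return result


def provenance_context(graph, candidates, constraints):
    """Keep candidates whose provenance contains every (predicate, object) pair."""
    matches = []
    for entity in candidates:
        pairs = {(p, o) for _, p, o in provenance(entity, graph)}
        if constraints <= pairs:
            matches.append(entity)
    return matches


def provenance_compare(prov_a, prov_b):
    """Equivalence of ground graphs: identical triple sets."""
    return prov_a == prov_b


def provenance_merge(prov_a, prov_b):
    """Merge of ground graphs: union of the triple sets."""
    return prov_a | prov_b


p65 = provenance("cloned_sample_65", GRAPH)
p46 = provenance("cloned_sample_46", GRAPH)
print(len(p65), provenance_compare(p65, p46), len(provenance_merge(p65, p46)))
```

With blank nodes present, comparison would need a node bijection and merge would need the blank nodes renamed apart, which this sketch deliberately leaves out.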
Provenance Query Engine Architecture
• Available as an API for integration with provenance management systems
• Input:
  o Type of provenance query operator: provenance()
  o Input value to query operator: cloned sample 65
  o User details to connect to the underlying Oracle RDF store
• Components labeled in the architecture figure: TRANSITIVE CLOSURE, QUERY OPTIMIZER
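The three inputs listed for the query engine can be pictured as a single request record. The sketch below is hypothetical: `ProvenanceRequest`, its field names, and the placeholder connection values are assumptions for illustration and do not reflect the actual API or any Oracle client library.

```python
from dataclasses import dataclass

# Operator names as listed on the query operators slide.
OPERATORS = {
    "provenance",
    "provenance_context",
    "provenance_compare",
    "provenance_merge",
}


@dataclass(frozen=True)
class ProvenanceRequest:
    operator: str     # type of provenance query operator
    input_value: str  # input value handed to the operator
    store_user: str   # hypothetical: user name for the RDF store
    store_host: str   # hypothetical: host of the RDF store

    def __post_init__(self):
        # Reject operator names the engine does not define.
        if self.operator not in OPERATORS:
            raise ValueError(f"unknown operator: {self.operator}")


# Placeholder connection values, not real credentials or hosts.
request = ProvenanceRequest(
    operator="provenance",
    input_value="cloned sample 65",
    store_user="demo_user",
    store_host="rdf-store.example.org",
)
print(request.operator, "on", request.input_value)
```

Validating the operator name at construction keeps malformed requests from reaching the store at all.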
T.cruzi SPSE Provenance Management System
Conclusions
• Provenir ontology as a foundational model for provenance
• Extensible to model domain-specific provenance
  o Parasite Experiment ontology
  o Trident ontology
  o ProPreO ontology
• Query infrastructure to support provenance modeled using the Provenir ontology
• Application in an NIH-funded project for parasite research
Acknowledgement
• Roger Barga, Microsoft Research, eScience
• D. Brent Weatherly, Center for Tropical and Emerging Diseases, University of Georgia
• Flora Logan, The Wellcome Trust Sanger Institute, Cambridge, UK
• Raghava Mutharaju, Kno.e.sis Center, Wright State University
• Pramod Anantharam, Kno.e.sis Center, Wright State University
References
• Provenir ontology: http://wiki.knoesis.org/index.php/Provenir_Ontology
• Provenance Management in Parasite Research: http://knoesis.wright.edu/library/resource.php?id=00712
• Provenance Management Framework: http://knoesis.wright.edu/research/semsci/application_domain/sem_prov/
• T.cruzi Semantic Problem Solving Environment: http://knoesis.wright.edu/research/semsci/application_domain/sem_life_sci/tcruzi_pse/