Transcript Document

New Developments in Kepler

January 23, 2006. Ilkay Altintas, Scientific Workflow Automation Technologies, San Diego Supercomputer Center, UCSD

Kepler System Architecture

[Architecture diagram. Component labels, in reading order: Authentication GUI; Kepler Object Manager; Kepler GUI Extensions; Vergil; Documentation; SMS; Actor & Data Search; Type System Ext; Smart Re-run / Failure Recovery; Provenance Framework; Kepler Core Extensions; Ptolemy]

Joint Authentication Framework

• Requirements:
  – Coordinating between the different security architectures
    • GEON uses GAMA, which requires a single certificate authority.
    • SEEK uses LDAP, which has a centralized certificate authority with distributed subordinate CAs.
  – Connecting LDAP with GAMA
  – Coordinating between two different GAMA servers
  – Single sign-on/authentication at the initialize step of the run for multiple actors that use authentication
    • This raises the issue of a single GAMA repository vs. multiple repositories, and requires users to have accounts on all servers.

• Kepler needs to be able to handle expired certificates for long-running workflows and/or for users who use it for a long time.

• A trust relation between the different GAMA servers must be established in order to allow for single authentication.
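The expired-certificate requirement above can be illustrated with a small check that runs before each authenticated step of a long-running workflow. This is only a sketch under stated assumptions: `Credential`, `needs_renewal`, `ensure_valid` and the `renew` callback are hypothetical names, not part of Kepler or GAMA, and the ten-minute safety margin is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Credential:
    """Hypothetical stand-in for a user certificate held by the workflow engine."""
    subject: str
    not_after: datetime  # expiry time of the certificate


def needs_renewal(cred, now, margin=timedelta(minutes=10)):
    """True if the credential has expired or will expire within the margin."""
    return now >= cred.not_after - margin


def ensure_valid(cred, now, renew):
    """Return a usable credential, calling the caller-supplied renew() only if needed."""
    if needs_renewal(cred, now):
        return renew(cred)
    return cred
```

An actor that needs authentication could call `ensure_valid` at its initialize step and again before each remote call, so that a run lasting longer than the certificate lifetime does not fail halfway through.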


Functional Prototype Completed

• APIs and test cases in place
• More work required on certificate renewal and multiple server access

Vergil is the GUI for Kepler: Actor Search and Data Search

• Actor ontology and semantic search for actors
• Search -> Drag and drop -> Link via ports
• Metadata-based search for datasets


Actor Search

Challenges:

– Building/searching a repository …
– Making changes to MoML (see KAR)
– GUI changes
– Ontology management

Kepler Actor Ontology

• Used in searching actors and creating conceptual views (= folders)

Currently 160 Kepler actors added!
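How an actor ontology supports semantic search can be sketched as follows: each actor is annotated with a concept, and a query for a concept also matches actors annotated with any of its sub-concepts. Everything here is an illustrative assumption except the class name `ptolemy.actor.lib.MultiplyDivide` and the concept `ArithmeticMathOperationActor`, which appear in the KAR example; the parent concepts, the second actor and the function names are hypothetical.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {
    "ArithmeticMathOperationActor": "MathOperationActor",
    "MathOperationActor": "Actor",
    "DataQueryActor": "Actor",
}

# Actor class -> concept it is annotated with (second entry is made up).
ACTORS = {
    "ptolemy.actor.lib.MultiplyDivide": "ArithmeticMathOperationActor",
    "example.HypotheticalDbQuery": "DataQueryActor",
}


def is_a(concept, ancestor):
    """Walk up the hierarchy and report whether concept falls under ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def search_actors(concept):
    """Return actor classes annotated with the concept or any sub-concept, sorted."""
    return sorted(cls for cls, c in ACTORS.items() if is_a(c, concept))
```

The same hierarchy could drive the conceptual views: each concept becomes a folder whose contents are the result of `search_actors` for that concept.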


Data Search and Usage of Results

Kepler DataGrid

– Discovery of data resources through local and remote services
  • SRB, Grid and Web services, database connections
– Registry of datasets on the fly using workflows


Vergil Updates

• To make it more useful to the user
  – Updated actor icons
  – Menu redesign
• Improve readability
• Develop cohesive visual language
• Follow standard human factors (HF) principles
• Improve organization

[Actor icon categories shown: Composite, DB Query, Computation or Operation, Transformation, Filter, File Operation, Web Service]


Kepler Archives

• Purpose: Encapsulate WF data and actors in an archive file
  – … inlined or by reference
  – … version control
  • More robust workflow exchange
  • Easy management of semantic annotations
  • Plug-in architecture (drop in and use)
  • Easy documentation updates
• A jar-like archive file (.kar) including a manifest
• All entities have unique ids (LSID)
• Custom object manager and class loader
• UI and API to create, define, search and load .kar files

KAR File Example

[MoML fragment from a .kar entry; the opening tags were lost in extraction]
… value="ptolemy.actor.lib.MultiplyDivide"
… value="urn:lsid:localhost:class:955:1" class="ptolemy.kernel.util.StringAttribute"/>
… name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
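The idea of a jar-like archive with a manifest can be sketched with Python's standard `zipfile` module. This is not the actual .kar layout: the manifest path follows the usual JAR convention (`META-INF/MANIFEST.MF`), while the per-entry `lsid:` attribute, the entry file name and the function names are assumptions made for illustration.

```python
import io
import zipfile


def build_kar(entries):
    """Build an in-memory archive. entries maps path -> (lsid, xml_text)."""
    manifest = ["Manifest-Version: 1.0", ""]
    for path, (lsid, _) in entries.items():
        # One manifest section per entry; the 'lsid' attribute name is hypothetical.
        manifest += [f"Name: {path}", f"lsid: {lsid}", ""]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as kar:
        kar.writestr("META-INF/MANIFEST.MF", "\n".join(manifest))
        for path, (_, text) in entries.items():
            kar.writestr(path, text)
    return buf.getvalue()


def list_lsids(kar_bytes):
    """Read the manifest back and return a mapping of entry path -> LSID."""
    with zipfile.ZipFile(io.BytesIO(kar_bytes)) as kar:
        text = kar.read("META-INF/MANIFEST.MF").decode("utf-8")
    result, name = {}, None
    for line in text.splitlines():
        if line.startswith("Name: "):
            name = line[len("Name: "):]
        elif line.startswith("lsid: ") and name is not None:
            result[name] = line[len("lsid: "):]
    return result
```

Because the manifest lists every entry with its identifier, a search tool can index archives without unpacking or parsing the entries themselves.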


Kepler Object Manager

• Designed to access local and distributed objects
• Objects: data, metadata, annotations, actor classes, supporting libraries, native libraries, etc., archived in .kar files
• Advantages:
  – Reduce the size of the Kepler distribution
    • Only ship the core set of generic actors and domains
  – Easy exchange of full or partial workflows for collaborations
  – Publish full workflows with their bound data
    • Becomes a provenance system for derived data objects
=> Separate workflow repository and distributions easily
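Access to local and distributed objects through one interface can be sketched as a lookup keyed by LSID that falls back to a remote fetch and caches the result. The class and method names below are hypothetical and do not reflect the real Kepler Object Manager API; `fetch_remote` stands for whatever repository client would be used.

```python
class ObjectManager:
    """Hypothetical sketch: resolve objects by id, locally first, then remotely."""

    def __init__(self, fetch_remote):
        self._local = {}                   # id -> object already available locally
        self._fetch_remote = fetch_remote  # callable(id) -> object, supplied by the caller

    def register(self, lsid, obj):
        """Make a locally available object (e.g. loaded from a .kar file) resolvable."""
        self._local[lsid] = obj

    def get(self, lsid):
        """Return the object for lsid, fetching and caching it on first remote access."""
        if lsid not in self._local:
            self._local[lsid] = self._fetch_remote(lsid)
        return self._local[lsid]
```

With this shape a distribution only needs to ship the core actors; anything else is pulled in on first use and served from the local cache afterwards.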


Initial Work on Provenance Framework

• Provenance
  – Track origin and derivation information about scientific workflows, their runs and derived information (datasets, metadata, …)
• Need for Provenance
  – Association of process and results
  – Reproduce results
  – “Explain & debug” results (via lineage tracing, parameter settings, …)
  – Optimize: “Smart Re-Runs”
• Types of Provenance Information:
  – Data provenance
    • Intermediate and end results, including files and DB references
  – Process (= workflow instance) provenance
    • Keep the WF definition with data and parameters used in the run
  – Error and execution logs
  – Workflow design provenance (quite different)
    • WF design is a (little supported) process (art, magic, …)
    • For free via CVS: edit history
    • Need more “structure” (e.g. templates) for individual & collaborative workflow design


Kepler Provenance Recording Utility

• Parametric and customizable
  – Different report formats
  – Variable levels of detail
    • Verbose-all, verbose-some, medium, on error
  – Multiple cache destinations
• Saves information on
  – User name, date, run, etc.
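A recorder with variable levels of detail might look like the sketch below. The slide names the four levels but not what each one keeps, so the mapping from level to event kinds is purely an assumption, as are the class name, the event kinds and the record fields beyond user name, date and run.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed meaning of each level: the set of event kinds it keeps.
LEVELS = {
    "on-error": {"error"},
    "medium": {"error", "actor-fired"},
    "verbose-some": {"error", "actor-fired", "parameter"},
    "verbose-all": {"error", "actor-fired", "parameter", "token"},
}


@dataclass
class ProvenanceRecorder:
    """Hypothetical recorder that filters events by the configured detail level."""
    user: str
    run_id: str
    level: str = "medium"
    records: list = field(default_factory=list)

    def record(self, kind, detail, when=None):
        """Store the event if the current level keeps this kind; return whether it was stored."""
        if kind not in LEVELS[self.level]:
            return False
        when = when or datetime.now(timezone.utc)
        self.records.append({
            "user": self.user,
            "run": self.run_id,
            "date": when.isoformat(),
            "kind": kind,
            "detail": detail,
        })
        return True
```

Different report formats and cache destinations would then be separate writers that consume `records`, which keeps the filtering logic independent of where the information ends up.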


Provenance: Possible Next Steps

• Provenance Meeting: Last week at SDSC

– Deciding on terms and definitions
– .kar file generation, registration and search for provenance information
– Possible data/metadata formats
– Automatic report generation from accumulated data
– A GUI to keep track of the changes
– Adding provenance repositories
– A relational schema for the provenance info in addition to the existing XML


What other system functions does provenance relate to?

• Failure recovery
• Smart re-runs
  – Re-run only the updated/failed parts
• Semantic extensions
• Kepler Data Grid
• Reporting and Documentation
  – Guided documentation generation and updates
• Authentication
• Data registration
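Re-running only the updated or failed parts can be sketched as a graph traversal: starting from the actors whose inputs or parameters changed, or which failed, everything downstream must run again, while the rest can reuse results kept by the provenance store. The function name, the edge-list representation and the actor names in the usage note are assumptions for illustration, not Kepler's mechanism.

```python
from collections import deque


def nodes_to_rerun(edges, dirty):
    """Return the dirty nodes plus every node reachable downstream of them.

    edges: iterable of (source, destination) pairs describing the workflow graph.
    dirty: nodes that were updated or that failed in the previous run.
    """
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)

    rerun = set(dirty)
    queue = deque(dirty)
    while queue:  # breadth-first walk along the data flow
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in rerun:
                rerun.add(nxt)
                queue.append(nxt)
    return rerun
```

For a workflow with edges read -> filter -> plot and read -> stats, a change to `filter` means only `filter` and `plot` are re-executed; `read` and `stats` keep their recorded outputs.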


Hot Topics in Kepler

http://kepler-project.org/Wiki.jsp?page=HotTopics
