An Introduction to Designing and Executing Workflows with Taverna Katy Wolstencroft University of Manchester.




This tutorial gives you a basic introduction to designing and reusing workflows in Taverna, and to some of its main features.
Workflows in this practical use small data sets and are designed to run in a few minutes. In the real world, you would be using larger data sets and workflows would typically run for longer.
Exercise 1: Exploring the Workbench



Taverna can be downloaded from
http://www.taverna.org.uk/
Go to the page and find the latest version (2.4)
Follow the instructions on the website to install Taverna for your operating system. This is a simple one-click install for Windows and Mac. For Linux, you may also need the GraphViz program; if so, follow the link on the Taverna download page.
The following page shows a screenshot of Taverna and
the different panels that make up the workbench
Taverna Workbench
[Screenshot: the Taverna Workbench, with the Services Panel, Workflow Explorer and Workflow Diagram labelled]
1. Workflow Diagram



The workflow diagram is the visual representation of the workflow. It:
Shows inputs, outputs, services and data flows
Allows editing of the workflow by dragging, dropping and connecting services together
Enables saving of workflow diagrams for publishing and sharing
1. Workflow Explorer


The Workflow Explorer shows a detailed view of your workflow. It shows default values and descriptions for service inputs and outputs, and where remote services are located. It also shows configuration details, such as iteration and looping.
Workflow validation details can also be found here.
Before a workflow is run, Taverna checks to see if it is
connected correctly and if its services are available.
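To make the idea of a connectivity check concrete, here is a minimal Python sketch. It is not Taverna's own validation code: the `Workflow` class and its fields are hypothetical names chosen for illustration, and the sketch only checks wiring, not whether remote services are available.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A toy dataflow graph: named nodes joined by directed data links."""
    inputs: set = field(default_factory=set)     # workflow input ports
    outputs: set = field(default_factory=set)    # workflow output ports
    services: set = field(default_factory=set)   # service names
    links: list = field(default_factory=list)    # (source, sink) pairs

    def validate(self):
        """Return a list of wiring problems; an empty list means none were found."""
        problems = []
        known = self.inputs | self.outputs | self.services
        sources = {src for src, _ in self.links}
        sinks = {dst for _, dst in self.links}
        # Every link must join two nodes that exist in the workflow.
        for src, dst in self.links:
            for node in (src, dst):
                if node not in known:
                    problems.append(f"link refers to unknown node '{node}'")
        # Every workflow input should feed something.
        for name in sorted(self.inputs - sources):
            problems.append(f"workflow input '{name}' is not connected to anything")
        # Every workflow output should receive data from something.
        for name in sorted(self.outputs - sinks):
            problems.append(f"workflow output '{name}' receives no data")
        return problems


# A half-built workflow: the output port has not been connected yet.
wf = Workflow(
    inputs={"ID"},
    outputs={"sequence"},
    services={"Get_Protein_FASTA"},
    links=[("ID", "Get_Protein_FASTA")],
)
print(wf.validate())
```

Adding the missing link from the service to the output port makes `validate()` return an empty list.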
1. Available Services Panel
Lists services available by default in Taverna:
Local Java services
WSDL Web Services (secure and public)
RESTful services
R Processor services (for statistical analyses)
Beanshell scripts
XPath scripts
Spreadsheet import service
The services panel also allows you to add new services
or workflows from the web or from file systems – there
are loads more available!
Exercise 2: Building a Simple Workflow
We will start with something easy: retrieving a protein sequence from a remote database and identifying functional motifs
Go to the Services Panel
Type ‘Fasta’ into the ‘search’ box at the top of the panel
You will see several services in the search results
Select ‘Get Protein FASTA’ and drag-and-drop it into the workflow diagram panel.



In a blank space in the workflow diagram, right-click and select “Workflow input port” from the “Insert” section
Type in a name for this input (e.g. ID) and click “OK”
Do the same to create a new workflow output. Call this output “sequence”


You now have 3 boxes in the diagram, which we need to connect
Click on the input box, drag towards “Get Protein FASTA” and let go. An arrow will connect the two boxes

Click on the output box, drag towards “Get Protein FASTA”, and let go. An arrow will connect the two boxes

You have now built your first workflow!

It should look something like this

Run the workflow by selecting “File -> Run workflow”, or by clicking on the play button at the top of the workbench
An input window will appear. As you can see, we have not yet added a description of the workflow or of the input
Click on ‘Set Value’ in the input window and add a UniProt protein identifier (e.g. P15409) where it says “some input data goes here”


Click “Run workflow”
In the bottom left of the results window, click on the results.
You will now see a protein sequence from UniProt
Now we will find out what functional motifs the protein contains, but first we have to tell Taverna about some new services
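For readers who want to see what this first workflow does outside the workbench, the following Python sketch retrieves a FASTA record and splits it into header and sequence. It rests on assumptions: it uses UniProt's public REST address, which may not be the source that the ‘Get Protein FASTA’ service queries, and the function names are hypothetical.

```python
import urllib.request

# Assumption: UniProt's public REST endpoint for FASTA records.
# The Taverna service used in the exercise may fetch from a different source.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the download address for one protein identifier."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip())


def fetch_fasta(accession, timeout=30):
    """Download the FASTA record as text (requires network access)."""
    with urllib.request.urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # The header is the first line without '>'; the rest is the sequence.
    return lines[0][1:], "".join(lines[1:])
```

With network access, `parse_fasta(fetch_fasta("P15409"))` should return the header and sequence for the identifier used in this exercise.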
Exercise 2: Adding New Services


Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service
Select ‘Soaplab service…’
A window will pop up asking for a web address

Enter the address for the Soaplab services:
http://wsembnet.vitalit.ch/soaplab2/services

Scroll down the Services list and look at the new Soaplab services that are now included.


In the services panel, search for pscan – it should be in the
Soaplab services you just added
Drag and drop this service onto the workflow diagram
Exercise 3: Adding More Services


We can connect the two services together in the same way
as before
At the top of the workflow diagram panel, change the view
to show all ports by clicking on the icon shown below
[Image: “Show all ports” icon]

This view allows you to see any data input/output or
parameter value options for your chosen service

As you can see, pscan has a lot more ports.
Most of the time, you don’t need to connect all ports. Some
are optional and some already have default values set.
Service documentation should tell you this. You can use
the BioCatalogue to find documentation and user
descriptions
Change the orientation of the port names to fit them on the
screen more easily by clicking on the icon shown below
[Image: “Change orientation” icon]


Connect ‘output_text’ from the ‘Get_protein_Fasta’ service
to the ‘sequence_direct_data’ input of pscan
Also, create a new workflow output called pscanOut and
connect it to ‘pscan -> outfile’
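The wiring described above can be sketched in Python as data moving along named links. This is an illustration only, not how Taverna executes workflows: `run_workflow` is a hypothetical helper, both service functions are stand-ins that return made-up data, and the input port name `ID` on the first service is an assumption.

```python
def run_workflow(workflow_inputs, services, links, workflow_outputs):
    """Run services in the order given, moving data along (source, sink) links.

    Service ports are written "service.port"; workflow ports are bare names.
    """
    values = dict(workflow_inputs)                # data available so far
    for name, func in services:                   # assumed to be in dependency order
        # Gather this service's inputs from every link ending on one of its ports.
        kwargs = {
            sink.split(".", 1)[1]: values[source]
            for source, sink in links
            if sink.startswith(name + ".")
        }
        for port, data in func(**kwargs).items():
            values[f"{name}.{port}"] = data
    # Each workflow output takes the value of the port linked to it.
    return {sink: values[source] for source, sink in links if sink in workflow_outputs}


def get_protein_fasta(ID):
    # Stand-in: the real service fetches the sequence from a remote database.
    return {"output_text": f">{ID}\nMKTAAL"}      # toy sequence, not real data


def pscan(sequence_direct_data):
    # Stand-in: the real pscan reports motif matches; this only counts residues.
    sequence = "".join(sequence_direct_data.splitlines()[1:])
    return {"outfile": f"{len(sequence)} residues scanned"}


links = [
    ("ID", "Get_protein_Fasta.ID"),
    ("Get_protein_Fasta.output_text", "sequence"),
    ("Get_protein_Fasta.output_text", "pscan.sequence_direct_data"),
    ("pscan.outfile", "pscanOut"),
]
results = run_workflow(
    {"ID": "P15409"},
    [("Get_protein_Fasta", get_protein_fasta), ("pscan", pscan)],
    links,
    {"sequence", "pscanOut"},
)
print(results)
```

Note that one output port (`output_text`) feeds two sinks, just as in the workflow diagram, where the sequence goes both to the `sequence` output and to pscan.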
Exercise 3: Adding a Workflow Description






Right-click on a blank part of the workflow diagram and select “Annotate”
Add some details about the workflow, e.g. who the author is and what it does
You can also add examples and descriptions for the workflow inputs by selecting them and selecting “Annotate”
Add an example for the protein ID (e.g. P15409)
Save the workflow by going to “File -> Save workflow”
Run the workflow again and look at the results
Exercise 4: Using REST Services




The services we have used up until now have been
Soaplab services, but Taverna can also run WSDL and
RESTful services
Go to the Service Catalogue tab of Taverna and search for dbfetch
From the REST Service results, select GET /dbfetch/{db}/{id}
Right-click on the service and select “Add to Service Panel”
[Screenshot: searching the service catalogue]



In the services search panel in Taverna, search for dbfetch
Right-click on the service and choose “Add to workflow with name…”
Enter a name such as “dbfetch” and click OK

As you can see, the items from the dbfetch template
become inputs in Taverna.
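The mapping from template to inputs can be illustrated with a few lines of Python. This is a sketch of the general idea, assuming simple `{name}` placeholders; `template_ports` is a hypothetical helper, not part of Taverna.

```python
import re


def template_ports(template):
    """Return the placeholder names in a URI template, in order of appearance."""
    return re.findall(r"\{([^{}]+)\}", template)


print(template_ports("/dbfetch/{db}/{id}"))  # prints ['db', 'id']
```

Each name found this way corresponds to one input port on the REST service in the workflow diagram.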



You can also enter the template directly
Right-click on an empty area of the workflow and select
“REST” from the “Insert” section
Enter the template and click OK








For this service, we need to supply a database name and a protein ID.
Connect the protein ID input to the REST service ‘id’ input port
Right-click on the ‘db’ input port on the REST service and select ‘constant value’.
Add the constant value ‘uniprotkb’ and click “OK”
Add a workflow output port and connect it to the REST ‘response body’ output port
Your workflow should look something like the one on the next slide
Save and run your workflow
Now your results will include the UniProt entry for your protein
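Setting a constant value on one port while leaving the other connected to a workflow input resembles partial function application. The Python sketch below shows this for the dbfetch template. The base address is an assumption (check the service catalogue entry for the actual one), `build_url` is a hypothetical helper, and the sketch only builds the address without calling the service.

```python
from functools import partial
from urllib.parse import quote

# Assumption: base address of the EBI dbfetch service.
BASE = "https://www.ebi.ac.uk/Tools/dbfetch"
TEMPLATE = "/dbfetch/{db}/{id}"


def build_url(db, entry_id, base=BASE, template=TEMPLATE):
    """Fill both template placeholders, escaping characters unsafe in a path."""
    return base + template.format(db=quote(db, safe=""), id=quote(entry_id, safe=""))


# Fixing 'db' plays the role of the constant value; 'entry_id' stays open,
# like the port connected to the workflow input.
uniprot_entry_url = partial(build_url, "uniprotkb")

print(uniprot_entry_url("P15409"))
```

Under the assumed base address this prints `https://www.ebi.ac.uk/Tools/dbfetch/dbfetch/uniprotkb/P15409`; fetching that address would correspond to the ‘response body’ output in the workflow.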