Workflows for Social Science

Ken Turner
Computing Science and Mathematics
www.cs.stir.ac.uk/cress
31st January 2012
Workflows in Social Science

low-level (micro) flows are sequences of
steps using some statistical package, e.g.:
• retrieve datasets D1 and D2
• recode variable V1
• cross-tabulate V1 and V2
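
The three steps above can be sketched in plain Python. This is only an illustration: the contents of D1 and D2, the meaning of V1 (age) and V2 (employment status), and the function names are all assumptions, not taken from the talk, and a real micro flow would run inside a statistical package.

```python
from collections import Counter

# Hypothetical toy datasets standing in for D1 and D2 (not from the talk).
D1 = [{"id": 1, "V1": 23}, {"id": 2, "V1": 47}, {"id": 3, "V1": 65}]
D2 = [
    {"id": 1, "V2": "employed"},
    {"id": 2, "V2": "employed"},
    {"id": 3, "V2": "retired"},
]


def retrieve():
    """Step 1: retrieve both datasets and join them on the shared id."""
    v2_by_id = {row["id"]: row["V2"] for row in D2}
    return [{**row, "V2": v2_by_id[row["id"]]} for row in D1]


def recode(rows):
    """Step 2: recode V1 (assumed to be age in years) into two bands."""
    return [
        {**row, "V1": "under 40" if row["V1"] < 40 else "40 and over"}
        for row in rows
    ]


def cross_tabulate(rows):
    """Step 3: count how often each (V1, V2) combination occurs."""
    return Counter((row["V1"], row["V2"]) for row in rows)


table = cross_tabulate(recode(retrieve()))
```

Each step takes the output of the previous one, which is what makes the flow a fixed sequence within a single package.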

high-level (macro) flows combine the capabilities of separate services, e.g.:
• data retrieval → data cleaning → data fusion → data analysis
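
A macro flow of this kind can be pictured as a chain of independent services, each consuming the previous one's output. The sketch below uses local Python functions as stand-ins for the four services; the function bodies, record fields and the `run_workflow` helper are hypothetical, and in DAMES the stages would be external web/grid services rather than local calls.

```python
from functools import reduce


def data_retrieval(request):
    # Hypothetical stand-in for a retrieval service: returns raw records.
    return [{"age": "34"}, {"age": ""}, {"age": "51"}]


def data_cleaning(records):
    # Drop records with a missing age and convert the rest to integers.
    return [{"age": int(r["age"])} for r in records if r["age"]]


def data_fusion(records):
    # Attach a field from a second (hypothetical) source to each record.
    regions = ["north", "south"]
    return [{**r, "region": region} for r, region in zip(records, regions)]


def data_analysis(records):
    # A trivial analysis: the mean age of the fused records.
    return sum(r["age"] for r in records) / len(records)


def run_workflow(stages, initial):
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda value, stage: stage(value), stages, initial)


result = run_workflow(
    [data_retrieval, data_cleaning, data_fusion, data_analysis],
    "survey-request",
)
```

Because the stages only share their inputs and outputs, any one of them could be swapped for a different service without touching the others.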
High-Level Workflows in DAMES

an approach has been developed for high-level workflows in social science:
• the services are external, being packages that conform to web/grid computing standards
• the workflow logic is defined graphically
• this is automatically analysed, and translated into BPEL (Business Process Execution Language)
the supporting tools are:
• CRESS: workflow definition and translation
• ActiveBPEL: workflow orchestration
Statistical Analysis Services

services appearing in workflows can be supported by statistical packages:
• a ‘syntax file’ (R, Stata, …) is mapped to a web service (with a little help)
• services to call these are automatically generated
• an overall workflow using these services can be defined and uploaded to the DAMES portal
this encourages:
• modularity and re-use of analyses
• flexible combination of statistical scripts
CRESS

Communication Representation Employing Systematic Specification:
• graphical workflow notation
• application/language/platform-independent
• automated analysis and implementation
• mature, having been developed over 14 years
supported by other packages:
• CHIVE: graphical workflow editor
• MUSTARD: workflow validator
• CLOVE: workflow verifier
• MINT: performance analyser
CRESS Methodology
[Figure: CRESS methodology. Boxes: Workflow Diagram, Precise Specification, Implementation Code, Rigorous Analysis, Performance Analysis. Arrow labels: automatic specification, automatic compilation, scenario evaluation, validation/verification, design corrections.]
CRESS Example



the following example illustrates mapping one occupation to two different schemes
only an outline is given, omitting the details
the cooperating services are:
• lookup: performs parallel mapping (workflow)
• allocator: finds an available job mapper then does the mapping (workflow)
• factory: manages mapper resources (partner)
• mapper: performs a mapping for some scheme (partner)
Parallel Job Translation
1 Receive lookup.job.translate schemes
2 Fork
3 Invoke allocator.job.translate mapping1 code1
4 Invoke allocator.job.translate mapping2 code2
5 Join
6 Reply lookup.job.translate codes
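
The fork/join shape of this workflow can be sketched with Python threads. This is an illustration under stated assumptions, not the CRESS or BPEL implementation: the scheme names, the lookup tables, the example job and the function names are hypothetical, and `allocator_translate` is a local stand-in for the allocator service.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup tables; real schemes and codes are not given in the talk.
SCHEME_TABLES = {
    "scheme1": {"nurse": "A-101"},
    "scheme2": {"nurse": "B-7"},
}


def allocator_translate(job, scheme):
    """Stand-in for allocator.job.translate: map one job to one scheme's code."""
    return SCHEME_TABLES[scheme][job]


def lookup_translate(job, schemes):
    """Receive a job and two schemes, translate in parallel, reply with both codes."""
    scheme1, scheme2 = schemes                                  # Receive
    with ThreadPoolExecutor(max_workers=2) as pool:             # Fork
        future1 = pool.submit(allocator_translate, job, scheme1)  # Invoke
        future2 = pool.submit(allocator_translate, job, scheme2)  # Invoke
        code1, code2 = future1.result(), future2.result()       # Join
    return code1, code2                                         # Reply


codes = lookup_translate("nurse", ("scheme1", "scheme2"))
```

The two invocations are independent, so neither waits for the other; the join point simply collects both results before the reply is sent.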
Job Mapper Allocation
1 Receive allocator.job.translate mapping
2 Invoke factory.job.allocator scheme mapper
3 Invoke mapper.job.translate job mapping
4 Reply allocator.job.translate mapping
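
The allocation steps can be sketched as follows. The `Factory` and `Mapper` classes, the dictionary layout of the mapping request and the example codes are all hypothetical stand-ins for the partner services; the talk gives only the outline of the workflow.

```python
class Mapper:
    """Stand-in for the mapper partner: translates jobs for one scheme."""

    def __init__(self, scheme, table):
        self.scheme = scheme
        self.table = table

    def translate(self, job):
        return self.table[job]


class Factory:
    """Stand-in for the factory partner: hands out a mapper for a scheme."""

    def __init__(self, tables):
        self._tables = tables

    def allocate(self, scheme):
        return Mapper(scheme, self._tables[scheme])


def allocator_translate(factory, mapping):
    """Find a mapper for the requested scheme, then perform the mapping."""
    scheme, job = mapping["scheme"], mapping["job"]  # 1 Receive
    mapper = factory.allocate(scheme)                # 2 Invoke factory
    code = mapper.translate(job)                     # 3 Invoke mapper
    return {**mapping, "code": code}                 # 4 Reply


factory = Factory({"scheme1": {"nurse": "A-101"}})
reply = allocator_translate(factory, {"scheme": "scheme1", "job": "nurse"})
```

Keeping allocation in the factory means the allocator workflow never needs to know how mapper resources are created or shared.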
Summary




• low-level workflows define the sequence of basic steps in a statistical package
• high-level workflows invoke external analysis services and combine their results
• workflows can use scripts for various statistical packages mapped to services
• CRESS allows high-level workflows to be defined, analysed and executed