Workflows for Social Science
Ken Turner
Computing Science and Mathematics
www.cs.stir.ac.uk/cress
31st January 2012
Workflows in Social Science
low-level (micro) flows are sequences of
steps using some statistical package, e.g.:
• retrieve datasets D1 and D2
• recode variable V1
• cross-tabulate V1 and V2
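As a rough illustration of such a low-level flow, the three steps above could be sketched in plain Python rather than a statistical package. The toy datasets, the join on a shared `id` key, and the reading of V1 as an age to be banded are all assumptions made for this example; none of them come from the slides.

```python
from collections import Counter

# Hypothetical toy datasets standing in for D1 and D2 (not from the source).
D1 = [{"id": 1, "V1": 23}, {"id": 2, "V1": 47}, {"id": 3, "V1": 65}]
D2 = [{"id": 1, "V2": "urban"}, {"id": 2, "V2": "rural"}, {"id": 3, "V2": "urban"}]

def retrieve():
    """Step 1: retrieve D1 and D2, joined here on an assumed shared 'id' key."""
    v2_by_id = {row["id"]: row["V2"] for row in D2}
    return [{**row, "V2": v2_by_id[row["id"]]} for row in D1]

def recode(rows):
    """Step 2: recode V1 (assumed to be age in years) into two bands."""
    return [
        {**row, "V1": "under 40" if row["V1"] < 40 else "40 plus"}
        for row in rows
    ]

def cross_tabulate(rows):
    """Step 3: count how often each (V1, V2) combination occurs."""
    return Counter((row["V1"], row["V2"]) for row in rows)

table = cross_tabulate(recode(retrieve()))
for (v1, v2), count in sorted(table.items()):
    print(v1, v2, count)
```

Each step is a separate function, so the flow is simply their composition in order.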
high-level (macro) flows combine the
capabilities of separate services, e.g.:
data retrieval → data cleaning → data fusion → data analysis
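A minimal sketch of a high-level flow follows, with each service represented by an ordinary Python function. In the approach described in these slides the stages would be external web/grid services; the function names, the sample occupation strings, and the simple chaining helper `run_workflow` are hypothetical stand-ins chosen for illustration.

```python
from functools import reduce

def data_retrieval(request):
    # Stand-in for a retrieval service: returns two hypothetical sources.
    return {"survey": [" Nurse", "teacher "], "census": ["NURSE", "clerk"]}

def data_cleaning(sources):
    # Stand-in for a cleaning service: trim whitespace, normalise case.
    return {
        name: [value.strip().lower() for value in values]
        for name, values in sources.items()
    }

def data_fusion(sources):
    # Stand-in for a fusion service: merge all sources, dropping duplicates.
    merged = set()
    for values in sources.values():
        merged.update(values)
    return sorted(merged)

def data_analysis(records):
    # Stand-in for an analysis service: one trivial summary statistic.
    return {"distinct_occupations": len(records)}

PIPELINE = [data_retrieval, data_cleaning, data_fusion, data_analysis]

def run_workflow(request, stages=PIPELINE):
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda value, stage: stage(value), stages, request)

result = run_workflow("occupations")
print(result)
```

Because the stages only share their inputs and outputs, any one of them could be swapped for a different service without touching the others.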
High-Level Workflows in DAMES
an approach has been developed for high-level workflows in social science:
the services are external, being packages that
conform to web/grid computing standards
the workflow logic is defined graphically
this is automatically analysed, and translated
into BPEL (Business Process Execution Language)
the supporting tools are:
CRESS: workflow definition and translation
ActiveBPEL: workflow orchestration
Statistical Analysis Services
services appearing in workflows can be
supported by statistical packages:
a ‘syntax file’ (R, Stata, …) is mapped to a web
service (with a little help)
services to call these are automatically
generated
an overall workflow using these services can be
defined and uploaded to the DAMES portal
this encourages:
modularity and re-use of analyses
flexible combination of statistical scripts
CRESS
Communication Representation Employing
Systematic Specification:
graphical workflow notation
application/language/platform-independent
automated analysis and implementation
mature, having been developed over 14 years
supported by other packages:
CHIVE: graphical workflow editor
MUSTARD: workflow validator
CLOVE: workflow verifier
MINT: performance analyser
CRESS Methodology
[Diagram. Artefacts: Workflow Diagram, Precise Specification, Implementation Code, Rigorous Analysis, Performance Analysis. Labelled steps: automatic specification, automatic compilation, scenario evaluation, validation/verification, design corrections.]
CRESS Example
the following example illustrates mapping
one occupation to two different schemes
only an outline is given, omitting the details
the cooperating services are:
lookup: performs parallel mapping (workflow)
allocator: finds an available job mapper then
does the mapping (workflow)
factory: manages mapper resources (partner)
mapper: performs a mapping for some scheme
(partner)
Parallel Job Translation
1 Receive lookup.job.translate schemes
2 Fork
3 Invoke allocator.job.translate mapping1 code1
4 Invoke allocator.job.translate mapping2 code2
5 Join
6 Reply lookup.job.translate codes
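The fork/join shape of this workflow can be approximated with a thread pool. This is only a sketch of the control flow, not CRESS notation or generated BPEL: `allocator_translate` is a local stand-in for the allocator service, and the scheme names and occupation codes in `TABLES` are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up lookup tables; real mappings would come from the mapper services.
TABLES = {
    "schemeA": {"nurse": "A-001"},
    "schemeB": {"nurse": "B-001"},
}

def allocator_translate(job, scheme):
    """Stand-in for allocator.job.translate: map one job to one scheme."""
    return TABLES[scheme].get(job, "unknown")

def lookup_translate(job, schemes):
    """Receive (1), fork (2), two invokes (3, 4), join (5), reply (6)."""
    scheme1, scheme2 = schemes
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fork: both translations are submitted before either is awaited.
        future1 = pool.submit(allocator_translate, job, scheme1)
        future2 = pool.submit(allocator_translate, job, scheme2)
        # Join: block until both branches have produced a code.
        code1, code2 = future1.result(), future2.result()
    return (code1, code2)

codes = lookup_translate("nurse", ("schemeA", "schemeB"))
print(codes)
```

The reply is only produced after both branches finish, which mirrors the Join node preceding the Reply node.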
Job Mapper Allocation
1 Receive allocator.job.translate mapping
2 Invoke factory.job.allocator scheme mapper
3 Invoke mapper.job.translate job mapping
4 Reply allocator.job.translate mapping
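One way to picture this sequential workflow is a factory that hands out a mapper for the requested scheme, followed by a call on that mapper. The `Factory` and `Mapper` classes, their method names, and the sample table are hypothetical; the slides do not show how the partner services are implemented.

```python
class Mapper:
    """Hypothetical mapper partner: translates jobs for a single scheme."""

    def __init__(self, scheme, table):
        self.scheme = scheme
        self.table = table

    def translate(self, job):
        return self.table.get(job, "unknown")

class Factory:
    """Hypothetical factory partner: manages mapper resources per scheme."""

    def __init__(self, tables):
        self._tables = tables

    def allocate(self, scheme):
        return Mapper(scheme, self._tables[scheme])

def allocator_translate(factory, job, scheme):
    """Receive (1), get a mapper (2), do the mapping (3), reply (4)."""
    mapper = factory.allocate(scheme)   # step 2: factory.job.allocator
    mapping = mapper.translate(job)     # step 3: mapper.job.translate
    return mapping                      # step 4: reply to the caller

# Made-up scheme name and code, for illustration only.
factory = Factory({"schemeA": {"nurse": "A-001"}})
mapping = allocator_translate(factory, "nurse", "schemeA")
print(mapping)
```

Keeping allocation separate from translation means the allocator never needs to know how a given scheme's mapper works.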
Summary
low-level workflows define the sequence of
basic steps in a statistical package
high-level workflows invoke external
analysis services and combine their results
workflows can use scripts for various
statistical packages mapped to services
CRESS allows high-level workflows to be
defined, analysed and executed