Enabling Science with Grid Technology

Daresbury Laboratory
Enabling Science with Grid Technology
Jamie Rintelman, Kerstin Kleese-Van Dam, Rik Tyer
STFC Daresbury Laboratory, Daresbury, Cheshire, UK

Who am I?
STFC Daresbury Laboratory
– eScience Department: liaison to the Computational Science and Engineering Department
Chemist
– Specializing in quantum chemistry, electronic structure theory, and GAMESS

•Traditional Way of Working
•eMinerals Grid Computing Framework
– Background on eMinerals Program
– Grid Computing Framework
  • Input preparation
  • Monty - bulk job submission
  • RMCS - integrated compute/data/metadata framework
  • Rgem - analysis of results
•Scientific Examples
– QDGA
– BTG (Band Theory Group)
– eMinerals

Traditional Way of Working I
[Figure: local computer and remote resources Comp01 to Comp04]
1. Files are on the local computer; check for an available remote resource.
2. SCP the files over to the remote resource.
3. Run the job; the output files are written on the remote resource.
4. Check on progress (many times?).
5. SCP the output files back.
6. The files are back on the local computer.
Traditional Way of Working II
1. A collaborator asks for files.
2. Find the files.
3. Place them on an FTP server, or attach them to an email.
4. Email the collaborator with the files or with the location of the files.

eMinerals / RMCS framework
A new way of working

eMinerals Project
•NERC funded
•Collaborators throughout the UK
•Pragmatic, science-driven approach to development

The eMinerals team

eMinerals Project - Scope
Set of tools to facilitate scientific work:
– Building and configuring grids
– Job submission tools
– Data management
– Metadata management
– Data processing / information extraction
– Simulation output visualization

RMCS Framework
Components:
– Input preparation
– Bulk job submission
– Running jobs
– Analysis of results
Additional building blocks:
– Storage Resource Broker (SRB): data storage and collaborative sharing
– AgentX: XML data sharing between programs, metadata capture (developed by Phil Couch, STFC)

Input preparation
[Figure: one template input expanded into many simulations]
Bespoke scripts automate the generation of input files for parameter-sweep type calculations.
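A bespoke sweep script of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not one of the project's actual scripts: the template text, the `$basis`/`$functional` placeholders, the `generate_inputs` function and the file-naming scheme are all hypothetical, and the keywords are not real GAMESS-UK syntax.

```python
from pathlib import Path
from string import Template

# Hypothetical template: keywords and placeholders are illustrative only,
# NOT real GAMESS-UK input syntax.
TEMPLATE = Template(
    "title sweep $functional/$basis\n"
    "basis = $basis\n"
    "functional = $functional\n"
)


def generate_inputs(outdir, basis_sets, functionals):
    """Write one input file per (basis, functional) pair and return the paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for basis in basis_sets:
        for functional in functionals:
            text = TEMPLATE.substitute(basis=basis, functional=functional)
            path = outdir / f"run_{functional}_{basis}.in"
            path.write_text(text)
            paths.append(path)
    return paths
```

For example, `generate_inputs("sweep", ["sto3g", "dzp"], ["b3lyp", "pbe"])` writes four input files, one per combination. `string.Template` is used because its `$name` placeholders rarely clash with the braces and percent signs that appear in simulation input decks.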

Monty - Bulk Job Submission
[Figure: Monty connected to SRB, the metadata database and RMCS]
– Sets up a structure in SRB for staging the input files and binary, and for storing the output files
– Sets up a structure in the database for metadata capture
– Submits the jobs to RMCS
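The three Monty steps above can be sketched in Python. This is only an illustration of the data flow: SRB, the metadata database and RMCS are replaced by in-memory containers, and the `Services` class, the `monty_submit` function and the `/study/dataset/input` collection layout are hypothetical. They do not reflect Monty's real interface.

```python
from dataclasses import dataclass, field


@dataclass
class Services:
    """In-memory stand-ins (hypothetical) for SRB, the metadata DB and RMCS."""
    srb: dict = field(default_factory=dict)       # collection path -> file names
    metadata: dict = field(default_factory=dict)  # data set name -> records
    queue: list = field(default_factory=list)     # jobs handed to "RMCS"


def monty_submit(svc, study, dataset, input_files, binary):
    """Bulk-submit one job per input file, following the three Monty steps."""
    base = f"/{study}/{dataset}"
    # 1. Storage structure: staging area for inputs + binary, area for outputs
    svc.srb[f"{base}/input"] = list(input_files) + [binary]
    svc.srb[f"{base}/output"] = []
    # 2. Metadata container for this data set (filled in later by RMCS)
    svc.metadata.setdefault(dataset, [])
    # 3. Hand one job per input file to the job manager
    for name in input_files:
        svc.queue.append({
            "input": f"{base}/input/{name}",
            "binary": f"{base}/input/{binary}",
            "output_dir": f"{base}/output",
        })
    return len(input_files)
```

Setting up storage and metadata containers once per data set, before any job runs, means every job can write its results to a location that already exists.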

Metadata Database
Metadata in the database are divided into study, data set, and data object:
– Study = the entire project
– Data set = a group of calculations
– Data objects = pieces of data from each calculation
[Figure: hierarchy Study → Data Sets → Data Objects (i.e. “parameters”)]
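The study / data set / data object hierarchy can be modeled with three small Python dataclasses. This is a hypothetical in-memory model for illustration, not the schema of the eMinerals metadata database. The `collect` method shows why the hierarchy is useful: one parameter can be pulled out of every calculation in a data set.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """One piece of data (a 'parameter') extracted from a single calculation."""
    calculation: str
    name: str
    value: float


@dataclass
class DataSet:
    """A group of related calculations."""
    name: str
    objects: list = field(default_factory=list)


@dataclass
class Study:
    """The entire project: a collection of data sets."""
    name: str
    datasets: list = field(default_factory=list)

    def collect(self, dataset_name, parameter):
        """Return (calculation, value) pairs for one parameter in one data set."""
        for ds in self.datasets:
            if ds.name == dataset_name:
                return [(o.calculation, o.value)
                        for o in ds.objects if o.name == parameter]
        raise KeyError(dataset_name)
```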

RMCS - integrated compute/data/metadata framework
3-tier model:
1. Client tools: GUIs (SOAP library), command-line tools (gSOAP)
2. RMCS server
3. The Grid

“The Grid” so far
– eMinerals MiniGrid
– Northwest Grid
– National Grid Service
– Cambridge Condor Pool
– Scarf Cluster (coming soon)

RMCS - integrated compute/data/metadata framework
1. Meta-schedule
2. Stage input files and binary
3. Run job / submit to batch queue
4. Transfer output to SRB
5. Use Rcommands + AgentX to put metadata into the database / extract XML data if available
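The five RMCS steps can be walked through for a single job in a minimal Python sketch. Every component is a hypothetical stand-in. A dict plays the role of SRB, a list plays the metadata database, and the `execute` callable plays a grid resource. The least-loaded scheduling policy is also an assumption, not RMCS's actual meta-scheduling algorithm.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    queued: int = 0  # jobs waiting on this resource (assumed load metric)


def run_rmcs_job(job, resources, srb, metadata_db, execute):
    """Walk one job through the five RMCS steps using in-memory stand-ins.

    srb:         dict path -> contents (stand-in for SRB)
    metadata_db: list of dicts (stand-in for the metadata database)
    execute:     callable(resource_name, staged_files) -> output text
    """
    # 1. Meta-schedule: choose the least-loaded resource (assumed policy)
    target = min(resources, key=lambda r: r.queued)
    # 2. Stage the input file and binary out of the data store
    staged = {path: srb[path] for path in (job["input"], job["binary"])}
    # 3. Run the job; execute() is synchronous here, so the job
    #    leaves the queue as soon as it returns
    target.queued += 1
    output = execute(target.name, staged)
    target.queued -= 1
    # 4. Transfer the output back to the data store
    srb[job["output"]] = output
    # 5. Record metadata, adding top-level XML fields if the output is XML
    record = {"input": job["input"], "resource": target.name}
    if output.lstrip().startswith("<"):
        root = ET.fromstring(output)
        record.update({child.tag: child.text for child in root})
    metadata_db.append(record)
    return record
```

Because the stand-ins are injected as arguments, each step can be swapped for a real service without changing the order of operations.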

Rgem - Collect Results
[Figure: Data Set → Data Objects]
– Analyze results
– Collect parameters from a chosen data set → tab-separated file → graph
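The "parameters → tab-separated file" step can be sketched with Python's standard `csv` module. The `to_tsv` function and the record layout (one dict per calculation) are assumptions for illustration. They are not Rgem's actual implementation.

```python
import csv
import io


def to_tsv(records, columns):
    """Turn a list of per-calculation dicts into tab-separated text.

    Missing values are written as empty fields, so every row has the same
    number of columns and the file loads cleanly into plotting tools.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        writer.writerow([rec.get(col, "") for col in columns])
    return buf.getvalue()
```

The resulting text can be saved to a `.tsv` file and passed to any plotting program to produce the graph.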

Scientific Examples
•Quantum Directed Genetic Algorithm
•Transition metal oxides, perovskites
•eMinerals

Quantum Directed Genetic Algorithm (QDGA)
Marcus Durant (Univ. of Northumbria), Jens Thomas (STFC Daresbury)
The QDGA project uses a genetic algorithm to try to determine an optimal catalyst for the conversion of nitrogen (N2) to hydrazine (N2H4).
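The following is a minimal sketch of a genetic algorithm in Python. It is a generic textbook variant, not the QDGA code. The gene pool, the candidate encoding and `toy_fitness` are hypothetical. In QDGA the score would presumably be derived from quantum-chemistry results such as the GAMESS-UK calculations in the workflow below. The sketch uses truncation selection, one-point crossover and point mutation. Keeping the fitter half unchanged (elitism) guarantees that the best score never decreases from one generation to the next.

```python
import random

# Hypothetical gene pool: stand-ins for candidate building blocks
GENES = ["A", "B", "C", "D"]


def toy_fitness(candidate):
    """Stand-in score (assumption); QDGA would use quantum-chemistry results."""
    return candidate.count("B")


def evolve(fitness, length=6, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Generational GA with truncation selection and elitism."""
    rng = random.Random(seed)
    pop = [[rng.choice(GENES) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # fitter half survives unchanged
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            # point mutation: each gene is resampled with prob. mutation_rate
            child = [rng.choice(GENES) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Each fitness evaluation in a real catalyst search is an expensive calculation, which is why a bulk-submission framework is needed to run a whole generation at once.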

QDGA workflow:
1. A template input (GAMESS-UK) and bespoke scripts generate many inputs, one per DFT transition state search.
2. Monty creates the directory structure in SRB for the input files + binary and for the output files.
3. Monty creates the metadata containers in the database.
4. Monty submits the jobs via RMCS.
5. RMCS stages the input files and the GAMESS-UK binary from SRB onto NWGrid (Daresbury, Manchester, Liverpool, Lancaster).
6. RMCS metaschedules the jobs and submits them to the batch queues.
7. RMCS transfers the output files to SRB.
8. Using AgentX and Rcommands, RMCS places the metadata in the database.
9. Rgem collects and plots the total energy from each optimized geometry.
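Step 9 can be illustrated with a small Python sketch that pulls the last reported total energy out of each run's output text and ranks the runs. The output line format matched by `ENERGY_RE` is a hypothetical placeholder. Real GAMESS-UK output is laid out differently, and in the framework above the energies would come from the metadata database rather than from raw text.

```python
import re

# Hypothetical line format; real GAMESS-UK output differs.
ENERGY_RE = re.compile(r"total energy\s*=\s*(-?\d+\.\d+)", re.IGNORECASE)


def final_energies(outputs):
    """Map run name -> last total energy reported in that run's output text."""
    energies = {}
    for run, text in outputs.items():
        matches = ENERGY_RE.findall(text)
        if matches:  # skip runs that never reported an energy (e.g. failed)
            energies[run] = float(matches[-1])
    return energies
```

Taking the last match picks the energy at the end of each optimization, and `min(energies, key=energies.get)` then names the lowest-energy run.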
[Figure: the complete workflow: template input (GAMESS-UK) and bespoke scripts → Monty → RMCS → NWGrid (Daresbury, Manchester, Liverpool, Lancaster), with SRB holding the input files, GAMESS-UK binary and output files, the metadata database, and Rgem for analysis]

Transition metal oxides; Perovskites (e.g. LaMnO3)
Band Theory Group: W. Temmerman, M. Lueders, L. Petit, R. Tyer
[Figure: several simulations producing XML data, linked through AgentX and RCommands to the database and RGem]
Use of the RMCS framework with XML output allows each of these steps to be linked together seamlessly.
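The benefit of XML output described above can be shown with a short Python sketch using the standard `xml.etree.ElementTree` module. Once every simulation writes its results as XML, a single routine can pull the same property out of all of them. The `<property name="..." units="...">` layout and the `read_property`/`tabulate` functions are hypothetical. They are not the format written by the Band Theory Group codes or the interface of AgentX.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return (value, units) for <property name="..."> in one output document.

    The element and attribute names are assumptions for illustration.
    """
    root = ET.fromstring(xml_text)
    prop = root.find(f".//property[@name='{name}']")
    if prop is None:
        return None
    return float(prop.text), prop.get("units")


def tabulate(outputs, name):
    """Collect one property from many simulations' XML outputs."""
    rows = {}
    for run, xml_text in outputs.items():
        found = read_property(xml_text, name)
        if found is not None:
            rows[run] = found
    return rows
```

Because the lookup is by element and attribute name rather than by position in a text file, the same code keeps working when a simulation adds new output fields.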

“Grand Challenge” science and the eMinerals VO
[Figure: three axes: adsorbing surface (sulphides, oxides/hydroxides, phosphates, carbonates, aluminosilicates, clays and micas, natural organic matter); contaminant (organic molecules, metallic elements, halogens); level of theory (large empirical models, linear-scaling quantum mechanics, quantum Monte Carlo)]

Some Recent eMinerals Projects
•Calculation of the compressibility of diopside (CaMgSi2O6) between 0 and 22 GPa - Andrew Walker
•Equation of State of Silica Glass - Andrew Walker
•Adsorption of Polychlorinated Dibenzo-p-Dioxins (PCDDs) onto Mineral Surfaces - Kat Austen

Shelf seas are omitted or poorly resolved in global ocean models, but they are a disproportionately important part of the Earth system.
GCOMS: Global Coastal Ocean Modelling System (Proudman Oceanographic Laboratory)
– 37% of the Earth’s population live within 100 km of the coast
– Shelf seas are 7% of ocean area but account for up to 30% of production
– Shelf seas modify and transport terrestrial inputs: freshwater, nutrients, pollutants
– Strong role in dense water formation, mixing on slopes, etc.

Acknowledgements
Cambridge eMinerals Group - Prof. Martin Dove, Kat Austen, Andrew Walker, Richard Bruin
STFC eScience - Rik Tyer, Kerstin Kleese Van Dam, Rob Allan, Phil Couch
STFC CSED - Jens Thomas, Martin Lueders, Walter Temmerman