Fifth ERCIM Environmental Modelling Workshop

N+N meeting Australia 2003
e-Science Centre
Kerstin Kleese van Dam
Data Management in a Grid Environment: theory and practical examples
Kerstin Kleese van Dam et al.,
CCLRC e-Science Centre
http://www.e-science.clrc.ac.uk
Council for the Central Laboratory of the Research Councils
One of Europe’s largest Research Support Organisations, providing
large scale experimental, data and computing facilities primarily to
the UK research community both in academia and industry.
Annually supporting around 12000 scientists from all major
scientific domains. 1800 members of staff over three sites:
•Rutherford Appleton Laboratory in Oxfordshire
•Daresbury Laboratory in Cheshire
•Chilbolton Observatory in Hampshire
Large quantities of data associated with the various facilities.
Houses 1 World Data Centre, 3 National Data Centres and a range
of community based data services.
http://www.cclrc.ac.uk
CCLRC e-Science Centre
Early involvement in e-Science (from 1999 Data Grid / WOS
onwards).
Centre established in 2000, since 2001 with direct governmental
funding, additional funding through participation in other projects.
Currently housing UK Grid Support Centre (together with
Manchester + Edinburgh) and BBSRC Grid Support Centre.
Involved in DataGrid, GridPP, AstroGrid and NERC DataGrid
Currently 40 permanent members of staff, 10 in the data
management group.
http://www.escience.clrc.ac.uk
Data Management Group
Current e-Science Projects of the Data Management
Group
Working on collaborations with partners inside CCLRC, the UK
and internationally
CLRC DataPortal
Integration of ISIS and BADC operational Data Catalogues
Environment from the Molecular Level
NERC DataGrid
e-Science Technologies for the Simulation of Complex Materials
Extensions of the Storage Resource Broker (SRB) together with SDSC
Earth Science Portal Project
Database service for CCLRC and related e-Science projects
Data Management
Currently the scientist has to take care of his data, …
In the future we hope that e-Science technologies, providing the binding link between different areas, provide scientists with a more helpful environment of work.
Your personal e-Science Interface wherever you are.
Issues
Data capture from instruments and computers
Data Storage
Annotating data
Data Discovery
Association of data with appropriate applications
Conversion of data from one application to the other
Merging of data from different sources
Data capture from instruments and computers
In a Grid environment the scientist will ultimately have little
control over where he carries out his experiment or calculation, and
therefore over where his data will be.
Capture Data
Capture Information about the environment
Direct where output goes
Data Capture from Experimental Facilities (1)
Instruments produce varying amounts of data, ranging from small
(e.g. temperature readings at a station) to large (e.g. LHC with
several Tbytes per second).
Each instrument will produce data in its own format, often
incompatible with anything else.
Most facilities provide their own short term storage, but will neither
annotate nor manage the data.
The collection of environmental information is often limited; much of the
information is still recorded in lab notebooks.
Correction values or error margins related to the instrument are
not linked to the collected data.
Data Capture from Experimental Facilities (2) Requirements
Generalised description of data format (possible standardisation for
instruments of the same type).
Automatic capture of environment information including Instrument
scientists if necessary.
Automatic linking of data about the environment and the raw data
produced by the instrument.
Automatic insertion of both types of data into interim or final data
repository.
Automatic linking of the donated data to existing related information
e.g. proposal, other experiments of the same project.
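The requirements above can be illustrated with a minimal Python sketch. It is not the CCLRC implementation: the function name, the record fields, the instrument name and the proposal identifier are all hypothetical. It only shows one way to link raw data, environment information and a proposal reference automatically at capture time.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def capture_record(raw_bytes, instrument, proposal_id, environment):
    """Build a metadata record that links raw data to its context.

    All field names are illustrative assumptions, not a published schema.
    """
    return {
        # The checksum ties this record to exactly one raw data file.
        "data_sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "size_bytes": len(raw_bytes),
        "instrument": instrument,
        # Link to existing related information, e.g. the proposal.
        "proposal_id": proposal_id,
        # Environment information supplied at acquisition time.
        "environment": dict(environment),
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "host": platform.node(),
    }


record = capture_record(
    raw_bytes=b"0.12 0.15 0.11\n",
    instrument="example-diffractometer",  # hypothetical instrument name
    proposal_id="PROPOSAL-0001",          # hypothetical proposal reference
    environment={"temperature_K": 293.0, "operator": "instrument scientist"},
)
print(json.dumps(record, indent=2))
```

A record like this could then be inserted into an interim or final repository together with the raw file it describes.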
Data Capture from Experimental Facilities (3) Examples
ICAT - CLRC ISIS Catalogue
http://www.isis.rl.ac.uk/dataanalysis
Collection of Raw data from the Instrument, Detector specific Information for this experiment etc.
Integrate Raw Data with original Proposal Information and Log files of the Instrument Scientists
Finally Integrated with other Facility Data within and outside CCLRC via Instances of the CCLRC DataPortal software.
See also: Comb-e-Chem - http://www.combechem.org
Data Storage
The Grid environment provides access to a multitude of storage
systems, often hiding the type of system behind service
interfaces.
Where is the data?
How can I manage it?
On which media is my data (access time)?
How can it be accessed?
Where are replicas of my data?
Data Storage (2) - Requirements
Easy overview of where your data is on the Grid
Support to manage your data (transfers/replicas)
Access and access control to your data wherever it is
Support to share your data
Two possible solutions:
Globus Data Management tools - example ESG
http://www.earthsystemsgrid.org
Storage Resource Broker (SRB) from the San Diego Supercomputer Center
http://www.npaci.edu/DICE/SRB
Typical Analysis Scenario and the use of Storage Resource Managers (SRM)
Metadata Catalogue for Data Discovery within one Virtual Organisation
Replica Catalogue keeps track of all replicas of specific datasets within one Virtual Organisation
The Network Weather Service helps to plan fastest access routes to the data
Request goes out to Disk and Hierarchical Storage Resource Managers
[Figure: components shown are the clients at the client's site, the logical query, the Request Interpreter with the Metadata catalog (logical files), request planning with the Replica catalog and the Network Weather Service, the Request Executer (site-specific files requests, pinning & file transfer requests), the network, DRMs with Disk Cache, and an HRM with Disk Cache and tape system.]
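The request planning step in this scenario can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SRM or Network Weather Service API: the catalogue contents, the site names, the bandwidth figures and the fixed tape staging cost are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    on_tape: bool  # True if held by a hierarchical (tape-backed) manager


# Hypothetical replica catalogue: logical file name -> known replicas.
REPLICA_CATALOGUE = {
    "lfn:/climate/run42/temp.nc": [
        Replica("site-a", on_tape=False),
        Replica("site-b", on_tape=True),
        Replica("site-c", on_tape=False),
    ],
}

# Hypothetical bandwidth forecast (MB/s) to the client's site, standing in
# for the kind of information a network weather service would provide.
BANDWIDTH_FORECAST = {"site-a": 12.0, "site-b": 80.0, "site-c": 30.0}

TAPE_STAGING_SECONDS = 120.0  # assumed fixed cost of staging from tape


def estimated_seconds(replica, size_mb):
    """Estimate delivery time as transfer time plus any tape staging time."""
    transfer = size_mb / BANDWIDTH_FORECAST[replica.site]
    staging = TAPE_STAGING_SECONDS if replica.on_tape else 0.0
    return transfer + staging


def plan_request(logical_name, size_mb):
    """Resolve a logical file to the replica with the lowest estimate."""
    replicas = REPLICA_CATALOGUE[logical_name]
    return min(replicas, key=lambda r: estimated_seconds(r, size_mb))


best = plan_request("lfn:/climate/run42/temp.nc", size_mb=600.0)
print(best.site)
```

With these made-up numbers a small file is best fetched from a disk cache, while a very large file can justify waiting for the tape-backed site with the faster link.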
Storage Resource Broker (1)
Professional Data Storage Management System initially developed
in the mid 90s by the San Diego Supercomputer Center.
http://www.npaci.edu/DICE/SRB/. Current version supports
many platforms and authentication methods. Web services
interfaces.
Storage Resource Broker
SRB External Interface Modules: MySRB (web based), Command line Interface, C and Fortran APIs - Password and Certificate authorisation
http://www.npaci.edu/dice/srb/mySRB/mySRB.html
MCAT provides links between logical and physical data location, replica and versioning. MCAT can be run on a variety of Relational Databases.
Integrated access to data on PC, UNIX, LINUX, DB and Tape Store
Device Interface Modules to wide range of platforms - easy to extend to new systems
Also used in the BIRN project http://www.nbirn.net/
Functions including
ingestion, movement and
replication of data.
Providing access to data
for others
Version of Data
Type of Data
Replica or Original Data
Physical Data Location
and Type of Resource
Biomedical Informatics Research
Network
Annotating Data
Data without further information is only of short and very limited
use.
Information about the data itself
Information about the where, why, who and when
Information about the environment in which the data was captured
Related Information
Example: CLRC Scientific Metadata Schema http://www.escience.clrc.ac.uk/Activity/ACTIVITY=DataPortal;SECTION=5;
Diversity: Users & Searches
Discovery
Excavation
General Scientific Metadata
A generic metadata model for all scientific
applications with Specialisation for each domain
Science Metadata Model
Specialisations: Space Science, Earth Science, Social Science, ISIS, SRS, HEP
Can answer questions across domains
Can answer questions about specific domains
CLRC DataPortal - Scientific Metadata Model
Metadata Object
Topic: Keywords providing an index on what the study is about.
Study Description: Provenance about what the study is, who did it and when.
Access Conditions: Conditions of use providing information on who can access the data and how.
Data Description: Detailed description of the organisation of the data into datasets and files.
Data Location: Locations providing navigation to where the data on the study can be found.
Related Material: References into the literature and community providing context about the study.
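As a rough illustration, the six parts of the metadata object could be held in a structure like the Python sketch below. The class name, field names, field types and the example entry are assumptions made for this sketch; the actual CLRC Scientific Metadata Schema is an XML format and is considerably more detailed.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class StudyMetadata:
    """Hypothetical in-memory view of one metadata object."""
    topic: List[str]              # keywords indexing what the study is about
    study_description: str        # provenance: what, who and when
    access_conditions: str        # who may access the data and how
    data_description: str         # organisation into datasets and files
    data_location: List[str]      # where the data can be found
    related_material: List[str] = field(default_factory=list)

    def matches(self, keyword):
        """Case-insensitive exact match of a keyword against the topic list."""
        wanted = keyword.lower()
        return any(wanted == t.lower() for t in self.topic)


# Invented example entry, for illustration only.
entry = StudyMetadata(
    topic=["ozone", "atmosphere"],
    study_description="Example ozone profile study, example group, 2002",
    access_conditions="registered users only",
    data_description="one dataset, twelve monthly files",
    data_location=["https://data.example.org/ozone/2002/"],
)
print(entry.matches("Ozone"))
print(sorted(asdict(entry)))
```

Because the topic keywords are generic, a query such as `matches("ozone")` can be asked across domains, while domain-specific fields would live in a specialisation of the model.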
Data Discovery
Most data is currently ‘discovered’ by word of mouth from friends
and colleagues or sheer luck.
Discovery
Browsing
Selection
Comparison
Access
Example: CLRC DataPortal http://esc.dl.ac.uk:9000/index.html
Different Levels of Metadata supporting Discovery and Selection
A-Metadata - can be derived from the data itself
B-Metadata - A summary of all other types of metadata
C-Metadata - All related metadata, papers, pictures, related studies
D-Metadata - User provided information on what, who, what and when
Q: Schema which defines supported queries upon A, B, C, D
Metadata Definitions
A: Usage metadata generated from (or about) the data. It could be aggregated metadata: e.g. CDML from cdscan.
B: Complete metadata from A + user provided info to conform with (at least) GEO profile. Application + template needed.
C: Metadata generated to describe both documentations and annotations (as opposed to binary data).
D: Discovery metadata suitable for harvesting to a portal. Probably based on Dublin core & GEO. Subset of B and C.
[Figure: Relationships between the A, B, C and D metadata, each held as XML, and the query schema Q.]
CLRC DataPortal
The DataPortal currently allows access to selected metadata and data from
four facilities, the first three housed by CLRC:
The Synchrotron Radiation Department (SRD)
The Neutron Spallation Source (ISIS)
The British Atmospheric Data Centre (BADC)
Max-Planck Institute for Meteorology (MPIM)
You will be able to access the available data via the basic search.
If you are not one of our partners but would like to try the system, you can
use one of our test accounts:
Log in using 'dpuser' for your username and password.
http://esc.dl.ac.uk:9000/index.html
DataPortal Architecture
The major functions of the DataPortal (DP) are grouped into modules. Each module has
a grid services interface to communicate with the other DP services and, in some cases,
also with outside services like Visualisation or the HPC Portal. The SOAP protocol is used for
communication and WSDL to describe the various services. We do not change any local
metadata system, but use our own wrappers to translate our general query format into
the local syntax. Replies from the resources will be XML files compliant with the CLRC
Scientific Metadata Format:
(http://www-dienst.rl.ac.uk/library/2002/tr/dltr-2002001.pdf)
The UK e-Science Grid CA provides Globus X.509 certificates for the UK e-Science
community. The CA is located at RAL and is being run as part of the Grid Support Centre
funded by the Research Councils' Core e-Science programme.
(http://www.grid-support.ac.uk/)
The implementation of the core modules as grid services allows the DataPortal to be a truly
distributed application and allows several instances of the DataPortal to be logically combined,
thus extending any user query.
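The wrapper idea described above (one general query format, translated into each facility's local syntax, with XML replies) might look roughly like the following Python sketch. The two wrapper classes, the local query syntaxes and the reply layout are assumptions for illustration only; the reply shown is not the CLRC Scientific Metadata Format.

```python
import xml.etree.ElementTree as ET


class SqlFacilityWrapper:
    """Hypothetical wrapper for a facility whose catalogue is queried with SQL."""

    def translate(self, query):
        # Return a parameterised statement rather than pasting values into SQL.
        return ("SELECT title FROM studies WHERE keyword = ?", (query["keyword"],))


class LdapFacilityWrapper:
    """Hypothetical wrapper for a facility with an LDAP-style catalogue."""

    def translate(self, query):
        return "(keyword={})".format(query["keyword"])


def to_reply_xml(facility, titles):
    """Wrap local results in a simple, made-up XML reply."""
    root = ET.Element("reply", facility=facility)
    for title in titles:
        ET.SubElement(root, "study").text = title
    return ET.tostring(root, encoding="unicode")


general_query = {"keyword": "ozone"}  # the general query format is assumed
print(SqlFacilityWrapper().translate(general_query))
print(LdapFacilityWrapper().translate(general_query))
print(to_reply_xml("facility-a", ["Example study"]))
```

The point of the design is that the local metadata systems stay untouched: only the wrapper knows the local syntax, and the portal only ever sees the common reply format.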
General CLRC DataPortal Architecture
[Figure: the CLRC DataPortal Server, connected to other instances of the CLRC DataPortal Server and to Facility 1 ... Facility N; each facility has an XML wrapper on top of its local metadata and local data.]
DataPortal Architecture (2)
Accessing DataPortal either via Web Interface or Web Services Interfaces
Authenticate and Authorise user e.g. by checking certificate validity and check with the associated facilities for general access rights
Query Generation, Selection of Suitable Facilities to Query. Farm out query to selected Facilities in parallel and collect and collate results
Put interesting Data in your personal, permanent Shopping Cart, which you can share with others as required.
Use the Data Transfer Service to send your data on to a chosen application or service
As well as interacting with the DataPortal via the Web Interface users can also run queries by directly calling the Query & Reply service assuming that they are properly authenticated. Other services are also externally visible, for example the Shopping Cart.
The Shopping Cart allows registered users to permanently store and annotate pointers to the external data files and data sets.
Facility Administration allows external facilities to advertise their grid services to the DataPortal.
[Figure: modules shown are DataPortal Web Interface, Authentication & Authorisation, Certification Authority, Look Up Service, Facilities Access Control, Session Management, Query & Reply, Shopping Cart, DataPortal Permanent Repository, Data Transfer, External Data File Store(s), Facility Administration and Facilities XML Wrappers.]
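The query step on this slide (check access rights, farm the query out to the selected facilities in parallel, then collect and collate the results) can be sketched as below. This is an assumption-laden illustration: the catalogues, the user name and the access table are invented, and a thread pool stands in for the real SOAP calls to facility services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical facility catalogues: facility -> list of (title, keywords).
FACILITY_CATALOGUES = {
    "facility-a": [("Ozone profiles 1999", {"ozone", "atmosphere"}),
                   ("Sea ice extent", {"ice"})],
    "facility-b": [("Ozone column", {"ozone"})],
    "facility-c": [("Ozone lidar", {"ozone"})],
}

# Hypothetical general access rights per authenticated user.
ACCESS_RIGHTS = {"alice": {"facility-a", "facility-b"}}


def query_facility(name, keyword):
    """Stand-in for a call to one facility's wrapper service."""
    return [(name, title)
            for title, keywords in FACILITY_CATALOGUES[name]
            if keyword in keywords]


def run_query(user, keyword, facilities):
    """Query only the facilities the user may access, in parallel."""
    allowed = [f for f in facilities if f in ACCESS_RIGHTS.get(user, set())]
    with ThreadPoolExecutor(max_workers=max(1, len(allowed))) as pool:
        per_facility = pool.map(lambda f: query_facility(f, keyword), allowed)
        # Collate: flatten the per-facility result lists into one list.
        results = [hit for hits in per_facility for hit in hits]
    return sorted(results)


print(run_query("alice", "ozone", ["facility-a", "facility-b", "facility-c"]))
```

Here "facility-c" is silently skipped because the example user has no rights there; a real portal would more likely report that to the user.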
Choose Facilities of
Interest
Select Discipline and
reduce Search Field
Annotate your
Search Results
Specific Services
associated with this
data
Forgotten where
your data came
from?
Association of data with appropriate applications
The scientist will need to be able to link to all his favourite
applications for analysis, simulation and visualisation, but he also
needs to be informed about other suitable programs.
Suitable applications
Correct Format
Suitable for your environment
Availability
HPC Grid Services Portal
This is a pilot project funded by the CLRC e-Science Centre to
develop a Web portal to search for resources and submit HPC
applications to a computational Grid in the UK. It will form the basis
of application portals for the UK e-Science Grid and "thematic
Grids", e.g. for the NERC DataGrid and HPCI Consortia.
This project is a collaboration with the San Diego Supercomputer
Center, who have developed the GridPort Portal and HotPage software
for the NPACI HPC Grid, and with the University of Lecce, Italy, who
have developed the Grid Resource Broker.
http://esc.dl.ac.uk/HPCPortal/
HPC Grid Services Portal
Provides a portal for HPC resources which can be customised for
domain-specific applications.
Original collaboration with San Diego Supercomputer Center, now
University of Texas (Mary Thomas).
Similar functionality to HotPage and GridPort (SDSC):
Single sign-on using a digital certificate (GSI)
Resource monitoring and Discovery (Globus)
Application Discovery (search engine)
Personal "desktop" workspace
File transfer (Globus) and Job Submission (Globus)
InfoPortal
Searching for
Applications on the
UK Level 2 Grid
HPCPortal
DataPortal
Chose Application:
DLPOLY
Resulting Findings
for DLPOLY
Summary
Description
Web Service
Address for
DLPOLY code
Information about the systems on which the code is installed and available for use
Link to job submission
All machines on
the UK level 2
Grid and their
availability
Conversion of data from one application to the
other
The scientists will need to be able to pass data from one
application to the next seamlessly and with minimum interference
on their part.
Determining Data Formats
Data Schema
Interchange/Conversion
Example: e-Materials Project
The CLRC DataPortal
Related Projects
E-SCIENCE TECHNOLOGIES IN THE SIMULATION OF COMPLEX MATERIALS
A combination of novel computational and computer science methodologies and teams will be
used to develop GRID e-Science technologies to deliver new simulation solutions to problems
and fields relating to combinatorial materials science and polymorph prediction. The project
will exploit the latest developments in scientific simulation methodologies (both electronic
structure and force field based) and hardware ranging from desktop to HPC. It will establish a
field tested integrated data and computing e-Science infrastructure customised for these key
areas of current materials science. This infrastructure will, among others, enable the automatic
submission of simulation, triggered by the identification of knowledge gaps in the database in
response to user queries. Furthermore, the automatic integration of experimental and
computational results for screening applications will be supported.
The Science: Filtering
Purely SiO4 zeolite
Information of Interest
• Structure
• Total energy
• Binding Energy
• HOMO/LUMO
• Population Analysis
• Vibrational Freqs
Metal substitution with addition of proton
Two point displacement method used to build up dynamical matrix. Single point energy calculation at each displacement +ve and -ve in x, y, and z.
Calculation of Vibrational Freqs
Increase quality of calculation for best candidates
Add probe
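The two point displacement method mentioned on this slide can be illustrated with a small central-difference sketch in Python. It assumes that each single point calculation returns the gradient as well as the energy, which the slide does not state, and it uses a made-up two-variable test energy in place of a quantum chemistry code. Mass weighting, which turns the second-derivative matrix into the dynamical matrix, is left out.

```python
def gradient(coords):
    """Analytic gradient of a hypothetical test energy E = x^2 + 2*y^2 + x*y."""
    x, y = coords
    return [2.0 * x + y, 4.0 * y + x]


def hessian_by_displacement(grad, coords, step=1e-3):
    """Build the second-derivative matrix from +ve and -ve displacements.

    Column j is (g(coords + step*e_j) - g(coords - step*e_j)) / (2*step),
    i.e. two gradient evaluations per coordinate.
    """
    n = len(coords)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(coords)
        minus = list(coords)
        plus[j] += step    # +ve displacement along coordinate j
        minus[j] -= step   # -ve displacement along coordinate j
        g_plus = grad(plus)
        g_minus = grad(minus)
        for i in range(n):
            hessian[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * step)
    return hessian


H = hessian_by_displacement(gradient, [0.3, -0.2])
for row in H:
    print(row)
```

For this test energy the exact second derivatives are 2, 1, 1 and 4, so the printed matrix should be close to those values.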
The Computation
1. Micro iterations to relax shells wrt forces from QM region. RMS criteria (x) tested for further movement of shells.
2. Energy and gradients passed from GAMESS-UK to GULP and then final forces passed back to ChemShell (newopt module), which performs geometry optimisation.
3. Optimisation is considered complete when both max gradient and max step are below set criteria.
[Figure: loop between ChemShell, GAMESS-UK and GULP; shell relaxation in GULP tested against RMS=x; ChemShell Optimiser tests Maxg and maxs < 0.01.]
CML - Chemical Markup Language
CML is a new approach to managing molecular information. It has a
large scope as it covers disciplines from macromolecular sequences to
inorganic molecules and quantum chemistry. CML is new in bringing
the power of XML to the management of chemical information. CML
and associated tools allow for the conversion of current files without
semantic loss into structured documents, including chemical
publications, and provide for the precise location of information
within files.
Developed by Peter Murray-Rust and Henry S. Rzepa.
http://www.xml-cml.org
In addition they are also looking at:
CCML - a Computational Chemical Markup Language
<document>
<!-- CML document - caffeine - karne - 7/8/00 -->
<!-- file converted from: MDL .mol -->
<cml title="caffeine" id="cml_caffeine_karne" xmlns="xschema:cml_schema_ie_02.xml">
<molecule title="caffeine" id="mol_caffeine_karne" convention="mol">
<formula>C8 H10 N4 O2</formula>
<string title="CAS">58-08-2</string>
<string title="ACX">I1001269</string>
<string title="DOT">UN 1544</string>
<string title="RTECS">EV6475000</string>
<float title="molecule weight">194.19</float>
<float title="melting point" units="degC">238</float>
<float title="specific gravity">1.23</float>
<string title="water solubility" units="g/100 mL" convention="g per 100 mL at 23 degC">1-5</string>
<string title="comments">White powder or white glistening needles usually melted together. LIGHT SENSITIVE</string>
<list title="alternate names">
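To show how such a document supports "precise location of information", here is a minimal Python sketch that reads a few properties from a cut-down copy of the caffeine example. The fragment is an assumption: it keeps only four child elements, closes the `molecule` element itself and omits the namespace declaration, because the excerpt on the slide is truncated. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, self-closed copy of the slide's example (assumed, see above).
CML_FRAGMENT = """
<molecule title="caffeine" id="mol_caffeine_karne" convention="mol">
  <formula>C8 H10 N4 O2</formula>
  <string title="CAS">58-08-2</string>
  <float title="molecule weight">194.19</float>
  <float title="melting point" units="degC">238</float>
</molecule>
"""


def read_properties(xml_text):
    """Collect the formula plus every titled float and string element."""
    root = ET.fromstring(xml_text.strip())
    props = {"formula": root.findtext("formula")}
    for node in root.findall("float"):
        props[node.get("title")] = float(node.text)
    for node in root.findall("string"):
        props[node.get("title")] = node.text
    return props


props = read_properties(CML_FRAGMENT)
print(props["formula"], props["melting point"])
```

Because every value carries a `title` (and often `units`), a converter can pick out exactly the fields another application needs instead of parsing a free-format text file.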
The CLRC DataPortal
Related Projects
ENVIRONMENT FROM THE MOLECULAR LEVEL: AN E-SCIENCE PROPOSAL
FOR MODELLING THE ATOMISTIC PROCESSES INVOLVED IN
ENVIRONMENTAL ISSUES
Many environmental problems, such as transport of pollutants, development of remediation
strategies, weathering, and containment of high-level radioactive waste, require an
understanding of fundamental mechanisms and processes at a molecular level. Computer
simulations at a molecular level can give considerable progress in our understanding of these
processes. Developments in atomistic simulation tools must now be linked with GRID
technologies in order to facilitate simulation studies that can be performed with realistic
conditions, and which can scan across a wide range of physical and chemical parameters. This
proposal brings together simulation scientists, applications developers and computer scientists
to develop UK e-science/GRID capabilities for molecular simulations of environmental issues.
A common set of simulation tools will be developed for a wide range of applications, and the
GRID environment will be established which will result in a giant leap in the capabilities of
these powerful scientific tools. See http://eminerals.org/
The CLRC DataPortal
Related Projects
THE NERC DATAGRID
Data discovery and delivery are inherent components of many aspects of science. They can be
considered part of a processing chain that starts with raw data from a variety of sources, and
ends with the graphical production of information that is directly used in scientific research.
This proposal is to build a grid which makes data discovery, delivery and use much easier than
it is now, facilitating better use of the existing investment in the curation and maintenance of
quality data archives. Further we intend to make the connection between data held in managed
archives and data held by individual research groups seamless in such a way that the same
tools can be used to compare and manipulate data from both sources. What will be completely
new will be the ability to compare and contrast data from an extensive range of (US,
European, UK, NERC) datasets from within one specific context. The presence of the NERC
DataGrid will allow grid based visualisation services to access a wide variety of data held at
the British Atmospheric and Oceanographic Data Centres (BADC and BODC respectively) as
well as on individual storage systems belonging to groups which register their data with the
NERC DataGrid. The structures put in place will also allow NERC data to become part of the
putative future semantic grid. See http://ndg.badc.rl.ac.uk/
CLRC DataPortal
Related Projects
EARTH SCIENCE PORTAL
The Earth Science Portal (ESP) is a collaboration designed to build the infrastructure needed to
create web portals to provide access to observed and simulated data within the climate and
weather communities. The infrastructure created within ESP will provide a flexible framework
that will allow interoperability between the front-end and back-end software components.
The initial ESP community workshop was held on Thursday, January 23rd and Friday, January 24th, 2003
at the National Center for Atmospheric Research, Boulder, Colorado. Based on the discussions
of the workshop we created a draft document that describes the software framework within
ESP. The development activities in ESP are intended to support this framework. The document
will be updated based on these activities and comments and suggestions from the community.
Partners are: BADC, CCLRC, CDC and GFDL NOAA, NASA, LLNL, NCAR and PMEL
http://nomads.gfdl.noaa.gov/~ck/esp/webpages
The CLRC DataPortal
Related Projects
EUROPEAN SPATIO-TEMPORAL DATA INFRASTRUCTURE FOR HIGH-PERFORMANCE COMPUTING
ESTEDI, an initiative of European software vendors and supercomputing centres, will
establish a European standard for the storage and retrieval of multidimensional high-performance computing (HPC) data. It addresses a main technical obstacle, the delivery
bottleneck of large HPC results to the users, by augmenting high-volume data generators with
a flexible data management and extraction tool for spatio-temporal raster data. To this end, the
multidimensional database system RasDaMan will be enhanced with intelligent mass storage
handling and optimised towards HPC. See http://www.estedi.org/
The CLRC DataPortal
Related Projects
MSC PROJECT ON AUTOMATED DATA MANAGEMENT FOR CLIMATE
SIMULATIONS
These days data is no longer only produced by experiments, measurements and observations.
Many of the more complex phenomena are studied in computer simulations. These
simulations can produce large quantities of data. However in contrast to much experimental or
observational data these results are often not accessible to the wider research communities.
Simulation data could be more widely exploited if better information was available concerning
the simulation itself.
This project aims to investigate the possibility of automatically capturing as much metadata
concerning the simulation as possible and storing it in a suitable database. The database will
be accessible via the CLRC DataPortal. It is expected that next to investigating the issue in
general a prototype installation will be provided by the students.
The CLRC DataPortal
Related Projects
CLRC e-Science Database Service
Looking for the most flexible operating system in terms of both software available and
price/performance ultimately led to the choice of a Linux based system (enterprise editions).
For running the widest choice of databases, the Redhat Advanced Server and SuSE Linux
Enterprise Server are available. Oracle has been selected for the initial database service as it
offers a clustering technology. Oracle Real Application Clusters are the multi-node extension
to Oracle database server. A cluster is a group of independent servers (nodes) that cooperate as
a single system. The primary cluster components are processor nodes, a cluster interconnect,
and a shared storage subsystem. Oracle cluster database combines the memory in the
individual nodes to provide a single view of the distributed cache memory for the entire
database system. Oracle are the only vendor to offer this capability.
We chose IBM x440 series nodes as the building
blocks for the data clusters. The IBM Enterprise
X-Architecture consists of Intel processor-based
servers with features such as support for up to 16-way SMP
capability and remote I/O. The clusters connect to
1TB RAID 5 storage arrays via fibre channel
switches.
[Figure label: PostgreSQL]
For Information see:
Integrated e-Science Environment Portal
http://esc.dl.ac.uk/IeSE/
HPC Grid Services Portal
http://esc.dl.ac.uk/HPCPortal/
DataPortal
http://esc.dl.ac.uk:9000/index.html
CLRC e-Science Centre
http://www.e-science.clrc.ac.uk