NorduGrid talk

The NorduGrid project:
Using the Globus Toolkit for building
a Grid infrastructure
presented by Aleksandr Konstantinov
Mattias Ellert
Aleksandr Konstantinov
Balázs Kónya
Oxana Smirnova
Anders Wäänänen
Introduction
Launched in spring 2001, with the aim
of creating a Grid infrastructure in the
Nordic countries.
Partners from Denmark, Norway,
Sweden, and Finland.
Powered mainly by ATLAS groups
(Lund, Copenhagen, Stockholm,
Uppsala, Oslo).
A relatively short-term project: ends in
October 2002.
Relies on very limited human
resources (3 full-time researchers, a few
part-time ones) with funding from
[email protected]
ACAT'2002, Moscow
Introduction (cont.)
The purpose of the project is to create and operate a functional
testbed.
Use established tools: the Globus Toolkit™ (developed at
Argonne National Laboratory and the University of Southern
California) and tools developed by the European DataGrid (EDG)
project.
Target High Energy Physics applications: their needs decide
what is implemented first.
No temporary solutions (it is better not to implement something
than to be forced to maintain backward compatibility with a
limited solution).
Globus Toolkit™ evaluation
A widely accepted de facto standard for Grid computing.
Provides a collection of (mostly) robust protocols, libraries and low-level
services.
Built-in security.
Continuously evolving (??).
Missing a few important high-level services:
grid-level scheduler
job data stagein/stageout
user-friendly grid entry points (simple user-interface, web portals, etc.)
grid-level authorization system
grid-level accounting and quotas
NorduGrid requirements
No single point of failure
No central sandbox (unlike EDG)
Lightweight brokering integrated into the User Interface
Jobs should not be specific to a Computing Element (cluster)
Non-grid-aware jobs are allowed ("grid functionality" is provided by
middleware on the Computing Element)
Jobs run in as restrictive an environment as possible (network
access should not be expected on computing nodes)
A minimal environment is provided on the Computing Element
Adequate and sufficiently complete information is provided by the Information System
The natural computing unit is the cluster
NorduGrid architecture
Information System
NorduGrid operates an MDS-based, hierarchically
distributed Information System:
a new information model for clusters, queues, jobs,
users, Storage Elements (SE) and Replica Catalogs (RC)
efficient information providers
all the job monitoring, resource discovery, status
monitoring and brokering are exclusively built on top
of the MDS
MDS hierarchy with dynamic site registrations
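The hierarchy with dynamic site registrations can be pictured as a tree of index servers: sites register with a regional index, regional indices register with a top-level one, and resource discovery walks the tree down to the cluster entries. The following is only a toy in-memory sketch of that idea. It does not use the MDS or LDAP APIs, and the class names, method names and host names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Cluster:
    """Leaf of the hierarchy: a site publishing its own cluster entry (hypothetical)."""
    name: str
    free_cpus: int


@dataclass
class IndexServer:
    """Hypothetical index node that sites or lower-level indices register with."""
    name: str
    children: Dict[str, Union["IndexServer", Cluster]] = field(default_factory=dict)

    def register(self, child: Union["IndexServer", Cluster]) -> None:
        # Dynamic registration: registering again under the same name replaces the entry.
        self.children[child.name] = child

    def unregister(self, name: str) -> None:
        # Stands in for a site whose registration is no longer renewed.
        self.children.pop(name, None)

    def discover(self) -> List[Cluster]:
        """Walk the tree depth-first and collect every reachable cluster entry."""
        found: List[Cluster] = []
        for child in self.children.values():
            if isinstance(child, Cluster):
                found.append(child)
            else:
                found.extend(child.discover())
        return found


# Usage: a top-level index with one country-level index below it.
top = IndexServer("top-level")
sweden = IndexServer("sweden")
top.register(sweden)
top.register(Cluster("cluster-a.example.dk", free_cpus=4))
sweden.register(Cluster("cluster-b.example.se", free_cpus=10))
print(sorted(c.name for c in top.discover()))
# prints ['cluster-a.example.dk', 'cluster-b.example.se']
```

Because discovery always starts from whatever is currently registered, a site that joins or leaves becomes visible or invisible without any central configuration change. This matches the "no single point of failure" requirement as far as site membership is concerned.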
Information System (example)
[Figure: sample Information System entries for a cluster, a queue, a job and a user]
Information System (hierarchy)
Information System (interfaces)
Grid Manager - cluster middleware
Provides job control and data handling functionality (the requirements
of HEP applications have first priority).
The Grid Manager is based on Globus Toolkit™ libraries
and services. The following parts of Globus are used:
GridFTP - fast and reliable data access for Grid
GASS Copy interface - support for different data access
protocols
Replica Catalog - metadata storage
GRAM - resource request
RSL - expandable Resource Specification Language
Grid Manager (features)
Stage in input data and executables. Possible sources:
Job submission machine.
GridFTP (preferred), FTP, HTTP or HTTPS servers.
Files registered in the Globus Replica Catalog. Secure
authentication. The destination is chosen automatically or can be
forced.
Stage out output data. Possible destinations:
Kept on the cluster until the user downloads it.
GridFTP, FTP, HTTP or HTTPS servers.
Files can be registered in the Globus Replica Catalog. The destination
and protocol are obtained from the location information.
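The staging rules above amount to a dispatch on the location given for each file. The sketch below makes that dispatch explicit. It assumes, as the ATLAS example later in this talk suggests, that an empty location means "from the submission machine" on stage-in and "keep on the cluster" on stage-out. "gsiftp" is the usual URL scheme for GridFTP and "rc" is the Replica Catalog form used in that example. The helper name and the labels are hypothetical, not Grid Manager identifiers.

```python
from urllib.parse import urlparse

# Illustrative labels for the protocols listed above (not Grid Manager identifiers).
TRANSFER_KINDS = {
    "gsiftp": "GridFTP (preferred)",
    "ftp": "FTP",
    "http": "HTTP",
    "https": "HTTPS",
    "rc": "Replica Catalog",
}


def plan_transfer(name: str, location: str, stage_in: bool) -> str:
    """Describe how a file with the given location would be handled (hypothetical helper)."""
    if location == "":
        # Assumed convention: no location means the submission machine (stage-in)
        # or "keep on the cluster until the user downloads it" (stage-out).
        where = "upload from submission machine" if stage_in else "keep on cluster until downloaded"
        return f"{name}: {where}"
    scheme = urlparse(location).scheme
    if scheme not in TRANSFER_KINDS:
        raise ValueError(f"unsupported protocol {scheme!r} for {name}")
    return f"{name}: {TRANSFER_KINDS[scheme]}"


print(plan_transfer("gen0017_1.root", "rc:///gen0017_1.root", stage_in=True))
```

Rejecting unknown schemes up front, rather than failing in the middle of a transfer, fits the "no temporary solutions" principle. A job with an unusable location is refused before it consumes cluster time.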
Grid Manager (features)
E-mail notification of job status changes.
Support for software runtime environment configuration.
Jobs are started with the environment set up properly for the
requested application
Customizable GridFTP server
local access through plugins
certificate-oriented local file system access plugin
job submission/access plugin: start jobs, upload input
files and download output files through the same interface
Limitation: data is handled only at the beginning and end
of the job. The user must provide
information about input and output data.
Extensions to RSL (evaluation)
RSL stands for Resource Specification Language. It was introduced to
communicate job requirements to the Globus Resource Allocation
Manager (GRAM).
Useful features:
Allows basic logical expressions
Set of attributes is expandable
Unknown attributes are passed through.
Allows different parts to be processed at different levels.
Can be used to assist in writing brokers or filters which refine an
RSL specification
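To make the structure concrete, here is a minimal parser for the simplest RSL form used in this talk: a conjunction "&" of "(attribute=value ...)" relations. It is a sketch under stated simplifications, not the Globus RSL parser. Only the "=" relation is recognised. Disjunctions ("|"), multi-requests ("+"), the other relational operators and quote escaping are left out. The class and function names are hypothetical.

```python
from typing import Dict, List, Union

Value = Union[str, List["Value"]]


class RslParser:
    """Minimal parser for a flat RSL conjunction such as
    &(executable=/bin/echo)(arguments="Hello World")(stdout=out.txt)."""

    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0

    def _skip_ws(self) -> None:
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def _expect(self, ch: str) -> None:
        self._skip_ws()
        if self.text[self.pos:self.pos + 1] != ch:
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def _word(self, stop: str) -> str:
        # Unquoted token: runs until whitespace or one of the stop characters.
        start = self.pos
        while (self.pos < len(self.text) and not self.text[self.pos].isspace()
               and self.text[self.pos] not in stop):
            self.pos += 1
        if start == self.pos:
            raise ValueError(f"empty token at offset {start}")
        return self.text[start:self.pos]

    def _value(self) -> Value:
        self._skip_ws()
        ch = self.text[self.pos:self.pos + 1]
        if ch == '"':  # quoted literal; escaped quotes are not handled
            end = self.text.index('"', self.pos + 1)
            literal = self.text[self.pos + 1:end]
            self.pos = end + 1
            return literal
        if ch == "(":  # parenthesised sequence, e.g. ("file" "location")
            self.pos += 1
            items = self._values()
            self._expect(")")
            return items
        # Bare word; '=' is allowed inside it, as in unquoted LDAP URLs.
        return self._word(stop='()"')

    def _values(self) -> List[Value]:
        items: List[Value] = []
        while True:
            self._skip_ws()
            if self.pos >= len(self.text) or self.text[self.pos] == ")":
                return items
            items.append(self._value())

    def parse(self) -> Dict[str, List[Value]]:
        self._expect("&")
        relations: Dict[str, List[Value]] = {}
        while True:
            self._skip_ws()
            if self.pos >= len(self.text):
                return relations
            self._expect("(")
            self._skip_ws()
            # Names are compared case-insensitively here
            # (the talk writes both inputFiles and inputfiles).
            name = self._word(stop="=()").lower()
            self._expect("=")
            relations[name] = self._values()
            self._expect(")")


def parse_rsl(text: str) -> Dict[str, List[Value]]:
    return RslParser(text).parse()


print(parse_rsl('&(executable=/bin/echo)(arguments="Hello World")(stdout=out.txt)'))
```

Because unknown attributes are simply kept in the resulting dictionary, a broker or filter built on this sketch passes them through unchanged, in the same way as the RSL feature listed above.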
Extensions to RSL (new attributes)
New attributes were introduced to support additional features. The most
important are:
inputFiles=(<file> [<location>]) ... - list of files to be transferred to the computing node from a given location.
outputFiles=(<file> [<location>]) ... - list of files to be preserved after job completion and transferred to a given location.
executables=<file1> <file2> ... - list of files to be given executable permissions.
notify=<options> <email> ... - e-mail notification on job status change.
Extensions to RSL (new attributes)
runTimeEnvironment=<string> ... - application-specific runtime environment (e.g., ATLAS-3.2.1)
middleware=<string> - required middleware (e.g., NorduGrid-0.3.0)
cluster=<string> - specific cluster request
rerun=<number> - number of attempts to re-run the job
lifeTime=<number> - maximum time for the session directory to remain on the execution node (cannot override local policy)
ftpThreads=<number> - number of GridFTP threads to be used for file transfers
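The sketch below assembles a job description from the attributes listed above. It shows how the (file location) pairs of inputFiles/outputFiles differ from plain string lists such as executables. The helper names are hypothetical. Quoting every string value is a conservative choice of this sketch, not a requirement stated in the talk, and the values for rerun and ftpThreads are illustrative.

```python
from typing import Dict, Sequence, Tuple, Union

# A value is a plain string, a number, a list of strings, or a list of
# (file, location) pairs as used by inputFiles/outputFiles.
Attr = Union[str, int, Sequence[str], Sequence[Tuple[str, str]]]


def quote(text: str) -> str:
    """Double-quote an RSL literal; embedded quotes are rejected rather than escaped."""
    if '"' in text:
        raise ValueError(f"embedded quote not supported in this sketch: {text!r}")
    return f'"{text}"'


def to_xrsl(attrs: Dict[str, Attr]) -> str:
    """Serialise attributes into one '&(...)(...)' xRSL conjunction (hypothetical helper)."""
    parts = []
    for name, value in attrs.items():
        if isinstance(value, int):
            body = str(value)  # e.g. rerun=2, ftpThreads=4
        elif isinstance(value, str):
            body = quote(value)  # e.g. runTimeEnvironment="ATLAS-3.2.0"
        elif not value:
            raise ValueError(f"empty value list for {name}")
        elif all(isinstance(v, tuple) for v in value):
            # inputFiles/outputFiles: one parenthesised (file location) pair per entry
            body = "".join("(" + " ".join(quote(x) for x in pair) + ")" for pair in value)
        else:
            body = " ".join(quote(v) for v in value)  # e.g. executables="a" "b"
        parts.append(f"({name}={body})")
    return "&" + "".join(parts)


job = to_xrsl({
    "executable": "prod",
    "inputFiles": [("atlas.kumac", ""), ("gen0017_1.root", "rc:///gen0017_1.root")],
    "executables": ["prod"],
    "runTimeEnvironment": "ATLAS-3.2.0",
    "rerun": 2,
    "ftpThreads": 4,
})
print(job)
```

The resulting string can be passed to ngsub as a single quoted argument, in the same way as the examples in the Applications section.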
User Interface
The NorduGrid toolkit user interface consists of a set of
commands that can be executed from the command line
ngsub - for job submission
ngstat - to obtain the status of jobs and clusters
ngcat - to display the stdout or stderr of a running job
ngget - to retrieve the result from a finished job
ngkill - to kill a running job
ngclean - to delete a job from a remote cluster
ngsync - to recreate local information about jobs
User Interface
Job requests are written in xRSL (extended RSL)
processes the user-level xRSL request and transforms it into one suitable for
the Grid Manager (GM)
user-friendly values for some attributes
conditional submission and xRSL transformation
Performs brokering
analyzes the information about the different clusters obtained from the
MDS servers
from all suitable queues one is chosen randomly, with a weight
proportional to the amount of free computing resources
Passes the modified job request to the GM through the GRAM or GridFTP
interface and uploads the input files.
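The brokering step above (filter the suitable queues, then draw one at random with a weight proportional to its free resources) can be sketched in a few lines. The sketch makes two assumptions. It measures "free computing resources" as a count of free CPUs, and it defines "suitable" as offering every requested runTimeEnvironment. The real user interface may use other criteria. The Queue fields and function name are hypothetical. The weighted draw uses Python's standard random.Random.choices.

```python
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Queue:
    """Hypothetical summary of one queue as the user interface might read it from the MDS."""
    cluster: str
    name: str
    free_cpus: int
    runtime_envs: Set[str] = field(default_factory=set)


def choose_queue(queues: List[Queue], required_envs: Set[str],
                 rng: Optional[random.Random] = None) -> Queue:
    """Pick a suitable queue at random, weighted by its number of free CPUs."""
    rng = rng or random.Random()
    # Suitable = offers every requested runtime environment and has free CPUs
    # (a zero weight could never be drawn, so such queues are dropped up front).
    suitable = [q for q in queues
                if required_envs <= q.runtime_envs and q.free_cpus > 0]
    if not suitable:
        raise LookupError("no suitable queue found")
    weights = [q.free_cpus for q in suitable]
    return rng.choices(suitable, weights=weights, k=1)[0]


queues = [
    Queue("cluster-a", "long", free_cpus=30, runtime_envs={"ATLAS-3.2.0"}),
    Queue("cluster-b", "short", free_cpus=10, runtime_envs={"ATLAS-3.2.0"}),
    Queue("cluster-c", "batch", free_cpus=50),  # lacks the runtime environment
    Queue("cluster-d", "batch", free_cpus=0, runtime_envs={"ATLAS-3.2.0"}),  # full
]
rng = random.Random(2002)
counts = Counter(choose_queue(queues, {"ATLAS-3.2.0"}, rng).cluster for _ in range(10_000))
print(counts)
```

A random choice, rather than always taking the emptiest queue, keeps many independent user interfaces from all sending their jobs to the same cluster at the same time. This matters when brokering is done in every client instead of in one central service.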
User Authentication Management
Using Globus certificates
NorduGrid Certification Authority established
Access control through gridmapfiles
User access control is delegated to Virtual Organization
managers
Gridmapfiles are generated automatically from VO database
A GSI-enabled secure LDAP server contains
the Subject Names of the users' certificates
VO managers
User Groups and Group Managers
Local site administrators have total control over their
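Automatic gridmapfile generation can be sketched as rendering VO membership into the Globus grid-mapfile format. In that format, each line holds a quoted certificate subject name followed by a local account name. The sketch assumes a hypothetical input of (subject name, VO group) pairs, as they might be read from the VO LDAP server. It also assumes the site supplies its own group-to-account mapping and a ban list, which reflects the local administrators' final control. The subject names and account names are invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def make_gridmap(vo_members: Iterable[Tuple[str, str]],
                 group_accounts: Dict[str, str],
                 banned: Iterable[str] = ()) -> str:
    """Render grid-mapfile lines for VO members (hypothetical helper).

    vo_members:     (certificate subject name, VO group) pairs.
    group_accounts: local account chosen by the site for each group;
                    groups the site does not list are not mapped at all.
    banned:         subject names the local administrator refuses regardless of the VO.
    """
    refused = set(banned)
    lines: List[str] = []
    for subject, group in sorted(vo_members):
        account = group_accounts.get(group)
        # Site policy wins: unmapped groups and locally banned users get no entry.
        if account is None or subject in refused:
            continue
        if '"' in subject:
            raise ValueError(f"quote in subject name: {subject!r}")
        lines.append(f'"{subject}" {account}')
    return "".join(line + "\n" for line in lines)


members = [
    ("/O=Grid/O=NorduGrid/OU=uio.no/CN=Jane Doe", "atlas"),
    ("/O=Grid/O=NorduGrid/OU=lu.se/CN=John Doe", "atlas"),
    ("/O=Grid/O=NorduGrid/OU=nbi.dk/CN=Eve Example", "guests"),
]
print(make_gridmap(members, {"atlas": "atlasgrid"}), end="")
```

Regenerating the whole file from the VO database on each run means that removing a user from the VO also removes their access at every site that does not explicitly override it.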
Applications
It is possible to run any application with a predefined set of
input and output data
From as simple as "Hello World"
ngsub '&(executable=/bin/echo)(arguments="Hello World")(stdout=out.txt)'
Applications (cont.)
to as complex as the ATLAS Data Challenge
ngsub '&(executable = prod)(arguments = "0002" "2" "100")
(stdout = atlas.0002.log)(join = yes)
(replicacollection =
ldap://grid.uio.no/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org)
(inputfiles =
("atlsim.makefile" "")
("atlas.kumac" "")
("gen0017_1.root" "rc:///gen0017_1.root") )
(outputfiles =
("atlas.0002.zebra" "rc:///results/atlas.0002.zebra")
("atlas.0002.his" "") )
(runtimeenvironment="ATLAS-3.2.0")
(middleware="NorduGrid")'
Conclusions
A minimal environment for Grid computing has been established.
Globus tools alone are not enough for convenient usage, but
provide a solid base.
An additional layer of tools and services was developed to provide
the required infrastructure.
Much remains to be done:
Runtime data handling.
Accounting.
Better support for different Local Resource Management Systems (LRMS).
Enhanced Information System: greater stability, access
control, better and richer information providers, etc.