Tutorial EDG Intro and WP summaries


The EU DataGrid Introduction
The European DataGrid Project Team
http://www.eu-datagrid.org/
[email protected]
Contents

The EDG Project scope

Achievements

EDG structure

Middleware Workpackages: Goals, Achievements, Issues

Testbed Release Plans
The EDG Intro– Tutorial - n° 2
Glossary

RB
Resource Broker

VO
Virtual Organisation

CE
Computing Element

SE
Storage Element

GDMP
GRID Data Mirroring Package

LDAP
Lightweight Directory Access Protocol

LCFG
Local Configuration System

LRMS
Local Resource Management System (batch systems such as PBS, LSF)

WMS
Workload Management System

LFN
Logical File Name (like MyMu.dat)

SFN
Site File Name (like storageEl1.cern.ch:/home/data/MyMu.dat)
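The LFN/SFN distinction in the glossary can be illustrated with a small lookup. This is a minimal sketch, not EDG code: `parse_sfn`, `replicas` and the `catalog` dictionary are hypothetical names, and the second storage host is invented for illustration.

```python
from typing import Dict, List, NamedTuple


class SiteFileName(NamedTuple):
    host: str  # storage host holding the physical copy
    path: str  # absolute path of the file on that host


def parse_sfn(sfn: str) -> SiteFileName:
    """Split a 'host:/path' site file name into host and path."""
    host, sep, path = sfn.partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError(f"not a host:/path site file name: {sfn!r}")
    return SiteFileName(host, path)


# Hypothetical catalog content: one LFN mapped to the SFNs of its replicas.
# The first SFN is the glossary example; the second host is invented.
catalog: Dict[str, List[str]] = {
    "MyMu.dat": [
        "storageEl1.cern.ch:/home/data/MyMu.dat",
        "storageEl2.example.org:/grid/data/MyMu.dat",
    ],
}


def replicas(lfn: str) -> List[SiteFileName]:
    """Return the parsed SFNs registered for a logical file name."""
    return [parse_sfn(sfn) for sfn in catalog.get(lfn, [])]
```

A user would refer to the file only as `MyMu.dat`; the lookup decides which physical copy is meant.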
The Grid vision

Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”

Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of:
•central location,
•central control,
•omniscience,
•existing trust relationships.
Grids: Elements of the Problem

Resource sharing
•Computers, storage, sensors, networks, …
•Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving
•Beyond client-server: distributed data analysis, computation, collaboration, …
Dynamic, multi-institutional virtual orgs
•Community overlays on classic org structures
•Large or small, static or dynamic
EDG overview : goals

DataGrid is a project funded by the European Union whose objective is to build and exploit the next-generation computing infrastructure, providing intensive computation and analysis of shared large-scale databases.
Enable data-intensive sciences by providing worldwide Grid testbeds to large distributed scientific organisations (“Virtual Organisations”, VOs)
Start (kick-off): Jan 1, 2001
End: Dec 31, 2003
Applications/End User Communities: HEP, Earth Observation, Biology

Specific Project Objectives:
•Middleware for fabric & grid management
•Large scale testbed
•Production quality demonstrations
•Collaborate and coordinate with other projects (Globus, Condor, CrossGrid, DataTAG, etc.)
•Contribute to Open Standards and international bodies (GGF, Industry & Research Forum)
DataGrid Main Partners

CERN – International (Switzerland/France)

CNRS - France

ESA/ESRIN – International (Italy)

INFN - Italy

NIKHEF – The Netherlands

PPARC - UK
Assistant Partners
Industrial Partners
•Datamat (Italy)
•IBM-UK (UK)
•CS-SI (France)
Research and Academic Institutes
•CESNET (Czech Republic)
•Commissariat à l'énergie atomique (CEA) – France
•Computer and Automation Research Institute,
Hungarian Academy of Sciences (MTA SZTAKI)
•Consiglio Nazionale delle Ricerche (Italy)
•Helsinki Institute of Physics – Finland
•Institut de Fisica d'Altes Energies (IFAE) - Spain
•Istituto Trentino di Cultura (IRST) – Italy
•Konrad-Zuse-Zentrum für Informationstechnik Berlin - Germany
•Royal Netherlands Meteorological Institute (KNMI)
•Ruprecht-Karls-Universität Heidelberg - Germany
•Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands
•Swedish Research Council - Sweden
Project Schedule

Project started on 1/Jan/2001

TestBed 0 (early 2001)

International test bed 0 infrastructure deployed


Globus 1 only - no EDG middleware
TestBed 1 ( now )

First release of EU DataGrid software to defined users within the project:

HEP experiments (WP 8), Earth Observation (WP 9), Biomedical applications (WP 10)

Successful Project Review by EU: March 1st 2002

TestBed 2 (October 2002)

Builds on TestBed 1 to extend facilities of DataGrid

TestBed 3 (March 2003) & 4 (September 2003)

Project stops on 31/Dec/2003
EDG Highlights

The project is up and running!
•All 21 partners are now contributing at contractual level
•Total of ~60 man-years for the first year
All EU deliverables (40, >2000 pages) submitted in time for the review, according to the contract technical annex
First testbed delivered with real production demos
All deliverables (code & documents) available via www.edg.org
•http://eu-datagrid.web.cern.ch/eu-datagrid/Deliverables/default.htm
•requirements, surveys, architecture, design, procedures, testbed analysis, etc.
DataGrid work packages

The EDG collaboration is structured in 12 Work Packages
•WP1: Work Load Management System
•WP2: Data Management
•WP3: Grid Monitoring / Grid Information Systems
•WP4: Fabric Management
•WP5: Storage Element
•WP6: Testbed and demonstrators – Production quality International Infrastructure
•WP7: Network Monitoring
•WP8: High Energy Physics Applications
•WP9: Earth Observation
•WP10: Biology
•WP11: Dissemination
•WP12: Management
Objectives for the first year of the project

Collect requirements for middleware
•Take into account requirements from application groups
Survey current technology
•For all middleware
Core Services testbed
•Testbed 0: Globus (no EDG middleware)
First Grid testbed release
•Testbed 1: first release of EDG middleware

WP1: workload
•Job resource specification & scheduling
WP2: data management
•Data access, migration & replication
WP3: grid monitoring services
•Monitoring infrastructure, directories & presentation tools
WP4: fabric management
•Framework for fabric configuration management & automatic sw installation
WP5: mass storage management
•Common interface for Mass Storage Sys.
WP7: network services
•Network services and monitoring
DataGrid Architecture

Local Computing
•Local Application
•Local Database
Grid
Grid Application Layer
•Data Management
•Job Management
•Metadata Management
•Object to File Mapping
Collective Services
•Information & Monitoring
•Replica Manager
•Grid Scheduler
Underlying Grid Services
•SQL Database Services
•Computing Element Services
•Storage Element Services
•Replica Catalog
•Authorization, Authentication and Accounting
•Logging & Bookkeeping
Grid Fabric
Fabric services
•Resource Management
•Configuration Management
•Monitoring and Fault Tolerance
•Node Installation & Management
•Fabric Storage Management
EDG Interfaces

The architecture layers (Local Application and Local Database, Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services) are shown with their external interfaces:
•Application Developers
•System Managers
•Scientists
•File Systems
•Certificate Authorities
•User Accounts
•Operating Systems
•Mass Storage Systems (HPSS, Castor)
•Storage Elements
•Batch Systems (PBS, LSF)
•Computing Elements
WP1: Work Load Management

Goals
•Maximize use of resources by efficient scheduling of user jobs
Achievements
•Analysis of work-load management system requirements & survey of existing mature implementations Globus & Condor (D1.1)
•Definition of architecture for scheduling & res. mgmt. (D1.2)
•Development of "super scheduling" component using application data and computing elements requirements
Issues
•Integration with software from other WPs
•Advanced job submission facilities
Current components
•Job Description Language
•Resource Broker
•Job Submission Service
•Information Index
•User Interface
•Logging & Bookkeeping Service
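The idea of scheduling a job by matching its requirements against computing elements can be sketched as follows. This is a minimal illustration under stated assumptions, not the WP1 Resource Broker: the attribute names (`free_cpus`, `max_wall_minutes`, `close_ses`) and the ranking rule (most free CPUs wins) are hypothetical and are not taken from the Job Description Language or any EDG schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ComputingElement:
    name: str
    free_cpus: int
    max_wall_minutes: int
    close_ses: frozenset = frozenset()  # storage elements close to this CE


@dataclass(frozen=True)
class JobRequirements:
    cpus: int
    wall_minutes: int
    input_se: Optional[str] = None  # SE holding the input data, if any


def satisfies(ce: ComputingElement, job: JobRequirements) -> bool:
    """True if the CE meets every requirement of the job."""
    if ce.free_cpus < job.cpus:
        return False
    if ce.max_wall_minutes < job.wall_minutes:
        return False
    # If the job names an input SE, only CEs close to that SE qualify.
    return job.input_se is None or job.input_se in ce.close_ses


def select_ce(ces: Iterable[ComputingElement],
              job: JobRequirements) -> Optional[ComputingElement]:
    """Pick the matching CE with the most free CPUs, or None if none match."""
    candidates = [ce for ce in ces if satisfies(ce, job)]
    return max(candidates, key=lambda ce: ce.free_cpus, default=None)
```

Filtering first and ranking second keeps hard requirements separate from preferences, so the ranking rule can change without affecting which resources are eligible.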
WP2: Data Management

Goals
•Coherently manage and share petabyte-scale information volumes in high-throughput production-quality grid environments
Achievements
•Survey of existing tools and technologies for data access and mass storage systems (D2.1)
•Definition of architecture for data management (D2.2)
•Deployment of Grid Data Mirroring Package (GDMP) in testbed 1
•Close collaboration with Globus, PPDG/GriPhyN & Condor
•Working with GGF on standards
Issues
•Security: clear mechanisms handling authentication and authorization
Current components
•GDMP
•Replica Catalog
•Replica Manager
•Spitfire
WP3: Grid Monitoring Services

Goals
•Provide information system for discovering resources and monitoring status
Achievements
•Survey of current technologies (D3.1)
•Coordination of schemas in testbed 1
•Development of Ftree caching backend based on OpenLDAP (Lightweight Directory Access Protocol) to address shortcomings in MDS v1
•Design of Relational Grid Monitoring Architecture (R-GMA) (D3.2) – to be further developed with GGF
•GRM and PROVE adapted to grid environments to support end-user application monitoring
Issues
•MDS vs. R-GMA
Components
•MDS/Ftree
•R-GMA
•GRM/PROVE
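As its name suggests, R-GMA treats monitoring information relationally. The sketch below shows what publishing and querying resource status in that style could look like, using an in-memory SQLite database as a stand-in. It is an assumption-laden illustration, not the R-GMA API: the `ce_status` table, its columns, and the `publish` and `latest_free_cpus` functions are all hypothetical.

```python
import sqlite3

# In-memory database standing in for a relational monitoring store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ce_status (ce TEXT, free_cpus INTEGER, measured_at INTEGER)"
)


def publish(ce: str, free_cpus: int, measured_at: int) -> None:
    """Producer side: record one status measurement for a computing element."""
    conn.execute(
        "INSERT INTO ce_status VALUES (?, ?, ?)", (ce, free_cpus, measured_at)
    )


def latest_free_cpus(minimum: int):
    """Consumer side: CEs whose most recent measurement has enough free CPUs."""
    return conn.execute(
        """
        SELECT s.ce, s.free_cpus
        FROM ce_status AS s
        WHERE s.measured_at = (
            SELECT MAX(measured_at) FROM ce_status WHERE ce = s.ce
        )
          AND s.free_cpus >= ?
        ORDER BY s.ce
        """,
        (minimum,),
    ).fetchall()
```

Expressing the consumer's question as a query means resource discovery and status monitoring share one mechanism, which is the goal stated for this work package.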
WP4: Fabric Management

Goals
•Manage clusters of (~thousands of) nodes
Achievements
•Survey of existing tools, techniques and protocols (D4.1)
•Defined an agreed architecture for fabric management (D4.2)
•Initial implementations deployed at several sites in testbed 1
Issues
•How to ensure the node configurations are consistent and handle updates to the software suites
Components
•LCFG
•PBS & LSF info providers
•Image installation
•Config. Cache Mgr
WP5: Mass Storage Management

Goals
•Provide common user and data export/import interfaces to existing local mass storage systems
Achievements
•Review of Grid data systems, tape and disk storage systems and local file systems (D5.1)
•Definition of Architecture and Design for DataGrid Storage Element (D5.2)
•Collaboration with Globus on GridFTP/RFIO
•Collaboration with PPDG on control API
•First attempt at exchanging Hierarchical Storage Manager (HSM) tapes
Issues
•Scope and requirements for storage element
•Inter-working with other Grids
Components
•Storage Element info. providers
•RFIO
•MSS staging
WP7: Network Services

Goals
•Review the network service requirements for DataGrid
•Establish and manage the DataGrid network facilities
•Monitor the traffic and performance of the network
•Deal with the distributed security aspects
Achievements
•Analysis of network requirements for testbed 1 & study of available network physical infrastructure (D7.1)
•Use of European backbone GEANT since Dec. 2001
•Initial network monitoring architecture defined (D7.2) and first tools deployed in testbed 1
•Collaboration with Dante & DataTAG
•Working with GGF (Grid High Performance Networks) & Globus (monitoring/MDS)
Issues
•Resources for study of security issues
•End-to-end performance for applications depends on a complex combination of components
Components
•Network monitoring tools: PingER, Udpmon, Iperf
WP6: TestBed Integration

Goals
•Deploy testbeds for the end-to-end application experiments & demos
•Integrate successive releases of the software components
Achievements
•Integration of EDG sw release 1.0 and deployment
•Working implementation of multiple Virtual Organisations (VOs) & basic security infrastructure
•Definition of acceptable usage contracts and creation of Certification Authorities group
Issues
•Procedures for software integration
•Test plan for software release
•Support for production-style usage of the testbed
Components
•Globus packaging & EDG config
•Build tools
•End-user documents
Figure labels: EDG release; WP6 additions to Globus; Globus
Grid aspects covered by EDG testbed 1

VO servers: LDAP directory for mapping users (with certificates) to correct VO
Storage Element: Grid-aware storage area, situated close to a CE
User Interface: Submit & monitor jobs, retrieve output
Replica Manager: Replicates data to one or more CEs
Job Submission Service: Manages submission of jobs to Res. Broker
Replica Catalog: Keeps track of multiple data files “replicated” on different CEs
Information index: Provides info about grid resources via GIIS/GRIS hierarchy
Information & Monitoring: Provides info on resource utilization & performance
Resource Broker: Uses Info Index to discover & select resources based on job requirements
Grid Fabric Mgmt: Configures, installs & maintains grid sw packages and environ.
Logging and Bookkeeping: Collects resource usage & job status
Computing Element: Gatekeeper to a grid computing resource
Network performance, security and monitoring: Provides efficient network transport, security & bandwidth monitoring
Testbed admin.: Certificate auth., user reg., usage policy etc.
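The role of the VO servers, mapping a user's certificate to the correct VO, can be sketched with a plain dictionary standing in for the LDAP directories. This is a hedged illustration only: the VO names, the certificate subjects and the `vo_for_subject` function are invented, and no real LDAP query is performed.

```python
from typing import Dict, Optional, Set

# Stand-in for per-VO LDAP directories: VO name -> member certificate subjects.
# All names and subjects here are invented examples.
VO_DIRECTORY: Dict[str, Set[str]] = {
    "vo-physics": {"/O=ExampleGrid/OU=site-a/CN=Alice Example"},
    "vo-biomed": {"/O=ExampleGrid/OU=site-b/CN=Bob Example"},
}


def vo_for_subject(subject: str,
                   directory: Dict[str, Set[str]] = VO_DIRECTORY) -> Optional[str]:
    """Return the VO that lists this certificate subject, or None if unknown."""
    matches = sorted(vo for vo, members in directory.items() if subject in members)
    if len(matches) > 1:
        # This sketch assumes one VO per user and refuses to guess otherwise.
        raise ValueError(f"subject listed in several VOs: {matches}")
    return matches[0] if matches else None
```

A site would consult such a mapping before accepting a job, so that an unknown certificate subject yields no VO and therefore no access.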
Tasks for the WP6 integration team

•Testing and integration of the Globus package
•Exact definition of RPM lists (components) for the various testbed machine profiles (CE service, RB, UI, SE service, NE, WN); check dependencies
•Perform preliminary, centrally managed (CERN) tests on EDG m/w before giving the green light for deployment across the EDG testbed sites
•Provide and update documentation for installers/site managers, developers and end users
•Define EDG release policies and coordinate the integration team staff with the various Work Package managers, keeping close coordination
•Assign the reported bugs to the corresponding developers/site managers (BugZilla)
•Complete support for the iTeam testing VO
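The dependency check on per-profile package lists can be sketched as a set comparison. This is a minimal illustration, not the WP6 tooling: `missing_dependencies` is a hypothetical function, the package names in the usage note are invented, and real RPM dependency resolution (versions, virtual provides) is not modelled.

```python
from typing import Dict, Iterable, List


def missing_dependencies(profile: Iterable[str],
                         requires: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each package in a machine profile, list required packages
    that are absent from that same profile."""
    selected = set(profile)
    problems: Dict[str, List[str]] = {}
    for package in sorted(selected):
        absent = sorted(set(requires.get(package, [])) - selected)
        if absent:
            problems[package] = absent
    return problems
```

With invented data such as `requires = {"broker": ["globus-core", "ldap-client"]}`, a profile containing only `broker` and `globus-core` would report `ldap-client` as missing for `broker`, while an empty result means the list is self-consistent.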
EDG overview: Middleware release schedule

Planned intermediate release schedule
 Release 1.1:
January 2002
July 2002
1.1.3
 Release 1.2:
March

Release 1.3:

Release 1.4:
May
Internal
2002
August
July 2002

Similar schedule for 2003

Each release includes
•feedback from use of previous release by application groups
•planned improvements/extensions by middleware WPs
•use of WP6 software infrastructure
•feeds into architecture group
Release Plan details

Current release EDG 1.1.4
•Deployed on testbed under RedHat 6.2
Finalising build of EDG 1.2 (now)
•GDMP 3.0
•GSI-enabled RFIO client and server
EDG 1.3 (internal)
•Build using autobuild tools – to ease future porting
EDG 1.4 (August)
•Support RH 6.2 & 7.2
•Basic support for interactive jobs
•Integration of Condor DAGMan
•Use MDS 2.2 with first GLUE schema
EDG 2.0 (Oct)
•Still based on Globus 2.x (pre-OGSA)
•Use updated GLUE schema
•Job partitioning & check-pointing
•Advanced reservation/co-allocation
•Support for MPI on single site
See http://edms.cern.ch/document/333297 for further details
EDG overview : testbed schedule

Planned intermediate testbed schedule
•Testbed 0: March 2001
•Testbed 1: November 2001-January 2002
•Testbed 2: September-October 2002
•Testbed 3: March 2003
•Testbed 4: September-October 2003
Number of EDG testbed sites is steadily increasing: currently 9 sites are visible to the CERN resource broker
Each site normally implements at least:
•A central install & config server (LCFG server)
•WMS (WP1) dedicated machines: UI, CE (gatekeeper & worker node(s))
•MDS Info Providers to the global EDG GIIS/GRIS
•Network Monitoring
Development & Production testbeds

Development
•Initial set of 5 sites will keep a small cluster of PCs for development purposes, to test new versions of the software, configurations, etc.
Production
•More stable environment for use by application groups
•more sites
•more nodes per site (grow to meaningful size at major centres)
•more users per VO
•Usage already foreseen in Data Challenge schedules for LHC experiments
•harmonize release schedules