gLite Architecture


Enabling Grids for E-sciencE

www.eu-egee.org

www.glite.org

EGEE-II INFSO-RI-031688

The middleware

Based on material by Sergio Andreozzi INFN-CNAF OMII-Europe All-Hands Meeting Bologna, 12-13 February 2007

Disclaimer


This presentation is based on materials provided and authorized by the EGEE project and is freely available to download and use according to the terms of the following license: http://creativecommons.org/licenses/by-nc-sa/2.5/

gLite @ OMII-Europe All-Hands meeting, Bologna, 12-13 February 2007

OUTLINE

• The EGEE Project
  – Objective
  – Relationship to other projects
• The gLite middleware
  – Development process
  – Middleware decomposition
    ▪ Foundation
    ▪ High-level services

Part I The EGEE Project

The EGEE project

• EGEE
  – 1 April 2004 – 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II
  – 1 April 2006 – 31 March 2008
  – 91 partners in 32 countries
  – 13 Federations
• EGEE-III
  – 1 April 2008 – 31 March 2010
  – More than 120 partners
• Objectives
  – Large-scale, production-quality infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Improving and maintaining “gLite” Grid middleware
• US partners in EGEE-II:
  – Univ. Chicago
  – Univ. South. California
  – Univ. Wisconsin
  – RENCI

Main lines of the EGEE project

• Infrastructure operation
  – Currently includes sites across 39 countries
  – Continuous monitoring of grid services & automated site configuration/management
• Middleware
  – Production quality middleware distributed under business friendly open source licence
• User Support: managed process from first contact through to production usage
  – Training
  – Expertise in grid-enabling applications
  – Online helpdesk
  – Networking events (User Forum, Conferences etc.)
• Interoperability
  – Expanding geographical reach and interoperability with related infrastructures (KnowARC, TWGRID)

Applications on EGEE

• Applications from an increasing number of domains
  – Astrophysics
  – Computational Chemistry
  – Earth Sciences
  – Financial Simulation
  – Fusion
  – Geophysics
  – High Energy Physics
  – Life Sciences
  – Multimedia
  – Material Sciences
  – …
• Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf

EU projects related to EGEE


Sustainability: Beyond EGEE-II

• Need to prepare for permanent Grid infrastructure
  – Ensure a reliable and adaptive support for all sciences
  – Independent of short project funding cycles
  – Infrastructure managed in collaboration with national grid initiatives

Part II The gLite middleware

Programming the Grid with gLite: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf

The release of gLite 3.0

• Convergence of LCG 2.7.0 and gLite 1.5.0 in spring 2006
  – Continuity on the production infrastructure ensured usability by applications
  – Initial focus on the new Job Management
    ▪ Thorough testing and optimization together with the applications
• Migration to the ETICS build system
  – ETICS project started in January 2006
• Reorganization of the work according to the new process
  – EGEE Technical Coordination Group and Task Forces
  – Start of the EGEE SA3 Activity for integration and certification
  – “Continuous release process”
    ▪ No big-bang releases!

[Figure: timeline 2004–2006 showing LCG-2 and gLite moving from prototyping to product and converging in gLite 3.0]

gLite Software Process

[Figure: gLite software process flowchart. Stages: JRA1 Development, SA3 Integration, SA3 Testing & Certification, SA1 Pre-Production, SA1 Production Infrastructure. Labels on the flow: Directives, Software, Bug Fixing, Deployment Packages, Testbed Deployment, Integration Tests (Pass/Fail), Functional Tests (Pass/Fail), Installation Guide, Release Notes, etc., Pre-Production Deployment, Scalability Tests (Pass/Fail), Release, Problem, Serious problem]

Middleware structure

• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are supposed to help users build their computing infrastructure but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
  – Must be complete and robust
  – Should allow interoperation with other major grid infrastructures
  – Should not assume the use of Higher-Level Grid Services

[Figure: layered stack. Applications; Higher-Level Grid Services (Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...); Foundation Grid Middleware (Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring)]

Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf

gLite Services Decomposition

6 High Level Services + CLI & API

Legend:
  – Available
  – Foreseen in the architecture (only Job provenance will be available by the end of EGEE-II)

Job Workflow in gLite

[Figure: job workflow diagram. Labels: UI, JDL, Input “sandbox”, Output “sandbox”, Author. & Authen., Resource Broker, Information Service, DataSets info, LFC Catalog, Logging & Book-keeping, Job Submission Service, Globus RSL, Job Status, Computing Element, Storage Element]

Job Workflow in gLite

[Figure: the same workflow diagram, with Information Index in place of Information Service and WMProxy added]
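The workflow diagrams start with a job description in JDL that the UI hands over together with the input sandbox. As a rough illustration, the sketch below renders a Python dict as a JDL-style attribute list. The attribute names (`Executable`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are common JDL attributes; the `render_jdl` helper and the script name `analysis.sh` are hypothetical and not part of gLite.

```python
def render_jdl(attrs: dict) -> str:
    """Render a dict as a JDL-style (ClassAd-like) job description."""
    def fmt(value):
        # Lists become {...} sets, strings are double-quoted, numbers stay bare.
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(fmt(v) for v in value) + "}"
        if isinstance(value, str):
            return '"' + value + '"'
        return str(value)

    body = "\n".join(f"  {name} = {fmt(value)};" for name, value in attrs.items())
    return "[\n" + body + "\n]"


job = {
    "Executable": "analysis.sh",          # hypothetical script name
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["analysis.sh"],      # files shipped with the job
    "OutputSandbox": ["std.out", "std.err"],  # files returned to the user
}

print(render_jdl(job))
```

The input sandbox lists what travels with the job, and the output sandbox lists what comes back once the job has finished.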

High Level Services: Workload Management

• Resource brokering, workflow management, I/O data management
  – Web Service interface: WMProxy
  – Task Queue: keeps non-matched jobs
  – Information SuperMarket: optimized cache of information system
  – Match Maker: assigns jobs to resources according to user requirements
  – Job submission & monitoring
    ▪ Condor-G
    ▪ ICE (to CREAM)
  – External interactions:
    ▪ Information System
    ▪ Data Catalogs
    ▪ Logging & Bookkeeping
    ▪ Policy Management system (G-PBox)
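The interplay of Match Maker, Information SuperMarket and Task Queue can be sketched as follows. This is a toy model under stated assumptions, not the WMS implementation: the `Resource` and `Job` classes, their fields and the `schedule` function are all hypothetical. Requirements are modelled as a predicate, the rank as a scoring function, and jobs with no matching resource are parked in the task queue.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resource:
    name: str
    free_cpus: int
    max_wall_minutes: int


@dataclass
class Job:
    job_id: str
    requirements: Callable[[Resource], bool]  # which resources are acceptable
    rank: Callable[[Resource], float]         # higher is better


def match(job: Job, supermarket: list) -> Optional[Resource]:
    """Pick the best-ranked resource that satisfies the job's requirements."""
    candidates = [r for r in supermarket if job.requirements(r)]
    return max(candidates, key=job.rank, default=None)


def schedule(jobs: list, supermarket: list):
    """Assign matched jobs; keep non-matched jobs in a task queue."""
    assigned, task_queue = {}, []
    for job in jobs:
        chosen = match(job, supermarket)
        if chosen is None:
            task_queue.append(job)
        else:
            assigned[job.job_id] = chosen.name
    return assigned, task_queue


# Cached view of the information system (hypothetical values).
supermarket = [Resource("ce-a", 4, 720), Resource("ce-b", 32, 2880)]
jobs = [
    Job("short", lambda r: r.free_cpus > 0, lambda r: r.free_cpus),
    Job("huge", lambda r: r.free_cpus >= 1000, lambda r: r.free_cpus),
]
assigned, task_queue = schedule(jobs, supermarket)
print(assigned, [j.job_id for j in task_queue])
```

Jobs left in the task queue can be retried when the cached resource information changes.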

OSG Consortium Meeting - Seattle - 21-23 August 2006 17


Grid Foundation: Security

• Authentication based on X.509 PKI infrastructure
  – Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport)
    ▪ Commonly used in web browsers to authenticate to sites
  – Trust between CAs and sites is established (offline)
  – In order to reduce vulnerability, on the Grid user identification is done by using (short lived) proxies of their certificates
• Proxies can
  – Be delegated to a service such that it can act on the user’s behalf
  – Include additional attributes (like VO information via the VO Membership Service VOMS)
  – Be stored in an external proxy store (MyProxy)
  – Be renewed (in case they are about to expire)
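The proxy life cycle described above (short lifetime, delegation, extra attributes, renewal before expiry) can be modelled without any cryptography. The sketch below is only a bookkeeping model: the `Credential` class, the 12-hour lifetime, the one-hour renewal margin and the subject strings are assumptions made for illustration, and no real X.509 handling takes place.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Credential:
    subject: str
    not_after: datetime
    attributes: tuple = ()  # e.g. VO group/role strings


def make_proxy(parent, now, lifetime=timedelta(hours=12), attributes=()):
    """Derive a short-lived proxy that never outlives its parent credential."""
    expiry = min(now + lifetime, parent.not_after)
    return Credential(parent.subject + "/CN=proxy", expiry,
                      parent.attributes + tuple(attributes))


def needs_renewal(proxy, now, margin=timedelta(hours=1)):
    """True when the proxy expires within the safety margin."""
    return proxy.not_after - now <= margin


now = datetime(2007, 2, 12, 9, 0)
user = Credential("/O=Grid/CN=Jane Doe", now + timedelta(days=365))  # long lived
proxy = make_proxy(user, now, attributes=("/myvo/Role=production",))

# Delegation: a service derives its own proxy from the user's proxy.
later = now + timedelta(hours=11)
delegated = make_proxy(proxy, later)

print(proxy.not_after, delegated.not_after, needs_renewal(proxy, later))
```

Because a derived proxy is capped at its parent's expiry, a delegated proxy cannot extend the window in which a stolen credential is useful.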

Grid foundation: Information Systems

• Generic Information Provider (GIP)
  – Provides LDIF information about a grid service in accordance with the GLUE Schema
  [Figure: GIP diagram. Labels: Config File, Cache, Provider, Plugin, GIP, LDIF File]
• BDII: Information system in gLite 3.0 (by LCG)
  – LDAP database that is updated by a process
  – More than one DB is used, to separate reads and writes
  – A port forwarder is used internally to select the correct DB
  [Figure: BDII diagram. Port forwarder on port 2170 in front of LDAP databases on ports 2171, 2172 and 2173; steps: Update DB & Modify DB, Swap DBs]
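The BDII idea of updating one database while another serves reads, then swapping, can be sketched as below. This is a simplified in-memory model, not the BDII code: the `SwappingDirectory` class and its methods are hypothetical, and plain dicts stand in for the LDAP servers. The port numbers are the ones shown on the slide.

```python
class SwappingDirectory:
    """Readers always hit one complete database; updates go to an idle one."""

    def __init__(self, ports=(2171, 2172, 2173), public_port=2170):
        self.public_port = public_port               # port clients connect to
        self.ports = list(ports)
        self.databases = {port: {} for port in ports}
        self.active = self.ports[0]                  # backend the forwarder targets

    def query(self, key):
        """Read through the forwarder, i.e. from the active database only."""
        return self.databases[self.active].get(key)

    def refresh(self, fresh_entries):
        """Load fresh data into the next idle DB, then repoint the forwarder."""
        index = (self.ports.index(self.active) + 1) % len(self.ports)
        standby = self.ports[index]
        self.databases[standby] = dict(fresh_entries)  # update / modify DB
        self.active = standby                          # swap DBs
        return standby


directory = SwappingDirectory()
directory.refresh({"ce-a": {"free_cpus": 4}})
print(directory.active, directory.query("ce-a"))
```

Readers never observe a half-written database, because the forwarder is repointed only after the idle copy is complete.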

Grid foundation: Information Systems

• R-GMA: provides a uniform method to access and publish distributed information and monitoring data
  – Used for job and infrastructure monitoring in gLite 3.0
  – Working to add authorization
• Service Discovery:
  – Provides a standard set of methods for locating Grid services
  – Currently supports R-GMA, BDII and XML files as backends
  – Will add local cache of information
  – Used by some DM and WMS components in gLite 3.0
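A single lookup interface over several backends, plus the planned local cache, might be structured as in this sketch. None of the names come from the gLite Service Discovery API: `Backend`, `StaticBackend`, `ServiceDiscovery`, the service type strings and the `example.org` endpoints are all made up for illustration, and the in-memory backends merely stand in for R-GMA, BDII or XML-file lookups.

```python
from typing import Protocol


class Backend(Protocol):
    def find(self, service_type: str) -> list: ...


class StaticBackend:
    """Stand-in for an R-GMA, BDII or XML-file backend (hypothetical)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def find(self, service_type):
        return list(self.records.get(service_type, []))


class ServiceDiscovery:
    def __init__(self, backends):
        self.backends = backends
        self.cache = {}  # local cache of earlier answers

    def locate(self, service_type):
        """Return endpoints from the cache, else from the first backend that knows."""
        if service_type in self.cache:
            return self.cache[service_type]
        for backend in self.backends:
            endpoints = backend.find(service_type)
            if endpoints:
                self.cache[service_type] = endpoints
                return endpoints
        return []


bdii = StaticBackend("bdii", {"file-catalog": ["https://lfc.example.org:8443"]})
xml_file = StaticBackend("xml", {"file-transfer": ["https://fts.example.org:8443"]})
discovery = ServiceDiscovery([bdii, xml_file])

print(discovery.locate("file-transfer"))
```

Callers depend only on `locate`, so backends can be added or reordered without touching client code.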


Grid foundation: Computing Element

• Three flavours available now:
  – LCG-CE (GT2 GRAM)
    ▪ In production now but will be phased out next year
  – gLite-CE (GSI-enabled Condor-C)
    ▪ Already deployed but still needs thorough testing and tuning. Being done now
  – CREAM (WS-I based interface)
    ▪ Deployed on the JRA1 preview test-bed. After a first testing phase will be certified and deployed together with the gLite-CE
    ▪ Our contribution to the OGF-BES group for a standard WS-I based CE interface
    ▪ CREAM and WMProxy demo at SC06!
• BLAH is the interface to the local resource manager (via plug-ins)
  – CREAM and gLite-CE
  – Information pass-through: pass parameters to the LRMS to help job scheduling

[Figure: Computing Element diagram. Labels: WMS, Clients; Computing Element; glexec + LCAS/LCMAPS; CEMon; BLAH; LRMS; WN; Information System (BDII, R-GMA)]
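The plug-in approach attributed to BLAH, one generic submission request translated into the command of whichever local resource manager a site runs, can be illustrated as follows. This is not BLAH's actual plug-in interface: the classes, the `PLUGINS` registry and `build_submission` are hypothetical, and the `qsub` / `bsub` command lines are deliberately simplified. The `extra_args` parameter stands for the information pass-through of scheduling hints.

```python
from abc import ABC, abstractmethod


class LrmsPlugin(ABC):
    """Translates a generic submission into one batch system's command line."""

    @abstractmethod
    def submit_command(self, script, queue, extra_args=()):
        ...


class PbsPlugin(LrmsPlugin):
    def submit_command(self, script, queue, extra_args=()):
        # Simplified PBS/Torque form: qsub -q <queue> [hints] <script>
        return ["qsub", "-q", queue, *extra_args, script]


class LsfPlugin(LrmsPlugin):
    def submit_command(self, script, queue, extra_args=()):
        # Simplified LSF form: bsub -q <queue> [hints] <script>
        return ["bsub", "-q", queue, *extra_args, script]


PLUGINS = {"pbs": PbsPlugin(), "lsf": LsfPlugin()}  # hypothetical registry


def build_submission(lrms, script, queue, extra_args=()):
    """Look up the site's plug-in and pass scheduling hints through unchanged."""
    return PLUGINS[lrms].submit_command(script, queue, extra_args)


print(build_submission("pbs", "job.sh", "short", ["-l", "walltime=01:00:00"]))
```

The layer above never needs to know which batch system a site runs; only the plug-in registry changes.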

Grid foundation: Accounting

• APEL: Uses R-GMA to propagate and display job accounting information for infrastructure monitoring
  – Reads LRMS log files provided by LCG-CE and BLAH
  – Preparing an update for gLite 3.0 to use the files from BLAH
• DGAS: Collects, stores and transfers accounting data. Compliant with privacy requirements
  – Reads LRMS log files provided by LCG-CE and BLAH
  – Stores information in a site database (HLR) and optionally in a central HLR. Access granted to user, site and VO administrators
  – Not yet certified in gLite 3.0. Deployment plan:
    ▪ DGAS is in certification at INFN
    ▪ It will send records to the GOC via DGAS2APEL

Grid foundation: Storage Element

• Storage Element
  – Common interface: SRMv1, migrating to SRM v2.2
  – Various implementations from LCG and other external projects
    ▪ disk-based: DPM, dCache / tape-based: Castor, dCache
  – Support for ACLs in DPM (in future in Castor and dCache)
    ▪ After the summer: synchronization of ACLs between SEs
  – Common rfio library for Castor and DPM being added
• Posix-like file access:
  – Grid File Access Layer (GFAL) by LCG
    ▪ Support for ACL in the SRM layer (currently in DPM only)
    ▪ Support for SRMv2 being added now
  – gLite I/O
    ▪ Support for ACLs from the file catalog and interfaced to Hydra for data encryption
    ▪ Not certified in gLite 3.0. To be retired once all its functionality is also available in GFAL.

High Level Services: Catalogues

• File Catalogs
  – LFC from LCG
    ▪ In June: interface to POOL.
    ▪ In the summer: LFC replication and backup.
  – Hydra: stores keys for data encryption
    ▪ Being interfaced to GFAL (done by July)
    ▪ Currently only one instance, but in future there will be 3 instances: at least 2 need to be available for decryption.
    ▪ Not yet certified in gLite 3.0. Certification will start soon.
  – AMGA Metadata Catalog: generic metadata catalogue
    ▪ Joint JRA1-NA4 (ARDA) development. Used mainly by Biomed
    ▪ Not yet certified in gLite 3.0. Certification will start soon.
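The slide states only that a key will be spread over 3 Hydra instances, of which at least 2 must be available for decryption; it does not say how. One standard way to get such a 2-of-3 property is Shamir's secret sharing, sketched below as an assumption rather than as Hydra's documented scheme. The function names and the choice of prime field are illustrative.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be integers below it


def split_2_of_3(secret: int):
    """Return three shares (x, y) on a random line through (0, secret)."""
    slope = secrets.randbelow(PRIME)
    return [(x, (secret + slope * x) % PRIME) for x in (1, 2, 3)]


def recover(share_a, share_b):
    """Rebuild the secret from any two shares by interpolating at x = 0."""
    (x1, y1), (x2, y2) = share_a, share_b
    inverse = pow((x2 - x1) % PRIME, -1, PRIME)  # modular inverse (Python 3.8+)
    return (y1 * x2 - y2 * x1) * inverse % PRIME


key = 0x1234567890ABCDEF  # stand-in for an encryption key
shares = split_2_of_3(key)  # one share per key-store instance

print(recover(shares[0], shares[2]) == key)
```

A single share reveals nothing about the key, while any two shares determine the line and hence its value at x = 0.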

High Level Services: File transfer

• FTS: Reliable, scalable and customizable file transfer
  – Manages transfers through channels
    ▪ mono-directional network pipes between two sites
  – Web service interface
  – Automatic discovery of services
  – Support for different user and administrative roles
  – Adding support for pre-staging and new proxy renewal schema
  – Support for SRMv2.2, delegation, VOMS-aware proxy renewal in certification

High Level Services: Workload mgmt.

• WMS helps the user access computing resources
  – Resource brokering, management of job input/output, ...
• LCG-RB: GT2 + Condor-G
  – To be replaced when the gLite WMS proves to be reliable
• gLite WMS: Web service (WMProxy) + Condor-G
  – Management of complex workflows (DAGs) and compound jobs
    ▪ bulk submission and shared input sandboxes
    ▪ support for input files on different servers (scattered sandboxes)
  – Support for shallow resubmission of jobs
  – Job File Perusal: file peeking during job execution
  – Supports collection of information from CEMon, BDII, R-GMA and from DLI and StorageIndex data management interfaces
  – Support for parallel jobs (MPI) when the home dir is not shared
  – Deployed for the first time in gLite 3.0

WMS/LB/UI and CE

• New WMS deployed and thoroughly debugged
  – CMS: 100 collections * 200 jobs/collection, 3 UIs, 33 CEs
    ▪ ~2.5 h to submit jobs
      • 0.5 seconds/job
    ▪ ~17 hours to transfer jobs to a CE
      • 3 seconds/job
      • 26K jobs/day
    ▪ Negligible failure rate due to WMS
  – Shallow resubmission
    ▪ failure rate drops to less than 1% with 3 resubmissions
• Stability problems
  – investigating also other deployment scenarios to make it more robust
• gLite CE still to be tested and optimized

[Figure: “Done(Success) jobs after ith Submission”; x-axis: Number of Submission (0 to 6); y-axis: 0 to 100; series: CMS, ATLAS]
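The per-job figures quoted for the CMS test follow from the totals on the slide. The short calculation below redoes that arithmetic from the approximate ("~") totals; the helper names are made up for this sketch.

```python
def per_job_seconds(hours: float, jobs: int) -> float:
    """Average wall-clock seconds spent per job."""
    return hours * 3600 / jobs


def jobs_per_day(seconds_per_job: float) -> float:
    """Sustained throughput implied by a per-job time."""
    return 24 * 3600 / seconds_per_job


JOBS = 100 * 200                        # 100 collections of 200 jobs
submit = per_job_seconds(2.5, JOBS)     # time to submit
transfer = per_job_seconds(17, JOBS)    # time to transfer jobs to a CE
daily = jobs_per_day(transfer)

print(f"{JOBS} jobs: {submit:.2f} s/job to submit, "
      f"{transfer:.2f} s/job to transfer, about {daily:.0f} jobs/day")
```

From the rounded totals this gives 0.45 s/job for submission and about 3.06 s/job for transfer, in line with the quoted 0.5 and 3 seconds/job. The implied daily rate comes out near 28K, the same order as the 26K jobs/day on the slide, which presumably reflects the unrounded measurements.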


High Level Services: Workflows

• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
  [Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD, nodeE]
• A Collection is a group of jobs with no dependencies
  – basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
• Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
  – Submission time reduction
    ▪ Single call to WMProxy server
    ▪ Single Authentication and Authorization process
    ▪ Sharing of files between jobs
  – Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group
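Two of the compound job types can be illustrated with standard Python tools. The sketch below orders a DAG so that every job runs after the jobs it depends on, and expands a parametric template into individual jobs. The edges between nodeA..nodeE are invented (the figure's actual edges are not recoverable from the transcript), and the `_PARAM_` placeholder and `expand_parametric` helper are assumptions made for this sketch.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies: each node maps to the set of nodes it waits for.
dag = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}
order = list(TopologicalSorter(dag).static_order())


def expand_parametric(template: dict, start: int, stop: int, step: int = 1):
    """Create one job per parameter value by substituting the placeholder."""
    return [
        {name: value.replace("_PARAM_", str(p)) if isinstance(value, str) else value
         for name, value in template.items()}
        for p in range(start, stop, step)
    ]


template = {"Executable": "simulate.sh", "Arguments": "--seed _PARAM_",
            "StdOutput": "out__PARAM_.txt"}
jobs = expand_parametric(template, 0, 3)

print(order)
print([job["Arguments"] for job in jobs])
```

A collection would simply be a list of independent job descriptions, with no ordering step at all.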


High Level Services: Job Information

• Logging and Bookkeeping service
  – Tracks jobs during their lifetime (in terms of events)
  – LBProxy for fast access
  – L&B API and CLI to query jobs
  – Support for “CE reputability ranking”: maintains recent statistics of job failures at CEs and feeds back to WMS to aid planning
• Job Provenance: stores long term job information
  – Supports job rerun
  – If deployed will also help unloading the L&B
  – Not yet certified in gLite 3.0.

Highlights: Job Priorities

• Applications ask for the possibility to diversify the access to fast/slow queues depending on the user role/group inside the VO
• GPBOX is a tool that provides the possibility to define, store and propagate fine-grained VO policies
  – based on VOMS groups and roles
  – enforcement of policies at sites: sites may accept/reject policies
  – Not yet certified. Certification will start when requested by the TCG.
• Current activities: test job prioritization without GPBOX:
  – Map VOMS groups to batch system shares
  – Publish info on the share in the CE GLUE 1.2 schema (VOView)
  – WMS match-making depending on submitter VOMS certificate
  – Settings are not dynamic (via e-mail or CE updates)
  – GIP available for Torque/Maui only. Working on the LSF one (mainly a deployment issue)
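Mapping VOMS groups and roles to batch system shares, as in the first item of the current activities, amounts to a static lookup table. The sketch below assumes attribute strings of the form `/vo/group/Role=role`; the VO name `myvo`, the queue names, the share percentages and the `share_for` helper are all invented for illustration.

```python
# Static table, edited by hand at the site (settings are not dynamic).
SHARES = {
    "/myvo/Role=production": ("fast", 70),  # (queue, percent of the VO's share)
    "/myvo/analysis": ("slow", 20),
    "/myvo": ("slow", 10),
}


def share_for(fqans):
    """Return the share of the most specific table entry matching the user.

    Attributes are tried in order; each one is shortened from the right,
    one path component at a time, until a table entry matches.
    """
    for fqan in fqans:
        candidate = fqan
        while candidate:
            if candidate in SHARES:
                return SHARES[candidate]
            candidate = candidate.rpartition("/")[0]
    return None


print(share_for(["/myvo/Role=production"]))
print(share_for(["/myvo/analysis/Role=NULL"]))
```

Publishing the resulting share per group is what would let match-making take the submitter's group and role into account.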

Summary

• gLite 3 is
  – the next generation middleware for grid computing
  – developed according to a well defined process
    ▪ controlled by the EGEE Technical Coordination Group
  – deployed on the EGEE production infrastructure
    ▪ More than 200 sites
  – development is continuing to provide increased robustness, usability, and functionality
    ▪ On the preview testbed
      • CREAM, Job Provenance, glexec on the WNs, GPBOX
  – gLite sources: http://glite.cvs.cern.ch/cgi-bin/glite.cgi/

www.glite.org