gLite Architecture
Enabling Grids for E-sciencE
The gLite middleware
Slides based on material from
Sergio Andreozzi
OMII-Europe All-Hands Meeting
INFN-CNAF
Bologna, 12-13 February 2007
www.eu-egee.org
www.glite.org
EGEE-II INFSO-RI-031688
and from
Pedro Rausch Bello
3rd EELA workshop
Disclaimer
• This presentation is based on materials provided and
authorized by the EGEE project and is freely available
to download and use according to the terms of the
following license:
http://creativecommons.org/licenses/by-nc-sa/2.5/
gLite @ OMII-Europe All-Hands meeting, Bologna, 12-13 February 2007
OUTLINE
• The gLite middleware
– Development process
– Middleware decomposition
Foundation
High-level services
• The EGEE Project
– Objective
– Relationship to other projects
Part I
The gLite middleware
Programming the Grid with gLite
http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
gLite process
• Process controlled by the Technical Coordination Group (TCG)
• Task Forces with developers, applications, testers and deployment experts
• After gLite 3.0 adopt a continuous release process
– No more big-bang releases with fixed deadlines for all
– Develop components as requested by users and sites
– Deploy or upgrade as soon as testing is satisfactory
• Major releases synchronized with large scale activities of VOs (SCs)
– Next major release foreseen for the autumn
Middleware structure
• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are supposed to help users build their computing infrastructure but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services
[Figure: layered stack, top to bottom]
Applications
Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...
Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring
Overview paper http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
gLite Services Decomposition
[Figure: gLite services decomposition. 6 High Level Services + CLI & API]
Legend: Available / Foreseen in the architecture (only Job Provenance will be available by the end of EGEE-II)
Job Workflow in gLite
[Figure: job workflow in gLite. Components: UI, Resource Broker, Information Service, LFC Catalog, Storage Element, Job Submission Service, Computing Element, Logging & Book-keeping, Authorization & Authentication. Flow labels: Input “sandbox”, Output “sandbox”, JDL, Expanded JDL, Globus RSL, DataSets info, Publish, Job Submit Event, Job Query, Job Status]
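The JDL that the UI passes to the Resource Broker in the workflow above is a list of attribute assignments. A minimal Python sketch of producing one, assuming a simple single job: the helper `render_jdl`, the script and the file names are hypothetical, while `Executable`, `StdOutput`, `StdError`, `InputSandbox` and `OutputSandbox` are standard JDL attribute names.

```python
def render_jdl(attrs):
    """Render a dict of job attributes as JDL text (one assignment per line)."""
    def fmt(value):
        # Lists become brace-delimited JDL lists; scalars become quoted strings.
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(fmt(v) for v in value) + "}"
        return '"' + str(value) + '"'
    return "\n".join(f"{name} = {fmt(value)};" for name, value in attrs.items())

# Hypothetical job: run.sh reads input.dat and writes to stdout/stderr.
job = {
    "Executable": "run.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["run.sh", "input.dat"],    # shipped from the UI with the job
    "OutputSandbox": ["std.out", "std.err"],    # retrieved after the job ends
}

jdl_text = render_jdl(job)
print(jdl_text)
```

The input sandbox travels with the job from the UI; the output sandbox is what the user retrieves once the job has finished.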
Grid Foundation: Security
• Authentication based on X.509 PKI infrastructure
– Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport)
Commonly used in web browsers to authenticate to sites
– Trust between CAs and sites is established (offline)
– In order to reduce vulnerability, on the Grid user identification is done by using (short lived) proxies of their certificates
• Proxies can
– Be delegated to a service such that it can act on the user’s behalf
– Include additional attributes (like VO information via the VO Membership Service VOMS)
– Be stored in an external proxy store (MyProxy)
– Be renewed (in case they are about to expire)
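The renewal rule for short-lived proxies can be sketched as a simple expiry check. This is an illustrative Python model only: the `Proxy` class, the `needs_renewal` helper, the one-hour safety margin, the 12-hour lifetime and the example DN are all assumptions, not part of any gLite API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Proxy:
    subject: str          # DN of the user the proxy acts for
    not_after: datetime   # expiry time of the short-lived proxy

def needs_renewal(proxy, now, margin=timedelta(hours=1)):
    """True when the proxy has expired or will expire within `margin`."""
    return proxy.not_after - now <= margin

# Hypothetical proxy created with a 12-hour lifetime.
start = datetime(2007, 2, 12, 9, 0)
proxy = Proxy("/C=IT/O=Example/CN=Jane Doe", start + timedelta(hours=12))

fresh = needs_renewal(proxy, start)
nearly_expired = needs_renewal(proxy, start + timedelta(hours=11, minutes=30))
print(fresh, nearly_expired)
```

A long-running job would trigger a renewal (for example from a proxy store) as soon as the check turns true, rather than waiting for the proxy to lapse.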
AuthN and AuthZ: pre-VOMS
• Authentication
– User receives certificate signed by CA
– Connects to “UI” by ssh
– Downloads certificate
– Single logon to Grid: create proxy, then Grid Security Infrastructure (GSI) identifies user to other machines
• Authorisation
– User joins Virtual Organisation (VO)
– VO negotiates access to Grid nodes and resources
– Authorisation tested by CE
– gridmapfile maps user to local account
•CA: Certificate Authority
•AUP: Acceptable Use Policy
[Figure: numbered steps 1. to 3. Labels: CA, AUP, VO mgr, VO service, VO database, UI, GSI; daily update of grid-mapfiles on Grid services]
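The mapping from a user's DN to a local account can be sketched as a small parser. The sketch assumes the usual grid-mapfile layout of one quoted DN followed by an account name per line; the function `parse_gridmap`, the DNs and the account names are hypothetical.

```python
import shlex

def parse_gridmap(text):
    """Return a dict mapping certificate DN -> local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = shlex.split(line)         # keeps the quoted DN (with spaces) whole
        if len(parts) != 2:
            raise ValueError(f"malformed grid-mapfile line: {line!r}")
        dn, account = parts
        mapping[dn] = account
    return mapping

# Hypothetical entries, in the style a daily VO-database update would write.
sample = '''
# generated from the VO database
"/C=IT/O=Example/CN=Jane Doe" atlas001
"/C=CH/O=Example/CN=John Smith" cms042
'''

accounts = parse_gridmap(sample)
print(accounts["/C=IT/O=Example/CN=Jane Doe"])
```

Because the file only maps a DN to one account, every member of a VO ends up with the same kind of rights, which is the limitation VOMS addresses.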
Evolution of VO management
Before VOMS
• User is authorised as a member of a single VO
• All VO members have same rights
• Gridmapfiles are updated by VO management software: map the user’s DN to a local account
• grid-proxy-init: derives proxy from certificate, the “single sign-on to the grid”
VOMS
• User can be in multiple VOs
– Aggregate rights
• VO can have groups
– Different rights for each
Different groups of experimentalists
…
– Nested groups
• VO has roles
– Assigned to specific purposes
E.g. system admin
When the user assumes this role
• Proxy certificate carries the additional attributes
• voms-proxy-init
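Groups, nested groups and roles are conventionally written as a single attribute string of the form `/vo/group/subgroup/Role=name` (a VOMS FQAN). The slide does not show this syntax, so treat it as an assumption; the helper `parse_fqan` and the VO, group and role names are hypothetical.

```python
def parse_fqan(fqan):
    """Split an FQAN-style string into VO, nested group path and optional role."""
    role = None
    groups = []
    for part in (p for p in fqan.split("/") if p):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue                      # capabilities are ignored in this sketch
        else:
            groups.append(part)
    if not groups:
        raise ValueError("attribute string has no VO component")
    return {"vo": groups[0], "groups": groups[1:], "role": role}

attrs = parse_fqan("/examplevo/detector/calibration/Role=admin")
print(attrs)
```

A site can then authorise on the group path or the role instead of only on the user's DN.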
Grid foundation: Information Systems
• Generic Information Provider (GIP)
– Provides LDIF information about a grid service in accordance with the GLUE Schema
[Figure: GIP. Labels: Provider, Plugin, Cache, Config File, LDIF File]
• BDII: Information system in gLite 3.0 (by LCG)
– LDAP database that is updated by a process
– More than one DB is used to separate read and write
– A port forwarder is used internally to select the correct DB
[Figure: BDII. Port forwarder on port 2170 in front of LDAP databases on ports 2171, 2172 and 2173; steps: Update DB & Modify DB, Swap DBs]
•LDIF: LDAP Data Interchange Format
•LDAP: Lightweight Directory Access Protocol
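The read/write separation behind the port forwarder can be sketched as a toy model. Only the port numbers come from the slide; the class `SwappingDirectory`, the round-robin rotation over the backends and the sample record are assumptions made for illustration.

```python
class SwappingDirectory:
    """Toy model of the BDII layout: one public port, several backend DBs."""

    def __init__(self, public_port=2170, backend_ports=(2171, 2172, 2173)):
        self.public_port = public_port
        self.ports = list(backend_ports)
        self.databases = {port: {} for port in self.ports}
        self.read_index = 0               # backend currently answering queries

    @property
    def read_port(self):
        return self.ports[self.read_index]

    def query(self, key):
        # The port forwarder sends every read to the active backend.
        return self.databases[self.read_port].get(key)

    def update(self, fresh_data):
        # Write into the next backend while reads continue on the current one...
        write_index = (self.read_index + 1) % len(self.ports)
        self.databases[self.ports[write_index]] = dict(fresh_data)
        # ...then swap, so readers never see a half-written database.
        self.read_index = write_index

bdii = SwappingDirectory()
bdii.update({"ce01.example.org": "free slots: 10"})
print(bdii.read_port, bdii.query("ce01.example.org"))
```

Clients always connect to the public port, so the swap is invisible to them.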
Grid foundation: Information Systems
• R-GMA: provides a uniform method to access and publish distributed information and monitoring data
– Used for job and infrastructure monitoring in gLite 3.0
– Working to add authorization
• Service Discovery:
– Provides a standard set of methods for locating Grid services
– Currently supports R-GMA, BDII and XML files as backends
– Will add local cache of information
– Used by some DM and WMS components in gLite 3.0
Grid foundation: Computing Element
• Three flavours available now:
– LCG-CE (GT2 GRAM)
In production now but will be phased-out next year
– gLite-CE (GSI-enabled Condor-C)
Already deployed but still needs thorough testing and tuning. Being done now
– CREAM (WS-I based interface)
Deployed on the JRA1 preview test-bed. After a first testing phase will be certified and deployed together with the gLite-CE
Our contribution to the OGF-BES group for a standard WS-I based CE interface
CREAM and WMProxy demo at SC06!
• BLAH is the interface to the local resource manager (via plug-ins)
– CREAM and gLite-CE
– Information pass-through: pass parameters to the LRMS to help job scheduling
•JRA: Joint Research Activity
•WS-I: Web Services Interoperability
•CREAM: Computing Resource Execution and Management
•BES: Basic Execution Service
[Figure: Computing Element at a site. Labels: Grid, WMS, Clients, Site, Computing Element, CEMon, glexec + LCAS/LCMAPS, BLAH, LRMS, WN, Information System, bdII, R-GMA]
Grid foundation: Accounting
• APEL (Accounting Processor for Event Logs): Uses R-GMA (Relational Grid Monitoring Architecture) to propagate and display job accounting information for infrastructure monitoring
– Reads LRMS log files provided by LCG-CE and BLAH
– Preparing an update for gLite 3.0 to use the files from BLAH
• DGAS (Distributed Grid Accounting System): Collects, stores and transfers accounting data. Compliant with privacy requirements
– Reads LRMS log files provided by LCG-CE and BLAH
– Stores information in a site database (HLR, Home Location Register) and optionally in a central HLR. Access granted to user, site and VO administrators
– Not yet certified in gLite 3.0. Deployment plan:
DGAS is in certification at INFN
It will send records to the GOC via DGAS2APEL
Grid foundation: Storage Element
• Storage Element
– Common interface: SRM v1, migrating to SRM v2.2
– Various implementations from LCG and other external projects
disk-based: DPM, dCache / tape-based: Castor, dCache
– Support for ACLs in DPM (in future in Castor and dCache)
After the summer: synchronization of ACLs between SEs
– Common rfio library for Castor and DPM being added
• POSIX-like file access:
– Grid File Access Layer (GFAL) by LCG
Support for ACLs in the SRM layer (currently in DPM only)
Support for SRM v2 being added now
– gLite I/O
Support for ACLs from the file catalog and interfaced to Hydra for data encryption
Not certified in gLite 3.0. To be retired once all its functionality is also available in GFAL.
High Level Services: Catalogues
• File Catalogs
– LFC from LCG
In June: interface to POOL.
In the summer: LFC replication and backup.
– Hydra: stores keys for data encryption
Being interfaced to GFAL (done by July)
Currently only one instance, but in future there will be 3 instances:
at least 2 need to be available for decryption.
Not yet certified in gLite 3.0. Certification will start soon.
– AMGA Metadata Catalog: generic metadata catalogue
Joint JRA1-NA4 (ARDA) development. Used mainly by Biomed
Not yet certified in gLite 3.0. Certification will start soon.
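The Hydra plan of three instances with at least two needed for decryption is a threshold arrangement. One standard way to get that property is Shamir secret sharing; the slide does not say which scheme Hydra uses, so the Python sketch below is an assumption-laden illustration, and `split_key`, `recover_key` and the example key are hypothetical.

```python
import secrets

PRIME = 2**521 - 1   # a Mersenne prime, comfortably larger than a 256-bit key

def split_key(secret, shares=3):
    """Split `secret` so that any 2 of the `shares` pieces recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    slope = secrets.randbelow(PRIME)      # random line through (0, secret)
    return [(x, (secret + slope * x) % PRIME) for x in range(1, shares + 1)]

def recover_key(share_a, share_b):
    """Recover the secret from two distinct shares (Lagrange interpolation at 0)."""
    (x1, y1), (x2, y2) = share_a, share_b
    inverse = pow((x2 - x1) % PRIME, -1, PRIME)   # modular inverse, Python 3.8+
    return (y1 * x2 - y2 * x1) * inverse % PRIME

key = 0x0123456789ABCDEF0123456789ABCDEF   # hypothetical encryption key
pieces = split_key(key)                    # one piece per key-store instance
recovered = recover_key(pieces[0], pieces[2])
print(recovered == key)
```

A single piece on its own reveals nothing about the key, while the loss of any one instance still leaves two pieces, which is enough for decryption.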
High Level Services: File transfer
• FTS: Reliable, scalable and customizable file transfer
– Manages transfers through channels
mono-directional network pipes between two sites
– Web service interface
– Automatic discovery of services
– Support for different user and administrative roles
– Adding support for pre-staging and new proxy renewal schema
– Support for SRMv2.2, delegation, VOMS-aware proxy renewal in certification
High Level Services: Workload mgmt.
• WMS helps the user access computing resources
– Resource brokering, management of job input/output, ...
• LCG-RB: GT2 + Condor-G
– To be replaced when the gLite WMS proves to be reliable
• gLite WMS: Web service (WMProxy) + Condor-G
– Management of complex workflows (DAGs) and compound jobs
bulk submission and shared input sandboxes
support for input files on different servers (scattered sandboxes)
– Support for shallow resubmission of jobs
– Job File Perusal: file peeking during job execution
– Supports collection of information from CEMon, BDII, R-GMA
and from DLI and StorageIndex data management interfaces
– Support for parallel jobs (MPI) when the home dir is not shared
– Deployed for the first time in gLite 3.0
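Resource brokering amounts to filtering the known resources by the job's requirements and ordering the survivors by a rank. A minimal Python sketch of that idea, not the WMS matchmaking code: the function `match`, the resource names and the attributes `free_cpus` and `max_wall_minutes` are hypothetical.

```python
def match(job, resources):
    """Return the names of resources that satisfy the job, best rank first."""
    candidates = [r for r in resources if job["requirements"](r)]
    candidates.sort(key=job["rank"], reverse=True)
    return [r["name"] for r in candidates]

# Hypothetical snapshot of what the information system publishes.
resources = [
    {"name": "ce-a.example.org", "free_cpus": 4, "max_wall_minutes": 720},
    {"name": "ce-b.example.org", "free_cpus": 40, "max_wall_minutes": 120},
    {"name": "ce-c.example.org", "free_cpus": 12, "max_wall_minutes": 1440},
]

# Hypothetical job: needs at least 10 hours of wall time, prefers free CPUs.
job = {
    "requirements": lambda r: r["max_wall_minutes"] >= 600,
    "rank": lambda r: r["free_cpus"],
}

ranked = match(job, resources)
print(ranked)
```

Here the resource with the most free CPUs is excluded because its queue is too short, which is the kind of decision the broker takes on the user's behalf.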
WMS/LB/UI and CE
• New WMS deployed and thoroughly debugged
– CMS: 100 collections * 200 jobs/collection, 3 UIs, 33 CEs
~ 2.5 h to submit jobs
• 0.5 seconds/job
~ 17 hours to transfer jobs to a CE
• 3 seconds/job
• 26K jobs/day
Negligible failure rate due to WMS
– Shallow resubmission
failure rate drops to less than 1% with 3 resubmissions
• Stability problems
– investigating also other deployment scenarios to make it more robust
• gLite CE still to be tested and optimized
[Figure: Done(Success) jobs (%) after i-th submission, plotted against number of submissions (0 to 6), series CMS and ATLAS]
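The per-job figures quoted for the CMS test follow from the totals, as this short Python check shows. The second half is a hedged illustration of why resubmission helps: it assumes independent attempts with a hypothetical 30% per-attempt failure rate, which is not a number from the slide.

```python
# Totals quoted for the CMS test.
jobs = 100 * 200                     # collections * jobs per collection
submit_seconds = 2.5 * 3600          # ~2.5 h to submit
transfer_seconds = 17 * 3600         # ~17 h to transfer jobs to a CE

seconds_per_submit = submit_seconds / jobs
seconds_per_transfer = transfer_seconds / jobs
print(jobs, round(seconds_per_submit, 2), round(seconds_per_transfer, 2))

def residual_failure(per_attempt_failure, resubmissions):
    """Failure probability after the first try plus `resubmissions` retries,
    assuming every attempt fails independently (an assumption)."""
    return per_attempt_failure ** (resubmissions + 1)

# Hypothetical per-attempt failure rate of 30%.
after_three = residual_failure(0.30, 3)
print(after_three < 0.01)
```

So 20,000 jobs give roughly 0.45 s/job for submission and about 3.06 s/job for transfer, matching the rounded 0.5 and 3 seconds on the slide.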
High Level Services: Workflows
• Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD, nodeE]
• A Collection is a group of jobs with no dependencies
– basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
• Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
– Submission time reduction
Single call to WMProxy server
Single Authentication and Authorization process
Sharing of files between jobs
– Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group
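The dependency rule of a DAG job can be sketched with Python's standard `graphlib` module (Python 3.9+). The node names are those of the example figure, but the edges between them did not survive extraction, so the ones below are hypothetical, as is the helper `submission_waves`.

```python
from graphlib import TopologicalSorter

# Each key lists the nodes it depends on (hypothetical edges).
dependencies = {
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}

def submission_waves(deps):
    """Group jobs into waves; a job runs only after all jobs it depends on."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                     # also raises CycleError if not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)              # mark the wave finished, unlocking the next
    return waves

waves = submission_waves(dependencies)
print(waves)
```

A Collection is the degenerate case with no edges at all: every job lands in the first wave.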
High Level Services: Job Information
• Logging and Bookkeeping service
– Tracks jobs during their lifetime (in terms of events)
– LBProxy for fast access
– L&B API and CLI to query jobs
– Support for “CE reputability ranking”: maintains recent statistics of job failures at CEs and feeds back to WMS to aid planning
• Job Provenance: stores long term job information
– Supports job rerun
– If deployed will also help unloading the L&B
– Not yet certified in gLite 3.0.
Highlights: Job Priorities
• Applications ask for the possibility to diversify the access to
fast/slow queues depending on the user role/group inside the VO
• GPBOX is a tool that provides the possibility to define, store and
propagate fine-grained VO policies
– based on VOMS groups and roles
– enforcement of policies at sites: sites may accept/reject policies
– Not yet certified. Certification will start when requested by the TCG.
• Current activities: test job prioritization without GPBOX:
- Map VOMS groups to batch system shares
- Publish info on the share in the CE GLUE 1.2 schema (VOView)
- WMS match-making depending on submitter VOMS certificate
- Settings are not dynamic (via e-mail or CE updates)
- GIP available for Torque/Maui only. Working on the LSF one
- mainly a deployment issue
Summary
• gLite 3 is
– the next generation middleware for grid computing
– developed according to a well defined process
controlled by the EGEE Technical Coordination Group
– deployed on the EGEE production infrastructure
More than 200 sites
– development is continuing to provide increased robustness,
usability, and functionality
On the preview testbed
• CREAM, Job Provenance, glexec on the WNs, GPBOX
Part II
The EGEE Project
The EGEE project
• EGEE
– 1 April 2004 – 31 March 2006
– 71 partners in 27 countries, federated in regional Grids
• EGEE-II
– 1 April 2006 – 31 March 2008
– 91 partners in 32 countries
– 13 Federations
• Objectives
– Large-scale, production-quality infrastructure for e-Science
– Attracting new resources and users from industry as well as science
– Improving and maintaining “gLite” Grid middleware
US partners in EGEE-II:
• Univ. Chicago
• Univ. South. California
• Univ. Wisconsin
• RENCI
Main lines of the EGEE project
• Infrastructure operation
– Currently includes sites across 39 countries
– Continuous monitoring of grid services & automated site configuration/management
• Middleware
– Production quality middleware distributed under business friendly open source licence
• User Support - Managed process from first contact through to production usage
– Training
– Expertise in grid-enabling applications
– Online helpdesk
– Networking events (User Forum, Conferences etc.)
• Interoperability
– Expanding geographical reach and interoperability with related infrastructures
[Figure: logos, including KnowARC and TWGRID]
Applications on EGEE
• Applications from an increasing number of domains
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …
Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf
EU projects related to EGEE
[Figure: EU Grid projects related to EGEE]
Sustainability: Beyond EGEE-II
• Need to prepare for permanent Grid infrastructure
– Ensure a reliable and adaptive support for all sciences
– Independent of short project funding cycles
– Infrastructure managed in collaboration with national grid initiatives