gLite Architecture


Enabling Grids for E-sciencE
The gLite middleware
Slides based on material from:
• Sergio Andreozzi, OMII-Europe All-Hands Meeting, INFN-CNAF, Bologna, 12-13 February 2007
• Pedro Rausch Bello, 3rd EELA workshop
www.eu-egee.org
www.glite.org
EGEE-II INFSO-RI-031688
Disclaimer
• This presentation is based on materials provided and authorized by the EGEE project and is freely available to download and use according to the terms of the following license: http://creativecommons.org/licenses/by-nc-sa/2.5/
gLite @ OMII-Europe All-Hands meeting, Bologna, 12-13 February 2007
OUTLINE
• The gLite middleware
  – Development process
  – Middleware decomposition
    ▪ Foundation
    ▪ High-level services
• The EGEE Project
  – Objective
  – Relationship to other projects
Part I: The gLite middleware
Programming the Grid with gLite: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
gLite process
• Process controlled by the Technical Coordination Group (TCG)
• Task Forces with developers, applications, testers and deployment experts
• After gLite 3.0, a continuous release process is adopted
  – No more big-bang releases with fixed deadlines for all
  – Develop components as requested by users and sites
  – Deploy or upgrade as soon as testing is satisfactory
• Major releases synchronized with large-scale activities of VOs (SCs)
  – Next major release foreseen for the autumn
Middleware structure
[Figure: layered middleware structure, top to bottom: Applications; Higher-Level Grid Services (Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...); Foundation Grid Middleware (Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring).]
• Applications have access both to Higher-Level Grid Services and to the Foundation Grid Middleware
• Higher-Level Grid Services are meant to help users build their computing infrastructure, but should not be mandatory
• The Foundation Grid Middleware will be deployed on the EGEE infrastructure
  – Must be complete and robust
  – Should allow interoperation with other major grid infrastructures
  – Should not assume the use of Higher-Level Grid Services
Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
gLite Services Decomposition
[Figure: gLite services decomposition into 6 High Level Services + CLI & API. Legend: services that are available vs. services foreseen in the architecture (of the latter, only Job Provenance will be available by the end of EGEE-II).]
Job Workflow in gLite
[Figure: job workflow in gLite. Components: UI, LFC Catalog, Information Service, Storage Element, Logging & Book-keeping, Resource Broker, Job Submission Service, Computing Element, Authorization & Authentication. Items exchanged between them: JDL, Expanded JDL, Globus RSL, input "sandbox", output "sandbox", DataSets info, Publish, Job Submit Event, Job Query, Job Status.]
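The workflow starts from a JDL job description written at the UI. As a minimal sketch, assuming the ClassAd-style syntax of JDL (attribute = value; strings quoted, lists in braces), the hypothetical helper `to_jdl` below renders a Python dict as JDL text. Executable, StdOutput, StdError and OutputSandbox are standard JDL attribute names; the helper itself, its escaping rules and the example values are illustrative only and not part of any gLite API.

```python
def to_jdl(attrs: dict) -> str:
    """Render a flat dict as ClassAd-style JDL text (hypothetical helper).

    Strings are quoted, booleans become true/false, numbers are written
    bare and lists become {...}. The escaping of quotes and backslashes
    is an assumption about ClassAd string syntax.
    """
    def value(v):
        if isinstance(v, bool):  # check bool before int: bool is an int subclass
            return "true" if v else "false"
        if isinstance(v, (int, float)):
            return str(v)
        if isinstance(v, str):
            return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'
        if isinstance(v, (list, tuple)):
            return "{" + ", ".join(value(x) for x in v) + "}"
        raise TypeError(f"unsupported JDL value: {v!r}")

    body = [f"  {name} = {value(v)};" for name, v in attrs.items()]
    return "\n".join(["["] + body + ["]"])


if __name__ == "__main__":
    print(to_jdl({
        "Executable": "/bin/hostname",
        "StdOutput": "std.out",
        "StdError": "std.err",
        "OutputSandbox": ["std.out", "std.err"],
    }))
```

In practice users usually write JDL files by hand; generating them programmatically mainly helps when many similar descriptions are needed.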
Grid Foundation: Security
• Authentication based on X.509 PKI
  – Certificate Authorities (CA) issue (long-lived) certificates identifying individuals (much like a passport)
    ▪ Commonly used in web browsers to authenticate to sites
  – Trust between CAs and sites is established offline
  – To reduce vulnerability, users are identified on the Grid by (short-lived) proxies of their certificates
• Proxies can
  – Be delegated to a service so that it can act on the user's behalf
  – Include additional attributes (like VO information via the VO Membership Service, VOMS)
  – Be stored in an external proxy store (MyProxy)
  – Be renewed (in case they are about to expire)
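The renewal rule in the last bullet can be sketched as follows. The `Proxy` class, the 30-minute safety margin and the 12-hour lifetime are assumptions for illustration: real proxies are X.509 credentials created by tools such as grid-proxy-init or voms-proxy-init and renewed through MyProxy, not Python objects.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Proxy:
    """Hypothetical stand-in for a short-lived proxy credential."""
    subject: str                  # DN of the user the proxy acts for
    not_after: datetime           # end of the proxy's validity
    vo_attributes: tuple = ()     # e.g. VOMS FQANs carried by the proxy


def needs_renewal(proxy: Proxy, now: datetime,
                  margin: timedelta = timedelta(minutes=30)) -> bool:
    """True if the proxy has expired or will expire within `margin`."""
    return proxy.not_after - now <= margin


def renew(proxy: Proxy, now: datetime,
          lifetime: timedelta = timedelta(hours=12)) -> Proxy:
    """Return a fresh proxy with the same identity and attributes.

    A real service would obtain the new credential from a store such as
    MyProxy; here we only compute the new expiry time.
    """
    return replace(proxy, not_after=now + lifetime)


if __name__ == "__main__":
    now = datetime(2007, 2, 12, 9, 0)
    p = Proxy("/C=IT/O=INFN/CN=Jane Doe", now + timedelta(minutes=10), ("/atlas",))
    if needs_renewal(p, now):
        p = renew(p, now)
    print(p.not_after)
```

Keeping the proxy short-lived and renewing it, rather than issuing one long-lived credential, limits the damage if a delegated copy is stolen.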
AuthN and AuthZ: pre-VOMS
• Authentication
  – User receives certificate signed by CA
  – Connects to "UI" by ssh
  – Downloads certificate
  – Single logon to Grid: create proxy; the Grid Security Infrastructure (GSI) then identifies the user to other machines
• Authorisation
  – User joins Virtual Organisation
  – VO negotiates access to Grid nodes and resources
  – Authorisation tested by CE
  – gridmapfile maps user to local account
[Figure: steps 1 to 3 involving the CA, the AUP, the VO manager (VO mgr), the UI, the VO service and GSI; the VO database drives a daily update of the grid-mapfiles on Grid services.]
• CA: Certificate Authority
• AUP: Acceptable Use Policy
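The grid-mapfile step amounts to a lookup from certificate DN to local account. A minimal parser, assuming the common one-entry-per-line format (a quoted DN followed by one or more comma-separated account names, with `#` comments), might look like this. `parse_grid_mapfile` and `local_account` are hypothetical names, and sites may also use pool-account conventions (for example a leading `.`) that this sketch returns unresolved.

```python
import shlex


def parse_grid_mapfile(text: str) -> dict:
    """Map each DN to its list of local accounts (hypothetical helper).

    Assumed line format: "<DN in double quotes>" account[,account...]
    Blank lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)  # honours the double quotes around the DN
        if len(tokens) < 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        dn = tokens[0]
        accounts = [a for a in ",".join(tokens[1:]).split(",") if a]
        mapping[dn] = accounts
    return mapping


def local_account(mapping: dict, dn: str):
    """Return the first local account for a DN, or None if the DN is unknown.

    Entries starting with '.' (pool-account prefixes) are returned as-is.
    """
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None


if __name__ == "__main__":
    sample = '"/C=CH/O=CERN/CN=John Doe" atlas001\n'
    print(local_account(parse_grid_mapfile(sample), "/C=CH/O=CERN/CN=John Doe"))
```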
Evolution of VO management
Before VOMS
• User is authorised as a member of a single VO
• All VO members have the same rights
• Gridmapfiles are updated by VO management software: they map the user's DN to a local account
• grid-proxy-init derives a proxy from the certificate: the "single sign-on to the grid"
VOMS
• User can be in multiple VOs
  – Aggregate rights
• VO can have groups
  – Different rights for each
    ▪ Different groups of experimentalists
    ▪ …
  – Nested groups
• VO has roles
  – Assigned to specific purposes
    ▪ E.g. system admin
    ▪ Rights apply when the user assumes this role
• Proxy certificate carries the additional attributes
• Proxy created with voms-proxy-init
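VOMS groups and roles travel in the proxy as FQANs of the form /vo[/group...][/Role=role][/Capability=cap]. The sketch below parses such strings and checks nested-group membership, one way to picture how nested groups and roles can be evaluated. `Fqan`, `parse_fqan` and `in_group` are hypothetical names, not the VOMS API, and the Capability field is simply ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fqan:
    vo: str
    groups: Tuple[str, ...]  # nested groups below the VO, outermost first
    role: Optional[str]      # None when no role (or Role=NULL) is given


def parse_fqan(fqan: str) -> Fqan:
    """Split an FQAN such as /atlas/higgs/Role=production/Capability=NULL."""
    path, role = [], None
    for part in (p for p in fqan.split("/") if p):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # ignored in this sketch
        else:
            path.append(part)
    if not path:
        raise ValueError(f"FQAN without a VO: {fqan!r}")
    return Fqan(vo=path[0], groups=tuple(path[1:]), role=role)


def in_group(fqan: Fqan, group_path: str) -> bool:
    """Nested groups: membership of /atlas/higgs implies membership of /atlas."""
    wanted = tuple(p for p in group_path.split("/") if p)
    full = (fqan.vo,) + fqan.groups
    return full[:len(wanted)] == wanted


if __name__ == "__main__":
    f = parse_fqan("/atlas/higgs/Role=production/Capability=NULL")
    print(f, in_group(f, "/atlas"))
```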
Grid foundation: Information Systems
• Generic Information Provider (GIP)
  – Provides LDIF information about a grid service in accordance with the GLUE Schema
  [Figure: GIP internals: Provider, Plugin, Cache, LDIF File, Config File.]
• BDII: information system in gLite 3.0 (by LCG)
  – LDAP database that is updated by a process
  – More than one DB is used to separate reads and writes
  – A port forwarder is used internally to select the correct DB
  [Figure: port forwarder on port 2170 in front of LDAP DBs on ports 2171, 2172 and 2173; the update process updates and modifies one DB, then the DBs are swapped.]
• LDIF: LDAP Data Interchange Format
• LDAP: Lightweight Directory Access Protocol
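The read/write separation can be pictured as a small state machine: queries always go to the active database behind the forwarder, while updates rebuild a standby database that then becomes active. This is a sketch under assumptions, not BDII code: in-memory dicts stand in for the LDAP servers, the round-robin choice of standby is our own, and the port numbers are taken from the slide's figure.

```python
class BdiiFrontend:
    """Hypothetical model of BDII read/write separation (not BDII code).

    Dicts stand in for the LDAP databases; ports follow the slide's figure.
    """

    def __init__(self, backend_ports=(2171, 2172, 2173), forwarder_port=2170):
        self.forwarder_port = forwarder_port
        self.ports = list(backend_ports)
        self.dbs = {port: {} for port in self.ports}  # port -> {dn: attributes}
        self.active = self.ports[0]                   # DB the forwarder points to

    def query(self, dn: str):
        """Read path: every query is forwarded to the active DB."""
        return self.dbs[self.active].get(dn)

    def _standby(self) -> int:
        # Next DB after the active one, in round-robin order (assumption)
        i = self.ports.index(self.active)
        return self.ports[(i + 1) % len(self.ports)]

    def update(self, entries: dict) -> int:
        """Write path: rebuild a standby DB, then swap it in."""
        target = self._standby()
        self.dbs[target] = dict(entries)  # "Update DB & Modify DB"
        self.active = target              # "Swap DBs": readers never see a half-built DB
        return target


if __name__ == "__main__":
    bdii = BdiiFrontend()
    bdii.update({"GlueCEUniqueID=ce1": {"GlueCEStateFreeCPUs": "4"}})  # example entry
    print(bdii.active, bdii.query("GlueCEUniqueID=ce1"))
```

The point of the swap is that a slow rebuild never blocks or corrupts reads: clients keep seeing the previous complete snapshot until the new one is ready.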
Grid foundation: Information Systems
• R-GMA: provides a uniform method to access and publish distributed information and monitoring data
  – Used for job and infrastructure monitoring in gLite 3.0
  – Working to add authorization
• Service Discovery:
  – Provides a standard set of methods for locating Grid services
  – Currently supports R-GMA, BDII and XML files as backends
  – Will add local cache of information
  – Used by some DM and WMS components in gLite 3.0
Grid foundation: Computing Element
• Three flavours available now:
  – LCG-CE (GT2 GRAM)
    ▪ In production now but will be phased out next year
  – gLite-CE (GSI-enabled Condor-C)
    ▪ Already deployed but still needs thorough testing and tuning; this is being done now
  – CREAM (WS-I based interface)
    ▪ Deployed on the JRA1 preview test-bed. After a first testing phase it will be certified and deployed together with the gLite-CE
    ▪ Our contribution to the OGF-BES group for a standard WS-I based CE interface
    ▪ CREAM and WMProxy demo at SC06!
• BLAH is the interface to the local resource manager (via plug-ins)
  – Used by CREAM and gLite-CE
  – Information pass-through: passes parameters to the LRMS to help job scheduling
[Figure: site layout with Information System, Computing Element, LRMS and WN; components shown include bdII, R-GMA, CEMon, glexec + LCAS/LCMAPS and BLAH; WMS and clients on the Grid side.]
• JRA: Joint Research Activity
• WS-I: Web Services Interoperability
• CREAM: Computing Resource Execution and Management
• BES: Basic Execution Service
Grid foundation: Accounting
• APEL (Accounting Processor for Event Logs): uses R-GMA (Relational Grid Monitoring Architecture) to propagate and display job accounting information for infrastructure monitoring
  – Reads LRMS log files provided by LCG-CE and BLAH
  – Preparing an update for gLite 3.0 to use the files from BLAH
• DGAS (Distributed Grid Accounting System): collects, stores and transfers accounting data; compliant with privacy requirements
  – Reads LRMS log files provided by LCG-CE and BLAH
  – Stores information in a site database (HLR, Home Location Register) and optionally in a central HLR. Access granted to user, site and VO administrators
  – Not yet certified in gLite 3.0. Deployment plan:
    ▪ DGAS is in certification at INFN
    ▪ It will send records to the GOC via DGAS2APEL
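To make the DGAS access rule concrete (access granted to user, site and VO administrators), here is a hypothetical sketch of usage records filtered by requester and summed per VO. `UsageRecord`, its fields and the scope table are assumptions; real records are derived from LRMS logs and carry considerably more information.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical accounting record derived from an LRMS log entry."""
    user_dn: str
    vo: str
    site: str
    cpu_seconds: int


# Which record field each kind of requester is matched against (assumption)
_SCOPE = {"user": "user_dn", "site": "site", "vo": "vo"}


def visible_records(records, requester_kind: str, requester_id: str):
    """Keep only the records a user, site administrator or VO administrator may see."""
    field = _SCOPE[requester_kind]
    return [r for r in records if getattr(r, field) == requester_id]


def cpu_by_vo(records) -> dict:
    """Sum CPU time per VO."""
    totals = defaultdict(int)
    for r in records:
        totals[r.vo] += r.cpu_seconds
    return dict(totals)


if __name__ == "__main__":
    recs = [UsageRecord("/CN=A", "atlas", "INFN-BO", 100),
            UsageRecord("/CN=B", "cms", "INFN-BO", 50)]
    print(cpu_by_vo(visible_records(recs, "site", "INFN-BO")))
```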
Grid foundation: Storage Element
• Storage Element
  – Common interface: SRM v1, migrating to SRM v2.2
  – Various implementations from LCG and other external projects
    ▪ disk-based: DPM, dCache; tape-based: Castor, dCache
  – Support for ACLs in DPM (in future in Castor and dCache)
    ▪ After the summer: synchronization of ACLs between SEs
  – Common rfio library for Castor and DPM being added
• POSIX-like file access:
  – Grid File Access Layer (GFAL) by LCG
    ▪ Support for ACLs in the SRM layer (currently in DPM only)
    ▪ Support for SRM v2 being added now
  – gLite I/O
    ▪ Support for ACLs from the file catalog and interfaced to Hydra for data encryption
    ▪ Not certified in gLite 3.0. To be discontinued once all its functionality is also available in GFAL.
High Level Services: Catalogues
• File Catalogs
  – LFC from LCG
    ▪ In June: interface to POOL
    ▪ In the summer: LFC replication and backup
  – Hydra: stores keys for data encryption
    ▪ Being interfaced to GFAL (done by July)
    ▪ Currently only one instance, but in future there will be 3 instances: at least 2 need to be available for decryption
    ▪ Not yet certified in gLite 3.0. Certification will start soon
  – AMGA Metadata Catalog: generic metadata catalogue
    ▪ Joint JRA1-NA4 (ARDA) development. Used mainly by Biomed
    ▪ Not yet certified in gLite 3.0. Certification will start soon
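The planned Hydra setup, where any 2 of 3 instances suffice for decryption, is the behaviour of a threshold secret-sharing scheme. As an illustration only, the sketch below uses Shamir's scheme over a prime field to split a key into 3 shares, any 2 of which recover it. This is not a description of Hydra's actual implementation; the field size, integer key encoding and function names are our own choices.

```python
import secrets

# Prime modulus of the finite field (2**127 - 1 is a Mersenne prime).
# Keys must be smaller than this; a real system would size the field to its keys.
PRIME = 2**127 - 1


def split_secret(secret: int, n: int = 3, k: int = 2) -> list:
    """Shamir's scheme: split `secret` into n shares, any k of which recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list) -> int:
    """Lagrange interpolation at x = 0 using k (or more) distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789)
    print(recover_secret(shares[:2]))
```

A single share reveals nothing about the key, which is why losing (or compromising) one of the three servers is tolerable.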
High Level Services: File transfer
• FTS: reliable, scalable and customizable file transfer
  – Manages transfers through channels
    ▪ mono-directional network pipes between two sites
  – Web service interface
  – Automatic discovery of services
  – Support for different user and administrative roles
  – Adding support for pre-staging and a new proxy renewal scheme
  – Support for SRM v2.2, delegation and VOMS-aware proxy renewal in certification
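Channels can be modelled as one-way queues keyed by (source site, destination site), so that CERN→INFN-CNAF and INFN-CNAF→CERN are distinct. In this sketch the per-channel concurrency limit is an assumption standing in for FTS channel settings, and `ChannelScheduler` is a hypothetical name, not the FTS interface.

```python
from collections import defaultdict, deque


class ChannelScheduler:
    """Hypothetical model of FTS-style channels (not the FTS API).

    A channel is a one-way pipe identified by (source site, destination site);
    each channel has its own FIFO queue and an assumed limit on concurrent transfers.
    """

    def __init__(self, max_active_per_channel: int = 2):
        self.max_active = max_active_per_channel
        self.queues = defaultdict(deque)  # (src, dst) -> waiting file ids
        self.active = defaultdict(set)    # (src, dst) -> file ids in transfer

    def submit(self, src: str, dst: str, file_id: str) -> None:
        self.queues[(src, dst)].append(file_id)

    def start_ready(self) -> list:
        """Start as many queued transfers as each channel's limit allows."""
        started = []
        for channel, queue in self.queues.items():
            while queue and len(self.active[channel]) < self.max_active:
                file_id = queue.popleft()
                self.active[channel].add(file_id)
                started.append((channel, file_id))
        return started

    def finish(self, src: str, dst: str, file_id: str) -> None:
        self.active[(src, dst)].discard(file_id)


if __name__ == "__main__":
    s = ChannelScheduler()
    for f in ("f1", "f2", "f3"):
        s.submit("CERN", "INFN-CNAF", f)
    print(s.start_ready())
```

Limiting concurrency per channel, rather than globally, keeps one busy site pair from starving transfers between other sites.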
High Level Services: Workload mgmt.
• WMS helps the user access computing resources
  – Resource brokering, management of job input/output, ...
• LCG-RB: GT2 + Condor-G
  – To be replaced when the gLite WMS proves to be reliable
• gLite WMS: Web service (WMProxy) + Condor-G
  – Management of complex workflows (DAGs) and compound jobs
    ▪ bulk submission and shared input sandboxes
    ▪ support for input files on different servers (scattered sandboxes)
  – Support for shallow resubmission of jobs
  – Job File Perusal: file peeking during job execution
  – Supports collection of information from CEMon, BDII, R-GMA and from the DLI and StorageIndex data management interfaces
  – Support for parallel jobs (MPI) when the home dir is not shared
  – Deployed for the first time in gLite 3.0
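Shallow resubmission distinguishes failures that happen before the user's executable starts (safe to retry, since the payload never ran) from later ones. The decision can be sketched as below; the counters mirror, as we understand them, the JDL attributes ShallowRetryCount and RetryCount, but `RetryBudget`, `on_failure` and the "abort when the relevant budget is exhausted" rule are simplifications of our own, not the WMS implementation.

```python
from dataclasses import dataclass


@dataclass
class RetryBudget:
    """Hypothetical per-job retry counters (cf. JDL ShallowRetryCount / RetryCount)."""
    shallow_left: int
    deep_left: int


def on_failure(budget: RetryBudget, payload_started: bool) -> str:
    """Decide what to do after a job fails.

    Shallow: the failure happened before the user's executable started,
    so resubmitting cannot run the payload twice.
    Deep: the payload may already have run, at least partially.
    When the relevant budget is exhausted this sketch simply aborts.
    """
    if not payload_started:
        if budget.shallow_left > 0:
            budget.shallow_left -= 1
            return "shallow-resubmit"
        return "abort"
    if budget.deep_left > 0:
        budget.deep_left -= 1
        return "deep-resubmit"
    return "abort"


if __name__ == "__main__":
    b = RetryBudget(shallow_left=3, deep_left=0)
    print(on_failure(b, payload_started=False), b)
```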
WMS/LB/UI and CE
• New WMS deployed and thoroughly debugged
  – CMS: 100 collections × 200 jobs/collection, 3 UIs, 33 CEs
    ▪ ~2.5 h to submit jobs (0.5 seconds/job)
    ▪ ~17 hours to transfer jobs to a CE (3 seconds/job, 26K jobs/day)
    ▪ Negligible failure rate due to WMS
  – Shallow resubmission
    ▪ failure rate drops to less than 1% with 3 resubmissions
• Stability problems
  – also investigating other deployment scenarios to make it more robust
• gLite CE still to be tested and optimized
[Figure: Done(Success) jobs (%) after the i-th submission versus number of submissions (0 to 6), for CMS and ATLAS.]
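The CMS figures can be cross-checked with simple arithmetic. The sketch below derives totals from the per-job rates quoted on the slide; the residual-failure function additionally assumes that attempts fail independently with the same probability, a simplification rather than something the measurements establish.

```python
JOBS = 100 * 200            # 100 collections x 200 jobs/collection
SUBMIT_S_PER_JOB = 0.5      # submission rate from the slide
TRANSFER_S_PER_JOB = 3.0    # WMS-to-CE transfer rate from the slide
SECONDS_PER_DAY = 24 * 3600

submit_hours = JOBS * SUBMIT_S_PER_JOB / 3600        # about 2.8 h
transfer_hours = JOBS * TRANSFER_S_PER_JOB / 3600    # about 16.7 h
jobs_per_day = SECONDS_PER_DAY / TRANSFER_S_PER_JOB  # 28,800 at exactly 3 s/job


def residual_failure(p_fail: float, resubmissions: int) -> float:
    """Probability that a job never succeeds after the first submission plus
    `resubmissions` retries, assuming independent attempts with equal p_fail."""
    return p_fail ** (resubmissions + 1)


# Largest per-attempt failure probability still giving < 1% after 3 resubmissions
max_p_for_1_percent = 0.01 ** (1 / 4)                # about 0.32

if __name__ == "__main__":
    print(f"{JOBS} jobs: submit {submit_hours:.1f} h, transfer {transfer_hours:.1f} h")
    print(f"{jobs_per_day:.0f} jobs/day, max p = {max_p_for_1_percent:.2f}")
```

The slide's ~2.5 h and 26K jobs/day correspond to roughly 0.45 s/job and 3.3 s/job, so the quoted per-job rates are best read as rounded averages.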
High Level Services: Workflows
• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
  [Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE.]
• A Collection is a group of jobs with no dependencies
  – basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL whose values vary according to parameters
• Using compound jobs it is possible to submit a (possibly very large, up to thousands of jobs) group of jobs in one shot
  – Submission time reduction
    ▪ Single call to WMProxy server
    ▪ Single Authentication and Authorization process
    ▪ Sharing of files between jobs
  – Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group
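A DAG fixes the order in which its nodes can run, and a parametric job expands into a collection of concrete jobs. The sketch below illustrates both ideas with Python's standard `graphlib`. The edges between the five nodes are hypothetical (the slide's figure does not survive), and `expand_parametric` uses the `_PARAM_` placeholder that, as we recall, gLite JDL uses for parametric jobs; its start/step/count arguments are our own simplification, not the exact semantics of the JDL's parameter attributes.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def submission_waves(depends_on: dict) -> list:
    """Group DAG nodes into waves; each wave depends only on earlier waves.

    `depends_on` maps a node to the set of nodes whose output it needs.
    Raises graphlib.CycleError if the graph is not acyclic.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def expand_parametric(template: str, start: int, step: int, count: int) -> list:
    """Turn one parametric description into `count` concrete ones by
    substituting the _PARAM_ placeholder (simplified semantics)."""
    return [template.replace("_PARAM_", str(start + i * step)) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical dependencies for the five nodes in the slide's figure
    dag = {
        "nodeB": {"nodeA"},
        "nodeC": {"nodeA"},
        "nodeE": {"nodeC"},
        "nodeD": {"nodeB", "nodeE"},
    }
    print(submission_waves(dag))
    print(expand_parametric('Arguments = "_PARAM_";', start=0, step=10, count=3))
```

Nodes in the same wave have no mutual dependencies, which is exactly what allows a workload manager to dispatch them in parallel.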
High Level Services: Job Information
• Logging and Bookkeeping service
  – Tracks jobs during their lifetime (in terms of events)
  – LBProxy for fast access
  – L&B API and CLI to query jobs
  – Support for "CE reputability ranking": maintains recent statistics of job failures at CEs and feeds them back to the WMS to aid planning
• Job Provenance: stores long-term job information
  – Supports job rerun
  – If deployed, will also help offload the L&B
  – Not yet certified in gLite 3.0
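The "CE reputability ranking" idea can be sketched as a sliding window of recent job outcomes per CE, ranked by failure rate. The window size, the tie-breaking by name and the `CeReputation` class are hypothetical; the slide does not describe how L&B actually computes its statistics or how the WMS consumes them.

```python
from collections import defaultdict, deque


class CeReputation:
    """Hypothetical sketch: keep the outcomes of the most recent jobs per CE
    and rank CEs by their recent failure rate (lowest first)."""

    def __init__(self, window: int = 100):
        self.outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, ce: str, succeeded: bool) -> None:
        self.outcomes[ce].append(bool(succeeded))

    def failure_rate(self, ce: str) -> float:
        results = self.outcomes.get(ce)  # .get does not create an empty entry
        if not results:
            return 0.0  # no history: no known failures
        return results.count(False) / len(results)

    def rank(self, ces: list) -> list:
        return sorted(ces, key=lambda ce: (self.failure_rate(ce), ce))


if __name__ == "__main__":
    r = CeReputation(window=3)
    r.record("ce1.example.org", False)
    r.record("ce2.example.org", True)
    print(r.rank(["ce1.example.org", "ce2.example.org"]))
```

A bounded window lets a CE recover its reputation once its problems are fixed, instead of being penalised for old failures forever.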
Highlights: Job Priorities
• Applications ask for the possibility to diversify access to fast/slow queues depending on the user's role/group inside the VO
• GPBOX is a tool that provides the possibility to define, store and propagate fine-grained VO policies
  – based on VOMS groups and roles
  – enforcement of policies at sites: sites may accept/reject policies
  – Not yet certified. Certification will start when requested by the TCG.
• Current activities: test job prioritization without GPBOX:
  – Map VOMS groups to batch system shares
  – Publish info on the share in the CE GLUE 1.2 schema (VOView)
  – WMS match-making depending on the submitter's VOMS certificate
  – Settings are not dynamic (via e-mail or CE updates)
  – GIP available for Torque/Maui only. Working on the LSF one
  – mainly a deployment issue
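Assuming each published VOView carries an access-control list of FQANs and the batch-system share mapped to it, matching on the submitter's VOMS attributes reduces to the filter-and-rank below. The field names (`queue`, `acl`, `share`) and the "highest share first" rule are hypothetical simplifications; the real match-making works on GLUE 1.2 attributes together with the user's JDL Requirements and Rank expressions.

```python
def accepts(acl_entry: str, fqan: str) -> bool:
    """An ACL entry such as /atlas or /atlas/Role=production accepts an FQAN
    equal to it or nested below it (prefix match on whole path components)."""
    entry = acl_entry.rstrip("/")
    return fqan == entry or fqan.startswith(entry + "/")


def rank_views(views: list, user_fqans: list) -> list:
    """Return the queues whose VOView ACL accepts one of the user's FQANs,
    highest batch-system share first (ties keep the published order)."""
    usable = [v for v in views
              if any(accepts(a, f) for a in v["acl"] for f in user_fqans)]
    return [v["queue"] for v in sorted(usable, key=lambda v: -v["share"])]


if __name__ == "__main__":
    views = [
        {"queue": "atlas-prod", "acl": ["/atlas/Role=production"], "share": 0.6},
        {"queue": "atlas", "acl": ["/atlas"], "share": 0.4},
    ]
    print(rank_views(views, ["/atlas/Role=production/Capability=NULL"]))
```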
Summary
• gLite 3
  – is the next-generation middleware for grid computing
  – is developed according to a well-defined process
    ▪ controlled by the EGEE Technical Coordination Group
  – is deployed on the EGEE production infrastructure
    ▪ More than 200 sites
  – is still being developed to provide increased robustness, usability, and functionality
    ▪ On the preview testbed: CREAM, Job Provenance, glexec on the WNs, GPBOX
Part II: The EGEE Project
The EGEE project
• EGEE
  – 1 April 2004 – 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II
  – 1 April 2006 – 31 March 2008
  – 91 partners in 32 countries
  – 13 Federations
• Objectives
  – Large-scale, production-quality infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Improving and maintaining "gLite" Grid middleware
US partners in EGEE-II:
• Univ. Chicago
• Univ. South. California
• Univ. Wisconsin
• RENCI
Main lines of the EGEE project
• Infrastructure operation
  – Currently includes sites across 39 countries
  – Continuous monitoring of grid services & automated site configuration/management
• Middleware
  – Production-quality middleware distributed under a business-friendly open source licence
• User Support: managed process from first contact through to production usage
  – Training
  – Expertise in grid-enabling applications
  – Online helpdesk
  – Networking events (User Forum, Conferences etc.)
• Interoperability
  – Expanding geographical reach and interoperability with related infrastructures
[Figure: map of related infrastructures; labels include KnowARC and TWGRID.]
Applications on EGEE
• Applications from an increasing number of domains
  – Astrophysics
  – Computational Chemistry
  – Earth Sciences
  – Financial Simulation
  – Fusion
  – Geophysics
  – High Energy Physics
  – Life Sciences
  – Multimedia
  – Material Sciences
  – …
Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf
EU projects related to EGEE
[Figure: EU Grid projects related to EGEE.]
Sustainability: Beyond EGEE-II
• Need to prepare for a permanent Grid infrastructure
  – Ensure reliable and adaptive support for all sciences
  – Independent of short project funding cycles
  – Infrastructure managed in collaboration with national grid initiatives