Transcript Slide 1
Enabling Grids for E-sciencE
The gLite middleware
Roberto Barbera
University of Catania and INFN
ISSGC’06
Ischia, 18.07.2006
www.eu-egee.org
EGEE-II INFSO-RI-031688
Outline
• Introduction
• Overview of gLite services
– especially security
• Summary and conclusions
Job Workflow in gLite
[Figure: job workflow in gLite. Components: UI, Resource Broker, Job Submission Service, Computing Element, Storage Element, LFC Catalog, Information Service, Logging & Book-keeping, Authorization & Authentication. Labelled flows: JDL, input “sandbox”, output “sandbox”, DataSets info, expanded JDL, Globus RSL, job submit event, job query, job status, publish.]
gLite Services Decomposition
[Figure: decomposition of gLite services into 6 High Level Services + CLI & API. Legend: available / soon available.]
Middleware structure
• Applications have access
both to Higher-level Grid
Services and to Foundation
Grid Middleware
• Higher-Level Grid Services are meant to help users build their computing infrastructure, but should not be mandatory
• Foundation Grid Middleware
will be deployed on the EGEE
infrastructure
– Must be complete and robust
– Should allow interoperation
with other major grid
infrastructures
– Should not assume the use of
Higher-Level Grid Services
Grid Foundation: Security
• Authentication is based on an X.509 public key infrastructure (PKI)
– Certificate Authorities (CA) issue (long lived) certificates
identifying individuals (much like a passport)
Commonly used in web browsers to authenticate to sites
– Trust between CAs and sites is established (offline)
– In order to reduce vulnerability, on the Grid user identification is
done by using (short lived) proxies of their certificates
• Proxies can
– Be delegated to a service such that it can act on the user’s
behalf
– Include additional attributes (like VO information via the VO
Membership Service VOMS)
– Be stored in an external proxy store (MyProxy)
– Be renewed (in case they are about to expire)
Digital Signature
• Paul calculates the hash of the message (with a one-way hash function): call it A.
• Paul encrypts the hash using his private key: the encrypted hash is the digital signature.
• Paul sends the signed message to John.
• John calculates the hash of the received message (B) and compares it with A, which he recovers by deciphering the signature with Paul’s public key.
• If the hashes are equal, the message wasn’t modified, and Paul cannot repudiate it.
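The sign-and-verify steps above can be sketched in a few lines of Python. This is a toy illustration only: it assumes textbook RSA with tiny hard-coded primes and no padding, which is insecure and is not how grid middleware signs messages. The names `digest`, `sign` and `verify` are hypothetical.

```python
import hashlib

# Toy RSA key pair (assumption: tiny textbook primes, insecure, illustration only).
P, Q = 61, 53
N = P * Q                              # public modulus
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)

def digest(message: bytes) -> int:
    """One-way hash of the message, reduced so that it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Paul: encrypt the hash with the private key; the result is the signature."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """John: hash the received message and compare with the hash recovered via the public key."""
    return pow(signature, E, N) == digest(message)
```

With a real hash size and key length, any change to the message or to the signature makes `verify` return `False`; the toy modulus here is far too small to give that guarantee for modified messages.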
[Figure: Paul hashes the message to Hash(A) and encrypts it with his private key to form the digital signature attached to the message; John computes Hash(B) from the received message and checks Hash(B) =? Hash(A), with Hash(A) recovered using Paul’s public key.]
Digital Certificates
• Paul’s digital signature is safe if:
1. Paul’s private key is not compromised
2. John knows Paul’s public key
• How can John be sure that Paul’s public key is really Paul’s
public key and not someone else’s?
– A third party guarantees the correspondence between
public key and owner’s identity.
– Both Paul and John must trust this third party
• Two models:
– X.509: hierarchical organization;
– PGP: “web of trust”.
X.509
The “third party” is called Certification Authority (CA).
• Issue Digital Certificates (containing public key and owner’s
identity) for users, programs and machines (signed by the
CA)
• Check identity and the personal data of the requestor
– Registration Authorities (RAs) do the actual
identification/validation
• CAs periodically publish a list of compromised certificates
– Certificate Revocation Lists (CRL): contain all the revoked
certificates yet to expire
• CA certificates are self-signed
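A relying service combines these rules when it decides whether to accept a certificate. The sketch below is a simplified model under stated assumptions: the `Certificate` class and `is_acceptable` function are hypothetical, the CRL is modelled as a plain set of serial numbers per issuer, and verification of the CA signature itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: str
    serial: int
    not_after: datetime      # expiration date

def is_acceptable(cert, trusted_issuers, revoked_serials, now):
    """Accept only certificates from a trusted CA that are neither expired nor revoked."""
    if cert.issuer not in trusted_issuers:
        return False                      # no trust relationship with this CA
    if now > cert.not_after:
        return False                      # expired certificates are never accepted
    # revoked_serials maps an issuer name to the serial numbers on that CA's CRL
    return cert.serial not in revoked_serials.get(cert.issuer, set())
```

Expired certificates are rejected by the date check alone, which is why a CRL only needs to list revoked certificates that have yet to expire.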
X.509 Certificates
• An X.509 Certificate contains:
– owner’s public key;
– identity of the owner;
– info on the CA;
– time of validity;
– serial number;
– digital signature of the CA
Structure of an X.509 certificate:
Public key
Subject: C=CH, O=CERN, OU=GRID, CN=Andrea Sciaba 8968
Issuer: C=CH, O=CERN, OU=GRID, CN=CERN CA
Expiration date: Aug 26 08:08:14 2005 GMT
Serial number: 625 (0x271)
CA Digital signature
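The textual fields of the example certificate can be pulled apart with a little Python. This is a minimal sketch: the helper `parse_dn` is hypothetical and assumes a simple comma-separated distinguished name with no escaped commas or repeated attributes.

```python
from datetime import datetime

def parse_dn(dn: str) -> dict:
    """Split a comma-separated distinguished name into attribute/value pairs."""
    fields = {}
    for part in dn.split(","):
        key, _, value = part.strip().partition("=")
        fields[key] = value
    return fields

# Values copied from the example certificate on the slide.
subject = parse_dn("C=CH, O=CERN, OU=GRID, CN=Andrea Sciaba 8968")
issuer = parse_dn("C=CH, O=CERN, OU=GRID, CN=CERN CA")
expires = datetime.strptime("Aug 26 08:08:14 2005 GMT", "%b %d %H:%M:%S %Y GMT")
serial = int("0x271", 16)    # the hexadecimal form of serial number 625
```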
Obtaining a Certificate
• How to obtain a certificate:
– A certificate request is performed
– The user identity is confirmed by the RA
– The certificate is issued by the CA
– The certificate is used as a key to access the grid
AuthN and AuthZ: pre-VOMS
• Authentication
– User receives certificate signed by CA
– Connects to “UI” by ssh
– Downloads certificate
– Single logon to Grid: create proxy, then Grid Security Infrastructure identifies user to other machines
• Authorisation
– User joins Virtual Organisation
– VO negotiates access to Grid nodes and resources
– Authorisation tested by CE
– gridmapfile maps user to local account
[Figure: numbered steps 1 to 3 linking the user, the CA, the VO manager (AUP), the VO service and VO database, the UI and GSI, with a daily update of grid-mapfiles on Grid services.]
VOs and authorization
• Grid users MUST belong to virtual organizations
– What we previously called “groups”
– Sets of users belonging to a collaboration
– User must sign the usage guidelines for the VO
– You will be registered in the VO server (wait for notification)
• VOs maintained a list of their members on an LDAP server
– The list is downloaded by grid machines to map user certificate subjects to local “pool” accounts
/etc/grid-security/grid-mapfile:
...
"/C=CH/O=CERN/OU=GRID/CN=Simone Campana 7461" .dteam
"/C=CH/O=CERN/OU=GRID/CN=Andrea Sciaba 8968" .cms
"/C=CH/O=CERN/OU=GRID/CN=Patricia Mendez Lorenzo-ALICE" .alice
...
– Sites decide which VOs to accept
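The mapping from certificate subject to local pool account in the grid-mapfile excerpt can be read with a short Python sketch. The function name `parse_grid_mapfile` is hypothetical, and the sketch assumes every non-empty line holds exactly one quoted subject followed by one account name, as in the excerpt.

```python
import shlex

def parse_grid_mapfile(text: str) -> dict:
    """Map each quoted certificate subject to its local (pool) account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                            # skip blank lines
        subject, account = shlex.split(line)    # shlex honours the double quotes
        mapping[subject] = account
    return mapping

sample = '''
"/C=CH/O=CERN/OU=GRID/CN=Simone Campana 7461" .dteam
"/C=CH/O=CERN/OU=GRID/CN=Andrea Sciaba 8968" .cms
"/C=CH/O=CERN/OU=GRID/CN=Patricia Mendez Lorenzo-ALICE" .alice
'''
accounts = parse_grid_mapfile(sample)
```

Using `shlex` rather than a plain `split()` matters because the subjects contain spaces inside the quotes.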
Evolution of VO management
Before VOMS
• User is authorised as a member of a single VO
• All VO members have same rights
• Gridmapfiles are updated by VO management software: map the user’s DN to a local account
• grid-proxy-init: derives proxy from certificate, the “single sign-on to the grid”
VOMS
• User can be in multiple VOs
– Aggregate rights
• VO can have groups
– Different rights for each
Different groups of experimentalists
…
– Nested groups
• VO has roles
– Assigned to specific purposes
E.g. system admin, when the user assumes this role
• Proxy certificate carries the additional attributes
• voms-proxy-init
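Once the proxy carries group and role attributes, a service can decide access per group and role instead of per VO. The Python sketch below only illustrates that idea: the `VomsAttribute` class, the `is_authorized` function, the path-like group names and the role name are all hypothetical and do not reproduce the actual VOMS attribute encoding.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VomsAttribute:
    vo: str
    group: str                    # path-like, so "/cms/higgs" is nested inside "/cms"
    role: Optional[str] = None

def is_authorized(attributes, required_group, required_role=None):
    """Grant access if any attribute lies in the required group (or a nested
    subgroup) and carries the required role, when a role is demanded."""
    for attr in attributes:
        in_group = (attr.group == required_group
                    or attr.group.startswith(required_group + "/"))
        role_ok = required_role is None or attr.role == required_role
        if in_group and role_ok:
            return True
    return False
```

A user in several VOs simply presents several attributes, which is how rights aggregate in this model.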
VOMS: concepts
Virtual Organization Membership Service:
– Extends the proxy with info on VO
membership, group, roles
– Fully compatible with GSI
– Each VO has a database containing group membership, roles and capabilities information for each user
– User contacts VOMS server requesting his
authorization info
– Server sends authorization info to the
client, which includes it in a proxy
certificate
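[Figure: the client sends an authentication request to the VOMS server, which consults its Auth DB and returns a VOMS attribute certificate (AC); the AC is included in the proxy C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy.]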
[glite-tutor] /home/giorgio > voms-proxy-init --voms gilda
Cannot find file or dir: /home/giorgio/.glite/vomses
Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Emidio
Giorgio/[email protected]
Enter GRID pass phrase:
Your proxy is valid until Mon Jan 30 23:35:51 2006
Creating temporary proxy.................................Done
Contacting voms.ct.infn.it:15001 [/C=IT/O=GILDA/OU=Host/L=INFN
Catania/CN=voms.ct.infn.it/[email protected]] "gilda"
Creating proxy ...................................... Done
Your proxy is valid until Mon Jan 30 23:35:51 2006
Grid foundation: Information Systems
• Generic Information Provider (GIP)
– Provides LDIF information about a grid service in accordance with the GLUE Schema
[Figure: GIP components: provider, plugin, cache, config file, LDIF file.]
• BDII: Information system in gLite 3.0 (by LCG)
– LDAP database that is updated by a process
– More than one DB is used, to separate reads and writes
– A port forwarder is used internally to select the correct DB
[Figure: three LDAP databases on ports 2171, 2172 and 2173 behind a port forwarder on port 2170; one DB is updated or modified while another serves queries, then the DBs are swapped.]
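The BDII idea of separating reads from writes and then swapping databases can be modelled in a few lines. This is a toy sketch under stated assumptions: the class `SwappingDirectory` is hypothetical, it uses two in-memory dictionaries in place of the LDAP databases, and an index variable stands in for the port forwarder.

```python
class SwappingDirectory:
    """Toy model: one database serves readers while the other is rebuilt."""

    def __init__(self):
        self._dbs = [{}, {}]
        self._read_index = 0          # which database the "port forwarder" points at

    def query(self, key):
        """Readers always hit the database currently selected for reading."""
        return self._dbs[self._read_index].get(key)

    def update(self, fresh_entries: dict):
        """Rebuild the idle database, then swap so readers see the new data at once."""
        write_index = 1 - self._read_index
        self._dbs[write_index] = dict(fresh_entries)
        self._read_index = write_index
```

Readers never observe a half-updated database, because the swap is a single assignment made after the rebuild has finished.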
Grid foundation: Information Systems
• R-GMA: provides a uniform method to access and
publish distributed information and monitoring data
– Used for job and infrastructure monitoring in gLite 3.0
– Working to add authorization
• Service Discovery:
– Provides a standard set of methods for locating Grid services
– Currently supports R-GMA, BDII and XML files as backends
– Will add local cache of information
– Used by some DM and WMS components in gLite 3.0
Grid foundation: Computing Element
• LCG-CE: based on GT2 GRAM
– To be replaced when other CEs prove to be reliable
• gLite-CE: based on GSI enabled Condor-C
– Supported by Condor. More efficient. Uses BLAH (see below)
– Deployed for the first time in gLite 3.0
• CREAM: new lightweight web service CE
– Not in gLite 3 release. Will need exposure to users on dedicated
system.
– WSDL interface
– Will support bulk submission of jobs from WMS and optimization
of input/output file transfer. Uses BLAH
– Plans are to have a CE with both Condor-C and CREAM
interfaces
Grid foundation: Computing Element
• BLAH: interfaces the CE and the local batch system
– May handle arbitrary information passing from CE to LRMS
Patches to support this and logging for accounting being added now
– Used by gLite-CE and CREAM
• CEMon: Web service to publish status of a computing
resource to clients
– Supports synchronous queries and asynchronous notifications
– Uses the same information (GIP) used by BDII
– In gLite 3 CEMon will be available to the users but the baseline is
that the WMS queries the BDII
Grid foundation: Accounting
• APEL: Uses R-GMA to propagate and display job
accounting information for infrastructure monitoring
– Reads LRMS log files provided by LCG-CE and BLAH
– Preparing an update for gLite 3.0 to use the files from BLAH
• DGAS: Collects, stores and transfers accounting data.
Compliant with privacy requirements
– Reads LRMS log files provided by LCG-CE and BLAH.
– Stores information in a site database (HLR) and optionally in a
central HLR. Access granted to user, site and VO administrators
– Not yet certified in gLite 3.0. Deployment plan:
certify and activate local sensors and site HLR in parallel with APEL
replace APEL sensors with DGAS (DGAS2APEL)
certify and activate central HLR; perform scalability tests
Grid foundation: Storage Element
• Storage Element
– Common interface: SRMv1, migrating to SRMv2
– Various implementations from LCG and other external projects
disk-based: DPM, dCache / tape-based: Castor, dCache
– Support for ACLs in DPM (in future in Castor and dCache)
After the summer: synchronization of ACLs between SEs
– Common rfio library for Castor and DPM being added
• Posix-like file access:
– Grid File Access Layer (GFAL) by LCG
Support for ACL in the SRM layer (currently in DPM only)
Support for SRMv2 being added now. In the summer add thread
safety and interface to the information system.
– gLite I/O
Support for ACLs from the file catalog and interfaced to Hydra for
data encryption
Not certified in gLite 3.0. To be retired once all its functionality is also available in GFAL.
High Level Services: Catalogues
• File Catalogs
– LFC from LCG
In June: interface to POOL.
In the summer: LFC replication and backup.
– Fireman
Not certified in gLite 3.0. To be retired once all its functionality is available in LFC.
• Hydra: stores keys for data encryption
– Being interfaced to GFAL (done by July)
– Currently only one instance, but in future there will be 3
instances: at least 2 need to be available for decryption.
– Not yet certified in gLite 3.0. Certification will start soon.
• AMGA Metadata Catalog: generic metadata catalogue
– Joint JRA1-NA4 (ARDA) development. Used mainly by Biomed
– Not yet certified in gLite 3.0. Certification will start soon.
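The planned Hydra setup, with 3 instances of which at least 2 must be available for decryption, has the shape of a 2-of-3 threshold scheme. One standard way to get that property is Shamir's secret sharing, sketched below in Python. The slide does not say which scheme Hydra uses, so this is an assumption, and the names `split_key` and `recover_key` are hypothetical.

```python
import secrets

PRIME = 2**127 - 1   # a Mersenne prime; all arithmetic is done modulo this value

def split_key(key: int, shares: int = 3):
    """Split `key` (0 <= key < PRIME) so that any 2 of the shares recover it."""
    slope = secrets.randbelow(PRIME)
    # Each share is a point (x, y) on the line y = key + slope * x (mod PRIME).
    return [(x, (key + slope * x) % PRIME) for x in range(1, shares + 1)]

def recover_key(share_a, share_b) -> int:
    """Interpolate the line through two shares and evaluate it at x = 0."""
    (x1, y1), (x2, y2) = share_a, share_b
    slope = (y2 - y1) * pow((x2 - x1) % PRIME, -1, PRIME) % PRIME
    return (y1 - slope * x1) % PRIME
```

Each instance would store one share; a single share on its own reveals nothing about the key because the slope is random, while any two shares determine the line and hence the key.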
High Level Services: File transfer
• FTS: Reliable, scalable and
customizable file transfer
– Manages transfers through channels
mono-directional network pipes
between two sites
– Web service interface
– Automatic discovery of services
– Support for different user and administrative
roles
– Adding support for pre-staging and new proxy renewal schema
– In the medium term add support for SRMv2, delegation, VOMS-aware proxy renewal
High Level Services: Workload mgmt.
• WMS helps the user access computing resources
– Resource brokering, management of job input/output, ...
• LCG-RB: GT2 + Condor-G
– To be replaced when the gLite WMS proves to be reliable
• gLite WMS: Web service (WMProxy) + Condor-G
– Management of complex workflows (DAGs) and compound jobs
bulk submission and shared input sandboxes
support for input files on different servers (scattered sandboxes)
– Support for shallow resubmission of jobs
– Job File Perusal: file peeking during job execution
– Supports collection of information from CEMon, BDII, R-GMA
and from DLI and StorageIndex data management interfaces
– Support for parallel jobs (MPI) when the home dir is not shared
– Deployed for the first time in gLite 3.0
High Level Services: Workflows
• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE.]
• A Collection is a group of jobs with no dependencies
– basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
• Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
– Submission time reduction
Single call to WMProxy server
Single Authentication and Authorization process
Sharing of files between jobs
– Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group
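The dependency rule that defines a DAG can be made concrete with Python's standard `graphlib` module (Python 3.9 or later). The node names come from the slide's figure, but the edges between them are assumed here for illustration, and `submission_waves` is a hypothetical helper, not part of the WMS.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on (edges assumed for illustration).
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeC"},
}

def submission_waves(deps):
    """Group jobs into waves; every job in a wave has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = submission_waves(dependencies)
```

A Collection is the degenerate case in which every dependency set is empty, so all jobs fall into a single wave.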
High Level Services: Job Information
• Logging and Bookkeeping service
– Tracks jobs during their lifetime (in terms of events)
– LBProxy for fast access
– L&B API and CLI to query jobs
– Support for “CE reputability ranking”: maintains recent statistics of job failures at CEs and feeds back to WMS to aid planning
• Job Provenance: stores long term job information
– Supports job rerun
– If deployed will also help unloading the L&B
– Not yet certified in gLite 3.0.
High Level Services: Job Priorities
• GPBOX: Interface to define, store and propagate fine-grained VO policies
– Based on VOMS groups and roles
– Enforcement of policies at sites: sites may accept/reject policies
– Not yet certified in gLite 3.0.
gLite process
• Process controlled by the
Technical Coordination Group
• Task Forces with developers,
applications, testers and
deployment experts
• gLite 3.0 adopts a continuous
release process:
– No more big-bang releases
with fixed deadlines for all
– Develop components as
requested by users and sites
– Deploy or upgrade as soon as
testing is satisfactory
• Major releases synchronized
with large scale activities of
VOs (SCs)
– Next major release foreseen in
autumn
gLite Software Process
[Figure: gLite software process. Stages: JRA1 Development (directives, bug fixing, software), SA3 Integration (integration tests, packages), SA3 Testing & Certification (testbed deployment, functional tests), Release (installation guide, release notes, etc.), SA1 Pre-Production (pre-production deployment, scalability tests), SA1 Production Infrastructure. Each test stage has Pass and Fail outcomes; arrows are also labelled “Problem” and “Serious problem”.]
Summary
• gLite 3 being deployed on the production infrastructure
– Includes all of the well known middleware from LCG 2.7.0
• New components deployed for the first time on the
Production Infrastructure:
– Address requirements in terms of functionality and scalability
– Components deployed for the first time need extensive testing!
• Developed according to a well defined process
– Controlled by the EGEE Technical Coordination Group
• Development is continuing to provide increased
robustness, usability, and functionality
Questions ?
www.glite.org