Enabling Grids for E-sciencE
The gLite middleware
Roberto Barbera
University of Catania and INFN
ISSGC’06
Ischia, 18.07.2006
www.eu-egee.org
EGEE-II INFSO-RI-031688
Outline
• Introduction
• Overview of gLite services
– especially security
• Summary and conclusions
Job Workflow in gLite
[Diagram: job workflow in gLite. Components: UI, Resource Broker, Job Submission Service, Computing Element, Storage Element, Information Service, LFC Catalog, Logging & Book-keeping, Author. & Authen. Arrow labels: JDL, Input “sandbox”, Output “sandbox”, DataSets info, Expanded JDL, Globus RSL, Job Submit Event, Job Query, Job Status, Publish.]
gLite Services Decomposition
6 High Level Services
+ CLI & API
Legend:
•Available
•Soon Available
Middleware structure
• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services
Grid Foundation: Security
• Authentication based on X.509 public key infrastructure (PKI)
– Certificate Authorities (CA) issue (long-lived) certificates identifying individuals (much like a passport)
▪ Commonly used in web browsers to authenticate to sites
– Trust between CAs and sites is established (offline)
– To reduce vulnerability, user identification on the Grid is done with (short-lived) proxies of their certificates
• Proxies can
– Be delegated to a service so that it can act on the user’s behalf
– Include additional attributes (like VO information via the VO Membership Service, VOMS)
– Be stored in an external proxy store (MyProxy)
– Be renewed (in case they are about to expire)
Digital Signature
• Paul calculates the hash of the message (with a one-way hash function)
• Paul encrypts the hash using his private key: the encrypted hash is the digital signature.
• Paul sends the signed message to John.
• John calculates the hash of the message (Hash(B)) and compares it with Hash(A), obtained by deciphering the signature with Paul’s public key.
• If the hashes are equal, the message wasn’t modified and Paul cannot repudiate it.
[Diagram: Paul hashes the message to Hash(A) and encrypts it with his private key to produce the digital signature; John recomputes Hash(B) from the received message, deciphers the signature with Paul’s public key to recover Hash(A), and checks Hash(B) =? Hash(A).]
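The sign-and-verify steps on this slide can be sketched in a few lines of Python. This is a toy illustration only: it uses textbook RSA without padding, and the two primes, the exponent and the function names are assumptions chosen for the example, not what GSI or any real PKI uses.

```python
import hashlib

# Toy RSA key pair (illustration only, NOT secure: no padding, tiny key).
# The two Mersenne primes below are an arbitrary choice for this sketch.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 65537                      # public exponent (Paul's public key is (N, E))
D = pow(E, -1, PHI)            # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """One-way hash of the message, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Paul: encrypt the hash with the private key; the result is the signature."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """John: decipher the signature with the public key and compare hashes."""
    return pow(signature, E, N) == digest(message)


signature = sign(b"This is some message")
print(verify(b"This is some message", signature))        # unmodified message
print(verify(b"This is some other message", signature))  # tampered message
```

The first check succeeds because the recovered hash equals the recomputed one; changing the message changes its hash, so the second check fails.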
Digital Certificates
• Paul’s digital signature is safe if:
1. Paul’s private key is not compromised
2. John knows Paul’s public key
• How can John be sure that Paul’s public key is really Paul’s public key and not someone else’s?
– A third party guarantees the correspondence between public key and owner’s identity.
– Both Paul and John must trust this third party
• Two models:
– X.509: hierarchical organization;
– PGP: “web of trust”.
X.509
The “third party” is called a Certification Authority (CA).
• CAs issue digital certificates (containing the public key and the owner’s identity) for users, programs and machines; each certificate is signed by the CA
• CAs check the identity and the personal data of the requestor
– Registration Authorities (RAs) do the actual identification/validation
• CAs periodically publish a list of compromised certificates
– Certificate Revocation Lists (CRLs): contain all the revoked certificates yet to expire
• CA certificates are self-signed
X.509 Certificates
• An X.509 certificate contains:
– owner’s public key;
– identity of the owner;
– info on the CA;
– time of validity;
– serial number;
– digital signature of the CA
Structure of an X.509 certificate:
Public key
Subject: C=CH, O=CERN, OU=GRID, CN=Andrea Sciaba 8968
Issuer: C=CH, O=CERN, OU=GRID, CN=CERN CA
Expiration date: Aug 26 08:08:14 2005 GMT
Serial number: 625 (0x271)
CA Digital signature
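As a rough sketch of how these fields are used, the following Python models a certificate with the example values from the slide and applies two of the checks a relying party makes: expiry and revocation (CRL lookup by serial number). The class and function names are hypothetical, and the sketch deliberately omits verifying the CA's signature, which a real implementation must also do.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    """Simplified view of the certificate fields listed on the slide."""
    subject: str
    issuer: str
    serial: int
    not_after: datetime


def is_acceptable(cert, now, revoked_serials):
    """Reject expired certificates and those listed on the issuer's CRL.

    The CA signature check is intentionally left out of this sketch.
    """
    if now > cert.not_after:
        return False
    return cert.serial not in revoked_serials


cert = Certificate(
    subject="C=CH, O=CERN, OU=GRID, CN=Andrea Sciaba 8968",
    issuer="C=CH, O=CERN, OU=GRID, CN=CERN CA",
    serial=0x271,  # 625
    not_after=datetime(2005, 8, 26, 8, 8, 14, tzinfo=timezone.utc),
)

before_expiry = datetime(2005, 1, 1, tzinfo=timezone.utc)
print(is_acceptable(cert, before_expiry, revoked_serials=set()))
print(is_acceptable(cert, before_expiry, revoked_serials={625}))
```

Passing the revoked serial numbers as a set is only a convenience for the example; how a CRL is fetched and parsed is outside its scope.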
Obtaining a Certificate
• How to obtain a certificate:
1. A certificate request is performed
2. The user identity is confirmed by the RA
3. The certificate is issued by the CA
4. The certificate is used as a key to access the grid
AuthN and AuthZ: pre-VOMS
• Authentication
– User receives certificate signed by CA
– Connects to “UI” by ssh
– Downloads certificate
– Single logon to Grid: create proxy, then the Grid Security Infrastructure (GSI) identifies user to other machines
• Authorisation
– User joins Virtual Organisation
– VO negotiates access to Grid nodes and resources
– Authorisation tested by CE
– gridmapfile maps user to local account
[Diagram: numbered steps 1 to 3 connecting the user, CA, UI, VO mgr (AUP), VO service and VO database; GSI; daily update of grid-mapfiles on Grid services.]
VOs and authorization
• Grid users MUST belong to virtual organizations
– What we previously called “groups”
– Sets of users belonging to a collaboration
– User must sign the usage guidelines for the VO
– You will be registered in the VO server (wait for notification)
• VOs maintain a list of their members on an LDAP server
– The list is downloaded by grid machines to map user certificate subjects to local “pool” accounts
...
"/C=CH/O=CERN/OU=GRID/CN=Simone Campana 7461" .dteam
"/C=CH/O=CERN/OU=GRID/CN=Andrea Sciaba 8968" .cms
"/C=CH/O=CERN/OU=GRID/CN=Patricia Mendez Lorenzo-ALICE" .alice
...
– Sites decide which VOs to accept
▪ /etc/grid-security/grid-mapfile
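The mapping from certificate subjects to pool accounts shown in the grid-mapfile excerpt can be read with a few lines of Python. This is a minimal sketch that assumes every non-blank line has exactly the two fields shown on the slide (a quoted subject and an account name); the function name is hypothetical and real grid-mapfiles may contain other constructs.

```python
import shlex


def parse_grid_mapfile(text):
    """Map each quoted certificate subject (DN) to its local pool account."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # shlex honours the double quotes, so spaces inside the DN are kept.
        subject, account = shlex.split(line)
        mapping[subject] = account
    return mapping


SAMPLE = '''
"/C=CH/O=CERN/OU=GRID/CN=Simone Campana 7461" .dteam
"/C=CH/O=CERN/OU=GRID/CN=Andrea Sciaba 8968" .cms
"/C=CH/O=CERN/OU=GRID/CN=Patricia Mendez Lorenzo-ALICE" .alice
'''

accounts = parse_grid_mapfile(SAMPLE)
print(accounts["/C=CH/O=CERN/OU=GRID/CN=Andrea Sciaba 8968"])
```

A site that wants to refuse a VO would simply not install that VO's entries, which is the sense in which sites decide which VOs to accept.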
Evolution of VO management
Before VOMS
• User is authorised as a member of a single VO
• All VO members have same rights
• Gridmapfiles are updated by VO management software: map the user’s DN to a local account
• grid-proxy-init – derives proxy from certificate – the “single sign-on to the grid”
VOMS
• User can be in multiple VOs
– Aggregate rights
• VO can have groups
– Different rights for each
▪ Different groups of experimentalists
▪ …
– Nested groups
• VO has roles
– Assigned to specific purposes
▪ E.g. system admin
▪ When the user assumes this role
• Proxy certificate carries the additional attributes
• voms-proxy-init
VOMS: concepts
Virtual Organization Membership Service:
– Extends the proxy with info on VO membership, group, roles
– Fully compatible with GSI
– Each VO has a database containing group membership, roles and capabilities information for each user
– User contacts VOMS server requesting their authorization info
– Server sends authorization info to the client, which includes it in a proxy certificate
[Diagram: after an authentication request, the VOMS server (backed by its Auth DB) returns a VOMS AC, which is embedded in the proxy C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy.]
[glite-tutor] /home/giorgio > voms-proxy-init --voms gilda
Cannot find file or dir: /home/giorgio/.glite/vomses
Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Emidio
Giorgio/[email protected]
Enter GRID pass phrase:
Your proxy is valid until Mon Jan 30 23:35:51 2006
Creating temporary proxy.................................Done
Contacting voms.ct.infn.it:15001 [/C=IT/O=GILDA/OU=Host/L=INFN
Catania/CN=voms.ct.infn.it/[email protected]] "gilda"
Creating proxy ...................................... Done
Your proxy is valid until Mon Jan 30 23:35:51 2006
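The VO, group and role information carried in the proxy is commonly written as a path-like attribute string (a fully qualified attribute name). The slides do not show this format, so the sketch below is an assumption about its shape, `/<vo>/<group>/.../Role=<role>/Capability=<cap>`, and the group `tutors` and role `admin` are made-up values; the class and function names are hypothetical as well.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VomsAttribute:
    vo: str
    groups: Tuple[str, ...]   # nested groups below the VO, outermost first
    role: Optional[str]       # None when no role is asserted


def parse_fqan(fqan: str) -> VomsAttribute:
    """Split an attribute string into VO, nested groups and role."""
    path = []
    role = None
    for part in (p for p in fqan.split("/") if p):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capabilities are ignored in this sketch
        else:
            path.append(part)
    if not path:
        raise ValueError("attribute string does not name a VO: %r" % fqan)
    return VomsAttribute(vo=path[0], groups=tuple(path[1:]), role=role)


print(parse_fqan("/gilda/tutors/Role=admin"))
print(parse_fqan("/gilda/Role=NULL/Capability=NULL"))
```

A service could then base its authorization decision on the group and role rather than on VO membership alone, which is the change VOMS introduces over the plain grid-mapfile.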
Grid foundation: Information Systems
• Generic Information Provider (GIP)
– Provides LDIF information about a grid service in accordance with the GLUE Schema
[Diagram: GIP with Provider, Plugin, Cache, Config File and LDIF File.]
• BDII: Information system in gLite 3.0 (by LCG)
– LDAP database that is updated by a process
– More than one DB is used, to separate reads from writes
– A port forwarder is used internally to select the correct DB
[Diagram: a port forwarder on port 2170 in front of LDAP databases on ports 2171, 2172 and 2173; steps labelled “Update DB & Modify DB” and “Swap DBs”.]
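The read/write separation described for the BDII can be illustrated with a small in-memory model: clients always talk to one public port, fresh data is written into a database that is not serving reads, and the forwarder is then switched to it. This is only a sketch of the idea; the class, its methods and the dictionary "databases" are hypothetical stand-ins, and only the port numbers come from the slide.

```python
from itertools import cycle


class PortForwarder:
    """Toy model: reads via the public port go to the currently active backend."""

    PUBLIC_PORT = 2170

    def __init__(self, backend_ports=(2171, 2172, 2173)):
        # One dictionary per backend port stands in for an LDAP database.
        self.databases = {port: {} for port in backend_ports}
        self._order = cycle(backend_ports)
        self.active = next(self._order)  # backend currently serving reads

    def query(self, key):
        """Read from whichever backend is active right now."""
        return self.databases[self.active].get(key)

    def refresh(self, fresh_entries):
        """Write fresh data into the next backend, then swap it in for reads."""
        target = next(self._order)
        self.databases[target] = dict(fresh_entries)
        self.active = target


forwarder = PortForwarder()
forwarder.refresh({"ce01": "up"})
print(forwarder.active, forwarder.query("ce01"))
```

Readers never see a half-updated database because the switch happens only after the write has finished.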
Grid foundation: Information Systems
• R-GMA: provides a uniform method to access and publish distributed information and monitoring data
– Used for job and infrastructure monitoring in gLite 3.0
– Working to add authorization
• Service Discovery:
– Provides a standard set of methods for locating Grid services
– Currently supports R-GMA, BDII and XML files as backends
– Will add local cache of information
– Used by some DM and WMS components in gLite 3.0
Grid foundation: Computing Element
• LCG-CE: based on GT2 GRAM
– To be replaced when other CEs prove to be reliable
• gLite-CE: based on GSI enabled Condor-C
– Supported by Condor. More efficient. Uses BLAH (see below)
– Deployed for the first time in gLite 3.0
• CREAM: new lightweight web service CE
– Not in gLite 3 release. Will need exposure to users on a dedicated system.
– WSDL interface
– Will support bulk submission of jobs from WMS and optimization of input/output file transfer. Uses BLAH
– Plans are to have a CE with both Condor-C and CREAM interfaces
Grid foundation: Computing Element
• BLAH: interfaces the CE and the local batch system
– May handle arbitrary information passing from CE to LRMS
▪ patches to support this and logging for accounting being added now
– Used by gLite-CE and CREAM
• CEMon: Web service to publish status of a computing resource to clients
– Supports synchronous queries and asynchronous notifications
– Uses the same information (GIP) used by BDII
– In gLite 3 CEMon will be available to the users but the baseline is that the WMS queries the BDII
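The difference between a synchronous query and an asynchronous notification can be shown with a small publish/subscribe sketch. It is not the CEMon interface: the class, its methods and the `free_slots` field are hypothetical names used only to illustrate the two access patterns.

```python
class ComputingResourceMonitor:
    """Toy status publisher for one computing resource."""

    def __init__(self, status):
        self._status = status
        self._subscribers = []

    def query(self):
        """Synchronous query: the client asks and gets the current status."""
        return self._status

    def subscribe(self, callback):
        """Register a callback that is invoked on every status change."""
        self._subscribers.append(callback)

    def publish(self, status):
        """Record a new status and notify every subscriber."""
        self._status = status
        for callback in self._subscribers:
            callback(status)


monitor = ComputingResourceMonitor({"free_slots": 10})
received = []
monitor.subscribe(received.append)   # asynchronous path: pushed to the client
monitor.publish({"free_slots": 7})
print(monitor.query())               # synchronous path: pulled by the client
print(received)
```

With notifications a client such as a workload manager does not have to poll; it is told when the resource status changes.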
Grid foundation: Accounting
• APEL: Uses R-GMA to propagate and display job accounting information for infrastructure monitoring
– Reads LRMS log files provided by LCG-CE and BLAH
– Preparing an update for gLite 3.0 to use the files from BLAH
• DGAS: Collects, stores and transfers accounting data. Compliant with privacy requirements
– Reads LRMS log files provided by LCG-CE and BLAH.
– Stores information in a site database (HLR) and optionally in a central HLR. Access granted to user, site and VO administrators
– Not yet certified in gLite 3.0. Deployment plan:
▪ certify and activate local sensors and site HLR in parallel with APEL
▪ replace APEL sensors with DGAS (DGAS2APEL)
▪ certify and activate central HLR; perform scalability tests
Grid foundation: Storage Element
• Storage Element
– Common interface: SRMv1, migrating to SRMv2
– Various implementations from LCG and other external projects
▪ disk-based: DPM, dCache / tape-based: Castor, dCache
– Support for ACLs in DPM (in future in Castor and dCache)
▪ After the summer: synchronization of ACLs between SEs
– Common rfio library for Castor and DPM being added
• POSIX-like file access:
– Grid File Access Layer (GFAL) by LCG
▪ Support for ACL in the SRM layer (currently in DPM only)
▪ Support for SRMv2 being added now. In the summer add thread safety and interface to the information system.
– gLite I/O
▪ Support for ACLs from the file catalog and interfaced to Hydra for data encryption
▪ Not certified in gLite 3.0. To be retired once all of its functionality is also available in GFAL.
High Level Services: Catalogues
• File Catalogs
– LFC from LCG
▪ In June: interface to POOL.
▪ In the summer: LFC replication and backup.
– Fireman
▪ Not certified in gLite 3.0. To be retired once all of its functionality is available in LFC.
• Hydra: stores keys for data encryption
– Being interfaced to GFAL (done by July)
– Currently only one instance, but in future there will be 3 instances: at least 2 need to be available for decryption.
– Not yet certified in gLite 3.0. Certification will start soon.
• AMGA Metadata Catalog: generic metadata catalogue
– Joint JRA1-NA4 (ARDA) development. Used mainly by Biomed
– Not yet certified in gLite 3.0. Certification will start soon.
High Level Services: File transfer
• FTS: Reliable, scalable and customizable file transfer
– Manages transfers through channels
▪ mono-directional network pipes between two sites
– Web service interface
– Automatic discovery of services
– Support for different user and administrative roles
– Adding support for pre-staging and new proxy renewal scheme
– In the medium term add support for SRMv2, delegation, VOMS-aware proxy renewal
High Level Services: Workload mgmt.
• WMS helps the user access computing resources
– Resource brokering, management of job input/output, ...
• LCG-RB: GT2 + Condor-G
– To be replaced when the gLite WMS proves to be reliable
• gLite WMS: Web service (WMProxy) + Condor-G
– Management of complex workflows (DAGs) and compound jobs
▪ bulk submission and shared input sandboxes
▪ support for input files on different servers (scattered sandboxes)
– Support for shallow resubmission of jobs
– Job File Perusal: file peeking during job execution
– Supports collection of information from CEMon, BDII, R-GMA and from DLI and StorageIndex data management interfaces
– Support for parallel jobs (MPI) when the home dir is not shared
– Deployed for the first time in gLite 3.0
High Level Services: Workflows
• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
[Diagram: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE.]
• A Collection is a group of jobs with no dependencies
– basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
• Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
– Submission time reduction
▪ Single call to WMProxy server
▪ Single Authentication and Authorization process
▪ Sharing of files between jobs
– Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group
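To make the DAG and parametric-job definitions concrete, here is a short Python sketch. The dependencies between nodeA to nodeE are invented for the example (the slide only names the nodes), and the `_PARAM_` placeholder and the `expand_parametric` helper are assumptions used for illustration, not a statement of the JDL syntax.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies: each node maps to the nodes it must wait for.
dag = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}

# Any valid execution order runs a job only after all of its dependencies.
run_order = list(TopologicalSorter(dag).static_order())
print(run_order)


def expand_parametric(template, values):
    """Produce one job description per parameter value."""
    return [template.replace("_PARAM_", str(value)) for value in values]


jobs = expand_parametric('Arguments = "input_PARAM_.dat";', range(3))
print(jobs)
```

A collection corresponds to the special case where every node has an empty dependency set, so the jobs can run in any order.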
High Level Services: Job Information
• Logging and Bookkeeping service
– Tracks jobs during their lifetime (in terms of events)
– LBProxy for fast access
– L&B API and CLI to query jobs
– Support for “CE reputability ranking”: maintains recent statistics of job failures at CEs and feeds back to WMS to aid planning
• Job Provenance: stores long term job information
– Supports job rerun
– If deployed will also help unloading the L&B
– Not yet certified in gLite 3.0.
High Level Services: Job Priorities
• GPBOX: Interface to define, store and propagate fine-grained VO policies
– Based on VOMS groups and roles
– Enforcement of policies at sites: sites may accept/reject policies
– Not yet certified in gLite 3.0.
gLite process
• Process controlled by the Technical Coordination Group
• Task Forces with developers, applications, testers and deployment experts
• gLite 3.0 adopts a continuous release process:
– No more big-bang releases with fixed deadlines for all
– Develop components as requested by users and sites
– Deploy or upgrade as soon as testing is satisfactory
• Major releases synchronized with large scale activities of VOs (SCs)
– Next major release foreseen in autumn
gLite Software Process
[Diagram: gLite software process. Stages: JRA1 Development, SA3 Integration, SA3 Testing & Certification, SA1 Pre-Production, SA1 Production Infrastructure. Other labels: Directives, Bug Fixing, Software, Deployment Packages, Integration Tests, Testbed Deployment, Functional Tests, Pre-Production Deployment, Scalability Tests, Release, Installation Guide, Release Notes, etc.; Pass/Fail outcomes; Problem, Serious problem.]
Summary
• gLite 3 is being deployed on the production infrastructure
– Includes all of the well-known middleware from LCG 2.7.0
• New components deployed for the first time on the Production Infrastructure:
– Address requirements in terms of functionality and scalability
– Components deployed for the first time need extensive testing!
• Developed according to a well-defined process
– Controlled by the EGEE Technical Coordination Group
• Development is continuing to provide increased robustness, usability, and functionality
Questions?
www.glite.org