Grids and Grid Applications

Enabling Grids for E-sciencE
gLite Overview
Jean Salzemann CNRS/IN2P3
ACGRID School,
Hanoi (Vietnam) November 5th, 2007
Credits: Charles Loomis, Mike Mineter, Giuseppe Andronico, Alex Villazon, and other EGEE colleagues…
www.eu-egee.org
INFSO-RI-508833
EGEE
• EGEE = Enabling Grids for E-sciencE
• Two projects of 2 years each: EGEE I and EGEE II
• Over 70 leading institutions in more than 40 countries, federated in regional Grids
• Currently:
  – 40,000 CPUs
  – 5 Petabytes (5 million GB) storage
  – ~200 Virtual Organizations (VOs)
Glite Overview – J. Salzemann – 05/11/2007
gLite – Grid middleware
• The Grid relies on advanced software, the middleware, which interfaces between the resources and the applications
• The Grid middleware:
  – Finds convenient places for the application to be executed
  – Optimises use of resources
  – Organises efficient access to data
  – Deals with authentication to the different sites that are used
  – Runs the job and monitors progress
  – Transfers the result back to the scientist
Glite Legacy
[Diagram: lineage of gLite. Globus and Condor feed into EDG; EDG leads to LCG; LCG, AliEn and others (...) feed into gLite.]
• EDG: Pan-European testbed. Complete, functional set of services. Significant productions demonstrated.
• LCG: Subset of EDG services. Improved robustness, scalability. Worldwide production service.
• gLite: Re-engineering: robustness, standard interfaces, expanded set of services.
“Desktop Grid” Architecture
[Diagram: the User feeds an App. DB; each Resource pulls tasks from it.]
• Peer-to-peer architecture (BOINC, XtremeWeb)
• Volatile resources.
• Limited security (client identifies server).
• Lightweight infrastructure.
• Handles limited types of resources.
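The "pull task" model above can be sketched as follows. This is a minimal illustration under stated assumptions, not BOINC or XtremeWeb code: the names `TaskServer`, `pull_task`, `report`, and `volunteer` are hypothetical.

```python
from collections import deque


class TaskServer:
    """Hypothetical application database that hands out tasks on request."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.results = {}

    def pull_task(self):
        # Resources ask for work; the server never contacts them first.
        return self.pending.popleft() if self.pending else None

    def report(self, task, result):
        self.results[task] = result


def volunteer(server, compute, max_tasks):
    """A volatile resource pulls at most max_tasks tasks, then leaves."""
    done = 0
    while done < max_tasks:
        task = server.pull_task()
        if task is None:
            break
        server.report(task, compute(task))
        done += 1
    return done


server = TaskServer(range(5))
first = volunteer(server, lambda n: n * n, max_tasks=2)    # leaves early
second = volunteer(server, lambda n: n * n, max_tasks=10)  # drains the rest
```

Because every exchange is started by the resource, a resource that disappears simply stops pulling; the remaining tasks stay in the queue for others.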
LCG Architecture
[Diagram: Resources publish descriptions to the InfoSys.; the User submits to the Resource Broker, which queries the InfoSys. and submits to a Resource.]
• Batch-like architecture.
• Stable, well-maintained resources.
• Secured via Public Key Infrastructure (PKI).
• Heavy support infrastructure.
• Can handle large range of resources.
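The publish / query / submit flow of this push model can be sketched as below. It is an illustration only: `Resource`, `InfoSystem`, and `Broker` are hypothetical names, and the ranking rule (most free CPUs wins) is an assumption, not the Resource Broker's actual policy.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    free_cpus: int
    free_disk_gb: int


class InfoSystem:
    """Holds the descriptions that resources publish."""

    def __init__(self):
        self.descriptions = {}

    def publish(self, resource):
        self.descriptions[resource.name] = resource

    def query(self, predicate):
        return [r for r in self.descriptions.values() if predicate(r)]


class Broker:
    """Matches job requirements against published descriptions."""

    def __init__(self, info_system):
        self.info_system = info_system

    def submit(self, job_name, min_cpus, min_disk_gb):
        candidates = self.info_system.query(
            lambda r: r.free_cpus >= min_cpus and r.free_disk_gb >= min_disk_gb
        )
        if not candidates:
            raise LookupError(f"no resource satisfies {job_name}")
        # Assumed ranking for the sketch: prefer the most free CPUs.
        best = max(candidates, key=lambda r: r.free_cpus)
        best.free_cpus -= min_cpus
        return best.name


info = InfoSystem()
info.publish(Resource("site-a", free_cpus=1, free_disk_gb=50))
info.publish(Resource("site-b", free_cpus=4, free_disk_gb=20))
info.publish(Resource("site-c", free_cpus=8, free_disk_gb=5))
chosen = Broker(info).submit("demo-job", min_cpus=2, min_disk_gb=10)
```

Unlike the desktop-grid model, the broker decides where the job goes and pushes it there, which is why it needs an up-to-date information system.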
Service Oriented Architecture
• Existing LCG system is largely service-oriented.
• EGEE evolving to a clean SOA:
  – standard interfaces
  – standard technologies
[Diagram: the Service publishes its description to the Registry; the Client queries the Registry for the location; Client and Service then interact directly.]
Convergence of Technologies
• Web Services
  – Clean, complete specification of service APIs.
  – Supported technology:
    ▪ Good support within commercial sector.
    ▪ Adequate support within open-source community.
  – Very active ➔ proposed standards rapidly evolving.
• EGEE Service Evolution
  – Plain web services:
    ▪ Avoid “proprietary” protocols and interfaces.
    ▪ Fairly stable, will ease further evolution.
  – Adopt WSRF and/or WS-* standards as appropriate.
• Expect user-visible changes in APIs.
Middleware face to face
LCG
• Security
  – GAS (Grid Access Service)
• Job Management
  – Condor + globus
  – CE, WN
  – Logging & Bookkeeping
• Data Management
  – LCG services
• Information & Monitoring
  – BDII
• Grid Access
  – CLI + API

gLite
• Security
  – VOMS
• Job Management
  – Condor + blahp
  – CE, WN
  – Logging & Bookkeeping
  – Job Provenance
• Data Management
  – LFC
  – AMGA
• Information & Monitoring
  – R-GMA
  – Service Discovery
• Grid Access
  – CLI + API + Web Services
gLite – Overview
• gLite
  – First release 2005 (currently gLite 3.1)
  – Next generation middleware for grid computing
  – Developed from existing components (Globus, Condor, ...)
  – Intended to replace present middleware with production quality services
  – Interoperability & co-existence with deployed infrastructure
  – Robust: performance & fault tolerance
  – Open Source license
The Grid stack
• Application layer
  – Grid programs
• Collective layer
  – Resource Co-allocation
  – Data Replica Management
• Resource layer
  – Resource Management
  – Information Services
  – Data Access
• Connectivity layer
  – Grid Security Infrastructure
  – High-performance data transfer protocols
• Fabric layer
  – the hardware: computers (parallel, clusters, ...), data storage servers
[Diagram: the Grid stack (Application, Collective, Resource, Connectivity, Fabric) shown alongside the Internet stack (Application, Transport, Internet, Link).]
gLite Grid Middleware Services
[Diagram: gLite middleware services, grouped as follows.]
• Access: CLI, API
• Security Services: Authorization, Authentication, Auditing
• Information & Monitoring Services: Information & Monitoring, Application Monitoring
• Data Management: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
• Workload Mgmt Services: Accounting, Job Provenance, Package Manager, Computing Element, Workload Management
• Connectivity
Main components
• User Interface (UI): The place where users log on to the Grid
• Resource Broker (RB): Matches the user requirements with the available resources on the Grid
• Information System: Characteristics and status of CE and SE (uses the “GLUE schema”)
• Computing Element (CE): A batch queue on a site’s computers where the user’s job is executed
• Storage Element (SE): Provides (large-scale) storage for files
Current production middleware
[Diagram: job flow in the current production middleware. Components: “User interface”, Resource Broker, Information Service, Replica Catalogue, Logging & Book-keeping, Computing Element, Storage Element, Author. & Authen. Arrow labels: Input “sandbox”, Output “sandbox”, DataSets info, Publish, Job Submit Event, Job Query, Job Status.]
gLite: Workload Management System (WMS)
• Job Management Services: services related to job management/execution
  – Computing Element
    ▪ job management (submission, control, …)
    ▪ information about characteristics and status
    ▪ actual execution is done in a Worker Node (WN)
  – Workload Management
    ▪ core component (see next slides)
  – Job Provenance
    ▪ keeps track of job definition, execution conditions, environment
    ▪ important points of the job life cycle
      • debugging, post-mortem analysis, comparison of job execution
  – Package Manager
    ▪ extension of a traditional package management system to a grid
      • automates the process of installing, upgrading, configuring and removing software packages from a shared area on a grid site
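Jobs are handed to the WMS as a description written in the Job Description Language (JDL), which these slides do not show. As a hedged sketch, the helper below renders a small job description as JDL text; `render_jdl` and `jdl_value` are hypothetical helpers, only string and list-of-string attributes are handled, and no quoting of special characters is attempted.

```python
def jdl_value(value):
    """Render a Python value as a JDL literal (strings and lists of strings only)."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Build the text of a job description from a dict of attributes."""
    lines = [f"  {key} = {jdl_value(value)};" for key, value in attributes.items()]
    return "[\n" + "\n".join(lines) + "\n]"


job = {
    "Executable": "/bin/hostname",
    "StdOutput": "std.out",
    "StdError": "std.err",
    # Files to bring back to the user in the output sandbox.
    "OutputSandbox": ["std.out", "std.err"],
}
jdl_text = render_jdl(job)
print(jdl_text)
```

Expression-valued attributes such as requirements on the target resource are deliberately left out, since they are not plain strings.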
gLite: WMS architecture
[Diagram: WMS architecture]
Information Services
• Maintains information about hardware, software, services and people participating in a Virtual Organization
  – Should scale with the Grid's growth
  – “Find a computer with at least 2 free CPUs and with 10GB of free disk space...”
• Globus MDS (Monitoring and Discovery System)
  – Hierarchical, push based (pull based)
    ▪ showed limitations
[Diagram: MDS components: GRIS and GIIS, information sources (SNMP, NWS, NIS, …), Data Model, MDS API, LDAP.]
gLite: Information System - BDII
• Berkeley Database Information Index (BDII)
  – A Monitoring and Discovery Service (MDS) evolution
  – Based on LDAP (Lightweight Directory Access Protocol)
  – Central system
    ▪ Queries servers/providers about status
    ▪ Stores the retrieved information in a database
    ▪ Provides the information following the GLUE Schema
• Commands
    ▪ lcg-infosites --vo <your_vo> all | ce | se | lfc | lfcLocal | --is <your_bdii>
[gliteui] /home/martin > lcg-infosites --vo dpsgltb all --is glitece.dps.uibk.ac.at
#CPU    Free    Total Jobs    Running    Waiting    ComputingElement
----------------------------------------------------------
2       2       0             0          0          glitece.dps.uibk.ac.at:2119/blah-pbs-dpsgltb
Avail Space(Kb)    Used Space(Kb)    Type    SEs
----------------------------------------------------------
3172384            4664832           n.a     gliteio.dps.uibk.ac.at
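Since the BDII is LDAP based, a request such as "a computing element with at least 2 free CPUs" can be written as an LDAP search filter. The sketch below only builds the filter string. The attribute names `GlueCEStateFreeCPUs` and `GlueCEAccessControlBaseRule` and the object class `GlueCE` are taken from the GLUE 1.x schema as an assumption and should be checked against the deployed schema; `clause` and `ldap_and` are hypothetical helpers.

```python
def escape_filter_value(value):
    """Escape characters that are special inside an LDAP filter value."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in str(value))


def clause(attribute, operator, value):
    """One filter item, e.g. (attr>=2)."""
    if operator not in ("=", ">=", "<="):
        raise ValueError(f"unsupported operator: {operator}")
    return f"({attribute}{operator}{escape_filter_value(value)})"


def ldap_and(*clauses):
    """Combine filter items so that all of them must match."""
    return "(&" + "".join(clauses) + ")"


ce_filter = ldap_and(
    clause("objectClass", "=", "GlueCE"),                    # assumed GLUE 1.x class
    clause("GlueCEAccessControlBaseRule", "=", "VO:dpsgltb"),  # assumed attribute
    clause("GlueCEStateFreeCPUs", ">=", 2),                   # assumed attribute
)
print(ce_filter)
```

The resulting string could be passed to any LDAP client pointed at the BDII; the host, port, and search base depend on the site and are not assumed here.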
gLite: Information System - R-GMA
• Relational Grid Monitoring Architecture (R-GMA)
  – Developed as part of the European DataGrid Project (EDG)
  – Now part of the EGEE project
  – Based on the Grid Monitoring Architecture (GMA)
• Uses a relational data model
  – There is no central repository, only a “Virtual Database”
  – Schema is a list of table definitions
    ▪ Additional tables/schema can be defined
  – Registry is a list of data producers with all their details
[Diagram: Producers (Prod) and Consumers (Cons) around a Virtual table, together with the Schema and the Registry.]
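The "virtual database" idea can be sketched as below: rows stay with their producers, the registry only records who publishes which table, and a consumer query assembles the virtual table on demand. This is not the R-GMA API; `Registry`, `Producer`, `consume`, and the `cpuLoad` table are hypothetical.

```python
# Schema: a list of table definitions (table name -> column names).
SCHEMA = {"cpuLoad": ("site", "load")}


class Registry:
    """Records which producers publish which table."""

    def __init__(self):
        self.producers = {}

    def register(self, table, producer):
        self.producers.setdefault(table, []).append(producer)


class Producer:
    """Keeps its own rows; nothing is copied to a central repository."""

    def __init__(self, registry, table):
        self.columns = SCHEMA[table]
        self.rows = []
        registry.register(table, self)

    def insert(self, **row):
        if set(row) != set(self.columns):
            raise ValueError(f"row must have exactly the columns {self.columns}")
        self.rows.append(row)


def consume(registry, table, where=lambda row: True):
    """Consumer query: the virtual table is the union of all producers' rows."""
    return [
        row
        for producer in registry.producers.get(table, [])
        for row in producer.rows
        if where(row)
    ]


registry = Registry()
site_a = Producer(registry, "cpuLoad")
site_b = Producer(registry, "cpuLoad")
site_a.insert(site="A", load=0.5)
site_b.insert(site="B", load=0.9)
busy = consume(registry, "cpuLoad", where=lambda row: row["load"] > 0.8)
```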
Resource Management
• Everything (or anything) is a resource
  – Physical or logical (single computer, cluster, parallel, data storage, an application...)
  – Defined in terms of interfaces, not devices
• Each site must be autonomous (local system administration policy)
• Grid Resource Allocation Manager (GRAM)
  – Defines resource layer protocols and APIs that enable clients to securely instantiate a Grid computational task (i.e. a job)
  – Secure remote job submissions
  – Relies on local resource management interfaces
[Diagram: GRAM sitting on top of the local resource managers LL, LSF, PBS and SGE.]
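"Relies on local resource management interfaces" means one grid-facing interface is translated into whatever the site's batch system expects. The sketch below shows the idea with the usual submit commands of the four systems in the diagram (LL is taken to mean LoadLeveler). It is a simplification and not how GRAM is implemented; `local_submit_command` is a hypothetical helper that only builds the command line and never runs it.

```python
# Usual submit command for each local resource manager.
SUBMIT_COMMANDS = {
    "PBS": "qsub",
    "SGE": "qsub",
    "LSF": "bsub",
    "LL": "llsubmit",
}


def local_submit_command(batch_system, script_path):
    """Translate a generic 'submit this script' request for one site."""
    try:
        command = SUBMIT_COMMANDS[batch_system]
    except KeyError:
        raise ValueError(f"unsupported batch system: {batch_system}") from None
    return [command, script_path]


# The same request, expressed for two differently configured sites.
pbs_site = local_submit_command("PBS", "job.sh")
lsf_site = local_submit_command("LSF", "job.sh")
```

Keeping the translation at the site is what lets each site stay autonomous: the grid layer never needs to know which batch system is installed.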
Data Management: Protocols
• Data access and transfer
  – Simple, automatic multi-protocol file transfer tools, integrated with the Resource Management service
    ▪ Move data from/to local machine to remote machine, where the job is executed (stage-in, stage-out)
    ▪ Redirect stdin to a remote location
    ▪ Redirect stdout and stderr to the local computer
    ▪ Pull executable from a remote location
  – To have a secure, high-performance, reliable file transfer over modern WANs: GridFTP
gLite: Data management - Services
• Catalog
  – File and Replica Catalog
  – File Authorization Service
  – Metadata catalog
  – Distribution of catalogs, conflicts resolution
• Storage Elements (SE)
  – SRM (Storage Resource Manager) interface
  – Transfer protocols (gsiftp, rfio, …)
[Diagram: one Catalog connected to several SEs.]
File management in gLite
• Files are write-once, read-many
  – If users edit files then they manage the consequences!
• Middleware supporting
  – Replica files
    ▪ to be close to where you want computation
    ▪ for resilience
  – Logical filenames
  – Catalogue: maps logical name to physical storage device/file
  – Virtual filesystems, POSIX-like I/O
• Services provided:
  – storage
  – transfer
  – catalogue that maps logical filenames to replicas.
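The mapping from a logical filename to its replicas can be sketched as below. This is a toy in-memory catalogue, not the LFC API; `ReplicaCatalogue`, the `lfn:` name, and the `example.org` storage elements are all hypothetical.

```python
class ReplicaCatalogue:
    """Maps a logical file name to the physical copies of that file."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, storage_element, path):
        self._replicas.setdefault(lfn, []).append((storage_element, path))

    def replicas(self, lfn):
        return list(self._replicas.get(lfn, []))

    def closest(self, lfn, preferred_se):
        """Prefer the copy on the SE near the computation, else any copy."""
        candidates = self.replicas(lfn)
        if not candidates:
            raise KeyError(lfn)
        for storage_element, path in candidates:
            if storage_element == preferred_se:
                return storage_element, path
        return candidates[0]


catalogue = ReplicaCatalogue()
lfn = "lfn:/grid/demo/data.txt"
catalogue.register(lfn, "se1.example.org", "/storage/a/data.txt")
catalogue.register(lfn, "se2.example.org", "/storage/b/data.txt")
near = catalogue.closest(lfn, preferred_se="se2.example.org")
fallback = catalogue.closest(lfn, preferred_se="se9.example.org")
```

Because files are write-once, the catalogue never has to reconcile diverging copies; any registered replica is as good as another.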
Security
• Basic security:
  – Authentication: Who are we on the Grid?
  – Authorization: Do we have access to a resource/service?
  – Protection: Data integrity and confidentiality
• But there are thousands of resources over different administration domains:
  – Single sign-on, i.e. give a password once, and be able to access all resources (to which we have access)
• Grid Security Infrastructure (GSI):
  – Grid credentials: digital certificate and private key
    ▪ Based on Public Key Infrastructure (PKI), X.509 standard
    ▪ Certification Authority (CA) signs certificates. Trust relationship
  – Proxy certificates: temporary certificates signed by the user rather than by the CA, allowing single sign-on and proxy delegation
[Diagram: the CA signs the User certificate; the User signs a Proxy; a Proxy can sign a further Proxy, and so on.]
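The CA → User → Proxy → Proxy chain can be checked link by link. The sketch below models only the bookkeeping (who issued what, and expiry); it performs no cryptographic signature verification and is not the GSI implementation. `Cert`, `chain_is_valid`, the names in the example, and the rule that a proxy may not outlive its issuer are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_after: int  # expiry time, a plain number in this sketch


def chain_is_valid(chain, trusted_ca, now):
    """chain[0] is the user certificate, followed by proxies in signing order."""
    if not chain or chain[0].issuer != trusted_ca:
        return False
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False  # broken link: not issued by the previous entry
        if child.not_after > parent.not_after:
            return False  # assumed rule: a proxy may not outlive its issuer
    return all(cert.not_after > now for cert in chain)


user = Cert("/CN=Bob", "/CN=Demo CA", not_after=1000)
proxy = Cert("/CN=Bob/CN=proxy", "/CN=Bob", not_after=100)
delegated = Cert("/CN=Bob/CN=proxy/CN=proxy", "/CN=Bob/CN=proxy", not_after=50)

ok_now = chain_is_valid([user, proxy, delegated], "/CN=Demo CA", now=10)
ok_later = chain_is_valid([user, proxy, delegated], "/CN=Demo CA", now=60)
```

The short proxy lifetime is what makes delegation tolerable: a stolen proxy stops working once it expires, while the long-lived user key never leaves the user.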
Conventional grid security
[Diagram: Bob sends a Cert request to the Certification Authority (CA) and receives Bob's Grid certificate. On the User Interface (UI), grid-proxy-init provides single sign-on and delegation through a proxy certificate towards Grid resources (A) and Grid resources (B).]
• Sysadmin A:
  – Create user “grid01”
  – Map Bob's certificate to “grid01”
• Sysadmin B:
  – Create user “user001”
  – Map Bob's certificate to “user001”
• Manual user “mapping”
• No info about VOs
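The manual mapping is typically kept in a grid-mapfile, one line per user: the quoted certificate subject followed by the local account. The sketch below parses the simplest form of such a file; the subject names are made up, and variants such as several accounts per line are not handled.

```python
import shlex


def parse_grid_mapfile(text):
    """Return {certificate subject: local account} from grid-mapfile text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # handles the quoted subject with spaces
        if len(parts) != 2:
            raise ValueError(f"unexpected grid-mapfile line: {line!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


# Hypothetical mapfile as maintained by hand by Sysadmin A.
SITE_A = '''
# one entry per user, added manually
"/C=AT/O=Example/CN=Bob Example" grid01
"/C=AT/O=Example/CN=Alice Example" grid02
'''
site_a_map = parse_grid_mapfile(SITE_A)
```

Every new user means another hand-edited line at every site, and nothing in the file says which VO the user belongs to.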
gLite – Enhanced security in gLite
[Diagram: Bob sends a Cert request to the Certification Authority (CA) and receives Bob's Grid certificate. Bob sends a VO membership request to the VO Service (VO Database, administered by the VO Manager). On the User Interface (UI), voms-proxy-init creates the proxy. Grid resources (A) and Grid resources (B) each map Bob automatically onto an account from their VO Account Pool.]
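Automatic mapping onto a VO account pool can be sketched as a lease table: the first time a member of the VO arrives, a free pool account is assigned, and later requests reuse it. `AccountPool`, the account naming pattern, and the subject names are hypothetical.

```python
class AccountPool:
    """Leases pre-created local accounts to members of one VO."""

    def __init__(self, vo, size):
        self.free = [f"{vo}{i:03d}" for i in range(1, size + 1)]
        self.leases = {}

    def map_user(self, subject):
        # Reuse the existing lease so a user keeps the same account.
        if subject not in self.leases:
            if not self.free:
                raise RuntimeError("account pool exhausted")
            self.leases[subject] = self.free.pop(0)
        return self.leases[subject]


pool = AccountPool("dpsgltb", size=2)
bob_first = pool.map_user("/CN=Bob")
bob_again = pool.map_user("/CN=Bob")
alice = pool.map_user("/CN=Alice")
```

The site administrator creates the pool once per VO instead of one account per user.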
gLite: VOMS
• Virtual Organization Membership Service (VOMS)
  – EGEE/gLite enhancement for VO management
  – Provides information on user's relationship with Virtual Organization (VO)
    ▪ Membership
    ▪ Group membership
    ▪ Roles of user
  – Multiple VO
    ▪ User can register to multiple VOs and create an aggregate proxy
    ▪ Access resources in every registered VO
  – Backward compatibility
    ▪ Extra VO related information in user's proxy certificate
    ▪ User's proxy can still be used with non VOMS-aware services
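The VO, group, and role information is usually written as a path-like attribute of the form /vo/group/Role=role (a Fully Qualified Attribute Name). The sketch below splits such a string; `parse_fqan` is a hypothetical helper, the example VO names are made up, and a role of `NULL` is treated as "no role".

```python
def parse_fqan(fqan):
    """Split '/vo/group/.../Role=r/Capability=c' into its parts."""
    if not fqan.startswith("/"):
        raise ValueError(f"not an attribute path: {fqan!r}")
    groups = []
    attributes = {}
    for part in fqan.split("/"):
        if not part:
            continue
        if "=" in part:
            key, value = part.split("=", 1)
            attributes[key] = value
        else:
            groups.append(part)
    if not groups:
        raise ValueError(f"no VO in: {fqan!r}")
    role = attributes.get("Role")
    return {
        "vo": groups[0],
        "group": "/" + "/".join(groups),
        "role": None if role in (None, "NULL") else role,
    }


# Attributes as they might appear in an aggregate proxy for two VOs.
proxy_attributes = [
    "/biomed/analysis/Role=production/Capability=NULL",
    "/dpsgltb/Role=NULL/Capability=NULL",
]
memberships = [parse_fqan(a) for a in proxy_attributes]
```

A service that is not VOMS-aware simply ignores these attributes and falls back to the certificate subject, which is what keeps the proxy backward compatible.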
gLite: VOMS - Web interface
• Requires a valid certificate from a recognized CA imported in the browser
• VO user can
  – Query membership details
  – Register himself in the VO
    ▪ Needs a valid certificate
  – Track his requests
• VO manager can
  – Handle requests from users
  – Administer the VO
• Everybody can
  – Get information about the VO
QUESTIONS ??