gLite, the next generation middleware for Grid computing
Oxana Smirnova (Lund/CERN)
Nordic Grid Neighborhood Meeting
Linköping, October 20, 2004
Uses material from E. Laure and F. Hemmer
gLite
• What is gLite (quoted from http://www.glite.org):
  "the next generation middleware for grid computing"
  "collaborative efforts of more than 80 people in 10 different academic and industrial research centers"
  "Part of the EGEE project (http://www.eu-egee.org)"
  "bleeding-edge, best-of-breed framework for building grid applications tapping into the power of distributed computing and storage resources across the Internet"
[Figure: EGEE Activity Areas]
• Nordic contributors: HIP, PDC, UiB
Architecture guiding principles
• Lightweight services
  Easily and quickly deployable
  Use existing services where possible as basis for re-engineering
  "Lightweight" does not mean fewer services or non-intrusiveness – it means modularity
• Interoperability
  Allow for multiple implementations
  Co-existence with deployed infrastructure: co-existence with LCG-2 and OSG (US) is essential for the EGEE Grid service
• Large-scale deployment and continuous usage
  Reduce requirements on participating sites
  Flexible service deployment: multiple services running on the same physical machine (if possible)
• Portability
  Being built on Scientific Linux and Windows
  60+ external dependencies
• Performance/Scalability & Resilience/Fault Tolerance
• Service-oriented approach
• …
Service-oriented approach
• By adopting the Open Grid Services Architecture, with components that are:
  Loosely coupled (via messages)
  Accessible across the network; modular and self-contained; with clean modes of failure
  Able to change implementation without changing interfaces
  Able to be developed in anticipation of new use cases
• Follow WSRF standardization
  No mature WSRF implementations exist to date, so start with plain WS (a plain-WS call is sketched below)
• WSRF compliance is not an immediate goal, but the WSRF evolution is followed
• WS-I compliance is important
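To make the plain-WS starting point concrete, here is a minimal sketch of a SOAP call over HTTP in Python; the endpoint URL, namespace and getVersion operation are invented placeholders, not actual gLite interfaces.

# Minimal sketch of a plain Web Services (SOAP over HTTP) call, of the kind
# used before WSRF implementations mature. Endpoint, namespace and operation
# are hypothetical placeholders.
import urllib.request

ENDPOINT = "https://example.org:8443/glite/services/JobSubmission"  # hypothetical
SOAP_BODY = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getVersion xmlns="urn:example:glite"/>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    ENDPOINT,
    data=SOAP_BODY.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "urn:example:glite#getVersion"},
)
with urllib.request.urlopen(request) as response:  # GSI/TLS handling omitted
    print(response.read().decode("utf-8"))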
gLite vs LCG-2
• Intended to replace LCG-2
• Starts with existing components
• Aims to address LCG-2 shortcomings and advanced needs from applications (in particular feedback from the data challenges, DCs)
• Prototyping with short development cycles for fast user feedback
• Initial web-services based prototypes being tested with representatives from the application groups
[Figure: evolution timeline: LCG-1, LCG-2 (Globus 2 based) → gLite-1, gLite-2 (Web services based)]
Approach
• Exploit experience and components from existing projects
  AliEn, VDT, EDG, LCG, and others
• Design team works out architecture and design
  Architecture: https://edms.cern.ch/document/476451
  Design: https://edms.cern.ch/document/487871/
  Feedback and guidance from EGEE PTF, EGEE NA4, LCG GAG, LCG Operations, LCG ARDA
• Components are initially deployed on a prototype infrastructure
  Small scale (CERN & Univ. Wisconsin)
  Get user feedback on service semantics and interfaces
• After internal integration and testing, components are to be deployed on the pre-production service
Subsystems/components
• LCG2: components; gLite: services
• User Interface (AliEn)
• Computing Element
• Worker Node
• Workload Management System
• Package Management
• Job Provenance
• Logging and Bookkeeping
• Data Management
• Information & Monitoring
• Job Monitoring
• Accounting
• Site Proxy
• Security
• Fabric management
Workload Management System
Computing Element
• Works in push or pull mode (the pull mode is sketched below)
• Site policy enforcement
• Exploit new Globus GK and Condor-C (close interaction with the Globus and Condor teams)
Legend: CEA – Computing Element Acceptance, JC – Job Controller, MON – Monitoring, LRMS – Local Resource Management System
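A minimal sketch of the pull mode, assuming a hypothetical task-queue interface and an LRMS submission hook; none of these names are actual gLite APIs.

# Toy pull-mode Computing Element: ask a central task queue for work that
# matches the site, enforce local policy, then hand the job to the LRMS.
# The TaskQueue methods, the site policy fields and submit_to_lrms are invented.
import time

SITE = {"vo_whitelist": {"atlas", "alice"}, "max_wall_time": 86400}

def policy_allows(job):
    return job["vo"] in SITE["vo_whitelist"] and job["wall_time"] <= SITE["max_wall_time"]

def pull_loop(task_queue, submit_to_lrms, poll_interval=60):
    while True:
        job = task_queue.fetch_matching(SITE)   # ask the WMS/task queue for a suitable job
        if job is None:
            time.sleep(poll_interval)           # nothing matched, poll again later
            continue
        if policy_allows(job):                  # site policy enforcement
            submit_to_lrms(job)                 # e.g. submission to PBS or LSF
        else:
            task_queue.reject(job, reason="site policy")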
Data Management
• Scheduled data transfers (treated like jobs)
• Reliable file transfer (sketched below)
• Site self-consistency
• SRM-based storage
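A minimal sketch of what reliable file transfer amounts to: retries with backoff plus an integrity check; the transfer and checksum callables are placeholders (in the prototype this role is played by GridFTP and the transfer service).

# Toy "reliable file transfer": retry with exponential backoff and verify the
# copy before declaring success. do_transfer and checksum are injected
# placeholders, not gLite interfaces.
import time

def reliable_copy(do_transfer, checksum, src, dst, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            do_transfer(src, dst)                # e.g. a GridFTP third-party copy
            if checksum(src) == checksum(dst):   # verify integrity at the destination
                return True
        except OSError:
            pass                                 # transient failure, retry below
        time.sleep(min(2 ** attempt, 300))       # backoff, capped at 5 minutes
    return False                                 # give up; leave the transfer for rescheduling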
Catalogs
• File Catalog
  Filesystem-like view on logical file names (LFNs)
  Keeps track of sites where data is stored
  Conflict resolution
• Replica Catalog
  Keeps information at a site (GUID → SURLs)
• Metadata Catalog
  Attributes of files on the logical level
  Boundary between generic middleware and application layer
[Figure: the File Catalog maps LFN → GUID and records Site IDs; per-site Replica Catalogs (Site A, Site B) map GUID → SURLs; the Metadata Catalog attaches metadata at the LFN/GUID level]
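A toy model of this catalog split, showing the LFN → GUID → SURL chain with plain dictionaries; the entries and the resolve() helper are illustrative only, not the gLite catalog interfaces.

# Toy catalogs: a global file catalog (LFN -> GUID + sites), a global metadata
# catalog (GUID -> attributes) and per-site replica catalogs (GUID -> SURLs).
file_catalog = {
    "/grid/atlas/run7/evt.root": {"guid": "guid-1234", "sites": ["SiteA", "SiteB"]},
}
metadata_catalog = {
    "guid-1234": {"run": 7, "type": "AOD"},
}
replica_catalogs = {
    "SiteA": {"guid-1234": ["srm://se.sitea.org/atlas/evt.root"]},
    "SiteB": {"guid-1234": ["srm://se.siteb.org/atlas/evt.root"]},
}

def resolve(lfn):
    """Resolve a logical file name to its GUID and all known SURLs."""
    entry = file_catalog[lfn]
    guid = entry["guid"]
    surls = [s for site in entry["sites"] for s in replica_catalogs[site].get(guid, [])]
    return guid, surls

print(resolve("/grid/atlas/run7/evt.root"))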
Information and Monitoring
• R-GMA for
  Information system and system monitoring
  Application monitoring
• No major changes in architecture, but re-engineer and harden the system
• Co-existence and interoperability with other systems is a goal
  E.g. MonALISA
[Figure: example of D0 application monitoring: job wrappers publish to Memory Primary Producers (MPP), which feed a Database Secondary Producer (DbSP)]
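A toy sketch of the producer/consumer pattern behind R-GMA, mimicking the MPP and DbSP roles from the figure; the classes below are illustrative, not the R-GMA API.

# Toy R-GMA-style flow: a job wrapper publishes tuples to an in-memory primary
# producer; a database secondary producer aggregates them so consumers can
# query with SQL. Class and method names are invented for illustration.
import sqlite3, time

class MemoryPrimaryProducer:
    def __init__(self):
        self.tuples = []
    def insert(self, job_id, status):
        self.tuples.append((job_id, status, time.time()))

class DatabaseSecondaryProducer:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS jobstatus (job TEXT, status TEXT, ts REAL)")
    def consume(self, producer):
        self.db.executemany("INSERT INTO jobstatus VALUES (?, ?, ?)", producer.tuples)
        producer.tuples.clear()
    def query(self, sql):
        return self.db.execute(sql).fetchall()

mpp = MemoryPrimaryProducer()
mpp.insert("job-42", "RUNNING")          # published by the job wrapper
dbsp = DatabaseSecondaryProducer()
dbsp.consume(mpp)                        # aggregation step
print(dbsp.query("SELECT job, status FROM jobstatus"))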
Security
[Figure: credential flow for user Joe]
1. Obtain Grid (X.509) credentials for Joe and keep them in credential storage (myProxy)
2. Pseudonymity Service (optional, tbd): "Joe → Zyx"
3. Attribute Authority (VOMS): "Issue Joe's privileges to Zyx"
4. Access "The Grid" as "User=Zyx, Issuer=Pseudo CA"
Site-side enforcement via GSI and LCAS/LCMAPS (authorization step sketched below)
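A toy sketch of the site-side authorization step (the LCAS/LCMAPS role): check the VOMS attributes carried by the proxy against site policy, then map the Grid identity to a local account; the policy contents and mapping table are invented, and this is not the LCAS/LCMAPS API.

# Toy site authorization and mapping: an LCAS-style ban-list check followed by
# an LCMAPS-style mapping of a VOMS attribute (FQAN) to a local pool account.
BANNED_DNS = {"/DC=org/DC=grid/CN=Mallory"}
VO_POOL_ACCOUNTS = {"atlas": "atlas001", "alice": "alice001"}

def authorize_and_map(proxy):
    """proxy: dict with 'dn' (or pseudonym) and 'fqans' issued by VOMS."""
    if proxy["dn"] in BANNED_DNS:                  # LCAS-style ban list
        raise PermissionError("user is banned at this site")
    for fqan in proxy["fqans"]:                    # e.g. "/atlas/Role=production"
        vo = fqan.split("/")[1]
        if vo in VO_POOL_ACCOUNTS:                 # LCMAPS-style account mapping
            return VO_POOL_ACCOUNTS[vo]
    raise PermissionError("no supported VO attribute found")

print(authorize_and_map({"dn": "/DC=org/DC=grid/CN=Zyx", "fqans": ["/atlas/Role=production"]}))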
GAS & Package Manager
• Grid Access Service (GAS)
Discovers and manages services on behalf of the user
File and metadata catalogs already integrated
• Package Manager
Provides application software at execution site
Based upon existing solutions
Details being worked out together with experiments and operations
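A toy sketch of the GAS idea: a single entry point that discovers service endpoints and acts on the user's behalf; the registry contents and method names are invented for illustration.

# Toy Grid Access Service: look up concrete service endpoints for the user and
# forward calls to them. Registry content and the file-catalog client callable
# are hypothetical.
class GridAccessService:
    def __init__(self, registry):
        self.registry = registry                 # service type -> endpoint (from discovery)

    def locate(self, service_type):
        try:
            return self.registry[service_type]
        except KeyError:
            raise LookupError("no %s service registered for this user" % service_type)

    def list_replicas(self, lfn, file_catalog_client):
        """Act on the user's behalf: resolve an LFN via the file catalog."""
        endpoint = self.locate("FileCatalog")
        return file_catalog_client(endpoint, lfn)

gas = GridAccessService({"FileCatalog": "https://fc.example.org:8443/"})   # hypothetical endpoint
print(gas.locate("FileCatalog"))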
Current Prototype
• Security: VOMS (CERN), myProxy, gridmapfile and GSI security
• CE (CERN, Wisconsin): Globus Gatekeeper, Condor-C, PBS/LSF, "Pull component" (AliEn CE)
• WMS: AliEn TaskQueue, EDG WMS, EDG L&B (CNAF)
• WN: 23 at CERN + 1 at Wisconsin
• SE (CERN, Wisconsin): external SRM implementations (dCache, Castor), gLite-I/O
• Catalogs (CERN): AliEn FileCatalog, RLS (EDG), gLite Replica Catalog
• Metadata Catalog (CERN): simple interface defined
• Data Transfer (CERN, Wisc): GridFTP
• Data Scheduling (CERN): File Transfer Service (Stork)
• Information & Monitoring (CERN, Wisc): R-GMA
• User Interface (CERN & Wisc): AliEn shell, CLIs and APIs, GAS
• Package manager: prototype based on AliEn PM
Summary, plans
• Most Grid systems (including LCG-2) are oriented towards batch-job production; gLite addresses distributed analysis
  Most likely the two will co-exist, at least for a while
• A prototype exists, and new services are being added:
  Dynamic accounts, gLite CEmon, Globus RLS, File Placement Service, Data Scheduler, fine-grained authorization, accounting…
• A Pre-Production Testbed is being set up
  More sites, tested/stable services
• First release due end of March 2005
  Functionality freeze at Christmas
  Intense integration and testing period from January to March 2005
• 2nd release candidate: November 2005
  May: revised architecture document; June: revised design document