gLite, the next generation middleware
for Grid computing
Oxana Smirnova (Lund/CERN)
Nordic Grid Neighborhood Meeting
Linköping, October 20, 2004
Uses material from E.Laure and F.Hemmer
gLite
• What is gLite (quoted from http://www.glite.org):
 “the next generation middleware for grid computing”
 “collaborative efforts of more than 80 people in 10 different academic and industrial research centers”
 “Part of the EGEE project (http://www.eu-egee.org)”
 “bleeding-edge, best-of-breed framework for building grid applications tapping into the power of distributed computing and storage resources across the Internet”
• (diagram: EGEE Activity Areas)
• Nordic contributors: HIP, PDC, UiB
Architecture guiding principles
• Lightweight services
 Easily and quickly deployable
 Use existing services where possible as basis for re-engineering
 “Lightweight” does not mean fewer services or non-intrusiveness: it means modularity
• Interoperability
 Allow for multiple implementations
• Portability
 Being built on Scientific Linux and Windows
 60+ external dependencies
• Performance/Scalability & Resilience/Fault Tolerance
• Large-scale deployment and continuous usage
 Reduce requirements on participating sites
 Flexible service deployment
 Multiple services running on the same physical machine (if possible)
• Co-existence with deployed infrastructure
 Co-existence with LCG-2 and OSG (US) is essential for the EGEE Grid service
• Service-oriented approach
 …
Service-oriented approach
• By adopting the Open Grid Services Architecture, with components that are:
 Loosely coupled (messages)
 Accessible across the network; modular and self-contained; clean modes of failure
 Able to change implementation without changing interfaces
 Able to be developed in anticipation of new use cases
• Follow WSRF standardization
 No mature WSRF implementations exist to date, so start with plain WS
• WSRF compliance is not an immediate goal, but the WSRF evolution is followed
• WS-I compliance is important
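To make the “change implementation without changing interfaces” point concrete, here is a minimal Python sketch (not actual gLite code; ReplicaLookup and ServiceError are made-up names) of a service contract with a swappable back end and a clean failure mode:

```python
from abc import ABC, abstractmethod

class ServiceError(Exception):
    """Clean, well-defined failure mode exposed to clients."""

class ReplicaLookup(ABC):
    """Stable interface: clients depend only on this contract."""
    @abstractmethod
    def locate(self, guid: str) -> list[str]:
        """Return the storage URLs holding the file with this GUID."""

class InMemoryReplicaLookup(ReplicaLookup):
    """One implementation; a web-service-backed one could replace it
    without any change to client code."""
    def __init__(self, replicas: dict[str, list[str]]):
        self._replicas = replicas

    def locate(self, guid: str) -> list[str]:
        try:
            return self._replicas[guid]
        except KeyError:
            raise ServiceError(f"unknown GUID: {guid}")

# Client code is written against the interface, not the implementation.
catalog: ReplicaLookup = InMemoryReplicaLookup(
    {"guid-42": ["srm://se.example.org/data/f1"]})
print(catalog.locate("guid-42"))
```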
gLite vs LCG-2
• Intended to replace LCG-2
• Starts with existing components
• Aims to address LCG-2 shortcomings and advanced needs from applications (in particular, feedback from the Data Challenges)
• Prototyping with short development cycles for fast user feedback
• Initial web-services based prototypes being tested with representatives from the application groups
(timeline: LCG-1 and LCG-2 are Globus 2 based; gLite-1 and gLite-2 are Web services based)
Approach
• Exploit experience and components from existing projects
 AliEn, VDT, EDG, LCG, and others
• Design team works out architecture and design
 Architecture: https://edms.cern.ch/document/476451
 Design: https://edms.cern.ch/document/487871/
 Feedback and guidance from EGEE PTF, EGEE NA4, LCG GAG, LCG Operations, LCG ARDA
• Components are initially deployed on a prototype infrastructure
 Small scale (CERN & Univ. Wisconsin)
 Get user feedback on service semantics and interfaces
• After internal integration and testing, components are deployed on the pre-production service
Subsystems/components
(diagram contrasting LCG-2, built from components, with gLite, built from services, AliEn among the sources:)
• User Interface
• Computing Element
• Worker Node
• Workload Management System
• Package Management
• Job Provenance
• Logging and Bookkeeping
• Data Management
• Information & Monitoring
• Job Monitoring
• Accounting
• Site Proxy
• Security
• Fabric management
Workload Management System
(architecture diagram)
Computing Element
• Works in push or pull mode (see the sketch below)
• Site policy enforcement
• Exploits the new Globus GK and Condor-C (close interaction with the Globus and Condor teams)
(diagram legend: CEA = Computing Element Acceptance; JC = Job Controller; MON = Monitoring; LRMS = Local Resource Management System)
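A rough sketch of the push vs. pull distinction (plain Python with hypothetical names, not the actual CE code): in push mode the WMS sends a job to a chosen CE, which accepts or rejects it; in pull mode the CE itself asks a task queue for work when it has free slots.

```python
import queue

task_queue = queue.Queue()  # stands in for the WMS/AliEn task queue

class ComputingElement:
    def __init__(self, name: str, slots: int):
        self.name, self.free_slots = name, slots

    def accept(self, job: str) -> bool:    # push mode: the WMS decides
        """CEA-style acceptance check, then hand-off to the LRMS."""
        if self.free_slots == 0:
            return False                   # site policy / no capacity
        self.free_slots -= 1
        print(f"{self.name}: running {job} via LRMS")
        return True

    def pull(self) -> None:                # pull mode: the CE decides
        """Fetch work from the task queue while slots are free."""
        while self.free_slots > 0 and not task_queue.empty():
            self.accept(task_queue.get())

# Push: the WMS matches a job to this CE and submits directly.
ce = ComputingElement("ce01.example.org", slots=2)
ce.accept("job-1")

# Pull: jobs wait in the queue until a CE asks for them.
task_queue.put("job-2")
ce.pull()
```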
Data Management
• Scheduled data transfers (like jobs)
• Reliable file transfer (see the sketch below)
• Site self-consistency
• SRM based storage
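“Reliable file transfer” here means transfers that survive transient failures rather than being fire-and-forget. A minimal sketch of the idea (the real service is built on GridFTP/SRM; transfer_once below is a hypothetical stand-in) is a transfer wrapped in retries with exponential backoff:

```python
import random
import time

def transfer_once(src: str, dst: str) -> None:
    """Hypothetical single-attempt copy that can fail transiently."""
    if random.random() < 0.5:
        raise IOError(f"transient failure copying {src} -> {dst}")

def reliable_transfer(src: str, dst: str, retries: int = 5) -> None:
    """Retry with exponential backoff until the transfer succeeds."""
    for attempt in range(retries):
        try:
            transfer_once(src, dst)
            print(f"copied {src} -> {dst}")
            return
        except IOError as err:
            wait = 2 ** attempt
            print(f"{err}; retrying in {wait}s")
            time.sleep(wait)
    raise IOError(f"giving up on {src} after {retries} attempts")

reliable_transfer("srm://se-a.example.org/f1", "srm://se-b.example.org/f1")
```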
Catalogs
• File Catalog
 Filesystem-like view on logical file names
 Keeps track of sites where data is stored
 Conflict resolution
• Replica Catalog
 Keeps information at a site
• Metadata Catalog
 Attributes of files on the logical level
 Boundary between generic middleware and application layer
(diagram: the File Catalog maps LFNs and metadata, via GUIDs, to Site IDs; the Replica Catalogs at Site A and Site B map GUIDs to SURLs; a worked sketch follows below)
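The two-level split can be illustrated with a small Python sketch (hypothetical data, not the gLite schema): the File Catalog maps an LFN to a GUID plus the sites holding it, each site’s Replica Catalog maps that GUID to concrete SURLs, and the Metadata Catalog attaches attributes at the logical level.

```python
# File Catalog: logical file name -> GUID -> sites holding the data
file_catalog = {
    "/grid/dteam/higgs/run1.root": {
        "guid": "guid-001",
        "sites": ["siteA", "siteB"],
    },
}

# One Replica Catalog per site: GUID -> SURLs at that site
replica_catalogs = {
    "siteA": {"guid-001": ["srm://se.siteA.org/data/run1.root"]},
    "siteB": {"guid-001": ["srm://se.siteB.org/disk7/run1.root"]},
}

# Metadata Catalog: attributes attached at the logical (GUID) level
metadata_catalog = {"guid-001": {"run": 1, "type": "AOD"}}

def resolve(lfn: str) -> list[str]:
    """LFN -> GUID -> per-site SURLs, following the two-level design."""
    entry = file_catalog[lfn]
    guid = entry["guid"]
    surls = []
    for site in entry["sites"]:
        surls.extend(replica_catalogs[site].get(guid, []))
    return surls

print(resolve("/grid/dteam/higgs/run1.root"))
```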
Information and Monitoring
• R-GMA for
 Information system and system monitoring
 Application monitoring
• No major changes in architecture
 But re-engineer and harden the system
• Co-existence and interoperability with other systems is a goal
 E.g. MonALISA
(figure: D0 application monitoring as an example; job wrappers publish through Memory Primary Producers (MPP) to a Database Secondary Producer (DbSP); a sketch follows below)
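The MPP/DbSP split in the figure can be sketched as follows (hypothetical API, not the real R-GMA one): each job wrapper publishes monitoring tuples through an in-memory primary producer, and a database secondary producer consumes and archives them for later queries.

```python
class MemoryPrimaryProducer:
    """MPP: buffers tuples published by one job wrapper."""
    def __init__(self):
        self.tuples = []

    def insert(self, row: dict) -> None:
        self.tuples.append(row)

class DatabaseSecondaryProducer:
    """DbSP: aggregates tuples from many MPPs into one queryable store."""
    def __init__(self):
        self.table = []

    def consume(self, producer: MemoryPrimaryProducer) -> None:
        self.table.extend(producer.tuples)
        producer.tuples.clear()

    def query(self, **where):
        return [r for r in self.table
                if all(r.get(k) == v for k, v in where.items())]

# Two job wrappers publish; the DbSP archives and answers queries.
mpp1, mpp2 = MemoryPrimaryProducer(), MemoryPrimaryProducer()
mpp1.insert({"job": "j1", "event": "started"})
mpp2.insert({"job": "j2", "event": "failed"})
dbsp = DatabaseSecondaryProducer()
dbsp.consume(mpp1)
dbsp.consume(mpp2)
print(dbsp.query(event="failed"))
```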
Security
(diagram of the credential flow:)
1. Obtain Grid (X.509) credentials for Joe
2. Credential storage in myProxy
3. Pseudonymity Service (optional, tbd): “Joe → Zyx”
4. Attribute Authority (VOMS): “Issue Joe’s privileges to Zyx”
“The Grid” then sees “User=Zyx, Issuer=Pseudo CA”; site access goes through GSI and LCAS/LCMAPS (a sketch of the flow follows below)
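A condensed sketch of the four steps above, in plain Python with made-up record types (the real chain uses X.509 proxy certificates, myProxy, and VOMS attribute certificates):

```python
from dataclasses import dataclass

@dataclass
class Credential:
    subject: str             # identity the Grid sees
    issuer: str              # CA (or Pseudo CA) that signed it
    attributes: tuple = ()   # VOMS-style privileges

def obtain_credentials(user: str) -> Credential:
    return Credential(subject=user, issuer="Grid CA")        # step 1

def store_in_myproxy(cred: Credential, store: dict) -> None:
    store[cred.subject] = cred                               # step 2

def pseudonymize(cred: Credential) -> Credential:
    # step 3 (optional): "Joe -> Zyx", re-issued by a Pseudo CA
    return Credential(subject="Zyx", issuer="Pseudo CA")

def add_voms_attributes(cred: Credential, user: str) -> Credential:
    # step 4: the Attribute Authority issues Joe's privileges to Zyx
    return Credential(cred.subject, cred.issuer,
                      attributes=(f"privileges-of:{user}",))

store: dict = {}
joe = obtain_credentials("Joe")
store_in_myproxy(joe, store)
grid_cred = add_voms_attributes(pseudonymize(joe), "Joe")
print(grid_cred)  # the Grid sees User=Zyx, Issuer=Pseudo CA
```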
GAS & Package Manager
• Grid Access Service (GAS)
 Discovers and manages services on behalf of the user (see the sketch below)
 File and metadata catalogs already integrated
• Package Manager
 Provides application software at the execution site
 Based upon existing solutions
 Details being worked out together with the experiments and operations
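A toy sketch of what “discovers and manages services on behalf of the user” amounts to (hypothetical registry and names, not the GAS interface): the GAS resolves service endpoints behind the scenes so the user deals with a single entry point.

```python
class GridAccessService:
    """Single user entry point that locates services behind the scenes."""
    def __init__(self, registry: dict):
        self._registry = registry  # service name -> endpoint URL

    def endpoint(self, service: str) -> str:
        try:
            return self._registry[service]
        except KeyError:
            raise LookupError(f"no such service registered: {service}")

gas = GridAccessService({
    "file-catalog": "https://fc.example.org:8443",
    "metadata-catalog": "https://mc.example.org:8443",
})
print(gas.endpoint("file-catalog"))
```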
Current Prototype
• WMS
 AliEn TaskQueue, EDG WMS, EDG L&B (CNAF)
• CE (CERN, Wisconsin)
 Globus Gatekeeper, Condor-C, PBS/LSF, “Pull component” (AliEn CE)
• WN
 23 at CERN + 1 at Wisconsin
• SE (CERN, Wisconsin)
 External SRM implementations (dCache, Castor), gLite-I/O
• Data Transfer (CERN, Wisc)
 GridFTP
• Data Scheduling (CERN)
 File Transfer Service (Stork)
• Catalogs (CERN)
 AliEn FileCatalog, RLS (EDG), gLite Replica Catalog
• Metadata Catalog (CERN)
 Simple interface defined
• Information & Monitoring (CERN, Wisc)
 R-GMA
• Security
 VOMS (CERN), myProxy, gridmapfile and GSI security
• User Interface (CERN & Wisc)
 AliEn shell, CLIs and APIs, GAS
• Package manager
 Prototype based on AliEn PM
Summary, plans
• Most Grid systems (including LCG-2) are batch-job production oriented; gLite addresses distributed analysis
 Most likely they will co-exist, at least for a while
• A prototype exists, and new services are being added:
 Dynamic accounts, gLite CEMon, Globus RLS, File Placement Service, Data Scheduler, fine-grained authorization, accounting…
• A Pre-Production Testbed is being set up
 More sites, tested/stable services
• First release due end of March 2005
 Functionality freeze at Christmas
 Intense integration and testing period from January to March 2005
• 2nd release candidate: November 2005
 May: revised architecture doc; June: revised design doc