Lessons learned from Data Management in the EU DataGrid

Peter Kunszt
CERN IT/DB
EU DataGrid Data Management
P.Kunszt
Openlab 17.3.2003
1
Outline
• The EU DataGrid
• Data Management Architecture
• Mechanisms used, conclusions and requests
EDG overview: goals
• DataGrid is a project funded by the European Union. Its objective is to build and exploit the next-generation computing infrastructure, providing intensive computation and analysis of shared large-scale databases.
• Enable data-intensive sciences by providing worldwide Grid testbeds to large distributed scientific organisations ("Virtual Organisations", VOs)
• Start (kick-off): Jan 1, 2001; end: Dec 31, 2003
• Applications / end-user communities: HEP, Earth Observation, Biology
• Specific project objectives:
– Middleware for fabric & grid management
– Large-scale testbed
– Production-quality demonstrations
– Contribute to open standards and international bodies (GGF, Industry & Research Forum)
EDG overview: Main Partners
• CERN – International (Switzerland/France)
• CNRS – France
• ESA/ESRIN – International (Italy)
• INFN – Italy
• NIKHEF – The Netherlands
• PPARC – UK
EDG overview: Assistant Partners
Industrial Partners
• Datamat (Italy)
• IBM-UK (UK)
• CS-SI (France)
Research and Academic Institutes
• CESNET (Czech Republic)
• Commissariat à l'énergie atomique (CEA) – France
• Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
• Consiglio Nazionale delle Ricerche (Italy)
• Helsinki Institute of Physics – Finland
• Institut de Fisica d'Altes Energies (IFAE) – Spain
• Istituto Trentino di Cultura (IRST) – Italy
• Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany
• Royal Netherlands Meteorological Institute (KNMI)
• Ruprecht-Karls-Universität Heidelberg – Germany
• Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands
• Swedish Research Council – Sweden
EDG overview: structure, work packages
• The EDG collaboration is structured in 12 Work Packages
– WP1: Work Load Management System
– WP2: Data Management
– WP3: Grid Monitoring / Grid Information Systems
– WP4: Fabric Management
– WP5: Storage Element
– WP6: Testbed and demonstrators
– WP7: Network Monitoring
– WP8: High Energy Physics Applications
– WP9: Earth Observation
– WP10: Biology
– WP11: Dissemination
– WP12: Management
• WP8 to WP10 are the application work packages.
Grid Data Management Dependencies
[Figure: dependency diagram. Labels shown: Media, Hardware, Operating System, Local File System, Network Software, Protocols, Storage System; Performance, Reliability, Availability, Usability.]
EDG Architecture v1.x
[Figure: architecture diagram. Components shown: Replica Catalog, User Interface, GDMP, and several sites, each with a Computing Element, worker nodes (WN), edg-replica-manager and a GridFTP server. Storage access labels shown: NFS, RFIO, Staging daemon, Castor.]
I/O and Storage
• Modes of I/O on the EDG testbed
– NFS mounted
– RFIO (Castor)
– GridFTP
• Mass Storage
– Castor
I/O and Storage
• Conclusions / shortcomings
– NFSv2 on Linux not really suitable
• Does not scale to large Fabrics
• Cannot access remote files
• No proper security mapping
– RFIO needs work
• Security and user control are not suitable for Grid users
• Remote I/O also has security issues
• Not standard, i.e. needs to be specially deployed at Grid sites
– GridFTP
• Buggy protocol
• Compatibility issues between versions
• Only one implementation
I/O and Storage
Request for
• I/O level
– fine grained access control lists
– a wide variety of protocols
– a wide variety of authentication, authorization and
policy layers
• Storage management level
– data pinning and lifetime management
– space reservation capabilities
– transparent mass storage bindings
– inter-storage copy and communication
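To make the first two storage-management requests concrete, here is a minimal sketch of what pinning with a lifetime and up-front space reservation could look like. It is a toy in-memory model, not an EDG or Castor interface: the class `StorageElement` and all of its method names are hypothetical and chosen only for illustration.

```python
import time


class StorageElement:
    """Toy storage element with space reservation and time-limited pins.

    Hypothetical illustration only; not an interface from the slides.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.reserved = 0   # bytes promised to callers and not yet released
        self.pins = {}      # file name -> pin expiry time in seconds

    def reserve_space(self, nbytes):
        """Reserve space up front so a later transfer cannot run out of room."""
        if self.reserved + nbytes > self.capacity:
            return False
        self.reserved += nbytes
        return True

    def release_space(self, nbytes):
        """Give back a reservation; never drops below zero."""
        self.reserved = max(0, self.reserved - nbytes)

    def pin(self, name, lifetime_s, now=None):
        """Keep a file on disk for lifetime_s seconds; a pin can only be extended."""
        now = time.time() if now is None else now
        expiry = now + lifetime_s
        self.pins[name] = max(expiry, self.pins.get(name, 0.0))
        return self.pins[name]

    def is_pinned(self, name, now=None):
        """A file is pinned while its expiry time lies in the future."""
        now = time.time() if now is None else now
        return self.pins.get(name, 0.0) > now

    def evictable(self, names, now=None):
        """Files the storage system may migrate or delete: those without a live pin."""
        return [n for n in names if not self.is_pinned(n, now)]
```

The `now` parameter exists only so the behaviour can be checked without waiting for real time to pass. Expired pins simply stop protecting a file, which is one possible reading of "lifetime management"; a real system would also need to garbage-collect them.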
Catalogs
Replica Catalog: Storing logical to physical name mappings
• EDG used the Globus Replica-Catalog:
– LDAP-based
– Single point of access
– Logical Name scheme is bound to Physical Name
• Conclusions
– Such a solution does not scale for file catalogs, i.e. LDAP-based solutions are not suitable
– Users did not like the Logical Name being restricted
• Request for
– Fine grained access control of catalog data
– Consistency checking in catalog
– Pre-registration
• RLS, RMC to address most issues
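The idea of a replica catalog, a mapping from one logical name to many physical copies, together with the requested consistency check, can be sketched as follows. This is a minimal in-memory illustration, not the Globus Replica Catalog or the RLS interface. The class `ReplicaCatalog`, its methods, and the example names and URLs in the usage note are all assumptions made for the example.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog: one logical file name maps to many physical replicas.

    Hypothetical illustration; logical names are free-form strings and are
    deliberately not derived from the physical names.
    """

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that physical file pfn is a replica of logical file lfn."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        """Remove one replica; drop the logical name when no replica is left."""
        self._replicas[lfn].discard(pfn)
        if not self._replicas[lfn]:
            del self._replicas[lfn]

    def lookup(self, lfn):
        """Return all known replicas of lfn, sorted for stable output."""
        return sorted(self._replicas.get(lfn, ()))

    def dangling(self, existing_pfns):
        """Consistency check: catalog entries whose physical file is missing."""
        return sorted(
            (lfn, pfn)
            for lfn, pfns in self._replicas.items()
            for pfn in pfns
            if pfn not in existing_pfns
        )
```

A usage example: after registering `gsiftp://se1.example.org/data/f1` and `gsiftp://se2.example.org/store/f1` under the logical name `lfn:higgs/run1`, `lookup` returns both replicas, and `dangling` reports any entry whose physical file is absent from the set of files the storage sites actually hold.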
Catalogs
Information Services: storing service status information
• EDG used Globus-MDS (Meta-computing Directory Service)
– distributed LDAP with a given schema
– local information services and global indices
• Conclusions
– too many synchronization problems
– not scalable enough
– insufficient caching mechanisms
• R-GMA to address most issues
• Request for
– Robust up-to-date information service in general
– Management layer (schema evolution, ACL)
– Different capabilities for different kinds of information (location info, archived info, statistics, tickers)
More Lessons Learned: Manageability
• Virtual Organization management:
– The user base of a VO was managed in EDG
through a single LDAP catalog.
– VO membership needs to be properly
exposed/interpreted by all services, applying VO
and site policies
– The administration of the VO catalog needs to be simplified and made easier to use.
• VOMS = Virtual Organization Membership Service
– Intended to address most of these issues; first version to be deployed this year
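The requirement that VO membership be interpreted by every service under both VO and site policies can be illustrated with a small sketch. It is not the VOMS interface: the function `authorize`, its parameters and the dictionary layout are hypothetical, and the example assumes a user is identified by a certificate subject string.

```python
def authorize(subject, action, vo_members, vo_policy, site_policy):
    """Allow an action only if a VO of the user and the local site both permit it.

    Hypothetical sketch:
      vo_members:  VO name -> set of member subjects
      vo_policy:   VO name -> set of actions the VO grants its members
      site_policy: VO name -> set of actions the site accepts from that VO
    """
    # Membership alone is not enough; it only selects which policies apply.
    user_vos = {vo for vo, members in vo_members.items() if subject in members}
    for vo in user_vos:
        vo_allows = action in vo_policy.get(vo, set())
        site_allows = action in site_policy.get(vo, set())
        if vo_allows and site_allows:
            return True
    return False
```

The design choice shown here is that the site can only narrow what a VO grants, never widen it, which is one way to accommodate differing local policies.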
More Lessons Learned: Security
• Security Infrastructure is a hard problem
– Was not properly tackled by EDG.
– GSI is a means to authenticate but not to authorize
• Issues
– Delegation of rights to services
– Service certificates
– Automatic renewal of certificates
– Kerberos tickets from certificates
– User’s private keys
• VOMS to address some issues, extended capabilities of services – new EDG security design
More Lessons Learned: Generic issues
• No clear concept in the Grid community of how to deal with data access and storage
• Grid database bindings are not well tested yet.
• There is a clear drive for common interfaces between
components – Open Grid Service Architecture effort.
• Service discovery and service monitoring architecture
not well defined yet.
And let’s not forget: Organization and Sociology
• The Grid is an inherently distributed environment
also in this sense.
• Reaching agreements is hard work.
– Common Timeline
– Common Interfaces
– Common Procedures
– Common Policies
– Definition of supported hardware
– Common User support structure
• Forcing solutions down people’s throats does not work.
• Diversity of local policies is too large, but needs to be accommodated.
Summary and Outlook
• Our first experimental phase of Grids is almost over
• We now have some experience with trying to run a
production Grid and know its biggest deficiencies
• Components causing the most trouble are being
replaced now.
• New experience will be gained with LCG-1 in the
second half of this year.
• There are a lot of opportunities for openlab!