Report on CHEP 2007
Raja Nandakumar
Synopsis
Two classes of talks and posters
➟ Computer hardware
  ▓ Dominated by cooling / power consumption
  ▓ Mostly in the plenary sessions
➟ Software
  ▓ Grid job workload management systems
     Job submission by the experiments
     Site job handling, monitoring
     Grid operations (Monte Carlo production, glexec, interoperability, …)
     Data integrity checking
     …
  ▓ Storage systems
     Primarily concerning dCache and DPM
     Distributed storage systems
Parallel session : Grid middleware and tools
Computing hardware
Power requirements of LHC computing
➟ Important for running costs
  ▓ ~330 W to provision for 100 W of electronics
➟ Some sites running with air or water cooled racks
(Figure : power budget per 100 W of electronics. Electronics 100 W, server fans 13 W, voltage regulation 22 W, case power supply 48 W, room power distribution 4 W, UPS 18 W, room cooling 125 W; total ~330 W)
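As a quick cross-check of the ~330 W figure, here is a minimal arithmetic sketch in Python. The component values follow the breakdown above; the split of the overhead between room power distribution, UPS and cooling should be read as approximate.

# Cross-check of the power budget per 100 W of compute electronics.
# Values in watts, following the breakdown in the figure above.
power_budget_w = {
    "electronics": 100,
    "server fans": 13,
    "voltage regulation": 22,
    "case power supply": 48,
    "room power distribution": 4,
    "UPS": 18,
    "room cooling": 125,
}

total = sum(power_budget_w.values())                 # 330 W in total
overhead = total - power_budget_w["electronics"]     # 230 W of overhead
print(f"Total provisioned power: {total} W")
print(f"Overhead per 100 W of electronics: {overhead} W")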
High performance and multi-core computing
 Core Frequencies ~ 2-4 GHz, will not change significantly
 Power
  ➟ 1,000,000 cores at 25 W / core = 25 MW
    ▓ Just for the cpu
    ▓ Reducing power reduces chip frequency, complexity, capability
  ➟ Have to bring core power down by multiple orders of magnitude
 Memory Bandwidth
  ➟ As we add cores to a chip, it is increasingly difficult to provide sufficient memory bandwidth
  ➟ Application tuning to manage memory bandwidth becomes critical
 Network and I/O Bandwidth, data integrity, reliability
  ➟ A Petascale computer will have Petabytes of Memory
  ➟ Current Single File Servers achieve 2-4 GB/s
    ▓ 70+ hours to checkpoint 1 Petabyte (see the arithmetic sketch after this list)
  ➟ IO management is a major challenge
 Memory Cost
  ➟ Can't expect to maintain current memory / core numbers at petascale
    ▓ 2 GB/core for ATLAS / CMS
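A back-of-the-envelope check of the checkpoint estimate, using only the numbers quoted on the slide (1 PB of memory, 2-4 GB/s per file server); this is an illustrative sketch rather than a figure from the talk itself.

# Time to checkpoint 1 PB of memory through a single file server
# sustaining 2-4 GB/s (the throughput range quoted on the slide).
PB = 1e15   # bytes in a petabyte (decimal)
GB = 1e9    # bytes in a gigabyte (decimal)

for rate_gb_s in (2, 4):
    hours = PB / (rate_gb_s * GB) / 3600
    print(f"{rate_gb_s} GB/s -> {hours:.0f} hours")
# Even at the optimistic 4 GB/s this is roughly 70 hours, consistent with
# the "70+ hours to checkpoint 1 Petabyte" estimate on the slide.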
Grid job submission
 Most new developments were on pilot agent based grid systems
  ➟ Implement job scheduling based on the "pull" scheduling paradigm
  ➟ The only method for grid job submission in LHCb
    ▓ DIRAC (> 3 years experience)
    ▓ Ganga is the user analysis front end
  ➟ Also used in Alice (and Panda and Magic)
    ▓ AliEn since 2001
  ➟ Used for production, user analysis, data management in LHCb & Alice
  ➟ New developments for others
    ▓ Panda : Atlas, Charmm
       Central server based on Apache
    ▓ GlideIn : Atlas, CMS, CDF
       Based on Condor
       Used for production and analysis
  ➟ Very successful implementations
    ▓ Real-time view of the local environment
    ▓ Pilot agents can have some intelligence built into the system
       Useful for heterogeneous computing environment
    ▓ Recently Panda to be used for all Atlas production
 One talk on distributed batch systems
Pilot agents
Pilot agents submitted on demand
➟ Reserve the resource for immediate use
  ▓ Allows checking of the environment before job scheduling
  ▓ Only bidirectional network traffic
  ▓ Unidirectional connectivity
➟ Terminates gracefully if no work is available
➟ Also called GlideIn-s
LCG jobs are essentially pilot jobs for the experiment
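To make the "pull" model concrete, here is a minimal schematic of a pilot agent's main loop. It is an illustrative sketch only: the check_environment / request_work / run_payload helpers and the polling parameters are invented for the example and do not correspond to the actual DIRAC, AliEn, Panda or GlideIn interfaces.

import subprocess
import time

POLL_INTERVAL = 60    # seconds between "any work for me?" requests
IDLE_LIMIT = 600      # give the batch slot back after 10 idle minutes

def check_environment():
    # The pilot inspects the worker node (disk, platform, software tags)
    # so that only matching jobs are pulled. Placeholder values here.
    return {"free_disk_gb": 50, "platform": "slc4_ia32"}

def request_work(resources):
    # Outbound call to the experiment's central task queue; only the
    # pilot opens connections, so the worker node needs no inbound access.
    # Stub: always reports that no work is available.
    return None

def run_payload(job):
    # Run the real production or user job that was pulled down.
    return subprocess.call(job["command"], shell=True)

def pilot_main():
    resources = check_environment()
    idle_since = time.time()
    while True:
        job = request_work(resources)
        if job is not None:
            run_payload(job)
            idle_since = time.time()
        elif time.time() - idle_since > IDLE_LIMIT:
            break             # terminate gracefully: no work available
        else:
            time.sleep(POLL_INTERVAL)

if __name__ == "__main__":
    pilot_main()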
DIRAC WMS (figure)
Panda WMS (figure)
Alice (AliEn / MonaLisa) : history plot of running jobs (figure)
LHCb (Dirac) : max running jobs snapshot (figure)
Glexec
 A thin layer to change Unix domain credentials based on grid identity and attribute information
 Different modes of operation
  ➟ With or without setuid
    ▓ Ability to change the user id of the final job
 Enables the VO to
  ➟ Internally manage job scheduling and prioritisation
  ➟ Late binding of user jobs to pilots
 In production at Fermilab
  ➟ Code ready and tested, awaiting full audit
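Schematically, a pilot framework hands the pulled user payload to glexec so that it runs under the Unix account mapped from that user's grid credentials. The sketch below is an assumption-laden illustration: the environment variable name, the glexec install path and the file names are assumptions for the example and may differ in a real deployment.

import os
import subprocess

def run_with_glexec(user_proxy_path, payload_cmd):
    # Hand the user's delegated proxy to glexec (variable name assumed),
    # which, in setuid mode, switches to the mapped local account before
    # executing the payload command.
    env = os.environ.copy()
    env["GLEXEC_CLIENT_CERT"] = user_proxy_path          # assumed variable name
    return subprocess.call(["/opt/glite/sbin/glexec"] + payload_cmd, env=env)

# Hypothetical usage from inside a pilot, after pulling a user job:
# run_with_glexec("/tmp/user_proxy.pem", ["/bin/sh", "run_user_job.sh"])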
LSF universus
(Figure : a central job scheduler with a web portal, connected through LSF MultiCluster to LSF schedulers sitting in front of remote clusters / desktops running LSF, PBS, SGE or CCE)
LSF universus
Commercial extension of LSF
➟ Interface to multiple clusters
➟ Centralised scheduler, but sites retain local control
➟ LSF daemons installed on head nodes of remote clusters
➟ Kerberos for user, host and service authentication
➟ Scp for file transfer
Currently deployed in
➟ Sandia National Labs to link OpenPBS, PBS Pro and LSF clusters
➟ Singapore national grid to link PBS Pro, LSF and N1GE clusters
➟ Distributed European Infrastructure for Supercomputing Applications (DEISA)
Grid interoperability

                            ARC          OSG          EGEE
Job Submission              GridFTP      GRAM         GRAM
Service Discovery           LDAP/GIIS    LDAP/GIIS    LDAP/BDII
Schema                      ARC          GLUE v1      GLUE v1.2
Storage Transfer Protocol   GridFTP      GridFTP      GridFTP
Storage Control Protocol    SRM          SRM          SRM
Security                    GSI/VOMS     GSI/VOMS     GSI/VOMS
 Many different grids
  ➟ WLCG, Nordugrid, Teragrid, …
  ➟ Experiments span the various grids
 Short term solutions have to be ad-hoc
  ➟ Maintain parallel infrastructures by the user, site or both
 For the medium term, set up adaptors and translators (see the sketch after this list)
 In the long term, adopt common standards and interfaces
  ➟ Important in security, information, CE, SE
  ➟ Most grids use the X509 standard
  ➟ Multiple "common" standards …
  ➟ GIN (Grid Interoperability Now) group working on some of this
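As an illustration of the medium-term "adaptors and translators" approach, the sketch below maps a NorduGrid/ARC-style information record onto GLUE-style attribute names so that a single client could query both grids. The attribute names and record contents are approximate and chosen for illustration only.

# Illustrative information-system translator: ARC-style attributes to
# GLUE-style attributes. Attribute names are approximate, for illustration.
ARC_TO_GLUE = {
    "nordugrid-cluster-name":      "GlueClusterName",
    "nordugrid-cluster-totalcpus": "GlueSubClusterLogicalCPUs",
    "nordugrid-queue-running":     "GlueCEStateRunningJobs",
}

def translate_arc_record(arc_record):
    # Keep only the attributes we know how to translate.
    return {ARC_TO_GLUE[key]: value
            for key, value in arc_record.items() if key in ARC_TO_GLUE}

# Hypothetical usage:
# translate_arc_record({"nordugrid-cluster-name": "grid.example.org",
#                       "nordugrid-cluster-totalcpus": 120})
# -> {"GlueClusterName": "grid.example.org", "GlueSubClusterLogicalCPUs": 120}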
Distributed storage
GridPP organised into 4 regional Tier-2s in the UK
Currently a job follows data into a site
➟ Consider disk at one site as close to cpu at another site
  ▓ E.g. disk at Edinburgh vs cpu at Glasgow
➟ Pool resources for efficiency and ease of use
➟ Jobs need to access storage directly from the worker node
 RTT between Glasgow and Edinburgh ~ 12 ms
 Custom rfio client
  ➟ Normal : One call / read
  ➟ Readbuf : Fills internal buffer to service request
  ➟ Readahead : Reads till EOF
  ➟ Streaming : Separate streams for control & data
 Tests using a single DPM server
 Atlas expects ~ 10 MiB/s / job
 Better performance with dedicated light path
 Ultimately a single DPM instance to span the Glasgow and Edinburgh sites
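The round-trip time is what makes the choice of rfio mode matter: with one call per read, every read pays the Glasgow to Edinburgh latency. The sketch below is illustrative arithmetic; the file size, read size and bandwidth are assumed values, only the ~12 ms RTT comes from the slide.

# Why per-read round trips dominate remote file access over the WAN,
# and why the buffered / read-ahead rfio modes help.
rtt_s = 0.012            # ~12 ms Glasgow <-> Edinburgh round trip (from the slide)
file_size_mb = 1000      # assumed 1 GB input file
read_size_kb = 64        # assumed small application read size
bandwidth_mb_s = 100     # assumed available network bandwidth

n_reads = file_size_mb * 1024 // read_size_kb             # 16,000 separate reads
latency_cost_s = n_reads * rtt_s                          # one round trip per read
transfer_cost_s = file_size_mb / bandwidth_mb_s

print(f"Normal (one call / read): ~{latency_cost_s + transfer_cost_s:.0f} s, "
      f"of which {latency_cost_s:.0f} s is pure latency")
print(f"Read-ahead / buffered   : ~{transfer_cost_s:.0f} s plus a few round trips")

At these assumed numbers the normal mode delivers only about 5 MB/s, well short of the ~10 MiB/s per job that Atlas expects, while the buffered modes are limited by bandwidth rather than latency.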
Data Integrity
 Large number of components performing data management in an experiment
 Two approaches to checking data integrity
  ➟ Automatic agents continuously performing checks
  ➟ Checks in response to special events
 Different catalogs in LHCb : Bookkeeping, LFC, SE
 Issues seen :
  ➟ zero size files
  ➟ missing replica information
  ➟ wrong SAPath
  ➟ wrong SE host
  ➟ wrong protocol
    ▓ sfn, rfio, bbftp, …
  ➟ mistakes in file registration
    ▓ blank spaces in the surl path
    ▓ carriage returns
    ▓ presence of port number in the surl path
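The registration mistakes listed above are straightforward to screen for mechanically. Below is a minimal illustrative check of a storage URL (SURL); the function, the accepted protocol list and the example values are assumptions for this sketch, not LHCb's actual consistency-checking code.

from urllib.parse import urlparse

ALLOWED_SCHEMES = {"srm", "sfn"}     # assumed set of accepted protocols

def surl_problems(surl, expected_host=None):
    # Return a list of the registration problems found in one SURL.
    problems = []
    if surl != surl.strip() or " " in surl:
        problems.append("blank spaces in the SURL path")
    if "\r" in surl or "\n" in surl:
        problems.append("carriage return in the SURL")
    parsed = urlparse(surl.strip())
    if parsed.scheme not in ALLOWED_SCHEMES:
        problems.append(f"unexpected protocol '{parsed.scheme}'")
    if parsed.port is not None:
        problems.append("port number present in the SURL")
    if expected_host and parsed.hostname != expected_host:
        problems.append(f"wrong SE host '{parsed.hostname}'")
    return problems

# Hypothetical example:
# surl_problems("srm://srm.example.org:8443/lhcb/data/file.dst ",
#               expected_host="srm.example.org")
# -> ["blank spaces in the SURL path", "port number present in the SURL"]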
Summary
 Many experiments have embraced the grid
 Many interesting challenges ahead
  ➟ Hardware
    ▓ Reduce the power consumed by cpus
    ▓ Applications need to manage with less RAM
  ➟ Software
    ▓ Grid interoperability
    ▓ Security with generic pilots / glexec
    ▓ Distributed grid network
 And many opportunities
  ➟ To test solutions to the above issues
  ➟ Stress test the grid infrastructure
    ▓ Get ready for data taking
    ▓ Implement lessons in other fields
       Biomed …
➟ Note : 1 fully digitised film = 4 PB and needs 1.25 GB/s to play