HEPiX Spring 2008 @ CERN Summary Report
HEPSysMan @ RAL
19-20 June 2008
Martin Bly
Overview
• Venue/Format/Themes
• CPU Benchmarking Working Group
• Storage and File Systems Working Group
• Scientific Linux
• Selected topics
Spring HEPiX 2008
• Venue: CERN - 5th to 9th May
  – Council Chamber
    • Very comfortable, good wireless network access
• Format
  – Sessions based on themes with a morning ‘plenary’ by an invited speaker
  – ½ to 1 day per theme
• Agenda: http://indico.cern.ch/conferenceTimeTable.py?confId=27391
Themes
• LHC and Data Readiness
• LHC overview
• Trigger farms of LHC experiments
• LCG overview and status
• CCRC
• Site Reports
• Storage technology
• CPU technology
• Data centre management, availability, and reliability
• Problem resolution, problem tracking, alarm systems
• System management
• Networking infrastructure and computer security
• Applications and Operating systems
• HEPiX ‘bazaar and think-tank’
• General Virtualisation
• Grid stuff (Monitoring etc.)
• Miscellaneous
Benchmarking Working Group
• WLCG MoUs based on SI2K
  – SPEC2000 deprecated in favour of SPEC2006
    • no longer available or maintained
• Remit
  – Find a benchmark accepted by HEP and others, as many sites serve different communities
• Review of existing benchmarking practices (CERN, FZK, INFN, …)
• Last 6 months: setup of a benchmarking test-bed with dedicated HW at CERN and elsewhere
  – Covering a wide range of processors with a typical HEP configuration (2GB/core)
  – Run SPEC benchmarks with agreed flags
    • SL4/64-bit OS with benchmarks at 32-bit/gcc 3.4
    • Look at SL5, 64-bit, gcc4
  – Run a variety of ‘standard candles’ from the LHC experiments’ code to compare with SPEC
    • Provides scaling and recalibration of computing requirements (see the sketch after this list)
• Looking at understanding the statistical treatment of experiment results
  – Recently uncovered different methodologies for random numbers!
• No major scaling problem with either SI2K or SI2K6
  – Should allow a smooth transition
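As a back-of-the-envelope illustration of that recalibration step, a minimal Python sketch (all scores below are invented placeholders, not working-group results): a single measured SI2K-to-SI2K6 ratio from the test-bed is enough to rescale a site's pledged capacity.

    # Hypothetical per-node scores from a benchmarking test-bed; the
    # real working-group numbers are not reproduced here.
    si2k_score = 5600        # SPEC CPU2000 result for a reference worker node
    si2k6_score = 14.0       # SPEC CPU2006 result for the same node
    pledge_si2k = 2_000_000  # a site's pledged capacity in SI2K

    # With no major scaling problem between the two suites, one measured
    # ratio recalibrates the pledge for a smooth transition.
    ratio = si2k6_score / si2k_score
    pledge_si2k6 = pledge_si2k * ratio
    print(f"{pledge_si2k} SI2K ~= {pledge_si2k6:.0f} SI2K6 (ratio {ratio:.5f})")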
File Systems Working Group
• Started with a questionnaire about storage at T1s
• Followed up with a technology review and selection
  – Posix FS (TFA): LUSTRE, GPFS, AFS
  – SRM: CASTOR, dCache, DPM
  – Xrootd
• Performance comparison between selected technologies
• Testbed setup at CERN with 10 servers and 60 8-core clients with 1 Gb/s connections, 4-5 6TB
  – 480 simultaneous client tasks
  – 3 tests: writing, sequential read, pseudo-random read (see the read-pattern sketch below)
  – Most implementations able to sustain wire speed in writes and sequential reads
  – Significant performance advantage for LUSTRE in pseudo-random reads, but the test conditions must be clarified
• The use case may favour LUSTRE’s client-side caching
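For illustration only, a minimal Python sketch of the kind of pseudo-random read pattern referred to above (not the working group's actual test harness; path, block size, and read count are placeholders):

    import os
    import random

    def pseudo_random_read(path, block_size=1 << 20, reads=1000, seed=1):
        """Read fixed-size blocks at pseudo-random offsets; this is the
        access pattern where client-side caching can pay off."""
        rng = random.Random(seed)  # seeded, so every client replays the same pattern
        size = os.path.getsize(path)
        total = 0
        with open(path, 'rb') as f:
            for _ in range(reads):
                f.seek(rng.randrange(0, max(1, size - block_size)))
                total += len(f.read(block_size))
        return total  # bytes read, for computing a throughput figure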
Scientific Linux
• Review of recent releases: SL5.1, SL4.6
  – Trying to release the 64-bit versions at the same time as the 32-bit versions
• 3.0.1 to 3.0.8 obsoleted
• Description of an issue where ‘new’ tags in version numbers make new versions appear older to yum (see the sketch after this list)
• Working on automating ‘fastbugs’ repositories
• Clarifying the policy on security errata
• Future:
  – SL3.0.9 to continue till October 2010.
  – Planning on doing SL4.7, SL5.2, SL6.
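A simplified Python sketch of the RPM comparison rule behind that yum issue: version strings are split into segments, and a numeric segment always sorts newer than an alphabetic one, so a release carrying a 'new' tag can lose to a plain numeric release. The strings below are invented examples, not actual SL package versions.

    import re

    def rpm_vercmp(a, b):
        """Simplified rpmvercmp: compare numeric/alphabetic segments;
        a numeric segment always sorts newer than an alphabetic one."""
        sa = re.findall(r'\d+|[A-Za-z]+', a)
        sb = re.findall(r'\d+|[A-Za-z]+', b)
        for x, y in zip(sa, sb):
            if x.isdigit() and y.isdigit():
                if int(x) != int(y):
                    return 1 if int(x) > int(y) else -1
            elif x.isdigit() != y.isdigit():
                return 1 if x.isdigit() else -1  # numeric beats alphabetic
            elif x != y:
                return 1 if x > y else -1
        return (len(sa) > len(sb)) - (len(sa) < len(sb))

    # The 'new'-tagged release sorts older, so yum keeps the plain one:
    print(rpm_vercmp('new.1', '2'))  # -1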
SL discussions
• Support for SL4?
  – DESY needs SL4 until Autumn 2011
  – CERN intends to introduce SL5 batch and UIs in Autumn 2008, so the WN gLite payload should be available
  – Critical that Grid middleware is available
  – Some concern over experiment readiness
  – RHEL4: full support 3 years, deployment support 3-5 years, maintenance support 5-7 years; RHEL4 was released in Feb 2005, so it is in deployment support
  – The compiler is the important factor rather than the actual version of SL
  – Encourage shorter deadlines with more flexibility on extending them – likely to get better buy-in from users
  – So suggest July 2010? Suggestion of October 2010, same as SL3, to stop short-term migration.
  – Stop Press: RHEL 4 lifetime extended: ‘full support’ for 4 years…
• XFS in SL?
  – In or out? Consensus is to have it in, using the usual kernel module system. Jan Iven hears from an unreliable source that back-ports of the latest version are coming. SL4 or SL5? SL4 contrib, SL5 standard. Does it work with 32-bit? Yes, the kernel is now less hostile.
• Scientific Linux 6: should it be based on CentOS?
  – Still do installer changes
  – Still add the RPMs we usually do
  – Use precompiled RPMs
  – Change/recompile RPMs we feel the need to (SL graphics).
• Kernel modules: adding the security repo during the install gets the correct kernel but incorrect modules. Can fix the installer, fix up afterwards with a script, or use dkms. Add dkms to the release and use it instead of kernel modules? (see the sketch below)
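A rough sketch of the dkms option (module name and version are invented placeholders): dkms rebuilds a module for whichever kernel is actually installed, sidestepping the kernel/module mismatch described above.

    import subprocess

    # Register, build, and install an out-of-tree module with dkms so it
    # is rebuilt automatically for the kernel the installer pulled in.
    MODULE, VERSION = 'examplefs', '1.0'  # placeholders
    for verb in ('add', 'build', 'install'):
        subprocess.run(['dkms', verb, '-m', MODULE, '-v', VERSION], check=True)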
Selected Topics I
• Well-attended talk by Sascha Brawer from Google, describing their technology and methods for handling very large datasets across distributed geographical locations
• Based on truckloads of low-cost systems
  – Care about performance per $, not raw performance
  – In-house rack design, chassis-less PC-class motherboards, low-end storage
  – Many data centres around the world
  – Need to design software to cope with failures
Selected Topics II
• Several talks on experiences with Lustre
  – DESY – good description of setting it up
  – GSI – talk about production use
  – Lustre appears stable and reliable as a production distributed file system
    • Proof against various failure modes
• Sverre Jarp gave a review of the CERN OpenLab and what they are working on
  – Collaboration with HP, Intel, Oracle…
Experience with Windows Vista at CERN
• Update on Vista activities at CERN
  – status, plans, etc.
  – Using a readiness check to determine suitability; Vista is not the default (XP is).
  – Now 300 machines (~5%) running Vista.
  – Notes on the introduction of SP1
  – Feb 2008: still preparing for the upgrade rollout. RFM removed in favour of popup nagging.
  – Vista SP1 improved performance over XP or standard Vista, but not by much in most cases.
Virtualisation with Windows at CERN
• Review of virtualisation in IT services at CERN
• 17 physical servers with 45 ‘clients’ ranging through Windows server variants and SLC4/5
  – Using Virtual Server 2005
• New Hyper-V – part of Windows Server 2008, needs a 64-bit CPU
  – Supports 32/64-bit guests, large RAM (>32GB) in VMs
Remote Administration via Service Modules
• Work at GSI on using IPMI modules to administer remotely located server hardware
  – Disadvantages of remote access using standard tools, not least that you need a running OS.
• Discussion of the advantages of using IPMI modules for remote control (see the sketch below)
  – changing BIOS settings, resets, installing…
  – Detailed description of capabilities.
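A minimal sketch of this style of out-of-band control, driving ipmitool from Python (hostname and credentials are placeholders; assumes a BMC reachable over the lanplus interface):

    import subprocess

    def ipmi(host, user, password, *args):
        """Talk to a server's IPMI module over the network; this works
        even when the host itself has no running OS."""
        cmd = ['ipmitool', '-I', 'lanplus', '-H', host,
               '-U', user, '-P', password, *args]
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    # Placeholders for a real BMC address and credentials:
    print(ipmi('node42-bmc', 'admin', 'secret', 'chassis', 'power', 'status'))
    ipmi('node42-bmc', 'admin', 'secret', 'chassis', 'power', 'reset')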