Solaris status and plans
HEPiX Autumn 2003
Ignacio Reguero, Michel Manent, Carlos Ungil
presented by
Sebastian Lopienski
21 October 2003
CERN IT-PS-UI Solaris status and plans
Executive Summary
• Current Status
– Some figures
• SUNINST0 Network Installation Server
• CAE Server Upgrade
• SUNDEV Technology Refresh
– 10 new Sun Fire V210s
• Implementation of EDG WP4 Quattor fabric
management on Solaris
– System administration view
• Solaris 9 Certification
• Sun Blade Server 1600 and N1 Management
Current Status of Solaris Usage at CERN
• Second platform for LHC physics
– Mostly for validation purposes (numerical software)
• Total population of 663 active nodes
– Figures from the LanDB network database
• Around 300 on Solaris 8
• The rest: roughly half on Solaris 2.6 and half on Solaris 7
– Problem: most of these machines cannot be upgraded to a newer OS without a hardware upgrade (disks and memory)
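As a rough cross-check of the figures above (approximate only, since the Solaris 8 count is itself rounded and the per-release split is not a separate LanDB figure):

```latex
663 - 300 = 363 \text{ nodes on Solaris 2.6 or 7}, \qquad \frac{363}{2} \approx 180 \text{ nodes on each of the two releases}
```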
SUNINST0 Network Installation Server
• Jumpstart server
• Network configuration and responsible person are now fully extracted from the LANDB network database
– With a single fetch procedure
• After a router fix, the Sun DHCP server is stable
– At the request of the SM18 LHC magnet test facility, we had to demonstrate booting exotic devices (such as data acquisition devices) from it
• However, still working with the CS group to replace our DHCP server with the CS one
– A SOAP interface to an Oracle DB will allow us to update our DHCP entries
• Similar to the one in place for the print DNS hierarchy
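Once the host data comes out of LANDB in a single fetch, the Jumpstart side can be generated rather than edited by hand. The following is a minimal Python sketch, assuming a hypothetical `HostRecord` holding the fields of interest (hostname, responsible person, profile name); it renders standard Jumpstart `rules` entries of the form `keyword value begin profile finish`, with `-` meaning "no begin/finish script". The record layout and profile names are illustrative, not the actual LANDB schema or SUNINST0 setup.

```python
from dataclasses import dataclass


@dataclass
class HostRecord:
    """Hypothetical subset of one LANDB entry returned by the fetch procedure."""
    hostname: str
    responsible: str  # person responsible for the machine, as stored in LANDB
    profile: str      # Jumpstart profile to apply (assumed to be chosen per host)


def rules_entry(rec: HostRecord) -> str:
    """One Jumpstart rules line: keyword, value, begin script, profile, finish script.

    "-" means no begin or finish script.
    """
    return f"hostname {rec.hostname} - {rec.profile} -"


def build_rules(records: list[HostRecord]) -> str:
    """Render a rules file, sorted by hostname so regenerated files diff cleanly."""
    lines = []
    for rec in sorted(records, key=lambda r: r.hostname):
        # Keep the responsible person visible as a comment next to the rule.
        lines.append(f"# responsible: {rec.responsible}")
        lines.append(rules_entry(rec))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        HostRecord("sundev02", "J. Smith", "sundev_prof"),  # hypothetical hosts
        HostRecord("cae01", "A. Jones", "cae_prof"),
    ]
    print(build_rules(demo), end="")
```

On a real Jumpstart server the generated `rules` file still has to be validated with the `check` script, which produces `rules.ok`.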
CAE Server Upgrade (1)
• CAE: electronic design cluster
• Serves the electronics design community
• New server: Sun Fire V480
– 4 x 900 MHz CPUs
– 8 GB SDRAM
– Gigabit Ethernet
• A1000 RAID disk box
– 436 GB with RAID 5: space for users
• Had to coordinate the IDPROM change with the disk move
– Hit technical and sociological problems
– IDPROM change: needed to keep the old Cadence licenses (tied to the host ID) valid on the new server, but Sun was reluctant to do this for new hardware models
– In the end, Sun provided an ad hoc solution that does not support OBP upgrades
• Found out the hard way!
• And an OBP upgrade is needed for A1000 support
CAE Server Upgrade (2)
• IDPROM problem not solved yet
– Considering using another machine
• It cannot be too new (the solution works on a V220 or 280R)
• Also many A1000 RAID box problems
– The RAID Manager software version has to match the firmware level in the controller and the OBP
• So many upgrades were required before connecting the old A1000s to the new server
– After the first installation, additional A1000s are not seen unless entries with the relevant SCSI ID are added by hand to /kernel/drv/sd.conf
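Because the sd driver only probes the targets listed in /kernel/drv/sd.conf, the manual step above amounts to appending one entry per missing SCSI target and LUN. The helper below is a minimal sketch, assuming entries of the simple form `name="sd" class="scsi" target=N lun=M;` (optionally with a `class_prop` attribute and wrapped over two lines, as in the stock file); it reports which lines would have to be added. The target and LUN values in the example are placeholders, not the CAE configuration.

```python
import re

# Matches simple sd.conf entries, which may wrap across lines, e.g.
#   name="sd" class="scsi" class_prop="atapi"
#       target=0 lun=0;
ENTRY_RE = re.compile(
    r'name="sd"\s+class="scsi"(?:\s+class_prop="[^"]*")?\s+target=(\d+)\s+lun=(\d+)\s*;'
)


def existing_pairs(conf_text: str) -> set[tuple[int, int]]:
    """(target, lun) pairs already declared in sd.conf, ignoring comment lines."""
    active = "\n".join(
        line for line in conf_text.splitlines() if not line.lstrip().startswith("#")
    )
    return {(int(t), int(lun)) for t, lun in ENTRY_RE.findall(active)}


def missing_entries(conf_text: str, targets, luns=(0,)) -> list[str]:
    """sd.conf lines to append so that every requested (target, lun) pair is probed."""
    have = existing_pairs(conf_text)
    new = []
    for t in targets:
        if not 0 <= t <= 15:  # wide SCSI bus: target IDs 0..15
            raise ValueError(f"SCSI target {t} outside 0..15")
        for lun in luns:
            if lun < 0:
                raise ValueError(f"negative LUN {lun}")
            if (t, lun) not in have:
                new.append(f'name="sd" class="scsi" target={t} lun={lun};')
    return new


if __name__ == "__main__":
    sample = 'name="sd" class="scsi" target=4 lun=0;\n'  # placeholder file content
    for line in missing_entries(sample, targets=[4, 5], luns=(0, 1)):
        print(line)
```

The new entries only take effect once the driver re-reads its configuration, for example after a reconfiguration boot (`boot -r`).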
CAE Server Upgrade
SUNDEV Technology Refresh
• A cluster for physics development
• 10 Sun Fire V210
– State-of-the-art SPARC machines
– Thin rack-mountable servers
• 1 rack unit (1U) each in 19” racks
• They all fit in a single CERN rack together with a Gigabit switch and the Sun Blade server
– Dual 1 GHz UltraSPARC-IIIi
– 2 GB memory
– 2 x 36 GB disk drives
– 4 x Gigabit Ethernet on the motherboard
• They are being installed with Solaris 8.7.3 (the latest release, required for this hardware), later Solaris 9
• Performance improvement of at least 120% over the current SUNDEV machines
SUNDEV Technology Refresh
Implementation of EDG WP4 Quattor fabric
management on Solaris (1)
• We plan to use Quattor to manage all Solaris
systems
– From Solaris 9 onwards
• What does it mean for us?
– Central Configuration DataBase (CDB)
• Configuration information
• Software to be installed
– Both applications and system
• A cache manager is provided for clients accessing the DB
– To avoid dependency on the DB server or on the network
• The configuration database is linked to the network
installation server
– The Jumpstart profile is generated from the database
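Generating the Jumpstart profile from the database can be pictured as rendering a node description into profile keywords. This is a minimal Python sketch, assuming a hypothetical dictionary layout for the node's CDB data (`filesystems`, `cluster`, `packages`); the keywords themselves (`install_type`, `system_type`, `partitioning`, `filesys`, `cluster`, `package ... add`) are standard Jumpstart profile keywords, but the slice names, sizes and package names are placeholders, and this is not the Quattor generator itself.

```python
def jumpstart_profile(node: dict) -> str:
    """Render a Jumpstart profile from a (hypothetical) CDB node description."""
    lines = [
        "install_type initial_install",
        "system_type standalone",
        "partitioning explicit",
    ]
    # One filesys line per slice: <slice> <size in MB> <mount point or swap>.
    for fs in node["filesystems"]:
        lines.append(f"filesys {fs['slice']} {fs['size_mb']} {fs['mount']}")
    # Software: a base Solaris cluster plus individual packages listed in the CDB.
    lines.append(f"cluster {node['cluster']}")
    for pkg in node.get("packages", []):
        lines.append(f"package {pkg} add")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node description, standing in for data fetched from the CDB.
    node = {
        "filesystems": [
            {"slice": "c1t0d0s0", "size_mb": 8192, "mount": "/"},
            {"slice": "c1t0d0s1", "size_mb": 2048, "mount": "swap"},
        ],
        "cluster": "SUNWCreq",
        "packages": ["CERNexmpl"],
    }
    print(jumpstart_profile(node), end="")
```

Keeping the profile a pure function of the database record means that re-installing a node reproduces the same layout without anyone editing files on the installation server.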
Implementation of EDG WP4 Quattor fabric
management on Solaris (2)
– Node Configuration Manager (NCM)
• À la SUE
• For configuration “components”
– Simplified SUE features
• NCM components are simplified SUE features
• They have a single action: “configure”
• They access the Configuration DB through the cache manager
– SPMA software distributor (package level)
• Replaces ASIS software distribution (file level)
• On Linux it uses RPMs; on Solaris it is implemented with Solaris PKG
• Allows installing packages from various software repositories
• Several protocols supported: HTTP, file system (AFS), FTP,
etc.
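The core difference from file-level distribution is that SPMA reasons about whole packages: it compares the package list a node should have with what is installed and derives a transaction. The function below is a minimal Python sketch of that comparison, assuming both sides are given as `name -> version` maps; whether packages unknown to the CDB are removed is a policy choice, modeled here by a hypothetical `remove_unmanaged` flag. Package names in the example are placeholders, and how a version change is actually carried out with Solaris PKG is left out.

```python
def plan_transaction(desired: dict[str, str], installed: dict[str, str],
                     remove_unmanaged: bool = True):
    """Compare desired and installed package sets (name -> version).

    Returns sorted lists (to_install, to_remove, to_change).
    """
    # In the CDB but not on the node.
    to_install = sorted(n for n in desired if n not in installed)
    # On the node, but with a different version than the CDB asks for.
    to_change = sorted(
        n for n in desired if n in installed and installed[n] != desired[n]
    )
    # On the node but absent from the CDB: remove only if the node is fully
    # managed (hypothetical policy flag).
    to_remove = (
        sorted(n for n in installed if n not in desired) if remove_unmanaged else []
    )
    return to_install, to_remove, to_change


if __name__ == "__main__":
    wanted = {"pkgA": "1.1", "pkgB": "2.0"}   # placeholder CDB package list
    present = {"pkgA": "1.0", "pkgC": "0.9"}  # placeholder installed packages
    print(plan_transaction(wanted, present))
```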
Implementation of EDG WP4 Quattor fabric
management on Solaris (3)
• Still working on
– Creation of Solaris NCM Components from existing SUE
features (Juan Pelegrin)
– DB Access Control
• For delegation
– Behavior with “unmanaged” software
[Diagram: the CDB (pan templates, XML profiles) serves several hosts; on each host, NCM and SPMA (target.cf) install PKG packages from several software repositories]
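To make the NCM component model concrete (one `configure` action, configuration read through a local cache rather than from the DB server), here is a minimal Python sketch. It assumes a JSON file standing in for the cached node profile (Quattor profiles are XML) and is not the NCM API; the class names, the `/software/components/motd` path and the motd component are all hypothetical.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class CacheManager:
    """Hypothetical local copy of the node profile, so components do not
    need the CDB server or the network at configure time."""

    def __init__(self, cache_file: Path):
        self.cache_file = cache_file

    def get(self, path: str):
        """Look up a value by a slash-separated path such as /a/b/c."""
        node = json.loads(self.cache_file.read_text())
        for key in path.strip("/").split("/"):
            node = node[key]
        return node


class Component(ABC):
    """An NCM-style component: a single action, configure()."""
    name = "abstract"

    @abstractmethod
    def configure(self, config: CacheManager) -> list[str]:
        """Apply the configuration; return log messages."""


class MotdComponent(Component):
    """Hypothetical component writing a message-of-the-day file."""
    name = "motd"

    def __init__(self, target: Path):
        self.target = target

    def configure(self, config: CacheManager) -> list[str]:
        text = config.get("/software/components/motd/text")
        self.target.write_text(text + "\n")
        return [f"wrote {self.target}"]


def run_components(components, config: CacheManager) -> list[str]:
    """Run each component's configure() in order and collect prefixed log lines."""
    log = []
    for comp in components:
        log.extend(f"{comp.name}: {msg}" for msg in comp.configure(config))
    return log


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    cache = workdir / "profile.json"
    cache.write_text(json.dumps({"software": {"components": {"motd": {"text": "Welcome"}}}}))
    for line in run_components([MotdComponent(workdir / "motd")], CacheManager(cache)):
        print(line)
```

Restricting components to one entry point keeps them easy to port from SUE features: each one only needs to know which part of the profile it reads and which files it writes.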
Solaris 9 Certification
• Validating Solaris 9: running all software on the new system
• Target: end of 2003
– Refsol9 reference machine now available
• No big changes in Solaris itself, but some new features:
– “Web Start Flash Archives”: system images for installation
• Useful for farms (but only for identical hardware)
– Resource pools
• Guaranteed resources for an application on large shared systems
– GNOME 2.0 is the standard desktop environment
– We deliver Mozilla 1.4 (instead of Netscape, which Sun recommends)
– Sun ONE Studio 8 as the default compiler
• Replacement of ASIS and SUE with Quattor
• More open source software packaged with the system
– Perl, Bash, …
– Some of these products are supported on the same basis as native Sun ones
– Probably an opportunity to reduce the number of products we maintain
Solaris Reference machines + Installation Server
Sun Blade Server 1600
• Sun Blade server 1600
– Packaged farm
– Fits in 3 units of a 19” rack
– SSC controller with Gigabit switch, managing up to 16 CPUs
• Several Gigabit Ethernet external connections
• VLAN with 16 Gigabit Ethernet interfaces
• Protection against attacks through packet filter configuration
• Console through a serial port for each blade
• Received 12 x 650 MHz UltraSPARC-IIe blades
• Waiting for 4 “Intel-compatible” CPUs
– AMD Athlon XP-M 1.2 GHz
• Other specialized blades supported at hardware level
– SSL encryptor
– Load balancer
Sun Blade Server 1600 system chassis
[Diagram: two System Service Controllers (SSC0, SSC1), one switch fabric active and one standby, each connecting blade slots 0 to 15 (Blades 0 to 15) to an external switch, with uplinks on 137.138.x.x (ce0, ce1)]
Sun Blade Server 1600
Sun Blade Server 1600 Installation and
Configuration
• Fully automated network installation (DHCP) using Jumpstart from SUNINST0
– Initial configuration, installation and application software
• One private IP address for each System Controller
• One IP address for each blade
• Ongoing test of Web Start Flash Archives
– Intended to quickly replicate one blade’s operating environment and application software on other blades
• SunVTS (Sun Validation Test Suite) online diagnostics tool
– Verifies the configuration and functionality of hardware controllers, devices and platforms
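The addressing scheme for one chassis (a private address per System Controller, one address per blade) can be sketched with Python's standard `ipaddress` module. The networks below are placeholders (10.0.0.0/29 for the controllers, the 192.0.2.0/27 documentation range for the blades), not CERN's actual allocation; in practice the blade addresses would come from the DHCP/Jumpstart configuration on SUNINST0.

```python
import ipaddress
from itertools import islice

BLADE_SLOTS = 16   # one chassis holds blades in slots 0..15
CONTROLLERS = 2    # SSC0 and SSC1


def address_plan(controller_net: str, blade_net: str,
                 n_blades: int = BLADE_SLOTS) -> dict[str, str]:
    """Map ssc0/ssc1 and blade0..blade<n-1> to addresses from the given networks."""
    ctl_net = ipaddress.ip_network(controller_net)
    if not ctl_net.is_private:
        raise ValueError(f"{ctl_net} is not a private range")
    if not 0 <= n_blades <= BLADE_SLOTS:
        raise ValueError(f"a chassis has at most {BLADE_SLOTS} blades")

    # For these network sizes, hosts() skips the network and broadcast addresses.
    ctl_hosts = list(islice(ctl_net.hosts(), CONTROLLERS))
    blade_hosts = list(islice(ipaddress.ip_network(blade_net).hosts(), n_blades))
    if len(ctl_hosts) < CONTROLLERS or len(blade_hosts) < n_blades:
        raise ValueError("network too small for the requested addresses")

    plan = {f"ssc{i}": str(ip) for i, ip in enumerate(ctl_hosts)}
    plan.update({f"blade{slot}": str(ip) for slot, ip in enumerate(blade_hosts)})
    return plan


if __name__ == "__main__":
    # 12 UltraSPARC blades received so far.
    for name, ip in address_plan("10.0.0.0/29", "192.0.2.0/27", n_blades=12).items():
        print(f"{name:8} {ip}")
```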
Sun N1 System Management Framework
• Sun N1 Provisioning Server 3.0 Blade Edition being tested
– Automates configuration and deployment of different kinds of servers
• Including specialized servers
• Assignment may vary according to a schedule or other input: dynamic management of clusters
• Goal: compare N1 with Quattor functionality
• Question: could N1 manage heterogeneous farms outside the Blade server scope?
Questions?
Unix Infrastructure section:
http://cern.ch/product-support/UI