Site Report
Roberto Gomezel
INFN
October,18-22 2004
Outline of Presentation
Computing Environment
Security
Services
Network
AFS
BBS
INFN Farms
Tier 1 at CNAF
Computing Environment and security
95% of boxes are PCs running Linux or Windows
Mac OS boxes are still in use
Only a few commercial Unix boxes remain, used for specific tasks or needs
VPNs available at many sites
Cisco boxes using IPsec
NetScreen boxes using IPsec
SSL VPNs are under evaluation
SSL removes the need to install client software: users get instant access with just a Web browser
Network Security
Dedicated firewall machines at only a few sites
Otherwise implemented with access lists on the router connected to the WAN
INFN Site Report – R.Gomezel
Desktop
PCs running Linux and Windows
Automatic installation using Kickstart for Linux and RIS for Windows
Citrix MetaFrame or VMware used to reduce the need to install Windows on every PC for desktop applications
A few sites chose to outsource desktop support due to lack of personnel
Backup
Tape Libraries used:
AIT2 – a few sites
IBM Magstar – just used at LNF
DLT, LTO – widespread
Backup tools:
IBM Tivoli – commonly used
HP Omniback – commonly used
Atempo Time Navigator – just a few sites
In-house tools – widespread
Wireless LAN
Access points running standard 802.11b/g
All sites use wireless connections during meetings and conferences
Most of them use it to connect laptop computers
Security issues:
Access control based on port security filtering (MAC address) – poor security
No encryption used
Some sites are using 802.1X
E-mail
Mail Transfer Agent
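Sendmail – the most widely used (86%)
Postfix – a few sites (14%)
However, a growing number of sites plan to move from Sendmail to Postfix
Hardware and OS
[Pie chart: mail server hardware and OS. Categories: Alpha, Solaris, Intel/Linux, Intel/BSD. Shares: 57%, 17%, 17%, 9%; the transcript does not preserve which share belongs to which category]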
E-mail user agent
All INFN sites provide a web-based (HTTP) mail user agent
One third use IMP
One third use SquirrelMail
Others:
IMHO, Open WebMail, Cyrus+Roxen…
Other mail user agents
Pine, Internet Explorer, Mozilla…
E-mail antivirus
[Pie chart: e-mail antivirus in use at INFN sites]
None: 32%
Other (the products below, which add up to this share): 68%
Rav: 27%
Amavis: 18%
Sophos: 9%
Clamav / Vexira: 5% / 9%
E-mail antispam
75% of INFN sites use SpamAssassin to reduce junk e-mail
Some sites use RAV or Sophos
Just a few sites (5%) use nothing
An ACL filter was set on port 25 to prevent unauthorized hosts from acting as mail relays
Only authorized mail relays are allowed to send and receive mail for a specific site
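The port 25 rule above can be sketched as a simple decision function. This is only an illustration of the idea, not the router ACL actually deployed: the relay addresses are hypothetical (taken from the documentation range) and the function name is an assumption.

```python
from ipaddress import ip_address

SMTP_PORT = 25

# Hypothetical addresses of a site's authorized mail relays (documentation range).
AUTHORIZED_RELAYS = {ip_address("192.0.2.10"), ip_address("192.0.2.11")}


def allow_packet(src: str, dst: str, dst_port: int) -> bool:
    """Return True if the border ACL lets the packet through.

    Only SMTP (port 25) is restricted in this sketch: the packet must
    come from, or be addressed to, an authorized relay. Other ports are
    out of scope here and are passed through.
    """
    if dst_port != SMTP_PORT:
        return True
    return ip_address(src) in AUTHORIZED_RELAYS or ip_address(dst) in AUTHORIZED_RELAYS
```

With this rule an ordinary desktop cannot deliver or receive mail directly, so all SMTP traffic is forced through the relays where antivirus and antispam tools run.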
Security issues
[Chart: incidents coming from INFN hosts (percentage) by year, 2000 to 2004, monitored by GARR-CERT. Categories: worm/virus, warez, spam, others]
• Goal by the end of 2004: define a new policy for ACL setting
• Input filter: default deny, with services allowed only on strictly checked hosts
• Output filter: port 25
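A default-deny input filter amounts to an explicit allow-list: anything not listed is refused. The sketch below shows that logic only; the host names and services are hypothetical and do not describe any INFN site.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    host: str   # destination host inside the site
    port: int   # destination TCP port of the permitted service


# Hypothetical allow-list: services exposed only on vetted hosts.
ALLOWED_SERVICES = {
    Rule("www.site.example", 80),
    Rule("mail.site.example", 25),
    Rule("login.site.example", 22),
}


def input_filter(dst_host: str, dst_port: int) -> str:
    """Default deny: permit only explicitly listed (host, port) pairs."""
    if Rule(dst_host, dst_port) in ALLOWED_SERVICES:
        return "permit"
    return "deny"
```

The design choice is that adding a new externally visible service requires an explicit entry, which is where the strict host check would take place.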
INFN network
LAN backbone mainly based on Gigabit Ethernet
Layer 2 and 3 switching
No layer 4 switching
The INFN WAN is completely integrated into GARR, the nation-wide infrastructure, which provides backbone connectivity at 2.5 Gbps
Typical PoP access bandwidth for INFN sites: 34 Mbps, 155 Mbps, Gigabit Ethernet
The trend is towards Gigabit Ethernet access at every site, with bandwidth managed through a rate-limiting mechanism (CAR) according to the needs of the specific site
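Rate limiting of this kind is usually explained with a token bucket. The following is a minimal sketch assuming a single-rate bucket; the class, its parameters and the example rate are illustrative and are not the configuration used on the GARR access routers.

```python
class TokenBucket:
    """Single-rate token bucket: traffic conforms while tokens remain."""

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst_bytes = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last_time = 0.0

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Refill for the elapsed time, then try to spend tokens."""
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True    # conforming traffic is forwarded
        return False       # exceeding traffic, e.g. dropped


# Example: a site limited to 34 Mbps on a Gigabit Ethernet access link.
limiter = TokenBucket(rate_bps=34_000_000, burst_bytes=64_000)
```

The physical link can then be the same Gigabit Ethernet everywhere, while the committed rate is a per-site parameter.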
AFS
INFN sites continue to use AFS to share data and software across sites
Most local cells have completely moved server functionality to Linux boxes running OpenAFS
The authentication and file servers of the nation-wide cell INFN.IT run on Linux boxes with OpenAFS
The migration of the INFN.IT authentication servers from Kerberos IV to Kerberos V is expected to be completed by the end of the year
BBS - Bologna Batch System
The Bologna Batch System (BBS) is a software tool that allows users from INFN Bologna to submit batch jobs to a set of well-defined machines from any INFN Bologna machine with Condor installed.
Collaboration between the C. S. Dept., Univ. of Wisconsin-Madison and INFN Bologna.
Main features of BBS:
Any executable can be submitted to the system (scripts, compiled and linked programs, etc.).
Two different 'queues', short and long. Short and long jobs have a different priority (nice) when running on the same machine.
Short jobs may run for no longer than an hour, but run at a higher priority.
BBS tries to balance the load of the BBS CPUs.
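The queue and load-balancing rules listed above can be illustrated with a toy model. This is not the BBS or Condor implementation: the nice values, the `Machine` class and the placement rule (least-loaded machine first) are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

SHORT_LIMIT_S = 3600               # short jobs may run for at most one hour
NICE = {"short": 0, "long": 10}    # hypothetical values; lower nice = higher priority


@dataclass
class Machine:
    name: str
    cpus: int
    running: int = 0

    @property
    def load(self) -> float:
        """Running jobs per CPU."""
        return self.running / self.cpus


def submit(queue: str, machines: list[Machine]) -> tuple[str, int]:
    """Place a job on the least-loaded machine; return (machine name, nice)."""
    if queue not in NICE:
        raise ValueError(f"unknown queue: {queue}")
    target = min(machines, key=lambda m: m.load)
    target.running += 1
    return target.name, NICE[queue]


def may_keep_running(queue: str, elapsed_s: float) -> bool:
    """Long jobs have no limit here; short jobs are capped at one hour."""
    return queue == "long" or elapsed_s <= SHORT_LIMIT_S
```

When a short and a long job share a machine, the different nice values are what let the short one finish quickly.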
P.Mazzanti
BBS
Presently the system consists of 16 dual-CPU servers running Linux RedHat 9 and one single-CPU machine. 7 machines are from the ALICE experiment.
BBS machines belong to the large INFN WAN Pool; they may be accessed from outside when no BBS job is running, and become IMMEDIATELY available when a BBS job asks to be run.
Only short jobs are accepted by the 7 ALICE machines when submitted by a user outside the ALICE group.
[Plots: aggregate jobs, daily and weekly]
[Plots: boi1.bo.infn.it load, daily and weekly]
INFN Site Farm: a new challenge
Some sites are planning to reconfigure and integrate computing facilities and local experiment-specific farms into a single computing farm
Introduction of SAN infrastructure to connect storage systems and computing units
Reason: to avoid the growing deployment of many small private farms for each experiment in addition to the general-purpose computing facility
GFS file system is under evaluation as an efficient way of providing a cluster file system and volume manager
Interesting because it is part of the SL3 distribution
A lot of work is needed to design a mechanism that provides computing resources to different experiments dynamically, according to their needs
We can learn from the experience of the CNAF Tier1 and other Labs
Hardware solutions for the Tier1 at CNAF
Luca dell’Agnello
Stefano Zani
(INFN – CNAF, Italy)
Luca dell’Agnello -Stefano Zani
Tier1
INFN computing facility for HEP community
Prototype phase ended last year, now fully operational
Location: INFN-CNAF, Bologna (Italy)
One of the main nodes on the GARR network
Personnel: ~ 10 FTEs
~ 3 FTEs dedicated to experiments
Multi-experiment
LHC experiments (Alice, Atlas, CMS, LHCb), Virgo, CDF, BABAR, AMS, MAGIC, ...
Resources dynamically assigned to experiments according to their needs
50% of the Italian resources for LCG
Participation in experiments' data challenges
Integrated with Italian Grid
Resources also accessible in the traditional way
Logistics
Moved to a new location (last January)
in the basement (-2nd floor)
~ 1000 m2 of total space
Hall
Computing Nodes
Storage Devices
Electric Power System (UPS)
Cooling and Air conditioning system
Garr GPop
Easily accessible with lorries from the road
Not suitable for office use (remote control needed)
Electric Power
Electric Power Generator
1250 kVA (~ 1000 kW)
up to 160 racks
Uninterruptible Power Supply (UPS)
Located in a separate room (conditioned and ventilated)
800 kVA (~ 640 kW)
380 V three-phase distributed to all racks (Blindo)
Rack power controls output 3 independent 220 V lines for computers
Rack power controls sustain loads up to 16 or 32 A
32 A power controls needed for racks of 36 dual-processor Xeon servers
APC power distribution modules (24 outlets each)
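The kVA and kW figures above are linked by a power factor of 0.8 (1000/1250 = 640/800 = 0.8). The short calculation below reproduces them and estimates the capacity of one rack power control, assuming (this is not stated on the slide) that the 16 A or 32 A rating applies to each of the three 220 V lines.

```python
def apparent_to_real_kw(kva: float, power_factor: float) -> float:
    """Real power (kW) = apparent power (kVA) x power factor."""
    return kva * power_factor


def line_capacity_kva(volts: float, amps: float, lines: int = 1) -> float:
    """Apparent power available on `lines` single-phase lines."""
    return volts * amps * lines / 1000.0


POWER_FACTOR = 0.8  # implied by the slide figures

generator_kw = apparent_to_real_kw(1250, POWER_FACTOR)   # about 1000 kW
ups_kw = apparent_to_real_kw(800, POWER_FACTOR)          # about 640 kW

# Assumption: the current rating is per line, three lines per rack.
rack_16a_kva = line_capacity_kva(220, 16, lines=3)       # about 10.56 kVA
rack_32a_kva = line_capacity_kva(220, 32, lines=3)       # about 21.12 kVA
```

Under that assumption, moving from 16 A to 32 A power controls doubles the power available to a rack, in line with the note that fully populated Xeon racks need the 32 A model.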
Cooling & Air Conditioning
RLS (Airwell) on the roof
~ 700 kW
Water cooling
Needs a “booster pump” (20 m from T1 to the roof)
Noise insulation
1 Air Conditioning Unit (uses 20% of RLS cooling power and controls humidity)
12 Local Cooling Systems (Hiross) in the computing room
WN typical Rack Composition
Power Controls (3U)
1 network switch (1-2U)
48 FE copper interfaces
2 GE fiber uplinks
34-36 1U WNs
Connected to network switch via FE
Connected to KVM system
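Two quantities follow directly from this rack layout: the rack units occupied and the ratio between the WN access bandwidth and the uplink bandwidth. The sketch below works them out; the function names are chosen for the example and nominal link speeds (100 Mbps for FE, 1000 Mbps for GE) are assumed.

```python
FE_MBPS = 100     # Fast Ethernet, nominal
GE_MBPS = 1000    # Gigabit Ethernet, nominal


def oversubscription(nodes: int, uplinks: int = 2) -> float:
    """Aggregate WN access bandwidth divided by aggregate uplink bandwidth."""
    return (nodes * FE_MBPS) / (uplinks * GE_MBPS)


def rack_units(nodes: int, switch_u: int, power_u: int = 3) -> int:
    """Rack units used by power controls, one switch and 1U worker nodes."""
    return power_u + switch_u + nodes


ratio_full_rack = oversubscription(36)        # 3600 Mbps of access over 2000 Mbps of uplink
smallest_rack = rack_units(34, switch_u=1)    # 3U + 1U + 34U
largest_rack = rack_units(36, switch_u=2)     # 3U + 2U + 36U
```

With 36 nodes the two GE uplinks are oversubscribed by a factor of about 1.8, so the rack only saturates its uplinks if most nodes transfer at full FE speed at the same time.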
Remote console control
Paragon UTM8 (Raritan)
8 analog (UTP/fiber) output connections
Supports up to 32 daisy chains of 40 nodes (UKVMSPD modules needed)
Costs: 6 KEuro + 125 Euro/server (UKVMSPD module)
IP-reach (expansion to support IP transport) evaluated but not used
Autoview 2000R (Avocent)
1 analog + 2 digital (IP transport) output connections
Supports connections to up to 16 nodes
Optional expansion to 16x8 nodes
Compatible with Paragon (“gateway” to IP)
Networking (1)
Main network infrastructure based on optical fibres (~ 20 km)
To ease adoption of new (high-performance) transmission technologies
To ensure better electrical insulation over long distances
Local (rack-wide) links with UTP (copper) cables
LAN has a “classical” star topology
GE core switch (Enterasys ER16)
NEW core switch (Black Diamond 10808) is in pre-production
120 Gbit fiber ports (scalable up to 480 ports)
12 10 Gbit Ethernet ports (scalable up to max 48 ports)
Farm up-links via GE trunk (Channel) to core switch
Disk servers directly connected to GE switch (mainly fibre)
Networking (2)
WNs connected via FE to rack switch (1 switch per rack)
Not a single brand for switches (as for WNs)
3 Extreme Summit 48 FE + 2 GE ports
3 Cisco 3550 48 FE + 2 GE ports
8 Enterasys 48 FE + 2 GE ports
10 Summit400 switches 48 GE copper + 2 GE ports + (2x10Gb ready)
Homogeneous characteristics
48 copper Ethernet ports
Support of main standards (e.g. 802.1q)
2 Gigabit up-links (optical fibers) to core switch
CNAF interconnected to GARR-G backbone at 1 Gbps.
Network Configuration
[Diagram: Tier1 network configuration (S.Zani). Labels shown: GARR; 1 Gb/s; SSR8600; Internal services; Disk Servers; 1st Floor; farm switches FarmSW1, FarmSW2(Dell), FarmSW3(IBM), FarmSW4(IBM3), FarmSW5(3Com), FarmSW6 to FarmSW12, FarmSWG1, FarmSWG2, LHCBSW1, Babar SW, Catalyst3550; NAS1, NAS2, NAS3, NAS4; SAN with F.C. links to AXUS, DELL, Infortrend, IBM FasT900 and STK; 131.154.99.121]
L2 Configuration
Each experiment has its own VLAN
Solution adopted for complete granularity
Port-based VLAN
VLAN identifiers are propagated across switches (802.1q)
Avoids recabling (or physically moving) machines to change farm topology
Level 2 isolation of farms
Possibility to define multi-tag (trunk) ports (for servers)
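The port-based VLAN scheme above can be modelled in a few lines. This is a sketch of the concept only: the VLAN numbers are hypothetical, and it assumes that the 802.1q uplinks between switches carry every VLAN, so membership alone decides reachability.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Switch:
    name: str
    access: dict[int, int] = field(default_factory=dict)       # port -> VLAN id
    trunks: dict[int, set[int]] = field(default_factory=dict)  # port -> tagged VLAN ids

    def assign(self, port: int, vlan: int) -> None:
        """Move a port (and the machine behind it) to another farm's VLAN."""
        self.access[port] = vlan

    def vlans_on(self, port: int) -> set[int]:
        """VLANs reachable through a port: many for a trunk, one for access."""
        if port in self.trunks:
            return self.trunks[port]
        return {self.access[port]} if port in self.access else set()


def same_l2_domain(a: tuple[Switch, int], b: tuple[Switch, int]) -> bool:
    """Two ports can exchange frames only if they share a VLAN."""
    return bool(a[0].vlans_on(a[1]) & b[0].vlans_on(b[1]))


# Hypothetical VLAN ids: 10 for one experiment, 20 for another.
sw1, sw2 = Switch("FarmSW1"), Switch("FarmSW2")
sw1.assign(1, 10)
sw2.assign(5, 20)
sw2.assign(5, 10)   # reassigning the port moves the node, with no recabling
```

A multi-tag port is represented by an entry in `trunks`, which is how a disk server could be visible to several farms at once.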
Power Switches
2 models used at Tier1:
• “Old” APC MasterSwitch Control Unit AP9224, controlling 3x8 outlets (9222 PDU) from 1 Ethernet
• “New” APC PDU Control Unit AP7951, controlling 24 outlets from 1 Ethernet
“Zero” Rack Unit (vertical mount)
Access to the configuration/control menu via serial/telnet/web/snmp
1 dedicated machine running APC Infrastruxure Manager Software (in progress)
Remote Power Distribution Unit
[Screenshot: APC Infrastruxure Manager Software showing the status of all Tier1 PDUs]
Computing units
~ 800 1U rack-mountable Intel dual-processor servers
800 MHz – 3.06 GHz
~ 700 WNs (~ 1400 CPUs) available for LCG
Tendering:
HPC farm with MPI
Servers interconnected via Infiniband
Opteron farm (near future)
Storage Resources
~200 TB raw disk space online.
NAS
NAS1+NAS4 (3Ware low cost)   Tot 4.2 TB
NAS2+NAS3 (Procom)           Tot 13.2 TB
SAN
Dell Powervault 660f         Tot 7 TB
Axus (Brownie)               Tot 2 TB
STK Bladestore               Tot 9 TB
Infortrend ES A16F-R         Tot 12 TB
IBM Fast-T 900               Tot 150 TB
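As a quick consistency check, the per-system totals can be summed and compared with the "~200 TB" headline. The pairing of SAN systems with capacities follows the order in which they are listed on the slide, which is an assumption of this sketch.

```python
# Capacities in TB, paired with systems in slide order (assumed pairing).
NAS_TB = {"NAS1+NAS4 (3Ware)": 4.2, "NAS2+NAS3 (Procom)": 13.2}
SAN_TB = {
    "Dell Powervault 660f": 7,
    "Axus (Brownie)": 2,
    "STK Bladestore": 9,
    "Infortrend ES A16F-R": 12,
    "IBM Fast-T 900": 150,
}

nas_total = sum(NAS_TB.values())      # about 17.4 TB
san_total = sum(SAN_TB.values())      # 180 TB
grand_total = nas_total + san_total   # about 197.4 TB

print(f"NAS {nas_total:.1f} TB + SAN {san_total} TB = {grand_total:.1f} TB")
```

The sum comes to roughly 197 TB, consistent with the headline figure, and shows that the IBM Fast-T 900 alone accounts for about three quarters of the raw space.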
STORAGE resource
[Diagram: Tier1 storage resources. Elements shown:]
CLIENT SIDE
STK180 with 100 LTO (10 TByte native)
CASTOR server + staging
WAN or TIER1 LAN
RAIDTEC: 1800 GByte, 2 SCSI interfaces
IDE NAS1, NAS4 (Nas4.cnaf.infn.it): 1800+2000 GByte, CDF LHCB
STK L5500 robot (max 5000), 6 LTO-2
Fileserver CMS (diskserv-cms-1)
PROCOM NAS2 (Nas2.cnaf.infn.it): 8100 GByte, VIRGO ATLAS
PROCOM NAS3 (Nas3.cnaf.infn.it): 4700 GByte, ALICE ATLAS
FAIL-OVER support
Gadzoox Slingshot FC switch, 18 ports
Fileserver Fcds2 (aliases diskserv-ams-1, diskserv-atlas-1)
Infortrend ES A16F-R: 12 TB
DELL POWERVAULT: 7100 GByte, 2 FC interfaces
AXUS BROWNIE: about 2200 GByte, 2 FC interfaces
STK BladeStore: about 10000 GByte, 4 FC interfaces
Storage management and access (1)
Tier1 storage resources accessible as classical storage or via grid
Non-grid disk storage accessible via NFS
Generic WNs also have an AFS client
NFS mount volumes configured via autofs and ldap
Unique configuration repository eases maintenance
In progress: integration of ldap configuration with Tier1 db data
Scalability issues with NFS
Experienced stalled mount points
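The benefit of a single configuration repository is that mount definitions are generated rather than edited by hand. The sketch below renders autofs indirect-map lines (key, options, location) from a list of records; the list stands in for the ldap repository, and the server names, export paths and mount options are hypothetical.

```python
def autofs_entry(key: str, server: str, export: str, options: str = "rw,hard") -> str:
    """Format one indirect-map line: key, mount options, NFS location."""
    return f"{key}\t-{options}\t{server}:{export}"


# Stand-in for the central repository; every value here is illustrative.
REPOSITORY = [
    {"key": "cms", "server": "nas-a.example.org", "export": "/export/cms"},
    {"key": "atlas", "server": "nas-b.example.org", "export": "/export/atlas"},
]


def build_map(records: list) -> str:
    """Render all records as map lines, sorted by key for stable output."""
    ordered = sorted(records, key=lambda r: r["key"])
    return "\n".join(autofs_entry(r["key"], r["server"], r["export"]) for r in ordered)
```

Moving a volume to another server then means changing one record, and every worker node picks up the new location at its next mount.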
Storage management and access (2)
Part of disk storage used as front-end to CASTOR
Balance between disk and CASTOR according to experiments' needs
1 stager for each experiment (installation in progress)
CASTOR accessible either directly or via grid
CASTOR SE available
ALICE Data Challenge used CASTOR architecture
Feedback to CASTOR team
Optimization needed for file restaging
Tier1 Database
Resource database and management interface
Postgres database as back end
Web interface (apache+mod_ssl+php)
Hw server characteristics
Sw server configuration
Server allocation
Possible direct access to db for some applications
Monitoring system
Nagios
Interface to configure switches and interoperate with the installation system.
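A resource database of this kind can be pictured as one table holding hardware characteristics, software configuration and current allocation for each server. The sketch below uses Python's built-in sqlite3 as a stand-in for Postgres; the table layout and column names are assumptions, not the actual Tier1 schema.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE server (
    name       TEXT PRIMARY KEY,   -- host name
    cpu_count  INTEGER NOT NULL,   -- hardware characteristics
    os_profile TEXT NOT NULL,      -- software configuration
    experiment TEXT                -- current allocation (NULL = unassigned)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def allocate(conn: sqlite3.Connection, name: str, experiment: str) -> None:
    """Assign a server to an experiment."""
    conn.execute("UPDATE server SET experiment = ? WHERE name = ?", (experiment, name))


def allocation_summary(conn: sqlite3.Connection) -> dict[str, int]:
    """Number of servers currently allocated to each experiment."""
    rows = conn.execute(
        "SELECT experiment, COUNT(*) FROM server "
        "WHERE experiment IS NOT NULL GROUP BY experiment"
    )
    return dict(rows.fetchall())
```

Tools such as the web interface, the monitoring system or the installation system could read from the same table, which is the point of keeping a single source of truth.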
Installation issues
Centralized installation system: LCFG (EDG WP4)
Integration with a central Tier1 db
Moving from one farm to another implies just a change of IP address (not name)
Single dhcp server for all VLANs
Support for DDNS (cr.cnaf.infn.it)
Investigating Quattor for future needs
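The rule that a node keeps its name and only changes IP address when it changes farm can be sketched with Python's ipaddress module. The subnets, farm names and the `AddressPlan` class are hypothetical; a real setup would do this through the dhcp server and DDNS.

```python
from ipaddress import ip_network

# Hypothetical per-farm (per-VLAN) subnets.
FARM_SUBNETS = {
    "cms": ip_network("10.10.0.0/24"),
    "atlas": ip_network("10.20.0.0/24"),
}


class AddressPlan:
    """Tracks which address each host name currently holds."""

    def __init__(self) -> None:
        self.leases = {}   # host name -> (farm, address)

    def move(self, name: str, farm: str) -> str:
        """Give `name` the first free address in the farm's subnet."""
        current = self.leases.get(name)
        if current is not None and current[0] == farm:
            return str(current[1])          # already in this farm
        used = {addr for f, addr in self.leases.values() if f == farm}
        for addr in FARM_SUBNETS[farm].hosts():
            if addr not in used:
                self.leases[name] = (farm, addr)
                return str(addr)
        raise RuntimeError(f"no free address in farm {farm}")
```

The host name is the stable key throughout, so configuration stored under that name stays valid while the address follows the VLAN.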
Our Desired Solution for Resource Access
SHARED RESOURCES among all experiments
Priorities and reservations managed by the scheduler
Most Tier1 computing machines installed as LCG Worker Nodes, with light modifications to support more VOs
Application software not installed directly on WNs but accessed from outside (NFS, AFS, …)
One or more Resource Managers to manage all the WNs in a centralized way
Standard way to access storage for each application