Update since Hepix Spring 2005 TRIUMF SITE REPORT Corrie Kost TRIUMF Site Report for HEPiX, SLAC, October 10-14,2005

Update since Hepix Spring 2005
TRIUMF
SITE REPORT
Corrie Kost
TRIUMF Site Report for HEPiX, SLAC, October 10-14,2005
Google Mini comes to TRIUMF
• US$2995 with 1 year of support
• Indexes up to 100,000 documents
• 220 different file formats
• Two 10/100 Ethernet ports
- 1st for normal operation
- 2nd for setup using a cross-over cable
• 120 GB Seagate drive
• 2 GB memory
• Maintenance via a special Google dial-up modem
Read a complete in-depth review at
http://www.anandtech.com/IT/showdoc.aspx?i=2523&p=2
The TRIUMF-CERN 1GbE Lightpath(s)
• 1 GbE circuit established April 18th, 2005
• 2nd GbE circuit established July 19th, 2005
• TRIUMF
• BCNET
• CANARIE
• SURFnet
• CERN
ATLAS Service Challenge
Servers
• 3 EM64T systems, each with:
- 2 GB memory
- hardware RAID: 3ware 9xxx SATA RAID controller
- 8 x 250 GB Seagate Barracuda 7200.8 drives in hardware RAID 5
• 1 dual Opteron 246 server with:
- 2 GB memory
- 3ware 9xxx SATA RAID controller
- 2 x 250 GB WD Caviar SE drives in hardware RAID 0
• 2 IBM 4560-SLX tape libraries (currently each with only 1 SDLT 320 tape drive)
• 1 borrowed EM64T system, used temporarily as an FTS server, with:
- 1 GB memory
- 2 x 80 GB SATA drives for the OS and for Oracle's needs
Storage
• 5.5+ TB disk
• 8+ TB tape
http://grid.triumf.ca/status/sc3.html
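The quoted disk total can be sanity-checked from the drive counts listed above. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB), one array per server, one drive's worth of parity per RAID 5 array, and no filesystem overhead; the helper name `usable_gb` is illustrative only.

```python
def usable_gb(level: int, drives: int, drive_gb: int) -> int:
    """Usable capacity of one array in GB, ignoring filesystem overhead."""
    if level == 0:
        return drives * drive_gb          # striping only, no redundancy
    if level == 5:
        return (drives - 1) * drive_gb    # one drive's worth of parity
    raise ValueError(f"RAID level {level} not handled in this sketch")


# Three servers with 8 x 250 GB in RAID 5, one server with 2 x 250 GB in RAID 0
raid5_total_gb = 3 * usable_gb(5, 8, 250)
raid0_total_gb = usable_gb(0, 2, 250)
total_tb = (raid5_total_gb + raid0_total_gb) / 1000

print(f"{total_tb:.2f} TB")  # 5.75 TB
```

Under these assumptions the result is in line with the "5.5+ TB disk" figure; the exact usable number depends on formatting overhead that the slide does not state.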
ATLAS Service Challenge
10 GbE Lightpath to CERN
[Figure: lightpath segments between TRIUMF and CERN, including the Atlantic Crossing, each marked √ or X]
10 GbE Lightpath to CERN
• Permanent 10 GbE TRIUMF-CERN lightpath expected around year-end 2005
• Foundry BigIron RX-4s at TRIUMF and BCNET
10 GbE Lightpath to CERN
TRIUMF WAN CWDM
Problem: the MRV CWDM needs 1550 ± 3 nm, but Foundry provides 1550 ± 15 nm
[Figure: CWDM link between TRIUMF and BCNET over a single fiber pair, 22 km]
• MRV CWDM 4-port optical mux: 4 x 1 GbE channels, with potential to add 2 more 1 GbE channels
• Wavelengths: 1610 nm, 1590 nm, 1570 nm, 1550 nm
• Channels shown: ORAN, WESTGRID, 2x CERN
• Passport 8600
• 10 GbE Foundry switch (CERN / Ottawa)
• SFP 2x GbE TDM
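The wavelength mismatch noted on this slide is an interval-containment question: every wavelength the transmitter may emit has to fall inside the window the mux channel accepts. The sketch below takes the ±3 nm and ±15 nm figures from the slide at face value; the `Band` type and `fits` helper are hypothetical names, not part of any vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """A wavelength window described by its center and tolerance, in nm."""
    center_nm: float
    tolerance_nm: float

    @property
    def low(self) -> float:
        return self.center_nm - self.tolerance_nm

    @property
    def high(self) -> float:
        return self.center_nm + self.tolerance_nm


def fits(tx: Band, channel: Band) -> bool:
    """True if the whole transmitter window lies inside the channel window."""
    return channel.low <= tx.low and tx.high <= channel.high


mrv_channel = Band(center_nm=1550, tolerance_nm=3)      # what the mux accepts
foundry_optic = Band(center_nm=1550, tolerance_nm=15)   # what the optic may emit

print(fits(foundry_optic, mrv_channel))  # False
```

A given optic may still work if its actual output happens to sit near 1550 nm; the check only shows that the specified range does not guarantee it.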
Raid5: Puzzling I/O results
Repeated reads of the same set of files (at 600 MB/sec): one or more files will “degrade”, typically after the set of 16 8 GB files has been read 1000 times.
Positive: read ~2 PB during 50 days, averaging about 600 MB/sec.
[Figure: 8 GB file read time (sec, axis 0 to 20) versus file number (same file every 16th; ticks 1 to 81), with a TRANSITION marked. Setup: 8 SATA disks on each of a pair of RAID5 RocketRaid 1820A controllers]
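The volumes and rates quoted for this test can be cross-checked with a few lines of arithmetic. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and continuous reading; the function names are illustrative.

```python
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15
SECONDS_PER_DAY = 86_400


def bytes_moved(rate_mb_s: float, days: float) -> float:
    """Total bytes read at a sustained rate over a number of days."""
    return rate_mb_s * MB * days * SECONDS_PER_DAY


def mean_rate_mb_s(total_bytes: float, days: float) -> float:
    """Average rate in MB/sec needed to read total_bytes in the given days."""
    return total_bytes / (days * SECONDS_PER_DAY) / MB


# Data read before degradation typically appears: 16 files x 8 GB x 1000 passes
volume_before_degrade = 16 * 8 * GB * 1000
print(volume_before_degrade / TB)                                       # 128.0
print(round(volume_before_degrade / (600 * MB) / SECONDS_PER_DAY, 1))   # 2.5 (days)

# Cross-check of "~2 PB during 50 days" against "about 600 MB/sec"
print(round(bytes_moved(600, 50) / PB, 2))   # 2.59
print(round(mean_rate_mb_s(2 * PB, 50)))     # 463
```

The two headline figures agree only roughly: a sustained 600 MB/sec for 50 days would give about 2.6 PB, while 2 PB in 50 days corresponds to about 463 MB/sec. Idle periods or rounding could account for the gap; the slide does not say.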
Unix Backups at TRIUMF
• Amanda system
– Dual Opteron 248, 2.2 GHz
• 2 GB memory
• 16 x 400 GB WD disks, ~6 TB (present system: 1.5 TB, ~10-day cycle)
• 2 LSI MegaRAID 8-disk controllers
• Disk-based, ~1 month of backups
– At least 2 full backups with daily incrementals
• 26-slot Overland DLT tape library
• SDLT 600 drive, 300 GB native capacity per tape
• 150 Linux machines (users: home directories; servers: full)
Cheap Hot-Swap Backup
• Promise SuperSwap 1100 enclosures
• Four 400 GB Seagate SATA drives
• Promise FastTrak S150 SX4 SATA controller
• RAID 5
• Linux 2.4.20-8 (Red Hat 9)
A disk can be removed and replaced at any time; the array rebuilds in the background.
Used to keep multiple live (daily) rsync copies (via Dirvish) of critical servers for ~1 month. See http://www.dirvish.com/
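Dirvish drives rsync to build dated snapshot trees in which unchanged files are hard links into the previous snapshot, so a month of daily copies costs little more than one full copy plus the changes. The sketch below is not Dirvish's code or configuration; it only illustrates the same idea with rsync's `--link-dest` option. The host name, paths, and 30-day window are assumptions, and the command is built but not executed.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def rsync_command(source: str, vault: str, today: date,
                  previous: Optional[date] = None) -> List[str]:
    """Build an rsync command that writes today's snapshot under `vault`.

    If a previous snapshot date is given, unchanged files are hard-linked
    to it via --link-dest instead of being copied again.
    """
    root = PurePosixPath(vault)
    cmd = ["rsync", "-a"]
    if previous is not None:
        cmd.append(f"--link-dest={root / previous.isoformat()}")
    cmd += [source, f"{root / today.isoformat()}/"]
    return cmd


def expired(snapshots: Iterable[date], today: date,
            keep_days: int = 30) -> List[date]:
    """Return snapshot dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in snapshots if d < cutoff)


# Hypothetical server and vault path, for illustration only
today = date(2005, 10, 10)
cmd = rsync_command("server.example.org:/home/", "/backup/server",
                    today, previous=today - timedelta(days=1))
print(" ".join(cmd))
print(expired([date(2005, 9, 1), date(2005, 9, 20)], today))
```

Keeping command construction separate from execution makes the retention logic easy to check before anything touches the backup disks.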
VOIP coming to TRIUMF
TRIUMF Ticketing System (Request Tracker)
TRIUMF Ticketing System (Request Tracker)
http://hepix.caspur.it/afs/hepix.org/projects.html
Conclusions / Observations
- Site services (Web, Email, Batch, Windows) all much more stable: new hardware, more memory (typically 4-8 GB) in servers
- Quad Opteron Sun I/O, using external SATA, still limited to below 1 GB/sec
- Read 16 8 GB files repeatedly, averaging over 600 MB/sec for ~2 PB
- Site “Backup” services still problematic
  - tape media capacity (will be outgrown in 2 years)
  - reliability (is SDLT robust?)
- Permanent 10 GbE TRIUMF-CERN service by year-end
- ATLAS Service Challenge targets being met for TRIUMF as a Tier-1
- Started using Plone for content management on the TRIUMF Web server
- Moving some phones to voice-over-IP
- Scientific Linux (3 & 4) still the preferred Linux OS at TRIUMF
- Moving away from distributed printing to print/scan-to-email/copy stations
TRIUMF Servers – May/2005
[Figure; labels: GPS time, MSR web, name, documents, CondorG, web, share, mail, file, IBM cluster, LCG storage, worker nodes, Fedora / SL mirror, IBM / share storage]
TRIUMF Servers – October/2005
[Figure; labels: STORM2, SUN1, Foundry, STORM1, Amanda backup (via disks)]