
AMS Computing Y2001-Y2002
Vitali Choutko, Alexei Klimentov
AMS Technical Interchange Meeting
MIT Jan 22-25, 2002
Outline

- AMS Production Farm
  - requirements
  - architecture
  - prototyping
  - test of HW and SW components
- HW and SW evaluation for AMS02 Ground Segment
- Data Transmission SW
- Y2002 Milestones
AMS Ground Centers

[Block diagram: POIC@MSFC AL (HOSC web server and X-terminals) exchanges commands, monitoring and H&S data, flight ancillary data, selected AMS science data and the commands archive with the POCC (TReK workstations, "voice" loop, video distribution, external communications). A GSE PC farm buffers data and retransmits them to the Science Operations Center, whose production farm performs MC production, NRT data processing, primary storage, archiving and distribution, with a data server and analysis facilities for science analysis. AMS Remote Centers handle MC production, data mirroring and archiving, RT data, commanding, monitoring and NRT analysis, and serve the AMS Stations.]
AMS Production Farm (requirements)

A complex system consisting of computing components (I/O nodes, worker nodes, data storage and networking switches) that should perform as a single system.

Requirements:
- Reliability: high (24 hours/day, 7 days/week)
- Performance goal: process data "quasi-online" (with a typical delay < 1 day)
- Disk space: 12 months of data kept "online"
- Minimal human intervention (automatic data handling, job control and book-keeping)
- System stability: months
- Scalability
- Price/performance
AMS Production Farm (considerations)

Considerations based on the AMS01 data processing experience and MC production in Y2000-2001:
- Uniform node architecture (dual-CPU Pentiums and AMDs)
- Uniform operating system (RedHat Linux)
- Computing capacity equivalent to 400x450 MHz PII processors (including 20% contingency and reprocessing)
- Total of 10 TByte of data stored online
- Two types of computers:
  - "Processing node" with cheap IDE disks used for transient data storage
  - "Server node" with IDE and SCSI RAID disks for persistent data storage
Y2001 milestones

- HW evaluation to make a choice of platform and architecture (the "official" AMS02 simulation/reconstruction code was used for the benchmarking)
- Functional goal: AMS01 STS91 data rerun and AMS02 MC production using the production farm prototype and SW
AMS02 Benchmarks 1)

Brand, CPU, Memory                                  | OS / Compiler            | "Sim" | "Rec"
----------------------------------------------------|--------------------------|-------|------
Intel PII dual-CPU 450 MHz, 512 MB RAM              | RH Linux 6.2 / gcc 2.95  | 1     | 1
Intel PIII dual-CPU 933 MHz, 512 MB RAM             | RH Linux 6.2 / gcc 2.95  | 0.54  | 0.54
Compaq, quad α-ev67 600 MHz, 2 GB RAM               | RH Linux 6.2 / gcc 2.95  | 0.58  | 0.59
AMD Athlon, 1.2 GHz, 256 MB RAM                     | RH Linux 6.2 / gcc 2.95  | 0.39  | 0.34
Intel Pentium IV 1.5 GHz, 256 MB RAM                | RH Linux 6.2 / gcc 2.95  | 0.44  | 0.58
Compaq dual-CPU PIV Xeon 1.7 GHz, 2 GB RAM          | RH Linux 6.2 / gcc 2.95  | 0.32  | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM                | Tru64 Unix / cxx 6.2     | 0.23  | 0.25
Elonex Intel dual-CPU PIV Xeon 2 GHz, 1 GB RAM      | RH Linux 7.2 / gcc 2.95  | 0.29  | 0.35
AMD Athlon 1800MP, dual-CPU 1.53 GHz, 1 GB RAM      | RH Linux 7.2 / gcc 2.95  | 0.24  | 0.23
8-CPU SUN-Fire-880, 750 MHz, 8 GB RAM               | Solaris 5.8 / C++ 5.2    | 0.52  | 0.45
24-CPU Sun Ultrasparc-III+, 900 MHz, 96 GB RAM      | RH Linux 6.2 / gcc 2.95  | 0.43  | 0.39
Compaq α-ev68 dual 866 MHz, 2 GB RAM                | RH Linux 7.1 / gcc 2.95  | 0.22  | 0.23

Execution time of the AMS "standard" jobs ("Sim" = simulation, "Rec" = reconstruction), relative to the dual-CPU PII 450 MHz reference (= 1); lower is better.

1) V.Choutko, A.Klimentov, AMS Note 2001-11-01
AMS01 STS91 Data Rerun (performance)
AMS02 Benchmarks (summary)

- α-ev68 866 MHz and AMD Athlon MP 1800+ have nearly the same performance and are the best candidates for the "AMS processing node" (the price of an α-ev68-based system is about twice that of a comparable AMD Athlon one).
- Although the PIV Xeon has lower performance, roughly a 15% overhead compared with the AMD Athlon MP 1800+, the high-reliability requirement for the "AMS server node" dictates the choice of a Pentium machine.
- SUN and COMPAQ SMP machines might be candidates for the AMS analysis computer (the choice is postponed until L-12 months, i.e. one year before launch).

Conclusion:
The total computing power of the AMS02 processing farm must be equivalent to 50 AMD Athlon MP 1800+ computers.
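As a rough cross-check of this conclusion (an illustration only, combining the 400 PII 450 MHz CPU-equivalent capacity requirement with the Athlon MP 1800+ benchmark ratio of roughly 0.23-0.24 from the table above):

```python
# Rough capacity check (illustrative only; numbers taken from the slides above).
required_pii450_cpus = 400        # required capacity in PII 450 MHz CPU equivalents
athlon_rel_time = 0.235           # Athlon MP 1800+ job time relative to PII 450 MHz (~0.23-0.24)

athlon_cpus = required_pii450_cpus * athlon_rel_time   # ~94 Athlon CPUs
dual_cpu_boxes = athlon_cpus / 2                       # ~47 dual-CPU machines

print(f"{athlon_cpus:.0f} Athlon CPUs -> {dual_cpu_boxes:.0f} dual-CPU boxes (~50 with margin)")
```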
Production Farm ("AMS processing node" architecture)

Processor                  dual-CPU 1.5+ GHz
Chip set                   currently AMD
Memory                     1 GB RAM
System disk                LVD SCSI
Disk controller            3Ware IDE RAID
Disks (transient storage)  6x120+ GB IDE
Ethernet adapters          "public": 100 Mbit/sec; "AMS private": 2x1 Gbit/sec
Production Farm ("AMS server node" architecture)

Processor                  dual-CPU 1.4+ GHz
Chip set                   currently Intel
Memory                     1 GB RAM
System disk                LVD SCSI
Disk controller            IPC SCSI RAID
Disks (permanent storage)  8x180+ GB SCSI
Disk controller            3Ware IDE RAID
Disks (transient storage)  7x120+ GB IDE
Ethernet adapters          "public": 100 Mbit/sec; "AMS private": 2x1 Gbit/sec
Production Farm HW

- Tape Drive ("raw" data backup)
  - IBM LTO Ultrium (connected to the "server node" prototype)
    - data transfer (write), RAID 5 array -> tape: 11 MByte/sec
    - data transfer (read), tape -> null device: 19 MByte/sec
    - data transfer (read), tape -> RAID 5 array: 11 MByte/sec
    - tape capacity: 200 GB
  (see also http://cscct.home.cern.ch/cscct/ultrium)
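For orientation, a back-of-the-envelope illustration of what these measured rates imply, using only the numbers on this slide:

```python
# Rough estimate: time to fill (and re-read) one 200 GB LTO Ultrium tape at the measured rates.
tape_capacity_gb = 200
write_rate_mb_s = 11          # RAID 5 array -> tape
read_rate_mb_s = 19           # tape -> null device

write_hours = tape_capacity_gb * 1000 / write_rate_mb_s / 3600
read_hours = tape_capacity_gb * 1000 / read_rate_mb_s / 3600
print(f"write one tape: ~{write_hours:.1f} h, read it back: ~{read_hours:.1f} h")
```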
AMS Science Operation Center Computing Facilities

[Block diagram, A.Klimentov, Jan 15, 2002: the Production Farm is organized in cells (#1 ... #8); each cell contains PC Linux processing nodes (dual-CPU 2 GHz+), a tape server, a disk server and a PC Linux server (dual-CPU 2 GHz, SCSI RAID) for archiving and staging, interconnected by a Gigabit (1 Gbit/sec) switch. Data servers (disk servers plus an MC data server) hold AMS data, NASA data, metadata and simulated data. Analysis Facilities (dual-CPU SMP machines from Compaq or SUN, plus PC Linux dual-CPU 2 GHz+ boxes) are attached through a Gigabit (1 Gbit/sec) switch.]
AMS Computing Y2001 (SW)

- AMS production process/process communication and control SW (PPCC) and monitoring
  - Client/server CORBA technology (V.Choutko)
  - Process Monitoring package (M.Boschini, V.Choutko, A.Klimentov)
- Data handling
  - ORACLE DB to store metadata and catalogues (M.Boschini, A.Klimentov)
- Data transmission package
  - Based on bbftp (A.Elin, A.Klimentov, AMS Note 2001-11-02)
AMS Production Highlights

- Excellent HW stability (uptime more than 3 months)
- AMS01 STS91 data rerun (10 Linux boxes, 19 CPUs)
- Average efficiency 95% (CPU time / elapsed time)
- Process communication and control via CORBA
- LSF for process submission
- Oracle server on AS4100 Alpha and Oracle clients on Linux
- Oracle RDBMS (a schematic illustration follows below):
  - Tag DB with 100M entries
  - Conditions DB with 100K entries
  - Bookkeeping
    - Production status
    - Runs history
    - File catalogues
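For illustration only, a minimal sketch of what a runs-history plus file-catalogue bookkeeping structure of this kind could look like; the table and column names are hypothetical, and sqlite3 is used only to keep the example self-contained (the production system uses Oracle):

```python
#!/usr/bin/env python
# Illustrative bookkeeping sketch: a runs-history table and a file catalogue.
# Table/column names are hypothetical; the real system is an Oracle RDBMS.
import sqlite3

db = sqlite3.connect("bookkeeping_demo.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS runs (
    run_id      INTEGER PRIMARY KEY,
    status      TEXT,        -- e.g. 'queued', 'processing', 'done'
    started_at  TEXT,
    finished_at TEXT
);
CREATE TABLE IF NOT EXISTS file_catalogue (
    file_name   TEXT PRIMARY KEY,
    run_id      INTEGER REFERENCES runs(run_id),
    size_bytes  INTEGER,
    location    TEXT         -- e.g. host:/path of the node holding the file
);
""")

# Register one run and one of its output files.
db.execute("INSERT OR REPLACE INTO runs VALUES (1001, 'done', '2002-01-20', '2002-01-21')")
db.execute("INSERT OR REPLACE INTO file_catalogue VALUES "
           "('run1001.ntuple', 1001, 2000000000, 'pcams01:/data')")
db.commit()

for row in db.execute("SELECT file_name, location FROM file_catalogue WHERE run_id = 1001"):
    print(row)
```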
Data Transmission SW

- High-rate data transfer between MSFC and POCC/SOC, between POCC and SOC, and between SOC and the master-copy repository(ies) will become of paramount importance 1) (tests with TReK between MIT and CERN 2); TReK is the best candidate for AMS commanding and for transferring data samples)
- What should be used for the bulk data transfer?
- Why not FileTransferProtocol (ftp) or ncftp, etc.?
  - to speed up data transfer
  - to encrypt sensitive data and not encrypt bulk data
  - to run in batch mode with automatic retry in case of failure (see the sketch below)
- ... started to look around and came up with bbftp in September (bbftp was developed in BaBar and is used to transmit data from SLAC to IN2P3@Lyon); adapted it for AMS and wrote service and control programs

1) A.Elin, A.Klimentov, AMS Note 2001-11-02
2) P.Fisher, A.Klimentov, AMS Note 2001-05-02
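As an illustration of the batch-mode / automatic-retry requirement above, here is a minimal sketch (not the actual AMS service and control programs) of a wrapper that retries a bbftp transfer; the bbftp command-line options used (-u, -p, -e) are assumed to match the bbftp release in use:

```python
#!/usr/bin/env python
# Minimal sketch of a batch transfer with automatic retry (illustrative only;
# the real AMS service/control programs are described in AMS Note 2001-11-02).
import subprocess
import time

def transfer(local_path, remote_path, host, user, streams=4, max_retries=5):
    """Push one file with bbftp, retrying on failure with a growing back-off."""
    # Assumed bbftp options: -u user, -p parallel streams, -e control command.
    cmd = ["bbftp", "-u", user, "-p", str(streams),
           "-e", f"put {local_path} {remote_path}", host]
    for attempt in range(1, max_retries + 1):
        if subprocess.call(cmd) == 0:
            return True                      # transfer succeeded
        time.sleep(60 * attempt)             # wait before the next attempt
    return False                             # give up after max_retries

if __name__ == "__main__":
    ok = transfer("run001.raw", "/ams/raw/run001.raw", "soc.example.org", "amsdaq")
    print("transferred" if ok else "failed")
```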
Data Transmission SW (the inside details)

Server:
- copy data files between directories (optional)
- scan data directories and make a list of files to be transmitted
- purge successfully transmitted files and do book-keeping of the transmission sessions
(a minimal sketch of this server loop follows below)

Client:
- periodically connect to the server and check if new data are available
- bbftp the new data and update the transmission status in the catalogues
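A minimal sketch of the server-side loop described above; the directory names, the list files and the "transmitted" marker are hypothetical and only illustrate the scan / publish / purge / book-keep cycle:

```python
#!/usr/bin/env python
# Illustrative sketch of the server-side data-transmission loop:
# scan the outgoing directory, publish a list of files to transmit,
# purge files once the client has confirmed them, and keep a simple log.
import os
import time

OUTGOING = "/data/outgoing"          # hypothetical directory with files to send
SENT_LIST = "/data/ctl/transmitted"  # hypothetical list of files confirmed by the client
TODO_LIST = "/data/ctl/to_transmit"  # hypothetical list published to the client

def read_list(path):
    return set(open(path).read().split()) if os.path.exists(path) else set()

while True:
    transmitted = read_list(SENT_LIST)
    available = set(os.listdir(OUTGOING))

    # 1) make the list of files still to be transmitted
    with open(TODO_LIST, "w") as f:
        f.write("\n".join(sorted(available - transmitted)))

    # 2) purge files the client has confirmed, and book-keep the session
    for name in available & transmitted:
        os.remove(os.path.join(OUTGOING, name))
        print(time.ctime(), "purged", name)

    time.sleep(300)   # rescan every 5 minutes
```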
Data Transmission SW (tests)

Location               | Line, Mbit/sec | Program | Rate, Mbit/sec
-----------------------|----------------|---------|---------------
Prevessin -> Meyrin    | 10             | ftp     | 5.8
                       |                | bbftp   | 7.8
                       |                | bbcp    | 8.0
Prevessin -> Prevessin | 100            | ftp     | 21.0
                       |                | bbftp   | 40.0
                       |                | bbcp    | 42.0
Prevessin -> Milano 1) | 16             | bbftp   | 6.0

Server and client: dual-CPU Intel PIII, Linux OS; bbftp release 2.1.2.
Transmitted AMS01 "raw" data and AMS01 data summary files (Ntuples); duration 12-24 h.

1) M.Boschini installed bbftp in INFN Milano
AMS Computing Y2001
Y2001 milestones are fulfilled
AMS Computing Y2002

- Build the AMS02 "production cell" and use it for MC production
- Build the AMS02 "analysis cell"
- AMS02 process and data control SW (migrate from open-source CORBA to the licensed version)
- "bbftp" tests between MIT and CERN, and between GSC@MSFC and MIT/CERN
- Evaluate an archiving and staging system for AMS (Jan 2002: 4 TB)
AMS Computing Y2002 ("production cell")

Processing Nodes #1-#5 (dual-CPU AMD):
- Dual-CPU Athlon 1900+
- 1 GB RAM
- 3Ware IDE RAID with 6x120 GB Western Digital disks
- 1 Gbit/sec Ethernet (AMS private network)
- 2x100 Mbit/sec Ethernet

Server Node #1 (dual-CPU Intel):
- Dual-CPU Xeon or PIII
- 1 GB RAM
- 3Ware IDE RAID with 7x120 GB Western Digital disks
- IPC SCSI RAID with 8x160 GB WD disks
- 1 Gbit/sec Ethernet
- 2x100 Mbit/sec Ethernet

[Diagram: the five processing nodes (IDE RAID) connect to the server node (IDE and SCSI RAID) over the 1 Gbit/sec AMS private network; the server node is also attached to the 100 Mbit/sec CERN backbone, used by analysis programs.]
AMS Computing Y2002 ("analysis cell")

- 2 dual-CPU AMD Athlon machines dedicated to AMS analysis and Geant4 simulation.
- Architecture is similar to the "AMS processing node" (but with a 4-channel IDE RAID controller and 4x120 GB WD HDDs).
Y2002 Milestones

- AMS computers upgrade (1Q)
- AMS "production cell" (1Q)
- AMS "analysis cell" (2Q)
- Data transmission tests (2Q)
- Evaluation of archiving and staging systems (technical meeting with CASPUR Feb/Mar, system choice 3Q)
- AMS data handling and PPCC SW, licensed CORBA package (3Q)
Growth of computers and data storage in Science Operation Center

[Chart, 1997-2004: projected growth of data storage (TB), computing capacity in "AMSspecs" (number of PII 450 MHz CPU equivalents) and number of nodes; vertical scale 0-400.]