Transcript Document
AMS Computing Y2001-Y2002
Vitali Choutko, Alexei Klimentov
AMS Technical Interchange Meeting, MIT, Jan 22-25, 2002

Outline
- AMS Production Farm: requirements, architecture, prototyping, tests of HW and SW components
- HW and SW evaluation for AMS02
- Ground Segment Data Transmission SW
- Y2002 Milestones

AMS Ground Centers
Block diagram of the AMS ground segment: POIC@MSFC (AL) with the HOSC web server and X-terminal, external communications and TReK workstations; the POCC ("voice" loop, video distribution, TReK WS, command archive); the GSE PC farm, which buffers data and retransmits them to the SOC; the Science Operations Center with its production farm (MC production, NRT data processing, primary storage, archiving and distribution), data server and science analysis facilities; and the AMS remote centers and AMS stations (MC production, data mirror and archiving, RT data, commanding, monitoring, NRT analysis). The links carry commands, monitoring and H&S data, flight ancillary data and selected AMS science data.

AMS Production Farm (requirements)
A complex system consisting of computing components, including I/O nodes, worker nodes, data storage and networking switches. It should perform as a single system.
Requirements:
- Reliability: high (24 hours/day, 7 days/week)
- Performance goal: process data "quasi-online" (with a typical delay < 1 day)
- Disk space: 12 months of data kept online
- Minimal human intervention (automatic data handling, job control and bookkeeping)
- System stability: months
- Scalability
- Price/performance

AMS Production Farm (considerations)
Considerations based on AMS01 data-processing experience and the Y2000-2001 MC production:
- Uniform node architecture (dual-CPU Pentium and AMD machines)
- Uniform operating system (RedHat Linux)
- Computing capacity equivalent to 400 PII 450 MHz processors (including 20% contingency and reprocessing)
- Total of 10 TB of data stored online
- Two types of computers:
  - "Processing node" with cheap IDE disks used for transient data storage
  - "Server node" with IDE and SCSI RAID disks for persistent data storage

Y2001 Milestones
- HW evaluation to choose the platform and architecture (the "official" AMS02 simulation/reconstruction code was used for the benchmarking)
- Functional goal: AMS01 STS91 data rerun and AMS02 MC production using the production farm prototype and SW

AMS02 Benchmarks 1)
Execution time of the AMS "standard" job (simulation "Sim" and reconstruction "Rec"), relative to the dual-CPU PII 450 MHz reference (= 1).

Brand, CPU, Memory | OS / Compiler | Sim | Rec
Intel PII dual-CPU 450 MHz, 512 MB RAM | RH Linux 6.2 / gcc 2.95 | 1 | 1
Intel PIII dual-CPU 933 MHz, 512 MB RAM | RH Linux 6.2 / gcc 2.95 | 0.54 | 0.54
Compaq quad α-ev67 600 MHz, 2 GB RAM | RH Linux 6.2 / gcc 2.95 | 0.58 | 0.59
AMD Athlon 1.2 GHz, 256 MB RAM | RH Linux 6.2 / gcc 2.95 | 0.39 | 0.34
Intel Pentium IV 1.5 GHz, 256 MB RAM | RH Linux 6.2 / gcc 2.95 | 0.44 | 0.58
Compaq dual-CPU PIV Xeon 1.7 GHz, 2 GB RAM | RH Linux 6.2 / gcc 2.95 | 0.32 | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM | Tru64 Unix / cxx 6.2 | 0.23 | 0.25
Elonex Intel dual-CPU PIV Xeon 2 GHz, 1 GB RAM | RH Linux 7.2 / gcc 2.95 | 0.29 | 0.35
AMD Athlon 1800MP dual-CPU 1.53 GHz, 1 GB RAM | RH Linux 7.2 / gcc 2.95 | 0.24 | 0.23
8-CPU SUN Fire 880, 750 MHz, 8 GB RAM | Solaris 5.8 / C++ 5.2 | 0.52 | 0.45
24-CPU Sun UltraSPARC-III+, 900 MHz, 96 GB RAM | RH Linux 6.2 / gcc 2.95 | 0.43 | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM | RH Linux 7.1 / gcc 2.95 | 0.22 | 0.23

1) V.Choutko, A.Klimentov, AMS note 2001-11-01

AMS01 STS91 Data Rerun (performance)
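As a rough cross-check of how the benchmark figures above translate the 400 PII 450 MHz CPU requirement into farm size (the summary that follows quotes about 50 Athlon MP 1800+ computers), here is a short sketch. It assumes, beyond what the slides state, that the relative execution times scale linearly with throughput and that both CPUs of a dual-CPU node are fully usable.

    # Rough farm-size cross-check (illustrative; assumptions noted above).
    pii450_cpus_needed = 400                  # required capacity, in PII 450 MHz CPU equivalents
    athlon_rel_time = (0.24 + 0.23) / 2.0     # Athlon MP 1800+ relative execution time (Sim/Rec)

    # One dual-CPU Athlon MP 1800+ node is then roughly equivalent to this many PII 450 CPUs:
    pii450_cpus_per_node = 2.0 / athlon_rel_time          # ~8.5

    nodes_needed = pii450_cpus_needed / pii450_cpus_per_node
    print(round(pii450_cpus_per_node, 1), round(nodes_needed))   # ~8.5 CPUs/node, ~47 nodes

About 47 dual-CPU nodes come out of this estimate, consistent with the roughly 50 computers quoted on the summary slide once some margin is included.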
AMS02 Benchmarks (summary)
- The α-ev68 866 MHz and the AMD Athlon MP 1800+ have nearly the same performance and are the best candidates for the "AMS processing node" (a system based on the α-ev68 costs about twice as much as a comparable Athlon system).
- Although the PIV Xeon has lower performance (about 15% overhead compared with the AMD Athlon MP 1800+), the high-reliability requirement for the "AMS server node" dictates the choice of a Pentium machine.
- SUN and COMPAQ SMP machines might be candidates for the AMS analysis computer (the choice is postponed until L-12 months).
- Conclusion: the total power of the AMS02 processing farm must be equivalent to 50 AMD Athlon MP 1800+ computers.

Production Farm ("AMS processing node" architecture)
- Processor: dual-CPU 1.5+ GHz
- Chip set: currently AMD
- Memory: 1 GB RAM
- System disk: LVD SCSI
- Disk controller: 3Ware IDE RAID
- Disks (transient storage): 6 x 120+ GB IDE
- Ethernet adapters: "public" 100 Mbit/s; "AMS private" 2 x 1 Gbit/s

Production Farm ("AMS server node" architecture)
- Processor: dual-CPU 1.4+ GHz
- Chip set: currently Intel
- Memory: 1 GB RAM
- System disk: LVD SCSI
- Disk controller / disks (permanent storage): IPC SCSI RAID, 8 x 180+ GB SCSI
- Disk controller / disks (transient storage): 3Ware IDE RAID, 7 x 120+ GB IDE
- Ethernet adapters: "public" 100 Mbit/s; "AMS private" 2 x 1 Gbit/s

Production Farm HW: Tape Drive ("raw" data backup)
- IBM LTO Ultrium, connected to the "server node" prototype
- Data transfer (write): RAID 5 array -> tape, 11 MB/s
- Data transfer (read): tape -> null device, 19 MB/s; tape -> RAID 5 array, 11 MB/s
- Tape capacity: 200 GB
- See also http://cscct.home.cern.ch/cscct/ultrium

AMS Science Operation Center Computing Facilities
Block diagram (A.Klimentov, Jan 15, 2002): a production farm of PC Linux 2x2 GHz+ nodes with tape servers and a disk server on a Gigabit (1 Gbit/s) switch; an archiving and staging cell based on a PC Linux server (2x2 GHz, SCSI RAID); data servers with disk servers for AMS data, NASA data, metadata and simulated (MC) data on a Gigabit (1 Gbit/s) switch; and analysis facilities (2xSMP Compaq or SUN machines plus PC Linux 2x2 GHz+ nodes) on a Gigabit (1 Gbit/s) switch.

AMS Computing Y2001 (SW)
- AMS production process/process communication and control SW (PPCC) and monitoring:
  - client/server CORBA technology (V.Choutko)
  - process monitoring package (M.Boschini, V.Choutko, A.Klimentov)
- Data handling:
  - ORACLE DB to store metadata and catalogues (M.Boschini, A.Klimentov)
- Data transmission package:
  - based on bbftp (A.Elin, A.Klimentov, AMS note 2001-11-02)

AMS Production Highlights
- Excellent HW stability (uptime of more than 3 months)
- AMS01 STS91 data rerun (10 Linux boxes, 19 CPUs), average efficiency 95% (CPU time / elapsed time)
- Process communication and control via CORBA
- LSF for process submission
- Oracle server on an AS4100 Alpha and Oracle clients on Linux
- Oracle RDBMS:
  - Tag DB with 100M entries
  - Conditions DB with 100K entries
  - Bookkeeping: production status, runs history, file catalogues
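To make the bookkeeping items above more concrete, here is a minimal sketch of how a production job could register an output file and update the run status in the Oracle catalogues. It is purely illustrative: the table and column names are invented, not the actual AMS schema, and `connection` stands for any Python DB-API 2.0 connection (for example one opened with an Oracle client module).

    # Illustrative bookkeeping sketch (invented schema, not the AMS production DB).
    def register_output(connection, run_number, file_name, n_events, status="DONE"):
        cur = connection.cursor()
        # file catalogue: one row per produced data file
        cur.execute(
            "INSERT INTO file_catalogue (run_number, file_name, n_events) "
            "VALUES (:1, :2, :3)",
            (run_number, file_name, n_events),
        )
        # runs history / production status
        cur.execute(
            "UPDATE run_history SET status = :1 WHERE run_number = :2",
            (status, run_number),
        )
        connection.commit()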
Data Transmission SW 1)
- High-rate data transfer between MSFC and POCC/SOC, between POCC and SOC, and between the SOC and the master-copy repository(ies) will become of paramount importance. (Tests with TReK between MIT and CERN 2); TReK is the best candidate for AMS commanding and for transferring data samples.)
- What should be used for the bulk data transfer? Why not FTP, ncftp, etc.? We want:
  - to speed up data transfer
  - to encrypt sensitive data and leave bulk data unencrypted
  - to run in batch mode with automatic retry in case of failure
- We started to look around and came up with bbftp in September (bbftp was developed in BaBar and is used to transmit data from SLAC to IN2P3 in Lyon); we adapted it for AMS and wrote service and control programs.

1) A.Elin, A.Klimentov, AMS note 2001-11-02
2) P.Fisher, A.Klimentov, AMS note 2001-05-02

Data Transmission SW (the inside details)
Server:
- copy data files between directories (optional)
- scan data directories and make a list of files to be transmitted
- purge successfully transmitted files and keep books on transmission sessions
Client:
- periodically connect to the server and check whether new data are available
- bbftp the new data and update the transmission status in the catalogues
(A schematic sketch of this client loop appears after the Y2002 milestones below.)

Data Transmission SW (tests)
Location | Line (Mbit/s) | Program | Rate (Mbit/s)
Prevessin -> Meyrin | 10 | ftp | 5.8
 | | bbftp | 7.8
 | | bbcp | 8.0
Prevessin -> Prevessin | 100 | ftp | 21.0
 | | bbftp | 40.0
 | | bbcp | 42.0
Prevessin -> Milano 1) | 16 | bbftp | 6.0

Server and client: dual-CPU Intel PIII, Linux OS; bbftp release 2.1.2. Transmitted AMS01 "raw" data and AMS01 data summary files (Ntuples); duration 12-24 h.
1) M.Boschini installed bbftp in INFN Milano.

AMS Computing Y2001
Y2001 milestones are fulfilled.

AMS Computing Y2002
- Build the AMS02 "production cell" and use it for MC production
- Build the AMS02 "analysis cell"
- AMS02 process and data control SW (migrate from open-source CORBA to the licensed version)
- "bbftp" tests between MIT and CERN, and between GSC@MSFC and MIT/CERN
- Evaluate an archiving and staging system for AMS (Jan 2002: 4 TB)

AMS Computing Y2002 ("production cell")
Processing Nodes #1-#5:
- dual-CPU AMD Athlon 1900+, 1 GB RAM
- 3Ware IDE RAID with 6 x 120 GB Western Digital disks
- 1 Gbit/s Ethernet ("AMS private"), 2 x 100 Mbit/s Ethernet
Server Node #1:
- dual-CPU Intel Xeon or PIII, 1 GB RAM
- 3Ware IDE RAID with 7 x 120 GB Western Digital disks
- IPC SCSI RAID with 8 x 160 GB WD disks
- 1 Gbit/s Ethernet, 2 x 100 Mbit/s Ethernet
Analysis programs access the cell over the 100 Mbit/s CERN backbone.

AMS Computing Y2002 ("analysis cell")
- 2 dual-CPU AMD Athlon machines dedicated to AMS analysis and Geant4 simulation
- Architecture similar to the "AMS processing node", but with a 4-channel IDE RAID controller and 4 x 120 GB WD HDDs

Y2002 Milestones
- AMS computers upgrade (1Q)
- AMS "production cell" (1Q)
- AMS "analysis cell" (2Q)
- Data transmission tests (2Q)
- Evaluation of archiving and staging systems (technical meeting with CASPUR in Feb/Mar; system choice in 3Q)
- AMS data handling and PPCC SW; licensed CORBA package (3Q)
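Referring back to the client/server scheme on the "inside details" slide, here is a minimal sketch of the client-side polling loop. It is a sketch only: the host name, directory layout, list-file mechanism and the exact bbftp command line are assumptions made for illustration, not the actual AMS service and control programs of AMS note 2001-11-02.

    import subprocess
    import time

    SERVER = "ams-gse.example.org"     # hypothetical server host
    LIST_CMD = ["ssh", SERVER, "cat", "/data/outgoing/tobesent.list"]  # assumed listing mechanism
    LOCAL_DIR = "/data/incoming"
    POLL_SECONDS = 300

    def fetch(remote_file):
        """Transfer one file with bbftp; the options shown follow the bbftp
        client documentation and may differ from the AMS setup."""
        cmd = ["bbftp", "-u", "ams", "-e", "get %s %s/" % (remote_file, LOCAL_DIR), SERVER]
        return subprocess.run(cmd).returncode == 0

    def update_catalogue(remote_file, ok):
        # placeholder for the Oracle transmission-status bookkeeping
        print(("transferred " if ok else "FAILED ") + remote_file)

    while True:
        listing = subprocess.run(LIST_CMD, capture_output=True, text=True)
        for remote_file in listing.stdout.split():
            update_catalogue(remote_file, fetch(remote_file))
        time.sleep(POLL_SECONDS)

The automatic retry on failure and the per-session bookkeeping mentioned on the slides would sit on top of such a loop.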
Growth of computers and data storage in the Science Operation Center
Chart covering the years 1997-2004: data storage (TB), AMS specs (number of PII 450 MHz CPUs) and number of nodes.