An Overview of PHENIX Computing


An Overview of PHENIX Computing
Ju Hwan Kang (Yonsei Univ.) and Jysoo Lee (KISTI)
International HEP DataGrid Workshop
November 8 ~ 9, 2002
Kyungpook National University, Daegu, Korea
RHIC
• Configuration: two concentric superconducting magnet rings
  (3.8 km circumference) with 6 interaction regions
• Ion beams:
  Au + Au (or p + A)
  √s = 200 GeV/nucleon
  luminosity = 2 × 10²⁶ cm⁻² s⁻¹
• Polarized protons:
  p + p
  √s = 500 GeV
  luminosity = 1.4 × 10³¹ cm⁻² s⁻¹
• Experiments: PHENIX, STAR, PHOBOS, BRAHMS
PHENIX Experiment
• Physics goals
  Search for the Quark-Gluon Plasma
  Hard scattering processes
  Spin physics
• Experimental apparatus
  PHENIX Central Arms (e, γ, hadrons)
  PHENIX Muon Arms (μ)
PHENIX Data Size
Peak DAQ bandwidth in PHENIX is 20 MB/sec.

• Ion beams (Au + Au)
  1. Minimum bias events (0.16 MB/event):
     Raw event rate = 1400 Hz (224 MB/sec)
     Trigger rate = 124 Hz (20 MB/sec)
  2. Central events (0.4 MB/event):
     Trigger rate = 50 Hz (20 MB/sec)

• Polarized protons (p + p)
  All events (25 KB/event):
     Raw event rate = 250 kHz (6250 MB/sec)
     Trigger rate = 800 Hz (20 MB/sec)
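The bandwidth figures above are simply event size multiplied by trigger rate. A minimal C++ sketch of that arithmetic (numbers taken from this slide; the program itself is only illustrative) confirms that each trigger configuration fills the 20 MB/sec DAQ bandwidth:

    #include <cstdio>

    // Bandwidth (MB/sec) = event size (MB/event) x event rate (Hz).
    static double bandwidthMB(double eventSizeMB, double rateHz) {
        return eventSizeMB * rateHz;
    }

    int main() {
        // Au + Au minimum bias: 0.16 MB/event at a 124 Hz trigger rate
        std::printf("Au+Au min. bias: %5.1f MB/sec\n", bandwidthMB(0.16, 124.0));
        // Au + Au central: 0.4 MB/event at 50 Hz
        std::printf("Au+Au central  : %5.1f MB/sec\n", bandwidthMB(0.40, 50.0));
        // p + p triggered: 25 KB/event (0.025 MB) at 800 Hz
        std::printf("p+p triggered  : %5.1f MB/sec\n", bandwidthMB(0.025, 800.0));
        return 0;
    }

All three come out at roughly 20 MB/sec, matching the quoted peak DAQ bandwidth.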
RCF
• The RHIC Computing Facility (RCF) provides computing facilities for the
  four RHIC experiments (PHENIX, STAR, PHOBOS, BRAHMS).
• RCF typically receives ~30 MB/sec (a few TB/day) from the PHENIX
  counting house alone, over a Gigabit network, so RCF is required to have
  sophisticated data storage and data handling systems.
• RCF has established an AFS cell for sharing files with remote
  institutions, and NFS is the primary means through which data is made
  available to users at RCF.
• A similar facility has been established at RIKEN (CC-J) as a regional
  computing center for PHENIX.
• A compact but effective system is also installed at Yonsei.
PHENIX Computing Environment
• Linux OS with ROOT
• PHOOL (PHenix Object Oriented Library): a C++ class library built on
  top of ROOT
• GNU build system
[Data-flow diagram: the Counting House sends raw data to HPSS; mining & staging moves raw data from HPSS to a pretty big disk feeding the Reconstruction Farm; reconstruction uses calibrations & run info and a Tag DB held in the database, and writes DST data to big disk, from which analysis jobs (with local disks) read.]
Data Carousel using HPSS
• To handle an annual volume of 500 TB from PHENIX alone, the High
  Performance Storage System (HPSS) is used as a hierarchical storage
  system with tape robotics and a disk system.
• An IBM computer (AIX 4.2) organizes users' retrieval requests so that
  data can be pulled from tape without chaos.
• PHENIX used ten 9840 and eight 9940 drives from STK.
• The tape media costs about $1/GB.
Data carousel architecture
[Architecture diagram: a client on rmine0x submits a file list to the carousel server, which tracks requests in a mySQL database; the "ORNL" software data mover stages files from HPSS tape into the HPSS cache, and pftp transfers deliver them to NFS disk or to CAS local disk on the CAS machines.]
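The point of the carousel server is to serialize many users' requests so that each tape is mounted once and read through, rather than remounted file by file. The sketch below illustrates only that batching idea; the FileRequest fields and the grouping policy are assumptions, not the actual ORNL/carousel software.

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical view of one entry in the carousel's filelist table.
    struct FileRequest {
        std::string user;
        std::string hpssPath;     // file to stage out of HPSS
        std::string tapeVolume;   // cartridge holding the file
        std::string destination;  // NFS disk or CAS local disk
    };

    int main() {
        std::vector<FileRequest> requests = {
            {"userA", "/hpss/raw/run1001.prdf", "VOL001", "/nfs/data"},
            {"userB", "/hpss/raw/run2002.prdf", "VOL002", "/cas/localdisk"},
            {"userA", "/hpss/raw/run1002.prdf", "VOL001", "/nfs/data"},
        };

        // Group pending requests by tape volume so each tape is mounted once.
        std::map<std::string, std::vector<FileRequest>> byTape;
        for (const auto& r : requests)
            byTape[r.tapeVolume].push_back(r);

        // Serve one tape at a time; the real system moves the data with pftp.
        for (const auto& [tape, files] : byTape) {
            std::cout << "mount tape " << tape << "\n";
            for (const auto& f : files)
                std::cout << "  stage " << f.hpssPath
                          << " -> " << f.destination << "\n";
        }
        return 0;
    }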
Disk Storage at RCF
• The storage resources are provided by a group of SUN NFS servers with
  60 TB of SAN-based RAID arrays, backed by a series of StorageTek tape
  libraries managed by HPSS.
• Vendors of the storage disks are Data Direct, MTI, ZZYZX, and LSI.
Linux Farms at RCF
• CRS (Central Reconstruction Server) farms are dedicated to processing
  raw event data into reconstructed events (strictly batch systems, not
  available to general users).
• CAS (Central Analysis Server) farms are dedicated to the analysis of the
  reconstructed events (a mix of interactive and batch systems).
• LSF, the Load Sharing Facility, manages batch jobs.
• There are about 600 machines (dual CPU, 1 GB memory, 30 GB local disks)
  at RCF, and about 200 of them are allocated to PHENIX.
Offline Software Technology
• Analysis framework
  - C++ class library (PHOOL) based on top of ROOT
    - base class for analysis modules
    - "tree" structure for holding, organizing data; can contain raw data,
      DSTs, transient results
    - uses ROOT I/O
• Database
  - using Objectivity OODB for calibration data, file catalog, run info, etc.
  - expecting ~100 GB/year of calibration data
• Code development environment
  - based heavily on GNU tools (autoconf, automake, libtool)
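To illustrate the "tree structure plus ROOT I/O" point, here is a minimal standalone ROOT example; the branch names and contents are made up, and PHOOL wraps this kind of I/O in its own node-tree and module classes rather than using bare TTrees like this:

    // Minimal ROOT I/O sketch: write a few DST-like records into a TTree.
    #include "TFile.h"
    #include "TTree.h"

    int main() {
        TFile file("dst_example.root", "RECREATE");
        TTree tree("T", "toy DST tree");

        int   run   = 0;
        int   event = 0;
        float ptSum = 0;

        // One branch per quantity; the leaf list gives the type (I = int, F = float).
        tree.Branch("run",   &run,   "run/I");
        tree.Branch("event", &event, "event/I");
        tree.Branch("ptSum", &ptSum, "ptSum/F");

        // Fill a handful of dummy events.
        for (event = 0; event < 5; ++event) {
            run   = 1001;
            ptSum = 1.5f * event;
            tree.Fill();
        }

        tree.Write();   // ROOT I/O: the tree and its buffers go into the file
        file.Close();
        return 0;
    }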
PHENIX CC-J
• The PHENIX CC-J at RIKEN is intended to serve as the main site of
  computing for PHENIX simulations, a regional Asian computing center for
  PHENIX, and a center for spin physics analysis.
• Network switches are required to connect the HPSS servers, the SMP data
  servers, and the CPU farms at RCF.
• In order to exchange data between RCF and CC-J, adequate WAN bandwidth
  between RCF and CC-J is required.
• CC-J has CPU farms of 10K SPECint95, tape storage of 100 TB, disk
  storage of 15 TB, tape I/O of 100 MB/sec, disk I/O of 600 MB/sec, and
  6 SUN SMP data server units.
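As a back-of-the-envelope guide to what adequate WAN bandwidth means, sustained throughput scales directly with the daily transfer volume (roughly 11.6 MB/sec per TB/day). The volumes in the sketch below are illustrative only; the slides do not quote an actual RCF-to-CC-J transfer target.

    #include <cstdio>
    #include <initializer_list>

    // Sustained bandwidth (MB/sec) needed to move a given volume per day;
    // 1 TB = 1e6 MB, 1 day = 86400 s.
    static double requiredMBperSec(double terabytesPerDay) {
        return terabytesPerDay * 1.0e6 / 86400.0;
    }

    int main() {
        // Illustrative daily volumes (the slides quote "a few TB/day" into RCF).
        for (double tb : {1.0, 3.0, 10.0})
            std::printf("%5.1f TB/day needs ~%6.1f MB/sec sustained\n",
                        tb, requiredMBperSec(tb));
        return 0;
    }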
Situation at Yonsei
• A comparable mirror image at Yonsei through an "explicit" copy of the
  remote system
• Usage of the local cluster machines
  Similar operating environment (same OS and similar hardware spec)
  1. Disk sharing through NFS: one installation of the analysis library,
     shared by the other machines
  2. Easy upgrade and management
• Local clustering
  Ample network bandwidth between the cluster machines over 100 Mbps links
  (no NFS lag and instant X display, for example)
  Current number of cluster machines = 2 (2 CPUs) + 2 (as RAID)
• File transfers from RCF
  Software updates by copying shared libraries (once per week; takes less
  than about 1 hour)
  Raw data copied using "scp" or BBFTP (~1 GB/day)
Yonsei Computing Resources
Yonsei Linux boxes for PHENIX analysis use:
• 4 desktop boxes behind a firewall (Pentium III/IV)
  Linux (RedHat 7.3, kernel 2.4.18-3, GCC 2.95.3)
  ROOT 3.01/05
• One machine has all software required for PHENIX analysis
  (event generation, reconstruction, analysis)
• The remaining desktops share one library directory (same kernel,
  compiler, etc.) via NFS
• 2 large RAID disk boxes with several IDE HDDs (~500 GB x 2) and several
  small disks (total ~500 GB) in 2 desktops
• A compact but effective system for a small user group
Yonsei Computing Resources
• Linux (RedHat 7.3, kernel 2.4.18-3, GCC 2.95.3)
• ROOT 3.01/05
[Cluster diagram: a gateway/firewall fronts a 100 Mbps local network; the PHENIX library is shared via NFS; calibrations & run info and the Tag DB are kept in the Objectivity (OBJY) database; desktops (P3 1 GHz x 2, P4 2 GHz, P4 1.3 GHz) run reconstruction and analysis jobs; raw data & DSTs sit on a 480 GB disk and a 480 GB RAID disk (big disk, 480 GB x 2, using RAID tools for Linux).]