Mass Storage @ RHIC Computing Facility
Razvan Popescu - Brookhaven National Laboratory
SLAC -- October 1999

Overview

Data Types:
– Raw: very large volume (xPB), average bandwidth (50MB/s). (Checked in the sketch after this list.)
– DST: average volume (x00TB), large bandwidth (x00MB/s).
– mDST: low volume (x0TB), large bandwidth (x00MB/s).
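A quick sanity check on these orders of magnitude, as a minimal sketch; the sustained rate is the quoted figure, the 100% duty cycle is an assumption:

```python
# Back-of-the-envelope check of the raw stream: 50 MB/s sustained for a
# year lands in the quoted "xPB" range. Duty cycle is an assumption.
SECONDS_PER_YEAR = 365 * 24 * 3600   # ~3.15e7 s

def yearly_volume_pb(avg_mb_per_s, duty_cycle=1.0):
    """Yearly volume in PB (decimal units) at a sustained average rate."""
    return avg_mb_per_s * SECONDS_PER_YEAR * duty_cycle / 1e9

print(f"{yearly_volume_pb(50):.2f} PB/yr")   # ~1.58 PB/yr
```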
Data Flow (generic)

[Diagram: generic data flow among RHIC, a reconstruction farm (Linux), the HPSS archive, the file servers (DST/mDST), and an analysis farm (Linux); raw streams at 35MB/s and 50MB/s, DST streams at 10MB/s and 200MB/s, mDST streams at 400MB/s and 10MB/s.]
Phenix Data Flow
Calibration - xMB/s
RHIC
10MB/s
HPSS (RAW)
10MB/s
6MB/s
6MB/s
150GB @ 80MB/s
10MB/s
6MB/s
Redwood
(3)
Reconstr.
Farm
(?Si95)
(?00 proc.)
2MB/s
55MB/s
1MB/s
HPSS (DST)
16MB/s
File Server
10MB/s -Calib.
1MB/s
16MB/s
150GB @ 80MB/s
1MB/s
18MB/s
Analysis Farm
(? Si95)
65MB/s
3TB @ 100MB/s
16MB/s
9840
(2)
1MB/s
Redwood
(0)
Present resources

Tape Storage:
– (1) STK Powderhorn silo (6000 cartridges).
– (11) SD-3 (Redwood) drives.
– (10) 9840 (Eagle) drives.
Disk Storage:
– ~8TB of RAID disk:
  • 1TB for HPSS cache.
  • 7TB of Unix workspace.
Servers:
– (5) RS/6000 H50/70 for HPSS.
– (6) E450 & E4000 for file serving and data mining.
The HPSS Archive

Constraints - large capacity & high bandwidth:
– Two tape technologies: SD-3 (best $/GB) & 9840 (best $/(MB/s)); the trade-off is sketched below.
– Two-tape-layer hierarchies, for easy management of the migration.
Reliable and fast disk storage:
– FC-attached RAID disk.
Platform compatible with HPSS:
– IBM, SUN, SGI.
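A minimal sketch of the $/GB versus $/(MB/s) trade-off that motivates running both technologies. The prices below are hypothetical placeholders, not figures from the talk; the 50GB/20GB cartridge capacities are the nominal SD-3/9840 values, and only the ~8MB/s SD-3 rate appears later in these slides:

```python
# Illustrative comparison behind the two-technology choice: SD-3 wins on
# $/GB (large cartridges), 9840 wins on $/(MB/s) (fast, cheaper drive).
# All dollar figures are invented placeholders.

def dollars_per_gb(cart_cost, cart_gb):
    return cart_cost / cart_gb           # media-dominated capacity cost

def dollars_per_mbps(drive_cost, mb_per_s):
    return drive_cost / mb_per_s         # drive-dominated bandwidth cost

print("SD-3 $/GB:    ", dollars_per_gb(90, 50))         # 50GB cartridge
print("9840 $/GB:    ", dollars_per_gb(90, 20))         # 20GB cartridge
print("SD-3 $/(MB/s):", dollars_per_mbps(80_000, 8))    # ~8MB/s achieved
print("9840 $/(MB/s):", dollars_per_mbps(30_000, 10))
```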
HPSS Structure

(1) Core Server:
– RS/6000 Model H50.
– 4x CPU.
– 2GB RAM.
– Fast Ethernet (control).
– Hardware RAID (metadata storage).
HPSS Structure

(3) Movers:
– RS/6000 Model H70.
– 4x CPU.
– 1GB RAM.
– Fast Ethernet (control).
– Gigabit Ethernet (data) (1500 & 9000 MTU).
– 2x FC-attached RAID - 300GB - disk cache.
– (3-4) SD-3 "Redwood" tape transports.
– (3-4) 9840 "Eagle" tape transports.
HPSS Structure

Guarantee availability of resources for a specific user group → separate resources → separate PVRs & movers.
One mover per user group → total exposure to single-machine failure.
Guarantee availability of resources for the Data Acquisition stream → separate hierarchies.
Result: 2 PVRs, 2 COSs & 1 mover per group (sketched below).
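A minimal sketch of the resulting partitioning; the group and device names are illustrative, not the actual RCF configuration:

```python
# Per-experiment partitioning as described above: each user group gets
# its own 2 PVRs (one per tape technology), 2 COSs, and 1 mover.
# Names are invented for illustration.
partition = {
    "phenix": {"pvrs": ["phenix-sd3", "phenix-9840"],
               "coss": ["phenix-raw", "phenix-dst"],
               "mover": "mover1"},
    "star":   {"pvrs": ["star-sd3", "star-9840"],
               "coss": ["star-raw", "star-dst"],
               "mover": "mover2"},
}

for group, res in partition.items():
    # One dedicated mover guarantees the group its bandwidth, but it is
    # also a single point of failure for that group.
    print(group, "->", res["mover"])
```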
HPSS topology

[Diagram: HPSS topology. Net 1 - Data (1000baseSX) - links the movers (M1, M2, M3) and the client (via routing); Net 2 - Control (100baseT) - links the Core server, the movers, N x PVR, and pftpd; the STK silo attaches over 10baseT.]
HPSS Performance

80MB/s for the disk subsystem.
1 CPU per 40MB/s of TCP/IP (Gbit) traffic (1500 MTU) - budgeted in the sketch below.
~8MB/s per SD-3 transport.
? per 9840 transport.
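Applying the 1-CPU-per-40MB/s rule to the stream rates from the generic data flow gives the TCP/IP CPU budget (a minimal sketch using only figures from these slides):

```python
import math

# CPU budget implied by the rule of thumb above:
# 1 CPU sustains ~40 MB/s of TCP/IP (Gbit, 1500 MTU) traffic.
MB_PER_S_PER_CPU = 40

def cpus_for(mb_per_s):
    return math.ceil(mb_per_s / MB_PER_S_PER_CPU)

print(cpus_for(200))   # 200 MB/s DST stream  -> 5 CPUs for TCP/IP alone
print(cpus_for(400))   # 400 MB/s mDST stream -> 10 CPUs
```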
I/O intensive systems

Mining and analysis systems.
High I/O & moderate CPU usage.
To avoid large network traffic, merge the file servers with the HPSS movers (see the sketch after this list):
– Major problem with HPSS support on non-AIX platforms.
– Several (Sun) SMP machines or a large (SGI) modular system.
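A rough version of the network-traffic argument, in the deck's own terms: with a separate file server, every byte staged out of HPSS crosses the LAN an extra time, costing TCP/IP CPU on both ends of the hop (the 40MB/s-per-CPU figure is from the performance slide; the accounting is a sketch, not a measurement):

```python
# Why co-locating file servers and movers helps: a separate file server
# adds a mover -> file-server hop, at 1 CPU per ~40 MB/s of TCP/IP on
# EACH end, before the file server even talks to its NFS clients.
MB_PER_S_PER_CPU = 40
dst_rate = 200                 # MB/s, DST stream from the generic flow

extra_hop_cpus = 2 * dst_rate / MB_PER_S_PER_CPU   # sender + receiver
print(f"extra TCP/IP CPUs when separate: {extra_hop_cpus:.0f}")  # 10
print("extra TCP/IP CPUs when merged:   0 (data stays on local disk)")
```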
I/O intensive systems

(6) NFS file servers for work areas:
– (5) x E450 + (1) x E4000.
– 4(6) x CPU; 2GB RAM; Fast/Gbit Ethernet.
– 2x FC-attached hardware RAID - 1.5TB.
(1) NFS home-directory server (E450).
(3+3) AFS servers (code development & home directories):
– RS/6000 models E30 and 43P.
(NFS-to-AFS migration.)
Problems

Short life cycle of the SD-3 heads:
– ~500 hours, i.e. under 2 months at average usage (6 of 10 drives in 10 months).
Low-throughput interface (F/W) on the SD-3 → high slot consumption.
9840: ???
HPSS closes the tape cartridge on a transport error:
– We built a monitoring tool that tries to predict transport failure, based on soft-error frequency (a sketch follows this list).
SFS response under heavy load: no graceful failure (timeouts & lost connections).
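The slides do not describe the monitoring tool's internals; the sketch below shows the general idea, with the window size, threshold, and error feed all assumed:

```python
import time
from collections import deque

# Sketch of a soft-error-frequency monitor of the kind described above:
# flag a transport when its recent soft-error rate exceeds a threshold.
# Window and threshold are invented; the real tool's parameters are not
# given in the talk.
WINDOW_S = 24 * 3600      # consider the last 24 hours
THRESHOLD = 20            # soft errors in the window that raise an alarm

class TransportMonitor:
    def __init__(self, name):
        self.name = name
        self.errors = deque()              # timestamps of soft errors

    def record_soft_error(self, t=None):
        self.errors.append(time.time() if t is None else t)

    def at_risk(self, now=None):
        """True when the recent soft-error frequency suggests the heads
        are wearing out and the transport may fail soon."""
        now = time.time() if now is None else now
        while self.errors and self.errors[0] < now - WINDOW_S:
            self.errors.popleft()          # expire errors outside window
        return len(self.errors) >= THRESHOLD
```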
Issues

Partially tested the two-tape-layer hierarchies:
– Cartridge-based migration.
– Manually scheduled reclaim.
Integration of the file-server and mover functions on the same node:
– Solaris mover port.
– No longer an objective.
Issues

Guarantee availability of resources for specific user groups:
– Separate PVRs & movers.
– Total exposure to single-machine failure!
Reliability:
– Distribute resources across movers → share movers (acceptable?).
– Inter-mover traffic:
  • 1 CPU per 40MB/s of TCP/IP per adapter: expensive!!!
Inter-mover traffic - solutions

Affinity (sketched after this list):
– Limited applicability.
Diskless hierarchies:
– Not for the SD-3; not tested on the 9840.
High-performance networking: SP switch.
– IBM only.
Lighter protocol: HIPPI.
– Expensive hardware.
Multiply attached storage (SAN):
– Requires HPSS modifications.
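A minimal sketch of the affinity idea; the device-to-mover mapping and the interface are invented for illustration, not an HPSS API:

```python
# Affinity: route each transfer to the mover that owns the device
# holding the data, so nothing crosses the inter-mover network.
# Applicability is limited because data and drives cannot always be
# pre-placed to match the requester.
DEVICE_OWNER = {"raid-a": "mover1", "raid-b": "mover2",
                "sd3-bank": "mover1", "9840-bank": "mover2"}

def pick_mover(device, preferred):
    """Return (mover, hit); hit=False means an inter-mover hop is needed."""
    owner = DEVICE_OWNER[device]
    return owner, owner == preferred

print(pick_mover("raid-a", "mover1"))   # ('mover1', True)  - affinity hit
print(pick_mover("raid-a", "mover2"))   # ('mover1', False) - data must ship
```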
Multiply Attached Storage (SAN)

[Diagram: a client and two movers (Mover 1, Mover 2) with the storage attached to both; paths (1) and (2) show the same data reachable through either mover (!).]
Summary

Problems with divergent requirements:
– Cost-effective archive capacity and bandwidth:
  • Two tape hierarchies: SD-3 & 9840.
  • Test the configuration.
– Availability and reliability of HPSS resources:
  • Separate COSs and shared movers.
  • Inter-mover traffic?!?
Merger of the file servers and the HPSS movers?