Transcript Slide 1

The Future of GPFS-WAN on the TeraGrid
Phil Andrews, Patricia Kovatch
San Diego Supercomputer Center
Outline
• History
• Current status
• Short term goals
• Long term goals
GPFS-WAN Timeline
• At SC’02 we demonstrated wide area SAMFS with FC over IP
• At SC’03 we demonstrated wide area feasibility using a pre-release of GPFS
• At SC’04 we showed 28 Gbps peak (24 Gbps sustained) on a 30 Gbps link
• In Oct’05 we put the 220 TB (mirrored DS4100 SATA, 68 IA-64s) into production
• At SC’05 we demonstrated international GPFS-WAN with DEISA (European Grid)
• In Mar’07 we successfully tested GPFS-WAN mounts at every TG site (except ORNL)
• In Apr’07 we upgraded to fully redundant servers and 600 TB of DDN disk, and migrated 150 TB from the old GPFS-WAN
• In Jun’07 we tested network failover to Abilene successfully
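To put the bandwidth figures from the timeline in perspective, the short sketch below computes what fraction of the 30 Gbps link the quoted 28 Gbps peak and 24 Gbps sustained rates represent. The function name is illustrative only; the three numbers are the ones quoted in the timeline.

```python
def link_utilization(throughput_gbps: float, capacity_gbps: float) -> float:
    """Return the fraction of link capacity used by the given throughput."""
    if capacity_gbps <= 0:
        raise ValueError("capacity must be positive")
    return throughput_gbps / capacity_gbps


LINK_GBPS = 30.0  # link capacity quoted in the timeline

peak = link_utilization(28.0, LINK_GBPS)       # quoted peak rate
sustained = link_utilization(24.0, LINK_GBPS)  # quoted sustained rate

print(f"peak: {peak:.1%}, sustained: {sustained:.1%}")
```

In other words, the sustained rate corresponds to roughly four fifths of the nominal link capacity.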
Background
• GPFS-WAN is currently:
• 16 8-way p575 servers redundantly connected
• 600 TB usable/750 TB raw RAID 6 DDN storage
• Provides:
• A centralized online common data store serving the TG infrastructure
• Enables:
• Metascheduling including automatic resource selection and workflow
• Current status:
• Mounted in production at ANL, NCAR (login nodes), NCSA, SDSC
• In progress towards production at Indiana
• Tested at PSC, Purdue, TACC
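As a quick check on the storage figures above, the sketch below compares the quoted usable/raw ratio (600 TB of 750 TB) with the capacity fraction of a RAID 6 group. The 8 data + 2 parity geometry is an assumption chosen for illustration; the slides do not state the actual group layout, and other overheads (spares, file system metadata) are ignored.

```python
import math


def usable_fraction(usable_tb: float, raw_tb: float) -> float:
    """Fraction of raw capacity that is usable."""
    return usable_tb / raw_tb


def raid6_fraction(data_disks: int, parity_disks: int = 2) -> float:
    """Capacity fraction of one RAID 6 group (two parity disks per group)."""
    return data_disks / (data_disks + parity_disks)


quoted = usable_fraction(600.0, 750.0)   # figures from the slide
assumed = raid6_fraction(data_disks=8)   # hypothetical 8+2 geometry

print(f"quoted usable fraction: {quoted:.2f}")
print(f"8+2 RAID 6 fraction:    {assumed:.2f}")
print("consistent" if math.isclose(quoted, assumed) else "not consistent")
```

The two fractions agree, so an 8+2 layout is one plausible reading of the numbers, not a confirmed configuration.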
GPFS-WAN Short Term Goals
1. Increase the number of sites mounting GPFS-WAN in production to include:
• LSU, ORNL, PSC, Purdue, TACC, and other TG machines
• Strategy:
  • DEISA (European Grid) has short term licenses to allow researchers access to DEISA’s common research data infrastructure
    • includes Cray and SGI machines within DEISA
  • Obtain TG-wide licenses from IBM with short term, research-only restrictions similar to DEISA’s
  • Work with the TG Data WG to deploy in production at more sites
2. Export GPFS from Indiana and other interested sites?
Mounting Options
1. Mount continually
2. Mount continually on a subset of nodes
3. Mount manually upon user request
4. Mount automatically upon job request
• Select option depending on the kind of machine
  • E.g. select #1 for shared machines (Altix)
• Compile list of potential machines and options
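The idea of choosing a mounting option per kind of machine could be captured in a small lookup table, as in the sketch below. Only the pairing of shared machines with option 1 comes from the slide; the other machine kinds and their pairings are hypothetical placeholders that a real compiled list would replace.

```python
from enum import Enum


class MountOption(Enum):
    CONTINUAL = 1                 # 1. Mount continually
    CONTINUAL_SUBSET = 2          # 2. Mount continually on a subset of nodes
    MANUAL_ON_USER_REQUEST = 3    # 3. Mount manually upon user request
    AUTOMATIC_ON_JOB_REQUEST = 4  # 4. Mount automatically upon job request


# Machine kind -> mounting option.
# "shared" -> option 1 is the example given on the slide (e.g. Altix).
# The remaining entries are hypothetical, for illustration only.
POLICY = {
    "shared": MountOption.CONTINUAL,
    "cluster": MountOption.CONTINUAL_SUBSET,        # hypothetical
    "batch": MountOption.AUTOMATIC_ON_JOB_REQUEST,  # hypothetical
}


def select_mount_option(machine_kind: str) -> MountOption:
    """Look up the mounting option for a machine kind; fail loudly if unknown."""
    try:
        return POLICY[machine_kind]
    except KeyError:
        raise ValueError(f"no mounting option listed for {machine_kind!r}") from None


print(select_mount_option("shared").name)
```

Keeping the policy as data makes it easy to extend as the list of potential machines and options is compiled.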
GPFS-WAN Long Term Goals
1. Mount via pNFS
• No GPFS licenses needed
• Demo at SC ‘07
• Production code over a year away
2. File caching
• Files kept locally at a TG partner site but part of unified GPFS-WAN namespace
• Files migrated to central GPFS-WAN later
• May demo at SC ‘07
3. Integrate with HPSS
• Automatic file migration from GPFS to HPSS
• Generally available Dec ‘07
SC ‘07 Demo
• Export GPFS-WAN via pNFS
• 6 pNFS servers at SDSC
• 2 pNFS clients at SC
• Other clients at NCSA, ORNL, volunteers?
• Expect to saturate 10/20 Gb/s link from SDSC
• SCinet Bandwidth Challenge entry
[Diagram labels: GPFS-WAN Server (SDSC), pNFS Server (SDSC), TeraGrid Network, pNFS Clients]
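For a rough sense of what saturating the link would ask of each of the 6 pNFS servers, the sketch below divides the link rate evenly across them. Even load distribution and zero protocol overhead are simplifying assumptions, not something the slides claim.

```python
def per_server_gbps(link_gbps: float, servers: int) -> float:
    """Throughput each server must deliver if load is spread evenly."""
    if servers <= 0:
        raise ValueError("need at least one server")
    return link_gbps / servers


PNFS_SERVERS = 6  # pNFS servers at SDSC, from the slide

for link in (10.0, 20.0):  # the two link rates mentioned on the slide
    rate = per_server_gbps(link, PNFS_SERVERS)
    print(f"{link:.0f} Gb/s link: {rate:.2f} Gb/s per server")
```

Under these assumptions each server would need to sustain somewhere between about 1.7 and 3.3 Gb/s.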
Thanks to many…
• ANL
• Indiana
• NCAR
• NCSA
• ORNL
• PSC
• Purdue
• TACC
***None of this was possible without
cooperation from TeraGrid partners***