Transcript Slide 1
The Future of GPFS-WAN on the TeraGrid
Phil Andrews, Patricia Kovatch
San Diego Supercomputer Center

Outline
• History
• Current status
• Short term goals
• Long term goals

GPFS-WAN Timeline
• At SC’02
  • we demonstrated wide area SAMFS with FC over IP
• At SC’03
  • we demonstrated wide area feasibility using a pre-release of GPFS
• At SC’04
  • we showed 28 Gbps peak (24 Gbps sustained) on a 30 Gbps link
• In Oct’05
  • we put the 220 TB (mirrored DS4100 SATA, 68 IA-64s) into production
• At SC’05
  • we demonstrated international GPFS-WAN with DEISA (European Grid)
• In Mar’07
  • we successfully tested GPFS-WAN mounts at every TG site (except ORNL)
• In Apr’07
  • we upgraded to fully redundant servers and 600 TB of DDN disk
  • we migrated 150 TB from old GPFS-WAN
• In Jun’07
  • we tested network failover to Abilene successfully

Background
• GPFS-WAN is currently:
  • 16 8-way p575 servers redundantly connected
  • 600 TB usable/750 TB raw RAID 6 DDN storage
• Provides:
  • A centralized online common data store serving the TG infrastructure
• Enables:
  • Metascheduling including automatic resource selection and workflow
• Current status:
  • Mounted in production at ANL, NCAR (login nodes), NCSA, SDSC
  • In progress towards production at Indiana
  • Tested at PSC, Purdue, TACC

GPFS-WAN Short Term Goals
1. Increase the number of sites mounting GPFS-WAN in production to include:
  • LSU, ORNL, PSC, Purdue, TACC, and other TG machines
  • Strategy:
    • DEISA (European Grid) has short term licenses to allow researchers access to DEISA’s common research data infrastructure
      • includes Cray and SGI machines within DEISA
    • Obtain TG-wide licenses from IBM with similar short term, research restrictions as DEISA did
    • Work with the TG Data WG to deploy in production at more sites
2. Export GPFS from Indiana and other interested sites?
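
As a quick check on the figures quoted in the Timeline and Background slides, the sketch below computes the usable-to-raw storage ratio and the link utilization of the 28/24 Gbps result. The constants come from the slides; the variable names and the helper `fraction` are illustrative only. The remark that 0.8 matches an 8 data + 2 parity RAID 6 layout is an assumption, since the slides do not state the group size.

```python
# Figures quoted in the slides (units: TB and Gbps).
RAW_TB = 750
USABLE_TB = 600
LINK_GBPS = 30
PEAK_GBPS = 28
SUSTAINED_GBPS = 24


def fraction(part: float, whole: float) -> float:
    """Return part/whole as a plain ratio."""
    return part / whole


usable_fraction = fraction(USABLE_TB, RAW_TB)
peak_utilization = fraction(PEAK_GBPS, LINK_GBPS)
sustained_utilization = fraction(SUSTAINED_GBPS, LINK_GBPS)

# Assumption (not stated in the slides): a RAID 6 group of 8 data
# disks plus 2 parity disks would give the same 8/10 usable ratio.
assumed_raid6_fraction = fraction(8, 8 + 2)

print(f"usable/raw storage:    {usable_fraction:.1%}")
print(f"peak link usage:       {peak_utilization:.1%}")
print(f"sustained link usage:  {sustained_utilization:.1%}")
```

In other words, about four fifths of the raw disk is usable, and the sustained transfer rate used about four fifths of the 30 Gbps link.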

Mounting Options
1. Mount continually
2. Mount continually on a subset of nodes
3. Mount manually upon user request
4. Mount automatically upon job request
• Select option depending on the kind of machine
  • E.g. select #1 for shared machines (Altix)
  • Compile list of potential machines and options

GPFS-WAN Long Term Goals
1. Mount via pNFS
  • No GPFS licenses needed
  • Demo at SC’07
  • Production code over a year away
2. File caching
  • Files kept locally at a TG partner site but part of unified GPFS-WAN namespace
  • Files migrated to central GPFS-WAN later
  • May demo at SC’07
3. Integrate with HPSS
  • Automatic file migration from GPFS to HPSS
  • Generally available Dec’07

SC’07 Demo
• Export GPFS-WAN via pNFS
  • 6 pNFS servers at SDSC
  • 2 pNFS clients at SC
  • Other clients at NCSA, ORNL, volunteers?
• Expect to saturate 10/20 Gb/s link from SDSC
• SCinet Bandwidth Challenge entry
[Diagram: pNFS clients, TeraGrid network, pNFS server and GPFS-WAN server at SDSC]

Thanks to many…
• ANL
• Indiana
• NCAR
• NCSA
• ORNL
• PSC
• Purdue
• TACC
***None of this was possible without cooperation from TeraGrid partners***
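
As a closing note on the Mounting Options slide, the per-machine choice it describes could be recorded as a small lookup table. This is a minimal sketch, not the TeraGrid's actual tooling: the names `MountOption`, `POLICY`, and `select_mount_option` are hypothetical, and the only mapping taken from the slides is option #1 for shared machines. Every other machine kind is deliberately left undecided, mirroring the "compile list of potential machines and options" action item.

```python
from enum import Enum
from typing import Optional


class MountOption(Enum):
    """The four mounting options listed on the slide."""
    CONTINUAL = 1
    CONTINUAL_ON_SUBSET = 2
    MANUAL_ON_USER_REQUEST = 3
    AUTOMATIC_ON_JOB_REQUEST = 4


# Only the "shared" entry comes from the slides (e.g. Altix).
# Further entries would be filled in as the list of machines is compiled.
POLICY = {
    "shared": MountOption.CONTINUAL,
}


def select_mount_option(machine_kind: str) -> Optional[MountOption]:
    """Return the chosen option, or None if no decision is recorded yet."""
    return POLICY.get(machine_kind.strip().lower())


print(select_mount_option("shared"))
print(select_mount_option("cluster"))
```

Returning `None` for unknown machine kinds keeps an undecided site distinct from one that has explicitly chosen an option.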