Climate Research at the National Energy Research
Scientific Computing Center (NERSC)
Bill Kramer
Deputy Director and Head of High Performance Computing
CAS 2001
October 30, 2001
NERSC Vision
NERSC strives to be a world leader in accelerating scientific discovery through computation. Our vision is to provide high-performance computing tools and expertise to tackle science's biggest and most challenging problems, and to play a major role in advancing large-scale computational science and computer science.
Outline
• NERSC-3: Successfully fielding the world’s most
powerful unclassified computing resource
• The NERSC Strategic Proposal: An Aggressive
Vision for the Future of the Flagship Computing
Facility of the Office of Science
• Scientific Discovery through Advanced Computing
(SciDAC) at NERSC
• Support for Climate Computing at NERSC: Ensuring
Success for the National Program
FY00 MPP Users/Usage by Scientific Discipline
[Charts: NERSC FY00 MPP users by discipline; NERSC FY00 MPP usage by discipline]
NERSC FY00 Usage by Site
[Charts: MPP usage by site; PVP usage by site]
FY00 Users/Usage by Institution Type
[Charts: FY00 users and usage by institution type]
NERSC Computing
Highlights for FY 01
• NERSC 3 is in full and final production – exceeding
original capability by more than 30% and with much
larger memory.
• Increased total FY 02 allocations of computer time by
450% over FY01.
• Activated the new Oakland Scientific Facility
• Upgraded NERSC network connection to 655 Mbits/s
(OC12) – ~4 times the previous bandwidth.
• Increased archive storage capacity with 33% more tape slots and double the number of tape drives
• PDSF, T3E, SV1s, and other systems all continue
operating very well
Oakland Scientific Facility
• 20,000 sf computer room;
7,000 sf office space
— 16,000 sf computer space built out
— NERSC occupying 12,000 sf
• Ten-year lease with 3 five-year options
• $10.5M computer room construction costs
• Option for additional 20,000+ sf computer room
HPSS Archive Storage
• 190 Terabytes of data in the storage systems
• 9 million files in the storage systems
• Average 600-800 GB of data transferred per day; peak 1.5 TB
• Average 18,000 files transferred per day; peak 60,000
• 500-600 tape mounts per day; peak 2,000 (12/system)
[Charts: cumulative storage by month and system (TB); file counts by date and system (millions of files); monthly I/O by month and system (TB) – each broken out by Archive, User/Regent, and Backup, October 1998 through April 2001]
NERSC-3 Vital Statistics
• 5 Teraflop/s peak performance – 3.05 Teraflop/s with Linpack (arithmetic check below)
  — 208 nodes, 16 CPUs per node at 1.5 Gflop/s per CPU
  — “Worst case” Sustained System Performance measure: 0.358 Tflop/s (7.2%)
  — “Best case” Gordon Bell submission: 2.46 Tflop/s on 134 nodes (77%)
• 4.5 TB of main memory
  — 140 nodes with 16 GB each, 64 nodes with 32 GB, and 4 nodes with 64 GB
• 40 TB total disk space
  — 20 TB formatted shared, global, parallel file space; 15 TB local disk for system usage
• Unique 512-way double/single switch configuration
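As a quick arithmetic check (not on the original slide), the quoted peak rating and percentages are consistent with the node counts above:

\[
208 \times 16 \times 1.5\ \mathrm{Gflop/s} \approx 4.99\ \mathrm{Tflop/s} \approx 5\ \mathrm{Tflop/s\ peak},
\]
\[
\frac{0.358}{4.99} \approx 7.2\% \ \text{(SSP)}, \qquad
\frac{2.46}{134 \times 16 \times 1.5 \times 10^{-3}} \approx 77\% \ \text{(Gordon Bell run)}.
\]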
Two Gordon Bell Prize Finalists Are Using NERSC-3
• Climate modeling – a shallow water climate model sustained 361 Gflop/s (12%) – S. Thomas et al., NCAR
• Materials science – 2016-atom supercell models for spin dynamics simulations of the magnetic structure of an iron-manganese/cobalt interface. Using 2,176 processors of NERSC-3, the code sustained 2.46 Tflop/s – M. Stocks and team at ORNL and U. Pittsburgh, with A. Canning at NERSC
[Figure caption: a section of an FeMn/Co interface shows a new magnetic structure that is different from the magnetic structure of pure FeMn]
NERSC System Architecture
[System diagram: IBM SP NERSC-3 Phase 2 (2,532 processors, 1,824 GB of memory, 32 TB of disk), CRI T3E 900 (644/256), CRI SV1s, PDSF, the Millennium/LBNL cluster, a research cluster, the remote visualization server, the visualization lab, the symbolic manipulation server, SGI and MaxStrat systems, HPSS archive storage with IBM and STK robots, and DPSS, interconnected by HIPPI, FDDI/Ethernet (10/100/Gigabit), and ESnet]
NERSC Strategic Proposal
An Aggressive Vision for the Future of the
Flagship Computing Facility of the Office of
Science
The NERSC Strategic Proposal
• Requested in February 2001 by the Office of Science as a proposal for the next five years of the NERSC Center and Program
• Proposal and Implementation Plan delivered to OASCR at the end of May 2001
• The proposal plays to NERSC's strengths, but anticipates rapid and broad changes in scientific computing
• Results of the DOE review are expected in late November or December 2001
High-End Systems: A Carefully
Researched Plan for Growth
A three-year procurement
cycle for leading-edge
computing platforms
Balanced Systems, with
appropriate data storage
and networking
NERSC Support for the DOE Scientific
Discovery through Advanced
Computing (SciDAC)
Scientific Discovery Through Advanced Computing
DOE Science Programs need dramatic advances in simulation capabilities to meet their mission goals.
[Diagram: application areas include materials, combustion, fusion energy, global systems, health effects and bioremediation, and subsurface transport]
LBNL/NERSC SciDAC Portfolio – Project Leadership
(Project Name | Principal Investigator | Partner Institutions | Annual Funding)
• Scientific Data Mgmt Center (ISIC) | Shoshani | ANL, LLNL, ORNL, UC San Diego, Georgia Institute of Tech, Northwestern Univ, No Carolina State Univ | $624,000
• Applied Partial Differential Equations Center (ISIC) | Colella | LLNL, Univ of Wash, No Carolina, Wisc, UC Davis, NYU | $1,700,000
• Performance Evaluation Research Center (ISIC) | Bailey | ORNL, ANL, LLNL, Univ of Maryland, Tenn, Ill at Urbana-Champaign, UC San Diego | $276,000
• DOE Science Grid: Enabling and Deploying the SciDAC Collaboratory Software Environment | Johnston | ORNL, ANL, NERSC, PNNL | $510,000
• Advanced Computing for the 21st Century Accelerator Science Technology | Ryne | NERSC, SLAC | $650,000
Applied Partial Differential Equations ISIC
Developing a new algorithmic and software framework for solving partial differential equations in core mission areas.
• New algorithmic capabilities with high-performance implementations on high-end computers (see the illustrative sketch below):
  —Adaptive mesh refinement
  —Cartesian grid embedded boundary methods for complex geometries
  —Fast adaptive particle methods
• Close collaboration with applications scientists
• Common mathematical and software framework for multiple applications
Participants: LBNL (J. Bell, P. Colella), LLNL, Courant Institute, Univ. of Washington, Univ. of North Carolina, Univ. of California, Davis, Univ. of Wisconsin.
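As a rough illustration of what adaptive mesh refinement means at its lowest level, here is a generic gradient-based cell-tagging sketch. It is not the APDE ISIC's actual algorithm or API; the function name and threshold parameter are hypothetical, and a real block-structured framework would go on to cluster the flagged cells into finer patches.

#include <math.h>

/* Flag cells of a 1-D grid for refinement where the solution gradient is
 * steep (assumes n >= 2).  flag[i] = 1 marks cell i for a finer mesh. */
void tag_cells_for_refinement(const double *u, int n, double dx,
                              double threshold, int *flag)
{
    for (int i = 0; i < n; i++) {
        /* one-sided differences at the boundaries, centered in the interior */
        double grad;
        if (i == 0)          grad = (u[1] - u[0]) / dx;
        else if (i == n - 1) grad = (u[n - 1] - u[n - 2]) / dx;
        else                 grad = (u[i + 1] - u[i - 1]) / (2.0 * dx);

        flag[i] = (fabs(grad) > threshold) ? 1 : 0;
    }
}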
Scientific Data Management ISIC
Goals – optimize and simplify:
• Access to very large data sets
• Access to distributed data
• Access to heterogeneous data
• Data mining of very large data sets
SDM-ISIC technology:
• Optimizing shared access from mass storage systems
• Metadata and knowledge-based federations
• API for Grid I/O
• High-dimensional cluster analysis and indexing
• Adaptive file caching
• Agents
[Diagram: scientific simulations and experiments feed petabytes on tape and terabytes on disk. Today, data manipulation (getting files from the tape archive, extracting subsets of data from files, reformatting data, getting data from heterogeneous distributed systems, moving data over the network) consumes ~80% of researchers' time, leaving ~20% for scientific analysis and discovery; using SDM-ISIC technology that split is reversed.]
Participants: ANL, LBNL, LLNL, ORNL, GTech, NCSU, NWU, SDSC
SciDAC Portfolio – NERSC as a Collaborator
(Project Name | Co-Principal Investigator | Lead PI & Institution | Annual Funding)
• DOE Science Grid: Enabling and Deploying the SciDAC Collaboratory Software Environment | Kramer | Johnston, LBNL | $225,000
• Scalable Systems Software Enabling Technology Center | Hargrove | Al Geist, ORNL | $198,000
• Advanced Computing for the 21st Century Accelerator Science Technology | Ng | Robert Ryne, LBNL | $200,000
• Terascale Optimal PDE Simulations Center (TOPS Center) | Ng | Barry Smith and Jorge More, ANL | $516,000
• Earth System Grid: The Next Generation – Turning Climate Datasets into Community Resources | Shoshani | Ian Foster, ANL | $255,000
• Particle Physics Data Grid Collaboratory Pilot | Shoshani | Richard Mount, SLAC | $405,000
• Collaborative Design and Development of the Community Climate System Model for Terascale Computers | Ding | Malone and Drake, LANL/ORNL | $400,000
Strategic Project Support
• Specialized Consulting Support
—Project Facilitator Assigned
• Help defining project requirements
• Help with getting resources
• Code tuning and optimization
—Special Service Coordination
• Queues, throughput, increased limits, etc.
• Specialized Algorithmic Support
—Project Facilitator Assigned
• Develop and improve algorithms
• Performance enhancement
—Coordination with ISICs to represent work and
activities
Strategic Project Support
• Special Software Support
— Projects can request support for packages and software that
are special to their work and not as applicable to the general
community
• Visualization Support
— Apply NERSC Visualization S/W to projects
— Develop and improve methods specific to the projects
— Support any project visitors who use the local LBNL
visualization lab
• SciDAC Conference and Workshop Support
— NERSC Staff will provide content and presentations at
project events
— Provide custom training at project events
— NERSC staff attend and participate at project events
Strategic Project Support
• Web Services for interested projects
— Provide areas on NERSC web servers for interested projects
• Password protected areas as well
• Safe “sandbox” area for dynamic script development
— Provide web infrastructure
• Templates, structure, tools, forms, dynamic data scripts (cgi-bin)
— Archive for mailing lists
— Provide consulting support to help projects organize and
manage web content
• CVS Support
— Provide a server area for interested projects
• Backup, administration, access control
— Provide access to code repositories
— Help projects set up and manage code repositories
Strategic Project Area Facilitators
(Area | User Services Facilitator | Scientific Computing Facilitator)
• Fusion | David Turner | Dr Jodi Lamoureux
• QCD | Dr Majdi Baddourah | Dr Jodi Lamoureux
• Experimental Physics | Dr Iwona Sakrejda | Dr Jodi Lamoureux
• Astrophysics | Dr Richard Gerber | Dr Peter Nugent
• Accelerator Physics | Dr Richard Gerber | Dr Esmond Ng
• Chemistry | Dr David Skinner | Dr Lin Wang
• Life Science | Dr Jonathan Carter | Dr Chris Ding
• Climate | Dr Harsh Anand Passi | Dr Chris Ding
• Computer Science | Thomas Deboni | Dr Parry Husbands / Dr Osni Marques (for CCA)
• Applied Math | Dr Majdi Baddourah | Dr Chao Yang
NERSC Support for Climate Research
Ensuring Success for the National Program
Climate Projects at NERSC
• 20+ projects from the base MPP allocations, with about 6% of the entire base resource
• Two strategic climate projects:
  —High Resolution Global Coupled Ocean/Sea Ice Modeling – Matt Maltrud @ LANL
    • 5% of total SP hours (920,000 wall clock hours)
    • “Couple high resolution ocean general circulation model with high resolution dynamic thermodynamic sea ice model in a global context.”
    • 1/10th degree resolution (3 to 5 km in polar regions)
  —Warren Washington, Tom Bettge, Tony Craig, et al.
    • PCM coupler
Early Scientific Results Using NERSC-3
• Climate modeling – a 50 km resolution global climate simulation was run in a 3-year test, proving that the model is robust to a large increase in spatial resolution. This is the highest spatial resolution ever used: 32 times more grid cells than ~300 km grids, taking 200 times as long to run. – P. Duffy, LLNL
[Figure: reaching regional climate resolution]
Some Other Climate Projects NERSC Staff Have Helped With
• Richard Loft, Stephen Thomas and John Dennis, NCAR – using 2,048 processors on NERSC-3, demonstrated that the dynamical core of an atmospheric general circulation model (GCM) can be integrated at a rate of 130 years per day
• Inez Fung (UCB) – using CSM to build a carbon-climate simulation package on the SV1
• Mike Wehner – using CCM for large-scale ensemble simulations on the T3E
• Doug Rotman – atmospheric chemistry/aerosol simulations
• Tim Barnett and Detlef Stammer – PCM runs on the T3E and SP
ACPI/Avantgarde/SciDAC
• Work done by Chris Ding and team:
  — Comprehensive performance analysis of GPFS on the IBM SP (supported by Avant Garde)
  — I/O performance analysis; see http://www.nersc.gov/research/SCG/acpi/IO/
  — Numerical reproducibility and stability
  — MPH: a library for a distributed multi-component environment (see the sketch below)
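MPH's own calling interface is not shown on the slide; the sketch below only illustrates the idea it supports (each model component running on its own MPI communicator inside a single job), using plain MPI rather than the MPH API. The 50/50 atmosphere/ocean rank split is an arbitrary choice for illustration.

#include <mpi.h>
#include <stdio.h>

/* Sketch: run two coupled components in one MPI job, giving each its own
 * communicator.  A coupling library like MPH wraps bookkeeping of this kind. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Lower half of the ranks act as "atmosphere", upper half as "ocean". */
    int color = (world_rank < world_size / 2) ? 0 : 1;
    MPI_Comm comp_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &comp_comm);

    int comp_rank;
    MPI_Comm_rank(comp_comm, &comp_rank);
    printf("world rank %d -> %s rank %d\n",
           world_rank, color == 0 ? "atmosphere" : "ocean", comp_rank);

    MPI_Comm_free(&comp_comm);
    MPI_Finalize();
    return 0;
}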
Special Support for Climate Computing
NCAR CSM version 1.2
• NERSC was the first site to port NCAR CSM to a non-NCAR Cray PVP machine
• Main users: Inez Fung (UCB) and Mike Wehner (LLNL)
NCAR CCM3.6.6
• Independently of CSM, NERSC ported NCAR CCM3.6.6 to the NERSC Cray PVP cluster.
• See http://hpcf.nersc.gov/software/apps/climate/ccm3/
Special Support for Climate Computing – cont.
• T3E netCDF parallelization
  — NERSC solicited user input to define parallel I/O requirements for the MOM3, LAN, and CAMILLE climate models (Ron Pacanowski, Venkatramani Balaji, Michael Wehner, Doug Rotman, and John Tannahill)
  — Development of the netCDF parallelization on the T3E was done by Dr. R. K. Owen of NERSC/USG based on the modelers' requirements (a minimal sketch of the baseline pattern follows this slide):
    • better I/O performance
    • master/slave read/write capability
    • support for a variable unlimited dimension
    • allowing a subset of PEs to open/close a netCDF dataset
    • a user-friendly API
    • etc.
  — Demonstrated netCDF parallel I/O usage by building model-specific I/O test cases (MOM3, CAMILLE)
  — The official netCDF 3.5 UNIDATA release includes “added support provided by NERSC for multiprocessing on Cray T3E.”
    http://www.unidata.ucar.edu/packages/netcdf/release-notes3.5.0.html
• Parallel netCDF for the IBM SP is under development by Dr. Majdi Baddourah of NERSC/USG
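The T3E-specific netCDF extensions themselves are not reproduced in the slide, so the sketch below shows only the baseline master/slave output pattern those requirements improve on: every PE computes, rank 0 gathers and writes with the standard serial netCDF 3 C API. The file, dimension, and variable names are made up for illustration, and error checking of the netCDF return codes is omitted for brevity.

#include <mpi.h>
#include <netcdf.h>
#include <stdlib.h>

#define NX 100   /* illustrative local array size per PE */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each PE fills a local slice with placeholder data. */
    double local[NX];
    for (int i = 0; i < NX; i++) local[i] = rank + 0.01 * i;

    /* Master/slave pattern: gather everything to rank 0, which alone
     * creates and writes the netCDF file with the serial API. */
    double *global = NULL;
    if (rank == 0) global = malloc((size_t)nprocs * NX * sizeof(double));
    MPI_Gather(local, NX, MPI_DOUBLE, global, NX, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        int ncid, dimid, varid;
        nc_create("field.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "x", (size_t)nprocs * NX, &dimid);
        nc_def_var(ncid, "field", NC_DOUBLE, 1, &dimid, &varid);
        nc_enddef(ncid);
        nc_put_var_double(ncid, varid, global);
        nc_close(ncid);
        free(global);
    }

    MPI_Finalize();
    return 0;
}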
Additional Support for Climate
• The Scientific Computing and User Services Groups have staff with a special climate focus
• Received funding for a new climate support person at NERSC
• Will provide software, consulting, and documentation support for
climate researchers at NERSC
• Will port the second generation of NCAR's Community Climate
System Model (CCSM-2) to NERSC's IBM SP.
• Put the modified source code under CVS control so that
individual investigators at NERSC can access the NERSC
version, and modify and manipulate their own source without
affecting others.
• Provide necessary support and consultation on operational
issues.
• Will develop enhancements to NetCDF on NERSC machines
that benefit NERSC's climate researchers.
• Will respond in a timely, complete, and courteous manner to
NERSC user clients, and provide an interface between NERSC
users and staff.
NERSC Systems Utilization
• IBM SP – 80-85% gross utilization
• T3E – 95% gross utilization
[Chart: MPP charging and usage, FY 1998-2000 – 30-day moving averages of CPU hours (Mcurie, Pierre, Pierre Free, GC0, overhead, and lost time) plotted against the maximum available CPU hours and the 80%, 85%, and 90% utilization goals, October 1997 through September 2000. Annotations mark allocation starvation periods, the merging of systems, full scheduling functionality, checkpoint/restart enabling the start of capability jobs, and a 4.4% improvement per month.]
NERSC Systems Run “Large” Jobs
[Charts: IBM SP and T3E (Mcurie) MPP time by job size, 30-day moving average, late April through November 2000, broken out by job size: <16, 17-32, 33-64, 65-96, 97-128, 129-256, and 257-512 processors]
Balancing Utilization and Turnaround
• NERSC consistently delivers high utilization on MPP systems while running large applications.
• We are now working with our users to establish methods to provide improved services:
  — Guaranteed throughput for at least a selected group of projects
  — More interactive and debugging resources for parallel applications
  — Longer application runs
  — More options in resource requests
• Because of the special turnaround requirements of the large climate users:
  — NERSC established a queue working group (T. Bettge, Vince Wayland at NCAR)
  — Set up special queue scheduling procedures that provide an agreed-upon amount of turnaround per day if there is work in the queue (Sept. '01)
  — Will present a plan about job scheduling at the NERSC User Group Meeting, November 12, 2001 in Denver
Wait Times in the “regular” Queue
[Charts: wait times for climate jobs vs. all other jobs]
NERSC Is Delivering on Its Commitment to
Make the Entire DOE Scientific Computing
Enterprise Successful
• NERSC sets the standard for effective
supercomputing resources
• NERSC is a major player in SciDAC and will coordinate its projects and collaborations
• NERSC is providing targeted support to SciDAC
projects
• NERSC continues to provide targeted support for the climate community and is acting on its input and needs