CCSM Performance, Successes and Challenges
Tony Craig
NCAR
RIST Meeting
March 12-14, 2002
Boulder, Colorado, USA
OUTLINE
• Overview of CCSM, platforms
• Individual component performance
• Coupling, load balance, coupled performance
• Challenges in the environment
• Summary
CCSM Overview
• CCSM = Community Climate System Model (NCAR)
• Designed to evaluate and understand Earth’s climate,
both historical and future.
• Multiple executables (5)
– Atmosphere (CAM), MPI/OpenMP
– Ocean (POP), MPI
– Land (CLM2), MPI/OpenMP
– Sea Ice (CSIM4), MPI
– Coupler (CPL5), OpenMP
CCSM Platforms
• Currently support
– IBM Power3, Power4
– SGI Origin
– Compaq (not quite)
• Future?
– Linux
– Vector Platforms
CCSM Performance Summary
• T42 resolution atm and land, 26 vertical levels in atm
(128x64x26)
• 1 degree resolution ocean and ice, 40 vertical levels
in ocean (320x384x40)
• On 100 processors of IBM Winterhawk II (4
processors/node, 375 MHz clock, TBMX network):
– Model runs about 4 simulated years / day
– Requires about a month to run a century
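The two throughput figures above are consistent with each other, as a quick check shows. This is only a sketch of the arithmetic; the function name `wallclock_days` is introduced here for illustration and does not come from CCSM.

```python
# Throughput quoted on the slide: about 4 simulated years per wall-clock day.
SIM_YEARS_PER_DAY = 4.0


def wallclock_days(sim_years, rate=SIM_YEARS_PER_DAY):
    """Wall-clock days needed to simulate `sim_years` at the given throughput."""
    return sim_years / rate


century_days = wallclock_days(100)
print(f"100 simulated years: about {century_days:.0f} wall-clock days")
```

At 4 simulated years per day, a century takes about 25 days of uninterrupted running, which matches "about a month" once queue waits and restarts are allowed for.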
Component Timings
[Figure: seconds per simulated day (0 to 300) versus number of processors (4, 8, 16, 32, 64) for the atm, lnd, ice, and ocn components.]
Component Scaling
[Figure: scaling (0 to 1.2) versus number of processors (4, 8, 16, 32, 64) for the atm, lnd, ice, and ocn components.]
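The slide does not define "scaling". One plausible reading, assumed here, is parallel efficiency relative to the smallest processor count, which would fit an axis running from 0 to about 1. The sketch below computes that quantity from per-component timings; the timings in `example` are hypothetical and are not read off the figure.

```python
def parallel_efficiency(timings):
    """Efficiency relative to the smallest processor count.

    timings maps processor count -> seconds per simulated day.
    A value of 1.0 means perfect scaling from the baseline.
    """
    base = min(timings)
    reference_cost = timings[base] * base  # processor-seconds at the baseline
    return {p: reference_cost / (t * p) for p, t in sorted(timings.items())}


# Hypothetical timings for one component (NOT taken from the slides).
example = {4: 200.0, 8: 100.0, 16: 55.0, 32: 30.0, 64: 20.0}
eff = parallel_efficiency(example)
for procs, value in eff.items():
    print(f"{procs:3d} processors: efficiency {value:.3f}")
```

Under this definition a component that halves its time whenever the processor count doubles stays at 1.0, and values above 1.0 can occur when a smaller per-processor working set fits better in cache.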
Coupling
• Multiple executables, running concurrently on distinct
processor sets.
[Figure: Atm, Ocean, Land, and Ice running side by side on separate processor sets; axes are Time and Processors.]
• All communication is root-to-root, via coupler.
• Coupling frequency is 1 hour for atm, land, and ice
models, 1 day for ocean.
• 1 send and 1 receive for each component during
each coupling interval.
• Scientific requirement that the land and ice models
compute fluxes for the atmosphere “first”.
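The coupling frequencies above fix how many messages each component exchanges with the coupler per simulated day. A minimal sketch of that count follows; the dictionary keys and the function name are illustrative choices, not CCSM identifiers.

```python
HOURS_PER_DAY = 24

# Coupling interval in hours, as stated on the slide.
coupling_interval_hours = {"atm": 1, "lnd": 1, "ice": 1, "ocn": 24}


def exchanges_per_day(interval_hours):
    """Number of coupling intervals in one simulated day."""
    return HOURS_PER_DAY // interval_hours


# One send and one receive per component per coupling interval.
messages = {
    name: {"sends": exchanges_per_day(h), "receives": exchanges_per_day(h)}
    for name, h in coupling_interval_hours.items()
}
total = sum(m["sends"] + m["receives"] for m in messages.values())

for name, m in messages.items():
    print(f"{name}: {m['sends']} sends, {m['receives']} receives per simulated day")
print(f"total messages through the coupler per simulated day: {total}")
```

Because all communication is root-to-root, every one of these messages passes through a single process on each side, so the message count stays small but each message carries a full coupling field.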
Figure 2: CCSM Load Balancing
Processors per component:
  ocean   40
  atm     32
  ice     16
  land    12
  cpl      4
  total  104
[Figure: per-component timing diagram. Timing labels in the figure, in seconds per day: 53.2, 8.6, 6.2, 9.4, 10.0, 5 3, 2, 40.4, 15.0, 3.0, 10.0, 55]
Other Coupling Options: single executable, sequential
• Single executable, components running
sequentially on all processors
[Figure: Ice, Land, Atm, and Ocean running one after another on all processors; axes are Time and Processors.]
– No idle time
– Components must scale well to large number of
processors
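The trade-off between the concurrent layout and the sequential one can be illustrated with a very simple cost model. This is a sketch under stated assumptions: processor counts follow the load-balance slide, the per-component timings are hypothetical, the coupler and the land/ice-first ordering constraint are ignored, and `efficiency` is an assumed parallel efficiency when every component is spread over all processors.

```python
# component -> (processors, seconds per simulated day) in a concurrent layout.
# Processor counts are from the load-balance slide; timings are HYPOTHETICAL.
layout = {"ocn": (40, 50.0), "atm": (32, 45.0), "ice": (16, 20.0), "lnd": (12, 10.0)}


def total_processors(layout):
    return sum(procs for procs, _ in layout.values())


def busy_processor_seconds(layout):
    return sum(procs * secs for procs, secs in layout.values())


def concurrent_time(layout):
    """Concurrent components finish when the slowest one does."""
    return max(secs for _, secs in layout.values())


def idle_fraction(layout):
    """Share of processor time spent waiting in the concurrent layout."""
    available = total_processors(layout) * concurrent_time(layout)
    return 1.0 - busy_processor_seconds(layout) / available


def sequential_time(layout, efficiency=1.0):
    """Time if each component runs in turn on all processors."""
    return busy_processor_seconds(layout) / (total_processors(layout) * efficiency)


print(f"concurrent: {concurrent_time(layout):.1f} s/day, "
      f"idle fraction {idle_fraction(layout):.3f}")
for e in (1.0, 0.8, 0.6):
    print(f"sequential at efficiency {e:.1f}: {sequential_time(layout, e):.1f} s/day")
```

In this model the sequential layout wins only while the components keep an efficiency above one minus the concurrent idle fraction on the full processor set, which is the sense in which components "must scale well".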
Other Coupling Options: combined sequential/concurrent on overlapping processors
• Single or multiple executable, components running
sequentially and/or concurrently on overlapping
processor sets.
[Figure: Ice, Atm, Land, and Ocean arranged on overlapping processor sets; axes are Time and Processors.]
– Optimal compromise between scaling and load balance.
– Requires sophisticated coupler and improvements to
hardware and machine software capabilities.
Challenges in the Environment
• Batch environment; startup and control of
multiple executables
• Tools
– Debuggers are challenged on multiple executable,
MPI/OpenMP parallel models
– Timing and profiling tools inadequate
– Compilers and libraries can be slow, buggy, and
constantly changing
• Machines not well balanced; chip speed vs
interconnect vs memory access and cache.
Performance Summary
• Successes
– Single component scaling
– MPI/OpenMP parallelization of components
– Load balancing multiple executables
• Challenges
– Scaling up to 1000s of processors for our standard problem
– Load balancing multiple executables
– Optimizing relative to cache, memory, network, chip, and
parallelism
– RISC vs Vector architectures
– Environment