The Met Office Unified Model I/O Server, ENES Workshop: Scalable


Met Office Unified Model I/O Server
Paul Selwood
© Crown copyright Met Office
I/O Server motivation
Some History…
• I/O has always been a problem for NWP, and more recently for climate
• ~2003 – application-level output buffering
• ~2008 – very simple, single-threaded I/O servers added for benchmarking
• Intercepted low-level “open/write/close”
• Single threaded
• Some benefit, but limited
• Did not address scaling issues – message numbers
Old UM I/O – Restart Files
Old UM I/O - Diagnostics
Why I/O Server approach?
• Full parallel I/O difficult with our packing
• “Free” CPUs available
• “Spare” memory available
• Chance to re-work old infrastructure
• Our file format is neither GRIB nor netCDF.
Diagnostic flexibility
• Variables (primary and derived)
• Output times
• Temporal processing (e.g. accumulations, extrema, means)
• Spatial processing (sub-domains, spatial means)
• Variable to unit mapping
• Basic output resolution is a 2D field
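
As a rough illustration of the flexibility listed above, the sketch below models a single diagnostic request as a plain C structure. The type and field names are hypothetical, invented for this example; they are not the UM's actual request format.

```c
/* Illustrative only: a hypothetical description of one diagnostic
 * request, mirroring the flexibility listed above.  The UM's real
 * request format differs. */
typedef enum { TIME_NONE, TIME_MEAN, TIME_ACCUM, TIME_MAX, TIME_MIN } time_proc_t;
typedef enum { SPACE_FULL, SPACE_SUBDOMAIN, SPACE_MEAN } space_proc_t;

typedef struct {
    int          field_id;           /* primary or derived variable        */
    int          output_unit;        /* variable-to-unit (stream) mapping  */
    int          first_step;         /* output times ...                   */
    int          last_step;
    int          step_interval;
    time_proc_t  time_processing;    /* accumulation, extremum, mean, ...  */
    space_proc_t space_processing;   /* sub-domain or spatial mean         */
    double       lat_min, lat_max;   /* sub-domain bounds, if used         */
    double       lon_min, lon_max;
} diag_request_t;                    /* basic unit of output: one 2D field */
```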
Key design decisions
• Parallelism over output streams
• Output streams distributed over servers
• Server is threaded
• “Listener” receives data & puts in queue
• “Writer” processes queue including packing
• Ensures asynchronous behaviour (see the sketch after this list)
• Shared FIFO queue
• Preserves instruction order
• Metadata/Data split
• Data initially stored on compute processes
• Data of same type combined into large messages
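
A minimal sketch of the listener/writer split and the shared FIFO queue follows, assuming a pthreads-style implementation. All names are illustrative, and receive_from_compute() and pack_and_write() are hypothetical stubs standing in for the MPI receive and the packing/output code; this is not the UM's actual source.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct item { void *data; size_t len; struct item *next; } item_t;

/* Hypothetical stubs for the transport and output steps. */
item_t *receive_from_compute(void);
void    pack_and_write(item_t *it);

typedef struct {
    item_t         *head, *tail;   /* FIFO: preserves instruction order */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
} fifo_t;

static void fifo_push(fifo_t *q, item_t *it)
{
    pthread_mutex_lock(&q->lock);
    it->next = NULL;
    if (q->tail) q->tail->next = it; else q->head = it;
    q->tail = it;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static item_t *fifo_pop(fifo_t *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->not_empty, &q->lock);
    item_t *it = q->head;
    q->head = it->next;
    if (q->head == NULL) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return it;
}

/* "Listener" thread: receives data from compute processes, queues it. */
void *listener(void *arg)
{
    fifo_t *q = arg;
    for (;;)
        fifo_push(q, receive_from_compute());
}

/* "Writer" thread: drains the queue, packing and writing each item. */
void *writer(void *arg)
{
    fifo_t *q = arg;
    for (;;) {
        item_t *it = fifo_pop(q);
        pack_and_write(it);
        free(it);
    }
}
```

Because the compute side only has to hand data to the listener, the expensive packing and writing never block the model time-stepping.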
Parallelism in I/O Servers
• Multiple I/O streams in typical job
• I/O servers spread among nodes
• Can utilise more memory
• Will improve bandwidth to disk
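
As a rough illustration of spreading streams over servers, the sketch below uses a simple round-robin assignment of output streams to I/O server ranks. The mapping is an assumption made for the example; the UM's actual placement is governed by the tuning options listed later (number and spacing of I/O servers).

```c
/* Illustrative only: round-robin assignment of output streams to
 * I/O server ranks, so streams (and their memory and disk traffic)
 * are spread across the nodes hosting the servers. */
int server_for_stream(int stream_id, int n_io_servers)
{
    return stream_id % n_io_servers;
}
```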
Automatic post-processing
• Model can trigger automatic post-processing
• Requests dealt with by I/O Server
• FIFO queue ensures integrity of data
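
One way to see why the FIFO guarantees data integrity here: if the post-processing request travels through the same queue as the data it depends on, the writer cannot reach it before everything queued ahead of it has been written. The sketch below is illustrative only; write_field() and trigger_post_processing() are hypothetical stand-ins, not UM routines.

```c
/* Illustrative only: queue items are either field data or a
 * post-processing request; FIFO order means the request is handled
 * only after all earlier data has been written. */
typedef enum { ITEM_FIELD_DATA, ITEM_POSTPROC_REQUEST } item_kind_t;

typedef struct queue_item {
    item_kind_t        kind;
    void              *payload;   /* field data or request details */
    struct queue_item *next;
} queue_item_t;

/* Hypothetical stand-ins for the actual output and post-processing. */
void write_field(void *field);
void trigger_post_processing(void *request);

/* Writer-side handling, called on items popped in FIFO order. */
void handle_item(queue_item_t *it)
{
    if (it->kind == ITEM_FIELD_DATA)
        write_field(it->payload);
    else
        trigger_post_processing(it->payload);  /* earlier data already out */
}
```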
How data gets output
[Diagram: field data flows from the compute processes to the I/O server; the listener thread (thread 1) places it on the queue, and the writer thread (thread 0) takes it off and performs the output.]
I/O Server development
• Initial version – Synchronous data transmission
• Asynchronous diagnostic data
• Asynchronous restart data
• Amalgamated data
• Asynchronous metadata
• Load balancing
• Priority messages with I/O Server
Lots of diagnostic output
• Which processes are I/O servers
• “Stall” messages
• Memory log
• Timing log
• Full log of metadata / queue
All really useful for tuning!
Lots of tuneable parameters…
• Number and spacing of I/O servers
• Memory for I/O servers
• Number of local data copies
• Number of fields to amalgamate
• Load balancing options
• Timing tunings
• Plus the standard I/O tunings (write block size, etc.)
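
For illustration only, the sketch below shows how a few such knobs might be read from the environment. The variable and field names are invented for this example and are not the UM's actual parameter names.

```c
#include <stdlib.h>

/* Hypothetical tuning knobs, read from illustrative environment
 * variables (not the UM's real ones). */
typedef struct {
    int  n_io_servers;          /* number of I/O server ranks      */
    int  server_spacing;        /* spacing of servers across nodes */
    long server_memory_mb;      /* memory reserved per server      */
    int  fields_to_amalgamate;  /* fields combined per message     */
} ioserver_config_t;

static long env_long(const char *name, long fallback)
{
    const char *v = getenv(name);
    return v ? atol(v) : fallback;
}

ioserver_config_t read_config(void)
{
    ioserver_config_t c;
    c.n_io_servers         = (int)env_long("EXAMPLE_IOS_COUNT", 4);
    c.server_spacing       = (int)env_long("EXAMPLE_IOS_SPACING", 1);
    c.server_memory_mb     =      env_long("EXAMPLE_IOS_MEMORY_MB", 1024);
    c.fields_to_amalgamate = (int)env_long("EXAMPLE_IOS_AMALGAMATE", 8);
    return c;
}
```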
Overloaded servers
I/O Servers keeping up!
MPI considerations
• Differing levels of MPI threading support (see the example after this list)
• Best with MPI_THREAD_MULTIPLE
• OK with MPI_THREAD_FUNNELED
• MPI tuning
• Want metadata to go as quickly as possible
• Want data transfer to be truly asynchronous
• Don’t want to interfere with model comms (e.g. halo exchange)
• Currently use 19 environment variables!
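
A minimal sketch of requesting full threading support from MPI and checking what the library actually provides, following the preference above for MPI_THREAD_MULTIPLE with MPI_THREAD_FUNNELED as a fallback. This is generic MPI usage, not the UM's initialisation code.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for full thread support so listener and writer threads can
     * both make MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided >= MPI_THREAD_MULTIPLE) {
        /* Best case: both I/O server threads may call MPI freely. */
    } else if (provided >= MPI_THREAD_FUNNELED) {
        /* Workable: MPI calls must be funnelled through the main thread. */
    } else {
        fprintf(stderr, "Insufficient MPI threading support\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Finalize();
    return 0;
}
```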
Deployment
• July 2011 – Operational global forecasts
• January 2012 – Operational LAM forecasts
• February 2012 – High resolution climate work
• Not currently used in
• Operational ensembles
• Low resolution climate work
• Most research work
Global Forecast Improvement

Run        Time    %age
QG 00/12   777s    19%
QG 06/18   559s    28%
QU         257s    27%

Total saving: over 21 node-hours per day
Impact on High Resolution Climate
• N512 resolution AMIP
• 59 GB restart dumps
• Modest diagnostics
• Cray XE6 with up to 9K cores
• All “in-run” output hidden
• Waits for final restart dump
• Most data buffered on client side
Current and Future Developments
• MPI Parallel I/O servers
• Multiple I/O servers per stream
• Gives more memory per stream on server
• Reduced messaging rate per node
• Parallel packing
• Potential for parallel I/O
• Read ahead
• Potential for boundary conditions / forcings
• Some possibilities for initial conditions
Parallel I/O server improvement
[Charts: before and after the parallel I/O server changes]
Questions and answers