Co-Array Fortran
Open-source compilers and tools for scalable global address space computing
John Mellor-Crummey
Rice University
Center for Programming Models for Scalable Parallel Computing Review, March 13, 2003

Outline
• Co-array Fortran
    • language overview
    • CAF compiler status and preliminary results
    • language and compiler research issues
    • interactions
• OpenMP
    • compiler and runtime strategies for improving scalability
• Dragon tool
    • hybrid MPI + OpenMP
• Open64 infrastructure
    • source-to-source and source-to-object code infrastructure

Co-Array Fortran (CAF)
• Explicitly parallel extension of Fortran 90/95 (Numrich & Reid)
• Global address space SPMD parallel programming model
    • one-sided communication
• Simple, two-level model that supports locality management
    • local vs. remote memory
• Programmer control over performance-critical decisions
    • data partitioning
    • communication
• Suitable for mapping to a range of parallel architectures
    • shared memory, message passing, hybrid, PIM
• Much in common with UPC

CAF Programming Model Features
• SPMD process images
    • fixed number of images during execution
    • images operate asynchronously
• Both private and shared data
    • real y(20,20)       a private 20x20 array in each image
    • real y(20,20)[*]    a shared 20x20 array in each image
• Simple one-sided shared-memory communication
    • x(:, j:j+2) = y(r, :)[p:p+2]    copy row r of y from images p:p+2 into local columns j:j+2 of x
• Flexible synchronization
    • sync_team(notify [,wait])
        • notify = a vector of process ids to signal
        • wait = a vector of process ids to wait for
• Pointers and (perhaps asymmetric) dynamic allocation
• Parallel I/O

One-sided Communication with Co-Arrays
    integer a(10,20)[*]
[Figure: co-array a(10,20) on image 1, image 2, ..., image N]
    if (this_image() > 1) a(1:5,1:10) = a(1:5,1:10)[this_image()-1]
[Figure: image 1, image 2, ..., image N after the copy]
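The shift statement on the One-sided Communication slide can be traced with a small sequential simulation. This is a sketch under stated assumptions: all images live in one Python process, `images[me]` holds image `me`'s `a(10,20)` as 10 rows of 20 values, and the function name `shift_from_predecessor` is hypothetical. The simulation takes every remote read from a snapshot before any image writes. As written, the slide's statement leaves that ordering to chance, because image `me` reads image `me-1`'s block while image `me-1` may be overwriting it. Processing images in increasing order without the snapshot would hand image 3 the value 1 (image 2's already-updated block) instead of 2. A real CAF program would need synchronization or a temporary to rule this out.

```python
def shift_from_predecessor(images):
    """Simulate, on every image at once:
        if (this_image() > 1) a(1:5,1:10) = a(1:5,1:10)[this_image()-1]

    images[me] holds image me's a(10,20) as 10 rows of 20 values
    (images[0] is unused so image numbers start at 1, as in CAF).
    All gets read a snapshot taken before any image writes, i.e. as if
    every read finished before any image updated its own a(1:5,1:10).
    """
    num_images = len(images) - 1
    # Phase 1: each image me > 1 gets a(1:5,1:10) from image me-1 (copied rows).
    fetched = {me: [row[0:10] for row in images[me - 1][0:5]]
               for me in range(2, num_images + 1)}
    # Phase 2: each image overwrites its own a(1:5,1:10) with what it fetched.
    for me, block in fetched.items():
        for i, row in enumerate(block):
            images[me][i][0:10] = row
    return images
```

For example, if every element of `a` on image `me` starts as `me`, then afterwards image 2's `a(1:5,1:10)` holds 1, image 3's holds 2, and image 1 and everything outside the section are unchanged.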
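The statement `x(:, j:j+2) = y(r, :)[p:p+2]` on the programming-model slide gathers one row from each of three images. The sketch below assumes that the co-subscript range `p:p+2` supplies the trailing dimension of the right-hand side. Under that assumption, column `j+d` of `x` receives row `r` of `y` from image `p+d`, and the shapes conform because `x` has 20 rows. Images are Python lists indexed from 1, and `r`, `j`, `p` are 1-based as in Fortran. The helpers `make_images` and `gather_rows` are hypothetical and model only the data movement, not how a runtime would communicate.

```python
def make_images(num_images, rows, cols):
    """Give every image its own rows x cols array y; each value encodes
    (image, row, col) as image*10000 + row*100 + col so results are easy to check."""
    images = [None]  # index 0 unused: image numbers start at 1
    for img in range(1, num_images + 1):
        images.append([[img * 10000 + i * 100 + k for k in range(1, cols + 1)]
                       for i in range(1, rows + 1)])
    return images


def gather_rows(y_images, r, p, x, j):
    """x(:, j:j+2) = y(r, :)[p:p+2]

    For d = 0, 1, 2: column j+d of the local x receives row r of y
    from image p+d.  All indices are 1-based, as in Fortran.
    """
    for d in range(3):
        remote_row = y_images[p + d][r - 1]     # one-sided get of y(r, :) on image p+d
        for i, value in enumerate(remote_row):  # store it down column j+d of x
            x[i][j + d - 1] = value
    return x
```

With four images of 20x20 data and a local 20x5 `x`, `gather_rows(y, r=3, p=2, x=x, j=2)` fills columns 2 to 4 of `x` from images 2 to 4 and leaves columns 1 and 5 untouched.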
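The `sync_team(notify [,wait])` item describes point-to-point signaling: an image signals the images in `notify` and waits for signals from the images in `wait`. Below is a minimal thread-based model of that description, with each image running as a Python thread. The class `TeamSync`, the `pending` counter matrix, the `ring_exchange` demo, and the rule that `wait` defaults to `notify` when omitted are all illustrative assumptions. The slide says nothing about how the CAF runtime implements this or what the default is.

```python
import threading


class TeamSync:
    """Hypothetical model of sync_team(notify, wait) with pairwise counters.

    pending[dst][src] counts signals sent by image src that image dst has
    not yet consumed.  Images are numbered from 1, as in CAF.
    """

    def __init__(self, num_images):
        self.pending = [[0] * (num_images + 1) for _ in range(num_images + 1)]
        self.cv = threading.Condition()

    def sync_team(self, me, notify, wait=None):
        wait = notify if wait is None else wait  # assumed default, see lead-in
        with self.cv:
            for q in notify:                     # signal every image in notify
                self.pending[q][me] += 1
            self.cv.notify_all()
            for q in wait:                       # consume one signal from each image in wait
                self.cv.wait_for(lambda: self.pending[me][q] > 0)
                self.pending[me][q] -= 1


def ring_exchange(num_images):
    """Each image publishes a value, syncs with both ring neighbors,
    then reads the value published by its left neighbor."""
    sync = TeamSync(num_images)
    published = [None] * (num_images + 1)
    received = [None] * (num_images + 1)

    def image(me):
        left = num_images if me == 1 else me - 1
        right = 1 if me == num_images else me + 1
        published[me] = 10 * me                   # local write before synchronizing
        sync.sync_team(me, notify=[left, right])  # signal and wait for both neighbors
        received[me] = published[left]            # left wrote before signaling us

    threads = [threading.Thread(target=image, args=(me,))
               for me in range(1, num_images + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received[1:], sync
```

Because every image signals before it waits, no thread interleaving deadlocks. With four images, each one ends up holding its left neighbor's value, and every counter returns to zero. The counters also make visible why pairwise notification has a cost: each image must track outstanding signals per partner.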
Finite Element Example (Numrich)
    subroutine assemble(start, prin, ghost, neib, x)
      integer :: start(:), prin(:), ghost(:), neib(:), k1, k2, p
      real :: x(:)[*]
      call sync_all(neib)
      do p = 1, size(neib)        ! Add contributions from ghost regions
        k1 = start(p); k2 = start(p+1)-1
        x(prin(k1:k2)) = x(prin(k1:k2)) + x(ghost(k1:k2))[neib(p)]
      enddo
      call sync_all(neib)
      do p = 1, size(neib)        ! Update the ghosts
        k1 = start(p); k2 = start(p+1)-1
        x(ghost(k1:k2))[neib(p)] = x(prin(k1:k2))
      enddo
      call sync_all()
    end subroutine assemble

Portable CAF Compiler
• Compile CAF to Fortran 90 + runtime support library
    • source-to-source code generation for wide portability
    • expect best performance by leveraging vendor F90 compiler
• Co-arrays
    • access data in generated code using F90 pointers
    • allocate storage with dope vector initialization outside F90
• Porting to a new compiler / architecture
    • synthesize compatible dope vectors for co-array storage
    • tailor communication to architecture

CAF Compiler Status
• Near production-quality F90 front end from Open64
• Working prototype for a CAF subset
    • allocate co-arrays using static constructor-like strategy
    • co-array access
        • remote data access uses ARMCI get/put
        • process-local data access uses load/store
    • sync_all, sync_team synchronization
    • multi-dimensional array section operations
• Successfully compiled and executed NAS MG
    • platforms: SGI Origin, IA64 Myrinet
    • performance similar to hand-coded MPI

NAS MG Efficiency (Class C)
[Figure: NAS MG efficiency (Class C) on IA64/Myrinet 2000]

CAF Compiler Coming Attractions
• Co-arrays as procedure arguments
• Triplet notation for co-dimensions
• Co-arrays of user-defined types
    • types can contain pointers
• Dynamic allocation of co-arrays
• Compiler support for parallel I/O

CAF Language Research Issues
• Synchronization
    • locks instead of critical sections
    • split-phase primitives
    • sync_team/sync_all semantics can require pairwise notification
    • may need synchronization matching hints to enable optimization
• Language support for efficient reductions
    • manually coded reductions are unlikely to yield portable performance
• Memory consistency model for co-array data
• Controlling process-to-processor mapping
• Support for hierarchical locality domains
    • support work sharing on SMPs?

CAF Compiler Research Issues
Aim for performance transparency
• Compiler optimization of communication and I/O
    • multi-mode communication: direct load/store + RDMA
    • combine synchronization with communication
        • put/get with flag
    • one-sided ↔ two-sided communication
    • transform from get to put communication
    • exploit split-phase communication and synchronization
    • communication vectorization
    • latency hiding for communication and parallel I/O
    • platform-tailored optimization
    • synchronization strength reduction
• Interoperability with other parallel programming models
• Optimizations to improve node performance

CAF Interactions
• Working with CAF code from Numrich and Wallcraft (NRL)
• Refining ARMCI synchronization with Nieplocha
• Designing parallel I/O for CAF with UIUC
• Exploring language design with Numrich and Nieplocha
• Coordinating with Rasmussen (LANL) on Fortran 90 array dope vector interface library
• Planning a fall CAF workshop at PSC
    • coordinating with Ralph Roskies, Sergiu Sanielevici
    • encouragement from Rich Hirsch, Fred Johnson
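The `assemble` subroutine on the Finite Element Example slide can be traced with a sequential simulation of all images. Sketch assumptions: images are Python lists in one process, and the three `sync_all` calls become phase boundaries. Index lists are 1-based as in Fortran. Each image's principal and ghost index sets are assumed disjoint, so running the images one after another within a phase gives the same result as running them concurrently. The function name `assemble_all` and the two-image mesh in the usage are hypothetical.

```python
def assemble_all(xs, start, prin, ghost, neib):
    """Sequential simulation of assemble() executed on every image.

    xs[me] is image me's x; start[me], prin[me], ghost[me], neib[me] are
    that image's index lists.  Images are numbered from 1 (index 0 unused)
    and the index lists hold 1-based Fortran indices.
    """
    images = range(1, len(xs))

    def ranges(me):
        """Yield (neighbor, k1, k2) for each neighbor p of image me."""
        for p in range(1, len(neib[me]) + 1):
            yield neib[me][p - 1], start[me][p - 1], start[me][p] - 1

    # call sync_all(neib): local contributions are complete everywhere.
    # Phase 1: x(prin(k1:k2)) += x(ghost(k1:k2))[neib(p)]
    for me in images:
        for q, k1, k2 in ranges(me):
            for k in range(k1, k2 + 1):
                xs[me][prin[me][k - 1] - 1] += xs[q][ghost[me][k - 1] - 1]
    # call sync_all(neib): principal values are final before anyone reads them.
    # Phase 2: x(ghost(k1:k2))[neib(p)] = x(prin(k1:k2))
    for me in images:
        for q, k1, k2 in ranges(me):
            for k in range(k1, k2 + 1):
                xs[q][ghost[me][k - 1] - 1] = xs[me][prin[me][k - 1] - 1]
    # call sync_all(): every ghost update has landed.
    return xs
```

For example, take two images that share one interface node. Entry 3 of image 1's `x` is principal, and entry 4 of image 2's `x` is its ghost copy; symmetrically, entry 1 of image 2 is principal and entry 4 of image 1 is its ghost. The lists are then `start = [1, 2]` on both images, `prin = [3]` and `[1]`, `ghost = [4]` and `[4]`, and `neib = [2]` and `[1]`. Starting from `[1.0, 2.0, 3.0, 0.5]` and `[4.0, 5.0, 6.0, 0.25]`, the result is `[1.0, 2.0, 3.25, 4.5]` and `[4.5, 5.0, 6.0, 3.25]`. Each principal entry has absorbed its neighbor's ghost contribution, and each ghost now mirrors the assembled value.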
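The Portable CAF Compiler slide relies on dope vectors: descriptors that let generated F90 code reach co-array storage through ordinary pointers. The sketch below shows the kind of information such a descriptor carries and how an element address follows from it. The class names, byte strides, column-major order, 4-byte integers, and base address 0x1000 are illustrative assumptions. Real dope-vector layouts differ between compilers, which is why the slide lists synthesizing compatible dope vectors as a porting step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dim:
    lower: int   # lower bound of this dimension
    extent: int  # number of elements along this dimension
    stride: int  # bytes between consecutive elements along this dimension


@dataclass
class DopeVector:
    base: int        # address of the element at the lower bounds
    elem_size: int   # bytes per element
    dims: List[Dim]

    def address(self, *idx: int) -> int:
        """Byte address of element idx, with bounds checking."""
        if len(idx) != len(self.dims):
            raise ValueError("rank mismatch")
        addr = self.base
        for i, d in zip(idx, self.dims):
            if not d.lower <= i < d.lower + d.extent:
                raise IndexError(f"index {i} out of bounds")
            addr += (i - d.lower) * d.stride
        return addr


def contiguous(base: int, elem_size: int, shape) -> DopeVector:
    """Column-major (Fortran order) descriptor for a freshly allocated
    array whose lower bounds are all 1."""
    dims, stride = [], elem_size
    for n in shape:
        dims.append(Dim(lower=1, extent=n, stride=stride))
        stride *= n
    return DopeVector(base, elem_size, dims)
```

For `integer a(10,20)` at base 4096 (0x1000), `contiguous(0x1000, 4, (10, 20)).address(1, 2)` is 4136, one 40-byte column past the start. A descriptor for the section `a(1:5,1:10)` reuses the same base and strides with extents 5 and 10. That is how an F90 pointer can view a section without copying it.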
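The CAF Language Research Issues slide notes that manually coded reductions are unlikely to yield portable performance. As a sketch of what a programmer might hand-code, here is one common shape, a binomial-tree combine, simulated sequentially over a list of per-image contributions. The function name `tree_reduce` is hypothetical. Tree shape, get vs. put, and process-to-processor mapping all affect cost differently on different machines. That dependence is one plausible reason a single hand-written reduction does not perform well everywhere.

```python
def tree_reduce(values, op):
    """Binomial-tree reduction across images, simulated sequentially.

    values[r] is the contribution of the image with 0-based rank r.
    In each round, every rank that is a multiple of 2*step combines the
    partial result held step ranks above it.  Returns (result, rounds);
    the result ends up on rank 0 (image 1).
    """
    partial = list(values)
    n = len(partial)
    step, rounds = 1, 0
    while step < n:
        for rank in range(0, n - step, 2 * step):
            partial[rank] = op(partial[rank], partial[rank + step])
        step *= 2
        rounds += 1
    return partial[0], rounds
```

For eight images, `tree_reduce(list(range(1, 9)), operator.add)` returns `(36, 3)`: three rounds of pairwise combines instead of seven sequential ones.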
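Communication vectorization, listed on the CAF Compiler Research Issues slide, replaces many fine-grained remote accesses with one bulk transfer. Consider a simple latency/bandwidth model (an assumption, not a measurement) with per-message latency α and per-element cost β. Copying n remote elements one at a time costs about n(α + β), while a single vectorized get costs about α + nβ. The sketch below counts messages for both forms. `RemoteArray` is a hypothetical stand-in for a co-array on another image and does not model ARMCI or any real runtime.

```python
class RemoteArray:
    """Hypothetical stand-in for a co-array y on another image q.

    Each get() call models one one-sided get (one message); the object
    counts how many messages were issued.
    """

    def __init__(self, values):
        self.values = list(values)
        self.messages = 0

    def get(self, lo, hi):
        """Return values[lo:hi] (0-based, half-open) as a single message."""
        self.messages += 1
        return self.values[lo:hi]


def elementwise_copy(remote, n):
    """do i = 1, n: x(i) = y(i)[q]  -- one get per element."""
    return [remote.get(i, i + 1)[0] for i in range(n)]


def vectorized_copy(remote, n):
    """x(1:n) = y(1:n)[q]  -- the whole section in one get."""
    return remote.get(0, n)
```

Both functions return the same 100 values from a 100-element remote array, but the element-wise loop issues 100 messages and the vectorized form issues one.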