Co-Array Fortran
Open-source compilers and tools for scalable global address space computing
John Mellor-Crummey
Rice University
Outline
• Co-array Fortran
  • language overview
  • CAF compiler status and preliminary results
  • language and compiler research issues
  • interactions
• OpenMP
  • compiler and runtime strategies for improving scalability
  • Dragon tool
  • hybrid MPI + OpenMP
• Open64 infrastructure
  • source-to-source and source-to-object code infrastructure
Center for Programming Models for Scalable Parallel Computing
Review, March 13, 2003
Co-Array Fortran (CAF)
• Explicitly parallel extension of Fortran 90/95 (Numrich & Reid)
• Global address space SPMD parallel programming model
  • one-sided communication
• Simple, two-level model that supports locality management
  • local vs. remote memory
• Programmer control over performance-critical decisions
  • data partitioning
  • communication
• Suitable for mapping to a range of parallel architectures
  • shared memory, message passing, hybrid, PIM (processor-in-memory)
• Much in common with UPC
CAF Programming Model Features
• SPMD process images
  • fixed number of images during execution
  • images operate asynchronously
• Both private and shared data
  • real y(20, 20)          ! a private 20x20 array in each image
  • real y(20, 20)[*]       ! a shared 20x20 array in each image, addressable by every image
• Simple one-sided shared-memory communication
  • x(:, j:j+2) = y(r, :)[p:p+2]   ! copy row r from images p through p+2 into local columns j through j+2
• Flexible synchronization
  • sync_team(notify [,wait])
    • notify = a vector of process ids to signal
    • wait = a vector of process ids to wait for
• Pointers and (perhaps asymmetric) dynamic allocation
• Parallel I/O
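The co-dimension triplet in the communication example can be read as a loop over images: local column j+t receives row r from image p+t. Below is a minimal sequential Python simulation of that reading, not CAF code. It assumes images are numbered from 1, uses small 4x4 arrays in place of 20x20 ones, and introduces hypothetical helper names (`make_y`, `copy_rows`). It models only the data movement, not concurrency.

```python
def make_y(num_images, n):
    """Co-array y(n, n)[*]: image k holds y(i, c) = 100*k + 10*i + c (1-based i, c)."""
    return {
        k: [[100 * k + 10 * i + c for c in range(1, n + 1)] for i in range(1, n + 1)]
        for k in range(1, num_images + 1)
    }


def copy_rows(x, y, r, p, j):
    """Local x(:, j+t) = y(r, :)[p+t] for t = 0, 1, 2 (arguments use Fortran 1-based indices)."""
    for t in range(3):
        row = y[p + t][r - 1]          # one-sided get: row r of y on image p+t
        for i, value in enumerate(row):
            x[i][j - 1 + t] = value    # store into local column j+t
    return x


y = make_y(num_images=4, n=4)
x = [[0] * 4 for _ in range(4)]
copy_rows(x, y, r=2, p=2, j=1)
print(x[0][:3])  # [221, 321, 421]
```

Each printed value encodes its source: 221 is row 2 of image 2, 321 the same row on image 3, and 421 on image 4.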
One-sided Communication with Co-Arrays
integer a(10,20)[*]
[Figure: images 1, 2, ..., N, each holding its own copy of a(10,20)]
if (this_image() > 1) a(1:5,1:10) = a(1:5,1:10)[this_image()-1]
[Figure: images 1, 2, ..., N after the assignment; every image except image 1 has copied the a(1:5,1:10) block from image this_image()-1]
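Image k reads the block that image k-1 may itself be overwriting, so the outcome of this statement depends on timing unless the program synchronizes. The Python sketch below is a sequential simulation, not CAF. It fixes one well-defined outcome by snapshotting every source block before any image writes, an ordering that real CAF code would have to enforce with synchronization. The function name `shift_from_left` is hypothetical.

```python
def shift_from_left(a):
    """Simulate: if (this_image() > 1) a(1:5,1:10) = a(1:5,1:10)[this_image()-1]

    `a` maps image numbers 1..N to 10x20 nested lists (rows x columns).
    All source blocks are snapshotted first, so every image sees the
    pre-assignment values of its left neighbor.
    """
    snapshot = {k: [row[:10] for row in a[k][:5]] for k in a}
    for me in sorted(a):
        if me > 1:
            for i in range(5):
                a[me][i][:10] = snapshot[me - 1][i]   # get from image me-1
    return a


a = {k: [[k] * 20 for _ in range(10)] for k in range(1, 4)}
shift_from_left(a)
print(a[2][0][0], a[3][0][0], a[2][5][0], a[2][0][10])  # 1 2 2 2
```

Without the snapshot, image 3 could receive image 1's values if image 2 happened to update first.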
Finite Element Example (Numrich)
subroutine assemble(start, prin, ghost, neib, x)
  integer :: start(:), prin(:), ghost(:), neib(:), k1, k2, p
  real :: x(:)[*]
  call sync_all(neib)              ! neighbors have finished their local contributions
  do p = 1, size(neib)             ! Add contributions from ghost regions
    k1 = start(p); k2 = start(p+1)-1
    x(prin(k1:k2)) = x(prin(k1:k2)) + x(ghost(k1:k2))[neib(p)]
  enddo
  call sync_all(neib)              ! all ghost contributions read before ghosts are overwritten
  do p = 1, size(neib)             ! Update the ghosts
    k1 = start(p); k2 = start(p+1)-1
    x(ghost(k1:k2))[neib(p)] = x(prin(k1:k2))
  enddo
  call sync_all                    ! ghost copies are now consistent on every image
end subroutine assemble
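The two loops of `assemble` form two communication phases separated by barriers. The Python sketch below is a sequential simulation of those phases, not the CAF routine. Phase 1 writes only principal entries and reads only ghost entries, and phase 2 does the reverse, so visiting images one at a time within a phase gives the same result as running them concurrently.

The toy two-image mesh is a hypothetical example. Each image owns local entries 0 and 1 and keeps a ghost copy of one of its neighbor's entries in slot 2. Indices are 0-based and `start` slices are half-open.

```python
def assemble(x, start, prin, ghost, neib):
    """Simulate both phases of `assemble`; every argument is a dict keyed by image number."""
    images = sorted(x)
    # Phase 1 (after the first sync_all): add neighbors' ghost contributions.
    for me in images:
        for p, nb in enumerate(neib[me]):
            k1, k2 = start[me][p], start[me][p + 1]     # Fortran k1:k2 range
            for pi, gi in zip(prin[me][k1:k2], ghost[me][k1:k2]):
                x[me][pi] += x[nb][gi]                  # remote read of x(ghost)[nb]
    # Phase 2 (after the second sync_all): push assembled values to the ghosts.
    for me in images:
        for p, nb in enumerate(neib[me]):
            k1, k2 = start[me][p], start[me][p + 1]
            for pi, gi in zip(prin[me][k1:k2], ghost[me][k1:k2]):
                x[nb][gi] = x[me][pi]                   # remote write of x(ghost)[nb]
    return x


x = {1: [1.0, 2.0, 0.5], 2: [3.0, 4.0, 0.25]}
start = {1: [0, 1], 2: [0, 1]}
prin = {1: [1], 2: [0]}      # owned entries that the neighbor also touches
ghost = {1: [2], 2: [2]}     # slot in the neighbor's array holding its ghost copy
neib = {1: [2], 2: [1]}
print(assemble(x, start, prin, ghost, neib))
# {1: [1.0, 2.25, 3.5], 2: [3.5, 4.0, 2.25]}
```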
Portable CAF Compiler
• Compile CAF to Fortran 90 + runtime support library
  • source-to-source code generation for wide portability
  • expect best performance by leveraging the vendor's F90 compiler
• Co-arrays
  • access data in generated code using F90 pointers
  • allocate storage with dope vector initialization outside F90
• Porting to a new compiler / architecture
  • synthesize compatible dope vectors for co-array storage
  • tailor communication to architecture
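The slides do not give a dope vector layout, and real layouts differ between Fortran compilers. That difference is why a port must synthesize compatible dope vectors. As a rough illustration only, here is a hypothetical descriptor in Python. It stores a base address plus a lower bound, extent, and stride for each dimension, and shows how generated code could address co-array storage allocated outside F90. Addresses are counted in elements rather than bytes for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Dim:
    lower: int    # lower bound of this dimension
    extent: int   # number of elements in this dimension
    stride: int   # distance, in elements, between consecutive indices


@dataclass
class DopeVector:
    base: int     # address of the first element (element units, hypothetical)
    dims: list    # one Dim per array dimension

    def address(self, *index):
        """Address of element a(index...), with rank and bounds checking."""
        if len(index) != len(self.dims):
            raise ValueError("rank mismatch")
        addr = self.base
        for i, d in zip(index, self.dims):
            if not d.lower <= i < d.lower + d.extent:
                raise IndexError(f"index {i} out of bounds")
            addr += (i - d.lower) * d.stride
        return addr


def contiguous(base, shape, lower=1):
    """Describe a contiguous column-major array, the order Fortran uses."""
    dims, stride = [], 1
    for n in shape:
        dims.append(Dim(lower, n, stride))
        stride *= n
    return DopeVector(base, dims)


a = contiguous(base=1000, shape=(10, 20))   # like integer a(10,20)
print(a.address(1, 1), a.address(2, 1), a.address(1, 2), a.address(10, 20))
# 1000 1001 1010 1199
```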
CAF Compiler Status
• Near-production-quality F90 front end from Open64
• Working prototype for a CAF subset
  • allocate co-arrays using a static constructor-like strategy
  • co-array access
    • remote data access uses ARMCI get/put
    • process-local data access uses load/store
  • sync_all, sync_team synchronization
  • multi-dimensional array section operations
• Successfully compiled and executed NAS MG
  • platforms: SGI Origin, IA64/Myrinet
  • performance similar to hand-coded MPI
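The access strategy above splits co-array references by target image. References to the calling image's own storage become ordinary loads and stores, and references to any other image become one-sided get/put operations. A toy Python model of that dispatch follows. `CoArraySim`, `_get`, and `_put` are hypothetical; the private methods stand in for ARMCI calls but do not reproduce the ARMCI API.

```python
class CoArraySim:
    """Toy co-array: data[k] is image k's local storage, with counters for remote traffic."""

    def __init__(self, num_images, n):
        self.data = {k: [0.0] * n for k in range(1, num_images + 1)}
        self.remote_gets = 0
        self.remote_puts = 0

    def _get(self, image, i):            # stand-in for a one-sided remote get
        self.remote_gets += 1
        return self.data[image][i]

    def _put(self, image, i, value):     # stand-in for a one-sided remote put
        self.remote_puts += 1
        self.data[image][i] = value

    def read(self, me, image, i):
        if image == me:
            return self.data[me][i]      # process-local load
        return self._get(image, i)

    def write(self, me, image, i, value):
        if image == me:
            self.data[me][i] = value     # process-local store
        else:
            self._put(image, i, value)


co = CoArraySim(num_images=2, n=4)
co.write(me=1, image=1, i=0, value=5.0)   # local store on image 1
co.write(me=1, image=2, i=0, value=7.0)   # remote put to image 2
print(co.read(me=2, image=2, i=0), co.read(me=2, image=1, i=0))  # 7.0 5.0
print(co.remote_gets, co.remote_puts)                            # 1 1
```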
NAS MG Efficiency (Class C)
[Figure: NAS MG efficiency, Class C, on IA64/Myrinet 2000]
CAF Compiler Coming Attractions
• Co-arrays as procedure arguments
• Triplet notation for co-dimensions
• Co-arrays of user-defined types
  • types can contain pointers
• Dynamic allocation of co-arrays
• Compiler support for parallel I/O
CAF Language Research Issues
• Synchronization
  • locks instead of critical sections
  • split-phase primitives
  • sync_team/sync_all semantics can require pairwise notification
  • may need synchronization-matching hints to enable optimization
• Language support for efficient reductions
  • manually coded reductions are unlikely to yield portable performance
• Memory consistency model for co-array data
• Controlling process-to-processor mapping
• Support for hierarchical locality domains
  • support work sharing on SMPs?
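The reduction bullet can be made concrete with two hand-coded ways to sum one value per image. They have very different critical paths, and which one is faster depends on a machine's latency and topology. That dependence is why fixing one choice in source code can hurt portability. The Python sketch below is sequential and counts only dependent communication steps, not time. Both function names are hypothetical.

```python
def linear_reduce(values):
    """Image 1 fetches each other image's value in turn: N-1 dependent steps."""
    total, steps = values[0], 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps


def tree_reduce(values):
    """Pairwise combining: ceil(log2 N) rounds, each with concurrent transfers."""
    vals, rounds = list(values), 0
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])   # odd image out passes through to the next round
        vals, rounds = nxt, rounds + 1
    return vals[0], rounds


per_image = list(range(1, 9))      # one value on each of 8 images
print(linear_reduce(per_image), tree_reduce(per_image))  # (36, 7) (36, 3)
```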
CAF Compiler Research Issues
Aim for performance transparency
• Compiler optimization of communication and I/O
  • multi-mode communication: direct load/store + RDMA
  • combine synchronization with communication
    • put/get with flag
  • conversion between one-sided and two-sided communication
  • transform from get to put communication
  • exploit split-phase communication and synchronization
  • communication vectorization
  • latency hiding for communication and parallel I/O
  • platform-tailored optimization
  • synchronization strength reduction
• Interoperability with other parallel programming models
• Optimizations to improve node performance
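Communication vectorization replaces per-element remote accesses inside a loop with a single bulk transfer of the whole array section. A minimal Python sketch of the effect follows. The function names are hypothetical, indices are 0-based and half-open, and the cost model uses made-up unit costs (10 per message, 1 per element) rather than measured numbers.

```python
def fetch_elementwise(remote, lo, hi):
    """Naive translation of a copy loop: one remote get per element."""
    out, messages = [], 0
    for i in range(lo, hi):
        out.append(remote[i])   # each iteration issues its own remote get
        messages += 1
    return out, messages


def fetch_vectorized(remote, lo, hi):
    """Vectorized translation: one bulk get of the whole section."""
    return remote[lo:hi], 1


def cost(messages, elements, latency=10, per_element=1):
    """Hypothetical linear cost model: per-message latency plus per-element transfer."""
    return messages * latency + elements * per_element


remote = list(range(100))       # stands in for y(:) on another image
a, m1 = fetch_elementwise(remote, 0, 100)
b, m2 = fetch_vectorized(remote, 0, 100)
print(a == b, m1, m2, cost(m1, 100), cost(m2, 100))  # True 100 1 1100 110
```

Under any model where per-message latency dominates, the bulk transfer wins by roughly the number of elements moved.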
CAF Interactions
• Working with CAF code from Numrich and Wallcraft (NRL)
• Refining ARMCI synchronization with Nieplocha
• Designing parallel I/O for CAF with UIUC
• Exploring language design with Numrich and Nieplocha
• Coordinating with Rasmussen (LANL) on a Fortran 90 array dope vector interface library
• Planning a fall CAF workshop at PSC
  • coordinating with Ralph Roskies, Sergiu Sanielevici
  • encouragement from Rich Hirsch, Fred Johnson