SMALL IDENTIFIERS FOR CHARE ARRAY ELEMENTS
Phil Miller
With contributions from Akhil Langer, Harshitha Menon, Bilge Acun, Ramprasad Venkataraman, and L.V. Kalé
Array Element Location Management
Enables process virtualization
Directs messages from sender PE to host PE
Maps element identifier to object pointer on host PE
Handles element instantiation & deletion
Placement at creation
On-demand creation
Detects duplicate insertion
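The location-management duties listed above (map an identifier to a local object pointer, forward messages for elements hosted elsewhere, reject duplicate insertions) can be sketched as a small per-PE table. This is a minimal illustration, not the Charm++ location manager: `LocationTable`, `Element`, `Route`, and `RouteKind` are hypothetical names, and the key is shown as a 64-bit integer for brevity even though the scheme described so far keys elements by their full array index.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct Element { int value = 0; };  // hypothetical stand-in for an array element

enum class RouteKind { kDeliverLocal, kForward, kAskHome };

struct Route {
  RouteKind kind;
  Element* obj;  // meaningful only for kDeliverLocal
  int pe;        // meaningful only for kForward
};

// Hypothetical per-PE table: local elements map to object pointers,
// remote elements map to the last PE they were known to live on.
class LocationTable {
 public:
  // Register an element created on this PE; false means duplicate insertion.
  bool insertLocal(std::uint64_t id, Element* obj) {
    assert(obj != nullptr);
    if (!local_.emplace(id, obj).second) return false;
    remote_.erase(id);  // any stale remote entry is now wrong
    return true;
  }

  // Element deleted or migrated away from this PE.
  void removeLocal(std::uint64_t id) { local_.erase(id); }

  // Cache a location learned from the home PE or from a forwarded message.
  void noteRemote(std::uint64_t id, int pe) { remote_[id] = pe; }

  // Decide what to do with a message addressed to `id`.
  Route route(std::uint64_t id) const {
    if (auto it = local_.find(id); it != local_.end())
      return {RouteKind::kDeliverLocal, it->second, -1};
    if (auto it = remote_.find(id); it != remote_.end())
      return {RouteKind::kForward, nullptr, it->second};
    return {RouteKind::kAskHome, nullptr, -1};  // unknown: consult the home PE
  }

 private:
  std::unordered_map<std::uint64_t, Element*> local_;
  std::unordered_map<std::uint64_t, int> remote_;
};
```

Unknown elements fall back to the home PE, which is what lets the runtime tolerate migration without broadcasting every move.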
Array Element Location Management
Hooks for RTS introspection and adaptivity
Tracing
Load instrumentation
Migration
Fault tolerance
Array Index Structure
Fixed at 16 bytes
No less, even for small/simple arrays
No more, even for sophisticated arrays
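To make the fixed-size cost concrete, here is an illustrative 16-byte index struct. The field names (`nInts`, `dimension`, `data`) and the layout are assumptions chosen to total exactly 16 bytes; this is not presented as the actual Charm++ definition. A 1D index still carries all 16 bytes, 8 of them unused coordinate slots.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative fixed-size index: always 16 bytes, whatever the dimensionality.
struct FixedIndex {
  std::uint16_t nInts;      // how many entries of data[] are meaningful
  std::uint16_t dimension;  // logical dimensionality
  std::int32_t data[3];     // coordinates; unused slots stay zero
};
static_assert(sizeof(FixedIndex) == 16, "index is always 16 bytes");

FixedIndex make1D(std::int32_t i) {
  FixedIndex idx{};  // value-initialization zeroes every field
  idx.nInts = 1;
  idx.dimension = 1;
  idx.data[0] = i;
  return idx;
}

FixedIndex make2D(std::int32_t i, std::int32_t j) {
  FixedIndex idx{};
  idx.nInts = 2;
  idx.dimension = 2;
  idx.data[0] = i;
  idx.data[1] = j;
  return idx;
}

// Compare only the meaningful coordinates.
bool sameIndex(const FixedIndex& a, const FixedIndex& b) {
  if (a.nInts != b.nInts || a.dimension != b.dimension) return false;
  assert(a.nInts <= 3);
  for (int k = 0; k < a.nInts; ++k)
    if (a.data[k] != b.data[k]) return false;
  return true;
}
```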
Home PE
Track assigned elements
Existence
Current host PE
Default host PE
Tell other PEs as necessary
Assigned by Array Map
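The home PE's responsibilities above (track existence and current host for each assigned element, act as default host, answer other PEs) can be sketched as a directory. `HomeDirectory` and `HomeRecord` are hypothetical names, and message passing is left out: the methods stand in for handlers that would run when creation, migration, deletion, or location-query messages arrive.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-element record kept by the home PE.
struct HomeRecord {
  bool exists = false;
  int hostPe = -1;  // current host PE; -1 until the element is created
};

// Sketch of the bookkeeping for elements the array map assigns to this PE.
class HomeDirectory {
 public:
  explicit HomeDirectory(int myPe) : myPe_(myPe) {}

  // Creation reported from `hostPe`; false signals a duplicate insertion.
  bool recordCreation(std::uint64_t id, int hostPe) {
    HomeRecord& r = records_[id];
    if (r.exists) return false;
    r.exists = true;
    r.hostPe = hostPe;
    return true;
  }

  // The common case: the home PE is also the default host.
  bool recordLocalCreation(std::uint64_t id) { return recordCreation(id, myPe_); }

  void recordMigration(std::uint64_t id, int newPe) {
    auto it = records_.find(id);
    assert(it != records_.end() && it->second.exists);
    it->second.hostPe = newPe;
  }

  void recordDeletion(std::uint64_t id) { records_.erase(id); }

  // Reply to another PE's location query; -1 means "does not exist (yet)".
  int whereIs(std::uint64_t id) const {
    auto it = records_.find(id);
    return (it == records_.end() || !it->second.exists) ? -1 : it->second.hostPe;
  }

 private:
  int myPe_;
  std::unordered_map<std::uint64_t, HomeRecord> records_;
};
```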
Static Array Maps
Array Index to Home PE
Simple strategies
Block
Round-robin (cyclic)
File
Application-specific
OpenAtom
CharmLU
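The two simple strategies can be written as pure functions from a 1D index to a home PE. This is a minimal sketch under the assumption of `n` elements spread over `p` PEs; the function names are hypothetical, and Charm++'s own default map may use a different formula.

```cpp
#include <cassert>

// Block map: contiguous chunks of ceil(n / p) indices per PE.
int blockHomePe(int index, int n, int p) {
  assert(n > 0 && p > 0 && index >= 0 && index < n);
  int chunk = (n + p - 1) / p;  // ceil(n / p)
  return index / chunk;         // always < p because chunk * p >= n
}

// Round-robin (cyclic) map: consecutive indices land on consecutive PEs.
int roundRobinHomePe(int index, int p) {
  assert(p > 0 && index >= 0);
  return index % p;
}
```

For 10 elements on 4 PEs, the block map places indices 0-2 on PE 0, 3-5 on PE 1, 6-8 on PE 2, and 9 on PE 3, while round-robin places index 5 on PE 1. Because both are computable anywhere without communication, any PE can find an element's home PE on its own.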
Pushing the Envelope
Array-message variant of the envelope takes 34-38 bytes
Next-largest variant needs only 18 bytes
Could save 16-20 bytes on every message!
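The reason a single message type costs bytes on every message is that per-type envelope fields share one union, and a union is as large as its largest member. The sketch below uses stand-in `char` arrays sized from the slide (38 bytes at the upper end, 18 for the next-largest variant); the type names are hypothetical and the real envelope fields are not modeled. With a 34-byte array variant, the same calculation gives the 16-byte lower bound.

```cpp
#include <cstddef>

// Stand-in variants; char arrays have alignment 1, so sizeof has no padding.
struct ArrayVariant { char bytes[38]; };        // array messages: upper end, 38 bytes
struct NextLargestVariant { char bytes[18]; };  // next-largest message kind

// The union is sized by its largest member: the array variant today.
union TypeInfoToday {
  ArrayVariant array;
  NextLargestVariant other;
};

// Hypothetical: array variant shrunk to no more than the next-largest one.
struct SlimArrayVariant { char bytes[18]; };
union TypeInfoSlim {
  SlimArrayVariant array;
  NextLargestVariant other;
};

constexpr std::size_t kSavedPerMessage = sizeof(TypeInfoToday) - sizeof(TypeInfoSlim);
static_assert(sizeof(TypeInfoToday) == 38, "array variant dominates");
static_assert(sizeof(TypeInfoSlim) == 18, "next-largest variant now dominates");
static_assert(kSavedPerMessage == 20, "upper end of the 16-20 byte range");
```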
Goals of a shorter ID
Reduce envelope size
Shrink memory footprint
Improve fine-grain performance
Enable future index evolution
Design parameters
Preserve API:
Send messages by index
Maps and home PEs
Avoid extra communication
Maintain or improve performance
Scheme
64-bit element ID
Protocol
Home PE generates ID at construction
Simple counter in element field
Asynchronous request to the home PE if constructing elsewhere
ID requests piggyback on location requests
Extra messages only for unusual construction
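The ID-generation step on the home PE can be sketched as a per-PE counter packed alongside the home PE number. The bit split (24 bits of home PE, 40 bits of counter) and the names `IdGenerator`, `homePeOf`, and `counterOf` are assumptions for illustration only; the slide gives only the 64-bit total. The asynchronous request path for elements constructed on other PEs is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit ID layout: [ home PE : 24 bits | counter : 40 bits ].
constexpr int kCounterBits = 40;
constexpr int kPeBits = 64 - kCounterBits;
constexpr std::uint64_t kCounterMask = (std::uint64_t{1} << kCounterBits) - 1;

// One generator per (array, home PE); only the home PE hands out IDs,
// so IDs are unique without any coordination between PEs.
class IdGenerator {
 public:
  explicit IdGenerator(int homePe) : homePe_(homePe) {
    assert(homePe >= 0 &&
           static_cast<std::uint64_t>(homePe) < (std::uint64_t{1} << kPeBits));
  }

  std::uint64_t next() {
    assert(counter_ <= kCounterMask && "counter space exhausted");
    return (static_cast<std::uint64_t>(homePe_) << kCounterBits) | counter_++;
  }

 private:
  int homePe_;
  std::uint64_t counter_ = 0;
};

// Any PE can recover the home PE from an ID without a lookup.
inline int homePeOf(std::uint64_t id) { return static_cast<int>(id >> kCounterBits); }
inline std::uint64_t counterOf(std::uint64_t id) { return id & kCounterMask; }
```

Embedding the home PE in the ID is one design choice that keeps "find the home PE" a bit shift; an alternative would be to recompute it from the index through the array map, at the cost of keeping the index around.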
Potential Optimizations
PE-level caching & pointer lookups
Index compression instead of lookups
Index Compression
Many arrays fit directly within a 48-bit space
All 1D with 32-bit ‘int’ indices
2D with each dimension < 16M (2^24), i.e. < (16M)² elements
Etc.
Specify bounds; the RTS bit-packs the index if it fits
Could also enable hashing
Collisions would be disastrous
Known indices and perfect hashing?
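Bit-packing from declared bounds can be sketched for the 2D case: compute how many bits each dimension needs and accept the array only if the total fits in 48 bits. `Compressor2D` and `bitsFor` are hypothetical names; this shows the packing idea, not the runtime's implementation. Unlike hashing, the packing is invertible, so collisions cannot occur.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr int kIndexBits = 48;

// Number of bits needed to represent every value in [0, bound).
int bitsFor(std::uint64_t bound) {
  int bits = 0;
  while (bits < 64 && (std::uint64_t{1} << bits) < bound) ++bits;
  return bits;
}

// Packs (x, y) with 0 <= x < nx and 0 <= y < ny into one integer.
class Compressor2D {
 public:
  // Returns nullopt when the declared bounds do not fit in 48 bits.
  static std::optional<Compressor2D> make(std::uint64_t nx, std::uint64_t ny) {
    int bx = bitsFor(nx), by = bitsFor(ny);
    if (bx + by > kIndexBits) return std::nullopt;
    return Compressor2D(nx, ny, by);
  }

  std::uint64_t compress(std::uint32_t x, std::uint32_t y) const {
    assert(x < nx_ && y < ny_);
    return (std::uint64_t{x} << yBits_) | y;
  }

  void decompress(std::uint64_t packed, std::uint32_t& x, std::uint32_t& y) const {
    x = static_cast<std::uint32_t>(packed >> yBits_);
    y = static_cast<std::uint32_t>(packed & ((std::uint64_t{1} << yBits_) - 1));
  }

 private:
  Compressor2D(std::uint64_t nx, std::uint64_t ny, int yBits)
      : nx_(nx), ny_(ny), yBits_(yBits) {}
  std::uint64_t nx_, ny_;
  int yBits_;
};
```

A 16M × 16M array needs 24 + 24 = 48 bits and is accepted; widening either dimension to 2^25 pushes it past 48 bits, so such an array would need the ID-lookup path instead.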
Summary
Current Status
Implemented, passing all tests
Performance
Comparable in coarse-grain apps
Slightly slower in fine-grain
Future Direction: arbitrary index types
AMR, tree codes