Transcript Précis
Large Matrix-Matrix Multiply on PS3 Clusters
15 September 2010

Mark Barnell, AFRL RITB, [email protected]
Dennis Fitzgerald, ITT, [email protected]

DISTRIBUTION STATEMENT A. Approved for public release; distribution unlimited. (Approval given by Public Affairs Office, September 2010.)

UNCLASSIFIED

Description
• Matrix-matrix multiplication of large matrices (> 100k x 100k)
• Parallelized over a number of PS3s
• Maintained near-peak performance on each Cell BE

Challenges
• Near-peak computation rate on the Cell BE for small matrix sizes
• Data and thread coordination between PowerPC and Cell BE with near-zero overhead
• IO balanced with the Cell BE's peak FLOPS to keep each PS3 computationally busy
• Network performance sufficient to deliver enough data to many PS3s

Approach
• Core MM algorithm > 99% efficient (128x128) – Daniel Hackenberg – Dresden
• PowerPC code to coordinate larger rectangular matrices – Miriam Leeser – Northeastern
• Multi-buffering and semaphores to reduce wait time
• Blocked sub-matrix distribution with data sized to balance compute and IO

Results
[Figure: Matrix-Matrix Multiply GFLOPS. Y-axis: GFLOPS (0 to 3500). X-axis: Number of PS3s (1 to 21). Series: 48k x 48k; 48k x 240k; PS3 Max GFLOPS (153).]
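The blocked sub-matrix idea listed under Approach can be illustrated with a minimal Python sketch. It is not the presenters' Cell BE code. The function names (`matmul_blocked`, `flops_per_byte`) are hypothetical, and the 4-byte element size and the "two input tiles plus one output tile" traffic model are assumptions. Only the 128x128 tile size comes from the slides.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of an n x k and a k x m matrix."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_blocked(a, b, tile=128):
    """Blocked product: C is built from tile x tile sub-matrix products.

    tile=128 mirrors the 128x128 core block named in the slides; ragged
    edges are handled so any matrix shape works.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # tile row of C
        for j0 in range(0, m, tile):        # tile column of C
            for p0 in range(0, k, tile):    # tiles along the shared dimension
                # Accumulate one sub-matrix product into the C tile.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


def flops_per_byte(tile, bytes_per_element=4):
    """Arithmetic intensity of one tile product under a simple model.

    Assumption: each tile product moves two input tiles and one output
    tile, and elements are single precision (4 bytes).
    """
    flops = 2 * tile ** 3                          # one multiply + one add per term
    bytes_moved = 3 * tile * tile * bytes_per_element
    return flops / bytes_moved


if __name__ == "__main__":
    a = [[i + j for j in range(7)] for i in range(5)]   # 5 x 7
    b = [[i - j for j in range(4)] for i in range(7)]   # 7 x 4
    assert matmul_blocked(a, b, tile=3) == matmul_naive(a, b)
    for t in (128, 1024, 8192):
        print(t, flops_per_byte(t))
```

Under this model the work per byte moved grows linearly with the tile edge (it equals tile / 6 for 4-byte elements). That is one way to read "data sized to balance compute and IO": the sub-matrices sent to each PS3 need to be large enough for computation time to cover transfer time.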