CS 267: Applications of Parallel Computers Final Project Suggestions James Demmel www.cs.berkeley.edu/~demmel/cs267_Spr06 04/06/2006 CS267 Lecture 22a.
Outline
• Kinds of projects
  • Evaluating and improving the performance of a parallel application
    • “Application” could be a full scientific application or an important kernel
  • Parallelizing a sequential application
    • Other kinds of performance improvements are possible too, e.g. memory hierarchy tuning
  • Devising a new parallel algorithm for some problem
  • Porting a parallel application or systems software to a new architecture
• Examples of previous projects (all on-line)
• Upcoming guest lecturers
  • See their previous lectures, or contact them, for project ideas
• Suggested projects

CS267 Class Projects from 2004
• BLAST Implementation on BEE2 - Chen Chang
• PFLAMELET; An Unsteady Flamelet Solver for Parallel Computers - Fabrizio Bisetti
• Parallel Pattern Matcher - Frank Gennari, Shariq Rizvi, and Guille Díez-Cañas
• Parallel Simulation in Metropolis - Guang Yang
• A Survey of Performance Optimizations for Titanium Immersed Boundary Simulation - Hormozd Gahvari, Omair Kamil, Benjamin Lee, Meling Ngo, and Armando Solar
• Parallelization of oopd1 - Jeff Hammel
• Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code - Amir Kamil, Ben Schwarz, and Jimmy Su

CS267 Class Projects from 2004 (cont)
• Communication Savings With Ghost Cell Expansion For Domain Decompositions Of Finite Difference Grids - C. Zambrana Rojas and Mark Hoemmen
• Parallelization of Phylogenetic Tree Construction - Michael Tung
• UPC Implementation of the Sparse Triangular Solve and NAS FT - Christian Bell and Rajesh Nishtala
• Widescale Load Balanced Shared Memory Model for Parallel Computing - Sonesh Surana, Yatish Patel, and Dan Adkins

Planned Guest Lecturers
• Katherine Yelick (UPC, heart modeling)
• David Anderson (volunteer computing)
• Kimmen Sjolander (phylogenetic analysis of proteins, SATCHMO, Bonnie Kirkpatrick)
• Julian Borrill (astrophysical data analysis)
• Wes Bethel (graphics and data visualization)
• Phil Colella (adaptive mesh refinement)
• David Skinner (tools for scaling up applications)
• Xiaoye Li (sparse linear algebra)
• Osni Marques and Tony Drummond (ACTS Toolkit)
• Andrew Canning (computational neuroscience)
• Michael Wehner (climate modeling)

Suggested projects (1)
• Weekly research group meetings on these and related topics (see J. Demmel and K. Yelick)
• Contribute to upcoming ScaLAPACK release (JD)
  • Proposal, talk at www.cs.berkeley.edu/~demmel; ask me for latest
  • Performance evaluation of existing parallel algorithms
    • Ex: New eigensolvers based on successive band reduction
  • Improved implementations of existing parallel algorithms
    • Ex: Use UPC to overlap communication, computation
  • Many serial algorithms to be parallelized
    • See following slides

Missing Drivers in Sca/LAPACK

Problem              Method                   LAPACK      ScaLAPACK
Linear Equations     LU                       xGESV       PxGESV
                     Cholesky                 xPOSV       PxPOSV
                     LDLT                     xSYSV       missing
Least Squares (LS)   QR                       xGELS       PxGELS
                     QR+pivot                 xGELSY      missing
                     SVD/QR                   xGELSS      missing
                     SVD/D&C                  xGELSD      missing (intent?)
                     SVD/MRRR                 missing     missing
                     QR + iterative refine.   missing     missing
Generalized LS       LS + equality constr.    xGGLSE      missing
                     Generalized LM           xGGGLM      missing
                     Above + Iterative ref.   missing     missing

More missing drivers

Problem                        Method                 LAPACK      ScaLAPACK
Symmetric EVD                  QR / Bisection+Invit   xSYEV / X   PxSYEV / X
                               D&C                    xSYEVD      PxSYEVD
                               MRRR                   xSYEVR      missing
Nonsymmetric EVD               Schur form             xGEES / X   missing driver
                               Vectors too            xGEEV / X   missing driver
SVD                            QR                     xGESVD      PxGESVD
                               D&C                    xGESDD      missing (intent?)
                               MRRR                   missing     missing
                               Jacobi                 missing     missing
Generalized Symmetric EVD      QR / Bisection+Invit   xSYGV / X   PxSYGV / X
                               D&C                    xSYGVD      missing (intent?)
                               MRRR                   missing     missing
Generalized Nonsymmetric EVD   Schur form             xGGES / X   missing
                               Vectors too            xGGEV / X   missing
Generalized SVD                Kogbetliantz           xGGSVD      missing (intent)
                               MRRR                   missing     missing

Suggested projects (2)
• Contribute to sparse linear algebra (JD & KY)
  • Performance tuning to minimize latency and bandwidth costs, both to memory and between processors (sparse => few flops per memory reference or word communicated)
  • Typical methods (e.g. CG = conjugate gradient) do some number of dot products and saxpys for each SpMV, so communication cost is O(# iterations)
  • Our goal: Make latency cost O(1)!
  • Requires reorganizing algorithms drastically, including replacing SpMV by a new kernel [Ax, A^2x, A^3x, … , A^kx], which can be done with O(1) messages
• Projects
  • Study scalability bottlenecks of current CG on real, large matrices
  • Optimize [Ax, A^2x, A^3x, … , A^kx] on sequential machines
  • Optimize [Ax, A^2x, A^3x, … , A^kx] on parallel machines

Suggested projects (3)
• Evaluate new languages on applications (KY)
  • UPC or Titanium
  • UPC for asynchrony, overlapping communication & computation
    • ScaLAPACK in UPC
    • Use UPC-based 3D FFT in your application
    • Optimize existing 1D FFT in UPC, to use 3D techniques
• Porting, evaluating parallel systems software (KY)
  • Port UPC to RAMP
  • Port GASNet to Blue Gene, evaluate performance
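
As an illustration of the kernel named under Suggested projects (2), which applies A to x one through k times with O(1) messages, here is a minimal single-process sketch. It is not from the slides. It assumes A is the tridiagonal 1D Laplacian stencil (2 on the diagonal, -1 beside it), that rows are split into contiguous blocks, and that a "message" is simulated by copying up to k ghost entries from each neighbor. All function names (tridiag_apply, powers_reference, powers_local, matrix_powers_blocked) are hypothetical.

```python
def tridiag_apply(v):
    """Apply the assumed stencil (2 on the diagonal, -1 off it) to a full vector."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right)
    return out


def powers_reference(x, k):
    """k global SpMVs; distributed, each one would need its own neighbor exchange."""
    result, v = [], list(x)
    for _ in range(k):
        v = tridiag_apply(v)
        result.append(v)
    return result


def powers_local(x, lo, hi, k):
    """Rows lo..hi-1 of [Ax, ..., A^k x] using a single simulated ghost exchange."""
    n = len(x)
    g_lo, g_hi = max(0, lo - k), min(n, hi + k)
    # The only "communication": owned entries plus up to k ghost entries per side.
    v = {i: x[i] for i in range(g_lo, g_hi)}
    result = []
    for j in range(1, k + 1):
        # The region with correct values shrinks by one entry per step on each
        # side that borders a neighbor; it does not shrink at the domain boundary.
        new_lo = g_lo if g_lo == 0 else g_lo + j
        new_hi = g_hi if g_hi == n else g_hi - j
        w = {}
        for i in range(new_lo, new_hi):
            left = v[i - 1] if i > 0 else 0.0
            right = v[i + 1] if i < n - 1 else 0.0
            w[i] = 2.0 * v[i] - left - right
        v = w
        result.append([v[i] for i in range(lo, hi)])
    return result


def matrix_powers_blocked(x, p, k):
    """Assemble [Ax, ..., A^k x] from p blocks, each doing one ghost exchange."""
    n = len(x)
    full = [[] for _ in range(k)]
    for b in range(p):
        lo, hi = b * n // p, (b + 1) * n // p
        local = powers_local(x, lo, hi, k)
        for j in range(k):
            full[j].extend(local[j])
    return full


if __name__ == "__main__":
    x = [float((i * i) % 7) for i in range(12)]
    # Both approaches should agree; only the number of exchanges differs.
    print(matrix_powers_blocked(x, 3, 3) == powers_reference(x, 3))
```

The trade-off in this sketch is redundant work: each block recomputes some entries in its ghost region so that it communicates once instead of k times. For a general sparse matrix the ghost region depends on the graph of A, which this tridiagonal example does not address.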