CS 267: Applications of Parallel Computers Final Project Suggestions James Demmel www.cs.berkeley.edu/~demmel/cs267_Spr06 04/06/2006 CS267 Lecture 22a.

CS 267: Applications of Parallel Computers
Final Project Suggestions
James Demmel
www.cs.berkeley.edu/~demmel/cs267_Spr06
04/06/2006
CS267 Lecture 22a
Outline
• Kinds of projects
  • Evaluating and improving the performance of a parallel application
    • “Application” could be a full scientific application or an important kernel
  • Parallelizing a sequential application
    • Other kinds of performance improvements are possible too, e.g. memory hierarchy tuning
  • Devising a new parallel algorithm for some problem
  • Porting a parallel application or systems software to a new architecture
• Examples of previous projects (all on-line)
• Upcoming guest lecturers
  • See their previous lectures, or contact them, for project ideas
• Suggested projects
CS267 Class Projects from 2004
• BLAST Implementation on BEE2 - Chen Chang
• PFLAMELET: An Unsteady Flamelet Solver for Parallel Computers - Fabrizio Bisetti
• Parallel Pattern Matcher - Frank Gennari, Shariq Rizvi, and Guille Díez-Cañas
• Parallel Simulation in Metropolis - Guang Yang
• A Survey of Performance Optimizations for Titanium Immersed Boundary Simulation - Hormozd Gahvari, Omair Kamil, Benjamin Lee, Meling Ngo, and Armando Solar
• Parallelization of oopd1 - Jeff Hammel
• Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code - Amir Kamil, Ben Schwarz, and Jimmy Su
CS267 Class Projects from 2004 (cont)
• Communication Savings With Ghost Cell Expansion For Domain Decompositions Of Finite Difference Grids - C. Zambrana Rojas and Mark Hoemmen
• Parallelization of Phylogenetic Tree Construction - Michael Tung
• UPC Implementation of the Sparse Triangular Solve and NAS FT - Christian Bell and Rajesh Nishtala
• Widescale Load Balanced Shared Memory Model for Parallel Computing - Sonesh Surana, Yatish Patel, and Dan Adkins
Planned Guest Lecturers
• Katherine Yelick (UPC, heart modeling)
• David Anderson (volunteer computing)
• Kimmen Sjolander (phylogenetic analysis of proteins – SATCHMO – Bonnie Kirkpatrick)
• Julian Borrill (astrophysical data analysis)
• Wes Bethel (graphics and data visualization)
• Phil Colella (adaptive mesh refinement)
• David Skinner (tools for scaling up applications)
• Xiaoye Li (sparse linear algebra)
• Osni Marques and Tony Drummond (ACTS Toolkit)
• Andrew Canning (computational neuroscience)
• Michael Wehner (climate modeling)
Suggested projects (1)
• Weekly research group meetings on these and related topics (see J. Demmel and K. Yelick)
• Contribute to upcoming ScaLAPACK release (JD)
  • Proposal, talk at www.cs.berkeley.edu/~demmel; ask me for latest
  • Performance evaluation of existing parallel algorithms
    • Ex: New eigensolvers based on successive band reduction
  • Improved implementations of existing parallel algorithms
    • Ex: Use UPC to overlap communication, computation
  • Many serial algorithms to be parallelized
    • See following slides
Missing Drivers in Sca/LAPACK
                              LAPACK     ScaLAPACK
Linear Equations
  LU                          xGESV      PxGESV
  Cholesky                    xPOSV      PxPOSV
  LDL^T                       xSYSV      missing
Least Squares (LS)
  QR                          xGELS      PxGELS
  QR+pivot                    xGELSY     missing
  SVD/QR                      xGELSS     missing
  SVD/D&C                     xGELSD     missing (intent?)
  SVD/MRRR                    missing    missing
  QR + iterative refine.      missing    missing
Generalized LS
  LS + equality constr.       xGGLSE     missing
  Generalized LM              xGGGLM     missing
  Above + Iterative ref.      missing    missing
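
For readers new to the terminology: a "driver" such as xGESV solves a whole problem by chaining computational routines, here an LU factorization with partial pivoting (xGETRF) followed by triangular solves (xGETRS). The pure-Python sketch below only illustrates that factor-then-solve structure on a small dense system. The function names (`getrf_sketch`, `getrs_sketch`, `gesv_sketch`) are hypothetical, and this is not LAPACK's or ScaLAPACK's API or implementation.

```python
def getrf_sketch(a):
    """LU factorization with partial pivoting: returns (lu, piv) with P*A = L*U.

    L (unit diagonal) is stored below the diagonal of lu, U on and above it.
    piv[i] is the index of the original row that ends up in row i.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    piv = list(range(n))
    for k in range(n):
        # Pick the largest entry in column k (rows k..n-1) as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]              # multiplier, stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def getrs_sketch(lu, piv, b):
    """Solve A x = b given the factorization from getrf_sketch."""
    n = len(lu)
    x = [float(b[p]) for p in piv]            # apply the row permutation to b
    for i in range(n):                        # forward substitution (unit L)
        x[i] -= sum(lu[i][j] * x[j] for j in range(i))
    for i in range(n - 1, -1, -1):            # back substitution with U
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def gesv_sketch(a, b):
    """Driver: factor, then solve."""
    lu, piv = getrf_sketch(a)
    return getrs_sketch(lu, piv, b)


# A system that needs a row swap: 2y = 4, x + y = 3.
solution = gesv_sketch([[0, 2], [1, 1]], [4, 3])
```

A "missing" entry in the table means that no such top-level routine exists in that library, even where some of the underlying computational pieces may be available.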
More missing drivers
                              LAPACK       ScaLAPACK
Symmetric EVD
  QR / Bisection+Invit        xSYEV / X    PxSYEV / X
  D&C                         xSYEVD       PxSYEVD
  MRRR                        xSYEVR       missing
Nonsymmetric EVD
  Schur form                  xGEES / X    missing driver
  Vectors too                 xGEEV / X    missing driver
SVD
  QR                          xGESVD       PxGESVD
  D&C                         xGESDD       missing (intent?)
  MRRR                        missing      missing
  Jacobi                      missing      missing
Generalized Symmetric EVD
  QR / Bisection+Invit        xSYGV / X    PxSYGV / X
  D&C                         xSYGVD       missing (intent?)
  MRRR                        missing      missing
Generalized Nonsymmetric EVD
  Schur form                  xGGES / X    missing
  Vectors too                 xGGEV / X    missing
Generalized SVD
  Kogbetliantz                xGGSVD       missing (intent)
  MRRR                        missing      missing
Suggested projects (2)
• Contribute to sparse linear algebra (JD & KY)
  • Performance tuning to minimize latency and bandwidth costs, both to memory and between processors (sparse => few flops per memory reference or word communicated)
  • Typical methods (e.g. CG = conjugate gradient) do some number of dot products and saxpys for each SpMV, so communication cost is O(# iterations)
  • Our goal: Make latency cost O(1)!
  • Requires reorganizing algorithms drastically, including replacing SpMV by new kernel [Ax, A^2x, A^3x, … , A^kx], which can be done with O(1) messages
  • Projects
    • Study scalability bottlenecks of current CG on real, large matrices
    • Optimize [Ax, A^2x, A^3x, … , A^kx] on sequential machines
    • Optimize [Ax, A^2x, A^3x, … , A^kx] on parallel machines
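
One way to see why the kernel [Ax, A^2x, … , A^kx] needs only O(1) messages is a 1D example. The sketch below assumes A is the tridiagonal stencil matrix tridiag(-1, 2, -1) with zero boundary values, and it simulates the "processors" as contiguous blocks of one Python list. Each block fetches k ghost entries from each neighbor once, then computes all k powers locally with some redundant work. The function names and the message counter are hypothetical illustrations, not code from the course or from any library.

```python
def stencil(v):
    """y = A v for A = tridiag(-1, 2, -1); entries outside v count as zero."""
    n = len(v)
    return [2 * v[i]
            - (v[i - 1] if i > 0 else 0)
            - (v[i + 1] if i < n - 1 else 0)
            for i in range(n)]


def powers_reference(x, k):
    """[Ax, A^2x, ..., A^kx] by k global SpMVs (one neighbor exchange each)."""
    out, v = [], list(x)
    for _ in range(k):
        v = stencil(v)
        out.append(v)
    return out


def powers_blocked(x, k, nblocks):
    """Same result, but each block reads its ghost zone only once.

    Returns (powers, messages). The message count assumes k <= block size,
    so each ghost zone comes from one adjacent neighbor.
    """
    n = len(x)
    size = (n + nblocks - 1) // nblocks
    result = [[0] * n for _ in range(k)]
    messages = 0
    for b in range(nblocks):
        lo, hi = b * size, min((b + 1) * size, n)
        if lo >= hi:
            continue
        # Owned range [lo, hi) widened by k ghost cells, clipped to the domain.
        glo, ghi = max(lo - k, 0), min(hi + k, n)
        messages += (glo < lo) + (ghi > hi)   # one message per neighbor
        local = x[glo:ghi]
        for j in range(k):
            # Values near an artificial edge go stale one cell per step,
            # but after k steps the staleness has not reached [lo, hi).
            local = stencil(local)
            result[j][lo:hi] = local[lo - glo:hi - glo]
    return result, messages


x = list(range(1, 17))
blocked, messages = powers_blocked(x, k=3, nblocks=4)
```

With 16 unknowns on 4 blocks and k = 3, the blocked version exchanges 6 messages in total, while three separate SpMVs would need 6 messages each, or 18. The price is redundant arithmetic in the ghost zones and larger messages, which is the trade-off the projects above would measure on real matrices.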
Suggested projects (3)
• Evaluate new languages on applications (KY)
  • UPC or Titanium
    • UPC for asynchrony, overlapping communication & computation
  • ScaLAPACK in UPC
  • Use UPC-based 3D FFT in your application
  • Optimize existing 1D FFT in UPC, to use 3D techniques
• Porting, evaluating parallel systems software (KY)
  • Port UPC to RAMP
  • Port GASNet to Blue Gene, evaluate performance
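
The idea of overlapping communication and computation mentioned above can be sketched independently of UPC. The Python example below is only an analogy: a background thread stands in for a non-blocking remote read, and `fetch_block` is a hypothetical placeholder for the transfer (here just a list slice). The transfer of block b+1 is started before the work on block b begins, so the two can proceed at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(remote, b, size):
    """Hypothetical stand-in for a remote read of block b."""
    return remote[b * size:(b + 1) * size]


def sum_squares_overlapped(remote, size):
    """Sum of squares over `remote`, prefetching the next block while computing."""
    nblocks = (len(remote) + size - 1) // size
    total = 0
    if nblocks == 0:
        return total
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_block, remote, 0, size)
        for b in range(nblocks):
            block = pending.result()          # wait for block b to arrive
            if b + 1 < nblocks:
                # Start the next transfer before computing on this block.
                pending = pool.submit(fetch_block, remote, b + 1, size)
            total += sum(v * v for v in block)
    return total


total = sum_squares_overlapped(list(range(10)), size=3)
```

In a real UPC code the same pattern would use the language's own asynchronous communication rather than threads. How much time the overlap actually hides depends on the network and the amount of work per block.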