Transcript paper.ppt

Parallel Apps
November 6, 2000
Hyang-Ah Kim
Brenda Liu
SoYoung Park
Outline

- Introduction
- Barnes background
- Barnes optimizations
- Ocean background
- Ocean optimizations
- Conclusion
Introduction
- Minimum problem size
- Scaling application performance
- Programming models
  - SAS (shared address space)
  - CC-NUMA

- Parallel efficiency = (speedup over uniprocessor) / p
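In symbols (a standard definition, written out by me, not on the slide):

    E(p) = \frac{S(p)}{p} = \frac{T_1}{p \, T_p}

For example, a speedup of 64 on p = 128 processors is a parallel efficiency of 0.5.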
Barnes Background

- N-body galaxy simulation
- [Figure: Barnes-Hut force computation on a star. A star too close must be computed directly; a small or large group far enough away is approximated by its center of mass. See the sketch below.]
- Communication pattern?
  - Irregular
  - Hierarchical
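The figure's rule is the Barnes-Hut opening criterion. A minimal sketch of the force traversal, assuming an octree whose cells carry centers of mass; the struct layout, field names, and theta threshold are my stand-ins, not the paper's code:

    #include <math.h>

    #define G 6.674e-11    /* gravitational constant (SI units) */

    typedef struct cell {
        double cm[3];          /* center of mass of the bodies below */
        double mass;           /* their total mass */
        double size;           /* side length of this cell's cube */
        struct cell *child[8]; /* octree children, NULL if absent */
        int is_body;           /* nonzero: leaf holding a single star */
    } cell_t;

    /* Accumulate into acc the acceleration on a body at pos due to the
       subtree rooted at c, opening cells that fail the theta test. */
    void force_on_body(const double pos[3], const cell_t *c,
                       double theta, double acc[3])
    {
        double dx = c->cm[0] - pos[0];
        double dy = c->cm[1] - pos[1];
        double dz = c->cm[2] - pos[2];
        double d  = sqrt(dx*dx + dy*dy + dz*dz);

        if (c->is_body || c->size / d < theta) {
            /* group far enough away (or a single star): approximate
               the whole group by its center of mass */
            double f = G * c->mass / (d * d * d);
            acc[0] += f * dx;
            acc[1] += f * dy;
            acc[2] += f * dz;
        } else {
            /* too close to approximate: open the cell and recurse */
            for (int i = 0; i < 8; i++)
                if (c->child[i])
                    force_on_body(pos, c->child[i], theta, acc);
        }
    }

The irregular, hierarchical communication pattern falls out of this traversal: which cells a processor must read depends on where its bodies sit in space.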
Barnes Problem Size

- Optimizations visited:
  - Data placement
  - Dynamic partitioning
  - Prefetching (sketched below)

- Work needed to scale is algorithmic
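For the prefetching item, the idea is roughly the following, building on the force_on_body sketch above (GCC's __builtin_prefetch is used for illustration; the paper's actual mechanism on 2000-era hardware may differ):

    /* Variant of force_on_body's recursion step: issue prefetches for
       all children first so remote-memory fetches overlap with the
       force computation on the earlier ones. */
    static void recurse_with_prefetch(const double pos[3], const cell_t *c,
                                      double theta, double acc[3])
    {
        for (int i = 0; i < 8; i++)
            if (c->child[i])
                __builtin_prefetch(c->child[i]);
        for (int i = 0; i < 8; i++)
            if (c->child[i])
                force_on_body(pos, c->child[i], theta, acc);
    }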
Scaling Performance

- Performance change from 32 to 128 processors?
  - Degradation: communication-to-computation ratio, communication pattern, load balance, locality, synchronization

- How can they be overcome?
  - Increase problem size (see the back-of-envelope below)
  - Application restructuring
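Why the communication-to-computation ratio degrades, in a standard back-of-envelope (mine, not the slides'): for a nearest-neighbor computation on an n x n grid split into p square blocks, per-processor computation scales as n^2/p while boundary communication scales as n/\sqrt{p}, so

    \frac{\text{communication}}{\text{computation}}
      \propto \frac{n/\sqrt{p}}{n^2/p} = \frac{\sqrt{p}}{n}

Going from p = 32 to p = 128 at fixed n doubles the ratio (\sqrt{128/32} = 2), while growing the problem with \sqrt{p} holds it constant; that is exactly the "increase problem size" remedy.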
General Findings

- Scaling to 128 processors without any change
Scaling Barnes
- Memory bottleneck: building the shared tree takes 31% of the time on 128 processors vs. 2% on a uniprocessor
- Original algorithm: a globally shared tree (sketch below)
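A sketch of why the globally shared tree serializes; assume the cell_t from the Barnes sketch gains a lock field, and treat body_t, lock/unlock, octant, and make_leaf as stand-ins (leaf splitting is omitted):

    typedef struct { double pos[3]; double mass; } body_t;

    /* Every processor inserts its bodies into the one shared tree, so
       the locks on cells near the root are acquired by everyone. */
    void shared_tree_insert(cell_t *c, const body_t *b)
    {
        for (;;) {
            lock(&c->lock);              /* hot near the root */
            int i = octant(c, b->pos);   /* child cube for this body */
            if (c->child[i] == NULL) {
                c->child[i] = make_leaf(b);
                unlock(&c->lock);
                return;
            }
            cell_t *next = c->child[i];
            unlock(&c->lock);
            c = next;                    /* descend and retry below */
        }
    }

With every processor inserting into one tree, the upper cells' locks and cache lines ping-pong between nodes on CC-NUMA, which is plausibly what the 31% reflects.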
Scaling Barnes

- New algorithm: MergeTree (sketched below)
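As I read it, the merge-tree idea is to build contention-free private trees first and only combine them afterwards. A rough sketch; empty_tree, tree_insert_unlocked, publish_local_tree, local_tree_of, merge_trees, and barrier are all hypothetical helpers:

    /* Phase 1: build a private tree from local bodies, lock-free.
       Phase 2: merge the private trees in log2(npes) rounds. */
    cell_t *merge_tree_build(body_t *mine, int n, int pe, int npes)
    {
        cell_t *local = empty_tree();
        for (int i = 0; i < n; i++)
            tree_insert_unlocked(local, &mine[i]);   /* no contention */

        publish_local_tree(pe, local);   /* make it visible to others */

        for (int stride = 1; stride < npes; stride *= 2) {
            barrier();
            if (pe % (2 * stride) == 0 && pe + stride < npes)
                local = merge_trees(local, local_tree_of(pe + stride));
        }
        return local;   /* after the last round, PE 0 holds it all */
    }

Whether the merge is pairwise as shown or directly into a shared tree, the point is the same: serialized updates near the root are replaced by mostly-local work.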
Ocean Background
- Ocean simulation using a multigrid solver
- Communication pattern?
  - Nearest-neighbor, iterative (see the stencil sketch below)
  - Hierarchical
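The nearest-neighbor pattern comes from relaxation sweeps like the following (a generic 5-point sketch, not the SPLASH-2 Ocean source):

    #include <stddef.h>

    /* One Gauss-Seidel-style relaxation sweep over the interior of an
       n x n grid stored row-major in u (Poisson-like right-hand side). */
    void relax(double *u, const double *rhs, size_t n)
    {
        for (size_t i = 1; i < n - 1; i++)
            for (size_t j = 1; j < n - 1; j++)
                u[i*n + j] = 0.25 * (u[(i-1)*n + j] + u[(i+1)*n + j]
                                   + u[i*n + (j-1)] + u[i*n + (j+1)]
                                   - rhs[i*n + j]);
    }

Partition the grid into per-processor blocks and each sweep needs only the border rows and columns of the four neighboring blocks, hence nearest-neighbor; the multigrid solver repeats this on a hierarchy of coarser grids, which is where the hierarchical component comes from.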
Ocean Problem Size

- Optimizations visited:
  - Processor-centric array data structures (sketched below)
  - Data placement
  - Prefetching
- Work needed to scale is difficult
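"Processor-centric" arrays are, as I read it, the classic Ocean restructuring: store each processor's partition contiguously instead of slicing one big 2D array, so each block can be placed in its owner's local memory on CC-NUMA. A sketch; the allocation scheme and names are mine:

    #include <stdlib.h>

    /* Allocate a grid as per-processor contiguous blocks: g[px][py] is
       processor (px,py)'s bx-by-by block, backed by a single malloc,
       so first-touch or explicit placement can map it to that
       processor's memory.  (Error checking and freeing omitted.) */
    double ****alloc_grid4d(int npx, int npy, int bx, int by)
    {
        double ****g = malloc(npx * sizeof *g);
        for (int px = 0; px < npx; px++) {
            g[px] = malloc(npy * sizeof *g[px]);
            for (int py = 0; py < npy; py++) {
                double *block = malloc((size_t)bx * by * sizeof *block);
                g[px][py] = malloc(bx * sizeof *g[px][py]);
                for (int i = 0; i < bx; i++)
                    g[px][py][i] = block + (size_t)i * by;
            }
        }
        return g;   /* element access: g[px][py][i][j] */
    }

Access becomes g[px][py][i][j], and a block's rows no longer interleave with other processors' data at page granularity, so placement can actually do its job.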
Programming Models

- Options
  - Shared Address Space
  - Message Passing
  - SHMEM (example below)
- Motivation
  - What if the application is regular / predictable?
  - What if we can use similar algorithms and partitions across the models?
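To make the SHMEM option concrete, here is what a ghost-row exchange looks like in that style, written against the OpenSHMEM API (the Cray/SGI SHMEM of 2000 is similar in spirit but not identical; N, the static symmetric arrays, and the row-strip partition are my assumptions):

    #include <shmem.h>

    #define N 1024   /* grid dimension; assumption for the sketch */

    /* Symmetric data: the same static addresses are valid on every PE. */
    static double grid[N][N];
    static double ghost_above[N];   /* bottom row of the PE above me */
    static double ghost_below[N];   /* top row of the PE below me */

    /* Call after shmem_init(); one-sided puts, then a global barrier
       (which also completes outstanding puts) before the next sweep. */
    void exchange_ghost_rows(void)
    {
        int me = shmem_my_pe();
        int np = shmem_n_pes();

        if (me > 0)          /* my top row -> upper neighbor's ghost */
            shmem_double_put(ghost_below, grid[0], N, me - 1);
        if (me < np - 1)     /* my bottom row -> lower neighbor's ghost */
            shmem_double_put(ghost_above, grid[N - 1], N, me + 1);

        shmem_barrier_all();
    }

The appeal for a regular, predictable code like Ocean is that both sides know the addresses in advance, so one-sided puts replace matched send/receive pairs while keeping data placement explicit.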
Ocean Discussions
Conclusion

- Some guidelines
  - Load balancing for moderate systems, communication for large systems
  - Data partitioning & placement
- Very application dependent
  - Optimization
  - Programming model