Performance Enhancements in
MSC.Nastran for Large Scale Design
Optimization on Cray SV1 Computers
Dr. D. Obrist, Dr. H. Misra, Cray Inc.
Dr. S. Zhang, Dr. D. Chou, MSC.Software Corp.
Large Scale Design Optimization
• joint project between MSC.Software Corp. and
Cray Inc. (October 2000 - July 2001)
• … with the goal to
enhance the design
optimization capabilities of
MSC.Nastran on the Cray
SV1(ex)
→ 2-4x shorter turnaround time
Characteristics of Large Scale
Optimization Problems
• millions of degrees of freedom
• hundreds of design variables and
responses
• hundreds of modes
→ prohibitive turnaround times for simulations
→ excessively large I/O
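To give a rough feel for the I/O volume, the sketch below estimates the size of a single dense matrix of mode shapes, using the DOF and mode counts quoted for the largest industry example in this deck. The 8-byte word size, the dense storage, and the function name are assumptions made for illustration only.

```python
def dense_matrix_bytes(nrows, ncols, bytes_per_word=8):
    """Size in bytes of a dense nrows x ncols matrix (assumed 8-byte words)."""
    return nrows * ncols * bytes_per_word

ndof = 2_091_102    # degrees of freedom (largest industry example in this deck)
nmodes = 251        # number of modes (same example)

size_bytes = dense_matrix_bytes(ndof, nmodes)
size_gb = size_bytes / 1e9   # decimal gigabytes
print(f"{size_gb:.1f} GB for one dense {ndof} x {nmodes} matrix")
```

Under these assumptions a single such matrix comes to roughly 4.2 GB, before counting any intermediate results.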
List of enhancements
1. exploit the sparsity of the design model
2. improved data management to process DSADJ in
a single pass - improved vectorization
3. highly optimized matrix-matrix multiplications
from the Cray Scientific Library
4. optimized sparse matrix I/O
5. parallelization of DSADJ and DSVG1
6. misc. improvements (GP5, EMG, MPYAD, PARTN,
MERGE, SADD5, etc.)
Sparsity of the design model
In many design optimization tasks only a small number of
elements are modified during the design process (sparse
design set)
Example: data recovery sub-dmap DISPRS
$u_p = \Phi \, u_m$
$\hat{u}_p = \Phi' \, u_m$
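The idea of a sparse design set can be illustrated with a small sketch: instead of recovering physical displacements for every degree of freedom, only the rows that belong to modified elements are computed. The slides do not spell out how the reduced matrix is built, so treating it as a row subset of the modal matrix is an assumption here, and `recover_rows`, `phi`, `u_m`, and `design_rows` are hypothetical names.

```python
def recover_rows(phi, u_m, rows):
    """Recover physical displacements only for the requested DOF rows.

    phi  : modal matrix as a list of rows (ndof x nmodes)
    u_m  : modal displacements as a list of rows (nmodes x nsol)
    rows : iterable of DOF indices to recover
    Returns a dict {dof: [displacement for each solution]}.
    """
    nmodes = len(u_m)
    nsol = len(u_m[0])
    out = {}
    for r in rows:
        out[r] = [
            sum(phi[r][j] * u_m[j][k] for j in range(nmodes))
            for k in range(nsol)
        ]
    return out

# Tiny made-up model: 4 DOF, 2 modes, 1 solution.
phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
u_m = [[3.0], [4.0]]

design_rows = [2]   # assumed: only DOF 2 is attached to a design element
reduced = recover_rows(phi, u_m, design_rows)
full = recover_rows(phi, u_m, range(len(phi)))
```

The reduced recovery touches one row instead of four but gives the same value for that row as the full recovery. In this sketch the work and the output size scale with the number of design-related DOF rather than with the full model size.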
Data recovery sub-dmap DISPRS
Industry example:
• design model is 25% sparse
• 2,091,102 DOF
• 251 modes
• 128 design variables
• 2931 retained responses
amount of I/O is reduced by 4x
CPU time is reduced by 5x
[Figure: bar charts of I/O (GB) and CPU seconds, v70.7.3 vs. v2001/enhanced]
Improved data management in
DSADJ - single pass
[Figure: displacement matrix u (ndof x nsol), multiple passes vs. single pass]
Instead of loading all nodal displacements in multiple passes …
… only the displacements of the element nodes are loaded on demand.
longer vectors
→ improved vectorization
→ reduced scalar overhead
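A minimal sketch of the on-demand idea, not the actual DSADJ implementation: for each element, only the displacement rows of its own DOF are gathered, and the element kernel then works across all solution columns at once, so the inner loops run over vectors of length nsol. All names here (`gather_element_displacements`, `process_single_pass`, `column_sums`) and the toy kernel are hypothetical.

```python
def gather_element_displacements(u, element_dofs):
    """Pick the rows of u (ndof x nsol) that belong to one element."""
    return [u[d] for d in element_dofs]

def column_sums(u_e):
    """Toy element kernel: sum over the element's DOF for every solution."""
    nsol = len(u_e[0])
    return [sum(row[k] for row in u_e) for k in range(nsol)]

def process_single_pass(u, elements, kernel):
    """Visit every element once, loading only its own displacement rows."""
    results = {}
    for eid, dofs in elements.items():
        u_e = gather_element_displacements(u, dofs)  # on-demand load
        results[eid] = kernel(u_e)                   # all solutions at once
    return results

# Made-up data: 4 DOF, 2 solutions, 2 elements with 2 DOF each.
u = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
elements = {10: [0, 2], 20: [1, 3]}

results = process_single_pass(u, elements, column_sums)
```

Each element is touched exactly once, and the amount of data held per element depends on the element's DOF count rather than on how the full displacement matrix is partitioned.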
Industry Example I: DSADJ performance
NDOF    617346
NMODES  277
NDV     102
NRESP   706
NSOL    167
7x improved!
[Figure: bar chart of DSADJ seconds, v70.7.3 vs. v2001/enhanced]
Industry Example II: DSADJ performance
NDOF    777725
NMODES  250
NDV     149
NRESP   1104
NSOL    276
9x improved!
[Figure: bar chart of DSADJ seconds, v70.7.3 vs. v2001/enhanced]
Industry Example III: DSADJ performance
NDOF    2091102
NMODES  251
NDV     128
NRESP   2931
NSOL    850
13x improved!
[Figure: bar chart of DSADJ seconds, v70.7.3 vs. v2001/enhanced]
Industry Example IV: DSVG1 performance
NDOF    685227
NMODES  212
NDV     307
NRESP   159
NSOL    159
10x improved!
[Figure: bar chart of DSVG1 seconds, v70.7.3 vs. v2001/enhanced]
Parallelization of DSVG1 and DSADJ
Parallel runs of the Industry Example III (2 million
DOF) with 1, 2, and 4 processors.
[Figure: bar chart of elapsed seconds (FLOPS and I/O) for 1, 2, and 4 processors]
Total Improvements over one
Design Cycle
Overall improvement: 2-4x!
[Figure: bar chart of seconds over one design cycle for Industry Examples I-IV, v70.7.3 vs. v2001/enhanced]
Conclusions
The turnaround time for a large
design optimization task is
dramatically reduced (2-4x) ...
The performance is independent of
the open core memory size ...
Larger problems can be
solved in less time!