Understanding Hardware Selection for ANSYS 15.0


Understanding Hardware Selection for ANSYS 15.0
ANSYS IT Solutions Webcast Series
March 2014
Wim Slagter
© 2014 ANSYS, Inc. and NVIDIA
March 28, 2014
IT Solutions for ANSYS Webcast Series
2014 webcast series from ANSYS and our partners.
Our goal is to provide ANSYS customers with:
• Recommendations on hardware and system specification
• Best-practice configuration, setup and management
• Roadmap and vision for planning
Past topics included:
• Optimizing Remote Access to Simulation
• Simplified HPC Cluster Deployment for ANSYS Users
• How to Speed Up ANSYS 15.0 with GPUs
http://www.ansys.com/Support/Platform+Support/IT+Solutions+for+ANSYS+Webcast+Series
© 2014 ANSYS, Inc. and Hewlett-Packard Company
March 28, 2014
ANSYS Focus on IT Solutions
IT is the enabler for more effective use of engineering simulation:
• Larger, more complex and accurate simulations
• More simulations in less time
• HPC scale-up and collaboration
Computing Capacity Still Limits Simulation Fidelity for Most Users
Source: HPC Usage survey with over 1,800 ANSYS respondents
Today’s Agenda and Speakers
Understanding Hardware Selection for ANSYS 15.0
ANSYS 15.0 HP Z Workstations – Performance Characterization
• Ralph Rocco, Software Engineering, Hewlett-Packard Company
ANSYS 15.0 Applications Performance on HP High Performance Servers
• Don Mize, Software Engineering, Hewlett-Packard Company
Questions & Answers
• Ralph Rocco, Don Mize and Wim Slagter
ANSYS 15.0 HP Z Workstations – Performance Characterization
Ralph Rocco / March, 2014
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
http://www.hp.com/ZWorkstations
http://www.hp.com/go/wsansys
[Chart] Benchmark results on an HP Z820 (E5-2687W v2 3.4/4.0GHz, 128GB 1866MHz memory, 3x SSD RAID 0, Windows 7 64-bit SP1); series: 2 CPU cores, 2 CPU cores + 1 GPU, 8 CPU cores, 16 CPU cores, 16 CPU cores + 1 GPU.
[Chart] Benchmark results on an HP Z820 (E5-2687W v2 3.4/4.0GHz, 128GB 1866MHz memory, 3x SSD RAID 0, Windows 7 64-bit SP1); series: 2, 8 and 16 CPU cores.
[Chart] Benchmark results on an HP Z820 (E5-2687W v2 3.4/4.0GHz, 128GB 1866MHz memory, 3x SSD RAID 0, Windows 7 64-bit SP1); series: 2, 8 and 16 CPU cores.
HP Workstation recommendations — ANSYS Fluent 15.0 and CFX 15.0
• HP Z420: Intel® Xeon® E5-1680 v2 8-core 3.0/4.0GHz, 64GB memory, SSD, NVIDIA Quadro K600
• HP Z620: Intel® Xeon® E5-2667 v2 16-core 3.3/4.0GHz, 64GB memory, SSD, NVIDIA Quadro K2000
• HP Z820: Intel® Xeon® E5-2697 v2 24-core 2.7/3.5GHz, 64GB memory, SSD, NVIDIA Quadro K2000

NVIDIA GPU compute options:
                Z420 (600W)   Z620 (800W)   Z820 (1125W)
Tesla K40       —             Max 1         Max 2
Tesla K20c      Max 1         Max 1         Max 2
Quadro K6000    Max 1         Max 1         Max 2
[Chart] Benchmark results on an HP Z820 (E5-2687W v2 3.4/4.0GHz, 128GB 1866MHz memory, 3x SSD RAID 0, Windows 7 64-bit SP1); series: 1 CPU core + 1 GPU, 2 CPU cores, 7 CPU cores + 1 GPU, 8 CPU cores, 16 CPU cores.
HP Workstation recommendations — ANSYS Mechanical 15.0
• HP Z420: Intel® Xeon® E5-1660 v2 6-core 3.7/4.0GHz, 64GB memory, SSD RAID 0, Quadro K600
• HP Z620: Intel® Xeon® E5-2667 v2 16-core 3.3/4.0GHz, 96GB memory, SSD RAID 0, Quadro K2000
• HP Z820: Intel® Xeon® E5-2687W v2 16-core 3.4/4.0GHz, 128GB memory, SSD RAID 0, Quadro K2000
HP Workstation family — ANSYS 15.0 Pre-Processing and Post-Processing

HP Workstation   Intel® Processor(s)              # CPU Cores       Max Memory (GB)  Graphics
HP Z1 G2         Core™ i3, i5; Xeon® E3-1200v3    2, 4; 4           32               NVIDIA Quadro
HP Z230          Core™ i5, i7; Xeon® E3-1200v3    4; 4              32               AMD FirePro; NVIDIA Quadro
HP Z420          Xeon® E5-1600v2; E5-2600v2       4, 6, 8; 8        64               AMD FirePro; NVIDIA Quadro
HP Z620          Xeon® E5-1600v2; E5-2600v2       4, 6, 8; 4-12     96               AMD FirePro; NVIDIA Quadro
HP Z820          Xeon® E5-2600v2                  4, 6, 8, 10, 12   512              AMD FirePro; NVIDIA Quadro
HP ZBook 14      Core™ ULT i5, i7                 2, 2              16               AMD FirePro
HP ZBook 15, 17  Core™ M i5, i7                   2, 4              32               NVIDIA Quadro
HP Performance Advisor
http://www.hp.com/go/hpperformanceadvisor
Best performance on day one:
• One-click BIOS setting optimization for ANSYS applications
• One-click HP-certified drivers for your ANSYS applications
View entire system configuration:
• Consolidated report of all hardware, software and drivers
• One-click system configuration report to expedite support and reduce downtime
Identify system bottlenecks:
• Quick and easy performance analysis
• Extensive library of white papers at your fingertips
ANSYS 15.0 Applications Performance on HP High Performance Servers
Don Mize / March, 2014
ANSYS Fluent
[Charts] Fluent 15.0: geometric mean of solver ratings for the standard benchmarks. Left: 1-20 processes on a single node; right: 1-16 nodes. Benchmarks run on BL460c Gen8 servers, each with two 10-core 3GHz E5-2690 v2 processors, 128GB of 1866MHz memory, and FDR InfiniBand interconnects.
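The slides aggregate the standard benchmarks with a geometric mean of solver ratings. A minimal sketch of that aggregation, assuming the usual "jobs per day" rating definition (86,400 seconds per day divided by solver elapsed time — the slides themselves do not spell this out), with hypothetical timings:

```python
from math import prod

def solver_rating(elapsed_seconds: float) -> float:
    """Solver rating as benchmark jobs per day (assumed 86,400 s/day definition)."""
    return 86400.0 / elapsed_seconds

def geometric_mean(ratings: list[float]) -> float:
    """Geometric mean, as used to aggregate ratings across benchmark cases."""
    return prod(ratings) ** (1.0 / len(ratings))

# Hypothetical elapsed times (seconds) for three benchmark cases on one machine:
ratings = [solver_rating(t) for t in (120.0, 300.0, 45.0)]  # -> 720, 288, 1920
print(round(geometric_mean(ratings), 1))
```

The geometric mean is preferred over the arithmetic mean here because it keeps one unusually fast (or slow) benchmark from dominating the aggregate.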
[Charts] Fluent 15.0: speedup on the standard benchmarks. Left: 1-20 processes on a single node; right: 1-16 nodes. Same configuration: BL460c Gen8 servers, each with two 10-core 3GHz E5-2690 v2 processors, 128GB of 1866MHz memory, and FDR InfiniBand interconnects.
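Since a solver rating scales with 1/time, the speedup curves above are just the ratio of a parallel rating to the serial rating. A small sketch with hypothetical numbers (the function names and values are illustrative, not from the slides):

```python
def speedup(serial_rating: float, parallel_rating: float) -> float:
    """Rating scales as 1/elapsed-time, so the ratio of ratings is the speedup."""
    return parallel_rating / serial_rating

def parallel_efficiency(serial_rating: float, parallel_rating: float, ncores: int) -> float:
    """Fraction of ideal linear scaling achieved on ncores."""
    return speedup(serial_rating, parallel_rating) / ncores

# Hypothetical: 1-process rating of 80, 16-process rating of 1040
print(speedup(80.0, 1040.0))                   # 13.0
print(parallel_efficiency(80.0, 1040.0, 16))   # 0.8125
```

Efficiency below roughly 0.7-0.8 at a given core count is a common signal that adding more cores (rather than more memory bandwidth or faster interconnect) will give diminishing returns.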
[Chart] Fluent 15.0 1.2M-cell pipe benchmark on an SL250 Gen8 with two 12-core 2.4GHz processors, 128GB of 1866MHz memory and three NVIDIA K20X GPUs (turbo off). Solver ratings for no GPU vs. 1, 2 and 3 GPUs per node across 1p1n-12p1n, 8p2n and 12p2n runs.
[Chart] Fluent 15.0 9.6M-cell pipe benchmark on four SL250 Gen8 servers, each with two 12-core 2.4GHz processors, 128GB of 1866MHz memory and three NVIDIA K20 GPUs per node (turbo off). Solver ratings for no GPUs vs. two and three GPUs across 8p4n-48p4n runs.
ANSYS CFX
[Charts] CFX 15.0: geometric mean of solver ratings for the standard benchmarks. Left: 1-20 processes on a single node; right: 1-8 nodes. Benchmarks run on BL460c Gen8 servers, each with two 10-core 3GHz processors, 128GB of 1866MHz memory, and FDR InfiniBand interconnects.
[Charts] CFX 15.0: speedup on the standard benchmarks. Left: 1-20 processes on a single node; right: 1-8 nodes. Same configuration: BL460c Gen8 servers, each with two 10-core 3GHz processors, 128GB of 1866MHz memory, and FDR InfiniBand interconnects.
[Chart] Relative performance of CFX 15.0 on the standard benchmarks, node by node (1-8 nodes), comparing BL460c Gen8 nodes with 10-core 3GHz processors vs. 12-core 2.7GHz processors.
ANSYS CFD (Fluent/CFX): Starter Cluster Kit
Server options:
• 1 SL210t head node
• 2-4 ProLiant Xeon nodes, each with 2 processors, in an SL2500 chassis
• 16-24 cores per compute node
• 2 x SAS drives
Options:
• Powerful graphics for use with remote visualization for pre/post-processing
Total memory for the cluster:
• Compute nodes: 4 to 8GB/core
• Head node: 32GB or more, depending on role
Cluster interconnect:
• Integrated Gigabit Ethernet or FDR InfiniBand 2:1 (recommended for jobs using 2 nodes and above)
Operating environment: 64-bit Linux, Microsoft (HPC Pack) Server 2012
Workloads: Ideally suited for ANSYS CFD models up to 200M cells (Fluent) and, depending on mesh, 40 to 200M nodes (CFX).
HP ProLiant SL210t Gen8 nodes in a 2U SL2500 chassis
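The 4-8GB/core guideline above translates directly into a total-RAM budget. A minimal sizing sketch (function name and example values are illustrative, not from the slides):

```python
def cluster_memory_gb(nodes: int, cores_per_node: int,
                      gb_per_core: float, head_node_gb: float = 32.0) -> float:
    """Total cluster RAM: (nodes x cores x GB/core) for compute,
    plus the head node (32 GB or more, per the slide's guideline)."""
    return nodes * cores_per_node * gb_per_core + head_node_gb

# Largest starter-kit case: 4 compute nodes x 24 cores at 8 GB/core + 32 GB head node
print(cluster_memory_gb(4, 24, 8.0))  # 800.0
```

Sizing at the 8GB/core end of the range leaves headroom for Fluent/CFX partitioning overhead and for running pre/post-processing on the same nodes.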
ANSYS CFD (Fluent/CFX): Midsize Cluster
Server options:
• 1 DL380 head node
• 4-32 ProLiant SL250s Xeon nodes, each with 2 processors, in an SL6500 chassis (4 nodes per chassis)
• 20-24 cores per compute node
• 2 to 4 1TB SAS drives per compute node
• Up to 3 NVIDIA Kepler K20 or K40 GPUs per node
Options:
• Configure the DL380p head node with up to 16 internal SAS drives and extra memory/storage for very large jobs
Total memory for the cluster:
• Compute nodes: 4 to 8GB/core
• Head node: 32GB or more, depending on role
Cluster interconnect:
• FDR InfiniBand 2:1
Operating environment: 64-bit Linux, Microsoft (HPC Pack) Server 2012
Workloads: Ideally suited for 3 simultaneous ANSYS CFD models up to 500M cells (Fluent) and, depending on mesh, 100 to 500M nodes (CFX); or 16 to 22 simultaneous ANSYS CFD models on the scale of 50M cells (Fluent), 10 to 50M nodes (CFX), again depending on mesh.
(42U rack; SL250s servers)
ANSYS CFD (Fluent/CFX): Large Scale-Out Cluster
Server options:
• 1 DL380p head node
• 32-64 ProLiant BL460c nodes, each with 2 processors
• 20-24 cores per compute node
• Two 1.2TB SAS drives per compute node
Options:
• WS460c Gen8 workstation blade with NVIDIA Quadro K6000 graphics card for pre/post-processing using remote visualization
• Configure the head node with extra memory/storage for very large jobs
Total memory for the cluster:
• Compute nodes: 4 to 8GB/core
• Head node: 32GB or more, depending on role
Cluster interconnect:
• FDR InfiniBand 2:1
Operating environment: 64-bit Linux, Microsoft (HPC Pack) Server 2012
Workloads: Ideally suited for 3 simultaneous ANSYS CFD models greater than 500M cells (Fluent) and greater than 100-500M nodes (CFX), depending on mesh; or more than 22 simultaneous ANSYS CFD models on the scale of 50M cells (Fluent), 10 to 50M nodes (CFX), depending on mesh.
(42U rack)
ANSYS Mechanical
[Charts] ANSYS Mechanical 15.0: solver ratings (geometric mean) and speedup on the standard benchmarks for 1-20 processes on a single node (1p1n-20p1n). Benchmarks run on BL460c Gen8 servers, each with two 10-core 3GHz processors and 128GB of 1866MHz memory.
[Chart] ANSYS Mechanical 15.0: geometric mean of solver ratings on a cluster of BL460c Gen8 nodes (two 3GHz processors and 128GB memory each, FDR InfiniBand) as processes and nodes scale.
[Chart] Geometric mean of the SP-4 and SP-5 benchmarks on an SL250 Gen8 with two 2.4GHz processors, 128GB of 1866MHz memory and three NVIDIA K20 GPUs. Solver ratings for no GPU vs. one, two and three GPUs per node across 1p1n-12p1n, 8p2n and 12p2n runs.
ANSYS Mechanical: Starter Cluster Kit
Server options:
• 1 SL210t head node
• 2-4 ProLiant Xeon nodes, each with 2 processors, in an SL2500 chassis
• 16-24 cores per compute node
• 2 x 400GB SSD drives (RAID 0)
Options:
• Powerful graphics for use with remote visualization for pre/post-processing
Total memory for the cluster:
• Compute nodes: 4 to 8GB/core
• Head node: 32GB or more, depending on role
Cluster interconnect:
• Integrated Gigabit Ethernet or FDR InfiniBand (recommended for jobs using 2 nodes and above)
Operating environment: 64-bit Linux, Microsoft HPC Server 2008
Workloads: Ideally suited for ANSYS Mechanical models up to 48M or 480M DOFs, depending on the solver used.
HP ProLiant SL210t Gen8 nodes in a 2U SL2500 chassis
ANSYS Mechanical (Structural Analysis): Fat Node Cluster
Server options:
• 1 DL380 head node
• 4-8 ProLiant DL380p Xeon server nodes, each with 2 processors (20 cores) and 2 to 16 internal 600GB SAS 15K drives or 800GB SAS SSDs striped RAID 0 per compute node, plus a 6 x 2TB SAS RAID 0 disk array on the head node
• Or 4-8 SL250s Xeon server nodes, each with 2 processors (20 cores), 3 NVIDIA Kepler K20 GPUs and 2 internal SAS 15K drives or 800GB SAS SSDs per compute node (suitable for nonlinear jobs >= 2M DOF)
Total memory for the cluster:
• 8GB/core on the head node
• 4 to 8GB/core on each compute node
Cluster interconnect: FDR InfiniBand
Operating environment: 64-bit Linux or Microsoft HPC Server 2008
Workloads: 320-1280GB RAM configurations will handle up to 7 simultaneously running ANSYS "megamodels" of 45-180M DOFs.
(42U rack; SL250s and DL380p servers)
Thank you
Final Remarks
15.0 HPC Licensing Enabling GPU Acceleration (NEW at R15.0)
One HPC task is required to unlock one GPU. (Applies to all license schemes: ANSYS HPC, ANSYS HPC Pack, ANSYS HPC Workgroup.)
Licensing examples:
• 1 x ANSYS HPC Pack: total 8 HPC tasks (4 GPUs max). Example valid configurations: 6 CPU cores + 2 GPUs, or 4 CPU cores + 4 GPUs.
• 2 x ANSYS HPC Packs: total 32 HPC tasks (16 GPUs max). Example valid configuration: 24 CPU cores + 8 GPUs (total use of 2 compute nodes).
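The licensing arithmetic above can be sketched as a small validity check. This is an illustration built only from what the slide states (8 tasks per pack, quadrupling with each additional pack; one task per GPU; GPU maxima of half the task count, read off the slide's "4 GPUs max" / "16 GPUs max" figures), not an official ANSYS license calculator:

```python
def hpc_pack_tasks(packs: int) -> int:
    """ANSYS HPC Pack task counts: 8, 32, 128, ... (x4 per additional pack)."""
    return 8 * 4 ** (packs - 1)

def config_is_valid(cores: int, gpus: int, tasks: int) -> bool:
    """R15.0 rule from the slide: each GPU consumes one HPC task,
    and at most half the tasks may be GPUs (per the slide's stated maxima)."""
    return cores + gpus <= tasks and gpus <= tasks // 2

print(config_is_valid(6, 2, hpc_pack_tasks(1)))   # True: 6 cores + 2 GPUs fit in 8 tasks
print(config_is_valid(24, 8, hpc_pack_tasks(2)))  # True: 24 cores + 8 GPUs fit in 32 tasks
print(config_is_valid(6, 4, hpc_pack_tasks(1)))   # False: 10 tasks needed, only 8 available
```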
Wrap Up / Next Steps
Contact us:
[email protected]
HP contact information:
http://www.ansys.com/About+ANSYS/Partner+Programs/HPC+Partners/ci.Hewlett-Packard.cz