Matt McKenzie, LONI presentation


Preliminary CPMD Benchmarks

On Ranger, Pople, and Abe
TG AUS Materials Science Project
Matt McKenzie, LONI

What is CPMD?

• Car-Parrinello Molecular Dynamics
▫ www.cpmd.org

• Parallelized plane-wave / pseudopotential implementation of Density Functional Theory
• Common chemical systems: liquids, solids, interfaces, gas clusters, reactions
▫ Large systems: ~500 atoms
 Scales with the number of electrons, NOT the number of atoms

Key Points in Optimizing CPMD

• Developers have done a lot of optimization work here
• The Intel compiler is used in this study
• BLAS/LAPACK (see the sketch below)
▫ BLAS levels 1 (vector ops) and 3 (matrix-matrix ops)
 Some level 2 (matrix-vector ops)
• Integrated optimized FFT library
▫ Compiler flag: -DFFT_DEFAULT
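To make the BLAS levels concrete, here is a minimal sketch using SciPy's low-level BLAS bindings. This is my illustration only: CPMD is Fortran and calls the vendor's BLAS routines (daxpy, dgemv, dgemm) directly.

```python
# Minimal sketch of the three BLAS levels mentioned above, via SciPy's
# BLAS bindings. Illustrative only; CPMD calls Fortran BLAS directly.

import numpy as np
from scipy.linalg import blas

n = 1000
x = np.random.rand(n)
y = np.random.rand(n)
A = np.asfortranarray(np.random.rand(n, n))
B = np.asfortranarray(np.random.rand(n, n))

# Level 1 (vector ops), O(n) work: y <- a*x + y
y = blas.daxpy(x, y, a=2.0)

# Level 2 (matrix-vector ops), O(n^2) work: v <- alpha*A@x
v = blas.dgemv(1.0, A, x)

# Level 3 (matrix-matrix ops), O(n^3) work: C <- alpha*A@B
# This is where an optimized BLAS library pays off the most.
C = blas.dgemm(1.0, A, B)
```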

Benchmarking CPMD is difficult because…

• Nature of the modeled chemical system
▫ Solids, liquids, interfaces
 Require different parameters, stressing memory along the way
▫ Volume and number of electrons
• Choice of the pseudopotential (psp)
▫ Norm-conserving, 'soft', non-linear core correction (++memory)
• Type of simulation conducted
▫ CPMD, BOMD, Path Integral, Simulated Annealing, etc.
▫ CPMD is a robust code
• Very chemical-system specific
▫ Any one CPMD simulation cannot easily be compared to another
▫ However, THERE ARE TRENDS
• FOCUS: simple wave function optimization timing
▫ This is a common ab initio calculation

Probing Memory Limitations

• For any ab initio calculation:
▫ Accuracy is proportional to the size of the basis set
▫ These are stored in matrices, requiring increased RAM
• The energy cutoff determines the size of the plane-wave basis set:

N_PW = (1/(2π²)) Ω E_cut^(3/2)

Model Accuracy & Memory Overview

[Image from the CPMD user's manual: pseudopotential convergence behavior w.r.t. basis set size (cutoff)]

NOTE: The choice of psp is important, i.e. a 'softer' psp = lower cutoff = loss of transferability.
VASP specializes in soft psp's; CPMD works with any psp.

Memory Comparison

Ψ optimization, 63 Si atoms, SGS psp

E_cut = 50 Ryd: N_PW ≈ 134,000, memory = 1.0 GB
E_cut = 70 Ryd: N_PW ≈ 222,000, memory = 1.8 GB

Well-known CPMD benchmarking model: www.cpmd.org
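Since the cell volume Ω is fixed between the two runs, the N_PW formula above predicts the plane-wave count grows by (70/50)^(3/2). A minimal sketch checking this against the slide's numbers (all inputs are taken from the slide; Ω cancels in the ratio):

```python
# Sanity check of the E_cut^(3/2) scaling from the N_PW formula, using
# the slide's own numbers. The cell volume cancels in the ratio, so no
# assumption about the 63-Si-atom cell is needed.

e_cut_low, e_cut_high = 50.0, 70.0        # Ryd (from the slide)
n_pw_low, n_pw_high = 134_000, 222_000    # plane waves (from the slide)
mem_low, mem_high = 1.0, 1.8              # GB (from the slide)

predicted = (e_cut_high / e_cut_low) ** 1.5   # ~1.657
measured = n_pw_high / n_pw_low               # ~1.657

print(f"predicted N_PW growth: {predicted:.3f}")
print(f"measured  N_PW growth: {measured:.3f}")
# Memory grows by 1.8x, i.e. roughly linearly with the plane-wave count.
print(f"memory growth:         {mem_high / mem_low:.2f}")
```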

Results can be shown in either of two ways:
• Wall time = (n steps × iteration time/step) + network overhead (see the sketch after this list)
▫ Typical results/interpretations, nothing new here
• Iteration time = the fundamental unit, used throughout any given CPMD calculation
▫ It neglects the network, yet results are comparable
Note: CPMD runs well on a few nodes connected with gigabit ethernet.
Two important factors affect CPMD performance:

MEMORY BANDWIDTH and FLOATING-POINT performance
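A minimal sketch of the wall-time decomposition above; the numbers in the usage line are hypothetical placeholders, not measured values:

```python
# Minimal sketch of the slide's wall-time decomposition:
#   wall time = (n steps x iteration time/step) + network overhead
# All inputs below are hypothetical placeholders, not measured data.

def wall_time(n_steps: int, iter_time: float, overhead: float = 0.0) -> float:
    """Estimate wall time from the per-step iteration time."""
    return n_steps * iter_time + overhead

# Hypothetical: 100 optimization steps at 1.5 s/step plus 20 s of overhead.
print(wall_time(100, 1.5, overhead=20.0))  # 170.0 seconds
```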

Pople, Abe, Ranger CPMD Benchmarks

[Figure: CPMD benchmark timings vs. number of cores (0 to 256) on Pople, Abe, and Ranger at E_cut = 50 and 70 Ryd]

Results I

• All calculations ran for no longer than 2 hours
• Ranger is not the preferred machine for CPMD
• CPMD scales well between 8 and 96 cores
▫ This is a common CPMD trend
• CPMD is known to scale super-linearly above ~1000 processors
▫ Will look into this
▫ The chemical system would have to change, as this smaller simulation is unlikely to scale in this manner

Results II

• Pople and Abe gave the best performance
• IF a system requires more than 96 procs, Abe would be a slightly better choice
• Knowing the difficulties in benchmarking CPMD (psp, volume, system phase, simulation protocol), this benchmark is not a good representation of all the possible uses of CPMD.

▫ Only explored one part of the code
• How each system performs when taxed with additional memory requirements is a better indicator of CPMD's performance
▫ To increase system accuracy, increase E_cut

Percent Difference

between 70 and 50 Ryd

%Diff = [(t_70 - t_50) / t_50] * 100

[Figure: %Diff vs. number of cores (0 to 256) for Pople, Abe, and Ranger]
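The metric is simple enough to state as code; a minimal sketch (the example values are hypothetical, not the plotted data):

```python
# Minimal sketch of the slide's metric:
#   %Diff = [(t70 - t50) / t50] * 100
# i.e. the relative slowdown from raising E_cut from 50 to 70 Ryd.

def percent_diff(t50: float, t70: float) -> float:
    """Percent increase in timing when going from 50 Ryd to 70 Ryd."""
    return (t70 - t50) / t50 * 100.0

# Hypothetical example (not the plotted data): 2.0 s at 50 Ryd slowing
# to 3.0 s at 70 Ryd is a 50% difference.
print(percent_diff(2.0, 3.0))  # 50.0
```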

Conclusions

RANGER
• Re-ran the Ranger calculations
• Lower performance may be linked to using the Intel compiler on AMD chips
▫ The PGI compiler could show an improvement
▫ Nothing over 5% is expected: Ranger would still be the slowest
▫ Wanted to use the same compiler/math libraries on all machines

ABE
• Possible super-linear scaling: t_Abe, 256 procs < t_others, 256 procs
• Memory size effects hinder performance below 96 procs

POPLE
• The best system for wave function optimization
• Shows a (relatively) stable, modest speed decrease as the memory requirement is increased; it is the recommended system

Future Work

• Half-node benchmarking
• Profiling tools
• Test the MD part of CPMD
▫ Force calculations involving the non-local parts of the psp will increase memory
▫ Extensive level 3 BLAS and some level 2
▫ Many FFT all-to-all calls; now the network plays a role (see the sketch below)
▫ Memory > 2 GB
 A new variable!
• Monitor the fictitious electron mass
• Changing the model
▫ Metallic system (lots of electrons; change of psp and E_cut)
▫ Check super-linear scaling
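For readers unfamiliar with why the FFT all-to-all calls bring the network into play: a parallel 3D FFT redistributes (transposes) grid data among all ranks between its 1D transform stages, one all-to-all per transpose. A minimal mpi4py sketch of that communication pattern (my illustration; CPMD's actual Fortran implementation differs):

```python
# Minimal sketch of the MPI all-to-all pattern behind parallel 3D FFTs
# (illustrative, using mpi4py; CPMD's actual Fortran code differs).
# Run with e.g.: mpirun -np 4 python alltoall_sketch.py

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size, rank = comm.Get_size(), comm.Get_rank()

# Each rank owns a slab of the FFT grid; between 1D transform stages the
# slabs must be transposed, i.e. every rank sends one block to every
# other rank -- a single Alltoall.
send = np.full(size, rank, dtype='i')  # block i goes to rank i
recv = np.empty(size, dtype='i')
comm.Alltoall(send, recv)

# Every rank now holds one block from each rank: [0, 1, ..., size-1].
print(f"rank {rank} received {recv}")
```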