
DL_POLY: Software and Applications

I.T. Todorov & W. Smith, ARC Group & CC Group, CSED, STFC Daresbury Laboratory, Daresbury, Warrington WA4 1EP, Cheshire, England, UK

Where is Daresbury?

Molecular Dynamics: Definitions

• Theoretical tool for modelling the detailed microscopic behaviour of many different types of systems, including gases, liquids, solids, surfaces and clusters.

• In an MD simulation, the classical equations of motion governing the microscopic time evolution of a many body system are solved numerically, subject to the boundary conditions appropriate for the geometry or symmetry of the system.

• Can be used to monitor the microscopic mechanisms of energy and mass transfer in chemical processes, and dynamical properties such as absorption spectra, rate constants and transport properties can be calculated.

• Can be employed as a means of sampling from a statistical mechanical ensemble and determining equilibrium properties. These properties include average thermodynamic quantities (pressure, volume, temperature, etc.), structure, and free energies along reaction paths.
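To make the second bullet concrete, the sketch below (plain Python, not DL_POLY code) integrates Newton's equations for a tiny Lennard-Jones cluster with the velocity Verlet scheme and prints the conserved total energy; all parameters are illustrative.

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Lennard-Jones forces and potential energy for a small cluster (no cutoff, no PBC)."""
    n = len(pos)
    forces = np.zeros_like(pos)
    energy = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]
            r2 = np.dot(rij, rij)
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
            fij = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2 * rij   # F_i = -dU/dr_i
            forces[i] += fij
            forces[j] -= fij
    return forces, energy

# Illustrative system: four unit-mass atoms on a slightly perturbed square
pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0],
                [0.0, 1.1, 0.0], [1.1, 1.1, 0.1]])
vel = np.zeros_like(pos)
mass, dt = 1.0, 0.005

forces, _ = lj_forces(pos)
for step in range(1001):
    vel += 0.5 * dt * forces / mass          # velocity Verlet: half-kick
    pos += dt * vel                          # drift
    forces, epot = lj_forces(pos)            # recompute forces
    vel += 0.5 * dt * forces / mass          # second half-kick
    if step % 200 == 0:
        ekin = 0.5 * mass * np.sum(vel**2)
        print(f"step {step:4d}   total energy = {epot + ekin:.6f}")
```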

DL_POLY Project Background

• General purpose parallel (classical) MD simulation software
• It was conceived to meet the needs of CCP5 – The Computer Simulation of Condensed Phases (academic collaboration community)
• Written in modularised Fortran90 (NagWare & FORCHECK compliant) with MPI2 (MPI1 + MPI-I/O), fully self-contained
• 1994 – 2011: DL_POLY_2 (RD) by W. Smith & T.R. Forester (funded for 6 years by EPSRC at DL) -> DL_POLY_CLASSIC
• 2003 – 2011: DL_POLY_3 (DD) by I.T. Todorov & W. Smith (funded for 4 years by NERC at Cambridge) -> DL_POLY_4
• Over 11,000 licences taken out since 1994
• Over 1,000 registered FORUM members since 2005
• Available free of charge (under licence) to University researchers (provided as code) and at cost to industry

DL_POLY_DD Development Statistics

DL_POLY_DD Licence Statistics

DL_POLY Licence Statistics

DL_POLY Project Current State

• January 2011: DL_POLY_2 -> DL_POLY_CLASSIC on a BSD type licence (BS retired but supporting GUI and fixes)
• October 2010: DL_POLY_3 -> DL_POLY_4, still under STFC licence; over 1300 licences taken out since November 2010
• Rigid Body dynamics
• Parallel I/O & netCDF I/O – NAG dCSE (IJB & ITT)
• CUDA+OpenMP port (source, ICHEC) & MS Windows port (installers)
• SPME processor grid freed from 2^N decomposition – NAG dCSE (IJB)
• Load Balancer development (LJE, finished 30/03/2011)
• Continuous development of DL_FIELD (pdb to DLP I/O, CY)

Current Versions

• DL_POLY_4 (version 1.2)
  – Domain Decomposition parallelisation, limits up to ≈2.1 × 10^9 atoms, with inherent (but not dynamic) load balancing
  – Full force field and molecular description, with rigid body description
  – Free format (flexible) reading with some fail-safe features and basic reporting (but not fully fool-proofed)

• DL_POLY Classic (version 1.6)
  – Replicated Data parallelisation, limits up to ≈30,000 atoms, with good parallelisation up to 64 processors (system dependent), though it runs on any processor count
  – Full force field and molecular description
  – Hyper-dynamics: Temperature Accelerated Dynamics & Biased Potential Dynamics; Solvation Dynamics – Spectral Shifts; Metadynamics; Path Integral MD
  – Free format reading, but somewhat strict

Supported Molecular Entities

• Point ions and atoms
• Polarisable ions (core + shell)
• Flexible molecules
• Rigid bonds
• Rigid molecules
• Flexibly linked rigid molecules
• Rigid bond linked rigid molecules

Force Field Definitions – I

• particle: a rigid ion or atom (charged or not), a core or a shell of a polarisable ion (with or without associated degrees of freedom), or a massless charged site. A particle is a countable object and has a global ID index.

• site: a particle prototype that serves to define the chemical & physical nature (topology/connectivity/stoichiometry) of a particle (mass, charge, frozen-ness). Sites are not atoms, they are prototypes!

• Intra-molecular interactions: chemical bonds, bond angles, dihedral angles, improper dihedral angles, inversions. Usually, the members in a unit do not interact via an inter-molecular term; however, this can be overridden for some interactions. These are defined by site.

• Inter-molecular interactions: van der Waals, metal (EAM, Gupta, Finnis-Sinclair, Sutton-Chen), Tersoff, three-body, four-body. Defined by species.

Force Field Definitions – II

• Electrostatics: Standard Ewald*, Hautman-Klein (2D) Ewald*, SPM Ewald (3D FFTs), Force-Shifted Coulomb, Reaction Field, Fennell damped FSC+RF, Distance dependent dielectric constant, Fuchs correction for non charge neutral MD cells.

• Ion polarisation via Dynamic (Adiabatic) or Relaxed shell model.

• External fields: Electric, Magnetic, Gravitational, Oscillating & Continuous Shear, Containing Sphere, Repulsive Wall.

• Intra-molecular like interactions: tethers, core-shell units, constraint and PMF units, rigid body units. These are also defined by site.

• Potentials: parameterised analytical forms defining the interactions. These are always spherically symmetric!

• THE CHEMICAL NATURE OF PARTICLES DOES NOT CHANGE IN SPACE AND TIME!

Force Field by Sums

$$
\begin{aligned}
V(\vec{r}_1,\vec{r}_2,\ldots,\vec{r}_N) ={}&
   \sum_{i<j}^{N'} U_{\mathrm{pair}}\bigl(|\vec{r}_i-\vec{r}_j|\bigr)
 + \frac{1}{4\pi\varepsilon_0}\sum_{i<j}^{N'}\frac{q_i\,q_j}{|\vec{r}_i-\vec{r}_j|}
 + \underbrace{\sum_{i<j}^{N'} V_{\mathrm{pair}}\bigl(|\vec{r}_i-\vec{r}_j|\bigr)
 + \sum_{i}^{N} F\Bigl(\sum_{j\neq i}^{N'}\rho_{ij}\bigl(|\vec{r}_i-\vec{r}_j|\bigr)\Bigr)}_{\text{metal}} \\
&+ \sum_{i<j<k}^{N'} U_{\mathrm{Tersoff}}(\vec{r}_i,\vec{r}_j,\vec{r}_k)
 + \sum_{i<j<k}^{N'} U_{3\text{-body}}(\vec{r}_i,\vec{r}_j,\vec{r}_k)
 + \sum_{i<j<k<n}^{N'} U_{4\text{-body}}(\vec{r}_i,\vec{r}_j,\vec{r}_k,\vec{r}_n) \\
&+ \sum_{i_{\mathrm{bond}}=1}^{N_{\mathrm{bond}}} U_{\mathrm{bond}}\bigl(i_{\mathrm{bond}},\vec{r}_a,\vec{r}_b\bigr)
 + \sum_{i_{\mathrm{angle}}=1}^{N_{\mathrm{angle}}} U_{\mathrm{angle}}\bigl(i_{\mathrm{angle}},\vec{r}_a,\vec{r}_b,\vec{r}_c\bigr)
 + \sum_{i_{\mathrm{dihed}}=1}^{N_{\mathrm{dihed}}} U_{\mathrm{dihed}}\bigl(i_{\mathrm{dihed}},\vec{r}_a,\vec{r}_b,\vec{r}_c,\vec{r}_d\bigr) \\
&+ \sum_{i_{\mathrm{invers}}=1}^{N_{\mathrm{invers}}} U_{\mathrm{invers}}\bigl(i_{\mathrm{invers}},\vec{r}_a,\vec{r}_b,\vec{r}_c,\vec{r}_d\bigr)
 + \sum_{i_{\mathrm{tether}}=1}^{N_{\mathrm{tether}}} U_{\mathrm{tether}}\bigl(i_{\mathrm{tether}},\vec{r}_t,\vec{r}_{t=0}\bigr)
 + \sum_{i_{\mathrm{shell}}=1}^{N_{\mathrm{core\text{-}shell}}} U_{\mathrm{core\text{-}shell}}\bigl(i_{\mathrm{shell}},|\vec{r}_{\mathrm{core}}-\vec{r}_{\mathrm{shell}}|\bigr)
 + \sum_{i=1}^{N}\Phi_{\mathrm{external}}(\vec{r}_i)
\end{aligned}
$$
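A minimal Python illustration of this decomposition (not DL_POLY source, and with hypothetical parameter values): the total energy of two small molecules is assembled from intra-molecular bond and angle terms plus an inter-molecular pair sum, mirroring the structure of the equation above.

```python
import numpy as np

# Illustrative force-field parameters (hypothetical, not from any DL_POLY FIELD file)
K_BOND, R0 = 450.0, 1.0                      # U_bond  = 1/2 K (r - r0)^2
K_ANG, THETA0 = 55.0, np.deg2rad(104.5)      # U_angle = 1/2 K (theta - theta0)^2
EPS, SIG = 0.65, 2.8                         # U_pair  = 4 eps ((sig/r)^12 - (sig/r)^6)

def u_bond(ra, rb):
    r = np.linalg.norm(ra - rb)
    return 0.5 * K_BOND * (r - R0) ** 2

def u_angle(ra, rb, rc):                      # rb is the apex atom
    u = (ra - rb) / np.linalg.norm(ra - rb)
    v = (rc - rb) / np.linalg.norm(rc - rb)
    theta = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    return 0.5 * K_ANG * (theta - THETA0) ** 2

def u_pair(ra, rb):
    sr6 = (SIG / np.linalg.norm(ra - rb)) ** 6
    return 4.0 * EPS * (sr6 ** 2 - sr6)

# Two triatomic molecules; each row is one atom position
mol1 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-0.3, 0.95, 0.0]])
mol2 = mol1 + np.array([4.0, 0.0, 0.0])

# Intra-molecular sums (bonds and angles), defined per molecule by its sites
V_intra = sum(u_bond(m[0], m[1]) + u_bond(m[0], m[2]) + u_angle(m[1], m[0], m[2])
              for m in (mol1, mol2))

# Inter-molecular pair sum: the primed sum excludes pairs within the same unit
V_inter = sum(u_pair(a, b) for a in mol1 for b in mol2)

print("V =", V_intra + V_inter)
```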

Ensembles and Algorithms

Integration: available as velocity Verlet (VV) or leapfrog Verlet (LFV), generating flavours of the following ensembles:
• NVE
• NVT (E_kin) Evans
• NVT Andersen^, Langevin^, Berendsen, Nosé-Hoover
• NPT Langevin^, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein^
• NσT / NPnAT / NPnγT Langevin^, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein^

Constraints & Rigid Body Solvers:
• VV dependent – RATTLE, No_Squish, QSHAKE*
• LFV dependent – SHAKE, Euler-Quaternion, QSHAKE*
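As an illustration of how a thermostat enters the integration, here is a minimal sketch of Berendsen-style weak coupling in Python; the function name, reduced units and parameters are assumptions for the example, and DL_POLY's own NVT/NPT integrators are considerably more sophisticated.

```python
import numpy as np

def berendsen_scale(vel, mass, t_target, dt, tau):
    """Rescale velocities towards t_target with coupling time tau.

    Minimal sketch of Berendsen weak coupling:
        lambda = sqrt(1 + (dt/tau) * (T0/T - 1)),   v <- lambda * v
    Reduced units (k_B = 1); illustrative only, not DL_POLY's implementation.
    """
    ndof = vel.size - 3                      # remove centre-of-mass degrees of freedom
    t_now = mass * np.sum(vel ** 2) / ndof   # instantaneous kinetic temperature
    lam = np.sqrt(1.0 + (dt / tau) * (t_target / t_now - 1.0))
    return lam * vel

# Illustrative use: 64 unit-mass atoms that start too "hot" relax towards T = 1
rng = np.random.default_rng(0)
vel = rng.normal(scale=1.5, size=(64, 3))
for _ in range(100):
    vel = berendsen_scale(vel, mass=1.0, t_target=1.0, dt=0.005, tau=0.1)
print("T after coupling ~", np.sum(vel ** 2) / (vel.size - 3))
```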

Assumed Parallel Architecture

DL_POLY is designed for homogeneous, distributed-memory parallel machines

[Diagram: eight processors P0–P7, each with its own local memory M0–M7, linked by a message-passing interconnect.]

Replicated Data

[Diagram: under Replicated Data each processor (A, B, C, D) holds a full copy of the system and runs the complete Initialize → Forces → Motion → Statistics → Summary cycle.]

Molecular force field definition

Bonded Forces within RD

[Diagram: the molecular force field definition is replicated, and each processor (P0, P1, P2, ...) evaluates only its own subset of the local force terms.]
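A minimal mpi4py sketch of the Replicated Data idea (assuming mpi4py and numpy are available; this is not DL_POLY source): every rank holds the full, identical coordinate set, evaluates only its share of the pair interactions, and a global sum leaves the complete force array on every rank.

```python
# Run with e.g.:  mpirun -n 4 python rd_forces.py   (hypothetical script name)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

# Replicated data: every rank builds the same full coordinate set (a small LJ lattice)
grid = np.arange(8) * 1.5
pos = np.array(np.meshgrid(grid, grid, grid)).reshape(3, -1).T   # 512 atoms
n = len(pos)

local_forces = np.zeros_like(pos)
# Round-robin share of the outer pair loop: rank r takes i = r, r + P, r + 2P, ...
for i in range(rank, n - 1, nprocs):
    rij = pos[i] - pos[i + 1:]                      # all j > i at once
    r2 = np.einsum("ij,ij->i", rij, rij)
    sr6 = 1.0 / r2**3                               # Lennard-Jones, eps = sigma = 1
    fij = (24.0 * (2.0 * sr6**2 - sr6) / r2)[:, None] * rij
    local_forces[i] += fij.sum(axis=0)
    local_forces[i + 1:] -= fij                     # Newton's third law

# Global sum: every rank ends up with the complete, identical force array
forces = np.empty_like(local_forces)
comm.Allreduce(local_forces, forces, op=MPI.SUM)
if rank == 0:
    print("net force (should be ~0):", forces.sum(axis=0))
```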

RD Scheme for long-ranged part of SPME

U. Essmann, L. Perera, M.L. Berkowitz, T. Darden, H. Lee, L.G. Pedersen, J. Chem. Phys., 103, 8577 (1995)

1. Calculate self interaction correction
2. Initialise FFT routine (FFT – 3D FFT)
3. Calculate B-spline coefficients
4. Convert atomic coordinates to scaled fractional units
5. Construct B-splines
6. Construct charge array Q
7. Calculate FFT of Q array
8. Construct array G
9. Calculate FFT of G array
10. Calculate net Coulombic energy
11. Calculate atomic forces
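The reciprocal-space steps can be sketched in a few lines of numpy. The example below compresses steps 6–10 and, for brevity, uses nearest-grid-point charge assignment instead of the B-splines SPME actually prescribes, so it illustrates the data flow (charge grid → FFT → influence array → energy) rather than being a faithful SPME implementation; the self-interaction correction and forces are omitted.

```python
# Assumptions: cubic box, reduced units (1/(4*pi*eps0) = 1), nearest-grid-point spreading.
import numpy as np

L, K, alpha = 10.0, 32, 0.35                 # box length, grid points per side, Ewald alpha
rng = np.random.default_rng(1)
pos = rng.uniform(0.0, L, size=(100, 3))     # 100 particles
q = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)   # charge-neutral +/-1 mixture

# "Construct charge array Q" (nearest grid point stands in for the B-spline spreading)
Q = np.zeros((K, K, K))
idx = np.floor(pos / L * K).astype(int) % K
np.add.at(Q, (idx[:, 0], idx[:, 1], idx[:, 2]), q)

# "Calculate FFT of Q array": approximates the structure factor S(k)
S = np.fft.fftn(Q)

# "Construct array G": G(k) = exp(-k^2 / 4 alpha^2) / k^2 with G(0) = 0
k1d = 2.0 * np.pi * np.fft.fftfreq(K, d=1.0 / K) / L
kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
k2 = kx**2 + ky**2 + kz**2
G = np.zeros_like(k2)
G[k2 > 0] = np.exp(-k2[k2 > 0] / (4.0 * alpha**2)) / k2[k2 > 0]

# "Calculate net Coulombic energy" (reciprocal-space part only)
E_recip = (2.0 * np.pi / L**3) * np.sum(G * np.abs(S) ** 2)
print("reciprocal-space energy ~", E_recip)
```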


Domain Decomposition

Molecular force field definition

Bonded Forces within DD

Tricky!

[Diagram: the molecular force field definition is distributed, and each domain (P0, P1, P2, ...) keeps only the local atomic indices for the bonded terms it is responsible for.]
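The bookkeeping behind this picture can be sketched as follows (illustrative Python, not DL_POLY source): atoms are assigned to domains from their coordinates, each domain builds its own global-to-local index table, and a bonded term can only be evaluated where both of its atoms are visible (own atoms plus halo data), which is what makes bonded forces under DD tricky.

```python
import numpy as np

L, ndom = 10.0, 2                            # box length, domains per dimension (2x2x2 grid)
rng = np.random.default_rng(7)
pos = rng.uniform(0.0, L, size=(20, 3))      # 20 atoms with global IDs 0..19
bonds = [(0, 1), (2, 3), (4, 5)]             # bonded units defined via global IDs

# Which domain owns each atom (3D domain index flattened to a single "rank" number)
cell = np.floor(pos / L * ndom).astype(int)
owner = cell[:, 0] * ndom * ndom + cell[:, 1] * ndom + cell[:, 2]

# Per-domain global -> local index tables
local_index = {dom: {int(g): l for l, g in enumerate(np.flatnonzero(owner == dom))}
               for dom in range(ndom**3)}

# Report where the two ends of each bonded unit live and their local indices there
for a, b in bonds:
    da, db = owner[a], owner[b]
    print(f"bond {a}-{b}: atom {a} is local {local_index[da][a]} on domain {da}, "
          f"atom {b} is local {local_index[db][b]} on domain {db}")
```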

DD Scheme for long-ranged part of SPME

U. Essmann, L. Perera, M.L. Berkowitz, T. Darden, H. Lee, L.G. Pedersen, J. Chem. Phys., 103, 8577 (1995)
I.J. Bush, I.T. Todorov, W. Smith, Comp. Phys. Commun., 175, 323 (2006)

1. Calculate self interaction correction
2. Initialise FFT routine (FFT – IJB's DaFT: 3M² 1D FFTs)
3. Calculate B-spline coefficients
4. Convert atomic coordinates to scaled fractional units
5. Construct B-splines
6. Construct partial charge array Q
7. Calculate FFT of Q array
8. Construct partial array G
9. Calculate FFT of G array
10. Calculate net Coulombic energy
11. Calculate atomic forces
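A small mpi4py sketch of the "partial charge array" idea in step 6 (assuming mpi4py and numpy; not DL_POLY source): each task grids only the atoms it owns, and a reduction assembles the full grid. DL_POLY_4's DaFT goes further by keeping the grid itself distributed and performing the 3D FFT as sets of 1D FFTs, so the full array is never replicated; the reduction here is purely for illustration.

```python
# Run with e.g.:  mpirun -n 2 python dd_spme_sketch.py   (hypothetical script name)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

L, K = 10.0, 16
rng = np.random.default_rng(2)               # same seed everywhere, then split by ownership
pos = rng.uniform(0.0, L, size=(200, 3))
q = np.where(np.arange(200) % 2 == 0, 1.0, -1.0)
owner = np.floor(pos[:, 0] / L * nprocs).astype(int).clip(0, nprocs - 1)  # 1D slab ownership

# Partial charge array: only this task's atoms are gridded (nearest grid point for brevity)
Q_partial = np.zeros((K, K, K))
mine = np.flatnonzero(owner == rank)
idx = np.floor(pos[mine] / L * K).astype(int) % K
np.add.at(Q_partial, (idx[:, 0], idx[:, 1], idx[:, 2]), q[mine])

# Assemble the global grid (replicated here only to show the partial arrays add up)
Q = np.empty_like(Q_partial)
comm.Allreduce(Q_partial, Q, op=MPI.SUM)
if rank == 0:
    print("total gridded charge:", Q.sum())   # 0 for the charge-neutral mixture
```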

Performance Weak Scaling on IBM p575 2005-2011

[Plot: weak scaling for Solid Ar (32,000 atoms per CPU), NaCl (27,000 ions per CPU) and SPC Water (20,736 ions per CPU); speed-up stays close to perfect parallelisation out to ~1,000 processors. Maximum loads: 33, 28 and 21 million particles respectively, i.e. ~700,000 atoms, ~220,000 ions and ~210,000 ions per 1 GB/CPU.]

Rigid Bodies versus Constraints

[Plot: scaling versus processor count Np (up to 600) for a 450,000 particle system with DL_POLY_4, comparing the constrained (ICE7) and rigid-body (ICE7_CB) descriptions; speed-up on the vertical axis runs up to 10.]

I/O Weak Scaling on IBM p575 2005-2007

[Plot: start-up times (solid lines) and shut-down times (dashed lines) versus processor count, up to 1,000 processors, for Solid Ar, NaCl and SPC Water.]

Benchmarking BG/L Jülich 2007

[Plot: speed-up versus processor count, up to 16,000, for a 14.6 million particle Gd2Zr2O7 system; curves for the total MD step, link cells, van der Waals, Ewald real space and Ewald k-space, against perfect scaling.]

Benchmarking XT4/5 UK 2010

[Plot: speed-up versus processor count, up to 8,000, for a 14.6 million particle Gd2Zr2O7 system; curves for the total MD step, link cells, van der Waals, Ewald real space and Ewald k-space, against perfect scaling.]

Benchmarking on Various Platforms

[Plot: performance versus processor count, up to 2,000, for a 3.8 million particle Gd2Zr2O7 system on CRAY XT4 SC/DC, CRAY XT3 SC/DC, IBM P6+, BG/L, BG/P, IBM p575 and 3 GHz Woodcrest DC.]

Importance of I/O - I

Types of MD studies most dependent on I/O:
• Large length-scales (10^9 particles), short time-scale, such as screw deformations
• Medium-big length-scales (10^6–10^8 particles), medium time-scale (ps–ns), such as radiation damage cascades
• Medium length-scale (10^5–10^6 particles), long time-scale (ns–μs), such as membrane and protein processes

Types of I/O (+ advantage, – drawback):

                 portable   human readable   loss of precision   size
  ASCII             +             +                  –             –
  Binary            –             –                  +             +
  XDR Binary        +             –                  +             +

Importance of I/O - II

Example: a 15 million particle system simulated with 2048 MPI tasks
• MD time per timestep ~0.7 (2.7) seconds on Cray XT4 (BG/L)
• Configuration read ~100 sec. (once during the simulation)
• Configuration write ~600 sec. for 1.1 GB with the fastest I/O method – MPI-I/O for Cray XT4 (parallel direct access for BG/L)
• On BG/L with 16,000 MPI tasks the MD time per timestep is 0.5 sec., with a configuration write of ~18,000 sec. per frame

I/O in native binary is only 3-5 times faster and 3-7 times smaller
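A quick back-of-the-envelope check of the example above (the three input numbers are taken from the slide; the rest is arithmetic):

```python
# Numbers quoted on the slide (Cray XT4, 2048 MPI tasks); everything else is arithmetic.
size_gb = 1.1                    # size of one written configuration
write_s = 600.0                  # time to write it with the fastest method
step_s = 0.7                     # MD time per timestep

bandwidth_mb_s = size_gb * 1024 / write_s     # effective write rate in MB/s
steps_per_dump = write_s / step_s             # MD steps' worth of time spent on one dump
print(f"effective write rate ~ {bandwidth_mb_s:.1f} MB/s")
print(f"one configuration dump costs ~ {steps_per_dump:.0f} MD steps of compute time")
```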

Some unpopular solutions

• Saving only the important fragments of the configuration
• Saving only fragments that have moved more than a given distance between two consecutive dumps
• Distributed dump – the configuration separated into a separate file for each MPI task (as in CFD)

I/O Solutions in DL_POLY_4

1. Serial read and write (sorted/unsorted) – only a single MPI task, the master, handles the I/O; all the other tasks communicate to it in turn, or are broadcast to, while the master completes writing a configuration of the time evolution.

2. Parallel write via direct access or MPI-I/O (sorted/unsorted) – ALL / SOME MPI tasks print to the same file in an orderly manner so that no overlapping occurs, using Fortran direct access printing. However, the behaviour of this method is not defined by the Fortran standard, and in particular we have experienced problems when the disk cache is not coherent with the memory.

3. Parallel read via MPI-I/O or Fortran.

4. Serial NetCDF read and write – using NetCDF libraries for machine-independent data formats of array-based, scientific data (widely used by various scientific communities).
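A minimal mpi4py sketch of option 2 (assuming mpi4py is available; the file name, record layout and sizes are illustrative, not DL_POLY's CONFIG format): each task writes fixed-length records for its own atoms at a computed byte offset in one shared file, so the records never overlap and the file comes out sorted by global atom index.

```python
# Run with e.g.:  mpirun -n 4 python parallel_write.py   (hypothetical script name)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

n_total, rec_len = 1000, 73                  # atoms in total, bytes per fixed-width record
lo = rank * n_total // nprocs                # this task owns global atoms [lo, hi)
hi = (rank + 1) * n_total // nprocs

# Fixed-width ASCII records (one per atom), padded and newline-terminated to rec_len bytes
records = b"".join(
    f"{i:10d} {0.1*i:18.10f} {0.2*i:18.10f} {0.3*i:18.10f}".ljust(rec_len - 1).encode() + b"\n"
    for i in range(lo, hi)
)

fh = MPI.File.Open(comm, "CONFIG.mpiio", MPI.MODE_WRONLY | MPI.MODE_CREATE)
fh.Write_at_all(lo * rec_len, records)       # collective write at this task's byte offset
fh.Close()
if rank == 0:
    print("wrote", n_total * rec_len, "bytes in", nprocs, "non-overlapping chunks")
```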

Performance for 216,000 Ions of NaCl on XT5

MPI-I/O Write Performance for 216,000 Ions of NaCl on XT5

  Cores   I/O Procs   Time/s (3.09)   Time/s (3.10)   MByte/s (3.09)   MByte/s (3.10)
     32          32          143.30            1.27             0.44            49.78
     64          64           48.99            0.49             1.29           128.46
    128         128           39.59            0.53             1.59           118.11
    256         128           68.08            0.43             0.93           147.71
    512         256          113.97            1.33             0.55            47.60
   1024         256          112.79            1.20             0.56            52.47
   2048         512          135.97            0.95             0.46            66.39

MPI-I/O Read Performance for 216,000 Ions of NaCl on XT5

  Cores   I/O Procs   Time/s (3.10)   Time/s (New)   MByte/s (3.10)   MByte/s (New)
     32          16            3.71           0.29            17.01          219.76
     64          16            3.65           0.30            17.28          211.65
    128          32            3.56           0.22            17.74          290.65
    256          32            3.71           0.30            16.98          213.08
    512          64            3.60           0.48            17.53          130.31
   1024          64            3.64           0.71            17.32           88.96
   2048         128            3.75           1.28            16.84           49.31

DL_POLY Project Background

• Rigid body dynamics and SPME freed from 2^N decomposition
• "no topology" option and calcite potentials
• Fully parallel I/O: reading and writing in ASCII, optionally including netCDF binary in AMBER format
• CUDA (ICHEC) and Windows ports
• New GUI (Bill Smith)
• Over 1,300 licences taken out since November 2010
• DL_FIELD force field builder (Chin Yong) – 300 licences

• AMBER & CHARMM to DL_POLY
• OPLSAA & Dreiding to DL_POLY

[Diagram: DL_FIELD workflow – a protonated xyz or PDB structure is fed into the DL_FIELD "black box", which produces the FIELD and CONFIG files for DL_POLY.]

DL_POLY Roadmap

• August 2011 – March 2012: PRACE-1IP-WP7 funds effort by ICHEC towards a CUDA+OpenMP port, SC@WUT towards an OpenCL+OpenMP port, and FZ Jülich for FMP library testing
• October 2011 – October 2012: EPSRC's dCSE funds effort by NAG Ltd.
  – OpenMP within vanilla MPI
  – Beyond 2.1 billion particles
• October 2011 – September 2012: two-temperature thermostat models, fragmented I/O, on-the-fly properties
• November 2011 – September 2013: MMM@HPC, Gentle thermostat, hyperdynamics

Acknowledgements

Thanks to:
• Bill Smith (retired)
• Ian Bush (NAG Ltd.)
• Christos Kartsaklis (ORNL)
• Ruairi Nestor (ICHEC)

http://www.ccp5.ac.uk/DL_POLY/