Transcript Parallel Computing in Chemistry
Molecular Simulation at MCSR
Brian W. Hopkins Mississippi Center for Supercomputing Research
9 October 2008
What We’re Doing Here
• Talk about molecular simulation procedures.
• Talk about available simulation programs.
• Discuss user simulation needs and program options.
Why Simulation
• NOT so we can watch pretty pictures of atoms moving around!
• Many systems have very large numbers of internal DoF.
• It’s simply not possible to determine what is “the minimum” or whatever on these surfaces.
• To get an accurate thermodynamic & kinetic picture of these systems, it’s necessary to sample a medium->large region of phase space.
Simulation: Anatomy of A Program
• Read in sim paramters, atomic coordinates and topology. • Begin Timestepping – Apply periodic boundary conditions.
– Build neighbor list.
– Compute bonded forces.
– Compute short-rang nonbonded forces.
– Compute electrostatic forces.
– Integrate forward in time.
• Write atomic positions, velocities and forces.
• Take data for statistical accumulators. – Repeat.
• End timestepping.
• Print restart data.
Program Anatomy: Read Control
nafion #restart temperature 298.00
steps 200000 timestep 0.00050 ps ensemble nve evbconv 1e-6 movlp 1 cutoff 10.0 angstrom rvdw 9.0 angstrom delr 0.00 angstrom
Program Anatomy: Read Config
hcl in water 2 1 2000000 0.001
21.5518556898 0 0 0 21.5518556898 0 0 0 21.5518556898
…… OW 7 15.994900 -0.820000
4.5231E-01 4.3198E+00 -6.6975E+00 4.7263E+00 3.4690E-01 -6.3309E+00 -1.9170E+04 5.2043E+03 -2.1576E+03 HW 8 1.007800 0.410000
5.1477E-01 5.3695E+00 -6.6887E+00 1.0886E+01 -6.6502E+00 2.6571E+01 -6.1635E+02 -8.2402E+03 2.6958E+03 HW 9 1.007800 0.410000
-5.5940E-01 4.0344E+00 -6.5708E+00 7.6145E+00 -7.4611E-01 -1.0319E+01 1.8438E+04 1.8167E+03 8.6264E+02 …
Program Anatomy: Read Topology
UNITS kcal MOLECULAR TYPES 4 FLEXIBLE-TIP3P-Water NUMMOLS 332 atoms 3 OW 15.9949 -0.8200
HW 1.0078 0.4100
HW 1.0078 0.4100
BONDS 2 harm 1 2 1059.162 1.012
harm 1 3 1059.162 1.012
ANGLES 1 harm 2 1 3 75.900 113.240
FINISH …… vdw 9 OW OW lj 0.155425 3.16549
OT OW lj 0.1238000 3.1420000
HT OW lj 0.0025115 1.5827460
Program Anatomy: Set up Cell
Program Anatomy: Build Neighbor List
• Apply PBC to cell • Begin at an atom • Step out, forming thin shells.
• When an atom falls within the thin shell, add it to the neighbor list.
• Continue until cutoff is reached.
• Move on to the next atom.
Pairlisting
Bonded Forces
• Identify atoms that are connected by bonds, angles, dihedrals and impropers.
• Calculate energies and forces from functions, parameters in input: – Bonds: harmonic, morse – Angles: harmonic – Dihedrals: cosine – Impropers: cosine
Short Range Forces
• Loop over atoms. For each: • Loop over pairs • Calculate van der Waals potential, force • Calculate short-range (real) ES potential, force • Scale for 1-4 interactions?
Long-Range Forces
• Because vdW forces obey a 6-12 potential, they fall away very quickly wrt distance.
• Consequently, as you move away from an atom through a simulation cell you quickly come to a point where including additional vdW terms is irrelevant. Hence, the cutoff.
• Coulomb forces and energies, however, fall off with r 2 and r respectively.
• As a result, no reasonable cutoff will be satisfactory; results will always be strange.
Solution: Ewald Summation
• The coulomb equations converged much more quickly in Fourier space.
• So, we can transform all atomic coordinates to Fourier space, compute ES energies until some convergence criterion is reached, then transform the forces and energies back to real space.
• This is called an Ewald sum; almost all modern simulations use some form of it.
Ewald Sums Are Expensive
• Ewald sums still require computing interactions between atoms and many layers of periodic images.
• Fourier transforms are demanding.
• As much as 80% of a modern simulation is spent in the Ewald routines.
– 15% in pairlisting/vdw and just 5% everywhere else
Speeding up the Ewald sum
• Computational approaches: – Smarter decompositions – Faster comms – Single precision FFT • Physical approximations: – “Multiple timestepping” – Particle mesh approaches
The Particle Mesh Ewald Sum
• Atomic charges are first mapped onto a regular grid.
• The gridpoints are transformed, worked on, and transformed back.
• Forces are then decomposed from grid back onto atoms.
PME Costs and Benefits
• Scales with
N
log
N
rather than
N
2 • Does entail extra cost to map/unmap atomic charges to grid.
– Crossover point?
• Some energetic slop is associated with these methods.
• Some flavors (PPPM, &c.) may interfere with thermo and barostats.
Integration
• Once a full set of forces has been calculated, the atoms are allowed to move in response to those forces • This is done by calculating integral forms of Newton’s Laws.
– Verlet, leapfrog, velocity verlet, &c.
• How far can you integrate forward?
Accumulation
• Recall that the whole point here is to take large scale statistical averages of thermdynamic properties.
• Some of these properties are a lot easier to calculate inside the program than they will be to calculate from trajectories later.
• So, every so often, the program needs to take a second and compute – Properties in the current step – Rolling average from the previous steps.
Output: Accumulators
--------------------------------------------------------------------------------------------------------------------------------- step eng_tot temp_tot eng_cfg eng_vdw eng_cou eng_bnd eng_ang eng_dih eng_tet eng_kin time(ps) eng_pv temp_rot vir_cfg vir_vdw vir_cou vir_bnd vir_ang vir_con vir_tet conserv cpu (s) volume temp_shl eng_shl vir_shl alpha beta gamma vir_pmf press --------------------------------------------------------------------------------------------------------------------------------- 1 8.7047E+01 8.9004E+02 -3.4994E+01 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.2204E+02 0.001 8.7047E+01 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.98 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 9.0000E+01 9.0000E+01 9.0000E+01 0.0000E+00 0.0000E+00 rolling 8.7047E+01 8.9004E+02 -3.4994E+01 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.2204E+02 averages 8.7047E+01 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 9.0000E+01 9.0000E+01 9.0000E+01 0.0000E+00 0.0000E+00 ----------------------------------------------------------------------------------------------------------------------------------
Output: Trajectory
timestep 1 48 2 0 0.001000
OW 1 15.994900 -0.820000
0.0000E+00 0.0000E+00 0.0000E+00 4.4176E-01 -2.3695E+00 -4.4226E+00 -1.1626E+04 -3.2094E+04 -1.7497E+04 HW 2 1.007800 0.410000
0.0000E+00 0.0000E+00 9.4700E-01 1.7204E-01 1.3940E+01 5.0345E+01 -4.1073E+03 1.5125E+03 3.6299E+04 HW 3 1.007800 0.410000
8.9567E-01 0.0000E+00 -3.1667E-01 7.7140E+01 4.7094E+00 -3.8531E+01 3.2399E+04 -1.4520E+03 -1.4832E+04 OW 4 15.994900 -0.820000
2.4602E+00 -1.0000E-06 -9.0073E-01 -1.1793E+01 -1.9654E-01 -5.1510E-01 -3.5980E+04 7.3873E+02 8.1414E+03 HW 5 1.007800 0.410000
3.0701E+00 -1.0000E-06 -1.7237E-01 9.2834E+00 -2.3632E+00 6.3046E+01 2.2288E+04 -5.2163E+01 2.1425E+04
Anatomy of a Simulation Project
• Identify a problem – Molecular system – Region of phase space • Build a system • Minimize • Equilibrate • Run MD • Analyze
Choosing a Program
• How intense is your job?
• Do you need particular features?
– Force functions – Thermostats – Barostat – Sampling methods – &c.
• Ask for help!!
Building Your System
• It’s nontrivial to place 1.5 million atoms in a cubic simulation cell.
• Biomolecules: PDB, homology modelling, &c.
• Others: replication of small boxes.
• Lots of tools for more complicated cases.
• Sometimes this is the hardest part.
Minimization
• You probably didn’t do a very good job building a box.
• If anything about your box makes the forcefield angry, it can explode under dynamics.
• Solution is to run minimization for some time to eliminate bad contacts.
Equilibration
• Minimization brings your box down to ~0K.
• You need to heat it back up to the intersting temperature.
• Assign velocities to atoms from a Boltzmann distribution.
• This will be wacky; temperature and energy will be unstable.
• To fix, run some “equilibration time” – Don’t take statistics – Periodically scale atomic velocities
Run
• Typical MD runs use a 0.5-2.0 fs timestep and run for 100ps-2ns.
• This means anywhere from 50,000 to 4 million timesteps.
• Trajectories are typically output every100-1000 timesteps.
• Statistics are taken every 100 or so timesteps.
How Do I Know My Simulation Doesn’t Suck?
• Is it conserving what it’s supposed to be conserving?
• Is it conserving what it’s not really supposed to be conserving?
• Are you in a region of phase space that interests you?
• Have you checked for common failures?
– Flying icecube, &c.
My Simulation is Over. What Now?
• Now you have large amounts of – thermochemical data – atomistic trajectories • From this you can calculate stuff – rates of reactions – diffusion coefficients – structural properties (rdf, sf, &c.)
How Do I Do That?
• Canned analysis programs – VMD!
• Write your own programs – Everybody does this!
• We may talk about this another time.
What We Can Do For You
• Help you select simulation software – Almost all MD software is free, so we can get and install most packages you could want.
• Help you set up your simulation cell.
• Help you manage your jobs as they run.
• Help you write, parallelize, and run analysis code.